HK40013668A - RNA-guided nucleic acid modifying enzymes and methods of use thereof
- Publication number
- HK40013668A (HK62020003591.8A)
- Authority
- HK
- Hong Kong
Description
Cross-reference
This application claims the benefit of U.S. Provisional Patent Application No. 62/402,849, filed September 30, 2016, which is hereby incorporated by reference in its entirety.
Incorporation by reference of sequence listing provided as a text file
A sequence listing is provided herewith as a text file, "BERK-343WO_SeqList_st25.txt", created on September 28, 2017 and having a size of 244 KB. The contents of the text file are incorporated by reference herein in their entirety.
Introduction
The CRISPR-Cas system, an example of a pathway that was unknown to science before the era of DNA sequencing, is now understood to confer on bacteria and archaea acquired immunity against bacteriophages and viruses. Intensive research over the past decade has uncovered the biochemistry of this system. CRISPR-Cas systems consist of Cas proteins, which are involved in the acquisition, targeting, and cleavage of foreign DNA or RNA, and a CRISPR array, which comprises direct repeats flanking short spacer sequences that guide the Cas proteins to their targets. Class 2 CRISPR-Cas systems are streamlined versions in which a single Cas protein bound to RNA is responsible for binding to and cleaving a targeted sequence. The programmable nature of these minimal systems has enabled their use as a versatile technology that is revolutionizing the field of genome manipulation.
Current CRISPR-Cas technologies are based on systems from cultured bacteria, leaving untapped the vast majority of organisms that have not been isolated. To date, only a few class 2 CRISPR/Cas systems have been discovered. There is a need in the art for additional class 2 CRISPR/Cas systems (e.g., Cas protein plus guide RNA combinations).
Disclosure of Invention
The present disclosure provides RNA-guided endonuclease polypeptides, referred to herein as "CasY" polypeptides (also referred to as "CasY proteins"); nucleic acids encoding the CasY polypeptides; and modified host cells comprising a CasY polypeptide and/or a nucleic acid encoding a CasY polypeptide. CasY polypeptides are useful in a variety of applications, which are provided.
The present disclosure provides guide RNAs that bind to a CasY protein and provide sequence specificity to the CasY protein (referred to herein as "CasY guide RNAs"); nucleic acids encoding the CasY guide RNAs; and modified host cells comprising a CasY guide RNA and/or a nucleic acid encoding a CasY guide RNA. CasY guide RNAs are useful in a variety of applications, which are provided.
The present disclosure provides methods for identifying CRISPR RNA-guided endonucleases.
Drawings
FIG. 1 depicts an example of a naturally occurring CasY protein sequence.
FIG. 2 depicts an alignment of naturally occurring CasY protein sequences.
FIG. 3 (panels a-b) depicts a schematic domain representation of CasY. Also shown are the results of various searches attempting to identify homologs of CasY, and portions of the identified CasY-containing CRISPR loci.
FIG. 4 depicts a schematic representation of the CasY and C2c3 loci. Interference proteins are shown in green and acquisition proteins in red. Repeats folded using RNAstructure are shown on the right, revealing a strong hairpin at the 5' end, which suggests that the CRISPR array is self-processed by CasY.
FIG. 5 (panels a-d) depicts experiments performed to determine the PAM sequence for CasY (PAM-dependent plasmid interference by CasY).
FIG. 6 (panels a-b) presents the 'repeat' sequence of a naturally occurring CasY guide RNA, along with an exemplary CasY guide RNA that hybridizes to the target DNA. (from top to bottom, SEQ ID NOS: 11-15 and SEQ ID NO:20)
FIG. 7 (panels a-b) presents newly identified CRISPR-Cas systems from uncultivated organisms. a, ratio of major lineages with and without isolated representatives among all bacteria and archaea, based on the data of Hug et al. (reference 32). The results highlight the massive scale of as-yet little-explored biology in these domains. Archaeal Cas9 and the novel CRISPR-CasY are found only in lineages with no isolated representatives. b, locus organization of the newly discovered CRISPR-Cas systems.
FIG. 8 (panels a-c) presents ARMAN-1 CRISPR array diversity and identification of the ARMAN-1 Cas9 PAM sequence. a, CRISPR arrays reconstructed from 15 different AMD samples. White boxes represent repeats and colored diamonds represent spacers (identical spacers share a color; unique spacers are black). The conserved region of the array (right) is highlighted. The diversity of recently acquired spacers (left) indicates that the system is active. An analysis that also includes CRISPR fragments from the read data is presented in FIG. 14. b, a single putative viral contig reconstructed from AMD metagenomic data contains 56 protospacers (red vertical bars) from the ARMAN-1 CRISPR arrays. c, sequence analysis revealed a conserved 'NGG' PAM motif downstream of the protospacers on the non-target strand.
FIG. 9 (panels a-d) presents data showing that CasX mediates programmable DNA interference in E. coli. a, diagram of the CasX plasmid interference assay. E. coli expressing a minimal CasX locus is transformed with a plasmid containing a spacer that matches a sequence in the CRISPR array (target) or a plasmid containing a non-matching spacer (non-target). After transformation, cultures are plated and colony forming units (cfu) are quantified. b, serial dilutions of E. coli expressing the Planctomycetes CasX locus targeting spacer 1 (sX.1) and transformed with the indicated target (sX1, CasX spacer 1; sX2, CasX spacer 2; NT, non-target). c, plasmid interference by the Deltaproteobacteria CasX. Experiments were performed in triplicate, and mean ± standard deviation is shown. d, PAM depletion assay for the Planctomycetes CasX locus expressed in E. coli. PAM sequences depleted more than 30-fold compared to a control library were used to generate the WebLogo.
FIG. 10 (panels a-c) presents data showing that CasX is a dual-guided CRISPR complex. a, mapping of environmental RNA sequences (metatranscriptomic data) to the CasX CRISPR locus diagrammed below (red arrow, putative tracrRNA; white boxes, repeats; green diamonds, spacers). The inset shows a detailed view of the first repeat and spacer. b, diagram of CasX double-stranded DNA interference. The site of RNA processing is indicated by a black arrow. c, results of plasmid interference assays with the putative tracrRNA knocked out of the CasX locus (T, target; NT, non-target). Experiments were performed in triplicate, and mean ± standard deviation is shown.
FIG. 11 (panels a-c) presents data showing that expression of a CasY locus in E. coli is sufficient for DNA interference. a, diagram of the CasY locus and neighboring proteins. b, WebLogo of 5' PAM sequences depleted more than 3-fold by CasY relative to a control library. c, plasmid interference by E. coli expressing CasY.1 and transformed with targets containing the indicated PAM. Experiments were performed in triplicate, and mean ± standard deviation is shown.
FIG. 12 (panels a-b) presents the newly identified CRISPR-Cas systems in the context of known systems. a, simplified phylogenetic tree of the universal Cas1 protein. CRISPR types of known systems are noted on the wedges and branches; the newly described systems are shown in bold. A detailed Cas1 phylogeny is presented in Supplementary Data 2. b, proposed evolutionary scenario in which the archaeal type II system arose through recombination between type II-B and type II-C loci.
FIG. 13 shows that the archaeal Cas9 from ARMAN-4 is found on numerous contigs with a degenerate CRISPR array. Cas9 from ARMAN-4 is highlighted in dark red on 16 different contigs. Proteins with putative domains or functions are labeled, whereas hypothetical proteins are unlabeled. Fifteen of the contigs contain two degenerate direct repeats (one bp mismatch) and a single conserved spacer. The remaining contig contains only one direct repeat. Unlike in ARMAN-1, no additional Cas proteins are found adjacent to Cas9 in ARMAN-4.
FIG. 14 presents the full reconstruction of the ARMAN-1 CRISPR arrays. The reconstruction includes CRISPR arrays from the reference assembled sequences and array segments reconstructed from short DNA reads. Green arrows indicate repeats, and colored arrows indicate CRISPR spacers (identical spacers share a color, whereas unique spacers are black). In CRISPR systems, spacers are typically added unidirectionally, so the high diversity of spacers on the left is attributed to recent acquisition.
FIG. 15 (panels a-b) shows that ARMAN-1 spacers map to the genomes of archaeal community members. a, protospacers (red arrows) from ARMAN-1 map to the genome of ARMAN-2, a nanoarchaeon from the same environment. Six protospacers map uniquely to a portion of the genome flanked by two long terminal repeats (LTRs), and two additional protospacers match perfectly within the LTRs (blue and green). This region may be a transposon, suggesting that the CRISPR-Cas system of ARMAN-1 plays a role in suppressing mobilization of this element. b, protospacers also map to a Thermoplasmatales archaeon (I-plasma), another member of the Richmond Mine ecosystem that is found in the same samples as ARMAN organisms. The protospacers cluster within a genomic region encoding short hypothetical proteins, suggesting that this might also represent a mobile element.
FIG. 16 (panels a-e) presents the predicted secondary structures of ARMAN-1 crRNA and tracrRNA. a, the CRISPR repeat and tracrRNA anti-repeat are depicted in black, whereas the spacer-derived sequence is shown as a series of green N's. No clear termination signal could be predicted from the locus, so three different tracrRNA lengths were tested based on their secondary structure: 69, 104, and 179, shown in red, blue, and pink, respectively. b, engineered single guide RNA corresponding to the dual guide in a. c, dual guide for ARMAN-4 Cas9 with two different hairpins on the 3' end of the tracrRNA (75 and 122). d, engineered single guide RNA corresponding to the dual guide in c. e, conditions tested in the E. coli in vivo targeting assay.
FIG. 17 (panels a-b) presents the purification scheme for in vitro biochemistry studies. a, ARMAN-1 (AR1) and ARMAN-4 (AR4) Cas9 were expressed and purified under a variety of conditions, as outlined in the supplementary material. The proteins outlined in blue boxes were tested for cleavage activity in vitro. b, fractions from the AR1-Cas9 and AR4-Cas9 purifications were separated on a 10% SDS-PAGE gel.
FIG. 18 presents the newly identified CRISPR-Cas systems compared with known proteins. Similarity of CasX and CasY to known proteins was assessed using the following searches: (1) a BLAST search against the NCBI non-redundant (NR) protein database, (2) a hidden Markov model (HMM) search against an HMM database of all known proteins, and (3) a distant homology search using HHpred (reference 30).
FIG. 19 (panels a-d) presents data relating to programmed DNA interference by CasX. a, plasmid interference assays for CasX2 (Planctomycetes) and CasX1 (Deltaproteobacteria), continued from FIG. 9, panel c (sX1, CasX spacer 1; sX2, CasX spacer 2; NT, non-target). Experiments were performed in triplicate, and mean ± standard deviation is shown. b, serial dilutions of E. coli expressing a CasX locus and transformed with the indicated target, continued from FIG. 9, panel b. c, PAM depletion assay for the Deltaproteobacteria CasX, and d, for the Planctomycetes CasX, expressed in E. coli. PAM sequences depleted more than the indicated PAM depletion value threshold (PDVT) compared to a control library were used to generate the WebLogo.
FIG. 20 presents a phylogenetic tree of Cas9 homologs. Maximum-likelihood phylogenetic tree of Cas9 proteins, showing previously described systems colored by type: II-A in blue, II-B in green, and II-C in purple. The archaeal Cas9 cluster with type II-C CRISPR-Cas systems, together with two newly described bacterial Cas9 from uncultivated bacteria.
FIG. 21 presents a table of cleavage conditions determined for Cas9 from ARMAN-1 and ARMAN-4.
Definitions
"heterologous" as used herein means a nucleotide or polypeptide sequence that is not present in a native nucleic acid or protein, respectively. For example, a heterologous polypeptide comprises an amino acid sequence from a protein other than a CasY polypeptide, relative to a CasY polypeptide. In some cases, a portion of a CasY protein from one species is fused to a portion of a CasY protein from a different species. Thus, the CasY sequences from each species can be considered to be heterologous with respect to each other. As another example, a CasY protein (e.g., a dCasY protein) can be fused to an active domain from a non-CasY protein (e.g., a histone deacetylase), and the sequence of the active domain can be considered a heterologous polypeptide (which is heterologous to the CasY protein).
The terms "polynucleotide" and "nucleic acid" are used interchangeably herein to refer to a polymeric form of nucleotides (ribonucleotides or deoxynucleotides) of any length. Thus, the term includes, but is not limited to, single-, double-, or multi-stranded DNA or RNA, genomic DNA, cDNA, DNA-RNA hybrids, or polymers comprising purine and pyrimidine bases or other natural, chemically or biochemically modified, non-natural, or derivatized nucleotide bases. The terms "polynucleotide" and "nucleic acid" are understood to include both single-stranded (such as sense or antisense strands) and double-stranded polynucleotides as may be suitable for use in the described embodiments.
The terms "polypeptide," "peptide," and "protein" are used interchangeably herein to refer to a polymeric form of amino acids of any length, which may include genetically encoded and non-genetically encoded amino acids, chemically or biochemically modified or derivatized amino acids, and polypeptides having modified peptide backbones. The term includes fusion proteins, including but not limited to fusion proteins with a heterologous amino acid sequence; fusions with heterologous and homologous leader sequences, with or without an N-terminal methionine residue; immunologically tagged proteins; and the like.
As used herein, the term "naturally occurring" as applied to a nucleic acid, protein, cell, or organism refers to a nucleic acid, cell, protein, or organism that occurs in nature.
As used herein, the term "isolated" is intended to describe a polynucleotide, polypeptide or cell that is in an environment different from the environment in which it naturally occurs. The isolated genetically modified host cell may be present in a mixed population of genetically modified host cells.
As used herein, the term "exogenous nucleic acid" refers to a nucleic acid that is not normally or naturally occurring in nature and/or is not produced by a given bacterium, organism, or cell. As used herein, the term "endogenous nucleic acid" refers to a nucleic acid that normally occurs in nature and/or a nucleic acid produced by a given bacterium, organism, or cell. An "endogenous nucleic acid" is also referred to as a "native nucleic acid" or a nucleic acid that is "native" to a given bacterium, organism, or cell.
As used herein, "recombinant" means that a particular nucleic acid (DNA or RNA) is the product of various combinations of cloning, restriction, and/or ligation steps that result in a construct with structural coding or non-coding sequences that are distinguishable from endogenous nucleic acids present in the native system. In general, a DNA sequence encoding a structural coding sequence can be assembled from a cDNA fragment and a short oligonucleotide linker, or from a series of synthetic oligonucleotides, to provide a synthetic nucleic acid capable of being expressed from a recombinant transcription unit contained in a cell or in a cell-free transcription and translation system. Such sequences may be provided in open reading frame form uninterrupted by internal untranslated sequences or introns that are typically present in eukaryotic genes. Genomic DNA comprising related sequences may also be used in the formation of recombinant genes or transcription units. Sequences of untranslated DNA can be present at the 5 'or 3' end of an open reading frame, where such sequences do not interfere with the manipulation or expression of the coding region, and can actually function to modulate the production of the desired product by a variety of mechanisms (see "DNA regulatory sequences" below).
Thus, for example, the term "recombinant" polynucleotide or "recombinant" nucleic acid refers to a polynucleotide or nucleic acid that does not occur naturally, e.g., one made by the artificial combination, through human intervention, of two otherwise separated segments of sequence. Such artificial combination is often accomplished by chemical synthesis means or by artificial manipulation of isolated segments of nucleic acid (e.g., by genetic engineering techniques). This is typically done to replace a codon with a redundant codon that encodes the same or a conservative amino acid, while typically introducing or removing a sequence recognition site. Alternatively, it is done to join together nucleic acid segments having desired functions to generate a desired combination of functions.
Similarly, the term "recombinant" polypeptide refers to a polypeptide that does not occur naturally, e.g., one made by the artificial combination, through human intervention, of two otherwise separated segments of amino acid sequence. Thus, for example, a polypeptide comprising a heterologous amino acid sequence is recombinant.
By "construct" or "vector" is meant a recombinant nucleic acid, typically a recombinant DNA, which is generated for the purpose of expressing and/or propagating one or more specific nucleotide sequences, or which is used to construct other recombinant nucleotide sequences.
The terms "DNA regulatory sequence", "control element" and "regulatory element" are used interchangeably herein to refer to transcriptional and translational control sequences, such as promoters, enhancers, polyadenylation signals, terminators, protein degradation signals, and the like, that provide for and/or regulate the expression of a coding sequence and/or the production of an encoded polypeptide in a host cell.
The terms "transformation" and "genetic modification" are used interchangeably herein and refer to a permanent or transient genetic change induced in a cell following the introduction of a new nucleic acid (i.e., DNA exogenous to the cell) into the cell. Genetic change ("modification") can be accomplished by incorporating the new nucleic acid into the genome of the host cell or by transient or stable maintenance of the new nucleic acid as an episomal element. When the cell is a eukaryotic cell, permanent genetic change is generally achieved by introducing the new DNA into the genome of the cell. In prokaryotic cells, permanent changes can be introduced into the chromosome or via extrachromosomal elements (such as plasmids and expression vectors), which may contain one or more selectable markers to aid in their maintenance in the recombinant host cell. Suitable methods of genetic modification include viral infection, transfection, conjugation, protoplast fusion, electroporation, particle gun technology, calcium phosphate precipitation, direct microinjection, and the like. The choice of method generally depends on the type of cell being transformed and the circumstances under which the transformation is taking place (i.e., in vitro, ex vivo, or in vivo). A general discussion of these methods can be found in Ausubel et al., Short Protocols in Molecular Biology, 3rd edition, Wiley & Sons, 1995.
"operably linked" refers to a juxtaposition wherein the components described are in a relationship permitting them to function in their intended manner. For example, a promoter is operably linked to a coding sequence if it affects the transcription or expression of the coding sequence. As used herein, the terms "heterologous promoter" and "heterologous control region" refer to promoters and other control regions not normally associated with a particular nucleic acid in nature. For example, a "transcriptional control region heterologous to a coding region" is a transcriptional control region that is not normally associated with a coding region in nature.
As used herein, "host cell" refers to an in vivo or in vitro eukaryotic cell, prokaryotic cell, or cell from a multicellular organism (e.g., a cell line) cultured as a unicellular entity, which can be used or has been used as a recipient for a nucleic acid (e.g., an expression vector), and includes progeny of the original cell that have been genetically modified by the nucleic acid. It is understood that the progeny of a single cell may not necessarily be identical in morphology or in genomic or total DNA complement to the original parent, due to natural, accidental, or deliberate mutation. A "recombinant host cell" (also referred to as a "genetically modified host cell") is a host cell into which a heterologous nucleic acid (e.g., an expression vector) has been introduced. For example, a subject prokaryotic host cell is a prokaryotic host cell (e.g., a bacterium) that has been genetically modified by introducing into a suitable prokaryotic host cell a heterologous nucleic acid, e.g., an exogenous nucleic acid that is foreign to (not normally found in nature in) the prokaryotic host cell, or a recombinant nucleic acid that is not normally found in the prokaryotic host cell; and a subject eukaryotic host cell is a eukaryotic host cell that has been genetically modified by introducing into a suitable eukaryotic host cell a heterologous nucleic acid, e.g., an exogenous nucleic acid that is foreign to the eukaryotic host cell, or a recombinant nucleic acid that is not normally found in the eukaryotic host cell.
The term "conservative amino acid substitution" refers to the interchangeability in proteins of amino acid residues having similar side chains. For example, a group of amino acids having aliphatic side chains consists of glycine, alanine, valine, leucine, and isoleucine; a group of amino acids having aliphatic-hydroxyl side chains consists of serine and threonine; a group of amino acids having amide-containing side chains consists of asparagine and glutamine; a group of amino acids having aromatic side chains consists of phenylalanine, tyrosine, and tryptophan; a group of amino acids having basic side chains consists of lysine, arginine, and histidine; and a group of amino acids having sulfur-containing side chains consists of cysteine and methionine. Exemplary conservative amino acid substitution groups are: valine-leucine-isoleucine, phenylalanine-tyrosine, lysine-arginine, alanine-valine, and asparagine-glutamine.
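The groupings in this definition can be written down as simple lookup sets. The following Python sketch is illustrative only and is not part of the disclosure: the names `SIDE_CHAIN_GROUPS`, `EXEMPLARY_GROUPS`, and `is_conservative` are hypothetical, and the rule that a substitution counts as conservative when both residues share a group is an assumption made for the example.

```python
# One-letter codes for the side-chain groups listed in the definition.
SIDE_CHAIN_GROUPS = {
    "aliphatic": set("GAVLI"),
    "aliphatic-hydroxyl": set("ST"),
    "amide-containing": set("NQ"),
    "aromatic": set("FYW"),
    "basic": set("KRH"),
    "sulfur-containing": set("CM"),
}

# The exemplary conservative substitution groups listed in the definition.
EXEMPLARY_GROUPS = [set("VLI"), set("FY"), set("KR"), set("AV"), set("NQ")]


def is_conservative(original, replacement, groups=EXEMPLARY_GROUPS):
    """Return True if both residues fall within at least one shared group.

    Assumption for illustration: identical residues are not a substitution
    and therefore return False.
    """
    a, b = original.upper(), replacement.upper()
    if a == b:
        return False
    return any(a in group and b in group for group in groups)


if __name__ == "__main__":
    print(is_conservative("V", "L"))  # shares the valine-leucine-isoleucine group
    print(is_conservative("K", "H"))  # not in an exemplary group together
    print(is_conservative("K", "H", SIDE_CHAIN_GROUPS.values()))  # both basic
```

Passing `SIDE_CHAIN_GROUPS.values()` applies the broader side-chain classes, while the default applies only the narrower exemplary groups.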
A polynucleotide or polypeptide has a certain percent "sequence identity" to another polynucleotide or polypeptide, meaning that, when aligned, that percentage of bases or amino acids are the same, and in the same relative position, when comparing the two sequences. Sequence identity can be determined in a number of different ways. To determine sequence identity, sequences can be aligned using various methods and computer programs, including BLAST, available through the world wide web ncbi. See, e.g., Altschul et al. (1990), J. Mol. Biol. 215:403-10. Another alignment algorithm is FASTA, available in the Genetics Computing Group (GCG) package of Oxford Molecular Group, Inc., Madison, Wisconsin, USA. Other techniques for alignment are described in Methods in Enzymology, vol. 266: Computer Methods for Macromolecular Sequence Analysis (1996), Doolittle, ed., Academic Press, Inc., a division of Harcourt Brace & Co., San Diego, California, USA. Of particular interest are alignment programs that permit gaps in the sequence. Smith-Waterman is one type of algorithm that permits gaps in sequence alignments. See Meth. Mol. Biol. 70:173-187 (1997). In addition, the GAP program, which uses the Needleman and Wunsch alignment method, can be used to align sequences. See J. Mol. Biol. 48:443-.
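As a minimal illustration of the definition above, percent identity can be computed from two sequences that have already been aligned (for example, by one of the programs named above). The sketch below is not the method of any of those programs. The function name `percent_identity` is hypothetical, and the choice of denominator (every alignment column except those where both sequences carry a gap) is an assumption, since different tools use different conventions.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of alignment columns in which both sequences carry the same residue.

    Inputs must be pre-aligned strings of equal length, with `gap` marking
    alignment gaps. Columns where both sequences have a gap are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    matches = 0
    columns = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap and y == gap:
            continue  # uninformative column
        columns += 1
        if x == y:
            matches += 1

    if columns == 0:
        raise ValueError("alignment contains no comparable columns")
    return 100.0 * matches / columns


if __name__ == "__main__":
    print(percent_identity("ACGT", "ACGA"))    # 3 of 4 columns match
    print(percent_identity("AC-GT", "ACTGT"))  # 4 of 5 columns match
```

Under this convention a gap opposite a residue counts as a mismatch, so gapped alignments lower the reported identity.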
As used herein, the term "treating" or the like refers to obtaining a desired pharmacological and/or physiological effect. The effect may be prophylactic in terms of completely or partially preventing a disease or a symptom thereof, and/or therapeutic in terms of a partial or complete cure for a disease and/or a side effect attributable to the disease. As used herein, "treatment" covers any treatment of a disease in a mammal (e.g., a human) and includes: (a) preventing disease development in a subject who may be predisposed to a disease but has not yet been diagnosed with said disease; (b) inhibiting the disease, i.e. arresting its development; and (c) relieving the disease, i.e., causing regression of the disease.
The terms "individual", "subject", "host" and "patient" are used interchangeably herein to refer to an individual organism, such as a mammal, including but not limited to murines, apes, humans, mammalian farm animals, mammalian sport animals and mammalian pets.
Before the present invention is further described, it is to be understood that this invention is not limited to particular embodiments described, as such may, of course, vary. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to be limiting, since the scope of the present invention will be limited only by the appended claims.
Where a range of values is provided, it is understood that each intervening value, to the tenth of the unit of the lower limit unless the context clearly dictates otherwise, between the upper and lower limit of that range and any other stated or intervening value in that stated range is encompassed within the invention. The upper and lower limits of these smaller ranges may independently be included in the smaller ranges, and are also encompassed within the invention, subject to any specifically excluded limit in the stated range. Where the stated range includes one or both of the stated limits, ranges excluding either or both of those included limits are also included in the invention.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although any methods and materials similar or equivalent to those described herein can also be used in the practice or testing of the present invention, the preferred methods and materials are now described. All publications mentioned herein are incorporated herein by reference to disclose and describe the methods and/or materials in connection with which the publications are cited.
It must be noted that, as used herein and in the appended claims, the singular forms "a," "an," and "the" include plural referents unless the context clearly dictates otherwise. Thus, for example, reference to "a CasY polypeptide" includes a plurality of such polypeptides, and reference to "a guide RNA" includes reference to one or more guide RNAs and equivalents thereof known to those skilled in the art, and so forth. It is also to be noted that the claims may be drafted to exclude any optional element. Thus, such statements are intended to serve as antecedent basis for use of such exclusive terminology as "solely," "only," and the like, in connection with the recitation of claim elements, or use of a "negative" limitation.
It is to be understood that certain features of the invention, which are, for clarity, described in the context of separate embodiments, may also be provided in combination in a single embodiment. Conversely, various features of the invention which are, for brevity, described in the context of a single embodiment, may also be provided separately or in any suitable subcombination. All combinations that are embodiments of the invention are specifically embraced by the present invention and are disclosed herein just as if each and every combination were individually and explicitly disclosed. In addition, all subcombinations of the various embodiments and elements thereof are also specifically embraced by the present invention and are disclosed herein as if each and every such subcombination was individually and specifically disclosed herein.
The publications discussed herein are provided solely for their disclosure prior to the filing date of the present application. Nothing herein is to be construed as an admission that the invention is not entitled to antedate such publication by virtue of prior invention. Further, the dates of publication provided may be different from the actual publication dates which may need to be independently confirmed.
Detailed Description
The present disclosure provides RNA-guided endonuclease polypeptides, referred to herein as "CasY" polypeptides (also referred to as "CasY proteins"); nucleic acids encoding the CasY polypeptides; and modified host cells comprising a CasY polypeptide and/or a nucleic acid encoding a CasY polypeptide. CasY polypeptides are useful in a variety of applications, which are provided.
The present disclosure provides guide RNAs that bind to a CasY protein and provide sequence specificity to the CasY protein (referred to herein as "CasY guide RNAs"); nucleic acids encoding the CasY guide RNAs; and modified host cells comprising a CasY guide RNA and/or a nucleic acid encoding a CasY guide RNA. CasY guide RNAs are useful in a variety of applications, which are provided.
The present disclosure provides methods for identifying CRISPR RNA-guided endonucleases.
Compositions
CRISPR/CasY proteins and guide RNAs
The CRISPR/Cas endonuclease (e.g., a CasY protein) interacts with (binds to) a corresponding guide RNA (e.g., a CasY guide RNA) to form a Ribonucleoprotein (RNP) complex that targets a specific site in a target nucleic acid by base pairing between the guide RNA and a target sequence within the target nucleic acid molecule. The guide RNA includes a nucleotide sequence (guide sequence) complementary to the sequence of the target nucleic acid (target site). Thus, the CasY protein forms a complex with the CasY guide RNA, and the guide RNA provides sequence specificity to the RNP complex through the guide sequence. The CasY protein of the complex provides site-specific activity. In other words, the CasY protein is directed to (e.g., stabilized at) a target site within a target nucleic acid sequence (e.g., a chromosomal sequence or an extrachromosomal sequence, such as an episomal sequence, a minicircle sequence, a mitochondrial sequence, a chloroplast sequence, etc.) due to its association with the guide RNA.
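The base-pairing relationship described above (a guide sequence complementary to a target site) can be sketched in a few lines of Python. This is an illustration only, under simplifying assumptions that are not drawn from the disclosure: perfect Watson-Crick complementarity with no mismatches, a single DNA strand searched as a plain string, and no PAM requirement. The names `target_site_for_guide` and `find_target_sites` are hypothetical.

```python
# Watson-Crick partner, in DNA, of each RNA base in the guide sequence.
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def target_site_for_guide(guide_rna):
    """Return the DNA sequence (written 5'->3') fully complementary to the guide.

    The guide pairs antiparallel to its target, so the guide is read in
    reverse while each base is complemented.
    """
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(guide_rna.upper()))


def find_target_sites(dna_strand, guide_rna):
    """Return 0-based start positions on `dna_strand` where the guide could pair."""
    site = target_site_for_guide(guide_rna)
    strand = dna_strand.upper()
    positions = []
    start = strand.find(site)
    while start != -1:
        positions.append(start)
        start = strand.find(site, start + 1)  # allow overlapping matches
    return positions


if __name__ == "__main__":
    print(target_site_for_guide("AUGC"))
    print(find_target_sites("TTGCATTTGCAT", "AUGC"))
```

A realistic search would also scan the opposite strand and apply whatever PAM and mismatch-tolerance rules the particular protein requires; those are deliberately left out of this sketch.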
The present disclosure provides compositions comprising a CasY polypeptide (and/or a nucleic acid encoding a CasY polypeptide) (e.g., where the CasY polypeptide can be a naturally occurring protein, a nickase CasY protein, a dCasY protein, a chimeric CasY protein, etc.). The present disclosure provides compositions comprising a CasY guide RNA (and/or a nucleic acid encoding a CasY guide RNA). The present disclosure provides a composition comprising: (a) a CasY polypeptide (and/or a nucleic acid encoding a CasY polypeptide) (e.g., where the CasY polypeptide can be a naturally occurring protein, a nickase CasY protein, a dCasY protein, a chimeric CasY protein, etc.) and (b) a CasY guide RNA (and/or a nucleic acid encoding a CasY guide RNA). The present disclosure provides a nucleic acid/protein complex (RNP complex) comprising: (a) a CasY polypeptide of the present disclosure (e.g., where the CasY polypeptide can be a naturally occurring protein, a nickase CasY protein, a dCasY protein, a chimeric CasY protein, etc.); and (b) a CasY guide RNA.
CasY protein
A CasY polypeptide (which term is used interchangeably with the term "CasY protein") can bind to and/or modify (e.g., cleave, nick, methylate, demethylate, etc.) a target nucleic acid and/or a polypeptide associated with a target nucleic acid (e.g., methylation or acetylation of a histone tail) (e.g., in some cases, a CasY protein includes a fusion partner having activity, and in some cases, a CasY protein provides nuclease activity). In some cases, the CasY protein is a naturally occurring protein (e.g., naturally occurring in a prokaryotic cell). In other cases, the CasY protein is not a naturally occurring polypeptide (e.g., the CasY protein is a variant CasY protein, a chimeric protein, etc.).
The assay to determine whether a given protein interacts with a CasY guide RNA can be any convenient binding assay that tests for binding between a protein and a nucleic acid. Suitable binding assays (e.g., gel shift assays) are known to those of ordinary skill in the art (e.g., assays that include adding a CasY guide RNA and a protein to a target nucleic acid). The assay to determine whether a protein has activity (e.g., to determine whether the protein has nuclease activity that cleaves a target nucleic acid and/or some heterologous activity) can be any convenient assay (e.g., any convenient nucleic acid cleavage assay that tests for nucleic acid cleavage). Suitable assays (e.g., cleavage assays) are known to those of ordinary skill in the art.
Naturally occurring CasY proteins function as endonucleases that catalyze double-strand breaks at specific sequences in a targeted double-stranded DNA (dsDNA). Sequence specificity is provided by an associated guide RNA that hybridizes to a target sequence within the target DNA. The naturally occurring CasY guide RNA is a crRNA, wherein the crRNA comprises (i) a guide sequence that hybridizes to a target sequence in a target DNA and (ii) a protein-binding segment comprising a stem-loop (hairpin-dsRNA duplex) that binds to a CasY protein.
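The base-pairing relationship between a guide sequence and its target can be illustrated with a minimal Python sketch. The function names and the toy sequences are hypothetical and are not taken from the sequence listing. The sketch assumes perfect Watson-Crick complementarity on a single DNA strand and ignores everything else that may govern targeting in practice (e.g., flanking sequence requirements, mismatch tolerance, or the opposite strand).

```python
# Minimal sketch: locate sites in a DNA strand that are perfectly
# complementary to a guide sequence. All names and sequences here are
# hypothetical illustrations, not sequences from the disclosure.

# RNA base -> DNA base it pairs with (Watson-Crick pairing).
RNA_TO_PAIRED_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def complementary_dna(guide_rna: str) -> str:
    """Return, 5'->3', the DNA sequence that base-pairs with the guide RNA.

    Pairing is antiparallel, so the guide is read in reverse while each
    base is swapped for its pairing partner.
    """
    return "".join(RNA_TO_PAIRED_DNA[base] for base in reversed(guide_rna.upper()))


def find_target_sites(guide_rna: str, dna_strand: str) -> list[int]:
    """Return 0-based start indices of perfect matches on the given strand."""
    site = complementary_dna(guide_rna)
    strand = dna_strand.upper()
    width = len(site)
    return [i for i in range(len(strand) - width + 1) if strand[i:i + width] == site]


if __name__ == "__main__":
    toy_guide = "AUGGCUAC"        # hypothetical guide sequence
    toy_strand = "TTGTAGCCATAA"   # hypothetical target strand
    print(complementary_dna(toy_guide))
    print(find_target_sites(toy_guide, toy_strand))
```

A practical search would also scan the reverse complement of the strand and apply whatever additional targeting constraints apply to the enzyme in question; those are deliberately left out of this sketch.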
In some embodiments, the CasY protein of the subject methods and/or compositions is (or is derived from) a naturally occurring (wild-type) protein. Examples of naturally occurring CasY proteins are depicted in FIG. 1 and are shown in SEQ ID NOs: 1-7. Examples of naturally occurring CasY proteins are depicted in FIG. 1 and are shown in SEQ ID NOs: 1-8. An alignment of exemplary naturally occurring CasY proteins is presented in FIG. 2 (proteins labeled "Y1.", "Y2.", "Y3.", etc.). Partial DNA scaffolds of 7 naturally occurring CasY CRISPR loci (assembled from sequencing data) are shown in SEQ ID NOs: 21-27. It is important to note that this newly discovered protein (CasY) is shorter than the previously identified CRISPR-Cas endonucleases, and thus using this protein as an alternative offers the advantage that the nucleotide sequence encoding the protein is relatively short. This is useful, for example, where a nucleic acid encoding the CasY protein is desired, such as where a viral vector (e.g., an AAV vector) is used for delivery to a cell such as a eukaryotic cell (e.g., a mammalian cell, a human cell, a mouse cell; in vitro, ex vivo, or in vivo) for research and/or clinical applications. It is also noted herein that bacteria carrying CasY CRISPR loci were present in environmental samples collected at low temperature (e.g., 10 °C to 17 °C). Therefore, CasY is expected to function well (e.g., better than other Cas endonucleases found to date) at low temperatures (e.g., 10 °C to 14 °C, 10 °C to 17 °C, 10 °C to 20 °C).
In some cases, a CasY protein (of a subject composition and/or method) comprises an amino acid sequence having 20% or greater sequence identity (e.g., 30% or greater, 40% or greater, 50% or greater, 60% or greater, 70% or greater, 80% or greater, 85% or greater, 90% or greater, 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100% sequence identity) to a CasY protein sequence set forth in SEQ ID NO: 1. For example, in some cases, the CasY protein comprises an amino acid sequence having 50% or greater sequence identity (e.g., 60% or greater, 70% or greater, 80% or greater, 85% or greater, 90% or greater, 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100% sequence identity) to the CasY protein sequence set forth in SEQ ID NO: 1. In some cases, the CasY protein comprises an amino acid sequence having 80% or greater sequence identity (e.g., 85% or greater, 90% or greater, 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100% sequence identity) to the CasY protein sequence set forth in SEQ ID NO: 1. In some cases, the CasY protein comprises an amino acid sequence having 90% or greater sequence identity (e.g., 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100% sequence identity) to the sequence of the CasY protein set forth in SEQ ID NO: 1. In some cases, the CasY protein comprises an amino acid sequence having the sequence of the CasY protein shown in SEQ ID NO: 1. In some cases, the CasY protein comprises an amino acid sequence having the sequence of the CasY protein shown in SEQ ID NO: 1, except that the sequence includes amino acid substitutions (e.g., 1, 2, or 3 amino acid substitutions) that reduce the naturally occurring catalytic activity of the protein (e.g., such as, for example, at the amino acid positions described below).
In some cases, the CasY protein comprises an amino acid sequence that has 20% or greater sequence identity (e.g., 30% or greater, 40% or greater, 50% or greater, 60% or greater, 70% or greater, 80% or greater, 85% or greater, 90% or greater, 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100% sequence identity) to the sequence of the CasY protein set forth in SEQ ID NO: 2. In some cases, the CasY protein comprises an amino acid sequence having 50% or greater sequence identity (e.g., 60% or greater, 70% or greater, 80% or greater, 85% or greater, 90% or greater, 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100% sequence identity) to the CasY protein sequence set forth in SEQ ID NO: 2. In some cases, the CasY protein comprises an amino acid sequence having 80% or greater sequence identity (e.g., 85% or greater, 90% or greater, 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100% sequence identity) to the sequence of the CasY protein set forth in SEQ ID NO: 2. In some cases, the CasY protein comprises an amino acid sequence having 90% or greater sequence identity (e.g., 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100% sequence identity) to the sequence of the CasY protein set forth in SEQ ID NO: 2. In some cases, the CasY protein comprises an amino acid sequence having the sequence of the CasY protein shown in SEQ ID NO. 2. In some cases, the CasY protein comprises an amino acid sequence having the sequence of the CasY protein shown in SEQ ID NO:2, except that the sequence includes amino acid substitutions (e.g., 1, 2, or 3 amino acid substitutions) that reduce the naturally occurring catalytic activity of the protein (e.g., such as, for example, at the amino acid positions described below).
In some cases, the CasY protein comprises an amino acid sequence having 20% or greater sequence identity (e.g., 30% or greater, 40% or greater, 50% or greater, 60% or greater, 70% or greater, 80% or greater, 85% or greater, 90% or greater, 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100% sequence identity) to the sequence of the CasY protein set forth in SEQ ID NO: 3. In some cases, the CasY protein comprises an amino acid sequence having 50% or greater sequence identity (e.g., 60% or greater, 70% or greater, 80% or greater, 85% or greater, 90% or greater, 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100% sequence identity) to the CasY protein sequence set forth in SEQ ID NO: 3. In some cases, the CasY protein comprises an amino acid sequence having 80% or greater sequence identity (e.g., 85% or greater, 90% or greater, 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100% sequence identity) to the sequence of the CasY protein set forth in SEQ ID NO: 3. In some cases, the CasY protein comprises an amino acid sequence having 90% or greater sequence identity (e.g., 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100% sequence identity) to the sequence of the CasY protein set forth in SEQ ID NO. 3. In some cases, the CasY protein comprises an amino acid sequence having the sequence of the CasY protein shown in SEQ ID NO. 3. In some cases, the CasY protein comprises an amino acid sequence having the sequence of the CasY protein shown in SEQ ID NO:3, except that the sequence includes amino acid substitutions (e.g., 1, 2, or 3 amino acid substitutions) that reduce the naturally occurring catalytic activity of the protein (e.g., such as, for example, at the amino acid positions described below).
In some cases, the CasY protein comprises an amino acid sequence having 20% or greater sequence identity (e.g., 30% or greater, 40% or greater, 50% or greater, 60% or greater, 70% or greater, 80% or greater, 85% or greater, 90% or greater, 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100% sequence identity) to the sequence of the CasY protein set forth in SEQ ID NO: 4. In some cases, the CasY protein comprises an amino acid sequence having 50% or greater sequence identity (e.g., 60% or greater, 70% or greater, 80% or greater, 85% or greater, 90% or greater, 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100% sequence identity) to the CasY protein sequence set forth in SEQ ID NO: 4. In some cases, the CasY protein comprises an amino acid sequence having 80% or greater sequence identity (e.g., 85% or greater, 90% or greater, 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100% sequence identity) to the sequence of the CasY protein set forth in SEQ ID NO: 4. In some cases, the CasY protein comprises an amino acid sequence having 90% or greater sequence identity (e.g., 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100% sequence identity) to the sequence of the CasY protein set forth in SEQ ID NO. 4. In some cases, the CasY protein comprises an amino acid sequence having the sequence of the CasY protein shown in SEQ ID NO. 4. In some cases, the CasY protein comprises an amino acid sequence having the sequence of the CasY protein shown in SEQ ID NO:4, except that the sequence includes amino acid substitutions (e.g., 1, 2, or 3 amino acid substitutions) that reduce the naturally occurring catalytic activity of the protein (e.g., such as, for example, at the amino acid positions described below).
In some cases, the CasY protein comprises an amino acid sequence that has 20% or greater sequence identity (e.g., 30% or greater, 40% or greater, 50% or greater, 60% or greater, 70% or greater, 80% or greater, 85% or greater, 90% or greater, 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100% sequence identity) to the sequence of the CasY protein set forth in SEQ ID NO: 5. In some cases, the CasY protein comprises an amino acid sequence having 50% or greater sequence identity (e.g., 60% or greater, 70% or greater, 80% or greater, 85% or greater, 90% or greater, 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100% sequence identity) to the CasY protein sequence set forth in SEQ ID NO: 5. In some cases, the CasY protein comprises an amino acid sequence having 80% or greater sequence identity (e.g., 85% or greater, 90% or greater, 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100% sequence identity) to the sequence of the CasY protein set forth in SEQ ID NO: 5. In some cases, the CasY protein comprises an amino acid sequence having 90% or greater sequence identity (e.g., 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100% sequence identity) to the sequence of the CasY protein set forth in SEQ ID NO: 5. In some cases, the CasY protein comprises an amino acid sequence having the sequence of the CasY protein shown in SEQ ID NO. 5. In some cases, the CasY protein comprises an amino acid sequence having the sequence of the CasY protein shown in SEQ ID NO:5, except that the sequence includes amino acid substitutions (e.g., 1, 2, or 3 amino acid substitutions) that reduce the naturally occurring catalytic activity of the protein (e.g., such as, for example, at the amino acid positions described below).
In some cases, the CasY protein comprises an amino acid sequence having 20% or greater sequence identity (e.g., 30% or greater, 40% or greater, 50% or greater, 60% or greater, 70% or greater, 80% or greater, 85% or greater, 90% or greater, 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100% sequence identity) to the CasY protein sequence set forth in SEQ ID NO: 6. In some cases, the CasY protein comprises an amino acid sequence having 50% or greater sequence identity (e.g., 60% or greater, 70% or greater, 80% or greater, 85% or greater, 90% or greater, 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100% sequence identity) to the CasY protein sequence set forth in SEQ ID NO: 6. In some cases, the CasY protein comprises an amino acid sequence having 80% or greater sequence identity (e.g., 85% or greater, 90% or greater, 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100% sequence identity) to the CasY protein sequence set forth in SEQ ID NO: 6. In some cases, the CasY protein comprises an amino acid sequence having 90% or greater sequence identity (e.g., 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100% sequence identity) to the sequence of the CasY protein set forth in SEQ ID NO: 6. In some cases, the CasY protein comprises an amino acid sequence having the sequence of the CasY protein shown in SEQ ID NO: 6. In some cases, the CasY protein comprises an amino acid sequence having the sequence of the CasY protein shown in SEQ ID NO: 6, except that the sequence includes amino acid substitutions (e.g., 1, 2, or 3 amino acid substitutions) that reduce the naturally occurring catalytic activity of the protein (e.g., such as, for example, at the amino acid positions described below).
In some cases, the CasY protein comprises an amino acid sequence that has 20% or greater sequence identity (e.g., 30% or greater, 40% or greater, 50% or greater, 60% or greater, 70% or greater, 80% or greater, 85% or greater, 90% or greater, 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100% sequence identity) to the sequence of the CasY protein set forth in SEQ ID NO: 7. In some cases, the CasY protein comprises an amino acid sequence having 50% or greater sequence identity (e.g., 60% or greater, 70% or greater, 80% or greater, 85% or greater, 90% or greater, 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100% sequence identity) to the CasY protein sequence set forth in SEQ ID NO: 7. In some cases, the CasY protein comprises an amino acid sequence having 80% or greater sequence identity (e.g., 85% or greater, 90% or greater, 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100% sequence identity) to the CasY protein sequence set forth in SEQ ID NO: 7. In some cases, the CasY protein comprises an amino acid sequence having 90% or greater sequence identity (e.g., 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100% sequence identity) to the sequence of the CasY protein set forth in SEQ ID NO: 7. In some cases, the CasY protein comprises an amino acid sequence having the sequence of the CasY protein shown in SEQ ID NO: 7. In some cases, the CasY protein comprises an amino acid sequence having the sequence of the CasY protein shown in SEQ ID NO: 7, except that the sequence includes amino acid substitutions (e.g., 1, 2, or 3 amino acid substitutions) that reduce the naturally occurring catalytic activity of the protein (e.g., such as, for example, at the amino acid positions described below).
In some cases, the CasY protein comprises an amino acid sequence that has 20% or greater sequence identity (e.g., 30% or greater, 40% or greater, 50% or greater, 60% or greater, 70% or greater, 80% or greater, 85% or greater, 90% or greater, 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100% sequence identity) to the sequence of the CasY protein set forth in SEQ ID NO: 8. In some cases, the CasY protein comprises an amino acid sequence having 50% or greater sequence identity (e.g., 60% or greater, 70% or greater, 80% or greater, 85% or greater, 90% or greater, 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100% sequence identity) to the CasY protein sequence set forth in SEQ ID NO: 8. In some cases, the CasY protein comprises an amino acid sequence having 80% or greater sequence identity (e.g., 85% or greater, 90% or greater, 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100% sequence identity) to the sequence of the CasY protein set forth in SEQ ID NO: 8. In some cases, the CasY protein comprises an amino acid sequence having 90% or greater sequence identity (e.g., 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100% sequence identity) to the sequence of the CasY protein set forth in SEQ ID NO: 8. In some cases, the CasY protein comprises an amino acid sequence having the sequence of the CasY protein shown in SEQ ID NO. 8. In some cases, the CasY protein comprises an amino acid sequence having the sequence of the CasY protein shown in SEQ ID NO:8, except that the sequence includes amino acid substitutions (e.g., 1, 2, or 3 amino acid substitutions) that reduce the naturally occurring catalytic activity of the protein (e.g., such as, for example, at the amino acid positions described below).
In some cases, the CasY protein comprises an amino acid sequence having 20% or greater sequence identity (e.g., 30% or greater, 40% or greater, 50% or greater, 60% or greater, 70% or greater, 80% or greater, 85% or greater, 90% or greater, 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100% sequence identity) to the CasY protein sequence set forth in SEQ ID NO. 9. In some cases, the CasY protein comprises an amino acid sequence having 50% or greater sequence identity (e.g., 60% or greater, 70% or greater, 80% or greater, 85% or greater, 90% or greater, 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100% sequence identity) to the CasY protein sequence set forth in SEQ ID NO. 9. In some cases, the CasY protein comprises an amino acid sequence having 80% or greater sequence identity (e.g., 85% or greater, 90% or greater, 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100% sequence identity) to the CasY protein sequence set forth in SEQ ID NO: 9. In some cases, the CasY protein comprises an amino acid sequence having 90% or greater sequence identity (e.g., 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100% sequence identity) to the sequence of the CasY protein set forth in SEQ ID NO. 9. In some cases, the CasY protein comprises an amino acid sequence having the sequence of the CasY protein shown in SEQ ID NO. 9. In some cases, the CasY protein comprises an amino acid sequence having the sequence of the CasY protein shown in SEQ ID NO:9, except that the sequence includes amino acid substitutions (e.g., 1, 2, or 3 amino acid substitutions) that reduce the naturally occurring catalytic activity of the protein (e.g., such as, for example, at the amino acid positions described below).
In some cases, the CasY protein comprises an amino acid sequence having 20% or greater sequence identity (e.g., 30% or greater, 40% or greater, 50% or greater, 60% or greater, 70% or greater, 80% or greater, 85% or greater, 90% or greater, 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100% sequence identity) to any of the CasY protein sequences set forth in SEQ ID NOs: 1-4. In some cases, the CasY protein comprises an amino acid sequence having 50% or greater sequence identity (e.g., 60% or greater, 70% or greater, 80% or greater, 85% or greater, 90% or greater, 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100% sequence identity) to any of the CasY protein sequences set forth in SEQ ID NOs: 1-4. In some cases, the CasY protein comprises an amino acid sequence having 80% or greater sequence identity (e.g., 85% or greater, 90% or greater, 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100% sequence identity) to any of the CasY protein sequences set forth in SEQ ID NOs: 1-4. In some cases, the CasY protein comprises an amino acid sequence having 90% or greater sequence identity (e.g., 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100% sequence identity) to any of the CasY protein sequences set forth in SEQ ID NOs: 1-4. In some cases, the CasY protein comprises an amino acid sequence having the sequence of a CasY protein set forth in any one of SEQ ID NOs: 1-4. In some cases, the CasY protein comprises an amino acid sequence having the sequence of the CasY protein set forth in any one of SEQ ID NOs: 1-4, except that the sequence includes amino acid substitutions (e.g., 1, 2, or 3 amino acid substitutions) that reduce the naturally occurring catalytic activity of the protein (e.g., such as, for example, at the amino acid positions described below).
In some cases, the CasY protein comprises an amino acid sequence having 20% or greater sequence identity (e.g., 30% or greater, 40% or greater, 50% or greater, 60% or greater, 70% or greater, 80% or greater, 85% or greater, 90% or greater, 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100% sequence identity) to any of the CasY protein sequences set forth in SEQ ID NOs: 1-5. In some cases, the CasY protein comprises an amino acid sequence having 50% or greater sequence identity (e.g., 60% or greater, 70% or greater, 80% or greater, 85% or greater, 90% or greater, 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100% sequence identity) to any of the CasY protein sequences set forth in SEQ ID NOs: 1-5. In some cases, the CasY protein comprises an amino acid sequence having 80% or greater sequence identity (e.g., 85% or greater, 90% or greater, 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100% sequence identity) to any of the CasY protein sequences set forth in SEQ ID NOs: 1-5. In some cases, the CasY protein comprises an amino acid sequence having 90% or greater sequence identity (e.g., 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100% sequence identity) to any of the CasY protein sequences set forth in SEQ ID NOs: 1-5. In some cases, the CasY protein comprises an amino acid sequence having the sequence of a CasY protein set forth in any one of SEQ ID NOs: 1-5. In some cases, the CasY protein comprises an amino acid sequence having the sequence of the CasY protein set forth in any one of SEQ ID NOs: 1-5, except that the sequence includes amino acid substitutions (e.g., 1, 2, or 3 amino acid substitutions) that reduce the naturally occurring catalytic activity of the protein (e.g., such as, for example, at the amino acid positions described below).
In some cases, the CasY protein comprises an amino acid sequence having 20% or greater sequence identity (e.g., 30% or greater, 40% or greater, 50% or greater, 60% or greater, 70% or greater, 80% or greater, 85% or greater, 90% or greater, 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100% sequence identity) to any of the CasY protein sequences set forth in SEQ ID NOs: 1-7. In some cases, the CasY protein comprises an amino acid sequence having 50% or greater sequence identity (e.g., 60% or greater, 70% or greater, 80% or greater, 85% or greater, 90% or greater, 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100% sequence identity) to any of the CasY protein sequences set forth in SEQ ID NOs: 1-7. In some cases, the CasY protein comprises an amino acid sequence having 80% or greater sequence identity (e.g., 85% or greater, 90% or greater, 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100% sequence identity) to any of the CasY protein sequences set forth in SEQ ID NOs: 1-7. In some cases, the CasY protein comprises an amino acid sequence having 90% or greater sequence identity (e.g., 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100% sequence identity) to any of the CasY protein sequences set forth in SEQ ID NOs: 1-7. In some cases, the CasY protein comprises an amino acid sequence having the sequence of a CasY protein set forth in any one of SEQ ID NOs: 1-7. In some cases, the CasY protein comprises an amino acid sequence having the sequence of the CasY protein set forth in any one of SEQ ID NOs: 1-7, except that the sequence includes amino acid substitutions (e.g., 1, 2, or 3 amino acid substitutions) that reduce the naturally occurring catalytic activity of the protein (e.g., such as, for example, at the amino acid positions described below).
In some cases, the CasY protein comprises an amino acid sequence having 20% or greater sequence identity (e.g., 30% or greater, 40% or greater, 50% or greater, 60% or greater, 70% or greater, 80% or greater, 85% or greater, 90% or greater, 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100% sequence identity) to any of the CasY protein sequences set forth in SEQ ID Nos. 1-8. In some cases, the CasY protein comprises an amino acid sequence having 50% or greater sequence identity (e.g., 60% or greater, 70% or greater, 80% or greater, 85% or greater, 90% or greater, 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100% sequence identity) to any of the CasY protein sequences set forth in SEQ ID Nos. 1-8. In some cases, the CasY protein comprises an amino acid sequence having 80% or greater sequence identity (e.g., 85% or greater, 90% or greater, 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100% sequence identity) to any of the CasY protein sequences set forth in SEQ ID Nos. 1-8. In some cases, the CasY protein comprises an amino acid sequence having 90% or greater sequence identity (e.g., 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100% sequence identity) to any of the CasY protein sequences set forth in SEQ ID NOS: 1-8. In some cases, the CasY protein comprises an amino acid sequence having the sequence of a CasY protein set forth in any one of SEQ ID NOS: 1-8. In some cases, the CasY protein comprises an amino acid sequence having the sequence of the CasY protein shown in any one of SEQ ID NOs: 1-8, except that the sequence includes amino acid substitutions (e.g., 1, 2, or 3 amino acid substitutions) that reduce the naturally-occurring catalytic activity of the protein (e.g., such as, for example, at the amino acid positions described below).
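The percent sequence identity thresholds recited above can be made concrete with a minimal Python sketch. It assumes the two sequences have already been aligned (gaps written as "-") and uses one common convention, identical residues divided by the number of alignment columns; this passage does not specify an alignment method or a denominator, so both are assumptions. The function names and toy sequences are hypothetical.

```python
# Minimal sketch: percent identity over an existing pairwise alignment.
# Assumption: identity = identical columns / total alignment columns
# (columns where both sequences are gapped are ignored). Other
# conventions (e.g., dividing by the shorter sequence length) exist.


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Return percent identity between two gapped, equal-length sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("alignment contains no residues")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def meets_identity_threshold(aligned_a: str, aligned_b: str, threshold: float) -> bool:
    """Return True if percent identity is at or above the threshold (e.g., 80)."""
    return percent_identity(aligned_a, aligned_b) >= threshold


if __name__ == "__main__":
    # Hypothetical toy peptides, not sequences from the sequence listing.
    reference = "MKTLV"
    candidate = "MRTLI"
    print(percent_identity(reference, candidate))
    print(meets_identity_threshold(reference, candidate, 50))
    print(meets_identity_threshold(reference, candidate, 80))
```

Because the result depends on how the alignment was produced and on the denominator chosen, the same pair of proteins can score differently under different tools; a threshold comparison is only meaningful when the method is stated alongside the number.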
CasY protein domains
The domains of a CasY protein are depicted in FIG. 3. As can be seen in the schematic representation of FIG. 3 (amino acid numbering is based on the CasY1 protein (SEQ ID NO: 1)), a CasY protein includes an N-terminal domain of approximately 800-1000 amino acids in length (e.g., about 815 for CasY1 and 980 for CasY5) and a C-terminal domain that includes 3 partial RuvC domains (RuvC-I, RuvC-II, and RuvC-III, also referred to herein as subdomains) that are not contiguous with respect to the primary amino acid sequence of the CasY protein, but that form a RuvC domain once the protein is produced and folds. Thus, in some cases, a CasY protein (of the subject compositions and/or methods) comprises an amino acid sequence having an N-terminal domain (e.g., not including any fused heterologous sequence, such as an NLS and/or a domain having catalytic activity) with a length in a range of from 750 to 1050 amino acids (e.g., 750 to 1025, 750 to 1000, 750 to 950, 775 to 1050, 775 to 1025, 775 to 1000, 775 to 950, 800 to 1050, 800 to 1025, 800 to 1000, or 800 to 950 amino acids). In some cases, a CasY protein (of the subject compositions and/or methods) comprises an amino acid sequence (e.g., not including any fused heterologous sequence, such as an NLS and/or a domain having catalytic activity) that is N-terminal to a split RuvC domain (e.g., 3 partial RuvC domains: RuvC-I, RuvC-II, and RuvC-III), where the N-terminal amino acid sequence has a length in a range of from 750 to 1050 amino acids (e.g., 750 to 1025, 750 to 1000, 750 to 950, 775 to 1050, 775 to 1025, 775 to 1000, 775 to 950, 800 to 1050, 800 to 1025, 800 to 1000, or 800 to 950 amino acids).
In some cases, a CasY protein (of the subject compositions and/or methods) comprises an amino acid sequence having 20% or more sequence identity (e.g., 30% or more, 40% or more, 50% or more, 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) to the N-terminal domain of a CasY protein sequence set forth in SEQ ID NO: 1 (e.g., the domain depicted as amino acids 1-812 of CasY1 in FIG. 3A). For example, in some cases, a CasY protein comprises an amino acid sequence having 50% or greater sequence identity (e.g., 60% or greater, 70% or greater, 80% or greater, 85% or greater, 90% or greater, 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100% sequence identity) to the N-terminal domain of the CasY protein sequence set forth in SEQ ID NO: 1 (e.g., the domain depicted as amino acids 1-812 of CasY1 in FIG. 3A). In some cases, the CasY protein comprises an amino acid sequence having 80% or greater sequence identity (e.g., 85% or greater, 90% or greater, 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100% sequence identity) to the N-terminal domain of the CasY protein sequence set forth in SEQ ID NO: 1 (e.g., the domain depicted as amino acids 1-812 of CasY1 in FIG. 3A). In some cases, the CasY protein comprises an amino acid sequence having 90% or greater sequence identity (e.g., 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100% sequence identity) to the N-terminal domain of the CasY protein sequence set forth in SEQ ID NO: 1 (e.g., the domain depicted as amino acids 1-812 of CasY1 in FIG. 3A). In some cases, the CasY protein comprises an amino acid sequence having amino acids 1 to 812 of the CasY protein sequence shown in SEQ ID NO: 1.
In some cases, a CasY protein (of the subject compositions and/or methods) comprises an amino acid sequence having 20% or more sequence identity (e.g., 30% or more, 40% or more, 50% or more, 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) to the N-terminal domain of a CasY protein sequence set forth in any one of SEQ ID NOs: 1-4 (e.g., the domain depicted as amino acids 1-812 of CasY1 in FIG. 3A). For example, in some cases, a CasY protein comprises an amino acid sequence having 50% or greater sequence identity (e.g., 60% or greater, 70% or greater, 80% or greater, 85% or greater, 90% or greater, 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100% sequence identity) to the N-terminal domain of a CasY protein sequence set forth in any one of SEQ ID NOs: 1-4 (e.g., the domain depicted as amino acids 1-812 of CasY1 in FIG. 3A). In some cases, the CasY protein comprises an amino acid sequence having 80% or greater sequence identity (e.g., 85% or greater, 90% or greater, 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100% sequence identity) to the N-terminal domain of the CasY protein sequence set forth in any one of SEQ ID NOs: 1-4 (e.g., the domain depicted as amino acids 1-812 of CasY1 in FIG. 3A). In some cases, the CasY protein comprises an amino acid sequence having 90% or greater sequence identity (e.g., 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100% sequence identity) to the N-terminal domain of the CasY protein sequence set forth in any one of SEQ ID NOs: 1-4 (e.g., the domain depicted as amino acids 1-812 of CasY1 in FIG. 3A). In some cases, the CasY protein comprises a fragment of the amino acid sequence of any one of SEQ ID NOs: 1-4 corresponding to amino acids 1-812 of the CasY protein sequence shown in SEQ ID NO: 1.
In some cases, a CasY protein (of the subject compositions and/or methods) comprises an amino acid sequence having 20% or more sequence identity (e.g., 30% or more, 40% or more, 50% or more, 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) to the N-terminal domain of a CasY protein sequence set forth in any one of SEQ ID NOs: 1-5 (e.g., the domain depicted as amino acids 1-812 of CasY1 in FIG. 3A). For example, in some cases, a CasY protein comprises an amino acid sequence having 50% or greater sequence identity (e.g., 60% or greater, 70% or greater, 80% or greater, 85% or greater, 90% or greater, 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100% sequence identity) to the N-terminal domain of a CasY protein sequence set forth in any one of SEQ ID NOs: 1-5 (e.g., the domain depicted as amino acids 1-812 of CasY1 in FIG. 3A). In some cases, the CasY protein comprises an amino acid sequence having 80% or greater sequence identity (e.g., 85% or greater, 90% or greater, 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100% sequence identity) to the N-terminal domain of the CasY protein sequence set forth in any one of SEQ ID NOs: 1-5 (e.g., the domain depicted as amino acids 1-812 of CasY1 in FIG. 3A). In some cases, the CasY protein comprises an amino acid sequence having 90% or greater sequence identity (e.g., 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100% sequence identity) to the N-terminal domain of the CasY protein sequence set forth in any one of SEQ ID NOs: 1-5 (e.g., the domain depicted as amino acids 1-812 of CasY1 in FIG. 3A). In some cases, the CasY protein comprises a fragment of the amino acid sequence of any one of SEQ ID NOs: 1-5 corresponding to amino acids 1-812 of the CasY protein sequence shown in SEQ ID NO: 1.
In some cases, a CasY protein (of the subject compositions and/or methods) comprises an amino acid sequence having 20% or more sequence identity (e.g., 30% or more, 40% or more, 50% or more, 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) to the N-terminal domain of a CasY protein sequence set forth in any one of SEQ ID NOs: 1-7 (e.g., the domain depicted as amino acids 1-812 of CasY1 in FIG. 3A). For example, in some cases, a CasY protein comprises an amino acid sequence having 50% or greater sequence identity (e.g., 60% or greater, 70% or greater, 80% or greater, 85% or greater, 90% or greater, 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100% sequence identity) to the N-terminal domain of a CasY protein sequence set forth in any one of SEQ ID NOs: 1-7 (e.g., the domain depicted as amino acids 1-812 of CasY1 in FIG. 3A). In some cases, the CasY protein comprises an amino acid sequence having 80% or greater sequence identity (e.g., 85% or greater, 90% or greater, 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100% sequence identity) to the N-terminal domain of the CasY protein sequence set forth in any one of SEQ ID NOs: 1-7 (e.g., the domain depicted as amino acids 1-812 of CasY1 in FIG. 3A). In some cases, the CasY protein comprises an amino acid sequence having 90% or greater sequence identity (e.g., 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100% sequence identity) to the N-terminal domain of the CasY protein sequence set forth in any one of SEQ ID NOs: 1-7 (e.g., the domain depicted as amino acids 1-812 of CasY1 in FIG. 3A). In some cases, the CasY protein comprises a fragment of the amino acid sequence of any one of SEQ ID NOs: 1-7 corresponding to amino acids 1-812 of the CasY protein sequence shown in SEQ ID NO: 1.
In some cases, a CasY protein (of the subject compositions and/or methods) comprises an amino acid sequence having 20% or more sequence identity (e.g., 30% or more, 40% or more, 50% or more, 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) to the N-terminal domain of a CasY protein sequence represented by any one of SEQ ID NOs 1-8 (e.g., the domain depicted as amino acids 1-812 of CasY1 in fig. 3 a). For example, in some cases, a CasY protein comprises an amino acid sequence having 50% or greater sequence identity (e.g., 60% or greater, 70% or greater, 80% or greater, 85% or greater, 90% or greater, 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100% sequence identity) to the N-terminal domain of a CasY protein sequence set forth in any one of SEQ ID NOs: 1-8 (e.g., the domain depicted as amino acids 1-812 of CasY1 in FIG. 3 a). In some cases, the CasY protein comprises an amino acid sequence having 80% or greater sequence identity (e.g., 85% or greater, 90% or greater, 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100% sequence identity) to the N-terminal domain of the CasY protein sequence set forth in any one of SEQ ID NOs: 1-8 (e.g., the domain depicted as amino acids 1-812 of CasY1 in FIG. 3A). In some cases, the CasY protein comprises an amino acid sequence having 90% or greater sequence identity (e.g., 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100% sequence identity) to the N-terminal domain of the CasY protein sequence set forth in any one of SEQ ID NOs: 1-8 (e.g., the domain depicted as amino acids 1-812 of CasY1 in FIG. 3A). In some cases, the CasY protein comprises a fragment of the amino acid sequence of any one of SEQ ID NOs 1-8 corresponding to amino acids 1-812 of the CasY protein sequence shown in SEQ ID NO 1.
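The percent-identity thresholds recited above (e.g., 50%, 80%, or 90% or greater to the N-terminal domain) can be illustrated with a minimal sketch. The passage above does not state how identity is calculated, so the convention used here (identical columns divided by all aligned columns, gaps counted as mismatches) is an assumption, as are the function names and the toy sequences, which are not taken from SEQ ID NOs: 1-8. The sketch also assumes the candidate has already been aligned to the reference N-terminal domain by an external alignment tool.

```python
def percent_identity(aligned_query: str, aligned_ref: str) -> float:
    """Percent identity over a pre-computed pairwise alignment.

    Assumed convention: identical columns / all aligned columns,
    where '-' marks a gap and gap columns count as mismatches.
    """
    if len(aligned_query) != len(aligned_ref):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for q, r in zip(aligned_query, aligned_ref):
        if q == "-" and r == "-":
            continue  # a column that is a gap in both rows carries no information
        compared += 1
        if q == r:
            matches += 1
    if compared == 0:
        raise ValueError("alignment has no comparable columns")
    return 100.0 * matches / compared


def meets_threshold(identity: float, threshold: float) -> bool:
    """True when identity is 'threshold % or greater'."""
    return identity >= threshold


# Hypothetical toy alignment (not a CasY sequence).
toy_query = "MKT-AL"
toy_ref = "MKTGAL"
identity = percent_identity(toy_query, toy_ref)
print(round(identity, 1), meets_threshold(identity, 80.0))
```

Other conventions (for example, dividing by the length of the shorter sequence or by the reference domain length) give different values for the same alignment, so the denominator should be fixed before comparing a sequence against any of the thresholds.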
In some cases, a CasY protein (of the subject compositions and/or methods) comprises a first amino acid sequence having 20% or more sequence identity (e.g., 30% or more, 40% or more, 50% or more, 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) to the N-terminal domain of any one of the CasY protein sequences set forth in SEQ ID NOs 1-4 (e.g., the domain depicted as amino acids 1-812 of CasY1 in fig. 3 a); and a second amino acid sequence C-terminal to the first amino acid sequence that includes a separate Ruv C domain (e.g., 3 partial RuvC domains-RuvC-I, RuvC-II and RuvC-III). For example, in some cases, a CasY protein comprises a first amino acid sequence having 50% or greater sequence identity (e.g., 60% or greater, 70% or greater, 80% or greater, 85% or greater, 90% or greater, 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100% sequence identity) to the N-terminal domain of any one of the CasY protein sequences set forth in SEQ ID NOs: 1-4 (e.g., the domain depicted as amino acids 1-812 of CasY1 in FIG. 3 a); and a second amino acid sequence C-terminal to the first amino acid sequence that includes a separate Ruv C domain (e.g., 3 partial RuvC domains-RuvC-I, RuvC-II and RuvC-III). In some cases, a CasY protein comprises a first amino acid sequence having 80% or more sequence identity (e.g., 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) to the N-terminal domain of any one of the CasY protein sequences set forth in SEQ ID NOs 1-4 (e.g., the domain depicted as amino acids 1-812 of CasY1 in fig. 3 a); and a second amino acid sequence C-terminal to the first amino acid sequence that includes a separate Ruv C domain (e.g., 3 partial RuvC domains-RuvC-I, RuvC-II and RuvC-III). 
In some cases, a CasY protein comprises a first amino acid sequence having 90% or greater sequence identity (e.g., 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100% sequence identity) to the N-terminal domain of any one of the CasY protein sequences set forth in SEQ ID NOS: 1-4 (e.g., the domain depicted as amino acids 1-812 of CasY1 in FIG. 3A); and a second amino acid sequence C-terminal to the first amino acid sequence that includes a separate Ruv C domain (e.g., 3 partial RuvC domains-RuvC-I, RuvC-II and RuvC-III). In some cases, the CasY protein comprises an amino acid sequence corresponding to amino acids 1-812 of the CasY protein sequence set forth in SEQ ID NO: 1; and a second amino acid sequence C-terminal to the first amino acid sequence that includes a separate Ruv C domain (e.g., 3 partial RuvC domains-RuvC-I, RuvC-II and RuvC-III).
In some cases, a CasY protein (of the subject compositions and/or methods) comprises a first amino acid sequence having 20% or more sequence identity (e.g., 30% or more, 40% or more, 50% or more, 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) to the N-terminal domain of any one of the CasY protein sequences set forth in SEQ ID NOs 1-5 (e.g., the domain depicted as amino acids 1-812 of CasY1 in fig. 3 a); and a second amino acid sequence C-terminal to the first amino acid sequence that includes a separate Ruv C domain (e.g., 3 partial RuvC domains-RuvC-I, RuvC-II and RuvC-III). For example, in some cases, a CasY protein comprises a first amino acid sequence having 50% or greater sequence identity (e.g., 60% or greater, 70% or greater, 80% or greater, 85% or greater, 90% or greater, 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100% sequence identity) to the N-terminal domain of any one of the CasY protein sequences set forth in SEQ ID NOs: 1-5 (e.g., the domain depicted as amino acids 1-812 of CasY1 in FIG. 3 a); and a second amino acid sequence C-terminal to the first amino acid sequence that includes a separate Ruv C domain (e.g., 3 partial RuvC domains-RuvC-I, RuvC-II and RuvC-III). In some cases, a CasY protein comprises a first amino acid sequence having 80% or more sequence identity (e.g., 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) to the N-terminal domain of any one of the CasY protein sequences set forth in SEQ ID NOs 1-5 (e.g., the domain depicted as amino acids 1-812 of CasY1 in fig. 3 a); and a second amino acid sequence C-terminal to the first amino acid sequence that includes a separate Ruv C domain (e.g., 3 partial RuvC domains-RuvC-I, RuvC-II and RuvC-III). 
In some cases, a CasY protein comprises a first amino acid sequence having 90% or greater sequence identity (e.g., 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100% sequence identity) to the N-terminal domain of any one of the CasY protein sequences set forth in SEQ ID NOs: 1-5 (e.g., the domain depicted as amino acids 1-812 of CasY1 in FIG. 3A); and a second amino acid sequence C-terminal to the first amino acid sequence that includes a separate Ruv C domain (e.g., 3 partial RuvC domains-RuvC-I, RuvC-II and RuvC-III). In some cases, the CasY protein comprises an amino acid sequence corresponding to amino acids 1-812 of the CasY protein sequence set forth in SEQ ID NO: 1; and a second amino acid sequence C-terminal to the first amino acid sequence that includes a separate Ruv C domain (e.g., 3 partial RuvC domains-RuvC-I, RuvC-II and RuvC-III).
In some cases, a CasY protein (of the subject compositions and/or methods) comprises a first amino acid sequence having 20% or more sequence identity (e.g., 30% or more, 40% or more, 50% or more, 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) to the N-terminal domain of any one of the CasY protein sequences set forth in SEQ ID NOs 1-7 (e.g., the domain depicted as amino acids 1-812 of CasY1 in fig. 3 a); and a second amino acid sequence C-terminal to the first amino acid sequence that includes a separate Ruv C domain (e.g., 3 partial RuvC domains-RuvC-I, RuvC-II and RuvC-III). For example, in some cases, a CasY protein comprises a first amino acid sequence having 50% or greater sequence identity (e.g., 60% or greater, 70% or greater, 80% or greater, 85% or greater, 90% or greater, 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100% sequence identity) to the N-terminal domain of any one of the CasY protein sequences set forth in SEQ ID NOs: 1-7 (e.g., the domain depicted as amino acids 1-812 of CasY1 in FIG. 3 a); and a second amino acid sequence C-terminal to the first amino acid sequence that includes a separate Ruv C domain (e.g., 3 partial RuvC domains-RuvC-I, RuvC-II and RuvC-III). In some cases, a CasY protein comprises a first amino acid sequence having 80% or more sequence identity (e.g., 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) to the N-terminal domain of any one of the CasY protein sequences set forth in SEQ ID NOs 1-7 (e.g., the domain depicted as amino acids 1-812 of CasY1 in fig. 3 a); and a second amino acid sequence C-terminal to the first amino acid sequence that includes a separate Ruv C domain (e.g., 3 partial RuvC domains-RuvC-I, RuvC-II and RuvC-III). 
In some cases, a CasY protein comprises a first amino acid sequence having 90% or greater sequence identity (e.g., 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100% sequence identity) to the N-terminal domain of any one of the CasY protein sequences set forth in SEQ ID NOs: 1-7 (e.g., the domain depicted as amino acids 1-812 of CasY1 in FIG. 3A); and a second amino acid sequence C-terminal to the first amino acid sequence that includes a separate Ruv C domain (e.g., 3 partial RuvC domains-RuvC-I, RuvC-II and RuvC-III). In some cases, the CasY protein comprises an amino acid sequence corresponding to amino acids 1-812 of the CasY protein sequence set forth in SEQ ID NO: 1; and a second amino acid sequence C-terminal to the first amino acid sequence that includes a separate Ruv C domain (e.g., 3 partial RuvC domains-RuvC-I, RuvC-II and RuvC-III).
In some cases, a CasY protein (of the subject compositions and/or methods) comprises a first amino acid sequence having 20% or more sequence identity (e.g., 30% or more, 40% or more, 50% or more, 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) to the N-terminal domain of any one of the CasY protein sequences set forth in SEQ ID NOs 1-8 (e.g., the domain depicted as amino acids 1-812 of CasY1 in fig. 3 a); and a second amino acid sequence C-terminal to the first amino acid sequence that includes a separate Ruv C domain (e.g., 3 partial RuvC domains-RuvC-I, RuvC-II and RuvC-III). For example, in some cases, a CasY protein comprises a first amino acid sequence having 50% or greater sequence identity (e.g., 60% or greater, 70% or greater, 80% or greater, 85% or greater, 90% or greater, 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100% sequence identity) to the N-terminal domain of any one of the CasY protein sequences set forth in SEQ ID NOs: 1-8 (e.g., the domain depicted as amino acids 1-812 of CasY1 in FIG. 3 a); and a second amino acid sequence C-terminal to the first amino acid sequence that includes a separate Ruv C domain (e.g., 3 partial RuvC domains-RuvC-I, RuvC-II and RuvC-III). In some cases, a CasY protein comprises a first amino acid sequence having 80% or greater sequence identity (e.g., 85% or greater, 90% or greater, 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100% sequence identity) to the N-terminal domain of any one of the CasY protein sequences set forth in SEQ ID NOs: 1-8 (e.g., the domain depicted as amino acids 1-812 of CasY1 in FIG. 3A); and a second amino acid sequence C-terminal to the first amino acid sequence that includes a separate Ruv C domain (e.g., 3 partial RuvC domains-RuvC-I, RuvC-II and RuvC-III). 
In some cases, a CasY protein comprises a first amino acid sequence having 90% or greater sequence identity (e.g., 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100% sequence identity) to the N-terminal domain of any one of the CasY protein sequences set forth in SEQ ID NOs: 1-8 (e.g., the domain depicted as amino acids 1-812 of CasY1 in FIG. 3A); and a second amino acid sequence C-terminal to the first amino acid sequence that includes a separate Ruv C domain (e.g., 3 partial RuvC domains-RuvC-I, RuvC-II and RuvC-III). In some cases, the CasY protein comprises an amino acid sequence corresponding to amino acids 1-812 of the CasY protein sequence set forth in SEQ ID NO: 1; and a second amino acid sequence C-terminal to the first amino acid sequence that includes a separate Ruv C domain (e.g., 3 partial RuvC domains-RuvC-I, RuvC-II and RuvC-III).
In some embodiments, the separate RuvC domain of the CasY protein (of the subject compositions and/or methods) includes a region between the RuvC-II and RuvC-III subdomains that is larger than the RuvC-III subdomain. For example, in some cases, the ratio of the length of the region between the RuvC-II and RuvC-III subdomains to the length of the RuvC-III subdomain is 1.1 or greater (e.g., 1.2). In some cases, the ratio of the length of the region between the RuvC-II and RuvC-III subdomains to the length of the RuvC-III subdomain is greater than 1. In some cases, the ratio of the length of the region between the RuvC-II and RuvC-III subdomains to the length of the RuvC-III subdomain is greater than 1 and between 1 and 1.5 (e.g., between 1 and 1.4, between 1 and 1.3, or between 1 and 1.2).
In some embodiments (for the CasY protein of the subject compositions and/or methods), the ratio of the length of the RuvC-II subdomain to the length of the RuvC-III subdomain is 2 or less (e.g., 1.8 or less, 1.7 or less, 1.6 or less, 1.5 or less, or 1.4 or less). For example, in some cases, the ratio of the length of the RuvC-II subdomain to the length of the RuvC-III subdomain is 1.5 or less (e.g., 1.4 or less). In some embodiments, the ratio of the length of the RuvC-II subdomain to the length of the RuvC-III subdomain is in the range of 1 to 2 (e.g., 1.1 to 2, 1.2 to 2, 1 to 1.8, 1.1 to 1.8, 1.2 to 1.8, 1 to 1.6, 1.1 to 1.6, 1.2 to 1.6, 1 to 1.4, 1.1 to 1.4, or 1.2 to 1.4).
In some cases (for the CasY protein of the subject compositions and/or methods), the ratio of the length of the region between the RuvC-II and RuvC-III subdomains to the length of the RuvC-III subdomain is greater than 1. In some cases, the ratio of the length of the region between the RuvC-II and RuvC-III subdomains to the length of the RuvC-III subdomain is greater than 1 and between 1 and 1.3 (e.g., between 1 and 1.2).
In some cases (for the CasY proteins of the subject compositions and/or methods), the region between the RuvC-II and RuvC-III subdomains is at least 60 amino acids in length (e.g., at least 65, 68, or 70 amino acids in length). In some cases, the region between the RuvC-II and RuvC-III subdomains has a length in the range of 60-110 amino acids (e.g., in the range of 60-105, 60-100, 60-95, 60-90, 65-110, 65-105, 65-100, 65-95, or 65-90 amino acids).
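The length and ratio criteria above can be checked mechanically once subdomain boundaries are known. The following is a minimal sketch under stated assumptions: the `Subdomain` class, the function names, and the residue coordinates are hypothetical illustrations and are not taken from FIG. 3A or the sequence listing; coordinates are assumed to be 1-based and inclusive, and the "region between" is taken as the residues strictly between the end of RuvC-II and the start of RuvC-III.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subdomain:
    start: int  # 1-based, inclusive (assumed convention)
    end: int    # 1-based, inclusive

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def linker_length(ruvc_ii: Subdomain, ruvc_iii: Subdomain) -> int:
    """Number of residues strictly between RuvC-II and RuvC-III."""
    gap = ruvc_iii.start - ruvc_ii.end - 1
    if gap < 0:
        raise ValueError("RuvC-III must start after RuvC-II ends")
    return gap


def ruvc_metrics(ruvc_ii: Subdomain, ruvc_iii: Subdomain) -> dict:
    """Compute the three quantities the criteria refer to."""
    linker = linker_length(ruvc_ii, ruvc_iii)
    return {
        "linker_length": linker,
        "linker_to_ruvc_iii": linker / ruvc_iii.length,
        "ruvc_ii_to_ruvc_iii": ruvc_ii.length / ruvc_iii.length,
    }


# Hypothetical coordinates chosen only to exercise the checks.
ruvc_ii = Subdomain(start=900, end=980)      # 81 residues
ruvc_iii = Subdomain(start=1051, end=1115)   # 65 residues
m = ruvc_metrics(ruvc_ii, ruvc_iii)

print(m["linker_to_ruvc_iii"] > 1)        # region larger than RuvC-III
print(m["ruvc_ii_to_ruvc_iii"] <= 2)      # RuvC-II to RuvC-III ratio of 2 or less
print(60 <= m["linker_length"] <= 110)    # region of 60-110 amino acids
```

Keeping the three quantities in one dictionary lets each alternative criterion be tested independently, which matches the way the criteria are recited as alternatives rather than as a combined requirement.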
In some cases, a CasY protein (of the subject compositions and/or methods) comprises a first amino acid sequence having 50% or greater sequence identity (e.g., 60% or greater, 70% or greater, 80% or greater, 85% or greater, 90% or greater, 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100% sequence identity) to the N-terminal domain of any one of the CasY protein sequences set forth in SEQ ID NOs: 1-4 (e.g., the domain depicted as amino acids 1-812 of CasY1 in FIG. 3A); and a second amino acid sequence at the C-terminus of the first amino acid sequence comprising 3 partial RuvC domains (RuvC-I, RuvC-II, and RuvC-III), wherein: (i) the ratio of the length of the region between the RuvC-II and RuvC-III subdomains to the length of the RuvC-III subdomain is 1.1 or greater (e.g., 1.2); (ii) the ratio of the length of the region between the RuvC-II and RuvC-III subdomains to the length of the RuvC-III subdomain is greater than 1; (iii) the ratio of the length of the region between the RuvC-II and RuvC-III subdomains to the length of the RuvC-III subdomain is greater than 1 and between 1 and 1.5 (e.g., between 1 and 1.4, between 1 and 1.3, or between 1 and 1.2); (iv) the ratio of the length of the RuvC-II subdomain to the length of the RuvC-III subdomain is 2 or less (e.g., 1.8 or less, 1.7 or less, 1.6 or less, 1.5 or less, or 1.4 or less); (v) the ratio of the length of the RuvC-II subdomain to the length of the RuvC-III subdomain is 1.5 or less (e.g., 1.4 or less); (vi) the ratio of the length of the RuvC-II subdomain to the length of the RuvC-III subdomain is in the range of 1 to 2 (e.g., 1.1 to 2, 1.2 to 2, 1 to 1.8, 1.1 to 1.8, 1.2 to 1.8, 1 to 1.6, 1.1 to 1.6, 1.2 to 1.6, 1 to 1.4, 1.1 to 1.4, or 1.2 to 1.4); (vii) the ratio of the length of the region between the RuvC-II and RuvC-III subdomains to the length of the RuvC-III subdomain is greater than 1; (viii) the ratio of the length of the region between the RuvC-II and RuvC-III subdomains to the length of the RuvC-III subdomain is greater than 1 and between 1 and 1.5 (e.g., between 1 and 1.2); (ix) the region between the RuvC-II and RuvC-III subdomains is at least 60 amino acids in length (e.g., at least 65 or at least 70 amino acids in length); (x) the region between the RuvC-II and RuvC-III subdomains is at least 65 amino acids in length; (xi) the region between the RuvC-II and RuvC-III subdomains has a length in the range of 60-110 amino acids (e.g., in the range of 60-105, 60-100, 60-95, 60-90, 65-110, 65-105, 65-100, 65-95, or 65-90 amino acids); or (xii) the region between the RuvC-II and RuvC-III subdomains has a length in the range of 65-95 amino acids.
In some cases, a CasY protein (of the subject compositions and/or methods) comprises a first amino acid sequence having 75% or greater sequence identity (e.g., 80% or greater, 85% or greater, 90% or greater, 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100% sequence identity) to the N-terminal domain of any one of the CasY protein sequences set forth in SEQ ID NOs: 1-4 (e.g., the domain depicted as amino acids 1-812 of CasY1 in FIG. 3A); and a second amino acid sequence at the C-terminus of the first amino acid sequence comprising 3 partial RuvC domains (RuvC-I, RuvC-II, and RuvC-III), wherein: (i) the ratio of the length of the region between the RuvC-II and RuvC-III subdomains to the length of the RuvC-III subdomain is 1.1 or greater (e.g., 1.2); (ii) the ratio of the length of the region between the RuvC-II and RuvC-III subdomains to the length of the RuvC-III subdomain is greater than 1; (iii) the ratio of the length of the region between the RuvC-II and RuvC-III subdomains to the length of the RuvC-III subdomain is greater than 1 and between 1 and 1.5 (e.g., between 1 and 1.4, between 1 and 1.3, or between 1 and 1.2); (iv) the ratio of the length of the RuvC-II subdomain to the length of the RuvC-III subdomain is 2 or less (e.g., 1.8 or less, 1.7 or less, 1.6 or less, 1.5 or less, or 1.4 or less); (v) the ratio of the length of the RuvC-II subdomain to the length of the RuvC-III subdomain is 1.5 or less (e.g., 1.4 or less); (vi) the ratio of the length of the RuvC-II subdomain to the length of the RuvC-III subdomain is in the range of 1 to 2 (e.g., 1.1 to 2, 1.2 to 2, 1 to 1.8, 1.1 to 1.8, 1.2 to 1.8, 1 to 1.6, 1.1 to 1.6, 1.2 to 1.6, 1 to 1.4, 1.1 to 1.4, or 1.2 to 1.4); (vii) the ratio of the length of the region between the RuvC-II and RuvC-III subdomains to the length of the RuvC-III subdomain is greater than 1; (viii) the ratio of the length of the region between the RuvC-II and RuvC-III subdomains to the length of the RuvC-III subdomain is greater than 1 and between 1 and 1.5 (e.g., between 1 and 1.2); (ix) the region between the RuvC-II and RuvC-III subdomains is at least 60 amino acids in length (e.g., at least 65 or at least 70 amino acids in length); (x) the region between the RuvC-II and RuvC-III subdomains is at least 65 amino acids in length; (xi) the region between the RuvC-II and RuvC-III subdomains has a length in the range of 60-110 amino acids (e.g., in the range of 60-105, 60-100, 60-95, 60-90, 65-110, 65-105, 65-100, 65-95, or 65-90 amino acids); or (xii) the region between the RuvC-II and RuvC-III subdomains has a length in the range of 65-95 amino acids.
In some cases, a CasY protein (of the subject compositions and/or methods) comprises a first amino acid sequence having 85% or more sequence identity (e.g., 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) to the N-terminal domain of any one of the CasY protein sequences set forth in SEQ ID NOs: 1-4 (e.g., the domain depicted as amino acids 1-812 of CasY1 in FIG. 3A); and a second amino acid sequence at the C-terminus of the first amino acid sequence comprising 3 partial RuvC domains (RuvC-I, RuvC-II, and RuvC-III), wherein: (i) the ratio of the length of the region between the RuvC-II and RuvC-III subdomains to the length of the RuvC-III subdomain is 1.1 or greater (e.g., 1.2); (ii) the ratio of the length of the region between the RuvC-II and RuvC-III subdomains to the length of the RuvC-III subdomain is greater than 1; (iii) the ratio of the length of the region between the RuvC-II and RuvC-III subdomains to the length of the RuvC-III subdomain is greater than 1 and between 1 and 1.5 (e.g., between 1 and 1.4, between 1 and 1.3, or between 1 and 1.2); (iv) the ratio of the length of the RuvC-II subdomain to the length of the RuvC-III subdomain is 2 or less (e.g., 1.8 or less, 1.7 or less, 1.6 or less, 1.5 or less, or 1.4 or less); (v) the ratio of the length of the RuvC-II subdomain to the length of the RuvC-III subdomain is 1.5 or less (e.g., 1.4 or less); (vi) the ratio of the length of the RuvC-II subdomain to the length of the RuvC-III subdomain is in the range of 1 to 2 (e.g., 1.1 to 2, 1.2 to 2, 1 to 1.8, 1.1 to 1.8, 1.2 to 1.8, 1 to 1.6, 1.1 to 1.6, 1.2 to 1.6, 1 to 1.4, 1.1 to 1.4, or 1.2 to 1.4); (vii) the ratio of the length of the region between the RuvC-II and RuvC-III subdomains to the length of the RuvC-III subdomain is greater than 1; (viii) the ratio of the length of the region between the RuvC-II and RuvC-III subdomains to the length of the RuvC-III subdomain is greater than 1 and between 1 and 1.5 (e.g., between 1 and 1.2); (ix) the region between the RuvC-II and RuvC-III subdomains is at least 60 amino acids in length (e.g., at least 65 or at least 70 amino acids in length); (x) the region between the RuvC-II and RuvC-III subdomains is at least 65 amino acids in length; (xi) the region between the RuvC-II and RuvC-III subdomains has a length in the range of 60-110 amino acids (e.g., in the range of 60-105, 60-100, 60-95, 60-90, 65-110, 65-105, 65-100, 65-95, or 65-90 amino acids); or (xii) the region between the RuvC-II and RuvC-III subdomains has a length in the range of 65-95 amino acids.
In some cases, a CasY protein (of the subject compositions and/or methods) comprises a first amino acid sequence having 50% or greater sequence identity (e.g., 60% or greater, 70% or greater, 80% or greater, 85% or greater, 90% or greater, 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100% sequence identity) to the N-terminal domain of any one of the CasY protein sequences set forth in SEQ ID NOs: 1-5 (e.g., the domain depicted as amino acids 1-812 of CasY1 in FIG. 3A); and a second amino acid sequence at the C-terminus of the first amino acid sequence comprising 3 partial RuvC domains (RuvC-I, RuvC-II, and RuvC-III), wherein: (i) the ratio of the length of the region between the RuvC-II and RuvC-III subdomains to the length of the RuvC-III subdomain is 1.1 or greater (e.g., 1.2); (ii) the ratio of the length of the region between the RuvC-II and RuvC-III subdomains to the length of the RuvC-III subdomain is greater than 1; (iii) the ratio of the length of the region between the RuvC-II and RuvC-III subdomains to the length of the RuvC-III subdomain is greater than 1 and between 1 and 1.5 (e.g., between 1 and 1.4, between 1 and 1.3, or between 1 and 1.2); (iv) the ratio of the length of the RuvC-II subdomain to the length of the RuvC-III subdomain is 2 or less (e.g., 1.8 or less, 1.7 or less, 1.6 or less, 1.5 or less, or 1.4 or less); (v) the ratio of the length of the RuvC-II subdomain to the length of the RuvC-III subdomain is 1.5 or less (e.g., 1.4 or less); (vi) the ratio of the length of the RuvC-II subdomain to the length of the RuvC-III subdomain is in the range of 1 to 2 (e.g., 1.1 to 2, 1.2 to 2, 1 to 1.8, 1.1 to 1.8, 1.2 to 1.8, 1 to 1.6, 1.1 to 1.6, 1.2 to 1.6, 1 to 1.4, 1.1 to 1.4, or 1.2 to 1.4); (vii) the ratio of the length of the region between the RuvC-II and RuvC-III subdomains to the length of the RuvC-III subdomain is greater than 1; (viii) the ratio of the length of the region between the RuvC-II and RuvC-III subdomains to the length of the RuvC-III subdomain is greater than 1 and between 1 and 1.5 (e.g., between 1 and 1.2); (ix) the region between the RuvC-II and RuvC-III subdomains is at least 60 amino acids in length (e.g., at least 65 or at least 70 amino acids in length); (x) the region between the RuvC-II and RuvC-III subdomains is at least 65 amino acids in length; (xi) the region between the RuvC-II and RuvC-III subdomains has a length in the range of 60-110 amino acids (e.g., in the range of 60-105, 60-100, 60-95, 60-90, 65-110, 65-105, 65-100, 65-95, or 65-90 amino acids); or (xii) the region between the RuvC-II and RuvC-III subdomains has a length in the range of 65-95 amino acids.
In some cases, a CasY protein (of the subject compositions and/or methods) comprises a first amino acid sequence having 75% or greater sequence identity (e.g., 80% or greater, 85% or greater, 90% or greater, 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100% sequence identity) to the N-terminal domain of any one of the CasY protein sequences set forth in SEQ ID NOs: 1-5 (e.g., the domain depicted as amino acids 1-812 of CasY1 in FIG. 3A); and a second amino acid sequence at the C-terminus of the first amino acid sequence comprising 3 partial RuvC domains (RuvC-I, RuvC-II, and RuvC-III), wherein: (i) the ratio of the length of the region between the RuvC-II and RuvC-III subdomains to the length of the RuvC-III subdomain is 1.1 or greater (e.g., 1.2); (ii) the ratio of the length of the region between the RuvC-II and RuvC-III subdomains to the length of the RuvC-III subdomain is greater than 1; (iii) the ratio of the length of the region between the RuvC-II and RuvC-III subdomains to the length of the RuvC-III subdomain is greater than 1 and between 1 and 1.5 (e.g., between 1 and 1.4, between 1 and 1.3, or between 1 and 1.2); (iv) the ratio of the length of the RuvC-II subdomain to the length of the RuvC-III subdomain is 2 or less (e.g., 1.8 or less, 1.7 or less, 1.6 or less, 1.5 or less, or 1.4 or less); (v) the ratio of the length of the RuvC-II subdomain to the length of the RuvC-III subdomain is 1.5 or less (e.g., 1.4 or less); (vi) the ratio of the length of the RuvC-II subdomain to the length of the RuvC-III subdomain is in the range of 1 to 2 (e.g., 1.1 to 2, 1.2 to 2, 1 to 1.8, 1.1 to 1.8, 1.2 to 1.8, 1 to 1.6, 1.1 to 1.6, 1.2 to 1.6, 1 to 1.4, 1.1 to 1.4, or 1.2 to 1.4); (vii) the ratio of the length of the region between the RuvC-II and RuvC-III subdomains to the length of the RuvC-III subdomain is greater than 1; (viii) the ratio of the length of the region between the RuvC-II and RuvC-III subdomains to the length of the RuvC-III subdomain is greater than 1 and between 1 and 1.5 (e.g., between 1 and 1.2); (ix) the region between the RuvC-II and RuvC-III subdomains is at least 60 amino acids in length (e.g., at least 65 or at least 70 amino acids in length); (x) the region between the RuvC-II and RuvC-III subdomains is at least 65 amino acids in length; (xi) the region between the RuvC-II and RuvC-III subdomains has a length in the range of 60-110 amino acids (e.g., in the range of 60-105, 60-100, 60-95, 60-90, 65-110, 65-105, 65-100, 65-95, or 65-90 amino acids); or (xii) the region between the RuvC-II and RuvC-III subdomains has a length in the range of 65-95 amino acids.
In some cases, a CasY protein (of the subject compositions and/or methods) comprises a first amino acid sequence having 85% or more sequence identity (e.g., 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) to the N-terminal domain of any one of the CasY protein sequences set forth in SEQ ID NOs: 1-5 (e.g., the domain depicted as amino acids 1-812 of CasY1 in FIG. 3A); and a second amino acid sequence, C-terminal to the first amino acid sequence, comprising three partial RuvC domains (RuvC-I, RuvC-II, and RuvC-III), wherein: (i) the ratio of the length of the region between the RuvC-II and RuvC-III subdomains to the length of the RuvC-III subdomain is 1.1 or greater (e.g., 1.2); (ii) the ratio of the length of the region between the RuvC-II and RuvC-III subdomains to the length of the RuvC-III subdomain is greater than 1; (iii) the ratio of the length of the region between the RuvC-II and RuvC-III subdomains to the length of the RuvC-III subdomain is greater than 1 and between 1 and 1.5 (e.g., between 1 and 1.4, between 1 and 1.3, or between 1 and 1.2); (iv) the ratio of the length of the RuvC-II subdomain to the length of the RuvC-III subdomain is 2 or less (e.g., 1.8 or less, 1.7 or less, 1.6 or less, 1.5 or less, or 1.4 or less); (v) the ratio of the length of the RuvC-II subdomain to the length of the RuvC-III subdomain is 1.5 or less (e.g., 1.4 or less); (vi) the ratio of the length of the RuvC-II subdomain to the length of the RuvC-III subdomain is in the range of 1 to 2 (e.g., 1.1 to 2, 1.2 to 2, 1 to 1.8, 1.1 to 1.8, 1.2 to 1.8, 1 to 1.6, 1.1 to 1.6, 1.2 to 1.6, 1 to 1.4, 1.1 to 1.4, or 1.2 to 1.4); (vii) the ratio of the length of the region between the RuvC-II and RuvC-III subdomains to the length of the RuvC-III subdomain is greater than 1; (viii) the ratio of the length of the region between the RuvC-II and RuvC-III subdomains to the length of the RuvC-III subdomain is greater than 1 and between 1 and 1.5 (e.g., between 1 and 1.2); (ix) the region between the RuvC-II and RuvC-III subdomains is at least 60 amino acids in length (e.g., at least 65 or at least 70 amino acids in length); (x) the region between the RuvC-II and RuvC-III subdomains is at least 65 amino acids in length; (xi) the region between the RuvC-II and RuvC-III subdomains has a length in the range of 60-110 amino acids (e.g., in the range of 60-105, 60-100, 60-95, 60-90, 65-110, 65-105, 65-100, 65-95, or 65-90 amino acids); or (xii) the region between the RuvC-II and RuvC-III subdomains has a length in the range of 65-95 amino acids.
In some cases, a CasY protein (of the subject compositions and/or methods) comprises a first amino acid sequence having 50% or greater sequence identity (e.g., 60% or greater, 70% or greater, 80% or greater, 85% or greater, 90% or greater, 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100% sequence identity) to the N-terminal domain of any one of the CasY protein sequences set forth in SEQ ID NOs: 1-7 (e.g., the domain depicted as amino acids 1-812 of CasY1 in FIG. 3A); and a second amino acid sequence, C-terminal to the first amino acid sequence, comprising three partial RuvC domains (RuvC-I, RuvC-II, and RuvC-III), wherein: (i) the ratio of the length of the region between the RuvC-II and RuvC-III subdomains to the length of the RuvC-III subdomain is 1.1 or greater (e.g., 1.2); (ii) the ratio of the length of the region between the RuvC-II and RuvC-III subdomains to the length of the RuvC-III subdomain is greater than 1; (iii) the ratio of the length of the region between the RuvC-II and RuvC-III subdomains to the length of the RuvC-III subdomain is greater than 1 and between 1 and 1.5 (e.g., between 1 and 1.4, between 1 and 1.3, or between 1 and 1.2); (iv) the ratio of the length of the RuvC-II subdomain to the length of the RuvC-III subdomain is 2 or less (e.g., 1.8 or less, 1.7 or less, 1.6 or less, 1.5 or less, or 1.4 or less); (v) the ratio of the length of the RuvC-II subdomain to the length of the RuvC-III subdomain is 1.5 or less (e.g., 1.4 or less); (vi) the ratio of the length of the RuvC-II subdomain to the length of the RuvC-III subdomain is in the range of 1 to 2 (e.g., 1.1 to 2, 1.2 to 2, 1 to 1.8, 1.1 to 1.8, 1.2 to 1.8, 1 to 1.6, 1.1 to 1.6, 1.2 to 1.6, 1 to 1.4, 1.1 to 1.4, or 1.2 to 1.4); (vii) the ratio of the length of the region between the RuvC-II and RuvC-III subdomains to the length of the RuvC-III subdomain is greater than 1; (viii) the ratio of the length of the region between the RuvC-II and RuvC-III subdomains to the length of the RuvC-III subdomain is greater than 1 and between 1 and 1.5 (e.g., between 1 and 1.2); (ix) the region between the RuvC-II and RuvC-III subdomains is at least 60 amino acids in length (e.g., at least 65 or at least 70 amino acids in length); (x) the region between the RuvC-II and RuvC-III subdomains is at least 65 amino acids in length; (xi) the region between the RuvC-II and RuvC-III subdomains has a length in the range of 60-110 amino acids (e.g., in the range of 60-105, 60-100, 60-95, 60-90, 65-110, 65-105, 65-100, 65-95, or 65-90 amino acids); or (xii) the region between the RuvC-II and RuvC-III subdomains has a length in the range of 65-95 amino acids.
In some cases, a CasY protein (of the subject compositions and/or methods) comprises a first amino acid sequence having 75% or greater sequence identity (e.g., 80% or greater, 85% or greater, 90% or greater, 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100% sequence identity) to the N-terminal domain of any one of the CasY protein sequences set forth in SEQ ID NOs: 1-7 (e.g., the domain depicted as amino acids 1-812 of CasY1 in FIG. 3A); and a second amino acid sequence, C-terminal to the first amino acid sequence, comprising three partial RuvC domains (RuvC-I, RuvC-II, and RuvC-III), wherein: (i) the ratio of the length of the region between the RuvC-II and RuvC-III subdomains to the length of the RuvC-III subdomain is 1.1 or greater (e.g., 1.2); (ii) the ratio of the length of the region between the RuvC-II and RuvC-III subdomains to the length of the RuvC-III subdomain is greater than 1; (iii) the ratio of the length of the region between the RuvC-II and RuvC-III subdomains to the length of the RuvC-III subdomain is greater than 1 and between 1 and 1.5 (e.g., between 1 and 1.4, between 1 and 1.3, or between 1 and 1.2); (iv) the ratio of the length of the RuvC-II subdomain to the length of the RuvC-III subdomain is 2 or less (e.g., 1.8 or less, 1.7 or less, 1.6 or less, 1.5 or less, or 1.4 or less); (v) the ratio of the length of the RuvC-II subdomain to the length of the RuvC-III subdomain is 1.5 or less (e.g., 1.4 or less); (vi) the ratio of the length of the RuvC-II subdomain to the length of the RuvC-III subdomain is in the range of 1 to 2 (e.g., 1.1 to 2, 1.2 to 2, 1 to 1.8, 1.1 to 1.8, 1.2 to 1.8, 1 to 1.6, 1.1 to 1.6, 1.2 to 1.6, 1 to 1.4, 1.1 to 1.4, or 1.2 to 1.4); (vii) the ratio of the length of the region between the RuvC-II and RuvC-III subdomains to the length of the RuvC-III subdomain is greater than 1; (viii) the ratio of the length of the region between the RuvC-II and RuvC-III subdomains to the length of the RuvC-III subdomain is greater than 1 and between 1 and 1.5 (e.g., between 1 and 1.2); (ix) the region between the RuvC-II and RuvC-III subdomains is at least 60 amino acids in length (e.g., at least 65 or at least 70 amino acids in length); (x) the region between the RuvC-II and RuvC-III subdomains is at least 65 amino acids in length; (xi) the region between the RuvC-II and RuvC-III subdomains has a length in the range of 60-110 amino acids (e.g., in the range of 60-105, 60-100, 60-95, 60-90, 65-110, 65-105, 65-100, 65-95, or 65-90 amino acids); or (xii) the region between the RuvC-II and RuvC-III subdomains has a length in the range of 65-95 amino acids.
In some cases, a CasY protein (of the subject compositions and/or methods) comprises a first amino acid sequence having 85% or more sequence identity (e.g., 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) to the N-terminal domain of any one of the CasY protein sequences set forth in SEQ ID NOs: 1-7 (e.g., the domain depicted as amino acids 1-812 of CasY1 in FIG. 3A); and a second amino acid sequence, C-terminal to the first amino acid sequence, comprising three partial RuvC domains (RuvC-I, RuvC-II, and RuvC-III), wherein: (i) the ratio of the length of the region between the RuvC-II and RuvC-III subdomains to the length of the RuvC-III subdomain is 1.1 or greater (e.g., 1.2); (ii) the ratio of the length of the region between the RuvC-II and RuvC-III subdomains to the length of the RuvC-III subdomain is greater than 1; (iii) the ratio of the length of the region between the RuvC-II and RuvC-III subdomains to the length of the RuvC-III subdomain is greater than 1 and between 1 and 1.5 (e.g., between 1 and 1.4, between 1 and 1.3, or between 1 and 1.2); (iv) the ratio of the length of the RuvC-II subdomain to the length of the RuvC-III subdomain is 2 or less (e.g., 1.8 or less, 1.7 or less, 1.6 or less, 1.5 or less, or 1.4 or less); (v) the ratio of the length of the RuvC-II subdomain to the length of the RuvC-III subdomain is 1.5 or less (e.g., 1.4 or less); (vi) the ratio of the length of the RuvC-II subdomain to the length of the RuvC-III subdomain is in the range of 1 to 2 (e.g., 1.1 to 2, 1.2 to 2, 1 to 1.8, 1.1 to 1.8, 1.2 to 1.8, 1 to 1.6, 1.1 to 1.6, 1.2 to 1.6, 1 to 1.4, 1.1 to 1.4, or 1.2 to 1.4); (vii) the ratio of the length of the region between the RuvC-II and RuvC-III subdomains to the length of the RuvC-III subdomain is greater than 1; (viii) the ratio of the length of the region between the RuvC-II and RuvC-III subdomains to the length of the RuvC-III subdomain is greater than 1 and between 1 and 1.5 (e.g., between 1 and 1.2); (ix) the region between the RuvC-II and RuvC-III subdomains is at least 60 amino acids in length (e.g., at least 65 or at least 70 amino acids in length); (x) the region between the RuvC-II and RuvC-III subdomains is at least 65 amino acids in length; (xi) the region between the RuvC-II and RuvC-III subdomains has a length in the range of 60-110 amino acids (e.g., in the range of 60-105, 60-100, 60-95, 60-90, 65-110, 65-105, 65-100, 65-95, or 65-90 amino acids); or (xii) the region between the RuvC-II and RuvC-III subdomains has a length in the range of 65-95 amino acids.
In some cases, a CasY protein (of the subject compositions and/or methods) comprises a first amino acid sequence having 50% or greater sequence identity (e.g., 60% or greater, 70% or greater, 80% or greater, 85% or greater, 90% or greater, 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100% sequence identity) to the N-terminal domain of any one of the CasY protein sequences set forth in SEQ ID NOs: 1-8 (e.g., the domain depicted as amino acids 1-812 of CasY1 in FIG. 3A); and a second amino acid sequence, C-terminal to the first amino acid sequence, comprising three partial RuvC domains (RuvC-I, RuvC-II, and RuvC-III), wherein: (i) the ratio of the length of the region between the RuvC-II and RuvC-III subdomains to the length of the RuvC-III subdomain is 1.1 or greater (e.g., 1.2); (ii) the ratio of the length of the region between the RuvC-II and RuvC-III subdomains to the length of the RuvC-III subdomain is greater than 1; (iii) the ratio of the length of the region between the RuvC-II and RuvC-III subdomains to the length of the RuvC-III subdomain is greater than 1 and between 1 and 1.5 (e.g., between 1 and 1.4, between 1 and 1.3, or between 1 and 1.2); (iv) the ratio of the length of the RuvC-II subdomain to the length of the RuvC-III subdomain is 2 or less (e.g., 1.8 or less, 1.7 or less, 1.6 or less, 1.5 or less, or 1.4 or less); (v) the ratio of the length of the RuvC-II subdomain to the length of the RuvC-III subdomain is 1.5 or less (e.g., 1.4 or less); (vi) the ratio of the length of the RuvC-II subdomain to the length of the RuvC-III subdomain is in the range of 1 to 2 (e.g., 1.1 to 2, 1.2 to 2, 1 to 1.8, 1.1 to 1.8, 1.2 to 1.8, 1 to 1.6, 1.1 to 1.6, 1.2 to 1.6, 1 to 1.4, 1.1 to 1.4, or 1.2 to 1.4); (vii) the ratio of the length of the region between the RuvC-II and RuvC-III subdomains to the length of the RuvC-III subdomain is greater than 1; (viii) the ratio of the length of the region between the RuvC-II and RuvC-III subdomains to the length of the RuvC-III subdomain is greater than 1 and between 1 and 1.5 (e.g., between 1 and 1.2); (ix) the region between the RuvC-II and RuvC-III subdomains is at least 60 amino acids in length (e.g., at least 65 or at least 70 amino acids in length); (x) the region between the RuvC-II and RuvC-III subdomains is at least 65 amino acids in length; (xi) the region between the RuvC-II and RuvC-III subdomains has a length in the range of 60-110 amino acids (e.g., in the range of 60-105, 60-100, 60-95, 60-90, 65-110, 65-105, 65-100, 65-95, or 65-90 amino acids); or (xii) the region between the RuvC-II and RuvC-III subdomains has a length in the range of 65-95 amino acids.
In some cases, a CasY protein (of the subject compositions and/or methods) comprises a first amino acid sequence having 75% or greater sequence identity (e.g., 80% or greater, 85% or greater, 90% or greater, 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100% sequence identity) to the N-terminal domain of any one of the CasY protein sequences set forth in SEQ ID NOs: 1-8 (e.g., the domain depicted as amino acids 1-812 of CasY1 in FIG. 3A); and a second amino acid sequence, C-terminal to the first amino acid sequence, comprising three partial RuvC domains (RuvC-I, RuvC-II, and RuvC-III), wherein: (i) the ratio of the length of the region between the RuvC-II and RuvC-III subdomains to the length of the RuvC-III subdomain is 1.1 or greater (e.g., 1.2); (ii) the ratio of the length of the region between the RuvC-II and RuvC-III subdomains to the length of the RuvC-III subdomain is greater than 1; (iii) the ratio of the length of the region between the RuvC-II and RuvC-III subdomains to the length of the RuvC-III subdomain is greater than 1 and between 1 and 1.5 (e.g., between 1 and 1.4, between 1 and 1.3, or between 1 and 1.2); (iv) the ratio of the length of the RuvC-II subdomain to the length of the RuvC-III subdomain is 2 or less (e.g., 1.8 or less, 1.7 or less, 1.6 or less, 1.5 or less, or 1.4 or less); (v) the ratio of the length of the RuvC-II subdomain to the length of the RuvC-III subdomain is 1.5 or less (e.g., 1.4 or less); (vi) the ratio of the length of the RuvC-II subdomain to the length of the RuvC-III subdomain is in the range of 1 to 2 (e.g., 1.1 to 2, 1.2 to 2, 1 to 1.8, 1.1 to 1.8, 1.2 to 1.8, 1 to 1.6, 1.1 to 1.6, 1.2 to 1.6, 1 to 1.4, 1.1 to 1.4, or 1.2 to 1.4); (vii) the ratio of the length of the region between the RuvC-II and RuvC-III subdomains to the length of the RuvC-III subdomain is greater than 1; (viii) the ratio of the length of the region between the RuvC-II and RuvC-III subdomains to the length of the RuvC-III subdomain is greater than 1 and between 1 and 1.5 (e.g., between 1 and 1.2); (ix) the region between the RuvC-II and RuvC-III subdomains is at least 60 amino acids in length (e.g., at least 65 or at least 70 amino acids in length); (x) the region between the RuvC-II and RuvC-III subdomains is at least 65 amino acids in length; (xi) the region between the RuvC-II and RuvC-III subdomains has a length in the range of 60-110 amino acids (e.g., in the range of 60-105, 60-100, 60-95, 60-90, 65-110, 65-105, 65-100, 65-95, or 65-90 amino acids); or (xii) the region between the RuvC-II and RuvC-III subdomains has a length in the range of 65-95 amino acids.
In some cases, a CasY protein (of the subject compositions and/or methods) comprises a first amino acid sequence having 85% or more sequence identity (e.g., 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) to the N-terminal domain of any one of the CasY protein sequences set forth in SEQ ID NOs: 1-8 (e.g., the domain depicted as amino acids 1-812 of CasY1 in FIG. 3A); and a second amino acid sequence, C-terminal to the first amino acid sequence, comprising three partial RuvC domains (RuvC-I, RuvC-II, and RuvC-III), wherein: (i) the ratio of the length of the region between the RuvC-II and RuvC-III subdomains to the length of the RuvC-III subdomain is 1.1 or greater (e.g., 1.2); (ii) the ratio of the length of the region between the RuvC-II and RuvC-III subdomains to the length of the RuvC-III subdomain is greater than 1; (iii) the ratio of the length of the region between the RuvC-II and RuvC-III subdomains to the length of the RuvC-III subdomain is greater than 1 and between 1 and 1.5 (e.g., between 1 and 1.4, between 1 and 1.3, or between 1 and 1.2); (iv) the ratio of the length of the RuvC-II subdomain to the length of the RuvC-III subdomain is 2 or less (e.g., 1.8 or less, 1.7 or less, 1.6 or less, 1.5 or less, or 1.4 or less); (v) the ratio of the length of the RuvC-II subdomain to the length of the RuvC-III subdomain is 1.5 or less (e.g., 1.4 or less); (vi) the ratio of the length of the RuvC-II subdomain to the length of the RuvC-III subdomain is in the range of 1 to 2 (e.g., 1.1 to 2, 1.2 to 2, 1 to 1.8, 1.1 to 1.8, 1.2 to 1.8, 1 to 1.6, 1.1 to 1.6, 1.2 to 1.6, 1 to 1.4, 1.1 to 1.4, or 1.2 to 1.4); (vii) the ratio of the length of the region between the RuvC-II and RuvC-III subdomains to the length of the RuvC-III subdomain is greater than 1; (viii) the ratio of the length of the region between the RuvC-II and RuvC-III subdomains to the length of the RuvC-III subdomain is greater than 1 and between 1 and 1.5 (e.g., between 1 and 1.2); (ix) the region between the RuvC-II and RuvC-III subdomains is at least 60 amino acids in length (e.g., at least 65 or at least 70 amino acids in length); (x) the region between the RuvC-II and RuvC-III subdomains is at least 65 amino acids in length; (xi) the region between the RuvC-II and RuvC-III subdomains has a length in the range of 60-110 amino acids (e.g., in the range of 60-105, 60-100, 60-95, 60-90, 65-110, 65-105, 65-100, 65-95, or 65-90 amino acids); or (xii) the region between the RuvC-II and RuvC-III subdomains has a length in the range of 65-95 amino acids.
In some cases, the CasY protein (of the subject compositions and/or methods) comprises a first amino acid sequence having an N-terminal domain (e.g., excluding any fused heterologous sequence, such as an NLS and/or a catalytically active domain) that has a length in the range of 750 to 1050 amino acids (e.g., 750 to 1025, 750 to 1000, 750 to 950, 775 to 1050, 775 to 1025, 775 to 1000, 775 to 950, 800 to 1050, 800 to 1025, 800 to 1000, or 800 to 950 amino acids); and a second amino acid sequence (C-terminal to the first amino acid sequence) having a split RuvC domain (with three partial RuvC domains: RuvC-I, RuvC-II, and RuvC-III), wherein: (i) the ratio of the length of the region between the RuvC-II and RuvC-III subdomains to the length of the RuvC-III subdomain is 1.1 or greater (e.g., 1.2); (ii) the ratio of the length of the region between the RuvC-II and RuvC-III subdomains to the length of the RuvC-III subdomain is greater than 1; (iii) the ratio of the length of the region between the RuvC-II and RuvC-III subdomains to the length of the RuvC-III subdomain is greater than 1 and between 1 and 1.5 (e.g., between 1 and 1.4, between 1 and 1.3, or between 1 and 1.2); (iv) the ratio of the length of the RuvC-II subdomain to the length of the RuvC-III subdomain is 2 or less (e.g., 1.8 or less, 1.7 or less, 1.6 or less, 1.5 or less, or 1.4 or less); (v) the ratio of the length of the RuvC-II subdomain to the length of the RuvC-III subdomain is 1.5 or less (e.g., 1.4 or less); (vi) the ratio of the length of the RuvC-II subdomain to the length of the RuvC-III subdomain is in the range of 1 to 2 (e.g., 1.1 to 2, 1.2 to 2, 1 to 1.8, 1.1 to 1.8, 1.2 to 1.8, 1 to 1.6, 1.1 to 1.6, 1.2 to 1.6, 1 to 1.4, 1.1 to 1.4, or 1.2 to 1.4); (vii) the ratio of the length of the region between the RuvC-II and RuvC-III subdomains to the length of the RuvC-III subdomain is greater than 1; (viii) the ratio of the length of the region between the RuvC-II and RuvC-III subdomains to the length of the RuvC-III subdomain is greater than 1 and between 1 and 1.5 (e.g., between 1 and 1.2); (ix) the region between the RuvC-II and RuvC-III subdomains is at least 60 amino acids in length (e.g., at least 65 or at least 70 amino acids in length); (x) the region between the RuvC-II and RuvC-III subdomains is at least 65 amino acids in length; (xi) the region between the RuvC-II and RuvC-III subdomains has a length in the range of 60-110 amino acids (e.g., in the range of 60-105, 60-100, 60-95, 60-90, 65-110, 65-105, 65-100, 65-95, or 65-90 amino acids); or (xii) the region between the RuvC-II and RuvC-III subdomains has a length in the range of 65-95 amino acids.
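The length criteria recited above can be sketched as a small check. This is only an illustration under stated assumptions: the `RuvCLengths` container and the `satisfied_criteria` function are hypothetical names, subdomain lengths are assumed to be already known (in amino acids), and range endpoints are treated as inclusive, which the text does not specify. Items (vii) and (viii) are worded the same as (ii) and (iii) and are mapped to the same tests.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuvCLengths:
    """Hypothetical container for subdomain lengths, in amino acids."""
    ruvc_ii: int
    ruvc_iii: int
    linker: int  # region between the RuvC-II and RuvC-III subdomains


def satisfied_criteria(lengths: RuvCLengths) -> dict[str, bool]:
    """Evaluate criteria (i)-(xii); endpoints are assumed inclusive."""
    linker_ratio = lengths.linker / lengths.ruvc_iii
    ii_ratio = lengths.ruvc_ii / lengths.ruvc_iii
    results = {
        "i": linker_ratio >= 1.1,
        "ii": linker_ratio > 1,
        "iii": 1 < linker_ratio <= 1.5,
        "iv": ii_ratio <= 2,
        "v": ii_ratio <= 1.5,
        "vi": 1 <= ii_ratio <= 2,
        "ix": lengths.linker >= 60,
        "x": lengths.linker >= 65,
        "xi": 60 <= lengths.linker <= 110,
        "xii": 65 <= lengths.linker <= 95,
    }
    # (vii) and (viii) repeat the wording of (ii) and (iii).
    results["vii"] = results["ii"]
    results["viii"] = results["iii"]
    return results


# Made-up lengths chosen for illustration only, not taken from any sequence.
example = RuvCLengths(ruvc_ii=80, ruvc_iii=60, linker=72)
print(any(satisfied_criteria(example).values()))
```

Because the criteria are joined by "or", a candidate qualifies when any single entry is true, which is why the usage line reduces the dictionary with `any`.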
In some cases, the CasY protein (of the subject compositions and/or methods) comprises an amino acid sequence having 20% or greater sequence identity (e.g., 30% or greater, 40% or greater, 50% or greater, 60% or greater, 70% or greater, 80% or greater, 85% or greater, 90% or greater, 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100% sequence identity) to the C-terminal domain of the CasY protein sequence set forth in SEQ ID NO: 1 (e.g., the domain depicted as amino acids 812-1125 of CasY1 in FIG. 3A). For example, in some cases, the CasY protein comprises an amino acid sequence having 50% or greater sequence identity (e.g., 60% or greater, 70% or greater, 80% or greater, 85% or greater, 90% or greater, 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100% sequence identity) to the C-terminal domain of the CasY protein sequence set forth in SEQ ID NO: 1 (e.g., the domain depicted as amino acids 812-1125 of CasY1 in FIG. 3A). In some cases, the CasY protein comprises an amino acid sequence having 80% or more sequence identity (e.g., 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) to the C-terminal domain of the CasY protein sequence set forth in SEQ ID NO: 1 (e.g., the domain depicted as amino acids 812-1125 of CasY1 in FIG. 3A). In some cases, the CasY protein comprises an amino acid sequence having 90% or more sequence identity (e.g., 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) to the C-terminal domain of the CasY protein sequence set forth in SEQ ID NO: 1 (e.g., the domain depicted as amino acids 812-1125 of CasY1 in FIG. 3A). In some cases, the CasY protein comprises an amino acid sequence having amino acids 812-1125 of the CasY protein sequence set forth in SEQ ID NO: 1.
In some cases, the CasY protein (of the subject compositions and/or methods) comprises an amino acid sequence having 20% or more sequence identity (e.g., 30% or more, 40% or more, 50% or more, 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) to the C-terminal domain of any one of the CasY protein sequences set forth in SEQ ID NOs: 1-4 (e.g., the domain depicted as amino acids 812-1125 of CasY1 in FIG. 3A). For example, in some cases, the CasY protein comprises an amino acid sequence having 50% or more sequence identity (e.g., 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) to the C-terminal domain of any one of the CasY protein sequences set forth in SEQ ID NOs: 1-4 (e.g., the domain depicted as amino acids 812-1125 of CasY1 in FIG. 3A). In some cases, the CasY protein comprises an amino acid sequence having 80% or more sequence identity (e.g., 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) to the C-terminal domain of any one of the CasY protein sequences set forth in SEQ ID NOs: 1-4 (e.g., the domain depicted as amino acids 812-1125 of CasY1 in FIG. 3A). In some cases, the CasY protein comprises an amino acid sequence having 90% or more sequence identity (e.g., 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) to the C-terminal domain of any one of the CasY protein sequences set forth in SEQ ID NOs: 1-4 (e.g., the domain depicted as amino acids 812-1125 of CasY1 in FIG. 3A). In some cases, the CasY protein comprises a fragment of any one of the CasY protein sequences set forth in SEQ ID NOs: 1-4 that corresponds to amino acids 812-1125 of the CasY protein sequence set forth in SEQ ID NO: 1.
In some cases, the CasY protein (of the subject compositions and/or methods) comprises an amino acid sequence having 20% or more sequence identity (e.g., 30% or more, 40% or more, 50% or more, 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) to the C-terminal domain of any one of the CasY protein sequences set forth in SEQ ID NOs: 1-5 (e.g., the domain depicted as amino acids 812-1125 of CasY1 in FIG. 3A). For example, in some cases, the CasY protein comprises an amino acid sequence having 50% or greater sequence identity (e.g., 60% or greater, 70% or greater, 80% or greater, 85% or greater, 90% or greater, 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100% sequence identity) to the C-terminal domain of any one of the CasY protein sequences set forth in SEQ ID NOs: 1-5 (e.g., the domain depicted as amino acids 812-1125 of CasY1 in FIG. 3A). In some cases, the CasY protein comprises an amino acid sequence having 80% or more sequence identity (e.g., 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) to the C-terminal domain of any one of the CasY protein sequences set forth in SEQ ID NOs: 1-5 (e.g., the domain depicted as amino acids 812-1125 of CasY1 in FIG. 3A). In some cases, the CasY protein comprises an amino acid sequence having 90% or more sequence identity (e.g., 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) to the C-terminal domain of any one of the CasY protein sequences set forth in SEQ ID NOs: 1-5 (e.g., the domain depicted as amino acids 812-1125 of CasY1 in FIG. 3A). In some cases, the CasY protein comprises a fragment of any one of the CasY protein sequences set forth in SEQ ID NOs: 1-5 that corresponds to amino acids 812-1125 of the CasY protein sequence set forth in SEQ ID NO: 1.
In some cases, the CasY protein (of the subject compositions and/or methods) comprises an amino acid sequence having 20% or more sequence identity (e.g., 30% or more, 40% or more, 50% or more, 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) to the C-terminal domain of any one of the CasY protein sequences set forth in SEQ ID NOs: 1-7 (e.g., the domain depicted as amino acids 812-1125 of CasY1 in FIG. 3A). For example, in some cases, the CasY protein comprises an amino acid sequence having 50% or more sequence identity (e.g., 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) to the C-terminal domain of any one of the CasY protein sequences set forth in SEQ ID NOs: 1-7 (e.g., the domain depicted as amino acids 812-1125 of CasY1 in FIG. 3A). In some cases, the CasY protein comprises an amino acid sequence having 80% or more sequence identity (e.g., 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) to the C-terminal domain of any one of the CasY protein sequences set forth in SEQ ID NOs: 1-7 (e.g., the domain depicted as amino acids 812-1125 of CasY1 in FIG. 3A). In some cases, the CasY protein comprises an amino acid sequence having 90% or more sequence identity (e.g., 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) to the C-terminal domain of any one of the CasY protein sequences set forth in SEQ ID NOs: 1-7 (e.g., the domain depicted as amino acids 812-1125 of CasY1 in FIG. 3A). In some cases, the CasY protein comprises a fragment of any one of the CasY protein sequences set forth in SEQ ID NOs: 1-7 that corresponds to amino acids 812-1125 of the CasY protein sequence set forth in SEQ ID NO: 1.
In some cases, the CasY protein (of the subject compositions and/or methods) comprises an amino acid sequence having 20% or more sequence identity (e.g., 30% or more, 40% or more, 50% or more, 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) to the C-terminal domain of any one of the CasY protein sequences set forth in SEQ ID NOs: 1-8 (e.g., the domain depicted as amino acids 812-1125 of CasY1 in FIG. 3A). For example, in some cases, the CasY protein comprises an amino acid sequence having 50% or greater sequence identity (e.g., 60% or greater, 70% or greater, 80% or greater, 85% or greater, 90% or greater, 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100% sequence identity) to the C-terminal domain of any one of the CasY protein sequences set forth in SEQ ID NOs: 1-8 (e.g., the domain depicted as amino acids 812-1125 of CasY1 in FIG. 3A). In some cases, the CasY protein comprises an amino acid sequence having 80% or more sequence identity (e.g., 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) to the C-terminal domain of any one of the CasY protein sequences set forth in SEQ ID NOs: 1-8 (e.g., the domain depicted as amino acids 812-1125 of CasY1 in FIG. 3A). In some cases, the CasY protein comprises an amino acid sequence having 90% or more sequence identity (e.g., 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) to the C-terminal domain of any one of the CasY protein sequences set forth in SEQ ID NOs: 1-8 (e.g., the domain depicted as amino acids 812-1125 of CasY1 in FIG. 3A). In some cases, the CasY protein comprises a fragment of any one of the CasY protein sequences set forth in SEQ ID NOs: 1-8 that corresponds to amino acids 812-1125 of the CasY protein sequence set forth in SEQ ID NO: 1.
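The percent-identity thresholds recited above can be illustrated with a minimal sketch. It assumes the two sequences have already been aligned (gaps written as `-`) and takes percent identity as identical residues divided by aligned columns; other denominators are in common use, and the text does not say which convention applies. The function names and the toy sequences are hypothetical and are not taken from the sequence listing.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the columns of a pairwise alignment.

    Assumption: denominator is the number of aligned columns,
    ignoring columns where both sequences carry a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b) for a, b in zip(aligned_a, aligned_b)
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("alignment has no informative columns")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def thresholds_met(identity: float,
                   thresholds: tuple[int, ...] = (20, 50, 80, 90)) -> list[int]:
    """Return the recited lower bounds (20%, 50%, 80%, 90%) that are met."""
    return [t for t in thresholds if identity >= t]


# Toy aligned fragments for illustration only.
identity = percent_identity("MKTA-IAK", "MKSAYIAR")
print(identity, thresholds_met(identity))
```

In practice the alignment itself would come from an established alignment tool; this sketch covers only the arithmetic applied afterwards.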
In some cases, the CasY protein (of the subject compositions and/or methods) comprises a first amino acid sequence (an N-terminal domain) (e.g., not including any fused heterologous sequence, such as an NLS and/or a catalytically active domain) having a length in the range of 750 to 1050 amino acids (e.g., 750 to 1025, 750 to 1000, 750 to 950, 775 to 1050, 775 to 1025, 775 to 1000, 775 to 950, 800 to 1050, 800 to 1025, 800 to 1000, or 800 to 950 amino acids); and a second amino acid sequence, C-terminal to the first amino acid sequence, having 20% or more sequence identity (e.g., 30% or more, 40% or more, 50% or more, 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) to the C-terminal domain of any one of the CasY protein sequences set forth in SEQ ID NOs: 1-4 (e.g., the domain depicted as amino acids 812-1125 of CasY1 in FIG. 3A). For example, in some cases, a CasY protein comprises a first amino acid sequence (an N-terminal domain) (e.g., not including any fused heterologous sequence, such as an NLS and/or a catalytically active domain) having a length in the range of 750 to 1050 amino acids (e.g., 750 to 1025, 750 to 1000, 750 to 950, 775 to 1050, 775 to 1025, 775 to 1000, 775 to 950, 800 to 1050, 800 to 1025, 800 to 1000, or 800 to 950 amino acids); and a second amino acid sequence, C-terminal to the first amino acid sequence, having 50% or more sequence identity (e.g., 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) to the C-terminal domain of any one of the CasY protein sequences set forth in SEQ ID NOs: 1-4 (e.g., the domain depicted as amino acids 812-1125 of CasY1 in FIG. 3A). 
In some cases, a CasY protein comprises a first amino acid sequence (N-terminal domain) (e.g., excluding any fused heterologous sequence, such as NLS and/or catalytically active domain) having a length in the range of 750 to 1050 amino acids (e.g., 750 to 1025, 750 to 1000, 750 to 950, 775 to 1050, 775 to 1025, 775 to 1000, 775 to 950, 800 to 1050, 800 to 1025, 800 to 1000, or 800 to 950 amino acids); and a second amino acid sequence located C-terminally of the first amino acid sequence, which has 80% or more sequence identity (e.g., 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more or 100% sequence identity) to the C-terminal domain of any one of the CasY protein sequences shown in SEQ ID NOS: 1-4 (e.g., the domain depicted as amino acid 812-1125 of CasY1 in FIG. 3A). In some cases, a CasY protein comprises a first amino acid sequence (N-terminal domain) (e.g., excluding any fused heterologous sequence, such as NLS and/or catalytically active domain) having a length in the range of 750 to 1050 amino acids (e.g., 750 to 1025, 750 to 1000, 750 to 950, 775 to 1050, 775 to 1025, 775 to 1000, 775 to 950, 800 to 1050, 800 to 1025, 800 to 1000, or 800 to 950 amino acids); and a second amino acid sequence located C-terminally of the first amino acid sequence, which has a sequence identity of 90% or more (e.g., 95% or more, 97% or more, 98% or more, 99% or more or 100% sequence identity) to the C-terminal domain of any one of the CasY protein sequences shown in SEQ ID NOS: 1-4 (e.g., the domain depicted as amino acids 812-1125 of CasY1 in FIG. 3A). 
In some cases, a CasY protein comprises a first amino acid sequence (N-terminal domain) (e.g., excluding any fused heterologous sequence, such as NLS and/or catalytically active domain) having a length in the range of 750 to 1050 amino acids (e.g., 750 to 1025, 750 to 1000, 750 to 950, 775 to 1050, 775 to 1025, 775 to 1000, 775 to 950, 800 to 1050, 800 to 1025, 800 to 1000, or 800 to 950 amino acids); and a second amino acid sequence, located C-terminal to the first amino acid sequence, which comprises a fragment of the amino acid sequence of any one of the CasY protein sequences set forth in SEQ ID NOS: 1-4 that corresponds to amino acids 812-1125 of the CasY protein sequence set forth in SEQ ID NO: 1.
In some cases, the CasY protein (of the subject compositions and/or methods) comprises a first amino acid sequence (N-terminal domain) (e.g., excluding any fused heterologous sequences, such as NLS and/or catalytically active domains) having a length in the range of 750 to 1050 amino acids (e.g., 750 to 1025, 750 to 1000, 750 to 950, 775 to 1050, 775 to 1025, 775 to 1000, 775 to 950, 800 to 1050, 800 to 1025, 800 to 1000, or 800 to 950 amino acids); and a second amino acid sequence, located C-terminal to the first amino acid sequence, which has 20% or more sequence identity (e.g., 30% or more, 40% or more, 50% or more, 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) to the C-terminal domain of any one of the CasY protein sequences set forth in SEQ ID NOS: 1-5 (e.g., the domain depicted as amino acids 812-1125 of CasY1 in FIG. 3A). For example, in some cases, a CasY protein comprises a first amino acid sequence (N-terminal domain) (e.g., excluding any fused heterologous sequences, such as NLS and/or catalytically active domains) having a length in the range of 750 to 1050 amino acids (e.g., 750 to 1025, 750 to 1000, 750 to 950, 775 to 1050, 775 to 1025, 775 to 1000, 775 to 950, 800 to 1050, 800 to 1025, 800 to 1000, or 800 to 950 amino acids); and a second amino acid sequence, located C-terminal to the first amino acid sequence, which has 50% or more sequence identity (e.g., 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) to the C-terminal domain of any one of the CasY protein sequences set forth in SEQ ID NOS: 1-5 (e.g., the domain depicted as amino acids 812-1125 of CasY1 in FIG. 3A).
In some cases, a CasY protein comprises a first amino acid sequence (N-terminal domain) (e.g., excluding any fused heterologous sequence, such as NLS and/or catalytically active domain) having a length in the range of 750 to 1050 amino acids (e.g., 750 to 1025, 750 to 1000, 750 to 950, 775 to 1050, 775 to 1025, 775 to 1000, 775 to 950, 800 to 1050, 800 to 1025, 800 to 1000, or 800 to 950 amino acids); and a second amino acid sequence located C-terminally of the first amino acid sequence, which has 80% or more sequence identity (e.g., 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more or 100% sequence identity) to the C-terminal domain of any one of the CasY protein sequences shown in SEQ ID NOS: 1-5 (e.g., the domain depicted as amino acid 812-1125 of CasY1 in FIG. 3A). In some cases, a CasY protein comprises a first amino acid sequence (N-terminal domain) (e.g., excluding any fused heterologous sequence, such as NLS and/or catalytically active domain) having a length in the range of 750 to 1050 amino acids (e.g., 750 to 1025, 750 to 1000, 750 to 950, 775 to 1050, 775 to 1025, 775 to 1000, 775 to 950, 800 to 1050, 800 to 1025, 800 to 1000, or 800 to 950 amino acids); and a second amino acid sequence located C-terminally of the first amino acid sequence, which has a sequence identity of 90% or more (e.g., 95% or more, 97% or more, 98% or more, 99% or more or 100% sequence identity) to the C-terminal domain of any one of the CasY protein sequences shown in SEQ ID NOS: 1-5 (e.g., the domain depicted as amino acids 812-1125 of CasY1 in FIG. 3A). 
In some cases, a CasY protein comprises a first amino acid sequence (N-terminal domain) (e.g., excluding any fused heterologous sequence, such as NLS and/or catalytically active domain) having a length in the range of 750 to 1050 amino acids (e.g., 750 to 1025, 750 to 1000, 750 to 950, 775 to 1050, 775 to 1025, 775 to 1000, 775 to 950, 800 to 1050, 800 to 1025, 800 to 1000, or 800 to 950 amino acids); and a second amino acid sequence, located C-terminal to the first amino acid sequence, which comprises a fragment of the amino acid sequence of any one of the CasY protein sequences set forth in SEQ ID NOS: 1-5 that corresponds to amino acids 812-1125 of the CasY protein sequence set forth in SEQ ID NO: 1.
In some cases, the CasY protein (of the subject compositions and/or methods) comprises a first amino acid sequence (N-terminal domain) (e.g., excluding any fused heterologous sequences, such as NLS and/or catalytically active domains) having a length in the range of 750 to 1050 amino acids (e.g., 750 to 1025, 750 to 1000, 750 to 950, 775 to 1050, 775 to 1025, 775 to 1000, 775 to 950, 800 to 1050, 800 to 1025, 800 to 1000, or 800 to 950 amino acids); and a second amino acid sequence, located C-terminal to the first amino acid sequence, which has 20% or more sequence identity (e.g., 30% or more, 40% or more, 50% or more, 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) to the C-terminal domain of any one of the CasY protein sequences set forth in SEQ ID NOS: 1-7 (e.g., the domain depicted as amino acids 812-1125 of CasY1 in FIG. 3A). For example, in some cases, a CasY protein comprises a first amino acid sequence (N-terminal domain) (e.g., excluding any fused heterologous sequences, such as NLS and/or catalytically active domains) having a length in the range of 750 to 1050 amino acids (e.g., 750 to 1025, 750 to 1000, 750 to 950, 775 to 1050, 775 to 1025, 775 to 1000, 775 to 950, 800 to 1050, 800 to 1025, 800 to 1000, or 800 to 950 amino acids); and a second amino acid sequence, located C-terminal to the first amino acid sequence, which has 50% or more sequence identity (e.g., 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) to the C-terminal domain of any one of the CasY protein sequences set forth in SEQ ID NOS: 1-7 (e.g., the domain depicted as amino acids 812-1125 of CasY1 in FIG. 3A).
In some cases, a CasY protein comprises a first amino acid sequence (N-terminal domain) (e.g., excluding any fused heterologous sequence, such as NLS and/or catalytically active domain) having a length in the range of 750 to 1050 amino acids (e.g., 750 to 1025, 750 to 1000, 750 to 950, 775 to 1050, 775 to 1025, 775 to 1000, 775 to 950, 800 to 1050, 800 to 1025, 800 to 1000, or 800 to 950 amino acids); and a second amino acid sequence, located C-terminal to the first amino acid sequence, which has 80% or more sequence identity (e.g., 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) to the C-terminal domain of any one of the CasY protein sequences set forth in SEQ ID NOS: 1-7 (e.g., the domain depicted as amino acids 812-1125 of CasY1 in FIG. 3A). In some cases, a CasY protein comprises a first amino acid sequence (N-terminal domain) (e.g., excluding any fused heterologous sequence, such as NLS and/or catalytically active domain) having a length in the range of 750 to 1050 amino acids (e.g., 750 to 1025, 750 to 1000, 750 to 950, 775 to 1050, 775 to 1025, 775 to 1000, 775 to 950, 800 to 1050, 800 to 1025, 800 to 1000, or 800 to 950 amino acids); and a second amino acid sequence, located C-terminal to the first amino acid sequence, which has 90% or more sequence identity (e.g., 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) to the C-terminal domain of any one of the CasY protein sequences set forth in SEQ ID NOS: 1-7 (e.g., the domain depicted as amino acids 812-1125 of CasY1 in FIG. 3A).
In some cases, a CasY protein comprises a first amino acid sequence (N-terminal domain) (e.g., excluding any fused heterologous sequence, such as NLS and/or catalytically active domain) having a length in the range of 750 to 1050 amino acids (e.g., 750 to 1025, 750 to 1000, 750 to 950, 775 to 1050, 775 to 1025, 775 to 1000, 775 to 950, 800 to 1050, 800 to 1025, 800 to 1000, or 800 to 950 amino acids); and a second amino acid sequence, located C-terminal to the first amino acid sequence, which comprises a fragment of the amino acid sequence of any one of the CasY protein sequences set forth in SEQ ID NOS: 1-7 that corresponds to amino acids 812-1125 of the CasY protein sequence set forth in SEQ ID NO: 1.
In some cases, the CasY protein (of the subject compositions and/or methods) comprises a first amino acid sequence (N-terminal domain) (e.g., excluding any fused heterologous sequences, such as NLS and/or catalytically active domains) having a length in the range of 750 to 1050 amino acids (e.g., 750 to 1025, 750 to 1000, 750 to 950, 775 to 1050, 775 to 1025, 775 to 1000, 775 to 950, 800 to 1050, 800 to 1025, 800 to 1000, or 800 to 950 amino acids); and a second amino acid sequence, located C-terminal to the first amino acid sequence, which has 20% or more sequence identity (e.g., 30% or more, 40% or more, 50% or more, 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) to the C-terminal domain of any one of the CasY protein sequences set forth in SEQ ID NOS: 1-8 (e.g., the domain depicted as amino acids 812-1125 of CasY1 in FIG. 3A). For example, in some cases, a CasY protein comprises a first amino acid sequence (N-terminal domain) (e.g., excluding any fused heterologous sequences, such as NLS and/or catalytically active domains) having a length in the range of 750 to 1050 amino acids (e.g., 750 to 1025, 750 to 1000, 750 to 950, 775 to 1050, 775 to 1025, 775 to 1000, 775 to 950, 800 to 1050, 800 to 1025, 800 to 1000, or 800 to 950 amino acids); and a second amino acid sequence, located C-terminal to the first amino acid sequence, which has 50% or more sequence identity (e.g., 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) to the C-terminal domain of any one of the CasY protein sequences set forth in SEQ ID NOS: 1-8 (e.g., the domain depicted as amino acids 812-1125 of CasY1 in FIG. 3A).
In some cases, a CasY protein comprises a first amino acid sequence (N-terminal domain) (e.g., excluding any fused heterologous sequence, such as NLS and/or catalytically active domain) having a length in the range of 750 to 1050 amino acids (e.g., 750 to 1025, 750 to 1000, 750 to 950, 775 to 1050, 775 to 1025, 775 to 1000, 775 to 950, 800 to 1050, 800 to 1025, 800 to 1000, or 800 to 950 amino acids); and a second amino acid sequence, located C-terminal to the first amino acid sequence, which has 80% or more sequence identity (e.g., 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) to the C-terminal domain of any one of the CasY protein sequences set forth in SEQ ID NOS: 1-8 (e.g., the domain depicted as amino acids 812-1125 of CasY1 in FIG. 3A). In some cases, a CasY protein comprises a first amino acid sequence (N-terminal domain) (e.g., excluding any fused heterologous sequence, such as NLS and/or catalytically active domain) having a length in the range of 750 to 1050 amino acids (e.g., 750 to 1025, 750 to 1000, 750 to 950, 775 to 1050, 775 to 1025, 775 to 1000, 775 to 950, 800 to 1050, 800 to 1025, 800 to 1000, or 800 to 950 amino acids); and a second amino acid sequence, located C-terminal to the first amino acid sequence, which has 90% or more sequence identity (e.g., 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) to the C-terminal domain of any one of the CasY protein sequences set forth in SEQ ID NOS: 1-8 (e.g., the domain depicted as amino acids 812-1125 of CasY1 in FIG. 3A).
In some cases, a CasY protein comprises a first amino acid sequence (N-terminal domain) (e.g., excluding any fused heterologous sequence, such as NLS and/or catalytically active domain) having a length in the range of 750 to 1050 amino acids (e.g., 750 to 1025, 750 to 1000, 750 to 950, 775 to 1050, 775 to 1025, 775 to 1000, 775 to 950, 800 to 1050, 800 to 1025, 800 to 1000, or 800 to 950 amino acids); and a second amino acid sequence, located C-terminal to the first amino acid sequence, which comprises a fragment of the amino acid sequence of any one of the CasY protein sequences set forth in SEQ ID NOS: 1-8 that corresponds to amino acids 812-1125 of the CasY protein sequence set forth in SEQ ID NO: 1.
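The two-part criterion described above (an N-terminal domain within a length range, followed by a C-terminal domain meeting a percent-identity threshold) can be illustrated with a minimal Python sketch. The function names, the toy aligned sequences, and the choice of identity denominator (all aligned columns, gaps included) are assumptions made for illustration only; the disclosure does not prescribe a particular alignment tool or identity formula, and the sequences must already be aligned by some external method.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the columns of two PRE-ALIGNED sequences.

    Assumption: identity = identical columns / total aligned columns,
    with '-' marking a gap. Other conventions exist.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Skip columns where both sequences carry a gap.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def fits_domain_criteria(n_terminal_length: int,
                         c_terminal_identity: float,
                         min_identity: float = 20.0,
                         length_range: tuple = (750, 1050)) -> bool:
    """True if the N-terminal length lies in the range (inclusive) and the
    C-terminal domain meets the identity threshold."""
    low, high = length_range
    return (low <= n_terminal_length <= high
            and c_terminal_identity >= min_identity)


# Toy aligned fragments (hypothetical, not taken from SEQ ID NOS: 1-8).
identity = percent_identity("MKT-LLDA", "MKTALLEA")
print(identity)                                      # 75.0
print(fits_domain_criteria(811, identity, 50.0))     # True
print(fits_domain_criteria(700, 95.0))               # False: N-terminal too short
```

Tightening `min_identity` (for example to 50, 80, or 90) or narrowing `length_range` (for example to 800 to 950) corresponds to the narrower alternatives recited in the paragraphs above.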
CasY variants
A variant CasY protein has an amino acid sequence that differs by at least one amino acid (e.g., has a deletion, insertion, substitution, fusion) when compared to the amino acid sequence of the corresponding wild-type CasY protein. A CasY protein that cleaves one strand of a double-stranded target nucleic acid but does not cleave the other strand is referred to herein as a "nickase" (e.g., "nickase CasY"). A CasY protein having substantially no nuclease activity is referred to herein as a dead CasY protein ("dCasY") (it is noted that nuclease activity can be provided by a heterologous polypeptide (fusion partner) in the context of a chimeric CasY protein, as described in more detail below). For any of the CasY variant proteins described herein (e.g., nickase CasY, dCasY, chimeric CasY), a CasY variant can include a CasY protein sequence having the same parameters (e.g., domains present, percent identity, etc.) as described above.
Variants: catalytic activity
In some cases, a CasY protein is a variant CasY protein, e.g., a protein that is mutated relative to a naturally-occurring catalytically active sequence and exhibits reduced cleavage activity (e.g., exhibits 90% or less, 80% or less, 70% or less, 60% or less, 50% or less, 40% or less, or 30% or less cleavage activity) when compared to a corresponding naturally-occurring sequence. In some cases, such variant CasY proteins are catalytic "dead" proteins (essentially without cleavage activity) and may be referred to as 'dCasY'. In some cases, the variant CasY protein is a nickase (cleaves only one strand of a double-stranded target nucleic acid (e.g., a double-stranded target DNA)). As described in more detail herein, in some cases, a CasY protein (in some cases, a CasY protein having wild-type cleavage activity and in some cases, a variant CasY having reduced cleavage activity, e.g., dCasY or nickase CasY) is fused (conjugated) to a heterologous polypeptide having an activity of interest (e.g., a catalytic activity of interest) to form a fusion protein (chimeric CasY protein).
The catalytic residues of CasY, when numbered according to CasY1 (SEQ ID NO: 1), include D828, E914, and D1074 (for SEQ ID NO: 1, these residues are underlined in FIG. 1) (see, e.g., the alignments of FIG. 2A and FIG. 2B).
Thus, in some cases, the CasY protein has reduced activity and one or more of the above amino acids (or one or more corresponding amino acids of any CasY protein) are mutated (e.g., substituted with an alanine). In some cases, the variant CasY protein is a catalytically 'dead' protein (is catalytically inactive) and is referred to as 'dCasY'. A dCasY protein can be fused to a fusion partner that provides an activity, and in some cases, the dCasY (e.g., one without a fusion partner that provides catalytic activity, but which can have an NLS when expressed in a eukaryotic cell) can bind to target DNA and can block RNA polymerase from transcribing the target DNA. In some cases, the variant CasY protein is a nickase (cleaves only one strand of a double-stranded target nucleic acid, e.g., a double-stranded target DNA).
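As a minimal sketch of the alanine substitutions described above, the following Python example replaces residues at 1-indexed positions after checking that the expected wild-type residue is present. The residue positions (D828, E914, D1074, CasY1 numbering) come from the text; everything else is an assumption for illustration: the helper names are hypothetical, and the sequence is a glycine-filled placeholder rather than SEQ ID NO: 1, which is not reproduced here. Mutating all three residues at once is likewise just one illustrative choice among the "one or more" options.

```python
# Catalytic residues from the text, numbered according to CasY1 (1-indexed).
CATALYTIC_RESIDUES = {828: "D", 914: "E", 1074: "D"}


def substitute_with_alanine(sequence: str, positions: dict) -> str:
    """Return a copy of `sequence` with alanine at each 1-indexed position.

    `positions` maps position -> expected wild-type residue; a mismatch
    raises an error so that a numbering slip is caught early.
    """
    residues = list(sequence)
    for pos, expected in positions.items():
        if not 1 <= pos <= len(residues):
            raise ValueError(f"position {pos} is outside the sequence")
        if residues[pos - 1] != expected:
            raise ValueError(
                f"expected {expected} at position {pos}, "
                f"found {residues[pos - 1]}")
        residues[pos - 1] = "A"
    return "".join(residues)


def toy_sequence(length: int = 1125) -> str:
    """Placeholder sequence (NOT SEQ ID NO: 1): glycine everywhere except
    the three catalytic positions."""
    residues = ["G"] * length
    for pos, aa in CATALYTIC_RESIDUES.items():
        residues[pos - 1] = aa
    return "".join(residues)


wild_type = toy_sequence()
mutant = substitute_with_alanine(wild_type, CATALYTIC_RESIDUES)
print(wild_type[827], "->", mutant[827])   # D -> A
```

For a CasY protein other than CasY1, the corresponding positions would first have to be found from an alignment such as the one referenced in the text; this sketch does not attempt that mapping.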
Variants: chimeric CasY (i.e., fusion proteins)
As noted above, in some cases, a CasY protein (in some cases, a CasY protein having wild-type cleavage activity and in some cases, a variant CasY having reduced cleavage activity, e.g., dCasY or nickase CasY) is fused (conjugated) to a heterologous polypeptide having an activity of interest (e.g., a catalytic activity of interest) to form a fusion protein (chimeric CasY protein). Heterologous polypeptides to which the CasY protein can be fused are referred to herein as "fusion partners".
In some cases, the fusion partner can modulate transcription (e.g., inhibit transcription, increase transcription) of the target DNA. For example, in some cases, a fusion partner is a protein (or domain from a protein) that inhibits transcription (e.g., a transcriptional repressor, a protein that acts through recruitment of transcriptional repressor proteins, modification of target DNA such as methylation, recruitment of DNA modifications, modulation of histones associated with target DNA, recruitment of histone modifications (such as those that modify acetylation and/or methylation of histones), and the like). In some cases, a fusion partner is a protein (or domain from a protein) that increases transcription (e.g., a transcriptional activator, a protein that acts through recruitment of transcriptional activator proteins, modification of target DNA such as methylation, recruitment of DNA modifications, modulation of histones associated with target DNA, recruitment of histone modifications (such as those that modify acetylation and/or methylation of histones), and the like).
In some cases, the chimeric CasY protein includes a heterologous polypeptide having an enzymatic activity that modifies a target nucleic acid (e.g., nuclease activity, methyltransferase activity, demethylase activity, DNA repair activity, DNA damage activity, deamination activity, dismutase activity, alkylation activity, depurination activity, oxidation activity, pyrimidine dimer formation activity, integrase activity, transposase activity, recombinase activity, polymerase activity, ligase activity, helicase activity, photolyase activity, or glycosylase activity).
In some cases, a chimeric CasY protein includes a heterologous polypeptide having an enzymatic activity (e.g., methyltransferase activity, demethylase activity, acetyltransferase activity, deacetylase activity, kinase activity, phosphatase activity, ubiquitin ligase activity, deubiquitinating activity, adenylation activity, deadenylation activity, sumoylating activity, desumoylating activity, ribosylation activity, myristoylation activity, or demyristoylation activity) that modifies a polypeptide (e.g., a histone) associated with a target nucleic acid.
Examples of proteins (or fragments thereof) that can be used to increase transcription include, but are not limited to: transcriptional activators such as VP16, VP64, VP48, VP160, the p65 subdomain (e.g., from NFkB), and the activation domain of EDLL and/or a TAL activation domain (e.g., for activity in plants); histone lysine methyltransferases such as SET1A, SET1B, MLL1 to 5, ASH1, SMYD2, NSD1, and the like; histone lysine demethylases such as JHDM2a/b, UTX, JMJD3, and the like; histone acetyltransferases such as GCN5, PCAF, CBP, p300, TAF1, TIP60/PLIP, MOZ/MYST3, MORF/MYST4, SRC1, ACTR, P160, CLOCK, and the like; and DNA demethylases such as ten-eleven translocation (TET) dioxygenase 1 (TET1CD), TET1, DME, DML1, DML2, ROS1, and the like.
Examples of proteins (or fragments thereof) that can be used to decrease transcription include, but are not limited to: transcriptional repressors such as the Krüppel-associated box (KRAB or SKD), the KOX1 repression domain, the Mad mSIN3 interaction domain (SID), the ERF repressor domain (ERD), the SRDX repression domain (e.g., for repression in plants), and the like; histone lysine methyltransferases such as Pr-SET7/8, SUV4-20H1, RIZ1, and the like; histone lysine demethylases such as JMJD2A/JHDM3A, JMJD2C/GASC1, JMJD2D, JARID1A/RBP2, JARID1B/PLU-1, JARID1C/SMCX, JARID1D/SMCY, and the like; histone lysine deacetylases such as HDAC1, HDAC2, HDAC3, HDAC8, HDAC4, HDAC5, HDAC7, HDAC9, SIRT1, SIRT2, HDAC11, and the like; DNA methylases such as HhaI DNA m5c-methyltransferase (M.HhaI), DNA methyltransferase 1 (DNMT1), DNA methyltransferase 3a (DNMT3a), DNA methyltransferase 3b (DNMT3b), METI, DRM3 (plants), ZMET2, CMT1, CMT2 (plants), and the like; and periphery recruitment elements such as Lamin A, Lamin B, and the like.
In some cases, the fusion partner has an enzymatic activity that modifies a target nucleic acid (e.g., ssRNA, dsRNA, ssDNA, dsDNA). Examples of enzymatic activities that can be provided by a fusion partner include, but are not limited to: nuclease activity, such as that provided by a restriction enzyme (e.g., FokI nuclease); methyltransferase activity, such as that provided by a methyltransferase (e.g., HhaI DNA m5c-methyltransferase (M.HhaI), DNA methyltransferase 1 (DNMT1), DNA methyltransferase 3a (DNMT3a), DNA methyltransferase 3b (DNMT3b), METI, DRM3 (plants), ZMET2, CMT1, CMT2 (plants), and the like); demethylase activity, such as that provided by a demethylase (e.g., ten-eleven translocation (TET) dioxygenase 1 (TET1CD), TET1, DME, DML1, DML2, ROS1, and the like); DNA repair activity; DNA damage activity; deamination activity, such as that provided by a deaminase (e.g., a cytosine deaminase, such as rat APOBEC1); dismutase activity; alkylation activity; depurination activity; oxidation activity; pyrimidine dimer forming activity; integrase activity, such as that provided by an integrase and/or resolvase (e.g., Gin invertase, such as the hyperactive mutant of Gin invertase, GinH106Y; human immunodeficiency virus type 1 integrase (IN); Tn3 resolvase; and the like); transposase activity; recombinase activity, such as that provided by a recombinase (e.g., the catalytic domain of Gin recombinase); polymerase activity; ligase activity; helicase activity; photolyase activity; and glycosylase activity.
In some cases, the fusion partner has an enzymatic activity that modifies a protein (e.g., a histone, an RNA-binding protein, a DNA-binding protein, etc.) associated with a target nucleic acid (e.g., ssRNA, dsRNA, ssDNA, dsDNA). Examples of enzymatic activities (that modify a protein associated with a target nucleic acid) that can be provided by a fusion partner include, but are not limited to: methyltransferase activity, such as that provided by a histone methyltransferase (HMT) (e.g., suppressor of variegation 3-9 homolog 1 (SUV39H1, also known as KMT1A), euchromatic histone lysine methyltransferase 2 (G9A, also known as KMT1C and EHMT2), SUV39H2, ESET/SETDB1, SET1A, SET1B, MLL1 to 5, ASH1, SMYD2, NSD1, DOT1L, Pr-SET7/8, SUV4-20H1, EZH2, RIZ1, and the like); demethylase activity, such as that provided by a histone demethylase (e.g., lysine demethylase 1A (KDM1A, also known as LSD1), JHDM2a/b, JMJD2A/JHDM3A, JMJD2C/GASC1, JMJD2D, JARID1A/RBP2, JARID1B/PLU-1, JARID1C/SMCX, JARID1D/SMCY, and the like); acetyltransferase activity, such as that provided by a histone acetyltransferase (e.g., the catalytic core/fragment of the human acetyltransferase p300, GCN5, PCAF, CBP, TAF1, TIP60/PLIP, MOZ/MYST3, MORF/MYST4, HBO1/MYST2, HMOF/MYST1, SRC1, ACTR, P160, CLOCK, and the like); deacetylase activity, such as that provided by a histone deacetylase (e.g., HDAC1, HDAC2, HDAC3, HDAC8, HDAC4, HDAC5, HDAC7, HDAC9, SIRT1, SIRT2, HDAC11, and the like); kinase activity; phosphatase activity; ubiquitin ligase activity; deubiquitinating activity; adenylation activity; deadenylation activity; sumoylating activity; desumoylating activity; ribosylation activity; deribosylation activity; myristoylation activity; and demyristoylation activity.
Further examples of suitable fusion partners are dihydrofolate reductase (DHFR) destabilizing domains (e.g., to generate chemically controllable chimeric CasY proteins) and chloroplast transit peptides. Suitable chloroplast transit peptides include, but are not limited to:
MASMISSSAVTTVSRASRGQSAAMAPFGGLKSMTGFPVRKV NTDITSITSNGGRVKCMQVWPPIGKKKFETLSYLPPLTRDSRA (SEQ ID NO:83), MASMISSSAVTTVSRASRGQSAAMAPFGGLKSMTGFPVRKVNTDITSITSNGGRVKS (SEQ ID NO:84), MASSMLSSATMVASPAQATMVAPFNGLKSSAAFPATRKANNDIT SITSNGGRVNCMQVWPPIEKKKFETLSYLPDLTDSGGRVNC (SEQ ID NO:85), MAQVSRICNGVQNPSLISNLSKSSQRKSPLSVSLKTQQHPRAYPISS SWGLKKSGMTLIGSELRPLKVMSSVSTAC (SEQ ID NO:86), MAQVSRICNGVWNPSLISNLSKSSQRKSPLSVSLKTQQHPRAYPIS SSWGLKKSGMTLIGSELRPLKVMSSVSTAC (SEQ ID NO:87), MAQINNMAQGIQTLNPNSNFHKPQVPKSSSFLVFGSKKLKNSANSMLVLKKDSIFMQLFCSFRISASVATAC (SEQ ID NO:88), MAALVTSQLATSGTVLSVTDRFRRPGFQGLRPRNPADAALGMRT VGASAAPKQSRKPHRFDRRCLSMVV (SEQ ID NO:89), MAALTTSQLATSATGFGIADRSAPSSLLRHGFQGLKPRSPAGGDA TSLSVTTSARATPKQQRSVQRGSRRFPSVVVC (SEQ ID NO:90), MASSVLSSAAVATRSNVAQANMVAPFTGLKSAASFPVSRKQNLD ITSIASNGGRVQC (SEQ ID NO:91), MESLAATSVFAPSRVAVPAARALVRAGTVVPTRRTSSTSGTSGVK CSAAVTPQASPVISRSAAAA (SEQ ID NO:92), and MGAAATSMQSLKFSNRLVPPSRRLSPVPNNVTCNNLPKSAAPVRT VKCCASSWNSTINGAAATTNGASAASS (SEQ ID NO: 93).
In some cases, a CasY fusion polypeptide of the present disclosure comprises: a) a CasY polypeptide of the present disclosure; and b) a chloroplast transit peptide. Thus, for example, a CRISPR-CasY complex can be targeted to the chloroplast. In some cases, this targeting can be achieved by the presence of an N-terminal extension, called a chloroplast transit peptide (CTP) or plastid transit peptide. If the expressed polypeptide is to be compartmentalized in a plant plastid (e.g., a chloroplast), a chromosomal transgene of bacterial origin must have a sequence encoding a CTP fused to the sequence encoding the expressed polypeptide. Accordingly, localization of an exogenous polypeptide to a chloroplast is typically accomplished by operably linking a polynucleotide sequence encoding a CTP to the 5' region of the polynucleotide encoding the exogenous polypeptide. The CTP is removed in a processing step during translocation into the plastid. Processing efficiency may, however, be affected by the amino acid sequence of the CTP and by nearby sequences at the NH2 terminus of the peptide. Other options that have been described for targeting to the chloroplast are the maize cab-m7 signal sequence (U.S. Pat. No. 7,022,896; WO 97/41228), the pea glutathione reductase signal sequence (WO 97/41228), and the CTP described in US 2009029861.
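At the protein level, linking a CTP-encoding sequence to the 5' region of the coding sequence places the transit peptide at the N-terminus of the expressed polypeptide. A minimal Python sketch of that arrangement follows. The transit peptide is the sequence listed above as SEQ ID NO: 84; the payload fragment and the function names are hypothetical placeholders (the payload is not a CasY sequence), and no linker, processing site, or initiator-methionine handling is modeled.

```python
# Chloroplast transit peptide as listed in the text (SEQ ID NO: 84).
CTP_SEQ_ID_84 = "MASMISSSAVTTVSRASRGQSAAMAPFGGLKSMTGFPVRKVNTDITSITSNGGRVKS"

# The 20 standard one-letter amino acid codes.
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def validate_protein(sequence: str) -> str:
    """Return the sequence unchanged if it is non-empty and uses only the
    standard one-letter amino acid codes; raise ValueError otherwise."""
    if not sequence:
        raise ValueError("empty sequence")
    unknown = set(sequence) - STANDARD_AMINO_ACIDS
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    return sequence


def fuse_n_terminal(transit_peptide: str, polypeptide: str) -> str:
    """Place the transit peptide at the N-terminus of the polypeptide."""
    return validate_protein(transit_peptide) + validate_protein(polypeptide)


# Hypothetical payload fragment, used only to show the orientation.
payload = "MSKRHPRISG"
fusion = fuse_n_terminal(CTP_SEQ_ID_84, payload)
print(fusion.startswith(CTP_SEQ_ID_84))   # True
print(len(fusion))                         # 67
```

The same helper could place any of the other listed transit peptides in front of a polypeptide; which peptide is processed most efficiently in a given plant is not something this sketch addresses.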
In some cases, a CasY fusion polypeptide of the present disclosure can comprise: a) a CasY polypeptide of the present disclosure; and b) an endosomal escape peptide. In some cases, the endosomal escape polypeptide comprises the amino acid sequence GLFXALLXLXL LXLXL LLLXA (SEQ ID NO:94), wherein each X is independently selected from lysine, histidine and arginine. In some cases, the endosomal escape polypeptide comprises amino acid sequence GLFHALLHLLHSLWHLLLHA (SEQ ID NO: 95).
For examples of some of the above-described fusion partners (and more) used in the context of fusion to Cas9 proteins, zinc finger proteins, and/or TALE proteins (for site-specific target nucleic acid modification, transcription regulation, and/or target protein modification, e.g., histone modification), see, e.g.: nomura et al, J Am Chem Soc.2007, 7/18; 129(28) 8676-7; rivenbark et al, episenetics.2012, month 4; 350 to 60 percent (7), (4); nucleic Acids Res.2016, 7, 8; 5615-28 in 44 (12); gilbert et al, cell.2013, 7 months and 18 days; 154(2) 442-51; kearns et al, nat methods.2015, 5 months; 12, (5) 401-3; mendenhall et al, Nat Biotechnol.2013 for 12 months; 31(12) 1133-6; hilton et al, Nat biotechnol.2015 for 5 months; 33, (5) 510-7; gordley et al, ProcNatl Acad Sci U S.2009, 31.3 months; 106(13) 5053-8; akopion et al, Proc Natl Acadsi U S.2003, 7 months 22; 100(15) 8688-91; tan et al, J Virol.2006, month 2; 80(4) 1939-48; tan et al, Proc Natl Acad Sci U S A.2003, 10 months and 14 days; 100(21) 11997-; papworth et al, Proc Natl Acad Sci U S.2003, 18/2; 1621-6 parts of 100 (parts by weight); sanjana et al, Nat Protoc.2012 1/5; 171 (1) to 92; beerli et al, Proc Natl Acad Sci U SA.1998, 12, 8; 95(25) 14628-33; snowden et al, Curr biol.2002, 12 months 23 days; 2159-66 (12) (24); xu et al, Cell discov.2016, 5, month, 3; 16009; komor et al, Nature.2016, 4 months and 20 days; 533(7603) 420-4; chaikind et al, Nucleic Acids Res.2016, 11/8; choudhury et al, Oncotarget.2016, 23/6/2016; du et al, Cold Spring Harb protocol.2016, 1/4/year; pham et al, Methods Mol biol.2016; 1358: 43-57; balboa et al, Stem cellreports.2015, 9 months and 8 days; 448-59 in the step (5), (3); hara et al, Sci Rep.2015, 6 months and 9 days; 11221, and; piatek et al, Plant Biotechnol J.2015, 5 months; 13(4) 578-89; hu et al, Nucleic Acids Res.2014.4 months; 42(7) 4375-90; cheng et al, Cell Res.2013 for 10 months; 23(10) 1163-71; and Maeder et al, Nat methods.2013 for 10 months; 10(10):977-9.
Additional suitable heterologous polypeptides include, but are not limited to, polypeptides that directly and/or indirectly provide increased transcription and/or translation of the target nucleic acid (e.g., a transcriptional activator or fragment thereof, a protein that recruits a transcriptional activator or fragment thereof, a small molecule/drug responsive transcriptional and/or translational regulator, a translational regulator, etc.). Non-limiting examples of heterologous polypeptides that effect increased or decreased transcription include transcriptional activator domains and transcriptional repressor domains. In some such cases, the chimeric CasY polypeptide functions by directing the nucleic acid (guide RNA) to a specific location (i.e., sequence) in the target nucleic acid and to exert locus-specific regulation, such as blocking RNA polymerase binding to a promoter that selectively inhibits transcription activator function and/or modifying local chromatin state (e.g., modifying the target nucleic acid or modifying a polypeptide associated with the target nucleic acid when a fusion sequence is used). In some cases, the change is transient (e.g., transcriptional repression or activation). In some cases, the change is heritable (e.g., when the target nucleic acid or a protein associated with the target nucleic acid (e.g., a nucleosome histone) is epigenetically modified).
Non-limiting examples of heterologous polypeptides used when targeting a ssRNA target nucleic acid include (but are not limited to): splicing factors (e.g., RS domains); protein translation components (e.g., translation initiation factors, elongation factors, and/or release factors; e.g., eIF 4G); an RNA methylase; RNA editing enzymes (e.g., RNA deaminases, such as Adenosine Deaminase (ADAR) acting on RNA, including a to I and/or C to U editing enzymes); a helicase; RNA binding proteins, and the like. It is understood that a heterologous polypeptide can include the entire protein, or in some cases, can include a fragment (e.g., a functional domain) of the protein.
The heterologous polypeptide of the subject chimeric CasY polypeptide can be any domain capable of interacting with ssRNA (which, for the purposes of this disclosure, includes intramolecular and/or intermolecular secondary structures, e.g., double-stranded RNA duplexes such as hairpins, stem-loops, etc.), whether transiently or irreversibly, directly or indirectly, including, but not limited to, an effector domain selected from the group consisting of: endonucleases (e.g., RNase III, the CRR22 DYW domain, Dicer, and PIN (PilT N-terminal) domains from proteins such as SMG5 and SMG6); proteins and protein domains responsible for stimulating RNA cleavage (e.g., CPSF, CstF, CFIm, and CFIIm); exonucleases (e.g., XRN-1 or exonuclease T); deadenylases (e.g., HNT3); proteins and protein domains responsible for nonsense-mediated RNA decay (e.g., UPF1, UPF2, UPF3, UPF3b, RNP S1, Y14, DEK, REF2, and SRm160); proteins and protein domains responsible for stabilizing RNA (e.g., PABP); proteins and protein domains responsible for repressing translation (e.g., Ago2 and Ago4); proteins and protein domains responsible for stimulating translation (e.g., Staufen); proteins and protein domains responsible for (e.g., capable of) regulating translation (e.g., translation factors such as initiation factors, elongation factors, release factors, etc., e.g., eIF4G); proteins and protein domains responsible for polyadenylation of RNA (e.g., PAP1, GLD-2, and Star-PAP); proteins and protein domains responsible for polyuridylation of RNA (e.g., CID1 and terminal uridyltransferase); proteins and protein domains responsible for RNA localization (e.g., from IMP1, ZBP1, She2p, She3p, and Bicaudal-D); proteins and protein domains responsible for nuclear retention of RNA (e.g., Rrp6); proteins and protein domains responsible for nuclear export of RNA (e.g., TAP, NXF1, THO, TREX, REF, and Aly); proteins and protein domains responsible for repressing RNA splicing (e.g., PTB, Sam68, and hnRNP A1); proteins and protein domains responsible for stimulating RNA splicing (e.g., serine/arginine-rich (SR) domains); proteins and protein domains responsible for reducing transcription efficiency (e.g., FUS (TLS)); and proteins and protein domains responsible for stimulating transcription (e.g., CDK7 and HIV Tat). Alternatively, the effector domain may be selected from the group comprising: endonucleases; proteins and protein domains capable of stimulating RNA cleavage; exonucleases; deadenylases; proteins and protein domains with nonsense-mediated RNA decay activity; proteins and protein domains capable of stabilizing RNA; proteins and protein domains capable of repressing translation; proteins and protein domains capable of stimulating translation; proteins and protein domains capable of regulating translation (e.g., translation factors such as initiation factors, elongation factors, release factors, etc., e.g., eIF4G); proteins and protein domains capable of polyadenylation of RNA; proteins and protein domains capable of polyuridylation of RNA; proteins and protein domains with RNA localization activity; proteins and protein domains capable of nuclear retention of RNA; proteins and protein domains with RNA nuclear export activity; proteins and protein domains capable of repressing RNA splicing; proteins and protein domains capable of stimulating RNA splicing; proteins and protein domains capable of reducing transcription efficiency; and proteins and protein domains capable of stimulating transcription. Another suitable heterologous polypeptide is a PUF RNA-binding domain, which is described in more detail in WO2012068627, which is incorporated herein by reference in its entirety.
Some RNA splicing factors that can be used as heterologous polypeptides (in whole or as fragments thereof) of chimeric CasY polypeptides have modular structures with separate sequence-specific RNA-binding modules and splicing effector domains. For example, members of the serine/arginine-rich (SR) protein family contain N-terminal RNA recognition motifs (RRMs) that bind to exonic splicing enhancers (ESEs) in pre-mRNAs and C-terminal RS domains that promote exon inclusion. As another example, the hnRNP protein hnRNP A1 binds to exonic splicing silencers (ESSs) through its RRM domains and inhibits exon inclusion through a C-terminal glycine-rich domain. Some splicing factors can regulate alternative use of splice sites (ss) by binding to regulatory sequences between the two alternative sites. For example, ASF/SF2 can recognize ESEs and promote the use of intron-proximal sites, whereas hnRNP A1 can bind to ESSs and shift splicing towards the use of intron-distal sites. One application of such factors is the generation of ESFs that regulate alternative splicing of endogenous genes, particularly disease-related genes. For example, Bcl-x pre-mRNA produces two splicing isoforms with two alternative 5' splice sites to encode proteins with opposite functions. The long splicing isoform, Bcl-xL, is a potent inhibitor of apoptosis expressed in long-lived post-mitotic cells and is upregulated in many cancer cells, protecting the cells from apoptotic signals. The short isoform, Bcl-xS, is a pro-apoptotic isoform and is expressed at high levels in cells with high turnover rates (e.g., developing lymphocytes). The ratio of the two Bcl-x splicing isoforms is regulated by multiple cis-elements located in either the core exon region or the exon extension region (i.e., between the two alternative 5' splice sites). For more examples, see WO2010075303, which is hereby incorporated by reference in its entirety.
Additional suitable fusion partners include, but are not limited to, proteins (or fragments thereof) that act as boundary elements (e.g., CTCF), proteins and fragments thereof that provide peripheral recruitment (e.g., lamin A, lamin B, etc.), and protein docking elements (e.g., FKBP/FRB, Pil1/Aby1, etc.).
Examples of various additional suitable heterologous polypeptides (or fragments thereof) for the subject chimeric CasY polypeptides include, but are not limited to, those described in the following applications (the publications relate to other CRISPR endonucleases such as Cas9, but the described fusion partners may also be used with CasY): PCT patent applications WO2010075303, WO2012068627, and WO2013155555; and can be found, for example, in the following U.S. patents and patent applications: 8,906,616; 8,895,308; 8,889,418; 8,889,356; 8,871,445; 8,865,406; 8,795,965; 8,771,945; 8,697,359; 20140068797; 20140170753; 20140179006; 20140179770; 20140186843; 20140186919; 20140186958; 20140189896; 20140227787; 20140234972; 20140242664; 20140242699; 20140242700; 20140242702; 20140248702; 20140256046; 20140273037; 20140273226; 20140273230; 20140273231; 20140273232; 20140273233; 20140273234; 20140273235; 20140287938; 20140295556; 20140295557; 20140298547; 20140304853; 20140309487; 20140310828; 20140310830; 20140315985; 20140335063; 20140335620; 20140342456; 20140342457; 20140342458; 20140349400; 20140349405; 20140356867; 20140356956; 20140356958; 20140356959; 20140357523; 20140357530; 20140364333; and 20140377868; all of which are hereby incorporated by reference in their entirety.
In some cases, the heterologous polypeptide (fusion partner) provides subcellular localization, i.e., the heterologous polypeptide contains a subcellular localization sequence (e.g., a Nuclear Localization Signal (NLS) for targeting the nucleus, a sequence for maintaining the fusion protein outside the nucleus (e.g., a Nuclear Export Sequence (NES)), a sequence that retains the fusion protein in the cytoplasm, a mitochondrial localization signal for targeting mitochondria, a chloroplast localization signal for targeting chloroplasts, an ER retention signal, etc.). In some embodiments, the CasY fusion polypeptide does not comprise an NLS, such that the protein is not targeted to the nucleus (which may be advantageous, for example, when the target nucleic acid is an RNA present in the cytosol). In some embodiments, the heterologous polypeptide can provide a tag (i.e., the heterologous polypeptide is a detectable label) that facilitates tracking and/or purification (e.g., a fluorescent protein, such as Green Fluorescent Protein (GFP), YFP, RFP, CFP, mCherry, tdTomato, etc.; a histidine tag, such as a 6XHis tag; a Hemagglutinin (HA) tag; a FLAG tag; a Myc tag, etc.).
In some cases, a CasY protein (e.g., a wild-type CasY protein, a variant CasY protein, a chimeric CasY protein, a dCasY protein, a chimeric CasY protein in which the CasY portion has reduced nuclease activity, such as a dCasY protein fused to a fusion partner, etc.) comprises (is fused to) a Nuclear Localization Signal (NLS) (e.g., in some cases, 2 or more, 3 or more, 4 or more, or 5 or more NLSs). Thus, in some cases, a CasY polypeptide comprises one or more NLSs (e.g., 2 or more, 3 or more, 4 or more, or 5 or more NLSs). In some cases, one or more NLSs (2 or more, 3 or more, 4 or more, or 5 or more NLSs) are located at or near (e.g., within 50 amino acids of) the N-terminus and/or the C-terminus. In some cases, one or more NLSs (2 or more, 3 or more, 4 or more, or 5 or more NLSs) are located at or near (e.g., within 50 amino acids of) the N-terminus. In some cases, one or more NLSs (2 or more, 3 or more, 4 or more, or 5 or more NLSs) are located at or near (e.g., within 50 amino acids of) the C-terminus. In some cases, one or more NLSs (3 or more, 4 or more, or 5 or more NLSs) are located at or near (e.g., within 50 amino acids of) both the N-terminus and the C-terminus. In some cases, one NLS is located at the N-terminus and another NLS is located at the C-terminus.
In some cases, a CasY protein (e.g., a wild-type CasY protein, a variant CasY protein, a chimeric CasY protein, a dCasY protein, a chimeric CasY protein in which the CasY portion has reduced nuclease activity, such as a dCasY protein fused to a fusion partner, etc.) comprises (is fused to) between 1 and 10 NLSs (e.g., 1-9, 1-8, 1-7, 1-6, 1-5, 2-10, 2-9, 2-8, 2-7, 2-6, or 2-5 NLSs). In some cases, a CasY protein (e.g., a wild-type CasY protein, a variant CasY protein, a chimeric CasY protein, a dCasY protein, a chimeric CasY protein in which the CasY portion has reduced nuclease activity, such as a dCasY protein fused to a fusion partner, etc.) comprises (is fused to) between 2 and 5 NLSs (e.g., 2-4 or 2-3 NLSs).
Non-limiting examples of NLSs include NLS sequences derived from: the NLS of the SV40 virus large T antigen, having the amino acid sequence PKKKRKV (SEQ ID NO: 96); the NLS from nucleoplasmin (e.g., the nucleoplasmin bipartite NLS having the sequence KRPAATKKAGQAKKKK (SEQ ID NO: 97)); the c-myc NLS having the amino acid sequence PAAKRVKLD (SEQ ID NO:98) or RQRRNELKRSP (SEQ ID NO: 99); the hRNPA1 M9 NLS having the sequence NQSSNFGPMKGGNFGGRSSGPYGGGGQYFAKPRNQGGY (SEQ ID NO: 100); the sequence RMRIZFKNKGKDTAELRRVEVSVAVLELRKAKKDEQILKRRNV (SEQ ID NO:101) of the IBB domain from importin-alpha; the sequences VSRKRPRP (SEQ ID NO:102) and PPKKARED (SEQ ID NO:103) of the myoma T protein; the sequence PQPKKKPL (SEQ ID NO: 104) of human p53; the sequence SALIKKKKKMAP (SEQ ID NO: 105) of mouse c-abl IV; the sequences DRLRR (SEQ ID NO:106) and PKQKKRK (SEQ ID NO:107) of the influenza virus NS1; the sequence RKLKKKIKKL (SEQ ID NO: 108) of the hepatitis virus delta antigen; the sequence REKKKFLKRR (SEQ ID NO: 109) of the mouse Mx1 protein; the sequence KRKGDEVDGVDEVAKKKSKK (SEQ ID NO: 110) of the human poly(ADP-ribose) polymerase; and the sequence RKCLQAGMNLEARKTKK (SEQ ID NO:111) of the (human) glucocorticoid steroid hormone receptor. In general, the NLS (or multiple NLSs) is of sufficient strength to drive accumulation of the CasY protein in a detectable amount in the nucleus of a eukaryotic cell. Detection of accumulation in the nucleus may be performed by any suitable technique. For example, a detectable label may be fused to the CasY protein such that its location within the cell can be visualized. Cell nuclei may also be isolated from the cells, and their contents may then be analyzed by any suitable method of detecting proteins, such as immunohistochemistry, western blot, or enzymatic activity assays. Accumulation in the nucleus can also be determined indirectly.
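As an illustration of the placement language above ("at or near the N-terminus and/or C-terminus, e.g., within 50 amino acids"), the following minimal Python sketch locates NLS motifs in a protein sequence and flags whether each lies near a terminus. The function name `locate_nls`, the toy sequence, and the reading of "within 50 amino acids" as "the motif lies entirely within the first or last 50 residues" are assumptions made for illustration only; they are not part of the disclosure.

```python
from typing import List, Tuple

# SV40 large T antigen NLS, as listed in the text (SEQ ID NO: 96).
SV40_NLS = "PKKKRKV"


def locate_nls(protein: str, motifs: List[str], window: int = 50) -> List[Tuple[str, int, bool, bool]]:
    """Return (motif, start, near_n_terminus, near_c_terminus) for each motif hit.

    Assumption: a hit is "near" a terminus when the whole motif lies within
    the first (or last) `window` residues of the protein.
    """
    hits = []
    for motif in motifs:
        start = protein.find(motif)
        while start != -1:
            end = start + len(motif)  # exclusive end index of the hit
            near_n = end <= window
            near_c = start >= len(protein) - window
            hits.append((motif, start, near_n, near_c))
            start = protein.find(motif, start + 1)
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical toy protein: one NLS near each terminus, poly-Ala filler between.
toy_protein = "M" + SV40_NLS + "A" * 200 + SV40_NLS
for hit in locate_nls(toy_protein, [SV40_NLS]):
    print(hit)
```

A check of this kind only reports where a motif sits in the primary sequence; whether an NLS actually drives nuclear accumulation is determined experimentally, as described above.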
In some cases, a CasY fusion polypeptide comprises a "protein transduction domain" or PTD (also known as a CPP, or cell penetrating peptide), which refers to a polypeptide, polynucleotide, carbohydrate, or organic or inorganic compound that facilitates traversing a lipid bilayer, micelle, cell membrane, organelle membrane, or vesicle membrane. A PTD attached to another molecule (which may range from a small polar molecule to a large macromolecule and/or a nanoparticle) facilitates the passage of the molecule across a membrane, for example from the extracellular space into the intracellular space, or from the cytosol into an organelle. In some embodiments, the PTD is covalently linked to the amino terminus of the polypeptide (e.g., to a wild-type CasY to produce a fusion protein, or to a variant CasY protein (such as a dCasY, nickase CasY, or chimeric CasY protein) to produce a fusion protein). In some embodiments, the PTD is covalently linked to the carboxy terminus of the polypeptide (e.g., to a wild-type CasY to produce a fusion protein, or to a variant CasY protein (such as a dCasY, nickase CasY, or chimeric CasY protein) to produce a fusion protein). In some cases, the PTD is inserted into the CasY fusion polypeptide at a suitable insertion site (i.e., not at the N-terminus or C-terminus of the CasY fusion polypeptide). In some cases, a subject CasY fusion polypeptide comprises (is conjugated to, is fused to) one or more PTDs (e.g., two or more, three or more, four or more PTDs). In some cases, the PTD includes a Nuclear Localization Signal (NLS) (e.g., in some cases, 2 or more, 3 or more, 4 or more, or 5 or more NLSs). Thus, in some cases, a CasY fusion polypeptide comprises one or more NLSs (e.g., 2 or more, 3 or more, 4 or more, or 5 or more NLSs). In some embodiments, the PTD is covalently linked to a nucleic acid (e.g., a CasY guide nucleic acid, a polynucleotide encoding a CasY fusion polypeptide, a donor polynucleotide, etc.).
Examples of PTDs include, but are not limited to, a minimal undecapeptide protein transduction domain (corresponding to residues 47-57 of HIV-1 TAT, comprising YGRKKRRQRRR; SEQ ID NO: 112); a polyarginine sequence comprising a number of arginines sufficient to direct entry into a cell (e.g., 3, 4, 5, 6, 7, 8, 9, 10, or 10-50 arginines); a VP22 domain (Zender et al (2002) Cancer Gene Ther. 9(6): 489-96); a Drosophila Antennapedia protein transduction domain (Noguchi et al (2003) Diabetes 52(7): 1732-1737); a truncated human calcitonin peptide (Trehin et al (2004) Pharm. Research 21: 1248-1256); polylysine (Wender et al (2000) Proc. Natl. Acad. Sci. USA 97: 13003-13008); RRQRRTSKLMKR (SEQ ID NO: 113); transportan GWTLNSAGYLLGKINLKALAALAKKIL (SEQ ID NO: 114); KALAWEAKLAKALAKALAKHLAKALAKALKCEA (SEQ ID NO: 115); and RQIKIWFQNRRMKWKK (SEQ ID NO: 116). Exemplary PTDs include, but are not limited to, YGRKKRRQRRR (SEQ ID NO: 117); RKKRRQRRR (SEQ ID NO: 118); and an arginine homopolymer of from 3 arginine residues to 50 arginine residues. Exemplary PTD domain amino acid sequences include, but are not limited to, any of the following sequences: YGRKKRRQRRR (SEQ ID NO: 119); RKKRRQRR (SEQ ID NO: 120); YARAAARQARA (SEQ ID NO: 121); THRLPRRRRRR (SEQ ID NO: 122); and GGRRARRRRRR (SEQ ID NO: 123). In some embodiments, the PTD is an activatable CPP (ACPP) (Aguilera et al (2009) Integr Biol (Camb) June; 1(5-6): 371-). An ACPP comprises a polycationic CPP (e.g., Arg9 or "R9") connected via a cleavable linker to a matching polyanion (e.g., Glu9 or "E9"), which reduces the net charge to nearly zero and thereby inhibits adhesion and uptake into cells. Upon cleavage of the linker, the polyanion is released, locally unmasking the polyarginine and its inherent adhesiveness, thereby "activating" the ACPP to traverse the membrane.
Linkers (e.g., for fusion partners)
In some embodiments, a subject CasY protein may be fused to a fusion partner via a linker polypeptide (e.g., one or more linker polypeptides). The linker polypeptide may have any of a variety of amino acid sequences. Proteins may be joined by a spacer peptide, generally of a flexible nature, although other chemical linkages are not excluded. Suitable linkers include polypeptides between 4 amino acids and 40 amino acids in length, or between 4 amino acids and 25 amino acids in length. These linkers can be produced by using synthetic, linker-encoding oligonucleotides to couple the proteins, or can be encoded by a nucleic acid sequence encoding the fusion protein. Peptide linkers with a degree of flexibility may be used. The linker peptide may have virtually any amino acid sequence, bearing in mind that preferred linkers will have a sequence that results in a generally flexible peptide. Small amino acids, such as glycine and alanine, are useful for creating a flexible peptide. Creating such sequences is routine for a person skilled in the art. A variety of different linkers are commercially available and are considered suitable for use.
Examples of linker polypeptides include glycine polymers (G)n, glycine-serine polymers (including, for example, (GS)n, (GSGGS)n (SEQ ID NO:124), (GGSGGS)n (SEQ ID NO:125), and (GGGS)n (SEQ ID NO:126), where n is an integer of at least 1), glycine-alanine polymers, and alanine-serine polymers. Exemplary linkers may comprise amino acid sequences including, but not limited to, GGSG (SEQ ID NO:127), GGSGG (SEQ ID NO:128), GSGSGSG (SEQ ID NO:129), GSGGG (SEQ ID NO:130), GGGSG (SEQ ID NO:131), GSSSG (SEQ ID NO:132), and the like. One of ordinary skill will recognize that the design of a peptide conjugated to any desired element may include linkers that are all or partially flexible, such that the linker may include a flexible linker as well as one or more portions that confer a less flexible structure.
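The repeat-unit notation used for linkers (a unit such as GGGS repeated n times, with n an integer of at least 1) can be illustrated with a short Python sketch. The function names `repeat_linker` and `within_length_range` are hypothetical helpers introduced here for illustration; the 4 to 40 amino acid bounds are taken from the length range stated above, and restricting units to G, S, and A residues is an assumption reflecting the glycine, serine, and alanine polymers named in the text.

```python
def repeat_linker(unit: str, n: int) -> str:
    """Build a linker consisting of n copies of a repeat unit, e.g. (GGGS)n."""
    if n < 1:
        raise ValueError("n must be an integer of at least 1")
    # Assumption: only the small residues named in the text are allowed.
    if not unit or any(residue not in "GSA" for residue in unit):
        raise ValueError("unit should contain only G, S, or A residues")
    return unit * n


def within_length_range(linker: str, min_len: int = 4, max_len: int = 40) -> bool:
    """Check a linker against the 4 to 40 amino acid range given in the text."""
    return min_len <= len(linker) <= max_len


linker = repeat_linker("GGGS", 3)
print(linker, len(linker), within_length_range(linker))
```

Such a helper only enumerates candidate sequences; the choice of linker for a given fusion remains an empirical design decision.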
Detectable labels
In some cases, a CasY polypeptide of the disclosure comprises a detectable label. Suitable detectable labels and/or moieties that can provide a detectable signal can include, but are not limited to, enzymes, radioisotopes, members of specific binding pairs, fluorophores, fluorescent proteins, quantum dots, and the like.
Suitable fluorescent proteins include, but are not limited to, Green Fluorescent Protein (GFP) or variants thereof, blue fluorescent variant of GFP (BFP), cyan fluorescent variant of GFP (CFP), yellow fluorescent variant of GFP (YFP), Enhanced GFP (EGFP), Enhanced CFP (ECFP), Enhanced YFP (EYFP), GFPS65T, Emerald, Topaz (TYFP), Venus, Citrine, mCitrine, GFPuv, destabilized EGFP (dEGFP), destabilized ECFP (dECFP), destabilized EYFP (dEYFP), FPmCm, Cerulean, T-Sapphire, CyPet, YPet, mKO, HcRed, T-HcRed, DsRed2, DsRed-monomer, J-Red, dimer 2, T-dimer 2(12), mRFP1, pocilloporin, Renilla GFP, Monster GFP, paGFP, Kaede protein and kindling protein, and phycobiliproteins and phycobiliprotein conjugates (including B-phycoerythrin, R-phycoerythrin, and allophycocyanin). Other examples of fluorescent proteins include mHoneydew, mBanana, mOrange, dTomato, tdTomato, mTangerine, mStrawberry, mCherry, mGrape1, mRaspberry, mGrape2, mPlum (Shaner et al (2005) Nat. Methods 2: 905-909), and the like. Any of a variety of fluorescent and colored proteins from Anthozoan (coral) species are suitable for use, as described, for example, in Matz et al (1999) Nature Biotechnol. 17: 969-973.
Suitable enzymes include, but are not limited to, horseradish peroxidase (HRP), Alkaline Phosphatase (AP), beta-Galactosidase (GAL), glucose-6-phosphate dehydrogenase, beta-N-acetylglucosaminidase, beta-glucuronidase, invertase, xanthine oxidase, firefly luciferase, Glucose Oxidase (GO), and the like.
Protospacer Adjacent Motif (PAM)
The CasY protein binds to the target DNA at a target sequence defined by a region of complementarity between the DNA-targeting RNA and the target DNA. As with many CRISPR endonucleases, site-specific binding (and/or cleavage) of double-stranded target DNA occurs at positions determined by both: (i) base-pairing complementarity between the guide RNA and the target DNA; and (ii) short motifs in the target DNA [ termed Protospacer Adjacent Motifs (PAM) ].
In some embodiments, the PAM of the CasY protein is located immediately 5' of the target sequence on the non-complementary strand of the target DNA (the complementary strand hybridizes to the guide sequence of the guide RNA, while the non-complementary strand does not hybridize directly to the guide RNA and is the reverse complement of the complementary strand). In some embodiments (e.g., when CasY1 is used as described herein), the PAM sequence of the non-complementary strand is 5'-TA-3' (and in some cases XTA, where X is C, A, or T). See, for example, fig. 5 and 7 (where the PAM is TA, or CTA if the PAM is considered to be XTA, where X is C, A, or T). In some embodiments (e.g., when CasY1 is used as described herein), the PAM sequence of the non-complementary strand is 5'-TA-3' (and in some cases HTA, where H is C, A, or T). See, for example, fig. 5 and 7 (where the PAM is TA, or CTA if the PAM is considered to be HTA, where H is C, A, or T). In some cases (e.g., when CasY2 is used as described herein), the PAM sequence of the non-complementary strand is a 5'-YR-3' sequence flanking the 5' end of the target (where Y is T or C and R is A or G). In some cases (e.g., when CasY2 is used as described herein), the PAM sequence of the non-complementary strand is 5'-TR-3' (e.g., 5'-DTR-3') (where R is A or G and D is A, G, or T). See fig. 5d for an example.
In some cases, different CasY proteins (i.e., CasY proteins from various species) may be advantageous to use in the various provided methods in order to capitalize on various enzymatic characteristics of the different CasY proteins (e.g., for different PAM sequence preferences; for increased or decreased enzymatic activity; for an increased or decreased level of cellular toxicity; to change the balance between NHEJ, homology-directed repair, single strand breaks, double strand breaks, etc.; to take advantage of a short total sequence; and the like). CasY proteins from different species may require different PAM sequences in the target DNA. Thus, for a particular CasY protein of choice, the PAM sequence requirement may be different from the 5'-TA-3' (or XTA, HTA) sequence described above. Various methods for identifying an appropriate PAM sequence, including in silico and/or wet lab methods, are known and routine in the art, and any convenient method can be used. The TA (XTA, HTA) PAM sequences described herein were identified using a PAM depletion assay (see, e.g., figure 5 of the working examples below).
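The PAM geometry described in this section (a PAM such as 5'-TA-3' or HTA lying immediately 5' of the target sequence on the non-complementary strand) can be sketched in Python as a simple sequence scan. This is a minimal illustration under stated assumptions, not a method of the disclosure: the function names `find_target_sites` and `guide_sequence_for`, the default 20-nt target length, and the toy input are hypothetical, and only the given strand is scanned (a full search would also scan the reverse complement). The ambiguity letters follow the standard IUPAC nucleotide codes.

```python
import re
from typing import List, Tuple

# Standard IUPAC nucleotide codes used in the PAM definitions above.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "H": "[ACT]", "D": "[AGT]", "N": "[ACGT]",
}


def find_target_sites(non_complementary_strand: str, pam: str = "TA",
                      target_len: int = 20) -> List[Tuple[int, str, str]]:
    """Return (pam_start, pam, target) for each PAM followed directly 3' by a target.

    The input is read 5'->3' and is assumed to be the non-complementary strand.
    A lookahead is used so that overlapping candidate sites are all reported.
    """
    seq = non_complementary_strand.upper()
    pam_regex = "".join(IUPAC[base] for base in pam.upper())
    pattern = re.compile(rf"(?=({pam_regex})([ACGT]{{{target_len}}}))")
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]


def guide_sequence_for(target: str) -> str:
    """RNA guide sequence for a target read from the non-complementary strand.

    The guide hybridizes to the complementary strand, so it has the same
    sequence as the non-complementary strand target, with U in place of T.
    """
    return target.upper().replace("T", "U")


# Hypothetical toy sequence: a CTA PAM followed by a 20-nt target.
toy = "GGCTA" + "ACGTACGTACGTACGTACGT" + "GG"
for pam_start, pam_seq, target in find_target_sites(toy, pam="HTA"):
    print(pam_start, pam_seq, target, guide_sequence_for(target))
```

Passing `pam="YR"` or `pam="DTR"` scans for the CasY2-type motifs mentioned above in the same way.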
CasY guide RNA
Nucleic acid molecules that bind to a CasY protein to form a ribonucleoprotein complex (RNP) and target the complex to a specific location within a target nucleic acid (e.g., target DNA) are referred to herein as "CasY guide RNAs" or simply "guide RNAs". It will be appreciated that in some cases hybrid DNA/RNA may be prepared such that the CasY guide RNA comprises DNA bases in addition to RNA bases, but the term "CasY guide RNA" is still used to encompass such molecules herein.
A CasY guide RNA can be said to include two segments, a targeting segment and a protein-binding segment. The targeting segment of a CasY guide RNA includes a nucleotide sequence (a guide sequence) that is complementary to (and therefore hybridizes with) a specific sequence (a target site) within a target nucleic acid (e.g., a target ssRNA, a target ssDNA, the complementary strand of a double-stranded target DNA, etc.). The protein-binding segment (or "protein-binding sequence") interacts with (binds to) a CasY polypeptide. The protein-binding segment of a subject CasY guide RNA includes two complementary stretches of nucleotides that hybridize to one another to form a double-stranded RNA duplex (dsRNA duplex). Site-specific binding and/or cleavage of a target nucleic acid (e.g., genomic DNA) can occur at locations (e.g., a target sequence of a target locus) determined by base-pairing complementarity between the CasY guide RNA (the guide sequence of the CasY guide RNA) and the target nucleic acid.
The CasY guide RNA and a CasY protein (e.g., a fused CasY polypeptide) form a complex (e.g., bind via a non-covalent interaction). The CasY guide RNA provides target specificity to the complex by comprising a targeting segment comprising a guide sequence (a nucleotide sequence complementary to the target nucleic acid sequence). The CasY protein of the complex provides site-specific activity (e.g., cleavage activity provided by the CasY protein and/or activity provided by the fusion partner in the case of chimeric CasY proteins). In other words, the CasY protein is directed to a target nucleic acid sequence (e.g., a target sequence) due to its association with the CasY guide RNA.
The "guide sequence," also referred to as the "targeting sequence," of a CasY guide RNA can be modified so that the CasY guide RNA can target a CasY protein (e.g., a naturally occurring CasY protein, a fusion CasY polypeptide (chimeric CasY), and the like) to any desired sequence of any desired target nucleic acid, with the exception that the PAM sequence can be taken into account (e.g., as described herein). Thus, for example, a CasY guide RNA can have a guide sequence that is complementary to (e.g., can hybridize to) a sequence in a nucleic acid in a eukaryotic cell, e.g., a viral nucleic acid, a eukaryotic nucleic acid (e.g., a eukaryotic chromosome, a chromosomal sequence, a eukaryotic RNA, etc.), and the like.
Guide sequences for CasY guide RNAs
The subject CasY guide RNA comprises a guide sequence (i.e., a targeting sequence), which is a nucleotide sequence complementary to a sequence (a target site) in a target nucleic acid. In other words, the guide sequence of the CasY guide RNA can interact with a target nucleic acid (e.g., double-stranded DNA (dsDNA), single-stranded DNA (ssDNA), single-stranded RNA (ssRNA), or double-stranded RNA (dsRNA)) in a sequence-specific manner by hybridization (i.e., base pairing). The guide sequence of the CasY guide RNA can be modified (e.g., by genetic engineering) or designed to hybridize to any desired target sequence within a target nucleic acid (e.g., a eukaryotic target nucleic acid, such as genomic DNA), taking the PAM into account where applicable (e.g., when targeting a dsDNA target).
In some embodiments, the percent complementarity between the guide sequence and the target site of the target nucleic acid is 60% or greater (e.g., 65% or greater, 70% or greater, 75% or greater, 80% or greater, 85% or greater, 90% or greater, 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100%). In some cases, the percent complementarity between the guide sequence and the target site of the target nucleic acid is 80% or greater (e.g., 85% or greater, 90% or greater, 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100%). In some cases, the percent complementarity between the guide sequence and the target site of the target nucleic acid is 90% or greater (e.g., 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100%). In some cases, the percent complementarity between the guide sequence and the target site of the target nucleic acid is 100%.
In some cases, the percent complementarity between the guide sequence and the target site of the target nucleic acid is 100% over the seven contiguous 3'-most nucleotides of the target site of the target nucleic acid.
In some cases, the percent complementarity between the guide sequence and the target site of the target nucleic acid is 60% or more (e.g., 70% or more, 75% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100%) over 17 or more (e.g., 18 or more, 19 or more, 20 or more, 21 or more, 22 or more) consecutive nucleotides. In some cases, the percent complementarity between the guide sequence and the target site of the target nucleic acid is 80% or more (e.g., 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100%) over 17 or more (e.g., 18 or more, 19 or more, 20 or more, 21 or more, 22 or more) consecutive nucleotides. In some cases, the percent complementarity between the guide sequence and the target site of the target nucleic acid is 90% or more (e.g., 95% or more, 97% or more, 98% or more, 99% or more, or 100%) over 17 or more (e.g., 18 or more, 19 or more, 20 or more, 21 or more, 22 or more) consecutive nucleotides. In some cases, the percent complementarity between the guide sequence and the target site of the target nucleic acid is 100% over 17 or more (e.g., 18 or more, 19 or more, 20 or more, 21 or more, 22 or more) consecutive nucleotides.
In some cases, the percent complementarity between the guide sequence and the target site of the target nucleic acid is 60% or more (e.g., 70% or more, 75% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100%) over 19 or more (e.g., 20 or more, 21 or more, 22 or more) consecutive nucleotides. In some cases, the percent complementarity between the guide sequence and the target site of the target nucleic acid is 80% or more (e.g., 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100%) over 19 or more (e.g., 20 or more, 21 or more, 22 or more) consecutive nucleotides. In some cases, the percent complementarity between the guide sequence and the target site of the target nucleic acid is 90% or more (e.g., 95% or more, 97% or more, 98% or more, 99% or more, or 100%) over 19 or more (e.g., 20 or more, 21 or more, 22 or more) consecutive nucleotides. In some cases, the percent complementarity between the guide sequence and the target site of the target nucleic acid is 100% over 19 or more (e.g., 20 or more, 21 or more, 22 or more) consecutive nucleotides.
In some embodiments, the percent complementarity between the guide sequence and the target site of the target nucleic acid is 60% or greater (e.g., 70% or greater, 75% or greater, 80% or greater, 85% or greater, 90% or greater, 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100%) over 17-25 consecutive nucleotides. In some cases, the percent complementarity between the guide sequence and the target site of the target nucleic acid is 80% or greater (e.g., 85% or greater, 90% or greater, 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100%) over 17-25 consecutive nucleotides. In some cases, the percent complementarity between the guide sequence and the target site of the target nucleic acid is 90% or greater (e.g., 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100%) over 17-25 consecutive nucleotides. In some cases, the percent complementarity between the guide sequence and the target site of the target nucleic acid is 100% over 17-25 consecutive nucleotides.
In some embodiments, the percent complementarity between the guide sequence and the target site of the target nucleic acid is 60% or greater (e.g., 70% or greater, 75% or greater, 80% or greater, 85% or greater, 90% or greater, 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100%) over 19-25 consecutive nucleotides. In some cases, the percent complementarity between the guide sequence and the target site of the target nucleic acid is 80% or greater (e.g., 85% or greater, 90% or greater, 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100%) over 19-25 consecutive nucleotides. In some cases, the percent complementarity between the guide sequence and the target site of the target nucleic acid is 90% or greater (e.g., 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100%) over 19-25 consecutive nucleotides. In some cases, the percent complementarity between the guide sequence and the target site of the target nucleic acid is 100% over 19-25 consecutive nucleotides.
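The percent complementarity figures above can be made concrete with a small calculation. The following Python sketch is illustrative only: the function name `percent_complementarity` is hypothetical, both sequences are assumed to be written 5' to 3' and to have equal length, pairing is assumed to be antiparallel, and only Watson-Crick pairs are counted (G-U wobble pairs are ignored, which is a simplifying assumption).

```python
# Watson-Crick pairs between a guide (RNA) base and a target (DNA or RNA) base.
WATSON_CRICK = {
    ("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"),
    ("A", "T"), ("T", "A"),
}


def percent_complementarity(guide: str, target_site: str) -> float:
    """Percent of guide positions that base-pair with the target site.

    Both sequences are given 5'->3'. Because hybridization is antiparallel,
    position i of the guide is compared with position -(i + 1) of the target.
    """
    guide = guide.upper()
    target = target_site.upper()
    if not guide or len(guide) != len(target):
        raise ValueError("guide and target site must be non-empty and of equal length")
    paired = sum((g, t) in WATSON_CRICK for g, t in zip(guide, reversed(target)))
    return 100.0 * paired / len(guide)


# Hypothetical 20-nt guide and two candidate target sites (complementary strand, 5'->3').
guide = "ACGUACGUACGUACGUACGU"
perfect_site = "ACGTACGTACGTACGTACGT"      # reverse complement of the guide
one_mismatch = "ACGTACGTACGTACGTACGA"      # differs at the 3'-most position
print(percent_complementarity(guide, perfect_site))
print(percent_complementarity(guide, one_mismatch))
```

For a 20-nt guide, a single mismatch corresponds to 95% complementarity, and four mismatches to 80%.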
In some cases, the guide sequence has a length in the range of 17-30 nucleotides (nt) (e.g., 17-25, 17-22, 17-20, 19-30, 19-25, 19-22, 19-20, 20-30, 20-25, or 20-22 nt). In some cases, the guide sequence has a length in the range of 17-25 nucleotides (nt) (e.g., 17-22, 17-20, 19-25, 19-22, 19-20, 20-25, or 20-22 nt). In some cases, the guide sequence has a length of 17 or more nt (e.g., 18 or more, 19 or more, 20 or more, 21 or more, or 22 or more nt; 19 nt, 20 nt, 21 nt, 22 nt, 23 nt, 24 nt, 25 nt, etc.). In some cases, the guide sequence has a length of 19 or more nt (e.g., 20 or more, 21 or more, or 22 or more nt; 19 nt, 20 nt, 21 nt, 22 nt, 23 nt, 24 nt, 25 nt, etc.). In some cases, the guide sequence has a length of 17 nt. In some cases, the guide sequence has a length of 18 nt. In some cases, the guide sequence has a length of 19 nt. In some cases, the guide sequence has a length of 20 nt. In some cases, the guide sequence has a length of 21 nt. In some cases, the guide sequence has a length of 22 nt. In some cases, the guide sequence has a length of 23 nt.
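The complementarity and length criteria above can be illustrated with a minimal sketch. It assumes ungapped, position-by-position Watson-Crick pairing between an RNA guide sequence and the DNA strand it hybridizes to; the function names `percent_complementarity` and `guide_length_ok` are hypothetical and are not part of the disclosure, and G-U wobble pairs are deliberately not counted.

```python
# Watson-Crick partner (DNA base) for each RNA guide base.
# Assumption: wobble (G-U) pairing is ignored in this sketch.
RNA_TO_DNA_PAIR = {"A": "T", "U": "A", "G": "C", "C": "G"}


def percent_complementarity(guide_rna: str, target_dna: str) -> float:
    """Percent of guide positions that pair with the target site.

    Both sequences are written 5'->3'. The target strand is reversed so that
    the two strands are compared in antiparallel orientation.
    """
    guide = guide_rna.upper()
    target = target_dna.upper()[::-1]
    if len(guide) != len(target):
        raise ValueError("guide and target site must have the same length")
    paired = sum(1 for g, t in zip(guide, target) if RNA_TO_DNA_PAIR.get(g) == t)
    return 100.0 * paired / len(guide)


def guide_length_ok(guide_rna: str, min_nt: int = 17, max_nt: int = 30) -> bool:
    """Check that the guide length falls within a chosen range (default 17-30 nt)."""
    return min_nt <= len(guide_rna) <= max_nt


guide = "GACUGACUGACUGACUGACU"        # hypothetical 20-nt guide sequence
target = "AGTCAGTCAGTCAGTCAGTC"       # fully complementary DNA target strand
print(percent_complementarity(guide, target))
print(guide_length_ok(guide))
```

A guide that scores, for example, 95% in this comparison would satisfy the "90% or greater" embodiment but not the "100%" embodiment.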
Protein binding segments of CasY guide RNA
The protein-binding segment of the subject CasY guide RNA interacts with a CasY protein. The CasY guide RNA directs the bound CasY protein to a specific nucleotide sequence within the target nucleic acid through the above-mentioned guide sequence. The protein-binding segment of the CasY guide RNA comprises two stretches of nucleotides that are complementary to each other and hybridize to form a double-stranded RNA duplex (dsRNA duplex). Thus, the protein-binding segment comprises a dsRNA duplex.
In some cases, the dsRNA duplex region comprises a range of 5-25 base pairs (bp) (e.g., 5-22, 5-20, 5-18, 5-15, 5-12, 5-10, 5-8, 8-25, 8-22, 8-18, 8-15, 8-12, 12-25, 12-22, 12-18, 12-15, 13-25, 13-22, 13-18, 13-15, 14-25, 14-22, 14-18, 14-15, 15-25, 15-22, 15-18, 17-25, 17-22, or 17-18 bp, e.g., 5 bp, 6 bp, 7 bp, 8 bp, 9 bp, 10 bp, etc.). In some cases, the dsRNA duplex region comprises a range of 6-15 base pairs (bp) (e.g., 6-12, 6-10, or 6-8 bp, such as 6 bp, 7 bp, 8 bp, 9 bp, 10 bp, etc.). In some cases, the duplex region comprises 5 or more bp (e.g., 6 or more, 7 or more, or 8 or more bp). In some cases, the duplex region comprises 6 or more bp (e.g., 7 or more or 8 or more bp). In some cases, not all of the nucleotides of the duplex region are paired, and thus the duplex-forming region may comprise a bulge. The term "bulge" is used herein to mean a stretch of nucleotides (which may be one nucleotide) that does not contribute to the double-stranded duplex but is flanked on its 5' and 3' sides by nucleotides that do contribute, and thus the bulge is considered to be part of the duplex region. In some cases, the dsRNA duplex comprises 1 or more bulges (e.g., 2 or more, 3 or more, 4 or more bulges). In some cases, the dsRNA duplex comprises 2 or more bulges (e.g., 3 or more, 4 or more bulges). In some cases, the dsRNA duplex comprises 1-5 bulges (e.g., 1-4, 1-3, 2-5, 2-4, or 2-3 bulges).
Thus, in some cases, the nucleotide segments that hybridize to each other to form a dsRNA duplex have 70%-100% complementarity to each other (e.g., 75%-100%, 80%-100%, 85%-100%, 90%-100%, or 95%-100% complementarity). In some cases, the nucleotide segments that hybridize to each other to form a dsRNA duplex have 85%-100% complementarity to each other (e.g., 90%-100% or 95%-100% complementarity). In some cases, the nucleotide segments that hybridize to each other to form a dsRNA duplex have 70%-95% complementarity to each other (e.g., 75%-95%, 80%-95%, 85%-95%, or 90%-95% complementarity).
In other words, in some embodiments, a dsRNA duplex comprises two segments of nucleotides that are 70%-100% complementary to each other (e.g., 75%-100%, 80%-100%, 85%-100%, 90%-100%, or 95%-100% complementary). In some cases, a dsRNA duplex comprises two segments of nucleotides that are 85%-100% complementary to each other (e.g., 90%-100% or 95%-100% complementary). In some cases, a dsRNA duplex comprises two segments of nucleotides that are 70%-95% complementary to each other (e.g., 75%-95%, 80%-95%, 85%-95%, or 90%-95% complementary).
The duplex region of the subject CasY guide RNA may comprise one or more (1, 2, 3, 4, 5, etc.) mutations relative to the naturally occurring duplex region. For example, in some cases, a base pair can be maintained while the nucleotides contributing to that base pair from each segment are different. In some cases, the duplex region of the subject CasY guide RNA comprises more paired bases, fewer paired bases, a smaller bulge, a larger bulge, fewer bulges, more bulges, or any convenient combination thereof, as compared to the naturally occurring duplex region (of a naturally occurring CasY guide RNA).
Examples of various Cas9 guide RNAs can be found in the art, and in some cases, variations similar to those introduced into Cas9 guide RNAs can also be introduced into the CasY guide RNAs of the present disclosure (e.g., mutations to the dsRNA duplex region, extension of the 5' or 3' end for added stability or to provide for interaction with another protein, etc.). See, for example, Jinek et al., Science. 2012 Aug 17;337(6096):816-21; Chylinski et al., RNA Biol. 2013 May;10(5):726-37; Ma et al., Biomed Res Int. 2013;2013:270805; Hou et al., Proc Natl Acad Sci U S A. 2013 Sep 24;110(39):15644-9; Jinek et al., Elife. 2013;2:e00471; Pattanayak et al., Nat Biotechnol. 2013 Sep;31(9):839-43; Qi et al., Cell. 2013 Feb 28;152(5):1173-83; Wang et al., Cell. 2013 May 9;153(4):910-8; Auer et al., Genome Res. 2013 Oct 31; Chen et al., Nucleic Acids Res. 2013 Nov 1;41(20):e19; Cheng et al., Cell Res. 2013 Oct;23(10):1163-71; Cho et al., Genetics. 2013 Nov;195(3):1177-80; DiCarlo et al., Nucleic Acids Res. 2013 Apr;41(7):4336-43; Dickinson et al., Nat Methods. 2013 Oct;10(10):1028-34; Ebina et al., Sci Rep. 2013;3:2510; Fujii et al., Nucleic Acids Res. 2013 Nov 1;41(20):e187; Hu et al., Cell Res. 2013 Nov;23(11):1322-5; Jiang et al., Nucleic Acids Res. 2013 Nov 1;41(20):e188; Larson et al., Nat Protoc. 2013 Nov;8(11):2180-96; Mali et al., Nat Methods. 2013 Oct;10(10):957-63; Nakayama et al., Genesis. 2013 Dec;51(12):835-43; Ran et al., Nat Protoc. 2013 Nov;8(11):2281-308; Ran et al., Cell. 2013 Sep 12;154(6):1380-9; Upadhyay et al., G3 (Bethesda). 2013 Dec 9;3(12):2233-8; Walsh et al., Proc Natl Acad Sci U S A. 2013 Sep 24;110(39):15514-5; Xie et al., Mol Plant. 2013 Oct 9; Yang et al., Cell. 2013 Sep 12;154(6):1370-9; Briner et al., Mol Cell. 2014 Oct 23;56(2):333-9; and U.S. patents and patent applications: 8,906,616; 8,895,308; 8,889,418; 8,889,356; 8,871,445; 8,865,406; 8,795,965; 8,771,945; 8,697,359; 20140068797; 20140170753; 20140179006; 20140179770; 20140186843; 20140186919; 20140186958; 20140189896; 20140227787; 20140234972; 20140242664; 20140242699; 20140242700; 20140242702; 20140248702; 20140256046; 20140273037; 20140273226; 20140273230; 20140273231; 20140273232; 20140273233; 20140273234; 20140273235; 20140287938; 20140295556; 20140295557; 20140298547; 20140304853; 20140309487; 20140310828; 20140310830; 20140315985; 20140335063; 20140335620; 20140342456; 20140342457; 20140342458; 20140349400; 20140349405; 20140356867; 20140356956; 20140356958; 20140356959; 20140357523; 20140357530; 20140364333; and 20140377868; all of which are hereby incorporated by reference in their entirety.
The CasY guide RNA contains both a guide sequence and two stretches of nucleotides ("duplex-forming segments") that hybridize to form the dsRNA duplex of the protein-binding segment. The specific sequence of a given CasY guide RNA may be characteristic of the species in which the crRNA is found. Examples of suitable CasY guide RNAs are provided herein.
Exemplary guide RNA sequences
The repeats depicted in FIG. 6 (panels a and b), which are the non-guide sequence portions of exemplary CasY guide RNAs, are from the native loci of CasY1-Y5. In some cases, a subject CasY guide RNA comprises (e.g., in addition to a guide sequence) the crRNA sequence CTCCGAAAGTATCGGGGATAAAGGC (SEQ ID NO:31) [ RNA is CUCCGAAAGUAUCGGGGAUAAAGGC (SEQ ID NO:11) ] (see, e.g., FIG. 6). In some cases, a subject CasY guide RNA comprises a nucleotide sequence that has 80% or greater identity (e.g., 85% or greater, 90% or greater, 93% or greater, 95% or greater, 97% or greater, 98% or greater, or 100% identity) with the crRNA sequence CTCCGAAAGTATCGGGGATAAAGGC (SEQ ID NO:31) [ RNA is CUCCGAAAGUAUCGGGGAUAAAGGC (SEQ ID NO:11) ]. In some cases, a subject CasY guide RNA comprises a nucleotide sequence that has 90% or greater identity (e.g., 93% or greater, 95% or greater, 97% or greater, 98% or greater, or 100% identity) with the crRNA sequence CTCCGAAAGTATCGGGGATAAAGGC (SEQ ID NO:31) [ RNA is CUCCGAAAGUAUCGGGGAUAAAGGC (SEQ ID NO:11) ].
In some cases, a subject CasY guide RNA comprises (e.g., in addition to a guide sequence) the crRNA sequence CACCGAAATTTGGAGAGGATAAGGC (SEQ ID NO:32) [ RNA is CACCGAAAUUUGGAGAGGAUAAGGC (SEQ ID NO:12) ] (see, e.g., FIG. 6). In some cases, a subject CasY guide RNA comprises a nucleotide sequence that has 80% or greater identity (e.g., 85% or greater, 90% or greater, 93% or greater, 95% or greater, 97% or greater, 98% or greater, or 100% identity) with the crRNA sequence CACCGAAATTTGGAGAGGATAAGGC (SEQ ID NO:32) [ RNA is CACCGAAAUUUGGAGAGGAUAAGGC (SEQ ID NO:12) ]. In some cases, a subject CasY guide RNA comprises a nucleotide sequence that has 90% or greater identity (e.g., 93% or greater, 95% or greater, 97% or greater, 98% or greater, or 100% identity) with the crRNA sequence CACCGAAATTTGGAGAGGATAAGGC (SEQ ID NO:32) [ RNA is CACCGAAAUUUGGAGAGGAUAAGGC (SEQ ID NO:12) ].
In some cases, a subject CasY guide RNA comprises (e.g., in addition to a guide sequence) the crRNA sequence CTCCGAATTATCGGGAGGATAAGGC (SEQ ID NO:33) [ RNA is CUCCGAAUUAUCGGGAGGAUAAGGC (SEQ ID NO:13) ] (see, e.g., FIG. 6). In some cases, a subject CasY guide RNA comprises a nucleotide sequence that has 80% or greater identity (e.g., 85% or greater, 90% or greater, 93% or greater, 95% or greater, 97% or greater, 98% or greater, or 100% identity) with the crRNA sequence CTCCGAATTATCGGGAGGATAAGGC (SEQ ID NO:33) [ RNA is CUCCGAAUUAUCGGGAGGAUAAGGC (SEQ ID NO:13) ]. In some cases, a subject CasY guide RNA comprises a nucleotide sequence that has 90% or greater identity (e.g., 93% or greater, 95% or greater, 97% or greater, 98% or greater, or 100% identity) with the crRNA sequence CTCCGAATTATCGGGAGGATAAGGC (SEQ ID NO:33) [ RNA is CUCCGAAUUAUCGGGAGGAUAAGGC (SEQ ID NO:13) ].
In some cases, a subject CasY guide RNA comprises (e.g., in addition to a guide sequence) the crRNA sequence CCCCGAATATAGGGGACAAAAAGGC (SEQ ID NO:34) [ RNA is CCCCGAAUAUAGGGGACAAAAAGGC (SEQ ID NO:14) ] (see, e.g., FIG. 6). In some cases, a subject CasY guide RNA comprises a nucleotide sequence that has 80% or greater identity (e.g., 85% or greater, 90% or greater, 93% or greater, 95% or greater, 97% or greater, 98% or greater, or 100% identity) with the crRNA sequence CCCCGAATATAGGGGACAAAAAGGC (SEQ ID NO:34) [ RNA is CCCCGAAUAUAGGGGACAAAAAGGC (SEQ ID NO:14) ]. In some cases, a subject CasY guide RNA comprises a nucleotide sequence that has 90% or greater identity (e.g., 93% or greater, 95% or greater, 97% or greater, 98% or greater, or 100% identity) with the crRNA sequence CCCCGAATATAGGGGACAAAAAGGC (SEQ ID NO:34) [ RNA is CCCCGAAUAUAGGGGACAAAAAGGC (SEQ ID NO:14) ].
In some cases, a subject CasY guide RNA comprises (e.g., in addition to a guide sequence) crRNA sequence GTCTAGACATACAGGTGGAAAGGTGAGAGTAAAGAC (SEQ ID NO:35) [ RNA is GUCUAGACAUACAGGUGGAAAGGUGAGAGUAAAGAC (SEQ ID NO:15) ] (e.g., see FIG. 6). In some cases, a subject CasY guide RNA comprises a nucleotide sequence that is 80% or more identical (e.g., 85% or more, 90% or more, 93% or more, 95% or more, 97% or more, 98% or more, or 100% identity) to the crRNA sequence GTCTAGACATACAGGTGGAAAGGTGAGAGTAAAGAC (SEQ ID NO:35) [ RNA is GUCUAGACAUACAGGUGGAAAGGUGAGAGUAAAGAC (SEQ ID NO:15) ]. In some cases, a subject CasY guide RNA comprises a nucleotide sequence that is 90% or greater identical (e.g., 93% or greater, 95% or greater, 97% or greater, 98% or greater, or 100% identical) to the crRNA sequence GTCTAGACATACAGGTGGAAAGGTGAGAGTAAAGAC (SEQ ID NO:35) [ RNA is GUCUAGACAUACAGGUGGAAAGGUGAGAGUAAAGAC (SEQ ID NO:15) ].
In some cases, a subject CasY guide RNA comprises (e.g., in addition to a guide sequence) a crRNA sequence set forth in any one of SEQ ID NOS: 11-15. In some cases, a subject CasY guide RNA comprises a nucleotide sequence that has 80% or greater identity (e.g., 85% or greater, 90% or greater, 93% or greater, 95% or greater, 97% or greater, 98% or greater, or 100% identity) to a crRNA sequence set forth in any of SEQ ID NOS: 11-15. In some cases, a subject CasY guide RNA comprises a nucleotide sequence that is 90% or greater identical (e.g., 93% or greater, 95% or greater, 97% or greater, 98% or greater, or 100% identical) to a crRNA sequence set forth in any of SEQ ID NOS: 11-15.
In some cases, a subject CasY guide RNA comprises (e.g., in addition to a guide sequence) a crRNA sequence set forth in any one of SEQ ID NOS: 11-14. In some cases, a subject CasY guide RNA comprises a nucleotide sequence that has 80% or greater identity (e.g., 85% or greater, 90% or greater, 93% or greater, 95% or greater, 97% or greater, 98% or greater, or 100% identity) to a crRNA sequence set forth in any of SEQ ID NOS: 11-14. In some cases, a subject CasY guide RNA comprises a nucleotide sequence that is 90% or greater identical (e.g., 93% or greater, 95% or greater, 97% or greater, 98% or greater, or 100% identical) to a crRNA sequence set forth in any of SEQ ID NOS: 11-14.
The repeat sequence of the native locus from CasY18 (the non-guide sequence portion of the exemplary CasY guide RNA) is CTCCGTGAATACGTGGGGTAAAGGC (SEQ ID NO:36) [ the RNA is CUCCGUGAAUACGUGGGGUAAAGGC (SEQ ID NO:16) ]. In some cases, a subject CasY guide RNA comprises (e.g., in addition to a guide sequence) crRNA sequence CTCCGTGAATACGTGGGGTAAAGGC (SEQ ID NO:36) [ RNA is CUCCGUGAAUACGUGGGGUAAAGGC (SEQ ID NO:16) ]. In some cases, a subject CasY guide RNA comprises a nucleotide sequence that is 80% or more identical (e.g., 85% or more, 90% or more, 93% or more, 95% or more, 97% or more, 98% or more, or 100% identity) to the crRNA sequence CTCCGTGAATACGTGGGGTAAAGGC (SEQ ID NO:36) [ RNA is CUCCGUGAAUACGUGGGGUAAAGGC (SEQ ID NO:16) ]. In some cases, a subject CasY guide RNA comprises a nucleotide sequence that is 90% or greater identical (e.g., 93% or greater, 95% or greater, 97% or greater, 98% or greater, or 100% identical) to the crRNA sequence CTCCGTGAATACGTGGGGTAAAGGC (SEQ ID NO:36) [ RNA is CUCCGUGAAUACGUGGGGUAAAGGC (SEQ ID NO:16) ].
In some cases, a subject CasY guide RNA comprises (e.g., in addition to a guide sequence) a crRNA sequence set forth in any one of SEQ ID NOS: 11-16. In some cases, a subject CasY guide RNA comprises (e.g., in addition to a guide sequence) a nucleotide sequence that is 80% or greater identical (e.g., 85% or greater, 90% or greater, 93% or greater, 95% or greater, 97% or greater, 98% or greater, or 100% identical) to a crRNA sequence set forth in any of SEQ ID NOs: 11-16. In some cases, a subject CasY guide RNA comprises a nucleotide sequence that is 90% or greater identical (e.g., 93% or greater, 95% or greater, 97% or greater, 98% or greater, or 100% identical) to a crRNA sequence set forth in any of SEQ ID NOS: 11-16.
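The identity thresholds above can be illustrated with a short sketch that compares a candidate repeat against the RNA sequences of SEQ ID NOs: 11-16 as given in the text. It assumes a simple ungapped, equal-length comparison; in practice, percent identity is usually determined with a sequence alignment program, which this sketch does not attempt. The names `dna_to_rna`, `percent_identity`, and `matching_repeats` are hypothetical.

```python
# Repeat (crRNA) sequences copied from the text above, in RNA form.
CASY_REPEATS_RNA = {
    "SEQ ID NO:11": "CUCCGAAAGUAUCGGGGAUAAAGGC",
    "SEQ ID NO:12": "CACCGAAAUUUGGAGAGGAUAAGGC",
    "SEQ ID NO:13": "CUCCGAAUUAUCGGGAGGAUAAGGC",
    "SEQ ID NO:14": "CCCCGAAUAUAGGGGACAAAAAGGC",
    "SEQ ID NO:15": "GUCUAGACAUACAGGUGGAAAGGUGAGAGUAAAGAC",
    "SEQ ID NO:16": "CUCCGUGAAUACGUGGGGUAAAGGC",
}


def dna_to_rna(dna: str) -> str:
    """Convert a DNA-form sequence to its RNA form (T -> U)."""
    return dna.upper().replace("T", "U")


def percent_identity(a: str, b: str) -> float:
    """Ungapped percent identity; assumes both sequences have the same length."""
    if len(a) != len(b):
        raise ValueError("this sketch only compares equal-length sequences")
    same = sum(1 for x, y in zip(a.upper(), b.upper()) if x == y)
    return 100.0 * same / len(a)


def matching_repeats(candidate_rna: str, threshold: float = 80.0) -> list:
    """Return the SEQ ID NOs whose repeat meets the identity threshold."""
    hits = []
    for seq_id, repeat in CASY_REPEATS_RNA.items():
        if len(repeat) != len(candidate_rna):
            continue  # unequal lengths would need a real alignment
        if percent_identity(candidate_rna, repeat) >= threshold:
            hits.append(seq_id)
    return hits


# A hypothetical variant of SEQ ID NO:11 with one substitution at the last position.
variant = "CUCCGAAAGUAUCGGGGAUAAAGGA"
print(percent_identity(variant, CASY_REPEATS_RNA["SEQ ID NO:11"]))
print(matching_repeats(variant, threshold=90.0))
```

The DNA-form and RNA-form sequences paired in the text (e.g., SEQ ID NO:31 and SEQ ID NO:11) differ only by the T-to-U substitution, which `dna_to_rna` reproduces.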
CasY system
The present disclosure provides a CasY system. The CasY system of the present disclosure may comprise: a) a CasY polypeptide and a CasY guide RNA of the present disclosure; b) a CasY polypeptide, a CasY guide RNA, and a donor template nucleic acid of the present disclosure; c) a CasY fusion polypeptide and a CasY guide RNA of the present disclosure; d) a CasY fusion polypeptide, a CasY guide RNA, and a donor template nucleic acid of the present disclosure; e) an mRNA encoding a CasY polypeptide of the disclosure, and a CasY guide RNA; f) an mRNA encoding a CasY polypeptide of the disclosure, a CasY guide RNA, and a donor template nucleic acid; g) an mRNA encoding a CasY fusion polypeptide of the disclosure, and a CasY guide RNA; h) an mRNA encoding a CasY fusion polypeptide of the disclosure, a CasY guide RNA, and a donor template nucleic acid; i) a recombinant expression vector comprising a nucleotide sequence encoding a CasY polypeptide of the disclosure and a nucleotide sequence encoding a CasY guide RNA; j) a recombinant expression vector comprising a nucleotide sequence encoding a CasY polypeptide of the disclosure, a nucleotide sequence encoding a CasY guide RNA, and a nucleotide sequence encoding a donor template nucleic acid; k) a recombinant expression vector comprising a nucleotide sequence encoding a CasY fusion polypeptide of the disclosure and a nucleotide sequence encoding a CasY guide RNA; l) a recombinant expression vector comprising a nucleotide sequence encoding a CasY fusion polypeptide of the disclosure, a nucleotide sequence encoding a CasY guide RNA, and a nucleotide sequence encoding a donor template nucleic acid; m) a first recombinant expression vector comprising a nucleotide sequence encoding a CasY polypeptide of the disclosure and a second recombinant expression vector comprising a nucleotide sequence encoding a CasY guide RNA; n) a first recombinant expression vector comprising a nucleotide sequence encoding a CasY polypeptide of the disclosure and a second recombinant expression vector comprising a nucleotide sequence encoding a CasY guide RNA, and a donor template nucleic acid; o) a first recombinant expression vector comprising a nucleotide sequence encoding a CasY fusion polypeptide of the disclosure and a second recombinant expression vector comprising a nucleotide sequence encoding a CasY guide RNA; p) a first recombinant expression vector comprising a nucleotide sequence encoding a CasY fusion polypeptide of the disclosure and a second recombinant expression vector comprising a nucleotide sequence encoding a CasY guide RNA, and a donor template nucleic acid; q) a recombinant expression vector comprising a nucleotide sequence encoding a CasY polypeptide of the disclosure, a nucleotide sequence encoding a first CasY guide RNA, and a nucleotide sequence encoding a second CasY guide RNA; or r) a recombinant expression vector comprising a nucleotide sequence encoding a CasY fusion polypeptide of the disclosure, a nucleotide sequence encoding a first CasY guide RNA, and a nucleotide sequence encoding a second CasY guide RNA; or some variation of one of (a) to (r).
Nucleic acids
The present disclosure provides one or more nucleic acids comprising one or more of: a donor polynucleotide sequence, a nucleotide sequence encoding a CasY polypeptide (e.g., a wild-type CasY protein, a nickase CasY protein, a dCasY protein, a chimeric CasY protein, etc.), a CasY guide RNA, and a nucleotide sequence encoding a CasY guide RNA. The present disclosure provides a nucleic acid comprising a nucleotide sequence encoding a CasY fusion polypeptide. The present disclosure provides a recombinant expression vector comprising a nucleotide sequence encoding a CasY polypeptide. The present disclosure provides a recombinant expression vector comprising a nucleotide sequence encoding a CasY fusion polypeptide. The present disclosure provides a recombinant expression vector comprising: a) a nucleotide sequence encoding a CasY polypeptide; and b) a nucleotide sequence encoding one or more CasY guide RNAs. The present disclosure provides a recombinant expression vector comprising: a) a nucleotide sequence encoding a CasY fusion polypeptide; and b) a nucleotide sequence encoding one or more CasY guide RNAs. In some cases, the nucleotide sequence encoding the CasY protein and/or the nucleotide sequence encoding the CasY guide RNA is operably linked to a promoter operable in a selected cell type (e.g., prokaryotic cells, eukaryotic cells, plant cells, animal cells, mammalian cells, primate cells, rodent cells, human cells, etc.).
In some cases, the nucleotide sequence encoding a CasY polypeptide of the disclosure is codon optimized. This type of optimization may require mutation of the nucleotide sequence encoding the CasY to mimic the codon bias of the intended host organism or cell while encoding the same protein. Thus, codons can be changed, but the encoded protein remains unchanged. For example, if the intended target cell is a human cell, a human codon-optimized nucleotide sequence encoding CasY may be used. As another non-limiting example, if the intended host cell is a mouse cell, a mouse codon optimized nucleotide sequence encoding CasY may be generated. As another non-limiting example, if the intended host cell is a plant cell, a plant codon-optimized nucleotide sequence encoding CasY may be produced. As another non-limiting example, if the intended host cell is an insect cell, an insect codon-optimized nucleotide sequence encoding CasY may be generated.
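The idea that codons can be changed while the encoded protein remains unchanged can be sketched as follows. The genetic code table is the standard one; the `PREFERRED_CODON` table, however, is an illustrative placeholder chosen for this example and is not a measured codon-usage table for any host, and the function names are hypothetical. The sketch simply swaps each codon for one designated synonymous codon, which is far simpler than what dedicated codon-optimization tools do.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Illustrative placeholder: one "preferred" synonymous codon per amino acid.
# Assumption: a real workflow would derive this from host codon-usage data.
PREFERRED_CODON = {
    "A": "GCC", "R": "AGA", "N": "AAC", "D": "GAC", "C": "TGC", "Q": "CAG",
    "E": "GAG", "G": "GGC", "H": "CAC", "I": "ATC", "L": "CTG", "K": "AAG",
    "M": "ATG", "F": "TTC", "P": "CCC", "S": "AGC", "T": "ACC", "W": "TGG",
    "Y": "TAC", "V": "GTG", "*": "TGA",
}


def translate(cds: str) -> str:
    """Translate a coding DNA sequence (length divisible by 3) to protein."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def codon_optimize(cds: str) -> str:
    """Replace every codon with the preferred synonymous codon."""
    return "".join(PREFERRED_CODON[aa] for aa in translate(cds))


original = "ATGGCTTTATCGAAATAA"   # hypothetical short coding sequence
optimized = codon_optimize(original)
print(optimized)
print(translate(original) == translate(optimized))
```

The final comparison checks the property described above: the nucleotide sequence changes, but the translated protein does not.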
The present disclosure provides one or more recombinant expression vectors comprising (in some cases in different recombinant expression vectors, and in some cases in the same recombinant expression vector): (i) a nucleotide sequence of a donor template nucleic acid (wherein the donor template comprises a nucleotide sequence having homology to a target sequence of a target nucleic acid (e.g., a target genome)); (ii) a nucleotide sequence encoding a CasY guide RNA that hybridizes to a target sequence of a target locus of the targeted genome (e.g., operably linked to a promoter operable in a target cell, such as a eukaryotic cell); and (iii) a nucleotide sequence encoding a CasY protein (e.g., operably linked to a promoter operable in a target cell, such as a eukaryotic cell). The present disclosure provides one or more recombinant expression vectors comprising (in some cases in different recombinant expression vectors, and in some cases in the same recombinant expression vector): (i) a nucleotide sequence of a donor template nucleic acid (wherein the donor template comprises a nucleotide sequence having homology to a target sequence of a target nucleic acid (e.g., a target genome)); and (ii) a nucleotide sequence encoding a CasY guide RNA that hybridizes to a target sequence of a target locus of the targeted genome (e.g., operably linked to a promoter operable in a target cell, such as a eukaryotic cell).
The present disclosure provides one or more recombinant expression vectors comprising (in some cases in different recombinant expression vectors, and in some cases in the same recombinant expression vector): (i) a nucleotide sequence encoding a CasY guide RNA that hybridizes to a target sequence of a target locus of the targeted genome (e.g., operably linked to a promoter operable in a target cell, such as a eukaryotic cell); and (ii) a nucleotide sequence encoding a CasY protein (e.g., operably linked to a promoter operable in a target cell, such as a eukaryotic cell).
Suitable expression vectors include viral expression vectors (e.g., viral vectors based on vaccinia virus; poliovirus; adenovirus (see, e.g., Li et al., Invest Opthalmol Vis Sci 35:2543-2549, 1994; Borras et al., Gene Ther 6:515-524, 1999; Li and Davidson, PNAS 92:7700-7704, 1995; Sakamoto et al., Hum Gene Ther 5:1088-1097, 1999; WO 94/12649; WO 93/03769; WO 93/19191; WO 94/28938; WO 95/11984 and WO 95/00655); adeno-associated virus (AAV) (see, e.g., Ali et al., Hum Gene Ther 9:81-86, 1998; Flannery et al., PNAS 94:6916-6921, 1997; Bennett et al., Invest Opthalmol Vis Sci 38:2857-2863, 1997; Jomary et al., Gene Ther 4:683-690, 1997; Rolling et al., Hum Gene Ther 10:641-648, 1999; Ali et al., Hum Mol Genet 5:591-594, 1996; Srivastava in WO 93/09239; Samulski et al., J. Vir. (1989) 63:3822-3828; Mendelson et al., Virol. (1988) 166:154-165; and Flotte et al., PNAS (1993) 90:10613-10617); SV40; herpes simplex virus; human immunodeficiency virus (see, e.g., Miyoshi et al., PNAS 94:10319-23, 1997; Takahashi et al., J Virol 73:7812-7816, 1999); retroviral vectors (e.g., murine leukemia virus, spleen necrosis virus, and vectors derived from retroviruses such as Rous sarcoma virus, Harvey sarcoma virus, avian leukosis virus, lentivirus, human immunodeficiency virus, myeloproliferative sarcoma virus, and mammary tumor virus)); and the like. In some cases, a recombinant expression vector of the present disclosure is a recombinant adeno-associated virus (AAV) vector. In some cases, a recombinant expression vector of the present disclosure is a recombinant lentiviral vector. In some cases, a recombinant expression vector of the present disclosure is a recombinant retroviral vector.
Depending on the host/vector system used, any of a number of suitable transcriptional and translational control elements may be used in the expression vector, including constitutive and inducible promoters, transcriptional enhancer elements, transcriptional terminators, and the like.
In some embodiments, the nucleotide sequence encoding the CasY guide RNA is operably linked to a control element, e.g., a transcriptional control element, such as a promoter. In some embodiments, the nucleotide sequence encoding a CasY protein or a CasY fusion polypeptide is operably linked to a control element, e.g., a transcriptional control element, such as a promoter.
The transcriptional control element may be a promoter. In some cases, the promoter is a constitutively active promoter. In some cases, the promoter is a regulatable promoter. In some cases, the promoter is an inducible promoter. In some cases, the promoter is a tissue-specific promoter. In some cases, the promoter is a cell-type specific promoter. In some cases, a transcriptional control element (e.g., a promoter) is functional in a targeted cell type or targeted cell population. For example, in some cases, the transcriptional control element may be functional in a eukaryotic cell, such as a hematopoietic stem cell (e.g., mobilized peripheral blood (mPB) CD34(+) cells, Bone Marrow (BM) CD34(+) cells, etc.).
Non-limiting examples of eukaryotic promoters (promoters that are functional in eukaryotic cells) include EF1α and promoters from cytomegalovirus (CMV) immediate early, herpes simplex virus (HSV) thymidine kinase, early and late SV40, retroviral long terminal repeats (LTRs), and mouse metallothionein-I. Selection of appropriate vectors and promoters is well within the level of ordinary skill in the art. The expression vector may also contain a ribosome binding site for translation initiation and a transcription terminator. The expression vector may also comprise appropriate sequences for amplifying expression. The expression vector can further comprise a nucleotide sequence encoding a protein tag (e.g., a 6xHis tag, a hemagglutinin tag, a fluorescent protein, etc.) that can be fused to the CasY protein, thereby producing a chimeric CasY polypeptide.
In some embodiments, the nucleotide sequence encoding a CasY guide RNA and/or a CasY fusion polypeptide is operably linked to an inducible promoter. In some embodiments, the nucleotide sequence encoding the CasY guide RNA and/or the CasY fusion protein is operably linked to a constitutive promoter.
The promoter can be a constitutively active promoter (i.e., a promoter that is constitutively in an active/"ON" state), an inducible promoter (i.e., a promoter whose state, active/"ON" or inactive/"OFF", is controlled by an external stimulus, such as the presence of a particular temperature, compound, or protein), a spatially restricted promoter (i.e., a transcriptional control element, enhancer, etc.) (e.g., a tissue-specific promoter, a cell-type specific promoter, etc.), or a temporally restricted promoter (i.e., a promoter that is in the "ON" state or "OFF" state during a particular stage of embryonic development or during a particular stage of a biological process, e.g., the hair follicle cycle in a mouse).
Suitable promoters may be derived from viruses and may therefore be referred to as viral promoters, or they may be derived from any organism, including prokaryotes or eukaryotes. Suitable promoters can be used to drive expression by any RNA polymerase (e.g., pol I, pol II, pol III). Exemplary promoters include, but are not limited to, the SV40 early promoter; the mouse mammary tumor virus long terminal repeat (LTR) promoter; the adenovirus major late promoter (Ad MLP); herpes simplex virus (HSV) promoters; cytomegalovirus (CMV) promoters such as the CMV immediate early promoter region (CMVIE); the Rous sarcoma virus (RSV) promoter; the human U6 small nuclear promoter (U6) (Miyagishi et al., Nature Biotechnology 20, 497-500 (2002)); the enhanced U6 promoter (e.g., Xia et al., Nucleic Acids Res. 2003 Sep 1;31(17)); the human H1 promoter (H1); and the like.
In some cases, the nucleotide sequence encoding the CasY guide RNA is operably linked to (under the control of) a promoter operable in eukaryotic cells (e.g., the U6 promoter, the enhanced U6 promoter, the H1 promoter, etc.). As understood by one of ordinary skill in the art, when an RNA (e.g., a guide RNA) is expressed from a nucleic acid (e.g., an expression vector) using the U6 promoter (e.g., in a eukaryotic cell) or another PolIII promoter, the RNA may need to be mutated if several Ts (encoding Us in the RNA) are present in succession. This is because a string of Ts (e.g., 5 Ts) in DNA can act as a terminator for polymerase III (PolIII). Thus, in order to ensure transcription of a guide RNA in eukaryotic cells, it may sometimes be necessary to modify the sequence encoding the guide RNA to eliminate such runs of Ts. In some cases, the nucleotide sequence encoding a CasY protein (e.g., a wild-type CasY protein, a nickase CasY protein, a dCasY protein, a chimeric CasY protein, etc.) is operably linked to a promoter operable in eukaryotic cells (e.g., a CMV promoter, an EF1α promoter, an estrogen receptor regulated promoter, etc.).
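The check for consecutive Ts described above can be sketched in a few lines. The function name `find_t_runs` is hypothetical, and the default run length of 5 simply follows the "e.g., 5 Ts" example in the text; it is an assumption, not a stated threshold for PolIII termination.

```python
import re


def find_t_runs(guide_dna: str, min_run: int = 5) -> list:
    """Return (start, end) index pairs for runs of at least `min_run` Ts.

    Indices are 0-based and `end` is exclusive, as in Python slicing.
    """
    pattern = re.compile("T{%d,}" % min_run)
    return [(m.start(), m.end()) for m in pattern.finditer(guide_dna.upper())]


# Hypothetical guide-encoding DNA sequences.
print(find_t_runs("GACTTTTTGCA"))   # contains a run of five Ts
print(find_t_runs("GACTTTTGCA"))    # only four Ts in a row
```

A sequence flagged by such a check is a candidate for modification before it is placed under the control of a U6 or other PolIII promoter.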
Examples of inducible promoters include, but are not limited to, the T7 RNA polymerase promoter, the T3 RNA polymerase promoter, the isopropyl-β-D-thiogalactopyranoside (IPTG)-regulated promoter, the lactose-inducible promoter, the heat shock promoter, the tetracycline-regulated promoter, the steroid-regulated promoter, the metal-regulated promoter, the estrogen receptor-regulated promoter, and the like. Thus, inducible promoters may be regulated by molecules including, but not limited to, doxycycline; estrogens and/or estrogen analogs; IPTG; and the like.
Inducible promoters suitable for use include any of the inducible promoters described herein or known to those of ordinary skill in the art. Examples of inducible promoters include, but are not limited to, chemically/biochemically regulated promoters and physically regulated promoters, such as alcohol-regulated promoters, tetracycline-regulated promoters (e.g., anhydrotetracycline (aTc)-responsive promoters and other tetracycline-responsive promoter systems, which include a tetracycline repressor protein (tetR), a tetracycline operator sequence (tetO), and a tetracycline transactivator fusion protein (tTA)), steroid-regulated promoters (e.g., promoters based on the rat glucocorticoid receptor, the human estrogen receptor, moth ecdysone receptors, and promoters from the steroid/retinoid/thyroid receptor superfamily), metal-regulated promoters (e.g., promoters derived from metallothionein (proteins that bind and sequester metal ions) genes from yeast, mouse, and human), pathogenesis-regulated promoters (e.g., promoters induced by salicylic acid, ethylene, or benzothiadiazole (BTH)), temperature/heat-inducible promoters (e.g., heat shock promoters), and light-regulated promoters (e.g., light-responsive promoters from plant cells).
In some cases, the promoter is a spatially restricted promoter (i.e., a cell-type specific promoter, a tissue-specific promoter, etc.) such that in a multicellular organism, the promoter is active (i.e., "ON") in a particular subset of cells. Spatially restricted promoters may also be referred to as enhancers, transcriptional control elements, control sequences, and the like. Any convenient spatially restricted promoter can be used, so long as the promoter is functional in the targeted host cell (e.g., eukaryotic; prokaryotic).
In some cases, the promoter is a reversible promoter. Suitable reversible promoters, including reversibly inducible promoters, are known in the art. Such reversible promoters can be isolated and derived from many organisms, such as eukaryotes and prokaryotes. Modifications of reversible promoters derived from a first organism for use in a second organism (e.g., first and second prokaryotes, etc.) are well known in the art. Such reversible promoters, and systems based on such reversible promoters but also containing additional control proteins, include, but are not limited to, alcohol regulated promoters (e.g., alcohol dehydrogenase I (alcA) gene promoters, promoters responsive to alcohol transactivator protein (AlcR), etc.), tetracycline regulated promoters (e.g., promoter systems including Tet activator, TetON, TetOFF, etc.), steroid regulated promoters (e.g., rat glucocorticoid receptor promoter system, human estrogen receptor promoter system, retinoid promoter system, thyroid promoter system, ecdysone promoter system, mifepristone promoter system, etc.), metal regulated promoters (e.g., metallothionein promoter system, etc.), pathogenesis-related regulated promoters (e.g., salicylic acid regulated promoters, ethylene regulated promoters, benzothiadiazole regulated promoters, etc.), temperature regulated promoters (e.g., heat shock inducible promoters (e.g., HSP-70, HSP-90, soybean heat shock promoters, etc.)), light regulated promoters, synthetic inducible promoters, etc.
Methods of introducing nucleic acids (e.g., a nucleic acid comprising a donor polynucleotide sequence, one or more nucleic acids encoding a CasY protein and/or a CasY guide RNA, etc.) into host cells are known in the art, and any convenient method can be used to introduce nucleic acids (e.g., expression constructs) into cells. Suitable methods include, for example, viral infection, transfection, lipofection, electroporation, calcium phosphate precipitation, polyethyleneimine (PEI) mediated transfection, DEAE-dextran mediated transfection, liposome mediated transfection, particle gun technology, direct microinjection, nanoparticle mediated nucleic acid delivery, and the like.
Introduction of the recombinant expression vector into the cell can occur in any medium and under any culture conditions that promote cell survival. Introduction of the recombinant expression vector into the target cell can be performed in vivo or ex vivo. Introduction of the recombinant expression vector into the target cell can be performed in vitro.
In some embodiments, the CasY protein may be provided as RNA. RNA can be provided by direct chemical synthesis, or can be transcribed in vitro from DNA (e.g., DNA encoding a CasY protein). Once synthesized, RNA can be introduced into the cell by any well-known technique for introducing nucleic acids into cells (e.g., microinjection, electroporation, transfection, etc.).
Well-developed transfection techniques can be used to provide nucleic acid to the cells (see, e.g., Angel and Yanik (2010) PLoS ONE 5(7): e11756), as can commercially available reagents from Qiagen, the commercially available Stemfect™ RNA transfection kit from Stemgent, and a commercially available transfection kit from Mirus Bio LLC. See also Beumer et al. (2008) PNAS 105(50): 19821-.
The vector may be provided directly to the target host cell. In other words, the cell is contacted with a vector comprising the subject nucleic acid (e.g., a recombinant expression vector having a donor template sequence and encoding a CasY guide RNA; a recombinant expression vector encoding a CasY protein, etc.) such that the vector is taken up by the cell. Methods for contacting cells with nucleic acid vectors as plasmids, including electroporation, calcium chloride transfection, microinjection, and lipofection, are well known in the art. For viral vector delivery, cells can be contacted with viral particles comprising the subject viral expression vectors.
Retroviruses, such as lentiviruses, are suitable for use in the methods of the present disclosure. Commonly used retroviral vectors are "defective", i.e., incapable of producing the viral proteins required for productive infection. Instead, vector replication requires growth in a packaging cell line. To generate viral particles comprising a nucleic acid of interest, retroviral nucleic acids comprising the nucleic acid are packaged into viral capsids by a packaging cell line. Different packaging cell lines provide different envelope proteins (ecotropic, amphotropic, or xenotropic) to be incorporated into the capsid, and this envelope protein determines the specificity of the viral particle for cells (ecotropic for mouse and rat; amphotropic for most mammalian cell types including human, dog, and mouse; and xenotropic for most mammalian cell types other than murine cells). An appropriate packaging cell line can be used to ensure that the cells are targeted by the packaged viral particles. Methods for introducing the subject expression vectors into packaging cell lines and harvesting viral particles produced by the packaging cell lines are well known in the art. Nucleic acids can also be introduced by direct microinjection (e.g., injection of RNA).
The vector used to provide the nucleic acid encoding the CasY guide RNA and/or the CasY polypeptide to the target host cell may include a suitable promoter for driving expression (i.e., transcriptional activation) of the nucleic acid of interest. In other words, in some cases, the nucleic acid of interest will be operably linked to a promoter. The promoter may include ubiquitously active promoters, such as the CMV-β-actin promoter; or inducible promoters, such as promoters active in a particular cell population or responsive to the presence of a drug, such as tetracycline. By transcriptional activation, it is expected that transcription will increase 10-fold, 100-fold, more typically 1000-fold above basal levels in target cells. In addition, the vector for providing a nucleic acid encoding a CasY guide RNA and/or a CasY protein to a cell may comprise a nucleic acid sequence encoding a selectable marker in the target cell for identifying cells that have taken up the CasY guide RNA and/or the CasY protein.
A nucleic acid comprising a nucleotide sequence encoding a CasY polypeptide or a CasY fusion polypeptide is in some cases an RNA. Thus, the CasY fusion protein can be introduced into the cell as RNA. Methods of introducing RNA into cells are known in the art and may include, for example, direct injection, transfection, or any other method used for introducing DNA. Alternatively, the CasY protein may be provided to the cell as a polypeptide. Such a polypeptide may optionally be fused to a polypeptide domain that increases the solubility of the product. The domain may be linked to the polypeptide through a defined protease cleavage site (e.g., a TEV sequence, which is cleaved by TEV protease). The linker may also comprise one or more flexible sequences, for example 1 to 10 glycine residues. In some embodiments, cleavage of the fusion protein is performed in a buffer that maintains product solubility, e.g., in the presence of 0.5 to 2 M urea, in the presence of polypeptides and/or polynucleotides that increase solubility, etc. Domains of interest include endosomolytic domains, such as the influenza HA domain; and other polypeptides that aid in production, such as an IF2 domain, a GST domain, a GRPE domain, and the like. The polypeptides may be formulated for improved stability. For example, the peptide may be PEGylated, wherein the polyethyleneoxy group provides enhanced lifetime in the bloodstream.
Additionally or alternatively, the CasY polypeptides of the present disclosure may be fused to a polypeptide penetrating domain to facilitate uptake by cells. A number of penetrating domains are known in the art and can be used in the non-integrating polypeptides of the present disclosure, including peptides, peptidomimetics, and non-peptide carriers. For example, the penetrating peptide may be derived from the third alpha helix of the Drosophila melanogaster transcription factor Antennapedia (referred to as penetratin), which comprises the amino acid sequence RQIKIWFQNRRMKWKK (SEQ ID NO: 133). As another example, the penetrating peptide comprises an HIV-1 tat basic region amino acid sequence, which may include, for example, amino acids 49-57 of the naturally occurring tat protein. Other penetrating domains include polyarginine motifs such as the region of amino acids 34-56 of the HIV-1 rev protein, nona-arginine, octa-arginine, and the like (see, e.g., Futaki et al. (2003) Curr Protein Pept Sci. 2003 Apr; 4(2): 87-9 and 446; and Wender et al. (2000) Proc. Natl. Acad. Sci. U.S.A. 2000 Nov 21; 97(24): 13003-8; published U.S. patent applications 20030220334; 20030083256; 20030032593; and 20030022831, the teachings of which are specifically incorporated herein by reference). The nona-arginine (R9) sequence is one of the more potent protein transduction domains (PTDs) that have been characterized (Wender et al. 2000; Uemura et al. 2002). The site of fusion may be selected to optimize the biological activity, secretion, or binding characteristics of the polypeptide. The optimal site will be determined by routine experimentation.
The CasY polypeptide of the present disclosure can be produced in vitro or by eukaryotic cells or by prokaryotic cells, and it can be further processed by unfolding (e.g., heat denaturation, dithiothreitol reduction, etc.) and can be further refolded using methods known in the art.
Modifications of interest that do not alter the primary sequence include chemical derivatization of the polypeptide, such as acylation, acetylation, carboxylation, amidation, and the like. Also included are modifications of glycosylation, such as those made by modifying the glycosylation pattern of the polypeptide during its synthesis and processing, or in further processing steps; for example, those made by exposing the polypeptide to enzymes that affect glycosylation, such as mammalian glycosylating or deglycosylating enzymes. Sequences having phosphorylated amino acid residues such as phosphotyrosine, phosphoserine, or phosphothreonine are also contemplated.
Also suitable for inclusion in embodiments of the present disclosure are nucleic acids (e.g., nucleic acids encoding a CasY guide RNA, encoding a CasY fusion protein, etc.) and proteins (e.g., a CasY fusion protein derived from a wild-type protein or a variant protein) that have been modified using common molecular biological techniques and synthetic chemistry in order to improve their resistance to proteolytic degradation, alter target sequence specificity, optimize solubility characteristics, alter protein activity (e.g., transcriptional regulatory activity, enzymatic activity, etc.) or make them more suitable. Analogs of such polypeptides include those polypeptides that contain residues other than naturally occurring L-amino acids (e.g., D-amino acids or non-naturally occurring synthetic amino acids). D-amino acids may be substituted for some or all of the amino acid residues.
The CasY polypeptides of the disclosure can be prepared by in vitro synthesis using conventional methods as are known in the art. Various commercial synthesis devices may be used, such as automated synthesizers from Applied Biosystems, Inc. By using a synthesizer, naturally occurring amino acids can be substituted with unnatural amino acids. The specific order and manner of preparation will be determined by convenience, economics, desired purity, and the like.
If desired, various groups can be introduced into the peptide during synthesis or during expression, which allow attachment to other molecules or to a surface. Thus, cysteines can be used to make thioethers, histidines for attachment to metal ion complexes, carboxyl groups for amide or ester formation, amino groups for amide formation, and the like.
The CasY polypeptides of the disclosure can also be isolated and purified according to conventional methods of recombinant synthesis. Lysates can be prepared from expression hosts and purified using High Performance Liquid Chromatography (HPLC), exclusion chromatography, gel electrophoresis, affinity chromatography, or other purification techniques. In most cases, the composition used will comprise 20% or more by weight of the desired product, more typically 75% or more by weight, preferably 95% or more by weight, and, for therapeutic purposes, typically 99.5% or more by weight, relative to contaminants associated with the process of product preparation and purification thereof. Typically, the percentages will be based on total protein. Thus, in some cases, a CasY polypeptide or a CasY fusion polypeptide of the disclosure has at least 80% purity, at least 85% purity, at least 90% purity, at least 95% purity, at least 98% purity, or at least 99% purity (e.g., free of contaminants, non-CasY proteins or other macromolecules, etc.).
To induce cleavage or any desired modification of a target nucleic acid (e.g., genomic DNA), or any desired modification of a polypeptide associated with the target nucleic acid, the CasY guide RNA and/or the CasY polypeptide and/or donor template sequence of the present disclosure, whether introduced as nucleic acids or polypeptides, is provided to the cell for a period of about 30 minutes to about 24 hours, e.g., 1 hour, 1.5 hours, 2 hours, 2.5 hours, 3 hours, 3.5 hours, 4 hours, 5 hours, 6 hours, 7 hours, 8 hours, 12 hours, 16 hours, 18 hours, 20 hours, or any other period of about 30 minutes to about 24 hours, which can be repeated at a frequency of about every day to about every 4 days, e.g., every 1.5 days, every 2 days, every 3 days, or any other frequency from about every day to about every 4 days. The subject cells may be provided with the one or more agents one or more times, such as once, twice, three times, or more than three times, and the cells are allowed to incubate with the one or more agents for an amount of time, such as 16-24 hours, after each contact event, after which time the medium is replaced with fresh medium and the cells are further cultured.
In cases where two or more different targeting complexes (e.g., two different CasY guide RNAs complementary to different sequences within the same or different target nucleic acids) are provided to the cell, the complexes can be provided (e.g., as two polypeptides and/or nucleic acids) or delivered simultaneously. Alternatively, they may be provided sequentially, e.g., first providing the first targeting complex, then providing the second targeting complex, etc., or vice versa.
To improve delivery of DNA vectors to target cells, DNA can be protected from damage, for example, by using lipid complexes (lipoplex) and polymeric complexes (polyplex), and entry of DNA into cells is facilitated. Thus, in some cases, a nucleic acid of the disclosure (e.g., a recombinant expression vector of the disclosure) can be covered with lipids in an organized structure like a micelle or liposome. When an organized structure is complexed with DNA, it is called a lipid complex. There are three types of lipids, anionic (negatively charged), neutral, or cationic (positively charged). Lipid complexes using cationic lipids have been shown to be useful for gene transfer. Cationic lipids, due to their positive charge, complex naturally with negatively charged DNA. Also due to their charge, they interact with the cell membrane. Endocytosis of the lipid complex then occurs and the DNA is released into the cytoplasm. Cationic lipids also prevent degradation of DNA by cells.
A complex of a polymer with DNA is called a polymer complex (polyplex). Most polymer complexes consist of cationic polymers, and their formation is governed by ionic interactions. One large difference between the mechanisms of action of polymer complexes and lipid complexes is that some polymer complexes cannot release their DNA load into the cytoplasm, so co-transfection with an endosomolytic agent (to lyse the endosome formed during endocytosis), such as inactivated adenovirus, must take place. However, this is not always the case; polymers such as polyethyleneimine have their own methods of endosome disruption, as do chitosan and trimethyl chitosan.
Dendrimers, which are highly branched macromolecules with a spherical shape, can also be used to genetically modify stem cells. The surface of a dendrimer particle may be functionalized to alter its properties. In particular, it is possible to construct cationic dendrimers (i.e., dendrimers with a positive surface charge). When genetic material (such as a DNA plasmid) is present, charge complementarity results in the temporary association of the nucleic acid with the cationic dendrimer. Upon reaching its destination, the dendrimer-nucleic acid complex may be taken up into the cell by endocytosis.
In some cases, a nucleic acid (e.g., an expression vector) of the present disclosure comprises an insertion site for a guide sequence of interest. For example, a nucleic acid may comprise an insertion site for a guide sequence of interest, wherein the insertion site is immediately adjacent to a nucleotide sequence encoding the portion of the CasY guide RNA that does not change when the guide sequence is changed to hybridize to a desired target sequence (e.g., sequences that contribute to the CasY binding aspect of the guide RNA, e.g., sequences that contribute to one or more dsRNA duplexes of the CasY guide RNA; this portion of the guide RNA may also be referred to as a "scaffold" or "constant region" of the guide RNA). Thus, in some cases, a subject nucleic acid (e.g., an expression vector) comprises a nucleotide sequence encoding a CasY guide RNA, except that the portion encoding the guide sequence of the guide RNA is an insertion sequence (an insertion site). An insertion site is any nucleotide sequence used for insertion of a desired sequence. "Insertion sites" for use with various technologies are known to those of ordinary skill in the art, and any convenient insertion site may be used. The insertion site may be for any method of manipulating nucleic acid sequences. For example, in some cases, the insertion site is a Multiple Cloning Site (MCS) (e.g., a site comprising one or more restriction enzyme recognition sequences), a site for ligation-independent cloning, a site for recombination-based cloning (e.g., att site-based recombination), a nucleotide sequence recognized by a CRISPR/Cas (e.g., Cas9) based technology, and the like.
The insertion site may be of any desired length, and may depend on the type of insertion site (e.g., may depend on whether the site comprises one or more restriction enzyme recognition sequences (and how many restriction enzyme recognition sequences are comprised), whether the site comprises a target site for the CRISPR/Cas protein, etc.). In some cases, the insertion site of the subject nucleic acid is 3 or more nucleotides (nt) in length (e.g., 5 or more, 8 or more, 10 or more, 15 or more, 17 or more, 18 or more, 19 or more, 20 or more, or 25 or more, or 30 or more nt in length). In some cases, the insertion site of the subject nucleic acid has a length in the range of 2 to 50 nucleotides (nt) (e.g., 2 to 40 nt, 2 to 30 nt, 2 to 25 nt, 2 to 20 nt, 5 to 50 nt, 5 to 40 nt, 5 to 30 nt, 5 to 25 nt, 5 to 20 nt, 10 to 50 nt, 10 to 40 nt, 10 to 30 nt, 10 to 25 nt, 10 to 20 nt, 17 to 50 nt, 17 to 40 nt, 17 to 30 nt, 17 to 25 nt). In some cases, the insertion site of the subject nucleic acid has a length in the range of 5 to 40 nt.
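The relationship between an insertion site and a guide sequence can be illustrated with a short script. This is a sketch under stated assumptions: the function name `insert_guide`, the flanking sequences, and the 20-nt guide are made up for illustration and are not CasY sequences; the 12-nt placeholder site is formed from the BamHI (GGATCC) and EcoRI (GAATTC) recognition sequences. The string replacement models only the end result of cloning, not the enzymatic steps, and the 2 to 50 nt bound reflects one of the length ranges recited above.

```python
def insert_guide(construct, insertion_site, guide):
    """Return construct with its single insertion site replaced by guide.

    All three arguments are DNA strings. The length check mirrors the
    2 to 50 nt range recited in the text (one of several recited ranges).
    """
    if not 2 <= len(insertion_site) <= 50:
        raise ValueError("insertion site length outside 2 to 50 nt")
    if construct.count(insertion_site) != 1:
        raise ValueError("insertion site must occur exactly once")
    return construct.replace(insertion_site, guide)


# Hypothetical sequences, for illustration only.
site = "GGATCCGAATTC"              # BamHI + EcoRI recognition sequences
construct = "ACGTACGT" + site + "TTGCAAGC"
guide = "GATTACAGATTACAGATTAC"     # made-up 20-nt guide-encoding sequence

print(insert_guide(construct, site, guide))
```

Requiring exactly one occurrence of the site is a design choice made here so that the result is unambiguous; an actual vector design would also account for where the constant region lies relative to the site.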
Nucleic acid modification
In some embodiments, a subject nucleic acid (e.g., a CasY guide RNA) has one or more modifications (e.g., base modifications, backbone modifications, etc.) to provide a new or enhanced feature (e.g., improved stability) to the nucleic acid. Nucleosides are base-sugar combinations. The base portion of the nucleoside is typically a heterocyclic base. The two most common classes of such heterocyclic bases are purines and pyrimidines. A nucleotide is a nucleoside that also comprises a phosphate group covalently linked to the sugar moiety of the nucleoside. For those nucleosides that include a pentofuranosyl sugar, the phosphate group can be attached to the 2', 3', or 5' hydroxyl moiety of the sugar. In forming oligonucleotides, phosphate groups covalently link nucleosides that are adjacent to each other to form linear polymeric compounds. In turn, each end of this linear polymeric compound may be further linked to form a cyclic compound, however, a linear compound is suitable. In addition, linear compounds may have internal nucleotide base complementarity and may therefore fold in a manner that produces a fully or partially double stranded compound. Within an oligonucleotide, the phosphate group is often referred to as forming the internucleoside backbone of the oligonucleotide. The normal bond or backbone of RNA and DNA is a 3 'to 5' phosphodiester bond.
Suitable nucleic acid modifications include, but are not limited to: 2'-O-methyl modified nucleotides, 2'-fluoro modified nucleotides, Locked Nucleic Acid (LNA) modified nucleotides, Peptide Nucleic Acid (PNA) modified nucleotides, nucleotides having phosphorothioate linkages, and 5' caps (e.g., a 7-methylguanylate cap (m7G)). Additional details and additional modifications are described below.
2'-O-methyl modified nucleotides (also referred to as 2'-O-methyl RNA) are a naturally occurring modification of RNA found in tRNA and other small RNAs that arises as a post-transcriptional modification. Oligonucleotides containing 2'-O-methyl RNA can be synthesized directly. This modification increases the Tm of RNA:RNA duplexes, but results in only minor changes in RNA:DNA stability. It is stable against attack by single-stranded ribonucleases and is typically 5 to 10 times less susceptible to DNases than DNA. It is commonly used in antisense oligonucleotides as a means of increasing stability and binding affinity for the target message.
2'-fluoro modified nucleotides (e.g., 2'-fluoro bases) have a fluorine-modified ribose that increases binding affinity (Tm) and also confers some relative nuclease resistance compared to native RNA. These modifications are commonly used in ribozymes and siRNAs to improve stability in serum or other biological fluids.
LNA bases have a modification to the ribose backbone that locks the base in the C3'-endo position, which favors RNA A-type helix duplex geometry. This modification significantly increases Tm and also confers very strong nuclease resistance. Multiple LNAs can be inserted anywhere in the oligonucleotide except at the 3' end. Applications ranging from antisense oligonucleotides to hybridization probes to SNP detection and allele-specific PCR have been described. Because LNAs confer a large increase in Tm, they can also cause an increase in primer dimer formation as well as self-hairpin formation. In some cases, the number of LNAs introduced into a single oligonucleotide is 10 bases or less.
Phosphorothioate (PS) linkages (i.e., phosphorothioate bonds) substitute a sulfur atom for a non-bridging oxygen in the phosphate backbone of a nucleic acid (e.g., an oligonucleotide). This modification renders the internucleotide linkage resistant to nuclease degradation. Phosphorothioate linkages may be introduced between the last 3-5 nucleotides at the 5' or 3' end of the oligonucleotide to inhibit exonuclease degradation. Including phosphorothioate linkages within the oligonucleotide (e.g., throughout the entire oligonucleotide) may also help reduce attack by endonucleases.
In some embodiments, the subject nucleic acids have one or more nucleotides that are 2'-O-methyl modified nucleotides. In some embodiments, a subject nucleic acid (e.g., dsRNA, siNA, etc.) has one or more 2'-fluoro modified nucleotides. In some embodiments, a subject nucleic acid (e.g., dsRNA, siNA, etc.) has one or more LNA bases. In some embodiments, a subject nucleic acid (e.g., dsRNA, siNA, etc.) has one or more nucleotides connected by phosphorothioate linkages (i.e., the subject nucleic acid has one or more phosphorothioate linkages). In some embodiments, a subject nucleic acid (e.g., dsRNA, siNA, etc.) has a 5' cap (e.g., a 7-methylguanylate cap (m7G)). In some embodiments, the subject nucleic acids (e.g., dsRNA, siNA, etc.) have a combination of modified nucleotides. For example, a subject nucleic acid (e.g., dsRNA, siNA, etc.) can have a 5' cap (e.g., a 7-methylguanylate cap (m7G)) in addition to having one or more nucleotides with other modifications (e.g., 2'-O-methyl nucleotides and/or 2'-fluoro modified nucleotides and/or LNA bases and/or phosphorothioate linkages).
Modified backbones and modified internucleoside linkages
Examples of suitable nucleic acids containing modifications (e.g., CasY guide RNAs) include nucleic acids containing modified backbones or non-natural internucleoside linkages. Nucleic acids having a modified backbone include those that retain a phosphorus atom in the backbone and those that do not have a phosphorus atom in the backbone.
Suitable modified oligonucleotide backbones containing a phosphorus atom include, for example, phosphorothioates, chiral phosphorothioates, phosphorodithioates, phosphotriesters, aminoalkylphosphotriesters, methyl and other alkyl phosphonates including 3'-alkylene phosphonates, 5'-alkylene phosphonates, and chiral phosphonates, phosphinates, phosphoramidates including 3'-amino phosphoramidate and aminoalkylphosphoramidates, phosphorodiamidates, thionophosphoramidates, selenophosphates, and boranophosphates having normal 3'-5' linkages, 2'-5' linked analogs of these, and those having inverted polarity, wherein one or more internucleotide linkages is a 3' to 3', 5' to 5', or 2' to 2' linkage. Suitable oligonucleotides having inverted polarity comprise a single 3' to 3' linkage at the 3'-most internucleotide linkage, i.e., a single inverted nucleoside residue which may be abasic (the nucleobase is missing or has a hydroxyl group in place thereof). Also included are various salts (such as, for example, potassium or sodium), mixed salts, and free acid forms.
In some embodiments, the subject nucleic acids comprise one or more phosphorothioate and/or heteroatom internucleoside linkages, in particular -CH2-NH-O-CH2-, -CH2-N(CH3)-O-CH2- (known as a methylene (methylimino) or MMI backbone), -CH2-O-N(CH3)-CH2-, -CH2-N(CH3)-N(CH3)-CH2-, and -O-N(CH3)-CH2-CH2- (wherein the natural phosphodiester internucleotide linkage is represented as -O-P(=O)(OH)-O-CH2-). MMI-type internucleoside linkages are disclosed in the above-mentioned U.S. Pat. No. 5,489,677, the disclosure of which is incorporated herein by reference in its entirety. Suitable amide internucleoside linkages are disclosed in U.S. Pat. No. 5,602,240, the disclosure of which is incorporated herein by reference in its entirety.
Also suitable are nucleic acids having morpholino backbone structures, as described, for example, in U.S. Pat. No. 5,034,506. For example, in some embodiments, the subject nucleic acids comprise a 6-membered morpholino ring in place of a ribose ring. In some of these embodiments, a phosphorodiamidate or other non-phosphodiester internucleoside linkage replaces the phosphodiester linkage.
Suitable modified polynucleotide backbones that do not contain a phosphorus atom have backbones formed by short chain alkyl or cycloalkyl internucleoside linkages, mixed heteroatom and alkyl or cycloalkyl internucleoside linkages, or one or more short chain heteroatomic or heterocyclic internucleoside linkages. These include: those having morpholino linkages (formed in part from the sugar portion of a nucleoside); siloxane backbones; sulfide, sulfoxide, and sulfone backbones; formacetyl and thioformacetyl backbones; methylene formacetyl and thioformacetyl backbones; riboacetyl backbones; alkene-containing backbones; sulfamate backbones; methyleneimino and methylenehydrazino backbones; sulfonate and sulfonamide backbones; amide backbones; and others having mixed N, O, S, and CH2 component parts.
Mimetics
The subject nucleic acids can be nucleic acid mimetics. The term "mimetic", as applied to polynucleotides, is intended to include polynucleotides in which only the furanose ring, or both the furanose ring and the internucleotide linkage, are replaced with non-furanose groups; replacement of only the furanose ring is also referred to in the art as a sugar surrogate. The heterocyclic base moiety or a modified heterocyclic base moiety is maintained for hybridization with an appropriate target nucleic acid. One such nucleic acid, a polynucleotide mimetic that has been shown to have excellent hybridization properties, is referred to as a Peptide Nucleic Acid (PNA). In PNA, the sugar backbone of the polynucleotide is replaced with an amide-containing backbone, in particular an aminoethylglycine backbone. The nucleobases are retained and are bound directly or indirectly to aza nitrogen atoms of the amide portion of the backbone.
One polynucleotide mimetic reported to have excellent hybridization properties is Peptide Nucleic Acid (PNA). The backbone in PNA compounds is two or more linked aminoethylglycine units that give the PNA an amide-containing backbone. The heterocyclic base moieties are bound directly or indirectly to aza nitrogen atoms of the amide portion of the backbone. Representative U.S. patents describing the preparation of PNA compounds include, but are not limited to: U.S. Pat. Nos. 5,539,082; 5,714,331; and 5,719,262, the disclosures of which are incorporated herein by reference in their entirety.
Another class of polynucleotide mimetics that has been studied is based on linked morpholino units (morpholino nucleic acids) having heterocyclic bases attached to the morpholino ring. A number of linking groups have been reported that link the morpholino monomeric units in a morpholino nucleic acid. One class of linking groups has been selected to give a non-ionic oligomeric compound. Non-ionic morpholino-based oligomeric compounds are less likely to have undesired interactions with cellular proteins. Morpholino-based polynucleotides are non-ionic mimetics of oligonucleotides that are less likely to form undesired interactions with cellular proteins (Dwaine A. Braasch and David R. Corey, Biochemistry, 2002, 41(14), 4503-). Morpholino-based polynucleotides are disclosed in U.S. Pat. No. 5,034,506, the disclosure of which is incorporated herein by reference in its entirety. A variety of compounds within the morpholino class of polynucleotides have been prepared, having a variety of different linking groups joining the monomeric subunits.
Another class of polynucleotide mimetics is referred to as cyclohexenyl nucleic acids (CeNA). The furanose ring normally present in a DNA/RNA molecule is replaced with a cyclohexenyl ring. CeNA DMT-protected phosphoramidite monomers have been prepared and used for oligomeric compound synthesis following classical phosphoramidite chemistry. Fully modified CeNA oligomeric compounds and oligonucleotides having specific positions modified with CeNA have been prepared and studied (see Wang et al., J. Am. Chem. Soc., 2000, 122, 8595-). In general, the incorporation of CeNA monomers into a DNA chain increases the stability of a DNA/RNA hybrid. CeNA oligoadenylates formed complexes with RNA and DNA complements with stability similar to that of the native complexes. NMR and circular dichroism studies showed that CeNA structures can be incorporated into natural nucleic acid structures with easy conformational adaptation.
Another modification includes locked nucleic acids (LNAs), in which the 2'-hydroxyl group is linked to the 4' carbon atom of the sugar ring, forming a 2'-C,4'-C-oxymethylene linkage and thereby a bicyclic sugar moiety. The linkage can be a methylene (-CH2-)n group bridging the 2' oxygen atom and the 4' carbon atom, where n is 1 or 2 (Singh et al., Chem. Commun., 1998, 4, 455-456, the disclosure of which is incorporated herein by reference in its entirety). LNAs and LNA analogs display very high duplex thermal stabilities with complementary DNA and RNA (Tm = +3 °C to +10 °C), stability towards 3'-exonucleolytic degradation, and good solubility properties. Potent and nontoxic antisense oligonucleotides containing LNAs have been described (e.g., Wahlestedt et al., Proc. Natl. Acad. Sci. U.S.A., 2000, 97, 5633-5638, the disclosure of which is incorporated herein by reference in its entirety).
The synthesis and preparation of the LNA monomers adenine, cytosine, guanine, 5-methyl-cytosine, thymine, and uracil, along with their oligomerization and nucleic acid recognition properties, have been described (e.g., Koshkin et al., Tetrahedron, 1998, 54, 3607-3630, the disclosure of which is incorporated herein by reference in its entirety). LNAs and their preparation are also described in WO 98/39352 and WO 99/14226, and in U.S. applications 20120165514, 20100216983, 20090041809, 20060117410, 20040014959, 20020094555, and 20020086998, the disclosures of which are incorporated herein by reference in their entirety.
Modified sugar moieties
The subject nucleic acids can also comprise one or more substituted sugar moieties. Suitable polynucleotides comprise a sugar substituent selected from: OH; F; O-, S-, or N-alkyl; O-, S-, or N-alkenyl; O-, S-, or N-alkynyl; or O-alkyl-O-alkyl, wherein the alkyl, alkenyl, and alkynyl may be substituted or unsubstituted C1 to C10 alkyl or C2 to C10 alkenyl and alkynyl. Particularly suitable are O((CH2)nO)mCH3, O(CH2)nOCH3, O(CH2)nNH2, O(CH2)nCH3, O(CH2)nONH2, and O(CH2)nON((CH2)nCH3)2, where n and m are from 1 to about 10. Other suitable polynucleotides comprise a sugar substituent selected from: C1 to C10 lower alkyl, substituted lower alkyl, alkenyl, alkynyl, alkylaryl, arylalkyl, O-alkylaryl or O-arylalkyl, SH, SCH3, OCN, Cl, Br, CN, CF3, OCF3, SOCH3, SO2CH3, ONO2, NO2, N3, NH2, heterocycloalkyl, heterocycloalkylaryl, aminoalkylamino, polyalkylamino, substituted silyl, an RNA cleaving group, a reporter group, an intercalator, a group for improving the pharmacokinetic properties of an oligonucleotide, or a group for improving the pharmacodynamic properties of an oligonucleotide, and other substituents having similar properties. A suitable modification includes 2'-methoxyethoxy (2'-O-CH2CH2OCH3, also known as 2'-O-(2-methoxyethyl) or 2'-MOE) (Martin et al., Helv. Chim. Acta, 1995, 78, 486-504, the disclosure of which is incorporated herein by reference in its entirety), i.e., an alkoxyalkoxy group. Additional suitable modifications include 2'-dimethylaminooxyethoxy, i.e., an O(CH2)2ON(CH3)2 group, also known as 2'-DMAOE, as described in the examples below; and 2'-dimethylaminoethoxyethoxy (also known in the art as 2'-O-dimethyl-amino-ethoxy-ethyl or 2'-DMAEOE), i.e., 2'-O-CH2-O-CH2-N(CH3)2.
Other suitable sugar substituents include methoxy (-O-CH3), aminopropoxy (-OCH2CH2CH2NH2), allyl (-CH2-CH=CH2), -O-allyl (-O-CH2-CH=CH2), and fluoro (F). 2'-sugar substituents may be in the arabino (up) position or the ribo (down) position. A suitable 2'-arabino modification is 2'-F. Similar modifications may also be made at other positions on the oligomeric compound, particularly the 3' position of the sugar on the 3' terminal nucleoside or in 2'-5' linked oligonucleotides and the 5' position of the 5' terminal nucleotide. Oligomeric compounds may also have sugar mimetics, such as cyclobutyl moieties, in place of the pentofuranosyl sugar.
Base modification and substitution
The subject nucleic acids may also include nucleobase (often referred to in the art simply as "base") modifications or substitutions. As used herein, "unmodified" or "natural" nucleobases include the purine bases adenine (A) and guanine (G) and the pyrimidine bases thymine (T), cytosine (C), and uracil (U). Modified nucleobases include other synthetic and natural nucleobases, such as 5-methylcytosine (5-me-C), 5-hydroxymethylcytosine, xanthine, hypoxanthine, 2-aminoadenine, 6-methyl and other alkyl derivatives of adenine and guanine, 2-propyl and other alkyl derivatives of adenine and guanine, 2-thiouracil, 2-thiothymine and 2-thiocytosine, 5-halouracil and cytosine, 5-propynyl (-C≡C-CH3) uracil and cytosine and other alkynyl derivatives of pyrimidine bases, 6-azo uracil, cytosine and thymine, 5-uracil (pseudouracil), 4-thiouracil, 8-halo, 8-amino, 8-mercapto, 8-sulfanyl, 8-hydroxy and other 8-substituted adenines and guanines, 5-halo (particularly 5-bromo), 5-trifluoromethyl and other 5-substituted uracils and cytosines, 7-methylguanine and 7-methyladenine, 2-F-adenine, 2-amino-adenine, 8-azaguanine and 8-azaadenine, 7-deazaguanine and 7-deazaadenine, and 3-deazaguanine and 3-deazaadenine. Additional modified nucleobases include tricyclic pyrimidines such as phenoxazine cytidine (1H-pyrimido(5,4-b)(1,4)benzoxazin-2(3H)-one) and phenothiazine cytidine (1H-pyrimido(5,4-b)(1,4)benzothiazin-2(3H)-one); G-clamps such as a substituted phenoxazine cytidine (e.g., 9-(2-aminoethoxy)-H-pyrimido(5,4-b)(1,4)benzoxazin-2(3H)-one); carbazole cytidine (2H-pyrimido(4,5-b)indol-2-one); and pyridoindole cytidine (H-pyrido(3',2':4,5)pyrrolo(2,3-d)pyrimidin-2-one).
Heterocyclic base moieties may also include those in which the purine or pyrimidine base is replaced with another heterocycle, for example 7-deazaadenine, 7-deazaguanosine, 2-aminopyridine, and 2-pyridone. Further nucleobases include those disclosed in U.S. Pat. No. 3,687,808; those disclosed in The Concise Encyclopedia of Polymer Science and Engineering, pages 858-859, Kroschwitz, J. I., ed., John Wiley & Sons, 1990; those disclosed by Englisch et al., Angewandte Chemie, International Edition, 1991, 30, 613; and those disclosed by Sanghvi, Y. S., Chapter 15, Antisense Research and Applications, pages 289-302, Crooke, S. T. and Lebleu, B., ed., CRC Press, 1993; the disclosures of which are incorporated herein by reference in their entirety. Certain of these nucleobases are useful for increasing the binding affinity of an oligomeric compound. These include 5-substituted pyrimidines, 6-azapyrimidines, and N-2, N-6, and O-6 substituted purines, including 2-aminopropyladenine, 5-propynyluracil, and 5-propynylcytosine. 5-methylcytosine substitutions have been shown to increase nucleic acid duplex stability by 0.6 °C to 1.2 °C (Sanghvi et al., eds., Antisense Research and Applications, CRC Press, Boca Raton, 1993, pp. 276-278; the disclosure of which is incorporated herein by reference in its entirety) and are suitable base substitutions, e.g., when combined with 2'-O-methoxyethyl sugar modifications.
Conjugates
Another possible modification of the subject nucleic acids involves chemically linking one or more moieties or conjugates that enhance the activity, cellular distribution, or cellular uptake of the oligonucleotide to the polynucleotide. These moieties or conjugates can include a conjugate group covalently bound to a functional group such as a primary or secondary hydroxyl group. Conjugate groups include, but are not limited to, intercalators, reporters, polyamines, polyamides, polyethylene glycols, polyethers, groups that enhance the pharmacodynamic properties of oligomers, and groups that enhance the pharmacokinetic properties of oligomers. Suitable conjugate groups include, but are not limited to, cholesterol, lipids, phospholipids, biotin, phenazine, folate, phenanthridine, anthraquinone, acridine, fluorescein, rhodamine, coumarin, and dyes. Groups that enhance pharmacodynamic properties include groups that improve uptake, enhance resistance to degradation, and/or enhance sequence-specific hybridization to a target nucleic acid. Groups that enhance pharmacokinetic properties include groups that improve absorption, distribution, metabolism, or excretion of the subject nucleic acids.
Conjugate moieties include, but are not limited to, lipid moieties such as a cholesterol moiety (Letsinger et al., Proc. Natl. Acad. Sci. USA, 1989, 86, 6553-6556), cholic acid (Manoharan et al., Bioorg. Med. Chem. Let., 1994, 4, 1053-1060), a thioether such as hexyl-S-tritylthiol (Manoharan et al., Ann. N.Y. Acad. Sci., 1992, 660, 306-309; Manoharan et al., Bioorg. Med. Chem. Let., 1993, 3, 2765-2770), a thiocholesterol (Oberhauser et al., Nucl. Acids Res., 1992, 20, 533-538), an aliphatic chain such as dodecanediol or undecyl residues (Saison-Behmoaras et al., EMBO J., 1991, 10, 1111, 1990-cetyl phosphate, 23-54-DL. K., Pascal-DL. K., Mic. K., 1993, Sp. K., Mic. K. 35-78, Sp. K., Tetrahedron Lett., 1995, 36, 3651-; Shea et al., Nucl. Acids Res., 1990, 18, 3777-3783), a polyamine or a polyethylene glycol chain (Manoharan et al., Nucleosides & Nucleotides, 1995, 14, 969-973), or adamantane acetic acid (Manoharan et al., Tetrahedron Lett., 1995, 36, 3651-3654), a palmityl moiety (Mishra et al., Biochim. Biophys. Acta, 1995, 1264, 229-237), or an octadecylamine or hexylamino-carbonyl-oxycholesterol moiety (Crooke et al., J. Pharmacol. Exp. Ther., 1996, 277, 923-937).
The conjugates may include a "protein transduction domain" or PTD (also known as a CPP, cell penetrating peptide), which may refer to a polypeptide, polynucleotide, carbohydrate, or organic or inorganic compound that facilitates traversing a lipid bilayer, micelle, cell membrane, organelle membrane, or vesicle membrane. A PTD attached to another molecule (which can range from a small polar molecule to a large macromolecule and/or a nanoparticle) facilitates the molecule traversing a membrane, for example going from the extracellular space to the intracellular space, or from the cytosol to within an organelle (e.g., the nucleus). In some embodiments, the PTD is covalently linked to the 3' end of an exogenous polynucleotide. In some embodiments, the PTD is covalently linked to the 5' end of an exogenous polynucleotide. Exemplary PTDs include, but are not limited to, a minimal undecapeptide protein transduction domain (corresponding to residues 47-57 of HIV-1 TAT comprising YGRKKRRQRRR; SEQ ID NO: 112); a polyarginine sequence comprising a number of arginines sufficient to direct entry into a cell (e.g., 3, 4, 5, 6, 7, 8, 9, 10, or 10-50 arginines); a VP22 domain (Zender et al. (2002) Cancer Gene Ther. 9(6):489-96); a Drosophila Antennapedia protein transduction domain (Noguchi et al. (2003) Diabetes 52(7):1732-1737); a truncated human calcitonin peptide (Trehin et al. (2004) Pharm. Research 21:1248-1256); polylysine (Wender et al. (2000) Proc. Natl. Acad. Sci. USA 97:13003-13008); RRQRRTSKLMKR (SEQ ID NO: 113); Transportan GWTLNSAGYLLGKINLKALAALAKKIL (SEQ ID NO: 114); KALAWEAKLAKALAKALAKHLAKALAKALKCEA (SEQ ID NO: 115); and RQIKIWFQNRRMKWKK (SEQ ID NO: 116).
Exemplary PTDs include, but are not limited to, YGRKKRRQRRR (SEQ ID NO: 117), RKKRRQRRR (SEQ ID NO: 118), and an arginine homopolymer of from 3 arginine residues to 50 arginine residues. Exemplary PTD domain amino acid sequences include, but are not limited to, any of the following: YGRKKRRQRRR (SEQ ID NO: 119); RKKRRQRR (SEQ ID NO: 120); YARAAARQARA (SEQ ID NO: 121); THRLPRRRRRR (SEQ ID NO: 122); and GGRRARRRRRR (SEQ ID NO: 123). In some embodiments, the PTD is an activatable CPP (ACPP) (Aguilera et al. (2009) Integr Biol (Camb) June; 1(5-6):371-). ACPPs comprise a polycationic CPP (e.g., Arg9 or "R9") connected via a cleavable linker to a matching polyanion (e.g., Glu9 or "E9"), which reduces the net charge to nearly zero and thereby inhibits adhesion and uptake into cells. Upon cleavage of the linker, the polyanion is released, locally unmasking the polyarginine and its inherent adhesiveness, thus "activating" the ACPP to traverse the membrane.
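The charge-masking idea behind an ACPP can be illustrated with a rough net-charge count. The following is a minimal sketch, not a method from this disclosure: it assumes a simplified model in which arginine and lysine each contribute +1 and aspartate and glutamate each contribute -1 at roughly neutral pH, and it ignores histidine, the termini, the linker, and any pKa shifts. The function name `net_charge` is hypothetical.

```python
# Simplified side-chain charges at roughly neutral pH (an assumption for
# illustration: His, the termini, and the cleavable linker are ignored).
SIDE_CHAIN_CHARGE = {"R": 1, "K": 1, "D": -1, "E": -1}


def net_charge(sequence: str) -> int:
    """Return the approximate net side-chain charge of a peptide sequence."""
    return sum(SIDE_CHAIN_CHARGE.get(residue, 0) for residue in sequence.upper())


polycation = "R" * 9   # "R9" polyarginine CPP
polyanion = "E" * 9    # "E9" matching polyglutamate

# Intact ACPP: the polyanion masks the polycation.
intact_charge = net_charge(polycation) + net_charge(polyanion)

# After linker cleavage the polyanion is released, leaving the polycation.
activated_charge = net_charge(polycation)

if __name__ == "__main__":
    print("intact ACPP net charge:", intact_charge)
    print("activated CPP net charge:", activated_charge)
    print("TAT 47-57 net charge:", net_charge("YGRKKRRQRRR"))
```

Under this simplified model the intact R9/E9 pair is approximately neutral, while the released R9 segment carries the full positive charge that the text associates with adhesion and uptake.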
Introducing the components into target cells
A CasY guide RNA (or a nucleic acid comprising a nucleotide sequence encoding a CasY guide RNA) and/or a CasY polypeptide of the disclosure (or a nucleic acid comprising a nucleotide sequence encoding a CasY polypeptide) and/or a CasY fusion polypeptide of the disclosure (or a nucleic acid comprising a nucleotide sequence encoding a CasY fusion polypeptide of the disclosure) and/or a donor polynucleotide (donor template) can be introduced into a host cell by any of a variety of well-known methods.
A CasY system of the disclosure can be delivered to a target cell using any of a variety of compounds and methods (e.g., wherein the CasY system comprises: a) a CasY polypeptide of the disclosure and a CasY guide RNA; b) a CasY polypeptide of the disclosure, a CasY guide RNA, and a donor template nucleic acid; c) a CasY fusion polypeptide of the disclosure and a CasY guide RNA; d) a CasY fusion polypeptide of the disclosure, a CasY guide RNA, and a donor template nucleic acid; e) an mRNA encoding a CasY polypeptide of the disclosure, and a CasY guide RNA; f) an mRNA encoding a CasY polypeptide of the disclosure, a CasY guide RNA, and a donor template nucleic acid; g) an mRNA encoding a CasY fusion polypeptide of the disclosure, and a CasY guide RNA; h) an mRNA encoding a CasY fusion polypeptide of the disclosure, a CasY guide RNA, and a donor template nucleic acid; i) a recombinant expression vector comprising a nucleotide sequence encoding a CasY polypeptide of the disclosure and a nucleotide sequence encoding a CasY guide RNA; j) a recombinant expression vector comprising a nucleotide sequence encoding a CasY polypeptide of the disclosure, a nucleotide sequence encoding a CasY guide RNA, and a nucleotide sequence encoding a donor template nucleic acid; k) a recombinant expression vector comprising a nucleotide sequence encoding a CasY fusion polypeptide of the disclosure and a nucleotide sequence encoding a CasY guide RNA; l) a recombinant expression vector comprising a nucleotide sequence encoding a CasY fusion polypeptide of the disclosure, a nucleotide sequence encoding a CasY guide RNA, and a nucleotide sequence encoding a donor template nucleic acid; m) a first recombinant expression vector comprising a nucleotide sequence encoding a CasY polypeptide of the disclosure, and a second recombinant expression vector comprising a nucleotide sequence encoding a CasY guide RNA; n) a first recombinant expression vector comprising a nucleotide sequence encoding a CasY polypeptide of the disclosure, and a second recombinant expression vector comprising a nucleotide sequence encoding a CasY guide RNA; and a donor template nucleic acid; o) a first recombinant expression vector comprising a nucleotide sequence encoding a CasY fusion polypeptide of the disclosure, and a second recombinant expression vector comprising a nucleotide sequence encoding a CasY guide RNA; p) a first recombinant expression vector comprising a nucleotide sequence encoding a CasY fusion polypeptide of the disclosure, and a second recombinant expression vector comprising a nucleotide sequence encoding a CasY guide RNA; and a donor template nucleic acid; q) a recombinant expression vector comprising a nucleotide sequence encoding a CasY polypeptide of the disclosure, a nucleotide sequence encoding a first CasY guide RNA, and a nucleotide sequence encoding a second CasY guide RNA; or r) a recombinant expression vector comprising a nucleotide sequence encoding a CasY fusion polypeptide of the disclosure, a nucleotide sequence encoding a first CasY guide RNA, and a nucleotide sequence encoding a second CasY guide RNA; or some variation of one of (a) through (r)). As a non-limiting example, a CasY system of the present disclosure can be combined with a lipid. As another non-limiting example, a CasY system of the present disclosure can be combined with a particle, or formulated into a particle.
Methods of introducing a nucleic acid into a host cell are known in the art, and any convenient method can be used to introduce a subject nucleic acid (e.g., an expression construct/vector) into a target cell (e.g., a prokaryotic cell, eukaryotic cell, plant cell, animal cell, mammalian cell, human cell, and the like). Suitable methods include, e.g., viral infection, transfection, conjugation, protoplast fusion, lipofection, electroporation, calcium phosphate precipitation, polyethyleneimine (PEI)-mediated transfection, DEAE-dextran mediated transfection, liposome-mediated transfection, particle gun technology, direct microinjection, nanoparticle-mediated nucleic acid delivery (see, e.g., Panyam et al., Adv Drug Deliv Rev. 2012 Sep 13. pii: S0169-409X(12)00283-9. doi: 10.1016/j.addr.2012.09.023), and the like.
In some cases, a CasY polypeptide of the present disclosure is provided as a nucleic acid (e.g., an mRNA, a DNA, a plasmid, an expression vector, a viral vector, etc.) that encodes the CasY polypeptide. In some cases, the CasY polypeptide of the disclosure is provided directly as a protein (e.g., without an associated guide RNA or with an associated guide RNA, i.e., as a ribonucleoprotein complex). A CasY polypeptide of the disclosure can be introduced into a cell (provided to the cell) by any convenient method; such methods are known to those of ordinary skill in the art. As an illustrative example, a CasY polypeptide of the disclosure can be injected directly into a cell (e.g., with or without a CasY guide RNA or a nucleic acid encoding a CasY guide RNA, and with or without a donor polynucleotide). As another example, a preformed complex (RNP) of a CasY polypeptide of the present disclosure and a CasY guide RNA can be introduced into a cell (e.g., a eukaryotic cell) (e.g., via injection; via nucleofection; via a protein transduction domain (PTD) conjugated to one or more components, e.g., conjugated to the CasY protein, conjugated to the guide RNA, or conjugated to both the CasY polypeptide of the present disclosure and the guide RNA; etc.).
In some cases, a CasY fusion polypeptide of the disclosure (e.g., dCasY fused to a fusion partner, nickase CasY fused to a fusion partner, etc.) is provided as a nucleic acid (e.g., an mRNA, a DNA, a plasmid, an expression vector, a viral vector, etc.) that encodes the CasY fusion polypeptide. In some cases, the CasY fusion polypeptide of the disclosure is provided directly as a protein (e.g., without an associated guide RNA or with an associated guide RNA, i.e., as a ribonucleoprotein complex). A CasY fusion polypeptide of the disclosure can be introduced into a cell (provided to the cell) by any convenient method; such methods are known to those of ordinary skill in the art. As an illustrative example, a CasY fusion polypeptide of the disclosure can be injected directly into a cell (e.g., with or without a nucleic acid encoding a CasY guide RNA, and with or without a donor polynucleotide). As another example, a preformed complex (RNP) of a CasY fusion polypeptide of the present disclosure and a CasY guide RNA can be introduced into a cell (e.g., via injection; via nucleofection; via a protein transduction domain (PTD) conjugated to one or more components, e.g., conjugated to the CasY fusion protein, conjugated to the guide RNA, or conjugated to both the CasY fusion polypeptide of the present disclosure and the guide RNA; etc.).
In some cases, a nucleic acid (e.g., a CasY guide RNA; a nucleic acid comprising a nucleotide sequence encoding a CasY polypeptide of the disclosure; etc.) and/or a polypeptide (e.g., a CasY polypeptide; a CasY fusion polypeptide) is delivered to a cell (e.g., a target host cell) in a particle, or associated with a particle. In some cases, a CasY system of the present disclosure is delivered to a cell in a particle, or associated with a particle. The terms "particle" and "nanoparticle" are used interchangeably, as appropriate. A recombinant expression vector comprising a nucleotide sequence encoding a CasY polypeptide of the disclosure and/or a CasY guide RNA, an mRNA comprising a nucleotide sequence encoding a CasY polypeptide of the disclosure, and a guide RNA can be delivered simultaneously using particles or lipid envelopes; for instance, a CasY polypeptide and a CasY guide RNA, e.g., as a complex (e.g., a ribonucleoprotein (RNP) complex), can be delivered via a particle, e.g., a delivery particle comprising a lipid or lipidoid and a hydrophilic polymer, e.g., a cationic lipid and a hydrophilic polymer, for instance wherein the cationic lipid comprises 1,2-dioleoyl-3-trimethylammonium-propane (DOTAP) or 1,2-ditetradecanoyl-sn-glycero-3-phosphocholine (DMPC) and/or wherein the hydrophilic polymer comprises ethylene glycol or polyethylene glycol (PEG); and/or wherein the particle further comprises cholesterol (e.g., a particle from formulation 1 = DOTAP 100, DMPC 0, PEG 0, cholesterol 0; formulation number 2 = DOTAP 90, DMPC 0, PEG 10, cholesterol 0; formulation number 3 = DOTAP 90, DMPC 0, PEG 5, cholesterol 5).
For example, particles can be formed using a multi-step method in which the CasY polypeptide and the CasY guide RNA are mixed together, e.g., at a molar ratio of 1:1, e.g., at room temperature, e.g., for 30 minutes, e.g., in sterile, nuclease-free 1x Phosphate Buffered Saline (PBS); and DOTAP, DMPC, PEG and cholesterol suitable for formulation are separately dissolved in ethanol (e.g., 100% ethanol) and the two solutions are mixed together to form particles containing the complex.
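The 1:1 molar mixing step can be worked through as simple arithmetic. The sketch below is illustrative only: the molecular weights are placeholder values chosen for round numbers, not values taken from this disclosure, and the function names `picomoles` and `rna_mass_for_ratio` are hypothetical.

```python
def picomoles(mass_ug: float, mol_weight_da: float) -> float:
    """Convert a mass in micrograms to picomoles for a given molecular weight (g/mol)."""
    # ug / (g/mol) gives umol; multiply by 1e6 to express it in pmol.
    return mass_ug / mol_weight_da * 1e6


def rna_mass_for_ratio(protein_ug: float, protein_mw_da: float,
                       rna_mw_da: float, rna_per_protein: float = 1.0) -> float:
    """Return the guide RNA mass (ug) that matches the protein at the given molar ratio."""
    rna_pmol = picomoles(protein_ug, protein_mw_da) * rna_per_protein
    # pmol * (g/mol) gives pg; divide by 1e6 to express it in ug.
    return rna_pmol * rna_mw_da / 1e6


if __name__ == "__main__":
    # Placeholder molecular weights (assumptions for illustration only).
    PROTEIN_MW_DA = 140_000.0
    GUIDE_RNA_MW_DA = 32_000.0
    protein_ug = 14.0
    print("protein (pmol):", picomoles(protein_ug, PROTEIN_MW_DA))
    print("guide RNA needed (ug):",
          rna_mass_for_ratio(protein_ug, PROTEIN_MW_DA, GUIDE_RNA_MW_DA))
```

With these placeholder inputs, 14 µg of protein corresponds to about 100 pmol, so a 1:1 molar ratio calls for about 3.2 µg of guide RNA; real values depend on the actual molecular weights of the protein and guide used.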
A CasY polypeptide of the disclosure (or an mRNA comprising a nucleotide sequence encoding a CasY polypeptide of the disclosure; or a recombinant expression vector comprising a nucleotide sequence encoding a CasY polypeptide of the disclosure) and/or a CasY guide RNA (or a nucleic acid, such as one or more expression vectors, encoding the CasY guide RNA) may be delivered simultaneously using particles or lipid envelopes. For example, a biodegradable core-shell structured nanoparticle having a poly(β-amino ester) (PBAE) core enveloped by a phospholipid bilayer shell can be used. In some cases, particles/nanoparticles based on self-assembling bioadhesive polymers are used; such particles/nanoparticles may be applied to oral delivery of peptides, intravenous delivery of peptides, and nasal delivery of peptides, e.g., to the brain. Other embodiments, such as oral absorption and ocular delivery of hydrophobic drugs, are also contemplated. A molecular envelope technology, which involves an engineered polymer envelope that is protected and delivered to the site of disease, can be used. Doses of about 5 mg/kg can be used, with single or multiple doses, depending on various factors, e.g., the target tissue.
Lipidoid compounds (e.g., as described in U.S. patent application 20110293703) are also useful in the administration of polynucleotides, and can be used to deliver a CasY polypeptide of the disclosure, a CasY fusion polypeptide of the disclosure, an RNP of the disclosure, a nucleic acid of the disclosure, or a CasY system of the disclosure (e.g., wherein the CasY system comprises: a) a CasY polypeptide of the disclosure and a CasY guide RNA; b) a CasY polypeptide of the disclosure, a CasY guide RNA, and a donor template nucleic acid; c) a CasY fusion polypeptide of the disclosure and a CasY guide RNA; d) a CasY fusion polypeptide of the disclosure, a CasY guide RNA, and a donor template nucleic acid; e) an mRNA encoding a CasY polypeptide of the disclosure, and a CasY guide RNA; f) an mRNA encoding a CasY polypeptide of the disclosure, a CasY guide RNA, and a donor template nucleic acid; g) an mRNA encoding a CasY fusion polypeptide of the disclosure, and a CasY guide RNA; h) an mRNA encoding a CasY fusion polypeptide of the disclosure, a CasY guide RNA, and a donor template nucleic acid; i) a recombinant expression vector comprising a nucleotide sequence encoding a CasY polypeptide of the disclosure and a nucleotide sequence encoding a CasY guide RNA; j) a recombinant expression vector comprising a nucleotide sequence encoding a CasY polypeptide of the disclosure, a nucleotide sequence encoding a CasY guide RNA, and a nucleotide sequence encoding a donor template nucleic acid; k) a recombinant expression vector comprising a nucleotide sequence encoding a CasY fusion polypeptide of the disclosure and a nucleotide sequence encoding a CasY guide RNA; l) a recombinant expression vector comprising a nucleotide sequence encoding a CasY fusion polypeptide of the disclosure, a nucleotide sequence encoding a CasY guide RNA, and a nucleotide sequence encoding a donor template nucleic acid; m) a first recombinant expression vector comprising a nucleotide sequence encoding a CasY polypeptide of the disclosure, and a second recombinant expression vector comprising a nucleotide sequence encoding a CasY guide RNA; n) a first recombinant expression vector comprising a nucleotide sequence encoding a CasY polypeptide of the disclosure, and a second recombinant expression vector comprising a nucleotide sequence encoding a CasY guide RNA; and a donor template nucleic acid; o) a first recombinant expression vector comprising a nucleotide sequence encoding a CasY fusion polypeptide of the disclosure, and a second recombinant expression vector comprising a nucleotide sequence encoding a CasY guide RNA; p) a first recombinant expression vector comprising a nucleotide sequence encoding a CasY fusion polypeptide of the disclosure, and a second recombinant expression vector comprising a nucleotide sequence encoding a CasY guide RNA; and a donor template nucleic acid; q) a recombinant expression vector comprising a nucleotide sequence encoding a CasY polypeptide of the disclosure, a nucleotide sequence encoding a first CasY guide RNA, and a nucleotide sequence encoding a second CasY guide RNA; or r) a recombinant expression vector comprising a nucleotide sequence encoding a CasY fusion polypeptide of the disclosure, a nucleotide sequence encoding a first CasY guide RNA, and a nucleotide sequence encoding a second CasY guide RNA; or some variation of one of (a) through (r)). In one aspect, the aminoalcohol lipidoid compounds are combined with an agent to be delivered to a cell or a subject to form microparticles, nanoparticles, liposomes, or micelles. The aminoalcohol lipidoid compounds may be combined with other aminoalcohol lipidoid compounds, polymers (synthetic or natural), surfactants, cholesterol, carbohydrates, proteins, lipids, etc. to form the particles. These particles may then optionally be combined with a pharmaceutical excipient to form a pharmaceutical composition.
Poly (β -aminoalcohol) (PBAA) may be used to deliver a CasY polypeptide of the disclosure, a CasY fusion polypeptide of the disclosure, an RNP of the disclosure, a nucleic acid of the disclosure, or a CasY system of the disclosure to a target cell. U.S. patent publication No. 20130302401 relates to a class of poly (β -amino alcohols) (PBAA) prepared using combinatorial polymerization.
Sugar-based particles, for example GalNAc, as described in WO2014118272 (incorporated herein by reference) and Nair, J K et al., 2014, Journal of the American Chemical Society 136(49), 169581-, can also be used.
In some cases, a CasY polypeptide of the disclosure, a CasY fusion polypeptide of the disclosure, an RNP of the disclosure, a nucleic acid of the disclosure, or a CasY system of the disclosure is delivered to a target cell using a lipid nanoparticle (LNP). Negatively charged polymers such as RNA may be loaded into LNPs at low pH values (e.g., pH 4), where the ionizable lipids display a positive charge. However, at physiological pH values, the LNPs exhibit a low surface charge compatible with longer circulation times. Four species of ionizable cationic lipids have been contemplated, namely 1,2-dilinoleoyl-3-dimethylammonium-propane (DLinDAP), 1,2-dilinoleyloxy-3-N,N-dimethylaminopropane (DLinDMA), 1,2-dilinoleyloxy-keto-N,N-dimethyl-3-aminopropane (DLinK-DMA), and 1,2-dilinoleyl-4-(2-dimethylaminoethyl)-[1,3]-dioxolane (DLinKC2-DMA). Preparation of LNPs is described in, e.g., Rosin et al. (2011) Molecular Therapy 19:1286-2200. The cationic lipids DLinDAP, DLinDMA, DLinK-DMA, and DLinKC2-DMA, together with 3-o-[2''-(methoxypolyethyleneglycol 2000) succinoyl]-1,2-dimyristoyl-sn-glycol (PEG-S-DMG) and R-3-[(ω-methoxy-poly(ethylene glycol)2000) carbamoyl]-1,2-dimyristyloxypropyl-3-amine (PEG-C-DOMG), may be used. A nucleic acid (e.g., a CasY guide RNA; a nucleic acid of the present disclosure; etc.) may be encapsulated in LNPs containing DLinDAP, DLinDMA, DLinK-DMA, and DLinKC2-DMA (cationic lipid:DSPC:CHOL:PEGS-DMG or PEG-C-DOMG at a molar ratio of 40:10:40:10). In some cases, 0.2% SP-DiOC18 is incorporated.
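The 40:10:40:10 molar ratio can be turned into per-component amounts with a proportional split. The following is a minimal sketch under stated assumptions: the total lipid amount (10 µmol) is an arbitrary example, the label "PEG-lipid" stands in for either PEG-S-DMG or PEG-C-DOMG, and the helper name `split_by_molar_ratio` is hypothetical.

```python
# Molar ratio from the text: cationic lipid:DSPC:CHOL:PEG-lipid = 40:10:40:10.
LNP_MOLAR_RATIO = {
    "cationic lipid": 40,
    "DSPC": 10,
    "CHOL": 40,
    "PEG-lipid": 10,  # PEG-S-DMG or PEG-C-DOMG
}


def split_by_molar_ratio(total_umol: float, ratio: dict) -> dict:
    """Divide a total lipid amount (umol) among components in proportion to the ratio."""
    if total_umol < 0:
        raise ValueError("total lipid amount must be non-negative")
    parts = sum(ratio.values())
    return {name: total_umol * share / parts for name, share in ratio.items()}


if __name__ == "__main__":
    # Example total of 10 umol lipid (an arbitrary illustrative amount).
    for name, amount in split_by_molar_ratio(10.0, LNP_MOLAR_RATIO).items():
        print(f"{name}: {amount:.2f} umol")
```

Converting the resulting molar amounts to masses would additionally require each component's molecular weight, which is not given here.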
Spherical Nucleic Acid (SNA™) constructs and other nanoparticles (particularly gold nanoparticles) can be used to deliver a CasY polypeptide of the disclosure, a CasY fusion polypeptide of the disclosure, an RNP of the disclosure, a nucleic acid of the disclosure, or a CasY system of the disclosure to a target cell. See, for example, Cutler et al., J. Am. Chem. Soc. 2011 133:9254-9257, Hao et al., Small 2011 7:3158-3162, Zhang et al., ACS Nano 2011 5:6962-6970, Cutler et al., J. Am. Chem. Soc. 2012 134:1376-1391, Young et al., Nano Lett. 2012 12:3867-71, Zheng et al., Proc. Natl. Acad. Sci. USA 2012 109:11975-80, Mirkin, Nanomedicine 2012 7:635-, Zhang et al., J. Am. Chem. Soc. 2014: 21388-1691, Weintraub, Nature 2013 495:S14-S16, Choi et al, Acans.37, Acans.2017619.2017619, Nature.201763, USA.68, USA.10: 47-2093, Mirkin et al.
Self-assembling nanoparticles with RNA can be constructed with polyethyleneimine (PEI) that is PEGylated, with an Arg-Gly-Asp (RGD) peptide ligand attached at the distal end of the polyethylene glycol (PEG).
Generally, "nanoparticle" refers to any particle having a diameter of less than 1000 nm. In some cases, a nanoparticle suitable for delivering a CasY polypeptide of the disclosure, a CasY fusion polypeptide of the disclosure, an RNP of the disclosure, a nucleic acid of the disclosure, or a CasY system of the disclosure to a target cell has a diameter of 500nm or less, e.g., 25nm to 35nm, 35nm to 50nm, 50nm to 75nm, 75nm to 100nm, 100nm to 150nm, 150nm to 200nm, 200nm to 300nm, 300nm to 400nm, or 400nm to 500 nm. In some cases, a nanoparticle suitable for delivering a CasY polypeptide of the disclosure, a CasY fusion polypeptide of the disclosure, an RNP of the disclosure, a nucleic acid of the disclosure, or a CasY system of the disclosure to a target cell has a diameter of 25nm to 200 nm. In some cases, a nanoparticle suitable for delivering a CasY polypeptide of the disclosure, a CasY fusion polypeptide of the disclosure, an RNP of the disclosure, a nucleic acid of the disclosure, or a CasY system of the disclosure to a target cell has a diameter of 100nm or less. In some cases, a nanoparticle suitable for delivering a CasY polypeptide of the disclosure, a CasY fusion polypeptide of the disclosure, an RNP of the disclosure, a nucleic acid of the disclosure, or a CasY system of the disclosure to a target cell has a diameter of 35nm to 60 nm.
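The diameter ranges above can be expressed as a small lookup. The sketch below is illustrative only: it assumes range endpoints are inclusive (the text does not say), and the names `is_nanoparticle` and `matching_ranges` are hypothetical.

```python
# "Nanoparticle": any particle with a diameter of less than 1000 nm.
NANOPARTICLE_LIMIT_NM = 1000.0

# Diameter ranges named in the text, as (low, high) in nm.
# Endpoints are treated as inclusive here, which is an assumption.
SIZE_RANGES_NM = {
    "500 nm or less": (0.0, 500.0),
    "25 nm to 200 nm": (25.0, 200.0),
    "100 nm or less": (0.0, 100.0),
    "35 nm to 60 nm": (35.0, 60.0),
}


def is_nanoparticle(diameter_nm: float) -> bool:
    """Return True if the diameter is positive and below 1000 nm."""
    return 0.0 < diameter_nm < NANOPARTICLE_LIMIT_NM


def matching_ranges(diameter_nm: float) -> list:
    """Return the labels of every listed range that contains the diameter."""
    if diameter_nm <= 0:
        raise ValueError("diameter must be positive")
    return [label for label, (low, high) in SIZE_RANGES_NM.items()
            if low <= diameter_nm <= high]


if __name__ == "__main__":
    for diameter in (50.0, 150.0, 1200.0):
        print(diameter, is_nanoparticle(diameter), matching_ranges(diameter))
```

A 50 nm particle falls in all four listed ranges, a 150 nm particle only in the two broader ones, and a 1200 nm particle is outside the general definition altogether.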
Nanoparticles suitable for delivering a CasY polypeptide of the present disclosure, a CasY fusion polypeptide of the present disclosure, an RNP of the present disclosure, a nucleic acid of the present disclosure, or a CasY system of the present disclosure to a target cell can be provided in different forms, e.g., as solid nanoparticles (e.g., metals such as silver, gold, iron, or titanium; non-metals; lipid-based solids; polymers), suspensions of nanoparticles, or combinations thereof. Metallic, dielectric, and semiconductor nanoparticles, as well as hybrid structures (e.g., core-shell nanoparticles), can be prepared. Nanoparticles made of semiconducting material may also be called quantum dots if they are small enough (typically below 10 nm) that quantization of electronic energy levels occurs. Such nanoscale particles are used in biomedical applications as drug carriers or imaging agents and may be adapted for similar purposes in the present disclosure.
Semi-solid and soft nanoparticles are also suitable for delivering a CasY polypeptide of the disclosure, a CasY fusion polypeptide of the disclosure, an RNP of the disclosure, a nucleic acid of the disclosure, or a CasY system of the disclosure to a target cell. The prototype nanoparticle with semi-solid properties was a liposome.
In some cases, a CasY polypeptide of the disclosure, a CasY fusion polypeptide of the disclosure, an RNP of the disclosure, a nucleic acid of the disclosure, or a CasY system of the disclosure is delivered to a target cell using an exosome. Exosomes are endogenous nanovesicles that transport RNA and proteins, and can deliver RNA to the brain and other target organs.
In some cases, a liposome is used to deliver a CasY polypeptide of the disclosure, a CasY fusion polypeptide of the disclosure, an RNP of the disclosure, a nucleic acid of the disclosure, or a CasY system of the disclosure to a target cell. Liposomes are spherical vesicular structures composed of a unilamellar or multilamellar lipid bilayer surrounding an inner aqueous compartment and a relatively impermeable outer lipophilic phospholipid bilayer. Liposomes can be made from several different types of lipids; however, phospholipids are most commonly used. Although liposome formation is spontaneous when a lipid film is mixed with an aqueous solution, it can also be accelerated by applying force in the form of shaking, using a homogenizer, sonicator, or extrusion apparatus. Several other additives may be added to liposomes in order to modify their structure and properties. For example, cholesterol or sphingomyelin may be added to the liposome mixture to help stabilize the liposome structure and prevent leakage of the liposome's internal contents. A liposome formulation may consist mainly of natural phospholipids and lipids such as 1,2-distearoyl-sn-glycero-3-phosphatidylcholine (DSPC), sphingomyelin, egg phosphatidylcholine, and monosialoganglioside.
Stabilized nucleic acid-lipid particles (SNALPs) can be used to deliver a CasY polypeptide of the disclosure, a CasY fusion polypeptide of the disclosure, an RNP of the disclosure, a nucleic acid of the disclosure, or a CasY system of the disclosure to a target cell. The SNALP formulation may contain the lipids 3-N-[(methoxypoly(ethylene glycol) 2000)carbamoyl]-1,2-dimyristyloxy-propylamine (PEG-C-DMA), 1,2-dilinoleyloxy-N,N-dimethyl-3-aminopropane (DLinDMA), 1,2-distearoyl-sn-glycero-3-phosphocholine (DSPC), and cholesterol, in a 2:40:10:48 molar percent ratio. SNALP liposomes can be prepared by formulating DLinDMA and PEG-C-DMA with distearoylphosphatidylcholine (DSPC), cholesterol, and siRNA using a lipid/siRNA ratio of 25:1 and a cholesterol/DLinDMA/DSPC/PEG-C-DMA molar ratio of 48/40/10/2. The resulting SNALP liposomes can be about 80-100 nm in size. The SNALP can comprise synthetic cholesterol (Sigma-Aldrich, St. Louis, Mo., USA), dipalmitoylphosphatidylcholine (Avanti Polar Lipids, Alabaster, Ala., USA), 3-N-[(ω-methoxypoly(ethylene glycol) 2000)carbamoyl]-1,2-dimyristyloxypropylamine, and cationic 1,2-dilinoleyloxy-3-N,N-dimethylaminopropane. The SNALP may comprise synthetic cholesterol (Sigma-Aldrich), 1,2-distearoyl-sn-glycero-3-phosphocholine (DSPC; Avanti Polar Lipids Inc.), PEG-cDMA, and 1,2-dilinoleyloxy-3-(N,N-dimethyl)aminopropane (DLinDMA).
Other cationic lipids, such as the amino lipid 2,2-dilinoleyl-4-dimethylaminoethyl-[1,3]-dioxolane (DLin-KC2-DMA), can be used to deliver a CasY polypeptide of the present disclosure, a CasY fusion polypeptide of the present disclosure, an RNP of the present disclosure, a nucleic acid of the present disclosure, or a CasY system of the present disclosure to a target cell. A preformed vesicle with the following lipid composition may be contemplated: amino lipid, distearoylphosphatidylcholine (DSPC), cholesterol, and (R)-2,3-bis(octadecyloxy)propyl-1-(methoxypoly(ethylene glycol) 2000)propylcarbamate (PEG-lipid) in a molar ratio of 40/10/40/10, respectively, and a FVII siRNA/total lipid ratio of about 0.05 (w/w). To ensure a narrow particle size distribution in the range of 70-90 nm and a low polydispersity index of 0.11 ± 0.04 (n = 56), the particles can be extruded up to three times through 80 nm membranes prior to adding the guide RNA. Particles containing the highly potent amino lipid 16 may be used, in which the molar ratio of the four lipid components 16, DSPC, cholesterol, and PEG-lipid (50/10/38.5/1.5) may be further optimized to enhance in vivo activity.
Lipids can be formulated with the CasY system of the present disclosure, or one or more components thereof, or nucleic acids encoding the same, to form lipid nanoparticles (LNPs). Suitable lipids include, but are not limited to, DLin-KC2-DMA and C12-200; these, together with the co-lipids distearoylphosphatidylcholine, cholesterol, and PEG-DMG, can be formulated with the CasY system of the present disclosure or components thereof using a spontaneous vesicle formation procedure. The component molar ratio may be about 50/10/38.5/1.5 (DLin-KC2-DMA or C12-200/distearoylphosphatidylcholine/cholesterol/PEG-DMG).
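As a worked illustration of the 50/10/38.5/1.5 molar ratio given above, the following minimal sketch splits a chosen total lipid amount into per-component amounts. The function name `lipid_amounts` and the 10 micromole total are assumptions made only for this example; they are not taken from the disclosure.

```python
def lipid_amounts(molar_ratio, total_umol):
    """Split a total lipid amount (micromoles) according to a molar ratio.

    molar_ratio maps component name -> ratio part (parts need not sum to 100).
    """
    ratio_sum = sum(molar_ratio.values())
    return {name: total_umol * part / ratio_sum
            for name, part in molar_ratio.items()}


# Molar ratio stated in the text for the LNP formulation.
lnp_ratio = {
    "DLin-KC2-DMA or C12-200": 50.0,
    "distearoylphosphatidylcholine": 10.0,
    "cholesterol": 38.5,
    "PEG-DMG": 1.5,
}

# Hypothetical batch size, chosen only for illustration.
amounts = lipid_amounts(lnp_ratio, total_umol=10.0)

for name, umol in amounts.items():
    print(f"{name}: {umol:.2f} umol")
```

The same helper applies to the other ratios quoted in this section (e.g., 2:40:10:48 or 40/10/40/10), since it normalizes by the sum of the parts rather than assuming they add up to 100.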
The CasY systems of the present disclosure or components thereof may be delivered encapsulated in PLGA microspheres, such as those described further in U.S. published applications 20130252281 and 20130245107 and 20130244279.
Supercharged proteins can be used to deliver a CasY polypeptide of the disclosure, a CasY fusion polypeptide of the disclosure, an RNP of the disclosure, a nucleic acid of the disclosure, or a CasY system of the disclosure to a target cell. Supercharged proteins are a class of engineered or naturally occurring proteins with an unusually high positive or negative net theoretical charge. Both supernegatively and superpositively charged proteins exhibit the ability to withstand thermally or chemically induced aggregation. Superpositively charged proteins are also able to penetrate mammalian cells. Associating cargo (such as plasmid DNA, RNA, or other proteins) with these proteins can enable functional delivery of these macromolecules into mammalian cells both in vitro and in vivo.
Cell Penetrating Peptides (CPPs) can be used to deliver a CasY polypeptide of the disclosure, a CasY fusion polypeptide of the disclosure, an RNP of the disclosure, a nucleic acid of the disclosure, or a CasY system of the disclosure to a target cell. CPPs typically have an amino acid composition containing a high relative abundance of positively charged amino acids (such as lysine or arginine), or a sequence containing an alternating pattern of polar/charged amino acids and non-polar hydrophobic amino acids.
An implantable device can be used to deliver a CasY polypeptide of the disclosure, a CasY fusion polypeptide of the disclosure, an RNP of the disclosure, a nucleic acid of the disclosure (e.g., a CasY guide RNA, a nucleic acid encoding a CasY polypeptide, a donor template, etc.), or a CasY system of the disclosure to a target cell (e.g., a target cell in vivo, where the target cell is a target cell in circulation, a target cell in a tissue, a target cell in an organ, etc.). An implantable device suitable for delivering a CasY polypeptide of the present disclosure, a CasY fusion polypeptide of the present disclosure, an RNP of the present disclosure, a nucleic acid of the present disclosure, or a CasY system of the present disclosure to a target cell (e.g., a target cell in vivo, where the target cell is a target cell in circulation, a target cell in a tissue, a target cell in an organ, etc.) can comprise a container (e.g., a reservoir, a matrix, etc.) that comprises the CasY polypeptide, the CasY fusion polypeptide, the RNP, or the CasY system (or a component thereof, e.g., a nucleic acid of the present disclosure).
Suitable implantable devices can include, for example, a polymeric substrate (such as a matrix) for use as the device body, and in some cases additional stent material (such as a metal or additional polymer), as well as materials to enhance visibility and imaging. Implantable delivery devices can be advantageous to provide release over a localized and prolonged period of time, wherein the polypeptide and/or nucleic acid to be delivered is released directly to a target site, e.g., extracellular matrix (ECM), vasculature surrounding a tumor, diseased tissue, etc. Suitable implantable delivery devices include devices suitable for delivery to a cavity (such as the abdominal cavity) and/or any other type of administration where the drug delivery system is not anchored or attached, including biostable and/or degradable and/or bioabsorbable polymeric substrates, which may, for example, optionally be a matrix. In some cases, suitable implantable drug delivery devices comprise degradable polymers, wherein the primary release mechanism is bulk erosion. In some cases, suitable implantable drug delivery devices comprise non-degradable or slowly degradable polymers, where the primary release mechanism is diffusion rather than bulk erosion, such that the outer portion functions as a membrane and the inner portion serves as a drug reservoir that is virtually unaffected by the surrounding environment over a long period of time (e.g., about one week to about several months). Combinations of different polymers with different release mechanisms may also optionally be used. The concentration gradient can remain effectively constant over the lifetime of the total release period, and thus the diffusion rate is effectively constant (referred to as "zero mode" diffusion). 
The term "constant" means that the diffusion rate is maintained above the lower threshold of therapeutic effectiveness, although it may still optionally feature an initial burst and/or may fluctuate, e.g., increase and decrease to a certain degree. The diffusion rate can be so maintained for a prolonged period, and it can be considered constant to a certain level in order to optimize the therapeutically effective period, e.g., the effective silencing period.
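The effectively constant ("zero mode") diffusion rate described above can be contrasted with a release profile whose rate decays as the reservoir empties. The sketch below is only an illustration under stated assumptions: the first-order comparison model, the function names, and all numeric values (load, rate, rate constant) are hypothetical and do not come from the disclosure.

```python
import math


def zero_order_released(rate_per_day, days):
    """Cumulative amount released when the release rate is constant."""
    return rate_per_day * days


def first_order_released(total_load, k_per_day, days):
    """Cumulative amount released when the rate is proportional to what remains."""
    return total_load * (1.0 - math.exp(-k_per_day * days))


# Hypothetical device parameters, for illustration only.
TOTAL_LOAD = 100.0   # arbitrary units of payload in the reservoir
RATE = 1.0           # units released per day under constant-rate release
K = 0.05             # per-day rate constant for the comparison model

for day in (0, 10, 20, 30):
    constant = min(zero_order_released(RATE, day), TOTAL_LOAD)
    decaying = first_order_released(TOTAL_LOAD, K, day)
    print(f"day {day:2d}: constant-rate {constant:6.2f}, decaying-rate {decaying:6.2f}")
```

Under the constant-rate model, equal time intervals release equal amounts; under the decaying model, each successive interval releases less, which is the behavior a reservoir-and-membrane design aims to avoid.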
In some cases, the implantable delivery system is designed to protect the nucleotide-based therapeutic agent from degradation, either chemically or due to attack by enzymes and other factors in the subject.
The implantation site or target site of the device may be selected for maximum therapeutic efficacy. For example, the delivery device may be implanted within or near the tumor environment, or within or near a blood supply associated with the tumor. The target location may be, for example: 1) sites of brain degeneration, as in Parkinson's disease or Alzheimer's disease, at the basal ganglia, white matter, and gray matter; 2) the spine, as in the case of amyotrophic lateral sclerosis (ALS); 3) the cervix; 4) active and chronic inflammatory joints; 5) the dermis, as in the case of psoriasis; 6) sympathetic and sensory nerve sites for analgesia; 7) a bone; 8) sites of acute or chronic infection; 9) the vagina; 10) the inner ear (auditory system, labyrinth of the inner ear, vestibular system); 11) the trachea; 12) the heart (coronary arteries, epicardium); 13) the urinary tract or bladder; 14) the biliary system; 15) parenchymal tissues including, but not limited to, kidney, liver, and spleen; 16) lymph nodes; 17) salivary glands; 18) the gums; 19) intra-articular sites (into joints); 20) the eye; 21) brain tissue; 22) a cerebral ventricle; 23) cavities, including the abdominal cavity (for example, but without limitation, in the case of ovarian cancer); 24) the esophagus; 25) the rectum; and 26) the vasculature.
Insertion methods (such as implantation) may optionally be those already used for other types of tissue implantation and/or for insertion and/or for tissue sampling, optionally without modification, or alternatively with only minor modifications. Such methods optionally include, but are not limited to, brachytherapy methods, biopsy, endoscopy with and/or without ultrasound, stereotactic methods of accessing brain tissue, and laparoscopy (including laparoscopic implantation into joints, abdominal organs, the bladder wall, and body cavities).
Modified host cells
The present disclosure provides a modified cell comprising a CasY polypeptide of the present disclosure and/or a nucleic acid comprising a nucleotide sequence encoding a CasY polypeptide of the present disclosure. The present disclosure provides a modified cell comprising a CasY polypeptide of the disclosure, wherein the modified cell is a cell that does not normally comprise a CasY polypeptide of the disclosure. The present disclosure provides a modified cell (e.g., a genetically modified cell) comprising a nucleic acid comprising a nucleotide sequence encoding a CasY polypeptide of the disclosure. The present disclosure provides a genetically modified cell genetically modified with an mRNA comprising a nucleotide sequence encoding a CasY polypeptide of the disclosure. The present disclosure provides a genetically modified cell genetically modified with a recombinant expression vector comprising a nucleotide sequence encoding a CasY polypeptide of the disclosure. The present disclosure provides a genetically modified cell genetically modified with a recombinant expression vector comprising: a) a nucleotide sequence encoding a CasY polypeptide of the disclosure; and b) a nucleotide sequence encoding a CasY guide RNA of the disclosure. The present disclosure provides a genetically modified cell genetically modified with a recombinant expression vector comprising: a) a nucleotide sequence encoding a CasY polypeptide of the disclosure; b) a nucleotide sequence encoding a CasY guide RNA of the disclosure; and c) a nucleotide sequence encoding a donor template.
The cell that serves as a recipient for a CasY polypeptide of the present disclosure and/or a nucleic acid comprising a nucleotide sequence encoding a CasY polypeptide of the present disclosure and/or a CasY guide RNA of the present disclosure can be any of a variety of cells, including, for example, in vitro cells; in vivo cells; ex vivo cells; primary cells; cancer cells; animal cells; plant cells; algal cells; fungal cells; and the like. Cells that serve as recipients of a CasY polypeptide of the disclosure and/or a nucleic acid comprising a nucleotide sequence encoding a CasY polypeptide of the disclosure and/or a CasY guide RNA of the disclosure are referred to as "host cells" or "target cells". The host cell or target cell may be a recipient of a CasY system of the present disclosure. The host cell or target cell may be a recipient of a CasY RNP of the present disclosure. The host cell or target cell may be a recipient of a single component of a CasY system of the present disclosure.
Non-limiting examples of cells (target cells) include: prokaryotic cells, eukaryotic cells, bacterial cells, archaeal cells, cells of single-cell eukaryotic organisms, protozoan cells, cells from plants (e.g., cells from plant crops, fruits, vegetables, grains, soybean, corn, maize, wheat, seeds, tomatoes, rice, cassava, sugarcane, pumpkin, hay, potatoes, cotton, cannabis, tobacco, flowering plants, conifers, gymnosperms, angiosperms, ferns, clubmosses, hornworts, liverworts, mosses, dicotyledons, monocotyledons, etc.), algal cells (e.g., cells of Botryococcus braunii, Chlamydomonas reinhardtii, Nannochloropsis gaditana, Chlorella pyrenoidosa, Sargassum (gulfweed), and the like), seaweeds (e.g., kelp), fungal cells (e.g., yeast cells, cells from mushrooms), animal cells, cells from invertebrates (e.g., fruit flies, cnidarians, echinoderms, nematodes, etc.), cells from vertebrates (e.g., fish, amphibians, reptiles, birds, mammals), cells from mammals (e.g., ungulates (e.g., pigs, cows, goats, sheep); rodents (e.g., rats, mice); non-human primates; humans; felines (e.g., cats); canines (e.g., dogs); etc.), and the like. In some cases, the cell is a cell that does not originate from a natural organism (e.g., the cell can be a synthetically made cell, also referred to as an artificial cell).
The cells can be in vitro cells (e.g., established cultured cell lines). The cells may be ex vivo cells (cultured cells from an individual). The cell can be an in vivo cell (e.g., a cell in an individual). The cell may be an isolated cell. The cell may be a cell inside an organism. The cell may be an organism. The cell can be a cell in a cell culture (e.g., an in vitro cell culture). The cell may be one of a collection of cells. The cell may be prokaryotic or derived from a prokaryotic cell. The cell may be a bacterial cell or may be derived from a bacterial cell. The cells may be archaeal cells or derived from archaeal cells. The cell may be or be derived from a eukaryotic cell. The cell may be a plant cell or derived from a plant cell. The cells may be animal cells or derived from animal cells. The cells may be invertebrate cells or derived from invertebrate cells. The cells may be vertebrate cells or derived from vertebrate cells. The cells may be mammalian cells or derived from mammalian cells. The cells may be rodent cells or derived from rodent cells. The cells may be human cells or derived from human cells. The cells may be microbial cells or derived from microbial cells. The cell may be a fungal cell or derived from a fungal cell. The cell may be an insect cell. The cell can be an arthropod cell. The cell may be a protozoan cell. The cells may be helminth cells.
Suitable cells include stem cells (e.g., embryonic stem (ES) cells, induced pluripotent stem (iPS) cells); germ cells (e.g., oocytes, sperm, oogonia, spermatogonia, etc.); and somatic cells, e.g., fibroblasts, oligodendrocytes, glial cells, hematopoietic cells, neurons, muscle cells, bone cells, liver cells, pancreatic cells, etc.
Suitable cells include human embryonic stem cells, embryonic cardiomyocytes, myofibroblasts, mesenchymal stem cells, autologous-transplanted expanded cardiomyocytes, adipocytes, totipotent cells, pluripotent cells, blood stem cells, myoblasts, adult stem cells, bone marrow cells, mesenchymal cells, embryonic stem cells, parenchymal cells, epithelial cells, endothelial cells, mesothelial cells, fibroblasts, osteoblasts, chondrocytes, exogenous cells, endogenous cells, stem cells, hematopoietic stem cells, bone marrow-derived progenitor cells, cardiomyocytes, skeletal cells, fetal cells, undifferentiated cells, pluripotent progenitor cells, unipotent progenitor cells, monocytes, cardiac myoblasts, skeletal myoblasts, macrophages, capillary endothelial cells, xenogeneic cells, allogeneic cells, and postpartum stem cells.
In some cases, the cell is an immune cell, a neuron, an epithelial cell, an endothelial cell, or a stem cell. In some cases, the immune cell is a T cell, a B cell, a monocyte, a natural killer cell, a dendritic cell, or a macrophage. In some cases, the immune cell is a cytotoxic T cell. In some cases, the immune cell is a helper T cell. In some cases, the immune cell is a regulatory T cell (Treg).
In some cases, the cell is a stem cell. The stem cells include adult stem cells. Adult stem cells are also known as somatic stem cells.
Adult stem cells reside in differentiated tissue, but retain the properties of self-renewal and the ability to give rise to multiple cell types, usually the cell types typical of the tissue in which the stem cells are found. Numerous examples of somatic stem cells are known to those of skill in the art, including muscle stem cells; hematopoietic stem cells; epithelial stem cells; neural stem cells; mesenchymal stem cells; mammary stem cells; intestinal stem cells; mesodermal stem cells; endothelial stem cells; olfactory stem cells; neural crest stem cells; and the like.
Stem cells of interest include mammalian stem cells, where the term "mammal" refers to any animal classified as a mammal, including humans; a non-human primate; domestic and farm animals; and zoo, laboratory, sports, or pet animals such as dogs, horses, cats, cattle, mice, rats, rabbits, and the like. In some cases, the stem cell is a human stem cell. In some cases, the stem cell is a rodent (e.g., mouse; rat) stem cell. In some cases, the stem cells are non-human primate stem cells.
The stem cells may express one or more stem cell markers, such as SOX9, KRT19, KRT7, LGR5, CA9, FXYD2, CDH6, CLDN18, TSPAN8, BPIFB1, OLFM4, CDH17, and PPARGC1A.
In some embodiments, the stem cell is a hematopoietic stem cell (HSC). HSCs are mesoderm-derived cells that can be isolated from bone marrow, blood, cord blood, fetal liver, and yolk sac. HSCs are characterized as CD34+ and CD3-. HSCs can repopulate the erythroid, neutrophil-macrophage, megakaryocyte, and lymphoid hematopoietic cell lineages in vivo. In vitro, HSCs can be induced to undergo at least some self-renewing cell divisions and can be induced to differentiate into the same lineages as are seen in vivo. Thus, HSCs can be induced to differentiate into one or more of erythroid cells, megakaryocytes, neutrophils, macrophages, and lymphoid cells.
In other embodiments, the stem cell is a neural stem cell (NSC). NSCs are capable of differentiating into neurons and glial cells (including oligodendrocytes and astrocytes). A neural stem cell is a multipotent stem cell that is capable of multiple divisions and, under specific conditions, can produce daughter cells that are neural stem cells, or neural progenitor cells that can be neuroblasts or glioblasts, e.g., cells committed to become one or more types of neurons and glial cells, respectively. Methods of obtaining NSCs are known in the art.
In other embodiments, the stem cell is a mesenchymal stem cell (MSC). MSCs, originally derived from the embryonic mesoderm and isolated from adult bone marrow, can differentiate to form muscle, bone, cartilage, fat, marrow stroma, and tendon. Methods of isolating MSCs are known in the art, and any known method can be used to obtain MSCs. See, e.g., U.S. Pat. No. 5,736,396, which describes the isolation of human MSCs.
In some cases, the cell is a plant cell. The plant cell may be a cell of a monocotyledonous plant. The cell may be a dicot cell.
For example, the cell can be a cell of a major agricultural plant, such as barley, beans (dry edible), canola, corn, cotton (Pima), cotton (upland), flaxseed, hay (alfalfa), hay (non-alfalfa), oats, peanuts, rice, sorghum, soybeans, sugar beets, sugarcane, sunflowers (oil), sunflowers (non-oil), sweet potatoes, tobacco (burley), tobacco (flue-cured), tomatoes, wheat (durum), wheat (spring), wheat (winter), and the like. As another example, the cell is a cell of a vegetable crop including, but not limited to, alfalfa sprouts, aloe leaves, arrowroot, arrowhead, artichokes, asparagus, bamboo shoots, banana flowers, bean sprouts, beans, beet tops, beets, bitter melon, bok choy, broccoli, broccoli rabe, brussels sprouts, cabbage sprouts, cactus leaves (nopales), calabaza, cardoon, carrots, cauliflower, celery, chayote, Chinese artichoke (crosnes), Chinese cabbage, Chinese celery, Chinese chives, choy sum, chrysanthemum leaves, collard greens, corn stalks, sweet corn, cucumbers, daikon, dandelion greens, dasheen, dau mue (pea tips), donqua (winter melon), eggplant, endive, escarole, fiddlehead ferns, field cress, frisee, gai choy (Chinese mustard), gailon, galanga (Siam, Thai ginger), garlic, ginger root, gobo, greens, Hanover salad greens, huauzontle, Jerusalem artichokes, jicama, kale greens, kohlrabi, lamb's quarters (quilete), lettuce (Bibb), lettuce (Boston), lettuce (Boston red), lettuce (green leaf), lettuce (iceberg), lettuce (lolla rossa), lettuce (oak leaf, green), lettuce (oak leaf, red), lettuce (processed), lettuce (red leaf), lettuce (romaine), lotus root, maguey (agave) leaves, malanga, mesclun mix, mizuna, moap (smooth luffa), moo, moqua (fuzzy squash), mushrooms, mustard, nagaimo, okra, ong choy, green onions, opo (long squash), ornamental corn, ornamental gourds, parsley, parsnips, peas, peppers (bell type), peppers, pumpkins, radicchio, radish sprouts, radishes, rape greens, rhubarb, romaine (baby red), rutabagas, salicornia (sea bean), sinqua (angled/ridged luffa), spinach, squash, straw bales, sugarcane, sweet potatoes, Swiss chard, tamarindo, taro, taro leaf, taro shoots, tatsoi, tomatoes, tomatoes (cherry type), tomatoes (grape type), tomatoes (plum type), turmeric, turnip tops, turnips, water chestnuts, yampi, yams (names), yu choy, yuca (cassava), and the like.
In some cases, the cell is an arthropod cell. For example, the cell can be a cell of a suborder, a family, a subfamily, a group, a subgroup, or a species of, e.g., Chelicerata, Myriapoda, Hexapoda, Arachnida, Insecta, Archaeognatha, Thysanura, Palaeoptera, Ephemeroptera, Odonata, Anisoptera, Zygoptera, Neoptera, Exopterygota, Plecoptera, Embioptera, Orthoptera, Zoraptera, Dermaptera, Dictyoptera, Notoptera, Grylloblattidae, Mantophasmatidae, Phasmatodea, Blattaria, Isoptera, Mantodea, Parapneuroptera, Psocoptera, Thysanoptera, Phthiraptera, Hemiptera, Endopterygota or Holometabola, Hymenoptera, Coleoptera, Strepsiptera, Raphidioptera, Megaloptera, Neuroptera, Mecoptera, Siphonaptera, Diptera, Trichoptera, or Lepidoptera.
In some cases, the cell is an insect cell. For example, in some cases, the cell is a cell of a mosquito, a grasshopper, a hemipteran insect (true bug), a fly, a flea, a bee, a wasp, an ant, a louse, a moth, or a beetle.
Kits
The present disclosure provides a kit comprising a CasY system of the present disclosure or components of a CasY system of the present disclosure.
The kit of the present disclosure may comprise: a) a CasY polypeptide of the present disclosure and a CasY guide RNA; b) a CasY polypeptide of the present disclosure, a CasY guide RNA, and a donor template nucleic acid; c) a CasY fusion polypeptide of the present disclosure and a CasY guide RNA; d) a CasY fusion polypeptide of the present disclosure, a CasY guide RNA, and a donor template nucleic acid; e) an mRNA encoding a CasY polypeptide of the disclosure, and a CasY guide RNA; f) an mRNA encoding a CasY polypeptide of the disclosure, a CasY guide RNA, and a donor template nucleic acid; g) an mRNA encoding a CasY fusion polypeptide of the disclosure, and a CasY guide RNA; h) an mRNA encoding a CasY fusion polypeptide of the disclosure, a CasY guide RNA, and a donor template nucleic acid; i) a recombinant expression vector comprising a nucleotide sequence encoding a CasY polypeptide of the disclosure and a nucleotide sequence encoding a CasY guide RNA; j) a recombinant expression vector comprising a nucleotide sequence encoding a CasY polypeptide of the disclosure, a nucleotide sequence encoding a CasY guide RNA, and a nucleotide sequence encoding a donor template nucleic acid; k) a recombinant expression vector comprising a nucleotide sequence encoding a CasY fusion polypeptide of the disclosure and a nucleotide sequence encoding a CasY guide RNA; l) a recombinant expression vector comprising a nucleotide sequence encoding a CasY fusion polypeptide of the disclosure, a nucleotide sequence encoding a CasY guide RNA, and a nucleotide sequence encoding a donor template nucleic acid; m) a first recombinant expression vector comprising a nucleotide sequence encoding a CasY polypeptide of the disclosure, and a second recombinant expression vector comprising a nucleotide sequence encoding a CasY guide RNA; n) a first recombinant expression vector comprising a nucleotide sequence encoding a CasY polypeptide of the disclosure, a second recombinant expression vector comprising a nucleotide sequence encoding a CasY guide RNA, and a donor template nucleic acid; o) a first recombinant expression vector comprising a nucleotide sequence encoding a CasY fusion polypeptide of the disclosure, and a second recombinant expression vector comprising a nucleotide sequence encoding a CasY guide RNA; p) a first recombinant expression vector comprising a nucleotide sequence encoding a CasY fusion polypeptide of the disclosure, a second recombinant expression vector comprising a nucleotide sequence encoding a CasY guide RNA, and a donor template nucleic acid; q) a recombinant expression vector comprising a nucleotide sequence encoding a CasY polypeptide of the disclosure, a nucleotide sequence encoding a first CasY guide RNA, and a nucleotide sequence encoding a second CasY guide RNA; or r) a recombinant expression vector comprising a nucleotide sequence encoding a CasY fusion polypeptide of the disclosure, a nucleotide sequence encoding a first CasY guide RNA, and a nucleotide sequence encoding a second CasY guide RNA; or some variation of one of (a) through (r).
The kit of the present disclosure may comprise: a) a component of a CasY system of the present disclosure as described above, or may comprise a CasY system of the present disclosure; and b) one or more additional reagents, e.g., i) a buffer; ii) a protease inhibitor; iii) nuclease inhibitors; iv) reagents required to develop or visualize the detectable label; v) a positive and/or negative control target DNA; vi) positive and/or negative control CasY guide RNA, and the like. The kit of the present disclosure may comprise: a) a component of a CasY system of the present disclosure as described above, or may comprise a CasY system of the present disclosure; and b) a therapeutic agent.
The kits of the present disclosure may comprise a recombinant expression vector comprising: a) an insertion site for inserting a nucleic acid comprising a nucleotide sequence encoding the portion of a CasY guide RNA that hybridizes to a target nucleotide sequence in a target nucleic acid; and b) a nucleotide sequence encoding the CasY-binding portion of a CasY guide RNA. The kits of the present disclosure may comprise a recombinant expression vector comprising: a) an insertion site for inserting a nucleic acid comprising a nucleotide sequence encoding the portion of a CasY guide RNA that hybridizes to a target nucleotide sequence in a target nucleic acid; b) a nucleotide sequence encoding the CasY-binding portion of a CasY guide RNA; and c) a nucleotide sequence encoding a CasY polypeptide of the disclosure.
Utility
The CasY polypeptides of the disclosure or the CasY fusion polypeptides of the disclosure can be used in a variety of methods (e.g., in combination with a CasY guide RNA, and in some cases also in combination with a donor template). For example, the CasY polypeptides of the disclosure can be used to (i) modify (e.g., cleave, e.g., nick; methylate, etc.) a target nucleic acid (DNA or RNA; single-or double-stranded); (ii) regulating transcription of the target nucleic acid; (iii) labeling the target nucleic acid; (iv) binding to a target nucleic acid (e.g., for the purposes of separation, labeling, imaging, tracking, etc.); (v) modifying a polypeptide (e.g., histone) associated with a target nucleic acid, and the like. Accordingly, the present disclosure provides a method of modifying a target nucleic acid. In some cases, a method of the present disclosure for modifying a target nucleic acid comprises contacting the target nucleic acid with: a) a CasY polypeptide of the present disclosure; and b) one or more (e.g., two) CasY guide RNAs. In some cases, a method of the present disclosure for modifying a target nucleic acid comprises contacting the target nucleic acid with: a) a CasY polypeptide of the present disclosure; b) a CasY guide RNA; and c) a donor nucleic acid (e.g., donor template). In some cases, the contacting step is performed in a cell in vitro. In some cases, the contacting step is performed in a cell in vivo. In some cases, the contacting step is performed in an ex vivo cell.
Because a method that uses a CasY polypeptide includes binding of the CasY polypeptide to a particular region in a target nucleic acid (the region being targeted by the associated CasY guide RNA), such methods are generally referred to herein as binding methods (e.g., a method of binding a target nucleic acid). However, it is to be understood that, while in some cases a binding method may result in nothing more than binding of the target nucleic acid, in other cases the method can have a different end result (e.g., the method can result in modification of the target nucleic acid (e.g., cleavage, methylation, etc.), modulation of transcription from the target nucleic acid, modulation of translation of the target nucleic acid, genome editing, modulation of a protein associated with the target nucleic acid, isolation of the target nucleic acid, etc.).
For examples of suitable methods, see, e.g., Jinek et al., Science. 2012 Aug 17;337(6096):816-21; Chylinski et al., RNA Biol. 2013 May;10(5):726-37; Ma et al., Biomed Res Int. 2013;2013:270805; Hou et al., Proc Natl Acad Sci U S A. 2013 Sep 24;110(39):15644-9; Jinek et al., Elife. 2013;2:e00471; Pattanayak et al., Nat Biotechnol. 2013 Sep;31(9):839-43; Qi et al., Cell. 2013 Feb 28;152(5):1173-83; Wang et al., Cell. 2013 May 9;153(4):910-8; Auer et al., Genome Res. 2013 Oct 31; Chen et al., Nucleic Acids Res. 2013 Nov 1;41(20):e19; Cheng et al., Cell Res. 2013 Oct;23(10):1163-71; Cho et al., Genetics. 2013 Nov;195(3):1177-80; DiCarlo et al., Nucleic Acids Res. 2013 Apr;41(7):4336-43; Dickinson et al., Nat Methods. 2013 Oct;10(10):1028-34; Ebina et al., Sci Rep. 2013;3:2510; Fujii et al., Nucleic Acids Res. 2013 Nov 1;41(20):e187; Hu et al., Cell Res. 2013 Nov;23(11):1322-5; Jiang et al., Nucleic Acids Res. 2013 Nov 1;41(20):e188; Larson et al., Nat Protoc. 2013 Nov;8(11):2180-96; Mali et al., Nat Methods. 2013 Oct;10(10):957-63; Nakayama et al., Genesis. 2013 Dec;51(12):835-43; Ran et al., Nat Protoc. 2013 Nov;8(11):2281-308; Ran et al., Cell. 2013 Sep 12;154(6):1380-9; Upadhyay et al., G3 (Bethesda). 2013 Dec 9;3(12):2233-8; Walsh et al., Proc Natl Acad Sci U S A. 2013 Sep 24;110(39):15514-5; Xie et al., Mol Plant. 2013 Oct 9; Yang et al., Cell. 2013 Sep 12;154(6):1370-9; and U.S. patents and patent applications: 8,906,616; 8,895,308; 8,889,418; 8,889,356; 8,871,445; 8,865,406; 8,795,965; 8,771,945; 8,697,359; 20140068797; 20140170753; 20140179006; 20140179770; 20140186843; 20140186919; 20140186958; 20140189896; 20140227787; 20140234972; 20140242664; 20140242699; 20140242700; 20140242702; 20140248702; 20140256046; 20140273037; 20140273226; 20140273230; 20140273231; 20140273232; 20140273233; 20140273234; 20140273235; 20140287938; 20140295556; 20140295557; 20140298547; 20140304853; 20140309487; 20140310828; 20140310830; 20140315985; 20140335063; 20140335620; 20140342456; 20140342457; 20140342458; 20140349400; 20140349405; 20140356867; 20140356956; 20140356958; 20140356959; 20140357523; 20140357530; 20140364333; and 20140377868; each of which is hereby incorporated by reference in its entirety.
For example, the present disclosure provides (but is not limited to) methods of cleaving a target nucleic acid; methods of editing a target nucleic acid; methods of modulating transcription of a target nucleic acid; methods of isolating a target nucleic acid; methods of binding a target nucleic acid; methods of imaging a target nucleic acid; methods of modifying a target nucleic acid; and the like.
As used herein, the phrase "contacting a target nucleic acid" (e.g., with a CasY polypeptide, with a CasY fusion polypeptide, etc.) encompasses all methods for contacting a target nucleic acid. For example, the CasY polypeptide can be provided to the cell as a protein, as an RNA (encoding the CasY polypeptide), or as a DNA (encoding the CasY polypeptide), whereas the CasY guide RNA can be provided as a guide RNA or as a nucleic acid encoding the guide RNA. Thus, when the method is performed, e.g., in a cell (e.g., inside a cell in vitro, inside a cell in vivo, inside a cell ex vivo), a method that includes contacting the target nucleic acid encompasses introducing into the cell any or all of the components in their active/final state (e.g., in the form of one or more proteins for the CasY polypeptide; in the form of a protein for the CasY fusion polypeptide; in some cases in the form of an RNA for the guide RNA), and also encompasses introducing into the cell one or more nucleic acids encoding one or more of the components (e.g., one or more nucleic acids comprising one or more nucleotide sequences encoding a CasY polypeptide or a CasY fusion polypeptide, one or more nucleic acids comprising one or more nucleotide sequences encoding one or more guide RNAs, a nucleic acid comprising a nucleotide sequence encoding a donor template, etc.). Because the methods can also be performed outside a cell in vitro, a method that includes contacting the target nucleic acid (unless otherwise indicated) encompasses contacting outside a cell in vitro, inside a cell in vitro, inside a cell in vivo, inside a cell ex vivo, and the like.
In some cases, a method of the present disclosure for modifying a target nucleic acid includes introducing a CasY locus (e.g., a nucleic acid comprising a nucleotide sequence encoding a CasY polypeptide together with about 1 kilobase (kb) to 5 kb of the nucleotide sequence surrounding the CasY-encoding nucleotide sequence) from a cell comprising a CasY locus (e.g., in some cases, a cell that comprises a CasY locus in its native state, i.e., the state in which it occurs in nature) into a target cell, wherein the target cell does not normally (in its native state) comprise a CasY locus. However, one or more spacer sequences encoding the guide sequences of the encoded crRNA(s) may be modified so as to target one or more target sequences of interest. Thus, for example, in some cases, a method of the present disclosure for modifying a target nucleic acid includes introducing into a target cell a CasY locus, e.g., a nucleic acid obtained from a source cell (e.g., a cell that comprises the CasY locus in its native state, i.e., the state in which it occurs in nature), wherein the nucleic acid has a length of 100 nucleotides (nt) to 5 kb (e.g., a length of 100 nt to 500 nt, 500 nt to 1 kb, 1 kb to 1.5 kb, 1.5 kb to 2 kb, 2 kb to 2.5 kb, 2.5 kb to 3 kb, 3 kb to 3.5 kb, 3.5 kb to 4 kb, or 4 kb to 5 kb) and comprises a nucleotide sequence encoding a CasY polypeptide. As described above, in some such cases, one or more spacer sequences encoding the guide sequences of the encoded crRNA(s) may be modified so as to target one or more target sequences of interest. In some cases, the method comprises introducing into the target cell: i) a CasY locus; and ii) a donor DNA template. In some cases, the target nucleic acid is in a cell-free composition in vitro. In some cases, the target nucleic acid is present in a target cell. In some cases, the target nucleic acid is present in a target cell, wherein the target cell is a prokaryotic cell. 
In some cases, the target nucleic acid is present in a target cell, wherein the target cell is a eukaryotic cell. In some cases, the target nucleic acid is present in a target cell, wherein the target cell is a mammalian cell. In some cases, the target nucleic acid is present in a target cell, wherein the target cell is a plant cell.
In some cases, a method of the present disclosure for modifying a target nucleic acid comprises contacting the target nucleic acid with a CasY polypeptide of the present disclosure or a CasY fusion polypeptide of the present disclosure. In some cases, a method of the present disclosure for modifying a target nucleic acid comprises contacting the target nucleic acid with a CasY polypeptide and a CasY guide RNA. In some cases, a method of the present disclosure for modifying a target nucleic acid comprises contacting the target nucleic acid with a CasY polypeptide, a first CasY guide RNA, and a second CasY guide RNA. In some cases, a method of the present disclosure for modifying a target nucleic acid comprises contacting the target nucleic acid with a CasY polypeptide of the present disclosure, a CasY guide RNA, and a donor DNA template.
Target nucleic acids and target cells of interest
A CasY polypeptide of the disclosure or a CasY fusion polypeptide of the disclosure, when bound to a CasY guide RNA, can bind to a target nucleic acid and in some cases can bind to and modify the target nucleic acid. The target nucleic acid can be any nucleic acid (e.g., DNA, RNA), can be double-stranded or single-stranded, can be any type of nucleic acid (e.g., chromosomal (genomic DNA), derived from a chromosome, plasmid, viral, extracellular, intracellular, mitochondrial, chloroplast, linear, circular, etc.), and can be from any organism (e.g., so long as the CasY guide RNA comprises a nucleotide sequence that hybridizes to a target sequence in the target nucleic acid such that the target nucleic acid can be targeted).
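The requirement that the guide RNA comprise a nucleotide sequence that hybridizes to a target sequence can be illustrated with a minimal Python sketch. The sketch assumes perfect Watson-Crick complementarity across the whole guide sequence and ignores any sequence-context requirement of the protein, mismatch tolerance, and secondary structure. The function names and example sequences are hypothetical and are not taken from the disclosure.

```python
# Illustrative sketch only: locate sites in a DNA sequence that are perfectly
# complementary to the targeting segment of a guide RNA.
# All names and sequences here are hypothetical.

RNA_TO_DNA = {"A": "T", "C": "G", "G": "C", "U": "A"}
DNA_TO_DNA = {"A": "T", "C": "G", "G": "C", "T": "A"}


def complementary_dna(guide_rna: str) -> str:
    """Return, written 5'->3', the DNA sequence that base-pairs with the guide."""
    return "".join(RNA_TO_DNA[base] for base in reversed(guide_rna.upper()))


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return "".join(DNA_TO_DNA[base] for base in reversed(dna.upper()))


def find_target_sites(guide_rna: str, dna: str) -> list[tuple[str, int]]:
    """List (strand, start) pairs where the guide can hybridize perfectly.

    For the "-" strand, start is an index into the reverse complement of dna.
    """
    site = complementary_dna(guide_rna)
    hits = []
    for strand, sequence in (("+", dna.upper()), ("-", reverse_complement(dna))):
        start = sequence.find(site)
        while start != -1:
            hits.append((strand, start))
            start = sequence.find(site, start + 1)
    return hits
```

A target nucleic acid with no hit on either strand would not be targetable by that guide under these simplified assumptions; a real design tool would also score partial matches.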
The target nucleic acid may be DNA or RNA. The target nucleic acid can be double-stranded (e.g., dsDNA, dsRNA) or single-stranded (e.g., ssRNA, ssDNA). In some cases, the target nucleic acid is single-stranded. In some cases, the target nucleic acid is single-stranded RNA (ssRNA). In some cases, the target ssRNA (e.g., target cellular ssRNA, viral ssRNA, etc.) is selected from: mRNA, rRNA, tRNA, non-coding RNA (ncRNA), long non-coding RNA (lncRNA), and microRNA (miRNA). In some cases, the target nucleic acid is single-stranded DNA (ssDNA) (e.g., viral DNA). As noted above, in some cases, the target nucleic acid is single-stranded.
The target nucleic acid can be located anywhere, e.g., outside the cell in vitro, inside the cell in vivo, inside the cell ex vivo. Suitable target cells (which may comprise a target nucleic acid, such as genomic DNA) include, but are not limited to: a bacterial cell; an archaeal cell; a cell of a unicellular eukaryotic organism; a plant cell; an algal cell, e.g., Botryococcus braunii, Chlamydomonas reinhardtii, Nannochloropsis gaditana, Chlorella pyrenoidosa, Sargassum patens C. Agardh, and the like; a fungal cell (e.g., a yeast cell); an animal cell; a cell from an invertebrate (e.g., Drosophila, a cnidarian, an echinoderm, a nematode, etc.); a cell of an insect (e.g., a mosquito; a bee; an agricultural pest; etc.); a cell of an arachnid (e.g., a spider; a tick; etc.); a cell from a vertebrate (e.g., a fish, an amphibian, a reptile, a bird, a mammal); a cell from a mammal (e.g., a cell from a rodent; a cell from a human; a cell from a non-human mammal; a cell from a rodent (e.g., a mouse, a rat); a cell from a lagomorph (e.g., a rabbit); a cell from an ungulate (e.g., a cow, a horse, a camel, a llama, a sheep, a goat, etc.); a cell from a marine mammal (e.g., a whale, a seal, an elephant seal, a dolphin, a sea lion, etc.)); and the like. Any type of cell can be of interest (e.g., a stem cell, such as an embryonic stem (ES) cell or an induced pluripotent stem (iPS) cell; a germ cell (e.g., an oocyte, a sperm, an oogonium, a spermatogonium, etc.); an adult stem cell; a somatic cell, e.g., a fibroblast, a hematopoietic cell, a neuron, a muscle cell, a bone cell, a liver cell, a pancreatic cell; an in vitro or in vivo embryonic cell of an embryo at any stage, e.g., a 1-cell, 2-cell, 4-cell, 8-cell, etc. stage zebrafish embryo; etc.).
The cells may be from an established cell line or they may be primary cells, where "primary cells," "primary cell lines," and "primary cultures" are used interchangeably herein to refer to cells and cell cultures that have been derived from a subject and allowed to grow in vitro for a limited number of passages (i.e., splittings of the culture). For example, primary cultures are cultures that may have been passaged 0, 1, 2, 4, 5, 10, or 15 times, but not enough times to go through the crisis stage. Typically, primary cell lines are maintained for fewer than 10 passages in vitro. The target cell may be a unicellular organism and/or may be grown in culture. If the cells are primary cells, they may be harvested from an individual by any convenient method. For example, leukocytes can be conveniently harvested by apheresis, leukocytapheresis, density gradient separation, and the like, while cells from tissues such as skin, muscle, bone marrow, spleen, liver, pancreas, lung, intestine, stomach, and the like can be conveniently harvested by biopsy.
In some of the above applications, the subject methods can be used to induce target nucleic acid cleavage, target nucleic acid modification, and/or binding of a target nucleic acid (e.g., for visualization, for collection and/or analysis, etc.) in mitotic or post-mitotic cells in vivo and/or ex vivo and/or in vitro (e.g., to disrupt production of a protein encoded by a targeted mRNA, to cleave or otherwise modify a target DNA, to genetically modify a target cell, etc.). Because the guide RNA provides specificity by hybridizing to the target nucleic acid, the mitotic and/or post-mitotic cells of interest in the disclosed methods can include cells from any organism (e.g., a bacterial cell; an archaeal cell; a cell of a unicellular eukaryotic organism; a plant cell; an algal cell, e.g., Botryococcus braunii, Chlamydomonas reinhardtii, Nannochloropsis gaditana, Chlorella pyrenoidosa, Sargassum patens C. Agardh, etc.; a fungal cell (e.g., a yeast cell); an animal cell; a cell from an invertebrate (e.g., Drosophila, a cnidarian, an echinoderm, a nematode, etc.); a cell from a vertebrate (e.g., a fish, an amphibian, a reptile, a bird, a mammal); a cell from a mammal; a cell from a rodent; a cell from a human; etc.). In some cases, a subject CasY protein (and/or a nucleic acid encoding the protein, such as DNA and/or RNA) and/or a CasY guide RNA (and/or a DNA encoding the guide RNA) and/or a donor template and/or an RNP can be introduced into an individual (i.e., the target cell can be in vivo) (e.g., a mammal, a rat, a mouse, a pig, a primate, a non-human primate, a human, etc.). In some cases, such administration can be for the purpose of treating and/or preventing a disease, e.g., by editing the genome of the targeted cell.
Plant cells include monocot and dicot cells. The cells may be root cells, leaf cells, xylem cells, phloem cells, cambium cells, apical meristem cells, parenchyma cells, collenchyma cells, sclerenchyma cells, and the like. Plant cells include cells of crops, such as wheat, corn, rice, sorghum, millet, soybean, and the like. Plant cells include cells of agricultural fruit and nut plants, such as plants that produce apricots, oranges, lemons, apples, plums, pears, almonds, and the like.
Other examples of target cells are listed above in the section entitled "modified cells". Non-limiting examples of cells (target cells) include: a prokaryotic cell, a eukaryotic cell, a bacterial cell, an archaeal cell, a cell of a unicellular eukaryotic organism, a protozoan cell, a cell from a plant (e.g., cells from plant crops, fruits, vegetables, grains, soybean, corn, maize, wheat, seeds, tomatoes, rice, cassava, sugarcane, pumpkin, hay, potatoes, cotton, cannabis, tobacco, flowering plants, conifers, gymnosperms, angiosperms, ferns, clubmosses, hornworts, liverworts, mosses, dicotyledons, monocotyledons, etc.), an algal cell (e.g., Botryococcus braunii, Chlamydomonas reinhardtii, Nannochloropsis gaditana, Chlorella pyrenoidosa, Sargassum patens C. Agardh, and the like), seaweeds (e.g., kelp), a fungal cell (e.g., a yeast cell, a cell from a mushroom), an animal cell, a cell from an invertebrate (e.g., Drosophila, a cnidarian, an echinoderm, a nematode, etc.), a cell from a vertebrate (e.g., a fish, an amphibian, a reptile, a bird, a mammal), a cell from a mammal (e.g., an ungulate (e.g., a pig, a cow, a goat, a sheep); a rodent (e.g., a rat, a mouse); a non-human primate; a human; a feline (e.g., a cat); a canine (e.g., a dog); etc.), and the like. In some cases, the cell is a cell that does not originate from a natural organism (e.g., the cell can be a synthetically made cell, also referred to as an artificial cell).
The cells can be in vitro cells (e.g., established cultured cell lines). The cells may be ex vivo cells (cultured cells from an individual). The cell can be an in vivo cell (e.g., a cell in an individual). The cell may be an isolated cell. The cell may be a cell inside an organism. The cell may be an organism. The cell can be a cell in a cell culture (e.g., an in vitro cell culture). The cell may be one of a collection of cells. The cell may be prokaryotic or derived from a prokaryotic cell. The cell may be a bacterial cell or may be derived from a bacterial cell. The cells may be archaeal cells or derived from archaeal cells. The cell may be or be derived from a eukaryotic cell. The cell may be a plant cell or derived from a plant cell. The cells may be animal cells or derived from animal cells. The cells may be invertebrate cells or derived from invertebrate cells. The cells may be vertebrate cells or derived from vertebrate cells. The cells may be mammalian cells or derived from mammalian cells. The cells may be rodent cells or derived from rodent cells. The cells may be human cells or derived from human cells. The cells may be microbial cells or derived from microbial cells. The cell may be a fungal cell or derived from a fungal cell. The cell may be an insect cell. The cell can be an arthropod cell. The cell may be a protozoan cell. The cells may be helminth cells.
Suitable cells include stem cells (e.g., embryonic stem (ES) cells, induced pluripotent stem (iPS) cells); germ cells (e.g., oocytes, sperm, oogonia, spermatogonia, etc.); and somatic cells, such as fibroblasts, oligodendrocytes, glial cells, hematopoietic cells, neurons, muscle cells, bone cells, liver cells, pancreatic cells, etc.
Suitable cells include human embryonic stem cells, embryonic cardiomyocytes, myofibroblasts, mesenchymal stem cells, autologous-transplanted expanded cardiomyocytes, adipocytes, totipotent cells, pluripotent cells, blood stem cells, myoblasts, adult stem cells, bone marrow cells, mesenchymal cells, embryonic stem cells, parenchymal cells, epithelial cells, endothelial cells, mesothelial cells, fibroblasts, osteoblasts, chondrocytes, exogenous cells, endogenous cells, stem cells, hematopoietic stem cells, bone marrow-derived progenitor cells, cardiomyocytes, skeletal cells, fetal cells, undifferentiated cells, pluripotent progenitor cells, unipotent progenitor cells, monocytes, cardiac myoblasts, skeletal myoblasts, macrophages, capillary endothelial cells, xenogeneic cells, allogeneic cells, and postpartum stem cells.
In some cases, the cell is an immune cell, a neuron, an epithelial cell, an endothelial cell, or a stem cell. In some cases, the immune cell is a T cell, a B cell, a monocyte, a natural killer cell, a dendritic cell, or a macrophage. In some cases, the immune cell is a cytotoxic T cell. In some cases, the immune cell is a helper T cell. In some cases, the immune cell is a regulatory T cell (Treg).
In some cases, the cell is a stem cell. The stem cells include adult stem cells. Adult stem cells are also known as somatic stem cells.
Adult stem cells reside in differentiated tissues, but retain the properties of self-renewal and the ability to give rise to multiple cell types, usually the cell types typical of the tissue in which the stem cells reside. Numerous examples of somatic stem cells are known to those skilled in the art, including muscle stem cells; hematopoietic stem cells; epithelial stem cells; neural stem cells; mesenchymal stem cells; mammary stem cells; intestinal stem cells; mesodermal stem cells; endothelial stem cells; olfactory stem cells; neural crest stem cells; and the like.
Stem cells of interest include mammalian stem cells, where the term "mammal" refers to any animal classified as a mammal, including humans; a non-human primate; domestic and farm animals; and zoo, laboratory, sports, or pet animals such as dogs, horses, cats, cattle, mice, rats, rabbits, and the like. In some cases, the stem cell is a human stem cell. In some cases, the stem cell is a rodent (e.g., mouse; rat) stem cell. In some cases, the stem cells are non-human primate stem cells.
The stem cells may express one or more stem cell markers, such as SOX9, KRT19, KRT7, LGR5, CA9, FXYD2, CDH6, CLDN18, TSPAN8, BPIFB1, OLFM4, CDH17, and PPARGC1A.
In some embodiments, the stem cell is a hematopoietic stem cell (HSC). HSCs are mesoderm-derived cells that can be isolated from bone marrow, blood, cord blood, fetal liver, and yolk sac. HSCs are characterized as CD34+ and CD3-. HSCs can repopulate the erythroid, neutrophil-macrophage, megakaryocyte, and lymphoid hematopoietic cell lineages in vivo. In vitro, HSCs can be induced to undergo at least some self-renewing cell divisions and can be induced to differentiate into the same lineages as seen in vivo. Thus, HSCs can be induced to differentiate into one or more of erythroid cells, megakaryocytes, neutrophils, macrophages, and lymphoid cells.
In other embodiments, the stem cell is a neural stem cell (NSC). NSCs are capable of differentiating into neurons and glia (including oligodendrocytes and astrocytes). A neural stem cell is a multipotent stem cell that is capable of multiple divisions and that, under specific conditions, can produce daughter cells that are neural stem cells, or neural progenitor cells that can be neuroblasts or glioblasts, e.g., cells committed to become one or more types of neurons and glial cells, respectively. Methods of obtaining NSCs are known in the art.
In other embodiments, the stem cell is a Mesenchymal Stem Cell (MSC). MSCs were originally derived from embryonic mesoderm and isolated from adult bone marrow and can differentiate to form muscle, bone, cartilage, fat, bone marrow stroma, and tendon. Methods of isolating MSCs are known in the art; and the MSCs may be obtained using any known method. See, e.g., U.S. Pat. No. 5,736,396, which describes the isolation of human MSCs.
In some cases, the cell is a plant cell. The plant cell may be a cell of a monocotyledonous plant. The cell may be a dicot cell.
In some cases, the cell is a plant cell. For example, the cell can be a cell of a major agricultural plant, such as barley, beans (dry edible), canola, corn, cotton (Pima), cotton (upland), flaxseed, hay (alfalfa), hay (non-alfalfa), oats, peanuts, rice, sorghum, soybeans, sugarbeets, sugarcane, sunflowers (oil), sunflowers (non-oil), sweet potatoes, tobacco (burley), tobacco (flue-cured), tomatoes, wheat (durum), wheat (spring), wheat (winter), and the like. As another example, the cell is a cell of a vegetable crop, including but not limited to, e.g., alfalfa sprouts, aloe leaves, arrowroot, arrowhead, artichokes, asparagus, bamboo shoots, banana flowers, bean sprouts, beans, beet tops, beets, bitter melon, bok choy, broccoli, broccoli rabe (rapini), brussels sprouts, cabbage, cabbage sprouts, cactus leaf (nopales), calabaza, cardoon, carrots, cauliflower, celery, chayote, Chinese artichoke (crosnes), Chinese cabbage, Chinese celery, Chinese chives, choy sum, chrysanthemum leaves (tung ho), collard greens, corn stalks, sweet corn, cucumbers, daikon, dandelion greens, dasheen, dau mue (pea tips), donqua (winter melon), eggplant, endive, escarole, fiddlehead ferns, field cress, frisee, gai choy (Chinese mustard), gailon, galanga (Siam, Thai ginger), garlic, ginger root, gobo, greens, Hanover salad greens, huauzontle, Jerusalem artichokes, jicama, kale greens, kohlrabi, lamb's quarters (quilete), lettuce (bibb), lettuce (Boston), lettuce (Boston red), lettuce (green leaf), lettuce (iceberg), lettuce (lolla rossa), lettuce (oak leaf, green), lettuce (oak leaf, red), lettuce (processed), lettuce (red leaf), lettuce (romaine), lettuce (ruby romaine), lettuce (Russian red mustard), linkok, lo bok, long beans, lotus root, mache, maguey (agave) leaves, malanga, mesclun mix, mizuna, moap (smooth luffa), moo, moqua (fuzzy squash), mushrooms, mustard, nagaimo, okra, ong choy, green onions, opo (long squash), ornamental corn, ornamental gourds, parsley, parsnips, peas, peppers (bell type), peppers, pumpkins, radicchio, radish sprouts, radishes, rape greens, rhubarb, romaine (baby red), rutabagas, salicornia (sea bean), sinqua (angled/ridged luffa), spinach, squash, straw bales, sugarcane, sweet potatoes, Swiss chard, tamarindo, taro, taro leaf, taro shoots, tatsoi, tepeguaje (guaje), tindora, tomatillos, tomatoes, tomatoes (cherry), tomatoes (grape type), tomatoes (plum type), turmeric, turnip tops greens, turnips, water chestnuts, yampi, yams (names), yu choy, yuca (cassava), and the like.
In some cases, the cell is an arthropod cell. For example, the cell can be a cell of a sub-order, a family, a sub-family, a group, a sub-group, or a species of, e.g., Chelicerata, Myriapodia, Hexipodia, Arachnida, Insecta, Archaeognatha, Thysanura, Palaeoptera, Ephemeroptera, Odonata, Anisoptera, Zygoptera, Neoptera, Exopterygota, Plecoptera, Embioptera, Orthoptera, Zoraptera, Dermaptera, Dictyoptera, Notoptera, Grylloblattidae, Mantophasmatidae, Phasmatodea, Blattaria, Isoptera, Mantodea, Parapneuroptera, Psocoptera, Thysanoptera, Phthiraptera, Hemiptera, Endopterygota or Holometabola, Hymenoptera, Coleoptera, Strepsiptera, Raphidioptera, Megaloptera, Neuroptera, Mecoptera, Siphonaptera, Diptera, Trichoptera, or Lepidoptera.
In some cases, the cell is an insect cell. For example, in some cases, the cell is a cell of a mosquito, a grasshopper, a hemipteran insect, a fly, a flea, a bee, a wasp, an ant, a louse, a moth, or a beetle.
Introducing the components into target cells
A CasY guide RNA (or a nucleic acid comprising a nucleotide sequence encoding a CasY guide RNA) and/or a CasY fusion polypeptide (or a nucleic acid comprising a nucleotide sequence encoding a CasY fusion polypeptide) and/or a donor polynucleotide can be introduced into a host cell by any of a variety of well-known methods.
Methods of introducing nucleic acids into cells are known in the art, and any convenient method can be used to introduce a nucleic acid (e.g., an expression construct) into a target cell (e.g., a eukaryotic cell, a human cell, a stem cell, a progenitor cell, etc.). Suitable methods are described in more detail elsewhere herein and include, for example, viral or bacteriophage infection, transfection, conjugation, protoplast fusion, lipofection, electroporation, calcium phosphate precipitation, polyethyleneimine (PEI)-mediated transfection, DEAE-dextran-mediated transfection, liposome-mediated transfection, particle gun technology, direct microinjection, nanoparticle-mediated nucleic acid delivery (see, e.g., Panyam et al., Adv Drug Deliv Rev. 2012 Sep 13. pii: S0169-409X(12)00283-9. doi: 10.1016/j.addr.2012.09.023), and the like. Any or all of the components can be introduced into the cell as a composition (e.g., comprising any convenient combination of a CasY polypeptide, a CasY guide RNA, a donor polynucleotide, etc.) using known methods, such as nucleofection.
Donor polynucleotides (Donor template)
Guided by a CasY guide RNA, a CasY protein in some cases generates site-specific double-strand breaks (DSBs), or single-strand breaks (SSBs) (e.g., when the CasY protein is a nickase variant), within a double-stranded DNA (dsDNA) target nucleic acid; these breaks are repaired either by non-homologous end joining (NHEJ) or by homology-directed recombination (HDR).
In some cases, contacting the target DNA (with the CasY protein and the CasY guide RNA) occurs under conditions that are permissive for non-homologous end joining or homology-directed repair. Thus, in some cases, the subject methods include contacting the target DNA with a donor polynucleotide (e.g., by introducing the donor polynucleotide into the cell), wherein the donor polynucleotide, a portion of the donor polynucleotide, a copy of the donor polynucleotide, or a portion of a copy of the donor polynucleotide integrates into the target DNA. In some cases, the method does not include contacting the cell with a donor polynucleotide, and the target DNA is modified such that nucleotides within the target DNA are deleted.
In some cases, when a CasY guide RNA (or a DNA encoding a CasY guide RNA) and a CasY protein (or a nucleic acid encoding a CasY protein, such as an RNA or a DNA, e.g., one or more expression vectors) are co-administered (e.g., contacted with a target nucleic acid, administered to a cell, etc.) with a donor polynucleotide sequence that includes at least one segment homologous to the target DNA sequence, the subject methods can be used to add (i.e., insert or replace) nucleic acid material to the target DNA sequence (e.g., to "knock in" a nucleic acid encoding a protein, an siRNA, an miRNA, etc.), to add a tag (e.g., 6xHis, a fluorescent protein (e.g., green fluorescent protein, yellow fluorescent protein, etc.), hemagglutinin (HA), FLAG, etc.), to add a regulatory sequence to a gene (e.g., a promoter, a polyadenylation signal, an internal ribosome entry sequence (IRES), a 2A peptide, a start codon, a stop codon, a splice signal, a localization signal, etc.), to modify a nucleic acid sequence (e.g., to introduce a mutation, or to remove a pathogenic mutation by introducing the correct sequence), and the like. Thus, a complex comprising a CasY guide RNA and a CasY protein can be used in any in vitro or in vivo application in which it is desirable to modify DNA in a site-specific (i.e., "targeted") manner, for example gene knock-out, gene knock-in, gene editing, gene tagging, etc., as used in, for example, gene therapy (e.g., to treat a disease or as an antiviral, antipathogenic, or anticancer therapeutic), the production of genetically modified organisms in agriculture, the large-scale production of proteins by cells for therapeutic, diagnostic, or research purposes, the induction of iPS cells, biological research, the targeting of genes of pathogens for deletion or replacement, etc.
In applications in which it is desirable to insert a polynucleotide sequence into the genome at the site where a target sequence is cleaved, a donor polynucleotide (a nucleic acid comprising a donor sequence) can also be provided to the cell. By "donor sequence" or "donor polynucleotide" or "donor template" is meant a nucleic acid sequence to be inserted at the site cleaved by the CasY protein (e.g., after dsDNA cleavage, after nicking the target DNA, after double nicking the target DNA, etc.). The donor polynucleotide can contain sufficient homology to the genomic sequence at the target site (e.g., 70%, 80%, 85%, 90%, 95%, or 100% homology to the nucleotide sequences flanking the target site, e.g., within about 50 bases or less (e.g., within about 30 bases, within about 15 bases, within about 10 bases, within about 5 bases) of the target site, or immediately flanking the target site) to support homology-directed repair between the donor polynucleotide and the genomic sequence to which it bears homology. Approximately 25, 50, 100, or 200 nucleotides, or more than 200 nucleotides, of sequence homology between the donor and the genomic sequence (or any integer value between 10 and 200 nucleotides, or more) can support homology-directed repair. The donor polynucleotide can be of any length, e.g., 10 nucleotides or more, 50 nucleotides or more, 100 nucleotides or more, 250 nucleotides or more, 500 nucleotides or more, 1000 nucleotides or more, 5000 nucleotides or more, and the like.
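The homology figures recited above can be made concrete with a short Python sketch that scores a donor homology arm against the genomic sequence flanking the target site. It is an illustration under stated assumptions: the two sequences are taken to be pre-aligned and of equal length (no gaps), and the default thresholds of 70% and 25 nucleotides are simply the lowest example values recited above, not limits of the disclosure. The function names are hypothetical.

```python
# Illustrative sketch only: score a donor homology arm against a genomic flank.
# Assumes ungapped, pre-aligned sequences of equal length. The default
# thresholds are example values from the text, not requirements.


def percent_identity(arm: str, flank: str) -> float:
    """Percentage of positions at which two equal-length sequences match."""
    if len(arm) != len(flank):
        raise ValueError("sequences must have equal length")
    if not arm:
        raise ValueError("sequences must be non-empty")
    matches = sum(a == b for a, b in zip(arm.upper(), flank.upper()))
    return 100.0 * matches / len(arm)


def arm_meets_example_thresholds(
    arm: str,
    flank: str,
    min_identity: float = 70.0,  # assumed cutoff (lowest percentage in the text)
    min_length: int = 25,        # assumed cutoff (lowest length in the text)
) -> bool:
    """True if the arm is long enough and similar enough to the flank."""
    return len(arm) >= min_length and percent_identity(arm, flank) >= min_identity
```

For example, an arm that differs from a four-base flank at one position has 75% identity, which passes the assumed identity cutoff but fails the assumed length cutoff.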
The donor sequence is typically not identical to the genomic sequence that it replaces. Rather, the donor sequence can contain one or more single-base changes, insertions, deletions, inversions, or rearrangements relative to the genomic sequence, so long as sufficient homology is present to support homology-directed repair (e.g., for gene correction, e.g., to convert a disease-causing base pair to a non-disease-causing base pair). In some embodiments, the donor sequence comprises a non-homologous sequence flanked by two regions of homology, such that homology-directed repair between the target DNA region and the two flanking sequences results in insertion of the non-homologous sequence at the target region. The donor sequence may also comprise a vector backbone containing sequences that are not homologous to the DNA region of interest and are not intended for insertion into the DNA region of interest. Typically, the homologous region(s) of a donor sequence will have at least 50% sequence identity to the genomic sequence with which recombination is desired. In certain embodiments, 60%, 70%, 80%, 90%, 95%, 98%, 99%, or 99.9% sequence identity is present. Any value between 1% and 100% sequence identity can be present, depending on the length of the donor polynucleotide.
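As an illustration only, the arrangement of an insert between two homology arms can be sketched in a few lines of Python. The function names (`build_donor`, `percent_identity`), the 0-based `cut_site` convention, and the default arm length are assumptions made for this sketch and are not taken from the disclosure; the identity measure is a simple ungapped comparison, not an alignment.

```python
def build_donor(genome: str, cut_site: int, insert: str, arm_len: int = 50) -> str:
    """Return left arm + insert + right arm for a cut between
    genome[cut_site - 1] and genome[cut_site] (hypothetical convention)."""
    if not (arm_len <= cut_site <= len(genome) - arm_len):
        raise ValueError("cut site is too close to a sequence end for this arm length")
    left_arm = genome[cut_site - arm_len:cut_site]    # homologous to sequence 5' of the cut
    right_arm = genome[cut_site:cut_site + arm_len]   # homologous to sequence 3' of the cut
    return left_arm + insert + right_arm


def percent_identity(a: str, b: str) -> float:
    """Ungapped percent identity between two equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)
```

A homology arm that carries a deliberate base change can be checked against the corresponding genomic segment with `percent_identity` to see whether it still meets a chosen identity threshold (for example, the 50% figure mentioned above).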
The donor sequence may comprise certain sequence differences compared to the genomic sequence, such as restriction sites, nucleotide polymorphisms, selectable markers (e.g., drug resistance genes, fluorescent proteins, enzymes, etc.), etc., which can be used to assess successful insertion of the donor sequence at the cleavage site or, in some cases, for other purposes (e.g., to indicate expression at the targeted genomic locus). In some cases, such nucleotide sequence differences, if located in a coding region, will not alter the amino acid sequence, or will produce silent amino acid changes (i.e., changes that do not affect protein structure or function). Alternatively, these sequence differences may include flanking recombination sequences, such as FLP sequences, loxP sequences, etc., which can be activated at a later time to remove the marker sequence.
In some cases, the donor sequence is provided to the cell as single-stranded DNA. In some cases, the donor sequence is provided to the cell as double-stranded DNA. It can be introduced into the cell in linear or circular form. If introduced in linear form, the ends of the donor sequence may be protected (e.g., from exonucleolytic degradation) by any convenient method, and such methods are known to those of skill in the art. For example, one or more dideoxynucleotide residues can be added to the 3' end of a linear molecule, and/or self-complementary oligonucleotides can be ligated to one or both ends (see, e.g., Chang et al (1987) Proc. Natl. Acad. Sci. USA 84: 4959-4963; Nehls et al (1996) Science 272: 886-889). Additional methods for protecting exogenous polynucleotides from degradation include, but are not limited to, the addition of one or more terminal amino groups and the use of modified internucleotide linkages, such as, for example, phosphorothioates, phosphoramidates, and O-methyl ribose or deoxyribose residues. As an alternative to protecting the ends of a linear donor sequence, additional lengths of sequence can be included outside the regions of homology; these can be degraded without affecting recombination. The donor sequence can also be delivered by a virus (e.g., adenovirus, AAV), as described elsewhere herein for nucleic acids encoding a CasY guide RNA and/or a CasY fusion polypeptide and/or a donor polynucleotide.
Transgenic non-human organism
As described above, in some cases, a nucleic acid (e.g., a recombinant expression vector) of the disclosure (e.g., a nucleic acid comprising a nucleotide sequence encoding a CasY polypeptide of the disclosure; a nucleic acid comprising a nucleotide sequence encoding a CasY fusion polypeptide of the disclosure, etc.) is used as a transgene to generate a transgenic non-human organism that produces a CasY polypeptide or a CasY fusion polypeptide of the disclosure. The present disclosure provides a transgenic non-human organism comprising a nucleotide sequence encoding a CasY polypeptide or a CasY fusion polypeptide of the disclosure.
Transgenic non-human animals
The present disclosure provides a transgenic non-human animal comprising a transgene comprising a nucleic acid comprising a nucleotide sequence encoding a CasY polypeptide or a CasY fusion polypeptide. In some embodiments, the genome of the transgenic non-human animal comprises a nucleotide sequence encoding a CasY polypeptide or a CasY fusion polypeptide of the disclosure. In some cases, the transgenic non-human animal is homozygous for the genetic modification. In some cases, the transgenic non-human animal is heterozygous for the genetic modification. In some embodiments, the transgenic non-human animal is a vertebrate, such as a fish (e.g., salmon, trout, zebrafish, goldfish, puffer fish, cave fish, etc.), an amphibian (e.g., frog, newt, salamander, etc.), a bird (e.g., chicken, turkey, etc.), a reptile (e.g., snake, lizard, etc.), a non-human mammal (e.g., an ungulate, e.g., pig, cow, goat, sheep, etc.; a lagomorph (e.g., rabbit); a rodent (e.g., rat, mouse); a non-human primate; etc.), and the like. In some cases, the transgenic non-human animal is an invertebrate. In some cases, the transgenic non-human animal is an insect (e.g., a mosquito; an agricultural pest, etc.). In some cases, the transgenic non-human animal is an arachnid.
The nucleotide sequence encoding a CasY polypeptide or a CasY fusion polypeptide of the disclosure can be under the control of (i.e., operably linked to) an unknown promoter (e.g., when the nucleic acid is randomly integrated into the genome of the host cell) or can be under the control of (i.e., operably linked to) a known promoter. Suitable known promoters can be any known promoter and include constitutively active promoters (e.g., CMV promoters), inducible promoters (e.g., heat shock promoters, tetracycline regulated promoters, steroid regulated promoters, metal regulated promoters, estrogen receptor regulated promoters, etc.), spatially and/or temporally limited promoters (e.g., tissue specific promoters, cell type specific promoters, etc.), and the like.
Transgenic plants
As described above, in some cases, a nucleic acid (e.g., a recombinant expression vector) of the disclosure (e.g., a nucleic acid comprising a nucleotide sequence encoding a CasY polypeptide of the disclosure; a nucleic acid comprising a nucleotide sequence encoding a CasY fusion polypeptide of the disclosure, etc.) is used as a transgene to generate a transgenic plant that produces a CasY polypeptide or a CasY fusion polypeptide of the disclosure. The present disclosure provides a transgenic plant comprising a nucleotide sequence encoding a CasY polypeptide or a CasY fusion polypeptide of the present disclosure. In some embodiments, the genome of the transgenic plant comprises the subject nucleic acid. In some embodiments, the transgenic plant is homozygous for the genetic modification. In some embodiments, the transgenic plant is heterozygous for the genetic modification.
Methods for introducing exogenous nucleic acids into plant cells are well known in the art. Such plant cells are considered "transformed" as defined above. Suitable methods include viral infection (such as double-stranded DNA virus), transfection, conjugation, protoplast fusion, electroporation, particle gun technology, calcium phosphate precipitation, direct microinjection, silicon carbide whisker technology, agrobacterium-mediated transformation, and the like. The choice of method will generally depend on the type of cell to be transformed and the environment under which the transformation takes place (i.e., in vitro, ex vivo or in vivo).
Transformation methods based on the soil bacterium Agrobacterium tumefaciens are particularly useful for introducing exogenous nucleic acid molecules into vascular plants. The wild-type form of Agrobacterium contains a Ti (tumor-inducing) plasmid that directs the production of tumorigenic crown gall growth on host plants. Transfer of the tumor-inducing T-DNA region of the Ti plasmid to the plant genome requires the virulence genes encoded by the Ti plasmid as well as the T-DNA borders, which are a set of direct DNA repeats that delineate the region to be transferred. Agrobacterium-based vectors are modified forms of Ti plasmids in which the tumor-inducing functions are replaced by the nucleic acid sequence of interest to be introduced into the plant host.
Agrobacterium-mediated transformation typically employs a cointegrate vector or a binary vector system, in which the components of the Ti plasmid are divided between a helper vector, which resides permanently in the Agrobacterium host and carries the virulence genes, and a shuttle vector, which contains the gene of interest bounded by T-DNA sequences. Various binary vectors are well known in the art and are commercially available, for example, from Clontech (Palo Alto, Calif.). Methods for co-culturing Agrobacterium with, for example, cultured plant cells or wounded tissue such as leaf tissue, root explants, hypocotyls, stem pieces, or tubers are also well known in the art. See, e.g., Glick and Thompson (eds.), Methods in Plant Molecular Biology and Biotechnology, Boca Raton, Fla.: CRC Press (1993).
Microprojectile-mediated transformation can also be used to produce the subject transgenic plants. This method, first described by Klein et al (Nature 327:70-73 (1987)), relies on microprojectiles, such as gold or tungsten particles, that are coated with the desired nucleic acid molecule by precipitation with calcium chloride, spermidine, or polyethylene glycol. The microprojectile particles are accelerated at high speed into angiosperm tissue using a device such as the BIOLISTIC PD-1000 (Biorad; Hercules, Calif.).
A nucleic acid of the present disclosure (e.g., a nucleic acid (e.g., a recombinant expression vector) comprising a nucleotide sequence encoding a CasY polypeptide or a CasY fusion polypeptide of the present disclosure) can be introduced into a plant in a manner that enables the nucleic acid to enter one or more plant cells, e.g., by an in vivo or ex vivo protocol. By "in vivo" is meant that the nucleic acid is administered to a living body of a plant, e.g., by infiltration. By "ex vivo" is meant that cells or explants are modified outside of the plant and then such cells or organs are regenerated into a plant. Various vectors suitable for stably transforming plant cells or for establishing transgenic plants have been described, including those described in Weissbach and Weissbach (1989) Methods for Plant Molecular Biology, Academic Press, and Gelvin et al (1990) Plant Molecular Biology Manual, Kluwer Academic Publishers. Specific examples include those derived from the Ti plasmid of Agrobacterium tumefaciens, and those disclosed by Herrera-Estrella et al (1983) Nature 303:209, Bevan (1984) Nucl Acid Res. 12: 8711-. Alternatively, non-Ti vectors can be used to transfer DNA into plants and cells by using free DNA delivery techniques. By using these methods, transgenic plants such as wheat, rice (Christou (1991) Bio/Technology 9:957-9 and 4462) and maize (Gordon-Kamm (1990) Plant Cell 2: 603-) can be produced. Immature embryos can also be good target tissues in monocotyledons for direct DNA delivery techniques using the particle gun (Weeks et al (1993) Plant Physiol 102: 1077-1084; Vasil (1993) Bio/Technology 10: 667-674; Wan and Lemeaux (1994) Plant Physiol 104:37-48) and for Agrobacterium-mediated DNA transfer (Ishida et al (1996) Nature Biotech 14: 745-750). Exemplary methods for introducing DNA into chloroplasts are biolistic bombardment, polyethylene glycol transformation of protoplasts, and microinjection (Danieli et al Nat. Biotechnology 16:345-348, 1998; Staub et al Nat. Biotechnology 18:333-338, 2000; O'Neill et al Plant J. 3:729-738, 1993; Knoblauch et al Nat Biotechnology 17: 906-; U.S. Pat. Nos. 5,451,513, 5,545,817, 5,545,818 and 5,576,198; International application No. WO 95/16783; and Boynton et al, Methods in Enzymology 217: 510-). Any vector suitable for the methods of biolistic bombardment, polyethylene glycol transformation of protoplasts, and microinjection will be suitable as a targeting vector for chloroplast transformation. Any double-stranded DNA vector may be used as a transformation vector, particularly when the method of introduction does not use Agrobacterium.
Genetically modifiable plants include cereals, forage crops, fruits, vegetables, oilseed crops, palms, forestry plants and grapevines. Specific examples of plants that can be modified are as follows: corn, banana, peanut, red pea, sunflower, tomato, canola, tobacco, wheat, barley, oat, potato, soybean, cotton, carnation, sorghum, lupin, and rice.
The present disclosure provides transformed plant cells, tissues, plants, and products that contain the transformed plant cells. The subject transformed cells, and tissues and products comprising the transformed cells, are characterized by the presence of the subject nucleic acid integrated into the genome and by the production, by the plant cells, of a CasY polypeptide or a CasY fusion polypeptide of the disclosure. The recombinant plant cells of the present invention are useful as populations of recombinant cells, or as a tissue, seed, whole plant, stem, fruit, leaf, root, flower, tuber, grain, animal feed, field of plants, and the like.
The nucleotide sequence encoding a CasY polypeptide or a CasY fusion polypeptide of the disclosure can be under the control of (i.e., operably linked to) an unknown promoter (e.g., when the nucleic acid is randomly integrated into the genome of the host cell) or can be under the control of (i.e., operably linked to) a known promoter. Suitable known promoters may be any known promoter and include constitutively active promoters, inducible promoters, spatially and/or temporally limited promoters, and the like.
Methods for identifying CRISPR RNA-directed endonucleases
Methods for identifying CRISPR RNA-directed endonucleases are provided. For example, in some embodiments, such methods include a step of detecting a nucleotide sequence encoding a Cas1 polypeptide among a plurality of metagenomic nucleotide sequences. Cas1 proteins are known in the art and are present near the CRISPR loci of class 2 CRISPR systems, which are CRISPR systems that include a single effector protein that acts as an endonuclease and does not need to interact with a protein complex in order to function properly. Although the Cas1 protein itself is involved in capturing new target sequences into the CRISPR locus, and is therefore not the effector protein that this method seeks to identify, the presence of a Cas1 protein near a CRISPR locus indicates that at least one other Cas protein present near the locus is likely to be an effector protein (an RNA-guided endonuclease).
As used herein, the term "metagenomics" means the parallel analysis of nucleic acids recovered from a plurality of microorganisms in a sample (e.g., an environmental sample, such as a sample containing an unknown amount of prokaryotes (bacteria/archaea) and possibly containing prokaryotes that have never been found and/or characterized). Nucleic acids can be recovered from such samples by any convenient method, and are typically recovered together from the entire sample, so that it is not known from which microorganism any given nucleic acid molecule originates prior to analysis. In some embodiments, the sample contains an unknown mixture and/or amount of microorganisms. The nucleic acids can then be sequenced to generate a plurality of metagenomic sequences. In some cases, the subject methods of identifying CRISPR RNA-directed endonucleases include a step of isolating a sample (e.g., an environmental sample). In some cases, the subject methods of identifying CRISPR RNA-directed endonucleases include the steps of isolating nucleic acids from a sample and/or assaying the sample to generate a plurality of metagenomic nucleotide sequences from the sample.
Once a Cas1 protein is identified, the subject method of identifying CRISPR RNA-directed endonucleases can include a step of detecting a CRISPR array (a repeat-spacer-repeat array) in the vicinity of the nucleotide sequence encoding Cas1. The method can then comprise a step of cloning (e.g., from a nucleic acid sample from which the plurality of metagenomic nucleotide sequences was derived) a CRISPR locus comprising the detected CRISPR array into an expression vector to generate a recombinant CRISPR locus expression vector. The function of the CRISPR locus can then be tested by determining the ability of the recombinant CRISPR locus expression vector to cleave a target nucleic acid. Any convenient assay may be used. In some embodiments, the determining step comprises introducing the recombinant CRISPR locus expression vector and the target nucleic acid into a cell (e.g., a heterologous host cell, such as an E. coli cell). See, for example, the PAM depletion assay of the working examples below (FIG. 5). In some cases, the determining step comprises introducing a library of plasmids into a population of host cells (e.g., E. coli cells), wherein each plasmid of the library has 4 to 10 (e.g., 5 to 10, 5 to 8, 6 to 10, 6 to 8, 5, 6, 7, 8) randomized nucleotides 5' and/or 3' of the target sequence. The host cells may already contain the recombinant CRISPR locus expression vector to be tested, or the recombinant CRISPR locus expression vector may be introduced after the library. Only a tested CRISPR locus that is functional, and that therefore comprises a functional CRISPR RNA-directed endonuclease, will result in the ability to cleave plasmids having the target sequence. The reason for including randomized sequences 5' and 3' of the target sequence is that the PAM sequence required by the desired endonuclease may not be known at the beginning of the experiment.
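The read-out of a randomized-flank plasmid library of the kind described above can be illustrated with a minimal Python sketch. It assumes that each randomized flank has already been counted in sequencing reads from the library before and after passage through cells carrying the locus; the function names, the pseudocount, and the 10-fold cutoff are hypothetical choices made for illustration and do not reproduce the analysis used in the working examples.

```python
from collections import Counter
from itertools import product


def randomized_flanks(n: int) -> list[str]:
    """Enumerate every possible n-nucleotide flank in a fully randomized library."""
    return ["".join(p) for p in product("ACGT", repeat=n)]


def depletion_ratios(before: Counter, after: Counter, pseudocount: float = 1.0) -> dict:
    """Frequency of each flank before selection divided by its frequency after.

    A large ratio means plasmids carrying that flank were lost, i.e. cleaved.
    The pseudocount (an assumption of this sketch) avoids division by zero.
    """
    flanks = set(before) | set(after)
    denom_before = sum(before.values()) + pseudocount * len(flanks)
    denom_after = sum(after.values()) + pseudocount * len(flanks)
    ratios = {}
    for flank in flanks:
        freq_before = (before[flank] + pseudocount) / denom_before
        freq_after = (after[flank] + pseudocount) / denom_after
        ratios[flank] = freq_before / freq_after
    return ratios


def depleted_flanks(before: Counter, after: Counter, min_ratio: float = 10.0) -> list[str]:
    """Flanks depleted at least min_ratio-fold; candidate PAM sequences."""
    ratios = depletion_ratios(before, after)
    return sorted(f for f, r in ratios.items() if r >= min_ratio)
```

Flanks returned by `depleted_flanks` are only candidates; in this sketch nothing distinguishes true PAM-dependent cleavage from other causes of plasmid loss, so a control library passed through cells lacking the locus would be needed for comparison.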
If the expression vector can cleave a target nucleic acid (e.g., a target nucleic acid having an appropriate target sequence and a PAM, such as a target sequence that matches at least one spacer sequence of the CRISPR array), the CRISPR locus comprises a nucleotide sequence encoding a candidate CRISPR RNA-directed endonuclease. Thus, the open reading frame encoding the CRISPR RNA-directed endonuclease from the CRISPR locus can then be identified. In some cases, it is desirable to identify a previously unknown CRISPR RNA-directed endonuclease, and thus in some cases, the identified polypeptide has less than 20% amino acid sequence identity (e.g., less than 15%, less than 10%, less than 5% amino acid sequence identity) to the amino acid sequence of a known CRISPR RNA-directed endonuclease polypeptide.
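The array-detection step of the method (finding a repeat-spacer-repeat array near a Cas1-encoding sequence) can be illustrated with a deliberately simple Python sketch. It looks only for exact repeats of one fixed length separated by spacers within a chosen length range; the function name, the default lengths, and the exact-match requirement are all assumptions of this sketch, and practical array finders typically tolerate mismatches and variable repeat lengths.

```python
def find_repeat_spacer_arrays(seq, repeat_len=24, min_spacer=20,
                              max_spacer=50, min_repeats=3):
    """Return (repeat, start_positions, spacers) for each exact repeat that
    occurs at least min_repeats times with spacer-sized gaps between copies."""
    # Index every window of length repeat_len by its start positions.
    positions = {}
    for i in range(len(seq) - repeat_len + 1):
        positions.setdefault(seq[i:i + repeat_len], []).append(i)

    arrays = []
    for repeat, starts in positions.items():
        if len(starts) < min_repeats:
            continue
        # Find the longest run of consecutive copies with spacer-sized gaps.
        best, run = [], [starts[0]]
        for prev, cur in zip(starts, starts[1:]):
            gap = cur - prev - repeat_len
            if min_spacer <= gap <= max_spacer:
                run.append(cur)
            else:
                if len(run) > len(best):
                    best = run
                run = [cur]
        if len(run) > len(best):
            best = run
        if len(best) >= min_repeats:
            spacers = [seq[s + repeat_len:n] for s, n in zip(best, best[1:])]
            arrays.append((repeat, best, spacers))
    return arrays
```

Because the window length is fixed, a true repeat longer than `repeat_len` would be reported several times as overlapping windows; merging such hits is left out to keep the sketch short.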
Examples of non-limiting aspects of the disclosure
Aspects of the inventive subject matter described above, including embodiments, can be beneficial alone or in combination with one or more other aspects or embodiments. Without limiting the foregoing description, certain non-limiting aspects of the present disclosure, numbered 1-123, are provided below. It will be apparent to those skilled in the art upon reading this disclosure that each individually numbered aspect may be used or combined with any of the individually numbered aspects preceding or following. This is intended to provide support for all such combinations of aspects and is not limited to the combinations of aspects explicitly provided below:
Aspects
1. A composition, comprising:
a) a CasY polypeptide or a nucleic acid molecule encoding said CasY polypeptide; and
b) a CasY guide RNA or one or more DNA molecules encoding said CasY guide RNA.
2. The composition of claim 1, wherein the CasY polypeptide comprises an amino acid sequence having 50% or greater identity to the amino acid sequence set forth in SEQ ID NO:1 or SEQ ID NO:2 (or the amino acid sequence set forth in any of SEQ ID NOS: 1-8).
3. The composition of claim 1 or 2, wherein the CasY guide RNA comprises a nucleotide sequence that has 80% or greater identity to a crRNA sequence set forth in any one of SEQ ID NOS: 11-15.
4. The composition of claim 1 or 2, wherein the CasY polypeptide is fused to an NLS sequence.
5. The composition of any one of claims 1-4, wherein the composition comprises a lipid.
6. The composition of any one of claims 1-4, wherein a) and b) are within a liposome.
7. The composition of any one of claims 1-4, wherein a) and b) are within a particle.
8. The composition of any one of claims 1-7, comprising one or more of: buffers, nuclease inhibitors and protease inhibitors.
9. The composition of any one of claims 1-8, wherein the CasY polypeptide comprises an amino acid sequence having 85% or greater identity to the amino acid sequence set forth in SEQ ID NO:1 or SEQ ID NO:2 (or the amino acid sequence set forth in any one of SEQ ID NO: 1-8).
10. The composition of any one of claims 1-9, wherein the CasY polypeptide is a nickase that is capable of cleaving only one strand of a double-stranded target nucleic acid molecule.
11. The composition of any one of claims 1-9, wherein the CasY polypeptide is a catalytically inactive CasY polypeptide (dCasY).
12. The composition of claim 10 or 11, wherein the CasY polypeptide comprises one or more mutations at positions corresponding to positions selected from the group consisting of: D672, E769 and D935 of SEQ ID NO:1.
13. The composition of any one of claims 1-12, further comprising a DNA donor template.
14. A CasY fusion polypeptide comprising: a CasY polypeptide fused to a heterologous polypeptide.
15. The CasY fusion polypeptide of claim 14, wherein the CasY fusion polypeptide comprises an amino acid sequence having 50% or greater identity to the amino acid sequence set forth in SEQ ID NO:1 or SEQ ID NO:2 (or the amino acid sequence set forth in any one of SEQ ID NOS: 1-8).
16. The CasY fusion polypeptide of claim 14, wherein the CasY fusion polypeptide comprises an amino acid sequence having 85% or greater identity to the amino acid sequence set forth in SEQ ID NO:1 or SEQ ID NO:2 (or the amino acid sequence set forth in any one of SEQ ID NOS: 1-8).
17. The CasY fusion polypeptide of any one of claims 14-16, wherein the CasY polypeptide is a nickase that is capable of cleaving only one strand of a double stranded target nucleic acid molecule.
18. The CasY fusion polypeptide of any one of claims 14-17, wherein the CasY polypeptide is a catalytically inactive CasY polypeptide (dCasY).
19. The CasY fusion polypeptide of claim 17 or 18, wherein the CasY polypeptide comprises one or more mutations at positions corresponding to positions selected from: D672, E769 and D935 of SEQ ID NO:1.
20. The CasY fusion polypeptide of any one of claims 14-19, wherein the heterologous polypeptide is fused to the N-terminus and/or C-terminus of the CasY polypeptide.
21. The CasY fusion polypeptide of any one of claims 14-20, which comprises an NLS.
22. The CasY fusion polypeptide of any one of claims 14-21, wherein the heterologous polypeptide is a targeting polypeptide that provides binding to a cell surface moiety on a target cell or target cell type.
23. The CasY fusion polypeptide of any one of claims 14-21, wherein the heterologous polypeptide exhibits enzymatic activity that modifies a target DNA.
24. The CasY fusion polypeptide of claim 23, wherein the heterologous polypeptide exhibits one or more enzymatic activities selected from the group consisting of: nuclease activity, methyltransferase activity, demethylase activity, DNA repair activity, DNA damage activity, deamination activity, dismutase activity, alkylation activity, depurination activity, oxidation activity, pyrimidine dimer formation activity, integrase activity, transposase activity, recombinase activity, polymerase activity, ligase activity, helicase activity, photolyase activity, and glycosylase activity.
25. The CasY fusion polypeptide of claim 24, wherein the heterologous polypeptide exhibits one or more enzymatic activities selected from the group consisting of: nuclease activity, methyltransferase activity, demethylase activity, deamination activity, depurination activity, integrase activity, transposase activity and recombinase activity.
26. The CasY fusion polypeptide of any one of claims 14-21, wherein the heterologous polypeptide exhibits an enzymatic activity that modifies a target polypeptide associated with a target nucleic acid.
27. The CasY fusion polypeptide of claim 26, wherein the heterologous polypeptide exhibits histone modification activity.
28. The CasY fusion polypeptide of claim 26 or 27, wherein the heterologous polypeptide exhibits one or more enzymatic activities selected from the group consisting of: methyltransferase activity, demethylase activity, acetyltransferase activity, deacetylase activity, kinase activity, phosphatase activity, ubiquitin ligase activity, deubiquitinating activity, adenylation activity, deadenylation activity, SUMOylating activity, deSUMOylating activity, ribosylation activity, deribosylation activity, myristoylation activity, demyristoylation activity, glycosylation activity (e.g., from O-GlcNAc transferase), and deglycosylation activity.
29. The CasY fusion polypeptide of claim 28, wherein the heterologous polypeptide exhibits one or more enzymatic activities selected from the group consisting of: methyltransferase activity, demethylase activity, acetyltransferase activity and deacetylase activity.
30. The CasY fusion polypeptide of any one of claims 14-21, wherein the heterologous polypeptide is an endosomal escape polypeptide.
31. The CasY fusion polypeptide of claim 30, wherein the endosome escape polypeptide comprises an amino acid sequence selected from the group consisting of: GLFXALLXLXL LXLLXA (SEQ ID NO:94) and GLFHALLHLLHSLWHLLLHA (SEQ ID NO:95), wherein each X is independently selected from lysine, histidine and arginine.
32. The CasY fusion polypeptide of any one of claims 14-21, wherein the heterologous polypeptide is a chloroplast transit peptide.
33. The CasY fusion polypeptide of claim 32, wherein the chloroplast transit peptide comprises an amino acid sequence selected from the group consisting of: MASMISSSAVTTVSRASRGQSAAMAPFGGLKSMTGFPVRKVNTDITSITSNGGRVKCMQVWPPIGKKKFETLSYLPPLTRDSRA (SEQ ID NO:83), MASMISSSAVTTVSRASRGQSAAMAPFGGLKSMTGFPVRKVNTDITSITSNGGRVKS (SEQ ID NO:84), MASSMLSSATMVASPAQATMVAPFNGLKSSAAFPATRKANNDITSITSNGGRVNCMQVWPPIEKKKFETLSYLPDLTDSGGRVNC (SEQ ID NO:85), MAQVSRICNGVQNPSLISNLSKSSQRKSPLSVSLKTQQHPRAYPISSSWGLKKSGMTLIGSELRPLKVMSSVSTAC (SEQ ID NO:86), MAQVSRICNGVWNPSLISNLSKSSQRKSPLSVSLKTQQHPRAYPISSSWGLKKSGMTLIGSELRPLKVMSSVSTAC (SEQ ID NO:87), MAQINNMAQGIQTLNPNSNFHKPQVPKSSSFLVFGSKKLKNSANSMLVLKKDSIFMQLFCSFRISASVATAC (SEQ ID NO:88), MAALVTSQLATSGTVLSVTDRFRRPGFQGLRPRNPADAALGMRTVGASAAPKQSRKPHRFDRRCLSMVV (SEQ ID NO:89), MAALTTSQLATSATGFGIADRSAPSSLLRHGFQGLKPRSPAGGDATSLSVTTSARATPKQQRSVQRGSRRFPSVVVC (SEQ ID NO:90), MASSVLSSAAVATRSNVAQANMVAPFTGLKSAASFPVSRKQNLDITSIASNGGRVQC (SEQ ID NO:91), MESLAATSVFAPSRVAVPAARALVRAGTVVPTRRTSSTSGTSGVKCSAAVTPQASPVISRSAAAA (SEQ ID NO:92), and MGAAATSMQSLKFSNRLVPPSRRLSPVPNNVTCNNLPKSAAPVRTVKCCASSWNSTINGAAATTNGASAASS (SEQ ID NO: 93).
34. The CasY fusion polypeptide of any one of claims 14-21, wherein the heterologous polypeptide is a protein that increases or decreases transcription.
35. The CasY fusion polypeptide of 34, wherein the heterologous polypeptide is a transcriptional repressor domain.
36. The CasY fusion polypeptide of 34, wherein the heterologous polypeptide is a transcriptional activation domain.
37. The CasY fusion polypeptide of any one of claims 14-21, wherein the heterologous polypeptide is a protein binding domain.
38. A nucleic acid molecule encoding a CasY fusion polypeptide as defined in any one of claims 14 to 37.
39. The nucleic acid molecule of 38, wherein said nucleotide sequence encoding said CasY fusion polypeptide is operably linked to a promoter.
40. The nucleic acid molecule of 39, wherein said promoter is functional in a eukaryotic cell.
41. The nucleic acid molecule of claim 40, wherein the promoter is functional in one or more of: plant cells, fungal cells, animal cells, invertebrate cells, fly cells, vertebrate cells, mammalian cells, primate cells, non-human primate cells, and human cells.
42. The nucleic acid molecule of any one of claims 39-41, wherein the promoter is one or more of: constitutive promoters, inducible promoters, cell type specific promoters, and tissue specific promoters.
43. The nucleic acid molecule of any one of claims 38-42, wherein the nucleic acid molecule is a recombinant expression vector.
44. The nucleic acid molecule of 43, wherein the recombinant expression vector is a recombinant adeno-associated viral vector, a recombinant retroviral vector, or a recombinant lentiviral vector.
45. The nucleic acid molecule of 39, wherein said promoter is functional in a prokaryotic cell.
46. The nucleic acid molecule of claim 38, wherein the nucleic acid molecule is mRNA.
47. One or more nucleic acid molecules encoding:
(a) a CasY guide RNA; and
(b) a CasY polypeptide.
48. One or more nucleic acid molecules of claim 47, wherein the CasY polypeptide comprises an amino acid sequence having 50% or greater identity to the amino acid sequence set forth in SEQ ID NO:1 or SEQ ID NO:2 (or the amino acid sequence set forth in any one of SEQ ID NOS: 1-8).
49. One or more nucleic acid molecules of claim 47, wherein the CasY polypeptide comprises an amino acid sequence having 85% or greater identity to the amino acid sequence set forth in SEQ ID NO:1 or SEQ ID NO:2 (or the amino acid sequence set forth in any one of SEQ ID NOS: 1-8).
50. One or more nucleic acid molecules of any one of 47-49, wherein the CasY guide RNA comprises a nucleotide sequence that has 80% or greater identity to a crRNA sequence set forth in any one of SEQ ID NOS: 11-15.
51. One or more nucleic acid molecules of any one of 47-50, wherein the CasY polypeptide is fused to an NLS sequence.
52. One or more nucleic acid molecules of any one of 47-51, wherein said one or more nucleic acid molecules comprises a nucleotide sequence encoding said CasY guide RNA operably linked to a promoter.
53. One or more nucleic acid molecules of any one of claims 47-52, wherein said one or more nucleic acid molecules comprises a nucleotide sequence encoding said CasY polypeptide operably linked to a promoter.
54. One or more nucleic acid molecules of claim 52 or 53, wherein said promoter operably linked to said nucleotide sequence encoding said CasY guide RNA and/or said promoter operably linked to said nucleotide sequence encoding said CasY polypeptide is functional in a eukaryotic cell.
55. One or more nucleic acid molecules of claim 54, wherein the promoter is functional in one or more of: plant cells, fungal cells, animal cells, invertebrate cells, fly cells, vertebrate cells, mammalian cells, primate cells, non-human primate cells, and human cells.
56. One or more nucleic acid molecules of any one of claims 53-55, wherein the promoter is one or more of: constitutive promoters, inducible promoters, cell type specific promoters, and tissue specific promoters.
57. One or more nucleic acid molecules of any one of claims 47-56, wherein the one or more nucleic acid molecules are one or more recombinant expression vectors.
58. The one or more nucleic acid molecules of claim 57, wherein the one or more recombinant expression vectors are selected from the group consisting of: one or more adeno-associated viral vectors, one or more recombinant retroviral vectors, or one or more recombinant lentiviral vectors.
59. One or more nucleic acid molecules of claim 53, wherein said promoter is functional in a prokaryotic cell.
60. A eukaryotic cell comprising one or more of:
a) a CasY polypeptide or a nucleic acid molecule encoding said CasY polypeptide,
b) a CasY fusion polypeptide or a nucleic acid molecule encoding said CasY fusion polypeptide, and
c) a CasY guide RNA or a nucleic acid molecule encoding said CasY guide RNA.
61. The eukaryotic cell of 60, comprising a nucleic acid molecule encoding the CasY polypeptide, wherein the nucleic acid molecule is integrated into the genomic DNA of the cell.
62. The eukaryotic cell of 60 or 61, wherein the eukaryotic cell is a plant cell, a mammalian cell, an insect cell, an arthropod cell, a fungal cell, an avian cell, a reptile cell, an amphibian cell, an invertebrate cell, a mouse cell, a rat cell, a primate cell, a non-human primate cell, or a human cell.
63. A cell comprising a CasY fusion polypeptide or a nucleic acid molecule encoding said CasY fusion polypeptide.
64. The cell of claim 63, wherein said cell is a prokaryotic cell.
65. The cell of 63 or 64, comprising a nucleic acid molecule encoding said CasY fusion polypeptide, wherein said nucleic acid molecule is integrated into the genomic DNA of said cell.
66. A method of modifying a target nucleic acid, the method comprising contacting the target nucleic acid with:
a) a CasY polypeptide; and
b) a CasY guide RNA comprising a guide sequence that hybridizes to a target sequence of the target nucleic acid,
wherein said contacting results in a modification of said target nucleic acid by said CasY polypeptide.
67. The method of 66, wherein the modification is cleavage of the target nucleic acid.
68. The method of 66 or 67, wherein the target nucleic acid is selected from the group consisting of: double-stranded DNA, single-stranded DNA, RNA, genomic DNA, and extrachromosomal DNA.
69. The method of any one of claims 66-68, wherein the contacting occurs outside of a cell in vitro.
70. The method of any one of claims 66-68, wherein the contacting occurs inside a cell in culture.
71. The method of any one of claims 66-68, wherein the contacting occurs inside a cell in vivo.
72. The method of 70 or 71, wherein the cell is a eukaryotic cell.
73. The method of 72, wherein the cell is selected from the group consisting of: plant cells, fungal cells, mammalian cells, reptile cells, insect cells, avian cells, fish cells, parasite cells, arthropod cells, invertebrate cells, vertebrate cells, rodent cells, mouse cells, rat cells, primate cells, non-human primate cells, and human cells.
74. The method of 70 or 71, wherein the cell is a prokaryotic cell.
75. The method of any one of claims 66-74, wherein said contacting results in genome editing.
76. The method of any one of claims 66-75, wherein the contacting comprises introducing into a cell: (a) said CasY polypeptide or a nucleic acid molecule encoding said CasY polypeptide, and (b) said CasY guide RNA or a nucleic acid molecule encoding said CasY guide RNA.
77. The method of claim 76, wherein the contacting further comprises: introducing a DNA donor template into the cell.
78. The method of any one of claims 66-77, wherein the CasY guide RNA comprises a nucleotide sequence that has 80% or greater identity to a crRNA sequence set forth in any one of SEQ ID NOS: 11-15.
79. The method of any one of claims 66-78, wherein the CasY polypeptide is fused to an NLS sequence.
80. A method of modulating transcription from a target DNA, modifying a target nucleic acid, or modifying a protein associated with a target nucleic acid, the method comprising contacting the target nucleic acid with:
a) a CasY fusion polypeptide comprising a CasY polypeptide fused to a heterologous polypeptide; and
b) a CasY guide RNA comprising a guide sequence that hybridizes to a target sequence of the target nucleic acid.
81. The method of 80, wherein the CasY guide RNA comprises a nucleotide sequence that has 80% or greater identity to a crRNA sequence set forth in any one of SEQ ID NOS: 11-15.
82. The method of 80 or 81, wherein the CasY fusion polypeptide comprises an NLS sequence.
83. The method of any one of claims 80-82, wherein the modification is not cleavage of the target nucleic acid.
84. The method of any one of claims 80-83, wherein the target nucleic acid is selected from the group consisting of: double-stranded DNA, single-stranded DNA, RNA, genomic DNA, and extrachromosomal DNA.
85. The method of any one of claims 80-84, wherein the contacting occurs outside of a cell in vitro.
86. The method of any one of claims 80-84, wherein the contacting occurs inside a cell in culture.
87. The method of any one of claims 80-84, wherein the contacting occurs inside a cell in vivo.
88. The method of 86 or 87, wherein the cell is a eukaryotic cell.
89. The method of 88, wherein the cell is selected from the group consisting of: plant cells, fungal cells, mammalian cells, reptile cells, insect cells, avian cells, fish cells, parasite cells, arthropod cells, invertebrate cells, vertebrate cells, rodent cells, mouse cells, rat cells, primate cells, non-human primate cells, and human cells.
90. The method of 86 or 87, wherein the cell is a prokaryotic cell.
91. The method of any one of claims 80-90, wherein the contacting comprises introducing into a cell: (a) said CasY fusion polypeptide or a nucleic acid molecule encoding said CasY fusion polypeptide, and (b) said CasY guide RNA or a nucleic acid molecule encoding said CasY guide RNA.
92. The method of any one of claims 80-91, wherein the CasY polypeptide is a catalytically inactive CasY polypeptide (dCasY).
93. The method of any one of claims 80-92, wherein the CasY polypeptide comprises one or more mutations at positions corresponding to positions selected from: D672, E769, and D935 of SEQ ID NO: 1.
94. The method of any one of claims 80-93, wherein the heterologous polypeptide exhibits enzymatic activity that modifies a target DNA.
95. The method of 94, wherein said heterologous polypeptide exhibits one or more enzymatic activities selected from the group consisting of: nuclease activity, methyltransferase activity, demethylase activity, DNA repair activity, DNA damage activity, deamination activity, dismutase activity, alkylation activity, depurination activity, oxidation activity, pyrimidine dimer formation activity, integrase activity, transposase activity, recombinase activity, polymerase activity, ligase activity, helicase activity, photolyase activity, and glycosylase activity.
96. The method of claim 95, wherein the heterologous polypeptide exhibits one or more enzymatic activities selected from the group consisting of: nuclease activity, methyltransferase activity, demethylase activity, deamination activity, depurination activity, integrase activity, transposase activity and recombinase activity.
97. The method of any one of claims 80-93, wherein the heterologous polypeptide exhibits an enzymatic activity that modifies a target polypeptide associated with a target nucleic acid.
98. The method of 97, wherein said heterologous polypeptide exhibits histone modification activity.
99. The method of 97 or 98, wherein the heterologous polypeptide exhibits one or more enzymatic activities selected from the group consisting of: methyltransferase activity, demethylase activity, acetyltransferase activity, deacetylase activity, kinase activity, phosphatase activity, ubiquitin ligase activity, deubiquitinating activity, adenylation activity, deadenylation activity, SUMOylating activity, deSUMOylating activity, ribosylation activity, deribosylation activity, myristoylation activity, demyristoylation activity, glycosylation activity (e.g., from O-GlcNAc transferase), and deglycosylation activity.
100. The method of 99, wherein the heterologous polypeptide exhibits one or more enzymatic activities selected from the group consisting of: methyltransferase activity, demethylase activity, acetyltransferase activity and deacetylase activity.
101. The method of any one of claims 80-93, wherein the heterologous polypeptide is a protein that increases or decreases transcription.
102. The method of 101, wherein the heterologous polypeptide is a transcriptional repressor domain.
103. The method of 101, wherein the heterologous polypeptide is a transcriptional activation domain.
104. The method of any one of claims 80-93, wherein the heterologous polypeptide is a protein binding domain.
105. A transgenic multicellular non-human organism whose genome comprises a transgene comprising a nucleotide sequence encoding one or more of:
a) a CasY polypeptide,
b) a CasY fusion polypeptide, and
c) a CasY guide RNA.
106. The transgenic multicellular non-human organism of claim 105 wherein the CasY fusion polypeptide comprises an amino acid sequence having 50% or greater amino acid sequence identity to the amino acid sequence set forth in SEQ ID NO:1 or SEQ ID NO:2 (or the amino acid sequence set forth in any one of SEQ ID NOS: 1-8).
107. The transgenic multicellular non-human organism of claim 105 wherein the CasY fusion polypeptide comprises an amino acid sequence having 85% or more amino acid sequence identity to the amino acid sequence set forth in SEQ ID NO:1 or SEQ ID NO:2 (or the amino acid sequence set forth in any one of SEQ ID NOS: 1-8).
108. The transgenic multicellular non-human organism of any one of 105-107, wherein the organism is a plant, monocot, dicot, invertebrate, insect, arthropod, arachnid, parasite, helminth, cnidarian, vertebrate, fish, reptile, amphibian, ungulate, bird, pig, horse, sheep, rodent, mouse, rat, or non-human primate.
109. A system, comprising:
a) a CasY polypeptide and a CasY guide RNA;
b) a CasY polypeptide, a CasY guide RNA and a DNA donor template;
c) a CasY fusion polypeptide and a CasY guide RNA;
d) a CasY fusion polypeptide, a CasY guide RNA, and a DNA donor template;
e) mRNA encoding a CasY polypeptide and a CasY guide RNA;
f) mRNA encoding a CasY polypeptide, a CasY guide RNA, and a DNA donor template;
g) mRNA encoding a CasY fusion polypeptide and a CasY guide RNA;
h) mRNA encoding a CasY fusion polypeptide, a CasY guide RNA, and a DNA donor template;
i) one or more recombinant expression vectors comprising: i) a nucleotide sequence encoding a CasY polypeptide, and ii) a nucleotide sequence encoding a CasY guide RNA;
j) one or more recombinant expression vectors comprising: i) a nucleotide sequence encoding a CasY polypeptide, ii) a nucleotide sequence encoding a CasY guide RNA, and iii) a DNA donor template;
k) one or more recombinant expression vectors comprising: i) a nucleotide sequence encoding a CasY fusion polypeptide, and ii) a nucleotide sequence encoding a CasY guide RNA; and
l) one or more recombinant expression vectors comprising: i) a nucleotide sequence encoding a CasY fusion polypeptide, ii) a nucleotide sequence encoding a CasY guide RNA, and iii) a DNA donor template.
110. The CasY system of 109, wherein the CasY fusion polypeptide comprises an amino acid sequence having 50% or greater amino acid sequence identity to the amino acid sequence set forth in SEQ ID NO:1 or SEQ ID NO:2 (or the amino acid sequence set forth in any one of SEQ ID NOS: 1-8).
111. The CasY system of 109, wherein the CasY fusion polypeptide comprises an amino acid sequence having 85% or greater amino acid sequence identity to the amino acid sequence set forth in SEQ ID NO:1 or SEQ ID NO:2 (or the amino acid sequence set forth in any one of SEQ ID NOS: 1-8).
112. The CasY system as described in any one of claims 109-111, wherein the donor template nucleic acid has a length of from 8 nucleotides to 1000 nucleotides.
113. The CasY system as described in any one of claims 109-111, wherein the donor template nucleic acid has a length of 25 nucleotides to 500 nucleotides.
114. A kit comprising the CasY system as described in any one of items 109-113.
115. The kit of claim 114, wherein the components of the kit are in the same container.
116. The kit of claim 114, wherein the components of the kit are in separate containers.
117. A sterile container comprising the CasY system as described in any one of items 109 and 116.
118. The sterile container of claim 117, wherein said container is a syringe.
119. An implantable device comprising the CasY system as described in any of 109 and 116.
120. The implantable device of 119, wherein the CasY system is within a matrix.
121. The implantable device of 119, wherein the CasY system is in a depot.
122. A method of identifying a CRISPR RNA-directed endonuclease, the method comprising:
detecting a nucleotide sequence encoding a Cas1 polypeptide in a plurality of metagenomic nucleotide sequences;
detecting a CRISPR array near the nucleotide sequence encoding Cas 1;
cloning a CRISPR locus comprising the detected CRISPR array from a nucleic acid sample from which the plurality of metagenomic nucleotide sequences are derived into an expression vector to generate a recombinant CRISPR locus expression vector;
determining the ability of the recombinant CRISPR locus expression vector to cleave a target nucleic acid, wherein a CRISPR locus having the ability to cleave a target nucleic acid comprises a nucleotide sequence encoding a CRISPR RNA-directed endonuclease; and
identifying an open reading frame in the CRISPR locus encoding a polypeptide having less than 20% amino acid sequence identity to the amino acid sequence of a known CRISPR RNA-directed endonuclease polypeptide.
123. The method of 122, wherein the assaying comprises introducing the recombinant CRISPR locus expression vector and target nucleic acid into a cell.
Examples
The following examples are put forth so as to provide those of ordinary skill in the art with a complete disclosure and description of how to make and use the present invention, and are not intended to limit the scope of what the inventors regard as their invention, nor are they intended to represent that the experiments below are all or the only experiments performed. Efforts have been made to ensure accuracy with respect to numbers used (e.g., amounts, temperature, etc.), but some experimental error and deviation should be accounted for. Unless otherwise indicated, parts are parts by weight, molecular weight is weight average molecular weight, temperature is in degrees Celsius, and pressure is at or near atmospheric. Standard abbreviations may be used, e.g., bp, base pair(s); kb, kilobase(s); pl, picoliter(s); s or sec, second(s); min, minute(s); h or hr, hour(s); aa, amino acid(s); nt, nucleotide(s); i.m., intramuscular(ly); i.p., intraperitoneal(ly); s.c., subcutaneous(ly); and the like.
Example 1
The work described herein includes analyzing metagenomic samples of microbial communities from groundwater, sediments, and acid mine drainage. A new class 2 CRISPR-Cas system not represented in cultured organisms was identified.
FIG. 3. CasY domains and similarity searches. (Panel a) Schematic domain representation of CasY deduced from distant homology alignment with AcCpf1 using HHpred. Conserved catalytic residues are marked with a red bar above the protein. CasY contains split RuvC domains (RuvC-I, RuvC-II and RuvC-III) in the C-terminal region, and contains a large novel N-terminal domain. The top hits from the following searches are shown below the schematic: (1) BLAST search against all proteins in NCBI (NR database, including model and environmental proteins). (2) Hidden Markov Model (HMM) search based on models constructed using all Cas proteins described in Makarova et al., Nat Rev Microbiol. 2015 Nov;13(11):722-36 and Shmakov et al., Mol Cell. 2015 Nov 5;60(3):385-97. (3) HHpred-based distant homology search. Hits are color coded based on their significance, and hit ranges and E-values are provided. Notably, CasY has only local hits. The 812 N-terminal amino acids of CasY have only one very minor local hit. Taken together, these findings indicate that CasY is a novel Cas protein. (Panel b) Different CasY-containing CRISPR locus scaffolds constructed from sequence data.
Example 2
FIG. 4 is a schematic representation of the CasY and C2C3 loci. Interference proteins are shown in green and acquisition proteins in red. The repeat sequence, folded using RNAstructure, is shown on the right and reveals a strong hairpin at the 5' end, suggesting that the CRISPR array is self-processed by CasY.
FIG. 5 (panels a to d). PAM-dependent plasmid interference by CasY. (Panel a) A PAM depletion assay was performed with CasY. E. coli containing the CasY CRISPR locus was transformed with a plasmid library in which 7 nucleotides were randomized 5' or 3' of the target sequence. Cells were selected for the target plasmid and transformants were pooled. The randomized regions were amplified and prepared for deep sequencing. Depleted sequences were identified and used to generate PAM signatures. (Panel b) The PAM signature generated for CasY.1 shows a strong preference for sequences containing a 5'-TA-3' flanking sequence 5' of the target. No 3' PAM was detected. (Panel c) Four different PAMs were assayed directly to verify the PAM determined by the PAM depletion assay. (Panel d) The PAM signature generated for CasY.2 shows a preference for sequences containing 5'-YR-3' and/or 5'-TR-3' (e.g., 5'-DTR-3') flanking sequences (lower and higher threshold, respectively) 5' of the target (where Y is T or C; R is A or G; and D is A, G, or T). No 3' PAM was detected.
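The degenerate PAM notation used in the FIG. 5 description (Y, R, D) can be made concrete with a short check. The following is only an illustrative sketch, not part of the described experiments: the function name `matches_pam`, the lookup table, and the example flanking sequences are all hypothetical, and the code assumes the PAM is read 5' to 3' from the bases immediately 5' of the target.

```python
# IUPAC codes as defined in the caption (Y, R, D), plus plain bases and N.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "Y": "CT",    # T or C
    "R": "AG",    # A or G
    "D": "AGT",   # A, G or T
    "N": "ACGT",  # any base
}

def matches_pam(flank: str, motif: str) -> bool:
    """Return True if the 3'-most bases of `flank` (the sequence lying
    immediately 5' of the target) fit the degenerate `motif`."""
    flank, motif = flank.upper(), motif.upper()
    if len(flank) < len(motif):
        return False
    tail = flank[-len(motif):]
    return all(base in IUPAC[code] for base, code in zip(tail, motif))

# Hypothetical 3-nt flanks, each tested against the motifs named in the caption.
examples = {
    flank: [m for m in ("TA", "YR", "TR", "DTR") if matches_pam(flank, m)]
    for flank in ("GTA", "CTG", "ACA", "GGC")
}
for flank, hits in examples.items():
    print(flank, hits)
```

Under these assumptions a flank ending in TA satisfies every listed motif, whereas a flank ending in CA satisfies only the looser 5'-YR-3' pattern.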
FIG. 6. (Panel a) 'Repeat' sequences from naturally occurring CasY guide RNAs (for CasY loci Y1-Y6). (Panel b) Schematic of CasY RNA-guided DNA cleavage. The CasY protein binds to the crRNA (CasY guide RNA) in the repeat region (black, repeat; red, spacer). Base pairing of the guide sequence of the guide RNA with a target sequence (blue) that has the correct protospacer adjacent motif (PAM) results in double-stranded cleavage of the target DNA.
Example 3: novel CRISPR-Cas systems from uncultured microorganisms
CRISPR-Cas adaptive immune systems have revolutionized genome engineering by providing programmable enzymes capable of site-specific DNA cleavage. However, current CRISPR-Cas technologies are based solely on systems from cultured bacteria, leaving untapped the vast majority of enzymes from organisms that have not been isolated. The data provided herein show that, using genome-resolved, culture-independent metagenomics, new CRISPR-Cas systems were identified, including the first reported Cas9 in the archaeal domain. This divergent Cas9 enzyme was found in little-studied nanoarchaea as part of an active CRISPR-Cas system. In bacteria, two previously unknown systems, CRISPR-CasX and CRISPR-CasY, were found, which are among the most compact systems identified so far. Notably, all required functional components were identified by metagenomics, which allowed validation of robust RNA-guided DNA interference activity in E. coli. The data herein show that interrogation of environmental microbial communities combined with experiments in living cells provides access to an unprecedented diversity of genomes, the content of which will expand the repertoire of microbe-based biotechnologies.
Results
Megabase-scale metagenomic datasets from groundwater, sediment and acid mine drainage microbial communities were analyzed in search of class 2 CRISPR-Cas systems not represented in cultured organisms. The first Cas9 proteins in the archaeal domain were identified, and two new CRISPR-Cas systems, CRISPR-CasX and CRISPR-CasY, were found in uncultivated bacteria (FIG. 7). Notably, both the archaeal Cas9 and CasY are encoded exclusively in the genomes of organisms from lineages with no known isolated representatives.
First identification of archaeal Cas9
One of the hallmarks of CRISPR-Cas9 was its presumed presence only in the bacterial domain. It was therefore surprising to find Cas9 proteins encoded in the genomes of the nanoarchaea ARMAN-1 (Candidatus Micrarchaeum acidiphilum ARMAN-1) and ARMAN-4 (Candidatus Parvarchaeum acidiphilum ARMAN-4) in acid mine drainage (AMD) metagenomic datasets. These findings extend the occurrence of Cas9-containing CRISPR systems to another domain of life.
The ARMAN-4 cas9 gene was found in 16 different samples in the same genomic context, but with no other adjacent cas genes (despite being centrally located in several DNA sequence contigs >25 kbp) and with only one adjacent CRISPR repeat-spacer unit (FIG. 13). The absence of a typical CRISPR array and of cas1, which encodes the universal CRISPR integrase, suggests a system with no capacity to acquire new spacers. No target could be identified for the spacer sequence, but given the conservation of the locus in samples collected over several years, its function as a "single-target" CRISPR-Cas system cannot be excluded at this time.
In contrast, the CRISPR-Cas loci in ARMAN-1, recovered from 15 different samples, include large CRISPR arrays adjacent to cas1, cas2, cas4 and cas9 genes. Numerous alternative ARMAN-1 CRISPR arrays were reconstructed, each with a largely conserved end (probably comprising the oldest spacers) and a variable region into which many distinct spacers have been incorporated (FIG. 8a and FIG. 14). Based on this hypervariability in spacer content, these data show that the ARMAN-1 CRISPR-Cas9 system is active in the sampled populations.
Notably, 56 of the putative spacer targets (protospacers) of the ARMAN-1 CRISPR-Cas9 system are located on a single 10 kbp genomic fragment, which is likely an ARMAN-1 virus because it encodes a high density of short hypothetical proteins (FIG. 8b). Indeed, cryo-electron tomographic reconstructions often identified viral particles attached to ARMAN cells. ARMAN-1 protospacers were also derived from a putative transposon within the genome of ARMAN-2 (another nanoarchaeon) and from a putative mobile element in the genomes of Thermoplasmatales archaea, including that of I-plasma, from the same ecosystem (FIG. 15). Direct cytoplasmic "bridges" were observed between ARMAN and Thermoplasmatales cells, implying a close relationship between them. The ARMAN-1 CRISPR-Cas9 system may therefore defend against transposon propagation between these organisms, a role reminiscent of piRNA-mediated defense against transposition in the eukaryotic germline.
Active DNA-targeting CRISPR-Cas systems use a 2 to 4 bp protospacer adjacent motif (PAM) located next to the target sequence to distinguish self from non-self. Examination of the sequences adjacent to the genomic target sequences revealed a strong 'NGG' PAM preference in ARMAN-1 (FIG. 8c). Cas9 also uses two separate transcripts, CRISPR RNA (crRNA) and trans-activating CRISPR RNA (tracrRNA), for RNA-guided DNA cleavage. Putative tracrRNAs were identified in the vicinity of both the ARMAN-1 and the ARMAN-4 CRISPR-Cas9 systems (FIG. 16). It was previously proposed that type II CRISPR systems are absent from archaea because archaea lack the host factor RNase III, which is responsible for maturation of the crRNA-tracrRNA guide complex. Notably, no RNase III homologues were identified in the ARMAN-1 genome (estimated completeness of 95%) and no internal promoters were predicted for the CRISPR array, suggesting an as yet unidentified mechanism of guide RNA production. Biochemical experiments testing the cleavage activity of the ARMAN-1 and ARMAN-4 Cas9 proteins purified from both Escherichia coli and yeast, as well as in vivo E. coli targeting assays, did not reveal any detectable activity (see FIG. 21 and FIG. 17).
CRISPR-CasX is a novel dual-RNA-guided CRISPR system
Apart from Cas9, only three families of class 2 Cas effector proteins have been found and experimentally validated: Cpf1, C2C1 and C2C2. Another gene, c2c3, which was identified only on small DNA fragments, has been proposed to also encode a protein of this kind. A new type of class 2 CRISPR-Cas system was found in the genomes of two bacteria recovered repeatedly from groundwater and sediment samples. The high degree of conservation of this system in two organisms belonging to different phyla (Deltaproteobacteria and Planctomycetes) suggests a recent cross-phylum transfer. This newly described system includes Cas1, Cas2, Cas4, and an uncharacterized ~980 aa protein, referred to herein as CasX. The CRISPR array associated with each CasX has highly similar repeats of 37 base pairs, spacers of 33-34 base pairs, and a putative tracrRNA between the Cas operon and the CRISPR array (FIG. 7b). BLAST searches revealed only weak similarity to transposases (e-value > 1 × 10^-4), with the similarity restricted to a specific region at the C-terminus of CasX. Distant homology detection and protein modeling identified a RuvC domain near the C-terminus of CasX, with an organization reminiscent of that found in type V CRISPR-Cas systems (FIG. 18). The remainder of the CasX protein (the 630 N-terminal amino acids) showed no detectable similarity to any known protein, suggesting that this is a novel class 2 effector. The combination of a tracrRNA with separate Cas1, Cas2, and Cas4 proteins is unique among type V systems. Furthermore, CasX is much smaller than any known type V protein: 980 aa, compared with a typical size of more than 1,200 aa for Cpf1, C2C1 and C2C3.
It was next asked whether, despite its small size and non-canonical locus content, CasX is capable of RNA-guided DNA targeting similar to that of the Cas9 and Cpf1 enzymes. To test this possibility, a plasmid encoding a minimal CRISPR-CasX locus was synthesized, which included casX, a short repeat-spacer array, and the intervening non-coding regions. When expressed in E. coli, this minimal locus blocked transformation by a plasmid bearing a target sequence identified by metagenomic analysis (FIG. 9a to 9c, FIG. 19). In addition, interference with transformation occurred only when the spacer sequence in the minimal locus matched the protospacer sequence in the plasmid target. To identify the PAM sequence of CasX, the transformation assay was repeated in E. coli using plasmids containing randomized sequences 5' or 3' of the target site. This analysis revealed a strict preference for the sequence 'TTCN' located immediately 5' of the protospacer sequence (FIG. 9d). No 3' PAM preference was observed (FIG. 19). Consistent with this finding, 'TTCA' is the sequence found upstream of the putative Deltaproteobacteria CRISPR-CasX protospacer identified in the environmental samples. Notably, the two CRISPR-CasX loci share the same PAM sequence, consistent with the high degree of homology between their CasX proteins.
Examples of both single-RNA-guided and dual-RNA-guided systems exist among type V CRISPR loci. Environmental metatranscriptomic data were used to determine whether CasX requires a tracrRNA for DNA targeting activity. This analysis revealed a non-coding RNA transcript, with sequence complementarity to the CRISPR repeat, encoded between the Cas2 open reading frame and the CRISPR array (FIG. 10). The transcriptomic mapping also showed that the CRISPR RNA (crRNA) is processed to include 22 nt of the repeat and 20 nt of the adjacent spacer, similar to the crRNA processing that occurs in CRISPR-Cas9 systems (FIG. 10a). In addition, a 2 nt 3' overhang was identified, consistent with RNase III-mediated processing of the crRNA-tracrRNA duplex (FIG. 10b). To determine the dependence of CasX activity on the putative tracrRNA, this region was deleted from the minimal CRISPR-CasX locus described above and the plasmid interference assay was repeated. Deletion of the putative tracrRNA-encoding sequence from the CasX plasmid abolished the robust transformation interference observed in its presence (FIG. 10c). Together, these results establish CasX as a novel, functional, DNA-targeting, dual-RNA-guided CRISPR enzyme.
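As a rough illustration of what a 'TTCN' PAM located immediately 5' of the protospacer means in practice, the sketch below scans one strand of a DNA sequence for candidate target sites. It is a minimal example under stated assumptions rather than the procedure used in the experiments: the function name `find_targets`, the default protospacer length of 20, and the toy sequence are hypothetical, and only the strand supplied is scanned.

```python
import re

def find_targets(seq: str, pam_regex: str = "TTC[ACGT]", spacer_len: int = 20):
    """Yield (pam_start, pam, protospacer) for every site on the given strand
    where a PAM is immediately followed (on its 3' side) by `spacer_len` bases.

    `pam_regex` encodes 'TTCN'; `spacer_len` is an assumed value chosen only
    for illustration.
    """
    seq = seq.upper()
    # A lookahead is used so that overlapping PAM occurrences are all reported.
    pattern = re.compile(rf"(?=({pam_regex})([ACGT]{{{spacer_len}}}))")
    for m in pattern.finditer(seq):
        yield m.start(), m.group(1), m.group(2)

# Toy sequence (made up): one TTCA PAM followed by a 20-nt stretch.
toy = "GG" + "TTCA" + "ACGTACGTACGTACGTACGT" + "CC"
hits = list(find_targets(toy))
for start, pam, protospacer in hits:
    print(start, pam, protospacer)
```

To cover both strands, the same function could be applied to the reverse complement of the input; that step is omitted here for brevity.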
CRISPR-CasY, a system found only in bacterial lineages lacking isolates
Another novel class 2 Cas protein was identified, encoded in the genomes of certain candidate phyla radiation (CPR) bacteria. These bacteria typically have a small cell size (based on cryo-TEM data and enrichment by filtration), very small genomes and limited biosynthetic capacity, indicating that they are most likely symbionts. A new ~1,200 aa Cas protein, referred to herein as CasY, appears to be part of a minimal CRISPR-Cas system that includes, at most, Cas1 and a CRISPR array (FIG. 11a). Most of the CRISPR arrays have unusually short spacers of 17-19 nt, but one system (CasY.5), which lacks Cas1, has longer spacers (27-29 nt). The six examples of CasY proteins identified had no significant sequence similarity to any protein in the public databases. A sensitive search using profile models (HMMs) built from published Cas proteins (refs. 3, 4) indicated that four of the six CasY proteins had local similarity to C2C3 in the C-terminal region overlapping the RuvC domains and in a small region (~45 aa) at the N-terminus (e-values 4 × 10^-11 to 3 × 10^-18) (see FIG. 18). C2C3 is a putative type V Cas effector that was identified on short contigs with no taxonomic affiliation and has not been experimentally validated. Like CasY, C2C3 was found next to arrays with short spacers and Cas1, but no other Cas proteins. Notably, two of the CasY proteins identified in this study had no significant similarity to C2C3, even though they share significant sequence similarity with the other CasY proteins (best BLAST hits: e-values 6 × 10^-85, 7 × 10^-75).
Given the low homology of CRISPR-CasY to any experimentally validated CRISPR locus, it was next asked whether this system confers RNA-guided DNA interference; however, because of the short spacer length, there was no reliable information about a possible PAM motif that might be required for such activity. To address this, the entire CRISPR-CasY.1 locus was synthesized with a shortened CRISPR array and introduced into E. coli. These cells were then challenged in a transformation assay with a target plasmid bearing a sequence matching a spacer in the array and containing an adjacent randomized 5' or 3' region to identify a potential PAM. Analysis of the transformants revealed depletion of sequences containing a 5' TA directly adjacent to the targeted sequence (FIG. 11b). Using this identified PAM sequence, the CasY.1 locus was tested against plasmids containing single PAMs. Plasmid interference was demonstrated only in the presence of a target containing the identified 5' TA PAM sequence (FIG. 11c). Thus, these data show that CRISPR-CasY has DNA interference activity.
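The logic of a PAM depletion readout can be sketched as a comparison of sequence frequencies before and after selection. The example below is a hedged illustration only: the function `pam_depletion`, the pseudocount, and all read counts are hypothetical and are not data or analysis code from the experiments described here.

```python
def pam_depletion(library_counts, survivor_counts, pseudocount=1.0):
    """Return {pam: fold_depletion}.

    Fold depletion is the frequency of a flanking sequence in the input
    library divided by its frequency among surviving transformants; values
    well above 1 mean the sequence was lost, as expected for a functional PAM.
    A pseudocount (an assumption of this sketch) avoids division by zero.
    """
    n = len(library_counts)
    lib_total = sum(library_counts.values()) + pseudocount * n
    surv_total = sum(survivor_counts.get(p, 0) for p in library_counts) + pseudocount * n
    scores = {}
    for pam, lib_n in library_counts.items():
        lib_freq = (lib_n + pseudocount) / lib_total
        surv_freq = (survivor_counts.get(pam, 0) + pseudocount) / surv_total
        scores[pam] = lib_freq / surv_freq
    return scores

# Made-up read counts for four dinucleotide flanks, for illustration only.
library = {"TA": 100, "GC": 100, "AT": 100, "CG": 100}
survivors = {"TA": 2, "GC": 98, "AT": 105, "CG": 95}

scores = pam_depletion(library, survivors)
most_depleted = max(scores, key=scores.get)
print(most_depleted, round(scores[most_depleted], 1))
```

With these toy counts the flank that nearly disappears from the surviving pool receives the highest score, which is the pattern that would point to it as a candidate PAM.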
Discussion
Novel class 2 CRISPR-Cas adaptive immune systems were identified and characterized in genomes from uncultivated bacteria and archaea. Evolutionary analysis of Cas1 (FIG. 12a), which is universal to active CRISPR loci, indicates that the archaeal Cas9 system described herein does not clearly fall into any existing type II subtype. The Cas1 phylogeny (and the presence of Cas4) clusters it with type II-B systems, but the sequence of Cas9 is more similar to type II-C proteins (FIG. 20). Thus, the archaeal type II system may have arisen as a fusion of type II-C and type II-B systems (FIG. 12b). Likewise, Cas1 phylogenetic analysis indicates that the Cas1 from the CRISPR-CasX system is distant from that of any other known type V system. Type V systems have been suggested to be the result of the fusion of a transposon with the adaptation module (Cas1-Cas2) of an ancestral type I system. It is therefore hypothesized that the CRISPR-CasX system emerged following a fusion event different from those that gave rise to the previously described type V systems. Strikingly, both the CRISPR-CasY and the putative C2C3 systems appear to lack Cas2, a protein that is thought to be essential for integrating DNA into the CRISPR locus. Given that all CRISPR-Cas systems are considered descendants of an ancestral type I system containing both Cas1 and Cas2, the CRISPR-CasY and C2C3 systems may have a different ancestry from other CRISPR-Cas systems or, alternatively, Cas2 may have been lost during their evolutionary history.
The discovery described herein of Cas9 in archaea and of two previously unknown CRISPR-Cas systems in bacteria used extensive DNA and RNA sequence datasets obtained from complex natural microbial communities. In the case of CasX and CasY, genomic context was crucial for predicting functions that would not have been evident from unassembled sequence information. Furthermore, the identification of a putative tracrRNA, as well as of targeted viral sequences uncovered through analysis of the metagenomic data, guided functional testing. Interestingly, some of the most compact CRISPR-Cas loci identified to date were found in organisms with very small genomes. A consequence of small genome size is that these organisms likely depend on other community members for basic metabolic requirements, and they have therefore remained largely outside the scope of traditional cultivation-based methods. The limited number of proteins required for interference makes these minimal systems especially valuable for the development of new genome editing tools. Importantly, it is shown herein that metagenomic discoveries related to CRISPR-Cas systems are not restricted to in silico observations, but can be introduced into an experimental setting where their function can be tested. Given that virtually all environments where life exists can now be probed by genome-resolved metagenomic methods, it is expected that the combined computational-experimental approach described herein will greatly expand the diversity of known CRISPR-Cas systems, providing new technologies for biological research and clinical applications.
Methods
Metagenomics and metatranscriptomics
Metagenomic samples from three different sites were analyzed: (1) acid mine drainage (AMD) samples collected between 2006 and 2010 from the Richmond Mine, Iron Mountain, California; (2) groundwater and sediment samples collected between 2007 and 2013 from the Rifle Integrated Field Research (IFRC) site, adjacent to the Colorado River near Rifle, Colorado; and (3) groundwater collected in 2009 and 2014 from Crystal Geyser, a cold, CO2-driven intermittent spring on the Colorado Plateau in Utah.
For the AMD data, DNA extraction methods and short-read sequencing were reported by Denef and Banfield (2012) and Miller et al. (2011). For the Rifle data, DNA and RNA extraction, as well as sequencing, assembly, and genome reconstruction, were described by Anantharaman et al. (2016) and Brown et al. (2015). For samples from Crystal Geyser, the methods followed those described by Probst et al. (2016) and Emerson et al. (2015). Briefly, DNA was extracted from the samples using the PowerSoil DNA Isolation Kit (MoBio Laboratories Inc., Carlsbad, CA, USA). RNA was extracted from 0.2 μm filtrates taken from six 2011 Rifle groundwater samples, as described by Brown et al. (2015). DNA was sequenced on the Illumina HiSeq 2000 platform, and metatranscriptomic cDNA on the 5500XL SOLiD platform. For the newly reported Crystal Geyser data and the reanalysis of the AMD data, sequences were assembled using IDBA-UD. Bowtie2 was used for mapping DNA and RNA (cDNA) reads, which were used to determine sequencing coverage and gene expression, respectively. Open reading frames (ORFs) were predicted on the assembled scaffolds using Prodigal. Scaffolds from the Crystal Geyser dataset were binned on the basis of differential coverage abundance patterns using a combination of ABAWACA, ABAWACA2 (https://github.com/CK7), MaxBin2, and tetranucleotide frequencies using emergent self-organizing maps (ESOM). Genomes were manually curated using %GC content, taxonomic affiliation, and genome completeness. Scaffolding errors were corrected using a Python script (https://github.com/christopherbbrown).
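As a rough illustration of the composition and coverage signals mentioned above (%GC content, tetranucleotide frequency, and read-mapping coverage), the following Python sketch computes each for a single scaffold. It is not the implementation used by the binning tools named above or by ESOM; the function names are hypothetical, reverse-complement tetramers are not merged, and coverage is approximated as mapped bases divided by scaffold length.

```python
from collections import Counter
from itertools import product
from typing import Dict, Iterable


def gc_percent(seq: str) -> float:
    """Percent G+C among unambiguous bases (A, C, G, T) of a scaffold."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def tetranucleotide_freqs(seq: str) -> Dict[str, float]:
    """Relative frequency of each of the 256 tetramers.

    Windows containing anything other than A, C, G, T (e.g. N) are ignored.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=4)]
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    total = sum(counts[k] for k in kmers)
    return {k: (counts[k] / total if total else 0.0) for k in kmers}


def mean_coverage(mapped_read_lengths: Iterable[int], scaffold_length: int) -> float:
    """Approximate mean coverage: total mapped bases / scaffold length."""
    return sum(mapped_read_lengths) / scaffold_length
```

Vectors of this kind, computed per scaffold, are the sort of input on which composition-based binning operates; coverage across several samples supplies the differential-abundance signal.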
CRISPR-Cas computational analysis
Assembled contigs from the various samples were scanned for known Cas proteins using hidden Markov model (HMM) profiles, which were built with the HMMER suite based on the alignments of Makarova et al. and Shmakov et al. CRISPR arrays were identified using a local version of the CRISPRFinder software. Loci containing both Cas1 and a CRISPR array were analyzed further if one of the 10 ORFs adjacent to the cas1 gene encoded an uncharacterized protein larger than 800 aa and no known cas interference genes were identified on the same contig. These large proteins were further analyzed as potential class 2 Cas effectors. Potential effectors were clustered into protein families based on sequence similarity using MCL. These protein families were expanded by building HMMs representing each family and using them to search the metagenomic datasets for similar Cas proteins. To verify that the protein families were indeed new, known homologs were searched for using BLAST against the non-redundant (nr) and metagenomic (env_nr) protein databases of NCBI, as well as HMM searches against the UniProt KnowledgeBase. Only proteins with no full-length hits (> 25% of the protein length) were considered novel. Distant homology searches of the putative Cas proteins were performed using HHpred from the HH-suite. High-scoring HHpred hits were used to infer domain architecture, based on comparison to solved crystal structures and on secondary structure predicted by JPred4. The HMM database, including the newly discovered Cas proteins, is available in Supplementary Data 1.
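The locus-selection rule described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the pipeline actually used: the `Orf` record, the `KNOWN_INTERFERENCE` label set, and the reading of "the 10 ORFs adjacent to Cas1" as a window of up to 10 ORFs on either side are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Orf:
    name: str
    length_aa: int
    annotation: Optional[str] = None  # e.g. "cas1"; None if uncharacterized


# Hypothetical labels standing in for "known Cas interference genes".
KNOWN_INTERFERENCE = {"cas3", "cas9", "cas10", "cpf1", "c2c1", "c2c2", "c2c3"}


def candidate_effectors(orfs: List[Orf], has_crispr_array: bool,
                        window: int = 10, min_len: int = 800) -> List[Orf]:
    """Return uncharacterized ORFs longer than min_len aa that lie within
    `window` ORFs of a cas1 gene, on a contig that carries a CRISPR array
    and no known interference gene. `orfs` must be in genomic order."""
    if not has_crispr_array:
        return []
    if any(o.annotation in KNOWN_INTERFERENCE for o in orfs):
        return []
    hits: List[Orf] = []
    for i, orf in enumerate(orfs):
        if orf.annotation != "cas1":
            continue
        lo, hi = max(0, i - window), min(len(orfs), i + window + 1)
        for neighbor in orfs[lo:i] + orfs[i + 1:hi]:
            is_large_unknown = (neighbor.annotation is None
                                and neighbor.length_aa > min_len)
            if is_large_unknown and neighbor not in hits:
                hits.append(neighbor)
    return hits
```

Proteins returned by such a filter would correspond to the large uncharacterized candidates that are subsequently clustered into families and checked for novelty.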
Spacer sequences were determined from the assembled data using CRISPRFinder. CRASS was used to locate additional spacers in short DNA reads of the relevant samples. Spacer targets (protospacers) were then identified by BLAST searches (using "-task blastn-short") against the relevant metagenomic assemblies for hits with ≤ 1 mismatch to the spacer. Hits belonging to contigs that contained the associated repeat were filtered out (to avoid identifying CRISPR arrays as protospacers). Protospacer adjacent motifs (PAMs) were identified by aligning the regions flanking the protospacers and visualizing the alignment using WebLogo. RNA structures were predicted using mFold. CRISPR array diversity was analyzed by manually aligning spacers, repeats, and flanking sequences from the assembled data. Manual alignments and contig visualizations were performed with Geneious 9.1.
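To make the target-matching criterion concrete, the Python sketch below scans a contig on both strands for windows that differ from a spacer by at most one mismatch and reports the flanking sequence on each side, which is the material from which a PAM would be inferred. It deliberately replaces BLAST with a naive ungapped scan for clarity; the function names and the default flank length of 7 nt are assumptions for the example, and the filtering of repeat-containing contigs is not shown.

```python
from typing import Iterator, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_protospacers(spacer: str, contig: str, max_mismatch: int = 1,
                      flank: int = 7) -> Iterator[Tuple[int, str, str, str]]:
    """Yield (start, strand, upstream_flank, downstream_flank) for each
    ungapped match of the spacer with at most max_mismatch mismatches.

    For strand "-", `start` and the flanks refer to the reverse-complemented
    contig, so flanks are always given in the orientation of the match.
    """
    n = len(spacer)
    for strand, seq in (("+", contig), ("-", revcomp(contig))):
        for i in range(len(seq) - n + 1):
            if hamming(spacer, seq[i:i + n]) <= max_mismatch:
                upstream = seq[max(0, i - flank):i]
                downstream = seq[i + n:i + n + flank]
                yield i, strand, upstream, downstream
```

Aligning the upstream or downstream flanks collected over many hits is what allows a conserved motif adjacent to the target to be visualized as a sequence logo.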
Phylogenetic analyses of Cas1 and Cas9 used proteins from the newly identified systems together with the proteins from Makarova et al. and Shmakov et al. A non-redundant set was compiled by clustering proteins with > 90% identity using CD-HIT. Alignments were produced with MAFFT, and maximum-likelihood phylogenies were constructed using RAxML with PROTGAMMALG as the substitution model and 100 bootstrap samplings. The Cas1 tree was rooted using the branch leading to casposons. Trees were visualized using FigTree 1.4.1 (http://tree.bio.ed.ac.uk/software/FigTree/) and iTOL v3.
Generation of heterologous plasmids
Metagenomic contigs were converted into minimal CRISPR interference plasmids by removing the proteins associated with acquisition for CasX and by reducing the size of the CRISPR array for both CasX and CasY. The minimal loci were synthesized as gBlocks (Integrated DNA Technologies) and assembled using Gibson Assembly.
PAM depletion assay
PAM depletion assays were conducted as previously described, with modifications. Plasmid libraries containing randomized PAM sequences were assembled by annealing a DNA oligonucleotide containing a target with a 7 nt randomized PAM region to a primer and extending with Klenow fragment (NEB). The double-stranded DNA was digested with EcoRI and NcoI and ligated into a pUC19 backbone. The ligated library was transformed into E. coli DH5α, >10⁸ cells were harvested, and the plasmids were extracted and purified. 200 ng of the pooled library was transformed into electrocompetent E. coli harboring a CRISPR locus or a control plasmid with no locus. The transformed cells were plated on selective media containing carbenicillin (100 mg L⁻¹) and chloramphenicol (30 mg L⁻¹) for 30 hours at 25 °C. Plasmid DNA was extracted, and the PAM sequence was amplified with adapters for Illumina sequencing. The 7 nt PAM region was extracted, and PAM frequencies were calculated for each 7 nt sequence. PAM sequences depleted above the specified threshold were used to generate a WebLogo.
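The frequency comparison at the end of this assay can be sketched in Python as shown below. This is an illustrative reading of the analysis rather than the script that was used: the position of the PAM immediately upstream of the target, the additive pseudocount, the default threshold of 10, and all function names are assumptions introduced for the example.

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def extract_pam(read: str, target: str, pam_len: int = 7) -> Optional[str]:
    """Return the pam_len nucleotides immediately upstream of the target,
    or None if the target is absent or too close to the read start."""
    i = read.find(target)
    if i < pam_len:  # also covers i == -1 (target not found)
        return None
    return read[i - pam_len:i]


def depleted_pams(control_pams: Iterable[str], selected_pams: Iterable[str],
                  threshold: float = 10.0,
                  pseudocount: float = 1.0) -> Dict[str, float]:
    """Return {PAM: depletion ratio} for PAMs whose smoothed frequency in the
    control library exceeds that in the CRISPR-locus library by more than
    `threshold`. pseudocount must be > 0 to avoid division by zero."""
    ctrl, sel = Counter(control_pams), Counter(selected_pams)
    observed = set(ctrl) | set(sel)
    k = len(observed)
    n_ctrl = sum(ctrl.values()) + pseudocount * k
    n_sel = sum(sel.values()) + pseudocount * k
    result: Dict[str, float] = {}
    for pam in observed:
        f_ctrl = (ctrl[pam] + pseudocount) / n_ctrl
        f_sel = (sel[pam] + pseudocount) / n_sel
        ratio = f_ctrl / f_sel
        if ratio > threshold:
            result[pam] = ratio
    return result
```

The keys of the returned dictionary are the sequences that would be passed to a logo generator; PAMs that support interference are expected to be under-represented in cells carrying the CRISPR locus relative to the control.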
Plasmid interference
Putative targets identified from metagenomic sequence analysis or PAM depletion assays were cloned into the pUC19 plasmid. 10 ng of target plasmid was transformed into electrocompetent E. coli (NEB Stable) containing the CRISPR locus plasmid. Cells were recovered for 2 hours at 25 °C, and an appropriate dilution was plated on selective media. Plates were incubated at 25 °C, and colony-forming units were counted. All plasmid interference experiments were performed in triplicate, and electrocompetent cells were prepared independently for each replicate.
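For orientation, colony counts from an assay of this kind are commonly converted to transformation efficiency and compared between target and control plasmids. The Python sketch below shows one way to do that arithmetic; the function names, the dilution and plating parameters, and the use of a simple ratio of replicate means are assumptions for illustration and are not taken from the experiments described herein.

```python
from statistics import mean
from typing import Sequence


def cfu_per_ug(colonies: int, dilution_factor: float, fraction_plated: float,
               dna_ng: float = 10.0) -> float:
    """Colony-forming units per microgram of transformed plasmid.

    dilution_factor: e.g. 100 for a 1:100 dilution of the recovery culture.
    fraction_plated: fraction of the (diluted) recovery volume that was plated.
    dna_ng: mass of plasmid transformed, in nanograms.
    """
    total_cfu = colonies * dilution_factor / fraction_plated
    return total_cfu / (dna_ng / 1000.0)


def fold_reduction(target_cfu: Sequence[float],
                   control_cfu: Sequence[float]) -> float:
    """Ratio of mean control efficiency to mean target efficiency
    across replicates; larger values indicate stronger interference."""
    return mean(control_cfu) / mean(target_cfu)
```

With three independently prepared replicates per condition, the per-replicate efficiencies would be passed as the two sequences to `fold_reduction`.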
ARMAN-Cas9 protein expression and purification
Expression constructs for Cas9 from ARMAN-1 (AR1) and ARMAN-4 (AR4) were assembled from gBlocks (Integrated DNA Technologies) that were codon-optimized for E. coli. The assembled genes were cloned into a pET-based expression vector as N-terminal His6-MBP or His6 fusion proteins. Expression vectors were transformed into BL21(DE3) E. coli cells and grown in LB broth at 37 °C. For protein expression, cells were induced during mid-log phase with 0.4 mM IPTG (isopropyl β-D-1-thiogalactopyranoside) and incubated overnight at 16 °C. All subsequent steps were conducted at 4 °C. Cell pellets were resuspended in lysis buffer (50 mM Tris-HCl pH 8, 500 mM NaCl, 1 mM TCEP, 10 mM imidazole) containing 0.5% Triton X-100 and supplemented with Complete protease inhibitor cocktail (Roche), and then lysed by sonication. The lysate was clarified by centrifugation at 15,000 g for 40 minutes and applied to Superflow Ni-NTA agarose (Qiagen) in batch. The resin was washed extensively with wash buffer A (50 mM Tris-HCl pH 8, 500 mM NaCl, 1 mM TCEP, 10 mM imidazole), followed by 5 column volumes of wash buffer B (50 mM Tris-HCl pH 8, 1 M NaCl, 1 mM TCEP, 10 mM imidazole). Protein was eluted from the Ni-NTA resin with elution buffer (50 mM Tris-HCl pH 8, 500 mM NaCl, 1 mM TCEP, 300 mM imidazole). The His6-MBP tag was removed by TEV protease during overnight dialysis against wash buffer A. Cleaved Cas9 was separated from the affinity tag on a second Ni-NTA agarose column. The protein was dialyzed into IEX buffer A (50 mM Tris-HCl pH 7.5, 300 mM NaCl, 1 mM TCEP, 5% glycerol) and then applied to a 5 mL Heparin HiTrap column (GE Life Sciences). Cas9 was eluted over a linear NaCl gradient (0.3-1.5 M). Fractions were pooled and concentrated with a 30 kDa spin concentrator (Thermo Fisher). Where applicable, Cas9 was further purified by size-exclusion chromatography on a Superdex 200 pg column (GE Life Sciences) and stored in IEX buffer A for subsequent cleavage assays.
For yeast expression, AR1-Cas9 was cloned into a Gal1/10 His6-MBP TEV Ura S. cerevisiae expression vector (Addgene plasmid #48305). The vector was transformed into a BY4741 URA3 strain, and cultures were grown in medium at 30 °C. Protein expression was induced with 2% w/v galactose at an OD600 of ~0.6, followed by overnight incubation at 16 °C. Protein purification was performed as described above.
RNA in vitro transcription and oligonucleotide purification
In vitro transcription reactions were performed as previously described (ref. 65) using synthetic DNA templates containing a T7 promoter sequence. All in vitro transcribed guide RNAs and target RNAs or DNAs were purified by denaturing PAGE. Double-stranded target RNAs and DNAs were hybridized in 20 mM Tris-HCl pH 7.5 and 100 mM NaCl by incubation at 95 °C for 1 minute, followed by slow cooling to room temperature. Hybrids were purified by native PAGE.
In vitro cleavage assay
Purified DNA and RNA oligonucleotides were radiolabeled using T4 polynucleotide kinase (NEB) and [γ-32P]ATP (Perkin-Elmer) in 1× PNK buffer for 30 minutes at 37 °C. PNK was heat-inactivated at 65 °C for 20 minutes, and free ATP was removed from the labeling reactions using illustra MicroSpin G-25 columns (GE Life Sciences). The crRNA and tracrRNA were mixed in equimolar amounts in 1× refolding buffer (50 mM Tris-HCl pH 7.5, 300 mM NaCl, 1 mM TCEP, 5% glycerol), incubated at 70 °C for 5 minutes, and then slowly cooled to room temperature. The reactions were supplemented to a final metal concentration of 1 mM and subsequently heated at 50 °C for 5 minutes. After slow cooling to room temperature, the refolded guides were placed on ice. Unless otherwise specified for buffer or salt concentration, Cas9 was reconstituted with an equimolar amount of guide in 1× cleavage buffer (50 mM Tris-HCl pH 7.5, 300 mM NaCl, 1 mM TCEP, 5% glycerol, 5 mM divalent metal) at 37 °C for 10 minutes. Cleavage reactions were conducted with a 10× excess of Cas9-guide complex over radiolabeled target in 1× cleavage buffer at 37 °C or at the indicated temperature. Reactions were quenched in an equal volume of gel loading buffer supplemented with 50 mM EDTA. Cleavage products were resolved by 10% denaturing PAGE and visualized by phosphorimaging.
In vivo E. coli interference assay
E. coli transformation assays for AR1-Cas9 and AR4-Cas9 were performed as previously published (ref. 66). Briefly, E. coli cells carrying the guide RNA were made electrocompetent. The cells were then transformed with 9 fmol of a plasmid encoding wild-type or catalytically inactive Cas9 (dCas9). A dilution series of the recovered cells was plated on LB plates with selective antibiotics. Colonies were counted after 16 hours at 37 °C.
Table 1. Details of the organisms and genomic locations in which the CRISPR-Cas systems were identified, together with the number and average length of the reconstructed spacers and the repeat length (NA, not available). The ARMAN-1 spacers were reconstructed from 16 samples.
While the invention has been described with reference to specific embodiments thereof, it will be understood by those skilled in the art that various changes may be made and equivalents may be substituted without departing from the true spirit and scope of the invention. In addition, many modifications may be made to adapt a particular situation, material, composition of matter, process step or steps, to the objective, spirit and scope of the present invention. All such modifications are intended to be within the scope of the appended claims.
Sequence listing
<110> Doudna, Jennifer A
Burstein, David
Banfield, Jillian F
Harrington, Lucas B
<120> RNA-guided nucleic acid modifying enzymes and methods of use thereof
<130> BERK-343WO
<150> US 62/402,849
<151> 2016-09-30
<160> 134
<170> PatentIn 3.5 edition
<210> 1
<211> 1125
<212> PRT
<213> Unknown (Unknown)
<220>
<223> synthetic sequence
<400> 1
Met Arg Lys Lys Leu Phe Lys Gly Tyr Ile Leu His Asn Lys Arg Leu
1 5 10 15
Val Tyr Thr Gly Lys Ala Ala Ile Arg Ser Ile Lys Tyr Pro Leu Val
20 25 30
Ala Pro Asn Lys Thr Ala Leu Asn Asn Leu Ser Glu Lys Ile Ile Tyr
35 40 45
Asp Tyr Glu His Leu Phe Gly Pro Leu Asn Val Ala Ser Tyr Ala Arg
50 55 60
Asn Ser Asn Arg Tyr Ser Leu Val Asp Phe Trp Ile Asp Ser Leu Arg
65 70 75 80
Ala Gly Val Ile Trp Gln Ser Lys Ser Thr Ser Leu Ile Asp Leu Ile
85 90 95
Ser Lys Leu Glu Gly Ser Lys Ser Pro Ser Glu Lys Ile Phe Glu Gln
100 105 110
Ile Asp Phe Glu Leu Lys Asn Lys Leu Asp Lys Glu Gln Phe Lys Asp
115 120 125
Ile Ile Leu Leu Asn Thr Gly Ile Arg Ser Ser Ser Asn Val Arg Ser
130 135 140
Leu Arg Gly Arg Phe Leu Lys Cys Phe Lys Glu Glu Phe Arg Asp Thr
145 150 155 160
Glu Glu Val Ile Ala Cys Val Asp Lys Trp Ser Lys Asp Leu Ile Val
165 170 175
Glu Gly Lys Ser Ile Leu Val Ser Lys Gln Phe Leu Tyr Trp Glu Glu
180 185 190
Glu Phe Gly Ile Lys Ile Phe Pro His Phe Lys Asp Asn His Asp Leu
195 200 205
Pro Lys Leu Thr Phe Phe Val Glu Pro Ser Leu Glu Phe Ser Pro His
210 215 220
Leu Pro Leu Ala Asn Cys Leu Glu Arg Leu Lys Lys Phe Asp Ile Ser
225 230 235 240
Arg Glu Ser Leu Leu Gly Leu Asp Asn Asn Phe Ser Ala Phe Ser Asn
245 250 255
Tyr Phe Asn Glu Leu Phe Asn Leu Leu Ser Arg Gly Glu Ile Lys Lys
260 265 270
Ile Val Thr Ala Val Leu Ala Val Ser Lys Ser Trp Glu Asn Glu Pro
275 280 285
Glu Leu Glu Lys Arg Leu His Phe Leu Ser Glu Lys Ala Lys Leu Leu
290 295 300
Gly Tyr Pro Lys Leu Thr Ser Ser Trp Ala Asp Tyr Arg Met Ile Ile
305 310 315 320
Gly Gly Lys Ile Lys Ser Trp His Ser Asn Tyr Thr Glu Gln Leu Ile
325 330 335
Lys Val Arg Glu Asp Leu Lys Lys His Gln Ile Ala Leu Asp Lys Leu
340 345 350
Gln Glu Asp Leu Lys Lys Val Val Asp Ser Ser Leu Arg Glu Gln Ile
355 360 365
Glu Ala Gln Arg Glu Ala Leu Leu Pro Leu Leu Asp Thr Met Leu Lys
370 375 380
Glu Lys Asp Phe Ser Asp Asp Leu Glu Leu Tyr Arg Phe Ile Leu Ser
385 390 395 400
Asp Phe Lys Ser Leu Leu Asn Gly Ser Tyr Gln Arg Tyr Ile Gln Thr
405 410 415
Glu Glu Glu Arg Lys Glu Asp Arg Asp Val Thr Lys Lys Tyr Lys Asp
420 425 430
Leu Tyr Ser Asn Leu Arg Asn Ile Pro Arg Phe Phe Gly Glu Ser Lys
435 440 445
Lys Glu Gln Phe Asn Lys Phe Ile Asn Lys Ser Leu Pro Thr Ile Asp
450 455 460
Val Gly Leu Lys Ile Leu Glu Asp Ile Arg Asn Ala Leu Glu Thr Val
465 470 475 480
Ser Val Arg Lys Pro Pro Ser Ile Thr Glu Glu Tyr Val Thr Lys Gln
485 490 495
Leu Glu Lys Leu Ser Arg Lys Tyr Lys Ile Asn Ala Phe Asn Ser Asn
500 505 510
Arg Phe Lys Gln Ile Thr Glu Gln Val Leu Arg Lys Tyr Asn Asn Gly
515 520 525
Glu Leu Pro Lys Ile Ser Glu Val Phe Tyr Arg Tyr Pro Arg Glu Ser
530 535 540
His Val Ala Ile Arg Ile Leu Pro Val Lys Ile Ser Asn Pro Arg Lys
545 550 555 560
Asp Ile Ser Tyr Leu Leu Asp Lys Tyr Gln Ile Ser Pro Asp Trp Lys
565 570 575
Asn Ser Asn Pro Gly Glu Val Val Asp Leu Ile Glu Ile Tyr Lys Leu
580 585 590
Thr Leu Gly Trp Leu Leu Ser Cys Asn Lys Asp Phe Ser Met Asp Phe
595 600 605
Ser Ser Tyr Asp Leu Lys Leu Phe Pro Glu Ala Ala Ser Leu Ile Lys
610 615 620
Asn Phe Gly Ser Cys Leu Ser Gly Tyr Tyr Leu Ser Lys Met Ile Phe
625 630 635 640
Asn Cys Ile Thr Ser Glu Ile Lys Gly Met Ile Thr Leu Tyr Thr Arg
645 650 655
Asp Lys Phe Val Val Arg Tyr Val Thr Gln Met Ile Gly Ser Asn Gln
660 665 670
Lys Phe Pro Leu Leu Cys Leu Val Gly Glu Lys Gln Thr Lys Asn Phe
675 680 685
Ser Arg Asn Trp Gly Val Leu Ile Glu Glu Lys Gly Asp Leu Gly Glu
690 695 700
Glu Lys Asn Gln Glu Lys Cys Leu Ile Phe Lys Asp Lys Thr Asp Phe
705 710 715 720
Ala Lys Ala Lys Glu Val Glu Ile Phe Lys Asn Asn Ile Trp Arg Ile
725 730 735
Arg Thr Ser Lys Tyr Gln Ile Gln Phe Leu Asn Arg Leu Phe Lys Lys
740 745 750
Thr Lys Glu Trp Asp Leu Met Asn Leu Val Leu Ser Glu Pro Ser Leu
755 760 765
Val Leu Glu Glu Glu Trp Gly Val Ser Trp Asp Lys Asp Lys Leu Leu
770 775 780
Pro Leu Leu Lys Lys Glu Lys Ser Cys Glu Glu Arg Leu Tyr Tyr Ser
785 790 795 800
Leu Pro Leu Asn Leu Val Pro Ala Thr Asp Tyr Lys Glu Gln Ser Ala
805 810 815
Glu Ile Glu Gln Arg Asn Thr Tyr Leu Gly Leu Asp Val Gly Glu Phe
820 825 830
Gly Val Ala Tyr Ala Val Val Arg Ile Val Arg Asp Arg Ile Glu Leu
835 840 845
Leu Ser Trp Gly Phe Leu Lys Asp Pro Ala Leu Arg Lys Ile Arg Glu
850 855 860
Arg Val Gln Asp Met Lys Lys Lys Gln Val Met Ala Val Phe Ser Ser
865 870 875 880
Ser Ser Thr Ala Val Ala Arg Val Arg Glu Met Ala Ile His Ser Leu
885 890 895
Arg Asn Gln Ile His Ser Ile Ala Leu Ala Tyr Lys Ala Lys Ile Ile
900 905 910
Tyr Glu Ile Ser Ile Ser Asn Phe Glu Thr Gly Gly Asn Arg Met Ala
915 920 925
Lys Ile Tyr Arg Ser Ile Lys Val Ser Asp Val Tyr Arg Glu Ser Gly
930 935 940
Ala Asp Thr Leu Val Ser Glu Met Ile Trp Gly Lys Lys Asn Lys Gln
945 950 955 960
Met Gly Asn His Ile Ser Ser Tyr Ala Thr Ser Tyr Thr Cys Cys Asn
965 970 975
Cys Ala Arg Thr Pro Phe Glu Leu Val Ile Asp Asn Asp Lys Glu Tyr
980 985 990
Glu Lys Gly Gly Asp Glu Phe Ile Phe Asn Val Gly Asp Glu Lys Lys
995 1000 1005
Val Arg Gly Phe Leu Gln Lys Ser Leu Leu Gly Lys Thr Ile Lys
1010 1015 1020
Gly Lys Glu Val Leu Lys Ser Ile Lys Glu Tyr Ala Arg Pro Pro
1025 1030 1035
Ile Arg Glu Val Leu Leu Glu Gly Glu Asp Val Glu Gln Leu Leu
1040 1045 1050
Lys Arg Arg Gly Asn Ser Tyr Ile Tyr Arg Cys Pro Phe Cys Gly
1055 1060 1065
Tyr Lys Thr Asp Ala Asp Ile Gln Ala Ala Leu Asn Ile Ala Cys
1070 1075 1080
Arg Gly Tyr Ile Ser Asp Asn Ala Lys Asp Ala Val Lys Glu Gly
1085 1090 1095
Glu Arg Lys Leu Asp Tyr Ile Leu Glu Val Arg Lys Leu Trp Glu
1100 1105 1110
Lys Asn Gly Ala Val Leu Arg Ser Ala Lys Phe Leu
1115 1120 1125
<210> 2
<211> 1226
<212> PRT
<213> Unknown (Unknown)
<220>
<223> synthetic sequence
<400> 2
Met Gln Lys Val Arg Lys Thr Leu Ser Glu Val His Lys Asn Pro Tyr
1 5 10 15
Gly Thr Lys Val Arg Asn Ala Lys Thr Gly Tyr Ser Leu Gln Ile Glu
20 25 30
Arg Leu Ser Tyr Thr Gly Lys Glu Gly Met Arg Ser Phe Lys Ile Pro
35 40 45
Leu Glu Asn Lys Asn Lys Glu Val Phe Asp Glu Phe Val Lys Lys Ile
50 55 60
Arg Asn Asp Tyr Ile Ser Gln Val Gly Leu Leu Asn Leu Ser Asp Trp
65 70 75 80
Tyr Glu His Tyr Gln Glu Lys Gln Glu His Tyr Ser Leu Ala Asp Phe
85 90 95
Trp Leu Asp Ser Leu Arg Ala Gly Val Ile Phe Ala His Lys Glu Thr
100 105 110
Glu Ile Lys Asn Leu Ile Ser Lys Ile Arg Gly Asp Lys Ser Ile Val
115 120 125
Asp Lys Phe Asn Ala Ser Ile Lys Lys Lys His Ala Asp Leu Tyr Ala
130 135 140
Leu Val Asp Ile Lys Ala Leu Tyr Asp Phe Leu Thr Ser Asp Ala Arg
145 150 155 160
Arg Gly Leu Lys Thr Glu Glu Glu Phe Phe Asn Ser Lys Arg Asn Thr
165 170 175
Leu Phe Pro Lys Phe Arg Lys Lys Asp Asn Lys Ala Val Asp Leu Trp
180 185 190
Val Lys Lys Phe Ile Gly Leu Asp Asn Lys Asp Lys Leu Asn Phe Thr
195 200 205
Lys Lys Phe Ile Gly Phe Asp Pro Asn Pro Gln Ile Lys Tyr Asp His
210 215 220
Thr Phe Phe Phe His Gln Asp Ile Asn Phe Asp Leu Glu Arg Ile Thr
225 230 235 240
Thr Pro Lys Glu Leu Ile Ser Thr Tyr Lys Lys Phe Leu Gly Lys Asn
245 250 255
Lys Asp Leu Tyr Gly Ser Asp Glu Thr Thr Glu Asp Gln Leu Lys Met
260 265 270
Val Leu Gly Phe His Asn Asn His Gly Ala Phe Ser Lys Tyr Phe Asn
275 280 285
Ala Ser Leu Glu Ala Phe Arg Gly Arg Asp Asn Ser Leu Val Glu Gln
290 295 300
Ile Ile Asn Asn Ser Pro Tyr Trp Asn Ser His Arg Lys Glu Leu Glu
305 310 315 320
Lys Arg Ile Ile Phe Leu Gln Val Gln Ser Lys Lys Ile Lys Glu Thr
325 330 335
Glu Leu Gly Lys Pro His Glu Tyr Leu Ala Ser Phe Gly Gly Lys Phe
340 345 350
Glu Ser Trp Val Ser Asn Tyr Leu Arg Gln Glu Glu Glu Val Lys Arg
355 360 365
Gln Leu Phe Gly Tyr Glu Glu Asn Lys Lys Gly Gln Lys Lys Phe Ile
370 375 380
Val Gly Asn Lys Gln Glu Leu Asp Lys Ile Ile Arg Gly Thr Asp Glu
385 390 395 400
Tyr Glu Ile Lys Ala Ile Ser Lys Glu Thr Ile Gly Leu Thr Gln Lys
405 410 415
Cys Leu Lys Leu Leu Glu Gln Leu Lys Asp Ser Val Asp Asp Tyr Thr
420 425 430
Leu Ser Leu Tyr Arg Gln Leu Ile Val Glu Leu Arg Ile Arg Leu Asn
435 440 445
Val Glu Phe Gln Glu Thr Tyr Pro Glu Leu Ile Gly Lys Ser Glu Lys
450 455 460
Asp Lys Glu Lys Asp Ala Lys Asn Lys Arg Ala Asp Lys Arg Tyr Pro
465 470 475 480
Gln Ile Phe Lys Asp Ile Lys Leu Ile Pro Asn Phe Leu Gly Glu Thr
485 490 495
Lys Gln Met Val Tyr Lys Lys Phe Ile Arg Ser Ala Asp Ile Leu Tyr
500 505 510
Glu Gly Ile Asn Phe Ile Asp Gln Ile Asp Lys Gln Ile Thr Gln Asn
515 520 525
Leu Leu Pro Cys Phe Lys Asn Asp Lys Glu Arg Ile Glu Phe Thr Glu
530 535 540
Lys Gln Phe Glu Thr Leu Arg Arg Lys Tyr Tyr Leu Met Asn Ser Ser
545 550 555 560
Arg Phe His His Val Ile Glu Gly Ile Ile Asn Asn Arg Lys Leu Ile
565 570 575
Glu Met Lys Lys Arg Glu Asn Ser Glu Leu Lys Thr Phe Ser Asp Ser
580 585 590
Lys Phe Val Leu Ser Lys Leu Phe Leu Lys Lys Gly Lys Lys Tyr Glu
595 600 605
Asn Glu Val Tyr Tyr Thr Phe Tyr Ile Asn Pro Lys Ala Arg Asp Gln
610 615 620
Arg Arg Ile Lys Ile Val Leu Asp Ile Asn Gly Asn Asn Ser Val Gly
625 630 635 640
Ile Leu Gln Asp Leu Val Gln Lys Leu Lys Pro Lys Trp Asp Asp Ile
645 650 655
Ile Lys Lys Asn Asp Met Gly Glu Leu Ile Asp Ala Ile Glu Ile Glu
660 665 670
Lys Val Arg Leu Gly Ile Leu Ile Ala Leu Tyr Cys Glu His Lys Phe
675 680 685
Lys Ile Lys Lys Glu Leu Leu Ser Leu Asp Leu Phe Ala Ser Ala Tyr
690 695 700
Gln Tyr Leu Glu Leu Glu Asp Asp Pro Glu Glu Leu Ser Gly Thr Asn
705 710 715 720
Leu Gly Arg Phe Leu Gln Ser Leu Val Cys Ser Glu Ile Lys Gly Ala
725 730 735
Ile Asn Lys Ile Ser Arg Thr Glu Tyr Ile Glu Arg Tyr Thr Val Gln
740 745 750
Pro Met Asn Thr Glu Lys Asn Tyr Pro Leu Leu Ile Asn Lys Glu Gly
755 760 765
Lys Ala Thr Trp His Ile Ala Ala Lys Asp Asp Leu Ser Lys Lys Lys
770 775 780
Gly Gly Gly Thr Val Ala Met Asn Gln Lys Ile Gly Lys Asn Phe Phe
785 790 795 800
Gly Lys Gln Asp Tyr Lys Thr Val Phe Met Leu Gln Asp Lys Arg Phe
805 810 815
Asp Leu Leu Thr Ser Lys Tyr His Leu Gln Phe Leu Ser Lys Thr Leu
820 825 830
Asp Thr Gly Gly Gly Ser Trp Trp Lys Asn Lys Asn Ile Asp Leu Asn
835 840 845
Leu Ser Ser Tyr Ser Phe Ile Phe Glu Gln Lys Val Lys Val Glu Trp
850 855 860
Asp Leu Thr Asn Leu Asp His Pro Ile Lys Ile Lys Pro Ser Glu Asn
865 870 875 880
Ser Asp Asp Arg Arg Leu Phe Val Ser Ile Pro Phe Val Ile Lys Pro
885 890 895
Lys Gln Thr Lys Arg Lys Asp Leu Gln Thr Arg Val Asn Tyr Met Gly
900 905 910
Ile Asp Ile Gly Glu Tyr Gly Leu Ala Trp Thr Ile Ile Asn Ile Asp
915 920 925
Leu Lys Asn Lys Lys Ile Asn Lys Ile Ser Lys Gln Gly Phe Ile Tyr
930 935 940
Glu Pro Leu Thr His Lys Val Arg Asp Tyr Val Ala Thr Ile Lys Asp
945 950 955 960
Asn Gln Val Arg Gly Thr Phe Gly Met Pro Asp Thr Lys Leu Ala Arg
965 970 975
Leu Arg Glu Asn Ala Ile Thr Ser Leu Arg Asn Gln Val His Asp Ile
980 985 990
Ala Met Arg Tyr Asp Ala Lys Pro Val Tyr Glu Phe Glu Ile Ser Asn
995 1000 1005
Phe Glu Thr Gly Ser Asn Lys Val Lys Val Ile Tyr Asp Ser Val
1010 1015 1020
Lys Arg Ala Asp Ile Gly Arg Gly Gln Asn Asn Thr Glu Ala Asp
1025 1030 1035
Asn Thr Glu Val Asn Leu Val Trp Gly Lys Thr Ser Lys Gln Phe
1040 1045 1050
Gly Ser Gln Ile Gly Ala Tyr Ala Thr Ser Tyr Ile Cys Ser Phe
1055 1060 1065
Cys Gly Tyr Ser Pro Tyr Tyr Glu Phe Glu Asn Ser Lys Ser Gly
1070 1075 1080
Asp Glu Glu Gly Ala Arg Asp Asn Leu Tyr Gln Met Lys Lys Leu
1085 1090 1095
Ser Arg Pro Ser Leu Glu Asp Phe Leu Gln Gly Asn Pro Val Tyr
1100 1105 1110
Lys Thr Phe Arg Asp Phe Asp Lys Tyr Lys Asn Asp Gln Arg Leu
1115 1120 1125
Gln Lys Thr Gly Asp Lys Asp Gly Glu Trp Lys Thr His Arg Gly
1130 1135 1140
Asn Thr Ala Ile Tyr Ala Cys Gln Lys Cys Arg His Ile Ser Asp
1145 1150 1155
Ala Asp Ile Gln Ala Ser Tyr Trp Ile Ala Leu Lys Gln Val Val
1160 1165 1170
Arg Asp Phe Tyr Lys Asp Lys Glu Met Asp Gly Asp Leu Ile Gln
1175 1180 1185
Gly Asp Asn Lys Asp Lys Arg Lys Val Asn Glu Leu Asn Arg Leu
1190 1195 1200
Ile Gly Val His Lys Asp Val Pro Ile Ile Asn Lys Asn Leu Ile
1205 1210 1215
Thr Ser Leu Asp Ile Asn Leu Leu
1220 1225
<210> 3
<211> 1160
<212> PRT
<213> Unknown (Unknown)
<220>
<223> synthetic sequence
<400> 3
Met Lys Ala Lys Lys Ser Phe Tyr Asn Gln Lys Arg Lys Phe Gly Lys
1 5 10 15
Arg Gly Tyr Arg Leu His Asp Glu Arg Ile Ala Tyr Ser Gly Gly Ile
20 25 30
Gly Ser Met Arg Ser Ile Lys Tyr Glu Leu Lys Asp Ser Tyr Gly Ile
35 40 45
Ala Gly Leu Arg Asn Arg Ile Ala Asp Ala Thr Ile Ser Asp Asn Lys
50 55 60
Trp Leu Tyr Gly Asn Ile Asn Leu Asn Asp Tyr Leu Glu Trp Arg Ser
65 70 75 80
Ser Lys Thr Asp Lys Gln Ile Glu Asp Gly Asp Arg Glu Ser Ser Leu
85 90 95
Leu Gly Phe Trp Leu Glu Ala Leu Arg Leu Gly Phe Val Phe Ser Lys
100 105 110
Gln Ser His Ala Pro Asn Asp Phe Asn Glu Thr Ala Leu Gln Asp Leu
115 120 125
Phe Glu Thr Leu Asp Asp Asp Leu Lys His Val Leu Asp Arg Lys Lys
130 135 140
Trp Cys Asp Phe Ile Lys Ile Gly Thr Pro Lys Thr Asn Asp Gln Gly
145 150 155 160
Arg Leu Lys Lys Gln Ile Lys Asn Leu Leu Lys Gly Asn Lys Arg Glu
165 170 175
Glu Ile Glu Lys Thr Leu Asn Glu Ser Asp Asp Glu Leu Lys Glu Lys
180 185 190
Ile Asn Arg Ile Ala Asp Val Phe Ala Lys Asn Lys Ser Asp Lys Tyr
195 200 205
Thr Ile Phe Lys Leu Asp Lys Pro Asn Thr Glu Lys Tyr Pro Arg Ile
210 215 220
Asn Asp Val Gln Val Ala Phe Phe Cys His Pro Asp Phe Glu Glu Ile
225 230 235 240
Thr Glu Arg Asp Arg Thr Lys Thr Leu Asp Leu Ile Ile Asn Arg Phe
245 250 255
Asn Lys Arg Tyr Glu Ile Thr Glu Asn Lys Lys Asp Asp Lys Thr Ser
260 265 270
Asn Arg Met Ala Leu Tyr Ser Leu Asn Gln Gly Tyr Ile Pro Arg Val
275 280 285
Leu Asn Asp Leu Phe Leu Phe Val Lys Asp Asn Glu Asp Asp Phe Ser
290 295 300
Gln Phe Leu Ser Asp Leu Glu Asn Phe Phe Ser Phe Ser Asn Glu Gln
305 310 315 320
Ile Lys Ile Ile Lys Glu Arg Leu Lys Lys Leu Lys Lys Tyr Ala Glu
325 330 335
Pro Ile Pro Gly Lys Pro Gln Leu Ala Asp Lys Trp Asp Asp Tyr Ala
340 345 350
Ser Asp Phe Gly Gly Lys Leu Glu Ser Trp Tyr Ser Asn Arg Ile Glu
355 360 365
Lys Leu Lys Lys Ile Pro Glu Ser Val Ser Asp Leu Arg Asn Asn Leu
370 375 380
Glu Lys Ile Arg Asn Val Leu Lys Lys Gln Asn Asn Ala Ser Lys Ile
385 390 395 400
Leu Glu Leu Ser Gln Lys Ile Ile Glu Tyr Ile Arg Asp Tyr Gly Val
405 410 415
Ser Phe Glu Lys Pro Glu Ile Ile Lys Phe Ser Trp Ile Asn Lys Thr
420 425 430
Lys Asp Gly Gln Lys Lys Val Phe Tyr Val Ala Lys Met Ala Asp Arg
435 440 445
Glu Phe Ile Glu Lys Leu Asp Leu Trp Met Ala Asp Leu Arg Ser Gln
450 455 460
Leu Asn Glu Tyr Asn Gln Asp Asn Lys Val Ser Phe Lys Lys Lys Gly
465 470 475 480
Lys Lys Ile Glu Glu Leu Gly Val Leu Asp Phe Ala Leu Asn Lys Ala
485 490 495
Lys Lys Asn Lys Ser Thr Lys Asn Glu Asn Gly Trp Gln Gln Lys Leu
500 505 510
Ser Glu Ser Ile Gln Ser Ala Pro Leu Phe Phe Gly Glu Gly Asn Arg
515 520 525
Val Arg Asn Glu Glu Val Tyr Asn Leu Lys Asp Leu Leu Phe Ser Glu
530 535 540
Ile Lys Asn Val Glu Asn Ile Leu Met Ser Ser Glu Ala Glu Asp Leu
545 550 555 560
Lys Asn Ile Lys Ile Glu Tyr Lys Glu Asp Gly Ala Lys Lys Gly Asn
565 570 575
Tyr Val Leu Asn Val Leu Ala Arg Phe Tyr Ala Arg Phe Asn Glu Asp
580 585 590
Gly Tyr Gly Gly Trp Asn Lys Val Lys Thr Val Leu Glu Asn Ile Ala
595 600 605
Arg Glu Ala Gly Thr Asp Phe Ser Lys Tyr Gly Asn Asn Asn Asn Arg
610 615 620
Asn Ala Gly Arg Phe Tyr Leu Asn Gly Arg Glu Arg Gln Val Phe Thr
625 630 635 640
Leu Ile Lys Phe Glu Lys Ser Ile Thr Val Glu Lys Ile Leu Glu Leu
645 650 655
Val Lys Leu Pro Ser Leu Leu Asp Glu Ala Tyr Arg Asp Leu Val Asn
660 665 670
Glu Asn Lys Asn His Lys Leu Arg Asp Val Ile Gln Leu Ser Lys Thr
675 680 685
Ile Met Ala Leu Val Leu Ser His Ser Asp Lys Glu Lys Gln Ile Gly
690 695 700
Gly Asn Tyr Ile His Ser Lys Leu Ser Gly Tyr Asn Ala Leu Ile Ser
705 710 715 720
Lys Arg Asp Phe Ile Ser Arg Tyr Ser Val Gln Thr Thr Asn Gly Thr
725 730 735
Gln Cys Lys Leu Ala Ile Gly Lys Gly Lys Ser Lys Lys Gly Asn Glu
740 745 750
Ile Asp Arg Tyr Phe Tyr Ala Phe Gln Phe Phe Lys Asn Asp Asp Ser
755 760 765
Lys Ile Asn Leu Lys Val Ile Lys Asn Asn Ser His Lys Asn Ile Asp
770 775 780
Phe Asn Asp Asn Glu Asn Lys Ile Asn Ala Leu Gln Val Tyr Ser Ser
785 790 795 800
Asn Tyr Gln Ile Gln Phe Leu Asp Trp Phe Phe Glu Lys His Gln Gly
805 810 815
Lys Lys Thr Ser Leu Glu Val Gly Gly Ser Phe Thr Ile Ala Glu Lys
820 825 830
Ser Leu Thr Ile Asp Trp Ser Gly Ser Asn Pro Arg Val Gly Phe Lys
835 840 845
Arg Ser Asp Thr Glu Glu Lys Arg Val Phe Val Ser Gln Pro Phe Thr
850 855 860
Leu Ile Pro Asp Asp Glu Asp Lys Glu Arg Arg Lys Glu Arg Met Ile
865 870 875 880
Lys Thr Lys Asn Arg Phe Ile Gly Ile Asp Ile Gly Glu Tyr Gly Leu
885 890 895
Ala Trp Ser Leu Ile Glu Val Asp Asn Gly Asp Lys Asn Asn Arg Gly
900 905 910
Ile Arg Gln Leu Glu Ser Gly Phe Ile Thr Asp Asn Gln Gln Gln Val
915 920 925
Leu Lys Lys Asn Val Lys Ser Trp Arg Gln Asn Gln Ile Arg Gln Thr
930 935 940
Phe Thr Ser Pro Asp Thr Lys Ile Ala Arg Leu Arg Glu Ser Leu Ile
945 950 955 960
Gly Ser Tyr Lys Asn Gln Leu Glu Ser Leu Met Val Ala Lys Lys Ala
965 970 975
Asn Leu Ser Phe Glu Tyr Glu Val Ser Gly Phe Glu Val Gly Gly Lys
980 985 990
Arg Val Ala Lys Ile Tyr Asp Ser Ile Lys Arg Gly Ser Val Arg Lys
995 1000 1005
Lys Asp Asn Asn Ser Gln Asn Asp Gln Ser Trp Gly Lys Lys Gly
1010 1015 1020
Ile Asn Glu Trp Ser Phe Glu Thr Thr Ala Ala Gly Thr Ser Gln
1025 1030 1035
Phe Cys Thr His Cys Lys Arg Trp Ser Ser Leu Ala Ile Val Asp
1040 1045 1050
Ile Glu Glu Tyr Glu Leu Lys Asp Tyr Asn Asp Asn Leu Phe Lys
1055 1060 1065
Val Lys Ile Asn Asp Gly Glu Val Arg Leu Leu Gly Lys Lys Gly
1070 1075 1080
Trp Arg Ser Gly Glu Lys Ile Lys Gly Lys Glu Leu Phe Gly Pro
1085 1090 1095
Val Lys Asp Ala Met Arg Pro Asn Val Asp Gly Leu Gly Met Lys
1100 1105 1110
Ile Val Lys Arg Lys Tyr Leu Lys Leu Asp Leu Arg Asp Trp Val
1115 1120 1125
Ser Arg Tyr Gly Asn Met Ala Ile Phe Ile Cys Pro Tyr Val Asp
1130 1135 1140
Cys His His Ile Ser His Ala Asp Lys Gln Ala Ala Phe Asn Ile
1145 1150 1155
Ala Val
1160
<210> 4
<211> 1210
<212> PRT
<213> Unknown (Unknown)
<220>
<223> synthetic sequence
<400> 4
Met Ser Lys Arg His Pro Arg Ile Ser Gly Val Lys Gly Tyr Arg Leu
1 5 10 15
His Ala Gln Arg Leu Glu Tyr Thr Gly Lys Ser Gly Ala Met Arg Thr
20 25 30
Ile Lys Tyr Pro Leu Tyr Ser Ser Pro Ser Gly Gly Arg Thr Val Pro
35 40 45
Arg Glu Ile Val Ser Ala Ile Asn Asp Asp Tyr Val Gly Leu Tyr Gly
50 55 60
Leu Ser Asn Phe Asp Asp Leu Tyr Asn Ala Glu Lys Arg Asn Glu Glu
65 70 75 80
Lys Val Tyr Ser Val Leu Asp Phe Trp Tyr Asp Cys Val Gln Tyr Gly
85 90 95
Ala Val Phe Ser Tyr Thr Ala Pro Gly Leu Leu Lys Asn Val Ala Glu
100 105 110
Val Arg Gly Gly Ser Tyr Glu Leu Thr Lys Thr Leu Lys Gly Ser His
115 120 125
Leu Tyr Asp Glu Leu Gln Ile Asp Lys Val Ile Lys Phe Leu Asn Lys
130 135 140
Lys Glu Ile Ser Arg Ala Asn Gly Ser Leu Asp Lys Leu Lys Lys Asp
145 150 155 160
Ile Ile Asp Cys Phe Lys Ala Glu Tyr Arg Glu Arg His Lys Asp Gln
165 170 175
Cys Asn Lys Leu Ala Asp Asp Ile Lys Asn Ala Lys Lys Asp Ala Gly
180 185 190
Ala Ser Leu Gly Glu Arg Gln Lys Lys Leu Phe Arg Asp Phe Phe Gly
195 200 205
Ile Ser Glu Gln Ser Glu Asn Asp Lys Pro Ser Phe Thr Asn Pro Leu
210 215 220
Asn Leu Thr Cys Cys Leu Leu Pro Phe Asp Thr Val Asn Asn Asn Arg
225 230 235 240
Asn Arg Gly Glu Val Leu Phe Asn Lys Leu Lys Glu Tyr Ala Gln Lys
245 250 255
Leu Asp Lys Asn Glu Gly Ser Leu Glu Met Trp Glu Tyr Ile Gly Ile
260 265 270
Gly Asn Ser Gly Thr Ala Phe Ser Asn Phe Leu Gly Glu Gly Phe Leu
275 280 285
Gly Arg Leu Arg Glu Asn Lys Ile Thr Glu Leu Lys Lys Ala Met Met
290 295 300
Asp Ile Thr Asp Ala Trp Arg Gly Gln Glu Gln Glu Glu Glu Leu Glu
305 310 315 320
Lys Arg Leu Arg Ile Leu Ala Ala Leu Thr Ile Lys Leu Arg Glu Pro
325 330 335
Lys Phe Asp Asn His Trp Gly Gly Tyr Arg Ser Asp Ile Asn Gly Lys
340 345 350
Leu Ser Ser Trp Leu Gln Asn Tyr Ile Asn Gln Thr Val Lys Ile Lys
355 360 365
Glu Asp Leu Lys Gly His Lys Lys Asp Leu Lys Lys Ala Lys Glu Met
370 375 380
Ile Asn Arg Phe Gly Glu Ser Asp Thr Lys Glu Glu Ala Val Val Ser
385 390 395 400
Ser Leu Leu Glu Ser Ile Glu Lys Ile Val Pro Asp Asp Ser Ala Asp
405 410 415
Asp Glu Lys Pro Asp Ile Pro Ala Ile Ala Ile Tyr Arg Arg Phe Leu
420 425 430
Ser Asp Gly Arg Leu Thr Leu Asn Arg Phe Val Gln Arg Glu Asp Val
435 440 445
Gln Glu Ala Leu Ile Lys Glu Arg Leu Glu Ala Glu Lys Lys Lys Lys
450 455 460
Pro Lys Lys Arg Lys Lys Lys Ser Asp Ala Glu Asp Glu Lys Glu Thr
465 470 475 480
Ile Asp Phe Lys Glu Leu Phe Pro His Leu Ala Lys Pro Leu Lys Leu
485 490 495
Val Pro Asn Phe Tyr Gly Asp Ser Lys Arg Glu Leu Tyr Lys Lys Tyr
500 505 510
Lys Asn Ala Ala Ile Tyr Thr Asp Ala Leu Trp Lys Ala Val Glu Lys
515 520 525
Ile Tyr Lys Ser Ala Phe Ser Ser Ser Leu Lys Asn Ser Phe Phe Asp
530 535 540
Thr Asp Phe Asp Lys Asp Phe Phe Ile Lys Arg Leu Gln Lys Ile Phe
545 550 555 560
Ser Val Tyr Arg Arg Phe Asn Thr Asp Lys Trp Lys Pro Ile Val Lys
565 570 575
Asn Ser Phe Ala Pro Tyr Cys Asp Ile Val Ser Leu Ala Glu Asn Glu
580 585 590
Val Leu Tyr Lys Pro Lys Gln Ser Arg Ser Arg Lys Ser Ala Ala Ile
595 600 605
Asp Lys Asn Arg Val Arg Leu Pro Ser Thr Glu Asn Ile Ala Lys Ala
610 615 620
Gly Ile Ala Leu Ala Arg Glu Leu Ser Val Ala Gly Phe Asp Trp Lys
625 630 635 640
Asp Leu Leu Lys Lys Glu Glu His Glu Glu Tyr Ile Asp Leu Ile Glu
645 650 655
Leu His Lys Thr Ala Leu Ala Leu Leu Leu Ala Val Thr Glu Thr Gln
660 665 670
Leu Asp Ile Ser Ala Leu Asp Phe Val Glu Asn Gly Thr Val Lys Asp
675 680 685
Phe Met Lys Thr Arg Asp Gly Asn Leu Val Leu Glu Gly Arg Phe Leu
690 695 700
Glu Met Phe Ser Gln Ser Ile Val Phe Ser Glu Leu Arg Gly Leu Ala
705 710 715 720
Gly Leu Met Ser Arg Lys Glu Phe Ile Thr Arg Ser Ala Ile Gln Thr
725 730 735
Met Asn Gly Lys Gln Ala Glu Leu Leu Tyr Ile Pro His Glu Phe Gln
740 745 750
Ser Ala Lys Ile Thr Thr Pro Lys Glu Met Ser Arg Ala Phe Leu Asp
755 760 765
Leu Ala Pro Ala Glu Phe Ala Thr Ser Leu Glu Pro Glu Ser Leu Ser
770 775 780
Glu Lys Ser Leu Leu Lys Leu Lys Gln Met Arg Tyr Tyr Pro His Tyr
785 790 795 800
Phe Gly Tyr Glu Leu Thr Arg Thr Gly Gln Gly Ile Asp Gly Gly Val
805 810 815
Ala Glu Asn Ala Leu Arg Leu Glu Lys Ser Pro Val Lys Lys Arg Glu
820 825 830
Ile Lys Cys Lys Gln Tyr Lys Thr Leu Gly Arg Gly Gln Asn Lys Ile
835 840 845
Val Leu Tyr Val Arg Ser Ser Tyr Tyr Gln Thr Gln Phe Leu Glu Trp
850 855 860
Phe Leu His Arg Pro Lys Asn Val Gln Thr Asp Val Ala Val Ser Gly
865 870 875 880
Ser Phe Leu Ile Asp Glu Lys Lys Val Lys Thr Arg Trp Asn Tyr Asp
885 890 895
Ala Leu Thr Val Ala Leu Glu Pro Val Ser Gly Ser Glu Arg Val Phe
900 905 910
Val Ser Gln Pro Phe Thr Ile Phe Pro Glu Lys Ser Ala Glu Glu Glu
915 920 925
Gly Gln Arg Tyr Leu Gly Ile Asp Ile Gly Glu Tyr Gly Ile Ala Tyr
930 935 940
Thr Ala Leu Glu Ile Thr Gly Asp Ser Ala Lys Ile Leu Asp Gln Asn
945 950 955 960
Phe Ile Ser Asp Pro Gln Leu Lys Thr Leu Arg Glu Glu Val Lys Gly
965 970 975
Leu Lys Leu Asp Gln Arg Arg Gly Thr Phe Ala Met Pro Ser Thr Lys
980 985 990
Ile Ala Arg Ile Arg Glu Ser Leu Val His Ser Leu Arg Asn Arg Ile
995 1000 1005
His His Leu Ala Leu Lys His Lys Ala Lys Ile Val Tyr Glu Leu
1010 1015 1020
Glu Val Ser Arg Phe Glu Glu Gly Lys Gln Lys Ile Lys Lys Val
1025 1030 1035
Tyr Ala Thr Leu Lys Lys Ala Asp Val Tyr Ser Glu Ile Asp Ala
1040 1045 1050
Asp Lys Asn Leu Gln Thr Thr Val Trp Gly Lys Leu Ala Val Ala
1055 1060 1065
Ser Glu Ile Ser Ala Ser Tyr Thr Ser Gln Phe Cys Gly Ala Cys
1070 1075 1080
Lys Lys Leu Trp Arg Ala Glu Met Gln Val Asp Glu Thr Ile Thr
1085 1090 1095
Thr Gln Glu Leu Ile Gly Thr Val Arg Val Ile Lys Gly Gly Thr
1100 1105 1110
Leu Ile Asp Ala Ile Lys Asp Phe Met Arg Pro Pro Ile Phe Asp
1115 1120 1125
Glu Asn Asp Thr Pro Phe Pro Lys Tyr Arg Asp Phe Cys Asp Lys
1130 1135 1140
His His Ile Ser Lys Lys Met Arg Gly Asn Ser Cys Leu Phe Ile
1145 1150 1155
Cys Pro Phe Cys Arg Ala Asn Ala Asp Ala Asp Ile Gln Ala Ser
1160 1165 1170
Gln Thr Ile Ala Leu Leu Arg Tyr Val Lys Glu Glu Lys Lys Val
1175 1180 1185
Glu Asp Tyr Phe Glu Arg Phe Arg Lys Leu Lys Asn Ile Lys Val
1190 1195 1200
Leu Gly Gln Met Lys Lys Ile
1205 1210
<210> 5
<211> 1287
<212> PRT
<213> Unknown (Unknown)
<220>
<223> synthetic sequence
<400> 5
Met Lys Arg Ile Leu Asn Ser Leu Lys Val Ala Ala Leu Arg Leu Leu
1 5 10 15
Phe Arg Gly Lys Gly Ser Glu Leu Val Lys Thr Val Lys Tyr Pro Leu
20 25 30
Val Ser Pro Val Gln Gly Ala Val Glu Glu Leu Ala Glu Ala Ile Arg
35 40 45
His Asp Asn Leu His Leu Phe Gly Gln Lys Glu Ile Val Asp Leu Met
50 55 60
Glu Lys Asp Glu Gly Thr Gln Val Tyr Ser Val Val Asp Phe Trp Leu
65 70 75 80
Asp Thr Leu Arg Leu Gly Met Phe Phe Ser Pro Ser Ala Asn Ala Leu
85 90 95
Lys Ile Thr Leu Gly Lys Phe Asn Ser Asp Gln Val Ser Pro Phe Arg
100 105 110
Lys Val Leu Glu Gln Ser Pro Phe Phe Leu Ala Gly Arg Leu Lys Val
115 120 125
Glu Pro Ala Glu Arg Ile Leu Ser Val Glu Ile Arg Lys Ile Gly Lys
130 135 140
Arg Glu Asn Arg Val Glu Asn Tyr Ala Ala Asp Val Glu Thr Cys Phe
145 150 155 160
Ile Gly Gln Leu Ser Ser Asp Glu Lys Gln Ser Ile Gln Lys Leu Ala
165 170 175
Asn Asp Ile Trp Asp Ser Lys Asp His Glu Glu Gln Arg Met Leu Lys
180 185 190
Ala Asp Phe Phe Ala Ile Pro Leu Ile Lys Asp Pro Lys Ala Val Thr
195 200 205
Glu Glu Asp Pro Glu Asn Glu Thr Ala Gly Lys Gln Lys Pro Leu Glu
210 215 220
Leu Cys Val Cys Leu Val Pro Glu Leu Tyr Thr Arg Gly Phe Gly Ser
225 230 235 240
Ile Ala Asp Phe Leu Val Gln Arg Leu Thr Leu Leu Arg Asp Lys Met
245 250 255
Ser Thr Asp Thr Ala Glu Asp Cys Leu Glu Tyr Val Gly Ile Glu Glu
260 265 270
Glu Lys Gly Asn Gly Met Asn Ser Leu Leu Gly Thr Phe Leu Lys Asn
275 280 285
Leu Gln Gly Asp Gly Phe Glu Gln Ile Phe Gln Phe Met Leu Gly Ser
290 295 300
Tyr Val Gly Trp Gln Gly Lys Glu Asp Val Leu Arg Glu Arg Leu Asp
305 310 315 320
Leu Leu Ala Glu Lys Val Lys Arg Leu Pro Lys Pro Lys Phe Ala Gly
325 330 335
Glu Trp Ser Gly His Arg Met Phe Leu His Gly Gln Leu Lys Ser Trp
340 345 350
Ser Ser Asn Phe Phe Arg Leu Phe Asn Glu Thr Arg Glu Leu Leu Glu
355 360 365
Ser Ile Lys Ser Asp Ile Gln His Ala Thr Met Leu Ile Ser Tyr Val
370 375 380
Glu Glu Lys Gly Gly Tyr His Pro Gln Leu Leu Ser Gln Tyr Arg Lys
385 390 395 400
Leu Met Glu Gln Leu Pro Ala Leu Arg Thr Lys Val Leu Asp Pro Glu
405 410 415
Ile Glu Met Thr His Met Ser Glu Ala Val Arg Ser Tyr Ile Met Ile
420 425 430
His Lys Ser Val Ala Gly Phe Leu Pro Asp Leu Leu Glu Ser Leu Asp
435 440 445
Arg Asp Lys Asp Arg Glu Phe Leu Leu Ser Ile Phe Pro Arg Ile Pro
450 455 460
Lys Ile Asp Lys Lys Thr Lys Glu Ile Val Ala Trp Glu Leu Pro Gly
465 470 475 480
Glu Pro Glu Glu Gly Tyr Leu Phe Thr Ala Asn Asn Leu Phe Arg Asn
485 490 495
Phe Leu Glu Asn Pro Lys His Val Pro Arg Phe Met Ala Glu Arg Ile
500 505 510
Pro Glu Asp Trp Thr Arg Leu Arg Ser Ala Pro Val Trp Phe Asp Gly
515 520 525
Met Val Lys Gln Trp Gln Lys Val Val Asn Gln Leu Val Glu Ser Pro
530 535 540
Gly Ala Leu Tyr Gln Phe Asn Glu Ser Phe Leu Arg Gln Arg Leu Gln
545 550 555 560
Ala Met Leu Thr Val Tyr Lys Arg Asp Leu Gln Thr Glu Lys Phe Leu
565 570 575
Lys Leu Leu Ala Asp Val Cys Arg Pro Leu Val Asp Phe Phe Gly Leu
580 585 590
Gly Gly Asn Asp Ile Ile Phe Lys Ser Cys Gln Asp Pro Arg Lys Gln
595 600 605
Trp Gln Thr Val Ile Pro Leu Ser Val Pro Ala Asp Val Tyr Thr Ala
610 615 620
Cys Glu Gly Leu Ala Ile Arg Leu Arg Glu Thr Leu Gly Phe Glu Trp
625 630 635 640
Lys Asn Leu Lys Gly His Glu Arg Glu Asp Phe Leu Arg Leu His Gln
645 650 655
Leu Leu Gly Asn Leu Leu Phe Trp Ile Arg Asp Ala Lys Leu Val Val
660 665 670
Lys Leu Glu Asp Trp Met Asn Asn Pro Cys Val Gln Glu Tyr Val Glu
675 680 685
Ala Arg Lys Ala Ile Asp Leu Pro Leu Glu Ile Phe Gly Phe Glu Val
690 695 700
Pro Ile Phe Leu Asn Gly Tyr Leu Phe Ser Glu Leu Arg Gln Leu Glu
705 710 715 720
Leu Leu Leu Arg Arg Lys Ser Val Met Thr Ser Tyr Ser Val Lys Thr
725 730 735
Thr Gly Ser Pro Asn Arg Leu Phe Gln Leu Val Tyr Leu Pro Leu Asn
740 745 750
Pro Ser Asp Pro Glu Lys Lys Asn Ser Asn Asn Phe Gln Glu Arg Leu
755 760 765
Asp Thr Pro Thr Gly Leu Ser Arg Arg Phe Leu Asp Leu Thr Leu Asp
770 775 780
Ala Phe Ala Gly Lys Leu Leu Thr Asp Pro Val Thr Gln Glu Leu Lys
785 790 795 800
Thr Met Ala Gly Phe Tyr Asp His Leu Phe Gly Phe Lys Leu Pro Cys
805 810 815
Lys Leu Ala Ala Met Ser Asn His Pro Gly Ser Ser Ser Lys Met Val
820 825 830
Val Leu Ala Lys Pro Lys Lys Gly Val Ala Ser Asn Ile Gly Phe Glu
835 840 845
Pro Ile Pro Asp Pro Ala His Pro Val Phe Arg Val Arg Ser Ser Trp
850 855 860
Pro Glu Leu Lys Tyr Leu Glu Gly Leu Leu Tyr Leu Pro Glu Asp Thr
865 870 875 880
Pro Leu Thr Ile Glu Leu Ala Glu Thr Ser Val Ser Cys Gln Ser Val
885 890 895
Ser Ser Val Ala Phe Asp Leu Lys Asn Leu Thr Thr Ile Leu Gly Arg
900 905 910
Val Gly Glu Phe Arg Val Thr Ala Asp Gln Pro Phe Lys Leu Thr Pro
915 920 925
Ile Ile Pro Glu Lys Glu Glu Ser Phe Ile Gly Lys Thr Tyr Leu Gly
930 935 940
Leu Asp Ala Gly Glu Arg Ser Gly Val Gly Phe Ala Ile Val Thr Val
945 950 955 960
Asp Gly Asp Gly Tyr Glu Val Gln Arg Leu Gly Val His Glu Asp Thr
965 970 975
Gln Leu Met Ala Leu Gln Gln Val Ala Ser Lys Ser Leu Lys Glu Pro
980 985 990
Val Phe Gln Pro Leu Arg Lys Gly Thr Phe Arg Gln Gln Glu Arg Ile
995 1000 1005
Arg Lys Ser Leu Arg Gly Cys Tyr Trp Asn Phe Tyr His Ala Leu
1010 1015 1020
Met Ile Lys Tyr Arg Ala Lys Val Val His Glu Glu Ser Val Gly
1025 1030 1035
Ser Ser Gly Leu Val Gly Gln Trp Leu Arg Ala Phe Gln Lys Asp
1040 1045 1050
Leu Lys Lys Ala Asp Val Leu Pro Lys Lys Gly Gly Lys Asn Gly
1055 1060 1065
Val Asp Lys Lys Lys Arg Glu Ser Ser Ala Gln Asp Thr Leu Trp
1070 1075 1080
Gly Gly Ala Phe Ser Lys Lys Glu Glu Gln Gln Ile Ala Phe Glu
1085 1090 1095
Val Gln Ala Ala Gly Ser Ser Gln Phe Cys Leu Lys Cys Gly Trp
1100 1105 1110
Trp Phe Gln Leu Gly Met Arg Glu Val Asn Arg Val Gln Glu Ser
1115 1120 1125
Gly Val Val Leu Asp Trp Asn Arg Ser Ile Val Thr Phe Leu Ile
1130 1135 1140
Glu Ser Ser Gly Glu Lys Val Tyr Gly Phe Ser Pro Gln Gln Leu
1145 1150 1155
Glu Lys Gly Phe Arg Pro Asp Ile Glu Thr Phe Lys Lys Met Val
1160 1165 1170
Arg Asp Phe Met Arg Pro Pro Met Phe Asp Arg Lys Gly Arg Pro
1175 1180 1185
Ala Ala Ala Tyr Glu Arg Phe Val Leu Gly Arg Arg His Arg Arg
1190 1195 1200
Tyr Arg Phe Asp Lys Val Phe Glu Glu Arg Phe Gly Arg Ser Ala
1205 1210 1215
Leu Phe Ile Cys Pro Arg Val Gly Cys Gly Asn Phe Asp His Ser
1220 1225 1230
Ser Glu Gln Ser Ala Val Val Leu Ala Leu Ile Gly Tyr Ile Ala
1235 1240 1245
Asp Lys Glu Gly Met Ser Gly Lys Lys Leu Val Tyr Val Arg Leu
1250 1255 1260
Ala Glu Leu Met Ala Glu Trp Lys Leu Lys Lys Leu Glu Arg Ser
1265 1270 1275
Arg Val Glu Glu Gln Ser Ser Ala Gln
1280 1285
<210> 6
<211> 1192
<212> PRT
<213> Unknown (Unknown)
<220>
<223> synthetic sequence
<400> 6
Met Ala Glu Ser Lys Gln Met Gln Cys Arg Lys Cys Gly Ala Ser Met
1 5 10 15
Lys Tyr Glu Val Ile Gly Leu Gly Lys Lys Ser Cys Arg Tyr Met Cys
20 25 30
Pro Asp Cys Gly Asn His Thr Ser Ala Arg Lys Ile Gln Asn Lys Lys
35 40 45
Lys Arg Asp Lys Lys Tyr Gly Ser Ala Ser Lys Ala Gln Ser Gln Arg
50 55 60
Ile Ala Val Ala Gly Ala Leu Tyr Pro Asp Lys Lys Val Gln Thr Ile
65 70 75 80
Lys Thr Tyr Lys Tyr Pro Ala Asp Leu Asn Gly Glu Val His Asp Ser
85 90 95
Gly Val Ala Glu Lys Ile Ala Gln Ala Ile Gln Glu Asp Glu Ile Gly
100 105 110
Leu Leu Gly Pro Ser Ser Glu Tyr Ala Cys Trp Ile Ala Ser Gln Lys
115 120 125
Gln Ser Glu Pro Tyr Ser Val Val Asp Phe Trp Phe Asp Ala Val Cys
130 135 140
Ala Gly Gly Val Phe Ala Tyr Ser Gly Ala Arg Leu Leu Ser Thr Val
145 150 155 160
Leu Gln Leu Ser Gly Glu Glu Ser Val Leu Arg Ala Ala Leu Ala Ser
165 170 175
Ser Pro Phe Val Asp Asp Ile Asn Leu Ala Gln Ala Glu Lys Phe Leu
180 185 190
Ala Val Ser Arg Arg Thr Gly Gln Asp Lys Leu Gly Lys Arg Ile Gly
195 200 205
Glu Cys Phe Ala Glu Gly Arg Leu Glu Ala Leu Gly Ile Lys Asp Arg
210 215 220
Met Arg Glu Phe Val Gln Ala Ile Asp Val Ala Gln Thr Ala Gly Gln
225 230 235 240
Arg Phe Ala Ala Lys Leu Lys Ile Phe Gly Ile Ser Gln Met Pro Glu
245 250 255
Ala Lys Gln Trp Asn Asn Asp Ser Gly Leu Thr Val Cys Ile Leu Pro
260 265 270
Asp Tyr Tyr Val Pro Glu Glu Asn Arg Ala Asp Gln Leu Val Val Leu
275 280 285
Leu Arg Arg Leu Arg Glu Ile Ala Tyr Cys Met Gly Ile Glu Asp Glu
290 295 300
Ala Gly Phe Glu His Leu Gly Ile Asp Pro Gly Ala Leu Ser Asn Phe
305 310 315 320
Ser Asn Gly Asn Pro Lys Arg Gly Phe Leu Gly Arg Leu Leu Asn Asn
325 330 335
Asp Ile Ile Ala Leu Ala Asn Asn Met Ser Ala Met Thr Pro Tyr Trp
340 345 350
Glu Gly Arg Lys Gly Glu Leu Ile Glu Arg Leu Ala Trp Leu Lys His
355 360 365
Arg Ala Glu Gly Leu Tyr Leu Lys Glu Pro His Phe Gly Asn Ser Trp
370 375 380
Ala Asp His Arg Ser Arg Ile Phe Ser Arg Ile Ala Gly Trp Leu Ser
385 390 395 400
Gly Cys Ala Gly Lys Leu Lys Ile Ala Lys Asp Gln Ile Ser Gly Val
405 410 415
Arg Thr Asp Leu Phe Leu Leu Lys Arg Leu Leu Asp Ala Val Pro Gln
420 425 430
Ser Ala Pro Ser Pro Asp Phe Ile Ala Ser Ile Ser Ala Leu Asp Arg
435 440 445
Phe Leu Glu Ala Ala Glu Ser Ser Gln Asp Pro Ala Glu Gln Val Arg
450 455 460
Ala Leu Tyr Ala Phe His Leu Asn Ala Pro Ala Val Arg Ser Ile Ala
465 470 475 480
Asn Lys Ala Val Gln Arg Ser Asp Ser Gln Glu Trp Leu Ile Lys Glu
485 490 495
Leu Asp Ala Val Asp His Leu Glu Phe Asn Lys Ala Phe Pro Phe Phe
500 505 510
Ser Asp Thr Gly Lys Lys Lys Lys Lys Gly Ala Asn Ser Asn Gly Ala
515 520 525
Pro Ser Glu Glu Glu Tyr Thr Glu Thr Glu Ser Ile Gln Gln Pro Glu
530 535 540
Asp Ala Glu Gln Glu Val Asn Gly Gln Glu Gly Asn Gly Ala Ser Lys
545 550 555 560
Asn Gln Lys Lys Phe Gln Arg Ile Pro Arg Phe Phe Gly Glu Gly Ser
565 570 575
Arg Ser Glu Tyr Arg Ile Leu Thr Glu Ala Pro Gln Tyr Phe Asp Met
580 585 590
Phe Cys Asn Asn Met Arg Ala Ile Phe Met Gln Leu Glu Ser Gln Pro
595 600 605
Arg Lys Ala Pro Arg Asp Phe Lys Cys Phe Leu Gln Asn Arg Leu Gln
610 615 620
Lys Leu Tyr Lys Gln Thr Phe Leu Asn Ala Arg Ser Asn Lys Cys Arg
625 630 635 640
Ala Leu Leu Glu Ser Val Leu Ile Ser Trp Gly Glu Phe Tyr Thr Tyr
645 650 655
Gly Ala Asn Glu Lys Lys Phe Arg Leu Arg His Glu Ala Ser Glu Arg
660 665 670
Ser Ser Asp Pro Asp Tyr Val Val Gln Gln Ala Leu Glu Ile Ala Arg
675 680 685
Arg Leu Phe Leu Phe Gly Phe Glu Trp Arg Asp Cys Ser Ala Gly Glu
690 695 700
Arg Val Asp Leu Val Glu Ile His Lys Lys Ala Ile Ser Phe Leu Leu
705 710 715 720
Ala Ile Thr Gln Ala Glu Val Ser Val Gly Ser Tyr Asn Trp Leu Gly
725 730 735
Asn Ser Thr Val Ser Arg Tyr Leu Ser Val Ala Gly Thr Asp Thr Leu
740 745 750
Tyr Gly Thr Gln Leu Glu Glu Phe Leu Asn Ala Thr Val Leu Ser Gln
755 760 765
Met Arg Gly Leu Ala Ile Arg Leu Ser Ser Gln Glu Leu Lys Asp Gly
770 775 780
Phe Asp Val Gln Leu Glu Ser Ser Cys Gln Asp Asn Leu Gln His Leu
785 790 795 800
Leu Val Tyr Arg Ala Ser Arg Asp Leu Ala Ala Cys Lys Arg Ala Thr
805 810 815
Cys Pro Ala Glu Leu Asp Pro Lys Ile Leu Val Leu Pro Val Gly Ala
820 825 830
Phe Ile Ala Ser Val Met Lys Met Ile Glu Arg Gly Asp Glu Pro Leu
835 840 845
Ala Gly Ala Tyr Leu Arg His Arg Pro His Ser Phe Gly Trp Gln Ile
850 855 860
Arg Val Arg Gly Val Ala Glu Val Gly Met Asp Gln Gly Thr Ala Leu
865 870 875 880
Ala Phe Gln Lys Pro Thr Glu Ser Glu Pro Phe Lys Ile Lys Pro Phe
885 890 895
Ser Ala Gln Tyr Gly Pro Val Leu Trp Leu Asn Ser Ser Ser Tyr Ser
900 905 910
Gln Ser Gln Tyr Leu Asp Gly Phe Leu Ser Gln Pro Lys Asn Trp Ser
915 920 925
Met Arg Val Leu Pro Gln Ala Gly Ser Val Arg Val Glu Gln Arg Val
930 935 940
Ala Leu Ile Trp Asn Leu Gln Ala Gly Lys Met Arg Leu Glu Arg Ser
945 950 955 960
Gly Ala Arg Ala Phe Phe Met Pro Val Pro Phe Ser Phe Arg Pro Ser
965 970 975
Gly Ser Gly Asp Glu Ala Val Leu Ala Pro Asn Arg Tyr Leu Gly Leu
980 985 990
Phe Pro His Ser Gly Gly Ile Glu Tyr Ala Val Val Asp Val Leu Asp
995 1000 1005
Ser Ala Gly Phe Lys Ile Leu Glu Arg Gly Thr Ile Ala Val Asn
1010 1015 1020
Gly Phe Ser Gln Lys Arg Gly Glu Arg Gln Glu Glu Ala His Arg
1025 1030 1035
Glu Lys Gln Arg Arg Gly Ile Ser Asp Ile Gly Arg Lys Lys Pro
1040 1045 1050
Val Gln Ala Glu Val Asp Ala Ala Asn Glu Leu His Arg Lys Tyr
1055 1060 1065
Thr Asp Val Ala Thr Arg Leu Gly Cys Arg Ile Val Val Gln Trp
1070 1075 1080
Ala Pro Gln Pro Lys Pro Gly Thr Ala Pro Thr Ala Gln Thr Val
1085 1090 1095
Tyr Ala Arg Ala Val Arg Thr Glu Ala Pro Arg Ser Gly Asn Gln
1100 1105 1110
Glu Asp His Ala Arg Met Lys Ser Ser Trp Gly Tyr Thr Trp Gly
1115 1120 1125
Thr Tyr Trp Glu Lys Arg Lys Pro Glu Asp Ile Leu Gly Ile Ser
1130 1135 1140
Thr Gln Val Tyr Trp Thr Gly Gly Ile Gly Glu Ser Cys Pro Ala
1145 1150 1155
Val Ala Val Ala Leu Leu Gly His Ile Arg Ala Thr Ser Thr Gln
1160 1165 1170
Thr Glu Trp Glu Lys Glu Glu Val Val Phe Gly Arg Leu Lys Lys
1175 1180 1185
Phe Phe Pro Ser
1190
<210> 7
<211> 1192
<212> PRT
<213> Unknown (Unknown)
<220>
<223> synthetic sequence
<400> 7
Met Ala Glu Ser Lys Gln Met Gln Cys Arg Lys Cys Gly Ala Ser Met
1 5 10 15
Lys Tyr Glu Val Ile Gly Leu Gly Lys Lys Ser Cys Arg Tyr Met Cys
20 25 30
Pro Asp Cys Gly Asn His Thr Ser Ala Arg Lys Ile Gln Asn Lys Lys
35 40 45
Lys Arg Asp Lys Lys Tyr Gly Ser Ala Ser Lys Ala Gln Ser Gln Arg
50 55 60
Ile Ala Val Ala Gly Ala Leu Tyr Pro Asp Lys Lys Val Gln Thr Ile
65 70 75 80
Lys Thr Tyr Lys Tyr Pro Ala Asp Leu Asn Gly Glu Val His Asp Arg
85 90 95
Gly Val Ala Glu Lys Ile Glu Gln Ala Ile Gln Glu Asp Glu Ile Gly
100 105 110
Leu Leu Gly Pro Ser Ser Glu Tyr Ala Cys Trp Ile Ala Ser Gln Lys
115 120 125
Gln Ser Glu Pro Tyr Ser Val Val Asp Phe Trp Phe Asp Ala Val Cys
130 135 140
Ala Gly Gly Val Phe Ala Tyr Ser Gly Ala Arg Leu Leu Ser Thr Val
145 150 155 160
Leu Gln Leu Ser Gly Glu Glu Ser Val Leu Arg Ala Ala Leu Ala Ser
165 170 175
Ser Pro Phe Val Asp Asp Ile Asn Leu Ala Gln Ala Glu Lys Phe Leu
180 185 190
Ala Val Ser Arg Arg Thr Gly Gln Asp Lys Leu Gly Lys Arg Ile Gly
195 200 205
Glu Cys Phe Ala Glu Gly Arg Leu Glu Ala Leu Gly Ile Lys Asp Arg
210 215 220
Met Arg Glu Phe Val Gln Ala Ile Asp Val Ala Gln Thr Ala Gly Gln
225 230 235 240
Arg Phe Ala Ala Lys Leu Lys Ile Phe Gly Ile Ser Gln Met Pro Glu
245 250 255
Ala Lys Gln Trp Asn Asn Asp Ser Gly Leu Thr Val Cys Ile Leu Pro
260 265 270
Asp Tyr Tyr Val Pro Glu Glu Asn Arg Ala Asp Gln Leu Val Val Leu
275 280 285
Leu Arg Arg Leu Arg Glu Ile Ala Tyr Cys Met Gly Ile Glu Asp Glu
290 295 300
Ala Gly Phe Glu His Leu Gly Ile Asp Pro Gly Ala Leu Ser Asn Phe
305 310 315 320
Ser Asn Gly Asn Pro Lys Arg Gly Phe Leu Gly Arg Leu Leu Asn Asn
325 330 335
Asp Ile Ile Ala Leu Ala Asn Asn Met Ser Ala Met Thr Pro Tyr Trp
340 345 350
Glu Gly Arg Lys Gly Glu Leu Ile Glu Arg Leu Ala Trp Leu Lys His
355 360 365
Arg Ala Glu Gly Leu Tyr Leu Lys Glu Pro His Phe Gly Asn Ser Trp
370 375 380
Ala Asp His Arg Ser Arg Ile Phe Ser Arg Ile Ala Gly Trp Leu Ser
385 390 395 400
Gly Cys Ala Gly Lys Leu Lys Ile Ala Lys Asp Gln Ile Ser Gly Val
405 410 415
Arg Thr Asp Leu Phe Leu Leu Lys Arg Leu Leu Asp Ala Val Pro Gln
420 425 430
Ser Ala Pro Ser Pro Asp Phe Ile Ala Ser Ile Ser Ala Leu Asp Arg
435 440 445
Phe Leu Glu Ala Ala Glu Ser Ser Gln Asp Pro Ala Glu Gln Val Arg
450 455 460
Ala Leu Tyr Ala Phe His Leu Asn Ala Pro Ala Val Arg Ser Ile Ala
465 470 475 480
Asn Lys Ala Val Gln Arg Ser Asp Ser Gln Glu Trp Leu Ile Lys Glu
485 490 495
Leu Asp Ala Val Asp His Leu Glu Phe Asn Lys Ala Phe Pro Phe Phe
500 505 510
Ser Asp Thr Gly Lys Lys Lys Lys Lys Gly Ala Asn Ser Asn Gly Ala
515 520 525
Pro Ser Glu Glu Glu Tyr Thr Glu Thr Glu Ser Ile Gln Gln Pro Glu
530 535 540
Asp Ala Glu Gln Glu Val Asn Gly Gln Glu Gly Asn Gly Ala Ser Lys
545 550 555 560
Asn Gln Lys Lys Phe Gln Arg Ile Pro Arg Phe Phe Gly Glu Gly Ser
565 570 575
Arg Ser Glu Tyr Arg Ile Leu Thr Glu Ala Pro Gln Tyr Phe Asp Met
580 585 590
Phe Cys Asn Asn Met Arg Ala Ile Phe Met Gln Leu Glu Ser Gln Pro
595 600 605
Arg Lys Ala Pro Arg Asp Phe Lys Cys Phe Leu Gln Asn Arg Leu Gln
610 615 620
Lys Leu Tyr Lys Gln Thr Phe Leu Asn Ala Arg Ser Asn Lys Cys Arg
625 630 635 640
Ala Leu Leu Glu Ser Val Leu Ile Ser Trp Gly Glu Phe Tyr Thr Tyr
645 650 655
Gly Ala Asn Glu Lys Lys Phe Arg Leu Arg His Glu Ala Ser Glu Arg
660 665 670
Ser Ser Asp Pro Asp Tyr Val Val Gln Gln Ala Leu Glu Ile Ala Arg
675 680 685
Arg Leu Phe Leu Phe Gly Phe Glu Trp Arg Asp Cys Ser Ala Gly Glu
690 695 700
Arg Val Asp Leu Val Glu Ile His Lys Lys Ala Ile Ser Phe Leu Leu
705 710 715 720
Ala Ile Thr Gln Ala Glu Val Ser Val Gly Ser Tyr Asn Trp Leu Gly
725 730 735
Asn Ser Thr Val Ser Arg Tyr Leu Ser Val Ala Gly Thr Asp Thr Leu
740 745 750
Tyr Gly Thr Gln Leu Glu Glu Phe Leu Asn Ala Thr Val Leu Ser Gln
755 760 765
Met Arg Gly Leu Ala Ile Arg Leu Ser Ser Gln Glu Leu Lys Asp Gly
770 775 780
Phe Asp Val Gln Leu Glu Ser Ser Cys Gln Asp Asn Leu Gln His Leu
785 790 795 800
Leu Val Tyr Arg Ala Ser Arg Asp Leu Ala Ala Cys Lys Arg Ala Thr
805 810 815
Cys Pro Ala Glu Leu Asp Pro Lys Ile Leu Val Leu Pro Ala Gly Ala
820 825 830
Phe Ile Ala Ser Val Met Lys Met Ile Glu Arg Gly Asp Glu Pro Leu
835 840 845
Ala Gly Ala Tyr Leu Arg His Arg Pro His Ser Phe Gly Trp Gln Ile
850 855 860
Arg Val Arg Gly Val Ala Glu Val Gly Met Asp Gln Gly Thr Ala Leu
865 870 875 880
Ala Phe Gln Lys Pro Thr Glu Ser Glu Pro Phe Lys Ile Lys Pro Phe
885 890 895
Ser Ala Gln Tyr Gly Pro Val Leu Trp Leu Asn Ser Ser Ser Tyr Ser
900 905 910
Gln Ser Gln Tyr Leu Asp Gly Phe Leu Ser Gln Pro Lys Asn Trp Ser
915 920 925
Met Arg Val Leu Pro Gln Ala Gly Ser Val Arg Val Glu Gln Arg Val
930 935 940
Ala Leu Ile Trp Asn Leu Gln Ala Gly Lys Met Arg Leu Glu Arg Ser
945 950 955 960
Gly Ala Arg Ala Phe Phe Met Pro Val Pro Phe Ser Phe Arg Pro Ser
965 970 975
Gly Ser Gly Asp Glu Ala Val Leu Ala Pro Asn Arg Tyr Leu Gly Leu
980 985 990
Phe Pro His Ser Gly Gly Ile Glu Tyr Ala Val Val Asp Val Leu Asp
995 1000 1005
Ser Ala Gly Phe Lys Ile Leu Glu Arg Gly Thr Ile Ala Val Asn
1010 1015 1020
Gly Phe Ser Gln Lys Arg Gly Glu Arg Gln Glu Glu Ala His Arg
1025 1030 1035
Glu Lys Gln Arg Arg Gly Ile Ser Asp Ile Gly Arg Lys Lys Pro
1040 1045 1050
Val Gln Ala Glu Val Asp Ala Ala Asn Glu Leu His Arg Lys Tyr
1055 1060 1065
Thr Asp Val Ala Thr Arg Leu Gly Cys Arg Ile Val Val Gln Trp
1070 1075 1080
Ala Pro Gln Pro Lys Pro Gly Thr Ala Pro Thr Ala Gln Thr Val
1085 1090 1095
Tyr Ala Arg Ala Val Arg Thr Glu Ala Pro Arg Ser Gly Asn Gln
1100 1105 1110
Glu Asp His Ala Arg Met Lys Ser Ser Trp Gly Tyr Thr Trp Ser
1115 1120 1125
Thr Tyr Trp Glu Lys Arg Lys Pro Glu Asp Ile Leu Gly Ile Ser
1130 1135 1140
Thr Gln Val Tyr Trp Thr Gly Gly Ile Gly Glu Ser Cys Pro Ala
1145 1150 1155
Val Ala Val Ala Leu Leu Gly His Ile Arg Ala Thr Ser Thr Gln
1160 1165 1170
Thr Glu Trp Glu Lys Glu Glu Val Val Phe Gly Arg Leu Lys Lys
1175 1180 1185
Phe Phe Pro Ser
1190
<210> 8
<211> 1193
<212> PRT
<213> Unknown (Unknown)
<220>
<223> synthetic sequence
<400> 8
Met Lys Arg Ile Ala Lys Phe Arg His Asp Lys Pro Val Lys Arg Glu
1 5 10 15
Ala Trp Ser Lys Gly Tyr Arg Val His Lys Asn Arg Ile Ile Asn Lys
20 25 30
Val Thr Arg Ser Ile Lys Tyr Pro Leu Val Val Lys Asp Glu Trp Lys
35 40 45
Lys Arg Leu Ile Asp Asp Ala Ala His Asp Tyr Arg Trp Leu Val Gly
50 55 60
Pro Ile Asn Tyr Ser Asp Trp Cys Arg Asp Pro Asn Gln Tyr Ser Ile
65 70 75 80
Leu Glu Phe Trp Ile Asp Phe Leu Cys Val Gly Gly Val Phe Gln Ser
85 90 95
Ser His Ser Asn Ile Cys Arg Leu Ala Ile Gln Leu Ser Gly Gly Ser
100 105 110
Val Phe Glu Gln Glu Trp Lys Asp Leu Ser Pro Phe Val Arg Ala Asn
115 120 125
Leu Ile Gln Gly Ile Lys Pro Ala Glu Phe Ile Gly Phe Leu Thr Ala
130 135 140
Glu Phe Arg Ser Ser Ser Asn Pro Lys Asn Phe Ile Ser Lys Phe Phe
145 150 155 160
Glu Gly Ser Asn Glu Asp Leu Glu Ser Leu Thr Asn Glu Phe Ala Ser
165 170 175
Ile Val Asp Phe Ile Lys Ala Lys Asp Ile Ser Leu Leu Arg Lys Ser
180 185 190
Leu Pro Ser Cys Lys Lys Ile Ala Pro Asn Leu Trp Glu Lys Ala Val
195 200 205
Gly Ser His Ser Thr Asn Glu Leu Leu Lys Leu Leu Thr Lys Tyr Thr
210 215 220
Arg Val Met Leu Val Ala Glu Pro Ser His Ser Asp Arg Val Phe Ser
225 230 235 240
Gln Thr Val Leu Gln Ser Asn Asp Gln Asp Asp Pro Glu Leu Thr Gly
245 250 255
Pro Leu Pro Ser His Lys Val Gly Lys Ala Ser Tyr Leu Phe Ile Pro
260 265 270
Glu Phe Ile Arg Glu Val Asn Leu Asp Lys Ile Ser Lys Leu Asp Leu
275 280 285
Ser Ala Lys Ser Lys Leu Ala Val Glu Gln Val Lys Lys Leu Ser Glu
290 295 300
Leu Thr Ser Asp Phe Lys Gln Ile Glu Asn Gln Ser Glu Ala Tyr Phe
305 310 315 320
Gly Leu Ser Thr Ser Phe Asn Glu Leu Ser Asn Phe Leu Gly Ile Leu
325 330 335
Ile Arg Thr Leu Arg Asn Ala Pro Glu Ala Ile Leu Lys Asp Gln Ile
340 345 350
Ala Leu Cys Ala Pro Leu Asp Lys Asp Ile Leu Lys Ile Thr Leu Asp
355 360 365
Trp Leu Cys Asp Arg Ala Gln Ala Leu Pro Glu Asn Pro Arg Phe Glu
370 375 380
Thr Asn Trp Ala Glu Tyr Arg Ser Tyr Leu Gly Gly Lys Ile Lys Ser
385 390 395 400
Trp Phe Ser Asn Tyr Glu Asn Phe Phe Glu Ile Pro Gln Ala Ala Ser
405 410 415
Ser Gln Gln Asn Asn Asn Arg Glu Lys Lys Leu Gly Asn Arg Ser Ala
420 425 430
Ile Arg Ala Leu Asn Leu Lys Lys Glu Ala Phe Glu Lys Ala Arg Glu
435 440 445
Thr Phe Lys Gly Asp Lys Gly Thr Leu Glu Lys Ile Asp Leu Ala Tyr
450 455 460
Arg Leu Leu Gly Ser Ile Ser Pro Glu Val Leu Gln Cys Asp Glu Gly
465 470 475 480
Leu Lys Leu Tyr Gln Gln Phe Asn Asp Glu Leu Leu Val Leu Asn Glu
485 490 495
Thr Ile Asn Gln Lys Phe Gln Asp Ala Lys Arg Asp Ile Lys Ala Lys
500 505 510
Lys Glu Lys Glu Ser Phe Glu Lys Leu Gln Arg Asn Leu Ser Ser Pro
515 520 525
Leu Pro Arg Ile Pro Glu Phe Phe Gly Glu Arg Ala Lys Lys Gly Tyr
530 535 540
Gln Lys Ala Arg Val Ser Pro Lys Leu Ala Arg His Leu Leu Glu Cys
545 550 555 560
Leu Asn Asp Trp Leu Ala Arg Phe Ala Lys Val Glu Glu Ser Ala Phe
565 570 575
Ser Glu Lys Glu Phe Gln Arg Ile Leu Asp Trp Leu Arg Thr Ser Asp
580 585 590
Phe Leu Pro Val Phe Ile Arg Lys Ser Lys Asp Pro Pro Ser Trp Leu
595 600 605
Arg Tyr Ile Ala Arg Val Ala Thr Gly Lys Tyr Tyr Phe Trp Val Ser
610 615 620
Glu Tyr Ser Arg Lys Arg Val Gln Ile Ile Asp Lys Pro Ile Ala Gln
625 630 635 640
Asn Pro Leu Lys Glu Leu Ile Ser Trp Phe Leu Leu Asn Lys Asp Ala
645 650 655
Phe Ser Arg Asp Asn Glu Leu Phe Lys Gly Leu Ser Ser Lys Met Val
660 665 670
Thr Leu Ala Arg Ile Met Ala Gly Ile Leu Arg Asp Arg Gly Glu Gly
675 680 685
Leu Lys Glu Leu Gln Ala Met Thr Ser Lys Leu Asp Asn Ile Gly Leu
690 695 700
Leu His Pro Ser Phe Ser Val Pro Val Thr Asp Ser Leu Lys Asp Ala
705 710 715 720
Ala Phe Tyr Arg Ala Phe Phe Ser Glu Leu Glu Gly Leu Leu Asn Ile
725 730 735
Gly Arg Ser Arg Leu Ile Ile Glu Arg Ile Thr Leu Gln Ser Gln Gln
740 745 750
Ser Lys Asn Lys Lys Thr Arg Arg Pro Leu Met Pro Glu Pro Phe Ile
755 760 765
Asn Glu Asp Lys Glu Val Phe Leu Ala Phe Pro Lys Phe Glu Thr Lys
770 775 780
Asn Lys Val Lys Gly Thr Arg Val Val Tyr Asn Ser Pro Asp Glu Val
785 790 795 800
Asn Trp Leu Leu Ser Pro Ile Arg Ser Ser Lys Gly Gln Leu Ser Phe
805 810 815
Met Phe Arg Cys Leu Ser Glu Asp Ala Lys Ile Met Thr Thr Ser Gly
820 825 830
Gly Cys Ser Tyr Ile Val Glu Phe Lys Lys Leu Leu Glu Ala Gln Glu
835 840 845
Glu Val Leu Ser Ile His Asp Cys Asp Ile Ile Pro Arg Ala Phe Val
850 855 860
Ser Ile Pro Phe Thr Leu Glu Arg Glu Ser Glu Glu Thr Lys Pro Asp
865 870 875 880
Trp Lys Pro Asn Arg Phe Met Gly Val Asp Ile Gly Glu Tyr Ala Val
885 890 895
Ala Tyr Cys Val Ile Glu Lys Gly Thr Asp Ser Ile Glu Ile Leu Asp
900 905 910
Cys Gly Ile Val Arg Asn Gly Ala His Arg Val Leu Lys Glu Lys Val
915 920 925
Asp Arg Leu Lys Arg Arg Gln Arg Ser Met Thr Phe Gly Ala Met Asp
930 935 940
Thr Ser Ile Ala Ala Ala Arg Glu Ser Leu Val Gly Asn Tyr Arg Asn
945 950 955 960
Arg Leu His Ala Ile Ala Leu Lys His Gly Ala Lys Leu Val Tyr Glu
965 970 975
Tyr Glu Val Ser Ala Phe Glu Ser Gly Gly Asn Arg Ile Lys Lys Val
980 985 990
Tyr Glu Thr Leu Lys Lys Ser Asp Cys Thr Gly Glu Thr Glu Ala Asp
995 1000 1005
Lys Asn Ala Arg Lys His Ile Trp Gly Glu Thr Asn Ala Val Gly
1010 1015 1020
Asp Gln Ile Gly Ala Gly Trp Thr Ser Gln Thr Cys Ala Lys Cys
1025 1030 1035
Gly Arg Ser Phe Gly Ala Asp Leu Lys Ala Gly Asn Phe Gly Val
1040 1045 1050
Ala Val Pro Val Pro Glu Lys Val Glu Asp Ser Lys Gly His Tyr
1055 1060 1065
Ala Tyr His Glu Phe Pro Phe Glu Asp Gly Leu Lys Val Arg Gly
1070 1075 1080
Phe Leu Lys Pro Asn Lys Ile Ile Ser Asp Gln Lys Glu Leu Ala
1085 1090 1095
Lys Ala Val His Ala Tyr Met Arg Pro Pro Leu Val Ala Leu Gly
1100 1105 1110
Lys Arg Lys Leu Pro Lys Asn Ala Arg Tyr Arg Arg Gly Asn Ser
1115 1120 1125
Ser Leu Phe Arg Cys Pro Phe Ser Asp Cys Gly Phe Thr Ala Asp
1130 1135 1140
Ala Asp Ile Gln Ala Ala Tyr Asn Ile Ala Val Lys Gln Leu Tyr
1145 1150 1155
Lys Pro Lys Lys Gly Tyr Pro Lys Glu Arg Lys Trp Gln Asp Phe
1160 1165 1170
Val Ile Leu Lys Pro Lys Glu Pro Ser Lys Leu Phe Asp Lys Gln
1175 1180 1185
Phe Tyr Arg Pro Asn
1190
<210> 9
<211> 4
<212> PRT
<213> Artificial sequence (Artificial sequence)
<220>
<223> synthetic sequence
<400> 9
Ala Ala Ala Ala
1
<210> 10
<211> 4
<212> PRT
<213> Artificial sequence (Artificial sequence)
<220>
<223> synthetic sequence
<400> 10
Ala Ala Ala Ala
1
<210> 11
<211> 25
<212> RNA
<213> Artificial sequence (Artificial sequence)
<220>
<223> synthetic sequence
<400> 11
cuccgaaagu aucggggaua aaggc 25
<210> 12
<211> 25
<212> RNA
<213> Artificial sequence (Artificial sequence)
<220>
<223> synthetic sequence
<400> 12
caccgaaauu uggagaggau aaggc 25
<210> 13
<211> 25
<212> RNA
<213> Artificial sequence (Artificial sequence)
<220>
<223> synthetic sequence
<400> 13
cuccgaauua ucgggaggau aaggc 25
<210> 14
<211> 25
<212> RNA
<213> Artificial sequence (Artificial sequence)
<220>
<223> synthetic sequence
<400> 14
ccccgaauau aggggacaaa aaggc 25
<210> 15
<211> 36
<212> RNA
<213> Artificial sequence (Artificial sequence)
<220>
<223> synthetic sequence
<400> 15
gucuagacau acagguggaa aggugagagu aaagac 36
<210> 16
<211> 25
<212> RNA
<213> Artificial sequence (Artificial sequence)
<220>
<223> synthetic sequence
<400> 16
cuccgugaau acguggggua aaggc 25
<210> 17
<211> 10
<212> RNA
<213> Artificial sequence (Artificial sequence)
<220>
<223> synthetic sequence
<400> 17
aaaaaaaaaa 10
<210> 18
<211> 10
<212> RNA
<213> Artificial sequence (Artificial sequence)
<220>
<223> synthetic sequence
<400> 18
aaaaaaaaaa 10
<210> 19
<211> 10
<212> RNA
<213> Artificial sequence (Artificial sequence)
<220>
<223> synthetic sequence
<400> 19
aaaaaaaaaa 10
<210> 20
<211> 43
<212> RNA
<213> Artificial sequence (Artificial sequence)
<220>
<223> synthetic sequence
<400> 20
cuccgaaagu aucggggaua aaggcaucaa uaccaaacuc ugg 43
<210> 21
<211> 6430
<212> DNA
<213> Artificial sequence (Artificial sequence)
<220>
<223> synthetic sequence
<400> 21
ttaaaaggac agtttctaat agcatataat cattatagca ttacatacgg aaaactactt 60
caaatttgcg gcagatcgga ttttgctggc ccagagatat attttccttc tttgttaaaa 120
gcggatttat ggcaagggca gagccagttt ttatttttat cttcccattc aacgatgcat 180
ccaagatgtg ggcaaattgg agagagtttt aaaatttctc ctttttcatt tttgtatacg 240
gcaactttct ttccttctat ctcaacaatt tttcctgtgt tgttttttaa attgtctaaa 300
gtacccgaag ttttcataaa gcgccccttc ataaaaagat aaggaaaaag aaatatttgt 360
tttaataatg ttaacatata gcttgttgaa ttataacatt tatccgagag gtggtctaac 420
ttatgcaact tattgattct tactttagga gaatagttct actctaggcg tatagagaac 480
ttttgttgaa aggtttttgc aatatctcta ctttctggcc aaaaatcggt ttttcccgcg 540
aatctgccgt atagtttgta tcctgcttta acaggtctgc ctccgctagg ttttcccggg 600
aaaggtacta taaatctctt atttcctaag agataagagc gcaaaccgag aattaagcca 660
tgatagagtt cctgaaaagt agcagtttgg cgagttgctg caacataaat ttctgtatcc 720
atgaaatcct ttaggttttc cattgtatag ggaagtgttt tactttcatc cccaccgttt 780
tcttgtatct cttttattgt attaaaggcg actccgtcga taaaacctct atatggttcc 840
atcaaatcgt agattagaga ggggtaatct gaaggtgtgt gggtgtatcc gtgaaaagga 900
ctaaaatgat ggtaaaccac ccaacgcaag ataataccgc taacaaattt tgaagaagca 960
tctaaaacat tacagataaa attacctttt gatcgtcgcc tatctttagg atatcccaaa 1020
gacttgtaga aatgttccca atatcttttg gcatgccacg attccactcc aactatagac 1080
tccacggacg ataagccctg cagttcctgc gttggggctg ggattaacca ttccatggat 1140
ttgaatttag cgtaaatcaa tcttttcgtt atatatgcgc gtttcttttc attttgtctg 1200
aatagaatct gttttgttag taaatcttct ctattagatg ttgtagaagg aacgatccaa 1260
acaccgcggg gcatatttcg tcgatgtatt gttaaaggaa tgccccaagc actgcatttt 1320
tctagaaatt cttgttctag cggacaaacg ctaccataaa acatgataga gtgaatctct 1380
ggaaaggaca aatccagctc accacctttg taagagaatt taacactctt tcccgataag 1440
tctatggatt ttacataggg taaccagata aattgtttac gcttggcgaa atatctcctc 1500
atttcgtatt ggatatatgt ctcaaattat gctatattta aggtacattt tcaagcggtt 1560
tttagctcgt ttacatttta atatcaacaa aatcggggag aagtctccga aagtatcggg 1620
gataaaggca tcaataccaa actctggctc cgaaagtatc ggggataaag gcattcccaa 1680
tatctcatta ctccgaaagt atcggggata aaggctcctc ccgtatctgt caactccgaa 1740
agtatcgggg ataaaggctt aaaaaggaat accccactcc gaaagtatcg gggataaagg 1800
cttgtactcc acatccgcta ctccgaaagt atcggggata aaggcactga aacttgaatt 1860
gtactccgaa agtatcgggg ataaaggcat cttgcgactt tctcttctcc gaaagtatcg 1920
gggataaagg ctcttcggtt ggtacgggtt ctccgaaagt atcggggata aaggcttatg 1980
gcagtatcgc atactccgaa agtatcgggg ataaaggctt cataagtacg cctaaactcc 2040
gaaagtatcg gggataaagg cagatgaggc tatacttaac tccgaaagta tcggggataa 2100
aggcacaaac ataaagggaa aactccgaaa gtatcgggga taaaggcata aatctggtga 2160
acttactccg aaagtatcgg ggataaaggc tactgttatt gttgtacact ccgaaagtat 2220
cggggataaa ggcataacta gcgttcccat tctccgaaag tatcaaaata aaaagggttt 2280
ccagttttta actaaacttt agccttccac cctttcctga ttttgttgat aattaataat 2340
gcgcaaaaaa ttgtttaagg gttacatttt acataataag aggcttgtat atacaggtaa 2400
agctgcaata cgttctatta aatatccatt agtcgctcca aataaaacag ccttaaacaa 2460
tttatcagaa aagataattt atgattatga gcatttattc ggacctttaa atgtggctag 2520
ctatgcaaga aattcaaaca ggtacagcct tgtggatttt tggatagata gcttgcgagc 2580
aggtgtaatt tggcaaagca aaagtacttc gctaattgat ttgataagta agctagaagg 2640
atctaaatcc ccatcagaaa agatatttga acaaatagat tttgagctaa aaaataagtt 2700
ggataaagag caattcaaag atattattct tcttaataca ggaattcgtt ctagcagtaa 2760
tgttcgcagt ttgagggggc gctttctaaa gtgttttaaa gaggaattta gagataccga 2820
agaggttatc gcctgtgtag ataaatggag caaggacctt atcgtagagg gtaaaagtat 2880
actagtgagt aaacagtttc tttattggga agaagagttt ggtattaaaa tttttcctca 2940
ttttaaagat aatcacgatt taccaaaact aacttttttt gtggagcctt ccttggaatt 3000
tagtccgcac ctccctttag ccaactgtct tgagcgtttg aaaaaattcg atatttcgcg 3060
tgaaagtttg ctcgggttag acaataattt ttcggccttt tctaattatt tcaatgagct 3120
ttttaactta ttgtccaggg gggagattaa aaagattgta acagctgtcc ttgctgtttc 3180
taaatcgtgg gagaatgagc cagaattgga aaagcgctta cattttttga gtgagaaggc 3240
aaagttatta gggtacccta agcttacttc ttcgtgggcg gattatagaa tgattattgg 3300
cggaaaaatt aaatcttggc attctaacta taccgaacaa ttaataaaag ttagagagga 3360
cttaaagaaa catcaaatcg cccttgataa attacaggaa gatttaaaaa aagtagtaga 3420
tagctcttta agagaacaaa tagaagctca acgagaagct ttgcttcctt tgcttgatac 3480
catgttaaaa gaaaaagatt tttccgatga tttagagctt tacagattta tcttgtcaga 3540
ttttaagagt ttgttaaatg ggtcttatca aagatatatt caaacagaag aggagagaaa 3600
ggaggacaga gatgttacca aaaaatataa agatttatat agtaatttgc gcaacatacc 3660
tagatttttt ggggaaagta aaaaggaaca attcaataaa tttataaata aatctctccc 3720
gaccatagat gttggtttaa aaatacttga ggatattcgt aatgctctag aaactgtaag 3780
tgttcgcaaa cccccttcaa taacagaaga gtatgtaaca aagcaacttg agaagttaag 3840
tagaaagtac aaaattaacg cctttaattc aaacagattt aaacaaataa ctgaacaggt 3900
gctcagaaaa tataataacg gagaactacc aaagatctcg gaggtttttt atagataccc 3960
gagagaatct catgtggcta taagaatatt acctgttaaa ataagcaatc caagaaagga 4020
tatatcttat cttctcgaca aatatcaaat tagccccgac tggaaaaaca gtaacccagg 4080
agaagttgta gatttgatag agatatataa attgacattg ggttggctct tgagttgtaa 4140
caaggatttt tcgatggatt tttcatcgta tgacttgaaa ctcttcccag aagccgcttc 4200
cctcataaaa aattttggct cttgcttgag tggttactat ttaagcaaaa tgatatttaa 4260
ttgcataacc agtgaaataa aggggatgat tactttatat actagagaca agtttgttgt 4320
tagatatgtt acacaaatga taggtagcaa tcagaaattt cctttgttat gtttggtggg 4380
agagaaacag actaaaaact tttctcgcaa ctggggtgta ttgatagaag agaagggaga 4440
tttgggggag gaaaaaaacc aggaaaaatg tttgatattt aaggataaaa cagattttgc 4500
taaagctaaa gaagtagaaa tttttaaaaa taatatttgg cgtatcagaa cctctaagta 4560
ccaaatccaa tttttgaata ggctttttaa gaaaaccaaa gaatgggatt taatgaatct 4620
tgtattgagc gagcctagct tagtattgga ggaggaatgg ggtgtttcgt gggataaaga 4680
taaactttta cctttactga agaaagaaaa atcttgcgaa gaaagattat attactcact 4740
tccccttaac ttggtgcctg ccacagatta taaggagcaa tctgcagaaa tagagcaaag 4800
gaatacatat ttgggtttgg atgttggaga atttggtgtt gcctatgcag tggtaagaat 4860
agtaagggac agaatagagc ttctgtcctg gggattcctt aaggacccag ctcttcgaaa 4920
aataagagag cgtgtacagg atatgaagaa aaagcaggta atggcagtat tttctagctc 4980
ttccacagct gtcgcgcgag tacgagaaat ggctatacac tctttaagaa atcaaattca 5040
tagcattgct ttggcgtata aagcaaagat aatttatgag atatctataa gcaattttga 5100
gacaggtggt aatagaatgg ctaaaatata ccgatctata aaggtttcag atgtttatag 5160
ggagagtggt gcggataccc tagtttcaga gatgatctgg ggcaaaaaga ataagcaaat 5220
gggaaaccat atatcttcct atgcgacaag ttacacttgt tgcaattgtg caagaacccc 5280
ttttgaactt gttatagata atgacaagga atatgaaaag ggaggcgacg aatttatttt 5340
taatgttggc gatgaaaaga aggtaagggg gtttttacaa aagagtctgt taggaaaaac 5400
aattaaaggg aaggaagtgt tgaagtctat aaaagagtac gcaaggccgc ctataaggga 5460
agtcttgctt gaaggagaag atgtagagca gttgttgaag aggagaggaa atagctatat 5520
ttatagatgc cctttttgtg gatataaaac tgatgcggat attcaagcgg cgttgaatat 5580
agcttgtagg ggatatattt cggataacgc aaaggatgct gtgaaggaag gagaaagaaa 5640
attagattac attttggaag ttagaaaatt gtgggagaag aatggagctg ttttgagaag 5700
cgccaaattt ttatagttat attggatata tcttttcaaa aaatctgaat tggtctagga 5760
ccgcggaatc ctatggtaat ttctacgtcc agaatgtagc gccatgccat tagaccagtc 5820
cccgaattaa acatcgccga acttcttggt gatgttatgg caaagagaat gcgacagcgc 5880
ctattcattg agcaagatat ggaaagtatt cctccagggc aaacaatggt tttgaatatg 5940
ggggagcctg ttgtgggaac ggaatttaca catcggcgga atattaatgg gaaagagtgc 6000
gttttatttt ttgcagttga actttttaaa gacgacagcg cgtagtcagt acatcttcgg 6060
cccatcttaa tcttccattg gggttattaa gactgcccac tttagcagca agatttttaa 6120
ggtgactcct taattctttc tcgtgcggag ttagatctat ttttccaaaa tctttatccg 6180
catggtttag gaatatttgt atagagtcta ggggaatttc cttaccgatg tcccccgctg 6240
cggtaacaac tctgtaaaga tccatcttta ttgaatttaa tataaactgt ctgtcttttt 6300
tcatatttct aaatgctttt ttgttaattc aaataaccta cccctcacat tcttatcgta 6360
tatctcatat gtatacttac ctagtgcagg tttgtaattt ctcatagcca tatattcaac 6420
ttcttttgaa 6430
<210> 22
<211> 13819
<212> DNA
<213> Artificial Sequence
<220>
<223> synthetic sequence
<400> 22
ctctttttct tgactatggt catcgcttag cttggcgggg acgtttgatc tttgcttcta 60
gtttaatcct ttttctgtcc ttgttgtttt taatgaatta ccctctaatt tggggtttat 120
tagctttgag tttattggct ttagtgattc taacttggtg gaaaaaggct tggactaaat 180
ggttgttagt cccactgata atttttctgc tggctggcac tctagcgatt tttgcttcaa 240
aacctatttt agctaaacca atttttgatc taaatcaaag tttgaaaatt aatagttttg 300
attcgcgacc taatttagat agcactgctc aagtgactaa agccagtttg aaagctcatc 360
cctttttagg ttttggtcca aatcgttttt ggcgagcttg gactctttat aagccaaaat 420
tatttaatca atcagtaatc tggtcagttg attatcgtct ggcttatggt tttattccaa 480
caatgttagt aactcaaggt ggcctcggtt ttctggcttg gttaattctg ataatttcta 540
gttttattta tctttatcat ttattcaaac aaagttcagt agaagatttt tccacgataa 600
ttttattgag tctaagtttt atttatctct ggttaaattt actcattctt aatcctaatt 660
ttgttatcct ctctctggct tttgggtgct tggggtggtt gttagttttt aatcataaaa 720
tttctaatca gctttcttgg cacattaaat tagatacgtt tctaaaaagt ttagtggcaa 780
aactaggtct tagtattatt ttgggttttt tatttttaat cattattttg tcactgctta 840
attatagttc tttgatctta tttcatcggg gtctttcatc tttggatcgg ggggattttt 900
ccgccaccga aaaaaattgg cgtttagcta gtcgtttgag tcctcagaca gtttataatc 960
gttctttggc tgatcttaaa ctgcgtcaga ttaatcaact tctgacgact cctaattctg 1020
attctcaaaa aactttagcc gagttttccc gtttttatgg tgagtcaatt ggatttggct 1080
tgactgctcg tgaccaagat ccttttgatt atttaaattg gttaatttta ggtcaagttt 1140
atgaagctgg gattccgctt aaaattaaag gggccgatat tcaagctcgg aaaatttatc 1200
aagaagtgct tagattaaac ccggtttggc cagtcatttg gctaaatttg gctcgagtgg 1260
aattaggctc tgatcaccct gatttagcgc gagaagattt acttaaagct ttggaattaa 1320
aagccgatta ttccgatgct ctgttagctt tagccgaatt agattatagt caaggtcgat 1380
tatcaaaagc tttagcggga gctaaggtgg cagttctgaa agaaccaaat aatttgggag 1440
cttggttttc ccttggtttt ttccagtatc aaattggaca ttatgatgaa gctgtcattt 1500
ctttagaaaa agtcttaacc tttaatcaaa attcagctga tactaaatat tttcttggtt 1560
taagtttagc tgaacttgat cgaacgactg aggcgattga cctatttcaa tctttagttc 1620
gggctaatcc cgacaatcaa gagcttaaaa atattttaac taatctcaaa gctggtcgaa 1680
cagctttagc gccaccagag accaaaacca aaacaaaata ataattcatg gtgtctaaaa 1740
ttactcgctt acttcaaaaa gaatttacca atcttcacca agcagctttt ttgttggcta 1800
cttcggcctt gctgtctcaa tttttgggtt tgtggcggga tcgtttatta gcctctggtt 1860
ttggagctag tcatcaatta gatatttatt atacggcttt tcgcttaccg gatttaattt 1920
acgtttcggt ggcttctttt gtttcgatca cggtccatat tcctttgatt attaataaga 1980
tggaaactgg tggtaaaccg gcggtggaaa aatttctcaa ttcagtgctg acagtttttt 2040
taattgggat ggtttcagtt tccgcgttat tatttatttt tatgccctgg ttatcgaaaa 2100
ttaccgctcc cgggttttct tcagttgatc aacaaacctt agtcacctta tctcgaattt 2160
tattgttgtc tcccttattg ttgggtttgt ctaatctctt gggaggagcc actcaagctt 2220
ttcgtaaatt tgccgcctat gcctttagtc ctatttttta taatttggga attatttttg 2280
ggattttctt tttctatcct ttgcttggtt tgccgggctt agtctgggga gtaattctcg 2340
gtgcagtctt acatttatca attcaattgc cagttttaag tcaattaggt ttacgtcttc 2400
gtttatcgag attaattaat tggccggaaa tgagaaaagt gatgctcata tccctaccgc 2460
gaactattac cttatcggct aatcaactat ctttattagt tttagtggct ttagcttcgt 2520
ttttgcccaa agggtcaatt tcggttttta atttttcgct caatcttcaa tcagtccccc 2580
tgtcgattat cggagtttct tattcggtgg cggcttttcc cgtcttggcc aaattttttg 2640
tcgctggtca acacaaagaa tttgctggtg aaattatcgc cgccattcga catattattt 2700
tttggtctgc tccagtggtc gttttgttta ttgttttacg agctcaaatc gtccgggtga 2760
ttttaggttc aggacgtttt gattggtcgg ccactcgatt gacggcagct tgtttggcga 2820
ttttttctgt gtcagtgatt gctcaaagtt tgattttagt tttagtccga gcttactatg 2880
ccgctgggga aaccaaaact cccttgatca ttaattcctt atcatctttg ggaacaatta 2940
ttttggcttt aattttatgg caactgttca aagtttggcc ggcctttcat ctgattttgg 3000
aacaaattct aagattgaaa gatttaccag ggacaattat tttagtctta cctctcgctt 3060
tttcgattgg agcgattatc aatgtttttg ttttatggtg ggctttcgaa cgacgctttg 3120
ctatcggaat ttggcgcaat ttagaggtag ttagtcttca gtctttagtc gcttctttat 3180
ttggtggctt tgtggcctat aacttactaa atgtctttag tctgtattat aaattagata 3240
ctttttggtc aatctttgag cagggatttt tagccggtat tttgggctta attgcctgga 3300
tttcggtctt aattcttttg aaaagtgaag aattggctga attgggacgt tctctgtcag 3360
cccgagtctg gaaagttgtc cctattgtcc cagaacgaga agaactgtag gatgggaaag 3420
tctttatatg gatttaaaac actatcgtaa tttttctatt attgctcacc ccagtagaac 3480
agccaagctg tctacggggc aagtattgat cataaattag tcttatggat ttaaaacact 3540
atcgtaattt ttctattatt gcccatatag atcatgggaa gagtactttg tctgatcggc 3600
ttttagattt gacagggaca attgaaaagc gaaaaatgcg agaacaagtc cttgattcga 3660
tggagttaga acgtgaacga ggaataacca tcaaaatgca accagtccga atgaattata 3720
aattggctgg tgaagattat attctgaatc taattgatac tccgggtcat attgattttt 3780
cttatgaagt gtctcgttcg cttcaagcag tggaaggggt cttgcttttg gttgacgcca 3840
ctcaaggggt ccaagctcaa acttttactg ttttagcgat ggctcaagaa ttgggtttaa 3900
cgattattcc cgttttaaac aaaattgatt taccaattgc tcgaacagct gaagtcaaac 3960
aagagattgt taatctatta aaatgtcagc ccgaagatat tatggcggtt tctggcaaaa 4020
ccggtgaagg agtagataaa ttattaattg agattattaa aaaaattcct agtccaattt 4080
cagaaataaa agttgttaaa ccttgccgag cgctggtatt tgattttgaa tattctattc 4140
ataaaggagt ggtggtctat gttcgagttt tagatggcga aattactccc gctgatcaac 4200
taaactttgt cgcttctggt gaaaaatttt cggttttaga attaggttat tttcgacctc 4260
aagctgaacc acaaaaaaaa ttacaggcgg gtgacattgg ttatttagtc actggaatta 4320
aaaaaccagg caatgctaaa gtgggggata cgattaccac tttagtgagt cctcttccag 4380
ctgtaccggg ctatatgact cctcgaccgg tggtctgggc ttctctttat ccagctagcc 4440
aagatgattt tgctctactc aagcaatccc tcgaacgatt aaatcttcaa gatgccgctc 4500
tgtcttttga agaggaaagc tcgggtgctt tgggacgagg ttttagagct ggttttctgg 4560
gaatgcttca tttggaaatc attagcgaac gattgaagcg agaattttct ttaaatttaa 4620
ttgtgacgac accgagtatt agttatcgtc taattaatac tcggaccaaa gaagaagtca 4680
ggattttctc tcctcacctt tttccacttg aaatcaagga ttatgaaatt tacgaatctt 4740
gggtagcggt tagaattatt agtcccgccg attatcttag tccgattatt caattacttc 4800
atgaacacga agcggaagta atgactatgg aaacttttag ttctagtcgc accgctttgt 4860
ctatcctcat gcctttacga gaattgatgc gtaatttttt tgatagttta aaaagtgtct 4920
cttctggctt tgcttctttt tcttatgaat tagccgaaga acgtctcgct gatgtctctc 4980
gcttggatat tttaattaat ggtgaaataa ttccggcttt ttcgcgaatt gtttcgcgtc 5040
gacgaatcga aaaggatgct tcggaaatgg ctgaacgttt agagggtttg attcccaaac 5100
aattgattac gattaaaatc caagttcaag gtttagggcg aattttggcg gcgcgttcaa 5160
tttccgctct acgaaaagat gtcactgact atctctatgg cggcgatatt actcgaaaaa 5220
tgaaattacg agaaaagcag aaaaaaggca agaaaaaaat gcaacagctg ggtaaggtaa 5280
atatccccca agaagttttt ctaaagatga tgcgaaatgc ggactagcgc ggactggacg 5340
cagactaatg cgaatttacc ctatggagta gcttgctata ctccataggg taaacgcaga 5400
tagtcacaaa caagacactg atcagatcag cgttttttta gcattgatcg gcgttttatc 5460
taaacaagaa ggggagagag taaagggcga ccatacttaa aataacaaga ataccaactg 5520
tcgctgagat gatttgaaag atttttttgt gtttgctctg aaataacatt agttgtagta 5580
taaggctgtg accagatttt atcaagtcga aaaacatttt aagtggctaa atgttctctt 5640
tcttattgtc actttaatct tggtgatttt tttggctcga ggggtttggc gagtttataa 5700
tcagagtcgt tttgctaatt ctaattatct tttgactaaa gatcgtctta ctaaattaga 5760
agacagacaa aaacaaatta ctgatcgtct agaaaaatta tcaaccgatc gtggtttaga 5820
agaagaattt agaaataatt tttcagtcgt gcgaccaggg gaaaaaatga ttttaattgt 5880
cgatagtatt gaaacagcta ctgatacagc cactactagt gaggctagtc tttgggggac 5940
tttaaaagcc ttattattaa gtcgttaatt aaaaaagcga gattggttca gcttgccctc 6000
ttaaatttct tgtgcaaata tgcgggtatg gtttagtttg ccctttaaaa ttttttgtcc 6060
gaacatgcga gtatggttta gtggtagaat gcgaccttcc caaggttgag acgcgagttc 6120
gattctcgct actcgcacaa aaaacttttt agggtgaata gaatgcgacc cccgaagaac 6180
agcaaagctg tctacggggc aggcttccca agcataagac gctggttcga ttcccgcatt 6240
tcgcacaatt ggccgattaa aatagtattt tattttttta tgtcctccac ctttaaacga 6300
actatcgaaa attttacttg tgctcattgt ggagcggagg tgattggtaa tggttatact 6360
gatcactgtc ctaaatgcct ctggggcctc catgtagatg atttcccggg agatcgagct 6420
aatccttgtt tgggcttaat gaagccgatt ggagtggatt tagcgaaggg agattatact 6480
ttaagctatc aatgtgaaaa atgtcacatc attaaaacta ataaaactgc tccggacgat 6540
gaacttaaca agtacttgac cggtatgtta taattgttaa ataagttaaa tttaaaatat 6600
aaaatgaaga aagttaccat ttattccact cccacttgtg gttattgtaa aattgctaaa 6660
caattcttta aagataaggg aattgatttt acagagattg atgtcactac tgatttagct 6720
gggcgacagg ccttagaaca aaaaattggc cgaattacgg gtgtgccagt aattacgatt 6780
gacgaagaag ccgtcgtggg ttttgatcaa gctcatattg cgaagatgtt agggatttaa 6840
actagtgaca atttaccccg ccttctgcca gccggtagag gatgggtttt tttggtaatt 6900
tgctaacaac aaacaaggag tctattatga agattaagtt tttgcctctg tagttcccgc 6960
cataatcctt aaataaattt aggattatgg cgggcgggaa acaagccggt taacgctctc 7020
atagttcaaa ggatagaact gtctcgtcct aagagaccaa tctccgttcg agtcggagtg 7080
agagcacaga ttaaaaaaca ttgactagag tcctacttgc cagcctaaga tttgctttag 7140
taaagttttg gcgggaggga aagatgtagg ttcgattcct accagaggca caattcgtaa 7200
cttggtcaaa tcattttcaa aacaaatgat accacacaca gaggagagga tatggggcac 7260
agccttcgtc agtttgataa ctcaaggaaa caaatctaaa aataaaactt caccgatgtt 7320
atcatttgga agccatctgc ttttcgtgcg tatttgaaac atttttggca acactccaga 7380
aatcggtagg gccggccgtc cttccatata gtttgtaacc aacttttact ggtcggccac 7440
cgtttggttt tcctggaatt ggtacgatga attgtcgcga tattccctgt aagtatgatc 7500
gtaacgataa tacagacccg tgtaataatt cttgaaaagt tactatctga cgagtagtgt 7560
tggtataaag atttgagtcg agataatctt caacggcgat tatacacctt gccagaaaat 7620
ccttttcttc aaccttttcc gatttagctt gctgtattgt attgaataca attttttcaa 7680
tatttcctct ataaggttcc attagatcgt aaactaaaga aggataatct gtcggaatgt 7740
ggagaaatcc atgatagggg ctcattcggt ggtaaattat ataacgcagt gtaataccgc 7800
ttattaattt tgaaaccgcg tccaaaatag attttatcgt atttgcccct cctctgcgtg 7860
aatatccact gtatccgagt attttataat attttttcca atataccttt gcatgctgtg 7920
cctcaatgtt taccatttgt ttaatagaat atcttttacc gtcaaataac attggatatg 7980
aaaccagcca actcatactt ttgaatttgg cttgtaaaat cttcttggca atgtggacac 8040
gtttcttctc attatttcta aatgagattt gcttacttaa gatatcatct ttggccgagg 8100
tttttacgct tggagtaatc catacggcat tactcattgt tcttcggtgt agacatatag 8160
gaacgccata ttttgcgcaa agttgtaaaa aattttcact taaatcacaa gttccaccat 8220
aaagcataat cgaaagaatg tttttaatgt ttgcggtata tttgccacct ttatattgaa 8280
aagttacaat atttttcttt acttctattt ggaaggtgta gggtagccat aagggtatct 8340
ttttattctt gctaatagac atgttttttg atattattac cctagaaaga gttaggtttt 8400
gaatacaaaa tctaacttat attttgtatt ttgtcaagta aaataaagag aaaagagaga 8460
acctcaccga aatttggaga ggataaggca agacaacaca catcttgcac cgaaatttgg 8520
agaggataag gcataccgct ctggctttga acaccgaaat ttggagagga taaggcaata 8580
ttcaaaatat ctagcaccga aatttggaga ggataaggct caatcttttt atagcctaca 8640
ccgaaatttg gagaggataa ggcaactcaa cataaagggt gcaccgaaat ttggagagga 8700
taaggcggat cgagataagt cgaacaccga aatttggaga ggataaggcg ctaacaaaat 8760
taccacccac cgaaatttgg agaggataag gcaaaccagc agggacttca caccgaaatt 8820
tggagaggat aaggcacaat tgtcatgttt attcaccgaa atttggagag gataaggctc 8880
gtttatgtta gcgaccacac cgaaatttgg agaggataag gcaagaaaca ataaccgcag 8940
aacaccgaaa tttggagagg ataaggccaa ttataatata gcctgcaccg aaatttggag 9000
aggataaggc aagatactgt tccaataaca ccgaaatttg gagaggataa ggcaaattat 9060
cataatccat tcaccgaaat ttggagagga taaggcatgg cttgtttttg taatcaccga 9120
aatttggaga ggataaggca cagggagaaa ttgcgaacac cgaaatttgg agaggataag 9180
gcgtttggca ataagtctcg caccgaaatt tggagaggat aaggcatggg tcaatccaac 9240
ccgtcaccga aatttggaga ggatgatggg tttggttcaa aaattctaag aatctgcttt 9300
attttcttca cttcacctac acggtctttc gtctcgttcc ttctagtaac acgagacctc 9360
gcctttccga ccgttctctt tgtctcttta ttttatctga cagaatatgc aaaaagtaag 9420
aaaaacttta tcagaggtac ataaaaatcc ttatggtaca aaagtccgta atgcaaagac 9480
tggctactca ctacagatag agaggctttc gtatactgga aaagagggga tgagaagttt 9540
taagattcca ctcgaaaata aaaataaaga agtttttgat gaattcgtaa aaaagatcag 9600
gaatgattat atcagtcagg ttgggttgct caatctttct gattggtatg aacattatca 9660
ggagaaacaa gaacattatt ctttggcgga tttttggtta gatagtttga gggccggagt 9720
gatttttgcg cacaaagaaa ctgagataaa gaatcttatc tctaagatac gtggtgataa 9780
atcgattgtt gataaattta atgcaagtat aaagaaaaaa cacgccgatc tttatgccct 9840
tgtcgatata aaagctctct acgattttct tacctccgac gcaagaaggg gattaaagac 9900
cgaagaagaa ttttttaact caaaaaggaa taccttgttt ccgaaattta gaaaaaaaga 9960
taacaaagcc gtcgaccttt gggtcaaaaa atttattggg ctggataata aagacaaatt 10020
aaattttacc aaaaagttta tcggtttcga tccaaatcct cagattaaat atgaccatac 10080
tttcttcttt catcaagaca ttaattttga tctagagaga atcacgactc cgaaggaact 10140
tatttcgact tataagaaat tcttaggaaa aaataaggat ctatacggtt ctgatgaaac 10200
aacggaagat caacttaaaa tggtattagg ttttcataat aatcacggcg ctttttctaa 10260
gtatttcaac gcgagcttgg aagcttttag ggggagagac aactccttgg ttgaacaaat 10320
aattaataat tctccttact ggaatagcca tcggaaagaa ttggaaaaga gaatcatttt 10380
tttgcaagtt cagtctaaaa aaataaaaga gaccgaactg ggaaagcctc acgagtatct 10440
tgcgagtttt ggcgggaagt ttgaatcttg ggtttcaaac tatttacgtc aggaagaaga 10500
ggtcaaacgt caactttttg gttatgagga gaataaaaaa ggccagaaaa aatttatcgt 10560
gggcaacaaa caagagctag ataaaatcat cagagggaca gatgagtatg agattaaagc 10620
gatttctaag gaaaccattg gacttactca gaaatgttta aaattacttg aacaactaaa 10680
agatagtgtc gatgattata cacttagcct atatcggcaa ctcatagtcg aattgagaat 10740
cagactgaat gttgaattcc aagaaactta tccggaatta atcggtaaga gtgagaaaga 10800
taaagaaaaa gatgcgaaaa ataaacgggc agacaagcgt tacccgcaaa tttttaagga 10860
tataaaatta atccccaatt ttctcggtga aacgaaacaa atggtatata agaaatttat 10920
tcgttccgct gacatccttt atgaaggaat aaattttatc gaccagatcg ataaacagat 10980
tactcaaaat ttgttgcctt gttttaagaa cgacaaggaa cggattgaat ttaccgaaaa 11040
acaatttgaa actttacggc gaaaatacta tctgatgaat agttcccgtt ttcaccatgt 11100
tattgaagga ataatcaata ataggaaact tattgaaatg aaaaagagag aaaatagcga 11160
gttgaaaact ttctccgata gtaagtttgt tttatctaag ctttttctta aaaaaggcaa 11220
aaaatatgaa aatgaggtct attatacttt ttatataaat ccgaaagctc gtgaccagcg 11280
acggataaaa attgttcttg atataaatgg gaacaattca gtcggaattt tacaagatct 11340
tgtccaaaag ttgaaaccaa aatgggacga catcataaag aaaaatgata tgggagaatt 11400
aatcgatgca atcgagattg agaaagtccg gctcggcatc ttgatagcgt tatactgtga 11460
gcataaattc aaaattaaaa aagaactctt gtcattagat ttgtttgcca gtgcctatca 11520
atatctagaa ttggaagatg accctgaaga actttctggg acaaacctag gtcggttttt 11580
acaatccttg gtctgctccg aaattaaagg tgcgattaat aaaataagca ggacagaata 11640
tatagagcgg tatactgtcc agccgatgaa tacggagaaa aactatcctt tactcatcaa 11700
taaggaggga aaagccactt ggcatattgc tgctaaggat gacttgtcca agaagaaggg 11760
tgggggcact gtcgctatga atcaaaaaat cggcaagaat ttttttggga aacaagatta 11820
taaaactgtg tttatgcttc aggataagcg gtttgatcta ctaacctcaa agtatcactt 11880
gcagttttta tctaaaactc ttgatactgg tggagggtct tggtggaaaa acaaaaatat 11940
tgatttaaat ttaagctctt attctttcat tttcgaacaa aaagtaaaag tcgaatggga 12000
tttaaccaat cttgaccatc ctataaagat taagcctagc gagaacagtg atgatagaag 12060
gcttttcgta tccattcctt ttgttattaa accgaaacag acaaaaagaa aggatttgca 12120
aactcgagtc aattatatgg ggattgatat cggagaatat ggtttggctt ggacaattat 12180
taatattgat ttaaagaata aaaaaataaa taagatttca aaacaaggtt tcatctatga 12240
gccgttgaca cataaagtgc gcgattatgt tgctaccatt aaagataatc aggttagagg 12300
aacttttggc atgcctgata cgaaactagc cagattgcga gaaaatgcca ttaccagctt 12360
gcgcaatcaa gtgcatgata ttgctatgcg ctatgacgcc aaaccggtat atgaatttga 12420
aatttccaat tttgaaacgg ggtctaataa agtgaaagta atttatgatt cggttaagcg 12480
agctgatatc ggccgaggcc agaataatac cgaagcagac aatactgagg ttaatcttgt 12540
ctgggggaag acaagcaaac aatttggcag tcaaatcggc gcttatgcga caagttacat 12600
ctgttcattt tgtggttatt ctccatatta tgaatttgaa aattctaagt cgggagatga 12660
agaaggggct agagataatc tatatcagat gaagaaattg agtcgcccct ctcttgaaga 12720
tttcctccaa ggaaatccgg tttataagac atttagggat tttgataagt ataaaaacga 12780
tcaacggttg caaaagacgg gtgataaaga tggtgaatgg aaaacacaca gagggaatac 12840
tgcaatatac gcctgtcaaa agtgtagaca tatctctgat gcggatatcc aagcatcata 12900
ttggattgct ttgaagcaag ttgtaagaga tttttataaa gacaaagaga tggatggtga 12960
tttgattcaa ggagataata aagacaagag aaaagtaaac gagcttaata gacttattgg 13020
agtacataaa gatgtgccta taataaataa aaatttaata acatcactcg acataaactt 13080
actatagagt tctcttcatt ggattgaaaa tagatccgat tcctaccaga gacaccaaat 13140
aaatttaaaa ttaaaaatta cctgccaaaa tttcgttcaa cgaaacttaa gcaggcaaga 13200
aaatttaaaa ttaaatccgc tggtgggcgg ataaagtcaa aaattgaaaa tatattaaat 13260
tgacaatatg ttctttatta gagtgcgatg tttgaatacc tcggggcttc gaatcagtag 13320
attcgtggct tggccataaa tccacaggta ttcaaacacg cgatgtgttt tgtatggccg 13380
ggtgggccat acctattcta acaaaacaac catggtgttt ggcgtgccta atacctcatc 13440
ggctctgccg tgaggatagg acacgcaact tgttttatta tgatataatg aaaggtagaa 13500
attgtcattt tgtaatggaa cagtaaaaaa gaggtgccgg tgatgaacaa aagagtgact 13560
aaaggagaca tcaggattta cctgatgatg tggaagggtg ctattatgac cgtctgtgtc 13620
gcgagtctgg ttggcatcat ccttggtcca gtctatcttt tgatcatttt tccgttgaag 13680
aaaatgatca gaaggtattc gatcgatttt tcggatttgc tcaaaggtct ttgatgactt 13740
ttaggcaaga agattgtttg ttagctctct accgcaagga ggagggcttt ttcttttttt 13800
taaattaatt tacctttca 13819
<210> 23
<211> 34045
<212> DNA
<213> Artificial Sequence
<220>
<223> synthetic sequence
<220>
<221> misc_feature
<222> (29562)..(29573)
<223> n is a, c, g or t
<400> 23
atgttccctc ttcttttcgt tgcctctgaa taagatttgc ttactcaaga tatcttcctt 60
agaagacgtc tttatgcttg gggtaatcca gatcgcggta ctcatcgttc tgcgatggat 120
gcaaacagga acactatatt tagtgcatag ttgcaagaaa tcctccttta aatcacaggt 180
gccgccataa agcattatcg ataagatgtt tttgacgtca gcagaataga cacctccttt 240
gtaatggaaa gttatcttat cttttttcac ctctattgcg gaagtataag ggaaccatag 300
ggggattctt ctgttgttat ttttcatgtt ttgatatata attacactag atatgggcac 360
atttcaggag taaaatctaa cccatttttt gtattttgtc aaataaaata aaggtaaagg 420
agagaacctc tccgaattat cgggaggata aggcagcgtc tgataattct tcctccgaat 480
tatcgggagg ataaggcaag actggtaaac tctagctccg aattatcggg aggataaggc 540
acagtaacaa catacgggct ccgaattatc gggaggataa ggcaaactaa ccgttgctct 600
actccgaatt atcgggagga taaggcaaag cgtttaaagc cgacactccg aattatcggg 660
aggataaggc aaacgcccta taacgcaatc tccgaattat cgggaggata aggcgtagtt 720
agtggataat ttactccgaa ttatcgggag gataaggcga cgctgacgat aaactgctcc 780
gaattatcgg gaggataagg cacaaacatt tcctcgacat ctccgaatta tcgggaggat 840
aaggcataat tactcgctcg acactccgaa ttatcgggag gataaggcaa aatcatatcg 900
ttcttgctcc gaattatcgg gaggataagg caccccgaca aaattaagcc tccgaattat 960
cgggaggata agtatggata tttccacaat cttgaaagaa agatttgtta gcctttaatc 1020
cattctcctt tccctttatt ttatctgaca acatatgaaa gctaaaaaaa gtttttataa 1080
tcaaaagcgg aagttcggta aaagaggtta tcgtcttcac gatgaacgta tcgcgtattc 1140
aggagggatt ggatcgatgc gatctattaa atatgaattg aaggattcgt atggaattgc 1200
tgggcttcgt aatcgaatcg ctgacgcaac tatttctgat aataagtggc tgtacgggaa 1260
tataaatcta aatgattatt tagagtggcg atcttcaaag actgacaaac agattgaaga 1320
cggagaccga gaatcatcac tcctgggttt ttggctggaa gcgttacgac tgggattcgt 1380
gttttcaaaa caatctcatg ctccgaatga ttttaacgag accgctctac aagatttgtt 1440
tgaaactctt gatgatgatt tgaaacatgt tcttgatagg aaaaaatggt gtgactttat 1500
caagatagga acacctaaga caaatgacca aggtcgttta aaaaaacaaa tcaagaattt 1560
gttaaaagga aacaagagag aggaaattga aaaaactctc aatgaatcag acgatgaatt 1620
gaaagagaaa ataaacagaa ttgccgatgt ttttgcaaaa aataagtctg ataaatacac 1680
aattttcaaa ttagataaac ccaatacgga aaaatacccc agaatcaacg atgttcaggt 1740
ggcgtttttt tgtcatcccg attttgagga aattacagaa cgagatagaa caaagactct 1800
agatctgatc attaatcggt ttaataagag atatgaaatt accgaaaata aaaaagatga 1860
caaaacttca aacaggatgg ccttgtattc cttgaaccag ggctatattc ctcgcgtcct 1920
gaatgattta ttcttgtttg tcaaagacaa tgaggatgat tttagtcagt ttttatctga 1980
tttggagaat ttcttctctt tttccaacga acaaattaaa ataataaagg aaaggttaaa 2040
aaaacttaaa aaatatgctg aaccaattcc cggaaagccg caacttgctg ataaatggga 2100
cgattatgct tctgattttg gcggtaaatt ggaaagctgg tactccaatc gaatagagaa 2160
attaaagaag attccggaaa gcgtttccga tctgcggaat aatttggaaa agatacgcaa 2220
tgttttaaaa aaacaaaata atgcatctaa aatcctggag ttatctcaaa agatcattga 2280
atacatcaga gattatggag tttcttttga aaagccggag ataattaagt tcagctggat 2340
aaataagacg aaggatggtc agaaaaaagt tttctatgtt gcgaaaatgg cggatagaga 2400
attcatagaa aagcttgatt tatggatggc tgatttacgc agtcaattaa atgaatacaa 2460
tcaagataat aaagtttctt tcaaaaagaa aggtaaaaaa atagaagagc tcggtgtctt 2520
ggattttgct cttaataaag cgaaaaaaaa taaaagtaca aaaaatgaaa atggctggca 2580
acaaaaattg tcagaatcta ttcaatctgc cccgttattt tttggcgaag ggaatcgtgt 2640
acgaaatgaa gaagtttata atttgaagga ccttctgttt tcagaaatca agaatgttga 2700
aaatatttta atgagctcgg aagcggaaga cttaaaaaat ataaaaattg aatataaaga 2760
agatggcgcg aaaaaaggga actatgtctt gaatgtcttg gctagatttt acgcgagatt 2820
caatgaggat ggctatggtg gttggaacaa agtaaaaacc gttttggaaa atattgcccg 2880
agaggcgggg actgattttt caaaatatgg aaataataac aatagaaatg ccggcagatt 2940
ttatctaaac ggccgcgaac gacaagtttt tactctaatc aagtttgaaa aaagtatcac 3000
ggtggaaaaa atacttgaat tggtaaaatt acctagccta cttgatgaag cgtatagaga 3060
tttagtcaac gaaaataaaa atcataaatt acgcgacgta attcaattga gcaagacaat 3120
tatggctctg gttttatctc attctgataa agaaaaacaa attggaggaa attatatcca 3180
tagtaaattg agcggataca atgcgcttat ttcaaagcga gattttatct cgcggtatag 3240
cgtgcaaacg accaacggaa ctcaatgtaa attagccata ggaaaaggca aaagcaaaaa 3300
aggtaatgaa attgacaggt atttctacgc ttttcaattt tttaagaatg acgacagcaa 3360
aattaattta aaggtaatca aaaataattc gcataaaaac atcgatttca acgacaatga 3420
aaataaaatt aacgcattgc aagtgtattc atcaaactat cagattcaat tcttagactg 3480
gttttttgaa aaacatcaag ggaagaaaac atcgctcgag gtcggcggat cttttaccat 3540
cgccgaaaag agtttgacaa tagactggtc ggggagtaat ccgagagtcg gttttaaaag 3600
aagcgacacg gaagaaaaga gggtttttgt ctcgcaacca tttacattaa taccagacga 3660
tgaagacaaa gagcgtcgta aagaaagaat gataaagacg aaaaaccgtt ttatcggtat 3720
cgatatcggt gaatatggtc tggcttggag tctaatcgaa gtggacaatg gagataaaaa 3780
taatagagga attagacaac ttgagagcgg ttttattaca gacaatcagc agcaagtctt 3840
aaagaaaaac gtaaaatcct ggaggcaaaa ccaaattcgt caaacgttta cttcaccaga 3900
cacaaaaatt gctcgtcttc gtgaaagttt gatcggaagt tacaaaaatc aactggaaag 3960
tctgatggtt gctaaaaaag caaatcttag ttttgaatac gaagtttccg ggtttgaagt 4020
tgggggaaag agggttgcaa aaatatacga tagtataaag cgtgggtcgg tgcgtaaaaa 4080
ggataataac tcacaaaatg atcaaagttg gggtaaaaag ggaattaatg agtggtcatt 4140
cgagacgacg gctgccggaa catcgcaatt ttgtactcat tgcaagcggt ggagcagttt 4200
agcgatagta gatattgaag aatatgaatt aaaagattac aacgataatt tatttaaggt 4260
aaaaattaat gatggtgaag ttcgtctcct tggtaagaaa ggttggagat ccggcgaaaa 4320
gatcaaaggg aaagaattat ttggtcccgt caaagacgca atgcgcccaa atgttgacgg 4380
actagggatg aaaattgtaa aaagaaaata tctaaaactt gatctccgcg attgggtttc 4440
aagatatggg aatatggcta ttttcatctg tccttatgtc gattgccacc atatctctca 4500
tgcggataaa caagctgctt ttaatattgc cgtgcgaggg tatttgaaaa gcgttaatcc 4560
tgacagagca ataaaacacg gagataaagg tttgtctagg gactttttgt gccaagaaga 4620
gggtaagctt aattttgaac aaatagggtt attatgaatc taaaaatagt cgtgatcaac 4680
aaactcaatc atttgaaaaa tttttatcgt cgccatccaa agaaaatcct ttggttgggg 4740
gtgccattgc tattgcttat cgggttgggg gcttgggctt atactcggag gactcaaccc 4800
gagttcgaaa cagaggtggt gaagttgggc gaggtggccg atgtggtgag cgatactggt 4860
ttggtgacgg ccgagaatga tctcactctc tcgttcgaga cgggcggggt cgttcgcacg 4920
gttaaggtta ccgaaggtga cgcggtttat cgaggacaga cgttagtctc gctggatgcc 4980
agtttgaagg cggcggaagt ggcgagcgcg cgcgccacgt tggccgctca agaagccaaa 5040
ttggctgaac tggtggcggg cccgaccaag ctagatttag cttcggccaa gacgaaactc 5100
gagaacgccc gcaagacctt gctgaccgcc gacctgcaag cgtacttcgc cggtccttca 5160
gccgattatg cggcttcttc attcacttat acggcgccga cggttttggg gacttacaat 5220
tccgatcaag agggcgaata cgtgcttgag ttatatcaat caggcgcgcc gtcgggctac 5280
tcggtggagt actccggttt ggagacgggg attatggagg gcgccgaagg acgagccgag 5340
cccttgggcc ggcgcggtct ctatctccaa ttcccggaga acttcattcg ggcgccagag 5400
gtaatttggc gcgtgcctat ccccaacacc aagtccgctt cttatgctac taaccggcgc 5460
gcctacgaac aggctcaagc cgattacgac ctgaaagtgg ctggcactcg cgccgaacaa 5520
attgtcgccg ccgaagccca agcgcgccaa gcccgcgcca ccctccaatc ggcgcaggcc 5580
tcgctgtcca agctctccct tacggcgccg gtggccggtt tggtgaagtc cgttccggtt 5640
accgtagggg agacggttac cgttggttca ccagctgtgg cgttggtctc ggatcataat 5700
tattacgtga ccctctatgt gccggaggct gagatggcca acttgacggt cggcgacttg 5760
gccgagatcc ggctcaaggc cttccccgat cgcgtcttcc gcgccaccgt ggggagtgtg 5820
gccccggcgg ccgaagatcg tgatggcgtg gcttcgttta aagttaaatt atatttccaa 5880
gaatccgatc cccaaattag agtggggatg tcggctgacg tcgaccttga ggcgcttaag 5940
aagaccgacg tcatggtggt gcccgggcgg gcggtggtgc gctctaatgg gcgaatcttt 6000
gtccgggttt ggagcaataa gaccgtcgag gaacgctcgg tggagattgg tctgcgtggc 6060
tctgatggct cggtggagat tgtctcggga ctctcggtgg gcgaagaggt gattactttt 6120
atccgtgacg aggagttgga tcgcttggcg gactaattcc ctttcggcgt ttatggcttt 6180
acttgaactc gaccaagtta ctaaatctta ttatagcgac gatctcacca ctcagatctt 6240
gcgcgggatt tcgtttacca ttaatgaagg cgaattcgtc tcgattatgg gcccgtccgg 6300
ttcgggcaaa tcaaccctct tgcacgttct cggattcttg gctgatcgca ccgccggtac 6360
ttaccgcttc aacggcaagc aatttgccga acataccgat gaggagatcg cgcgggtacg 6420
caatgaagaa atggggttcg tcttccagac tttcaactta cttggtcgta ataccgtctt 6480
cgaaaatgtg cgcttgccgc tcatctactc gcgcgtgccc gaaggagagt ggccggcctt 6540
ggttgatcag gctatcgccc aagttaagct tgatcatcgg cgcgactatg cctgctccaa 6600
gctctccggc ggcgagcaac aacgcgtcgc catcgctcgc gccttggtca accgacccaa 6660
cgtcctcttc gccgacgaac cgaccggcaa cttagactcc gcttcggggg gagcggtgat 6720
ggatacttta caacacttgc atgaagattc tggtcagacg gtgatcttaa tcactcacga 6780
gacctatacc gccgagcatg ctcagcggat catcaagatt ttggatggcc gggtcgaagc 6840
cgatttcaga cttgagacca gacgacgcgc cagcgagggt tatcataagt agttcgattt 6900
aatttatcct gagggtaatc gaaggactca ccacaagtaa aatgcaacgt tacaaattta 6960
gcttcctttc ggccttggag gcgatcaaaa ccaatcgtac gcgctctatc ctcaccactt 7020
tggggatcgt tattggggtg gcggcgatca ttgtgattat gtcgttgggc gccggcgccc 7080
agagtttaat tttaaatgag atcaatcaga tgggggccga gacggtcatc gtgttgccgg 7140
gtgagatcac tgatgccgcg gcggttttct cggactcact gacgcaacgt gacctggccg 7200
cggtgaaggt taagtccaat gtgcccaatt tggcgcgcgc cgcgccggcg gtcatcgtcc 7260
caggcaagac cacttataga ggtacgactt atacccccgc catgattatc ggcactgaag 7320
cggaattctt cggtgaggtt tttaatattt accctaaggt gggcacaatc tatgatcaag 7380
atgatatcga gacagcggcg cgggtggcga ttattggcga caaggttaag accgagcttt 7440
ttggcgcttc tgacgcggtg ggcgagcgga tcgatatcaa gggcaagcaa ttccgcgtgg 7500
tgggggtgta tccaacgacg gggcaaaaag gacctttcga tatcgacggc ttggtgatga 7560
ttccgcacac caccgcccag acttatctct taggcactaa ctattatcat cgccttatga 7620
ctcaagccga cagttcggac aatgtcgaga aattggcaca cgacatcacc gcgaccctgc 7680
gggagactca tggtctttat cctggtgatg acgacgactt ctcggtggta actcaacaag 7740
cgctggtgga tcaaatttcg atcattatca acattctcac ggccttcttg gcggccgtgg 7800
tggcgatctc cttggtggtg ggcggtatcg gcgtgatgaa tattatgctc gtgtcggtga 7860
ccgaacgcac taaagagatt ggtttgcgca aggcgctcgg ggcgacccgc tcggccatta 7920
tgacgcaatt tctctttgag gcgattgcgc tgaccttgtt tgggggcgtg ctggggatca 7980
tgatcggcgc ctcgctctcg ctcgtgctct cggggattct cacttacgcc gtggggctca 8040
attggtcctt ccacttcccc attagcgccg cgatgctcgg ggtcacggtc tcggcggcgg 8100
tcggactggt gtttggcctc tatccggcgc gtcgcgccgc cgccaaagac ccaatcgaag 8160
cgttgcggta tgaatagaac cggggaggtt tgacgtgact attgattagt gttagactat 8220
tgaaggaagt taatttgatt ttttgttcga aacaaagaaa aaaagaagga ggttaccatg 8280
tcggataaaa tcgtgagatt gcctcacctt aaagtttggc aacgagatcg gtgttggtgg 8340
ggacaattac tcttcactga tcgctcgatg agcgaagagt tcaacggcaa gttcttggcc 8400
ttggtcgctc tgcttgaagc ccaagagcga aaaagtgttg ttaatgaaga catcctcgat 8460
ctacttgatc agattgggaa atccccattg tcggagacag attgtcttcg gctacgacgt 8520
gacggtcatg ataaggtaga tgtggttctg gttaaaatta tgagaaattg ggtccgcgac 8580
tcggctcaaa atgagcgacg tgaatttgag ctcgtaagtt ttaaaaccac cattatgtcc 8640
aaacaggcgg cgaaagccac cttcaactga aatttttctc gcctgcgaat ctccaagcag 8700
accggtccga gcacgtgttg ctcgggccct ttatttttaa taaatatttg cccgaggatt 8760
gttttctcaa attctctttt ttctttaagt cggggttttt ggactgaaac ggaagagttg 8820
taatctagaa actcactttt tttggatggt ttttcaacaa atagctgtta caatagaaga 8880
gtggaaaaat aaaatgagtt gttttaaacc aggtacgggt aatcgaaagc tcagcacaat 8940
tccgggtttt accttgattg aaatcttggt ggtggttgcc attatcggta ttttgtcggg 9000
aataatttcg aataatttaa ggggtgctaa aattaaagcc cgagaagcct cggcccttca 9060
aaatgcgcgg caattagatt tggcggtatc gctttttgaa atagataaag gttattatcc 9120
gggaaccctg ggggttgaga caaatcaaga tgaccaaacg actggttgga aagaaggacc 9180
aggaaccctg cacgacgatc tggttcccaa atatatttct aaattaccca cgagtgatga 9240
gataaagttt atttatcttg ccgatgaacc atgtcccaac gaccagacga aaccttgtcg 9300
agctaagata gttatcgata ctgaccaaat tgtcgatggt gacggaggga cacccccacc 9360
acccccacca cccccaccac cagctaaggt gattgttccg gacttggtta ataaaaccga 9420
agccgaagcc ctcggggcca tctcggcggc taatttagca gtaggcttca atgatgatgg 9480
gtgtagtgat atggtttctt ctggttatgt tttttctcaa tcgttgacgg ccggtgctag 9540
tgttgatgaa ggtacggcga ttaatattgt tgtttctgcc ggagggtgta tttctccgcc 9600
accggtcggg tcgatcccta tctcaagttg tggcacaata ataactcaac ctggagatta 9660
ccatctggcc ctggtggagg agaccgagtt gaatcaaact aattccggga tctgtattta 9720
tgttaacaat gttgataatg ttaatttaga ctgtcagaat ataaagataa agggtaccga 9780
taccacagag tcatcgaaac aatatggcgt aattgtcggt aattcgtctg gggtggccgt 9840
taaaaattgt ctgattgaaa acgtcggcac cggaattagg gtatattcgt ctgataacat 9900
ctcgattgaa aacaatcgac tgtcaaactt aggcagggaa gggatgtatc ttaaagataa 9960
ttcagatgtg attattcgaa ataatcagct gaccaacgcc ggtgcaagag cgattgctat 10020
ttatcgagaa tgggcgagtc ttatttccgg ttacgctgtt gataataaca ccatcaaggg 10080
ggggtcctat ggtattacgt tcgggcatct gtttaccgac agtcgtcctc ccggtgagat 10140
taaagagatc gttataaacg gcaataattt atatgatatt gtcactacgg ctctatcctt 10200
aaatttagtc gagaacctct caatcattaa taattacatt tatgacccga aaatattcct 10260
ccaaatagac gattctaaaa atttactcat agacaacaac ttcggccaaa atatcacctg 10320
ggacatgttt atcggctatt cagataatgt aaccttttct aacaataagc ttaagagcgc 10380
ttcggcgact aaatcggtgg ttttagtttg gatgtttagg gttaataact tagatttctc 10440
tcgcaacgaa attgaaggct acaatcgtaa tttgttaaaa cttgacgata gttatgattt 10500
ctcgatcaaa aataatattt tcaatagccg ggttggtgtt tatgaagggg tgattttggg 10560
taaaggtttt ctcggtgtat ctggtgaagt ttctgaaaat gatttttacg gcggtggcga 10620
gggcgtctct ttagctttag atatttatca taattcggcc aaccgtctgg cgatctttaa 10680
taataatttt attgattatt tgggggcgtc gttaagatat gattctagtt ttttggattt 10740
aggagctaat tattatggta caaccgactg tgccttattg cgggcgacaa cttggcccga 10800
ctgggtgata ataccacctt cttctggttt acccagtcct ttgctttact tggattcgtt 10860
ttggcctaaa gggaacgttc aaacttgcaa ttaatttagg ctaaactgcg agtgaggtgt 10920
ttttcttgat atttagatta aaaagtgata taagtataaa agagaaagga ggttctgatg 10980
tctcaaatgg gtattgccca cgcgctcttt tacaagcgag gggattgtct ccaagctcgg 11040
atcgttttcg gcgacggtcg cttgagcgaa gagttcagct cccgtctcga agggatggag 11100
attctgacaa aatctcgtca ggataagctc atttctcatc aagagatgac ctctctggcg 11160
ttggaatttg cggaatcgac tttgccggcg agaactccgt cggcggaaat tgttgacggc 11220
cttctgatgg cgatgaagct tgacctttga aagctttatc aaaaccgctc tccggctgat 11280
ctcggggcgg tttttttgtt taaatttaaa gggatggagt tatttcgagc gggggatgcg 11340
atgcttctga tgagtgaagt tggcgttgaa gtttgacttg aagttttgat tgttcggccc 11400
gcccgatttc tgaaacttga agactgacgg ggtgcggaaa ccggcgttgc ccggttgttg 11460
ctgtttgttt tgtttgctcc gattggtgtt tttcatatcc tttaattata aatcgaagtt 11520
ggattatggc aagcagtaag ataaacgtcc taattgtgac gtgattgaca gaaaagataa 11580
aacaatgtag gatagatttc ggatcctgaa ccttcaactc tcctcaacag aatcaacaga 11640
aaggaagaca gaatgaagaa gatgcttgtc ttgttgtccg cgtttgtctt gaccatcgcc 11700
gagctggctt cggccggatc gttctctgac ccgttcgatg cccttgattc ggcttgggtg 11760
accgatcggt tcgagccggc cggattctcc agcgtcgtct tcgacggcga caatcggttg 11820
gagattgcga ttagcgcgac cgactcggag gctaatcgtc cggccgggtt cactagtggg 11880
ttttataaca cgcaaggccg tcaacgagat gccttgatgg cggaaccttg ggtcatctcc 11940
ggcgatcttt acttgtcgct ggatatgctc ttgggcgaca atttgcgccg gactgatctc 12000
tgggcgcgaa cttcggacgg tccggaggct aatgcgcaat acccgattat cgggatgcgt 12060
cggtttgacc cgcttgatcc cttcaacccg ctggcgggtg atattgcctc aacttggcga 12120
gtctgggatt cggacacggt cgacggttgg gtcaatttgg ccacgccgat ggtggctggt 12180
tggaacacgc tttcgattga gagtgacggt ctatcatatc tctatcggat caacggggtt 12240
gaggtctatg aggacctcac catcagcgct ttcgcgaccg atctgaccac ggtctttctc 12300
caaggttata acttcggcgg tgactacgaa gtctattggg acaatgtctc tgccgccacc 12360
ttggctccgg tgcccgagcc ggccacgatc ttgcttttaa tgctgggggc cggcgtggtg 12420
gcgattcgtc gtcatttcgc gaaacaacaa taactaactt gagaggttag ggtccgccaa 12480
cccgttcgct gtcgcgagcg ggttttttta ttggcgagaa gttaaggggt gatgtttagt 12540
tgaccaaggt aatagcgaag ggtgtagagc caatcctcgt cttcttcgcc ggcttccagt 12600
ttttgtttca gaagccattc gagataaccg cgatcggtct tggccacttc ggcgagcgtt 12660
cggtctttat gcttgccaaa accgaatttt ttgaagagtg acggacgaga cgagatctca 12720
atcattttgg cgagcgtttc ttcgtcggag agttcgcgcg aacccaagag cgagccgtcg 12780
ccggctttca atttttgcca taaccgatta aacagcgctt cggtcaccaa aacatcgccc 12840
acggcgtcat gagcggtgcc atcaagatcc aagtcgagat aataacgcaa gaattgcaga 12900
ttgtattccg gaatcacccc ttcggtatcc agttcgcgag ccaagcgcag ggtgcagata 12960
tattgcggca ctttgactcc ttcggcggcc aagatagcga tgtcgaattt ggcattgtgc 13020
gccaccaaca cgtgatcagc gagaagggtt tccagctcgc gacggaaggc gctctcggcg 13080
aagggttctt tgtcggccac cagcttattg gtgatgtgag tgatactcat cgacttaacc 13140
gagatgggga ctggcggctt gaagtaggcg gtgcgagtgg tggttttggt tttgtagcag 13200
acctgacaaa ggcgatcttt ggtcacgtcg ttgccggtgg tttcggtatc taagaataag 13260
atttccatgg tcggttaagc ggccggttgg tcggtcgaat caaccttaac gttttggata 13320
attacgggcg tgacggggcg atcgttttgg tcagtggcga cttggccgat ctggtttaca 13380
atttcttgtc caacagttac ccgaccgaag atggtgtagt tattgggtag cggataatct 13440
tcgagcatga taaagaattg actgccgttg gtattgggac cggcgttggc catcgccaac 13500
acgccttgcc ggtagccggc ctggtatgac ggagtggccg gatcgagctc gtcggcgaat 13560
tggtaaccgg ggccgccggt accgcagggg ccggtggcgg ggactttggc ggattcaggt 13620
gaacagttcg ggtcgccgcc ttggatcata aaacccttga tcactcgatg gaaggtgaga 13680
ccgttgtaat aaccggctcg ggccagcttg ataaagttgg caaccgtgtt gggggcgtct 13740
ttttcgtaga gaacgagggt aatctcgcca agattggttt gcaaggtgat ttggttaggc 13800
atagttgagg tggtcagtcc cgagcttgct cgcggtgagt tcgtcgaatc cgtcgaggtg 13860
gcttgagatt gataaatgtt acttgttaaa tcggcaggat tgggcgctct ctgatttaac 13920
ttttgccaac caaaaagtcc agccaggccg agtaaaataa taagaactaa aatcacctgt 13980
ttgttcatgg gaattgagaa acgggttaaa gatgggctga taattgtgaa ttataacaat 14040
aaccgttaga gtaaggcaat gaagagtgaa gaaccggaag attatcggct aggttggcgg 14100
cccttcttgg gttgccaagt ggatctctct cagcgaccgt tgattccgcg cgaggagacg 14160
gaattctggg ttgatcaagc aatcaaggaa cttaaaccag aatcaaccgc cggcaaacaa 14220
gtcttggact tgtttgccgg ttccggttgc atcggcttgg cggtgcttga gcactgtccg 14280
ggcgtggcgg tgactttcgg cgaaagggag gaaaaatttt gtgggcagat tcggaagaac 14340
ctcaagttaa acccgccagc cagatttgat ttcccgccag accttcgggc ggcctctcaa 14400
ggtctggcgg gtggaaggac catggcctct caaggtctgg cgggtgaaag gaccatggcc 14460
tctcaaggtc tggcggggcg aattagagtc gagtcgtcgg gaaaggttgt ccaaaccgac 14520
attttttcca aaatcaaagg gcagtttgat tttattttcg ccaacccgcc ttatgtcgcg 14580
accagaagaa gtcgggttca agcctcggtg cgcgactggg agccggccgg agcgctcttt 14640
gccggccccg acggtttggc ggtgattcga ccgtttttgg ttgaagcgaa aaaacgtttg 14700
cacccgggtg gccggattta tttggaattc ggttacggcc aaaaaggcgc tctggaagag 14760
ttattgcggc aaaacggata taaaggttgg tcgtttcggc gcgaccagtt tggccgctgg 14820
cgttgggtcg tgatacaata gcggtatcaa aagttaattt tttaattcta aaattttatg 14880
acagacaaaa acaaagcttt cattctctgg ttcaatgatt tgacaattgg cgacgtcggt 14940
ttggttggcg gcaagaacgc cgctttgggc gaaatggtca acaacctggt tccgcttgga 15000
gttaatgtgc cgaatggttt cgcgattacg gcgcacgctt acgcctactt cttagacaag 15060
acaggcttaa aacagaggat taaggaaatt ttgaccgatc tcaatactca caatatcaac 15120
gatttgcaaa aacgcggcgc ccaagtccgc gccgcgatta ttaaagaaga attgccggaa 15180
gaactgcaag tggagattat caacgcttat cgcaagctta gcgccaacta tcacagccag 15240
gccgtggatg tggcggtgcg gtcttccgcc acggccgagg atttgcccgg ggcctcgttt 15300
gccggtcaac aagaaactta tcttaatgtc gccagcgaaa aggagttgat gttgtcggtg 15360
cgcaagtgct tcgcctcgct ctttaccaat cgcgccatct cttatcgggt tgataagggt 15420
ttctcaatgt ttgatgtttt gctttcggtc ggggtacaga agatggtgcg cagcgatttg 15480
gccgcggccg gcgtgatgtt ttcggtcgac accgaaaccg gtttcgataa ggtggtggtg 15540
atcaacggtg cctacggttt gggcgagatg gtggtcttgg gcaaagtcac tcccgatgaa 15600
ttcgtggtct tcaagccgtc gctggagcgc ggttatcagg cgattctctc caagacgctt 15660
ggtcgcaagg acgtgaagtt ggtttacggc gccaagggca ccaaacaggt gtcggtgccg 15720
gccaaagagg tgaaccgttt ttgtctcaaa gacgaggagg tttccaaact ggccgcttgg 15780
ggcctgacca ttgagaaata tttttccggc aaacacaatc gctatcaacc gatggatatg 15840
gagtgggcca aggacggcaa gaccggcgaa ctctttattg ttcaagctcg ccccgagacg 15900
gtccacgccg aagccgacaa gaatgtttac gaagagcata ttttgaaaga gaaaggcaag 15960
gagttggttc gtggcaacgc catcggcgcc aagatcactg ccggcaaagt gcgcctgatc 16020
aagagcgcca accagatgaa caccttcaag ccgggcgaga tcttggttac cgagatcacc 16080
gatccggatt gggaaccgat tatgaagatc gcggcggcga ttatcaccga gaagggcggg 16140
cggaccagtc atgcggccat tgtctcgcgt gagcttggag tgccctccat cgtgggcacg 16200
ggcaacgcca ccaaggtgct aaaaaacggc cagctggtga ccgtggattg ttcctccggc 16260
aaagaaggag tggtttacga aggcaagctt gcctttgaga aaaaagaaca tcgtctaacc 16320
gctaccgcca agacgcgcac caaggtaatg gtcaatatcg gttcacccga cgatgccttc 16380
cgcaatttct atttgcccgt ttccggggtc ggtttaggtc ggttggaatt tatcattaat 16440
tcttacatca aggttcaccc caacgcgctc ttggattaca aagagcttaa ggccagtcgc 16500
gatccgcgcg ccaagaaggc ggttaaggcg attgatgagt tgacggttga atacaaaaac 16560
aagaccgatt attacgtcgg cgaattggcc gaaggggttg ccaaaatcgc ggccaccttc 16620
tacccgcacg acgtgattat ccgtttctcc gatttcaaga ccaacgagta ccgcactctg 16680
atcggcggcg atctctacga gccggaagag gagaacccga tgatcggttg gcgcggcgct 16740
tcgcgttatt atgatcccaa tttccgtcgc gctttcgcct tggaatgtcg cgctctctac 16800
caagtgcgta gcgagatggg cctttccaac gtgatcccga tgattccctt ctgtcgcacg 16860
gcggaagaag gccggcaagt ggtggagatt atgaccgaag ccggtctgga ccgtcaggct 16920
gacccttcgc tcaagattta tgtgatgtgc gagattcctt ccaacgtggt ggaggccgat 16980
gcctttttgg aagtcttcga cgggatgtcg atcggttcca acgacctgac ccagctgatg 17040
cttggtttgg atcgcgattc caacttgatc agccatatcg ccaacgagaa tcatccggcc 17100
gtcaagaaga tgattgaggt ggcgattaaa gcttgtcggg ccaagggcaa gtatatcggc 17160
atttgcggtc aggcgccgtc cgattatccg gagtttgccg attttttggt gcagaacggg 17220
atcgggagca tctcgctcaa tcccgattcg gtgattaaga ccttacccgt gattgaggcg 17280
gccgaagaga agtatcccca aagataataa aaatatgaaa atcgcttttt ttgaattgga 17340
gacttgggaa aaaaaatact tgcaagagcg aactctgccc ggcgaggtcg tttttatcga 17400
cggaccgttg gatgagacca agttgccgga gcaaaacgat ttcgacgcca tttcggtttt 17460
tgttaattcc attgtcggcg acaaagtgtt gggacatttt cccaatctcc agttgattgc 17520
cacccgctcg accggttatg atcattttga cctgccaact tgcgccgctc ggggggtcaa 17580
ggtggccaac gtgccgagtt acggcgaaga taccgtggcc gagtacgcct tcgccttaat 17640
gctcactctc tcgcgcaaga tttgcgagag ttatgagcgt attcgcgaga ccggcagttt 17700
cgatctcacc ggcctgcgcg gctttgatct gaagggcaag accttggggg tgatcggcac 17760
tggtcggatc ggcaaaaacg cgatcgagat cgcgcggggc ttcaatatga atatcgtcgc 17820
ttacgacaaa tttcccgacc cggtttatgc cgaaaagatg ggctatcgtt atctgtctct 17880
ggacgaggtg ctggccacgg ccgatatctt gaccttgcac gtgccctacc tgccggagaa 17940
tcatcatttg atcaatgccg aaacgctggc caaaatgaag tcgggggctt acctgatcaa 18000
caccgctcgc ggtggcttga ttgacaccgc ggctctgctc gtggcgctta agtcggggca 18060
aattgccgga gccggtttgg acgtgctcga agaggagggc gtaatcaaag atgaggtcaa 18120
tttcttaacc aacggtcgct tggatcaagg cgatctgaag acggtgctcg gcaatcatat 18180
tttgattgat ttgcccaacg tgatcattac tccgcataat gccttcaaca cttgggaggc 18240
gctgaagcgc attttagaca ccaccgtggc gaatctggtg gcttttgaag ctggaatgcc 18300
gcaaaatttg atcagtggcg attaaggcgg tttattgacg ttttaccttg ataacggtac 18360
aataaggtca gattccgttc ggggtgagtg gaaaaacgtc ggttctagac aacggaagga 18420
gattttatgg cccagaagtc tgccactgaa attgtttgag ctcgtctgtc tgcgtgaccg 18480
acgagcttgt gttttgttta aataaaaaga tggctgaatt caatttcaaa atcgaaaaga 18540
aaattgccgg ccgtctcggc cgagcgggaa caataatgac gcctcacgga gacatctcca 18600
ctccggcgtt tatcaccgtg gggaccaagg ccaccgtcaa ggcgctctcg ccggagcaag 18660
taatggcctc cggttcaccg gcggcgttgg ccaatactta ccacctcctc ttggagccgg 18720
gcgcggaagc ggtggcgcgg gctggcggtt tgcatcgcta tatgaattgg ccggggccgc 18780
tgattaccga ttcgggcggc ttccaggtct tctcgctcgg cgcggcttat gacgagggcg 18840
ggatcaataa attcctcaag ccgggcctac cctcgcggac cgcaccgaag cgaccttcgg 18900
aagaaggtcc gcgggagccg aagccggcca agattgacga agacggagtg acgtttcgtt 18960
cgcctttgga tggcgccgaa caccgcctga cgccggagag ctcgattcaa attcaacatc 19020
aacttggcgc cgatattatt tttgctttcg acgaatgcac ggcgcccacg gccgattacg 19080
tttatcagaa ggaagccatg aatcgcactc accgctgggc cgagcggagt ttggctgaac 19140
acgagcggct aacccaggct aagactcggg aaaatgcttc taaaaaagtc ctcggtcctc 19200
ttcaggcttc gcttgaggcc agactttttg ataagcattt tcccgagtct tattcggcct 19260
tgttcggcat cgtccaaggc ggccgcttcc aagacttgag ggaggcgagc gccaaattta 19320
ttgccagctt gcctttcgcc ggttttggga ttggcggttc cttcgataag accgatatgg 19380
gcacggcggt cgggtgggtc aatgcgatct tgccgaccga caaaccgcgc cacctgctgg 19440
ggattggcga accggaggat atgtttgagg cggtggcgca aggggccgac actttcgatt 19500
gtgtcactcc aacgcgcttg gcgcgccatg ccactttatt gacggcgacc ggccggctca 19560
atattttgaa tgccgctcac cgtgacgatc cgacatcgat cgaagccgat tgtgactgtt 19620
acgcctgcca aaattattcg cgcgcttact tggctcacct tttccgcgcc ggtgagattt 19680
ttggcgccac tttggccacg attcacaatt tgcgctttat gaatcgtctg tcggagcaaa 19740
tgcgcgccgc gattttggcc gagcgatttt tggagttcaa ggccgagtgg ctagccaaat 19800
atcaaagatg aagaaacccc cctcaacccc aaaacttttt cgtttggaaa gcgccttcgc 19860
gccggccggc gatcaaccgg cagcgattaa ggcgctgacc gaaggtctgg cacgcaatct 19920
tcgtcatcaa accttgttgg gggtgaccgg ttcgggcaaa acttttacca tggcgggagt 19980
gattgccgct tacaacaagc cgaccttggt gattgcccat aataaaactt tggcggccca 20040
attggcgcag gagtatcgaa gttttttccc cgaccacgcg gtgcattact ttgtttctta 20100
ttacgattat tatcaaccgg aggcttacgt ggcggccagc gacacttata tcgagaaaga 20160
cgccagcatc aacgaagaga tcgaacggct tcgtcacgcc tctaccgaag cgcttctgac 20220
gcggcgcgac gtgatcattg tcgcttcggt gtcgtgcatc tacggtttgg gcagtccgga 20280
ggaatacgcc aaaagtttta tcaattttaa tcttggcggg aaaattgaac gccaagcctt 20340
gattgagaaa ctggtcagtc tttattatga gcgaatcaac gccgatctct cgcccggcac 20400
ctttcgcgcc atcggcaatt ctgtggagat tatgccgccc ggtcaacgag agatcatcaa 20460
tctcaagttg accggggacc accttgccga aattttgatc gttgacgctg tttcgcgccg 20520
agtggtgaac cagccgggcg agatttcaat ttatccggct aagcacttta tcaccagcgc 20580
cgacgaacgc cagcgcgcca tcgctttgat taagaccgag ttggctgaga ggttgaaaga 20640
gttggttgcc gccggcaaga atctggaggc cgaacgcctg aagcgccgca ccaattacga 20700
tttggcgatg atcaaagaaa tcggctactg caatggcatt gagaattatt cacgccacct 20760
ctcggggcgg gcggcgggcg aggcgccggc caccttgctt gattattttc ctaagacttc 20820
tttcggtcgg cccgattttt tgaccatcat tgatgagtct cacgtaacgg tgccgcagct 20880
tggcgggatg tttgccggcg acgagaaccg gaagaaaaat ttggtggcct atggttttcg 20940
tctgcccagc gctctggaca atcgcccgct caagtttccc gagtttgaag cccgaattgg 21000
tcccactatc tataccagcg ccaccccggg caaatacgag cttgaagcca gtaatcccca 21060
aaaaggcggg cagatcatcg aacagattat ccggcccacc ggcctggtgg atccggcaat 21120
tgaaattaaa ccgatcgttt cgaccgcgcg ctatctcggg caaatccagg attttatcgc 21180
cgaggtgaaa aaagaaattg ctcaaggtcg gcgggctatc gccacgacct taaccaaacg 21240
gatggccgaa gatttgagcg agtatttgaa aggtgagggg attaaggccg aatatttgca 21300
cagcgagatc aaaacgttgg agcggatcaa aatcctcacc gacttccgcc gcggcgagtt 21360
cgactgcttg gtcggcgtta atctcttgcg cgaaggtttg gatctgcccg aagtgtcgct 21420
gatcggcatt ttggatgctg ataaggaggg cttcttgcgg tcggaagtgg cgttgatcca 21480
gaccattggc cgggcggcgc gcaatttggc cggccgggtg attctctacg cggagacgat 21540
aaccgactcg atgaagcggg cgatggatga gacggcgcgc cggcggacca aacaactggc 21600
ttacaatcag caacatggca ttacgccggt ttcaatcgtg aagaagatta aagacatcac 21660
cgacagtttg gctaaagatc ggcaacaatc ggttaccgct ctcttggcaa tagatgaaga 21720
gctttatggt aaaaacaaga aaaaattaat cagggagaag gtcaagcaaa tgagcgaagc 21780
ggtcaagaac ctcgatttcg aaaccgccgc tctcctccgc gacgaaatca agatcttgga 21840
aaacgtcaag actaaggcca aatgatatcg gaggatgatg ttggcgtgac atcccgccga 21900
caatttttat cccaattcat acacgaccgt gcacggatag ggatgattag gaagtctgag 21960
gcaggttgaa aaattttctc aaccaacgat cattttcgat ttgggtgact tccagatata 22020
aaatttcatt tccgattcgg taattggctt taatcatcgc gacaatttcg cggcaatcat 22080
aaggcgaaac ccagacgctg ttttgcaatc tgactaagcc aaggtggtgt aaccaacgac 22140
gaagtttgtc tcgggtgctt cgcttccatt ccttaatatc aaagatgatg attcgatatt 22200
tgcggtccca tttggacggt ttttttatgg tcaacttctt taactggtat tctcttaatc 22260
tcgcttgacc ttttttagtt aaacgaacaa ttttttgatt ttgatgattg gtttgaatct 22320
caagcaaccc ttggttcttc attttctcta ttaccgtatt ggtgtaatat tttttctttg 22380
attgttgtcc gggcaaatat tttagcagtt gaacgcagtt gggggccaac aaggtaaaag 22440
caatcacccc ggtgataccg atgatactta aaataagctc ttgataatcc gctttgtcta 22500
ttcgtgacat ataccttatt ataaacggtc gtataagata agggaagata gaaaagatag 22560
gaaaagaggg aatccctcaa agcttttttg tttgggtcgg atgtgttata atcgctaggt 22620
tccctatggg ccggcccacg gggggtttcg gcgtcatccg gaataagatt aagaaatttt 22680
tatggatcag aaacatcagg ataaaatcaa aatcaaaggg gcgcggacgc acaacctgaa 22740
gaatatcagt ttggagattc cgcgcgatca actcacggtg attaccggtt tatcgggctc 22800
gggcaagtct agcttggctt tcgacactat ttttgccgaa ggccagcgac gctatattga 22860
gtcactttca gcttacgcgc gccaattttt gaaacaatta cccaaaccgg aggtggacga 22920
gatctctggt ctctcgccgg cgattgccat tgaccagaaa tcgcgttcgc acaatccgcg 22980
ctcaaccgtg gcgaccgtga ccgagatcta cgattatctg cgcgtgctct acgcgcggat 23040
cggccggccg cactgtccgg tgtgtggagt ggcgattgag aaactctcgc tggaggaaat 23100
cgtgaatttc gccaaagaga aaattgccgt cagtcatcgg ggtaaaaaaa atctcaagat 23160
ttcaattacc gcgcccttgg tgcgcggacg gaaaggggag tattatcagc tcctctacga 23220
tttactggac aagggttacc tcgaagtgtt ggtggacggt caaacttatc aactgcgcga 23280
acgcatcgta atgaccaaga ccaagaagca tgatattgac gccgtggtcg acatgattga 23340
ttggagcgat cagggcgagg ttgtcgcggc cggccagcgt ttggccgagg cggtggaacg 23400
ggcgctcaaa gagtcggacg gtctagtgaa gattgtgatt gataacgaga acttcctgct 23460
ttcctccaaa ttttcttgcc ccaacgatgg cttctctttt cccgagattg aaccgcgact 23520
cttctccttc aattcgcctt acggcgcttg tcccacttgt cacggtattg gcaccaagca 23580
cctcttcggt ggcgaacctt gcgatacttg ccaaggggct cgcctgcgtc gggaggcctt 23640
ggaggtgaga attggcggca aaaacattat ggaagcggtg tcgctctcaa ttgccgacgc 23700
ggccagcttt ttcgacaagc tgaagttgac cccgaaagag aaaacaattt ccgaggtgct 23760
gtggcgcgag atcaaggcgc gattgaagtt tttgctcgat gtgggtttgg attacgtgga 23820
gttgaatcgc cgcgccgaca cgctctcggg cggtgaggcc caacgcatcc gcctggcttc 23880
gcagttgggg tcgcgtttgg tcggcacgct ctacgtgctt gatgaaccca cgattggttt 23940
gcatgctcgc gataacgcca aactgattaa gactttgctt gagttgcgcg atttgggcaa 24000
caccattgtg gtggtggagc acgacgaaga cacaattttt gcctctgatt atttggtgga 24060
tatcggccct ggggccgggg tgcacggggg caaggtggtg gccgccggtc caaccgagaa 24120
atttttaacc agcaagaaga acgattataa ttctttgacg attgattacc ttcggggcga 24180
caagactatc gctttgccgg aaaaacggcg aggaaaccag aagggcgcgc tgaaaattcg 24240
cgggggcaaa atttttaaca tcaagaatct caatgtggac ctgccgctct cgcgcttggt 24300
ggcgattacc ggcgtgtcgg gttcgggcaa atcctctttc gtctacgaaa ttctttataa 24360
aaatttgcag gccaaactgg agcgtcgtta tcgcaccaac accttgttta attgtcggga 24420
atttggcgga acggaatact tgagccgagt ggtcttagtg gatcagtcac cgatcggtcg 24480
gaccccgcgc tccaatccgg ccacttatac cggcgccttc accttcatcc gggaactttt 24540
tgcggcttcg gctctggccc gggcgcgcgg ctggaagccg tctcgcttct ccttcaacgt 24600
ggctggcggc cggtgcgagg cctgccaagg taacggcgaa gtggcggtgg agatgcattt 24660
cttacctacc atctttgttc cttgcgatgt ttgcggcggc aaacgctacg agaaggaaac 24720
tctggaagcg ctctataaag gaaaaaatat ttacgaagtg ttgcagatga cggtggaaga 24780
agcctttagt tttttcgaag atattccggc catcttcgac cggctcaaaa cgttgaacga 24840
agtcggtttg ggttatttgg aattgggtca atcggccacc accctctcgg gaggcgaggc 24900
ccaacgggtc aaaatctcca ctgaacttta tcggccgttt accgaacgca cgatttatat 24960
cttggacgaa ccaacggtcg gattgcatta cgaagatgtt aaaaacctaa acgaaatttt 25020
gcaaaaattg gtgaccaaag gcaataccgt ggtggtgatt gagcataatt tggaagtggt 25080
caagagcgcc gattacgtga ttgatctcgg gcccgccggc ggcaaagacg gcggcgagtt 25140
ggtggcggtc ggaacgccgg aagaattggc ctacgctcct ggctcccata ccgggaaata 25200
tctcaagcgt ctgttgaaac aacaataatt aaagttgaaa gatggaaagc cgggagctta 25260
aaaaatatca attgcccgat gggcccgggg tctacttctt caagcagggc cggcgaatcc 25320
tttatgtggg caaagccacg tcgctcaagg atcgggtgcg cagttatttt gccggtgatt 25380
tgggcgaaac gcgcggacca aaaattgagc ggatgcttga gttggccaac cgcgtggact 25440
ggcaaaccac ggactcggtg ttggaagcgc tcttgctgga gtcggccttg atcaagaaac 25500
atcaaccgcc ctataacacc agagaaaaag atgacaagag ctactggttc gtggtgatta 25560
ctcacgaacc ttttccccga gtattgttgt gtcggggccg gcaattgtcg aacggttcat 25620
tctctcttgc gcttaaaatc aaaaaaattt tcggcccttt tccccgttca agcgaaatca 25680
aggccgcctt gctcgtgatc cgaaaaattt ttccttatcg cgaccgttgt caactggcgg 25740
tggccggccg accctgtttt aatcgtcagc tcggactctg ccccggggtg tgcaccggcg 25800
aaattaacca aaccgattat cggcggctga ttgccaacat tgaacgcttg tttgccgggc 25860
gtaaaaggga attgctcgtt cgtctggaac gcgccatgaa acgagcggcc agaactcaac 25920
gtttcgaagc ggcgggtcaa attcgcaatc aaattttcgc cctcaaacat attcaagatt 25980
tggcgttgtt gaaatcaagc cccaaccgcc tcaagggaaa atccgttcgg atcgaggctt 26040
acgatgtggc tcattggcaa ggcgaggccg cggtgggagc catggcggtt tggcaagacg 26100
gagagttgga tcgaagtcag ttccgccaat tcaaacttcg ggcgacaacg ccgggggacg 26160
atttggccgg gttgcgcgaa atcttgactc gacgtctggg tcatcgggag tggcccgagc 26220
cctctctggt ggtggtggat ggagaccagc gacaggtcgc cacggcccaa gtcgcattgg 26280
ctcgtcaagg tcttgactgg ccggtagtcg gagtgaccaa agaccgtcat caccgcgccg 26340
tcgctttggc gggcaatctt gaggcagaga gttttgaccg tcaagccgtg attgaagtca 26400
acgacgcggc tcatcgcgtg gccattgctc atcatcgccg acgtttgcgt ttgggtcggt 26460
aaggtcaggg cttatccctt ggagcgctct tccgaaatat ggtaaaataa aggtcggata 26520
atcaacttta tgttttggtc tgacttagtc gcaaagttgc ccaccgagcc ctcggtttgg 26580
attgccgcgt tgggtttgtt tggggtcgcc tttttccttg gttatttttg gcaggatcaa 26640
tcgaccagga cgagatggca ggtcaagcag gagatgttga agaaccagca gattattgaa 26700
ctggaaaaag tcaaccagaa cttggcggcc aaaaatcgtg aactctatgc caaagaattg 26760
gagctgacca tcgccaacaa acatctccaa gcgctggaag cagccaaatc caaatttatc 26820
gccgtgacca ctcaccaatt gcgcacgccg ctctcggctg tgaagtggac gctggatttg 26880
gcggccaaag gtcaattggg caaggtcgac gaagagcaaa aaagtttctt aaacaaaggc 26940
ttgattagtg tcaaccgggt tattgccatc gtgaacgaac tcttgcgcgt ggactcggtg 27000
gagaccgatc aagtcgtcta ttgtttccaa cccgtcaatt ttatcaagct gttcgacgaa 27060
gtgttgtttg aattcgaagt gcaggccaag agcaaagggg tgaaactctc ggtgcgtcgg 27120
ccggagactg acctgcctcc aattgatttg gatgaaacca agattaaaat ggtgatggaa 27180
aatcttttcg acaacgccat taaatacacg ccggtgggcg gtctggtgga agtggttgtc 27240
tccgacaagc gtctcaaccg cgccgaaggg gcgattgagg tgacggtgcg cgattccggc 27300
atcggcatcc cgagcgagga aaagaacaac attttccaaa aatttttccg cgcgaccaac 27360
gcgatcaagg ccgagcccga cggttccggt ctcggtctct ttatcgctca cgatattgtg 27420
actcggcata atggctcaat gtggtttgag ccggccgcgg gcggaggcac gatttttacc 27480
ttcactttac cgattcatca gaagacgcta taattttaaa gactcttatc aatttaatct 27540
taaaagacaa tggacaagaa aaaaatccta atcgtggagg acgacgagtt cctccgttcc 27600
ctcaacgcca agaagctgga gagcgagggt tatgccgtta gtgtgtcgcc cgacgggacc 27660
agcgcgatcg aattgattcc tgaagaattg cccgacttgg tgtttctgga tcttctgttg 27720
ccgggcggca aagacggttt cgatgtttta acggcgatca aggccgacga aaaaaccaag 27780
aatattccgg tcgtggtttt ctccaatctc ggccaagccg aggatatcaa gaaggctaag 27840
gacttgggcg cgattgactt tttgatcaaa gccaacttta cccttgacga cgtggtgacg 27900
aaaattaaag aaattttgaa ataaaacaaa tcaatggcgc ccattcgagt cggtatcttg 27960
cgcggtggca tcggatccga gtatgaagtt tcgcttcgaa ccggcgccgg tgttttgcgc 28020
cacttgccgg gcgacaagta tcagccggtg gatattttgc tgtctcgaga cggggcgtgg 28080
tatgccggcg gtttgcgcgc cacccccgag cgggcggtac ggggagtcga tgtgatcttc 28140
aacgccttgc acggcgagtt cggcgaagac ggtcaagcgc aacaactgct tgattatctg 28200
ttcaagccct atactggttc cggcgcggtc gccagcgctc tggggatgga taagcctcga 28260
gccaaagagc tcttccggca ggctggtctg cgggtgccca acggcgcggt gcttcggcga 28320
gcggatcgtc ccgaggaaac cgatgccgag gcggtggctt acgatgtctt caaaaaaatt 28380
ccgccgcctt ggatcgtgaa gccggccagc ggtggctcct cggtggatct ccggctggcg 28440
cgccattacc ccgagttagt ggcggcggtg gccgccggcc ttaagcagaa cgatcgaatc 28500
ttggttgagg aatacgtgcg cggtcaagaa gccacggtgg gggtcgtcga tcgtctgcgc 28560
ggccgcgatc attatccgtt gttgccggtt gagattgtca cgctgccaga caaggtcttg 28620
tttgattacg aagcgaagta cggcggccaa accaaagaaa tttgccccgg ccgctttcgg 28680
ccggaagaca agcttgagtt ggaacgtcaa gccgttttga ttcatcaaca attaggcctg 28740
cgtcactatt ctcgttccga ttttatcatc tcgcctcgcg gtatctacgt gctggaagtc 28800
aacactttgc ccggcctgac cgaagagtct ctggtgccca aggcgctggc cgctgccggc 28860
atcgcttacc cgcagttttt ggatcacttg gtgaccttgg cgttagaacg acgctgaatt 28920
tgaaggacaa aaaagccccg cgagagaaga tgcagtgatc tcaagggggc aagaggaggg 28980
gatgaaaggt atgaaggaac taccaatgaa ggggatggaa ctgggacaaa agaacaaatt 29040
aggtggcaga gccttcagtg ccactcgaaa gctctgccgg ttagggtgta aaggtcgagc 29100
gagcgaccta tcttcaggtt atcataaggt gtgatttttt gcaagggcgg agggattatc 29160
ttggtggtgt tattataata gcatttgctc gaacttattt tcaagacaaa atgaaggact 29220
gaacgccccg ccacccgcct cgcggacttg gcggacacca gaaacaaaaa attttcttaa 29280
cattttccga tttggcgcga ggaagaattt ctcttaaatg gaaaagaaaa ttttgtttct 29340
ggtgttctgt cctcaaggtc tcgggcagtt ggcggggctt cagaaattcg gacagaaaat 29400
taaaaagtgt catccccccc aaaccccaac cactttttaa ttttctgatt cctacaatgt 29460
ttcgtttggt ggtgttattt tagcatttgc tcgaacttat ttccaagaaa aaatgaaaga 29520
ctagcgttcc ccgcgcgctg aagcgcctct gtgcaaagca cnnnnnnnnn nnnggggatt 29580
ttgaattttg tccgcgcgga ggcagggtct gggagggaat ccgcgcgggc tttatttttt 29640
tgaatttttt tggcgtagag cttgtataaa atacaattat atggtataaa aatagtaaga 29700
gaaagtcatc gtggctttct caaaaccgct cattgacaac taaaaaagga ggatccaatg 29760
attatttcat tcagtgggcc ctccggtatc ggtaagggct tcatcaaaga acgactatta 29820
cagctttatc cagacatcca agaattggtg tggtatacaa ctcgcacctt gcgaccaaac 29880
gaacaagggt caaacagaat tcaagtttca ctttccgagt ttaaccagtc ggttgaactt 29940
ggcaagctta ctttagtgca agatcttttt ggtcatcgtt atggtctaaa aaaagaagat 30000
ctcgtaacga gttcgggtat caagttgact gagttgcatc cagcaaatct agtggaagca 30060
ctcaaaatca acccgaagat ttttgcaatt ggtcttgtaa cttctgattt atcactactt 30120
cgtaaaagac ttactgttgt gagaaagacg gaaagcgaag cagagataga gaaaagagtt 30180
acgaaagcta aaagcgagat cgagataatt ctacaacaca ggtcttttta tgcttccgtg 30240
attgaaatta cagaagctga agaagatcaa gtgttcaaca aggttcatgc aatattgcaa 30300
tcacaaatca aaccgaaagg aggaaaaaat gaaactagaa acacaagttg gtagtctgaa 30360
gttgcacaca ccgttgttgc tggcttcagg ttacattacc gaaacaccag agttctttct 30420
gagagctcaa ccctacggct gttcgggtat cgttacccga tcacttaaac aaaatgttcc 30480
agcggaacga tcacggatta catctccacg ctatgcagtc tttggtaatg acagcatgct 30540
taactgcgag tggggaaatg aaagaccgtg gacggattgg cgagatcatg gagtgcaaca 30600
ggtcaaagca attggttgtc taatcatcat ttcgctttcg gggcgagatt tggatagctg 30660
ttgtaatttg attcgtgcat tcgataagat cggtgttgat gcctacgaaa tcaacatctc 30720
atgttcgcat tctggagcac tgcatgggaa tctgaatgtt gatgtgcttc acctagaaca 30780
actgatgaaa agagtgcgta acattacgac gactccaatc tggatcaagt tgtcgtattc 30840
aaacctgctg ttctcaatgg caaaacaagc cgaagagttt agagcagatg cgatagtgtg 30900
cacaaatagc atcggtccag gaatgttgat cgacaccaaa accgctaaac cgaaactcgg 30960
aatcaagggc ggaggcggtg gaatgacggg aaaagcaatt ttcccgatcg ctctatggtg 31020
tgtgcatcag ctttcaaaaa ccgtgagtat ccctgttgtc ggttgtggtg gaattttcac 31080
cgcagacgat gtaattcaaa tgctcatggc aggtgctagt gcagttcaac tctacacagc 31140
tcctgcgctg aaaggtccta cggtctttag acgagtaaag gctggactac aaaggtttct 31200
cgatgagaat ccgaagtatg cttcagtcaa agacctcgtt ggacttacgc tcgacaaaac 31260
aggtgagcat aagttttctt cacctcgtcc agtcgtgatt gaagaaaagt gcacaggatg 31320
tggaatctgt attcaatcct gtgcatttga cgccctgtca atggttcgta gtgctgatag 31380
caaagcactg gcggtcattg ccgataactg catctcatgc aacgcttgcg ttggagtatg 31440
tcctccgaaa ttcgacgcta tcaaagcatc attctaggag gtaatacaga aatgaaaaaa 31500
aacacataca tcatcgcggt tcactgcaat gcgtgtcgaa ccctactgta tcgttacaaa 31560
aaagaaggtg gtggacatct cctcaagtgt tatgccgaca tgataatgtc ggattacact 31620
aaaggcgatc taaggtgtcc ttcttgcggt caagagtttg ctcgacatgc aatcatccac 31680
aatcgctcag cacataagat aatccgaggg agagtctttg tgaagggtca tcatggataa 31740
catcatcaca acgggtggtt tgattcaatc agaccacccg ttattttttt attttagttc 31800
aaatctgttt ttgaaataat tagatgtata gtttttataa tcaaaaatct cattagattc 31860
tttatttagt ttttctacat attcaaaaaa ttgtttttta tcaaaaatat caagactaag 31920
ttctttacaa acatttgcaa ttcctttaac caattcatcg ccattttcat taccagaggc 31980
cattttttct gcttcgtaat aataactatg tcccggtact tctaccaatg caaattcaat 32040
atccttatat tcatatactt tactttttct aacacaaagc atacctttac caaatccaag 32100
ttcgccgaaa atttcaacca acgtgtcaaa atcgccttgt ttagtgaaaa ccgagagctc 32160
tttacgttgc tcatttcctc cccattcgcc aattttaaga ataatttcag gaattccatt 32220
ggtcactcgc aatcgtatat ctttttttct atgttctacc cctccctcta gaaaagttga 32280
ataatcaatc aatactctat ttttctctga tttctttttt ccactactgt caaaaaattt 32340
taccagattc tcaaattctc cttttgataa aggtcctcgt atttcaattt ctatattttc 32400
atccatattt attgattttt taggtttata aatagttgct ttattatcat ggtcgcataa 32460
ctaccagtag gtaagtaaaa ggaaagtgta attttcattt tatttttatg aagatcgtca 32520
gactctaaat catgagcata catattagtg gcgaccaaga gatttctctt gttcagtttt 32580
ggttttgcta aaaaattttc tggaattagt tcaaaccctc cagcttcaca aatatgtgga 32640
cattgaaata cagcatttgt tggcaaatat aatttgccaa catttttgaa tataaatttt 32700
ttacttttag tatttttctc tatcaacaaa gatgcctgtg tattccacag aaaactatta 32760
tatgcggaca caaaaaaaga aacttttttt ggattcatga catcaaaaac ctttttgtag 32820
tctgagatat cttttgcttt tagttcagct ccttgcgtaa tattatttgt aatttttagt 32880
tgttcataag cctgtttcca attatcttct actattgcct taccaatcag atgagtatta 32940
taggggccac caggcattcc aaatctttga ttgtcatagt aatttataaa ataaagttgt 33000
ttgtgattgt ggacataatt tgaaagatta tctgcaatcg tagaatttaa atttcttacc 33060
actattttaa aagcatttcc gtgtaaagcc ctttctttta ttggtttttc cccatgaccc 33120
attacaaact taatttttga aaattgattt ttaaatttgt gtttcttgtt aaatactatg 33180
atatcttttt ctttcaagat ttttttgatg gaaataagtt gttcggtaat agcatcctca 33240
tcttttaatc cttggctaca tacatcctca aatgaaagtt taaaaaatag ctttatttgt 33300
tctaaggctt caaatgttgt aaatccagat ttttgtagcc aaatataagt aaacttacgt 33360
ttaccttttg atataaatga tggcataaga gagacctccg tcatctgaaa gtcttcgttt 33420
atgtgtttta ttttataatc ctcatattta tccataatat aaataattta acataaataa 33480
ccttatttgt aaataattcg ccaaaaaatc ccaaaaaaca aaagcccgcg cggattccct 33540
cccagaccct gcctccgcgc ggacaaaatt caaaatcccc gccgaatttc aaaaacatta 33600
gtctcggttt tgcgaaccct tctcccagaa aatagttttt gcaaaaccga gtccatattt 33660
gcatttctgc acctcgcctc attctcccag attattagtg gcgaggggca gggcgtttcc 33720
ccgcacttct gcttcagcag aagctctgtg ctttgcacag aggcgcttca gcgcgcgggg 33780
aacgctagtc tttcattttt tcttggaaat aagttcgagc aaatgctaaa ataacaccac 33840
caaacgaaac ttgttcggaa ttaagaaagc ggagcgattt tgcgggagcc aaaatcgcgc 33900
tatcattttt ttcaaaaccc tttccgccta cggcggaagc ggtgaattcc caaagttccc 33960
cccaattgaa atcatgaaag acctcaaacc aaaatatttt ctctacgcga ggaaatcaac 34020
agaggatgat gaccaccaaa taatg 34045
<210> 24
<211> 11142
<212> DNA
<213> Artificial sequence
<220>
<223> synthetic sequence
<220>
<221> misc_feature
<222> (6655)..(6659)
<223> n is a, c, g or t
<400> 24
catcttcatt tgtatgcgta tcagagagat caaaaactat gttatcaatg atggcgcggt 60
atggttcaat gagatcgaag gcgagagcgg ggtaatcagt cgtctcgtgc aggaacccat 120
ggaagggaga gaggtgatgg tagtgaatcc atcggagaaa aattcctatt aaaaattttg 180
acatcgcatt cagcgcgttg ctggccgggt ttttaccgcg cctcatgaag gctgaatgtc 240
cgagcttctt gaaatatgcg ctccaatagc gctggctgtg gagcgcttcg tgattgcgca 300
gttcctgaat ggtcatggtg cgggagagtt tttttgcagg aggaacgata agccacgcca 360
tgctgttgaa ttttgcgttt agaatttgcc gcgcgatata ctttttgatg cgcaaatcag 420
agcgctttac aagttgttga gaaagcaggt catttccatc tgcacggtta ctggcggtaa 480
tccagactgt attggttaaa tttctcctat gaataatgat aggaatttta tgacgcgctg 540
tgaattcaag tgtgctcggg gctaagggag ggctatctcc gtaaatcatg atggagagga 600
gcttggcagg attgcaggtt acttcgcctc ctttatactt aatgtgaata tttttccctt 660
tgacttcaaa tgtttcaaca taaggcgccc aaagaggtat tttttgcgag tatgttttca 720
tgttatagaa taaagtgagt attgaaatat aaaactttat atggtaatgt aagacacata 780
attttgcaag atgtgttgca aaaaagcgat tttttgaggg gtcgccccga atatagggga 840
caaaaaggct agcatacttt tttggaaccc cgaatatagg ggacaaaaag gcttatgagc 900
tgaaaaagat ccccgaatat aggggacaaa aaggcacgcc gctttcgcgt tcaaccccga 960
atatagggga caaaaaggca attaccgcat aaatcatccc cgaatatagg ggacaaaaag 1020
gcaacatgac ccaccctcct ccccgaatat aggggacaaa aaggctatga gacttctgaa 1080
atccccccga atatagggga caaaaaggct taagccccat gctttctccc cgaatatagg 1140
ggacaaaaag gctgaagtac gcaatctgca accccgaata taggggacaa aaaggcatgc 1200
tgtttgtatc ttcaccccga atatagggga caaaaaggca aggatattca agcgcacccc 1260
ccgaatatag gggacaaaaa ggcttaccac acaacttatt gaccccgaat ataggggaca 1320
aaaaggctgt gagcgatgta aaccaccccg aatataggag acaaaaaggc gcgtggtcaa 1380
tgctcgtgcc ccgaatatag gggacaaaaa ggcctttagc ttcatttaag attttaggta 1440
tttccggaca gcggcttgac cgcatcgtcc tcgccttttc ctaaaatcgc ccctcttaaa 1500
tcgcttgcct tacagacgca tgtataaaga tattttgaag attaagttat cgcatacttt 1560
atgagtaagc gacatcctag aattagcggc gtaaaagggt accgtttgca tgcgcaacgg 1620
ctggaatata ccggcaaaag tggggcaatg cgaacgatta aatatcctct ttattcatct 1680
ccgagcggtg gaagaacggt tccgcgcgag atagtttcag caatcaatga tgattatgta 1740
gggctgtacg gtttgagtaa ttttgacgat ctgtataatg cggaaaagcg caacgaagaa 1800
aaggtctact cggttttaga tttttggtac gactgcgtcc aatacggcgc ggttttttcg 1860
tatacagcgc cgggtctttt gaaaaatgtt gccgaagttc gcgggggaag ctacgaactt 1920
acaaaaacgc ttaaagggag ccatttatat gatgaattgc aaattgataa agtaattaaa 1980
tttttgaata aaaaagaaat ttcgcgagca aacggatcgc ttgataaact gaagaaagac 2040
atcattgatt gcttcaaagc agaatatcgg gaacgacata aagatcaatg caataaactg 2100
gctgatgata ttaaaaatgc aaaaaaagac gcgggagctt ctttagggga gcgtcaaaaa 2160
aaattatttc gcgatttttt tggaatttca gagcagtctg aaaatgataa accgtctttt 2220
actaatccgc taaacttaac ctgctgttta ttgccttttg acacagtgaa taacaacaga 2280
aaccgcggcg aagttttgtt taacaagctc aaggaatatg ctcaaaaatt ggataaaaac 2340
gaagggtcgc ttgaaatgtg ggaatatatt ggcatcggga acagcggcac tgccttttct 2400
aattttttag gagaagggtt tttgggcaga ttgcgcgaga ataaaattac agagctgaaa 2460
aaagccatga tggatattac agatgcatgg cgtgggcagg aacaggaaga agagttagaa 2520
aaacgtctgc ggatacttgc cgcgcttacc ataaaattgc gcgagccgaa atttgacaac 2580
cactggggag ggtatcgcag tgatataaac ggcaaattat ctagctggct tcagaattac 2640
ataaatcaaa cagtcaaaat caaagaggac ttaaagggac acaaaaagga cctgaaaaaa 2700
gcgaaagaga tgataaatag gtttggggaa agcgacacaa aggaagaggc ggttgtttca 2760
tctttgcttg aaagcattga aaaaattgtt cctgatgata gcgctgatga cgagaaaccc 2820
gatattccag ctattgctat ctatcgccgc tttctttcgg atggacgatt aacattgaat 2880
cgctttgtcc aaagagaaga tgtgcaagag gcgctgataa aagaaagatt ggaagcggag 2940
aaaaagaaaa aaccgaaaaa gcgaaaaaag aaaagtgacg ctgaagatga aaaagaaaca 3000
attgacttca aggagttatt tcctcatctt gccaaaccat taaaattggt gccaaacttt 3060
tacggcgaca gtaagcgtga gctgtacaag aaatataaga acgccgctat ttatacagat 3120
gctctgtgga aagcagtgga aaaaatatac aaaagcgcgt tctcgtcgtc tctaaaaaat 3180
tcattttttg atacagattt tgataaagat ttttttatta agcggcttca gaaaattttt 3240
tcggtttatc gtcggtttaa tacagacaaa tggaaaccga ttgtgaaaaa ctctttcgcg 3300
ccctattgcg acatcgtctc acttgcggag aatgaagttt tgtataaacc gaaacagtcg 3360
cgcagtagaa aatctgccgc gattgataaa aacagagtgc gtctcccttc cactgaaaat 3420
atcgcaaaag ctggcattgc cctcgcgcgg gagctttcag tcgcaggatt tgactggaaa 3480
gatttgttaa aaaaagagga gcatgaagaa tacattgatc tcatagaatt gcacaaaacc 3540
gcgcttgcgc ttcttcttgc cgtaacagaa acacagcttg acataagcgc gttggatttt 3600
gtagaaaatg ggacggtcaa ggattttatg aaaacgcggg acggcaatct ggttttggaa 3660
gggcgtttcc ttgaaatgtt ctcgcagtca attgtgtttt cagaattgcg cgggcttgcg 3720
ggtttaatga gccgcaagga atttatcact cgctccgcga ttcaaactat gaacggcaaa 3780
caggcggagc ttctctacat tccgcatgaa ttccaatcgg caaaaattac aacgccaaag 3840
gaaatgagca gggcgtttct tgaccttgcg cccgcggaat ttgctacatc gcttgagcca 3900
gaatcgcttt cggagaagtc attattgaaa ttgaagcaga tgcggtacta tccgcattat 3960
tttggatatg agcttacgcg aacaggacag gggattgatg gtggagtcgc ggaaaatgcg 4020
ttacgacttg agaagtcgcc agtaaaaaaa cgagagataa aatgcaaaca gtataaaact 4080
ttgggacgcg gacaaaataa aatagtgtta tatgtccgca gttcttatta tcagacgcaa 4140
tttttggaat ggtttttgca tcggccgaaa aacgttcaaa ccgatgttgc ggttagcggt 4200
tcgtttctta tcgacgaaaa gaaagtaaaa actcgctgga attatgacgc gcttacagtc 4260
gcgcttgaac cagtttccgg aagcgagcgg gtctttgtct cacagccgtt tactattttt 4320
ccggaaaaaa gcgcagagga agaaggacag aggtatcttg gcatagacat cggcgaatac 4380
ggcattgcgt atactgcgct tgagataact ggcgacagtg caaagattct tgatcaaaat 4440
tttatttcag acccccagct taaaactctg cgcgaggagg tcaaaggatt aaaacttgac 4500
caaaggcgcg ggacatttgc catgccaagc acgaaaatcg cccgcatccg cgaaagcctt 4560
gtgcatagtt tgcggaaccg catacatcat cttgcgttaa agcacaaagc aaagattgtg 4620
tatgaattgg aagtgtcgcg ttttgaagag ggaaagcaaa aaattaagaa agtctacgct 4680
acgttaaaaa aagcggatgt gtattcagaa attgacgcgg ataaaaattt acaaacgaca 4740
gtatggggaa aattggccgt tgcaagcgaa atcagcgcaa gctatacaag ccagttttgt 4800
ggtgcgtgta aaaaattgtg gcgggcggaa atgcaggttg acgaaacaat tacaacccaa 4860
gaactaatcg gcacagttag agtcataaaa gggggcactc ttattgacgc gataaaggat 4920
tttatgcgcc cgccgatttt tgacgaaaat gacactccat ttccaaaata tagagacttt 4980
tgcgacaagc atcacatttc caaaaaaatg cgtggaaaca gctgtttgtt catttgtcca 5040
ttctgccgcg caaacgcgga tgctgatatt caagcaagcc aaacaattgc gcttttaagg 5100
tatgttaagg aagagaaaaa ggtagaggac tactttgaac gatttagaaa gctaaaaaac 5160
attaaagtgc tcggacagat gaagaaaata tgatagacgt tgtttttaca ccatcgctat 5220
tgactaggtg atctttacgt cagaacccca tcagaaattc cttaaactcc tcaaacttgt 5280
ttgaaagcgg gagaacctgt ttttgtttgt gtagaagctt tttgagatca gcggggagag 5340
gtattttttt gccgatgagt ggttccacta ttgcgttgaa tttcactgga tgcgcggtct 5400
caagaaaaat gccgagagta tttttctttt tattttgagc acaatatttt ttgaggccta 5460
aataggcaac cgcgccgtgc ggatctgcac tatagccaca gcggttatac agttcagaaa 5520
ttgccccgcg cgtttcagcg tcagtaaacg atgcgccgaa aatatctttt tgcatttcag 5580
cgcgttcatc atgatacaga gtgcgcatac gcgcgaagtt actcggattt ccgatatcca 5640
tggcatttga aattgttcgt attgacggtt ttggaatgaa cggctcaccg cataaatatc 5700
gcgggacgac atcattgctg tttgtggcgg cgatgaattg tctcacagga agccccattt 5760
tttttgcaat gagccctgcg gtgaggttgc caaaatttcc gcacggcact gaaaatacaa 5820
gcggcgggca tacagcgaac gagcgagctt gcgcttgggc atacgcgtaa aaataataga 5880
atgtctgcga aataagccgc gcgatattga ttgaatttgc agaggcaagg cgcaatgttc 5940
gggcaagctc ccgatcggca aatgcttgtt ttacgagggt ttggcagtcg tcaaacgtgc 6000
cgtttatctc aagcgccgtg atgtttttgc ctaagccagt aatctgtttt tcctgaatag 6060
cacttactcc gtcttttggg tatagaatta taatgtgcac gcgctcactt tgaaaaaagc 6120
tgtgcgccac tgccgcgccg gtgtctccgc ttgttgcggc aagaatggtt aaacatctgt 6180
cgtcattttc caaaaaataa cacatcaatt ccgccatgaa tcgcgcgcca aaatctttaa 6240
acgagagtgt ttggccgtga aaaagttcaa gtacagcgag cgtttcattt aaaaacacaa 6300
gaggcgcgtc aaatgtgaga gatttttcaa taatgcggtt gatgtcttgt tttggaattt 6360
tagggaacca caactcgctt gtttcccgcg caatatcttt gagggatttt ttggcaatgc 6420
ttttgaaaaa tgatgaagag agccggggaa tttcaagcgg catgaacagg ccgccatccg 6480
gcgcgagcgg ggaaaagaga ccatgtttaa aggaaaaaat tttattgttt ctatttgtgc 6540
ttttaagctt catggcaggt ttgtataaaa ttctctgctg aaaattcggg cgaccgtagt 6600
ctgtgatagg ggatggttgc gtgcgcgtat tgtttatagc gattggtgcg atagnnnnnc 6660
agttttgggt aacatcgcgc gagcgcagag cgattgtttt cgttattccg cttttcaaac 6720
atattccccc acagcacggg ctttggatcg cgaaggtact gttcaaacat ttctttgcgt 6780
acttttgccg gcgtgtataa atataccaca cgcgtatatt ttttgagcag attgcataat 6840
gcggggtcaa cataaataac actccctgtc gtgtcaataa ctgtgcgaca atcaagtttt 6900
cttttttgta ttaaaccgat aatttttcgt ataacgctac gctcgcaacg caaataatgg 6960
ctttgattcg cgttgtattg ggactcgtat ggctggccaa gccatcgcga tacatcttga 7020
atgcccttat agccgtgctt tttaagcaag gaagcaagct ttttttcaat taaatcgtca 7080
cagcagatat gcgcgtaccc aaagcgcgca agctgttgcg cccagtatga ttttcccgcg 7140
cctgacatgc cgataagcgc gattggtttt tcttgcacac tatatatgtt cataaacgca 7200
ctgccttaaa aatatctgaa aaaactcctg cggatgtcac ctctgcgcct gctcctttgc 7260
ctcgtacgat aagcggtgtt tcatggtaat gatcggtggt aaatgaaaat atattgtcgc 7320
tcccgcggag cccggcaaac ggatgattag aggcaacttc tttaagaaac atttttgcct 7380
tgccattttc tatttcagca acaaagcgaa gcactgcgcc gcgtgcgatc gcgcgttgtt 7440
tttttgcttc aaattgggcg tcgtaccgtt caagtgtttt taaaaattct ttaacggttt 7500
ccttttttct gccttgcgga atgagctgtt ctatttcaac atccgcgcat tccatgggga 7560
gagcgcactc tcttgcaaca atcaccaatt ttcgcgccgc gtccatgccg tttaagtcgt 7620
ttcgcggatc tggttccgtg taaccgagct tctgcgcctc gcgcaccgct ttgctcaatg 7680
ttgtatttcc ctcaaatgag ttaaagatat agcttagcgt tccagaaacg attgctgaaa 7740
ttttttctac gcggtcgccg cagagcatga aatctcgtat ggtggaaagc acaggaagcc 7800
ctgccccgac ggttgtttca tataaaaacc gcgtatggtt ttgagaggcg agtagtttta 7860
aatttttata gaatttaaaa ttggatgaaa ggcctttttt attcggcgtt acaatggcaa 7920
tgcgctctgc aagtatggtg ttatagaggg cgggaatttc ttcgctcgcg gtgcagtcca 7980
caaacacggc gtttggaagg cgcattgcct tcatgccggc gacaaattga gcaagatcag 8040
ctttttgtcc gcgcgtgtta agctcttctt tccagccaga aagcgtgccg aggtgttccc 8100
caagaaccat tttcttggtg ttgacgatgc ctgcaacttt gagcgcaata ccctcctctg 8160
ccaaaagccg ctctctttga gcattgattt tcgtaagaag cgcagatccg ataagcccgc 8220
ttcccgcgag aaacacgtga atgttttgtg gtgccatagg tataaaaaaa ccgctccaga 8280
catgtgggta atgtccggag cggaagaagt tataatgcgc cttgttttta tttttaactc 8340
ttcacaacca aacatcaccc gccttttgcg gtaatagtgg tgatgatggt agtgatgcta 8400
ttttgacgca taagaatttt tttgactctc atagtatagc acaagtaaaa ttttttgcgc 8460
aaggttttgg tgagttgata gagttttgag gttgatatct aattgtcaag aaacggggat 8520
aatgtgcaca cattatcaca acagattgaa tatatgcggg ttttgtgaaa taatggcatt 8580
atatatcttg atgaacctca ccaaactcgc caattttttc tttgaacttg gcatgatgaa 8640
acgggaaaag catcagggtt ttgctattgc gggcgtgcat cacgacatgg ggtctttagc 8700
ggatcatacg tgtcgcgcgg ctttaattgg cgcaatttta gcggaaatgg aaggcgcgga 8760
cgtgaataaa gttgccatga tggtgctttt gcacgatata ccggaaacgc gcattgggga 8820
tcatcataaa gttgcggcgc ggtatttgga tacgaaaaaa gtggaacgcg ctattttttt 8880
agaacaaatt cagtttctgc ctgatccttt gcaaaaaaaa tggctcgcgc tctacgacga 8940
aaaagcaaag agaagcacta aagagggtat tgtcgcaaaa gacgcggact ggcttgaact 9000
ggcgatttcc gcgcgtgaat acatacacat cggctataaa gatttgcagt tgtgggttga 9060
taatgttcgg agcgcgcttg aaactgaatc cgccaaaaaa cttcttgcag aaatagaaaa 9120
acaaggcacc tacgactggg cccgcggttt agaaaagatg acatatcaga aattatcgtg 9180
atctgcaatt ttttgctata attataaaaa agtttcattc caacatctaa cgcaacattg 9240
aggaaaaact tcaatgcaat gatgagtatt gtgaaaaagt tgggaccagc tctctttccc 9300
attttgcagg atatgcgtct ctcgtatcag gtgcatggaa aggagtaaaa aaatacacgc 9360
cgcttgcaaa tttagaagac gtacggaata gagccgttgc gattagaaaa gaagcagaca 9420
aagaaaagcc agatagttta gagattgatc gtattttaac ggattttatg aatgcggagc 9480
taaaggaatt atggaatacc atagataaac gtattgttga tgcggcgaaa aagtttatac 9540
aaaacttcaa agatcatccc gaagacgcga ggagagcgaa ggtggagagt tggggactag 9600
aagaatggaa aagagattta gaacggatag tcaaaacccc aattaatcaa atgatggggg 9660
acgcatcatt tgtgattaac agaggagtgg atcagtatcg tgcgcgcgat atggcgaaaa 9720
ttatgggtaa gataagtgtt ttttatcaac cccttgtgtg ggagaaggcg tcataaccca 9780
tgagaattat cacaaaattc tctgcttcat atacaccatc gctccgtaaa gccccgagga 9840
atcgcagagc tttgattttt gaatcggcgg aaaggacggg aacaggggtt gatttgattt 9900
cttgacacgc tgtgagttgg gcagtagagt agtaagaaag taatattttt ttatattcat 9960
gaacactaag ataatacaaa aagctacatc tcgggggaaa attacgcttc caggacagtg 10020
gcgtaaaaag tttcctacga accaatatct tgttgaagtg gaagaagatt tgcttaagat 10080
taagcctttt gaagtggaca cggcggggca attagaagaa caagtaaaag tgttgaattg 10140
tgtcaataga tttgagggac ttgcgataaa aggaagaaaa tttgctaaaa agagaggaat 10200
taaaatggac gatgttttaa aagatgatta aagcagtact tgatacgaat attttaattt 10260
ccgcactttt ttggaaaggc accccatata ttattgtgca ggatggatta gagggtgtgt 10320
ttgaaatggt tacttcaaaa gcaataatga gtgaaacgaa agagaagttg attcaaaaat 10380
ttgaattttc tgttgaagat actctaagat acttggaact cttggtttgt aagtcgttcg 10440
ttgtatcacc gatggtacag cataatgtgg tgaaaaatga tagtactgat aataaaattc 10500
ttgagtgtgc ggtaagcgcc aacgcagatt atattgtgac aggagataaa catctactaa 10560
atatcaagca ttatcaaggg atcactattc tcactgcacg cagatttgat gagatacttg 10620
aaaatgaacg gagtagaatg agaagaaata agcgataggg acagaataac ttggatccaa 10680
ccttctaacg caacagcgtt aagaatgaat taattgattg aaaacctcgt atggtgtttg 10740
aaagtcgagt gtttttctcg gtcggccatt caggagatgt tgcgctcgtt tcacttcgta 10800
ccgcgatacc ttggtaaagt tggttccttt cggaaaaaat tgtctgatga gtccattggt 10860
gttttcgttc gtgcctcgtt cccatggact ccggggatgg gcgaagtaga ctttgactcc 10920
ggtcagattc gtgaataatt tgtggctggc catttcccgc ccttggtcgt atgtcatcgt 10980
cagtctcatt tgtttcggca attttttcac ttccttggca aacgctttgg ccacatcttc 11040
ggcagatttg cttttcacgg ggataaggat agtcgtgcgg gtcgtgcgct caaccagagt 11100
gccaagagcc gaacgattgt tctttccaac aatgagatcg cc 11142
<210> 25
<211> 13879
<212> DNA
<213> Artificial Sequence
<220>
<223> synthetic sequence
<400> 25
tttccaccgc cgctcaatca gtctagacat acaggtggaa aggtgagagt aaagacgtga 60
caaccttctc atcctcttca aagtctagac atacaggtgg aaaggtgaga gtaaagacaa 120
accgtgccac actaaaccga tgagtctaga catacaggtg gaaaggtgag agtaaagact 180
caagtaacta cctgttcttt cacaagtcta gacctgcagg tggtaaggtg agagtaaaga 240
cttttatcct cctctctatg cttctgagtc tagacattta ggtggaaagg tgagagtaaa 300
gacttgtgga gatccatgaa cttcggcagt ctagacctgc aggtggaaag gtgagagtaa 360
agacgtcctt cacacgatct tcctctgtta gtctaggcct gcaggtggaa aggtgagagt 420
aaagacgcat aagcgtaatt gaagctctct ccggtccaga ccttgtcgcg cttgtgttgc 480
gacaaaggcg gagtccgcaa taagttcttt ttacaatgtt ttttccataa aaccgataca 540
atcaagtatc ggttttgctt tttttatgaa aatatgttat gctatgtgct caaataaaaa 600
tatcaataaa atagcgtttt tttgataatt tatcgctaaa attatacata atcacgcaac 660
attgccattc tcacacagga gaaaagtcat ggcagaaagc aagcagatgc aatgccgcaa 720
gtgcggcgca agcatgaagt atgaagtaat tggattgggc aagaagtcat gcagatatat 780
gtgcccagat tgcggcaatc acaccagcgc gcgcaagatt cagaacaaga aaaagcgcga 840
caaaaagtat ggatccgcaa gcaaagcgca gagccagagg atagctgtgg ctggcgcgct 900
ttatccagac aaaaaagtgc agaccataaa gacctacaaa tacccagcgg atcttaatgg 960
cgaagttcat gacagcggcg tcgcagagaa gattgcgcag gcgattcagg aagatgagat 1020
cggcctgctt ggcccgtcca gcgaatacgc ttgctggatt gcttcacaaa aacagagcga 1080
gccgtattca gttgtagatt tttggtttga cgcggtgtgc gcaggcggag tattcgcgta 1140
ttctggcgcg cgcctgcttt ccacagtcct ccagttgagt ggcgaggaaa gcgttttgcg 1200
cgctgcttta gcatctagcc cgtttgtaga tgacattaat ttggcgcaag cggaaaagtt 1260
cctagccgtt agccggcgca caggccaaga taagctaggc aagcgcattg gagaatgttt 1320
tgcggaaggc cggcttgaag cgcttggcat caaagatcgc atgcgcgaat tcgtgcaagc 1380
gattgatgtg gcccaaaccg cgggccagcg gttcgcggcc aagctaaaga tattcggcat 1440
cagtcagatg cctgaagcca agcaatggaa caatgattcc gggctcactg tatgtatttt 1500
gccggattat tatgtcccgg aagaaaaccg cgcggaccag ctggttgttt tgcttcggcg 1560
cttacgcgag atcgcgtatt gcatgggaat tgaggatgaa gcaggatttg agcatctagg 1620
cattgaccct ggtgctcttt ccaatttttc caatggcaat ccaaagcgag gatttctcgg 1680
ccgcctgctc aataatgaca ttatagcgct ggcaaacaac atgtcagcca tgacgccgta 1740
ttgggaaggc agaaaaggcg agttgattga gcgccttgca tggcttaaac atcgcgctga 1800
aggattgtat ttgaaagagc cacatttcgg caactcctgg gcagaccacc gcagcaggat 1860
tttcagtcgc attgcgggct ggctttccgg atgcgcgggc aagctcaaga ttgccaagga 1920
tcagatttca ggcgtgcgta cggatttgtt tctgctcaag cgccttctgg atgcggtacc 1980
gcaaagcgcg ccgtcgccgg actttattgc ttccatcagc gcgctggatc ggtttttgga 2040
agcggcagaa agcagccagg atccggcaga acaggtacgc gctttgtacg cgtttcatct 2100
gaacgcgcct gcggtccgat ccatcgccaa caaggcggta cagaggtctg attcccagga 2160
gtggcttatc aaggaactgg atgctgtaga tcaccttgaa ttcaacaaag catttccgtt 2220
tttttcggat acaggaaaga aaaagaagaa aggagcgaat agcaacggag cgccttctga 2280
agaagaatac acggaaacag aatccattca acaaccagaa gatgcagagc aggaagtgaa 2340
tggtcaagaa ggaaatggcg cttcaaagaa ccagaaaaag tttcagcgca ttcctcgatt 2400
tttcggggaa gggtcaagga gtgagtatcg aattttaaca gaagcgccgc aatattttga 2460
catgttctgc aataatatgc gcgcgatctt tatgcagcta gagagtcagc cgcgcaaggc 2520
gcctcgtgat ttcaaatgct ttctgcagaa tcgtttgcag aagctttaca agcaaacctt 2580
tctcaatgct cgcagtaata aatgccgcgc gcttctggaa tccgtcctta tttcatgggg 2640
agaattttat acttatggcg cgaatgaaaa gaagtttcgt ctgcgccatg aagcgagcga 2700
gcgcagctcg gatccggact atgtggttca gcaggcattg gaaatcgcgc gccggctttt 2760
cttgttcgga tttgagtggc gcgattgctc tgctggagag cgcgtggatt tggttgaaat 2820
ccacaaaaaa gcaatctcat ttttgcttgc aatcactcag gccgaggttt cagttggttc 2880
ctataactgg cttgggaata gcaccgtgag ccggtatctt tcggttgctg gcacagacac 2940
attgtacggc actcaactgg aggagttttt gaacgccaca gtgctttcac agatgcgtgg 3000
gctggcgatt cggctttcat ctcaggagtt aaaagacgga tttgatgttc agttggagag 3060
ttcgtgccag gacaatctcc agcatctgct ggtgtatcgc gcttcgcgcg acttggctgc 3120
gtgcaaacgc gctacatgcc cggctgaatt ggatccgaaa attcttgttc tgccggttgg 3180
tgcgtttatc gcgagcgtaa tgaaaatgat tgagcgtggc gatgaaccat tagcaggcgc 3240
gtatttgcgt catcggccgc attcattcgg ctggcagata cgggttcgtg gagtggcgga 3300
agtaggcatg gatcagggca cagcgctagc attccagaag ccgactgaat cagagccgtt 3360
taaaataaag ccgttttccg ctcaatacgg cccagtactt tggcttaatt cttcatccta 3420
tagccagagc cagtatctgg atggattttt aagccagcca aagaattggt ctatgcgggt 3480
gctacctcaa gccggatcag tgcgcgtgga acagcgcgtt gctctgatat ggaatttgca 3540
ggcaggcaag atgcggctgg agcgctctgg agcgcgcgcg tttttcatgc cagtgccatt 3600
cagcttcagg ccgtctggtt caggagatga agcagtattg gcgccgaatc ggtacttggg 3660
actttttccg cattccggag gaatagaata cgcggtggtg gatgtattag attccgcggg 3720
tttcaaaatt cttgagcgcg gtacgattgc ggtaaatggc ttttcccaga agcgcggcga 3780
acgccaagag gaggcacaca gagaaaaaca gagacgcgga atttctgata taggccgcaa 3840
gaagccggtg caagctgaag ttgacgcagc caatgaattg caccgcaaat acaccgatgt 3900
tgccactcgt ttagggtgca gaattgtggt tcagtgggcg ccccagccaa agccgggcac 3960
agcgccgacc gcgcaaacag tatacgcgcg cgcagtgcgg accgaagcgc cgcgatctgg 4020
aaatcaagag gatcatgctc gtatgaaatc ctcttgggga tatacctggg gcacctattg 4080
ggagaagcgc aaaccagagg atattttggg catctcaacc caagtatact ggaccggcgg 4140
tataggcgag tcatgtcccg cagtcgcggt tgcgcttttg gggcacatta gggcaacatc 4200
cactcaaact gaatgggaaa aagaggaggt tgtattcggt cgactgaaga agttctttcc 4260
aagctagacg atctttttaa aaactgggct gctggctatc gtatggtcag tagctcttat 4320
ttttttactt gatatatggt attatctcaa taatatgcat ctcttcatag atacaacaga 4380
aaaagaatca tttgatattg ctttgattga tgatgagcgc gttatcaaaa agaagcgaat 4440
caaatcaatc cgccaacatt cggaaaagct tttgaaatca attgacgcgc ttttgttgtc 4500
cgcaaaatca tctctgaaag atatacaagg catcatcgcg gtaaaaggcc ctgggtcatt 4560
tacctcattg cgcattggaa tcgcgacagc caacgcgttg gcattcgctt tgggagtggg 4620
gattgctgga gttgacaaaa cagatgagtg gagtaagatt gtttcttcag cagatttgat 4680
ctttaaaaag caaaaaaaga acttaaatat cgtcataccc gaatacggca gagagccgga 4740
cattacctaa ataggagggt ttagaaatgt tattgctcat tttgattctc acaatagttt 4800
tgagcatcat tcttttgtgc ttttgcgcgt ttattctctg cataatcaca gaagatggca 4860
gggaaatgct tttgatgttt ggaataggca aatgccactt gaattattaa agtggctttt 4920
ttatttgtac aaaaacagtg tcagagcgcc gattcggcgc tctgacactg ttttacaaac 4980
cctcacccca accctctccc gaatacagga gagggaattt ttatactgtg cataacttgt 5040
gcgcaaatag tgcctagata agggttgcgt aaaattacaa gagtggtgta taatatcatc 5100
atagtggtga ggagtgggga taagtggtgg agaacctcat caataataga taccaatgtt 5160
cataggagaa tacaaacata ctattgatac caaaggaaga atggcaatac ctgccaaatt 5220
tcggcaggat ttgaaaaagg gcgcaatcgt aacaaaagga ttggataatt gcctttttgt 5280
atacactcaa gatgaatgga aaaaactcgt ggacaagcta tctaatcttc caatctcaca 5340
gcagaaaagc cgggcatttg ccagattaat gctagcagga gcaatggacg tgcaaattga 5400
ctcccaaggc agaattctta taccagaata tcttcgcaaa ttcgcgtcaa tcaagaaaga 5460
caccataata gcagggcttt acagtcggct tgaaatatgg gattcaaaag aatgggaaaa 5520
atacaaatca gccactgaaa agataagcac aaaaatagct gaagagctca cgctctaggc 5580
caaaaacaaa aataaaattc aaaacaatca cgagatcctt cgactccgcg agtacgcttc 5640
gctcagagcc tgccccgagt attccgaggg gatgacggtt gaaattcgga tggcataata 5700
attttatttt tggagctggt cttttagtag ctccattttt tatcccatga gcaaatcaga 5760
acacatacca gtattattaa acgaagtaat tgaaggtctt gacttgtcct ctaatgatac 5820
agtaatagac gccacagtag gcggagcagg acacgcgcaa gctattttag aaaaaaccgc 5880
gccatcaggc aagcttcttg gaattgattg ggacgcgaaa gcaatcgagc gcgcgcgaga 5940
acatctaaaa agatttagca accgaattat attaaaaaca ggaaattaca cagatataaa 6000
acaacttctc tatgaatcag gaattaataa ggttaatgct atattattgg acttgggctt 6060
atctcttgat caactcaaag attcctctag aggatttagc ttccaatctg aaggaccatt 6120
ggacatgagg ttttctgacc agatggacac aacagctttt gatattgtga acacctggcc 6180
agagaatgat ctggtacaaa tctttcaaga atacggtgaa gagaggcgcg ctgcacgtgc 6240
agcacgcaat atcgccactg cgcgcagtca cgcgccaatc aacaccgcaa aagatctggc 6300
agaattagtt atgcgcgggg ccggaaggcg aggcaaggtt catcccgcta cccgcatatt 6360
ccaggccctg cgcattgcta caaatcatga attagacaat gtcaaacaag cattgcctaa 6420
tatgattgat atgctttctt cagaaggaag attagcagtt atcacattcc attccttaga 6480
agaccgcatt gtgaagcagt atttcaagcc attggctaaa gaggaaaatc cgcgcattaa 6540
gctcatcaat aagaaagtaa taaagccaag ccgagaggag caagtgaaaa atccagcatc 6600
cagaagcgcg aaattgagaa tcgtggaaaa gatttaatca ttccaaaaac aaaaatagca 6660
tcacatgaca acatattcgc acaaaaaaac gccgtatctg tggcacgcat tttcaatatt 6720
gctgatttta gtattagtgg ttacttattt agtacagata aacagccaag cagaaacatc 6780
ttactctatt aaaggattag aagaaaaaaa gcaagaattg aatagtatta tagaagataa 6840
agaacttgaa gcagtttcag cgcgatcttt aaatggaatc gcgcttaagg caaaagaaat 6900
gaatttgcag gatccaaagg atgttacatt cataaaaata ggattaagca cagttgccgt 6960
gagcgaagag ctttctccat aacatgactt catattcatc atcaaaaaag agcaattcag 7020
ctacgcgcgc gaaattcata attggcgcgg tttttatttt tggcgttatt ttgatttacc 7080
gcttagctga tttacagctt atcaatactc aagaaattca ggcatctgcc gcgcgccagc 7140
agtcaacagt gcgcatcctt ccagctgaac gaggcaagat tttttacaag gagagaatag 7200
gtgatgaaga atttccagtc gcgactaata gatcatataa ccaggtattc attattccaa 7260
aagacataca ggatccaatc aaagccgcgg aaaagctatt gcctttggtt gagccatatg 7320
ggcttgatga agaaacatta ttattccgat taagcaagca aaatgacatt tacgagccat 7380
tagcgcataa attaacagat gaagagcttg agccatttat tgggcttgat ttaattgggc 7440
ttgaatcaga agatgaaaaa gctaggtttt acccggacgc tgatttgctc gcgcatataa 7500
ctgggtttgt cggggtttca gaacaaggca aggttggtca atatgggctt gagggatttt 7560
ttgaaaatga gctcaaagga aaggacgggc ttattgaggg caaaacagat atatttggca 7620
ggcttataca aacaggaact ttaaaacgca cccaaggcga gccaggagat gatttattat 7680
taaccataca gcgcactttg caggcatatg tgtgcagaaa attagatgaa aaaattgagc 7740
aaataagagc tgctggcgga tcagtaataa ttgtgaaccc agatactggc gctattctcg 7800
cgatgtgctc ttcaccatca tttgatccga ataattataa tcaagttgaa gatattagcg 7860
tatacatgaa tccagcagtg agctcaagct atgagccagg atcaattttc aagccattta 7920
caatggccgc ggcaattaat gagaaagcag ttactagcga tacaacatat attgatgagg 7980
gagtggaaga gatcggcaaa tacaaaatcc gcaattctga caacaaagcg cacggggaag 8040
ttaatatggt aactgtttta gatgaatcat tgaatactgg cgcgattttt gtccagcgtc 8100
agattggaaa tgagaagttc aaagattatg ttgaaaaatt cggatttggc agaacaacag 8160
atattgaatt aggaaatgag gtttctggaa atatttcttc attgtataag gatggagata 8220
tttacgcggc aactggctcg tttggccaag gaattactgt tacgcctatt cagatggtaa 8280
tggcatatgc ggcgattgct aatggaggaa aattaatgca gccatatctt attgctcagc 8340
gacaaagaca ggataaaact attgtaactg agccagttca aattgatgag ccgatttcag 8400
tgcaggcctc aactattata tctggaatgt tggtgagcgt ggtgcgtgct gggcacgcta 8460
tatctgctgg agtggaagga tattatattg ccggcaaaac tggaaccgcg caggtcgcgg 8520
aaggcggagg gtatggaagc aagaccattc attcatttgc cgggtttggg cctgttgatg 8580
agccagtgtt tgcaatgctt gtgaaattag attatcctca atacggcgca tgggcagcta 8640
atactgcggc tcctttgttt ggcgaattag ccaaatttat actacaatac tatgaaatac 8700
ctcctgatga ggcgatataa ataaaatatg aaaaaaataa taattacaat tttacaaact 8760
ctggccaaaa gagttattta caaatataag cccaaagtgg tggctattac tggctcagtc 8820
ggaaaaaccg cgactaagga ggcagtgttt gctgtattga ataagaaatt gcaagtgcgc 8880
aagaatgaag gcaattttaa cacggaaatc gggttgcctt tgacaatcat tggcttgcaa 8940
aaatcaccag gcaaaaatcc attcaaatgg cttgcagtgt acgcgcgcgc tattggcctt 9000
ttaatcttta ggattgatta tccaaaagtt ttggttcttg aaatgggcgc tgataagcca 9060
ggagatattg ctgaattaat aagtattgct aagccagaca ttggcataat taccgcgatt 9120
agcgctgttc atacagagca gtttaatagt attgctggcg ttgtgcgtga aaaaggaaag 9180
ctctttcgcg ttgttgaaaa ggatggttgg attatcgtga ataacgaccg atctgaagtt 9240
tatgatatcg cgcaaaagtg cgacgcgaaa aaagtatata ttgggcagtg cgctgaatta 9300
tctgataaca cccctttttc agtatgcgcg tccgagattt cagtgagcat gtcagaagct 9360
caagaaaccg gcattgctgg cacttcattt aagcttcata ctgatggaaa ggttattccg 9420
gttttgatga aaggaattat tggggagcat tggacatatc ctgccatgta cgcggcagct 9480
gttgcgcgca ttcttggggt tcatatggtt gatgttactg agggtttgcg cgagattaat 9540
cctcaatcag gaaggatgcg agttttagct ggcattaaaa aaacaatttt aattgatgat 9600
acttataatt cttcgccaaa cgcggctaag agcgcggttg atactttagc gttattgcgt 9660
attggaaggg agaaatattg cgtgtttggg gatatgttgg agcttggttc tatatctgaa 9720
gaagagcatc aaaaattagg catgcttgtc gcgcgcgagg ggattgatta tctgatttgc 9780
gttggcgagc gcgcgcgcga cattgcgcgg ggcgctataa aagcaaagat gccgaaggat 9840
catgtgtttg aatttgataa tactaaagat gctgggctct ttatccaaaa gcgtttggag 9900
caaggggata tggttctgat taaaggttcg caaggcgtgc gcatggagcg cgtgaccaaa 9960
gagattatgg cgcatccgga aaaatcaaaa gaacttcttg tgcggcaaag taaagaatgg 10020
ttgagtaagg cctagtgcgt atttttgata atttcctcca cttcttccgc attttctgca 10080
tccatcaatt tcacgcgcaa ttgctttgcc ccatcccagc cagaaacata ggccttgaaa 10140
tgttttttca ttacagcgaa tgatttgtgt ttgataagtt tttcgtagag tttggcgtgc 10200
tctattaaaa cgcgcaattt gttatctttg ctgggataga aaacggagaa aacggtgtca 10260
agagtcgttt tctgtaaaaa acgactcctg acaccgtttt ctttgaagaa ccacggattg 10320
ccgaaaattg cgcggccgat cataacgcca tcaacaccgg tctcccgggc tttttgatgc 10380
gcatcgtcta aatacgaaac atctccattc ccgataataa gcgtcttggg cgcgattttg 10440
tctcgcatct gaataacgct tttagccaaa tgccatttag caggaacgcg ggacatttct 10500
tttctagtgc gccagtgaat cgtcaaagcc gcaatgtctg tcttcagaag aataggaatc 10560
caggtatcaa tttcattttt cgtatatcca atgcgcgttt taacagaaat tggcaatttt 10620
ggcgcgcctt ttttggctgc agcaatcaaa gcgcgcgcta aatcagggtt tttcatcaaa 10680
ccagccccag cgccttgctt ttcaactttc cggtccgggc atcccatgtt aatatctaat 10740
ccatcaaaac ccaaatcctg aattatgcga gctgtttttt tcatattatc tggatttgct 10800
gtaaatactt gcgcgacaat aggccgctct ttcgcggaaa atttaagatt tttaagaatt 10860
tcatctttgt cgccaagagc aatgccatcc gcggacacga attcagtcca cattacatct 10920
ggcttgccat actttgcgat aatccgccta aaagccgcgt ctgtcacgtc agacatagga 10980
gccaaacaga agaatggttt tttgagttgt tgccaaaaat tattcatgtc atcttgcgct 11040
tatttgtcat cccgaggctt aattatatat ttttagaaaa taggatgtgg taaacggatt 11100
atataagtgt aatagtaatg ccacacaagc cgagaggatc tcgtctttaa gagctcgaga 11160
tgacaataca aggcgagaga atctcgcgac taataactat gcttattatc aaataaatcc 11220
ttccaatcag aattgaattt gtttataagc aacaccttat ttctgtggct tagttttttt 11280
agcttctttt cgcgctcaat agcatacgag atattgtcaa agtgttcata atacaccagt 11340
ttatcagtat tgtattttga agtaaaccct ggtatttttt tatttttatg ttcccaaatt 11400
cttctggata atgaattgca tactccggta taaaataccg tatgtcgtat gtttgttgtt 11460
atatatacat aaaagttata ttgattttgt cttggcatgt ttttgtttca taagatcctc 11520
tcggcctgca aggatttttg ttttggactc catgattcgt ttaccacata ttcgatatta 11580
tgtagtattg taaggtctcg ggatgacagg taaaaggcat gggaatggca tctaaatctc 11640
ctcctttttc tcatgcacat aattcatcca ttcctcaatc acttttataa acgccttgaa 11700
cggagcctct ataataaaat ccaacgcaaa aatgaaaatg ttaatttgcg cgaaccgcgt 11760
ggacatccat ttgccagcat gcagaatcgg aattgtaaaa aacgcccata agccccggat 11820
aaatccttgc tttgggggca ggacaatcat ttcctgattt gactggcgga tgcggtacgc 11880
gaataaggaa acaaacgaga ggaacaagag aaagataaaa atgccgataa acgtgaaatt 11940
cagcgcgatc aaaatataaa tcatcaaacc gaacgaaatg ccaaatagca ttccgtacaa 12000
caaagtaaac accgcgcgca ggaaaaagct acgcttgcta gatttgcgca tctgaataat 12060
ttcgccttga ttttggataa tatgatttat gccacttatc atttgattgg tgttttcttc 12120
atcaggcagt ttagttgaga gtgcgataag cgcgagcaag gcaggcggaa aaattaaatt 12180
aatagccaaa ggcatataat caattttgtg aatcaataaa taatcaacag gaatttccag 12240
aaccacggct aataaaaatt tagtaattac caaataaata atacttcgct taatgcctcg 12300
gtgtaaagaa gcgcgggatt tttcgtactg cttttggcag atggcgcgca ccctttgctc 12360
aaattcatgc ccggtgttca tatcagacca ggcttttcct ggatcctgcg caatcgcgtc 12420
ttgcaaaata gtgaaatatc caacgtattt cctgaacaaa ggagcgagtt tttcttttat 12480
aggcgagttt aaatcttgcg ttattgtaga gtgtatttca ttcaaatgct ctcctatttc 12540
ccgtataaga tcgtgatttg cgcgcgtcca ttctggataa taggtcaata gcaaatgata 12600
tccaatagtg tcattgtcgt ttttatatag aattcggcta gtggctatat aaatctgctt 12660
caaacgttct cgatcattaa tttcatcctc aattctaacg cgctcctgaa gatattcata 12720
catggcattg attgacgcgt gcattacata tggtggcata aggaattcgt caatttccgt 12780
tgctgctatg ccagagagcc aaaatgaaag agaagaagaa tcattgatat cttttatagg 12840
agcgtgacca agcaaggtga aatatttttc aaatataata tcaagttctt ttatttttcg 12900
ttcagggata gtattgttcg gaaggtaccg cgcgtggata agttctgaaa ttagattttc 12960
tgagatatta tttttatggc ctgatgaaat cattctacgc aaaatgcgct caatcgcgtt 13020
tctgcggatt aaatgttctt ctttatattc aaccgcgttg cgcatgcgct cgtatataaa 13080
agttgcctgt ccagcgcggg tggttatgga gatttttggt tcggtcgggt ctgtatcttt 13140
tgagcgcgct tcttccctgg ccgcgcgcac gattcgctgg attgtttctg gtatttgcat 13200
ttctttatac tagctgattt tgcttgtttt ttcaattgtt ttataaaaaa agtgcccgga 13260
atgcaaattg cgcattccgg gcttggggag acagggcagg ggatgccctg tttggggctt 13320
actgccggtc ggtcagatca cgggctacta ccgccgcaat cctcgccacc gcccaggcag 13380
taacgagacg actctttttt tacctgattg acgaccgtac cgtcgagcag gacgttatcg 13440
ccgagcagat tcgctgtatt gatgtccgta gccgcggtag ccgcgatagt cgtggtcgtc 13500
gtcgtggttt ccgtagtggc tgtgccgacc gcgctgtttt cgccgccctc ttttgtcatc 13560
cgaatgacat catcgccatt cagagtcgtt tcctcgctga ccgggttgtt ggtcccgcag 13620
ccgatcattc cgatcagggc gaccagcgcg atacagaaga aaatcatgaa atacttcatc 13680
gggtgctcct ttttatgagg tttttggaaa acgatatcac gctttgtatt attcacctcc 13740
cttccaaagc aagcgcaata tcggtctttt ttactatttt aagaacggac gagcatctta 13800
tactatttta aaaataatgt caagagtgtt aacaaataca aaaaattgac tcatataaaa 13860
acggtgtcag gagtcgttt 13879
<210> 26
<211> 7532
<212> DNA
<213> Artificial Sequence
<220>
<223> synthetic sequence
<220>
<221> misc_feature
<222> (2669)..(2692)
<223> n is a, c, g or t
<400> 26
tacctaatcc tgggcgtctt tggtgtatta tgcacttgcg gttagaatac acccgaacat 60
aattgacaaa gaccataaaa tgtcttatta tccttttaga aaaatcgtgt tcatttataa 120
tatatacata ccccaattcc aaggatttct tgactggcag cgggcttggt atcctgcgaa 180
acacagccag tttgggaaac ctgggtcttt atttttaaag acacaggaat tcccgcgtct 240
tttgccttgg aacaccaacc acctattgcg ccttttttct cattttagca aaagtggctg 300
tctagacctt caggtggaaa ggtgagagta aagacattgg gcctgcacga ttcatgggcc 360
ggtctagacc ttcaggtgga aaggtgagag taaagactct accgcgtcca gcactatctt 420
ggtccgtcta gacatttaga tggaaaggcg agagtaaaga tgcgcgaaag acggctacat 480
tgttccacaa ggcagaaagg attagccgcc tactgcttga acatccgcag tatttaaccc 540
attttcccaa aggaggaaaa tcatgggtac gcagattatc aagcggatag accttgactg 600
gcagtcaagt tttccgcacg ccaagatgct ggtgaatcag gaagcatcat ttaaccacat 660
tgcagagtcc ggactcacgg cgctcataga agcgccgacc ggatccggaa aaaccgcgac 720
tggctatacc tttctttcgg ccatagccct tcgcgcgcgc aagagtccgc aatttaaggg 780
ccggctcgtg tatgttgctc cgaataaagc attagtcggg caggtgcaga acatgcatcc 840
agatgtgaaa gtcgcgcttg gtcgcaacga gcatacatgc tcgtattacg atggaattca 900
tcaagcagac gaagtgccgt gttcgttttt ggttcgctcc ggccggtgtg gccactatgt 960
gaatcaagaa accggcgcaa cacttgaatt tggagctgaa ccatgttcgt attatcagca 1020
aatctatgag gcaaagcgcg gcatcggaat tctggcatgc actgacgcgt tttggctgtt 1080
cacgcatttg tttaatccaa agcagtggcc tcagcccatg ggtttggtat tggacgaggt 1140
tgaccgcttg gctgatattg ttcgcaggtg cttgtcatac gaaatttctg attggcgcat 1200
tgagcgcgcc attaatttgc ttgaaaaagt cggttcagtt caggtgcagt atctctcgtc 1260
ttttttgcgc accttgaatc gggtggtatc aaaaaagccg gccctggagc ccattttgct 1320
ggatgatgag gagattcgcc aactgtttga aaaagtgggg cgcatcagcg cggatgtcat 1380
caaatccgat ttggacgccg cgattgcgag caacaaggtt gaccctatgg ctgagcgcga 1440
aatccttaag cagatagaaa cactttgctt tgacatcagc cggtatgtgc ggagtttggg 1500
atacgcgctt ccgaatcgca gaggcaaggg tgatgaacgc aagcgcgatg ctcctctttc 1560
gtacgcgtac gcgtatcata aatccgagcg cgacgctggg gcgcatgtgc agaacaaagt 1620
tgtggtgtgt tcctattggg tgcggcctct tatccgcaag ctctttggaa agaacacgct 1680
cgcgtattca gcgtttgtcg gggataaaac gattttggat tatgaggctg gagttgattt 1740
tccattaatc tctctgcggt cccaatttcc ggcgagcaat gcgcgattgt atgtgccgag 1800
cgattctcca aatttggcat ataatgagca ggatgtcggt gacatggcta agactttgcg 1860
ccatattgcc atatcaactc ggcggtttgc cgagcgcggc tttcgttctc tcttgctgac 1920
tgtttcaaat agagagcgtg aattgctgta cgtcgcgtgc gcggaactga aagggctgga 1980
tgctataagt tatggcagtg gcgttactgc gcgcgcggcc gcggatagat tcaaagaagg 2040
agaaggggac gctcttattg gcgttttgtc gcattatggc actgggctgg atttgccagg 2100
caagattgct aacattgttt ttctcctgcg gccgaatttt cctccaccaa aagatcctat 2160
ggcacagttt gagattcgcc gggccgagcg catcaaaaag tcgcattggc ccgtgtggta 2220
ctggcgcgcg taccgagagg ctctgaatgc ccagggacgc ccgatacgaa gcgccgatga 2280
caaaggggtc gcgttcttta tctcccagca attcaagaag cgtttattca acattttgcc 2340
ggagcatctt gagagcgcat atcggagccg cctcacatgg gaccagtgcg agaaagacgc 2400
gctgaaactg tttgaggaat aggggtatta tttcgttgtt tttatggccc ggatggtgtt 2460
ttttatacat catccgggtt tttatgttga tttgatgcga taatcatgat ttttgcgtgg 2520
tattgacaaa cattataaaa aacgctatta tccgcgtaca aaacctataa atcgttcatt 2580
tataatatat acatacccca attccaagga tttcttgact ggcagcgggc ttggtatcct 2640
gcgaaacaca gccagtttgg gaaacctgnn nnnnnnnnnn nnnnnnnnnn nngccagttt 2700
gggaaacctg ggtctttatt tttaaagaca caggaattcc cgcgtctttt gccttggaac 2760
accaaccacc tattgcgtct ttttcgctca ttttagcaaa agtggctgtc tagacataca 2820
ggtggaaagg tgagagtaaa gacatggcct gaatagcgtc ctcgtcctcg tctagacata 2880
caggtggaaa ggtgagagta aagaccggag cactcatcct ctcactctat tttgtctaga 2940
catacaggtg gaaaggtgag agtaaagaca aaccgtgcca cactaaaccg atgagtctag 3000
acatacaggt ggaaaggtga gagtaaagac tcaagtaact acctgttctt tcacaagtct 3060
agacatacag gtggaaaggt gagagtaaag actcaagtaa ctacctgttc tttcacaagt 3120
ctagacctgc aggtggtaag gtgagagtaa agactcaagt aactacctgt tctttcacaa 3180
gtctagacct gcaggtggta aggtgagagt aaagactttt atcctcctct ctatgcttct 3240
gagtctagac atttaggtgg aaaggtgaga gtaaagactt gtggagatcc atgaacttcg 3300
gcagtctaga cctgcaggtg gaaaggtgag agtaaagacg tccttcacac gatcttcctc 3360
tgttagtcta ggcctgcagg tggaaaggtg agagtaaaga cgcataagcg taattgaagc 3420
tctctccggt ccagaccttg tcgcgcttgt gttgcgacaa aggcggagtc cgcaataagt 3480
tctttttaca atgttttttc cataaaaccg atacaatcaa gtatcggttt tgcttttttt 3540
atgaaaatat gttatgctat gtgctcaaat aaaaatatca ataaaatagc gtttttttga 3600
taatttatcg ctaaaattat acataatcac gcaacattgc cattctcaca caggagaaaa 3660
gtcatggcag aaagcaagca gatgcaatgc cgcaagtgcg gcgcaagcat gaagtatgaa 3720
gtaattggat tgggcaagaa gtcatgcaga tatatgtgcc cagattgcgg caatcacacc 3780
agcgcgcgca agattcagaa caagaaaaag cgcgacaaaa agtatggatc cgcaagcaaa 3840
gcgcagagcc agaggatagc tgtggctggc gcgctttatc cagacaaaaa agtgcagacc 3900
ataaagacct acaaataccc agcggatctg aatggcgaag ttcatgacag aggcgtcgca 3960
gagaagattg agcaggcgat tcaggaagat gagatcggcc tgcttggccc gtccagcgaa 4020
tacgcttgct ggattgcttc acaaaaacaa agcgagccgt attcagttgt agatttttgg 4080
tttgacgcgg tgtgcgcagg cggagtattc gcgtattctg gcgcgcgcct gctttccaca 4140
gtcctccagt tgagtggcga ggaaagcgtt ttgcgcgctg ctttagcatc tagcccgttt 4200
gtagatgaca ttaatttggc gcaagcggaa aagttcctag ccgttagccg gcgcacaggc 4260
caagataagc taggcaagcg cattggagaa tgtttcgcgg aaggccggct tgaagcgctt 4320
ggcatcaaag atcgcatgcg cgaattcgtg caagcgattg atgtggccca aaccgcgggc 4380
cagcggttcg cggccaagct aaagatattc ggcatcagtc agatgcctga agccaagcaa 4440
tggaacaatg attccgggct cactgtatgt attttgccgg attattatgt cccggaagaa 4500
aaccgcgcgg accagctggt tgttttgctt cggcgcttac gcgagatcgc gtattgcatg 4560
ggaattgagg atgaagcagg atttgagcat ctaggcattg accctggcgc tctttccaat 4620
ttttccaatg gcaatccaaa gcgaggattt ctcggccgcc tgctcaataa tgacattata 4680
gcgctggcaa acaacatgtc agccatgacg ccgtattggg aaggcagaaa aggcgagttg 4740
attgagcgcc ttgcatggct taaacatcgc gctgaaggat tgtatttgaa agagccacat 4800
ttcggcaact cctgggcaga ccaccgcagc aggattttca gtcgcattgc gggctggctt 4860
tccggatgcg cgggcaagct caagattgcc aaggatcaga tttcaggcgt gcgtacggat 4920
ttgtttctgc tcaagcgcct tctggatgcg gtaccgcaaa gcgcgccgtc gccggacttt 4980
attgcttcca tcagcgcgct ggatcggttt ttggaagcgg cagaaagcag ccaggatccg 5040
gcagaacagg tacgcgcttt gtacgcgttt catctgaacg cgcctgcggt ccgatccatc 5100
gccaacaagg cggtacagag gtctgattcc caggagtggc ttatcaagga actggatgct 5160
gtagatcacc ttgaattcaa caaagcattt ccgttttttt cggatacagg aaagaaaaag 5220
aagaaaggag cgaatagcaa cggagcgcct tctgaagaag aatacacgga aacagaatcc 5280
attcaacaac cagaagatgc agagcaggaa gtgaatggtc aagaaggaaa tggcgcttca 5340
aagaaccaga aaaagtttca gcgcattcct cgatttttcg gggaagggtc aaggagtgag 5400
tatcgaattt taacagaagc gccgcaatat tttgacatgt tctgcaataa tatgcgcgcg 5460
atctttatgc agctagagag tcagccgcgc aaggcgcctc gtgatttcaa atgctttctg 5520
cagaatcgtt tgcagaagct ttacaagcaa acctttctca atgctcgcag taataaatgc 5580
cgcgcgcttc tggaatccgt ccttatttca tggggagaat tttatactta tggcgcgaat 5640
gaaaagaagt ttcgtctgcg ccatgaagcg agcgagcgca gctcggatcc ggactatgtg 5700
gttcagcagg cattggaaat cgcgcgccgg cttttcttgt tcggatttga gtggcgcgat 5760
tgctctgctg gagagcgcgt ggatttggtt gaaatccaca aaaaagcaat ctcatttttg 5820
cttgcaatca ctcaggccga ggtttcagtt ggttcctata actggcttgg gaatagcacc 5880
gtgagccggt atctttcggt tgctggcaca gacacattgt acggcactca actggaggag 5940
tttttgaacg ccacagtgct ttcacagatg cgtgggctgg cgattcggct ttcatctcag 6000
gagttaaaag acggatttga tgttcagttg gagagttcgt gccaggacaa tctccagcat 6060
ctgctggtgt atcgcgcttc gcgcgacttg gctgcgtgca aacgcgctac atgcccggct 6120
gaattggatc cgaaaattct tgttctgccg gctggtgcgt ttatcgcgag cgtaatgaaa 6180
atgattgagc gtggcgatga accattagca ggcgcgtatt tgcgtcatcg gccgcattca 6240
ttcggctggc agatacgggt tcgtggagtg gcggaagtag gcatggatca gggcacagcg 6300
ctagcattcc agaagccgac tgaatcagag ccgtttaaaa taaagccgtt ttccgctcaa 6360
tacggcccag tactttggct taattcttca tcctatagcc agagccagta tctggatgga 6420
tttttaagcc agccaaagaa ttggtctatg cgggtgctac ctcaagccgg atcagtgcgc 6480
gtggaacagc gcgttgctct gatatggaat ttgcaggcag gcaagatgcg gctggagcgc 6540
tctggagcgc gcgcgttttt catgccagtg ccattcagct tcaggccgtc tggttcagga 6600
gatgaagcag tattggcgcc gaatcggtac ttgggacttt ttccgcattc cggaggaata 6660
gaatacgcgg tggtggatgt attagattcc gcgggtttca aaattcttga gcgcggtacg 6720
attgcggtaa atggcttttc ccagaagcgc ggcgaacgcc aagaggaggc acacagagaa 6780
aaacagagac gcggaatttc tgatataggc cgcaagaagc cggtgcaagc tgaagttgac 6840
gcagccaatg aattgcaccg caaatacacc gatgttgcca ctcgtttagg gtgcagaatt 6900
gtggttcagt gggcgcccca gccaaagccg ggcacagcgc cgaccgcgca aacagtatac 6960
gcgcgcgcag tgcggaccga agcgccgcga tctggaaatc aagaggatca tgctcgtatg 7020
aaatcctctt ggggatatac ctggagcacc tattgggaga agcgcaaacc agaggatatt 7080
ttgggcatct caacccaagt atactggacc ggcggtatag gcgagtcatg tcccgcagtc 7140
gcggttgcgc ttttggggca cattagggca acatccactc aaactgaatg ggaaaaagag 7200
gaggttgtat tcggtcgact gaagaagttc tttccaagct agacgatctt tttaaaaact 7260
gggctgctgg ctatcgtatg gtcagtagct cttatttttt tacttgatat atggtattat 7320
ctcaataata tgcatctctt catagataca acagaaaaag aatcatttga tattgctttg 7380
attgatgatg agcgcgttat caaaaagaag cgaatcaaat caatccgcca acattcggaa 7440
aagcttttga aatcaattga cgcgcttttg ttgtccgcaa aatcatctct gaaagatata 7500
caaggcatca tcgcggtaaa aggccctggg tc 7532
<210> 27
<211> 16262
<212> DNA
<213> Artificial sequence (Artificial sequence)
<220>
<223> synthetic sequence
<400> 27
cggaaaggcg gcccagaaac gggttgacca aattttgtgt tcagtggtga tgatggcgat 60
gycgatgtcg ctgcttacgc gggcgttgtg caggccgatg gagtcggaaa tcagaatggc 120
ctggacgtgg gggagggtgg ccagccagcg caggtaatga tgccgtttgc gcagtttggt 180
ttcggtgagg ccgtagcggg ccaggcgcag ggggacgagg tgggagcggt ttttgaggtg 240
ataaaagcca tcggtgtgag tgatgtgtgg gtgagtggcg agggcggcag tgagttcggc 300
gggcgtggtg gtgtgttgcc acagccagcg ctggagttca ccggcggtca gcggaaattc 360
catgaggtca aagtagctca tggtggcggt gatggcgtgc tcgagttggg ggcgacaagc 420
gggtttcatg ctcctattat agcagatttt cagagttgga tttttgctgt tttttcttgg 480
ccggagtacc cgttttttta ttgtttgaaa aatcagggct taaaaatttt aggtgagagt 540
ctttttgcta tatccaagaa gaaattttgc catatttttt ggtcaatttt tattttcatt 600
cttggtaggt cttttaattc ggtcactttt aatagttggc ttcccatttg tactgggtcg 660
atgtgccagt caaattttat cttggccttt tttatcagat catcgaatgt ccattctttt 720
ttttggagaa tacaatacag gtctataaaa tcccgtgaac gtggtttttg atacatggta 780
aatactttgt tgacggcaat gtctaacagg ctgtcaattt tcagaccgtt tgttttcaag 840
cccttttgta taatcggaaa ggggtagtag gtaaattccg ttttgataac atccttgtcg 900
atatggataa aaaacagatt ccggttgaag ctctgctgaa aatctatctt tttaaatttt 960
acctttttct gtattttttt gagtatagta aaaatatccg tagaatcgaa ttctttttcc 1020
gaaaagaaat ccaaatcttc ggataaccga tgatgcagat aaaattctgc gagagcggtg 1080
ccaccggaaa gataaaattt ttcccggatg agtttttctt gtgatagctg ctgaaggaga 1140
gcgctttggt tggttgtcag gattgttggc cccataggag aaaagataaa aattttttct 1200
tacctgggtc gaggtccagc ctatcccagt actttttcag ttgacttcgc ttgatttttt 1260
ttccacccaa gccaaaattt accatctgtt cgagtttcca gatagtgtat ttttctttat 1320
ttttttttag ctctgtgagg tcaatattcc aattgtacat ggctgtattt tagcatatag 1380
cagcttaaat ttcaatttta ttttagccaa aatagtagaa tggtggcggt gttagatgaa 1440
tatttcgtag ttgtcttttg atatcacctg gaattttgcg tcttggtagg catcgctgaa 1500
tgcctttggc gctcgggctg attttttccc ccatttgaat tcaaatgccc tgagttttcc 1560
atttttttct tccaagtaat caatttctgc cttttggtgc gtgcgccaga aatatctgtt 1620
taccgaattt tcagtatttt ccaatttttt cattctttct acgaggagaa aattttccca 1680
gagccccccg acatcttcgc gtaaggagag aggattgaga ttattaatga gtgcgttgcg 1740
aatgccgaga tcatagaagt aaattttccg cagttttttg agttcgttgc gaatgtttcg 1800
actgtatggt ttcaaagtaa aaacaatgaa agccttctca agaatgccta tataattagc 1860
cacggttttt tgatcgatat tgagcaggtt ggacagttcc cggtaggaaa cttctttacc 1920
tatctggagt gccagcgcct gcaggagttt gtcgagtact tcaggattgc ggatgttctg 1980
aaatgccaga atgtctttat ataaataact tctggtgata ttgcgcagca attcctcagc 2040
ttccgatgat ttgaggacaa tttccggata cataccgaaa atcattcttt gttccagtgt 2100
tcttttttct tcctgtatat tctgtatctg cctgagttct tccagtgaaa agggatagag 2160
gataaattca tattttcttc ctgtgagcgg ctcaacgatc tgattagcga gatcaaaaga 2220
agatgatccg gtggcgataa tctgcatttc cggaaagttg tcaacaagta gtttcagtgt 2280
cagtccgata ttttttactc tttgcgcctc gtcaaggaag ataatgtttg catctcccag 2340
ataggccttg agttcggtcg aagttttgtc ggtaagagcg gtgcgaacgt ccggttcatc 2400
acagttgagg tagcgagagg tgtggctcgc aagcttttcc tcaagggctt tgaggatagt 2460
tgtcttacct acctgtctgg cgccatagat aataataacc ttttttttga aaaggtgttt 2520
ttcaataata ggctgaaggt ctctgctaat ccgcatagta tatatgattt agatgattat 2580
actcctctca ttatatatta aaatgcggat ttagtcaatg cattctacta taaatgcttt 2640
atattagcca aaatgtcaga aaattgatat ttttgaccat ttttactata tttcggacac 2700
cttattttgg ttctcgattc atgtatcact gcccgctgta ggttgcgggc caatttttaa 2760
aggagaattt tatgatgcct gttgtgctct ttataaaatc gttttttttg attttccata 2820
gttctctctt gtagggactt gaaataaaat gttttttata ctactatagg cctagttcct 2880
taacaatatt ttgcttactt taaagcgaaa ataggtaagg cacacctata ccataaggat 2940
ttaaagactc tttggcgaca gctttccacc gaccctgagt agttaaagac tgacgtatca 3000
tgtcataaca ccaacatttc tagatataaa gacgcgacag ctttcaggcg ataccgacgt 3060
ttctagacat aaagactttg gataaaccat aatgcaccga cgttcctcga tataaagacc 3120
cgttgtggtc ccaaaattca ccgacatttt aagaggtaaa gacaagtgca cctgagtcgc 3180
tgcaccgaca ttcccgatat aaagactgtc gctcaacccc aaaacaccga cattcccgat 3240
ataaagactc gccctagatc ttcttgcacc gactgtataa ggaataaaga cgtccgacca 3300
cgtgcaccac accgactcgt gtgaacctaa agactcaccg ccgcactacc ctcaccgact 3360
atatcaaacc taaagattgg taacttgttt gtctgacacc gactgtatca gagataaaga 3420
ctgttttcca tgcgttgcgc accgacgttc ctagatataa agactatcat tatcggggaa 3480
accgccgact gtactagata taaagacccg tcgctttgtt tgaacgccga cattcttaag 3540
aaataaagac gtggtaagag tagtgtttca ccgacattcc tttatgtaaa gacaatgaat 3600
agtctttttc acaccgactg tgaatgtatg aaatctaaag acctgaaagt gcaatgcaaa 3660
tgctgacagt gttagtctaa agacaaagta ggaatcagga tccgccgact aaataaaact 3720
taaagacaag ccagatatcc aggccacacc gacgtcccta gatgtaaaga ctagtgactc 3780
catgctatgc accgacattc cgaggcctaa agacagagag gctaacattt gtgcaccgac 3840
ccttcaagag gtaaagacat agggaacacg ctgaatcacc gacgttccta ggtatacaga 3900
cgaaatgcaa tgaaaaacgt caccgacatt tcaagacgta aagacccaag aatctttgcc 3960
cgtccccgac attccaagac gtaaagacta gccaaaacct ccagaccccc gacattccaa 4020
gacataaaga caagcgctcc aacatgtgtc accgacatta ttccgcccca gcatcgatca 4080
ttttgacttg gaaagagaca ttcttctttc caagttttta ttttgagcaa aatttgactt 4140
tttattggtt atcctttatt actatgggtg cttagtgcat cgaaaggtgg gctaagcaca 4200
acaaaagtgt tctttttatc ttaaacttga ggttttagac ctcatcaacc caaaaagggt 4260
gtaacatcat gaaacatcag aaacatcaag aaaatgcagt ctctgacgaa acatctaacc 4320
cttccgccga gccatggatt tttgattttg agaaatggtg gccctacgat acgtatccca 4380
ccatgcatca taatcaatcc gaggctttca aattaattcg aagtgtccta cggaaagaag 4440
gtgtgggtaa aaccatcctt gaacttccta ccggatctgg gaagacggtc attgggatcg 4500
tgtatctcct tactttgcat cacaagatgc aggaaggcga gattcctaca gctccgctgt 4560
tttacatcgt gcctaataag gcgctggtaa agcaggtgtg tgaaatgttc ccagatatca 4620
cctttggtgt gtatggccgg aatgaatatg attgtctgta ttaccagccg aaagaaacgt 4680
ttacagccga tcagattccc tgtttggttc taccatgcaa gcatcgggtg aaccaggatg 4740
atggaactac gcaagaatct ggtgctgagc catgtccgta ttatttggtg aagtataagg 4800
cgaagcagct gactcagaag gctcgaatca ttgtctgtac cgcttctttt tatcttttca 4860
ctcaactcat tcatgagtgg ccgctgcctg gaggactggt tattgacgaa acggatgagc 4920
tggctgaaat ttttcggcgg gcgctctcca cgaaagtcag tgattggcac ctgagtcagt 4980
gcgtcacgat gatgcggcaa agtgggatgg atggtgaagc ggatctcatg cagaaatttt 5040
atgacgccgt ggttagaatt gtcggagtca agtctcctca aaagcctacg cttttgaaga 5100
aacacgaaat cagtgagctc ctcgaggtag ttcctcagtt cgacaccaga aaactgaaaa 5160
ggcgtataaa tgccctcatc aaagacggaa agattgatgc agagaattcg cgtgaagtgc 5220
tgaatcagct gactgtggtt gccaatgatc tgaaacgata cgccgtttcg cttgcctatg 5280
ccttgcctga gggtgaccgt agggccctta attacctcta tgcatattat gaaggaccgg 5340
atgatcttcc agggaagaag aaagttcggt gtgtcattaa tatctgcaat tggtacatgc 5400
cgcctctcat taggcggatt ctctcgcctc ggaccctggc atatacagcc actatcggtg 5460
agtatagtga ctttgcctac gataccggaa ttgaaggttc gttttatacc atgaactctg 5520
attttccggt ggagaactcc cgtatcttca tgcccgatga cgttgccaac ttggctgtga 5580
aatcggtcaa accaggcgac aaagatcgga tgatgcgtct gattgctaag tcagctcgtg 5640
aatttgcgga tcaaggtcat cggagtctgg tggtggtcat ttccaatgag gagcgttcaa 5700
ggtttctgga aattgttgaa gaatacagtc tcaaaatgct cacctatgga aatggtgttt 5760
cggcgcgcga ggctattgca aggtttcagg ctggtgaagg ggaggtgttt gtgggaacgg 5820
cagccaactg ttctcatggc ctgaacttcg ataagcagac tgctccggtg attttttttc 5880
tgcggcctgg ttatccggtg cagggagatc cactcgcaga tttcgaagaa gagcggatgg 5940
gaaataagag gtggggtgtt tggacctggc gggttatgcg gcagttactt caggtgcgtg 6000
gccggaatat ccgcagtccg gaggatttgg gagttatttt cctgatgtca ggccagttta 6060
aacgtttcgc agggaaggcg attccggggt ggcttatcaa agcctatatc tccggcaaga 6120
aattcagggc ctgtgtgtca gaggccaaaa agctcctgaa aaagtcttaa ttaagccaaa 6180
aaaattgttt ttttgtctct gtccttgaca atataattga actttgctaa gttagggtcc 6240
cctgttagag gaaacagcag caaagggaag tctgagcgcg agaggcctta gtctttagag 6300
ttcttaataa gaacttttct gggcccaaag tgcgctttag tctttattcc ctgagctctg 6360
tctactttga tggggccttt ttttattcaa atttttttat tttcgctacg tcttgacaaa 6420
aatatagatg tatactatat ttcgcccgag gtaataaaga aaatagcggt aaagctataa 6480
gattttatta tttcatttat aagaactttg aaaaccgaca ttatcaaaaa ccatgcaaag 6540
ccctttagat gagggcagga ggttgaaaaa atgaagagaa ttctgaacag tctgaaagtt 6600
gctgccttga gacttctgtt tcgaggcaaa ggttctgaat tagtgaagac agtcaaatat 6660
ccattggttt ccccggttca aggcgcggtt gaagaacttg ctgaagcaat tcggcacgac 6720
aacctgcacc tttttgggca gaaggaaata gtggatctta tggagaaaga cgaaggaacc 6780
caggtgtatt cggttgtgga tttttggttg gataccctgc gtttagggat gtttttctca 6840
ccatcagcga atgcgttgaa aatcacgctg ggaaaattca attctgatca ggtttcacct 6900
tttcgtaagg ttttggagca gtcacctttt tttcttgcgg gtcgcttgaa ggttgaacct 6960
gcggaaagga tactttctgt tgaaatcaga aagattggta aaagagaaaa cagagttgag 7020
aactatgccg ccgatgtgga gacatgcttc attggtcagc tttcttcaga tgagaaacag 7080
agtatccaga agctggcaaa tgatatctgg gatagcaagg atcatgagga acagagaatg 7140
ttgaaggcgg atttttttgc tatacctctt ataaaagacc ccaaagctgt cacagaagaa 7200
gatcctgaaa atgaaacggc gggaaaacag aaaccgcttg aattatgtgt ttgtcttgtt 7260
cctgagttgt atacccgagg tttcggctcc attgctgatt ttctggttca gcgacttacc 7320
ttgctgcgtg acaaaatgag taccgacacg gcggaagatt gcctcgagta tgttggcatt 7380
gaggaagaaa aaggcaatgg aatgaattcc ttgctcggca cttttttgaa gaacctgcag 7440
ggtgatggtt ttgaacagat ttttcagttt atgcttgggt cttatgttgg ctggcagggg 7500
aaggaagatg tactgcgcga acgattggat ttgctggccg aaaaagtcaa aagattacca 7560
aagccaaaat ttgccggaga atggagtggt catcgtatgt ttctccatgg tcagctgaaa 7620
agctggtcgt cgaatttctt ccgtcttttt aatgagacgc gggaacttct ggaaagtatc 7680
aagagtgata ttcaacatgc caccatgctc attagctatg tggaagagaa aggaggctat 7740
catccacagc tgttgagtca gtatcggaag ttaatggaac aattaccggc gttgcggact 7800
aaggttttgg atcctgagat tgagatgacg catatgtccg aggctgttcg aagttacatt 7860
atgatacaca agtctgtagc gggatttctg ccggatttac tcgagtcttt ggatcgagat 7920
aaggataggg aatttttgct ttccatcttt cctcgtattc caaagataga taagaagacg 7980
aaagagatcg ttgcatggga gctaccgggc gagccagagg aaggctattt gttcacagca 8040
aacaaccttt tccggaattt tcttgagaat ccgaaacatg tgccacgatt tatggcagag 8100
aggattcccg aggattggac gcgtttgcgc tcggcccctg tgtggtttga tgggatggtg 8160
aagcaatggc agaaggtggt gaatcagttg gttgaatctc caggcgccct ttatcagttc 8220
aatgaaagtt ttttgcgtca aagactgcaa gcaatgctta cggtctataa gcgggatctc 8280
cagactgaga agtttctgaa gctgctggct gatgtctgtc gtccactcgt tgattttttc 8340
ggacttggag gaaatgatat tatcttcaag tcatgtcagg atccaagaaa gcaatggcag 8400
actgttattc cactcagtgt cccagcggat gtttatacag catgtgaagg cttggctatt 8460
cgtctccgcg aaactcttgg attcgaatgg aaaaatctga aaggacacga gcgggaagat 8520
tttttacggc tgcatcagtt gctgggaaat ctgctgttct ggatcaggga tgcgaaactt 8580
gtcgtgaagc tggaagactg gatgaacaat ccttgtgttc aggagtatgt ggaagcacga 8640
aaagccattg atcttccctt ggagattttc ggatttgagg tgccgatttt tctcaatggc 8700
tatctctttt cggaactgcg ccagctggaa ttgttgctga ggcgtaagtc ggtgatgacg 8760
tcttacagcg tcaaaacgac aggctcgcca aataggctct tccagttggt ttacctacct 8820
ctaaaccctt cagatccgga aaagaaaaat tccaacaact ttcaggagcg cctcgataca 8880
cctaccggtt tgtcgcgtcg ttttctggat cttacgctgg atgcatttgc tggcaaactc 8940
ttgacggatc cggtaactca ggaactgaag acgatggccg gtttttacga tcatctcttt 9000
ggcttcaagt tgccgtgtaa actggcggcg atgagtaacc atccaggatc ctcttccaaa 9060
atggtggttc tggcaaaacc aaagaagggt gttgctagta acatcggctt tgaacctatt 9120
cccgatcctg ctcatcctgt gttccgggtg agaagttcct ggccggagtt gaagtacctg 9180
gaggggttgt tgtatcttcc cgaagataca ccactgacca ttgaactggc ggaaacgtcg 9240
gtcagttgtc agtctgtgag ttcagtcgct ttcgatttga agaatctgac gactatcttg 9300
ggtcgtgttg gtgaattcag ggtgacggca gatcaacctt tcaagctgac gcccattatt 9360
cctgagaaag aggaatcctt catcgggaag acctacctcg gtcttgatgc tggagagcga 9420
tctggcgttg gtttcgcgat tgtgacggtt gacggcgatg ggtatgaggt gcagaggttg 9480
ggtgtgcatg aagatactca gcttatggcg cttcagcaag tcgccagcaa gtctcttaag 9540
gagccggttt tccagccact ccgtaagggc acatttcgtc agcaggagcg cattcgcaaa 9600
agcctccgcg gttgctactg gaatttctat catgcattga tgatcaagta ccgagctaaa 9660
gttgtgcatg aggaatcggt gggttcatcc ggtctggtgg ggcagtggct gcgtgcattt 9720
cagaaggatc tcaaaaaggc tgatgttctg cccaagaagg gtggaaaaaa tggtgtagac 9780
aaaaaaaaga gagaaagcag cgctcaggat accttatggg gaggagcttt ctcgaagaag 9840
gaagagcagc agatagcctt tgaggttcag gcagctggat caagccagtt ttgtctgaag 9900
tgtggttggt ggtttcagtt ggggatgcgg gaagtaaatc gtgtgcagga gagtggcgtg 9960
gtgctggact ggaaccggtc cattgtaacc ttcctcatcg aatcctcagg agaaaaggta 10020
tatggtttca gtcctcagca actggaaaaa ggctttcgtc ctgacatcga aacgttcaaa 10080
aaaatggtaa gggattttat gagacccccc atgtttgatc gcaaaggtcg gccggccgcg 10140
gcgtatgaaa gattcgtact gggacgtcgt caccgtcgtt atcgctttga taaagttttt 10200
gaagagagat ttggtcgcag tgctcttttc atctgcccgc gggtcgggtg tgggaatttc 10260
gatcactcca gtgagcagtc agccgttgtc cttgccctta ttggttacat tgctgataag 10320
gaagggatga gtggtaagaa gcttgtttat gtgaggctgg ctgaacttat ggctgagtgg 10380
aagctgaaga aactggagag atcaagggtg gaagaacaga gctcggcaca ataatttgag 10440
aagtaaaata gttttttaga ttcagtttcg caaaggaggt gatttggttc tttgaagaga 10500
ggtgtcatta tatgtggcat ctcttttcat tttgagagat tttttctaaa aataaaactt 10560
ggaaagaaat agttctttcc aagtcaaaat gatcgatttt aaggaatgtc ggtgaagtga 10620
tttatgaaca aatgtcttta tatttcatat ggtcggtgta agtacgaatg cgagttgcct 10680
ttaggttttt accgtcggta atccacatta ttcacttggt ctttaggctt catagcgtcg 10740
gtattctttt tatatatgca agtctttaca ttgaggaacg tcgatgttca aaccagatgt 10800
gtttgtcttt atacctcgga atgtcggtga agtgatttat gaacaaagtc tttaattttt 10860
acacagtcgg tggctttccg agcaagagta gtctttatat ttagaacagt cggcgtcggc 10920
agtgcttttt ataagtcttt gtatctcatg tagtcggtgc attgtctttg caactgggtc 10980
tttatctctt aatatggtcg gtggaaactc ttgtgggaat ctttatctca agaaaagtcg 11040
gtgtcgcctg aaagctgtcg cgtctttagg tctcatgcag tcggtgtcgg tcaaaagctc 11100
gcttgtcttt atattttata cagtcggtgt aaaggtgagc tggctgagtc tttatccctc 11160
ttaaagtcgg tgcaagaagt atggcggtat gtctttactt gtcgttaggt cggtgttcat 11220
ccgtctctag ggtgtcttta tctttatgaa tgtcggtgta ggtccaaacg atgtatgtct 11280
tacatcagga attcaggaat gtcggggtta ctaatatgca atggagtctt tatgtctggg 11340
aacgtcgtta ttttactctt gcgagattgt ctttactcag gaagtcggag ctcgattgat 11400
tgacattgcg tcttttagat accatactgt cggtgtggac ggctcgcctg atggtcttta 11460
ccttttatac ggtcggtggg ttgctgggcg cttcagtctt tacgtttcat gcggtcggtg 11520
tcattctcat gccctacgtc tttatctcta agaatgtcgg tggagcgact taggtgcact 11580
ggtctttatg tttagaaatg tcggtgtgat tacaggtatc aaatgtcttt agctctggga 11640
aggtcggtat cgatccaaag atccggggtt ttaaattgtt gtcaatgaac taggcacata 11700
gtaatataaa aaacatttta ttacaagccc ccctcctttt tgtttggcgc ccaacaaaaa 11760
aaatcgccca aaagagcagc ttttcgggcg cggcgcctcc atatatagcg caccaaacta 11820
tttcaacgcc ctggccaaat acctccccgt gtgactcttt tttaccttgg ccacatcacg 11880
cggcgtacct tcggccacca gcaaaccacc gtgattgcca ccttccggac ccagatcaat 11940
cacccagtcc gaagatttaa taacttccaa attgtgttca ataatcaata gactgttgcc 12000
cttatccacc agcttgctca gcacgtgcag caaccgtttc acatcatcaa aatgcaaacc 12060
cgtcgtcggc tcatccaaaa tatacaacgt ctttcccgtc gagcgccgtg acaattccgt 12120
cgccagcttc acacgctgcg cttcaccacc actcagcgtc gtcgcattct gtcccagctg 12180
aatatagccc aaacccactt caaacagcgt cttcaacttt tcatgaataa tcggaatatt 12240
gctgaaaaat ttcgtcgcat cttcgaccgt catgttcagt acctcggaaa tatttttccc 12300
cttgtaatga atttccaaag cctgctcgtt gtagcggcgg cctttgcatt cgtcgcaatc 12360
cacatacacg tccggcagga agtgcatctc aattttggtc acaccatcgc cctgacaggc 12420
ttcgcagcgg ccacccttca cattgaaact gaaacgcccg gccttgtagc cgcgcatctt 12480
cgcttccggc acctgcgtga acagatcgcg aatgtaggta aacacgccgg tgtaggtggc 12540
ggcgttggag cggggagtac ggccgatcgg cgactgatca atatcaatca ccttatcgag 12600
atattccagt ccgcgcagct ctttgtgttt gccgggaata tccttggcat tatgaaaatg 12660
ttgtgacaac gcgcgggcga gaatatcggt catcaacgtc gatttgccgc tgccggaaac 12720
gccggtgatg cacactaatt ttcccagcgg aatgcgcacg ttgatatttt gtaggttgtg 12780
ggcggtggca ccgcggattt caatatattt gccgttgccg cggcggtact tgtgcggcgc 12840
ttcaatgaat tttttgccgc tcagatattg accggtcaat gacgctttat ttttaataat 12900
ttcctgaggt gtgccaaggg caacaatttc gccaccgtgt ttgccggcac caggccccac 12960
gtcaataaca taatcagcgg agcgaatcgt ttcttcatcg tgctcgacga cgatcacggt 13020
attgcctaat tcgcgcagcg ctttgagtgt gtctatgagt ttggagttgt cgcgttggtg 13080
caagccaatg ctgggttcat cgaggatata gataacgccg accaaagatg aaccgatttg 13140
cgtggccaga cgaatgcgtt gcgcttcacc gccgcttaaa gtcgaagcag cgcgatctaa 13200
agtcaaataa tccagaccta cattatgtaa aaaagtcagg cgttcgcgga tttctttcat 13260
gatctgatgc gaaattttgg cttcgcgtac ggacatgacg tagacattat tttttgccat 13320
gctgttgccg ccggagttgg caccaccttt gccggccgcg tttttggcgc cagcaccctt 13380
cgcgccagca cccgcaccac caaccacaaa cccctcaaaa aatgcctgcg cttcttcaat 13440
gctcaacccc gtcgtgtcag aaatggattt gccgcgaatc gttacggcca gtgcaatttt 13500
gttcaaccgt ttcccgtgac acgtcggaca atcaaagacg cgcatgtagc gttcgatttc 13560
cgagcggata tattccgact cggtttcttt gtagcgccgt tccaaattcg gtatcacgcc 13620
ttcatacgtc gtcacaaatt cacggatttt ggatgtcgag ttcatgccgc tgttgacgtc 13680
gaaagattct tcgccggtgc cgtaaaacac cagcttcagt tgcgcggcgg tcattttttt 13740
caccggttcg tccaaagaaa aaccgtattt ggccgccact gtcgccagaa tccgcagcat 13800
ccagccctga ttcgaagacg tgcgtgacca gggtctgatg gcaccctgat tgatgctcaa 13860
atttttattg ggaatgatca gttcagcgtc gacttcgagc ttggtgccca atccagtgca 13920
ttccacgcag gcgccgtgcg ggctgttaaa cgaaaacagg cgcggttcaa tttccggcag 13980
gttgatgccg cagcgcggac aggcgaagtg ctgactgaac agctgatctt tttcgctggt 14040
actgtcgtgc acaatcacca taccatcacc caaatccaag gcggtttcca gagattcgtg 14100
caagcggctg cggtttttgc gcagctcttt gtcaacaacc aagcgatcta caacaacatc 14160
aatggtatgt ttctttttct tatcgaggac gagatcgagt gcttcttcga tgctcatcat 14220
attcccgttg acgcgcacgc gcacaaaacc ggctttgcgc gtttcttcaa agacgtgttt 14280
gtgttcacct tttttgtcgc ggataatttg cgcgatgagc ataaatttcg tatccgcttt 14340
caggcgcaga atttgttcga ggatttgttc ggtggtttgt ttgctgactt tatcaccgca 14400
gttggggcag tgtggttggc cgatgcgggc gtagagcaaa cgcaggtaat cgtaaatttc 14460
ggtgacggtg ccgacggtgg atcggggatt gtgggatgtg gttttttgat cgatggagat 14520
ggcgggcgag aggccttcaa tgctgtcgac gtcaggcttg tccatcaggc cgaggaattg 14580
gcgggcgtag gaagacaggc tttcgacgta gcggcgctga ccttcggcat agatcgtatc 14640
aaaagccagg gaagattttc ccgagccgga caggccggtg atgacgacga gctggtcacg 14700
ggggatgtcc aggctgatat ttttcaggtt gtggacgcgg gcgcctttga tgatgatcga 14760
attttcacct gccataattg atcgttatga gacaacaaaa atttttagag caaagcccgt 14820
aacctgcttt cgaggcagaa ttttcaaaat actgccgagg cgaaggaaaa aattttgagg 14880
aatactgtta gtatttcgag aaatttttta caagccgcag gcggattttg aaaattatga 14940
tccggaatga ggttgcgggt tttactctag acgaacttcc gccagtctac tacttttttt 15000
tgcgtaagtc aaccgtttgt gggcggggct gattcggttt tgtggtggtt tcgggagcag 15060
catagatgta gcggaaaatt caaaaaactg gtataatatt gctacaacct atacaaacaa 15120
aagcgtaaaa atcatgcatt tttcacgttt cggattttat ttccgtaacc gacgcatggt 15180
agaacgtttc ttcgttctat tttgtgctat tttttctgct gtcctggttt tgtcgcttgt 15240
tgccctggtg ctggtggctg acaaaattaa tatcaatccc attgtgcaca tcttgtttcg 15300
tttttttcag cgaccctttg tcagtgcgct gattctgtct tttttcgtca caacccttct 15360
ttacgccgtt tttgttctgg tgcatccagt gcagcatcat accgtgtatt ggcagcgtca 15420
ttcgcagcga tatcatattc gcaagaaatc ccatattcac cgcagattgc gtcacattcc 15480
cgcgcagaca tcacataagc tgttggcgct cagttcactt tttgttgtgg ttaaaattgt 15540
ttttgtcagt tttgcctccg gttttttacc gcatgatgtt ttggcacaga ccgttgatcc 15600
gagcggacag aaaagtcagt cggtgttggt ggcggcgttt tatgtccagg tgcttgattc 15660
cgatgatttg tatatttgga tttttatgtt gggccttttg ccgctggcgg ttctgatttt 15720
tttcatcgtt tttcgttcgc atatttttcc gcataagaat tttcattatg agagcgcaca 15780
tctggatacg aatattgtca cttttgcggc ccggaagaag gcggagcagc ggcgcaaaaa 15840
gccatcacct ccggccggta ttgtaccttt gcatgatgca taacctatga attctgtttt 15900
gcagaaaaaa ttagctggtc tgccgcatca acccggcgtc tatgtgtata aagacgcacg 15960
gggtgatgtt ttgtacgtgg ggaaggccaa agatttggcg aagcgcgtgc gatcgtattg 16020
gcagtcgggt cgctcgctgg tgccggacaa agctttgatg gtgagtcagg cggctgatat 16080
cgatatcacg gtggtgagtt cggaaacgga agcttttttg ctcgaagcga gtttcattaa 16140
aaaataccgg ccgcggttta atattatttt gaaagatgat aaaagttttt cgtatattaa 16200
ggtgacgttg cgggaagaat ttccgagggt gctggtggtg cggcgcgtga cgcgcgatgg 16260
ca 16262
<210> 28
<211> 10
<212> DNA
<213> Artificial sequence (Artificial sequence)
<220>
<223> synthetic sequence
<400> 28
aaaaaaaaaa 10
<210> 29
<211> 10
<212> DNA
<213> Artificial sequence (Artificial sequence)
<220>
<223> synthetic sequence
<400> 29
aaaaaaaaaa 10
<210> 30
<211> 10
<212> DNA
<213> Artificial sequence (Artificial sequence)
<220>
<223> synthetic sequence
<400> 30
aaaaaaaaaa 10
<210> 31
<211> 25
<212> DNA
<213> Artificial sequence (Artificial sequence)
<220>
<223> synthetic sequence
<400> 31
ctccgaaagt atcggggata aaggc 25
<210> 32
<211> 25
<212> DNA
<213> Artificial sequence (Artificial sequence)
<220>
<223> synthetic sequence
<400> 32
caccgaaatt tggagaggat aaggc 25
<210> 33
<211> 25
<212> DNA
<213> Artificial sequence (Artificial sequence)
<220>
<223> synthetic sequence
<400> 33
ctccgaatta tcgggaggat aaggc 25
<210> 34
<211> 25
<212> DNA
<213> Artificial sequence (Artificial sequence)
<220>
<223> synthetic sequence
<400> 34
ccccgaatat aggggacaaa aaggc 25
<210> 35
<211> 36
<212> DNA
<213> Artificial sequence (Artificial sequence)
<220>
<223> synthetic sequence
<400> 35
gtctagacat acaggtggaa aggtgagagt aaagac 36
<210> 36
<211> 25
<212> DNA
<213> Artificial sequence (Artificial sequence)
<220>
<223> synthetic sequence
<400> 36
ctccgtgaat acgtggggta aaggc 25
<210> 37
<211> 10
<212> RNA
<213> Artificial sequence (Artificial sequence)
<220>
<223> synthetic sequence
<400> 37
aaaaaaaaaa 10
<210> 38
<211> 10
<212> RNA
<213> Artificial sequence (Artificial sequence)
<220>
<223> synthetic sequence
<400> 38
aaaaaaaaaa 10
<210> 39
<211> 10
<212> RNA
<213> Artificial sequence (Artificial sequence)
<220>
<223> synthetic sequence
<400> 39
aaaaaaaaaa 10
<210> 40
<211> 10
<212> RNA
<213> Artificial sequence (Artificial sequence)
<220>
<223> synthetic sequence
<400> 40
aaaaaaaaaa 10
<210> 41
<211> 10
<212> RNA
<213> Artificial sequence (Artificial sequence)
<220>
<223> synthetic sequence
<400> 41
aaaaaaaaaa 10
<210> 42
<211> 10
<212> RNA
<213> Artificial sequence (Artificial sequence)
<220>
<223> synthetic sequence
<400> 42
aaaaaaaaaa 10
<210> 43
<211> 10
<212> RNA
<213> Artificial sequence (Artificial sequence)
<220>
<223> synthetic sequence
<400> 43
aaaaaaaaaa 10
<210> 44
<211> 10
<212> RNA
<213> Artificial sequence (Artificial sequence)
<220>
<223> synthetic sequence
<400> 44
aaaaaaaaaa 10
<210> 45
<211> 10
<212> RNA
<213> Artificial sequence (Artificial sequence)
<220>
<223> synthetic sequence
<400> 45
aaaaaaaaaa 10
<210> 46
<211> 10
<212> RNA
<213> Artificial sequence (Artificial sequence)
<220>
<223> synthetic sequence
<400> 46
aaaaaaaaaa 10
<210> 47
<211> 10
<212> RNA
<213> Artificial sequence (Artificial sequence)
<220>
<223> synthetic sequence
<400> 47
aaaaaaaaaa 10
<210> 48
<211> 10
<212> RNA
<213> Artificial sequence (Artificial sequence)
<220>
<223> synthetic sequence
<400> 48
aaaaaaaaaa 10
<210> 49
<211> 10
<212> RNA
<213> Artificial sequence (Artificial sequence)
<220>
<223> synthetic sequence
<400> 49
aaaaaaaaaa 10
<210> 50
<211> 10
<212> RNA
<213> Artificial sequence (Artificial sequence)
<220>
<223> synthetic sequence
<400> 50
aaaaaaaaaa 10
<210> 51
<211> 10
<212> RNA
<213> Artificial sequence (Artificial sequence)
<220>
<223> synthetic sequence
<400> 51
aaaaaaaaaa 10
<210> 52
<211> 10
<212> RNA
<213> Artificial sequence (Artificial sequence)
<220>
<223> synthetic sequence
<400> 52
aaaaaaaaaa 10
<210> 53
<211> 10
<212> RNA
<213> Artificial sequence (Artificial sequence)
<220>
<223> synthetic sequence
<400> 53
aaaaaaaaaa 10
<210> 54
<211> 10
<212> RNA
<213> Artificial sequence (Artificial sequence)
<220>
<223> synthetic sequence
<400> 54
aaaaaaaaaa 10
<210> 55
<211> 10
<212> RNA
<213> Artificial sequence (Artificial sequence)
<220>
<223> synthetic sequence
<400> 55
aaaaaaaaaa 10
<210> 56
<211> 10
<212> RNA
<213> Artificial sequence (Artificial sequence)
<220>
<223> synthetic sequence
<400> 56
aaaaaaaaaa 10
<210> 57
<211> 10
<212> RNA
<213> Artificial sequence (Artificial sequence)
<220>
<223> synthetic sequence
<400> 57
aaaaaaaaaa 10
<210> 58
<211> 10
<212> RNA
<213> Artificial sequence (Artificial sequence)
<220>
<223> synthetic sequence
<400> 58
aaaaaaaaaa 10
<210> 59
<211> 10
<212> RNA
<213> Artificial sequence (Artificial sequence)
<220>
<223> synthetic sequence
<400> 59
aaaaaaaaaa 10
<210> 60
<211> 10
<212> RNA
<213> Artificial sequence (Artificial sequence)
<220>
<223> synthetic sequence
<400> 60
aaaaaaaaaa 10
<210> 61
<211> 10
<212> RNA
<213> Artificial sequence (Artificial sequence)
<220>
<223> synthetic sequence
<400> 61
aaaaaaaaaa 10
<210> 62
<211> 10
<212> RNA
<213> Artificial sequence (Artificial sequence)
<220>
<223> synthetic sequence
<400> 62
aaaaaaaaaa 10
<210> 63
<211> 10
<212> RNA
<213> Artificial sequence (Artificial sequence)
<220>
<223> synthetic sequence
<400> 63
aaaaaaaaaa 10
<210> 64
<211> 10
<212> RNA
<213> Artificial sequence (Artificial sequence)
<220>
<223> synthetic sequence
<400> 64
aaaaaaaaaa 10
<210> 65
<211> 10
<212> RNA
<213> Artificial sequence (Artificial sequence)
<220>
<223> synthetic sequence
<400> 65
aaaaaaaaaa 10
<210> 66
<211> 10
<212> RNA
<213> Artificial sequence (Artificial sequence)
<220>
<223> synthetic sequence
<400> 66
aaaaaaaaaa 10
<210> 67
<211> 10
<212> RNA
<213> Artificial sequence (Artificial sequence)
<220>
<223> synthetic sequence
<400> 67
aaaaaaaaaa 10
<210> 68
<211> 10
<212> RNA
<213> Artificial sequence (Artificial sequence)
<220>
<223> synthetic sequence
<400> 68
aaaaaaaaaa 10
<210> 69
<211> 10
<212> RNA
<213> Artificial sequence (Artificial sequence)
<220>
<223> synthetic sequence
<400> 69
aaaaaaaaaa 10
<210> 70
<211> 10
<212> RNA
<213> Artificial sequence (Artificial sequence)
<220>
<223> synthetic sequence
<400> 70
aaaaaaaaaa 10
<210> 71
<211> 10
<212> RNA
<213> Unknown (Unknown)
<220>
<223> synthetic sequence
<400> 71
aaaaaaaaaa 10
<210> 72
<211> 10
<212> RNA
<213> Unknown (Unknown)
<220>
<223> synthetic sequence
<400> 72
aaaaaaaaaa 10
<210> 73
<211> 10
<212> RNA
<213> Artificial sequence (Artificial sequence)
<220>
<223> synthetic sequence
<400> 73
aaaaaaaaaa 10
<210> 74
<211> 10
<212> RNA
<213> Artificial sequence (Artificial sequence)
<220>
<223> synthetic sequence
<400> 74
aaaaaaaaaa 10
<210> 75
<211> 10
<212> RNA
<213> Artificial sequence (Artificial sequence)
<220>
<223> synthetic sequence
<400> 75
aaaaaaaaaa 10
<210> 76
<211> 10
<212> RNA
<213> Artificial sequence (Artificial sequence)
<220>
<223> synthetic sequence
<400> 76
aaaaaaaaaa 10
<210> 77
<211> 10
<212> RNA
<213> Artificial sequence (Artificial sequence)
<220>
<223> synthetic sequence
<400> 77
aaaaaaaaaa 10
<210> 78
<211> 10
<212> RNA
<213> Artificial sequence (Artificial sequence)
<220>
<223> synthetic sequence
<400> 78
aaaaaaaaaa 10
<210> 79
<211> 10
<212> RNA
<213> Artificial sequence (Artificial sequence)
<220>
<223> synthetic sequence
<400> 79
aaaaaaaaaa 10
<210> 80
<211> 10
<212> RNA
<213> Artificial sequence (Artificial sequence)
<220>
<223> synthetic sequence
<400> 80
aaaaaaaaaa 10
<210> 81
<211> 10
<212> RNA
<213> Artificial sequence (Artificial sequence)
<220>
<223> synthetic sequence
<400> 81
aaaaaaaaaa 10
<210> 82
<211> 10
<212> RNA
<213> Artificial sequence (Artificial sequence)
<220>
<223> synthetic sequence
<400> 82
aaaaaaaaaa 10
<210> 83
<211> 84
<212> PRT
<213> Artificial Sequence (Artificial Sequence)
<220>
<223> synthetic sequence
<400> 83
Met Ala Ser Met Ile Ser Ser Ser Ala Val Thr Thr Val Ser Arg Ala
1 5 10 15
Ser Arg Gly Gln Ser Ala Ala Met Ala Pro Phe Gly Gly Leu Lys Ser
20 25 30
Met Thr Gly Phe Pro Val Arg Lys Val Asn Thr Asp Ile Thr Ser Ile
35 40 45
Thr Ser Asn Gly Gly Arg Val Lys Cys Met Gln Val Trp Pro Pro Ile
50 55 60
Gly Lys Lys Lys Phe Glu Thr Leu Ser Tyr Leu Pro Pro Leu Thr Arg
65 70 75 80
Asp Ser Arg Ala
<210> 84
<211> 57
<212> PRT
<213> Artificial Sequence (Artificial Sequence)
<220>
<223> synthetic sequence
<400> 84
Met Ala Ser Met Ile Ser Ser Ser Ala Val Thr Thr Val Ser Arg Ala
1 5 10 15
Ser Arg Gly Gln Ser Ala Ala Met Ala Pro Phe Gly Gly Leu Lys Ser
20 25 30
Met Thr Gly Phe Pro Val Arg Lys Val Asn Thr Asp Ile Thr Ser Ile
35 40 45
Thr Ser Asn Gly Gly Arg Val Lys Ser
50 55
<210> 85
<211> 85
<212> PRT
<213> Artificial Sequence (Artificial Sequence)
<220>
<223> synthetic sequence
<400> 85
Met Ala Ser Ser Met Leu Ser Ser Ala Thr Met Val Ala Ser Pro Ala
1 5 10 15
Gln Ala Thr Met Val Ala Pro Phe Asn Gly Leu Lys Ser Ser Ala Ala
20 25 30
Phe Pro Ala Thr Arg Lys Ala Asn Asn Asp Ile Thr Ser Ile Thr Ser
35 40 45
Asn Gly Gly Arg Val Asn Cys Met Gln Val Trp Pro Pro Ile Glu Lys
50 55 60
Lys Lys Phe Glu Thr Leu Ser Tyr Leu Pro Asp Leu Thr Asp Ser Gly
65 70 75 80
Gly Arg Val Asn Cys
85
<210> 86
<211> 76
<212> PRT
<213> Artificial Sequence (Artificial Sequence)
<220>
<223> synthetic sequence
<400> 86
Met Ala Gln Val Ser Arg Ile Cys Asn Gly Val Gln Asn Pro Ser Leu
1 5 10 15
Ile Ser Asn Leu Ser Lys Ser Ser Gln Arg Lys Ser Pro Leu Ser Val
20 25 30
Ser Leu Lys Thr Gln Gln His Pro Arg Ala Tyr Pro Ile Ser Ser Ser
35 40 45
Trp Gly Leu Lys Lys Ser Gly Met Thr Leu Ile Gly Ser Glu Leu Arg
50 55 60
Pro Leu Lys Val Met Ser Ser Val Ser Thr Ala Cys
65 70 75
<210> 87
<211> 76
<212> PRT
<213> Artificial Sequence (Artificial Sequence)
<220>
<223> synthetic sequence
<400> 87
Met Ala Gln Val Ser Arg Ile Cys Asn Gly Val Trp Asn Pro Ser Leu
1 5 10 15
Ile Ser Asn Leu Ser Lys Ser Ser Gln Arg Lys Ser Pro Leu Ser Val
20 25 30
Ser Leu Lys Thr Gln Gln His Pro Arg Ala Tyr Pro Ile Ser Ser Ser
35 40 45
Trp Gly Leu Lys Lys Ser Gly Met Thr Leu Ile Gly Ser Glu Leu Arg
50 55 60
Pro Leu Lys Val Met Ser Ser Val Ser Thr Ala Cys
65 70 75
<210> 88
<211> 72
<212> PRT
<213> Artificial Sequence (Artificial Sequence)
<220>
<223> synthetic sequence
<400> 88
Met Ala Gln Ile Asn Asn Met Ala Gln Gly Ile Gln Thr Leu Asn Pro
1 5 10 15
Asn Ser Asn Phe His Lys Pro Gln Val Pro Lys Ser Ser Ser Phe Leu
20 25 30
Val Phe Gly Ser Lys Lys Leu Lys Asn Ser Ala Asn Ser Met Leu Val
35 40 45
Leu Lys Lys Asp Ser Ile Phe Met Gln Leu Phe Cys Ser Phe Arg Ile
50 55 60
Ser Ala Ser Val Ala Thr Ala Cys
65 70
<210> 89
<211> 69
<212> PRT
<213> Artificial Sequence (Artificial Sequence)
<220>
<223> synthetic sequence
<400> 89
Met Ala Ala Leu Val Thr Ser Gln Leu Ala Thr Ser Gly Thr Val Leu
1 5 10 15
Ser Val Thr Asp Arg Phe Arg Arg Pro Gly Phe Gln Gly Leu Arg Pro
20 25 30
Arg Asn Pro Ala Asp Ala Ala Leu Gly Met Arg Thr Val Gly Ala Ser
35 40 45
Ala Ala Pro Lys Gln Ser Arg Lys Pro His Arg Phe Asp Arg Arg Cys
50 55 60
Leu Ser Met Val Val
65
<210> 90
<211> 77
<212> PRT
<213> Artificial Sequence (Artificial Sequence)
<220>
<223> synthetic sequence
<400> 90
Met Ala Ala Leu Thr Thr Ser Gln Leu Ala Thr Ser Ala Thr Gly Phe
1 5 10 15
Gly Ile Ala Asp Arg Ser Ala Pro Ser Ser Leu Leu Arg His Gly Phe
20 25 30
Gln Gly Leu Lys Pro Arg Ser Pro Ala Gly Gly Asp Ala Thr Ser Leu
35 40 45
Ser Val Thr Thr Ser Ala Arg Ala Thr Pro Lys Gln Gln Arg Ser Val
50 55 60
Gln Arg Gly Ser Arg Arg Phe Pro Ser Val Val Val Cys
65 70 75
<210> 91
<211> 57
<212> PRT
<213> Artificial Sequence (Artificial Sequence)
<220>
<223> synthetic sequence
<400> 91
Met Ala Ser Ser Val Leu Ser Ser Ala Ala Val Ala Thr Arg Ser Asn
1 5 10 15
Val Ala Gln Ala Asn Met Val Ala Pro Phe Thr Gly Leu Lys Ser Ala
20 25 30
Ala Ser Phe Pro Val Ser Arg Lys Gln Asn Leu Asp Ile Thr Ser Ile
35 40 45
Ala Ser Asn Gly Gly Arg Val Gln Cys
50 55
<210> 92
<211> 65
<212> PRT
<213> Artificial Sequence (Artificial Sequence)
<220>
<223> synthetic sequence
<400> 92
Met Glu Ser Leu Ala Ala Thr Ser Val Phe Ala Pro Ser Arg Val Ala
1 5 10 15
Val Pro Ala Ala Arg Ala Leu Val Arg Ala Gly Thr Val Val Pro Thr
20 25 30
Arg Arg Thr Ser Ser Thr Ser Gly Thr Ser Gly Val Lys Cys Ser Ala
35 40 45
Ala Val Thr Pro Gln Ala Ser Pro Val Ile Ser Arg Ser Ala Ala Ala
50 55 60
Ala
65
<210> 93
<211> 72
<212> PRT
<213> Artificial Sequence (Artificial Sequence)
<220>
<223> synthetic sequence
<400> 93
Met Gly Ala Ala Ala Thr Ser Met Gln Ser Leu Lys Phe Ser Asn Arg
1 5 10 15
Leu Val Pro Pro Ser Arg Arg Leu Ser Pro Val Pro Asn Asn Val Thr
20 25 30
Cys Asn Asn Leu Pro Lys Ser Ala Ala Pro Val Arg Thr Val Lys Cys
35 40 45
Cys Ala Ser Ser Trp Asn Ser Thr Ile Asn Gly Ala Ala Ala Thr Thr
50 55 60
Asn Gly Ala Ser Ala Ala Ser Ser
65 70
<210> 94
<211> 20
<212> PRT
<213> Artificial Sequence (Artificial Sequence)
<220>
<223> synthetic sequence
<220>
<221> misc_feature
<222> (4)..(4)
<223> Xaa can be any naturally occurring amino acid
<220>
<221> misc_feature
<222> (8)..(8)
<223> Xaa can be any naturally occurring amino acid
<220>
<221> misc_feature
<222> (11)..(11)
<223> Xaa can be any naturally occurring amino acid
<220>
<221> misc_feature
<222> (15)..(15)
<223> Xaa can be any naturally occurring amino acid
<220>
<221> misc_feature
<222> (19)..(19)
<223> Xaa can be any naturally occurring amino acid
<400> 94
Gly Leu Phe Xaa Ala Leu Leu Xaa Leu Leu Xaa Ser Leu Trp Xaa Leu
1 5 10 15
Leu Leu Xaa Ala
20
<210> 95
<211> 20
<212> PRT
<213> Artificial Sequence (Artificial Sequence)
<220>
<223> synthetic sequence
<400> 95
Gly Leu Phe His Ala Leu Leu His Leu Leu His Ser Leu Trp His Leu
1 5 10 15
Leu Leu His Ala
20
<210> 96
<211> 7
<212> PRT
<213> Artificial Sequence (Artificial Sequence)
<220>
<223> synthetic sequence
<400> 96
Pro Lys Lys Lys Arg Lys Val
1 5
<210> 97
<211> 16
<212> PRT
<213> Artificial Sequence (Artificial Sequence)
<220>
<223> synthetic sequence
<400> 97
Lys Arg Pro Ala Ala Thr Lys Lys Ala Gly Gln Ala Lys Lys Lys Lys
1 5 10 15
<210> 98
<211> 9
<212> PRT
<213> Artificial Sequence (Artificial Sequence)
<220>
<223> synthetic sequence
<400> 98
Pro Ala Ala Lys Arg Val Lys Leu Asp
1 5
<210> 99
<211> 11
<212> PRT
<213> Artificial Sequence (Artificial Sequence)
<220>
<223> synthetic sequence
<400> 99
Arg Gln Arg Arg Asn Glu Leu Lys Arg Ser Pro
1 5 10
<210> 100
<211> 38
<212> PRT
<213> Artificial Sequence (Artificial Sequence)
<220>
<223> synthetic sequence
<400> 100
Asn Gln Ser Ser Asn Phe Gly Pro Met Lys Gly Gly Asn Phe Gly Gly
1 5 10 15
Arg Ser Ser Gly Pro Tyr Gly Gly Gly Gly Gln Tyr Phe Ala Lys Pro
20 25 30
Arg Asn Gln Gly Gly Tyr
35
<210> 101
<211> 42
<212> PRT
<213> Artificial Sequence (Artificial Sequence)
<220>
<223> synthetic sequence
<400> 101
Arg Met Arg Ile Glx Phe Lys Asn Lys Gly Lys Asp Thr Ala Glu Leu
1 5 10 15
Arg Arg Arg Arg Val Glu Val Ser Val Glu Leu Arg Lys Ala Lys Lys
20 25 30
Asp Glu Gln Ile Leu Lys Arg Arg Asn Val
35 40
<210> 102
<211> 8
<212> PRT
<213> Artificial Sequence (Artificial Sequence)
<220>
<223> synthetic sequence
<400> 102
Val Ser Arg Lys Arg Pro Arg Pro
1 5
<210> 103
<211> 8
<212> PRT
<213> Artificial Sequence (Artificial Sequence)
<220>
<223> synthetic sequence
<400> 103
Pro Pro Lys Lys Ala Arg Glu Asp
1 5
<210> 104
<211> 8
<212> PRT
<213> Artificial Sequence (Artificial Sequence)
<220>
<223> synthetic sequence
<400> 104
Pro Gln Pro Lys Lys Lys Pro Leu
1 5
<210> 105
<211> 12
<212> PRT
<213> Artificial Sequence (Artificial Sequence)
<220>
<223> synthetic sequence
<400> 105
Ser Ala Leu Ile Lys Lys Lys Lys Lys Met Ala Pro
1 5 10
<210> 106
<211> 5
<212> PRT
<213> Artificial Sequence (Artificial Sequence)
<220>
<223> synthetic sequence
<400> 106
Asp Arg Leu Arg Arg
1 5
<210> 107
<211> 7
<212> PRT
<213> Artificial Sequence (Artificial Sequence)
<220>
<223> synthetic sequence
<400> 107
Pro Lys Gln Lys Lys Arg Lys
1 5
<210> 108
<211> 10
<212> PRT
<213> Artificial Sequence (Artificial Sequence)
<220>
<223> synthetic sequence
<400> 108
Arg Lys Leu Lys Lys Lys Ile Lys Lys Leu
1 5 10
<210> 109
<211> 10
<212> PRT
<213> Artificial Sequence (Artificial Sequence)
<220>
<223> synthetic sequence
<400> 109
Arg Glu Lys Lys Lys Phe Leu Lys Arg Arg
1 5 10
<210> 110
<211> 20
<212> PRT
<213> Artificial Sequence (Artificial Sequence)
<220>
<223> synthetic sequence
<400> 110
Lys Arg Lys Gly Asp Glu Val Asp Gly Val Asp Glu Val Ala Lys Lys
1 5 10 15
Lys Ser Lys Lys
20
<210> 111
<211> 17
<212> PRT
<213> Artificial Sequence (Artificial Sequence)
<220>
<223> synthetic sequence
<400> 111
Arg Lys Cys Leu Gln Ala Gly Met Asn Leu Glu Ala Arg Lys Thr Lys
1 5 10 15
Lys
<210> 112
<211> 11
<212> PRT
<213> Artificial Sequence (Artificial Sequence)
<220>
<223> synthetic sequence
<400> 112
Tyr Gly Arg Lys Lys Arg Arg Gln Arg Arg Arg
1 5 10
<210> 113
<211> 12
<212> PRT
<213> Artificial Sequence (Artificial Sequence)
<220>
<223> synthetic sequence
<400> 113
Arg Arg Gln Arg Arg Thr Ser Lys Leu Met Lys Arg
1 5 10
<210> 114
<211> 27
<212> PRT
<213> Artificial Sequence (Artificial Sequence)
<220>
<223> synthetic sequence
<400> 114
Gly Trp Thr Leu Asn Ser Ala Gly Tyr Leu Leu Gly Lys Ile Asn Leu
1 5 10 15
Lys Ala Leu Ala Ala Leu Ala Lys Lys Ile Leu
20 25
<210> 115
<211> 33
<212> PRT
<213> Artificial Sequence (Artificial Sequence)
<220>
<223> synthetic sequence
<400> 115
Lys Ala Leu Ala Trp Glu Ala Lys Leu Ala Lys Ala Leu Ala Lys Ala
1 5 10 15
Leu Ala Lys His Leu Ala Lys Ala Leu Ala Lys Ala Leu Lys Cys Glu
20 25 30
Ala
<210> 116
<211> 16
<212> PRT
<213> Artificial Sequence (Artificial Sequence)
<220>
<223> synthetic sequence
<400> 116
Arg Gln Ile Lys Ile Trp Phe Gln Asn Arg Arg Met Lys Trp Lys Lys
1 5 10 15
<210> 117
<211> 11
<212> PRT
<213> Artificial Sequence (Artificial Sequence)
<220>
<223> synthetic sequence
<400> 117
Tyr Gly Arg Lys Lys Arg Arg Gln Arg Arg Arg
1 5 10
<210> 118
<211> 9
<212> PRT
<213> Artificial Sequence (Artificial Sequence)
<220>
<223> synthetic sequence
<400> 118
Arg Lys Lys Arg Arg Gln Arg Arg Arg
1 5
<210> 119
<211> 11
<212> PRT
<213> Artificial Sequence (Artificial Sequence)
<220>
<223> synthetic sequence
<400> 119
Tyr Gly Arg Lys Lys Arg Arg Gln Arg Arg Arg
1 5 10
<210> 120
<211> 8
<212> PRT
<213> Artificial Sequence (Artificial Sequence)
<220>
<223> synthetic sequence
<400> 120
Arg Lys Lys Arg Arg Gln Arg Arg
1 5
<210> 121
<211> 11
<212> PRT
<213> Artificial Sequence (Artificial Sequence)
<220>
<223> synthetic sequence
<400> 121
Tyr Ala Arg Ala Ala Ala Arg Gln Ala Arg Ala
1 5 10
<210> 122
<211> 11
<212> PRT
<213> Artificial Sequence (Artificial Sequence)
<220>
<223> synthetic sequence
<400> 122
Thr His Arg Leu Pro Arg Arg Arg Arg Arg Arg
1 5 10
<210> 123
<211> 11
<212> PRT
<213> Artificial Sequence (Artificial Sequence)
<220>
<223> synthetic sequence
<400> 123
Gly Gly Arg Arg Ala Arg Arg Arg Arg Arg Arg
1 5 10
<210> 124
<211> 5
<212> PRT
<213> Artificial Sequence (Artificial Sequence)
<220>
<223> synthetic sequence
<400> 124
Gly Ser Gly Gly Ser
1 5
<210> 125
<211> 6
<212> PRT
<213> Artificial Sequence (Artificial Sequence)
<220>
<223> synthetic sequence
<400> 125
Gly Gly Ser Gly Gly Ser
1 5
<210> 126
<211> 4
<212> PRT
<213> Artificial Sequence (Artificial Sequence)
<220>
<223> synthetic sequence
<400> 126
Gly Gly Gly Ser
1
<210> 127
<211> 4
<212> PRT
<213> Artificial Sequence (Artificial Sequence)
<220>
<223> synthetic sequence
<400> 127
Gly Gly Ser Gly
1
<210> 128
<211> 5
<212> PRT
<213> Artificial Sequence (Artificial Sequence)
<220>
<223> synthetic sequence
<400> 128
Gly Gly Ser Gly Gly
1 5
<210> 129
<211> 5
<212> PRT
<213> Artificial Sequence (Artificial Sequence)
<220>
<223> synthetic sequence
<400> 129
Gly Ser Gly Ser Gly
1 5
<210> 130
<211> 5
<212> PRT
<213> Artificial Sequence (Artificial Sequence)
<220>
<223> synthetic sequence
<400> 130
Gly Ser Gly Gly Gly
1 5
<210> 131
<211> 5
<212> PRT
<213> Artificial Sequence (Artificial Sequence)
<220>
<223> synthetic sequence
<400> 131
Gly Gly Gly Ser Gly
1 5
<210> 132
<211> 5
<212> PRT
<213> Artificial Sequence (Artificial Sequence)
<220>
<223> synthetic sequence
<400> 132
Gly Ser Ser Ser Gly
1 5
<210> 133
<211> 16
<212> PRT
<213> Artificial Sequence (Artificial Sequence)
<220>
<223> synthetic sequence
<400> 133
Arg Gln Ile Lys Ile Trp Phe Gln Asn Arg Arg Met Lys Trp Lys Lys
1 5 10 15
<210> 134
<211> 11
<212> PRT
<213> Artificial Sequence (Artificial Sequence)
<220>
<223> synthetic sequence
<400> 134
Tyr Gly Arg Lys Lys Arg Arg Gln Arg Arg Arg
1 5 10
Claims (123)
1. A composition, comprising:
a) a CasY polypeptide or a nucleic acid molecule encoding said CasY polypeptide; and
b) a CasY guide RNA or one or more DNA molecules encoding said CasY guide RNA.
2. The composition of claim 1, wherein the CasY polypeptide comprises an amino acid sequence having 50% or greater identity to the amino acid sequence set forth in SEQ ID NO: 1 or SEQ ID NO: 2.
3. The composition of claim 1 or claim 2, wherein the CasY guide RNA comprises a nucleotide sequence that has 80% or greater identity to a crRNA sequence set forth in any one of SEQ ID NOS: 11-15.
4. The composition of claim 1 or claim 2, wherein the CasY polypeptide is fused to an NLS sequence.
5. The composition of any one of claims 1-4, wherein the composition comprises a lipid.
6. The composition of any one of claims 1-4, wherein a) and b) are within a liposome.
7. The composition of any one of claims 1-4, wherein a) and b) are within a particle.
8. The composition of any one of claims 1-7, comprising one or more of: buffers, nuclease inhibitors and protease inhibitors.
9. The composition of any one of claims 1-8, wherein the CasY polypeptide comprises an amino acid sequence having 85% or greater identity to the amino acid sequence set forth in SEQ ID NO: 1 or SEQ ID NO: 2.
10. The composition of any one of claims 1-9, wherein the CasY polypeptide is a nickase that is capable of cleaving only one strand of a double-stranded target nucleic acid molecule.
11. The composition of any one of claims 1-9, wherein the CasY polypeptide is a catalytically inactive CasY polypeptide (dCasY).
12. The composition of claim 10 or claim 11, wherein the CasY polypeptide comprises one or more mutations at positions corresponding to positions selected from: D672, E769, and D935 of SEQ ID NO: 1.
13. The composition of any one of claims 1-12, further comprising a DNA donor template.
14. A CasY fusion polypeptide comprising: a CasY polypeptide fused to a heterologous polypeptide.
15. The CasY fusion polypeptide of claim 14, wherein the CasY polypeptide comprises an amino acid sequence having 50% or greater identity to the amino acid sequence set forth in SEQ ID NO: 1 or SEQ ID NO: 2.
16. The CasY fusion polypeptide of claim 14, wherein the CasY polypeptide comprises an amino acid sequence having 85% or greater identity to the amino acid sequence set forth in SEQ ID NO: 1 or SEQ ID NO: 2.
17. The CasY fusion polypeptide of any one of claims 14-16, wherein the CasY polypeptide is a nickase that is capable of cleaving only one strand of a double-stranded target nucleic acid molecule.
18. The CasY fusion polypeptide of any one of claims 14-17, wherein the CasY polypeptide is a catalytically inactive CasY polypeptide (dCasY).
19. The CasY fusion polypeptide of claim 17 or claim 18, wherein the CasY polypeptide comprises one or more mutations at positions corresponding to positions selected from: D672, E769, and D935 of SEQ ID NO: 1.
20. The CasY fusion polypeptide of any one of claims 14-19, wherein the heterologous polypeptide is fused to the N-terminus and/or C-terminus of the CasY polypeptide.
21. The CasY fusion polypeptide of any one of claims 14-20, which comprises an NLS.
22. The CasY fusion polypeptide of any one of claims 14-21, wherein the heterologous polypeptide is a targeting polypeptide that provides binding to a cell surface moiety on a target cell or target cell type.
23. The CasY fusion polypeptide of any one of claims 14-21, wherein the heterologous polypeptide exhibits enzymatic activity that modifies a target DNA.
24. The CasY fusion polypeptide of claim 23, wherein the heterologous polypeptide exhibits one or more enzymatic activities selected from the group consisting of: nuclease activity, methyltransferase activity, demethylase activity, DNA repair activity, DNA damage activity, deamination activity, dismutase activity, alkylation activity, depurination activity, oxidation activity, pyrimidine dimer formation activity, integrase activity, transposase activity, recombinase activity, polymerase activity, ligase activity, helicase activity, photolyase activity, and glycosylase activity.
25. The CasY fusion polypeptide of claim 24, wherein the heterologous polypeptide exhibits one or more enzymatic activities selected from the group consisting of: nuclease activity, methyltransferase activity, demethylase activity, deamination activity, depurination activity, integrase activity, transposase activity and recombinase activity.
26. The CasY fusion polypeptide of any one of claims 14-21, wherein the heterologous polypeptide exhibits an enzymatic activity that modifies a target polypeptide associated with a target nucleic acid.
27. The CasY fusion polypeptide of claim 26, wherein the heterologous polypeptide exhibits histone modification activity.
28. The CasY fusion polypeptide of claim 26 or claim 27, wherein the heterologous polypeptide exhibits one or more enzymatic activities selected from the group consisting of: methyltransferase activity, demethylase activity, acetyltransferase activity, deacetylase activity, kinase activity, phosphatase activity, ubiquitin ligase activity, deubiquitinating activity, adenylation activity, deadenylation activity, SUMOylating activity, deSUMOylating activity, ribosylation activity, deribosylation activity, myristoylation activity, demyristoylation activity, glycosylation activity (e.g., from O-GlcNAc transferase), and deglycosylation activity.
29. The CasY fusion polypeptide of claim 28, wherein the heterologous polypeptide exhibits one or more enzymatic activities selected from the group consisting of: methyltransferase activity, demethylase activity, acetyltransferase activity and deacetylase activity.
30. The CasY fusion polypeptide of any one of claims 14-21, wherein the heterologous polypeptide is an endosomal escape polypeptide.
31. The CasY fusion polypeptide of claim 30, wherein the endosomal escape polypeptide comprises an amino acid sequence selected from the group consisting of: GLFXALLXLLXSLWXLLLXA (SEQ ID NO:94) and GLFHALLHLLHSLWHLLLHA (SEQ ID NO:95), wherein each X is independently selected from lysine, histidine and arginine.
32. The CasY fusion polypeptide of any one of claims 14-21, wherein the heterologous polypeptide is a chloroplast transit peptide.
33. The CasY fusion polypeptide of claim 32, wherein the chloroplast transit peptide comprises an amino acid sequence selected from the group consisting of: MASMISSSAVTTVSRASRGQSAAMAPFGGLKSMTGFPVRKVNTDITSITSNGGRVKCMQVWPPIGKKKFETLSYLPPLTRDSRA (SEQ ID NO:83), MASMISSSAVTTVSRASRGQSAAMAPFGGLKSMTGFPVRKVNTDITSITSNGGRVKS (SEQ ID NO:84), MASSMLSSATMVASPAQATMVAPFNGLKSSAAFPATRKANNDITSITSNGGRVNCMQVWPPIEKKKFETLSYLPDLTDSGGRVNC (SEQ ID NO:85), MAQVSRICNGVQNPSLISNLSKSSQRKSPLSVSLKTQQHPRAYPISSSWGLKKSGMTLIGSELRPLKVMSSVSTAC (SEQ ID NO:86), MAQVSRICNGVWNPSLISNLSKSSQRKSPLSVSLKTQQHPRAYPISSSWGLKKSGMTLIGSELRPLKVMSSVSTAC (SEQ ID NO:87), MAQINNMAQGIQTLNPNSNFHKPQVPKSSSFLVFGSKKLKNSANSMLVLKKDSIFMQLFCSFRISASVATAC (SEQ ID NO:88), MAALVTSQLATSGTVLSVTDRFRRPGFQGLRPRNPADAALGMRTVGASAAPKQSRKPHRFDRRCLSMVV (SEQ ID NO:89), MAALTTSQLATSATGFGIADRSAPSSLLRHGFQGLKPRSPAGGDATSLSVTTSARATPKQQRSVQRGSRRFPSVVVC (SEQ ID NO:90), MASSVLSSAAVATRSNVAQANMVAPFTGLKSAASFPVSRKQNLDITSIASNGGRVQC (SEQ ID NO:91), MESLAATSVFAPSRVAVPAARALVRAGTVVPTRRTSSTSGTSGVKCSAAVTPQASPVISRSAAAA (SEQ ID NO:92), and MGAAATSMQSLKFSNRLVPPSRRLSPVPNNVTCNNLPKSAAPVRTVKCCASSWNSTINGAAATTNGASAASS (SEQ ID NO: 93).
34. The CasY fusion polypeptide of any one of claims 14-21, wherein the heterologous polypeptide is a protein that increases or decreases transcription.
35. The CasY fusion polypeptide of claim 34, wherein the heterologous polypeptide is a transcriptional repressor domain.
36. The CasY fusion polypeptide of claim 34, wherein the heterologous polypeptide is a transcriptional activation domain.
37. The CasY fusion polypeptide of any one of claims 14-21, wherein the heterologous polypeptide is a protein binding domain.
38. A nucleic acid molecule encoding the CasY fusion polypeptide of any one of claims 14-37.
39. The nucleic acid molecule of claim 38, wherein the nucleotide sequence encoding the CasY fusion polypeptide is operably linked to a promoter.
40. The nucleic acid molecule of claim 39, wherein the promoter is functional in a eukaryotic cell.
41. The nucleic acid molecule of claim 40, wherein the promoter is functional in one or more of: plant cells, fungal cells, animal cells, invertebrate cells, fly cells, vertebrate cells, mammalian cells, primate cells, non-human primate cells, and human cells.
42. The nucleic acid molecule of any one of claims 39-41, wherein the promoter is one or more of: constitutive promoters, inducible promoters, cell type specific promoters, and tissue specific promoters.
43. The nucleic acid molecule of any one of claims 38-42, wherein the DNA molecule is a recombinant expression vector.
44. The nucleic acid molecule of claim 43, wherein the recombinant expression vector is a recombinant adeno-associated viral vector, a recombinant retroviral vector, or a recombinant lentiviral vector.
45. The nucleic acid molecule of claim 39, wherein the promoter is functional in a prokaryotic cell.
46. The nucleic acid molecule of claim 38, wherein the nucleic acid molecule is mRNA.
47. One or more nucleic acid molecules encoding:
(a) a CasY guide RNA; and
(b) a CasY polypeptide.
48. The one or more nucleic acid molecules of claim 47, wherein the CasY polypeptide comprises an amino acid sequence having 50% or greater identity to the amino acid sequence set forth in SEQ ID NO: 1 or SEQ ID NO: 2.
49. The one or more nucleic acid molecules of claim 47, wherein the CasY polypeptide comprises an amino acid sequence having 85% or greater identity to the amino acid sequence set forth in SEQ ID NO: 1 or SEQ ID NO: 2.
50. The one or more nucleic acid molecules of any one of claims 47-49, wherein the CasY guide RNA comprises a nucleotide sequence that has 80% or greater identity to a crRNA sequence set forth in any one of SEQ ID NOS: 11-15.
51. The one or more nucleic acid molecules of any one of claims 47-50, wherein the CasY polypeptide is fused to an NLS sequence.
52. The one or more nucleic acid molecules of any one of claims 47-51, wherein the one or more nucleic acid molecules comprises a nucleotide sequence encoding the CasY guide RNA operably linked to a promoter.
53. The one or more nucleic acid molecules of any one of claims 47-52, wherein the one or more nucleic acid molecules comprises a nucleotide sequence encoding the CasY polypeptide operably linked to a promoter.
54. The one or more nucleic acid molecules of claim 52 or claim 53, wherein said promoter operably linked to said nucleotide sequence encoding said CasY guide RNA and/or said promoter operably linked to said nucleotide sequence encoding said CasY polypeptide is functional in a eukaryotic cell.
55. The one or more nucleic acid molecules of claim 54, wherein the promoter is functional in one or more of: plant cells, fungal cells, animal cells, invertebrate cells, fly cells, vertebrate cells, mammalian cells, primate cells, non-human primate cells, and human cells.
56. The one or more nucleic acid molecules of any one of claims 53-55, wherein the promoter is one or more of: constitutive promoters, inducible promoters, cell type specific promoters, and tissue specific promoters.
57. The one or more nucleic acid molecules of any one of claims 47-56, wherein the one or more nucleic acid molecules are one or more recombinant expression vectors.
58. The one or more nucleic acid molecules of claim 57, wherein the one or more recombinant expression vectors are selected from the group consisting of: one or more adeno-associated viral vectors, one or more recombinant retroviral vectors, or one or more recombinant lentiviral vectors.
59. The one or more nucleic acid molecules of claim 53, wherein the promoter is functional in a prokaryotic cell.
60. A eukaryotic cell comprising one or more of:
a) a CasY polypeptide or a nucleic acid molecule encoding said CasY polypeptide,
b) a CasY fusion polypeptide or a nucleic acid molecule encoding said CasY fusion polypeptide, and
c) a CasY guide RNA or a nucleic acid molecule encoding said CasY guide RNA.
61. The eukaryotic cell of claim 60, comprising a nucleic acid molecule encoding the CasY polypeptide, wherein the nucleic acid molecule is integrated into the genomic DNA of the cell.
62. The eukaryotic cell of claim 60 or claim 61, wherein the eukaryotic cell is a plant cell, a mammalian cell, an insect cell, an arthropod cell, a fungal cell, an avian cell, a reptile cell, an amphibian cell, an invertebrate cell, a mouse cell, a rat cell, a primate cell, a non-human primate cell, or a human cell.
63. A cell comprising a CasY fusion polypeptide or a nucleic acid molecule encoding said CasY fusion polypeptide.
64. The cell of claim 63, wherein the cell is a prokaryotic cell.
65. The cell of claim 63 or claim 64, comprising a nucleic acid molecule encoding said CasY fusion polypeptide, wherein said nucleic acid molecule is integrated into the genomic DNA of said cell.
66. A method of modifying a target nucleic acid, the method comprising contacting the target nucleic acid with:
a) a CasY polypeptide; and
b) a CasY guide RNA comprising a guide sequence that hybridizes to a target sequence of the target nucleic acid,
wherein said contacting results in a modification of said target nucleic acid by said CasY polypeptide.
67. The method of claim 66, wherein the modification is cleavage of the target nucleic acid.
68. The method of claim 66 or claim 67, wherein the target nucleic acid is selected from the group consisting of: double-stranded DNA, single-stranded DNA, RNA, genomic DNA, and extrachromosomal DNA.
69. The method of any one of claims 66-68, wherein the contacting occurs outside of a cell in vitro.
70. The method of any one of claims 66-68, wherein the contacting occurs inside a cell in culture.
71. The method of any one of claims 66-68, wherein the contacting occurs inside a cell in vivo.
72. The method of claim 70 or claim 71, wherein the cell is a eukaryotic cell.
73. The method of claim 72, wherein the cell is selected from the group consisting of: plant cells, fungal cells, mammalian cells, reptile cells, insect cells, avian cells, fish cells, parasite cells, arthropod cells, invertebrate cells, vertebrate cells, rodent cells, mouse cells, rat cells, primate cells, non-human primate cells, and human cells.
74. The method of claim 70 or claim 71, wherein the cell is a prokaryotic cell.
75. The method of any one of claims 66-74, wherein the contacting results in genome editing.
76. The method of any one of claims 66-75, wherein the contacting comprises introducing into a cell: (a) said CasY polypeptide or a nucleic acid molecule encoding said CasY polypeptide, and (b) said CasY guide RNA or a nucleic acid molecule encoding said CasY guide RNA.
77. The method of claim 76, wherein the contacting further comprises: introducing a DNA donor template into the cell.
78. The method of any one of claims 66-77, wherein the CasY guide RNA comprises a nucleotide sequence that has 80% or greater identity to a crRNA sequence set forth in any one of SEQ ID NOS: 11-15.
79. The method of any one of claims 66-78, wherein the CasY polypeptide is fused to an NLS sequence.
80. A method of modulating transcription from a target DNA, modifying a target nucleic acid, or modifying a protein associated with a target nucleic acid, the method comprising contacting the target nucleic acid with:
a) a CasY fusion polypeptide comprising a CasY polypeptide fused to a heterologous polypeptide; and
b) a CasY guide RNA comprising a guide sequence that hybridizes to a target sequence of the target nucleic acid.
81. The method of claim 80, wherein the CasY guide RNA comprises a nucleotide sequence that has 80% or greater identity to a crRNA sequence set forth in any one of SEQ ID NOS: 11-15.
82. The method of claim 80 or claim 81, wherein the CasY fusion polypeptide comprises an NLS sequence.
83. The method of any one of claims 80-82, wherein the modification is not cleavage of the target nucleic acid.
84. The method of any one of claims 80-83, wherein the target nucleic acid is selected from the group consisting of: double-stranded DNA, single-stranded DNA, RNA, genomic DNA, and extrachromosomal DNA.
85. The method of any one of claims 80-84, wherein the contacting occurs outside of a cell in vitro.
86. The method of any one of claims 80-84, wherein the contacting occurs inside a cell in culture.
87. The method of any one of claims 80-84, wherein the contacting occurs inside a cell in vivo.
88. The method of claim 86 or claim 87, wherein the cell is a eukaryotic cell.
89. The method of claim 88, wherein the cell is selected from the group consisting of: plant cells, fungal cells, mammalian cells, reptile cells, insect cells, avian cells, fish cells, parasite cells, arthropod cells, invertebrate cells, vertebrate cells, rodent cells, mouse cells, rat cells, primate cells, non-human primate cells, and human cells.
90. The method of claim 86 or claim 87, wherein the cell is a prokaryotic cell.
91. The method of any one of claims 80-90, wherein the contacting comprises introducing into a cell: (a) said CasY fusion polypeptide or a nucleic acid molecule encoding said CasY fusion polypeptide, and (b) said CasY guide RNA or a nucleic acid molecule encoding said CasY guide RNA.
92. The method of any one of claims 80-91, wherein the CasY polypeptide is a catalytically inactive CasY polypeptide (dCasY).
93. The method of any one of claims 80-92, wherein the CasY polypeptide comprises one or more mutations at positions corresponding to positions selected from: D672, E769, and D935 of SEQ ID NO: 1.
94. The method of any one of claims 80-93, wherein the heterologous polypeptide exhibits enzymatic activity that modifies a target DNA.
95. The method of claim 94, wherein the heterologous polypeptide exhibits one or more enzymatic activities selected from the group consisting of: nuclease activity, methyltransferase activity, demethylase activity, DNA repair activity, DNA damage activity, deamination activity, dismutase activity, alkylation activity, depurination activity, oxidation activity, pyrimidine dimer formation activity, integrase activity, transposase activity, recombinase activity, polymerase activity, ligase activity, helicase activity, photolyase activity, and glycosylase activity.
96. The method of claim 95, wherein the heterologous polypeptide exhibits one or more enzymatic activities selected from the group consisting of: nuclease activity, methyltransferase activity, demethylase activity, deamination activity, depurination activity, integrase activity, transposase activity and recombinase activity.
97. The method of any one of claims 80-93, wherein the heterologous polypeptide exhibits an enzymatic activity that modifies a target polypeptide associated with a target nucleic acid.
98. The method of claim 97, wherein the heterologous polypeptide exhibits histone modification activity.
99. The method of claim 97 or claim 98, wherein the heterologous polypeptide exhibits one or more enzymatic activities selected from the group consisting of: methyltransferase activity, demethylase activity, acetyltransferase activity, deacetylase activity, kinase activity, phosphatase activity, ubiquitin ligase activity, deubiquitinating activity, adenylation activity, deadenylation activity, SUMOylating activity, deSUMOylating activity, ribosylation activity, deribosylation activity, myristoylation activity, demyristoylation activity, glycosylation activity (e.g., from O-GlcNAc transferase), and deglycosylation activity.
100. The method of claim 99, wherein the heterologous polypeptide exhibits one or more enzymatic activities selected from the group consisting of: methyltransferase activity, demethylase activity, acetyltransferase activity and deacetylase activity.
101. The method of any one of claims 80-93, wherein the heterologous polypeptide is a protein that increases or decreases transcription.
102. The method of claim 101, wherein the heterologous polypeptide is a transcriptional repressor domain.
103. The method of claim 101, wherein the heterologous polypeptide is a transcriptional activation domain.
104. The method of any one of claims 80-93, wherein the heterologous polypeptide is a protein binding domain.
105. A transgenic multicellular non-human organism whose genome comprises a transgene comprising a nucleotide sequence encoding one or more of:
a) a CasY polypeptide,
b) a CasY fusion polypeptide, and
c) a CasY guide RNA.
106. The transgenic, multicellular non-human organism of claim 105, wherein the CasY polypeptide comprises an amino acid sequence having 50 percent or greater amino acid sequence identity to the amino acid sequence set forth in SEQ ID NO: 1 or SEQ ID NO: 2.
107. The transgenic, multicellular non-human organism of claim 105, wherein the CasY polypeptide comprises an amino acid sequence having 85 percent or greater amino acid sequence identity to the amino acid sequence set forth in SEQ ID NO: 1 or SEQ ID NO: 2.
108. The transgenic, multicellular non-human organism of any one of claims 105-107, wherein the organism is a plant, a monocot, a dicot, an invertebrate, an insect, an arthropod, an arachnid, a parasite, a worm, a cnidarian, a vertebrate, a fish, a reptile, an amphibian, an ungulate, a bird, a pig, a horse, a sheep, a rodent, a mouse, a rat, or a non-human primate.
109. A system, comprising:
a) a CasY polypeptide and a CasY guide RNA;
b) a CasY polypeptide, a CasY guide RNA and a DNA donor template;
c) a CasY fusion polypeptide and a CasY guide RNA;
d) a CasY fusion polypeptide, a CasY guide RNA, and a DNA donor template;
e) mRNA encoding a CasY polypeptide and a CasY guide RNA;
f) mRNA encoding a CasY polypeptide, a CasY guide RNA, and a DNA donor template;
g) mRNA encoding a CasY fusion polypeptide and a CasY guide RNA;
h) mRNA encoding a CasY fusion polypeptide, a CasY guide RNA, and a DNA donor template;
i) one or more recombinant expression vectors comprising: i) a nucleotide sequence encoding a CasY polypeptide, and ii) a nucleotide sequence encoding a CasY guide RNA;
j) one or more recombinant expression vectors comprising: i) a nucleotide sequence encoding a CasY polypeptide, ii) a nucleotide sequence encoding a CasY guide RNA, and iii) a DNA donor template;
k) one or more recombinant expression vectors comprising: i) a nucleotide sequence encoding a CasY fusion polypeptide, and ii) a nucleotide sequence encoding a CasY guide RNA; and
l) one or more recombinant expression vectors comprising: i) a nucleotide sequence encoding a CasY fusion polypeptide, ii) a nucleotide sequence encoding a CasY guide RNA, and iii) a DNA donor template.
110. The CasY system of claim 109, wherein the CasY polypeptide comprises an amino acid sequence having 50% or greater amino acid sequence identity to the amino acid sequence set forth in SEQ ID NO: 1 or SEQ ID NO: 2.
111. The CasY system of claim 109, wherein the CasY polypeptide comprises an amino acid sequence having 85% or greater amino acid sequence identity to the amino acid sequence set forth in SEQ ID NO: 1 or SEQ ID NO: 2.
112. The CasY system of any one of claims 109-111, wherein the donor template nucleic acid has a length of from 8 nucleotides to 1000 nucleotides.
113. The CasY system of any one of claims 109-111, wherein the donor template nucleic acid has a length of from 25 nucleotides to 500 nucleotides.
114. A kit comprising the CasY system as defined in any one of claims 109-113.
115. The kit of claim 114, wherein the components of the kit are in the same container.
116. The kit of claim 114, wherein the components of the kit are in separate containers.
117. A sterile container comprising the CasY system of any one of claims 109-116.
118. The sterile container of claim 117, wherein said container is a syringe.
119. An implantable device comprising the CasY system of any one of claims 109-116.
120. The implantable device of claim 119, wherein the CasY system is within a matrix.
121. The implantable device of claim 119, wherein the CasY system is in a depot.
122. A method of identifying a CRISPR RNA-directed endonuclease, the method comprising:
detecting a nucleotide sequence encoding a Cas1 polypeptide in a plurality of metagenomic nucleotide sequences;
detecting a CRISPR array near the nucleotide sequence encoding the Cas1 polypeptide;
cloning a CRISPR locus comprising the detected CRISPR array, from a nucleic acid sample from which the plurality of metagenomic nucleotide sequences was derived, into an expression vector to generate a recombinant CRISPR locus expression vector;
assaying the recombinant CRISPR locus expression vector for the ability to cleave a target nucleic acid, wherein a CRISPR locus having the ability to cleave the target nucleic acid comprises a nucleotide sequence encoding a CRISPR RNA-directed endonuclease; and
identifying an open reading frame in the CRISPR locus that encodes a polypeptide having less than 20% amino acid sequence identity to the amino acid sequence of a known CRISPR RNA-directed endonuclease polypeptide.
123. The method of claim 122, wherein said assaying comprises introducing the recombinant CRISPR locus expression vector and a target nucleic acid into a cell.
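The computational parts of the method of claim 122 (finding a Cas1 coding sequence, finding a CRISPR array near it, and checking that a candidate open reading frame has less than 20% identity to known endonucleases) can be illustrated with a minimal Python sketch. Everything named here is hypothetical and is not taken from the application: the `Feature` record, the 10,000 bp `window` used to interpret "near", and the simple identity measure over pre-aligned sequences are all assumptions made for illustration. The cloning and cleavage-assay steps are laboratory procedures and are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """Hypothetical annotation record for one feature on a metagenomic contig."""
    contig: str
    kind: str   # "cas1" or "crispr_array" (labels assumed for this sketch)
    start: int
    end: int


def candidate_loci(features, window=10_000):
    """Pair each Cas1 gene with CRISPR arrays on the same contig within `window` bp.

    The window size is an assumption; the claim only says the array is "near".
    """
    cas1_genes = [f for f in features if f.kind == "cas1"]
    arrays = [f for f in features if f.kind == "crispr_array"]
    pairs = []
    for cas1 in cas1_genes:
        for array in arrays:
            if cas1.contig != array.contig:
                continue
            # Distance between the two intervals; 0 if they overlap.
            gap = max(array.start - cas1.end, cas1.start - array.end, 0)
            if gap <= window:
                pairs.append((cas1, array))
    return pairs


def percent_identity(aligned_a, aligned_b):
    """Percent identity over columns where neither pre-aligned sequence has a gap ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


def is_novel(alignments, threshold=20.0):
    """True if the candidate is below `threshold` percent identity to every known protein.

    `alignments` is a list of (aligned_candidate, aligned_known) string pairs.
    """
    return all(percent_identity(a, b) < threshold for a, b in alignments)
```

In practice the alignments would come from a dedicated alignment tool, and identity can be defined in more than one way (for example, relative to alignment length or to the shorter sequence), so the numbers produced by this sketch are only indicative.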
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US62/402,849 | 2016-09-30 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| HK40013668A (en) | 2020-08-07 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN110418647B (en) | | RNA-guided nucleic acid modification enzymes and methods of use thereof |
| US12264314B1 (en) | | CasZ compositions and methods of use |
| US20240301376A1 (en) | | Class 2 CRISPR/Cas compositions and methods of use |
| AU2017335890B2 (en) | | RNA-guided nucleic acid modifying enzymes and methods of use thereof |
| CN114040971A (en) | | CRISPR-Cas effector polypeptides and methods of use thereof |
| HK40013668A (en) | | RNA-guided nucleic acid modifying enzymes and methods of use thereof |
| HK40012328A (en) | | RNA-guided nucleic acid modifying enzymes and methods of use thereof |
| EA045278B1 (en) | | RNA-guided nucleic acid modifying enzymes and methods of their application |