
US20230227810A1 - Methods for generating barcoded combinatorial libraries - Google Patents

Methods for generating barcoded combinatorial libraries

Info

Publication number
US20230227810A1
Authority
US
United States
Prior art keywords
nucleic acid
sequence
cassette
target
nuclease
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
US18/157,740
Inventor
Ryan T. Gill
Andrew Garst
Tanya Elizabeth Warnecke Lipscomb
Marcelo Colika BASSALO
Ramsey Ibrahim ZEITOUN
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Muse Biotechnology Inc
Inscripta Inc
University of Colorado Colorado Springs
Original Assignee
Muse Biotechnology Inc
Inscripta Inc
University of Colorado Colorado Springs
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Muse Biotechnology Inc, Inscripta Inc, University of Colorado Colorado Springs
Priority to US18/157,740
Assigned to INSCRIPTA, INC. reassignment INSCRIPTA, INC. CHANGE OF NAME (SEE DOCUMENT FOR DETAILS). Assignors: MUSE BIOTECHNOLOGY, INC.
Assigned to THE REGENTS OF THE UNIVERSITY OF COLORADO, A BODY CORPORATE reassignment THE REGENTS OF THE UNIVERSITY OF COLORADO, A BODY CORPORATE ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: GARST, Andrew, GILL, RYAN T., ZEITOUN, RAMSEY IBRAHIM, BASSALO, MARCELO COLIKA
Assigned to MUSE BIOTECHNOLOGY, INC. reassignment MUSE BIOTECHNOLOGY, INC. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: WARNECKE LIPSCOMB, TANYA ELIZABETH
Publication of US20230227810A1
Assigned to UNITED STATES DEPARTMENT OF ENERGY reassignment UNITED STATES DEPARTMENT OF ENERGY CONFIRMATORY LICENSE (SEE DOCUMENT FOR DETAILS). Assignors: UNIVERSITY OF COLORADO

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00: Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09: Recombinant DNA-technology
    • C12N15/10: Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/1034: Isolating an individual clone by screening libraries
    • C12N15/1065: Preparation or screening of tagged libraries, e.g. tagged microorganisms by STM-mutagenesis, tagged polynucleotides, gene tags
    • C12N15/1079: Screening libraries by altering the phenotype or phenotypic trait of the host
    • C12N15/1082: Preparation or screening gene libraries by chromosomal integration of polynucleotide sequences, HR-, site-specific-recombination, transposons, viral vectors
    • C12N15/102: Mutagenizing nucleic acids
    • C12N15/11: DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C12N2310/00: Structure or type of the nucleic acid
    • C12N2310/10: Type of nucleic acid
    • C12N2310/20: Type of nucleic acid involving clustered regularly interspaced short palindromic repeats [CRISPR]

Definitions

  • compositions comprising: i) a first donor nucleic acid comprising: a) a modified first target nucleic acid sequence; b) a first protospacer adjacent motif (PAM) mutation; and c) a first guide nucleic acid sequence comprising a first spacer region complementary to a portion of the first target nucleic acid; and ii) a second donor nucleic acid comprising: a) a barcode corresponding to the modified first target nucleic acid sequence; and b) a second guide nucleic acid sequence comprising a second spacer region complementary to a portion of a second target nucleic acid.
  • compositions wherein the modified first target nucleic acid sequence comprises at least one inserted, deleted, or substituted nucleic acid compared to a corresponding un-modified first target nucleic acid.
  • compositions wherein the first guide nucleic acid and second guide nucleic acid are compatible with a nucleic acid-guided nuclease.
  • the nucleic acid-guided nuclease is a Type II or Type V Cas protein.
  • compositions wherein the nucleic acid-guided nuclease is a Cas9 homologue or a Cpf1 homologue.
  • compositions wherein the second donor nucleic acid comprises a second PAM mutation.
  • compositions wherein the second donor nucleic acid sequence comprises a regulatory sequence or a mutation to turn a screenable or selectable marker on or off.
  • compositions wherein the second donor nucleic acid sequence targets a unique landing site.
  • nucleic acid-guided nuclease is a CRISPR nuclease.
  • PAM mutation is not recognized by the nucleic acid-guided nuclease.
  • nucleic acid-guided nuclease is a Type II or Type V Cas protein.
  • nucleic acid-guided nuclease is a Cas9 homologue or a Cpf1 homologue.
  • the recorder cassette further comprises a second PAM mutation that is not recognized by the nucleic acid-guided nuclease.
  • nucleic acid sequence further comprises a regulatory sequence that turns transcription of a screenable or selectable marker on or off. Further disclosed are methods wherein the nucleic acid sequence further comprises a PAM mutation that is not compatible with the nucleic acid-guided nuclease. Further disclosed are methods wherein the nucleic acid sequence further comprises a second unique landing site for subsequent engineering rounds.
  • the polynucleotide further comprises an editing cassette comprising a) a modified first target nucleic acid sequence; b) a first protospacer adjacent motif (PAM) mutation; and c) a first guide nucleic acid sequence comprising a first spacer region complementary to a portion of the first target nucleic acid, wherein the unique barcode corresponds to the modified first target nucleic acid such that the modified target nucleic acid can be identified by the unique barcode.
  • compositions comprising i) a first donor nucleic acid comprising: a) a modified first target nucleic acid sequence; b) a mutant protospacer adjacent motif (PAM) sequence; and c) a first guide nucleic acid sequence comprising a first spacer region complementary to a portion of the first target nucleic acid; and ii) a second donor nucleic acid comprising: a) a recorder sequence; and b) a second guide nucleic acid sequence comprising a second spacer region complementary to a portion of the second target nucleic acid.
  • the first donor nucleic acid and the second donor nucleic acid are covalently linked or comprised on a single nucleic acid molecule.
  • compositions wherein the modified first target nucleic acid comprises a 5′ homology arm and a 3′ homology arm. Further provided are compositions wherein the 5′ homology arm and the 3′ homology arm are homologous to nucleic acid sequence flanking a protospacer complementary to the first spacer region. Further provided are compositions wherein the modified first target nucleic acid sequence comprises at least one inserted, deleted, or substituted nucleic acid compared to a corresponding un-modified first target nucleic acid. Further provided are compositions wherein the first gRNA is compatible with a nucleic acid-guided nuclease, thereby facilitating nuclease-mediated cleavage of the first target nucleic acid.
  • compositions wherein the nucleic acid-guided nuclease is a Cas protein, such as a Type II or Type V Cas protein. Further provided are compositions wherein the nucleic acid-guided nuclease is Cas9 or Cpf1. Further provided are compositions wherein the nucleic acid-guided nuclease is MAD2 or MAD7. Further provided are compositions wherein the nucleic acid-guided nuclease is an engineered or non-natural enzyme. Further provided are compositions wherein the nucleic acid-guided nuclease is an engineered or non-natural enzyme derived from Cas9 or Cpf1.
  • compositions wherein the nucleic acid-guided nuclease is an engineered or non-natural enzyme that has less than 80% homology to either Cas9 or Cpf1. Further provided are compositions wherein the mutant PAM sequence is not recognized by the nucleic acid-guided nuclease. Further provided are compositions wherein the recorder sequence comprises a barcode. Further provided are compositions wherein the recorder sequence comprises a fragment of a screenable or selectable marker. Further provided are compositions wherein the recorder sequence comprises a unique sequence by which the modified first target nucleic acid sequence is specifically identified. Further provided are compositions wherein the recorder sequence comprises a unique sequence by which the edited cells may be selected or enriched.
  • a first donor nucleic acid can be a cassette, such as an editing cassette as disclosed herein.
  • a second donor nucleic acid can be a cassette, such as a recording cassette as disclosed herein.
  • a first donor nucleic acid and a second donor nucleic acid can be comprised on a single cassette.
  • a first donor nucleic acid and a second donor nucleic acid can be covalently linked.
  • the elements of the cassette or donor nucleic acids can be contiguous or non-contiguous.
  • cells comprising an engineered chromosome or polynucleic acid comprising: a first modified sequence; a first mutant protospacer adjacent motif (PAM); a first recorder sequence, the sequence of which uniquely identifies the first modified sequence, wherein the first modified sequence and the first recorder sequence are separated by at least 1 bp. Further provided are cells wherein the first modified sequence and the first recorder sequence are separated by at least 100 bp. Further provided are cells wherein the first modified sequence and the first recorder sequence are separated by at least 500 bp. Further provided are cells wherein the first modified sequence and the first recorder sequence are separated by at least 1 kbp. Further provided are cells wherein the first recorder sequence is a barcode.
  • cells wherein the first modified sequence is within a coding sequence. Further provided are cells wherein the first modified sequence comprises at least one inserted, deleted, or substituted nucleotide compared to an unmodified sequence. Further provided are cells further comprising: a second modified sequence; a second mutant PAM; and a second recorder sequence, the sequence of which uniquely identifies the second modified sequence, wherein the second modified sequence and the second recorder sequence are separated by at least 1 kb. Further provided are cells wherein the first recorder sequence and the second recorder sequence are separated by less than 100 bp. Further provided are cells wherein the second recorder sequence is a barcode. Further provided are cells wherein the second modified sequence is within a coding sequence.
  • the second modified sequence comprises at least one inserted, deleted, or substituted nucleotide compared to an unmodified sequence.
  • the first recorder sequence and the second recorder sequence are immediately adjacent to each other or overlapping, thereby generating a combined recorder sequence.
  • the combined recorder sequence comprises a selectable or screenable marker.
  • the combined recorder sequence comprises a selectable or screenable marker by which the cells may be enriched or selected.
  • each cell comprises a first target nucleic acid, a second target nucleic acid, and a targetable nuclease
  • each polynucleotide comprises: i) a modified first target nucleic acid sequence; ii) a mutant protospacer adjacent motif (PAM) sequence; iii) a first guide nucleic acid sequence comprising a guide sequence complementary to a portion of the first target nucleic acid; and (iv) a recorder sequence
  • each polynucleotide further comprises a second mutant PAM sequence.
  • each polynucleotide further comprises a second guide nucleic acid sequence comprising a guide sequence complementary to a portion of the second target nucleic acid.
  • the recorder sequence comprises a unique sequence by which the modified first target nucleic acid is specifically identified upon sequencing the recorder sequence. Further provided are methods further comprising e) sequencing the recorder sequence, thereby identifying the modified first target nucleic acid that was inserted within the first target nucleic acid in step b).
  • inserting the modified first target nucleic acid sequence comprises cleaving the first target nucleic acid by the nuclease complexed with the transcription product of the first guide nucleic acid sequence. Further provided are methods wherein inserting the modified first target nucleic acid sequence further comprises homology-directed repair. Further provided are methods wherein inserting the modified first target nucleic acid sequence further comprises homologous recombination. Further provided are methods wherein the polynucleotide further comprises a second guide nucleic acid sequence comprising a spacer region complementary to a portion of the second target nucleic acid.
  • inserting the recorder sequence comprises cleaving the second target nucleic acid by the nuclease complexed with the transcription product of the second guide nucleic acid sequence. Further provided are methods wherein inserting the modified first target nucleic acid sequence further comprises homology-directed repair. Further provided are methods wherein inserting the modified first target nucleic acid sequence further comprises homologous recombination. Further provided are methods wherein the targetable nuclease is a Cas protein. Further provided are methods wherein the Cas protein is a Type II or Type V Cas protein. Further provided are methods wherein the Cas protein is Cas9 or Cpf1.
  • the targetable nuclease is a nucleic acid-guided nuclease. Further provided are methods wherein the targetable nuclease is MAD2 or MAD7. Further provided are methods wherein the mutant PAM sequence is not recognized by the targetable nuclease. Further provided are methods wherein the targetable nuclease is an engineered targetable nuclease. Further provided are methods wherein the mutant PAM sequence is not recognized by the engineered targetable nuclease.
  • each cell within the second population of cells comprises a third target nucleic acid, a fourth target nucleic acid, and a targetable nuclease.
  • each of the second polynucleotides comprises: i) a modified third target nucleic acid sequence; ii) a third mutant protospacer adjacent motif (PAM) sequence; iii) a third guide nucleic acid sequence comprising a spacer region complementary to a portion of the third target nucleic acid; and (iv) a second recorder sequence.
  • each second polynucleotide further comprises a fourth mutant PAM sequence.
  • each second polynucleotide further comprises a fourth guide nucleic acid sequence comprising a guide sequence complementary to a portion of the fourth target nucleic acid.
  • methods further comprising: a) inserting the modified third target nucleic acid sequence within the third target nucleic acid; b) inserting the second recorder sequence within the fourth target nucleic acid; c) cleaving the third target nucleic acid by the nuclease in cells that do not comprise the second mutant PAM sequence, thereby enriching for cells comprising the inserted modified third target nucleic acid sequence.
  • the fourth target nucleic acid is adjacent to the second target nucleic acid.
  • the inserted first recorder sequence is adjacent to the second recorder sequence, such that sequencing information can be obtained for the first and second recorder sequence from a single sequencing read.
  • identifying engineered cells comprising: a) providing cells, wherein each cell comprises a first target nucleic acid, a second target nucleic acid, and a targetable nuclease, b) introducing into the cells a polynucleotide comprising: 1) a first donor nucleic acid comprising i) a modified target nucleic acid sequence; ii) a mutant protospacer adjacent motif (PAM) sequence; and iii) a first guide nucleic acid sequence comprising a first guide sequence complementary to a portion of the first target nucleic acid; and 2) a second donor nucleic acid comprising i) a recorder sequence corresponding to the modified target nucleic acid sequence; and ii) a second guide nucleic acid sequence comprising a second guide sequence complementary to a portion of the second target nucleic acid, c) cleaving the first target nucleic acid by the nuclease in cells that do not comprise the mutant PAM sequence
  • the second donor nucleic acid further comprises a second mutant PAM sequence.
  • sequencing the recorder sequence array comprises obtaining sequence information for each of the plurality of recorder sequences within a single sequencing read. Further provided are methods wherein steps a)-c) are repeated at least once. Further provided are methods wherein steps a)-c) are repeated at least twice. Further provided are methods wherein the recorder sequence is a barcode. Further provided are methods wherein the first donor nucleic acid and the second donor nucleic acid are covalently linked.
  • identifying engineered cells comprising: a) providing cells, wherein each cell comprises a first target nucleic acid, a second target nucleic acid, and a targetable nuclease, b) introducing into the cells a polynucleotide comprising: 1) a first donor nucleic acid comprising i) a modified target nucleic acid sequence; ii) a mutant protospacer adjacent motif (PAM) sequence; and iii) a first guide nucleic acid sequence comprising a first guide sequence complementary to a portion of the first target nucleic acid; and 2) a second donor nucleic acid comprising i) a marker fragment corresponding to the modified target nucleic acid sequence; and ii) a second guide nucleic acid sequence comprising a second guide sequence complementary to a portion of the second target nucleic acid, c) cleaving the first target nucleic acid by the nuclease in cells that do not comprise the mutant PAM sequence,
  • the second donor nucleic acid further comprises a second mutant PAM sequence.
  • the complete marker comprises a selectable marker.
  • the selectable marker comprises an antibiotic resistance marker or an auxotrophic marker.
  • the complete marker comprises a screenable reporter.
  • the screenable reporter comprises a fluorescent reporter.
  • the screenable reporter comprises a gene.
  • the screenable reporter comprises a promoter or regulatory element.
  • the promoter or regulatory element turns on or off transcription of a screenable or selectable element.
  • the polynucleotide further comprises a second mutant nuclease recognition site. Further provided are methods wherein selecting for a phenotype of interest comprises cleaving the first target nucleic acid by the nuclease in cells that do not comprise the mutant nuclease recognition sequence, thereby enriching for cells comprising the inserted modified first target nucleic acid sequence. Further provided are methods wherein selecting for a phenotype of interest comprises cleaving the second target nucleic acid by the nuclease in cells that do not comprise the second mutant nuclease recognition sequence, thereby enriching for cells comprising the inserted modified first target nucleic acid sequence.
  • the recorder sequence is linked to the modified first target nucleic acid. Further provided are methods wherein the recorder sequence comprises a unique sequence by which the modified first target nucleic acid is specifically identified upon sequencing the recorder sequence. Further provided are methods further comprising e) sequencing the recorder sequence, thereby identifying the modified first target nucleic acid that was inserted within the first target nucleic acid in step b). Further provided are methods wherein inserting the modified first target nucleic acid sequence comprises homology-directed repair. Further provided are methods wherein inserting the modified first target nucleic acid sequence comprises homologous recombination. Further provided are methods wherein the nuclease is a Cas protein.
  • the polynucleotide further comprises a first guide nucleic acid sequence comprising a guide sequence complementary to a portion of the first target nucleic acid. Further provided are methods wherein inserting the modified first target nucleic acid sequence comprises cleaving the first target nucleic acid by the nuclease complexed with the transcription product of the first guide nucleic acid sequence. Further provided are methods wherein the polynucleotide further comprises a second guide nucleic acid sequence comprising a guide sequence complementary to a portion of the second target nucleic acid. Further provided are methods wherein inserting the recorder sequence comprises cleaving the second target nucleic acid by the nuclease complexed with the transcription product of the second guide nucleic acid sequence.
  • inserting the modified first target nucleic acid sequence or the recorder sequence comprises homology-directed repair. Further provided are methods wherein inserting the modified first target nucleic acid sequence or the recorder sequence comprises homologous recombination. Further provided are methods wherein the mutant nuclease recognition sequence comprises a mutant PAM sequence not recognized by the targetable nuclease. Further provided are methods wherein the Cas protein is a Type II or Type V Cas protein. Further provided are methods wherein the targetable nuclease is MAD2. Further provided are methods wherein the mutant PAM sequence is not recognized by MAD2. Further provided are methods wherein the targetable nuclease is MAD7. Further provided are methods wherein the mutant PAM sequence is not recognized by MAD7.
  • the Cas protein is Cas9. Further provided are methods wherein the mutant PAM sequence is not recognized by Cas9. Further provided are methods wherein the Cas protein is Cpf1. Further provided are methods wherein the mutant PAM sequence is not recognized by Cpf1. Further provided are methods wherein the nuclease is an Argonaute nuclease. Further provided are methods further comprising introducing guide DNA oligonucleotides comprising a guide sequence complementary to a portion of the first target nucleic acid prior to selecting for a phenotype. Further provided are methods wherein the mutant nuclease recognition sequence comprises a mutant target flanking sequence not recognized by the Argonaute nuclease.
  • nuclease is a zinc finger nuclease. Further provided are methods wherein the mutant nuclease recognition sequence is not recognized by the zinc finger nuclease. Further provided are methods wherein the nuclease is a transcription activator-like effector nuclease (TALEN). Further provided are methods wherein the mutant nuclease recognition sequence is not recognized by the TALEN.
  • FIGS. 1A-1C depict an example genetic engineering workflow including target design, plasmid design, and plasmid library generation.
  • Figure discloses SEQ ID NOS 187-190, respectively, in order of appearance.
  • FIGS. 2A-2D depict validation data for an example experiment using a disclosed engineering method.
  • FIGS. 3A-3C depict an example trackable genetic engineering workflow, including a plasmid comprising an editing cassette and a recording cassette, and downstream sequencing of barcodes in order to identify the incorporated edit or mutation.
  • Figure discloses SEQ ID NOS 191-192, respectively, in order of appearance.
  • FIGS. 3D-3E depict an example trackable genetic engineering workflow, including iterative rounds of engineering with a different editing cassette and recorder cassette with unique barcode (BC) at each round, followed by selection and tracking to confirm the successful engineering step at each round.
  • FIGS. 4A-4B depict an example of incorporation of a target mutation and PAM mutation using a plasmid comprising an editing cassette.
  • Figure discloses SEQ ID NOS 193, 193, 194, 193, 194, 193, 193, 195, 193, 196, 193, 197, 194, 193, and 198, respectively, in order of appearance.
  • FIGS. 5A-5B depict an example of a plasmid comprising an editing cassette, designed to incorporate a target mutation and a PAM mutation into a first target sequence, and a recording cassette, designed to incorporate a barcode sequence into a second target sequence.
  • FIG. 5B depicts example data validating incorporation of the editing cassette and recorder cassette and selection of the engineered bacterial cells.
  • FIG. 5A discloses the left column sequences as SEQ ID NOS 201, 200, 201, 200, 200, 200, 200, 200, 200, 200, 200, 201, 202, 200, 200, 200, 200, 200, 200, 200, 202, and 200, respectively, in order of appearance and the right column sequences as SEQ ID NOS 203, 204, 204, 204, 204, 204, 204, 204, 204, 204, 204, 204, 204, 204, 204, 203, 205, 205, 205, 205, 205, 205, 205, 205, 205, 205, 205, 205, 205, 205, 205, 205, and 205, respectively, in order of appearance.
  • FIG. 6 depicts an example recursive engineering workflow.
  • FIGS. 7A-7B depict an example plasmid curing workflow for combinatorial engineering and validation of an example experiment using said workflow.
  • FIGS. 8A-8B depict an example genetic engineering workflow including target design, plasmid design, and plasmid library generation.
  • Figure discloses SEQ ID NOS 187-190, respectively, in order of appearance.
  • FIGS. 9A-9D depict validation data for an example genetic engineering experiment.
  • FIGS. 10A-10F depict an example data set from a genetic engineering experiment.
  • FIGS. 11A-11C depict an example design and data set from a genetic engineering experiment.
  • FIGS. 12A-12F depict an example design for a genetic engineering experiment.
  • FIGS. 13A-13D depict example designed edits to be made by genetic engineering.
  • Figure discloses SEQ ID NOS 187-190 and 206-207, respectively, in order of appearance.
  • FIGS. 14 A- 14 B depict an example design for a genetic engineering experiment.
  • FIGS. 15 A- 15 D depict an example of Cas9 editing efficiency controls.
  • Figure discloses SEQ ID NOS 208-209, respectively, in order of appearance.
  • FIGS. 16 A- 16 E depict an examples of toxicity of dsDNA cleavage in E. coli.
  • FIG. 16 F- 16 H depict an example of a transformation and survival assay, and editing and recording efficiencies, with low and high copy plasmids expressing Cas9.
  • FIGS. 17 A- 17 D depict an example of genetic engineering strategy for gene deletion.
  • FIGS. 17 A and 17 C disclose SEQ ID NO: 210.
  • FIGS. 18 A- 18 B depict an example of editing efficiency controls by cotransformation of guide nucleic acid and linear dsDNA cassettes.
  • FIGS. 19 A- 19 D depict an example of library cloning analysis and statistics.
  • FIGS. 20 A- 20 B depict an example of precision of editing cassette tracking of recombineered populations.
  • FIG. 21 depicts an example of growth characteristics of folA mutations in M9 minimal media.
  • FIGS. 22 A- 22 C depict an example of enrichment profiles for folA editing cassettes in minimal media.
  • FIGS. 23 A- 23 F depict an example of validation of identified acrB mutations for improved solvent and antibiotic tolerance.
  • FIGS. 24 A- 24 D depict an example mutant variant assessment analysis.
  • FIG. 25 depicts an example of reconstruction of mutations identified by erythromycin selection.
  • FIGS. 26 A- 26 B depict an example of validation of Crp S28P mutation for furfural or thermal tolerance.
  • FIGS. 27 A- 27 C depict an example of edit and barcode correlation studies.
  • FIG. 28 depicts an example of a selectable recording strategy.
  • FIG. 29 depicts an example of a selectable recording strategy.
  • FIGS. 30 A- 30 B depict data from a selectable recording experiment.
  • Figure discloses “TCCACTGGTATGCAT” as SEQ ID NO: 211.
  • FIGS. 31 A- 31 B depict editing and transformation efficiencies from various nucleic acid-guided nucleases from an example experiment.
  • FIG. 32 depicts editing efficiencies of the MAD2 nuclease with various guide nucleic acids.
  • FIG. 33 depicts editing efficiencies of the MAD7 nuclease with various guide nucleic acids.
  • Methods and compositions for enabling sophisticated combinatorial engineering strategies to optimize and explore complex phenotypes are provided herein.
  • Many phenotypes of interest to basic research and biotechnology are the result of combinations of mutations that occur at distal loci.
  • cancer is often linked to mutations that influence multiple hallmark gene functions rather than a single chromosomal edit.
  • metabolic and regulatory processes that are the target of continuing engineering efforts require the activities of many proteins acting in concert to produce the phenotypic output of interest.
  • Methods and compositions disclosed herein can provide ways of rapid engineering and prototyping of such functions since they can provide rapid construction and accurate reporting on the mutational effects at many sites in parallel.
  • the methods and compositions described herein can be carried out or used in any type of cell in which a nucleic acid-guided nuclease system, such as CRISPR or Argonaute, or other targetable nuclease systems, such as TALEN, ZFN, or meganuclease can function (e.g., target and cleave DNA), including prokaryotic, eukaryotic, or archaeal cells.
  • the cell can be a bacterial cell, such as Escherichia spp. (e.g., E. coli ).
  • the cell can be a fungal cell, such as a yeast cell, e.g., Saccharomyces spp.
  • the cell can be a human cell.
  • the cell can be an algal cell, a plant cell, an insect cell, or a mammalian cell, including a human cell. Additionally or alternatively, the methods described herein can be carried out in vitro or in cell-free systems in which a nucleic acid guided nuclease system, such as CRISPR or Argonaute, or other nuclease systems, such as TALEN, ZFN, or meganuclease can function (e.g., target and cleave DNA).
  • Compositions and methods for genetic engineering
  • Disclosed are methods and compositions suitable for trackable or recursive genetic engineering. Disclosed methods and compositions can use massively multiplexed oligonucleotide synthesis and cloning to enable high fidelity, trackable, multiplexed genome editing at single nucleotide resolution on a whole genome scale.
  • Methods and compositions can be used to perform high-fidelity trackable editing, for example, at single-nucleotide resolution and can be used to perform editing at a whole genome scale or on episomal nucleic acid molecules.
  • Massively multiplexed oligonucleotide synthesis and/or cloning can be used in combination with a targetable nuclease system, such as a CRISPR system, MAD2 system, MAD7 system, or other nucleic acid-guided nuclease system, for editing.
  • cassette often refers to a single molecule polynucleotide.
  • a cassette can comprise DNA.
  • a cassette can comprise RNA.
  • a cassette can comprise a combination of DNA and RNA.
  • a cassette can comprise non-naturally occurring nucleotides or modified nucleotides.
  • a cassette can be single stranded.
  • a cassette can be double stranded.
  • a cassette can be synthesized as a single molecule.
  • a cassette can be assembled from other cassettes, oligonucleotides, or other nucleic acid molecules.
  • a cassette can comprise one or more elements.
  • Such elements can include, as non-limiting examples, one or more of any of editing sequences, recorder sequences, guide nucleic acids, promoters, regulatory elements, mutant PAM sequences, homology arms, primer sites, linker regions, unique landing sites, a cassette, and any other element disclosed herein. Such elements can be in any order or combination. Any two or more elements can be contiguous or non-contiguous.
  • a cassette can be comprised within a larger polynucleic acid. Such a larger polynucleic acid can be linear or circular, such as a plasmid or viral vector.
  • a cassette can be a synthesized cassette.
  • a cassette can be a trackable cassette.
  • a cassette can be designed to be used in any method or composition disclosed herein, including multiplex engineering methods and trackable engineering methods.
  • An exemplary cassette can couple two or more elements, such as 1) a guide nucleic acid (e.g. gRNAs or gDNAs) designed for targeting a user specified target sequence in the genome and 2) an editing sequence and/or recorder sequence as disclosed herein (e.g. FIG. 1 B and FIG. 5 A ).
  • a cassette comprising an editing sequence and guide nucleic acid can be referred to as an editing cassette.
  • a cassette comprising an editing sequence can be referred to as an editing cassette.
  • a cassette comprising a recorder sequence and a guide nucleic acid can be referred to as a recorder cassette.
  • a cassette comprising a recorder sequence can be referred to as a recorder cassette.
  • an editing cassette and a recorder cassette are delivered into the cell at the same time.
  • an editing cassette and a recorder cassette may be covalently linked.
  • these elements may be synthesized together by multiplexed oligonucleotide synthesis.
  • a cassette can comprise one or more guide nucleic acids and editing cassette as a contiguous polynucleotide. In other examples, one or more guide nucleic acids and editing cassette are contiguous. In other examples, one or more guide nucleic acids and editing cassette are non-contiguous. In other examples, two or more guide nucleic acids and editing cassette are non-contiguous.
  • a cassette can comprise one or more guide nucleic acids, an editing cassette, and a recorder cassette as a contiguous polynucleotide.
  • one or more guide nucleic acids, editing cassette, and recorder cassette are contiguous.
  • two or more guide nucleic acids, editing cassette, and recorder cassette are contiguous.
  • one or more guide nucleic acids, editing cassette, and recorder cassette are non-contiguous.
  • two or more guide nucleic acids, editing cassette, and recorder cassette are non-contiguous.
  • a cassette can comprise one or more guide nucleic acids, one or more editing cassettes, and one or more recorder cassettes as a contiguous polynucleotide.
  • one or more guide nucleic acids, one or more editing cassettes, and one or more recorder cassettes are contiguous.
  • two or more guide nucleic acids, two or more editing cassettes, and two or more recorder cassettes are contiguous.
  • one or more guide nucleic acids, one or more editing cassettes, and one or more recorder cassettes are non-contiguous.
  • two or more guide nucleic acids, two or more editing cassettes, and two or more recorder cassettes are non-contiguous.
  • a cassette can comprise one or more guide nucleic acids and editing sequence as a contiguous polynucleotide. In other examples, one or more guide nucleic acids and editing sequence are contiguous. In other examples, one or more guide nucleic acids and editing sequence are non-contiguous. In other examples, two or more guide nucleic acids and editing sequence are non-contiguous.
  • a cassette can comprise one or more guide nucleic acids, an editing sequence, and a recorder sequence as a contiguous polynucleotide.
  • one or more guide nucleic acids, editing sequence, and recorder sequence are contiguous.
  • two or more guide nucleic acids, editing sequence, and recorder sequence are contiguous.
  • one or more guide nucleic acids, editing sequence, and recorder sequence are non-contiguous.
  • two or more guide nucleic acids, editing sequence, and recorder sequence are non-contiguous.
  • a cassette can comprise one or more guide nucleic acids, one or more editing sequences, and one or more recorder sequences as a contiguous polynucleotide.
  • one or more guide nucleic acids, one or more editing sequences, and one or more recorder sequences are contiguous.
  • two or more guide nucleic acids, two or more editing sequences, and two or more recorder sequences are contiguous.
  • one or more guide nucleic acids, one or more editing sequences, and one or more recorder sequences are non-contiguous.
  • two or more guide nucleic acids, two or more editing sequences, and two or more recorder sequences are non-contiguous.
  • An editing cassette can comprise an editing sequence.
  • An editing sequence can comprise a mutation, such as a synonymous or non-synonymous mutation, and homology arms (HAs).
  • An editing sequence can comprise a mutation, such as a synonymous or non-synonymous mutation, and homology arms (HAs) designed to undergo homologous recombination with the target sequence at the site of nucleic acid-guided nuclease-mediated double strand break (e.g. FIG. 1 B ).
  • a recorder cassette can comprise a recorder sequence.
  • a recorder sequence can comprise a trackable sequence, such as a barcode or marker, and homology arms (HAs).
  • a recorder sequence can comprise a trackable sequence, such as a barcode or marker, and homology arms (HAs) designed to undergo homologous recombination with the chromosome at the site of nucleic acid-guided nuclease-mediated double strand break (e.g. FIG. 1 B ).
  • a cassette can encode machinery (e.g. targetable nuclease, guide nucleic acid, editing cassette, and/or recorder cassette as disclosed herein) necessary to induce strand breakage as well as designed repair that can be selectively enriched and/or tracked in cells.
  • a cell can be any cell such as eukaryotic cell, archaeal cell, prokaryotic cell, or microorganisms such as E. coli (e.g. FIG. 2 A- 2 D ).
  • a cassette can comprise an editing cassette.
  • a cassette can comprise a recorder cassette.
  • a cassette can comprise a guide nucleic acid and an editing cassette.
  • a cassette can comprise a guide nucleic acid and a recorder cassette.
  • a cassette can comprise a guide nucleic acid, an editing cassette, and a recorder cassette.
  • a cassette can comprise two guide nucleic acids, an editing cassette, and a recorder cassette.
  • a cassette can comprise more than two guide nucleic acids, one or more editing cassettes, and one or more recorder cassettes.
  • a cassette can comprise an editing sequence.
  • a cassette can comprise a recorder sequence.
  • a cassette can comprise a guide nucleic acid and an editing sequence.
  • a cassette can comprise a guide nucleic acid and a recorder sequence.
  • a cassette can comprise a guide nucleic acid, an editing sequence, and a recorder sequence.
  • a cassette can comprise two guide nucleic acids, an editing sequence, and a recorder sequence.
  • a cassette can comprise more than two guide nucleic acids, one or more editing sequences, and one or more recorder sequences.
  • Single genome edits can be tracked using sequencing technologies, e.g. short read sequencing technologies (e.g. FIG. 1 C ), long read sequencing technologies, or any other sequencing technologies known in the art.
  • each editing cassette upon transformation, each editing cassette generates the designed genetic modification within the transformed cell.
  • the editing cassette can act in trans as a barcode of the genetic mutation introduced by the editing cassette and can enable the tracking of this mutation frequency in a complex population over time and across many different growth conditions (e.g. FIG. 2 A- 2 D and FIG. 1 C ).
  • a recording cassette inserts the designed trackable sequence, such as a marker or barcode sequence, within the transformed cell.
  • the recorder cassette can act in cis as a barcode of the chromosomal mutation and can enable the tracking of this mutation frequency in a complex population over time and across many different growth conditions.
  • the methods provided herein simplify sample preparation and depth of coverage for mapping diversity genome wide, and provide powerful tools for engineering on a genome scale (e.g. FIG. 1 C ).
  • a plurality of cassettes can be pooled into a library of cassettes.
  • a library of cassettes can comprise at least 2 cassettes.
  • a library of cassettes can comprise from 5 to a million cassettes.
  • a library of cassettes can comprise at least a million cassettes. It should be understood that a library of cassettes can comprise any number of cassettes.
  • a library of cassettes can comprise cassettes that have any combination of common elements and non-common or unique elements as compared to the other cassettes within the pool.
  • a library of cassettes can comprise common priming sites or common homology arms while also containing non-common or unique barcodes.
  • Common elements can be shared by a plurality, majority, or all of the cassettes within a library of cassettes.
  • Non-common elements can be shared by a plurality, minority, or sub-population of cassettes within the library of cassettes.
  • Unique elements can be shared by one, a few, or a sub-population of cassettes within the library of cassettes, such that the unique element can identify or distinguish the one, few, or sub-population of cassettes from the other cassettes within the library of cassettes. Such combinations of common and non-common elements are advantageous for multiplexing techniques as disclosed herein.
  • Cassettes disclosed herein can generate the designed genetic modification or insert the designed marker or barcode sequence with high efficiency within a transformed cell.
  • the efficiency is greater than 50%.
  • the efficiency is 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, 99%, or 100% (e.g., FIGS. 32 A, 32 B, and 33 ).
  • transformation, editing, and/or recording efficiency can be increased by modulating the expression of one or more components disclosed herein, such as a nucleic acid-guided nuclease.
  • Methods for modulating components are disclosed herein and are known in the art. Such methods can include expressing a component, such as a nucleic acid-guided nuclease or CRISPR enzyme of a subject system on a low or high copy plasmid, depending on the experimental design.
  • a cassette can comprise a cassette as disclosed herein.
  • a cassette can comprise any combination of an editing cassette and/or recorder cassette disclosed herein.
  • Such a cassette can be comprised on a larger polynucleic acid molecule.
  • Such a larger polynucleic acid molecule can be linear or circular, such as a plasmid or viral vector.
  • An editing cassette can comprise a mutation relative to a target nucleic acid sequence.
  • the editing cassette can comprise sequence homologous to the target sequence flanking the desired mutation or editing sequence.
  • the editing cassette can comprise a region which recognizes, or hybridizes to, a target sequence of a nucleic acid in a cell or population of cells, is homologous to the target sequence of the nucleic acid of the cell and includes a mutation, or a desired mutation, of at least one nucleotide relative to the target sequence.
  • An editing cassette can comprise a first editing sequence comprising a first mutation relative to a target sequence.
  • a first mutation can comprise a mutation such as an insertion, deletion, or substitution of at least one nucleotide compared to the non-editing target sequence.
  • the mutation can be incorporated into a coding region or non-coding region.
  • An editing cassette can comprise a second editing sequence comprising a second mutation relative to a target sequence.
  • the second mutation can be designed to mutate or otherwise silence a PAM sequence such that a corresponding nucleic acid guided nuclease or CRISPR nuclease is no longer able to cleave the target sequence.
  • this mutation or silencing of a PAM can serve as a method for selecting transformants in which the first editing sequence has been incorporated.
  • an editing cassette comprises at least two mutations, wherein one mutation is a PAM mutation.
  • the PAM mutation can be in a second editing cassette.
  • Such a second editing cassette can be covalently linked and can be contiguous or non-contiguous to the other elements in the cassette.
  • An editing cassette can comprise a guide nucleic acid, such as a gRNA encoding gene, optionally operably linked to a promoter.
  • the guide nucleic acid can be designed to hybridize with the targeted nucleic acid sequence in which the editing sequence will be incorporated.
  • a recording cassette can comprise a recording sequence.
  • a recorder sequence can comprise a barcoding sequence, or other screenable or selectable marker or fragment thereof.
  • the recording sequence can be comprised within a recorder cassette.
  • Recorder cassettes can comprise regions homologous to an insertion site within a target nucleic acid sequence such that the recording sequence is incorporated by homologous recombination or homology-driven repair systems.
  • the site of incorporation of the recording cassette can be comprised on the same DNA molecule as the target nucleic acid to be edited by an editing cassette.
  • the recorder sequence can comprise a barcode, unique DNA sequence, and/or a complete copy or fragment of a selectable or screenable element or marker.
  • a recorder cassette can comprise a mutation relative to the target sequence.
  • the mutation can be designed to mutate or otherwise silence a PAM sequence such that a corresponding nucleic acid guided nuclease or CRISPR nuclease is no longer able to cleave the target sequence. In such cases, this mutation or silencing of a PAM site can serve as a method for selecting transformants in which the first recording sequence has been incorporated.
  • a recorder cassette can comprise a PAM mutation.
  • the PAM mutation can be designed to mutate or otherwise silence a PAM site such that a corresponding CRISPR nuclease is no longer able to cleave the target sequence. In such cases, this mutation or silencing of a PAM site can serve as a method for selecting transformants in which the recorder sequence has been incorporated.
  • a recorder cassette can comprise a guide nucleic acid, such as a gene encoding a gRNA.
  • a promoter can be operably linked to a nucleic acid sequence encoding a guide nucleic acid capable of targeting a nucleic acid-guided nuclease to the desired target sequence.
  • a guide nucleic acid can target a unique site within the target site. In some cases, the guide nucleic acid targets a unique landing site that was incorporated in a prior round of engineering. In some cases, the guide nucleic acid targets a unique landing site that was incorporated by a recorder cassette in a prior round of engineering.
  • a recorder cassette can comprise a barcode.
  • a barcode can be a unique barcode or relatively unique such that the corresponding mutation can be identified based on the barcode. In some examples, the barcode is a non-naturally occurring sequence that is not found in nature. In most examples, the combination of the desired mutation and the barcode within the editing cassette is non-naturally occurring and not found in nature.
  • a barcode can be any number of nucleotides in length.
  • a barcode can be 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, or more than 30 nucleotides in length. In some cases, the barcode is more than 30 nucleotides in length.
  • a barcode can be generated by degenerate oligonucleotide synthesis.
  • a barcode can be rationally designed or user-specified.
  • a recorder cassette can comprise a landing site.
  • a landing site can serve as a target site for a recorder cassette for a successive engineering round.
  • a landing site can comprise a PAM.
  • a landing site can be a unique sequence.
  • a landing site can be at least about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50 nucleotides in length. In some cases, the landing site is greater than 50 nucleotides in length.
  • a recorder cassette can comprise a selectable or screenable marker, or a regulatory sequence or mutation that turns a selectable or screenable marker on or off.
  • the turning on or off of a selectable marker can be used for selection or counter-selection, respectively, of iterative rounds of engineering.
  • An example regulatory sequence includes a ribosome-binding site (RBS), though other such regulatory sequences are envisioned.
  • Mutations that turn a selectable or screenable marker on can include any possible start codon that is recognized by the host translation machinery.
  • a mutation that turns off a selectable or screenable marker includes a mutation that deletes a start codon or one that inserts a premature stop codon or a reading frame shift mutation.
  • a recorder cassette can comprise one or more of a guide nucleic acid targeting a target site into which the recorder sequence is to be incorporated, a PAM mutation to silence a PAM used by the guide RNA, a barcode corresponding to an editing cassette, a unique site to serve as a landing site for a recorder cassette of a subsequent round of engineering, and a regulatory sequence or mutation that turns a screenable or selectable marker on or off, these one or more elements being flanked by homology arms that are designed to promote recombination of these one or more elements into the cleaved target site that is targeted by the guide RNA.
  • a recorder cassette can comprise a first homology arm, a PAM mutation, a barcode, a unique landing site, a regulatory sequence or mutation for a screenable or selectable marker, a second homology arm, and guide RNA.
  • the first homology arm can be an upstream homology arm.
  • the second homology arm can be a downstream homology arm.
  • the homology arms can be homologous to sequences flanking a cleavage site that is targeted by the guide RNA.
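  • The element order of the example recorder cassette described above (first homology arm, PAM mutation, barcode, unique landing site, regulatory sequence or mutation, second homology arm, guide RNA) can be sketched as a simple record. This is only an illustration: the class name, field names, and the idea of representing each element as a plain sequence string are assumptions, not part of the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecorderCassette:
    """Hypothetical container for the recorder cassette elements, in the order listed above."""

    upstream_arm: str     # first (upstream) homology arm
    pam_mutation: str     # segment carrying the PAM-silencing mutation
    barcode: str          # trackable barcode
    landing_site: str     # unique landing site for a subsequent round
    marker_switch: str    # regulatory sequence or mutation for a marker
    downstream_arm: str   # second (downstream) homology arm
    guide: str            # sequence encoding the guide RNA

    def sequence(self) -> str:
        """Concatenate the elements in the order they are declared."""
        return "".join(
            (
                self.upstream_arm,
                self.pam_mutation,
                self.barcode,
                self.landing_site,
                self.marker_switch,
                self.downstream_arm,
                self.guide,
            )
        )


if __name__ == "__main__":
    # Placeholder sequences only; real elements would be far longer.
    cassette = RecorderCassette("AAAA", "C", "GGGG", "TTTT", "ACG", "CCCC", "GT")
    print(cassette.sequence())
```

  • Because the disclosure states that elements can be in any order or combination, the fixed order here reflects only the one example arrangement listed above.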
  • a cassette can comprise two guide nucleic acids designed to target two distinct target nucleic acid sequences.
  • the guide nucleic acid can comprise a single gRNA or chimeric gRNA consisting of crRNA and trRNA sequences, or alternatively, the gRNA can comprise separated crRNA and trRNAs, or a guide nucleic acid can comprise a crRNA.
  • guide nucleic acid can be introduced simultaneously with a trackable polynucleic acid or plasmid comprising an editing cassette and/or recorder cassette. In these cases, the guide nucleic acid can be encoded on a separate plasmid or be delivered in RNA form via delivery methods well known in the art.
  • a cassette can comprise a gene encoding a nucleic acid-guided nuclease, such as a CRISPR nuclease, functional with the chosen guide nucleic acid.
  • a nucleic acid-guided nuclease or CRISPR nuclease gene can be provided on a separate plasmid.
  • a nucleic acid-guided nuclease or CRISPR nuclease can be provided on the genome or episomal plasmid of a host organism to which a trackable polynucleic acid or plasmid will be introduced.
  • the nucleic acid-guided nuclease or CRISPR nuclease gene can be operably linked to a constitutive or inducible promoter.
  • a nucleic acid-guided nuclease or CRISPR nuclease can be provided as mRNA or polypeptide using delivery systems well known in the art.
  • Such mRNA or polypeptide delivery systems can include, but are not limited to, nanoparticles, viral vectors, or other cell-permeable technologies.
  • a cassette can comprise a selectable or screenable marker, for example, such as that comprised within a recorder cassette.
  • the recorder cassette can comprise a barcode, such as a trackable nucleic acid sequence which can be uniquely correlated with a genetic mutation of the corresponding editing cassette, or otherwise identifiably correlated with such a genetic mutation such that sequencing the barcode will allow identification of the corresponding genetic mutation introduced by the editing cassette.
  • a recorder cassette can comprise a complete copy or a fragment of an antibiotic resistance gene, or of a gene encoding an auxotrophic marker, fluorescent protein, or other known selectable or screenable marker.
  • a trackable library can comprise a plurality of cassettes as disclosed herein.
  • a trackable library can comprise a plurality of trackable polynucleic acids or plasmids comprising a cassette as disclosed herein.
  • a cassette, polynucleotide, or plasmid comprising a recorder sequence or recorder cassette as disclosed herein can be referred to as a trackable cassette, polynucleotide, or plasmid.
  • a cassette, polynucleotide, or plasmid comprising an editing sequence or editing cassette as disclosed herein can be referred to as a trackable cassette, polynucleotide, or plasmid.
  • Within the trackable library are distinct editing cassette and recorder cassette combinations that are sequenced to determine which editing sequence corresponds with a given marker or barcode sequence comprised within the recorder cassette. Therefore, when the editing and recorder sequences are incorporated into a target sequence, the edit that was incorporated can be determined by sequencing the recorder sequence. Sequencing the recorder sequence or barcode can significantly cut down on sequencing time and cost.
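  • As an illustration of how sequencing the barcode alone can identify the incorporated edit, the following minimal sketch looks up each observed barcode in a barcode-to-edit table and reports the relative frequency of each edit. The function name, the table, and the barcode and edit labels are hypothetical; the sketch also assumes exact barcode matches and ignores sequencing errors.

```python
from collections import Counter
from typing import Dict, Iterable


def edit_frequencies(
    barcode_reads: Iterable[str], barcode_to_edit: Dict[str, str]
) -> Dict[str, float]:
    """Map observed barcodes to edits and return each edit's share of mapped reads."""
    counts: Counter = Counter()
    for barcode in barcode_reads:
        edit = barcode_to_edit.get(barcode)
        if edit is not None:  # reads with unknown barcodes are skipped
            counts[edit] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {edit: n / total for edit, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical lookup table built when the library was sequenced.
    lookup = {"AAAC": "edit_A", "GGGT": "edit_B"}
    reads = ["AAAC", "AAAC", "GGGT", "TTTT"]  # "TTTT" is not in the table
    print(edit_frequencies(reads, lookup))
```

  • Comparing such frequencies between time points or growth conditions is one way the tracking described above could be carried out.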
  • Library size can depend on the experiment design. For example, if the aim is to edit each amino acid within a protein of interest, then the library size can depend on the number (N) of amino acids in a protein of interest, with a full saturation library (all 20 amino acids at each position or non-naturally occurring amino acids) scaling as 19 (or more) × N and an alanine-mapping library scaling as 1 × N.
  • Screening of even very large proteins of more than 1,000 amino acids can be tractable given current multiplex oligo synthesis capabilities (e.g. 120,000 oligos).
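  • The library size scaling described above can be checked with a short calculation. The sketch below is illustrative only: the function names are assumptions, and the capacity constant simply reuses the 120,000-oligo example figure given above.

```python
# Example multiplex synthesis capacity quoted in the text (an illustrative constant).
OLIGO_CAPACITY = 120_000


def saturation_library_size(n_residues: int, substitutions_per_site: int = 19) -> int:
    """Cassettes needed to place every alternative amino acid at every position."""
    return substitutions_per_site * n_residues


def alanine_scan_library_size(n_residues: int) -> int:
    """Cassettes needed for one alanine substitution per position."""
    return 1 * n_residues


def fits_in_capacity(library_size: int, capacity: int = OLIGO_CAPACITY) -> bool:
    """True if the library does not exceed the given oligo synthesis capacity."""
    return library_size <= capacity


if __name__ == "__main__":
    n = 1000  # a large protein of 1,000 amino acids
    full = saturation_library_size(n)
    print(full, fits_in_capacity(full))   # 19000 True
    print(alanine_scan_library_size(n))   # 1000
```

  • Under these assumptions, a full saturation library for a 1,000-residue protein needs 19,000 cassettes, well within the quoted capacity.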
  • more general properties with developed high-throughput screens and selections can be efficiently tested using the libraries disclosed herein.
  • libraries can be designed to mutate any number of amino acids within a target protein, including 1, at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, etc. up to the total number of amino acids within a target protein.
  • select amino acids can be targeted, such as catalytically active amino acids, or those involved in protein-protein interactions.
  • Each amino acid that is targeted for mutation can be mutated into any number of alternate amino acids, such as any other natural or non-naturally occurring amino acid or amino acid analog.
  • all targeted amino acids are mutated to the same amino acid, such as alanine.
  • the targeted amino acids are independently mutated to any other amino acid in any combination or permutation.
  • Trackable libraries can comprise trackable mutations in individual residues or sequences of interest. Trackable libraries can be generated using custom-synthesized oligonucleotide arrays. Trackable plasmids can be generated using any cloning or assembly methods known in the art. For example, CREATE-Recorder plasmids can be generated by chemical synthesis, Gibson assembly, SLIC, CPEC, PCA, ligation-free cloning, other in vitro oligo assembly techniques, traditional ligation-based cloning, or any combination thereof.
  • Recorder sequences, such as barcodes, can be designed in silico via standard code with a degenerate mutation at the target codon.
  • the degenerate mutation can comprise 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, or more than 30 nucleic acid residues.
  • the degenerate mutations can comprise 15 nucleic acid residues (N15).
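  • To give a sense of the diversity available from a 15-residue degenerate barcode (N15), the following sketch computes the size of the sequence space and draws a set of distinct random barcodes in silico. The function names and the seeded random generator are assumptions made for illustration; degenerate oligonucleotide synthesis itself is a chemical process and is not modeled here.

```python
import random
from typing import List

BASES = "ACGT"


def barcode_space(length: int = 15) -> int:
    """Number of distinct sequences of the given length over A, C, G, T."""
    return len(BASES) ** length


def random_barcodes(count: int, length: int = 15, seed: int = 0) -> List[str]:
    """Draw `count` distinct random barcodes of the given length."""
    if count > barcode_space(length):
        raise ValueError("more barcodes requested than the sequence space allows")
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    seen = set()
    while len(seen) < count:
        seen.add("".join(rng.choice(BASES) for _ in range(length)))
    return sorted(seen)


if __name__ == "__main__":
    print(barcode_space(15))  # 1073741824, i.e. 4**15
    for barcode in random_barcodes(3):
        print(barcode)
```

  • An N15 barcode therefore offers roughly one billion possible sequences, far more than the example library sizes discussed above.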
  • Homology arms can be added to a recorder sequence and/or editing sequence to allow incorporation of the recorder and/or editing sequence into the desired location via homologous recombination or homology-driven repair.
  • Homology arms can be added by synthesis, in vitro assembly, PCR, or other known methods in the art.
  • homology arms can be assembled via overlapping oligo extension, Gibson assembly, or any other method disclosed herein.
  • a homology arm can be added to both ends of a recorder and/or editing sequence, thereby flanking the sequence with two distinct homology arms, for example, a 5′ homology arm and a 3′ homology arm.
  • the same 5′ and 3′ homology arms can be added to a plurality of distinct recorder sequences, thereby generating a library of unique recorder sequences that each have the same spacer target or targeted insertion site.
  • the same 5′ and 3′ homology arms can be added to a plurality of distinct editing sequences, thereby generating a library of unique editing sequences that each have the same spacer target or targeted insertion site.
  • different or a variety of 5′ or 3′ homology arms can be added to a plurality of recorder sequences or editing sequences.
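  • The case in which the same 5′ and 3′ homology arms are added to a plurality of distinct recorder or editing sequences can be sketched in silico as simple concatenation. The function name and the short placeholder sequences are hypothetical; real homology arms would be considerably longer and chosen to match the intended insertion site.

```python
from typing import Iterable, List

VALID_BASES = set("ACGT")


def flank_with_homology_arms(
    payloads: Iterable[str], arm_5p: str, arm_3p: str
) -> List[str]:
    """Return each payload flanked by the same 5' and 3' homology arms."""
    payload_list = list(payloads)
    for seq in (arm_5p, arm_3p, *payload_list):
        if not seq or set(seq) - VALID_BASES:
            raise ValueError(f"not a plain DNA sequence: {seq!r}")
    return [arm_5p + payload + arm_3p for payload in payload_list]


if __name__ == "__main__":
    # Placeholder arms and barcodes, for illustration only.
    library = flank_with_homology_arms(["ACGT", "TTGA"], arm_5p="GGCC", arm_3p="AATT")
    for construct in library:
        print(construct)
```

  • Every construct in the resulting list shares the same arms and so targets the same insertion site, while the payload between the arms remains unique.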
  • a recorder sequence library comprising flanking homology arms can be cloned into a vector backbone.
  • the recorder sequence and homology arms are cloned into a recorder cassette.
  • Recorder cassettes can, in some cases, further comprise a nucleic acid sequence encoding a guide nucleic acid or gRNA engineered to target the desired site of recorder sequence insertion.
  • the nucleic acid sequences flanking the CRISPR/Cas-mediated cleavage site are homologous or substantially homologous to the homology arms comprised within the recorder cassette.
  • An editing sequence library comprising flanking homology arms can be cloned into a vector backbone.
  • the editing sequence and homology arms are cloned into an editing cassette.
  • Editing cassettes can, in some cases, further comprise a nucleic acid sequence encoding a guide nucleic acid or gRNA engineered to target the desired site of editing sequence insertion.
  • the nucleic acid sequences flanking the CRISPR/Cas-mediated cleavage site are homologous or substantially homologous to the homology arms comprised within the editing cassette.
  • Gene-wide or genome-wide editing libraries can be subcloned into a vector backbone.
  • the vector backbone comprises a recorder cassette as disclosed herein.
  • the editing sequence library can be inserted or assembled into a second site to generate competent trackable plasmids that can embed the recording barcode at a fixed locus while integrating the editing libraries at a wide variety of user defined sites.
  • a recorder sequence and/or cassette can be assembled or inserted into a vector backbone first, followed by insertion of an editing sequence and/or cassette.
  • an editing sequence and/or cassette can be inserted or assembled into a vector backbone first, followed by insertion of a recorder sequence and/or cassette.
  • a recorder sequence and/or cassette and an editing sequence and/or cassette are simultaneously inserted or assembled into a vector.
  • a recorder sequence and/or cassette and an editing sequence and/or cassette are comprised on the same cassette prior to simultaneous insertion or assembly into a vector.
  • a recorder sequence and/or cassette and an editing sequence and/or cassette are linked prior to simultaneous insertion or assembly into a vector.
  • a recorder sequence and/or cassette and an editing sequence and/or cassette are covalently linked prior to simultaneous insertion or assembly into a vector. In any of these cases, trackable plasmids or plasmid libraries can be generated.
  • a cassette or nucleic acid molecule can be synthesized which comprises one or more elements disclosed herein.
  • a nucleic acid molecule can be synthesized that comprises an editing cassette and a guide nucleic acid.
  • a nucleic acid molecule can be synthesized that comprises an editing cassette and a recorder cassette.
  • a nucleic acid molecule can be synthesized that comprises an editing cassette, a guide nucleic acid, and a recorder cassette.
  • a nucleic acid molecule can be synthesized that comprises an editing cassette, a recorder cassette, and two guide nucleic acids.
  • a nucleic acid molecule can be synthesized that comprises a recorder cassette and a guide nucleic acid.
  • a nucleic acid molecule can be synthesized that comprises a recorder cassette.
  • a nucleic acid molecule can be synthesized that comprises an editing cassette.
  • the guide nucleic acid can optionally be operably linked to a promoter.
  • the nucleic acid molecule can further include one or more barcodes.
  • cassettes or synthesized nucleic acid molecules can be synthesized using any oligonucleotide synthesis method known in the art.
  • cassettes can be synthesized by array based oligonucleotide synthesis.
  • the oligonucleotides can be cleaved from the array. Cleavage of oligonucleotides from an array can create a pool of oligonucleotides.
  • Software and automation methods can be used for multiplex synthesis and generation. For example, software and automation can be used to create 10, 10², 10³, 10⁴, 10⁵, 10⁶, or more cassettes, such as trackable cassettes.
  • An automation method can generate trackable plasmids in rapid fashion. Trackable cassettes can be processed through a workflow with minimal steps to produce precisely defined genome-wide libraries.
  • Cassette libraries, such as trackable cassette libraries, can be generated which comprise two or more nucleic acid molecules or plasmids comprising any combination disclosed herein of recorder sequence, editing sequence, guide nucleic acid, and optional barcode, including combinations of one or more of any of the previously mentioned elements.
  • such a library can comprise at least 2, 3, 4, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 600, 700, 800, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500, 5000, 5500, 6000, 6500, 7000, 7500, 8000, 8500, 9000, 9500, 10⁴, 10⁵, 10⁶, 10⁷, 10⁸, 10⁹, 10¹⁰, or more nucleic acid molecules or plasmids of the present disclosure. It should be understood that such a library can include any number of nucleic acid molecules or plasmids, even if the specific number is not explicitly listed above.
  • Cassettes or cassette libraries can be sequenced in order to determine the recorder sequence and editing sequence pair that is comprised on each cassette.
  • a known recorder sequence is paired with a known editing sequence during the library generation process.
  • Other methods of determining the association between a recorder sequence and editing sequence comprised on a common nucleic acid molecule or plasmid are envisioned such that the editing sequence can be identified by identification or sequencing of the recorder sequence.
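  The recorder-to-edit association described above can be pictured as a lookup table. The following is a minimal sketch, assuming the recorder/editing pairs have already been extracted from sequencing data; the function name, the barcodes, and the edit labels are hypothetical.

```python
def build_recorder_map(pairs):
    """Map each recorder sequence to its paired editing sequence.

    Raises ValueError if one recorder is paired with two different edits,
    since the recorder would then no longer identify a single edit.
    """
    mapping = {}
    for recorder, edit in pairs:
        if mapping.setdefault(recorder, edit) != edit:
            raise ValueError(f"recorder {recorder} maps to more than one edit")
    return mapping


# Hypothetical pairs recovered by sequencing a cassette library.
pairs = [("AAACCCGGGTTTAAA", "edit_1"), ("TTTGGGCCCAAATTT", "edit_2")]
recorder_map = build_recorder_map(pairs)

# Later, sequencing only the recorder identifies the associated edit.
observed = ["TTTGGGCCCAAATTT", "AAACCCGGGTTTAAA", "TTTGGGCCCAAATTT"]
edits = [recorder_map[r] for r in observed]
print(edits)
```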
  • the libraries can be comprised on plasmids, bacterial artificial chromosomes (BACs), yeast artificial chromosomes (YACs), synthetic chromosomes, or viral or phage genomes. These methods and compositions can be used to generate portable barcoded libraries in host organisms, such as E. coli. Library generation in such organisms can offer the advantage of established techniques for performing homologous recombination. Barcoded plasmid libraries can be deep-sequenced at one site to track mutational diversity targeted across the remaining portions of the plasmid, allowing dramatic improvements in the depth of library coverage (e.g., FIG. 3 A ).
  • Each plasmid can encode a recorder cassette designed to edit a site in the target DNA (e.g. FIG. 3 A , black cassette). Sites to be targeted can be functionally neutral sites, or they can be a screenable or selectable marker gene.
  • the homology arm (HA) of the recorder cassette can contain a recorder sequence (e.g., FIG. 3 B ) that is inserted into the recording site during recombineering. Recombineering can comprise DNA cleavage, such as nucleic acid-guided nuclease-mediated DNA cleavage, and repair via homologous recombination.
  • the recorder sequence can comprise a barcode, unique DNA sequence, or a complete copy or fragment of a screenable or selectable marker. In some examples, the recorder sequence is 15 nucleotides.
  • the recorder sequence can comprise less than 10, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 88, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, or more than 200 nucleotides.
  • the recorder cassette can be covalently coupled to at least one editing cassette in a plasmid (e.g., FIG. 3 A , green cassette) to generate trackable plasmid libraries that have a unique recorder and editing cassette combination.
  • This trackable library can be sequenced to generate the recorder/edit mapping and used to track editing libraries across large segments of the target DNA (e.g., FIG. 3 C ).
  • Recorder and editing sequences can be comprised on the same polynucleotide, in which case they are both incorporated into the target nucleic acid sequence, such as a genome or plasmid, by the same recombination event.
  • the recorder and editing sequences can be comprised on separate cassettes within the same trackable plasmid, in which case the recorder and editing sequences are incorporated into the target nucleic acid sequence by separate recombination events, either simultaneously or sequentially.
  • Methods are provided herein for combining multiplex oligonucleotide synthesis with recombineering, to create libraries of specifically designed and trackable mutations. Screens and/or selections followed by high-throughput sequencing and/or barcode microarray methods can allow for rapid mapping of mutations leading to a phenotype of interest.
  • Methods and compositions disclosed herein can be used to simultaneously engineer and track engineering events in a target nucleic acid sequence.
  • Trackable plasmids can be generated using in vitro assembly or cloning techniques.
  • the CREATE-Recorder plasmids can be generated using chemical synthesis, Gibson assembly, SLIC, CPEC, PCA, ligation-free cloning, other in vitro oligo assembly techniques, traditional ligation-based cloning, or any combination thereof.
  • Trackable plasmids can comprise at least one recording sequence, such as a barcode, and at least one editing sequence. In most cases, the recording sequence is used to record and track engineering events. Each editing sequence can be used to incorporate a desired edit into a target nucleic acid sequence. The desired edit can include insertion, deletion, substitution, or alteration of the target nucleic acid sequence.
  • the one or more recording sequence and editing sequences are comprised on a single cassette comprised within the trackable plasmid such that they are incorporated into the target nucleic acid sequence by the same engineering event.
  • the recording and editing sequences are comprised on separate cassettes within the trackable plasmid such that they are each incorporated into the target nucleic acid by distinct engineering events.
  • the trackable plasmid comprises two or more editing sequences. For example, one editing sequence can be used to alter or silence a PAM sequence while a second editing sequence can be used to incorporate a mutation into a distinct sequence.
  • Recorder sequences can be inserted into a site separated from the editing sequence insertion site.
  • the inserted recorder sequence can be separated from the editing sequence by 1 bp or any number of base pairs.
  • the separation distance can be about 1 bp, 10 bp, 50 bp, 100 bp, 500 bp, 1 kb, 2 kb, 5 kb, 10 kb, or greater.
  • the separation distance can be any discrete integer of base pairs. It should be readily understood that the number of base pairs separating the two insertion sites can be limited by the size of the genome, chromosome, or polynucleotide into which the insertions are being made. In some examples, the maximum distance of separation depends on the size of the target nucleic acid or genome.
  • Recorder sequences can be inserted adjacent to editing sequences, or within proximity to the editing sequence.
  • the recorder sequence can be inserted outside of the open reading frame within which the editing sequence is inserted.
  • A recorder sequence can be inserted into an untranslated region adjacent to an open reading frame within which an editing sequence has been inserted.
  • the recorder sequence can be inserted into a functionally neutral or non-functional site.
  • the recorder sequence can be inserted into a screenable or selectable marker gene.
  • the target nucleic acid sequence is comprised within a genome, artificial chromosome, synthetic chromosome, or episomal plasmid.
  • the target nucleic acid sequence can be in vitro or in vivo.
  • the CREATE-Recorder plasmid can be introduced into the host organisms by transformation, transfection, conjugation, biolistics, nanoparticles, cell-permeable technologies, or other known methods for DNA delivery, or any combination thereof.
  • the host organism can be a eukaryote, prokaryote, bacterium, archaea, yeast, or other fungi.
  • the engineering event can comprise recombineering, non-homologous end joining, homologous recombination, or homology-driven repair.
  • the engineering event is performed in vitro or in vivo.
  • the methods described herein can be carried out in any type of cell in which a nucleic acid-guided nuclease system can function (e.g., target and cleave DNA), including prokaryotic and eukaryotic cells or in vitro.
  • the cell is a bacterial cell, such as Escherichia spp. (e.g., E. coli ).
  • the cell is a fungal cell, such as a yeast cell, e.g., Saccharomyces spp.
  • the cell is an algal cell, a plant cell, an insect cell, or a mammalian cell, including a human cell.
  • a cell is a recombinant organism.
  • the cell can comprise a non-native nucleic acid-guided nuclease system.
  • the cell can comprise recombination system machinery.
  • recombination systems can include lambda red recombination system, Cre/Lox, attB/attP, or other integrase systems.
  • the trackable plasmid can have the complementary components or machinery required for the selected recombination system to work correctly and efficiently.
  • a method for genome editing can comprise: (a) introducing a vector that encodes at least one editing cassette and at least one guide nucleic acid into a first population of cells, thereby producing a second population of cells comprising the vector; (b) maintaining the second population of cells under conditions in which a nucleic acid-guided nuclease is expressed or maintained, wherein the nucleic acid-guided nuclease is encoded on the vector, a second vector, on the genome of cells of the second population of cells, or otherwise introduced into the cell, resulting in DNA cleavage and incorporation of the editing cassette; (c) obtaining viable cells.
  • Such a method can optionally further comprise (d) sequencing the target DNA molecule in at least one cell of the second population of cells to identify the mutation of at least one codon.
  • a method for genome editing can comprise: (a) introducing a vector that encodes at least one editing cassette comprising a PAM mutation as disclosed herein and at least one guide nucleic acid into a first population of cells, thereby producing a second population of cells comprising the vector; (b) maintaining the second population of cells under conditions in which nucleic acid-guided nuclease is expressed or maintained, wherein the nucleic acid-guided nuclease is encoded on the vector, a second vector, on the genome of cells of the second population of cells, or otherwise introduced into the cell, resulting in DNA cleavage, incorporation of the editing cassette, and death of cells of the second population of cells that do not comprise the PAM mutation, whereas cells of the second population of cells that comprise the PAM mutation are viable; (c) obtaining viable cells.
  • Such a method can optionally further comprise (d) sequencing the target DNA in at least one cell of the second population of cells to identify the mutation of at least one codon.
  • a method for trackable genome editing can comprise: (a) introducing a vector that encodes at least one editing cassette, at least one recorder cassette, and at least two gRNA into a first population of cells, thereby producing a second population of cells comprising the vector; (b) maintaining the second population of cells under conditions in which a nucleic acid-guided nuclease is expressed or maintained, wherein the nucleic acid-guided nuclease is encoded on the vector, a second vector, on the genome of cells of the second population of cells, or otherwise introduced into the cell, resulting in DNA cleavage and incorporation of the editing and recorder cassettes; (c) obtaining viable cells.
  • Such a method can optionally further comprise (d) sequencing the recorder sequence of the target DNA molecule in at least one cell of the second population of cells to identify the mutation of at least one codon.
  • a method for trackable genome editing can comprise: (a) introducing a vector that encodes at least one editing cassette, a recorder cassette, and at least two gRNA into a first population of cells, thereby producing a second population of cells comprising the vector; (b) maintaining the second population of cells under conditions in which a nucleic acid-guided nuclease is expressed or maintained, wherein the nucleic acid-guided nuclease is encoded on the vector, a second vector, on the genome of cells of the second population of cells, or otherwise introduced into the cell, resulting in DNA cleavage, incorporation of the editing cassette and recorder cassette, and death of cells of the second population of cells that do not comprise the PAM mutation, whereas cells of the second population of cells that comprise the PAM mutation are viable; and (c) obtaining viable cells.
  • Such a method can optionally further comprise (d) sequencing the recorder sequence of the target DNA in at least one cell of the second population of cells to identify the mutation of at least one codon.
  • Such methods can also further comprise a recorder cassette comprising a second PAM mutation, such that both PAMs must be silenced by the editing cassette PAM mutation and recorder cassette PAM mutation in order to escape cell death.
  • transformation efficiency is determined by using a non-targeting guide nucleic acid control, which allows for validation of the recombineering procedure and CFU/ng calculations.
  • absolute efficiency is obtained by counting the total number of colonies on each transformation plate, for example, by counting both red and white colonies from a galK control.
  • relative efficiency is calculated by the total number of successful transformants (for example, white colonies) out of all colonies from a control (for example, galK control).
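  The efficiency calculations described above reduce to simple arithmetic. The sketch below assumes that colonies are counted directly from a plate and that the mass of DNA represented on the plate is known; it ignores dilution and plating-fraction corrections, and the function names and counts are hypothetical.

```python
def cfu_per_ng(total_colonies, ng_dna):
    """Colony-forming units per nanogram of transformed DNA (simplified)."""
    if ng_dna <= 0:
        raise ValueError("ng_dna must be positive")
    return total_colonies / ng_dna


def relative_efficiency(white_colonies, red_colonies):
    """Fraction of successful transformants (white) among all colonies."""
    total = white_colonies + red_colonies
    if total == 0:
        raise ValueError("no colonies counted")
    return white_colonies / total


# Hypothetical galK control plate: 180 white and 20 red colonies from 10 ng.
white, red = 180, 20
absolute = white + red                      # absolute efficiency: all colonies
print(absolute, cfu_per_ng(absolute, 10), relative_efficiency(white, red))
```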
  • the methods of the disclosure can provide, for example, greater than 1000× improvements in the efficiency, scale, cost of generating a combinatorial library, and/or precision of such library generation.
  • the methods of the disclosure can provide, for example, greater than: 10×, 50×, 100×, 200×, 300×, 400×, 500×, 600×, 700×, 800×, 900×, 1000×, 1100×, 1200×, 1300×, 1400×, 1500×, 1600×, 1700×, 1800×, 1900×, 2000×, or greater improvements in the efficiency of generating genomic or combinatorial libraries.
  • the methods of the disclosure can provide, for example, greater than: 10×, 50×, 100×, 200×, 300×, 400×, 500×, 600×, 700×, 800×, 900×, 1000×, 1100×, 1200×, 1300×, 1400×, 1500×, 1600×, 1700×, 1800×, 1900×, 2000×, or greater improvements in the scale of generating genomic or combinatorial libraries.
  • the methods of the disclosure can provide, for example, greater than: 10×, 50×, 100×, 200×, 300×, 400×, 500×, 600×, 700×, 800×, 900×, 1000×, 1100×, 1200×, 1300×, 1400×, 1500×, 1600×, 1700×, 1800×, 1900×, 2000×, or greater decrease in the cost of generating genomic or combinatorial libraries.
  • the methods of the disclosure can provide, for example, greater than: 10×, 50×, 100×, 200×, 300×, 400×, 500×, 600×, 700×, 800×, 900×, 1000×, 1100×, 1200×, 1300×, 1400×, 1500×, 1600×, 1700×, 1800×, 1900×, 2000×, or greater improvements in the precision of genomic or combinatorial library generation.
  • Disclosed herein are methods and compositions for iterative rounds of engineering. Disclosed herein are recursive engineering strategies that allow implementation of trackable engineering at the single cell level through several serial engineering cycles (e.g., FIG. 3 D or FIG. 6 ). These disclosed methods and compositions can enable search-based technologies that can effectively construct and explore complex genotypic space. The terms recursive and iterative can be used interchangeably.
  • Combinatorial engineering methods can comprise multiple rounds of engineering.
  • Methods disclosed herein can comprise 2 or more rounds of engineering.
  • a method can comprise 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 20, 25, 30, or more than 30 rounds of engineering.
  • a new recorder sequence such as a barcode
  • a target site e.g., FIG. 3 D , green bars or FIG. 6 , black bars
  • a PCR, or similar reaction, of the recording locus can be used to reconstruct each combinatorial genotype or to confirm that the engineered edit from each round has been incorporated into the target site.
  • Selection can occur by a PAM mutation incorporated by an editing cassette.
  • Selection can occur by a PAM mutation incorporated by a recorder cassette.
  • Selection can occur using a screenable, selectable, or counter-selectable marker.
  • Selection can occur by targeting a site for editing or recording that was incorporated by a prior round of engineering, thereby selecting for variants that successfully incorporated edits and recorder sequences from both rounds or all prior rounds of engineering.
  • Quantitation of these genotypes can be used for understanding combinatorial mutational effects on large populations and investigation of important biological phenomena such as epistasis.
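  Quantitation of combinatorial genotypes from the recording locus can be sketched as a counting step. The example below assumes that each sequencing read has already been split into its ordered per-round barcodes and that a barcode-to-edit table is available; the function name, barcode labels, and edit names are hypothetical.

```python
from collections import Counter


def count_genotypes(reads, barcode_to_edit):
    """Count how often each ordered combination of edits is observed.

    Each read is a tuple of barcodes, one per engineering round.
    """
    counts = Counter()
    for barcodes in reads:
        genotype = tuple(barcode_to_edit[b] for b in barcodes)
        counts[genotype] += 1
    return counts


# Hypothetical mapping and reads (assumptions for illustration only).
barcode_to_edit = {"BC1": "geneA_E10K", "BC2": "geneB_del", "BC3": "geneC_ins"}
reads = [("BC1", "BC2"), ("BC1", "BC3"), ("BC1", "BC2")]

genotype_counts = count_genotypes(reads, barcode_to_edit)
for genotype, n in genotype_counts.most_common():
    print(genotype, n)
```

  Comparing such counts before and after a selection is one way the relative enrichment of each combination could be estimated.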
  • Serial editing and combinatorial tracking can be implemented using recursive vector systems as disclosed herein.
  • These recursive vector systems can be used to move rapidly through the transformation procedure (e.g., FIG. 7 A ).
  • these systems consist of two or more plasmids containing orthogonal replication origins, antibiotic markers, and gRNAs.
  • the gRNA in each vector can be designed to target one of the other resistance markers for destruction by nucleic acid-guided nuclease-mediated cleavage.
  • These systems can be used, in some examples, to perform transformations in which the antibiotic selection pressure is switched to remove the previous plasmid and drive enrichment of the next round of engineered genomes.
  • Two or more passages through the transformation loop can be performed, or in other words, multiple rounds of engineering can be performed.
  • Introducing the requisite recording cassettes and editing cassettes into recursive vectors as disclosed herein can be used for simultaneous genome editing and plasmid curing in each transformation step with high efficiencies.
  • the recursive vector system disclosed herein comprises 2, 3, 4, 5, 6, 7, 8, 9, 10, or more than 10 unique plasmids.
  • the recursive vector system can use a particular plasmid more than once as long as a distinct plasmid is used in the previous round and in the subsequent round.
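  The constraints on a recursive vector system described above can be checked programmatically. This is a minimal sketch, assuming a two-plasmid system in which each plasmid carries one resistance marker and a gRNA aimed at the other plasmid's marker; the plasmid names, marker names, and function name are hypothetical.

```python
# Hypothetical two-plasmid recursive system (assumption for illustration).
PLASMIDS = {
    "pA": {"marker": "kan", "grna_target": "cam"},
    "pB": {"marker": "cam", "grna_target": "kan"},
}


def cures_previous_each_round(schedule, plasmids):
    """Check that consecutive rounds use distinct plasmids and that each
    incoming plasmid's gRNA targets the marker of the plasmid it replaces."""
    for previous, current in zip(schedule, schedule[1:]):
        if previous == current:
            return False
        if plasmids[current]["grna_target"] != plasmids[previous]["marker"]:
            return False
    return True


print(cures_previous_each_round(["pA", "pB", "pA"], PLASMIDS))
print(cures_previous_each_round(["pA", "pA", "pB"], PLASMIDS))
```

  With more than two plasmids, the same check applies as long as each plasmid's gRNA target matches the marker of whichever plasmid precedes it in the schedule.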
  • Recursive methods and compositions disclosed herein can be used to restore function to a selectable or screenable element in a targeted genome or plasmid.
  • the selectable or screenable element can include an antibiotic resistance gene, a fluorescent gene, a unique DNA sequence or watermark, or other known reporter, screenable, or selectable gene.
  • each successive round of engineering can incorporate a fragment of the selectable or screenable element, such that at the end of the engineering rounds, the entire selectable or screenable element has been incorporated into the target genome or plasmid.
  • only those genomes or plasmids that have successfully incorporated all of the fragments, and therefore all of the desired corresponding mutations, can be selected or screened for. In this way, the selected or screened cells will be enriched for those that have incorporated the edits from each and every iterative round of engineering.
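  The enrichment logic of fragment-wise marker reconstitution can be sketched as a set-membership test. The example assumes that the marker is split into a known number of fragments, one per round, and that the fragments incorporated by each cell are known; the cell names and function name are hypothetical.

```python
def cells_with_complete_marker(cells, n_fragments):
    """Return names of cells that incorporated every marker fragment.

    `cells` maps a cell name to the fragment indices (0-based) it received.
    """
    required = set(range(n_fragments))
    return [name for name, frags in cells.items() if required <= set(frags)]


# Hypothetical outcome of three rounds of engineering.
cells = {
    "cell_1": [0, 1, 2],   # all rounds succeeded
    "cell_2": [0, 2],      # round 2 failed
    "cell_3": [1],         # only round 2 succeeded
}
selected = cells_with_complete_marker(cells, n_fragments=3)
print(selected)
```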
  • Recursive methods can be used to switch a selectable or screenable marker between an on and an off position, or between an off and an on position, with each successive round of engineering.
  • Using such a method allows conservation of available selectable or screenable markers by requiring, for example, the use of only one screenable or selectable marker.
  • short regulatory sequences, start codons, or non-start codons can be used to turn the screenable or selectable marker on and off. Such short sequences can easily fit within a cassette or polynucleotide, such as a synthesized cassette.
  • each round of engineering is used to incorporate an edit unique from that of previous rounds.
  • Each round of engineering can incorporate a unique recording sequence.
  • Each round of engineering can result in removal or curing of the CREATE plasmid used in the previous round of engineering.
  • successful incorporation of the recording sequence of each round of engineering results in a complete and functional screenable or selectable marker or unique sequence combination.
  • Unique recorder cassettes comprising recording sequences such as barcodes or screenable or selectable markers can be inserted with each round of engineering, thereby generating a recorder sequence that is indicative of the combination of edits or engineering steps performed.
  • Successive recording sequences can be inserted adjacent to one another.
  • Successive recording sequences can be inserted within proximity to one another.
  • Successive sequences can be inserted at a distance from one another.
  • successive recorder sequences can be inserted and separated by 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, or greater than 100 bp.
  • successive recorder sequences are separated by about 10, 50, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1100, 1200, 1300, 1400, 1500, or greater than 1500 bp.
  • Successive recorder sequences can be separated by any desired number of base pairs, and the separation can depend on and be limited by the number of successive recorder sequences to be inserted, the size of the target nucleic acid or target genome, and/or the design of the desired final recorder sequence. For example, if the compiled recorder sequence is a functional screenable or selectable marker, then the successive recording sequences can be inserted within proximity and within the same reading frame from one another. If the compiled recorder sequence is a unique set of barcodes to be identified by sequencing and has no coding sequence element, then the successive recorder sequences can be inserted with any desired number of base pairs separating them. In these cases, the separation distance can be dependent on the sequencing technology to be used and the read length limit.
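  The dependence of separation distance on read length can be made concrete with a small calculation. The sketch assumes equally sized recorder sequences with uniform spacing and a single read that must span the whole compiled array; the function names and the numbers are hypothetical.

```python
def recorder_array_span(n_recorders, recorder_len, spacing):
    """Total length in bp of n recorders separated by a uniform spacing."""
    if n_recorders < 1:
        raise ValueError("need at least one recorder")
    return n_recorders * recorder_len + (n_recorders - 1) * spacing


def fits_in_read(n_recorders, recorder_len, spacing, read_len):
    """True if one read of read_len bp can cover the whole recorder array."""
    return recorder_array_span(n_recorders, recorder_len, spacing) <= read_len


# Five 15-nt recorders: 10 bp spacing spans 115 bp, 50 bp spacing spans 275 bp.
print(recorder_array_span(5, 15, 10), fits_in_read(5, 15, 10, read_len=150))
print(recorder_array_span(5, 15, 50), fits_in_read(5, 15, 50, read_len=150))
```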
  • a recorder cassette comprises a landing site to be used as a target site for the recorder cassette of the next round of engineering.
  • a guide nucleic acid can complex with a compatible nucleic acid-guided nuclease and can hybridize with a target sequence, thereby directing the nuclease to the target sequence.
  • a subject nucleic acid-guided nuclease capable of complexing with a guide nucleic acid can be referred to as a nucleic acid-guided nuclease that is compatible with the guide nucleic acid.
  • a guide nucleic acid capable of complexing with a nucleic acid-guided nuclease can be referred to as a guide nucleic acid that is compatible with the nucleic acid-guided nuclease.
  • a guide nucleic acid can be DNA.
  • a guide nucleic acid can be RNA.
  • a guide nucleic acid can comprise both DNA and RNA.
  • a guide nucleic acid can comprise modified or non-naturally occurring nucleotides.
  • the RNA guide nucleic acid can be encoded by a DNA sequence on a polynucleotide molecule such as a plasmid, linear construct, or editing cassette as disclosed herein.
  • a guide nucleic acid can comprise a guide sequence.
  • a guide sequence is a polynucleotide sequence having sufficient complementarity with a target polynucleotide sequence to hybridize with the target sequence and direct sequence-specific binding of a complexed nucleic acid-guided nuclease to the target sequence.
  • the degree of complementarity between a guide sequence and its corresponding target sequence, when optimally aligned using a suitable alignment algorithm, is about or more than about 50%, 60%, 75%, 80%, 85%, 90%, 95%, 97.5%, 99%, or more.
  • Optimal alignment may be determined with the use of any suitable algorithm for aligning sequences.
  • a guide sequence is about or more than about 5, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, 75, or more nucleotides in length. In some embodiments, a guide sequence is less than about 75, 50, 45, 40, 35, 30, 25, 20 nucleotides in length. Preferably the guide sequence is 10-30 nucleotides long. The guide sequence can be 15-20 nucleotides in length. The guide sequence can be 15 nucleotides in length. The guide sequence can be 16 nucleotides in length. The guide sequence can be 17 nucleotides in length. The guide sequence can be 18 nucleotides in length. The guide sequence can be 19 nucleotides in length. The guide sequence can be 20 nucleotides in length.
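  A degree of complementarity of the kind described above can be estimated with a simple position-by-position comparison. The sketch below is a simplification: it assumes an ungapped comparison of equal-length sequences written 5′ to 3′ in a DNA alphabet, rather than an optimal alignment produced by a dedicated alignment algorithm. The function names and the target sequence are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def percent_complementarity(guide, target):
    """Percent of guide positions that base-pair with the target strand.

    Ungapped comparison; both sequences must have the same length.
    """
    if len(guide) != len(target) or not guide:
        raise ValueError("sequences must be non-empty and of equal length")
    paired = reverse_complement(target)
    matches = sum(g == p for g, p in zip(guide, paired))
    return 100.0 * matches / len(guide)


# Hypothetical 20-nt target and a perfectly complementary guide.
target = "ACGTTGCAAGGCTTAACCGG"
guide = reverse_complement(target)
print(len(guide), percent_complementarity(guide, target))
```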
  • a guide nucleic acid can comprise a scaffold sequence.
  • a “scaffold sequence” includes any sequence that has sufficient sequence to promote formation of a targetable nuclease complex, wherein the targetable nuclease complex comprises a nucleic acid-guided nuclease and a guide nucleic acid comprising a scaffold sequence and a guide sequence.
  • Sufficient sequence within the scaffold sequence to promote formation of a targetable nuclease complex may include a degree of complementarity along the length of two sequence regions within the scaffold sequence, such as one or two sequence regions involved in forming a secondary structure. In some cases, the one or two sequence regions are comprised or encoded on the same polynucleotide.
  • the one or two sequence regions are comprised or encoded on separate polynucleotides.
  • Optimal alignment may be determined by any suitable alignment algorithm, and may further account for secondary structures, such as self-complementarity within either the one or two sequence regions.
  • the degree of complementarity between the one or two sequence regions along the length of the shorter of the two when optimally aligned is about or more than about 25%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, 97.5%, 99%, or higher.
  • at least one of the two sequence regions is about or more than about 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 40, 50, or more nucleotides in length.
  • a scaffold sequence of a subject guide nucleic acid can comprise a secondary structure.
  • a secondary structure can comprise a pseudoknot region.
  • the compatibility of a guide nucleic acid and nucleic acid-guided nuclease is at least partially determined by sequence within or adjacent to a pseudoknot region of the guide RNA.
  • binding kinetics of a guide nucleic acid to a nucleic acid-guided nuclease is determined in part by secondary structures within the scaffold sequence.
  • binding kinetics of a guide nucleic acid to a nucleic acid-guided nuclease is determined in part by nucleic acid sequence within the scaffold sequence.
  • guide nucleic acid refers to a polynucleotide comprising 1) a guide sequence capable of hybridizing to a target sequence and 2) a scaffold sequence capable of interacting with or complexing with a nucleic acid-guided nuclease as described herein.
  • a guide nucleic acid can be compatible with a nucleic acid-guided nuclease when the two elements can form a functional targetable nuclease complex capable of cleaving a target sequence.
  • a compatible scaffold sequence for a compatible guide nucleic acid can be found by scanning sequences adjacent to a native nucleic acid-guided nuclease locus.
  • native nucleic acid-guided nucleases can be encoded on a genome within proximity to a corresponding compatible guide nucleic acid or scaffold sequence.
  • Nucleic acid-guided nucleases can be compatible with guide nucleic acids that are not found within the nuclease's endogenous host. Such orthogonal guide nucleic acids can be determined by empirical testing. Orthogonal guide nucleic acids can come from different bacterial species or be synthetic or otherwise engineered to be non-naturally occurring.
  • Orthogonal guide nucleic acids that are compatible with a common nucleic acid-guided nuclease can comprise one or more common features.
  • Common features can include sequence outside a pseudoknot region.
  • Common features can include a pseudoknot region.
  • Common features can include a primary sequence or secondary structure.
  • a guide nucleic acid can be engineered to target a desired target sequence by altering the guide sequence such that the guide sequence is complementary to the target sequence, thereby allowing hybridization between the guide sequence and the target sequence.
  • a guide nucleic acid with an engineered guide sequence can be referred to as an engineered guide nucleic acid.
  • Engineered guide nucleic acids are often non-naturally occurring and are not found in nature.
  • a nuclease, such as a nucleic acid-guided nuclease, can be used to perform directed genome evolution or produce changes (deletions, substitutions, additions) in a target sequence, such as DNA or RNA, for example, genomic DNA or episomal DNA.
  • Suitable nucleases can include, for example, RNA-guided nucleases such as Cas9, Cpf1, MAD2, or MAD7, DNA-guided nucleases such as Argonaute, or other nucleases such as zinc-finger nucleases, TALENs, or meganucleases.
  • Nuclease genes can be obtained from any source, such as from a bacterium, archaea, prokaryote, eukaryote, or virus.
  • a Cas9 gene can be obtained from a bacterium harboring the corresponding Type II CRISPR system, such as the bacterium S. pyogenes (SEQ ID NO: 110).
  • the nucleic acid sequence and/or amino acid sequence of the nuclease may be mutated, relative to the sequence of a naturally occurring nuclease.
  • a mutation can be, for example, one or more insertions, deletions, substitutions or any combination of two or three of the foregoing.
  • the resulting mutated nuclease can have enhanced or reduced nuclease activity relative to the naturally occurring nuclease. In some cases, the resulting mutated nuclease can have no nuclease activity relative to the naturally occurring nuclease.
  • Some disclosed methods can include a two-stage construction process which relies on generation of cassette libraries that incorporate directed mutations from editing cassettes directly into a genome, episomal nucleic acid molecule, or isolated nucleic acid molecule.
  • rationally designed editing cassettes can be cotransformed into cells with a guide nucleic acid (e.g., guide RNA) that hybridizes to or targets a target DNA sequence.
  • the guide nucleic acid is introduced as an RNA molecule, or encoded on a DNA molecule.
  • Editing cassettes can be designed such that they couple deletion or mutation of a PAM site with the mutation of one or more desired codons or nucleic acid residues in the adjacent nucleic acid sequence.
  • the deleted or mutated PAM site in some cases, can no longer be recognized by the chosen nucleic acid-guided nuclease.
  • at least one PAM or more than one PAM can be deleted or mutated, such as two, three, four, or more PAMs.
  • Methods disclosed herein can enable generation of an entire cassette library in a single transformation.
  • the cassette library can be retrieved, in some cases, by amplification of the recombinant chromosomes, e.g. by a PCR reaction, using a synthetic feature or priming site from the editing cassettes.
  • a second PAM deletion or mutation is simultaneously incorporated. This approach can covalently couple the codon-targeted mutations directly to a PAM deletion.
  • there is a second stage to construction of cassette libraries.
  • the PCR amplified cassette libraries carrying the destination PAM deletion/mutation and the targeted mutations, such as a desired mutation of one or more nucleotides, such as one or more nucleotides in one or more codons can be co-transformed into naive cells.
  • the cells can be eukaryotic cells, archaeal cells, or prokaryotic cells.
  • the cassette libraries can be co-transformed with a guide nucleic acid or plasmid encoding the same to generate a population of cells that express a rationally designed protein library.
  • the libraries can be co-transformed with a guide nucleic acid such as a gRNA, chimeric gRNA, split gRNA, or a crRNA and trRNA set.
  • the cassette library can comprise a plurality of cassettes wherein each cassette comprises an editing cassette and guide nucleic acid.
  • the cassette library can comprise a plurality of cassettes wherein each cassette comprises an editing cassette, recorder cassettes and two guide nucleic acids.
  • the guide nucleic acid can guide selection of a target sequence.
  • a target sequence refers to any locus in vitro or in vivo, or in the nucleic acid of a cell or population of cells in which a mutation of at least one nucleotide, such as a mutation of at least one nucleotide in at least one codon, is desired.
  • the target sequence can be, for example, a genomic locus, target genomic sequence, or extrachromosomal locus.
  • the guide nucleic acid can be expressed as a DNA molecule, referred to as a guide DNA, or as an RNA molecule, referred to as a guide RNA.
  • a guide nucleic acid can comprise a guide sequence that is complementary to a region of the target region.
  • a guide nucleic acid can comprise a scaffold sequence that can interact with a compatible nucleic acid-guided nuclease, and can optionally form a secondary structure.
  • a guide nucleic acid can function to recruit a nucleic acid-guided nuclease to the target site.
  • a guide sequence can be complementary to a region upstream of the target site.
  • a guide sequence can be complementary to at least a portion of the target site.
  • a guide sequence can be completely complementary (100% complementary) to the target site or include one or more mismatches, provided that it is sufficiently complementary to the target site to specifically hybridize/guide and recruit the nuclease.
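The complementarity criterion described above can be illustrated with a minimal Python sketch that counts mismatches between a guide sequence and a target site, both written as DNA. The function names, the example sequence, and the mismatch threshold are illustrative assumptions and are not values taken from this disclosure.

```python
# Illustrative sketch only: names, sequences, and the threshold are assumptions.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def count_mismatches(guide: str, target_site: str) -> int:
    """Count positions at which the guide fails to base-pair with the target site.

    Both sequences are written 5'->3' as DNA and must have equal length. A guide
    pairs perfectly when it equals the reverse complement of the target site.
    """
    if len(guide) != len(target_site):
        raise ValueError("guide and target site must have the same length")
    expected = reverse_complement(target_site)
    return sum(1 for g, e in zip(guide.upper(), expected) if g != e)


def is_sufficiently_complementary(guide: str, target_site: str,
                                  max_mismatches: int = 2) -> bool:
    """Hypothetical acceptance rule: tolerate up to max_mismatches mismatches."""
    return count_mismatches(guide, target_site) <= max_mismatches


target_site = "ATGCGTACCGGATTACAGCT"             # made-up 20-nt target site
perfect_guide = reverse_complement(target_site)  # 100% complementary guide
one_off_guide = "T" + perfect_guide[1:]          # first base changed from A to T

print(count_mismatches(perfect_guide, target_site))
print(count_mismatches(one_off_guide, target_site))
```

In practice the tolerated number and position of mismatches depend on the nuclease and would be determined empirically; the fixed threshold here is only a placeholder.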
  • Suitable nucleic acid-guided nucleases include, as non-limiting examples, CRISPR nucleases, Cas nucleases, such as Cas9 or Cpf1, MAD2, and MAD7.
  • the CRISPR RNA (crRNA or spacer-containing RNA) and trans-activating CRISPR RNA (tracrRNA or trRNA) can guide selection of a target sequence.
  • a target sequence refers to any locus in vitro or in vivo, or in the nucleic acid of a cell or population of cells in which a mutation of at least one nucleotide, such as a mutation of at least one nucleotide in at least one codon, is desired.
  • the target sequence can be, for example, a genomic locus, target genomic sequence, or extrachromosomal locus.
  • the tracrRNA and crRNA can be expressed as a single, chimeric RNA molecule, referred to as a single-guide RNA, guide RNA, or gRNA.
  • the nucleic acid sequence of the gRNA comprises a first nucleic acid sequence, also referred to as a first region, that is complementary to a region of the target region and a second nucleic acid sequence, also referred to as a second region, that forms a stem loop structure and functions to recruit a CRISPR nuclease to the target region.
  • the first region of the gRNA can be complementary to a region upstream of the target genomic sequence.
  • the first region of the gRNA can be complementary to at least a portion of the target region.
  • the first region of the gRNA can be completely complementary (100% complementary) to the target genomic sequence or include one or more mismatches, provided that it is sufficiently complementary to the target genomic sequence to specifically hybridize/guide and recruit a CRISPR nuclease, such as Cas9 or Cpf1.
  • a guide sequence or first region of the gRNA can be at least 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or at least 30 nucleotides in length.
  • the guide sequence or first region of the gRNA can be at least 20 nucleotides in length.
  • a stem loop structure that can be formed by the scaffold sequence or second nucleic acid sequence of a gRNA can be at least 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, or 100 nucleotides in length.
  • a stem loop structure can be from 80 to 90 or 82 to 85 nucleotides in length.
  • a scaffold sequence or second region of the gRNA that forms a stem loop structure can be 83 nucleotides in length.
  • a guide nucleic acid of a cassette that is introduced into a first cell using the methods disclosed herein can be the same as the guide nucleic acid of a second cassette that is introduced into a second cell. More than one guide nucleic acid can be introduced into the population of first cells and/or the population of second cells. The more than one guide nucleic acids can comprise guide sequences that are complementary to more than one target region.
  • Methods disclosed herein can comprise using oligonucleotides.
  • Such oligonucleotides can be obtained or derived from many sources.
  • an oligonucleotide can be derived from a nucleic acid library that has been diversified by nonhomologous random recombination (NRR); such a library is referred to as an NRR library.
  • An oligonucleotide can be synthesized, for example by array-based synthesis or other known chemical synthesis method. The length of an oligonucleotide can be dependent on the method used in obtaining the oligonucleotide.
  • An oligonucleotide can be approximately 50-200 nucleotides, 75-150 nucleotides, or between 80-120 nucleotides in length.
  • An oligonucleotide can be about 10, 20, 30, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, or more nucleotides in length, including any integer, for example, 51, 52, 53, 54, 201, 202, etc.
  • An oligonucleotide can be about 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1250, 1500, 1750, 2000, or more nucleotides in length, including any integer, for example, 101, 203, 1001, 2001, 2010, etc.
  • Oligonucleotides and/or other nucleic acid molecules can be combined or assembled to generate a cassette.
  • a cassette can comprise (a) a region that is homologous to a target region of the nucleic acid of the cell and includes a desired mutation of at least one nucleotide or one codon relative to the target region, and (b) a protospacer adjacent motif (PAM) mutation.
  • the PAM mutation can be any insertion, deletion or substitution of one or more nucleotides that mutates the sequence of the PAM such that it is no longer recognized by a nucleic acid-guided nuclease system or CRISPR nuclease system.
  • a cell that comprises such a PAM mutation may be said to be “immune” to nuclease-mediated killing.
  • the desired mutation relative to the sequence of the target region can be an insertion, deletion, and/or substitution of one or more nucleotides.
  • the insertion, deletion, and/or substitution of one or more nucleotides is in at least one codon of the target region.
  • the cassette can be synthesized in a single synthesis, comprising (a) a region that is homologous to a target region of the nucleic acid of the cell and includes a desired mutation of at least one nucleotide or one codon relative to the target region, (b) a protospacer adjacent motif (PAM) mutation, and optionally (c) a region that is homologous to a second target region of the nucleic acid of the cell and includes a recorder sequence.
  • the methods disclosed herein can be applied to any target nucleic acid molecule of interest, from any prokaryote including bacteria and archaea, or any eukaryote, including yeast, mammalian, and human genes, or any viral particle.
  • the nucleic acid module can be a non-coding nucleic acid sequence, gene, genome, chromosome, plasmid, episomal nucleic acid molecule, artificial chromosome, synthetic chromosome, or viral nucleic acid.
  • Recovery efficiency can be verified based on the presence of a PCR product or on changes in amplicon or PCR product sizes or sequence obtained with primers directed at the selected target locus.
  • Primers can be designed to hybridize with endogenous sequences or heterologous sequences contained on the donor nucleic acid molecule.
  • the PCR primer can be designed to hybridize to a heterologous sequence such that PCR will only be possible if the donor nucleic acid is incorporated.
  • Sequencing of PCR products from the recovered libraries indicates the heterologous sequence or synthetic priming site from the dsDNA cassettes or donor sequences can be incorporated with about 90-100% efficiency. In other examples, the efficiency can be about 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 75%, 80%, 85%, 90%, 95%, 99% or 100%.
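One way to estimate such an incorporation efficiency from sequencing data is to compute the fraction of reads that contain the synthetic priming site. The sketch below assumes exact substring matching; the priming-site sequence, the reads, and the function name are hypothetical and serve only to illustrate the calculation.

```python
# Illustrative sketch only: the priming site and the reads are made up.
def incorporation_efficiency(reads, priming_site):
    """Return the fraction of reads containing the synthetic priming site.

    Uses exact, case-insensitive substring matching and returns 0.0 when no
    reads are supplied.
    """
    if not reads:
        return 0.0
    site = priming_site.upper()
    hits = sum(1 for read in reads if site in read.upper())
    return hits / len(reads)


priming_site = "GGTCTCAGCTAGC"  # hypothetical synthetic priming site

# Nine reads carrying the priming site and one read that lacks it.
reads = ["AAAC" + priming_site + "TTTG"] * 9 + ["AAACTTTGACGTACGTACGTA"]

efficiency = incorporation_efficiency(reads, priming_site)
print(f"{efficiency:.0%}")
```

Real reads contain sequencing errors, so a production analysis would likely allow approximate matches rather than the exact match assumed here.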
  • the ability to improve final editing efficiencies of the methods disclosed herein can be assessed by carrying out cassette construction in gene deficient strains before transferring to a wild-type donor strain in an effort to prevent loss of mutations during the donor construction phase. Additionally or alternatively, efficiency of the disclosed methods can be assessed by targeting an essential gene.
  • Essential genes can include any gene required for survival or replication of a viral particle, cell, or organism. In some examples, essential genes include dxs, metA, and folA. Essential genes have been effectively targeted using guide nucleic acid design strategies described. Other suitable essential genes are well known in the art.
  • Editing efficiencies can be increased by modulating the level of a nucleic acid-guided nuclease. This could be done by using copy control plasmids, such as high copy number plasmids or low copy number plasmids. Low copy number plasmids could be plasmids that can have about 20 or fewer copies per cell, as opposed to high copy number plasmids that can have about 1000 copies per cell. High copy number plasmids and low copy number plasmids are well known in the art and it is understood that an exact plasmid copy per cell does not need to be known in order to characterize a plasmid as either high or low copy number.
  • decreasing the expression level of a nucleic acid-guided nuclease can increase transformation, editing, and/or recording efficiencies.
  • decreasing expression level of the nucleic acid-guided nuclease is done by expressing the nucleic acid-guided nuclease on a low copy number plasmid.
  • increasing the expression level of a nucleic acid-guided nuclease can increase transformation, editing, and/or recording efficiencies.
  • increasing expression level of the nucleic acid-guided nuclease is done by expressing the nucleic acid-guided nuclease on a high copy number plasmid.
  • RNAi, amiRNAi, or other RNA silencing techniques to modulate transcript level
  • fusing the protein of interest to a degradation domain or any other method known in the art.
  • the mutant library can be effectively constructed and retrieved within 1-3 hours post recombineering. In some examples, the mutant library is constructed within 0.5, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, or 24 hours post recombineering. In some examples, the mutant library can be retrieved within 0.5, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 24, 36, or 48 hours post recombineering and/or post-constructing by recombineering.
  • Some methods disclosed herein can be used for trackable, precision genome editing.
  • methods disclosed herein can achieve high efficiency editing/mutating using a single cassette that encodes both an editing cassette and guide nucleic acid, and optionally a recorder cassette and second guide nucleic acid.
  • a single vector can encode an editing cassette while a guide nucleic acid is provided sequentially or concomitantly.
  • methods disclosed herein can provide single step generation of hundreds or thousands of precision edits/mutations. Mutations can be mapped by sequencing the editing cassette on the vector, rather than by sequencing of the genome or a section of the genome of the cell or organism.
  • the methods disclosed herein can have broad utility in protein and genome engineering applications, as well as for reconstruction of mutations, such as mutations identified in laboratory evolution experiments.
  • the methods and compositions disclosed here can combine an editing cassette, which could include a desired mutation and a PAM mutation, with a gene encoding a guide nucleic acid on a single vector.
  • a trackable mutant library can be generated in a single transformation or single reaction.
  • Methods disclosed herein can comprise introducing a cassette comprising an editing cassette that includes the desired mutation and the PAM mutation into a cell or population of cells.
  • the cell into which the cassette or vector is introduced also comprises a nucleic acid-guided nuclease, such as Cas9, Cpf1, MAD2, or MAD7.
  • a gene or mRNA encoding the nucleic acid-guided nuclease is concomitantly, sequentially, or subsequently introduced into the cell or population of cells.
  • expression of a targetable nuclease system including nucleic acid-guided nuclease and a guide nucleic acid, in the cell or cell population can be activated such that the guide nucleic acid recruits the nucleic acid-guided nuclease to the target region, where dsDNA cleavage occurs.
  • the homologous region of an editing cassette complementary to the target sequence mutates the PAM and the one or more codons of the target sequence.
  • Cells of the population of cells that did not integrate the PAM mutation can undergo unedited cell death due to nucleic acid-guided nuclease mediated dsDNA cleavage.
  • cells of the population of cells that integrate the PAM mutation do not undergo cell death; they remain viable and are selectively enriched to high abundance. Viable cells can be obtained and can provide a library of trackable or targeted mutations.
  • the homologous region of a recorder cassette complementary to the target sequence mutates the PAM and introduces a barcode into a target sequence.
  • Cells of the population of cells that did not integrate the PAM mutation can undergo unedited cell death due to nucleic acid-guided nuclease mediated dsDNA cleavage.
  • cells of the population of cells that integrate the PAM mutation do not undergo cell death; they remain viable and are selectively enriched to high abundance. Viable cells can be obtained and can provide a library of trackable mutations.
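The enrichment logic described above can be sketched as a simple filter over a cell population. This is an idealized model that assumes every intact PAM is cleaved and every cleavage is lethal; the `Cell` class, its fields, and the example population are hypothetical.

```python
# Illustrative sketch only: an idealized model of PAM-based selection.
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    cell_id: str
    pam_mutated: bool  # True if the cassette's PAM mutation was integrated


def select_survivors(cells):
    """Keep only cells whose PAM was mutated.

    Assumes an intact PAM is always cleaved and cleavage is always lethal.
    """
    return [cell for cell in cells if cell.pam_mutated]


def edited_fraction(cells):
    """Return the fraction of cells carrying the PAM mutation (0.0 if empty)."""
    if not cells:
        return 0.0
    return sum(cell.pam_mutated for cell in cells) / len(cells)


# Hypothetical population in which one cell in four integrated the cassette.
population = [Cell(f"cell{i}", pam_mutated=(i % 4 == 0)) for i in range(20)]
survivors = select_survivors(population)

print(edited_fraction(population))  # fraction edited before selection
print(edited_fraction(survivors))   # fraction edited after selection
```

Because survival is tied to the PAM mutation, the surviving population is enriched for cells that also carry the coupled edit or barcode. Real cleavage and killing are not perfectly efficient, so some unedited cells would be expected to survive.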
  • a separate vector or mRNA encoding a nucleic acid-guided nuclease can be introduced into the cell or population of cells.
  • Introducing a vector or mRNA into a cell or population of cells can be performed using any method or technique known in the art.
  • vectors can be introduced by standard protocols, such as transformation including chemical transformation and electroporation, transduction and particle bombardment.
  • mRNA can be introduced by standard protocols, such as transformation as disclosed herein, and/or by techniques involving cell permeable peptides or nanoparticles.
  • An editing cassette can include (a) a region, which recognizes (hybridizes to) a target region of a nucleic acid in a cell or population of cells, is homologous to the target region of the nucleic acid of the cell and includes a mutation, referred to as a desired mutation, of at least one nucleotide that can be in at least one codon relative to the target region, and (b) a protospacer adjacent motif (PAM) mutation.
  • the editing cassette also comprises a barcode.
  • the barcode can be a unique barcode or relatively unique such that the corresponding mutation can be identified based on the barcode.
  • the PAM mutation may be any insertion, deletion or substitution of one or more nucleotides that mutates the sequence of the PAM such that the mutated PAM (PAM mutation) is not recognized by a chosen nucleic acid-guided nuclease system.
  • a cell that comprises such a PAM mutation may be said to be “immune” to nucleic acid-guided nuclease-mediated killing.
  • the desired mutation relative to the sequence of the target region may be an insertion, deletion, and/or substitution of one or more nucleotides and may be at least one codon of the target region.
  • the distance between the PAM mutation and the desired mutation is at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 40, 50, 60, 70, 80, 90, or 100 nucleotides on the editing cassette.
  • the PAM mutation is located at least 9 nucleotides from the end of the editing cassette.
  • the desired mutation is located at least 9 nucleotides from the end of the editing cassette.
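The spacing rules above can be checked programmatically when designing cassettes. The following sketch assumes 0-based positions, treats each mutation as a single position, and measures distance to the nearest cassette end; these conventions, the function name, and the default thresholds are assumptions chosen for illustration.

```python
# Illustrative sketch only: conventions and defaults are assumptions.
def check_cassette_layout(cassette_length, pam_pos, edit_pos,
                          min_spacing=1, min_end_distance=9):
    """Return a list of violated layout constraints (empty if none).

    Positions are 0-based indices into the cassette. Distance from the end is
    measured to the nearest cassette end.
    """
    mutations = (("PAM mutation", pam_pos), ("desired mutation", edit_pos))
    for name, pos in mutations:
        if not 0 <= pos < cassette_length:
            raise ValueError(f"{name} position {pos} is outside the cassette")

    problems = []
    if abs(pam_pos - edit_pos) < min_spacing:
        problems.append("PAM mutation and desired mutation are too close")
    for name, pos in mutations:
        end_distance = min(pos, cassette_length - 1 - pos)
        if end_distance < min_end_distance:
            problems.append(f"{name} is too close to the cassette end")
    return problems


print(check_cassette_layout(100, pam_pos=40, edit_pos=55))
print(check_cassette_layout(100, pam_pos=3, edit_pos=55))
```

Returning a list of problems rather than a single boolean makes it easier to report why a given design in a large cassette pool was rejected.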
  • a desired mutation can be an insertion of a nucleic acid sequence relative to the sequence of the target sequence.
  • the nucleic acid sequence inserted into the target sequence can be of any length.
  • the nucleic acid sequence inserted is at least 1, 2, 3, 4, 5, 10, 20, 30, 40, 50, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800, 1900, or at least 2000 nucleotides in length.
  • the editing cassette comprises a region that is at least 10, 15, 20, 25, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, or at least 60 nucleotides in length and homologous to the target sequence.
  • the homology arms or homologous region can be about 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, or more nucleotides in length, including any integer therein.
  • the homology arms or homologous region can be over 200 nucleotides in length.
  • a barcode can be a unique barcode or relatively unique such that the corresponding mutation can be identified based on the barcode.
  • the barcode is a non-naturally occurring sequence that is not found in nature.
  • the combination of the desired mutation and the barcode within the editing cassette is non-naturally occurring and not found in nature.
  • a barcode can be any number of nucleotides in length.
  • a barcode can be 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, or more than 30 nucleotides in length. In some cases, the barcode is more than 30 nucleotides in length.
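A set of unique barcodes of a chosen length could be generated as sketched below. The sketch uses greedy random sampling and enforces a minimum pairwise Hamming distance so that a single sequencing error is less likely to convert one barcode into another; the distance requirement, the default parameters, and the mutation labels are assumptions and are not requirements stated in this disclosure.

```python
# Illustrative sketch only: parameters and mutation labels are assumptions.
import random


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def generate_barcodes(n, length=15, min_distance=3, seed=0, max_attempts=100_000):
    """Greedily sample n barcodes that differ pairwise by at least min_distance."""
    rng = random.Random(seed)
    barcodes = []
    attempts = 0
    while len(barcodes) < n:
        if attempts >= max_attempts:
            raise RuntimeError("could not find enough barcodes; relax the constraints")
        attempts += 1
        candidate = "".join(rng.choice("ACGT") for _ in range(length))
        if all(hamming(candidate, existing) >= min_distance for existing in barcodes):
            barcodes.append(candidate)
    return barcodes


barcodes = generate_barcodes(n=8, length=15, min_distance=3)

# Pair each barcode with a placeholder label for the mutation it identifies.
barcode_to_mutation = {bc: f"mutation_{i + 1}" for i, bc in enumerate(barcodes)}
print(len(barcode_to_mutation))
```

A mapping of this kind lets the corresponding mutation be identified from the barcode alone.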
  • An editing cassette or recorder cassette can comprise at least a portion of a gene encoding a guide nucleic acid, and optionally a promoter operably linked to the encoded guide nucleic acid.
  • the portion of the gene encoding the guide nucleic acid encodes the portion of the guide nucleic acid that is complementary to the target sequence.
  • the portion of the guide nucleic acid that is complementary to the target sequence, or the guide sequence can be at least 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or at least 30 nucleotides in length.
  • the guide sequence is 24 nucleotides in length.
  • the guide sequence is 18 nucleotides in length.
  • the editing cassette or recorder cassette further comprises at least two priming sites.
  • the priming sites may be used to amplify the cassette, for example by PCR.
  • the portion of the guide sequence is used as a priming site.
  • Editing cassettes or recorder cassettes for use in the described methods can be obtained or derived from many sources.
  • the cassettes can be synthesized, for example by array-based synthesis, multiplex synthesis, multi-parallel synthesis, PCR assembly, in vitro assembly, Gibson assembly, or any other synthesis method known in the art.
  • the editing cassette or recorder cassette is synthesized, for example by array-based synthesis, multiplex synthesis, multi-parallel synthesis, PCR assembly, in vitro assembly, Gibson assembly, or any other synthesis method known in the art.
  • the length of the editing cassette or recorder cassette may be dependent on the method used in obtaining said cassette.
  • An editing cassette can be approximately 50-300 nucleotides, 75-200 nucleotides, or between 80-120 nucleotides in length. In some embodiments, the editing cassette can be any discrete length between 50 nucleotides and 1 Mb.
  • a recorder cassette can be approximately 50-300 nucleotides, 75-200 nucleotides, or between 80-120 nucleotides in length. In some embodiments, the recorder cassette can be any discrete length between 50 nucleotides and 1 Mb.
  • Methods disclosed herein can also involve obtaining editing cassettes and recorder cassettes and constructing a trackable plasmid or vector.
  • Methods of constructing a vector will be known to one of ordinary skill in the art and may involve ligating the cassettes into a vector backbone.
  • plasmid construction occurs by in vitro DNA assembly methods, oligonucleotide assembly, PCR-based assembly, SLIC, CPEC, or other assembly methods well known in the art.
  • the cassettes or a subset (pool) of the cassettes can be amplified prior to construction of the vector, for example by PCR.
  • the cell or population of cells comprising a polynucleotide encoding a nucleic acid-guided nuclease can be maintained or cultured under conditions in which the nuclease is expressed.
  • Nucleic acid-guided nuclease expression can be controlled or can be constitutively on.
  • the methods described herein can involve maintaining cells under conditions in which nuclease expression is activated, resulting in production of the nuclease, for example, Cas9, Cpf1, MAD2, or MAD7.
  • Specific conditions under which the nucleic acid-guided nuclease is expressed can depend on factors, such as the nature of the promoter used to regulate expression of the nuclease.
  • Nucleic acid-guided nuclease expression can be induced in the presence of an inducer molecule, such as arabinose.
  • expression of the nuclease can occur.
  • CRISPR-nuclease expression can be repressed in the presence of a repressor molecule.
  • expression of the nuclease can occur.
  • Cells or the population of cells that remain viable can be obtained or separated from the cells that undergo unedited cell death as a result of nucleic acid-guided nuclease-mediated killing; this can be done, for example, by spreading the population of cells on a culture surface, allowing growth of the viable cells, which are then available for assessment.
  • the methods can involve sequencing of the editing cassette, recorder cassette, or barcode to identify the mutation of one or more codons. Sequencing of the editing cassette can be performed as a component of the vector or after its separation from the vector and, optionally, amplification. Sequencing can be performed using any sequencing method known in the art, such as by Sanger sequencing or next-generation sequencing methods.
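Identifying mutations from sequenced barcodes amounts to a lookup from barcode to mutation followed by a tally. The sketch below assumes exact barcode matches within each read; the barcodes, mutation labels, reads, and function name are all hypothetical.

```python
# Illustrative sketch only: barcodes, labels, and reads are made up.
from collections import Counter


def tally_mutations(reads, barcode_to_mutation):
    """Count reads per mutation by finding a known barcode in each read.

    Uses exact substring matching. A read is assigned to the first barcode
    found in it, and reads without a known barcode are ignored.
    """
    counts = Counter()
    for read in reads:
        for barcode, mutation in barcode_to_mutation.items():
            if barcode in read:
                counts[mutation] += 1
                break
    return counts


barcode_to_mutation = {
    "ACGTTGCAAC": "mutation_1",  # hypothetical barcode and label
    "TTGACCGTAG": "mutation_2",
}

reads = [
    "GGACGTTGCAACCC",  # contains the first barcode
    "AAACGTTGCAAC",    # contains the first barcode
    "TTGACCGTAGTT",    # contains the second barcode
    "GGGGGGGGGGGG",    # contains no known barcode
]

counts = tally_mutations(reads, barcode_to_mutation)
print(counts.most_common())
```

For large libraries, extracting the barcode from a fixed position in the read and looking it up directly in the dictionary would scale better than scanning every barcode against every read.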
  • the cell is a bacterial cell, such as Escherichia spp., e.g., E. coli.
  • the cell is a fungal cell, such as a yeast cell, e.g., Saccharomyces spp.
  • the cell is an algal cell, a plant cell, an insect cell, or a mammalian cell, including a human cell.
  • a “vector” is any of a variety of nucleic acids that comprise a desired sequence or sequences to be delivered to or expressed in a cell.
  • a desired sequence can be included in a vector, such as by restriction and ligation or by recombination or assembly methods known in the art.
  • Vectors are typically composed of DNA, although RNA vectors are also available. Vectors include, but are not limited to plasmids, fosmids, phagemids, virus genomes, artificial chromosomes, and synthetic nucleic acid molecules.
  • Vectors useful in the methods disclosed herein can comprise at least one editing cassette as described herein, at least one gene encoding a gRNA, and optionally a promoter and/or a barcode. More than one editing cassette can be included on the vector, for example 2, 3, 4, 5, 6, 7, 8, 9, 10 or more editing cassettes.
  • the more than one editing cassettes can be designed to target different target regions, for example, there could be different editing cassettes, each of which contains at least one region homologous with a different target region.
  • each editing cassette can target the same target region while each editing cassette comprises a different desired mutation relative to the target region.
  • the plurality of editing cassettes can comprise a combination of editing cassettes targeting the same target region and editing cassettes targeting different target regions.
  • Each editing cassette can comprise an identifying barcode.
  • the vector can include one or more genes encoding more than one gRNA, e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10 or more gRNAs.
  • the more than one gRNAs can contain regions that are complementary to a portion of different target regions, for example, if there are different gRNAs, each of which can be complementary to a portion of a different target region.
  • the more than one gRNA can each target the same target region.
  • the more than one gRNA can be a combination of gRNAs targeting the same and different target regions.
  • a cassette comprising a gene encoding a portion of a guide nucleic acid can be ligated or assembled into a vector that encodes another portion of a guide nucleic acid. Upon ligation or assembly, the portion of the guide nucleic acid from the cassette and the other portion of the guide nucleic acid can form a functional guide nucleic acid.
  • a promoter and a gene encoding a guide nucleic acid can be operably linked.
  • the methods involve introduction of a second vector encoding a nucleic acid-guided nuclease, such as Cas9, Cpf1, MAD2, or MAD7.
  • the vector may further comprise one or more promoters operably linked to a gene encoding the nucleic acid-guided nuclease.
  • operably linked can mean the promoter affects or regulates transcription of the DNA encoding a gene, such as the gene encoding the gRNA or the gene encoding a CRISPR nuclease.
  • a promoter can be a native promoter such as a promoter present in the cell into which the vector is introduced.
  • a promoter can be an inducible or repressible promoter, for example, the promoter can be regulated allowing for inducible or repressible transcription of a gene, such as the gene encoding the guide nucleic acid or the gene encoding a nucleic acid-guided nuclease.
  • The molecules whose presence or absence regulates such promoters can be referred to as inducers or repressors, respectively.
  • the nature of the promoter needed for expression of the guide nucleic acid or nucleic acid-guided nuclease can vary based on the species or cell type and can be recognized by one of ordinary skill in the art.
  • a separate vector encoding a nucleic acid-guided nuclease can be introduced into a cell or population of cells before or at the same time as introduction of a trackable plasmid as disclosed herein.
  • the gene encoding a nucleic acid-guided nuclease can be integrated into the genome of the cell or population of cells, or the gene can be maintained episomally.
  • the nucleic acid-guided nuclease-encoding DNA can be integrated into the cellular genome before introduction of the trackable plasmid, or after introduction of the trackable plasmid.
  • a nucleic acid molecule, such as DNA encoding a nucleic acid-guided nuclease, can be expressed from DNA integrated into the genome.
  • a gene encoding Cas9, Cpf1, MAD2, or MAD7 is integrated into the genome of the cell.
  • Vectors or cassettes useful in the methods described herein can further comprise two or more priming sites.
  • the presence of flanking priming sites allows amplification of the vector or cassette.
  • a cassette or vector encodes a nucleic acid-guided nuclease comprising one or more nuclear localization sequences (NLSs), such as about or more than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more NLSs.
  • the engineered nuclease comprises about or more than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more NLSs at or near the amino-terminus, about or more than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more NLSs at or near the carboxy-terminus, or a combination of these (e.g. one or more NLS at the amino-terminus and one or more NLS at the carboxy terminus).
  • the engineered nuclease comprises at most 6 NLSs.
  • an NLS is considered near the N- or C-terminus when the nearest amino acid of the NLS is within about 1, 2, 3, 4, 5, 10, 15, 20, 25, 30, 40, 50, or more amino acids along the polypeptide chain from the N- or C-terminus.
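The proximity criterion above can be expressed as a short check on a protein sequence. The sketch below uses the SV40 large T-antigen NLS sequence PKKKRKV given in this disclosure; the function names, the 10-residue window, and the toy protein sequences are assumptions for illustration.

```python
# Illustrative sketch only: the window size and toy proteins are assumptions.
SV40_NLS = "PKKKRKV"  # SV40 large T-antigen NLS sequence


def find_all(protein: str, motif: str):
    """Return the 0-based start index of every occurrence of motif."""
    positions = []
    start = protein.find(motif)
    while start != -1:
        positions.append(start)
        start = protein.find(motif, start + 1)
    return positions


def nls_near_termini(protein: str, nls: str = SV40_NLS, window: int = 10):
    """Report whether any NLS copy lies within `window` residues of each terminus.

    Distance is counted as the number of residues between the terminus and the
    nearest residue of the NLS.
    """
    near_n = False
    near_c = False
    for start in find_all(protein, nls):
        end = start + len(nls)  # index just past the last NLS residue
        if start <= window:
            near_n = True
        if len(protein) - end <= window:
            near_c = True
    return {"N": near_n, "C": near_c}


n_tagged = "M" + SV40_NLS + "A" * 60  # toy protein with an N-terminal NLS
c_tagged = "M" + "A" * 60 + SV40_NLS  # toy protein with a C-terminal NLS

print(nls_near_termini(n_tagged))
print(nls_near_termini(c_tagged))
```

The same check can be repeated for other NLS sequences by passing a different `nls` argument.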
  • Non-limiting examples of NLSs include an NLS sequence derived from: the NLS of the SV40 virus large T-antigen, having the amino acid sequence PKKKRKV (SEQ ID NO: 111); the NLS from nucleoplasmin (e.g. the nucleoplasmin bipartite NLS with the sequence KRPAATKKAGQAKKKK (SEQ ID NO: 112)); the c-myc NLS having the amino acid sequence PAAKRVKLD (SEQ ID NO: 113) or RQRRNELKRSP (SEQ ID NO: 114); the hRNPA1 M9 NLS having the sequence NQSSNFGPMKGGNFGGRSSGPYGGGGQYFAKPRNQGGY (SEQ ID NO: 115); the sequence RMRIZFKNKGKDTAELRRRRVEVSVELRKAKKDEQILKRRNV (SEQ ID NO: 116) of the IBB domain from importin-alpha; the sequences VSRKRPRP (SEQ ID NO: 117) and PPKKARED (SEQ ID NO: 118) of the myoma T protein; the sequence PQPKKKPL (SEQ ID NO: 119) of human p53; the sequence SALIKKKKKMAP (SEQ ID NO: 120) of mouse c-abl IV; the sequences
  • the one or more NLSs are of sufficient strength to drive accumulation of the nucleic acid-guided nuclease in a detectable amount in the nucleus of a eukaryotic cell.
  • strength of nuclear localization activity may derive from the number of NLSs, the particular NLS(s) used, or a combination of these factors.
  • Detection of accumulation in the nucleus may be performed by any suitable technique.
  • a detectable marker may be fused to the nucleic acid-guided nuclease, such that location within a cell may be visualized, such as in combination with a means for detecting the location of the nucleus (e.g. a stain specific for the nucleus such as DAPI).
  • Cell nuclei may also be isolated from cells, the contents of which may then be analyzed by any suitable process for detecting protein, such as immunohistochemistry, Western blot, or enzyme activity assay. Accumulation in the nucleus may also be determined indirectly, such as by an assay for the effect of the nucleic acid-guided nuclease complex formation (e.g. nucleic acid-guided nuclease activity assay for DNA cleavage or mutation at the target sequence, or assay for altered gene expression activity affected by targetable nuclease complex formation and/or nucleic acid-guided nuclease activity), as compared to a control not exposed to the nucleic acid-guided nuclease or targetable nuclease complex, or exposed to a nucleic acid-guided nuclease lacking the one or more NLSs.
  • Methods disclosed herein are capable of engineering a few to hundreds of genetic sequences or proteins simultaneously. These methods can permit one to map, in a single experiment, many or all possible residue changes over a collection of desired proteins onto a trait of interest, as part of individual proteins of interest or as part of a pathway. This approach can be used at least for mapping i) any number of residue changes for any number of proteins of interest in a specific biochemical pathway or that catalyze similar reactions, ii) any number of residues in the regulatory sites of any number of proteins of interest with a specific regulon, or iii) any number of residues of a biological agent used to treat a health condition.
  • methods described herein include identifying genetic variations of one or more target genes that affect any number of residues, such as one or more, or all residues of one or more target proteins.
  • compositions and methods disclosed herein permit parallel analysis of two or more target proteins or proteins that contribute to a trait. Parallel analysis of multiple proteins in a single experiment can facilitate identification, modification, and design of superior systems, for example for producing a eukaryotic or prokaryotic byproduct, or for producing a eukaryotic byproduct (for example, a biological agent such as a growth factor or antibody) in a prokaryotic organism, and the like.
  • Relevant biologics used in analysis and treatment of disease can be produced in these genetically engineered environments, which could reduce production time and increase quality while reducing costs to manufacturers and consumers.
  • Some embodiments disclosed herein comprise constructs of use for studying genetic variations of a gene or gene segment wherein the gene or gene segment is capable of generating a protein.
  • a construct can be generated for any number of residues, such as one, two, more than two, or all residue modifications of a target protein that is linked to a trackable agent such as a barcode.
  • a barcode indicative of a genetic variation of a gene of a target protein can be located outside of the open reading frame of the gene. In some embodiments such a barcode can be located many hundreds or thousands of bases away from the gene. It is contemplated herein that these methods can be performed in vivo.
  • such a construct comprises a trackable polynucleic acid or plasmid as disclosed herein.
  • Constructs described herein can be used to compile a comprehensive library of genetic variations encompassing all residue changes of one target protein, more than one target protein or target proteins that contribute to a trait.
  • libraries disclosed herein can be used to select proteins with improved qualities to create an improved single or multiple protein system, for example for producing a byproduct, such as a chemical, biofuel, biological agent, pharmaceutical agent, or biologic, or for producing biomass, compared to a non-selective system.
  • compositions and methods which can be used to rapidly and efficiently examine the roles of some or all genes in a viral, microbial, or eukaryotic genome using mixtures of barcoded oligonucleotides.
  • these compositions and methods can be used to develop a powerful new technology for comprehensively mapping protein structure-activity relationships (ProSAR).
  • multiplex cassette synthesis can be combined with recombineering, to create mutant libraries of specifically designed and barcoded mutations along one or more genes of interest in parallel.
  • Screens and/or selections followed by high-throughput sequencing and/or barcode microarray methods can allow for rapid mapping of protein sequence-activity relationships (ProSAR).
  • systematic ProSAR mapping can elucidate individual amino acid mutations for improved function and/or activity and/or stability etc.
  • Methods can be iterated to combinatorially improve the function, activity, or stability.
  • Cassettes can be generated by oligonucleotide synthesis. Given that existing capabilities of multiplex oligonucleotide synthesis can reach over 120,000 oligonucleotides per array, combined with recombineering, the methods disclosed herein can be scaled to construct mutant libraries for dozens to hundreds of proteins in a single experiment. In some examples, at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 25, 50, 75, 100, 150, 200, 250, 300, 400, 500, 600, 700, 800, 900, 1000, or more proteins can be partially or completely covered by mutant libraries generated by the methods disclosed herein.
  • a partial or complete substitution library for one or more protein constructs can be barcoded, or non-barcoded if desired, for one or for several hundred proteins at the same time.
  • such libraries comprise trackable plasmids as disclosed herein.
  • Cassette library size can depend on the number (N) of amino acids in a protein of interest, with a full saturation library, including all 20 amino acids at each position and optionally non-naturally occurring amino acids, scaling as 19 (or more) × N and an alanine-mapping library scaling as 1 × N.
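As a worked illustration of this scaling, the sketch below computes library sizes for a protein of N residues and estimates how many such proteins fit on one synthesis array. It assumes one cassette per substitution and uses the 120,000-oligonucleotide array capacity mentioned earlier in the text; the function names are hypothetical.

```python
def saturation_library_size(n_residues: int, substitutions_per_position: int = 19) -> int:
    """Cassettes needed for a full saturation library: 19 (or more) x N."""
    return substitutions_per_position * n_residues


def alanine_scan_size(n_residues: int) -> int:
    """Cassettes needed for an alanine-mapping library: 1 x N."""
    return n_residues


def proteins_per_array(n_residues: int,
                       array_capacity: int = 120_000,
                       substitutions_per_position: int = 19) -> int:
    """Whole saturation libraries of this size that fit on one array."""
    per_protein = saturation_library_size(n_residues, substitutions_per_position)
    return array_capacity // per_protein


if __name__ == "__main__":
    n = 300  # hypothetical protein length
    print(saturation_library_size(n), alanine_scan_size(n), proteins_per_array(n))
```

Raising `substitutions_per_position` above 19 models the optional inclusion of non-naturally occurring amino acids.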
  • mutations at residues important for a particular trait such as thermostability, resistance to environmental pressures, or increases or decreases in functionality or production, can be combined via multiplex recombineering with mutations important for various other traits, such as catalytic activity, to create combinatorial libraries for multi-trait optimization.
  • Methods disclosed herein can provide for creating and/or evaluating comprehensive, in vivo, mutational libraries of one or more target protein(s). These approaches can be extended via recorder cassettes or barcoding technology to generate trackable mutational libraries for any number of residues or every residue in a protein. This approach can be based on a protein sequence-activity relationship mapping method extended to work in vivo, capable of working on one or a few to hundreds of proteins simultaneously depending on the technology selected. For example, these methods permit one to map in a single experiment any number of, the majority of, or all possible residue changes over a collection of desired proteins onto a trait of interest, as part of individual proteins of interest or as part of a pathway.
  • these approaches can be used at least for the following by mapping i) any number of or all residue changes for any number of or all proteins in a specific biochemical pathway, such as lycopene production, or that catalyze similar reactions, such as dehydrogenases or other enzymes of a pathway of use to produce a desired effect or produce a product, or ii) any number of or all residues in the regulatory sites of any number of or all proteins with a specific regulatory mechanism, such as heat shock response, or iii) any number of or all residues of a biological agent used to treat a health condition, such as insulin, a growth factor (HCG), an anti-cancer biologic, or a replacement protein for a deficient population.
  • Scores related to various input parameters can be assigned in order to generate one or more composite score(s) for designing genomically-engineered organisms or systems. These scores can reflect quality of genetic variations in genes or genetic loci as they relate to selection of an organism or design of an organism for a predetermined production, trait or traits. Certain organisms or systems can be designed based on a need for improved organisms for biorefining; biomass, such as crops, trees, grasses, crop residues, or forest residues; biofuel production; the use of biological conversion, fermentation, chemical conversion, and catalysis to generate and use compounds; biopharmaceutical production; and biologic production. In certain embodiments, this can be accomplished by modulating growth or production of a microorganism through genetic manipulation methods disclosed herein.
  • Genetic manipulation by methods disclosed herein of genes encoding a protein can be used to make desired genetic changes that can result in desired phenotypes and can be accomplished through numerous techniques including but not limited to, i) introduction of new genetic material, ii) genetic insertion, disruption or removal of existing genetic material, as well as, iii) mutation of genetic material, such as a point mutation, or any combinations of i, ii, and iii, that results in desired genetic changes with desired phenotypic changes. Mutations can be directed or random, in addition to those including, but not limited to, error prone or directed mutagenesis through PCR, mutator strains, and random mutagenesis. Mutations can be incorporated using trackable plasmids and methods as disclosed herein.
  • Disclosed methods can be used for inserting and accumulating higher order modifications into a microorganism's genome or a target protein; for example, multiple different site-specific mutations can be introduced into the same genome at high efficiency to generate libraries of genomes with over 1, 5, 10, 15, 20, 25, 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, or more targeted modifications.
  • these mutations are within regulatory modules, regulatory elements, protein-coding regions, or non-coding regions.
  • Protein coding modifications can include, but are not limited to, amino acid changes, codon optimization, and translation tuning.
  • methods are provided for the co-delivery of reagents to a single biological cell.
  • the methods generally involve the attachment or linkage of two or more cassettes, followed by delivery of the linked cassettes to a single cell.
  • the methods provided herein involve the delivery of two or more cassettes to a single cell.
  • Traditional methods of reagent delivery may often be inefficient and/or inconsistent, leading to situations in which some cells receive only one of the cassettes.
  • the methods provided herein may improve the efficiency and/or consistency of reagent delivery, such that a majority of cells in a cell population each receive the two or more cassettes. For example, more than 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% of the cells in a cell population may receive the two or more cassettes.
  • the two or more cassettes may be linked by any known method in the art and generally the method chosen will be commensurate with the chemistry of the cassettes.
  • the two or more cassettes are linked by a covalent bond (i.e., covalently linked); however, non-covalent chemical bonds are also envisioned, such as hydrogen bonds, ionic bonds, and metallic bonds.
  • the editing cassette and the recorder cassette may be linked and delivered into a single cell.
  • a known edit is then associated with a known recorder or barcode sequence for that cell.
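Because each recorder or barcode sequence is delivered together with a known edit, barcode reads can be translated back into edit counts. The following is a minimal sketch of that lookup, assuming exact barcode matches within sequencing reads; the barcode sequences, edit names, and function name are hypothetical placeholders, not values from the disclosure.

```python
from collections import Counter

# Hypothetical design table: recorder barcode -> edit delivered on the linked cassette.
BARCODE_TO_EDIT = {
    "ACGTACGTAC": "geneA_K12A",
    "TTGCAAGGCT": "geneA_D45N",
    "GGATCCTTAA": "geneB_R101H",
}


def count_edits(reads, barcode_to_edit=BARCODE_TO_EDIT):
    """Count how many reads carry the barcode linked to each edit.

    A read is assigned to the first barcode found in it; reads with no
    known barcode are ignored.
    """
    counts = Counter()
    for read in reads:
        for barcode, edit in barcode_to_edit.items():
            if barcode in read:
                counts[edit] += 1
                break
    return counts
```

Comparing such counts before and after a screen or selection is one way barcode frequencies could be related to a trait of interest.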
  • the two or more cassettes are nucleic acids.
  • the nucleic acids may be RNA, DNA, or a combination of both, and may contain any number of chemically-modified nucleotides or nucleotide analogues.
  • two or more RNA cassettes are linked for delivery to a single cell.
  • two or more DNA cassettes are linked for delivery to a single cell.
  • a DNA cassette and an RNA cassette are linked for delivery to a single cell.
  • the nucleic acids may be derived from genomic RNA, complementary DNA (cDNA), or chemically or enzymatically synthesized DNA.
  • a cassette may be of 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, about 110, about 120, about 130, about 140, about 150, about 160, about 170, about 180, about 190, about 200, about 210, about 220, about 230, about 240, about
  • Two or more cassettes may be linked on a linear nucleic acid molecule or may be linked on a plasmid or circular nucleic acid molecule.
  • the two or more cassettes may be linked directly to one another or may be separated by one or more nucleotide spacers or linkers.
  • Two or more cassettes may be covalently linked on a linear nucleic acid molecule or may be covalently linked on a plasmid or circular nucleic acid molecule.
  • the two or more cassettes may be covalently linked directly to one another or may be separated by one or more nucleotide spacers or linkers.
  • cassettes may be linked for co-delivery.
  • the two or more cassettes may include nucleic acids, lipids, proteins, peptides, small molecules, or any combination thereof.
  • the two or more cassettes may be essentially any cassettes that are amenable to linkage.
  • the two or more cassettes are covalently linked (e.g., by a chemical bond). Covalent linkage may help to ensure that the two or more cassettes are co-delivered to a single cell. Generally, the two or more cassettes are covalently linked prior to delivery to a cell. Any method of covalently linking two or more molecules may be utilized, and it should be understood that the methods used will be at least partly determined by the types of cassettes to be linked.
  • methods are provided for the co-delivery of reagents to a single biological cell.
  • the methods generally involve the covalent attachment or linkage of two or more cassettes, followed by delivery of the covalently-linked cassettes into a single cell.
  • the methods provided may help to ensure that an individual cell receives the two or more cassettes.
  • Any known method of reagent delivery may be utilized to deliver the linked cassettes to a cell and will at least partly depend on the chemistry of the cassettes to be delivered.
  • Non-limiting examples of reagent delivery methods may include: transformation, lipofection, electroporation, transfection, nanoparticles, and the like.
  • cassettes, or isolated, donor, or editing nucleic acids may be introduced to a cell or microorganism to alter or modulate an aspect of the cell or microorganism, for example survival or growth of the microorganism as disclosed herein.
  • the isolated nucleic acid may be derived from genomic RNA, complementary DNA (cDNA), or chemically or enzymatically synthesized DNA. Additionally or alternatively, isolated nucleic acids may be of use for capture probes, primers, labeled detection oligonucleotides, or fragments for DNA assembly.
  • nucleic acid can include single-stranded and/or double-stranded molecules, as well as DNA, RNA, chemically modified nucleic acids and nucleic acid analogs. It is contemplated that a nucleic acid may be of 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97,
  • Isolated nucleic acids may be made by any method known in the art, for example using standard recombinant methods, assembly methods, synthetic techniques, or combinations thereof.
  • the nucleic acids may be cloned, amplified, assembled, or otherwise constructed.
  • the nucleic acids may conveniently comprise sequences in addition to a portion of a lysine riboswitch. For example, a multi-cloning site comprising one or more endonuclease restriction sites may be added.
  • a nucleic acid may be attached to a vector, adapter, or linker for cloning of a nucleic acid. Additional sequences may be added to such cloning sequences to optimize their function, to aid in isolation of the nucleic acid, or to improve the introduction of the nucleic acid into a cell.
  • Use of cloning vectors, expression vectors, adapters, and linkers is well known in the art.
  • Isolated nucleic acids may be obtained from cellular, bacterial, or other sources using any number of cloning methodologies known in the art.
  • oligonucleotide probes which selectively hybridize, under stringent conditions, to other oligonucleotides or to the nucleic acids of an organism or cell. Methods for construction of nucleic acid libraries are known and any such known methods may be used.
  • Cellular genomic DNA, RNA, or cDNA may be screened for the presence of an identified genetic element of interest using a probe based upon one or more sequences. Various degrees of stringency of hybridization may be employed in the assay.
  • High stringency conditions for nucleic acid hybridization are well known in the art.
  • conditions may comprise low salt and/or high temperature conditions, such as provided by about 0.02 M to about 0.15 M NaCl at temperatures of about 50° C. to about 70° C. It is understood that the temperature and ionic strength of a desired stringency are determined in part by the length of the particular nucleic acid(s), the length and nucleotide content of the target sequence(s), the charge composition of the nucleic acid(s), and by the presence or concentration of formamide, tetramethylammonium chloride or other solvent(s) in a hybridization mixture. Nucleic acids may be completely complementary to a target sequence or may exhibit one or more mismatches.
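The dependence of stringency on salt concentration, length, and nucleotide content can be illustrated with a melting-temperature estimate. The sketch below uses one widely quoted empirical approximation, Tm = 81.5 + 16.6·log10[Na+] + 0.41·(%GC) − 675/N; this formula is an assumption brought in for illustration and is not taken from the disclosure. Its constants vary between references, and it ignores formamide, mismatches, and other solvent effects.

```python
import math


def estimate_tm_celsius(seq: str, na_molar: float) -> float:
    """Rough duplex melting temperature (deg C) from salt, GC content, and length.

    Empirical approximation only:
        Tm = 81.5 + 16.6*log10([Na+]) + 0.41*(%GC) - 675/N
    """
    seq = seq.upper()
    n = len(seq)
    gc_percent = 100.0 * sum(base in "GC" for base in seq) / n
    return 81.5 + 16.6 * math.log10(na_molar) + 0.41 * gc_percent - 675.0 / n


if __name__ == "__main__":
    probe = "ATGCATGCATGCATGCATGC"  # hypothetical 20-mer, 50% GC
    for na in (0.02, 0.15):  # the NaCl range cited in the text
        print(f"[Na+] = {na} M -> Tm ~ {estimate_tm_celsius(probe, na):.1f} C")
```

Under this approximation, lowering the salt concentration lowers the estimated Tm, which is consistent with low salt and high temperature both increasing stringency.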
  • Nucleic acids of interest may also be amplified using a variety of known amplification techniques. For instance, polymerase chain reaction (PCR) technology may be used to amplify target sequences directly from DNA, RNA, or cDNA. PCR and other in vitro amplification methods may also be useful, for example, to clone nucleic acid sequences, to make nucleic acids to use as probes for detecting the presence of a target nucleic acid in samples, for nucleic acid sequencing, or for other purposes.
  • Isolated nucleic acids may be prepared by direct chemical synthesis by methods such as the phosphotriester method, or using an automated synthesizer. Chemical synthesis generally produces a single stranded oligonucleotide. This may be converted into double stranded DNA by hybridization with a complementary sequence or by polymerization with a DNA polymerase using the single strand as a template.
  • Target proteins contemplated herein include protein agents used to treat a human condition or to regulate processes (e.g. part of a pathway such as an enzyme) involved in disease of a human or non-human mammal. Any method known for selection and production of antibodies or antibody fragments is also contemplated. Additionally or alternatively, target proteins can be proteins or enzymes involved in a pathway or process in a virus, cell, or organism.
  • Some methods disclosed herein comprise targeting cleavage of specific nucleic acid sequences using a site-specific, targetable, and/or engineered nuclease or nuclease system.
  • Such nucleases can create double-stranded breaks (DSBs) at desired locations in a genome or nucleic acid molecule.
  • a nuclease can create a single strand break.
  • two nucleases are used, each of which generates a single strand break.
  • the one or more double or single strand breaks can be repaired by natural processes of homologous recombination (HR) and non-homologous end-joining (NHEJ) using the cell's endogenous machinery. Additionally or alternatively, endogenous or heterologous recombination machinery can be used to repair the induced break or breaks.
  • Engineered nucleases such as zinc finger nucleases (ZFNs), Transcription Activator-Like Effector Nucleases (TALENs), engineered homing endonucleases, and RNA or DNA guided endonucleases, such as CRISPR/Cas systems (e.g., Cas9 or Cpf1) and/or Argonaute systems, are particularly appropriate to carry out some of the methods of the present invention. Additionally or alternatively, RNA targeting systems can be used, such as CRISPR/Cas systems including c2c2 nucleases.
  • Methods disclosed herein can comprise cleaving a target nucleic acid using a CRISPR system, such as a Type I, Type II, Type III, Type IV, Type V, or Type VI CRISPR system.
  • CRISPR/Cas systems can be multi-protein systems or single effector protein systems. Multi-protein, or Class 1, CRISPR systems include Type I, Type III, and Type IV systems. Alternatively, Class 2 systems include a single effector molecule and include Type II, Type V, and Type VI.
  • CRISPR systems used in methods disclosed herein can comprise a single or multiple effector proteins.
  • An effector protein can comprise one or multiple nuclease domains.
  • An effector protein can target DNA or RNA, and the DNA or RNA may be single stranded or double stranded.
  • Effector proteins can generate double strand or single strand breaks.
  • Effector proteins can comprise mutations in a nuclease domain thereby generating a nickase protein.
  • Effector proteins can comprise mutations in one or more nuclease domains, thereby generating a catalytically dead nuclease that is able to bind but not cleave a target sequence.
  • CRISPR systems can comprise a single or multiple guiding RNAs.
  • the gRNA can comprise a crRNA.
  • the gRNA can comprise a chimeric RNA with crRNA and tracrRNA sequences.
  • the gRNA can comprise a separate crRNA and tracrRNA.
  • Target nucleic acid sequences can comprise a protospacer adjacent motif (PAM) or a protospacer flanking site (PFS).
  • PAM or PFS may be 3′ or 5′ of the target or protospacer site. Cleavage of a target sequence may generate blunt ends, 3′ overhangs, or 5′ overhangs.
  • a gRNA can comprise a spacer sequence.
  • Spacer sequences can be complementary to target sequences or protospacer sequences.
  • Spacer sequences can be 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, or 36 nucleotides in length. In some examples, the spacer sequence can be less than 10 or more than 36 nucleotides in length.
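The relationship between a PAM, its position relative to the protospacer, and the spacer length can be sketched as a simple sequence scan. The example below is illustrative only: it searches a single strand, and the PAM patterns used in the usage note (NGG on the 3′ side, as commonly reported for Streptococcus pyogenes Cas9, and TTTV on the 5′ side, as commonly reported for some Cpf1 enzymes) are assumptions not stated in the text. The function name and parameters are hypothetical.

```python
import re


def find_protospacers(seq: str, pam_regex: str, spacer_len: int = 20,
                      pam_side: str = "3prime"):
    """Return (start, protospacer, pam) tuples found on the given strand.

    pam_side="3prime": the PAM lies immediately 3' of the protospacer.
    pam_side="5prime": the PAM lies immediately 5' of the protospacer.
    Only the strand supplied in `seq` is scanned.
    """
    seq = seq.upper()
    hits = []
    # A lookahead lets overlapping PAM occurrences all be reported.
    for match in re.finditer(f"(?=({pam_regex}))", seq):
        pam = match.group(1)
        pam_start = match.start()
        if pam_side == "3prime":
            start, end = pam_start - spacer_len, pam_start
        else:
            start = pam_start + len(pam)
            end = start + spacer_len
        # Keep only protospacers that fit entirely inside the sequence.
        if start >= 0 and end <= len(seq):
            hits.append((start, seq[start:end], pam))
    return hits
```

For instance, `find_protospacers(seq, "[ACGT]GG", 20, "3prime")` lists 20-nucleotide sites followed by NGG, while `find_protospacers(seq, "TTT[ACG]", 23, "5prime")` lists 23-nucleotide sites preceded by TTTV. Any spacer length in the range given above can be passed as `spacer_len`.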
  • a gRNA can comprise a repeat sequence.
  • the repeat sequence is part of a double stranded portion of the gRNA.
  • a repeat sequence can be 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, or 50 nucleotides in length.
  • the repeat sequence can be less than 10 or more than 50 nucleotides in length.
  • a gRNA can comprise one or more synthetic nucleotides, non-naturally occurring nucleotides, nucleotides with a modification, deoxyribonucleotide, or any combination thereof. Additionally or alternatively, a gRNA may comprise a hairpin, linker region, single stranded region, double stranded region, or any combination thereof. Additionally or alternatively, a gRNA may comprise a signaling or reporter molecule.
  • a CRISPR nuclease can be endogenously or recombinantly expressed within a cell.
  • a CRISPR nuclease can be encoded on a chromosome, extrachromosomally, or on a plasmid, synthetic chromosome, or artificial chromosome.
  • a CRISPR nuclease can be provided or delivered to the cell as a polypeptide or mRNA encoding the polypeptide. In such examples, polypeptide or mRNA can be delivered through standard mechanisms known in the art, such as through the use of cell permeable peptides, nanoparticles, or viral particles.
  • gRNAs can be encoded by genetic or episomal DNA within a cell.
  • gRNAs can be provided or delivered to a cell expressing a CRISPR nuclease.
  • gRNAs can be provided or delivered concomitantly with a CRISPR nuclease or sequentially.
  • Guide RNAs can be chemically synthesized, in vitro transcribed, or otherwise generated using standard RNA generation techniques known in the art.
  • a CRISPR system can be a Type II CRISPR system, for example a Cas9 system.
  • the Type II nuclease can comprise a single effector protein, which, in some cases, comprises RuvC and HNH nuclease domains.
  • a functional Type II nuclease can comprise two or more polypeptides, each of which comprises a nuclease domain or fragment thereof.
  • the target nucleic acid sequences can comprise a 3′ protospacer adjacent motif (PAM).
  • the PAM may be 5′ of the target nucleic acid.
  • Guide RNAs (gRNAs) can comprise a single chimeric gRNA, which contains both crRNA and tracrRNA sequences.
  • the gRNA can comprise a set of two RNAs, for example a crRNA and a tracrRNA.
  • the Type II nuclease can generate a double strand break, which in some cases creates two blunt ends.
  • the Type II CRISPR nuclease is engineered to be a nickase such that the nuclease only generates a single strand break.
  • two distinct nucleic acid sequences can be targeted by gRNAs such that two single strand breaks are generated by the nickase.
  • the two single strand breaks effectively create a double strand break.
  • In some cases where a Type II nickase is used to generate two single strand breaks, the resulting nucleic acid free ends can be blunt, have a 3′ overhang, or have a 5′ overhang.
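Which of the three end types results depends on where the two single strand breaks fall relative to each other. The sketch below classifies the outcome from the two nick positions, assuming both are expressed in top-strand coordinates as the index of the last base pair to the left of each break; the function name and coordinate convention are hypothetical.

```python
def classify_ends(top_cut: int, bottom_cut: int):
    """Classify the ends produced by one break on each strand.

    Both positions are in top-strand coordinates (top strand written
    5'->3', left to right) and give the last base pair to the left of
    the break on that strand.

    Returns (end_type, overhang_length).
    """
    if top_cut == bottom_cut:
        return "blunt", 0
    if top_cut < bottom_cut:
        # The bottom strand extends past the top-strand break on the left
        # fragment; its exposed end there is a 5' end.
        return "5' overhang", bottom_cut - top_cut
    # The top strand extends past the bottom-strand break on the left
    # fragment; its exposed end there is a 3' end.
    return "3' overhang", top_cut - bottom_cut
```

With offset nicks, the overhang length equals the distance between the two break positions, so the spacing of the two gRNA target sites sets the length of the single-stranded ends.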
  • a Type II nuclease may be catalytically dead such that it binds to a target sequence, but does not cleave.
  • a Type II nuclease could have mutations in both the RuvC and HNH domains, thereby rendering both nuclease domains non-functional.
  • a Type II CRISPR system can be one of three sub-types, namely Type II-A, Type II-B, or Type II-C.
  • a CRISPR system can be a Type V CRISPR system, for example a Cpf1, C2c1, or C2c3 system.
  • the Type V nuclease can comprise a single effector protein, which in some cases comprises a single RuvC nuclease domain.
  • a functional Type V nuclease comprises a RuvC domain split between two or more polypeptides.
  • the target nucleic acid sequences can comprise a 5′ PAM or 3′ PAM.
  • Guide RNAs can comprise a single gRNA or single crRNA, such as can be the case with Cpf1. In some cases, a tracrRNA is not needed.
  • a gRNA can comprise a single chimeric gRNA, which contains both crRNA and tracrRNA sequences or the gRNA can comprise a set of two RNAs, for example a crRNA and a tracrRNA.
  • the Type V CRISPR nuclease can generate a double strand break, which in some cases generates a 5′ overhang.
  • the Type V CRISPR nuclease is engineered to be a nickase such that the nuclease only generates a single strand break.
  • two distinct nucleic acid sequences can be targeted by gRNAs such that two single strand breaks are generated by the nickase.
  • the two single strand breaks effectively create a double strand break.
  • In some cases where a Type V nickase is used to generate two single strand breaks, the resulting nucleic acid free ends can be blunt, have a 3′ overhang, or have a 5′ overhang.
  • a Type V nuclease may be catalytically dead such that it binds to a target sequence, but does not cleave.
  • a Type V nuclease could have mutations in a RuvC domain, thereby rendering the nuclease domain non-functional.
  • a CRISPR system can be a Type VI CRISPR system, for example a C2c2 system.
  • a Type VI nuclease can comprise a HEPN domain.
  • the Type VI nuclease comprises two or more polypeptides, each of which comprises a HEPN nuclease domain or fragment thereof.
  • the target nucleic acid sequences can be RNA, such as single stranded RNA.
  • a target nucleic acid can comprise a protospacer flanking site (PFS).
  • the PFS may be 3′ or 5′ of the target or protospacer sequence.
  • Guide RNAs (gRNAs) can comprise a single gRNA or single crRNA.
  • a tracrRNA is not needed.
  • a gRNA can comprise a single chimeric gRNA, which contains both crRNA and tracrRNA sequences or the gRNA can comprise a set of two RNAs, for example a crRNA and a tracrRNA.
  • a Type VI nuclease may be catalytically dead such that it binds to a target sequence, but does not cleave.
  • a Type VI nuclease could have mutations in a HEPN domain, thereby rendering the nuclease domain non-functional.
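The Type II, Type V, and Type VI descriptions above can be collected into a small lookup structure. The following is a minimal sketch that records only attributes mentioned in this section (example effectors, nuclease domains, the nucleic acid type implied by the description, and the recognition motif); the class and field names are hypothetical, and the table is a summary aid rather than a complete classification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CrisprTypeSummary:
    crispr_class: int          # 1 = multi-protein, 2 = single effector
    example_effectors: tuple   # systems named as examples in the text
    nuclease_domains: tuple    # nuclease domains named in the text
    target: str                # nucleic acid targeted in the description
    recognition_motif: str     # PAM or PFS


CRISPR_TYPES = {
    "II": CrisprTypeSummary(2, ("Cas9",), ("RuvC", "HNH"), "DNA", "PAM"),
    "V": CrisprTypeSummary(2, ("Cpf1", "C2c1", "C2c3"), ("RuvC",), "DNA", "PAM"),
    "VI": CrisprTypeSummary(2, ("C2c2",), ("HEPN",), "RNA", "PFS"),
}


def types_with_domain(domain: str):
    """Return the CRISPR types whose effectors are described with this domain."""
    return sorted(t for t, s in CRISPR_TYPES.items() if domain in s.nuclease_domains)
```

A query such as `types_with_domain("RuvC")` then identifies which system types would be affected by a RuvC-inactivating mutation of the kind used to make nickases or catalytically dead nucleases.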
  • Non-limiting examples of suitable nucleases, including nucleic acid-guided nucleases, for use in the present disclosure include C2c1, C2c2, C2c3, Cas1, Cas1B, Cas2, Cas3, Cas4, Cas5, Cas6, Cas7, Cas8, Cas9 (also known as Csn1 and Csx12), Cas10, Cpf1, Csy1, Csy2, Csy3, Cse1, Cse2, Csc1, Csc2, Csa5, Csn2, Csm2, Csm3, Csm4, Csm5, Csm6, Cmr1, Cmr3, Cmr4, Cmr5, Cmr6, Csb1, Csb2, Csb3, Csx17, Csx14, Csx100, Csx16, CsaX, Csx3, Csx1, Csx15, Csf1, Csf2, Csf3, Csf4, homologues thereof, orthologues thereof, or modified versions thereof.
  • Suitable nucleic acid-guided nucleases can be from an organism from a genus which includes but is not limited to Thiomicrospira, Succinivibrio, Candidatus, Porphyromonas, Acidomonococcus, Prevotella, Smithella, Moraxella, Synergistes, Francisella, Leptospira, Catenibacterium, Kandleria, Clostridium, Dorea, Coprococcus, Enterococcus, Fructobacillus, Weissella, Pediococcus, Corynebacter, Sutterella, Legionella, Treponema, Roseburia, Filifactor, Eubacterium, Streptococcus, Lactobacillus, Mycoplasma, Bacteroides, Flaviivola, Flavobacterium, Sphaerochaeta, Azospirillum, Gluconacetobacter, Neisseria, Roseburia, Parvibaculum, Staphylococcus, Nit
  • Suitable nucleic acid-guided nucleases can be from an organism from a genus or unclassified genus within a kingdom, which includes but is not limited to Firmicutes, Actinobacteria, Bacteroidetes, Proteobacteria, Spirochaetes, and Tenericutes.
  • Suitable nucleic acid-guided nucleases can be from an organism from a genus or unclassified genus within a phylum which includes but is not limited to Erysipelotrichia, Clostridia, Bacilli, Actinobacteria, Bacteroidetes, Flavobacteria, Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Deltaproteobacteria, Epsilonproteobacteria, Spirochaetes, and Mollicutes.
  • Suitable nucleic acid-guided nucleases can be from an organism from a genus or unclassified genus within an order which includes but is not limited to Clostridiales, Lactobacillales, Actinomycetales, Bacteroidales, Flavobacteriales, Rhizobiales, Rhodospirillales, Burkholderiales, Neisseriales, Legionellales, Nautiliales, Campylobacterales, Spirochaetales, Mycoplasmatales, and Thiotrichales.
  • Suitable nucleic acid-guided nucleases can be from an organism from a genus or unclassified genus within a family which includes but is not limited to Lachnospiraceae, Enterococcaceae, Leuconostocaceae, Lactobacillaceae, Streptococcaceae, Peptostreptococcaceae, Staphylococcaceae, Eubacteriaceae, Corynebacterineae, Bacteroidaceae, Flavobacterium, Cryomorphaceae, Rhodobiaceae, Rhodospirillaceae, Acetobacteraceae, Sutterellaceae, Neisseriaceae, Legionellaceae, Nautiliaceae, Campylobacteraceae, Spirochaetaceae, Mycoplasmataceae, Piscirickettsiaceae, and Francisellaceae.
  • nucleic acid-guided nucleases suitable for use in the methods, systems, and compositions of the present disclosure include those derived from an organism such as, but not limited to, Thiomicrospira sp. XS5, Eubacterium rectale, Succinivibrio dextrinosolvens, Candidatus Methanoplasma termitum, Candidatus Methanomethylophilus alvus, Porphyromonas crevioricanis, Flavobacterium branchiophilum, Acidomonococcus sp., Lachnospiraceae bacterium COE1, Prevotella brevis ATCC 19188, Smithella sp. SCADC, Moraxella bovoculi, Synergistes jonesii, Bacteroidetes oral taxon 274, Francisella tularensis, Leptospira inadai serovar Lyme str. 10, Acidomonococcus sp. crystal structure (5B43) S. mutans, S. agalactiae, S. equisimilis, S. sanguinis, S. pneumoniae; C. jejuni, C. coli; N. salsuginis, N. tergarcus; S. auricularis, S. carnosus; N. meningitidis, N. gonorrhoeae; L. monocytogenes, L. ivanovii; C.
  • Lachnospiraceae bacterium MA2020, Candidatus Methanoplasma termitum, Eubacterium eligens, Moraxella bovoculi 237, Leptospira inadai, Lachnospiraceae bacterium ND2006, Porphyromonas crevioricanis 3, Prevotella disiens, Porphyromonas macacae, Catenibacterium sp.
  • Suitable nucleases for use in any of the methods disclosed herein include, but are not limited to, nucleases having the sequences listed in Table 1, or homologues having at least 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, or 99% sequence identity to any of the nucleases listed in Table 1.
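The identity thresholds listed above presuppose some way of scoring two sequences against each other. The disclosure does not say how identity is computed, so the following is only a minimal sketch under stated assumptions: the two sequences are already aligned to equal length with `-` as the gap character, and identity is the fraction of aligned columns (excluding columns that are gaps in both) that match. The function name `percent_identity` and the example sequences are hypothetical.

```python
# Minimal sketch (assumption): percent identity over a pre-computed pairwise alignment.
# The alignment itself and the choice of denominator are not specified in the text.

def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Return percent identity for two aligned sequences of equal length."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" and b == "-":
            continue  # column carries no information for this pair
        compared += 1
        if a == b:
            matches += 1  # a gap opposite a residue never matches
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# Hypothetical toy alignment: 4 matching columns out of 6 compared.
print(round(percent_identity("MKT-LL", "MKTALV"), 1))
```

Other conventions (for example, dividing by the length of the shorter sequence) give different numbers, so any threshold comparison should state which convention is in use.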
  • Argonaute (Ago) systems can be used to cleave target nucleic acid sequences.
  • Ago protein can be derived from a prokaryote, eukaryote, or archaea.
  • the target nucleic acid may be RNA or DNA.
  • a DNA target may be single stranded or double stranded.
  • the target nucleic acid does not require a specific target flanking sequence, such as a sequence equivalent to a protospacer adjacent motif or protospacer flanking sequence.
  • the Ago protein may create a double strand break or single strand break. In some examples, when an Ago protein forms a single strand break, two Ago proteins may be used in combination to generate a double strand break.
  • an Ago protein comprises one, two, or more nuclease domains. In some examples, an Ago protein comprises one, two, or more catalytic domains. One or more nuclease or catalytic domains may be mutated in the Ago protein, thereby generating a nickase protein capable of generating single strand breaks. In other examples, mutations in one or more nuclease or catalytic domains of an Ago protein generate a catalytically dead Ago protein that can bind but not cleave a target nucleic acid.
  • Ago proteins can be targeted to target nucleic acid sequences by a guiding nucleic acid.
  • the guiding nucleic acid is a guide DNA (gDNA).
  • the gDNA can have a 5′ phosphorylated end.
  • the gDNA can be single stranded or double stranded. Single stranded gDNA can be 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, or 50 nucleotides in length.
  • the gDNA can be less than 10 nucleotides in length. In some examples, the gDNA can be more than 50 nucleotides in length.
  • Argonaute-mediated cleavage can generate blunt ends, 5′ overhangs, or 3′ overhangs.
  • one or more nucleotides are removed from the target site during or following cleavage.
  • Argonaute protein can be endogenously or recombinantly expressed within a cell.
  • Argonaute can be encoded on a chromosome, extrachromosomally, or on a plasmid, synthetic chromosome, or artificial chromosome.
  • an Argonaute protein can be provided or delivered to the cell as a polypeptide or mRNA encoding the polypeptide.
  • polypeptide or mRNA can be delivered through standard mechanisms known in the art, such as through the use of cell permeable peptides, nanoparticles, or viral particles.
  • Guide DNAs can be provided by genetic or episomal DNA within a cell.
  • gDNA are reverse transcribed from RNA or mRNA within a cell.
  • gDNAs can be provided or delivered to a cell expressing an Ago protein.
  • Guide DNAs can be provided or delivered concomitantly with an Ago protein or sequentially.
  • Guide DNAs can be chemically synthesized, assembled, or otherwise generated using standard DNA generation techniques known in the art.
  • Guide DNAs can be cleaved, released, or otherwise derived from genomic DNA, episomal DNA molecules, isolated nucleic acid molecules, or any other source of nucleic acid molecules.
  • compositions comprising a nuclease such as a nucleic acid-guided nuclease (e.g., Cas9, Cpf1, MAD2, or MAD7) or a DNA-guided nuclease (e.g., Ago), linked to a chromatin-remodeling enzyme.
  • a nuclease fusion protein as described herein may provide improved accessibility to regions of highly-structured DNA.
  • Non-limiting examples of chromatin-remodeling enzymes that can be linked to a nucleic acid-guided nuclease may include: histone acetyl transferases (HATs), histone deacetylases (HDACs), histone methyltransferases (HMTs), chromatin remodeling complexes, and transcription activator-like (Tal) effector proteins.
  • Histone deacetylases may include HDAC1, HDAC2, HDAC3, HDAC4, HDAC5, HDAC6, HDAC7, HDAC8, HDAC9, HDAC10, HDAC11, sirtuin 1, sirtuin 2, sirtuin 3, sirtuin 4, sirtuin 5, sirtuin 6, and sirtuin 7.
  • Histone acetyl transferases may include GCN5, PCAF, Hat1, Elp3, Hpa2, Hpa3, ATF-2, Nut1, Esa1, Sas2, Sas3, Tip60, MOF, MOZ, MORF, HBO1, p300, CBP, SRC-1, ACTR, TIF-2, SRC-3, TAFII250, TFIIIC, Rtt109, and CLOCK.
  • Histone methyltransferases may include ASH1L, DOT1L, EHMT1, EHMT2, EZH1, EZH2, MLL, MLL2, MLL3, MLL4, MLL5, NSD1, PRDM2, SET, SETBP1, SETD1A, SETD1B, SETD2, SETD3, SETD4, SETD5, SETD6, SETD7, SETD8, SETD9, SETDB1, SETDB2, SETMAR, SMYD1, SMYD2, SMYD3, SMYD4, SMYD5, SUV39H1, SUV39H2, SUV420H1, and SUV420H2.
  • Chromatin-remodeling complexes may include SWI/SNF, ISWI, NuRD/Mi-2/CHD, INO80 and SWR1.
  • the nuclease is a wild-type nuclease. In other instances, the nuclease is a chimeric engineered nuclease.
  • Chimeric engineered nucleases as disclosed herein can comprise one or more fragments or domains, and the fragments or domains can be of a nuclease, such as nucleic acid-guided nuclease, orthologs of organisms of genuses, species, or other phylogenetic groups disclosed herein; advantageously the fragments are from nuclease orthologs of different species.
  • a chimeric engineered nuclease can be comprised of fragments or domains from at least two different nucleases.
  • a chimeric engineered nuclease can be comprised of fragments or domains from at least two different species.
  • a chimeric engineered nuclease can be comprised of fragments or domains from at least 2, 3, 4, 5, 6, 7, 8, 9, 10, or more different nucleases or different species.
  • more than one fragment or domain from one nuclease or species wherein the more than one fragment or domain are separated by fragments or domains from a second nuclease or species.
  • a chimeric engineered nuclease comprises 2 fragments, each from a different protein or nuclease.
  • a chimeric engineered nuclease comprises 3 fragments, each from a different protein or nuclease.
  • a chimeric engineered nuclease comprises 4 fragments, each from a different protein or nuclease. In some examples, a chimeric engineered nuclease comprises 5 fragments, each from a different protein or nuclease.
  • Nuclease fusion proteins can be recombinantly expressed within a cell.
  • a nuclease fusion protein can be encoded on a chromosome, extrachromosomally, or on a plasmid, synthetic chromosome, or artificial chromosome.
  • a nuclease and a chromatin-remodeling enzyme may be engineered separately, and then covalently linked, prior to delivery to a cell.
  • a nuclease fusion protein can be provided or delivered to the cell as a polypeptide or mRNA encoding the polypeptide. In such examples, polypeptide or mRNA can be delivered through standard mechanisms known in the art, such as through the use of cell permeable peptides, nanoparticles, or viral particles.
  • compositions comprising a cell-cycle-dependent nuclease are provided.
  • a cell-cycle dependent nuclease generally includes a targeted nuclease as described herein linked to an enzyme that leads to degradation of the targeted nuclease during G1 phase of the cell cycle, and expression of the targeted nuclease during G2/M phase of the cell cycle.
  • Such cell-cycle dependent expression may, for example, bias the expression of the nuclease in cells where homology-directed repair (HDR) is most active (e.g., during G2/M phase).
  • the nuclease is covalently linked to a cell-cycle regulated protein such as one that is actively degraded during G1 phase of the cell cycle and is actively expressed during G2/M phase of the cell cycle.
  • the cell-cycle regulated protein is Geminin.
  • Other non-limiting examples of cell-cycle regulated proteins may include: Cyclin A, Cyclin B, Hsl1, Cdc6, Fin1, p21 and Skp2.
  • the nuclease is a wild-type nuclease.
  • the nuclease is an engineered nuclease.
  • Engineered nucleases can be non-naturally occurring.
  • Non-naturally occurring targetable nucleases and non-naturally occurring targetable nuclease systems can address many of these challenges and limitations.
  • Non-naturally occurring targetable nuclease systems are engineered to address one or more of the challenges described above and can be referred to as engineered nuclease systems.
  • Engineered nuclease systems can comprise one or more of an engineered nuclease, such as an engineered nucleic acid-guided nuclease, an engineered guide nucleic acid, an engineered polynucleotide encoding said nuclease, or an engineered polynucleotide encoding said guide nucleic acid.
  • Engineered nucleases, engineered guide nucleic acids, and engineered polynucleotides encoding the engineered nuclease or engineered guide nucleic acid are not naturally occurring and are not found in nature. It follows that engineered nuclease systems including one or more of these elements are non-naturally occurring.
  • Non-limiting examples of types of engineering that can be done to obtain a non-naturally occurring nuclease system are as follows.
  • Engineering can include codon optimization to facilitate expression or improve expression in a host cell, such as a heterologous host cell.
  • Engineering can reduce the size or molecular weight of the nuclease in order to facilitate expression or delivery.
  • Engineering can alter PAM selection in order to change PAM specificity or to broaden the range of recognized PAMs.
  • Engineering can alter, increase, or decrease stability, processivity, specificity, or efficiency of a targetable nuclease system.
  • Engineering can alter, increase, or decrease protein stability.
  • Engineering can alter, increase, or decrease processivity of nucleic acid scanning.
  • Engineering can alter, increase, or decrease target sequence specificity.
  • Engineering can alter, increase, or decrease nuclease activity.
  • Engineering can alter, increase, or decrease editing efficiency.
  • Engineering can alter, increase, or decrease transformation efficiency.
  • Engineering can alter, increase, or decrease nuclease or
  • non-naturally occurring nucleic acid sequences which are disclosed herein include sequences codon optimized for expression in bacteria, such as E. coli (e.g., SEQ ID NO: 41-60), sequences codon optimized for expression in single cell eukaryotes, such as yeast (e.g., SEQ ID NO: 127-146), sequences codon optimized for expression in multi cell eukaryotes, such as human cells (e.g., SEQ ID NO: 147-166), polynucleotides used for cloning or expression of any sequences disclosed herein (e.g., SEQ ID NO: 61-80), plasmids comprising nucleic acid sequences (e.g., SEQ ID NO: 21-40) operably linked to a heterologous promoter or nuclear localization signal or other heterologous element, proteins generated from engineered or codon optimized nucleic acid sequences (e.g., SEQ ID NO: 1-20), or engineered guide nucleic acids comprising any one of SEQ ID NO
  • non-naturally occurring nucleic acid sequences which are disclosed herein include sequences codon optimized for expression in bacteria, such as E. coli (e.g., SEQ ID NO: 168), sequences codon optimized for expression in single cell eukaryotes, such as yeast (e.g., SEQ ID NO: 169), sequences codon optimized for expression in multi cell eukaryotes, such as human cells (e.g., SEQ ID NO: 170), polynucleotides used for cloning or expression of any sequences disclosed herein (e.g., SEQ ID NO: 171), plasmids comprising nucleic acid sequences (e.g., SEQ ID NO: 167) operably linked to a heterologous promoter or nuclear localization signal or other heterologous element, proteins generated from engineered or codon optimized nucleic acid sequences (e.g., SEQ ID NO: 108-110), or engineered guide nucleic acids compatible with any targetable nuclease disclosed here
  • a guide nucleic acid can be DNA.
  • a guide nucleic acid can be RNA.
  • a guide nucleic acid can comprise both DNA and RNA.
  • a guide nucleic acid can comprise modified or non-naturally occurring nucleotides.
  • the RNA guide nucleic acid can be encoded by a DNA sequence on a polynucleotide molecule such as a plasmid, linear construct, or editing cassette as disclosed herein.
  • Nucleic acid-guided nucleases can be compatible with guide nucleic acids that are not found within the nuclease's endogenous host. Such orthogonal guide nucleic acids can be determined by empirical testing. Orthogonal guide nucleic acids can come from different bacterial species or be synthetic or otherwise engineered to be non-naturally occurring.
  • Orthogonal guide nucleic acids that are compatible with a common nucleic acid-guided nuclease can comprise one or more common features.
  • Common features can include sequence outside a pseudoknot region.
  • Common features can include a pseudoknot region (e.g., SEQ ID NO: 172-181).
  • Common features can include a primary sequence or secondary structure.
  • a guide nucleic acid can be engineered to target a desired target sequence by altering the guide sequence such that the guide sequence is complementary to the target sequence, thereby allowing hybridization between the guide sequence and the target sequence.
  • a guide nucleic acid with an engineered guide sequence can be referred to as an engineered guide nucleic acid.
  • Engineered guide nucleic acids are often non-naturally occurring and are not found in nature.
  • the nuclease is a chimeric nuclease.
  • Chimeric nucleases can be engineered nucleases.
  • Chimeric nucleases as disclosed herein can comprise one or more fragments or domains, and the fragments or domains can be of a nuclease, such as nucleic acid-guided nuclease, orthologs of organisms of genuses, species, or other phylogenetic groups; advantageously the fragments are from nuclease orthologs of different species.
  • a chimeric nuclease can be comprised of fragments or domains from at least two different nucleases.
  • a chimeric nuclease can be comprised of fragments or domains from at least two different species.
  • a chimeric nuclease can be comprised of fragments or domains from at least 2, 3, 4, 5, 6, 7, 8, 9, 10, or more different nucleases or different species. In some cases, more than one fragment or domain from one nuclease or species, wherein the more than one fragment or domain are separated by fragments or domains from a second nuclease or species. In some examples, a chimeric nuclease comprises 2 fragments, each from a different protein or nuclease. In some examples, a chimeric nuclease comprises 3 fragments, each from a different protein or nuclease. In some examples, a chimeric nuclease comprises 4 fragments, each from a different protein or nuclease. In some examples, a chimeric nuclease comprises 5 fragments, each from a different protein or nuclease.
  • FIGS. 1A-C depict an example of an overview of CRISPR EnAbled Trackable genome Engineering (CREATE) design and workflow.
  • FIG. 1A shows an example of the CREATE methodology, which allows programmatic genome modifications to be focused on key amino acid residues or promoter targets across the genome. Such libraries thus enable systematic assessment of sequence/activity relationships for a wide variety of genomic targets in parallel.
  • FIG. 1B depicts an example of CREATE cassettes designed to encode both homology arm (HA) and guide RNA (gRNA) sequences to target a specific locus in the E. coli genome.
  • the 100 bp homology arm was designed to introduce a specific codon mutation (target codon) that can be selectively enriched by a synonymous PAM mutation to rescue the sequence from Cas9 cleavage and allow highly efficient mutagenesis.
  • the P1 and P2 sites (black) serve as general priming sites allowing multiplexed amplification, cloning and sequencing of many libraries in parallel.
  • the promoter (J23119, green) is a constitutive promoter that drives expression of the gRNA.
  • the HA design for introducing a stop codon at residue 145 in the galK locus is also depicted at the bottom of FIG. 1B.
  • the top sequence shows the wildtype genome sequence with the PAM (CCG; the reverse complement of which is CGG, which is recognized by S. pyogenes Cas9) and target codon (TAT, encoding Y) highlighted.
  • the HA design introduces a “silent scar” at the PAM site (CgG, the reverse complement of which is CCG, which is not recognized by S. pyogenes Cas9) and a single nucleotide TAT>TAA mutation at codon 145 (resulting in a STOP).
  • This design strategy was implemented programmatically for coding regions across the genome.
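The PAM and codon reasoning in the galK example above can be checked with a few lines of code. This is a minimal sketch, not the programmatic design pipeline itself: the helper names are hypothetical, the PAM rule is the canonical NGG motif for S. pyogenes Cas9, and the two-entry codon table covers only the codons quoted in the text.

```python
# Illustrative sketch only; helper names are hypothetical and not from the disclosure.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

# Only the two codons discussed in the galK example (standard genetic code).
CODONS = {"TAT": "Y", "TAA": "*"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence, preserving letter case."""
    return seq.translate(COMPLEMENT)[::-1]


def is_spcas9_pam(triplet: str) -> bool:
    """True if a 3-nt sequence matches the canonical NGG PAM."""
    t = triplet.upper()
    return len(t) == 3 and t[1:] == "GG"


wildtype_site = "CCG"  # as written on the strand shown in the figure
scarred_site = "CgG"   # the "silent scar"; lower case marks the changed base

print(reverse_complement(wildtype_site), is_spcas9_pam(reverse_complement(wildtype_site)))
print(reverse_complement(scarred_site).upper(), is_spcas9_pam(reverse_complement(scarred_site)))
print(CODONS["TAT"], "->", CODONS["TAA"])  # target codon 145: tyrosine to stop
```

The wild-type site reads as CGG on the opposite strand and satisfies NGG, while the scarred site reads as CCG and does not, which is the property the homology arm relies on to escape Cas9 cleavage after editing.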
  • FIG. 1C depicts an overview of an example CREATE workflow.
  • CREATE cassettes are synthesized on a microarray and delivered as large oligo pools (10^4 to 10^6 individual library members). Parallel cloning and recombineering allowed processing of these pools into genomic libraries, in some cases in 2-3 days. Deep sequencing of the CREATE plasmids can be used to track the fitness of thousands of precision mutations genome wide following selection or screening of the mutant libraries.
  • FIGS. 2A-D depict an example of the effect of Cas9 activity on transformation and editing efficiencies.
  • the galK 120/17 CREATE cassette (120 bp HA and 17 bp PAM/codon spacing) targeting codon 145 in the galK gene or a control non-targeting gRNA vector was transformed into cells carrying pSIM5 along with dCas9 (e.g., left set of bars in FIG. 2A) or Cas9 (e.g., right set of bars in FIG. 2A) plasmids.
  • the pSIM5 plasmid carries lambda red recombination machinery.
  • the cas9 gene was cloned into the pBTBX-2 backbone under the control of a pBAD promoter to allow control of the cleavage activity by addition of arabinose. Transformation efficiencies of each vector are shown with dark grey bars. The total number of recombinant cells (light grey bars) was calculated based on red/white colony screening on MacConkey agar. In cases where white colonies were undetectable by plate based screening we assumed 10^-4 editing efficiencies. A 10^2-fold reduction in transformation efficiency compared to the non-targeting gRNA control was also observed for CREATE cassettes transformed into the Cas9 background.
  • FIG. 2B depicts an example of the characterization of the effect of CREATE cassette HA length and PAM/codon spacing on editing efficiency. All cassettes were designed to introduce a TAA stop at codon 145 in the gene using PAMs at the indicated distance (PAM/codon, bottom) from the target codon and variable homology arm lengths (HA, bottom). Dark grey and light grey bars correspond to uninduced or induced expression of Cas9 under the pBAD promoter using 0.2% arabinose. In the majority of cases the editing efficiency appears to be unaffected by induction, suggesting that low amounts of Cas9 due to leaky expression are sufficient for high efficiency editing.
  • FIG. 2C shows example data from sequencing of the genomic loci from CREATE recombineering reactions.
  • the galK cassettes from FIG. 2B are labeled according to the HA length and PAM codon spacing.
  • the other loci shown were cassettes isolated from multiplexed library cloning reactions.
  • the bar plot (FIG. 2C) indicates the number of times each genotype was observed by genomic colony sequencing following recombineering with each CREATE cassette.
  • the + and − labels at the bottom indicate the presence or absence of the designed mutation at the two relevant sites in each clone.
  • the circular inset indicates the relative position of each gene on the E. coli genome.
  • FIG. 2D depicts an example of library coverage from multiplexed cloning of CREATE plasmids. Deep sequencing counts of each variant are shown with respect to their position on the genome. The inset shows a histogram of these plasmid counts for the entire library. The distribution follows the expected Poisson distribution for low average counts.
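One way to examine the Poisson statement about plasmid counts is to compare the observed histogram with a Poisson distribution whose mean equals the sample mean. The sketch below uses only the Python standard library; the function names and the toy counts are hypothetical and are not data from the disclosure.

```python
# Minimal sketch: observed count histogram versus a Poisson model with the same mean.
import math
from collections import Counter


def poisson_pmf(k: int, mean: float) -> float:
    """Probability of observing k events under a Poisson distribution."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def poisson_comparison(counts):
    """Map each count value k to (observed fraction, Poisson-expected fraction)."""
    n = len(counts)
    mean = sum(counts) / n
    observed = Counter(counts)
    return {
        k: (observed.get(k, 0) / n, poisson_pmf(k, mean))
        for k in range(max(counts) + 1)
    }


# Hypothetical per-variant read counts for a small library.
toy_counts = [0, 1, 1, 2, 2, 2, 3, 4]
for k, (obs, exp) in poisson_comparison(toy_counts).items():
    print(k, round(obs, 3), round(exp, 3))
```

A formal goodness-of-fit test would be needed to support a quantitative claim; the side-by-side fractions are only a quick visual check.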
  • FIG. 3A depicts an example of an overview of the method used to generate a trackable episomal DNA library. Transformation of a CREATE recorder plasmid generates modifications of the target DNA at two sites. One edit occurs to the desired target gene (gray) introducing a codon or promoter mutation designed to test specific engineering objectives. The second edit targets a functionally neutral site and introduces a 15 nucleotide barcode (BC, black). By virtue of coupling these libraries on a single CREATE plasmid, the target DNA is edited at both sites and each unique barcode can be used to track edits throughout the rest of the plasmid.
  • FIG. 3B depicts an example of the CREATE barcode design.
  • a degenerate library is constructed from overlapping oligos and cloned in a separate site of the CREATE vector to make a library of CREATE recorder cassettes that can be coupled to the designer editing libraries.
  • FIG. 3C depicts an exemplary CREATE record mapping strategy. Deep sequencing of both the target DNA (left) and CREATE plasmids allows a simple sequence mapping strategy by allowing each editing cassette to be uniquely assigned by the barcode sequence. This allows the relative fitness of each barcode (and thus edit) to be tracked during selection or screening processes and can be shuttled between different organisms using standard vectors.
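The record mapping strategy described above amounts to a lookup table from barcode to editing cassette, built from plasmid sequencing and then applied to barcodes read from the target DNA. A minimal sketch follows, assuming barcodes and edits have already been extracted from the reads; the function names, the barcode sequences, and the pairing of barcodes with edit labels are all hypothetical.

```python
# Minimal sketch of barcode-to-edit mapping; names and data are illustrative only.
from collections import Counter


def build_barcode_map(plasmid_pairs):
    """Build {barcode: edit} from (barcode, edit) pairs seen on CREATE plasmids.

    Barcodes observed with more than one edit are dropped as ambiguous.
    """
    mapping = {}
    ambiguous = set()
    for barcode, edit in plasmid_pairs:
        if barcode in mapping and mapping[barcode] != edit:
            ambiguous.add(barcode)
        mapping.setdefault(barcode, edit)
    for barcode in ambiguous:
        mapping.pop(barcode)
    return mapping


def count_edits(target_barcodes, mapping):
    """Count edits implied by barcodes read from the target DNA."""
    return Counter(mapping[b] for b in target_barcodes if b in mapping)


# Hypothetical 15-nt barcodes paired with hypothetical edit labels.
plasmid_pairs = [
    ("AAAAACCCCCGGGGG", "folA_F153W"),
    ("TTTTTGGGGGCCCCC", "galK_Y145*"),
    ("AAAAACCCCCGGGGG", "folA_F153W"),
]
barcode_map = build_barcode_map(plasmid_pairs)
reads = ["AAAAACCCCCGGGGG", "AAAAACCCCCGGGGG", "TTTTTGGGGGCCCCC", "NNNNNNNNNNNNNNN"]
print(count_edits(reads, barcode_map))
```

Dropping ambiguous barcodes is a design choice made here for simplicity; tolerance for sequencing errors in the barcode is not handled.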
  • a cassette was designed to make an R1335K mutation in the Cas9 protein. This cassette was cloned into a CREATE plasmid and transformed into MG1655 E. coli carrying the pSIM5 and X2-Cas9 vectors.
  • the pSIM5 vector comprises lambda red recombination machinery.
  • the X2-Cas9 vector comprises an arabinose-inducible Cas9 expression cassette.
  • FIG. 5A depicts the total number of colonies (CFUs) in duplicate experiments that are edited and/or barcoded.
  • the edited CFU numbers were calculated by extrapolation of the data in FIG. 5A to the total number of CFUs on the plate.
  • the barcoded CFUs numbers were calculated by counting the number of white colonies in a galK screening (site in which barcode is integrated). These data show that the majority of barcoded colonies contained the designed genomic edit.
  • FIG. 6 depicts an example of combinatorial genome engineering and tracking.
  • Three recursive CREATE plasmids are used, each with a gRNA targeting one of the other markers in this series (indicated by T-lines).
  • an edit and barcode are incorporated into the genome and the previous CREATE plasmid is cured.
  • rapid iterative transformations can be performed to construct either a defined combination of mutations or a combinatorial library to search for improved phenotypes.
  • the recording site is compatible with short read sequencing technologies that allow the fitness of combinations to be tracked across a population. Such an approach allows rapid investigation of genetic epistasis and optimization of phenotypes relevant to basic research or for commercial biological applications.
  • FIG. 3D and FIG. 3E depict another example of combinatorial genome engineering.
  • an editing cassette (blue rectangle in FIG. 3D)
  • a recorder cassette (green rectangle in FIG. 3D)
  • each recorder sequence comprises a 15 nucleotide barcode.
  • the recorder sequences are each inserted adjacent to the last recorder sequence, despite where the editing cassette was inserted.
  • Each recorder cassette can simultaneously delete a PAM site.
  • the engineered cells can be selected and then the inserted mutations can be tracked by sequencing the recorder region that comprises all of the inserted recorder cassettes.
  • each editing cassette can be linked or associated with one or more unique barcodes within the recorder cassette. Since each recorder cassette corresponds to the associated editing cassette, the mutations incorporated by the editing cassettes can be tracked or identified by the sequence of the recorder cassette, or the sequence of the barcodes within the recorder cassette. As is demonstrated in FIG. 3E, by sequencing all of the recorder cassettes or barcodes within the recorder cassettes, each of the inserted mutations can be identified and tracked.
  • the inserted recorder sequences can be referred to as a recorder site, recorder array, or barcode array.
  • sequencing the barcode array or recorder site allows tracking of the history of genomic editing events in the strain.
  • the barcode array or recorder site can identify the order in which the mutations were inserted as well as what the mutation is.
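Reading an editing history back from a barcode array can be sketched as splitting the sequenced recorder site into barcodes and looking each one up. The sketch assumes, purely for illustration, that the array is a direct concatenation of 15-nucleotide barcodes with no intervening sequence and that it is read in insertion order; real recorder cassettes carry additional sequence that would need to be trimmed first. The function name and the barcode-to-edit table are hypothetical.

```python
# Minimal sketch (assumption): the recorder array is a plain concatenation of
# 15-nt barcodes, listed in the order in which they were inserted.
BARCODE_LEN = 15


def decode_recorder_array(array_seq, barcode_to_edit):
    """Return the ordered list of edits recorded in a barcode array."""
    if len(array_seq) % BARCODE_LEN != 0:
        raise ValueError("array length is not a multiple of the barcode length")
    barcodes = [
        array_seq[i:i + BARCODE_LEN]
        for i in range(0, len(array_seq), BARCODE_LEN)
    ]
    # Unknown barcodes are reported rather than silently dropped.
    return [barcode_to_edit.get(b, "unknown") for b in barcodes]


# Hypothetical lookup table and array.
table = {"AAAAACCCCCGGGGG": "edit_1", "TTTTTGGGGGCCCCC": "edit_2"}
array = "AAAAACCCCCGGGGG" + "TTTTTGGGGGCCCCC"
print(decode_recorder_array(array, table))
```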
  • Each CREATE plasmid can be positively selected for based on the indicated antibiotics (Trimeth: trimethoprim, Carb: carbenicillin, Tet: tetracycline) and contains a gRNA targeting one of the other antibiotic markers.
  • the reCREATE1 plasmid can be selected for on carbenicillin and encodes a gRNA that will selectively target the trimethoprim resistance gene for destruction.
  • One pass through the carb/tetracycline/trimethoprim antibiotic marker series allows selective incorporation of up to three targeted edits.
  • the recording function would be implemented as illustrated in FIG. 5 , but is omitted here for simplicity.
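The recursive marker series can be pictured as a rotation in which each incoming plasmid is selected on one antibiotic while its gRNA removes the plasmid from the previous round. The sketch below is a toy model under explicit assumptions: only the first entry (a carbenicillin-selected plasmid targeting the trimethoprim marker) is stated in the text; the other two entries and the names reCREATE2 and reCREATE3 are inferred for illustration and are hypothetical.

```python
# Toy model of the recursive selection/curing rotation. Only the reCREATE1 entry
# is taken from the text; the remaining entries are assumptions for illustration.
SERIES = [
    {"name": "reCREATE1", "select": "carbenicillin", "targets": "trimethoprim"},
    {"name": "reCREATE2", "select": "tetracycline", "targets": "carbenicillin"},
    {"name": "reCREATE3", "select": "trimethoprim", "targets": "tetracycline"},
]


def run_series(n_rounds):
    """Return (plasmid name, selection antibiotic, cured marker) for each round."""
    resident = None  # marker of the CREATE plasmid currently in the cell
    history = []
    for i in range(n_rounds):
        plasmid = SERIES[i % len(SERIES)]
        # The incoming gRNA cures the resident plasmid only if it targets its marker.
        cured = resident if resident == plasmid["targets"] else None
        resident = plasmid["select"]
        history.append((plasmid["name"], plasmid["select"], cured))
    return history


for step in run_series(4):
    print(step)
```

In this model the fourth round reuses the first plasmid design and cures the trimethoprim-marked plasmid, which is how one pass through three markers can be repeated.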
  • FIG. 7B depicts an example of data from iterative rounds of CREATE engineering.
  • a serial transformation series began with cells transformed with X2cas9 (kan) and the reCREATE1 vector.
  • the spot plating results indicate that curing is 99.99% effective at each transformation step, ensuring highly efficient engineering in each round of transformation.
  • Simultaneous genome editing and plasmid curing in each transformation step with high efficiencies was achieved by introducing the requisite recording and editing CREATE cassettes into recursive vectors as disclosed herein (e.g., FIG. 7B).
  • An example overview of CRISPR EnAbled Trackable genome Engineering (CREATE) design workflow is depicted in FIGS. 8A-8B.
  • FIG. 8A shows example anatomy of a CREATE cassette designed for protein engineering.
  • Cassettes encode a spacer (red) along with part of a guide RNA (gRNA) sequence and a designer homology arm (HA) that can template homologous recombination at the genomic cut site.
  • the HA is designed to systematically couple mutations to a specified codon or target site (TS, blue) to a nearby synonymous PAM mutation (SPM, red) to rescue the sequence from Cas9 cleavage and allow highly efficient mutagenesis.
  • the priming sites (P1 and P2, black) are designed to allow multiplexed amplification and cloning of specific subpools from massively parallel array based synthesis.
  • a constitutive promoter (green) drives expression of the gRNA.
  • FIG. 8A further shows a detailed example of HA design for introducing a stop codon at residue 145 in the galK locus. The top sequence is of the wt genome with the PAM and TS codon highlighted. The translation sequences are shown to illustrate that the resulting mutant contains a single nonsynonymous mutation at the target site.
  • FIG. 8B shows an example overview of the CREATE workflow. CREATE oligos are synthesized on a microarray and delivered as large pools (10^4-10^6 individual library members).
  • cassettes are amplified and cloned in multiplex with the ability to subpool designs. After introduction of the CREATE plasmids into cells expressing Cas9, mutations are transferred to the genome with high efficiencies. Measurement of the frequency of each plasmid before selection (f_i,t1) and after selection (f_i,t2) by deep sequencing provides enrichment scores (E_i) for each CREATE cassette. These scores allow rapid identification of adaptive variants at up to single nucleotide or amino acid resolution for thousands of loci in parallel.
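An enrichment score built from plasmid frequencies before and after selection can be sketched as follows. The text does not give the formula at this point, so the sketch assumes a log2 ratio of the two frequencies (log2 enrichment is mentioned elsewhere in the description) and adds a pseudocount so that cassettes absent at one time point do not cause a division by zero. The function names, the pseudocount, and the read counts are hypothetical.

```python
# Minimal sketch (assumption): E_i = log2(f_i,t2 / f_i,t1), with a pseudocount.
import math


def frequencies(counts):
    """Convert raw read counts per cassette into fractions of the total."""
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}


def enrichment_scores(counts_t1, counts_t2, pseudocount=0.5):
    """Return {cassette: log2 enrichment} for cassettes seen at either time point."""
    cassettes = counts_t1.keys() | counts_t2.keys()
    c1 = {k: counts_t1.get(k, 0) + pseudocount for k in cassettes}
    c2 = {k: counts_t2.get(k, 0) + pseudocount for k in cassettes}
    f1, f2 = frequencies(c1), frequencies(c2)
    return {k: math.log2(f2[k] / f1[k]) for k in cassettes}


# Hypothetical read counts before and after selection.
before = {"cassette_a": 50, "cassette_b": 50}
after = {"cassette_a": 75, "cassette_b": 25}
for name, score in sorted(enrichment_scores(before, after, pseudocount=0).items()):
    print(name, round(score, 3))
```

With these toy counts, the cassette whose share of reads falls from one half to one quarter scores exactly -1, and the one that rises from one half to three quarters scores log2(1.5).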
  • FIG. 9A depicts an example in which the effects of Cas9 activity on transformation and editing efficiencies were measured using a cassette with a spacer and 120 bp HA targeted to galK (galK_Y145*_120/17).
  • the total transformants (TT) produced by this CREATE vector are shown in white and the total number of recombinants (TR) in dark blue.
  • TR is calculated as the product of the editing efficiency and TT.
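The TR calculation is simple arithmetic and can be written out directly. In this sketch the editing efficiency is taken as correct edits divided by total sequences sampled, which is the definition the description gives alongside FIG. 9D; the function names and the colony numbers are hypothetical.

```python
# Minimal sketch of the TR = editing efficiency x TT arithmetic; numbers are made up.

def editing_efficiency(correct_edits: int, total_sampled: int) -> float:
    """Fraction of sampled sequences that carry the designed edit."""
    if total_sampled <= 0:
        raise ValueError("total_sampled must be positive")
    return correct_edits / total_sampled


def total_recombinants(efficiency: float, total_transformants: float) -> float:
    """Estimated number of recombinant cells among all transformants."""
    return efficiency * total_transformants


eff = editing_efficiency(correct_edits=9, total_sampled=10)
print(eff, total_recombinants(eff, total_transformants=1000))
```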
  • FIG. 9B shows an example of characterization of CREATE cassette HA length and PAM/codon spacing on editing efficiency.
  • FIG. 9C depicts an example of determination of editing efficiency for oligo derived cassettes by sequencing of the genomic loci.
  • the galK_Y145*_120/17 cassette from FIGS. 9A and 9B is shown in white for reference.
  • the bar plot indicates the number of times each genotype was observed by genomic colony sequencing following recombineering with each CREATE cassette.
  • the circular inset indicates the relative position of each gene on the E. coli genome.
  • FIG. 9D shows that the distance between the SPM and the TS (as exemplified in FIG. 8A) is strongly correlated with editing efficiency (correct edits/total sequences sampled).
  • the galK cassettes with 44 and 59 bp in FIG. 9B were omitted from this analysis.
  • FIGS. 10A-10C depict an example where CREATE was used to generate a full scanning saturation mutagenesis library of the folA gene for identification of mutations that can confer resistance to TMP.
  • the count weighted average enrichment score from two trials of selection is plotted as a function of residue position (right).
  • Cassettes encoding nonsynonymous mutations are shown in gray, and those encoding synonymous mutations in black.
  • Cassettes with enrichment scores greater than 1.8 are highlighted in red and mutations that affect previously reported sites are labeled for reference.
  • the dashed lines indicate enrichment values that are significantly different (p < 0.05) from the synonymous dataset as determined by bootstrapping of the confidence intervals. These values are shown as a histogram for reference (middle).
  • FIGS. 10 D- 10 F depict example growth analysis of wt (left), F153W (middle), and F153R (right) variants in the indicated range of TMP concentrations (shown right).
  • FIG. 11 A depicts example genomic plots of enrichment scores for CREATE libraries grown at 42.2° C. in minimal media conditions.
  • the innermost plot illustrates the counts of the plasmid library before selection with labels for the top 20 representatives.
  • the outer ring shows the fitness of pooled library variants after growth in minimal media at elevated temperature (42.2° C.).
  • the bars are colored according to log 2 enrichment. Blue bars represent detrimental mutations, red bars represent significantly enriched mutations and gray bars indicate mutations that appear neutral in this assay.
  • the 20 most enriched variants are labeled for reference and labels corresponding to ALE-derived variants are colored red.
  • FIG. 11 B shows a histogram of enrichment scores of all library variants (gray), ALE-derived mutants (red) and synonymous mutants (black) under 42.2° C. growth conditions.
  • the dotted gray line indicates significant enrichment scores compared to the synonymous population.
  • the histograms are normalized as a fraction of the total number of variants passing the counting threshold (number indicated in parentheses). Note that 231 of 251 unique nonsynonymous ALE cassettes sampled by this experiment appear to provide significant growth benefits.
  • FIG. 11 C depicts enrichment of mutations based on mutational distance from wt.
  • FIG. 12 A depicts example genomic plots of enrichment (log 2) of library variants in the presence of erythromycin (outer) and rifampicin (middle). The innermost plot illustrates the count distribution of the input plasmids for reference. Coloring and labeling are as in FIG. 11 A- 11 C .
  • FIG. 12 B depicts CREATE mutation mapping at the individual amino acid level. CREATE cassettes that introduce bulky side chains to amino acids I572, S531 and L533 (red) of the RNA polymerase β subunit (rpoB) are highly enriched in the presence of rifampicin from genome wide targeting libraries.
  • FIG. 12 C depicts a zoomed in region of the MarA transcription factor bound to its cognate DNA target (PDB ID 1BL0).
  • the wt Q89 residue protrudes away from the DNA binding interface due to unfavorable steric and electrostatic interactions between this side chain and the DNA.
  • the Q89N substitution identified by selection introduces a H-donor and shortens the side chain such that productive H-bonding can occur between this residue and the DNA backbone. Such an interaction likely favors stronger DNA binding and induction of downstream resistance genes.
  • FIG. 12 D depicts enrichment plot of genome wide targeting libraries with 10 g/L acetate or 2 g/L furfural respectively. Coloring is the same as in FIG. 11 A .
  • FIG. 12 E depicts CREATE mapping at the gene level, which reveals gene-level trends. Strong enrichment of fis, metA and fadR targeting mutations in acetate suggests important roles for these genes in acetate tolerance, as depicted in FIG. 12 F , same as in the furfural selections depicted in FIG. 12 E .
  • FIGS. 13 A- 13 D depict illustrations of example designs compatible with the CREATE strategy.
  • FIG. 13 A shows that for protein engineering applications a silent codon approach is taken (top, see also FIG. 8 A- 8 B ).
  • This mutation strategy allows targeted mutagenesis of key protein regions to alter features such as DNA binding, protein-protein interactions, catalysis, or allosteric regulation.
  • FIG. 13 B shows that for promoter mutations, PAM sites in proximity to a specified transcription start site (TSS) can be disrupted through nucleotide replacement or integration cassettes.
  • FIG. 13 C shows an example cassette design for mutagenizing a ribosome binding site (RBS).
  • FIG. 13 D depicts an example of a simple deletion design. Points a and b are included to illustrate distance between two sites at the gene deletion locus. In all cases cassette designs disrupt a targeted PAM to allow selective enrichment of the designed mutant.
  • FIGS. 14 A- 14 B depict edits made to the DMAPP pathway in E. coli, which is the precursor to lycopene. Edits were made to the ORFs for 11 genes. Eight edits were designed to improve activity and 3 edits were designed to reduce activity of competitive enzymes. Approximately 10,000 variants within the lycopene pathway were constructed and screened.
  • FIG. 15 depicts Cas9 editing control experiments.
  • the CREATE galK_120/17 off cassette (relevant edits shown in red at bottom) was transformed into different backgrounds to assess the efficiency of homologous recombination between the CREATE plasmid and the target genome.
  • Red colonies represent unedited (wt) genomic variants and white colonies represent edited variants.
  • Transformation into cells containing only pSIM5 or pSIM5/X2 and dCas9 plasmids exhibited no detectable recombination as indicated by the lack of white colonies.
  • In the presence of active Cas9 (X2-Cas9, far right) we observe high efficiency editing (>80%), indicating the requirement for dsDNA cleavage to achieve high efficiency editing and library coverage.
  • Example 16 Toxicity of gRNA dsDNA Cleavage in E. coli
  • FIGS. 16 A- 16 C depict experiments testing the toxicity of generating double strand breaks in E. coli .
  • gRNA targeting galK spacer sequence TTAACTTTGCGTAACAACGC (SEQ ID NO: 182)
  • folA spacer sequence GTAATTTTGTATAGAATTTA
  • the targeted sites are illustrated graphically on the left and at the bottom of the bar graph.
  • a non-targeting gRNA control was used to estimate transformation efficiency based on no edits (far left, no target sites).
  • FIGS. 16 D- 16 E depict data from another such cell survival assay.
  • the editing cassette contained a F153R mutation, which leads to temperature sensitivity of the folA gene.
  • the recorder cassette contained a 15 nucleotide barcode designed to disrupt the galK gene, which allows screening of colonies on MacConkey agar plates. In this example, generating two cuts decreased cell survival compared to generating zero or one cut.
  • FIG. 16 F depicts data from a transformation and survival assay comparing a low copy number plasmid (Ec23) expressing Cas9 and a high copy number plasmid (MG) expressing Cas9.
  • Different vectors with distinct editing cassettes were used to target different gene target sites (folA, lacZ, xylA, and rhaA).
  • the recorder cassettes were designed to target different sequences within the galK gene, either site S1, S2, or S3.
  • the recursive vector used had a different vector backbone compared to the others and is part of a 3-vector system designed for iterative engineering that cures the cell of the previous round vector.
  • the data indicates that lower Cas9 expression (Ec23 vector) increases survival and/or transformation efficiency.
  • the decreased Cas9 expression increased transformation efficiency by orders of magnitude in cells undergoing two genomic cuts (editing cassette and recording cassette).
  • FIG. 16 G shows the correlation between editing efficiency and recording efficiency in cells transformed with the low copy number plasmid (Ec23) expressing Cas9 and the high copy number plasmid (MG) expressing Cas9. Editing and recording efficiencies were similar for high (MG) and lower (Ec23) expression of cas9. Ec23 yielded more colonies and had better survival (as shown in FIG. 16 E ), while maintaining a high efficiency of dual editing (editing cassette and recorder cassette incorporation).
  • FIG. 17 A-D depict an example CREATE strategy for gene deletion.
  • FIG. 17 A depicts an example cassette design for deleting 100 bp from the galK ORF.
  • the HA is designed to recombine with regions of homology with the designated spacing, with each 50 bp side of the CREATE HA designed to recombine at the designated site (blue).
  • the PAM/spacer location (red) is proximal to one of the homology arms and is deleted during recombination, allowing selectable enrichment of the deleted segment.
  • FIG. 17 B depicts electrophoresis of chromosomal PCR amplicons from clones recombineered with this cassette.
  • FIG. 17 C depicts a design for a 700 bp deletion as in FIG. 17 A .
  • FIG. 17 D depicts colony PCR of 700 bp deletion cassettes as in FIG. 17 B .
  • the asterisks in FIGS. 17 B and 17 D indicate colonies that appear to have the designed deletion. Note that some clones appear to have bands pertaining to both wt and deletion sizes indicating that chromosome segregation in some of the colonies is incomplete when plated 3 hrs post recombineering.
  • FIG. 18 depicts effect of PAM distance on editing efficiency using linear dsDNA PCR amplicons and co-transformation with a gRNA.
  • On the left is an illustration of the experiments, in which PCR amplicons containing a dual (TAATAA) stop codon on one side (asterisk) and a PAM mutation just downstream of the galK gene (gray box) on the other end were co-transformed with a gRNA targeting the downstream galK PAM site.
  • the primers were designed such that the mutations were 40 nt from the end of the amplicon to ensure enough homology for recombination. Data was obtained from these experiments by red/white colony screening. A linear fit to the data is shown at the bottom.
  • FIG. 19 A depicts reads from an example plasmid library following cloning, shown according to the number of total mismatches between the read and the target design sequence. The majority of plasmids are matches to the correct design. However, there are a large number of 4 base pair indel/mismatch mutants that were observed in this cloned population.
  • FIG. 19 B depicts a plot of the mutation profile for the plasmid pool as a function of cassette position. An increase in the mutation frequency is observed near the center of the homology arm (HA) indicating a small error bias in the sequencing or synthesis of this region. We suspect that this is due to the presence of sequences complementary to the spacer element in the gRNA.
  • FIG. 19 C depicts a histogram of the distances between the PAM and codon for the CREATE cassettes designed in this study. The large majority (>95%) were within the design constraints tested in FIG. 9 A- 9 D . The small fraction that are beyond 60 bp were made in cases where there was no synonymous PAM mutation within closer proximity.
  • FIG. 19 D depicts library coverage from multiplexed cloning of CREATE plasmids. Deep sequencing counts of each variant are shown with respect to their position on the genome. The inset shows a histogram of the number of variants having the indicated plasmid counts in the cloned libraries.
  • FIG. 20 A depicts a correlation plot of CREATE cassette read frequencies in the plasmid population prior to Cas9 exposure (x-axis) and after 3 hours post transformation into a Cas9 background.
  • FIG. 20 B depicts a correlation plot between replicate recombineering reactions following overnight recovery.
  • the gray lines indicate the line of perfect correlation for reference.
  • R2 and p values were calculated from a linear fit to the data using the Python SciPy statistics package. A counting threshold of 5 for each replicate experiment was applied to the data to filter out noise from each data set.
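  • The filtering and correlation steps described above can be sketched as follows. The source used the Python SciPy statistics package; this sketch deliberately avoids that dependency and computes only R2 (the squared Pearson correlation, which equals the R2 of a simple linear fit), not the p value. Interpreting the counting threshold as "at least 5 reads in each replicate" is an assumption, and the function names and example counts are hypothetical.

```python
def filter_by_count(rep1, rep2, threshold=5):
    """Keep cassettes with at least `threshold` reads in both replicates."""
    shared = sorted(set(rep1) & set(rep2))
    kept = [c for c in shared
            if rep1[c] >= threshold and rep2[c] >= threshold]
    return [rep1[c] for c in kept], [rep2[c] for c in kept]

def r_squared(x, y):
    """Squared Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)

# Hypothetical read counts per cassette in two replicates.
rep1 = {"c1": 10, "c2": 20, "c3": 30, "c4": 2}
rep2 = {"c1": 21, "c2": 41, "c3": 61, "c4": 50}
x, y = filter_by_count(rep1, rep2)  # "c4" is dropped (2 reads in rep1)
r2 = r_squared(x, y)
```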
  • FIG. 21 depicts growth characteristics of folA mutations in M9 minimal media. While F153R appears to maintain normal growth characteristics the growth rate of the F153W mutation is significantly slower under these conditions, suggesting that these two amino acid substitutions at the same site have very different effects on organismal fitness presumably due to different changes invoked in the stability/dynamics of this protein.
  • FIG. 22 depicts enrichment profiles for folA CREATE cassettes in minimal media.
  • Cassettes that encode synonymous HA are shown in black and non-synonymous cassettes in gray. The dashed lines indicate enrichment scores with p < 0.05 significance compared to the synonymous population mean as estimated from a bootstrap analysis.
  • the enrichment score observed for each mutant cassette at each position in the protein sequence is shown to the left and a histogram of these enrichment scores as a fraction of the total variants to the right.
  • the two populations appear to be largely similar. Conserved residues that are highly deleterious are shown in blue for reference.
  • FIG. 23 A depicts on the left a global overview of AcrB efflux pump. Substrates enter the pump through the openings in the periplasmic space and are extruded via the AcrB/AcrA/TolC complex across the outer membrane and into the extracellular space. Library targeted residues are highlighted by blue spheres for reference and the red dot indicates the region where many of the enriched variants clustered. On the right is a blow up of the loop-helix motif abutting the central funnel where enriched mutations in isobutanol were identified (red and teal spheres), presumably affecting solute transport from the periplasmic space. Mutants targeting the T60 position (teal spheres) were also enriched in the presence of erythromycin.
  • FIG. 23 B depicts confirmation of N70D and D73L mutations for tolerance to isobutanol.
  • the N70D mutation in particular appears to improve the final OD to a significant degree.
  • FIG. 23 C depicts improved growth of the AcrB T60N mutant in inhibitory concentrations of erythromycin (200 μg/mL) and isobutanol (1.2%) in shaking 96 well plates, indicating that this mutation may enhance the efflux activity of this pump towards many compounds.
  • FIGS. 24 A- 24 D depict the number of variants detected in CREATE experiments involving 500 μg/mL rifampicin ( FIG. 24 A ), 500 μg/mL erythromycin ( FIG. 24 B ), 10 g/L acetate ( FIG. 24 C ), and 2 g/L furfural ( FIG. 24 D ). While naturally evolving systems or error-prone PCR are highly biased towards sampling single nucleotide polymorphisms (e.g. 1 nt mutations, red) these histograms illustrate the potential advantages for rational design approaches that can identify rare or inaccessible mutations (2 and 3 nt, green and blue respectively).
  • FIG. 26 A depicts a crystal structure of the Crp regulatory protein with variants identified by furfural selection highlighted in red (PDB ID 3N4M).
  • A number of the CREATE designs targeting residues near the cyclic-AMP binding site (aa. 28-30, 65) of this regulator were highly enriched in minimal media selections for furfural or thermal tolerance, suggesting that these mutations may enhance E. coli growth in minimal media under a variety of stress conditions.
  • FIG. 26 B depicts validation of the Crp S28P mutant identified in 2 g/L furfural selections in M9 media. This mutant was reconstructed as described for AcrB T60S in Example 23.
  • CRISPR EnAbled Trackable genome Engineering (CREATE) couples highly efficient CRISPR editing with massively parallel oligomer synthesis to enable trackable precision editing on a genome wide scale. This can be accomplished using synthetic cassettes that link a targeting guide RNA with rationally programmable homologous repair cassettes that can be systematically designed to edit loci across a genome and track their phenotypic effects.
  • each CREATE cassette is designed to include both a targeting guide RNA (gRNA) and a homology arm (HA) that introduces rational mutations at the chromosomal cleavage site (e.g. FIG. 8 A ).
  • the HA encodes both the genomic edit of interest and a synonymous PAM mutation that is designed to abrogate Cas9 cleavage after repair (e.g. FIG. 8 B ).
  • This arrangement not only ensures that the desired edit can be selectively enriched to high levels by Cas9 but also that the sequences required to guide cleavage and HR are covalently coupled during synthesis and thus delivered simultaneously to the same cell during transformation.
  • the high efficiency editing of CRISPR based selection in E. coli should also ensure a strong correlation between the CREATE plasmid and genomic sequences and allow the plasmid sequence to serve as a trans-acting barcode or proxy for the genomic edit (e.g. FIG. 8 C ).
  • This design software is part of a suite of web-based design tools that can be implemented for E. coli and is under further development for other organisms as well as an expanded set of CRISPR-Cas systems.
  • This software platform enables high-throughput rational design of genomic libraries in a format that is compatible with parallelized array based oligo synthesis and simple homology based cloning methods that can be performed in batch for library construction (e.g. FIG. 8 B ).
  • the library designs included: 1) a complete saturation of the folA gene to map the entire mutational landscape of an essential gene in its chromosomal context 2) saturation mutagenesis of functional residues in 35 global regulators, efflux pumps and metabolic enzymes implicated in a wide range of tolerance and production phenotypes in E.
  • the pooled oligo libraries were amplified and cloned in parallel and a subset of single variants were isolated to further characterize editing efficiency at different loci (e.g. FIG. 9 C ).
  • Amplification and sequencing of the genomic loci after transformation with the CREATE plasmids revealed editing efficiencies of 70% on average (106 of 144 clones sampled at seven different loci), with a range of 30% for the metA V20L cassette to 100% for the rpoH_V179H cassette.
  • the differences in editing efficiency for each cassette were highly correlated with the distance between the PAM and target codon (e.g. FIG. 9 D ), a feature that also appears to affect the ability of linear DNA templates to effectively introduce targeted mutations (e.g. FIG.
  • a CREATE library designed to saturate every codon from 2-158 of the dihydrofolate reductase (DHFR) enzyme was recombineered into E. coli MG1655 and allowed to recover overnight. Following recovery, approximately 10^9 cells (1 mL saturated culture) were transferred into media containing inhibitory concentrations of the antibiotic trimethoprim (TMP) and allowed to grow for 48 hours. The resulting plasmid populations were then sequenced to assess our ability to capture information at the level of single amino acid substitutions that can confer TMP resistance (e.g. FIG. 10 A- 10 B ). Bootstrapped confidence intervals for mutational effect were derived using the enrichment data of the 158 synonymous mutations included in this experiment (e.g. FIG. 10 A- 10 B ).
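  • One way to derive a bootstrapped confidence interval from the synonymous control population is sketched below. This is an illustrative percentile bootstrap of the population mean, not necessarily the procedure used in this work; the function name, the number of resamples, the fixed random seed, and the example scores are all hypothetical.

```python
import random

def bootstrap_mean_interval(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean of `values`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    means = []
    for _ in range(n_resamples):
        # Resample with replacement, same size as the original data.
        sample = [values[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    means.sort()
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return means[lo_idx], means[hi_idx]

# Hypothetical enrichment scores of synonymous cassettes.
synonymous = [-0.3, 0.1, 0.0, 0.2, -0.1, 0.05, -0.2, 0.15]
low, high = bootstrap_mean_interval(synonymous)
```

  • Non-synonymous cassettes whose enrichment falls outside such an interval would then be candidates for a significant effect, under the assumptions stated above.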
  • the highly enriched F153R mutant grows rapidly under a large range of TMP concentrations while the F153W mutant demonstrates growth only at the moderate TMP concentration used in the selection, consistent with their respective enrichment scores (e.g. FIG. 10 A- 10 F ).
  • 6 of the 7 mutations we identified using CREATE require two nucleotide changes to convert the wt TTT codon to one of the observed amino acids (I: 1 nt,W: 2 nt,D: 2 nt,R: 2 nt,P: 2 nt,M: 2 nt,H: 2 nt).
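  • The nucleotide distances listed above can be checked with a short script. This sketch uses the standard genetic code and reports, for each observed amino acid, the minimum number of nucleotide changes needed to reach any of its codons from the wt TTT codon; the helper name is hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def min_nt_changes(wt_codon, amino_acid):
    """Fewest nucleotide substitutions from wt_codon to any codon of amino_acid."""
    return min(
        sum(x != y for x, y in zip(wt_codon, codon))
        for codon, aa in CODON_TABLE.items()
        if aa == amino_acid
    )

# Amino acids observed at position F153 in the selection.
changes = {aa: min_nt_changes("TTT", aa) for aa in "IWDRPMH"}
```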
  • the F153R and F153W mutations also appear to impact the native enzyme activity in distinct ways (e.g. FIG. 21 ), implying that these substitutions may confer tolerance by altering the enzymatic cycle of this enzyme in distinct manners.
  • FIG. 23 A- 23 F a 4,240 variant library targeting the AcrB multidrug efflux pump in E. coli.
  • This protein acts as a proton exchange pump that exports a wide variety of chemicals including antibiotics, chemical mutagens, and short chain alcohols that are being pursued as next generation biofuels and motivating numerous engineering efforts.
  • the library was designed to target the interior chamber, the exit funnel that channels substrates towards the outer-membrane component of the AcrB/AcrA/TolC complex, and key regions of the transmembrane domain where mutations conferring tolerance to isobutanol and longer chain alcohols have been identified (e.g. FIG. 23 A- 23 C ).
  • To further validate the method for genome-scale mapping and exploration, we challenged genome wide targeting libraries with antibiotics or solvents relevant to bioproduction (e.g. FIG. 12 A- 12 F ).
  • an antibiotic that inhibits transcription by the RNA polymerase e.g. FIG. 12 A , inner circle
  • the data suggest that a bulky substitution is necessary to sterically hinder rifampicin binding.
  • the rifampicin selections enriched a number of mutations to the MarA transcriptional activator, whose over-expression due to marR knockout is a well studied aspect of multiple antibiotic resistance (MAR) phenotypes in E. coli .
  • In the DNA bound crystal structure of MarA, Q89 is positioned near the DNA backbone but pointed into solution due to a steric clash between other possible rotamers and the nearest phosphate group on the DNA backbone (e.g. FIG. 12 C ).
  • Modeling of the MarA Q89N and Q89D mutations identified by this selection suggests that shortening the side chain by a single carbon unit may enable new protein-DNA H-bonding interactions and thereby improve the overall MAR induction response.
  • the Fis, Fnr and FadR regulators are all involved in transcriptional regulation of the primary acetate utilization gene acs, and implicated in the so-called “acetate-switch” which allows the cell to effectively scavenge acetate. Knockout of these regulators leads to constitutive expression of the acetate utilization pathways and improved acetate growth phenotypes, suggesting that the mutations identified in this study (e.g. FIG. 12 E- 12 F ) likely inhibit these regulatory functions by destabilizing their respective protein targets.
  • CREATE allows parallel mapping of tens of thousands of amino acid and promoter mutations in a single experiment.
  • the construction, selection, and mapping of >50,000 genome-wide mutations can in some examples be accomplished in 1-2 weeks by a single researcher, offering orders of magnitude improvement in economics, throughput, and target scale over the current state of the art methods in synthetic biology.
  • the ability to track the enrichment of library variants allows multiplex sequence to activity mapping by a simple PCR based workflow using just a single set of primers as opposed to more complicated downstream sequencing approaches that are limited to a few dozen loci.
  • the high efficiency mutagenesis reported in this work (e.g. FIG. 9 A- 9 D ) was not only improved by an order of magnitude but was also achieved in a wild type MG1655 strain in which all of the native DNA repair pathways are intact.
  • the majority of previously reported recombineering efforts in E. coli have used single-stranded oligo engineering which requires deletion of the mismatch repair genes or chemically modified oligonucleotides to achieve mutagenesis at 1-30% efficiency.
  • eliminating the need for specialized genetic modifications outside of the Cas9 and λ-RED genes to perform efficient editing and tracking on a population scale (e.g. FIG. 9 A- 9 D ).
  • This fact alongside the broad utility of CRISPR editing suggests that the CREATE approach will readily port to a wide range of microorganisms such as Saccharomyces cerevisiae and other recombinogenic bacteria for which high-efficiency transformation protocols are available.
  • the CREATE strategy should also be compatible with a wide range of CRISPR/Cas systems using similar automation approaches to design and tracking. Extension of this methodology to higher eukaryotes however will require the development of strategies to overcome non-homologous end-joining as well as alternative tracking systems that can stably replicate.
  • the CREATE strategy provides a streamlined approach for sequence to activity mapping and directed evolution by integrating multiplexed oligo synthesis, CRISPR-Cas editing, and high-throughput sequencing.
  • the plasmid cassette should have minimal or no functional influence relative to the genomic edit, ii) the genomic loci will only be either the WT sequence or the sequence from the editing cassette that we obtain via sequencing, and iii) offsite editing is highly unlikely given the toxicity of CRISPR-Cas editing of multiple sites (e.g. FIG. 16 A- 16 E ) or when performed in the absence of an added editing-repair template.
  • the use of replicate experiments and deeper sequencing can also address these issues.
  • Off target gRNA cleavage should be rare in E. coli due to the relatively small size of its genome (4 Mb), and thus lack of (non-targeted) regions of homology to the CREATE cassette. Moreover, the toxicity of gRNAs in the presence of Cas9 (e.g. FIG. 9 A ) ensures that cell survival is compromised in E. coli due to dsDNA breaks. Each additional cut introduced into E. coli appears to incur multiplicative toxicity effects, even when homologous repair templates are provided for each cut site (e.g. FIG. 16 A- 16 E ). This toxicity effect would be further exacerbated by the absence of a repair template to guide HR (e.g. FIG. 16 A- 16 E ), as would be the case for an off-target cleavage event from a single gRNA targeting two sites but containing only a single HA.
  • Synonymous mutations can confer unexpected effects on phenotype.
  • an internal control that consists of a library of synonymous mutations ( 1/20 at each codon or 5% of total input), each of which samples different PAM and codon combinations and thus give us an idea of the range of possible effects we may have on a gene by measuring the enrichment profile of many synonymous changes.
  • Using this population as a control, we can accurately identify significant fitness changes at the resolution of single amino acids, as the work suggests.
  • the folA library (3140 cassettes) was designed to be an unbiased, exploratory library for full single site saturation mutagenesis and sequence activity.
  • the genes we sought to maximize the probability of interesting genotypes by choosing to focus the diversity of sites most likely to have a functional impact on the targeted protein (e.g. DNA binding sites, active sites, regions identified as mutational hotspots by previous selections).
  • the sites that were included in these library designs were selected based on information deposited in databases including Ecocyc (biocyc.org/), Uniprot (uniprot.org/), and the PDB (rcsb.org/pdb) as well as relevant literature citations that identified residues or regions of interest using directed evolution approaches.
  • the Uniprot and Ecocyc databases provide manually curated sequence features that indicate mutational effects and important domains of each protein. In cases where there was enough structural information to model ligand or DNA binding sites the relevant crystal structures were loaded into Pymol and manual residue selections were made and exported as numerical lists.
  • Design of the CREATE cassettes was automated using custom Python scripts.
  • the basic algorithm takes a gene sequence, a list of target residues, and a list of codons as inputs.
  • the gene sequence is searched for all available PAM sites with the corresponding spacer sequence. This list is then sorted according to relative proximity to the targeted codon position.
  • the algorithm checks for synonymous mutations that can be made in-frame that also directly disrupt the PAM site. In the event that this condition is met, the algorithm proceeds to making the prescribed codon change and designing the full CREATE cassette with the accompanying spacer, and iterates for each input codon and position respectively. For each PAM mutation, all possible synonymous codon substitutions are checked before proceeding to the next PAM site.
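  • The design loop described above can be sketched in Python. This is a simplified illustration and not the custom scripts used in this work: it assumes an NGG PAM (S. pyogenes Cas9) on the forward strand only, a 20 nt spacer, and a single-nucleotide synonymous change as the PAM-disrupting edit, and it omits homology arm extraction. All function names and the toy gene sequence are hypothetical.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

def find_pams(gene):
    """Yield (pam_start, spacer) for forward-strand NGG sites with a 20 nt spacer."""
    for i in range(20, len(gene) - 2):
        if gene[i + 1:i + 3] == "GG":
            yield i, gene[i - 20:i]

def synonymous_pam_edit(gene, pam_start):
    """Return (codon_start, new_codon) for an in-frame synonymous change
    that replaces one G of the PAM, or None if no such change exists."""
    for g_pos in (pam_start + 1, pam_start + 2):
        codon_start = g_pos - g_pos % 3
        codon = gene[codon_start:codon_start + 3]
        if len(codon) < 3:
            continue
        offset = g_pos - codon_start
        for alt, aa in CODON_TABLE.items():
            same_elsewhere = all(alt[k] == codon[k] for k in range(3) if k != offset)
            if aa == CODON_TABLE[codon] and alt[offset] != "G" and same_elsewhere:
                return codon_start, alt
    return None

def design_cassette(gene, residue, new_codon):
    """Use the PAM nearest the target codon that admits a synonymous disrupting edit."""
    target = 3 * (residue - 1)  # 0-based start of the target codon
    pams = sorted(find_pams(gene), key=lambda p: abs(p[0] - target))
    for pam_start, spacer in pams:
        edit = synonymous_pam_edit(gene, pam_start)
        if edit is None or edit[0] == target:
            continue  # no silent edit, or it would collide with the target codon
        codon_start, alt = edit
        edited = list(gene)
        edited[codon_start:codon_start + 3] = alt
        edited[target:target + 3] = new_codon
        return {"spacer": spacer, "pam_start": pam_start,
                "edited_gene": "".join(edited)}
    return None

# Toy in-frame sequence (hypothetical); residue 7 (TTT) is changed to CGT.
gene = "ATGAAACTGACCATTGAATTTCTGGCGAAAACC"
design = design_cassette(gene, residue=7, new_codon="CGT")
```

  • In this toy case the only available PAM is the TGG at positions 22-24, and the synonymous CTG to CTT change removes its first G while leaving the encoded leucine unchanged.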
  • codon saturation libraries in this study we chose the most frequent codons (genscript.com/cgi-bin/tools/codon_freq_table) for each designed amino acid substitution according to the E. coli usage statistics.
  • the script can be run rapidly on a laptop computer and was used to generate the full design of these libraries in ⁇ 10 minutes.
  • the algorithm used in this study was designed to make the most conservative mutations possible by sometimes using only the PAM as the selectable mutation marker.
  • the X2-cas9 broad host range vector was constructed by amplifying the cas9 gene from genomic S. pyogenes DNA into the pBTBX2 backbone (Lucigen).
  • a vector map and sequence of this vector and the galK_Y145*_120/17 CREATE cassette are provided at the following locations: benchling.com/s/3c941j/edit; benchling.com/s/xRBDwcMy/edit.
  • Genomic libraries were prepared by transforming CREATE plasmid libraries into a wildtype E. coli MG1655 strain carrying the temperature sensitive pSIM5 plasmid (lambda RED) and a broad host range plasmid containing an inducible cas9 gene cloned from S. pyogenes genomic DNA into the pBTBX-2 backbone (X2cas9, e.g. FIG. 15 A- 15 D ).
  • pSIM5 was induced for 15 min at 42° C. followed by chilling on ice for 15 min. The cells were washed 3 times with 1/5 the initial culture volume of ddH2O (e.g. 10 mL washes for 50 mL culture).
  • the cells were recovered in LB+0.4% arabinose to induce Cas9.
  • the cells were recovered 1-2 hrs before spot plating to determine library coverage and transferred to a 10× volume for overnight recovery in LB+0.4% arabinose+50 μg/mL kanamycin+100 μg/mL carbenicillin.
  • Saturated overnight cultures were pelleted and resuspended in 5 mL of LB. 1 mL was used to make glycerol stocks and another 1 mL was washed with the appropriate selection media before proceeding with selection.
  • the cells were harvested by pelleting and resuspension in fresh selection media. All selections were performed in shake flask and inoculated at an initial OD600 of 0.1. Three serial dilutions (48-96 hrs depending on growth rates in the target condition) were carried out for each selection by transferring 1/100th the media volume after the cultures reached stationary phase. The 42° C. selections were performed in M9 media+0.2% glucose to mimic low carbon availability from the initial adaptation. Antibiotic selections were carried out in LB+500 μg/mL rifampicin or erythromycin to ensure stringent selection.
  • the solvent selections were performed in M9+0.4% glucose and either 10 g/L acetate (unbuffered) or 2 g/L furfural. Selections were harvested by pelleting 1 mL of the final culture and the cell pellet was boiled in 100 μL TE buffer to preserve both the plasmid and the genomic DNA for further desired analyses.
  • Custom Illumina compatible primers were designed to allow a single amplification step from the CREATE plasmid and assignment of experimental reads using barcodes.
  • the CREATE cassettes were amplified directly from the plasmid sequences of boiled cell lysates using 20 cycles of PCR with the Phusion (NEB) polymerase using 60° C. annealing and 1:30 minute extension times. As in the cloning procedure a minimal number of PCR cycles was maintained to prevent accumulation of mutations and recombined CREATE cassettes that were observed when an excessive number of PCR cycles was implemented (e.g. >25-30). Amplified fragments were verified and quantified by 1% agarose gel electrophoresis and pooled according to the desired read depth for each sample. The pooled library was cleaned using Qiaquick PCR cleanup kit and processed for NGS using standard Illumina preparation kits. The Illumina sequencing and sample preparation were performed with the primers.
  • Paired-end Illumina sequencing reads were sorted according to the Golay barcode index, allowing up to 3 mismatches, and then merged using the usearch-fastq_merge algorithm. Sorted reads were then matched against the database of designed CREATE cassettes using the usearch global algorithm at an identity threshold of 90%, allowing up to 60 possible hits for each read. The resulting hits were further sorted according to percent identity, and read assignment was made using the best matching CREATE cassette design at a final cutoff of 98% identity to the initial design. It should be noted that this read assignment strategy attempts to identify correlations between the designed genotypes and may therefore miss other important features that arise due to mutations that could occur during the experimental procedure. This approach was taken both to simplify data analysis and to evaluate the ‘forward’ design and annotation procedure and its ability to accurately identify meaningful genetic phenomena.
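The read-assignment logic described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the usearch pipeline itself: barcode matching uses a plain Hamming distance with up to 3 mismatches, and percent identity is approximated with the `difflib` similarity ratio from the Python standard library rather than a global alignment. The function names and the dictionary layouts are hypothetical.

```python
from difflib import SequenceMatcher


def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def assign_barcode(read_index, barcodes, max_mismatches=3):
    """Return the sample whose barcode is uniquely closest to read_index.

    barcodes maps sample name -> barcode sequence (hypothetical layout).
    Returns None if the best match exceeds max_mismatches or is ambiguous.
    """
    scored = sorted((hamming(read_index, bc), sample)
                    for sample, bc in barcodes.items())
    best_dist, best_sample = scored[0]
    if best_dist > max_mismatches:
        return None
    if len(scored) > 1 and scored[1][0] == best_dist:
        return None  # two samples are equally close, so the read is dropped
    return best_sample


def identity(a, b):
    """Approximate fractional identity (a stand-in for an alignment score)."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def best_cassette(read, designs, search_threshold=0.90,
                  final_cutoff=0.98, max_hits=60):
    """Assign a merged read to the best-matching designed cassette.

    Hits at or above search_threshold are ranked by identity, at most
    max_hits are kept, and the top hit is accepted only if it also
    passes final_cutoff. Returns the design name or None.
    """
    hits = [(identity(read, seq), name) for name, seq in designs.items()]
    hits = [h for h in hits if h[0] >= search_threshold]
    hits.sort(reverse=True)
    hits = hits[:max_hits]
    if hits and hits[0][0] >= final_cutoff:
        return hits[0][1]
    return None
```

Reads that return `None` at either step correspond to the reads "lost to filtering"; they would still count toward the total read number when frequencies are computed.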
  • Enrichment scores (or absolute fitness scores) were calculated as the log2 enrichment score using the following equation:
  • W = log2(F_x,f / F_x,i)
  • where F_x,f is the frequency of cassette X at the final time point, F_x,i is the initial frequency of cassette X, and W is the absolute fitness of each variant.
  • Frequencies were determined by dividing the read counts for each variant by the total experimental counts including those that were lost to filtering. Each selection was performed in duplicate and the count weighted average of the two measurements was used to infer the average fitness score of each mutation as follows:
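A small sketch of this scoring may help. It assumes the enrichment score is the log2 ratio of final to initial frequency, and it assumes that the "count weighted average" weights each replicate's score by that replicate's read count for the variant; the exact weighting is not reproduced in the text, so that choice is an assumption, as are all function names and the toy numbers.

```python
import math


def frequency(count, total_reads):
    """Read count for one variant divided by the total experimental counts
    (including reads that were lost to filtering)."""
    return count / total_reads


def enrichment(f_final, f_initial):
    """Log2 enrichment score W for one cassette.

    Variants with zero counts at either time point need separate handling
    (for example a pseudocount), which is not covered here.
    """
    return math.log2(f_final / f_initial)


def weighted_fitness(replicates):
    """Count-weighted mean of W across replicate selections.

    replicates is a list of (w, count) pairs. Using the read count as the
    weight is an assumption made for illustration.
    """
    total = sum(count for _, count in replicates)
    return sum(w * count for w, count in replicates) / total


# Toy numbers: one variant measured in two replicate selections.
w_rep1 = enrichment(frequency(400, 100_000), frequency(100, 100_000))
w_rep2 = enrichment(frequency(200, 100_000), frequency(100, 100_000))
average_w = weighted_fitness([(w_rep1, 400), (w_rep2, 200)])
```

Weighting by counts lets the better-sampled replicate dominate the average, which dampens noise from variants observed only a handful of times in one replicate.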
  • The AcrB T60N, Crp S28P, and FolA F153R/W CREATE cassettes were ordered as separate gBlocks from IDT, cloned, and sequence verified. Each cassette was transformed into MG1655 and colonies were screened to identify a clone with the designed genomic edit. These strains (e.g. FIG. 21 and FIG. 22 A- 22 C ) were then subjected to the growth conditions from the pooled library selection as indicated. The growth curves were taken in triplicate for each condition in 100 µL in a 96 well plate reader set to measure absorbance at 600 nm. The plate was covered and water was added to empty wells to reduce evaporation during the growth.
  • Circle plots were generated using Circos v0.67. Plots were generated in Python 2.7 using the matplotlib plotting libraries and figures were made using Adobe Illustrator CS5. Entropy scores for the FolA ( FIG. 10 A ) were determined using the ProDy Python package and the Pfam accession PF00186 representative proteome alignment RP35.
  • the possible outcomes are depicted in FIG. 27 B .
  • Before selection, all combinations of edit/barcode/WT are possible. After selection, edited cells could be enriched whether they are barcoded or not in this experimental design.
  • the transformations were plated on selective media that allowed for enrichment of cells containing the gene edits. 30 colonies from each combination transformation were sequenced to determine if they contained the desired barcode.
  • FIG. 27 C shows the results from the sequencing data. Two of the edit/barcode combinations were found in 100% of the tested colonies (30/30 colonies), and the other edit/barcode combination transformation was found in approximately 97% of tested colonies (29/30 colonies). The single colony that was not properly engineered contained the gene edit, but not the barcode.
  • FIG. 28 depicts an example strategy for selecting for the recording event (e.g., incorporation of the barcode by the recorder cassette), in addition to selecting for the editing cassette incorporation, thereby increasing the efficiency of recovering cells that have been both edited and barcoded.
  • sequences S0, S1, S2, etc. are designed to be targeted by the guide RNA associated with the recorder cassette of the next round.
  • A PAM mutation, a barcode, an S1 site, and a regulatory element necessary to turn on a selectable marker are incorporated into the S0 site in the target region. This turns on the TetR selectable marker and allows for enrichment of barcoded variants with the S1 site that have the first round PAM site deleted.
  • a new recorder cassette comprising a second PAM mutation, a second barcode, a S2 site, and a mutation that turns off the selectable marker is incorporated into the S1 site from the previous round.
  • the recorder cassette from each round is designed to incorporate into a unique sequence (e.g., S0, S1, etc.) that was incorporated in the previous round. This ensures that the last round of barcoding was successful so that all desired engineering steps are contained in the final product.
  • the incorporation of PAM mutations at each step also helps ensure that the desired barcoded variants are selected for since cells having the unmodified PAM sequences will be killed as they can't escape CRISPR enzyme cleavage.
  • This strategy uses multiple methods to increase the efficiency of isolating desired variants that contain all of the engineered edits from each round of engineering.
  • The PAM mutation, selectable marker switch, and unique landing site incorporated in each round each increase efficiency on their own and also increase efficiency when used together.
  • These tools allow for selection of each recording round and allow design of highly active recording guide RNAs.
  • An array of equally spaced (or not equally spaced, depending on the design) barcodes is generated and facilitates downstream analysis such as sequencing the barcode array to determine which corresponding edits are incorporated throughout the genome.
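The round-by-round logic of this strategy can be sketched as a small state model. This is an illustrative simplification, not the disclosed method: it tracks only the landing site, the marker state, and the barcode array, and it does not model PAM-dependent killing of unrecorded cells. The class and function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class RecordingLocus:
    """State of the recording region in one cell (hypothetical model)."""
    landing_site: str = "S0"   # site available to the next recorder cassette
    marker_on: bool = False    # selectable marker starts switched off
    barcodes: list = field(default_factory=list)


@dataclass(frozen=True)
class RecorderCassette:
    targets: str      # landing site cut by this round's guide RNA
    barcode: str      # barcode written in this round
    next_site: str    # landing site installed for the following round
    marker_on: bool   # marker state after successful incorporation


def record(locus, cassette):
    """Apply one recording round; return True if the barcode was incorporated.

    Incorporation requires the landing site installed by the previous round,
    so a cell that missed an earlier round cannot be recorded in a later one.
    """
    if locus.landing_site != cassette.targets:
        return False
    locus.barcodes.append(cassette.barcode)
    locus.landing_site = cassette.next_site
    locus.marker_on = cassette.marker_on
    return True


def survives_selection(locus, expected_marker_state):
    """Selection keeps cells whose marker state matches the current round."""
    return locus.marker_on == expected_marker_state


# Round 1 switches the marker on; round 2 switches it off again.
round1 = RecorderCassette(targets="S0", barcode="BC1", next_site="S1", marker_on=True)
round2 = RecorderCassette(targets="S1", barcode="BC2", next_site="S2", marker_on=False)
```

Alternating the marker state between rounds means each round can be selected for independently, which is the reason the model stores the expected marker state on the cassette rather than on the cell.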
  • FIG. 29 depicts an experimental design to test the selectable recorder strategy described above.
  • a plasmid (pREC1) containing an editing cassette and a recorder cassette was transformed into cells.
  • The editing cassette was either a non-targeting editing cassette or a cassette that incorporated a mutation that was not temperature sensitive (not TS) or a temperature sensitive mutation (TS) into a target gene.
  • the recorder cassette was designed to incorporate into the S0 site in the target gene that originally had the tetR selectable marker turned off.
  • The recorder cassette also contained a PAM mutation that deleted the S0 PAM site, a first barcode (BC1), a unique S1 site for the recorder cassette of the subsequent engineering round to incorporate into, and a corrective mutation that turns on the TetR selectable marker.
  • a guide RNA on the recorder cassette that targets a PAM site in the S0 site allows a CRISPR enzyme, in this case Cas9, to cleave the S0 site.
  • the recorder cassette recombines into the cleaved S0 site.
  • the PAM mutation is incorporated, which means the S0-gRNA can no longer target the S0 site, thereby killing WT cells and enriching for cells that received the barcode.
  • the TetR selectable marker was also turned on, allowing further selection of the barcoded variant.
  • FIGS. 30 A and 30 B show the results from the experiment described above and depicted in FIG. 29 .
  • 16 were sequenced and all were determined to contain the designed barcode ( FIG. 30 A ).
  • FIG. 30 B shows that the control cells that did not contain the recorder target site (non-target) did not survive the presence of Tet, while cells that contained the target site were successfully barcoded, as evidenced by the turning on of TetR, allowing cells to be selected on Tet containing media.
  • The Tet resistant colonies were confirmed at the genomic site to have the TetR gene turned on. These data showed that selectable recording was successful.
  • Wild-type nucleic acid sequences for MAD1-MAD20 include SEQ ID NOs 21-40, respectively. These MAD nucleases were codon optimized for expression in E. coli and the codon optimized sequences are listed as SEQ ID NO: 41-60, respectively (summarized in Table 2). Codon optimized MAD1-MAD20 were cloned into an expression construct comprising a constitutive or inducible promoter (e.g., T7 promoter SEQ ID NO: 83, or pBAD promoter SEQ ID NO: 81 or SEQ ID NO: 82) and an optional 6×-His tag (SEQ ID NO: 186). The generated MAD1-MAD20 expression constructs are provided as SEQ ID NOs: 61-80, respectively.
  • MAD2 and MAD7 nucleases are nucleic acid-guided nucleases that can be used in the methods disclosed herein.
  • Nucleases Mad2 (SEQ ID NO: 2) and Mad7 (SEQ ID NO: 7) were cloned and transformed into cells. Editing cassettes were designed with mutations at a target site in a galK gene, which allowed for white/red screening of successfully edited colonies. The editing cassettes also encoded a guide nucleic acid designed to target galK. The editing cassettes were transformed into E. coli cells expressing MAD2, MAD7, or Cas9.
  • FIG. 31 A shows the editing efficiency of Mad2 and Mad7 compared to Cas9 (SEQ ID NO: 110).
  • the guide nucleic acid used with MAD2 and MAD7 comprised a scaffold-12 sequence and a guide sequence targeting galK.
  • the guide nucleic acid used with Cas9 comprised a sequence compatible with the S. pyogenes Cas9.
  • FIG. 32 and Table 3 show more examples of gene editing using the MAD2 nuclease.
  • different guide nucleic acid sequences were tested.
  • the guide sequence of the guide nucleic acids targeted the galK gene as described above.
  • The scaffold sequence of the guide nucleic acids was one of various sequences tested as indicated.
  • Guide nucleic acids with scaffold-5, scaffold-10, scaffold-11, and scaffold-12 were able to form functional complexes with MAD2.
  • FIG. 33 and Table 4 show more examples of gene editing using the MAD7 nuclease.
  • different guide nucleic acid sequences were tested.
  • the guide sequence of the guide nucleic acids targeted the galK gene as described above.
  • The scaffold sequence of the guide nucleic acids was one of various sequences tested as indicated.
  • Guide nucleic acids with scaffold-10, scaffold-11, and scaffold-12 were able to form functional complexes with MAD7.
  • Amino acid sequences are provided in Table 2 and scaffolding sequences are provided in Table 3 and Table 4.
  • Table 3 and Table 4 also provide the designed mutations in the editing cassettes that were used to mutate the galK target gene.
  • No. | nuclease | scaffold sequence | mutation | gene
    1 | MAD2 | Scaffold-12; SEQ ID NO: 95 | N89KpnI | galK
    2 | MAD2 | Scaffold-10; SEQ ID NO: 93 | L80** | galK
    3 | MAD2 | Scaffold-5; SEQ ID NO: 88 | L80** | galK
    4 | MAD2 | Scaffold-12; SEQ ID NO: 95 | D70KpnI | galK
    5 | MAD2 | Scaffold-12; SEQ ID NO: 95 | Y145** | galK
    6 | MAD2 | Scaffold-11; SEQ ID NO: 94 | Y145** | galK
    7 | MAD2 | Scaffold-10; SEQ ID NO: 93 | Y145** | galK
    8 | MAD2 | Scaffold-12; SEQ ID NO: 95 | L10KpnI | galK
    9 | MAD2 | Scaffold-11; SEQ ID NO: 94 | L80** | galK
    10 | SpCas9 | S .
  • No. | nuclease | scaffold sequence | mutation | gene
    1 | MAD7 | Scaffold-1; SEQ ID NO: 84 | L80** | galK
    2 | MAD7 | Scaffold-2; SEQ ID NO: 85 | Y145** | galK
    3 | MAD7 | Scaffold-4; SEQ ID NO: 87 | Y145** | galK
    4 | MAD7 | Scaffold-10; SEQ ID NO: 93 | Y145** | galK
    5 | MAD7 | Scaffold-11; SEQ ID NO: 95 | L80** | galK


Abstract

Provided herein are methods and compositions for trackable genetic variant libraries. Further provided herein are methods and compositions for recursive engineering. Further provided herein are methods and compositions for multiplex engineering. Further provided herein are methods and compositions for enriching for editing and trackable engineered sequences and cells using nucleic acid-guided nucleases.

Description

    CROSS-REFERENCE
  • The present application claims priority to U.S. patent application Ser. No. 16/295,393, filed Mar. 7, 2019, which claims priority to U.S. patent application Ser. No. 15/948,798, filed Apr. 9, 2018, now U.S. Pat. No. 10,287,575; which claims priority to U.S. patent application Ser. No. 15/948,793, filed Apr. 9, 2018, now U.S. Pat. No. 10,294,473; which claims priority to U.S. patent application Ser. No. 15/632,222, filed Jun. 23, 2017, now U.S. Pat. No. 10,017,760; which claims priority to U.S. Provisional Application Ser. No. 62/354,516, filed Jun. 24, 2016; U.S. Provisional Application Ser. No. 62/367,386, filed Jul. 27, 2016; and U.S. Provisional Application Ser. No. 62/483,930, filed Apr. 10, 2017, the contents of each being hereby incorporated by reference in their entirety.
  • STATEMENT AS TO FEDERALLY SPONSORED RESEARCH
  • This disclosure was made with the support of the United States government under Contract number DE-SC0008812 by the Department of Energy.
  • SEQUENCE LISTING
  • The instant application contains a Sequence Listing which has been submitted in XML file format via EFS-Web and is hereby incorporated by reference in its entirety. Said XML copy, created Mar. 6, 2023, is named 14325_002US5_SL.xml and is 666,805 bytes in size.
  • BACKGROUND OF THE DISCLOSURE
  • Understanding the relationship between a protein's amino acid structure and its overall function continues to be of great practical, clinical, and scientific significance for biologists and engineers. Directed evolution can be a powerful engineering and discovery tool, but the random and often combinatorial nature of mutations makes their individual impacts difficult to quantify and thus challenges further engineering. More systematic analysis of contributions of individual residues or saturation mutagenesis remains labor- and time-intensive for entire proteins and simply is not possible on reasonable timescales for editing of multiple proteins in parallel, such as metabolic pathways or multi-protein complexes, using standard methods.
  • SUMMARY OF THE DISCLOSURE
  • Disclosed herein are compositions comprising: i) a first donor nucleic acid comprising: a) a modified first target nucleic acid sequence; b) a first protospacer adjacent motif (PAM) mutation; and c) a first guide nucleic acid sequence comprising a first spacer region complementary to a portion of the first target nucleic acid; and ii) a second donor nucleic acid comprising: a) a barcode corresponding to the modified first target nucleic acid sequence; and b) a second guide nucleic acid sequence comprising a second spacer region complementary to a portion of a second target nucleic acid. Further disclosed are compositions wherein the modified first target nucleic acid sequence comprises at least one inserted, deleted, or substituted nucleic acid compared to a corresponding un-modified first target nucleic acid. Further disclosed are compositions wherein the first guide nucleic acid and second guide nucleic acid are compatible with a nucleic acid-guided nuclease. Further disclosed are compositions wherein the nucleic acid-guided nuclease is a Type II or Type V Cas protein. Further disclosed are compositions wherein the nucleic acid-guided nuclease is a Cas9 homologue or a Cpf1 homologue. Further disclosed are compositions wherein the second donor nucleic acid comprises a second PAM mutation. Further disclosed are compositions wherein the second donor nucleic acid sequence comprises a regulatory sequence or a mutation to turn a screenable or selectable marker on or off. Further disclosed are compositions wherein the second donor nucleic acid sequence targets a unique landing site.
  • Disclosed herein are methods of genome engineering, the method comprising: a) contacting a population of cells with a polynucleotide, wherein each cell comprises a first target nucleic acid, a second target nucleic acid, and a nucleic acid-guided nuclease, wherein the polynucleotide comprises 1) an editing cassette comprising: i) a modified first target nucleic acid sequence; ii) a first protospacer adjacent motif (PAM) mutation; iii) a first guide nucleic acid sequence comprising a spacer region complementary to a portion of the first target nucleic acid and compatible with the nucleic acid-guided nuclease; and 2) a recorder cassette comprising i) a barcode corresponding to the modified first target nucleic acid sequence; and ii) a second guide nucleic acid sequence comprising a second spacer region complementary to a portion of the second target nucleic acid and compatible with the nucleic acid-guided nuclease; b) allowing the first guide nucleic acid sequence, the second guide nucleic acid sequence, and the nucleic acid-guided nuclease to create a genome edit within the first target nucleic acid and the second target nucleic acid. Further disclosed are methods further comprising c) sequencing a portion of the barcode, thereby identifying the modified first target nucleic acid that was inserted within the first target nucleic acid in step a). Further disclosed are methods wherein the nucleic acid-guided nuclease is a CRISPR nuclease. Further disclosed are methods wherein the PAM mutation is not recognized by the nucleic acid-guided nuclease. Further disclosed are methods wherein the nucleic acid-guided nuclease is a Type II or Type V Cas protein. Further disclosed are methods wherein the nucleic acid-guided nuclease is a Cas9 homologue or a Cpf1 homologue. Further disclosed are methods wherein the recorder cassette further comprises a second PAM mutation that is not recognized by the nucleic acid-guided nuclease.
  • Disclosed herein are methods of selectable recursive genetic engineering comprising a) contacting cells comprising a nucleic acid-guided nuclease with a polynucleotide comprising a recorder cassette, said recorder cassette comprising i) a nucleic acid sequence that recombines into a unique landing site incorporated during a previous round of engineering, wherein the nucleic acid sequence comprises a unique barcode; and ii) a guide RNA compatible with the nucleic acid-guided nuclease that targets the unique landing site; and b) allowing the nucleic acid-guided nuclease to edit the unique landing site, thereby incorporating the unique barcode into the unique landing site. Further disclosed are methods wherein the nucleic acid sequence further comprises a regulatory sequence that turns transcription of a screenable or selectable marker on or off. Further disclosed are methods wherein the nucleic acid sequence further comprises a PAM mutation that is not compatible with the nucleic acid-guided nuclease. Further disclosed are methods wherein the nucleic acid sequence further comprises a second unique landing site for subsequent engineering rounds. Further disclosed are methods wherein the polynucleotide further comprises an editing cassette comprising a) a modified first target nucleic acid sequence; b) a first protospacer adjacent motif (PAM) mutation; and c) a first guide nucleic acid sequence comprising a first spacer region complementary to a portion of the first target nucleic acid, wherein the unique barcode corresponds to the modified first target nucleic acid such that the modified target nucleic acid can be identified by the unique barcode.
  • Provided herein are compositions comprising i) a first donor nucleic acid comprising: a) a modified first target nucleic acid sequence; b) a mutant protospacer adjacent motif (PAM) sequence; and c) a first guide nucleic acid sequence comprising a first spacer region complementary to a portion of the first target nucleic acid; and ii) a second donor nucleic acid comprising: a) a recorder sequence; and b) a second guide nucleic acid sequence comprising a second spacer region complementary to a portion of the second target nucleic acid. In some aspects, the first donor nucleic acid and the second donor nucleic acid are covalently linked or comprised on a single nucleic acid molecule. Further provided are compositions wherein the modified first target nucleic acid comprises a 5′ homology arm and a 3′ homology arm. Further provided are compositions wherein the 5′ homology arm and the 3′ homology arm are homologous to nucleic acid sequence flanking a protospacer complementary to the first spacer region. Further provided are compositions wherein the modified first target nucleic acid sequence comprises at least one inserted, deleted, or substituted nucleic acid compared to a corresponding un-modified first target nucleic acid. Further provided are compositions wherein the first gRNA is compatible with a nucleic acid-guided nuclease, thereby facilitating nuclease-mediated cleavage of the first target nucleic acid. Further provided are compositions wherein the nucleic acid-guided nuclease is a Cas protein, such as a Type II or Type V Cas protein. Further provided are compositions wherein the nucleic acid-guided nuclease is Cas9 or Cpf1. Further provided are compositions wherein the nucleic acid-guided nuclease is MAD2 or MAD7. Further provided are compositions wherein the nucleic acid-guided nuclease is an engineered or non-natural enzyme. 
Further provided are compositions wherein the nucleic acid-guided nuclease is an engineered or non-natural enzyme derived from Cas9 or Cpf1. Further provided are compositions wherein the nucleic acid-guided nuclease is an engineered or non-natural enzyme that has less than 80% homology to either Cas9 or Cpf1. Further provided are compositions wherein the mutant PAM sequence is not recognized by the nucleic acid-guided nuclease. Further provided are compositions wherein the recorder sequence comprises a barcode. Further provided are compositions wherein the recorder sequence comprises a fragment of a screenable or selectable marker. Further provided are compositions wherein the recorder sequence comprises a unique sequence by which the modified first target nucleic acid sequence is specifically identified. Further provided are compositions wherein the recorder sequence comprises a unique sequence by which the edited cells may be selected or enriched. A first donor nucleic acid can be a cassette, such as an editing cassette as disclosed herein. A second donor nucleic acid can be a cassette, such as a recording cassette as disclosed herein. A first donor nucleic acid and a second donor nucleic acid can be comprised on a single cassette. A first donor nucleic acid and a second donor nucleic acid can be covalently linked. In any of these examples, the elements of the cassette or donor nucleic acids can be contiguous or non-contiguous.
  • Provided herein are cells comprising an engineered chromosome or polynucleic acid comprising: a first modified sequence; a first mutant protospacer adjacent motif (PAM); a first recorder sequence, the sequence of which uniquely identifies the first modified sequence, wherein the first modified sequence and the first recorder sequence are separated by at least 1 bp. Further provided are cells wherein the first modified sequence and the first recorder sequence are separated by at least 100 bp. Further provided are cells wherein the first modified sequence and the first recorder sequence are separated by at least 500 bp. Further provided are cells wherein the first modified sequence and the first recorder sequence are separated by at least 1 kbp. Further provided are cells wherein the first recorder sequence is a barcode. Further provided are cells wherein the first modified sequence is within a coding sequence. Further provided are cells wherein the first modified sequence comprises at least one inserted, deleted, or substituted nucleotide compared to an unmodified sequence. Further provided are cells further comprising: a second modified sequence; a second mutant PAM; and a second recorder sequence, the sequence of which uniquely identifies the second modified sequence, wherein the second modified sequence and the second recorder sequence are separated by at least 1 kb. Further provided are cells wherein the first recorder sequence and the second recorder sequence are separated by less than 100 bp. Further provided are cells wherein the second recorder sequence is a barcode. Further provided are cells wherein the second modified sequence is within a coding sequence. Further provided are cells wherein the second modified sequence comprises at least one inserted, deleted, or substituted nucleotide compared to an unmodified sequence. 
Further provided are cells wherein the first recorder sequence and the second recorder sequence are immediately adjacent to each other or overlapping, thereby generating a combined recorder sequence. Further provided are cells wherein the combined recorder sequence comprises a selectable or screenable marker. Further provided are cells wherein the combined recorder sequence comprises a selectable or screenable marker by which the cells may be enriched or selected.
  • Provided herein are methods of genome engineering, the method comprising: a) introducing into a population of cells a plurality of polynucleotides, wherein each cell comprises a first target nucleic acid, a second target nucleic acid, and a targetable nuclease, wherein each polynucleotide comprises: i) a modified first target nucleic acid sequence; ii) a mutant protospacer adjacent motif (PAM) sequence; iii) a first guide nucleic acid sequence comprising a guide sequence complementary to a portion of the first target nucleic acid; and (iv) a recorder sequence; b) inserting the modified first target nucleic acid sequence within the first target nucleic acid; c) inserting the recorder sequence within the second target nucleic acid; d) cleaving the first target nucleic acid by the targetable nuclease in cells that do not comprise the mutant PAM sequence, thereby enriching for cells comprising the inserted modified first target nucleic acid sequence. Further provided are methods wherein the recorder sequence is linked to the modified first target nucleic acid. Further provided are methods wherein each polynucleotide further comprises a second mutant PAM sequence. Further provided are methods wherein each polynucleotide further comprises a second guide nucleic acid sequence comprising a guide sequence complementary to a portion of the second target nucleic acid. Further provided are methods wherein the recorder sequence comprises a unique sequence by which the modified first target nucleic acid is specifically identified upon sequencing the recorder sequence. Further provided are methods further comprising e) sequencing the recorder sequence, thereby identifying the modified first target nucleic acid that was inserted within the first target nucleic acid in step b). 
Further provided are methods wherein inserting the modified first target nucleic acid sequence comprises cleaving the first target nucleic acid by the nuclease complexed with the transcription product of the first guide nucleic acid sequence. Further provided are methods wherein inserting the modified first target nucleic acid sequence further comprises homology-directed repair. Further provided are methods wherein inserting the modified first target nucleic acid sequence further comprises homologous recombination. Further provided are methods wherein the polynucleotide further comprises a second guide nucleic acid sequence comprising a spacer region complementary to a portion of the second target nucleic acid. Further provided are methods wherein inserting the recorder sequence comprises cleaving the second target nucleic acid by the nuclease complexed with the transcription product of the second guide nucleic acid sequence. Further provided are methods wherein inserting the modified first target nucleic acid sequence further comprises homology-directed repair. Further provided are methods wherein inserting the modified first target nucleic acid sequence further comprises homologous recombination. Further provided are methods wherein the targetable nuclease is a Cas protein. Further provided are methods wherein the Cas protein is a Type II or Type V Cas protein. Further provided are methods wherein the Cas protein is Cas9 or Cpf1. Further provided are methods wherein the targetable nuclease is a nucleic acid-guided nuclease. Further provided are methods wherein the targetable nuclease is MAD2 or MAD7. Further provided are methods wherein the mutant PAM sequence is not recognized by the targetable nuclease. Further provided are methods wherein the targetable nuclease is an engineered targetable nuclease. Further provided are methods wherein the mutant PAM sequence is not recognized by the engineered targetable nuclease. 
Further provided are methods further comprising introducing a second plurality of polynucleotides into a second population of cells comprising the enriched cells from step d), wherein each cell within the second population of cells comprises a third nucleic acid, a fourth target nucleic acid, and a targetable nuclease. Further provided are methods wherein each of the second polynucleotides comprises: i) a modified third target nucleic acid sequence; ii) a third mutant protospacer adjacent motif (PAM) sequence; iii) a third guide nucleic acid sequence comprising a spacer region complementary to a portion of the third target nucleic acid; and (iv) a second recorder sequence. Further provided are methods wherein each second polynucleotide further comprises a fourth mutant PAM sequence. Further provided are methods wherein each second polynucleotide further comprises a fourth guide nucleic acid sequence comprising a guide sequence complementary to a portion of the fourth target nucleic acid. Further provided are methods further comprising: a) inserting the modified third target nucleic acid sequence within the third target nucleic acid; b) inserting the second recorder sequence within the fourth target nucleic acid; c) cleaving the third target nucleic acid by the nuclease in cells that do not comprise the second mutant PAM sequence, thereby enriching for cells comprising the inserted modified third target nucleic acid sequence. Further provided are methods wherein the fourth target nucleic acid is adjacent to the second target nucleic acid. Further provided are methods wherein the inserted first recorder sequence is adjacent to the second recorder sequence, such that sequencing information can be obtained for the first and second recorder sequence from a single sequencing read. 
Further provided are methods further comprising obtaining sequence information from the first and second recorder sequences within a single sequence read, thereby identifying the modified first and third target nucleic acid sequences inserted into the first and third target nucleic acids respectively.
  • Provided herein are methods of identifying engineered cells, the method comprising: a) providing cells, wherein each cell comprises a first target nucleic acid, a second target nucleic acid, and a targetable nuclease, b) introducing into the cells a polynucleotide comprising: 1) a first donor nucleic acid comprising i) a modified target nucleic acid sequence; ii) a mutant protospacer adjacent motif (PAM) sequence; and iii) a first guide nucleic acid sequence comprising a first guide sequence complementary to a portion of the first target nucleic acid; and 2) a second donor nucleic acid comprising i) a recorder sequence corresponding to the modified target nucleic acid sequence; and ii) a second guide nucleic acid sequence comprising a second guide sequence complementary to a portion of the second target nucleic acid, c) cleaving the first target nucleic acid by the nuclease in cells that do not comprise the mutant PAM sequence, thereby enriching for cells comprising the modified target nucleic acid sequence, d) repeating steps a)-c) at least one time using the cells enriched for in step c) as the cells for step a) of the following round, wherein the recorder sequence from each round is incorporated adjacent to the recorder sequence from the previous round, thereby generating a record sequence array comprising a plurality of traceable barcodes, and e) sequencing the record sequence, thereby identifying engineered cells comprising a desired combination of modified target nucleic acids. Further provided are methods wherein the second donor nucleic acid further comprises a second mutant PAM sequence. Further provided are methods wherein sequencing the record sequence array comprises obtaining sequence information for each of the plurality of recorder sequences within a single sequencing read. Further provided are methods wherein steps a)-c) are repeated at least once. Further provided are methods wherein steps a)-c) are repeated at least twice. 
Further provided are methods wherein the recorder sequence is a barcode. Further provided are methods wherein the first donor nucleic acid and the second donor nucleic acid are covalently linked. A first donor nucleic acid can be a cassette, such as an editing cassette as disclosed herein. A second donor nucleic acid can be a cassette, such as a recording cassette as disclosed herein. A first donor nucleic acid and a second donor nucleic acid can be comprised on a single cassette. A first donor nucleic acid and a second donor nucleic acid can be covalently linked. In any of these examples, the elements of the cassette or donor nucleic acids can be contiguous or non-contiguous.
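  • As a non-limiting illustration, reading a record sequence array from a single sequencing read can be sketched as follows. The layout is an assumption made only for this sketch: fixed-length barcodes separated by a constant spacer. The names `decode_record_array`, `barcode_len`, and `spacer`, and the placeholder barcodes and edit labels, are hypothetical and are not taken from the disclosure.

```python
def decode_record_array(read, barcode_to_edit, barcode_len=6, spacer="GATC"):
    """Return the edits recorded in one read, in read order.

    Assumes (for illustration only) that barcodes have a fixed length and
    that adjacent barcodes are separated by a constant spacer sequence.
    """
    edits = []
    pos = 0
    while pos + barcode_len <= len(read):
        barcode = read[pos:pos + barcode_len]
        # Unknown barcodes are reported rather than silently dropped.
        edits.append(barcode_to_edit.get(barcode, "unknown"))
        pos += barcode_len
        # Stop when the next spacer is absent: the array has ended.
        if read[pos:pos + len(spacer)] != spacer:
            break
        pos += len(spacer)
    return edits


# Hypothetical lookup table linking each barcode to the edit it records.
barcode_to_edit = {"AAACCC": "edit_round_1", "TTTGGG": "edit_round_2"}
read = "AAACCC" + "GATC" + "TTTGGG"
recorded = decode_record_array(read, barcode_to_edit)
```

In this sketch, one read yields the full combination of recorded edits, which mirrors the stated goal of obtaining sequence information for each recorder sequence within a single sequencing read.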
  • Provided herein are methods of identifying engineered cells, the method comprising: a) providing cells, wherein each cell comprises a first target nucleic acid, a second target nucleic acid, and a targetable nuclease, b) introducing into the cells a polynucleotide comprising: 1) a first donor nucleic acid comprising i) a modified target nucleic acid sequence; ii) a mutant protospacer adjacent motif (PAM) sequence; and iii) a first guide nucleic acid sequence comprising a first guide sequence complementary to a portion of the first target nucleic acid; and 2) a second donor nucleic acid comprising i) a marker fragment corresponding to the modified target nucleic acid sequence; and ii) a second guide nucleic acid sequence comprising a second guide sequence complementary to a portion of the second target nucleic acid, c) cleaving the first target nucleic acid by the nuclease in cells that do not comprise the mutant PAM sequence, thereby enriching for cells comprising the modified target nucleic acid sequence, d) repeating steps a)-c) at least one time using the cells enriched for in step c) as the cells for step a) of the following round, wherein the marker fragment from each round is incorporated adjacent to the marker fragment from the previous round, thereby generating a complete marker, and e) identifying cells comprising the complete marker, thereby identifying engineered cells comprising a desired combination of modified target nucleic acids. Further provided are methods wherein the second donor nucleic acid further comprises a second mutant PAM sequence. Further provided are methods wherein the complete marker comprises a selectable marker. Further provided are methods wherein the selectable marker comprises an antibiotic resistance marker or an auxotrophic marker. Further provided are methods wherein the complete marker comprises a screenable reporter. Further provided are methods wherein the screenable reporter comprises a fluorescent reporter. 
Further provided are methods wherein the screenable reporter comprises a gene. Further provided are methods wherein the screenable reporter comprises a promoter or regulatory element. Further provided are methods wherein the promoter or regulatory element turns on or off transcription of a screenable or selectable element. Further provided are methods wherein the screenable reporter comprises a screenable or selectable element which alters a characteristic of a colony comprising the element compared to a colony that does not comprise the element. A first donor nucleic acid can be a cassette, such as an editing cassette as disclosed herein. A second donor nucleic acid can be a cassette, such as a recording cassette as disclosed herein. A first donor nucleic acid and a second donor nucleic acid can be comprised on a single cassette. A first donor nucleic acid and a second donor nucleic acid can be covalently linked. In any of these examples, the elements of the cassette or donor nucleic acids can be contiguous or non-contiguous.
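  • As a non-limiting illustration, the logic of assembling a complete marker from per-round marker fragments can be sketched as follows. The sketch models only the bookkeeping, not the biology: each cell is represented by a list of the fragments inserted in each round, with `None` standing for a round in which no fragment was inserted. The function names and the placeholder fragment sequences are hypothetical.

```python
def assemble_marker(fragments_by_round):
    """Join marker fragments in round order; None marks a round with no insertion."""
    if any(fragment is None for fragment in fragments_by_round):
        return None
    return "".join(fragments_by_round)


def cells_with_complete_marker(cells, complete_marker):
    """Return the names of cells whose fragments assemble into the complete marker."""
    return [
        name
        for name, fragments in cells.items()
        if assemble_marker(fragments) == complete_marker
    ]


# Placeholder fragments of a hypothetical marker; not a real gene sequence.
COMPLETE_MARKER = "ATGAAA" + "CCCGGG" + "TTTTAA"
cells = {
    "cell_1": ["ATGAAA", "CCCGGG", "TTTTAA"],  # all three rounds recorded
    "cell_2": ["ATGAAA", None, "TTTTAA"],      # second round failed
}
selected = cells_with_complete_marker(cells, COMPLETE_MARKER)
```

Only a cell that received a fragment in every round carries the complete marker, so selecting or screening for the marker identifies cells with the full combination of edits.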
  • Provided herein are methods of genome engineering, the method comprising: a) introducing into a population of cells a polynucleotide, wherein each cell comprises a first target nucleic acid, a second target nucleic acid, and a targetable nuclease, wherein the polynucleotide comprises: i) a modified first target nucleic acid sequence; ii) a mutant nuclease recognition sequence; iii) a recorder sequence; b) inserting the modified first target nucleic acid sequence within the first target nucleic acid; c) inserting the recorder sequence within the second target nucleic acid; and d) selecting for a phenotype of interest. Further provided are methods wherein the polynucleotide further comprises a second mutant nuclease recognition site. Further provided are methods wherein selecting for a phenotype of interest comprises cleaving the first target nucleic acid by the nuclease in cells that do not comprise the mutant nuclease recognition sequence, thereby enriching for cells comprising the inserted modified first target nucleic acid sequence. Further provided are methods wherein selecting for a phenotype of interest comprises cleaving the second target nucleic acid by the nuclease in cells that do not comprise the second mutant nuclease recognition sequence, thereby enriching for cells comprising the inserted modified first target nucleic acid sequence. Further provided are methods wherein the recorder sequence is linked to the modified first target nucleic acid. Further provided are methods wherein the recorder sequence comprises a unique sequence by which the modified first target nucleic acid is specifically identified upon sequencing the recorder sequence. Further provided are methods further comprising e) sequencing the recorder sequence, thereby identifying the modified first target nucleic acid that was inserted within the first target nucleic acid in step b). 
Further provided are methods wherein inserting the modified first target nucleic acid sequence comprises homology-directed repair. Further provided are methods wherein inserting the modified first target nucleic acid sequence comprises homologous recombination. Further provided are methods wherein the nuclease is a Cas protein. Further provided are methods wherein the polynucleotide further comprises a first guide nucleic acid sequence comprising a guide sequence complementary to a portion of the first target nucleic acid. Further provided are methods wherein inserting the modified first target nucleic acid sequence comprises cleaving the first target nucleic acid by the nuclease complexed with the transcription product of the first guide nucleic acid sequence. Further provided are methods wherein the polynucleotide further comprises a second guide nucleic acid sequence comprising a guide sequence complementary to a portion of the second target nucleic acid. Further provided are methods wherein inserting the recorder sequence comprises cleaving the second target nucleic acid by the nuclease complexed with the transcription product of the second guide nucleic acid sequence. Further provided are methods wherein inserting the modified first target nucleic acid sequence or the recorder sequence comprises homology-directed repair. Further provided are methods wherein inserting the modified first target nucleic acid sequence or the recorder sequence comprises homologous recombination. Further provided are methods wherein the mutant nuclease recognition sequence comprises a mutant PAM sequence not recognized by the targetable nuclease. Further provided are methods wherein the Cas protein is a Type II or Type V Cas protein. Further provided are methods wherein the targetable nuclease is MAD2. Further provided are methods wherein the mutant PAM sequence is not recognized by MAD2. Further provided are methods wherein the targetable nuclease is MAD7. 
Further provided are methods wherein the mutant PAM sequence is not recognized by MAD7. Further provided are methods wherein the Cas protein is Cas9. Further provided are methods wherein the mutant PAM sequence is not recognized by Cas9. Further provided are methods wherein the Cas protein is Cpf1. Further provided are methods wherein the mutant PAM sequence is not recognized by Cpf1. Further provided are methods wherein the nuclease is an Argonaute nuclease. Further provided are methods further comprising introducing guide DNA oligonucleotides comprising a guide sequence complementary to a portion of the first target nucleic acid prior to selecting for a phenotype. Further provided are methods wherein the mutant nuclease recognition sequence comprises a mutant target flanking sequence not recognized by the Argonaute nuclease. Further provided are methods wherein the nuclease is a zinc finger nuclease. Further provided are methods wherein the mutant nuclease recognition sequence is not recognized by the zinc finger nuclease. Further provided are methods wherein the nuclease is a transcription activator-like effector nuclease (TALEN). Further provided are methods wherein the mutant nuclease recognition sequence is not recognized by the TALEN.
  • INCORPORATION BY REFERENCE
  • All publications and patent applications mentioned in this specification are herein incorporated by reference to the same extent as if each individual publication or patent application was specifically and individually indicated to be incorporated by reference.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.
  • FIGS. 1A-1C depict an example genetic engineering workflow including target design, plasmid design, and plasmid library generation. Figure discloses SEQ ID NOS 187-190, respectively, in order of appearance.
  • FIGS. 2A-2D depict validation data for an example experiment using a disclosed engineering method.
  • FIGS. 3A-3C depict an example trackable genetic engineering workflow, including a plasmid comprising an editing cassette and a recording cassette, and downstream sequencing of barcodes in order to identify the incorporated edit or mutation. Figure discloses SEQ ID NOS 191-192, respectively, in order of appearance.
  • FIGS. 3D-3E depict an example trackable genetic engineering workflow, including iterative rounds of engineering with a different editing cassette and recorder cassette with unique barcode (BC) at each round, followed by selection and tracking to confirm the successful engineering step at each round.
  • FIGS. 4A-4B depict an example of incorporation of a target mutation and PAM mutation using a plasmid comprising an editing cassette. Figure discloses SEQ ID NOS 193, 193, 194, 193, 194, 193, 193, 195, 193, 196, 193, 197, 194, 193, and 198, respectively, in order of appearance.
  • FIGS. 5A-5B depict an example of a plasmid comprising an editing cassette, designed to incorporate a target mutation and a PAM mutation into a first target sequence, and a recording cassette, designed to incorporate a barcode sequence into a second target sequence. FIG. 5B depicts example data validating incorporation of the editing cassette and recorder cassette and selection of the engineered bacterial cells. FIG. 5A discloses the left column sequences as SEQ ID NOS 201, 200, 201, 200, 200, 200, 200, 200, 200, 200, 200, 201, 202, 200, 200, 200, 200, 200, 200, 200, 202, and 200, respectively, in order of appearance and the right column sequences as SEQ ID NOS 203, 204, 204, 204, 204, 204, 204, 204, 204, 204, 204, 203, 205, 205, 205, 205, 205, 205, 205, 205, 205, and 205, respectively, in order of appearance.
  • FIG. 6 depicts an example recursive engineering workflow.
  • FIGS. 7A-7B depict an example plasmid curing workflow for combinatorial engineering and validation of an example experiment using said workflow.
  • FIGS. 8A-8B depict an example genetic engineering workflow including target design, plasmid design, and plasmid library generation. Figure discloses SEQ ID NOS 187-190, respectively, in order of appearance.
  • FIGS. 9A-9D depict validation data for an example genetic engineering experiment.
  • FIGS. 10A-10F depict an example data set from a genetic engineering experiment.
  • FIGS. 11A-11C depict an example design and data set from a genetic engineering experiment.
  • FIGS. 12A-12F depict an example design for a genetic engineering experiment.
  • FIGS. 13A-13D depict example designed edits to be made by genetic engineering. Figure discloses SEQ ID NOS 187-190 and 206-207, respectively, in order of appearance.
  • FIGS. 14A-14B depict an example design for a genetic engineering experiment.
  • FIGS. 15A-15D depict an example of Cas9 editing efficiency controls. Figure discloses SEQ ID NOS 208-209, respectively, in order of appearance.
  • FIGS. 16A-16E depict examples of toxicity of dsDNA cleavage in E. coli.
  • FIGS. 16F-16H depict an example of a transformation and survival assay, and editing and recording efficiencies, with low and high copy plasmids expressing Cas9.
  • FIGS. 17A-17D depict an example of genetic engineering strategy for gene deletion. FIGS. 17A and 17C disclose SEQ ID NO: 210.
  • FIGS. 18A-18B depict an example of editing efficiency controls by cotransformation of guide nucleic acid and linear dsDNA cassettes.
  • FIGS. 19A-19D depict an example of library cloning analysis and statistics.
  • FIGS. 20A-20B depict an example of precision of editing cassette tracking of recombineered populations.
  • FIG. 21 depicts an example of growth characteristics of folA mutations in M9 minimal media.
  • FIGS. 22A-22C depict an example of enrichment profiles for folA editing cassettes in minimal media.
  • FIGS. 23A-23F depict an example of validation of identified acrB mutations for improved solvent and antibiotic tolerance.
  • FIGS. 24A-24D depict an example mutant variant assessment analysis.
  • FIG. 25 depicts an example of reconstruction of mutations identified by erythromycin selection.
  • FIGS. 26A-26B depict an example of validation of Crp S28P mutation for furfural or thermal tolerance.
  • FIGS. 27A-27C depict an example of edit and barcode correlation studies.
  • FIG. 28 depicts an example of a selectable recording strategy.
  • FIG. 29 depicts an example of a selectable recording strategy.
  • FIGS. 30A-30B depict data from a selectable recording experiment. Figure discloses “TCCACTGGTATGCAT” as SEQ ID NO: 211.
  • FIGS. 31A-31B depict editing and transformation efficiencies from various nucleic acid-guided nucleases from an example experiment.
  • FIG. 32 depicts editing efficiencies of the MAD2 nuclease with various guide nucleic acids.
  • FIG. 33 depicts editing efficiencies of the MAD7 nuclease with various guide nucleic acids.
  • DETAILED DESCRIPTION OF THE DISCLOSURE
  • While preferred embodiments of the present invention have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. Numerous variations, changes, and substitutions will now occur to those skilled in the art without departing from the invention. It should be understood that various alternatives to the embodiments of the invention described herein may be employed in practicing the invention.
  • Methods and compositions for enabling sophisticated combinatorial engineering strategies to optimize and explore complex phenotypes are provided herein. Many phenotypes of interest to basic research and biotechnology are the result of combinations of mutations that occur at distal loci. For example, cancer is often linked to mutations that influence multiple hallmark gene functions rather than a single chromosomal edit. Likewise, many metabolic and regulatory processes that are the target of continuing engineering efforts require the activities of many proteins acting in concert to produce the phenotypic output of interest. Methods and compositions disclosed herein can provide ways of rapid engineering and prototyping of such functions since they can provide rapid construction and accurate reporting on the mutational effects at many sites in parallel.
  • The methods and compositions described herein can be carried out or used in any type of cell in which a nucleic acid-guided nuclease system, such as CRISPR or Argonaute, or other targetable nuclease systems, such as TALEN, ZFN, or meganuclease can function (e.g., target and cleave DNA), including prokaryotic, eukaryotic, or archaeal cells. The cell can be a bacterial cell, such as Escherichia spp. (e.g., E. coli). The cell can be a fungal cell, such as a yeast cell, e.g., Saccharomyces spp. The cell can be a human cell. The cell can be an algal cell, a plant cell, an insect cell, or a mammalian cell, including a human cell. Additionally or alternatively, the methods described herein can be carried out in vitro or in cell-free systems in which a nucleic acid guided nuclease system, such as CRISPR or Argonaute, or other nuclease systems, such as TALEN, ZFN, or meganuclease can function (e.g., target and cleave DNA).
  • Disclosed herein are compositions and methods for genetic engineering. Disclosed are methods and compositions suitable for trackable or recursive genetic engineering. Disclosed methods and compositions can use massively multiplexed oligonucleotide synthesis and cloning to enable high fidelity, trackable, multiplexed genome editing at single nucleotide resolution on a whole genome scale.
  • Trackable Plasmids
  • Methods and compositions can be used to perform high-fidelity trackable editing, for example, at single-nucleotide resolution and can be used to perform editing at a whole genome scale or on episomal nucleic acid molecules. Massively multiplexed oligonucleotide synthesis and/or cloning can be used in combination with a targetable nuclease system, such as a CRISPR system, MAD2 system, MAD7 system, or other nucleic acid-guided nuclease system, for editing.
  • As used herein, “cassette” often refers to a single molecule polynucleotide. A cassette can comprise DNA. A cassette can comprise RNA. A cassette can comprise a combination of DNA and RNA. A cassette can comprise non-naturally occurring nucleotides or modified nucleotides. A cassette can be single stranded. A cassette can be double stranded. A cassette can be synthesized as a single molecule. A cassette can be assembled from other cassettes, oligonucleotides, or other nucleic acid molecules. A cassette can comprise one or more elements. Such elements can include, as non-limiting examples, one or more of any of editing sequences, recorder sequences, guide nucleic acids, promoters, regulatory elements, mutant PAM sequences, homology arms, primer sites, linker regions, unique landing sites, a cassette, and any other element disclosed herein. Such elements can be in any order or combination. Any two or more elements can be contiguous or non-contiguous. A cassette can be comprised within a larger polynucleic acid. Such a larger polynucleic acid can be linear or circular, such as a plasmid or viral vector. A cassette can be a synthesized cassette. A cassette can be a trackable cassette.
  • A cassette can be designed to be used in any method or composition disclosed herein, including multiplex engineering methods and trackable engineering methods. An exemplary cassette can couple two or more elements, such as 1) a guide nucleic acid (e.g. gRNAs or gDNAs) designed for targeting a user specified target sequence in the genome and 2) an editing sequence and/or recorder sequence as disclosed herein (e.g. FIG. 1B and FIG. 5A). A cassette comprising an editing sequence and guide nucleic acid can be referred to as an editing cassette. A cassette comprising an editing sequence can be referred to as an editing cassette. A cassette comprising a recorder sequence and a guide nucleic acid can be referred to as a recorder cassette. A cassette comprising a recorder sequence can be referred to as a recorder cassette. In a preferred embodiment, an editing cassette and a recorder cassette are delivered into the cell at the same time. Further, an editing cassette and a recorder cassette may be covalently linked. Further, these elements may be synthesized together by multiplexed oligonucleotide synthesis.
  • A cassette can comprise one or more guide nucleic acids and editing cassette as a contiguous polynucleotide. In other examples, one or more guide nucleic acids and editing cassette are contiguous. In other examples, one or more guide nucleic acids and editing cassette are non-contiguous. In other examples, two or more guide nucleic acids and editing cassette are non-contiguous.
  • A cassette can comprise one or more guide nucleic acids, an editing cassette, and a recorder cassette as a contiguous polynucleotide. In other examples, one or more guide nucleic acids, editing cassette, and recorder cassette are contiguous. In other examples, two or more guide nucleic acids, editing cassette, and recorder cassette are contiguous. In other examples, one or more guide nucleic acids, editing cassette, and recorder cassette are non-contiguous. In other examples, two or more guide nucleic acids, editing cassette, and recorder cassette are non-contiguous.
  • A cassette can comprise one or more guide nucleic acids, one or more editing cassettes, and one or more recorder cassettes as a contiguous polynucleotide. In other examples, one or more guide nucleic acids, one or more editing cassettes, and one or more recorder cassettes are contiguous. In other examples, two or more guide nucleic acids, two or more editing cassettes, and two or more recorder cassettes are contiguous. In other examples, one or more guide nucleic acids, one or more editing cassettes, and one or more recorder cassettes are non-contiguous. In other examples, two or more guide nucleic acids, two or more editing cassettes, and two or more recorder cassettes are non-contiguous.
  • A cassette can comprise one or more guide nucleic acids and editing sequence as a contiguous polynucleotide. In other examples, one or more guide nucleic acids and editing sequence are contiguous. In other examples, one or more guide nucleic acids and editing sequence are non-contiguous. In other examples, two or more guide nucleic acids and editing sequence are non-contiguous.
  • A cassette can comprise one or more guide nucleic acids, an editing sequence, and a recorder sequence as a contiguous polynucleotide. In other examples, one or more guide nucleic acids, editing sequence, and recorder sequence are contiguous. In other examples, two or more guide nucleic acids, editing sequence, and recorder sequence are contiguous. In other examples, one or more guide nucleic acids, editing sequence, and recorder sequence are non-contiguous. In other examples, two or more guide nucleic acids, editing sequence, and recorder sequence are non-contiguous.
  • A cassette can comprise one or more guide nucleic acids, one or more editing sequences, and one or more recorder sequences as a contiguous polynucleotide. In other examples, one or more guide nucleic acids, one or more editing sequences, and one or more recorder sequences are contiguous. In other examples, two or more guide nucleic acids, two or more editing sequences, and two or more recorder sequences are contiguous. In other examples, one or more guide nucleic acids, one or more editing sequences, and one or more recorder sequences are non-contiguous. In other examples, two or more guide nucleic acids, two or more editing sequences, and two or more recorder sequences are non-contiguous.
  • An editing cassette can comprise an editing sequence. An editing sequence can comprise a mutation, such as a synonymous or non-synonymous mutation, and homology arms (HAs). An editing sequence can comprise a mutation, such as a synonymous or non-synonymous mutation, and homology arms (HAs) designed to undergo homologous recombination with the target sequence at the site of nucleic acid-guided nuclease-mediated double strand break (e.g. FIG. 1B).
  • A recorder cassette can comprise a recorder sequence. A recorder sequence can comprise a trackable sequence, such as a barcode or marker, and homology arms (HAs). A recorder sequence can comprise a trackable sequence, such as a barcode or marker, and homology arms (HAs) designed to undergo homologous recombination with the chromosome at the site of nucleic acid-guided nuclease-mediated double strand break (e.g. FIG. 1B).
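  • As a non-limiting illustration, the arrangement of a trackable sequence between homology arms, and its insertion at a site flanked by matching sequence, can be sketched at the string level as follows. This is a deliberately simplified model: it treats homologous recombination as a text substitution, ignores strand, cleavage, and repair kinetics, and uses placeholder sequences. The function names are hypothetical.

```python
def build_recorder_sequence(left_arm, barcode, right_arm):
    """Place a barcode between a left and a right homology arm."""
    return left_arm + barcode + right_arm


def insert_by_homology(target, left_arm, right_arm, donor):
    """Simplified recombination model: where the target contains the two arms
    side by side, replace that region with the donor; otherwise leave the
    target unchanged."""
    site = target.find(left_arm + right_arm)
    if site == -1:
        return target
    end = site + len(left_arm) + len(right_arm)
    return target[:site] + donor + target[end:]


# Placeholder sequences chosen for readability, not taken from any genome.
LEFT_ARM, RIGHT_ARM, BARCODE = "ACGTAC", "GGATCC", "CATCAT"
target = "TTTT" + LEFT_ARM + RIGHT_ARM + "AAAA"
donor = build_recorder_sequence(LEFT_ARM, BARCODE, RIGHT_ARM)
recorded_target = insert_by_homology(target, LEFT_ARM, RIGHT_ARM, donor)
```

The same arrangement applies to an editing sequence, with the designed mutation taking the place of the barcode between the homology arms.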
  • A cassette can encode machinery (e.g. targetable nuclease, guide nucleic acid, editing cassette, and/or recorder cassette as disclosed herein) necessary to induce strand breakage as well as designed repair that can be selectively enriched and/or tracked in cells. A cell can be any cell, such as a eukaryotic cell, an archaeal cell, a prokaryotic cell, or a microorganism such as E. coli (e.g. FIGS. 2A-2D).
  • A cassette can comprise an editing cassette. A cassette can comprise a recorder cassette. A cassette can comprise a guide nucleic acid and an editing cassette. A cassette can comprise a guide nucleic acid and a recorder cassette. A cassette can comprise a guide nucleic acid, an editing cassette, and a recorder cassette. A cassette can comprise two guide nucleic acids, an editing cassette, and a recorder cassette. A cassette can comprise more than two guide nucleic acids, one or more editing cassettes, and one or more recorder cassettes. These elements of a cassette can be linked covalently. These elements of a cassette can be contiguous. These elements of a cassette can be non-contiguous.
  • A cassette can comprise an editing sequence. A cassette can comprise a recorder sequence. A cassette can comprise a guide nucleic acid and an editing sequence. A cassette can comprise a guide nucleic acid and a recorder sequence. A cassette can comprise a guide nucleic acid, an editing sequence, and a recorder sequence. A cassette can comprise two guide nucleic acids, an editing sequence, and a recorder sequence. A cassette can comprise more than two guide nucleic acids, one or more editing sequences, and one or more recorder sequences. These elements of a cassette can be linked covalently. These elements of a cassette can be contiguous. These elements of a cassette can be non-contiguous.
  • Single genome edits can be tracked using sequencing technologies, e.g. short read sequencing technologies (e.g. FIG. 1C), long read sequencing technologies, or any other sequencing technologies known in the art.
  • In some embodiments, upon transformation, each editing cassette generates the designed genetic modification within the transformed cell. In some examples, the editing cassette can act in trans as a barcode of the genetic mutation introduced by the editing cassette and can enable the tracking of this mutation frequency in a complex population over time and across many different growth conditions (e.g. FIGS. 2A-2D and FIG. 1C).
  • In some examples, a recording cassette inserts the designed trackable sequence, such as a marker or barcode sequence, within the transformed cell. In some examples, the recorder cassette can act in cis as a barcode of the chromosomal mutation and can enable the tracking of this mutation frequency in a complex population over time and across many different growth conditions.
  • By providing cis and/or trans tracking of designed genomic mutations, the methods provided herein simplify sample preparation and depth of coverage for mapping diversity genome wide, and provide powerful tools for engineering on a genome scale (e.g. FIG. 1C).
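  • As a non-limiting illustration, tracking mutation frequency through barcodes can be sketched as a comparison of barcode frequencies between two samples of a population, for example before and after growth under a selective condition. The function names, the pseudocount, and the placeholder barcodes are assumptions made for this sketch only; no particular analysis pipeline is implied by the disclosure.

```python
import math
from collections import Counter


def barcode_frequencies(barcodes):
    """Return the fraction of reads carrying each barcode."""
    counts = Counter(barcodes)
    total = sum(counts.values())
    return {barcode: n / total for barcode, n in counts.items()}


def log2_enrichment(before, after, pseudocount=1e-6):
    """Log2 ratio of barcode frequency after versus before selection.

    The pseudocount (an arbitrary choice here) avoids division by zero for
    barcodes that disappear from the population.
    """
    freq_before = barcode_frequencies(before)
    freq_after = barcode_frequencies(after)
    return {
        barcode: math.log2(
            (freq_after.get(barcode, 0.0) + pseudocount)
            / (freq_before[barcode] + pseudocount)
        )
        for barcode in freq_before
    }


# Placeholder barcode reads from two hypothetical time points.
reads_before = ["AAA", "AAA", "CCC", "CCC"]
reads_after = ["AAA", "AAA", "AAA", "CCC"]
enrichment = log2_enrichment(reads_before, reads_after)
```

A positive value indicates that the barcode, and therefore the mutation it reports on, became more frequent under the tested condition; a negative value indicates depletion.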
  • A plurality of cassettes can be pooled into a library of cassettes. A library of cassettes can comprise at least 2 cassettes. A library of cassettes can comprise from 5 to a million cassettes. A library of cassettes can comprise at least a million cassettes. It should be understood that a library of cassettes can comprise any number of cassettes.
  • A library of cassettes can comprise cassettes that have any combination of common elements and non-common or unique elements as compared to the other cassettes within the pool. For example, a library of cassettes can comprise common priming sites or common homology arms while also containing non-common or unique barcodes. Common elements can be shared by a plurality, majority, or all of the cassettes within a library of cassettes. Non-common elements can be shared by a plurality, minority, or sub-population of cassettes within the library of cassettes. Unique elements can be shared by one, a few, or a sub-population of cassettes within the library of cassettes, such that they can identify or distinguish the one, few, or sub-population of cassettes from the other cassettes within the library of cassettes. Such combinations of common and non-common elements are advantageous for multiplexing techniques as disclosed herein.
  • Cassettes disclosed herein can generate the designed genetic modification or insert the designed marker or barcode sequence with high efficiency within a transformed cell. In many examples, the efficiency is greater than 50%. In some examples the efficiency is 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, 99%, or 100% (e.g., FIGS. 32A, 32B, and 33).
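  • As a non-limiting illustration, a library that combines common priming sites with unique barcodes can be sketched as follows. The priming-site sequences, the barcode length, the element order, and the function names are hypothetical choices made for this sketch; as noted herein, elements can be in any order or combination.

```python
import itertools

# Hypothetical common priming sites shared by every cassette in the library.
COMMON_FWD = "ACGTACGTAC"
COMMON_REV = "TGCATGCATG"


def build_library(payloads, barcode_len=4):
    """Give each payload a unique barcode and flank it with common priming sites.

    Returns a dict mapping barcode -> full cassette sequence.
    """
    if len(payloads) > 4 ** barcode_len:
        raise ValueError("not enough distinct barcodes for this many payloads")
    barcodes = ("".join(b) for b in itertools.product("ACGT", repeat=barcode_len))
    return {
        barcode: COMMON_FWD + barcode + payload + COMMON_REV
        for payload, barcode in zip(payloads, barcodes)
    }


def identify(cassette, barcode_len=4):
    """Recover the unique barcode that follows the common forward priming site."""
    start = len(COMMON_FWD)
    return cassette[start:start + barcode_len]


# Placeholder payloads standing in for editing or recorder sequences.
library = build_library(["AAATTT", "CCCGGG", "GGGCCC"])
```

The common elements allow the whole pool to be amplified or cloned together, while the unique element distinguishes each cassette from the others.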
  • In some examples, transformation, editing, and/or recording efficiency can be increased by modulating the expression of one or more components disclosed herein, such as a nucleic acid-guided nuclease. Methods for modulating components are disclosed herein and are known in the art. Such methods can include expressing a component, such as a nucleic acid-guided nuclease or CRISPR enzyme of a subject system on a low or high copy plasmid, depending on the experimental design.
  • Disclosed herein are methods and compositions for generating cassettes. A cassette can comprise a cassette as disclosed herein. For example, a cassette can comprise any combination of an editing cassette and/or recorder cassette disclosed herein. Such a cassette can be comprised on a larger polynucleic acid molecule. Such a larger polynucleic acid molecule can be linear or circular, such as a plasmid or viral vector.
  • An editing cassette can comprise a mutation relative to a target nucleic acid sequence. The editing cassette can comprise sequence homologous to the target sequence flanking the desired mutation or editing sequence. The editing cassette can comprise a region which recognizes, or hybridizes to, a target sequence of a nucleic acid in a cell or population of cells, is homologous to the target sequence of the nucleic acid of the cell and includes a mutation, or a desired mutation, of at least one nucleotide relative to the target sequence.
  • An editing cassette can comprise a first editing sequence comprising a first mutation relative to a target sequence. A first mutation can comprise a mutation such as an insertion, deletion, or substitution of at least one nucleotide compared to the non-editing target sequence. The mutation can be incorporated into a coding region or non-coding region.
  • An editing cassette can comprise a second editing sequence comprising a second mutation relative to a target sequence. The second mutation can be designed to mutate or otherwise silence a PAM sequence such that a corresponding nucleic acid guided nuclease or CRISPR nuclease is no longer able to cleave the target sequence. In such cases, this mutation or silencing of a PAM can serve as a method for selecting transformants in which the first editing sequence has been incorporated.
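  • The PAM-silencing check described above can be sketched in a few lines of Python. This is a minimal illustration only: the `NGG` pattern, the helper names, and both example sequences are assumptions chosen for the sketch and are not taken from the disclosure. The check also inspects a single strand at one stated position.

```python
# Hypothetical helpers: the 'NGG' PAM pattern and every name and sequence
# below are illustrative assumptions, not part of the disclosure.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "N": "ACGT", "R": "AG", "Y": "CT",
}


def matches_pam(site: str, pam_pattern: str = "NGG") -> bool:
    """Return True if `site` matches the PAM pattern written in IUPAC codes."""
    if len(site) != len(pam_pattern):
        return False
    return all(base in IUPAC[code]
               for base, code in zip(site.upper(), pam_pattern))


def pam_is_silenced(edited_seq: str, pam_start: int,
                    pam_pattern: str = "NGG") -> bool:
    """Return True if the PAM at `pam_start` no longer matches after editing."""
    site = edited_seq[pam_start:pam_start + len(pam_pattern)]
    return not matches_pam(site, pam_pattern)


# Made-up target: the PAM 'AGG' sits at index 12 of the unedited sequence.
wild_type = "ACGTTGACCTGAAGGCTA"
# Made-up edit: AGG -> AGC, which no longer matches 'NGG'.
edited = "ACGTTGACCTGAAGCCTA"

print(pam_is_silenced(wild_type, 12))
print(pam_is_silenced(edited, 12))
```

  • Under these assumptions, a cell carrying the edited sequence would no longer present the PAM to the nuclease, which is the basis of the selection described above.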
  • In some examples, an editing cassette comprises at least two mutations, wherein one mutation is a PAM mutation. In some examples, the PAM mutation can be in a second editing cassette. Such a second editing cassette can be covalently linked and can be contiguous or non-contiguous to the other elements in the cassette.
  • An editing cassette can comprise a guide nucleic acid, such as a gRNA encoding gene, optionally operably linked to a promoter. The guide nucleic acid can be designed to hybridize with the targeted nucleic acid sequence in which the editing sequence will be incorporated.
  • A recording cassette can comprise a recording sequence. A recorder sequence can comprise a barcoding sequence, or other screenable or selectable marker or fragment thereof. The recording sequence can be comprised within a recorder cassette. Recorder cassettes can comprise regions homologous to an insertion site within a target nucleic acid sequence such that the recording sequence is incorporated by homologous recombination or homology-driven repair systems. The site of incorporation of the recording cassette can be comprised on the same DNA molecule as the target nucleic acid to be edited by an editing cassette. The recorder sequence can comprise a barcode, unique DNA sequence, and/or a complete copy or fragment of a selectable or screenable element or marker.
  • A recorder cassette can comprise a mutation relative to the target sequence. The mutation can be designed to mutate or otherwise silence a PAM sequence such that a corresponding nucleic acid guided nuclease or CRISPR nuclease is no longer able to cleave the target sequence. In such cases, this mutation or silencing of a PAM site can serve as a method for selecting transformants in which the first recording sequence has been incorporated. A recorder cassette can comprise a PAM mutation. The PAM mutation can be designed to mutate or otherwise silence a PAM site such that a corresponding CRISPR nuclease is no longer able to cleave the target sequence. In such cases, this mutation or silencing of a PAM site can serve as a method for selecting transformants in which the recorder sequence has been incorporated.
  • A recorder cassette can comprise a guide nucleic acid, such as a gene encoding a gRNA. A promoter can be operably linked to a nucleic acid sequence encoding a guide nucleic acid capable of targeting a nucleic acid-guided nuclease to the desired target sequence. A guide nucleic acid can target a unique site within the target site. In some cases, the guide nucleic acid targets a unique landing site that was incorporated in a prior round of engineering. In some cases, the guide nucleic acid targets a unique landing site that was incorporated by a recorder cassette in a prior round of engineering.
  • A recorder cassette can comprise a barcode. A barcode can be a unique barcode or relatively unique such that the corresponding mutation can be identified based on the barcode. In some examples, the barcode is a non-naturally occurring sequence that is not found in nature. In most examples, the combination of the desired mutation and the barcode within the editing cassette is non-naturally occurring and not found in nature. A barcode can be any number of nucleotides in length. A barcode can be 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, or more than 30 nucleotides in length. In some cases, the barcode is more than 30 nucleotides in length. A barcode can be generated by degenerate oligonucleotide synthesis. A barcode can be rationally designed or user-specified.
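  • As one hedged illustration of a rationally designed barcode set, the sketch below greedily collects random 15-nucleotide barcodes that differ from one another at a minimum number of positions. The minimum Hamming distance criterion, the function names, and the default values are assumptions made for this example; the disclosure does not prescribe a particular design rule.

```python
import random


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def design_barcodes(count: int, length: int = 15, min_distance: int = 3,
                    seed: int = 0, max_tries: int = 100_000) -> list:
    """Greedily collect random barcodes that are pairwise well separated.

    Hypothetical design rule: a candidate is kept only if it differs from
    every barcode already accepted at `min_distance` or more positions.
    """
    rng = random.Random(seed)
    barcodes = []
    tries = 0
    while len(barcodes) < count:
        tries += 1
        if tries > max_tries:
            raise RuntimeError("could not find enough separated barcodes")
        candidate = "".join(rng.choice("ACGT") for _ in range(length))
        if all(hamming(candidate, b) >= min_distance for b in barcodes):
            barcodes.append(candidate)
    return barcodes


barcodes = design_barcodes(20)
```

  • Separating barcodes by several mismatches is one common way to keep a single sequencing error from converting one barcode into another; other criteria could be substituted in the acceptance test.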
  • A recorder cassette can comprise a landing site. A landing site can serve as a target site for a recorder cassette for a successive engineering round. A landing site can comprise a PAM. A landing site can be a unique sequence. A landing site can be at least about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50 nucleotides in length. In some cases, the landing site is greater than 50 nucleotides in length.
  • A recorder cassette can comprise a selectable or screenable marker, or a regulatory sequence or mutation that turns a selectable or screenable marker on or off. In such cases, the turning on or off of a selectable marker can be used for selection or counter-selection, respectively, of iterative rounds of engineering. An example regulatory sequence includes a ribosome-binding site (RBS), though other such regulatory sequences are envisioned. Mutations that turn a selectable or screenable marker on can include any possible start codon that is recognized by the host translation machinery. A mutation that turns off a selectable or screenable marker includes a mutation that deletes a start codon or one that inserts a premature stop codon or a reading frame shift mutation.
  • A recorder cassette can comprise one or more of a guide nucleic acid targeting a target site into which the recorder sequence is to be incorporated, a PAM mutation to silence a PAM used by the guide RNA, a barcode corresponding to an editing cassette, a unique site to serve as a landing site for a recorder cassette of a subsequent round of engineering, a regulatory sequence or mutation that turns a screenable or selectable marker on or off, these one or more elements being flanked by homology arms that are designed to promote recombination of these one or more elements into the cleaved target site that is targeted by the guide RNA.
  • A recorder cassette can comprise a first homology arm, a PAM mutation, a barcode, a unique landing site, a regulatory sequence or mutation for a screenable or selectable marker, a second homology arm, and guide RNA. The first homology arm can be an upstream homology arm. The second homology arm can be a downstream homology arm. The homology arms can be homologous to sequences flanking a cleavage site that is targeted by the guide RNA.
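  • The elements listed above can be represented as a simple record, as in the following sketch. The class name, field names, and placeholder sequences are hypothetical, and the concatenation order simply follows the order in which the elements are listed; it is not asserted to be the required physical arrangement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecorderCassette:
    """Illustrative container for the recorder cassette elements (hypothetical)."""
    upstream_arm: str       # first (upstream) homology arm
    pam_mutation: str       # sequence carrying the silenced PAM
    barcode: str            # recorder barcode
    landing_site: str       # unique landing site for a later round
    marker_regulator: str   # regulatory sequence or mutation for a marker
    downstream_arm: str     # second (downstream) homology arm
    guide_rna_gene: str     # sequence encoding the guide RNA

    def donor(self) -> str:
        """Elements flanked by the two homology arms, in listed order."""
        return (self.upstream_arm + self.pam_mutation + self.barcode
                + self.landing_site + self.marker_regulator
                + self.downstream_arm)

    def full_sequence(self) -> str:
        """Donor region followed by the guide RNA gene."""
        return self.donor() + self.guide_rna_gene


# Placeholder sequences, for illustration only.
cassette = RecorderCassette(
    upstream_arm="AAAA", pam_mutation="C", barcode="ACGTACGTACGTACG",
    landing_site="TTGG", marker_regulator="AGGAGG",
    downstream_arm="CCCC", guide_rna_gene="GTTT",
)
```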
  • A cassette can comprise two guide nucleic acids designed to target two distinct target nucleic acid sequences. In any case, the guide nucleic acid can comprise a single gRNA or chimeric gRNA consisting of crRNA and trRNA sequences, or alternatively, the gRNA can comprise separated crRNA and trRNAs, or a guide nucleic acid can comprise a crRNA. In other examples, a guide nucleic acid can be introduced simultaneously with a trackable polynucleic acid or plasmid comprising an editing cassette and/or recorder cassette. In these cases, the guide nucleic acid can be encoded on a separate plasmid or be delivered in RNA form via delivery methods well known in the art.
  • A cassette can comprise a gene encoding a nucleic acid-guided nuclease, such as a CRISPR nuclease, functional with the chosen guide nucleic acid. A nucleic acid-guided nuclease or CRISPR nuclease gene can be provided on a separate plasmid. A nucleic acid-guided nuclease or CRISPR nuclease can be provided on the genome or episomal plasmid of a host organism to which a trackable polynucleic acid or plasmid will be introduced. In any of these examples, the nucleic acid-guided nuclease or CRISPR nuclease gene can be operably linked to a constitutive or inducible promoter. Examples of suitable constitutive and inducible promoters are well known in the art. A nucleic acid-guided nuclease or CRISPR nuclease can be provided as mRNA or polypeptide using delivery systems well known in the art. Such mRNA or polypeptide delivery systems can include, but are not limited to, nanoparticles, viral vectors, or other cell-permeable technologies.
  • A cassette can comprise a selectable or screenable marker, for example, such as that comprised within a recorder cassette. For example, the recorder cassette can comprise a barcode, such as a trackable nucleic acid sequence which can be uniquely correlated with a genetic mutation of the corresponding editing cassette, or otherwise identifiably correlated with such a genetic mutation such that sequencing the barcode will allow identification of the corresponding genetic mutation introduced by the editing cassette. In other examples, the recorder cassette can comprise a complete copy of or a fragment of a gene encoding an antibiotic resistance marker, auxotrophic marker, fluorescent protein, or other known selectable or screenable marker.
  • Trackable Plasmid Libraries
  • A trackable library can comprise a plurality of cassettes as disclosed herein. A trackable library can comprise a plurality of trackable polynucleic acids or plasmids comprising a cassette as disclosed herein. A cassette, polynucleotide, or plasmid comprising a recorder sequence or recorder cassette as disclosed herein can be referred to as a trackable cassette, polynucleotide, or plasmid. A cassette, polynucleotide, or plasmid comprising an editing sequence or editing cassette as disclosed herein can be referred to as a trackable cassette, polynucleotide, or plasmid.
  • In some cases, within the trackable library are distinct editing cassette and recorder cassette combinations that are sequenced to determine which editing sequence corresponds with a given marker or barcode sequence comprised within the recorder cassette. Therefore, when the editing and recorder sequences are incorporated into a target sequence, the edit that was incorporated can be determined by sequencing the recorder sequence. Sequencing the recorder sequence or barcode can significantly cut down on sequencing time and cost.
  • Library size can depend on the experiment design. For example, if the aim is to edit each amino acid within a protein of interest, then the library size can depend on the number (N) of amino acids in a protein of interest, with a full saturation library (all 20 amino acids at each position or non-naturally occurring amino acids) scaling as 19 (or more)×N and an alanine-mapping library scaling as 1×N. Thus, screening of even very large proteins of more than 1,000 amino acids can be tractable given current multiplex oligo synthesis capabilities (e.g. 120,000 oligos). In addition to or as an alternative to activity screens, more general properties with developed high-throughput screens and selections can be efficiently tested using the libraries disclosed herein. It should be readily understood that libraries can be designed to mutate any number of amino acids within a target protein, including 1, at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, etc. up to the total number of amino acids within a target protein. Additionally, select amino acids can be targeted, such as catalytically active amino acids, or those involved in protein-protein interactions. Each amino acid that is targeted for mutation can be mutated into any number of alternate amino acids, such as any other natural or non-naturally occurring amino acid or amino acid analog. In some examples, all targeted amino acids are mutated to the same amino acid, such as alanine. In other cases, the targeted amino acids are independently mutated to any other amino acid in any combination or permutation.
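  • The scaling described above can be worked through numerically. The following sketch computes the size of a full saturation library (19 substitutions per position) and an alanine-mapping library (1 substitution per position) and compares both with the 120,000-oligo example capacity mentioned above. The function and constant names are illustrative assumptions.

```python
OLIGO_POOL_CAPACITY = 120_000  # example synthesis capacity cited in the text


def saturation_library_size(n_residues: int,
                            substitutions_per_site: int = 19) -> int:
    """Full saturation library: each position mutated to 19 other residues."""
    return substitutions_per_site * n_residues


def alanine_library_size(n_residues: int) -> int:
    """Alanine-mapping library: one substitution per position."""
    return n_residues


def fits_in_pool(library_size: int,
                 capacity: int = OLIGO_POOL_CAPACITY) -> bool:
    """True if the library can be covered by a single oligo pool."""
    return library_size <= capacity


n = 1000  # a protein of 1,000 amino acids
full = saturation_library_size(n)       # 19 x 1000 = 19000
alanine = alanine_library_size(n)       # 1 x 1000 = 1000
largest_protein = OLIGO_POOL_CAPACITY // 19  # longest protein for full saturation
print(full, alanine, fits_in_pool(full), largest_protein)
```

  • For a 1,000-residue protein the full saturation library requires 19,000 designs, well within the example capacity; at 19 substitutions per position the same capacity would cover a protein of up to 6,315 residues.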
  • Trackable libraries can comprise trackable mutations in individual residues or sequences of interest. Trackable libraries can be generated using custom-synthesized oligonucleotide arrays. Trackable plasmids can be generated using any cloning or assembly methods known in the art. For example, CREATE-Recorder plasmids can be generated by chemical synthesis, Gibson assembly, SLIC, CPEC, PCA, ligation-free cloning, other in vitro oligo assembly techniques, traditional ligation-based cloning, or any combination thereof.
  • Recorder sequences, such as barcodes, can be designed in silico via standard code with a degenerate mutation at the target codon. The degenerate mutation can comprise 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, or more than 30 nucleic acid residues. In some examples, the degenerate mutations can comprise 15 nucleic acid residues (N15).
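  • To give a sense of the sequence space offered by an N15 degenerate barcode, the sketch below computes the number of possible sequences and a birthday-problem approximation of the chance that at least two barcodes in a library coincide. It assumes that each of the four bases is incorporated with equal probability at every degenerate position, which is an idealization; all names are hypothetical.

```python
import math
import random


def barcode_diversity(length: int = 15) -> int:
    """Number of distinct sequences of `length` fully degenerate (N) bases."""
    return 4 ** length


def collision_probability(n_barcodes: int, length: int = 15) -> float:
    """Approximate probability that two or more barcodes are identical.

    Uses the birthday approximation 1 - exp(-k(k-1) / (2M)) and assumes
    uniform, independent base incorporation (an idealizing assumption).
    """
    space = barcode_diversity(length)
    return 1.0 - math.exp(-n_barcodes * (n_barcodes - 1) / (2 * space))


def sample_degenerate_barcode(length: int = 15, rng=None) -> str:
    """Draw one barcode as degenerate synthesis would under the same assumption."""
    rng = rng or random.Random()
    return "".join(rng.choice("ACGT") for _ in range(length))


print(barcode_diversity(15))  # 4**15 = 1073741824
print(collision_probability(10_000))
```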
  • Homology arms can be added to a recorder sequence and/or editing sequence to allow incorporation of the recorder and/or editing sequence into the desired location via homologous recombination or homology-driven repair. Homology arms can be added by synthesis, in vitro assembly, PCR, or other known methods in the art. For example, homology arms can be assembled via overlapping oligo extension, Gibson assembly, or any other method disclosed herein. A homology arm can be added to both ends of a recorder and/or editing sequence, thereby flanking the sequence with two distinct homology arms, for example, a 5′ homology arm and a 3′ homology arm.
  • The same 5′ and 3′ homology arms can be added to a plurality of distinct recorder sequences, thereby generating a library of unique recorder sequences that each have the same spacer target or targeted insertion site. The same 5′ and 3′ homology arms can be added to a plurality of distinct editing sequences, thereby generating a library of unique editing sequences that each have the same spacer target or targeted insertion site. In alternative examples, different or a variety of 5′ or 3′ homology arms can be added to a plurality of recorder sequences or editing sequences.
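  • A minimal in silico sketch of adding the same 5′ and 3′ homology arms to a plurality of distinct recorder sequences is shown below. The arm and recorder sequences are made-up placeholders, and the function name is hypothetical.

```python
def add_homology_arms(inserts, five_prime_arm, three_prime_arm):
    """Flank every insert with the same 5' and 3' homology arms."""
    return [five_prime_arm + insert + three_prime_arm for insert in inserts]


# Made-up placeholder sequences, for illustration only.
FIVE_PRIME_ARM = "ATGACCGTTAAC"
THREE_PRIME_ARM = "GGCTTACGATAA"
recorder_sequences = [
    "ACGTACGTACGTACG",
    "TTGACCATGGCATCA",
    "CAGTCAGTTGACCAT",
]

library = add_homology_arms(recorder_sequences, FIVE_PRIME_ARM, THREE_PRIME_ARM)
```

  • Because every member shares the same arms, each design is directed to the same insertion site while remaining distinguishable by its internal recorder sequence. Passing a different pair of arms per insert would model the alternative in which a variety of homology arms is used.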
  • A recorder sequence library comprising flanking homology arms can be cloned into a vector backbone. In some examples, the recorder sequence and homology arms are cloned into a recorder cassette. Recorder cassettes can, in some cases, further comprise a nucleic acid sequence encoding a guide nucleic acid or gRNA engineered to target the desired site of recorder sequence insertion. In many cases, the nucleic acid sequences flanking the CRISPR/Cas-mediated cleavage site are homologous or substantially homologous to the homology arms comprised within the recorder cassette.
  • An editing sequence library comprising flanking homology arms can be cloned into a vector backbone. In some examples, the editing sequence and homology arms are cloned into an editing cassette. Editing cassettes can, in some cases, further comprise a nucleic acid sequence encoding a guide nucleic acid or gRNA engineered to target the desired site of editing sequence insertion. In many cases, the nucleic acid sequences flanking the CRISPR/Cas-mediated cleavage site are homologous or substantially homologous to the homology arms comprised within the editing cassette.
  • Gene-wide or genome-wide editing libraries can be subcloned into a vector backbone. In some cases, the vector backbone comprises a recorder cassette as disclosed herein. The editing sequence library can be inserted or assembled into a second site to generate competent trackable plasmids that can embed the recording barcode at a fixed locus while integrating the editing libraries at a wide variety of user defined sites.
  • A recorder sequence and/or cassette can be assembled or inserted into a vector backbone first, followed by insertion of an editing sequence and/or cassette. In other cases, an editing sequence and/or cassette can be inserted or assembled into a vector backbone first, followed by insertion of a recorder sequence and/or cassette. In other cases, a recorder sequence and/or cassette and an editing sequence and/or cassette are simultaneously inserted or assembled into a vector. In other cases, a recorder sequence and/or cassette and an editing sequence and/or cassette are comprised on the same cassette prior to simultaneous insertion or assembly into a vector. In other cases, a recorder sequence and/or cassette and an editing sequence and/or cassette are linked prior to simultaneous insertion or assembly into a vector. In other cases, a recorder sequence and/or cassette and an editing sequence and/or cassette are covalently linked prior to simultaneous insertion or assembly into a vector. In any of these cases, trackable plasmids or plasmid libraries can be generated.
  • A cassette or nucleic acid molecule can be synthesized which comprises one or more elements disclosed herein. For example, a nucleic acid molecule can be synthesized that comprises an editing cassette and a guide nucleic acid. A nucleic acid molecule can be synthesized that comprises an editing cassette and a recorder cassette. A nucleic acid molecule can be synthesized that comprises an editing cassette, a guide nucleic acid, and a recorder cassette. A nucleic acid molecule can be synthesized that comprises an editing cassette, a recorder cassette, and two guide nucleic acids. A nucleic acid molecule can be synthesized that comprises a recorder cassette and a guide nucleic acid. A nucleic acid molecule can be synthesized that comprises a recorder cassette. A nucleic acid molecule can be synthesized that comprises an editing cassette. In any of these cases, the guide nucleic acid can optionally be operably linked to a promoter. In any of these cases, the nucleic acid molecule can further include one or more barcodes.
  • Synthesized cassettes or synthesized nucleic acid molecules can be synthesized using any oligonucleotide synthesis method known in the art. For example, cassettes can be synthesized by array based oligonucleotide synthesis. In such examples, following synthesis of the oligonucleotides, the oligonucleotides can be cleaved from the array. Cleavage of oligonucleotides from an array can create a pool of oligonucleotides.
  • Software and automation methods can be used for multiplex synthesis and generation. For example, software and automation can be used to create 10, 10², 10³, 10⁴, 10⁵, 10⁶, or more cassettes, such as trackable cassettes. An automation method can generate trackable plasmids in rapid fashion. Trackable cassettes can be processed through a workflow with minimal steps to produce precisely defined genome-wide libraries.
  • Cassette libraries, such as trackable cassette libraries, can be generated which comprise two or more nucleic acid molecules or plasmids comprising any combination disclosed herein of recorder sequence, editing sequence, guide nucleic acid, and optional barcode, including combinations of one or more of any of the previously mentioned elements. For example, such a library can comprise at least 2, 3, 4, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 600, 700, 800, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500, 5000, 5500, 6000, 6500, 7000, 7500, 8000, 8500, 9000, 9500, 10⁴, 10⁵, 10⁶, 10⁷, 10⁸, 10⁹, 10¹⁰, or more nucleic acid molecules or plasmids of the present disclosure. It should be understood that such a library can include any number of nucleic acid molecules or plasmids, even if the specific number is not explicitly listed above.
  • Cassettes or cassette libraries can be sequenced in order to determine the recorder sequence and editing sequence pair that is comprised on each cassette. In other cases, a known recorder sequence is paired with a known editing sequence during the library generation process. Other methods of determining the association between a recorder sequence and editing sequence comprised on a common nucleic acid molecule or plasmid are envisioned such that the editing sequence can be identified by identification or sequencing of the recorder sequence.
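  • The association between recorder and editing sequences can be pictured as a lookup table built from sequencing the library and later used to decode barcodes read from edited cells. The sketch below is one hypothetical way to do this; the barcodes, edit labels, and the rule of discarding barcodes linked to more than one edit are assumptions for illustration.

```python
from collections import Counter


def build_barcode_map(plasmid_pairs):
    """Map each barcode to its editing sequence from (barcode, edit) pairs.

    Hypothetical rule: a barcode seen with more than one edit is treated
    as ambiguous and dropped from the map.
    """
    mapping = {}
    ambiguous = set()
    for barcode, edit in plasmid_pairs:
        if barcode in mapping and mapping[barcode] != edit:
            ambiguous.add(barcode)
        mapping.setdefault(barcode, edit)
    for barcode in ambiguous:
        del mapping[barcode]
    return mapping


def count_edits(observed_barcodes, mapping):
    """Infer edits from barcode reads alone; unknown barcodes are skipped."""
    return Counter(mapping[b] for b in observed_barcodes if b in mapping)


# Made-up library sequencing results and post-editing barcode reads.
pairs = [("ACGTAC", "edit_1"), ("TTGCAA", "edit_2"), ("ACGTAC", "edit_1"),
         ("GGATCC", "edit_3"), ("GGATCC", "edit_4")]
mapping = build_barcode_map(pairs)
observed = ["ACGTAC", "ACGTAC", "TTGCAA", "GGATCC", "NNNNNN"]
edit_counts = count_edits(observed, mapping)
```

  • Once the map exists, only the short recorder site needs to be sequenced in the edited population in order to tally which edits are present.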
  • Methods and compositions for tracking edited episomal libraries that are shuttled between E. coli and other organisms/cell lines are provided herein. The libraries can be comprised on plasmids, bacterial artificial chromosomes (BACs), yeast artificial chromosomes (YACs), synthetic chromosomes, or viral or phage genomes. These methods and compositions can be used to generate portable barcoded libraries in host organisms, such as E. coli. Library generation in such organisms can offer the advantage of established techniques for performing homologous recombination. Barcoded plasmid libraries can be deep-sequenced at one site to track mutational diversity targeted across the remaining portions of the plasmid, allowing dramatic improvements in the depth of library coverage (e.g., FIG. 3A).
  • Trackable Engineering Methods
  • An example of a trackable engineering workflow is depicted in FIG. 3A. Each plasmid can encode a recorder cassette designed to edit a site in the target DNA (e.g., FIG. 3A, black cassette). Sites to be targeted can be functionally neutral sites, or they can be a screenable or selectable marker gene. The homology arm (HA) of the recorder cassette can contain a recorder sequence (e.g., FIG. 3B) that is inserted into the recording site during recombineering. Recombineering can comprise DNA cleavage, such as nucleic acid-guided nuclease-mediated DNA cleavage, and repair via homologous recombination. The recorder sequence can comprise a barcode, unique DNA sequence, or a complete copy or fragment of a screenable or selectable marker. In some examples, the recorder sequence is 15 nucleotides. The recorder sequence can comprise less than 10, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 88, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, or more than 200 nucleotides.
  • Through a multiplexed cloning approach, the recorder cassette can be covalently coupled to at least one editing cassette in a plasmid (e.g., FIG. 3A, green cassette) to generate trackable plasmid libraries that have a unique recorder and editing cassette combination. This trackable library can be sequenced to generate the recorder/edit mapping and used to track editing libraries across large segments of the target DNA (e.g., FIG. 3C). Recorder and editing sequences can be comprised on the same polynucleotide, in which case they are both incorporated into the target nucleic acid sequence, such as a genome or plasmid, by the same recombination event. In other examples, the recorder and editing sequences can be comprised on separate cassettes within the same trackable plasmid, in which case the recorder and editing sequences are incorporated into the target nucleic acid sequence by separate recombination events, either simultaneously or sequentially.
  • Methods are provided herein for combining multiplex oligonucleotide synthesis with recombineering, to create libraries of specifically designed and trackable mutations. Screens and/or selections followed by high-throughput sequencing and/or barcode microarray methods can allow for rapid mapping of mutations leading to a phenotype of interest.
  • Methods and compositions disclosed herein can be used to simultaneously engineer and track engineering events in a target nucleic acid sequence.
  • Trackable plasmids can be generated using in vitro assembly or cloning techniques. For example, the CREATE-Recorder plasmids can be generated using chemical synthesis, Gibson assembly, SLIC, CPEC, PCA, ligation-free cloning, other in vitro oligo assembly techniques, traditional ligation-based cloning, or any combination thereof.
  • Trackable plasmids can comprise at least one recording sequence, such as a barcode, and at least one editing sequence. In most cases, the recording sequence is used to record and track engineering events. Each editing sequence can be used to incorporate a desired edit into a target nucleic acid sequence. The desired edit can include insertion, deletion, substitution, or alteration of the target nucleic acid sequence. In some examples, the one or more recording sequence and editing sequences are comprised on a single cassette comprised within the trackable plasmid such that they are incorporated into the target nucleic acid sequence by the same engineering event. In other examples, the recording and editing sequences are comprised on separate cassettes within the trackable plasmid such that they are each incorporated into the target nucleic acid by distinct engineering events. In some examples, the trackable plasmid comprises two or more editing sequences. For example, one editing sequence can be used to alter or silence a PAM sequence while a second editing sequence can be used to incorporate a mutation into a distinct sequence.
  • Recorder sequences can be inserted into a site separated from the editing sequence insertion site. The inserted recorder sequence can be separated from the editing sequence by 1 bp or any number of base pairs. For example, the separation distance can be about 1 bp, 10 bp, 50 bp, 100 bp, 500 bp, 1 kb, 2 kb, 5 kb, 10 kb, or greater. The separation distance can be any discrete integer of base pairs. It should be readily understood that the number of base pairs separating the two insertion sites can be limited by the size of the genome, chromosome, or polynucleotide into which the insertions are being made. In some examples, the maximum distance of separation depends on the size of the target nucleic acid or genome.
  • Recorder sequences can be inserted adjacent to editing sequences, or within proximity to the editing sequence. For example, the recorder sequence can be inserted outside of the open reading frame within which the editing sequence is inserted. Recorder sequence can be inserted into an untranslated region adjacent to an open reading frame within which an editing sequence has been inserted. The recorder sequence can be inserted into a functionally neutral or non-functional site. The recorder sequence can be inserted into a screenable or selectable marker gene.
  • In some examples, the target nucleic acid sequence is comprised within a genome, artificial chromosome, synthetic chromosome, or episomal plasmid. In various examples, the target nucleic acid sequence can be in vitro or in vivo. When the target nucleic acid sequence is in vivo, the CREATE-Recorder plasmid can be introduced into the host organisms by transformation, transfection, conjugation, biolistics, nanoparticles, cell-permeable technologies, or other known methods for DNA delivery, or any combination thereof. In such examples, the host organism can be a eukaryote, prokaryote, bacterium, archaea, yeast, or other fungi.
  • The engineering event can comprise recombineering, non-homologous end joining, homologous recombination, or homology-driven repair. In some examples, the engineering event is performed in vitro or in vivo.
  • The methods described herein can be carried out in any type of cell in which a nucleic acid-guided nuclease system can function (e.g., target and cleave DNA), including prokaryotic and eukaryotic cells or in vitro. In some embodiments the cell is a bacterial cell, such as Escherichia spp. (e.g., E. coli). In other embodiments, the cell is a fungal cell, such as a yeast cell, e.g., Saccharomyces spp. In other embodiments, the cell is an algal cell, a plant cell, an insect cell, or a mammalian cell, including a human cell.
  • In some examples, a cell is a recombinant organism. For example, the cell can comprise a non-native nucleic acid-guided nuclease system. Additionally or alternatively, the cell can comprise recombination system machinery. Such recombination systems can include lambda red recombination system, Cre/Lox, attB/attP, or other integrase systems. Where appropriate, the trackable plasmid can have the complementary components or machinery required for the selected recombination system to work correctly and efficiently.
  • A method for genome editing can comprise: (a) introducing a vector that encodes at least one editing cassette and at least one guide nucleic acid into a first population of cells, thereby producing a second population of cells comprising the vector; (b) maintaining the second population of cells under conditions in which a nucleic acid-guided nuclease is expressed or maintained, wherein the nucleic acid-guided nuclease is encoded on the vector, a second vector, on the genome of cells of the second population of cells, or otherwise introduced into the cell, resulting in DNA cleavage and incorporation of the editing cassette; (c) obtaining viable cells. Such a method can optionally further comprise (d) sequencing the target DNA molecule in at least one cell of the second population of cells to identify the mutation of at least one codon.
  • A method for genome editing can comprise: (a) introducing a vector that encodes at least one editing cassette comprising a PAM mutation as disclosed herein and at least one guide nucleic acid into a first population of cells, thereby producing a second population of cells comprising the vector; (b) maintaining the second population of cells under conditions in which nucleic acid-guided nuclease is expressed or maintained, wherein the nucleic acid-guided nuclease is encoded on the vector, a second vector, on the genome of cells of the second population of cells, or otherwise introduced into the cell, resulting in DNA cleavage, incorporation of the editing cassette, and death of cells of the second population of cells that do not comprise the PAM mutation, whereas cells of the second population of cells that comprise the PAM mutation are viable; (c) obtaining viable cells. Such a method can optionally further comprise (d) sequencing the target DNA in at least one cell of the second population of cells to identify the mutation of at least one codon.
  • Method for trackable genome editing can comprise: (a) introducing a vector that encodes at least one editing cassette, at least one recorder cassette, and at least two gRNA into a first population of cells, thereby producing a second population of cells comprising the vector; (b) maintaining the second population of cells under conditions in which a nucleic acid-guided nuclease is expressed or maintained, wherein the nucleic acid-guided nuclease is encoded on the vector, a second vector, on the genome of cells of the second population of cells, or otherwise introduced into the cell, resulting in DNA cleavage and incorporation of the editing and recorder cassettes; (c) obtaining viable cells. Such a method can optionally further comprise (d) sequencing the recorder sequence of the target DNA molecule in at least one cell of the second population of cells to identify the mutation of at least one codon.
  • In some examples where the trackable plasmid comprises an editing cassette designed to silence a PAM site, a method for trackable genome editing can comprise: (a) introducing a vector that encodes at least one editing cassette, a recorder cassette, and at least two gRNA into a first population of cells, thereby producing a second population of cells comprising the vector; (b) maintaining the second population of cells under conditions in which a nucleic acid-guided nuclease is expressed or maintained, wherein the nucleic acid-guided nuclease is encoded on the vector, a second vector, on the genome of cells of the second population of cells, or otherwise introduced into the cell, resulting in DNA cleavage, incorporation of the editing cassette and recorder cassette, and death of cells of the second population of cells that do not comprise the PAM mutation, whereas cells of the second population of cells that comprise the PAM mutation are viable; and (c) obtaining viable cells. Such a method can optionally further comprise (d) sequencing the recorder sequence of the target DNA in at least one cell of the second population of cells to identify the mutation of at least one codon. Such methods can also further comprise a recorder cassette comprising a second PAM mutation, such that both PAMs must be silences by the editing cassette PAM mutation and recorder cassette PAM mutation in order to escape cell death.
  • In some examples, transformation efficiency is determined by using a non-targeting guide nucleic acid control, which allows for validation of the recombineering procedure and CFU/ng calculations. In some cases, absolute efficiency is obtained by counting the total number of colonies on each transformation plate, for example, by counting both red and white colonies from a galK control. In some examples, relative efficiency is calculated as the total number of successful transformants (for example, white colonies) out of all colonies from a control (for example, galK control).
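  • The efficiency calculations described above can be sketched as follows. This is a minimal illustration, not a protocol from the disclosure: the function names, the `plated_fraction` parameter, and the example colony counts are assumptions introduced here.

```python
def cfu_per_ng(colonies, dna_ng, plated_fraction=1.0):
    """Colony-forming units per ng of DNA.

    plated_fraction is a hypothetical parameter: the fraction of the
    transformation that was actually spread on the counted plate.
    """
    if dna_ng <= 0 or not 0 < plated_fraction <= 1:
        raise ValueError("dna_ng must be > 0 and plated_fraction in (0, 1]")
    return colonies / (dna_ng * plated_fraction)


def absolute_efficiency(red_colonies, white_colonies):
    """Total colonies on the plate (both red and white)."""
    return red_colonies + white_colonies


def relative_efficiency(red_colonies, white_colonies):
    """Successful transformants (white) as a fraction of all colonies."""
    total = red_colonies + white_colonies
    if total == 0:
        raise ValueError("no colonies counted")
    return white_colonies / total
```

For a hypothetical galK control plate with 10 red and 90 white colonies, the absolute count is 100 colonies and the relative efficiency is 0.9.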
  • The methods of the disclosure can provide, for example, greater than 1000× improvements in the efficiency, scale, cost, and/or precision of generating a combinatorial library.
  • The methods of the disclosure can provide, for example, greater than: 10×, 50×, 100×, 200×, 300×, 400×, 500×, 600×, 700×, 800×, 900×, 1000×, 1100×, 1200×, 1300×, 1400×, 1500×, 1600×, 1700×, 1800×, 1900×, 2000×, or greater improvements in the efficiency of generating genomic or combinatorial libraries.
  • The methods of the disclosure can provide, for example, greater than: 10×, 50×, 100×, 200×, 300×, 400×, 500×, 600×, 700×, 800×, 900×, 1000×, 1100×, 1200×, 1300×, 1400×, 1500×, 1600×, 1700×, 1800×, 1900×, 2000×, or greater improvements in the scale of generating genomic or combinatorial libraries.
  • The methods of the disclosure can provide, for example, greater than: 10×, 50×, 100×, 200×, 300×, 400×, 500×, 600×, 700×, 800×, 900×, 1000×, 1100×, 1200×, 1300×, 1400×, 1500×, 1600×, 1700×, 1800×, 1900×, 2000×, or greater decrease in the cost of generating genomic or combinatorial libraries.
  • The methods of the disclosure can provide, for example, greater than: 10×, 50×, 100×, 200×, 300×, 400×, 500×, 600×, 700×, 800×, 900×, 1000×, 1100×, 1200×, 1300×, 1400×, 1500×, 1600×, 1700×, 1800×, 1900×, 2000×, or greater improvements in the precision of genomic or combinatorial library generation.
  • Recursive Tracking for Combinatorial Engineering
  • Disclosed herein are methods and compositions for iterative rounds of engineering. Disclosed herein are recursive engineering strategies that allow implementation of trackable engineering at the single cell level through several serial engineering cycles (e.g., FIG. 3D or FIG. 6). These disclosed methods and compositions can enable search-based technologies that can effectively construct and explore complex genotypic space. The terms recursive and iterative can be used interchangeably.
  • Combinatorial engineering methods can comprise multiple rounds of engineering. Methods disclosed herein can comprise 2 or more rounds of engineering. For example, a method can comprise 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 20, 25, 30, or more than 30 rounds of engineering.
  • In some examples, during each round of engineering a new recorder sequence, such as a barcode, is incorporated at the same or nearby locus in a target site (e.g., FIG. 3D, green bars or FIG. 6, black bars) such that following multiple engineering cycles to construct combinatorial diversity throughout the genome (e.g., FIG. 3E, green bars or FIG. 6, grey bars) a PCR, or similar reaction, of the recording locus can be used to reconstruct each combinatorial genotype or to confirm that the engineered edit from each round has been incorporated into the target site.
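  • As an illustration of reconstructing a combinatorial genotype from the recording locus, the sketch below decodes a sequenced recorder array into one edit per round and tallies genotypes across reads. It assumes, for simplicity, that barcodes have a fixed length and sit directly adjacent to one another in round order; the barcode-to-edit tables and edit names are hypothetical.

```python
from collections import Counter


def decode_recorder_array(read, barcode_maps, barcode_len):
    """Split a recorder-locus read into per-round barcodes and look each up.

    barcode_maps[i] maps round-i barcodes to edit names (hypothetical data).
    An unrecognized barcode decodes to None.
    """
    genotype = []
    for round_index, round_map in enumerate(barcode_maps):
        start = round_index * barcode_len
        barcode = read[start:start + barcode_len]
        genotype.append(round_map.get(barcode))
    return genotype


def count_genotypes(reads, barcode_maps, barcode_len):
    """Tally how often each combinatorial genotype appears in a set of reads."""
    return Counter(
        tuple(decode_recorder_array(read, barcode_maps, barcode_len))
        for read in reads
    )
```

Counts of this kind are the input for the quantitation of combinatorial genotypes mentioned in this section.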
  • Disclosed herein are methods for selecting for successive rounds of engineering. Selection can occur by a PAM mutation incorporated by an editing cassette. Selection can occur by a PAM mutation incorporated by a recorder cassette. Selection can occur using a screenable, selectable, or counter-selectable marker. Selection can occur by targeting a site for editing or recording that was incorporated by a prior round of engineering, thereby selecting for variants that successfully incorporated edits and recorder sequences from both rounds or all prior rounds of engineering.
  • Quantitation of these genotypes can be used for understanding combinatorial mutational effects on large populations and investigation of important biological phenomena such as epistasis.
  • Serial editing and combinatorial tracking can be implemented using recursive vector systems as disclosed herein. These recursive vector systems can be used to move rapidly through the transformation procedure (e.g., FIG. 7A). In some examples, these systems consist of two or more plasmids containing orthogonal replication origins, antibiotic markers, and gRNAs. The gRNA in each vector can be designed to target one of the other resistance markers for destruction by nucleic acid-guided nuclease-mediated cleavage. These systems can be used, in some examples, to perform transformations in which the antibiotic selection pressure is switched to remove the previous plasmid and drive enrichment of the next round of engineered genomes. Two or more passages through the transformation loop can be performed, or in other words, multiple rounds of engineering can be performed. Introducing the requisite recording cassettes and editing cassettes into recursive vectors as disclosed herein can be used for simultaneous genome editing and plasmid curing in each transformation step with high efficiencies.
  • In some examples, the recursive vector system disclosed herein comprises 2, 3, 4, 5, 6, 7, 8, 9, 10, or more than 10 unique plasmids. In some examples, the recursive vector system can use a particular plasmid more than once as long as a distinct plasmid is used in the previous round and in the subsequent round.
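  • The plasmid-cycling logic of a recursive vector system can be sketched as a simple consistency check. This is an illustrative model only: the `Plasmid` class, its field names, and the marker labels are assumptions. It enforces that each incoming plasmid differs from the resident one and that its gRNA targets the resident plasmid's marker, so that switching the antibiotic selection cures the previous plasmid.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plasmid:
    name: str
    marker: str        # antibiotic marker carried by this plasmid
    grna_target: str   # marker that this plasmid's gRNA targets for cleavage


def selection_schedule(plasmids_in_order):
    """Return the antibiotic selection to apply in each round.

    Raises ValueError if a round reuses the resident plasmid or if the
    incoming gRNA cannot cure the plasmid from the previous round.
    """
    resident = None
    schedule = []
    for plasmid in plasmids_in_order:
        if resident is not None:
            if plasmid.name == resident.name:
                raise ValueError("consecutive rounds must use distinct plasmids")
            if plasmid.grna_target != resident.marker:
                raise ValueError("incoming gRNA does not target the resident marker")
        schedule.append(plasmid.marker)
        resident = plasmid
    return schedule
```

With two plasmids whose gRNAs target each other's markers, the same pair can be alternated for any number of rounds.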
  • Recursive methods and compositions disclosed herein can be used to restore function to a selectable or screenable element in a targeted genome or plasmid. The selectable or screenable element can include an antibiotic resistance gene, a fluorescent gene, a unique DNA sequence or watermark, or other known reporter, screenable, or selectable gene. In some examples, each successive round of engineering can incorporate a fragment of the selectable or screenable element, such that at the end of the engineering rounds, the entire selectable or screenable element has been incorporated into the target genome or plasmid. In such examples, only those genomes or plasmids that have successfully incorporated all of the fragments, and therefore all of the desired corresponding mutations, can be selected or screened for. In this way, the selected or screened cells will be enriched for those that have incorporated the edits from each and every iterative round of engineering.
  • Recursive methods can be used to switch a selectable or screenable marker between an on and an off position, or between an off and an on position, with each successive round of engineering. Using such a method allows conservation of available selectable or screenable markers by requiring, for example, the use of only one screenable or selectable marker. Furthermore, short regulatory sequences, start codons, or non-start codons can be used to turn the screenable or selectable marker on and off. Such short sequences can easily fit within a cassette or polynucleotide, such as a synthesized cassette.
  • One or more rounds of engineering can be performed using the methods and compositions disclosed herein. In some examples, each round of engineering is used to incorporate an edit unique from that of previous rounds. Each round of engineering can incorporate a unique recording sequence. Each round of engineering can result in removal or curing of the CREATE plasmid used in the previous round of engineering. In some examples, successful incorporation of the recording sequence of each round of engineering results in a complete and functional screenable or selectable marker or unique sequence combination.
  • Unique recorder cassettes comprising recording sequences such as barcodes or screenable or selectable markers can be inserted with each round of engineering, thereby generating a recorder sequence that is indicative of the combination of edits or engineering steps performed. Successive recording sequences can be inserted adjacent to one another. Successive recording sequences can be inserted within proximity to one another. Successive sequences can be inserted at a distance from one another.
  • For example, successive recorder sequences can be inserted and separated by 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, or greater than 100 bp. In some examples, successive recorder sequences are separated by about 10, 50, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1100, 1200, 1300, 1400, 1500, or greater than 1500 bp.
  • Successive recorder sequences can be separated by any desired number of base pairs, and the separation can depend on and be limited by the number of successive recorder sequences to be inserted, the size of the target nucleic acid or target genome, and/or the design of the desired final recorder sequence. For example, if the compiled recorder sequence is a functional screenable or selectable marker, then the successive recording sequences can be inserted within proximity to, and within the same reading frame as, one another. If the compiled recorder sequence is a unique set of barcodes to be identified by sequencing and has no coding sequence element, then the successive recorder sequences can be inserted with any desired number of base pairs separating them. In these cases, the separation distance can be dependent on the sequencing technology to be used and the read length limit.
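  • The dependence of the separation distance on read length can be made concrete with a small calculation. The sketch below assumes equal-length barcodes separated by a constant spacer, which is a simplification introduced here; the function names and the example numbers are hypothetical.

```python
def recorder_array_span(n_rounds, barcode_len, spacer_len):
    """Length in bp from the first base of the first barcode to the last base
    of the last barcode, for n_rounds barcodes with a constant spacer."""
    if n_rounds < 1 or barcode_len < 1 or spacer_len < 0:
        raise ValueError("need n_rounds >= 1, barcode_len >= 1, spacer_len >= 0")
    return n_rounds * barcode_len + (n_rounds - 1) * spacer_len


def fits_in_single_read(n_rounds, barcode_len, spacer_len, read_len):
    """True if the whole recorder array can be covered by one read."""
    return recorder_array_span(n_rounds, barcode_len, spacer_len) <= read_len
```

For instance, three 15 bp barcodes separated by 10 bp spacers span 65 bp, which would fit in a 150 bp read but not in a 50 bp read.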
  • In some examples, a recorder cassette comprises a landing site to be used as a target site for the recorder cassette of the next round of engineering. By using such a method, successive rounds of recorder cassettes can only be introduced into the target site if the recorder cassette from the previous round was successfully incorporated, thereby providing the target site for the present engineering round (e.g., FIG. 28).
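  • The landing-site dependency can be illustrated with a toy string model. This is a sketch under stated assumptions: the locus is a plain string, each recorder cassette replaces its target site with a barcode followed by the next landing site, and all sequences shown in the usage are invented for illustration.

```python
def apply_recorder(locus, target_site, barcode, next_landing_site):
    """Attempt one round of recording.

    Returns (new_locus, incorporated). The cassette is incorporated only if
    its target site is present in the locus; otherwise the locus is unchanged.
    """
    if target_site not in locus:
        return locus, False
    # Assumption: the target site is replaced by the barcode plus the landing
    # site that the next round's recorder cassette will target.
    new_locus = locus.replace(target_site, barcode + next_landing_site, 1)
    return new_locus, True
```

A round-two cassette applied to a locus that never received the round-one cassette finds no landing site and is not incorporated, which is the selection behavior described above.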
  • Guide Nucleic Acid
  • A guide nucleic acid can complex with a compatible nucleic acid-guided nuclease and can hybridize with a target sequence, thereby directing the nuclease to the target sequence. A subject nucleic acid-guided nuclease capable of complexing with a guide nucleic acid can be referred to as a nucleic acid-guided nuclease that is compatible with the guide nucleic acid. Likewise, a guide nucleic acid capable of complexing with a nucleic acid-guided nuclease can be referred to as a guide nucleic acid that is compatible with the nucleic acid-guided nuclease.
  • A guide nucleic acid can be DNA. A guide nucleic acid can be RNA. A guide nucleic acid can comprise both DNA and RNA. A guide nucleic acid can comprise modified or non-naturally occurring nucleotides. In cases where the guide nucleic acid comprises RNA, the RNA guide nucleic acid can be encoded by a DNA sequence on a polynucleotide molecule such as a plasmid, linear construct, or editing cassette as disclosed herein.
  • A guide nucleic acid can comprise a guide sequence. A guide sequence is a polynucleotide sequence having sufficient complementarity with a target polynucleotide sequence to hybridize with the target sequence and direct sequence-specific binding of a complexed nucleic acid-guided nuclease to the target sequence. The degree of complementarity between a guide sequence and its corresponding target sequence, when optimally aligned using a suitable alignment algorithm, is about or more than about 50%, 60%, 75%, 80%, 85%, 90%, 95%, 97.5%, 99%, or more. Optimal alignment may be determined with the use of any suitable algorithm for aligning sequences. In some embodiments, a guide sequence is about or more than about 5, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, 75, or more nucleotides in length. In some embodiments, a guide sequence is less than about 75, 50, 45, 40, 35, 30, 25, 20 nucleotides in length. Preferably the guide sequence is 10-30 nucleotides long. The guide sequence can be 15-20 nucleotides in length. The guide sequence can be 15 nucleotides in length. The guide sequence can be 16 nucleotides in length. The guide sequence can be 17 nucleotides in length. The guide sequence can be 18 nucleotides in length. The guide sequence can be 19 nucleotides in length. The guide sequence can be 20 nucleotides in length.
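  • A degree of complementarity can be computed as in the sketch below. It is deliberately simplified: it assumes an ungapped, full-length antiparallel pairing between two equal-length sequences written 5′ to 3′, counts only Watson-Crick pairs (including A-U), and does not implement the optimal alignment algorithms referred to above. The function name is an assumption.

```python
# Watson-Crick pairs, allowing U in an RNA guide to pair with A.
_PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G"), ("A", "U"), ("U", "A")}


def percent_complementarity(guide, target):
    """Percent of guide positions that base-pair with the target strand.

    Both sequences are given 5'->3' and must have equal, nonzero length;
    the target is read in reverse so the strands pair antiparallel.
    """
    if not guide or len(guide) != len(target):
        raise ValueError("sequences must be non-empty and of equal length")
    guide, target = guide.upper(), target.upper()
    paired = sum((g, t) in _PAIRS for g, t in zip(guide, reversed(target)))
    return 100.0 * paired / len(guide)
```

Under this simplification, a 20-nucleotide guide with two mismatches to its target would score 90%.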
  • A guide nucleic acid can comprise a scaffold sequence. In general, a “scaffold sequence” includes any sequence that has sufficient sequence to promote formation of a targetable nuclease complex, wherein the targetable nuclease complex comprises a nucleic acid-guided nuclease and a guide nucleic acid comprising a scaffold sequence and a guide sequence. Sufficient sequence within the scaffold sequence to promote formation of a targetable nuclease complex may include a degree of complementarity along the length of two sequence regions within the scaffold sequence, such as one or two sequence regions involved in forming a secondary structure. In some cases, the one or two sequence regions are comprised or encoded on the same polynucleotide. In some cases, the one or two sequence regions are comprised or encoded on separate polynucleotides. Optimal alignment may be determined by any suitable alignment algorithm, and may further account for secondary structures, such as self-complementarity within either the one or two sequence regions. In some embodiments, the degree of complementarity between the one or two sequence regions along the length of the shorter of the two when optimally aligned is about or more than about 25%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, 97.5%, 99%, or higher. In some embodiments, at least one of the two sequence regions is about or more than about 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 40, 50, or more nucleotides in length.
  • A scaffold sequence of a subject guide nucleic acid can comprise a secondary structure. A secondary structure can comprise a pseudoknot region. In some examples, the compatibility of a guide nucleic acid and nucleic acid-guided nuclease is at least partially determined by sequence within or adjacent to a pseudoknot region of the guide RNA. In some cases, binding kinetics of a guide nucleic acid to a nucleic acid-guided nuclease is determined in part by secondary structures within the scaffold sequence. In some cases, binding kinetics of a guide nucleic acid to a nucleic acid-guided nuclease is determined in part by nucleic acid sequence within the scaffold sequence.
  • In aspects of the invention, the term “guide nucleic acid” refers to a polynucleotide comprising 1) a guide sequence capable of hybridizing to a target sequence and 2) a scaffold sequence capable of interacting with or complexing with a nucleic acid-guided nuclease as described herein.
  • A guide nucleic acid can be compatible with a nucleic acid-guided nuclease when the two elements can form a functional targetable nuclease complex capable of cleaving a target sequence. Often, a compatible scaffold sequence for a compatible guide nucleic acid can be found by scanning sequences adjacent to a native nucleic acid-guided nuclease locus. In other words, native nucleic acid-guided nucleases can be encoded on a genome within proximity to a corresponding compatible guide nucleic acid or scaffold sequence.
  • Nucleic acid-guided nucleases can be compatible with guide nucleic acids that are not found within the nuclease's endogenous host. Such orthogonal guide nucleic acids can be determined by empirical testing. Orthogonal guide nucleic acids can come from different bacterial species or be synthetic or otherwise engineered to be non-naturally occurring.
  • Orthogonal guide nucleic acids that are compatible with a common nucleic acid-guided nuclease can comprise one or more common features. Common features can include sequence outside a pseudoknot region. Common features can include a pseudoknot region. Common features can include a primary sequence or secondary structure.
  • A guide nucleic acid can be engineered to target a desired target sequence by altering the guide sequence such that the guide sequence is complementary to the target sequence, thereby allowing hybridization between the guide sequence and the target sequence. A guide nucleic acid with an engineered guide sequence can be referred to as an engineered guide nucleic acid. Engineered guide nucleic acids are often non-naturally occurring and are not found in nature.
  • More Methods
  • Disclosed herein are methods for genome engineering that employ a nuclease, such as a nucleic acid-guided nuclease to perform directed genome evolution/produce changes (deletions, substitutions, additions) in a target sequence, such as DNA or RNA, for example, genomic DNA or episomal DNA. Suitable nucleases can include, for example, RNA-guided nucleases such as Cas9, Cpf1, MAD2, or MAD7, DNA-guided nucleases such as Argonaute, or other nucleases such as zinc-finger nucleases, TALENs, or meganucleases. Nuclease genes can be obtained from any source, such as from a bacterium, archaea, prokaryote, eukaryote, or virus. For example, a Cas9 gene can be obtained from a bacterium harboring the corresponding Type II CRISPR system, such as the bacterium S. pyogenes (SEQ ID NO: 110). The nucleic acid sequence and/or amino acid sequence of the nuclease may be mutated, relative to the sequence of a naturally occurring nuclease. A mutation can be, for example, one or more insertions, deletions, substitutions or any combination of two or three of the foregoing. In some cases, the resulting mutated nuclease can have enhanced or reduced nuclease activity relative to the naturally occurring nuclease. In some cases, the resulting mutated nuclease can have no nuclease activity relative to the naturally occurring nuclease.
  • Methods for nucleic acid-guided nuclease-mediated genome editing are provided herein. Some disclosed methods can include a two-stage construction process which relies on generation of cassette libraries that incorporate directed mutations from editing cassettes directly into a genome, episomal nucleic acid molecule, or isolated nucleic acid molecule. In some examples, during the first stage of cassette library construction, rationally designed editing cassettes can be cotransformed into cells with a guide nucleic acid (e.g., guide RNA) that hybridizes to or targets a target DNA sequence. In some examples, the guide nucleic acid is introduced as an RNA molecule, or encoded on a DNA molecule.
  • Editing cassettes can be designed such that they couple deletion or mutation of a PAM site with the mutation of one or more desired codons or nucleic acid residues in the adjacent nucleic acid sequence. The deleted or mutated PAM site, in some cases, can no longer be recognized by the chosen nucleic acid-guided nuclease. In some examples, at least one PAM or more than one PAM can be deleted or mutated, such as two, three, four, or more PAMs.
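  • The effect of a PAM mutation can be sketched as a pattern check. The example assumes the canonical NGG PAM of S. pyogenes Cas9 as the default; the disclosure itself is not limited to any particular PAM, and other nucleases recognize different motifs. Only the wildcard `N` is handled, and the function names are assumptions.

```python
def pam_is_recognized(pam_site, pam_pattern="NGG"):
    """True if pam_site matches pam_pattern, where 'N' matches any base."""
    pam_site, pam_pattern = pam_site.upper(), pam_pattern.upper()
    if len(pam_site) != len(pam_pattern):
        return False
    return all(p == "N" or p == b for p, b in zip(pam_pattern, pam_site))


def escapes_cleavage(edited_pam_site, pam_pattern="NGG"):
    """A cell whose PAM no longer matches the pattern is not re-cut at that site."""
    return not pam_is_recognized(edited_pam_site, pam_pattern)
```

For example, a genomic `TGG` site is recognized under the default pattern, whereas an edited `TGA` site is not, so a cell carrying that edit would escape cleavage at the site.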
  • Methods disclosed herein can enable generation of an entire cassette library in a single transformation. The cassette library can be retrieved, in some cases, by amplification of the recombinant chromosomes, e.g. by a PCR reaction, using a synthetic feature or priming site from the editing cassettes. In some examples, a second PAM deletion or mutation is simultaneously incorporated. This approach can covalently couple the codon-targeted mutations directly to a PAM deletion.
  • In some examples, there is a second stage to construction of cassette libraries. During the second stage, the PCR-amplified cassette libraries carrying the destination PAM deletion/mutation and the targeted mutations, such as a desired mutation of one or more nucleotides, such as one or more nucleotides in one or more codons, can be co-transformed into naive cells. The cells can be eukaryotic cells, archaeal cells, or prokaryotic cells. The cassette libraries can be co-transformed with a guide nucleic acid or plasmid encoding the same to generate a population of cells that express a rationally designed protein library. The libraries can be co-transformed with a guide nucleic acid such as a gRNA, chimeric gRNA, split gRNA, or a crRNA and trRNA set. The cassette library can comprise a plurality of cassettes wherein each cassette comprises an editing cassette and a guide nucleic acid. The cassette library can comprise a plurality of cassettes wherein each cassette comprises an editing cassette, a recorder cassette, and two guide nucleic acids.
  • In some targetable nuclease systems, the guide nucleic acid can guide selection of a target sequence. As used herein, a target sequence refers to any locus in vitro or in vivo, or in the nucleic acid of a cell or population of cells, in which a mutation of at least one nucleotide, such as a mutation of at least one nucleotide in at least one codon, is desired. The target sequence can be, for example, a genomic locus, target genomic sequence, or extrachromosomal locus. The guide nucleic acid can be expressed as a DNA molecule, referred to as a guide DNA, or as an RNA molecule, referred to as a guide RNA. A guide nucleic acid can comprise a guide sequence that is complementary to a region of the target region. A guide nucleic acid can comprise a scaffold sequence that can interact with a compatible nucleic acid-guided nuclease, and can optionally form a secondary structure. A guide nucleic acid can function to recruit a nucleic acid-guided nuclease to the target site. A guide sequence can be complementary to a region upstream of the target site. A guide sequence can be complementary to at least a portion of the target site. A guide sequence can be completely complementary (100% complementary) to the target site or include one or more mismatches, provided that it is sufficiently complementary to the target site to specifically hybridize/guide and recruit the nuclease. Suitable nucleic acid-guided nucleases include, as non-limiting examples, CRISPR nucleases, Cas nucleases, such as Cas9 or Cpf1, MAD2, and MAD7.
  • In some CRISPR systems, the CRISPR RNA (crRNA or spacer-containing RNA) and trans-activating CRISPR RNA (tracrRNA or trRNA) can guide selection of a target sequence. As used herein, a target sequence refers to any locus in vitro or in vivo, or in the nucleic acid of a cell or population of cells, in which a mutation of at least one nucleotide, such as a mutation of at least one nucleotide in at least one codon, is desired. The target sequence can be, for example, a genomic locus, target genomic sequence, or extrachromosomal locus. The tracrRNA and crRNA can be expressed as a single, chimeric RNA molecule, referred to as a single-guide RNA, guide RNA, or gRNA. The nucleic acid sequence of the gRNA comprises a first nucleic acid sequence, also referred to as a first region, that is complementary to a region of the target region and a second nucleic acid sequence, also referred to as a second region, that forms a stem loop structure and functions to recruit a CRISPR nuclease to the target region. The first region of the gRNA can be complementary to a region upstream of the target genomic sequence. The first region of the gRNA can be complementary to at least a portion of the target region. The first region of the gRNA can be completely complementary (100% complementary) to the target genomic sequence or include one or more mismatches, provided that it is sufficiently complementary to the target genomic sequence to specifically hybridize/guide and recruit a CRISPR nuclease, such as Cas9 or Cpf1.
  • A guide sequence or first region of the gRNA can be at least 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or at least 30 nucleotides in length. The guide sequence or first region of the gRNA can be at least 20 nucleotides in length.
  • A stem loop structure that can be formed by the scaffold sequence or second nucleic acid sequence of a gRNA can be at least 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, or 100 nucleotides in length. A stem loop structure can be from 80 to 90 or 82 to 85 nucleotides in length. A scaffold sequence or second region of the gRNA that forms a stem loop structure can be 83 nucleotides in length.
  • A guide nucleic acid of a cassette that is introduced into a first cell using the methods disclosed herein can be the same as the guide nucleic acid of a second cassette that is introduced into a second cell. More than one guide nucleic acid can be introduced into the population of first cells and/or the population of second cells. The more than one guide nucleic acids can comprise guide sequences that are complementary to more than one target region.
  • Methods disclosed herein can comprise using oligonucleotides. Such oligonucleotides can be obtained or derived from many sources. For example, an oligonucleotide can be derived from a nucleic acid library that has been diversified by nonhomologous random recombination (NRR); such a library is referred to as an NRR library. An oligonucleotide can be synthesized, for example by array-based synthesis or other known chemical synthesis method. The length of an oligonucleotide can be dependent on the method used in obtaining the oligonucleotide. An oligonucleotide can be approximately 50-200 nucleotides, 75-150 nucleotides, or between 80-120 nucleotides in length. An oligonucleotide can be about 10, 20, 30, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, or more nucleotides in length, including any integer, for example, 51, 52, 53, 54, 201, 202, etc. An oligonucleotide can be about 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1250, 1500, 1750, 2000, or more nucleotides in length, including any integer, for example, 101, 203, 1001, 2001, 2010, etc.
  • Oligonucleotides and/or other nucleic acid molecules can be combined or assembled to generate a cassette. Such a cassette can comprise (a) a region that is homologous to a target region of the nucleic acid of the cell and includes a desired mutation of at least one nucleotide or one codon relative to the target region, and (b) a protospacer adjacent motif (PAM) mutation. The PAM mutation can be any insertion, deletion or substitution of one or more nucleotides that mutates the sequence of the PAM such that it is no longer recognized by a nucleic acid-guided nuclease system or CRISPR nuclease system. A cell that comprises such a PAM mutation may be said to be “immune” to nuclease-mediated killing. The desired mutation relative to the sequence of the target region can be an insertion, deletion, and/or substitution of one or more nucleotides. In some examples, the insertion, deletion, and/or substitution of one or more nucleotides is in at least one codon of the target region. Alternatively, the cassette can be synthesized in a single synthesis, comprising (a) a region that is homologous to a target region of the nucleic acid of the cell and includes a desired mutation of at least one nucleotide or one codon relative to the target region, (b) a protospacer adjacent motif (PAM) mutation, and optionally (c) a region that is homologous to a second target region of the nucleic acid of the cell and includes a recorder sequence.
  • The methods disclosed herein can be applied to any target nucleic acid molecule of interest, from any prokaryote including bacteria and archaea, or any eukaryote, including yeast, mammalian, and human genes, or any viral particle. The nucleic acid molecule can be a non-coding nucleic acid sequence, gene, genome, chromosome, plasmid, episomal nucleic acid molecule, artificial chromosome, synthetic chromosome, or viral nucleic acid.
  • Methods for assessing recovery efficiency of donor strain libraries are disclosed herein. Recovery efficiency can be verified based on the presence of a PCR product or on changes in amplicon or PCR product sizes or sequence obtained with primers directed at the selected target locus. Primers can be designed to hybridize with endogenous sequences or heterologous sequences contained on the donor nucleic acid molecule. For example, the PCR primer can be designed to hybridize to a heterologous sequence such that PCR will only be possible if the donor nucleic acid is incorporated. Sequencing of PCR products from the recovered libraries indicates the heterologous sequence or synthetic priming site from the dsDNA cassettes or donor sequences can be incorporated with about 90-100% efficiency. In other examples, the efficiency can be about 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 75%, 80%, 85%, 90%, 95%, 99% or 100%.
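  • One way to estimate recovery efficiency from sequencing data is sketched below: the fraction of reads from the target locus that contain the heterologous sequence or synthetic priming site. This is an illustrative calculation only; the function name and exact-match criterion are assumptions, and real data would typically need tolerance for sequencing errors.

```python
def recovery_efficiency(reads, priming_site):
    """Percent of reads containing the synthetic priming site (exact match)."""
    if not reads:
        raise ValueError("no reads supplied")
    site = priming_site.upper()
    hits = sum(site in read.upper() for read in reads)
    return 100.0 * hits / len(reads)
```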
  • In some cases, the ability to improve final editing efficiencies of the methods disclosed herein can be assessed by carrying out cassette construction in gene deficient strains before transferring to a wild-type donor strain in an effort to prevent loss of mutations during the donor construction phase. Additionally or alternatively, efficiency of the disclosed methods can be assessed by targeting an essential gene. Essential genes can include any gene required for survival or replication of a viral particle, cell, or organism. In some examples, essential genes include dxs, metA, and folA. Essential genes have been effectively targeted using guide nucleic acid design strategies described. Other suitable essential genes are well known in the art.
  • Provided herein are methods of increasing editing efficiencies by modulating the level of a nucleic acid-guided nuclease. This could be done by using copy control plasmids, such as high copy number plasmids or low copy number plasmids. Low copy number plasmids could be plasmids that can have about 20 or fewer copies per cell, as opposed to high copy number plasmids that can have about 1000 copies per cell. High copy number plasmids and low copy number plasmids are well known in the art and it is understood that an exact plasmid copy per cell does not need to be known in order to characterize a plasmid as either high or low copy number.
  • In some cases, decreasing the expression level of a nucleic acid-guided nuclease, such as Cas9, Cpf1, MAD2, or MAD7, can increase transformation, editing, and/or recording efficiencies. In some cases, decreasing the expression level of the nucleic acid-guided nuclease is done by expressing the nucleic acid-guided nuclease on a low copy number plasmid.
  • In some cases, increasing the expression level of a nucleic acid-guided nuclease, such as Cas9, Cpf1, MAD2, or MAD7, can increase transformation, editing, and/or recording efficiencies. In some cases, increasing the expression level of the nucleic acid-guided nuclease is done by expressing the nucleic acid-guided nuclease on a high copy number plasmid.
  • Other methods of modulating the expression level of a protein are also envisioned and are known in the art. Such methods include using an inducible or constitutive promoter, incorporating enhancers or other expression regulatory elements onto an expression plasmid, using RNAi, amiRNAi, or other RNA silencing techniques to modulate transcript level, fusing the protein of interest to a degradation domain, or any other method known in the art.
  • Provided herein are methods for generating mutant libraries. In some examples, the mutant library can be effectively constructed and retrieved within 1-3 hours post recombineering. In some examples, the mutant library is constructed within 0.5, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, or 24 hours post recombineering. In some examples, the mutant library can be retrieved within 0.5, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 24, 36, or 48 hours post recombineering and/or post-constructing by recombineering.
  • Some methods disclosed herein can be used for trackable, precision genome editing. In some examples, methods disclosed herein can achieve high efficiency editing/mutating using a single cassette that encodes both an editing cassette and guide nucleic acid, and optionally a recorder cassette and second guide nucleic acid. Alternatively, a single vector can encode an editing cassette while a guide nucleic acid is provided sequentially or concomitantly. When used with parallel DNA synthesis, such as array-based DNA synthesis, methods disclosed herein can provide single step generation of hundreds or thousands of precision edits/mutations. Mutations can be mapped by sequencing the editing cassette on the vector, rather than by sequencing of the genome or a section of the genome of the cell or organism.
  • The methods disclosed herein can have broad utility in protein and genome engineering applications, as well as for reconstruction of mutations, such as mutations identified in laboratory evolution experiments. In some examples, the methods and compositions disclosed here can combine an editing cassette, which could include a desired mutation and a PAM mutation, with a gene encoding a guide nucleic acid on a single vector.
  • In some examples, a trackable mutant library can be generated in a single transformation or single reaction.
  • Methods disclosed herein can comprise introducing a cassette comprising an editing cassette that includes the desired mutation and the PAM mutation into a cell or population of cells. In some embodiments, the cell into which the cassette or vector is introduced also comprises a nucleic acid-guided nuclease, such as Cas9, Cpf1, MAD2, or MAD7. In some embodiments, a gene or mRNA encoding the nucleic acid-guided nuclease is concomitantly, sequentially, or subsequently introduced into the cell or population of cells. Expression of a targetable nuclease system, including nucleic acid-guided nuclease and a guide nucleic acid, in the cell or cell population can be activated such that the guide nucleic acid recruits the nucleic acid-guided nuclease to the target region, where dsDNA cleavage occurs.
  • In some examples, without wishing to be bound by any particular theory, the homologous region of an editing cassette complementary to the target sequence mutates the PAM and one or more codons of the target sequence. Cells of the population of cells that did not integrate the PAM mutation can undergo unedited cell death due to nucleic acid-guided nuclease mediated dsDNA cleavage. In some examples, cells of the population of cells that integrate the PAM mutation do not undergo cell death; they remain viable and are selectively enriched to high abundance. Viable cells can be obtained and can provide a library of trackable or targeted mutations.
  • In some examples, without wishing to be bound by any particular theory, the homologous region of a recorder cassette complementary to the target sequence mutates the PAM and introduces a barcode into a target sequence. Cells of the population of cells that did not integrate the PAM mutation can undergo unedited cell death due to nucleic acid-guided nuclease mediated dsDNA cleavage. In some examples, cells of the population of cells that integrate the PAM mutation do not undergo cell death; they remain viable and are selectively enriched to high abundance. Viable cells can be obtained and can provide a library of trackable mutations.
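The enrichment described in the two preceding paragraphs can be illustrated with a toy calculation. This is a minimal sketch and not a model taken from the disclosure: the function name, the parameters `edit_fraction` and `escape_rate`, and the example numbers are all hypothetical. It assumes that every cell carrying the PAM mutation survives, while unedited cells survive cleavage only at the stated escape rate.

```python
def edited_fraction_after_selection(edit_fraction, escape_rate):
    """Fraction of surviving cells that carry the PAM mutation.

    edit_fraction: fraction of cells that integrated the PAM mutation (assumed).
    escape_rate: fraction of unedited cells that survive cleavage (assumed).
    """
    survivors_edited = edit_fraction  # assumed immune to nuclease-mediated killing
    survivors_unedited = (1.0 - edit_fraction) * escape_rate
    total = survivors_edited + survivors_unedited
    if total == 0:
        raise ValueError("no surviving cells under these parameters")
    return survivors_edited / total


if __name__ == "__main__":
    # Hypothetical numbers chosen only for illustration.
    print(round(edited_fraction_after_selection(0.01, 0.001), 3))
```

Under these assumptions, even a small edited fraction can dominate the surviving population when few unedited cells escape cleavage, which is the sense in which viable cells are selectively enriched.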
  • A separate vector or mRNA encoding a nucleic acid-guided nuclease can be introduced into the cell or population of cells. Introducing a vector or mRNA into a cell or population of cells can be performed using any method or technique known in the art. For example, vectors can be introduced by standard protocols, such as transformation including chemical transformation and electroporation, transduction and particle bombardment. Additionally or alternatively, mRNA can be introduced by standard protocols, such as transformation as disclosed herein, and/or by techniques involving cell permeable peptides or nanoparticles.
  • An editing cassette can include (a) a region, which recognizes (hybridizes to) a target region of a nucleic acid in a cell or population of cells, is homologous to the target region of the nucleic acid of the cell and includes a mutation, referred to as a desired mutation, of at least one nucleotide that can be in at least one codon relative to the target region, and (b) a protospacer adjacent motif (PAM) mutation. In some examples, the editing cassette also comprises a barcode. The barcode can be a unique barcode or relatively unique such that the corresponding mutation can be identified based on the barcode. The PAM mutation may be any insertion, deletion or substitution of one or more nucleotides that mutates the sequence of the PAM such that the mutated PAM (PAM mutation) is not recognized by a chosen nucleic acid-guided nuclease system. A cell that comprises such a PAM mutation may be said to be “immune” to nucleic acid-guided nuclease-mediated killing. The desired mutation relative to the sequence of the target region may be an insertion, deletion, and/or substitution of one or more nucleotides and may be at least one codon of the target region. In some embodiments, the distance between the PAM mutation and the desired mutation is at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 40, 50, 60, 70, 80, 90, or 100 nucleotides on the editing cassette. In some embodiments, the PAM mutation is located at least 9 nucleotides from the end of the editing cassette. In some embodiments, the desired mutation is located at least 9 nucleotides from the end of the editing cassette.
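The positional constraints in the preceding paragraph can be sketched as a simple design check. This is an illustrative sketch only: the function names, the 0-based coordinate convention, and the way the distance to the cassette end is counted are assumptions, not part of the disclosure. The default `NGG` pattern is the motif commonly described for Streptococcus pyogenes Cas9; other nucleic acid-guided nucleases recognize other motifs, so the pattern is a parameter.

```python
def matches_pam(site, pattern="NGG"):
    """True if `site` matches a PAM pattern in which N stands for any base."""
    site = site.upper()
    return len(site) == len(pattern) and all(
        p == "N" or p == s for p, s in zip(pattern, site)
    )


def check_cassette_layout(cassette_len, pam_pos, edit_pos,
                          min_end_margin=9, min_separation=1):
    """Return a list of violated layout rules (empty if none).

    Positions are 0-based indices on the cassette (an assumed convention).
    """
    problems = []
    for name, pos in (("PAM mutation", pam_pos), ("desired mutation", edit_pos)):
        # Number of nucleotides between this position and the nearest end.
        margin = min(pos, cassette_len - 1 - pos)
        if margin < min_end_margin:
            problems.append(f"{name} is {margin} nt from the cassette end")
    if abs(pam_pos - edit_pos) < min_separation:
        problems.append("PAM mutation and desired mutation are too close")
    return problems


if __name__ == "__main__":
    # A mutated PAM should no longer match the chosen pattern.
    print(matches_pam("TGG"), matches_pam("TGA"))
    print(check_cassette_layout(cassette_len=100, pam_pos=5, edit_pos=30))
```

The minimum separation and end margin are exposed as parameters because the paragraph lists many alternative values for them.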
  • A desired mutation can be an insertion of a nucleic acid sequence relative to the sequence of the target sequence. The nucleic acid sequence inserted into the target sequence can be of any length. In some embodiments, the nucleic acid sequence inserted is at least 1, 2, 3, 4, 5, 10, 20, 30, 40, 50, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800, 1900, or at least 2000 nucleotides in length. In embodiments in which a nucleic acid sequence is inserted into the target sequence, the editing cassette comprises a region that is at least 10, 15, 20, 25, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, or at least 60 nucleotides in length and homologous to the target sequence. The homology arms or homologous region can be about 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, or more nucleotides in length, including any integer therein. The homology arms or homologous region can be over 200 nucleotides in length.
  • A barcode can be a unique barcode or relatively unique such that the corresponding mutation can be identified based on the barcode. In some examples, the barcode is a non-naturally occurring sequence that is not found in nature. In most examples, the combination of the desired mutation and the barcode within the editing cassette is non-naturally occurring and not found in nature. A barcode can be any number of nucleotides in length. A barcode can be 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, or more than 30 nucleotides in length. In some cases, the barcode is more than 30 nucleotides in length.
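As a rough guide to how barcode length relates to uniqueness, the following sketch computes how many distinct barcodes a given length allows and the shortest length that can label a given number of edits. It assumes a four-letter DNA alphabet with every sequence permitted; the function names are hypothetical. Designs intended to tolerate sequencing errors would typically use longer barcodes than this lower bound.

```python
def barcode_capacity(length):
    """Number of distinct DNA barcodes of the given length (4 bases per position)."""
    return 4 ** length


def min_barcode_length(num_edits):
    """Smallest barcode length whose capacity is at least num_edits."""
    if num_edits < 1:
        raise ValueError("num_edits must be at least 1")
    length = 1
    while barcode_capacity(length) < num_edits:
        length += 1
    return length


if __name__ == "__main__":
    for n in (4, 5, 120_000):
        print(n, min_barcode_length(n))
```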
  • An editing cassette or recorder cassette can comprise at least a portion of a gene encoding a guide nucleic acid, and optionally a promoter operably linked to the encoded guide nucleic acid. In some embodiments, the portion of the gene encoding the guide nucleic acid encodes the portion of the guide nucleic acid that is complementary to the target sequence. The portion of the guide nucleic acid that is complementary to the target sequence, or the guide sequence, can be at least 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or at least 30 nucleotides in length. In some embodiments, the guide sequence is 24 nucleotides in length. In some embodiments, the guide sequence is 18 nucleotides in length.
  • In some embodiments, the editing cassette or recorder cassette further comprises at least two priming sites. The priming sites may be used to amplify the cassette, for example by PCR. In some embodiments, the portion of the guide sequence is used as a priming site.
  • Editing cassettes or recorder cassettes for use in the described methods can be obtained or derived from many sources. For example, the cassettes can be synthesized, for example by array-based synthesis, multiplex synthesis, multi-parallel synthesis, PCR assembly, in vitro assembly, Gibson assembly, or any other synthesis method known in the art. In some embodiments, the editing cassette or recorder cassette is synthesized, for example by array-based synthesis, multiplex synthesis, multi-parallel synthesis, PCR assembly, in vitro assembly, Gibson assembly, or any other synthesis method known in the art. The length of the editing cassette or recorder cassette may be dependent on the method used in obtaining said cassette.
  • An editing cassette can be approximately 50-300 nucleotides, 75-200 nucleotides, or between 80-120 nucleotides in length. In some embodiments, the editing cassette can be any discrete length between 50 nucleotides and 1 Mb.
  • A recorder cassette can be approximately 50-300 nucleotides, 75-200 nucleotides, or between 80-120 nucleotides in length. In some embodiments, the recorder cassette can be any discrete length between 50 nucleotides and 1 Mb.
  • Methods disclosed herein can also involve obtaining editing cassettes and recorder cassettes and constructing a trackable plasmid or vector. Methods of constructing a vector will be known to one of ordinary skill in the art and may involve ligating the cassettes into a vector backbone. In some examples, plasmid construction occurs by in vitro DNA assembly methods, oligonucleotide assembly, PCR-based assembly, SLIC, CPEC, or other assembly methods well known in the art. In some embodiments, the cassettes or a subset (pool) of the cassettes can be amplified prior to construction of the vector, for example by PCR.
  • The cell or population of cells comprising a polynucleotide encoding a nucleic acid-guided nuclease can be maintained or cultured under conditions in which the nuclease is expressed. Nucleic acid-guided nuclease expression can be controlled or can be constitutively on. The methods described herein can involve maintaining cells under conditions in which nuclease expression is activated, resulting in production of the nuclease, for example, Cas9, Cpf1, MAD2, or MAD7. Specific conditions under which the nucleic acid-guided nuclease is expressed can depend on factors, such as the nature of the promoter used to regulate expression of the nuclease. Nucleic acid-guided nuclease expression can be induced in the presence of an inducer molecule, such as arabinose. When the cell or population of cells comprising nucleic acid-guided nuclease encoding DNA are in the presence of the inducer molecule, expression of the nuclease can occur. CRISPR-nuclease expression can be repressed in the presence of a repressor molecule. When the cell or population of cells comprising nucleic acid-guided nuclease encoding DNA are in the absence of a molecule that represses expression of the nuclease, expression of the nuclease can occur.
  • Cells or the population of cells that remain viable can be obtained or separated from the cells that undergo unedited cell death as a result of nucleic acid-guided nuclease-mediated killing; this can be done, for example, by spreading the population of cells on a culture surface, allowing growth of the viable cells, which are then available for assessment.
  • Disclosed herein are methods for the identification of the mutation without the need to sequence the genome or large portions of the genome of the cell. The methods can involve sequencing of the editing cassette, recorder cassette, or barcode to identify the mutation of one or more codons. Sequencing of the editing cassette can be performed as a component of the vector or after its separation from the vector and, optionally, amplification. Sequencing can be performed using any sequencing method known in the art, such as by Sanger sequencing or next-generation sequencing methods.
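Identifying mutations from barcode sequence rather than from the genome amounts to a lookup from barcode to designed edit. The sketch below tallies edits from sequencing reads under simplifying assumptions: the barcode sits at a fixed offset in each read, matching is exact, and the barcode table, read sequences, and edit labels are invented for illustration.

```python
from collections import Counter


def count_edits(reads, barcode_to_edit, start, length):
    """Tally designed edits by looking up the barcode found in each read.

    reads: iterable of read sequences (strings).
    barcode_to_edit: dict mapping barcode sequence to an edit label.
    start, length: assumed fixed position and size of the barcode in a read.
    """
    counts = Counter()
    for read in reads:
        barcode = read[start:start + length]
        edit = barcode_to_edit.get(barcode)
        if edit is not None:  # reads with unknown barcodes are skipped
            counts[edit] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical barcode table and reads, for illustration only.
    table = {"ACGT": "geneA:L10P", "TTAG": "geneB:A5V"}
    reads = ["GGACGTCC", "GGTTAGCC", "GGACGTAA", "GGCCCCAA"]
    print(count_edits(reads, table, start=2, length=4))
```

A practical pipeline would also need to handle sequencing errors and variable barcode position, which this sketch deliberately ignores.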
  • Some methods described herein can be carried out in any type of cell in which a targetable nuclease system can function, or target and cleave DNA, including prokaryotic and eukaryotic cells. In some embodiments, the cell is a bacterial cell, such as Escherichia spp., e.g., E. coli. In other embodiments, the cell is a fungal cell, such as a yeast cell, e.g., Saccharomyces spp. In other embodiments, the cell is an algal cell, a plant cell, an insect cell, or a mammalian cell, including a human cell.
  • A “vector” is any of a variety of nucleic acids that comprise a desired sequence or sequences to be delivered to or expressed in a cell. A desired sequence can be included in a vector, such as by restriction and ligation or by recombination or assembly methods known in the art. Vectors are typically composed of DNA, although RNA vectors are also available. Vectors include, but are not limited to plasmids, fosmids, phagemids, virus genomes, artificial chromosomes, and synthetic nucleic acid molecules.
  • Vectors useful in the methods disclosed herein can comprise at least one editing cassette as described herein, at least one gene encoding a gRNA, and optionally a promoter and/or a barcode. More than one editing cassette can be included on the vector, for example 2, 3, 4, 5, 6, 7, 8, 9, 10 or more editing cassettes. The more than one editing cassettes can be designed to target different target regions, for example, there could be different editing cassettes, each of which contains at least one region homologous with a different target region. In other examples, each editing cassette targets the same target region while each editing cassette comprises a different desired mutation relative to the target region. In other examples, the plurality of editing cassettes can comprise a combination of editing cassettes targeting the same target region and editing cassettes targeting different target regions. Each editing cassette can comprise an identifying barcode. Alternatively or additionally, the vector can include one or more genes encoding more than one gRNA, e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10 or more gRNAs. The more than one gRNAs can contain regions that are complementary to a portion of different target regions, for example, if there are different gRNAs, each of which can be complementary to a portion of a different target region. In other examples, the more than one gRNA can each target the same target region. In other examples, the more than one gRNA can be a combination of gRNAs targeting the same and different target regions.
  • A cassette comprising a gene encoding a portion of a guide nucleic acid can be ligated or assembled into a vector that encodes another portion of a guide nucleic acid. Upon ligation or assembly, the portion of the guide nucleic acid from the cassette and the other portion of the guide nucleic acid can form a functional guide nucleic acid. A promoter and a gene encoding a guide nucleic acid can be operably linked.
  • In some embodiments, the methods involve introduction of a second vector encoding a nucleic acid-guided nuclease, such as Cas9, Cpf1, MAD2, or MAD7. The vector may further comprise one or more promoters operably linked to a gene encoding the nucleic acid-guided nuclease.
  • As used herein, “operably” linked can mean the promoter affects or regulates transcription of the DNA encoding a gene, such as the gene encoding the gRNA or the gene encoding a CRISPR nuclease.
  • A promoter can be a native promoter such as a promoter present in the cell into which the vector is introduced. A promoter can be an inducible or repressible promoter, for example, the promoter can be regulated allowing for inducible or repressible transcription of a gene, such as the gene encoding the guide nucleic acid or the gene encoding a nucleic acid-guided nuclease. The molecules whose presence or absence regulates such promoters can be referred to as inducers or repressors, respectively. The nature of the promoter needed for expression of the guide nucleic acid or nucleic acid-guided nuclease can vary based on the species or cell type and can be recognized by one of ordinary skill in the art.
  • A separate vector encoding a nucleic acid-guided nuclease can be introduced into a cell or population of cells before or at the same time as introduction of a trackable plasmid as disclosed herein. The gene encoding a nucleic acid-guided nuclease can be integrated into the genome of the cell or population of cells, or the gene can be maintained episomally. The nucleic acid-guided nuclease-encoding DNA can be integrated into the cellular genome before introduction of the trackable plasmid, or after introduction of the trackable plasmid. In some examples, a nucleic acid molecule, such as DNA encoding a nucleic acid-guided nuclease, can be expressed from DNA integrated into the genome. In some embodiments, a gene encoding Cas9, Cpf1, MAD2, or MAD7 is integrated into the genome of the cell.
  • Vectors or cassettes useful in the methods described herein can further comprise two or more priming sites. In some embodiments, the presence of flanking priming sites allows amplification of the vector or cassette.
  • In some embodiments, a cassette or vector encodes a nucleic acid-guided nuclease comprising one or more nuclear localization sequences (NLSs), such as about or more than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more NLSs. In some embodiments, the engineered nuclease comprises about or more than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more NLSs at or near the amino-terminus, about or more than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more NLSs at or near the carboxy-terminus, or a combination of these (e.g. one or more NLS at the amino-terminus and one or more NLS at the carboxy terminus). When more than one NLS is present, each may be selected independently of the others, such that a single NLS may be present in more than one copy and/or in combination with one or more other NLSs present in one or more copies. In a preferred embodiment of the invention, the engineered nuclease comprises at most 6 NLSs. In some embodiments, an NLS is considered near the N- or C-terminus when the nearest amino acid of the NLS is within about 1, 2, 3, 4, 5, 10, 15, 20, 25, 30, 40, 50, or more amino acids along the polypeptide chain from the N- or C-terminus. Non-limiting examples of NLSs include an NLS sequence derived from: the NLS of the SV40 virus large T-antigen, having the amino acid sequence PKKKRKV (SEQ ID NO: 111); the NLS from nucleoplasmin (e.g. 
the nucleoplasmin bipartite NLS with the sequence KRPAATKKAGQAKKKK (SEQ ID NO:112)); the c-myc NLS having the amino acid sequence PAAKRVKLD (SEQ ID NO:113) or RQRRNELKRSP (SEQ ID NO:114); the hRNPA1 M9 NLS having the sequence NQSSNFGPMKGGNFGGRSSGPYGGGGQYFAKPRNQGGY (SEQ ID NO: 115); the sequence RMRIZFKNKGKDTAELRRRRVEVSVELRKAKKDEQILKRRNV (SEQ ID NO: 116) of the IBB domain from importin-alpha; the sequences VSRKRPRP (SEQ ID NO:117) and PPKKARED (SEQ ID NO:118) of the myoma T protein; the sequence PQPKKKPL (SEQ ID NO:119) of human p53; the sequence SALIKKKKKMAP (SEQ ID NO:120) of mouse c-abl IV; the sequences DRLRR (SEQ ID NO:121) and PKQKKRK (SEQ ID NO:122) of the influenza virus NS1; the sequence RKLKKKIKKL (SEQ ID NO:123) of the Hepatitis virus delta antigen; the sequence REKKKFLKRR (SEQ ID NO: 124) of the mouse Mx1 protein; the sequence KRKGDEVDGVDEVAKKKSKK (SEQ ID NO: 125) of the human poly(ADP-ribose) polymerase; and the sequence RKCLQAGMNLEARKTKK (SEQ ID NO: 126) of the steroid hormone receptors (human) glucocorticoid.
  • In general, the one or more NLSs are of sufficient strength to drive accumulation of the nucleic acid-guided nuclease in a detectable amount in the nucleus of a eukaryotic cell. In general, strength of nuclear localization activity may derive from the number of NLSs, the particular NLS(s) used, or a combination of these factors. Detection of accumulation in the nucleus may be performed by any suitable technique. For example, a detectable marker may be fused to the nucleic acid-guided nuclease, such that location within a cell may be visualized, such as in combination with a means for detecting the location of the nucleus (e.g. a stain specific for the nucleus such as DAPI). Cell nuclei may also be isolated from cells, the contents of which may then be analyzed by any suitable process for detecting protein, such as immunohistochemistry, Western blot, or enzyme activity assay. Accumulation in the nucleus may also be determined indirectly, such as by an assay for the effect of the nucleic acid-guided nuclease complex formation (e.g. assay for DNA cleavage or mutation at the target sequence, or assay for altered gene expression activity affected by targetable nuclease complex formation and/or nucleic acid-guided nuclease activity), as compared to a control not exposed to the nucleic acid-guided nuclease or targetable nuclease complex, or exposed to a nucleic acid-guided nuclease lacking the one or more NLSs.
  • ProSAR
  • Methods disclosed herein are capable of engineering a few to hundreds of genetic sequences or proteins simultaneously. These methods can permit one to map in a single experiment many or all possible residue changes over a collection of desired proteins onto a trait of interest, as part of individual proteins of interest or as part of a pathway. This approach can be used at least for the following by mapping i) any number of residue changes for any number of proteins of interest in a specific biochemical pathway or that catalyze similar reactions or ii) any number of residues in the regulatory sites of any number of proteins of interest with a specific regulon or iii) any number of residues of a biological agent used to treat a health condition.
  • In some embodiments, methods described herein include identifying genetic variations of one or more target genes that affect any number of residues, such as one or more, or all residues of one or more target proteins. In accordance with these embodiments, compositions and methods disclosed herein permit parallel analysis of two or more target proteins or proteins that contribute to a trait. Parallel analysis of multiple proteins by a single experiment described can facilitate identification, modification and design of superior systems for example for producing a eukaryotic or prokaryotic byproduct, producing a eukaryotic byproduct, for example, a biological agent such as a growth factor or antibody, in a prokaryotic organism and the like. Relevant biologics used in analysis and treatment of disease can be produced in these genetically engineered environments that could reduce production time and increase quality, all while reducing costs to the manufacturers and the consumers.
  • Some embodiments disclosed herein comprise constructs of use for studying genetic variations of a gene or gene segment wherein the gene or gene segment is capable of generating a protein. A construct can be generated for any number of residues, such as one, two, more than two, or all residue modifications of a target protein that is linked to a trackable agent such as a barcode. A barcode indicative of a genetic variation of a gene of a target protein can be located outside of the open reading frame of the gene. In some embodiments such a barcode can be located many hundreds or thousands of bases away from the gene. It is contemplated herein that these methods can be performed in vivo. In some examples, such a construct comprises a trackable polynucleic acid or plasmid as disclosed herein.
  • Constructs described herein can be used to compile a comprehensive library of genetic variations encompassing all residue changes of one target protein, more than one target protein or target proteins that contribute to a trait. In certain embodiments, libraries disclosed herein can be used to select proteins with improved qualities to create an improved single or multiple protein system for example for producing a byproduct, such as a chemical, biofuels, biological agent, pharmaceutical agent, or for biomass, or biologic compared to a non-selective system.
  • Protein Sequence Activity Relationship (ProSAR) Mapping
  • Understanding the relationship between a protein's amino acid structure and its overall function continues to be of great practical, clinical, and scientific significance for biologists and engineers. Directed evolution can be a powerful engineering and discovery tool, but the random and often combinatorial nature of mutations makes their individual impacts difficult to quantify and thus challenges further engineering. More systematic analysis of contributions of individual residues or saturation mutagenesis remains labor- and time-intensive for entire proteins and simply is not possible on reasonable timescales for multiple proteins in parallel, such as metabolic pathways or multi-protein complexes, using standard methods.
  • Provided herein are methods which can be used to rapidly and efficiently examine the roles of some or all genes in a viral, microbial, or eukaryotic genome using mixtures of barcoded oligonucleotides. In some embodiments, these compositions and methods can be used to develop a powerful new technology for comprehensively mapping protein structure-activity relationships (ProSAR).
  • Using methods and compositions disclosed herein, multiplex cassette synthesis can be combined with recombineering, to create mutant libraries of specifically designed and barcoded mutations along one or more genes of interest in parallel. Screens and/or selections followed by high-throughput sequencing and/or barcode microarray methods can allow for rapid mapping of protein sequence-activity relationships (ProSAR). In some embodiments, systematic ProSAR mapping can elucidate individual amino acid mutations for improved function and/or activity and/or stability etc.
  • Methods can be iterated to combinatorially improve the function, activity, or stability. Cassettes can be generated by oligonucleotide synthesis. Given that existing capabilities of multiplex oligonucleotide synthesis can reach over 120,000 oligonucleotides per array, combined with recombineering, the methods disclosed herein can be scaled to construct mutant libraries for dozens to hundreds of proteins in a single experiment. In some examples, at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 25, 50, 75, 100, 150, 200, 250, 300, 400, 500, 600, 700, 800, 900, 1000, or more proteins can be partially or completely covered by mutant libraries generated by the methods disclosed herein.
  • Disclosed herein are strategies to construct barcoded substitution libraries for several different proteins at the same time. Using existing multiplex DNA synthesis technology, as disclosed, a partial or complete substitution library for one or more protein constructs can be barcoded, or non-barcoded if desired, for one or for several hundred proteins at the same time. In some examples, such libraries comprise trackable plasmids as disclosed herein.
  • Some embodiments herein apply to analysis and structure/function/stability library construction of any protein with a corresponding screen or selection for activity. Cassette library size can depend on the number (N) of amino acids in a protein of interest, with a full saturation library, including all 20 amino acids at each position and optionally non-naturally occurring amino acids, scaling as 19 (or more)×N and an alanine-mapping library scaling as 1×N. Thus, in some examples, screening of even very large proteins of more than 1,000 amino acids can be tractable given current multiplex oligo synthesis capabilities of at least 120,000 oligos per array.
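The scaling stated above (19×N for a saturation library over the 20 standard amino acids, 1×N for an alanine-mapping library) can be worked through numerically. The sketch below assumes one cassette per substitution and uses the 120,000-oligo array capacity quoted in the text; the function names and the example protein lengths are illustrative.

```python
ARRAY_CAPACITY = 120_000  # oligos per array, as quoted in the text


def saturation_library_size(n_residues, substitutions_per_site=19):
    """Cassettes needed to place every alternative amino acid at every position."""
    return substitutions_per_site * n_residues


def alanine_scan_size(n_residues):
    """Cassettes needed for one alanine substitution per position."""
    return n_residues


def proteins_per_array(n_residues, substitutions_per_site=19,
                       capacity=ARRAY_CAPACITY):
    """How many proteins of this length fit on one array at full saturation."""
    return capacity // saturation_library_size(n_residues, substitutions_per_site)


if __name__ == "__main__":
    for n in (300, 1000):
        print(n, saturation_library_size(n), alanine_scan_size(n),
              proteins_per_array(n))
```

For a 1,000-residue protein this gives 19,000 cassettes, well within a single array, which is consistent with the statement that screening very large proteins can be tractable.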
  • In addition or as an alternative to activity screens, more general properties with developed high-throughput screens and selections can be efficiently tested using methods and cassettes disclosed herein. For example, universal protein folding and solubility reporters can be engineered for expression in the cytoplasm, periplasm, and the inner membrane. In some examples, a protein library can be screened under different conditions such as different temperatures, different substrates or co-factors, in order to identify residue changes required for expression of various traits. In other embodiments, because residues can be analyzed one at a time, mutations at residues important for a particular trait, such as thermostability, resistance to environmental pressures, or increases or decreases in functionality or production, can be combined via multiplex recombineering with mutations important for various other traits, such as catalytic activity, to create combinatorial libraries for multi-trait optimization.
  • Methods disclosed herein can provide for creating and/or evaluating comprehensive, in vivo, mutational libraries of one or more target protein(s). These approaches can be extended via recorder cassettes or barcoding technology to generate trackable mutational libraries for any number of residues or every residue in a protein. This approach can be based on a protein sequence-activity relationship mapping method extended to work in vivo, capable of working on one or a few to hundreds of proteins simultaneously depending on the technology selected. For example, these methods permit one to map in a single experiment any number of, the majority of, or all possible residue changes over a collection of desired proteins onto a trait of interest, as part of individual proteins of interest or as part of a pathway.
  • In some examples, these approaches can be used at least for the following by mapping i) any number of or all residue changes for any number of or all proteins in a specific biochemical pathway, such as lycopene production, or that catalyze similar reactions, such as dehydrogenases or other enzymes of a pathway of use to produce a desired effect or produce a product, or ii) any number of or all residues in the regulatory sites of any number of or all proteins with a specific regulatory mechanism, such as heat shock response, or iii) any number of or all residues of a biological agent used to treat a health condition, such as insulin, a growth factor (HCG), an anti-cancer biologic, or a replacement protein for a deficient population.
  • Scores related to various input parameters can be assigned in order to generate one or more composite score(s) for designing genomically-engineered organisms or systems. These scores can reflect quality of genetic variations in genes or genetic loci as they relate to selection of an organism or design of an organism for a predetermined production, trait or traits. Certain organisms or systems can be designed based on a need for improved organisms for biorefining, biomass, such as crops, trees, grasses, crop residues, or forest residues, biofuel production, and using biological conversion, fermentation, chemical conversion and catalysis to generate and use compounds, biopharmaceutical production and biologic production. In certain embodiments, this can be accomplished by modulating growth or production of microorganism through genetic manipulation methods disclosed herein.
  • Genetic manipulation by methods disclosed herein of genes encoding a protein can be used to make desired genetic changes that can result in desired phenotypes and can be accomplished through numerous techniques including but not limited to, i) introduction of new genetic material, ii) genetic insertion, disruption or removal of existing genetic material, as well as, iii) mutation of genetic material, such as a point mutation, or any combinations of i, ii, and iii, that results in desired genetic changes with desired phenotypic changes. Mutations can be directed or random, in addition to those including, but not limited to, error prone or directed mutagenesis through PCR, mutator strains, and random mutagenesis. Mutations can be incorporated using trackable plasmids and methods as disclosed herein.
  • Disclosed methods can be used for inserting and accumulating higher order modifications into a microorganism's genome or a target protein; for example, multiple different site-specified mutations in the same genome, at high efficiency to generate libraries of genomes with over 1, 5, 10, 15, 20, 25, 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, or more targeted modifications are described. In some examples, these mutations are within regulatory modules, regulatory elements, protein-coding regions, or non-coding regions. Protein coding modifications can include, but are not limited to, amino acid changes, codon optimization, and translation tuning.
  • In some instances, methods are provided for the co-delivery of reagents to a single biological cell. The methods generally involve the attachment or linkage of two or more cassettes, followed by delivery of the linked cassettes to a single cell. In many cases, it is desirable that each individual cell receives the two or more cassettes. Traditional methods of reagent delivery may often be inefficient and/or inconsistent, leading to situations in which some cells receive only one of the cassettes. The methods provided herein may improve the efficiency and/or consistency of reagent delivery, such that a majority of cells in a cell population each receive the two or more cassettes. For example, more than 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% of the cells in a cell population may receive the two or more cassettes.
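  • As a rough illustration of why linkage can matter, the following minimal sketch compares the fraction of cells expected to receive every cassette when cassettes are delivered separately versus as one linked molecule. It assumes, purely for illustration, that each delivered molecule enters a cell independently with the same probability; the function name and the 0.7 uptake probability are hypothetical and are not taken from this disclosure.

```python
def fraction_receiving_all(p_uptake: float, n_cassettes: int, linked: bool) -> float:
    """Expected fraction of cells that receive all cassettes.

    Assumption: each delivered molecule is taken up independently with
    probability p_uptake. Linked cassettes travel as a single molecule.
    """
    if not 0.0 <= p_uptake <= 1.0:
        raise ValueError("p_uptake must be between 0 and 1")
    if n_cassettes < 1:
        raise ValueError("n_cassettes must be at least 1")
    if linked:
        return p_uptake  # one molecule carries every cassette
    return p_uptake ** n_cassettes  # every separate molecule must enter


for n in (2, 3):
    separate = fraction_receiving_all(0.7, n, linked=False)
    together = fraction_receiving_all(0.7, n, linked=True)
    print(f"{n} cassettes: separate={separate:.3f}, linked={together:.3f}")
```

  • Under this simplified model, the benefit of linkage grows with the number of cassettes that must reach the same cell.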
  • The two or more cassettes may be linked by any method known in the art, and generally the method chosen will be commensurate with the chemistry of the cassettes. Generally, the two or more cassettes are linked by a covalent bond (i.e., covalently-linked); however, other types of chemical bonds, including non-covalent bonds, are envisioned, such as hydrogen bonds, ionic bonds, and metallic bonds. In this way, the editing cassette and the recorder cassette may be linked and delivered into a single cell. A known edit is then associated with a known recorder or barcode sequence for that cell.
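  • The association between a known edit and a known recorder or barcode sequence can be pictured as a simple lookup table. The sketch below is illustrative only: the class name, field names, and the toy sequences are hypothetical, and the optional spacer simply models cassettes separated by a nucleotide linker.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class LinkedCassette:
    """An editing cassette joined to a recorder (barcode) cassette."""
    editing: str
    recorder: str
    spacer: str = ""  # optional nucleotide linker between the cassettes

    def sequence(self) -> str:
        """Sequence of the single linked molecule that is delivered."""
        return self.editing + self.spacer + self.recorder


def build_barcode_index(cassettes: Iterable[LinkedCassette]) -> Dict[str, str]:
    """Map each barcode to the edit it was linked to; barcodes must be unique."""
    index: Dict[str, str] = {}
    for cassette in cassettes:
        if cassette.recorder in index:
            raise ValueError(f"duplicate barcode: {cassette.recorder}")
        index[cassette.recorder] = cassette.editing
    return index


# Toy library with made-up sequences.
library = [
    LinkedCassette(editing="ACGTTGCA", recorder="GGATCC", spacer="TT"),
    LinkedCassette(editing="ACGTAGCA", recorder="GAATTC", spacer="TT"),
]
index = build_barcode_index(library)
print(index["GAATTC"])  # edit associated with this barcode
```

  • Reading a barcode from a cell and looking it up in such an index recovers the edit that was co-delivered with it.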
  • In one example, the two or more cassettes are nucleic acids. The nucleic acids may be RNA, DNA, or a combination of both, and may contain any number of chemically-modified nucleotides or nucleotide analogues. In some cases, two or more RNA cassettes are linked for delivery to a single cell. In other cases, two or more DNA cassettes are linked for delivery to a single cell. In yet other cases, a DNA cassette and an RNA cassette are linked for delivery to a single cell. The nucleic acids may be derived from genomic RNA, complementary DNA (cDNA), or chemically or enzymatically synthesized DNA.
  • A cassette may be of 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, about 110, about 120, about 130, about 140, about 150, about 160, about 170, about 180, about 190, about 200, about 210, about 220, about 230, about 240, about 250, about 275, about 300, about 325, about 350, about 375, about 400, about 425, about 450, about 475, about 500, about 525, about 550, about 575, about 600, about 625, about 650, about 675, about 700, about 725, about 750, about 775, about 800, about 825, about 850, about 875, about 900, about 925, about 950, about 975, about 1000, about 1100, about 1200, about 1300, about 1400, about 1500, about 1750, about 2000, about 2500, about 3000, about 4000, about 5000, about 6000, about 7000, about 8000, about 9000, about 10,000 or greater nucleotide residues in length, up to a full length protein encoding or regulatory genetic element.
  • Two or more cassettes may be linked on a linear nucleic acid molecule or may be linked on a plasmid or circular nucleic acid molecule. The two or more cassettes may be linked directly to one another or may be separated by one or more nucleotide spacers or linkers.
  • Two or more cassettes may be covalently linked on a linear nucleic acid molecule or may be covalently linked on a plasmid or circular nucleic acid molecule. The two or more cassettes may be covalently linked directly to one another or may be separated by one or more nucleotide spacers or linkers.
  • Any number and variety of cassettes may be linked for co-delivery. For example, the two or more cassettes may include nucleic acids, lipids, proteins, peptides, small molecules, or any combination thereof. The two or more cassettes may be essentially any cassettes that are amenable to linkage.
  • In preferred examples, the two or more cassettes are covalently linked (e.g., by a chemical bond). Covalent linkage may help to ensure that the two or more cassettes are co-delivered to a single cell. Generally, the two or more cassettes are covalently linked prior to delivery to a cell. Any method of covalently linking two or more molecules may be utilized, and it should be understood that the methods used will be at least partly determined by the types of cassettes to be linked.
  • In some instances, methods are provided for the co-delivery of reagents to a single biological cell. The methods generally involve the covalent attachment or linkage of two or more cassettes, followed by delivery of the covalently-linked cassettes into a single cell. The methods provided may help to ensure that an individual cell receives the two or more cassettes. Any known method of reagent delivery may be utilized to deliver the linked cassettes to a cell and will at least partly depend on the chemistry of the cassettes to be delivered. Non-limiting examples of reagent delivery methods may include: transformation, lipofection, electroporation, transfection, nanoparticles, and the like.
  • In various embodiments, cassettes, or isolated, donor, or editing nucleic acids may be introduced to a cell or microorganism to alter or modulate an aspect of the cell or microorganism, for example survival or growth of the microorganism as disclosed herein. The isolated nucleic acid may be derived from genomic RNA, complementary DNA (cDNA), or chemically or enzymatically synthesized DNA. Additionally or alternatively, isolated nucleic acids may be of use for capture probes, primers, labeled detection oligonucleotides, or fragments for DNA assembly.
  • A “nucleic acid” can include single-stranded and/or double-stranded molecules, as well as DNA, RNA, chemically modified nucleic acids and nucleic acid analogs. It is contemplated that a nucleic acid may be of 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, about 110, about 120, about 130, about 140, about 150, about 160, about 170, about 180, about 190, about 200, about 210, about 220, about 230, about 240, about 250, about 275, about 300, about 325, about 350, about 375, about 400, about 425, about 450, about 475, about 500, about 525, about 550, about 575, about 600, about 625, about 650, about 675, about 700, about 725, about 750, about 775, about 800, about 825, about 850, about 875, about 900, about 925, about 950, about 975, about 1000, about 1100, about 1200, about 1300, about 1400, about 1500, about 1750, about 2000, about 2500, about 3000, about 4000, about 5000, about 6000, about 7000, about 8000, about 9000, about 10,000 or greater nucleotide residues in length, up to a full length protein encoding or regulatory genetic element.
  • Isolated nucleic acids may be made by any method known in the art, for example using standard recombinant methods, assembly methods, synthetic techniques, or combinations thereof. In some embodiments, the nucleic acids may be cloned, amplified, assembled, or otherwise constructed.
  • The nucleic acids may conveniently comprise sequences in addition to a portion of a lysine riboswitch. For example, a multi-cloning site comprising one or more endonuclease restriction sites may be added. A nucleic acid may be attached to a vector, adapter, or linker for cloning of a nucleic acid. Additional sequences may be added to such cloning and/or expression sequences to optimize their function, to aid in isolation of the nucleic acid, or to improve the introduction of the nucleic acid into a cell. Use of cloning vectors, expression vectors, adapters, and linkers is well known in the art.
  • Isolated nucleic acids may be obtained from cellular, bacterial, or other sources using any number of cloning methodologies known in the art. In some embodiments, oligonucleotide probes are used which selectively hybridize, under stringent conditions, to other oligonucleotides or to the nucleic acids of an organism or cell. Methods for construction of nucleic acid libraries are known and any such known methods may be used.
  • Cellular genomic DNA, RNA, or cDNA may be screened for the presence of an identified genetic element of interest using a probe based upon one or more sequences. Various degrees of stringency of hybridization may be employed in the assay.
  • High stringency conditions for nucleic acid hybridization are well known in the art. For example, conditions may comprise low salt and/or high temperature conditions, such as provided by about 0.02 M to about 0.15 M NaCl at temperatures of about 50° C. to about 70° C. It is understood that the temperature and ionic strength of a desired stringency are determined in part by the length of the particular nucleic acid(s), the length and nucleotide content of the target sequence(s), the charge composition of the nucleic acid(s), and by the presence or concentration of formamide, tetramethylammonium chloride or other solvent(s) in a hybridization mixture. Nucleic acids may be completely complementary to a target sequence or may exhibit one or more mismatches.
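  • The dependence of stringency on salt concentration, length, nucleotide content, and formamide can be illustrated with a widely used empirical approximation for the melting temperature of long DNA duplexes. The sketch below is an assumption-laden illustration and not part of the disclosed methods: the coefficients (81.5, 16.6, 0.41, 0.61, 500) are the commonly cited textbook values, other references use slightly different ones, and the function name and probe sequence are hypothetical.

```python
import math


def estimate_tm_celsius(sequence: str, na_molar: float,
                        formamide_percent: float = 0.0) -> float:
    """Approximate duplex melting temperature in degrees Celsius.

    Empirical form (assumed):
        Tm = 81.5 + 16.6*log10([Na+]) + 0.41*(%GC)
             - 0.61*(%formamide) - 500/length
    """
    seq = sequence.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("sequence must be non-empty and contain only A, C, G, T")
    if na_molar <= 0:
        raise ValueError("na_molar must be positive")
    gc_percent = 100.0 * (seq.count("G") + seq.count("C")) / len(seq)
    return (81.5 + 16.6 * math.log10(na_molar) + 0.41 * gc_percent
            - 0.61 * formamide_percent - 500.0 / len(seq))


probe = "ATGCGCGATATCGCGCATTA" * 5  # made-up 100-nucleotide probe
for salt in (0.02, 0.15):
    print(f"[Na+] = {salt} M -> Tm ~ {estimate_tm_celsius(probe, salt):.1f} C")
```

  • In this approximation, lowering the salt concentration or adding formamide lowers the estimated melting temperature, so a fixed wash temperature becomes more stringent.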
  • Nucleic acids of interest may also be amplified using a variety of known amplification techniques. For instance, polymerase chain reaction (PCR) technology may be used to amplify target sequences directly from DNA, RNA, or cDNA. PCR and other in vitro amplification methods may also be useful, for example, to clone nucleic acid sequences, to make nucleic acids to use as probes for detecting the presence of a target nucleic acid in samples, for nucleic acid sequencing, or for other purposes.
  • Isolated nucleic acids may be prepared by direct chemical synthesis by methods such as the phosphotriester method, or using an automated synthesizer. Chemical synthesis generally produces a single stranded oligonucleotide. This may be converted into double stranded DNA by hybridization with a complementary sequence or by polymerization with a DNA polymerase using the single strand as a template.
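  • Converting a synthesized single strand into double stranded DNA by hybridization requires the complementary strand. A minimal sketch of deriving that strand is shown below; the function name and the example oligonucleotide are hypothetical.

```python
# Translation table for Watson-Crick base pairing (DNA only).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(oligo: str) -> str:
    """Return the complementary strand, written 5' to 3'."""
    if set(oligo.upper()) - set("ACGT"):
        raise ValueError("oligo must contain only A, C, G, T")
    return oligo.translate(_COMPLEMENT)[::-1]


top_strand = "ATGCCGTA"  # made-up synthesized oligonucleotide
bottom_strand = reverse_complement(top_strand)
print(top_strand, bottom_strand)
```

  • Annealing the two strands gives the double stranded product; applying the function twice returns the original sequence, which is a convenient sanity check.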
  • Any method known in the art for identifying, isolating, purifying, using, and assaying activities of target proteins described herein is contemplated. Target proteins contemplated herein include protein agents used to treat a human condition or to regulate processes (e.g. part of a pathway such as an enzyme) involved in disease of a human or non-human mammal. Any method known for selection and production of antibodies or antibody fragments is also contemplated. Additionally or alternatively, target proteins can be proteins or enzymes involved in a pathway or process in a virus, cell, or organism.
  • Targetable Nucleic Acid Cleavage Systems
  • Some methods disclosed herein comprise targeting cleavage of specific nucleic acid sequences using a site-specific, targetable, and/or engineered nuclease or nuclease system. Such nucleases can create double-stranded breaks (DSBs) at desired locations in a genome or nucleic acid molecule. In other examples, a nuclease can create a single strand break. In some cases, two nucleases are used, each of which generates a single strand break.
  • The one or more double or single strand break can be repaired by natural processes of homologous recombination (HR) and non-homologous end-joining (NHEJ) using the cell's endogenous machinery. Additionally or alternatively, endogenous or heterologous recombination machinery can be used to repair the induced break or breaks.
  • Engineered nucleases such as zinc finger nucleases (ZFNs), Transcription Activator-Like Effector Nucleases (TALENs), engineered homing endonucleases, and RNA or DNA guided endonucleases, such as CRISPR/Cas systems (e.g., Cas9 or Cpf1) and/or Argonaute systems, are particularly appropriate to carry out some of the methods of the present invention. Additionally or alternatively, RNA targeting systems can be used, such as CRISPR/Cas systems including C2c2 nucleases.
  • Methods disclosed herein can comprise cleaving a target nucleic acid using a CRISPR system, such as a Type I, Type II, Type III, Type IV, Type V, or Type VI CRISPR system. CRISPR/Cas systems can be multi-protein systems or single effector protein systems. Multi-protein, or Class 1, CRISPR systems include Type I, Type III, and Type IV systems. Alternatively, Class 2 systems include a single effector molecule and include Type II, Type V, and Type VI.
  • CRISPR systems used in methods disclosed herein can comprise a single or multiple effector proteins. An effector protein can comprise one or multiple nuclease domains. An effector protein can target DNA or RNA, and the DNA or RNA may be single stranded or double stranded. Effector proteins can generate double strand or single strand breaks. Effector proteins can comprise mutations in a nuclease domain thereby generating a nickase protein. Effector proteins can comprise mutations in one or more nuclease domains, thereby generating a catalytically dead nuclease that is able to bind but not cleave a target sequence. CRISPR systems can comprise a single or multiple guiding RNAs. The gRNA can comprise a crRNA. The gRNA can comprise a chimeric RNA with crRNA and tracrRNA sequences. The gRNA can comprise a separate crRNA and tracrRNA. Target nucleic acid sequences can comprise a protospacer adjacent motif (PAM) or a protospacer flanking site (PFS). The PAM or PFS may be 3′ or 5′ of the target or protospacer site. Cleavage of a target sequence may generate blunt ends, 3′ overhangs, or 5′ overhangs.
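  • Because the PAM or PFS may lie 3′ or 5′ of the protospacer, candidate target sites can be located by scanning for the motif on the appropriate side. The following minimal sketch scans one strand only. The function name, the toy sequences, and the example motifs ("NGG" on the 3′ side and "TTTN" on the 5′ side, which are commonly reported for some Cas9 and Cpf1 nucleases) are illustrative assumptions and are not specified in this passage.

```python
import re
from typing import List, Tuple

# Subset of IUPAC ambiguity codes, expressed as regex character classes.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "[ACGT]", "R": "[AG]", "Y": "[CT]", "V": "[ACG]"}


def find_protospacers(seq: str, pam: str, spacer_len: int,
                      pam_side: str) -> List[Tuple[int, str]]:
    """Return (start, protospacer) pairs on the given strand.

    pam_side is "3prime" (protospacer followed by the PAM) or
    "5prime" (PAM followed by the protospacer).
    """
    seq = seq.upper()
    pam_re = "".join(IUPAC[base] for base in pam.upper())
    body = f"([ACGT]{{{spacer_len}}})"
    if pam_side == "3prime":
        pattern = f"(?={body}{pam_re})"
    elif pam_side == "5prime":
        pattern = f"(?={pam_re}{body})"
    else:
        raise ValueError("pam_side must be '3prime' or '5prime'")
    # The lookahead lets overlapping candidate sites all be reported.
    return [(m.start(1), m.group(1)) for m in re.finditer(pattern, seq)]


print(find_protospacers("AAACCCCCTGGAAA", "NGG", 5, "3prime"))
print(find_protospacers("GGTTTACCCCCGG", "TTTN", 5, "5prime"))
```

  • A fuller search would also scan the reverse complement strand and use a spacer length appropriate to the nuclease in question.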
  • A gRNA can comprise a spacer sequence. Spacer sequences can be complementary to target sequences or protospacer sequences. Spacer sequences can be 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, or 36 nucleotides in length. In some examples, the spacer sequence can be less than 10 or more than 36 nucleotides in length.
  • A gRNA can comprise a repeat sequence. In some cases, the repeat sequence is part of a double stranded portion of the gRNA. A repeat sequence can be 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, or 50 nucleotides in length. In some examples, the repeat sequence can be less than 10 or more than 50 nucleotides in length.
  • A gRNA can comprise one or more synthetic nucleotides, non-naturally occurring nucleotides, nucleotides with a modification, deoxyribonucleotide, or any combination thereof. Additionally or alternatively, a gRNA may comprise a hairpin, linker region, single stranded region, double stranded region, or any combination thereof. Additionally or alternatively, a gRNA may comprise a signaling or reporter molecule.
  • A CRISPR nuclease can be endogenously or recombinantly expressed within a cell. A CRISPR nuclease can be encoded on a chromosome, extrachromosomally, or on a plasmid, synthetic chromosome, or artificial chromosome. A CRISPR nuclease can be provided or delivered to the cell as a polypeptide or mRNA encoding the polypeptide. In such examples, polypeptide or mRNA can be delivered through standard mechanisms known in the art, such as through the use of cell permeable peptides, nanoparticles, or viral particles.
  • gRNAs can be encoded by genetic or episomal DNA within a cell. In some examples, gRNAs can be provided or delivered to a cell expressing a CRISPR nuclease. gRNAs can be provided or delivered concomitantly with a CRISPR nuclease or sequentially. Guide RNAs can be chemically synthesized, in vitro transcribed, or otherwise generated using standard RNA generation techniques known in the art.
  • A CRISPR system can be a Type II CRISPR system, for example a Cas9 system. The Type II nuclease can comprise a single effector protein, which, in some cases, comprises RuvC and HNH nuclease domains. In some cases a functional Type II nuclease can comprise two or more polypeptides, each of which comprises a nuclease domain or fragment thereof. The target nucleic acid sequences can comprise a 3′ protospacer adjacent motif (PAM). In some examples, the PAM may be 5′ of the target nucleic acid. Guide RNAs (gRNA) can comprise a single chimeric gRNA, which contains both crRNA and tracrRNA sequences. Alternatively, the gRNA can comprise a set of two RNAs, for example a crRNA and a tracrRNA. The Type II nuclease can generate a double strand break, which in some cases creates two blunt ends. In some cases, the Type II CRISPR nuclease is engineered to be a nickase such that the nuclease only generates a single strand break. In such cases, two distinct nucleic acid sequences can be targeted by gRNAs such that two single strand breaks are generated by the nickase. In some examples, the two single strand breaks effectively create a double strand break. In some cases where a Type II nickase is used to generate two single strand breaks, the resulting nucleic acid free ends can either be blunt, have a 3′ overhang, or a 5′ overhang. In some examples, a Type II nuclease may be catalytically dead such that it binds to a target sequence, but does not cleave. For example, a Type II nuclease could have mutations in both the RuvC and HNH domains, thereby rendering both nuclease domains non-functional. A Type II CRISPR system can be one of three sub-types, namely Type II-A, Type II-B, or Type II-C.
  • A CRISPR system can be a Type V CRISPR system, for example a Cpf1, C2c1, or C2c3 system. The Type V nuclease can comprise a single effector protein, which in some cases comprises a single RuvC nuclease domain. In other cases, a functional Type V nuclease comprises a RuvC domain split between two or more polypeptides. In such cases, the target nucleic acid sequences can comprise a 5′ PAM or 3′ PAM. Guide RNAs (gRNA) can comprise a single gRNA or single crRNA, such as can be the case with Cpf1. In some cases, a tracrRNA is not needed. In other examples, such as when C2c1 is used, a gRNA can comprise a single chimeric gRNA, which contains both crRNA and tracrRNA sequences or the gRNA can comprise a set of two RNAs, for example a crRNA and a tracrRNA. The Type V CRISPR nuclease can generate a double strand break, which in some cases generates a 5′ overhang. In some cases, the Type V CRISPR nuclease is engineered to be a nickase such that the nuclease only generates a single strand break. In such cases, two distinct nucleic acid sequences can be targeted by gRNAs such that two single strand breaks are generated by the nickase. In some examples, the two single strand breaks effectively create a double strand break. In some cases where a Type V nickase is used to generate two single strand breaks, the resulting nucleic acid free ends can either be blunt, have a 3′ overhang, or a 5′ overhang. In some examples, a Type V nuclease may be catalytically dead such that it binds to a target sequence, but does not cleave. For example, a Type V nuclease could have mutations in a RuvC domain, thereby rendering the nuclease domain non-functional.
  • A CRISPR system can be a Type VI CRISPR system, for example a C2c2 system. A Type VI nuclease can comprise a HEPN domain. In some examples, the Type VI nuclease comprises two or more polypeptides, each of which comprises a HEPN nuclease domain or fragment thereof. In such cases, the target nucleic acid sequences can be RNA, such as single stranded RNA. When using a Type VI CRISPR system, a target nucleic acid can comprise a protospacer flanking site (PFS). The PFS may be 3′ or 5′ of the target or protospacer sequence. Guide RNAs (gRNA) can comprise a single gRNA or single crRNA. In some cases, a tracrRNA is not needed. In other examples, a gRNA can comprise a single chimeric gRNA, which contains both crRNA and tracrRNA sequences or the gRNA can comprise a set of two RNAs, for example a crRNA and a tracrRNA. In some examples, a Type VI nuclease may be catalytically dead such that it binds to a target sequence, but does not cleave. For example, a Type VI nuclease could have mutations in a HEPN domain, thereby rendering the nuclease domain non-functional.
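  • Whether two offset single strand breaks leave blunt ends, 5′ overhangs, or 3′ overhangs follows from the relative positions of the two nicks. The sketch below is a minimal illustration under an assumed coordinate convention (each nick is given as the number of base pairs to its left, counted along the top strand written 5′ to 3′); the function name is hypothetical.

```python
from typing import Tuple


def classify_ends(top_nick: int, bottom_nick: int) -> Tuple[str, int]:
    """Classify the ends produced by one nick on each strand.

    top_nick / bottom_nick: number of base pairs lying to the left of the
    nick on the top / bottom strand, in top-strand (5'->3') coordinates.
    Returns the end type and the overhang length in nucleotides.
    """
    if top_nick < 0 or bottom_nick < 0:
        raise ValueError("nick positions must be non-negative")
    offset = bottom_nick - top_nick
    if offset == 0:
        return ("blunt", 0)
    if offset > 0:
        # The bottom strand extends past the top-strand cut on the left
        # fragment and ends in a 5' terminus, so the overhang is 5'.
        return ("5' overhang", offset)
    # The top strand extends past the bottom-strand cut and ends in a
    # 3' terminus on the left fragment, so the overhang is 3'.
    return ("3' overhang", -offset)


print(classify_ends(20, 20))
print(classify_ends(20, 50))
print(classify_ends(50, 20))
```

  • In practice the nick positions would be derived from the two gRNA target sites and the cut position of the nickase relative to its PAM.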
  • Non-limiting examples of suitable nucleases, including nucleic acid-guided nucleases, for use in the present disclosure include C2c1, C2c2, C2c3, Cas1, Cas1B, Cas2, Cas3, Cas4, Cas5, Cas6, Cas7, Cas8, Cas9 (also known as Csn1 and Csx12), Cas10, Cpf1, Csy1, Csy2, Csy3, Cse1, Cse2, Csc1, Csc2, Csa5, Csn2, Csm2, Csm3, Csm4, Csm5, Csm6, Cmr1, Cmr3, Cmr4, Cmr5, Cmr6, Csb1, Csb2, Csb3, Csx17, Csx14, Csx100, Csx16, CsaX, Csx3, Csx1, Csx15, Csf1, Csf2, Csf3, Csf4, homologues thereof, orthologues thereof, or modified versions thereof. Suitable nucleic acid-guided nucleases can be from an organism from a genus which includes but is not limited to Thiomicrospira, Succinivibrio, Candidatus, Porphyromonas, Acidaminococcus, Prevotella, Smithella, Moraxella, Synergistes, Francisella, Leptospira, Catenibacterium, Kandleria, Clostridium, Dorea, Coprococcus, Enterococcus, Fructobacillus, Weissella, Pediococcus, Corynebacter, Sutterella, Legionella, Treponema, Roseburia, Filifactor, Eubacterium, Streptococcus, Lactobacillus, Mycoplasma, Bacteroides, Flaviivola, Flavobacterium, Sphaerochaeta, Azospirillum, Gluconacetobacter, Neisseria, Roseburia, Parvibaculum, Staphylococcus, Nitratifractor, Mycoplasma, Alicyclobacillus, Brevibacillus, Bacillus, Bacteroidetes, Brevibacillus, Carnobacterium, Clostridiaridium, Clostridium, Desulfonatronum, Desulfovibrio, Helcococcus, Leptotrichia, Listeria, Methanomethylophilus, Methylobacterium, Opitutaceae, Paludibacter, Rhodobacter, Sphaerochaeta, Tuberibacillus, and Campylobacter. Species of organism of such a genus can be as otherwise herein discussed. Suitable nucleic acid-guided nucleases can be from an organism from a genus or unclassified genus within a kingdom, which includes but is not limited to Firmicutes, Actinobacteria, Bacteroidetes, Proteobacteria, Spirochaetes, and Tenericutes. Suitable nucleic acid-guided nucleases can be from an organism from a genus or unclassified genus within a phylum which includes but is not limited to Erysipelotrichia, Clostridia, Bacilli, Actinobacteria, Bacteroidetes, Flavobacteria, Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Deltaproteobacteria, Epsilonproteobacteria, Spirochaetes, and Mollicutes. Suitable nucleic acid-guided nucleases can be from an organism from a genus or unclassified genus within an order which includes but is not limited to Clostridiales, Lactobacillales, Actinomycetales, Bacteroidales, Flavobacteriales, Rhizobiales, Rhodospirillales, Burkholderiales, Neisseriales, Legionellales, Nautiliales, Campylobacterales, Spirochaetales, Mycoplasmatales, and Thiotrichales. Suitable nucleic acid-guided nucleases can be from an organism from a genus or unclassified genus within a family which includes but is not limited to Lachnospiraceae, Enterococcaceae, Leuconostocaceae, Lactobacillaceae, Streptococcaceae, Peptostreptococcaceae, Staphylococcaceae, Eubacteriaceae, Corynebacterineae, Bacteroidaceae, Flavobacterium, Cryomorphaceae, Rhodobiaceae, Rhodospirillaceae, Acetobacteraceae, Sutterellaceae, Neisseriaceae, Legionellaceae, Nautiliaceae, Campylobacteraceae, Spirochaetaceae, Mycoplasmataceae, Piscirickettsiaceae, and Francisellaceae.
  • Other nucleic acid-guided nucleases suitable for use in the methods, systems, and compositions of the present disclosure include those derived from an organism such as, but not limited to, Thiomicrospira sp. XS5, Eubacterium rectale, Succinivibrio dextrinosolvens, Candidatus Methanoplasma termitum, Candidatus Methanomethylophilus alvus, Porphyromonas crevioricanis, Flavobacterium branchiophilum, Acidaminococcus sp., Lachnospiraceae bacterium COE1, Prevotella brevis ATCC 19188, Smithella sp. SCADC, Moraxella bovoculi, Synergistes jonesii, Bacteroidetes oral taxon 274, Francisella tularensis, Leptospira inadai serovar Lyme str. 10, Acidaminococcus sp. crystal structure (5B43), S. mutans, S. agalactiae, S. equisimilis, S. sanguinis, S. pneumoniae; C. jejuni, C. coli; N. salsuginis, N. tergarcus; S. auricularis, S. carnosus; N. meningitidis, N. gonorrhoeae; L. monocytogenes, L. ivanovii; C. botulinum, C. difficile, C. tetani, C. sordellii; Francisella tularensis 1, Prevotella albensis, Lachnospiraceae bacterium MC 2017 1, Butyrivibrio proteoclasticus, Peregrinibacteria bacterium GW2011_GWA2_33_10, Parcubacteria bacterium GW2011_GWC2_44_17, Smithella sp. SCADC, Acidaminococcus sp. BV3L6, Lachnospiraceae bacterium MA2020, Candidatus Methanoplasma termitum, Eubacterium eligens, Moraxella bovoculi 237, Leptospira inadai, Lachnospiraceae bacterium ND2006, Porphyromonas crevioricanis 3, Prevotella disiens, Porphyromonas macacae, Catenibacterium sp. CAG:290, Kandleria vitulina, Clostridiales bacterium KA00274, Lachnospiraceae bacterium 3-2, Dorea longicatena, Coprococcus catus GD/7, Enterococcus columbae DSM 7374, Fructobacillus sp. EFB-N1, Weissella halotolerans, Pediococcus acidilactici, Lactobacillus curvatus, Streptococcus pyogenes, Lactobacillus versmoldensis, and Filifactor alocis ATCC 35896.
  • Suitable nucleases for use in any of the methods disclosed herein include, but are not limited to, nucleases having the sequences listed in Table 1, or homologues having at least 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, or 99% sequence identity to any of the nucleases listed in Table 1.
  • TABLE 1
    MAD nuclease    Amino acid sequence
    MAD1 SEQ ID NO: 1 
    MAD2 SEQ ID NO: 2 
    MAD3 SEQ ID NO: 3 
    MAD4 SEQ ID NO: 4 
    MAD5 SEQ ID NO: 5 
    MAD6 SEQ ID NO: 6 
    MAD7 SEQ ID NO: 7 
    MAD8 SEQ ID NO: 8 
    MAD9 SEQ ID NO: 9 
    MAD10 SEQ ID NO: 10
    MAD11 SEQ ID NO: 11
    MAD12 SEQ ID NO: 12
    MAD13 SEQ ID NO: 13
    MAD14 SEQ ID NO: 14
    MAD15 SEQ ID NO: 15
    MAD16 SEQ ID NO: 16
    MAD17 SEQ ID NO: 17
    MAD18 SEQ ID NO: 18
    MAD19 SEQ ID NO: 19
    MAD20 SEQ ID NO: 20
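  • The sequence identity thresholds recited for homologues of the Table 1 nucleases can be evaluated on a pairwise alignment. The following is a minimal sketch under stated assumptions: the two sequences are already aligned with "-" marking gaps, identity is counted over all aligned columns (other definitions use a different denominator), and the function name and toy sequences are hypothetical rather than taken from any SEQ ID NO.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the columns of a pairwise alignment.

    Columns where both sequences have a gap are ignored; a gap opposite
    a residue counts as a mismatch (one of several possible conventions).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


identity = percent_identity("MKT-LV", "MKTALI")  # toy aligned fragments
print(f"{identity:.1f}% identity")
print("meets 50% threshold:", identity >= 50.0)
```

  • Producing the alignment itself is outside this sketch; any standard pairwise alignment tool could supply the aligned strings.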
  • In some methods disclosed herein, Argonaute (Ago) systems can be used to cleave target nucleic acid sequences. An Ago protein can be derived from a prokaryote, eukaryote, or archaeon. The target nucleic acid may be RNA or DNA. A DNA target may be single stranded or double stranded. In some examples, the target nucleic acid does not require a specific target flanking sequence, such as a sequence equivalent to a protospacer adjacent motif or protospacer flanking sequence. The Ago protein may create a double strand break or single strand break. In some examples, when an Ago protein forms a single strand break, two Ago proteins may be used in combination to generate a double strand break. In some examples, an Ago protein comprises one, two, or more nuclease domains. In some examples, an Ago protein comprises one, two, or more catalytic domains. One or more nuclease or catalytic domains may be mutated in the Ago protein, thereby generating a nickase protein capable of generating single strand breaks. In other examples, mutations in one or more nuclease or catalytic domains of an Ago protein generate a catalytically dead Ago protein that can bind but not cleave a target nucleic acid.
  • Ago proteins can be targeted to target nucleic acid sequences by a guiding nucleic acid. In many examples, the guiding nucleic acid is a guide DNA (gDNA). The gDNA can have a 5′ phosphorylated end. The gDNA can be single stranded or double stranded. Single stranded gDNA can be 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, or 50 nucleotides in length. In some examples, the gDNA can be less than 10 nucleotides in length. In some examples, the gDNA can be more than 50 nucleotides in length.
  • Argonaute-mediated cleavage can generate blunt ends, 5′ overhangs, or 3′ overhangs. In some examples, one or more nucleotides are removed from the target site during or following cleavage.
  • Argonaute protein can be endogenously or recombinantly expressed within a cell. Argonaute can be encoded on a chromosome, extrachromosomally, or on a plasmid, synthetic chromosome, or artificial chromosome. Additionally or alternatively, an Argonaute protein can be provided or delivered to the cell as a polypeptide or mRNA encoding the polypeptide. In such examples, polypeptide or mRNA can be delivered through standard mechanisms known in the art, such as through the use of cell permeable peptides, nanoparticles, or viral particles.
  • Guide DNAs can be provided by genetic or episomal DNA within a cell. In some examples, gDNA are reverse transcribed from RNA or mRNA within a cell. In some examples, gDNAs can be provided or delivered to a cell expressing an Ago protein. Guide DNAs can be provided or delivered concomitantly with an Ago protein or sequentially. Guide DNAs can be chemically synthesized, assembled, or otherwise generated using standard DNA generation techniques known in the art. Guide DNAs can be cleaved, released, or otherwise derived from genomic DNA, episomal DNA molecules, isolated nucleic acid molecules, or any other source of nucleic acid molecules.
  • In some instances, compositions are provided comprising a nuclease such as a nucleic acid-guided nuclease (e.g., Cas9, Cpf1, MAD2, or MAD7) or a DNA-guided nuclease (e.g., Ago), linked to a chromatin-remodeling enzyme. Without wishing to be bound by theory, a nuclease fusion protein as described herein may provide improved accessibility to regions of highly-structured DNA. Non-limiting examples of chromatin-remodeling enzymes that can be linked to a nucleic acid-guided nuclease may include: histone acetyl transferases (HATs), histone deacetylases (HDACs), histone methyltransferases (HMTs), chromatin remodeling complexes, and transcription activator-like (Tal) effector proteins. Histone deacetylases may include HDAC1, HDAC2, HDAC3, HDAC4, HDAC5, HDAC6, HDAC7, HDAC8, HDAC9, HDAC10, HDAC11, sirtuin 1, sirtuin 2, sirtuin 3, sirtuin 4, sirtuin 5, sirtuin 6, and sirtuin 7. Histone acetyl transferases may include GCN5, PCAF, Hat1, Elp3, Hpa2, Hpa3, ATF-2, Nut1, Esa1, Sas2, Sas3, Tip60, MOF, MOZ, MORF, HBO1, p300, CBP, SRC-1, ACTR, TIF-2, SRC-3, TAFII250, TFIIIC, Rtt109, and CLOCK. Histone methyltransferases may include ASH1L, DOT1L, EHMT1, EHMT2, EZH1, EZH2, MLL, MLL2, MLL3, MLL4, MLL5, NSD1, PRDM2, SET, SETBP1, SETD1A, SETD1B, SETD2, SETD3, SETD4, SETD5, SETD6, SETD7, SETD8, SETD9, SETDB1, SETDB2, SETMAR, SMYD1, SMYD2, SMYD3, SMYD4, SMYD5, SUV39H1, SUV39H2, SUV420H1, and SUV420H2. Chromatin-remodeling complexes may include SWI/SNF, ISWI, NuRD/Mi-2/CHD, INO80 and SWR1.
  • In some instances, the nuclease is a wild-type nuclease. In other instances, the nuclease is a chimeric engineered nuclease. Chimeric engineered nucleases as disclosed herein can comprise one or more fragments or domains, and the fragments or domains can be of a nuclease, such as nucleic acid-guided nuclease, orthologs of organisms of genera, species, or other phylogenetic groups disclosed herein; advantageously the fragments are from nuclease orthologs of different species. A chimeric engineered nuclease can be comprised of fragments or domains from at least two different nucleases. A chimeric engineered nuclease can be comprised of fragments or domains from at least two different species. A chimeric engineered nuclease can be comprised of fragments or domains from at least 2, 3, 4, 5, 6, 7, 8, 9, 10, or more different nucleases or different species. In some cases, a chimeric engineered nuclease comprises more than one fragment or domain from one nuclease or species, wherein the more than one fragment or domain are separated by fragments or domains from a second nuclease or species. In some examples, a chimeric engineered nuclease comprises 2 fragments, each from a different protein or nuclease. In some examples, a chimeric engineered nuclease comprises 3 fragments, each from a different protein or nuclease. In some examples, a chimeric engineered nuclease comprises 4 fragments, each from a different protein or nuclease. In some examples, a chimeric engineered nuclease comprises 5 fragments, each from a different protein or nuclease.
  • Nuclease fusion proteins can be recombinantly expressed within a cell. A nuclease fusion protein can be encoded on a chromosome, extrachromosomally, or on a plasmid, synthetic chromosome, or artificial chromosome. A nuclease and a chromatin-remodeling enzyme may be engineered separately, and then covalently linked, prior to delivery to a cell. A nuclease fusion protein can be provided or delivered to the cell as a polypeptide or mRNA encoding the polypeptide. In such examples, polypeptide or mRNA can be delivered through standard mechanisms known in the art, such as through the use of cell permeable peptides, nanoparticles, or viral particles.
  • Cell-Cycle-Dependent Expression of Targeted Nucleases.
  • In some instances, compositions comprising a cell-cycle-dependent nuclease are provided. A cell-cycle dependent nuclease generally includes a targeted nuclease as described herein linked to an enzyme that leads to degradation of the targeted nuclease during G1 phase of the cell cycle, and expression of the targeted nuclease during G2/M phase of the cell cycle. Such cell-cycle dependent expression may, for example, bias the expression of the nuclease in cells where homology-directed repair (HDR) is most active (e.g., during G2/M phase). In some cases, the nuclease is covalently linked to a cell-cycle regulated protein such as one that is actively degraded during G1 phase of the cell cycle and is actively expressed during G2/M phase of the cell cycle. In a non-limiting example, the cell-cycle regulated protein is Geminin. Other non-limiting examples of cell-cycle regulated proteins may include: Cyclin A, Cyclin B, Hsl1, Cdc6, Fin1, p21 and Skp2.
  • In some instances, the nuclease is a wild-type nuclease.
  • In other instances, the nuclease is an engineered nuclease. Engineered nucleases can be non-naturally occurring.
  • Non-naturally occurring targetable nucleases and non-naturally occurring targetable nuclease systems can address many of these challenges and limitations.
  • Disclosed herein are non-naturally occurring targetable nuclease systems. Such targetable nuclease systems are engineered to address one or more of the challenges described above and can be referred to as engineered nuclease systems. Engineered nuclease systems can comprise one or more of an engineered nuclease, such as an engineered nucleic acid-guided nuclease, an engineered guide nucleic acid, an engineered polynucleotide encoding said nuclease, or an engineered polynucleotide encoding said guide nucleic acid. Engineered nucleases, engineered guide nucleic acids, and engineered polynucleotides encoding the engineered nuclease or engineered guide nucleic acid are not naturally occurring and are not found in nature. It follows that engineered nuclease systems including one or more of these elements are non-naturally occurring.
  • Non-limiting examples of types of engineering that can be done to obtain a non-naturally occurring nuclease system are as follows. Engineering can include codon optimization to facilitate expression or improve expression in a host cell, such as a heterologous host cell. Engineering can reduce the size or molecular weight of the nuclease in order to facilitate expression or delivery. Engineering can alter PAM selection in order to change PAM specificity or to broaden the range of recognized PAMs. Engineering can alter, increase, or decrease stability, processivity, specificity, or efficiency of a targetable nuclease system. Engineering can alter, increase, or decrease protein stability. Engineering can alter, increase, or decrease processivity of nucleic acid scanning. Engineering can alter, increase, or decrease target sequence specificity. Engineering can alter, increase, or decrease nuclease activity. Engineering can alter, increase, or decrease editing efficiency. Engineering can alter, increase, or decrease transformation efficiency. Engineering can alter, increase, or decrease nuclease or guide nucleic acid expression.
  • Examples of non-naturally occurring nucleic acid sequences which are disclosed herein include sequences codon optimized for expression in bacteria, such as E. coli (e.g., SEQ ID NO: 41-60), sequences codon optimized for expression in single cell eukaryotes, such as yeast (e.g., SEQ ID NO: 127-146), sequences codon optimized for expression in multi cell eukaryotes, such as human cells (e.g., SEQ ID NO: 147-166), polynucleotides used for cloning or expression of any sequences disclosed herein (e.g., SEQ ID NO: 61-80), plasmids comprising nucleic acid sequences (e.g., SEQ ID NO: 21-40) operably linked to a heterologous promoter or nuclear localization signal or other heterologous element, proteins generated from engineered or codon optimized nucleic acid sequences (e.g., SEQ ID NO: 1-20), or engineered guide nucleic acids comprising any one of SEQ ID NO: 84-107. Such non-naturally occurring nucleic acid sequences can be amplified, cloned, assembled, synthesized, generated from synthesized oligonucleotides or dNTPs, or otherwise obtained using methods known by those skilled in the art.
  • Additional examples of non-naturally occurring nucleic acid sequences which are disclosed herein include sequences codon optimized for expression in bacteria, such as E. coli (e.g., SEQ ID NO: 168), sequences codon optimized for expression in single cell eukaryotes, such as yeast (e.g., SEQ ID NO: 169), sequences codon optimized for expression in multi cell eukaryotes, such as human cells (e.g., SEQ ID NO: 170), polynucleotides used for cloning or expression of any sequences disclosed herein (e.g., SEQ ID NO: 171), plasmids comprising nucleic acid sequences (e.g., SEQ ID NO: 167) operably linked to a heterologous promoter or nuclear localization signal or other heterologous element, proteins generated from engineered or codon optimized nucleic acid sequences (e.g., SEQ ID NO: 108-110), or engineered guide nucleic acids compatible with any targetable nuclease disclosed herein. Such non-naturally occurring nucleic acid sequences can be amplified, cloned, assembled, synthesized, generated from synthesized oligonucleotides or dNTPs, or otherwise obtained using methods known by those skilled in the art.
  • A guide nucleic acid can be DNA. A guide nucleic acid can be RNA. A guide nucleic acid can comprise both DNA and RNA. A guide nucleic acid can comprise modified or non-naturally occurring nucleotides. In cases where the guide nucleic acid comprises RNA, the RNA guide nucleic acid can be encoded by a DNA sequence on a polynucleotide molecule such as a plasmid, linear construct, or editing cassette as disclosed herein.
  • Nucleic acid-guided nucleases can be compatible with guide nucleic acids that are not found within the nuclease's endogenous host. Such orthogonal guide nucleic acids can be determined by empirical testing. Orthogonal guide nucleic acids can come from different bacterial species or be synthetic or otherwise engineered to be non-naturally occurring.
  • Orthogonal guide nucleic acids that are compatible with a common nucleic acid-guided nuclease can comprise one or more common features. Common features can include sequence outside a pseudoknot region. Common features can include a pseudoknot region (e.g., 172-181). Common features can include a primary sequence or secondary structure.
  • A guide nucleic acid can be engineered to target a desired target sequence by altering the guide sequence such that the guide sequence is complementary to the target sequence, thereby allowing hybridization between the guide sequence and the target sequence. A guide nucleic acid with an engineered guide sequence can be referred to as an engineered guide nucleic acid. Engineered guide nucleic acids are often non-naturally occurring and are not found in nature.
  • In other instances, the nuclease is a chimeric nuclease. Chimeric nucleases can be engineered nucleases. Chimeric nucleases as disclosed herein can comprise one or more fragments or domains, and the fragments or domains can be of a nuclease, such as nucleic acid-guided nuclease, orthologs of organisms of genera, species, or other phylogenetic groups; advantageously the fragments are from nuclease orthologs of different species. A chimeric nuclease can be comprised of fragments or domains from at least two different nucleases. A chimeric nuclease can be comprised of fragments or domains from at least two different species. A chimeric nuclease can be comprised of fragments or domains from at least 2, 3, 4, 5, 6, 7, 8, 9, 10, or more different nucleases or different species. In some cases, a chimeric nuclease comprises more than one fragment or domain from one nuclease or species, wherein the more than one fragment or domain are separated by fragments or domains from a second nuclease or species. In some examples, a chimeric nuclease comprises 2 fragments, each from a different protein or nuclease. In some examples, a chimeric nuclease comprises 3 fragments, each from a different protein or nuclease. In some examples, a chimeric nuclease comprises 4 fragments, each from a different protein or nuclease. In some examples, a chimeric nuclease comprises 5 fragments, each from a different protein or nuclease.
  • EXAMPLES Example 1— CREATE-Plasmids and Libraries
  • FIGS. 1A-C depict an example of an overview of CRISPR EnAbled Trackable genome Engineering (CREATE) design and workflow. FIG. 1A shows an example of the CREATE methodology which allows programmatic genome modifications to be focused on key amino acid residues or promoter targets across the genome. Such libraries thus enable systematic assessment of sequence/activity relationships for a wide variety of genomic targets in parallel. FIG. 1B depicts an example of CREATE cassettes designed to encode both homology arm (HA) and guide RNA (gRNA) sequences to target a specific locus in the E. coli genome. The 100 bp homology arm was designed to introduce a specific codon mutation (target codon) that can be selectively enriched by a synonymous PAM mutation to rescue the sequence from Cas9 cleavage and allow highly efficient mutagenesis. The P1 and P2 sites (black) serve as general priming sites allowing multiplexed amplification, cloning and sequencing of many libraries in parallel. The promoter (J23119, green) is a constitutive promoter that drives expression of the gRNA. A detailed example of the HA design for introducing a stop codon at residue 145 in the galK locus is also depicted at the bottom of FIG. 1B. The top sequence shows the wildtype genome sequence with the PAM (CCG; the reverse complement of which is CGG, which is recognized by S. pyogenes Cas9) and target codon (TAT, encoding Y) highlighted. The HA design introduces a “silent scar” at the PAM site (CgG, the reverse complement of which is CCG, which is not recognized by S. pyogenes Cas9) and a single nucleotide TAT>TAA mutation at codon 145 (resulting in a STOP). This design strategy was implemented programmatically for coding regions across the genome. FIG. 1C depicts an overview of an example CREATE workflow. CREATE cassettes are synthesized on a microarray and delivered as large oligo pools (10⁴ to 10⁶ individual library members). Parallel cloning and recombineering allowed processing of these pools into genomic libraries, in some cases in 2-3 days. Deep sequencing of the CREATE plasmids can be used to track the fitness of thousands of precision mutations genome wide following selection or screening of the mutant libraries.
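  • The PAM logic of the galK example above can be checked with a short script. This is a minimal sketch, assuming the S. pyogenes Cas9 PAM is written 5'-NGG-3' on the strand being read; the helper names (`reverse_complement`, `is_spcas9_pam`, `is_stop_codon`) are hypothetical and are not part of the disclosed design software.

```python
# Illustrative helpers only; names are hypothetical, not from the source.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")
_STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def is_spcas9_pam(triplet: str) -> bool:
    """True if a 3-nt sequence read 5'->3' matches the NGG pattern."""
    t = triplet.upper()
    return len(t) == 3 and t[0] in "ACGT" and t[1:] == "GG"


def is_stop_codon(codon: str) -> bool:
    """True if the codon is one of the three standard stop codons."""
    return codon.upper() in _STOP_CODONS


wild_type_site = "CCG"  # PAM site as written in the wildtype sequence
scar_site = "CgG"       # same site after the silent scar is introduced

print(is_spcas9_pam(reverse_complement(wild_type_site)))  # CGG matches NGG
print(is_spcas9_pam(reverse_complement(scar_site)))       # CCG does not
print(is_stop_codon("TAT"), is_stop_codon("TAA"))         # target codon before/after
```

  • The wildtype site reads as CGG on the opposite strand and is therefore cleavable, while the scarred site reads as CCG and is not, which is what allows edited genomes to escape Cas9 cleavage.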
  • Example 2— CREATE Plasmid Validation
  • FIG. 2A-D depicts an example of the effect of Cas9 activity on transformation and editing efficiencies. The galK 120/17 CREATE cassette (120 bp HA and 17 bp PAM/codon spacing) targeting codon 145 in the galK gene or a control non-targeting gRNA vector was transformed into cells carrying pSIM5 along with dCas9 (e.g. left set of bars in FIG. 2A) or Cas9 (e.g. right set of bars in FIG. 2A) plasmids. The pSIM5 plasmid carries lambda red recombination machinery. The cas9 gene was cloned into the pBTBX-2 backbone under the control of a pBAD promoter to allow control of the cleavage activity by addition of arabinose. Transformation efficiencies of each vector are shown with dark grey bars. The total number of recombinant cells (light grey bars) was calculated based on red/white colony screening on MacConkey agar. In cases where white colonies were undetectable by plate based screening we assumed 10⁻⁴ editing efficiencies. A 10²-fold reduction in transformation efficiency compared to the non-targeting gRNA control was also observed for CREATE cassettes transformed into the Cas9 background.
  • FIG. 2B depicts an example of the characterization of CREATE cassette HA length and PAM/codon spacing on editing efficiency. All cassettes were designed to introduce a TAA stop at codon 145 in the gene using PAMs at the indicated distance (PAM/codon, bottom) from the target codon and variable homology arm lengths (HA, bottom). Dark grey and light grey bars correspond to uninduced or induced expression of Cas9 under the pBAD promoter using 0.2% arabinose. In the majority of cases the editing efficiency appears to be unaffected by induction, suggesting that low amounts of Cas9 due to leaky expression are sufficient for high efficiency editing.
  • FIG. 2C shows example data from sequencing of the genomic loci from CREATE recombineering reactions. The galK cassettes from FIG. 2B are labeled according to the HA length and PAM/codon spacing. The other loci shown were cassettes isolated from multiplexed library cloning reactions. The bar plot (FIG. 2C) indicates the number of times each genotype was observed by genomic colony sequencing following recombineering with each CREATE cassette. The + and − labels at the bottom indicate the presence or absence of the designed mutation at the two relevant sites in each clone. The circular inset indicates the relative position of each gene on the E. coli genome.
  • FIG. 2D depicts an example of library coverage from multiplexed cloning of CREATE plasmids. Deep sequencing counts of each variant are shown with respect to their position on the genome. The inset shows a histogram of these plasmid counts for the entire library. The distribution follows the expected Poisson distribution for low average counts.
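  • The Poisson behavior noted for low average counts can be illustrated numerically. The following is a minimal sketch, assuming every variant's read count is Poisson-distributed with one common mean (an idealization); the function names and the mean values used are hypothetical and are not taken from the disclosed data.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """Probability of observing exactly k counts under Poisson(mean)."""
    if k < 0 or mean < 0:
        raise ValueError("k and mean must be non-negative")
    return math.exp(-mean) * mean ** k / math.factorial(k)


def expected_dropout_fraction(mean_count: float) -> float:
    """Expected fraction of library members with zero sequencing counts."""
    return poisson_pmf(0, mean_count)


# Hypothetical average counts per variant, for illustration only.
for mean_count in (1.0, 3.0, 10.0):
    missing = expected_dropout_fraction(mean_count)
    print(f"mean={mean_count:>4}: expected fraction unobserved = {missing:.4f}")
```

  • Under this idealized model, raising the average count per variant sharply lowers the fraction of designs expected to be missing from the sequenced library.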
  • Example 3— CREATE-Recording Used to Engineer Trackable Episomal DNA Libraries
  • FIG. 3A depicts an example of an overview of the method used to generate a trackable episomal DNA library. Transformation of a CREATE recorder plasmid generates modifications of the target DNA at two sites. One edit occurs to the desired target gene (gray) introducing a codon or promoter mutation designed to test specific engineering objectives. The second edit targets a functionally neutral site and introduces a 15 nucleotide barcode (BC, black). By virtue of coupling these libraries on a single CREATE plasmid the target DNA is edited at both sites and each unique barcode can be used to track edits throughout the rest of the plasmid.
  • FIG. 3B depicts an example of the CREATE barcode design. A degenerate library is constructed from overlapping oligos and cloned in a separate site of the CREATE vector to make a library of CREATE recorder cassettes that can be coupled to the designer editing libraries.
  • FIG. 3C depicts an exemplary CREATE record mapping strategy. Deep sequencing of both the target DNA (left) and CREATE plasmids allows a simple sequence mapping strategy by allowing each editing cassette to be uniquely assigned by the barcode sequence. This allows the relative fitness of each barcode (and thus edit) to be tracked during selection or screening processes, and the barcoded target DNA can be shuttled between different organisms using standard vectors.
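  • The mapping strategy described above can be sketched in a few lines. This is an illustrative example only, assuming that plasmid sequencing has already been reduced to (barcode, edit) pairs and that target sequencing has been reduced to a list of barcodes; the function names and the toy barcodes are hypothetical.

```python
from collections import Counter

BARCODE_LENGTH = 15  # barcode length stated in the text


def max_barcode_diversity(length: int = BARCODE_LENGTH) -> int:
    """Number of distinct fully degenerate barcodes of the given length."""
    return 4 ** length


def build_barcode_map(plasmid_pairs):
    """Map barcode -> edit, dropping barcodes seen with more than one edit."""
    seen = {}
    ambiguous = set()
    for barcode, edit in plasmid_pairs:
        if barcode in seen and seen[barcode] != edit:
            ambiguous.add(barcode)
        seen.setdefault(barcode, edit)
    return {bc: e for bc, e in seen.items() if bc not in ambiguous}


def count_edits(target_barcodes, barcode_map):
    """Count edits inferred from barcodes observed in the target DNA."""
    return Counter(barcode_map[bc] for bc in target_barcodes if bc in barcode_map)


# Toy data for illustration only.
pairs = [("A" * 15, "edit_1"), ("C" * 15, "edit_2"),
         ("G" * 15, "edit_3"), ("G" * 15, "edit_4")]
barcode_map = build_barcode_map(pairs)
print(max_barcode_diversity())
print(count_edits(["A" * 15, "A" * 15, "T" * 15], barcode_map))
```

  • Discarding barcodes that map to more than one edit is a design choice made here for simplicity; it keeps every retained barcode a unique identifier of its editing cassette.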
  • Example 4— CREATE-Mediated Editing of Episomal DNA
  • Methods and compositions disclosed herein were used to mutate a key residue of the cas9 gene used for the CREATE process (e.g. FIG. 4A-4B). A cassette was designed to make an R1335K mutation in the Cas9 protein. This cassette was cloned into a CREATE plasmid and transformed into MG1655 E. coli carrying the pSIM5 and X2-Cas9 vectors. The pSIM5 vector comprises lambda red recombination machinery. The X2-Cas9 vector comprises an arabinose-inducible Cas9 expression cassette. Following three hours of recovery in LB supplemented with 0.4% arabinose to induce Cas9 expression, the cells were plated on agar containing antibiotics that maintain selective pressure for replication of both the X2-Cas9 and CREATE plasmids. Colony PCR of random clones revealed the designed edits from the CREATE plasmid were efficiently transferred into the X2-Cas9 plasmid (e.g. FIG. 4B). Of the clones that were sequenced, 100% contained the silent PAM mutation in X2-Cas9 and 6/14 (43%) also contained the desired coding edit. This is the first demonstration that plasmid based editing using CREATE is robust despite higher copy numbers associated with the plasmid target as compared to previous genome engineering efforts.
  • Example 5—CREATE-Mediated Editing and Tracking of E. coli Genome-Double Cassette
  • To test the performance of the recording strategy in a genomic context, we tested the ability to edit two distal genomic loci in the E. coli genome (e.g. FIG. 5A). To do so, we cloned CREATE recording cassette libraries designed to embed the 15 nucleotide barcodes into the galK locus. After cloning, we isolated a few unique barcodes and cloned a second editing cassette designed to incorporate an F153R mutation in the dihydrofolate reductase (DHFR)/folA gene that was identified by our previous CREATE studies as conferring tolerance to the antibiotic trimethoprim. Genotyping of E. coli strains following transformation of the dual CREATE recording vector according to previously described protocols yielded the data in FIG. 5A. The efficiency of barcoding (100%) was higher than that of the target genome edit (80-90%), ensuring that edited genomes can be tracked. Of the transformed population, we observed that >80% of colonies contained the barcode edit in the galK locus as determined by red/white colony screening (e.g. FIG. 5B). Among the barcoded colonies, we found that 85% also encoded the DHFR F153R mutation, indicating strong coupling between the barcode and codon edits. FIG. 5B depicts the total number of colonies (CFUs) in duplicate experiments that are edited and/or barcoded. The edited CFU numbers were calculated by extrapolation of the data in FIG. 5A to the total number of CFUs on the plate. The barcoded CFU numbers were calculated by counting the number of white colonies in a galK screening (the site in which the barcode is integrated). These data show that the majority of barcoded colonies contained the designed genomic edit.
  • Example 6—Plasmid Curing for Combinatorial Engineering
  • FIG. 6 depicts an example of combinatorial genome engineering and tracking. Three recursive CREATE plasmids are used, each with a gRNA targeting one of the other markers in this series (indicated by T-lines). During each transformation, an edit and barcode are incorporated into the genome and the previous CREATE plasmid is cured. In this way rapid iterative transformations can be performed to construct either a defined combination of mutations or a combinatorial library to search for improved phenotypes. The recording site is compatible with short read sequencing technologies that allow the fitness of combinations to be tracked across a population. Such an approach allows rapid investigation of genetic epistasis and optimization of phenotypes relevant to basic research or for commercial biological applications.
  • FIG. 3D and FIG. 3E depict another example of combinatorial genome engineering. With each round of engineering, an editing cassette (blue rectangle in FIG. 3D) is incorporated into the target sequence in the genome (blue star) and a recorder cassette (green rectangle in FIG. 3D) is incorporated into a different target sequence of the genome (green dash in middle panel of FIG. 3D). In this example, each recorder sequence comprises a 15 nucleotide barcode. As shown in the right panel of FIG. 3D, the recorder sequences are each inserted adjacent to the last recorder sequence, regardless of where the editing cassette was inserted. Each recorder cassette can simultaneously delete a PAM site. After completion of each round of engineering, the engineered cells can be selected and then the inserted mutations can be tracked by sequencing the recorder region that comprises all of the inserted recorder cassettes. By sequencing the starting plasmid library, each editing cassette can be linked or associated with one or more unique barcodes within the recorder cassette. Since each recorder cassette corresponds to the associated editing cassette, the mutations incorporated by the editing cassettes can be tracked or identified by the sequence of the recorder cassette, or the sequence of the barcodes within the recorder cassette. As is demonstrated in FIG. 3E, by sequencing all of the recorder cassettes or barcodes within the recorder cassettes, each of the inserted mutations can be identified and tracked. The inserted recorder sequences can be referred to as a recorder site, recorder array, or barcode array. As a result, after recursive rounds of engineering, sequencing the barcode array or recorder site allows tracking of the history of genomic editing events in the strain. When the recorder cassettes are inserted in order as depicted, for example, in FIG. 3D, then the barcode array or recorder site can identify the order in which the mutations were inserted as well as what the mutation is.
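  • Reading an editing history from a barcode array can be sketched as follows. This is a simplified illustration that assumes the sequenced recorder site is a plain concatenation of 15 nucleotide barcodes listed in order of insertion, with no intervening sequence; real recorder cassettes may carry additional elements. The function name and toy barcodes are hypothetical.

```python
def decode_recorder_array(array_seq, barcode_to_edit, barcode_length=15):
    """Split a barcode array into barcodes and look up the edit for each.

    Assumes the array is a gap-free concatenation of fixed-length barcodes,
    ordered by round of engineering (an assumption made for illustration).
    """
    if len(array_seq) % barcode_length != 0:
        raise ValueError("array length is not a multiple of the barcode length")
    history = []
    for start in range(0, len(array_seq), barcode_length):
        barcode = array_seq[start:start + barcode_length]
        history.append(barcode_to_edit.get(barcode, "unknown"))
    return history


# Toy lookup table, as would be derived from sequencing the plasmid library.
lookup = {"A" * 15: "round_edit_X", "C" * 15: "round_edit_Y"}
print(decode_recorder_array("A" * 15 + "C" * 15, lookup))
```

  • Because the barcodes are read in array order, the returned list gives both the identity of each mutation and the round in which it was introduced, under the ordering assumption stated above.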
  • Example 7— Recursive Engineering Using Iterative CREATE-Recording Engineering Events
  • The example of recursive engineering depicted in FIG. 7A was used for plasmid curing to demonstrate that the design is extremely efficient at eliminating previous vectors (FIG. 7B). Each CREATE plasmid can be positively selected for based on the indicated antibiotics (Trimeth: trimethoprim, Carb: carbenicillin, Tet: tetracycline) and contains a gRNA targeting one of the other antibiotic markers. For example, the reCREATE1 plasmid can be selected for on carbenicillin and encodes a gRNA that will selectively target the trimethoprim resistance gene for destruction. One pass through the carb/tetracycline/trimethoprim antibiotic marker series allows selective incorporation of up to three targeted edits. The recording function would be implemented as illustrated in FIG. 5, but is omitted here for simplicity.
  • FIG. 7B depicts an example of data from iterative rounds of CREATE engineering. A serial transformation series began with cells transformed with X2cas9 (kan) and the reCREATE1 vector. The spot plating results indicate that curing is 99.99% effective at each transformation step, ensuring highly efficient engineering in each round of transformation. Simultaneous genome editing and plasmid curing in each transformation step with high efficiencies was achieved by introducing the requisite recording and editing CREATE cassettes into recursive vectors as disclosed herein (e.g. FIG. 7B).
  • Example 8—CREATE Design and Workflow
  • An example overview of CRISPR EnAbled Trackable genome Engineering (CREATE) design workflow is depicted in FIGS. 8A-8B. FIG. 8A shows example anatomy of a CREATE cassette designed for protein engineering. Cassettes encode a spacer (red) along with part of a guide RNA (gRNA) sequence and a designer homology arm (HA) that can template homologous recombination at the genomic cut site. For protein engineering purposes the HA is designed to systematically couple mutations at a specified codon or target site (TS, blue) to a nearby synonymous PAM mutation (SPM, red) to rescue the sequence from Cas9 cleavage and allow highly efficient mutagenesis. The priming sites (P1 and P2, black) are designed to allow multiplexed amplification and cloning of specific subpools from massively parallel array based synthesis. A constitutive promoter (green) drives expression of the gRNA. FIG. 8A further shows a detailed example of HA design for introducing a stop codon at residue 145 in the galK locus. The top sequence is that of the wt genome with the PAM and TS codon highlighted. The translation sequences are shown to illustrate that the resulting mutant contains a single nonsynonymous mutation at the target site. FIG. 8B shows an example overview of the CREATE workflow. CREATE oligos are synthesized on a microarray and delivered as large pools (10⁴-10⁶ individual library members). These cassettes are amplified and cloned in multiplex with the ability to subpool designs. After introduction of the CREATE plasmids into cells expressing Cas9, mutations are transferred to the genome with high efficiencies. Measurement of the frequency of each plasmid before (f_i,t1) and after selection (f_i,t2) by deep sequencing provides enrichment scores (E_i) for each CREATE cassette. These scores allow rapid identification of adaptive variants at up to single nucleotide or amino acid resolution for thousands of loci in parallel.
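  • The enrichment calculation described above can be sketched as follows. The text does not give the exact formula, so this example assumes the score is the log2 ratio of a cassette's frequency after selection to its frequency before selection; the pseudocount used to avoid division by zero, the function name, and the toy counts are likewise assumptions made for illustration.

```python
import math


def enrichment_scores(counts_before, counts_after, pseudocount=0.5):
    """Return log2(f_after / f_before) for each cassette.

    counts_before / counts_after: dicts mapping cassette name -> read count.
    The log2-ratio form and the pseudocount are assumptions, not the
    disclosed method.
    """
    cassettes = set(counts_before) | set(counts_after)
    n = len(cassettes)
    total_before = sum(counts_before.values()) + pseudocount * n
    total_after = sum(counts_after.values()) + pseudocount * n
    scores = {}
    for cassette in cassettes:
        f_before = (counts_before.get(cassette, 0) + pseudocount) / total_before
        f_after = (counts_after.get(cassette, 0) + pseudocount) / total_after
        scores[cassette] = math.log2(f_after / f_before)
    return scores


# Toy read counts for illustration only.
before = {"cassette_A": 50, "cassette_B": 50}
after = {"cassette_A": 80, "cassette_B": 20}
for name, score in sorted(enrichment_scores(before, after).items()):
    print(f"{name}: {score:+.2f}")
```

  • Positive scores indicate cassettes whose plasmid frequency rose during selection, and negative scores indicate cassettes that were depleted.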
  • Example 9—CREATE Design Validation
  • FIG. 9A depicts an example in which the effects of Cas9 activity on transformation and editing efficiencies were measured using a cassette with a spacer and 120 bp HA targeted to galK (galK_Y145*_120/17). The total transformants (TT) produced by this CREATE vector are shown in white and the total number of recombinants (TR) in dark blue. TR is calculated as the product of the editing efficiency and TT. Asterisks indicate experiments in which recombinants could not be observed by plate based screening. FIG. 9B shows an example of characterization of CREATE cassette HA length and PAM/codon spacing on editing efficiency. All cassettes were designed to introduce a TAA stop at codon 145 in the gene using PAMs at the indicated distance (PAM/codon, bottom) from the target codon and variable homology arm lengths (HA, bottom). White and blue bars correspond to uninduced or induced expression of Cas9 under the pBAD promoter using 0.2% arabinose. In the majority of cases the editing efficiency appears to be unaffected by induction, suggesting that low amounts of Cas9 due to leaky expression are sufficient for high efficiency editing. FIG. 9C depicts an example of determination of editing efficiency for oligo derived cassettes by sequencing of the genomic loci. The galK_Y145*_120/17 cassette from FIGS. 9A and 9B is shown in white for reference. The bar plot indicates the number of times each genotype was observed by genomic colony sequencing following recombineering with each CREATE cassette. The circular inset indicates the relative position of each gene on the E. coli genome. FIG. 9D depicts that the distance between the SPM and the TS (as exemplified in FIG. 8A) is strongly correlated with editing efficiency (correct edits/total sequences sampled). The galK cassettes with 44 and 59 bp in FIG. 9B were omitted from this analysis. The depicted error bars are derived from N=3 independent replicates of the indicated experiment.
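  • The two quantities defined above, editing efficiency (correct edits/total sequences sampled) and total recombinants (editing efficiency multiplied by total transformants), can be written out directly. The function names and the input numbers below are hypothetical and serve only to show the arithmetic.

```python
def editing_efficiency(correct_edits: int, total_sampled: int) -> float:
    """Fraction of sampled sequences carrying the designed edit."""
    if total_sampled <= 0:
        raise ValueError("total_sampled must be positive")
    if not 0 <= correct_edits <= total_sampled:
        raise ValueError("correct_edits must lie between 0 and total_sampled")
    return correct_edits / total_sampled


def total_recombinants(efficiency: float, total_transformants: float) -> float:
    """TR as the product of editing efficiency and total transformants (TT)."""
    return efficiency * total_transformants


# Hypothetical numbers for illustration only.
efficiency = editing_efficiency(correct_edits=8, total_sampled=10)
print(efficiency)
print(total_recombinants(efficiency, total_transformants=1000))
```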
  • Example 10—Scanning Saturation Mutagenesis of an Essential Chromosomal Gene
  • FIGS. 10A-10C depict an example where CREATE was used to generate a full scanning saturation mutagenesis library of the folA gene for identification of mutations that can confer resistance to TMP. The count weighted average enrichment score from two trials of selection is plotted as a function of residue position (right). Cassettes encoding nonsynonymous mutations are shown in gray, and those encoding synonymous mutations in black. Cassettes with enrichment scores greater than 1.8 are highlighted in red and mutations that affect previously reported sites are labeled for reference. The dashed lines indicate enrichment values that are significantly different (p<0.05) from the synonymous dataset as determined by bootstrapping of the confidence intervals. These values are shown as a histogram for reference (middle). Mutations that appear to significantly impact DHFR resistance are highlighted as red spheres to the far right. FIGS. 10D-10F depict example growth analysis of wt (left), F153W (middle) and F153R (right) variants in the indicated range of TMP concentrations (shown right).
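  • A count weighted average of enrichment scores across trials can be computed as shown below. This is a minimal sketch that assumes each trial's score is weighted by the read count supporting that cassette in that trial; the text does not specify which counts were used as weights, and the function name and numbers are hypothetical.

```python
def count_weighted_average(scores, counts):
    """Average per-trial enrichment scores, weighting each by its read count.

    The choice of read counts as weights is an assumption for illustration.
    """
    if len(scores) != len(counts):
        raise ValueError("scores and counts must have the same length")
    total = sum(counts)
    if total <= 0:
        raise ValueError("total count must be positive")
    return sum(s * c for s, c in zip(scores, counts)) / total


# Hypothetical scores and counts from two trials of selection.
print(count_weighted_average(scores=[2.0, 1.0], counts=[300, 100]))
```

  • Weighting by counts lets the better-sampled trial dominate the average, which reduces the influence of scores estimated from few reads.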
  • Example 11—Reconstruction of ALE Mutation Set and Forward Engineering of Thermotolerant Genotypes
  • FIG. 11A depicts example genomic plots of enrichment scores for CREATE libraries grown at 42.2° C. in minimal media conditions. The innermost plot illustrates the counts of the plasmid library before selection with labels for the top 20 representatives. The outer ring shows the fitness of pooled library variants after growth in minimal media at elevated temperature (42.2° C.). The bars are colored according to log2 enrichment. Blue bars represent detrimental mutations, red bars represent significantly enriched mutations and gray bars indicate mutations that appear neutral in this assay. The 20 most enriched variants are labeled for reference and labels corresponding to ALE-derived variants are colored red. FIG. 11B shows a histogram of enrichment scores of all library variants (gray), ALE-derived mutants (red) and synonymous mutants (black) under 42.2° C. growth conditions. The dotted gray line indicates significant enrichment scores compared to the synonymous population. The histograms are normalized as a fraction of the total number of variants passing the counting threshold (number indicated in parentheses). Note that 231 of 251 unique nonsynonymous ALE cassettes sampled by this experiment appear to provide significant growth benefits. FIG. 11C depicts enrichment of mutations based on mutational distance from wt. Mutations that require 2 and 3 nucleotide (nt) transitions are exceedingly rare or absent in ALE approaches; however, we note that the two most enriched clones from the pooled library selection (targeting the Crp regulator) require two nucleotide substitutions and are highlighted at the far right.
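  • Mutational distance from wt, as used in FIG. 11C, can be computed per codon as the number of nucleotide positions that differ. The sketch below is illustrative only; the function name is hypothetical, and the TAT>TAA case is the single nucleotide change described for galK codon 145 in Example 1.

```python
def codon_mutational_distance(wt_codon: str, mutant_codon: str) -> int:
    """Number of nucleotide positions (0-3) at which two codons differ."""
    wt, mut = wt_codon.upper(), mutant_codon.upper()
    if len(wt) != 3 or len(mut) != 3:
        raise ValueError("codons must be exactly 3 nucleotides long")
    return sum(a != b for a, b in zip(wt, mut))


print(codon_mutational_distance("TAT", "TAA"))  # single nucleotide change
print(codon_mutational_distance("TTT", "CGT"))  # two nucleotide changes
print(codon_mutational_distance("AAA", "CCC"))  # three nucleotide changes
```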
  • Example 12—Genome Scale Mapping of Amino Acid Substitutions for the Study of Antibiotic Resistance and Tolerance
  • FIG. 12A depicts example genomic plots of enrichment (log2) of library variants in the presence of erythromycin (outer) and rifampicin (middle). The innermost plot illustrates the count distribution of the input plasmids for reference. Coloring and labeling are as in FIG. 11A-11C. FIG. 12B depicts CREATE mutation mapping at the individual amino acid level. CREATE cassettes that introduce bulky side chains to amino acids I572, S531 and L533 (red) of the RNA polymerase β subunit (rpoB) are highly enriched in the presence of rifampicin from genome wide targeting libraries. FIG. 12C depicts a zoomed in region of the MarA transcription factor bound to its cognate DNA target, shown for reference (PDB ID 1BL0). The wt Q89 residue protrudes away from the DNA binding interface due to unfavorable steric and electrostatic interactions between this side chain and the DNA. The Q89N substitution identified by selection introduces an H-donor and shortens the side chain such that productive H-bonding can occur between this residue and the DNA backbone. Such an interaction likely favors stronger DNA binding and induction of downstream resistance genes. FIG. 12D depicts enrichment plots of genome wide targeting libraries with 10 g/L acetate or 2 g/L furfural, respectively. Coloring is the same as in FIG. 11A. FIG. 12E depicts CREATE mapping at the gene level, which reveals gene-level trends. Strong enrichment of fis, metA and fadR targeting mutations in acetate suggests important roles for these genes in acetate tolerance, as depicted in FIG. 12F, same as in the furfural selections depicted in FIG. 12E.
  • Example 13—CREATE-Enabled Flexible Design Strategies
  • Illustrations of example designs compatible with the CREATE strategy are depicted in FIGS. 13A-13D. FIG. 13A shows that for protein engineering applications a silent codon approach is taken (top, see also FIG. 8A-8B). This mutation strategy allows targeted mutagenesis of key protein regions to alter features such as DNA binding, protein-protein interactions, catalysis, or allosteric regulation. A DNA binding saturation mutagenesis library designed for the global transcription factor Fis for this study is illustrated. FIG. 13B shows that for promoter mutations, PAM sites in proximity to a specified transcription start site (TSS) can be disrupted through nucleotide replacement or integration cassettes. To simplify the design procedure used in this study, consensus CAP or UP elements were designed for integration at a fixed location relative to the TSS without taking into account possible effects these mutations may have on proximal genes. FIG. 13C shows an example cassette design for mutagenizing a ribosome binding site (RBS). FIG. 13D depicts an example of a simple deletion design. Points a and b are included to illustrate distance between two sites at the gene deletion locus. In all cases cassette designs disrupt a targeted PAM to allow selective enrichment of the designed mutant.
  • Example 14—Engineering the Lycopene Pathway
  • FIGS. 14A-14B depict edits made to the DMAPP pathway in E. coli, which produces the precursor to lycopene. Edits were made to the ORFs of 11 genes. Eight edits were designed to improve activity and 3 edits were designed to reduce activity of competitive enzymes. Approximately 10,000 variants within the lycopene pathway were constructed and screened.
  • Example 15—Cas9 Editing Efficiency Controls
  • FIG. 15 depicts Cas9 editing control experiments. The CREATE galK_120/17 off cassette (relevant edits shown in red at bottom) was transformed into different backgrounds to assess the efficiency of homologous recombination between the CREATE plasmid and the target genome. Red colonies represent unedited (wt) genomic variants and white colonies represent edited variants. Transformation into cells containing only pSIM5 or pSIM5/X2 and dCas9 plasmids exhibited no detectable recombination as indicated by the lack of white colonies. In the presence of active Cas9 (X2-Cas9 far right) we observe high efficiency editing (>80%), indicating the requirements for dsDNA cleavage to achieve high efficiency editing and library coverage.
  • Example 16—Toxicity of gRNA dsDNA Cleavage in E. coli
  • FIGS. 16A-16C depict experiments testing the toxicity of generating double strand breaks in E. coli. The toxicity of a single gRNA cut in E. coli was observed in control experiments with a gRNA targeting galK (spacer sequence TTAACTTTGCGTAACAACGC (SEQ ID NO: 182)) or folA (spacer sequence GTAATTTTGTATAGAATTTA (SEQ ID NO: 183)). In the absence of a repair template we observe strong killing from the gRNA. Rescue efficiencies of 10^3-10^4 are observed upon co-transformation of a single stranded donor oligo, indicating the need for a homologous repair template to alleviate this toxicity. The toxicity of multiple CREATE edits was also tested. The targeted sites are illustrated graphically on the left and at the bottom of the bar graph. A non-targeting gRNA control was used to estimate transformation efficiency based on no edits (far left, no target sites). CREATE cassettes targeted either folA (green) or galK (red) or a combination of the two. Note the multiplicative toxicity in E. coli of having additional gRNAs expressed from the same plasmid. In this scenario there is homologous repair for each site suggesting that off-target gRNA cleavage would be highly lethal. These data suggest that off target cleavage by a CREATE cassette would be selectively removed from the population early in the library construction phase.
  • FIGS. 16D-16E depict data from another such cell survival assay. The editing cassette contained a F153R mutation, which leads to temperature sensitivity of the folA gene. The recorder cassette contained a 15 nucleotide barcode designed to disrupt the galK gene, which allows screening of colonies on MacConkey agar plates. In this example, generating two cuts decreased cell survival compared to generating zero or one cut.
  • FIG. 16F depicts data from a transformation and survival assay comparing a low copy number plasmid (Ec23) expressing Cas9 and a high copy number plasmid (MG) expressing Cas9. Different vectors with distinct editing cassettes were used to target different gene target sites (folA, lacZ, xylA, and rhaA). The recorder cassettes were designed to target different sequences within the galK gene, either site S1, S2, or S3. The recursive vector used had a different vector backbone compared to the others and is part of a 3-vector system designed for iterative engineering that cures the cell of the previous round vector. The data indicates that lower Cas9 expression (Ec23 vector) increases survival and/or transformation efficiency. The decreased Cas9 expression increased transformation efficiency by orders of magnitude in cells undergoing two genomic cuts (editing cassette and recording cassette).
  • FIG. 16G shows the correlation between editing efficiency and recording efficiency in cells transformed with the low copy number plasmid (Ec23) expressing Cas9 and the high copy number plasmid (MG) expressing Cas9. Editing and recording efficiencies were similar for high (MG) and lower (Ec23) expression of cas9. Ec23 yielded more colonies and had better survival (as shown in FIG. 16E), while maintaining a high efficiency of dual editing (editing cassette and recorder cassette incorporation).
  • Example 17—CREATE Strategy for Gene Deletion
  • FIG. 17A-D depict an example CREATE strategy for gene deletion. FIG. 17A depicts an example cassette design for deleting 100 bp from the galK ORF. The HA is designed to recombine with regions of homology with the designated spacing, with each 50 bp side of the CREATE HA designed to recombine at the designated site (blue). The PAM/spacer location (red) is proximal to one of the homology arms and is deleted during recombination, allowing selectable enrichment of the deleted segment. FIG. 17B depicts electrophoresis of chromosomal PCR amplicons from clones recombineered with this cassette. FIG. 17C depicts a design for a 700 bp deletion as in FIG. 17A. FIG. 17D depicts colony PCR of 700 bp deletion cassettes as in FIG. 17B. The asterisks in FIGS. 17B and 17D indicate colonies that appear to have the designed deletion. Note that some clones appear to have bands pertaining to both wt and deletion sizes, indicating that chromosome segregation in some of the colonies is incomplete when plated 3 hrs post recombineering.
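  • The deletion design of FIG. 17A, in which two 50 bp homology regions flank the segment to be removed, can be sketched in a few lines. This is only an illustrative sketch: the function name `deletion_homology_arm`, its parameters, and the treatment of the genome as a linear string (ignoring the circular chromosome and the gRNA portion of the cassette) are assumptions and are not taken from the source.

```python
def deletion_homology_arm(genome, del_start, del_len, arm_len=50):
    """Build a homology arm that joins the sequence flanking a deletion.

    genome    : reference sequence as a string (treated as linear here)
    del_start : 0-based index of the first deleted base
    del_len   : number of bases to delete (e.g. 100 or 700)
    arm_len   : length of each flanking homology region (50 bp per side)
    """
    del_end = del_start + del_len
    if del_start < arm_len or del_end + arm_len > len(genome):
        raise ValueError("not enough flanking sequence for the requested arms")
    left = genome[del_start - arm_len:del_start]   # upstream homology
    right = genome[del_end:del_end + arm_len]      # downstream homology
    return left + right                            # deleted bases are omitted


# Toy example: 50 A's, a 100 bp stretch of C's to delete, then 50 G's.
toy = "A" * 50 + "C" * 100 + "G" * 50
arm = deletion_homology_arm(toy, del_start=50, del_len=100)
print(len(arm))
```

  • After recombination with such an arm the intervening bases, including the targeted PAM/spacer site, are absent from the chromosome, which is what allows the deletion to be selectively enriched.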
  • Example 18—Editing Efficiency Controls by Cotransformation of gRNA and Linear dsDNA Cassettes
  • FIG. 18 depicts the effect of PAM distance on editing efficiency using linear dsDNA PCR amplicons and co-transformation with a gRNA. On the left is an illustration of the experiments, in which PCR amplicons containing a dual (TAATAA) stop codon on one side (asterisk) and a PAM mutation just downstream of the galK gene (gray box) on the other end were co-transformed with a gRNA targeting the downstream galK PAM site. The primers were designed such that the mutations were 40 nt from the end of the amplicon to ensure enough homology for recombination. Data was obtained from these experiments by red/white colony screening. A linear fit to the data is shown at the bottom. Cassettes in which only the PAM mutation is present were included as assay controls and were observed to have very low rates of GalK inactivation. These experiments were performed in a BW25113 strain of E. coli in which the mutS gene was knocked out to allow high efficiency editing with double stranded DNA templates. This approach in MG1655 did not achieve high efficiency editing due to the active mutS allele.
  • Example 19—Library Cloning Analysis and Statistics
  • FIG. 19A depicts reads from an example plasmid library following cloning, shown according to the number of total mismatches between the read and the target design sequence. The majority of plasmids are matches to the correct design. However, there are a large number of 4 base pair indel/mismatch mutants that were observed in this cloned population. FIG. 19B depicts a plot of the mutation profile for the plasmid pool as a function of cassette position. An increase in the mutation frequency is observed near the center of the homology arm (HA), indicating a small error bias in the sequencing or synthesis of this region. We suspect that this is due to the presence of sequences complementary to the spacer element in the gRNA. FIG. 19C depicts a histogram of the distances between the PAM and codon for the CREATE cassettes designed in this study. The large majority (>95%) were within the design constraints tested in FIG. 9A-9D. The small fraction that are beyond 60 bp were made in cases where there was no synonymous PAM mutation within closer proximity. FIG. 19D depicts library coverage from multiplexed cloning of CREATE plasmids. Deep sequencing counts of each variant are shown with respect to their position on the genome. The inset shows a histogram of the number of variants having the indicated plasmid counts in the cloned libraries.
  • Example 20—Precision of CREATE Cassette Tracking of Recombineered Populations
  • FIG. 20A depicts a correlation plot of CREATE cassette read frequencies in the plasmid population prior to Cas9 exposure (x-axis) and 3 hours post transformation into a Cas9 background. FIG. 20B depicts a correlation plot between replicate recombineering reactions following overnight recovery. The gray lines indicate the line of perfect correlation for reference. R^2 and p values were calculated from a linear fit to the data using the Python SciPy statistics package. A counting threshold of 5 for each replicate experiment was applied to the data to filter out noise from each data set.
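  • The thresholding and linear-fit step described for FIG. 20A-20B can be sketched as follows. The source states that SciPy was used; the sketch below instead computes R^2 by hand with only the Python standard library and omits the p value. The function name `filtered_r_squared`, the choice to keep a cassette only when both replicates have at least 5 reads, and the toy counts are assumptions made for illustration.

```python
import math


def filtered_r_squared(counts_a, counts_b, min_count=5):
    """Return (R^2, n) for paired cassette counts after a count threshold.

    A cassette is kept only if its count is >= min_count in BOTH data sets
    (an assumed reading of "a counting threshold of 5 for each replicate").
    """
    pairs = [(a, b) for a, b in zip(counts_a, counts_b)
             if a >= min_count and b >= min_count]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two cassettes above the threshold")
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    sxx = sum((a - mean_a) ** 2 for a, _ in pairs)
    syy = sum((b - mean_b) ** 2 for _, b in pairs)
    sxy = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    if sxx == 0 or syy == 0:
        raise ValueError("counts are constant; correlation is undefined")
    r = sxy / math.sqrt(sxx * syy)   # Pearson correlation coefficient
    return r * r, n                  # for a simple linear fit, R^2 = r^2


# Toy replicate counts (illustrative only, not data from the source).
rep1 = [10, 20, 30, 2]
rep2 = [20, 40, 60, 100]
r2, n_kept = filtered_r_squared(rep1, rep2)
print(r2, n_kept)
```

  • Because R^2 is unchanged by rescaling either axis, the same value is obtained whether raw read counts or read frequencies are supplied.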
  • Example 21—Growth Characteristics of folA Mutations in M9 Minimal Media
  • FIG. 21 depicts growth characteristics of folA mutations in M9 minimal media. While F153R appears to maintain normal growth characteristics the growth rate of the F153W mutation is significantly slower under these conditions, suggesting that these two amino acid substitutions at the same site have very different effects on organismal fitness presumably due to different changes invoked in the stability/dynamics of this protein.
  • Example 22—Enrichment Profiles for folA CREATE Cassettes in Minimal Media
  • FIG. 22 depicts enrichment profiles for folA CREATE cassettes in minimal media. Cassettes that encode synonymous HA are shown in black and non-synonymous cassettes in gray, the dashed lines indicate enrichment scores with p<0.05 significance compared to the synonymous population mean as estimated from a bootstrap analysis. The enrichment score observed for each mutant cassette at each position in the protein sequence is shown to the left and a histogram of these enrichment scores as a fraction of the total variants to the right. The two populations appear to be largely similar. Conserved residues that are highly deleterious are shown in blue for reference.
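  • One possible reading of the bootstrap analysis mentioned above is a percentile bootstrap on the mean enrichment of the synonymous cassettes, with scores falling outside the resulting interval flagged as significant. The source does not give the exact procedure, so the resampling scheme, the number of resamples, the function names and the toy scores below are all assumptions.

```python
import random


def bootstrap_mean_interval(scores, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean of `scores`."""
    rng = random.Random(seed)          # fixed seed for reproducibility
    n = len(scores)
    means = sorted(
        sum(rng.choices(scores, k=n)) / n   # resample with replacement
        for _ in range(n_resamples)
    )
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return means[lo_idx], means[hi_idx]


def is_outside(score, interval):
    """True if a cassette's score lies outside the bootstrap interval."""
    low, high = interval
    return score < low or score > high


# Toy synonymous enrichment scores (illustrative only).
synonymous = [-0.2, -0.1, 0.0, 0.1, 0.2]
interval = bootstrap_mean_interval(synonymous)
print(interval, is_outside(1.5, interval))
```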
  • Example 23—Validation of Newly Identified acrB Mutations for Improved Solvent and Antibiotic Tolerance
  • FIG. 23A depicts on the left a global overview of the AcrB efflux pump. Substrates enter the pump through the openings in the periplasmic space and are extruded via the AcrB/AcrA/TolC complex across the outer membrane and into the extracellular space. Library targeted residues are highlighted by blue spheres for reference and the red dot indicates the region where many of the enriched variants clustered. On the right is a blow up of the loop-helix motif abutting the central funnel where enriched mutations in isobutanol were identified (red and teal spheres), presumably affecting solute transport from the periplasmic space. Mutants targeting the T60 position (teal spheres) were also enriched in the presence of erythromycin. FIG. 23B depicts confirmation of N70D and D73L mutations for tolerance to isobutanol. The N70D mutation in particular appears to improve the final OD to a significant degree. Reconstructed strains were measured for final OD in capped 1.5 mL Eppendorf tubes following 48 hours incubation. Error bars are derived from N=3 trials and p-values derived from a one-tailed T-test. FIG. 23C depicts improved growth of the AcrB T60N mutant in inhibitory concentrations of erythromycin (200 μg/mL) and isobutanol (1.2%) in a shaking 96 well plate, indicating that this mutation may enhance the efflux activity of this pump towards many compounds. For these experiments CREATE cassette designs were individually synthesized, cloned and sequence verified before recombineering into E. coli MG1655 to reconstruct the mutations, and the genomic modifications were sequence verified by colony PCR to confirm the genotype-phenotype association.
  • Example 24—Benefits of Rational Mutagenesis for Sampling Novel Adaptive Genotypes
  • FIGS. 24A-24D depict the number of variants detected in CREATE experiments involving 500 μg/mL rifampicin (FIG. 24A), 500 μg/mL erythromycin (FIG. 24B), 10 g/L acetate (FIG. 24C), and 2 g/L furfural (FIG. 24D). While naturally evolving systems or error-prone PCR are highly biased towards sampling single nucleotide polymorphisms (e.g. 1 nt mutations, red) these histograms illustrate the potential advantages for rational design approaches that can identify rare or inaccessible mutations (2 and 3 nt, green and blue respectively). For example, the highest fitness solutions appear to be biased toward these rare mutations in rifampicin, erythromycin and furfural selections to varying degrees. These results indicate that procedures such as CREATE should allow more rapid and thorough analysis of fitness improving mutations, in much the same way that computational approaches are being used to improve directed evolution for protein engineering.
  • Example 25—Reconstruction of Mutations Identified by Erythromycin Selection
  • FIG. 25 depicts reconstructed strains grown in 0.5 mL in capped 1.5 mL eppendorf tubes following 48 hours incubation in the presence of 200 μg/mL erythromycin and final OD measurements assessed. Error bars are derived from N=3 trials. A one tailed T-test was performed on each set of measurements to determine p-values indicated for significance of growth benefit.
  • Example 26—Validation of Crp S28P Mutation for Furfural or Thermal Tolerance
  • FIG. 26A depicts a crystal structure of the Crp regulatory protein with variants identified by furfural selection highlighted in red (PDB ID 3N4M). A number of the CREATE designs targeting residues near the cyclic-AMP binding site (aa. 28-30, 65) of this regulator were highly enriched in minimal media selections for furfural or thermal tolerance, suggesting that these mutations may enhance E. coli growth in minimal media under a variety of stress conditions. FIG. 26B depicts validation of the Crp S28P mutant identified in 2 g/L furfural selections in M9 media. This mutant was reconstructed as described for AcrB T60S in Example 23.
  • Example 27—Genome-Scale Sequence to Activity Relationship Mapping at Single Nucleotide Resolution
  • Advances in DNA synthesis and sequencing have motivated increasingly complex efforts to rationally program genomic modifications on laboratory timescales. Realization of such efforts requires strategies that span the design-build-test forward-engineering cycle by not only precisely and efficiently generating large numbers of mutant designs but also by mapping the effects of these mutations at similar throughputs. CRISPR EnAbled Trackable genome Engineering (CREATE) couples highly efficient CRISPR editing with massively parallel oligomer synthesis to enable trackable precision editing on a genome wide scale. This can be accomplished using synthetic cassettes that link a targeting guide RNA with rationally programmable homologous repair cassettes that can be systematically designed to edit loci across a genome and track their phenotypic effects. We demonstrated the flexibility and ease of use of CREATE for genome engineering by parallel mapping of sequence-activity relationships for applications ranging from site saturation mutagenesis, rational protein engineering, complete residue substitution libraries and reconstruction of prior adaptive laboratory evolution experiments.
  • Validation of CREATE Cassette Design
  • In order to realize our engineering objectives we took into account a number of key design considerations to both maximize the editing efficiency as well as distill a complex design process into an easily executable workflow. For example, each CREATE cassette is designed to include both a targeting guide RNA (gRNA) and a homology arm (HA) that introduces rational mutations at the chromosomal cleavage site (e.g. FIG. 8A). The HA encodes both the genomic edit of interest coupled to a synonymous PAM mutation that is designed to abrogate Cas9 cleavage after repair (e.g. FIG. 8B). This arrangement not only ensures that the desired edit can be selectively enriched to high levels by Cas9 but also that the sequences required to guide cleavage and HR are covalently coupled during synthesis and thus delivered simultaneously to the same cell during transformation. The high efficiency editing of CRISPR based selection in E. coli should also ensure a strong correlation between the CREATE plasmid and genomic sequences and allow the plasmid sequence to serve as a trans-acting barcode or proxy for the genomic edit (e.g. FIG. 8C). Assuming that changes in the plasmid frequency under different selective pressures are correlated to their associated genomic edit thereby allows the impact of precise genomic modifications at many loci to be monitored in parallel using a simple downstream sequencing approach to map enriched genotypes on a population scale, analogous to previous genomic tracking methodologies.
  • To test this concept we first performed control experiments using a CREATE cassette designed to inactivate the galK gene by introducing a single point mutation to convert codon 145 from TAT to a TAA stop codon (e.g. FIG. 8B) using a 120 bp HA. The editing efficiency of this cassette using Cas9 and the nuclease deficient dCas9 control was evaluated using a red/white colony screening assay (e.g. FIG. 8A-B, FIG. 15A-15C). These experiments also indicated that HR between a circular double stranded plasmid and the chromosome is strongly dependent on the Cas9 cleavage as recombination is not observed in the absence of the active enzyme (e.g. FIG. 15A-15D). This is in contrast to single stranded recombineering approaches in which oligonucleotides anneal with high efficiency at the lagging strand of the replication fork. Cas9 also adversely impacts the overall transformation efficiency due to toxicity of dsDNA cleavage in E. coli (e.g. FIG. 9A-9D). This toxicity is further exacerbated when performing CREATE at two sites simultaneously in the same cell (e.g. FIG. 16A-16E), which when combined with the absence of an effective non-homologous end joining pathway strongly supports the expectation that off target editing events should be rare within a recombineered library. Additionally, toxicity limits the size of library construction and coverage; however, we note that the observed 10^4-10^5 variants/μg DNA (e.g. FIG. 9A) is on a scale compatible with current oligo synthesis capabilities (10^4-10^5 oligos per order). Thus, we anticipated that using the CREATE synthetic oligo design, we would be able to simultaneously generate ~10^5 or more designer mutations at any location in the genome and precisely map such mutations onto a targeted phenotype.
  • To further characterize how changes in the CREATE cassette design influence the editing efficiency we varied the HA length (80-120 bp) and the distance between the PAM and the codon/TS (17-59 bp) (e.g. FIG. 9B). Induction of Cas9 revealed that all of these cassette variants can support high efficiency HR. High efficiency conversion is also observed in the absence of Cas9 induction, indicating that low level expression of Cas9, due to a leaky inducible promoter, is sufficient to drive cleavage and HR (e.g. FIG. 9B). To verify that the edits matched our intended design we sequenced the chromosome of randomly chosen clones and found that 71% (27/38) contained a perfect match to the CREATE design, while 26% (10/38) contained only the PAM edit and the remaining 3% (1/38) appeared to be wt escapers. As an additional test of design flexibility we performed similar experiments using deletion cassettes that introduce different sized deletions (e.g. FIG. 17A-17D) and observed similar efficiencies (>70%), indicating that the same design automation and tracking capabilities should readily extend to a variety of design objectives (e.g. FIG. 13A-13D).
  • High-Throughput Design and Multiplexed Library Construction
  • To scale the CREATE process for genome-wide applications we developed a custom software to automate cassette design that takes into account the above mentioned criteria to systematically identify a PAM sequence nearest to a target site (TS) of interest and modify it to create a synonymous PAM mutation. This design software is part of a suite of web-based design tools that can be implemented for E. coli and is under further development for other organisms as well as an expanded set of CRISPR-Cas systems. This software platform enables high-throughput rational design of genomic libraries in a format that is compatible with parallelized array based oligo synthesis and simple homology based cloning methods that can be performed in batch for library construction (e.g. FIG. 8B).
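  • The first step of such a design procedure, locating the PAM nearest to a target site, can be sketched as below. This is not the design software described above; it is a minimal illustration that assumes an NGG PAM (the motif recognized by S. pyogenes Cas9) searched on both strands of a linear sequence, and it does not attempt the synonymous PAM mutation step. The function names and the toy sequence are hypothetical.

```python
import re


def find_pams(seq):
    """Return (position, strand) for every NGG PAM found in seq.

    Positions are 0-based indices of the first base of the 3 nt motif on
    the forward strand. A PAM on the reverse strand appears as CCN when
    read on the forward strand.
    """
    seq = seq.upper()
    hits = []
    # Zero-width lookaheads allow overlapping motifs to be reported.
    for m in re.finditer(r"(?=[ACGT]GG)", seq):
        hits.append((m.start(), "+"))
    for m in re.finditer(r"(?=CC[ACGT])", seq):
        hits.append((m.start(), "-"))
    return hits


def nearest_pam(seq, target_pos):
    """Return the PAM hit closest to target_pos, or None if none exists."""
    hits = find_pams(seq)
    if not hits:
        return None
    # Ties in distance are broken by strand label ("+" sorts before "-").
    return min(hits, key=lambda h: (abs(h[0] - target_pos), h[1]))


# Toy sequence with a single TGG motif starting at index 6.
toy = "ATATATTGGATATATATAT"
print(nearest_pam(toy, target_pos=12))
```

  • A fuller design tool would additionally check that the chosen PAM can be disrupted by a synonymous codon change and that the PAM to target distance falls within the range tested in FIG. 9A-9D.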
  • Using this design software we generated a total of 52,356 CREATE cassettes for a range of applications where sequence to activity mapping by traditional methods would be time-consuming and prohibitively expensive. Briefly, the library designs included: 1) a complete saturation of the folA gene to map the entire mutational landscape of an essential gene in its chromosomal context 2) saturation mutagenesis of functional residues in 35 global regulators, efflux pumps and metabolic enzymes implicated in a wide range of tolerance and production phenotypes in E. coli 3) a reconstruction of the complete set of nonsynonymous mutations identified by a recent adaptive laboratory evolution (ALE) study of thermotolerance, and 4) promoter engineering libraries designed to incorporate UP elements or CAP binding elements at transcription start sites annotated in RegulonDB (e.g. FIG. 13A-13D).
  • The pooled oligo libraries were amplified and cloned in parallel and a subset of single variants were isolated to further characterize editing efficiency at different loci (e.g. FIG. 9C). Amplification and sequencing of the genomic loci after transformation with the CREATE plasmids revealed editing efficiencies of 70% on average (106 of 144 clones sampled at seven different loci), with a range of 30% for the metA V20L cassette to 100% for the rpoH_V179H cassette. Interestingly, the differences in editing efficiency for each cassette were highly correlated with the distance between the PAM and target codon (e.g. FIG. 9D), a feature that also appears to affect the ability of linear DNA templates to effectively introduce targeted mutations (e.g. FIG. 18A-18B). This relationship suggests that subsequent CREATE designs should readily increase editing efficiency by optimizing PAM selection criteria. We also note that differences in editing efficiency may reflect detrimental effects of some mutations on organismal fitness (metA is considered an essential gene in most media conditions), and that there may be an upper bound on the number of mutations that can be observed for a particular protein. Finally, these data were obtained outside of any specific selective or screening steps that enrich for chromosomal mutants of interest, and as such demonstrate the ability of this approach to construct mutational libraries.
  • To further characterize the fidelity of the multiplexed synthesis and cloning procedures we performed deep sequencing on the pooled libraries (e.g. FIG. 19A-D). From 594,998 total reads of the cloned CREATE cassette libraries, 550,152 (92%) passed quality filtering and produced hits against the design database. Of these we observed a perfect match for 34,291 (65%) of the possible unique variants and note that many cassettes that were missing in this initial pool were observed in later selections, suggesting that at the cloning stage we can readily cover the majority of the intended design space. In depth analysis of these reads revealed that 46% of the reads passing quality filter were exact matches to their intended design, with the remainder containing 1-4 bp indels or mismatches, primarily in the HA region near the designed mutation site (e.g. FIG. 19A). The mutational bias in this region suggests that the repetitive spacer elements in the HA and gRNA portions of the cassette may form secondary structures that adversely affect sequencing or synthesis (e.g. FIG. 19B). We note that these variant designs are easily identified via the CREATE plasmid-barcoding strategy, and that in some cases it may be desired to have this added diversity in the generated library. We also observed significant (p<0.05) correlation between variant frequencies from the cloned pools and after overnight recovery following recombineering, as well as between replicate recombineering experiments (e.g. FIG. 20A-20B). These results suggest that well represented variants should be readily tracked by our methodology with a precision similar to previous CRISPR based saturation mutagenesis procedures performed at a single locus.
  • CREATE Based Protein Engineering
  • To test the robustness of the CREATE methodology for protein engineering at a single gene level we performed deep-scanning mutagenesis of the essential folA gene. This gene encodes the dihydrofolate reductase (DHFR) enzyme responsible for the production of tetrahydrofolate and the biosynthesis of pyrimidines, purines and nucleic acids. DHFR is also the primary target of the antibiotic trimethoprim (TMP) and other antifolates that are used as antibiotics or chemotherapeutics. The wealth of structural and biochemical data on DHFR function and antibiotic resistance makes it an ideal model for validation of the approach.
  • A CREATE library designed to saturate every codon from 2-158 of the DHFR enzyme was recombineered into E. coli MG1655 and allowed to recover overnight. Following recovery ~10^9 cells (1 mL saturated culture) were transferred into media containing inhibitory TMP concentrations and allowed to grow for 48 hours. The resulting plasmid populations were then sequenced to assess our ability to capture information at the level of single amino acid substitutions that can confer TMP resistance (e.g. FIG. 10A-10B). Bootstrapped confidence intervals for mutational effect were derived using the enrichment data of the 158 synonymous mutations included in this experiment (e.g. FIG. 10A-10B). Using this criterion, we observed significant (P<0.05) levels of enrichment for 74 substitutions (2.3% of the design space) covering 49 aa positions in the protein. Although this degree of mutational flexibility of an essential enzyme may seem counterintuitive, it supports previous conclusions that this enzyme has not reached its evolutionary optimum and that many mutations can improve TMP tolerance through enhancement of the endogenous enzymatic activity or alteration of the dynamic folding landscape of this enzyme.
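  • An enrichment score of the kind plotted on a log2 scale can be sketched as the log2 ratio of a cassette's read frequency after selection to its frequency before selection. The source does not state the exact formula, so the normalization, the pseudocount used to avoid taking the logarithm of zero, the function name and the toy counts below are assumptions.

```python
import math


def log2_enrichment(pre_counts, post_counts, pseudocount=1.0):
    """Return {cassette: log2(post frequency / pre frequency)}.

    pre_counts / post_counts map cassette identifiers to read counts
    before and after selection. A pseudocount is added to every cassette
    so that cassettes absent after selection still receive a finite score.
    With pseudocount=0, a zero count raises ValueError from math.log2.
    """
    n = len(pre_counts)
    pre_total = sum(pre_counts.values()) + pseudocount * n
    post_total = sum(post_counts.get(c, 0) for c in pre_counts) + pseudocount * n
    scores = {}
    for cassette, pre in pre_counts.items():
        post = post_counts.get(cassette, 0)
        pre_freq = (pre + pseudocount) / pre_total
        post_freq = (post + pseudocount) / post_total
        scores[cassette] = math.log2(post_freq / pre_freq)
    return scores


# Toy counts for two hypothetical cassettes (not data from the source).
pre = {"cassette_a": 50, "cassette_b": 50}
post = {"cassette_a": 75, "cassette_b": 25}
print(log2_enrichment(pre, post, pseudocount=0))
```

  • Scores computed this way for the synonymous cassettes could then serve as the reference distribution against which nonsynonymous cassettes are compared.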
  • These results also support the fact that we probe more deeply into the mutation space of improved fitness variants using rational mutagenesis strategies. For example, we observed 7 significantly enriched substitutions at position F153 (e.g. FIG. 10A-10B), none of which have been previously identified by error-prone PCR and adaptive laboratory evolution (ALE). To validate these specific mutations, we reconstructed F153R and F153W variants, which had not been previously reported in the literature and spanned a large range of the measured enrichment scale at this position (e.g. FIG. 10D-10F). We confirmed that the highly enriched F153R mutant grows rapidly under a large range of TMP concentrations while the F153W mutant demonstrates growth only at the moderate TMP concentration used in the selection, consistent with their respective enrichment scores (e.g. FIG. 10A-10F). Moreover, 6 of the 7 mutations we identified using CREATE require two nucleotide changes to convert the wt TTT codon to one of the observed amino acids (I: 1 nt, W: 2 nt, D: 2 nt, R: 2 nt, P: 2 nt, M: 2 nt, H: 2 nt). The F153R and F153W mutations also appear to impact the native enzyme activity in distinct ways (e.g. FIG. 21), implying that these substitutions may confer tolerance by altering the enzymatic cycle of this enzyme in distinct manners.
  • In addition to mapping substitutions that confer TMP resistance, we also attempted to identify substitutions that affect the native activity of DHFR. To do so, we compared the frequencies of each plasmid variant after overnight growth in M9 (e.g. FIG. 22A-22C). In this case, we observed similar overall enrichment profiles for both synonymous and nonsynonymous mutation sets, with very few mutations observed to have significant impact on growth. This unexpected result suggests a need for greater sequencing depth and/or alternate selection strategies to assign high confidence to low fitness variants.
  • As a separate validation of protein engineering applications, we generated a 4,240 variant library targeting the AcrB multidrug efflux pump in E. coli (e.g. FIG. 23A-23F). This protein acts as a proton exchange pump that exports a wide variety of chemicals including antibiotics, chemical mutagens, and short chain alcohols that are being pursued as next generation biofuels and motivating numerous engineering efforts. The library was designed to target the interior chamber, the exit funnel that channels substrates towards the outer-membrane component of the AcrB/AcrA/TolC complex, and key regions of the transmembrane domain where mutations conferring tolerance to isobutanol and longer chain alcohols have been identified (e.g. FIG. 23A-23C). We then constructed the AcrB CREATE library identically as for the FolA library and grew the library in the presence of 1.2% isobutanol. Sequencing identified multiple mutations to the loop-helix motif adjacent to the central efflux funnel that were significantly enriched, suggesting this substructure may provide a novel target for engineering enhanced efflux activity. Reconstruction of the AcrB N70D and D73L mutations also confirmed the ability of these mutations to enhance overall growth in the presence of this solvent stress (e.g. FIG. 23D).
  • Parallel Evaluation of Genotype Fitness from Large Scale Adaptation Studies
  • We next sought to expand our efforts from the single protein scale and validate the use of CREATE at the genome-scale. To do so we chose to reconstruct and map mutations resulting from a prior adaptive laboratory evolution study of E. coli thermal tolerance. ALE has been used extensively as a tool to study the bacterial adaptation in response to a broad range of environmental stressors. However, in the majority of cases the genome undergoes multiple mutations making it difficult to assess the contribution of each mutation to the phenotype in question. Here, we designed and constructed a CREATE library to include all 645 nonsynonymous mutants from the Tenaillon et al ALE experiment and then subjected this library to growth selection in minimal media at 42.2° C. To assess any possible effects that could arise from the synonymous PAM mutation we included redundancy in the design of this library such that each target codon was coupled to two different PAM mutations to provide a 4 fold design redundancy for each nonsynonymous mutation. For calibration purposes the ALE library was pooled with the protein targeting libraries to allow for relative enrichment comparisons from the non-ALE derived libraries as a benchmark (e.g. FIG. 11A-11C). Of the more than 50,000 cassettes in this experiment we observed 405 cassettes from the ALE derived library above the minimal counting threshold, pertaining to 252 unique variants (e.g. FIG. 11B). Of these 346 cassettes (encoding 231 nonsynonymous changes) were significantly enriched compared with the synonymous controls (e.g. FIG. 11B), suggesting that 92% (231/252) of the mutations sampled confer significant selective growth advantages as individual chromosomal mutations, consistent with their fixation during adaptive growth. 
Additionally we found that 141 mutations from the additional CREATE libraries were also significantly enriched, with 86 of these targeting residues in or around the cAMP binding site of Crp, a central regulator of carbon metabolism. The identification of such a large number of Crp mutants is highly suggestive of a role for Crp in thermal-tolerance in agreement with previous findings.
  • For each mutant we also calculated the number of mutations required to convert the wt codon to each of the other 19 amino acids (e.g. FIG. 11C). As with folA, we found that highly impactful mutations, such as the crp S28P and L30Y mutations, require more than a single nucleotide substitution and would therefore be inaccessible or exceedingly rare in naturally evolving systems under laboratory timescales. In fact, this seemed to be a recurrent theme across many of the selections we performed (e.g. FIG. 24A-24D) highlighting again the value of synthetic DNA driven search strategies for genomic engineering applications.
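  • The per-codon calculation described above can be sketched as follows. This is a minimal illustration, not the script used in this work: it assumes the standard genetic code and defines the required number of mutations as the smallest number of nucleotide differences between the wild-type codon and any codon encoding the target amino acid. The function name `min_nt_changes` is hypothetical.

```python
from itertools import product

# Standard genetic code in the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def min_nt_changes(wt_codon, target_aa):
    """Fewest nucleotide substitutions that turn wt_codon into a codon for target_aa."""
    return min(
        sum(a != b for a, b in zip(wt_codon, codon))
        for codon, aa in CODON_TABLE.items()
        if aa == target_aa
    )


if __name__ == "__main__":
    # Illustrative codon only; not taken from a gene in this work.
    wt = "CTG"
    for aa in sorted(set(AMINO_ACIDS) - {"*"}):
        print(aa, min_nt_changes(wt, aa))
```

  • Substitutions scoring 2 or 3 under this measure are the ones a single point mutation cannot reach.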
  • High-Throughput Mapping of Selectable Precision Edits on a Genome Wide Scale
  • To further validate the method for genome-scale mapping and exploration we challenged genome wide targeting libraries with antibiotics or solvents relevant to bioproduction (e.g. FIG. 12A-12F). In the case of selections performed with rifampicin, an antibiotic that inhibits transcription by the RNA polymerase (e.g. FIG. 12A, inner circle) we observed a number of enriched variants that highlighted the robustness of the CREATE approach for atomic resolution mapping. For example, 10 of the top 50 hits identified mutations to residues I572, L533 and S531 of the RNA polymerase β subunit (encoded by rpoB) including variants that form part of the rifampicin binding site (e.g. FIG. 12B). In 6 of the 7 enriched variants the data suggest that a bulky substitution is necessary to sterically hinder rifampicin binding. In addition to the β-subunit mutations the rifampicin selections enriched a number of mutations to the MarA transcriptional activator, whose over-expression due to marR knockout is a well studied aspect of multiple antibiotic resistance (MAR) phenotypes in E. coli. In the DNA bound crystal structure of MarA, Q89 is positioned near the DNA backbone but pointed into solution due to a steric clash between other possible rotamers and the nearest phosphate group on the DNA backbone (e.g. FIG. 12C). Modeling of the MarA Q89N and Q89D mutations identified by this selection suggests that shortening the side chain by a single carbon unit may enable new protein-DNA H-bonding interactions and thereby improve the overall MAR induction response.
  • To compare these results to an antibiotic that interferes with translation we performed another round of selections in the presence of erythromycin (e.g. outer circle FIG. 12A). The enrichment profiles from this selection again highlighted loci previously implicated in resistance to this antibiotic. For example, we observed strong enrichment of 4 different mutations to the AcrB efflux pump which acts as the primary exporter of this drug from the periplasmic space (e.g. FIG. 12A). Interestingly, one of the variants (AcrB T60N) appears at the same residue identified from isobutanol selections (e.g. FIG. 23A-23F). As with the other mutations, reconstruction validated that at least two of these mutations (e.g. T60N in FIG. 23E-23F and D73L in FIG. 25) can significantly improve tolerance to both erythromycin and isobutanol, further supporting the idea that this motif may provide a useful engineering target for a broad range of tolerance phenotypes. In addition to AcrB we also observed enrichment of multiple soxR and rpoS mutants, both of which have been previously implicated in stress tolerance and general antibiotic resistance phenotypes. In total, we observed that 136 of the 341 significantly enriched mutations (40%) were identified within the RpoB, MarA, MarR, SoxR, AcrB, or dxs proteins, each of which has extensive prior validation as antibiotic resistance genes.
  • Finally, we performed selections using furfural or acetate, common components of cellulosic hydrolysate that inhibit bacterial growth under industrial fermentation conditions and are thus the target of many strain engineering efforts (e.g. FIG. 12D-12F). In the presence of high acetate concentrations (10 g/L, e.g. inner plot FIG. 12D) the top 100 ranking mutations were predominated by cassettes targeting the fis, fadR, rho and fnr genes (e.g. FIG. 12E). The Fis, Fnr and FadR regulators are all involved in transcriptional regulation of the primary acetate utilization gene acs, and are implicated in the so-called “acetate-switch” which allows the cell to effectively scavenge acetate. Knockout of these regulators leads to constitutive expression of the acetate utilization pathways and improved acetate growth phenotypes, suggesting that the mutations identified in this study (e.g. FIG. 12E-12F) likely inhibit these regulatory functions by destabilizing their respective protein targets.
  • In contrast to the weak acid tolerance of acetate, the enrichment profiles obtained in the presence of growth inhibiting concentrations of furfural (2 g/L) were significantly different, with the most frequently observed mutations targeting the oxidative stress response regulator rpoS (e.g. FIG. 12F). Furfural growth inhibition is thought to occur through depletion of cellular NADPH pools, an important cofactor in the prevention of oxidative stress and anabolic pathways for cell growth. In line with our findings, previous studies of RpoS have demonstrated that inactive alleles are favored in such nutrient depleted scenarios. Interestingly, we also observed some of the same mutations in crp that were observed in the 42.2° C. selections (e.g. FIGS. 11A and 11C) and upon reconstruction confirmed that the Crp S28P mutant can substantially improve growth in the presence of furfural (e.g. FIG. 26A-26B). We also found that this selection uniquely enriched for variants of PntA, a membrane bound transhydrogenase that transfers hydride ions from NADH to NADP+ to maintain sufficient pools for anabolism. A mutation to I258A in close proximity to the substrate binding cleft may therefore impart enhanced NADPH production.
  • Collectively, these selections validate the CREATE strategy by demonstrating the ability to map known associations, and they highlight the power of this method for rapid mapping of novel mutations to traits of interest. It is also important to note that in contrast to most other functional genomics technologies, which mainly identify loss of function mutations, the ability to perform such broad scale scanning mutagenesis opens the door for more general genomic searches that can also identify novel gain of function mutations.
  • In this work we have demonstrated that CREATE allows parallel mapping of tens of thousands of amino acid and promoter mutations in a single experiment. The construction, selection, and mapping of >50,000 genome-wide mutations (e.g. FIGS. 11A-11C and 12A-12F) can in some examples be accomplished in 1-2 weeks by a single researcher, offering orders of magnitude improvement in economics, throughput, and target scale over the current state of the art methods in synthetic biology. Importantly, the ability to track the enrichment of library variants allows multiplex sequence to activity mapping by a simple PCR based workflow using just a single set of primers as opposed to more complicated downstream sequencing approaches that are limited to a few dozen loci. In addition, the ability to map the effects of single nucleotide or amino acid level variation in coding regions or promoters allows CREATE to address a considerably more diverse set of design objectives than previous high-throughput genomic technologies such as trackable multiplexed recombineering (TRMR) or Tn-seq approaches that are limited to gene resolution analysis. Such capabilities enable new paradigms for deciphering gene function and engineering cellular traits including workflows in which iterative rounds of CREATE could be implemented to perform design-driven genome engineering and address a broad range of ambitions.
  • Notably, as a further distinction from prior approaches, the high efficiency mutagenesis (e.g. FIG. 9A-9D) reported in this work was not only an order of magnitude improved but was also achieved in a wild type MG1655 strain in which all of the native DNA repair pathways are intact. The majority of previously reported recombineering efforts in E. coli have used single-stranded oligo engineering which requires deletion of the mismatch repair genes or chemically modified oligonucleotides to achieve mutagenesis at 1-30% efficiency. The combination of plasmid based homologous recombination substrates and Cas9 dsDNA cleavage appears to circumvent these requirements (e.g. FIG. 13A-13D and FIG. 9A-9D), eliminating the need for specialized genetic modifications outside of the Cas9 and λ-RED genes to perform efficient editing and tracking on a population scale (e.g. FIG. 9A-9D). This fact alongside the broad utility of CRISPR editing suggests that the CREATE approach will readily port to a wide range of microorganisms such as Saccharomyces cerevisiae and other recombinogenic bacteria for which high-efficiency transformation protocols are available. The CREATE strategy should also be compatible with a wide range of CRISPR/Cas systems using similar automation approaches to design and tracking. Extension of this methodology to higher eukaryotes however will require the development of strategies to overcome non-homologous end-joining as well as alternative tracking systems that can stably replicate.
  • The CREATE strategy provides a streamlined approach for sequence to activity mapping and directed evolution by integrating multiplexed oligo synthesis, CRISPR-CAS editing, and high-throughput sequencing.
  • Example 28—Genome-Scale Sequence to Activity Relationship Mapping at Single Nucleotide Resolution, Additional Examples
  • Possible Effects of Inconsistent Mapping of Plasmid Barcode to Genomic Edit
  • We note that the initial CREATE library included designs that we would expect to have low confidence mapping between the plasmid barcode and the genomic edit (as explained primarily by distance between the PAM and target mutation in the CREATE cassette, see FIG. 2 d ). We describe below the various scenarios that may arise in the fraction of cases where the plasmid tracking may lead to erroneous conclusions regarding a genomic variant. A few things to note in evaluating these scenarios include i) the plasmid cassette should have minimal or no functional influence relative to the genomic edit, ii) the genomic loci will only be either the WT sequence or the sequence from the editing cassette that we obtain via sequencing, and iii) offsite editing is highly unlikely given the toxicity of CRISPR-Cas editing of multiple sites (e.g. FIG. 16A-16E) or when performed in the absence of an added editing-repair template. Finally, we note that the use of replicate experiments and deeper sequencing can also address these issues.
  • Tracking of High Fitness Variants (Positive Enrichment Tracking)
  • In cases where there is a strong selective advantage for the genomic modification (and thus the associated plasmid) we will only observe cells with the edit in the chromosome post selection. Thus, this is almost always a true positive, particularly when selection times are short, which limits the possibility of random mutations due to replication error sweeping the population. While this phenomenon may lead to a quantitative underestimation of the true fitness of a mutation due to an enrichment profile that represents the convolution of modified and wt fitness, it will not produce false positives. Moreover, the use of replicated experiments and/or longer selections can also address this potential issue and eliminate erroneous conclusions regarding a mutation's impact on fitness.
  • Tracking of Low Fitness Variants (Negative Enrichment Tracking)
  • In cases where the encoded mutation has a negative fitness contribution but is linked to a PAM only or unmodified chromosome we would incorrectly overestimate the fitness of the mutant and assume that it is closer to wt, especially for longer selection times (e.g. see FIG. 22A-22C). However, any deep sequencing approach must deal with similar limitations due to the lack of information regarding such mutations following selection and the problems associated with counting statistics in these scenarios. Moreover, we would note that this scenario is only relevant to the subset of truly negative fitness mutants (which should be 10-20% based on historic directed evolution and ALE data) within the unedited fraction (˜30%) and that remain in the unedited fraction in multiple replicate transformations. In other words, it is a small percentage (4-5%) scenario that can be detected and/or addressed through replicate transformations where one would observe inconsistencies in the particular mutant showing up occasionally with WT fitness.
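  • As a rough check on the percentage quoted above, the stated fraction of negative-fitness mutants (10-20%) can be multiplied by the stated unedited fraction (~30%). Treating the two as independent is an assumption made here for illustration:

```latex
0.10 \times 0.30 = 0.03
\qquad\text{and}\qquad
0.20 \times 0.30 = 0.06
```

  • This gives roughly 3-6% of variants, which brackets the 4-5% estimate. It does not account for the further requirement that a variant remain unedited across replicate transformations.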
  • Incomplete Coverage
  • In cases where a variant is not present in the initial population (due to both low transformation efficiency and low editing efficiency) a couple of scenarios could arise. As implied by the points above, if the mutation is beneficial one could falsely conclude that it does not confer a fitness advantage, and if it is truly deleterious it also could be incorrectly assigned a neutral fitness score. This appears to be encountered sometimes in this work and impacts both the error associated with replicate measurements and our ability to distinguish low fitness variants from a synonymous control. However, our ability to identify beneficial mutants is robust despite these issues as evidenced by our ability to readily identify novel and previously validated mutations. Strategies to address this by overcoming Cas9 toxicity and improving recombineering efficiencies hold promise to largely eliminate such problems. Furthermore, increasing the number of replicates, increasing sequencing depth, and/or improving the library coverage by performing larger scale transformation also can help to address these issues.
  • Off Target gRNA Cleavage
  • Off target gRNA cleavage should be rare in E. coli due to the relatively small size of its genome (4 Mb), and thus lack of (non-targeted) regions of homology to the CREATE cassette. Moreover, the toxicity of gRNAs in the presence of Cas9 (e.g. FIG. 9A) ensures that cell survival is compromised in E. coli due to dsDNA breaks. Each additional cut introduced into E. coli appears to incur multiplicative toxicity effects, even when homologous repair templates are provided for each cut site (e.g. FIG. 16A-16E). This toxicity effect would be further exacerbated by the absence of a repair template to guide HR (e.g. FIG. 16A-16E), as would be the case for an off-target cleavage event from a single gRNA targeting two sites but containing only a single HA.
  • Random Off Target Mutagenesis (Evolution)
  • The probability that a CREATE variant is strongly enriched due to an off target mutation event is low due to 2 factors: 1) the toxicity effect for the reasons stated above and 2) the low mutation rates of MG1655 or other mutation repair proficient strains compared with the mutagenesis rates of CREATE, particularly in multiple replicates of selection. We also have validated that we can transfer the plasmid pool back into a naive parental background and rapidly verify the enrichment of fitness improving CREATE plasmids from the initial population. Like replicate data, this allows us to decouple each CREATE plasmid from the potential of background mutations that would interfere with our analysis. These factors simplify the assumptions made during our analysis, the validity of which is supported both by externally and internally validated genotypes that were identified during this work.
  • Possible Effects of Synonymous Mutations
  • Synonymous mutations (e.g. in the PAM region) can confer unexpected effects on phenotype. We have controlled for this in several ways. In every experiment we included an internal control that consists of a library of synonymous mutations ( 1/20 at each codon or 5% of total input), each of which samples different PAM and codon combinations. Measuring the enrichment profile of many synonymous changes thus gives us an idea of the range of possible effects we may have on a gene. Using this population as a control we can accurately identify significant fitness changes at the resolution of single amino acids, as the work suggests. We can also control for this effect by utilizing redundant sampling approaches where a site is coupled to multiple PAM mutations, similar to what was done for the ALE study described herein.
  • CREATE Library Design Considerations
  • A variety of design principles were implemented in the gene targeting libraries described in some work disclosed herein. For example, the folA library (3140 cassettes) was designed to be an unbiased, exploratory library for full single site saturation mutagenesis and sequence activity. However, for the majority of the genes we sought to maximize the probability of interesting genotypes by choosing to focus the diversity of sites most likely to have a functional impact on the targeted protein (e.g. DNA binding sites, active sites, regions identified as mutational hotspots by previous selections). The sites that were included in these library designs were selected based on information deposited in databases including Ecocyc (biocyc.org/), Uniprot (uniprot.org/), and the PDB (rcsb.org/pdb) as well as relevant literature citations that identified residues or regions of interest using directed evolution approaches. The Uniprot and Ecocyc databases provide manually curated sequence features that indicate mutational effects and important domains of each protein. In cases where there was enough structural information to model ligand or DNA binding sites the relevant crystal structures were loaded into Pymol and manual residue selections were made and exported as numerical lists. For promoter libraries we took into account the spacing of these sites relative to the transcription start site and the canonical recognition sequence of either the CRP binding site (AAATGTGAtctagaTCACATTT located between −72 and −40 relative to the transcription start site) or the UP element (AAAATTTTTTTTCAAAAGTA (SEQ ID NO: 185) −60 from the transcription start site) that directly recruit the alpha subunit of the RNA polymerase. These sequences were designed to integrate at these positions relative to the publicly available transcriptional start site annotations in RegulonDB using a variation of the automated CREATE design software designed for protein targeting (e.g. FIG. 13A-13D). 
These cassettes were made with the intent of assessing the effects of gene dosage and regulation on fitness. Finally, we designed a library to reconstruct all of the 645 non-synonymous mutations targeting 197 genes that were identified by a comprehensive ALE experiment in which the complete genomes of 115 isolates were sequenced after a year of adaptation to growth at elevated temperature (e.g. 42.2° C.). In all, we designed 52,356 oligomers, with 48,080 intended to saturate 2404 codon positions across 35 genes, 2,550 oligos were made for regenerating the ALE mutations, 379 UP promoter mutants and 772 CAP promoter mutations in a manner that would allow simultaneous sequence to activity relationship mapping.
  • Cassette Design and Automation Principles
  • Based on the control experiments with galK (e.g. FIG. 9A-9D) and current maximal commercial synthesis length constraints (200 bp from Agilent) we developed a general design for each CREATE cassette (e.g. FIG. 8A-8B).
  • Design of the CREATE cassettes was automated using custom Python scripts. The basic algorithm takes a gene sequence, a list of target residues, and a list of codons as inputs. The gene sequence is searched for all available PAM sites with the corresponding spacer sequence. This list is then sorted according to relative proximity to the targeted codon position. For each PAM site in the initial list the algorithm checks for synonymous mutations that can be made in-frame that also directly disrupt the PAM site, in the event that this condition is met the algorithm proceeds to making the prescribed codon change and designing the full CREATE cassette with the accompanying spacer and iterates for each input codon and position respectively. For each PAM mutation, all possible synonymous codon substitutions are checked before proceeding to the next PAM site. For the codon saturation libraries in this study we chose the most frequent codons (genscript.com/cgi-bin/tools/codon_freq_table) for each designed amino acid substitution according to the E. coli usage statistics. The script can be run rapidly on a laptop computer and was used to generate the full design of these libraries in <10 minutes. The algorithm used in this study was designed to make the most conservative mutations possible by sometimes using only the PAM as the selectable mutation marker.
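  • The design loop described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the custom script used in this work: it searches the forward strand only, assumes an NGG PAM with a 20 nt spacer, assumes the gene sequence starts in frame, and returns only the spacer and the edited coding sequence rather than a full cassette with homology arms. All function names and the example sequence are hypothetical.

```python
from itertools import product

# Standard genetic code in the conventional T, C, A, G ordering.
BASES = "TCAG"
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AAS)}

SPACER_LEN = 20  # assumed spacer length


def find_pams(gene):
    """Start indices of forward-strand NGG sites with a full spacer upstream."""
    return [i for i in range(SPACER_LEN, len(gene) - 2)
            if gene[i + 1:i + 3] == "GG"]


def disrupt_pam(gene, pam, protected_codon):
    """Return (position, base) for a synonymous change removing a PAM G, else None."""
    for pos in (pam + 1, pam + 2):
        codon_idx = pos // 3
        if codon_idx == protected_codon:
            continue  # the target codon is reserved for the prescribed edit
        start = 3 * codon_idx
        codon = gene[start:start + 3]
        if len(codon) < 3:
            continue  # trailing partial codon
        offset = pos - start
        for base in "ACT":
            mutant = codon[:offset] + base + codon[offset + 1:]
            if CODE[mutant] == CODE[codon]:
                return pos, base
    return None


def design_edit(gene, residue, new_codon):
    """Pick the nearest usable PAM and apply the PAM and codon changes.

    residue is 1-based; returns None when no PAM can be silently disrupted.
    """
    target = residue - 1
    pams = sorted(find_pams(gene), key=lambda p: abs(p - 3 * target))
    for pam in pams:
        hit = disrupt_pam(gene, pam, target)
        if hit is None:
            continue
        pos, base = hit
        edited = list(gene)
        edited[pos] = base
        edited[3 * target:3 * target + 3] = new_codon
        return {"spacer": gene[pam - SPACER_LEN:pam],
                "pam_index": pam,
                "edited": "".join(edited)}
    return None


if __name__ == "__main__":
    toy_gene = "ATGAAAACCTATGAACATATTCTGGAATAA"  # made-up sequence
    print(design_edit(toy_gene, residue=4, new_codon="TAA"))
```

  • A production design would also need to scan the reverse strand, choose the replacement codon from a usage table, and assemble the homology arm and spacer into a single oligo within the synthesis length limit.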
  • Plasmids
  • The X2-cas9 broad host range vector was constructed by amplifying the cas9 gene from genomic S. pyogenes DNA into the pBTBX2 backbone (Lucigen). A vector map and sequence of this vector and the galK_Y145*_120/17 CREATE cassette are provided at the following locations: benchling.com/s/3c941j/edit; benchling.com/s/xRBDwcMy/edit.
  • The editing experiments performed in some of this work employed the X2-cas9 vector in combination with the pSIM5 vector (redrecombineering.ncifcrf.gov/strains-plasmids.html) to achieve the reported efficiencies.
  • Recombineering of CREATE Libraries
  • Genomic libraries were prepared by transforming CREATE plasmid libraries into a wildtype E. coli MG1655 strain carrying the temperature sensitive pSIM5 plasmid (lambda RED) and a broad host range plasmid containing an inducible cas9 gene cloned from S. pyogenes genomic DNA into the pBTBX-2 backbone (X2cas9, e.g. FIG. 15A-15D). pSIM5 was induced for 15 min at 42° C. followed by chilling on ice for 15 min. The cells were washed 3 times with ⅕ the initial culture volume of ddH2O (e.g. 10 mL washes for 50 mL culture). Following electroporation the cells were recovered in LB+0.4% arabinose to induce Cas9. The cells were recovered 1-2 hrs before spot plating to determine library coverage and transferred to a 10× volume for overnight recovery in LB+0.4% arabinose+50 μg/mL kanamycin+100 μg/mL carbenicillin. Saturated overnight cultures were pelleted and resuspended in 5 mL of LB. 1 mL was used to make glycerol stocks and another 1 mL was washed with the appropriate selection media before proceeding with selection.
  • For the control experiments with galK we used CREATE cassettes designed to convert Y145 (TAT) into a stop codon (TAA) with a single point mutation at this position and a second point mutation to make a synonymous mutation that abolishes the targeted PAM site (e.g. FIG. 8B and FIG. 13A-13D). Editing efficiencies (e.g. FIG. 13A-13D and FIG. 9A-9B) were estimated using red/white plate based screening on 1% galactose supplemented MacConkey agar as previously described.
  • Selection Procedures
  • Following overnight recovery, the cells were harvested by pelleting and resuspension in fresh selection media. All selections were performed in shake flask and inoculated at an initial OD600 of 0.1. Three serial dilutions (48-96 hrs depending on growth rates in the target condition) were carried out for each selection by transferring 1/100th the media volume after the cultures reached stationary phase. The 42° C. selections were performed in M9 media+0.2% glucose to mimic low carbon availability from the initial adaptation. Antibiotic selections were carried out in LB+500 μg/mL rifampicin or erythromycin to ensure stringent selection. The solvent selections were performed in M9+0.4% glucose and either 10 g/L acetate (unbuffered) or 2 g/L furfural. Selections were harvested by pelleting 1 mL of the final culture and the cell pellet was boiled in 100 μL TE buffer to preserve both the plasmid and the genomic DNA for further desired analyses.
  • Library Preparation and Sequencing
  • Custom Illumina compatible primers were designed to allow a single amplification step from the CREATE plasmid and assignment of experimental reads using barcodes. The CREATE cassettes were amplified directly from the plasmid sequences of boiled cell lysates using 20 cycles of PCR with the Phusion (NEB) polymerase using 60° C. annealing and 1:30 minute extension times. As in the cloning procedure a minimal number of PCR cycles was maintained to prevent accumulation of mutations and recombined CREATE cassettes that were observed when an excessive number of PCR cycles was implemented (e.g. >25-30). Amplified fragments were verified and quantified by 1% agarose gel electrophoresis and pooled according to the desired read depth for each sample. The pooled library was cleaned using Qiaquick PCR cleanup kit and processed for NGS using standard Illumina preparation kits. The Illumina sequencing and sample preparation were performed with the primers.
  • Preprocessing of High-Throughput Sequencing and Count Generation
  • Paired-end Illumina sequencing reads were sorted according to the Golay barcode index with allowance of up to 3 mismatches then merged using the usearch-fastq_merge algorithm. Sorted reads were then matched against the database of designed CREATE cassettes using the usearch global algorithm at an identity threshold of 90% allowing up to 60 possible hits for each read. The resulting hits were further sorted according to percent identity and read assignment was made using the best matching CREATE cassette design at a final cutoff of 98% identity to the initial design. It should be noted that this read assignment strategy attempts to identify correlations between the designed genotypes and may therefore miss other important features that arise due to mutations that could occur during the experimental procedure. This approach was taken both to simplify data analysis and to evaluate the ‘forward’ design and annotation procedure and its ability to accurately identify meaningful genetic phenomena.
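  • The barcode sorting step can be illustrated with a minimal sketch. It assumes the sample barcode sits at the start of each read and assigns a read to the closest barcode by Hamming distance when that distance is at most 3 and the best match is unique. It does not implement Golay decoding or the usearch steps, and the sample names and barcode sequences are made up.

```python
def hamming(a, b):
    """Mismatch count between two strings; a length difference counts as mismatches."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def assign_sample(read, barcodes, max_mismatches=3):
    """Return the sample whose barcode best matches the read prefix, else None.

    barcodes maps sample name -> barcode sequence.
    """
    scored = sorted(
        (hamming(read[:len(bc)], bc), name) for name, bc in barcodes.items()
    )
    best_dist, best_name = scored[0]
    if best_dist > max_mismatches:
        return None  # too divergent from every barcode
    if len(scored) > 1 and scored[1][0] == best_dist:
        return None  # ambiguous between two samples
    return best_name


if __name__ == "__main__":
    # Hypothetical 12 nt barcodes for two samples.
    samples = {"rif_rep1": "AGCTAGCTAGCT", "ery_rep1": "TCGATCGATCGA"}
    print(assign_sample("AGCTAGCTAGGTACGTACGT", samples))
    print(assign_sample("GGGGGGGGGGGGACGTACGT", samples))
```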
  • Data Analysis and Fitness Calculation
  • Enrichment scores (or absolute fitness scores) were calculated as the log2 enrichment score using the following equation:
  • $$W = \log_2\left(\frac{F_{x,f}}{F_{x,i}}\right),$$
  • where $F_{x,f}$ is the frequency of cassette X at the final time point, $F_{x,i}$ is the initial frequency of cassette X, and $W$ is the absolute fitness of each variant. Frequencies were determined by dividing the read counts for each variant by the total experimental counts including those that were lost to filtering. Each selection was performed in duplicate and the count weighted average of the two measurements was used to infer the average fitness score of each mutation as follows:
  • $$W_{avg} = \frac{\sum_{i=1}^{N} \mathrm{counts}_i \cdot W_i}{\sum_{i=1}^{N} \mathrm{counts}_i}$$
  • These scores were used to rank and assess the fitness contributions of each mutation under the various selection pressures investigated. For all selections we took average absolute fitness scores for all of the synonymous mutants as a composite measure of the average growth rate. Absolute enrichment scores were considered significant if the mutant enrichment was at least +/−2*σ (e.g. p=0.05 assuming a normal distribution) of the wild-type value. We performed two replicates of each selection reported in this study to derive these figures and applied a cutoff threshold of 10 across the replicate experiments for inclusion in each analysis.
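  • The enrichment, averaging, and significance steps described above can be sketched as follows. This is an illustrative sketch only. It assumes that σ is the sample standard deviation of the synonymous control scores and that the wild-type value is their mean; which read counts serve as replicate weights is also an assumption. The function names and example numbers are hypothetical.

```python
import math
import statistics


def log2_enrichment(count_initial, total_initial, count_final, total_final):
    """W = log2(F_final / F_initial), with F = variant counts / total counts."""
    f_initial = count_initial / total_initial
    f_final = count_final / total_final
    return math.log2(f_final / f_initial)


def count_weighted_average(scores, counts):
    """Average replicate scores, weighting each replicate by its read count."""
    return sum(c * w for c, w in zip(counts, scores)) / sum(counts)


def is_significant(score, synonymous_scores, n_sigma=2.0):
    """True when score lies at least n_sigma control SDs from the control mean."""
    mean = statistics.mean(synonymous_scores)
    sigma = statistics.stdev(synonymous_scores)
    return abs(score - mean) >= n_sigma * sigma


if __name__ == "__main__":
    # Made-up counts for one variant in two replicates.
    w1 = log2_enrichment(10, 1000, 80, 1000)
    w2 = log2_enrichment(12, 1000, 24, 1000)
    w_avg = count_weighted_average([w1, w2], [80, 24])
    controls = [0.1, -0.1, 0.0, 0.2, -0.2]
    print(w_avg, is_significant(w_avg, controls))
```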
  • For every codon targeted our designs also included a synonymous variant to provide an internal experimental control. Thus 5% of the protein targeting cassettes encoded synonymous mutations that allow us to estimate confidence intervals for mutation effects using custom Python bootstrapping scripts. The enrichment data for each experiment was resampled with replacement 20000 times to obtain 95% confidence interval estimations that were used to infer statistical significance of enrichment scores for each analysis presented in the manuscript.
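  • A bootstrap of this kind might look like the sketch below. It is not the custom script referred to above: it assumes the resampled statistic is the mean enrichment score of the synonymous controls and uses a simple percentile interval. The function name and the example values are hypothetical.

```python
import random


def bootstrap_ci(values, n_resamples=20000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of values."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    n = len(values)
    means = sorted(
        sum(rng.choices(values, k=n)) / n for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


if __name__ == "__main__":
    # Made-up synonymous control enrichment scores.
    controls = [-0.2, -0.1, 0.0, 0.1, 0.2] * 4
    print(bootstrap_ci(controls))
```

  • A variant whose score falls outside the resulting interval would then be flagged as differing from the synonymous background.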
  • Mutant Reconstructions and Growth Measurements
  • The AcrB T60N, Crp S28P, and FolA F153R/W CREATE cassettes were ordered as separate gblocks from IDT, cloned and sequence verified. Each cassette was transformed into MG1655 and colony screened to identify a clone with the designed genomic edit. These strains (e.g. FIG. 21 and FIG. 22A-22C) were then subjected to the growth conditions from the pooled library selection as indicated. The growth curves were taken in triplicate for each condition in 100 μL in a 96 well plate reader set to measure absorbance at 600 nm. The plate was covered and water added to empty wells to reduce evaporation during the growth.
  • Software and Figure Generation
  • Circle plots were generated using Circos v0.67. Plots were generated in Python 2.7 using the matplotlib plotting libraries and figures were made using Adobe Illustrator CS5. Entropy scores for the FolA (FIG. 10A) were determined using the ProDy Python package and the Pfam accession PF00186 representative proteome alignment RP35.
  • Figures of the protein libraries and high fitness mutations were made using The PyMol Molecular Graphics System, Schrodinger, LLC. The following are the proteins and PDBs used in the figure generation: AcrB (3W9H, 4K7Q, 3AOC), Fis (3JR9), Ihf (1IHF), RNA polymerase (4KMU, 4IGC), Crp (3N4M), MarA (1BLO), and SoxR (2ZHG).
  • Example 29: Testing Edit-Barcode Correlation
  • A strain carrying a low copy number plasmid (Ec23), which is a Cas9-pSIM5 dual vector, was tested using different gene editing cassettes (lacZ, xylA, and rhaA) and recorder cassettes with different barcodes and insertion sites (galK site 1, galK site 2, and galK site 3) (summarized in FIG. 27A). The possible outcomes are depicted in FIG. 27B. Pre-selection, all combinations of edit/barcode/WT are possible. After selection, edited cells could be enriched whether they are barcoded or not in this experimental design.
  • The transformations were plated on selective media that allowed for enrichment of cells containing the gene edits. 30 colonies from each combination transformation were sequenced to determine if they contained the desired barcode.
  • FIG. 27C shows the results from the sequencing data. Two of the edit/barcode combinations were found in 100% of the tested colonies (30/30 colonies), and the other edit/barcode combination transformation was found in approximately 97% of tested colonies (29/30 colonies). The single colony that was not properly engineered contained the gene edit, but not the barcode.
  • Overall, 89 out of 90 tested colonies had the designed gene edit and barcode.
  • Example 30: Selectable Recording
  • When a barcode is not selected for, it allows for enrichment of non-barcoded cells even if the corresponding gene edit is incorporated and selected for. FIG. 28 depicts an example strategy for selecting for the recording event (e.g., incorporation of the barcode by the recorder cassette), in addition to selecting for the editing cassette incorporation, thereby increasing the efficiency of recovering cells that have been both edited and barcoded.
  • As depicted in FIG. 28, sequences S0, S1, S2, etc. are designed to be targeted by the guide RNA associated with the recorder cassette of the next round. In the depicted example, in the first round of engineering, a PAM mutation, a barcode, an S1 site, and a regulatory element necessary to turn on a selectable marker are incorporated into the S0 site in the target region. This turns on the TetR selectable marker and allows for enrichment of barcoded variants with the S1 site that have the first round PAM site deleted. In the second round of engineering, a new recorder cassette comprising a second PAM mutation, a second barcode, an S2 site, and a mutation that turns off the selectable marker is incorporated into the S1 site from the previous round. This allows for counter-selection of variants that have incorporated the second barcode and S2 site. The subsequent rounds continue to flip the selectable marker between an on and off state and use selection or counter-selection respectively to enrich the desired variants. The recorder cassette from each round is designed to incorporate into a unique sequence (e.g., S0, S1, etc.) that was incorporated in the previous round. This ensures that the last round of barcoding was successful so that all desired engineering steps are contained in the final product. The incorporation of PAM mutations at each step also helps ensure that the desired barcoded variants are selected for, since cells having the unmodified PAM sequences will be killed as they cannot escape CRISPR enzyme cleavage.
  • This strategy uses multiple methods to increase the efficiency of isolating desired variants that contain all of the engineered edits from each round of engineering. The PAM mutation, selectable marker switch, and unique landing site incorporated in each round each increase efficiency on their own and increase it further in combination. These tools allow for selection of each recording round and allow design of highly active recording guide RNAs. An array of equally spaced (or not equally spaced, depending on the design) barcodes is generated and facilitates downstream analysis such as sequencing the barcode array to determine which corresponding edits are incorporated throughout the genome.
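  • The round-by-round logic described above can be illustrated with a minimal sketch. The model below is an assumption made for illustration only: the class name Locus, the functions record_round, survives, and run, and the barcode labels are hypothetical and do not appear in the disclosure. It tracks only the landing site, the selectable marker state, and the barcode array, with odd rounds selecting for the marker being on and even rounds counter-selecting for the marker being off.

```python
from dataclasses import dataclass, field


@dataclass
class Locus:
    """Hypothetical model of the recording locus in a single cell."""
    landing_site: str = "S0"   # site targeted by the next round's recorder guide RNA
    marker_on: bool = False    # selectable marker starts in the off state
    barcodes: list = field(default_factory=list)


def record_round(locus, round_no, barcode):
    """Apply one successful recording round; return the locus unchanged if the
    landing site expected by this round's recorder cassette is absent."""
    if locus.landing_site != f"S{round_no - 1}":
        return locus
    return Locus(
        landing_site=f"S{round_no}",        # new unique site for the next round
        marker_on=not locus.marker_on,      # flip the selectable marker
        barcodes=locus.barcodes + [barcode],
    )


def survives(locus, round_no):
    """Odd rounds select for marker ON; even rounds counter-select for marker OFF.
    The cell must also carry the landing site added in this round."""
    want_on = round_no % 2 == 1
    return locus.marker_on == want_on and locus.landing_site == f"S{round_no}"


def run(rounds, skip=()):
    """Simulate several rounds; return the final locus, or None if the cell is lost."""
    locus = Locus()
    for n in range(1, rounds + 1):
        if n not in skip:
            locus = record_round(locus, n, f"BC{n}")
        if not survives(locus, n):
            return None
    return locus


print(run(3))             # cell recorded in every round
print(run(3, skip={2}))   # cell that missed round 2 is removed by counter-selection
```

  • In this sketch a cell that fails to record in any round is removed at that round's selection or counter-selection step, so a surviving cell necessarily carries the complete barcode array.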
  • FIG. 29 depicts an experimental design to test the selectable recorder strategy described above. A plasmid (pREC1) containing an editing cassette and a recorder cassette was transformed into cells. The editing cassette was either a non-targeting editing cassette, or an editing cassette that incorporated a non-temperature-sensitive mutation (not TS) or a temperature sensitive mutation (TS) into a target gene. The recorder cassette was designed to incorporate into the S0 site in the target gene that originally had the TetR selectable marker turned off. The recorder cassette also contained a PAM mutation that deleted the S0 PAM site, a first barcode (BC1), a unique S1 site for the subsequent engineering round recording cassette to incorporate into, and a corrective mutation that turns on the TetR selectable marker. A guide RNA on the recorder cassette that targets a PAM site in the S0 site (S0-gRNA) allows a CRISPR enzyme, in this case Cas9, to cleave the S0 site. The recorder cassette recombines into the cleaved S0 site. Once the PAM mutation is incorporated, the S0-gRNA can no longer target the S0 site, so barcoded cells escape cleavage while WT cells are killed, thereby enriching for cells that received the barcode. The TetR selectable marker was also turned on, allowing further selection of the barcoded variant.
  • The data in FIGS. 30A and 30B show the results from the experiment described above and depicted in FIG. 29. Of the Tet resistant colonies that were recovered from the transformation and engineering round, 16 were sequenced and all were determined to contain the designed barcode (FIG. 30A). FIG. 30B shows that the control cells that did not contain the recorder target site (non-target) did not survive the presence of Tet, while cells that contained the target site were successfully barcoded, as evidenced by the turning on of TetR, allowing cells to be selected on Tet containing media. The Tet resistant colonies were confirmed at the genomic site to have the TetR gene turned on. These data showed that selectable recording was successful.
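  • The PAM-escape step used in the experiment of FIG. 29 can be sketched as a simple sequence check. This is an illustrative sketch only: the function name is_cleavable and all sequences below are made up for the example and are not sequences from the disclosure. It assumes the NGG PAM recognized by S. pyogenes Cas9, located immediately 3′ of the protospacer, and for simplicity it searches the top strand only and models the PAM change alone (not the barcode or S1 insertion).

```python
def is_cleavable(locus, protospacer):
    """Return True if the protospacer occurs in the locus immediately followed
    by an NGG PAM (top strand only; a simplification)."""
    start = locus.find(protospacer)
    while start != -1:
        pam = locus[start + len(protospacer): start + len(protospacer) + 3]
        if len(pam) == 3 and pam[1:] == "GG":
            return True
        start = locus.find(protospacer, start + 1)
    return False


# Hypothetical 20-nt S0 protospacer and flanking sequence (illustrative only).
s0_protospacer = "GATTACAGATTACAGATTAC"
unedited_locus = "TTT" + s0_protospacer + "TGG" + "AAA"   # intact PAM
recorded_locus = "TTT" + s0_protospacer + "TCC" + "AAA"   # PAM mutated by the recorder

print(is_cleavable(unedited_locus, s0_protospacer))  # True: still targeted by S0-gRNA
print(is_cleavable(recorded_locus, s0_protospacer))  # False: escapes cleavage
```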
  • Example 31: Expression of MAD Nucleases
  • Wild-type nucleic acid sequences for MAD1-MAD20 include SEQ ID NOs: 21-40, respectively. These MAD nucleases were codon optimized for expression in E. coli and the codon optimized sequences are listed as SEQ ID NOs: 41-60, respectively (summarized in Table 2). Codon optimized MAD1-MAD20 were cloned into an expression construct comprising a constitutive or inducible promoter (e.g., T7 promoter SEQ ID NO: 83, or pBAD promoter SEQ ID NO: 81 or SEQ ID NO: 82) and an optional 6×-His tag (SEQ ID NO: 186). The generated MAD1-MAD20 expression constructs are provided as SEQ ID NOs: 61-80, respectively.
  • TABLE 2
    MAD nuclease   WT nucleic acid sequence   Codon optimized nucleic acid sequence   Amino acid sequence   Expression constructs
    MAD1           SEQ ID NO: 21              SEQ ID NO: 41                           SEQ ID NO: 1          SEQ ID NO: 61
    MAD2           SEQ ID NO: 22              SEQ ID NO: 42                           SEQ ID NO: 2          SEQ ID NO: 62
    MAD3           SEQ ID NO: 23              SEQ ID NO: 43                           SEQ ID NO: 3          SEQ ID NO: 63
    MAD4           SEQ ID NO: 24              SEQ ID NO: 44                           SEQ ID NO: 4          SEQ ID NO: 64
    MAD5           SEQ ID NO: 25              SEQ ID NO: 45                           SEQ ID NO: 5          SEQ ID NO: 65
    MAD6           SEQ ID NO: 26              SEQ ID NO: 46                           SEQ ID NO: 6          SEQ ID NO: 66
    MAD7           SEQ ID NO: 27              SEQ ID NO: 47                           SEQ ID NO: 7          SEQ ID NO: 67
    MAD8           SEQ ID NO: 28              SEQ ID NO: 48                           SEQ ID NO: 8          SEQ ID NO: 68
    MAD9           SEQ ID NO: 29              SEQ ID NO: 49                           SEQ ID NO: 9          SEQ ID NO: 69
    MAD10          SEQ ID NO: 30              SEQ ID NO: 50                           SEQ ID NO: 10         SEQ ID NO: 70
    MAD11          SEQ ID NO: 31              SEQ ID NO: 51                           SEQ ID NO: 11         SEQ ID NO: 71
    MAD12          SEQ ID NO: 32              SEQ ID NO: 52                           SEQ ID NO: 12         SEQ ID NO: 72
    MAD13          SEQ ID NO: 33              SEQ ID NO: 53                           SEQ ID NO: 13         SEQ ID NO: 73
    MAD14          SEQ ID NO: 34              SEQ ID NO: 54                           SEQ ID NO: 14         SEQ ID NO: 74
    MAD15          SEQ ID NO: 35              SEQ ID NO: 55                           SEQ ID NO: 15         SEQ ID NO: 75
    MAD16          SEQ ID NO: 36              SEQ ID NO: 56                           SEQ ID NO: 16         SEQ ID NO: 76
    MAD17          SEQ ID NO: 37              SEQ ID NO: 57                           SEQ ID NO: 17         SEQ ID NO: 77
    MAD18          SEQ ID NO: 38              SEQ ID NO: 58                           SEQ ID NO: 18         SEQ ID NO: 78
    MAD19          SEQ ID NO: 39              SEQ ID NO: 59                           SEQ ID NO: 19         SEQ ID NO: 79
    MAD20          SEQ ID NO: 40              SEQ ID NO: 60                           SEQ ID NO: 20         SEQ ID NO: 80
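  • The numbering in Table 2 follows a fixed offset per column, which can be captured in a short lookup sketch. The function name mad_seq_ids and the dictionary keys are hypothetical names chosen for this illustration; the offsets themselves are read directly from Table 2.

```python
def mad_seq_ids(n):
    """Return the SEQ ID NOs listed in Table 2 for nuclease MAD<n> (1 <= n <= 20)."""
    if not 1 <= n <= 20:
        raise ValueError("Table 2 covers MAD1 through MAD20 only")
    return {
        "amino_acid": n,                        # SEQ ID NOs: 1-20
        "wild_type_nucleic_acid": 20 + n,       # SEQ ID NOs: 21-40
        "codon_optimized_nucleic_acid": 40 + n, # SEQ ID NOs: 41-60
        "expression_construct": 60 + n,         # SEQ ID NOs: 61-80
    }


print(mad_seq_ids(7))
```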
  • Example 32: MAD2 and MAD7 Nucleases
  • MAD2 and MAD7 nucleases are nucleic acid-guided nucleases that can be used in the methods disclosed herein. Nucleases MAD2 (SEQ ID NO: 2) and MAD7 (SEQ ID NO: 7) were cloned and transformed into cells. Editing cassettes were designed to introduce mutations at a target site in a galK gene, which allowed for white/red screening of successfully edited colonies. The editing cassettes also encoded a guide nucleic acid designed to target galK. The editing cassettes were transformed into E. coli cells expressing MAD2, MAD7, or Cas9. FIG. 31A shows the editing efficiency of MAD2 and MAD7 compared to Cas9 (SEQ ID NO: 110). FIG. 31B shows the transformation efficiency as evidenced by cell survival rates. In this example, the guide nucleic acid used with MAD2 and MAD7 comprised a scaffold-12 sequence and a guide sequence targeting galK. The guide nucleic acid used with Cas9 comprised a sequence compatible with the S. pyogenes Cas9.
  • FIG. 32 and Table 3 show more examples of gene editing using the MAD2 nuclease. In this experiment, different guide nucleic acid sequences were tested. The guide sequence of the guide nucleic acids targeted the galK gene as described above. The scaffold sequence of each guide nucleic acid was one of the various sequences tested, as indicated. Guide nucleic acids with scaffold-5, scaffold-10, scaffold-11, and scaffold-12 were able to form functional complexes with MAD2.
  • FIG. 33 and Table 4 show more examples of gene editing using the MAD7 nuclease. In this experiment, different guide nucleic acid sequences were tested. The guide sequence of the guide nucleic acids targeted the galK gene as described above. The scaffold sequence of each guide nucleic acid was one of the various sequences tested, as indicated. Guide nucleic acids with scaffold-10, scaffold-11, and scaffold-12 (e.g., FIG. 31A) were able to form functional complexes with MAD7. Amino acid sequences are provided in Table 2 and scaffold sequences are provided in Table 3 and Table 4. Table 3 and Table 4 also provide the designed mutations in the editing cassettes that were used to mutate the galK target gene.
  • Further details and characterization of MAD2, MAD7, and other MAD nucleases are described in U.S. application Ser. No. 15/631,989, filed Jun. 23, 2017, and U.S. application Ser. No. 15/632,001, filed Jun. 23, 2017, each of which is incorporated herein in its entirety.
  • TABLE 3
    #    Nucleic acid-guided nuclease   Guide nucleic acid scaffold sequence   Editing sequence mutation   Target gene
    1    MAD2                           Scaffold-12; SEQ ID NO: 95             N89KpnI                     galK
    2    MAD2                           Scaffold-10; SEQ ID NO: 93             L80**                       galK
    3    MAD2                           Scaffold-5; SEQ ID NO: 88              L80**                       galK
    4    MAD2                           Scaffold-12; SEQ ID NO: 95             D70KpnI                     galK
    5    MAD2                           Scaffold-12; SEQ ID NO: 95             Y145**                      galK
    6    MAD2                           Scaffold-11; SEQ ID NO: 94             Y145**                      galK
    7    MAD2                           Scaffold-10; SEQ ID NO: 93             Y145**                      galK
    8    MAD2                           Scaffold-12; SEQ ID NO: 95             L10KpnI                     galK
    9    MAD2                           Scaffold-11; SEQ ID NO: 94             L80**                       galK
    10   SpCas9                         S. pyogenes gRNA                       Y145**                      galK
    11   MAD2                           Scaffold-2; SEQ ID NO: 85              Y145**                      galK
    12   MAD2                           Scaffold-4; SEQ ID NO: 87              Y145**                      galK
    13   MAD2                           Scaffold-1; SEQ ID NO: 84              L80**                       galK
    14   MAD2                           Scaffold-13; SEQ ID NO: 96             Y145**                      galK
  • TABLE 4
    #    Nucleic acid-guided nuclease   Guide nucleic acid scaffold sequence   Editing sequence mutation   Target gene
    1    MAD7                           Scaffold-1; SEQ ID NO: 84              L80**                       galK
    2    MAD7                           Scaffold-2; SEQ ID NO: 85              Y145**                      galK
    3    MAD7                           Scaffold-4; SEQ ID NO: 87              Y145**                      galK
    4    MAD7                           Scaffold-10; SEQ ID NO: 93             Y145**                      galK
    5    MAD7                           Scaffold-11; SEQ ID NO: 95             L80**                       galK
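  • The scaffold compatibility stated in this example for MAD2 and MAD7 can be compared with a small set-based sketch. The variable and function names below are hypothetical and chosen for illustration; the scaffold numbers are those the text identifies as forming functional complexes with each nuclease. No conclusion is drawn here about scaffolds that were tested but not listed as functional.

```python
# Scaffolds reported in the text as forming functional complexes with each nuclease.
FUNCTIONAL_SCAFFOLDS = {
    "MAD2": {5, 10, 11, 12},
    "MAD7": {10, 11, 12},
}


def shared_scaffolds(a, b):
    """Scaffold numbers reported functional with both nucleases, sorted."""
    return sorted(FUNCTIONAL_SCAFFOLDS[a] & FUNCTIONAL_SCAFFOLDS[b])


def exclusive_scaffolds(a, b):
    """Scaffold numbers reported functional with nuclease a but not listed for b."""
    return sorted(FUNCTIONAL_SCAFFOLDS[a] - FUNCTIONAL_SCAFFOLDS[b])


print(shared_scaffolds("MAD2", "MAD7"))
print(exclusive_scaffolds("MAD2", "MAD7"))
```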
  • While preferred embodiments of the present invention have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. Numerous variations, changes, and substitutions will now occur to those skilled in the art without departing from the invention. It should be understood that various alternatives to the embodiments of the invention described herein may be employed in practicing the invention. It is intended that the following claims define the scope of the invention and that methods and structures within the scope of these claims and their equivalents be covered thereby.
  • SEQUENCE LISTING
  • TABLE 5
    SEQ ID NO:   Sequence
    SEQ ID NO: 1
    MGKMYYLGLDIGTNSVGYAVTDPSYHLLKFKGEPMWGAHVFAAGNQSAERRSFRTSRRRL
    DRRQQRVKLVQEIFAPVISPIDPRFFIRLHESALWRDDVAETDKHIFFNDPTYTDKEYYS
    DYPTIHHLIVDLMESSEKHDPRLVYLAVAWLVAHRGHFLNEVDKDNIGDVLSFDAFYPEF
    LAFLSDNGVSPWVCESKALQATLLSRNSVNDKYKALKSLIFGSQKPEDNFDANISEDGLI
    QLLAGKKVKVNKLFPQESNDASFTLNDKEDAIEEILGTLTPDECEWIAHIRRLFDWAIMK
    HALKDGRTISESKVKLYEQHHHDLTQLKYFVKTYLAKEYDDIFRNVDSETTKNYVAYSYH
    VKEVKGTLPKNKATQEEFCKYVLGKVKNIECSEADKVDFDEMIQRLTDNSFMPKQVSGEN
    RVIPYQLYYYELKTILNKAASYLPFLTQCGKDAISNQDKLLSIMTFRIPYFVGPLRKDNS
    EHAWLERKAGKIYPWNFNDKVDLDKSEEAFIRRMTNTCTYYPGEDVLPLDSLIYEKFMIL
    NEINNIRIDGYPISVDVKQQVFGLFEKKRRVTVKDIQNLLLSLGALDKHGKLTGIDTTIH
    SNYNTYHHFKSLMERGVLTRDDVERIVERMTYSDDTKRVRLWLNNNYGTLTADDVKHISR
    LRKHDFGRLSKMFLTGLKGVHKETGERASILDFMWNTNDNLMQLLSECYTFSDEITKLQE
    AYYAKAQLSLNDFLDSMYISNAVKRPIYRTLAVVNDIRKACGTAPKRIFIEMARDGESKK
    KRSVTRREQIKNLYRSIRKDFQQEVDFLEKILENKSDGQLQSDALYLYFAQLGRDMYTGD
    PIKLEHIKDQSFYNIDHIYPQSMVKDDSLDNKVLVQSEINGEKSSRYPLDAAIRNKMKPL
    WDAYYNHGLISLKKYQRLTRSTPFTDDEKWDFINRQLVETRQSTKALAILLKRKFPDTEI
    VYSKAGLSSDFRHEFGLVKSRNINDLHHAKDAFLAIVTGNVYHERFNRRWFMVNQPYSVK
    TKTLFTHSIKNGNFVAWNGEEDLGRIVKMLKQNKNTIHFTRFSFDRKEGLFDIQPLKAST
    GLVPRKAGLDVVKYGGYDKSTAAYYLLVRFTLEDKKTQHKLMMIPVEGLYKARIDHDKEF
    LTDYAQTTISEILQKDKQKVINIMFPMGTRHIKLNSMISIDGFYLSIGGKSSKGKSVLCH
    AMVPLIVPHKIECYIKAMESFARKFKENNKLRIVEKFDKITVEDNLNLYELFLQKLQHNP
    YNKFFSTQFDVLTNGRSTFTKLSPEEQVQTLLNILSIFKTCRSSGCDLKSINGSAQAARI
    MISADLTGLSKKYSDIRLVEQSASGLFVSKSQNLLEYL*
    SEQ ID NO: 2
    MSSLTKFTNKYSKQLTIKNELIPVGKTLENIKENGLIDGDEQLNENYQKAKIIVDDFLRD
    FINKALNNTQIGNWRELADALNKEDEDNIEKLQDKIRGIIVSKFETFDLFSSYSIKKDEK
    IIDDDNDVEEEELDLGKKTSSFKYIFKKNLFKLVLPSYLKTTNQDKLKIISSFDNFSTYF
    RGFFENRKNIFTKKPISTSIAYRIVHDNFPKFLDNIRCFNVWQTECPQLIVKADNYLKSK
    NVIAKDKSLANYFTVGAYDYFLSQNGIDFYNNIIGGLPAFAGHEKIQGLNEFINQECQKD
    SELKSKLKNRHAFKMAVLFKQILSDREKSFVIDEFESDAQVIDAVKNFYAEQCKDNNVIF
    NLLNLIKNIAFLSDDELDGIFIEGKYLSSVSQKLYSDWSKLRNDIEDSANSKQGNKELAK
    KIKTNKGDVEKAISKYEFSLSELNSIVHDNTKFSDLLSCTLHKVASEKLVKVNEGDWPKH
    LKNNEEKQKIKEPLDALLEIYNTLLIFNCKSFNKNGNFYVDYDRCINELSSVVYLYNKTR
    NYCTKKPYNTDKFKLNFNSPQLGEGFSKSKENDCLTLLFKKDDNYYVGIIRKGAKINFDD
    TQAIADNTDNCIFKMNYFLLKDAKKFIPKCSIQLKEVKAHFKKSEDDYILSDKEKFASPL
    VIKKSTFLLATAHVKGKKGNIKKFQKEYSKENPTEYRNSLNEWIAFCKEFLKTYKAATIF
    DITTLKKAEEYADIVEFYKDVDNLCYKLEFCPIKTSFIENLIDNGDLYLFRINNKDFSSK
    STGTKNLHTLYLQAIFDERNLNNPTIMLNGGAELFYRKESIEQKNRITHKAGSILVNKVC
    KDGTSLDDKIRNEIYQYENKFIDTLSDEAKKVLPNVIKKEATHDITKDKRFTSDKFFFHC
    PLTINYKEGDTKQFNNEVLSFLRGNPDINIIGIDRGERNLIYVTVINQKGEILDSVSFNT
    VTNKSSKIEQTVDYEEKLAVREKERIEAKRSWDSISKIATLKEGYLSAIVHEICLLMIKH
    NAIVVLENLNAGFKRIRGGLSEKSVYQKFEKMLINKLNYFVSKKESDWNKPSGLLNGLQL
    SDQFESFEKLGIQSGFIFYVPAAYTSKIDPTTGFANVLNLSKVRNVDAIKSFFSNFNEIS
    YSKKEALFKFSFDLDSLSKKGFSSFVKFSKSKWNVYTFGERIIKPKNKQGYREDKRINLT
    FEMKKLLNEYKVSFDLENNLIPNLTSANLKDTFWKELFFIFKTTLQLRNSVTNGKEDVLI
    SPVKNAKGEFFVSGTHNKTLPQDCDANGAYHIALKGLMILERNNLVREEKDTKKIMAISN
    VDWFEYVQKRRGVL*
    SEQ ID NO: 3
    MNNYDEFTKLYPIQKTIRFELKPQGRTMEHLETFNFFEEDRDRAEKYKILKEAIDEYHKK
    FIDEHLTNMSLDWNSLKQISEKYYKSREEKDKKVFLSEQKRMRQEIVSEFKKDDRFKDLF
    SKKLFSELLKEEIYKKGNHQEIDALKSFDKFSGYFIGLHENRKNMYSDGDEITAISNRIV
    NENFPKFLDNLQKYQEARKKYPEWIIKAESALVAHNIKMDEVFSLEYFNKVLNQEGIQRY
    NLALGGYVTKSGEKMMGLNDALNLAHQSEKSSKGRIHMTPLFKQILSEKESFSYIPDVFT
    EDSQLLPSIGGFFAQIENDKDGNIFDRALELISSYAEYDTERIYIRQADINRVSNVIFGE
    WGTLGGLMREYKADSINDINLERTCKKVDKWLDSKEFALSDVLEAIKRTGNNDAFNEYIS
    KMRTAREKIDAARKEMKFISEKISGDEESIHIIKTLLDSVQQFLHFFNLFKARQDIPLDG
    AFYAEFDEVHSKLFAIVPLYNKVRNYLTKNNLNTKKIKLNFKNPTLANGWDQNKVYDYAS
    LIFLRDGNYYLGIINPKRKKNIKFEQGSGNGPFYRKMVYKQIPGPNKNLPRVFLTSTKGK
    KEYKPSKEIIEGYEADKHIRGDKFDLDFCHKLIDFFKESIEKHKDWSKFNFYFSPTESYG
    DISEFYLDVEKQGYRMHFENISAETIDEYVEKGDLFLFQIYNKDFVKAATGKKDMHTIYW
    NAAFSPENLQDVVVKLNGEAELFYRDKSDIKEIVHREGEILVNRTYNGRTPVPDKIHKKL
    TDYHNGRTKDLGEAKEYLDKVRYFKAHYDITKDRRYLNDKIYFHVPLTLNFKANGKKNLN
    KMVIEKFLSDEKAHIIGIDRGERNLLYYSIIDRSGKIIDQQSLNVIDGFDYREKLNQREI
    EMKDARQSWNAIGKIKDLKEGYLSKAVHEITKMAIQYNAIVVMEELNYGFKRGRFKVEKQ
    IYQKFENMLIDKMNYLVFKDAPDESPGGVLNAYQLTNPLESFAKLGKQTGILFYVPAAYT
    SKIDPTTGFVNLFNTSSKTNAQERKEFLQKFESISYSAKDGGIFAFAFDYRKFGTSKTDH
    KNVWTAYTNGERMRYIKEKKRNELFDPSKEIKEALTSSGIKYDGGQNILPDILRSNNNGL
    IYTMYSSFIAAIQMRVYDGKEDYIISPIKNSKGEFFRTDPKRRELPIDADANGAYNIALR
    GELTMRAIAEKFDPDSEKMAKLELKHKDWFEFMQTRGD*
    SEQ ID NO: 4
    MTKTFDSEFFNLYSLQKTVRFELKPVGETASFVEDFKNEGLKRVVSEDERRAVDYQKVKE
    IIDDYHRDFIEESLNYFPEQVSKDALEQAFHLYQKLKAAKVEEREKALKEWEALQKKLRE
    KVVKCFSDSNKARFSRIDKKELIKEDLINWLVAQNREDDIPTVETFNNFTTYFTGFHENR
    KNIYSKDDHATAISFRLIHENLPKFFDNVISFNKLKEGFPELKFDKVKEDLEVDYDLKHA
    FEIEYFVNFVTQAGIDQYNYLLGGKTLEDGTKKQGMNEQINLFKQQQTRDKARQIPKLIP
    LFKQILSERTESQSFIPKQFESDQELFDSLQKLHNNCQDKFTVLQQAILGLAEADLKKVF
    IKTSDLNALSNTIFGNYSVFSDALNLYKESLKTKKAQEAFEKLPAHSIHDLIQYLEQFNS
    SLDAEKQQSTDTVLNYFIKTDELYSRFIKSTSEAFTQVQPLFELEALSSKRRPPESEDEG
    AKGQEGFEQIKRIKAYLDTLMEAVHFAKPLYLVKGRKMIEGLDKDQSFYEAFEMAYQELE
    SLIIPIYNKARSYLSRKPFKADKFKINFDNNTLLSGWDANKETANASILFKKDGLYYLGI
    MPKGKTFLFDYFVSSEDSEKLKQRRQKTAEEALAQDGESYFEKIRYKLLPGASKMLPKVF
    FSNKNIGFYNPSDDILRIRNTASHTKNGTPQKGHSKVEFNLNDCHKMIDFFKSSIQKHPE
    WGSFGFTFSDTSDFEDMSAFYREVENQGYVISFDKIKETYIQSQVEQGNLYLFQIYNKDF
    SPYSKGKPNLHTLYWKALFEEANLNNVVAKLNGEAEIFFRRHSIKASDKVVHPANQAIDN
    KNPHTEKTQSTFEYDLVKDKRYTQDKFFFHVPISLNFKAQGVSKFNDKVNGFLKGNPDVN
    IIGIDRGERHLLYFTVVNQKGEILVQESLNTLMSDKGHVNDYQQKLDKKEQERDAARKSW
    TTVENIKELKEGYLSHVVHKLAHLIIKYNAIVCLEDLNFGFKRGRFKVEKQVYQKFEKAL
    IDKLNYLVFKEKELGEVGHYLTAYQLTAPFESFKKLGKQSGILFYVPADYTSKIDPTTGF
    VNFLDLRYQSVEKAKQLLSDFNAIRFNSVQNYFEFEIDYKKLTPKRKVGTQSKWVICTYG
    DVRYQNRRNQKGHWETEEVNVTEKLKALFASDSKTTTVIDYANDDNLIDVILEQDKASFF
    KELLWLLKLTMTLRHSKIKSEDDFILSPVKNEQGEFYDSRKAGEVWPKDADANGAYHIAL
    KGLWNLQQINQWEKGKTLNLAIKNQDWFSFIQEKPYQE*
    SEQ ID NO: 5
    MHTGGLLSMDAKEFTGQYPLSKTLRFELRPIGRTWDNLEASGYLAEDRHRAECYPRAKEL
    LDDNHRAFLNRVLPQIDMDWHPIAEAFCKVHKNPGNKELAQDYNLQLSKRRKEISAYLQD
    ADGYKGLFAKPALDEAMKIAKENGNESDIEVLEAFNGFSVYFTGYHESRENIYSDEDMVS
    VAYRITEDNFPRFVSNALIFDKLNESHPDIISEVSGNLGVDDIGKYFDVSNYNNFLSQAG
    IDDYNHIIGGHTTEDGLIQAFNVVLNLRHQKDPGFEKIQFKQLYKQILSVRTSKSYIPKQ
    FDNSKEMVDCICDYVSKIEKSETVERALKLVRNISSFDLRGIFVNKKNLRILSNKLIGDW
    DAIETALMHSSSSENDKKSVYDSAEAFTLDDIFSSVKKFSDASAEDIGNRAEDICRVISE
    TAPFINDLRAVDLDSLNDDGYEAAVSKIRESLEPYMDLFHELEIFSVGDEFPKCAAFYSE
    LEEVSEQLIEIIPLFNKARSFCTRKRYSTDKIKVNLKFPTLADGWDLNKERDNKAAILRK
    DGKYYLAILDMKKDLSSIRTSDEDESSFEKMEYKLLPSPVKMLPKIFVKSKAAKEKYGLT
    DRMLECYDKGMHKSGSAFDLGFCHELIDYYKRCIAEYPGWDVFDFKFRETSDYGSMKEFN
    EDVAGAGYYMSLRKIPCSEVYRLLDEKSIYLFQIYNKDYSENAHGNKNMHTMYWEGLFSP
    QNLESPVFKLSGGAELFFRKSSIPNDAKTVHPKGSVLVPRNDVNGRRIPDSIYRELTRYF
    NRGDCRISDEAKSYLDKVKTKKADHDIVKDRRFTVDKMMFHVPIAMNFKAISKPNLNKKV
    IDGIIDDQDLKIIGIDRGERNLIYVTMVDRKGNILYQDSLNILNGYDYRKALDVREYDNK
    EARRNWTKVEGIRKMKEGYLSLAVSKLADMIIENNAIIVMEDLNHGFKAGRSKIEKQVYQ
    KFESMLINKLGYMVLKDKSIDQSGGALHGYQLANHVTTLASVGKQCGVIFYIPAAFTSKI
    DPTTGFADLFALSNVKNVASMREFFSKMKSVIYDKAEGKFAFTFDYLDYNVKSECGRTLW
    TVYTVGERFTYSRVNREYVRKVPTDIIYDALQKAGISVEGDLRDRIAESDGDTLKSIFYA
    FKYALDMRVENREEDYIQSPVKNASGEFFCSKNAGKSLPQDSDANGAYNIALKGILQLRM
    LSEQYDPNAESIRLPLITNKAWLTFMQSGMKTWKN*
    SEQ ID NO: 6
    MDSLKDFTNLYPVSKTLRFELKPVGKTLENIEKAGILKEDEHRAESYRRVKKIIDTYHKV
    FIDSSLENMAKMGIENEIKAMLQSFCELYKKDHRTEGEDKALDKIRAVLRGLIVGAFTGV
    CGRRENTVQNEKYESLFKEKLIKEILPDFVLSTEAESLPFSVEEATRSLKEFDSFTSYFA
    GFYENRKNIYSTKPQSTAIAYRLIHENLPKFIDNILVFQKIKEPIAKELEHIRADFSAGG
    YIKKDERLEDIFSLNYYIHVLSQAGIEKYNALIGKIVTEGDGEMKGLNEHINLYNQQRGR
    EDRLPLFRPLYKQILSDREQLSYLPESFEKDEELLRALKEFYDHIAEDILGRTQQLMTSI
    SEYDLSRIYVRNDSQLTDISKKMLGDWNAIYMARERAYDHEQAPKRITAKYERDRIKALK
    GEESISLANLNSCIAFLDNVRDCRVDTYLSTLGQKEGPHGLSNLVENVFASYHEAEQLLS
    FPYPEENNLIQDKDNVVLIKNLLDNISDLQRFLKPLWGMGDEPDKDERFYGEYNYIRGAL
    DQVIPLYNKVRNYLTRKPYSTRKVKLNFGNSQLLSGWDRNKEKDNSCVILRKGQNFYLAI
    MNNRHKRSFENKVLPEYKEGEPYFEKMDYKFLPDPNKMLPKVFLSKKGIEIYKPSPKLLE
    QYGHGTHKKGDTFSMDDLHELIDFFKHSIEAHEDWKQFGFKFSDTATYENVSSFYREVED
    QGYKLSFRKVSESYVYSLIDQGKLYLFQIYNKDFSPCSKGTPNLHTLYWRMLFDERNLAD
    VIYKLDGKAEIFFREKSLKNDHPTHPAGKPIKKKSRQKKGEESLFEYDLVKDRHYTMDKF
    QFHVPITMNFKCSAGSKVNDMVNAHIREAKDMHVIGIDRGERNLLYICVIDSRGTILDQI
    SLNTINDIDYHDLLESRDKDRQQERRNWQTIEGIKELKQGYLSQAVHRIAELMVAYKAVV
    ALEDLNMGFKRGRQKVESSVYQQFEKQLIDKLNYLVDKKKRPEDIGGLLRAYQFTAPFKS
    FKEMGKQNGFLFYIPAWNTSNIDPTTGFVNLFHAQYENVDKAKSFFQKFDSISYNPKKDW
    FEFAFDYKNFTKKAEGSRSMWILCTHGSRIKNFRNSQKNGQWDSEEFALTEAFKSLFVRY
    EIDYTADLKTAIVDEKQKDFFVDLLKLFKLTVQMRNSWKEKDLDYLISPVAGADGRFFDT
    REGNKSLPKDADANGAYNIALKGLWALRQIRQTSEGGKLKLAISNKEWLQFVQERSYEKD
    *
    SEQ ID NO: 7
    MNNGTNNFQNFIGISSLQKTLRNALIPTETTQQFIVKNGIIKEDELRGENRQILKDIMDD
    YYRGFISETLSSIDDIDWTSLFEKMEIQLKNGDNKDTLIKEQTEYRKAIHKKFANDDRFK
    NMFSAKLISDILPEFVIHNNNYSASEKEEKTQVIKLFSRFATSFKDYFKNRANCFSADDI
    SSSSCHRIVNDNAEIFFSNALVYRRIVKSLSNDDINKISGDMKDSLKEMSLEEIYSYEKY
    GEFITQEGISFYNDICGKVNSFMNLYCQKNKENKNLYKLQKLHKQILCIADTSYEVPYKF
    ESDEEVYQSVNGFLDNISSKHIVERLRKIGDNYNGYNLDKIYIVSKFYESVSQKTYRDWE
    TINTALEIHYNNILPGNGKSKADKVKKAVKNDLQKSITEINELVSNYKLCSDDNIKAETY
    IHEISHILNNFEAQELKYNPEIHLVESELKASELKNVLDVIMNAFHWCSVFMTEELVDKD
    NNFYAELEEIYDEIYPVISLYNLVRNYVTQKPYSTKKIKLNFGIPTLADGWSKSKEYSNN
    AIILMRDNLYYLGIFNAKNKPDKKIIEGNTSENKGDYKKMIYNLLPGPNKMIPKVFLSSK
    TGVETYKPSAYILEGYKQNKHIKSSKDFDITFCHDLIDYFKNCIAIHPEWKNFGFDFSDT
    STYEDISGFYREVELQGYKIDWTYISEKDIDLLQEKGQLYLFQIYNKDFSKKSTGNDNLH
    TMYLKNLFSEENLKDIVLKLNGEAEIFFRKSSIKNPIIHKKGSILVNRTYEAEEKDQFGN
    IQIVRKNIPENIYQELYKYFNDKSDKELSDEAAKLKNVVGHHEAATNIVKDYRYTYDKYF
    LHMPITINFKANKTGFINDRILQYIAKEKDLHVIGIDRGERNLIYVSVIDTCGNIVEQKS
    FNIVNGYDYQIKLKQQEGARQIARKEWKEIGKIKEIKEGYLSLVIHEISKMVIKYNAIIA
    MEDLSYGFKKGRFKVERQVYQKFETMLINKLNYLVFKDISITENGGLLKGYQLTYIPDKL
    KNVGHQCGCIFYVPAAYTSKIDPTTGFVNIFKFKDLTVDAKREFIKKFDSIRYDSEKNLF
    CFTFDYNNFITQNTVMSKSSWSVYTYGVRIKRRFVNGRFSNESDTIDITKDMEKTLEMTD
    INWRDGHDLRQDIIDYEIVQHIFEIFRLTVQMRNSLSELEDRDYDRLISPVLNENNIFYD
    SAKAGDALPKDADANGAYCIALKGLYEIKQITENWKEDGKFSRDKLKISNKDWFDFIQNK
    RYL*
    SEQ ID NO: 8
    MTNKFTNQYSLSKTLRFELIPQGKTLEFIQEKGLLSQDKQRAESYQEMKKTIDKFHKYFI
    DLALSNAKLTHLETYLELYNKSAETKKEQKFKDDLKKVQDNLRKEIVKSFSDGDAKSIFA
    ILDKKELITVELEKWFENNEQKDIYFDEKFKTFTTYFTGFHQNRKNMYSVEPNSTAIAYR
    LIHENLPKFLENAKAFEKIKQVESLQVNFRELMGEFGDEGLIFVNELEEMFQINYYNDVL
    SQNGITIYNSIISGFTKNDIKYKGLNEYINNYNQTKDKKDRLPKLKQLYKQILSDRISLS
    FLPDAFTDGKQVLKAIFDFYKINLLSYTIEGQEESQNLLLLIRQTIENLSSFDTQKIYLK
    NDTHLTTISQQVFGDFSVFSTALNYWYETKVNPKFETEYSKANEKKREILDKAKAVFTKQ
    DYFSIAFLQEVLSEYILTLDHTSDIVKKHSSNCIADYFKNHFVAKKENETDKTFDFIANI
    TAKYQCIQGILENADQYEDELKQDQKLIDNLKFFLDAILELLHFIKPLHLKSESITEKDT
    AFYDVFENYYEALSLLTPLYNMVRNYVTQKPYSTEKIKLNFENAQLLNGWDANKEGDYLT
    TILKKDGNYFLAIMDKKHNKAFQKFPEGKENYEKMVYKLLPGVNKMLPKVFFSNKNIAYF
    NPSKELLENYKKETHKKGDTFNLEHCHTLIDFFKDSLNKHEDWKYFDFQFSETKSYQDLS
    GFYREVEHQGYKINFKNIDSEYIDGLVNEGKLFLFQIYSKDFSPFSKGKPNMHTLYWKAL
    FEEQNLQNVIYKLNGQAEIFFRKASIKPKNIILHKKKIKIAKKHFIDKKTKTSEIVPVQT
    IKNLNMYYQGKISEKELTQDDLRYIDNFSIFNEKNKTIDIIKDKRFTVDKFQFHVPITMN
    FKATGGSYINQTVLEYLQNNPEVKIIGLDRGERHLVYLTLIDQQGNILKQESLNTITDSK
    ISTPYHKLLDNKENERDLARKNWGTVENIKELKEGYISQVVHKIATLMLEENAIVVMEDL
    NFGFKRGRFKVEKQIYQKLEKMLIDKLNYLVLKDKQPQELGGLYNALQLTNKFESFQKMG
    KQSGFLFYVPAWNTSKIDPTTGFVNYFYTKYENVDKAKAFFEKFEAIRFNAEKKYFEFEV
    KKYSDFNPKAEGTQQAWTICTYGERIETKRQKDQNNKFVSTPINLTEKIEDFLGKNQIVY
    GDGNCIKSQIASKDDKAFFETLLYWFKMTLQMRNSETRTDIDYLISPVMNDNGTFYNSRD
    YEKLENPTLPKDADANGAYHIAKKGLMLLNKIDQADLTKKVDLSISNRDWLQFVQKNK*
    SEQ ID NO: 9
    MEQEYYLGLDMGTGSVGWAVTDSEYHVLRKHGKALWGVRLFESASTAEERRMFRTSRRRL
    DRRNWRIEILQEIFAEEISKKDPGFFLRMKESKYYPEDKRDINGNCPELPYALFVDDDFT
    DKDYHKKFPTIYHLRKMLMNTEETPDIRLVYLAIHHMMKHRGHFLLSGDINEIKEFGTTF
    SKLLENIKNEELDWNLELGKEEYAVVESILKDNMLNRSTKKTRLIKALKAKSICEKAVLN
    LLAGGTVKLSDIFGLEELNETERPKISFADNGYDDYIGEVENELGEQFYIIETAKAVYDW
    AVLVEILGKYTSISEAKVATYEKHKSDLQFLKKIVRKYLTKEEYKDIFVSTSDKLKNYSA
    YIGMTKINGKKVDLQSKRCSKEEFYDFIKKNVLKKLEGQPEYEYLKEELERETFLPKQVN
    RDNGVIPYQIHLYELKKILGNLRDKIDLIKENEDKLVQLFEFRIPYYVGPLNKIDDGKEG
    KFTWAVRKSNEKIYPWNFENVVDIEASAEKFIRRMTNKCTYLMGEDVLPKDSLLYSKYMV
    LNELNNVKLDGEKLSVELKQRLYTDVFCKYRKVTVKKIKNYLKCEGIISGNVEITGIDGD
    FKASLTAYHDFKEILTGTELAKKDKENIITNIVLFGDDKKLLKKRLNRLYPQITPNQLKK
    ICALSYTGWGRFSKKFLEEITAPDPETGEVWNIITALWESNNNLMQLLSNEYRFMEEVET
    YNMGKQTKTLSYETVENMYVSPSVKRQIWQTLKIVKELEKVMKESPKRVFIEMAREKQES
    KRTESRKKQLIDLYKACKNEEKDWVKELGDQEEQKLRSDKLYLYYTQKGRCMYSGEVIEL
    KDLWDNTKYDIDHIYPQSKTMDDSLNNRVLVKKKYNATKSDKYPLNENIRHERKGFWKSL
    LDGGFISKEKYERLIRNTELSPEELAGFIERQIVETRQSTKAVAEILKQVFPESEIVYVK
    AGTVSRFRKDFELLKVREVNDLHHAKDAYLNIVVGNSYYVKFTKNASWFIKENPGRTYNL
    KKMFTSGWNIERNGEVAWEVGKKGTIVTVKQIMNKNNILVTRQVHEAKGGLFDQQIMKKG
    KGQIAIKETDERLASIEKYGGYNKAAGAYFMLVESKDKKGKTIRTIEFIPLYLKNKIESD
    ESIALNFLEKGRGLKEPKILLKKIKIDTLFDVDGFKMWLSGRTGDRLLFKCANQLILDEK
    IIVTMKKIVKFIQRRQENRELKLSDKDGIDNEVLMEIYNTFVDKLENTVYRIRLSEQAKT
    LIDKQKEFERLSLEDKSSTLFEILHIFQCQSSAANLKMIGGPGKAGILVMNNNISKCNKI
    SIINQSPTGIFENEIDLLK
    SEQ ID NO: 10
    MNKFENFTGLYPISKTLRFELIPQGKTLEYIEKSEILENDNYRAEKYEEVKDIIDGYHKW
    FINETLHDLHINWSELKVALENNRIEKSDASKKELQRVQKIKREEIYNAFIEHEAFQYLF
    KENLLSDLLPIQIEQSEDLDAEKKKQAVETFNRFSTYFTGFHENRKNIYSKEGISTSVTY
    RIVHDNFPKFLENMKVFEILRNECPEVISDTANELAPFIDGVRIEDIFLIDFFNSTFSQN
    GIDYYNRILGGVTTETGEKYRGINEFTNLYRQQHPEFGKSKKATKMVVLFKQILSDRDTL
    SFIPEMFGNDKQVQNSIQLFYNREISQFENEGVKTDVCTALATLTSKIAEFDTEKIYIQQ
    PELPNVSQRLFGSWNELNACLFKYAELKFGTAEKVANRKKIDKWLKSDLFSFTELNKALE
    FSGKDERIENYFSETGIFAQLVKTGFDEAQSILETEYTSEVHLKDQQTDIEKIKTFLDAL
    QNLMHLLKSLCVSEEADRDAAFYNEFDMLYNQLKLVVPLYNKVRNYITQKLFRSDKIKIY
    FENKGQFLGGWVDSQTENSDNGTQAGGYIFRKENVINEYDYYLGICSDPKLFRRTTIVSE
    NDRSSFERLDYYQLKTASVYGNSYCGKHPYTEDKNELVNSIDRFVHLSGNNILIEKIAKD
    KVKSNPTTNTPSGYLNFIHREAPNTYECLLQDENFVSLNQRVVSALKATLATLVRVPKAL
    VYAKKDYHLFSEIINDIDELSYEKAFSYFPVSQTEFENSSNRTIKPLLLFKISNKDLSFA
    ENFEKGNRQKIGKKNLHTLYFEALMKGNQDTIDIGTGMVFHRVKSLNYNEKTLKYGHHST
    QLNEKFSYPIIKDKRFASDKFLFHLSTEINYKEKRKPLNNSIIEFLTNNPDINIIGLDRG
    ERHLIYLTLINQKGEILRQKTFNIVGNTNYHEKLNQREKERDNARKSWATIGKIKELKEG
    FLSLVIHEIAKIMVENNAIVVLEDLNFGFKRGRFKVEKQIYQKFEKMLIDKLNYLVFKDK
    KANEAGGVLKGYQLAEKFESFQKMGKQSGFLFYVPAAYTSKIDPTTGFVNMLNLNYTNMK
    DAQTLLSGMDKISFNADANYFEFELDYEKFKTNQTDHTNKWTICTVGEKRFTYNSATKET
    TTVNVTEDLKKLLDKFEVKYSNGDNIKDEICRQTDAKFFEIILWLLKLTMQMRNSNTKTE
    EDFILSPVKNSNGEFFRSNDDANGIWPADADANGAYHIALKGLYLVKECFNKNEKSLKIE
    HKNWFKFAQTRFNGSLTKNG*
    SEQ ID NO: 11
    MENFKNLYPINKTLRFELRPYGKTLENFKKSGLLEKDAFKANSRRSMQAIIDEKFKETIE
    ERLKYTEFSECDLGNMTSKDKKITDKAATNLKKQVILSFDDEIFNNYLKPDKNIDALFKN
    DPSNPVISTFKGFTTYFVNFFEIRKHIFKGESSGSMAYRIIDENLTTYLNNIEKIKKLPE
    ELKSQLEGIDQIDKLNNYNEFITQSGITHYNEIIGGISKSENVKIQGINEGINLYCQKNK
    VKLPRLTPLYKMILSDRVSNSFVLDTIENDTELIEMISDLINKTEISQDVIMSDIQNIFI
    KYKQLGNLPGISYSSIVNAICSDYDNNFGDGKRKKSYENDRKKHLETNVYSINYISELLT
    DTDVSSNIKMRYKELEQNYQVCKENFNATNWMNIKNIKQSEKTNLIKDLLDILKSIQRFY
    DLFDIVDEDKNPSAEFYTWLSKNAEKLDFEFNSVYNKSRNYLTRKQYSDKKIKLNFDSPT
    LAKGWDANKEIDNSTIIMRKFNNDRGDYDYFLGIWNKSTPANEKIIPLEDNGLFEKMQYK
    LYPDPSKMLPKQFLSKIWKAKHPTTPEFDKKYKEGRHKKGPDFEKEFLHELIDCFKHGLV
    NHDEKYQDVFGFNLRNTEDYNSYTEFLEDVERCNYNLSFNKIADTSNLINDGKLYVFQIW
    SKDFSIDSKGTKNLNTIYFESLFSEENMIEKMFKLSGEAEIFYRPASLNYCEDIIKKGHH
    HAELKDKFDYPIIKDKRYSQDKFFFHVPMVINYKSEKLNSKSLNNRTNENLGQFTHIIGI
    DRGERHLIYLTVVDVSTGEIVEQKHLDEIINTDTKGVEHKTHYLNKLEEKSKTRDNERKS
    WEAIETIKELKEGYISHVINEIQKLQEKYNALIVMENLNYGFKNSRIKVEKQVYQKFETA
    LIKKFNYIIDKKDPETYIHGYQLTNPITTLDKIGNQSGIVLYIPAWNTSKIDPVTGFVNL
    LYADDLKYKNQEQAKSFIQKIDNIYFENGEFKFDIDFSKWNNRYSISKTKWTLTSYGTRI
    QTFRNPQKNNKWDSAEYDLTEEFKLILNIDGTLKSQDVETYKKFMSLFKLMLQLRNSVTG
    TDIDYMISPVTDKTGTHFDSRENIKNLPADADANGAYNIARKGIMAIENIMNGISDPLKI
    SNEDYLKYIQNQQE
    SEQ ID NO: 12
    MTQFEGFTNLYQVSKTLRFELIPQGKTLKHIQEQGFIEEDKARNDHYKELKPIIDRIYKT
    YADQCLQLVQLDWENLSAAIDSYRKEKTEETRNALIEEQATYRNAIHDYFIGRTDNLTDA
    INKRHAEIYKGLFKAELFNGKVLKQLGTVTTTEHENALLRSFDKFTTYFSGFYENRKNVF
    SAEDISTAIPHRIVQDNFPKFKENCHIFTRLITAVPSLREHFENVKKAIGIFVSTSIEEV
    FSFPFYNQLLTQTQIDLYNQLLGGISREAGTEKIKGLNEVLNLAIQKNDETAHIIASLPH
    RFIPLFKQILSDRNTLSFILEEFKSDEEVIQSFCKYKTLLRNENVLETAEALFNELNSID
    LTHIFISHKKLETISSALCDHWDTLRNALYERRISELTGKITKSAKEKVQRSLKHEDINL
    QEIISAAGKELSEAFKQKTSEILSHAHAALDQPLPTTLKKQEEKEILKSQLDSLLGLYHL
    LDWFAVDESNEVDPEFSARLTGIKLEMEPSLSFYNKARNYATKKPYSVEKFKLNFQMPTL
    ASGWDVNKEKNNGAILFVKNGLYYLGIMPKQKGRYKALSFEPTEKTSEGFDKMYYDYFPD
    AAKMIPKCSTQLKAVTAHFQTHTTPILLSNNFIEPLEITKEIYDLNNPEKEPKKFQTAYA
    KKTGDQKGYREALCKWIDFTRDFLSKYTKTTSIDLSSLRPSSQYKDLGEYYAELNPLLYH
    ISFQRIAEKEIMDAVETGKLYLFQIYNKDFAKGHHGKPNLHTLYWTGLFSPENLAKTSIK
    LNGQAELFYRPKSRMKRMAHRLGEKMLNKKLKDQKTPIPDTLYQELYDYVNHRLSHDLSD
    EARALLPNVITKEVSHEIIKDRRFTSDKFFFHVPITLNYQAANSPSKFNQRVNAYLKEHP
    ETPIIGIDRGERNLIYITVIDSTGKILEQRSLNTIQQFDYQKKLDNREKERVAARQAWSV
    VGTIKDLKQGYLSQVIHEIVDLMIHYQAVVVLENLNFGFKSKRTGIAEKAVYQQFEKMLI
    DKLNCLVLKDYPAEKVGGVLNPYQLTDQFTSFAKMGTQSGFLFYVPAPYTSKIDPLTGFV
    DPFVWKTIKNHESRKHFLEGFDFLHYDVKTGDFILHFKMNRNLSFQRGLPGFMPAWDIVF
    EKNETQFDAKGTPFIAGKRIVPVIENHRFTGRYRDLYPANELIALLEEKGIVFRDGSNIL
    PKLLENDDSHAIDTMVALIRSVLQMRNSNAATGEDYINSPVRDLNGVCFDSRFQNPEWPM
    DADANGAYHIALKGQLLLNHLKESKDLKLQNGISNQDWLAYIQELRN*
    SEQ ID NO: 13
    MAVKSIKVKLRLDDMPEIRAGLWKLHKEVNAGVRYYTEWLSLLRQENLYRRSPNGDGEQE
    CDKTAEECKAELLERLRARQVENGHRGPAGSDDELLQLARQLYELLVPQAIGAKGDAQQI
    ARKFLSPLADKDAVGGLGIAKAGNKPRWVRMREAGEPGWEEEKEKAETRKSADRTADVLR
    ALADFGLKPLMRVYTDSEMSSVEWKPLRKGQAVRTWDRDMFQQAIERMMSWESWNQRVGQ
    EYAKLVEQKNRFEQKNFVGQEHLVHLVNQLQQDMKEASPGLESKEQTAHYVTGRALRGSD
    KVFEKWGKLAPDAPFDLYDAEIKNVQRRNTRRFGSHDLFAKLAEPEYQALWREDASFLTR
    YAVYNSILRKLNHAKMFATFTLPDATAHPIWTRFDKLGGNLHQYTFLFNEFGERRHAIRF
    HKLLKVENGVAREVDDVTVPISMS
    EQLDNLLPRDPNEPIALYFRDYGAEQHFTGEFGGAKIQCRRDQLAHMHRRRGARDVYLNV
    SVRVQSQSEARGERRPPYAAVFRLVGDNHRAFVHFDKLSDYLAEHPDDGKLGSEGLLSGL
    RVMSVDLGLRTSASISVFRVARKDELKPNSKGRVPFFFPIKGNDNLVAVHERSQLLKLPG
    ETESKDLRAIREERQRTLRQLRTQLAYLRLLVRCGSEDVGRRERSWAKLIEQPVDAANHM
    TPDWREAFENELQKLKSLHGICSDKEWMDAVYESVRRVWRHMGKQVRDWRKDVRSGERPK
    IRGYAKDVVGGNSIEQIEYLERQYKFLKSWSFFGKVSGQVIRAEKGSRFAITLREHIDHA
    KEDRLKKLADRIIMEALGYVYALDERGKGKWVAKYPPCQLILLEELSEYQFNNDRPPSEN
    NQLMQWSHRGVFQELINQAQVHDLLVGTMYAAFSSRFDARTGAPGIRCRRVPARCTQEHN
    PEPFPWWLNKFVVEHTLDACPLRADDLIPTGEGEIFVSPFSAEEGDFHQIHADLNAAQNL
    QQRLWSDFDISQIRLRCDWGEVDGELVLIPRLTGKRTADSYSNKVFYTNTGVTYYERERG
    KKRRKVFAQEKLSEEEAELLVEADEAREKSVVLMRDPSGIINRGNWTRQKEFWSMVNQRI
    EGYLVKQIRSRVPLQDSACENTGDI*
    SEQ ID NO: 14
    MATRSFILKIEPNEEVKKGLWKTHEVLNHGIAYYMNILKLIRQEAIYEHHEQDPKNPKKV
    SKAEIQAELWDFVLKMQKCNSFTHEVDKDVVFNILRELYEELVPSSVEKKGEANQLSNKF
    LYPLVDPNSQSGKGTASSGRKPRWYNLKIAGDPSWEEEKKKWEEDKKKDPLAKILGKLAE
    YGLIPLFIPFTDSNEPIVKEIKWMEKSRNQSVRRLDKDMFIQALERFLSWESWNLKVKEE
    YEKVEKEHKTLEERIKEDIQAFKSLEQYEKERQEQLLRDTLNTNEYRLSKRGLRGWREII
    QKWLKMDENEPSEKYLEVFKDYQRKHPREAGDYSVYEFLSKKENHFIWRNHPEYPYLYAT
    FCEIDKKKKDAKQQATFTLADPINHPLWVRFEERSGSNLNKYRILTEQLHTEKLKKKLTV
    QLDRLIYPTESGGWEEKGKVDIVLLPSRQFYNQIFLDIEEKGKHAFTYKDESIKFPLKGT
    LGGARVQFDRDHLRRYPHKVESGNVGRIYFNMTVNIEPTESPVSKSLKIHRDDFPKFVNF
    KPKELTEWIKDSKGKKLKSGIESLEIGLRVMSIDLGQRQAAAASIFEVVDQKPDIEGKLF
    FPIKGTELYAVHRASFNIKLPGETLVKSREVLRKAREDNLKLMNQKLNFLRNVLHFQQFE
    DITEREKRVTKWISRQENSDVPLVYQDELIQIRELMYKPYKDWVAFLKQLHKRLEVEIGK
    EVKHWRKSLSDGRKGLYGISLKNIDEIDRTRKFLLRWSLRPTEPGEVRRLEPGQRFAIDQ
    LNHLNALKEDRLKKMANTIIMHALGYCYDVRKKKWQAKNPACQIILFEDLSNYNPYEERS
    RFENSKLMKWSRREIPRQVALQGEIYGLQVGEVGAQFSSRFHAKTGSPGIRCSVVTKEKL
    QDNRFFKNLQREGRLTLDKIAVLKEGDLYPDKGGEKFISLSKDRKLVTTHADINAAQNLQ
    KRFWTRTHGFYKVYCKAYQVDGQTVYIPESKDQKQKIIEEFGEGYFILKDGVYEWGNAGK
    LKIKKGSSKQSSSELVDSDILKDSFDLASELKGEKLMLYRDPSGNVFPSDKWMAAGVFFG
    KLERILISKLTNQYSISTIEDDSSKQSM*
    SEQ ID NO: 15
    MPTRTINLKLVLGKNPENATLRRALFSTHRLVNQATKRIEEFLLLCRGEAYRTVDNEGKE
    AEIPRHAVQEEALAFAKAAQRHNGCISTYEDQEILDVLRQLYERLVPSVNENNEAGDAQA
    ANAWVSPLMSAESEGGLSVYDKVLDPPPVWMKLKEEKAPGWEAASQIWIQSDEGQSLLNK
    PGSPPRWIRKLRSGQPWQDDFVSDQKKKQDELTKGNAPLIKQLKEMGLLPLVNPFFRHLL
    DPEGKGVSPWDRLAVRAAVAHFISWESWNHRTRAEYNSLKLRRDEFEAASDEFKDDFTLL
    RQYEAKRHSTLKSIALADDSNPYRIGVRSLRAWNRVREEWIDKGATEEQRVTILSKLQTQ
    LRGKFGDPDLFNWLAQDRHVHLWSPRDSVTPLVRINAVDKVLRRRKPYALMTFAHPRFHP
    RWILYEAPGGSNLRQYALDCTENALHITLPLLVDDAHGTWIEKKIRVPLAPSGQIQDLTL
    EKLEKKKNRLYYRSGFQQFAGLAGGAEVLFHRPYMEHDERSEESLLERPGAVWFKLTLDV
    ATQAPPNWLDGKGRVRTPPEVHHFKTALSNKSKHTRTLQPGLRVLSVDLGMRTFASCSVF
    ELIEGKPETGRAFPVADERSMDSPNKLWAKHERSFKLTLPGETPSRKEEEERSIARAEIY
    ALKRDIQRLKSLLRLGEEDNDNRRDALLEQFFKGWGEEDVVPGQAFPRSLFQGLGAAPFR
    STPELWRQHCQTYYDKAEACLAKHISDWRKRTRPRPTSREMWYKTRSYHGGKSIWMLEYL
    DAVRKLLLSWSLRGRTYGAINRQDTARFGSLASRLLHHINSLKEDRIKTGADSIVQAARG
    YIPL
    PHGKGWEQRYEPCQLILFEDLARYRFRVDRPRRENSQLMQWNHRAIVAETTMQAELYGQI
    VENTAAGFSSRFHAATGAPGVRCRFLLERDFDNDLPKPYLLRELSWMLGNTKVESEEEKL
    RLLSEKIRPGSLVPWDGGEQFATLHPKRQTLCVIHADMNAAQNLQRRFFGRCGEAFRLVC
    QPHGDDVLRLASTPGARLLGALQQLENGQGAFELVRDMGSTSQMNRFVMKSLGKKKIKPL
    QDNNGDDELEDVLSVLPEEDDTGRITVFRDSSGIFFPCNVWIPAKQFWPAVRAMIWKVMA
    SHSLG*
    SEQ ID NO: 16
    MTKLRHRQKKLTHDWAGSKKREVLGSNGKLQNPLLMPVKKGQVTEFRKAFSAYARATKGE
    MTDGRKNMFTHSFEPFKTKPSLHQCELADKAYQSLHSYLPGSLAHFLLSAHALGFRIFSK
    SGEATAFQASSKIEAYESKLASELACVDLSIQNLTISTLFNALTTSVRGKGEETSADPLI
    ARFYTLLTGKPLSRDTQGPERDLAEVISRKIASSFGTWKEMTANPLQSLQFFEEELHALD
    ANVSLSPAFDVLIKMNDLQGDLKNRTIVFDPDAPVFEYNAEDPADIIIKLTARYAKEAVI
    KNQNVGNYVKNAITTTNANGLGWLLNKGLSLLPVSTDDELLEFIGVERSHPSCHALIELI
    AQLEAPELFEKNVFSDTRSEVQGMIDSAVSNHIARLSSSRNSLSMDSEELERLIKSFQIH
    TPHCSLFIGAQSLSQQLESLPEALQSGVNSADILLGSTQYMLTNSLVEESIATYQRTLNR
    INYLSGVAGQINGAIKRKAIDGEKIHLPAAWSELISLPFIGQPVIDVESDLAHLKNQYQT
    LSNEFDTLISALQKNFDLNFNKALLNRTQHFEAMCRSTKKNALSKPEIVSYRDLLARLTS
    CLYRGSLVLRRAGIEVLKKHKIFESNSELREHVHERKHFVFVSPLDRKAKKLLRLTDSRP
    DLLHVIDEILQHDNLENKDRESLWLVRSGYLLAGLPDQLSSSFINLPIITQKGDRRLIDL
    IQYDQINRDAFVMLVTSAFKSNLSGLQYRANKQSFVVTRTLSPYLGSKLVYVPKDKDWLV
    PSQMFEGRFADILQSDYMVWKDAGRLCVIDTAKHLSNIKKSVFSSEEVLAFLRELPHRTF
    IQTEVRGLGVNVDGIAFNNGDIPSLKTFSNCVQVKVSRTNTSLVQTLNRWFEGGKVSPPS
    IQFERAYYKKDDQIHEDAAKRKIRFQMPATELVHASDDAGWTPSYLLGIDPGEYGMGLSL
    VSINNGEVLDSGFIHINSLINFASKKSNHQTKVVPRQQYKSPYANYLEQSKDSAAGDIAH
    ILDRLIYKLNALPVFEALSGNSQSAADQVWTKVLSFYTWGDNDAQNSIRKQHWFGASHWD
    IKGMLRQPPTEKKPKPYIAFPGSQVSSYGNSQRCSCCGRNPIEQLREMAKDTSIKELKIR
    NSEIQLFDGTIKLFNPDPSTVIERRRHNLGPSRIPVADRTFKNISPSSLEFKELITIVSR
    SIRHSPEFIAKKRGIGSEYFCAYSDCNSSLNSEANAAANVAQKFQKQLFFEL*
    SEQ ID NO: 17
    MKRILNSLKVAALRLLFRGKGSELVKTVKYPLVSPVQGAVEELAEAIRHDNLHLFGQKEI
    VDLMEKDEGTQVYSVVDFWLDTLRLGMFFSPSANALKITLGKFNSDQVSPFRKVLEQSPF
    FLAGRLKVEPAERILSVEIRKIGKRENRVENYAADVETCFIGQLSSDEKQSIQKLANDIW
    DSKDHEEQRMLKADFFAIPLIKDPKAVTEEDPENETAGKQKPLELCVCLVPELYTRGFGS
    IADFLVQRLTLLRDKMSTDTAEDCLEYVGIEEEKGNGMNSLLGTFLKNLQGDGFEQIFQF
    MLGSYVGWQGKEDVLRERLDLLAEKVKRLPKPKFAGEWSGHRMFLHGQLKSWSSNFFRLF
    NETRELLESIKSDIQHATMLISYVEEKGGYHPQLLSQYRKLMEQLPALRTKVLDPEIEMT
    HMSEAVRSYIMIHKSVAGFLPDLLESLDRDKDREFLLSIFPRIPKIDKKTKEIVAWELPG
    EPEEGYLFTANNLFRNFLENPKHVPRFMAERIPEDWTRLRSAPVWFDGMVKQWQKVVNQL
    VESPGALYQFNESFLRQRLQAMLTVYKRDLQTEKFLKLLADVCRPLVDFFGLGGNDIIFK
    SCQDPRKQWQTVIPLSVPADVYTACEGLAIRLRETLGFEWKNLKGHEREDFLRLHQLLGN
    LLFWIRDAKLVVKLEDWMNNPCVQEYVEARKAIDLPLEIFGFEVPIFLNGYLFSELRQLE
    LLLRRKSVMTSYSVKTTGSPNRLFQLVYLPLNPSDPEKKNSNNFQERLDTPTGLSRRFLD
    LTLDAFAGKLLTDPVTQELKTMAGFYDHLFGFKLPCKLAAMSNHPGSSSKMVVLAKPKKG
    VASNIGFEPIPDPAHPVFRVRSSWPELKYLEGLLYLPEDTPLTIELAETSVSCQSVSSVA
    FDLKNLTTILGRVGEFRVTADQPFKLTPIIPEKEESFIGKTYLGLDAGERSGVGFAIVTV
    DGDGYEVQRLGVHEDTQLMALQQVASKSLKEPVFQPLRKGTFRQQERIRKSLRGCYWNFY
    HALMIKYRAKVVHEESVGSSGLVGQWLRAFQKDLKKADVLPKKGGKNGVDKKKRESSAQD
    TLWGGAFSKKEEQQIAFEVQAAGSSQFCLKCGWWFQLGMREVNRVQESGVVLDWNRSIVT
    FLIESSGEKVYGFSPQQLEKGFRPDIETFKKMVRDFMRPPMFDRKGRPAAAYERFVLGRR
    HRRYRFDKVFEERFGRSALFICPRVGCGNFDHSSEQSAVVLALIGYIADKEGMSGKKLVY
    VRLAELMAEWKLKKLERSRVEEQSSAQ*
    SEQ ID NO: 18
    MAESKQMQCRKCGASMKYEVIGLGKKSCRYMCPDCGNHTSARKIQNKKKRDKKYGSASKA
    QSQRIAVAGALYPDKKVQTIKTYKYPADLNGEVHDSGVAEKIAQAIQEDEIGLLGPSSEY
    ACWIASQKQSEPYSVVDFWFDAVCAGGVFAYSGARLLSTVLQLSGEESVLRAALASSPFV
    DDINLAQAEKFLAVSRRTGQDKLGKRIGECFAEGRLEALGIKDRMREFVQAIDVAQTAGQ
    RFAAKLKIFGISQMPEAKQWNNDSGLTVCILPDYYVPEENRADQLVVLLRRLREIAYCMG
    IEDEAGFEHLGIDPGALSNFSNGNPKRGFLGRLLNNDIIALANNMSAMTPYWEGRKGELI
    ERLAWLKHRAEGLYLKEPHFGNSWADHRSRIFSRIAGWLSGCAGKLKIAKDQISGVRTDL
    FLLKRLLDAVPQSAPSPDFIASISALDRFLEAAESSQDPAEQVRALYAFHLNAPAVRSIA
    NKAVQRSDSQEWLIKELDAVDHLEFNKAFPFFSDTGKKKKKGANSNGAPSEEEYTETESI
    QQPEDAEQEVNGQEGNGASKNQKKFQRIPRFFGEGSRSEYRILTEAPQYFDMFCNNMRAI
    FMQLESQPRKAPRDFKCFLQNRLQKLYKQTFLNARSNKCRALLESVLISWGEFYTYGANE
    KKFRLRHEASERSSDPDYVVQQALEIARRLFLFGFEWRDCSAGERVDLVEIHKKAISFLL
    AITQAEVSVGSYNWLGNSTVSRYLSVAGTDTLYGTQLEEFLNATVLSQMRGLAIRLSSQE
    LKDGFDVQLESSCQDNLQHLLVYRASRDLAACKRATCPAELDPKILVLPVGAFIASVMKM
    IERGDEPLAGAYLRHRPHSFGWQIRVRGVAEVGMDQGTALAFQKPTESEPFKIKPFSAQY
    GPVLWLNSSSYSQSQYLDGFLSQPKNWSMRVLPQAGSVRVEQRVALIWNLQAGKMRLERS
    GARAFFMPVPFSFRPSGSGDEAVLAPNRYLGLFPHSGGIEYAVVDVLDSAGFKILERGTI
    AVNGFSQKRGERQEEAHREKQRRGISDIGRKKPVQAEVDAANELHRKYTDVATRLGCRIV
    VQWAPQPKPGTAPTAQTVYARAVRTEAPRSGNQEDHARMKSSWGYTWGTYWEKRKPEDIL
    GISTQVYWTGGIGESCPAVAVALLGHIRATSTQTEWEKEEVVFGRLKKFFPS*
    SEQ ID NO: 19
    MEKRINKIRKKLSADNATKPVSRSGPMKTLLVRVMTDDLKKRLEKRRKKPEVMPQVISNN
    AANNLRMLLDDYTKMKEAILQVYWQEFKDDHVGLMCKFAQPASKKIDQNKLKPEMDEKGN
    LTTAGFACSQCGQPLFVYKLEQVSEKGKAYTNYFGRCNVAEHEKLILLAQLKPEKDSDEA
    VTYSLGKFGQRALDFYSIHVTKESTHPVKPLAQIAGNRYASGPVGKALSDACMGTIASFL
    SKYQDIIIEHQKVVKGNQKRLESLRELAGKENLEYPSVTLPPQPHTKEGVDAYNEVIARV
    RMWVNLNLWQKLKLSRDDAKPLLRLKGFPSFPVVERRENEVDWWNTINEVKKLIDAKRDM
    GRVFWSGVTAEKRNTILEGYNYLPNENDHKKREGSLENPKKPAKRQFGDLLLYLEKKYAG
    DWGKVFDEAWERIDKKIAGLTSHIEREEARNAEDAQSKAVLTDWLRAKASFVLERLKEMD
    EKEFYACEIQLQKWYGDLRGNPFAVEAENRVVDISGFSIGSDGHSIQYRNLLAWKYLENG
    KREFYLLMNYGKKGRIRFTDGTDIKKSGKWQGLLYGGGKAKVIDLTFDPDDEQLIILPLA
    FGTRQGREFIWNDLLSLETGLIKLANGRVIEKTIYNKKIGRDEPALFVALTFERREVVDP
    SNIKPVNLIGVDRGENIPAVIALTDPEGCPLPEFKDSSGGPTDILRIGEGYKEKQRAIQA
    AKEVEQRRAGGYSRKFASKSRNLADDMVRNSARDLFYHAVTHDAVLVFENLSRGFGRQGK
    RTFMTERQYTKMEDWLTAKLAYEGLTSKTYLSKTLAQYTSKTCSNCGFTITTADYDGMLV
    RLKKTSDGWATTLNNKELKAEGQITYYNRYKRQTVEKELSAELDRLSEESGNNDISKWTK
    GRRDEALFLLKKRFSHRPVQEQFVCLDCGHEVHADEQAALNIARSWLFLNSNSTEFKSYK
    SGKQPFVGAWQAFYKRRLKEVWKPNA
    SEQ ID NO: 20
    MKRINKIRRRLVKDSNTKKAGKTGPMKTLLVRVMTPDLRERLENLRKKPENIPQPISNTS
    RANLNKLLTDYTEMKKAILHVYWEEFQKDPVGLMSRVAQPAPKNIDQRKLIPVKDGNERL
    TSSGFACSQCCQPLYVYKLEQVNDKGKPHTNYFGRCNVSEHERLILLSPHKPEANDELVT
    YSLGKFGQRALDFYSIHVTRESNHPVKPLEQIGGNSCASGPVGKALSDACMGAVASFLTK
    YQDIILEHQKVIKKNEKRLANLKDIASANGLAFPKITLPPQPHTKEGIEAYNNVVAQIVI
    WVNLNLWQKLKIGRDEAKPLQRLKGFPSFPLVERQANEVDWWDMVCNVKKLINEKKEDGK
    VFWQNLAGYKRQEALLPYLSSEEDRKKGKKFARYQFGDLLLHLEKKHGEDWGKVYDEAWE
    RIDKKVEGLSKHIKLEEERRSEDAQSKAALTDWLRAKASFVIEGLKEADKDEFCRCELKL
    QKWYGDLRGKPFAIEAENSILDISGFSKQYNCAFIWQKDGVKKLNLYLIINYFKGGKLRF
    KKIKPEAFEANRFYTVINKKSGEIVPMEVNFNFDDPNLIILPLAFGKRQGREFIWNDLLS
    LETGSLKLANGRVIEKTLYNRRTRQDEPALFVALTFERREVLDSSNIKPMNLIGIDRGEN
    IPAVIALTDPEGCPLSRFKDSLGNPTHILRIGESYKEKQRTIQAAKEVEQRRAGGYSRKY
    ASKAKNLADDMVRNTARDLLYYAVTQDAMLIFENLSRGFGRQGKRTFMAERQYTRMEDWL
    TAKLAYEGLPSKTYLSKTLAQYTSKTCSNCGFTITSADYDRVLEKLKKTATGWMTTINGK
    ELKVEGQITYYNRYKRQNVVKDLSVELDRLSEESVNNDISSWTKGRSGEALSLLKKRFSH
    RPVQEKFVCLNCGFETHADEQAALNIARSWLFLRSQEYKKYQTNKTTGNTDKRAFVETWQ
    SFYRKKLKEVWKP
    SEQ ID NO: 21
    atgGGAAAAATGTATTATCTTGGTCTGGATATAGGAACAAATTCTGTTGGATATGCCGTA
    ACCGACCCATCGTACCATTTGCTCAAATTTAAAGGCGAACCGATGTGGGGTGCCCACGTG
    TTTGCTGCGGGGAATCAATCAGCTGAACGGAGAAGCTTTCGTACGAGCCGCAGACGCCTT
    GACCGCAGGCAACAGCGTGTCAAACTGGTTCAAGAAATCTTTGCTCCCGTGATTAGTCCC
    ATTGATCCACGTTTTTTTATCAGACTTCATGAGAGCGCTTTATGGCGGGATGATGTGGCT
    GAAACGGATAAACATATTTTCTTTAATGACCCGACCTATACGGATAAGGAATATTATTCT
    GACTATCCAACCATCCATCATCTCATTGTGGACCTTATGGAAAGCAGTGAAAAGCATGAC
    CCGCGGCTTGTTTATTTGGCTGTTGCCTGGCTGGTTGCTCATCGTGGTCATTTCCTCAAT
    GAAGTGGATAAGGATAATATTGGGGATGTCCTGAGTTTTGACGCCTTTTATCCTGAGTTT
    CTGGCATTTCTTTCCGATAATGGGGTGTCACCTTGGGTATGTGAGTCAAAAGCACTCCAA
    GCGACCCTGCTTTCACGAAACTCCGTCAACGATAAGTATAAAGCCTTGAAGTCTCTGATC
    TTTGGCAGCCAAAAGCCGGAGGATAATTTTGATGCCAATATCAGTGAAGATGGACTTATC
    CAACTTTTAGCAGGAAAAAAGGTCAAGGTCAATAAACTTTTTCCTCAAGAAAGTAATGAT
    GCTTCCTTTACACTCAATGATAAGGAAGATGCAATTGAGGAAATCTTAGGAACGCTTACA
    CCGGATGAGTGTGAATGGATTGCGCATATTAGGAGGCTGTTTGATTGGGCCATCATGAAA
    CATGCTCTCAAAGATGGCAGAACAATCTCCGAATCGAAAGTAAAGCTCTATGAACAGCAT
    CACCATGACTTGACACAGCTCAAGTATTTTGTGAAGACCTATCTAGCAAAGGAATATGAT
    GACATTTTTCGAAACGTAGATAGTGAAACAACCAAAAACTATGTCGCATATTCCTATCAT
    GTAAAAGAAGTCAAGGGTACATTGCCCAAAAATAAGGCAACCCAAGAAGAATTTTGCAAG
    TATGTCCTTGGAAAGGTAAAGAACATCGAATGCAGTGAAGCTGATAAGGTTGATTTTGAT
    GAAATGATTCAGCGTCTTACAGACAATTCCTTTATGCCGAAACAAGTATCAGGTGAAAAC
    AGGGTTATCCCTTACCAGCTTTACTATTATGAACTAAAGACTATTTTGAATAAAGCCGCT
    TCTTATCTGCCTTTTTTGACCCAATGCGGAAAAGATGCCATCTCCAATCAAGATAAGCTC
    CTTTCCATCATGACCTTTCGGATTCCGTATTTCGTTGGGCCCTTGCGCAAGGACAATTCA
    GAGCATGCCTGGCTGGAACGAAAAGCAGGGAAAATCTATCCGTGGAATTTTAACGACAAA
    GTTGACCTTGATAAAAGTGAAGAAGCGTTCATTCGGAGAATGACGAATACCTGCACTTAT
    TATCCCGGTGAAGATGTTTTGCCACTTGACTCCCTTATTTATGAAAAATTCATGATCCTC
    AATGAAATCAATAATATCCGAATTGATGGTTATCCTATTTCTGTAGATGTAAAACAGCAG
    GTTTTTGGCCTCTTTGAAAAGAAGAGAAGAGTGACCGTAAAGGATATCCAGAATCTCCTG
    CTTTCCTTGGGTGCCTTGGATAAGCATGGTAAATTGACGGGAATCGATACTACCATCCAT
    AGCAATTACAATACATACCATCATTTTAAATCGCTCATGGAGCGTGGCGTTCTTACTCGT
    GATGATGTGGAACGCATTGTGGAGCGTATGACCTATAGTGATGATACAAAACGCGTCCGT
    CTTTGGCTGAACAATAATTATGGAACGCTCACTGCTGACGACGTAAAGCATATTTCAAGG
    CTCCGAAAGCATGATTTTGGCCGGCTTTCCAAAATGTTCCTCACAGGCCTAAAGGGAGTT
    CATAAGGAAACGGGGGAACGAGCTTCCATTTTGGATTTTATGTGGAATACCAATGATAAC
    TTGATGCAGCTTTTATCTGAATGTTATACTTTTTCGGATGAAATTACCAAGCTGCAGGAA
    GCATACTATGCCAAGGCGCAGCTTTCCCTGAATGATTTTCTGGACTCCATGTATATTTCA
    AATGCTGTCAAACGTCCTATCTATCGAACTCTTGCCGTTGTAAATGACATACGCAAAGCC
    TGTGGGACGGCGCCAAAACGCATTTTTATCGAAATGGCAAGAGATGGGGAAAGCAAAAAG
    AAAAGGAGCGTAACAAGAAGAGAACAAATCAAGAATCTTTATAGGTCCATCCGCAAGGAT
    TTTCAGCAGGAGGTAGATTTCCTTGAAAAAATCCTTGAAAACAAAAGCGATGGACAGCTG
    CAAAGCGATGCGCTCTATCTATACTTTGCGCAGCTTGGAAGGGATATGTATACCGGGGAC
    CCTATCAAGTTGGAGCATATCAAGGACCAGTCCTTCTATAATATTGATCATATCTATCCC
    CAAAGCATGGTCAAGGACGATAGTCTTGATAACAAGGTGTTGGTTCAATCGGAAATTAAT
    GGAGAGAAGAGCAGTCGATATCCTCTTGATGCTGCTATCCGTAATAAAATGAAGCCTCTT
    TGGGATGCTTATTATAACCATGGCCTGATTTCCCTCAAGAAGTATCAGCGTTTGACGCGG
    AGCACTCCCTTTACAGATGATGAAAAGTGGGATTTCATCAATCGGCAGCTTGTTGAGACA
    AGACAATCCACGAAGGCCTTGGCAATCTTACTAAAAAGGAAGTTCCCTGATACGGAGATT
    GTCTACTCCAAGGCAGGGCTTTCTTCTGATTTTCGGCATGAGTTTGGTCTCGTAAAATCG
    AGGAATATCAATGACCTGCACCATGCAAAGGACGCATTTCTTGCGATTGTAACAGGAAAT
    GTCTATCATGAACGCTTTAATCGCCGGTGGTTTATGGTGAACCAGCCCTATTCCGTCAAG
    ACCAAGACGTTGTTTACGCATTCTATTAAAAATGGTAATTTTGTAGCTTGGAATGGAGAA
    GAGGATCTTGGCCGCATTGTTAAAATGTTAAAGCAAAATAAGAACACTATTCATTTCACG
    CGGTTCTCTTTTGATCGAAAGGAAGGCCTGTTTGATATTCAGCCACTAAAAGCGTCAACC
    GGTCTTGTACCAAGAAAAGCCGGACTAGACGTGGTAAAATATGGTGGCTATGACAAATCG
    ACAGCAGCTTATTATCTCCTTGTTCGATTTACACTAGAAGATAAAAAGACTCAACATAAA
    TTGATGATGATTCCTGTAGAAGGCTTGTATAAAGCTCGAATTGACCATGATAAGGAATTC
    TTAACGGACTATGCACAAACTACAATCAGTGAAATCCTACAAAAAGATAAACAAAAGGTG
    ATAAATATAATGTTTCCAATGGGAACAAGGCACATTAAACTGAATTCCATGATTTCAATC
    GATGGTTTTTATCTTTCCATTGGAGGAAAGTCTAGTAAGGGAAAATCGGTGTTGTGTCAT
    GCTATGGTACCTCTTATTGTACCTCATAAGATAGAATGTTATATTAAGGCGATGGAGTCT
    TTTGCACGTAAATTTAAAGAAAATAATAAATTAAGGATTGTGGAAAAGTTTGATAAGATT
    ACGGTGGAAGATAACTTGAACCTATACGAACTATTTTTACAAAAACTTCAACATAACCCA
    TATAATAAGTTCTTCTCCACACAATTTGATGTGCTGACTAATGGAAGAAGTACATTTACT
    AAATTATCTCCAGAGGAACAAGTTCAAACGTTATTGAATATCTTATCAATTTTTAAAACT
    TGTCGGAGCTCTGGCTGCGATTTAAAATCCATTAACGGTTCTGCTCAAGCTGCCAGAATT
    ATGATCAGCGCAGATTTAACTGGACTCTCAAAAAAATATTCCGATATTCGGCTTGTTGAG
    CAATCAGCATCTGGACTTTTTGTTAGTAAATCACAAAATCTTTTGGAGTATTTAtga
    SEQ ID NO: 22
    atgtcttcattaacaaaatttacaaataaatacagtaagcagctaaccataaaaaatgaa
    ctcatcccagtaggaaagactctcgagaacattaaggaaaacggtctcatagatggagat
    gaacagctaaacgagaattatcaaaaagcaaagataatcgttgatgattttctacgagat
    ttcataaataaagctttaaataatacccaaataggaaattggagagaattagcagatgct
    ttaaataaagaagatgaagataacatagaaaagctccaagacaaaatcagaggaataatt
    gtaagtaaattcgagacatttgatttgttttcttcttactcgataaagaaagacgaaaag
    ataatagatgatgataatgatgttgaagaagaggagctagatctaggaaaaaaaacttcc
    tcatttaaatatatttttaagaaaaacctttttaaattagtacttccttcttatttaaag
    acaacaaatcaggataaactgaaaataatctcttcttttgataatttttctacctatttc
    agaggattctttgagaacagaaaaaatattttcactaagaagcctatatctacgtcaatt
    gcctacagaattgtccatgataactttccaaagtttctagataacatcagatgttttaat
    gtgtggcaaacagaatgcccacagttaattgtaaaggctgataattatttaaaatcaaag
    aacgtcatagctaaagataaatctttagcaaactattttactgtaggagcatatgattac
    ttcttatcccagaatggcattgatttctacaacaacattatcggcggtctaccagcattt
    gctggtcatgagaaaatccaaggacttaatgaatttataaatcaagaatgccaaaaggac
    agcgaactaaaatctaaactgaaaaacagacatgctttcaaaatggctgttctatttaag
    caaattctttcagatagagaaaaaagttttgttatagacgagttcgaatctgatgctcag
    gtcatagatgcggttaagaacttctatgcagaacaatgtaaggataataatgttattttt
    aaccttctaaatcttatcaagaatatagcgttcttatctgatgatgaattagatggaatt
    tttatagaaggcaagtatttaagctctgtttcccaaaagctatattcagattggtcgaag
    cttcgaaatgatattgaagatagtgcaaacagtaaacaaggaaataaagagttagcaaag
    aaaattaaaacaaataaaggcgatgttgaaaaggccataagtaaatatgagttttcttta
    tcagaacttaactcaattgtacatgataatacaaaattcagtgaccttctttcttgtacg
    ttacataaagtggctagcgaaaaactagtgaaagttaatgaaggggactggccaaaacac
    ctgaaaaataatgaagaaaaacaaaagataaaagagcctttagatgcattgttagaaatt
    tataatacattgctgatattcaactgcaagtcatttaataagaacggtaatttctatgtt
    gattatgacagatgcataaatgagctttctagtgttgtttatttatataacaaaacaaga
    aattactgtacaaagaaaccttataacacagacaaattcaaattaaactttaacagtcct
    caattaggagagggctttagtaagtcgaaagaaaatgactgtctgacattattatttaaa
    aaagacgacaattactatgttggaattatcagaaaaggggcaaaaattaactttgatgat
    acacaagccattgcagacaatacagataactgtatatttaagatgaattatttcctatta
    aaagatgctaaaaagtttattcctaaatgttcaattcagttaaaagaagtaaaagcacat
    tttaaaaaatcagaggatgattatatcctgagtgacaaagaaaaatttgcctctcccctt
    gttattaagaaatcaacatttttattagcaacagcacatgtaaaaggaaagaaaggaaac
    ataaaaaaattccaaaaggaatattctaaggaaaatccaacagaatatagaaattctctg
    aatgaatggattgcattttgtaaagaatttctaaaaacatataaggcggcaacaatcttt
    gacattacaacgttaaaaaaagctgaagaatatgctgatattgttgagttttataaggat
    gtagataatctttgttataaactagagttttgccctattaaaacatctttcattgagaat
    cttattgataatggggacttatatttattcagaatcaataataaagatttcagttcaaaa
    tctactggtacaaagaatcttcatacgctctatcttcaggcaatctttgatgaaagaaac
    ctcaataatcctactattatgttaaatggcggagcagagttattttatcgaaaagaaagc
    attgaacagaaaaataggataactcataaggcaggatcaattcttgtaaacaaggtttgt
    aaggatggaacaagtctagatgacaaaatcagaaacgaaatatatcaatatgaaaacaag
    tttattgatacattgtctgatgaagctaaaaaagttttacctaatgtaataaaaaaagaa
    gcaactcacgacataacaaaagataagcgatttacatcagataagttctttttccattgc
    ccattaacaattaactataaggaaggagatacaaaacaatttaacaatgaggttttatct
    ttccttagaggtaatccagacattaatatcatcggaattgacagaggagaaagaaacctt
    atatacgtaactgttattaatcagaaaggcgaaatacttgacagcgtttcgtttaacaca
    gtaacaaacaagtcgagcaaaattgaacaaactgttgattatgaggaaaagcttgctgtt
    agggaaaaagaaagaatagaagcaaaaagatcctgggattcaatatcaaagatagcaacc
    ttaaaagaaggttatctatcagctattgttcatgagatatgcctactgatgatcaaacac
    aacgcaatcgttgtacttgagaatctaaatgcaggatttaagagaattagaggaggatta
    tcagaaaagtctgtttatcagaaattcgagaagatgcttattaacaaactaaattacttt
    gtatctaaaaaagaatcagactggaataaacctagtggacttttaaatggtttacaactt
    tcagaccagttcgagtcatttgagaaattaggaattcaatctgggttcatcttctatgtt
    cctgcagcatatacatctaagattgatcctacaacaggatttgcaaatgttcttaactta
    tccaaggtaagaaatgttgatgcaataaagagttttttcagtaatttcaatgaaatttca
    tatagcaaaaaagaagctctctttaaattctcttttgatttagattccttatcaaagaag
    ggcttcagctcatttgtaaaattcagtaaatctaaatggaatgtatatacatttggagag
    agaataataaaaccaaagaataagcaagggtatcgtgaagataagagaattaatttaaca
    tttgaaatgaaaaaacttctgaatgaatataaagtaagttttgatcttgaaaacaactta
    attccaaatctaacctctgcaaatctgaaagataccttctggaaagaactattctttatt
    tttaaaacaactctgcagcttagaaacagtgtaacaaatggcaaagaagatgtactgatt
    tctccagtaaagaacgctaaaggagagttctttgtatcaggaactcataacaagacatta
    cctcaagactgtgatgcaaatggagcatatcatatcgccctaaaaggtctgatgattctt
    gaacgtaacaatcttgttagagaagaaaaagacacaaagaagataatggcaatttctaat
    gttgactggtttgagtatgttcaaaaaaggagaggtgtcctgtaa
    SEQ ID NO: 23
    ATGAACAACTATGATGAGTTTACCAAACTGTACCCAATACAGAAAACGATAAGGTTCGAA
    TTGAAGCCGCAGGGAAGAACGATGGAACACCTCGAAACATTCAACTTTTTCGAAGAGGAC
    AGGGATAGAGCGGAGAAATATAAGATTTTAAAGGAAGCAATCGACGAGTATCATAAGAAG
    TTTATAGACGAACATCTAACAAATATGTCTCTTGACTGGAATTCTTTAAAACAGATTTCA
    GAGAAATACTATAAGAGTAGAGAGGAAAAAGACAAGAAAGTTTTTCTGTCAGAACAGAAA
    CGCATGAGGCAAGAGATAGTTTCTGAGTTCAAAAAAGACGATCGGTTTAAAGATCTTTTT
    TCAAAAAAATTGTTTTCTGAACTTCTCAAGGAAGAGATTTACAAAAAAGGAAACCATCAG
    GAAATTGACGCATTGAAAAGTTTTGATAAATTCTCAGGCTATTTTATTGGGTTGCATGAG
    AACCGAAAAAATATGTATTCTGACGGAGACGAGATCACGGCTATCTCTAACCGTATTGTA
    AATGAGAATTTCCCGAAGTTCCTCGACAACCTTCAGAAATATCAGGAAGCTCGTAAAAAA
    TATCCAGAGTGGATCATTAAGGCAGAATCTGCTTTAGTTGCACATAATATCAAGATGGAT
    GAAGTCTTTTCCTTAGAGTATTTCAACAAAGTCCTGAATCAAGAAGGAATACAGAGATAC
    AATCTCGCCCTAGGTGGCTATGTGACCAAAAGTGGTGAGAAAATGATGGGGCTTAATGAT
    GCACTTAATCTTGCCCATCAAAGTGAAAAAAGCAGCAAGGGAAGGATACACATGACTCCA
    CTCTTCAAACAGATTCTGAGTGAAAAAGAGTCCTTTTCTTATATACCAGATGTTTTTACA
    GAAGACTCTCAACTTTTACCATCCATTGGTGGGTTCTTTGCACAAATAGAAAATGATAAG
    GACGGGAATATTTTTGACAGAGCATTAGAATTGATATCTTCTTATGCAGAATACGATACA
    GAAAGGATATATATCAGGCAAGCGGACATAAACAGAGTTTCTAATGTTATTTTCGGGGAG
    TGGGGAACACTGGGGGGGTTAATGAGGGAATACAAAGCAGACTCTATCAACGACATCAAT
    TTGGAGAGAACATGCAAGAAGGTAGACAAGTGGCTCGACTCAAAGGAGTTTGCGTTATCA
    GATGTATTAGAGGCAATAAAAAGAACCGGCAATAATGATGCTTTTAATGAATATATCTCA
    AAGATGCGCACTGCCAGGGAAAAGATTGACGCTGCAAGAAAGGAAATGAAATTCATTTCG
    GAAAAAATATCTGGAGACGAAGAATCGATCCATATTATCAAAACCTTATTGGACTCGGTG
    CAACAGTTTTTACATTTTTTCAATTTATTCAAAGCGCGTCAGGACATTCCTCTTGATGGA
    GCATTCTATGCGGAGTTCGATGAAGTCCATAGCAAACTGTTTGCTATTGTTCCGTTGTAT
    AATAAGGTTAGGAACTATCTTACGAAAAATAACCTTAACACGAAAAAGATAAAGCTAAAC
    TTCAAGAATCCAACTCTGGCAAACGGATGGGATCAAAACAAGGTATATGACTACGCCTCC
    TTAATCTTTCTCCGCGATGGTAATTATTATCTCGGAATAATAAATCCAAAAAGGAAAAAG
    AATATTAAATTCGAACAAGGGTCTGGAAATGGCCCATTCTACCGGAAGATGGTGTACAAA
    CAAATTCCAGGGCCGAACAAGAACTTACCAAGAGTCTTCCTCACATCTACGAAAGGCAAA
    AAAGAGTACAAGCCGTCAAAGGAGATAATAGAAGGATATGAAGCGGACAAACACATAAGA
    GGAGATAAATTCGATCTGGATTTCTGTCATAAGCTGATAGACTTCTTCAAGGAATCCATC
    GAGAAGCACAAGGACTGGAGTAAGTTCAACTTCTATTTCTCTCCAACTGAATCATATGGA
    GACATCAGCGAATTCTATCTGGATGTAGAAAAACAGGGATACCGGATGCATTTTGAGAAT
    ATTTCTGCCGAGACGATTGATGAGTATGTCGAAAAGGGGGACTTATTCCTCTTCCAGATA
    TACAACAAAGACTTTGTGAAAGCGGCAACCGGAAAAAAAGATATGCACACCATTTATTGG
    AACGCGGCATTCTCGCCCGAGAACCTTCAGGATGTGGTAGTGAAACTGAACGGTGAAGCA
    GAACTTTTCTACAGAGACAAGAGCGACATCAAGGAGATAGTTCACAGGGAGGGAGAGATA
    CTGGTCAATCGTACCTACAACGGCAGGACACCTGTGCCTGACAAGATCCACAAAAAATTA
    ACAGATTATCATAATGGCCGTACCAAAGATCTCGGAGAAGCAAAAGAATACCTCGATAAG
    GTCAGATATTTCAAAGCGCACTACGACATCACAAAGGATCGCAGATACCTGAATGATAAA
    ATATACTTCCATGTGCCTCTGACATTGAATTTCAAAGCAAACGGGAAGAAGAATCTCAAT
    AAGATGGTAATTGAAAAGTTCCTCTCGGACGAAAAAGCGCATATTATTGGGATTGATCGC
    GGGGAAAGGAATCTTCTTTACTATTCTATCATTGACAGGTCAGGTAAAATAATCGATCAA
    CAGAGCCTCAACGTCATCGATGGATTCGATTACCGAGAGAAACTGAATCAGAGGGAGATC
    GAGATGAAGGATGCCAGACAAAGCTGGAATGCTATCGGGAAGATAAAGGACCTCAAGGAA
    GGGTATCTTTCAAAAGCGGTCCACGAAATTACCAAGATGGCGATACAATACAATGCCATT
    GTTGTCATGGAGGAACTCAATTATGGGTTCAAACGCGGACGTTTCAAAGTTGAGAAGCAG
    ATATATCAGAAATTCGAGAATATGCTGATTGACAAGATGAATTATCTGGTATTCAAGGAT
    GCTCCGGATGAAAGTCCGGGAGGAGTCCTCAATGCATATCAGCTTACTAATCCGCTTGAA
    AGTTTCGCTAAACTTGGGAAACAGACAGGAATTCTTTTCTATGTTCCGGCAGCCTATACT
    TCGAAGATAGATCCGACGACCGGGTTTGTCAATCTTTTCAATACTTCAAGTAAAACGAAC
    GCACAGGAAAGAAAAGAATTCTTGCAAAAATTCGAGTCGATCTCCTATTCCGCTAAAGAC
    GGAGGAATATTCGCATTCGCGTTCGATTATCGGAAGTTCGGAACGTCAAAAACAGACCAC
    AAAAATGTATGGACCGCATACACGAACGGGGAAAGGATGAGGTACATAAAAGAGAAAAAA
    CGCAACGAACTGTTCGACCCCTCGAAGGAGATCAAAGAGGCTCTCACTTCATCAGGAATC
    AAATATGACGGCGGACAGAACATATTGCCAGATATCCTGAGGAGCAACAATAACGGTCTG
    ATCTACACAATGTATTCCTCTTTCATAGCGGCCATTCAAATGAGGGTCTATGACGGGAAA
    GAAGACTATATCATCTCGCCGATAAAGAACAGCAAGGGAGAGTTCTTCAGGACCGATCCG
    AAAAGAAGGGAACTTCCGATAGACGCGGATGCGAACGGCGCGTATAACATTGCTCTCAGG
    GGCGAATTGACGATGCGTGCGATAGCGGAGAAGTTCGATCCGGACTCGGAAAAGATGGCG
    AAGCTAGAACTGAAACATAAGGACTGGTTCGAATTCATGCAGACAAGGGGGGATTGA
    SEQ ID NO: 24
    ATGACAAAAACATTTGATTCAGAATTTTTTAATTTATATTCTCTTCAAAAAACAGTTCGT
    TTTGAACTCAAGCCGGTTGGTGAAACAGCCTCGTTTGTTGAAGATTTTAAAAACGAAGGT
    TTGAAACGAGTTGTTTCAGAGGATGAACGGCGTGCGGTTGATTACCAAAAAGTGAAAGAA
    ATTATTGATGACTACCACCGAGATTTTATTGAAGAATCGCTGAACTATTTTCCTGAGCAG
    GTCTCAAAAGACGCTTTGGAACAAGCTTTTCACCTTTATCAAAAACTAAAAGCCGCTAAG
    GTTGAAGAGCGTGAAAAAGCATTGAAAGAATGGGAAGCCCTTCAGAAAAAACTGCGCGAA
    AAAGTTGTTAAATGTTTTTCAGATTCAAACAAAGCACGCTTTTCCCGCATTGATAAAAAA
    GAACTGATTAAAGAAGATTTAATTAACTGGTTGGTTGCACAAAATCGCGAAGATGACATT
    CCAACCGTTGAAACCTTTAACAACTTTACGACTTATTTTACGGGGTTTCATGAAAACCGA
    AAAAACATTTATTCAAAAGACGATCATGCCACAGCCATTTCATTTCGACTCATTCATGAA
    AACCTGCCTAAGTTTTTTGATAATGTGATCAGCTTTAATAAATTGAAGGAAGGATTTCCA
    GAGCTGAAATTTGATAAGGTTAAGGAAGATTTAGAAGTTGATTATGACTTGAAACATGCC
    TTTGAAATCGAATACTTTGTCAATTTTGTTACCCAAGCCGGAATTGACCAATATAACTAT
    CTTTTGGGGGGTAAAACCTTAGAAGACGGCACCAAAAAGCAAGGCATGAATGAACAAATC
    AATCTGTTCAAGCAACAGCAAACCCGAGACAAAGCCCGACAAATTCCCAAACTCATACCA
    TTGTTTAAACAAATTCTAAGCGAACGAACGGAAAGCCAATCGTTTATTCCAAAACAATTT
    GAATCAGACCAAGAGCTATTTGACTCACTGCAAAAACTGCATAACAACTGCCAAGATAAA
    TTTACCGTACTGCAACAAGCCATTTTAGGCTTAGCCGAAGCAGATCTGAAAAAAGTATTC
    ATTAAAACATCTGATCTTAATGCGCTATCAAATACCATTTTTGGAAATTACAGTGTGTTT
    TCGGATGCGTTGAATTTATACAAAGAATCGCTCAAAACAAAAAAGGCGCAAGAAGCGTTT
    GAAAAACTACCCGCTCACAGCATTCATGACTTGATTCAATATTTGGAGCAATTTAATAGC
    TCTTTGGATGCAGAAAAACAGCAATCAACTGACACCGTACTGAATTACTTTATTAAAACA
    GACGAGCTGTATTCTCGGTTCATAAAATCAACGAGCGAAGCCTTCACACAAGTACAACCA
    CTCTTTGAATTGGAAGCATTAAGCTCAAAACGTCGTCCACCGGAAAGTGAAGACGAAGGC
    GCAAAAGGTCAGGAAGGGTTTGAGCAAATTAAACGCATAAAAGCCTATTTGGATACCTTG
    ATGGAGGCGGTGCATTTTGCAAAACCACTTTATCTGGTGAAGGGGCGCAAAATGATTGAA
    GGTCTGGACAAAGACCAAAGTTTCTATGAAGCCTTTGAAATGGCTTACCAAGAACTAGAA
    AGTCTGATTATTCCAATCTACAACAAAGCTCGTAGTTATTTAAGTCGTAAACCGTTTAAA
    GCGGACAAATTCAAAATTAATTTTGATAATAATACATTGCTTTCCGGTTGGGATGCTAAT
    AAAGAAACGGCTAACGCTTCAATTTTGTTTAAGAAGGATGGTTTGTATTATTTAGGAATC
    ATGCCTAAAGGAAAAACGTTTTTGTTCGATTACTTCGTTTCATCGGAAGATTCTGAAAAG
    TTAAAACAAAGAAGACAAAAAACCGCCGAAGAAGCGCTTGCGCAAGATGGCGAAAGCTAC
    TTTGAAAAAATTCGTTACAAGCTGTTACCTGGCGCCAGCAAAATGTTGCCGAAAGTATTT
    TTTTCCAACAAAAACATAGGGTTTTACAACCCAAGTGATGACATACTTCGTATCAGGAAT
    ACAGCCTCTCACACTAAAAACGGAACACCGCAAAAAGGGCACTCTAAAGTAGAGTTTAAT
    TTGAATGATTGTCATAAGATGATTGATTTCTTTAAATCAAGCATTCAAAAGCATCCAGAG
    TGGGGAAGTTTTGGATTCACCTTTTCAGATACATCAGATTTTGAAGATATGAGCGCCTTT
    TATCGAGAAGTCGAAAACCAAGGTTATGTCATTAGTTTCGATAAAATAAAAGAAACTTAC
    ATTCAGAGTCAAGTTGAACAGGGGAACCTATATTTATTCCAAATCTACAATAAAGACTTC
    TCGCCCTACAGCAAAGGCAAACCAAATTTACACACGCTTTACTGGAAAGCGTTGTTTGAG
    GAAGCCAACCTAAATAATGTGGTGGCAAAACTCAATGGTGAAGCTGAAATTTTCTTTAGG
    CGACACTCAATCAAAGCATCTGATAAAGTGGTGCACCCAGCGAATCAAGCCATTGACAAT
    AAAAACCCGCATACCGAAAAAACGCAAAGCACCTTTGAATATGATCTTGTAAAAGACAAG
    CGCTATACCCAAGACAAATTCTTCTTCCATGTACCGATTTCATTGAACTTTAAGGCACAA
    GGTGTTTCAAAATTTAACGATAAAGTGAATGGATTTTTAAAGGGTAACCCAGATGTCAAT
    ATTATTGGCATTGACCGAGGCGAACGACACCTTCTGTATTTCACTGTGGTGAATCAGAAA
    GGTGAAATTTTGGTTCAAGAGTCGCTTAATACCCTAATGAGTGATAAAGGGCATGTGAAT
    GACTACCAGCAAAAACTCGACAAAAAAGAACAAGAACGCGATGCCGCTCGCAAAAGCTGG
    ACGACGGTTGAAAATATCAAAGAATTAAAAGAAGGCTATTTATCTCATGTTGTTCATAAG
    TTGGCACACCTGATTATTAAATACAATGCCATTGTTTGCTTGGAAGACCTGAATTTTGGT
    TTCAAACGCGGGCGTTTTAAAGTGGAAAAACAAGTTTATCAGAAATTTGAAAAAGCGCTT
    ATTGATAAGCTTAACTACTTGGTATTTAAAGAAAAAGAGTTAGGCGAGGTGGGCCATTAT
    CTAACCGCCTATCAGTTGACCGCACCGTTTGAAAGTTTCAAGAAGTTAGGCAAGCAAAGT
    GGCATATTGTTTTATGTTCCGGCGGATTACACCTCCAAAATTGACCCAACCACCGGGTTT
    GTCAACTTTCTTGATCTGCGTTATCAGAGTGTCGAAAAAGCCAAACAGCTCTTAAGCGAC
    TTTAATGCCATTCGTTTTAATTCAGTACAAAACTATTTTGAGTTCGAAATAGATTACAAA
    AAACTCACACCCAAACGTAAAGTTGGTACTCAGAGTAAATGGGTGATTTGTACCTATGGA
    GATGTCCGCTATCAAAATCGGCGTAATCAAAAAGGTCACTGGGAAACGGAAGAAGTCAAT
    GTGACTGAAAAACTAAAAGCCCTTTTCGCCAGTGATTCCAAAACTACAACCGTAATCGAT
    TACGCCAATGACGACAACCTAATTGACGTCATTCTGGAACAGGACAAAGCCAGCTTCTTC
    AAAGAACTGTTATGGTTATTAAAACTCACCATGACGCTCCGCCACAGCAAAATCAAAAGT
    GAAGACGACTTTATTCTTTCACCCGTTAAAAACGAACAAGGCGAGTTTTACGATAGTCGA
    AAAGCGGGCGAGGTGTGGCCTAAAGATGCAGACGCCAATGGCGCTTATCACATAGCGTTG
    AAAGGCTTGTGGAATCTGCAACAGATCAATCAGTGGGAAAAGGGTAAAACACTTAATCTG
    GCGATTAAAAACCAGGATTGGTTCAGTTTTATTCAAGAAAAGCCCTATCAAGAATAA
    SEQ ID NO: 25
    ATGCACACAGGCGGATTACTTAGCATGGATGCCAAGGAGTTTACCGGACAGTACCCCCTT
    TCGAAGACTCTGCGTTTTGAACTGAGACCGATAGGCAGAACGTGGGACAATCTCGAAGCA
    TCGGGGTATCTTGCGGAGGACAGACACCGTGCAGAATGCTATCCCAGGGCAAAAGAGCTC
    TTGGACGACAACCATCGTGCATTCCTCAACCGTGTCCTGCCTCAGATCGATATGGATTGG
    CACCCGATCGCAGAGGCATTCTGCAAAGTCCACAAGAATCCGGGAAACAAGGAATTGGCT
    CAGGATTACAATCTTCAGCTGTCCAAACGCAGAAAGGAGATTTCGGCCTATCTGCAGGAT
    GCGGACGGCTATAAAGGTCTGTTTGCCAAACCTGCATTGGATGAAGCAATGAAGATCGCG
    AAAGAAAACGGAAATGAATCGGACATAGAGGTTCTTGAGGCATTCAACGGTTTCTCCGTA
    TACTTCACCGGATATCATGAGAGCAGGGAGAACATCTATTCGGACGAGGATATGGTGTCG
    GTAGCTTATCGCATCACCGAAGACAATTTCCCGAGATTCGTTTCCAATGCGCTTATATTC
    GATAAGCTGAATGAGTCGCACCCCGATATAATCTCGGAAGTATCCGGAAATCTGGGCGTA
    GACGACATCGGAAAATATTTTGATGTGTCTAACTACAATAATTTCCTGTCGCAGGCCGGT
    ATAGATGACTACAATCACATCATCGGCGGCCATACGACGGAGGACGGTCTGATCCAGGCA
    TTCAATGTTGTTCTGAATCTCAGGCATCAGAAAGACCCCGGATTCGAAAAAATCCAATTC
    AAACAGCTGTACAAACAGATACTCAGCGTCCGTACATCCAAATCCTATATCCCGAAACAG
    TTCGATAATTCGAAGGAGATGGTGGACTGCATCTGCGACTATGTGTCCAAGATCGAAAAA
    TCCGAAACGGTCGAGAGAGCATTGAAGCTGGTAAGGAACATATCTTCTTTTGATTTGCGC
    GGAATATTCGTAAACAAGAAGAATCTCCGCATTCTTTCCAACAAACTGATTGGTGATTGG
    GACGCGATCGAAACCGCGCTGATGCACTCCTCCTCTTCGGAAAATGATAAGAAATCCGTC
    TACGACAGCGCCGAGGCATTTACGCTGGATGATATCTTTTCGTCCGTTAAAAAATTCTCA
    GATGCATCTGCAGAGGATATCGGAAACCGGGCGGAGGACATATGCAGAGTCATATCTGAG
    ACCGCTCCGTTCATAAACGATCTGAGGGCTGTCGATTTGGACAGTTTGAATGACGACGGT
    TACGAGGCGGCGGTTTCCAAGATAAGGGAATCTCTGGAACCATATATGGATCTGTTTCAT
    GAACTGGAGATATTCTCCGTAGGCGATGAATTCCCGAAATGTGCAGCTTTCTACAGTGAA
    CTTGAAGAAGTCTCCGAACAGCTAATCGAGATTATACCGTTATTCAACAAGGCCCGTTCG
    TTCTGTACGCGCAAGAGATACAGTACGGACAAGATAAAGGTCAATTTGAAATTCCCGACA
    CTCGCCGACGGATGGGATCTCAACAAAGAACGCGACAACAAAGCCGCAATACTCAGGAAA
    GACGGAAAGTACTACCTGGCCATACTGGATATGAAGAAAGATCTTTCTTCGATCAGAACT
    TCGGATGAAGACGAATCCAGTTTTGAGAAAATGGAGTACAAGCTTCTTCCGAGTCCGGTA
    AAGATGCTGCCAAAGATCTTCGTAAAATCGAAGGCGGCCAAGGAGAAGTACGGTCTGACC
    GACCGTATGCTGGAGTGCTACGATAAAGGGATGCACAAGAGCGGCAGTGCATTCGATCTC
    GGATTTTGTCACGAATTGATCGATTACTACAAGAGGTGCATCGCAGAATATCCCGGCTGG
    GACGTCTTCGATTTCAAGTTCAGGGAAACATCGGATTATGGCAGCATGAAGGAGTTCAAT
    GAGGATGTTGCAGGGGCCGGATACTATATGTCCCTCAGAAAGATCCCTTGTTCGGAGGTC
    TACAGGCTTCTTGATGAGAAATCGATATATCTTTTCCAGATCTACAACAAAGATTATTCG
    GAAAACGCTCATGGGAATAAGAACATGCATACCATGTATTGGGAAGGGCTCTTTTCCCCC
    CAGAATCTGGAATCCCCTGTGTTTAAACTCAGCGGCGGTGCGGAGCTTTTCTTCCGTAAA
    TCCTCCATACCCAATGACGCCAAAACGGTCCATCCGAAGGGAAGCGTCCTGGTTCCGCGC
    AATGATGTAAACGGCCGCAGGATACCTGACAGCATATATCGGGAGCTCACCAGATATTTC
    AACCGCGGAGATTGCCGCATAAGCGACGAGGCAAAGAGTTATCTGGACAAGGTGAAAACC
    AAGAAAGCTGACCACGATATCGTGAAAGACAGGAGGTTCACGGTGGACAAGATGATGTTC
    CACGTCCCTATCGCCATGAATTTCAAAGCGATTTCGAAGCCGAATCTCAATAAAAAGGTG
    ATTGACGGCATAATCGACGACCAAGATCTGAAGATCATCGGCATAGACCGCGGAGAGCGC
    AACCTCATCTACGTAACCATGGTGGATCGCAAAGGGAACATCCTCTATCAGGATAGCCTC
    AATATTCTGAACGGATACGATTACCGTAAGGCCCTCGACGTCCGCGAATATGACAATAAA
    GAGGCTCGGAGGAACTGGACGAAGGTCGAAGGCATCCGTAAGATGAAAGAGGGGTATCTG
    TCGCTTGCAGTCAGCAAATTGGCAGATATGATCATAGAGAACAATGCGATTATCGTCATG
    GAGGATCTCAATCACGGATTCAAGGCAGGGCGTTCGAAGATAGAGAAACAGGTCTATCAG
    AAGTTCGAATCCATGCTCATAAACAAACTCGGTTACATGGTCCTCAAGGATAAGTCTATC
    GATCAGAGCGGCGGAGCTCTCCACGGATACCAGCTTGCCAACCATGTGACAACATTGGCA
    TCTGTAGGTAAACAATGTGGAGTGATATTCTACATCCCTGCTGCATTTACATCCAAGATA
    GATCCGACAACAGGATTTGCAGATCTGTTCGCCCTCAGCAATGTTAAAAACGTGGCATCT
    ATGAGAGAATTTTTCTCCAAGATGAAGTCTGTAATCTATGATAAGGCGGAGGGAAAATTC
    GCATTTACCTTCGACTATCTTGATTATAATGTGAAATCCGAGTGCGGAAGGACCCTTTGG
    ACCGTGTATACGGTCGGAGAGAGATTCACATACAGCAGGGTCAATAGAGAATATGTCAGA
    AAAGTTCCGACAGACATAATCTACGACGCATTGCAAAAGGCAGGAATATCTGTTGAAGGG
    GATCTCAGGGACAGGATTGCTGAATCGGATGGCGACACTCTGAAGAGCATATTCTATGCA
    TTCAAGTATGCATTGGATATGAGAGTAGAGAACCGCGAAGAGGATTACATACAGTCTCCT
    GTCAAAAATGCCTCCGGAGAATTCTTCTGTTCCAAGAACGCAGGCAAATCGCTCCCTCAG
    GATTCCGATGCGAACGGTGCATACAATATCGCACTCAAGGGGATCCTGCAGCTACGTATG
    CTTTCCGAGCAGTATGATCCGAATGCAGAGAGCATACGGTTGCCACTGATAACCAACAAG
    GCCTGGCTGACCTTTATGCAGTCCGGTATGAAGACATGGAAGAACTGA
    SEQ ID NO: 26
    atgGATAGTTTGAAAGATTTCACCAATCTGTACCCTGTCAGTAAGACATTGAGATTTGAA
    TTAAAGCCCGTTGGAAAGACTTTAGAAAATATCGAGAAAGCAGGTATTTTGAAAGAGGAT
    GAGCATCGTGCAGAAAGTTATCGGAGGGTGAAGAAAATAATTGATACTTATCATAAGGTA
    TTTATCGATTCTTCTCTTGAAAATATGGCTAAAATGGGTATTGAGAATGAAATAAAAGCA
    ATGCTCCAAAGTTTCTGCGAATTGTATAAAAAAGATCATCGCACTGAGGGTGAAGACAAG
    GCATTAGATAAAATTCGAGCAGTACTTCGTGGCCTGATTGTTGGGGCTTTCACTGGTGTT
    TGCGGAAGACGGGAAAATACAGTCCAAAACGAGAAGTACGAGAGTTTGTTCAAAGAAAAG
    TTGATAAAAGAAATTTTACCTGATTTTGTGCTCTCTACTGAGGCTGAAAGCTTGCCTTTC
    TCTGTTGAAGAAGCTACGAGGTCACTGAAGGAGTTTGATAGCTTTACATCCTACTTTGCT
    GGTTTTTACGAGAATAGAAAGAATATATACTCGACGAAACCTCAATCCACTGCCATTGCT
    TATCGTCTTATTCATGAGAACTTGCCGAAGTTCATTGATAATATTCTTGTTTTTCAGAAG
    ATCAAAGAGCCTATAGCCAAAGAGCTGGAACATATTCGTGCGGACTTTTCTGCCGGGGGG
    TACATAAAAAAGGATGAGAGATTGGAGGATATTTTTTCGTTGAACTATTATATCCACGTG
    TTATCTCAGGCTGGGATCGAAAAATATAACGCATTGATTGGGAAGATTGTGACAGAAGGA
    GATGGAGAGATGAAAGGGCTCAATGAACACATCAACCTTTACAACCAACAAAGAGGCAGA
    GAGGATCGGCTCCCTCTTTTTAGGCCTCTTTATAAACAGATATTGAGTGACAGAGAGCAA
    TTATCATACTTGCCTGAGAGTTTTGAAAAAGATGAGGAGCTCCTCAGGGCTCTAAAAGAG
    TTCTATGATCATATCGCAGAAGACATTCTCGGACGTACTCAACAGTTGATGACTTCTATT
    TCAGAATATGATTTATCTCGGATATACGTAAGGAACGATAGCCAATTGACTGATATATCA
    AAAAAAATGTTGGGAGATTGGAATGCTATCTACATGGCTAGAGAACGAGCATATGACCAC
    GAGCAGGCTCCCAAAAGAATCACGGCGAAATACGAGAGGGACAGGATTAAAGCTCTTAAA
    GGAGAAGAGAGTATAAGTCTGGCAAATCTTAATAGTTGTATTGCCTTTCTGGACAATGTT
    AGAGATTGCCGTGTAGATACTTATCTTTCCACACTGGGCCAGAAGGAAGGACCACATGGT
    CTATCTAATCTCGTTGAGAACGTTTTTGCCTCATACCATGAAGCAGAGCAATTGTTGAGC
    TTTCCATACCCCGAAGAGAATAATCTGATTCAGGACAAGGACAATGTGGTGTTAATTAAG
    AATCTTCTCGACAATATCAGTGATCTGCAGAGGTTCTTGAAACCTCTTTGGGGTATGGGA
    GACGAACCCGATAAAGATGAAAGATTTTATGGAGAGTATAATTATATCCGAGGAGCTCTA
    GATCAGGTGATCCCTCTGTACAATAAGGTAAGGAACTACCTCACTCGGAAGCCTTATTCG
    ACCAGAAAAGTAAAACTCAATTTTGGGAATTCTCAATTGCTTAGTGGTTGGGATAGAAAT
    AAGGAAAAGGATAATAGCTGTGTGATTTTGCGTAAGGGGCAGAACTTCTATTTGGCTATT
    ATGAACAATAGGCACAAAAGAAGTTTCGAAAACAAGGTGTTGCCCGAGTATAAGGAGGGA
    GAACCTTACTTCGAAAAGATGGATTATAAATTTTTGCCTGATCCTAATAAAATGCTTCCT
    AAGGTTTTTCTTTCGAAAAAAGGAATAGAGATATACAAACCAAGTCCGAAGCTTTTAGAA
    CAATATGGACATGGAACTCACAAAAAGGGAGATACCTTTAGTATGGATGATTTGCACGAA
    CTGATCGATTTCTTCAAACACTCAATCGAGGCTCATGAAGATTGGAAGCAATTCGGATTC
    AAATTTTCTGATACGGCTACTTATGAGAATGTATCTAGTTTCTATAGAGAAGTTGAGGAT
    CAGGGGTATAAGCTCTCTTTCCGAAAAGTTTCGGAATCTTATGTCTATTCATTAATAGAT
    CAAGGCAAGTTGTATTTATTTCAGATATACAACAAGGACTTTTCTCCCTGCAGCAAAGGG
    ACACCTAATCTGCATACCTTGTATTGGAGAATGCTTTTTGACGAGCGCAATTTGGCAGAT
    GTCATATACAAACTGGATGGGAAGGCTGAAATCTTTTTCCGAGAGAAGAGTTTGAAAAAT
    GATCATCCCACGCATCCGGCTGGTAAGCCTATCAAAAAGAAAAGTCGACAAAAAAAAGGA
    GAGGAGAGTCTGTTTGAGTATGATTTAGTCAAGGATAGGCACTATACGATGGATAAGTTC
    CAGTTTCATGTGCCTATTACTATGAATTTTAAATGTTCTGCAGGAAGCAAAGTCAATGAT
    ATGGTTAATGCTCATATTCGAGAGGCAAAGGATATGCATGTCATTGGAATTGATCGTGGA
    GAACGCAATCTGCTGTATATATGCGTGATAGATAGTCGAGGGACGATTTTGGATCAAATT
    TCTCTGAATACGATTAACGATATAGACTATCATGATTTATTGGAGAGTCGAGACAAAGAC
    CGTCAGCAGGAGCGCCGAAACTGGCAAACTATCGAAGGGATCAAGGAGCTAAAACAAGGC
    TACCTTAGTCAGGCGGTTCATCGGATAGCCGAACTGATGGTGGCTTATAAGGCTGTAGTT
    GCTTTGGAGGATTTGAATATGGGGTTCAAACGTGGGCGGCAGAAAGTAGAAAGTTCTGTT
    TATCAGCAGTTTGAGAAACAGCTGATAGATAAGCTCAACTATCTTGTGGACAAGAAGAAA
    AGGCCTGAAGATATTGGAGGATTGTTGAGAGCCTATCAATTTACGGCCCCATTTAAGAGT
    TTTAAGGAAATGGGAAAGCAAAACGGCTTCTTGTTTTATATCCCGGCTTGGAACACGAGC
    AACATAGATCCGACTACTGGATTTGTTAATTTATTTCATGCCCAGTATGAAAATGTAGAT
    AAAGCGAAGAGCTTCTTTCAAAAGTTTGATTCAATTAGTTACAACCCGAAGAAAGACTGG
    TTTGAGTTTGCATTCGATTATAAAAACTTTACTAAAAAGGCTGAAGGAAGTCGTTCTATG
    TGGATATTATGCACACATGGTTCCCGAATAAAGAATTTTAGAAATTCCCAGAAGAATGGT
    CAATGGGATTCCGAAGAATTCGCCTTGACGGAGGCTTTTAAGTCTCTTTTTGTGCGATAT
    GAGATAGATTATACCGCTGATTTGAAAACAGCTATTGTGGACGAAAAGCAAAAAGACTTC
    TTCGTGGATCTTCTGAAGCTATTCAAATTGACAGTACAGATGCGCAACAGCTGGAAAGAG
    AAGGATTTGGATTATCTAATCTCTCCTGTAGCAGGGGCTGATGGCCGTTTCTTCGATACA
    AGAGAGGGAAATAAAAGTCTGCCTAAGGATGCAGATGCCAATGGAGCTTATAATATTGCC
    CTAAAAGGACTTTGGGCTCTACGCCAGATTCGGCAAACTTCAGAAGGCGGTAAACTCAAA
    TTGGCGATTTCCAATAAGGAATGGCTACAGTTTGTGCAAGAGAGATCTTACGAGAAAGAC
    tga
    SEQ ID NO: 27
    atgaataatggaacaaataactttcagaattttatcggaatttcttctttgcagaagact
    cttaggaatgctctcattccaaccgaaacaacacagcaatttattgttaaaaacggaata
    attaaagaagatgagctaagaggagaaaatcgtcagatacttaaagatatcatggatgat
    tattacagaggtttcatttcagaaactttatcgtcaattgatgatattgactggacttct
    ttatttgagaaaatggaaattcagttaaaaaatggagataacaaagacactcttataaaa
    gaacagactgaataccgtaaggcaattcataaaaaatttgcaaatgatgatagatttaaa
    aatatgttcagtgcaaaattaatctcagatattcttcctgaatttgtcattcataacaat
    aattattctgcatcagaaaaggaagaaaaaacacaggtaattaaattattttccagattt
    gcaacgtcattcaaggactattttaaaaacagggctaattgtttttcggctgatgatata
    tcttcatcttcttgtcatagaatagttaatgataatgcagagatattttttagtaatgca
    ttggtgtataggagaattgtaaaaagtctttcaaatgatgatataaataaaatatccgga
    gatatgaaggattcattaaaggaaatgtctctggaagaaatttattcttatgaaaaatat
    ggggaatttattacacaggaaggtatatctttttataatgatatatgtggtaaagtaaat
    tcatttatgaatttatattgccagaaaaataaagaaaacaaaaatctctataagctgcaa
    aagcttcataaacagatactgtgcatagcagatacttcttatgaggtgccgtataaattt
    gaatcagatgaagaggtttatcaatcagtgaatggatttttggacaatattagttcgaaa
    catatcgttgaaagattgcgtaagattggagacaactataacggctacaatcttgataag
    atttatattgttagtaaattctatgaatcagtttcacaaaagacatatagagattgggaa
    acaataaatactgcattagaaattcattacaacaatatattacccggaaatggtaaatct
    aaagctgacaaggtaaaaaaagcggtaaagaatgatctgcaaaaaagcattactgaaatc
    aatgagcttgttagcaattataaattatgttcggatgataatattaaagctgagacatat
    atacatgaaatatcacatattttgaataattttgaagcacaggagcttaagtataatcct
    gaaattcatctggtggaaagtgaattgaaagcatctgaattaaaaaatgttctcgatgta
    ataatgaatgcttttcattggtgttcggttttcatgacagaggagctggtagataaagat
    aataatttttatgccgagttagaagagatatatgacgaaatatatccggtaatttcattg
    tataatcttgtgcgtaattatgtaacgcagaagccatatagtacaaaaaaaattaaattg
    aattttggtattcctacactagcggatggatggagtaaaagtaaagaatatagtaataat
    gcaattattctcatgcgtgataatttgtactatttaggaatatttaatgcaaaaaataag
    cctgacaaaaagataattgaaggtaatacatcagaaaataaaggggattataagaagatg
    atttataatcttctgccaggaccaaataaaatgatccccaaggtattcctctcttcaaaa
    accggagtggaaacatataagccgtctgcctatatattggagggctataaacaaaacaag
    catattaaatcctctaaggattttgatataacattttgtcacgatttgattgattatttt
    aagaactgtatagcaatacatcctgaatggaagaattttggctttgatttttctgacacc
    tccacatatgaagatatcagcggattttacagagaagtcgaattacaaggttataaaatc
    gactggacatatatcagcgaaaaggatattgatttgttgcaggaaaaaggacagttatat
    ttattccaaatatataacaaagatttttccaagaaaagtaccggaaatgataatcttcat
    actatgtatttgaagaatttgtttagtgaagagaatttaaaggatattgtactgaaatta
    aacggtgaggcggaaatcttctttagaaaatcaagcataaagaatccaataattcataaa
    aaaggctctattcttgttaatagaacatatgaagcagaggaaaaagatcaatttggaaat
    atccagatagtcagaaaaaacataccggaaaatatatatcaggagctttataaatatttc
    aatgataaaagtgataaagaactttcggatgaagcagctaagcttaagaatgtagtaggt
    catcatgaggctgctacaaacatagtaaaagattatagatatacatatgataaatatttt
    cttcatatgcctattacaatcaattttaaagccaataagacaggctttattaatgacaga
    atattacaatatattgctaaagaaaaggatttgcatgtaataggcattgatcgtggtgaa
    agaaacctgatatatgtttcagtaattgatacttgtggaaatattgttgaacaaaaatcg
    tttaacattgttaatggatatgattatcagattaagctcaagcagcaggagggggcgcga
    caaatcgcacgaaaagaatggaaagaaatcggcaaaataaaagaaattaaagaaggctat
    ttatctcttgtaattcatgaaatttcaaagatggttattaaatataatgccataattgca
    atggaggatttaagctacggatttaaaaaaggtcgtttcaaggttgagcgacaggtttac
    cagaagtttgagacaatgcttatcaacaaactcaactatctggtatttaaagatatatcc
    ataacggaaaacggtggtcttctaaagggataccagcttacatatattccagataaactg
    aaaaatgtgggtcatcaatgtggctgtatattttatgtacctgctgcctatacatcaaaa
    atagatcctacaaccggatttgtaaatatattcaaatttaaagatttaacagttgatgcg
    aagagagaatttataaaaaaatttgacagtatcagatatgattcagaaaaaaatctgttt
    tgttttacattcgattataataactttattacgcaaaatactgttatgtcaaagtcaagc
    tggagtgtatatacgtacggagttaggataaaaagaagatttgtcaatggcaggttctca
    aatgaatcggatacaattgatataacaaaagatatggaaaaaacactcgaaatgacagat
    ataaattggagagatggtcatgatctgaggcaggatattattgattatgaaatcgtacaa
    cacatatttgagatttttagattgactgtacaaatgagaaacagtttaagtgaattagaa
    gacagggattatgaccgtttgatttctccggtgctcaatgaaaataatatattttatgat
    tcagctaaagcaggagatgcgttacctaaagacgcagatgctaatggtgcatattgtata
    gctctaaaaggcttgtatgaaatcaaacaaattacagagaattggaaagaagacggtaag
    ttttcaagagataaacttaaaatttccaataaggactggtttgactttattcaaaataaa
    aggtatttataa
    SEQ ID NO: 28
    atgacaaacaaatttacaaaccagtactcgctttccaaaacacttcgatttgagttgatt
    ccacaaggaaaaacattggaatttattcaagaaaaaggattgctctctcaagataaacaa
    cgagcggagagttatcaagaaatgaaaaaaactattgataaatttcataaatactttatc
    gatttagctttaagcaatgctaaactaactcatttagaaacttacttggaattatacaat
    aaaagtgctgaaacaaaaaaagaacaaaaatttaaagacgatttaaagaaagtacaagac
    aatttacgaaaagaaatcgttaaatctttttcagatggtgatgcaaaatcaatttttgca
    attttggataaaaaagaactgattaccgtagaacttgaaaaatggtttgaaaacaacgaa
    caaaaagacatttattttgacgaaaaattcaaaacgtttactacttattttactggtttt
    catcaaaacagaaaaaacatgtattcggttgaacccaattctacagcaattgcttatcga
    ttgattcatgaaaatttacctaaatttttagaaaatgctaaagcatttgaaaaaataaaa
    caagtagaaagtttgcaagttaattttagagaattaatgggggaatttggagatgaaggg
    ctaattttcgtaaatgaattagaagaaatgtttcaaatcaattattataatgatgtgctt
    tcacaaaatggaattacaatttataatagtataatttcaggatttaccaaaaatgatata
    aaatataaaggtctaaatgaatacataaataattacaatcaaaccaaagacaaaaaagac
    cgtttgccaaaattaaaacaattgtataaacagattttgagtgataggatttcactttcg
    tttttgcccgatgcttttacggatgggaaacaagttttgaaagccatatttgacttttat
    aaaatcaacttactttcttataccattgaaggacaggaagaaagccaaaatcttttacta
    ttaattcgtcagacaattgaaaacctttctagttttgatacccaaaaaatttatctaaaa
    aatgatacccatttaaccactatttcacaacaagtatttggcgatttttcggtgttttca
    actgctttaaattattggtatgaaactaaagtaaatccaaaatttgaaacggaatatagc
    aaagccaacgaaaaaaaacgagaaattttagataaagccaaagcggtatttacaaaacaa
    gattatttttcaattgcttttttacaagaagtactttcggaatacattcttaccttagat
    cacacttctgatattgtaaaaaagcattcctccaactgtattgcggattattttaaaaat
    cattttgtagccaaaaaagaaaatgaaaccgacaaaacctttgattttattgctaatatt
    actgcaaaataccaatgtattcaaggtattttagaaaatgcagaccaatacgaagacgaa
    ctcaaacaagaccaaaaattaattgataatttgaaattctttttagatgctattttagaa
    ttgttgcattttattaaacctttgcatttaaaatcagaaagcattaccgaaaaagacact
    gctttttatgatgtgtttgaaaattattacgaagcattgagtttgttgaccccattatat
    aatatggtgcgaaactatgtaacgcaaaagccgtacagcaccgaaaaaataaaattaaat
    tttgaaaatgcacaattattgaatggttgggatgccaataaagaaggtgattacctaact
    accattttgaaaaaagacggtaattattttttagccataatggataaaaagcataacaaa
    gcgtttcaaaagtttccagaaggaaaagaaaattatgaaaaaatggtgtataaactattg
    cctggagtaaataagatgttgccaaaagtatttttttccaataaaaatattgcttacttc
    aacccatcaaaagagttattagaaaactataaaaaagagacgcacaaaaaaggagacaca
    ttcaatttagaacattgtcatacgttgatcgattttttcaaggactctttaaacaaacat
    gaagactggaaatactttgattttcaattttctgaaacaaaatcgtatcaagatttgagt
    ggtttttatagagaagtagaacatcaaggctacaaaatcaattttaaaaatatcgattca
    gaatatattgatggtttggtgaacgaaggtaaattgtttctatttcaaatttacagcaaa
    gatttttcgcctttttccaaagggaaaccgaacatgcacactttgtattggaaagcctta
    tttgaagaacaaaatttgcaaaatgtaatctataaattgaatggacaagccgaaatattt
    tttagaaaagcctctataaaacctaaaaatataatattgcacaaaaagaaaattaaaatt
    gccaaaaagcattttattgataaaaaaacaaaaacatctgaaattgttcctgttcaaaca
    ataaaaaacctcaatatgtactaccaaggaaaaataagtgaaaaagaattaacacaagat
    gatttaaggtatattgataattttagcattttcaatgaaaaaaataaaacaattgatatt
    ataaaagacaaacgatttacggttgataaatttcagtttcatgtgccgattaccatgaac
    tttaaagcaacgggcggaagttatatcaatcaaaccgtattagaatatttgcaaaacaat
    cccgaagttaagattattggattggatagaggcgaacgccatttggtatatctgacactg
    atagaccagcaaggaaacatcttgaaacaagaaagtttgaatacaatcaccgattctaaa
    atctcgacaccttatcataagttgttggataacaaggaaaacgagcgtgacttggctcga
    aaaaattggggaacggtggaaaacatcaaagaactcaaagaaggctacatcagtcaagtg
    gtgcataaaattgctacgttgatgctggaagaaaatgccattgtggtaatggaagatttg
    aattttggatttaaacgtggacgttttaaagtggaaaaacaaatttatcaaaagctggaa
    aaaatgttgattgacaaattgaattatttggttttaaaagacaaacaacctcaggaatta
    ggcggattgtacaacgcattacaactcaccaataaatttgaaagtttccaaaaaatgggt
    aaacaatcgggctttttgttttatgtacccgcttggaacacctccaaaatagacccaacc
    acagggtttgtcaattatttttataccaaatatgaaaatgttgacaaagccaaagccttt
    tttgaaaaatttgaggcgattcgtttcaatgcagaaaagaagtattttgaatttgaagta
    aaaaaatatagcgattttaacccaaaagccgaaggcactcaacaagcctggaccatttgc
    acgtatggcgaacgaatagaaaccaaacgacaaaaagaccaaaacaacaaatttgtaagc
    actccaattaatctaaccgaaaagatagaagactttttgggtaaaaaccaaattgtttat
    ggtgatggtaattgcatcaaatctcaaattgctagcaaagacgacaaggctttttttgaa
    accttattgtattggttcaaaatgactttacaaatgcgaaacagcgaaacaagaacagat
    atagattatctaatttcgcccgtgatgaatgacaacggaacattttacaacagccgagat
    tatgaaaaattagaaaatccaactttgcccaaagatgccgatgccaacggagcgtatcat
    attgccaaaaaaggattgatgcttttgaataaaatagaccaagccgacttgacaaaaaaa
    gtggatttatctattagtaacagagattggttgcaatttgtacaaaaaaataaataa
    SEQ ID NO: 29
    atggaacaggagtactatttaggactggatatgggaaccggatctgtaggatgggctgtt
    acagattcggaatatcatgtcttgcgtaaacatggaaaagcactatggggagtccgatta
    tttgaaagtgcatcgacagcagaagaacgaagaatgttccgaacatcaagaagaagacta
    gatcgaagaaactggagaattgaaattttacaggaaatttttgcagaggaaataagtaag
    aaagatccaggatttttcttgcgaatgaaagaaagcaaatattatccagaagataagcga
    gatatcaatggaaattgtccggaactgccatatgcattatttgttgatgacgattttaca
    gataaagattatcataaaaaatttccgacaatttatcatctcaggaaaatgttgatgaat
    acagaggagacaccggatatccggttggtgtatctggcaattcatcatatgatgaagcat
    aggggccatttcttgttatctggtgacattaatgagattaaggagttcggaacgacattt
    tcaaaattgttggagaatatcaaaaatgaggaattggattggaatcttgaactgggaaaa
    gaagaatatgctgttgtagaaagtattttaaaagataacatgttaaaccgatccacaaag
    aaaaccagattaataaaagcattaaaagcaaaatcaatatgtgaaaaggctgtactgaat
    ttattggctggtggaacggtgaaattgagtgatatatttggtcttgaagaattaaatgag
    acagaaagaccgaagatttcctttgctgataatggatacgatgattatatcggagaagtt
    gaaaatgagctgggagaacaattctatattatagagacggcaaaagcagtgtatgactgg
    gcggtattagttgaaatattgggaaaatatacgtcaatttcagaagcgaaagtagcaacg
    tatgaaaaacataaatcggatttacaatttttgaaaaagatagttcggaaatatctgaca
    aaggaggaatataaagatatttttgtaagtacgagtgacaaattgaaaaattactctgct
    tatataggaatgacgaaaataaatggaaaaaaggttgatttgcagagcaaacggtgcagt
    aaagaagaattctatgattttattaagaaaaacgtacttaaaaagctagaaggacaacct
    gaatatgaatatttgaaagaagagctagaaagagaaacatttctaccaaaacaggtgaac
    agggataatggtgtaataccgtatcagattcatttgtacgagttgaaaaagatattagga
    aatttacgggataaaatagacctcattaaagagaacgaagataaactggttcaattattt
    gaattcagaattccgtattatgttggtccgctgaataagatagatgacggaaaagaggga
    aaatttacatgggctgtacggaaaagtaatgaaaagatatatccatggaattttgaaaat
    gtagttgatatagaagcaagtgcagaaaaatttatccggagaatgacaaataagtgtaca
    tatctgatgggcgaagatgtattgccgaaggattcattgctttacagtaaatatatggtt
    ttaaatgaattaaataatgtaaagttggatggcgaaaaattatctgtagaattgaaacaa
    cggttgtatacagatgtattttgtaagtatcggaaagtaactgtaaagaagataaaaaat
    tacttgaaatgtgaaggtatcatatccggcaatgtcgaaataactggaattgatggtgat
    tttaaggcatcgttaacggcatatcatgattttaaagaaatcttgacaggaacagaattg
    gctaaaaagga
    caaagaaaatattattaccaatatagtattgtttggagatgataaaaagctgctgaaaaa
    gagactgaatcgattatatcctcagattacgccgaatcagttgaagaaaatatgtgcgct
    atcctatacaggctggggaagattttctaaaaagttcttagaagaaataacagctccaga
    tccggaaacgggagaggtatggaatatcattacggcattgtgggaatcgaataataatct
    gatgcaattattaagtaatgaatatcggtttatggaagaagtcgaaacatacaatatggg
    aaaacagactaaaacattgtcgtacgaaacagtagagaatatgtatgtttctccatctgt
    gaaaagacagatatggcagacgctgaaaatcgtgaaagaattagaaaaagtaatgaaaga
    atctccgaaacgtgtatttattgagatggcgagagaaaagcaagaaagtaagagaaccga
    atcgcgtaaaaaacaactaatagatttgtataaggcttgtaaaaatgaagaaaaagattg
    ggtaaaagaactgggagatcaggaagaacagaaattacgaagcgataagttgtacctata
    ttatacgcaaaagggtcgttgtatgtattctggcgaggtaatagaactgaaagacttatg
    ggataatacaaaatatgatattgatcatatatatccacaatctaaaacgatggatgacag
    tcttaataatcgcgtattggtaaaaaagaaatataatgcaacaaaatcagataagtatcc
    attaaatgaaaatatacgacatgagagaaaaggcttttggaagtcactgttagatggagg
    gtttataagtaaagaaaaatatgaacgcttaataagaaatacagaattgagtccggaaga
    attagcaggatttattgaaaggcagattgttgaaacgaggcagagtacaaaagctgtagc
    ggaaatattaaagcaagtgtttccggaaagtgaaattgtatatgtcaaagcaggtacggt
    ttcaagattcagaaaagattttgaattactgaaagttcgagaagtgaatgatttgcatca
    cgcaaaggatgcgtatttaaatattgtagttggtaatagttattatgtgaaatttactaa
    gaatgcatcatggtttataaaagaaaatccgggacgtacttacaacttaaaaaagatgtt
    tacatcaggttggaatattgaacgaaatggagaagttgcatgggaagtcgggaaaaaagg
    aacaattgtaacggtaaaacaaataatgaataaaaataatatattggtgacaagacaggt
    tcatgaagcgaaaggtgggctgtttgatcagcagattatgaaaaaaggaaaaggtcagat
    tgctataaaggaaactgatgaacgtcttgcatcaatagaaaagtatggaggctataataa
    agctgccggggcatattttatgctggtagaatctaaagataaaaaaggaaaaacaattcg
    aacgatagaatttataccattatatttaaagaataaaatcgagtcggatgaatcaatagc
    attgaactttttagaaaaaggcagaggtttgaaagaaccaaagatactattgaaaaaaat
    taagattgatacattatttgatgtggacggattcaaaatgtggttgtctggaagaacagg
    ggacagactactatttaaatgtgcaaatcaattgattttggatgagaaaataattgtaac
    aatgaaaaaaattgtaaagtttattcaaaggagacaagaaaatagagaattaaaattatc
    tgataaagatggaattgataatgaagtacttatggaaatatataacacttttgtggataa
    gttagaaaacacagtgtatagaatacgattatccgaacaggcaaaaacgcttatagataa
    acaaaaagaatttgaaaggttatcactagaggataaaagtagtactttgtttgaaatttt
    acatatttttcagtgtcaaagtagtgcggccaatttaaaaatgataggcggacctggaaa
    agcaggaatattagttatgaataataatataagtaagtgtaacaaaatttctattataaa
    tcagtctccaacaggaattttcgaaaatgagattgatttgttaaagat
    SEQ ID NO: 30
    ATGAAATCTTTCGATTCATTCACAAATCTTTATTCTCTTTCAAAAACCTTGAAATTTGAG
    ATGAGACCTGTCGGAAATACCCAAAAAATGCTCGACAATGCAGGAGTATTTGAAAAAGAC
    AAACTAATTCAAAAAAAGTACGGAAAAACAAAGCCGTATTTCGACAGACTCCACAGAGAA
    TTTATAGAAGAAGCGCTCACGGGGGTAGAGCTAATAGGACTAGATGAGAACTTTAGGACA
    CTTGTTGACTGGCAAAAAGATAAGAAAAATAATGTCGCAATGAAAGCGTATGAAAATAGT
    TTGCAGCGGCTGAGAACGGAAATAGGTAAAATATTTAACCTAAAGGCTGAGGATTGGGTA
    AAGAACAAATATCCAATATTAGGGCTGAAAAATAAAAATACCGATATTTTATTCGAAGAG
    GCTGTATTCGGGATATTGAAAGCCCGATATGGAGAAGAAAAAGATACTTTTATAGAAGTA
    GAGGAAATAGATAAAACCGGCAAATCAAAGATCAATCAAATATCAATTTTCGATAGTTGG
    AAAGGATTTACAGGATATTTCAAAAAATTTTTTGAAACCAGAAAGAATTTTTACAAAAAC
    GACGGAACTTCTACAGCAATTGCTACAAGGATCATTGATCAAAATCTGAAAAGATTCATA
    GATAATCTGTCAATAGTTGAAAGTGTGAGACAAAAGGTTGATCTCGCCGAGACAGAAAAA
    TCTTTCAGCATATCTCTATCGCAATTCTTCTCAATAGACTTTTATAACAAGTGTCTCCTT
    CAAGATGGTATTGATTACTACAACAAGATAATCGGTGGAGAAACTCTCAAAAATGGCGAA
    AAACTAATAGGTCTCAATGAACTAATAAATCAATATAGGCAGAATAATAAGGATCAGAAA
    ATCCCATTTTTCAAACTTCTTGATAAACAAATTTTGAGTGAAAAGATATTATTTTTGGAT
    GAAATAAAAAATGACACAGAACTGATCGAGGCGCTGAGTCAGTTCGCAAAAACAGCCGAA
    GAAAAAACAAAAATTGTCAAAAAGCTTTTTGCCGATTTTGTAGAAAATAATTCCAAATAC
    GATCTTGCACAGATTTATATTTCCCAAGAAGCATTCAATACTATATCAAACAAGTGGACA
    AGCGAAACTGAGACGTTCGCTAAATATCTATTCGAAGCAATGAAGAGTGGAAAACTTGCA
    AAGTATGAGAAAAAAGATAATAGCTATAAATTTCCTGATTTTATTGCCCTTTCACAGATG
    AAGAGTGCTTTATTAAGTATCAGCCTTGAGGGACATTTTTGGAAAGAGAAATACTACAAA
    ATTTCAAAATTCCAAGAGAAGACCAATTGGGAGCAGTTTCTTGCAATTTTTCTATACGAG
    TTTAACTCTCTTTTCAGCGACAAAATAAATACAAAAGATGGAGAAACAAAGCAAGTTGGA
    TACTATCTATTTGCCAAAGACCTGCATAATCTTATCTTAAGTGAGCAGATTGATATTCCA
    AAAGATTCAAAAGTCACAATAAAAGATTTTGCCGATTCTGTACTCACAATCTACCAAATG
    GCAAAATATTTTGCGGTAGAAAAAAAACGAGCGTGGCTTGCCGAGTATGAACTAGATTCA
    TTTTATACCCAGCCAGACACAGGCTATTTACAGTTTTATGATAACGCCTACGAGGATATT
    GTGCAGGTATACAACAAGCTTCGAAACTATCTGACCAAAAAGCCATATAGCGAGGAGAAA
    TGGAAGTTGAATTTTGAAAATTCTACGCTGGCAAATGGATGGGATAAGAACAAAGAATCT
    GATAATTCAGCAGTTATTCTACAAAAAGGTGGAAAATATTATTTGGGACTGATTACTAAA
    GGACACAACAAAATTTTTGATGACCGTTTTCAAGAAAAATTTATTGTGGGAATTGAAGGT
    GGAAAATATGAAAAAATAGTCTATAAATTTTTCCCCGACCAGGCAAAAATGTTTCCCAAA
    GTGTGCTTTTCTGCAAAAGGACTCGAATTTTTTAGACCGTCTGAAGAAATTTTAAGAATT
    TATAACAATGCAGAGTTTAAAAAAGGAGAAACTTATTCAATAGATAGTATGCAGAAGTTG
    ATTGATTTTTATAAAGATTGCTTGACTAAATATGAAGGCTGGGCATGTTATACCTTTCGG
    CATCTAAAACCCACAGAAGAATACCAAAACAATATTGGAGAGTTTTTTCGAGATGTTGCA
    GAGGACGGATACAGGATTGATTTTCAAGGCATTTCAGATCAATATATTCATGAAAAAAAC
    GAGAAAGGCGAACTTCACCTTTTTGAAATCCACAATAAAGATTGGAATTTGGATAAGGCA
    CGAGACGGAAAGTCAAAAACAACACAAAAAAACCTTCATACACTCTATTTCGAATCGCTC
    TTTTCAAACGATAATGTTGTTCAAAACTTTCCAATAAAACTCAATGGTCAAGCTGAAATT
    TTTTATAGACCGAAAACGGAAAAAGACAAATTAGAATCAAAAAAAGATAAGAAAGGGAAT
    AAAGTGATTGACCATAAACGCTATAGTGAGAATAAGATTTTTTTTCATGTTCCTCTCACA
    CTAAACCGCACTAAAAATGACTCATATCGCTTTAATGCTCAAATCAACAACTTTCTCGCA
    AATAATAAAGATATCAACATCATCGGTGTAGATAGGGGAGAAAAGCATTTAGTCTATTAT
    TCGGTGATTACACAAGCTAGTGACATCTTAGAAAGTGGCTCACTAAATGAGCTAAATGGC
    GTGAATTATGCTGAAAAACTGGGAAAAAAGGCAGAAAATCGAGAACAAGCACGCAGAGAC
    TGGCAAGACGTACAAGGGATCAAAGACCTCAAGAAAGGATATATTTCACAGGTGGTGCGA
    AAGCTTGCTGATTTAGCAATTAAACACAATGCCATTATCATTCTTGAAGATTTGAATATG
    AGATTTAAACAAGTTCGGGGCGGTATCGAAAAATCCATTTATCAACAGTTAGAAAAAGCA
    CTGATAGATAAATTAAGCTTTCTTGTAGACAAAGGTGAAAAAAATCCCGAGCAAGCAGGA
    CATCTTCTGAAAGCATATCAGCTTTCGGCCCCATTTGAGACATTTCAAAAAATGGGCAAA
    CAGACGGGTATAATCTTTTATACACAAGCTTCGTATACCTCAAAAAGTGACCCTGTAACA
    GGTTGGCGACCACACCTGTATCTCAAATATTTCAGTGCCAAAAAAGCAAAAGACGATATT
    GCAAAGTTTACAAAAATAGAATTTGTAAACGATAGGTTTGAGCTTACCTATGATATAAAG
    GACTTTCAGCAAGCAAAAGAATATCCAAATAAAACTGTTTGGAAAGTTTGCTCAAATGTA
    GAGAGATTCAGGTGGGACAAAAACCTCAATCAAAACAAAGGCGGATATACTCACTACACA
    AATATAACTGAGAATATCCAAGAGCTTTTTACAAAATATGGAATTGATATCACAAAAGAT
    TTGCTCACACAGATTTCTACAATTGATGAAAAACAAAATACCTCATTTTTTAGAGATTTT
    ATTTTTTATTTCAACCTTATTTGCCAAATCAGAAATACCGATGATTCTGAGATTGCTAAA
    AAGAATGGGAAAGATGATTTTATACTGTCACCTGTTGAGCCGTTTTTCGATAGCCGAAAA
    GACAATGGAAATAAACTTCCTGAGAATGGAGATGATAACGGCGCGTATAACATAGCAAGA
    AAAGGGATTGTCATACTCAACAAA
    ATCTCACAATATTCAGAGAAAAACGAAAATTGCGAGAAAATGAAATGGGGGGATTTGTAT
    GTATCAAACATTGACTGGGACAATTTTGTAACCCAAGCTAATGCACGGCATTAA
    SEQ ID NO: 31
    ATGATTATCTTATATATTAGTACCTCGAATATGAACATGGAAGGAGTATTTATGGAAAAT
    TTTAAAAACTTGTATCCAATAAACAAAACACTTCGATTTGAATTAAGACCCTATGGAAAA
    ACATTGGAAAATTTTAAAAAATCCGGACTTTTAGAAAAAGATGCCTTTAAGGCAAATAGT
    AGACGAAGTATGCAAGCTATAATCGATGAAAAATTCAAAGAGACTATCGAAGAACGCTTA
    AAGTACACTGAATTCAGTGAATGTGATCTTGGAAACATGACATCAAAAGATAAAAAAATA
    ACTGATAAAGCAGCTACAAATTTAAAAAAGCAAGTTATCTTATCTTTTGACGATGAAATA
    TTTAATAATTACCTAAAACCTGATAAAAATATTGACGCATTATTTAAAAATGATCCTTCA
    AATCCTGTAATCTCTACATTTAAAGGTTTTACGACATATTTTGTGAATTTTTTTGAAATT
    CGAAAACATATTTTCAAGGGAGAATCATCAGGCTCAATGGCATACCGAATTATAGATGAA
    AACCTGACAACATACTTGAATAATATTGAAAAAATAAAAAAACTGCCAGAAGAATTAAAA
    TCACAGCTAGAAGGCATTGATCAGATTGATAAACTTAATAATTATAATGAGTTCATTACA
    CAGTCAGGTATAACACACTATAATGAAATCATCGGCGGTATATCAAAATCAGAGAATGTC
    AAAATACAGGGAATTAATGAAGGAATTAATCTATACTGTCAGAAGAACAAAGTTAAACTT
    CCTCGACTGACTCCGCTATACAAAATGATATTATCAGACAGAGTTTCCAACTCTTTTGTA
    TTAGACACTATTGAAAATGACACAGAATTAATTGAAATGATAAGTGATTTGATTAATAAG
    ACTGAGATTTCGCAAGATGTTATAATGTCAGATATTCAAAATATTTTCATAAAATACAAA
    CAACTTGGTAATTTGCCGGGTATCTCATATTCTTCAATAGTTAATGCTATTTGCTCGGAT
    TATGACAACAATTTCGGAGATGGGAAGCGAAAAAAATCTTACGAAAATGATCGCAAAAAG
    CATTTGGAGACTAATGTATACTCCATAAATTATATTTCTGAATTGCTTACAGATACCGAT
    GTTTCATCAAATATCAAGATGAGATATAAAGAGCTTGAGCAAAATTATCAGGTTTGCAAA
    GAAAATTTTAATGCCACAAACTGGATGAATATTAAAAATATAAAACAATCTGAAAAAACA
    AACCTTATTAAAGATTTGTTAGATATACTTAAATCGATTCAACGTTTCTATGATTTGTTT
    GATATTGTTGACGAAGATAAAAATCCAAGTGCTGAATTTTATACCTGGTTATCAAAAAAT
    GCTGAAAAGCTTGACTTTGAATTCAATTCTGTATATAACAAGTCACGAAACTATCTCACC
    AGGAAACAATACTCTGATAAAAAAATCAAGCTGAATTTTGATTCTCCAACATTGGCCAAA
    GGGTGGGATGCTAACAAAGAAATAGATAACTCCACGATTATAATGCGTAAATTTAATAAT
    GACAGAGGCGATTATGATTACTTCCTTGGCATATGGAATAAATCCACACCTGCAAATGAA
    AAAATAATCCCACTGGAGGATAATGGATTATTCGAAAAAATGCAATATAAGCTGTATCCA
    GATCCTAGTAAGATGTTACCGAAACAATTTCTATCAAAAATATGGAAGGCAAAGCATCCT
    ACGACACCTGAATTTGATAAAAAATATAAAGAGGGAAGACATAAAAAAGGTCCTGATTTC
    GAAAAAGAATTCCTGCATGAATTGATTGATTGCTTCAAACATGGTCTTGTTAATCACGAT
    GAAAAATATCAGGATGTTTTTGGCTTCAATCTCCGTAACACTGAAGATTATAATTCATAT
    ACAGAGTTTCTCGAAGATGTGGAAAGATGCAATTACAATCTTTCATTTAACAAAATTGCT
    GATACTTCAAACCTTATTAATGATGGGAAATTGTATGTATTTCAGATATGGTCAAAAGAC
    TTTTCTATTGATTCAAAAGGTACTAAAAACTTGAATACAATCTATTTTGAATCACTATTT
    TCAGAAGAAAACATGATAGAAAAAATGTTCAAGCTTTCTGGAGAGGCTGAGATATTCTAT
    CGACCAGCATCGTTGAATTATTGTGAAGATATCATAAAAAAAGGTCATCACCATGCAGAA
    TTAAAAGATAAGTTTGACTATCCTATAATAAAAGATAAGCGATATTCACAAGATAAGTTT
    TTCTTTCATGTGCCAATGGTTATAAATTATAAATCTGAGAAACTGAATTCCAAAAGCCTT
    AACAACCGAACAAATGAAAACCTGGGACAGTTTACACATATTATAGGTATAGACAGGGGC
    GAGCGGCACTTGATTTATTTAACTGTTGTTGATGTTTCCACTGGTGAAATCGTTGAACAG
    AAACATCTGGACGAAATTATCAATACTGATACCAAGGGAGTTGAACACAAAACCCATTAT
    TTGAATAAATTGGAAGAAAAATCTAAAACAAGAGATAACGAGCGTAAATCATGGGAAGCT
    ATTGAAACTATCAAAGAATTAAAAGAAGGCTATATTTCTCATGTAATTAATGAAATACAA
    AAGCTGCAAGAAAAATATAATGCCTTAATCGTAATGGAAAATCTTAACTATGGGTTCAAA
    AACTCACGAATCAAAGTTGAAAAACAGGTTTATCAAAAATTCGAGACAGCATTGATTAAA
    AAGTTCAATTATATTATTGATAAAAAAGATCCAGAAACCTATATACATGGTTACCAGCTT
    ACAAATCCTATTACCACTCTGGATAAGATTGGAAATCAATCTGGAATAGTGCTGTATATT
    CCTGCGTGGAATACTTCTAAGATAGATCCCGTCACAGGATTTGTAAACCTTCTGTACGCA
    GATGATTTGAAGTATAAAAATCAGGAGCAGGCCAAATCATTCATTCAGAAAATAGACAAC
    ATATATTTTGAAAATGGAGAGTTTAAATTTGATATTGATTTTTCCAAATGGAATAATCGC
    TACTCAATAAGTAAAACTAAATGGACGTTAACAAGTTATGGGACTCGCATCCAGACATTT
    AGAAATCCCCAGAAAAACAATAAGTGGGATTCTGCTGAATATGATTTGACAGAAGAGTTT
    AAATTAATTTTAAATATAGACGGAACGTTAAAGTCACAGGACGTAGAAACATACAAAAAA
    TTCATGTCTTTATTTAAACTAATGCTACAGCTTCGAAACTCTGTTACAGGAACCGACATT
    GATTATATGATCTCTCCTGTCACTGATAAAACAGGAACACATTTCGATTCAAGAGAAAAT
    ATTAAAAATCTTCCTGCCGATGCAGATGCCAATGGTGCCTACAACATTGCGCGCAAAGGA
    ATAATGGCTATTGAAAATATAATGAACGGTATAAGCGATCCACTAAAAATAAGCAACGAA
    GACTATTTAAAGTATATTCAGAATCAACAGGAATAA
    SEQ ID NO: 32
    ATGACCCAATTTGAAGGTTTTACCAATTTATACCAAGTTTCGAAGACCCTTCGTTTTGAA
    CTGATTCCCCAAGGAAAAACACTCAAACATATCCAGGAGCAAGGGTTCATTGAGGAGGAT
    AAAGCTCGCAATGACCATTACAAAGAGTTAAAACCAATCATTGACCGCATCTATAAGACT
    TATGCTGATCAATGTCTCCAACTGGTACAGCTTGACTGGGAGAATCTATCTGCAGCCATA
    GACTCCTATCGTAAGGAAAAAACCGAAGAAACACGAAATGCGCTGATTGAGGAGCAAGCA
    ACATATAGAAATGCGATTCATGACTACTTTATAGGTCGGACGGATAATCTGACAGATGCC
    ATAAATAAGCGCCATGCTGAAATCTATAAAGGACTTTTTAAAGCTGAACTTTTCAATGGA
    AAAGTTTTAAAGCAATTAGGGACCGTAACCACGACAGAACATGAAAATGCTCTACTCCGT
    TCGTTTGACAAATTTACGACCTATTTTTCCGGCTTTTATGAAAACCGAAAAAATGTCTTT
    AGCGCTGAAGATATCAGCACGGCAATTCCCCATCGAATCGTCCAGGACAATTTCCCTAAA
    TTTAAGGAAAACTGCCATATTTTTACAAGATTGATAACCGCAGTTCCTTCTTTGCGGGAG
    CATTTTGAAAATGTCAAAAAGGCCATTGGAATCTTTGTTAGTACGTCTATTGAAGAAGTC
    TTTTCCTTTCCCTTTTATAATCAACTTCTAACCCAAACGCAAATTGATCTTTATAATCAA
    CTTCTCGGCGGCATATCTAGGGAAGCAGGCACAGAAAAAATCAAGGGACTTAATGAAGTT
    CTCAATCTGGCTATCCAAAAAAATGATGAAACAGCCCATATAATCGCGTCCCTGCCGCAT
    CGTTTTATTCCTCTTTTTAAACAAATTCTTTCCGATCGAAATACGTTATCCTTTATTTTG
    GAAGAATTCAAAAGCGATGAGGAAGTCATCCAATCCTTCTGCAAATATAAAACCCTCTTG
    AGAAACGAAAATGTACTGGAGACTGCAGAAGCCCTTTTCAATGAATTAAATTCCATTGAT
    TTGACTCATATCTTTATTTCCCATAAAAAGTTAGAAACCATCTCTTCAGCGCTTTGTGAC
    CATTGGGATACCTTGCGCAATGCACTTTACGAAAGACGGATTTCTGAACTCACTGGCAAA
    ATAACAAAAAGTGCCAAAGAAAAAGTTCAAAGGTCATTAAAACATGAGGATATAAATCTC
    CAAGAAATTATTTCTGCTGCAGGAAAAGAACTATCAGAAGCATTCAAACAAAAAACAAGT
    GAAATTCTTTCCCATGCCCATGCTGCACTTGACCAGCCTCTTCCCACAACATTAAAAAAA
    CAGGAAGAAAAAGAAATCCTCAAATCACAGCTCGATTCGCTTTTAGGCCTTTATCATCTT
    CTTGATTGGTTTGCTGTCGATGAAAGCAATGAAGTCGACCCAGAATTCTCAGCACGGCTG
    ACAGGCATTAAACTAGAAATGGAACCAAGCCTTTCGTTTTATAATAAAGCAAGAAATTAT
    GCGACAAAAAAGCCCTATTCGGTGGAAAAATTTAAATTGAATTTTCAAATGCCAACCCTT
    GCCTCTGGTTGGGATGTCAATAAAGAAAAAAATAATGGAGCTATTTTATTCGTAAAAAAT
    GGTCTCTATTACCTTGGTATCATGCCTAAACAGAAGGGGCGCTATAAAGCCCTGTCTTTT
    GAGCCGACAGAAAAAACATCAGAAGGATTCGATAAGATGTACTATGACTACTTCCCAGAT
    GCCGCAAAAATGATTCCTAAGTGTTCCACTCAGCTAAAGGCTGTAACCGCTCATTTTCAA
    ACTCATACCACCCCCATTCTTCTCTCAAATAATTTCATTGAACCTCTTGAAATCACAAAA
    GAAATTTATGACCTGAACAATCCTGAAAAGGAGCCTAAAAAGTTTCAAACGGCTTATGCA
    AAGAAGACAGGCGATCAAAAAGGCTATAGAGAAGCGCTTTGCAAATGGATTGACTTTACG
    CGGGATTTTCTCTCTAAATATACGAAAACAACTTCAATCGATTTATCTTCACTCCGCCCT
    TCTTCGCAATATAAAGATTTAGGGGAATATTACGCCGAACTGAATCCGCTTCTCTATCAT
    ATCTCCTTCCAACGAATTGCTGAAAAGGAAATCATGGATGCTGTAGAAACGGGAAAATTG
    TATCTGTTCCAAATCTACAATAAGGATTTTGCGAAGGGCCATCACGGGAAACCAAATCTC
    CACACCCTGTATTGGACAGGTCTCTTCAGTCCTGAAAACCTTGCGAAAACCAGCATCAAA
    CTTAATGGTCAAGCAGAATTGTTCTATCGACCTAAAAGCCGCATGAAGCGGATGGCCCAT
    CGTCTTGGGGAAAAAATGCTGAACAAAAAACTAAAGGACCAGAAGACACCGATTCCAGAT
    ACCCTCTACCAAGAACTGTACGATTATGTCAACCACCGGCTAAGCCATGATCTTTCCGAT
    GAAGCAAGGGCCCTGCTTCCAAATGTTATCACCAAAGAAGTCTCCCATGAAATTATAAAG
    GATCGGCGGTTTACTTCCGATAAATTTTTCTTCCATGTTCCCATTACACTGAATTATCAA
    GCAGCCAATAGTCCCAGTAAATTCAACCAGCGTGTCAATGCCTACCTTAAGGAGCATCCG
    GAAACGCCCATCATTGGTATCGATCGTGGAGAACGCAATCTAATCTATATTACCGTCATT
    GACAGTACTGGGAAAATTTTGGAGCAGCGTTCCCTGAATACCATCCAGCAATTTGACTAC
    CAAAAAAAATTGGACAACAGGGAAAAAGAGCGTGTTGCCGCCCGTCAAGCCTGGTCCGTC
    GTCGGAACGATCAAAGACCTTAAACAAGGCTACTTGTCACAGGTCATCCATGAAATTGTA
    GACCTGATGATTCATTACCAAGCTGTTGTCGTCCTTGAAAACCTCAACTTCGGATTTAAA
    TCAAAACGGACAGGCATTGCCGAAAAAGCAGTCTACCAACAATTTGAAAAGATGCTAATA
    GATAAACTCAACTGTTTGGTTCTCAAAGATTATCCTGCTGAGAAAGTGGGAGGCGTCTTA
    AACCCGTATCAACTTACAGATCAGTTCACGAGCTTTGCAAAAATGGGCACGCAAAGCGGC
    TTCCTTTTCTATGTACCGGCCCCTTATACCTCAAAGATTGATCCCCTGACTGGTTTTGTC
    GATCCCTTTGTATGGAAGACCATTAAAAATCATGAAAGTCGGAAGCATTTCCTAGAAGGA
    TTTGATTTCCTGCATTATGATGTCAAAACAGGTGATTTTATCCTCCATTTTAAAATGAAT
    CGGAATCTCTCTTTCCAGAGAGGGCTTCCTGGCTTCATGCCAGCTTGGGATATTGTTTTC
    GAAAAGAATGAAACCCAATTTGATGCAAAAGGGACGCCCTTCATTGCAGGAAAACGAATT
    GTTCCTGTAATCGAAAATCATCGTTTTACGGGTCGTTACAGAGACCTCTATCCCGCTAAT
    GAACTCATTGCCCTTCTGGAAGAAAAAGGCATTGTCTTTAGAGACGGAAGTAATATATTA
    CCCAAACTTTTAGAAAATGATGATTCTCATGCAATTGATACGATGGTCGCCTTGATTCGC
    AGTGTACTCCAAATGAGAAACAGCAATGCCGCAACGGGGGAAGACTACATCAACTCTCCC
    GTTAGGGATCTGAACGGGGTGTGTTTCGACAGTCGATTCCAAAATCCAGAATGGCCAATG
    GATGCGGATGCCAACGGAGCTTATCATATTGCCTTAAAAGGGCAGCTTCTTCTGAACCAC
    CTCAAAGAAAGCAAAGATCTGAAATTACAAAACGGCATCAGCAACCAAGATTGGCTGGCC
    TACATTCAGGAACTGAGAAACTGA
    SEQ ID NO: 33
    ATGGCCGTCAAATCCATCAAAGTGAAACTTCGTCTCGACGATATGCCGGAGATTCGGGCC
    GGTCTATGGAAACTTCATAAGGAAGTCAATGCGGGGGTTCGATATTACACGGAATGGCTC
    AGTCTTCTCCGTCAAGAGAACTTGTATCGAAGAAGTCCGAATGGGGACGGAGAGCAAGAA
    TGTGATAAGACTGCAGAAGAATGCAAAGCCGAATTGTTGGAGCGGCTGCGCGCGCGTCAA
    GTGGAGAATGGACACCGTGGTCCGGCGGGATCGGACGATGAATTGCTGCAGTTGGCGCGT
    CAACTCTATGAGTTGTTGGTTCCGCAGGCGATAGGTGCGAAAGGCGACGCGCAGCAAATT
    GCCCGCAAATTTTTGAGCCCCTTGGCCGACAAGGACGCAGTTGGTGGGCTTGGAATCGCG
    AAGGCGGGGAACAAACCGCGGTGGGTTCGCATGCGCGAAGCGGGGGAACCAGGCTGGGAA
    GAGGAGAAGGAGAAGGCTGAGACGAGGAAATCTGCGGATCGGACTGCGGATGTTTTGCGC
    GCGCTCGCGGATTTTGGGTTAAAGCCACTGATGCGCGTATACACCGATTCTGAGATGTCA
    TCGGTGGAGTGGAAACCGCTTCGGAAGGGACAAGCCGTTCGGACGTGGGATAGGGACATG
    TTCCAACAAGCTATCGAACGGATGATGTCGTGGGAGTCGTGGAATCAGCGCGTTGGGCAA
    GAGTACGCGAAACTCGTAGAACAAAAAAATCGATTTGAGCAGAAGAATTTCGTCGGCCAG
    GAACATCTGGTCCATCTCGTCAATCAGTTGCAACAAGATATGAAAGAAGCATCGCCCGGA
    CTCGAATCGAAAGAGCAAACCGCGCACTATGTGACGGGACGGGCATTGCGCGGATCGGAC
    AAGGTATTTGAGAAGTGGGGGAAACTCGCCCCCGATGCACCTTTCGATTTGTACGACGCC
    GAAATCAAGAATGTGCAGAGACGTAACACGAGACGATTCGGATCACATGACTTGTTCGCA
    AAATTGGCAGAGCCAGAGTATCAGGCCCTGTGGCGCGAAGATGCTTCGTTTCTCACGCGT
    TACGCGGTGTACAACAGCATCCTTCGCAAACTGAATCACGCCAAAATGTTCGCGACGTTT
    ACTTTGCCGGATGCAACGGCGCACCCGATTTGGACTCGCTTCGATAAATTGGGTGGGAAT
    TTGCACCAGTACACCTTTTTGTTCAACGAATTTGGAGAACGCAGGCACGCGATTCGTTTT
    CACAAGCTATTGAAAGTCGAGAATGGTGTCGCAAGAGAAGTTGATGATGTCACCGTGCCC
    ATTTCAATGTCAGAGCAATTGGATAATCTGCTTCCCAGAGATCCCAATGAACCGATTGCG
    CTATATTTTCGAGATTACGGAGCCGAACAGCATTTCACAGGTGAATTTGGTGGCGCGAAG
    ATCCAGTGCCGCCGGGATCAGCTGGCTCATATGCACCGACGCAGAGGGGCGAGGGATGTT
    TATCTCAATGTCAGCGTACGTGTGCAGAGTCAGTCTGAGGCGCGGGGAGAACGTCGCCCG
    CCGTATGCGGCAGTATTTCGTCTGGTCGGGGACAACCATCGCGCGTTTGTCCATTTCGAT
    AAACTATCGGATTATCTTGCGGAACATCCGGATGATGGGAAGCTCGGGTCGGAGGGGTTG
    CTTTCCGGGCTGCGGGTGATGAGTGTCGATCTCGGCCTTCGCACATCTGCATCGATTTCC
    GTTTTTCGCGTTGCCCGGAAGGACGAGTTGAAGCCGAACTCAAAAGGTCGTGTACCGTTT
    TTCTTTCCGATAAAAGGGAATGACAATCTCGTCGCGGTTCATGAGCGATCACAACTCTTG
    AAGCTGCCTGGCGAAACGGAGTCGAAGGACCTGCGTGCTATCCGAGAAGAACGCCAACGG
    ACATTGCGGCAGTTGCGGACGCAACTGGCGTATTTGCGGCTGCTCGTGCGGTGTGGGTCG
    GAAGATGTGGGGCGGCGTGAACGGAGTTGGGCAAAGCTTATCGAGCAGCCGGTGGATGCG
    GCCAATCACATGACACCGGATTGGCGCGAGGCTTTTGAAAACGAACTTCAGAAGCTTAAG
    TCACTCCATGGTATCTGTAGCGACAAGGAATGGATGGATGCTGTCTACGAGAGCGTTCGC
    CGCGTGTGGCGTCACATGGGCAAACAGGTTCGCGATTGGCGAAAGGACGTACGAAGCGGA
    GAGCGGCCCAAGATTCGCGGCTATGCGAAAGACGTGGTCGGTGGAAACTCGATTGAGCAA
    ATCGAGTATCTGGAACGTCAGTACAAGTTCCTCAAGAGTTGGAGCTTCTTTGGTAAGGTG
    TCGGGACAAGTGATTCGTGCGGAGAAGGGATCTCGTTTTGCGATCACGCTGCGCGAACAC
    ATTGATCACGCGAAGGAAGATCGGCTGAAGAAATTGGCGGATCGCATCATTATGGAGGCT
    CTCGGCTATGTGTACGCGTTGGATGAGCGCGGCAAAGGAAAGTGGGTTGCGAAGTATCCG
    CCGTGCCAGCTCATCCTGCTGGAGGAATTGAGCGAGTACCAGTTCAATAACGACAGGCCT
    CCGAGCGAAAACAACCAGTTGATGCAATGGAGTCATCGCGGCGTGTTCCAGGAGTTGATA
    AATCAGGCCCAAGTCCATGATTTACTCGTTGGGACGATGTATGCAGCGTTCTCGTCGCGA
    TTCGACGCGCGAACTGGGGCACCGGGTATCCGCTGTCGCCGGGTTCCGGCGCGTTGCACC
    CAGGAGCACAATCCAGAACCATTTCCTTGGTGGCTGAACAAGTTTGTGGTGGAACATACG
    TTGGATGCTTGTCCCCTACGCGCAGACGACCTCATCCCAACGGGTGAAGGAGAGATTTTT
    GTCTCGCCGTTCAGCGCGGAGGAGGGGGACTTTCATCAGATTCACGCCGACCTGAATGCG
    GCGCAAAATCTGCAGCAGCGACTCTGGTCTGATTTTGATATCAGTCAAATTCGGTTGCGG
    TGTGATTGGGGTGAAGTGGACGGTGAACTCGTTCTGATCCCAAGGCTTACAGGAAAACGA
    ACGGCGGATTCATATAGCAACAAGGTGTTTTATACCAATACAGGTGTCACCTATTATGAG
    CGAGAGCGGGGGAAGAAGCGGAGAAAGGTTTTCGCGCAAGAGAAATTGTCGGAGGAAGAG
    GCGGAGTTGCTCGTGGAAGCAGACGAGGCGAGGGAGAAATCGGTCGTTTTGATGCGTGAT
    CCGTCTGGCATCATCAATCGGGGAAATTGGACCAGGCAAAAGGAATTTTGGTCGATGGTG
    AACCAGCGGATCGAAGGATACTTGGTCAAGCAGATTCGCTCGCGCGTTCCATTACAAGAT
    AGTGCGTGTGAAAACACGGGGGATATTTAA
    SEQ ID NO: 34
    ATGGCGACACGCAGTTTTATTTTAAAAATTGAACCAAATGAAGAAGTTAAAAAGGGATTA
    TGGAAGACGCATGAGGTATTGAATCATGGAATTGCCTACTACATGAATATTCTGAAACTA
    ATTAGACAGGAAGCTATTTATGAACATCATGAACAAGATCCTAAAAATCCGAAAAAAGTT
    TCAAAAGCAGAAATACAAGCCGAGTTATGGGATTTTGTTTTAAAAATGCAAAAATGTAAT
    AGTTTTACACATGAAGTTGACAAAGATGTTGTTTTTAACATCCTGCGTGAACTATATGAA
    GAGTTGGTCCCTAGTTCAGTCGAGAAAAAGGGTGAAGCCAATCAATTATCGAATAAGTTT
    CTGTACCCGCTAGTTGATCCGAACAGTCAAAGTGGGAAAGGGACGGCATCATCCGGACGT
    AAACCTCGGTGGTATAATTTAAAAATAGCAGGCGACCCATCGTGGGAGGAAGAAAAGAAA
    AAATGGGAAGAGGATAAAAAGAAAGATCCCCTTGCTAAAATCTTAGGTAAGTTAGCAGAA
    TATGGGCTTATTCCGCTATTTATTCCATTTACTGACAGCAACGAACCAATTGTAAAAGAA
    ATTAAATGGATGGAAAAAAGTCGTAATCAAAGTGTCCGGCGACTTGATAAGGATATGTTT
    ATCCAAGCATTAGAGCGTTTTCTTTCATGGGAAAGCTGGAACCTTAAAGTAAAGGAAGAG
    TATGAAAAAGTTGAAAAGGAACACAAAACACTAGAGGAAAGGATAAAAGAGGACATTCAA
    GCATTTAAATCCCTTGAACAATATGAAAAAGAACGGCAGGAGCAACTTCTTAGAGATACA
    TTGAATACAAATGAATACCGATTAAGCAAAAGAGGATTACGTGGTTGGCGTGAAATTATC
    CAAAAATGGCTAAAGATGGATGAAAATGAACCATCAGAAAAATATTTAGAAGTATTTAAA
    GATTATCAACGGAAACATCCACGAGAAGCCGGGGACTATTCTGTCTATGAATTTTTAAGC
    AAGAAAGAAAATCATTTTATTTGGCGAAATCATCCTGAATATCCTTATTTGTATGCTACA
    TTTTGTGAAATTGACAAAAAAAAGAAAGACGCTAAGCAACAGGCAACTTTTACTTTGGCT
    GACCCGATTAACCATCCGTTATGGGTACGATTTGAAGAAAGAAGCGGTTCGAACTTAAAC
    AAATATCGAATTTTAACAGAGCAATTACACACTGAAAAGTTAAAAAAGAAATTAACAGTT
    CAACTTGATCGTTTAATTTATCCAACTGAATCCGGCGGTTGGGAGGAAAAAGGTAAAGTA
    GATATCGTTTTGTTGCCGTCAAGACAATTTTATAATCAAATCTTCCTTGATATAGAAGAA
    AAGGGGAAACATGCTTTTACTTATAAGGATGAAAGTATTAAATTCCCCCTTAAAGGTACA
    CTTGGTGGTGCAAGAGTGCAGTTTGACCGTGACCATTTGCGGAGATATCCGCATAAAGTA
    GAATCAGGAAATGTTGGACGGATTTATTTTAACATGACAGTAAATATTGAACCAACTGAG
    AGCCCTGTTAGTAAGTCTTTGAAAATACATAGGGACGATTTCCCCAAGTTCGTTAATTTT
    AAACCGAAAGAGCTCACCGAATGGATAAAAGATAGTAAAGGGAAAAAATTAAAAAGTGGT
    ATAGAATCCCTTGAAATTGGTCTACGGGTGATGAGTATCGACTTAGGTCAACGTCAAGCG
    GCTGCTGCATCGATTTTTGAAGTAGTTGATCAGAAACCGGATATTGAAGGGAAGTTATTT
    TTTCCAATCAAAGGAACTGAGCTTTATGCTGTTCACCGGGCAAGTTTTAACATTAAATTA
    CCGGGTGAAACATTAGTAAAATCACGGGAAGTATTGCGGAAAGCTCGGGAGGACAACTTA
    AAATTAATGAATCAAAAGTTAAACTTTCTAAGAAATGTTCTACATTTCCAACAGTTTGAA
    GATATCACAGAAAGAGAGAAGCGTGTAACTAAATGGATTTCTAGACAAGAAAATAGTGAT
    GTTCCTCTTGTATATCAAGATGAGCTAATTCAAATTCGTGAATTAATGTATAAACCCTAT
    AAAGATTGGGTTGCCTTTTTAAAACAACTCCATAAACGGCTAGAAGTCGAGATTGGCAAA
    GAGGTTAAGCATTGGCGAAAATCATTAAGTGACGGGAGAAAAGGTCTTTACGGAATCTCC
    CTAAAAAATATTGATGAAATTGATCGAACAAGGAAATTCCTTTTAAGATGGAGCTTACGT
    CCAACAGAACCTGGGGAAGTAAGACGCTTGGAACCAGGACAGCGTTTTGCGATTGATCAA
    TTAAACCACCTAAATGCATTAAAAGAAGATCGATTAAAAAAGATGGCAAATACGATTATC
    ATGCATGCCTTAGGTTACTGTTATGATGTAAGAAAGAAAAAGTGGCAGGCAAAAAATCCA
    GCATGTCAAATTATTTTATTTGAAGATTTATCTAACTACAATCCTTACGAGGAAAGGTCC
    CGTTTTGAAAACTCAAAACTGATGAAGTGGTCACGGAGAGAAATTCCACGACAAGTCGCC
    TTACAAGGTGAAATTTACGGATTACAAGTTGGGGAAGTAGGTGCCCAATTCAGTTCAAGA
    TTCCATGCGAAAACCGGGTCGCCGGGAATTCGTTGCAGTGTTGTAACGAAAGAAAAATTG
    CAGGATAATCGCTTTTTTAAAAATTTACAAAGAGAAGGACGACTTACTCTTGATAAAATC
    GCAGTTTTAAAAGAAGGAGACTTATATCCAGATAAAGGTGGAGAAAAGTTTATTTCTTTA
    TCAAAGGATCGAAAGTTGGTAACTACGCATGCTGATATTAACGCGGCCCAAAATTTACAG
    AAGCGTTTTTGGACAAGAACACATGGATTTTATAAAGTTTACTGCAAAGCCTATCAGGTT
    GATGGACAAACTGTTTATATTCCGGAGAGCAAGGACCAAAAACAAAAAATAATTGAAGAA
    TTTGGGGAAGGCTATTTTATTTTAAAAGATGGTGTATATGAATGGGGTAATGCGGGGAAA
    CTAAAAATTAAAAAAGGTTCCTCTAAACAATCATCGAGTGAATTAGTAGATTCGGACATA
    CTGAAAGATTCATTTGATTTAGCAAGTGAACTTAAGGGAGAGAAACTCATGTTATATCGA
    GATCCGAGTGGAAACGTATTTCCTTCCGACAAGTGGATGGCAGCAGGAGTATTTTTTGGC
    AAATTAGAAAGAATATTGATTTCTAAGTTAACAAATCAATACTCAATATCAACAATAGAA
    GATGATTCTTCAAAACAATCAATGTAA
    SEQ ID NO: 35
    ATGCCCACCCGCACCATCAATCTGAAACTTGTTCTTGGGAAAAATCCTGAAAACGCAACA
    TTGCGACGCGCCCTATTTTCGACACACCGTTTGGTTAACCAAGCGACGAAACGTATTGAG
    GAATTCTTGTTGCTGTGTCGTGGAGAAGCCTACAGAACAGTGGATAATGAGGGGAAGGAA
    GCCGAGATTCCACGTCATGCAGTCCAAGAAGAAGCTCTTGCCTTTGCCAAAGCTGCTCAA
    CGCCACAACGGCTGTATATCCACCTATGAAGACCAAGAGATTCTTGATGTACTGCGGCAA
    CTGTACGAACGTCTTGTTCCTTCGGTCAACGAAAACAACGAGGCAGGCGATGCTCAAGCT
    GCTAACGCCTGGGTCAGTCCGCTCATGTCGGCAGAAAGCGAAGGAGGCTTGTCGGTCTAC
    GACAAGGTGCTTGATCCACCGCCGGTTTGGATGAAGCTTAAAGAAGAAAAGGCTCCAGGA
    TGGGAAGCCGCTTCTCAAATTTGGATTCAGAGTGATGAGGGACAGTCGTTACTTAATAAG
    CCAGGTAGCCCTCCCCGCTGGATTCGAAAACTGCGATCTGGGCAACCGTGGCAAGATGAT
    TTCGTCAGTGACCAAAAGAAAAAGCAAGATGAGCTGACCAAAGGGAACGCACCACTTATA
    AAACAACTCAAAGAAATGGGGTTGTTGCCTCTTGTTAACCCATTTTTTAGACATCTTCTT
    GACCCTGAAGGTAAAGGCGTGAGTCCATGGGACCGTCTTGCTGTACGCGCTGCAGTGGCT
    CACTTTATCTCCTGGGAAAGTTGGAATCATAGAACACGTGCAGAATACAATTCCTTGAAA
    CTACGGCGAGACGAGTTTGAGGCAGCATCCGACGAATTCAAAGACGATTTTACTTTGCTC
    CGACAATATGAAGCCAAACGCCATAGTACATTGAAAAGCATCGCGCTGGCCGACGATTCG
    AACCCTTACCGGATTGGAGTACGTTCTCTGCGTGCCTGGAACCGCGTTCGTGAAGAATGG
    ATAGACAAGGGTGCAACAGAAGAACAACGCGTGACCATATTGTCAAAGCTTCAAACACAA
    CTTCGGGGAAAATTCGGCGATCCCGATCTGTTCAACTGGCTAGCTCAGGATAGGCATGTC
    CATTTGTGGTCTCCTCGGGACAGCGTGACACCATTGGTTCGCATCAATGCGGTAGATAAA
    GTTCTGCGTCGACGAAAACCGTATGCATTGATGACCTTTGCCCATCCCCGCTTCCACCCT
    CGATGGATACTGTACGAGGCTCCAGGAGGAAGCAATCTCCGTCAATATGCATTGGATTGT
    ACAGAAAACGCTCTACACATCACGTTGCCTTTGCTTGTCGACGATGCGCACGGAACCTGG
    ATTGAAAAAAAGATCAGGGTGCCGCTGGCACCATCCGGACAAATTCAAGATTTAACTCTG
    GAAAAACTTGAGAAGAAAAAAAATCGTTTATACTACCGTTCCGGTTTTCAGCAGTTTGCC
    GGCTTGGCTGGCGGAGCTGAGGTTCTTTTCCACAGACCCTATATGGAACACGACGAACGC
    AGCGAGGAGTCTCTTTTGGAACGTCCGGGAGCCGTTTGGTTCAAATTGACCCTGGATGTG
    GCAACACAGGCTCCCCCGAACTGGCTTGATGGTAAGGGCCGTGTCCGTACACCGCCGGAG
    GTACATCATTTTAAAACCGCATTGTCGAATAAAAGCAAACATACACGTACGCTGCAGCCG
    GGTCTCCGTGTCTTGTCAGTAGACTTGGGCATGCGAACATTCGCCTCCTGCTCAGTATTT
    GAACTCATCGAGGGAAAGCCTGAGACAGGCCGTGCCTTCCCTGTTGCCGATGAGAGATCA
    ATGGACAGCCCGAATAAACTGTGGGCCAAGCATGAACGTAGTTTTAAACTGACGCTCCCC
    GGCGAAACCCCTTCTCGAAAGGAAGAGGAAGAGCGTAGCATAGCAAGAGCGGAAATTTAT
    GCACTGAAACGCGACATACAACGCCTCAAAAGCCTACTCCGCTTAGGTGAAGAAGATAAC
    GATAACCGTCGTGATGCATTGCTTGAACAGTTCTTTAAAGGATGGGGAGAAGAAGACGTT
    GTGCCTGGACAAGCGTTTCCACGCTCTCTTTTCCAAGGGTTGGGAGCTGCCCCGTTTCGC
    TCAACTCCAGAGTTATGGCGTCAGCATTGCCAAACATATTATGACAAAGCGGAAGCCTGT
    CTGGCTAAACATATCAGTGATTGGCGCAAGCGAACTCGTCCCCGTCCGACATCGCGGGAG
    ATGTGGTACAAAACACGTTCCTATCATGGCGGCAAGTCCATTTGGATGTTGGAATATCTT
    GATGCCGTTCGAAAACTGCTTCTCAGTTGGAGCTTACGTGGTCGTACTTACGGTGCCATT
    AATCGCCAGGATACAGCCCGGTTTGGTTCTTTGGCATCACGGCTGCTCCACCATATCAAT
    TCCCTAAAGGAAGACCGCATCAAAACAGGAGCCGACTCTATCGTTCAGGCTGCTCGCGGG
    TATATTCCTCTCCCTCATGGCAAGGGTTGGGAACAAAGATATGAGCCTTGTCAGCTCATA
    TTATTTGAAGACCTCGCACGATATCGCTTTCGCGTGGATCGACCTCGTCGAGAGAACAGC
    CAACTCATGCAGTGGAACCATCGAGCCATCGTGGCAGAAACAACGATGCAAGCCGAACTC
    TACGGACAAATTGTCGAAAATACTGCAGCGGGGTTCAGCAGTCGTTTTCACGCGGCGACA
    GGTGCCCCCGGTGTACGTTGTCGTTTTCTTCTAGAAAGAGACTTTGATAACGATTTGCCC
    AAACCGTACCTTCTCAGGGAACTTTCTTGGATGCTCGGCAATACAAAAGTCGAGTCTGAA
    GAAGAAAAGCTTCGATTGCTGTCTGAAAAAATCAGGCCAGGCAGTCTTGTTCCTTGGGAT
    GGAGGCGAACAGTTCGCTACCCTGCATCCCAAAAGACAAACACTTTGCGTCATTCATGCC
    GATATGAATGCTGCCCAAAATTTACAACGCCGGTTTTTCGGTCGATGCGGCGAGGCCTTT
    CGGCTTGTTTGTCAACCCCACGGTGACGACGTGTTACGACTCGCATCCACCCCAGGAGCT
    CGTCTTCTTGGAGCCCTGCAGCAGCTTGAAAATGGACAAGGAGCTTTCGAGTTGGTTCGA
    GACATGGGGTCAACAAGTCAAATGAACCGGTTCGTCATGAAGTCTTTGGGAAAAAAGAAA
    ATAAAACCCCTTCAGGACAACAATGGAGACGACGAGCTTGAAGACGTGTTGTCCGTACTC
    CCGGAGGAAGACGACACAGGACGTATCACAGTCTTCCGCGATTCATCAGGAATCTTTTTT
    CCTTGCAACGTCTGGATACCGGCCAAACAGTTTTGGCCAGCAGTACGCGCCATGATTTGG
    AAGGTCATGGCTTCCCATTCTTTGGGGTGA
SEQ ID NO: 36
ATGACAAAGTTAAGACACCGACAGAAAAAATTAACACACGACTGGGCTGGCTCCAAAAAG
AGGGAAGTATTAGGCTCAAATGGCAAGCTTCAGAATCCGTTGTTAATGCCGGTTAAAAAA
GGTCAGGTTACTGAGTTCCGGAAAGCGTTTTCTGCGTATGCTCGCGCAACGAAAGGAGAA
ATGACTGACGGCCGAAAGAATATGTTTACGCATAGTTTCGAGCCATTTAAGACAAAGCCC
    TCGCTTCATCAGTGTGAATTGGCAGATAAAGCATATCAATCTTTACATTCGTATCTGCCT
    GGTTCTCTTGCTCATTTTCTATTATCTGCTCACGCATTAGGTTTTCGTATTTTTTCAAAA
    TCTGGTGAAGCAACTGCATTCCAGGCATCCTCTAAAATTGAAGCTTACGAATCAAAATTG
    GCAAGCGAATTAGCTTGTGTAGATTTATCTATTCAAAACTTGACTATTTCAACGCTTTTT
    AATGCGCTTACAACGTCTGTAAGAGGGAAGGGCGAAGAAACTAGCGCTGACCCCTTAATT
    GCACGATTTTACACCTTACTTACTGGCAAGCCTCTGTCTCGAGACACTCAAGGGCCTGAA
    CGTGATTTAGCAGAAGTTATCTCGCGTAAGATAGCTAGTTCTTTTGGCACATGGAAAGAA
    ATGACGGCAAACCCTCTTCAGTCATTACAATTTTTTGAAGAGGAACTCCATGCGCTGGAT
    GCCAATGTCTCGCTCTCACCCGCCTTCGACGTTTTAATTAAAATGAATGATTTGCAGGGC
    GATTTAAAAAATCGAACCATTGTTTTTGATCCTGACGCCCCTGTTTTTGAATATAACGCA
    GAAGACCCTGCCGACATAATTATTAAACTTACAGCTCGTTACGCTAAAGAAGCTGTCATC
    AAAAATCAAAACGTAGGAAATTACGTTAAAAACGCTATTACTACCACAAATGCCAATGGT
    CTTGGTTGGCTTTTGAACAAAGGTTTGTCGTTACTCCCTGTCTCGACCGATGACGAATTG
    CTAGAGTTTATTGGCGTTGAACGATCTCATCCCTCATGCCATGCCTTAATTGAATTGATT
    GCACAATTAGAAGCCCCCGAGCTCTTTGAGAAGAACGTATTTTCAGATACTCGTTCTGAA
    GTTCAAGGTATGATTGATTCAGCTGTTTCTAATCATATTGCTCGTCTTTCCAGCTCTAGA
    AATAGCTTGTCAATGGATAGTGAAGAATTAGAACGTTTAATCAAAAGCTTTCAGATACAC
    ACACCTCATTGCTCACTTTTTATTGGCGCCCAATCACTTTCACAGCAGTTAGAATCTTTG
    CCTGAAGCCCTTCAATCGGGCGTTAATTCAGCCGATATTTTACTAGGCTCTACTCAATAT
    ATGCTCACCAATTCTTTGGTTGAAGAGTCAATTGCAACTTATCAAAGAACACTTAATCGC
    ATCAATTACTTGTCAGGTGTTGCAGGTCAGATTAACGGCGCAATAAAGCGAAAAGCGATA
    GATGGAGAAAAAATTCACTTGCCTGCAGCTTGGTCAGAGTTGATATCTTTACCATTTATA
    GGCCAGCCTGTTATAGATGTTGAAAGCGATTTAGCTCATCTAAAAAATCAATACCAAACA
    CTTTCAAATGAGTTTGATACTCTTATATCTGCTTTGCAAAAGAATTTTGATTTGAACTTT
    AATAAAGCGCTCCTTAATCGTACTCAGCATTTTGAAGCCATGTGTAGAAGCACTAAGAAA
    AACGCTTTATCCAAACCAGAGATCGTTTCCTATCGCGACCTGCTTGCTCGATTAACTTCT
    TGTTTGTATCGAGGCTCTTTAGTTTTGCGTCGTGCCGGCATTGAAGTGTTAAAAAAACAT
    AAAATATTTGAGTCAAACAGCGAACTTCGTGAACATGTTCATGAAAGAAAGCATTTCGTG
    TTTGTTAGTCCTCTAGATCGCAAAGCCAAGAAACTCCTTCGATTAACTGATTCGCGTCCA
    GACTTGTTACATGTTATTGATGAAATATTGCAGCACGATAATCTTGAAAACAAAGACCGC
    GAGTCACTTTGGCTAGTTCGCTCTGGTTATTTGCTTGCAGGACTTCCAGATCAACTTTCT
    TCATCTTTTATTAACTTGCCTATCATTACTCAAAAAGGAGATAGACGCCTTATAGACCTG
    ATTCAGTATGATCAAATTAATCGTGATGCTTTTGTTATGTTAGTGACCTCTGCATTCAAG
    TCTAATTTGTCTGGTCTGCAGTATCGTGCCAATAAGCAATCGTTCGTTGTTACTCGCACG
    CTAAGCCCTTATCTCGGCTCAAAACTTGTCTACGTACCCAAGGATAAAGATTGGTTAGTT
    CCTTCTCAAATGTTTGAAGGACGATTTGCTGACATTCTTCAATCAGATTATATGGTCTGG
    AAAGATGCCGGTCGTCTTTGTGTTATTGATACTGCAAAACACCTTTCTAATATAAAGAAG
    TCTGTATTTTCATCCGAAGAAGTTCTCGCTTTTTTAAGAGAACTCCCTCACCGCACATTT
    ATCCAGACCGAAGTTCGCGGCCTTGGCGTTAATGTCGATGGAATTGCATTTAATAATGGT
    GATATTCCGTCATTAAAAACCTTTTCAAATTGCGTTCAGGTAAAAGTTTCTCGGACTAAT
    ACATCCCTAGTTCAAACACTTAATCGTTGGTTTGAAGGAGGAAAAGTTTCTCCTCCGAGC
    ATTCAATTTGAACGGGCGTATTATAAAAAAGACGATCAAATTCATGAAGACGCAGCGAAA
    AGAAAGATACGATTCCAGATGCCCGCAACTGAGTTGGTTCATGCTTCTGACGATGCGGGG
    TGGACACCAAGTTATTTGCTCGGCATTGATCCTGGCGAGTATGGAATGGGTCTTTCATTG
    GTTTCGATTAATAACGGAGAAGTCTTAGATTCAGGCTTTATTCATATTAATTCTCTGATC
    AATTTTGCCTCTAAAAAGAGCAACCATCAAACTAAGGTTGTTCCGCGTCAGCAGTACAAA
    TCTCCTTATGCAAATTATTTAGAACAATCTAAAGATTCTGCTGCTGGTGATATTGCGCAT
    ATACTCGATCGACTTATATACAAATTAAATGCGTTGCCTGTTTTTGAGGCTCTTTCAGGT
    AATTCTCAGAGTGCTGCTGATCAAGTTTGGACGAAAGTCTTATCGTTTTACACTTGGGGT
    GATAATGACGCTCAGAATTCTATTAGAAAGCAGCATTGGTTTGGAGCCAGTCATTGGGAT
    ATCAAAGGTATGTTAAGGCAACCCCCTACGGAGAAGAAGCCTAAACCGTATATTGCTTTT
    CCTGGCTCTCAGGTTTCTTCGTATGGTAATTCCCAACGTTGCTCTTGCTGCGGTCGCAAT
    CCTATTGAACAACTTCGAGAAATGGCAAAGGATACCTCTATTAAAGAGCTAAAAATTCGC
    AATTCTGAGATACAGCTTTTTGACGGAACCATTAAATTATTTAATCCAGACCCATCCACT
    GTGATAGAGAGAAGGCGACATAATCTTGGTCCATCAAGAATTCCTGTTGCTGACCGTACT
    TTCAAAAACATCAGTCCATCAAGTCTAGAATTTAAAGAATTGATTACTATCGTGTCTCGA
    TCTATCCGTCATTCACCTGAGTTTATCGCTAAAAAACGCGGCATAGGGTCTGAGTATTTT
    TGCGCTTATTCCGATTGCAACTCATCCTTAAATTCTGAAGCTAACGCAGCTGCTAACGTA
    GCGCAAAAATTTCAAAAACAGTTATTTTTTGAGTTATAA
SEQ ID NO: 37
ATGAAGAGAATTCTGAACAGTCTGAAAGTTGCTGCCTTGAGACTTCTGTTTCGAGGCAAA
GGTTCTGAATTAGTGAAGACAGTCAAATATCCATTGGTTTCCCCGGTTCAAGGCGCGGTT
GAAGAACTTGCTGAAGCAATTCGGCACGACAACCTGCACCTTTTTGGGCAGAAGGAAATA
GTGGATCTTATGGAGAAAGACGAAGGAACCCAGGTGTATTCGGTTGTGGATTTTTGGTTG
    GATACCCTGCGTTTAGGGATGTTTTTCTCACCATCAGCGAATGCGTTGAAAATCACGCTG
    GGAAAATTCAATTCTGATCAGGTTTCACCTTTTCGTAAGGTTTTGGAGCAGTCACCTTTT
    TTTCTTGCGGGTCGCTTGAAGGTTGAACCTGCGGAAAGGATACTTTCTGTTGAAATCAGA
    AAGATTGGTAAAAGAGAAAACAGAGTTGAGAACTATGCCGCCGATGTGGAGACATGCTTC
    ATTGGTCAGCTTTCTTCAGATGAGAAACAGAGTATCCAGAAGCTGGCAAATGATATCTGG
    GATAGCAAGGATCATGAGGAACAGAGAATGTTGAAGGCGGATTTTTTTGCTATACCTCTT
    ATAAAAGACCCCAAAGCTGTCACAGAAGAAGATCCTGAAAATGAAACGGCGGGAAAACAG
    AAACCGCTTGAATTATGTGTTTGTCTTGTTCCTGAGTTGTATACCCGAGGTTTCGGCTCC
    ATTGCTGATTTTCTGGTTCAGCGACTTACCTTGCTGCGTGACAAAATGAGTACCGACACG
    GCGGAAGATTGCCTCGAGTATGTTGGCATTGAGGAAGAAAAAGGCAATGGAATGAATTCC
    TTGCTCGGCACTTTTTTGAAGAACCTGCAGGGTGATGGTTTTGAACAGATTTTTCAGTTT
    ATGCTTGGGTCTTATGTTGGCTGGCAGGGGAAGGAAGATGTACTGCGCGAACGATTGGAT
    TTGCTGGCCGAAAAAGTCAAAAGATTACCAAAGCCAAAATTTGCCGGAGAATGGAGTGGT
    CATCGTATGTTTCTCCATGGTCAGCTGAAAAGCTGGTCGTCGAATTTCTTCCGTCTTTTT
    AATGAGACGCGGGAACTTCTGGAAAGTATCAAGAGTGATATTCAACATGCCACCATGCTC
    ATTAGCTATGTGGAAGAGAAAGGAGGCTATCATCCACAGCTGTTGAGTCAGTATCGGAAG
    TTAATGGAACAATTACCGGCGTTGCGGACTAAGGTTTTGGATCCTGAGATTGAGATGACG
    CATATGTCCGAGGCTGTTCGAAGTTACATTATGATACACAAGTCTGTAGCGGGATTTCTG
    CCGGATTTACTCGAGTCTTTGGATCGAGATAAGGATAGGGAATTTTTGCTTTCCATCTTT
    CCTCGTATTCCAAAGATAGATAAGAAGACGAAAGAGATCGTTGCATGGGAGCTACCGGGC
    GAGCCAGAGGAAGGCTATTTGTTCACAGCAAACAACCTTTTCCGGAATTTTCTTGAGAAT
    CCGAAACATGTGCCACGATTTATGGCAGAGAGGATTCCCGAGGATTGGACGCGTTTGCGC
    TCGGCCCCTGTGTGGTTTGATGGGATGGTGAAGCAATGGCAGAAGGTGGTGAATCAGTTG
    GTTGAATCTCCAGGCGCCCTTTATCAGTTCAATGAAAGTTTTTTGCGTCAAAGACTGCAA
    GCAATGCTTACGGTCTATAAGCGGGATCTCCAGACTGAGAAGTTTCTGAAGCTGCTGGCT
    GATGTCTGTCGTCCACTCGTTGATTTTTTCGGACTTGGAGGAAATGATATTATCTTCAAG
    TCATGTCAGGATCCAAGAAAGCAATGGCAGACTGTTATTCCACTCAGTGTCCCAGCGGAT
    GTTTATACAGCATGTGAAGGCTTGGCTATTCGTCTCCGCGAAACTCTTGGATTCGAATGG
    AAAAATCTGAAAGGACACGAGCGGGAAGATTTTTTACGGCTGCATCAGTTGCTGGGAAAT
    CTGCTGTTCTGGATCAGGGATGCGAAACTTGTCGTGAAGCTGGAAGACTGGATGAACAAT
    CCTTGTGTTCAGGAGTATGTGGAAGCACGAAAAGCCATTGATCTTCCCTTGGAGATTTTC
    GGATTTGAGGTGCCGATTTTTCTCAATGGCTATCTCTTTTCGGAACTGCGCCAGCTGGAA
    TTGTTGCTGAGGCGTAAGTCGGTGATGACGTCTTACAGCGTCAAAACGACAGGCTCGCCA
    AATAGGCTCTTCCAGTTGGTTTACCTACCTCTAAACCCTTCAGATCCGGAAAAGAAAAAT
    TCCAACAACTTTCAGGAGCGCCTCGATACACCTACCGGTTTGTCGCGTCGTTTTCTGGAT
    CTTACGCTGGATGCATTTGCTGGCAAACTCTTGACGGATCCGGTAACTCAGGAACTGAAG
    ACGATGGCCGGTTTTTACGATCATCTCTTTGGCTTCAAGTTGCCGTGTAAACTGGCGGCG
    ATGAGTAACCATCCAGGATCCTCTTCCAAAATGGTGGTTCTGGCAAAACCAAAGAAGGGT
    GTTGCTAGTAACATCGGCTTTGAACCTATTCCCGATCCTGCTCATCCTGTGTTCCGGGTG
    AGAAGTTCCTGGCCGGAGTTGAAGTACCTGGAGGGGTTGTTGTATCTTCCCGAAGATACA
    CCACTGACCATTGAACTGGCGGAAACGTCGGTCAGTTGTCAGTCTGTGAGTTCAGTCGCT
    TTCGATTTGAAGAATCTGACGACTATCTTGGGTCGTGTTGGTGAATTCAGGGTGACGGCA
    GATCAACCTTTCAAGCTGACGCCCATTATTCCTGAGAAAGAGGAATCCTTCATCGGGAAG
    ACCTACCTCGGTCTTGATGCTGGAGAGCGATCTGGCGTTGGTTTCGCGATTGTGACGGTT
    GACGGCGATGGGTATGAGGTGCAGAGGTTGGGTGTGCATGAAGATACTCAGCTTATGGCG
    CTTCAGCAAGTCGCCAGCAAGTCTCTTAAGGAGCCGGTTTTCCAGCCACTCCGTAAGGGC
    ACATTTCGTCAGCAGGAGCGCATTCGCAAAAGCCTCCGCGGTTGCTACTGGAATTTCTAT
    CATGCATTGATGATCAAGTACCGAGCTAAAGTTGTGCATGAGGAATCGGTGGGTTCATCC
    GGTCTGGTGGGGCAGTGGCTGCGTGCATTTCAGAAGGATCTCAAAAAGGCTGATGTTCTG
    CCCAAGAAGGGTGGAAAAAATGGTGTAGACAAAAAAAAGAGAGAAAGCAGCGCTCAGGAT
    ACCTTATGGGGAGGAGCTTTCTCGAAGAAGGAAGAGCAGCAGATAGCCTTTGAGGTTCAG
    GCAGCTGGATCAAGCCAGTTTTGTCTGAAGTGTGGTTGGTGGTTTCAGTTGGGGATGCGG
    GAAGTAAATCGTGTGCAGGAGAGTGGCGTGGTGCTGGACTGGAACCGGTCCATTGTAACC
    TTCCTCATCGAATCCTCAGGAGAAAAGGTATATGGTTTCAGTCCTCAGCAACTGGAAAAA
    GGCTTTCGTCCTGACATCGAAACGTTCAAAAAAATGGTAAGGGATTTTATGAGACCCCCC
    ATGTTTGATCGCAAAGGTCGGCCGGCCGCGGCGTATGAAAGATTCGTACTGGGACGTCGT
    CACCGTCGTTATCGCTTTGATAAAGTTTTTGAAGAGAGATTTGGTCGCAGTGCTCTTTTC
    ATCTGCCCGCGGGTCGGGTGTGGGAATTTCGATCACTCCAGTGAGCAGTCAGCCGTTGTC
    CTTGCCCTTATTGGTTACATTGCTGATAAGGAAGGGATGAGTGGTAAGAAGCTTGTTTAT
    GTGAGGCTGGCTGAACTTATGGCTGAGTGGAAGCTGAAGAAACTGGAGAGATCAAGGGTG
    GAAGAACAGAGCTCGGCACAATAA
SEQ ID NO: 38
ATGGCAGAAAGCAAGCAGATGCAATGCCGCAAGTGCGGCGCAAGCATGAAGTATGAAGTA
ATTGGATTGGGCAAGAAGTCATGCAGATATATGTGCCCAGATTGCGGCAATCACACCAGC
GCGCGCAAGATTCAGAACAAGAAAAAGCGCGACAAAAAGTATGGATCCGCAAGCAAAGCG
CAGAGCCAGAGGATAGCTGTGGCTGGCGCGCTTTATCCAGACAAAAAAGTGCAGACCATA
    AAGACCTACAAATACCCAGCGGATCTTAATGGCGAAGTTCATGACAGCGGCGTCGCAGAG
    AAGATTGCGCAGGCGATTCAGGAAGATGAGATCGGCCTGCTTGGCCCGTCCAGCGAATAC
    GCTTGCTGGATTGCTTCACAAAAACAGAGCGAGCCGTATTCAGTTGTAGATTTTTGGTTT
    GACGCGGTGTGCGCAGGCGGAGTATTCGCGTATTCTGGCGCGCGCCTGCTTTCCACAGTC
    CTCCAGTTGAGTGGCGAGGAAAGCGTTTTGCGCGCTGCTTTAGCATCTAGCCCGTTTGTA
    GATGACATTAATTTGGCGCAAGCGGAAAAGTTCCTAGCCGTTAGCCGGCGCACAGGCCAA
    GATAAGCTAGGCAAGCGCATTGGAGAATGTTTTGCGGAAGGCCGGCTTGAAGCGCTTGGC
    ATCAAAGATCGCATGCGCGAATTCGTGCAAGCGATTGATGTGGCCCAAACCGCGGGCCAG
    CGGTTCGCGGCCAAGCTAAAGATATTCGGCATCAGTCAGATGCCTGAAGCCAAGCAATGG
    AACAATGATTCCGGGCTCACTGTATGTATTTTGCCGGATTATTATGTCCCGGAAGAAAAC
    CGCGCGGACCAGCTGGTTGTTTTGCTTCGGCGCTTACGCGAGATCGCGTATTGCATGGGA
    ATTGAGGATGAAGCAGGATTTGAGCATCTAGGCATTGACCCTGGTGCTCTTTCCAATTTT
    TCCAATGGCAATCCAAAGCGAGGATTTCTCGGCCGCCTGCTCAATAATGACATTATAGCG
    CTGGCAAACAACATGTCAGCCATGACGCCGTATTGGGAAGGCAGAAAAGGCGAGTTGATT
    GAGCGCCTTGCATGGCTTAAACATCGCGCTGAAGGATTGTATTTGAAAGAGCCACATTTC
    GGCAACTCCTGGGCAGACCACCGCAGCAGGATTTTCAGTCGCATTGCGGGCTGGCTTTCC
    GGATGCGCGGGCAAGCTCAAGATTGCCAAGGATCAGATTTCAGGCGTGCGTACGGATTTG
    TTTCTGCTCAAGCGCCTTCTGGATGCGGTACCGCAAAGCGCGCCGTCGCCGGACTTTATT
    GCTTCCATCAGCGCGCTGGATCGGTTTTTGGAAGCGGCAGAAAGCAGCCAGGATCCGGCA
    GAACAGGTACGCGCTTTGTACGCGTTTCATCTGAACGCGCCTGCGGTCCGATCCATCGCC
    AACAAGGCGGTACAGAGGTCTGATTCCCAGGAGTGGCTTATCAAGGAACTGGATGCTGTA
    GATCACCTTGAATTCAACAAAGCATTTCCGTTTTTTTCGGATACAGGAAAGAAAAAGAAG
    AAAGGAGCGAATAGCAACGGAGCGCCTTCTGAAGAAGAATACACGGAAACAGAATCCATT
    CAACAACCAGAAGATGCAGAGCAGGAAGTGAATGGTCAAGAAGGAAATGGCGCTTCAAAG
    AACCAGAAAAAGTTTCAGCGCATTCCTCGATTTTTCGGGGAAGGGTCAAGGAGTGAGTAT
    CGAATTTTAACAGAAGCGCCGCAATATTTTGACATGTTCTGCAATAATATGCGCGCGATC
    TTTATGCAGCTAGAGAGTCAGCCGCGCAAGGCGCCTCGTGATTTCAAATGCTTTCTGCAG
    AATCGTTTGCAGAAGCTTTACAAGCAAACCTTTCTCAATGCTCGCAGTAATAAATGCCGC
    GCGCTTCTGGAATCCGTCCTTATTTCATGGGGAGAATTTTATACTTATGGCGCGAATGAA
    AAGAAGTTTCGTCTGCGCCATGAAGCGAGCGAGCGCAGCTCGGATCCGGACTATGTGGTT
    CAGCAGGCATTGGAAATCGCGCGCCGGCTTTTCTTGTTCGGATTTGAGTGGCGCGATTGC
    TCTGCTGGAGAGCGCGTGGATTTGGTTGAAATCCACAAAAAAGCAATCTCATTTTTGCTT
    GCAATCACTCAGGCCGAGGTTTCAGTTGGTTCCTATAACTGGCTTGGGAATAGCACCGTG
    AGCCGGTATCTTTCGGTTGCTGGCACAGACACATTGTACGGCACTCAACTGGAGGAGTTT
    TTGAACGCCACAGTGCTTTCACAGATGCGTGGGCTGGCGATTCGGCTTTCATCTCAGGAG
    TTAAAAGACGGATTTGATGTTCAGTTGGAGAGTTCGTGCCAGGACAATCTCCAGCATCTG
    CTGGTGTATCGCGCTTCGCGCGACTTGGCTGCGTGCAAACGCGCTACATGCCCGGCTGAA
    TTGGATCCGAAAATTCTTGTTCTGCCGGTTGGTGCGTTTATCGCGAGCGTAATGAAAATG
    ATTGAGCGTGGCGATGAACCATTAGCAGGCGCGTATTTGCGTCATCGGCCGCATTCATTC
    GGCTGGCAGATACGGGTTCGTGGAGTGGCGGAAGTAGGCATGGATCAGGGCACAGCGCTA
    GCATTCCAGAAGCCGACTGAATCAGAGCCGTTTAAAATAAAGCCGTTTTCCGCTCAATAC
    GGCCCAGTACTTTGGCTTAATTCTTCATCCTATAGCCAGAGCCAGTATCTGGATGGATTT
    TTAAGCCAGCCAAAGAATTGGTCTATGCGGGTGCTACCTCAAGCCGGATCAGTGCGCGTG
    GAACAGCGCGTTGCTCTGATATGGAATTTGCAGGCAGGCAAGATGCGGCTGGAGCGCTCT
    GGAGCGCGCGCGTTTTTCATGCCAGTGCCATTCAGCTTCAGGCCGTCTGGTTCAGGAGAT
    GAAGCAGTATTGGCGCCGAATCGGTACTTGGGACTTTTTCCGCATTCCGGAGGAATAGAA
    TACGCGGTGGTGGATGTATTAGATTCCGCGGGTTTCAAAATTCTTGAGCGCGGTACGATT
    GCGGTAAATGGCTTTTCCCAGAAGCGCGGCGAACGCCAAGAGGAGGCACACAGAGAAAAA
    CAGAGACGCGGAATTTCTGATATAGGCCGCAAGAAGCCGGTGCAAGCTGAAGTTGACGCA
    GCCAATGAATTGCACCGCAAATACACCGATGTTGCCACTCGTTTAGGGTGCAGAATTGTG
    GTTCAGTGGGCGCCCCAGCCAAAGCCGGGCACAGCGCCGACCGCGCAAACAGTATACGCG
    CGCGCAGTGCGGACCGAAGCGCCGCGATCTGGAAATCAAGAGGATCATGCTCGTATGAAA
    TCCTCTTGGGGATATACCTGGGGCACCTATTGGGAGAAGCGCAAACCAGAGGATATTTTG
    GGCATCTCAACCCAAGTATACTGGACCGGCGGTATAGGCGAGTCATGTCCCGCAGTCGCG
    GTTGCGCTTTTGGGGCACATTAGGGCAACATCCACTCAAACTGAATGGGAAAAAGAGGAG
    GTTGTATTCGGTCGACTGAAGAAGTTCTTTCCAAGCTAG
SEQ ID NO: 39
ATGGAAAAGAGAATAAACAAGATACGAAAGAAACTATCGGCCGATAATGCCACAAAGCCT
GTGAGCAGGAGCGGCCCCATGAAAACACTCCTTGTCCGGGTCATGACGGACGACTTGAAA
AAAAGACTGGAGAAGCGTCGGAAAAAGCCGGAAGTTATGCCGCAGGTTATTTCAAATAAC
GCAGCAAACAATCTTAGAATGCTCCTTGATGACTATACAAAGATGAAGGAGGCGATACTA
    CAAGTTTACTGGCAGGAATTTAAGGACGACCATGTGGGCTTGATGTGCAAATTTGCCCAG
    CCTGCTTCCAAAAAAATTGACCAGAACAAACTAAAACCGGAAATGGATGAAAAAGGAAAT
    CTAACAACTGCCGGTTTTGCATGTTCTCAATGCGGTCAGCCGCTATTTGTTTATAAGCTT
    GAACAGGTGAGTGAAAAAGGCAAGGCTTATACAAATTACTTCGGCCGGTGTAATGTGGCC
    GAGCATGAGAAATTGATTCTTCTTGCTCAATTAAAACCTGAAAAAGACAGTGACGAAGCA
    GTGACATACTCCCTTGGCAAATTCGGCCAGAGGGCATTGGACTTTTATTCAATCCACGTA
    ACAAAAGAATCCACCCATCCAGTAAAGCCCCTGGCACAGATTGCGGGCAACCGCTATGCA
    AGCGGACCTGTTGGCAAGGCCCTTTCCGATGCCTGTATGGGCACTATAGCCAGTTTTCTT
    TCGAAATATCAAGACATCATCATAGAACATCAAAAGGTTGTGAAGGGTAATCAAAAGAGG
    TTAGAGAGTCTCAGGGAATTGGCAGGGAAAGAAAATCTTGAGTACCCATCGGTTACACTG
    CCGCCGCAGCCGCATACGAAAGAAGGGGTTGACGCTTATAACGAAGTTATTGCAAGGGTA
    CGTATGTGGGTTAATCTTAATCTGTGGCAAAAGCTGAAGCTCAGCCGTGATGACGCAAAA
    CCGCTACTGCGGCTAAAAGGATTCCCATCTTTCCCTGTTGTGGAGCGGCGTGAAAACGAA
    GTTGACTGGTGGAATACGATTAATGAAGTAAAAAAACTGATTGACGCTAAACGAGATATG
    GGACGGGTATTCTGGAGCGGCGTTACCGCAGAAAAGAGAAATACCATCCTTGAAGGATAC
    AACTATCTGCCAAATGAGAATGACCATAAAAAGAGAGAGGGCAGTTTGGAAAACCCTAAG
    AAGCCTGCCAAACGCCAGTTTGGAGACCTCTTGCTGTATCTTGAAAAGAAATATGCCGGA
    GACTGGGGAAAGGTCTTCGATGAGGCATGGGAGAGGATAGATAAGAAAATAGCCGGACTC
    ACAAGCCATATAGAGCGCGAAGAAGCAAGAAACGCGGAAGACGCTCAATCCAAAGCCGTA
    CTTACAGACTGGCTAAGGGCAAAGGCATCATTTGTTCTTGAAAGACTGAAGGAAATGGAT
    GAAAAGGAATTCTATGCGTGTGAAATCCAACTTCAAAAATGGTATGGCGATCTTCGAGGC
    AACCCGTTTGCCGTTGAAGCTGAGAATAGAGTTGTTGATATAAGCGGGTTTTCTATCGGA
    AGCGATGGCCATTCAATCCAATACAGAAATCTCCTTGCCTGGAAATATCTGGAGAACGGC
    AAGCGTGAATTCTATCTGTTAATGAATTATGGCAAGAAAGGGCGCATCAGATTTACAGAT
    GGAACAGATATTAAAAAGAGCGGCAAATGGCAGGGACTATTATATGGCGGTGGCAAGGCA
    AAGGTTATTGATCTGACTTTCGACCCCGATGATGAACAGTTGATAATCCTGCCGCTGGCC
    TTTGGCACAAGGCAAGGCCGCGAGTTTATCTGGAACGATTTGCTGAGTCTTGAAACAGGC
    CTGATAAAGCTCGCAAACGGAAGAGTTATCGAAAAAACAATCTATAACAAAAAAATAGGG
    CGGGATGAACCGGCTCTATTCGTTGCCTTAACATTTGAGCGCCGGGAAGTTGTTGATCCA
    TCAAATATAAAGCCTGTAAACCTTATAGGCGTTGACCGCGGCGAAAACATCCCGGCGGTT
    ATTGCATTGACAGACCCTGAAGGTTGTCCTTTACCGGAATTCAAGGATTCATCAGGGGGC
    CCAACAGACATCCTGCGAATAGGAGAAGGATATAAGGAAAAGCAGAGGGCTATTCAGGCA
    GCAAAGGAGGTAGAGCAAAGGCGGGCTGGCGGTTATTCACGGAAGTTTGCATCCAAGTCG
    AGGAACCTGGCGGACGACATGGTGAGAAATTCAGCGCGAGACCTTTTTTACCATGCCGTT
    ACCCACGATGCCGTCCTTGTCTTTGAAAACCTGAGCAGGGGTTTTGGAAGGCAGGGCAAA
    AGGACCTTCATGACGGAAAGACAATATACAAAGATGGAAGACTGGCTGACAGCGAAGCTC
    GCATACGAAGGTCTTACGTCAAAAACCTACCTTTCAAAGACGCTGGCGCAATATACGTCA
    AAAACATGCTCCAACTGCGGGTTTACTATAACGACTGCCGATTATGACGGGATGTTGGTA
    AGGCTTAAAAAGACTTCTGATGGATGGGCAACTACCCTCAACAACAAAGAATTAAAAGCC
    GAAGGCCAGATAACGTATTATAACCGGTATAAAAGGCAAACCGTGGAAAAAGAACTCTCC
    GCAGAGCTTGACAGGCTTTCAGAAGAGTCGGGCAATAATGATATTTCTAAGTGGACCAAG
    GGTCGCCGGGACGAGGCATTATTTTTGTTAAAGAAAAGATTCAGCCATCGGCCTGTTCAG
    GAACAGTTTGTTTGCCTCGATTGCGGCCATGAAGTCCACGCCGATGAACAGGCAGCCTTG
    AATATTGCAAGGTCATGGCTTTTTCTAAACTCAAATTCAACAGAATTCAAAAGTTATAAA
    TCGGGTAAACAGCCCTTCGTTGGTGCTTGGCAGGCCTTTTACAAAAGGAGGCTTAAAGAG
    GTATGGAAGCCCAACGCC
SEQ ID NO: 40
ATGAAAAGGATAAATAAAATACGAAGGAGATTGGTAAAGGATAGCAACACGAAAAAAGCC
GGCAAAACCGGCCCTATGAAAACCTTGCTCGTTCGGGTTATGACACCTGACCTGAGAGAA
AGGTTAGAGAATCTTCGCAAAAAGCCGGAAAACATTCCTCAGCCCATTTCAAATACTTCA
CGTGCAAATTTAAATAAACTCCTCACTGACTATACGGAAATGAAGAAAGCAATCCTGCAT
    GTTTATTGGGAAGAGTTCCAAAAAGACCCTGTCGGATTGATGAGCAGGGTTGCACAACCA
    GCGCCCAAGAATATTGATCAGAGAAAATTGATTCCGGTGAAGGACGGAAATGAGAGACTA
    ACAAGTTCTGGATTTGCCTGTTCTCAGTGCTGTCAACCCCTCTATGTTTATAAGCTTGAA
    CAAGTGAATGACAAGGGTAAGCCCCATACAAATTACTTTGGCCGTTGTAATGTCTCCGAG
    CATGAACGTTTGATATTGCTCTCGCCGCATAAACCGGAGGCAAATGACGAGCTAGTAACG
    TATTCGTTGGGGAAGTTCGGTCAAAGGGCATTGGACTTTTATTCAATCCACGTAACAAGA
    GAATCGAACCATCCTGTAAAGCCGCTAGAACAGATCGGTGGCAATAGCTGCGCAAGTGGT
    CCCGTTGGTAAGGCTTTATCTGATGCCTGTATGGGAGCAGTAGCCAGTTTCCTTACAAAG
    TACCAGGACATCATCCTCGAACACCAAAAGGTTATAAAAAAAAACGAAAAGAGATTGGCA
    AATCTAAAGGATATAGCAAGTGCAAACGGGCTTGCATTTCCTAAAATCACTCTTCCACCG
    CAACCGCATACAAAAGAAGGGATTGAAGCTTATAACAATGTTGTTGCTCAGATAGTGATC
    TGGGTAAACCTGAATCTTTGGCAGAAACTCAAAATTGGCAGGGATGAGGCAAAGCCCTTA
    CAGCGGCTTAAGGGTTTTCCGTCCTTCCCTCTTGTTGAACGCCAGGCGAATGAGGTTGAT
    TGGTGGGATATGGTCTGTAATGTCAAAAAGTTGATTAACGAAAAGAAAGAGGACGGGAAG
    GTCTTCTGGCAAAATCTTGCTGGATATAAAAGGCAGGAAGCCTTGCTTCCATATCTTTCG
    TCTGAAGAAGACCGTAAAAAAGGAAAAAAGTTTGCGCGTTATCAGTTTGGTGACCTTTTG
    CTTCACCTTGAAAAGAAACACGGTGAAGATTGGGGCAAAGTTTATGATGAGGCATGGGAA
    AGAATAGATAAAAAAGTTGAAGGTCTGAGTAAGCACATAAAGTTGGAGGAAGAAAGAAGG
    TCTGAAGATGCTCAATCAAAGGCTGCCCTCACTGATTGGCTCAGGGCAAAGGCCTCTTTT
    GTTATTGAAGGGCTCAAAGAAGCTGATAAGGATGAGTTTTGCAGGTGTGAGTTAAAGCTT
    CAAAAGTGGTATGGAGATTTGAGAGGAAAACCATTTGCTATAGAAGCAGAGAACAGCATT
    TTAGATATAAGCGGATTTTCTAAACAGTATAATTGTGCATTTATATGGCAGAAAGACGGC
    GTAAAGAAGTTAAATCTTTATTTAATAATAAATTACTTCAAAGGTGGTAAGCTACGCTTC
    AAAAAAATCAAGCCAGAAGCTTTTGAAGCAAATAGGTTTTATACAGTAATTAATAAAAAA
    AGCGGTGAGATTGTGCCTATGGAGGTCAACTTCAATTTTGATGACCCGAATTTGATAATT
    CTGCCTTTGGCCTTTGGAAAAAGGCAGGGGAGGGAGTTTATCTGGAACGACCTATTGAGC
    CTTGAGACGGGTTCATTGAAACTCGCCAATGGCAGGGTTATTGAAAAAACGCTCTATAAC
    AGAAGGACGAGACAGGATGAACCAGCACTTTTTGTTGCCCTGACATTTGAAAGAAGAGAG
    GTGCTTGACTCATCGAATATAAAACCGATGAATCTGATAGGAATAGACCGGGGAGAAAAT
    ATCCCGGCAGTCATAGCATTAACAGACCCGGAAGGATGCCCCTTGTCAAGATTCAAAGAT
    TCATTGGGCAATCCAACGCATATTTTGCGAATAGGAGAAAGTTATAAGGAAAAACAACGG
    ACTATTCAGGCTGCTAAAGAAGTTGAACAAAGGCGGGCAGGCGGATATTCGAGAAAATAT
    GCATCAAAGGCGAAGAATCTGGCGGACGATATGGTAAGAAATACAGCTCGTGACCTCTTA
    TATTATGCTGTTACTCAAGATGCAATGCTCATTTTTGAAAATCTTTCCCGCGGTTTTGGT
    AGACAAGGCAAGAGGACTTTTATGGCGGAAAGGCAGTACACGAGGATGGAAGACTGGCTG
    ACTGCAAAGCTTGCCTATGAAGGTCTGCCATCAAAAACCTATCTTTCAAAGACTCTGGCA
    CAGTATACCTCAAAGACATGTTCTAATTGTGGTTTTACAATCACAAGTGCAGATTATGAC
    AGGGTGCTCGAAAAGCTCAAGAAGACGGCTACTGGATGGATGACTACAATCAATGGAAAA
    GAGTTAAAAGTTGAAGGACAGATAACATACTATAACCGGTATAAAAGGCAGAATGTGGTA
    AAAGACCTCTCTGTAGAGCTGGATAGACTTTCGGAAGAGTCGGTAAATAATGATATTTCT
    AGTTGGACAAAAGGCCGCAGTGGTGAAGCTTTATCTCTGCTAAAAAAGAGATTTAGTCAC
    AGGCCGGTGCAGGAAAAGTTTGTTTGCCTGAACTGTGGTTTTGAAACCCATGCAGACGAA
    CAAGCAGCACTGAATATTGCAAGGTCGTGGCTCTTTCTCCGTTCTCAAGAATATAAGAAG
    TATCAAACCAATAAAACGACCGGAAATACTGACAAAAGGGCATTTGTTGAAACATGGCAA
    TCCTTTTACAGAAAGAAGCTCAAAGAAGTATGGAAACCA
SEQ ID NO: 41
ATGGGTAAAATGTATTACCTTGGTTTAGACATTGGCACGAATTCCGTGGGCTACGCGGTG
ACCGACCCCTCATACCACCTGCTGAAGTTTAAGGGGGAACCAATGTGGGGTGCGCACGTA
TTTGCCGCCGGTAATCAGAGCGCGGAACGACGCTCGTTCCGCACATCGCGTCGTCGTTTG
GACCGACGCCAACAGCGCGTTAAACTGGTACAGGAGATTTTTGCCCCGGTGATTAGTCCG
    ATCGACCCACGCTTCTTCATTCGTCTGCATGAATCCGCCCTGTGGCGCGATGACGTCGCG
    GAGACGGATAAACATATCTTTTTCAATGATCCTACCTATACCGATAAGGAATATTATAGC
    GATTACCCGACTATCCATCACCTGATCGTTGATCTGATGGAAAGCTCTGAGAAACACGAT
    CCGCGGCTGGTGTACCTTGCAGTGGCGTGGTTAGTGGCACACCGTGGTCATTTTCTGAAC
    GAGGTGGACAAGGATAATATTGGAGATGTGTTGTCGTTCGACGCATTTTATCCGGAGTTT
    CTCGCGTTCCTGTCGGACAACGGTGTATCACCGTGGGTGTGCGAAAGCAAAGCGCTGCAG
    GCGACCTTGCTGAGCCGTAACTCAGTGAACGACAAATATAAAGCCCTTAAGTCTCTGATC
    TTCGGATCCCAGAAACCTGAAGATAACTTCGATGCCAATATTTCGGAAGATGGACTCATT
    CAACTGCTGGCCGGCAAAAAGGTAAAAGTTAACAAACTGTTCCCTCAGGAATCGAACGAT
    GCATCCTTCACATTGAATGATAAAGAAGACGCGATAGAAGAAATCCTGGGTACGCTTACA
    CCAGATGAATGTGAATGGATTGCGCATATACGCCGCCTTTTTGACTGGGCTATCATGAAA
    CATGCTCTGAAAGATGGCAGGACTATTAGCGAGTCAAAAGTCAAACTGTATGAGCAGCAC
    CATCACGATCTGACCCAACTTAAATACTTCGTGAAAACCTACCTTGCAAAAGAATACGAC
    GATATTTTCCGCAACGTGGATAGCGAAACAACGAAAAACTATGTAGCGTATTCCTATCAT
    GTGAAAGAGGTGAAAGGCACTCTGCCTAAAAATAAGGCAACGCAAGAAGAGTTTTGTAAG
    TATGTCCTGGGCAAGGTTAAAAACATTGAATGCTCTGAAGCAGACAAGGTTGACTTTGAT
    GAGATGATTCAGCGTCTTACCGACAACTCTTTTATGCCTAAGCAGGTTTCGGGCGAAAAC
    CGCGTTATTCCTTATCAGTTATATTATTATGAACTGAAGACAATTCTGAATAAAGCAGCC
    TCGTACCTGCCTTTCCTGACGCAGTGTGGAAAAGATGCAATTTCGAACCAGGACAAACTA
    CTGTCGATCATGACGTTCCGTATTCCTTACTTCGTCGGACCCTTGCGAAAAGATAATTCG
    GAACATGCATGGCTCGAACGAAAGGCCGGTAAGATTTATCCGTGGAACTTTAACGACAAA
    GTGGACTTGGATAAATCAGAAGAAGCGTTCATTCGCCGAATGACCAATACCTGTACCTAT
    TATCCCGGCGAAGATGTTTTACCGTTGGATTCGCTGATCTATGAGAAATTTATGATTTTA
    AATGAAATCAATAATATTCGTATTGACGGCTACCCGATTAGTGTTGACGTTAAACAGCAG
    GTTTTTGGCTTGTTCGAAAAAAAACGACGCGTAACCGTGAAAGATATTCAGAACCTGCTG
    CTGTCTCTCGGAGCTCTGGACAAACACGGGAAGCTGACAGGCATCGATACCACTATCCAC
    TCAAACTATAATACGTATCACCATTTTAAATCTCTCATGGAACGCGGCGTCCTGACCCGG
    GATGACGTGGAACGCATCGTTGAAAGGATGACCTACAGCGACGATACTAAGCGTGTGCGT
    CTGTGGCTGAATAACAACTATGGTACTTTAACCGCCGACGATGTGAAACACATTTCGCGT
    CTGCGCAAACACGATTTTGGCCGTTTATCCAAAATGTTCTTAACAGGTCTGAAGGGTGTC
    CATAAGGAGACCGGTGAACGTGCCTCCATACTGGATTTCATGTGGAACACGAACGATAAC
    CTGATGCAGCTCCTTTCCGAATGCTACACGTTCAGTGATGAAATCACAAAGCTGCAAGAG
    GCGTATTATGCAAAAGCCCAGTTGTCTTTAAACGATTTTTTAGACTCGATGTACATCTCT
    AACGCGGTGAAACGTCCGATTTACAGAACTCTGGCAGTGGTGAACGATATTCGAAAAGCA
    TGTGGGACGGCCCCTAAACGCATTTTCATCGAAATGGCTCGTGATGGTGAATCAAAAAAA
    AAGAGAAGTGTTACACGTCGCGAGCAGATCAAAAACCTGTACCGCTCGATTCGTAAAGAT
    TTCCAGCAGGAAGTTGATTTTCTGGAAAAGATCCTGGAAAATAAATCTGATGGTCAACTT
    CAGTCAGATGCTTTGTATCTTTACTTTGCACAATTAGGGCGCGATATGTACACGGGCGAT
    CCAATAAAGCTGGAGCACATCAAAGATCAGAGTTTCTATAACATAGACCATATTTACCCG
    CAGTCTATGGTGAAAGACGATTCCCTAGATAACAAAGTGCTGGTGCAAAGCGAAATTAAC
    GGCGAGAAAAGCTCGCGATACCCTTTGGACGCCGCGATCCGCAATAAAATGAAGCCCCTT
    TGGGACGCTTACTATAATCATGGCCTGATCTCCTTAAAGAAATACCAGCGTCTAACGCGC
    TCGACCCCGTTTACCGATGATGAAAAATGGGACTTTATTAATCGCCAGTTAGTGGAAACC
    CGTCAATCTACCAAAGCGCTGGCCATTTTGTTGAAGCGTAAGTTTCCAGACACCGAAATT
    GTGTATTCGAAGGCGGGGTTATCGTCCGACTTCAGACATGAATTCGGCCTTGTAAAAAGT
    CGCAATATTAATGATTTGCACCACGCTAAAGACGCATTCTTGGCTATCGTTACCGGCAAT
    GTGTACCATGAAAGATTCAATCGCAGATGGTTTATGGTGAACCAGCCGTACTCAGTTAAA
    ACTAAAACTCTTTTTACCCACAGCATAAAGAATGGCAACTTCGTTGCCTGGAACGGCGAA
    GAAGATCTCGGTCGTATTGTAAAAATGCTGAAGCAAAACAAAAATACCATTCACTTCACG
    CGCTTCTCCTTCGATCGCAAAGAAGGATTATTTGATATCCAACCTCTGAAAGCCAGCACC
    GGCTTAGTCCCACGAAAAGCCGGTCTGGATGTCGTTAAATACGGCGGATATGACAAATCT
    ACCGCGGCCTATTACCTGCTGGTGAGGTTCACGCTCGAGGACAAGAAAACCCAGCACAAG
    CTGATGATGATTCCTGTAGAAGGCCTGTACAAGGCTCGCATTGATCATGACAAGGAATTT
    CTTACCGATTATGCGCAAACGACTATAAGCGAAATCCTACAGAAAGATAAACAGAAAGTG
    ATCAATATTATGTTTCCAATGGGTACGAGGCATATAAAACTCAATTCAATGATTAGTATC
    GATGGCTTCTATCTTAGTATCGGCGGAAAGTCCTCTAAAGGTAAGTCAGTTCTATGTCAC
    GCAATGGTTCCACTGATCGTCCCTCACAAAATCGAATGTTACATTAAAGCAATGGAAAGC
    TTCGCCCGGAAGTTTAAAGAAAACAACAAGCTGCGCATCGTAGAAAAATTCGATAAAATC
    ACCGTTGAAGACAACCTGAATCTCTACGAGCTCTTTCTCCAAAAACTGCAGCATAATCCC
    TATAATAAGTTTTTTTCGACACAGTTTGACGTACTGACGAACGGCCGTTCTACTTTCACA
    AAACTGTCGCCGGAGGAACAGGTACAGACGCTCTTGAACATTTTAAGTATCTTTAAAACA
    TGCCGCAGTTCGGGTTGCGACCTGAAATCCATCAACGGCAGTGCCCAGGCAGCGCGCATC
    ATGATTAGCGCTGACTTAACTGGACTGTCGAAAAAATATTCAGATATTAGGTTGGTTGAA
    CAGTCAGCTTCTGGTTTGTTCGTATCCAAAAGTCAGAACTTACTGGAGTATCTCTAA
SEQ ID NO: 42
ATGTCATCGCTCACGAAATTCACTAACAAATACTCTAAACAGCTCACCATTAAGAATGAA
CTCATCCCAGTTGGCAAAACACTGGAGAACATCAAAGAGAATGGTCTGATAGATGGCGAC
GAACAGCTGAATGAGAATTATCAGAAGGCGAAAATTATTGTGGATGATTTTCTGCGGGAC
TTCATTAATAAAGCACTGAATAATACGCAGATCGGGAACTGGCGCGAACTGGCGGATGCC
    CTTAATAAAGAGGATGAAGATAACATCGAGAAATTGCAGGATAAAATTCGGGGAATCATT
    GTATCCAAATTTGAAACGTTTGATCTGTTTAGCAGCTATTCTATTAAGAAAGATGAAAAG
    ATTATTGACGACGACAATGATGTTGAAGAAGAGGAACTGGATCTGGGCAAGAAGACCAGC
    TCATTTAAATACATATTTAAAAAAAACCTGTTTAAGTTAGTGTTGCCATCCTACCTGAAA
    ACCACAAACCAGGACAAGCTGAAGATTATTAGCTCGTTTGATAATTTTTCAACGTACTTC
    CGCGGGTTCTTTGAAAACCGGAAAAACATTTTTACCAAGAAACCGATCTCCACAAGTATT
    GCGTATCGCATTGTTCATGATAACTTCCCGAAATTCCTTGATAACATTCGTTGTTTTAAT
    GTGTGGCAGACGGAATGCCCGCAACTAATCGTGAAAGCAGATAACTATCTGAAAAGCAAA
    AATGTTATAGCGAAAGATAAAAGTTTGGCAAACTATTTTACCGTGGGCGCGTATGACTAT
    TTCCTGTCTCAGAATGGTATAGATTTTTACAACAATATTATAGGTGGACTGCCAGCGTTC
    GCCGGCCATGAGAAAATCCAAGGTCTCAATGAATTCATCAATCAAGAGTGCCAAAAAGAC
    AGCGAGCTGAAAAGTAAGCTGAAAAACCGTCACGCGTTCAAAATGGCGGTACTGTTCAAA
    CAGATACTCAGCGATCGTGAAAAAAGTTTTGTAATTGATGAGTTCGAGTCGGATGCTCAA
    GTTATTGACGCCGTTAAAAACTTTTACGCCGAACAGTGCAAAGATAACAATGTTATTTTT
    AACTTATTAAATCTTATCAAGAATATCGCTTTCTTAAGTGATGACGAACTGGACGGCATA
    TTCATTGAAGGGAAATACCTGTCGAGCGTTAGTCAAAAACTCTATAGCGATTGGTCAAAA
    TTACGTAACGACATTGAGGATTCGGCTAACTCTAAACAAGGCAATAAAGAGCTGGCCAAG
    AAGATCAAAACCAACAAAGGGGATGTAGAAAAAGCGATCTCGAAATATGAGTTCTCGCTG
    TCGGAACTGAACTCGATTGTACATGATAACACCAAGTTTTCTGACCTCCTTAGTTGTACA
    CTGCATAAGGTGGCTTCTGAGAAACTGGTGAAGGTCAATGAAGGCGACTGGCCGAAACAT
    CTCAAGAATAATGAAGAGAAACAAAAAATCAAAGAGCCGCTTGATGCTCTGCTGGAGATC
    TATAATACACTTCTGATTTTTAACTGCAAAAGCTTCAATAAAAACGGCAACTTCTATGTC
    GACTATGATCGTTGCATCAATGAACTGAGTTCGGTCGTGTATCTGTATAATAAAACACGT
    AACTATTGCACTAAAAAACCCTATAACACGGACAAGTTCAAACTCAATTTTAACAGTCCG
    CAGCTCGGTGAAGGCTTTTCCAAGTCGAAAGAAAATGACTGTCTGACTCTTTTGTTTAAA
    AAAGACGACAACTATTATGTAGGCATTATCCGCAAAGGTGCAAAAATCAATTTTGATGAT
    ACACAAGCAATCGCCGATAACACCGACAATTGCATCTTTAAAATGAATTATTTCCTACTT
    AAAGACGCAAAAAAATTTATCCCGAAATGTAGCATTCAGCTGAAAGAAGTCAAGGCCCAT
    TTTAAGAAATCTGAAGATGATTACATTTTGTCTGATAAAGAGAAATTTGCTAGCCCGCTG
    GTCATTAAAAAGAGCACATTTTTGCTGGCAACTGCACATGTGAAAGGGAAAAAAGGCAAT
    ATCAAGAAATTTCAGAAAGAATATTCGAAAGAAAACCCCACTGAGTATCGCAATTCTTTA
    AACGAATGGATTGCTTTTTGTAAAGAGTTCTTAAAAACTTATAAAGCGGCTACCATTTTT
    GATATAACCACATTGAAAAAGGCAGAGGAATATGCTGATATTGTAGAATTCTACAAGGAT
    GTCGATAATCTGTGCTACAAACTGGAGTTCTGCCCGATTAAAACCTCGTTTATAGAAAAC
    CTGATAGATAACGGCGACCTGTATCTGTTTCGCATCAATAACAAAGACTTCAGCAGTAAA
    TCGACCGGCACCAAGAACCTTCATACGTTATATTTACAAGCTATATTCGATGAACGTAAT
    CTGAACAATCCGACAATTATGCTGAATGGGGGAGCAGAACTGTTCTATCGTAAAGAAAGT
    ATTGAGCAGAAAAACCGTATCACACACAAAGCCGGTTCAATTCTCGTGAATAAGGTGTGT
    AAAGACGGTACAAGCCTGGATGATAAGATACGTAATGAAATTTATCAATATGAGAATAAA
    TTTATTGATACCCTGTCTGATGAAGCTAAAAAGGTGTTACCGAATGTCATTAAAAAGGAA
    GCTACCCATGACATTACAAAAGATAAACGTTTCACTAGTGACAAATTCTTCTTTCACTGC
    CCCCTGACAATTAATTATAAGGAAGGCGATACCAAGCAGTTCAATAACGAAGTGCTGAGT
    TTTCTGCGTGGAAATCCTGACATCAACATTATCGGCATTGACCGCGGAGAGCGTAATTTA
    ATCTATGTAACGGTTATAAACCAGAAAGGCGAGATTCTGGATTCGGTTTCATTCAATACC
    GTGACCAACAAGAGTTCAAAAATCGAGCAGACAGTCGATTATGAAGAGAAATTGGCAGTC
    CGCGAGAAAGAGAGGATTGAAGCAAAACGTTCCTGGGACTCTATCTCAAAAATTGCGACA
    CTAAAGGAAGGTTATCTGAGCGCAATAGTTCACGAGATCTGTCTGTTAATGATTAAACAC
    AACGCGATCGTTGTCTTAGAGAATCTTAATGCAGGCTTTAAGCGTATTCGTGGCGGTTTA
    TCAGAAAAAAGTGTTTATCAAAAATTCGAAAAAATGTTGATTAACAAACTGAACTATTTT
    GTCAGCAAGAAGGAATCCGACTGGAATAAACCGTCTGGTCTGCTGAATGGACTGCAGCTT
    TCGGATCAGTTTGAAAGCTTCGAAAAACTGGGTATTCAGTCTGGTTTTATTTTTTACGTG
    CCGGCTGCATATACCTCAAAGATTGATCCGACCACGGGCTTCGCCAATGTTCTGAATCTG
    TCGAAGGTACGCAATGTTGATGCGATCAAAAGCTTTTTTTCTAACTTCAACGAAATTAGT
    TATAGCAAGAAAGAAGCCCTTTTCAAATTCTCATTCGATCTGGATTCACTGAGTAAGAAA
    GGCTTTAGTAGCTTTGTGAAATTTAGTAAGAGTAAATGGAACGTCTACACCTTTGGAGAA
    CGTATCATAAAGCCAAAGAATAAGCAAGGTTATCGGGAGGACAAAAGAATCAACTTGACC
    TTCGAGATGAAGAAGTTACTTAACGAGTATAAGGTTTCTTTTGATCTTGAAAATAACTTG
    ATTCCGAATCTCACGAGTGCCAACCTGAAGGATACTTTTTGGAAAGAGCTATTCTTTATC
    TTCAAGACTACGCTGCAGCTCCGTAACAGCGTTACTAACGGTAAAGAAGATGTGCTCATC
    TCTCCGGTCAAAAATGCGAAGGGTGAATTCTTCGTTTCGGGAACGCATAACAAGACTCTT
    CCGCAAGATTGCGATGCGAACGGTGCATACCATATTGCGTTGAAAGGTCTGATGATACTC
    GAACGTAACAACCTTGTACGTGAGGAGAAAGATACGAAAAAGATTATGGCGATTTCAAAC
    GTGGATTGGTTCGAGTACGTGCAGAAACGTAGAGGCGTTCTGTAA
SEQ ID NO: 43
ATGAACAACTACGACGAATTCACCAAACTGTACCCGATCCAGAAAACCATCCGTTTCGAA
CTGAAACCGCAGGGTCGTACCATGGAACACCTGGAAACCTTCAACTTCTTCGAAGAAGAC
CGTGACCGTGCGGAAAAATACAAAATCCTGAAAGAAGCGATCGACGAATACCACAAAAAA
TTCATCGACGAACACCTGACCAACATGTCTCTGGACTGGAACTCTCTGAAACAGATCTCT
    GAAAAATACTACAAATCTCGTGAAGAAAAAGACAAAAAAGTTTTCCTGTCTGAACAGAAA
    CGTATGCGTCAGGAAATCGTTTCTGAATTCAAAAAAGACGACCGTTTCAAAGACCTGTTC
    TCTAAAAAACTGTTCTCTGAACTGCTGAAAGAAGAAATCTACAAAAAAGGTAACCACCAG
    GAAATCGACGCGCTGAAATCTTTCGACAAATTCTCTGGTTACTTCATCGGTCTGCACGAA
    AACCGTAAAAACATGTACTCTGACGGTGACGAAATCACCGCGATCTCTAACCGTATCGTT
    AACGAAAACTTCCCGAAATTCCTGGACAACCTGCAGAAATACCAGGAAGCGCGTAAAAAA
    TACCCGGAATGGATCATCAAAGCGGAATCTGCGCTGGTTGCGCACAACATCAAAATGGAC
    GAAGTTTTCTCTCTGGAATACTTCAACAAAGTTCTGAACCAGGAAGGTATCCAGCGTTAC
    AACCTGGCGCTGGGTGGTTACGTTACCAAATCTGGTGAAAAAATGATGGGTCTGAACGAC
    GCGCTGAACCTGGCGCACCAGTCTGAAAAATCTTCTAAAGGTCGTATCCACATGACCCCG
    CTGTTCAAACAGATCCTGTCTGAAAAAGAATCTTTCTCTTACATCCCGGACGTTTTCACC
    GAAGACTCTCAGCTGCTGCCGTCTATCGGTGGTTTCTTCGCGCAGATCGAAAACGACAAA
    GACGGTAACATCTTCGACCGTGCGCTGGAACTGATCTCTTCTTACGCGGAATACGACACC
    GAACGTATCTACATCCGTCAGGCGGACATCAACCGTGTTTCTAACGTTATCTTCGGTGAA
    TGGGGTACCCTGGGTGGTCTGATGCGTGAATACAAAGCGGACTCTATCAACGACATCAAC
    CTGGAACGTACCTGCAAAAAAGTTGACAAATGGCTGGACTCTAAAGAATTCGCGCTGTCT
    GACGTTCTGGAAGCGATCAAACGTACCGGTAACAACGACGCGTTCAACGAATACATCTCT
    AAAATGCGTACCGCGCGTGAAAAAATCGACGCGGCGCGTAAAGAAATGAAATTCATCTCT
    GAAAAAATCTCTGGTGACGAAGAATCTATCCACATCATCAAAACCCTGCTGGACTCTGTT
    CAGCAGTTCCTGCACTTCTTCAACCTGTTCAAAGCGCGTCAGGACATCCCGCTGGACGGT
    GCGTTCTACGCGGAATTCGACGAAGTTCACTCTAAACTGTTCGCGATCGTTCCGCTGTAC
    AACAAAGTTCGTAACTACCTGACCAAAAACAACCTGAACACCAAAAAAATCAAACTGAAC
    TTCAAAAACCCGACCCTGGCGAACGGTTGGGACCAGAACAAAGTTTACGACTACGCGTCT
    CTGATCTTCCTGCGTGACGGTAACTACTACCTGGGTATCATCAACCCGAAACGTAAAAAA
    AACATCAAATTCGAACAGGGTTCTGGTAACGGTCCGTTCTACCGTAAAATGGTTTACAAA
    CAGATCCCGGGTCCGAACAAAAACCTGCCGCGTGTTTTCCTGACCTCTACCAAAGGTAAA
    AAAGAATACAAACCGTCTAAAGAAATCATCGAAGGTTACGAAGCGGACAAACACATCCGT
    GGTGACAAATTCGACCTGGACTTCTGCCACAAACTGATCGACTTCTTCAAAGAATCTATC
    GAAAAACACAAAGACTGGTCTAAATTCAACTTCTACTTCTCTCCGACCGAATCTTACGGT
    GACATCTCTGAATTCTACCTGGACGTTGAAAAACAGGGTTACCGTATGCACTTCGAAAAC
    ATCTCTGCGGAAACCATCGACGAATACGTTGAAAAAGGTGACCTGTTCCTGTTCCAGATC
    TACAACAAAGACTTCGTTAAAGCGGCGACCGGTAAAAAAGACATGCACACCATCTACTGG
    AACGCGGCGTTCTCTCCGGAAAACCTGCAGGACGTTGTTGTTAAACTGAACGGTGAAGCG
    GAACTGTTCTACCGTGACAAATCTGACATCAAAGAAATCGTTCACCGTGAAGGTGAAATC
    CTGGTTAACCGTACCTACAACGGTCGTACCCCGGTTCCGGACAAAATCCACAAAAAACTG
    ACCGACTACCACAACGGTCGTACCAAAGACCTGGGTGAAGCGAAAGAATACCTGGACAAA
    GTTCGTTACTTCAAAGCGCACTACGACATCACCAAAGACCGTCGTTACCTGAACGACAAA
    ATCTACTTCCACGTTCCGCTGACCCTGAACTTCAAAGCGAACGGTAAAAAAAACCTGAAC
    AAAATGGTTATCGAAAAATTCCTGTCTGACGAAAAAGCGCACATCATCGGTATCGACCGT
    GGTGAACGTAACCTGCTGTACTACTCTATCATCGACCGTTCTGGTAAAATCATCGACCAG
    CAGTCTCTGAACGTTATCGACGGTTTCGACTACCGTGAAAAACTGAACCAGCGTGAAATC
    GAAATGAAAGACGCGCGTCAGTCTTGGAACGCGATCGGTAAAATCAAAGACCTGAAAGAA
    GGTTACCTGTCTAAAGCGGTTCACGAAATCACCAAAATGGCGATCCAGTACAACGCGATC
    GTTGTTATGGAAGAACTGAACTACGGTTTCAAACGTGGTCGTTTCAAAGTTGAAAAACAG
    ATCTACCAGAAATTCGAAAACATGCTGATCGACAAAATGAACTACCTGGTTTTCAAAGAC
    GCGCCGGACGAATCTCCGGGTGGTGTTCTGAACGCGTACCAGCTGACCAACCCGCTGGAA
    TCTTTCGCGAAACTGGGTAAACAGACCGGTATCCTGTTCTACGTTCCGGCGGCGTACACC
    TCTAAAATCGACCCGACCACCGGTTTCGTTAACCTGTTCAACACCTCTTCTAAAACCAAC
    GCGCAGGAACGTAAAGAATTCCTGCAGAAATTCGAATCTATCTCTTACTCTGCGAAAGAC
    GGTGGTATCTTCGCGTTCGCGTTCGACTACCGTAAATTCGGTACCTCTAAAACCGACCAC
    AAAAACGTTTGGACCGCGTACACCAACGGTGAACGTATGCGTTACATCAAAGAAAAAAAA
    CGTAACGAACTGTTCGACCCGTCTAAAGAAATCAAAGAAGCGCTGACCTCTTCTGGTATC
    AAATACGACGGTGGTCAGAACATCCTGCCGGACATCCTGCGTTCTAACAACAACGGTCTG
    ATCTACACCATGTACTCTTCTTTCATCGCGGCGATCCAGATGCGTGTTTACGACGGTAAA
    GAAGACTACATCATCTCTCCGATCAAAAACTCTAAAGGTGAATTCTTCCGTACCGACCCG
    AAACGTCGTGAACTGCCGATCGACGCGGACGCGAACGGTGCGTACAACATCGCGCTGCGT
    GGTGAACTGACCATGCGTGCGATCGCGGAAAAATTCGACCCGGACTCTGAAAAAATGGCG
    AAACTGGAACTGAAACACAAAGACTGGTTCGAATTCATGCAGACCCGTGGTGACTAA
    SEQ ID NO: 44
    ATGACTAAAACATTTGATTCAGAGTTTTTTAATTTGTACTCGCTGCAAAAAACGGTACGC
    TTTGAGTTAAAACCCGTGGGAGAAACCGCGTCATTTGTGGAAGACTTTAAAAACGAGGGC
    TTGAAACGTGTTGTGAGCGAAGATGAAAGGCGAGCCGTCGATTACCAGAAAGTTAAGGAA
    ATAATTGACGATTACCATCGGGATTTCATTGAAGAAAGTTTAAATTATTTTCCGGAACAG
    GTGAGTAAAGATGCTCTTGAGCAGGCGTTTCATCTTTATCAGAAACTGAAGGCAGCAAAA
    GTTGAGGAAAGGGAAAAAGCGCTGAAAGAATGGGAAGCGCTGCAGAAAAAGCTACGTGAA
    AAAGTGGTGAAATGCTTCTCGGACTCGAATAAAGCCCGCTTCTCAAGGATTGATAAAAAG
    GAACTGATTAAGGAAGACCTGATAAATTGGTTGGTCGCCCAGAATCGCGAGGATGATATC
    CCTACGGTCGAAACGTTTAACAACTTCACCACATATTTTACCGGCTTCCATGAGAATCGT
    AAAAATATTTACTCCAAAGATGATCACGCCACCGCTATTAGCTTTCGCCTTATTCATGAA
    AATCTTCCAAAGTTTTTTGACAACGTGATTAGCTTCAATAAGTTGAAAGAGGGTTTCCCT
    GAATTAAAATTTGATAAAGTGAAAGAGGATTTAGAAGTAGATTATGATCTGAAGCATGCG
    TTTGAAATAGAATATTTCGTTAACTTCGTGACCCAAGCGGGCATAGATCAGTATAATTAT
    CTGTTAGGAGGGAAAACCCTGGAGGACGGGACGAAAAAACAAGGGATGAATGAGCAAATT
    AATCTGTTCAAACAACAGCAAACGCGAGATAAAGCGCGTCAGATTCCCAAACTGATCCCC
    CTGTTCAAACAGATTCTTAGCGAAAGGACTGAAAGCCAGTCCTTTATTCCTAAACAATTT
    GAAAGTGATCAGGAGTTGTTCGATTCACTGCAGAAGTTACATAATAACTGCCAGGATAAA
    TTCACCGTGCTGCAACAAGCCATTCTCGGTCTGGCAGAGGCGGATCTTAAGAAGGTCTTC
    ATCAAAACCTCTGATTTAAATGCCTTATCTAACACCATTTTCGGGAATTACAGCGTCTTT
    TCCGATGCACTGAACCTGTATAAAGAAAGCCTGAAAACGAAAAAAGCGCAGGAGGCTTTT
    GAGAAACTACCGGCCCATTCTATTCACGACCTCATTCAATACTTGGAACAGTTCAATTCC
    AGCCTGGACGCGGAAAAACAACAGAGCACCGACACCGTCCTGAACTACTTCATCAAGACC
    GATGAATTATATTCTCGCTTCATTAAATCCACTAGCGAGGCTTTCACTCAGGTGCAGCCT
    TTGTTCGAACTGGAAGCCCTGTCATCTAAGCGCCGCCCACCGGAATCGGAAGATGAAGGG
    GCAAAAGGGCAGGAAGGCTTCGAGCAGATCAAGCGTATTAAAGCTTACCTGGATACGCTT
    ATGGAAGCGGTACACTTTGCAAAGCCGTTGTATCTTGTTAAGGGTCGTAAAATGATCGAA
    GGGCTCGATAAAGACCAGTCCTTTTATGAAGCGTTTGAAATGGCGTACCAAGAACTTGAA
    TCGTTAATCATTCCTATCTATAACAAAGCGCGGAGCTATCTGTCGCGGAAACCTTTCAAG
    GCCGATAAATTCAAGATTAATTTTGACAACAACACGCTACTGAGCGGATGGGATGCGAAC
    AAGGAAACTGCTAACGCGTCCATTCTGTTTAAGAAAGACGGGTTATATTACCTTGGAATT
    ATGCCGAAAGGTAAGACCTTTCTCTTTGACTACTTTGTATCGAGCGAGGATTCAGAGAAA
    CTGAAACAGCGTCGCCAGAAGACCGCCGAAGAAGCTCTGGCGCAGGATGGTGAAAGTTAC
    TTCGAAAAAATTCGTTATAAACTGTTACCAGGGGCTTCAAAGATGTTACCGAAAGTCTTT
    TTTAGCAACAAAAATATTGGCTTTTACAACCCGTCGGATGACATTTTACGCATTCGCAAC
    ACAGCCTCTCACACCAAAAACGGGACCCCTCAGAAAGGCCACTCAAAAGTTGAGTTTAAC
    CTGAATGATTGTCATAAGATGATTGATTTCTTCAAATCATCAATTCAGAAACACCCGGAA
    TGGGGGTCTTTTGGCTTTACGTTTTCTGATACCAGTGATTTTGAAGACATGAGTGCCTTC
    TACCGGGAAGTAGAAAACCAGGGTTACGTAATTAGCTTTGACAAAATCAAAGAGACCTAT
    ATACAGAGCCAGGTGGAACAGGGTAATCTCTACTTATTCCAGATTTATAACAAGGATTTC
    TCGCCCTACAGCAAAGGCAAACCAAACCTGCATACTCTGTACTGGAAAGCCCTGTTTGAA
    GAAGCGAACCTGAATAACGTAGTGGCGAAGTTGAACGGTGAAGCGGAAATCTTCTTCCGT
    CGTCACTCCATTAAGGCCTCTGATAAAGTTGTCCATCCGGCAAATCAGGCCATTGATAAT
    AAGAATCCACACACGGAAAAAACGCAGTCAACCTTTGAATATGACCTCGTTAAAGACAAA
    CGCTACACGCAAGATAAGTTCTTTTTCCACGTCCCAATCAGCCTCAACTTTAAAGCACAA
    GGGGTTTCAAAGTTTAATGATAAAGTCAATGGGTTCCTCAAGGGCAACCCGGATGTCAAC
    ATTATAGGTATAGACAGGGGCGAACGCCATCTGCTTTACTTTACCGTAGTGAATCAGAAA
    GGTGAAATACTGGTTCAGGAATCATTAAATACCTTGATGTCGGACAAAGGGCACGTTAAT
    GATTACCAGCAGAAACTGGATAAAAAAGAACAGGAACGTGATGCTGCGCGTAAATCGTGG
    ACCACGGTTGAGAACATTAAAGAGCTGAAAGAGGGGTATCTAAGCCATGTGGTACACAAA
    CTGGCGCACCTCATCATTAAATATAACGCAATAGTCTGCCTAGAAGACTTGAATTTTGGC
    TTTAAACGCGGCCGCTTCAAAGTGGAAAAACAAGTTTATCAAAAATTTGAAAAGGCGCTT
    ATAGATAAACTGAATTATCTGGTTTTTAAAGAAAAGGAACTTGGTGAGGTAGGGCACTAC
    TTGACAGCTTATCAACTGACGGCCCCGTTCGAATCATTCAAAAAACTGGGCAAACAGTCT
    GGCATTCTGTTTTACGTGCCGGCAGATTATACTTCAAAAATCGATCCAACAACTGGCTTT
    GTGAACTTCCTGGACCTGAGATATCAGTCTGTAGAAAAAGCTAAACAACTTCTTAGCGAT
    TTTAATGCCATTCGTTTTAACAGCGTTCAGAATTACTTTGAATTCGAAATTGACTATAAA
    AAACTTACTCCGAAACGTAAAGTCGGAACCCAAAGTAAATGGGTAATTTGTACGTATGGC
    GATGTCAGGTATCAGAACCGTCGGAATCAAAAAGGTCATTGGGAGACCGAAGAAGTGAAC
    GTGACCGAAAAGCTGAAGGCTCTGTTCGCCAGCGATTCAAAAACTACAACTGTGATCGAT
    TACGCAAATGATGATAACCTGATAGATGTGATTTTAGAGCAGGATAAAGCCAGCTTTTTT
    AAAGAACTGTTGTGGCTCCTGAAACTTACGATGACCTTACGACATTCCAAGATCAAATCG
    GAAGATGATTTTATTCTGTCACCGGTCAAGAATGAGCAGGGTGAATTCTATGATAGTAGG
    AAAGCCGGCGAAGTGTGGCCGAAAGACGCCGACGCCAATGGCGCCTATCATATCGCGCTC
    AAAGGGCTTTGGAATTTGCAGCAGATTAACCAGTGGGAAAAAGGTAAAACCCTGAATCTG
    GCTATCAAAAACCAGGATTGGTTTAGCTTTATCCAAGAGAAACCGTATCAGGAATGA
    SEQ ID NO: 45
    ATGCATACAGGCGGTCTTCTTAGTATGGACGCGAAAGAGTTCACAGGTCAGTATCCGTTG
    TCGAAAACATTACGATTCGAACTTCGGCCCATCGGCCGCACGTGGGATAACCTGGAGGCC
    TCAGGCTACTTAGCGGAAGACCGCCATCGTGCCGAATGTTATCCTCGTGCGAAAGAGTTA
    TTGGATGACAACCATCGTGCCTTCCTGAATCGTGTGTTGCCACAAATCGATATGGATTGG
    CACCCGATTGCGGAGGCCTTTTGTAAGGTACATAAAAACCCTGGTAATAAAGAACTTGCC
    CAGGATTACAACCTTCAGTTGTCAAAGCGCCGTAAGGAGATCAGCGCATATCTTCAGGAT
    GCAGATGGCTATAAAGGCCTGTTCGCGAAGCCCGCCTTAGACGAAGCTATGAAAATTGCG
    AAAGAAAACGGGAACGAAAGTGATATTGAGGTTCTCGAAGCGTTTAACGGTTTTAGCGTA
    TACTTCACCGGTTATCATGAGTCACGCGAGAACATTTATAGCGATGAGGATATGGTGAGC
    GTAGCCTACCGAATTACTGAGGATAATTTCCCGCGCTTTGTCTCAAACGCTTTGATCTTT
    GATAAATTAAACGAAAGCCATCCGGATATTATCTCTGAAGTATCGGGCAATCTTGGAGTT
    GATGACATTGGTAAGTACTTTGACGTGTCGAACTATAACAATTTTCTTTCCCAGGCCGGT
    ATAGATGACTACAATCACATTATTGGCGGCCATACAACCGAAGACGGACTGATACAAGCG
    TTTAATGTCGTATTGAACTTACGTCACCAAAAAGACCCTGGCTTTGAAAAAATTCAGTTC
    AAACAGCTCTACAAACAAATCCTGAGCGTGCGTACCAGCAAAAGCTACATCCCGAAACAG
    TTTGACAACTCTAAGGAGATGGTTGACTGCATTTGCGATTATGTCAGCAAAATAGAGAAA
    TCCGAAACAGTAGAACGGGCCCTGAAACTAGTCCGTAATATCAGTTCTTTCGACTTGCGC
    GGGATCTTTGTCAATAAAAAGAACTTGCGCATACTGAGCAACAAACTGATAGGAGATTGG
    GACGCGATCGAAACCGCATTGATGCATAGTTCTTCATCAGAAAACGATAAGAAAAGCGTA
    TATGATAGCGCGGAGGCTTTTACGTTGGATGACATCTTTTCAAGCGTGAAAAAATTTTCT
    GATGCCTCTGCCGAAGATATTGGCAACAGGGCGGAAGACATCTGTAGAGTGATAAGTGAG
    ACGGCCCCTTTTATCAACGATCTGCGAGCGGTGGACCTGGATAGCCTGAACGACGATGGT
    TATGAAGCGGCCGTCTCAAAAATTCGGGAGTCGCTGGAGCCTTATATGGATCTTTTCCAT
    GAACTGGAAATTTTCTCGGTTGGCGATGAGTTCCCAAAATGCGCAGCATTTTACAGCGAA
    CTGGAGGAAGTCAGCGAACAGCTGATCGAAATTATTCCGTTATTCAACAAGGCGCGTTCG
    TTCTGCACCCGGAAACGCTATAGCACCGATAAGATTAAAGTGAACTTAAAATTCCCGACC
    TTGGCGGACGGGTGGGACCTGAACAAAGAGAGAGACAACAAAGCCGCGATTCTGCGGAAA
    GACGGTAAGTATTATCTGGCAATTCTGGATATGAAGAAAGATCTGTCAAGCATTAGGACC
    AGCGACGAAGATGAATCCAGCTTCGAAAAGATGGAGTATAAACTGTTACCGAGTCCAGTA
    AAAATGCTGCCAAAGATATTCGTAAAATCGAAAGCCGCTAAGGAAAAATATGGCCTGACA
    GATCGTATGCTTGAATGCTACGATAAAGGTATGCATAAGTCGGGTAGTGCGTTTGATCTT
    GGCTTTTGCCATGAACTCATTGATTATTACAAGCGTTGTATCGCGGAGTACCCAGGCTGG
    GATGTGTTCGATTTCAAGTTTCGCGAAACTTCCGATTATGGGTCCATGAAAGAGTTCAAT
    GAAGATGTGGCCGGAGCCGGTTACTATATGAGTCTGAGAAAAATTCCGTGCAGCGAAGTG
    TACCGTCTGTTAGACGAGAAATCGATTTATCTATTTCAAATTTATAACAAAGATTACTCT
    GAAAATGCACATGGTAATAAGAACATGCATACCATGTACTGGGAGGGTCTCTTTTCCCCG
    CAAAACCTGGAGTCGCCCGTTTTCAAGTTGTCGGGTGGGGCAGAACTTTTCTTTCGAAAA
    TCCTCAATCCCTAACGATGCCAAAACAGTACACCCGAAAGGCTCAGTGCTGGTTCCACGT
    AATGATGTTAACGGTCGGCGTATTCCAGATTCAATCTACCGCGAACTGACACGCTATTTT
    AACCGTGGCGATTGCCGAATCAGTGACGAAGCCAAAAGTTATCTTGACAAGGTTAAGACT
    AAAAAAGCGGACCATGACATTGTGAAAGATCGCCGCTTTACCGTGGATAAAATGATGTTC
    CACGTCCCGATTGCGATGAACTTTAAGGCGATCAGTAAACCGAACTTAAACAAAAAAGTC
    ATTGATGGCATCATTGATGATCAGGATCTGAAAATCATTGGTATTGATCGTGGCGAGCGG
    AACTTAATTTACGTCACGATGGTTGACAGAAAAGGGAATATCTTATATCAGGATTCTCTT
    AACATCCTCAATGGCTACGACTATCGTAAAGCTCTGGATGTGCGCGAATATGACAACAAG
    GAAGCGCGTCGTAACTGGACTAAAGTGGAGGGCATTCGCAAAATGAAGGAAGGCTATCTG
    TCATTAGCGGTCTCGAAATTAGCGGATATGATTATCGAAAATAACGCCATCATCGTTATG
    GAGGACCTGAACCACGGATTCAAAGCGGGCCGCTCAAAGATTGAAAAACAAGTTTATCAG
    AAATTTGAGAGTATGCTGATTAACAAACTGGGCTATATGGTGTTAAAAGACAAGTCAATT
    GACCAATCAGGTGGCGCGCTGCATGGATACCAGCTGGCGAACCATGTTACCACCTTAGCA
    TCAGTTGGAAAGCAGTGTGGGGTTATCTTTTATATACCGGCAGCGTTCACTAGTAAAATA
    GATCCGACCACTGGTTTCGCCGATCTCTTTGCCCTGAGTAACGTTAAAAACGTAGCGAGC
    ATGCGTGAATTCTTTTCCAAAATGAAATCTGTCATTTATGATAAAGCTGAAGGCAAATTC
    GCATTCACCTTTGATTACTTGGATTACAACGTGAAGAGCGAATGTGGTCGTACGCTGTGG
    ACCGTTTACACCGTTGGTGAGCGCTTCACCTATTCCCGTGTGAACCGCGAATATGTACGT
    AAAGTCCCCACCGATATTATCTATGATGCCCTCCAGAAAGCAGGCATTAGCGTCGAAGGA
    GACTTAAGGGACAGAATTGCCGAAAGCGATGGCGATACGCTGAAGTCTATTTTTTACGCA
    TTCAAATACGCGCTAGATATGCGCGTTGAGAATCGCGAGGAAGACTACATTCAATCACCT
    GTGAAAAATGCCTCTGGGGAATTTTTTTGTTCAAAAAATGCTGGTAAAAGCCTCCCACAA
    GATAGCGATGCAAACGGTGCATATAACATTGCCCTGAAAGGTATTCTTCAATTACGCATG
    CTGTCTGAGCAGTACGACCCCAACGCGGAATCTATTAGACTTCCGCTGATAACCAATAAA
    GCCTGGCTGACATTCATGCAGTCTGGCATGAAGACCTGGAAAAATTAG
    SEQ ID NO: 46
    ATGGATAGTTTAAAAGATTTTACGAATCTATATCCCGTAAGCAAAACTCTTCGTTTTGAA
    CTGAAACCTGTTGGAAAAACGTTGGAGAATATCGAGAAAGCGGGCATCCTGAAAGAAGAC
    GAGCACCGTGCCGAAAGCTACAGGCGTGTCAAAAAGATTATCGATACTTATCACAAAGTG
    TTCATTGATAGCAGTCTGGAGAACATGGCAAAAATGGGCATAGAAAATGAAATCAAAGCA
    ATGCTGCAGAGCTTTTGCGAGCTCTACAAGAAAGATCACCGAACGGAAGGTGAAGATAAA
    GCACTGGACAAAATTCGCGCCGTTCTTCGCGGTCTGATTGTTGGCGCGTTCACCGGCGTG
    TGCGGCCGCCGTGAAAACACCGTGCAGAACGAAAAGTACGAGTCGCTGTTCAAAGAAAAA
    CTGATAAAAGAAATTTTGCCTGACTTTGTGCTTTCGACCGAAGCGGAATCCCTGCCATTT
    TCTGTCGAAGAAGCGACCCGCAGCCTGAAAGAATTTGACTCATTCACAAGTTACTTTGCA
    GGCTTCTACGAAAACCGTAAAAACATCTACAGCACGAAGCCACAGAGCACGGCTATTGCT
    TATCGCCTGATTCATGAGAACCTGCCGAAGTTCATCGATAACATCCTTGTTTTTCAAAAA
    ATTAAAGAGCCGATTGCGAAAGAGTTAGAACATATTCGAGCTGACTTTTCTGCGGGTGGG
    TACATTAAAAAAGATGAGCGGCTGGAAGACATCTTCAGTCTAAACTATTATATCCACGTT
    CTGTCGCAGGCAGGCATTGAGAAATATAATGCGCTGATTGGTAAGATTGTCACAGAAGGC
    GATGGTGAGATGAAAGGTCTTAATGAACATATCAATCTGTATAACCAGCAGCGTGGTCGC
    GAAGACCGTCTTCCACTGTTCCGCCCACTGTATAAACAGATCCTGTCTGACCGGGAACAG
    CTGTCCTACCTGCCGGAAAGCTTTGAAAAGGATGAAGAGCTACTTCGCGCATTAAAGGAG
    TTTTACGACCATATTGCGGAAGACATTTTGGGTAGAACGCAGCAACTGATGACGTCAATT
    TCTGAATACGATCTGAGTAGAATCTACGTTAGGAATGATAGCCAGCTGACCGATATTAGC
    AAAAAAATGCTGGGCGACTGGAACGCTATCTATATGGCACGTGAACGTGCATATGATCAT
    GAACAAGCACCGAAACGTATAACCGCGAAATATGAGCGTGATCGCATTAAGGCGCTAAAG
    GGAGAAGAAAGCATCTCACTCGCAAACCTGAACTCCTGTATCGCTTTCTTAGATAACGTG
    CGCGATTGTCGCGTCGACACGTATCTGTCAACCCTTGGGCAGAAAGAGGGTCCACATGGT
    CTGTCTAACCTGGTGGAAAATGTCTTTGCGAGTTACCATGAAGCGGAACAACTGCTGTCT
    TTTCCATACCCCGAAGAAAACAATCTAATACAGGATAAAGATAACGTGGTGTTAATCAAA
    AACCTGCTGGACAACATCAGCGATCTGCAACGTTTCCTGAAACCTTTGTGGGGTATGGGT
    GACGAGCCAGACAAAGACGAACGTTTTTATGGTGAGTATAATTATATACGTGGCGCCCTT
    GACCAAGTTATTCCGCTGTATAACAAAGTACGGAACTATCTGACCCGTAAGCCATATTCT
    ACCCGTAAAGTGAAACTGAACTTTGGCAACTCGCAACTGCTGTCGGGTTGGGATCGTAAC
    AAAGAAAAAGATAATAGTTGTGTTATCCTGCGTAAGGGACAAAATTTTTACCTCGCGATT
    ATGAACAACAGACACAAGCGTTCATTTGAAAATAAGGTTCTGCCGGAGTATAAAGAGGGC
    GAACCGTACTTCGAGAAAATGGATTATAAGTTCTTACCAGACCCTAATAAGATGTTACCG
    AAAGTCTTTCTTTCGAAAAAAGGCATAGAAATCTATAAGCCGTCCCCGAAATTACTCGAA
    CAGTATGGGCACGGGACCCACAAGAAAGGGGATACTTTTAGCATGGACGATCTGCACGAA
    CTGATCGATTTTTTTAAACACTCCATCGAAGCCCATGAAGACTGGAAACAGTTTGGGTTC
    AAGTTCTCTGATACAGCCACATACGAGAATGTGTCTAGTTTTTATCGGGAAGTGGAGGAT
    CAGGGCTACAAACTTAGTTTTCGTAAAGTTTCAGAGAGTTATGTTTATAGTTTAATTGAT
    CAGGGAAAACTTTACCTGTTCCAGATCTACAACAAAGATTTCTCGCCATGTAGTAAGGGT
    ACCCCGAATCTGCATACACTCTATTGGAGAATGTTATTCGATGAGCGTAACTTAGCGGAT
    GTCATTTATAAATTGGACGGGAAAGCAGAGATCTTTTTTCGTGAAAAATCACTGAAGAAT
    GACCACCCGACTCATCCGGCCGGGAAACCGATCAAAAAAAAATCCCGCCAGAAAAAAGGA
    GAAGAGTCTCTGTTTGAATATGATCTGGTGAAAGACCGTCATTACACTATGGATAAATTT
    CAATTTCATGTTCCAATTACAATGAACTTCAAATGTTCGGCGGGTTCCAAAGTAAATGAT
    ATGGTAAACGCCCATATTCGCGAAGCGAAAGATATGCATGTTATTGGCATCGATAGAGGC
    GAAAGAAACCTGCTTTATATTTGCGTAATTGACAGCCGTGGTACCATTCTGGACCAGATC
    TCTTTAAACACCATCAATGACATCGATTATCACGACCTGTTGGAGTCTCGGGACAAGGAC
    CGCCAGCAGGAGCGCCGTAATTGGCAGACAATTGAAGGCATAAAAGAATTAAAACAGGGT
    TACCTTTCCCAGGCCGTACACCGCATAGCGGAACTGATGGTGGCCTACAAAGCCGTAGTT
    GCCCTGGAAGACTTGAATATGGGGTTTAAACGTGGCCGTCAAAAAGTCGAGAGCAGCGTG
    TATCAGCAATTTGAAAAACAGTTGATTGACAAGTTGAATTATTTGGTTGATAAAAAGAAA
    CGTCCAGAAGATATTGGTGGCTTACTGCGTGCATACCAGTTTACGGCACCTTTTAAGTCC
    TTCAAAGAAATGGGTAAACAGAACGGGTTTCTGTTTTACATCCCGGCCTGGAATACATCC
    AACATCGATCCTACCACCGGGTTTGTCAACCTGTTTCATGCACAATATGAAAACGTGGAT
    AAAGCGAAGAGTTTTTTCCAAAAATTCGATAGTATTTCGTATAACCCAAAAAAAGATTGG
    TTTGAGTTTGCGTTCGATTATAAAAATTTTACTAAAAAGGCTGAGGGATCCCGCAGTATG
    TGGATCCTCTGCACCCATGGCAGTCGTATTAAAAATTTTCGTAATTCGCAAAAGAATGGC
    CAGTGGGACTCGGAAGAGTTTGCCCTGACCGAAGCGTTCAAATCGCTGTTTGTACGCTAC
    GAAATTGACTACACAGCAGATCTGAAAACAGCCATCGTCGATGAAAAACAGAAAGATTTT
    TTTGTAGATCTCCTAAAACTGTTCAAACTGACTGTTCAGATGCGCAATTCCTGGAAAGAG
    AAAGACCTGGATTATCTGATTAGCCCGGTAGCCGGTGCTGATGGACGATTTTTCGATACT
    CGTGAAGGTAACAAAAGTCTCCCGAAAGATGCTGATGCCAATGGTGCATACAATATTGCA
    TTAAAGGGGCTATGGGCCTTGCGACAGATCCGCCAGACCAGCGAAGGCGGCAAGCTGAAA
    TTGGCCATATCGAATAAGGAATGGTTACAATTTGTTCAGGAACGTAGCTATGAAAAAGAT
    TGA
    SEQ ID NO: 47
    ATGAACAACGGCACAAATAATTTTCAGAACTTCATCGGGATCTCAAGTTTGCAGAAAACG
    CTGCGCAATGCTCTGATCCCCACGGAAACCACGCAACAGTTCATCGTCAAGAACGGAATA
    ATTAAAGAAGATGAGTTACGTGGCGAGAACCGCCAGATTCTGAAAGATATCATGGATGAC
    TACTACCGCGGATTCATCTCTGAGACTCTGAGTTCTATTGATGACATAGATTGGACTAGC
    CTGTTCGAAAAAATGGAAATTCAGCTGAAAAATGGTGATAATAAAGATACCTTAATTAAG
    GAACAGACAGAGTATCGGAAAGCAATCCATAAAAAATTTGCGAACGACGATCGGTTTAAG
    AACATGTTTAGCGCCAAACTGATTAGTGACATATTACCTGAATTTGTCATCCACAACAAT
    AATTATTCGGCATCAGAGAAAGAGGAAAAAACCCAGGTGATAAAATTGTTTTCGCGCTTT
    GCGACTAGCTTTAAAGATTACTTCAAGAACCGTGCAAATTGCTTTTCAGCGGACGATATT
    TCATCAAGCAGCTGCCATCGCATCGTCAACGACAATGCAGAGATATTCTTTTCAAATGCG
    CTGGTCTACCGCCGGATCGTAAAATCGCTGAGCAATGACGATATCAACAAAATTTCGGGC
    GATATGAAAGATTCATTAAAAGAAATGAGTCTGGAAGAAATATATTCTTACGAGAAGTAT
    GGGGAATTTATTACCCAGGAAGGCATTAGCTTCTATAATGATATCTGTGGGAAAGTGAAT
    TCTTTTATGAACCTGTATTGTCAGAAAAATAAAGAAAACAAAAATTTATACAAACTTCAG
    AAACTTCACAAACAGATTCTATGCATTGCGGACACTAGCTATGAGGTCCCGTATAAATTT
    GAAAGTGACGAGGAAGTGTACCAATCAGTTAACGGCTTCCTTGATAACATTAGCAGCAAA
    CATATAGTCGAAAGATTACGCAAAATCGGCGATAACTATAACGGCTACAACCTGGATAAA
    ATTTATATCGTGTCCAAATTTTACGAGAGCGTTAGCCAAAAAACCTACCGCGACTGGGAA
    ACAATTAATACCGCCCTCGAAATTCATTACAATAATATCTTGCCGGGTAACGGTAAAAGT
    AAAGCCGACAAAGTAAAAAAAGCGGTTAAGAATGATTTACAGAAATCCATCACCGAAATA
    AATGAACTAGTGTCAAACTATAAGCTGTGCAGTGACGACAACATCAAAGCGGAGACTTAT
    ATACATGAGATTAGCCATATCTTGAATAACTTTGAAGCACAGGAATTGAAATACAATCCG
    GAAATTCACCTAGTTGAATCCGAGCTCAAAGCGAGTGAGCTTAAAAACGTGCTGGACGTG
    ATCATGAATGCGTTTCATTGGTGTTCGGTTTTTATGACTGAGGAACTTGTTGATAAAGAC
    AACAATTTTTATGCGGAACTGGAGGAGATTTACGATGAAATTTATCCAGTAATTAGTCTG
    TACAACCTGGTTCGTAACTACGTTACCCAGAAACCGTACAGCACGAAAAAGATTAAATTG
    AACTTTGGAATACCGACGTTAGCAGACGGTTGGTCAAAGTCCAAAGAGTATTCTAATAAC
    GCTATCATACTGATGCGCGACAATCTGTATTATCTGGGCATCTTTAATGCGAAGAATAAA
    CCGGACAAGAAGATTATCGAGGGTAATACGTCAGAAAATAAGGGTGACTACAAAAAGATG
    ATTTATAATTTGCTCCCGGGTCCCAACAAAATGATCCCGAAAGTTTTCTTGAGCAGCAAG
    ACGGGGGTGGAAACGTATAAACCGAGCGCCTATATCCTAGAGGGGTATAAACAGAATAAA
    CATATCAAGTCTTCAAAAGACTTTGATATCACTTTCTGTCATGATCTGATCGACTACTTC
    AAAAACTGTATTGCAATTCATCCCGAGTGGAAAAACTTCGGTTTTGATTTTAGCGACACC
    AGTACTTATGAAGACATTTCCGGGTTTTATCGTGAGGTAGAGTTACAAGGTTACAAGATT
    GATTGGACATACATTAGCGAAAAAGACATTGATCTGCTGCAGGAAAAAGGTCAACTGTAT
    CTGTTCCAGATATATAACAAAGATTTTTCGAAAAAATCAACCGGGAATGACAACCTTCAC
    ACCATGTACCTGAAAAATCTTTTCTCAGAAGAAAATCTTAAGGATATCGTCCTGAAACTT
    AACGGCGAAGCGGAAATCTTCTTCAGGAAGAGCAGCATAAAGAACCCAATCATTCATAAA
    AAAGGCTCGATTTTAGTCAACCGTACCTACGAAGCAGAAGAAAAAGACCAGTTTGGCAAC
    ATTCAAATTGTGCGTAAAAATATTCCGGAAAACATTTATCAGGAGCTGTACAAATACTTC
    AACGATAAAAGCGACAAAGAGCTGTCTGATGAAGCAGCCAAACTGAAGAATGTAGTGGGA
    CACCACGAGGCAGCGACGAATATAGTCAAGGACTATCGCTACACGTATGATAAATACTTC
    CTTCATATGCCTATTACGATCAATTTCAAAGCCAATAAAACGGGTTTTATTAATGATAGG
    ATCTTACAGTATATCGCTAAAGAAAAAGACTTACATGTGATCGGCATTGATCGGGGCGAG
    CGTAACCTGATCTACGTGTCCGTGATTGATACTTGTGGTAATATAGTTGAACAGAAAAGC
    TTTAACATTGTAAACGGCTACGACTATCAGATAAAACTGAAACAACAGGAGGGCGCTAGA
    CAGATTGCGCGGAAAGAATGGAAAGAAATTGGTAAAATTAAAGAGATCAAAGAGGGCTAC
    CTGAGCTTAGTAATCCACGAGATCTCTAAAATGGTAATCAAATACAATGCAATTATAGCG
    ATGGAGGATTTGTCTTATGGTTTTAAAAAAGGGCGCTTTAAGGTCGAACGGCAAGTTTAC
    CAGAAATTTGAAACCATGCTCATCAATAAACTCAACTATCTGGTATTTAAAGATATTTCG
    ATTACCGAGAATGGCGGTCTCCTGAAAGGTTATCAGCTGACATACATTCCTGATAAACTT
    AAAAACGTGGGTCATCAGTGCGGCTGCATTTTTTATGTGCCTGCTGCATACACGAGCAAA
    ATTGATCCGACCACCGGCTTTGTGAATATCTTTAAATTTAAAGACCTGACAGTGGACGCA
    AAACGTGAATTCATTAAAAAATTTGACTCAATTCGTTATGACAGTGAAAAAAATCTGTTC
    TGCTTTACATTTGACTACAATAACTTTATTACGCAAAACACGGTCATGAGCAAATCATCG
    TGGAGTGTGTATACATACGGCGTGCGCATCAAACGTCGCTTTGTGAACGGCCGCTTCTCA
    AACGAAAGTGATACCATTGACATAACCAAAGATATGGAGAAAACGTTGGAAATGACGGAC
    ATTAACTGGCGCGATGGCCACGATCTTCGTCAAGACATTATAGATTATGAAATTGTTCAG
    CACATATTCGAAATTTTCCGTTTAACAGTGCAAATGCGTAACTCCTTGTCTGAACTGGAG
    GACCGTGATTACGATCGTCTCATTTCACCTGTACTGAACGAAAATAACATTTTTTATGAC
    AGCGCGAAAGCGGGGGATGCACTTCCTAAGGATGCCGATGCAAATGGTGCGTATTGTATT
    GCATTAAAAGGGTTATATGAAATTAAACAAATTACCGAAAATTGGAAAGAAGATGGTAAA
    TTTTCGCGCGATAAACTCAAAATCAGCAATAAAGATTGGTTCGACTTTATCCAGAATAAG
    CGCTATCTCTAA
    SEQ ID NO: 48
    ATGACCAATAAATTCACTAACCAGTATTCTCTCTCTAAGACCCTGCGCTTTGAACTGATT
    CCGCAGGGGAAAACCTTGGAGTTCATTCAAGAAAAAGGCCTCTTGTCTCAGGATAAACAG
    AGGGCTGAATCTTACCAAGAAATGAAGAAAACTATTGATAAGTTTCATAAATATTTCATT
    GATTTAGCCTTGTCTAACGCCAAATTAACTCACTTGGAAACGTATCTGGAGTTATACAAC
    AAATCTGCCGAAACTAAGAAAGAACAGAAATTTAAAGACGATTTGAAAAAAGTACAGGAC
    AATCTGCGTAAAGAAATTGTCAAATCCTTCAGTGACGGCGATGCTAAAAGCATTTTTGCC
    ATTCTGGACAAAAAAGAGTTGATTACTGTGGAATTAGAAAAGTGGTTTGAAAACAATGAG
    CAGAAAGACATCTACTTCGATGAGAAATTCAAAACTTTCACCACCTATTTTACAGGATTT
    CATCAAAACCGGAAGAACATGTACTCAGTAGAACCGAACTCCACGGCCATTGCGTATCGT
    TTGATCCATGAGAATCTGCCTAAATTTCTGGAGAATGCGAAAGCCTTTGAAAAGATTAAG
    CAGGTCGAATCGCTGCAAGTGAATTTTCGTGAACTCATGGGCGAATTTGGTGACGAAGGT
    CTAATCTTCGTTAACGAACTGGAAGAAATGTTTCAGATTAATTACTACAATGACGTGCTA
    TCGCAGAACGGTATCACAATCTACAATAGTATTATCTCAGGGTTCACAAAAAACGATATA
    AAATACAAAGGCCTGAACGAGTATATCAATAACTACAACCAAACAAAGGACAAAAAGGAT
    AGGCTTCCGAAACTGAAGCAGTTATACAAACAGATTTTATCTGACAGAATCTCCCTGAGC
    TTTCTGCCGGATGCTTTCACTGATGGGAAGCAGGTTCTGAAAGCGATTTTCGATTTTTAT
    AAGATTAACTTACTGAGCTACACGATTGAAGGTCAAGAAGAATCTCAAAACTTACTGCTC
    TTGATCCGTCAAACCATTGAAAATCTATCATCGTTCGATACGCAGAAAATCTACCTCAAA
    AACGATACTCACCTGACTACGATCTCTCAGCAGGTTTTCGGGGATTTTAGTGTATTTTCA
    ACAGCTCTGAACTACTGGTATGAAACCAAAGTCAATCCGAAATTCGAGACGGAATATTCT
    AAGGCCAACGAAAAAAAACGTGAGATTCTTGATAAAGCTAAAGCCGTATTTACTAAACAG
    GATTACTTTTCTATTGCTTTCCTGCAGGAAGTTTTATCGGAGTATATCCTGACCCTGGAT
    CATACATCTGATATCGTTAAAAAACACAGCAGCAATTGCATCGCTGACTATTTCAAAAAC
    CACTTTGTCGCCAAAAAAGAAAACGAAACAGACAAGACTTTCGATTTCATTGCTAACATC
    ACCGCAAAATACCAGTGTATTCAGGGTATCTTGGAAAACGCCGACCAATACGAAGACGAA
    CTGAAACAAGATCAGAAGCTGATCGATAATTTAAAATTCTTCTTAGATGCAATCCTGGAG
    CTGCTGCACTTCATCAAACCGCTTCATTTAAAGAGCGAGTCCATTACCGAAAAGGACACC
    GCCTTCTATGACGTTTTTGAAAATTATTATGAAGCCCTCTCCTTGCTGACTCCGCTGTAT
    AATATGGTACGCAATTACGTAACCCAGAAACCATATTCTACCGAAAAAATTAAACTGAAC
    TTTGAAAACGCACAGCTGCTCAACGGTTGGGACGCGAATAAAGAAGGTGACTACCTCACC
    ACCATCCTGAAAAAAGATGGTAACTATTTTCTGGCAATTATGGATAAGAAACATAATAAA
    GCATTCCAGAAATTTCCTGAAGGGAAAGAAAATTACGAAAAGATGGTGTACAAACTCTTA
    CCTGGAGTTAACAAAATGTTGCCGAAAGTATTTTTTAGTAATAAGAACATCGCGTACTTT
    AACCCGTCCAAAGAACTGCTGGAAAATTATAAAAAGGAGACGCATAAGAAAGGGGATACC
    TTTAACCTGGAACATTGCCATACCTTAATAGACTTCTTCAAGGATTCCCTGAATAAACAC
    GAGGATTGGAAATATTTCGATTTTCAGTTTAGTGAGACCAAGTCATACCAGGATCTTAGC
    GGCTTTTATCGCGAAGTAGAACACCAAGGCTATAAAATTAACTTCAAAAACATCGACAGC
    GAATACATCGACGGTTTAGTTAACGAGGGCAAACTGTTTCTGTTCCAGATCTATTCAAAG
    GATTTTAGCCCGTTCTCTAAAGGCAAACCAAATATGCATACGTTGTACTGGAAAGCACTG
    TTTGAAGAGCAAAACCTGCAGAATGTGATTTATAAACTGAACGGCCAAGCTGAGATTTTT
    TTCCGTAAAGCCTCGATTAAACCGAAAAATATCATCCTTCATAAGAAGAAAATAAAGATC
    GCTAAAAAACACTTCATAGATAAAAAAACCAAAACCTCCGAAATAGTGCCTGTTCAAACA
    ATTAAGAACTTGAATATGTACTACCAGGGCAAGATATCGGAAAAGGAGTTGACTCAAGAC
    GATCTTCGCTATATCGATAACTTTTCGATTTTTAACGAAAAAAACAAGACGATCGACATC
    ATCAAAGATAAACGCTTCACTGTAGATAAGTTCCAGTTTCATGTGCCGATTACTATGAAC
    TTCAAAGCTACCGGGGGTAGCTATATCAACCAAACGGTGTTGGAATACCTGCAGAATAAC
    CCGGAAGTCAAAATCATTGGGCTGGACCGCGGAGAACGTCACCTTGTGTACTTGACCTTA
    ATCGATCAGCAAGGCAACATCTTAAAACAAGAATCGCTGAATACCATTACGGATTCAAAG
    ATTAGCACCCCGTATCATAAGCTGCTCGATAACAAGGAGAATGAGCGCGACCTGGCCCGT
    AAAAACTGGGGCACGGTGGAAAACATTAAGGAGTTAAAGGAGGGTTATATTTCCCAGGTA
    GTGCATAAGATCGCCACTCTCATGCTCGAGGAAAATGCGATCGTTGTCATGGAAGACTTA
    AACTTCGGATTTAAACGTGGGCGATTTAAAGTAGAGAAACAAATCTACCAGAAGTTAGAA
    AAAATGCTGATTGACAAATTAAATTACTTGGTCCTAAAAGACAAACAGCCGCAAGAATTG
    GGTGGATTATACAACGCCCTCCAACTTACCAATAAATTCGAAAGTTTTCAGAAAATGGGT
    AAACAGTCAGGCTTTCTTTTTTATGTTCCTGCGTGGAACACATCCAAAATCGACCCTACA
    ACCGGCTTCGTCAATTACTTCTATACTAAATATGAAAACGTCGACAAAGCAAAAGCATTC
    TTTGAAAAGTTCGAAGCAATACGTTTTAACGCTGAGAAAAAATATTTCGAGTTCGAAGTC
    AAGAAATACTCAGACTTTAACCCCAAAGCTGAGGGCACACAGCAAGCGTGGACAATCTGC
    ACCTACGGCGAGCGCATCGAAACGAAGCGTCAAAAAGATCAGAATAACAAATTTGTTTCA
    ACACCTATCAACCTGACCGAGAAGATTGAAGACTTCTTAGGTAAAAATCAGATTGTTTAT
    GGCGACGGTAACTGTATAAAATCTCAAATAGCCTCAAAGGATGATAAAGCATTTTTCGAA
    ACATTATTATATTGGTTCAAAATGACACTGCAGATGCGCAATAGTGAGACGCGTACAGAT
    ATTGATTATCTTATCAGCCCGGTCATGAACGACAACGGTACTTTTTACAACTCCAGAGAC
    TATGAAAAACTTGAGAATCCAACTCTCCCCAAAGATGCTGATGCGAACGGTGCTTATCAC
    ATCGCGAAAAAAGGTCTGATGCTGCTGAACAAAATCGACCAAGCCGATCTGACTAAGAAA
    GTTGACCTAAGCATTTCAAATCGGGACTGGTTACAGTTTGTTCAAAAGAACAAATGA
    SEQ ID NO: 49
    ATGGAACAGGAATATTATCTGGGCTTGGACATGGGCACCGGTTCCGTCGGCTGGGCTGTT
    ACTGACAGTGAATATCACGTTCTAAGAAAGCATGGTAAGGCATTGTGGGGTGTAAGACTT
    TTCGAATCTGCTTCCACTGCTGAAGAGCGTAGAATGTTTAGAACGAGTCGACGTAGGCTA
    GACAGGCGCAATTGGAGAATCGAAATTTTACAAGAAATTTTTGCGGAAGAGATATCTAAG
    AAAGACCCAGGCTTTTTCCTGAGAATGAAGGAATCTAAGTATTACCCTGAGGATAAAAGA
    GATATAAATGGTAACTGTCCCGAATTGCCTTACGCATTATTTGTGGACGATGATTTTACC
    GATAAGGATTACCATAAAAAGTTCCCAACTATCTACCATTTACGCAAAATGTTAATGAAT
    ACAGAGGAAACCCCAGACATAAGACTAGTTTATCTGGCAATACACCATATGATGAAACAT
    AGAGGCCATTTCTTACTTTCCGGGGATATCAACGAAATCAAAGAGTTTGGTACCACATTT
    AGTAAGTTACTGGAAAACATAAAGAATGAAGAATTGGATTGGAACTTAGAACTCGGAAAA
    GAAGAATACGCGGTTGTCGAATCTATCCTGAAGGATAATATGCTGAATAGGTCGACCAAA
    AAAACTAGGCTGATCAAAGCACTGAAAGCCAAATCTATCTGCGAAAAAGCTGTTTTAAAT
    TTACTTGCTGGTGGCACTGTTAAGTTATCAGACATTTTTGGTTTGGAAGAATTGAACGAA
    ACCGAGCGTCCAAAAATTAGTTTCGCTGATAATGGCTACGATGATTACATTGGTGAGGTG
    GAAAACGAGTTGGGCGAACAATTTTATATTATAGAGACAGCTAAGGCAGTCTATGACTGG
    GCTGTTTTAGTAGAAATCCTTGGTAAATACACATCTATCTCCGAAGCGAAAGTTGCTACT
    TACGAAAAGCACAAGTCCGATCTCCAGTTTTTGAAGAAAATTGTCAGGAAATATCTGACT
    AAGGAAGAATATAAAGATATTTTCGTTAGTACCTCTGACAAACTGAAAAATTACTCCGCT
    TACATCGGGATGACCAAGATTAATGGCAAAAAAGTTGATCTGCAAAGCAAAAGGTGTTCG
    AAGGAAGAATTTTATGATTTCATTAAAAAGAATGTCTTAAAAAAATTAGAAGGTCAGCCA
    GAATACGAATATTTGAAAGAAGAACTGGAAAGAGAGACATTCTTACCAAAACAAGTCAAC
    AGAGATAATGGGGTAATTCCATATCAAATTCACCTCTACGAATTAAAAAAAATTTTAGGC
    AATTTACGCGATAAAATTGACCTTATCAAAGAAAATGAGGATAAGCTGGTTCAACTCTTT
    GAATTCAGAATACCCTATTATGTGGGCCCACTGAACAAGATTGATGACGGCAAAGAAGGT
    AAATTCACATGGGCCGTCCGCAAATCCAATGAAAAAATTTACCCATGGAACTTTGAAAAT
    GTAGTAGATATTGAAGCGTCTGCGGAGAAATTTATTCGAAGAATGACTAATAAATGCACT
    TACTTGATGGGAGAGGATGTTCTGCCTAAAGACAGCTTATTATACAGCAAGTACATGGTT
    CTAAACGAACTTAACAACGTTAAGTTGGACGGTGAGAAATTAAGTGTAGAATTGAAACAA
    AGATTGTATACTGACGTCTTCTGCAAGTACAGAAAAGTGACAGTTAAAAAAATTAAGAAT
    TACTTGAAGTGCGAAGGTATAATTTCTGGAAACGTAGAGATTACTGGTATTGATGGTGAT
    TTCAAAGCATCCCTAACAGCTTACCACGATTTCAAGGAAATCCTGACAGGAACTGAACTC
    GCAAAAAAAGATAAAGAAAACATTATTACTAATATTGTTCTTTTCGGTGATGACAAGAAA
    TTGTTGAAGAAAAGACTGAATAGACTTTACCCCCAGATTACTCCCAATCAACTTAAGAAA
    ATTTGTGCTTTGTCTTACACAGGATGGGGTCGTTTTTCAAAAAAGTTCTTAGAAGAGATT
    ACCGCACCTGATCCAGAAACAGGCGAAGTATGGAATATAATTACCGCCTTATGGGAATCG
    AACAATAATCTTATGCAACTTCTGAGCAATGAATATCGTTTCATGGAAGAAGTTGAGACT
    TACAACATGGGCAAACAGACGAAGACTTTATCCTATGAAACTGTGGAAAATATGTATGTA
    TCACCTTCTGTCAAGAGACAAATTTGGCAAACCTTAAAAATTGTCAAAGAATTAGAAAAG
    GTAATGAAGGAGTCTCCTAAACGTGTGTTTATTGAAATGGCTAGAGAAAAACAAGAGTCA
    AAAAGAACCGAGTCAAGAAAGAAGCAGTTAATCGATTTATATAAGGCTTGTAAAAACGAA
    GAGAAAGATTGGGTTAAAGAATTGGGGGACCAAGAGGAACAAAAACTACGGTCGGATAAG
    TTGTATTTATACTATACGCAAAAGGGACGATGTATGTATTCCGGCGAGGTAATAGAATTG
    AAGGATTTATGGGACAATACAAAATATGACATAGACCATATATATCCCCAATCAAAAACG
    ATGGACGATAGCTTGAACAATAGAGTACTCGTGAAAAAAAAATATAATGCGACCAAATCT
    GATAAGTATCCTCTGAATGAAAATATCAGACATGAAAGAAAGGGGTTCTGGAAGTCCTTG
    TTAGATGGTGGGTTTATAAGCAAAGAAAAGTACGAGCGTCTAATAAGAAACACGGAGTTA
    TCGCCAGAAGAACTCGCTGGTTTTATTGAGAGGCAAATCGTGGAAACGAGACAATCTACC
    AAAGCCGTTGCTGAGATCCTAAAGCAAGTTTTCCCAGAGTCGGAGATTGTCTATGTCAAA
    GCTGGCACAGTGAGCAGGTTTAGGAAAGACTTCGAACTATTAAAGGTAAGAGAAGTGAAC
    GATTTACATCACGCAAAGGACGCTTACCTAAATATCGTTGTAGGTAACTCATATTATGTT
    AAATTTACCAAGAACGCCTCTTGGTTTATAAAGGAGAACCCAGGTAGAACATATAACCTG
    AAAAAGATGTTCACCTCTGGTTGGAATATTGAGAGAAACGGAGAAGTCGCATGGGAAGTT
    GGTAAGAAAGGGACTATAGTGACAGTAAAGCAAATTATGAACAAAAATAATATCCTCGTT
    ACAAGGCAGGTTCATGAAGCAAAGGGCGGCCTTTTTGACCAACAAATTATGAAGAAAGGG
    AAAGGTCAAATTGCAATAAAAGAAACCGATGAGAGACTAGCGTCAATAGAAAAGTATGGT
    GGCTATAATAAAGCTGCGGGTGCATACTTTATGCTTGTTGAATCAAAAGACAAGAAAGGT
    AAGACTATTAGAACTATAGAATTTATACCCCTGTACCTTAAAAACAAAATTGAATCGGAT
    GAGTCAATCGCGTTAAATTTTCTAGAGAAAGGAAGGGGTTTAAAAGAACCAAAGATCCTG
    TTAAAAAAGATTAAGATTGACACCTTGTTCGATGTAGATGGATTTAAAATGTGGTTATCT
    GGCAGAACAGGCGATAGACTTTTGTTTAAGTGCGCTAATCAATTAATTTTGGATGAGAAA
    ATCATTGTCACAATGAAAAAAATAGTTAAGTTTATTCAGAGAAGACAAGAAAACAGGGAG
    TTGAAATTATCTGATAAAGATGGTATCGACAATGAAGTTTTAATGGAAATCTACAATACA
    TTCGTTGATAAACTTGAAAATACCGTATATCGAATCAGGTTAAGTGAACAAGCCAAAACA
    TTAATTGATAAACAAAAAGAATTTGAAAGGCTATCACTGGAAGACAAATCCTCCACCCTA
    TTTGAAATTTTGCATATATTCCAGTGCCAATCTTCAGCAGCTAATTTAAAAATGATTGGC
    GGACCTGGGAAAGCCGGCATCCTAGTGATGAACAATAATATCTCCAAGTGTAACAAAATA
    TCAATTATTAACCAATCTCCGACAGGTATTTTTGAAAATGAAATAGACTTGCTTAAGATA
    TAA
    SEQ ID NO: 50
    ATGTCTTTCGACTCTTTCACCAACCTGTACTCTCTGTCTAAAACCCTGAAATTCGAAATG
    CGTCCGGTTGGTAACACCCAGAAAATGCTGGACAACGCGGGTGTTTTCGAAAAAGACAAA
    CTGATCCAGAAAAAATACGGTAAAACCAAACCGTACTTCGACCGTCTGCACCGTGAATTC
    ATCGAAGAAGCGCTGACCGGTGTTGAACTGATCGGTCTGGACGAAAACTTCCGTACCCTG
    GTTGACTGGCAGAAAGACAAAAAAAACAACGTTGCGATGAAAGCGTACGAAAACTCTCTG
    CAGCGTCTGCGTACCGAAATCGGTAAAATCTTCAACCTGAAAGCGGAAGACTGGGTTAAA
    AACAAATACCCGATCCTGGGTCTGAAAAACAAAAACACCGACATCCTGTTCGAAGAAGCG
    GTTTTCGGTATCCTGAAAGCGCGTTACGGTGAAGAAAAAGACACCTTCATCGAAGTTGAA
    GAAATCGACAAAACCGGTAAATCTAAAATCAACCAGATCTCTATCTTCGACTCTTGGAAA
    GGTTTCACCGGTTACTTCAAAAAATTCTTCGAAACCCGTAAAAACTTCTACAAAAACGAC
    GGTACCTCTACCGCGATCGCGACCCGTATCATCGACCAGAACCTGAAACGTTTCATCGAC
    AACCTGTCTATCGTTGAATCTGTTCGTCAGAAAGTTGACCTGGCGGAAACCGAAAAATCT
    TTCTCTATCTCTCTGTCTCAGTTCTTCTCTATCGACTTCTACAACAAATGCCTGCTGCAG
    GACGGTATCGACTACTACAACAAAATCATCGGTGGTGAAACCCTGAAAAACGGTGAAAAA
    CTGATCGGTCTGAACGAACTGATCAACCAGTACCGTCAGAACAACAAAGACCAGAAAATC
    CCGTTCTTCAAACTGCTGGACAAACAGATCCTGTCTGAAAAAATCCTGTTCCTGGACGAA
    ATCAAAAACGACACCGAACTGATCGAAGCGCTGTCTCAGTTCGCGAAAACCGCGGAAGAA
    AAAACCAAAATCGTTAAAAAACTGTTCGCGGACTTCGTTGAAAACAACTCTAAATACGAC
    CTGGCGCAGATCTACATCTCTCAGGAAGCGTTCAACACCATCTCTAACAAATGGACCTCT
    GAAACCGAAACCTTCGCGAAATACCTGTTCGAAGCGATGAAATCTGGTAAACTGGCGAAA
    TACGAAAAAAAAGACAACTCTTACAAATTCCCGGACTTCATCGCGCTGTCTCAGATGAAA
    TCTGCGCTGCTGTCTATCTCTCTGGAAGGTCACTTCTGGAAAGAAAAATACTACAAAATC
    TCTAAATTCCAGGAAAAAACCAACTGGGAACAGTTCCTGGCGATCTTCCTGTACGAATTC
    AACTCTCTGTTCTCTGACAAAATCAACACCAAAGACGGTGAAACCAAACAGGTTGGTTAC
    TACCTGTTCGCGAAAGACCTGCACAACCTGATCCTGTCTGAACAGATCGACATCCCGAAA
    GACTCTAAAGTTACCATCAAAGACTTCGCGGACTCTGTTCTGACCATCTACCAGATGGCG
    AAATACTTCGCGGTTGAAAAAAAACGTGCGTGGCTGGCGGAATACGAACTGGACTCTTTC
    TACACCCAGCCGGACACCGGTTACCTGCAGTTCTACGACAACGCGTACGAAGACATCGTT
    CAGGTTTACAACAAACTGCGTAACTACCTGACCAAAAAACCGTACTCTGAAGAAAAATGG
    AAACTGAACTTCGAAAACTCTACCCTGGCGAACGGTTGGGACAAAAACAAAGAATCTGAC
    AACTCTGCGGTTATCCTGCAGAAAGGTGGTAAATACTACCTGGGTCTGATCACCAAAGGT
    CACAACAAAATCTTCGACGACCGTTTCCAGGAAAAATTCATCGTTGGTATCGAAGGTGGT
    AAATACGAAAAAATCGTTTACAAATTCTTCCCGGACCAGGCGAAAATGTTCCCGAAAGTT
    TGCTTCTCTGCGAAAGGTCTGGAATTCTTCCGTCCGTCTGAAGAAATCCTGCGTATCTAC
    AACAACGCGGAATTCAAAAAAGGTGAAACCTACTCTATCGACTCTATGCAGAAACTGATC
    GACTTCTACAAAGACTGCCTGACCAAATACGAAGGTTGGGCGTGCTACACCTTCCGTCAC
    CTGAAACCGACCGAAGAATACCAGAACAACATCGGTGAATTCTTCCGTGACGTTGCGGAA
    GACGGTTACCGTATCGACTTCCAGGGTATCTCTGACCAGTACATCCACGAAAAAAACGAA
    AAAGGTGAACTGCACCTGTTCGAAATCCACAACAAAGACTGGAACCTGGACAAAGCGCGT
    GACGGTAAATCTAAAACCACCCAGAAAAACCTGCACACCCTGTACTTCGAATCTCTGTTC
    TCTAACGACAACGTTGTTCAGAACTTCCCGATCAAACTGAACGGTCAGGCGGAAATCTTC
    TACCGTCCGAAAACCGAAAAAGACAAACTGGAATCTAAAAAAGACAAAAAAGGTAACAAA
    GTTATCGACCACAAACGTTACTCTGAAAACAAAATCTTCTTCCACGTTCCGCTGACCCTG
    AACCGTACCAAAAACGACTCTTACCGTTTCAACGCGCAGATCAACAACTTCCTGGCGAAC
    AACAAAGACATCAACATCATCGGTGTTGACCGTGGTGAAAAACACCTGGTTTACTACTCT
    GTTATCACCCAGGCGTCTGACATCCTGGAATCTGGTTCTCTGAACGAACTGAACGGTGTT
    AACTACGCGGAAAAACTGGGTAAAAAAGCGGAAAACCGTGAACAGGCGCGTCGTGACTGG
    CAGGACGTTCAGGGTATCAAAGACCTGAAAAAAGGTTACATCTCTCAGGTTGTTCGTAAA
    CTGGCGGACCTGGCGATCAAACACAACGCGATCATCATCCTGGAAGACCTGAACATGCGT
    TTCAAACAGGTTCGTGGTGGTATCGAAAAATCTATCTACCAGCAGCTGGAAAAAGCGCTG
    ATCGACAAACTGTCTTTCCTGGTTGACAAAGGTGAAAAAAACCCGGAACAGGCGGGTCAC
    CTGCTGAAAGCGTACCAGCTGTCTGCGCCGTTCGAAACCTTCCAGAAAATGGGTAAACAG
    ACCGGTATCATCTTCTACACCCAGGCGTCTTACACCTCTAAATCTGACCCGGTTACCGGT
    TGGCGTCCGCACCTGTACCTGAAATACTTCTCTGCGAAAAAAGCGAAAGACGACATCGCG
    AAATTCACCAAAATCGAATTCGTTAACGACCGTTTCGAACTGACCTACGACATCAAAGAC
    TTCCAGCAGGCGAAAGAATACCCGAACAAAACCGTTTGGAAAGTTTGCTCTAACGTTGAA
    CGTTTCCGTTGGGACAAAAACCTGAACCAGAACAAAGGTGGTTACACCCACTACACCAAC
    ATCACCGAAAACATCCAGGAACTGTTCACCAAATACGGTATCGACATCACCAAAGACCTG
    CTGACCCAGATCTCTACCATCGACGAAAAACAGAACACCTCTTTCTTCCGTGACTTCATC
    TTCTACTTCAACCTGATCTGCCAGATCCGTAACACCGACGACTCTGAAATCGCGAAAAAA
    AACGGTAAAGACGACTTCATCCTGTCTCCGGTTGAACCGTTCTTCGACTCTCGTAAAGAC
    AACGGTAACAAACTGCCGGAAAACGGTGACGACAACGGTGCGTACAACATCGCGCGTAAA
    GGTATCGTTATCCTGAACAAAATCTCTCAGTACTCTGAAAAAAACGAAAACTGCGAAAAA
    ATGAAATGGGGTGACCTGTACGTTTCTAACATCGACTGGGACAACTTCGTT
    SEQ ID NO: 51
    ATGGAAAACTTTAAAAACTTATACCCAATAAACAAAACGTTACGTTTTGAACTGCGTCCA
    TATGGTAAAACACTGGAAAACTTTAAAAAAAGCGGTTTGTTGGAGAAGGATGCATTTAAA
    GCGAACTCTCGCAGATCCATGCAGGCCATCATTGATGAAAAATTTAAAGAGACGATCGAA
    GAACGTCTGAAATACACGGAATTTAGTGAGTGTGACTTAGGTAATATGACTTCTAAAGAT
    AAGAAAATCACCGATAAGGCGGCGACCAACCTGAAGAAGCAAGTCATTTTATCTTTTGAT
    GATGAAATCTTTAACAACTATTTGAAACCGGACAAAAACATCGATGCCTTATTTAAAAAT
    GACCCTTCGAACCCGGTGATTAGCACATTTAAGGGCTTCACAACGTATTTTGTCAATTTT
    TTTGAAATTCGTAAACATATCTTCAAAGGAGAATCAAGCGGCTCTATGGCTTATCGCATT
    ATTGATGAAAACCTGACGACCTATTTGAATAACATTGAAAAAATCAAAAAACTGCCAGAG
    GAATTAAAGTCTCAGTTAGAAGGCATCGACCAGATCGACAAACTCAACAACTATAACGAA
    TTTATTACGCAGTCTGGTATCACCCACTATAATGAAATTATTGGAGGTATCAGTAAATCA
    GAAAATGTGAAAATCCAAGGGATTAATGAAGGCATTAACCTCTATTGCCAGAAAAATAAA
    GTGAAACTGCCGAGGCTGACTCCACTCTACAAAATGATCCTGTCTGACCGCGTCTCGAAT
    AGCTTTGTCCTGGACACAATTGAAAACGATACGGAATTGATTGAGATGATAAGCGATCTG
    ATTAACAAAACCGAAATTTCACAGGATGTAATCATGAGTGATATACAAAACATCTTTATT
    AAATATAAACAGCTTGGTAATCTGCCTGGAATTAGCTATTCGTCAATAGTGAACGCAATC
    TGTTCTGATTATGATAACAATTTTGGCGACGGTAAGCGTAAAAAGAGTTATGAAAACGAT
    AGGAAAAAACACCTGGAAACTAACGTGTATTCTATCAACTATATCAGCGAACTGCTTACG
    GACACCGATGTGAGTTCAAACATTAAGATGCGGTATAAGGAGCTTGAACAGAACTACCAG
    GTCTGTAAGGAAAACTTCAACGCAACCAACTGGATGAACATTAAAAATATCAAACAATCC
    GAGAAGACCAACTTAATCAAAGATCTGCTGGATATTTTGAAGAGCATTCAACGTTTTTAT
    GATCTGTTCGATATCGTTGATGAAGACAAGAATCCTAGTGCGGAATTTTATACATGGCTG
    TCTAAAAATGCGGAGAAATTGGATTTCGAATTCAATTCTGTTTATAATAAATCACGCAAC
    TATTTGACCCGCAAACAATACAGCGACAAAAAGATAAAACTAAACTTCGACAGTCCGACA
    TTGGCAAAGGGCTGGGACGCAAATAAGGAAATCGATAACTCTACGATAATTATGCGTAAG
    TTCAATAATGATCGAGGTGATTATGATTATTTCTTAGGCATTTGGAACAAAAGCACCCCG
    GCCAACGAAAAGATAATTCCACTGGAGGATAACGGTCTGTTCGAAAAAATGCAGTACAAA
    TTATATCCGGATCCAAGCAAGATGCTTCCAAAGCAGTTTCTGTCTAAAATTTGGAAAGCT
    AAGCATCCGACCACCCCAGAATTTGACAAGAAATATAAGGAAGGCCGCCATAAGAAAGGT
    CCCGATTTTGAAAAAGAATTCTTGCACGAACTGATTGATTGCTTTAAACATGGCTTAGTC
    AATCACGATGAAAAGTATCAAGATGTTTTTGGATTCAATTTGAGAAACACAGAAGACTAC
    AATTCCTACACTGAGTTTCTCGAAGATGTGGAACGATGTAATTATAATCTGAGCTTTAAC
    AAAATCGCGGACACCTCGAATCTGATTAACGATGGTAAACTTTATGTTTTCCAGATCTGG
    AGCAAGGATTTCTCTATTGACAGCAAAGGCACCAAAAACCTGAACACCATTTACTTTGAA
    AGTCTCTTCAGCGAAGAAAATATGATTGAGAAAATGTTTAAACTTAGCGGTGAAGCTGAA
    ATATTCTATCGCCCGGCAAGCCTGAACTATTGCGAAGACATTATCAAAAAGGGTCATCAC
    CACGCTGAACTGAAAGATAAATTTGATTATCCTATCATAAAAGATAAACGCTATAGCCAG
    GATAAATTTTTTTTTCATGTTCCTATGGTCATTAACTACAAATCAGAAAAACTGAACTCT
    AAAAGCCTCAATAATCGAACCAATGAAAACCTTGGGCAGTTTACCCATATAATTGGAATT
    GATCGCGGAGAGCGTCATTTAATCTACCTGACCGTAGTCGATGTATCGACCGGCGAGATC
    GTCGAGCAGAAGCACTTAGACGAGATTATCAACACTGATACCAAAGGTGTTGAGCATAAG
    ACGCACTATCTAAACAAGCTGGAGGAAAAATCGAAAACCCGTGATAATGAACGTAAGAGT
    TGGGAGGCAATTGAAACGATTAAAGAACTGAAGGAGGGTTATATCAGCCACGTAATCAAT
    GAAATTCAAAAACTGCAGGAAAAATACAACGCCCTGATCGTTATGGAAAATCTGAATTAC
    GGTTTCAAAAATTCTCGCATCAAAGTGGAAAAACAGGTATATCAGAAGTTCGAGACGGCA
    TTAATTAAAAAGTTTAATTACATCATTGACAAAAAAGATCCGGAAACTTATATTCATGGC
    TATCAGCTGACGAACCCGATCACCACACTGGATAAAATTGGTAACCAGTCTGGTATCGTG
    CTTTACATCCCTGCCTGGAATACCAGTAAAATCGATCCGGTAACGGGATTCGTCAACCTT
    CTATATGCAGATGACCTCAAATATAAGAATCAGGAACAGGCCAAGTCTTTTATTCAGAAA
    ATCGATAACATTTACTTTGAGAATGGGGAATTCAAATTTGATATTGATTTTTCTAAATGG
    AACAATCGTTATAGTATATCTAAGACGAAATGGACGCTCACCTCGTACGGAACCCGAATC
    CAGACATTCCGCAATCCGCAGAAGAACAATAAATGGGACAGCGCCGAGTATGATCTCACT
    GAAGAATTCAAATTGATTCTGAACATTGACGGTACCCTGAAAAGCCAGGATGTCGAAACC
    TATAAAAAATTTATGTCTCTGTTCAAGCTGATGCTGCAACTTAGGAACTCTGTTACCGGC
    ACTGATATCGATTATATGATCTCCCCTGTCACTGATAAAACAGGTACGCATTTCGATTCG
    CGCGAAAATATCAAAAATCTGCCCGCAGATGCCGACGCCAATGGGGCGTACAATATTGCA
    CGCAAGGGTATCATGGCGATCGAAAACATTATGAATGGTATCAGCGACCCGCTGAAAATC
    TCAAACGAAGATTATTTGAAATATATCCAAAACCAGCAGGAATAA
SEQ ID NO: 52
ATGACCCAGTTCGAAGGTTTCACCAACCTGTACCAGGTTTCTAAAACCCTGCGTTTCGAA
CTGATCCCGCAGGGTAAAACCCTGAAACACATCCAGGAACAGGGTTTCATCGAAGAAGAC
AAAGCGCGTAACGACCACTACAAAGAACTGAAACCGATCATCGACCGTATCTACAAAACC
TACGCGGACCAGTGCCTGCAGCTGGTTCAGCTGGACTGGGAAAACCTGTCTGCGGCGATC
    GACTCTTACCGTAAAGAAAAAACCGAAGAAACCCGTAACGCGCTGATCGAAGAACAGGCG
    ACCTACCGTAACGCGATCCACGACTACTTCATCGGTCGTACCGACAACCTGACCGACGCG
    ATCAACAAACGTCACGCGGAAATCTACAAAGGTCTGTTCAAAGCGGAACTGTTCAACGGT
    AAAGTTCTGAAACAGCTGGGTACCGTTACCACCACCGAACACGAAAACGCGCTGCTGCGT
    TCTTTCGACAAATTCACCACCTACTTCTCTGGTTTCTACGAAAACCGTAAAAACGTTTTC
    TCTGCGGAAGACATCTCTACCGCGATCCCGCACCGTATCGTTCAGGACAACTTCCCGAAA
    TTCAAAGAAAACTGCCACATCTTCACCCGTCTGATCACCGCGGTTCCGTCTCTGCGTGAA
    CACTTCGAAAACGTTAAAAAAGCGATCGGTATCTTCGTTTCTACCTCTATCGAAGAAGTT
    TTCTCTTTCCCGTTCTACAACCAGCTGCTGACCCAGACCCAGATCGACCTGTACAACCAG
    CTGCTGGGTGGTATCTCTCGTGAAGCGGGTACCGAAAAAATCAAAGGTCTGAACGAAGTT
    CTGAACCTGGCGATCCAGAAAAACGACGAAACCGCGCACATCATCGCGTCTCTGCCGCAC
    CGTTTCATCCCGCTGTTCAAACAGATCCTGTCTGACCGTAACACCCTGTCTTTCATCCTG
    GAAGAATTCAAATCTGACGAAGAAGTTATCCAGTCTTTCTGCAAATACAAAACCCTGCTG
    CGTAACGAAAACGTTCTGGAAACCGCGGAAGCGCTGTTCAACGAACTGAACTCTATCGAC
    CTGACCCACATCTTCATCTCTCACAAAAAACTGGAAACCATCTCTTCTGCGCTGTGCGAC
    CACTGGGACACCCTGCGTAACGCGCTGTACGAACGTCGTATCTCTGAACTGACCGGTAAA
    ATCACCAAATCTGCGAAAGAAAAAGTTCAGCGTTCTCTGAAACACGAAGACATCAACCTG
    CAGGAAATCATCTCTGCGGCGGGTAAAGAACTGTCTGAAGCGTTCAAACAGAAAACCTCT
    GAAATCCTGTCTCACGCGCACGCGGCGCTGGACCAGCCGCTGCCGACCACCCTGAAAAAA
    CAGGAAGAAAAAGAAATCCTGAAATCTCAGCTGGACTCTCTGCTGGGTCTGTACCACCTG
    CTGGACTGGTTCGCGGTTGACGAATCTAACGAAGTTGACCCGGAATTCTCTGCGCGTCTG
    ACCGGTATCAAACTGGAAATGGAACCGTCTCTGTCTTTCTACAACAAAGCGCGTAACTAC
    GCGACCAAAAAACCGTACTCTGTTGAAAAATTCAAACTGAACTTCCAGATGCCGACCCTG
    GCGTCTGGTTGGGACGTTAACAAAGAAAAAAACAACGGTGCGATCCTGTTCGTTAAAAAC
    GGTCTGTACTACCTGGGTATCATGCCGAAACAGAAAGGTCGTTACAAAGCGCTGTCTTTC
    GAACCGACCGAAAAAACCTCTGAAGGTTTCGACAAAATGTACTACGACTACTTCCCGGAC
    GCGGCGAAAATGATCCCGAAATGCTCTACCCAGCTGAAAGCGGTTACCGCGCACTTCCAG
    ACCCACACCACCCCGATCCTGCTGTCTAACAACTTCATCGAACCGCTGGAAATCACCAAA
    GAAATCTACGACCTGAACAACCCGGAAAAAGAACCGAAAAAATTCCAGACCGCGTACGCG
    AAAAAAACCGGTGACCAGAAAGGTTACCGTGAAGCGCTGTGCAAATGGATCGACTTCACC
    CGTGACTTCCTGTCTAAATACACCAAAACCACCTCTATCGACCTGTCTTCTCTGCGTCCG
    TCTTCTCAGTACAAAGACCTGGGTGAATACTACGCGGAACTGAACCCGCTGCTGTACCAC
    ATCTCTTTCCAGCGTATCGCGGAAAAAGAAATCATGGACGCGGTTGAAACCGGTAAACTG
    TACCTGTTCCAGATCTACAACAAAGACTTCGCGAAAGGTCACCACGGTAAACCGAACCTG
    CACACCCTGTACTGGACCGGTCTGTTCTCTCCGGAAAACCTGGCGAAAACCTCTATCAAA
    CTGAACGGTCAGGCGGAACTGTTCTACCGTCCGAAATCTCGTATGAAACGTATGGCGCAC
    CGTCTGGGTGAAAAAATGCTGAACAAAAAACTGAAAGACCAGAAAACCCCGATCCCGGAC
    ACCCTGTACCAGGAACTGTACGACTACGTTAACCACCGTCTGTCTCACGACCTGTCTGAC
    GAAGCGCGTGCGCTGCTGCCGAACGTTATCACCAAAGAAGTTTCTCACGAAATCATCAAA
    GACCGTCGTTTCACCTCTGACAAATTCTTCTTCCACGTTCCGATCACCCTGAACTACCAG
    GCGGCGAACTCTCCGTCTAAATTCAACCAGCGTGTTAACGCGTACCTGAAAGAACACCCG
    GAAACCCCGATCATCGGTATCGACCGTGGTGAACGTAACCTGATCTACATCACCGTTATC
    GACTCTACCGGTAAAATCCTGGAACAGCGTTCTCTGAACACCATCCAGCAGTTCGACTAC
    CAGAAAAAACTGGACAACCGTGAAAAAGAACGTGTTGCGGCGCGTCAGGCGTGGTCTGTT
    GTTGGTACCATCAAAGACCTGAAACAGGGTTACCTGTCTCAGGTTATCCACGAAATCGTT
    GACCTGATGATCCACTACCAGGCGGTTGTTGTTCTGGAAAACCTGAACTTCGGTTTCAAA
    TCTAAACGTACCGGTATCGCGGAAAAAGCGGTTTACCAGCAGTTCGAAAAAATGCTGATC
    GACAAACTGAACTGCCTGGTTCTGAAAGACTACCCGGCGGAAAAAGTTGGTGGTGTTCTG
    AACCCGTACCAGCTGACCGACCAGTTCACCTCTTTCGCGAAAATGGGTACCCAGTCTGGT
    TTCCTGTTCTACGTTCCGGCGCCGTACACCTCTAAAATCGACCCGCTGACCGGTTTCGTT
    GACCCGTTCGTTTGGAAAACCATCAAAAACCACGAATCTCGTAAACACTTCCTGGAAGGT
    TTCGACTTCCTGCACTACGACGTTAAAACCGGTGACTTCATCCTGCACTTCAAAATGAAC
    CGTAACCTGTCTTTCCAGCGTGGTCTGCCGGGTTTCATGCCGGCGTGGGACATCGTTTTC
    GAAAAAAACGAAACCCAGTTCGACGCGAAAGGTACCCCGTTCATCGCGGGTAAACGTATC
    GTTCCGGTTATCGAAAACCACCGTTTCACCGGTCGTTACCGTGACCTGTACCCGGCGAAC
    GAACTGATCGCGCTGCTGGAAGAAAAAGGTATCGTTTTCCGTGACGGTTCTAACATCCTG
    CCGAAACTGCTGGAAAACGACGACTCTCACGCGATCGACACCATGGTTGCGCTGATCCGT
    TCTGTTCTGCAGATGCGTAACTCTAACGCGGCGACCGGTGAAGACTACATCAACTCTCCG
    GTTCGTGACCTGAACGGTGTTTGCTTCGACTCTCGTTTCCAGAACCCGGAATGGCCGATG
    GACGCGGACGCGAACGGTGCGTACCACATCGCGCTGAAAGGTCAGCTGCTGCTGAACCAC
    CTGAAAGAATCTAAAGACCTGAAACTGCAGAACGGTATCTCTAACCAGGACTGGCTGGCG
    TACATCCAGGAACTGCGTAACTA
SEQ ID NO: 53
ATGGCGGTTAAATCTATCAAAGTTAAACTGCGTCTGGACGACATGCCGGAAATCCGTGCG
GGTCTGTGGAAACTGCACAAAGAAGTTAACGCGGGTGTTCGTTACTACACCGAATGGCTG
TCTCTGCTGCGTCAGGAAAACCTGTACCGTCGTTCTCCGAACGGTGACGGTGAACAGGAA
TGCGACAAAACCGCGGAAGAATGCAAAGCGGAACTGCTGGAACGTCTGCGTGCGCGTCAG
    GTTGAAAACGGTCACCGTGGTCCGGCGGGTTCTGACGACGAACTGCTGCAGCTGGCGCGT
    CAGCTGTACGAACTGCTGGTTCCGCAGGCGATCGGTGCGAAAGGTGACGCGCAGCAGATC
    GCGCGTAAATTCCTGTCTCCGCTGGCGGACAAAGACGCGGTTGGTGGTCTGGGTATCGCG
    AAAGCGGGTAACAAACCGCGTTGGGTTCGTATGCGTGAAGCGGGTGAACCGGGTTGGGAA
    GAAGAAAAAGAAAAAGCGGAAACCCGTAAATCTGCGGACCGTACCGCGGACGTTCTGCGT
    GCGCTGGCGGACTTCGGTCTGAAACCGCTGATGCGTGTTTACACCGACTCTGAAATGTCT
    TCTGTTGAATGGAAACCGCTGCGTAAAGGTCAGGCGGTTCGTACCTGGGACCGTGACATG
    TTCCAGCAGGCGATCGAACGTATGATGTCTTGGGAATCTTGGAACCAGCGTGTTGGTCAG
    GAATACGCGAAACTGGTTGAACAGAAAAACCGTTTCGAACAGAAAAACTTCGTTGGTCAG
    GAACACCTGGTTCACCTGGTTAACCAGCTGCAGCAGGACATGAAAGAAGCGTCTCCGGGT
    CTGGAATCTAAAGAACAGACCGCGCACTACGTTACCGGTCGTGCGCTGCGTGGTTCTGAC
    AAAGTTTTCGAAAAATGGGGTAAACTGGCGCCGGACGCGCCGTTCGACCTGTACGACGCG
    GAAATCAAAAACGTTCAGCGTCGTAACACCCGTCGTTTCGGTTCTCACGACCTGTTCGCG
    AAACTGGCGGAACCGGAATACCAGGCGCTGTGGCGTGAAGACGCGTCTTTCCTGACCCGT
    TACGCGGTTTACAACTCTATCCTGCGTAAACTGAACCACGCGAAAATGTTCGCGACCTTC
    ACCCTGCCGGACGCGACCGCGCACCCGATCTGGACCCGTTTCGACAAACTGGGTGGTAAC
    CTGCACCAGTACACCTTCCTGTTCAACGAATTCGGTGAACGTCGTCACGCGATCCGTTTC
    CACAAACTGCTGAAAGTTGAAAACGGTGTTGCGCGTGAAGTTGACGACGTTACCGTTCCG
    ATCTCTATGTCTGAACAGCTGGACAACCTGCTGCCGCGTGACCCGAACGAACCGATCGCG
    CTGTACTTCCGTGACTACGGTGCGGAACAGCACTTCACCGGTGAATTCGGTGGTGCGAAA
    ATCCAGTGCCGTCGTGACCAGCTGGCGCACATGCACCGTCGTCGTGGTGCGCGTGACGTT
    TACCTGAACGTTTCTGTTCGTGTTCAGTCTCAGTCTGAAGCGCGTGGTGAACGTCGTCCG
    CCGTACGCGGCGGTTTTCCGTCTGGTTGGTGACAACCACCGTGCGTTCGTTCACTTCGAC
    AAACTGTCTGACTACCTGGCGGAACACCCGGACGACGGTAAACTGGGTTCTGAAGGTCTG
    CTGTCTGGTCTGCGTGTTATGTCTGTTGACCTGGGTCTGCGTACCTCTGCGTCTATCTCT
    GTTTTCCGTGTTGCGCGTAAAGACGAACTGAAACCGAACTCTAAAGGTCGTGTTCCGTTC
    TTCTTCCCGATCAAAGGTAACGACAACCTGGTTGCGGTTCACGAACGTTCTCAGCTGCTG
    AAACTGCCGGGTGAAACCGAATCTAAAGACCTGCGTGCGATCCGTGAAGAACGTCAGCGT
    ACCCTGCGTCAGCTGCGTACCCAGCTGGCGTACCTGCGTCTGCTGGTTCGTTGCGGTTCT
    GAAGACGTTGGTCGTCGTGAACGTTCTTGGGCGAAACTGATCGAACAGCCGGTTGACGCG
    GCGAACCACATGACCCCGGACTGGCGTGAAGCGTTCGAAAACGAACTGCAGAAACTGAAA
    TCTCTGCACGGTATCTGCTCTGACAAAGAATGGATGGACGCGGTTTACGAATCTGTTCGT
    CGTGTTTGGCGTCACATGGGTAAACAGGTTCGTGACTGGCGTAAAGACGTTCGTTCTGGT
    GAACGTCCGAAAATCCGTGGTTACGCGAAAGACGTTGTTGGTGGTAACTCTATCGAACAG
    ATCGAATACCTGGAACGTCAGTACAAATTCCTGAAATCTTGGTCTTTCTTCGGTAAAGTT
    TCTGGTCAGGTTATCCGTGCGGAAAAAGGTTCTCGTTTCGCGATCACCCTGCGTGAACAC
    ATCGACCACGCGAAAGAAGACCGTCTGAAAAAACTGGCGGACCGTATCATCATGGAAGCG
    CTGGGTTACGTTTACGCGCTGGACGAACGTGGTAAAGGTAAATGGGTTGCGAAATACCCG
    CCGTGCCAGCTGATCCTGCTGGAAGAACTGTCTGAATACCAGTTCAACAACGACCGTCCG
    CCGTCTGAAAACAACCAGCTGATGCAGTGGTCTCACCGTGGTGTTTTCCAGGAACTGATC
    AACCAGGCGCAGGTTCACGACCTGCTGGTTGGTACCATGTACGCGGCGTTCTCTTCTCGT
    TTCGACGCGCGTACCGGTGCGCCGGGTATCCGTTGCCGTCGTGTTCCGGCGCGTTGCACC
    CAGGAACACAACCCGGAACCGTTCCCGTGGTGGCTGAACAAATTCGTTGTTGAACACACC
    CTGGACGCGTGCCCGCTGCGTGCGGACGACCTGATCCCGACCGGTGAAGGTGAAATCTTC
    GTTTCTCCGTTCTCTGCGGAAGAAGGTGACTTCCACCAGATCCACGCGGACCTGAACGCG
    GCGCAGAACCTGCAGCAGCGTCTGTGGTCTGACTTCGACATCTCTCAGATCCGTCTGCGT
    TGCGACTGGGGTGAAGTTGACGGTGAACTGGTTCTGATCCCGCGTCTGACCGGTAAACGT
    ACCGCGGACTCTTACTCTAACAAAGTTTTCTACACCAACACCGGTGTTACCTACTACGAA
    CGTGAACGTGGTAAAAAACGTCGTAAAGTTTTCGCGCAGGAAAAACTGTCTGAAGAAGAA
    GCGGAACTGCTGGTTGAAGCGGACGAAGCGCGTGAAAAATCTGTTGTTCTGATGCGTGAC
    CCGTCTGGTATCATCAACCGTGGTAACTGGACCCGTCAGAAAGAATTCTGGTCTATGGTT
    AACCAGCGTATCGAAGGTTACCTGGTTAAACAGATCCGTTCTCGTGTTCCGCTGCAGGAC
    TCTGCGTGCGAAAACACCGGTGACATCTAA
SEQ ID NO: 54
ATGGCGACCCGTTCTTTCATCCTGAAAATCGAACCGAACGAAGAAGTTAAAAAAGGTCTG
TGGAAAACCCACGAAGTTCTGAACCACGGTATCGCGTACTACATGAACATCCTGAAACTG
ATCCGTCAGGAAGCGATCTACGAACACCACGAACAGGACCCGAAAAACCCGAAAAAAGTT
TCTAAAGCGGAAATCCAGGCGGAACTGTGGGACTTCGTTCTGAAAATGCAGAAATGCAAC
    TCTTTCACCCACGAAGTTGACAAAGACGTTGTTTTCAACATCCTGCGTGAACTGTACGAA
    GAACTGGTTCCGTCTTCTGTTGAAAAAAAAGGTGAAGCGAACCAGCTGTCTAACAAATTC
    CTGTACCCGCTGGTTGACCCGAACTCTCAGTCTGGTAAAGGTACCGCGTCTTCTGGTCGT
    AAACCGCGTTGGTACAACCTGAAAATCGCGGGTGACCCGTCTTGGGAAGAAGAAAAAAAA
    AAATGGGAAGAAGACAAAAAAAAAGACCCGCTGGCGAAAATCCTGGGTAAACTGGCGGAA
    TACGGTCTGATCCCGCTGTTCATCCCGTTCACCGACTCTAACGAACCGATCGTTAAAGAA
    ATCAAATGGATGGAAAAATCTCGTAACCAGTCTGTTCGTCGTCTGGACAAAGACATGTTC
    ATCCAGGCGCTGGAACGTTTCCTGTCTTGGGAATCTTGGAACCTGAAAGTTAAAGAAGAA
    TACGAAAAAGTTGAAAAAGAACACAAAACCCTGGAAGAACGTATCAAAGAAGACATCCAG
    GCGTTCAAATCTCTGGAACAGTACGAAAAAGAACGTCAGGAACAGCTGCTGCGTGACACC
    CTGAACACCAACGAATACCGTCTGTCTAAACGTGGTCTGCGTGGTTGGCGTGAAATCATC
    CAGAAATGGCTGAAAATGGACGAAAACGAACCGTCTGAAAAATACCTGGAAGTTTTCAAA
    GACTACCAGCGTAAACACCCGCGTGAAGCGGGTGACTACTCTGTTTACGAATTCCTGTCT
    AAAAAAGAAAACCACTTCATCTGGCGTAACCACCCGGAATACCCGTACCTGTACGCGACC
    TTCTGCGAAATCGACAAAAAAAAAAAAGACGCGAAACAGCAGGCGACCTTCACCCTGGCG
    GACCCGATCAACCACCCGCTGTGGGTTCGTTTCGAAGAACGTTCTGGTTCTAACCTGAAC
    AAATACCGTATCCTGACCGAACAGCTGCACACCGAAAAACTGAAAAAAAAACTGACCGTT
    CAGCTGGACCGTCTGATCTACCCGACCGAATCTGGTGGTTGGGAAGAAAAAGGTAAAGTT
    GACATCGTTCTGCTGCCGTCTCGTCAGTTCTACAACCAGATCTTCCTGGACATCGAAGAA
    AAAGGTAAACACGCGTTCACCTACAAAGACGAATCTATCAAATTCCCGCTGAAAGGTACC
    CTGGGTGGTGCGCGTGTTCAGTTCGACCGTGACCACCTGCGTCGTTACCCGCACAAAGTT
    GAATCTGGTAACGTTGGTCGTATCTACTTCAACATGACCGTTAACATCGAACCGACCGAA
    TCTCCGGTTTCTAAATCTCTGAAAATCCACCGTGACGACTTCCCGAAATTCGTTAACTTC
    AAACCGAAAGAACTGACCGAATGGATCAAAGACTCTAAAGGTAAAAAACTGAAATCTGGT
    ATCGAATCTCTGGAAATCGGTCTGCGTGTTATGTCTATCGACCTGGGTCAGCGTCAGGCG
    GCGGCGGCGTCTATCTTCGAAGTTGTTGACCAGAAACCGGACATCGAAGGTAAACTGTTC
    TTCCCGATCAAAGGTACCGAACTGTACGCGGTTCACCGTGCGTCTTTCAACATCAAACTG
    CCGGGTGAAACCCTGGTTAAATCTCGTGAAGTTCTGCGTAAAGCGCGTGAAGACAACCTG
    AAACTGATGAACCAGAAACTGAACTTCCTGCGTAACGTTCTGCACTTCCAGCAGTTCGAA
    GACATCACCGAACGTGAAAAACGTGTTACCAAATGGATCTCTCGTCAGGAAAACTCTGAC
    GTTCCGCTGGTTTACCAGGACGAACTGATCCAGATCCGTGAACTGATGTACAAACCGTAC
    AAAGACTGGGTTGCGTTCCTGAAACAGCTGCACAAACGTCTGGAAGTTGAAATCGGTAAA
    GAAGTTAAACACTGGCGTAAATCTCTGTCTGACGGTCGTAAAGGTCTGTACGGTATCTCT
    CTGAAAAACATCGACGAAATCGACCGTACCCGTAAATTCCTGCTGCGTTGGTCTCTGCGT
    CCGACCGAACCGGGTGAAGTTCGTCGTCTGGAACCGGGTCAGCGTTTCGCGATCGACCAG
    CTGAACCACCTGAACGCGCTGAAAGAAGACCGTCTGAAAAAAATGGCGAACACCATCATC
    ATGCACGCGCTGGGTTACTGCTACGACGTTCGTAAAAAAAAATGGCAGGCGAAAAACCCG
    GCGTGCCAGATCATCCTGTTCGAAGACCTGTCTAACTACAACCCGTACGAAGAACGTTCT
    CGTTTCGAAAACTCTAAACTGATGAAATGGTCTCGTCGTGAAATCCCGCGTCAGGTTGCG
    CTGCAGGGTGAAATCTACGGTCTGCAGGTTGGTGAAGTTGGTGCGCAGTTCTCTTCTCGT
    TTCCACGCGAAAACCGGTTCTCCGGGTATCCGTTGCTCTGTTGTTACCAAAGAAAAACTG
    CAGGACAACCGTTTCTTCAAAAACCTGCAGCGTGAAGGTCGTCTGACCCTGGACAAAATC
    GCGGTTCTGAAAGAAGGTGACCTGTACCCGGACAAAGGTGGTGAAAAATTCATCTCTCTG
    TCTAAAGACCGTAAACTGGTTACCACCCACGCGGACATCAACGCGGCGCAGAACCTGCAG
    AAACGTTTCTGGACCCGTACCCACGGTTTCTACAAAGTTTACTGCAAAGCGTACCAGGTT
    GACGGTCAGACCGTTTACATCCCGGAATCTAAAGACCAGAAACAGAAAATCATCGAAGAA
    TTCGGTGAAGGTTACTTCATCCTGAAAGACGGTGTTTACGAATGGGGTAACGCGGGTAAA
    CTGAAAATCAAAAAAGGTTCTTCTAAACAGTCTTCTTCTGAACTGGTTGACTCTGACATC
    CTGAAAGACTCTTTCGACCTGGCGTCTGAACTGAAAGGTGAAAAACTGATGCTGTACCGT
    GACCCGTCTGGTAACGTTTTCCCGTCTGACAAATGGATGGCGGCGGGTGTTTTCTTCGGT
    AAACTGGAACGTATCCTGATCTCTAAACTGACCAACCAGTACTCTATCTCTACCATCGAA
    GACGACTCTTCTAAACAGTCTATGTAA
SEQ ID NO: 55
ATGCCGACCCGTACCATCAACCTGAAACTGGTTCTGGGTAAAAACCCGGAAAACGCGACC
CTGCGTCGTGCGCTGTTCTCTACCCACCGTCTGGTTAACCAGGCGACCAAACGTATCGAA
GAATTCCTGCTGCTGTGCCGTGGTGAAGCGTACCGTACCGTTGACAACGAAGGTAAAGAA
GCGGAAATCCCGCGTCACGCGGTTCAGGAAGAAGCGCTGGCGTTCGCGAAAGCGGCGCAG
    CGTCACAACGGTTGCATCTCTACCTACGAAGACCAGGAAATCCTGGACGTTCTGCGTCAG
    CTGTACGAACGTCTGGTTCCGTCTGTTAACGAAAACAACGAAGCGGGTGACGCGCAGGCG
    GCGAACGCGTGGGTTTCTCCGCTGATGTCTGCGGAATCTGAAGGTGGTCTGTCTGTTTAC
    GACAAAGTTCTGGACCCGCCGCCGGTTTGGATGAAACTGAAAGAAGAAAAAGCGCCGGGT
    TGGGAAGCGGCGTCTCAGATCTGGATCCAGTCTGACGAAGGTCAGTCTCTGCTGAACAAA
    CCGGGTTCTCCGCCGCGTTGGATCCGTAAACTGCGTTCTGGTCAGCCGTGGCAGGACGAC
    TTCGTTTCTGACCAGAAAAAAAAACAGGACGAACTGACCAAAGGTAACGCGCCGCTGATC
    AAACAGCTGAAAGAAATGGGTCTGCTGCCGCTGGTTAACCCGTTCTTCCGTCACCTGCTG
    GACCCGGAAGGTAAAGGTGTTTCTCCGTGGGACCGTCTGGCGGTTCGTGCGGCGGTTGCG
    CACTTCATCTCTTGGGAATCTTGGAACCACCGTACCCGTGCGGAATACAACTCTCTGAAA
    CTGCGTCGTGACGAATTCGAAGCGGCGTCTGACGAATTCAAAGACGACTTCACCCTGCTG
    CGTCAGTACGAAGCGAAACGTCACTCTACCCTGAAATCTATCGCGCTGGCGGACGACTCT
    AACCCGTACCGTATCGGTGTTCGTTCTCTGCGTGCGTGGAACCGTGTTCGTGAAGAATGG
    ATCGACAAAGGTGCGACCGAAGAACAGCGTGTTACCATCCTGTCTAAACTGCAGACCCAG
    CTGCGTGGTAAATTCGGTGACCCGGACCTGTTCAACTGGCTGGCGCAGGACCGTCACGTT
    CACCTGTGGTCTCCGCGTGACTCTGTTACCCCGCTGGTTCGTATCAACGCGGTTGACAAA
    GTTCTGCGTCGTCGTAAACCGTACGCGCTGATGACCTTCGCGCACCCGCGTTTCCACCCG
    CGTTGGATCCTGTACGAAGCGCCGGGTGGTTCTAACCTGCGTCAGTACGCGCTGGACTGC
    ACCGAAAACGCGCTGCACATCACCCTGCCGCTGCTGGTTGACGACGCGCACGGTACCTGG
    ATCGAAAAAAAAATCCGTGTTCCGCTGGCGCCGTCTGGTCAGATCCAGGACCTGACCCTG
    GAAAAACTGGAAAAAAAAAAAAACCGTCTGTACTACCGTTCTGGTTTCCAGCAGTTCGCG
    GGTCTGGCGGGTGGTGCGGAAGTTCTGTTCCACCGTCCGTACATGGAACACGACGAACGT
    TCTGAAGAATCTCTGCTGGAACGTCCGGGTGCGGTTTGGTTCAAACTGACCCTGGACGTT
    GCGACCCAGGCGCCGCCGAACTGGCTGGACGGTAAAGGTCGTGTTCGTACCCCGCCGGAA
    GTTCACCACTTCAAAACCGCGCTGTCTAACAAATCTAAACACACCCGTACCCTGCAGCCG
    GGTCTGCGTGTTCTGTCTGTTGACCTGGGTATGCGTACCTTCGCGTCTTGCTCTGTTTTC
    GAACTGATCGAAGGTAAACCGGAAACCGGTCGTGCGTTCCCGGTTGCGGACGAACGTTCT
    ATGGACTCTCCGAACAAACTGTGGGCGAAACACGAACGTTCTTTCAAACTGACCCTGCCG
    GGTGAAACCCCGTCTCGTAAAGAAGAAGAAGAACGTTCTATCGCGCGTGCGGAAATCTAC
    GCGCTGAAACGTGACATCCAGCGTCTGAAATCTCTGCTGCGTCTGGGTGAAGAAGACAAC
    GACAACCGTCGTGACGCGCTGCTGGAACAGTTCTTCAAAGGTTGGGGTGAAGAAGACGTT
    GTTCCGGGTCAGGCGTTCCCGCGTTCTCTGTTCCAGGGTCTGGGTGCGGCGCCGTTCCGT
    TCTACCCCGGAACTGTGGCGTCAGCACTGCCAGACCTACTACGACAAAGCGGAAGCGTGC
    CTGGCGAAACACATCTCTGACTGGCGTAAACGTACCCGTCCGCGTCCGACCTCTCGTGAA
    ATGTGGTACAAAACCCGTTCTTACCACGGTGGTAAATCTATCTGGATGCTGGAATACCTG
    GACGCGGTTCGTAAACTGCTGCTGTCTTGGTCTCTGCGTGGTCGTACCTACGGTGCGATC
    AACCGTCAGGACACCGCGCGTTTCGGTTCTCTGGCGTCTCGTCTGCTGCACCACATCAAC
    TCTCTGAAAGAAGACCGTATCAAAACCGGTGCGGACTCTATCGTTCAGGCGGCGCGTGGT
    TACATCCCGCTGCCGCACGGTAAAGGTTGGGAACAGCGTTACGAACCGTGCCAGCTGATC
    CTGTTCGAAGACCTGGCGCGTTACCGTTTCCGTGTTGACCGTCCGCGTCGTGAAAACTCT
    CAGCTGATGCAGTGGAACCACCGTGCGATCGTTGCGGAAACCACCATGCAGGCGGAACTG
    TACGGTCAGATCGTTGAAAACACCGCGGCGGGTTTCTCTTCTCGTTTCCACGCGGCGACC
    GGTGCGCCGGGTGTTCGTTGCCGTTTCCTGCTGGAACGTGACTTCGACAACGACCTGCCG
    AAACCGTACCTGCTGCGTGAACTGTCTTGGATGCTGGGTAACACCAAAGTTGAATCTGAA
    GAAGAAAAACTGCGTCTGCTGTCTGAAAAAATCCGTCCGGGTTCTCTGGTTCCGTGGGAC
    GGTGGTGAACAGTTCGCGACCCTGCACCCGAAACGTCAGACCCTGTGCGTTATCCACGCG
    GACATGAACGCGGCGCAGAACCTGCAGCGTCGTTTCTTCGGTCGTTGCGGTGAAGCGTTC
    CGTCTGGTTTGCCAGCCGCACGGTGACGACGTTCTGCGTCTGGCGTCTACCCCGGGTGCG
    CGTCTGCTGGGTGCGCTGCAGCAGCTGGAAAACGGTCAGGGTGCGTTCGAACTGGTTCGT
    GACATGGGTTCTACCTCTCAGATGAACCGTTTCGTTATGAAATCTCTGGGTAAAAAAAAA
    ATCAAACCGCTGCAGGACAACAACGGTGACGACGAACTGGAAGACGTTCTGTCTGTTCTG
    CCGGAAGAAGACGACACCGGTCGTATCACCGTTTTCCGTGACTCTTCTGGTATCTTCTTC
    CCGTGCAACGTTTGGATCCCGGCGAAACAGTTCTGGCCGGCGGTTCGTGCGATGATCTGG
    AAAGTTATGGCGTCTCACTCTCTGGGTTAA
SEQ ID NO: 56
ATGACCAAACTGCGTCACCGTCAGAAAAAACTGACCCACGACTGGGCGGGTTCTAAAAAA
CGTGAAGTTCTGGGTTCTAACGGTAAACTGCAGAACCCGCTGCTGATGCCGGTTAAAAAA
GGTCAGGTTACCGAATTCCGTAAAGCGTTCTCTGCGTACGCGCGTGCGACCAAAGGTGAA
ATGACCGACGGTCGTAAAAACATGTTCACCCACTCTTTCGAACCGTTCAAAACCAAACCG
    TCTCTGCACCAGTGCGAACTGGCGGACAAAGCGTACCAGTCTCTGCACTCTTACCTGCCG
    GGTTCTCTGGCGCACTTCCTGCTGTCTGCGCACGCGCTGGGTTTCCGTATCTTCTCTAAA
    TCTGGTGAAGCGACCGCGTTCCAGGCGTCTTCTAAAATCGAAGCGTACGAATCTAAACTG
    GCGTCTGAACTGGCGTGCGTTGACCTGTCTATCCAGAACCTGACCATCTCTACCCTGTTC
    AACGCGCTGACCACCTCTGTTCGTGGTAAAGGTGAAGAAACCTCTGCGGACCCGCTGATC
    GCGCGTTTCTACACCCTGCTGACCGGTAAACCGCTGTCTCGTGACACCCAGGGTCCGGAA
    CGTGACCTGGCGGAAGTTATCTCTCGTAAAATCGCGTCTTCTTTCGGTACCTGGAAAGAA
    ATGACCGCGAACCCGCTGCAGTCTCTGCAGTTCTTCGAAGAAGAACTGCACGCGCTGGAC
    GCGAACGTTTCTCTGTCTCCGGCGTTCGACGTTCTGATCAAAATGAACGACCTGCAGGGT
    GACCTGAAAAACCGTACCATCGTTTTCGACCCGGACGCGCCGGTTTTCGAATACAACGCG
    GAAGACCCGGCGGACATCATCATCAAACTGACCGCGCGTTACGCGAAAGAAGCGGTTATC
    AAAAACCAGAACGTTGGTAACTACGTTAAAAACGCGATCACCACCACCAACGCGAACGGT
    CTGGGTTGGCTGCTGAACAAAGGTCTGTCTCTGCTGCCGGTTTCTACCGACGACGAACTG
    CTGGAATTCATCGGTGTTGAACGTTCTCACCCGTCTTGCCACGCGCTGATCGAACTGATC
    GCGCAGCTGGAAGCGCCGGAACTGTTCGAAAAAAACGTTTTCTCTGACACCCGTTCTGAA
    GTTCAGGGTATGATCGACTCTGCGGTTTCTAACCACATCGCGCGTCTGTCTTCTTCTCGT
    AACTCTCTGTCTATGGACTCTGAAGAACTGGAACGTCTGATCAAATCTTTCCAGATCCAC
    ACCCCGCACTGCTCTCTGTTCATCGGTGCGCAGTCTCTGTCTCAGCAGCTGGAATCTCTG
    CCGGAAGCGCTGCAGTCTGGTGTTAACTCTGCGGACATCCTGCTGGGTTCTACCCAGTAC
    ATGCTGACCAACTCTCTGGTTGAAGAATCTATCGCGACCTACCAGCGTACCCTGAACCGT
    ATCAACTACCTGTCTGGTGTTGCGGGTCAGATCAACGGTGCGATCAAACGTAAAGCGATC
    GACGGTGAAAAAATCCACCTGCCGGCGGCGTGGTCTGAACTGATCTCTCTGCCGTTCATC
    GGTCAGCCGGTTATCGACGTTGAATCTGACCTGGCGCACCTGAAAAACCAGTACCAGACC
    CTGTCTAACGAATTCGACACCCTGATCTCTGCGCTGCAGAAAAACTTCGACCTGAACTTC
    AACAAAGCGCTGCTGAACCGTACCCAGCACTTCGAAGCGATGTGCCGTTCTACCAAAAAA
    AACGCGCTGTCTAAACCGGAAATCGTTTCTTACCGTGACCTGCTGGCGCGTCTGACCTCT
    TGCCTGTACCGTGGTTCTCTGGTTCTGCGTCGTGCGGGTATCGAAGTTCTGAAAAAACAC
    AAAATCTTCGAATCTAACTCTGAACTGCGTGAACACGTTCACGAACGTAAACACTTCGTT
    TTCGTTTCTCCGCTGGACCGTAAAGCGAAAAAACTGCTGCGTCTGACCGACTCTCGTCCG
    GACCTGCTGCACGTTATCGACGAAATCCTGCAGCACGACAACCTGGAAAACAAAGACCGT
    GAATCTCTGTGGCTGGTTCGTTCTGGTTACCTGCTGGCGGGTCTGCCGGACCAGCTGTCT
    TCTTCTTTCATCAACCTGCCGATCATCACCCAGAAAGGTGACCGTCGTCTGATCGACCTG
    ATCCAGTACGACCAGATCAACCGTGACGCGTTCGTTATGCTGGTTACCTCTGCGTTCAAA
    TCTAACCTGTCTGGTCTGCAGTACCGTGCGAACAAACAGTCTTTCGTTGTTACCCGTACC
    CTGTCTCCGTACCTGGGTTCTAAACTGGTTTACGTTCCGAAAGACAAAGACTGGCTGGTT
    CCGTCTCAGATGTTCGAAGGTCGTTTCGCGGACATCCTGCAGTCTGACTACATGGTTTGG
    AAAGACGCGGGTCGTCTGTGCGTTATCGACACCGCGAAACACCTGTCTAACATCAAAAAA
    TCTGTTTTCTCTTCTGAAGAAGTTCTGGCGTTCCTGCGTGAACTGCCGCACCGTACCTTC
    ATCCAGACCGAAGTTCGTGGTCTGGGTGTTAACGTTGACGGTATCGCGTTCAACAACGGT
    GACATCCCGTCTCTGAAAACCTTCTCTAACTGCGTTCAGGTTAAAGTTTCTCGTACCAAC
    ACCTCTCTGGTTCAGACCCTGAACCGTTGGTTCGAAGGTGGTAAAGTTTCTCCGCCGTCT
    ATCCAGTTCGAACGTGCGTACTACAAAAAAGACGACCAGATCCACGAAGACGCGGCGAAA
    CGTAAAATCCGTTTCCAGATGCCGGCGACCGAACTGGTTCACGCGTCTGACGACGCGGGT
    TGGACCCCGTCTTACCTGCTGGGTATCGACCCGGGTGAATACGGTATGGGTCTGTCTCTG
    GTTTCTATCAACAACGGTGAAGTTCTGGACTCTGGTTTCATCCACATCAACTCTCTGATC
    AACTTCGCGTCTAAAAAATCTAACCACCAGACCAAAGTTGTTCCGCGTCAGCAGTACAAA
    TCTCCGTACGCGAACTACCTGGAACAGTCTAAAGACTCTGCGGCGGGTGACATCGCGCAC
    ATCCTGGACCGTCTGATCTACAAACTGAACGCGCTGCCGGTTTTCGAAGCGCTGTCTGGT
    AACTCTCAGTCTGCGGCGGACCAGGTTTGGACCAAAGTTCTGTCTTTCTACACCTGGGGT
    GACAACGACGCGCAGAACTCTATCCGTAAACAGCACTGGTTCGGTGCGTCTCACTGGGAC
    ATCAAAGGTATGCTGCGTCAGCCGCCGACCGAAAAAAAACCGAAACCGTACATCGCGTTC
    CCGGGTTCTCAGGTTTCTTCTTACGGTAACTCTCAGCGTTGCTCTTGCTGCGGTCGTAAC
    CCGATCGAACAGCTGCGTGAAATGGCGAAAGACACCTCTATCAAAGAACTGAAAATCCGT
    AACTCTGAAATCCAGCTGTTCGACGGTACCATCAAACTGTTCAACCCGGACCCGTCTACC
    GTTATCGAACGTCGTCGTCACAACCTGGGTCCGTCTCGTATCCCGGTTGCGGACCGTACC
    TTCAAAAACATCTCTCCGTCTTCTCTGGAATTCAAAGAACTGATCACCATCGTTTCTCGT
    TCTATCCGTCACTCTCCGGAATTCATCGCGAAAAAACGTGGTATCGGTTCTGAATACTTC
    TGCGCGTACTCTGACTGCAACTCTTCTCTGAACTCTGAAGCGAACGCGGCGGCGAACGTT
    GCGCAGAAATTCCAGAAACAGCTGTTCTTCGAACTGTAA
SEQ ID NO: 57
ATGAAACGTATCCTGAACTCTCTGAAAGTTGCGGCGCTGCGTCTGCTGTTCCGTGGTAAA
GGTTCTGAACTGGTTAAAACCGTTAAATACCCGCTGGTTTCTCCGGTTCAGGGTGCGGTT
GAAGAACTGGCGGAAGCGATCCGTCACGACAACCTGCACCTGTTCGGTCAGAAAGAAATC
GTTGACCTGATGGAAAAAGACGAAGGTACCCAGGTTTACTCTGTTGTTGACTTCTGGCTG
    GACACCCTGCGTCTGGGTATGTTCTTCTCTCCGTCTGCGAACGCGCTGAAAATCACCCTG
    GGTAAATTCAACTCTGACCAGGTTTCTCCGTTCCGTAAAGTTCTGGAACAGTCTCCGTTC
    TTCCTGGCGGGTCGTCTGAAAGTTGAACCGGCGGAACGTATCCTGTCTGTTGAAATCCGT
    AAAATCGGTAAACGTGAAAACCGTGTTGAAAACTACGCGGCGGACGTTGAAACCTGCTTC
    ATCGGTCAGCTGTCTTCTGACGAAAAACAGTCTATCCAGAAACTGGCGAACGACATCTGG
    GACTCTAAAGACCACGAAGAACAGCGTATGCTGAAAGCGGACTTCTTCGCGATCCCGCTG
    ATCAAAGACCCGAAAGCGGTTACCGAAGAAGACCCGGAAAACGAAACCGCGGGTAAACAG
    AAACCGCTGGAACTGTGCGTTTGCCTGGTTCCGGAACTGTACACCCGTGGTTTCGGTTCT
    ATCGCGGACTTCCTGGTTCAGCGTCTGACCCTGCTGCGTGACAAAATGTCTACCGACACC
    GCGGAAGACTGCCTGGAATACGTTGGTATCGAAGAAGAAAAAGGTAACGGTATGAACTCT
    CTGCTGGGTACCTTCCTGAAAAACCTGCAGGGTGACGGTTTCGAACAGATCTTCCAGTTC
    ATGCTGGGTTCTTACGTTGGTTGGCAGGGTAAAGAAGACGTTCTGCGTGAACGTCTGGAC
    CTGCTGGCGGAAAAAGTTAAACGTCTGCCGAAACCGAAATTCGCGGGTGAATGGTCTGGT
    CACCGTATGTTCCTGCACGGTCAGCTGAAATCTTGGTCTTCTAACTTCTTCCGTCTGTTC
    AACGAAACCCGTGAACTGCTGGAATCTATCAAATCTGACATCCAGCACGCGACCATGCTG
    ATCTCTTACGTTGAAGAAAAAGGTGGTTACCACCCGCAGCTGCTGTCTCAGTACCGTAAA
    CTGATGGAACAGCTGCCGGCGCTGCGTACCAAAGTTCTGGACCCGGAAATCGAAATGACC
    CACATGTCTGAAGCGGTTCGTTCTTACATCATGATCCACAAATCTGTTGCGGGTTTCCTG
    CCGGACCTGCTGGAATCTCTGGACCGTGACAAAGACCGTGAATTCCTGCTGTCTATCTTC
    CCGCGTATCCCGAAAATCGACAAAAAAACCAAAGAAATCGTTGCGTGGGAACTGCCGGGT
    GAACCGGAAGAAGGTTACCTGTTCACCGCGAACAACCTGTTCCGTAACTTCCTGGAAAAC
    CCGAAACACGTTCCGCGTTTCATGGCGGAACGTATCCCGGAAGACTGGACCCGTCTGCGT
    TCTGCGCCGGTTTGGTTCGACGGTATGGTTAAACAGTGGCAGAAAGTTGTTAACCAGCTG
    GTTGAATCTCCGGGTGCGCTGTACCAGTTCAACGAATCTTTCCTGCGTCAGCGTCTGCAG
    GCGATGCTGACCGTTTACAAACGTGACCTGCAGACCGAAAAATTCCTGAAACTGCTGGCG
    GACGTTTGCCGTCCGCTGGTTGACTTCTTCGGTCTGGGTGGTAACGACATCATCTTCAAA
    TCTTGCCAGGACCCGCGTAAACAGTGGCAGACCGTTATCCCGCTGTCTGTTCCGGCGGAC
    GTTTACACCGCGTGCGAAGGTCTGGCGATCCGTCTGCGTGAAACCCTGGGTTTCGAATGG
    AAAAACCTGAAAGGTCACGAACGTGAAGACTTCCTGCGTCTGCACCAGCTGCTGGGTAAC
    CTGCTGTTCTGGATCCGTGACGCGAAACTGGTTGTTAAACTGGAAGACTGGATGAACAAC
    CCGTGCGTTCAGGAATACGTTGAAGCGCGTAAAGCGATCGACCTGCCGCTGGAAATCTTC
    GGTTTCGAAGTTCCGATCTTCCTGAACGGTTACCTGTTCTCTGAACTGCGTCAGCTGGAA
    CTGCTGCTGCGTCGTAAATCTGTTATGACCTCTTACTCTGTTAAAACCACCGGTTCTCCG
    AACCGTCTGTTCCAGCTGGTTTACCTGCCGCTGAACCCGTCTGACCCGGAAAAAAAAAAC
    TCTAACAACTTCCAGGAACGTCTGGACACCCCGACCGGTCTGTCTCGTCGTTTCCTGGAC
    CTGACCCTGGACGCGTTCGCGGGTAAACTGCTGACCGACCCGGTTACCCAGGAACTGAAA
    ACCATGGCGGGTTTCTACGACCACCTGTTCGGTTTCAAACTGCCGTGCAAACTGGCGGCG
    ATGTCTAACCACCCGGGTTCTTCTTCTAAAATGGTTGTTCTGGCGAAACCGAAAAAAGGT
    GTTGCGTCTAACATCGGTTTCGAACCGATCCCGGACCCGGCGCACCCGGTTTTCCGTGTT
    CGTTCTTCTTGGCCGGAACTGAAATACCTGGAAGGTCTGCTGTACCTGCCGGAAGACACC
    CCGCTGACCATCGAACTGGCGGAAACCTCTGTTTCTTGCCAGTCTGTTTCTTCTGTTGCG
    TTCGACCTGAAAAACCTGACCACCATCCTGGGTCGTGTTGGTGAATTCCGTGTTACCGCG
    GACCAGCCGTTCAAACTGACCCCGATCATCCCGGAAAAAGAAGAATCTTTCATCGGTAAA
    ACCTACCTGGGTCTGGACGCGGGTGAACGTTCTGGTGTTGGTTTCGCGATCGTTACCGTT
    GACGGTGACGGTTACGAAGTTCAGCGTCTGGGTGTTCACGAAGACACCCAGCTGATGGCG
    CTGCAGCAGGTTGCGTCTAAATCTCTGAAAGAACCGGTTTTCCAGCCGCTGCGTAAAGGT
    ACCTTCCGTCAGCAGGAACGTATCCGTAAATCTCTGCGTGGTTGCTACTGGAACTTCTAC
    CACGCGCTGATGATCAAATACCGTGCGAAAGTTGTTCACGAAGAATCTGTTGGTTCTTCT
    GGTCTGGTTGGTCAGTGGCTGCGTGCGTTCCAGAAAGACCTGAAAAAAGCGGACGTTCTG
    CCGAAAAAAGGTGGTAAAAACGGTGTTGACAAAAAAAAACGTGAATCTTCTGCGCAGGAC
    ACCCTGTGGGGTGGTGCGTTCTCTAAAAAAGAAGAACAGCAGATCGCGTTCGAAGTTCAG
    GCGGCGGGTTCTTCTCAGTTCTGCCTGAAATGCGGTTGGTGGTTCCAGCTGGGTATGCGT
    GAAGTTAACCGTGTTCAGGAATCTGGTGTTGTTCTGGACTGGAACCGTTCTATCGTTACC
    TTCCTGATCGAATCTTCTGGTGAAAAAGTTTACGGTTTCTCTCCGCAGCAGCTGGAAAAA
    GGTTTCCGTCCGGACATCGAAACCTTCAAAAAAATGGTTCGTGACTTCATGCGTCCGCCG
    ATGTTCGACCGTAAAGGTCGTCCGGCGGCGGCGTACGAACGTTTCGTTCTGGGTCGTCGT
    CACCGTCGTTACCGTTTCGACAAAGTTTTCGAAGAACGTTTCGGTCGTTCTGCGCTGTTC
    ATCTGCCCGCGTGTTGGTTGCGGTAACTTCGACCACTCTTCTGAACAGTCTGCGGTTGTT
    CTGGCGCTGATCGGTTACATCGCGGACAAAGAAGGTATGTCTGGTAAAAAACTGGTTTAC
    GTTCGTCTGGCGGAACTGATGGCGGAATGGAAACTGAAAAAACTGGAACGTTCTCGTGTT
    GAAGAACAGTCTTCTGCGCAGTAA
SEQ ID NO: 58
ATGGCGGAATCTAAACAGATGCAGTGCCGTAAATGCGGTGCGTCTATGAAATACGAAGTT
ATCGGTCTGGGTAAAAAATCTTGCCGTTACATGTGCCCGGACTGCGGTAACCACACCTCT
GCGCGTAAAATCCAGAACAAAAAAAAACGTGACAAAAAATACGGTTCTGCGTCTAAAGCG
CAGTCTCAGCGTATCGCGGTTGCGGGTGCGCTGTACCCGGACAAAAAAGTTCAGACCATC
    AAAACCTACAAATACCCGGCGGACCTGAACGGTGAAGTTCACGACTCTGGTGTTGCGGAA
    AAAATCGCGCAGGCGATCCAGGAAGACGAAATCGGTCTGCTGGGTCCGTCTTCTGAATAC
    GCGTGCTGGATCGCGTCTCAGAAACAGTCTGAACCGTACTCTGTTGTTGACTTCTGGTTC
    GACGCGGTTTGCGCGGGTGGTGTTTTCGCGTACTCTGGTGCGCGTCTGCTGTCTACCGTT
    CTGCAGCTGTCTGGTGAAGAATCTGTTCTGCGTGCGGCGCTGGCGTCTTCTCCGTTCGTT
    GACGACATCAACCTGGCGCAGGCGGAAAAATTCCTGGCGGTTTCTCGTCGTACCGGTCAG
    GACAAACTGGGTAAACGTATCGGTGAATGCTTCGCGGAAGGTCGTCTGGAAGCGCTGGGT
    ATCAAAGACCGTATGCGTGAATTCGTTCAGGCGATCGACGTTGCGCAGACCGCGGGTCAG
    CGTTTCGCGGCGAAACTGAAAATCTTCGGTATCTCTCAGATGCCGGAAGCGAAACAGTGG
    AACAACGACTCTGGTCTGACCGTTTGCATCCTGCCGGACTACTACGTTCCGGAAGAAAAC
    CGTGCGGACCAGCTGGTTGTTCTGCTGCGTCGTCTGCGTGAAATCGCGTACTGCATGGGT
    ATCGAAGACGAAGCGGGTTTCGAACACCTGGGTATCGACCCGGGTGCGCTGTCTAACTTC
    TCTAACGGTAACCCGAAACGTGGTTTCCTGGGTCGTCTGCTGAACAACGACATCATCGCG
    CTGGCGAACAACATGTCTGCGATGACCCCGTACTGGGAAGGTCGTAAAGGTGAACTGATC
    GAACGTCTGGCGTGGCTGAAACACCGTGCGGAAGGTCTGTACCTGAAAGAACCGCACTTC
    GGTAACTCTTGGGCGGACCACCGTTCTCGTATCTTCTCTCGTATCGCGGGTTGGCTGTCT
    GGTTGCGCGGGTAAACTGAAAATCGCGAAAGACCAGATCTCTGGTGTTCGTACCGACCTG
    TTCCTGCTGAAACGTCTGCTGGACGCGGTTCCGCAGTCTGCGCCGTCTCCGGACTTCATC
    GCGTCTATCTCTGCGCTGGACCGTTTCCTGGAAGCGGCGGAATCTTCTCAGGACCCGGCG
    GAACAGGTTCGTGCGCTGTACGCGTTCCACCTGAACGCGCCGGCGGTTCGTTCTATCGCG
    AACAAAGCGGTTCAGCGTTCTGACTCTCAGGAATGGCTGATCAAAGAACTGGACGCGGTT
    GACCACCTGGAATTCAACAAAGCGTTCCCGTTCTTCTCTGACACCGGTAAAAAAAAAAAA
    AAAGGTGCGAACTCTAACGGTGCGCCGTCTGAAGAAGAATACACCGAAACCGAATCTATC
    CAGCAGCCGGAAGACGCGGAACAGGAAGTTAACGGTCAGGAAGGTAACGGTGCGTCTAAA
    AACCAGAAAAAATTCCAGCGTATCCCGCGTTTCTTCGGTGAAGGTTCTCGTTCTGAATAC
    CGTATCCTGACCGAAGCGCCGCAGTACTTCGACATGTTCTGCAACAACATGCGTGCGATC
    TTCATGCAGCTGGAATCTCAGCCGCGTAAAGCGCCGCGTGACTTCAAATGCTTCCTGCAG
    AACCGTCTGCAGAAACTGTACAAACAGACCTTCCTGAACGCGCGTTCTAACAAATGCCGT
    GCGCTGCTGGAATCTGTTCTGATCTCTTGGGGTGAATTCTACACCTACGGTGCGAACGAA
    AAAAAATTCCGTCTGCGTCACGAAGCGTCTGAACGTTCTTCTGACCCGGACTACGTTGTT
    CAGCAGGCGCTGGAAATCGCGCGTCGTCTGTTCCTGTTCGGTTTCGAATGGCGTGACTGC
    TCTGCGGGTGAACGTGTTGACCTGGTTGAAATCCACAAAAAAGCGATCTCTTTCCTGCTG
    GCGATCACCCAGGCGGAAGTTTCTGTTGGTTCTTACAACTGGCTGGGTAACTCTACCGTT
    TCTCGTTACCTGTCTGTTGCGGGTACCGACACCCTGTACGGTACCCAGCTGGAAGAATTC
    CTGAACGCGACCGTTCTGTCTCAGATGCGTGGTCTGGCGATCCGTCTGTCTTCTCAGGAA
    CTGAAAGACGGTTTCGACGTTCAGCTGGAATCTTCTTGCCAGGACAACCTGCAGCACCTG
    CTGGTTTACCGTGCGTCTCGTGACCTGGCGGCGTGCAAACGTGCGACCTGCCCGGCGGAA
    CTGGACCCGAAAATCCTGGTTCTGCCGGTTGGTGCGTTCATCGCGTCTGTTATGAAAATG
    ATCGAACGTGGTGACGAACCGCTGGCGGGTGCGTACCTGCGTCACCGTCCGCACTCTTTC
    GGTTGGCAGATCCGTGTTCGTGGTGTTGCGGAAGTTGGTATGGACCAGGGTACCGCGCTG
    GCGTTCCAGAAACCGACCGAATCTGAACCGTTCAAAATCAAACCGTTCTCTGCGCAGTAC
    GGTCCGGTTCTGTGGCTGAACTCTTCTTCTTACTCTCAGTCTCAGTACCTGGACGGTTTC
    CTGTCTCAGCCGAAAAACTGGTCTATGCGTGTTCTGCCGCAGGCGGGTTCTGTTCGTGTT
    GAACAGCGTGTTGCGCTGATCTGGAACCTGCAGGCGGGTAAAATGCGTCTGGAACGTTCT
    GGTGCGCGTGCGTTCTTCATGCCGGTTCCGTTCTCTTTCCGTCCGTCTGGTTCTGGTGAC
    GAAGCGGTTCTGGCGCCGAACCGTTACCTGGGTCTGTTCCCGCACTCTGGTGGTATCGAA
    TACGCGGTTGTTGACGTTCTGGACTCTGCGGGTTTCAAAATCCTGGAACGTGGTACCATC
    GCGGTTAACGGTTTCTCTCAGAAACGTGGTGAACGTCAGGAAGAAGCGCACCGTGAAAAA
    CAGCGTCGTGGTATCTCTGACATCGGTCGTAAAAAACCGGTTCAGGCGGAAGTTGACGCG
    GCGAACGAACTGCACCGTAAATACACCGACGTTGCGACCCGTCTGGGTTGCCGTATCGTT
    GTTCAGTGGGCGCCGCAGCCGAAACCGGGTACCGCGCCGACCGCGCAGACCGTTTACGCG
    CGTGCGGTTCGTACCGAAGCGCCGCGTTCTGGTAACCAGGAAGACCACGCGCGTATGAAA
    TCTTCTTGGGGTTACACCTGGGGTACCTACTGGGAAAAACGTAAACCGGAAGACATCCTG
    GGTATCTCTACCCAGGTTTACTGGACCGGTGGTATCGGTGAATCTTGCCCGGCGGTTGCG
    GTTGCGCTGCTGGGTCACATCCGTGCGACCTCTACCCAGACCGAATGGGAAAAAGAAGAA
    GTTGTTTTCGGTCGTCTGAAAAAATTCTTCCCGTCTTAA
SEQ ID NO: 59
ATGGAAAAACGTATCAACAAAATCCGTAAAAAACTGTCTGCGGACAACGCGACCAAACCG
GTTTCTCGTTCTGGTCCGATGAAAACCCTGCTGGTTCGTGTTATGACCGACGACCTGAAA
AAACGTCTGGAAAAACGTCGTAAAAAACCGGAAGTTATGCCGCAGGTTATCTCTAACAAC
GCGGCGAACAACCTGCGTATGCTGCTGGACGACTACACCAAAATGAAAGAAGCGATCCTG
    CAGGTTTACTGGCAGGAATTCAAAGACGACCACGTTGGTCTGATGTGCAAATTCGCGCAG
    CCGGCGTCTAAAAAAATCGACCAGAACAAACTGAAACCGGAAATGGACGAAAAAGGTAAC
    CTGACCACCGCGGGTTTCGCGTGCTCTCAGTGCGGTCAGCCGCTGTTCGTTTACAAACTG
    GAACAGGTTTCTGAAAAAGGTAAAGCGTACACCAACTACTTCGGTCGTTGCAACGTTGCG
    GAACACGAAAAACTGATCCTGCTGGCGCAGCTGAAACCGGAAAAAGACTCTGACGAAGCG
    GTTACCTACTCTCTGGGTAAATTCGGTCAGCGTGCGCTGGACTTCTACTCTATCCACGTT
    ACCAAAGAATCTACCCACCCGGTTAAACCGCTGGCGCAGATCGCGGGTAACCGTTACGCG
    TCTGGTCCGGTTGGTAAAGCGCTGTCTGACGCGTGCATGGGTACCATCGCGTCTTTCCTG
    TCTAAATACCAGGACATCATCATCGAACACCAGAAAGTTGTTAAAGGTAACCAGAAACGT
    CTGGAATCTCTGCGTGAACTGGCGGGTAAAGAAAACCTGGAATACCCGTCTGTTACCCTG
    CCGCCGCAGCCGCACACCAAAGAAGGTGTTGACGCGTACAACGAAGTTATCGCGCGTGTT
    CGTATGTGGGTTAACCTGAACCTGTGGCAGAAACTGAAACTGTCTCGTGACGACGCGAAA
    CCGCTGCTGCGTCTGAAAGGTTTCCCGTCTTTCCCGGTTGTTGAACGTCGTGAAAACGAA
    GTTGACTGGTGGAACACCATCAACGAAGTTAAAAAACTGATCGACGCGAAACGTGACATG
    GGTCGTGTTTTCTGGTCTGGTGTTACCGCGGAAAAACGTAACACCATCCTGGAAGGTTAC
    AACTACCTGCCGAACGAAAACGACCACAAAAAACGTGAAGGTTCTCTGGAAAACCCGAAA
    AAACCGGCGAAACGTCAGTTCGGTGACCTGCTGCTGTACCTGGAAAAAAAATACGCGGGT
    GACTGGGGTAAAGTTTTCGACGAAGCGTGGGAACGTATCGACAAAAAAATCGCGGGTCTG
    ACCTCTCACATCGAACGTGAAGAAGCGCGTAACGCGGAAGACGCGCAGTCTAAAGCGGTT
    CTGACCGACTGGCTGCGTGCGAAAGCGTCTTTCGTTCTGGAACGTCTGAAAGAAATGGAC
    GAAAAAGAATTCTACGCGTGCGAAATCCAGCTGCAGAAATGGTACGGTGACCTGCGTGGT
    AACCCGTTCGCGGTTGAAGCGGAAAACCGTGTTGTTGACATCTCTGGTTTCTCTATCGGT
    TCTGACGGTCACTCTATCCAGTACCGTAACCTGCTGGCGTGGAAATACCTGGAAAACGGT
    AAACGTGAATTCTACCTGCTGATGAACTACGGTAAAAAAGGTCGTATCCGTTTCACCGAC
    GGTACCGACATCAAAAAATCTGGTAAATGGCAGGGTCTGCTGTACGGTGGTGGTAAAGCG
    AAAGTTATCGACCTGACCTTCGACCCGGACGACGAACAGCTGATCATCCTGCCGCTGGCG
    TTCGGTACCCGTCAGGGTCGTGAATTCATCTGGAACGACCTGCTGTCTCTGGAAACCGGT
    CTGATCAAACTGGCGAACGGTCGTGTTATCGAAAAAACCATCTACAACAAAAAAATCGGT
    CGTGACGAACCGGCGCTGTTCGTTGCGCTGACCTTCGAACGTCGTGAAGTTGTTGACCCG
    TCTAACATCAAACCGGTTAACCTGATCGGTGTTGACCGTGGTGAAAACATCCCGGCGGTT
    ATCGCGCTGACCGACCCGGAAGGTTGCCCGCTGCCGGAATTCAAAGACTCTTCTGGTGGT
    CCGACCGACATCCTGCGTATCGGTGAAGGTTACAAAGAAAAACAGCGTGCGATCCAGGCG
    GCGAAAGAAGTTGAACAGCGTCGTGCGGGTGGTTACTCTCGTAAATTCGCGTCTAAATCT
    CGTAACCTGGCGGACGACATGGTTCGTAACTCTGCGCGTGACCTGTTCTACCACGCGGTT
    ACCCACGACGCGGTTCTGGTTTTCGAAAACCTGTCTCGTGGTTTCGGTCGTCAGGGTAAA
    CGTACCTTCATGACCGAACGTCAGTACACCAAAATGGAAGACTGGCTGACCGCGAAACTG
    GCGTACGAAGGTCTGACCTCTAAAACCTACCTGTCTAAAACCCTGGCGCAGTACACCTCT
    AAAACCTGCTCTAACTGCGGTTTCACCATCACCACCGCGGACTACGACGGTATGCTGGTT
    CGTCTGAAAAAAACCTCTGACGGTTGGGCGACCACCCTGAACAACAAAGAACTGAAAGCG
    GAAGGTCAGATCACCTACTACAACCGTTACAAACGTCAGACCGTTGAAAAAGAACTGTCT
    GCGGAACTGGACCGTCTGTCTGAAGAATCTGGTAACAACGACATCTCTAAATGGACCAAA
    GGTCGTCGTGACGAAGCGCTGTTCCTGCTGAAAAAACGTTTCTCTCACCGTCCGGTTCAG
    GAACAGTTCGTTTGCCTGGACTGCGGTCACGAAGTTCACGCGGACGAACAGGCGGCGCTG
    AACATCGCGCGTTCTTGGCTGTTCCTGAACTCTAACTCTACCGAATTCAAATCTTACAAA
    TCTGGTAAACAGCCGTTCGTTGGTGCGTGGCAGGCGTTCTACAAACGTCGTCTGAAAGAA
    GTTTGGAAACCGAACGCG
    SEQ ID NO: 60
    ATGAAACGTATCAACAAAATCCGTCGTCGTCTGGTTAAAGACTCTAACACCAAAAAAGCG
    GGTAAAACCGGTCCGATGAAAACCCTGCTGGTTCGTGTTATGACCCCGGACCTGCGTGAA
    CGTCTGGAAAACCTGCGTAAAAAACCGGAAAACATCCCGCAGCCGATCTCTAACACCTCT
    CGTGCGAACCTGAACAAACTGCTGACCGACTACACCGAAATGAAAAAAGCGATCCTGCAC
    GTTTACTGGGAAGAATTCCAGAAAGACCCGGTTGGTCTGATGTCTCGTGTTGCGCAGCCG
    GCGCCGAAAAACATCGACCAGCGTAAACTGATCCCGGTTAAAGACGGTAACGAACGTCTG
    ACCTCTTCTGGTTTCGCGTGCTCTCAGTGCTGCCAGCCGCTGTACGTTTACAAACTGGAA
    CAGGTTAACGACAAAGGTAAACCGCACACCAACTACTTCGGTCGTTGCAACGTTTCTGAA
    CACGAACGTCTGATCCTGCTGTCTCCGCACAAACCGGAAGCGAACGACGAACTGGTTACC
    TACTCTCTGGGTAAATTCGGTCAGCGTGCGCTGGACTTCTACTCTATCCACGTTACCCGT
    GAATCTAACCACCCGGTTAAACCGCTGGAACAGATCGGTGGTAACTCTTGCGCGTCTGGT
    CCGGTTGGTAAAGCGCTGTCTGACGCGTGCATGGGTGCGGTTGCGTCTTTCCTGACCAAA
    TACCAGGACATCATCCTGGAACACCAGAAAGTTATCAAAAAAAACGAAAAACGTCTGGCG
    AACCTGAAAGACATCGCGTCTGCGAACGGTCTGGCGTTCCCGAAAATCACCCTGCCGCCG
    CAGCCGCACACCAAAGAAGGTATCGAAGCGTACAACAACGTTGTTGCGCAGATCGTTATC
    TGGGTTAACCTGAACCTGTGGCAGAAACTGAAAATCGGTCGTGACGAAGCGAAACCGCTG
    CAGCGTCTGAAAGGTTTCCCGTCTTTCCCGCTGGTTGAACGTCAGGCGAACGAAGTTGAC
    TGGTGGGACATGGTTTGCAACGTTAAAAAACTGATCAACGAAAAAAAAGAAGACGGTAAA
    GTTTTCTGGCAGAACCTGGCGGGTTACAAACGTCAGGAAGCGCTGCTGCCGTACCTGTCT
    TCTGAAGAAGACCGTAAAAAAGGTAAAAAATTCGCGCGTTACCAGTTCGGTGACCTGCTG
    CTGCACCTGGAAAAAAAACACGGTGAAGACTGGGGTAAAGTTTACGACGAAGCGTGGGAA
    CGTATCGACAAAAAAGTTGAAGGTCTGTCTAAACACATCAAACTGGAAGAAGAACGTCGT
    TCTGAAGACGCGCAGTCTAAAGCGGCGCTGACCGACTGGCTGCGTGCGAAAGCGTCTTTC
    GTTATCGAAGGTCTGAAAGAAGCGGACAAAGACGAATTCTGCCGTTGCGAACTGAAACTG
    CAGAAATGGTACGGTGACCTGCGTGGTAAACCGTTCGCGATCGAAGCGGAAAACTCTATC
    CTGGACATCTCTGGTTTCTCTAAACAGTACAACTGCGCGTTCATCTGGCAGAAAGACGGT
    GTTAAAAAACTGAACCTGTACCTGATCATCAACTACTTCAAAGGTGGTAAACTGCGTTTC
    AAAAAAATCAAACCGGAAGCGTTCGAAGCGAACCGTTTCTACACCGTTATCAACAAAAAA
    TCTGGTGAAATCGTTCCGATGGAAGTTAACTTCAACTTCGACGACCCGAACCTGATCATC
    CTGCCGCTGGCGTTCGGTAAACGTCAGGGTCGTGAATTCATCTGGAACGACCTGCTGTCT
    CTGGAAACCGGTTCTCTGAAACTGGCGAACGGTCGTGTTATCGAAAAAACCCTGTACAAC
    CGTCGTACCCGTCAGGACGAACCGGCGCTGTTCGTTGCGCTGACCTTCGAACGTCGTGAA
    GTTCTGGACTCTTCTAACATCAAACCGATGAACCTGATCGGTATCGACCGTGGTGAAAAC
    ATCCCGGCGGTTATCGCGCTGACCGACCCGGAAGGTTGCCCGCTGTCTCGTTTCAAAGAC
    TCTCTGGGTAACCCGACCCACATCCTGCGTATCGGTGAATCTTACAAAGAAAAACAGCGT
    ACCATCCAGGCGGCGAAAGAAGTTGAACAGCGTCGTGCGGGTGGTTACTCTCGTAAATAC
    GCGTCTAAAGCGAAAAACCTGGCGGACGACATGGTTCGTAACACCGCGCGTGACCTGCTG
    TACTACGCGGTTACCCAGGACGCGATGCTGATCTTCGAAAACCTGTCTCGTGGTTTCGGT
    CGTCAGGGTAAACGTACCTTCATGGCGGAACGTCAGTACACCCGTATGGAAGACTGGCTG
    ACCGCGAAACTGGCGTACGAAGGTCTGCCGTCTAAAACCTACCTGTCTAAAACCCTGGCG
    CAGTACACCTCTAAAACCTGCTCTAACTGCGGTTTCACCATCACCTCTGCGGACTACGAC
    CGTGTTCTGGAAAAACTGAAAAAAACCGCGACCGGTTGGATGACCACCATCAACGGTAAA
    GAACTGAAAGTTGAAGGTCAGATCACCTACTACAACCGTTACAAACGTCAGAACGTTGTT
    AAAGACCTGTCTGTTGAACTGGACCGTCTGTCTGAAGAATCTGTTAACAACGACATCTCT
    TCTTGGACCAAAGGTCGTTCTGGTGAAGCGCTGTCTCTGCTGAAAAAACGTTTCTCTCAC
    CGTCCGGTTCAGGAAAAATTCGTTTGCCTGAACTGCGGTTTCGAAACCCACGCGGACGAA
    CAGGCGGCGCTGAACATCGCGCGTTCTTGGCTGTTCCTGCGTTCTCAGGAATACAAAAAA
    TACCAGACCAACAAAACCACCGGTAACACCGACAAACGTGCGTTCGTTGAAACCTGGCAG
    TCTTTCTACCGTAAAAAACTGAAAGAAGTTTGGAAACCG
    SEQ ID NO: 61
    AAAATTCcatGCAAAATGCTCCGGTTTCATGTCATCAAAATGATGACGTAATTAAGCATT
    GATAATTGAGATCCCTCTCCCTGACAGGATGATTACATAAATAATAGTGACAAAAATAAA
    TTATTTATTTATCCAGAAAATGAATTGGAAAATCAGGAGAGCGTTTTCAATCCTACCTCT
    GGCGCAGTTGATATGTcaaaCAGGTtgccgtcactgcgtcttttactggctcttctcgct
    aaccaaaccggtaaccccgcttattaaaagcattctgtaacaaagcgggaccaaagccat
    gacaaaaacgcgtaacaaaagtgtctataatcacggcagaaaagtccacattgattattt
    gcacggcgtcacactttgctatgccatagcatttttatccataagattagcggatcctac
    ctgacgctttttatcgcaactctctactgtttctccatacccgtttttttgggctagcac
    cgcctatctcgtgtgagataggcggagatacgaactttaagAAGGAGatataccATGGGT
    AAAATGTATTACCTTGGTTTAGACATTGGCACGAATTCCGTGGGCTACGCGGTGACCGAC
    CCCTCATACCACCTGCTGAAGTTTAAGGGGGAACCAATGTGGGGTGCGCACGTATTTGCC
    GCCGGTAATCAGAGCGCGGAACGACGCTCGTTCCGCACATCGCGTCGTCGTTTGGACCGA
    CGCCAACAGCGCGTTAAACTGGTACAGGAGATTTTTGCCCCGGTGATTAGTCCGATCGAC
    CCACGCTTCTTCATTCGTCTGCATGAATCCGCCCTGTGGCGCGATGACGTCGCGGAGACG
    GATAAACATATCTTTTTCAATGATCCTACCTATACCGATAAGGAATATTATAGCGATTAC
    CCGACTATCCATCACCTGATCGTTGATCTGATGGAAAGCTCTGAGAAACACGATCCGCGG
    CTGGTGTACCTTGCAGTGGCGTGGTTAGTGGCACACCGTGGTCATTTTCTGAACGAGGTG
    GACAAGGATAATATTGGAGATGTGTTGTCGTTCGACGCATTTTATCCGGAGTTTCTCGCG
    TTCCTGTCGGACAACGGTGTATCACCGTGGGTGTGCGAAAGCAAAGCGCTGCAGGCGACC
    TTGCTGAGCCGTAACTCAGTGAACGACAAATATAAAGCCCTTAAGTCTCTGATCTTCGGA
    TCCCAGAAACCTGAAGATAACTTCGATGCCAATATTTCGGAAGATGGACTCATTCAACTG
    CTGGCCGGCAAAAAGGTAAAAGTTAACAAACTGTTCCCTCAGGAATCGAACGATGCATCC
    TTCACATTGAATGATAAAGAAGACGCGATAGAAGAAATCCTGGGTACGCTTACACCAGAT
    GAATGTGAATGGATTGCGCATATACGCCGCCTTTTTGACTGGGCTATCATGAAACATGCT
    CTGAAAGATGGCAGGACTATTAGCGAGTCAAAAGTCAAACTGTATGAGCAGCACCATCAC
    GATCTGACCCAACTTAAATACTTCGTGAAAACCTACCTTGCAAAAGAATACGACGATATT
    TTCCGCAACGTGGATAGCGAAACAACGAAAAACTATGTAGCGTATTCCTATCATGTGAAA
    GAGGTGAAAGGCACTCTGCCTAAAAATAAGGCAACGCAAGAAGAGTTTTGTAAGTATGTC
    CTGGGCAAGGTTAAAAACATTGAATGCTCTGAAGCAGACAAGGTTGACTTTGATGAGATG
    ATTCAGCGTCTTACCGACAACTCTTTTATGCCTAAGCAGGTTTCGGGCGAAAACCGCGTT
    ATTCCTTATCAGTTATATTATTATGAACTGAAGACAATTCTGAATAAAGCAGCCTCGTAC
    CTGCCTTTCCTGACGCAGTGTGGAAAAGATGCAATTTCGAACCAGGACAAACTACTGTCG
    ATCATGACGTTCCGTATTCCTTACTTCGTCGGACCCTTGCGAAAAGATAATTCGGAACAT
    GCATGGCTCGAACGAAAGGCCGGTAAGATTTATCCGTGGAACTTTAACGACAAAGTGGAC
    TTGGATAAATCAGAAGAAGCGTTCATTCGCCGAATGACCAATACCTGTACCTATTATCCC
    GGCGAAGATGTTTTACCGTTGGATTCGCTGATCTATGAGAAATTTATGATTTTAAATGAA
    ATCAATAATATTCGTATTGACGGCTACCCGATTAGTGTTGACGTTAAACAGCAGGTTTTT
    GGCTTGTTCGAAAAAAAACGACGCGTAACCGTGAAAGATATTCAGAACCTGCTGCTGTCT
    CTCGGAGCTCTGGACAAACACGGGAAGCTGACAGGCATCGATACCACTATCCACTCAAAC
    TATAATACGTATCACCATTTTAAATCTCTCATGGAACGCGGCGTCCTGACCCGGGATGAC
    GTGGAACGCATCGTTGAAAGGATGACCTACAGCGACGATACTAAGCGTGTGCGTCTGTGG
    CTGAATAACAACTATGGTACTTTAACCGCCGACGATGTGAAACACATTTCGCGTCTGCGC
    AAACACGATTTTGGCCGTTTATCCAAAATGTTCTTAACAGGTCTGAAGGGTGTCCATAAG
    GAGACCGGTGAACGTGCCTCCATACTGGATTTCATGTGGAACACGAACGATAACCTGATG
    CAGCTCCTTTCCGAATGCTACACGTTCAGTGATGAAATCACAAAGCTGCAAGAGGCGTAT
    TATGCAAAAGCCCAGTTGTCTTTAAACGATTTTTTAGACTCGATGTACATCTCTAACGCG
    GTGAAACGTCCGATTTACAGAACTCTGGCAGTGGTGAACGATATTCGAAAAGCATGTGGG
    ACGGCCCCTAAACGCATTTTCATCGAAATGGCTCGTGATGGTGAATCAAAAAAAAAGAGA
    AGTGTTACACGTCGCGAGCAGATCAAAAACCTGTACCGCTCGATTCGTAAAGATTTCCAG
    CAGGAAGTTGATTTTCTGGAAAAGATCCTGGAAAATAAATCTGATGGTCAACTTCAGTCA
    GATGCTTTGTATCTTTACTTTGCACAATTAGGGCGCGATATGTACACGGGCGATCCAATA
    AAGCTGGAGCACATCAAAGATCAGAGTTTCTATAACATAGACCATATTTACCCGCAGTCT
    ATGGTGAAAGACGATTCCCTAGATAACAAAGTGCTGGTGCAAAGCGAAATTAACGGCGAG
    AAAAGCTCGCGATACCCTTTGGACGCCGCGATCCGCAATAAAATGAAGCCCCTTTGGGAC
    GCTTACTATAATCATGGCCTGATCTCCTTAAAGAAATACCAGCGTCTAACGCGCTCGACC
    CCGTTTACCGATGATGAAAAATGGGACTTTATTAATCGCCAGTTAGTGGAAACCCGTCAA
    TCTACCAAAGCGCTGGCCATTTTGTTGAAGCGTAAGTTTCCAGACACCGAAATTGTGTAT
    TCGAAGGCGGGGTTATCGTCCGACTTCAGACATGAATTCGGCCTTGTAAAAAGTCGCAAT
    ATTAATGATTTGCACCACGCTAAAGACGCATTCTTGGCTATCGTTACCGGCAATGTGTAC
    CATGAAAGATTCAATCGCAGATGGTTTATGGTGAACCAGCCGTACTCAGTTAAAACTAAA
    ACTCTTTTTACCCACAGCATAAAGAATGGCAACTTCGTTGCCTGGAACGGCGAAGAAGAT
    CTCGGTCGTATTGTAAAAATGCTGAAGCAAAACAAAAATACCATTCACTTCACGCGCTTC
    TCCTTCGATCGCAAAGAAGGATTATTTGATATCCAACCTCTGAAAGCCAGCACCGGCTTA
    GTCCCACGAAAAGCCGGTCTGGATGTCGTTAAATACGGCGGATATGACAAATCTACCGCG
    GCCTATTACCTGCTGGTGAGGTTCACGCTCGAGGACAAGAAAACCCAGCACAAGCTGATG
    ATGATTCCTGTAGAAGGCCTGTACAAGGCTCGCATTGATCATGACAAGGAATTTCTTACC
    GATTATGCGCAAACGACTATAAGCGAAATCCTACAGAAAGATAAACAGAAAGTGATCAAT
    ATTATGTTTCCAATGGGTACGAGGCATATAAAACTCAATTCAATGATTAGTATCGATGGC
    TTCTATCTTAGTATCGGCGGAAAGTCCTCTAAAGGTAAGTCAGTTCTATGTCACGCAATG
    GTTCCACTGATCGTCCCTCACAAAATCGAATGTTACATTAAAGCAATGGAAAGCTTCGCC
    CGGAAGTTTAAAGAAAACAACAAGCTGCGCATCGTAGAAAAATTCGATAAAATCACCGTT
    GAAGACAACCTGAATCTCTACGAGCTCTTTCTCCAAAAACTGCAGCATAATCCCTATAAT
    AAGTTTTTTTCGACACAGTTTGACGTACTGACGAACGGCCGTTCTACTTTCACAAAACTG
    TCGCCGGAGGAACAGGTACAGACGCTCTTGAACATTTTAAGTATCTTTAAAACATGCCGC
    AGTTCGGGTTGCGACCTGAAATCCATCAACGGCAGTGCCCAGGCAGCGCGCATCATGATT
    AGCGCTGACTTAACTGGACTGTCGAAAAAATATTCAGATATTAGGTTGGTTGAACAGTCA
    GCTTCTGGTTTGTTCGTATCCAAAAGTCAGAACTTACTGGAGTATCTCTAAGAAATCATC
    CTTAGCGAAAGCTAAGGATTTTTTTTATCTGAAATTTATTATATCGCGTTGATTATTGAT
    GCTGTTTTTAGTTTTAACGGCAATTAATATATGTGTTATTAATTGAATGAATTTTATCAT
    TCATAATAAGTATGTGTAGGATCAAGCTCAGGTTAAATATTCACTCAGGAAGTTATTACT
    CAGGAAGCAAAGAGGATTACAGAATTATCTCATAACAAGTGTTAAGGGATGTTATTTCC
    SEQ ID NO: 62
    AAAATTCcatGCAAAATGCTCCGGTTTCATGTCATCAAAATGATGACGTAATTAAGCATT
    GATAATTGAGATCCCTCTCCCTGACAGGATGATTACATAAATAATAGTGACAAAAATAAA
    TTATTTATTTATCCAGAAAATGAATTGGAAAATCAGGAGAGCGTTTTCAATCCTACCTCT
    GGCGCAGTTGATATGTcaaaCAGGTtgccgtcactgcgtcttttactggctcttctcgct
    aaccaaaccggtaaccccgcttattaaaagcattctgtaacaaagcgggaccaaagccat
    gacaaaaacgcgtaacaaaagtgtctataatcacggcagaaaagtccacattgattattt
    gcacggcgtcacactttgctatgccatagcatttttatccataagattagcggatcctac
    ctgacgctttttatcgcaactctctactgtttctccatacccgtttttttgggctagcac
    cgcctatctcgtgtgagataggcggagatacgaactttaagAAGGAGatataccATGTCA
    TCGCTCACGAAATTCACTAACAAATACTCTAAACAGCTCACCATTAAGAATGAACTCATC
    CCAGTTGGCAAAACACTGGAGAACATCAAAGAGAATGGTCTGATAGATGGCGACGAACAG
    CTGAATGAGAATTATCAGAAGGCGAAAATTATTGTGGATGATTTTCTGCGGGACTTCATT
    AATAAAGCACTGAATAATACGCAGATCGGGAACTGGCGCGAACTGGCGGATGCCCTTAAT
    AAAGAGGATGAAGATAACATCGAGAAATTGCAGGATAAAATTCGGGGAATCATTGTATCC
    AAATTTGAAACGTTTGATCTGTTTAGCAGCTATTCTATTAAGAAAGATGAAAAGATTATT
    GACGACGACAATGATGTTGAAGAAGAGGAACTGGATCTGGGCAAGAAGACCAGCTCATTT
    AAATACATATTTAAAAAAAACCTGTTTAAGTTAGTGTTGCCATCCTACCTGAAAACCACA
    AACCAGGACAAGCTGAAGATTATTAGCTCGTTTGATAATTTTTCAACGTACTTCCGCGGG
    TTCTTTGAAAACCGGAAAAACATTTTTACCAAGAAACCGATCTCCACAAGTATTGCGTAT
    CGCATTGTTCATGATAACTTCCCGAAATTCCTTGATAACATTCGTTGTTTTAATGTGTGG
    CAGACGGAATGCCCGCAACTAATCGTGAAAGCAGATAACTATCTGAAAAGCAAAAATGTT
    ATAGCGAAAGATAAAAGTTTGGCAAACTATTTTACCGTGGGCGCGTATGACTATTTCCTG
    TCTCAGAATGGTATAGATTTTTACAACAATATTATAGGTGGACTGCCAGCGTTCGCCGGC
    CATGAGAAAATCCAAGGTCTCAATGAATTCATCAATCAAGAGTGCCAAAAAGACAGCGAG
    CTGAAAAGTAAGCTGAAAAACCGTCACGCGTTCAAAATGGCGGTACTGTTCAAACAGATA
    CTCAGCGATCGTGAAAAAAGTTTTGTAATTGATGAGTTCGAGTCGGATGCTCAAGTTATT
    GACGCCGTTAAAAACTTTTACGCCGAACAGTGCAAAGATAACAATGTTATTTTTAACTTA
    TTAAATCTTATCAAGAATATCGCTTTCTTAAGTGATGACGAACTGGACGGCATATTCATT
    GAAGGGAAATACCTGTCGAGCGTTAGTCAAAAACTCTATAGCGATTGGTCAAAATTACGT
    AACGACATTGAGGATTCGGCTAACTCTAAACAAGGCAATAAAGAGCTGGCCAAGAAGATC
    AAAACCAACAAAGGGGATGTAGAAAAAGCGATCTCGAAATATGAGTTCTCGCTGTCGGAA
    CTGAACTCGATTGTACATGATAACACCAAGTTTTCTGACCTCCTTAGTTGTACACTGCAT
    AAGGTGGCTTCTGAGAAACTGGTGAAGGTCAATGAAGGCGACTGGCCGAAACATCTCAAG
    AATAATGAAGAGAAACAAAAAATCAAAGAGCCGCTTGATGCTCTGCTGGAGATCTATAAT
    ACACTTCTGATTTTTAACTGCAAAAGCTTCAATAAAAACGGCAACTTCTATGTCGACTAT
    GATCGTTGCATCAATGAACTGAGTTCGGTCGTGTATCTGTATAATAAAACACGTAACTAT
    TGCACTAAAAAACCCTATAACACGGACAAGTTCAAACTCAATTTTAACAGTCCGCAGCTC
    GGTGAAGGCTTTTCCAAGTCGAAAGAAAATGACTGTCTGACTCTTTTGTTTAAAAAAGAC
    GACAACTATTATGTAGGCATTATCCGCAAAGGTGCAAAAATCAATTTTGATGATACACAA
    GCAATCGCCGATAACACCGACAATTGCATCTTTAAAATGAATTATTTCCTACTTAAAGAC
    GCAAAAAAATTTATCCCGAAATGTAGCATTCAGCTGAAAGAAGTCAAGGCCCATTTTAAG
    AAATCTGAAGATGATTACATTTTGTCTGATAAAGAGAAATTTGCTAGCCCGCTGGTCATT
    AAAAAGAGCACATTTTTGCTGGCAACTGCACATGTGAAAGGGAAAAAAGGCAATATCAAG
    AAATTTCAGAAAGAATATTCGAAAGAAAACCCCACTGAGTATCGCAATTCTTTAAACGAA
    TGGATTGCTTTTTGTAAAGAGTTCTTAAAAACTTATAAAGCGGCTACCATTTTTGATATA
    ACCACATTGAAAAAGGCAGAGGAATATGCTGATATTGTAGAATTCTACAAGGATGTCGAT
    AATCTGTGCTACAAACTGGAGTTCTGCCCGATTAAAACCTCGTTTATAGAAAACCTGATA
    GATAACGGCGACCTGTATCTGTTTCGCATCAATAACAAAGACTTCAGCAGTAAATCGACC
    GGCACCAAGAACCTTCATACGTTATATTTACAAGCTATATTCGATGAACGTAATCTGAAC
    AATCCGACAATTATGCTGAATGGGGGAGCAGAACTGTTCTATCGTAAAGAAAGTATTGAG
    CAGAAAAACCGTATCACACACAAAGCCGGTTCAATTCTCGTGAATAAGGTGTGTAAAGAC
    GGTACAAGCCTGGATGATAAGATACGTAATGAAATTTATCAATATGAGAATAAATTTATT
    GATACCCTGTCTGATGAAGCTAAAAAGGTGTTACCGAATGTCATTAAAAAGGAAGCTACC
    CATGACATTACAAAAGATAAACGTTTCACTAGTGACAAATTCTTCTTTCACTGCCCCCTG
    ACAATTAATTATAAGGAAGGCGATACCAAGCAGTTCAATAACGAAGTGCTGAGTTTTCTG
    CGTGGAAATCCTGACATCAACATTATCGGCATTGACCGCGGAGAGCGTAATTTAATCTAT
    GTAACGGTTATAAACCAGAAAGGCGAGATTCTGGATTCGGTTTCATTCAATACCGTGACC
    AACAAGAGTTCAAAAATCGAGCAGACAGTCGATTATGAAGAGAAATTGGCAGTCCGCGAG
    AAAGAGAGGATTGAAGCAAAACGTTCCTGGGACTCTATCTCAAAAATTGCGACACTAAAG
    GAAGGTTATCTGAGCGCAATAGTTCACGAGATCTGTCTGTTAATGATTAAACACAACGCG
    ATCGTTGTCTTAGAGAATCTTAATGCAGGCTTTAAGCGTATTCGTGGCGGTTTATCAGAA
    AAAAGTGTTTATCAAAAATTCGAAAAAATGTTGATTAACAAACTGAACTATTTTGTCAGC
    AAGAAGGAATCCGACTGGAATAAACCGTCTGGTCTGCTGAATGGACTGCAGCTTTCGGAT
    CAGTTTGAAAGCTTCGAAAAACTGGGTATTCAGTCTGGTTTTATTTTTTACGTGCCGGCT
    GCATATACCTCAAAGATTGATCCGACCACGGGCTTCGCCAATGTTCTGAATCTGTCGAAG
    GTACGCAATGTTGATGCGATCAAAAGCTTTTTTTCTAACTTCAACGAAATTAGTTATAGC
    AAGAAAGAAGCCCTTTTCAAATTCTCATTCGATCTGGATTCACTGAGTAAGAAAGGCTTT
    AGTAGCTTTGTGAAATTTAGTAAGAGTAAATGGAACGTCTACACCTTTGGAGAACGTATC
    ATAAAGCCAAAGAATAAGCAAGGTTATCGGGAGGACAAAAGAATCAACTTGACCTTCGAG
    ATGAAGAAGTTACTTAACGAGTATAAGGTTTCTTTTGATCTTGAAAATAACTTGATTCCG
    AATCTCACGAGTGCCAACCTGAAGGATACTTTTTGGAAAGAGCTATTCTTTATCTTCAAG
    ACTACGCTGCAGCTCCGTAACAGCGTTACTAACGGTAAAGAAGATGTGCTCATCTCTCCG
    GTCAAAAATGCGAAGGGTGAATTCTTCGTTTCGGGAACGCATAACAAGACTCTTCCGCAA
    GATTGCGATGCGAACGGTGCATACCATATTGCGTTGAAAGGTCTGATGATACTCGAACGT
    AACAACCTTGTACGTGAGGAGAAAGATACGAAAAAGATTATGGCGATTTCAAACGTGGAT
    TGGTTCGAGTACGTGCAGAAACGTAGAGGCGTTCTGTAAGAAATCATCCTTAGCGAAAGC
    TAAGGATTTTTTTTATCTGAAATTTATTATATCGCGTTGATTATTGATGCTGTTTTTAGT
    TTTAACGGCAATTAATATATGTGTTATTAATTGAATGAATTTTATCATTCATAATAAGTA
    TGTGTAGGATCAAGCTCAGGTTAAATATTCACTCAGGAAGTTATTACTCAGGAAGCAAAG
    AGGATTACAGAATTATCTCATAACAAGTGTTAAGGGATGTTATTTCC
    SEQ ID NO: 63
    AATCAGGAGAGCGTTTTCAATCCTACCTCTGGCGCAGTTGATATGTCAAACAGGTTGCCG
    TCACTGCGTCTTTTACTGGCTCTTCTCGCTAACCAAACCGGTAACCCCGCTTATTAAAAG
    CATTCTGTAACAAAGCGGGACCAAAGCCATGACAAAAACGCGTAACAAAAGTGTCTATAA
    TCACGGCAGAAAAGTCCACATTGATTATTTGCACGGCGTCACACTTTGCTATGCCATAGC
    ATTTTTATCCATAAGATTAGCGGATCCTACCTGACGCTTTTTATCGCAACTCTCTACTGT
    TTCTCCATACCCGTTTTTTTGGGCTAGCAGTAATACGACTCACTATAGGGGTCTCATCTC
    GTGTGAGATAGGCGGAGATACGAACTTTAAGAGGAGGATATACCATGCACCATCATCATC
    ACCATAACAACTACGACGAATTCACCAAACTGTACCCGATCCAGAAAACCATCCGTTTCG
    AACTGAAACCGCAGGGTCGTACCATGGAACACCTGGAAACCTTCAACTTCTTCGAAGAAG
    ACCGTGACCGTGCGGAAAAATACAAAATCCTGAAAGAAGCGATCGACGAATACCACAAAA
    AATTCATCGACGAACACCTGACCAACATGTCTCTGGACTGGAACTCTCTGAAACAGATCT
    CTGAAAAATACTACAAATCTCGTGAAGAAAAAGACAAAAAAGTTTTCCTGTCTGAACAGA
    AACGTATGCGTCAGGAAATCGTTTCTGAATTCAAAAAAGACGACCGTTTCAAAGACCTGT
    TCTCTAAAAAACTGTTCTCTGAACTGCTGAAAGAAGAAATCTACAAAAAAGGTAACCACC
    AGGAAATCGACGCGCTGAAATCTTTCGACAAATTCTCTGGTTACTTCATCGGTCTGCACG
    AAAACCGTAAAAACATGTACTCTGACGGTGACGAAATCACCGCGATCTCTAACCGTATCG
    TTAACGAAAACTTCCCGAAATTCCTGGACAACCTGCAGAAATACCAGGAAGCGCGTAAAA
    AATACCCGGAATGGATCATCAAAGCGGAATCTGCGCTGGTTGCGCACAACATCAAAATGG
    ACGAAGTTTTCTCTCTGGAATACTTCAACAAAGTTCTGAACCAGGAAGGTATCCAGCGTT
    ACAACCTGGCGCTGGGTGGTTACGTTACCAAATCTGGTGAAAAAATGATGGGTCTGAACG
    ACGCGCTGAACCTGGCGCACCAGTCTGAAAAATCTTCTAAAGGTCGTATCCACATGACCC
    CGCTGTTCAAACAGATCCTGTCTGAAAAAGAATCTTTCTCTTACATCCCGGACGTTTTCA
    CCGAAGACTCTCAGCTGCTGCCGTCTATCGGTGGTTTCTTCGCGCAGATCGAAAACGACA
    AAGACGGTAACATCTTCGACCGTGCGCTGGAACTGATCTCTTCTTACGCGGAATACGACA
    CCGAACGTATCTACATCCGTCAGGCGGACATCAACCGTGTTTCTAACGTTATCTTCGGTG
    AATGGGGTACCCTGGGTGGTCTGATGCGTGAATACAAAGCGGACTCTATCAACGACATCA
    ACCTGGAACGTACCTGCAAAAAAGTTGACAAATGGCTGGACTCTAAAGAATTCGCGCTGT
    CTGACGTTCTGGAAGCGATCAAACGTACCGGTAACAACGACGCGTTCAACGAATACATCT
    CTAAAATGCGTACCGCGCGTGAAAAAATCGACGCGGCGCGTAAAGAAATGAAATTCATCT
    CTGAAAAAATCTCTGGTGACGAAGAATCTATCCACATCATCAAAACCCTGCTGGACTCTG
    TTCAGCAGTTCCTGCACTTCTTCAACCTGTTCAAAGCGCGTCAGGACATCCCGCTGGACG
    GTGCGTTCTACGCGGAATTCGACGAAGTTCACTCTAAACTGTTCGCGATCGTTCCGCTGT
    ACAACAAAGTTCGTAACTACCTGACCAAAAACAACCTGAACACCAAAAAAATCAAACTGA
    ACTTCAAAAACCCGACCCTGGCGAACGGTTGGGACCAGAACAAAGTTTACGACTACGCGT
    CTCTGATCTTCCTGCGTGACGGTAACTACTACCTGGGTATCATCAACCCGAAACGTAAAA
    AAAACATCAAATTCGAACAGGGTTCTGGTAACGGTCCGTTCTACCGTAAAATGGTTTACA
    AACAGATCCCGGGTCCGAACAAAAACCTGCCGCGTGTTTTCCTGACCTCTACCAAAGGTA
    AAAAAGAATACAAACCGTCTAAAGAAATCATCGAAGGTTACGAAGCGGACAAACACATCC
    GTGGTGACAAATTCGACCTGGACTTCTGCCACAAACTGATCGACTTCTTCAAAGAATCTA
    TCGAAAAACACAAAGACTGGTCTAAATTCAACTTCTACTTCTCTCCGACCGAATCTTACG
    GTGACATCTCTGAATTCTACCTGGACGTTGAAAAACAGGGTTACCGTATGCACTTCGAAA
    ACATCTCTGCGGAAACCATCGACGAATACGTTGAAAAAGGTGACCTGTTCCTGTTCCAGA
    TCTACAACAAAGACTTCGTTAAAGCGGCGACCGGTAAAAAAGACATGCACACCATCTACT
    GGAACGCGGCGTTCTCTCCGGAAAACCTGCAGGACGTTGTTGTTAAACTGAACGGTGAAG
    CGGAACTGTTCTACCGTGACAAATCTGACATCAAAGAAATCGTTCACCGTGAAGGTGAAA
    TCCTGGTTAACCGTACCTACAACGGTCGTACCCCGGTTCCGGACAAAATCCACAAAAAAC
    TGACCGACTACCACAACGGTCGTACCAAAGACCTGGGTGAAGCGAAAGAATACCTGGACA
    AAGTTCGTTACTTCAAAGCGCACTACGACATCACCAAAGACCGTCGTTACCTGAACGACA
    AAATCTACTTCCACGTTCCGCTGACCCTGAACTTCAAAGCGAACGGTAAAAAAAACCTGA
    ACAAAATGGTTATCGAAAAATTCCTGTCTGACGAAAAAGCGCACATCATCGGTATCGACC
    GTGGTGAACGTAACCTGCTGTACTACTCTATCATCGACCGTTCTGGTAAAATCATCGACC
    AGCAGTCTCTGAACGTTATCGACGGTTTCGACTACCGTGAAAAACTGAACCAGCGTGAAA
    TCGAAATGAAAGACGCGCGTCAGTCTTGGAACGCGATCGGTAAAATCAAAGACCTGAAAG
    AAGGTTACCTGTCTAAAGCGGTTCACGAAATCACCAAAATGGCGATCCAGTACAACGCGA
    TCGTTGTTATGGAAGAACTGAACTACGGTTTCAAACGTGGTCGTTTCAAAGTTGAAAAAC
    AGATCTACCAGAAATTCGAAAACATGCTGATCGACAAAATGAACTACCTGGTTTTCAAAG
    ACGCGCCGGACGAATCTCCGGGTGGTGTTCTGAACGCGTACCAGCTGACCAACCCGCTGG
    AATCTTTCGCGAAACTGGGTAAACAGACCGGTATCCTGTTCTACGTTCCGGCGGCGTACA
    CCTCTAAAATCGACCCGACCACCGGTTTCGTTAACCTGTTCAACACCTCTTCTAAAACCA
    ACGCGCAGGAACGTAAAGAATTCCTGCAGAAATTCGAATCTATCTCTTACTCTGCGAAAG
    ACGGTGGTATCTTCGCGTTCGCGTTCGACTACCGTAAATTCGGTACCTCTAAAACCGACC
    ACAAAAACGTTTGGACCGCGTACACCAACGGTGAACGTATGCGTTACATCAAAGAAAAAA
    AACGTAACGAACTGTTCGACCCGTCTAAAGAAATCAAAGAAGCGCTGACCTCTTCTGGTA
    TCAAATACGACGGTGGTCAGAACATCCTGCCGGACATCCTGCGTTCTAACAACAACGGTC
    TGATCTACACCATGTACTCTTCTTTCATCGCGGCGATCCAGATGCGTGTTTACGACGGTA
    AAGAAGACTACATCATCTCTCCGATCAAAAACTCTAAAGGTGAATTCTTCCGTACCGACC
    CGAAACGTCGTGAACTGCCGATCGACGCGGACGCGAACGGTGCGTACAACATCGCGCTGC
    GTGGTGAACTGACCATGCGTGCGATCGCGGAAAAATTCGACCCGGACTCTGAAAAAATGG
    CGAAACTGGAACTGAAACACAAAGACTGGTTCGAATTCATGCAGACCCGTGGTGACTAAG
    AAATCATCCTTAGCGAAAGCTAAGGATTTTTTTTATCTGAAATGTAGGGAGACCCTCAGG
    TTAAATATTCACTCAGGAAGTTATTACTCAGGAAGCAAAGAGGATTACA
    SEQ ID NO: 64
    AAAATTCcatGCAAAATGCTCCGGTTTCATGTCATCAAAATGATGACGTAATTAAGCATT
    GATAATTGAGATCCCTCTCCCTGACAGGATGATTACATAAATAATAGTGACAAAAATAAA
    TTATTTATTTATCCAGAAAATGAATTGGAAAATCAGGAGAGCGTTTTCAATCCTACCTCT
    GGCGCAGTTGATATGTcaaaCAGGTtgccgtcactgcgtcttttactggctcttctcgct
    aaccaaaccggtaaccccgcttattaaaagcattctgtaacaaagcgggaccaaagccat
    gacaaaaacgcgtaacaaaagtgtctataatcacggcagaaaagtccacattgattattt
    gcacggcgtcacactttgctatgccatagcatttttatccataagattagcggatcctac
    ctgacgctttttatcgcaactctctactgtttctccatacccgtttttttgggctagcac
    cgcctatctcgtgtgagataggcggagatacgaactttaagAAGGAGatataccATGACT
    AAAACATTTGATTCAGAGTTTTTTAATTTGTACTCGCTGCAAAAAACGGTACGCTTTGAG
    TTAAAACCCGTGGGAGAAACCGCGTCATTTGTGGAAGACTTTAAAAACGAGGGCTTGAAA
    CGTGTTGTGAGCGAAGATGAAAGGCGAGCCGTCGATTACCAGAAAGTTAAGGAAATAATT
    GACGATTACCATCGGGATTTCATTGAAGAAAGTTTAAATTATTTTCCGGAACAGGTGAGT
    AAAGATGCTCTTGAGCAGGCGTTTCATCTTTATCAGAAACTGAAGGCAGCAAAAGTTGAG
    GAAAGGGAAAAAGCGCTGAAAGAATGGGAAGCGCTGCAGAAAAAGCTACGTGAAAAAGTG
    GTGAAATGCTTCTCGGACTCGAATAAAGCCCGCTTCTCAAGGATTGATAAAAAGGAACTG
    ATTAAGGAAGACCTGATAAATTGGTTGGTCGCCCAGAATCGCGAGGATGATATCCCTACG
    GTCGAAACGTTTAACAACTTCACCACATATTTTACCGGCTTCCATGAGAATCGTAAAAAT
    ATTTACTCCAAAGATGATCACGCCACCGCTATTAGCTTTCGCCTTATTCATGAAAATCTT
    CCAAAGTTTTTTGACAACGTGATTAGCTTCAATAAGTTGAAAGAGGGTTTCCCTGAATTA
    AAATTTGATAAAGTGAAAGAGGATTTAGAAGTAGATTATGATCTGAAGCATGCGTTTGAA
    ATAGAATATTTCGTTAACTTCGTGACCCAAGCGGGCATAGATCAGTATAATTATCTGTTA
    GGAGGGAAAACCCTGGAGGACGGGACGAAAAAACAAGGGATGAATGAGCAAATTAATCTG
    TTCAAACAACAGCAAACGCGAGATAAAGCGCGTCAGATTCCCAAACTGATCCCCCTGTTC
    AAACAGATTCTTAGCGAAAGGACTGAAAGCCAGTCCTTTATTCCTAAACAATTTGAAAGT
    GATCAGGAGTTGTTCGATTCACTGCAGAAGTTACATAATAACTGCCAGGATAAATTCACC
    GTGCTGCAACAAGCCATTCTCGGTCTGGCAGAGGCGGATCTTAAGAAGGTCTTCATCAAA
    ACCTCTGATTTAAATGCCTTATCTAACACCATTTTCGGGAATTACAGCGTCTTTTCCGAT
    GCACTGAACCTGTATAAAGAAAGCCTGAAAACGAAAAAAGCGCAGGAGGCTTTTGAGAAA
    CTACCGGCCCATTCTATTCACGACCTCATTCAATACTTGGAACAGTTCAATTCCAGCCTG
    GACGCGGAAAAACAACAGAGCACCGACACCGTCCTGAACTACTTCATCAAGACCGATGAA
    TTATATTCTCGCTTCATTAAATCCACTAGCGAGGCTTTCACTCAGGTGCAGCCTTTGTTC
    GAACTGGAAGCCCTGTCATCTAAGCGCCGCCCACCGGAATCGGAAGATGAAGGGGCAAAA
    GGGCAGGAAGGCTTCGAGCAGATCAAGCGTATTAAAGCTTACCTGGATACGCTTATGGAA
    GCGGTACACTTTGCAAAGCCGTTGTATCTTGTTAAGGGTCGTAAAATGATCGAAGGGCTC
    GATAAAGACCAGTCCTTTTATGAAGCGTTTGAAATGGCGTACCAAGAACTTGAATCGTTA
    ATCATTCCTATCTATAACAAAGCGCGGAGCTATCTGTCGCGGAAACCTTTCAAGGCCGAT
    AAATTCAAGATTAATTTTGACAACAACACGCTACTGAGCGGATGGGATGCGAACAAGGAA
    ACTGCTAACGCGTCCATTCTGTTTAAGAAAGACGGGTTATATTACCTTGGAATTATGCCG
    AAAGGTAAGACCTTTCTCTTTGACTACTTTGTATCGAGCGAGGATTCAGAGAAACTGAAA
    CAGCGTCGCCAGAAGACCGCCGAAGAAGCTCTGGCGCAGGATGGTGAAAGTTACTTCGAA
    AAAATTCGTTATAAACTGTTACCAGGGGCTTCAAAGATGTTACCGAAAGTCTTTTTTAGC
    AACAAAAATATTGGCTTTTACAACCCGTCGGATGACATTTTACGCATTCGCAACACAGCC
    TCTCACACCAAAAACGGGACCCCTCAGAAAGGCCACTCAAAAGTTGAGTTTAACCTGAAT
    GATTGTCATAAGATGATTGATTTCTTCAAATCATCAATTCAGAAACACCCGGAATGGGGG
    TCTTTTGGCTTTACGTTTTCTGATACCAGTGATTTTGAAGACATGAGTGCCTTCTACCGG
    GAAGTAGAAAACCAGGGTTACGTAATTAGCTTTGACAAAATCAAAGAGACCTATATACAG
    AGCCAGGTGGAACAGGGTAATCTCTACTTATTCCAGATTTATAACAAGGATTTCTCGCCC
    TACAGCAAAGGCAAACCAAACCTGCATACTCTGTACTGGAAAGCCCTGTTTGAAGAAGCG
    AACCTGAATAACGTAGTGGCGAAGTTGAACGGTGAAGCGGAAATCTTCTTCCGTCGTCAC
    TCCATTAAGGCCTCTGATAAAGTTGTCCATCCGGCAAATCAGGCCATTGATAATAAGAAT
    CCACACACGGAAAAAACGCAGTCAACCTTTGAATATGACCTCGTTAAAGACAAACGCTAC
    ACGCAAGATAAGTTCTTTTTCCACGTCCCAATCAGCCTCAACTTTAAAGCACAAGGGGTT
    TCAAAGTTTAATGATAAAGTCAATGGGTTCCTCAAGGGCAACCCGGATGTCAACATTATA
    GGTATAGACAGGGGCGAACGCCATCTGCTTTACTTTACCGTAGTGAATCAGAAAGGTGAA
    ATACTGGTTCAGGAATCATTAAATACCTTGATGTCGGACAAAGGGCACGTTAATGATTAC
    CAGCAGAAACTGGATAAAAAAGAACAGGAACGTGATGCTGCGCGTAAATCGTGGACCACG
    GTTGAGAACATTAAAGAGCTGAAAGAGGGGTATCTAAGCCATGTGGTACACAAACTGGCG
    CACCTCATCATTAAATATAACGCAATAGTCTGCCTAGAAGACTTGAATTTTGGCTTTAAA
    CGCGGCCGCTTCAAAGTGGAAAAACAAGTTTATCAAAAATTTGAAAAGGCGCTTATAGAT
    AAACTGAATTATCTGGTTTTTAAAGAAAAGGAACTTGGTGAGGTAGGGCACTACTTGACA
    GCTTATCAACTGACGGCCCCGTTCGAATCATTCAAAAAACTGGGCAAACAGTCTGGCATT
    CTGTTTTACGTGCCGGCAGATTATACTTCAAAAATCGATCCAACAACTGGCTTTGTGAAC
    TTCCTGGACCTGAGATATCAGTCTGTAGAAAAAGCTAAACAACTTCTTAGCGATTTTAAT
    GCCATTCGTTTTAACAGCGTTCAGAATTACTTTGAATTCGAAATTGACTATAAAAAACTT
    ACTCCGAAACGTAAAGTCGGAACCCAAAGTAAATGGGTAATTTGTACGTATGGCGATGTC
    AGGTATCAGAACCGTCGGAATCAAAAAGGTCATTGGGAGACCGAAGAAGTGAACGTGACC
    GAAAAGCTGAAGGCTCTGTTCGCCAGCGATTCAAAAACTACAACTGTGATCGATTACGCA
    AATGATGATAACCTGATAGATGTGATTTTAGAGCAGGATAAAGCCAGCTTTTTTAAAGAA
    CTGTTGTGGCTCCTGAAACTTACGATGACCTTACGACATTCCAAGATCAAATCGGAAGAT
    GATTTTATTCTGTCACCGGTCAAGAATGAGCAGGGTGAATTCTATGATAGTAGGAAAGCC
    GGCGAAGTGTGGCCGAAAGACGCCGACGCCAATGGCGCCTATCATATCGCGCTCAAAGGG
    CTTTGGAATTTGCAGCAGATTAACCAGTGGGAAAAAGGTAAAACCCTGAATCTGGCTATC
    AAAAACCAGGATTGGTTTAGCTTTATCCAAGAGAAACCGTATCAGGAATGAGAAATCATC
    CTTAGCGAAAGCTAAGGATTTTTTTTATCTGAAATTTATTATATCGCGTTGATTATTGAT
    GCTGTTTTTAGTTTTAACGGCAATTAATATATGTGTTATTAATTGAATGAATTTTATCAT
    TCATAATAAGTATGTGTAGGATCAAGCTCAGGTTAAATATTCACTCAGGAAGTTATTACT
    CAGGAAGCAAAGAGGATTACAGAATTATCTCATAACAAGTGTTAAGGGATGTTATTTCC
    SEQ ID NO: 65
    AATCAGGAGAGCGTTTTCAATCCTACCTCTGGCGCAGTTGATATGTCAAACAGGTTGCCG
    TCACTGCGTCTTTTACTGGCTCTTCTCGCTAACCAAACCGGTAACCCCGCTTATTAAAAG
    CATTCTGTAACAAAGCGGGACCAAAGCCATGACAAAAACGCGTAACAAAAGTGTCTATAA
    TCACGGCAGAAAAGTCCACATTGATTATTTGCACGGCGTCACACTTTGCTATGCCATAGC
    ATTTTTATCCATAAGATTAGCGGATCCTACCTGACGCTTTTTATCGCAACTCTCTACTGT
    TTCTCCATACCCGTTTTTTTGGGCTAGCAGTAATACGACTCACTATAGGGGTCTCATCTC
    GTGTGAGATAGGCGGAGATACGAACTTTAAGAGGAGGATATACCATGCACCATCATCATC
    ACCATCATACAGGCGGTCTTCTTAGTATGGACGCGAAAGAGTTCACAGGTCAGTATCCGT
    TGTCGAAAACATTACGATTCGAACTTCGGCCCATCGGCCGCACGTGGGATAACCTGGAGG
    CCTCAGGCTACTTAGCGGAAGACCGCCATCGTGCCGAATGTTATCCTCGTGCGAAAGAGT
    TATTGGATGACAACCATCGTGCCTTCCTGAATCGTGTGTTGCCACAAATCGATATGGATT
    GGCACCCGATTGCGGAGGCCTTTTGTAAGGTACATAAAAACCCTGGTAATAAAGAACTTG
    CCCAGGATTACAACCTTCAGTTGTCAAAGCGCCGTAAGGAGATCAGCGCATATCTTCAGG
    ATGCAGATGGCTATAAAGGCCTGTTCGCGAAGCCCGCCTTAGACGAAGCTATGAAAATTG
    CGAAAGAAAACGGGAACGAAAGTGATATTGAGGTTCTCGAAGCGTTTAACGGTTTTAGCG
    TATACTTCACCGGTTATCATGAGTCACGCGAGAACATTTATAGCGATGAGGATATGGTGA
    GCGTAGCCTACCGAATTACTGAGGATAATTTCCCGCGCTTTGTCTCAAACGCTTTGATCT
    TTGATAAATTAAACGAAAGCCATCCGGATATTATCTCTGAAGTATCGGGCAATCTTGGAG
    TTGATGACATTGGTAAGTACTTTGACGTGTCGAACTATAACAATTTTCTTTCCCAGGCCG
    GTATAGATGACTACAATCACATTATTGGCGGCCATACAACCGAAGACGGACTGATACAAG
    CGTTTAATGTCGTATTGAACTTACGTCACCAAAAAGACCCTGGCTTTGAAAAAATTCAGT
    TCAAACAGCTCTACAAACAAATCCTGAGCGTGCGTACCAGCAAAAGCTACATCCCGAAAC
    AGTTTGACAACTCTAAGGAGATGGTTGACTGCATTTGCGATTATGTCAGCAAAATAGAGA
    AATCCGAAACAGTAGAACGGGCCCTGAAACTAGTCCGTAATATCAGTTCTTTCGACTTGC
    GCGGGATCTTTGTCAATAAAAAGAACTTGCGCATACTGAGCAACAAACTGATAGGAGATT
    GGGACGCGATCGAAACCGCATTGATGCATAGTTCTTCATCAGAAAACGATAAGAAAAGCG
    TATATGATAGCGCGGAGGCTTTTACGTTGGATGACATCTTTTCAAGCGTGAAAAAATTTT
    CTGATGCCTCTGCCGAAGATATTGGCAACAGGGCGGAAGACATCTGTAGAGTGATAAGTG
    AGACGGCCCCTTTTATCAACGATCTGCGAGCGGTGGACCTGGATAGCCTGAACGACGATG
    GTTATGAAGCGGCCGTCTCAAAAATTCGGGAGTCGCTGGAGCCTTATATGGATCTTTTCC
    ATGAACTGGAAATTTTCTCGGTTGGCGATGAGTTCCCAAAATGCGCAGCATTTTACAGCG
    AACTGGAGGAAGTCAGCGAACAGCTGATCGAAATTATTCCGTTATTCAACAAGGCGCGTT
    CGTTCTGCACCCGGAAACGCTATAGCACCGATAAGATTAAAGTGAACTTAAAATTCCCGA
    CCTTGGCGGACGGGTGGGACCTGAACAAAGAGAGAGACAACAAAGCCGCGATTCTGCGGA
    AAGACGGTAAGTATTATCTGGCAATTCTGGATATGAAGAAAGATCTGTCAAGCATTAGGA
    CCAGCGACGAAGATGAATCCAGCTTCGAAAAGATGGAGTATAAACTGTTACCGAGTCCAG
    TAAAAATGCTGCCAAAGATATTCGTAAAATCGAAAGCCGCTAAGGAAAAATATGGCCTGA
    CAGATCGTATGCTTGAATGCTACGATAAAGGTATGCATAAGTCGGGTAGTGCGTTTGATC
    TTGGCTTTTGCCATGAACTCATTGATTATTACAAGCGTTGTATCGCGGAGTACCCAGGCT
    GGGATGTGTTCGATTTCAAGTTTCGCGAAACTTCCGATTATGGGTCCATGAAAGAGTTCA
    ATGAAGATGTGGCCGGAGCCGGTTACTATATGAGTCTGAGAAAAATTCCGTGCAGCGAAG
    TGTACCGTCTGTTAGACGAGAAATCGATTTATCTATTTCAAATTTATAACAAAGATTACT
    CTGAAAATGCACATGGTAATAAGAACATGCATACCATGTACTGGGAGGGTCTCTTTTCCC
    CGCAAAACCTGGAGTCGCCCGTTTTCAAGTTGTCGGGTGGGGCAGAACTTTTCTTTCGAA
    AATCCTCAATCCCTAACGATGCCAAAACAGTACACCCGAAAGGCTCAGTGCTGGTTCCAC
    GTAATGATGTTAACGGTCGGCGTATTCCAGATTCAATCTACCGCGAACTGACACGCTATT
    TTAACCGTGGCGATTGCCGAATCAGTGACGAAGCCAAAAGTTATCTTGACAAGGTTAAGA
    CTAAAAAAGCGGACCATGACATTGTGAAAGATCGCCGCTTTACCGTGGATAAAATGATGT
    TCCACGTCCCGATTGCGATGAACTTTAAGGCGATCAGTAAACCGAACTTAAACAAAAAAG
    TCATTGATGGCATCATTGATGATCAGGATCTGAAAATCATTGGTATTGATCGTGGCGAGC
    GGAACTTAATTTACGTCACGATGGTTGACAGAAAAGGGAATATCTTATATCAGGATTCTC
    TTAACATCCTCAATGGCTACGACTATCGTAAAGCTCTGGATGTGCGCGAATATGACAACA
    AGGAAGCGCGTCGTAACTGGACTAAAGTGGAGGGCATTCGCAAAATGAAGGAAGGCTATC
    TGTCATTAGCGGTCTCGAAATTAGCGGATATGATTATCGAAAATAACGCCATCATCGTTA
    TGGAGGACCTGAACCACGGATTCAAAGCGGGCCGCTCAAAGATTGAAAAACAAGTTTATC
    AGAAATTTGAGAGTATGCTGATTAACAAACTGGGCTATATGGTGTTAAAAGACAAGTCAA
    TTGACCAATCAGGTGGCGCGCTGCATGGATACCAGCTGGCGAACCATGTTACCACCTTAG
    CATCAGTTGGAAAGCAGTGTGGGGTTATCTTTTATATACCGGCAGCGTTCACTAGTAAAA
    TAGATCCGACCACTGGTTTCGCCGATCTCTTTGCCCTGAGTAACGTTAAAAACGTAGCGA
    GCATGCGTGAATTCTTTTCCAAAATGAAATCTGTCATTTATGATAAAGCTGAAGGCAAAT
    TCGCATTCACCTTTGATTACTTGGATTACAACGTGAAGAGCGAATGTGGTCGTACGCTGT
    GGACCGTTTACACCGTTGGTGAGCGCTTCACCTATTCCCGTGTGAACCGCGAATATGTAC
    GTAAAGTCCCCACCGATATTATCTATGATGCCCTCCAGAAAGCAGGCATTAGCGTCGAAG
    GAGACTTAAGGGACAGAATTGCCGAAAGCGATGGCGATACGCTGAAGTCTATTTTTTACG
    CATTCAAATACGCGCTAGATATGCGCGTTGAGAATCGCGAGGAAGACTACATTCAATCAC
    CTGTGAAAAATGCCTCTGGGGAATTTTTTTGTTCAAAAAATGCTGGTAAAAGCCTCCCAC
    AAGATAGCGATGCAAACGGTGCATATAACATTGCCCTGAAAGGTATTCTTCAATTACGCA
    TGCTGTCTGAGCAGTACGACCCCAACGCGGAATCTATTAGACTTCCGCTGATAACCAATA
    AAGCCTGGCTGACATTCATGCAGTCTGGCATGAAGACCTGGAAAAATTAGGAAATCATCC
    TTAGCGAAAGCTAAGGATTTTTTTTATCTGAAATGTAGGGAGACCCTCAGGTTAAATATT
    CACTCAGGAAGTTATTACTCAGGAAGCAAAGAGGATTACA
    SEQ ID NO: 66
    AAAATTCcatGCAAAATGCTCCGGTTTCATGTCATCAAAATGATGACGTAATTAAGCATT
    GATAATTGAGATCCCTCTCCCTGACAGGATGATTACATAAATAATAGTGACAAAAATAAA
    TTATTTATTTATCCAGAAAATGAATTGGAAAATCAGGAGAGCGTTTTCAATCCTACCTCT
    GGCGCAGTTGATATGTcaaaCAGGTtgccgtcactgcgtcttttactggctcttctcgct
    aaccaaaccggtaaccccgcttattaaaagcattctgtaacaaagcgggaccaaagccat
    gacaaaaacgcgtaacaaaagtgtctataatcacggcagaaaagtccacattgattattt
    gcacggcgtcacactttgctatgccatagcatttttatccataagattagcggatcctac
    ctgacgctttttatcgcaactctctactgtttctccatacccgtttttttgggctagcac
    cgcctatctcgtgtgagataggcggagatacgaactttaagAAGGAGatataccatgGAT
    AGTTTGAAAGATTTCACCAATCTGTACCCTGTCAGTAAGACATTGAGATTTGAATTAAAG
    CCCGTTGGAAAGACTTTAGAAAATATCGAGAAAGCAGGTATTTTGAAAGAGGATGAGCAT
    CGTGCAGAAAGTTATCGGAGGGTGAAGAAAATAATTGATACTTATCATAAGGTATTTATC
    GATTCTTCTCTTGAAAATATGGCTAAAATGGGTATTGAGAATGAAATAAAAGCAATGCTC
    CAAAGTTTCTGCGAATTGTATAAAAAAGATCATCGCACTGAGGGTGAAGACAAGGCATTA
    GATAAAATTCGAGCAGTACTTCGTGGCCTGATTGTTGGGGCTTTCACTGGTGTTTGCGGA
    AGACGGGAAAATACAGTCCAAAACGAGAAGTACGAGAGTTTGTTCAAAGAAAAGTTGATA
    AAAGAAATTTTACCTGATTTTGTGCTCTCTACTGAGGCTGAAAGCTTGCCTTTCTCTGTT
    GAAGAAGCTACGAGGTCACTGAAGGAGTTTGATAGCTTTACATCCTACTTTGCTGGTTTT
    TACGAGAATAGAAAGAATATATACTCGACGAAACCTCAATCCACTGCCATTGCTTATCGT
    CTTATTCATGAGAACTTGCCGAAGTTCATTGATAATATTCTTGTTTTTCAGAAGATCAAA
    GAGCCTATAGCCAAAGAGCTGGAACATATTCGTGCGGACTTTTCTGCCGGGGGGTACATA
    AAAAAGGATGAGAGATTGGAGGATATTTTTTCGTTGAACTATTATATCCACGTGTTATCT
    CAGGCTGGGATCGAAAAATATAACGCATTGATTGGGAAGATTGTGACAGAAGGAGATGGA
    GAGATGAAAGGGCTCAATGAACACATCAACCTTTACAACCAACAAAGAGGCAGAGAGGAT
    CGGCTCCCTCTTTTTAGGCCTCTTTATAAACAGATATTGAGTGACAGAGAGCAATTATCA
    TACTTGCCTGAGAGTtttGAAAAAGAtGAGGAGctcctcAGGGctctAAAAGAGttctAt
    GAtcAtAtcGcAGAAGACATTCTCGGACGTACTCAACAGTTGATGACTTCTATTTCAGAA
    TATGATTTATCTCGGATATACGTAAGGAACGATAGCCAATTGACTGATATATCAAAAAAA
    ATGTTGGGAGATTGGAATGCTATCTACATGGCTAGAGAACGAGCATATGACCACGAGCAG
    GCTCCCAAAAGAATCACGGCGAAATACGAGAGGGACAGGATTAAAGCTCTTAAAGGAGAA
    GAGAGTATAAGTCTGGCAAATCTTAATAGTTGTATTGCCTTTCTGGACAATGTTAGAGAT
    TGCCGTGTAGATACTTATCTTTCCACACTGGGCCAGAAGGAAGGACCACATGGTCTATCT
    AATCTCGTTGAGAACGTTTTTGCCTCATACCATGAAGCAGAGCAATTGTTGAGCTTTCCA
    TACCCCGAAGAGAATAATCTGATTCAGGACAAGGACAATGTGGTGTTAATTAAGAATCTT
    CTCGACAATATCAGTGATCTGCAGAGGTTCTTGAAACCTCTTTGGGGTATGGGAGACGAA
    CCCGATAAAGATGAAAGATTTTATGGAGAGTATAATTATATCCGAGGAGCTCTAGATCAG
    GTGATCCCTCTGTACAATAAGGTAAGGAACTACCTCACTCGGAAGCCTTATTCGACCAGA
    AAAGTAAAACTCAATTTTGGGAATTCTCAATTGCTTAGTGGTTGGGATAGAAATAAGGAA
    AAGGATAATAGCTGTGTGATTTTGCGTAAGGGGCAGAACTTCTATTTGGCTATTATGAAC
    AATAGGCACAAAAGAAGTTTCGAAAACAAGGTGTTGCCCGAGTATAAGGAGGGAGAACCT
    TACTTCGAAAAGATGGATTATAAATTTTTGCCTGATCCTAATAAAATGCTTCCTAAGGTT
    TTTCTTTCGAAAAAAGGAATAGAGATATACAAACCAAGTCCGAAGCTTTTAGAACAATAT
    GGACATGGAACTCACAAAAAGGGAGATACCTTTAGTATGGATGATTTGCACGAACTGATC
    GATTTCTTCAAACACTCAATCGAGGCTCATGAAGATTGGAAGCAATTCGGATTCAAATTT
    TCTGATACGGCTACTTATGAGAATGTATCTAGTTTCTATAGAGAAGTTGAGGATCAGGGG
    TATAAGCTCTCTTTCCGAAAAGTTTCGGAATCTTATGTCTATTCATTAATAGATCAAGGC
    AAGTTGTATTTATTTCAGATATACAACAAGGACTTTTCTCCCTGCAGCAAAGGGACACCT
    AATCTGCATACCTTGTATTGGAGAATGCTTTTTGACGAGCGCAATTTGGCAGATGTCATA
    TACAAACTGGATGGGAAGGCTGAAATCTTTTTCCGAGAGAAGAGTTTGAAAAATGATCAT
    CCCACGCATCCGGCTGGTAAGCCTATCAAAAAGAAAAGTCGACAAAAAAAAGGAGAGGAG
    AGTCTGTTTGAGTATGATTTAGTCAAGGATAGGCACTATACGATGGATAAGTTCCAGTTT
    CATGTGCCTATTACTATGAATTTTAAATGTTCTGCAGGAAGCAAAGTCAATGATATGGTT
    AATGCTCATATTCGAGAGGCAAAGGATATGCATGTCATTGGAATTGATCGTGGAGAACGC
    AATCTGCTGTATATATGCGTGATAGATAGTCGAGGGACGATTTTGGATCAAATTTCTCTG
    AATACGATTAACGATATAGACTATCATGATTTATTGGAGAGTCGAGACAAAGACCGTCAG
    CAGGAGCGCCGAAACTGGCAAACTATCGAAGGGATCAAGGAGCTAAAACAAGGCTACCTT
    AGTCAGGCGGTTCATCGGATAGCCGAACTGATGGTGGCTTATAAGGCTGTAGTTGCTTTG
    GAGGATTTGAATATGGGGTTCAAACGTGGGCGGCAGAAAGTAGAAAGTTCTGTTTATCAG
    CAGTTTGAGAAACAGCTGATAGATAAGCTCAACTATCTTGTGGACAAGAAGAAAAGGCCT
    GAAGATATTGGAGGATTGTTGAGAGCCTATCAATTTACGGCCCCATTTAAGAGTTTTAAG
    GAAATGGGAAAGCAAAACGGCTTCTTGTTTTATATCCCGGCTTGGAACACGAGCAACATA
    GATCCGACTACTGGATTTGTTAATTTATTTCATGCCCAGTATGAAAATGTAGATAAAGCG
    AAGAGCTTCTTTCAAAAGTTTGATTCAATTAGTTACAACCCGAAGAAAGACTGGTTTGAG
    TTTGCATTCGATTATAAAAACTTTACTAAAAAGGCTGAAGGAAGTCGTTCTATGTGGATA
    TTATGCACACATGGTTCCCGAATAAAGAATTTTAGAAATTCCCAGAAGAATGGTCAATGG
    GATTCCGAAGAATTCGCCTTGACGGAGGCTTTTAAGTCTCTTTTTGTGCGATATGAGATA
    GATTATACCGCTGATTTGAAAACAGCTATTGTGGACGAAAAGCAAAAAGACTTCTTCGTG
    GATCTTCTGAAGCTATTCAAATTGACAGTACAGATGCGCAACAGCTGGAAAGAGAAGGAT
    TTGGATTATCTAATCTCTCCTGTAGCAGGGGCTGATGGCCGTTTCTTCGATACAAGAGAG
    GGAAATAAAAGTCTGCCTAAGGATGCAGATGCCAATGGAGCTTATAATATTGCCCTAAAA
    GGACTTTGGGCTCTACGCCAGATTCGGCAAACTTCAGAAGGCGGTAAACTCAAATTGGCG
    ATTTCCAATAAGGAATGGCTACAGTTTGTGCAAGAGAGATCTTACGAGAAAGACtgaGAA
    ATCATCCTTAGCGAAAGCTAAGGATTTTTTTTATCTGAAATTTATTATATCGCGTTGATT
    ATTGATGCTGTTTTTAGTTTTAACGGCAATTAATATATGTGTTATTAATTGAATGAATTT
    TATCATTCATAATAAGTATGTGTAGGATCAAGCTCAGGTTAAATATTCACTCAGGAAGTT
    ATTACTCAGGAAGCAAAGAGGATTACAGAATTATCTCATAACAAGTGTTAAGGGATGTTA
    TTTCC
    SEQ ID NO: 67
    AAAATTCcatGCAAAATGCTCCGGTTTCATGTCATCAAAATGATGACGTAATTAAGCATT
    GATAATTGAGATCCCTCTCCCTGACAGGATGATTACATAAATAATAGTGACAAAAATAAA
    TTATTTATTTATCCAGAAAATGAATTGGAAAATCAGGAGAGCGTTTTCAATCCTACCTCT
    GGCGCAGTTGATATGTcaaaCAGGTtgccgtcactgcgtcttttactggctcttctcgct
    aaccaaaccggtaaccccgcttattaaaagcattctgtaacaaagcgggaccaaagccat
    gacaaaaacgcgtaacaaaagtgtctataatcacggcagaaaagtccacattgattattt
    gcacggcgtcacactttgctatgccatagcatttttatccataagattagcggatcctac
    ctgacgctttttatcgcaactctctactgtttctccatacccgtttttttgggctagcac
    cgcctatctcgtgtgagataggcggagatacgaactttaagAAGGAGatataccATGAAC
    AACGGCACAAATAATTTTCAGAACTTCATCGGGATCTCAAGTTTGCAGAAAACGCTGCGC
    AATGCTCTGATCCCCACGGAAACCACGCAACAGTTCATCGTCAAGAACGGAATAATTAAA
    GAAGATGAGTTACGTGGCGAGAACCGCCAGATTCTGAAAGATATCATGGATGACTACTAC
    CGCGGATTCATCTCTGAGACTCTGAGTTCTATTGATGACATAGATTGGACTAGCCTGTTC
    GAAAAAATGGAAATTCAGCTGAAAAATGGTGATAATAAAGATACCTTAATTAAGGAACAG
    ACAGAGTATCGGAAAGCAATCCATAAAAAATTTGCGAACGACGATCGGTTTAAGAACATG
    TTTAGCGCCAAACTGATTAGTGACATATTACCTGAATTTGTCATCCACAACAATAATTAT
    TCGGCATCAGAGAAAGAGGAAAAAACCCAGGTGATAAAATTGTTTTCGCGCTTTGCGACT
    AGCTTTAAAGATTACTTCAAGAACCGTGCAAATTGCTTTTCAGCGGACGATATTTCATCA
    AGCAGCTGCCATCGCATCGTCAACGACAATGCAGAGATATTCTTTTCAAATGCGCTGGTC
    TACCGCCGGATCGTAAAATCGCTGAGCAATGACGATATCAACAAAATTTCGGGCGATATG
    AAAGATTCATTAAAAGAAATGAGTCTGGAAGAAATATATTCTTACGAGAAGTATGGGGAA
    TTTATTACCCAGGAAGGCATTAGCTTCTATAATGATATCTGTGGGAAAGTGAATTCTTTT
    ATGAACCTGTATTGTCAGAAAAATAAAGAAAACAAAAATTTATACAAACTTCAGAAACTT
    CACAAACAGATTCTATGCATTGCGGACACTAGCTATGAGGTCCCGTATAAATTTGAAAGT
    GACGAGGAAGTGTACCAATCAGTTAACGGCTTCCTTGATAACATTAGCAGCAAACATATA
    GTCGAAAGATTACGCAAAATCGGCGATAACTATAACGGCTACAACCTGGATAAAATTTAT
    ATCGTGTCCAAATTTTACGAGAGCGTTAGCCAAAAAACCTACCGCGACTGGGAAACAATT
    AATACCGCCCTCGAAATTCATTACAATAATATCTTGCCGGGTAACGGTAAAAGTAAAGCC
    GACAAAGTAAAAAAAGCGGTTAAGAATGATTTACAGAAATCCATCACCGAAATAAATGAA
    CTAGTGTCAAACTATAAGCTGTGCAGTGACGACAACATCAAAGCGGAGACTTATATACAT
    GAGATTAGCCATATCTTGAATAACTTTGAAGCACAGGAATTGAAATACAATCCGGAAATT
    CACCTAGTTGAATCCGAGCTCAAAGCGAGTGAGCTTAAAAACGTGCTGGACGTGATCATG
    AATGCGTTTCATTGGTGTTCGGTTTTTATGACTGAGGAACTTGTTGATAAAGACAACAAT
    TTTTATGCGGAACTGGAGGAGATTTACGATGAAATTTATCCAGTAATTAGTCTGTACAAC
    CTGGTTCGTAACTACGTTACCCAGAAACCGTACAGCACGAAAAAGATTAAATTGAACTTT
    GGAATACCGACGTTAGCAGACGGTTGGTCAAAGTCCAAAGAGTATTCTAATAACGCTATC
    ATACTGATGCGCGACAATCTGTATTATCTGGGCATCTTTAATGCGAAGAATAAACCGGAC
    AAGAAGATTATCGAGGGTAATACGTCAGAAAATAAGGGTGACTACAAAAAGATGATTTAT
    AATTTGCTCCCGGGTCCCAACAAAATGATCCCGAAAGTTTTCTTGAGCAGCAAGACGGGG
    GTGGAAACGTATAAACCGAGCGCCTATATCCTAGAGGGGTATAAACAGAATAAACATATC
    AAGTCTTCAAAAGACTTTGATATCACTTTCTGTCATGATCTGATCGACTACTTCAAAAAC
    TGTATTGCAATTCATCCCGAGTGGAAAAACTTCGGTTTTGATTTTAGCGACACCAGTACT
    TATGAAGACATTTCCGGGTTTTATCGTGAGGTAGAGTTACAAGGTTACAAGATTGATTGG
    ACATACATTAGCGAAAAAGACATTGATCTGCTGCAGGAAAAAGGTCAACTGTATCTGTTC
    CAGATATATAACAAAGATTTTTCGAAAAAATCAACCGGGAATGACAACCTTCACACCATG
    TACCTGAAAAATCTTTTCTCAGAAGAAAATCTTAAGGATATCGTCCTGAAACTTAACGGC
    GAAGCGGAAATCTTCTTCAGGAAGAGCAGCATAAAGAACCCAATCATTCATAAAAAAGGC
    TCGATTTTAGTCAACCGTACCTACGAAGCAGAAGAAAAAGACCAGTTTGGCAACATTCAA
    ATTGTGCGTAAAAATATTCCGGAAAACATTTATCAGGAGCTGTACAAATACTTCAACGAT
    AAAAGCGACAAAGAGCTGTCTGATGAAGCAGCCAAACTGAAGAATGTAGTGGGACACCAC
    GAGGCAGCGACGAATATAGTCAAGGACTATCGCTACACGTATGATAAATACTTCCTTCAT
    ATGCCTATTACGATCAATTTCAAAGCCAATAAAACGGGTTTTATTAATGATAGGATCTTA
    CAGTATATCGCTAAAGAAAAAGACTTACATGTGATCGGCATTGATCGGGGCGAGCGTAAC
    CTGATCTACGTGTCCGTGATTGATACTTGTGGTAATATAGTTGAACAGAAAAGCTTTAAC
    ATTGTAAACGGCTACGACTATCAGATAAAACTGAAACAACAGGAGGGCGCTAGACAGATT
    GCGCGGAAAGAATGGAAAGAAATTGGTAAAATTAAAGAGATCAAAGAGGGCTACCTGAGC
    TTAGTAATCCACGAGATCTCTAAAATGGTAATCAAATACAATGCAATTATAGCGATGGAG
    GATTTGTCTTATGGTTTTAAAAAAGGGCGCTTTAAGGTCGAACGGCAAGTTTACCAGAAA
    TTTGAAACCATGCTCATCAATAAACTCAACTATCTGGTATTTAAAGATATTTCGATTACC
    GAGAATGGCGGTCTCCTGAAAGGTTATCAGCTGACATACATTCCTGATAAACTTAAAAAC
    GTGGGTCATCAGTGCGGCTGCATTTTTTATGTGCCTGCTGCATACACGAGCAAAATTGAT
    CCGACCACCGGCTTTGTGAATATCTTTAAATTTAAAGACCTGACAGTGGACGCAAAACGT
    GAATTCATTAAAAAATTTGACTCAATTCGTTATGACAGTGAAAAAAATCTGTTCTGCTTT
    ACATTTGACTACAATAACTTTATTACGCAAAACACGGTCATGAGCAAATCATCGTGGAGT
    GTGTATACATACGGCGTGCGCATCAAACGTCGCTTTGTGAACGGCCGCTTCTCAAACGAA
    AGTGATACCATTGACATAACCAAAGATATGGAGAAAACGTTGGAAATGACGGACATTAAC
    TGGCGCGATGGCCACGATCTTCGTCAAGACATTATAGATTATGAAATTGTTCAGCACATA
    TTCGAAATTTTCCGTTTAACAGTGCAAATGCGTAACTCCTTGTCTGAACTGGAGGACCGT
    GATTACGATCGTCTCATTTCACCTGTACTGAACGAAAATAACATTTTTTATGACAGCGCG
    AAAGCGGGGGATGCACTTCCTAAGGATGCCGATGCAAATGGTGCGTATTGTATTGCATTA
    AAAGGGTTATATGAAATTAAACAAATTACCGAAAATTGGAAAGAAGATGGTAAATTTTCG
    CGCGATAAACTCAAAATCAGCAATAAAGATTGGTTCGACTTTATCCAGAATAAGCGCTAT
    CTCTAAGAAATCATCCTTAGCGAAAGCTAAGGATTTTTTTTATCTGAAATTTATTATATC
    GCGTTGATTATTGATGCTGTTTTTAGTTTTAACGGCAATTAATATATGTGTTATTAATTG
    AATGAATTTTATCATTCATAATAAGTATGTGTAGGATCAAGCTCAGGTTAAATATTCACT
    CAGGAAGTTATTACTCAGGAAGCAAAGAGGATTACAGAATTATCTCATAACAAGTGTTAA
    GGGATGTTATTTCC
    SEQ ID NO: 68
    AATCAGGAGAGCGTTTTCAATCCTACCTCTGGCGCAGTTGATATGTCAAACAGGTTGCCG
    TCACTGCGTCTTTTACTGGCTCTTCTCGCTAACCAAACCGGTAACCCCGCTTATTAAAAG
    CATTCTGTAACAAAGCGGGACCAAAGCCATGACAAAAACGCGTAACAAAAGTGTCTATAA
    TCACGGCAGAAAAGTCCACATTGATTATTTGCACGGCGTCACACTTTGCTATGCCATAGC
    ATTTTTATCCATAAGATTAGCGGATCCTACCTGACGCTTTTTATCGCAACTCTCTACTGT
    TTCTCCATACCCGTTTTTTTGGGCTAGCAGTAATACGACTCACTATAGGGGTCTCATCTC
    GTGTGAGATAGGCGGAGATACGAACTTTAAGAGGAGGATATACCATGCACCATCATCATC
    ACCATACCAATAAATTCACTAACCAGTATTCTCTCTCTAAGACCCTGCGCTTTGAACTGA
    TTCCGCAGGGGAAAACCTTGGAGTTCATTCAAGAAAAAGGCCTCTTGTCTCAGGATAAAC
    AGAGGGCTGAATCTTACCAAGAAATGAAGAAAACTATTGATAAGTTTCATAAATATTTCA
    TTGATTTAGCCTTGTCTAACGCCAAATTAACTCACTTGGAAACGTATCTGGAGTTATACA
    ACAAATCTGCCGAAACTAAGAAAGAACAGAAATTTAAAGACGATTTGAAAAAAGTACAGG
    ACAATCTGCGTAAAGAAATTGTCAAATCCTTCAGTGACGGCGATGCTAAAAGCATTTTTG
    CCATTCTGGACAAAAAAGAGTTGATTACTGTGGAATTAGAAAAGTGGTTTGAAAACAATG
    AGCAGAAAGACATCTACTTCGATGAGAAATTCAAAACTTTCACCACCTATTTTACAGGAT
    TTCATCAAAACCGGAAGAACATGTACTCAGTAGAACCGAACTCCACGGCCATTGCGTATC
    GTTTGATCCATGAGAATCTGCCTAAATTTCTGGAGAATGCGAAAGCCTTTGAAAAGATTA
    AGCAGGTCGAATCGCTGCAAGTGAATTTTCGTGAACTCATGGGCGAATTTGGTGACGAAG
    GTCTAATCTTCGTTAACGAACTGGAAGAAATGTTTCAGATTAATTACTACAATGACGTGC
    TATCGCAGAACGGTATCACAATCTACAATAGTATTATCTCAGGGTTCACAAAAAACGATA
    TAAAATACAAAGGCCTGAACGAGTATATCAATAACTACAACCAAACAAAGGACAAAAAGG
    ATAGGCTTCCGAAACTGAAGCAGTTATACAAACAGATTTTATCTGACAGAATCTCCCTGA
    GCTTTCTGCCGGATGCTTTCACTGATGGGAAGCAGGTTCTGAAAGCGATTTTCGATTTTT
    ATAAGATTAACTTACTGAGCTACACGATTGAAGGTCAAGAAGAATCTCAAAACTTACTGC
    TCTTGATCCGTCAAACCATTGAAAATCTATCATCGTTCGATACGCAGAAAATCTACCTCA
    AAAACGATACTCACCTGACTACGATCTCTCAGCAGGTTTTCGGGGATTTTAGTGTATTTT
    CAACAGCTCTGAACTACTGGTATGAAACCAAAGTCAATCCGAAATTCGAGACGGAATATT
    CTAAGGCCAACGAAAAAAAACGTGAGATTCTTGATAAAGCTAAAGCCGTATTTACTAAAC
    AGGATTACTTTTCTATTGCTTTCCTGCAGGAAGTTTTATCGGAGTATATCCTGACCCTGG
    ATCATACATCTGATATCGTTAAAAAACACAGCAGCAATTGCATCGCTGACTATTTCAAAA
    ACCACTTTGTCGCCAAAAAAGAAAACGAAACAGACAAGACTTTCGATTTCATTGCTAACA
    TCACCGCAAAATACCAGTGTATTCAGGGTATCTTGGAAAACGCCGACCAATACGAAGACG
    AACTGAAACAAGATCAGAAGCTGATCGATAATTTAAAATTCTTCTTAGATGCAATCCTGG
    AGCTGCTGCACTTCATCAAACCGCTTCATTTAAAGAGCGAGTCCATTACCGAAAAGGACA
    CCGCCTTCTATGACGTTTTTGAAAATTATTATGAAGCCCTCTCCTTGCTGACTCCGCTGT
    ATAATATGGTACGCAATTACGTAACCCAGAAACCATATTCTACCGAAAAAATTAAACTGA
    ACTTTGAAAACGCACAGCTGCTCAACGGTTGGGACGCGAATAAAGAAGGTGACTACCTCA
    CCACCATCCTGAAAAAAGATGGTAACTATTTTCTGGCAATTATGGATAAGAAACATAATA
    AAGCATTCCAGAAATTTCCTGAAGGGAAAGAAAATTACGAAAAGATGGTGTACAAACTCT
    TACCTGGAGTTAACAAAATGTTGCCGAAAGTATTTTTTAGTAATAAGAACATCGCGTACT
    TTAACCCGTCCAAAGAACTGCTGGAAAATTATAAAAAGGAGACGCATAAGAAAGGGGATA
    CCTTTAACCTGGAACATTGCCATACCTTAATAGACTTCTTCAAGGATTCCCTGAATAAAC
    ACGAGGATTGGAAATATTTCGATTTTCAGTTTAGTGAGACCAAGTCATACCAGGATCTTA
    GCGGCTTTTATCGCGAAGTAGAACACCAAGGCTATAAAATTAACTTCAAAAACATCGACA
    GCGAATACATCGACGGTTTAGTTAACGAGGGCAAACTGTTTCTGTTCCAGATCTATTCAA
    AGGATTTTAGCCCGTTCTCTAAAGGCAAACCAAATATGCATACGTTGTACTGGAAAGCAC
    TGTTTGAAGAGCAAAACCTGCAGAATGTGATTTATAAACTGAACGGCCAAGCTGAGATTT
    TTTTCCGTAAAGCCTCGATTAAACCGAAAAATATCATCCTTCATAAGAAGAAAATAAAGA
    TCGCTAAAAAACACTTCATAGATAAAAAAACCAAAACCTCCGAAATAGTGCCTGTTCAAA
    CAATTAAGAACTTGAATATGTACTACCAGGGCAAGATATCGGAAAAGGAGTTGACTCAAG
    ACGATCTTCGCTATATCGATAACTTTTCGATTTTTAACGAAAAAAACAAGACGATCGACA
    TCATCAAAGATAAACGCTTCACTGTAGATAAGTTCCAGTTTCATGTGCCGATTACTATGA
    ACTTCAAAGCTACCGGGGGTAGCTATATCAACCAAACGGTGTTGGAATACCTGCAGAATA
    ACCCGGAAGTCAAAATCATTGGGCTGGACCGCGGAGAACGTCACCTTGTGTACTTGACCT
    TAATCGATCAGCAAGGCAACATCTTAAAACAAGAATCGCTGAATACCATTACGGATTCAA
    AGATTAGCACCCCGTATCATAAGCTGCTCGATAACAAGGAGAATGAGCGCGACCTGGCCC
    GTAAAAACTGGGGCACGGTGGAAAACATTAAGGAGTTAAAGGAGGGTTATATTTCCCAGG
    TAGTGCATAAGATCGCCACTCTCATGCTCGAGGAAAATGCGATCGTTGTCATGGAAGACT
    TAAACTTCGGATTTAAACGTGGGCGATTTAAAGTAGAGAAACAAATCTACCAGAAGTTAG
    AAAAAATGCTGATTGACAAATTAAATTACTTGGTCCTAAAAGACAAACAGCCGCAAGAAT
    TGGGTGGATTATACAACGCCCTCCAACTTACCAATAAATTCGAAAGTTTTCAGAAAATGG
    GTAAACAGTCAGGCTTTCTTTTTTATGTTCCTGCGTGGAACACATCCAAAATCGACCCTA
    CAACCGGCTTCGTCAATTACTTCTATACTAAATATGAAAACGTCGACAAAGCAAAAGCAT
    TCTTTGAAAAGTTCGAAGCAATACGTTTTAACGCTGAGAAAAAATATTTCGAGTTCGAAG
    TCAAGAAATACTCAGACTTTAACCCCAAAGCTGAGGGCACACAGCAAGCGTGGACAATCT
    GCACCTACGGCGAGCGCATCGAAACGAAGCGTCAAAAAGATCAGAATAACAAATTTGTTT
    CAACACCTATCAACCTGACCGAGAAGATTGAAGACTTCTTAGGTAAAAATCAGATTGTTT
    ATGGCGACGGTAACTGTATAAAATCTCAAATAGCCTCAAAGGATGATAAAGCATTTTTCG
    AAACATTATTATATTGGTTCAAAATGACACTGCAGATGCGCAATAGTGAGACGCGTACAG
    ATATTGATTATCTTATCAGCCCGGTCATGAACGACAACGGTACTTTTTACAACTCCAGAG
    ACTATGAAAAACTTGAGAATCCAACTCTCCCCAAAGATGCTGATGCGAACGGTGCTTATC
    ACATCGCGAAAAAAGGTCTGATGCTGCTGAACAAAATCGACCAAGCCGATCTGACTAAGA
    AAGTTGACCTAAGCATTTCAAATCGGGACTGGTTACAGTTTGTTCAAAAGAACAAATGAG
    AAATCATCCTTAGCGAAAGCTAAGGATTTTTTTTATCTGAAATGTAGGGAGACCCTCAGG
    TTAAATATTCACTCAGGAAGTTATTACTCAGGAAGCAAAGAGGATTACA
    SEQ ID NO: 69
    AAAATTCcatGCAAAATGCTCCGGTTTCATGTCATCAAAATGATGACGTAATTAAGCATT
    GATAATTGAGATCCCTCTCCCTGACAGGATGATTACATAAATAATAGTGACAAAAATAAA
    TTATTTATTTATCCAGAAAATGAATTGGAAAATCAGGAGAGCGTTTTCAATCCTACCTCT
    GGCGCAGTTGATATGTcaaaCAGGTtgccgtcactgcgtcttttactggctcttctcgct
    aaccaaaccggtaaccccgcttattaaaagcattctgtaacaaagcgggaccaaagccat
    gacaaaaacgcgtaacaaaagtgtctataatcacggcagaaaagtccacattgattattt
    gcacggcgtcacactttgctatgccatagcatttttatccataagattagcggatcctac
    ctgacgctttttatcgcaactctctactgtttctccatacccgtttttttgggctagcac
    cgcctatctcgtgtgagataggcggagatacgaactttaagAAGGAGatataccATGGAA
    CAGGAATATTATCTGGGCTTGGACATGGGCACCGGTTCCGTCGGCTGGGCTGTTACTGAC
    AGTGAATATCACGTTCTAAGAAAGCATGGTAAGGCATTGTGGGGTGTAAGACTTTTCGAA
    TCTGCTTCCACTGCTGAAGAGCGTAGAATGTTTAGAACGAGTCGACGTAGGCTAGACAGG
    CGCAATTGGAGAATCGAAATTTTACAAGAAATTTTTGCGGAAGAGATATCTAAGAAAGAC
    CCAGGCTTTTTCCTGAGAATGAAGGAATCTAAGTATTACCCTGAGGATAAAAGAGATATA
    AATGGTAACTGTCCCGAATTGCCTTACGCATTATTTGTGGACGATGATTTTACCGATAAG
    GATTACCATAAAAAGTTCCCAACTATCTACCATTTACGCAAAATGTTAATGAATACAGAG
    GAAACCCCAGACATAAGACTAGTTTATCTGGCAATACACCATATGATGAAACATAGAGGC
    CATTTCTTACTTTCCGGGGATATCAACGAAATCAAAGAGTTTGGTACCACATTTAGTAAG
    TTACTGGAAAACATAAAGAATGAAGAATTGGATTGGAACTTAGAACTCGGAAAAGAAGAA
    TACGCGGTTGTCGAATCTATCCTGAAGGATAATATGCTGAATAGGTCGACCAAAAAAACT
    AGGCTGATCAAAGCACTGAAAGCCAAATCTATCTGCGAAAAAGCTGTTTTAAATTTACTT
    GCTGGTGGCACTGTTAAGTTATCAGACATTTTTGGTTTGGAAGAATTGAACGAAACCGAG
    CGTCCAAAAATTAGTTTCGCTGATAATGGCTACGATGATTACATTGGTGAGGTGGAAAAC
    GAGTTGGGCGAACAATTTTATATTATAGAGACAGCTAAGGCAGTCTATGACTGGGCTGTT
    TTAGTAGAAATCCTTGGTAAATACACATCTATCTCCGAAGCGAAAGTTGCTACTTACGAA
    AAGCACAAGTCCGATCTCCAGTTTTTGAAGAAAATTGTCAGGAAATATCTGACTAAGGAA
    GAATATAAAGATATTTTCGTTAGTACCTCTGACAAACTGAAAAATTACTCCGCTTACATC
    GGGATGACCAAGATTAATGGCAAAAAAGTTGATCTGCAAAGCAAAAGGTGTTCGAAGGAA
    GAATTTTATGATTTCATTAAAAAGAATGTCTTAAAAAAATTAGAAGGTCAGCCAGAATAC
    GAATATTTGAAAGAAGAACTGGAAAGAGAGACATTCTTACCAAAACAAGTCAACAGAGAT
    AATGGGGTAATTCCATATCAAATTCACCTCTACGAATTAAAAAAAATTTTAGGCAATTTA
    CGCGATAAAATTGACCTTATCAAAGAAAATGAGGATAAGCTGGTTCAACTCTTTGAATTC
    AGAATACCCTATTATGTGGGCCCACTGAACAAGATTGATGACGGCAAAGAAGGTAAATTC
    ACATGGGCCGTCCGCAAATCCAATGAAAAAATTTACCCATGGAACTTTGAAAATGTAGTA
    GATATTGAAGCGTCTGCGGAGAAATTTATTCGAAGAATGACTAATAAATGCACTTACTTG
    ATGGGAGAGGATGTTCTGCCTAAAGACAGCTTATTATACAGCAAGTACATGGTTCTAAAC
    GAACTTAACAACGTTAAGTTGGACGGTGAGAAATTAAGTGTAGAATTGAAACAAAGATTG
    TATACTGACGTCTTCTGCAAGTACAGAAAAGTGACAGTTAAAAAAATTAAGAATTACTTG
    AAGTGCGAAGGTATAATTTCTGGAAACGTAGAGATTACTGGTATTGATGGTGATTTCAAA
    GCATCCCTAACAGCTTACCACGATTTCAAGGAAATCCTGACAGGAACTGAACTCGCAAAA
    AAAGATAAAGAAAACATTATTACTAATATTGTTCTTTTCGGTGATGACAAGAAATTGTTG
    AAGAAAAGACTGAATAGACTTTACCCCCAGATTACTCCCAATCAACTTAAGAAAATTTGT
    GCTTTGTCTTACACAGGATGGGGTCGTTTTTCAAAAAAGTTCTTAGAAGAGATTACCGCA
    CCTGATCCAGAAACAGGCGAAGTATGGAATATAATTACCGCCTTATGGGAATCGAACAAT
    AATCTTATGCAACTTCTGAGCAATGAATATCGTTTCATGGAAGAAGTTGAGACTTACAAC
    ATGGGCAAACAGACGAAGACTTTATCCTATGAAACTGTGGAAAATATGTATGTATCACCT
    TCTGTCAAGAGACAAATTTGGCAAACCTTAAAAATTGTCAAAGAATTAGAAAAGGTAATG
    AAGGAGTCTCCTAAACGTGTGTTTATTGAAATGGCTAGAGAAAAACAAGAGTCAAAAAGA
    ACCGAGTCAAGAAAGAAGCAGTTAATCGATTTATATAAGGCTTGTAAAAACGAAGAGAAA
    GATTGGGTTAAAGAATTGGGGGACCAAGAGGAACAAAAACTACGGTCGGATAAGTTGTAT
    TTATACTATACGCAAAAGGGACGATGTATGTATTCCGGCGAGGTAATAGAATTGAAGGAT
    TTATGGGACAATACAAAATATGACATAGACCATATATATCCCCAATCAAAAACGATGGAC
    GATAGCTTGAACAATAGAGTACTCGTGAAAAAAAAATATAATGCGACCAAATCTGATAAG
    TATCCTCTGAATGAAAATATCAGACATGAAAGAAAGGGGTTCTGGAAGTCCTTGTTAGAT
    GGTGGGTTTATAAGCAAAGAAAAGTACGAGCGTCTAATAAGAAACACGGAGTTATCGCCA
    GAAGAACTCGCTGGTTTTATTGAGAGGCAAATCGTGGAAACGAGACAATCTACCAAAGCC
    GTTGCTGAGATCCTAAAGCAAGTTTTCCCAGAGTCGGAGATTGTCTATGTCAAAGCTGGC
    ACAGTGAGCAGGTTTAGGAAAGACTTCGAACTATTAAAGGTAAGAGAAGTGAACGATTTA
    CATCACGCAAAGGACGCTTACCTAAATATCGTTGTAGGTAACTCATATTATGTTAAATTT
    ACCAAGAACGCCTCTTGGTTTATAAAGGAGAACCCAGGTAGAACATATAACCTGAAAAAG
    ATGTTCACCTCTGGTTGGAATATTGAGAGAAACGGAGAAGTCGCATGGGAAGTTGGTAAG
    AAAGGGACTATAGTGACAGTAAAGCAAATTATGAACAAAAATAATATCCTCGTTACAAGG
    CAGGTTCATGAAGCAAAGGGCGGCCTTTTTGACCAACAAATTATGAAGAAAGGGAAAGGT
    CAAATTGCAATAAAAGAAACCGATGAGAGACTAGCGTCAATAGAAAAGTATGGTGGCTAT
    AATAAAGCTGCGGGTGCATACTTTATGCTTGTTGAATCAAAAGACAAGAAAGGTAAGACT
    ATTAGAACTATAGAATTTATACCCCTGTACCTTAAAAACAAAATTGAATCGGATGAGTCA
    ATCGCGTTAAATTTTCTAGAGAAAGGAAGGGGTTTAAAAGAACCAAAGATCCTGTTAAAA
    AAGATTAAGATTGACACCTTGTTCGATGTAGATGGATTTAAAATGTGGTTATCTGGCAGA
    ACAGGCGATAGACTTTTGTTTAAGTGCGCTAATCAATTAATTTTGGATGAGAAAATCATT
    GTCACAATGAAAAAAATAGTTAAGTTTATTCAGAGAAGACAAGAAAACAGGGAGTTGAAA
    TTATCTGATAAAGATGGTATCGACAATGAAGTTTTAATGGAAATCTACAATACATTCGTT
    GATAAACTTGAAAATACCGTATATCGAATCAGGTTAAGTGAACAAGCCAAAACATTAATT
    GATAAACAAAAAGAATTTGAAAGGCTATCACTGGAAGACAAATCCTCCACCCTATTTGAA
    ATTTTGCATATATTCCAGTGCCAATCTTCAGCAGCTAATTTAAAAATGATTGGCGGACCT
    GGGAAAGCCGGCATCCTAGTGATGAACAATAATATCTCCAAGTGTAACAAAATATCAATT
    ATTAACCAATCTCCGACAGGTATTTTTGAAAATGAAATAGACTTGCTTAAGATATAAGAA
    ATCATCCTTAGCGAAAGCTAAGGATTTTTTTTATCTGAAATTTATTATATCGCGTTGATT
    ATTGATGCTGTTTTTAGTTTTAACGGCAATTAATATATGTGTTATTAATTGAATGAATTT
    TATCATTCATAATAAGTATGTGTAGGATCAAGCTCAGGTTAAATATTCACTCAGGAAGTT
    ATTACTCAGGAAGCAAAGAGGATTACAGAATTATCTCATAACAAGTGTTAAGGGATGTTA
    TTTCC
    SEQ ID NO: 70
    AATCAGGAGAGCGTTTTCAATCCTACCTCTGGCGCAGTTGATATGTCAAACAGGTTGCCG
    TCACTGCGTCTTTTACTGGCTCTTCTCGCTAACCAAACCGGTAACCCCGCTTATTAAAAG
    CATTCTGTAACAAAGCGGGACCAAAGCCATGACAAAAACGCGTAACAAAAGTGTCTATAA
    TCACGGCAGAAAAGTCCACATTGATTATTTGCACGGCGTCACACTTTGCTATGCCATAGC
    ATTTTTATCCATAAGATTAGCGGATCCTACCTGACGCTTTTTATCGCAACTCTCTACTGT
    TTCTCCATACCCGTTTTTTTGGGCTAGCAGTAATACGACTCACTATAGGGGTCTCATCTC
    GTGTGAGATAGGCGGAGATACGAACTTTAAGAGGAGGATATACCATGCACCATCATCATC
    ACCATTCTTTCGACTCTTTCACCAACCTGTACTCTCTGTCTAAAACCCTGAAATTCGAAA
    TGCGTCCGGTTGGTAACACCCAGAAAATGCTGGACAACGCGGGTGTTTTCGAAAAAGACA
    AACTGATCCAGAAAAAATACGGTAAAACCAAACCGTACTTCGACCGTCTGCACCGTGAAT
    TCATCGAAGAAGCGCTGACCGGTGTTGAACTGATCGGTCTGGACGAAAACTTCCGTACCC
    TGGTTGACTGGCAGAAAGACAAAAAAAACAACGTTGCGATGAAAGCGTACGAAAACTCTC
    TGCAGCGTCTGCGTACCGAAATCGGTAAAATCTTCAACCTGAAAGCGGAAGACTGGGTTA
    AAAACAAATACCCGATCCTGGGTCTGAAAAACAAAAACACCGACATCCTGTTCGAAGAAG
    CGGTTTTCGGTATCCTGAAAGCGCGTTACGGTGAAGAAAAAGACACCTTCATCGAAGTTG
    AAGAAATCGACAAAACCGGTAAATCTAAAATCAACCAGATCTCTATCTTCGACTCTTGGA
    AAGGTTTCACCGGTTACTTCAAAAAATTCTTCGAAACCCGTAAAAACTTCTACAAAAACG
    ACGGTACCTCTACCGCGATCGCGACCCGTATCATCGACCAGAACCTGAAACGTTTCATCG
    ACAACCTGTCTATCGTTGAATCTGTTCGTCAGAAAGTTGACCTGGCGGAAACCGAAAAAT
    CTTTCTCTATCTCTCTGTCTCAGTTCTTCTCTATCGACTTCTACAACAAATGCCTGCTGC
    AGGACGGTATCGACTACTACAACAAAATCATCGGTGGTGAAACCCTGAAAAACGGTGAAA
    AACTGATCGGTCTGAACGAACTGATCAACCAGTACCGTCAGAACAACAAAGACCAGAAAA
    TCCCGTTCTTCAAACTGCTGGACAAACAGATCCTGTCTGAAAAAATCCTGTTCCTGGACG
    AAATCAAAAACGACACCGAACTGATCGAAGCGCTGTCTCAGTTCGCGAAAACCGCGGAAG
    AAAAAACCAAAATCGTTAAAAAACTGTTCGCGGACTTCGTTGAAAACAACTCTAAATACG
    ACCTGGCGCAGATCTACATCTCTCAGGAAGCGTTCAACACCATCTCTAACAAATGGACCT
    CTGAAACCGAAACCTTCGCGAAATACCTGTTCGAAGCGATGAAATCTGGTAAACTGGCGA
    AATACGAAAAAAAAGACAACTCTTACAAATTCCCGGACTTCATCGCGCTGTCTCAGATGA
    AATCTGCGCTGCTGTCTATCTCTCTGGAAGGTCACTTCTGGAAAGAAAAATACTACAAAA
    TCTCTAAATTCCAGGAAAAAACCAACTGGGAACAGTTCCTGGCGATCTTCCTGTACGAAT
    TCAACTCTCTGTTCTCTGACAAAATCAACACCAAAGACGGTGAAACCAAACAGGTTGGTT
    ACTACCTGTTCGCGAAAGACCTGCACAACCTGATCCTGTCTGAACAGATCGACATCCCGA
    AAGACTCTAAAGTTACCATCAAAGACTTCGCGGACTCTGTTCTGACCATCTACCAGATGG
    CGAAATACTTCGCGGTTGAAAAAAAACGTGCGTGGCTGGCGGAATACGAACTGGACTCTT
    TCTACACCCAGCCGGACACCGGTTACCTGCAGTTCTACGACAACGCGTACGAAGACATCG
    TTCAGGTTTACAACAAACTGCGTAACTACCTGACCAAAAAACCGTACTCTGAAGAAAAAT
    GGAAACTGAACTTCGAAAACTCTACCCTGGCGAACGGTTGGGACAAAAACAAAGAATCTG
    ACAACTCTGCGGTTATCCTGCAGAAAGGTGGTAAATACTACCTGGGTCTGATCACCAAAG
    GTCACAACAAAATCTTCGACGACCGTTTCCAGGAAAAATTCATCGTTGGTATCGAAGGTG
    GTAAATACGAAAAAATCGTTTACAAATTCTTCCCGGACCAGGCGAAAATGTTCCCGAAAG
    TTTGCTTCTCTGCGAAAGGTCTGGAATTCTTCCGTCCGTCTGAAGAAATCCTGCGTATCT
    ACAACAACGCGGAATTCAAAAAAGGTGAAACCTACTCTATCGACTCTATGCAGAAACTGA
    TCGACTTCTACAAAGACTGCCTGACCAAATACGAAGGTTGGGCGTGCTACACCTTCCGTC
    ACCTGAAACCGACCGAAGAATACCAGAACAACATCGGTGAATTCTTCCGTGACGTTGCGG
    AAGACGGTTACCGTATCGACTTCCAGGGTATCTCTGACCAGTACATCCACGAAAAAAACG
    AAAAAGGTGAACTGCACCTGTTCGAAATCCACAACAAAGACTGGAACCTGGACAAAGCGC
    GTGACGGTAAATCTAAAACCACCCAGAAAAACCTGCACACCCTGTACTTCGAATCTCTGT
    TCTCTAACGACAACGTTGTTCAGAACTTCCCGATCAAACTGAACGGTCAGGCGGAAATCT
    TCTACCGTCCGAAAACCGAAAAAGACAAACTGGAATCTAAAAAAGACAAAAAAGGTAACA
    AAGTTATCGACCACAAACGTTACTCTGAAAACAAAATCTTCTTCCACGTTCCGCTGACCC
    TGAACCGTACCAAAAACGACTCTTACCGTTTCAACGCGCAGATCAACAACTTCCTGGCGA
    ACAACAAAGACATCAACATCATCGGTGTTGACCGTGGTGAAAAACACCTGGTTTACTACT
    CTGTTATCACCCAGGCGTCTGACATCCTGGAATCTGGTTCTCTGAACGAACTGAACGGTG
    TTAACTACGCGGAAAAACTGGGTAAAAAAGCGGAAAACCGTGAACAGGCGCGTCGTGACT
    GGCAGGACGTTCAGGGTATCAAAGACCTGAAAAAAGGTTACATCTCTCAGGTTGTTCGTA
    AACTGGCGGACCTGGCGATCAAACACAACGCGATCATCATCCTGGAAGACCTGAACATGC
    GTTTCAAACAGGTTCGTGGTGGTATCGAAAAATCTATCTACCAGCAGCTGGAAAAAGCGC
    TGATCGACAAACTGTCTTTCCTGGTTGACAAAGGTGAAAAAAACCCGGAACAGGCGGGTC
    ACCTGCTGAAAGCGTACCAGCTGTCTGCGCCGTTCGAAACCTTCCAGAAAATGGGTAAAC
    AGACCGGTATCATCTTCTACACCCAGGCGTCTTACACCTCTAAATCTGACCCGGTTACCG
    GTTGGCGTCCGCACCTGTACCTGAAATACTTCTCTGCGAAAAAAGCGAAAGACGACATCG
    CGAAATTCACCAAAATCGAATTCGTTAACGACCGTTTCGAACTGACCTACGACATCAAAG
    ACTTCCAGCAGGCGAAAGAATACCCGAACAAAACCGTTTGGAAAGTTTGCTCTAACGTTG
    AACGTTTCCGTTGGGACAAAAACCTGAACCAGAACAAAGGTGGTTACACCCACTACACCA
    ACATCACCGAAAACATCCAGGAACTGTTCACCAAATACGGTATCGACATCACCAAAGACC
    TGCTGACCCAGATCTCTACCATCGACGAAAAACAGAACACCTCTTTCTTCCGTGACTTCA
    TCTTCTACTTCAACCTGATCTGCCAGATCCGTAACACCGACGACTCTGAAATCGCGAAAA
    AAAACGGTAAAGACGACTTCATCCTGTCTCCGGTTGAACCGTTCTTCGACTCTCGTAAAG
    ACAACGGTAACAAACTGCCGGAAAACGGTGACGACAACGGTGCGTACAACATCGCGCGTA
    AAGGTATCGTTATCCTGAACAAAATCTCTCAGTACTCTGAAAAAAACGAAAACTGCGAAA
    AAATGAAATGGGGTGACCTGTACGTTTCTAACATCGACTGGGACAACTTCGTTGAAATCA
    TCCTTAGCGAAAGCTAAGGATTTTTTTTATCTGAAATGTAGGGAGACCCTCAGGTTAAAT
    ATTCACTCAGGAAGTTATTACTCAGGAAGCAAAGAGGATTACA
    SEQ ID NO: 71
    AATCAGGAGAGCGTTTTCAATCCTACCTCTGGCGCAGTTGATATGTCAAACAGGTTGCCG
    TCACTGCGTCTTTTACTGGCTCTTCTCGCTAACCAAACCGGTAACCCCGCTTATTAAAAG
    CATTCTGTAACAAAGCGGGACCAAAGCCATGACAAAAACGCGTAACAAAAGTGTCTATAA
    TCACGGCAGAAAAGTCCACATTGATTATTTGCACGGCGTCACACTTTGCTATGCCATAGC
    ATTTTTATCCATAAGATTAGCGGATCCTACCTGACGCTTTTTATCGCAACTCTCTACTGT
    TTCTCCATACCCGTTTTTTTGGGCTAGCAGTAATACGACTCACTATAGGGGTCTCATCTC
    GTGTGAGATAGGCGGAGATACGAACTTTAAGAGGAGGATATACCATGCACCATCATCATC
    ACCATAACAAATTCGAAAACTTCACCGGTCTGTACCCGATCTCTAAAACCCTGCGTTTCG
    AACTGATCCCGCAGGGTAAAACCCTGGAATACATCGAAAAATCTGAAATCCTGGAAAACG
    ACAACTACCGTGCGGAAAAATACGAAGAAGTTAAAGACATCATCGACGGTTACCACAAAT
    GGTTCATCAACGAAACCCTGCACGACCTGCACATCAACTGGTCTGAACTGAAAGTTGCGC
    TGGAAAACAACCGTATCGAAAAATCTGACGCGTCTAAAAAAGAACTGCAGCGTGTTCAGA
    AAATCAAACGTGAAGAAATCTACAACGCGTTCATCGAACACGAAGCGTTCCAGTACCTGT
    TCAAAGAAAACCTGCTGTCTGACCTGCTGCCGATCCAGATCGAACAGTCTGAAGACCTGG
    ACGCGGAAAAAAAAAAACAGGCGGTTGAAACCTTCAACCGTTTCTCTACCTACTTCACCG
    GTTTCCACGAAAACCGTAAAAACATCTACTCTAAAGAAGGTATCTCTACCTCTGTTACCT
    ACCGTATCGTTCACGACAACTTCCCGAAATTCCTGGAAAACATGAAAGTTTTCGAAATCC
    TGCGTAACGAATGCCCGGAAGTTATCTCTGACACCGCGAACGAACTGGCGCCGTTCATCG
    ACGGTGTTCGTATCGAAGACATCTTCCTGATCGACTTCTTCAACTCTACCTTCTCTCAGA
    ACGGTATCGACTACTACAACCGTATCCTGGGTGGTGTTACCACCGAAACCGGTGAAAAAT
    ACCGTGGTATCAACGAATTCACCAACCTGTACCGTCAGCAGCACCCGGAATTCGGTAAAT
    CTAAAAAAGCGACCAAAATGGTTGTTCTGTTCAAACAGATCCTGTCTGACCGTGACACCC
    TGTCTTTCATCCCGGAAATGTTCGGTAACGACAAACAGGTTCAGAACTCTATCCAGCTGT
    TCTACAACCGTGAAATCTCTCAGTTCGAAAACGAAGGTGTTAAAACCGACGTTTGCACCG
    CGCTGGCGACCCTGACCTCTAAAATCGCGGAATTCGACACCGAAAAAATCTACATCCAGC
    AGCCGGAACTGCCGAACGTTTCTCAGCGTCTGTTCGGTTCTTGGAACGAACTGAACGCGT
    GCCTGTTCAAATACGCGGAACTGAAATTCGGTACCGCGGAAAAAGTTGCGAACCGTAAAA
    AAATCGACAAATGGCTGAAATCTGACCTGTTCTCTTTCACCGAACTGAACAAAGCGCTGG
    AATTCTCTGGTAAAGACGAACGTATCGAAAACTACTTCTCTGAAACCGGTATCTTCGCGC
    AGCTGGTTAAAACCGGTTTCGACGAAGCGCAGTCTATCCTGGAAACCGAATACACCTCTG
    AAGTTCACCTGAAAGACCAGCAGACCGACATCGAAAAAATCAAAACCTTCCTGGACGCGC
    TGCAGAACCTGATGCACCTGCTGAAATCTCTGTGCGTTTCTGAAGAAGCGGACCGTGACG
    CGGCGTTCTACAACGAATTCGACATGCTGTACAACCAGCTGAAACTGGTTGTTCCGCTGT
    ACAACAAAGTTCGTAACTACATCACCCAGAAACTGTTCCGTTCTGACAAAATCAAAATCT
    ACTTCGAAAACAAAGGTCAGTTCCTGGGTGGTTGGGTTGACTCTCAGACCGAAAACTCTG
    ACAACGGTACCCAGGCGGGTGGTTACATCTTCCGTAAAGAAAACGTTATCAACGAATACG
    ACTACTACCTGGGTATCTGCTCTGACCCGAAACTGTTCCGTCGTACCACCATCGTTTCTG
    AAAACGACCGTTCTTCTTTCGAACGTCTGGACTACTACCAGCTGAAAACCGCGTCTGTTT
    ACGGTAACTCTTACTGCGGTAAACACCCGTACACCGAAGACAAAAACGAACTGGTTAACT
    CTATCGACCGTTTCGTTCACCTGTCTGGTAACAACATCCTGATCGAAAAAATCGCGAAAG
    ACAAAGTTAAATCTAACCCGACCACCAACACCCCGTCTGGTTACCTGAACTTCATCCACC
    GTGAAGCGCCGAACACCTACGAATGCCTGCTGCAGGACGAAAACTTCGTTTCTCTGAACC
    AGCGTGTTGTTTCTGCGCTGAAAGCGACCCTGGCGACCCTGGTTCGTGTTCCGAAAGCGC
    TGGTTTACGCGAAAAAAGACTACCACCTGTTCTCTGAAATCATCAACGACATCGACGAAC
    TGTCTTACGAAAAAGCGTTCTCTTACTTCCCGGTTTCTCAGACCGAATTCGAAAACTCTT
    CTAACCGTACCATCAAACCGCTGCTGCTGTTCAAAATCTCTAACAAAGACCTGTCTTTCG
    CGGAAAACTTCGAAAAAGGTAACCGTCAGAAAATCGGTAAAAAAAACCTGCACACCCTGT
    ACTTCGAAGCGCTGATGAAAGGTAACCAGGACACCATCGACATCGGTACCGGTATGGTTT
    TCCACCGTGTTAAATCTCTGAACTACAACGAAAAAACCCTGAAATACGGTCACCACTCTA
    CCCAGCTGAACGAAAAATTCTCTTACCCGATCATCAAAGACAAACGTTTCGCGTCTGACA
    AATTCCTGTTCCACCTGTCTACCGAAATCAACTACAAAGAAAAACGTAAACCGCTGAACA
    ACTCTATCATCGAATTCCTGACCAACAACCCGGACATCAACATCATCGGTCTGGACCGTG
    GTGAACGTCACCTGATCTACCTGACCCTGATCAACCAGAAAGGTGAAATCCTGCGTCAGA
    AAACCTTCAACATCGTTGGTAACACCAACTACCACGAAAAACTGAACCAGCGTGAAAAAG
    AACGTGACAACGCGCGTAAATCTTGGGCGACCATCGGTAAAATCAAAGAACTGAAAGAAG
    GTTTCCTGTCTCTGGTTATCCACGAAATCGCGAAAATCATGGTTGAAAACAACGCGATCG
    TTGTTCTGGAAGACCTGAACTTCGGTTTCAAACGTGGTCGTTTCAAAGTTGAAAAACAGA
    TCTACCAGAAATTCGAAAAAATGCTGATCGACAAACTGAACTACCTGGTTTTCAAAGACA
    AAAAAGCGAACGAAGCGGGTGGTGTTCTGAAAGGTTACCAGCTGGCGGAAAAATTCGAAT
    CTTTCCAGAAAATGGGTAAACAGTCTGGTTTCCTGTTCTACGTTCCGGCGGCGTACACCT
    CTAAAATCGACCCGACCACCGGTTTCGTTAACATGCTGAACCTGAACTACACCAACATGA
    AAGACGCGCAGACCCTGCTGTCTGGTATGGACAAAATCTCTTTCAACGCGGACGCGAACT
    ACTTCGAATTCGAACTGGACTACGAAAAATTCAAAACCAACCAGACCGACCACACCAACA
    AATGGACCATCTGCACCGTTGGTGAAAAACGTTTCACCTACAACTCTGCGACCAAAGAAA
    CCACCACCGTTAACGTTACCGAAGACCTGAAAAAACTGCTGGACAAATTCGAAGTTAAAT
    ACTCTAACGGTGACAACATCAAAGACGAAATCTGCCGTCAGACCGACGCGAAATTCTTCG
    AAATCATCCTGTGGCTGCTGAAACTGACCATGCAGATGCGTAACTCTAACACCAAAACCG
    AAGAAGACTTCATCCTGTCTCCGGTTAAAAACTCTAACGGTGAATTCTTCCGTTCTAACG
    ACGACGCGAACGGTATCTGGCCGGCGGACGCGGACGCGAACGGTGCGTACCACATCGCGC
    TGAAAGGTCTGTACCTGGTTAAAGAATGCTTCAACAAAAACGAAAAATCTCTGAAAATCG
    AACACAAAAACTGGTTCAAATTCGCGCAGACCCGTTTCAACGGTTCTCTGACCAAAAACG
    GTTAAGAAATCATCCTTAGCGAAAGCTAAGGATTTTTTTTATCTGAAATGTAGGGAGACC
    CTCAGGTTAAATATTCACTCAGGAAGTTATTACTCAGGAAGCAAAGAGGATTACA
    SEQ ID NO: 72
    AATCAGGAGAGCGTTTTCAATCCTACCTCTGGCGCAGTTGATATGTCAAACAGGTTGCCG
    TCACTGCGTCTTTTACTGGCTCTTCTCGCTAACCAAACCGGTAACCCCGCTTATTAAAAG
    CATTCTGTAACAAAGCGGGACCAAAGCCATGACAAAAACGCGTAACAAAAGTGTCTATAA
    TCACGGCAGAAAAGTCCACATTGATTATTTGCACGGCGTCACACTTTGCTATGCCATAGC
    ATTTTTATCCATAAGATTAGCGGATCCTACCTGACGCTTTTTATCGCAACTCTCTACTGT
    TTCTCCATACCCGTTTTTTTGGGCTAGCAGTAATACGACTCACTATAGGGGTCTCATCTC
    GTGTGAGATAGGCGGAGATACGAACTTTAAGAGGAGGATATACCATGCACCATCATCATC
    ACCATACCCAGTTCGAAGGTTTCACCAACCTGTACCAGGTTTCTAAAACCCTGCGTTTCG
    AACTGATCCCGCAGGGTAAAACCCTGAAACACATCCAGGAACAGGGTTTCATCGAAGAAG
    ACAAAGCGCGTAACGACCACTACAAAGAACTGAAACCGATCATCGACCGTATCTACAAAA
    CCTACGCGGACCAGTGCCTGCAGCTGGTTCAGCTGGACTGGGAAAACCTGTCTGCGGCGA
    TCGACTCTTACCGTAAAGAAAAAACCGAAGAAACCCGTAACGCGCTGATCGAAGAACAGG
    CGACCTACCGTAACGCGATCCACGACTACTTCATCGGTCGTACCGACAACCTGACCGACG
    CGATCAACAAACGTCACGCGGAAATCTACAAAGGTCTGTTCAAAGCGGAACTGTTCAACG
    GTAAAGTTCTGAAACAGCTGGGTACCGTTACCACCACCGAACACGAAAACGCGCTGCTGC
    GTTCTTTCGACAAATTCACCACCTACTTCTCTGGTTTCTACGAAAACCGTAAAAACGTTT
    TCTCTGCGGAAGACATCTCTACCGCGATCCCGCACCGTATCGTTCAGGACAACTTCCCGA
    AATTCAAAGAAAACTGCCACATCTTCACCCGTCTGATCACCGCGGTTCCGTCTCTGCGTG
    AACACTTCGAAAACGTTAAAAAAGCGATCGGTATCTTCGTTTCTACCTCTATCGAAGAAG
    TTTTCTCTTTCCCGTTCTACAACCAGCTGCTGACCCAGACCCAGATCGACCTGTACAACC
    AGCTGCTGGGTGGTATCTCTCGTGAAGCGGGTACCGAAAAAATCAAAGGTCTGAACGAAG
    TTCTGAACCTGGCGATCCAGAAAAACGACGAAACCGCGCACATCATCGCGTCTCTGCCGC
    ACCGTTTCATCCCGCTGTTCAAACAGATCCTGTCTGACCGTAACACCCTGTCTTTCATCC
    TGGAAGAATTCAAATCTGACGAAGAAGTTATCCAGTCTTTCTGCAAATACAAAACCCTGC
    TGCGTAACGAAAACGTTCTGGAAACCGCGGAAGCGCTGTTCAACGAACTGAACTCTATCG
    ACCTGACCCACATCTTCATCTCTCACAAAAAACTGGAAACCATCTCTTCTGCGCTGTGCG
    ACCACTGGGACACCCTGCGTAACGCGCTGTACGAACGTCGTATCTCTGAACTGACCGGTA
    AAATCACCAAATCTGCGAAAGAAAAAGTTCAGCGTTCTCTGAAACACGAAGACATCAACC
    TGCAGGAAATCATCTCTGCGGCGGGTAAAGAACTGTCTGAAGCGTTCAAACAGAAAACCT
    CTGAAATCCTGTCTCACGCGCACGCGGCGCTGGACCAGCCGCTGCCGACCACCCTGAAAA
    AACAGGAAGAAAAAGAAATCCTGAAATCTCAGCTGGACTCTCTGCTGGGTCTGTACCACC
    TGCTGGACTGGTTCGCGGTTGACGAATCTAACGAAGTTGACCCGGAATTCTCTGCGCGTC
    TGACCGGTATCAAACTGGAAATGGAACCGTCTCTGTCTTTCTACAACAAAGCGCGTAACT
    ACGCGACCAAAAAACCGTACTCTGTTGAAAAATTCAAACTGAACTTCCAGATGCCGACCC
    TGGCGTCTGGTTGGGACGTTAACAAAGAAAAAAACAACGGTGCGATCCTGTTCGTTAAAA
    ACGGTCTGTACTACCTGGGTATCATGCCGAAACAGAAAGGTCGTTACAAAGCGCTGTCTT
    TCGAACCGACCGAAAAAACCTCTGAAGGTTTCGACAAAATGTACTACGACTACTTCCCGG
    ACGCGGCGAAAATGATCCCGAAATGCTCTACCCAGCTGAAAGCGGTTACCGCGCACTTCC
    AGACCCACACCACCCCGATCCTGCTGTCTAACAACTTCATCGAACCGCTGGAAATCACCA
    AAGAAATCTACGACCTGAACAACCCGGAAAAAGAACCGAAAAAATTCCAGACCGCGTACG
    CGAAAAAAACCGGTGACCAGAAAGGTTACCGTGAAGCGCTGTGCAAATGGATCGACTTCA
    CCCGTGACTTCCTGTCTAAATACACCAAAACCACCTCTATCGACCTGTCTTCTCTGCGTC
    CGTCTTCTCAGTACAAAGACCTGGGTGAATACTACGCGGAACTGAACCCGCTGCTGTACC
    ACATCTCTTTCCAGCGTATCGCGGAAAAAGAAATCATGGACGCGGTTGAAACCGGTAAAC
    TGTACCTGTTCCAGATCTACAACAAAGACTTCGCGAAAGGTCACCACGGTAAACCGAACC
    TGCACACCCTGTACTGGACCGGTCTGTTCTCTCCGGAAAACCTGGCGAAAACCTCTATCA
    AACTGAACGGTCAGGCGGAACTGTTCTACCGTCCGAAATCTCGTATGAAACGTATGGCGC
    ACCGTCTGGGTGAAAAAATGCTGAACAAAAAACTGAAAGACCAGAAAACCCCGATCCCGG
    ACACCCTGTACCAGGAACTGTACGACTACGTTAACCACCGTCTGTCTCACGACCTGTCTG
    ACGAAGCGCGTGCGCTGCTGCCGAACGTTATCACCAAAGAAGTTTCTCACGAAATCATCA
    AAGACCGTCGTTTCACCTCTGACAAATTCTTCTTCCACGTTCCGATCACCCTGAACTACC
    AGGCGGCGAACTCTCCGTCTAAATTCAACCAGCGTGTTAACGCGTACCTGAAAGAACACC
    CGGAAACCCCGATCATCGGTATCGACCGTGGTGAACGTAACCTGATCTACATCACCGTTA
    TCGACTCTACCGGTAAAATCCTGGAACAGCGTTCTCTGAACACCATCCAGCAGTTCGACT
    ACCAGAAAAAACTGGACAACCGTGAAAAAGAACGTGTTGCGGCGCGTCAGGCGTGGTCTG
    TTGTTGGTACCATCAAAGACCTGAAACAGGGTTACCTGTCTCAGGTTATCCACGAAATCG
    TTGACCTGATGATCCACTACCAGGCGGTTGTTGTTCTGGAAAACCTGAACTTCGGTTTCA
    AATCTAAACGTACCGGTATCGCGGAAAAAGCGGTTTACCAGCAGTTCGAAAAAATGCTGA
    TCGACAAACTGAACTGCCTGGTTCTGAAAGACTACCCGGCGGAAAAAGTTGGTGGTGTTC
    TGAACCCGTACCAGCTGACCGACCAGTTCACCTCTTTCGCGAAAATGGGTACCCAGTCTG
    GTTTCCTGTTCTACGTTCCGGCGCCGTACACCTCTAAAATCGACCCGCTGACCGGTTTCG
    TTGACCCGTTCGTTTGGAAAACCATCAAAAACCACGAATCTCGTAAACACTTCCTGGAAG
    GTTTCGACTTCCTGCACTACGACGTTAAAACCGGTGACTTCATCCTGCACTTCAAAATGA
    ACCGTAACCTGTCTTTCCAGCGTGGTCTGCCGGGTTTCATGCCGGCGTGGGACATCGTTT
    TCGAAAAAAACGAAACCCAGTTCGACGCGAAAGGTACCCCGTTCATCGCGGGTAAACGTA
    TCGTTCCGGTTATCGAAAACCACCGTTTCACCGGTCGTTACCGTGACCTGTACCCGGCGA
    ACGAACTGATCGCGCTGCTGGAAGAAAAAGGTATCGTTTTCCGTGACGGTTCTAACATCC
    TGCCGAAACTGCTGGAAAACGACGACTCTCACGCGATCGACACCATGGTTGCGCTGATCC
    GTTCTGTTCTGCAGATGCGTAACTCTAACGCGGCGACCGGTGAAGACTACATCAACTCTC
    CGGTTCGTGACCTGAACGGTGTTTGCTTCGACTCTCGTTTCCAGAACCCGGAATGGCCGA
    TGGACGCGGACGCGAACGGTGCGTACCACATCGCGCTGAAAGGTCAGCTGCTGCTGAACC
    ACCTGAAAGAATCTAAAGACCTGAAACTGCAGAACGGTATCTCTAACCAGGACTGGCTGG
    CGTACATCCAGGAACTGCGTAACTAGAAATCATCCTTAGCGAAAGCTAAGGATTTTTTTT
    ATCTGAAATGTAGGGAGACCCTCAGGTTAAATATTCACTCAGGAAGTTATTACTCAGGAA
    GCAAAGAGGATTACA
    SEQ ID NO: 73
    AATCAGGAGAGCGTTTTCAATCCTACCTCTGGCGCAGTTGATATGTCAAACAGGTTGCCG
    TCACTGCGTCTTTTACTGGCTCTTCTCGCTAACCAAACCGGTAACCCCGCTTATTAAAAG
    CATTCTGTAACAAAGCGGGACCAAAGCCATGACAAAAACGCGTAACAAAAGTGTCTATAA
    TCACGGCAGAAAAGTCCACATTGATTATTTGCACGGCGTCACACTTTGCTATGCCATAGC
    ATTTTTATCCATAAGATTAGCGGATCCTACCTGACGCTTTTTATCGCAACTCTCTACTGT
    TTCTCCATACCCGTTTTTTTGGGCTAGCAGTAATACGACTCACTATAGGGGTCTCATCTC
    GTGTGAGATAGGCGGAGATACGAACTTTAAGAGGAGGATATACCATGCACCATCATCATC
    ACCATGCGGTTAAATCTATCAAAGTTAAACTGCGTCTGGACGACATGCCGGAAATCCGTG
    CGGGTCTGTGGAAACTGCACAAAGAAGTTAACGCGGGTGTTCGTTACTACACCGAATGGC
    TGTCTCTGCTGCGTCAGGAAAACCTGTACCGTCGTTCTCCGAACGGTGACGGTGAACAGG
    AATGCGACAAAACCGCGGAAGAATGCAAAGCGGAACTGCTGGAACGTCTGCGTGCGCGTC
    AGGTTGAAAACGGTCACCGTGGTCCGGCGGGTTCTGACGACGAACTGCTGCAGCTGGCGC
    GTCAGCTGTACGAACTGCTGGTTCCGCAGGCGATCGGTGCGAAAGGTGACGCGCAGCAGA
    TCGCGCGTAAATTCCTGTCTCCGCTGGCGGACAAAGACGCGGTTGGTGGTCTGGGTATCG
    CGAAAGCGGGTAACAAACCGCGTTGGGTTCGTATGCGTGAAGCGGGTGAACCGGGTTGGG
    AAGAAGAAAAAGAAAAAGCGGAAACCCGTAAATCTGCGGACCGTACCGCGGACGTTCTGC
    GTGCGCTGGCGGACTTCGGTCTGAAACCGCTGATGCGTGTTTACACCGACTCTGAAATGT
    CTTCTGTTGAATGGAAACCGCTGCGTAAAGGTCAGGCGGTTCGTACCTGGGACCGTGACA
    TGTTCCAGCAGGCGATCGAACGTATGATGTCTTGGGAATCTTGGAACCAGCGTGTTGGTC
    AGGAATACGCGAAACTGGTTGAACAGAAAAACCGTTTCGAACAGAAAAACTTCGTTGGTC
    AGGAACACCTGGTTCACCTGGTTAACCAGCTGCAGCAGGACATGAAAGAAGCGTCTCCGG
    GTCTGGAATCTAAAGAACAGACCGCGCACTACGTTACCGGTCGTGCGCTGCGTGGTTCTG
    ACAAAGTTTTCGAAAAATGGGGTAAACTGGCGCCGGACGCGCCGTTCGACCTGTACGACG
    CGGAAATCAAAAACGTTCAGCGTCGTAACACCCGTCGTTTCGGTTCTCACGACCTGTTCG
    CGAAACTGGCGGAACCGGAATACCAGGCGCTGTGGCGTGAAGACGCGTCTTTCCTGACCC
    GTTACGCGGTTTACAACTCTATCCTGCGTAAACTGAACCACGCGAAAATGTTCGCGACCT
    TCACCCTGCCGGACGCGACCGCGCACCCGATCTGGACCCGTTTCGACAAACTGGGTGGTA
    ACCTGCACCAGTACACCTTCCTGTTCAACGAATTCGGTGAACGTCGTCACGCGATCCGTT
    TCCACAAACTGCTGAAAGTTGAAAACGGTGTTGCGCGTGAAGTTGACGACGTTACCGTTC
    CGATCTCTATGTCTGAACAGCTGGACAACCTGCTGCCGCGTGACCCGAACGAACCGATCG
    CGCTGTACTTCCGTGACTACGGTGCGGAACAGCACTTCACCGGTGAATTCGGTGGTGCGA
    AAATCCAGTGCCGTCGTGACCAGCTGGCGCACATGCACCGTCGTCGTGGTGCGCGTGACG
    TTTACCTGAACGTTTCTGTTCGTGTTCAGTCTCAGTCTGAAGCGCGTGGTGAACGTCGTC
    CGCCGTACGCGGCGGTTTTCCGTCTGGTTGGTGACAACCACCGTGCGTTCGTTCACTTCG
    ACAAACTGTCTGACTACCTGGCGGAACACCCGGACGACGGTAAACTGGGTTCTGAAGGTC
    TGCTGTCTGGTCTGCGTGTTATGTCTGTTGACCTGGGTCTGCGTACCTCTGCGTCTATCT
    CTGTTTTCCGTGTTGCGCGTAAAGACGAACTGAAACCGAACTCTAAAGGTCGTGTTCCGT
    TCTTCTTCCCGATCAAAGGTAACGACAACCTGGTTGCGGTTCACGAACGTTCTCAGCTGC
    TGAAACTGCCGGGTGAAACCGAATCTAAAGACCTGCGTGCGATCCGTGAAGAACGTCAGC
    GTACCCTGCGTCAGCTGCGTACCCAGCTGGCGTACCTGCGTCTGCTGGTTCGTTGCGGTT
    CTGAAGACGTTGGTCGTCGTGAACGTTCTTGGGCGAAACTGATCGAACAGCCGGTTGACG
    CGGCGAACCACATGACCCCGGACTGGCGTGAAGCGTTCGAAAACGAACTGCAGAAACTGA
    AATCTCTGCACGGTATCTGCTCTGACAAAGAATGGATGGACGCGGTTTACGAATCTGTTC
    GTCGTGTTTGGCGTCACATGGGTAAACAGGTTCGTGACTGGCGTAAAGACGTTCGTTCTG
    GTGAACGTCCGAAAATCCGTGGTTACGCGAAAGACGTTGTTGGTGGTAACTCTATCGAAC
    AGATCGAATACCTGGAACGTCAGTACAAATTCCTGAAATCTTGGTCTTTCTTCGGTAAAG
    TTTCTGGTCAGGTTATCCGTGCGGAAAAAGGTTCTCGTTTCGCGATCACCCTGCGTGAAC
    ACATCGACCACGCGAAAGAAGACCGTCTGAAAAAACTGGCGGACCGTATCATCATGGAAG
    CGCTGGGTTACGTTTACGCGCTGGACGAACGTGGTAAAGGTAAATGGGTTGCGAAATACC
    CGCCGTGCCAGCTGATCCTGCTGGAAGAACTGTCTGAATACCAGTTCAACAACGACCGTC
    CGCCGTCTGAAAACAACCAGCTGATGCAGTGGTCTCACCGTGGTGTTTTCCAGGAACTGA
    TCAACCAGGCGCAGGTTCACGACCTGCTGGTTGGTACCATGTACGCGGCGTTCTCTTCTC
    GTTTCGACGCGCGTACCGGTGCGCCGGGTATCCGTTGCCGTCGTGTTCCGGCGCGTTGCA
    CCCAGGAACACAACCCGGAACCGTTCCCGTGGTGGCTGAACAAATTCGTTGTTGAACACA
    CCCTGGACGCGTGCCCGCTGCGTGCGGACGACCTGATCCCGACCGGTGAAGGTGAAATCT
    TCGTTTCTCCGTTCTCTGCGGAAGAAGGTGACTTCCACCAGATCCACGCGGACCTGAACG
    CGGCGCAGAACCTGCAGCAGCGTCTGTGGTCTGACTTCGACATCTCTCAGATCCGTCTGC
    GTTGCGACTGGGGTGAAGTTGACGGTGAACTGGTTCTGATCCCGCGTCTGACCGGTAAAC
    GTACCGCGGACTCTTACTCTAACAAAGTTTTCTACACCAACACCGGTGTTACCTACTACG
    AACGTGAACGTGGTAAAAAACGTCGTAAAGTTTTCGCGCAGGAAAAACTGTCTGAAGAAG
    AAGCGGAACTGCTGGTTGAAGCGGACGAAGCGCGTGAAAAATCTGTTGTTCTGATGCGTG
    ACCCGTCTGGTATCATCAACCGTGGTAACTGGACCCGTCAGAAAGAATTCTGGTCTATGG
    TTAACCAGCGTATCGAAGGTTACCTGGTTAAACAGATCCGTTCTCGTGTTCCGCTGCAGG
    ACTCTGCGTGCGAAAACACCGGTGACATCTAAGAAATCATCCTTAGCGAAAGCTAAGGAT
    TTTTTTTATCTGAAATGTAGGGAGACCCTCAGGTTAAATATTCACTCAGGAAGTTATTAC
    TCAGGAAGCAAAGAGGATTACA
    SEQ ID NO: 74
    AATCAGGAGAGCGTTTTCAATCCTACCTCTGGCGCAGTTGATATGTCAAACAGGTTGCCG
    TCACTGCGTCTTTTACTGGCTCTTCTCGCTAACCAAACCGGTAACCCCGCTTATTAAAAG
    CATTCTGTAACAAAGCGGGACCAAAGCCATGACAAAAACGCGTAACAAAAGTGTCTATAA
    TCACGGCAGAAAAGTCCACATTGATTATTTGCACGGCGTCACACTTTGCTATGCCATAGC
    ATTTTTATCCATAAGATTAGCGGATCCTACCTGACGCTTTTTATCGCAACTCTCTACTGT
    TTCTCCATACCCGTTTTTTTGGGCTAGCAGTAATACGACTCACTATAGGGGTCTCATCTC
    GTGTGAGATAGGCGGAGATACGAACTTTAAGAGGAGGATATACCATGCACCATCATCATC
    ACCATGCGACCCGTTCTTTCATCCTGAAAATCGAACCGAACGAAGAAGTTAAAAAAGGTC
    TGTGGAAAACCCACGAAGTTCTGAACCACGGTATCGCGTACTACATGAACATCCTGAAAC
    TGATCCGTCAGGAAGCGATCTACGAACACCACGAACAGGACCCGAAAAACCCGAAAAAAG
    TTTCTAAAGCGGAAATCCAGGCGGAACTGTGGGACTTCGTTCTGAAAATGCAGAAATGCA
    ACTCTTTCACCCACGAAGTTGACAAAGACGTTGTTTTCAACATCCTGCGTGAACTGTACG
    AAGAACTGGTTCCGTCTTCTGTTGAAAAAAAAGGTGAAGCGAACCAGCTGTCTAACAAAT
    TCCTGTACCCGCTGGTTGACCCGAACTCTCAGTCTGGTAAAGGTACCGCGTCTTCTGGTC
    GTAAACCGCGTTGGTACAACCTGAAAATCGCGGGTGACCCGTCTTGGGAAGAAGAAAAAA
    AAAAATGGGAAGAAGACAAAAAAAAAGACCCGCTGGCGAAAATCCTGGGTAAACTGGCGG
    AATACGGTCTGATCCCGCTGTTCATCCCGTTCACCGACTCTAACGAACCGATCGTTAAAG
    AAATCAAATGGATGGAAAAATCTCGTAACCAGTCTGTTCGTCGTCTGGACAAAGACATGT
    TCATCCAGGCGCTGGAACGTTTCCTGTCTTGGGAATCTTGGAACCTGAAAGTTAAAGAAG
    AATACGAAAAAGTTGAAAAAGAACACAAAACCCTGGAAGAACGTATCAAAGAAGACATCC
    AGGCGTTCAAATCTCTGGAACAGTACGAAAAAGAACGTCAGGAACAGCTGCTGCGTGACA
    CCCTGAACACCAACGAATACCGTCTGTCTAAACGTGGTCTGCGTGGTTGGCGTGAAATCA
    TCCAGAAATGGCTGAAAATGGACGAAAACGAACCGTCTGAAAAATACCTGGAAGTTTTCA
    AAGACTACCAGCGTAAACACCCGCGTGAAGCGGGTGACTACTCTGTTTACGAATTCCTGT
    CTAAAAAAGAAAACCACTTCATCTGGCGTAACCACCCGGAATACCCGTACCTGTACGCGA
    CCTTCTGCGAAATCGACAAAAAAAAAAAAGACGCGAAACAGCAGGCGACCTTCACCCTGG
    CGGACCCGATCAACCACCCGCTGTGGGTTCGTTTCGAAGAACGTTCTGGTTCTAACCTGA
    ACAAATACCGTATCCTGACCGAACAGCTGCACACCGAAAAACTGAAAAAAAAACTGACCG
    TTCAGCTGGACCGTCTGATCTACCCGACCGAATCTGGTGGTTGGGAAGAAAAAGGTAAAG
    TTGACATCGTTCTGCTGCCGTCTCGTCAGTTCTACAACCAGATCTTCCTGGACATCGAAG
    AAAAAGGTAAACACGCGTTCACCTACAAAGACGAATCTATCAAATTCCCGCTGAAAGGTA
    CCCTGGGTGGTGCGCGTGTTCAGTTCGACCGTGACCACCTGCGTCGTTACCCGCACAAAG
    TTGAATCTGGTAACGTTGGTCGTATCTACTTCAACATGACCGTTAACATCGAACCGACCG
    AATCTCCGGTTTCTAAATCTCTGAAAATCCACCGTGACGACTTCCCGAAATTCGTTAACT
    TCAAACCGAAAGAACTGACCGAATGGATCAAAGACTCTAAAGGTAAAAAACTGAAATCTG
    GTATCGAATCTCTGGAAATCGGTCTGCGTGTTATGTCTATCGACCTGGGTCAGCGTCAGG
    CGGCGGCGGCGTCTATCTTCGAAGTTGTTGACCAGAAACCGGACATCGAAGGTAAACTGT
    TCTTCCCGATCAAAGGTACCGAACTGTACGCGGTTCACCGTGCGTCTTTCAACATCAAAC
    TGCCGGGTGAAACCCTGGTTAAATCTCGTGAAGTTCTGCGTAAAGCGCGTGAAGACAACC
    TGAAACTGATGAACCAGAAACTGAACTTCCTGCGTAACGTTCTGCACTTCCAGCAGTTCG
    AAGACATCACCGAACGTGAAAAACGTGTTACCAAATGGATCTCTCGTCAGGAAAACTCTG
    ACGTTCCGCTGGTTTACCAGGACGAACTGATCCAGATCCGTGAACTGATGTACAAACCGT
    ACAAAGACTGGGTTGCGTTCCTGAAACAGCTGCACAAACGTCTGGAAGTTGAAATCGGTA
    AAGAAGTTAAACACTGGCGTAAATCTCTGTCTGACGGTCGTAAAGGTCTGTACGGTATCT
    CTCTGAAAAACATCGACGAAATCGACCGTACCCGTAAATTCCTGCTGCGTTGGTCTCTGC
    GTCCGACCGAACCGGGTGAAGTTCGTCGTCTGGAACCGGGTCAGCGTTTCGCGATCGACC
    AGCTGAACCACCTGAACGCGCTGAAAGAAGACCGTCTGAAAAAAATGGCGAACACCATCA
    TCATGCACGCGCTGGGTTACTGCTACGACGTTCGTAAAAAAAAATGGCAGGCGAAAAACC
    CGGCGTGCCAGATCATCCTGTTCGAAGACCTGTCTAACTACAACCCGTACGAAGAACGTT
    CTCGTTTCGAAAACTCTAAACTGATGAAATGGTCTCGTCGTGAAATCCCGCGTCAGGTTG
    CGCTGCAGGGTGAAATCTACGGTCTGCAGGTTGGTGAAGTTGGTGCGCAGTTCTCTTCTC
    GTTTCCACGCGAAAACCGGTTCTCCGGGTATCCGTTGCTCTGTTGTTACCAAAGAAAAAC
    TGCAGGACAACCGTTTCTTCAAAAACCTGCAGCGTGAAGGTCGTCTGACCCTGGACAAAA
    TCGCGGTTCTGAAAGAAGGTGACCTGTACCCGGACAAAGGTGGTGAAAAATTCATCTCTC
    TGTCTAAAGACCGTAAACTGGTTACCACCCACGCGGACATCAACGCGGCGCAGAACCTGC
    AGAAACGTTTCTGGACCCGTACCCACGGTTTCTACAAAGTTTACTGCAAAGCGTACCAGG
    TTGACGGTCAGACCGTTTACATCCCGGAATCTAAAGACCAGAAACAGAAAATCATCGAAG
    AATTCGGTGAAGGTTACTTCATCCTGAAAGACGGTGTTTACGAATGGGGTAACGCGGGTA
    AACTGAAAATCAAAAAAGGTTCTTCTAAACAGTCTTCTTCTGAACTGGTTGACTCTGACA
    TCCTGAAAGACTCTTTCGACCTGGCGTCTGAACTGAAAGGTGAAAAACTGATGCTGTACC
    GTGACCCGTCTGGTAACGTTTTCCCGTCTGACAAATGGATGGCGGCGGGTGTTTTCTTCG
    GTAAACTGGAACGTATCCTGATCTCTAAACTGACCAACCAGTACTCTATCTCTACCATCG
    AAGACGACTCTTCTAAACAGTCTATGTAAGAAATCATCCTTAGCGAAAGCTAAGGATTTT
    TTTTATCTGAAATGTAGGGAGACCCTCAGGTTAAATATTCACTCAGGAAGTTATTACTCA
    GGAAGCAAAGAGGATTACA
    SEQ ID NO: 75
    AATCAGGAGAGCGTTTTCAATCCTACCTCTGGCGCAGTTGATATGTCAAACAGGTTGCCG
    TCACTGCGTCTTTTACTGGCTCTTCTCGCTAACCAAACCGGTAACCCCGCTTATTAAAAG
    CATTCTGTAACAAAGCGGGACCAAAGCCATGACAAAAACGCGTAACAAAAGTGTCTATAA
    TCACGGCAGAAAAGTCCACATTGATTATTTGCACGGCGTCACACTTTGCTATGCCATAGC
    ATTTTTATCCATAAGATTAGCGGATCCTACCTGACGCTTTTTATCGCAACTCTCTACTGT
    TTCTCCATACCCGTTTTTTTGGGCTAGCAGTAATACGACTCACTATAGGGGTCTCATCTC
    GTGTGAGATAGGCGGAGATACGAACTTTAAGAGGAGGATATACCATGCACCATCATCATC
    ACCATCCGACCCGTACCATCAACCTGAAACTGGTTCTGGGTAAAAACCCGGAAAACGCGA
    CCCTGCGTCGTGCGCTGTTCTCTACCCACCGTCTGGTTAACCAGGCGACCAAACGTATCG
    AAGAATTCCTGCTGCTGTGCCGTGGTGAAGCGTACCGTACCGTTGACAACGAAGGTAAAG
    AAGCGGAAATCCCGCGTCACGCGGTTCAGGAAGAAGCGCTGGCGTTCGCGAAAGCGGCGC
    AGCGTCACAACGGTTGCATCTCTACCTACGAAGACCAGGAAATCCTGGACGTTCTGCGTC
    AGCTGTACGAACGTCTGGTTCCGTCTGTTAACGAAAACAACGAAGCGGGTGACGCGCAGG
    CGGCGAACGCGTGGGTTTCTCCGCTGATGTCTGCGGAATCTGAAGGTGGTCTGTCTGTTT
    ACGACAAAGTTCTGGACCCGCCGCCGGTTTGGATGAAACTGAAAGAAGAAAAAGCGCCGG
    GTTGGGAAGCGGCGTCTCAGATCTGGATCCAGTCTGACGAAGGTCAGTCTCTGCTGAACA
    AACCGGGTTCTCCGCCGCGTTGGATCCGTAAACTGCGTTCTGGTCAGCCGTGGCAGGACG
    ACTTCGTTTCTGACCAGAAAAAAAAACAGGACGAACTGACCAAAGGTAACGCGCCGCTGA
    TCAAACAGCTGAAAGAAATGGGTCTGCTGCCGCTGGTTAACCCGTTCTTCCGTCACCTGC
    TGGACCCGGAAGGTAAAGGTGTTTCTCCGTGGGACCGTCTGGCGGTTCGTGCGGCGGTTG
    CGCACTTCATCTCTTGGGAATCTTGGAACCACCGTACCCGTGCGGAATACAACTCTCTGA
    AACTGCGTCGTGACGAATTCGAAGCGGCGTCTGACGAATTCAAAGACGACTTCACCCTGC
    TGCGTCAGTACGAAGCGAAACGTCACTCTACCCTGAAATCTATCGCGCTGGCGGACGACT
    CTAACCCGTACCGTATCGGTGTTCGTTCTCTGCGTGCGTGGAACCGTGTTCGTGAAGAAT
    GGATCGACAAAGGTGCGACCGAAGAACAGCGTGTTACCATCCTGTCTAAACTGCAGACCC
    AGCTGCGTGGTAAATTCGGTGACCCGGACCTGTTCAACTGGCTGGCGCAGGACCGTCACG
    TTCACCTGTGGTCTCCGCGTGACTCTGTTACCCCGCTGGTTCGTATCAACGCGGTTGACA
    AAGTTCTGCGTCGTCGTAAACCGTACGCGCTGATGACCTTCGCGCACCCGCGTTTCCACC
    CGCGTTGGATCCTGTACGAAGCGCCGGGTGGTTCTAACCTGCGTCAGTACGCGCTGGACT
    GCACCGAAAACGCGCTGCACATCACCCTGCCGCTGCTGGTTGACGACGCGCACGGTACCT
    GGATCGAAAAAAAAATCCGTGTTCCGCTGGCGCCGTCTGGTCAGATCCAGGACCTGACCC
    TGGAAAAACTGGAAAAAAAAAAAAACCGTCTGTACTACCGTTCTGGTTTCCAGCAGTTCG
    CGGGTCTGGCGGGTGGTGCGGAAGTTCTGTTCCACCGTCCGTACATGGAACACGACGAAC
    GTTCTGAAGAATCTCTGCTGGAACGTCCGGGTGCGGTTTGGTTCAAACTGACCCTGGACG
    TTGCGACCCAGGCGCCGCCGAACTGGCTGGACGGTAAAGGTCGTGTTCGTACCCCGCCGG
    AAGTTCACCACTTCAAAACCGCGCTGTCTAACAAATCTAAACACACCCGTACCCTGCAGC
    CGGGTCTGCGTGTTCTGTCTGTTGACCTGGGTATGCGTACCTTCGCGTCTTGCTCTGTTT
    TCGAACTGATCGAAGGTAAACCGGAAACCGGTCGTGCGTTCCCGGTTGCGGACGAACGTT
    CTATGGACTCTCCGAACAAACTGTGGGCGAAACACGAACGTTCTTTCAAACTGACCCTGC
    CGGGTGAAACCCCGTCTCGTAAAGAAGAAGAAGAACGTTCTATCGCGCGTGCGGAAATCT
    ACGCGCTGAAACGTGACATCCAGCGTCTGAAATCTCTGCTGCGTCTGGGTGAAGAAGACA
    ACGACAACCGTCGTGACGCGCTGCTGGAACAGTTCTTCAAAGGTTGGGGTGAAGAAGACG
    TTGTTCCGGGTCAGGCGTTCCCGCGTTCTCTGTTCCAGGGTCTGGGTGCGGCGCCGTTCC
    GTTCTACCCCGGAACTGTGGCGTCAGCACTGCCAGACCTACTACGACAAAGCGGAAGCGT
    GCCTGGCGAAACACATCTCTGACTGGCGTAAACGTACCCGTCCGCGTCCGACCTCTCGTG
    AAATGTGGTACAAAACCCGTTCTTACCACGGTGGTAAATCTATCTGGATGCTGGAATACC
    TGGACGCGGTTCGTAAACTGCTGCTGTCTTGGTCTCTGCGTGGTCGTACCTACGGTGCGA
    TCAACCGTCAGGACACCGCGCGTTTCGGTTCTCTGGCGTCTCGTCTGCTGCACCACATCA
    ACTCTCTGAAAGAAGACCGTATCAAAACCGGTGCGGACTCTATCGTTCAGGCGGCGCGTG
    GTT
    ACATCCCGCTGCCGCACGGTAAAGGTTGGGAACAGCGTTACGAACCGTGCCAGCTGATCC
    TGTTCGAAGACCTGGCGCGTTACCGTTTCCGTGTTGACCGTCCGCGTCGTGAAAACTCTC
    AGCTGATGCAGTGGAACCACCGTGCGATCGTTGCGGAAACCACCATGCAGGCGGAACTGT
    ACGGTCAGATCGTTGAAAACACCGCGGCGGGTTTCTCTTCTCGTTTCCACGCGGCGACCG
    GTGCGCCGGGTGTTCGTTGCCGTTTCCTGCTGGAACGTGACTTCGACAACGACCTGCCGA
    AACCGTACCTGCTGCGTGAACTGTCTTGGATGCTGGGTAACACCAAAGTTGAATCTGAAG
    AAGAAAAACTGCGTCTGCTGTCTGAAAAAATCCGTCCGGGTTCTCTGGTTCCGTGGGACG
    GTGGTGAACAGTTCGCGACCCTGCACCCGAAACGTCAGACCCTGTGCGTTATCCACGCGG
    ACATGAACGCGGCGCAGAACCTGCAGCGTCGTTTCTTCGGTCGTTGCGGTGAAGCGTTCC
    GTCTGGTTTGCCAGCCGCACGGTGACGACGTTCTGCGTCTGGCGTCTACCCCGGGTGCGC
    GTCTGCTGGGTGCGCTGCAGCAGCTGGAAAACGGTCAGGGTGCGTTCGAACTGGTTCGTG
    ACATGGGTTCTACCTCTCAGATGAACCGTTTCGTTATGAAATCTCTGGGTAAAAAAAAAA
    TCAAACCGCTGCAGGACAACAACGGTGACGACGAACTGGAAGACGTTCTGTCTGTTCTGC
    CGGAAGAAGACGACACCGGTCGTATCACCGTTTTCCGTGACTCTTCTGGTATCTTCTTCC
    CGTGCAACGTTTGGATCCCGGCGAAACAGTTCTGGCCGGCGGTTCGTGCGATGATCTGGA
    AAGTTATGGCGTCTCACTCTCTGGGTTAAGAAATCATCCTTAGCGAAAGCTAAGGATTTT
    TTTTATCTGAAATGTAGGGAGACCCTCAGGTTAAATATTCACTCAGGAAGTTATTACTCA
    GGAAGCAAAGAGGATTACA
    SEQ ID NO: 76
    AATCAGGAGAGCGTTTTCAATCCTACCTCTGGCGCAGTTGATATGTCAAACAGGTTGCCG
    TCACTGCGTCTTTTACTGGCTCTTCTCGCTAACCAAACCGGTAACCCCGCTTATTAAAAG
    CATTCTGTAACAAAGCGGGACCAAAGCCATGACAAAAACGCGTAACAAAAGTGTCTATAA
    TCACGGCAGAAAAGTCCACATTGATTATTTGCACGGCGTCACACTTTGCTATGCCATAGC
    ATTTTTATCCATAAGATTAGCGGATCCTACCTGACGCTTTTTATCGCAACTCTCTACTGT
    TTCTCCATACCCGTTTTTTTGGGCTAGCAGTAATACGACTCACTATAGGGGTCTCATCTC
    GTGTGAGATAGGCGGAGATACGAACTTTAAGAGGAGGATATACCATGCACCATCATCATC
    ACCATACCAAACTGCGTCACCGTCAGAAAAAACTGACCCACGACTGGGCGGGTTCTAAAA
    AACGTGAAGTTCTGGGTTCTAACGGTAAACTGCAGAACCCGCTGCTGATGCCGGTTAAAA
    AAGGTCAGGTTACCGAATTCCGTAAAGCGTTCTCTGCGTACGCGCGTGCGACCAAAGGTG
    AAATGACCGACGGTCGTAAAAACATGTTCACCCACTCTTTCGAACCGTTCAAAACCAAAC
    CGTCTCTGCACCAGTGCGAACTGGCGGACAAAGCGTACCAGTCTCTGCACTCTTACCTGC
    CGGGTTCTCTGGCGCACTTCCTGCTGTCTGCGCACGCGCTGGGTTTCCGTATCTTCTCTA
    AATCTGGTGAAGCGACCGCGTTCCAGGCGTCTTCTAAAATCGAAGCGTACGAATCTAAAC
    TGGCGTCTGAACTGGCGTGCGTTGACCTGTCTATCCAGAACCTGACCATCTCTACCCTGT
    TCAACGCGCTGACCACCTCTGTTCGTGGTAAAGGTGAAGAAACCTCTGCGGACCCGCTGA
    TCGCGCGTTTCTACACCCTGCTGACCGGTAAACCGCTGTCTCGTGACACCCAGGGTCCGG
    AACGTGACCTGGCGGAAGTTATCTCTCGTAAAATCGCGTCTTCTTTCGGTACCTGGAAAG
    AAATGACCGCGAACCCGCTGCAGTCTCTGCAGTTCTTCGAAGAAGAACTGCACGCGCTGG
    ACGCGAACGTTTCTCTGTCTCCGGCGTTCGACGTTCTGATCAAAATGAACGACCTGCAGG
    GTGACCTGAAAAACCGTACCATCGTTTTCGACCCGGACGCGCCGGTTTTCGAATACAACG
    CGGAAGACCCGGCGGACATCATCATCAAACTGACCGCGCGTTACGCGAAAGAAGCGGTTA
    TCAAAAACCAGAACGTTGGTAACTACGTTAAAAACGCGATCACCACCACCAACGCGAACG
    GTCTGGGTTGGCTGCTGAACAAAGGTCTGTCTCTGCTGCCGGTTTCTACCGACGACGAAC
    TGCTGGAATTCATCGGTGTTGAACGTTCTCACCCGTCTTGCCACGCGCTGATCGAACTGA
    TCGCGCAGCTGGAAGCGCCGGAACTGTTCGAAAAAAACGTTTTCTCTGACACCCGTTCTG
    AAGTTCAGGGTATGATCGACTCTGCGGTTTCTAACCACATCGCGCGTCTGTCTTCTTCTC
    GTAACTCTCTGTCTATGGACTCTGAAGAACTGGAACGTCTGATCAAATCTTTCCAGATCC
    ACACCCCGCACTGCTCTCTGTTCATCGGTGCGCAGTCTCTGTCTCAGCAGCTGGAATCTC
    TGCCGGAAGCGCTGCAGTCTGGTGTTAACTCTGCGGACATCCTGCTGGGTTCTACCCAGT
    ACATGCTGACCAACTCTCTGGTTGAAGAATCTATCGCGACCTACCAGCGTACCCTGAACC
    GTATCAACTACCTGTCTGGTGTTGCGGGTCAGATCAACGGTGCGATCAAACGTAAAGCGA
    TCGACGGTGAAAAAATCCACCTGCCGGCGGCGTGGTCTGAACTGATCTCTCTGCCGTTCA
    TCGGTCAGCCGGTTATCGACGTTGAATCTGACCTGGCGCACCTGAAAAACCAGTACCAGA
    CCCTGTCTAACGAATTCGACACCCTGATCTCTGCGCTGCAGAAAAACTTCGACCTGAACT
    TCAACAAAGCGCTGCTGAACCGTACCCAGCACTTCGAAGCGATGTGCCGTTCTACCAAAA
    AAAACGCGCTGTCTAAACCGGAAATCGTTTCTTACCGTGACCTGCTGGCGCGTCTGACCT
    CTTGCCTGTACCGTGGTTCTCTGGTTCTGCGTCGTGCGGGTATCGAAGTTCTGAAAAAAC
    ACAAAATCTTCGAATCTAACTCTGAACTGCGTGAACACGTTCACGAACGTAAACACTTCG
    TTTTCGTTTCTCCGCTGGACCGTAAAGCGAAAAAACTGCTGCGTCTGACCGACTCTCGTC
    CGGACCTGCTGCACGTTATCGACGAAATCCTGCAGCACGACAACCTGGAAAACAAAGACC
    GTGAATCTCTGTGGCTGGTTCGTTCTGGTTACCTGCTGGCGGGTCTGCCGGACCAGCTGT
    CTTCTTCTTTCATCAACCTGCCGATCATCACCCAGAAAGGTGACCGTCGTCTGATCGACC
    TGATCCAGTACGACCAGATCAACCGTGACGCGTTCGTTATGCTGGTTACCTCTGCGTTCA
    AATCTAACCTGTCTGGTCTGCAGTACCGTGCGAACAAACAGTCTTTCGTTGTTACCCGTA
    CCCTGTCTCCGTACCTGGGTTCTAAACTGGTTTACGTTCCGAAAGACAAAGACTGGCTGG
    TTCCGTCTCAGATGTTCGAAGGTCGTTTCGCGGACATCCTGCAGTCTGACTACATGGTTT
    GGAAAGACGCGGGTCGTCTGTGCGTTATCGACACCGCGAAACACCTGTCTAACATCAAAA
    AATCTGTTTTCTCTTCTGAAGAAGTTCTGGCGTTCCTGCGTGAACTGCCGCACCGTACCT
    TCATCCAGACCGAAGTTCGTGGTCTGGGTGTTAACGTTGACGGTATCGCGTTCAACAACG
    GTGACATCCCGTCTCTGAAAACCTTCTCTAACTGCGTTCAGGTTAAAGTTTCTCGTACCA
    ACACCTCTCTGGTTCAGACCCTGAACCGTTGGTTCGAAGGTGGTAAAGTTTCTCCGCCGT
    CTATCCAGTTCGAACGTGCGTACTACAAAAAAGACGACCAGATCCACGAAGACGCGGCGA
    AACGTAAAATCCGTTTCCAGATGCCGGCGACCGAACTGGTTCACGCGTCTGACGACGCGG
    GTTGGACCCCGTCTTACCTGCTGGGTATCGACCCGGGTGAATACGGTATGGGTCTGTCTC
    TGGTTTCTATCAACAACGGTGAAGTTCTGGACTCTGGTTTCATCCACATCAACTCTCTGA
    TCAACTTCGCGTCTAAAAAATCTAACCACCAGACCAAAGTTGTTCCGCGTCAGCAGTACA
    AATCTCCGTACGCGAACTACCTGGAACAGTCTAAAGACTCTGCGGCGGGTGACATCGCGC
    ACATCCTGGACCGTCTGATCTACAAACTGAACGCGCTGCCGGTTTTCGAAGCGCTGTCTG
    GTAACTCTCAGTCTGCGGCGGACCAGGTTTGGACCAAAGTTCTGTCTTTCTACACCTGGG
    GTGACAACGACGCGCAGAACTCTATCCGTAAACAGCACTGGTTCGGTGCGTCTCACTGGG
    ACATCAAAGGTATGCTGCGTCAGCCGCCGACCGAAAAAAAACCGAAACCGTACATCGCGT
    TCCCGGGTTCTCAGGTTTCTTCTTACGGTAACTCTCAGCGTTGCTCTTGCTGCGGTCGTA
    ACCCGATCGAACAGCTGCGTGAAATGGCGAAAGACACCTCTATCAAAGAACTGAAAATCC
    GTAACTCTGAAATCCAGCTGTTCGACGGTACCATCAAACTGTTCAACCCGGACCCGTCTA
    CCGTTATCGAACGTCGTCGTCACAACCTGGGTCCGTCTCGTATCCCGGTTGCGGACCGTA
    CCTTCAAAAACATCTCTCCGTCTTCTCTGGAATTCAAAGAACTGATCACCATCGTTTCTC
    GTTCTATCCGTCACTCTCCGGAATTCATCGCGAAAAAACGTGGTATCGGTTCTGAATACT
    TCTGCGCGTACTCTGACTGCAACTCTTCTCTGAACTCTGAAGCGAACGCGGCGGCGAACG
    TTGCGCAGAAATTCCAGAAACAGCTGTTCTTCGAACTGTAAGAAATCATCCTTAGCGAAA
    GCTAAGGATTTTTTTTATCTGAAATGTAGGGAGACCCTCAGGTTAAATATTCACTCAGGA
    AGTTATTACTCAGGAAGCAAAGAGGATTACA
    SEQ ID NO: 77
    AATCAGGAGAGCGTTTTCAATCCTACCTCTGGCGCAGTTGATATGTCAAACAGGTTGCCG
    TCACTGCGTCTTTTACTGGCTCTTCTCGCTAACCAAACCGGTAACCCCGCTTATTAAAAG
    CATTCTGTAACAAAGCGGGACCAAAGCCATGACAAAAACGCGTAACAAAAGTGTCTATAA
    TCACGGCAGAAAAGTCCACATTGATTATTTGCACGGCGTCACACTTTGCTATGCCATAGC
    ATTTTTATCCATAAGATTAGCGGATCCTACCTGACGCTTTTTATCGCAACTCTCTACTGT
    TTCTCCATACCCGTTTTTTTGGGCTAGCAGTAATACGACTCACTATAGGGGTCTCATCTC
    GTGTGAGATAGGCGGAGATACGAACTTTAAGAGGAGGATATACCATGCACCATCATCATC
    ACCATAAACGTATCCTGAACTCTCTGAAAGTTGCGGCGCTGCGTCTGCTGTTCCGTGGTA
    AAGGTTCTGAACTGGTTAAAACCGTTAAATACCCGCTGGTTTCTCCGGTTCAGGGTGCGG
    TTGAAGAACTGGCGGAAGCGATCCGTCACGACAACCTGCACCTGTTCGGTCAGAAAGAAA
    TCGTTGACCTGATGGAAAAAGACGAAGGTACCCAGGTTTACTCTGTTGTTGACTTCTGGC
    TGGACACCCTGCGTCTGGGTATGTTCTTCTCTCCGTCTGCGAACGCGCTGAAAATCACCC
    TGGGTAAATTCAACTCTGACCAGGTTTCTCCGTTCCGTAAAGTTCTGGAACAGTCTCCGT
    TCTTCCTGGCGGGTCGTCTGAAAGTTGAACCGGCGGAACGTATCCTGTCTGTTGAAATCC
    GTAAAATCGGTAAACGTGAAAACCGTGTTGAAAACTACGCGGCGGACGTTGAAACCTGCT
    TCATCGGTCAGCTGTCTTCTGACGAAAAACAGTCTATCCAGAAACTGGCGAACGACATCT
    GGGACTCTAAAGACCACGAAGAACAGCGTATGCTGAAAGCGGACTTCTTCGCGATCCCGC
    TGATCAAAGACCCGAAAGCGGTTACCGAAGAAGACCCGGAAAACGAAACCGCGGGTAAAC
    AGAAACCGCTGGAACTGTGCGTTTGCCTGGTTCCGGAACTGTACACCCGTGGTTTCGGTT
    CTATCGCGGACTTCCTGGTTCAGCGTCTGACCCTGCTGCGTGACAAAATGTCTACCGACA
    CCGCGGAAGACTGCCTGGAATACGTTGGTATCGAAGAAGAAAAAGGTAACGGTATGAACT
    CTCTGCTGGGTACCTTCCTGAAAAACCTGCAGGGTGACGGTTTCGAACAGATCTTCCAGT
    TCATGCTGGGTTCTTACGTTGGTTGGCAGGGTAAAGAAGACGTTCTGCGTGAACGTCTGG
    ACCTGCTGGCGGAAAAAGTTAAACGTCTGCCGAAACCGAAATTCGCGGGTGAATGGTCTG
    GTCACCGTATGTTCCTGCACGGTCAGCTGAAATCTTGGTCTTCTAACTTCTTCCGTCTGT
    TCAACGAAACCCGTGAACTGCTGGAATCTATCAAATCTGACATCCAGCACGCGACCATGC
    TGATCTCTTACGTTGAAGAAAAAGGTGGTTACCACCCGCAGCTGCTGTCTCAGTACCGTA
    AACTGATGGAACAGCTGCCGGCGCTGCGTACCAAAGTTCTGGACCCGGAAATCGAAATGA
    CCCACATGTCTGAAGCGGTTCGTTCTTACATCATGATCCACAAATCTGTTGCGGGTTTCC
    TGCCGGACCTGCTGGAATCTCTGGACCGTGACAAAGACCGTGAATTCCTGCTGTCTATCT
    TCCCGCGTATCCCGAAAATCGACAAAAAAACCAAAGAAATCGTTGCGTGGGAACTGCCGG
    GTGAACCGGAAGAAGGTTACCTGTTCACCGCGAACAACCTGTTCCGTAACTTCCTGGAAA
    ACCCGAAACACGTTCCGCGTTTCATGGCGGAACGTATCCCGGAAGACTGGACCCGTCTGC
    GTTCTGCGCCGGTTTGGTTCGACGGTATGGTTAAACAGTGGCAGAAAGTTGTTAACCAGC
    TGGTTGAATCTCCGGGTGCGCTGTACCAGTTCAACGAATCTTTCCTGCGTCAGCGTCTGC
    AGGCGATGCTGACCGTTTACAAACGTGACCTGCAGACCGAAAAATTCCTGAAACTGCTGG
    CGGACGTTTGCCGTCCGCTGGTTGACTTCTTCGGTCTGGGTGGTAACGACATCATCTTCA
    AATCTTGCCAGGACCCGCGTAAACAGTGGCAGACCGTTATCCCGCTGTCTGTTCCGGCGG
    ACGTTTACACCGCGTGCGAAGGTCTGGCGATCCGTCTGCGTGAAACCCTGGGTTTCGAAT
    GGAAAAACCTGAAAGGTCACGAACGTGAAGACTTCCTGCGTCTGCACCAGCTGCTGGGTA
    ACCTGCTGTTCTGGATCCGTGACGCGAAACTGGTTGTTAAACTGGAAGACTGGATGAACA
    ACCCGTGCGTTCAGGAATACGTTGAAGCGCGTAAAGCGATCGACCTGCCGCTGGAAATCT
    TCGGTTTCGAAGTTCCGATCTTCCTGAACGGTTACCTGTTCTCTGAACTGCGTCAGCTGG
    AACTGCTGCTGCGTCGTAAATCTGTTATGACCTCTTACTCTGTTAAAACCACCGGTTCTC
    CGAACCGTCTGTTCCAGCTGGTTTACCTGCCGCTGAACCCGTCTGACCCGGAAAAAAAAA
    ACTCTAACAACTTCCAGGAACGTCTGGACACCCCGACCGGTCTGTCTCGTCGTTTCCTGG
    ACCTGACCCTGGACGCGTTCGCGGGTAAACTGCTGACCGACCCGGTTACCCAGGAACTGA
    AAACCATGGCGGGTTTCTACGACCACCTGTTCGGTTTCAAACTGCCGTGCAAACTGGCGG
    CGATGTCTAACCACCCGGGTTCTTCTTCTAAAATGGTTGTTCTGGCGAAACCGAAAAAAG
    GTGTTGCGTCTAACATCGGTTTCGAACCGATCCCGGACCCGGCGCACCCGGTTTTCCGTG
    TTCGTTCTTCTTGGCCGGAACTGAAATACCTGGAAGGTCTGCTGTACCTGCCGGAAGACA
    CCCCGCTGACCATCGAACTGGCGGAAACCTCTGTTTCTTGCCAGTCTGTTTCTTCTGTTG
    CGTTCGACCTGAAAAACCTGACCACCATCCTGGGTCGTGTTGGTGAATTCCGTGTTACCG
    CGGACCAGCCGTTCAAACTGACCCCGATCATCCCGGAAAAAGAAGAATCTTTCATCGGTA
    AAACCTACCTGGGTCTGGACGCGGGTGAACGTTCTGGTGTTGGTTTCGCGATCGTTACCG
    TTGACGGTGACGGTTACGAAGTTCAGCGTCTGGGTGTTCACGAAGACACCCAGCTGATGG
    CGCTGCAGCAGGTTGCGTCTAAATCTCTGAAAGAACCGGTTTTCCAGCCGCTGCGTAAAG
    GTACCTTCCGTCAGCAGGAACGTATCCGTAAATCTCTGCGTGGTTGCTACTGGAACTTCT
    ACCACGCGCTGATGATCAAATACCGTGCGAAAGTTGTTCACGAAGAATCTGTTGGTTCTT
    CTGGTCTGGTTGGTCAGTGGCTGCGTGCGTTCCAGAAAGACCTGAAAAAAGCGGACGTTC
    TGCCGAAAAAAGGTGGTAAAAACGGTGTTGACAAAAAAAAACGTGAATCTTCTGCGCAGG
    ACACCCTGTGGGGTGGTGCGTTCTCTAAAAAAGAAGAACAGCAGATCGCGTTCGAAGTTC
    AGGCGGCGGGTTCTTCTCAGTTCTGCCTGAAATGCGGTTGGTGGTTCCAGCTGGGTATGC
    GTGAAGTTAACCGTGTTCAGGAATCTGGTGTTGTTCTGGACTGGAACCGTTCTATCGTTA
    CCTTCCTGATCGAATCTTCTGGTGAAAAAGTTTACGGTTTCTCTCCGCAGCAGCTGGAAA
    AAGGTTTCCGTCCGGACATCGAAACCTTCAAAAAAATGGTTCGTGACTTCATGCGTCCGC
    CGATGTTCGACCGTAAAGGTCGTCCGGCGGCGGCGTACGAACGTTTCGTTCTGGGTCGTC
    GTCACCGTCGTTACCGTTTCGACAAAGTTTTCGAAGAACGTTTCGGTCGTTCTGCGCTGT
    TCATCTGCCCGCGTGTTGGTTGCGGTAACTTCGACCACTCTTCTGAACAGTCTGCGGTTG
    TTCTGGCGCTGATCGGTTACATCGCGGACAAAGAAGGTATGTCTGGTAAAAAACTGGTTT
    ACGTTCGTCTGGCGGAACTGATGGCGGAATGGAAACTGAAAAAACTGGAACGTTCTCGTG
    TTGAAGAACAGTCTTCTGCGCAGTAAGAAATCATCCTTAGCGAAAGCTAAGGATTTTTTT
    TATCTGAAATGTAGGGAGACCCTCAGGTTAAATATTCACTCAGGAAGTTATTACTCAGGA
    AGCAAAGAGGATTACA
    SEQ ID NO: 78
    AATCAGGAGAGCGTTTTCAATCCTACCTCTGGCGCAGTTGATATGTCAAACAGGTTGCCG
    TCACTGCGTCTTTTACTGGCTCTTCTCGCTAACCAAACCGGTAACCCCGCTTATTAAAAG
    CATTCTGTAACAAAGCGGGACCAAAGCCATGACAAAAACGCGTAACAAAAGTGTCTATAA
    TCACGGCAGAAAAGTCCACATTGATTATTTGCACGGCGTCACACTTTGCTATGCCATAGC
    ATTTTTATCCATAAGATTAGCGGATCCTACCTGACGCTTTTTATCGCAACTCTCTACTGT
    TTCTCCATACCCGTTTTTTTGGGCTAGCAGTAATACGACTCACTATAGGGGTCTCATCTC
    GTGTGAGATAGGCGGAGATACGAACTTTAAGAGGAGGATATACCATGCACCATCATCATC
    ACCATGCGGAATCTAAACAGATGCAGTGCCGTAAATGCGGTGCGTCTATGAAATACGAAG
    TTATCGGTCTGGGTAAAAAATCTTGCCGTTACATGTGCCCGGACTGCGGTAACCACACCT
    CTGCGCGTAAAATCCAGAACAAAAAAAAACGTGACAAAAAATACGGTTCTGCGTCTAAAG
    CGCAGTCTCAGCGTATCGCGGTTGCGGGTGCGCTGTACCCGGACAAAAAAGTTCAGACCA
    TCAAAACCTACAAATACCCGGCGGACCTGAACGGTGAAGTTCACGACTCTGGTGTTGCGG
    AAAAAATCGCGCAGGCGATCCAGGAAGACGAAATCGGTCTGCTGGGTCCGTCTTCTGAAT
    ACGCGTGCTGGATCGCGTCTCAGAAACAGTCTGAACCGTACTCTGTTGTTGACTTCTGGT
    TCGACGCGGTTTGCGCGGGTGGTGTTTTCGCGTACTCTGGTGCGCGTCTGCTGTCTACCG
    TTCTGCAGCTGTCTGGTGAAGAATCTGTTCTGCGTGCGGCGCTGGCGTCTTCTCCGTTCG
    TTGACGACATCAACCTGGCGCAGGCGGAAAAATTCCTGGCGGTTTCTCGTCGTACCGGTC
    AGGACAAACTGGGTAAACGTATCGGTGAATGCTTCGCGGAAGGTCGTCTGGAAGCGCTGG
    GTATCAAAGACCGTATGCGTGAATTCGTTCAGGCGATCGACGTTGCGCAGACCGCGGGTC
    AGCGTTTCGCGGCGAAACTGAAAATCTTCGGTATCTCTCAGATGCCGGAAGCGAAACAGT
    GGAACAACGACTCTGGTCTGACCGTTTGCATCCTGCCGGACTACTACGTTCCGGAAGAAA
    ACCGTGCGGACCAGCTGGTTGTTCTGCTGCGTCGTCTGCGTGAAATCGCGTACTGCATGG
    GTATCGAAGACGAAGCGGGTTTCGAACACCTGGGTATCGACCCGGGTGCGCTGTCTAACT
    TCTCTAACGGTAACCCGAAACGTGGTTTCCTGGGTCGTCTGCTGAACAACGACATCATCG
    CGCTGGCGAACAACATGTCTGCGATGACCCCGTACTGGGAAGGTCGTAAAGGTGAACTGA
    TCGAACGTCTGGCGTGGCTGAAACACCGTGCGGAAGGTCTGTACCTGAAAGAACCGCACT
    TCGGTAACTCTTGGGCGGACCACCGTTCTCGTATCTTCTCTCGTATCGCGGGTTGGCTGT
    CTGGTTGCGCGGGTAAACTGAAAATCGCGAAAGACCAGATCTCTGGTGTTCGTACCGACC
    TGTTCCTGCTGAAACGTCTGCTGGACGCGGTTCCGCAGTCTGCGCCGTCTCCGGACTTCA
    TCGCGTCTATCTCTGCGCTGGACCGTTTCCTGGAAGCGGCGGAATCTTCTCAGGACCCGG
    CGGAACAGGTTCGTGCGCTGTACGCGTTCCACCTGAACGCGCCGGCGGTTCGTTCTATCG
    CGAACAAAGCGGTTCAGCGTTCTGACTCTCAGGAATGGCTGATCAAAGAACTGGACGCGG
    TTGACCACCTGGAATTCAACAAAGCGTTCCCGTTCTTCTCTGACACCGGTAAAAAAAAAA
    AAAAAGGTGCGAACTCTAACGGTGCGCCGTCTGAAGAAGAATACACCGAAACCGAATCTA
    TCCAGCAGCCGGAAGACGCGGAACAGGAAGTTAACGGTCAGGAAGGTAACGGTGCGTCTA
    AAAACCAGAAAAAATTCCAGCGTATCCCGCGTTTCTTCGGTGAAGGTTCTCGTTCTGAAT
    ACCGTATCCTGACCGAAGCGCCGCAGTACTTCGACATGTTCTGCAACAACATGCGTGCGA
    TCTTCATGCAGCTGGAATCTCAGCCGCGTAAAGCGCCGCGTGACTTCAAATGCTTCCTGC
    AGAACCGTCTGCAGAAACTGTACAAACAGACCTTCCTGAACGCGCGTTCTAACAAATGCC
    GTGCGCTGCTGGAATCTGTTCTGATCTCTTGGGGTGAATTCTACACCTACGGTGCGAACG
    AAAAAAAATTCCGTCTGCGTCACGAAGCGTCTGAACGTTCTTCTGACCCGGACTACGTTG
    TTCAGCAGGCGCTGGAAATCGCGCGTCGTCTGTTCCTGTTCGGTTTCGAATGGCGTGACT
    GCTCTGCGGGTGAACGTGTTGACCTGGTTGAAATCCACAAAAAAGCGATCTCTTTCCTGC
    TGGCGATCACCCAGGCGGAAGTTTCTGTTGGTTCTTACAACTGGCTGGGTAACTCTACCG
    TTTCTCGTTACCTGTCTGTTGCGGGTACCGACACCCTGTACGGTACCCAGCTGGAAGAAT
    TCCTGAACGCGACCGTTCTGTCTCAGATGCGTGGTCTGGCGATCCGTCTGTCTTCTCAGG
    AACTGAAAGACGGTTTCGACGTTCAGCTGGAATCTTCTTGCCAGGACAACCTGCAGCACC
    TGCTGGTTTACCGTGCGTCTCGTGACCTGGCGGCGTGCAAACGTGCGACCTGCCCGGCGG
    AACTGGACCCGAAAATCCTGGTTCTGCCGGTTGGTGCGTTCATCGCGTCTGTTATGAAAA
    TGATCGAACGTGGTGACGAACCGCTGGCGGGTGCGTACCTGCGTCACCGTCCGCACTCTT
    TCGGTTGGCAGATCCGTGTTCGTGGTGTTGCGGAAGTTGGTATGGACCAGGGTACCGCGC
    TGGCGTTCCAGAAACCGACCGAATCTGAACCGTTCAAAATCAAACCGTTCTCTGCGCAGT
    ACGGTCCGGTTCTGTGGCTGAACTCTTCTTCTTACTCTCAGTCTCAGTACCTGGACGGTT
    TCCTGTCTCAGCCGAAAAACTGGTCTATGCGTGTTCTGCCGCAGGCGGGTTCTGTTCGTG
    TTGAACAGCGTGTTGCGCTGATCTGGAACCTGCAGGCGGGTAAAATGCGTCTGGAACGTT
    CTGGTGCGCGTGCGTTCTTCATGCCGGTTCCGTTCTCTTTCCGTCCGTCTGGTTCTGGTG
    ACGAAGCGGTTCTGGCGCCGAACCGTTACCTGGGTCTGTTCCCGCACTCTGGTGGTATCG
    AATACGCGGTTGTTGACGTTCTGGACTCTGCGGGTTTCAAAATCCTGGAACGTGGTACCA
    TCGCGGTTAACGGTTTCTCTCAGAAACGTGGTGAACGTCAGGAAGAAGCGCACCGTGAAA
    AACAGCGTCGTGGTATCTCTGACATCGGTCGTAAAAAACCGGTTCAGGCGGAAGTTGACG
    CGGCGAACGAACTGCACCGTAAATACACCGACGTTGCGACCCGTCTGGGTTGCCGTATCG
    TTGTTCAGTGGGCGCCGCAGCCGAAACCGGGTACCGCGCCGACCGCGCAGACCGTTTACG
    CGCGTGCGGTTCGTACCGAAGCGCCGCGTTCTGGTAACCAGGAAGACCACGCGCGTATGA
    AATCTTCTTGGGGTTACACCTGGGGTACCTACTGGGAAAAACGTAAACCGGAAGACATCC
    TGGGTATCTCTACCCAGGTTTACTGGACCGGTGGTATCGGTGAATCTTGCCCGGCGGTTG
    CGGTTGCGCTGCTGGGTCACATCCGTGCGACCTCTACCCAGACCGAATGGGAAAAAGAAG
    AAGTTGTTTTCGGTCGTCTGAAAAAATTCTTCCCGTCTTAAGAAATCATCCTTAGCGAAA
    GCTAAGGATTTTTTTTATCTGAAATGTAGGGAGACCCTCAGGTTAAATATTCACTCAGGA
    AGTTATTACTCAGGAAGCAAAGAGGATTACA
    SEQ ID NO: 79
    AATCAGGAGAGCGTTTTCAATCCTACCTCTGGCGCAGTTGATATGTCAAACAGGTTGCCG
    TCACTGCGTCTTTTACTGGCTCTTCTCGCTAACCAAACCGGTAACCCCGCTTATTAAAAG
    CATTCTGTAACAAAGCGGGACCAAAGCCATGACAAAAACGCGTAACAAAAGTGTCTATAA
    TCACGGCAGAAAAGTCCACATTGATTATTTGCACGGCGTCACACTTTGCTATGCCATAGC
    ATTTTTATCCATAAGATTAGCGGATCCTACCTGACGCTTTTTATCGCAACTCTCTACTGT
    TTCTCCATACCCGTTTTTTTGGGCTAGCAGTAATACGACTCACTATAGGGGTCTCATCTC
    GTGTGAGATAGGCGGAGATACGAACTTTAAGAGGAGGATATACCATGCACCATCATCATC
    ACCATGAAAAACGTATCAACAAAATCCGTAAAAAACTGTCTGCGGACAACGCGACCAAAC
    CGGTTTCTCGTTCTGGTCCGATGAAAACCCTGCTGGTTCGTGTTATGACCGACGACCTGA
    AAAAACGTCTGGAAAAACGTCGTAAAAAACCGGAAGTTATGCCGCAGGTTATCTCTAACA
    ACGCGGCGAACAACCTGCGTATGCTGCTGGACGACTACACCAAAATGAAAGAAGCGATCC
    TGCAGGTTTACTGGCAGGAATTCAAAGACGACCACGTTGGTCTGATGTGCAAATTCGCGC
    AGCCGGCGTCTAAAAAAATCGACCAGAACAAACTGAAACCGGAAATGGACGAAAAAGGTA
    ACCTGACCACCGCGGGTTTCGCGTGCTCTCAGTGCGGTCAGCCGCTGTTCGTTTACAAAC
    TGGAACAGGTTTCTGAAAAAGGTAAAGCGTACACCAACTACTTCGGTCGTTGCAACGTTG
    CGGAACACGAAAAACTGATCCTGCTGGCGCAGCTGAAACCGGAAAAAGACTCTGACGAAG
    CGGTTACCTACTCTCTGGGTAAATTCGGTCAGCGTGCGCTGGACTTCTACTCTATCCACG
    TTACCAAAGAATCTACCCACCCGGTTAAACCGCTGGCGCAGATCGCGGGTAACCGTTACG
    CGTCTGGTCCGGTTGGTAAAGCGCTGTCTGACGCGTGCATGGGTACCATCGCGTCTTTCC
    TGTCTAAATACCAGGACATCATCATCGAACACCAGAAAGTTGTTAAAGGTAACCAGAAAC
    GTCTGGAATCTCTGCGTGAACTGGCGGGTAAAGAAAACCTGGAATACCCGTCTGTTACCC
    TGCCGCCGCAGCCGCACACCAAAGAAGGTGTTGACGCGTACAACGAAGTTATCGCGCGTG
    TTCGTATGTGGGTTAACCTGAACCTGTGGCAGAAACTGAAACTGTCTCGTGACGACGCGA
    AACCGCTGCTGCGTCTGAAAGGTTTCCCGTCTTTCCCGGTTGTTGAACGTCGTGAAAACG
    AAGTTGACTGGTGGAACACCATCAACGAAGTTAAAAAACTGATCGACGCGAAACGTGACA
    TGGGTCGTGTTTTCTGGTCTGGTGTTACCGCGGAAAAACGTAACACCATCCTGGAAGGTT
    ACAACTACCTGCCGAACGAAAACGACCACAAAAAACGTGAAGGTTCTCTGGAAAACCCGA
    AAAAACCGGCGAAACGTCAGTTCGGTGACCTGCTGCTGTACCTGGAAAAAAAATACGCGG
    GTGACTGGGGTAAAGTTTTCGACGAAGCGTGGGAACGTATCGACAAAAAAATCGCGGGTC
    TGACCTCTCACATCGAACGTGAAGAAGCGCGTAACGCGGAAGACGCGCAGTCTAAAGCGG
    TTCTGACCGACTGGCTGCGTGCGAAAGCGTCTTTCGTTCTGGAACGTCTGAAAGAAATGG
    ACGAAAAAGAATTCTACGCGTGCGAAATCCAGCTGCAGAAATGGTACGGTGACCTGCGTG
    GTAACCCGTTCGCGGTTGAAGCGGAAAACCGTGTTGTTGACATCTCTGGTTTCTCTATCG
    GTTCTGACGGTCACTCTATCCAGTACCGTAACCTGCTGGCGTGGAAATACCTGGAAAACG
    GTAAACGTGAATTCTACCTGCTGATGAACTACGGTAAAAAAGGTCGTATCCGTTTCACCG
    ACGGTACCGACATCAAAAAATCTGGTAAATGGCAGGGTCTGCTGTACGGTGGTGGTAAAG
    CGAAAGTTATCGACCTGACCTTCGACCCGGACGACGAACAGCTGATCATCCTGCCGCTGG
    CGTTCGGTACCCGTCAGGGTCGTGAATTCATCTGGAACGACCTGCTGTCTCTGGAAACCG
    GTCTGATCAAACTGGCGAACGGTCGTGTTATCGAAAAAACCATCTACAACAAAAAAATCG
    GTCGTGACGAACCGGCGCTGTTCGTTGCGCTGACCTTCGAACGTCGTGAAGTTGTTGACC
    CGTCTAACATCAAACCGGTTAACCTGATCGGTGTTGACCGTGGTGAAAACATCCCGGCGG
    TTATCGCGCTGACCGACCCGGAAGGTTGCCCGCTGCCGGAATTCAAAGACTCTTCTGGTG
    GTCCGACCGACATCCTGCGTATCGGTGAAGGTTACAAAGAAAAACAGCGTGCGATCCAGG
    CGGCGAAAGAAGTTGAACAGCGTCGTGCGGGTGGTTACTCTCGTAAATTCGCGTCTAAAT
    CTCGTAACCTGGCGGACGACATGGTTCGTAACTCTGCGCGTGACCTGTTCTACCACGCGG
    TTACCCACGACGCGGTTCTGGTTTTCGAAAACCTGTCTCGTGGTTTCGGTCGTCAGGGTA
    AACGTACCTTCATGACCGAACGTCAGTACACCAAAATGGAAGACTGGCTGACCGCGAAAC
    TGGCGTACGAAGGTCTGACCTCTAAAACCTACCTGTCTAAAACCCTGGCGCAGTACACCT
    CTAAAACCTGCTCTAACTGCGGTTTCACCATCACCACCGCGGACTACGACGGTATGCTGG
    TTCGTCTGAAAAAAACCTCTGACGGTTGGGCGACCACCCTGAACAACAAAGAACTGAAAG
    CGGAAGGTCAGATCACCTACTACAACCGTTACAAACGTCAGACCGTTGAAAAAGAACTGT
    CTGCGGAACTGGACCGTCTGTCTGAAGAATCTGGTAACAACGACATCTCTAAATGGACCA
    AAGGTCGTCGTGACGAAGCGCTGTTCCTGCTGAAAAAACGTTTCTCTCACCGTCCGGTTC
    AGGAACAGTTCGTTTGCCTGGACTGCGGTCACGAAGTTCACGCGGACGAACAGGCGGCGC
    TGAACATCGCGCGTTCTTGGCTGTTCCTGAACTCTAACTCTACCGAATTCAAATCTTACA
    AATCTGGTAAACAGCCGTTCGTTGGTGCGTGGCAGGCGTTCTACAAACGTCGTCTGAAAG
    AAGTTTGGAAACCGAACGCGTAAGAAATCATCCTTAGCGAAAGCTAAGGATTTTTTTTAT
    CTGAAATGTAGGGAGACCCTCAGGTTAAATATTCACTCAGGAAGTTATTACTCAGGAAGC
    AAAGAGGATTACA
    SEQ ID NO: 80
    AATCAGGAGAGCGTTTTCAATCCTACCTCTGGCGCAGTTGATATGTCAAACAGGTTGCCG
    TCACTGCGTCTTTTACTGGCTCTTCTCGCTAACCAAACCGGTAACCCCGCTTATTAAAAG
    CATTCTGTAACAAAGCGGGACCAAAGCCATGACAAAAACGCGTAACAAAAGTGTCTATAA
    TCACGGCAGAAAAGTCCACATTGATTATTTGCACGGCGTCACACTTTGCTATGCCATAGC
    ATTTTTATCCATAAGATTAGCGGATCCTACCTGACGCTTTTTATCGCAACTCTCTACTGT
    TTCTCCATACCCGTTTTTTTGGGCTAGCAGTAATACGACTCACTATAGGGGTCTCATCTC
    GTGTGAGATAGGCGGAGATACGAACTTTAAGAGGAGGATATACCATGCACCATCATCATC
    ACCATAAACGTATCAACAAAATCCGTCGTCGTCTGGTTAAAGACTCTAACACCAAAAAAG
    CGGGTAAAACCGGTCCGATGAAAACCCTGCTGGTTCGTGTTATGACCCCGGACCTGCGTG
    AACGTCTGGAAAACCTGCGTAAAAAACCGGAAAACATCCCGCAGCCGATCTCTAACACCT
    CTCGTGCGAACCTGAACAAACTGCTGACCGACTACACCGAAATGAAAAAAGCGATCCTGC
    ACGTTTACTGGGAAGAATTCCAGAAAGACCCGGTTGGTCTGATGTCTCGTGTTGCGCAGC
    CGGCGCCGAAAAACATCGACCAGCGTAAACTGATCCCGGTTAAAGACGGTAACGAACGTC
    TGACCTCTTCTGGTTTCGCGTGCTCTCAGTGCTGCCAGCCGCTGTACGTTTACAAACTGG
    AACAGGTTAACGACAAAGGTAAACCGCACACCAACTACTTCGGTCGTTGCAACGTTTCTG
    AACACGAACGTCTGATCCTGCTGTCTCCGCACAAACCGGAAGCGAACGACGAACTGGTTA
    CCTACTCTCTGGGTAAATTCGGTCAGCGTGCGCTGGACTTCTACTCTATCCACGTTACCC
    GTGAATCTAACCACCCGGTTAAACCGCTGGAACAGATCGGTGGTAACTCTTGCGCGTCTG
    GTCCGGTTGGTAAAGCGCTGTCTGACGCGTGCATGGGTGCGGTTGCGTCTTTCCTGACCA
    AATACCAGGACATCATCCTGGAACACCAGAAAGTTATCAAAAAAAACGAAAAACGTCTGG
    CGAACCTGAAAGACATCGCGTCTGCGAACGGTCTGGCGTTCCCGAAAATCACCCTGCCGC
    CGCAGCCGCACACCAAAGAAGGTATCGAAGCGTACAACAACGTTGTTGCGCAGATCGTTA
    TCTGGGTTAACCTGAACCTGTGGCAGAAACTGAAAATCGGTCGTGACGAAGCGAAACCGC
    TGCAGCGTCTGAAAGGTTTCCCGTCTTTCCCGCTGGTTGAACGTCAGGCGAACGAAGTTG
    ACTGGTGGGACATGGTTTGCAACGTTAAAAAACTGATCAACGAAAAAAAAGAAGACGGTA
    AAGTTTTCTGGCAGAACCTGGCGGGTTACAAACGTCAGGAAGCGCTGCTGCCGTACCTGT
    CTTCTGAAGAAGACCGTAAAAAAGGTAAAAAATTCGCGCGTTACCAGTTCGGTGACCTGC
    TGCTGCACCTGGAAAAAAAACACGGTGAAGACTGGGGTAAAGTTTACGACGAAGCGTGGG
    AACGTATCGACAAAAAAGTTGAAGGTCTGTCTAAACACATCAAACTGGAAGAAGAACGTC
    GTTCTGAAGACGCGCAGTCTAAAGCGGCGCTGACCGACTGGCTGCGTGCGAAAGCGTCTT
    TCGTTATCGAAGGTCTGAAAGAAGCGGACAAAGACGAATTCTGCCGTTGCGAACTGAAAC
    TGCAGAAATGGTACGGTGACCTGCGTGGTAAACCGTTCGCGATCGAAGCGGAAAACTCTA
    TCCTGGACATCTCTGGTTTCTCTAAACAGTACAACTGCGCGTTCATCTGGCAGAAAGACG
    GTGTTAAAAAACTGAACCTGTACCTGATCATCAACTACTTCAAAGGTGGTAAACTGCGTT
    TCAAAAAAATCAAACCGGAAGCGTTCGAAGCGAACCGTTTCTACACCGTTATCAACAAAA
    AATCTGGTGAAATCGTTCCGATGGAAGTTAACTTCAACTTCGACGACCCGAACCTGATCA
    TCCTGCCGCTGGCGTTCGGTAAACGTCAGGGTCGTGAATTCATCTGGAACGACCTGCTGT
    CTCTGGAAACCGGTTCTCTGAAACTGGCGAACGGTCGTGTTATCGAAAAAACCCTGTACA
    ACCGTCGTACCCGTCAGGACGAACCGGCGCTGTTCGTTGCGCTGACCTTCGAACGTCGTG
    AAGTTCTGGACTCTTCTAACATCAAACCGATGAACCTGATCGGTATCGACCGTGGTGAAA
    ACATCCCGGCGGTTATCGCGCTGACCGACCCGGAAGGTTGCCCGCTGTCTCGTTTCAAAG
    ACTCTCTGGGTAACCCGACCCACATCCTGCGTATCGGTGAATCTTACAAAGAAAAACAGC
    GTACCATCCAGGCGGCGAAAGAAGTTGAACAGCGTCGTGCGGGTGGTTACTCTCGTAAAT
    ACGCGTCTAAAGCGAAAAACCTGGCGGACGACATGGTTCGTAACACCGCGCGTGACCTGC
    TGTACTACGCGGTTACCCAGGACGCGATGCTGATCTTCGAAAACCTGTCTCGTGGTTTCG
    GTCGTCAGGGTAAACGTACCTTCATGGCGGAACGTCAGTACACCCGTATGGAAGACTGGC
    TGACCGCGAAACTGGCGTACGAAGGTCTGCCGTCTAAAACCTACCTGTCTAAAACCCTGG
    CGCAGTACACCTCTAAAACCTGCTCTAACTGCGGTTTCACCATCACCTCTGCGGACTACG
    ACCGTGTTCTGGAAAAACTGAAAAAAACCGCGACCGGTTGGATGACCACCATCAACGGTA
    AAGAACTGAAAGTTGAAGGTCAGATCACCTACTACAACCGTTACAAACGTCAGAACGTTG
    TTAAAGACCTGTCTGTTGAACTGGACCGTCTGTCTGAAGAATCTGTTAACAACGACATCT
    CTTCTTGGACCAAAGGTCGTTCTGGTGAAGCGCTGTCTCTGCTGAAAAAACGTTTCTCTC
    ACCGTCCGGTTCAGGAAAAATTCGTTTGCCTGAACTGCGGTTTCGAAACCCACGCGGACG
    AACAGGCGGCGCTGAACATCGCGCGTTCTTGGCTGTTCCTGCGTTCTCAGGAATACAAAA
    AATACCAGACCAACAAAACCACCGGTAACACCGACAAACGTGCGTTCGTTGAAACCTGGC
    AGTCTTTCTACCGTAAAAAACTGAAAGAAGTTTGGAAACCGGAAATCATCCTTAGCGAAA
    GCTAAGGATTTTTTTTATCTGAAATGTAGGGAGACCCTCAGGTTAAATATTCACTCAGGA
    AGTTATTACTCAGGAAGCAAAGAGGATTACA
SEQ ID NO: 81
tgccgtcactgcgtcttttactggctcttctcgctaaccaaaccggtaaccccgcttatt
aaaagcattctgtaacaaagcgggaccaaagccatgacaaaaacgcgtaacaaaagtgtc
tataatcacggcagaaaagtccacattgattatttgcacggcgtcacactttgctatgcc
atagcatttttatccataagattagcggatcctacctgacgctttttatcgcaactctct
    actgtttctccatacccgtttttttgggctagcaccgcctatctcgtgtgagataggcgg
    agatacgaactttaagAAGGAGatatacc
SEQ ID NO: 82
TGCCGTCACTGCGTCTTTTACTGGCTCTTCTCGCTAACCAAACCGGTAACCCCGCTTATT
AAAAGCATTCTGTAACAAAGCGGGACCAAAGCCATGACAAAAACGCGTAACAAAAGTGTC
TATAATCACGGCAGAAAAGTCCACATTGATTATTTGCACGGCGTCACACTTTGCTATGCC
ATAGCATTTTTATCCATAAGATTAGCGGATCCTACCTGACGCTTTTTATCGCAACTCTCT
    ACTGTTTCTCCATACCCGTTTTTTTGGGTAGCGGATCCTACCTGAC
SEQ ID NO: 83
AATCAGGAGAGCGTTTTCAATCCTACCTCTGGCGCAGTTGATATGTCAAACAGGTTTCTA
GAGCACAGCTAACACCACGTCGTCCCTATCTGCTGCCCTAGGTCTATGAGTGGTTGCTGG
ATAACTTTACGGGCATGCATAAGGCTCGTAATATATATTCAGGGAGACCACAACGGTTTC
CCTCTACAAATAATTTTGTTTAACTTTTACTAGAGCTAGCAGTAATACGACTCACTATAG
    GGGTCTCATCTCGTGTGAGATAGGCGGAGATACGAACTTTAAGAGGAGGATATACCA
SEQ ID NO: 84
GTTTGAGAGATATGTAAATTCAAAGGATAATCAAAC
SEQ ID NO: 85
actacattttttaagacctaattttgagt
SEQ ID NO: 86
ctcaaaactcattcgaatctctactctttgtagat
SEQ ID NO: 87
CTCTAGCAGGCCTGGCAAATTTCTACTGTTGTAGAT
SEQ ID NO: 88
CCGTCTAAAACTCATTCAGAATTTCTACTAGTGTAGAT
SEQ ID NO: 89
GTCTAGGTACTCTCTTTAATTTCTACTATTGT
SEQ ID NO: 90
gttaagttatatagaataatttctactgttgtaga
SEQ ID NO: 91
gtttaaaaccactttaaaatttctactattgta
SEQ ID NO: 92
GTTTGAGAATGATGTAAAAATGTATGGTACACAGAAATGTTTTAATACCATATTTTTACA
TCACTCTCAAACATACATCTCTTGTTACTGTTTATCGTATCCAGATTAAATTTCACGTTT
TT
SEQ ID NO: 93
CTCTACAACTGATAAAGAATTTCTACTTTTGTAGAT
SEQ ID NO: 94
GTCTGGCCCCAAATTTTAATTTCTACTGTTGTAGAT
SEQ ID NO: 95
GTCAAAAGACCTTTTTAATTTCTACTCTTGTAGAT
SEQ ID NO: 96
GTCTAGAGGACAGAATTTTTCAACGGGTGTGCCAATGGCCACTTTCCAGGTGGCAAAGCC
CGTTGAGCTTCTACGGAAGTGGCAC
SEQ ID NO: 97
CGAGGTTCTGTCTTTTGGTCAGGACAACCGTCTAGCTATAAGTGCTGCAGGGGTGTGAGA
AACTCCTATTGCTGGACGATGTCTCTTTTAACGAGGCATTAGCAC
SEQ ID NO: 98
GAACGAGGGACGTTTTGTCTCCAATGATTTTGCTATGACGACCTCGAACTGTGCCTTCAA
GTCTGAGGCGAAAAAGAAATGGAAAAAAGTGTCTCATCGCTCTACCTCGTAGTTAGAGG
SEQ ID NO: 99
AATTACTGATGTTGTGATGAAGG
SEQ ID NO: 100
TATACCATAAGGATTTAAAGACT
SEQ ID NO: 101
GTCTTTACTCTCACCTTTCCACCTG
SEQ ID NO: 102
ATTTGAAGGTATCTCCGATAAGTAAAACGCATCAAAG
SEQ ID NO: 103
GTTTGAAGATATCTCCGATAAATAAGAAGCATCAAAG
SEQ ID NO: 104
TTGTTTTAATACCATATTTTTACATCACTCTCAAAC
SEQ ID NO: 105
AAAGAACGCTCGCTCAGTGTTCTGACCTTTCGAGCGCCTGTTCAGGGCGAAAACCCTGGG
AGGCGCTCGAATCATAGGTGGGACAAGGGATTCGCGGCGAAAA
SEQ ID NO: 106
GTTTGAGAATGATGTAAAAATGTATGGTACACAGAAATGTTTTAATACCATATTTTTACA
TCACTCTCAAACATACATCTCTTGTTACTGTTTATCGTATCCAGATTAAATTTCACGTTT
TT
SEQ ID NO: 107
GTCTAGAGGACAGAATTTTTCAACGGGTGTGCCAATGGCCACTTTCCAGGTGGCAAAGCC
CGTTGAGCTTCTACGGAAGTGGCAC
SEQ ID NO: 108
MSIYQEFVNKYSLSKTLRFELIPQGKTLENIKARGLILDDEKRAKDYKKAKQIIDKYHQF
FIEEILSSVCISEDLLQNYSDVYFKLKKSDDDNLQKDFKSAKDTIKKQISEYIKDSEKFK
NLFNQNLIDAKKGQESDLILWLKQSKDNGIELFKANSDITDIDEALEIIKSFKGWTTYFK
GFHENRKNVYSSNDIPTSIIYRIVDDNLPKFLENKAKYESLKDKAPEAINYEQIKKDLAE
    ELTFDIDYKTSEVNQRVFSLDEVFEIANFNNYLNQSGITKFNTIIGGKFVNGENTKRKGI
    NEYINLYSQQINDKTLKKYKMSVLFKQILSDTESKSFVIDKLEDDSDVVTTMQSFYEQIA
    AFKTVEEKSIKETLSLLFDDLKAQKLDLSKIYFKNDKSLTDLSQQVFDDYSVIGTAVLEY
    ITQQIAPKNLDNPSKKEQELIAKKTEKAKYLSLETIKLALEEFNKHRDIDKQCRFEEILA
    NFAAIPMIFDEIAQNKDNLAQISIKYQNQGKKDLLQASAEDDVKAIKDLLDQTNNLLHKL
    KIFHISQSEDKANILDKDEHFYLVFEECYFELANIVPLYNKIRNYITQKPYSDEKFKLNF
    ENSTLANGWDKNKEPDNTAILFIKDDKYYLGVMNKKNNKIFDDKAIKENKGEGYKKIVYK
    LLPGANKMLPKVFFSAKSIKFYNPSEDILRIRNHSTHTKNGSPQKGYEKFEFNIEDCRKF
    IDFYKQSISKHPEWKDFGFRFSDTQRYNSIDEFYREVENQGYKLTFENISESYIDSVVNQ
    GKLYLFQIYNKDFSAYSKGRPNLHTLYWKALFDERNLQDVVYKLNGEAELFYRKQSIPKK
    ITHPAKEAIANKNKDNPKKESVFEYDLIKDKRFTEDKFFFHCPITINFKSSGANKFNDEI
    NLLLKEKANDVHILSIDRGERHLAYYTLVDGKGNIIKQDTFNIIGNDRMKTNYHDKLAAI
    EKDRDSARKDWKKINNIKEMKEGYLSQVVHEIAKLVIEYNAIVVFEDLNFGFKRGRFKVE
    KQVYQKLEKMLIEKLNYLVFKDNEFDKTGGVLRAYQLTAPFETFKKMGKQTGIIYYVPAG
    FTSKICPVTGFVNQLYPKYESVSKSQEFFSKFDKICYNLDKGYFEFSFDYKNFGDKAAKG
    KWTIASFGSRLINFRNSDKNHNWDTREVYPTKELEKLLKDYSIEYGHGECIKAAICGESD
    KKFFAKLTSVLNTILQMRNSKTGTELDYLISPVADVNGNFFDSRQAPKNMPQDADANGAY
    HIGLKGLMLLGRIKNNQEGKKLNLVIKNEEYFEFVQNRNN
SEQ ID NO: 109
MSIYQEFVNKYSLSKTLRFELIPQGKTLENIKARGLILDDEKRAKDYKKAKQIIDKYHQF
FIEEILSSVCISEDLLQNYSDVYFKLKKSDDDNLQKDFKSAKDTIKKQISEYIKDSEKFK
NLFNQNLIDAKKGQESDLILWLKQSKDNGIELFKANSDITDIDEALEIIKSFKGWTTYFK
GFHENRKNVYSSDDIPTSIIYRIVDDNLPKFLENKAKYESLKDKAPEAINYEQIKKDLAE
    ELTFDIDYKTSEVNQRVFSLDEVFEIANFNNYLNQSGITKFNTIIGGKFVNGENTKRKGI
    NEYINLYSQQINDKTLKKYKMSVLFKQILSDTESKSFVIDKLEDDSDVVTTMQSFYEQIA
    AFKTVEEKSIKETLSLLFDDLKAQKLDLSKIYFKNDKSLTDLSQQVFDDYSVIGTAVLEY
    ITQQVAPKNLDNPSKKEQDLIAKKTEKAKYLSLETIKLALEEFNKHRDIDKQCRFEEILA
    NFAAIPMIFDEIAQNKDNLAQISLKYQNQGKKDLLQASAEEDVKAIKDLLDQTNNLLHRL
    KIFHISQSEDKANILDKDEHFYLVFEECYFELANIVPLYNKIRNYITQKPYSDEKFKLNF
    ENSTLANGWDKNKEPDNTAILFIKDDKYYLGVMNKKNNKIFDDKAIKENKGEGYKKIVYK
    LLPGANKMLPKVFFSAKSIKFYNPSEDILRIRNHSTHTKNGNPQKGYEKFEFNIEDCRKF
    IDFYKESISKHPEWKDFGFRFSDTQRYNSIDEFYREVENQGYKLTFENISESYIDSVVNQ
    GKLYLFQIYNKDFSAYSKGRPNLHTLYWKALFDERNLQDVVYKLNGEAELFYRKQSIPKK
    ITHPAKEAIANKNKDNPKKESVFEYDLIKDKRFTEDKFFFHCPITINFKSSGANKFNDEI
    NLLLKEKANDVHILSIDRGERHLAYYTLVDGKGNIIKQDTFNIIGNDRMKTNYHDKLAAI
    EKDRDSARKDWKKINNIKEMKEGYLSQVVHEIAKLVIEHNAIVVFEDLNFGFKRGRFKVE
    KQVYQKLEKMLIEKLNYLVFKDNEFDKTGGVLRAYQLTAPFETFKKMGKQTGIIYYVPAG
    FTSKICPVTGFVNQLYPKYESVSKSQEFFSKFDKICYNLDKGYFEFSFDYKNFGDKAAKG
    KWTIASFGSRLINFRNSDKNHNWDTREVYPTKELEKLLKDYSIEYGHGECIKAAICGESD
    KKFFAKLTSVLNTILQMRNSKTGTELDYLISPVADVNGNFFDSRQAPKNMPQDADANGAY
    HIGLKGLMLLDRIKNNQEGKKLNLVIKNEEYFEFVQNRNN
SEQ ID NO: 110
MDKKYSIGLDIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAE
ATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFG
NIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSD
VDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGN
    LIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAI
    LLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYA
    GYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELH
    AILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEE
    VVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFL
    SGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKI
    IKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWG
    RLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSL
    HEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRER
    MKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDH
    IVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNL
    TKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKS
    KLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRK
    MIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDF
    ATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVA
    YSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPK
    YSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVE
    QHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGA
    PAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGD
SEQ ID NO: 111
PKKKRKV
SEQ ID NO: 112
KRPAATKKAGQAKKKK
SEQ ID NO: 113
PAAKRVKLD
SEQ ID NO: 114
RQRRNELKRSP
SEQ ID NO: 115
NQSSNFGPMKGGNFGGRSSGPYGGGGQYFAKPRNQGGY
SEQ ID NO: 116
RMRIZFKNKGKDTAELRRRRVEVSVELRKAKKDEQILKRRNV
SEQ ID NO: 117
VSRKRPRP
SEQ ID NO: 118
PPKKARED
SEQ ID NO: 119
PQPKKKPL
SEQ ID NO: 120
SALIKKKKKMAP
SEQ ID NO: 121
DRLRR
SEQ ID NO: 122
PKQKKRK
SEQ ID NO: 123
RKLKKKIKKL
SEQ ID NO: 124
REKKKFLKRR
SEQ ID NO: 125
KRKGDEVDGVDEVAKKKSKK
SEQ ID NO: 126
RKCLQAGMNLEARKTKK
SEQ ID NO: 127
ATGGGTAAGATGTATTATCTGGGTTTGGATATAGGCACTAACTCTGTGGGATATGCAGTA
ACTGATCCCTCGTATCACTTGTTAAAGTTCAAAGGCGAACCCATGTGGGGAGCACATGTA
TTTGCTGCGGGTAATCAGAGTGCCGAAAGGCGATCTTTCAGAACATCCAGGAGGCGATTA
GATAGGAGACAGCAAAGAGTAAAGCTTGTGCAAGAGATCTTTGCTCCTGTCATTTCACCT
    ATAGACCCTCGTTTTTTTATAAGATTGCACGAATCGGCTCTATGGAGAGACGATGTTGCC
    GAAACAGATAAACATATCTTTTTCAATGATCCCACTTATACAGACAAGGAATACTACTCC
    GACTACCCGACAATTCATCATTTGATCGTCGATCTTATGGAGAGCTCTGAAAAGCATGAC
    CCCCGACTTGTCTATTTGGCTGTAGCTTGGTTAGTTGCTCATAGAGGTCATTTCTTGAAT
    GAAGTAGATAAAGACAATATAGGTGATGTACTTTCTTTTGATGCTTTCTACCCGGAATTT
    TTGGCCTTTTTGTCAGACAATGGCGTCAGTCCCTGGGTCTGTGAGTCGAAGGCCCTTCAA
    GCTACTCTGCTGTCTAGGAATAGCGTGAACGACAAATATAAAGCATTAAAATCGCTGATA
    TTCGGATCGCAAAAACCGGAAGATAACTTTGACGCTAACATCTCTGAAGATGGTTTAATC
    CAATTGCTGGCGGGTAAGAAAGTTAAAGTAAACAAACTATTCCCACAAGAGTCCAACGAT
    GCTAGCTTTACGTTGAATGATAAAGAAGACGCTATTGAAGAAATTCTAGGTACTTTAACG
    CCTGACGAGTGCGAATGGATCGCTCATATTCGCAGATTGTTCGATTGGGCCATCATGAAA
    CACGCGCTAAAGGATGGCAGGACGATATCTGAATGAAAAGTGAAGCTATACGAGCAGCAT
    CATCATGACTTGACTCAGTTAAAGTACTTTGTGAAGACCTACCTAGCTAAAGAGTATGAT
    GATATCTTCAGAAACGTAGACTCCGAGACAACTAAAAATTATGTAGCTTATTCTTACCAT
    GTGAAGGAAGTGAAAGGCACATTACCAAAAAATAAAGCAACGCAAGAAGAATTTTGTAAA
    TACGTCCTTGGCAAAGTCAAAAACATTGAATGTTCCGAAGCAGACAAGGTTGATTttGAt
    GAAAtGAtAcAAcGActtAcGGAcAAttcttttAtGccAAAGcAAGtctcAGGtGAAAAT
    AGAGTAATACCATACCAGTTGTACTACTATGAATTAAAGACAATTTTAAACAAAGCCGCC
    TCATATCTACCTTTTTTGACACAATGCGGTAAAGATGCTATTTCTAACCAAGACAAATTA
    CTGTCTATAATGACATTTCGCATACCATATTTCGTCGGCCCTTTAAGGAAAGATAATTCA
    GAACATGCCTGGTTGGAACGTAAAGCGGGTAAAATTTACCCGTGGAACTTTAATGATAAA
    GTAGATCTTGATAAATCGGAGGAAGCCTTTATCCGTAGGATGACCAATACTTGCACGTAT
    TACCCAGGAGAAGACGTGTTACCATTAGATTCACTTATCTATGAAAAGTTTATGATCTTG
    AATGAGATAAACAATATTAGGATTGACGGATACCCCATTTCTGTTGATGTGAAACAACAA
    GTATTTGGTTTATTTGAGAAGAAAAGGCGAGTAACAGTTAAGGATATTCAAAATCTACTA
    TTATCTCTTGGAGCGTTGGATAAACACGGTAAGCTGACTGGTATTGACACGACAATACAC
    TCTAATTATAACACTTATCATCATTTTAAATCTCTTATGGAGCGGGGAGTATTGACCAGA
    GATGATGTGGAAAGAATAGTGGAAAGAATGACATATTCTGACGATACTAAGAGGGTCAGA
    CTGTGGTTAAATAATAATTATGGAACTCTAACAGCTGACGATGTTAAGCATATCTCAAGA
    CTCAGAAAACACGATTTCGGCCGTTTGTCTAAAATGTTTTTGACAGGATTGAAAGGTGTT
    CATAAGGAGACAGGCGAGAGAGCAAGTATACTGGATTTTATGTGGAATACTAACGACAAT
    TTAATGCAACTACTGTCCGAATGTTACACATTCTCGGATGAGATCACCAAATTACAAGAG
    GCCTACTACGCAAAAGCTCAATTATCGCTAAATGACTTCTTGGACTCTATGTATATATCA
    AACGCCGTTAAGAGACCTATTTATCGGACCTTAGCGGTAGTAAATGATATTAGAAAGGCA
    TGCGGGACGGCACCTAAAAGAATTTTCATCGAGATGGCGCGAGATGGAGAGTCTAAGAAG
    AAAAGATCTGTGACTCGTAGAGAGCAAATTAAAAATCTCTATAGATCAATTCGTAAAGAC
    TTTCAACAAGAAGTTGATTTTCTGGAAAAGATATTGGAAAATAAGAGTGACGGGCAGCTT
    CAGTCTGACGCTTTATATTTGTATTTTGCTCAATTAGGCAGAGACATGTACACAGGTGAT
    CCAATCAAATTAGAACATATTAAAGACCAATCTTTTTACAACATTGATCATATTTATCCT
    CAATCGATGGTGAAAGATGACAGTTTGGATAACAAGGTACTAGTCCAAAGCGAAATCAAT
    GGCGAAAAGAGTTCGCGCTATCCATTAGACGCAGCCATTAGAAACAAAATGAAGCCGTTG
    TGGGATGCCTACTATAATCATGGATTAATTTCTCTTAAGAAATACCAGCGTTTGACGAGA
    TCTACTCCATTTACGGACGACGAGAAGTGGGATTTTATCAATCGTCAGCTAGTTGAAACT
    AGGCAATCTACTAAAGCTTTAGCAATATTGTTAAAGCGTAAGTTTCCAGATACTGAAATA
    GTTTACTCAAAGGCTGGACTATCCAGCGATTTTAGACATGAATTCGGCCTGGTTAAGAGT
    AGGAATATTAATGATCTACACCATGCTAAAGATGCCTTTCTCGCAATAGTTACTGGGAAC
    GTTTATCATGAAAGATTTAATAGAAGATGGTTTATGGTTAACCAGCCATACTCTGTGAAA
    ACTAAGACATTGTTTACCCATTCAATTAAGAATGGCAACTTTGTCGCTTGGAATGGAGAA
    GAAGATCTTGGACGTATCGTAAAGATGTTGAAACAAAACAAGAACACAATCCACTTCACC
    AGGTTTTCCTTTGATAGGAAGGAGGGATTGTTCGATATTCAACCTCTCAAAGCTTCTACC
    GGATTGGTTCCACGAAAAGCAGGGTTGGATGTTGTTAAATATGGAGGATACGATAAAAGC
    ACTGCCGCGTATTATTTATTAGTACGTTTTACACTCGAGGATAAGAAGACTCAACACAAA
    TTGATGATGATTCCTGTTGAAGGTCTCTACAAAGCACGTATTGACCATGATAAAGAGTTT
    TTAACAGATTATGCTCAGACCACGATCAGCGAAATTCTTCAAAAGGACAAGCAGAAAGTG
    ATCAACATCATGTTCCCTATGGGCACGAGACATATCAAACTGAATTCGATGATTTCTATT
    GATGGATTCTATCTTTCTATTGGTGGGAAGAGTAGCAAAGGTAAGTCAGTACTATGTCAT
    GCTATGGTGCCATTAATCGTCCCACACAAGATAGAATGTTATATCAAGGCTATGGAATCG
    TTTGCAAGAAAATTCAAAGAAAATAATAAATTGAGGATCGTTGAAAAGTTTGATAAAATA
    ACTGTTGAAGATAACTTGAACTTATACGAGCTTTTTCTACAAAAGTTGCAACATAACCCA
    TATAATAAATTTTTCTCTACACAATTTGATGTGTTGACGAACGGTAGAAGTACATTCACC
    AAATTGTCTCCAGAGGAGCAAGTCCAGACTTTACTTAATATACTGAGTATATTTAAAACT
    TGTCGTTCTTCTGGGTGTGATTTAAAATCAATAAATGGTTCCGCTCAAGCGGCTAGAATT
    ATGATATCCGCTGATTTAACTGGCTTATCAAAAAAGTATTCAGATATTAGATTAGTTGAG
    CAAAGCGCATCAGGTCTATTTGTTTCAAAATCTCAAAATCTCTTGGAATACTTGCCAAAA
    AAGAAAAGGAAAGTTTAG
SEQ ID NO: 128
ATGAGTAGTTTAACAAAGTTTACCAATAAATATAGTAAGCAACTAACTATAAAGAACGAA
TTGATACCGGTCGGTAAGACTTTGGAAAACATAAAAGAAAATGGGTTGATTGATGGAGAC
GAGCAATTGAATGAGAATTATCAAAAAGCAAAGATAATAGTAGATGATTTTTTGAGAGAC
TTTATTAATAAAGCTCTAAATAACACTCAAATTGGTAACTGGAGAGAGCTAGCCGACGCC
    TTGAACAAGGAAGATGAGGATAATATTGAGAAATTACAAGATAAGATTAGAGGGATTATC
    GTGTCTAAGTTTGAGACTTTTGATCTGTTCAGTTCGTATTCGATTAAAAAGGACGAGAAA
    ATCATCGATGATGATAACGATGTGGAAGAAGAGGAGCTAGACCTTGGGAAGAAGACATCT
    AGCTTCAAATACATATTCAAGAAAAATTTGTTCAAACTTGTCCTTCCTTCATATTTAAAA
    ACAACAAATCAAGATAAGTTAAAAATCATTTCTTCCTTCGATAATTTTAGTACTTATTTT
    CGTGGTTTTTTCGAAAACAGGAAAAATATATTCACTAAAAAGCCTATATCTACCTCTATA
    GCTTATAGAATTGTTCACGATAATTTCCCAAAATTTCTAGATAATATCAGGTGTTTTAAT
    GTTTGGCAAACCGAGTGTCCTCAGTTAATAGTCAAGGCCGACAACTACCTTAAAAGCAAG
    AATGTGATTGCAAAAGATAAGTCTTTGGCTAACTATTTTACAGTCGGTGCCTATGATTAT
    TTTCTGAGTCAAAATGGTATCGATTTCTATAACAACATTATTGGCGGCTTACCAGCTTTT
    GCCGGGCATGAGAAGATTCAGGGTTTGAACGAATTTATCAATCAAGAATGTCAAAAGGAT
    TCTGAATTAAAGTCTAAGCTCAAGAATAGGCACGCTTTCAAAATGGCAGTCTTATTCAAA
    CAAATCCTTTCAGACAGAGAAAAGTCATTTGTGATTGACGAGTTCGAATCAGACGCTCAG
    GTAATTGATGCTGTTAAAAATTTTTACGCGGAACAATGCAAAGATAATAACGTCATATTT
    AATTTATTGAATCTGATCAAGAATATTGCTTTTTTGTCGGATGATGAGTTAGACGGCATT
    TTCATAGAGGGTAAATACCTGTCCTCTGTGTCTCAAAAATTGTATAGTGATTGGTCAAAG
    TTGAGAAATGATATTGAAGATTCGGCTAATTCTAAACAGGGTAACAAAGAATTAGCGAAG
    AAAATCAAAACTAACAAGGGTGATGTTGAAAAGGCTATAAGTAAGTACGAGTTCAGTTTA
    TCTGAACTAAATTCAATTGTTCATGATAACACAAAATTTTCCGATCTTTTATCATGCACA
    TTACATAAAGTTGCAAGTGAAAAATTAGTCAAAGTAAACGAAGGTGATTGGCCAAAACAT
    CTAAAAAACAACGAGGAAAAACAGAAGATAAAAGAACCTCTTGACGCTTTATTGGAAATA
    TACAATACTCTATTAATATTTAACTGTAAAAGTTTTAACAAAAATGGTAATTTCTATGTC
    GACTACGATCGCTGCATTAATGAGTTGTCCAGTGTTGTGTACTTGTATAATAAAACTCGT
    AATTATTGTACGAAAAAGCCGTACAACACTGACAAATTTAAGTTGAATTTCAACTCCCCA
    CAACTGGGTGAGGGCTTCTCTAAAAGTAAAGAGAATGATTGCCTTACATTATTATTTAAA
    AAAGATGATAATTATTATGTCGGAATCATAAGAAAGGGGGCAAAGATCAACTTCGATGAC
    ACTCAGGCCATAGCAGACAACACAGATAACTGTATATTCAAAATGAATTATTTTTTGCTG
    AAGGATGCTAAAAAATTTATCCCCAAATGTTCAATACAATTAAAAGAGGTTAAGGCCCAT
    TTCAAAAAGTCGGAAGATGACTATATTTTGTCCGATAAGGAAAAATTCGCTAGTCCGCTT
    GTTATTAAAAAATCCACATTTCTTCTCGCTACGGCTCATGTGAAAGGAAAGAAGGGCAAT
    ATTAAGAAATTTCAGAAAGAATACTCCAAAGAAAATCCTACGGAGTATAGAAATAGTCTG
    AACGAATGGATAGCATTCTGCAAAGAGTTCTTGAAGACCTATAAAGCTGCCACCATCTTT
    GATATTACAACTTTGAAAAAGGCCGAGGAATACGCTGACATTGTGGAATTCTATAAGGAT
    GTAGATAATCTTTGTTACAAGTTAGAATTTTGCCCTATCAAAACTTCTTTTATCGAAAAT
    CTTATAGATAATGGCGATTTATACCTGTTTAGAATTAATAACAAGGACTTTTCTTCAAAA
    AGTACAGGCACGAAAAACTTACACACATTATACTTGCAGGCTATATTTGACGAGCGAAAC
    TTAAACAACCCCACGATAATGTTGAATGGAGGTGCAGAGTTATTCTACAGAAAAGAATCT
    ATAGAACAGAAAAATCGGATCACGCACAAAGCCGGTAGTATCTTAGTGAATAAAGTGTGC
    AAAGATGGTACAAGTCTAGATGACAAAATCCGTAACGAAATTTACCAGTATGAAAACAAA
    TTCATTGATACTCTTTCGGACGAAGCTAAAAAGGTTCTGCCAAACGTTATTAAGAAAGAG
    GCTACGCATGATATAACAAAAGATAAACGTTTCACTAGCGACAAATTCTTCTTTCATTGT
    CCTTTAACAATCAACTACAAGGAAGGTGACACCAAACAATTTAATAATGAAGTGCTCTCA
    TTCCTTAGAGGTAACCCCGATATCAATATTATCGGCATTGATAGAGGAGAAAGAAACCTA
    ATCTATGTAACAGTCATTAACCAAAAAGGCGAAATATTGGATAGCGTCTCCTTCAATACT
    GTCACCAATAAGTCATCGAAGATAGAACAAACTGTTGATTACGAAGAAAAATTGGCCGTT
    AGAGAAAAGGAACGTATCGAAGCGAAGAGATCTTGGGATAGCATATCCAAGATTGCCACC
    TTGAAGGAGGGTTATCTAAGCGCGATCGTACATGAAATCTGCTTATTAATGATTAAGCAT
    AATGCTATTGTCGTGTTAGAAAACCTGAATGCCGGTTTTAAAAGGATTAGAGGTGGTTTG
    TCAGAAAAGTCAGTATATCAAAAGTTTGAAAAGATGCTTATTAATAAACTCAACTACTTC
    GTTAGCAAGAAAGAAAGTGATTGGAATAAACCGTCAGGTTTGCTCAATGGTCTTCAGTTA
    AGTGATCAATTTGAGTCTTTCGAAAAATTAGGAATTCAAAGTGGATTCATTTTTTATGTA
    CCAGCCGCGTACACTTCAAAAATTGACCCTACGACCGGATTTGCCAACGTCTTGAATTTG
    TCCAAGGTCAGAAATGTTGACGCCATCAAAAGTTTTTTTAGCAACTTCAATGAAATCTCT
    TATTCCAAAAAGGAAGCCCTTTTCAAGTTTTCTTTTGACCTAGACTCGTTATCGAAGAAA
    GGATTTTCATCTTTCGTAAAGTTTAGCAAGTCCAAGTGGAATGTATACACATTCGGCGAG
    AGAATTATCAAGCCCAAGAACAAACAGGGCTATAGAGAAGACAAGAGAATCAACTTGACT
    TTTGAGATGAAAAAATTACTCAACGAATACAAGGTTTCATTTGATTTGGAGAACAACTTG
    ATTCCCAATTTGACATCAGCTAACTTGAAGGATACGTTCTGGAAGGAGTTATTCTTTATA
    TTCAAAACGACATTACAACTGCGTAATAGTGTTACAAACGGTAAAGAAGATGTATTAATC
    TCACCTGTAAAGAATGCCAAAGGAGAATTTTTCGTATCCGGTACTCACAATAAGACACTA
    CCACAGGATTGCGACGCTAACGGTGCGTATCATATTGCGTTGAAAGGATTAATGATACTT
    GAAAGAAATAACCTTGTTCGCGAAGAAAAAGACACCAAGAAGATCATGGCTATTAGCAAT
    GTTGATTGGTTTGAATACGTGCAAAAGAGGAGAGGTGTTTTGTAA
SEQ ID NO: 129
ATGAACAATTATGACGAGTTCACAAAGCTATACCCTATCCAAAAAACTATCAGGTTCGAA
TTGAAACCACAAGGGAGAACAATGGAACATCTGGAGACATTCAACTTTTTTGAAGAGGAC
AGAGACAGAGCGGAGAAATACAAAATTTTAAAAGAGGCCATCGATGAATATCACAAAAAG
TTTATCGACGAGCATTTAACAAACATGTCTTTGGACTGGAATTCACTTAAACAAATTTCT
    GAGAAATATTATAAGTCTCGGGAGGAAAAAGACAAAAAGGTCTTTTTGTCCGAGCAAAAG
    AGAATGAGACAAGAAATTGTCTCGGAGTTTAAAAAAGATGATCGGTTCAAAGATTTGTTT
    AGCAAGAAATTGTTTTCTGAATTGTTGAAGGAGGAGATATACAAGAAAGGCAACCATCAA
    GAAATAGATGCTTTGAAATCGTTTGACAAGTTCAGCGGTTACTTCATTGGTTTACATGAA
    AATAGGAAGAACATGTATAGCGACGGCGATGAGATCACCGCTATATCGAATAGAATCGTT
    AACGAAAATTTTCCGAAATTTTTGGATAATTTGCAAAAATACCAGGAAGCTAGGAAAAAG
    TACCCTGAATGGATAATAAAGGCGGAATCAGCTTTGGTGGCTCACAACATAAAGATGGAT
    GAAGTCTTCTCGCTGGAATATTTTAACAAAGTATTAAATCAGGAAGGAATCCAAAGATAC
    AACTTAGCCTTGGGTGGATACGTAACCAAATCAGGTGAGAAAATGATGGGCTTAAATGAT
    GCACTTAATCTAGCTCACCAATCCGAAAAGTCCTCTAAAGGGAGGATACACATGACACCA
    TTGTTTAAGCAAATCCTTTCGGAGAAAGAATCTTTTTCATATATCCCCGATGTTTTCACT
    GAGGATAGTCAATTGTTGCCCAGCATTGGTGGATTTTTTGCACAAATAGAAAATGATAAA
    GATGGTAACATCTTCGATAGAGCCTTGGAATTGATAAGCTCCTATGCAGAATACGATACG
    GAACGAATATACATTAGACAAGCTGACATCAACAGAGTAAGCAATGTTATTTTTGGTGAG
    TGGGGAACTTTAGGTGGATTAATGCGGGAGTACAAAGCTGACTCAATCAATGATATTAAT
    TTGGAACGTACGTGCAAAAAAGTCGATAAGTGGCTTGATAGTAAGGAGTTTGCTCTGTCG
    GATGTACTAGAAGCAATTAAGAGAACAGGAAACAATGATGCATTTAATGAATATATTAGT
    AAAATGAGGACGGCTAGAGAAAAGATAGACGCCGCACGTAAGGAAATGAAGTTTATTTCC
    GAGAAAATATCTGGCGATGAAGAGTCGATTCACATCATCAAGACCCTACTCGATTCTGTT
    CAGCAATTTCTCCATTTTTTTAACCTCTTCAAAGCAAGACAAGACATTCCCTTAGATGGG
    GCTTTTTATGCCGAATTTGATGAAGTTCATTCAAAGTTGTTTGCTATTGTTCCTCTTTAC
    AATAAGGTCCGTAATTACCTTACTAAAAATAACTTGAACACCAAGAAAATAAAGTTAAAC
    TTCAAGAATCCGACTCTTGCCAACGGGTGGGATCAGAATAAAGTTTATGATTATGCTAGC
    TTAATATTTCTAAGAGATGGGAATTATTACTTAGGAATCATCAATCCAAAGCGTAAGAAA
    AACATTAAATTTGAACAAGGGTCAGGCAATGGCCCATTCTATAGAAAAATGGTGTATAAG
    CAAATACCAGGACCTAACAAGAACTTGCCTCGCGTATTTTTAACTTCAACAAAGGGTAAA
    AAAGAATATAAACCAAGCAAAGAAATTATTGAAGGTTACGAAGCAGATAAACACATCAGA
    GGTGATAAGTTCGATCTGGATTTCTGCCATAAATTGATTGACTTTTTTAAGGAATCTATA
    GAAAAACATAAGGACTGGTCCAAATTTAATTTCTACTTCTCACCTACAGAAAGTTATGGT
    GACATTTCAGAATTTTATTTAGACGTTGAGAAACAAGGATATAGGATGCATTTTGAAAAT
    ATTTCAGCGGAAACCATCGACGAATACGTTGAGAAGGGTGATTTATTCTTGTTCCAAATT
    TACAATAAAGACTTCGTTAAAGCTGCAACCGGAAAGAAGGATATGCATACCATATATTGG
    AACGCTGCATTCTCGCCAGAAAACTTACAAGATGTCGTTGTAAAGCTTAATGGAGAAGCT
    GAGCTGTTCTATAGAGACAAGAGTGATATAAAAGAGATTGTGCATCGGGAAGGTGAAATT
    CTGGTGAACAGAACTTACAATGGTCGTACACCCGTTCCAGACAAAATACATAAAAAACTG
    ACCGATTATCATAATGGTAGGACAAAGGACTTGGGCGAGGCCAAGGAGTACCTCGATAAA
    GTTAGATATTTCAAGGCACACTATGATATTACGAAAGACAGGAGATATTTAAACGATAAA
    ATTTACTTTCATGTCCCTTTGACCCTTAACTTTAAAGCTAATGGTAAAAAGAATTTGAAC
    AAAATGGTAATTGAGAAGTTTTTATCGGACGAAAAAGCTCACATAATCGGAATCGACCGC
    GGAGAGAGAAATTTACTGTATTATAGTATCATCGACAGAAGTGGAAAGATTATTGATCAG
    CAATCTTTGAACGTCATTGATGGGTTTGACTATCGGGAAAAGTTAAATCAAAGGGAAATT
    GAAATGAAGGATGCGAGACAATCATGGAATGCCATTGGTAAAATTAAAGATCTCAAGGAG
    GGGTACTTATCAAAAGCTGTACACGAGATAACTAAAATGGCTATCCAATATAATGCAATT
    GTTGTAATGGAAGAATTGAATTATGGTTTTAAACGCGGCAGGTTTAAAGTCGAAAAACAA
    ATATACCAAAAGTTTGAAAACATGTTAATTGATAAGATGAACTATCTTGTTTTCAAAGAT
    GCACCTGATGAGAGTCCTGGCGGTGTGCTGAACGCCTATCAATTAACAAACCCATTAGAG
    TCCTTTGCTAAACTGGGTAAACAAACTGGCATTCTATTTTATGTTCCAGCCGCTTACACC
    TCAAAGATCGATCCAACGACCGGTTTTGTAAACTTATTTAATACTTCTTCCAAAACAAAC
    GCGCAAGAACGCAAAGAATTCCTACAAAAATTTGAATCAATATCCTATAGCGCAAAAGAT
    GGAGGTATATTCGCTTTCGCTTTTGACTACAGAAAGTTTGGCACTTCCAAGACAGATCAT
    AAAAATGTGTGGACCGCTTATACCAACGGAGAAAGGATGCGTTATATTAAAGAAAAAAAG
    AGGAACGAACTATTTGATCCATCGAAAGAAATTAAAGAAGCTTTGACAAGCAGCGGAATC
    AAATATGATGGAGGTCAAAACATACTTCCAGATATTCTCAGATCTAATAATAACGGTCTT
    ATTTACACGATGTATTCATCTTTTATCGCTGCCATCCAAATGCGTGTGTATGATGGCAAG
    GAAGATTATATTATATCTCCTATTAAAAATTCAAAGGGTGAATTTTTTCGCACGGATCCA
    AAAAGAAGAGAGCTTCCAATTGACGCCGATGCTAACGGTGCTTACAATATTGCATTGCGT
    GGTGAACTTACTATGAGAGCCATCGCCGAAAAGTTTGATCCGGACAGTGAAAAAATGGCG
    AAATTGGAGCTAAAGCACAAGGATTGGTTTGAATTCATGCAGACCCGTGGCGATTGA
SEQ ID NO: 130
ATGACTAAAACGTTCGACTCCGAGTTTTTTAATCTCTATTCCTTGCAAAAGACCGTTAGG
TTTGAATTGAAACCAGTTGGTGAAACTGCCTCATTTGTCGAAGACTTTAAAAACGAGGGA
TTGAAAAGAGTGGTTAGTGAAGATGAAAGAAGGGCAGTAGACTATCAAAAGGTTAAAGAA
ATCATTGACGATTACCACAGAGATTTTATAGAAGAATCTCTGAACTATTTTCCAGAGCAG
    GTTTCAAAAGATGCTCTAGAGCAAGCGTTTCATTTGTATCAAAAGTTGAAAGCAGCGAAG
    GTGGAAGAAAGGGAAAAAGCTTTAAAAGAATGGGAAGCATTACAGAAAAAATTGCGAGAA
    AAAGTCGTCAAATGTTTCAGCGACTCTAATAAAGCTCGCTTTTCTAGAATCGATAAAAAA
    GAATTGATTAAGGAAGATTTAATAAATTGGCTGGTAGCACAAAACAGAGAGGATGATATT
    CCTACTGTTGAAACGTTCAATAATTTTACTACTTACTTCACTGGTTTCCATGAGAACAGG
    AAGAATATTTACTCTAAAGATGATCACGCTACTGCTATAAGTTTTAGGTTGATTCACGAA
    AACTTGCCTAAATTTTTTGACAATGTCATCAGTTTTAACAAGTTGAAAGAAGGTTTCCCG
    GAATTAAAATTCGACAAAGTTAAAGAAGATTTAGAAGTAGATTACGACTTGAAGCATGCG
    TTTGAAATTGAATATTTCGTTAATTTCGTCACACAAGCTGGTATCGACCAATATAATTAC
    CTGCTTGGAGGCAAAACTCTAGAAGACGGTACGAAGAAACAAGGAATGAATGAACAGATT
    AATTTATTTAAGCAACAACAAACTCGCGATAAAGCTAGACAGATTCCAAAACTGATTCCA
    CTTTTCAAACAGATTCTATCTGAGAGAACTGAATCTCAGAGTTTTATCCCTAAGCAGTTC
    GAGTCTGATCAGGAACTATTCGATTCCCTGCAGAAATTGCATAACAACTGTCAAGATAAG
    TTTACCGTTTTGCAACAGGCGATCTTGGGATTGGCTGAGGCAGATCTTAAAAAGGTCTTT
    ATTAAAACTAGTGATCTAAACGCATTGTCTAACACTATTTTTGGAAATTATTCTGTGTTC
    TCAGACGCGCTCAATTTATATAAAGAGTCGCTAAAAACTAAAAAGGCTCAAGAAGCTTTT
    GAAAAGTTGCCTGCACATAGTATTCATGATTTAATCCAATACTTAGAACAATTTAATTCG
    TCTCTCGATGCTGAAAAGCAACAGTCTACCGATACTGTATTAAACTACTTTATTAAAACC
    GACGAATTATATAGTCGTTTCATTAAATCCACCTCTGAGGCATTCACCCAAGTACAACCT
    CTCTTTGAACTGGAAGCTTTGAGCTCCAAAAGAAGACCCCCAGAAAGTGAAGATGAGGGG
    GCTAAAGGCCAAGAAGGTTTCGAACAAATTAAGAGAATCAAAGCTTATCTAGACACTCTA
    ATGGAGGCTGTCCACTTTGCTAAGCCTTTGTATCTTGTCAAGGGTAGAAAGATGATAGAG
    GGTCTAGACAAGGATCAAAGCTTCTACGAAGCGTTTGAAATGGCCTACCAGGAGTTGGAG
    TCTTTAATCATCCCCATTTACAATAAGGCCAGATCTTACCTGTCTAGGAAGCCATTTAAA
    GCGGATAAATTCAAAATTAATTTTGACAATAATACACTTCTATCTGGGTGGGATGCTAAC
    AAGGAGACGGCTAACGCCAGCATATTGTTTAAGAAGGATGGTTTATACTACCTGGGAATC
    ATGCCAAAAGGCAAAACTTTCTTGTTCGATTATTTCGTTAGTTCAGAAGATTCTGAAAAG
    TTGAAACAACGGAGACAGAAAACCGCAGAGGAAGCGCTCGCACAGGATGGAGAATCCTAT
    TTTGAAAAAATACGGTATAAACTCCTACCAGGTGCTAGTAAGATGTTGCCAAAGGTATTT
    TTTAGCAATAAAAATATTGGGTTTTACAATCCCTCAGATGATATTCTACGAATTCGGAAT
    ACGGCCTCTCATACTAAGAATGGTACTCCCCAGAAGGGTCATTCCAAGGTAGAATTTAAC
    TTGAATGACTGTCACAAAATGATTGATTTTTTTAAATCTTCCATACAGAAACATCCCGAG
    TGGGGATCCTTTGGTTTCACTTTTTCTGATACGTCGGACTTTGAAGATATGAGTGCTTTC
    TACCGAGAAGTTGAAAATCAAGGTTACGTTATAAGTTTTGATAAAATAAAAGAAACTTAC
    ATTCAGTCTCAAGTTGAGCAAGGTAACTTATATTTATTTCAAATTTACAACAAAGATTTT
    AGTCCGTATTCAAAGGGAAAGCCAAACCTGCACACTTTATACTGGAAAGCTCTGTTTGAA
    GAGGCTAATTTGAATAACGTAGTGGCTAAGCTAAACGGCGAAGCAGAAATCTTTTTCAGA
    AGACACAGTATCAAAGCATCTGATAAAGTGGTACATCCTGCTAATCAAGCTATAGATAAT
    AAGAATCCCCATACTGAGAAGACGCAGTCCACATTTGAATATGACTTGGTCAAAGACAAA
    AGATATACCCAAGACAAATTTTTTTTTCATGTACCGATATCTTTAAACTTTAAGGCTCAG
    GGCGTTTCAAAGTTTAATGATAAGGTAAATGGATTCTTAAAGGGCAATCCCGACGTTAAT
    ATAATCGGTATAGATCGAGGTGAGAGACATCTTTTATACTTTACCGTGGTGAATCAAAAA
    GGAGAAATATTAGTGCAAGAGTCCTTGAATACATTAATGTCTGACAAGGGTCATGTCAAC
    GATTATCAACAGAAATTGGACAAGAAGGAACAGGAAAGGGACGCTGCCAGGAAGTCCTGG
    ACGACAGTAGAAAATATTAAAGAATTAAAAGAAGGTTATTTATCACATGTGGTTCATAAA
    CTTGCACATTTAATCATCAAATATAACGCAATAGTGTGCTTGGAAGATCTTAATTTTGGC
    TTCAAGAGGGGTAGGTTCAAGGTCGAAAAACAGGTCTACCAGAAGTTCGAGAAAGCTCTG
    ATCGATAAATTGAATTATCTTGTTTTCAAAGAAAAAGAATTAGGAGAAGTTGGTCATTAT
    CTTACAGCATACCAACTCACTGCACCATTTGAAAGCTTCAAAAAGCTAGGCAAGCAATCT
    GGGATTTTGTTCTATGTTCCGGCTGATTATACATCAAAGATAGATCCTACCACAGGCTTT
    GTAAATTTTTTAGATCTTAGGTACCAATCCGTTGAAAAAGCTAAACAGTTGCTGTCCGAT
    TTTAATGCGATAAGATTTAATAGTGTTCAGAATTATTTTGAGTTCGAAATTGATTATAAA
    AAATTGACACCAAAACGTAAAGTAGGAACACAATCTAAATGGGTTATTTGTACCTATGGA
    GATGTTAGATACCAAAACAGAAGAAATCAGAAAGGTCACTGGGAAACTGAAGAAGTTAAC
    GTTACTGAAAAACTTAAAGCTCTATTTGCGAGCGATTCAAAAACGACGACGGTGATCGAT
    TATGCAAATGATGATAACCTTATTGATGTAATTCTGGAACAAGATAAGGCATCATTTTTT
    AAAGAACTACTATGGTTGTTAAAGCTAACCATGACCCTAAGGCACTCCAAGATAAAGTCA
    GAGGATGATTTTATCCTCTCTCCAGTGAAAAACGAACAAGGTGAGTTTTACGACTCAAGA
    AAGGCGGGTGAAGTCTGGCCTAAGGATGCTGATGCCAATGGAGCTTATCACATCGCTCTG
    AAGGGGCTATGGAACTTACAGCAAATTAACCAATGGGAAAAAGGTAAAACTTTAAACCTC
    GCCATAAAGAACCAGGATTGGTTCAGCTTTATCCAAGAAAAACCATATCAAGAATAA
SEQ ID NO: 131
ATGCACACAGGAGGTCTACTCTCGATGGATGCTAAGGAATTTACCGGTCAATATCCGCTG
TCCAAAACTTTGCGTTTTGAGCTTAGACCTATTGGCCGAACGTGGGATAACCTAGAGGCT
TCTGGTTATTTGGCGGAAGATAGACATAGAGCTGAGTGTTATCCCCGAGCTAAAGAATTG
CTGGATGATAACCACAGGGCGTTCCTGAATAGAGTTCTACCGCAAATCGATATGGATTGG
    CATCCAATTGCTGAAGCTTTCTGCAAGGTGCACAAAAATCCAGGTAATAAAGAATTGGCT
    CAGGATTATAATTTGCAGCTTAGTAAGAGAAGAAAAGAAATTTCCGCTTATTTGCAGGAT
    GCTGATGGATACAAGGGGTTGTTCGCGAAACCTGCCCTGGACGAAGCTATGAAAATAGCT
    AAGGAAAACGGCAATGAATCTGATATTGAAGTTTTGGAAGCCTTCAATGGATTTTCCGTT
    TATTTCACTGGTTATCATGAGAGTAGGGAGAATATATACTCAGACGAAGATATGGTATCC
    GTCGCCTATCGCATAACTGAAGATAATTTTCCAAGGTTCGTGTCGAACGCGTTAATTTTT
    GATAAACTAAATGAATCGCACCCGGATATTATTTCGGAAGTGTCCGGTAATCTGGGGGTA
    GACGATATTGGTAAATATTTTGATGTGTCCAACTACAATAATTTCCTTAGTCAAGCAGGA
    ATTGATGACTACAACCATATTATAGGAGGGCATACAACTGAAGACGGTCTCATTCAAGCT
    TTTAACGTAGTGTTAAACCTAAGGCACCAAAAAGACCCAGGTTTTGAGAAAATTCAATTT
    AAGCAACTCTACAAGCAGATACTGAGCGTTAGGACTAGTAAGTCATATATCCCAAAGCAA
    TTCGATAACTCAAAGGAAATGGTCGACTGTATATGCGACTACGTCTCAAAAATAGAAAAA
    TCTGAAACAGTAGAAAGAGCTCTGAAATTGGTAAGAAATATATCTTCTTTTGATTTAAGA
    GGTATTTTCGTAAATAAAAAAAACCTTCGAATTTTGTCTAATAAGTTAATTGGAGACTGG
    GACGCAATAGAGACAGCTTTGATGCACAGTTCCAGCAGTGAAAACGATAAGAAATCAGTG
    TATGACTCTGCAGAGGCATTCACCCTTGATGATATCTTCAGTTCTGTGAAAAAGTTCAGC
    GACGCCTCCGCTGAGGATATAGGAAACCGCGCTGAAGACATATGTCGTGTTATCTCAGAA
    ACAGCTCCTTTCATTAACGACTTAAGGGCTGTAGATTTGGATTCTTTAAATGATGACGGC
    TATGAAGCGGCCGTGTCTAAAATACGGGAATCTCTTGAACCCTACATGGATCTATTTCAC
    GAATTGGAGATCTTTAGCGTGGGTGATGAGTTTCCTAAATGTGCTGCCTTTTATAGCGAG
    TTGGAAGAGGTCTCAGAACAACTGATTGAAATCATTCCTTTATTTAACAAAGCAAGAAGT
    TTTTGCACAAGGAAAAGGTATTCAACCGACAAAATCAAAGTCAATTTAAAATTCCCTACT
    CTGGCAGATGGATGGGATCTAAATAAAGAAAGGGATAACAAAGCCGCAATTCTAAGAAAA
    GACGGTAAATACTACCTGGCAATTTTAGACATGAAGAAAGATCTCAGTAGTATTCGTACG
    AGCGATGAGGACGAGTCTTCTTTTGAAAAGATGGAATATAAATTGCTCCCTTCTCCTGTG
    AAAATGCTTCCAAAAATTTTTGTTAAATCGAAAGCCGCCAAAGAAAAGTACGGGTTGACC
    GATAGAATGTTAGAATGCTACGATAAAGGTATGCATAAGTCGGGTAGTGCTTTTGATTTG
    GGTTTTTGTCATGAATTGATCGATTACTATAAGCGCTGCATTGCCGAGTACCCAGGCTGG
    GATGTTTTCGACTTTAAATTTCGTGAGACAAGCGATTACGGATCCATGAAAGAATTTAAT
    GAAGACGTCGCTGGCGCAGGTTACTATATGTCACTTAGAAAGATTCCATGTTCCGAAGTT
    TATCGTTTACTGGACGAGAAGTCAATTTACTTGTTTCAAATATATAATAAGGATTATAGC
    GAAAACGCACATGGGAATAAGAATATGCATACGATGTATTGGGAGGGCTTGTTCTCACCA
    CAAAATTTGGAATCACCAGTCTTCAAATTGTCCGGAGGCGCAGAACTTTTTTTCAGAAAG
    TCATCTATTCCTAATGACGCTAAAACGGTACATCCGAAAGGTTCAGTTCTTGTTCCCAGA
    AACGACGTCAATGGTAGAAGAATACCAGACTCGATCTACAGAGAGTTGACAAGGTATTTT
    AACCGTGGGGATTGCAGGATCAGTGATGAAGCTAAGTCTTACCTGGACAAGGTCAAGACA
    AAAAAAGCGGACCATGACATTGTTAAGGATAGAAGATTTACTGTAGATAAGATGATGTTC
    CATGTTCCGATTGCCATGAATTTTAAAGCTATAAGTAAACCAAATCTTAATAAGAAAGTT
    ATTGATGGCATAATAGATGATCAAGATTTGAAAATCATCGGTATCGATCGTGGTGAGAGA
    AATCTTATTTATGTGACCATGGTCGATAGGAAGGGGAATATATTGTATCAAGACAGTCTT
    AATATTTTAAATGGATACGATTACCGCAAAGCTTTAGACGTGAGGGAATATGATAACAAA
    GAAGCTAGAAGGAATTGGACTAAAGTAGAAGGTATTAGAAAAATGAAAGAAGGTTATTTA
    TCTTTAGCTGTTAGTAAATTGGCCGATATGATCATCGAAAATAATGCTATAATCGTAATG
    GAAGATTTGAATCACGGGTTTAAGGCAGGTCGTTCCAAAATTGAAAAGCAGGTGTATCAA
    AAATTCGAATCAATGTTAATCAACAAGTTAGGATACATGGTGCTAAAAGACAAGTCCATT
    GACCAGTCTGGTGGAGCCCTTCATGGTTACCAATTAGCCAATCATGTTACGACCTTAGCT
    AGCGTGGGTAAACAATGTGGAGTAATTTTTTACATACCTGCAGCTTTTACTTCGAAGATT
    GATCCCACCACGGGCTTTGCTGATTTATTCGCTCTCTCTAATGTGAAGAATGTCGCTTCT
    ATGAGAGAGTTCTTCTCCAAAATGAAGTCAGTAATATATGACAAGGCGGAAGGCAAATTC
    GCCTTTACATTTGATTATTTGGATTATAACGTTAAAAGCGAATGTGGACGTACCTTATGG
    ACTGTGTATACAGTTGGTGAACGCTTCACCTACTCTAGAGTAAACCGAGAGTATGTTCGG
    AAAGTCCCAACAGATATCATCTATGATGCATTACAAAAAGCTGGTATTAGCGTCGAAGGT
    GACCTTAGAGATAGAATCGCGGAAAGCGACGGTGACACATTAAAGTCTATATTCTACGCT
    TTTAAATACGCGTTGGATATGAGAGTCGAAAACAGAGAGGAAGACTATATACAGTCACCT
    GTGAAGAATGCTTCTGGTGAGTTCTTTTGTTCAAAAAACGCCGGAAAGTCTTTGCCGCAG
    GATTCAGATGCAAATGGTGCCTATAATATAGCTCTGAAAGGGATCCTACAACTCAGAATG
    TTGAGCGAACAATACGATCCAAATGCAGAATCGATTAGATTGCCACTTATAACTAACAAG
    GCATGGTTAACTTTTATGCAATCCGGTATGAAAACTTGGAAGAATTAA
SEQ ID NO: 132
ATGGATTCTCTTAAGGATTTCACTAATTTATATCCAGTCTCGAAAACATTGCGGTTCGAA
TTGAAACCAGTTGGGAAAACTCTAGAAAACATTGAAAAAGCCGGTATATTGAAAGAAGAT
GAACACAGAGCGGAATCCTACCGCCGGGTAAAAAAGATAATTGACACATACCATAAAGTG
TTTATTGACAGCTCCTTAGAGAACATGGCTAAAATGGGGATAGAAAATGAAATCAAGGCT
    ATGCTGCAGTCTTTTTGTGAACTCTATAAGAAAGACCACAGGACAGAAGGAGAAGATAAA
    GCTCTTGATAAAATTAGAGCTGTTCTTAGAGGTTTAATCGTTGGGGCTTTCACTGGTGTA
    TGTGGAAGACGAGAAAACACAGTACAAAATGAAAAGTACGAGAGTTTGTTCAAAGAAAAA
    TTGATAAAGGAAATTTTGCCAGATTTCGTGTTGTCCACCGAGGCTGAGTCTCTTCCATTC
    AGCGTTGAAGAAGCAACAAGGAGCTTAAAAGAGTTTGACTCATTCACTTCTTATTTTGCT
    GGTTTTTACGAAAATAGAAAGAATATTTATTCCACGAAACCGCAAAGTACTGCGATAGCC
    TACAGATTAATTCATGAAAACTTGCCTAAATTTATAGATAATATTTTGGTCTTCCAGAAG
    ATTAAAGAACCAATCGCTAAAGAACTTGAGCACATAAGAGCAGATTTTAGCGCAGGCGGA
    TATATCAAAAAAGATGAACGGCTAGAAGACATATTCTCATTAAATTACTACATTCATGTC
    CTTTCTCAAGCTGGTATAGAAAAATATAATGCTTTAATCGGGAAGATAGTGACGGAAGGT
    GATGGTGAAATGAAAGGTCTTAATGAACATATTAACTTATATAACCAACAGAGGGGTCGA
    GAGGATAGGTTGCCCTTGTTTAGGCCTCTATACAAGCAAATCCTGTCCGATAGAGAGCAA
    TTGTCTTATTTACCTGAATCATTTGAAAAAGATGAAGAGCTGCTTAGAGCACTTAAGGAA
    TTTTACGATCACATCGCCGAAGACATCTTGGGTAGAACACAGCAATTGATGACTTCAATT
    TCTGAATACGACTTGTCCCGTATTTATGTCAGAAATGATTCTCAACTTACAGACATCTCG
    AAGAAAATGCTAGGAGATTGGAACGCCATTTATATGGCTAGAGAACGAGCCTACGACCAC
    GAACAGGCTCCTAAACGTATTACTGCTAAATACGAACGTGATAGAATCAAGGCCTTAAAA
    GGTGAAGAGTCAATTTCATTGGCGAATCTGAACAGCTGTATAGCTTTCTTGGACAATGTA
    AGGGATTGTCGAGTTGACACATACCTATCAACTTTGGGGCAGAAAGAGGGTCCTCATGGC
    TTAAGTAACTTGGTGGAAAACGTCTTCGCCTCATATCATGAAGCAGAACAGTTATTGTCG
    TTTCCTTACCCCGAAGAGAACAACCTTATTCAGGACAAAGACAATGTAGTTTTGATCAAA
    AACCTATTGGATAATATAAGTGATTTACAACGTTTCCTTAAACCTTTGTGGGGAATGGGC
    GATGAACCTGACAAAGACGAAAGGTTTTACGGTGAATACAACTATATTAGAGGAGCGCTT
    GACCAGGTAATACCTTTGTACAATAAAGTAAGGAACTACTTGACTCGTAAACCATATTCT
    ACTAGAAAAGTTAAATTGAACTTTGGTAATTCACAGCTGCTGAGTGGTTGGGATCGTAAT
    AAAGAAAAAGATAACTCCTGTGTTATCTTGCGAAAAGGACAAAACTTTTACTTGGCAATT
    ATGAACAACCGTCACAAAAGGTCCTTCGAGAACAAAGTTCTGCCTGAATACAAAGAAGGT
    GAACCATATTTTGAAAAAATGGACTATAAATTCCTGCCAGATCCTAATAAAATGTTGCCT
    AAGGTCTTCTTGTCTAAAAAAGGTATAGAAATATATAAACCATCCCCGAAGTTGCTGGAG
    CAATATGGTCATGGAACGCACAAAAAAGGTGACACTTTTAGTATGGATGACTTGCACGAG
    TTGATTGATTTTTTTAAACATTCCATTGAAGCGCACGAAGATTGGAAACAATTTGGTTTC
    AAGTTCTCTGACACAGCCACTTACGAAAATGTATCGTCCTTTTATAGAGAAGTGGAAGAT
    CAGGGTTATAAACTGTCATTCCGTAAGGTTAGTGAAAGCTATGTGTACTCGTTGATCGAT
    CAAGGGAAGCTTTATCTTTTTCAAATCTATAATAAAGATTTCTCTCCTTGTTCAAAGGGC
    ACACCTAATCTTCATACACTATACTGGAGAATGCTTTTCGATGAAAGAAATTTGGCTGAT
    GTGATCTATAAATTAGACGGTAAAGCTGAGATTTTTTTCAGAGAGAAATCCCTGAAAAAC
    GACCATCCAACTCATCCGGCAGGTAAACCGATTAAAAAGAAATCCCGGCAAAAAAAGGGC
    GAAGAGAGTTTATTCGAGTATGATTTAGTTAAGGACAGACATTATACAATGGACAAATTT
    CAATTTCATGTGCCCATTACTATGAACTTTAAGTGTAGTGCAGGGTCTAAGGTTAATGAT
    ATGGTAAACGCACATATTAGAGAAGCTAAAGATATGCACGTCATCGGTATTGATCGCGGA
    GAAAGAAATTTACTTTACATTTGCGTTATCGATTCTAGGGGCACCATCTTGGATCAAATC
    TCTTTGAACACTATAAATGATATTGACTATCATGATCTACTAGAGAGTCGGGATAAAGAC
    AGGCAACAAGAAAGAAGAAATTGGCAAACAATTGAAGGTATTAAAGAATTAAAGCAAGGC
    TATCTAAGCCAGGCTGTACACAGAATTGCCGAATTAATGGTAGCATATAAAGCTGTCGTA
    GCTCTAGAAGACTTGAACATGGGTTTCAAAAGAGGGCGCCAGAAGGTCGAAAGTAGTGTT
    TATCAACAATTTGAAAAACAGTTAATAGATAAGTTGAATTATCTAGTGGATAAAAAAAAG
    CGTCCTGAGGACATTGGCGGTTTATTAAGAGCCTACCAATTCACTGCGCCATTTAAATCG
    TTCAAAGAAATGGGTAAACAAAACGGTTTTCTATTCTACATCCCCGCATGGAATACCTCA
    AATATAGATCCAACTACCGGTTTCGTCAACTTATTTCATGCTCAATATGAGAATGTGGAC
    AAAGCAAAATCATTCTTTCAAAAATTTGATAGCATTAGCTACAATCCTAAAAAAGATTGG
    TTTGAATTTGCGTTCGATTATAAAAATTTCACCAAGAAGGCTGAAGGTTCCAGATCTATG
    TGGATATTGTGCACCCACGGAAGTAGAATTAAGAACTTCCGTAATTCACAGAAAAACGGC
    CAGTGGGACAGCGAAGAATTCGCCCTAACCGAAGCTTTCAAAAGTCTTTTCGTAAGATAC
    GAGATAGACTATACAGCTGATCTAAAGACAGCTATTGTGGATGAGAAGCAAAAAGACTTC
    TTTGTCGACCTTCTTAAGTTGTTCAAGTTAACTGTGCAGATGAGAAATAGTTGGAAGGAA
    AAAGACCTAGATTACTTGATTAGCCCAGTCGCTGGTGCAGATGGCAGATTTTTTGATACA
    CGTGAAGGCAATAAATCACTACCAAAAGACGCGGACGCTAATGGCGCATACAACATCGCA
    TTGAAGGGTTTGTGGGCTCTCAGGCAGATTAGGCAGACAAGTGAGGGTGGTAAGCTTAAG
    CTGGCGATTTCTAATAAGGAATGGTTACAGTTTGTTCAAGAAAGATCCTACGAAAAAGAT
    TAA
    SEQ ID NO: 133
    ATGAACAATGGTACTAATAATTTTCAAAACTTCATAGGGATTTCTAGCCTTCAAAAGACA
    TTGAGAAATGCTTTAATTCCAACAGAAACGACTCAACAATTCATAGTGAAAAATGGTATT
    ATAAAAGAAGACGAGTTGCGTGGCGAGAATAGACAAATTTTGAAAGATATCATGGATGAC
    TACTACAGAGGGTTCATCTCCGAAACATTGTCTTCTATTGACGACATTGACTGGACCAGC
    TTATTCGAAAAAATGGAAATACAGCTGAAGAACGGAGATAACAAGGACACTCTTATAAAG
    GAGCAAACGGAATATAGAAAGGCTATACACAAAAAGTTTGCTAATGACGATAGATTTAAA
    AACATGTTTAGTGCGAAGTTAATTTCTGATATTCTACCCGAGTTTGTCATTCATAATAAT
    AACTACTCTGCATCTGAAAAAGAGGAGAAGACCCAGGTTATAAAGTTGTTTTCAAGATTT
    GCCACATCATTTAAAGACTACTTCAAGAACAGGGCGAATTGCTTCTCTGCTGATGATATT
    AGCTCTTCCAGCTGTCATAGAATTGTTAACGATAATGCCGAAATTTTTTTTAGTAATGCC
    TTGGTATATAGACGCATAGTCAAGTCACTAAGCAATGATGATATAAACAAGATTAGTGGT
    GATATGAAAGATAGCCTTAAAGAAATGAGCCTTGAAGAGATATATTCATATGAGAAGTAC
    GGTGAATTTATAACTCAAGAAGGAATTTCTTTTTATAACGATATTTGTGGTAAGGTTAAT
    TCTTTTATGAATTTGTATTGCCAGAAGAACAAGGAAAATAAGAATCTATATAAACTACAA
    AAGTTGCATAAACAGATTTTGTGTATAGCTGATACATCCTACGAAGTTCCGTATAAATTT
    GAATCTGATGAGGAAGTTTATCAATCGGTAAACGGTTTTCTTGACAACATTTCCAGCAAA
    CATATCGTTGAGAGACTACGTAAAATTGGAGACAACTATAATGGTTACAATCTAGATAAA
    ATATACATAGTGTCCAAGTTTTATGAGTCTGTCTCTCAAAAGACATATCGTGATTGGGAG
    ACCATTAATACTGCACTTGAAATTCATTATAACAACATATTGCCTGGTAACGGGAAGAGT
    AAAGCTGATAAGGTTAAAAAGGCCGTCAAAAACGACTTGCAAAAGTCTATTACCGAGATA
    AATGAATTAGTGTCAAACTACAAACTATGCTCAGATGATAATATTAAAGCGGAAACATAC
    ATCCACGAAATTTCCCACATACTGAATAACTTTGAAGCTCAGGAGCTTAAATATAACCCG
    GAAATACACTTGGTTGAGAGCGAGTTAAAAGCATCTGAGTTGAAAAATGTATTAGACGTC
    ATCATGAATGCGTTTCATTGGTGTTCAGTTTTCATGACTGAAGAATTAGTCGACAAAGAT
    AACAATTTTTATGCCGAATTAGAGGAAATATATGATGAAATTTATCCCGTAATTAGTTTA
    TACAATCTAGTTAGAAATTATGTTACACAAAAGCCGTATAGTACCAAGAAAATAAAGCTT
    AATTTCGGAATACCTACGCTTGCTGATGGTTGGTCAAAAAGTAAAGAATATAGCAATAAT
    GCAATAATTTTAATGAGAGATAACCTATATTATTTGGGTATTTTTAACGCTAAGAACAAA
    CCAGACAAGAAAATAATTGAAGGTAATACATCTGAAAACAAGGGCGACTATAAAAAGATG
    ATATACAATTTGCTCCCAGGTCCTAATAAAATGATTCCTAAGGTTTTCCTGAGTAGCAAG
    ACTGGCGTTGAAACTTACAAGCCTAGTGCGTATATCCTGGAGGGTTATAAACAGAACAAG
    CATATCAAATCCTCTAAGGACTTCGATATCACCTTTTGCCATGACTTAATCGATTATTTT
    AAAAATTGTATCGCAATTCATCCAGAATGGAAAAATTTCGGATTTGATTTTAGTGATACC
    AGCACTTACGAGGATATCTCTGGGTTCTACAGAGAAGTGGAGTTGCAGGGCTACAAAATC
    GATTGGACTTACATATCTGAAAAGGACATAGATTTGCTGCAGGAGAAAGGTCAGCTATAT
    TTGTTTCAAATCTACAACAAAGACTTTTCTAAAAAGTCTACCGGTAATGACAATCTGCAC
    ACAATGTACTTGAAGAACTTATTCTCCGAGGAGAACTTAAAGGACATTGTACTCAAGTTG
    AATGGAGAAGCCGAGATTTTTTTTAGAAAGAGCAGTATAAAGAATCCTATAATCCACAAG
    AAGGGCTCAATTCTCGTGAATAGGACGTATGAGGCAGAAGAAAAGGACCAATTTGGGAAT
    ATACAAATTGTAAGAAAAAACATCCCAGAAAATATCTACCAGGAATTATATAAGTATTTT
    AATGACAAATCTGATAAGGAACTGTCTGACGAAGCCGCTAAGCTCAAGAATGTTGTGGGC
    CACCATGAAGCTGCTACTAATATAGTGAAGGACTACAGATATACCTACGATAAATATTTC
    CTGCATATGCCAATTACTATAAACTTCAAAGCAAATAAAACAGGTTTTATAAATGATAGA
    ATCCTGCAGTATATTGCTAAAGAAAAGGATTTACATGTAATTGGGATTGATAGAGGTGAA
    CGCAATCTGATCTATGTCAGCGTAATAGATACTTGTGGTAATATTGTGGAACAAAAGTCC
    TTTAATATTGTGAACGGATATGATTACCAAATCAAGTTGAAACAACAAGAGGGAGCACGC
    CAAATTGCCCGTAAGGAATGGAAAGAGATAGGTAAGATCAAGGAAATTAAGGAAGGTTAT
    CTTTCATTAGTTATTCACGAAATTTCGAAGATGGTAATCAAATACAACGCAATAATTGCT
    ATGGAGGACCTGTCATATGGATTTAAGAAAGGTAGATTCAAGGTTGAGAGACAGGTATAC
    CAGAAATTTGAAACTATGTTGATCAACAAATTAAATTACTTAGTCTTTAAGGACATATCA
    ATAACGGAAAACGGCGGGCTTTTAAAAGGGTATCAACTTACATACATACCTGATAAGTTG
    AAAAATGTGGGTCATCAGTGTGGGTGCATCTTTTATGTTCCAGCCGCTTACACATCAAAA
    ATCGATCCTACTACTGGGTTCGTAAACATATTTAAATTTAAAGATCTAACCGTTGATGCA
    AAAAGAGAGTTTATCAAGAAATTTGATAGCATTAGGTACGATTCAGAAAAAAATCTATTC
    TGTTTTACTTTTGACTACAACAACTTTATAACGCAGAATACAGTGATGTCAAAATCGTCC
    TGGTCAGTGTATACTTATGGTGTTAGAATTAAGAGACGTTTCGTAAACGGTCGTTTTTCT
    AACGAGTCCGATACAATCGACATCACTAAAGATATGGAAAAAACTTTGGAAATGACAGAT
    ATAAACTGGAGAGATGGTCACGACCTTAGACAAGATATAATCGATTATGAAATCGTACAG
    CATATTTTTGAAATTTTTCGCTTAACAGTTCAGATGCGTAACTCTCTTAGTGAGCTAGAA
    GATAGAGATTATGATAGACTTATCTCGCCTGTTCTTAACGAAAATAATATCTTCTATGAC
    TCGGCAAAAGCCGGTGATGCACTTCCAAAAGATGCTGATGCAAATGGCGCGTACTGCATC
    GCATTGAAGGGGCTCTACGAGATTAAACAAATCACCGAAAACTGGAAAGAAGATGGTAAA
    TTTTCTAGGGATAAGTTGAAAATCAGTAATAAAGATTGGTTCGATTTTATACAAAATAAG
    CGATACTTATAG
    SEQ ID NO: 134
    ATGACCAATAAGTTTACTAATCAATACTCATTGTCTAAAACGTTAAGATTCGAGTTAATT
    CCCCAGGGAAAGACACTAGAATTTATTCAAGAAAAAGGTCTTCTCTCTCAGGATAAACAA
    AGAGCAGAATCATACCAGGAGATGAAAAAAACCATAGATAAATTTCATAAGTACTTCATC
    GACTTGGCACTATCGAACGCCAAGCTAACACATTTGGAAACCTACCTGGAGTTGTATAAT
    AAATCGGCAGAGACGAAAAAGGAACAAAAATTCAAGGATGACCTGAAGAAGGTTCAAGAT
    AATCTGCGAAAGGAAATAGTGAAGTCGTTTAGTGATGGTGATGCAAAGTCAATCTTTGCT
    ATTTTAGACAAGAAGGAATTAATAACCGTGGAACTTGAAAAGTGGTTTGAAAATAACGAA
    CAGAAAGATATTTACTTCGACGAAAAATTTAAAACGTTTACTACGTACTTTACAGGGTTC
    CATCAGAACCGCAAAAACATGTACTCCGTTGAACCAAACTCTACTGCAATCGCCTACAGA
    TTAATACACGAAAATTTGCCTAAGTTTTTAGAAAATGCAAAGGCTTTTGAAAAGATAAAG
    CAAGTCGAATCGTTACAGGTAAACTTTCGCGAATTAATGGGCGAATTTGGAGATGAAGGT
    CTTATTTTTGTCAATGAATTAGAGGAAATGTTTCAAATTAATTATTATAACGATGTCTTG
    AGTCAGAACGGCATTACTATCTACAACTCAATTATCAGTGGTTTCACTAAGAATGATATA
    AAATATAAAGGTTTGAATGAATACATTAATAATTATAATCAAACTAAAGATAAGAAGGAC
    AGGCTTCCGAAATTGAAGCAATTGTACAAGCAGATTCTAAGTGATAGGATTAGTTTGTCT
    TTCTTGCCAGACGCATTTACTGATGGCAAGCAAGTCTTAAAGGCTATATTCGATTTCTAC
    AAGATTAACCTACTTTCGTACACAATTGAAGGTCAAGAAGAATCTCAAAATCTGCTGCTT
    TTGATTAGGCAAACTATAGAAAATTTGTCGTCCTTTGACACTCAAAAAATTTACCTGAAG
    AATGATACACACCTGACTACAATATCACAGCAGGTCTTTGGGGATTTTTCTGTCTTCTCC
    ACGGCCCTAAACTATTGGTATGAGACAAAAGTTAATCCAAAATTTGAAACAGAATATAGT
    AAGGCGAATGAAAAAAAGAGAGAAATTTTGGATAAAGCGAAGGCAGTATTCACAAAACAA
    GACTATTTTTCTATCGCATTTCTCCAAGAAGTCTTATCCGAATATATTTTGACACTCGAT
    CACACCTCTGATATAGTTAAGAAACATTCGTCCAACTGCATCGCAGATTACTTCAAGAAT
    CACTTCGTGGCTAAGAAAGAAAACGAAACGGATAAAACTTTTGACTTCATTGCTAACATA
    ACCGCTAAATACCAATGTATTCAGGGCATATTAGAAAATGCAGACCAGTACGAAGACGAG
    TTAAAACAGGACCAAAAGTTAATAGATAATCTAAAGTTTTTCTTAGATGCTATACTTGAG
    TTATTACATTTTATAAAGCCATTGCATCTAAAATCGGAAAGTATTACTGAAAAAGACACT
    GCGTTCTATGATGTGTTCGAAAATTATTATGAGGCTTTATCTTTATTGACCCCCCTTTAC
    AACATGGTCCGCAATTATGTTACTCAGAAGCCTTACTCTACTGAAAAGATCAAATTAAAC
    TTTGAAAATGCTCAGTTGCTGAATGGTTGGGATGCCAATAAGGAAGGTGACTACCTGACG
    ACTATTCTAAAAAAAGACGGTAATTATTTCTTAGCAATCATGGATAAAAAACATAACAAG
    GCATTTCAAAAATTTCCAGAAGGAAAAGAAAACTATGAAAAGATGGTTTATAAATTGTTG
    CCTGGAGTTAATAAAATGTTGCCAAAAGTTTTTTTTAGCAATAAGAACATAGCTTACTTT
    AATCCATCTAAGGAACTGCTCGAGAACTACAAGAAGGAAACACATAAAAAAGGTGATACA
    TTTAATTTGGAACATTGCCATACTCTGATTGATTTTTTTAAGGACTCTCTTAATAAACAT
    GAAGACTGGAAATATTTTGATTTTCAATTTTCGGAAACTAAATCATACCAAGATCTAAGT
    GGATTTTACAGAGAAGTTGAACACCAAGGTTATAAGATTAACTTCAAGAATATAGATTCT
    GAATACATTGATGGTCTTGTAAACGAGGGTAAACTATTCCTGTTCCAAATCTACTCTAAG
    GACTTCTCACCTTTTTCCAAAGGAAAACCTAATATGCATACGTTGTACTGGAAGGCTCTA
    TTTGAAGAACAAAATTTGCAAAATGTAATCTACAAACTGAACGGCCAAGCTGAAATATTC
    TTCAGAAAAGCCTCAATTAAGCCAAAAAACATTATTCTTCATAAAAAGAAGATCAAGATT
    GCGAAGAAACATTTTATTGATAAGAAGACCAAGACTTCCGAAATTGTACCAGTACAAACA
    ATCAAGAATCTCAATATGTATTATCAAGGCAAGATAAGTGAGAAAGAGTTAACCCAGGAT
    GATTTACGTTATATAGACAATTTCTCTATATTCAACGAGAAGAACAAAACAATAGACATT
    ATCAAAGATAAAAGGTTTACTGTTGACAAATTTCAATTTCATGTGCCTATCACAATGAAC
    TTTAAGGCCACAGGTGGTTCGTACATTAATCAAACTGTTTTAGAATATCTGCAAAATAAC
    CCAGAGGTCAAGATCATCGGTCTTGATAGGGGTGAGAGACATCTGGTGTATCTAACACTC
    ATTGATCAACAAGGCAACATCTTGAAGCAAGAATCATTGAACACTATCACAGACTCCAAG
    ATCTCGACTCCATATCACAAACTCCTTGACAATAAAGAAAACGAAAGGGATCTTGCCAGA
    AAAAATTGGGGTACAGTTGAAAATATTAAGGAACTAAAAGAAGGTTACATTTCGCAAGTA
    GTTCACAAGATTGCAACACTCATGTTGGAAGAAAACGCAATCGTTGTCATGGAAGATTTA
    AATTTCGGATTTAAGAGAGGAAGATTTAAAGTAGAAAAGCAAATCTACCAGAAGTTGGAG
    AAGATGTTAATTGACAAATTGAACTACTTAGTGCTGAAAGACAAACAGCCTCAAGAATTG
    GGCGGTCTATACAACGCTTTACAACTGACAAATAAATTTGAGTCATTCCAAAAGATGGGT
    AAGCAGAGTGGTTTTTTGTTTTATGTTCCGGCATGGAACACATCCAAAATCGATCCAACT
    ACAGGCTTCGTGAATTATTTCTACACTAAATATGAAAATGTGGATAAAGCAAAAGCTTTC
    TTTGAGAAGTTCGAGGCGATCCGTTTTAACGCTGAAAAGAAGTACTTCGAGTTCGAGGTC
    AAAAAGTATTCAGATTTTAACCCCAAGGCTGAAGGCACCCAGCAAGCATGGACTATTTGC
    ACGTACGGTGAGCGAATCGAAACTAAAAGGCAAAAGGATCAAAATAATAAGTTTGTAAGC
    ACACCCATTAACTTGACAGAAAAGATAGAAGATTTTCTTGGAAAAAACCAAATTGTATAT
    GGTGACGGTAACTGTATCAAGTCACAAATTGCTTCTAAAGACGATAAGGCCTTCTTCGAA
    ACTCTGCTATACTGGTTTAAAATGACGTTGCAAATGAGAAACAGTGAAACTAGAACTGAT
    ATCGACTATTTAATATCACCCGTGATGAACGATAATGGTACCTTTTACAATTCAAGAGAT
    TACGAGAAATTGGAGAACCCCACACTACCAAAAGACGCAGACGCTAATGGTGCCTACCAT
    ATTGCTAAAAAGGGACTGATGTTGTTGAACAAGATAGATCAAGCCGACTTAACTAAAAAA
    GTTGATTTGTCAATTTCGAATAGAGATTGGTTGCAATTCGTCCAGAAAAATAAGTAA
    SEQ ID NO: 135
    ATGGAACAGGAATACTACTTGGGTTTGGATATGGGAACTGGTTCAGTCGGTTGGGCTGTT
    ACGGACTCCGAGTACCACGTGTTGAGAAAACACGGAAAGGCTTTATGGGGTGTCAGACTA
    TTCGAATCAGCATCGACCGCGGAAGAGAGAAGAATGTTTAGAACTTCAAGAAGAAGGCTG
    GATCGTAGGAATTGGCGGATAGAAATTTTACAAGAAATATTCGCCGAAGAAATCTCTAAA
    AAAGATCCAGGATTTTTTCTACGTATGAAGGAATCCAAATACTATCCGGAAGATAAACGT
    GATATTAATGGCAATTGTCCAGAGTTACCCTATGCTTTATTTGTGGACGACGATTTCACC
    GATAAAGATTACCATAAGAAGTTCCCAACAATTTACCATCTGAGAAAGATGTTAATGAAC
    ACTGAAGAAACCCCGGATATAAGACTGGTCTATCTAGCCATTCATCATATGATGAAACAC
    AGGGGACACTTCTTGCTATCAGGGGATATAAATGAAATTAAAGAATTTGGTACAACATTT
    TCTAAATTATTGGAAAATATTAAAAACGAAGAATTAGATTGGAATTTAGAATTAGGCAAG
    GAGGAATACGCAGTTGTCGAATCGATTCTGAAAGATAACATGTTGAACAGATCAACGAAA
    AAAACAAGGCTGATCAAGGCTTTAAAAGCGAAATCAATATGCGAAAAAGCAGTATTGAAT
    TTGTTAGCTGGGGGGACTGTCAAGTTGTCTGATATTTTCGGATTGGAAGAATTGAATGAA
    ACAGAGAGACCGAAGATATCCTTCGCCGATAATGGCTACGATGATTATATAGGCGAAGTC
    GAAAATGAGCTGGGCGAACAATTCTACATTATCGAGACTGCCAAGGCTGTTTATGATTGG
    GCGGTGTTAGTCGAAATCCTTGGCAAATACACTTCCATCTCCGAAGCTAAGGTGGCAACC
    TACGAAAAGCATAAAAGTGATTTGCAATTCCTTAAGAAAATTGTCCGAAAGTACTTGACC
    AAAGAAGAGTACAAGGATATTTTCGTATCAACATCGGACAAACTGAAGAATTATTCAGCT
    TATATTGGCATGACGAAAATTAATGGTAAGAAAGTTGATTTGCAATCCAAGAGATGTTCT
    AAAGAAGAATTTTACGATTTCATTAAAAAAAATGTCCTAAAAAAGTTGGAGGGACAACCT
    GAATATGAGTATTTAAAGGAAGAACTGGAAAGAGAAACTTTCCTACCAAAGCAAGTTAAT
    CGTGATAATGGCGTTATTCCATACCAAATACACTTGTACGAATTAAAGAAGATCTTGGGT
    AACTTGAGGGACAAAATTGATTTAATCAAGGAAAATGAAGACAAACTGGTACAATTATTT
    GAATTTAGAATACCTTACTACGTGGGCCCTTTAAACAAAATAGACGATGGTAAGGAAGGG
    AAGTTCACATGGGCAGTCAGAAAGTCCAATGAAAAAATTTACCCATGGAATTTCGAAAAC
    GTTGTAGATATTGAAGCTTCTGCTGAGAAATTTATTAGGAGAATGACAAATAAATGCACT
    TATCTTATGGGGGAAGACGTGTTGCCTAAAGATAGTTTATTATATTCAAAGTATATGGTC
    TTAAATGAATTAAACAATGTTAAATTAGATGGTGAAAAACTTTCCGTCGAATTGAAACAA
    AGATTGTATACAGATGTATTCTGCAAATATAGAAAAGTAACTGTAAAGAAGATTAAAAAC
    TACCTTAAATGTGAAGGCATTATCAGCGGAAATGTTGAGATCACTGGTATCGATGGTGAT
    TTTAAGGCATCTTTAACCGCATATCACGACTTTAAGGAAATATTGACGGGTACTGAGCTT
    GCTAAAAAAGACAAAGAGAACATTATCACCAATATCGTGCTCTTCGGAGACGACAAGAAA
    TTATTGAAAAAGAGATTGAACCGCCTATACCCTCAGATTACCCCTAACCAATTGAAGAAA
    ATCTGCGCTCTGTCTTATACTGGATGGGGTCGTTTTAGCAAGAAGTTTCTAGAAGAAATT
    ACTGCTCCGGATCCTGAAACTGGGGAAGTCTGGAATATAATTACCGCGCTATGGGAATCG
    AATAATAATTTAATGCAATTACTATCTAATGAATACAGATTTATGGAAGAAGTCGAAACT
    TACAATATGGGAAAACAAACAAAAACTTTGAGCTACGAAACAGTAGAGAATATGTATGTC
    TCACCATCTGTAAAGCGGCAGATCTGGCAAACCTTGAAGATAGTTAAAGAATTAGAAAAA
    GTGATGAAGGAAAGTCCAAAAAGGGTTTTTATTGAAATGGCCCGAGAAAAACAAGAATCT
    AAAAGGACGGAAAGTAGGAAAAAGCAACTTATAGATCTATATAAAGCCTGCAAAAATGAA
    GAAAAAGATTGGGTAAAGGAATTAGGTGACCAGGAAGAGCAAAAATTGAGATCTGACAAG
    CTGTACTTGTATTATACGCAAAAGGGCCGGTGTATGTATTCGGGTGAGGTAATAGAATTG
    AAAGATTTATGGGATAACACTAAGTATGACATTGACCATATTTACCCCCAGTCTAAGACA
    ATGGACGATTCATTAAATAACCGAGTTCTTGTCAAAAAGAAGTACAATGCCACAAAGAGC
    GATAAGTACCCATTGAACGAAAATATAAGACATGAACGAAAAGGTTTCTGGAAATCATTG
    TTGGACGGTGGATTTATTTCCAAAGAAAAATACGAGAGATTGATTAGAAACACTGAACTA
    TCTCCAGAGGAGTTAGCTGGCTTTATCGAAAGACAAATTGTTGAAACTAGACAGTCTACA
    AAAGCAGTTGCAGAAATCTTAAAACAAGTATTTCCAGAATCCGAAATTGTGTACGTCAAA
    GCCGGAACAGTAAGTAGATTTAGAAAAGACTTTGAATTATTGAAAGTACGAGAGGTTAAC
    GACCTACATCATGCTAAGGATGCTTATTTAAATATAGTCGTTGGTAATTCGTATTACGTG
    AAATTCACAAAAAACGCATCTTGGTTCATCAAGGAGAATCCTGGTAGGACATACAACTTG
    AAAAAGATGTTTACATCAGGATGGAATATCGAAAGAAATGGTGAGGTTGCGTGGGAGGTA
    GGCAAGAAGGGAACCATTGTTACTGTAAAGCAAATTATGAATAAAAACAATATACTTGTT
    ACGAGACAGGTGCACGAAGCCAAAGGAGGGTTGTTTGACCAGCAAATCATGAAGAAAGGT
    AAAGGTCAGATAGCAATAAAAGAGACTGATGAGCGTTTAGCTAGTATAGAAAAATATGGG
    GGCTACAATAAGGCAGCTGGTGCTTACTTCATGTTGGTCGAATCAAAGGATAAAAAAGGG
    AAGACGATCCGGACCATAGAGTTTATCCCTCTGTACTTGAAGAATAAGATTGAGTCTGAC
    GAAAGCATCGCATTGAATTTCTTGGAAAAGGGGCGCGGTCTAAAGGAGCCAAAAATATTG
    TTAAAGAAAATTAAAATAGACACCCTATTCGACGTCGATGGGTTTAAGATGTGGCTTAGT
    GGTCGTACTGGGGACAGATTATTATTCAAGTGTGCCAATCAGTTAATCCTTGACGAGAAA
    ATCATTGTTACAATGAAAAAAATTGTTAAGTTTATTCAAAGGCGACAAGAAAATAGAGAA
    CTAAAGTTGAGTGATAAGGATGGAATCGATAATGAAGTGTTAATGGAGATTTATAACACT
    TTTGTCGACAAATTGGAGAATACGGTGTACAGAATTAGGCTATCTGAACAGGCTAAAACC
    CTAATTGATAAACAGAAGGAGTTTGAGCGACTTTCTCTTGAAGACAAATCTTCAACTCTT
    TTCGAGATCCTACATATCTTTCAGTGTCAATCTTCTGCAGCTAATTTGAAAATGATTGGA
    GGTCCTGGTAAGGCTGGTATATTAGTCATGAACAACAACATATCTAAGTGTAATAAGATT
    AGTATAATTAACCAATCACCGACAGGTATCTTTGAAAATGAAATTGATTTACTTAAA
    SEQ ID NO: 136
    ATGAAATCATTCGACTCGTTCACCAACTTGTACTCCCTGTCTAAAACATTGAAATTTGAA
    ATGCGACCTGTTGGTAACACCCAAAAGATGTTAGATAATGCAGGAGTTTTCGAAAAGGAT
    AAACTGATCCAGAAAAAATACGGTAAAACGAAACCATATTTCGATAGGTTGCATCGGGAA
    TTTATAGAAGAAGCTTTGACTGGTGTAGAATTAATTGGCTTAGATGAGAATTTCCGTACT
    CTAGTCGATTGGCAAAAAGATAAAAAGAACAATGTTGCCATGAAGGCATACGAAAATAGT
    CTACAAAGACTAAGAACAGAGATCGGGAAAATTTTCAATTTGAAGGCAGAAGACTGGGTG
    AAGAACAAATATCCAATATTGGGTCTTAAGAATAAGAATACTGATATATTGTTCGAGGAG
    GCCGTTTTCGGTATTCTTAAGGCAAGATATGGTGAAGAGAAAGACACGTTTATTGAAGTT
    GAGGAGATTGATAAAACCGGTAAGTCCAAAATCAACCAGATCTCTATCTTCGACAGTTGG
    AAGGGCTTCACTGGTTATTTTAAGAAGTTCTTCGAAACTAGGAAGAACTTCTATAAAAAC
    GATGGTACTTCCACGGCTATTGCTACAAGAATTATCGACCAAAACCTTAAGCGTTTTATT
    GATAACCTATCAATTGTTGAAAGTGTTCGACAGAAAGTAGATTTGGCTGAAACTGAAAAA
    TCTTTTAGTATCTCCTTATCCCAGTTTTTCTCTATAGATTTTTATAATAAATGTTTGCTG
    CAAGATGGCATTGACTACTATAATAAAATAATTGGTGGAGAGACATTGAAAAACGGAGAG
    AAGCTGATTGGCCTTAATGAGTTGATAAATCAATATAGACAAAATAATAAGGACCAGAAA
    ATCCCTTTCTTTAAATTGCTAGACAAACAGATTTTGTCTGAAAAGATCCTATTCTTGGAT
    GAAATAAAGAACGATACTGAATTGATTGAAGCTTTGTCCCAGTTTGCTAAAACAGCTGAA
    GAAAAGACAAAGATTGTGAAAAAATTGTTTGCTGATTTCGTAGAAAACAATTCTAAATAT
    GATCTAGCCCAGATTTATATAAGTCAAGAAGCTTTCAATACAATAAGTAATAAGTGGACA
    AGTGAAACAGAAACTTTTGCTAAGTATTTATTCGAAGCCATGAAGTCTGGTAAACTTGCC
    AAATACGAAAAAAAAGATAACAGTTATAAATTTCCAGACTTTATAGCCCTTTCACAGATG
    AAGTCTGCCTTATTGTCGATATCCTTAGAAGGTCATTTTTGGAAGGAAAAATATTATAAG
    ATAAGCAAGTTCCAAGAAAAGACTAATTGGGAACAATTTTTGGCTATATTTCTATATGAG
    TTCAATTCATTATTTTCCGATAAAATCAACACTAAGGATGGAGAGACTAAGCAAGTTGGC
    TACTATTTGTTCGCAAAAGATCTGCACAATTTGATTCTATCAGAACAAATAGATATACCA
    AAAGATTCAAAGGTAACTATAAAGGATTTCGCAGATTCCGTCCTCACCATTTATCAAATG
    GCTAAATATTTTGCCGTTGAAAAAAAGAGAGCGTGGTTAGCAGAATACGAGTTGGACTCG
    TTTTATACTCAGCCAGATACTGGATACTTGCAATTCTACGATAATGCATACGAAGACATT
    GTACAGGTATACAATAAACTTAGAAATTACTTAACCAAGAAGCCCTACAGTGAAGAAAAA
    TGGAAGCTGAACTTTGAAAATTCGACTTTGGCAAATGGTTGGGATAAAAATAAAGAAAGT
    GACAACTCCGCAGTGATTTTGCAAAAGGGTGGGAAATATTACTTGGGTTTAATCACAAAA
    GGCCACAATAAGATTTTTGATGATAGATTTCAAGAAAAATTCATAGTTGGTATAGAAGGT
    GGCAAATACGAGAAAATTGTCTATAAATTCTTCCCTGATCAAGCCAAAATGTTCCCAAAA
    GTTTGCTTTTCTGCTAAAGGATTGGAGTTTTTCCGGCCTAGCGAGGAGATCCTTCGTATC
    TACAACAATGCTGAATTCAAAAAAGGAGAAACCTATAGCATAGATTCTATGCAAAAACTG
    ATAGATTTTTATAAGGATTGTTTAACAAAGTACGAAGGCTGGGCCTGCTATACATTTAGA
    CATTTAAAGCCCACAGAAGAATACCAAAATAACATTGGTGAATTCTTTCGGGACGTTGCC
    GAAGACGGCTATAGGATCGATTTTCAAGGTATCTCAGATCAATATATCCACGAAAAGAAC
    GAGAAGGGTGAGCTGCACCTTTTCGAAATTCATAATAAGGACTGGAATTTGGATAAGGCG
    AGAGATGGTAAATCGAAGACCACTCAAAAGAACTTGCATACTTTATATTTTGAGTCCTTG
    TTTTCTAATGATAACGTCGTCCAAAATTTTCCAATAAAGTTGAATGGACAAGCGGAAATT
    TTCTATCGGCCTAAGACAGAGAAAGACAAATTAGAATCAAAGAAAGATAAAAAGGGAAAT
    AAAGTCATTGATCACAAACGATACTCTGAGAATAAAATATTTTTCCACGTACCATTGACA
    CTCAACAGGACTAAGAATGACTCTTATAGATTTAATGCTCAGATTAATAATTTTTTGGCA
    AATAACAAGGATATTAACATAATTGGGGTGGATAGAGGTGAAAAGCACTTGGTATATTAC
    TCTGTCATCACTCAGGCTTCTGATATATTGGAAAGCGGGTCTCTAAATGAATTGAACGGT
    GTTAACTACGCCGAAAAGCTAGGTAAAAAAGCTGAAAACAGAGAGCAGGCTCGGCGCGAT
    TGGCAAGATGTTCAAGGAATTAAAGACCTTAAAAAAGGCTACATTAGTCAAGTAGTTAGA
    AAGTTAGCCGATCTTGCTATTAAACATAACGCAATCATTATTCTGGAGGACCTAAATATG
    CGTTTTAAGCAAGTTAGGGGTGGCATAGAAAAAAGTATTTATCAGCAGCTTGAGAAGGCT
    TTGATAGATAAGTTATCGTTCCTAGTTGACAAAGGTGAAAAAAATCCTGAACAAGCTGGT
    CATCTGTTGAAAGCTTATCAGCTGAGCGCACCTTTTGAAACATTTCAAAAAATGGGAAAA
    CAAACAGGTATTATTTTCTATACTCAAGCGAGTTATACAAGTAAATCTGACCCAGTGACA
    GGATGGAGACCACACCTTTATCTAAAATATTTTTCTGCTAAAAAGGCCAAAGATGACATC
    GCTAAGTTTACAAAAATAGAATTTGTCAACGATAGATTTGAATTGACTTACGATATTAAA
    GATTTTCAGCAAGCAAAAGAATACCCAAATAAGACAGTGTGGAAAGTATGCTCCAATGTG
    GAGAGATTTAGATGGGATAAAAATCTCAATCAAAACAAGGGTGGTTACACACATTATACT
    AATATAACTGAAAATATTCAAGAATTGTTTACTAAGTACGGAATTGACATAACCAAAGAC
    TTACTAACTCAGATTTCAACTATTGACGAAAAACAAAATACCTCATTTTTCCGCGACTTT
    ATTTTTTATTTCAACTTGATCTGTCAAATTCGTAACACGGATGATTCCGAAATTGCCAAG
    AAGAACGGAAAAGATGATTTCATCCTATCTCCAGTGGAACCATTTTTTGACTCAAGAAAA
    GATAATGGTAATAAGTTGCCTGAGAACGGAGATGATAACGGCGCTTATAATATCGCTCGG
    AAGGGTATTGTAATTCTTAATAAAATATCTCAGTACTCTGAAAAGAACGAAAACTGCGAG
    AAAATGAAGTGGGGCGACTTGTATGTATCTAATATAGATTGGGATAATTTCGTTACTCAA
    GCCAACGCGAGACATTGA
    SEQ ID NO: 137
    ATGGAAAATTTTAAAAACCTATATCCAATTAATAAGACACTTAGATTCGAGCTTAGGCCA
    TACGGCAAAACACTAGAAAATTTTAAGAAGTCAGGCCTATTAGAAAAAGACGCCTTTAAG
    GCAAATTCCAGAAGATCAATGCAGGCAATTATTGATGAGAAATTTAAAGAGACTATCGAG
    GAAAGGTTGAAATACACTGAATTCTCTGAGTGCGATCTGGGAAACATGACTTCCAAGGAT
    AAAAAGATTACCGATAAGGCTGCTACCAACCTCAAAAAGCAAGTCATCTTATCGTTTGAT
    GATGAAATTTTTAATAACTACTTAAAGCCGGACAAAAACATTGACGCCCTATTCAAAAAT
    GATCCGTCCAACCCCGTAATTTCAACTTTTAAGGGTTTTACCACGTACTTTGTAAATTTT
    TTTGAGATTCGTAAACATATCTTCAAAGGAGAATCGTCGGGTTCCATGGCCTATAGGATA
    ATTGATGAAAATCTTACGACTTACTTAAACAATATCGAAAAGATAAAAAAGTTACCAGAA
    GAATTAAAGTCTCAATTGGAAGGTATTGACCAAATAGACAAATTAAATAACTATAATGAG
    TTCATAACTCAAAGCGGTATCACACATTACAATGAAATTATCGGTGGTATATCTAAAAGT
    GAGAACGTAAAAATACAGGGAATAAACGAGGGGATCAATCTATACTGTCAGAAGAATAAA
    GTAAAATTACCAAGACTAACGCCATTATACAAAATGATTCTGTCTGATAGAGTTTCCAAC
    TCGTTCGTGCTTGATACTATAGAAAATGATACTGAATTAATTGAGATGATTAGCGACTTG
    ATTAATAAAACAGAAATATCTCAAGACGTAATAATGTCAGACATTCAGAACATTTTCATA
    AAATATAAACAGCTTGGTAATTTACCGGGGATAAGTTACTCTAGCATCGTGAATGCTATT
    TGCTCCGATTATGACAATAATTTTGGTGACGGAAAAAGAAAAAAATCATATGAGAACGAT
    AGGAAGAAACACCTTGAAACAAACGTATACTCAATTAACTATATATCGGAACTGTTAACA
    GACACCGATGTATCATCTAATATAAAAATGAGATATAAGGAACTTGAACAAAATTACCAG
    GTGTGTAAGGAGAATTTCAATGCTACCAACTGGATGAACATTAAGAATATTAAACAGAGT
    GAAAAGACAAACTTGATTAAAGATCTACTAGATATACTGAAATCAATACAGAGATTCTAC
    GATCTGTTTGATATAGTTGATGAAGACAAAAATCCTAGTGCTGAGTTTTACACGTGGCTA
    AGTAAAAATGCGGAAAAGTTAGATTTCGAGTTCAACTCTGTTTATAATAAATCTAGGAAT
    TATTTAACTAGAAAGCAGTATTCTGATAAAAAGATAAAATTGAACTTCGACTCCCCTACG
    TTGGCAAAGGGTTGGGATGCAAACAAAGAAATCGATAACTCCACCATAATAATGCGTAAG
    TTTAACAATGATAGGGGGGATTACGATTATTTTTTGGGAATTTGGAACAAATCTACCCCA
    GCGAATGAAAAAATTATTCCCCTTGAAGACAATGGTCTTTTTGAAAAAATGCAGTATAAA
    TTATATCCAGACCCATCCAAGATGCTTCCAAAGCAATTTCTGTCAAAAATTTGGAAGGCT
    AAACACCCTACTACTCCTGAATTTGATAAGAAGTATAAGGAGGGCCGACACAAAAAGGGT
    CCAGATTTTGAAAAAGAATTCCTGCATGAATTGATAGATTGTTTTAAGCATGGTTTGGTA
    AATCATGATGAAAAATATCAGGATGTCTTTGGATTCAATTTGAGAAATACAGAGGATTAC
    AACTCATATACAGAATTTCTCGAGGACGTCGAACGTTGCAATTATAATCTCAGTTTCAAC
    AAGATCGCAGACACTTCAAACTTAATTAACGACGGAAAATTGTACGTTTTTCAAATCTGG
    TCGAAAGACTTTAGTATTGATTCAAAGGGTACAAAAAACCTAAATACAATATATTTCGAA
    AGTCTATTCTCGGAAGAAAACATGATCGAAAAAATGTTCAAACTGTCAGGCGAAGCTGAA
    ATATTCTACCGTCCCGCAAGCCTTAATTATTGTGAGGATATCATTAAAAAAGGACATCAC
    CATGCAGAGTTAAAAGATAAATTCGATTACCCAATAATTAAAGATAAAAGATACTCCCAG
    GATAAGTTCTTTTTCCATGTACCTATGGTTATTAACTACAAGTCGGAAAAACTAAACTCG
    AAGTCATTAAATAATAGAACTAACGAGAACTTGGGACAATTCACACATATAATTGGTATT
    GATCGTGGCGAAAGACATTTAATATATCTGACTGTTGTTGATGTTTCAACAGGAGAAATT
    GTTGAACAGAAACATCTTGATGAAATTATAAACACAGATACAAAAGGCGTTGAGCATAAA
    ACTCATTATCTAAATAAATTGGAGGAAAAGTCGAAGACTCGCGATAACGAGAGAAAGAGT
    TGGGAAGCAATTGAAACCATAAAAGAGCTTAAAGAAGGTTACATTAGTCACGTCATCAAT
    GAAATACAAAAGTTACAAGAAAAGTATAACGCTTTGATTGTAATGGAAAATCTAAATTAT
    GGTTTTAAGAATTCAAGAATCAAAGTCGAAAAGCAGGTCTATCAGAAATTTGAAACGGCA
    CTTATTAAAAAGTTTAACTACATTATTGATAAAAAGGACCCAGAAACTTATATTCATGGT
    TACCAACTGACGAACCCAATCACAACATTGGACAAAATTGGAAACCAAAGTGGAATTGTT
    TTATACATTCCAGCTTGGAATACATCCAAAATAGACCCTGTCACGGGGTTTGTCAACTTG
    TTATATGCCGACGATTTAAAGTATAAAAACCAAGAACAAGCAAAGTCTTTTATTCAAAAG
    ATTGATAATATTTATTTCGAAAACGGTGAATTTAAATTCGACATAGATTTTTCTAAATGG
    AACAACCGTTATTCAATAAGTAAAACTAAATGGACACTCACCTCATACGGCACTCGTATC
    CAAACCTTTCGGAATCCCCAAAAAAATAACAAATGGGATTCTGCAGAATACGACTTGACC
    GAGGAATTTAAATTAATTCTTAATATAGACGGTACACTCAAAAGTCAAGACGTGGAGACA
    TACAAGAAGTTTATGTCGTTATTCAAGCTTATGCTTCAGTTGAGGAACTCCGTTACAGGC
    ACTGATATTGATTACATGATTTCACCAGTAACGGATAAGACTGGGACTCATTTCGATTCT
    AGGGAAAATATTAAAAATTTACCTGCTGACGCAGACGCAAACGGCGCATACAATATAGCA
    AGAAAAGGGATTATGGCCATTGAGAATATTATGAATGGCATATCAGATCCATTAAAGATA
    AGCAATGAAGACTACTTAAAATACATTCAGAATCAGCAAGAATAA
    SEQ ID NO: 138
    ATGACCCAGTTTGAAGGTTTCACCAATTTGTACCAAGTAAGTAAAACCTTGAGGTTCGAA
    TTGATCCCACAGGGCAAGACATTGAAGCATATTCAAGAGCAAGGATTTATAGAAGAAGAT
    AAAGCGAGAAACGATCACTATAAAGAGTTAAAACCCATTATTGACAGGATCTATAAAACA
    TACGCCGATCAATGCCTTCAATTAGTGCAATTAGATTGGGAAAACTTGAGCGCTGCCATC
    GATTCCTACAGGAAGGAAAAAACAGAAGAAACAAGAAATGCCTTAATCGAGGAACAAGCA
    ACCTATAGAAACGCTATACACGATTACTTCATCGGTAGAACTGATAATCTAACAGATGCA
    ATAAATAAGAGACATGCTGAGATATATAAAGGACTATTTAAAGCAGAATTATTCAACGGA
    AAGGTGTTGAAACAGTTAGGTACCGTTACAACTACTGAGCATGAAAATGCCTTGCTGAGA
    AGCTTTGACAAGTTTACTACCTACTTTTCGGGTTTCTACGAAAATCGCAAAAATGTATTT
    TCTGCGGAAGATATTTCAACTGCAATCCCTCATAGGATTGTTCAAGATAATTTCCCTAAG
    TTTAAAGAGAACTGTCACATTTTTACAAGGTTAATTACTGCGGTTCCAAGTCTAAGAGAA
    CATTTTGAGAATGTAAAAAAAGCGATTGGTATATTTGTATCCACTAGCATTGAAGAGGTT
    TTCAGCTTCCCTTTTTATAACCAATTACTTACCCAAACACAGATCGACCTGTACAACCAA
    TTGTTAGGTGGTATATCGAGGGAGGCTGGTACGGAAAAGATTAAAGGATTAAATGAAGTT
    CTTAATTTGGCCATACAAAAAAATGATGAAACCGCGCACATTATCGCATCTTTACCACAT
    AGGTTTATACCGTTATTCAAGCAAATATTATCTGATCGTAATACCTTATCGTTCATATTA
    GAGGAGTTTAAATCTGACGAAGAAGTTATACAATCTTTTTGCAAGTATAAGACGCTATTG
    AGAAACGAAAACGTTCTGGAAACAGCCGAAGCACTGTTCAATGAATTAAACAGTATCGAC
    TTGACTCATATTTTTATATCGCATAAAAAGTTGGAGACAATTTCTTCAGCATTGTGCGAT
    CACTGGGACACTTTAAGGAACGCACTATATGAACGTAGGATCTCAGAATTGACAGGTAAG
    ATAACGAAGTCTGCTAAAGAGAAAGTGCAGAGATCCCTAAAACACGAGGATATAAATTTG
    CAGGAGATAATTTCAGCTGCAGGTAAAGAGTTGTCTGAAGCGTTCAAGCAAAAGACTTCC
    GAAATCTTGTCACACGCACACGCCGCATTAGATCAACCTTTACCCACTACTTTGAAAAAA
    CAAGAAGAGAAGGAGATATTAAAATCACAACTTGATTCTTTACTTGGCCTTTATCATCTT
    TTAGATTGGTTCGCTGTTGACGAGAGCAATGAAGTGGATCCAGAGTTTTCCGCAAGATTG
    ACCGGTATAAAGTTGGAAATGGAACCTTCGTTATCATTTTACAACAAAGCTAGGAACTAT
    GCTACAAAAAAACCTTATTCTGTCGAAAAATTTAAACTGAACTTCCAAATGCCTACTCTA
    GCAAGTGGCTGGGATGTTAATAAAGAAAAGAACAATGGCGCTATTTTGTTTGTAAAAAAT
    GGCCTATACTATCTTGGAATTATGCCTAAACAAAAAGGTCGCTACAAGGCTTTGTCATTT
    GAACCTACTGAAAAGACTAGCGAAGGTTTCGATAAGATGTATTACGATTATTTCCCGGAT
    GCCGCTAAAATGATCCCCAAGTGCTCTACTCAATTGAAGGCAGTAACTGCTCATTTCCAA
    ACGCATACCACGCCAATACTGCTTTCTAACAACTTTATAGAACCACTAGAAATAACGAAA
    GAAATTTACGACCTAAATAACCCAGAGAAAGAACCAAAAAAGTTCCAGACGGCCTACGCC
    AAAAAGACAGGGGACCAAAAAGGTTACCGCGAGGCGTTATGTAAATGGATTGATTTTACT
    AGGGACTTTTTATCAAAATACACTAAAACGACGTCTATTGATCTTAGCTCCTTACGCCCG
    TCCTCCCAATACAAGGATCTAGGTGAGTATTACGCAGAGTTGAACCCGCTATTATACCAT
    ATTTCCTTCCAAAGGATTGCTGAAAAGGAAATTATGGACGCTGTTGAAACTGGGAAATTG
    TACCTGTTTCAGATTTATAATAAGGACTTCGCAAAGGGTCACCATGGTAAGCCTAACCTT
    CACACTTTGTACTGGACCGGACTATTCTCGCCTGAAAATTTGGCTAAAACAAGTATCAAG
    TTAAACGGTCAGGCCGAGTTATTTTATAGACCCAAATCTAGAATGAAAAGAATGGCCCAT
    AGATTAGGCGAAAAGATGTTAAACAAGAAATTAAAGGACCAAAAAACCCCGATACCAGAC
    ACTCTATACCAAGAACTGTACGACTATGTGAATCACAGGCTTAGTCACGATTTATCAGAT
    GAAGCGAGGGCTTTATTGCCAAATGTCATCACCAAGGAAGTATCACATGAAATAATTAAG
    GATAGAAGGTTCACATCTGATAAATTCTTTTTTCATGTCCCAATTACATTGAATTATCAA
    GCAGCGAACTCACCATCTAAATTTAATCAGCGCGTCAACGCCTATTTGAAAGAACATCCC
    GAAACACCAATCATCGGCATAGATCGAGGTGAGAGAAACTTAATATATATAACTGTGATT
    GATTCTACAGGAAAAATCCTGGAGCAACGATCTTTAAATACCATACAACAGTTTGATTAT
    CAAAAAAAGTTGGATAACAGAGAAAAAGAACGTGTTGCCGCTAGGCAGGCTTGGTCTGTG
    GTAGGAACAATTAAGGACTTAAAGCAGGGCTATCTGTCCCAAGTTATTCATGAAATAGTC
    GATCTGATGATACATTATCAGGCAGTTGTCGTGTTGGAAAATTTGAATTTTGGCTTTAAA
    TCAAAAAGAACTGGCATAGCAGAAAAAGCTGTGTACCAGCAGTTTGAAAAGATGTTAATC
    GATAAGCTAAACTGCCTTGTTCTTAAAGATTACCCCGCAGAAAAAGTAGGTGGTGTTCTT
    AATCCATATCAGTTGACAGACCAATTTACATCCTTTGCGAAAATGGGTACGCAAAGCGGG
    TTCTTATTCTACGTACCGGCCCCCTATACTTCTAAGATCGACCCACTAACAGGTTTTGTG
    GACCCTTTTGTTTGGAAGACGATAAAGAACCACGAGTCACGCAAACATTTCTTAGAGGGC
    TTTGATTTCTTGCACTACGACGTGAAAACTGGTGATTTTATCTTACACTTTAAAATGAAC
    AGAAATCTCTCTTTCCAACGTGGACTGCCCGGATTCATGCCGGCTTGGGACATCGTTTTT
    GAAAAGAATGAAACGCAGTTTGACGCCAAAGGTACACCATTTATAGCGGGTAAGAGAATT
    GTGCCGGTCATAGAAAACCATAGATTTACAGGTAGATATAGGGATCTGTACCCTGCTAAT
    GAATTGATTGCATTACTCGAAGAGAAAGGAATTGTGTTTCGAGATGGATCGAATATTTTA
    CCTAAGTTGTTGGAAAATGATGATTCACACGCAATTGATACTATGGTTGCCCTCATAAGA
    TCGGTATTGCAAATGAGAAACTCAAATGCTGCTACGGGAGAGGATTATATAAACAGCCCC
    GTTCGCGATCTTAATGGTGTTTGTTTTGATTCACGTTTTCAGAACCCCGAATGGCCAATG
    GATGCCGACGCAAACGGAGCATATCATATTGCTCTTAAAGGCCAACTACTATTAAATCAC
    TTAAAGGAATCCAAAGACCTAAAATTGCAAAACGGGATATCTAATCAGGATTGGCTGGCT
    TACATACAAGAACTACGTAACTAG
    SEQ ID NO: 139
    ATGGCCGTTAAGTCAATCAAAGTGAAACTTAGACTGGATGACATGCCAGAGATTCGTGCG
    GGGTTATGGAAACTTCATAAGGAAGTTAACGCAGGGGTAAGATATTATACCGAATGGTTA
    TCATTACTTCGACAAGAGAATTTGTACAGAAGGTCCCCGAACGGCGACGGTGAGCAAGAA
    TGCGATAAGACGGCTGAAGAATGTAAGGCAGAACTTTTGGAGCGCCTGAGAGCCCGTCAG
    GTTGAAAATGGCCATAGAGGTCCTGCGGGATCTGATGATGAGCTTTTACAGCTAGCTAGA
    CAATTGTATGAATTGTTGGTCCCTCAGGCTATTGGGGCTAAAGGAGACGCTCAACAAATC
    GCCAGAAAGTTCTTGTCACCTCTGGCTGACAAAGATGCCGTGGGAGGATTAGGTATCGCT
    AAAGCAGGTAATAAACCAAGATGGGTTAGAATGAGAGAAGCAGGCGAACCTGGTTGGGAA
    GAAGAGAAAGAAAAGGCCGAAACTAGAAAAAGCGCTGACAGAACCGCAGATGTTTTACGG
    GCCTTGGCTGATTTTGGACTGAAGCCTTTGATGAGAGTGTATACTGATTCAGAAATGTCT
    TCCGTTGAATGGAAGCCCCTAAGGAAGGGACAAGCGGTCAGAACCTGGGATAGGGATATG
    TTTCAACAGGCTATTGAAAGGATGATGTCATGGGAATCCTGGAATCAAAGAGTAGGTCAA
    GAATACGCTAAACTGGTCGAACAAAAGAATAGATTTGAACAAAAAAATTTTGTAGGTCAA
    GAACATTTAGTACATTTGGTTAATCAACTTCAACAAGATATGAAAGAGGCATCTCCTGGT
    TTGGAATCAAAAGAACAAACAGCACACTATGTTACCGGCCGAGCTTTGCGAGGTTCTGAC
    AAAGTATTTGAAAAGTGGGGGAAATTAGCTCCCGATGCCCCCTTTGATCTATATGATGCT
    GAAATTAAAAACGTTCAAAGAAGGAACACTAGACGTTTTGGATCCCATGATCTTTTTGCA
    AAGCTAGCTGAGCCAGAATACCAGGCTCTATGGCGTGAAGACGCCTCGTTTTTGACTAGA
    TACGCAGTATACAATTCAATACTCAGAAAACTAAACCATGCCAAGATGTTTGCTACATTC
    ACCCTGCCCGATGCTACCGCTCATCCTATTTGGACTAGATTTGACAAGTTGGGGGGGAAT
    CTACATCAGTACACATTTTTATTTAATGAATTCGGTGAAAGAAGACACGCTATTAGATTC
    CACAAGCTCCTAAAGGTTGAAAACGGCGTTGCGAGAGAAGTTGATGATGTAACAGTTCCC
    ATTTCTATGTCGGAGCAATTGGATAATCTATTGCCTAGAGACCCTAATGAACCAATTGCT
    TTGTACTTTCGTGACTACGGTGCAGAACAACACTTTACAGGTGAATTCGGCGGAGCCAAG
    ATTCAATGTAGACGTGATCAACTCGCACACATGCATAGAAGAAGAGGCGCTCGTGATGTT
    TATTTAAATGTGTCTGTTAGAGTTCAATCCCAATCGGAGGCTAGAGGTGAAAGAAGGCCA
    CCATACGCAGCAGTTTTTAGGTTAGTAGGTGATAATCATAGGGCATTTGTCCACTTCGAC
    AAATTAAGTGATTATTTAGCAGAGCACCCTGATGATGGAAAGTTGGGCAGTGAGGGATTA
    TTAAGTGGGTTGAGGGTAATGTCTGTAGATCTTGGTCTTCGTACTTCTGCGAGTATCTCT
    GTCTTTAGAGTAGCACGTAAGGATGAGTTGAAACCTAATAGCAAAGGAAGAGTCCCGTTT
    TTTTTTCCTATTAAGGGTAACGATAACCTGGTGGCCGTGCATGAAAGATCACAACTTTTG
    AAATTGCCAGGAGAAACGGAGTCCAAGGACTTGAGGGCAATTAGAGAGGAACGTCAGCGT
    ACATTGCGACAGCTGAGAACTCAATTGGCTTATTTGAGGTTGTTGGTTAGGTGTGGTTCC
    GAGGATGTTGGCAGAAGAGAAAGGTCTTGGGCCAAATTGATAGAACAACCAGTGGACGCC
    GCAAATCACATGACACCAGATTGGAGAGAAGCTTTCGAAAATGAACTCCAGAAATTAAAG
    AGCCTACATGGCATATGCTCTGATAAAGAGTGGATGGATGCCGTATACGAATCCGTTCGT
    AGAGTCTGGCGCCACATGGGTAAGCAAGTACGGGACTGGAGAAAGGATGTTCGTTCCGGC
    GAAAGACCGAAGATAAGGGGGTATGCAAAGGACGTTGTAGGCGGTAATTCTATTGAACAG
    ATTGAGTATTTGGAAAGGCAGTACAAATTTCTTAAATCCTGGAGCTTCTTCGGCAAAGTG
    TCAGGACAAGTCATCAGGGCTGAAAAAGGTTCCAGATTTGCTATTACGCTAAGGGAACAT
    ATTGATCATGCGAAAGAAGATAGACTGAAAAAACTAGCAGATAGAATAATTATGGAAGCA
    CTTGGTTACGTCTATGCACTTGATGAAAGAGGCAAGGGGAAATGGGTAGCTAAATACCCG
    CCTTGTCAACTTATTTTATTAGAAGAATTAAGCGAGTACCAATTTAACAACGATAGACCT
    CCATCCGAAAATAATCAGCTGATGCAATGGTCCCATAGGGGTGTTTTTCAAGAATTGATA
    AATCAAGCTCAAGTACACGATTTGCTGGTAGGTACTATGTACGCAGCGTTTTCGAGCCGT
    TTTGATGCAAGAACTGGTGCCCCAGGTATCAGATGTCGACGTGTTCCGGCCAGATGTACA
    CAGGAACATAACCCTGAGCCATTTCCGTGGTGGCTTAATAAGTTTGTTGTCGAGCACACA
    TTAGACGCATGCCCTCTGAGAGCAGATGACCTTATACCCACTGGAGAAGGCGAAATATTT
    GTTAGTCCATTCTCTGCAGAAGAAGGTGACTTTCACCAGATACATGCAGACTTAAATGCA
    GCACAGAATCTCCAACAAAGGTTGTGGTCGGATTTTGATATTTCGCAAATAAGACTAAGA
    TGCGATTGGGGAGAGGTTGATGGAGAATTGGTGCTGATTCCAAGATTAACCGGAAAGCGA
    ACTGCCGATTCCTATTCTAACAAGGTGTTTTACACAAATACTGGTGTTACCTATTACGAA
    AGAGAAAGGGGTAAGAAGAGACGTAAAGTATTTGCTCAAGAAAAATTGTCAGAAGAGGAG
    GCAGAACTGTTAGTAGAAGCAGACGAAGCCAGAGAAAAATCAGTTGTGCTTATGCGTGAC
    CCTTCCGGCATTATAAATCGTGGTAATTGGACACGACAAAAAGAATTTTGGTCTATGGTC
    AATCAACGTATCGAAGGCTACCTAGTTAAGCAAATCAGGTCTAGGGTTCCACTACAAGAT
    AGCGCATGTGAAAATACGGGTGATATATAA
    SEQ ID NO: 140
    ATGGCTACTAGATCTTTCATTTTAAAAATTGAACCTAATGAAGAAGTGAAGAAGGGTCTC
    TGGAAAACTCACGAAGTACTTAATCATGGCATTGCCTATTATATGAATATCCTGAAGCTT
    ATTCGTCAAGAAGCTATATACGAGCATCATGAGCAAGATCCTAAGAACCCTAAGAAAGTA
    AGCAAAGCGGAAATTCAGGCTGAATTGTGGGACTTCGTCTTGAAGATGCAGAAGTGTAAC
    AGTTTTACGCACGAAGTTGATAAAGATGTGGTGTTTAATATTTTGAGGGAGCTATATGAG
    GAGTTGGTGCCCTCGAGTGTCGAAAAAAAAGGAGAAGCTAATCAGCTGTCAAATAAATTT
    TTATATCCTCTGGTGGATCCAAACTCTCAATCAGGTAAAGGCACTGCCAGTAGTGGTCGA
    AAACCGAGATGGTATAATTTGAAAATCGCAGGTGATCCATCGTGGGAAGAAGAAAAAAAA
    AAATGGGAAGAAGATAAAAAAAAAGATCCCCTTGCCAAAATACTAGGTAAGCTAGCCGAG
    TATGGACTTATACCATTATTCATTCCTTTCACGGACTCTAATGAACCAATTGTGAAGGAA
    ATCAAATGGATGGAAAAATCACGTAATCAGTCTGTTAGGAGGTTGGACAAAGATATGTTT
    ATACAGGCTCTTGAGAGGTTTTTGTCGTGGGAGTCCTGGAATTTGAAAGTGAAAGAAGAA
    TATGAAAAAGTGGAAAAGGAGCATAAGACGTTGGAAGAAAGGATTAAGGAAGATATTCAG
    GCCTTTAAGAGTCTGGAACAGTACGAAAAAGAAAGACAGGAACAGTTATTGAGAGATACT
    CTAAACACTAATGAATATAGGCTTTCCAAGAGGGGCTTGCGAGGATGGAGAGAGATAATT
    CAGAAATGGTTGAAAATGGATGAGAACGAGCCATCGGAGAAATATCTAGAGGTGTTTAAA
    GATTACCAAAGAAAGCACCCTCGCGAAGCTGGTGATTACTCTGTTTATGAATTCCTTTCG
    AAGAAGGAAAATCACTTCATCTGGCGAAATCATCCAGAGTACCCATATTTATATGCTACA
    TTTTGCGAAATTGACAAGAAAAAAAAAGATGCTAAACAGCAAGCGACATTCACCCTCGCT
    GATCCCATCAACCACCCATTATGGGTCAGGTTCGAAGAGAGATCAGGCTCGAACCTGAAT
    AAGTACAGGATCTTGACTGAGCAATTGCATACTGAGAAGTTAAAAAAGAAATTGACGGTC
    CAACTTGACAGATTGATTTATCCCACTGAATCTGGTGGATGGGAGGAGAAAGGTAAGGTT
    GATATTGTCCTATTGCCTTCTCGTCAATTTTACAACCAAATATTTCTGGACATCGAAGAG
    AAGGGTAAACATGCTTTTACCTATAAGGATGAGAGTATTAAATTTCCATTGAAGGGAACG
    CTTGGCGGCGCTAGAGTTCAGTTCGATAGAGATCATTTGAGAAGATACCCGCATAAAGTG
    GAATCTGGTAATGTAGGTCGGATCTACTTTAACATGACGGTAAATATTGAACCTACCGAG
    TCACCAGTCAGTAAGTCTTTAAAGATTCATAGGGATGATTTCCCTAAATTTGTCAACTTC
    AAGCCTAAGGAACTAACCGAGTGGATCAAAGACAGTAAAGGCAAAAAGTTAAAGAGCGGT
    ATTGAGTCCCTGGAGATAGGTCTTAGAGTCATGTCTATCGATTTGGGTCAAAGACAAGCA
    GCCGCAGCATCTATTTTCGAAGTTGTTGACCAAAAACCGGATATCGAGGGGAAATTATTT
    TTTCCAATAAAAGGAACTGAGCTATACGCTGTGCATCGCGCATCCTTCAATATAAAACTG
    CCAGGAGAAACACTAGTAAAATCTAGAGAGGTCTTGCGTAAAGCACGTGAGGACAATCTC
    AAATTAATGAATCAGAAGTTAAATTTCCTTAGGAACGTGTTGCATTTCCAACAGTTCGAG
    GACATAACTGAACGCGAGAAAAGAGTCACTAAGTGGATCTCAAGACAAGAAAATAGTGAT
    GTGCCATTAGTGTATCAAGACGAACTTATTCAAATAAGAGAGCTAATGTATAAACCATAT
    AAAGACTGGGTGGCATTCTTAAAACAATTACACAAGCGGCTTGAAGTAGAAATAGGAAAA
    GAAGTAAAGCATTGGAGGAAGAGTCTGTCCGATGGTCGCAAAGGCCTGTACGGGATATCA
    CTTAAAAATATTGATGAAATTGACAGAACACGAAAATTTTTGTTAAGATGGTCATTGAGA
    CCAACCGAACCAGGTGAGGTTAGAAGGTTGGAACCAGGCCAAAGGTTTGCCATCGATCAA
    TTAAACCATCTTAACGCACTGAAAGAAGATAGATTGAAGAAGATGGCGAACACTATTATT
    ATGCACGCTCTAGGTTATTGCTATGATGTGAGAAAGAAAAAATGGCAAGCCAAGAACCCT
    GCATGCCAAATTATTTTGTTTGAAGATCTTTCTAATTACAATCCATACGAAGAGCGTTCA
    CGTTTTGAAAACTCTAAATTGATGAAATGGTCTAGAAGAGAGATTCCGAGACAGGTCGCT
    CTACAAGGGGAGATTTACGGTCTTCAAGTCGGTGAGGTTGGTGCTCAATTTTCTTCCAGA
    TTTCATGCAAAAACTGGGTCTCCAGGCATTAGGTGTTCGGTCGTTACTAAGGAAAAGTTA
    CAGGACAACCGTTTCTTCAAAAATTTGCAACGTGAAGGCCGTTTAACACTTGATAAGATA
    GCTGTCCTTAAGGAAGGCGATCTGTACCCAGATAAAGGTGGTGAGAAATTCATATCTTTG
    AGTAAAGACAGGAAACTGGTTACAACACACGCCGACATTAACGCAGCTCAGAACTTGCAA
    AAGAGATTCTGGACAAGGACCCACGGCTTCTATAAGGTGTACTGTAAAGCTTATCAAGTA
    GATGGACAAACGGTTTATATTCCTGAATCAAAGGACCAGAAACAAAAAATTATAGAAGAA
    TTTGGTGAAGGATACTTTATCTTGAAGGATGGAGTTTATGAGTGGGGCAATGCAGGTAAG
    TTAAAGATAAAGAAAGGTTCATCAAAGCAATCAAGTAGCGAACTGGTCGATTCGGATATT
    TTAAAGGATAGCTTTGATCTAGCTAGTGAATTGAAGGGAGAAAAGTTAATGTTATACAGA
    GATCCCAGTGGGAATGTATTTCCATCTGATAAGTGGATGGCCGCCGGAGTGTTTTTTGGC
    AAATTAGAGAGAATCTTGATTTCTAAACTGACCAATCAATACTCAATTTCGACCATCGAA
    GACGACTCTTCAAAACAATCCATGTGA
    SEQ ID NO: 141
    ATGCCTACTCGCACCATCAATCTGAAGTTAGTTTTGGGGAAGAACCCAGAAAATGCGACT
    CTAAGACGGGCACTATTCTCTACACATAGACTTGTCAACCAAGCGACTAAGAGAATTGAA
    GAATTTTTACTGTTGTGTAGAGGAGAAGCTTATCGTACCGTAGATAATGAAGGTAAAGAA
    GCTGAGATCCCACGCCATGCTGTTCAAGAAGAGGCGCTTGCTTTTGCAAAAGCTGCACAA
    CGACATAACGGCTGTATCTCCACATATGAGGACCAGGAAATCTTGGATGTGCTTAGACAA
    TTGTATGAAAGATTAGTACCTAGCGTCAATGAAAACAACGAGGCTGGGGATGCCCAAGCC
    GCTAACGCTTGGGTGAGTCCATTAATGAGTGCAGAGTCCGAAGGTGGACTATCGGTCTAT
    GATAAAGTGTTAGACCCGCCGCCAGTATGGATGAAACTCAAAGAAGAGAAAGCGCCTGGT
    TGGGAAGCTGCTTCTCAGATTTGGATACAGTCCGACGAAGGTCAATCGCTGCTAAATAAA
    CCGGGTAGCCCACCACGTTGGATTAGAAAACTTAGATCTGGTCAACCGTGGCAAGATGAC
    TTCGTTTCAGACCAAAAAAAAAAGCAAGATGAACTAACGAAAGGTAACGCACCACTCATA
    AAACAATTGAAAGAGATGGGCCTCTTGCCTTTAGTTAATCCCTTTTTTAGACATTTGTTG
    GATCCCGAGGGTAAGGGTGTATCCCCATGGGACAGATTGGCCGTAAGGGCCGCGGTGGCG
    CACTTCATCTCTTGGGAAAGTTGGAACCACAGAACAAGAGCTGAGTATAACAGTTTGAAA
    CTGCGAAGAGATGAATTTGAGGCCGCATCTGATGAATTCAAGGACGATTTTACATTGCTA
    CGACAATATGAGGCTAAGCGACATAGTACGCTTAAGTCAATTGCCTTAGCTGATGACTCT
    AACCCGTACCGAATTGGTGTAAGGTCCTTGAGAGCCTGGAATAGGGTTAGAGAAGAATGG
    ATTGACAAAGGCGCAACCGAGGAACAAAGGGTTACCATCCTTAGTAAGCTTCAAACACAA
    TTACGGGGTAAATTCGGTGATCCAGACCTATTTAATTGGCTAGCCCAAGATAGACACGTA
    CACCTGTGGTCCCCGAGAGATTCCGTCACGCCCCTCGTAAGGATTAATGCCGTCGACAAA
    GTGCTTAGAAGACGTAAGCCTTATGCACTGATGACTTTTGCACATCCGAGATTCCATCCA
    AGATGGATTCTATACGAAGCGCCTGGTGGTTCTAACTTGCGACAATACGCTTTAGATTGT
    ACTGAAAATGCTCTGCATATTACACTTCCATTACTCGTCGACGACGCCCATGGTACATGG
    ATTGAGAAAAAAATCCGCGTACCACTCGCTCCTAGTGGACAAATACAAGATTTAACTTTA
    GAAAAACTTGAAAAGAAAAAAAACAGATTATACTATAGATCAGGATTCCAACAATTTGCT
    GGATTAGCCGGTGGTGCTGAGGTGTTGTTTCATAGGCCGTATATGGAACATGATGAGAGA
    TCAGAAGAATCTCTGTTGGAAAGGCCAGGCGCTGTGTGGTTCAAATTAACCTTAGATGTT
    GCTACCCAAGCACCACCTAACTGGTTAGATGGTAAAGGCAGAGTTAGGACACCTCCAGAA
    GTTCATCATTTCAAAACCGCTCTGTCAAATAAATCTAAACATACGAGAACCTTGCAACCA
    GGATTGAGAGTCCTTTCTGTTGATTTGGGTATGAGAACATTTGCTTCTTGTTCTGTTTTC
    GAATTGATCGAAGGTAAACCTGAAACAGGTAGAGCATTCCCTGTTGCTGACGAAAGATCA
    ATGGATAGTCCAAATAAGTTATGGGCCAAGCACGAGAGAAGCTTTAAACTAACTCTGCCT
    GGAGAAACACCGAGCAGAAAGGAGGAAGAAGAGAGAAGCATTGCTAGGGCAGAGATTTAC
    GCGCTGAAAAGAGATATTCAAAGACTGAAATCACTCCTAAGATTAGGTGAGGAAGATAAT
    GATAATAGAAGAGATGCTTTGTTAGAGCAATTCTTTAAAGGATGGGGTGAAGAGGACGTA
    GTTCCTGGTCAAGCTTTCCCTAGAAGCCTCTTTCAGGGATTAGGCGCTGCACCCTTTAGG
    TCAACACCCGAATTGTGGAGACAGCACTGTCAGACGTATTACGACAAAGCGGAAGCTTGC
    CTGGCAAAGCATATTTCCGACTGGAGGAAGAGAACTAGACCTCGTCCGACTTCGAGAGAG
    ATGTGGTATAAGACAAGATCTTACCATGGTGGCAAAAGTATTTGGATGCTAGAATACTTA
    GATGCTGTCCGCAAATTACTACTTTCATGGTCGTTAAGAGGTCGTACTTACGGAGCTATT
    AATAGACAAGACACCGCTCGTTTTGGTTCCTTAGCTTCTAGATTGTTGCATCATATCAAC
    TCTTTAAAGGAAGACCGCATCAAAACCGGTGCAGATAGTATTGTGCAGGCCGCAAGGGGC
    TATATTCCTCTCCCACATGGCAAGGGTTGGGAACAGCGTTATGAACCCTGTCAGTTGATA
    TTATTTGAAGATCTAGCTAGGTACAGATTTCGTGTAGACAGACCTCGGAGAGAGAATTCG
    CAATTGATGCAGTGGAATCATCGAGCTATAGTAGCAGAAACGACGATGCAAGCTGAACTA
    TACGGTCAAATAGTCGAAAATACCGCTGCTGGTTTCTCCTCAAGATTTCATGCTGCAACT
    GGTGCTCCTGGTGTCAGATGTCGCTTTTTGTTAGAACGAGATTTCGATAATGACCTACCA
    AAGCCGTACTTACTGAGAGAACTAAGTTGGATGTTAGGTAACACAAAGGTTGAATCAGAG
    GAAGAAAAATTGCGTCTTCTAAGCGAGAAAATTAGACCAGGTTCATTAGTCCCTTGGGAT
    GGGGGTGAACAATTCGCGACATTACACCCGAAAAGACAAACTCTTTGTGTCATTCACGCA
    GATATGAACGCTGCTCAAAACCTGCAACGCAGATTTTTCGGAAGGTGTGGGGAAGCCTTT
    CGCCTTGTGTGTCAGCCACATGGTGATGATGTTTTGAGGCTAGCGTCTACACCAGGTGCA
    AGACTTTTGGGTGCATTACAACAACTGGAAAATGGTCAGGGAGCTTTCGAATTAGTTCGT
    GATATGGGTAGCACATCACAAATGAATCGTTTCGTCATGAAGTCGTTGGGCAAAAAAAAG
    ATCAAGCCATTACAAGACAATAACGGGGATGATGAACTAGAAGACGTGCTATCTGTTTTA
    CCTGAAGAAGATGATACCGGACGAATTACTGTATTTCGGGACTCTTCGGGTATATTCTTC
    CCTTGTAACGTTTGGATCCCGGCAAAACAGTTCTGGCCTGCGGTCCGTGCTATGATTTGG
    AAGGTTATGGCATCACATTCATTGGGTTAG
    SEQ ID NO: 142
    ATGACAAAGTTAAGGCATAGACAGAAGAAGTTAACTCACGATTGGGCGGGGTCTAAAAAG
    AGAGAAGTTCTAGGGAGCAATGGTAAATTACAGAATCCATTGCTAATGCCCGTCAAAAAA
    GGTCAGGTGACAGAATTTCGAAAAGCATTTTCCGCATACGCCCGAGCAACCAAAGGGGAA
    ATGACGGATGGCAGAAAAAATATGTTTACTCACTCATTTGAACCATTCAAGACCAAGCCT
    TCGTTACATCAGTGCGAACTGGCTGACAAAGCCTACCAGAGCTTGCATTCATATTTACCG
    GGTTCTTTGGCGCATTTTCTTTTATCTGCCCATGCACTTGGTTTTAGGATTTTTAGCAAA
    TCAGGGGAAGCCACTGCATTCCAAGCGTCCTCAAAGATTGAAGCTTACGAAAGCAAGTTA
    GCTAGCGAGCTTGCTTGTGTTGATTTGTCTATTCAGAACTTGACTATTTCAACTTTGTTC
    AACGCATTAACGACTTCCGTAAGAGGTAAAGGTGAGGAGACATCGGCAGATCCACTGATA
    GCTAGATTTTACACCTTACTTACCGGTAAACCACTAAGCAGAGACACTCAGGGCCCAGAA
    CGAGATTTAGCCGAGGTGATAAGCAGAAAAATTGCAAGTTCTTTTGGAACTTGGAAGGAG
    ATGACTGCCAATCCACTTCAATCTCTTCAATTTTTTGAAGAGGAGTTGCATGCGCTAGAT
    GCAAATGTTAGTTTGTCACCTGCCTTCGATGTTCTGATTAAGATGAACGACCTGCAGGGT
    GACTTGAAGAACAGAACGATAGTTTTTGATCCAGATGCTCCTGTGTTTGAATATAATGCT
    GAGGATCCTGCTGACATCATCATTAAACTGACAGCTAGATATGCGAAAGAAGCAGTGATT
    AAAAATCAAAATGTCGGGAATTATGTTAAGAACGCTATTACGACAACTAACGCAAACGGA
    CTAGGTTGGTTGCTGAACAAAGGCCTTTCCTTATTGCCTGTCTCCACTGATGACGAACTA
    TTGGAGTTTATTGGGGTCGAGAGATCCCATCCTAGCTGTCATGCGTTGATAGAACTTATC
    GCTCAGTTAGAAGCACCTGAACTGTTCGAAAAAAATGTTTTTTCTGATACTCGTTCCGAG
    GTTCAAGGTATGATAGATTCAGCTGTAAGCAATCATATCGCCAGGCTGTCAAGCTCTCGT
    AATTCATTGAGCATGGACTCAGAGGAACTTGAGAGATTGATAAAATCTTTTCAAATTCAT
    ACACCACATTGTTCATTATTTATAGGGGCTCAATCCTTATCTCAACAATTGGAAAGCCTA
    CCCGAAGCATTGCAGTCAGGAGTGAACAGTGCTGATATTCTGCTCGGCTCAACCCAATAC
    ATGTTGACAAATTCTTTGGTCGAGGAGTCAATCGCTACGTATCAGAGAACCTTAAATAGA
    ATTAACTACCTGTCCGGCGTTGCAGGACAGATTAACGGTGCTATTAAGAGGAAAGCTATT
    GATGGTGAGAAGATACATTTACCCGCTGCTTGGTCAGAGTTAATTTCTTTACCCTTTATT
    GGGCAACCAGTGATTGATGTTGAATCAGATTTAGCCCACTTAAAGAACCAATACCAGACA
    TTGTCTAACGAATTTGATACGCTGATTTCCGCACTGCAAAAGAATTTCGACTTAAATTTT
    AATAAAGCCTTGCTTAATCGAACACAACATTTCGAGGCTATGTGTAGATCAACAAAAAAG
    AATGCCCTTTCTAAGCCTGAGATCGTTAGTTATAGAGATTTGCTAGCCAGGTTGACTTCT
    TGTCTTTATAGGGGCTCTCTAGTCTTGAGGAGGGCGGGTATAGAAGTACTGAAAAAGCAC
    AAGATATTTGAGTCCAACTCTGAATTAAGAGAGCACGTTCATGAAAGAAAACACTTCGTA
    TTTGTTTCTCCGCTCGATAGAAAAGCCAAGAAGCTCCTACGTTTGACTGACTCTAGGCCT
    GATTTATTGCACGTAATTGATGAAATACTACAACATGATAATTTAGAGAACAAGGATAGA
    GAATCTTTGTGGTTAGTTCGATCTGGTTATTTACTGGCCGGCCTACCAGACCAACTCTCC
    TCTTCCTTTATAAATCTTCCAATCATTACTCAAAAAGGCGATCGTCGCTTGATAGATCTC
    ATTCAATACGACCAAATTAATAGAGATGCTTTTGTGATGTTGGTAACTTCCGCTTTTAAG
    TCGAACTTAAGTGGGCTGCAGTACAGAGCAAACAAACAATCTTTTGTGGTTACGCGCACT
    TTGTCACCATATTTGGGATCTAAATTGGTTTATGTGCCCAAAGATAAAGATTGGCTGGTC
    CCTTCCCAAATGTTCGAGGGGAGATTTGCGGACATTTTGCAATCCGATTATATGGTGTGG
    AAGGACGCTGGAAGATTGTGTGTTATTGACACAGCTAAGCATTTGTCTAACATTAAAAAA
    TCTGTATTCTCAAGTGAAGAAGTCCTCGCGTTTTTAAGAGAATTGCCACACCGTACGTTT
    ATCCAAACTGAGGTCAGGGGTTTAGGGGTGAATGTGGACGGTATTGCATTTAATAACGGG
    GATATACCCTCTCTGAAGACGTTTAGCAATTGCGTGCAAGTCAAAGTGAGTCGGACAAAC
    ACTAGTCTGGTCCAAACATTAAATAGATGGTTTGAAGGCGGTAAGGTCTCGCCGCCTAGC
    ATCCAATTTGAGAGAGCATATTACAAAAAAGATGATCAAATCCACGAGGACGCTGCAAAA
    AGGAAGATAAGGTTTCAAATGCCAGCTACAGAGTTGGTACACGCGTCAGACGACGCAGGA
    TGGACCCCCTCCTATTTACTTGGTATCGATCCCGGTGAATATGGTATGGGTTTGTCATTG
    GTCTCAATAAATAATGGCGAAGTTTTAGATAGCGGATTTATACACATAAATTCATTGATA
    AATTTCGCTTCTAAGAAATCAAATCATCAAACCAAAGTTGTTCCGAGGCAGCAATACAAG
    TCACCATACGCCAACTATCTAGAACAATCTAAAGATTCTGCAGCAGGAGACATAGCTCAT
    ATTTTGGATAGACTTATCTACAAGTTGAACGCCCTACCCGTTTTCGAAGCTCTATCTGGC
    AATAGTCAAAGCGCAGCGGATCAGGTTTGGACAAAAGTCCTCAGCTTCTACACCTGGGGA
    GATAATGATGCACAAAATTCAATTCGTAAGCAACATTGGTTCGGTGCTTCACACTGGGAC
    ATTAAAGGCATGTTGAGGCAACCGCCAACAGAAAAAAAGCCCAAACCATACATTGCCTTT
    CCCGGTTCACAAGTTTCTTCTTATGGTAATTCTCAAAGGTGTTCATGTTGTGGACGTAAC
    CCAATTGAACAATTGCGCGAAATGGCGAAGGACACATCCATTAAGGAGTTGAAGATTAGA
    AATTCAGAAATTCAATTGTTCGACGGTACTATAAAGTTATTTAATCCAGACCCGTCAACG
    GTCATAGAAAGAAGAAGACATAATTTAGGGCCATCAAGAATTCCTGTAGCTGATAGAACT
    TTCAAAAATATAAGTCCAAGCTCACTAGAATTCAAAGAACTAATAACGATTGTGTCACGG
    TCTATACGTCATTCCCCAGAATTTATTGCTAAAAAAAGAGGTATAGGTAGTGAGTACTTT
    TGTGCTTATAGTGATTGTAATTCCTCCTTAAATTCAGAAGCAAATGCGGCTGCGAACGTT
    GCCCAAAAGTTCCAAAAGCAATTGTTTTTCGAATTATAG
    SEQ ID NO: 143
    ATGAAAAGAATCTTGAACTCTTTAAAGGTTGCCGCCCTGCGTTTGTTATTTAGAGGTAAA
    GGATCTGAACTTGTCAAGACTGTTAAATACCCTTTGGTCTCGCCGGTTCAGGGTGCAGTT
    GAGGAGTTAGCTGAGGCGATCCGCCATGATAACCTACATCTGTTTGGTCAAAAAGAAATT
    GTTGACCTTATGGAAAAGGATGAAGGTACGCAAGTTTACTCAGTGGTTGATTTCTGGTTA
    GATACCCTTCGTTTGGGGATGTTTTTCAGTCCATCAGCAAACGCATTAAAAATCACGCTG
    GGTAAGTTTAATTCTGATCAGGTTAGCCCTTTTAGGAAAGTGTTAGAGCAGTCTCCATTC
    TTCTTGGCTGGTAGGCTGAAGGTTGAACCGGCAGAACGTATATTATCTGTCGAGATCCGT
    AAGATTGGGAAGAGGGAAAACAGAGTTGAGAACTATGCTGCTGACGTAGAAACGTGTTTT
    ATAGGCCAATTAAGTTCAGATGAGAAACAGTCAATACAAAAATTAGCTAATGATATCTGG
    GATAGTAAAGATCATGAAGAGCAAAGAATGTTAAAGGCAGATTTCTTCGCTATCCCTTTG
    ATTAAGGATCCAAAGGCTGTGACCGAAGAGGATCCTGAAAATGAAACTGCTGGTAAACAA
    AAACCCTTGGAGTTGTGTGTCTGCCTTGTCCCAGAACTTTACACAAGAGGATTCGGGTCA
    ATAGCCGATTTTTTGGTTCAACGCTTAACTCTTTTAAGGGATAAAATGTCTACAGATACT
    GCAGAAGATTGTTTAGAATATGTCGGGATTGAGGAGGAAAAAGGTAACGGCATGAACTCA
    TTGTTGGGAACGTTCTTAAAGAATTTGCAAGGCGATGGATTTGAGCAGATTTTCCAATTT
    ATGTTAGGGAGCTATGTCGGTTGGCAAGGGAAGGAAGATGTTTTAAGAGAGAGATTAGAC
    TTATTGGCTGAAAAAGTGAAGAGGTTACCGAAACCAAAATTTGCTGGCGAATGGTCTGGT
    CATAGGATGTTCTTGCATGGCCAATTGAAGTCTTGGTCTTCAAATTTTTTTAGACTATTT
    AACGAGACAAGGGAACTTCTAGAGTCTATTAAGTCAGATATACAGCATGCCACAATGCTA
    ATATCATATGTAGAAGAAAAAGGTGGTTATCATCCTCAATTACTTAGTCAATATAGAAAA
    CTTATGGAACAACTACCAGCTTTGCGTACCAAGGTATTGGACCCTGAGATTGAAATGACA
    CATATGTCCGAAGCAGTTCGCTCTTATATAATGATACATAAATCTGTTGCGGGTTTTTTA
    CCGGATTTATTAGAATCATTAGATAGAGACAAGGATCGTGAGTTTCTGCTTAGTATTTTT
    CCAAGAATCCCAAAAATTGATAAAAAAACCAAGGAAATTGTAGCTTGGGAACTGCCGGGA
    GAACCAGAAGAAGGTTATTTATTTACTGCTAATAACTTGTTCAGAAACTTCTTAGAGAAT
    CCGAAACATGTCCCGAGATTTATGGCCGAAAGGATCCCAGAAGATTGGACTCGATTACGC
    TCTGCTCCTGTCTGGTTCGATGGAATGGTAAAACAATGGCAAAAAGTCGTTAACCAGTTA
    GTAGAATCACCAGGTGCTTTATATCAATTTAACGAATCCTTCTTGAGACAAAGGTTACAG
    GCCATGTTAACTGTGTATAAGAGGGACTTACAAACTGAAAAATTTCTTAAACTTTTGGCG
    GATGTTTGTAGGCCTCTTGTAGATTTTTTTGGTTTGGGTGGAAATGATATTATTTTTAAG
    AGCTGTCAAGACCCAAGAAAACAATGGCAAACCGTTATTCCTCTCTCTGTTCCGGCAGAT
    GTCTATACTGCTTGCGAAGGTTTGGCGATTAGACTAAGGGAGACATTAGGATTCGAATGG
    AAGAATTTGAAAGGTCACGAGAGAGAAGATTTCTTAAGATTGCACCAGTTATTGGGCAAT
    TTACTTTTCTGGATTCGTGATGCTAAATTGGTAGTAAAATTAGAGGATTGGATGAACAAC
    CCATGTGTTCAGGAATATGTAGAAGCCCGGAAAGCTATCGATCTTCCACTAGAAATATTC
    GGTTTTGAAGTGCCTATCTTCCTGAATGGCTATCTATTTTCGGAGTTGAGACAATTAGAA
    CTTTTGCTTAGGAGAAAAAGTGTGATGACTAGCTACAGTGTAAAGACTACTGGATCTCCT
    AATAGGCTATTTCAGCTAGTTTATTTACCTCTAAACCCTAGTGACCCCGAAAAGAAGAAC
    TCAAATAACTTTCAAGAACGTTTGGATACCCCAACTGGTTTGTCCCGTCGTTTCCTAGAC
    CTAACCCTTGATGCATTCGCAGGTAAGTTACTTACCGATCCAGTTACACAAGAATTGAAG
    ACAATGGCAGGTTTTTACGATCATCTTTTTGGATTCAAATTGCCATGTAAACTCGCCGCC
    ATGTCGAATCATCCAGGTTCTTCTTCAAAGATGGTTGTGTTAGCGAAACCCAAAAAAGGT
    GTTGCTTCTAATATAGGGTTTGAACCGATCCCAGATCCCGCTCATCCCGTATTTAGGGTT
    AGATCCAGTTGGCCAGAGTTGAAGTACCTCGAGGGGCTATTGTATTTGCCAGAAGACACA
    CCTTTGACCATCGAATTAGCAGAGACCTCCGTATCGTGCCAAAGTGTCTCGTCAGTTGCA
    TTCGATTTGAAAAACTTGACAACGATCTTAGGTCGTGTGGGAGAATTTAGGGTCACAGCT
    GATCAACCCTTTAAACTAACGCCTATAATCCCGGAGAAAGAAGAATCTTTTATTGGTAAA
    ACTTATTTGGGTCTCGACGCGGGTGAAAGGAGCGGCGTCGGTTTCGCTATTGTTACAGTG
    GACGGAGATGGGTACGAAGTGCAAAGATTGGGGGTCCACGAGGATACACAGCTTATGGCC
    TTGCAGCAAGTTGCTAGTAAATCCTTAAAAGAGCCAGTATTTCAGCCTCTAAGAAAAGGC
    ACCTTTAGACAACAAGAAAGAATACGGAAATCCTTACGTGGTTGCTACTGGAATTTTTAT
    CATGCCTTGATGATAAAATATAGGGCCAAAGTAGTACATGAGGAATCTGTCGGAAGTAGT
    GGTCTTGTGGGTCAATGGTTGAGGGCTTTTCAGAAGGATTTGAAGAAAGCCGATGTTCTC
    CCCAAGAAGGGCGGTAAAAACGGTGTAGATAAGAAGAAGAGAGAGTCCTCAGCTCAAGAC
    ACTCTTTGGGGTGGTGCTTTCTCTAAAAAGGAGGAGCAACAGATTGCGTTTGAGGTGCAA
    GCTGCAGGTTCTTCGCAATTTTGTTTGAAGTGCGGATGGTGGTTCCAACTAGGCATGCGT
    GAAGTAAACAGGGTACAAGAATCGGGCGTCGTGTTAGATTGGAATAGAAGCATAGTTACC
    TTTTTAATAGAATCATCCGGCGAAAAAGTTTATGGTTTCTCCCCACAGCAATTAGAGAAG
    GGTTTCAGACCAGACATCGAAACTTTTAAAAAGATGGTAAGAGACTTTATGAGACCTCCT
    ATGTTTGATAGAAAAGGCAGACCGGCCGCAGCTTACGAGAGATTTGTTTTAGGAAGGAGA
    CATCGAAGGTACAGGTTTGATAAAGTATTTGAGGAAAGATTTGGGAGGTCTGCTCTTTTC
    ATTTGTCCTAGAGTAGGTTGTGGAAATTTTGACCACAGCTCCGAACAGTCCGCGGTTGTT
    TTGGCCTTGATCGGATATATTGCCGATAAGGAGGGAATGTCAGGTAAGAAGTTGGTTTAT
    GTACGGCTGGCCGAACTTATGGCCGAATGGAAACTAAAAAAATTAGAAAGATCCAGAGTT
    GAAGAACAATCATCCGCTCAATAA
    SEQ ID NO: 144
    ATGGCAGAAAGCAAACAAATGCAGTGTAGGAAATGTGGAGCTAGTATGAAGTACGAAGTC
    ATCGGTTTGGGTAAAAAGTCATGTAGATACATGTGTCCCGATTGTGGCAACCATACCTCG
    GCAAGAAAGATACAAAACAAAAAAAAAAGAGATAAAAAATATGGGTCAGCCAGTAAAGCC
    CAATCTCAAAGAATTGCTGTAGCAGGTGCTCTTTACCCTGACAAAAAAGTACAAACTATC
    AAAACCTATAAATATCCAGCAGACTTGAATGGTGAGGTGCATGATAGCGGTGTTGCCGAG
    AAAATCGCACAAGCAATACAAGAGGACGAGATTGGACTTTTGGGACCAAGCTCAGAATAT
    GCATGCTGGATTGCATCTCAAAAACAGTCTGAGCCTTACAGTGTAGTCGATTTCTGGTTT
    GATGCAGTGTGCGCAGGGGGAGTCTTCGCCTACTCTGGCGCTAGATTATTGAGTACAGTT
    TTACAGTTATCCGGTGAGGAATCGGTGCTTAGAGCTGCCTTAGCCTCGTCTCCATTCGTT
    GACGATATAAACTTAGCGCAAGCCGAAAAGTTTTTGGCGGTTAGCAGGCGTACAGGTCAA
    GATAAGTTAGGTAAGAGAATTGGGGAGTGCTTTGCAGAAGGAAGATTGGAAGCTTTAGGG
    ATAAAAGATAGAATGAGGGAATTTGTTCAAGCTATCGATGTTGCACAGACCGCCGGACAA
    CGTTTCGCTGCCAAATTGAAGATATTCGGTATAAGTCAGATGCCAGAAGCTAAGCAATGG
    AATAACGATTCCGGACTGACTGTCTGTATACTACCTGATTATTATGTTCCCGAAGAGAAT
    CGCGCGGACCAACTTGTAGTGTTGTTAAGAAGACTTCGCGAGATTGCATATTGCATGGGT
    ATTGAAGATGAAGCGGGTTTCGAACATCTTGGAATAGATCCTGGTGCTCTTTCGAATTTT
    TCAAACGGTAACCCTAAGAGAGGATTTCTAGGGAGGCTGTTAAATAACGATATTATTGCG
    TTGGCAAACAATATGAGTGCGATGACTCCATATTGGGAAGGGCGTAAGGGTGAACTCATA
    GAAAGGCTTGCGTGGTTAAAGCACAGGGCAGAAGGGCTGTATCTTAAAGAACCTCATTTC
    GGTAACTCCTGGGCCGATCATAGGTCACGAATTTTCTCAAGGATCGCAGGCTGGTTATCT
    GGTTGCGCTGGCAAGTTGAAAATTGCGAAAGACCAAATTTCTGGAGTACGTACAGATCTA
    TTTCTGCTAAAAAGACTGCTGGACGCAGTTCCGCAATCGGCGCCATCCCCCGATTTTATT
    GCGTCAATTTCGGCACTTGACAGGTTTTTAGAAGCTGCAGAATCGAGCCAGGACCCTGCT
    GAACAAGTGAGGGCTCTCTACGCTTTTCACTTGAACGCACCTGCAGTCCGAAGTATAGCC
    AATAAAGCAGTGCAAAGGTCCGACAGCCAAGAATGGCTGATAAAAGAACTAGACGCTGTT
    GACCATTTAGAATTTAACAAAGCGTTCCCATTTTTCTCTGACACAGGAAAAAAAAAAAAA
    AAAGGTGCTAATAGCAACGGTGCTCCATCGGAAGAAGAGTACACTGAAACGGAATCAATA
    CAACAACCTGAGGACGCGGAACAGGAAGTAAACGGACAAGAAGGGAACGGAGCGTCTAAA
    AATCAAAAGAAATTTCAAAGAATACCTAGATTCTTCGGTGAAGGCTCCAGATCTGAATAC
    AGAATTTTAACGGAAGCTCCACAGTATTTCGATATGTTTTGTAATAACATGAGGGCTATA
    TTTATGCAGTTAGAAAGTCAACCCCGTAAAGCTCCCAGAGATTTTAAATGTTTCCTACAA
    AATCGATTACAAAAATTATACAAACAGACTTTCTTGAATGCACGAAGCAACAAGTGTCGC
    GCTCTGCTTGAGTCAGTTTTAATCTCTTGGGGAGAATTTTATACATACGGTGCCAACGAA
    AAGAAATTTAGATTAAGACATGAAGCTTCAGAACGCAGCAGTGACCCAGATTACGTAGTT
    CAGCAAGCCTTGGAAATCGCGCGTCGTCTATTCCTTTTTGGCTTCGAATGGAGAGATTGC
    TCCGCTGGTGAAAGAGTGGATTTGGTTGAAATTCACAAAAAGGCTATCAGTTTTTTGTTG
    GCTATTACTCAAGCTGAGGTCTCTGTTGGTTCATACAATTGGCTTGGCAACTCAACAGTA
    TCGAGATATTTATCCGTTGCGGGAACTGATACCTTATACGGTACCCAATTGGAAGAATTC
    CTGAACGCTACAGTGTTGAGTCAAATGCGTGGTCTGGCCATTAGATTGAGTTCTCAAGAA
    CTTAAGGACGGTTTTGATGTGCAGCTCGAGTCTTCCTGCCAGGACAATCTGCAACACCTA
    TTGGTGTATAGGGCTTCGAGAGATTTGGCGGCTTGCAAGCGCGCTACTTGTCCAGCCGAA
    CTCGATCCTAAGATTTTAGTTTTACCGGTAGGTGCATTCATCGCTTCCGTAATGAAAATG
    ATAGAAAGAGGTGACGAACCTTTAGCTGGTGCTTATTTACGGCATAGGCCACACTCTTTC
    GGATGGCAAATTAGGGTCCGCGGTGTTGCTGAGGTAGGGATGGATCAGGGTACAGCATTG
    GCCTTTCAAAAGCCAACAGAGTCAGAACCTTTTAAAATTAAGCCCTTCTCTGCACAGTAT
    GGACCAGTTCTGTGGTTGAACAGTAGTAGTTATTCTCAATCACAATATTTGGACGGTTTT
    CTATCTCAACCAAAAAATTGGAGTATGAGGGTGTTGCCTCAGGCGGGTTCAGTTCGCGTC
    GAACAACGAGTTGCTTTGATATGGAACTTACAAGCAGGCAAGATGAGACTAGAACGCTCC
    GGTGCGAGGGCCTTTTTCATGCCTGTACCGTTTTCATTTAGGCCATCCGGCAGTGGGGAC
    GAAGCAGTTTTGGCGCCCAACCGGTACTTGGGTCTGTTCCCTCATTCCGGAGGTATAGAA
    TACGCTGTAGTGGATGTCCTGGATTCTGCTGGATTTAAAATTCTTGAAAGAGGCACTATT
    GCTGTCAATGGTTTCTCTCAGAAAAGGGGAGAGCGCCAAGAAGAAGCCCATCGTGAAAAA
    CAAAGAAGGGGGATAAGTGATATAGGGCGAAAGAAGCCTGTGCAGGCAGAAGTCGATGCG
    GCGAACGAATTGCATAGAAAGTACACTGATGTTGCCACAAGATTAGGTTGTAGAATCGTC
    GTTCAATGGGCACCACAACCTAAACCAGGGACAGCACCGACAGCGCAAACTGTTTACGCG
    AGGGCTGTTAGGACAGAAGCTCCGAGGAGCGGCAACCAAGAAGATCATGCAAGAATGAAA
    AGTTCTTGGGGTTACACCTGGGGTACGTATTGGGAGAAACGAAAACCAGAAGATATTTTA
    GGGATTTCTACACAGGTGTATTGGACAGGAGGTATAGGCGAATCCTGTCCTGCTGTAGCA
    GTCGCTTTATTAGGTCATATTAGAGCAACTTCAACACAAACGGAGTGGGAAAAGGAAGAA
    GTTGTCTTTGGAAGACTGAAGAAGTTCTTTCCGAGTTAA
    SEQ ID NO: 145
    ATGGAGAAGAGAATTAATAAGATACGGAAAAAATTATCTGCGGATAATGCAACAAAGCCA
    GTCTCTCGTTCAGGCCCCATGAAAACCCTGCTTGTAAGAGTAATGACGGATGATTTAAAA
    AAGAGGTTGGAAAAGCGTAGAAAAAAACCAGAAGTGATGCCGCAAGTGATCTCAAATAAC
    GCAGCTAATAATCTAAGGATGCTACTTGATGATTATACAAAAATGAAAGAAGCAATCCTG
    CAAGTTTACTGGCAGGAATTCAAGGATGACCATGTTGGACTAATGTGCAAATTCGCACAA
    CCAGCGTCTAAGAAAATTGACCAAAATAAATTGAAACCCGAAATGGACGAAAAAGGGAAT
    TTAACAACTGCCGGGTTTGCCTGCTCGCAATGTGGGCAACCATTATTTGTTTATAAATTA
    GAGCAGGTTTCGGAAAAAGGAAAGGCTTACACAAATTACTTCGGCAGATGTAATGTTGCC
    GAACACGAAAAACTCATATTGTTAGCTCAGTTGAAGCCTGAGAAAGACTCTGATGAGGCC
    GTTACTTACTCGTTGGGGAAGTTTGGTCAAAGAGCTCTCGATTTTTATTCTATTCATGTG
    ACAAAGGAGTCCACACATCCCGTCAAGCCCTTGGCACAAATTGCGGGTAATAGATACGCT
    TCGGGTCCAGTTGGGAAGGCCCTTTCTGATGCATGTATGGGCACAATTGCTAGCTTTCTT
    AGTAAATACCAGGATATCATAATAGAGCATCAAAAAGTTGTAAAGGGTAACCAAAAGAGA
    TTAGAATCGCTGCGTGAGTTGGCGGGTAAAGAAAACTTGGAATATCCATCTGTCACTCTG
    CCTCCTCAACCTCATACTAAGGAAGGTGTAGATGCGTACAATGAAGTTATCGCTAGAGTC
    CGTATGTGGGTGAATTTAAATTTGTGGCAAAAATTGAAGTTATCGCGTGATGATGCAAAA
    CCTCTTCTTAGACTAAAGGGCTTTCCTAGCTTCCCTGTAGTGGAAAGACGCGAAAATGAA
    GTCGATTGGTGGAATACAATTAACGAAGTCAAAAAACTGATCGATGCAAAGCGAGATATG
    GGTCGAGTTTTTTGGTCTGGTGTTACAGCTGAAAAAAGGAATACGATCTTAGAAGGTTAC
    AACTACTTGCCAAATGAGAACGATCATAAAAAAAGAGAAGGCAGTTTAGAAAATCCAAAA
    AAGCCAGCTAAGAGACAATTTGGTGATTTGCTACTTTACCTAGAAAAAAAGTACGCCGGA
    GATTGGGGGAAAGTCTTTGACGAAGCTTGGGAGAGAATAGATAAAAAAATAGCAGGATTG
    ACGTCACACATTGAAAGAGAAGAGGCGAGAAATGCAGAAGATGCTCAGTCCAAAGCTGTC
    CTCACCGACTGGTTGAGAGCCAAAGCGTCCTTTGTTCTCGAACGCCTAAAAGAAATGGAT
    GAGAAGGAATTTTATGCCTGCGAAATCCAGCTACAAAAATGGTACGGAGACTTGAGAGGT
    AACCCCTTTGCCGTGGAAGCAGAGAACCGTGTTGTAGATATCTCCGGTTTCTCAATCGGT
    AGCGATGGACACTCCATTCAGTATCGCAACTTGTTGGCCTGGAAATATTTGGAAAACGGT
    AAGAGGGAATTCTATTTACTTATGAATTATGGCAAGAAAGGTAGAATCAGGTTTACTGAC
    GGAACAGACATTAAAAAGAGTGGTAAGTGGCAAGGCCTTTTGTACGGTGGTGGCAAGGCC
    AAAGTAATAGACTTAACATTTGACCCCGACGACGAACAACTGATAATACTGCCTTTAGCT
    TTTGGTACTCGACAGGGGCGAGAGTTCATTTGGAATGATCTTTTGTCACTCGAGACTGGT
    TTGATAAAACTTGCAAATGGAAGAGTCATCGAGAAGACAATTTACAACAAAAAGATAGGT
    CGCGATGAGCCTGCACTATTTGTGGCCTTGACCTTTGAGAGAAGGGAAGTTGTCGACCCA
    TCCAATATTAAACCAGTCAACCTAATCGGTGTAGATAGAGGTGAAAACATCCCAGCTGTT
    ATCGCTCTGACAGACCCTGAAGGTTGCCCTTTGCCAGAATTTAAAGATTCGTCTGGTGGA
    CCAACAGATATATTACGTATTGGGGAAGGCTATAAAGAGAAACAACGTGCTATTCAGGCT
    GCAAAAGAAGTTGAACAGAGGAGAGCTGGAGGTTACAGTAGAAAATTCGCCAGTAAAAGT
    AGAAACTTAGCAGATGACATGGTTAGAAACTCTGCCCGGGATTTGTTCTATCATGCGGTT
    ACTCACGATGCAGTCTTAGTCTTTGAAAATCTATCGCGCGGTTTTGGTAGGCAAGGCAAG
    AGGACTTTTATGACAGAGAGACAATATACAAAAATGGAAGATTGGTTAACCGCGAAGCTC
    GCATATGAAGGTCTTACTTCGAAAACGTACCTCAGCAAAACGCTGGCTCAATATACTTCT
    AAAACTTGTTCAAATTGTGGTTTTACTATTACCACGGCAGACTACGACGGGATGTTGGTG
    AGATTGAAGAAGACGAGCGATGGTTGGGCAACAACATTGAATAATAAGGAATTAAAAGCA
    GAAGGACAGATTACGTATTACAATCGTTATAAACGCCAAACGGTTGAGAAAGAGTTGTCA
    GCCGAGTTGGATAGACTAAGTGAAGAGAGCGGTAACAATGATATCTCAAAGTGGACTAAA
    GGGAGGCGGGATGAAGCCCTCTTTTTACTAAAGAAGAGATTCTCACATAGACCTGTGCAA
    GAACAATTCGTTTGTTTAGATTGTGGCCATGAGGTTCATGCAGACGAACAGGCTGCGTTA
    AATATTGCGAGAAGCTGGCTATTTCTAAATTCTAATTCAACAGAGTTCAAGAGCTATAAA
    TCCGGAAAACAACCTTTCGTAGGCGCGTGGCAAGCCTTCTATAAAAGGAGATTAAAAGAG
    GTTTGGAAACCAAATGCA
    SEQ ID NO: 146
    ATGAAAAGAATTAACAAAATTAGAAGGAGGCTGGTCAAAGATTCTAATACCAAGAAAGCT
    GGTAAGACTGGTCCGATGAAAACCCTATTAGTCAGAGTTATGACCCCAGATTTGAGAGAA
    AGATTGGAGAACCTCAGGAAAAAGCCCGAAAACATCCCACAACCCATTAGTAACACATCA
    AGAGCTAATTTAAACAAGTTATTAACTGACTACACTGAAATGAAAAAAGCAATATTGCAT
    GTTTACTGGGAAGAGTTCCAGAAAGATCCTGTTGGGTTGATGTCTAGAGTTGCTCAACCG
    GCCCCAAAGAATATAGATCAAAGGAAACTTATTCCTGTGAAGGACGGCAATGAAAGATTA
    ACCAGCTCCGGTTTCGCTTGCTCCCAGTGCTGCCAACCCCTGTATGTATACAAACTGGAA
    CAAGTAAATGATAAAGGTAAGCCACATACTAACTACTTTGGTAGGTGTAATGTATCCGAG
    CATGAAAGATTGATCTTGTTAAGTCCCCATAAACCAGAAGCTAATGATGAGTTAGTAACT
    TATAGTTTAGGTAAGTTCGGACAACGAGCTTTAGATTTCTATAGCATCCATGTTACAAGA
    GAAAGCAATCACCCCGTCAAACCACTGGAACAAATCGGTGGTAATAGTTGTGCGTCAGGT
    CCAGTAGGCAAAGCTTTATCAGACGCTTGCATGGGTGCCGTGGCTAGTTTTTTGACGAAA
    TACCAAGATATTATACTGGAACATCAAAAGGTAATTAAAAAGAATGAAAAGAGACTCGCT
    AACTTAAAAGATATTGCAAGTGCCAATGGTTTAGCTTTTCCTAAAATTACCTTGCCACCT
    CAGCCACATACAAAGGAGGGAATTGAAGCTTACAATAATGTAGTAGCCCAAATAGTTATT
    TGGGTGAACCTTAACCTATGGCAAAAGTTAAAAATTGGTAGAGACGAAGCCAAACCCCTG
    CAGAGGCTGAAGGGTTTTCCCTCCTTCCCCTTAGTAGAGAGACAAGCTAATGAAGTGGAC
    TGGTGGGATATGGTGTGCAATGTTAAAAAATTGATTAATGAGAAGAAAGAGGATGGTAAA
    GTGTTTTGGCAGAATCTTGCTGGCTACAAGAGACAGGAAGCTTTACTGCCTTATTTATCT
    TCTGAGGAAGATAGGAAAAAAGGTAAAAAATTTGCTAGATATCAATTCGGAGACCTACTT
    CTGCATTTAGAAAAAAAACATGGCGAAGATTGGGGTAAAGTTTATGATGAAGCCTGGGAA
    AGAATTGATAAGAAGGTAGAAGGTCTCTCCAAACATATTAAATTAGAGGAAGAACGTAGG
    TCCGAAGACGCTCAATCAAAGGCAGCATTAACTGATTGGTTGAGAGCAAAAGCCTCTTTC
    GTTATTGAAGGATTAAAAGAAGCCGACAAAGATGAATTTTGTAGATGTGAGTTAAAGTTG
    CAAAAGTGGTATGGAGACCTCCGTGGTAAACCTTTTGCTATTGAGGCTGAAAATTCTATA
    CTCGATATCTCTGGATTTTCAAAACAATATAACTGCGCATTTATATGGCAGAAAGATGGT
    GTTAAAAAGCTAAATCTATACTTAATTATCAATTACTTTAAAGGTGGTAAATTGCGTTTT
    AAGAAGATAAAGCCTGAAGCCTTTGAGGCAAACCGTTTTTACACTGTTATCAATAAAAAA
    TCTGGGGAAATCGTACCAATGGAAGTTAATTTCAATTTCGATGATCCTAATCTTATTATT
    TTACCTCTTGCTTTCGGCAAAAGGCAAGGTAGGGAGTTTATTTGGAATGATTTATTGTCG
    CTGGAAACGGGGTCTCTCAAACTCGCAAACGGTAGGGTGATAGAAAAAACATTATACAAC
    AGGAGAACTCGGCAGGATGAGCCAGCTCTTTTTGTGGCTCTGACATTCGAGAGAAGGGAA
    GTTTTAGATTCATCTAACATCAAACCAATGAATTTAATAGGTATTGACCGGGGTGAAAAT
    ATACCTGCAGTTATTGCTTTAACTGATCCTGAGGGATGTCCTCTTAGCAGATTCAAGGAC
    TCGTTGGGTAACCCTACTCACATCTTAAGGATTGGAGAAAGTTACAAGGAGAAACAAAGG
    ACAATACAAGCTGCTAAAGAAGTAGAACAAAGGAGGGCGGGTGGATATAGTCGGAAATAT
    GCCAGCAAGGCCAAGAATTTAGCTGACGACATGGTTAGGAATACAGCTAGAGACCTTTTA
    TACTATGCCGTCACCCAGGATGCCATGTTGATATTTGAAAATTTAAGTAGAGGCTTCGGT
    AGACAAGGTAAGCGCACCTTCATGGCAGAGAGACAATATACTAGAATGGAAGATTGGTTG
    ACTGCCAAATTGGCATACGAAGGTCTACCTAGTAAGACGTACTTATCTAAAACACTAGCG
    CAGTATACTTCCAAGACATGCAGTAATTGTGGTTTCACAATCACTTCTGCCGATTACGAT
    CGCGTCTTGGAAAAACTAAAAAAAACAGCGACAGGTTGGATGACTACTATTAATGGGAAA
    GAATTGAAGGTCGAAGGACAAATAACTTACTATAATAGATATAAACGGCAAAACGTTGTA
    AAAGACCTGTCAGTCGAACTCGATCGACTTAGTGAAGAATCTGTTAATAATGATATTAGT
    TCGTGGACAAAAGGTAGATCCGGTGAAGCTTTGAGCCTCCTGAAAAAACGTTTTAGCCAT
    AGGCCTGTCCAAGAAAAGTTTGTATGTTTAAACTGTGGTTTTGAGACCCATGCAGACGAG
    CAGGCCGCTCTTAATATTGCTAGATCATGGTTATTTTTAAGATCTCAGGAATACAAGAAG
    TACCAGACTAACAAGACAACAGGCAACACAGATAAGCGAGCATTCGTTGAGACTTGGCAA
    TCTTTTTATAGAAAGAAATTGAAGGAAGTCTGGAAACCA
    SEQ ID NO: 147
    ATGGGAAAAATGTATTATCTAGGCCTGGACATAGGGACCAATTCAGTAGGCTACGCTGTC
    ACTGACCCCTCCTACCATTTGCTGAAGTTCAAGGGGGAACCCATGTGGGGAGCACACGTG
    TTTGCGGCCGGCAACCAGAGCGCAGAGCGGAGAAGCTTCCGCACCTCCAGGAGAAGGCTG
    GATCGCAGGCAGCAGCGTGTGAAGCTGGTCCAAGAGATATTTGCCCCAGTGATTTCCCCC
    ATCGATCCGCGCTTCTTTATTAGGCTCCACGAGTCCGCTCTCTGGCGCGACGACGTGGCC
    GAAACTGATAAACATATTTTCTTTAATGACCCAACATACACTGACAAGGAGTACTATTCA
    GATTACCCAACAATTCACCATTTGATCGTGGACCTTATGGAAAGTTCGGAGAAGCATGAT
    CCTCGACTTGTCTATTTGGCCGTGGCGTGGCTCGTGGCACATAGGGGCCACTTCTTGAAC
    GAGGTGGACAAGGATAACATCGGGGATGTGTTATCTTTCGACGCTTTCTATCCTGAATTC
    CTTGCTTTTCTGTCTGACAATGGCGTCAGCCCGTGGGTCTGCGAATCCAAGGCCCTCCAG
    GCTACGCTATTGTCAAGAAATAGCGTGAACGACAAGTACAAGGCTCTTAAGTCTTTGATT
    TTTGGAAGCCAGAAGCCCGAGGACAACTTTGATGCAAATATCTCGGAGGACGGGCTGATT
    CAGCTCCTCGCTGGGAAAAAGGTCAAGGTCAATAAGCTGTTTCCACAGGAGTCAAATGAC
    GCGAGCTTCACCCTTAACGACAAAGAGGATGCCATTGAAGAGATCCTGGGGACACTCACC
    CCAGACGAGTGCGAGTGGATAGCCCATATTAGGCGCCTCTTTGATTGGGCCATAATGAAA
    CATGCGCTTAAGGACGGGCGCACGATATCCGAAAGCAAGGTCAAATTGTACGAGCAGCAC
    CACCATGATCTGACCCAGCTAAAATATTTTGTAAAAACATATCTGGCCAAGGAGTACGAT
    GATATCTTCCGCAACGTGGATAGTGAGACCACCAAAAACTACGTCGCGTACTCATACCAC
    GTGAAAGAAGTTAAGGGCACGCTGCCTAAGAACAAGGCAACACAAGAGGAGTTCTGCAAG
    TACGTTCTCGGGAAAGTTAAAAATATAGAGTGCAGCGAGGCCGACAAAGTGGATTTTGAC
    GAGATGATTCAACGCCTGACCGACAATTCGTTTATGCCTAAACAGGTGAGTGGAGAGAAT
    CGCGTGATTCCATATCAGCTCTATTACTATGAACTCAAGACTATTCTGAATAAGGCCGCT
    AGCTATTTACCCTTCCTTACGCAGTGCGGGAAGGATGCCATTTCTAACCAGGATAAACTC
    TTGAGTATAATGACATTTCGAATTCCCTATTTCGTGGGTCCGCTTCGTAAGGATAACAGT
    GAGCACGCTTGGCTGGAGCGGAAGGCTGGCAAAATTTATCCATGGAATTTCAACGACAAG
    GTGGATCTGGACAAATCCGAAGAAGCCTTTATCCGCAGGATGACCAATACTTGCACATAC
    TATCCTGGGGAGGATGTCCTTCCACTGGACTCTCTGATCTACGAAAAGTTCATGATTTTG
    AATGAAATTAACAACATAAGGATCGATGGGTATCCTATTTCCGTCGACGTGAAGCAGCAG
    GTGTTCGGGCTCTTTGAGAAGAAGCGACGGGTGACCGTGAAGGATATTCAGAATCTTCTC
    TTATCGCTGGGAGCCCTGGATAAACACGGAAAACTGACCGGGATAGATACTACGATTCAT
    TCTAATTACAACACGTATCACCATTTTAAGTCACTGATGGAGAGGGGCGTCCTAACAAGA
    GATGACGTGGAGAGAATAGTGGAACGAATGACATATTCTGATGACACCAAGAGAGTGCGG
    CTTTGGCTGAATAACAACTACGGCACTCTGACGGCGGATGATGTAAAGCATATTTCCCGA
    CTCCGTAAGCATGACTTCGGGCGGCTGTCTAAGATGTTTCTAACAGGCCTCAAGGGTGTG
    CATAAGGAAACTGGGGAGCGCGCTAGCATCCTGGATTTTATGTGGAACACCAATGATAAC
    CTGATGCAGCTCCTGTCAGAATGCTACACATTTTCGGACGAAATCACCAAGCTGCAGGAG
    GCTTACTATGCCAAGGCCCAACTAAGCTTGAATGATTTCCTGGATTCTATGTACATCAGC
    AACGCCGTAAAACGACCAATTTATAGGACACTGGCAGTGGTTAACGACATTAGGAAAGCA
    TGCGGAACAGCTCCCAAGCGAATCTTTATCGAGATGGCCCGCGACGGCGAGAGTAAGAAG
    AAAAGGTCAGTGACTAGGCGGGAGCAGATCAAGAACCTTTACCGCTCTATCCGAAAAGAC
    TTCCAGCAAGAGGTTGATTTCCTTGAGAAGATCTTAGAGAACAAGTCAGATGGACAGCTC
    CAATCCGATGCTCTGTATCTGTACTTCGCTCAGCTGGGACGAGATATGTACACTGGCGAC
    CCCATTAAACTAGAACATATCAAGGACCAATCGTTTTATAATATCGACCACATCTACCCT
    CAGTCCATGGTGAAAGACGATAGTCTGGACAATAAGGTGCTCGTCCAAAGTGAGATTAAC
    GGAGAAAAGTCGAGCAGATATCCTTTGGACGCTGCGATCCGCAACAAGATGAAGCCCCTG
    TGGGATGCTTACTACAATCATGGACTGATCAGCCTGAAGAAGTATCAGAGACTGACCCGG
    AGTACCCCTTTCACAGACGATGAGAAGTGGGATTTTATCAATAGACAACTGGTGGAAACC
    AGGCAGTCCACGAAAGCTCTGGCCATTCTTCTGAAGAGAAAGTTTCCAGACACAGAGATC
    GTCTATTCAAAGGCCGGCCTCAGTTCCGACTTTAGACATGAGTTCGGACTCGTTAAATCA
    CGAAATATAAACGATCTCCACCATGCAAAGGACGCATTCCTCGCGATTGTGACTGGAAAT
    GTCTATCACGAAAGATTTAATAGGCGGTGGTTCATGGTTAACCAGCCATACTCAGTGAAG
    ACCAAGACCCTTTTCACTCACTCTATTAAAAATGGCAACTTCGTGGCTTGGAATGGTGAG
    GAGGATCTTGGAAGAATTGTGAAGATGTTAAAACAGAATAAGAATACCATCCACTTTACT
    AGATTCAGCTTTGACCGAAAAGAGGGGCTATTCGATATTCAACCGTTAAAGGCTTCAACA
    GGTCTCGTTCCACGAAAGGCCGGACTGGACGTAGTGAAATACGGCGGCTATGATAAGAGC
    ACCGCAGCTTACTACCTCCTTGTGCGATTTACGCTCGAGGATAAGAAGACCCAACACAAG
    CTGATGATGATTCCCGTGGAGGGACTGTACAAAGCTCGAATTGACCATGATAAAGAGTTT
    CTCACAGATTACGCACAAACCACCATCTCTGAGATTCTCCAGAAAGACAAACAAAAAGTT
    ATAAACATAATGTTTCCAATGGGTACAAGGCATATTAAACTGAACAGCATGATCTCCATT
    GATGGCTTTTATTTGTCCATTGGAGGAAAGTCTAGTAAAGGCAAGTCTGTCCTCTGCCAT
    GCCATGGTACCCCTAATCGTCCCACACAAGATTGAATGCTACATCAAGGCTATGGAGAGT
    TTTGCTCGGAAATTTAAAGAGAATAATAAGCTGCGTATTGTGGAAAAATTCGACAAGATA
    ACCGTTGAAGACAATCTGAATCTGTACGAGCTCTTTCTGCAGAAGCTGCAGCATAACCCC
    TATAATAAGTTCTTCTCCACACAGTTCGATGTACTGACCAACGGGCGATCAACTTTCACA
    AAGCTAAGTCCTGAGGAACAGGTGCAAACACTCCTAAACATTCTTTCCATTTTTAAGACC
    TGCAGATCTTCAGGATGCGACTTGAAGAGCATTAACGGGAGCGCACAGGCAGCTAGGATC
    ATGATCTCAGCTGACCTGACAGGGCTGAGTAAAAAATACTCCGACATTCGGCTTGTAGAG
    CAAAGCGCCAGTGGGTTGTTCGTTAGTAAGTCGCAGAACCTGCTGGAATACCTGTAA
    SEQ ID NO: 148
    ATGTCTTCTTTGACGAAGTTTACAAACAAATACTCTAAGCAGCTTACAATTAAGAACGAA
    CTGATTCCCGTAGGAAAGACTCTGGAAAACATCAAAGAGAATGGGCTGATAGACGGCGAC
    GAACAACTGAATGAGAACTATCAGAAGGCCAAAATTATCGTGGATGACTTCCTGAGGGAT
    TTTATTAACAAGGCCCTGAATAATACCCAGATCGGCAATTGGCGGGAACTGGCCGACGCT
    CTGAACAAAGAAGATGAGGACAATATCGAAAAATTACAAGACAAAATCAGGGGCATTATT
    GTCAGTAAGTTCGAGACATTCGATCTGTTCTCTTCGTACTCCATTAAGAAGGACGAGAAA
    ATCATCGATGATGACAATGACGTTGAGGAAGAAGAACTGGACTTGGGTAAAAAGACCTCA
    TCCTTCAAGTATATTTTTAAAAAAAATCTGTTTAAATTAGTGCTCCCCAGTTATTTAAAG
    ACAACTAACCAGGACAAGCTTAAGATTATCTCCTCTTTTGACAACTTTAGCACCTATTTT
    AGAGGCTTCTTTGAAAATCGCAAGAATATTTTCACTAAGAAGCCCATAAGCACCTCTATT
    GCCTACAGAATCGTACATGATAACTTCCCAAAATTTTTGGATAACATTAGATGTTTTAAT
    GTATGGCAGACCGAATGTCCTCAGTTAATTGTGAAGGCGGATAACTACCTCAAATCCAAG
    AATGTGATCGCCAAAGATAAGTCTCTTGCTAACTACTTTACGGTCGGAGCCTACGATTAC
    TTCTTATCTCAAAACGGTATTGACTTTTACAATAACATTATCGGGGGATTGCCTGCCTTC
    GCCGGCCATGAGAAAATTCAGGGCTTAAACGAGTTCATAAATCAGGAATGTCAAAAGGAC
    TCAGAGCTGAAATCAAAGCTTAAGAATCGACACGCATTTAAAATGGCGGTCTTGTTCAAA
    CAGATCCTCAGCGATAGAGAGAAAAGCTTCGTTATTGATGAATTCGAGAGCGACGCACAG
    GTGATTGATGCCGTGAAGAACTTCTATGCGGAACAGTGTAAAGACAATAATGTTATTTTC
    AACCTATTAAACTTGATTAAGAATATCGCGTTTTTAAGTGACGATGAACTCGACGGTATC
    TTTATAGAAGGCAAGTACCTGTCCTCTGTCAGCCAAAAACTCTACTCAGATTGGTCCAAG
    CTAAGAAATGACATCGAGGACAGTGCTAACAGCAAACAGGGCAATAAAGAGCTGGCAAAG
    AAAATCAAGACTAATAAAGGGGATGTGGAGAAGGCGATATCTAAATATGAGTTCTCCCTC
    TCCGAACTGAACTCCATCGTCCACGATAATACCAAGTTTAGTGATCTGTTGTCGTGTACA
    CTGCACAAAGTGGCCAGTGAAAAACTCGTCAAGGTGAACGAAGGCGATTGGCCCAAACAC
    CTGAAAAATAATGAGGAGAAACAGAAGATCAAAGAACCTTTGGATGCGTTGCTCGAAATA
    TATAACACACTGTTGATCTTCAACTGTAAAAGCTTCAACAAGAACGGGAACTTTTATGTA
    GACTACGATCGATGTATAAATGAACTGAGCAGCGTCGTTTACCTGTACAACAAGACTCGC
    AATTATTGTACGAAAAAACCATATAACACCGATAAGTTCAAGCTTAATTTCAACAGTCCC
    CAGCTGGGAGAAGGGTTCAGCAAATCAAAAGAAAACGATTGCCTGACATTACTCTTTAAA
    AAGGATGATAATTATTATGTTGGGATTATTAGGAAAGGCGCTAAGATCAACTTTGACGAC
    ACACAGGCCATAGCTGACAACACTGATAACTGCATCTTTAAAATGAATTACTTTCTGTTG
    AAGGACGCCAAAAAATTCATTCCAAAATGCTCTATTCAGCTCAAGGAGGTTAAGGCCCAT
    TTCAAGAAGTCTGAAGATGACTACATCCTCTCTGACAAGGAAAAATTCGCTAGTCCTCTG
    GTTATCAAAAAAAGTACCTTCTTGCTGGCTACAGCTCACGTGAAAGGCAAGAAAGGGAAC
    ATTAAGAAGTTCCAAAAGGAATACAGCAAAGAGAATCCAACCGAGTACAGAAATTCTCTG
    AACGAATGGATCGCATTCTGTAAAGAATTTCTAAAGACGTACAAGGCCGCTACCATTTTC
    GATATTACCACCTTGAAAAAAGCCGAGGAGTACGCCGACATCGTCGAATTCTATAAAGAC
    GTGGATAACCTGTGTTACAAATTGGAATTCTGCCCAATTAAGACCTCTTTCATTGAAAAC
    CTCATCGACAATGGGGACCTCTACTTATTTAGAATTAACAATAAGGATTTTTCTTCGAAA
    TCTACCGGAACTAAAAATCTGCACACACTGTATCTGCAAGCAATCTTCGATGAACGTAAT
    CTCAACAACCCTACAATAATGCTGAACGGCGGTGCTGAACTGTTCTACCGTAAAGAGAGT
    ATTGAACAGAAGAATCGAATCACACACAAAGCGGGCAGTATTCTCGTCAATAAGGTGTGC
    AAAGACGGGACCAGCCTGGACGATAAGATCAGGAATGAAATATATCAGTATGAGAACAAG
    TTTATCGACACCTTGTCGGATGAGGCAAAGAAGGTGCTACCTAACGTTATCAAGAAGGAA
    GCTACCCATGACATAACCAAGGATAAGCGGTTCACTTCTGACAAGTTCTTCTTCCACTGT
    CCTCTGACCATTAACTACAAGGAAGGAGACACTAAACAATTCAATAATGAAGTACTTAGC
    TTTTTGCGGGGTAATCCCGATATTAACATAATTGGTATCGACCGGGGAGAACGGAACCTG
    ATATACGTGACAGTAATTAATCAGAAAGGAGAAATCCTGGATTCCGTATCCTTCAATACC
    GTGACTAATAAATCTAGTAAAATCGAGCAGACGGTCGACTACGAGGAAAAGTTAGCAGTC
    AGAGAGAAGGAGAGAATCGAGGCCAAACGTTCCTGGGATAGTATCAGCAAGATTGCTACT
    CTGAAAGAAGGATATCTGTCCGCTATCGTCCATGAGATCTGTTTGTTGATGATCAAGCAC
    AATGCTATAGTGGTTCTGGAGAACCTGAACGCAGGCTTCAAGCGAATTAGAGGGGGCCTG
    TCGGAAAAAAGCGTTTACCAGAAGTTTGAAAAGATGCTAATCAATAAGTTAAATTACTTT
    GTAAGTAAAAAAGAAAGCGATTGGAATAAGCCATCAGGACTTTTAAACGGGCTGCAACTG
    AGCGACCAGTTTGAGTCATTCGAAAAACTGGGTATTCAGAGTGGTTTCATATTCTACGTA
    CCTGCCGCTTACACTTCAAAGATCGATCCTACAACTGGTTTTGCGAATGTCCTGAATCTG
    TCTAAGGTGAGGAATGTGGACGCAATCAAGTCTTTCTTCAGCAACTTCAACGAGATATCT
    TACAGCAAGAAAGAGGCTCTGTTTAAATTCAGTTTTGATCTGGATAGCCTGAGCAAGAAA
    GGATTCTCTTCTTTCGTAAAGTTTTCTAAGTCCAAATGGAACGTCTACACGTTCGGAGAG
    AGAATCATTAAACCAAAGAACAAGCAGGGGTATCGGGAAGACAAAAGGATCAATCTGACT
    TTCGAAATGAAGAAACTATTGAATGAGTACAAAGTCTCATTCGATTTGGAGAACAATCTG
    ATCCCCAATCTGACCAGCGCTAACCTCAAAGACACATTCTGGAAGGAGCTGTTTTTCATC
    TTTAAGACCACCCTGCAGCTACGGAATAGTGTCACAAATGGGAAAGAGGATGTACTGATC
    TCACCTGTGAAAAACGCCAAGGGGGAGTTCTTTGTGTCCGGCACCCATAACAAAACCCTG
    CCTCAGGACTGTGACGCGAACGGGGCCTACCACATCGCGCTAAAGGGGTTAATGATTCTC
    GAACGTAATAATCTGGTGCGCGAAGAAAAAGACACAAAGAAAATTATGGCCATCAGCAAC
    GTTGACTGGTTTGAGTACGTGCAGAAGCGTCGAGGAGTTTTGTAA
    SEQ ID NO: 149
    ATGAACAACTATGACGAGTTCACTAAACTTTACCCCATTCAGAAAACCATCAGATTTGAA
    CTGAAGCCTCAGGGTCGTACCATGGAACACTTGGAAACTTTCAACTTTTTCGAGGAGGAC
    AGGGATAGAGCTGAGAAATACAAGATCTTGAAAGAGGCCATCGACGAGTATCACAAAAAA
    TTCATCGATGAGCATCTCACCAACATGTCGCTGGATTGGAACAGTCTCAAGCAGATTTCC
    GAGAAGTACTATAAATCTCGGGAGGAGAAAGATAAAAAGGTGTTTTTGAGCGAGCAAAAG
    CGAATGCGACAGGAGATAGTCTCTGAATTTAAGAAAGATGATCGGTTTAAAGACCTATTT
    TCCAAAAAGCTTTTTTCAGAGCTGCTGAAGGAAGAGATCTATAAAAAAGGCAATCACCAA
    GAAATTGATGCCCTGAAATCATTCGACAAATTCAGTGGGTATTTCATAGGACTGCATGAG
    AACCGGAAGAATATGTATAGTGATGGAGACGAGATCACAGCCATAAGCAATCGAATCGTT
    AACGAGAATTTCCCGAAGTTCCTGGATAACCTGCAGAAGTATCAAGAGGCTAGGAAAAAG
    TACCCTGAGTGGATCATCAAGGCTGAATCAGCTCTGGTGGCTCACAATATCAAGATGGAT
    GAAGTCTTTAGTCTTGAGTACTTTAATAAAGTCCTTAACCAGGAGGGCATCCAGCGCTAT
    AACCTGGCTCTCGGTGGCTACGTCACAAAAAGCGGAGAAAAGATGATGGGTCTCAACGAT
    GCACTGAATTTGGCTCATCAGTCGGAGAAGTCATCTAAGGGACGCATACACATGACACCA
    CTGTTTAAACAAATCCTGAGCGAAAAGGAATCATTTTCCTACATTCCCGACGTATTCACC
    GAGGACTCACAACTGCTGCCTAGTATAGGGGGGTTTTTCGCTCAGATAGAGAACGACAAA
    GATGGCAACATTTTTGACAGAGCCTTGGAGTTGATTTCATCTTACGCCGAGTACGATACG
    GAGCGCATTTATATTCGCCAGGCGGATATCAACAGGGTTTCCAATGTGATCTTTGGCGAG
    TGGGGAACGCTGGGCGGGCTGATGCGGGAATACAAAGCCGACTCGATCAATGACATCAAC
    CTGGAGAGAACATGCAAGAAGGTCGATAAATGGTTGGATAGCAAAGAGTTCGCCCTGAGT
    GACGTCTTGGAAGCTATCAAAAGAACCGGAAATAATGACGCGTTCAACGAGTATATCTCT
    AAAATGAGGACCGCGAGAGAAAAAATTGATGCAGCAAGGAAGGAGATGAAGTTTATATCT
    GAGAAGATCTCAGGCGATGAAGAGTCCATCCATATTATTAAAACTCTTCTGGACTCAGTG
    CAGCAATTCCTGCACTTTTTTAACCTCTTCAAGGCCAGGCAGGATATACCGTTAGACGGG
    GCTTTTTATGCCGAGTTTGATGAAGTTCATTCGAAACTTTTTGCTATAGTGCCTCTCTAT
    AATAAAGTTCGCAATTACCTGACAAAGAATAACTTAAACACAAAGAAAATCAAGCTCAAC
    TTCAAAAACCCAACACTGGCAAACGGATGGGATCAGAACAAGGTATATGATTACGCCTCA
    TTGATTTTCCTCCGGGACGGGAATTACTATCTGGGGATCATCAACCCTAAGCGCAAAAAG
    AACATTAAGTTCGAACAGGGATCTGGCAATGGTCCCTTCTATAGGAAAATGGTATACAAA
    CAGATTCCTGGCCCCAACAAGAATCTCCCACGCGTCTTTCTGACGTCCACTAAGGGAAAG
    AAGGAGTACAAGCCGTCTAAAGAAATTATCGAGGGCTATGAGGCAGACAAGCATATTAGG
    GGTGACAAGTTTGACCTAGACTTTTGTCATAAGCTTATCGACTTTTTCAAGGAGTCCATA
    GAGAAGCACAAAGATTGGTCAAAGTTTAATTTCTATTTTTCTCCAACAGAGTCCTACGGG
    GATATCTCTGAGTTCTATCTGGATGTTGAAAAGCAGGGGTACAGAATGCACTTCGAAAAT
    ATCTCAGCAGAAACTATCGATGAGTACGTAGAGAAAGGAGATCTGTTTCTTTTCCAAATC
    TACAATAAGGATTTTGTGAAGGCCGCCACTGGGAAGAAGGACATGCACACTATTTACTGG
    AACGCTGCATTTTCCCCTGAAAATCTGCAGGACGTAGTAGTGAAATTAAATGGTGAGGCA
    GAACTGTTTTACCGCGATAAATCAGACATCAAGGAAATAGTGCACCGGGAAGGCGAGATT
    CTTGTTAACCGAACATATAATGGCAGGACACCTGTCCCTGATAAAATTCATAAGAAACTG
    ACCGATTACCACAACGGTCGAACCAAGGATCTGGGCGAGGCCAAGGAATACCTCGATAAG
    GTGAGGTACTTCAAAGCCCATTATGACATCACCAAGGACCGAAGATACCTTAACGACAAA
    ATCTACTTCCATGTCCCACTCACCTTGAACTTCAAAGCTAACGGTAAGAAGAACCTCAAT
    AAAATGGTGATTGAAAAATTTCTGTCCGATGAGAAGGCCCATATCATCGGCATTGATCGC
    GGCGAGAGAAATCTCCTTTACTATTCTATCATTGATCGGTCGGGAAAGATTATCGACCAA
    CAATCACTGAATGTCATCGACGGATTCGACTATAGAGAGAAGCTGAACCAACGGGAAATC
    GAGATGAAGGACGCGCGCCAGTCCTGGAACGCTATCGGCAAAATTAAAGATTTGAAAGAA
    GGTTACCTCTCCAAAGCAGTGCACGAAATTACCAAAATGGCAATCCAGTACAATGCTATT
    GTGGTAATGGAGGAGTTAAATTACGGATTTAAGCGCGGGAGGTTCAAGGTTGAAAAGCAA
    ATTTACCAAAAATTTGAGAACATGTTGATTGATAAGATGAACTACCTGGTGTTCAAGGAC
    GCACCTGACGAGTCGCCAGGCGGCGTGTTAAATGCATATCAGCTGACAAATCCACTGGAG
    AGCTTTGCCAAGCTAGGAAAGCAGACTGGCATTCTCTTTTACGTCCCTGCAGCGTATACA
    TCCAAAATTGACCCCACCACTGGCTTCGTCAATCTGTTTAACACCTCCTCCAAAACCAAC
    GCACAAGAACGGAAAGAATTTTTGCAAAAGTTTGAGTCCATTAGCTACTCTGCCAAAGAC
    GGCGGGATCTTTGCTTTCGCATTCGACTACAGGAAATTCGGGACGAGTAAGACAGACCAC
    AAGAACGTCTGGACCGCGTACACTAATGGGGAACGCATGCGCTACATCAAAGAGAAAAAG
    AGGAATGAACTTTTTGACCCTTCAAAGGAAATCAAGGAAGCTCTCACCTCAAGCGGTATC
    AAATACGATGGCGGGCAGAATATTTTGCCAGATATCCTCAGATCGAACAATAATGGACTT
    ATCTATACTATGTACTCCTCCTTCATTGCAGCAATTCAAATGAGAGTGTACGATGGAAAG
    GAGGATTACATTATATCGCCAATTAAGAACTCCAAAGGCGAATTCTTCCGCACGGATCCT
    AAGCGAAGAGAACTCCCAATCGACGCTGATGCGAACGGCGCCTATAATATAGCCCTGCGG
    GGTGAATTAACAATGCGCGCTATTGCCGAGAAGTTCGACCCCGATTCAGAAAAAATGGCT
    AAGCTTGAGCTGAAACACAAAGATTGGTTCGAATTCATGCAGACAAGAGGCGACTAA
    SEQ ID NO: 150
    ATGACTAAGACCTTCGATTCCGAGTTCTTCAACCTTTATTCCCTGCAGAAAACTGTAAGG
    TTTGAGCTGAAGCCGGTGGGCGAGACAGCCAGCTTCGTAGAGGATTTCAAGAATGAGGGT
    CTCAAACGGGTAGTTAGTGAGGATGAGAGGAGAGCAGTGGACTATCAGAAGGTGAAAGAG
    ATCATCGATGACTATCACCGGGATTTCATAGAGGAGTCGTTGAATTACTTCCCTGAGCAA
    GTATCCAAAGACGCGCTGGAACAGGCCTTTCATCTTTACCAGAAACTGAAGGCAGCGAAG
    GTTGAGGAGCGGGAAAAGGCCTTGAAAGAGTGGGAAGCCCTGCAGAAAAAGCTCAGAGAA
    AAGGTTGTCAAATGCTTCAGCGACAGCAACAAAGCCAGGTTCAGTAGGATCGATAAGAAA
    GAACTGATCAAAGAAGACTTGATCAATTGGCTGGTTGCACAGAACCGGGAAGATGATATT
    CCCACCGTAGAGACCTTCAACAACTTCACAACTTACTTCACCGGCTTCCATGAGAATCGT
    AAAAACATCTACAGTAAAGATGATCATGCAACCGCCATCTCCTTCCGGTTGATCCACGAG
    AATCTCCCCAAGTTCTTTGACAACGTGATAAGTTTCAATAAGTTGAAAGAGGGATTTCCC
    GAACTCAAGTTCGATAAAGTGAAGGAGGATCTGGAAGTGGATTATGACCTTAAGCACGCT
    TTCGAGATAGAGTACTTCGTGAACTTTGTGACTCAGGCCGGCATCGATCAGTATAACTAC
    CTCCTCGGGGGTAAGACGCTCGAGGACGGTACTAAGAAGCAAGGAATGAATGAGCAAATT
    AATCTATTTAAACAGCAGCAGACCAGGGATAAGGCTAGACAGATCCCCAAGCTTATTCCT
    CTTTTTAAACAGATCCTAAGTGAAAGGACAGAAAGTCAAAGCTTCATACCTAAGCAATTT
    GAAAGTGATCAGGAGCTGTTTGACTCCCTGCAAAAGCTGCACAACAATTGCCAGGACAAG
    TTTACCGTGCTGCAGCAGGCTATCCTCGGACTGGCTGAGGCGGATCTTAAGAAGGTATTC
    ATTAAGACTAGCGACCTCAATGCCCTTAGTAACACCATCTTTGGAAATTACTCCGTTTTC
    AGCGATGCCCTCAATCTATACAAAGAGAGCTTGAAGACTAAAAAAGCTCAGGAAGCTTTT
    GAAAAATTACCGGCACATTCTATACACGACCTTATACAATACTTAGAGCAGTTCAACAGC
    AGCCTCGACGCTGAGAAACAGCAATCCACAGACACCGTCCTGAATTACTTCATCAAAACC
    GATGAACTGTACTCCCGATTTATCAAGAGCACTTCAGAAGCCTTCACGCAAGTTCAGCCT
    CTGTTCGAGCTGGAGGCACTGTCCAGCAAGAGACGACCGCCAGAGTCTGAAGACGAGGGA
    GCCAAGGGTCAAGAGGGGTTTGAACAGATAAAGCGAATTAAGGCTTACTTGGATACTCTC
    ATGGAGGCGGTGCATTTCGCTAAGCCTTTGTACCTGGTTAAAGGCCGAAAAATGATTGAG
    GGGCTAGATAAGGATCAGTCTTTTTACGAGGCTTTTGAAATGGCCTACCAGGAATTGGAA
    TCCTTGATCATTCCAATCTATAATAAAGCCCGGAGTTATCTGAGCAGGAAGCCCTTCAAA
    GCCGACAAGTTCAAAATAAATTTTGACAATAATACGCTACTGTCTGGTTGGGACGCTAAC
    AAGGAAACAGCCAATGCTTCCATCCTGTTTAAGAAAGACGGCCTGTACTACCTGGGAATT
    ATGCCAAAAGGCAAAACTTTTTTGTTCGATTACTTTGTGTCATCAGAGGATAGCGAGAAG
    TTAAAGCAAAGACGGCAGAAGACCGCCGAAGAAGCCCTCGCACAAGACGGAGAATCATAT
    TTCGAGAAAATTCGATATAAGCTCCTGCCTGGCGCATCAAAGATGTTGCCAAAAGTCTTC
    TTTTCCAACAAAAACATCGGCTTTTATAACCCCAGCGATGATATCCTTCGCATCCGGAAC
    ACCGCCTCACATACCAAAAATGGAACTCCACAGAAGGGCCACTCGAAGGTTGAATTCAAC
    CTTAACGATTGTCACAAAATGATTGATTTTTTTAAGAGCTCCATTCAGAAACACCCCGAA
    TGGGGGTCCTTTGGCTTCACCTTTTCTGATACTTCAGACTTCGAGGACATGTCCGCCTTC
    TACAGGGAGGTGGAGAACCAGGGCTATGTCATCTCCTTCGACAAAATAAAAGAGACATAC
    ATTCAGAGCCAGGTCGAGCAGGGAAATCTGTACCTGTTTCAGATCTATAACAAGGATTTC
    AGTCCCTATAGCAAGGGCAAGCCCAATTTACATACCCTGTACTGGAAGGCCCTGTTCGAA
    GAGGCAAACCTTAACAATGTAGTTGCTAAGCTGAATGGGGAAGCAGAGATCTTCTTCCGA
    AGGCACAGCATCAAGGCAAGCGACAAAGTTGTACATCCTGCTAACCAGGCCATCGATAAC
    AAGAACCCGCATACAGAAAAGACACAGTCAACCTTTGAATACGACCTCGTGAAGGACAAG
    AGGTACACACAAGATAAATTCTTCTTCCACGTGCCCATCAGCTTGAATTTTAAAGCGCAG
    GGAGTGAGCAAATTTAACGACAAGGTCAACGGCTTCCTGAAGGGAAACCCCGACGTGAAT
    ATCATCGGAATTGATCGCGGTGAAAGACATCTCCTCTACTTTACTGTGGTGAACCAGAAG
    GGTGAGATCCTAGTACAGGAGAGCCTGAACACCCTTATGAGTGATAAGGGCCATGTGAAT
    GATTACCAGCAGAAGCTGGACAAGAAGGAACAGGAAAGGGACGCAGCGCGGAAGTCCTGG
    ACCACTGTTGAGAATATCAAAGAACTGAAGGAGGGATATCTTAGCCATGTGGTACACAAA
    CTTGCACATCTGATTATCAAGTATAATGCCATAGTCTGCCTGGAAGACTTGAACTTCGGT
    TTCAAGCGAGGAAGGTTTAAAGTGGAGAAGCAGGTGTACCAGAAGTTTGAGAAAGCCCTT
    ATTGATAAGCTAAACTACCTTGTCTTTAAGGAAAAAGAACTCGGCGAAGTTGGCCACTAT
    TTAACCGCCTACCAACTAACCGCCCCTTTCGAGTCTTTTAAGAAACTGGGAAAGCAGAGC
    GGAATACTCTTCTATGTGCCTGCAGACTACACCTCTAAGATCGACCCCACTACCGGCTTT
    GTAAACTTTCTAGATCTCCGCTATCAGTCAGTAGAAAAAGCCAAACAGCTCTTGTCAGAT
    TTTAACGCCATCCGATTTAATTCCGTCCAAAATTACTTCGAGTTCGAAATCGACTATAAA
    AAACTTACCCCCAAGAGAAAGGTTGGGACGCAGTCTAAGTGGGTAATCTGCACTTACGGT
    GACGTGAGATACCAGAACCGCCGAAACCAGAAAGGTCATTGGGAAACCGAGGAAGTGAAT
    GTGACTGAGAAGCTCAAGGCCCTCTTCGCTAGCGACAGTAAAACAACAACAGTTATCGAT
    TACGCCAATGACGATAATCTTATAGACGTGATCTTGGAACAAGACAAAGCCTCTTTTTTT
    AAGGAATTGTTGTGGTTGCTGAAACTTACAATGACCCTTAGGCACAGCAAGATCAAATCA
    GAGGATGACTTCATCCTCAGCCCGGTGAAGAATGAACAGGGAGAGTTCTACGATTCACGG
    AAGGCTGGAGAGGTGTGGCCCAAGGATGCCGACGCGAACGGGGCCTACCACATAGCTCTA
    AAAGGTCTGTGGAACCTGCAACAAATCAATCAATGGGAGAAAGGTAAGACACTGAACCTG
    GCCATCAAAAATCAAGATTGGTTCTCATTCATCCAGGAAAAGCCTTATCAAGAGTGA
    SEQ ID NO: 151
    ATGCATACGGGAGGCCTTTTATCAATGGACGCAAAAGAGTTCACCGGGCAGTATCCATTA
    TCTAAGACACTCCGCTTCGAGCTGAGGCCCATTGGCAGGACCTGGGACAACCTGGAGGCG
    TCGGGCTACCTGGCTGAGGACAGACATCGCGCAGAATGCTATCCGAGAGCTAAGGAGCTT
    TTGGACGACAATCATCGCGCGTTCCTTAACCGGGTGCTCCCACAGATCGATATGGACTGG
    CACCCGATCGCTGAGGCTTTTTGCAAGGTCCATAAGAACCCTGGGAACAAAGAGCTCGCC
    CAGGACTACAACTTGCAGCTGAGCAAGCGACGGAAAGAGATTTCTGCCTACCTTCAAGAC
    GCCGATGGCTACAAAGGGCTCTTCGCAAAGCCCGCATTGGATGAGGCCATGAAAATCGCC
    AAGGAGAACGGGAATGAAAGTGACATCGAAGTTCTCGAAGCGTTTAACGGATTTAGCGTG
    TACTTTACCGGCTATCATGAGTCAAGGGAGAATATTTATAGCGATGAGGACATGGTCTCT
    GTGGCCTACCGGATTACCGAGGATAATTTCCCGAGGTTTGTTTCAAATGCACTAATATTC
    GACAAGTTAAATGAGAGCCACCCAGACATCATCTCGGAGGTCAGCGGCAACCTCGGAGTT
    GACGATATTGGCAAATACTTCGACGTGAGCAACTATAACAACTTCCTCTCACAGGCTGGC
    ATCGACGACTATAATCATATTATAGGCGGCCACACTACTGAGGATGGTCTCATTCAGGCA
    TTCAATGTAGTCTTGAATCTTAGGCACCAGAAGGACCCTGGGTTTGAAAAGATACAGTTC
    AAGCAGCTGTATAAGCAGATATTATCCGTGCGAACATCTAAAAGTTACATCCCCAAACAG
    TTTGATAACTCAAAGGAGATGGTGGATTGCATATGCGATTATGTGTCAAAAATTGAAAAG
    AGCGAGACTGTGGAGCGGGCTCTGAAGCTCGTCAGGAACATTAGCTCCTTTGACCTTAGA
    GGAATTTTCGTCAATAAAAAGAATCTGAGGATCCTGAGCAATAAGCTAATAGGAGATTGG
    GACGCCATAGAGACAGCATTGATGCATTCCAGCTCAAGCGAGAATGATAAGAAGTCTGTC
    TACGATAGCGCTGAAGCCTTCACGCTGGACGATATCTTCTCTTCCGTGAAAAAATTTAGT
    GATGCGTCCGCAGAAGATATCGGGAATCGAGCCGAAGATATCTGCAGGGTAATTTCAGAG
    ACCGCCCCTTTCATCAATGACCTGCGCGCCGTGGACCTGGATAGCCTGAATGACGATGGT
    TACGAAGCTGCAGTTTCTAAGATCAGGGAGTCTCTGGAGCCATATATGGACTTGTTTCAC
    GAACTTGAGATCTTTAGCGTGGGCGACGAGTTCCCGAAATGCGCAGCTTTCTATAGCGAG
    TTAGAGGAGGTCAGCGAGCAATTAATCGAGATCATACCCCTGTTTAATAAGGCACGGAGC
    TTTTGTACTCGCAAGCGCTACAGCACCGACAAGATTAAAGTTAATCTGAAATTTCCAACT
    CTCGCAGACGGGTGGGACCTAAACAAGGAACGCGATAATAAAGCCGCCATCCTTAGAAAG
    GACGGAAAGTACTATCTTGCCATCCTAGATATGAAAAAAGATCTGAGTTCCATTCGTACT
    AGCGATGAAGACGAATCTTCTTTCGAAAAAATGGAGTATAAGCTGCTCCCCTCGCCAGTC
    AAGATGCTACCCAAGATCTTTGTGAAGAGCAAAGCAGCCAAGGAAAAGTACGGGCTGACG
    GACAGGATGCTGGAGTGCTACGATAAGGGAATGCATAAATCAGGGTCAGCTTTTGACTTG
    GGCTTTTGCCATGAGCTAATCGATTACTACAAGCGCTGTATCGCCGAGTATCCAGGATGG
    GACGTTTTCGACTTTAAATTTCGGGAGACTTCTGATTATGGTTCAATGAAGGAGTTCAAC
    GAAGATGTCGCTGGTGCCGGTTACTACATGAGCCTTCGCAAGATTCCTTGTTCCGAAGTC
    TACCGGCTACTGGACGAGAAATCTATATATTTGTTCCAGATATATAACAAGGACTACAGT
    GAGAATGCACATGGGAATAAGAATATGCATACTATGTATTGGGAAGGTCTCTTTTCACCC
    CAAAATTTGGAGTCACCCGTGTTCAAACTTAGCGGTGGCGCAGAGCTGTTCTTTAGGAAA
    TCCAGTATACCCAATGACGCCAAGACAGTCCACCCAAAGGGTAGCGTCCTGGTGCCCAGA
    AACGATGTGAACGGCAGGAGAATCCCTGACAGCATTTACCGAGAACTTACCAGGTACTTC
    AACCGCGGCGACTGTAGAATCTCTGATGAGGCAAAGTCTTATCTGGATAAGGTGAAGACT
    AAGAAGGCAGATCATGACATTGTGAAAGACCGCCGCTTTACTGTCGACAAAATGATGTTT
    CACGTGCCTATCGCAATGAATTTTAAGGCAATCTCAAAACCGAATCTGAACAAGAAGGTG
    ATAGATGGCATTATCGATGACCAGGACCTCAAGATCATCGGAATCGACAGAGGTGAGCGA
    AACCTGATATACGTCACAATGGTAGATCGGAAGGGTAATATTCTGTACCAGGATTCACTA
    AACATCCTCAATGGATATGACTATCGAAAAGCTCTCGATGTCAGGGAATACGACAACAAG
    GAGGCGCGACGGAATTGGACAAAGGTGGAAGGCATACGGAAGATGAAGGAAGGCTATCTG
    TCACTAGCTGTCTCCAAATTGGCTGATATGATTATAGAGAACAACGCCATTATCGTGATG
    GAAGATCTCAACCATGGATTCAAGGCAGGAAGAAGTAAAATTGAGAAGCAGGTGTATCAG
    AAGTTCGAAAGCATGCTTATTAATAAGTTGGGTTATATGGTCTTAAAGGACAAGTCTATC
    GATCAGAGCGGCGGCGCACTCCATGGGTATCAGCTGGCTAACCATGTCACCACACTAGCA
    TCCGTAGGCAAACAGTGTGGCGTGATTTTCTACATTCCTGCTGCGTTCACTTCTAAGATC
    GATCCTACCACGGGATTCGCAGACCTGTTCGCACTGAGCAATGTTAAAAACGTGGCCTCC
    ATGAGGGAGTTCTTTAGCAAAATGAAAAGCGTGATTTATGACAAGGCCGAGGGCAAGTTC
    GCTTTCACATTTGACTACCTGGACTACAATGTGAAATCAGAGTGCGGGAGAACCCTGTGG
    ACCGTATACACGGTAGGGGAAAGATTCACTTACAGTCGAGTTAATCGGGAGTATGTCCGT
    AAAGTGCCAACTGACATCATCTACGATGCCCTTCAGAAGGCTGGCATAAGTGTTGAGGGG
    GATCTAAGGGACAGGATCGCTGAATCGGATGGCGATACTCTCAAATCAATCTTCTACGCC
    TTCAAGTATGCCCTCGACATGAGGGTAGAGAACCGGGAGGAGGACTATATACAGTCTCCC
    GTGAAGAATGCGTCGGGAGAGTTCTTCTGCTCAAAAAACGCCGGGAAATCTTTGCCGCAG
    GATTCTGATGCAAATGGGGCTTATAACATTGCTCTCAAAGGCATCCTGCAGCTGCGCATG
    CTATCTGAACAATATGACCCAAACGCTGAAAGCATTAGATTGCCATTGATCACCAATAAG
    GCTTGGCTGACTTTCATGCAGAGCGGTATGAAGACATGGAAAAACTAA
    SEQ ID NO: 152
    ATGGATTCCCTTAAGGACTTCACAAATCTTTACCCCGTGAGTAAAACCCTGAGATTTGAA
    CTCAAGCCCGTGGGAAAGACTCTCGAGAATATCGAGAAGGCCGGGATTTTGAAGGAAGAC
    GAGCATCGGGCGGAAAGTTACAGACGGGTGAAGAAGATTATAGATACTTATCACAAGGTC
    TTTATAGACAGCTCTTTAGAGAACATGGCAAAGATGGGCATCGAGAACGAAATCAAGGCC
    ATGCTGCAGTCCTTCTGCGAGCTGTATAAAAAGGATCATCGGACCGAAGGCGAAGACAAG
    GCGCTGGATAAGATCAGGGCAGTGCTGCGCGGCCTCATTGTGGGTGCCTTCACTGGGGTG
    TGCGGGCGGAGAGAGAACACTGTGCAGAATGAGAAATACGAGAGTTTGTTCAAAGAGAAA
    CTCATCAAGGAAATCCTGCCCGACTTCGTCTTAAGCACAGAAGCCGAATCTCTCCCATTT
    TCTGTCGAGGAGGCCACGCGTTCCCTTAAAGAGTTCGACAGTTTCACTTCATACTTTGCC
    GGATTTTATGAAAACCGTAAAAATATATACTCCACTAAACCACAGTCAACTGCAATAGCT
    TACAGGTTAATCCACGAAAACCTGCCAAAATTCATCGACAATATACTCGTCTTTCAAAAA
    ATCAAGGAACCAATCGCGAAGGAACTTGAACACATCCGGGCTGACTTTAGTGCGGGAGGA
    TACATCAAAAAAGACGAGCGCCTGGAGGATATATTTTCACTAAATTATTATATTCATGTA
    CTGAGCCAGGCTGGCATAGAAAAGTACAACGCTCTAATTGGGAAAATCGTGACAGAAGGT
    GACGGGGAAATGAAAGGGCTAAACGAACATATTAACTTATATAACCAACAGCGGGGTCGA
    GAAGATCGTCTGCCCCTGTTCAGACCTCTGTATAAGCAAATACTCTCCGACAGAGAGCAG
    CTATCATATCTGCCCGAGTCCTTTGAGAAAGATGAAGAGCTGCTCCGGGCGCTCAAGGAG
    TTCTATGATCATATAGCCGAGGACATTTTGGGCAGAACTCAGCAACTCATGACGTCTATT
    TCTGAATATGATCTGTCTCGTATCTATGTCAGGAATGATAGCCAGCTGACCGATATATCC
    AAGAAGATGCTGGGGGACTGGAACGCCATTTATATGGCGAGGGAGCGAGCATACGATCAC
    GAGCAGGCACCCAAGAGAATCACAGCCAAATATGAGAGAGACCGCATTAAGGCGCTGAAG
    GGCGAAGAAAGTATCAGTCTGGCCAATCTGAACTCCTGCATAGCTTTCCTTGATAACGTG
    AGGGATTGCAGAGTTGATACTTACCTGAGTACCCTGGGCCAGAAGGAAGGGCCTCACGGC
    CTCTCTAATCTAGTGGAGAATGTATTTGCCTCCTACCACGAAGCTGAGCAGCTGCTGTCA
    TTTCCGTACCCAGAGGAAAATAATTTAATACAGGATAAGGACAACGTAGTGCTTATCAAA
    AATCTACTGGATAACATTTCCGACCTCCAGCGCTTTCTCAAACCACTTTGGGGGATGGGC
    GACGAGCCTGATAAGGATGAGCGCTTTTACGGCGAGTACAACTACATCAGGGGCGCCTTG
    GACCAGGTGATTCCCCTCTATAATAAAGTCAGGAATTACCTGACCCGAAAGCCATACAGT
    ACAAGAAAGGTGAAATTAAATTTCGGCAATAGTCAGCTGCTGTCTGGTTGGGACCGAAAT
    AAGGAGAAAGACAACAGCTGCGTAATTCTCAGAAAAGGACAGAACTTTTATTTGGCCATC
    ATGAATAACAGACACAAGAGATCTTTCGAGAACAAAGTGCTCCCTGAGTATAAGGAGGGG
    GAACCCTACTTCGAGAAGATGGACTATAAATTCCTTCCTGATCCAAATAAAATGCTGCCT
    AAAGTATTTCTGTCAAAAAAAGGTATAGAAATCTACAAACCTTCACCTAAGCTACTTGAA
    CAGTATGGCCACGGCACCCATAAAAAAGGGGACACGTTCAGCATGGACGACCTACACGAA
    CTGATTGACTTCTTTAAGCACAGCATAGAAGCTCATGAGGACTGGAAACAGTTCGGATTC
    AAATTCTCAGATACCGCGACCTACGAAAACGTGTCTAGTTTTTACCGGGAAGTCGAGGAC
    CAGGGCTACAAGCTCAGCTTCAGAAAAGTTAGCGAATCTTACGTCTACTCCCTTATAGAT
    CAAGGTAAGCTGTATCTCTTTCAAATCTACAACAAGGACTTTTCCCCATGTAGCAAGGGC
    ACCCCCAATCTGCACACTCTCTACTGGCGGATGCTGTTCGACGAGCGTAACCTGGCAGAC
    GTGATCTACAAATTAGATGGTAAAGCTGAGATCTTCTTTCGTGAAAAGAGCCTAAAGAAC
    GATCACCCCACTCACCCCGCCGGAAAGCCCATTAAGAAGAAAAGTAGGCAGAAGAAAGGA
    GAAGAATCGCTATTTGAGTACGACCTCGTCAAGGATCGGCATTATACAATGGATAAGTTC
    CAGTTCCATGTGCCAATAACTATGAATTTCAAGTGCAGTGCTGGCAGTAAGGTGAATGAC
    ATGGTAAACGCTCATATCCGGGAGGCAAAGGACATGCATGTTATTGGAATTGATAGGGGT
    GAGCGTAATCTCCTCTACATCTGTGTTATTGACTCCCGCGGCACAATCCTCGATCAGATT
    TCCTTGAATACAATTAATGATATAGACTACCATGACTTGCTTGAGTCTCGCGACAAAGAT
    AGACAGCAGGAGAGAAGAAATTGGCAGACCATCGAAGGCATCAAGGAACTCAAGCAAGGC
    TACCTTTCTCAGGCAGTGCATCGAATAGCCGAGCTGATGGTGGCTTATAAAGCCGTCGTG
    GCACTAGAAGACCTAAATATGGGATTTAAACGAGGCAGGCAGAAGGTGGAATCATCCGTA
    TACCAGCAGTTCGAAAAACAGTTGATAGACAAACTCAATTACCTTGTAGACAAGAAGAAG
    CGGCCTGAGGACATAGGGGGCCTGCTTAGAGCGTATCAATTTACAGCCCCATTCAAGTCT
    TTCAAAGAAATGGGTAAACAGAACGGTTTTCTGTTTTACATCCCAGCGTGGAACACCAGC
    AATATAGATCCAACCACTGGCTTCGTCAATCTGTTTCATGCTCAGTATGAAAATGTGGAC
    AAGGCCAAATCCTTCTTTCAGAAATTTGACAGCATCTCCTATAACCCAAAGAAAGACTGG
    TTTGAATTCGCCTTTGACTATAAGAATTTCACTAAGAAGGCCGAGGGATCAAGAAGCATG
    TGGATATTGTGCACGCATGGCTCACGTATAAAGAACTTTAGAAACTCGCAAAAAAACGGG
    CAGTGGGACTCAGAAGAATTCGCACTCACCGAGGCTTTCAAATCCCTCTTCGTCCGGTAT
    GAGATCGATTACACCGCCGATCTGAAGACGGCAATCGTCGACGAGAAACAGAAAGACTTC
    TTTGTAGATCTACTTAAGCTCTTTAAGCTAACCGTTCAGATGCGAAACAGTTGGAAAGAA
    AAGGATCTCGACTATCTCATTAGTCCAGTGGCTGGCGCGGATGGTAGATTTTTCGATACC
    CGGGAAGGTAACAAGTCCCTTCCCAAAGACGCCGACGCGAATGGTGCCTACAATATTGCA
    CTAAAGGGGCTCTGGGCGCTGCGGCAAATTAGACAGACATCTGAAGGGGGCAAGCTTAAG
    CTGGCTATTTCTAATAAAGAGTGGTTGCAGTTTGTGCAGGAAAGGAGTTATGAGAAGGAC
    TAG
    SEQ ID NO: 153
    ATGAACAACGGCACCAACAACTTCCAGAACTTCATCGGCATATCGTCTCTGCAGAAAACA
    CTTAGGAATGCCCTGATTCCAACTGAGACAACACAGCAGTTTATTGTGAAGAATGGGATC
    ATCAAAGAGGACGAATTGCGCGGGGAGAATAGGCAGATCCTGAAGGACATCATGGACGAT
    TACTACAGGGGTTTTATCTCCGAAACGCTGAGCTCGATTGACGATATTGACTGGACGTCC
    CTCTTTGAGAAGATGGAAATCCAACTTAAAAATGGCGATAATAAAGATACCCTGATAAAG
    GAACAAACCGAATATAGAAAGGCTATACACAAAAAATTCGCAAATGACGACCGCTTTAAG
    AACATGTTTTCTGCAAAACTGATTAGCGATATTCTGCCCGAGTTTGTGATTCACAATAAT
    AACTATTCCGCTTCGGAGAAGGAGGAAAAGACTCAGGTGATTAAACTGTTTTCTCGGTTC
    GCCACTTCTTTCAAAGATTATTTCAAAAATCGCGCCAACTGTTTTTCCGCTGACGACATC
    TCCTCCTCTTCCTGCCACCGGATCGTAAACGACAATGCCGAGATCTTTTTTAGTAACGCC
    CTTGTGTATCGGAGGATAGTGAAGAGCCTGTCCAATGATGACATAAACAAAATTTCTGGC
    GATATGAAGGATAGCCTCAAAGAGATGAGCCTTGAAGAAATTTACTCCTACGAGAAGTAT
    GGGGAGTTCATCACCCAGGAGGGGATTTCCTTCTATAATGACATCTGTGGCAAGGTGAAC
    AGCTTCATGAACCTGTACTGCCAGAAGAATAAGGAAAACAAAAATCTGTACAAGCTTCAG
    AAGTTACATAAGCAGATCCTGTGTATCGCGGATACCTCATATGAGGTTCCTTATAAGTTC
    GAGAGTGATGAAGAAGTGTACCAGTCTGTAAATGGATTCTTAGACAATATTTCGTCCAAA
    CATATAGTGGAGAGACTGAGAAAGATCGGGGACAATTACAATGGGTACAATCTCGACAAG
    ATTTATATCGTGTCGAAGTTTTACGAATCTGTGAGCCAGAAAACATACAGGGATTGGGAA
    ACCATTAATACCGCGCTTGAAATTCACTACAATAATATTCTGCCTGGCAACGGAAAAAGC
    AAGGCCGATAAGGTAAAAAAGGCAGTCAAAAATGACCTTCAGAAAAGTATCACCGAAATC
    AATGAGTTGGTGAGCAACTACAAATTGTGTTCAGACGATAATATTAAAGCGGAAACGTAC
    ATACATGAAATTAGCCATATTCTGAATAACTTTGAGGCGCAGGAACTTAAGTACAACCCT
    GAAATTCATCTCGTCGAAAGCGAATTGAAGGCCTCTGAATTGAAAAACGTTCTTGACGTG
    ATAATGAACGCTTTCCATTGGTGCTCTGTGTTTATGACTGAAGAGCTGGTTGATAAGGAC
    AACAACTTTTATGCTGAACTTGAGGAAATCTACGACGAGATCTACCCTGTGATTAGCTTG
    TATAACCTCGTCAGAAACTACGTTACCCAGAAGCCGTACAGCACGAAAAAAATAAAGCTG
    AACTTTGGTATTCCGACTCTCGCCGATGGATGGAGCAAGTCGAAGGAATATTCCAACAAT
    GCCATCATTCTTATGCGAGACAATCTGTATTACCTCGGCATCTTTAACGCCAAAAACAAG
    CCGGATAAGAAAATCATTGAAGGGAATACGAGCGAGAATAAGGGCGACTATAAGAAAATG
    ATCTACAACTTACTGCCAGGTCCCAATAAAATGATTCCTAAGGTGTTTCTGTCATCGAAA
    ACAGGTGTAGAAACATATAAGCCCAGCGCATACATCCTGGAAGGCTACAAGCAAAACAAA
    CACATCAAAAGCAGCAAGGACTTTGATATCACATTCTGCCACGATCTAATCGACTACTTC
    AAAAATTGCATCGCCATTCACCCTGAGTGGAAGAACTTCGGCTTTGACTTCTCCGACACC
    AGTACCTACGAAGACATTTCTGGATTCTACCGTGAGGTTGAGCTGCAGGGTTATAAAATT
    GACTGGACATACATCAGTGAAAAAGACATCGATCTACTGCAGGAGAAGGGGCAGCTCTAT
    CTCTTCCAGATTTATAATAAGGATTTCAGCAAGAAGTCCACTGGAAACGACAATCTGCAT
    ACAATGTATCTTAAGAACTTGTTTAGCGAAGAGAATTTGAAAGATATCGTTCTAAAGTTA
    AACGGGGAAGCCGAGATTTTCTTTCGAAAGTCTTCCATTAAGAATCCAATTATTCACAAG
    AAGGGCAGTATCCTGGTCAACAGAACCTATGAGGCCGAGGAAAAGGACCAGTTCGGTAAT
    ATACAAATTGTGCGCAAGAACATCCCCGAGAACATTTACCAGGAGCTCTATAAATACTTC
    AACGACAAAAGCGATAAGGAGCTTTCCGACGAGGCTGCCAAGCTGAAAAACGTGGTGGGA
    CACCATGAAGCAGCCACCAACATCGTCAAAGATTATCGTTATACATATGACAAATATTTT
    CTGCACATGCCTATTACAATAAACTTTAAGGCAAACAAGACCGGGTTCATCAATGACCGG
    ATACTCCAGTACATCGCAAAAGAGAAGGACCTGCATGTGATCGGCATCGACCGCGGTGAA
    AGAAATCTCATTTACGTCAGCGTTATCGACACTTGTGGAAACATTGTGGAGCAGAAGTCC
    TTCAACATTGTTAACGGCTATGACTATCAGATCAAGCTCAAACAGCAGGAAGGTGCTCGT
    CAGATTGCGAGGAAAGAATGGAAAGAGATCGGCAAGATCAAGGAGATCAAAGAAGGGTAT
    CTGAGCTTGGTCATTCACGAGATCTCCAAAATGGTCATCAAGTACAACGCTATTATCGCG
    ATGGAAGACCTCTCTTACGGCTTTAAGAAGGGGCGCTTTAAAGTGGAGCGCCAGGTCTAT
    CAGAAGTTCGAGACTATGCTTATCAATAAGCTGAATTACTTGGTCTTTAAGGATATCAGT
    ATCACCGAGAACGGAGGACTGCTGAAAGGTTACCAGCTCACATATATTCCCGATAAGCTC
    AAGAATGTGGGCCACCAATGCGGTTGTATTTTTTACGTTCCAGCTGCCTACACATCTAAG
    ATCGATCCTACCACCGGATTCGTCAATATATTTAAATTTAAAGATCTAACCGTTGATGCC
    AAGCGTGAGTTTATTAAGAAATTTGATTCAATCAGGTACGACAGCGAAAAGAACCTCTTC
    TGTTTCACTTTCGACTACAACAACTTCATCACACAAAATACTGTGATGAGCAAGTCATCA
    TGGAGCGTTTATACTTATGGTGTAAGGATAAAAAGGCGCTTTGTTAATGGAAGGTTTTCC
    AATGAAAGCGATACAATAGACATCACAAAAGACATGGAGAAGACACTGGAGATGACAGAT
    ATTAATTGGAGGGACGGGCATGACCTTAGACAGGACATCATCGACTACGAAATCGTCCAA
    CACATTTTTGAGATATTCAGACTCACTGTCCAGATGCGAAACAGCCTGTCGGAACTCGAA
    GACCGGGACTACGATAGACTGATCTCCCCGGTGTTAAACGAAAATAATATTTTCTACGAT
    TCTGCTAAGGCAGGAGACGCTCTTCCTAAAGATGCGGACGCCAATGGCGCTTACTGTATA
    GCGTTGAAGGGATTGTATGAGATTAAACAGATCACTGAGAATTGGAAAGAAGACGGTAAA
    TTCTCCAGAGACAAGCTGAAAATCTCCAACAAAGACTGGTTTGATTTTATTCAAAATAAG
    CGCTACCTGTAA
    SEQ ID NO: 154
    ATGACAAACAAATTTACTAATCAGTACAGCCTGTCAAAGACCCTCCGCTTCGAACTGATT
    CCACAAGGGAAGACCCTTGAATTCATCCAGGAAAAGGGTTTATTATCCCAGGATAAACAA
    CGCGCAGAAAGCTATCAAGAGATGAAGAAGACGATCGATAAATTTCATAAGTATTTCATA
    GATTTAGCCCTGAGCAACGCTAAATTGACCCACCTGGAAACCTATTTGGAGCTGTACAAC
    AAGTCAGCCGAGACAAAGAAAGAGCAGAAGTTTAAGGACGACCTGAAAAAAGTACAGGAC
    AATTTGCGAAAAGAGATCGTCAAGTCTTTTTCCGACGGAGACGCCAAGTCAATATTTGCC
    ATCCTGGACAAAAAGGAACTCATCACTGTGGAGTTGGAGAAGTGGTTTGAGAATAATGAG
    CAGAAGGACATCTATTTTGACGAAAAGTTCAAGACATTTACTACTTACTTCACCGGATTT
    CACCAAAACCGGAAGAACATGTACTCTGTTGAGCCGAACTCAACCGCCATCGCCTACCGC
    CTTATTCACGAAAATCTGCCAAAGTTTCTCGAGAATGCTAAAGCCTTTGAGAAAATTAAG
    CAGGTCGAGTCGCTCCAGGTGAACTTTCGAGAGCTGATGGGTGAATTCGGGGACGAGGGC
    CTGATTTTCGTGAATGAACTCGAAGAGATGTTTCAGATCAACTACTATAATGATGTACTC
    TCACAGAACGGGATCACTATCTACAACAGCATTATCTCTGGATTCACTAAGAACGATATC
    AAGTATAAAGGGCTGAATGAATACATCAACAATTATAATCAGACTAAGGACAAAAAGGAC
    AGGCTGCCTAAATTGAAACAGCTGTATAAGCAGATCCTCAGTGATAGAATTAGCTTGTCA
    TTTCTCCCAGATGCCTTCACTGACGGAAAGCAGGTGCTTAAGGCGATATTCGATTTCTAT
    AAGATCAACCTCCTCTCTTATACAATCGAGGGCCAGGAGGAGTCACAGAACCTCCTGCTC
    CTGATTCGACAAACTATTGAAAATCTGTCCTCTTTCGATACGCAGAAGATATACCTGAAA
    AATGACACCCATCTCACTACAATATCCCAACAGGTATTCGGAGATTTCTCCGTCTTCAGT
    ACAGCCCTGAATTACTGGTACGAGACAAAGGTGAACCCTAAGTTCGAAACAGAGTACAGC
    AAGGCGAACGAAAAGAAGAGGGAGATCCTGGACAAAGCCAAAGCCGTTTTCACCAAGCAA
    GATTACTTTAGCATCGCATTTCTGCAGGAAGTCCTGTCTGAGTACATACTGACACTCGAT
    CACACAAGCGACATAGTTAAGAAGCACTCTTCCAATTGTATCGCGGACTACTTCAAAAAT
    CATTTTGTCGCGAAAAAGGAGAACGAGACAGATAAGACCTTCGATTTTATCGCGAATATT
    ACCGCAAAGTATCAATGCATTCAGGGTATCTTGGAGAACGCCGACCAGTACGAAGACGAG
    CTTAAACAGGATCAGAAGCTCATCGACAACCTAAAGTTCTTTTTGGACGCTATACTGGAA
    CTCCTTCATTTTATTAAGCCACTACATCTGAAGAGTGAGTCTATCACTGAGAAGGACACT
    GCTTTTTACGACGTTTTCGAGAATTACTACGAAGCACTGTCTCTGCTAACCCCTCTGTAT
    AACATGGTGAGAAACTATGTGACACAGAAACCTTATAGTACCGAGAAGATTAAGTTGAAC
    TTCGAGAACGCACAATTGCTGAATGGGTGGGATGCAAACAAAGAGGGTGATTACCTCACA
    ACAATCCTCAAGAAAGATGGCAATTACTTCCTGGCCATTATGGATAAAAAACATAACAAG
    GCATTTCAGAAATTTCCCGAGGGGAAGGAAAATTATGAAAAGATGGTATACAAGTTGCTG
    CCCGGGGTGAACAAAATGCTCCCGAAGGTGTTTTTCTCGAATAAGAATATCGCGTACTTT
    AACCCGTCCAAGGAACTGTTGGAAAATTATAAAAAGGAAACACACAAGAAGGGGGACACT
    TTTAATTTGGAGCACTGCCACACACTCATTGACTTCTTTAAAGATAGTCTCAACAAACAT
    GAGGATTGGAAATATTTTGACTTTCAGTTTAGCGAGACCAAGTCTTATCAGGATCTGTCG
    GGATTTTATAGGGAAGTTGAGCACCAGGGTTACAAGATAAATTTCAAGAACATCGATAGC
    GAGTACATTGACGGACTGGTGAACGAAGGGAAGCTGTTCCTGTTTCAGATTTACAGCAAA
    GATTTCTCTCCTTTCTCAAAAGGCAAGCCGAACATGCATACCCTGTATTGGAAGGCCCTG
    TTCGAGGAGCAAAACCTTCAGAATGTGATTTACAAGCTGAACGGTCAGGCCGAGATTTTT
    TTTAGGAAGGCCTCTATCAAGCCCAAAAACATCATTCTGCACAAGAAAAAGATAAAGATC
    GCCAAAAAACACTTCATTGATAAAAAGACAAAGACTTCTGAGATCGTACCTGTTCAGACA
    ATCAAGAATCTCAACATGTATTATCAGGGGAAGATTAGCGAGAAAGAGCTGACACAGGAC
    GATTTGAGGTACATCGACAACTTCTCTATCTTTAACGAGAAGAACAAGACAATCGATATC
    ATCAAGGACAAGCGGTTTACCGTCGATAAATTCCAGTTCCATGTGCCTATCACGATGAAT
    TTCAAGGCCACCGGTGGGAGTTATATCAACCAGACTGTGCTGGAGTATCTGCAGAACAAC
    CCCGAAGTAAAAATTATTGGCCTGGACAGAGGAGAGCGGCATCTGGTGTACTTGACCCTC
    ATCGATCAGCAGGGAAATATCCTGAAACAAGAATCTCTGAATACTATTACGGACTCCAAA
    ATCAGCACACCTTACCACAAGCTGCTTGATAATAAAGAGAATGAGAGGGACTTGGCCCGC
    AAAAATTGGGGCACCGTCGAGAATATTAAGGAATTGAAAGAAGGATACATCTCACAGGTG
    GTTCACAAAATCGCAACCCTGATGTTAGAAGAGAACGCTATTGTGGTGATGGAGGACTTA
    AACTTCGGATTTAAAAGAGGAAGATTTAAAGTCGAGAAACAGATTTATCAGAAACTGGAA
    AAAATGCTCATTGACAAATTAAATTACCTGGTGCTGAAAGATAAACAGCCACAGGAGCTG
    GGTGGCCTGTATAATGCTCTGCAGCTGACCAACAAGTTCGAGTCGTTTCAGAAAATGGGC
    AAGCAGTCAGGCTTCCTTTTTTACGTGCCCGCTTGGAACACCTCAAAAATCGACCCTACA
    ACAGGCTTTGTGAATTATTTCTATACCAAGTATGAAAACGTGGACAAGGCAAAGGCCTTT
    TTCGAGAAGTTTGAAGCAATCAGGTTCAATGCCGAGAAAAAATACTTTGAGTTCGAGGTC
    AAAAAATATAGCGACTTCAACCCTAAGGCCGAAGGCACGCAACAAGCCTGGACAATATGC
    ACGTATGGGGAGAGAATTGAGACTAAGCGGCAGAAGGATCAGAATAACAAATTCGTGAGC
    ACACCGATTAACCTGACAGAGAAGATAGAGGACTTCCTCGGCAAGAATCAGATCGTGTAC
    GGCGACGGCAATTGCATCAAGTCACAAATTGCATCTAAAGATGACAAAGCATTCTTCGAA
    ACACTGCTGTATTGGTTCAAGATGACACTCCAGATGCGAAATAGCGAAACAAGAACAGAT
    ATTGACTACCTCATCAGCCCTGTGATGAATGATAACGGCACGTTTTACAATTCCCGGGAC
    TATGAAAAATTAGAGAACCCGACACTGCCAAAAGACGCCGACGCAAATGGTGCATATCAC
    ATCGCAAAGAAAGGTTTGATGCTGTTGAACAAAATTGATCAGGCTGATCTGACAAAAAAG
    GTCGATCTGAGTATCAGTAACCGCGACTGGTTGCAGTTTGTCCAGAAGAACAAATAA
    SEQ ID NO: 155
    ATGGAACAAGAGTACTATCTGGGCCTGGACATGGGCACCGGGAGTGTCGGATGGGCAGTC
    ACCGACTCAGAGTACCACGTCCTCAGAAAGCACGGTAAGGCACTTTGGGGAGTGCGACTC
    TTCGAGTCCGCTAGTACTGCTGAAGAGAGGAGGATGTTTCGAACTTCCAGGCGCAGGCTG
    GATCGGCGAAACTGGAGAATAGAGATTCTCCAGGAGATATTTGCTGAAGAGATTTCAAAG
    AAGGATCCTGGTTTTTTCCTGCGCATGAAAGAATCTAAGTATTACCCCGAAGATAAACGC
    GACATCAACGGCAATTGTCCTGAACTGCCCTATGCTCTGTTTGTCGACGACGATTTCACC
    GACAAAGATTACCACAAGAAATTCCCCACCATATACCACCTGAGAAAGATGTTGATGAAC
    ACCGAGGAGACACCCGACATACGTCTGGTTTACCTGGCTATCCATCATATGATGAAGCAC
    CGCGGGCATTTCCTGCTGTCTGGAGACATCAATGAGATAAAGGAATTTGGTACTACGTTC
    TCCAAGTTGTTAGAAAACATTAAGAATGAAGAGTTGGACTGGAATCTTGAACTGGGAAAG
    GAAGAGTATGCAGTTGTAGAGTCGATTTTGAAAGATAACATGTTAAACCGGTCAACTAAG
    AAAACCAGGTTAATTAAGGCACTAAAGGCCAAATCGATATGCGAGAAGGCTGTGCTAAAT
    CTGCTGGCTGGAGGCACCGTGAAACTGTCTGATATTTTCGGCCTGGAAGAGCTCAATGAA
    ACCGAGCGGCCTAAAATTTCTTTCGCCGATAACGGATACGATGACTATATTGGGGAGGTG
    GAAAACGAGCTCGGAGAACAATTCTACATTATTGAAACCGCTAAGGCAGTCTATGACTGG
    GCCGTGCTCGTCGAGATTTTAGGCAAGTACACCAGCATTAGCGAAGCAAAGGTGGCTACC
    TATGAAAAGCACAAATCTGACCTCCAGTTTCTGAAAAAGATTGTGCGCAAATACTTAACA
    AAAGAAGAGTACAAGGACATCTTTGTGAGCACATCAGATAAGCTCAAGAATTACTCAGCA
    TACATTGGAATGACAAAGATTAACGGGAAGAAGGTGGATCTCCAAAGCAAACGTTGTTCA
    AAGGAGGAGTTTTACGATTTCATAAAGAAGAACGTGCTGAAGAAACTGGAGGGACAACCG
    GAGTACGAGTATTTAAAGGAGGAGCTCGAGCGAGAAACTTTCCTGCCCAAGCAAGTGAAC
    AGAGACAATGGTGTCATTCCTTACCAGATTCACTTATATGAGCTGAAGAAAATCCTGGGG
    AACTTGAGAGACAAGATAGACCTCATCAAGGAAAATGAAGATAAGTTGGTCCAGTTGTTC
    GAATTCAGAATCCCATATTACGTCGGCCCGCTCAATAAGATCGACGACGGCAAGGAAGGC
    AAATTCACTTGGGCGGTGCGAAAAAGCAACGAAAAAATATACCCATGGAACTTTGAGAAC
    GTCGTTGACATCGAGGCCAGCGCCGAGAAATTTATAAGACGCATGACTAATAAGTGTACT
    TACCTCATGGGCGAGGATGTTCTGCCCAAGGACAGCCTGCTGTATTCCAAGTACATGGTG
    CTTAACGAGCTGAATAATGTAAAGTTAGATGGTGAGAAGCTCAGCGTGGAGCTTAAACAG
    AGGCTGTACACTGATGTGTTTTGCAAGTATCGGAAAGTTACCGTTAAGAAGATAAAGAAT
    TACCTGAAATGCGAAGGGATCATTTCCGGCAACGTGGAAATTACCGGAATCGACGGCGAT
    TTTAAGGCGTCGTTGACCGCTTATCATGATTTCAAGGAGATTTTAACCGGCACGGAGCTC
    GCGAAGAAAGACAAGGAGAACATAATCACGAATATAGTTCTGTTTGGGGACGATAAAAAA
    CTTCTTAAAAAACGACTCAATCGACTGTATCCGCAGATTACCCCCAACCAGCTGAAGAAG
    ATTTGCGCTCTGAGCTATACCGGGTGGGGCCGGTTCTCTAAGAAATTCCTCGAGGAGATC
    ACAGCACCAGACCCAGAGACTGGTGAGGTGTGGAATATTATTACAGCTCTGTGGGAATCC
    AATAATAACCTTATGCAATTGTTGAGCAATGAATATAGGTTCATGGAGGAAGTGGAAACC
    TACAATATGGGCAAGCAGACAAAGACCCTATCTTACGAGACCGTTGAGAATATGTATGTC
    TCCCCTTCAGTGAAACGGCAAATCTGGCAAACTTTGAAGATCGTGAAGGAGCTCGAAAAG
    GTGATGAAAGAGAGCCCGAAGAGGGTTTTTATTGAAATGGCCAGAGAGAAACAGGAGAGC
    AAGAGAACAGAGTCTAGGAAGAAGCAGCTAATCGATTTGTATAAAGCCTGCAAGAACGAG
    GAAAAAGACTGGGTCAAGGAGCTAGGCGATCAGGAAGAACAGAAGTTGCGCTCTGATAAG
    CTGTACTTATATTATACCCAGAAAGGACGGTGCATGTACTCAGGTGAGGTCATTGAGCTG
    AAAGATCTGTGGGACAATACTAAGTATGATATTGATCACATCTACCCTCAGTCAAAAACT
    ATGGACGACTCCCTCAACAACAGGGTGTTGGTTAAGAAGAAATACAATGCTACAAAGTCC
    GATAAATACCCTCTTAACGAAAACATCCGGCACGAAAGAAAGGGCTTCTGGAAGTCCCTG
    CTGGATGGGGGTTTTATCAGTAAAGAAAAGTATGAGAGGCTGATCCGAAATACCGAGCTC
    TCCCCCGAGGAACTGGCTGGCTTTATCGAAAGGCAGATCGTAGAGACTAGGCAATCTACA
    AAGGCAGTCGCTGAGATCCTGAAGCAAGTGTTTCCTGAGTCAGAAATCGTGTACGTCAAA
    GCTGGCACAGTGTCACGGTTCCGAAAGGACTTTGAGTTGTTAAAAGTTCGGGAGGTGAAT
    GACCTGCACCACGCTAAAGACGCCTATCTGAATATCGTTGTGGGGAACTCCTATTATGTT
    AAGTTTACTAAGAATGCGTCCTGGTTTATTAAGGAGAACCCGGGGCGCACCTATAACCTG
    AAGAAGATGTTCACCTCCGGCTGGAACATAGAACGGAACGGAGAAGTCGCGTGGGAGGTG
    GGTAAGAAAGGGACCATTGTGACCGTCAAACAGATTATGAACAAAAACAACATATTGGTA
    ACTCGCCAGGTGCATGAGGCCAAAGGGGGCCTCTTTGATCAGCAGATTATGAAAAAGGGC
    AAAGGACAGATCGCAATCAAGGAAACCGACGAGCGCCTGGCATCCATTGAGAAGTACGGA
    GGCTACAACAAGGCGGCAGGTGCGTACTTCATGCTCGTCGAGTCCAAAGATAAGAAAGGC
    AAAACTATTAGAACAATCGAGTTCATCCCTCTATATTTGAAAAATAAGATCGAAAGTGAC
    GAAAGCATCGCCCTTAACTTCTTGGAGAAGGGCCGGGGCTTAAAGGAACCAAAGATTCTG
    CTCAAGAAGATCAAGATCGACACACTCTTCGATGTGGATGGTTTTAAGATGTGGCTGTCA
    GGCAGGACAGGGGATCGCTTGCTGTTCAAATGCGCAAATCAGTTGATTCTGGACGAAAAG
    ATCATTGTGACGATGAAGAAGATCGTTAAATTCATTCAGCGGAGACAGGAAAACAGAGAA
    CTGAAACTCTCCGATAAGGATGGAATTGACAATGAAGTCCTCATGGAGATTTACAATACC
    TTTGTGGACAAGCTTGAGAACACAGTCTATCGGATCCGACTGTCCGAACAGGCAAAGACT
    CTGATCGACAAACAGAAAGAATTCGAAAGACTAAGCTTAGAGGACAAAAGTTCAACTCTC
    TTTGAAATTCTCCACATCTTCCAATGTCAAAGTAGTGCAGCCAACTTGAAGATGATCGGG
    GGTCCCGGCAAGGCTGGAATCTTAGTCATGAACAACAACATCTCCAAATGTAACAAAATC
    TCCATCATAAACCAGTCTCCCACCGGCATTTTCGAGAACGAAATTGATTTACTCAAG
    SEQ ID NO: 156
    ATGAAATCTTTCGATTCTTTCACCAACCTCTACTCCCTTAGCAAAACCCTTAAGTTTGAA
    ATGAGGCCGGTGGGGAATACACAGAAGATGCTTGACAATGCTGGCGTCTTTGAAAAGGAC
    AAATTAATCCAGAAGAAGTATGGTAAAACAAAGCCATATTTTGACCGATTGCATCGGGAA
    TTCATTGAAGAGGCTCTTACAGGAGTAGAATTGATCGGACTGGACGAGAACTTCCGTACC
    TTAGTAGACTGGCAGAAGGACAAGAAGAACAACGTGGCAATGAAGGCCTATGAGAACTCA
    CTCCAGCGCCTTAGAACCGAGATCGGAAAGATCTTTAATCTTAAGGCGGAAGATTGGGTA
    AAAAATAAGTACCCGATCCTGGGACTGAAAAACAAAAACACAGACATCCTGTTTGAAGAA
    GCCGTCTTTGGTATCTTGAAGGCCAGGTATGGAGAGGAGAAAGACACGTTTATAGAGGTA
    GAGGAGATTGATAAAACAGGCAAGAGTAAGATTAATCAGATCAGTATCTTTGATTCTTGG
    AAGGGGTTCACAGGCTACTTTAAGAAGTTTTTCGAAACCAGGAAAAATTTCTATAAGAAC
    GATGGCACCTCCACAGCTATCGCGACACGCATCATAGATCAGAATCTGAAACGGTTCATT
    GATAATCTGAGCATTGTTGAATCCGTGCGCCAGAAGGTCGACCTAGCTGAGACTGAGAAG
    TCTTTCTCTATATCACTCTCCCAGTTCTTCTCAATAGATTTTTATAATAAGTGCCTTCTG
    CAAGATGGCATAGACTACTATAACAAGATCATCGGCGGCGAAACTCTCAAAAACGGTGAA
    AAGCTCATTGGCCTGAATGAGCTCATCAACCAATATAGACAAAATAACAAGGATCAGAAA
    ATCCCATTCTTTAAGCTGCTAGATAAACAGATCCTATCAGAAAAAATCCTGTTCCTCGAC
    GAAATCAAAAACGACACCGAACTCATCGAGGCTCTCTCGCAGTTTGCCAAGACGGCTGAG
    GAGAAGACGAAGATTGTGAAAAAGCTGTTTGCAGACTTTGTGGAGAACAACTCTAAATAC
    GATTTGGCTCAGATTTATATCTCCCAGGAAGCATTTAACACAATCTCCAATAAGTGGACT
    AGCGAGACTGAAACCTTCGCCAAATACCTGTTCGAGGCCATGAAAAGCGGCAAGCTCGCC
    AAATACGAGAAGAAGGACAATTCCTATAAGTTTCCCGATTTCATCGCATTATCTCAGATG
    AAGTCCGCGCTACTTAGCATTAGCCTGGAAGGCCATTTTTGGAAGGAGAAATACTATAAG
    ATTTCCAAATTCCAAGAAAAGACCAATTGGGAGCAGTTCTTGGCTATTTTTCTATACGAG
    TTCAACTCTTTGTTCAGTGACAAGATCAACACTAAGGACGGTGAGACCAAACAAGTGGGG
    TACTACCTCTTCGCCAAAGATCTTCATAACCTGATACTGTCCGAACAGATCGACATACCC
    AAGGATTCAAAGGTGACCATCAAGGATTTTGCGGATTCGGTATTGACGATCTATCAGATG
    GCGAAGTATTTCGCTGTCGAGAAAAAGCGGGCATGGCTGGCCGAATACGAGTTGGACTCC
    TTCTATACTCAACCCGATACAGGGTACCTGCAGTTTTACGATAATGCATACGAGGATATA
    GTCCAGGTGTACAATAAACTCAGGAACTACCTCACTAAGAAACCATACTCCGAAGAAAAA
    TGGAAACTTAATTTTGAGAATAGTACACTGGCCAATGGATGGGACAAGAACAAGGAATCA
    GACAACTCCGCTGTAATTCTCCAGAAGGGTGGCAAGTATTATCTGGGACTGATAACAAAG
    GGCCATAACAAGATTTTCGATGACCGTTTTCAGGAGAAGTTTATAGTGGGCATAGAGGGT
    GGCAAGTATGAAAAAATAGTCTACAAGTTCTTTCCCGATCAGGCGAAGATGTTCCCCAAA
    GTATGCTTCAGTGCTAAAGGCCTCGAGTTTTTCCGGCCATCTGAAGAGATACTCCGCATC
    TATAATAACGCAGAGTTTAAAAAGGGAGAGACGTACTCAATCGACTCGATGCAGAAACTC
    ATTGACTTCTACAAAGATTGTCTCACAAAATACGAGGGCTGGGCTTGCTACACGTTTCGG
    CACTTGAAGCCAACCGAGGAATATCAAAACAACATCGGGGAGTTCTTCCGTGACGTCGCC
    GAAGACGGCTATAGAATTGACTTTCAGGGCATAAGTGATCAGTATATTCACGAGAAGAAT
    GAGAAAGGTGAGTTGCATCTTTTCGAAATCCACAATAAAGACTGGAATCTTGACAAGGCT
    CGCGATGGAAAATCAAAGACTACCCAGAAGAATCTTCATACACTTTACTTCGAGTCCCTC
    TTTTCCAACGACAACGTCGTACAGAATTTCCCAATAAAACTGAACGGCCAGGCCGAAATT
    TTTTACAGGCCCAAAACCGAAAAAGATAAACTGGAATCCAAGAAAGACAAGAAGGGAAAT
    AAGGTGATAGATCACAAAAGGTATTCCGAGAACAAGATTTTTTTCCACGTACCTCTTACC
    CTGAACAGAACGAAGAACGACTCTTATAGATTCAATGCCCAGATAAACAACTTTCTCGCA
    AACAACAAAGATATCAATATTATCGGCGTCGATAGAGGTGAGAAGCACTTGGTATATTAT
    TCTGTGATCACGCAAGCATCCGATATCTTGGAGTCCGGTTCTTTGAACGAACTGAATGGT
    GTCAACTACGCCGAGAAACTCGGTAAGAAAGCTGAGAATCGGGAGCAGGCTAGAAGGGAC
    TGGCAGGACGTTCAGGGTATCAAGGACCTGAAGAAGGGCTACATTTCTCAGGTGGTTCGA
    AAACTGGCTGATTTGGCCATTAAGCACAATGCAATCATCATTTTAGAAGATTTGAACATG
    CGGTTTAAACAAGTCAGGGGGGGGATAGAGAAATCAATTTACCAACAGCTGGAAAAAGCT
    CTGATTGATAAACTCTCTTTTTTGGTTGATAAGGGCGAAAAGAACCCCGAGCAAGCAGGA
    CATCTCCTTAAAGCCTATCAACTGAGCGCACCTTTCGAGACATTCCAGAAGATGGGAAAG
    CAAACCGGCATCATTTTCTATACCCAGGCTTCCTATACATCCAAGTCTGATCCAGTGACT
    GGGTGGAGACCCCATCTCTACCTCAAGTACTTTTCTGCCAAAAAAGCTAAGGACGACATT
    GCTAAGTTCACAAAAATCGAGTTCGTGAACGACAGGTTCGAGCTGACTTATGACATAAAA
    GATTTCCAGCAGGCCAAGGAGTACCCAAACAAGACAGTTTGGAAAGTGTGTTCCAATGTG
    GAGAGGTTTCGGTGGGACAAGAATCTGAATCAGAATAAAGGGGGATATACTCACTACACC
    AACATTACCGAGAACATCCAAGAGTTGTTCACCAAATACGGCATCGACATTACTAAAGAT
    CTGCTGACACAGATCTCCACCATCGATGAGAAGCAGAACACATCTTTCTTCCGGGATTTC
    ATCTTTTATTTTAACTTGATCTGTCAGATTAGAAATACCGACGACAGTGAGATAGCTAAA
    AAAAACGGGAAAGACGATTTCATTCTCTCTCCCGTGGAGCCGTTTTTTGACTCCCGCAAA
    GACAATGGCAATAAGCTTCCGGAAAACGGGGACGATAACGGCGCCTACAACATCGCTCGT
    AAGGGAATCGTTATCCTCAATAAAATAAGCCAGTATTCCGAGAAGAACGAGAATTGTGAA
    AAAATGAAGTGGGGGGACCTTTACGTCAGCAACATCGATTGGGATAACTTTGTGACACAA
    GCCAATGCGAGACACTAG
SEQ ID NO: 157
ATGGAAAACTTCAAAAACCTCTACCCCATCAACAAGACCTTGAGGTTTGAGCTCCGGCCA
TATGGGAAGACACTGGAGAACTTCAAAAAGTCCGGTCTGCTGGAAAAGGATGCTTTTAAG
GCTAACTCTAGGAGGTCTATGCAGGCCATTATCGATGAGAAATTCAAGGAGACCATAGAG
GAGCGTCTGAAATATACTGAGTTTTCCGAGTGTGACCTAGGAAATATGACCAGTAAGGAC
    AAAAAGATCACCGACAAGGCAGCGACAAACCTGAAGAAACAGGTGATTTTAAGCTTTGAT
    GATGAGATTTTCAATAACTACTTGAAGCCGGACAAAAACATCGACGCTCTGTTCAAGAAT
    GATCCAAGCAACCCGGTCATCTCTACTTTCAAGGGCTTCACCACATACTTTGTAAATTTC
    TTCGAAATACGGAAACACATCTTCAAGGGAGAGTCTTCCGGTAGCATGGCTTACAGAATA
    ATCGATGAGAACCTAACTACATATCTAAACAATATCGAGAAGATCAAGAAATTGCCTGAA
    GAACTGAAATCTCAGCTTGAGGGAATCGATCAAATTGACAAACTGAACAACTATAACGAG
    TTCATCACCCAGTCCGGCATTACTCATTATAACGAAATTATTGGAGGGATTTCGAAGTCT
    GAAAATGTCAAAATTCAAGGCATTAACGAAGGGATTAATCTTTACTGTCAAAAGAATAAA
    GTGAAGCTACCACGCTTAACTCCTCTGTATAAGATGATTCTCTCTGATCGGGTCTCTAAT
    TCCTTTGTGCTGGATACCATTGAAAATGATACCGAGTTAATTGAAATGATCTCTGATCTG
    ATAAATAAGACAGAGATAAGTCAGGATGTTATTATGTCCGACATCCAAAATATTTTCATC
    AAATATAAACAACTCGGCAACTTGCCGGGGATTAGCTACTCATCTATAGTGAATGCTATC
    TGTTCGGATTACGACAATAACTTTGGTGACGGCAAACGTAAAAAAAGCTATGAGAATGAT
    CGCAAAAAACACCTCGAGACTAACGTGTATAGCATTAACTATATCTCAGAGTTACTGACA
    GACACCGACGTCTCCAGCAACATAAAGATGCGGTACAAAGAGCTGGAGCAGAATTATCAG
    GTATGCAAGGAAAATTTCAACGCCACTAACTGGATGAACATCAAAAACATTAAGCAGTCT
    GAGAAAACCAATCTGATCAAGGACCTTCTTGACATCCTCAAGAGCATCCAGCGGTTTTAT
    GATTTGTTTGACATCGTGGATGAAGACAAAAATCCTAGTGCTGAGTTCTATACCTGGCTG
    TCTAAAAACGCGGAGAAACTGGACTTCGAGTTTAATTCAGTGTACAACAAGAGCAGGAAC
    TACCTCACGAGAAAGCAGTACTCCGATAAAAAGATTAAGTTGAACTTCGATAGTCCTACT
    CTCGCCAAGGGGTGGGATGCGAACAAAGAAATTGATAATAGCACAATTATCATGAGGAAG
    TTCAACAACGACCGGGGCGATTACGATTACTTCTTGGGGATCTGGAATAAGAGCACACCT
    GCCAACGAAAAGATCATCCCATTAGAGGATAATGGACTGTTTGAAAAAATGCAATATAAG
    CTGTATCCCGATCCTAGTAAAATGCTGCCAAAGCAATTCCTTTCTAAGATCTGGAAAGCT
    AAACATCCAACTACACCCGAGTTTGATAAGAAGTACAAAGAAGGTCGGCACAAGAAGGGG
    CCTGATTTTGAGAAAGAGTTTCTGCACGAGTTGATCGATTGCTTTAAGCATGGATTGGTA
    AACCACGACGAAAAATATCAGGATGTGTTCGGGTTCAATCTGCGCAACACGGAAGACTAC
    AACTCTTATACAGAGTTTCTGGAGGACGTCGAAAGGTGCAACTATAATCTTAGTTTCAAT
    AAAATCGCTGACACGTCTAACTTGATAAATGATGGGAAACTCTATGTTTTTCAGATCTGG
    AGCAAGGATTTCAGCATAGATAGCAAGGGAACAAAAAACTTGAACACAATATACTTTGAA
    TCCCTCTTCTCGGAGGAAAATATGATCGAGAAGATGTTCAAGCTCTCAGGGGAAGCCGAA
    ATATTCTATCGTCCAGCAAGTTTGAATTATTGTGAAGATATTATCAAGAAGGGACACCAC
    CACGCCGAACTGAAGGACAAATTCGACTATCCCATCATCAAGGACAAGCGATATAGCCAG
    GACAAATTTTTTTTTCATGTCCCCATGGTTATCAACTACAAAAGCGAGAAGTTAAACTCC
    AAATCACTTAACAATAGGACGAACGAAAATTTAGGCCAATTCACGCACATCATCGGTATC
    GACCGCGGAGAGCGACATCTCATCTACCTGACCGTGGTGGATGTGTCCACCGGTGAGATC
    GTTGAGCAAAAGCACCTGGATGAAATTATAAATACAGATACAAAAGGCGTCGAGCATAAA
    ACTCATTATCTCAATAAATTAGAAGAGAAGTCCAAGACGCGGGATAATGAAAGAAAGTCC
    TGGGAAGCAATCGAGACGATTAAGGAGCTGAAAGAAGGCTATATTAGCCACGTGATCAAT
    GAAATCCAGAAATTGCAGGAAAAGTATAACGCACTGATAGTGATGGAGAACCTCAATTAT
    GGGTTTAAGAACTCGCGTATCAAAGTGGAAAAGCAGGTCTACCAGAAATTCGAGACCGCC
    CTGATTAAAAAGTTTAATTACATCATTGACAAGAAAGATCCTGAAACCTACATTCATGGA
    TACCAACTGACGAATCCAATCACTACACTCGATAAAATTGGTAACCAGAGCGGTATTGTG
    TTGTACATTCCGGCTTGGAATACAAGCAAGATTGATCCAGTCACTGGTTTCGTTAACCTC
    CTGTATGCAGACGATTTGAAATACAAGAACCAGGAGCAGGCTAAAAGCTTTATCCAGAAA
    ATCGATAATATCTACTTCGAAAATGGTGAGTTTAAATTTGATATAGATTTCAGCAAATGG
    AACAACCGCTACTCAATTAGCAAGACGAAATGGACACTGACAAGCTACGGAACCCGGATA
    CAGACGTTCCGAAACCCCCAGAAAAATAACAAGTGGGACAGCGCCGAGTATGACCTGACC
    GAAGAGTTTAAATTAATCCTGAACATCGATGGTACTCTGAAATCTCAGGATGTGGAAACC
    TATAAGAAATTCATGTCTTTATTCAAGCTGATGTTGCAGCTGCGAAACTCCGTTACTGGA
    ACAGACATTGACTACATGATTAGCCCTGTGACAGATAAAACTGGAACCCACTTTGATTCA
    CGGGAGAATATCAAGAACCTGCCCGCCGATGCTGATGCGAACGGAGCTTACAACATTGCT
    AGGAAGGGCATCATGGCAATCGAGAATATTATGAACGGCATTAGCGACCCTCTGAAGATC
    AGTAATGAGGACTACCTGAAGTACATTCAGAACCAACAAGAGTAA
SEQ ID NO: 158
ATGACCCAGTTTGAGGGTTTCACCAATCTTTATCAGGTGTCAAAAACACTCAGATTTGAG
CTCATCCCACAGGGTAAAACTTTAAAGCATATTCAAGAGCAGGGCTTTATAGAGGAAGAC
AAAGCCAGAAACGACCATTATAAGGAACTAAAACCGATCATTGACCGCATCTACAAAACC
TATGCCGACCAATGCCTTCAGCTCGTCCAACTCGATTGGGAGAATCTGAGCGCCGCTATT
    GACAGCTACAGGAAGGAGAAGACCGAGGAGACTAGAAACGCCCTGATCGAGGAGCAGGCG
    ACCTATAGAAACGCTATTCACGATTATTTTATCGGCCGCACCGACAATTTGACAGATGCC
    ATCAACAAGCGGCACGCCGAAATTTATAAGGGGTTATTTAAGGCCGAGCTGTTCAATGGA
    AAAGTACTGAAACAGCTGGGCACCGTAACAACCACCGAACACGAGAATGCTCTGTTGAGG
    TCCTTCGACAAGTTTACTACCTACTTTAGCGGCTTCTACGAAAACCGTAAAAACGTGTTT
    TCCGCGGAGGATATTTCAACAGCCATTCCTCATAGGATCGTGCAGGATAATTTCCCCAAG
    TTTAAGGAGAACTGCCATATCTTTACCAGACTTATCACTGCTGTGCCAAGTTTACGAGAA
    CACTTCGAGAATGTTAAGAAGGCTATAGGCATATTCGTTTCCACCTCCATCGAAGAAGTA
    TTCAGTTTTCCATTCTACAATCAGTTACTCACGCAGACCCAGATAGATCTCTACAATCAG
    CTGCTCGGAGGCATTTCTAGAGAAGCAGGCACGGAAAAGATCAAGGGCTTAAATGAAGTA
    CTCAATCTTGCAATTCAGAAGAACGATGAGACAGCACACATTATTGCATCTCTCCCTCAC
    AGATTCATTCCCCTGTTCAAACAGATCCTGTCCGATCGCAACACACTAAGCTTTATACTT
    GAGGAGTTTAAGTCAGATGAGGAAGTGATCCAGAGCTTCTGTAAGTATAAGACTTTGCTC
    CGTAATGAAAACGTGCTTGAGACAGCAGAGGCTCTCTTTAACGAGTTGAATTCCATCGAC
    CTGACACACATTTTTATCAGCCATAAAAAGCTGGAAACGATTAGCTCTGCCTTGTGCGAC
    CACTGGGACACCCTGCGTAACGCCCTCTATGAAAGGCGCATTTCCGAGCTCACCGGGAAG
    ATCACAAAAAGTGCCAAGGAAAAAGTCCAGAGGTCCCTTAAACATGAAGACATCAACCTA
    CAAGAGATCATCTCTGCGGCTGGGAAAGAGCTGTCAGAAGCATTTAAACAGAAGACTTCC
    GAGATCCTGAGCCACGCACACGCCGCATTAGACCAGCCCCTGCCTACAACTCTTAAAAAA
    CAGGAGGAGAAGGAGATTTTAAAGAGCCAGCTGGACTCATTACTCGGCCTGTATCATCTC
    CTGGACTGGTTCGCCGTGGACGAATCCAACGAGGTGGACCCAGAATTTAGCGCCAGGCTG
    ACAGGAATTAAACTGGAAATGGAGCCAAGTTTGAGCTTTTACAACAAGGCTCGGAACTAT
    GCCACTAAAAAGCCCTACAGCGTGGAAAAGTTCAAGCTGAATTTTCAGATGCCGACCCTG
    GCTTCCGGGTGGGATGTTAATAAGGAAAAGAATAATGGGGCTATACTGTTCGTCAAAAAT
    GGTCTCTACTACCTGGGAATCATGCCCAAACAGAAGGGCAGGTACAAAGCCCTTTCGTTT
    GAGCCGACCGAAAAAACCAGCGAAGGCTTTGATAAGATGTATTACGACTATTTCCCAGAT
    GCAGCCAAGATGATCCCAAAATGTAGCACTCAGTTGAAGGCGGTAACCGCTCACTTTCAG
    ACACACACCACTCCTATCTTGCTCTCCAACAACTTTATTGAGCCGCTGGAGATCACGAAG
    GAAATCTACGACCTTAACAACCCAGAGAAGGAACCCAAGAAATTCCAAACAGCTTATGCT
    AAGAAGACTGGGGATCAAAAGGGCTATCGAGAGGCTTTGTGTAAGTGGATTGACTTTACA
    CGGGATTTCCTGAGTAAGTATACCAAGACCACATCTATTGACCTGTCCTCACTGAGACCT
    TCCTCACAATATAAGGATCTCGGAGAGTATTATGCCGAACTCAACCCTCTACTCTATCAC
    ATCTCTTTCCAGAGGATCGCCGAAAAGGAAATTATGGACGCCGTCGAGACAGGCAAGCTG
    TACCTCTTCCAGATTTACAACAAGGATTTCGCAAAGGGCCACCACGGAAAACCCAATTTG
    CACACTTTGTACTGGACAGGGCTCTTCTCTCCCGAAAATTTGGCCAAAACTTCAATAAAA
    CTGAACGGGCAAGCCGAGCTGTTCTATCGGCCCAAGTCACGTATGAAGCGGATGGCCCAC
    CGGCTGGGCGAGAAGATGCTCAACAAGAAACTGAAGGATCAGAAGACGCCCATACCAGAC
    ACTCTTTACCAAGAGCTGTATGACTACGTGAATCACAGACTGAGTCACGACCTGTCTGAT
    GAAGCCCGGGCTCTTCTTCCAAATGTGATTACCAAAGAAGTTTCCCACGAAATTATCAAG
    GACCGGCGCTTCACCTCTGACAAATTCTTTTTCCACGTCCCAATCACCCTCAACTACCAG
    GCAGCCAATTCCCCTTCAAAGTTTAACCAGCGTGTGAATGCCTACCTGAAAGAGCATCCG
    GAGACCCCCATCATAGGGATAGACAGAGGAGAGCGGAATCTTATCTACATTACTGTGATT
    GACAGCACAGGTAAGATCTTGGAGCAGAGATCTTTAAATACAATCCAGCAGTTTGACTAC
    CAGAAGAAACTGGATAACCGAGAGAAGGAAAGGGTTGCTGCAAGACAGGCCTGGTCAGTG
    GTCGGCACCATCAAAGACCTGAAGCAGGGCTACTTATCCCAAGTAATTCACGAAATTGTC
    GATCTTATGATTCATTATCAAGCCGTTGTTGTGCTGGAGAACCTGAATTTTGGCTTCAAA
    AGCAAACGAACAGGTATCGCCGAGAAAGCCGTGTATCAGCAGTTCGAAAAGATGCTCATA
    GACAAGCTGAACTGCTTAGTGCTGAAGGATTATCCTGCTGAGAAGGTCGGCGGCGTACTT
    AACCCATACCAGCTGACCGATCAGTTCACTAGTTTCGCCAAGATGGGAACGCAAAGTGGC
    TTCCTTTTCTACGTGCCCGCTCCCTACACGAGTAAGATCGACCCTCTGACCGGCTTCGTC
    GACCCATTCGTCTGGAAGACCATCAAGAATCACGAATCACGGAAACACTTCTTAGAGGGG
    TTTGACTTCCTGCACTACGACGTGAAGACAGGGGACTTCATCTTACACTTTAAGATGAAT
    CGAAACCTCTCCTTCCAGCGGGGCCTGCCTGGTTTCATGCCCGCATGGGACATCGTGTTT
    GAGAAAAACGAGACACAGTTTGACGCTAAGGGAACCCCCTTTATTGCGGGGAAGCGGATT
    GTCCCAGTCATCGAAAACCATCGGTTCACCGGGCGATACCGGGATCTGTACCCGGCCAAC
    GAGCTCATCGCGCTGCTGGAGGAGAAGGGTATTGTGTTTAGGGATGGATCCAACATTCTG
    CCTAAGTTGCTGGAAAATGATGATTCGCACGCCATTGATACCATGGTTGCACTGATTAGA
    TCCGTACTGCAGATGAGGAATAGCAATGCTGCAACCGGGGAGGATTATATTAATTCCCCA
    GTGCGAGATCTGAATGGTGTCTGTTTTGACTCGCGCTTTCAGAATCCAGAATGGCCAATG
    GATGCAGACGCTAACGGGGCGTACCACATTGCTCTGAAAGGCCAGCTACTCCTGAACCAC
    CTCAAGGAGAGCAAAGATCTGAAGCTGCAGAACGGCATTTCCAACCAAGACTGGCTCGCC
    TACATACAAGAACTGCGCAATTAA
SEQ ID NO: 159
ATGGCTGTCAAATCCATCAAGGTTAAATTACGGCTTGATGACATGCCCGAGATCCGCGCC
GGGCTCTGGAAACTCCATAAAGAAGTGAATGCTGGCGTTAGATACTACACAGAATGGCTC
TCCCTGCTGCGCCAGGAAAATTTGTACCGCCGGTCACCTAATGGAGATGGAGAGCAGGAA
TGCGATAAAACAGCAGAAGAGTGCAAAGCCGAATTGCTGGAGCGACTGCGGGCACGGCAG
    GTTGAGAATGGACACCGAGGTCCGGCGGGATCGGACGACGAGCTGCTCCAGCTCGCCAGA
    CAATTATATGAACTGCTGGTGCCTCAGGCTATTGGGGCAAAGGGTGACGCACAGCAGATT
    GCTAGAAAATTTCTGTCTCCCCTCGCCGACAAAGACGCTGTCGGCGGCCTTGGGATAGCC
    AAAGCCGGCAACAAACCCCGATGGGTGCGCATGAGGGAGGCTGGTGAGCCTGGCTGGGAG
    GAAGAAAAGGAAAAGGCCGAAACCAGAAAGTCCGCCGACAGGACCGCGGACGTACTCCGA
    GCATTGGCCGATTTTGGGCTGAAGCCCTTAATGCGAGTCTACACCGATAGTGAAATGTCT
    AGCGTGGAGTGGAAGCCATTACGCAAAGGGCAGGCAGTGCGGACGTGGGACCGTGACATG
    TTCCAGCAAGCCATCGAGCGAATGATGAGCTGGGAGAGCTGGAACCAGAGAGTGGGGCAG
    GAGTATGCCAAGCTGGTCGAGCAGAAAAACCGGTTTGAGCAAAAAAATTTTGTAGGTCAG
    GAACACCTGGTGCATCTCGTTAACCAGCTCCAGCAAGATATGAAGGAAGCTTCGCCTGGA
    TTAGAGAGCAAAGAGCAGACTGCACACTATGTAACCGGAAGAGCACTGAGGGGCAGTGAC
    AAAGTGTTCGAAAAATGGGGAAAACTGGCTCCCGATGCCCCCTTTGACCTGTACGACGCA
    GAAATAAAAAACGTGCAGCGGCGAAACACCAGGCGATTTGGTAGCCATGATCTGTTCGCC
    AAATTGGCAGAGCCGGAATATCAGGCTCTTTGGCGAGAAGACGCATCATTTCTCACTAGG
    TACGCGGTCTATAACTCCATTTTGAGGAAATTGAACCACGCAAAAATGTTTGCCACCTTC
    ACGTTGCCTGACGCCACCGCTCATCCCATTTGGACACGGTTTGATAAGCTGGGCGGCAAT
    CTGCATCAGTATACATTCCTGTTTAACGAGTTTGGAGAGCGAAGACATGCGATACGATTC
    CACAAGCTACTGAAGGTCGAAAATGGCGTGGCACGTGAGGTGGACGATGTCACCGTGCCC
    ATCAGCATGAGCGAACAGCTGGATAATTTGTTGCCGCGGGACCCAAATGAACCTATAGCC
    CTTTATTTTAGGGACTACGGGGCGGAGCAACATTTCACTGGGGAGTTTGGCGGCGCAAAA
    ATTCAGTGCCGACGCGACCAGCTCGCCCACATGCATAGAAGACGCGGGGCCCGGGACGTA
    TACCTTAACGTCTCTGTGAGGGTGCAGTCCCAGTCAGAGGCAAGAGGGGAACGCAGACCA
    CCTTACGCAGCAGTATTCAGGCTGGTAGGCGATAACCACCGGGCGTTTGTACACTTTGAT
    AAACTTTCTGACTACCTGGCCGAACACCCGGATGACGGCAAATTAGGATCGGAGGGGCTG
    CTTAGCGGCCTGCGTGTGATGAGCGTCGATCTGGGGCTACGGACCTCTGCTTCCATCTCT
    GTGTTCCGTGTGGCCCGAAAGGACGAGTTGAAACCTAATTCGAAGGGCCGTGTACCATTC
    TTTTTCCCTATTAAGGGAAATGATAATCTCGTCGCGGTGCACGAGCGTTCCCAACTGCTG
    AAACTGCCTGGCGAGACCGAGTCCAAAGATCTCAGAGCAATCCGGGAGGAGCGACAACGT
    ACACTTAGGCAACTCCGCACCCAGCTGGCCTATCTGCGCTTGCTGGTGCGGTGCGGCTCC
    GAGGATGTAGGGAGAAGAGAGCGAAGCTGGGCAAAGCTGATAGAGCAACCAGTTGACGCC
    GCGAATCACATGACCCCCGACTGGCGCGAAGCGTTTGAAAATGAGCTGCAGAAGTTGAAA
    TCTCTGCATGGGATTTGCTCAGATAAGGAGTGGATGGACGCCGTATACGAGTCTGTTCGC
    CGGGTATGGCGGCACATGGGGAAGCAGGTGAGAGATTGGAGAAAGGACGTTCGCTCTGGG
    GAACGGCCGAAAATTCGGGGATACGCAAAGGATGTCGTGGGCGGCAATAGCATTGAGCAG
    ATCGAGTACCTGGAAAGGCAATACAAATTTCTGAAATCTTGGTCTTTCTTTGGGAAGGTA
    AGCGGACAAGTTATCAGAGCCGAAAAGGGATCTCGCTTTGCTATCACATTGAGGGAACAC
    ATTGATCACGCCAAAGAAGACAGGTTGAAAAAGTTGGCTGATCGCATTATCATGGAAGCA
    CTCGGTTACGTCTACGCCCTTGATGAGCGCGGTAAAGGGAAGTGGGTAGCCAAGTATCCC
    CCATGTCAGCTGATCCTGCTCGAGGAACTTTCTGAGTATCAGTTCAATAACGACCGTCCT
    CCCTCCGAAAATAATCAGCTCATGCAATGGTCCCACCGGGGTGTGTTCCAAGAACTGATC
    AATCAGGCTCAGGTGCACGACCTCCTCGTAGGCACTATGTATGCAGCCTTTAGCTCCCGT
    TTTGACGCGCGCACAGGCGCCCCTGGAATACGATGTAGGCGAGTTCCCGCACGGTGCACT
    CAAGAACATAACCCGGAGCCTTTCCCATGGTGGCTCAATAAGTTTGTTGTGGAGCATACC
    CTCGACGCTTGCCCATTGAGGGCGGATGACTTGATTCCCACAGGCGAGGGGGAGATCTTC
    GTGAGCCCATTTTCTGCCGAAGAAGGGGATTTCCACCAAATACATGCCGACTTGAATGCT
    GCCCAAAATCTGCAGCAAAGGCTGTGGTCAGACTTCGACATCTCGCAAATCAGACTGCGG
    TGTGACTGGGGCGAAGTAGACGGCGAGCTGGTGCTGATACCTAGACTGACGGGTAAGCGT
    ACCGCCGATAGCTATAGTAATAAGGTTTTTTATACGAATACGGGGGTGACATATTACGAG
    CGTGAGAGAGGCAAGAAGCGTCGGAAGGTGTTCGCGCAGGAGAAGCTGAGCGAAGAGGAG
    GCGGAGCTACTGGTAGAGGCAGATGAGGCAAGAGAAAAGTCCGTCGTCCTGATGCGGGAT
    CCTAGCGGGATTATTAACAGAGGTAATTGGACACGGCAGAAAGAATTCTGGAGCATGGTG
    AATCAAAGAATCGAGGGTTACCTGGTGAAGCAAATTCGAAGCCGGGTGCCCCTTCAAGAC
    AGCGCATGTGAAAACACTGGGGACATCTAG
SEQ ID NO: 160
ATGGCTACTCGGTCCTTCATCCTGAAAATCGAGCCAAATGAAGAGGTGAAAAAGGGCCTG
TGGAAGACCCATGAGGTACTTAACCACGGCATAGCATACTATATGAATATCCTAAAACTT
ATACGGCAGGAGGCTATCTACGAGCATCACGAGCAAGATCCTAAAAATCCAAAGAAGGTT
AGTAAGGCTGAAATCCAGGCTGAATTGTGGGACTTCGTGCTGAAGATGCAGAAATGCAAC
    AGTTTCACGCATGAAGTTGATAAGGACGTCGTGTTTAATATACTCCGGGAGCTGTACGAA
    GAACTGGTACCAAGCTCTGTGGAAAAGAAAGGAGAGGCCAACCAGCTAAGTAATAAGTTC
    CTCTATCCTCTCGTGGACCCCAATTCACAGAGCGGCAAAGGTACCGCATCTTCTGGGAGG
    AAACCACGCTGGTACAACTTGAAGATCGCTGGCGATCCCAGCTGGGAGGAGGAAAAGAAG
    AAATGGGAAGAGGATAAAAAGAAAGACCCCCTGGCCAAAATCTTAGGCAAGCTCGCCGAG
    TACGGTCTGATTCCACTTTTCATCCCGTTCACAGATAGCAATGAGCCGATCGTCAAGGAG
    ATTAAGTGGATGGAAAAGAGCCGCAATCAGAGTGTGCGGAGGCTGGACAAAGACATGTTT
    ATTCAGGCCCTGGAACGCTTCCTTAGCTGGGAAAGCTGGAACCTGAAGGTTAAGGAAGAG
    TACGAAAAAGTCGAGAAGGAGCATAAGACTTTGGAGGAGCGCATCAAAGAAGACATCCAG
    GCCTTTAAGTCTCTAGAACAGTATGAGAAAGAACGGCAGGAACAGCTGCTGCGTGATACA
    CTGAACACAAACGAATATCGCCTGAGCAAGAGGGGACTCAGAGGCTGGAGAGAAATCATT
    CAAAAGTGGCTCAAAATGGATGAAAATGAGCCGTCTGAAAAATACCTTGAAGTTTTCAAG
    GACTACCAGCGGAAGCACCCTAGAGAAGCCGGCGACTATAGTGTTTACGAATTCTTGAGC
    AAGAAGGAGAATCATTTTATATGGAGGAATCACCCGGAGTACCCATATCTGTACGCAACC
    TTCTGCGAAATCGACAAGAAAAAAAAAGACGCCAAGCAACAGGCTACATTTACTCTGGCC
    GACCCTATCAATCACCCTCTATGGGTCCGGTTTGAGGAGCGCTCCGGAAGCAATCTGAAT
    AAATATCGTATTCTGACTGAACAGTTACACACAGAGAAGCTCAAGAAGAAACTTACGGTG
    CAGCTGGACCGCCTGATATACCCAACAGAGTCCGGAGGATGGGAAGAGAAAGGAAAGGTT
    GACATCGTACTGCTTCCATCTCGTCAGTTTTACAACCAGATATTCCTGGACATCGAGGAG
    AAGGGGAAACACGCCTTCACATACAAGGACGAGTCCATAAAGTTCCCACTGAAGGGTACT
    TTAGGCGGTGCTAGGGTGCAGTTCGACCGCGATCACCTGAGACGGTACCCCCACAAGGTG
    GAGAGCGGGAACGTGGGACGAATCTACTTTAATATGACAGTGAACATTGAACCCACAGAG
    AGTCCAGTTAGTAAATCCCTGAAAATTCACCGTGACGACTTTCCGAAATTTGTGAATTTC
    AAGCCAAAGGAGCTTACGGAGTGGATCAAGGATTCAAAGGGAAAGAAGCTGAAATCTGGT
    ATCGAATCTCTCGAGATCGGTCTCCGTGTCATGAGCATCGATCTGGGACAGCGCCAGGCA
    GCTGCCGCCAGTATATTCGAGGTGGTAGACCAAAAGCCTGACATCGAGGGAAAGCTCTTC
    TTCCCAATCAAAGGCACAGAGCTGTATGCGGTGCACCGGGCGTCCTTTAATATAAAGCTG
    CCCGGTGAAACCCTGGTGAAGTCACGGGAGGTGCTTAGAAAAGCGCGAGAGGATAACCTC
    AAACTGATGAACCAAAAACTGAACTTTCTGAGGAACGTCCTGCACTTTCAGCAGTTCGAA
    GATATTACCGAACGCGAAAAGAGAGTAACCAAGTGGATATCTCGTCAAGAGAACAGCGAC
    GTCCCGTTAGTCTATCAGGACGAACTCATCCAAATACGGGAGTTGATGTATAAGCCCTAC
    AAGGATTGGGTCGCCTTTCTTAAGCAGCTTCACAAACGCCTAGAGGTCGAAATAGGTAAA
    GAGGTGAAACATTGGCGGAAGTCGCTCAGCGACGGGAGGAAGGGACTTTATGGCATCTCT
    TTGAAGAACATTGACGAAATCGATAGAACCAGAAAATTTTTGTTGAGATGGTCCCTCCGA
    CCCACCGAGCCTGGAGAGGTGAGGCGGTTAGAACCAGGACAGAGGTTCGCTATCGATCAG
    CTGAATCACCTCAATGCTCTGAAGGAGGACCGCCTCAAGAAAATGGCCAATACAATCATA
    ATGCACGCCCTTGGCTACTGCTACGACGTCCGAAAGAAGAAGTGGCAGGCCAAGAATCCC
    GCCTGTCAAATTATCCTTTTTGAGGATCTTAGCAATTACAACCCCTATGAAGAGCGGTCC
    AGATTCGAAAATAGTAAGCTCATGAAGTGGAGCCGCAGGGAGATCCCGCGCCAAGTGGCC
    CTTCAGGGGGAAATTTATGGGCTGCAGGTAGGCGAGGTCGGGGCCCAATTCTCCTCGCGC
    TTTCATGCGAAAACTGGAAGTCCTGGAATCCGGTGCTCAGTGGTGACAAAGGAGAAGTTG
    CAAGACAATCGGTTTTTTAAAAACTTACAGCGGGAGGGAAGGCTGACCCTGGATAAGATA
    GCCGTACTTAAGGAAGGAGATCTGTACCCTGACAAAGGCGGTGAAAAGTTCATTAGCTTG
    AGCAAGGACCGAAAACTTGTGACCACCCACGCTGACATCAATGCGGCACAGAACCTGCAG
    AAGAGATTTTGGACTCGCACCCACGGATTCTACAAAGTTTACTGCAAAGCATATCAAGTA
    GACGGACAGACCGTATACATCCCCGAGTCCAAAGATCAGAAGCAGAAAATTATTGAAGAG
    TTTGGGGAAGGGTACTTTATCCTGAAGGATGGTGTCTACGAATGGGGCAACGCTGGTAAA
    CTTAAAATTAAGAAGGGCAGCTCTAAACAGTCCTCCAGCGAGTTAGTTGATTCTGATATT
    CTGAAAGACAGTTTCGACCTGGCCAGCGAACTTAAAGGGGAAAAATTAATGCTGTACCGG
    GACCCCAGCGGAAACGTCTTTCCATCCGATAAGTGGATGGCCGCTGGAGTGTTCTTTGGC
    AAGTTAGAGAGGATTCTCATAAGTAAGCTGACCAACCAATACTCAATCTCCACAATCGAG
    GATGACTCATCCAAGCAGTCTATGTGA
SEQ ID NO: 161
ATGCCTACACGCACTATCAACCTGAAACTGGTTCTTGGCAAGAATCCAGAGAATGCTACC
CTTCGTCGGGCACTATTTTCAACGCATAGACTGGTGAATCAGGCTACCAAACGGATTGAA
GAGTTCCTCTTGCTTTGTCGGGGGGAAGCATATAGGACGGTGGATAATGAGGGGAAAGAG
GCTGAAATTCCGAGACACGCCGTGCAGGAGGAAGCTCTTGCGTTTGCAAAGGCCGCTCAA
    CGGCACAATGGTTGCATCTCTACTTATGAAGACCAGGAAATCCTGGATGTGCTCCGGCAA
    CTGTATGAAAGGCTGGTGCCTTCTGTGAATGAAAATAATGAAGCAGGGGACGCTCAAGCC
    GCAAACGCGTGGGTGTCGCCACTGATGTCCGCCGAGTCCGAGGGAGGGCTCAGCGTTTAC
    GACAAGGTGCTGGACCCACCCCCAGTGTGGATGAAACTCAAAGAGGAAAAAGCTCCGGGC
    TGGGAGGCTGCTTCCCAGATCTGGATCCAGTCCGACGAAGGGCAGTCCCTTCTTAACAAG
    CCTGGTTCGCCCCCGCGGTGGATTAGGAAACTGAGGTCAGGCCAGCCTTGGCAGGACGAT
    TTTGTTAGCGACCAGAAAAAGAAGCAGGACGAGCTGACAAAGGGGAATGCGCCACTGATC
    AAACAATTAAAGGAAATGGGCTTATTGCCTCTTGTGAATCCCTTTTTTAGACATCTGCTT
    GACCCGGAGGGGAAGGGGGTGTCACCTTGGGACAGACTCGCTGTTAGGGCCGCTGTCGCT
    CATTTCATATCATGGGAATCATGGAACCACCGGACACGCGCCGAATACAATAGTTTGAAG
    CTGCGGAGGGATGAGTTCGAAGCAGCTTCCGACGAATTCAAGGACGACTTCACGCTGCTT
    CGGCAGTACGAGGCTAAGAGGCACTCCACACTGAAGAGTATAGCTTTAGCCGATGATTCA
    AACCCTTATAGGATCGGCGTACGCTCCCTCCGCGCTTGGAACCGCGTCCGCGAGGAGTGG
    ATCGACAAGGGAGCGACCGAGGAGCAGCGGGTCACCATTCTCAGCAAGTTGCAGACCCAA
    CTAAGGGGCAAATTTGGAGATCCTGACTTGTTCAACTGGCTGGCGCAGGACCGGCACGTG
    CACCTCTGGAGCCCTAGAGATAGTGTTACCCCACTGGTTAGGATCAACGCTGTTGACAAA
    GTATTGCGACGGAGAAAACCGTACGCCTTGATGACTTTTGCCCACCCAAGATTCCACCCT
    CGGTGGATACTTTACGAAGCCCCAGGGGGCAGCAATCTCCGCCAGTATGCACTGGATTGT
    ACCGAAAATGCTCTGCACATTACACTGCCTCTGCTGGTTGACGATGCACATGGCACATGG
    ATTGAGAAAAAAATTAGGGTTCCTCTTGCCCCCAGCGGCCAGATTCAGGACCTGACACTA
    GAAAAGCTCGAGAAGAAGAAAAATCGTCTCTACTACCGTTCTGGGTTCCAGCAGTTTGCC
    GGCCTGGCCGGAGGTGCCGAGGTGCTTTTCCATCGACCATACATGGAGCACGATGAGAGG
    AGCGAGGAGAGCTTATTAGAACGCCCTGGTGCTGTTTGGTTCAAACTCACCTTGGACGTG
    GCAACCCAGGCCCCTCCAAACTGGTTGGACGGAAAGGGCCGCGTCCGAACGCCCCCCGAG
    GTTCACCACTTCAAGACAGCCCTCAGTAACAAGTCTAAGCACACACGGACCCTCCAGCCC
    GGACTCAGAGTGTTATCCGTGGATCTGGGAATGCGCACCTTCGCCTCTTGCTCCGTATTT
    GAGCTGATCGAGGGCAAACCAGAGACTGGCAGAGCGTTCCCTGTGGCCGACGAACGTTCC
    ATGGATTCACCAAACAAGCTGTGGGCCAAGCACGAAAGATCCTTTAAACTCACGCTCCCC
    GGCGAAACCCCCAGTCGGAAAGAAGAGGAGGAACGGAGCATTGCAAGAGCCGAAATCTAT
    GCGTTGAAAAGAGATATTCAGAGATTAAAAAGTCTTCTGCGCCTGGGGGAAGAGGATAAC
    GATAATAGACGCGATGCACTTCTTGAGCAATTTTTCAAGGGCTGGGGCGAGGAAGACGTG
    GTTCCAGGTCAGGCCTTTCCCCGGAGTCTGTTCCAGGGGCTGGGGGCCGCCCCATTCAGA
    TCCACCCCTGAGTTGTGGAGACAACACTGTCAAACCTATTATGATAAAGCAGAGGCGTGC
    CTGGCTAAACACATCAGCGATTGGCGCAAGAGAACCAGGCCTAGGCCTACCTCACGTGAG
    ATGTGGTACAAGACACGCTCTTATCACGGCGGAAAGTCAATCTGGATGCTGGAATACCTC
    GACGCTGTGAGGAAACTGCTCTTATCCTGGAGCCTCAGAGGCCGGACCTACGGGGCTATC
    AACAGACAGGACACAGCAAGGTTCGGGAGCTTAGCCAGCCGGCTCCTTCACCACATTAAC
    TCACTCAAAGAGGATCGAATAAAGACCGGAGCCGACTCGATCGTGCAGGCAGCCCGAGGG
    TACATCCCCCTGCCTCATGGGAAGGGCTGGGAGCAGCGATATGAACCCTGCCAGCTGATC
    TTGTTTGAGGACCTTGCCCGTTATAGATTTCGCGTTGATAGACCTCGCCGTGAGAATTCT
    CAGCTGATGCAGTGGAACCACAGAGCGATCGTGGCTGAGACCACTATGCAGGCCGAGCTG
    TATGGACAGATCGTGGAGAACACCGCCGCAGGGTTCAGTTCTCGGTTTCATGCTGCCACC
    GGAGCTCCCGGCGTCCGGTGCCGCTTCCTCTTAGAGCGTGATTTTGACAATGACCTCCCA
    AAGCCCTATCTGCTGAGGGAACTGAGCTGGATGCTGGGGAACACAAAAGTAGAATCGGAG
    GAGGAGAAGCTACGGCTCCTCTCCGAAAAGATACGTCCAGGCTCTCTGGTACCATGGGAC
    GGAGGAGAGCAGTTCGCGACACTGCATCCTAAGAGACAGACGTTATGTGTGATTCACGCC
    GATATGAACGCCGCTCAGAATCTGCAGCGAAGATTCTTTGGCCGCTGCGGCGAAGCCTTC
    AGGCTGGTATGTCAGCCCCACGGGGATGATGTGCTGCGGCTGGCCTCAACCCCTGGGGCT
    AGACTCTTGGGGGCACTCCAGCAGCTGGAAAATGGCCAAGGGGCTTTCGAACTCGTTCGG
    GACATGGGCAGCACAAGCCAGATGAACAGATTCGTCATGAAGAGCCTGGGAAAGAAAAAG
    ATCAAACCCTTACAGGACAATAATGGCGACGACGAACTGGAGGACGTGTTGTCCGTGCTG
    CCAGAGGAAGACGACACAGGCCGCATCACTGTCTTCCGCGACTCAAGTGGGATATTCTTT
    CCTTGCAACGTGTGGATTCCGGCCAAACAGTTCTGGCCTGCCGTCAGAGCCATGATTTGG
    AAAGTGATGGCTAGTCATTCATTGGGATGA
SEQ ID NO: 162
ATGACAAAGCTGAGGCACAGACAAAAGAAGCTTACACACGACTGGGCAGGGAGCAAGAAA
CGTGAGGTCCTTGGGTCAAATGGAAAACTGCAGAACCCCTTGCTCATGCCTGTAAAGAAG
GGGCAGGTAACAGAATTTAGAAAAGCATTCTCCGCGTACGCTCGGGCAACTAAGGGGGAA
ATGACCGATGGACGGAAGAACATGTTCACCCATTCTTTCGAGCCATTCAAAACAAAGCCG
    TCATTGCACCAATGCGAGCTGGCCGATAAGGCTTACCAGTCTTTGCATAGTTACCTCCCC
    GGTTCCCTGGCCCATTTCTTGCTTTCCGCACACGCACTGGGCTTTCGTATTTTCTCTAAA
    TCTGGGGAGGCAACTGCCTTCCAGGCCAGCTCAAAAATCGAGGCCTATGAGTCCAAGCTC
    GCTTCGGAGCTAGCCTGTGTCGATTTGAGTATCCAGAATTTGACGATTAGTACTCTTTTC
    AACGCTCTCACAACTTCAGTTCGGGGCAAGGGGGAGGAAACTTCAGCAGATCCCCTTATC
    GCACGGTTCTACACTCTCCTGACGGGCAAGCCCCTGAGCCGAGACACACAGGGCCCAGAA
    CGGGACTTGGCAGAGGTCATCTCCAGAAAGATCGCCTCGTCCTTCGGCACATGGAAGGAA
    ATGACTGCCAACCCTCTGCAGAGCCTCCAGTTCTTCGAAGAAGAGCTTCATGCACTAGAT
    GCCAACGTGTCTTTATCTCCAGCTTTTGATGTGTTAATCAAGATGAATGATCTCCAAGGT
    GATCTGAAGAACCGTACTATAGTGTTCGACCCAGATGCACCCGTGTTCGAGTACAACGCT
    GAGGATCCAGCCGATATCATCATAAAGCTGACAGCTCGGTATGCGAAGGAGGCCGTCATC
    AAGAATCAGAACGTGGGCAATTATGTGAAAAACGCCATTACCACCACTAATGCCAATGGG
    CTGGGGTGGCTCCTCAATAAAGGGCTTTCACTACTGCCAGTTTCTACTGACGATGAGCTG
    CTCGAATTCATTGGGGTGGAGAGAAGCCATCCCAGCTGTCACGCGCTGATAGAGCTGATT
    GCCCAGCTAGAGGCGCCGGAACTGTTTGAGAAGAATGTGTTTAGTGACACCCGTTCCGAG
    GTTCAGGGTATGATCGACAGTGCAGTGTCGAACCACATTGCTCGGCTGTCCAGCAGCCGA
    AACTCCCTGAGCATGGACAGCGAGGAATTGGAACGCTTGATTAAATCTTTCCAGATTCAT
    ACTCCCCATTGTTCTCTGTTCATAGGCGCTCAGTCCTTATCTCAGCAGCTGGAGAGCTTA
    CCTGAGGCGCTGCAGTCCGGAGTGAACAGCGCTGATATCTTATTAGGCAGCACACAGTAT
    ATGCTGACCAACTCTCTCGTTGAAGAGTCAATTGCAACATATCAAAGGACATTAAATAGG
    ATCAATTACCTGAGTGGGGTGGCTGGGCAGATTAACGGTGCTATCAAAAGAAAGGCAATC
    GACGGCGAAAAAATACACCTGCCTGCCGCCTGGAGTGAGCTCATCTCCTTACCTTTCATT
    GGACAGCCGGTGATTGATGTGGAGAGCGACCTGGCACACTTAAAAAACCAGTACCAGACC
    CTGTCCAATGAATTTGACACCCTCATTTCGGCCCTGCAGAAGAACTTCGATTTGAATTTC
    AACAAAGCACTCCTTAACCGCACGCAGCATTTCGAGGCAATGTGCCGGAGCACAAAAAAA
    AATGCTTTATCTAAGCCCGAGATCGTGTCCTACAGAGATCTGCTGGCGCGGCTGACCAGT
    TGCCTTTATCGAGGCTCGCTGGTTCTCAGAAGGGCGGGAATCGAAGTTCTGAAAAAGCAC
    AAAATCTTTGAGTCGAATAGTGAGCTGAGAGAACACGTCCACGAGCGAAAGCACTTCGTG
    TTCGTTAGTCCATTGGACAGAAAGGCAAAAAAACTGTTGCGCCTGACCGATTCCCGCCCT
    GACTTGCTCCATGTGATCGATGAGATCCTGCAACATGACAATCTGGAGAATAAGGACAGA
    GAGTCCCTTTGGCTGGTCCGGTCTGGGTACCTCCTTGCTGGTCTGCCGGACCAGCTGAGT
    TCTTCGTTTATCAATCTCCCCATAATCACGCAAAAGGGCGATCGCCGGCTGATTGACCTG
    ATTCAGTATGACCAGATCAATCGCGATGCTTTCGTAATGTTGGTGACAAGTGCTTTCAAA
    AGCAATCTCTCTGGGTTGCAGTACCGCGCTAACAAGCAGTCTTTCGTGGTCACCCGCACC
    CTGTCTCCTTACCTGGGTAGTAAGCTCGTATACGTCCCTAAAGACAAAGATTGGCTGGTC
    CCATCCCAGATGTTTGAGGGAAGATTCGCCGATATTCTGCAGAGTGACTACATGGTCTGG
    AAGGATGCCGGACGCCTGTGCGTGATCGACACTGCCAAACATCTCTCTAACATTAAAAAA
    AGCGTGTTTAGTAGCGAAGAAGTCCTTGCTTTTCTTCGAGAGCTGCCTCACCGGACCTTC
    ATCCAGACCGAGGTACGGGGGTTAGGAGTGAACGTCGATGGAATCGCATTTAATAACGGG
    GATATCCCGAGCTTGAAGACATTCTCGAATTGTGTGCAGGTGAAGGTGAGTAGGACTAAT
    ACTAGTCTCGTGCAGACTCTAAACAGGTGGTTCGAGGGTGGCAAAGTGTCACCTCCCTCT
    ATTCAGTTCGAAAGAGCTTACTACAAAAAAGACGATCAGATTCACGAGGACGCAGCCAAG
    AGAAAGATACGCTTCCAGATGCCAGCAACGGAATTAGTGCACGCCAGCGATGACGCTGGT
    TGGACCCCCAGCTACCTGCTGGGCATCGACCCCGGTGAGTACGGAATGGGTCTCAGTTTG
    GTGTCCATCAACAATGGAGAGGTCCTGGATTCTGGATTCATCCACATTAATTCCCTGATC
    AATTTCGCGTCCAAAAAAAGCAATCACCAGACCAAAGTAGTCCCCCGCCAGCAGTACAAG
    TCCCCCTACGCGAATTATCTCGAGCAGTCAAAGGATTCAGCAGCAGGGGATATAGCTCAC
    ATTCTGGATCGGCTAATCTACAAATTGAACGCCTTGCCTGTGTTCGAGGCGCTGTCTGGC
    AACAGTCAGAGTGCTGCTGATCAGGTATGGACCAAAGTTCTATCCTTCTATACATGGGGA
    GACAACGACGCACAGAACAGTATACGGAAGCAGCACTGGTTCGGTGCCTCACACTGGGAT
    ATTAAGGGGATGCTGCGCCAACCCCCAACCGAAAAAAAACCCAAACCATATATAGCCTTT
    CCCGGGAGTCAAGTGTCATCCTATGGAAATAGTCAAAGGTGTAGTTGTTGCGGCCGCAAT
    CCCATTGAGCAGTTGCGTGAGATGGCAAAGGACACGAGTATCAAGGAGCTGAAAATCCGA
    AATAGTGAGATCCAACTATTCGATGGTACAATCAAGCTGTTTAACCCCGACCCTTCCACC
    GTCATCGAGAGGCGGCGGCATAACCTAGGACCCTCACGCATTCCTGTGGCAGACCGAACT
    TTCAAGAATATTAGCCCTTCTTCGTTAGAGTTCAAGGAGCTCATTACTATCGTTTCTCGA
    AGCATCCGCCATAGCCCCGAATTTATTGCTAAGAAACGGGGTATCGGGTCTGAGTACTTT
    TGTGCTTATTCTGACTGCAACTCCTCACTGAACTCAGAGGCCAATGCCGCGGCCAATGTG
    GCACAGAAGTTTCAGAAGCAACTCTTTTTCGAACTCTGA
SEQ ID NO: 163
ATGAAACGTATTCTGAACTCTCTGAAAGTCGCCGCACTGAGGCTGCTGTTTCGAGGAAAG
GGCTCAGAGCTGGTGAAGACCGTCAAGTACCCTCTGGTTTCGCCCGTCCAGGGTGCTGTG
GAAGAACTCGCCGAAGCAATACGCCACGACAACCTACATTTATTTGGGCAGAAGGAAATC
GTAGATCTGATGGAGAAGGACGAGGGCACCCAGGTCTACTCGGTGGTGGACTTTTGGCTC
    GACACACTCCGTCTAGGGATGTTCTTCAGTCCAAGTGCTAATGCCCTTAAGATCACTCTG
    GGGAAGTTTAACAGCGACCAAGTTTCCCCTTTCAGGAAGGTTCTGGAGCAGTCCCCTTTC
    TTTCTCGCGGGTAGACTCAAAGTGGAGCCCGCTGAACGTATCCTCAGCGTGGAGATCCGC
    AAGATCGGTAAGAGGGAGAATAGAGTGGAGAACTACGCCGCAGATGTAGAGACTTGTTTT
    ATCGGTCAGCTGTCTAGTGATGAAAAGCAGTCTATCCAGAAGCTCGCTAACGATATCTGG
    GACTCTAAGGATCACGAAGAGCAAAGGATGCTTAAGGCGGATTTCTTTGCCATTCCCCTC
    ATCAAAGACCCAAAGGCAGTGACCGAGGAAGATCCCGAGAATGAAACCGCAGGCAAACAG
    AAGCCTCTCGAATTATGTGTGTGCTTAGTGCCCGAGTTGTACACCCGCGGGTTCGGTTCA
    ATAGCGGACTTCCTGGTCCAGCGTCTGACACTATTAAGAGACAAAATGAGCACAGACACA
    GCAGAAGACTGCCTTGAGTATGTCGGCATAGAGGAGGAGAAGGGTAATGGGATGAACTCG
    CTGCTGGGGACGTTCCTCAAGAACCTGCAGGGAGACGGGTTCGAACAGATCTTCCAATTT
    ATGCTCGGCAGTTACGTGGGATGGCAAGGTAAGGAAGACGTCCTACGCGAACGGCTTGAT
    TTGCTAGCGGAGAAGGTTAAAAGACTGCCGAAACCTAAGTTTGCCGGCGAGTGGTCCGGC
    CATCGGATGTTCCTGCATGGTCAATTGAAGAGCTGGTCCTCTAACTTTTTCCGCCTGTTT
    AACGAGACTAGGGAGCTCCTCGAAAGCATAAAATCCGACATCCAACACGCGACCATGTTA
    ATCAGCTACGTCGAAGAGAAAGGGGGATACCACCCACAACTCTTGTCACAGTACAGGAAA
    CTAATGGAGCAGCTGCCAGCTCTCAGAACAAAGGTGTTAGATCCAGAGATAGAAATGACT
    CACATGAGCGAGGCGGTAAGGTCGTACATTATGATCCACAAGTCGGTAGCAGGATTTCTG
    CCTGACTTACTCGAGTCCCTCGATAGGGACAAGGACAGGGAATTCCTGCTGAGTATATTT
    CCAAGGATCCCCAAAATTGACAAAAAAACTAAGGAAATCGTGGCCTGGGAGCTCCCAGGC
    GAGCCCGAAGAAGGATACCTGTTCACTGCCAATAATCTTTTTCGCAACTTTCTGGAGAAT
    CCTAAACATGTTCCACGTTTCATGGCAGAAAGGATCCCGGAAGATTGGACGCGCCTGCGG
    TCCGCTCCCGTATGGTTTGACGGCATGGTGAAACAATGGCAGAAAGTGGTAAACCAGCTG
    GTGGAGTCACCTGGAGCATTGTATCAGTTCAATGAAAGCTTTCTCCGACAACGTTTACAG
    GCAATGCTGACAGTGTATAAGAGAGACCTGCAGACAGAGAAATTCCTTAAGTTGTTGGCT
    GATGTCTGCAGGCCTCTGGTGGACTTCTTTGGGCTGGGGGGAAACGATATCATCTTCAAA
    AGCTGCCAGGACCCGAGGAAACAATGGCAAACTGTCATTCCCTTGAGTGTCCCCGCTGAT
    GTGTACACCGCGTGTGAGGGGCTGGCAATCCGGCTTCGTGAGACATTGGGATTTGAGTGG
    AAGAACCTTAAGGGCCATGAAAGGGAGGACTTTCTAAGACTGCACCAGCTTTTAGGGAAT
    CTGCTTTTCTGGATTCGAGATGCCAAACTGGTGGTGAAATTGGAAGATTGGATGAATAAT
    CCCTGTGTTCAGGAGTACGTTGAGGCTCGTAAGGCCATTGATCTCCCACTGGAGATCTTC
    GGCTTTGAGGTCCCCATCTTCCTGAACGGATATCTGTTTAGTGAACTGAGGCAGTTAGAA
    CTGCTGCTCCGCCGTAAGTCGGTTATGACCAGCTATTCGGTTAAGACAACTGGCAGTCCA
    AACAGGCTTTTCCAGTTAGTCTACCTGCCATTAAATCCTTCCGACCCTGAGAAAAAAAAT
    TCTAATAACTTTCAGGAACGCCTGGACACCCCCACTGGCTTATCACGTCGCTTCCTGGAC
    CTTACTCTGGACGCCTTCGCCGGCAAGTTGCTGACAGACCCCGTGACTCAAGAGCTTAAA
    ACTATGGCTGGGTTCTACGATCACCTGTTTGGTTTCAAGCTCCCATGTAAGCTGGCAGCC
    ATGTCTAACCACCCTGGCTCTAGCAGCAAGATGGTCGTGTTGGCCAAACCTAAAAAAGGG
    GTTGCATCTAATATAGGATTCGAACCAATCCCTGATCCCGCGCACCCCGTATTCCGGGTG
    AGATCATCATGGCCAGAGCTGAAGTATCTGGAGGGGTTACTGTATCTTCCAGAAGACACT
    CCACTGACAATAGAGCTCGCAGAGACAAGTGTTAGTTGTCAGAGCGTCAGTAGCGTGGCA
    TTCGATCTGAAAAATCTGACTACTATCCTTGGACGCGTGGGTGAGTTCCGTGTGACCGCA
    GACCAGCCTTTTAAGTTGACCCCCATCATCCCTGAGAAGGAGGAGTCCTTCATAGGAAAA
    ACATATCTAGGCCTTGATGCCGGGGAACGCTCAGGCGTAGGGTTCGCTATCGTCACAGTC
    GACGGGGATGGGTACGAGGTACAGCGCCTGGGGGTGCATGAAGATACACAGCTGATGGCC
    CTACAGCAGGTGGCCTCTAAAAGCTTGAAGGAGCCGGTGTTCCAGCCGCTCAGAAAGGGT
    ACTTTTCGGCAGCAGGAACGTATTAGAAAATCTCTCAGAGGATGTTATTGGAACTTCTAT
    CACGCTCTGATGATTAAGTACCGCGCCAAGGTAGTGCACGAAGAGAGCGTGGGCAGTTCC
    GGCCTGGTTGGGCAGTGGTTACGAGCATTCCAGAAGGACCTCAAGAAAGCCGATGTGTTG
    CCAAAAAAGGGAGGCAAAAACGGAGTCGATAAGAAAAAGAGAGAGTCTTCTGCACAAGAC
    ACATTGTGGGGAGGGGCTTTTAGCAAGAAGGAAGAACAGCAGATAGCTTTCGAAGTCCAA
    GCTGCTGGTTCTAGCCAGTTCTGCCTGAAGTGCGGATGGTGGTTCCAACTCGGAATGCGT
    GAGGTTAATCGCGTGCAGGAATCCGGCGTCGTGCTGGATTGGAATCGGAGTATTGTCACA
    TTCCTGATTGAGAGCTCTGGCGAGAAAGTGTATGGGTTCTCCCCTCAGCAACTCGAAAAG
    GGGTTCAGACCAGACATTGAAACCTTCAAGAAGATGGTTCGGGATTTCATGCGCCCGCCT
    ATGTTTGACCGGAAGGGTCGCCCAGCAGCTGCCTACGAAAGGTTTGTCTTGGGACGCCGG
    CATCGGCGGTATAGATTCGACAAGGTTTTTGAAGAACGATTCGGACGATCCGCGCTATTC
    ATTTGCCCGAGGGTTGGCTGTGGCAACTTTGACCACAGCAGCGAGCAGTCAGCCGTAGTG
    CTGGCTCTAATCGGATATATTGCCGACAAAGAGGGGATGAGCGGAAAAAAGCTAGTCTAC
    GTGCGTCTGGCAGAACTAATGGCGGAATGGAAATTGAAGAAACTGGAGAGGAGTAGAGTT
    GAGGAGCAAAGCTCCGCTCAGTGA
SEQ ID NO: 164
ATGGCGGAGTCGAAGCAAATGCAGTGCAGGAAGTGTGGAGCCTCTATGAAGTACGAAGTG
ATCGGCCTCGGGAAGAAAAGCTGCAGATATATGTGTCCCGACTGCGGGAATCACACATCT
GCAAGAAAGATTCAGAATAAGAAGAAAAGGGACAAGAAGTATGGATCTGCCAGTAAAGCA
CAAAGCCAACGAATCGCAGTTGCAGGGGCCTTATACCCGGATAAAAAGGTTCAGACCATC
    AAGACTTATAAGTATCCAGCCGACCTGAATGGTGAGGTCCATGACTCAGGGGTGGCCGAA
    AAAATAGCCCAAGCAATCCAGGAGGATGAAATAGGGCTCCTCGGCCCCTCTTCCGAGTAC
    GCCTGTTGGATCGCTAGCCAGAAACAGAGCGAGCCCTACAGTGTTGTAGACTTTTGGTTT
    GACGCTGTGTGCGCCGGAGGCGTGTTCGCCTATTCTGGGGCTAGATTGCTGTCTACCGTC
    CTGCAGCTATCTGGGGAGGAGAGCGTCCTACGCGCAGCCCTGGCATCCTCCCCTTTTGTC
    GACGATATCAATCTGGCACAGGCCGAAAAATTTCTGGCGGTGTCCAGGCGAACCGGCCAA
    GATAAGCTGGGGAAGCGCATTGGAGAGTGCTTCGCAGAGGGCCGACTTGAGGCCCTAGGC
    ATCAAGGACCGGATGCGTGAATTTGTCCAGGCTATCGATGTCGCTCAGACCGCTGGGCAG
    CGTTTTGCCGCGAAACTGAAAATCTTTGGGATTTCTCAGATGCCCGAGGCAAAGCAGTGG
    AACAATGACAGCGGACTCACCGTGTGCATCCTGCCCGACTATTACGTCCCAGAAGAAAAT
    CGCGCAGATCAGTTGGTCGTCCTGCTAAGACGACTGAGAGAGATAGCATACTGTATGGGG
    ATCGAAGATGAGGCCGGTTTTGAACATCTTGGAATTGATCCTGGCGCACTATCAAATTTT
    TCCAATGGCAATCCTAAACGCGGATTTTTGGGCCGCCTGCTGAACAATGATATTATTGCC
    TTAGCGAACAACATGTCCGCCATGACGCCTTACTGGGAGGGCAGGAAGGGAGAACTGATT
    GAAAGATTGGCTTGGCTGAAGCACCGTGCAGAGGGGCTTTATCTGAAGGAACCGCATTTT
    GGAAATAGTTGGGCCGACCATAGGTCTAGAATTTTTTCCAGAATAGCCGGGTGGCTTTCT
    GGGTGCGCTGGGAAGCTAAAGATCGCCAAAGACCAGATCAGCGGAGTGCGTACTGATCTG
    TTCCTTCTGAAGAGACTGCTGGATGCGGTCCCGCAGTCCGCCCCTTCTCCCGACTTCATA
    GCCTCTATCTCTGCCTTGGATCGCTTCCTGGAGGCCGCAGAATCTAGTCAGGATCCTGCC
    GAACAGGTGAGGGCCCTATACGCCTTTCATCTGAACGCACCCGCGGTGCGAAGCATCGCC
    AACAAGGCAGTCCAGCGATCCGACAGCCAAGAATGGCTTATAAAGGAACTGGACGCTGTG
    GACCACCTGGAGTTTAACAAGGCCTTTCCCTTCTTCTCTGATACGGGAAAGAAGAAAAAG
    AAAGGGGCTAACTCGAATGGCGCTCCGTCCGAGGAGGAGTACACCGAGACTGAGAGCATC
    CAGCAGCCCGAGGACGCTGAGCAAGAGGTTAATGGTCAGGAAGGCAACGGGGCCTCGAAG
    AACCAGAAGAAGTTTCAGAGAATCCCCCGATTCTTCGGCGAGGGGAGTCGCAGCGAGTAT
    CGCATCCTCACTGAAGCCCCGCAGTACTTCGACATGTTCTGTAACAACATGCGGGCCATC
    TTTATGCAATTAGAATCCCAACCGCGTAAAGCTCCCAGGGATTTTAAGTGTTTCCTGCAG
    AATCGGCTGCAGAAATTGTATAAGCAGACATTCCTGAACGCTCGATCCAACAAGTGCCGG
    GCATTACTAGAGTCCGTATTGATTAGTTGGGGAGAGTTTTACACCTACGGGGCTAACGAG
    AAAAAATTTCGACTGCGTCATGAAGCTTCTGAGCGCTCCTCGGACCCAGATTACGTGGTG
    CAACAGGCGCTGGAGATCGCTCGGAGGCTGTTTCTCTTCGGCTTTGAGTGGAGGGACTGT
    AGCGCAGGTGAAAGAGTGGATCTGGTCGAAATACATAAGAAAGCCATATCTTTCCTGTTG
    GCCATCACTCAGGCTGAGGTGTCTGTGGGCAGCTATAACTGGCTGGGCAATTCTACCGTG
    AGTCGGTACCTGTCCGTGGCAGGGACTGATACCCTTTACGGCACCCAGCTGGAAGAATTC
    TTAAATGCAACCGTGTTATCTCAGATGCGGGGGCTGGCTATCAGGTTATCATCTCAGGAA
    CTGAAGGATGGATTTGACGTACAGCTGGAGTCTAGTTGCCAGGATAATCTGCAACACTTG
    CTCGTGTACAGGGCTTCACGAGACCTTGCCGCCTGCAAGCGCGCTACTTGTCCAGCTGAG
    TTGGATCCTAAGATTCTGGTACTGCCCGTGGGGGCCTTTATCGCTAGCGTGATGAAAATG
    ATTGAAAGAGGGGATGAGCCTTTAGCTGGAGCTTATCTGAGACACAGACCCCATAGTTTC
    GGGTGGCAGATCCGCGTTCGAGGTGTGGCAGAGGTGGGAATGGACCAAGGGACCGCCCTG
    GCGTTCCAGAAACCGACCGAGAGCGAACCCTTCAAGATAAAGCCGTTTTCCGCTCAATAC
    GGCCCCGTTCTATGGCTGAACAGCTCCAGTTATAGCCAGAGCCAGTACCTGGACGGGTTC
    CTATCACAGCCCAAGAACTGGAGTATGCGGGTGCTGCCACAGGCCGGCTCAGTGCGGGTA
    GAACAGCGCGTCGCCTTGATTTGGAATCTCCAGGCCGGAAAGATGAGGCTGGAACGGAGC
    GGAGCGCGGGCTTTCTTCATGCCCGTCCCATTCAGTTTCCGCCCCAGTGGCAGCGGCGAC
    GAGGCAGTCCTGGCTCCAAATAGGTACCTGGGACTCTTTCCACACAGCGGCGGCATAGAG
    TACGCTGTGGTCGATGTTCTTGACTCTGCCGGCTTCAAAATACTCGAGAGAGGAACAATA
    GCCGTCAATGGCTTCTCCCAGAAACGAGGAGAAAGACAAGAGGAAGCCCATCGCGAAAAA
    CAAAGACGCGGTATCTCCGATATTGGGCGCAAGAAGCCAGTCCAGGCCGAAGTCGATGCG
    GCCAACGAGCTCCATCGAAAATACACCGATGTTGCTACTCGGCTGGGGTGTCGAATTGTC
    GTTCAATGGGCACCCCAACCCAAACCAGGCACTGCGCCGACCGCTCAGACTGTGTACGCT
    AGGGCCGTGAGGACTGAAGCACCAAGATCCGGCAATCAGGAAGATCACGCCAGGATGAAA
    TCTTCCTGGGGATACACATGGGGTACGTATTGGGAAAAAAGGAAGCCCGAGGACATCCTC
    GGCATTAGTACCCAGGTGTATTGGACAGGCGGGATCGGCGAGTCCTGCCCGGCTGTCGCC
    GTCGCGCTATTGGGACACATCAGGGCCACCTCAACCCAGACTGAATGGGAGAAAGAGGAA
    GTCGTGTTTGGGCGATTGAAAAAGTTCTTCCCATCCTGA
    SEQ ID NO: 165
    ATGGAGAAGCGCATCAATAAAATTCGCAAGAAGCTGTCTGCCGATAACGCCACAAAACCA
    GTTAGTCGAAGCGGCCCAATGAAGACCCTGCTAGTTCGAGTGATGACTGATGATCTGAAG
    AAAAGGCTCGAAAAGCGACGCAAGAAGCCTGAGGTAATGCCTCAGGTTATAAGTAACAAT
    GCAGCAAACAATCTGCGGATGCTGCTTGACGATTACACAAAGATGAAGGAAGCCATTCTC
    CAGGTGTATTGGCAGGAGTTCAAGGATGATCACGTAGGCCTGATGTGTAAATTCGCGCAA
    CCTGCAAGCAAGAAGATCGACCAAAACAAGCTGAAACCCGAGATGGATGAAAAAGGCAAT
    TTAACAACCGCCGGATTCGCTTGTTCCCAGTGTGGGCAGCCACTGTTCGTGTACAAGTTA
    GAACAGGTGTCGGAAAAAGGAAAGGCATACACTAACTACTTTGGACGGTGCAATGTTGCA
    GAACACGAAAAGCTGATACTGCTTGCCCAGCTTAAGCCCGAAAAAGACAGCGACGAAGCG
    GTGACCTACAGCCTGGGAAAATTCGGGCAGCGGGCACTGGACTTCTATTCTATCCACGTT
    ACCAAGGAGAGCACCCACCCAGTGAAGCCGTTGGCCCAAATCGCTGGAAACCGGTACGCC
    AGCGGACCAGTCGGCAAGGCCCTGTCCGATGCCTGTATGGGCACAATTGCTTCTTTCCTG
    TCCAAGTACCAGGACATCATAATCGAGCACCAAAAAGTTGTGAAAGGGAATCAGAAACGC
    CTGGAATCCCTTCGAGAACTGGCCGGCAAGGAGAACCTTGAGTACCCGTCCGTGACCCTG
    CCTCCACAGCCACATACCAAAGAGGGCGTAGACGCGTATAATGAGGTCATTGCCCGCGTT
    CGCATGTGGGTTAATTTAAACCTGTGGCAGAAATTAAAACTAAGCCGAGATGATGCTAAA
    CCGTTACTGAGATTGAAGGGATTCCCTAGCTTTCCTGTGGTGGAGAGAAGGGAAAACGAG
    GTTGATTGGTGGAATACTATTAATGAGGTGAAAAAGCTTATTGACGCCAAGAGGGATATG
    GGCAGGGTGTTCTGGAGCGGGGTGACTGCCGAAAAGAGAAATACCATCCTCGAGGGATAC
    AATTACCTCCCCAACGAGAATGATCATAAGAAAAGAGAGGGGAGCTTAGAGAATCCAAAG
    AAACCTGCAAAGAGGCAATTCGGTGATCTCCTGCTCTACCTCGAGAAGAAATACGCGGGG
    GACTGGGGAAAAGTTTTTGACGAAGCCTGGGAGCGCATTGACAAGAAGATCGCCGGGCTG
    ACGTCTCACATTGAACGGGAAGAGGCACGGAATGCAGAGGACGCCCAGTCTAAGGCCGTG
    CTGACTGACTGGCTGCGCGCAAAGGCCTCCTTCGTGCTCGAACGTCTGAAGGAAATGGAT
    GAGAAAGAGTTTTACGCGTGTGAAATACAGCTGCAGAAGTGGTACGGCGATCTAAGGGGA
    AATCCCTTCGCAGTGGAAGCCGAGAATAGGGTAGTTGACATCAGTGGGTTCTCCATCGGC
    AGTGATGGACATTCTATCCAGTATAGAAACCTGCTCGCCTGGAAGTACTTAGAGAACGGC
    AAGAGAGAGTTCTATCTGCTGATGAACTACGGGAAAAAAGGTAGAATTCGCTTTACAGAT
    GGCACCGACATAAAGAAGTCCGGAAAGTGGCAAGGCCTCTTATACGGAGGCGGCAAAGCA
    AAGGTGATAGACTTGACTTTTGACCCTGACGACGAACAGCTGATAATCTTGCCGCTGGCC
    TTTGGCACAAGACAAGGTAGGGAATTTATCTGGAATGATCTTCTTTCTCTCGAGACCGGA
    CTCATCAAGCTCGCAAACGGAAGGGTCATCGAGAAGACAATCTACAATAAAAAGATAGGC
    CGAGACGAGCCAGCCCTGTTTGTGGCTTTGACATTTGAGCGGAGAGAGGTCGTAGATCCC
    AGCAACATCAAACCCGTGAACCTGATCGGTGTTGACAGGGGCGAGAACATCCCGGCGGTT
    ATCGCACTGACGGATCCAGAAGGATGTCCTCTGCCCGAGTTCAAAGATTCATCGGGAGGG
    CCAACCGACATTTTGAGGATAGGGGAGGGGTACAAGGAGAAGCAGCGAGCTATCCAGGCG
    GCCAAAGAAGTGGAGCAACGAAGAGCTGGTGGTTATTCTCGCAAGTTCGCTTCCAAAAGT
    CGTAACCTGGCTGACGATATGGTGCGCAATTCTGCCCGTGACCTTTTCTACCACGCCGTT
    ACACACGACGCCGTGTTAGTGTTTGAAAATCTTAGTCGAGGCTTCGGGCGACAGGGGAAG
    CGGACCTTTATGACCGAGAGACAGTATACAAAAATGGAGGATTGGCTGACCGCCAAACTG
    GCGTATGAAGGACTCACATCCAAGACCTATCTCTCAAAAACTTTGGCCCAGTATACATCT
    AAGACGTGCAGTAACTGTGGCTTCACCATTACCACAGCTGACTACGATGGCATGCTGGTC
    CGCTTAAAAAAGACATCTGACGGCTGGGCTACTACCCTCAACAATAAAGAGCTCAAAGCC
    GAAGGACAAATTACCTATTATAACAGGTATAAAAGACAGACTGTCGAGAAGGAGTTGAGC
    GCGGAGCTGGACCGCCTATCAGAGGAGTCAGGGAACAACGATATCTCTAAGTGGACTAAG
    GGACGCCGAGACGAGGCGTTGTTCTTGCTGAAAAAGCGGTTCTCTCATCGACCCGTGCAG
    GAGCAGTTCGTGTGTCTGGACTGCGGCCACGAGGTTCATGCTGATGAGCAAGCTGCTCTA
    AATATTGCCCGTAGTTGGTTGTTCCTGAACAGCAATTCAACAGAGTTCAAGTCATACAAG
    AGCGGAAAGCAGCCGTTTGTGGGCGCATGGCAGGCATTTTACAAAAGACGCCTGAAGGAA
    GTGTGGAAGCCAAACGCC
    SEQ ID NO: 166
    ATGAAAAGGATTAACAAAATCCGAAGGCGGCTTGTAAAGGATTCTAACACCAAAAAGGCT
    GGCAAGACGGGGCCCATGAAAACATTACTCGTTAGAGTTATGACCCCCGACCTCAGAGAG
    CGACTGGAAAATTTACGCAAGAAGCCAGAGAACATACCTCAGCCAATTAGTAATACCTCT
    CGGGCAAACCTAAACAAGTTGCTTACTGATTACACGGAGATGAAAAAGGCCATACTGCAT
    GTGTACTGGGAGGAGTTTCAAAAGGACCCTGTCGGGCTAATGAGCAGGGTGGCTCAGCCT
    GCACCTAAAAACATCGACCAGCGGAAACTCATCCCAGTTAAGGACGGAAATGAGAGATTG
    ACAAGTTCAGGTTTCGCCTGCTCACAGTGCTGTCAACCGCTGTACGTTTATAAGTTAGAA
    CAAGTGAATGACAAAGGAAAGCCTCACACAAATTATTTTGGCCGGTGTAATGTCTCTGAG
    CATGAGCGTCTGATTCTGTTGTCCCCGCATAAACCGGAAGCTAATGACGAGCTCGTAACC
    TACAGCTTGGGGAAGTTTGGCCAAAGAGCATTGGACTTCTATTCAATCCATGTGACCCGC
    GAATCCAATCATCCCGTCAAGCCCTTGGAGCAGATAGGGGGCAATAGTTGCGCTTCTGGC
    CCTGTGGGCAAAGCCCTGTCCGACGCCTGTATGGGAGCCGTGGCTTCATTCCTGACCAAA
    TATCAGGATATCATCTTGGAGCACCAGAAAGTGATCAAGAAAAATGAAAAAAGGTTAGCA
    AACCTCAAGGATATTGCAAGCGCTAACGGCTTGGCTTTTCCTAAAATCACACTTCCACCT
    CAGCCTCACACAAAGGAAGGCATCGAGGCATACAACAATGTGGTGGCCCAGATCGTCATC
    TGGGTTAACTTAAACCTGTGGCAGAAACTTAAAATTGGCAGGGATGAGGCAAAACCCTTA
    CAGCGCCTGAAAGGATTCCCCAGCTTTCCACTGGTGGAGCGCCAGGCTAACGAAGTGGAC
    TGGTGGGATATGGTGTGTAACGTCAAGAAGCTCATCAATGAAAAGAAAGAGGACGGTAAA
    GTCTTCTGGCAGAACCTCGCCGGTTACAAACGGCAGGAGGCGCTGTTACCTTATCTGTCG
    AGTGAAGAGGACCGGAAAAAAGGCAAGAAATTTGCTCGTTATCAGTTTGGTGATTTGCTC
    CTACATTTGGAGAAGAAGCACGGCGAGGACTGGGGAAAAGTATACGATGAGGCCTGGGAG
    AGGATTGACAAAAAGGTGGAGGGACTGTCAAAGCACATCAAGCTCGAAGAAGAGCGCAGA
    AGCGAGGACGCCCAATCCAAAGCAGCGCTGACTGACTGGCTGCGGGCGAAGGCCAGTTTT
    GTAATCGAAGGCCTTAAAGAAGCCGACAAGGATGAATTCTGCAGATGCGAATTAAAACTC
    CAGAAGTGGTACGGCGATCTCCGAGGTAAGCCTTTCGCAATCGAGGCCGAGAATTCCATA
    CTGGACATTAGTGGATTCAGTAAACAGTATAATTGTGCCTTTATATGGCAGAAGGATGGT
    GTCAAGAAACTCAACCTGTACCTTATTATTAATTATTTCAAAGGCGGGAAACTGAGATTT
    AAGAAGATAAAGCCTGAAGCCTTTGAGGCGAACCGATTCTACACAGTTATTAACAAGAAA
    TCTGGTGAAATTGTACCCATGGAGGTAAACTTCAACTTCGATGATCCCAATCTGATTATA
    TTGCCACTAGCTTTTGGCAAGCGGCAGGGTAGGGAATTCATTTGGAACGATTTGCTTTCA
    CTGGAAACAGGGTCCCTTAAGCTGGCAAACGGGAGAGTGATTGAAAAGACATTGTACAAT
    CGGAGGACACGTCAGGATGAACCTGCCCTTTTCGTGGCTCTGACATTCGAGCGCAGGGAG
    GTTCTGGACTCTAGCAATATCAAGCCAATGAACCTGATCGGCATAGACCGAGGAGAGAAT
    ATTCCGGCTGTGATCGCACTCACCGATCCCGAAGGATGTCCCCTTTCTCGGTTCAAGGAC
    TCCTTAGGCAATCCAACTCATATCCTGAGAATCGGCGAGTCATACAAGGAGAAGCAGCGA
    ACAATTCAGGCCGCCAAGGAAGTCGAGCAGAGGCGAGCTGGCGGCTACAGCCGTAAATAC
    GCTAGTAAAGCTAAGAACCTGGCCGACGATATGGTGCGCAATACTGCTAGAGACCTGCTG
    TACTATGCAGTGACGCAGGACGCAATGCTGATATTCGAGAATCTGTCCAGAGGATTCGGA
    AGGCAGGGCAAGCGGACGTTCATGGCCGAGCGCCAGTATACAAGGATGGAGGATTGGTTA
    ACGGCCAAGCTTGCCTATGAGGGGCTACCTAGTAAGACCTATCTGTCTAAGACGCTGGCT
    CAATACACCAGTAAGACCTGCTCAAACTGTGGCTTTACAATCACTTCTGCTGATTATGAT
    AGAGTGCTCGAGAAGCTAAAAAAAACTGCCACCGGCTGGATGACTACTATTAATGGGAAG
    GAACTGAAAGTGGAAGGACAGATTACCTATTATAATCGCTACAAGCGTCAAAACGTCGTC
    AAGGACCTGTCGGTGGAATTGGACAGACTCAGTGAAGAGTCCGTGAACAATGATATCAGC
    TCCTGGACAAAAGGGCGCAGTGGGGAGGCACTCAGCTTGCTTAAAAAGAGGTTTTCACAT
    CGGCCGGTCCAGGAGAAATTTGTCTGCCTGAACTGCGGATTCGAGACACACGCCGACGAG
    CAGGCAGCACTGAACATTGCCAGATCCTGGCTGTTCCTTAGGTCCCAGGAATATAAGAAG
    TACCAGACTAACAAAACCACGGGAAACACAGATAAAAGGGCCTTTGTCGAAACTTGGCAA
    TCCTTTTACCGGAAGAAGTTAAAGGAAGTGTGGAAGCCC
    SEQ ID NO: 167
    ATGGATAAGAAATACTCAATAGGCTTAGCAATCGGCACAAATAGCGTCGGATGGGCGGTG
    ATCACTGATGAATATAAGGTTCCGTCTAAAAAGTTCAAGGTTCTGGGAAATACAGACCGC
    CACAGTATCAAAAAAAATCTTATAGGGGCTCTTTTATTTGACAGTGGAGAGACAGCGGAA
    GCGACTCGTCTCAAACGGACAGCTCGTAGAAGGTATACACGTCGGAAGAATCGTATTTGT
    TATCTACAGGAGATTTTTTCAAATGAGATGGCGAAAGTAGATGATAGTTTCTTTCATCGA
    CTTGAAGAGTCTTTTTTGGTGGAAGAAGACAAGAAGCATGAACGTCATCCTATTTTTGGA
    AATATAGTAGATGAAGTTGCTTATCATGAGAAATATCCAACTATCTATCATCTGCGAAAA
    AAATTGGTAGATTCTACTGATAAAGCGGATTTGCGCTTAATCTATTTGGCCTTAGCGCAT
    ATGATTAAGTTTCGTGGTCATTTTTTGATTGAGGGAGATTTAAATCCTGATAATAGTGAT
    GTGGACAAACTATTTATCCAGTTGGTACAAACCTACAATCAATTATTTGAAGAAAACCCT
    ATTAACGCAAGTGGAGTAGATGCTAAAGCGATTCTTTCTGCACGATTGAGTAAATCAAGA
    CGATTAGAAAATCTCATTGCTCAGCTCCCCGGTGAGAAGAAAAATGGCTTATTTGGGAAT
    CTCATTGCTTTGTCATTGGGTTTGACCCCTAATTTTAAATCAAATTTTGATTTGGCAGAA
    GATGCTAAATTACAGCTTTCAAAAGATACTTACGATGATGATTTAGATAATTTATTGGCG
    CAAATTGGAGATCAATATGCTGATTTGTTTTTGGCAGCTAAGAATTTATCAGATGCTATT
    TTACTTTCAGATATCCTAAGAGTAAATACTGAAATAACTAAGGCTCCCCTATCAGCTTCA
    ATGATTAAACGCTACGATGAACATCATCAAGACTTGACTCTTTTAAAAGCTTTAGTTCGA
    CAACAACTTCCAGAAAAGTATAAAGAAATCTTTTTTGATCAATCAAAAAACGGATATGCA
    GGTTATATTGATGGGGGAGCTAGCCAAGAAGAATTTTATAAATTTATCAAACCAATTTTA
    GAAAAAATGGATGGTACTGAGGAATTATTGGTGAAACTAAATCGTGAAGATTTGCTGCGC
    AAGCAACGGACCTTTGACAACGGCTCTATTCCCCATCAAATTCACTTGGGTGAGCTGCAT
    GCTATTTTGAGAAGACAAGAAGACTTTTATCCATTTTTAAAAGACAATCGTGAGAAGATT
    GAAAAAATCTTGACTTTTCGAATTCCTTATTATGTTGGTCCATTGGCGCGTGGCAATAGT
    CGTTTTGCATGGATGACTCGGAAGTCTGAAGAAACAATTACCCCATGGAATTTTGAAGAA
    GTTGTCGATAAAGGTGCTTCAGCTCAATCATTTATTGAACGCATGACAAACTTTGATAAA
    AATCTTCCAAATGAAAAAGTACTACCAAAACATAGTTTGCTTTATGAGTATTTTACGGTT
    TATAACGAATTGACAAAGGTCAAATATGTTACTGAAGGAATGCGAAAACCAGCATTTCTT
    TCAGGTGAACAGAAGAAAGCCATTGTTGATTTACTCTTCAAAACAAATCGAAAAGTAACC
    GTTAAGCAATTAAAAGAAGATTATTTCAAAAAAATAGAATGTTTTGATAGTGTTGAAATT
    TCAGGAGTTGAAGATAGATTTAATGCTTCATTAGGTACCTACCATGATTTGCTAAAAATT
    ATTAAAGATAAAGATTTTTTGGATAATGAAGAAAATGAAGATATCTTAGAGGATATTGTT
    TTAACATTGACCTTATTTGAAGATAGGGAGATGATTGAGGAAAGACTTAAAACATATGCT
    CACCTCTTTGATGATAAGGTGATGAAACAGCTTAAACGTCGCCGTTATACTGGTTGGGGA
    CGTTTGTCTCGAAAATTGATTAATGGTATTAGGGATAAGCAATCTGGCAAAACAATATTA
    GATTTTTTGAAATCAGATGGTTTTGCCAATCGCAATTTTATGCAGCTGATCCATGATGAT
    AGTTTGACATTTAAAGAAGACATTCAAAAAGCACAAGTGTCTGGACAAGGCGATAGTTTA
    CATGAACATATTGCAAATTTAGCTGGTAGCCCTGCTATTAAAAAAGGTATTTTACAGACT
    GTAAAAGTTGTTGATGAATTGGTCAAAGTAATGGGGCGGCATAAGCCAGAAAATATCGTT
    ATTGAAATGGCACGTGAAAATCAGACAACTCAAAAGGGCCAGAAAAATTCGCGAGAGCGT
    ATGAAACGAATCGAAGAAGGTATCAAAGAATTAGGAAGTCAGATTCTTAAAGAGCATCCT
    GTTGAAAATACTCAATTGCAAAATGAAAAGCTCTATCTCTATTATCTCCAAAATGGAAGA
    GACATGTATGTGGACCAAGAATTAGATATTAATCGTTTAAGTGATTATGATGTCGATCAC
    ATTGTTCCACAAAGTTTCCTTAAAGACGATTCAATAGACAATAAGGTCTTAACGCGTTCT
    GATAAAAATCGTGGTAAATCGGATAACGTTCCAAGTGAAGAAGTAGTCAAAAAGATGAAA
    AACTATTGGAGACAACTTCTAAACGCCAAGTTAATCACTCAACGTAAGTTTGATAATTTA
    ACGAAAGCTGAACGTGGAGGTTTGAGTGAACTTGATAAAGCTGGTTTTATCAAACGCCAA
    TTGGTTGAAACTCGCCAAATCACTAAGCATGTGGCACAAATTTTGGATAGTCGCATGAAT
    ACTAAATACGATGAAAATGATAAACTTATTCGAGAGGTTAAAGTGATTACCTTAAAATCT
    AAATTAGTTTCTGACTTCCGAAAAGATTTCCAATTCTATAAAGTACGTGAGATTAACAAT
    TACCATCATGCCCATGATGCGTATCTAAATGCCGTCGTTGGAACTGCTTTGATTAAGAAA
    TATCCAAAACTTGAATCGGAGTTTGTCTATGGTGATTATAAAGTTTATGATGTTCGTAAA
    ATGATTGCTAAGTCTGAGCAAGAAATAGGCAAAGCAACCGCAAAATATTTCTTTTACTCT
    AATATCATGAACTTCTTCAAAACAGAAATTACACTTGCAAATGGAGAGATTCGCAAACGC
    CCTCTAATCGAAACTAATGGGGAAACTGGAGAAATTGTCTGGGATAAAGGGCGAGATTTT
    GCCACAGTGCGCAAAGTATTGTCCATGCCCCAAGTCAATATTGTCAAGAAAACAGAAGTA
    CAGACAGGCGGATTCTCCAAGGAGTCAATTTTACCAAAAAGAAATTCGGACAAGCTTATT
    GCTCGTAAAAAAGACTGGGATCCAAAAAAATATGGTGGTTTTGATAGTCCAACGGTAGCT
    TATTCAGTCCTAGTGGTTGCTAAGGTGGAAAAAGGGAAATCGAAGAAGTTAAAATCCGTT
    AAAGAGTTACTAGGGATCACAATTATGGAAAGAAGTTCCTTTGAAAAAAATCCGATTGAC
    TTTTTAGAAGCTAAAGGATATAAGGAAGTTAAAAAAGACTTAATCATTAAACTACCTAAA
    TATAGTCTTTTTGAGTTAGAAAACGGTCGTAAACGGATGCTGGCTAGTGCCGGAGAATTA
    CAAAAAGGAAATGAGCTGGCTCTGCCAAGCAAATATGTGAATTTTTTATATTTAGCTAGT
    CATTATGAAAAGTTGAAGGGTAGTCCAGAAGATAACGAACAAAAACAATTGTTTGTGGAG
    CAGCATAAGCATTATTTAGATGAGATTATTGAGCAAATCAGTGAATTTTCTAAGCGTGTT
    ATTTTAGCAGATGCCAATTTAGATAAAGTTCTTAGTGCATATAACAAACATAGAGACAAA
    CCAATACGTGAACAAGCAGAAAATATTATTCATTTATTTACGTTGACGAATCTTGGAGCT
    CCCGCTGCTTTTAAATATTTTGATACAACAATTGATCGTAAACGATATACGTCTACAAAA
    GAAGTTTTAGATGCCACTCTTATCCATCAATCCATCACTGGTCTTTATGAAACACGCATT
    GATTTGAGTCAGCTAGGAGGTGACTGA
    SEQ ID NO: 168
    ATGGATAAGAAGTATTCAATTGGACTTGCGATTGGCACTAACAGTGTGGGCTGGGCGGTG
    ATTACAGACGAGTATAAGGTGCCGTCAAAAAAGTTTAAAGTTCTGGGCAACACTGATCGC
    CATTCCATCAAGAAAAACCTAATCGGGGCCCTTCTTTTTGATAGTGGCGAAACGGCCGAG
    GCGACGCGTCTAAAACGTACCGCGCGGCGTCGCTACACCCGACGAAAAAACCGTATTTGT
    TACCTTCAGGAGATCTTCAGTAACGAAATGGCTAAGGTGGACGATTCATTCTTCCACCGT
    CTGGAGGAGTCCTTTTTAGTTGAAGAAGACAAGAAGCATGAGCGACACCCAATTTTTGGT
    AACATTGTCGACGAAGTCGCCTATCACGAAAAATATCCGACCATTTATCACCTGCGCAAA
    AAACTGGTCGATAGCACGGATAAAGCGGATCTGCGGCTTATTTACCTGGCGCTTGCCCAC
    ATGATCAAGTTCCGCGGCCACTTCCTGATAGAAGGAGACCTGAACCCGGATAATAGCGAT
    GTAGACAAACTGTTTATTCAGCTGGTCCAGACCTACAACCAGCTGTTTGAAGAAAATCCG
    ATTAATGCGTCAGGCGTGGATGCGAAAGCGATACTGAGTGCCCGCCTGTCGAAATCTCGC
    CGTCTCGAAAATCTGATTGCACAGCTGCCCGGCGAAAAAAAAAACGGTCTTTTTGGCAAT
    CTGATCGCGCTGTCACTGGGCCTGACACCAAATTTTAAGAGCAACTTCGACCTGGCAGAG
    GATGCGAAGCTTCAACTGTCGAAGGACACCTATGACGATGATCTGGATAATCTTCTGGCA
    CAAATCGGTGATCAGTATGCGGATTTATTCCTTGCAGCGAAAAACCTATCTGACGCAATT
    CTGTTGAGCGATATCCTCCGCGTCAACACCGAAATCACTAAAGCCCCCCTGTCAGCGTCG
    ATGATTAAACGTTATGATGAGCACCATCAGGATCTGACCTTGCTAAAGGCGCTGGTGCGA
    CAGCAGCTTCCCGAAAAATATAAAGAGATCTTTTTTGATCAATCGAAGAATGGTTATGCC
    GGATACATTGATGGCGGAGCCAGTCAGGAAGAATTTTACAAATTCATCAAACCGATCCTG
    GAAAAAATGGATGGCACAGAAGAACTGCTTGTGAAATTGAACCGGGAAGATTTACTGCGC
    AAACAGCGTACGTTCGACAACGGCTCCATACCCCATCAGATTCACTTAGGTGAGCTGCAT
    GCAATACTCCGTCGCCAGGAAGATTTTTATCCATTTTTAAAAGACAACCGTGAGAAGATT
    GAAAAAATTTTAACTTTTCGTATTCCATATTACGTCGGGCCTTTGGCCCGAGGTAACTCT
    CGATTCGCCTGGATGACGAGAAAAAGCGAGGAGACCATCACTCCGTGGAATTTTGAAGAG
    GTTGTTGATAAAGGCGCGAGCGCCCAGTCGTTTATCGAACGTATGACCAACTTTGATAAA
    AATCTGCCGAATGAAAAAGTGCTTCCGAAGCATTCTCTGTTGTATGAATATTTCACTGTG
    TACAATGAGTTAACGAAAGTGAAATATGTGACCGAAGGCATGCGGAAACCTGCTTTTCTG
    TCCGGAGAACAGAAAAAAGCAATTGTGGACCTGCTGTTCAAAACGAACCGGAAAGTAACT
    GTGAAGCAGCTGAAAGAGGACTACTTCAAAAAAATCGAATGCTTCGACTCAGTAGAGATC
    TCTGGTGTTGAAGATCGCTTCAACGCGAGTCTGGGAACGTACCATGATTTGTTGAAAATC
    ATCAAAGATAAAGACTTTCTGGATAACGAAGAGAATGAGGACATTCTTGAAGATATTGTT
    TTGACACTGACTCTGTTTGAGGATCGCGAAATGATTGAAGAGCGCCTGAAAACGTATGCC
    CATTTATTCGATGACAAAGTCATGAAGCAGCTGAAACGTCGCCGCTATACTGGGTGGGGC
    AGACTTTCACGTAAATTGATCAATGGTATAAGAGACAAACAGAGCGGCAAAACTATCTTA
    GATTTCCTGAAGAGTGATGGATTTGCCAACCGGAATTTTATGCAGCTTATACATGATGAC
    TCGCTAACGTTTAAAGAAGACATTCAGAAGGCGCAGGTCAGCGGCCAGGGTGATTCGCTG
    CATGAACACATTGCAAATCTTGCCGGATCGCCAGCGATCAAAAAAGGCATCCTTCAGACA
    GTAAAAGTTGTGGATGAACTGGTGAAAGTAATGGGTCGTCACAAGCCAGAAAATATTGTG
    ATCGAAATGGCCCGGGAAAATCAGACTACTCAAAAAGGTCAGAAAAATTCTCGCGAGCGT
    ATGAAACGTATTGAAGAAGGCATCAAAGAGCTAGGCAGCCAGATATTAAAGGAACATCCG
    GTTGAGAACACTCAGCTGCAGAATGAAAAACTGTATCTGTATTATCTTCAGAACGGCCGT
    GACATGTATGTTGATCAAGAACTGGATATCAATCGCTTGTCCGATTATGACGTGGATCAT
    ATTGTTCCGCAAAGCTTTCTGAAAGACGATTCTATTGACAATAAAGTACTGACACGTTCG
    GACAAAAACCGTGGTAAAAGCGATAACGTACCGTCGGAAGAAGTTGTTAAGAAAATGAAA
    AATTATTGGCGCCAACTCCTGAATGCTAAATTGATTACCCAGCGGAAATTTGATAACTTA
    ACCAAAGCCGAGCGGGGTGGCTTAAGTGAACTGGATAAAGCGGGTTTTATTAAACGCCAA
    CTGGTAGAAACCCGCCAGATAACGAAACATGTAGCTCAAATCCTCGATAGTCGCATGAAT
    ACGAAATATGACGAAAATGATAAATTGATCCGTGAAGTAAAAGTGATTACTCTTAAAAGC
    AAATTGGTATCTGATTTTCGGAAAGATTTCCAATTCTATAAGGTGAGAGAAATTAACAAT
    TACCATCATGCACATGATGCGTATTTAAATGCAGTTGTTGGCACCGCCTTAATCAAAAAA
    TATCCGAAATTAGAATCTGAGTTCGTGTATGGTGATTATAAAGTTTATGATGTTCGAAAA
    ATGATTGCTAAGTCTGAACAGGAAATCGGCAAAGCGACCGCAAAGTATTTTTTTTATAGC
    AATATTATGAATTTTTTTAAAACTGAGATTACCCTGGCGAATGGCGAAATTCGCAAACGT
    CCTCTGATTGAAACCAATGGCGAAACCGGCGAGATAGTATGGGACAAGGGCCGTGATTTT
    GCGACCGTCCGGAAAGTCCTGTCAATGCCGCAGGTGAATATTGTCAAGAAAACAGAAGTT
    CAGACAGGCGGTTTTAGTAAAGAGTCTATTCTGCCCAAACGTAATTCGGATAAATTGATT
    GCCCGCAAGAAAGATTGGGATCCGAAGAAATATGGTGGATTCGATTCTCCGACGGTCGCC
    TATAGCGTTCTAGTCGTCGCCAAGGTCGAAAAAGGTAAATCCAAAAAACTGAAATCTGTG
    AAAGAACTGTTAGGCATTACAATCATGGAACGTAGTAGTTTTGAAAAGAACCCGATCGAC
    TTCCTCGAGGCGAAAGGCTACAAAGAAGTCAAGAAGGATTTGATTATTAAACTCCCAAAA
    TATTCATTATTTGAGTTAGAAAACGGTAGGAAGCGTATGCTGGCGAGTGCTGGGGAATTA
    CAGAAAGGGAATGAGTTAGCACTGCCGTCAAAATATGTGAACTTTCTGTATCTGGCCTCC
    CATTACGAGAAACTGAAAGGTAGCCCGGAAGATAATGAACAGAAACAACTATTTGTCGAG
    CAACACAAACATTATCTGGATGAAATTATTGAACAGATTAGTGAATTCTCTAAACGTGTT
    ATTTTAGCGGATGCCAACCTTGACAAGGTGCTGAGCGCATATAATAAACACCGTGATAAA
    CCCATTCGTGAACAGGCTGAAAATATCATACATCTGTTCACGTTAACCAACTTGGGAGCT
    CCTGCCGCTTTTAAATATTTCGATACCACAATTGACCGCAAACGTTATACGTCTACAAAA
    GAGGTGCTCGATGCGACCCTGATCCACCAGTCTATTACAGGCCTGTATGAAACTCGTATC
    GACCTGTCACAACTGGGCGGCGACTGA
    SEQ ID NO: 169
    ATGGACAAGAAATATTCAATCGGTTTAGCAATAGGAACTAACTCAGTAGGTTGGGCTGTA
    ATTACAGACGAATACAAGGTACCGTCCAAAAAGTTTAAGGTGTTGGGGAACACAGATAGA
    CACTCTATAAAAAAAAATTTAATAGGCGCTTTACTTTTCGATTCAGGCGAAACTGCAGAA
    GCGACACGTCTGAAGAGAACCGCTAGACGTAGATACACGAGGAGAAAGAACAGAATATGT
    TACCTACAAGAAATTTTTTCTAATGAGATGGCTAAGGTGGATGATTCGTTTTTTCATAGA
    CTCGAAGAATCTTTCTTAGTTGAAGAAGATAAAAAACACGAAAGGCATCCTATCTTTGGA
    AACATAGTTGATGAGGTGGCTTACCATGAAAAATATCCCACTATATATCACCTTAGAAAA
    AAGTTGGTTGATTCAACCGACAAAGCGGATCTAAGGTTAATTTACCTCGCGTTGGCTCAC
    ATGATAAAATTTAGAGGACATTTCTTGATCGAAGGTGATTTAAATCCCGATAACTCTGAT
    GTAGATAAACTGTTCATCCAGTTGGTTCAAACATATAATCAGTTGTTCGAAGAGAACCCC
    ATTAACGCATCAGGTGTTGATGCTAAAGCAATCTTATCAGCAAGGTTGAGCAAGAGCAGA
    CGTCTGGAAAACTTGATTGCCCAATTGCCAGGTGAAAAGAAGAACGGTCTTTTTGGAAAT
    TTAATTGCACTTTCACTTGGGTTGACACCGAATTTTAAAAGCAATTTCGACCTCGCTGAG
    GATGCTAAACTCCAGTTATCTAAGGATACATATGACGATGATTTGGATAATCTATTGGCC
    CAGATAGGTGATCAGTATGCAGATTTGTTTTTGGCAGCTAAGAATTTATCAGATGCAATT
    CTACTGAGCGATATTTTAAGGGTGAATACAGAAATAACTAAAGCACCTTTGTCTGCATCT
    ATGATAAAAAGATACGATGAACACCATCAAGATCTCACACTATTAAAAGCTTTAGTTAGA
    CAACAATTACCAGAAAAATATAAAGAAATCTTTTTCGATCAGTCCAAGAACGGATACGCC
    GGCTATATAGATGGCGGTGCCTCCCAAGAAGAATTTTACAAATTTATCAAACCCATTTTG
    GAAAAGATGGATGGTACTGAAGAATTATTGGTCAAATTAAACAGGGAAGATTTATTAAGA
    AAACAAAGGACCTTTGATAATGGTTCTATTCCACACCAAATCCATCTAGGGGAATTACAT
    GCGATTCTTAGAAGACAAGAAGATTTTTATCCATTCTTGAAAGATAACAGGGAAAAGATA
    GAGAAAATCTTAACTTTTAGAATTCCCTACTACGTCGGGCCCTTAGCTAGGGGGAATTCT
    AGATTCGCCTGGATGACACGCAAATCAGAAGAAACAATTACGCCTTGGAATTTTGAAGAA
    GTTGTTGATAAAGGAGCCTCTGCTCAATCTTTTATTGAACGAATGACCAATTTTGATAAG
    AATTTACCCAATGAAAAGGTCTTACCCAAACATTCACTCCTATACGAGTACTTTACTGTT
    TACAATGAGTTGACAAAAGTGAAGTATGTTACCGAGGGTATGCGAAAACCTGCTTTCTTG
    AGTGGTGAACAAAAGAAGGCCATTGTTGACTTGTTATTCAAAACTAACAGAAAGGTCACT
    GTGAAGCAGCTTAAAGAAGATTATTTCAAAAAGATCGAATGTTTCGACTCGGTAGAAATT
    AGTGGTGTGGAAGATAGATTTAATGCTTCTCTTGGAACATATCATGATCTACTAAAGATC
    ATCAAAGATAAAGATTTCTTGGACAATGAAGAAAATGAAGATATTCTTGAAGACATCGTG
    TTGACACTTACATTGTTTGAGGACAGAGAAATGATTGAAGAAAGGCTGAAGACCTACGCC
    CATTTGTTTGATGATAAAGTCATGAAACAGTTAAAGAGGAGAAGGTATACCGGATGGGGT
    AGGCTGTCTCGCAAATTGATTAATGGTATTCGTGATAAACAATCGGGTAAAACAATCCTA
    GATTTCCTGAAGTCCGATGGTTTCGCCAACAGGAATTTTATGCAATTGATTCATGACGAT
    TCTTTGACTTTTAAAGAGGATATTCAGAAAGCACAGGTCTCAGGACAGGGCGATTCACTC
    CATGAACATATAGCTAACCTGGCTGGCTCCCCTGCTATTAAGAAAGGTATCTTGCAAACC
    GTCAAAGTAGTAGACGAACTTGTTAAAGTTATGGGAAGACACAAACCTGAAAATATCGTT
    ATTGAAATGGCTCGCGAAAACCAGACAACACAAAAGGGTCAAAAGAATTCGAGAGAGAGA
    ATGAAGCGTATCGAAGAAGGTATTAAAGAACTTGGGTCCCAAATACTTAAAGAACATCCA
    GTAGAAAACACTCAGCTTCAAAATGAAAAATTATACTTATATTATCTTCAGAATGGCCGC
    GATATGTATGTTGACCAAGAGTTAGATATAAATAGGTTGTCTGATTACGACGTGGATCAT
    ATTGTACCTCAATCTTTTCTAAAAGATGATTCAATTGATAATAAGGTATTAACGAGAAGT
    GATAAAAATAGAGGTAAATCTGACAACGTGCCAAGCGAAGAGGTGGTGAAGAAAATGAAA
    AATTATTGGCGTCAACTGTTGAACGCCAAGTTAATTACGCAGAGAAAGTTTGATAATCTA
    ACAAAAGCTGAAAGAGGAGGCCTATCTGAGTTAGATAAGGCCGGTTTTATCAAACGTCAG
    TTAGTTGAAACCAGGCAAATCACGAAGCACGTTGCCCAAATTCTAGATTCAAGGATGAAT
    ACCAAATACGATGAAAACGATAAACTGATTCGGGAAGTCAAGGTTATAACTCTAAAAAGC
    AAACTAGTTTCAGATTTTCGCAAAGATTTTCAATTTTACAAAGTTCGAGAAATCAATAAT
    TATCATCATGCTCACGACGCGTACTTGAACGCGGTCGTTGGTACAGCTTTAATAAAGAAA
    TATCCTAAACTGGAATCGGAATTTGTATATGGGGATTACAAAGTATACGACGTGAGAAAG
    ATGATCGCTAAATCTGAACAAGAAATTGGGAAAGCAACTGCCAAATATTTTTTTTACAGC
    AACATAATGAATTTTTTTAAAACGGAAATTACATTGGCAAATGGCGAAATTAGAAAGCGC
    CCATTGATAGAGACCAATGGAGAGACTGGGGAAATCGTGTGGGATAAAGGACGTGATTTT
    GCCACAGTGAGGAAAGTGTTAAGTATGCCACAAGTTAATATTGTAAAAAAGACCGAGGTC
    CAAACGGGTGGATTTAGCAAAGAATCAATTTTACCTAAGAGAAATTCAGATAAATTAATT
    GCCCGCAAAAAGGATTGGGATCCTAAAAAATATGGTGGTTTTGATTCCCCAACAGTTGCT
    TACTCCGTCCTAGTTGTTGCTAAGGTTGAAAAAGGAAAGTCTAAGAAACTTAAATCCGTA
    AAAGAGTTACTGGGAATTACAATAATGGAAAGATCCTCTTTCGAAAAGAACCCTATTGAC
    TTCTTGGAGGCGAAAGGTTATAAAGAAGTCAAAAAAGATTTGATCATAAAACTACCAAAG
    TATTCTCTATTTGAATTGGAAAACGGCAGAAAAAGGATGTTGGCAAGCGCTGGTGAACTA
    CAAAAGGGTAACGAATTGGCATTGCCGAGTAAATACGTGAATTTTCTATATTTGGCATCA
    CATTACGAAAAGTTAAAGGGATCACCCGAGGATAACGAGCAGAAACAACTGTTTGTTGAA
    CAACACAAACATTATCTTGATGAAATTATAGAACAAATTAGTGAGTTCAGTAAGAGAGTT
    ATTTTAGCCGATGCAAATTTAGACAAAGTTTTATCTGCTTATAACAAACATAGAGATAAG
    CCTATAAGGGAACAAGCCGAAAATATTATTCATTTGTTTACGTTAACAAATTTAGGGGCA
    CCAGCAGCATTCAAGTACTTCGATACGACTATCGATCGTAAGCGTTACACATCTACCAAA
    GAAGTTCTTGATGCAACTTTGATTCATCAATCTATAACAGGCTTATATGAAACTAGAATC
    GATCTGTCACAACTTGGTGGTGACTAA
    SEQ ID NO: 170
    ATGGACAAGAAGTACTCAATTGGGCTTGCTATCGGCACTAACAGCGTTGGCTGGGCGGTC
    ATCACAGACGAATATAAGGTCCCATCAAAGAAATTCAAAGTCCTTGGCAATACGGACCGA
    CATTCAATCAAGAAGAACCTGATTGGAGCTCTGCTGTTTGATTCCGGTGAAACCGCCGAG
    GCAACACGATTGAAACGTACCGCTCGTAGGAGGTATACGCGGCGGAAAAATAGGATCTGC
    TATCTGCAGGAAATATTTAGCAACGAAATGGCCAAGGTAGACGACAGCTTCTTCCACCGG
    CTCGAGGAATCTTTCCTCGTGGAAGAAGACAAAAAGCACGAGCGCCACCCCATTTTCGGC
    AATATCGTGGACGAGGTAGCTTACCATGAAAAGTATCCAACTATTTACCACTTACGTAAG
    AAGTTAGTGGACAGCACCGATAAAGCCGACCTTCGCCTGATTTACCTAGCACTTGCACAC
    ATGATTAAGTTCCGAGGCCACTTCTTGATAGAGGGAGACCTGAATCCTGACAATTCCGAT
    GTGGATAAATTGTTCATCCAGCTGGTACAGACATACAATCAGTTGTTTGAGGAAAATCCG
    ATTAATGCCAGTGGCGTGGACGCCAAGGCTATCCTGTCTGCTCGGCTTAGTAAGAGTAGA
    CGCCTGGAAAATCTAATCGCACAGCTGCCCGGCGAAAAGAAAAATGGACTGTTCGGTAAT
    TTGATCGCCCTGAGCCTGGGCCTCACCCCTAACTTTAAGTCTAACTTCGACCTGGCCGAA
    GATGCTAAGCTCCAGCTGTCCAAAGATACTTACGATGACGATCTCGATAATCTACTGGCT
    CAGATCGGGGACCAGTACGCTGACCTGTTTCTAGCTGCCAAGAACCTCAGTGACGCCATT
    CTCCTGTCCGATATTCTGAGGGTTAACACTGAAATTACAAAGGCCCCGCTGAGCGCGAGC
    ATGATCAAAAGGTACGACGAGCATCACCAGGACCTCACGCTGCTGAAGGCCTTAGTCAGA
    CAGCAACTGCCCGAAAAGTACAAAGAAATCTTTTTCGACCAATCCAAGAACGGGTACGCC
    GGCTACATTGATGGCGGGGCTTCACAAGAGGAGTTTTACAAGTTTATCAAGCCCATCCTG
    GAGAAAATGGACGGCACTGAAGAACTGCTTGTGAAACTCAATAGGGAAGACTTACTGAGG
    AAACAGCGCACATTCGATAATGGCTCCATACCCCACCAAATCCATCTGGGAGAGTTGCAT
    GCCATCTTGCGAAGGCAGGAGGACTTCTACCCCTTTCTTAAGGACAACAGGGAGAAAATC
    GAGAAAATTCTGACTTTCCGTATCCCCTACTACGTGGGCCCACTTGCTCGCGGAAACTCA
    CGATTCGCATGGATGACCAGAAAGTCCGAGGAAACAATTACACCCTGGAATTTTGAGGAG
    GTAGTAGACAAGGGAGCCAGCGCTCAATCTTTCATTGAGAGGATGACGAATTTCGACAAG
    AACCTTCCAAACGAGAAAGTGCTTCCTAAGCACAGCCTGCTGTATGAGTATTTCACGGTG
    TACAACGAACTTACGAAGGTCAAGTATGTGACAGAGGGTATGCGGAAACCTGCTTTTCTG
    TCTGGTGAACAGAAGAAAGCTATCGTCGATCTCCTGTTTAAAACCAACCGAAAGGTGACG
    GTGAAACAGTTGAAGGAGGATTACTTCAAGAAGATCGAGTGTTTTGATTCTGTTGAAATT
    TCTGGGGTCGAGGATAGATTCAACGCCAGCCTGGGCACCTACCATGATTTGCTGAAGATT
    ATCAAGGATAAGGATTTTCTGGATAATGAGGAGAATGAAGACATTTTGGAGGATATAGTG
    CTGACCCTCACCCTGTTCGAGGACCGGGAGATGATCGAGGAGAGACTGAAAACATACGCT
    CACCTGTTTGACGACAAGGTCATGAAGCAGCTTAAGAGACGCCGTTACACAGGCTGGGGA
    AGATTATCCCGCAAATTAATCAACGGGATACGCGATAAACAAAGTGGCAAGACCATACTC
    GACTTCCTAAAGAGCGATGGATTCGCAAATCGCAATTTCATGCAGTTGATCCACGACGAT
    AGCCTGACCTTCAAAGAGGACATTCAGAAAGCGCAGGTGAGTGGTCAAGGGGATTCCCTG
    CACGAACACATTGCTAACTTGGCTGGATCACCAGCCATTAAGAAAGGCATACTGCAGACC
    GTTAAAGTGGTAGATGAGCTTGTGAAAGTCATGGGAAGACATAAGCCAGAGAACATAGTG
    ATCGAAATGGCCAGGGAAAATCAGACCACGCAAAAGGGGCAGAAGAACTCAAGAGAGCGT
    ATGAAGAGGATCGAGGAGGGCATCAAGGAGCTGGGTAGCCAGATCCTTAAAGAGCACCCA
    GTTGAGAATACCCAGCTGCAGAATGAGAAACTTTATCTCTATTATCTCCAGAACGGAAGG
    GATATGTATGTCGACCAGGAACTGGACATCAATCGGCTGAGTGATTATGACGTCGACCAC
    ATTGTGCCTCAAAGCTTTCTGAAGGATGATTCCATCGACAATAAAGTTCTGACCCGGTCT
    GATAAAAATAGAGGCAAATCCGACAACGTACCTAGCGAAGAAGTCGTCAAAAAAATGAAG
    AACTATTGGAGGCAGTTGCTGAATGCCAAGCTGATTACACAACGCAAGTTTGACAATCTC
    ACCAAGGCAGAAAGGGGGGGCCTGTCAGAACTCGACAAAGCAGGTTTCATTAAAAGGCAG
    CTAGTTGAAACTAGGCAGATTACTAAGCACGTGGCCCAGATCCTCGACTCACGGATGAAT
    ACAAAGTATGATGAGAATGATAAGCTAATCCGGGAGGTGAAGGTGATTACTCTGAAATCT
    AAGCTGGTGTCAGATTTCAGAAAAGACTTCCAGTTCTACAAAGTCAGAGAGATCAACAAT
    TATCACCATGCCCACGATGCATATCTTAATGCAGTAGTGGGGACAGCTCTGATCAAAAAA
    TATCCTAAACTGGAGTCTGAATTCGTTTATGGTGACTATAAAGTCTATGACGTCAGAAAA
    ATGATCGCAAAGAGCGAGCAGGAGATAGGGAAGGCCACAGCAAAGTACTTCTTTTACAGT
    AATATCATGAACTTTTTCAAAACTGAGATTACATTGGCTAACGGCGAGATCCGCAAGCGG
    CCACTGATAGAGACTAACGGAGAGACAGGGGAGATTGTTTGGGATAAGGGCCGTGACTTC
    GCCACCGTTAGGAAAGTGCTGTCCATGCCCCAGGTGAACATTGTGAAGAAGACAGAAGTG
    CAGACGGGTGGGTTCTCAAAAGAGTCTATTCTGCCTAAGCGGAATAGTGACAAACTGATC
    GCACGTAAAAAGGACTGGGATCCAAAAAAGTACGGCGGATTCGACAGTCCTACCGTTGCA
    TATTCCGTGCTTGTGGTCGCTAAGGTGGAGAAGGGAAAAAGCAAGAAACTGAAGTCAGTC
    AAAGAACTACTGGGCATAACGATCATGGAGCGCTCCAGTTTCGAAAAAAACCCAATCGAT
    TTTCTTGAAGCCAAGGGATACAAGGAGGTAAAGAAAGACCTTATCATTAAGCTGCCTAAG
    TACAGTCTGTTCGAACTGGAGAATGGGAGGAAGCGCATGCTGGCATCAGCTGGAGAACTC
    CAAAAAGGGAACGAGTTGGCCCTCCCCTCAAAGTATGTCAATTTTCTCTACCTGGCTTCT
    CACTACGAGAAGTTAAAGGGGTCTCCAGAGGATAATGAGCAGAAACAGCTGTTTGTGGAA
    CAGCACAAGCACTATTTGGACGAAATCATCGAACAAATTTCCGAGTTCAGTAAGAGGGTG
    ATTCTGGCCGACGCAAACCTTGACAAAGTTCTGTCCGCATACAATAAGCACAGAGACAAA
    CCAATCCGCGAGCAAGCCGAGAATATAATTCACCTTTTCACTCTGACTAATCTGGGGGCC
    CCCGCAGCATTTAAATATTTCGATACAACAATCGACCGGAAGCGGTATACATCTACTAAG
    GAAGTCCTCGATGCGACACTGATCCACCAGTCAATTACAGGTTTATATGAAACAAGAATC
    GACCTGTCCCAGCTGGGCGGCGACTAG
    SEQ ID NO: 171
    AAAATTCcatGCAAAATGCTCCGGTTTCATGTCATCAAAATGATGACGTAATTAAGCATT
    GATAATTGAGATCCCTCTCCCTGACAGGATGATTACATAAATAATAGTGACAAAAATAAA
    TTATTTATTTATCCAGAAAATGAATTGGAAAATCAGGAGAGCGTTTTCAATCCTACCTCT
    GGCGCAGTTGATATGTcaaaCAGGTtgccgtcactgcgtcttttactggctcttctcgct
    aaccaaaccggtaaccccgcttattaaaagcattctgtaacaaagcgggaccaaagccat
    gacaaaaacgcgtaacaaaagtgtctataatcacggcagaaaagtccacattgattattt
    gcacggcgtcacactttgctatgccatagcatttttatccataagattagcggatcctac
    ctgacgctttttatcgcaactctctactgtttctccatacccgtttttttgggctagcac
    cgcctatctcgtgtgagataggcggagatacgaactttaagAAGGAGatataccATGGAA
    CAGGAATATTATCTGGGCTTGGACATGGGCACCGGTTCCGTCGGCTGGGCTGTTACTGAC
    AGTGAATATCACGTTCTAAGAAAGCATGGTAAGGCATTGTGGGGTGTAAGACTTTTCGAA
    TCTGCTTCCACTGCTGAAGAGCGTAGAATGTTTAGAACGAGTCGACGTAGGCTAGACAGG
    CGCAATTGGAGAATCGAAATTTTACAAGAAATTTTTGCGGAAGAGATATCTAAGAAAGAC
    CCAGGCTTTTTCCTGAGAATGAAGGAATCTAAGTATTACCCTGAGGATAAAAGAGATATA
    AATGGTAACTGTCCCGAATTGCCTTACGCATTATTTGTGGACGATGATTTTACCGATAAG
    GATTACCATAAAAAGTTCCCAACTATCTACCATTTACGCAAAATGTTAATGAATACAGAG
    GAAACCCCAGACATAAGACTAGTTTATCTGGCAATACACCATATGATGAAACATAGAGGC
    CATTTCTTACTTTCCGGGGATATCAACGAAATCAAAGAGTTTGGTACCACATTTAGTAAG
    TTACTGGAAAACATAAAGAATGAAGAATTGGATTGGAACTTAGAACTCGGAAAAGAAGAA
    TACGCGGTTGTCGAATCTATCCTGAAGGATAATATGCTGAATAGGTCGACCAAAAAAACT
    AGGCTGATCAAAGCACTGAAAGCCAAATCTATCTGCGAAAAAGCTGTTTTAAATTTACTT
    GCTGGTGGCACTGTTAAGTTATCAGACATTTTTGGTTTGGAAGAATTGAACGAAACCGAG
    CGTCCAAAAATTAGTTTCGCTGATAATGGCTACGATGATTACATTGGTGAGGTGGAAAAC
    GAGTTGGGCGAACAATTTTATATTATAGAGACAGCTAAGGCAGTCTATGACTGGGCTGTT
    TTAGTAGAAATCCTTGGTAAATACACATCTATCTCCGAAGCGAAAGTTGCTACTTACGAA
    AAGCACAAGTCCGATCTCCAGTTTTTGAAGAAAATTGTCAGGAAATATCTGACTAAGGAA
    GAATATAAAGATATTTTCGTTAGTACCTCTGACAAACTGAAAAATTACTCCGCTTACATC
    GGGATGACCAAGATTAATGGCAAAAAAGTTGATCTGCAAAGCAAAAGGTGTTCGAAGGAA
    GAATTTTATGATTTCATTAAAAAGAATGTCTTAAAAAAATTAGAAGGTCAGCCAGAATAC
    GAATATTTGAAAGAAGAACTGGAAAGAGAGACATTCTTACCAAAACAAGTCAACAGAGAT
    AATGGGGTAATTCCATATCAAATTCACCTCTACGAATTAAAAAAAATTTTAGGCAATTTA
    CGCGATAAAATTGACCTTATCAAAGAAAATGAGGATAAGCTGGTTCAACTCTTTGAATTC
    AGAATACCCTATTATGTGGGCCCACTGAACAAGATTGATGACGGCAAAGAAGGTAAATTC
    ACATGGGCCGTCCGCAAATCCAATGAAAAAATTTACCCATGGAACTTTGAAAATGTAGTA
    GATATTGAAGCGTCTGCGGAGAAATTTATTCGAAGAATGACTAATAAATGCACTTACTTG
    ATGGGAGAGGATGTTCTGCCTAAAGACAGCTTATTATACAGCAAGTACATGGTTCTAAAC
    GAACTTAACAACGTTAAGTTGGACGGTGAGAAATTAAGTGTAGAATTGAAACAAAGATTG
    TATACTGACGTCTTCTGCAAGTACAGAAAAGTGACAGTTAAAAAAATTAAGAATTACTTG
    AAGTGCGAAGGTATAATTTCTGGAAACGTAGAGATTACTGGTATTGATGGTGATTTCAAA
    GCATCCCTAACAGCTTACCACGATTTCAAGGAAATCCTGACAGGAACTGAACTCGCAAAA
    AAAGATAAAGAAAACATTATTACTAATATTGTTCTTTTCGGTGATGACAAGAAATTGTTG
    AAGAAAAGACTGAATAGACTTTACCCCCAGATTACTCCCAATCAACTTAAGAAAATTTGT
    GCTTTGTCTTACACAGGATGGGGTCGTTTTTCAAAAAAGTTCTTAGAAGAGATTACCGCA
    CCTGATCCAGAAACAGGCGAAGTATGGAATATAATTACCGCCTTATGGGAATCGAACAAT
    AATCTTATGCAACTTCTGAGCAATGAATATCGTTTCATGGAAGAAGTTGAGACTTACAAC
    ATGGGCAAACAGACGAAGACTTTATCCTATGAAACTGTGGAAAATATGTATGTATCACCT
    TCTGTCAAGAGACAAATTTGGCAAACCTTAAAAATTGTCAAAGAATTAGAAAAGGTAATG
    AAGGAGTCTCCTAAACGTGTGTTTATTGAAATGGCTAGAGAAAAACAAGAGTCAAAAAGA
    ACCGAGTCAAGAAAGAAGCAGTTAATCGATTTATATAAGGCTTGTAAAAACGAAGAGAAA
    GATTGGGTTAAAGAATTGGGGGACCAAGAGGAACAAAAACTACGGTCGGATAAGTTGTAT
    TTATACTATACGCAAAAGGGACGATGTATGTATTCCGGCGAGGTAATAGAATTGAAGGAT
    TTATGGGACAATACAAAATATGACATAGACCATATATATCCCCAATCAAAAACGATGGAC
    GATAGCTTGAACAATAGAGTACTCGTGAAAAAAAAATATAATGCGACCAAATCTGATAAG
    TATCCTCTGAATGAAAATATCAGACATGAAAGAAAGGGGTTCTGGAAGTCCTTGTTAGAT
    GGTGGGTTTATAAGCAAAGAAAAGTACGAGCGTCTAATAAGAAACACGGAGTTATCGCCA
    GAAGAACTCGCTGGTTTTATTGAGAGGCAAATCGTGGAAACGAGACAATCTACCAAAGCC
    GTTGCTGAGATCCTAAAGCAAGTTTTCCCAGAGTCGGAGATTGTCTATGTCAAAGCTGGC
    ACAGTGAGCAGGTTTAGGAAAGACTTCGAACTATTAAAGGTAAGAGAAGTGAACGATTTA
    CATCACGCAAAGGACGCTTACCTAAATATCGTTGTAGGTAACTCATATTATGTTAAATTT
    ACCAAGAACGCCTCTTGGTTTATAAAGGAGAACCCAGGTAGAACATATAACCTGAAAAAG
    ATGTTCACCTCTGGTTGGAATATTGAGAGAAACGGAGAAGTCGCATGGGAAGTTGGTAAG
    AAAGGGACTATAGTGACAGTAAAGCAAATTATGAACAAAAATAATATCCTCGTTACAAGG
    CAGGTTCATGAAGCAAAGGGCGGCCTTTTTGACCAACAAATTATGAAGAAAGGGAAAGGT
    CAAATTGCAATAAAAGAAACCGATGAGAGACTAGCGTCAATAGAAAAGTATGGTGGCTAT
    AATAAAGCTGCGGGTGCATACTTTATGCTTGTTGAATCAAAAGACAAGAAAGGTAAGACT
    ATTAGAACTATAGAATTTATACCCCTGTACCTTAAAAACAAAATTGAATCGGATGAGTCA
    ATCGCGTTAAATTTTCTAGAGAAAGGAAGGGGTTTAAAAGAACCAAAGATCCTGTTAAAA
    AAGATTAAGATTGACACCTTGTTCGATGTAGATGGATTTAAAATGTGGTTATCTGGCAGA
    ACAGGCGATAGACTTTTGTTTAAGTGCGCTAATCAATTAATTTTGGATGAGAAAATCATT
    GTCACAATGAAAAAAATAGTTAAGTTTATTCAGAGAAGACAAGAAAACAGGGAGTTGAAA
    TTATCTGATAAAGATGGTATCGACAATGAAGTTTTAATGGAAATCTACAATACATTCGTT
    GATAAACTTGAAAATACCGTATATCGAATCAGGTTAAGTGAACAAGCCAAAACATTAATT
    GATAAACAAAAAGAATTTGAAAGGCTATCACTGGAAGACAAATCCTCCACCCTATTTGAA
    ATTTTGCATATATTCCAGTGCCAATCTTCAGCAGCTAATTTAAAAATGATTGGCGGACCT
    GGGAAAGCCGGCATCCTAGTGATGAACAATAATATCTCCAAGTGTAACAAAATATCAATT
    ATTAACCAATCTCCGACAGGTATTTTTGAAAATGAAATAGACTTGCTTAAGATATAAGAA
    ATCATCCTTAGCGAAAGCTAAGGATTTTTTTTATCTGAAATTTATTATATCGCGTTGATT
    ATTGATGCTGTTTTTAGTTTTAACGGCAATTAATATATGTGTTATTAATTGAATGAATTT
    TATCATTCATAATAAGTATGTGTAGGATCAAGCTCAGGTTAAATATTCACTCAGGAAGTT
    ATTACTCAGGAAGCAAAGAGGATTACAGAATTATCTCATAACAAGTGTTAAGGGATGTTA
    TTTCC
SEQ ID NO: 172    AATTCAAAGGATAATCAAAC
SEQ ID NO: 173    AATCTCTACTCTTTGTAGAT
SEQ ID NO: 174    AATTTCTACTGTTGTAGAT
SEQ ID NO: 175    AATTTCTACTAGTGTAGAT
SEQ ID NO: 176    AATTTCTACTATTGT
SEQ ID NO: 177    AATTTCTACTGTTGTAGA
SEQ ID NO: 178    AATTTCTACTATTGTA
SEQ ID NO: 179    AATTTCTACTTTTGTAGAT
SEQ ID NO: 180    AATTTCTACTGTTGTAGAT
SEQ ID NO: 181    AATTTCTACTCTTGTAGAT

Claims (20)

What is claimed is:
1. A composition comprising:
i) a first donor nucleic acid comprising:
a) a modified first target nucleic acid sequence;
b) a first protospacer adjacent motif (PAM) mutation; and
c) a first guide nucleic acid sequence comprising a first spacer region complementary to a portion of the first target nucleic acid; and
ii) a second donor nucleic acid comprising:
a) a barcode corresponding to the modified first target nucleic acid sequence; and
b) a second guide nucleic acid sequence comprising a second spacer region complementary to a portion of a second target nucleic acid.
2. The composition of claim 1, wherein the modified first target nucleic acid sequence comprises at least one inserted, deleted, or substituted nucleic acid compared to a corresponding un-modified first target nucleic acid.
3. The composition of claim 1, wherein the first guide nucleic acid and second guide nucleic acid are compatible with a nucleic acid-guided nuclease.
4. The composition of claim 3, wherein the nucleic acid-guided nuclease is a Type II or Type V Cas protein.
5. The composition of claim 3, wherein the nucleic acid-guided nuclease is a Cas9 homologue or a Cpf1 homologue.
6. The composition of claim 1, wherein the second donor nucleic acid comprises a second PAM mutation.
7. The composition of claim 1, wherein the second donor nucleic acid sequence comprises a regulatory sequence or a mutation to turn a screenable or selectable marker on or off.
8. The composition of claim 1, wherein the second donor nucleic acid sequence targets a unique landing site.
9. A method of genome engineering, the method comprising:
a) contacting a population of cells with a polynucleotide, wherein each cell comprises a first target nucleic acid, a second target nucleic acid, and a nucleic acid-guided nuclease,
wherein the polynucleotide comprises
1) an editing cassette comprising:
i) a modified first target nucleic acid sequence;
ii) a first protospacer adjacent motif (PAM) mutation;
iii) a first guide nucleic acid sequence comprising a spacer region complementary to a portion of the first target nucleic acid and compatible with the nucleic acid-guided nuclease; and
2) a recorder cassette comprising
i) a barcode corresponding to the modified first target nucleic acid sequence; and
ii) a second guide nucleic acid sequence comprising a second spacer region complementary to a portion of the second target nucleic acid and compatible with the nucleic acid-guided nuclease;
b) allowing the first guide nucleic acid sequence, the second guide nucleic acid sequence, and the nucleic acid-guided nuclease to create a genome edit within the first target nucleic acid and the second target nucleic acid.
10. The method of claim 9, further comprising c) sequencing a portion of the barcode, thereby identifying the modified first target nucleic acid that was inserted within the first target nucleic acid in step a).
11. The method of claim 9, wherein the nucleic acid-guided nuclease is a CRISPR nuclease.
12. The method of claim 9, wherein the PAM mutation is not recognized by the nucleic acid-guided nuclease.
13. The method of claim 9, wherein the nucleic acid-guided nuclease is a Type II or Type V Cas protein.
14. The method of claim 9, wherein the nucleic acid-guided nuclease is a Cas9 homologue or a Cpf1 homologue.
15. The method of claim 9, wherein the recorder cassette further comprises a second PAM mutation that is not recognized by the nucleic acid-guided nuclease.
16. A method of selectable recursive genetic engineering comprising
a) contacting cells comprising a nucleic acid-guided nuclease with a polynucleotide comprising a recorder cassette, said recorder cassette comprising
i) a nucleic acid sequence that recombines into a unique landing site incorporated during a previous round of engineering, wherein the nucleic acid sequence comprises a unique barcode; and
ii) a guide RNA compatible with the nucleic acid-guided nuclease that targets the unique landing site; and
b) allowing the nucleic acid-guided nuclease to edit the unique landing site, thereby incorporating the unique barcode into the unique landing site.
17. The method of claim 16, wherein the nucleic acid sequence further comprises a regulatory sequence that turns transcription of a screenable or selectable marker on or off.
18. The method of claim 16, wherein the nucleic acid sequence further comprises a PAM mutation that is not compatible with the nucleic acid-guided nuclease.
19. The method of claim 16, wherein the nucleic acid sequence further comprises a second unique landing site for subsequent engineering rounds.
20. The method of claim 16, wherein the polynucleotide further comprises an editing cassette comprising
a) a modified first target nucleic acid sequence;
b) a first protospacer adjacent motif (PAM) mutation; and
c) a first guide nucleic acid sequence comprising a first spacer region complementary to a portion of the first target nucleic acid,
wherein the unique barcode corresponds to the modified first target nucleic acid such that the modified target nucleic acid can be identified by the unique barcode.
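
The barcode-to-edit correspondence recited in claims 9, 10, and 20 can be illustrated with a short sketch. This is only an illustration under stated assumptions, not the applicants' implementation: the class names (`EditingCassette`, `RecorderCassette`, `Polynucleotide`), the helper functions, and the toy sequences are all hypothetical, and real barcode reads would need error-tolerant matching rather than the exact lookup used here.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class EditingCassette:
    """Hypothetical model of the editing cassette elements in claim 9."""
    modified_target: str  # modified first target nucleic acid sequence
    pam_mutation: str     # free-text label for the PAM mutation (illustrative)
    guide_spacer: str     # spacer complementary to part of the first target


@dataclass(frozen=True)
class RecorderCassette:
    """Hypothetical model of the recorder cassette elements in claim 9."""
    barcode: str          # barcode corresponding to the modified target
    guide_spacer: str     # spacer complementary to part of the second target


@dataclass(frozen=True)
class Polynucleotide:
    """One library member: an editing cassette paired with a recorder cassette."""
    editing: EditingCassette
    recorder: RecorderCassette


def build_barcode_index(library: Iterable[Polynucleotide]) -> Dict[str, EditingCassette]:
    """Map each barcode to the editing cassette it was designed alongside.

    Raises ValueError if two library members share a barcode, because a
    shared barcode could not identify a single edit.
    """
    index: Dict[str, EditingCassette] = {}
    for construct in library:
        barcode = construct.recorder.barcode
        if barcode in index:
            raise ValueError(f"barcode {barcode!r} is not unique in the library")
        index[barcode] = construct.editing
    return index


def identify_edits(
    barcode_reads: Iterable[str], index: Dict[str, EditingCassette]
) -> List[Optional[EditingCassette]]:
    """Look up the edit for each sequenced barcode; unknown barcodes give None."""
    return [index.get(read) for read in barcode_reads]


# Toy library with made-up sequences, for illustration only.
EXAMPLE_LIBRARY = [
    Polynucleotide(
        EditingCassette("ACGTTGCA", "PAM variant 1", "ACGTAGCA"),
        RecorderCassette("AAAC", "GGGTTTCC"),
    ),
    Polynucleotide(
        EditingCassette("ACGTCGCA", "PAM variant 1", "ACGTAGCA"),
        RecorderCassette("CCGT", "GGGTTTCC"),
    ),
]
```

Calling `identify_edits(["CCGT"], build_barcode_index(EXAMPLE_LIBRARY))` returns the editing cassette paired with barcode `CCGT`, which mirrors the idea of sequencing the barcode at the second target to infer which modification was made at the first target.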
US18/157,740 2016-06-24 2023-01-20 Methods for generating barcoded combinatorial libraries Pending US20230227810A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US18/157,740 US20230227810A1 (en) 2016-06-24 2023-01-20 Methods for generating barcoded combinatorial libraries

Applications Claiming Priority (8)

Application Number Priority Date Filing Date Title
US201662354516P 2016-06-24 2016-06-24
US201662367386P 2016-07-27 2016-07-27
US201762483930P 2017-04-10 2017-04-10
US15/632,222 US10017760B2 (en) 2016-06-24 2017-06-23 Methods for generating barcoded combinatorial libraries
US15/948,793 US10294473B2 (en) 2016-06-24 2018-04-09 Methods for generating barcoded combinatorial libraries
US15/948,798 US10287575B2 (en) 2016-06-24 2018-04-09 Methods for generating barcoded combinatorial libraries
US16/295,393 US11584928B2 (en) 2016-06-24 2019-03-07 Methods for generating barcoded combinatorial libraries
US18/157,740 US20230227810A1 (en) 2016-06-24 2023-01-20 Methods for generating barcoded combinatorial libraries

Related Parent Applications (1)

Application Number Relation Publication Number Priority Date Filing Date Title
US16/295,393 Continuation US11584928B2 (en) 2016-06-24 2019-03-07 Methods for generating barcoded combinatorial libraries

Publications (1)

Publication Number Publication Date
US20230227810A1 (en) 2023-07-20

Family

ID=60676029

Family Applications (5)

Application Number Status Publication Number Priority Date Filing Date Title
US15/632,222 Active US10017760B2 (en) 2016-06-24 2017-06-23 Methods for generating barcoded combinatorial libraries
US15/948,793 Active US10294473B2 (en) 2016-06-24 2018-04-09 Methods for generating barcoded combinatorial libraries
US15/948,798 Active US10287575B2 (en) 2016-06-24 2018-04-09 Methods for generating barcoded combinatorial libraries
US16/295,393 Active 2040-03-16 US11584928B2 (en) 2016-06-24 2019-03-07 Methods for generating barcoded combinatorial libraries
US18/157,740 Pending US20230227810A1 (en) 2016-06-24 2023-01-20 Methods for generating barcoded combinatorial libraries

Country Status (10)

Country Link
US (5) US10017760B2 (en)
EP (1) EP3474669B1 (en)
JP (1) JP2019518478A (en)
CN (1) CN109688820B (en)
AU (1) AU2017280353B2 (en)
CA (1) CA3029254A1 (en)
DK (1) DK3474669T3 (en)
ES (1) ES2915562T3 (en)
LT (1) LT3474669T (en)
WO (1) WO2017223538A1 (en)

Families Citing this family (104)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
HUE066611T2 (en) 2014-02-11 2024-08-28 Univ Colorado Regents Crispr enabled multiplexed genome engineering
JP7166168B2 (en) 2015-07-30 2022-11-07 マサチューセッツ アイ アンド イヤー インファーマリー Ancestral viral sequences and their uses
US11208649B2 (en) 2015-12-07 2021-12-28 Zymergen Inc. HTP genomic engineering platform
US9988624B2 (en) 2015-12-07 2018-06-05 Zymergen Inc. Microbial strain improvement by a HTP genomic engineering platform
US11293029B2 (en) 2015-12-07 2022-04-05 Zymergen Inc. Promoters from Corynebacterium glutamicum
EP4273248A3 (en) * 2016-05-20 2024-01-10 Braingene AB Destabilising domains for conditionally stabilising a protein
US10337051B2 (en) 2016-06-16 2019-07-02 The Regents Of The University Of California Methods and compositions for detecting a target RNA
US11293021B1 (en) 2016-06-23 2022-04-05 Inscripta, Inc. Automated cell processing methods, modules, instruments, and systems
AU2017280353B2 (en) 2016-06-24 2021-11-11 Inscripta, Inc. Methods for generating barcoded combinatorial libraries
EP3478845A4 (en) 2016-06-30 2019-07-31 Zymergen, Inc. METHODS OF PRODUCING A GLUCOSE PERMEASE BANK AND USES THEREOF
WO2018005655A2 (en) 2016-06-30 2018-01-04 Zymergen Inc. Methods for generating a bacterial hemoglobin library and uses thereof
US20180004537A1 (en) * 2016-07-01 2018-01-04 Microsoft Technology Licensing, Llc Molecular State Machines
US10892034B2 (en) 2016-07-01 2021-01-12 Microsoft Technology Licensing, Llc Use of homology direct repair to record timing of a molecular event
CN109477130B (en) 2016-07-01 2022-08-30 微软技术许可有限责任公司 Storage by iterative DNA editing
US11359234B2 (en) * 2016-07-01 2022-06-14 Microsoft Technology Licensing, Llc Barcoding sequences for identification of gene expression
US20200216837A1 (en) * 2016-08-17 2020-07-09 Katholieke Universiteit Leuven Drug-target identification by rapid selection of drug resistance mutations
WO2018064371A1 (en) 2016-09-30 2018-04-05 The Regents Of The University Of California Rna-guided nucleic acid modifying enzymes and methods of use thereof
KR102812752B1 (en) * 2016-09-30 2025-05-26 더 리젠츠 오브 더 유니버시티 오브 캘리포니아 Rna-guided nucleic acid modifying enzymes and methods of use thereof
US9982279B1 (en) 2017-06-23 2018-05-29 Inscripta, Inc. Nucleic acid-guided nucleases
HUE066467T2 (en) * 2017-06-23 2024-08-28 Inscripta Inc Nucleic acid-guided nucleases
US10011849B1 (en) 2017-06-23 2018-07-03 Inscripta, Inc. Nucleic acid-guided nucleases
WO2019006436A1 (en) 2017-06-30 2019-01-03 Inscripta, Inc. Automated cell processing methods, modules, instruments, and systems
US10738327B2 (en) 2017-08-28 2020-08-11 Inscripta, Inc. Electroporation cuvettes for automation
WO2019055878A2 (en) * 2017-09-15 2019-03-21 The Board Of Trustees Of The Leland Stanford Junior University Multiplex production and barcoding of genetically engineered cells
WO2019068022A1 (en) 2017-09-30 2019-04-04 Inscripta, Inc. Flow through electroporation instrumentation
AU2018358051B2 (en) 2017-11-01 2025-01-09 The Regents Of The University Of California CasZ compositions and methods of use
WO2019089804A1 (en) 2017-11-01 2019-05-09 The Regents Of The University Of California Casy compositions and methods of use
US11970719B2 (en) 2017-11-01 2024-04-30 The Regents Of The University Of California Class 2 CRISPR/Cas compositions and methods of use
US12145962B2 (en) * 2018-02-05 2024-11-19 The Regents Of The University Of Colorado, A Body Corporate Construction and methods of use of a barcoded and gene edited DNA library
US10767193B2 (en) 2018-02-15 2020-09-08 Sigma-Aldrich Co. Llc Engineered CAS9 systems for eukaryotic genome modification
WO2019190874A1 (en) 2018-03-29 2019-10-03 Inscripta, Inc. Automated control of cell growth rates for induction and transformation
EP3775266A4 (en) * 2018-04-05 2021-06-30 Massachusetts Eye and Ear Infirmary METHODS OF MANUFACTURING AND USING COMBINATORY LIBRARIES OF BARCODE NUCLEIC ACIDS HAVING A DEFINED VARIATION
WO2019200004A1 (en) 2018-04-13 2019-10-17 Inscripta, Inc. Automated cell processing instruments comprising reagent cartridges
WO2019209926A1 (en) 2018-04-24 2019-10-31 Inscripta, Inc. Automated instrumentation for production of peptide libraries
US10858761B2 (en) 2018-04-24 2020-12-08 Inscripta, Inc. Nucleic acid-guided editing of exogenous polynucleotides in heterologous cells
US10557216B2 (en) 2018-04-24 2020-02-11 Inscripta, Inc. Automated instrumentation for production of T-cell receptor peptide libraries
WO2020011985A1 (en) * 2018-07-12 2020-01-16 Keygene N.V. Type v crispr/nuclease-system for genome editing in plant cells
EP4442836A3 (en) 2018-08-01 2024-12-18 Mammoth Biosciences, Inc. Programmable nuclease compositions and methods of use thereof
US10532324B1 (en) 2018-08-14 2020-01-14 Inscripta, Inc. Instruments, modules, and methods for improved detection of edited sequences in live cells
US10533152B1 (en) 2018-08-14 2020-01-14 Inscripta, Inc. Instruments, modules, and methods for improved detection of edited sequences in live cells
US10752874B2 (en) 2018-08-14 2020-08-25 Inscripta, Inc. Instruments, modules, and methods for improved detection of edited sequences in live cells
US11142740B2 (en) 2018-08-14 2021-10-12 Inscripta, Inc. Detection of nuclease edited sequences in automated modules and instruments
WO2020041456A1 (en) * 2018-08-22 2020-02-27 The Regents Of The University Of California Variant type v crispr/cas effector polypeptides and methods of use thereof
US11718847B2 (en) * 2018-08-29 2023-08-08 Agilent Technologies, Inc. Amplifying oligonucleotides and producing libraries of dual guide constructs
AU2019363487A1 (en) 2018-08-30 2021-04-15 Inscripta, Inc. Improved detection of nuclease edited sequences in automated modules and instruments
EP3861112A4 (en) * 2018-10-04 2022-09-21 The Regents of the University of Colorado, A Body Corporate MODIFIED CHIMERIC NUCLEIC ACID GUIDED NUCLEASE CONSTRUCTS AND THEIR USES
WO2020086475A1 (en) 2018-10-22 2020-04-30 Inscripta, Inc. Engineered enzymes
US11214781B2 (en) 2018-10-22 2022-01-04 Inscripta, Inc. Engineered enzyme
WO2020092763A1 (en) * 2018-11-03 2020-05-07 Blueallele, Llc Methods for comparing efficacy of donor molecules
EP3931313A2 (en) 2019-01-04 2022-01-05 Mammoth Biosciences, Inc. Programmable nuclease improvements and compositions and methods for nucleic acid amplification and detection
WO2020163779A1 (en) * 2019-02-08 2020-08-13 The Board Of Trustees Of The Leland Stanford Junior University Production and tracking of engineered cells with combinatorial genetic modifications
EP3947691A4 (en) 2019-03-25 2022-12-14 Inscripta, Inc. SIMULTANEOUS MULTIPLEX GENE EDIT IN YEAST
US11001831B2 (en) 2019-03-25 2021-05-11 Inscripta, Inc. Simultaneous multiplex genome editing in yeast
JP2022530029A (en) * 2019-04-24 2022-06-27 スポットライト セラピューティクス Nucleic Acid Guided nuclease Methods and Compositions for Cell Targeting Screening
US10837021B1 (en) 2019-06-06 2020-11-17 Inscripta, Inc. Curing for recursive nucleic acid-guided cell editing
BR112021024288A2 (en) 2019-06-07 2022-02-15 Scribe Therapeutics Inc modified casx systems
JP2022538789A (en) * 2019-06-14 2022-09-06 アーバー バイオテクノロジーズ, インコーポレイテッド Novel CRISPR DNA targeting enzymes and systems
US10907125B2 (en) 2019-06-20 2021-02-02 Inscripta, Inc. Flow through electroporation modules and instrumentation
EP3986909A4 (en) 2019-06-21 2023-08-02 Inscripta, Inc. GENOME-WIDE RATIONAL DESIGNED MUTATIONS LEADING TO INCREASED LYSINE PRODUCTION IN E. COLI
US10927385B2 (en) 2019-06-25 2021-02-23 Inscripta, Inc. Increased nucleic-acid guided cell editing in yeast
EP3997221A4 (en) * 2019-07-08 2023-07-05 Inscripta, Inc. ENHANCED MODIFICATION OF A NUCLEIC ACID-GUIDED CELL THROUGH A LEXA-RAD51 FUSION PROTEIN
EP4028523A1 (en) * 2019-09-09 2022-07-20 Scribe Therapeutics Inc. Compositions and methods for use in immunotherapy
EP4028522A1 (en) 2019-09-09 2022-07-20 Scribe Therapeutics Inc. Compositions and methods for the targeting of sod1
EP4028585A1 (en) * 2019-09-12 2022-07-20 GlaxoSmithKline Intellectual Property Development Limited Method for screening libraries
US20220315907A1 (en) * 2019-10-10 2022-10-06 Inscripta, Inc. Split crispr nuclease tethering system
CN115052980B (en) * 2019-11-18 2024-09-10 苏州齐禾生科生物科技有限公司 A gene editing system derived from Flavobacterium
US11203762B2 (en) 2019-11-19 2021-12-21 Inscripta, Inc. Methods for increasing observed editing in bacteria
CA3159316A1 (en) 2019-12-06 2021-06-10 Benjamin OAKES Compositions and methods for the targeting of rhodopsin
KR20250033331A (en) 2019-12-10 2025-03-07 인스크립타 인코포레이티드 Novel mad nucleases
US10704033B1 (en) 2019-12-13 2020-07-07 Inscripta, Inc. Nucleic acid-guided nucleases
CN115135778A (en) * 2019-12-18 2022-09-30 巴斯夫欧洲公司 Nucleic acid analysis method and apparatus
EP4069851A4 (en) 2019-12-18 2023-11-22 Inscripta, Inc. CASCADE/DCAS3 COMPLEMENTATION TESTS FOR IN VIVO DETECTION OF NUCLEIC ACID-DRIVEN NUCLEASE EDITED CELLS
US20230037026A1 (en) * 2019-12-24 2023-02-02 Asklepios Biopharmaceutical, Inc. Method for identifying regulatory elements conformationally
WO2021142394A1 (en) * 2020-01-11 2021-07-15 Inscripta, Inc. Cell populations with rationally designed edits
US10689669B1 (en) 2020-01-11 2020-06-23 Inscripta, Inc. Automated multi-module cell processing methods, instruments, and systems
KR20220133257A (en) 2020-01-27 2022-10-04 인스크립타 인코포레이티드 Electroporation modules and instruments
US20210317444A1 (en) * 2020-04-08 2021-10-14 Inscripta, Inc. System and method for gene editing cassette design
US20230159955A1 (en) * 2020-04-16 2023-05-25 Zymergen Inc. Circular-permuted nucleic acids for homology-directed editing
US20210332388A1 (en) 2020-04-24 2021-10-28 Inscripta, Inc. Compositions, methods, modules and instruments for automated nucleic acid-guided nuclease editing in mammalian cells
WO2021236740A2 (en) * 2020-05-19 2021-11-25 Board Of Regents, The University Of Texas System Genetic physical unclonable functions and methods of use thereof
US11787841B2 (en) 2020-05-19 2023-10-17 Inscripta, Inc. Rationally-designed mutations to the thrA gene for enhanced lysine production in E. coli
WO2021247942A2 (en) * 2020-06-04 2021-12-09 Inscripta, Inc. Methods and compositions for crispr editing of cells and correlating the edits to a resulting cellular nucleic acid profile
JP7419168B2 (en) * 2020-06-10 2024-01-22 株式会社東芝 Modified piggyBac transposase polypeptide, polynucleotide encoding it, introduction carrier, kit, method for integrating a target sequence into the genome of a cell, and cell production method
WO2021257716A2 (en) * 2020-06-16 2021-12-23 Bio-Techne Corporation Engineered mad7 directed endonuclease
US11299731B1 (en) 2020-09-15 2022-04-12 Inscripta, Inc. CRISPR editing to embed nucleic acid landing pads into genomes of live cells
US11566241B2 (en) * 2020-10-02 2023-01-31 Inscripta, Inc. Methods and systems for modeling of design representation in a library of editing cassettes
US11512297B2 (en) 2020-11-09 2022-11-29 Inscripta, Inc. Affinity tag for recombination protein recruitment
PE20231178A1 (en) 2020-12-03 2023-08-01 Scribe Therapeutics Inc ENGINEERING DESIGNED CLASS 2 TYPE V CRISPR SYSTEMS
WO2022146497A1 (en) 2021-01-04 2022-07-07 Inscripta, Inc. Mad nucleases
WO2022150269A1 (en) 2021-01-07 2022-07-14 Inscripta, Inc. Mad nucleases
US11884924B2 (en) 2021-02-16 2024-01-30 Inscripta, Inc. Dual strand nucleic acid-guided nickase editing
KR20240146111A (en) * 2021-03-09 2024-10-07 일루미나, 인코포레이티드 Analyzing expression of protein-coding variants in cells
WO2022226085A1 (en) * 2021-04-20 2022-10-27 The Board Of Trustees Of The Leland Stanford Junior University Compressive molecular probes for genomic editing and tracking
WO2022261150A2 (en) 2021-06-09 2022-12-15 Scribe Therapeutics Inc. Particle delivery systems
WO2023288018A2 (en) * 2021-07-14 2023-01-19 Ultima Genomics, Inc. Barcode selection
AU2022339955A1 (en) * 2021-09-02 2024-03-07 University Of Washington Multiplex, temporally resolved molecular signal recorder and related methods
WO2023076134A1 (en) * 2021-10-26 2023-05-04 Inscripta, Inc. Processes for measuring strain fitness and/or genotype selection in bioreactors
CN114022491B (en) * 2021-10-27 2022-05-10 安徽医科大学 An automatic delineation method of esophageal cancer tumor target image in small dataset based on improved spatial pyramid model
WO2023137233A2 (en) * 2022-01-17 2023-07-20 Danmarks Tekniske Universitet Compositions and methods for editing genomes
US12186388B2 (en) 2022-11-02 2025-01-07 Centre For Virology, Vaccinology And Therapeutics Limited Interferon-producing universal sarbecovirus vaccines, and uses thereof
WO2024094050A1 (en) 2022-11-02 2024-05-10 Centre For Virology, Vaccinology And Therapeutics Limited Interferon-producing universal sarbecovirus vaccines, and uses thereof
CN116286898A (en) * 2023-04-12 2023-06-23 山东大学 Methionine Synthetase Gene HMTs and Its Application
WO2025038872A1 (en) * 2023-08-15 2025-02-20 Bio-Techne Corporation Compositions and methods related to crispr-associated enzymes
CN119432813A (en) * 2023-12-29 2025-02-14 武汉艾迪晶生物科技有限公司 Cas protein, combined protein, nucleic acid molecule, ribonucleoprotein complex, recombinant vector, transgenic cell and its application

Family Cites Families (99)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US1001760A (en) 1911-03-16 1911-08-29 William F Mcgregor Floating fish-trap.
US1001184A (en) 1911-04-20 1911-08-22 Charles M Coover Non-slipping device.
US6562594B1 (en) 1999-09-29 2003-05-13 Diversa Corporation Saturation mutagenesis in directed evolution
US20030044866A1 (en) 2001-08-15 2003-03-06 Charles Boone Yeast arrays, methods of making such arrays, and methods of analyzing such arrays
CN101967490B (en) 2002-06-14 2014-07-16 先正达公司 Xylanases, nucleic adics encoding them and methods for making and using them
JP4447977B2 (en) 2004-06-30 2010-04-07 富士通マイクロエレクトロニクス株式会社 Secure processor and program for secure processor.
EP2336362B1 (en) 2005-08-26 2018-09-19 DuPont Nutrition Biosciences ApS Use of crispr associated genes (cas)
WO2007144770A2 (en) 2006-06-16 2007-12-21 Danisco A/S Bacterium
WO2008052101A2 (en) 2006-10-25 2008-05-02 President And Fellows Of Harvard College Multiplex automated genome engineering
US9309511B2 (en) 2007-08-28 2016-04-12 The Johns Hopkins University Functional assay for identification of loss-of-function mutations in genes
US20140121118A1 (en) 2010-11-23 2014-05-01 Opx Biotechnologies, Inc. Methods, systems and compositions regarding multiplex construction protein amino-acid substitutions and identification of sequence-activity relationships, to provide gene replacement such as with tagged mutant genes, such as via efficient homologous recombination
US20150368639A1 (en) 2011-04-14 2015-12-24 Ryan T. Gill Compositions, methods and uses for multiplex protein sequence activity relationship mapping
JP2015500648A (en) 2011-12-16 2015-01-08 ターゲットジーン バイオテクノロジーズ リミテッド Compositions and methods for modifying a given target nucleic acid sequence
US9637739B2 (en) 2012-03-20 2017-05-02 Vilnius University RNA-directed DNA cleavage by the Cas9-crRNA complex
RS59199B1 (en) 2012-05-25 2019-10-31 Univ California Methods and compositions for rna-directed target dna modification and for rna-directed modulation of transcription
HUE064187T2 (en) 2012-05-25 2024-02-28 Cellectis Procedure for modification of allogeneic and immunosuppressive-resistant T cells suitable for immunotherapy
HK1207111A1 (en) 2012-08-03 2016-01-22 加利福尼亚大学董事会 Methods and compositions for controlling gene expression by rna processing
JP6517143B2 (en) 2012-10-23 2019-05-22 ツールゲン インコーポレイテッド Composition for cleaving target DNA comprising guide RNA specific for target DNA and CAS protein encoding nucleic acid or CAS protein, and use thereof
KR102479178B1 (en) 2012-12-06 2022-12-19 시그마-알드리치 컴퍼니., 엘엘씨 Crispr-based genome modification and regulation
EP4299741A3 (en) 2012-12-12 2024-02-28 The Broad Institute, Inc. Delivery, engineering and optimization of systems, methods and compositions for sequence manipulation and therapeutic applications
US8697359B1 (en) 2012-12-12 2014-04-15 The Broad Institute, Inc. CRISPR-Cas systems and methods for altering expression of gene products
JP2016505256A (en) 2012-12-12 2016-02-25 ザ・ブロード・インスティテュート・インコーポレイテッ CRISPR-Cas component system, method and composition for sequence manipulation
PL2896697T3 (en) 2012-12-12 2016-01-29 Broad Inst Inc Engineering of systems, methods and optimized guide compositions for sequence manipulation
IL239344B2 (en) 2012-12-12 2024-06-01 Broad Inst Inc Engineering of systems, methods and optimized guide compositions for sequence manipulation
EP2898075B1 (en) 2012-12-12 2016-03-09 The Broad Institute, Inc. Engineering and optimization of improved systems, methods and enzyme compositions for sequence manipulation
JP6700788B2 (en) 2012-12-17 2020-05-27 プレジデント アンド フェローズ オブ ハーバード カレッジ RNA-induced human genome modification
AU2014205648B2 (en) 2013-01-10 2017-05-04 Dharmacon, Inc. Templates, libraries, kits and methods for generating molecules
CN113005148A (en) 2013-01-16 2021-06-22 爱默蕾大学 CAS 9-nucleic acid complexes and uses related thereto
WO2014143381A1 (en) 2013-03-09 2014-09-18 Agilent Technologies, Inc. Methods of in vivo engineering of large sequences using multiple crispr/cas selections of recombineering events
NZ712727A (en) 2013-03-14 2017-05-26 Caribou Biosciences Inc Compositions and methods of nucleic acid-targeting nucleic acids
US9234213B2 (en) 2013-03-15 2016-01-12 System Biosciences, Llc Compositions and methods directed to CRISPR/Cas genomic engineering systems
KR102271292B1 (en) 2013-03-15 2021-07-02 더 제너럴 하스피탈 코포레이션 Using rna-guided foki nucleases (rfns) to increase specificity for rna-guided genome editing
JP2016522679A (en) 2013-04-04 2016-08-04 プレジデント アンド フェローズ オブ ハーバード カレッジ Therapeutic use of genome editing with the CRISPR / Cas system
AU2014273490B2 (en) 2013-05-29 2019-05-09 Cellectis Methods for engineering T cells for immunotherapy by using RNA-guided Cas nuclease system
CN105492611A (en) * 2013-06-17 2016-04-13 布罗德研究所有限公司 Optimized CRISPR-CAS double nickase systems, methods and compositions for sequence manipulation
WO2015006290A1 (en) 2013-07-09 2015-01-15 President And Fellows Of Harvard College Multiplex rna-guided genome engineering
RS62529B1 (en) 2013-07-11 2021-11-30 Modernatx Inc Compositions comprising synthetic polynucleotides encoding crispr related proteins and synthetic sgrnas and methods of use
US10563225B2 (en) 2013-07-26 2020-02-18 President And Fellows Of Harvard College Genome engineering
ES2915377T3 (en) 2013-08-02 2022-06-22 Enevolv Inc Procedures and host cells for genomic, pathway and biomolecular engineering
US20150044192A1 (en) 2013-08-09 2015-02-12 President And Fellows Of Harvard College Methods for identifying a target site of a cas9 nuclease
US10760065B2 (en) 2013-09-05 2020-09-01 Massachusetts Institute Of Technology Tuning microbial populations with programmable nucleases
WO2015040402A1 (en) 2013-09-18 2015-03-26 Kymab Limited Methods, cells & organisms
WO2015048577A2 (en) 2013-09-27 2015-04-02 Editas Medicine, Inc. Crispr-related methods and compositions
US10822606B2 (en) 2013-09-27 2020-11-03 The Regents Of The University Of California Optimized small guide RNAs and methods of use
US20150098954A1 (en) 2013-10-08 2015-04-09 Elwha Llc Compositions and Methods Related to CRISPR Targeting
WO2015069682A2 (en) 2013-11-05 2015-05-14 President And Fellows Of Harvard College Precise microbiota engineering at the cellular level
EP3865575A1 (en) 2013-11-06 2021-08-18 Hiroshima University Vector for nucleic acid insertion
US11326209B2 (en) 2013-11-07 2022-05-10 Massachusetts Institute Of Technology Cell-based genomic recorded accumulative memory
US20160298096A1 (en) 2013-11-18 2016-10-13 Crispr Therapeutics Ag Crispr-cas system materials and methods
US9074199B1 (en) 2013-11-19 2015-07-07 President And Fellows Of Harvard College Mutant Cas9 proteins
RU2685914C1 (en) 2013-12-11 2019-04-23 Регенерон Фармасьютикалс, Инк. Methods and compositions for genome targeted modification
EP3653703A1 (en) 2013-12-12 2020-05-20 The Broad Institute, Inc. Compositions and methods of use of crispr-cas systems in nucleotide repeat disorders
US10787654B2 (en) 2014-01-24 2020-09-29 North Carolina State University Methods and compositions for sequence guiding Cas9 targeting
HUE066611T2 (en) 2014-02-11 2024-08-28 Univ Colorado Regents Crispr enabled multiplexed genome engineering
EP3105325B2 (en) 2014-02-13 2024-10-23 Takara Bio USA, Inc. Methods of depleting a target molecule from an initial collection of nucleic acids, and compositions and kits for practicing the same
US10507232B2 (en) 2014-04-02 2019-12-17 University Of Florida Research Foundation, Incorporated Materials and methods for the treatment of latent viral infection
GB201406970D0 (en) 2014-04-17 2014-06-04 Green Biologics Ltd Targeted mutations
GB201406968D0 (en) 2014-04-17 2014-06-04 Green Biologics Ltd Deletion mutants
WO2015168600A2 (en) 2014-05-02 2015-11-05 Tufts University Methods and apparatus for transformation of naturally competent cells
JP2017517256A (en) 2014-05-20 2017-06-29 リージェンツ オブ ザ ユニバーシティ オブ ミネソタ How to edit gene sequences
US20170191123A1 (en) 2014-05-28 2017-07-06 Toolgen Incorporated Method for Sensitive Detection of Target DNA Using Target-Specific Nuclease
US9970001B2 (en) 2014-06-05 2018-05-15 Sangamo Therapeutics, Inc. Methods and compositions for nuclease design
WO2015191693A2 (en) 2014-06-10 2015-12-17 Massachusetts Institute Of Technology Method for gene editing
WO2015195798A1 (en) 2014-06-17 2015-12-23 Poseida Therapeutics, Inc. A method for directing proteins to specific loci in the genome and uses thereof
US20150376587A1 (en) 2014-06-25 2015-12-31 Caribou Biosciences, Inc. RNA Modification to Engineer Cas9 Activity
GB201411344D0 (en) 2014-06-26 2014-08-13 Univ Leicester Cloning
US11254933B2 (en) 2014-07-14 2022-02-22 The Regents Of The University Of California CRISPR/Cas transcriptional modulation
US20160053272A1 (en) 2014-07-18 2016-02-25 Whitehead Institute For Biomedical Research Methods Of Modifying A Sequence Using CRISPR
US20160053304A1 (en) 2014-07-18 2016-02-25 Whitehead Institute For Biomedical Research Methods Of Depleting Target Sequences Using CRISPR
US20160076093A1 (en) 2014-08-04 2016-03-17 University Of Washington Multiplex homology-directed repair
AU2015299850B2 (en) 2014-08-06 2020-08-13 Institute For Basic Science Genome editing using Campylobacter jejuni CRISPR/CAS system-derived RGEN
US10513711B2 (en) 2014-08-13 2019-12-24 Dupont Us Holding, Llc Genetic targeting in non-conventional yeast using an RNA-guided endonuclease
WO2016040594A1 (en) 2014-09-10 2016-03-17 The Regents Of The University Of California Reconstruction of ancestral cells by enzymatic recording
EP3998344A1 (en) 2014-10-09 2022-05-18 Life Technologies Corporation Crispr oligonucleotides and gene editing
US20190100769A1 (en) 2014-10-31 2019-04-04 Massachusetts Institute Of Technology Massively parallel combinatorial genetics for crispr
AU2015355546B2 (en) 2014-12-03 2021-10-14 Agilent Technologies, Inc. Guide RNA with chemical modifications
EP3234117B1 (en) 2014-12-17 2021-03-03 DuPont US Holding, LLC Compositions and methods for efficient gene editing in e. coli using guide rna/cas endonuclease systems in combination with circular polynucleotide modification templates
AU2015364286B2 (en) 2014-12-20 2021-11-04 Arc Bio, Llc Compositions and methods for targeted depletion, enrichment, and partitioning of nucleic acids using CRISPR/Cas system proteins
GB201506509D0 (en) 2015-04-16 2015-06-03 Univ Wageningen Nuclease-mediated genome editing
EP3294878A1 (en) 2015-05-15 2018-03-21 Pioneer Hi-Bred International, Inc. Guide RNA/Cas endonuclease systems
JP7107683B2 (en) 2015-06-18 2022-07-27 The Broad Institute, Inc. CRISPR enzyme mutations that reduce off-target effects
US9790490B2 (en) 2015-06-18 2017-10-17 The Broad Institute Inc. CRISPR enzymes and systems
EP4545544A3 (en) 2015-06-29 2025-10-08 Ionis Pharmaceuticals, Inc. Modified CRISPR RNA and modified single CRISPR RNA and uses thereof
WO2017015015A1 (en) 2015-07-17 2017-01-26 Emory University CRISPR-associated protein from Francisella and uses related thereto
WO2017019867A1 (en) 2015-07-28 2017-02-02 Danisco Us Inc Genome editing systems and methods of use
CN107922953B (en) 2015-08-20 2022-03-04 Applied StemCell, Inc. Nuclease for improving gene editing efficiency
US10369232B2 (en) 2015-09-21 2019-08-06 Arcturus Therapeutics, Inc. Allele selective gene editing and uses thereof
HK1256817A1 (en) 2015-09-25 2019-10-04 Tarveda Therapeutics, Inc. Compositions and methods for genome editing
ES2840648T3 (en) 2015-10-22 2021-07-07 Inst Nat Sante Rech Med Endonuclease barcode generation
WO2017070598A1 (en) 2015-10-23 2017-04-27 Caribou Biosciences, Inc. Engineered CRISPR class 2 cross-type nucleic-acid targeting nucleic acids
US11905521B2 (en) 2015-11-17 2024-02-20 The Chinese University Of Hong Kong Methods and systems for targeted gene manipulation
JP2018534942A (en) 2015-11-26 2018-11-29 ディーエヌエーイー グループ ホールディングス リミテッド Single molecule control
DK3386550T3 (en) 2015-12-07 2021-04-26 Arc Bio Llc Methods for preparing and using guide nucleic acids
CA3007840C (en) 2015-12-07 2020-09-15 Zymergen Inc. Microbial strain improvement by a HTP genomic engineering platform
US9988624B2 (en) 2015-12-07 2018-06-05 Zymergen Inc. Microbial strain improvement by a HTP genomic engineering platform
JP2019500036A (en) 2015-12-24 2019-01-10 BRAIN Biotechnology Research and Information Network Aktiengesellschaft Reconstruction of DNA end repair pathways in prokaryotes
AU2017280353B2 (en) 2016-06-24 2021-11-11 Inscripta, Inc. Methods for generating barcoded combinatorial libraries
US10011849B1 (en) 2017-06-23 2018-07-03 Inscripta, Inc. Nucleic acid-guided nucleases
US9982279B1 (en) 2017-06-23 2018-05-29 Inscripta, Inc. Nucleic acid-guided nucleases

Also Published As

Publication number Publication date
US20180230461A1 (en) 2018-08-16
WO2017223538A9 (en) 2018-07-19
AU2017280353B2 (en) 2021-11-11
CA3029254A1 (en) 2017-12-28
US20180230460A1 (en) 2018-08-16
WO2017223538A1 (en) 2017-12-28
US20190194650A1 (en) 2019-06-27
US20170369870A1 (en) 2017-12-28
CN109688820A (en) 2019-04-26
EP3474669B1 (en) 2022-04-06
JP2019518478A (en) 2019-07-04
LT3474669T (en) 2022-06-10
DK3474669T3 (en) 2022-06-27
EP3474669A1 (en) 2019-05-01
US11584928B2 (en) 2023-02-21
EP3474669A4 (en) 2019-05-08
AU2017280353A1 (en) 2019-01-24
ES2915562T3 (en) 2022-06-23
CN109688820B (en) 2023-01-10
US10287575B2 (en) 2019-05-14
US10017760B2 (en) 2018-07-10
US10294473B2 (en) 2019-05-21

Similar Documents

Publication Publication Date Title
US20230227810A1 (en) Methods for generating barcoded combinatorial libraries
Garst et al. Genome-wide mapping of mutations at single-nucleotide resolution for protein, metabolic and genome engineering
US11220678B2 (en) Engineered CRISPR-Cas9 nucleases with altered PAM specificity
US11242513B2 (en) Thermostable Cas9 nucleases
EP4257696A2 (en) Novel CRISPR DNA targeting enzymes and systems
US11643654B2 (en) CRISPR DNA targeting enzymes and systems
Maier et al. The nuts and bolts of the Haloferax CRISPR-Cas system I-B
Higgins et al. Rapid and programmable protein mutagenesis using plasmid recombineering
Schroeder et al. Development of a functional genomics platform for Sinorhizobium meliloti: construction of an ORFeome
Escudero et al. Primary and promiscuous functions coexist during evolutionary innovation through whole protein domain acquisitions
Patinios et al. Targeted DNA ADP-ribosylation triggers templated repair in bacteria and base mutagenesis in eukaryotes
Venetz Development of a Standardized Assembly Technology for Large-Scale DNA Constructs and Demonstration of its Applicability to Build Synthetic Chromosomes
HK1249132B (en) Engineered CRISPR-Cas9 nucleases with altered PAM specificity

Legal Events

Date Code Title Description
AS Assignment

Owner name: THE REGENTS OF THE UNIVERSITY OF COLORADO, A BODY CORPORATE, COLORADO

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:GILL, RYAN T.;GARST, ANDREW;BASSALO, MARCELO COLIKA;AND OTHERS;SIGNING DATES FROM 20170828 TO 20180130;REEL/FRAME:063382/0778

Owner name: MUSE BIOTECHNOLOGY, INC., COLORADO

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:WARNECKE LIPSCOMB, TANYA ELIZABETH;REEL/FRAME:063382/0467

Effective date: 20180130

Owner name: INSCRIPTA, INC., COLORADO

Free format text: CHANGE OF NAME;ASSIGNOR:MUSE BIOTECHNOLOGY, INC.;REEL/FRAME:063397/0861

Effective date: 20171128

STPP Information on status: patent application and granting procedure in general

Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION

AS Assignment

Owner name: UNITED STATES DEPARTMENT OF ENERGY, DISTRICT OF COLUMBIA

Free format text: CONFIRMATORY LICENSE;ASSIGNOR:UNIVERSITY OF COLORADO;REEL/FRAME:064645/0619

Effective date: 20230201

AS Assignment

Owner name: UNITED STATES DEPARTMENT OF ENERGY, DISTRICT OF COLUMBIA

Free format text: CONFIRMATORY LICENSE;ASSIGNOR:UNIVERSITY OF COLORADO;REEL/FRAME:065179/0282

Effective date: 20230201