WO2024220475A1 - Polymerase variants - Google Patents
Polymerase variants
- Publication number
- WO2024220475A1 (PCT/US2024/024895)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- instances
- seq
- polypeptide
- enzyme
- similarity
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N9/00—Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
- C12N9/10—Transferases (2.)
- C12N9/12—Transferases (2.) transferring phosphorus containing groups, e.g. kinases (2.7)
- C12N9/1241—Nucleotidyltransferases (2.7.7)
- C12N9/1252—DNA-directed DNA polymerase (2.7.7.7), i.e. DNA replicase
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/10—Processes for the isolation, preparation or purification of DNA or RNA
- C12N15/1034—Isolating an individual clone by screening libraries
- C12N15/1093—General methods of preparing gene libraries, not provided for in other subgroups
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/11—DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
- C12N15/52—Genes encoding for enzymes or proenzymes
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Y—ENZYMES
- C12Y207/00—Transferases transferring phosphorus-containing groups (2.7)
- C12Y207/07—Nucleotidyltransferases (2.7.7)
- C12Y207/07007—DNA-directed DNA polymerase (2.7.7.7), i.e. DNA replicase
Definitions
- Enzymes are capable of catalyzing a wide range of chemical reactions, including those used in chemical biology for sequencing applications.
- the design and implementation of enzymes can be challenging.
- polypeptides comprising amino acid sequences comprising at least one amino acid mutation relative to SEQ ID NO: 1.
- the amino acid sequence at least 80%, at least 90%, at least 95%, at least 98%, or 100% homologous to any one of SEQ ID NOs: 3-9.
- the mutation comprises an addition, deletion, substitution, or combination thereof.
- the deletion comprises 250-300 amino acids from the N-terminus relative to SEQ ID NO: 1.
- the polypeptide comprises at least 2, at least 3, or at least 4 amino acid mutations relative to SEQ ID NO: 1.
- the mutations are at one or more of positions V449, V493, L522, L605, T664, E681, W706, D732, R736, and G824 relative to SEQ ID NO: 1.
- the mutations are selected from one or more of V449F, V493L, L522I, L605C, T664I, E681G, W706Y, D732A, R736K, R736Q, and G824A relative to SEQ ID NO: 1.
- the polypeptide comprises a purification tag.
- nucleic acid molecules encoding for the polypeptides, and vectors and cells comprising the nucleic acid molecules.
- the method comprises contacting the first polynucleotide with a nucleotide and a polypeptide to form an extended polynucleotide.
- the polypeptide comprises an amino acid sequence comprising at least one amino acid mutation relative to SEQ ID NO: 1.
- the first polynucleotide comprises genomic DNA or a fragment thereof, cDNA, or adenosine triphosphate.
- the method is at least 90% selective for incorporation of a single nucleotide. In some embodiments, the method is at least 90% selective for incorporation of a nucleotide type.
- the method is at least 95% selective for adenine (A) over guanine (G).
- the method further comprises ligating an adapter to the extended polynucleotide.
- the adapter comprises a complementary overhang to the extended polynucleotide.
- the method further comprises extending a second polynucleotide. In some aspects, the polynucleotide and the second polynucleotide are hybridized.
- method comprises providing a plurality of nucleic acids; end-repairing the plurality of nucleic acids; performing A-tailing on the plurality of nucleic acids using a polymerase; and ligating at least one adapter to the nucleic acids using a ligase.
- the polymerase comprises an amino acid sequence comprising at least one amino acid mutation relative to SEQ ID NO: 1.
- FIG. 1 is a diagram depicting an exemplary workflow for assaying A-tailing activity of variants of TaqIT DNA polymerase (“TaqIT”), according to aspects of the present disclosure.
- FIGS. 2A-2D are bar graphs demonstrating end compositions of an exemplary adapter before and after end-repairing and A-tailing, according to aspects of the present disclosure.
- FIG. 2A depicts read counts for untreated cell-free DNA (cfDNA) molecules having blunt ends or overhangs of varying lengths.
- FIG. 2B depicts read counts for end-repaired cfDNA molecules having blunt ends or overhangs of varying lengths.
- FIG. 2C depicts read counts for end-repaired and A-tailed cfDNA molecules having blunt ends or overhangs of varying lengths.
- FIG. 2D depicts an end composition of one base pair having a 3’ overhang added by wild-type TaqIT DNA polymerase.
- FIG. 3 is a probability plot depicting cumulative probabilities for amino acids (0.0 to 1.0 at 0.2 unit intervals) versus position in Taq DNA polymerase (left to right: 730-755), according to aspects of the present disclosure.
- FIG. 4A is a scatter plot depicting mean-normalized results from an exemplary first round screen of A-tailing variants of DNA polymerase, according to aspects of the present disclosure.
- FIG. 4B is a table depicting fold change values of top performer variants over wild-type DNA polymerase, according to aspects of the present disclosure.
- FIGS. 5A-5C demonstrate results of an exemplary experiment comparing Taq DNA polymerase homologues to wild-type, according to aspects of the present disclosure.
- FIG. 5A is a photograph of an SDS-PAGE gel of two purified wild-type DNA polymerases.
- FIG. 5B is a photograph of an SDS-PAGE gel of twelve purified homologues of Taq DNA polymerases.
- FIG. 5C is a bar graph depicting results of next-generation sequencing performed with each of the twelve Taq DNA polymerase homologues and the two wild-type DNA polymerases.
- FIGS. 6A-6C depict results of an exemplary experiment comparing binary A-tailing variants of TaqIT DNA polymerase to wild-type, according to aspects of the present disclosure.
- FIG. 6A is a scatter plot depicting normalized results from exemplary binary A-tailing variants of TaqIT DNA polymerase.
- FIG. 6B is a table depicting fold change value of top performer binary variants over wild-type.
- FIG. 6C is a scatter plot depicting additional results from binary A-tailing variants of TaqIT DNA polymerase.
- FIGS. 7A-7C depict results of an exemplary experiment evaluating binary variants of TaqIT DNA polymerase, according to aspects of the present disclosure.
- FIG. 7A is a photograph of an SDS-PAGE gel of purified binary variants of TaqIT DNA polymerase.
- FIG. 7B is a bar graph depicting results from next-generation sequencing performed with binary variants as compared to wild-type.
- FIG. 7C is a bar graph depicting additional binary variants next-generation sequencing results.
- FIGS. 8A-8B depict results of an exemplary experiment evaluating effectiveness of binary A-tailing variants of TaqIT DNA polymerase, according to aspects of the present disclosure.
- FIG. 8A is a bar graph depicting fraction reads with correct tail length after A-tailing with the binary variants.
- FIG. 8B is a bar graph depicting fraction reads of single-base pair 3’ overhangs that had a guanine (G) instead of an adenosine (A) addition.
- G guanine
- A adenosine
- FIGS. 9A-9B depict results of an exemplary experiment evaluating tertiary variants of TaqIT DNA polymerase, according to aspects of the present disclosure.
- FIG. 9A is a scatter plot depicting normalized results from the tertiary variants.
- FIG. 9B is a table depicting fold change values of top performer tertiary variants over wild-type.
- compositions and methods for generation of sequencing libraries are provided herein. Further provided herein are engineered enzymes to improve library generation. Further provided herein are polymerases for generating sequencing libraries.
- nucleic acid encompass double-stranded or triple-stranded nucleic acid molecules, as well as single-stranded nucleic acid molecules.
- nucleic acid strands need not be coextensive (i.e., a double-stranded nucleic acid molecule need not be double-stranded along the entire length of both strands).
- Nucleic acid sequences, when provided, are listed in the 5’ to 3’ direction, unless stated otherwise. Methods described herein provide for the generation of isolated nucleic acids. Methods described herein additionally provide for the generation of isolated and purified nucleic acids.
- a “nucleic acid” as referred to herein can comprise at least 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 125, 150, 175, 200, 225, 250, 275, 300, 325, 350, 375, 400, 425, 450, 475, 500, 600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800, 1900, 2000, or more bases in length.
- polypeptide-segments encoding nucleotide sequences, including sequences encoding non-ribosomal peptides (NRPs), sequences encoding non-ribosomal peptide-synthetase (NRPS) modules and synthetic variants, polypeptide segments of other modular proteins, such as antibodies, polypeptide segments from other protein families, including non-coding DNA or RNA, such as regulatory sequences, e.g., promoters, transcription factors, enhancers, siRNA, shRNA, RNAi, miRNA, small nucleolar RNA derived from microRNA, or any functional or structural DNA or RNA unit of interest.
- NRPs non-ribosomal peptides
- NRPS non-ribosomal peptide-synthetase
- polynucleotides coding or non-coding regions of a gene or gene fragment, intergenic DNA, loci (locus) defined from linkage analysis, exons, introns, messenger RNA (mRNA), transfer RNA, ribosomal RNA, short interfering RNA (siRNA), short-hairpin RNA (shRNA), micro-RNA (miRNA), small nucleolar RNA, ribozymes, complementary DNA (cDNA).
- cDNA encoding for a gene or gene fragment referred herein may comprise at least one region encoding for exon sequences without an intervening intron sequence in the genomic equivalent sequence.
- the enzyme comprises a polymerase.
- the enzyme is configured to increase the specificity of non-templated 3’ nucleotide addition.
- the enzyme is configured to increase the specificity of non-templated 3’ adenosine addition.
- the enzyme comprises a Taq polymerase.
- the Taq polymerase is selected from Table 1, below.
- the Taq polymerase is a truncated Taq polymerase (e.g., a TaqIT polymerase).
- the enzyme comprises a variant of SEQ ID NO: 1.
- the enzyme comprises a variant of SEQ ID NO: 2.
- Taq polymerases may be used for “A-tailing”, wherein an adenosine nucleotide is extended from the 3’ end of a polynucleotide (e.g., genomic DNA, cDNA). In some instances, extension to generate an overhang facilitates ligation with adapters. In some instances, ligation occurs using T4 ligase or another ligase. In some instances, variant polymerases provided herein provide for higher control over the number of nucleotides added. In some instances, the nucleotide comprises adenosine triphosphate. In some instances, the variant enzyme comprises at least 70%, 75%, 80%, 85%, 90%, 95%.
- variant polymerase comprises at least 70%, 75%, 80%, 85%, 90%, 95%, 97%, or at least 99% selectivity for a single nucleotide type. In some instances, the variant polymerase comprises at least 70%, 75%, 80%, 85%, 90%, 95%, 97%, or at least 99% selectivity for adenosine. In some instances, the variant polymerase comprises at least 70%.
- variant polymerases extend the 3’ end of a first polynucleotide and a second polynucleotide. In some instances, a first polynucleotide and a second polynucleotide are hybridized together.
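The selectivity percentages above can be illustrated with a minimal Python sketch. It assumes, as a simplification not stated in the disclosure, that selectivity is measured as the fraction of sequenced reads whose non-templated 3’ tail matches a target; the function names and the example read calls are hypothetical.

```python
from collections import Counter


def tail_fractions(tails):
    """Return the fraction of reads carrying each observed 3' tail."""
    if not tails:
        raise ValueError("no tails observed")
    counts = Counter(tails)
    total = len(tails)
    return {tail: n / total for tail, n in counts.items()}


def selectivity(tails, target="A"):
    """Fraction of reads whose non-templated 3' tail equals `target`."""
    return tail_fractions(tails).get(target, 0.0)


def single_base_fraction(tails):
    """Fraction of reads whose tail is exactly one nucleotide long."""
    if not tails:
        raise ValueError("no tails observed")
    return sum(len(t) == 1 for t in tails) / len(tails)


# Hypothetical read-level tail calls (illustrative only, not data from the disclosure).
example_tails = ["A"] * 96 + ["G"] * 3 + ["AA"]
```

Under this assumed definition, a threshold such as "at least 95% selectivity for adenosine" would correspond to `selectivity(example_tails, "A") >= 0.95`.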
- An enzyme provided herein may comprise one or more variants of SEQ ID NO: 1.
- a variant comprises one or more of an insertion, deletion, or substitution relative to SEQ ID NO: 1.
- a deletion comprises an N-terminal deletion.
- a deletion comprises a C-terminal deletion.
- a deletion comprises a deletion of at least 10, 25, 30, 50, 60, 100, 150, 200, 250, 280, 300, or at least 350 amino acids.
- a deletion comprises a deletion of at least 10, 25, 30, 50, 60, 100, 150, 200, 250, 280, 300, or at least 350 amino acids from the N-terminus.
- a deletion comprises a deletion of 20-300, 20-290.
- a deletion comprises a deletion of 20-300, 20-290, 20-250, 20-200, 50-300, 100-300, 150-300, 200-300, 200-350, 200-400, 250-400, 250-300, 250-350, 275-300, or 275-325 amino acids from the N-terminus.
- a variant comprises at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14.
- a variant comprises about 1, about 2, about 3, about 4, about 5, about 6, about 7, about 8, about 9, about 10, about 11, about 12, about 13, about 14, about 15, or about 16 variant amino acid positions of SEQ ID NO: 1.
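An N-terminal deletion shifts residue numbering, which matters when positions are quoted relative to a full-length reference. The sketch below is a hypothetical illustration in Python; the toy sequence and function names are assumptions and do not correspond to SEQ ID NO: 1 or SEQ ID NO: 2.

```python
def delete_n_terminal(sequence, n_deleted):
    """Return `sequence` with the first `n_deleted` residues removed."""
    if not 0 <= n_deleted < len(sequence):
        raise ValueError("deletion length must be in [0, len(sequence))")
    return sequence[n_deleted:]


def position_in_truncated(reference_position, n_deleted):
    """Map a 1-based position in the full-length reference to the
    1-based position in the N-terminally truncated sequence.

    Returns None when the residue lies inside the deleted region.
    """
    if reference_position < 1:
        raise ValueError("positions are 1-based")
    if reference_position <= n_deleted:
        return None
    return reference_position - n_deleted


# Toy reference (hypothetical, far shorter than a real polymerase).
toy_reference = "MKLVAGTHE"
toy_truncated = delete_n_terminal(toy_reference, 3)
```

With three residues deleted, reference position 5 maps to position 2 of the truncated toy sequence, and both index the same residue.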
- An enzyme provided herein may comprise one or more variants of SEQ ID NO: 2.
- a variant comprises at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, or at least 16 variant amino acid positions of SEQ ID NO: 2.
- a variant comprises about 1, about 2.
- An enzyme provided herein may comprise a sequence having homology or similarity and mutations at one or more amino acid positions.
- an enzyme comprises a mutation at one or more of positions selected from 449, 493, 522, 605, 664, 681, 706, 732, 736, or 824 and at least 95% similarity to SEQ ID NO: 1.
- an enzyme comprises a mutation at two or more of positions selected from 449, 493, 522, 605, 664, 681, 706, 732, 736, or 824 and at least 90% similarity to SEQ ID NO: 1.
- an enzyme comprises a mutation at three or more of positions selected from 449, 493, 522, 605, 664, 681, 706, 732, 736, or 824 and at least 90% similarity to SEQ ID NO: 1. In some instances, an enzyme comprises a mutation at four or more of positions selected from 449, 493, 522, 605, 664, 681, 706, 732, 736, or 824 and at least 90% similarity to SEQ ID NO: 1. In some instances, an enzyme comprises a mutation at five or more of positions selected from 449, 493, 522, 605, 664, 681, 706, 732, 736, or 824 and at least 90% similarity to SEQ ID NO: 1.
- an enzyme comprises a mutation at six or more of positions selected from 449, 493, 522, 605, 664, 681, 706, 732, 736, or 824 and at least 90% similarity to SEQ ID NO: 1. In some instances, an enzyme comprises a mutation at seven or more of positions selected from 449, 493, 522, 605, 664, 681, 706, 732, 736, or 824 and at least 90% similarity to SEQ ID NO: 1. In some instances, an enzyme comprises a mutation at eight or more of positions selected from 449, 493, 522, 605, 664, 681, 706, 732, 736, or 824 and at least 90% similarity to SEQ ID NO: 1.
- an enzyme comprises a mutation at nine or more of positions selected from 449, 493, 522, 605, 664, 681, 706, 732, 736, or 824 and at least 90% similarity to SEQ ID NO: 1.
- an enzyme comprises a mutation at ten or more of positions selected from 449, 493, 522, 605, 664, 681, 706, 732, 736, or 824 and at least 90% similarity to SEQ ID NO: 1.
- An enzyme provided herein may comprise a sequence having homology or similarity and mutations at one or more amino acid positions.
- an enzyme comprises a mutation at one or more of positions selected from V449F, V493L, L522I, L605C, T664I, E681G, W706Y, D732A, R736K, R736Q, or G824A and at least 95% similarity to SEQ ID NO: 1.
- an enzyme comprises a mutation at two or more of positions selected from V449F, V493L, L522I, L605C, T664I, E681G, W706Y, D732A, R736K, R736Q, or G824A and at least 90% similarity to SEQ ID NO: 1.
- an enzyme comprises a mutation at three or more of positions selected from V449F, V493L, L522I, L605C, T664I, E681G, W706Y, D732A, R736K, R736Q, or G824A and at least 90% similarity to SEQ ID NO: 1.
- an enzyme comprises a mutation at four or more of positions selected from V449F, V493L, L522I, L605C, T664I, E681G, W706Y, D732A, R736K, R736Q, or G824A and at least 90% similarity to SEQ ID NO: 1.
- an enzyme comprises a mutation at five or more of positions selected from V449F, V493L, L522I, L605C, T664I, E681G, W706Y, D732A, R736K, R736Q, or G824A and at least 90% similarity to SEQ ID NO: 1.
- an enzyme comprises a mutation at six or more of positions selected from V449F, V493L, L522I, L605C, T664I, E681G, W706Y, D732A, R736K, R736Q, or G824A and at least 90% similarity to SEQ ID NO: 1.
- an enzyme comprises a mutation at seven or more of positions selected from V449F, V493L, L522I, L605C, T664I, E681G, W706Y, D732A, R736K, R736Q, or G824A and at least 90% similarity to SEQ ID NO: 1.
- an enzyme comprises a mutation at eight or more of positions selected from V449F, V493L, L522I, L605C, T664I, E681G, W706Y, D732A, R736K, R736Q, or G824A and at least 90% similarity to SEQ ID NO: 1.
- an enzyme comprises a mutation at nine or more of positions selected from V449F, V493L, L522I, L605C, T664I, E681G, W706Y, D732A, R736K, R736Q, or G824A and at least 90% similarity to SEQ ID NO: 1.
- an enzyme comprises a mutation at ten or more of positions selected from V449F, V493L, L522I, L605C, T664I, E681G, W706Y, D732A, R736K, R736Q, or G824A and at least 90% similarity to SEQ ID NO: 1.
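Labels such as V449F follow the common convention of wild-type residue, 1-based position, then replacement residue. A minimal Python sketch of parsing and applying such labels is given here; the helper names and the toy reference sequence are hypothetical, and real use would require the actual reference sequence (e.g., SEQ ID NO: 1), which is not reproduced in this section.

```python
import re

# One-letter wild-type residue, 1-based position, one-letter replacement.
_SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitution(label):
    """Split a label such as 'V449F' into ('V', 449, 'F')."""
    match = _SUBSTITUTION.match(label)
    if match is None:
        raise ValueError(f"not a substitution label: {label!r}")
    wild_type, position, replacement = match.groups()
    return wild_type, int(position), replacement


def apply_substitutions(reference, labels):
    """Apply each substitution, checking the wild-type residue first."""
    residues = list(reference)
    for label in labels:
        wild_type, position, replacement = parse_substitution(label)
        if position < 1 or position > len(residues):
            raise ValueError(f"{label}: position outside the reference")
        if residues[position - 1] != wild_type:
            raise ValueError(f"{label}: reference has {residues[position - 1]}")
        residues[position - 1] = replacement
    return "".join(residues)


# Toy reference (hypothetical); positions here are NOT those of SEQ ID NO: 1.
toy_variant = apply_substitutions("MKVLTE", ["V3F", "T5I"])
```

Checking the wild-type residue before substituting catches numbering mismatches, for example when a label written against the full-length reference is applied to a truncated sequence.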
- An enzyme provided herein may comprise one or more variants of SEQ ID NO: 2.
- a variant comprises at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, or at least 16 variant amino acid positions of SEQ ID NO: 2.
- a variant comprises about 1, about 2, about 3, about 4, about 5, about 6, about 7, about 8, about 9, about 10, about 11, about 12, about 13, about 14, about 15, or about 16 variant amino acid positions of SEQ ID NO: 2.
- An enzyme provided herein may comprise a sequence having homology or similarity and mutations at one or more amino acid positions.
- an enzyme comprises a mutation at one or more of positions selected from 449, 493, 522, 605, 664, 681, 706, 732, 736, or 824 and at least 95% similarity to SEQ ID NO: 1.
- an enzyme comprises a mutation at two or more of positions selected from 449, 493, 522, 605, 664, 681, 706, 732, 736, or 824 and at least 90% similarity to SEQ ID NO: 1.
- an enzyme comprises a mutation at three or more of positions selected from 449, 493, 522, 605, 664, 681, 706, 732, 736, or 824 and at least 90% similarity to SEQ ID NO: 1.
- an enzyme comprises a mutation at four or more of positions selected from 449, 493, 522, 605, 664, 681, 706, 732, 736, or 824 and at least 90% similarity to SEQ ID NO: 1.
- an enzyme comprises a mutation at five or more of positions selected from 449, 493, 522, 605, 664, 681, 706, 732, 736, or 824 and at least 90% similarity to SEQ ID NO: 1.
- an enzyme comprises a mutation at six or more of positions selected from 449, 493, 522, 605, 664, 681, 706, 732, 736, or 824 and at least 90% similarity to SEQ ID NO: 1. In some instances, an enzyme comprises a mutation at seven or more of positions selected from 449, 493, 522, 605, 664, 681, 706, 732, 736, or 824 and at least 90% similarity to SEQ ID NO: 1. In some instances, an enzyme comprises a mutation at eight or more of positions selected from 449, 493, 522, 605, 664, 681, 706, 732, 736, or 824 and at least 90% similarity to SEQ ID NO: 1.
- an enzyme comprises a mutation at nine or more of positions selected from 449, 493, 522, 605, 664, 681, 706, 732, 736, or 824 and at least 90% similarity to SEQ ID NO: 1. In some instances, an enzyme comprises a mutation at ten or more of positions selected from 449, 493, 522, 605, 664, 681, 706, 732, 736, or 824 and at least 90% similarity to SEQ ID NO: 1.
- An enzyme provided herein may comprise a sequence having homology or similarity and mutations at one or more amino acid positions.
- an enzyme comprises a mutation at one or more of positions selected from V449F, V493L, L522I, L605C, T664I, E681G, W706Y, D732A, R736K, R736Q, or G824A and at least 95% similarity to SEQ ID NO: 2.
- an enzyme comprises a mutation at two or more of positions selected from V449F, V493L, L522I, L605C, T664I, E681G, W706Y, D732A, R736K, R736Q, or G824A and at least 90% similarity to SEQ ID NO: 2.
- an enzyme comprises a mutation at three or more of positions selected from V449F, V493L, L522I, L605C, T664I, E681G, W706Y, D732A, R736K, R736Q, or G824A and at least 90% similarity to SEQ ID NO: 2.
- an enzyme comprises a mutation at four or more of positions selected from V449F, V493L, L522I, L605C, T664I, E681G, W706Y, D732A, R736K, R736Q, or G824A and at least 90% similarity to SEQ ID NO: 2.
- an enzyme comprises a mutation at five or more of positions selected from V449F, V493L, L522I, L605C, T664I, E681G, W706Y, D732A, R736K, R736Q, or G824A and at least 90% similarity to SEQ ID NO: 2. In some instances, an enzyme comprises a mutation at six or more of positions selected from V449F, V493L, L522I, L605C, T664I, E681G, W706Y, D732A, R736K, R736Q, or G824A and at least 90% similarity to SEQ ID NO: 2.
- an enzyme comprises a mutation at seven or more of positions selected from V449F, V493L, L522I, L605C, T664I, E681G, W706Y, D732A, R736K, R736Q, or G824A and at least 90% similarity to SEQ ID NO: 2.
- an enzyme comprises a mutation at eight or more of positions selected from V449F, V493L, L522I, L605C, T664I, E681G, W706Y, D732A, R736K, R736Q, or G824A and at least 90% similarity to SEQ ID NO: 2.
- an enzyme comprises a mutation at nine or more of positions selected from V449F, V493L, L522I, L605C, T664I, E681G, W706Y, D732A, R736K, R736Q, or G824A and at least 90% similarity to SEQ ID NO: 2.
- an enzyme comprises a mutation at ten or more of positions selected from V449F, V493L, L522I, L605C, T664I, E681G, W706Y, D732A, R736K, R736Q, or G824A and at least 90% similarity to SEQ ID NO: 2.
- An enzyme provided herein may comprise a sequence having homology or similarity with SEQ ID NO: 1.
- an enzyme provided herein comprises at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 97%, at least about 98%, at least about 99%, at least about 99.5%, or more similarity with SEQ ID NO: 1.
- at least 10 contiguous amino acids of an enzyme provided herein comprise at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 97%, at least about 98%, at least about 99%.
- At least 50 contiguous amino acids of an enzyme provided herein comprise at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 97%, at least about 98%, at least about 99%, at least about 99.5%, or more similarity with SEQ ID NO: 1.
- at least 100 contiguous amino acids of an enzyme provided herein comprise at least about 50%, at least about 60%, at least about 70%.
- 20-100 contiguous amino acids of an enzyme provided herein comprise at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 97%, at least about 98%, at least about 99%, at least about 99.5%, or more similarity with SEQ ID NO: 1.
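The similarity thresholds above can be checked computationally once two sequences are aligned. The Python sketch below uses percent identity over pre-aligned, equal-length sequences as a simple stand-in; this is an assumption, since the section does not define how similarity is scored (a substitution-matrix-based measure would generally give different values). The function names and toy sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of alignment columns with identical, non-gap residues.

    Both inputs must already be aligned to the same length; this
    sketch performs no alignment itself.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("sequences must not be empty")
    matches = sum(
        a == b and a != gap for a, b in zip(aligned_a, aligned_b)
    )
    return 100.0 * matches / len(aligned_a)


def meets_threshold(aligned_a, aligned_b, threshold_percent):
    """True when percent identity is at least `threshold_percent`."""
    return percent_identity(aligned_a, aligned_b) >= threshold_percent


# Toy sequences (hypothetical): one substitution in six residues.
toy_identity = percent_identity("MKVLTE", "MKFLTE")
```

In the toy case, five of six columns match, so the pair passes an 80% threshold but not a 90% one.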
- An enzyme provided herein may comprise a sequence having homology or similarity with SEQ ID NO: 2.
- an enzyme provided herein comprises at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 97%, at least about 98%, at least about 99%, at least about 99.5%, or more similarity with SEQ ID NO: 2.
- at least 10 contiguous amino acids of an enzyme provided herein comprise at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 97%, at least about 98%.
- at least 50 contiguous amino acids of an enzyme provided herein comprise at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 97%, at least about 98%, at least about 99%, at least about 99.5%, or more similarity with SEQ ID NO: 2.
- at least 100 contiguous amino acids of an enzyme provided herein comprise at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 97%.
- 20-100 contiguous amino acids of an enzyme provided herein comprise at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 97%, at least about 98%, at least about 99%, at least about 99.5%, or more similarity with SEQ ID NO: 2.
- An enzyme provided herein may comprise a sequence having homology or similarity with SEQ ID NO: 3.
- an enzyme provided herein comprises at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 97%, at least about 98%, at least about 99%, at least about 99.5%, or more similarity with SEQ ID NO: 3.
- at least 10 contiguous amino acids of an enzyme provided herein comprise at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 97%, at least about 98%, at least about 99%.
- At least 50 contiguous amino acids of an enzyme provided herein comprise at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 97%, at least about 98%, at least about 99%, at least about 99.5%, or more similarity with SEQ ID NO: 3.
- At least 100 contiguous amino acids of an enzyme provided herein comprise at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 97%, at least about 98%, at least about 99%, at least about 99.5%, or more similarity with SEQ ID NO: 3.
- 20-100 contiguous amino acids of an enzyme provided herein comprise at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 97%, at least about 98%, at least about 99%, at least about 99.5%, or more similarity with SEQ ID NO: 3.
- An enzyme provided herein may comprise a sequence having homology or similarity with SEQ ID NO: 4.
- an enzyme provided herein comprises at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 97%, at least about 98%, at least about 99%, at least about 99.5%, or more similarity with SEQ ID NO: 4.
- at least 10 contiguous amino acids of an enzyme provided herein comprise at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 97%.
- at least 50 contiguous amino acids of an enzyme provided herein comprise at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 97%, at least about 98%, at least about 99%, at least about 99.5%, or more similarity with SEQ ID NO: 4.
- at least 100 contiguous amino acids of an enzyme provided herein comprise at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 97%.
- 20-100 contiguous amino acids of an enzyme provided herein comprise at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 97%, at least about 98%, at least about 99%, at least about 99.5%, or more similarity with SEQ ID NO: 4.
- An enzyme provided herein may comprise a sequence having homology or similarity with SEQ ID NO: 5.
- an enzyme provided herein comprises at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 97%, at least about 98%, at least about 99%, at least about 99.5%, or more similarity with SEQ ID NO: 5.
- at least 10 contiguous amino acids of an enzyme provided herein comprise at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 97%, at least about 98%, at least about 99%.
- At least 50 contiguous amino acids of an enzyme provided herein comprise at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 97%, at least about 98%, at least about 99%, at least about 99.5%, or more similarity with SEQ ID NO: 5.
- at least 100 contiguous amino acids of an enzyme provided herein comprise at least about 50%, at least about 60%, at least about 70%.
- 20-100 contiguous amino acids of an enzyme provided herein comprise at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 97%, at least about 98%, at least about 99%, at least about 99.5%, or more similarity with SEQ ID NO: 5.
- An enzyme provided herein may comprise a sequence having homology or similarity with SEQ ID NO: 6.
- an enzyme provided herein comprises at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 97%, at least about 98%, at least about 99%, at least about 99.5%, or more similarity with SEQ ID NO: 6.
- at least 10 contiguous amino acids of an enzyme provided herein comprise at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 97%, at least about 98%.
- at least 50 contiguous amino acids of an enzyme provided herein comprise at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 97%, at least about 98%, at least about 99%, at least about 99.5%, or more similarity with SEQ ID NO: 6.
- at least 100 contiguous amino acids of an enzyme provided herein comprise at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 97%.
- 20-100 contiguous amino acids of an enzyme provided herein comprise at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 97%, at least about 98%, at least about 99%, at least about 99.5%, or more similarity with SEQ ID NO: 6.
- An enzyme provided herein may comprise a sequence having homology or similarity with SEQ ID NO: 7. In some instances, an enzyme provided herein comprises at least about 50%, at least about 60%, at least about 70%.
- at least 10 contiguous amino acids of an enzyme provided herein comprise at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 97%, at least about 98%, at least about 99%, at least about 99.5%, or more similarity with SEQ ID NO: 7.
- at least 50 contiguous amino acids of an enzyme provided herein comprise at least about 50%.
- at least 100 contiguous amino acids of an enzyme provided herein comprise at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 97%, at least about 98%, at least about 99%, at least about 99.5%, or more similarity with SEQ ID NO: 7.
- 20-100 contiguous amino acids of an enzyme provided herein comprise at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 97%, at least about 98%, at least about 99%, at least about 99.5%, or more similarity with SEQ ID NO: 7.
- An enzyme provided herein may comprise a sequence having homology or similarity with SEQ ID NO: 8.
- an enzyme provided herein comprises at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 97%, at least about 98%, at least about 99%, at least about 99.5%, or more similarity with SEQ ID NO: 8.
- at least 10 contiguous amino acids of an enzyme provided herein comprise at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 97%, at least about 98%.
- At least 50 contiguous amino acids of an enzyme provided herein comprise at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 97%, at least about 98%, at least about 99%, at least about 99.5%, or more similarity with SEQ ID NO: 8.
- at least 100 contiguous amino acids of an enzyme provided herein comprise at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 97%.
- 20-100 contiguous amino acids of an enzyme provided herein comprise at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 97%, at least about 98%, at least about 99%, at least about 99.5%, or more similarity with SEQ ID NO: 8.
- An enzyme provided herein may comprise a sequence having homology or similarity with SEQ ID NO: 9.
- an enzyme provided herein comprises at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 97%, at least about 98%, at least about 99%, at least about 99.5%, or more similarity with SEQ ID NO: 9.
- at least 10 contiguous amino acids of an enzyme provided herein comprise at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 97%.
- at least 50 contiguous amino acids of an enzyme provided herein comprise at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 97%, at least about 98%, at least about 99%, at least about 99.5%, or more similarity with SEQ ID NO: 9.
- at least 100 contiguous amino acids of an enzyme provided herein comprise at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 97%.
- 20-100 contiguous amino acids of an enzyme provided herein comprise at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 97%, at least about 98%, at least about 99%, at least about 99.5%, or more similarity with SEQ ID NO: 9.
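The "contiguous amino acids" thresholds above can be illustrated with a small sliding-window comparison. The following is a minimal sketch only: it scores ungapped percent identity rather than similarity (it does not credit conservative substitutions, which a BLAST-style tool would), and the function names and toy sequences are hypothetical, not SEQ ID NOs from this document.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of positions with identical residues in two equal-length strings."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


def best_window_identity(query: str, reference: str, window: int) -> float:
    """Highest ungapped identity of any `window`-residue stretch of `query`
    against any equally long stretch of `reference`."""
    if window <= 0 or window > min(len(query), len(reference)):
        raise ValueError("window must fit inside both sequences")
    best = 0.0
    for i in range(len(query) - window + 1):
        q = query[i:i + window]
        for j in range(len(reference) - window + 1):
            best = max(best, percent_identity(q, reference[j:j + window]))
    return best


# Hypothetical toy sequences for illustration only.
reference = "MKTAYIAKQRQISFVKSHFSRQ"
variant = "MKTAYIAKQRHISFVKSHFSRQ"  # one substitution at position 11

# At least 10 contiguous residues of the variant match the reference exactly.
print(best_window_identity(variant, reference, window=10))
```

The brute-force double loop is quadratic in sequence length, which is acceptable for a single enzyme-sized sequence; a real comparison would normally use an alignment tool with a substitution matrix.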
- an amino acid sequence of an enzyme or enzyme fragment may be used as input.
- An amino acid sequence of any enzyme may be used for input in the methods and systems described herein.
- a database comprising known mutations from an organism may be queried, and a library of sequences comprising combinations of these mutations may be generated.
- specific mutations or combinations of mutations may be excluded from the library (e.g., known immunogenic sites, structure sites, etc.).
- specific sites in the input sequence may be systematically replaced with histidine, aspartic acid, glutamic acid, or combinations thereof.
- the maximum or minimum number of mutations allowed for each region of an enzyme may be specified.
- sequences generated by the optimization may comprise at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, or more than 16 mutations from the input sequence.
- sequences generated by the optimization comprise no more than 1, no more than 2, no more than 3, no more than 4, no more than 5, no more than 6, no more than 7, no more than 8, no more than 9, no more than 10, no more than 11, no more than 12, no more than 13, no more than 14, no more than 15, no more than 16, or no more than 18 mutations from the input sequence.
- sequences generated by the optimization comprise about 1, about 2, about 3, about 4, about 5, about 6, about 7, about 8, about 9, about 10, about 11, about 12, about 13, about 14, about 15, about 16, or about 18 mutations relative to the input sequence.
- in silico enzyme libraries may be synthesized, assembled, and/or enriched for desired sequences.
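The library-design constraints described above (systematic replacement with histidine, aspartic acid, or glutamic acid; excluded sites; a minimum and maximum number of mutations) can be sketched as a simple enumeration. This is an illustrative sketch under stated assumptions, not the optimization method of this document: the function name `scan_library`, its parameters, and the toy input sequence are all hypothetical.

```python
from itertools import combinations, product

# Replacement residues named in the text: histidine, aspartic acid, glutamic acid.
SCAN_RESIDUES = ("H", "D", "E")


def scan_library(seq, sites, excluded=(), min_mut=1, max_mut=2,
                 residues=SCAN_RESIDUES):
    """Yield (variant_sequence, mutations) for substitutions at 0-based `sites`.

    Sites listed in `excluded` are never mutated, and each variant carries
    between `min_mut` and `max_mut` substitutions (inclusive).
    """
    blocked = set(excluded)
    allowed = [s for s in sites if s not in blocked]
    for k in range(min_mut, max_mut + 1):
        for chosen in combinations(allowed, k):
            # Candidate residues per site, skipping the residue already present.
            options = [[r for r in residues if r != seq[s]] for s in chosen]
            for picks in product(*options):
                variant = list(seq)
                for s, r in zip(chosen, picks):
                    variant[s] = r
                # Mutations are reported 1-based, e.g. K2H.
                muts = tuple(f"{seq[s]}{s + 1}{r}" for s, r in zip(chosen, picks))
                yield "".join(variant), muts


# Hypothetical input: scan three sites, excluding one of them.
library = list(scan_library("MKTAYIAK", sites=[1, 3, 5], excluded=[5]))
print(len(library), library[0])
```

With two allowed sites and three replacement residues, the example yields 6 single mutants and 9 double mutants. Library size grows combinatorially with the number of sites, which is one reason a per-region mutation cap is useful.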
- Germline sequences corresponding to an input sequence may also be modified to generate sequences in a library.
- sequences generated by the optimization methods described herein comprise at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, or more than 16 mutations from the germline sequence.
- sequences generated by the optimization comprise no more than 1, no more than 2, no more than 3, no more than 4, no more than 5, no more than 6, no more than 7, no more than 8, no more than 9, no more than 10, no more than 11, no more than 12, no more than 13, no more than 14, no more than 15, no more than 16, or no more than 18 mutations from the germline sequence. In some instances, sequences generated by the optimization comprise about 1, about 2, about 3, about 4, about 5, about 6, about 7, about 8, about 9, about 10, about 11, about 12, about 13, about 14, about 15, about 16, or about 18 mutations relative to the germline sequence.
- Data from preprocessing operations, as described herein, may be fed into one or more machine learning (ML) algorithms for identifying a library comprising one or more candidates with high affinity to a target and/or functional activity.
- the one or more candidates comprise one or more sequences encoding for an enzyme.
- the library may be a synthetic library.
- the ML algorithms may be integrated into a computational pipeline for intelligent decision making and/or experimental validation.
- the one or more ML algorithms may be supervised, semi-supervised, or unsupervised for training to identify anomalies.
- the one or more ML algorithms may perform classification or clustering to identify anomalies or attacks.
- the one or more ML algorithms may comprise classical ML algorithms for performing clustering to identify outliers.
- Classical ML algorithms may comprise algorithms that learn from existing observations (i.e., known features) to predict outputs.
- the classical ML algorithms for performing clustering may be K-means clustering, mean-shift clustering, density-based spatial clustering of applications with noise (DBSCAN), expectation-maximization (EM) clustering (e.g., using Gaussian mixture models (GMM)), agglomerative hierarchical clustering, or a combination thereof.
- the one or more ML algorithms may comprise classical ML algorithms for classification.
- the classical ML algorithms may comprise logistic regression, naive Bayes, K-nearest neighbors, random forests or decision trees, gradient boosting, support vector machines (SVMs), or a combination thereof.
- the one or more ML algorithms may employ deep learning.
- a deep learning algorithm may comprise an algorithm that learns by extracting new features to predict outputs.
- the deep learning algorithm may comprise layers, which may comprise a neural network.
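As one concrete instance of the classical classifiers listed above, a K-nearest-neighbors vote can be written in a few lines. This is a minimal sketch, not the pipeline of this document: the two features (mutation count and a score between 0 and 1), the labels, and every data point are made up for illustration.

```python
from collections import Counter
from math import dist


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (feature_tuple, label) pairs; distance is Euclidean.
    """
    neighbors = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical training set: (mutation count, arbitrary score) -> label.
train = [
    ((1, 0.40), "active"),
    ((2, 0.42), "active"),
    ((3, 0.45), "active"),
    ((9, 0.70), "inactive"),
    ((10, 0.65), "inactive"),
    ((12, 0.72), "inactive"),
]

print(knn_predict(train, (2, 0.41)))
print(knn_predict(train, (11, 0.70)))
```

In practice the features would be scaled before computing distances, since here the mutation count dominates the score; the unscaled version is kept only to keep the sketch short.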
- libraries comprising nucleic acids encoding for enzymes, wherein the libraries have improved specificity, stability, expression, folding, or downstream activity.
- libraries described herein may be used for screening and analysis.
- screening and analysis comprises in vitro, in vivo, or ex vivo assays.
- Cells for screening include primary cells taken from living subjects or cell lines. Cells may be from prokaryotes (e.g., bacteria and fungi) or eukaryotes (e.g., animals and plants).
- Exemplary animal cells include, without limitation, those from a mouse, rabbit, primate, and insect.
- cells for screening include a cell line including, but not limited to, Chinese Hamster Ovary (CHO) cell line, human embryonic kidney (HEK) cell line, or baby hamster kidney (BHK) cell line.
- nucleic acid libraries described herein may also be delivered to a multicellular organism.
- Exemplary multicellular organisms include, without limitation, a plant, a mouse, a rat, a rabbit, a primate (e.g., a monkey or an ape), a fish, a worm, a bird, a chicken, a camelid, a cat, a dog, a horse, a cow, a sheep, a goat, a frog, or an insect.
- Nucleic acid libraries described herein may be screened for various pharmacological or pharmacokinetic properties.
- the libraries are screened using in vitro assays, in vivo assays, or ex vivo assays.
- in vitro pharmacological or pharmacokinetic properties that are screened include, but are not limited to, binding affinity, binding specificity, and binding avidity.
- Exemplary in vivo pharmacological or pharmacokinetic properties of libraries described herein that are screened include, but are not limited to, therapeutic efficacy, activity, preclinical toxicity properties, clinical efficacy properties, clinical toxicity properties, immunogenicity, potency, and clinical safety properties.
- nucleic acid libraries wherein the nucleic acid libraries may be expressed in a vector.
- Expression vectors for inserting nucleic acid libraries disclosed herein may comprise eukaryotic or prokaryotic expression vectors.
- Exemplary expression vectors include, without limitation, mammalian expression vectors: pSF-CMV-NEO-NH2-PPT-3XFLAG, pSF-CMV-NEO-COOH-3XFLAG, pSF-CMV-PURO-NH2-GST-TEV, pSF-OXB20-COOH-TEV-FLAG(R)-6His, pCEP4, pDEST27, pSF-CMV-Ub-KrYFP, pSF-CMV-FMDV-daGFP, pEF1a-mCherry-N1 Vector, pEF1a-tdTomato Vector, pSF-CMV-FMDV-Hygro, pSF-CMV-PGK-Puro, pMC
- nucleic acid libraries that are expressed in a vector to generate a construct comprising an enzyme.
- a size of the construct varies.
- the construct comprises at least or about 500, at least or about 600, at least or about 700, at least or about 800, at least or about 900, at least or about 1000.
- the construct comprises a range of about 300 to 1,000, 300 to 2,000, 300 to 3,000, 300 to 4,000, 300 to 5,000, 300 to 6,000, 300 to 7,000, 300 to 8,000, 300 to 9,000, 300 to 10,000, 1,000 to 2,000, 1,000 to 3,000, 1,000 to 4,000, 1,000 to 5,000, 1,000 to 6,000, 1,000 to 7,000, 1,000 to 8,000, 1,000 to 9,000, 1,000 to 10,000, 2,000 to 3,000, 2,000 to 4,000, 2,000 to 5,000, 2,000 to 6,000, 2,000 to 7,000, 2,000 to 8,000, 2,000 to 9,000, 2,000 to 10,000, 3,000 to 4,000, 3,000 to 5,000, 3,000 to 6,000.
- libraries comprising nucleic acids encoding for enzymes, wherein the nucleic acid libraries are expressed in a cell.
- the libraries are synthesized to express a reporter gene.
- Exemplary reporter genes include, but are not limited to, acetohydroxy acid synthase (AHAS), alkaline phosphatase (AP), beta galactosidase (LacZ), beta glucuronidase (GUS), chloramphenicol acetyltransferase (CAT), green fluorescent protein (GFP), red fluorescent protein (RFP), yellow fluorescent protein (YFP), cyan fluorescent protein (CFP), cerulean fluorescent protein, citrine fluorescent protein, orange fluorescent protein, cherry fluorescent protein, turquoise fluorescent protein, blue fluorescent protein, horseradish peroxidase (HRP), luciferase (Luc), nopaline synthase (NOS), octopine synthase (OCS), luciferase, and derivatives thereof.
- Methods to determine modulation of a reporter gene include, but are not limited to, fluorometric methods (e.g. fluorescence spectroscopy, Fluorescence Activated Cell Sorting (FACS), fluorescence microscopy), and antibiotic resistance determination.
- sequence identity means that two polynucleotide sequences are identical (i.e., on a nucleotide-by-nucleotide basis) over the window of comparison.
- percentage of sequence identity is calculated by comparing two optimally aligned sequences over the window of comparison, determining the number of positions at which the identical nucleic acid base (e.g., A, T, C, G, U, or I) occurs in both sequences to yield the number of matched positions, dividing the number of matched positions by the total number of positions in the window of comparison (i.e., the window size), and multiplying the result by 100 to yield the percentage of sequence identity.
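The calculation described above can be written compactly as a formula. The symbols are introduced here for illustration and do not appear in the original text: N_matched is the number of positions with the identical base in both aligned sequences, and N_window is the total number of positions in the window of comparison. The numbers in the worked line are a hypothetical example.

```latex
\[
\text{percent identity} = \frac{N_{\text{matched}}}{N_{\text{window}}} \times 100
\]
% Hypothetical example: 18 matched positions in a 20-position window.
\[
\frac{18}{20} \times 100 = 90\%
\]
```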
- the term “homology” or “similarity” between two proteins is determined by comparing the amino acid sequence and its conserved amino acid substitutes of one protein sequence to the second protein sequence. Similarity may be determined by procedures which are well-known in the art, for example, a BLAST program (Basic Local Alignment Search Tool at the National Center for Biotechnology Information).
- libraries comprising nucleic acids encoding for enzymes (e.g., polymerases). Enzymes described herein allow for improved stability for a range of active site encoding sequences. In some instances, the active site encoding sequences are determined by interactions between the substrate and the catalytically active site of an enzyme. Sequences of active sites based on surface interactions between a ligand/substrate and an enzyme described herein are analyzed using various methods. For example, multispecies computational analysis is performed. In some instances, a structure analysis is performed. In some instances, a sequence analysis is performed. Sequence analysis can be performed using a database known in the art.
- Non-limiting examples of databases include NCBI BLAST (blast.ncbi.nlm.nih.gov/Blast.cgi), UCSC Genome Browser (genome.ucsc.edu/), UniProt (www.uniprot.org/), and IUPHAR/BPS Guide to PHARMACOLOGY (guidetopharmacology.org/).
- Described herein are active sites designed based on sequence analysis among various organisms. For example, sequence analysis is performed to identify homologous sequences in different organisms. Exemplary organisms include, but are not limited to, mouse, rat, equine, sheep, cow, primate (e.g., chimpanzee, baboon, gorilla, orangutan, monkey), dog, cat, pig, donkey, rabbit, camelid, fish, fly, or human. In some instances, homologous sequences are identified in the same organism, across individuals. Following identification of active sites, libraries comprising nucleic acids encoding for the active sites may be generated.
- libraries of active sites comprise sequences of active sites designed based on conformational ligand/substrate interactions.
- Libraries of active sites may be translated to generate protein libraries.
- libraries of active sites are translated to generate peptide libraries, immunoglobulin libraries, derivatives thereof, or combinations thereof.
- libraries of active sites are translated to generate protein libraries that are further modified to generate peptidomimetic libraries.
- libraries of active sites are translated to generate protein libraries that are used to generate small molecules.
- Methods described herein provide for synthesis of libraries of active sites comprising nucleic acids each encoding for a predetermined variant of at least one predetermined reference nucleic acid sequence.
- the predetermined reference sequence is a nucleic acid sequence encoding for a protein
- the variant library comprises sequences encoding for variation of at least a single codon such that a plurality of different variants of a single residue in the subsequent protein encoded by the synthesized nucleic acid are generated by standard translation processes.
- the libraries of active sites comprise varied nucleic acids collectively encoding variations at multiple positions.
- the variant library comprises sequences encoding for variation of at least a single codon in an active site.
- the variant library comprises sequences encoding for variation of multiple codons in an active site.
- Exemplary numbers of codons for variation include, but are not limited to, at least or about 1, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 125, 150, 175, 225, 250, 275, 300, or more than 300 codons.
- the library comprises sequences encoding for variation of length of at least or about 1, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 125, 150, 175, 225, 250, 275, 300, or more than 300 codons less as compared to a predetermined reference sequence.
- the library comprises sequences encoding for variation of length of at least or about 1, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 125, 150, 175, 200, 225, 250, 275, 300, or more than 300 codons more as compared to a predetermined reference sequence.
- enzymes may be designed and synthesized to comprise the active sites. Enzymes comprising active sites may be designed based on binding, specificity, stability, expression, folding, or downstream activity.
- Methods described herein provide for synthesis of a library of nucleic acids each encoding for a predetermined variant of at least one predetermined reference nucleic acid sequence.
- the predetermined reference sequence is a nucleic acid sequence encoding for a protein
- the variant library comprises sequences encoding for variation of at least a single codon such that a plurality of different variants of a single residue in the subsequent protein encoded by the synthesized nucleic acid are generated by standard translation processes.
- the library comprises varied nucleic acids collectively encoding variations at multiple positions.
- the variant library comprises sequences encoding for variation of at least a single codon in an active site. For example, at least one single codon of the enzyme is varied.
- Exemplary numbers of codons for variation include, but are not limited to, at least or about 1, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 125, 150, 175, 225, 250, 275, 300, or more than 300 codons.
- Methods described herein provide for synthesis of a library of nucleic acids each encoding for a predetermined variant of at least one predetermined reference nucleic acid sequence, wherein the library comprises sequences encoding for variation of length of a domain in the enzyme.
- the library comprises sequences encoding for variation of length of at least or about 1, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 125, 150, 175, 225, 250, 275, 300, or more than 300 codons less as compared to a predetermined reference sequence.
- the library comprises sequences encoding for variation of length of at least or about 1, 5, 10, 15, 20.
- tags include, but are not limited to, a radioactive label, a fluorescent label, an enzyme, a chemiluminescent tag, a colorimetric tag, an affinity tag, or other labels or tags that are known in the art.
- the tag is histidine, polyhistidine, myc, hemagglutinin (HA), or FLAG.
- libraries are assayed by sequencing using various methods including, but not limited to, single-molecule real-time (SMRT) sequencing, Polony sequencing, sequencing by ligation, reversible terminator sequencing, proton detection sequencing, ion semiconductor sequencing, nanopore sequencing, electronic sequencing, pyrosequencing, Maxam-Gilbert sequencing, chain termination (e.g., Sanger) sequencing, +S sequencing, or sequencing by synthesis.
- libraries are assayed for A-tailing activity or stability.
- Variant nucleic acid libraries described herein may comprise a plurality of nucleic acids, wherein each nucleic acid encodes for a variant codon sequence compared to a reference nucleic acid sequence.
- each nucleic acid of a first nucleic acid population contains a variant at a single variant site.
- the first nucleic acid population contains a plurality of variants at a single variant site such that the first nucleic acid population contains more than one variant at the same variant site.
- the first nucleic acid population may comprise nucleic acids collectively encoding multiple codon variants at the same variant site.
- the first nucleic acid population may comprise nucleic acids collectively encoding up to 19 or more codons at the same position.
- the first nucleic acid population may comprise nucleic acids collectively encoding up to 60 variant triplets at the same position, or the first nucleic acid population may comprise nucleic acids collectively encoding up to 61 different triplets of codons at the same position.
- Each variant may encode for a codon that results in a different amino acid during translation.
- Table 2 provides a listing of each codon possible (and the representative amino acid) for a variant site.
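The counts quoted above (up to 19 alternative amino acids, up to 60 variant triplets, and 61 different triplets at one position) follow from the standard genetic code, and can be checked with a short enumeration. This is an illustrative sketch that assumes the standard genetic code; it does not reproduce Table 2, and the function names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Sense codons are all triplets that encode an amino acid (no stops).
SENSE_CODONS = [c for c, aa in CODON_TABLE.items() if aa != "*"]


def variant_codons(reference_codon):
    """All sense codons other than the reference codon at a variant site."""
    return [c for c in SENSE_CODONS if c != reference_codon]


def variant_amino_acids(reference_codon):
    """All amino acids other than the one encoded by the reference codon."""
    ref_aa = CODON_TABLE[reference_codon]
    return {CODON_TABLE[c] for c in SENSE_CODONS} - {ref_aa}


print(len(SENSE_CODONS))                # 61 sense codons
print(len(variant_codons("ATG")))       # 60 variant triplets
print(len(variant_amino_acids("ATG")))  # 19 alternative amino acids
```

A library that uses one codon per amino acid needs only 19 variants per site, while one that covers every sense triplet needs 60; the choice trades library size against control over codon usage.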
- a nucleic acid population may comprise varied nucleic acids collectively encoding up to 20 codon variations at multiple positions.
- each nucleic acid in the population comprises variation for codons at more than one position in the same nucleic acid.
- each nucleic acid in the population comprises variation for codons at 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 or more codons in a single nucleic acid.
- each variant long nucleic acid comprises variation for codons at 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30 or more codons in a single long nucleic acid.
- the variant nucleic acid population comprises variation for codons at 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30 or more codons in a single nucleic acid. In some instances, the variant nucleic acid population comprises variation for codons in at least about 10, 20, 30, 40, 50, 60, 70, 80, 90, 100 or more codons in a single long nucleic acid.
- a platform approach utilizing miniaturization, parallelization, and vertical integration of the end-to-end process from polynucleotide synthesis to gene assembly within nanowells on silicon to create a revolutionary synthesis platform.
- Devices described herein provide, with the same footprint as a 96-well plate, a silicon synthesis platform capable of increasing throughput by a factor of up to 1,000 or more compared to traditional synthesis methods, with production of up to approximately 1,000,000 or more polynucleotides, or 10,000 or more genes, in a single highly parallelized run.
- Genomic information encoded in the DNA is transcribed into a message that is then translated into the protein that is the active product within a given biological pathway.
- Saturation mutagenesis, in which a researcher attempts to generate all possible mutations at a specific site within the receptor, represents one approach to this development challenge. Though costly and time and labor-intensive, it enables each variant to be introduced into each position. In contrast, combinatorial mutagenesis, where a few selected positions or a short stretch of DNA may be modified extensively, generates an incomplete repertoire of variants with biased representation.
- a library with the desired variants available at the intended frequency in the right position available for testing — in other words, a precision library, enables reduced costs as well as turnaround time for screening.
- an enzyme itself can be optimized using methods described herein.
- a variant polynucleotide library encoding for a portion of the enzyme is designed and synthesized.
- a variant nucleic acid library for the enzyme can then be generated by processes described herein (e.g., PCR mutagenesis followed by insertion into a vector).
- the enzyme is then expressed in a production cell line and screened for enhanced activity.
- Example screens include examining modulation in binding affinity to a substrate, stability (e.g., heat, salt), or function (e.g., substrate scope, speed).
- Nucleic acid libraries synthesized by methods described herein may be expressed in various cells associated with a disease state.
- Cells associated with a disease state include cell lines, tissue samples, primary cells from a subject, cultured cells expanded from a subject, or cells in a model system.
- Exemplary model systems include, without limitation, plant and animal models of a disease state.
- a variant nucleic acid library described herein is expressed in a cell associated with a disease state, or one in which a disease state can be induced. In some instances, an agent is used to induce a disease state in cells.
- Exemplary tools for disease state induction include, without limitation, a Cre/Lox recombination system, LPS inflammation induction, and streptozotocin to induce hypoglycemia.
- the cells associated with a disease state may be cells from a model system or cultured cells, as well as cells from a subject having a particular disease condition.
- Exemplary disease conditions include a bacterial, fungal, viral, autoimmune, or proliferative disorder (e.g., cancer).
- the variant nucleic acid library is expressed in the model system, cell line, or primary cells derived from a subject, and screened for changes in at least one cellular activity.
- Exemplary cellular activities include, without limitation, proliferation, cycle progression, cell death, adhesion, migration, reproduction, cell signaling, energy production, oxygen utilization, metabolic activity, aging, response to free radical damage, or any combination thereof.
- methods described herein provide for generation of a library of nucleic acids comprising variant nucleic acids differing at a plurality of codon sites.
- a nucleic acid may have 1 site, 2 sites, 3 sites, 4 sites, 5 sites, 6 sites, 7 sites, 8 sites, 9 sites, 10 sites, 11 sites, 12 sites, 13 sites, 14 sites, 15 sites, 16 sites, 17 sites, 18 sites, 19 sites, 20 sites, 30 sites, 40 sites, 50 sites, or more of variant codon sites.
- the one or more sites of variant codon sites may be adjacent.
- the one or more sites of variant codon sites may not be adjacent and separated by 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more codons.
- a nucleic acid may comprise multiple sites of variant codon sites, wherein all the variant codon sites are adjacent to one another, forming a stretch of variant codon sites. In some instances, a nucleic acid may comprise multiple sites of variant codon sites, wherein none of the variant codon sites are adjacent to one another. In some instances, a nucleic acid may comprise multiple sites of variant codon sites, wherein some of the variant codon sites are adjacent to one another, forming a stretch of variant codon sites, and some of the variant codon sites are not adjacent to one another.
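The three arrangements of variant codon sites described above (all adjacent, none adjacent, or a mixture) can be distinguished by grouping codon positions into runs. This is a minimal sketch; the function names and the example positions are hypothetical.

```python
def group_adjacent_sites(sites):
    """Group codon positions into runs of consecutive (adjacent) positions."""
    runs = []
    for pos in sorted(set(sites)):
        if runs and pos == runs[-1][-1] + 1:
            runs[-1].append(pos)  # extends the current stretch
        else:
            runs.append([pos])    # starts a new stretch
    return runs


def describe_layout(sites):
    """Label a set of variant codon sites by how they are arranged."""
    runs = group_adjacent_sites(sites)
    if not runs:
        raise ValueError("at least one variant site is required")
    if len(runs) == 1 and len(runs[0]) > 1:
        return "all adjacent"
    if all(len(run) == 1 for run in runs):
        return "none adjacent"
    return "mixed"


print(describe_layout([10, 11, 12]))  # one stretch of three codons
print(describe_layout([3, 8, 15]))    # isolated sites
print(describe_layout([3, 4, 9]))     # one stretch plus one isolated site
```

The gap between consecutive runs gives the number of unvaried codons separating the sites, which corresponds to the "separated by 1, 2, 3, ... or more codons" language above.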
- Enzymes provided herein may be used for a variety of downstream applications.
- enzymes comprise polymerases.
- a sample is obtained from one or more sources, and the population of sample polynucleotides is isolated. Samples are obtained (by way of nonlimiting example) from biological sources such as saliva, blood, tissue, skin, or completely synthetic sources.
- samples comprise circulating tumor DNA (ctDNA), cell-free DNA (cfDNA), or other nucleic acid sample.
- the plurality of polynucleotides obtained from the sample are fragmented, end-repaired, and adenylated to form a double stranded sample nucleic acid fragment.
- end repair is accomplished by treatment with one or more enzymes, such as a T4 DNA polymerase or variant thereof (including Taq variants described herein), klenow enzyme, and T4 polynucleotide kinase in an appropriate buffer.
- a nucleotide overhang to facilitate ligation to adapters is added, in some instances with 3’ to 5’ exo minus klenow fragment and dATP.
- a nucleotide overhang to facilitate ligation to adapters is added, in some instances with a variant polymerase described herein and dATP.
- Adapters may be ligated to both ends of the sample polynucleotide fragments with a ligase, such as T4 ligase described herein, to produce a library of adapter-tagged polynucleotide strands, and the adapter-tagged polynucleotide library is amplified with primers, such as universal primers.
- the adapters are Y-shaped adapters comprising one or more primer binding sites, one or more grafting regions, and one or more index (or barcode) regions.
- the one or more index region is present on each strand of the adapter.
- grafting regions are complementary to a flow cell surface, and facilitate next generation sequencing of sample libraries.
- Y-shaped adapters comprise partially complementary sequences.
- Y-shaped adapters comprise a single thymidine overhang which hybridizes to the overhanging adenine of the double stranded adapter-tagged polynucleotide strands.
- Y-shaped adapters may comprise modified nucleic acids that are resistant to cleavage. For example, a phosphorothioate backbone is used to attach an overhanging thymidine to the 3’ end of the adapters. If universal primers are used, amplification of the library is performed to add barcoded primers to the adapters.
- a plurality of nucleic acids may be obtained from a sample, and fragmented, optionally end-repaired, and adenylated.
- Adapters are ligated to both ends of the polynucleotide fragments to produce a library of adapter-tagged polynucleotide strands, and the adapter-tagged polynucleotide library is amplified.
- the adapter-tagged polynucleotide library is then denatured at high temperature, preferably 96 °C, in the presence of adapter blockers.
- a polynucleotide targeting library (probe library) is denatured in a hybridization solution at high temperature, preferably about 90 °C to 99 °C, and combined with the denatured, tagged polynucleotide library in hybridization solution for about 10 hours to 24 hours at about 45 °C to 80 °C.
- Binding buffer is then added to the hybridized tagged polynucleotide probes, and a solid support comprising a capture moiety is used to selectively bind the hybridized adapter-tagged polynucleotide-probes.
- the solid support is washed one or more times with buffer, preferably between about 2 and 5 times, to remove unbound polynucleotides before an elution buffer is added to release the enriched, adapter-tagged polynucleotide fragments from the solid support.
- the enriched library of adapter-tagged polynucleotide fragments is amplified and then the library is sequenced.
- Alternative variables such as incubation times, temperatures, reaction volumes/concentrations, number of washes, or other variables consistent with the specification are also employed in the method.
- the detection or quantification analysis of the oligonucleotides can be accomplished by sequencing.
- the subunits or entire synthesized oligonucleotides can be detected via full sequencing of all oligonucleotides by any suitable methods known in the art, e.g., Illumina sequencing by synthesis, PacBio SMRT sequencing (waveguide), Oxford Nanopore (nanopore sequencing) or BGI/MGI nanoball sequencing, including the sequencing methods described herein.
- Sequencing can be accomplished through classic Sanger sequencing methods which are well known in the art. Sequencing can also be accomplished using high-throughput systems some of which allow detection of a sequenced nucleotide immediately after or upon its incorporation into a growing strand, i.e., detection of sequence in real time or substantially real time. In some cases, high throughput sequencing generates at least 1,000, at least 5,000, at least 10,000, at least 20,000, at least 30,000, at least 40,000, at least 50,000, at least 100,000 or at least 500,000 sequence reads per hour, with each read being at least 50, at least 60, at least 70, at least 80, at least 90, at least 100, at least 120 or at least 150 bases per read.
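The read-rate and read-length floors listed above combine by simple multiplication into a base-throughput floor. The sketch below illustrates that arithmetic only; the function names and the example genome size and coverage are assumptions chosen for illustration, not values from the specification.

```python
def bases_per_hour(reads_per_hour, bases_per_read):
    """Sequenced bases per hour for a given read rate and read length."""
    return reads_per_hour * bases_per_read


def hours_for_coverage(genome_size, coverage, reads_per_hour, bases_per_read):
    """Hours needed to reach a target mean coverage of a genome of given size."""
    total_bases_needed = genome_size * coverage
    return total_bases_needed / bases_per_hour(reads_per_hour, bases_per_read)


# Highest floors named in the text: 500,000 reads per hour at 150 bases per read.
print(bases_per_hour(500_000, 150))
# Illustrative target (assumed): 10x coverage of a 7.5 Mb genome.
print(hours_for_coverage(7_500_000, 10, 500_000, 150))
```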
- high-throughput sequencing involves the use of technology available by Illumina's Genome Analyzer IIX, MiSeq personal sequencer, or HiSeq systems such as those using HiSeq 2500.
- These machines use reversible terminator-based sequencing by synthesis chemistry. These machines can generate 6000 Gb or more reads in 13-44 hours. Smaller systems may be utilized for runs within 3, 2, 1 days or less time. Short synthesis cycles may be used to minimize the time it takes to obtain sequencing results.
- high-throughput sequencing involves the use of technology available by ABI Solid System. This genetic analysis platform enables massively parallel sequencing of clonally-amplified DNA fragments linked to beads.
- the sequencing methodology is based on sequential ligation with dye-labeled oligonucleotides.
- the next generation sequencing can comprise ion semiconductor sequencing (e.g., using technology from Life Technologies (Ion Torrent)).
- Ion semiconductor sequencing can take advantage of the fact that when a nucleotide is incorporated into a strand of DNA, an ion can be released.
- a high density array of micromachined wells can be formed. Each well can hold a single DNA template. Beneath the well can be an ion sensitive layer, and beneath the ion sensitive layer can be an ion sensor.
- H+ can be released, which can be measured as a change in pH.
- the H+ ion can be converted to voltage and recorded by the semiconductor sensor.
- An array chip can be sequentially flooded with one nucleotide after another. No scanning, light, or cameras can be required.
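The flow-based readout described above can be sketched as follows: each nucleotide flow yields a signal proportional to the number of bases incorporated in that flow (one H+ per incorporation), and the sequence is read back from the per-flow signals. This is an idealized, noise-free illustration; the flow order "TACG" and the function names are assumptions, not details from the specification.

```python
def flowgram(synthesized, flow_order="TACG"):
    """Return (nucleotide, incorporations) per flow for the strand being synthesized."""
    if any(base not in flow_order for base in synthesized):
        raise ValueError("sequence contains a base that is never flowed")
    signals = []
    i = 0
    while i < len(synthesized):
        for base in flow_order:
            run = 0
            # A homopolymer run is incorporated in a single flow, giving a larger signal.
            while i < len(synthesized) and synthesized[i] == base:
                run += 1
                i += 1
            signals.append((base, run))
            if i >= len(synthesized):
                break
    return signals


def decode(signals):
    """Rebuild the synthesized sequence from idealized per-flow signals."""
    return "".join(base * run for base, run in signals)


flows = flowgram("TTAG")
print(flows)
print(decode(flows))
```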
- an IONPROTON™ Sequencer is used to sequence nucleic acid.
- an IONPGM™ Sequencer is used.
- the Ion Torrent Personal Genome Machine (PGM) can do 10 million reads in two hours.
- SMSS Single Molecule Sequencing by Synthesis
- SMSS is unique because it allows for sequencing the entire human genome in up to 24 hours.
- SMSS is powerful because, like the MW technology, it does not require a pre-amplification step prior to hybridization. In fact, SMSS does not require any amplification. SMSS is described in part in US Publication Application Nos. 20060024711; 20060024678; 20060012793; 20060012784; and 20050100932.
- high-throughput sequencing involves the use of technology available by 454 Lifesciences, Inc. (Branford, Conn.) such as the Pico Titer Plate device which includes a fiber optic plate that transmits chemiluminescent signal generated by the sequencing reaction to be recorded by a CCD camera in the instrument.
- This use of fiber optics allows for the detection of a minimum of 20 million base pairs in 4.5 hours.
- high-throughput sequencing is performed using Clonal Single Molecule Array (Solexa, Inc.) or sequencing-by-synthesis (SBS) utilizing reversible terminator chemistry.
- High-throughput sequencing of oligonucleotides can be achieved using any suitable sequencing method known in the art, such as those commercialized by Pacific Biosciences, Complete Genomics, Genia Technologies, Halcyon Molecular, Oxford Nanopore Technologies and the like.
- Other high-throughput sequencing systems include those disclosed in Venter, et al., Science, 2001; Adams, et al., Science, 2000; and Levene, et al., Science, 2003, vol. 299, pages 682-686; as well as U.S. Publication Nos. 2003/0044781 and 2006/0078937.
- Such systems involve sequencing a target oligonucleotide molecule having a plurality of bases by the temporal addition of bases via a polymerization reaction that is measured on a molecule of oligonucleotide, i.e., the activity of a nucleic acid polymerizing enzyme on the template oligonucleotide molecule to be sequenced is followed in real time. Sequence can then be deduced by identifying which base is being incorporated into the growing complementary strand of the target oligonucleotide by the catalytic activity of the nucleic acid polymerizing enzyme at each step in the sequence of base additions.
- a polymerase on the target oligonucleotide molecule complex is provided in a position suitable to move along the target oligonucleotide molecule and extend the oligonucleotide primer at an active site.
- a plurality of labeled types of nucleotide analogs are provided proximate to the active site, with each distinguishable type of nucleotide analog being complementary to a different nucleotide in the target oligonucleotide sequence.
- the growing oligonucleotide strand is extended by using the polymerase to add a nucleotide analog to the oligonucleotide strand at the active site, where the nucleotide analog being added is complementary to the nucleotide of the target oligonucleotide at the active site.
- the nucleotide analog added to the oligonucleotide primer as a result of the polymerizing step is identified.
- the steps of providing labeled nucleotide analogs, polymerizing the growing oligonucleotide strand, and identifying the added nucleotide analog are repeated so that the oligonucleotide strand is further extended and the sequence of the target oligonucleotide is determined.
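Because each added analog is complementary to the template base at the active site, the ordered list of identified analogs determines the target sequence. A minimal sketch of that deduction follows, assuming the incorporations are recorded 5'->3' on the growing strand and the target is reported 5'->3'; the function name is illustrative.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def template_from_incorporations(incorporated):
    """Deduce the target sequence (5'->3') from bases added to the growing strand.

    The growing strand is antiparallel to the target, so the target is the
    reverse complement of the ordered incorporations.
    """
    return "".join(COMPLEMENT[base] for base in reversed(incorporated))


print(template_from_incorporations("ATGC"))
```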
- the next-generation sequencing technique can comprise real-time (SMRT™) technology by Pacific Biosciences.
- each of four DNA bases can be attached to one of four different fluorescent dyes. These dyes can be phospho-linked.
- a single DNA polymerase can be immobilized with a single molecule of template single-stranded DNA at the bottom of a zero-mode waveguide (ZMW).
- ZMW can be a confinement structure which enables observation of incorporation of a single nucleotide by DNA polymerase against the background of fluorescent nucleotides that can rapidly diffuse in and out of the ZMW (in microseconds). It can take several milliseconds to incorporate a nucleotide into a growing strand.
- the fluorescent label can be excited and produce a fluorescent signal, and the fluorescent tag can be cleaved off.
- the ZMW can be illuminated from below. Attenuated light from an excitation beam can penetrate the lower 20-30 nm of each ZMW. A microscope with a detection limit of 20 zeptoliters (20 × 10⁻²¹ liters) can be created. The tiny detection volume can provide 1000-fold improvement in the reduction of background noise. Detection of the corresponding fluorescence of the dye can indicate which base was incorporated. The process can be repeated.
- the next-generation sequencing is nanopore sequencing. See, e.g., Soni, et al., Clin Chem., 2007, vol. 53, pages 1996-2001.
- a nanopore can be a small hole, of the order of about one nanometer in diameter. Immersion of a nanopore in a conducting fluid and application of a potential across it can result in a slight electrical current due to conduction of ions through the nanopore. The amount of current which flows can be sensitive to the size of the nanopore. As a DNA molecule passes through a nanopore, each nucleotide on the DNA molecule can obstruct the nanopore to a different degree.
- the nanopore sequencing technology can be from Oxford Nanopore Technologies; e.g., a GridION system.
- a single nanopore can be inserted in a polymer membrane across the top of a microwell.
- Each microwell can have an electrode for individual sensing.
- the microwells can be fabricated into an array chip, with 100,000 or more microwells (e.g., more than 200,000, 300,000, 400,000, 500,000, 600,000, 700,000, 800,000, 900,000, or 1,000,000) per chip.
- An instrument or node
- Data can be analyzed in real-time.
- the nanopore can be a protein nanopore, e.g., the protein alpha-hemolysin, a heptameric protein pore.
- the nanopore can be a solid-state nanopore, e.g., a nanometer sized hole formed in a synthetic membrane (e.g., SiNx or SiO2).
- the nanopore can be a hybrid pore (e.g., an integration of a protein pore into a solid-state membrane).
- the nanopore can be a nanopore with integrated sensors (e.g., tunneling electrode detectors, capacitive detectors, or graphene based nano-gap or edge state detectors (see, e.g., Garaj.
- A nanopore can be functionalized for analyzing a specific type of molecule (e.g., DNA, RNA, or protein).
- Nanopore sequencing can comprise “strand sequencing” in which intact DNA polymers can be passed through a protein nanopore with sequencing in real time as the DNA translocates the pore.
- An enzyme can separate strands of a double stranded DNA and feed a strand through a nanopore.
- the DNA can have a hairpin at one end, and the system can read both strands.
- nanopore sequencing is “exonuclease sequencing” in which individual nucleotides can be cleaved from a DNA strand by a processive exonuclease, and the nucleotides can be passed through a protein nanopore.
- the nucleotides can transiently bind to a molecule in the pore (e.g., cyclodextrin). A characteristic disruption in current can be used to identify bases.
- Nanopore sequencing technology from GENIA can be used.
- An engineered protein pore can be embedded in a lipid bilayer membrane.
- “Active Control” technology can be used to enable efficient nanopore-membrane assembly and control of DNA movement through the channel.
- the nanopore sequencing technology is from NABsys.
- Genomic DNA can be fragmented into strands of average length of about 100 kb.
- the 100 kb fragments can be made single stranded and subsequently hybridized with a 6-mer probe.
- the genomic fragments with probes can be driven through a nanopore, which can create a current-versus-time tracing.
- the current tracing can provide the positions of the probes on each genomic fragment.
- the genomic fragments can be lined up to create a probe map for the genome.
- the process can be done in parallel for a library of probes.
- a genome-length probe map for each probe can be generated.
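The probe-position step above can be illustrated in silico: a 6-mer probe hybridizes wherever the single-stranded fragment contains the probe's reverse complement, and the list of those positions is the fragment's entry in the probe map. This is a sketch of the idea only, not the NABsys method itself; the function names and the example sequences are made up for illustration.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def probe_positions(fragment, probe):
    """0-based start positions on the fragment where the probe can hybridize."""
    target = reverse_complement(probe)
    positions = []
    start = fragment.find(target)
    while start != -1:
        positions.append(start)
        # Step by one so that overlapping binding sites are also reported.
        start = fragment.find(target, start + 1)
    return positions


# Hypothetical fragment carrying two binding sites for the probe ACGTTG.
fragment = "TT" + "CAACGT" + "AA" + "CAACGT"
print(probe_positions(fragment, "ACGTTG"))
```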
- Errors can be fixed with a process termed “moving window Sequencing By Hybridization (mwSBH).”
- the nanopore sequencing technology is from IBM/Roche.
- An electron beam can be used to make a nanopore sized opening in a microchip.
- An electrical field can be used to pull or thread DNA through the nanopore.
- a DNA transistor device in the nanopore can comprise alternating nanometer sized layers of metal and dielectric. Discrete charges in the DNA backbone can get trapped by electrical fields inside the DNA nanopore. Turning off and on gate voltages can allow the DNA sequence to be read.
- the next generation sequencing can comprise DNA nanoball sequencing as performed, e.g., by Complete Genomics. See, e.g., Drmanac, et al., Science, 2010, vol. 327, no. 5961, pages 78-81.
- DNA can be isolated, fragmented, and size selected. For example, DNA can be fragmented (e.g., by sonication) to a mean length of about 500 bp.
- Adaptors (Ad1) can be attached to the ends of the fragments. The adaptors can be used to hybridize to anchors for sequencing reactions. DNA with adaptors bound to each end can be PCR amplified. The adaptor sequences can be modified so that complementary single strand ends bind to each other forming circular DNA.
- the DNA can be methylated to protect it from cleavage by a type IIS restriction enzyme used in a subsequent step.
- An adaptor (e.g., the right adaptor) can have a restriction recognition site, and the restriction recognition site can remain non-methylated.
- the nonmethylated restriction recognition site in the adaptor can be recognized by a restriction enzyme (e.g., AcuI), and the DNA can be cleaved by AcuI 13 bp to the right of the right adaptor to form linear double stranded DNA.
- a second round of right and left adaptors (Ad2) can be ligated onto either end of the linear DNA, and all DNA with both adapters bound can be amplified (e.g., by PCR).
- Ad2 sequences can be modified to allow them to bind each other and form circular DNA.
- the DNA can be methylated, but a restriction enzyme recognition site can remain non-methylated on the left Ad1 adapter.
- a restriction enzyme (e.g., AcuI)
- a third round of right and left adaptor (Ad3) can be ligated to the right and left flank of the linear DNA, and the resulting fragment can be PCR amplified.
- the adaptors can be modified so that they can bind to each other and form circular DNA.
- a type III restriction enzyme (e.g., EcoP15) can be added; EcoP15 can cleave the DNA 26 bp to the left of Ad3 and 26 bp to the right of Ad2. This cleavage can remove a large segment of DNA and linearize the DNA once again.
- a fourth round of right and left adaptors (Ad4) can be ligated to the DNA.
- the DNA can be amplified (e.g., by PCR), and modified so that they bind each other and form the completed circular DNA template.
- Rolling circle replication (e.g., using Phi 29 DNA polymerase) can be used to amplify small fragments of DNA.
- the four adaptor sequences can contain palindromic sequences that can hybridize and a single strand can fold onto itself to form a DNA nanoball (DNB™) which can be approximately 200-300 nanometers in diameter on average.
- a DNA nanoball can be attached (e.g., by adsorption) to a microarray (sequencing flowcell).
- the flow cell can be a silicon wafer coated with silicon dioxide, titanium and hexamethyldisilazane (HMDS) and a photoresist material. Sequencing can be performed by unchained sequencing by ligating fluorescent probes to the DNA. The color of the fluorescence of an interrogated position can be visualized by a high resolution camera.
- the identity of nucleotide sequences between adaptor sequences can be determined.
- nucleic acid library comprising one or more steps of providing one or more sample nucleic acids; end repair of sample nucleic acids; A-tailing of sample nucleic acids using a variant polymerase described herein; contacting the one or more sample nucleic acids with a plurality of adapters and a ligase to form a nucleic acid sequencing library comprising adapter-ligated nucleic acids; and sequencing the nucleic acid library.
- the sample nucleic acids comprise genomic fragments.
- the genomic fragments are obtained from cleavage of a genome. In some instances, the genomic fragments are obtained from amplification of a genome. In some instances the sample nucleic acids comprise cDNAs. In some instances the sample nucleic acids comprise cfDNAs. In some instances the method further comprises one or more steps to prepare a nucleic acid library, such as end-repair, A-tailing, and amplification. In some instances the method further comprises enriching the nucleic acid library prior to sequencing.
- Kits. Compositions and methods provided herein may be present in a kit.
- a kit for nucleic acid library preparation comprises (a) a ligase; (b) a variant polymerase described herein; and (c) at least one adapter.
- a kit comprises packaging for holding the kit components.
- a kit comprises instructions for using the kit components.
- a kit comprises adapters, buffers, additional enzymes, polymerases, dNTPs, or other components for use with sequencing library preparation.
- Example 1 Taq Polymerase High Throughput Assay
- The general workflow is shown in FIG. 1.
- non-clonal fragments were obtained from Twist Bioscience Corporation. These fragments were designed to contain T7 promoter and terminator flanking the enzyme variant sequence.
- This DNA came lyophilized and was resuspended in water. The DNA concentration in each well was assayed with BR dsDNA Qubit (Thermo Fisher).
- An ECHO liquid transfer instrument was used to set up small-scale, 1 µL, transcription-coupled translation (TxTl) reactions with a normalized mass of DNA template. The reactions were run at 37 °C for 2 hours to produce the enzyme variants, one unique variant in each well.
- A-tailing reaction was carried out with A-tailing Reaction Buffer, dNTPs, enzyme produced from TxTl and 5 ng of a blunt 230 bp DNA substrate generated by restriction enzyme digestion with MlyI.
- This blunt substrate is a mixture of 4 sequences that all have identical sequences except for the terminal base on either side which is an equimolar mixture of all 4 bases.
- the A-tailing reaction was incubated at 65 °C for 30 min to allow the enzyme variants to make untemplated additions to the blunt substrate.
- the reaction was then split in half to evaluate distinct base additions separately.
- T-tailed adapters were used to ligate to the A-tailed substrate.
- TT or C-tailed adapters were also used to quantify AA or G addition by the enzyme variant.
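The pairing between an untemplated 3' tail and the adapter used to detect it follows from base complementarity: an A tail is captured by a T-tailed adapter, AA by TT, and G by C. A minimal sketch follows, assuming overhangs are written 5'->3'; the function names are illustrative.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def adapter_overhang_for(tail):
    """Adapter overhang (5'->3') that base-pairs with a given 3' tail (5'->3')."""
    return "".join(COMPLEMENT[base] for base in reversed(tail))


def is_ligatable(tail, adapter_overhang):
    """True when the adapter overhang is complementary to the substrate tail."""
    return adapter_overhang_for(tail) == adapter_overhang


for tail in ("A", "AA", "G"):
    print(tail, "->", adapter_overhang_for(tail))
```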
- Double ligation products were evaluated by qPCR after dilution of the reaction 1:300.
- the qPCR primers used to measure ligation anneal across the ligation junction to ensure proper ligation.
- a separate primer pair was utilized to measure chimeric molecule ligation, an undesired outcome for this experiment. Based on the qPCR data with the respective screens, Ct values are compiled and variant hits are identified that are brought into the next round of design or which are purified for validation.
- Taq polymerase variants. Following the general procedure of Example 1, multiple rounds of optimization/selection were used to generate Taq polymerase variants. Variants from the Taq sequence (SEQ ID NO: 1) were selected based in part on high entropy positions (FIG. 3) and screened using a high throughput qPCR assay (FIGs. 4A-4B). In a first round, single variants were tested for polymerization performance metrics. A multiple sequence alignment (MSA) aligned a region of Taq polymerase with sequence homologues of this enzyme. The MSA was performed at a region of the enzyme identified in the literature. Alternative amino acids found in other homologues, but not WT, are the basis of the initial design of TaqIT variants (FIG. 3).
- Enzyme variants identified by MSA were assayed using a 384 well plate workflow. Two replicates were performed and the ligation to T-tailed adapters was quantified by qPCR. The scatter plot (FIG. 4A) of activity normalized to the WT showed the correlation between the two replicates. There is a cloud of variants around the WT, and a subset of variants perform better than WT in one or both replicates. A table of the top variants that performed consistently better than WT across replicates is shown in FIG. 4B.
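One common way to turn Ct values into WT-normalized activity is a fold change of the form efficiency^(Ct_WT − Ct_variant), where a lower Ct means more ligation product. The sketch below uses that convention and keeps variants that beat WT in both replicates. The specification does not state its normalization formula, so the formula, the assumed amplification efficiency of 2 per cycle, the function names, the variant labels, and the Ct values are all illustrative assumptions.

```python
def relative_activity(ct_variant, ct_wt, efficiency=2.0):
    """Fold change versus WT, assuming `efficiency`-fold amplification per cycle."""
    return efficiency ** (ct_wt - ct_variant)


def consistent_hits(rep1, rep2, wt_name="WT"):
    """Variants with relative activity > 1 in both replicates, best first."""
    hits = []
    for name in rep1:
        if name == wt_name or name not in rep2:
            continue
        a1 = relative_activity(rep1[name], rep1[wt_name])
        a2 = relative_activity(rep2[name], rep2[wt_name])
        if a1 > 1.0 and a2 > 1.0:
            hits.append((name, a1, a2))
    # Rank by the weaker replicate so that one-off outliers do not dominate.
    return sorted(hits, key=lambda hit: min(hit[1], hit[2]), reverse=True)


# Made-up Ct values for hypothetical variants V1-V3.
rep1 = {"WT": 20.0, "V1": 19.0, "V2": 21.0, "V3": 18.0}
rep2 = {"WT": 20.0, "V1": 19.0, "V2": 19.5, "V3": 18.5}
for name, a1, a2 in consistent_hits(rep1, rep2):
    print(name, round(a1, 2), round(a2, 2))
```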
- Taq variants were purified by taking advantage of the Taq polymerase heat tolerance.
- Taq variants were expressed as His6-tagged constructs. The His-tagged variants underwent enzymatic lysis (BPER) and heat treatment at 70 °C for 30 minutes. The Taq variant was purified from the heat-stabilized lysate using Ni-NTA column purification for characterization in a next-generation sequencing library preparation assay. The purified variants were quantified by spectrophotometry and purity was evaluated using SDS PAGE.
- FIG. 5A shows an SDS PAGE gel of purified wild-type TaqIT
- TaqIT binary variants and one homologue were evaluated for the percentage of 1 bp 3’ reads that have G tails, an undesired outcome.
- the TaqIT variants will create more ligatable molecules for NGS (FIG. 8B).
- Binary combinations, with two mutations per sequence, were also constructed. An SDS-PAGE gel showing a set of purified TaqIT binary variants is shown in FIG. 7A.
- NGS library preparation was performed using purified TaqIT binary variants as the A-tailing enzyme during the end repair and A-tailing reaction.
- the total number of aligned reads (left) and percent chimera (right)
- Enzyme tertiary variants were assayed using the 384 well plate workflow above. Two replicates were performed and ligation to T-tailed adapters was quantified by qPCR.
- the scatter plot (left) of activity normalized to the WT shows the correlation between the two replicates.
- This plate included a few binary variants from the previous round. Binary variants outperformed WT, and other tertiaries also outperformed some binaries.
- On the right is a table of the top variants that performed consistently better than WT across replicates (FIGs. 9A-9B).
- wild-type TaqIT results in about 8% G tailing (rather than A). For ligation with adapters comprising a T overhang, this may reduce the efficiency of ligation with this type of adapter. Mutants were identified which gave improved A-tailing efficiency and selectivity of no more than 2% G tailing (Table 3).
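The G-tailing figure above is a simple proportion: reads with a single-base 3' G tail divided by all single-base 3' tailed reads. A minimal sketch follows; the function names and the read counts are made up for illustration and were chosen only to reproduce an 8% case and a sub-2% case.

```python
def tail_percentages(tail_counts):
    """Percentage of 1 bp 3' tails contributed by each base."""
    total = sum(tail_counts.values())
    if total == 0:
        raise ValueError("no tailed reads were counted")
    return {base: 100.0 * count / total for base, count in tail_counts.items()}


def passes_g_tail_limit(tail_counts, max_g_percent=2.0):
    """True when G tailing does not exceed the stated limit (2% by default)."""
    return tail_percentages(tail_counts).get("G", 0.0) <= max_g_percent


wild_type_like = {"A": 920, "G": 80}  # illustrative counts: 8% G tails
improved = {"A": 985, "G": 15}        # illustrative counts: 1.5% G tails
print(tail_percentages(wild_type_like)["G"], passes_g_tail_limit(wild_type_like))
print(tail_percentages(improved)["G"], passes_g_tail_limit(improved))
```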
- Item 1 A variant polypeptide comprising at least one amino acid mutation relative to SEQ ID NO: 1.
- Item 2 The polypeptide of item 1, wherein the polypeptide comprises at least 80% similarity to any one of SEQ ID NOs: 3-9.
- Item 3 The polypeptide of item 1, wherein the polypeptide comprises at least 90% similarity to any one of SEQ ID NOs: 3-9.
- Item 4 The polypeptide of item 1, wherein the polypeptide comprises at least 95% similarity to any one of SEQ ID NOs: 3-9.
- Item 5 The polypeptide of item 1, wherein the polypeptide comprises at least 98% similarity to any one of SEQ ID NOs: 3-9.
- Item 6 The polypeptide of item 1, wherein the polypeptide comprises any one of SEQ ID NOs: 3-9.
- Item 7 The polypeptide of any one of items 1-6, wherein the mutation comprises one or more of an addition, deletion, and substitution.
- Item 8 The polypeptide of any one of items 1-7, wherein the deletion comprises 250-300 amino acids from the N-terminus relative to SEQ ID NO: 1.
- Item 9 The polypeptide of any one of items 1-7, wherein the polypeptide comprises at least 2 amino acid mutations relative to SEQ ID NO: 1.
- Item 10 The polypeptide of any one of items 1-7, wherein the polypeptide comprises at least 3 amino acid mutations relative to SEQ ID NO: 1.
- Item 11 The polypeptide of any one of items 1-7, wherein the polypeptide comprises at least 4 amino acid mutations relative to SEQ ID NO: 1.
- Item 12 The polypeptide of any one of items 1-11, wherein the mutations are at one or more of positions V449F, V493L, L522I, L605C, T664I, E681G, W706Y, D732A, R736K, R736Q, and G824A relative to SEQ ID NO: 1.
- Item 13 The polypeptide of item 12, wherein the mutations are at two or more of positions V449F, V493L, L522I, L605C, T664I, E681G, W706Y, D732A, R736K, R736Q, and G824A relative to SEQ ID NO: 1.
- Item 14 The polypeptide of item 12, wherein the mutations are selected from two or more of V449F, V493L, L522I, L605C, T664I, E681G, W706Y, D732A, R736K, R736Q, and G824A relative to SEQ ID NO: 1.
- Item 15 The polypeptide of item 14, wherein the mutations are selected from one or more of V449F, V493L, L522I, L605C, T664I, E681G, W706Y, D732A, R736K, R736Q, and G824A relative to SEQ ID NO: 1.
- Item 16 The polypeptide of any one of items 1-15, wherein the polypeptide further comprises a purification tag.
- Item 17 A nucleic acid encoding for the polypeptide of any one of items 1-16.
- Item 18 A vector comprising the nucleic acid of item 17.
- Item 19 The vector of item 18, wherein the vector comprises a plasmid.
- Item 20 A cell comprising the nucleic acid of item 17.
- Item 21 The cell of item 20, wherein the cell comprises a bacterial cell.
- Item 22 A method of expressing the polypeptide of any one of items 1-15.
- Item 23 The method of item 22, wherein expression comprises translation of the nucleic acid sequence of any one of items 1-16.
- Item 24 The method of item 22 or 23, wherein the method comprises an in vivo method.
- Item 25 The method of item 22 or 23, wherein the method comprises a cell-free method.
- Item 26 A method for extending a first polynucleotide comprising: contacting a first polynucleotide with a nucleotide and polypeptide of any one of items 1-16 to form an extended polynucleotide.
- Item 27 The method of item 26, wherein the first polynucleotide comprises genomic DNA or a fragment thereof.
- Item 28 The method of item 26, wherein the first polynucleotide comprises cDNA.
- Item 29 The method of item 26, wherein the nucleotide comprises adenosine triphosphate.
- Item 30 The method of any one of items 26-29, wherein the method is selective for incorporation of a single nucleotide.
- Item 31 The method of item 30, wherein the method results in at least 90% selectivity for a single nucleotide vs. incorporation of multiple nucleotides.
- Item 32 The method of item 30, wherein the method results in at least 95% selectivity for a single nucleotide vs. incorporation of multiple nucleotides.
- Item 33 The method of any one of items 26-29, wherein the method is selective for incorporation of a nucleotide type.
- Item 34 The method of item 33, wherein the method results in at least 90% selectivity for the nucleotide type.
- Item 35 The method of item 33, wherein the method results in at least 95% selectivity for the nucleotide type.
- Item 36 The method of item 33, wherein the method results in at least 95% selectivity for A over G.
- Item 37 The method of any one of items 26-36, wherein the method further comprises ligating an adapter to the extended polynucleotide.
- Item 38 The method of item 37, wherein the adapter comprises a complementary overhang to the extended polynucleotide.
- Item 39 The method of item 37, wherein the method further comprises extending a second polynucleotide.
- Item 40 The method of item 39, wherein the first polynucleotide and the second polynucleotide are hybridized.
- Item 41 A kit for nucleic library preparation comprising: a ligase; a polymerase having the sequence of the polypeptide of any one of items 1-16; and at least one adapter.
- Item 42 A method for preparing a sequencing library comprising: providing a plurality of nucleic acids; end-repairing the plurality of nucleic acids; performing A-tailing on the nucleic acids using a polymerase having the sequence of the polypeptide of any one of items 1-16; and ligating at least one adapter to the nucleic acids using a ligase.
- Item 43 The method of item 42, wherein the plurality of nucleic acids is derived from cfDNA.
- Item 44 The method of item 42, wherein the plurality of nucleic acids is derived from ctDNA.
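The substitution labels used in items 12-15 (e.g., V449F) follow the usual convention of original residue, 1-based position, new residue. The sketch below parses such labels and applies them to a reference sequence while checking that the stated original residue is actually present. The function names are illustrative, and the short toy sequence stands in for SEQ ID NO: 1, which is not reproduced here.

```python
import re

# Original residue, 1-based position, replacement residue (e.g., "V449F").
MUTATION_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitution(label):
    """Split a label such as 'V449F' into ('V', 449, 'F')."""
    match = MUTATION_RE.match(label)
    if match is None:
        raise ValueError(f"not a substitution label: {label!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def apply_substitutions(sequence, labels):
    """Return the sequence with each substitution applied, validating the original residue."""
    residues = list(sequence)
    for label in labels:
        original, position, replacement = parse_substitution(label)
        if not 1 <= position <= len(residues):
            raise ValueError(f"{label}: position outside the sequence")
        if residues[position - 1] != original:
            raise ValueError(f"{label}: sequence has {residues[position - 1]} at {position}")
        residues[position - 1] = replacement
    return "".join(residues)


print(parse_substitution("R736K"))
print(apply_substitutions("MVLT", ["V2F", "T4I"]))  # toy sequence, not SEQ ID NO: 1
```

Because each label is validated against the current residue, two alternatives at the same position (such as R736K and R736Q) cannot both be applied to one sequence; the second raises an error.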
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| AU2024259004A AU2024259004A1 (en) | 2023-04-21 | 2024-04-17 | Polymerase variants |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202363497665P | 2023-04-21 | 2023-04-21 | |
| US63/497,665 | 2023-04-21 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2024220475A1 | 2024-10-24 |
Family
ID=91081977
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/US2024/024895 Pending WO2024220475A1 (en) | 2023-04-21 | 2024-04-17 | Polymerase variants |
Country Status (2)
| Country | Link |
|---|---|
| AU (1) | AU2024259004A1 (en) |
| WO (1) | WO2024220475A1 (en) |
Citations (23)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US2471106A (en) | 1945-06-28 | 1949-05-24 | Clarence E Hall | Valve clearance gauge |
| WO1997016566A1 (en) * | 1995-10-20 | 1997-05-09 | THE GOVERNMENT OF THE UNITED STATES OF AMERICA, represented by THE SECRETARY OF THE DEPARTMENT OF HEALTH AND HUMAN SERVICES | Sequence modification of oligonucleotide primers to manipulate non-templated nucleotide addition |
| US20020012930A1 (en) | 1999-09-16 | 2002-01-31 | Rothberg Jonathan M. | Method of sequencing a nucleic acid |
| US20030022207A1 (en) | 1998-10-16 | 2003-01-30 | Solexa, Ltd. | Arrayed polynucleotides and their use in genome analysis |
| US20030044781A1 (en) | 1999-05-19 | 2003-03-06 | Jonas Korlach | Method for sequencing nucleic acid molecules |
| US20030058629A1 (en) | 2001-09-25 | 2003-03-27 | Taro Hirai | Wiring substrate for small electronic component and manufacturing method |
| US20030064398A1 (en) | 2000-02-02 | 2003-04-03 | Solexa, Ltd. | Synthesis of spatially addressed molecular arrays |
| US20040106130A1 (en) | 1994-06-08 | 2004-06-03 | Affymetrix, Inc. | Bioarray chip reaction apparatus and its manufacture |
| US6787308B2 (en) | 1998-07-30 | 2004-09-07 | Solexa Ltd. | Arrayed biomolecules and their use in sequencing |
| US20040248161A1 (en) | 1999-09-16 | 2004-12-09 | Rothberg Jonathan M. | Method of sequencing a nucleic acid |
| US6833246B2 (en) | 1999-09-29 | 2004-12-21 | Solexa, Ltd. | Polynucleotide sequencing |
| US20050079510A1 (en) | 2003-01-29 | 2005-04-14 | Jan Berka | Bead emulsion nucleic acid amplification |
| US20050100932A1 (en) | 2003-11-12 | 2005-05-12 | Helicos Biosciences Corporation | Short cycle methods for sequencing polynucleotides |
| US6897023B2 (en) | 2000-09-27 | 2005-05-24 | The Molecular Sciences Institute, Inc. | Method for determining relative abundance of nucleic acid sequences |
| US20050124022A1 (en) | 2001-10-30 | 2005-06-09 | Maithreyan Srinivasan | Novel sulfurylase-luciferase fusion proteins and thermostable sulfurylase |
| US6969488B2 (en) | 1998-05-22 | 2005-11-29 | Solexa, Inc. | System and apparatus for sequential processing of analytes |
| US20060012793A1 (en) | 2004-07-19 | 2006-01-19 | Helicos Biosciences Corporation | Apparatus and methods for analyzing samples |
| US20060012784A1 (en) | 2004-07-19 | 2006-01-19 | Helicos Biosciences Corporation | Apparatus and methods for analyzing samples |
| US20060024678A1 (en) | 2004-07-28 | 2006-02-02 | Helicos Biosciences Corporation | Use of single-stranded nucleic acid binding proteins in sequencing |
| US20060078909A1 (en) | 2001-10-30 | 2006-04-13 | Maithreyan Srinivasan | Novel sulfurylase-luciferase fusion proteins and thermostable sulfurylase |
| WO2018191702A2 (en) * | 2017-04-14 | 2018-10-18 | Guardant Health, Inc. | Methods of attaching adapters to sample nucleic acids |
| WO2020185702A2 (en) * | 2019-03-13 | 2020-09-17 | Abclonal Science, Inc. | Mutant taq polymerase for faster amplification |
| US20230094503A1 (en) * | 2019-03-10 | 2023-03-30 | AbClonal Science Inc. | Mutant Taq Polymerase for Increased Salt Concentration or Body Fluids |
2024
- 2024-04-17: WO application PCT/US2024/024895, published as WO2024220475A1 (en), status: Pending
- 2024-04-17: AU application AU2024259004A, published as AU2024259004A1 (en), status: Pending
Patent Citations (27)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US2471106A (en) | 1945-06-28 | 1949-05-24 | Clarence E Hall | Valve clearance gauge |
| US20040106130A1 (en) | 1994-06-08 | 2004-06-03 | Affymetrix, Inc. | Bioarray chip reaction apparatus and its manufacture |
| WO1997016566A1 (en) * | 1995-10-20 | 1997-05-09 | THE GOVERNMENT OF THE UNITED STATES OF AMERICA, represented by THE SECRETARY OF THE DEPARTMENT OF HEALTH AND HUMAN SERVICES | Sequence modification of oligonucleotide primers to manipulate non-templated nucleotide addition |
| US6969488B2 (en) | 1998-05-22 | 2005-11-29 | Solexa, Inc. | System and apparatus for sequential processing of analytes |
| US6787308B2 (en) | 1998-07-30 | 2004-09-07 | Solexa Ltd. | Arrayed biomolecules and their use in sequencing |
| US20030022207A1 (en) | 1998-10-16 | 2003-01-30 | Solexa, Ltd. | Arrayed polynucleotides and their use in genome analysis |
| US20030044781A1 (en) | 1999-05-19 | 2003-03-06 | Jonas Korlach | Method for sequencing nucleic acid molecules |
| US20060078937A1 (en) | 1999-05-19 | 2006-04-13 | Jonas Korlach | Sequencing nucleic acid using tagged polymerase and/or tagged nucleotide |
| US20020012930A1 (en) | 1999-09-16 | 2002-01-31 | Rothberg Jonathan M. | Method of sequencing a nucleic acid |
| US20030100102A1 (en) | 1999-09-16 | 2003-05-29 | Rothberg Jonathan M. | Apparatus and method for sequencing a nucleic acid |
| US20030148344A1 (en) | 1999-09-16 | 2003-08-07 | Rothberg Jonathan M. | Method of sequencing a nucleic acid |
| US20040248161A1 (en) | 1999-09-16 | 2004-12-09 | Rothberg Jonathan M. | Method of sequencing a nucleic acid |
| US6833246B2 (en) | 1999-09-29 | 2004-12-21 | Solexa, Ltd. | Polynucleotide sequencing |
| US20030064398A1 (en) | 2000-02-02 | 2003-04-03 | Solexa, Ltd. | Synthesis of spatially addressed molecular arrays |
| US6897023B2 (en) | 2000-09-27 | 2005-05-24 | The Molecular Sciences Institute, Inc. | Method for determining relative abundance of nucleic acid sequences |
| US20030058629A1 (en) | 2001-09-25 | 2003-03-27 | Taro Hirai | Wiring substrate for small electronic component and manufacturing method |
| US20050124022A1 (en) | 2001-10-30 | 2005-06-09 | Maithreyan Srinivasan | Novel sulfurylase-luciferase fusion proteins and thermostable sulfurylase |
| US20060078909A1 (en) | 2001-10-30 | 2006-04-13 | Maithreyan Srinivasan | Novel sulfurylase-luciferase fusion proteins and thermostable sulfurylase |
| US20050079510A1 (en) | 2003-01-29 | 2005-04-14 | Jan Berka | Bead emulsion nucleic acid amplification |
| US20050100932A1 (en) | 2003-11-12 | 2005-05-12 | Helicos Biosciences Corporation | Short cycle methods for sequencing polynucleotides |
| US20060012793A1 (en) | 2004-07-19 | 2006-01-19 | Helicos Biosciences Corporation | Apparatus and methods for analyzing samples |
| US20060012784A1 (en) | 2004-07-19 | 2006-01-19 | Helicos Biosciences Corporation | Apparatus and methods for analyzing samples |
| US20060024678A1 (en) | 2004-07-28 | 2006-02-02 | Helicos Biosciences Corporation | Use of single-stranded nucleic acid binding proteins in sequencing |
| WO2018191702A2 (en) * | 2017-04-14 | 2018-10-18 | Guardant Health, Inc. | Methods of attaching adapters to sample nucleic acids |
| US20230094503A1 (en) * | 2019-03-10 | 2023-03-30 | AbClonal Science Inc. | Mutant Taq Polymerase for Increased Salt Concentration or Body Fluids |
| WO2020185702A2 (en) * | 2019-03-13 | 2020-09-17 | Abclonal Science, Inc. | Mutant taq polymerase for faster amplification |
| US20210079365A1 (en) * | 2019-03-13 | 2021-03-18 | Abclonal Science, Inc. | Mutant Taq Polymerase for Faster Amplification |
Non-Patent Citations (13)
| Title |
|---|
| ADAMS ET AL., SCIENCE, 2000 |
| BARNES ET AL: "The fidelity of Taq polymerase catalyzing PCR is improved by an N-terminal deletion", GENE, ELSEVIER AMSTERDAM, NL, vol. 112, no. 1, 1 March 1992 (1992-03-01), pages 29 - 35, XP023542220, ISSN: 0378-1119, [retrieved on 19920301], DOI: 10.1016/0378-1119(92)90299-5 * |
| BARNES WAYNE M. ET AL: "A Single Amino Acid Change to Taq DNA Polymerase Enables Faster PCR, Reverse Transcription and Strand-Displacement", FRONTIERS IN BIOENGINEERING AND BIOTECHNOLOGY, vol. 8, 14 January 2021 (2021-01-14), CH, pages 553474, XP093151630, ISSN: 2296-4185, DOI: 10.3389/fbioe.2020.553474 * |
| CONSTANS: "Beyond Sanger: toward the $1,000 genome: new technologies promise faster and cheaper whole-genome sequencing", THE SCIENTIST, vol. 17, no. 13, 2003, pages 36 |
| DATABASE Geneseq [online] 11 May 2023 (2023-05-11), "Taq polymerase with C-terminal his tag mutant E681K, SEQ 402.", XP093184474, retrieved from EBI accession no. GSP:BMQ85974 Database accession no. BMQ85974 * |
| DATABASE Geneseq [online] 27 May 2021 (2021-05-27), "Taq polymerase mutant D732N with linker/his-tag, SEQ 16.", XP093184477, retrieved from EBI accession no. GSP:BJC46913 Database accession no. BJC46913 * |
| DRMANAC ET AL., SCIENCE, vol. 327, no. 5961, 2010, pages 78 - 81 |
| GARAJ ET AL., NATURE, vol. 467, 2010, pages 190 - 193 |
| LEVENE ET AL., SCIENCE, vol. 299, 2003, pages 682 - 686 |
| MARGULIES ET AL.: "Genome sequencing in microfabricated high-density picolitre reactors", NATURE, vol. 437, 2005, pages 376 - 380 |
| SONI ET AL., CLIN CHEM., vol. 53, 2007, pages 1996 - 2001 |
| TAKESHI YAMAGAMI ET AL: "Mutant Taq DNA polymerases with improved elongation ability as a useful reagent for genetic engineering", FRONTIERS IN MICROBIOLOGY, vol. 5, 3 September 2014 (2014-09-03), pages 1 - 10, XP055386500, DOI: 10.3389/fmicb.2014.00461 * |
| VENTER ET AL., SCIENCE, 2001 |
Also Published As
| Publication number | Publication date |
|---|---|
| AU2024259004A1 (en) | 2025-12-04 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| EP3027775B1 (en) | | DNA sequencing and epigenome analysis |
| JP7638309B2 (en) | | High-throughput single-cell sequencing with reduced amplification bias |
| Moffitt et al. | | Spatial organization shapes the turnover of a bacterial transcriptome |
| Twyman | | Principles of proteomics |
| US20180258421A1 (en) | | Compositions, methods and uses for multiplex protein sequence activity relationship mapping |
| US10011830B2 (en) | | Devices and methods for display of encoded peptides, polypeptides, and proteins on DNA |
| TW201321518A (en) | | Method of micro-scale nucleic acid library construction and application thereof |
| WO2010036323A1 (en) | | Method of identifying interactions between genomic loci |
| KR102795708B1 (en) | | Method for diagnosing and predicting cancer type based on artificial intelligence |
| AU2016242953A1 (en) | | Method for detecting genomic variations using circularised mate-pair library and shotgun sequencing |
| KR101913735B1 (en) | | Internal control substance searching for inter-sample cross-contamination of next-generation sequencing samples |
| AU2024259004A1 (en) | | Polymerase variants |
| AU2024259004A9 (en) | | Polymerase variants |
| WO2024123733A1 (en) | | Enzymes for library preparation |
| US20240287580A1 (en) | | Unit-DNA composition for spatial barcoding and sequencing |
| KR20250175336A (en) | | Polymerase mutants |
| CN121219408A (en) | | Polymerase variants |
| JP2025541124A (en) | | Enzymes for library preparation |
| Monge et al. | | Highly replicated experiments studying complex genotypes using nested DNA barcodes |
| US20220025430A1 (en) | | Sequence based imaging |
| HK1227063A1 (en) | | DNA sequencing and epigenome analysis |
| HK1227063B (en) | | DNA sequencing and epigenome analysis |
| Primrose | | Principles of gene manipulation and genomics by Sandy B Primrose and Richard Twyman |
| KR20140006363A (en) | | Method for preparing chimeric ribonucleic acid, cDNA and its derivatives |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 24726054; Country of ref document: EP; Kind code of ref document: A1 |
| | WWE | WIPO information: entry into national phase | Ref document number: CN2024800323828; Country of ref document: CN |
| | WWE | WIPO information: entry into national phase | Ref document number: AU2024259004; Country of ref document: AU. Ref document number: KR1020257038874; Country of ref document: KR |
| | WWE | WIPO information: entry into national phase | Ref document number: 2024726054; Country of ref document: EP |
| | ENP | Entry into the national phase | Ref document number: 2024726054; Country of ref document: EP; Effective date: 20251121 |
| | ENP | Entry into the national phase | Ref document number: 2024259004; Country of ref document: AU; Date of ref document: 20240417; Kind code of ref document: A |