WO2018009729A2 - Modification of DNA polymerases for in vitro applications - Google Patents

Modification of DNA polymerases for in vitro applications

Info

Publication number
WO2018009729A2
Authority
WO
WIPO (PCT)
Prior art keywords
polymerase
seq
dna
polynucleotide
sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/US2017/040994
Other languages
French (fr)
Other versions
WO2018009729A3 (en)
Inventor
Mark Welch
Sridhar Govindarajan
David Mead
Baigen MEI
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dna20 Inc dba Atum
Lucigen Corp
Original Assignee
Dna20 Inc dba Atum
Lucigen Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dna20 Inc dba Atum, Lucigen Corp
Publication of WO2018009729A2
Publication of WO2018009729A3
Anticipated expiration
Ceased

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/10 Transferases (2.)
    • C12N9/12 Transferases (2.) transferring phosphorus containing groups, e.g. kinases (2.7)
    • C12N9/1241 Nucleotidyltransferases (2.7.7)
    • C12N9/1252 DNA-directed DNA polymerase (2.7.7.7), i.e. DNA replicase

Definitions

  • the field of the present invention relates to novel compositions of DNA polymerases for in vitro applications.
  • DNA polymerases are widely used for nucleic acid amplification, detection and sequencing.
  • the most commonly used enzyme for Sanger sequencing is derived from Thermus aquaticus (Taq), whereas Bacillus stearothermophilus (Bst) DNAP is used in the 454/Roche pyrosequencing platform.
  • Taq polymerase lacks proofreading activity and is unable to efficiently extend misincorporated bases. Mismatched base pairing generates truncated products that accumulate during PCR and contribute to reaction failure if the target is too long and/or the template DNA is supplied in low amounts.
  • proofreading high fidelity enzymes are extremely accurate, but do not perform well over longer target distances or with low template concentration because the 3'-5' exonuclease (proofreading) activity destroys primers and affects sensitivity.
  • Second and third generation instruments for massively parallel DNA sequencing can deliver megabases of data at a lower cost.
  • the development of a DNAP to match the technical capabilities of new instrument platforms has not kept pace. Achieving long and accurate reads using new solid phase extension methods, terminator chemistries, and microfluidic flow technologies places new demands on currently used enzymes.
  • Polymerases with increased template affinity for DNA or RNA could provide important improvements in sequencing, amplification and reverse transcription.
  • NGS next generation sequencing
  • DNA polymerases are well known in the art and include both DNA-dependent polymerases and RNA-dependent polymerases such as reverse transcriptase. At least five families of DNA-dependent DNA polymerases are known, although most fall into families A, B and C. Other less characterized families include D, X, Y and RT. There is little or no structural or sequence similarity among the various families. Most family A polymerases are single chain proteins that can contain multiple enzymatic functions including polymerase, 3' to 5' exonuclease activity and 5' to 3' exonuclease activity. Family B polymerases typically have a single catalytic domain with polymerase and 3' to 5' exonuclease activity, as well as accessory factors. Family C polymerases are typically multi-subunit proteins with DNA polymerizing and 3' to 5' exonuclease activity. In E. coli, three types of DNA polymerases have been found, DNA polymerases I (family A), II (family B), and III (family C). In eukaryotic cells, three different family B polymerases, DNA polymerases alpha, delta, and epsilon, are implicated in nuclear replication, and a family A polymerase, polymerase gamma, is used for mitochondrial DNA replication. Other types of DNA polymerases include phage polymerases.
  • RNA polymerases typically include eukaryotic RNA polymerases I, II, and III, and bacterial RNA polymerases as well as phage and viral polymerases.
  • polymerases can be DNA-dependent and RNA-dependent.
  • the present invention is directed to eukaryotic DNA polymerases, in particular Pol eta, Pol iota and Pol kappa, members of Family Y DNA polymerases involved in the DNA repair by translesion synthesis and encoded by genes POLH, POLI and POLK respectively.
  • the DinB/Pol κ subgroup proteins are ubiquitously present from bacteria to humans, but notably absent in the completely sequenced genomes of Saccharomyces
  • the E. coli DinB protein was shown to have DNA polymerase activity (designated DNA polymerase IV, or Pol IV), independently of accessory proteins such as UmuD' and RecA which are required for UmuC-dependent DNA polymerase activity (designated Pol V).
  • Members of Family Y have five common motifs to aid in binding the substrate and primer terminus and they all include the typical right-hand thumb, palm and finger domains with added domains like little finger (LF), polymerase-associated domain (PAD), or wrist.
  • the active site differs between family members due to the different lesions being repaired.
  • Polymerases in Family Y are low-fidelity polymerases. The importance of these polymerases is evidenced by the fact that the gene encoding DNA polymerase η is referred to as XPV, because loss of this gene results in the disease Xeroderma Pigmentosum Variant.
  • Pol η is particularly important for allowing accurate translesion synthesis of DNA damage resulting from ultraviolet radiation. The functionality of Pol κ is not completely understood, but researchers have found two probable functions. Pol κ is thought to act as an extender or an inserter of a specific base at certain DNA lesions.
  • the present invention provides modified DNA polymerases that have improved amplification properties and/or processivity over natural forms of the polymerases as well as other polymerases in commercial use.
  • the invention also provides recombinant DNA sequences encoding such DNA polymerases, and vector plasmids and host cells suitable for the expression of these recombinant DNA sequences.
  • the polymerase sequence is selected from SEQ ID NOS: 3-242.
  • the polymerase may be selected from SEQ ID NOS: 156, 159, 164, 165, 166, 173, 181, 190, 214, 218, 225 and 242.
  • the invention further provides a polynucleotide encoding a non-naturally occurring polymerase, wherein the polymerase has a sequence comprising SEQ ID NO: 1 modified by one or more substitutions listed in Table 3 and up to ten internal insertions, deletions or substitutions at positions other than those listed in Table 3.
  • the polymerase comprises at least 2, 3, 4, 5, 6, 7, 8, 9 or 10 substitutions listed in Table 3.
  • the polymerase has at least 90, 95 or 99% sequence identity to claim 1.
  • the polymerase has no substitutions other than those shown in Table 3 and conservative substitutions not affecting activity of the polymerase.
  • the invention further provides a polynucleotide, which encodes an amino acid sequence at least 90, 95 or 99% sequence identical to any of SEQ ID NOS: 3-242 provided any substitutions present in the amino acid sequence specified in Table 4 are retained.
  • the polymerases selected can be thermostable, retain polymerase activity and exhibit reverse transcriptase activity.
  • the polymerase variants can be selected for any or all of properties that include strand displacement activity, amplification of next generation sequencing (NGS) libraries, high fidelity amplification of target sequence, amplification of amplification resistant target sequences comprising direct repeats, inverted repeats, at least 65% G+C residues or A+T residues or a sequence greater than 2 kilobases.
  • any or all such properties are enhanced relative to the corresponding property of the polymerase having the amino acid sequence of SEQ ID NO: 1.
  • the target sequence may be deoxyribonucleic acid (DNA) or ribonucleic acid (RNA).
  • DNA sequences encoding DNA polymerases and other motifs such as DNA binding proteins, antibodies and more are another aspect.
  • the invention also provides a novel formulation of the DNA polymerases of the present invention and other thermostable DNA polymerases, which formulation of enzymes is capable of efficiently catalyzing the amplification by PCR (the polymerase chain reaction) of unusually long and faithful products.
  • compositions comprising one or more non-natural polymerases selected from SEQ ID NOS: 3-242.
  • a method comprising amplifying sequences from a target polynucleotide, for example PCR or reverse transcription using a modified polymerase, other than a natural sequence selected from any of SEQ ID NOS: 3-242.
  • the modified non-natural polymerase polynucleotide sequence is included in a construct operably linked to a promoter; the construct is included in a recombinant host cell.
  • the target polynucleotide may be DNA or RNA and may be from a bacterial cell, a human cell or murine cell.
  • the modified non-natural polymerases can amplify target polynucleotide sequences comprising amplification resistant sequence comprising direct repeats, inverted repeats, at least 65% G+C residues, or A+T residues or a sequence greater than 2 kilobases.
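The composition and length criteria named above (at least 65% G+C or A+T residues, or a sequence greater than 2 kilobases) can be checked mechanically. The following is a minimal sketch, not a method taken from the application: the function names are hypothetical, "2 kilobases" is assumed to mean 2000 nucleotides, and the direct-repeat and inverted-repeat criteria are deliberately not covered.

```python
def base_fraction(seq: str, bases: str) -> float:
    """Return the fraction of residues in `seq` that belong to `bases`."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return sum(seq.count(b) for b in bases) / len(seq)


def is_resistant_by_composition_or_length(
    seq: str,
    threshold: float = 0.65,  # "at least 65%" from the text
    min_length: int = 2000,   # assumption: 2 kilobases = 2000 nt
) -> bool:
    """Flag a target as amplification resistant by G+C content,
    A+T content, or length. Repeat structure is NOT examined here."""
    gc = base_fraction(seq, "GC")
    at = base_fraction(seq, "AT")
    return gc >= threshold or at >= threshold or len(seq) > min_length
```

For example, `is_resistant_by_composition_or_length("GGGCCCGGAT")` is true because 8 of the 10 residues are G or C, while a short balanced sequence such as `"ACGT" * 10` is not flagged.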
  • the modified non-natural polymerases allow amplification of next generation sequencing (NGS) libraries.
  • a kit comprising a polynucleotide encoding a non-natural polymerase, wherein the polymerase of SEQ ID NO: 1 comprises one or more substitutions listed in Table 3 and/or is selected from SEQ ID NOS: 3-242.
  • FIGURE 1A Purified modified polymerases were tested for purity by SDS-PAGE analysis; >95% purity was observed.
  • FIGURE 1B Additional purified modified polymerases were tested for purity by SDS-PAGE analysis; >95% purity was observed.
  • FIGURE 2A Purified modified DNA polymerase variants were tested for ability to amplify contaminating bacterial host DNA by the E. coli 16S PCR test. Reactions were set up as described in Section 6.1.5 and analyzed by agarose gel electrophoresis.
  • FIGURE 2B Additional purified modified DNA polymerase variants were tested for ability to amplify contaminating bacterial host DNA by the E. coli 16S PCR test. Reactions were set up as described in Section 6.1.5 and analyzed by agarose gel electrophoresis.
  • FIGURE 3A Amplification properties of modified DNA polymerase variants were determined by PCR of a 2.8 kb PolA target DNA from E. coli genomic DNA (gDNA). PCR reactions were run at two denaturation temperatures, 94°C and 98°C, to determine thermostability and analyzed by agarose gel electrophoresis. Polymerases Accura (Acc) DNAP, GoTaq and KAPA (KA) were run as controls.
  • FIGURE 3B Additional PCR reactions from Figure 3A were run at two denaturation temperatures 94°C and 98°C to determine thermostability and analyzed by agarose gel electrophoresis as above. Polymerases Accura (Acc) DNAP and KAPA (KA) were run as controls.
  • FIGURE 4A Amplification properties of modified DNA polymerase variants were determined by PCR of a 5 kb target DNA from E. coli genomic DNA (gDNA) at denaturation temperatures of 94°C and 98°C. PCR reactions were analyzed by agarose gel electrophoresis.
  • FIGURE 4B Amplification properties of additional modified DNA polymerase variants were determined by PCR of a 5 kb target DNA from E. coli genomic DNA (gDNA) at denaturation temperatures of 94°C and 98°C. PCR reactions were analyzed by agarose gel electrophoresis.
  • FIGURE 5A Amplification properties of modified DNA polymerase variants were determined by PCR of a 10 kb target DNA from E. coli genomic DNA (gDNA) at denaturation temperatures of 94°C and 98°C. PCR reactions were analyzed by agarose gel electrophoresis.
  • FIGURE 5B Amplification properties of additional modified DNA polymerase variants were determined by PCR of a 10 kb target DNA from E. coli genomic DNA (gDNA) at denaturation temperatures of 94°C and 98°C. PCR reactions were analyzed by agarose gel electrophoresis.
  • FIGURE 6A Modified DNA polymerase variants efficiently amplified a longer human DNA target (5 kb) by PCR; reactions were analyzed by agarose gel electrophoresis. Accura (Acc) DNA polymerase and KAPA (KA) DNA polymerase were included as controls.
  • FIGURE 6B Modified DNA polymerase variants (V313 and V318 shown) efficiently amplified longer human DNA target (5 kb) by PCR and analyzed on a
  • FIGURE 7 panels A and B: Modified DNA polymerase variants identified show good amplification of a human NGS DNA library. PCR reactions were analyzed by agarose gel electrophoresis. Accura (Acc) DNA polymerase and KAPA (KA) DNA polymerase were included as controls. Several modified DNA polymerase variants successfully amplified a human DNA library, giving high yields of amplicons.
  • FIGURE 8 panels I and II Modified DNA polymerase variants successfully amplified a high GC-content NGS DNA library from Rhodobacter. PCR reactions were analyzed by agarose gel electrophoresis and on a Bioanalyser (data not shown). Accura (Acc) DNA polymerase and KAPA (KA) DNA polymerase were included as controls.
  • FIGURE 9 panels I and II Modified DNA polymerase variants successfully amplified a high AT-content NGS DNA library from Staphylococcus. PCR reactions were analyzed by agarose gel electrophoresis. Accura (Acc) DNA polymerase and KAPA (KA) DNA polymerase were included as controls.
  • FIGURE 10 Panels A and B show modified DNA polymerase variants that show strong strand displacement activity by primer extension. Primer extension was set up and reactions analyzed by agarose gel electrophoresis.
  • FIGURE 11A Modified DNA polymerase variants show high reverse transcriptase (RT) activity on a 520 base pair (bp) MS2 RNA target. RT-PCR reactions were run without (panel I) and with KF (panel II) and analyzed by agarose gel electrophoresis. Accura (Acc) DNA polymerase was run as control.
  • FIGURE 11B Additional modified DNA polymerase variants show high reverse transcriptase (RT) activity on a 520 base pair (bp) MS2 RNA target. RT-PCR reactions were run without (panel I) and with KF (panel II) and analyzed by agarose gel electrophoresis. Accura (Acc) DNA polymerase was run as control.
  • FIGURE 11C Modified DNA polymerase variants show high reverse transcriptase (RT) activity on a 520 base pair (bp) MS2 RNA target. A subset of samples from Figure 11A were re-analyzed on an agarose gel to confirm positives.
  • FIGURE 12 Twelve DNA polymerase variants from Table 10 were tested in a HotStart version to show inhibition of polymerase activity by antibody.
  • FIGURE 13 Efficient amplification of a 3 kb plasmid E. coli target was shown for the 12 variants from round 1 and two round 2 variants, SEQ ID NO: 101 (V103) and SEQ ID NO: 117 (V119). PCR reaction was set up and analyzed by agarose gel electrophoresis.
  • FIGURE 14 Efficient amplification of a 5 kb E. coli target was shown for the 12 variants from rounds 1 and 2, and round 2 variants V103 (SEQ ID NO: 101) and V119 (SEQ ID NO: 117). PCR reaction was set up and analyzed by agarose gel electrophoresis.
  • FIGURE 15A Efficient amplification of a 10 kb E. coli target was shown for the 12 variants from rounds 1 and 2, and round 2 variants V103 (SEQ ID NO: 101) and V119 (SEQ ID NO: 117). PCR reaction was set up and analyzed by agarose gel electrophoresis.
  • FIGURE 15B Efficient amplification of a 10 kb E. coli target was shown for the 12 variants from rounds 1 and 2, and round 2 variants V103 (SEQ ID NO: 101) and V119 (SEQ ID NO: 117). PCR reaction was set up and analyzed by agarose gel electrophoresis. Panel I shows reactions after 35 cycles of amplification. Panel II shows reactions after 30 cycles of amplification.
  • FIGURE 16 Efficient amplification of a 5 kb human DNA target was shown for the 12 modified polymerase variants from rounds 1 and 2, and round 2 variants V103 (SEQ ID NO: 101) and V119 (SEQ ID NO: 117). Accura DNA polymerase and KAPA DNA polymerase were included as controls. PCR reactions were analyzed by agarose gel electrophoresis.
  • FIGURE 17 Efficient amplification of NGS DNA libraries from human (panel I) and Rhodobacter (panel II) was shown for the 12 modified polymerase variants from Table 10 and two second round variants V103 (SEQ ID NO: 101) and V119 (SEQ ID NO: 117). Accura DNA polymerase and KAPA DNA polymerase were included as controls. PCR reactions were analyzed by agarose gel electrophoresis.
  • FIGURE 18 Efficient amplification of NGS DNA library from AT-rich Staphylococcus was shown for the 12 modified polymerase variants from Table 10 and two second round variants V103 (SEQ ID NO: 101) and V119 (SEQ ID NO: 117).
  • FIGURE 19A PCR reaction optimization was carried out for modified polymerase variants (SEQ ID NOS: 159 and 190) in the presence of sugars, for example sucrose, trehalose and mannitol, and reactions were analyzed by agarose gel electrophoresis.
  • FIGURE 19B PCR optimization was carried out for modified polymerase variants (SEQ ID NOS: 159 and 190) in the presence of sugars, for example trehalose, mannitol and sorbitol, and reactions were analyzed by agarose gel electrophoresis.
  • FIGURE 20 PCR reaction optimization was carried out for modified polymerase variant (SEQ ID NO: 190) in the presence of BSA or L-carnitine and reactions analyzed by agarose gel electrophoresis.
  • FIGURE 21A PCR reaction optimization was carried out for modified polymerase variants (SEQ ID NOS: 159 and 190) in the presence of sugars, for example sucrose, trehalose, mannitol and sorbitol, in the presence or absence of L-carnitine, and reactions were analyzed by agarose gel electrophoresis. Sugar and L-carnitine together showed an additive effect on amplification of a 3 kb CometGFP DNA target.
  • FIGURE 21B PCR reaction optimization was carried out for modified polymerase variants (SEQ ID NOS: 159 and 190) in the presence of sugars, for example sucrose, trehalose, mannitol and sorbitol, in the presence or absence of L-carnitine, and reactions were analyzed by agarose gel electrophoresis. Sugar and L-carnitine together showed an additive effect on amplification of a 5 kb human gDNA target.
  • FIGURE 22A Various combinations of buffer compositions were used to set up PCR on a 3 kb E. coli target DNA with modified polymerase variant (SEQ ID NO: 190). PCR reactions with or without additive were analyzed by agarose gel electrophoresis.
  • FIGURE 22B Various combinations of buffer compositions were used to set up PCR on a 5 kb E. coli target DNA with modified polymerase variant (SEQ ID NO: 190). PCR reactions with or without additive were analyzed by agarose gel electrophoresis.
  • FIGURE 23A PCR reactions with standard PCR buffer were tested on CometGFP DNA as template with addition of sorbitol and with modified polymerase variant (SEQ ID NO: 190). PCR reactions without and with addition of sorbitol were analyzed by agarose gel electrophoresis.
  • FIGURE 23B PCR reactions with standard PCR buffer were tested on CometGFP DNA as template with addition of KCl and with modified polymerase variant (SEQ ID NO: 190). PCR reactions without and with addition of KCl were analyzed by agarose gel electrophoresis.
  • FIGURE 23C PCR reactions with standard PCR buffer were tested on CometGFP DNA as template with addition of non-ionic detergents, for example Triton X-100, Tween 20, NP-40, CHAPS and Brij 8, and with modified polymerase variant (SEQ ID NO: 190). PCR reactions were analyzed by agarose gel electrophoresis.
  • DETAILED DESCRIPTION OF THE INVENTION
  • Amplification reaction refers to any in vitro means for multiplying the copies of a target sequence of nucleic acid.
  • Such methods include but are not limited to polymerase chain reaction (PCR) (see U.S. Pat. Nos. 4,683,195 and 4,683,202; PCR Protocols: A Guide to Methods and Applications (Innis et al., eds, 1990)), DNA ligase chain reaction (LCR), QBeta RNA replicase, and RNA transcription-based (such as TAS and 3SR) amplification reactions as well as others known to those of skill in the art.
  • “Amplifying” refers to a step of submitting a solution to conditions sufficient to allow for amplification of a polynucleotide if all components of the reaction are intact.
  • Components of an amplification reaction include, e.g., primers, a polynucleotide template, polymerase, nucleotides, and the like.
  • the term “amplifying” typically refers to an exponential increase in target nucleic acid.
  • amplifying as used herein can also refer to linear increases in the numbers of a select target sequence of nucleic acid, such as is obtained with cycle sequencing.
  • Amplification reaction mixture refers to an aqueous solution comprising the various reagents used to amplify a target nucleic acid. These include enzymes, aqueous buffers, salts, amplification primers, target nucleic acid, and nucleoside triphosphates.
  • the mixture can be either a complete or incomplete
  • DNA sequence means a contiguous nucleic acid sequence.
  • the sequence can range from an oligonucleotide of 2 to 20 nucleotides in length to a full length genomic sequence of thousands or hundreds of thousands of base pairs.
  • Domain refers to a unit of a protein or protein complex, comprising a polypeptide subsequence, a complete polypeptide sequence, or a plurality of polypeptide sequences where that unit has a defined function.
  • the function is understood to be broadly defined and can be ligand binding, catalytic activity or can have a stabilizing effect on the structure of the protein.
  • DNA binding domain refers to nucleic acid and both full-length polypeptides and fragments of the polypeptides that have sequence nonspecific double-stranded DNA binding activity.
  • expression system means any in vivo or in vitro biological system that is used to produce one or more gene products encoded by a polynucleotide.
  • Enhances in the context of an enzyme refers to improving the activity of the enzyme, i.e., increasing the amount of product per unit enzyme per unit time.
  • Two elements are "heterologous" to one another if not naturally associated.
  • a nucleic acid sequence encoding a protein linked to a heterologous promoter means a promoter other than that which naturally drives expression of the protein.
  • a heterologous nucleic acid flanked by transposon ends or ITRs means a heterologous nucleic acid not naturally flanked by those transposon ends or ITRs, such as a nucleic acid encoding a polypeptide other than a transposase, including an antibody heavy or light chain.
  • a nucleic acid is heterologous to a cell if not normally found in the cell or in a different location (e.g., episomal or different genomic location) than the location naturally present within a cell.
  • the term "host” means any prokaryotic or eukaryotic organism that can be a recipient of a nucleic acid.
  • the terms "host,” “host cell,” “host system” and “expression host” can be used interchangeably.
  • An 'isolated' polypeptide or polynucleotide means a polypeptide or polynucleotide that has been either removed from its natural environment, produced using recombinant techniques, or chemically or enzymatically synthesized.
  • a polypeptide or polynucleotide of this invention is purified, that is, it is essentially free from any other polypeptide or polynucleotide and associated cellular products or other impurities.
  • “identical”, in the context of two nucleic acid or polypeptide sequences, refers to the residues in the two sequences that are the same when aligned for maximum correspondence, as measured using a “sequence comparison algorithm”.
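Once two sequences have been aligned, the percent identity figures used throughout this document (for example at least 90, 95 or 99%) reduce to counting matching columns. The sketch below assumes the alignment has already been produced by some sequence comparison algorithm and is supplied as two equal-length strings with `-` marking gaps. The function name is hypothetical, and the choice of denominator (all alignment columns except those where both sequences have a gap) is an assumption, since conventions differ between tools.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of alignment columns in which both sequences carry the
    same residue. Columns that are gaps in both sequences are ignored;
    a gap opposite a residue counts as a mismatch (assumed convention)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap and y == gap:
            continue  # column carries no information
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared
```

With this convention, two ten-residue sequences that differ at one position are 90% identical, and a single gap in a four-column alignment gives 75%.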
  • nucleoside and nucleotide include those moieties which contain not only the known purine and pyrimidine bases, but also other heterocyclic bases which have been modified. Such modifications include methylated purines or pyrimidines, acylated purines or pyrimidines, or other heterocycles. Modified nucleosides or nucleotides can also include modifications on the sugar moiety, for example, where one or more of the hydroxyl groups are replaced with halogen or aliphatic groups, or are functionalized as ethers, amines, or the like.
  • nucleotidic unit is intended to encompass nucleosides and nucleotides.
  • NGS Next-generation sequencing
  • An “Open Reading Frame” or “ORF” means a portion of a polynucleotide that, when translated into amino acids, contains no stop codons.
  • the genetic code reads DNA sequences in groups of three base pairs, which means that a double-stranded DNA molecule can be read in any of six possible reading frames: three in the forward direction and three in the reverse.
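The six reading frames can be enumerated by taking the three codon offsets on the given strand and the three offsets on its reverse complement. The following is an illustrative sketch with hypothetical function names; it assumes an unambiguous DNA alphabet (A, C, G, T) and the three standard stop codons TAA, TAG and TGA.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumption)


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def six_frames(seq: str) -> dict:
    """Map frame names (+1, +2, +3, -1, -2, -3) to lists of complete codons."""
    frames = {}
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            # Stop two bases before the end so only complete codons are kept.
            codons = [s[i:i + 3] for i in range(offset, len(s) - 2, 3)]
            frames[f"{strand}{offset + 1}"] = codons
    return frames


def stop_free_frames(seq: str) -> list:
    """Names of the frames that contain no stop codon across `seq`."""
    return [
        name
        for name, codons in six_frames(seq).items()
        if not any(c.upper() in STOP_CODONS for c in codons)
    ]
```

For the short sequence `ATGAAAT`, frame +1 reads `ATG AAA`, frame +2 reads `TGA AAT` and therefore contains a stop codon, and the reverse strand is read from `ATTTCAT`.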
  • operably linked refers to functional linkage between two sequences such that one sequence modifies the behavior of the other.
  • a first polynucleotide comprising a nucleic acid expression control sequence (such as a promoter, IRES sequence, enhancer or array of transcription factor binding sites) and a second polynucleotide are operably linked if the first polynucleotide affects transcription and/or translation of the second polynucleotide.
  • a first amino acid sequence comprising a secretion signal or a subcellular localization signal and a second amino acid sequence are operably linked if the first amino acid sequence causes the second amino acid sequence to be secreted or localized to a subcellular location.
  • a “promoter” means a nucleic acid sequence sufficient to direct transcription of an operably linked nucleic acid molecule. Also included in this definition are those transcription control elements (for example, enhancers) that are sufficient to render promoter-dependent gene expression controllable in a cell type-specific, tissue-specific, or temporal-specific manner, or that are inducible by external signals or agents; such elements, which are well-known to skilled artisans, may be found in a 5' or 3' region of a gene or within an intron.
  • a promoter is operably linked to a nucleic acid sequence, for example, a cDNA or a gene sequence, or an effector RNA coding sequence, in such a way as to enable expression of the nucleic acid sequence, or a promoter is provided in an expression cassette into which a selected nucleic acid sequence to be transcribed can be conveniently inserted.
  • Polymerase refers to an enzyme that performs template-directed synthesis of polynucleotides. The term encompasses both the full-length polypeptide and a domain that has polymerase activity.
  • Processivity refers to the ability of a polymerase to remain bound to the template or substrate and perform DNA synthesis. Processivity is measured by the number of catalytic events that take place per binding event.
  • Polymerase chain reaction (PCR) refers to a method whereby a specific segment or subsequence of a target double-stranded DNA is amplified in a geometric progression.
  • PCR is well known to those of skill in the art; see, e.g., U.S. Pat. Nos. 4,683,195 and 4,683,202; and PCR Protocols: A Guide to Methods and Applications, Innis et al., eds, 1990.
  • Exemplary PCR reaction conditions typically comprise either two or three step cycles. Two step cycles have a denaturation step followed by a hybridization/elongation step. Three step cycles comprise a denaturation step followed by a hybridization step followed by a separate elongation step.
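The "geometric progression" of PCR can be made concrete with a small calculation: if every template is copied once per cycle, the copy number doubles each cycle. The sketch below is illustrative only. The function name and the per-cycle `efficiency` parameter are assumptions that do not appear in the text; an efficiency of 1.0 corresponds to ideal doubling, and real reactions plateau in later cycles, which this model ignores.

```python
def copies_after_cycles(
    initial_copies: float, cycles: int, efficiency: float = 1.0
) -> float:
    """Idealized copy number after `cycles` rounds of amplification.

    Each cycle multiplies the copy number by (1 + efficiency), so
    efficiency=1.0 gives perfect doubling and efficiency=0.0 gives
    no amplification at all.
    """
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles
```

Under ideal doubling, a single template yields 2^30 (about 1.07 billion) copies after 30 cycles, and running 35 cycles instead of 30 gives 2^5 = 32 times more product.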
  • Long PCR refers to the amplification of a DNA fragment of 5 kb or longer in length. Long PCR is typically performed using specially-adapted polymerases or polymerase mixtures (see, e.g., U.S. Pat. Nos. 5,436,149 and 5,512,462) that are distinct from the polymerases conventionally used to amplify shorter products.
  • a "primer” refers to a polynucleotide sequence that hybridizes to a sequence on a target nucleic acid and serves as a point of initiation of nucleic acid synthesis.
  • Primers can be of a variety of lengths and are often less than 50 nucleotides in length, for example 12-30 nucleotides in length.
  • the length and sequences of primers for use in PCR can be designed based on principles known to those of skill in the art, see, e.g., Innis et al., supra.
  • polymerase primer/template binding specificity refers to the ability of a polymerase to discriminate between correctly matched primer/templates and mismatched primer/templates.
  • An “increase in polymerase primer/template binding specificity” in this context refers to an increased ability of a variant modified fusion polymerase of the invention to discriminate between matched and mismatched primer/templates in comparison to a wildtype polymerase fusion protein.
  • An "improved polymerase” includes both, a modified polymerase and/or sequence-non-specific double-stranded DNA binding domain joined to the polymerase or polymerase domain.
  • a “modified DNA polymerase” or modified DNA polymerase variant or DNA polymerase variant refers to a DNA polymerase comprising one or more mutations that modulate one or more activities of the DNA polymerase including, but not limited to, DNA polymerization activity, base analog detection activities, 3'-5' or 5'-3' exonuclease activities, processivity, improved nucleotide analog incorporation activity, proofreading, fidelity, efficiency, specificity, thermostability and intrinsic hot start capability or decreased DNA polymerization at room temperature, decreased amplification slippage on templates with trinucleotide repeat stretches or homopolymeric stretches, decreased amplification cycles, decreased extension times, reduced sensitivity to inhibitors (e.g., high salt, nucleic acid purification reagents), altered optimal reaction conditions (e.g., pH, KCl) and a decrease in the amount of polymerase needed for the applications described.
  • PCR "sensitivity" refers to the ability to amplify a target nucleic acid that is present in low concentration.
  • Low concentration refers to 10^4, often 10^3;
  • The terms "polypeptide", "peptide" and "protein" apply to amino acid polymers in which one or more amino acid residues are an artificial chemical mimetic of a corresponding naturally occurring amino acid, as well as to naturally occurring amino acid polymers and non-naturally occurring amino acid polymers.
  • selectable marker means a polynucleotide segment that allows one to select for or against a molecule or a cell that contains it, often under particular conditions. These markers can encode an activity, such as, but not limited to, production of RNA, peptide, or protein, or can provide a binding site for RNA, peptides, proteins, inorganic and organic compounds or compositions.
  • Selectable markers include but are not limited to: (1) DNA segments that encode products which provide resistance against otherwise toxic compounds (e.g., antibiotics); (2) DNA segments that encode products which are otherwise lacking in the recipient cell (e.g., tRNA genes, auxotrophic markers); (3) DNA segments that encode products which suppress the activity of a gene product; (4) DNA segments that encode products which can be readily identified (e.g., phenotypic markers such as beta-galactosidase, green fluorescent protein (GFP), and cell surface proteins); (5) DNA segments that bind products which are otherwise detrimental to cell survival and/or function; (6) DNA segments that otherwise inhibit the activity of any of the DNA segments described in Nos.
  • DNA segments that bind products that modify a substrate (e.g., restriction endonucleases)
  • DNA segments that can be used to isolate a desired molecule (e.g., specific protein binding sites)
  • DNA segments that encode a specific nucleotide sequence which can be otherwise non-functional (e.g., for PCR amplification of subpopulations of molecules)
  • DNA segments which, when absent, directly or indirectly confer sensitivity to particular compounds (e.g., antisense oligonucleotides)
  • Sequence identity can be determined by aligning sequences using algorithms such as BESTFIT, FASTA, and TFASTA in the Wisconsin Genetics Software Package Release 7.0 (Genetics Computer Group, 575 Science Dr., Madison, Wis.), using default gap parameters, or by inspection, and selecting the best alignment (i.e., the one resulting in the highest percentage of sequence similarity over a comparison window).
  • Percentage of sequence identity is calculated by comparing two optimally aligned sequences over a window of comparison, determining the number of positions at which an identical residue occurs in both sequences to yield the number of matched positions, dividing the number of matched positions by the total number of matched and mismatched positions, not counting gaps, in the window of comparison (i.e., the window size), and multiplying the result by 100 to yield the percentage of sequence identity.
  • the window of comparison between two sequences is defined by the entire length of the shorter of the two sequences.
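The percentage calculation described above can be sketched as follows. This is a minimal illustration, assuming the two sequences have already been aligned to equal length with `-` as the gap character; the function name `percent_identity` and the example sequences are hypothetical and do not come from the source.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns, skipping any column that contains a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matched = 0
    mismatched = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # gapped columns are not counted in the window size
        if a == b:
            matched += 1
        else:
            mismatched += 1
    total = matched + mismatched
    if total == 0:
        raise ValueError("no ungapped aligned positions to compare")
    return 100.0 * matched / total


# Hypothetical aligned pair: 4 matches, 1 mismatch, 1 gapped column.
print(percent_identity("MKV-LA", "MRVTLA"))
```

Because gapped columns are dropped from the denominator, the result depends on the alignment supplied, which is why the text specifies an optimal alignment first.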
  • translation refers to the process by which a polypeptide is synthesized by a ribosome 'reading' the sequence of a polynucleotide.
  • Thermally stable polymerase refers to any enzyme that catalyzes polynucleotide synthesis by addition of nucleotide units to a nucleotide chain using DNA or RNA as a template and has an optimal activity at a temperature above 45°C, or retains at least 50% of its activity at elevated temperatures, for example above 95 °C.
  • thermostable refers to an enzyme which is stable to heat, is heat resistant, and functions at high temperatures, e.g., 50 to 100°C as compared, for example, to a non-thermostable form of an enzyme with a similar activity.
  • thermostable nucleic acid polymerases derived from thermophilic organisms such as P. furiosus, M. jannaschii, A. fulgidus or P. horikoshii are more stable and active at elevated temperatures than a nucleic acid polymerase from E. coli.
  • a representative thermostable nucleic acid polymerase isolated from P. furiosus (Pfu) is described in Lundberg et al., 1991, Gene, 108:1-6.
  • Additional representative temperature stable polymerases include, e.g., polymerases extracted from the thermophilic bacteria Thermus flavus, Thermus ruber, Thermus thermophilus, Bacillus stearothermophilus (which has a somewhat lower temperature optimum than the others listed), Thermus lacteus, Thermus rubens, Thermotoga maritima, or from the thermophilic archaea Thermococcus litoralis and Methanothermus fervidus.
  • a "temperature profile" refers to the temperature and lengths of time of the denaturation, annealing and/or extension steps of a PCR or cycle sequencing reaction.
  • a temperature profile for a PCR or cycle sequencing reaction typically consists of 10 to 60 repetitions of similar or identical shorter temperature profiles; each of these shorter profiles may typically define a two-step or three-step cycle. Selection of a temperature profile is based on various considerations known to those of skill in the art, see, e.g., Innis et al., supra.
  • the extension time required to obtain an amplification product of 5 kb or greater in length is reduced compared to conventional polymerase mixtures.
  • a "template" refers to a double stranded polynucleotide sequence that comprises the polynucleotide to be amplified, flanked by primer hybridization sites.
  • a "target template" comprises the target polynucleotide sequence flanked by hybridization sites for a 5' primer and a 3' primer.
  • "vector", "DNA vector" or "gene transfer vector" refers to a polynucleotide that is used to perform a "carrying" function for another polynucleotide.
  • vectors are often used to allow a polynucleotide to be propagated within a living cell, or to allow a polynucleotide to be packaged for delivery into a cell, or to allow a polynucleotide to be integrated into the genomic DNA of a cell.
  • a vector may further comprise additional functional elements, for example it may comprise a transposon.
  • amino acids are grouped as follows: Group I (hydrophobic side chains): met, ala, val, leu, ile; Group II (neutral hydrophilic side chains): cys, ser, thr; Group III (acidic side chains): asp, glu; Group IV (basic side chains): asn, gln, his, lys, arg; Group V (residues influencing chain orientation): gly, pro; and Group VI (aromatic side chains): trp, tyr, phe.
  • Conservative substitutions involve substitutions between amino acids in the same class.
  • Non-conservative substitutions constitute exchanging a member of one of these classes for a member of another.
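As a sketch of how the grouping above classifies a substitution, the snippet below encodes the six groups as listed and checks whether two residues fall in the same one. The standard three-letter codes `gln` and `trp` are used; the names `GROUPS`, `GROUP_OF` and `is_conservative` are hypothetical helpers introduced only for illustration.

```python
# Six side-chain groups as listed in the text (three-letter residue codes).
GROUPS = {
    "I": {"met", "ala", "val", "leu", "ile"},    # hydrophobic
    "II": {"cys", "ser", "thr"},                  # neutral hydrophilic
    "III": {"asp", "glu"},                        # acidic
    "IV": {"asn", "gln", "his", "lys", "arg"},    # basic, as grouped in the text
    "V": {"gly", "pro"},                          # influence chain orientation
    "VI": {"trp", "tyr", "phe"},                  # aromatic
}

# Reverse lookup: residue -> group label.
GROUP_OF = {aa: label for label, members in GROUPS.items() for aa in members}


def is_conservative(original: str, replacement: str) -> bool:
    """True when both residues belong to the same group, False otherwise."""
    return GROUP_OF[original.lower()] == GROUP_OF[replacement.lower()]


print(is_conservative("Leu", "Ile"))  # both in Group I
print(is_conservative("Asp", "Gly"))  # Group III versus Group V
```

Under this grouping a leu-to-ile change is conservative, while an asp-to-gly change is not.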
  • Positions in a variant sequence are assigned the same numbers as the aligned positions in corresponding reference sequence.
  • One class of methods is alignment-based, and involves aligning a set of sequences, including naturally occurring sequences. This alignment may be used to derive a phylogenetic relationship between the sequences. This relationship can also be used to calculate conservation properties for each amino acid at each position in the protein.
  • a second class of methods is structural, in which substitutions are tested computationally for their effect upon a known or calculated protein structure. All these methods yield quantitative measures of the predicted favorability of replacing an amino acid with a different amino acid. The favorabilities predicted by one method may be different from the favorabilities predicted by a different method, so it is often desirable to combine the results from different methods.
  • amino acid residue positions in a reference sequence are compared with the same position in an alignment of homologous sequences. Positions that exhibit a high degree of variance in homologs may have a high probability that substitutions at such positions will be active.
  • One method of calculating the degree of amino acid variance is described by Gribskov (1987) Proc Natl Acad Sci USA 84, 4355.
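To make the idea of per-position variance concrete, the sketch below scores each alignment column by Shannon entropy. This is a commonly used stand-in and is not the Gribskov method cited above; the function names and the toy alignment are assumptions made for illustration.

```python
from collections import Counter
from math import log2


def column_entropy(column, gap: str = "-") -> float:
    """Shannon entropy (bits) of the residues in one alignment column, ignoring gaps."""
    residues = [r for r in column if r != gap]
    if not residues:
        return 0.0
    n = len(residues)
    counts = Counter(residues)
    # abs() guards against the negative zero returned for a fully conserved column.
    return abs(-sum((c / n) * log2(c / n) for c in counts.values()))


def site_entropies(alignment):
    """Entropy for every column of an equal-length multiple sequence alignment."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    return [column_entropy(col) for col in zip(*alignment)]


# Hypothetical 4-sequence alignment: column 0 is invariant, column 2 is most variable.
print(site_entropies(["MKL", "MRL", "MKI", "MRV"]))
```

Columns with higher entropy vary more among homologs and, following the reasoning in the text, would be treated as more likely to tolerate substitution.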
  • a sequence alignment can serve as the basis of a Hidden Markov model that can be used to calculate the probability that one specific residue will be followed by a second specific residue. These models also include probabilities for gaps and insertions.
  • substitutions may be identified based upon the consensus sequence of the alignment.
  • homologous sequences are often analogous functionally and structurally, although having been subjected separately to different selective pressures they are also likely to be optimized differently. Amino acids that differ between homologous sequences thus provide a guide to substitutions that are likely to yield functional though different proteins. Alignment of homologous sequences can therefore be used to identify candidate substitution positions.
  • homologous protein sequences may be aligned (e.g., by using clustalw; Thompson et al. (1994) Nucleic Acids Res 22: 4673-80) and then a phylogenetic tree reconstructed.
  • Scores for a given alignment can also be normalized to have an average value of 0.0 and a standard deviation of 1.0, or other standard procedures can be used to compare and combine scores from multiple methods. These values can then be used directly as a score. For example, all sites with a score above a certain threshold value can be selected, or all sites with a score below a certain threshold value can be eliminated. Alternatively, the most variable (e.g., least conserved) sites can be selected by ranking the sites in order of these scores, or the least variable (e.g., most conserved) sites can be eliminated by ranking the sites in order of these scores.
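The normalization and threshold selection described above might look like the following sketch. It assumes per-site scores are held in a dictionary keyed by site number and uses the population standard deviation; the names `normalize` and `select_sites` and the example scores are hypothetical.

```python
from statistics import mean, pstdev


def normalize(scores: dict) -> dict:
    """Rescale scores to mean 0.0 and (population) standard deviation 1.0."""
    mu = mean(scores.values())
    sigma = pstdev(scores.values())
    if sigma == 0:
        raise ValueError("all scores are identical; cannot normalize")
    return {site: (s - mu) / sigma for site, s in scores.items()}


def select_sites(scores: dict, threshold: float) -> list:
    """Sites whose normalized score is strictly above the threshold, in site order."""
    return sorted(site for site, z in normalize(scores).items() if z > threshold)


def rank_sites(scores: dict) -> list:
    """Sites ordered from highest to lowest normalized score."""
    z = normalize(scores)
    return sorted(z, key=lambda site: z[site], reverse=True)


raw = {10: 2.0, 42: 4.0, 77: 6.0}  # made-up raw scores for three sites
print(select_sites(raw, 0.0))
print(rank_sites(raw))
```

Normalizing first makes a single threshold meaningful across scores that come from different methods and scales.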
  • Amino acid diversity and tolerance at each site can be measured as a fitness property of each amino acid at every location. The most fit residue for that position carries a higher value (e.g., Koshi et al. (2001) Pac Symp Biocomput 1 1-202; O. Soyer, .W.
  • Sites can be grouped into site-classes or treated independently. Sites and site classes most fit to change based on the substitution rate and the substitutions most favorable based on the fitness can be selected. These values of fitness may then be used directly as a score. All sites with a score above a certain threshold value could then be selected, for example, a cutoff (threshold) of 0.0 can be chosen (when the normalization of scores sets the wild type residue found in the reference to be 0.0). Alternatively, all sites with a score below a certain threshold value could be eliminated.
  • Sites with scores of 0.0 or below can be eliminated, thereby only including amino acid changes that have a higher fitness value than the reference wild type amino acid found in that position.
  • the sites most tolerant to change could be selected by ranking the sites in order of these scores. For example, in the study of G-protein coupled receptors (GPCR) by Soyer et al. (O. Soyer, M.W. Dimmic, R.R. Neubig, and R.A. Goldstein; Pacific Symposium on Biocomputing 7:625-636 (2002)), using the 8-site class model, class #8 was identified to have the highest substitution rate, and the property correlating with fitness of amino acids at these positions was identified to be the "charge transfer" propensity of the amino acid. In the present invention, amino acids in the sites that carry a higher relative fitness compared to the wild type amino acid found in that position are identified as suitable for substitution.
  • a substitution matrix represents the probability of one amino acid being replaced by a second amino acid across a set of positions within a set of proteins.
  • the matrix can be expressed in terms of probabilities or values derived from probabilities by mathematical transformation involving probabilities of transitions or substitutions (Pij) and observed frequencies of amino acids (Fi). Matrices using such transformation include scoring matrices like PAM100, PAM250, and BLOSUM etc.
  • Substitution matrices are derived from pairwise alignments of protein homologs from sequence databases. They constitute estimates of the probability that one amino acid will be changed to another while conserving function. Different substitution matrices are calculated from different sets of sequences.
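One common transformation of this kind is a log-odds score, sketched below. The text does not specify the exact transformation, so the form used here, the base-2 logarithm of the observed pair probability over the product of the two background frequencies, is an assumption, as are the function name and the example numbers.

```python
from math import log2


def log_odds(p_ij: float, f_i: float, f_j: float) -> float:
    """Log-odds score (bits): observed pair probability versus chance expectation.

    p_ij : probability of observing residues i and j aligned (assumed joint probability)
    f_i, f_j : background frequencies of residues i and j
    """
    if min(p_ij, f_i, f_j) <= 0:
        raise ValueError("probabilities and frequencies must be positive")
    return log2(p_ij / (f_i * f_j))


# Made-up numbers: the pair is seen twice as often as expected by chance.
print(log_odds(0.02, 0.1, 0.1))
```

A positive score means the pair is aligned more often than chance predicts, and a negative score means less often.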
  • substitution matrices are calculated by selecting the protein families used to create the matrix, as well as the positions considered.
  • a substitution matrix that best captures the observed sequences in the protein family of interest can be calculated using the Bayesian method developed by Goldstein et al. (Koshi et al. (1995) Protein Eng 8: 641-645) and used to score all candidate substitutions.
  • a substitution or scoring matrix calculated by considering homologous proteins from many different protein families (e.g., Benner et al. (1994) Protein Eng 7: 1323-1332; and Tomii et al. (1996) Protein Eng 9: 27-36) can be used to score all candidate substitutions.
  • Matrices derived from a variety of proteins are often used to evaluate and confirm homology of protein sequences and represent an approximation of protein evolution in general.
  • Another example of an alignment-based model is the reconstruction of ancestral sequences. Evolutionary relationships between homologous sequences can be derived in the form of phylogenetic trees. Using evolutionary models, ancestral sequences can be probabilistically reconstructed. See, for example, Koshi and Goldstein (1996) J. Mol. Evol. 42, 313-320. Coupled with knowledge of functions of proteins, evolutionary analysis will also identify amino acid changes that occur in functionally distinct groups. See, for example, Zhang and Rosenberg (2002) PNAS 99, 5486-5491. Comparison of the rates of synonymous (Ks) versus non-synonymous (Ka) substitutions can also be used to quantify (e.g., using the Ka/Ks ratio) the type and degree of evolutionary constraint on substitutions.
  • the codon for this residue in the gene for the protein will tend to encode the same amino acid throughout the phylogenetic tree (synonymous substitutions, high Ks).
  • alterations of the corresponding codon in the gene will more frequently encode different amino acids throughout the phylogenetic tree (non-synonymous substitutions, Ka comparable with Ks).
  • the ratio of frequency with which a site is replaced by a synonymous codon to the frequency with which it is replaced by a non-synonymous codon in a reconstructed phylogenetic tree provides a measure of the selective pressure (on the function of the protein) acting to conserve the identity of the amino acid at that position. Often these ratios are calculated as averages for entire sequences. However, such ratios can also be limited to specific sites or groups of positions. These ratios can also be used to weight substitutions identified by other methods from a specific homolog.
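The synonymous to non-synonymous comparison can be illustrated with a simple count over codon changes. This sketch assumes the standard genetic code and a list of (ancestral codon, descendant codon) pairs already extracted from a reconstructed tree; it reports raw counts and does not perform the per-site normalization used in formal Ka and Ks estimates. All function names and the example codons are illustrative assumptions.

```python
# Standard genetic code, codons enumerated in T, C, A, G order at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def count_changes(codon_changes):
    """Return (synonymous, non_synonymous) counts for (before, after) codon pairs."""
    syn = 0
    nonsyn = 0
    for before, after in codon_changes:
        if before == after:
            continue  # no change on this branch
        if CODON_TABLE[before] == CODON_TABLE[after]:
            syn += 1
        else:
            nonsyn += 1
    return syn, nonsyn


def syn_to_nonsyn_ratio(codon_changes) -> float:
    """Ratio of synonymous to non-synonymous changes; infinite if none are non-synonymous."""
    syn, nonsyn = count_changes(codon_changes)
    return float("inf") if nonsyn == 0 else syn / nonsyn


# Hypothetical changes at one site: two silent (Leu to Leu), one Leu to Ile.
changes = [("CTT", "CTC"), ("CTT", "CTG"), ("CTT", "ATT")]
print(count_changes(changes), syn_to_nonsyn_ratio(changes))
```

A high ratio at a site suggests selection is preserving the encoded amino acid there.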
  • Another example of an alignment-based method for identifying amino acid substitutions that are most important in differentiating protein function is a dimension-reducing technique such as principal component analysis (PCA). This has been previously described (e.g., Casari et al, 1995, Nat Struct Biol 2: 171-178; Gogos et al, 2000, Proteins 40: 98-105; and del Sol Mesa et al, 2003, J Mol Biol 326: 1289-1302).
  • PCA can identify sequence features and substitutions corresponding to the desired phenotype of the protein, and the scores ("loads") for these features in the direction of the desired phenotype are used as absolute scores or as filters to identify substitutions.
  • candidate amino acid changes can be computationally modeled in the structure(s) and changes in the free energy computed. These computationally calculated changes in free energies resulting from the substitutions can then be used directly as a score.
  • all changes can be selected that increase the free energy of the protein by less than a certain value. For example, all changes that would increase the free energy by less than 1 kCal/mol can be selected, all changes that would increase the free energy by less than 1.5 kCal/mol can be selected, all changes that would increase the free energy by less than 2 kCal/mol can be selected, or all changes that would increase the free energy by less than 2.5 kCal/mol can be selected.
  • regions of the protein that differ structurally between homologs are considered more likely to tolerate change, while those regions that are structurally conserved are likely to be less tolerant.
  • Structures can be directly obtained from the database or predicted using various structure modeling software packages. Structures of homologs and mutants can be superposed on the wild type structure. See, for example, May et al. (1994) Protein Eng 7: 475-85; and Ochagavia et al. (2002) Bioinformatics 18: 637-40. Structural conservation can be calculated as the root mean squared (RMS) deviations of the backbones of the superposed chains.
  • These computationally calculated RMS deviations for every position between homologous structures can then be used directly as a score.
  • RMS deviations between the alpha carbons (or backbone atoms) in the structure of the target protein and one or more homologous proteins that are greater than a threshold value can be considered structurally labile and these sites can be selected.
  • This threshold RMS deviation between homologous structures can be greater than 2 Å, 2.5 Å, 3 Å, 3.5 Å, 4 Å, 4.5 Å, or 5 Å.
  • RMS deviations between the alpha carbons in the structure of the target protein and one or more homologous proteins that are less than a threshold value can be considered structurally conserved and these sites can be eliminated.
  • This threshold RMS deviation between homologous structures can be less than 2 Å, 2.5 Å, 3 Å, 3.5 Å, 4 Å, 4.5 Å, or 5 Å.
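A per-position RMS deviation and the threshold split described above can be sketched as follows. The sketch assumes the homolog structures have already been superposed on the target and that alpha-carbon coordinates (in Å) are matched position by position; the function names and coordinates are hypothetical.

```python
from math import sqrt


def per_site_rms(target, homologs):
    """RMS deviation (Å) at each position between the target and superposed homologs.

    target   : list of (x, y, z) alpha-carbon coordinates
    homologs : list of coordinate lists, each position-matched to the target
    """
    rms = []
    for i, (x, y, z) in enumerate(target):
        squared = [
            (h[i][0] - x) ** 2 + (h[i][1] - y) ** 2 + (h[i][2] - z) ** 2
            for h in homologs
        ]
        rms.append(sqrt(sum(squared) / len(squared)))
    return rms


def split_sites(target, homologs, threshold: float = 2.0):
    """Return (labile, conserved) position indices relative to an RMS threshold in Å."""
    deviations = per_site_rms(target, homologs)
    labile = [i for i, d in enumerate(deviations) if d > threshold]
    conserved = [i for i, d in enumerate(deviations) if d < threshold]
    return labile, conserved


# Made-up coordinates: position 0 moves between homologs, position 1 does not.
target = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
homologs = [
    [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0)],
    [(0.0, 3.0, 0.0), (1.0, 0.0, 0.0)],
]
print(per_site_rms(target, homologs))
print(split_sites(target, homologs, threshold=2.0))
```

Positions above the threshold would be kept as candidate sites, and positions below it would be set aside as structurally conserved.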
  • changes near catalytic and binding sites are highly likely to influence the activity of the protein and are good candidates for substitution.
  • All amino acid substitutions that are found in one or more homologs can be tested for proximity to a binding or catalytic or regulatory site of the protein. The distance between an amino acid substitution from a binding or catalytic or regulatory site, in one or more homologs, can be used directly as a score. Alternatively, all amino acid substitutions that are found in one or more homologs and that are within a threshold distance of a binding or catalytic or regulatory site in the protein can be selected.
  • This threshold distance can be less than 2 Å, 2.5 Å, 3 Å, 3.5 Å, 4 Å, 4.5 Å, 5 Å, 5.5 Å, 6 Å, 6.5 Å, or 7 Å.
  • all amino acid substitutions that are found in one or more homologs and that are beyond a threshold distance of a binding or catalytic or regulatory site in the protein can be eliminated.
  • This threshold distance can be more than 2 Å, 2.5 Å, 3 Å, 3.5 Å, 4 Å, 4.5 Å, 5 Å, 5.5 Å, 6 Å, 6.5 Å, or 7 Å.
  • all amino acid substitutions that are found in one or more homologs can be ranked in order of proximity to a binding or catalytic or regulatory site in the protein and those that are closest to the binding or catalytic or regulatory site can be selected.
  • the substitution closest to the binding or catalytic or regulatory site can be selected, or between 2 and 20, between 10 and 100, or the top 200 substitutions closest to the binding or catalytic or regulatory site can be selected.
  • all amino acid substitutions that are found in one or more homologs can be ranked in order of proximity to a binding or catalytic or regulatory site in the protein and those that are farthest from the binding or catalytic or regulatory site eliminated.
  • the substitution farthest from the binding or catalytic or regulatory site can be eliminated.
  • between 2 and 20, between 10 and 100, or the top 200 substitutions farthest from the binding or catalytic or regulatory site can be eliminated.
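Ranking and filtering substitutions by distance to a functional site could be sketched as below. The sketch assumes each substitution is represented by a single coordinate (for example its alpha carbon) and the site by a list of atom coordinates, all in Å; the substitution labels, coordinates and function names are invented for illustration.

```python
from math import dist


def rank_by_proximity(substitutions: dict, site_atoms: list) -> list:
    """Return (name, distance) pairs sorted from closest to farthest from the site.

    Distance is the minimum distance from the substitution to any site atom.
    """
    scored = [
        (name, min(dist(coord, atom) for atom in site_atoms))
        for name, coord in substitutions.items()
    ]
    return sorted(scored, key=lambda pair: pair[1])


def within_threshold(substitutions: dict, site_atoms: list, threshold: float) -> list:
    """Names of substitutions closer to the site than the threshold distance."""
    return [n for n, d in rank_by_proximity(substitutions, site_atoms) if d < threshold]


def closest(substitutions: dict, site_atoms: list, top_n: int) -> list:
    """Names of the top_n substitutions nearest the site."""
    return [n for n, _ in rank_by_proximity(substitutions, site_atoms)[:top_n]]


# Hypothetical substitutions and a one-atom site at the origin.
subs = {"A10V": (0.0, 0.0, 3.0), "K20R": (0.0, 0.0, 10.0), "S30T": (4.0, 0.0, 0.0)}
site = [(0.0, 0.0, 0.0)]
print(rank_by_proximity(subs, site))
print(within_threshold(subs, site, 5.0))
print(closest(subs, site, 1))
```

The same ranking supports both uses in the text: keeping the nearest substitutions or discarding the farthest ones.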
  • infologs are designed variants of a given gene, for example a polymerase, where substitutions are systematically incorporated to achieve high information content, enabling modern machine learning tools to deconvolute sequence-activity relationships.
  • infologs are the basis of our rational approach to protein engineering, in which a matrix of well-defined amino acid substitutions is used to map the targeted fitness landscape.
  • libraries of ~100 mutants (typically 96) are characterized for properties/activities of interest and serve as a basis for machine-learning tools to design the next generation of infologs.
  • the initial set of infologs are designed to have the same number of substitutions (approximately 3), thereby probing regions at the same Hamming distance from the reference locus in sequence space.
  • Substitutions in the mutants are selected from a pool of substitutions and each set of infologs contains several variants with the same substitution albeit in presence of two completely different mutations, thereby providing us with the ability to characterize the amino acid change with respect to its additivity and context dependence.
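The design goals stated above (a fixed number of substitutions per variant, even use of each substitution, and varied pairings) can be illustrated with a simple greedy heuristic. This is not the authors' design algorithm, which the text does not disclose; it is a sketch under the assumption that every substitution in the pool is at a different position, and the pool labels are placeholders.

```python
from collections import Counter
from itertools import combinations


def design_variants(pool, n_variants: int, k: int = 3):
    """Greedily build n_variants sets of k substitutions drawn from pool.

    Each pick prefers the least-used substitution so far and, among ties,
    the one that has been paired least often with the picks already made.
    """
    use = Counter({s: 0 for s in pool})
    pairs = Counter()
    variants = []
    for _ in range(n_variants):
        chosen = []
        while len(chosen) < k:
            candidates = [s for s in pool if s not in chosen]
            best = min(
                candidates,
                key=lambda s: (use[s], sum(pairs[frozenset((s, c))] for c in chosen)),
            )
            chosen.append(best)
        for s in chosen:
            use[s] += 1
        for a, b in combinations(chosen, 2):
            pairs[frozenset((a, b))] += 1
        variants.append(tuple(chosen))
    return variants


pool = ["sub_A", "sub_B", "sub_C", "sub_D", "sub_E", "sub_F"]  # placeholder labels
for variant in design_variants(pool, n_variants=4, k=3):
    print(variant)
```

Every variant produced this way carries exactly k substitutions, so all variants sit at the same Hamming distance k from the reference sequence.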
  • the functional consequences of individual substitutions can be modeled and quantitatively evaluated.
  • an infolog library based on 59 amino-acid substitutions in a tau class glutathione transferase (GST) from wheat afforded increased activity against most of a number of herbicides tested (Govindarajan et al, 2015).
  • SEQ ID NO: 128971 was used to identify homologs from the GenBank database of non-redundant protein sequences using the BLAST program. The list of homologs used for identifying substitutions may not be limited to these homologs.
  • the multiple sequence alignment of all homologs and SEQ ID NO: 1 was obtained using the clustalw program and a phylogenetic tree was constructed. The alignment was used to enumerate possible changes that can be made to SEQ ID NO: 1 that are seen in the alignments.
  • DNA polymerase substitutions (relative to SEQ ID NO: 1) identified using combinations of the methods described above are shown in Tables 1 and 2.
  • DNA polymerase variants were synthesized to incorporate the amino acid substitutions shown in Table 3. Specific activities of the synthesized polymerase variants were determined experimentally. The specific activities were individually modeled as a function of the substitutions by linear regression. The results provide relative weights of each substitution for activity and other properties such as thermostability and amplification of >3 kb target DNA (Table 5). Mutations that showed positive weights for selected activity profiles were used in different combinations with other substitutions to construct a library of polymerase variants. Sets of variants were designed to incorporate selected substitutions a uniform number of times. The substitutions were also distributed within a variant set so that the number of unique pairs of substitutions is high, which ensures that different substitutions are tested in a wide variety of contexts. The specific combinations of substitutions in the 3 variant sets (relative to SEQ ID NO: 1) are shown in Table 4. Amino acid sequences are given as SEQ ID NOS: 3-242.
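Modeling activity as a function of substitutions by linear regression can be sketched with an additive model: activity equals an intercept plus one weight per substitution present. The implementation below solves ordinary least squares through the normal equations using only the standard library. The substitution labels and activity values are made up for illustration and are not data from the tables.

```python
def fit_substitution_weights(variants, activities, substitutions):
    """Ordinary least squares fit of activity ~ intercept + sum of substitution weights.

    variants      : list of sets of substitution labels, one set per variant
    activities    : measured activity for each variant, in the same order
    substitutions : ordered list of labels to fit weights for
    Returns (intercept, {label: weight}).
    """
    # Design matrix: leading 1 for the intercept, then 0/1 presence indicators.
    X = [[1.0] + [1.0 if s in v else 0.0 for s in substitutions] for v in variants]
    n = len(substitutions) + 1
    # Normal equations (X^T X) w = X^T y, stored as an augmented matrix.
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(n)]
        + [sum(row[i] * y for row, y in zip(X, activities))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("design is rank-deficient; more varied variants are needed")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    w = [A[i][n] / A[i][i] for i in range(n)]
    return w[0], dict(zip(substitutions, w[1:]))


# Made-up example: four variants built from two placeholder substitutions.
labels = ["sub_A", "sub_B"]
variants = [set(), {"sub_A"}, {"sub_B"}, {"sub_A", "sub_B"}]
activities = [1.0, 1.5, 0.8, 1.3]
print(fit_substitution_weights(variants, activities, labels))
```

In this toy data the fitted weight is positive for `sub_A` and negative for `sub_B`, so only `sub_A` would be carried forward under the positive-weight rule described in the text.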
  • a nucleic acid encoding a non-natural polymerase wherein the polymerase has at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.5% sequence identity with SEQ ID NO: 190 retaining the combination of substitutions specified in Table 4 present in SEQ ID NO: 190 or any subset thereof.
  • modified polymerases other than natural sequences, comprising one or more, two or more, three or more, four or more, five or more, six or more, seven or more, eight or more, nine or more, ten or more of the substitutions shown in Table 1, Table 2 or Table 3 may also be used.
  • the invention thus provides non-naturally occurring polymerases having the sequence of SEQ ID NO:1 modified by one or more of the substitutions shown in Table 3 and having zero to ten (e.g., 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10) substitutions, deletions or internal substitutions at positions other than those shown in Table 3.
  • the polymerases encoded by such polynucleotides can have at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or up to all of the substitutions present in Table 3.
  • any modifications at positions other than those shown in Table 3 are conservative substitutions having no significant effect on polymerase activity (i.e., activity indistinguishable from an otherwise identical polymerase without the substitution within experimental error).
  • Non-naturally occurring polymerases encoded by such polynucleotides are also provided.
  • the invention further provides polynucleotides encoding any polymerase having the sequence of any of SEQ ID NOS:3-242, or variants having for example at least 85%, 90%, 95%, 99% or 100% identity therewith, provided that the combination of substitutions specified in Table 4 for the SEQ ID NO. in question is retained.
  • Non-naturally occurring polymerases encoded by such polynucleotides are also provided. Any variations present in a sequence other than the substitutions shown in Table 4 for that sequence are preferably conservative substitutions not significantly affecting activity or substitutions shown in Table 3.
  • the substitutions present in preferred polymerase having the sequence of SEQ ID NO: 159 are M354I, L438I, D468G and A494S.
  • the substitutions present in preferred polymerase having the sequence of SEQ ID NO:190 are Y342I, L438I and D468G.
  • the above mentioned polymerases preferably have an enhanced property relative to a base polymerase having the sequence of SEQ ID NO:1 measured under the same conditions.
  • the enhanced property can be at least one of enhanced polymerase activity, enhanced reverse transcriptase activity, enhanced thermostability, enhanced strand-displacement activity, or any combination thereof, including all of these properties.
  • An activity is considered enhanced if the change is beyond experimental error (in other words, statistically significant).
  • the enhancement, if quantifiable, can be for example at least 10% to 1000%, including for example 10-500%, 10-200%, 10-100%, 20-1000%, 20-500%, 20-100%, and 20-50%.
  • Modified polymerases (e.g., SEQ ID NOS: 3-242) fused to various accessory proteins called processivity factors may also be used.
  • Processivity factors assist polymerases in various ways— some by forming complexes with the polymerase itself (for example, thioredoxin), and some by encircling duplex DNA (for example, clamp protein)— thereby ensuring a strong, stable binding with the template (Zhuang and Ai 2010).
  • a stable association of the polymerase with the template DNA is crucial for the unfettered incorporation of nucleotides, and therefore, for an efficient PCR reaction.
  • One strategy demonstrating enhanced processivity and improved PCR efficiency is the approach developed and patented by Wang in 2000 and published in 2004 (Wang et al.
  • Sso7d is a small protein (7 kD) capable of binding to dsDNA without any preference for specific sequences.
  • the binding of the Sso7d domain to a DNA polymerase is optimal to smoothly slide along the template.
  • the covalent linking of the fusion protein with the polymerase does not entail any structural modifications, therefore the fusion does not interfere with the structural integrity and thermal stability and consequently, with the catalytic activity of the enzyme.
  • Sso7d can be linked to both family A and family B DNA polymerases and can bind with dsDNA at ambient temperature as well as high temperatures. It can therefore help enzymes that are thermostable or otherwise. Studies carried out using the Sso7d fusion polymerase have demonstrated that the fusion protein technique improves processivity without affecting the catalytic activity or thermal stability of the enzyme.
  • aptamers or antibodies fused to the modified polymerase are also used.
  • the aptamer used is attached to the reactive site of the modified polymerase, and thus is inactive at room temperature.
  • When the temperature of the reaction solution is increased to a high temperature, the three-dimensional structure of the aptamer is modified so that it separates from the polymerase, which thereby becomes active; the reverse transcription reaction of a specifically primed target RNA can then be performed, and afterwards PCR can be performed.
  • Other processivity factors comprise sequences encoding polymerase fused to certain protein functional domains.
  • protein functional domains can include, but are not limited to, one or more DNA binding domains, one or more nuclear localization signals, one or more flexible hinge regions that can facilitate one or more domain fusions, and combinations thereof. Fusions can be made either to the N-terminus, C-terminus, or internal regions of the polymerase protein so long as polymerase activity is retained.
  • DNA binding domains used can include, but are not limited to, a helix-turn-helix domain, a Zn-finger domain, a leucine zipper domain, or a helix-loop-helix domain.
  • Specific DNA binding domains used can include, but are not limited to, a Gal4 DNA binding domain, a LexA DNA binding domain, or a Zif268 DNA binding domain.
  • Nuclear localization signals (NLS) used can include, but are not limited to, consensus NLS sequences, viral NLS sequences, cellular NLS sequences, and combinations thereof.
  • Flexible hinge regions used can include, but are not limited to, glycine/serine linkers and variants thereof.
  • a polymerase other than a naturally occurring polymerase, whose sequence comprises a polypeptide with at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.5% identity to SEQ ID NO: 190 provided the combination of substitutions shown in Table 4 for SEQ ID NO: 190 or a subset thereof is retained.
  • Modified DNA polymerase combines a function of polymerizing DNA using RNA as a template and a function of polymerizing DNA using DNA as a template.
  • the modified polymerases are active on both DNA and RNA templates and can generate amplicons from any source, for example DNA or RNA from bacterial, human or murine hosts and more.
  • a modified polymerase often exhibits an increase in primer/template specificity in comparison to an unmodified polymerase comprising a wildtype sequence.
  • Primer/template specificity is the ability of an enzyme to discriminate between matched primer/template duplexes and mismatched primer/template duplexes. Specificity can be determined, for example, by comparing the relative yield of two reactions, one of which employs a matched primer, and one of which employs a mismatched primer. An enzyme with increased discrimination has a higher relative yield with the matched primer than with the mismatched primer, i.e., the ratio of the yield in the reaction using the matched primer vs. the reaction using the mismatched primer is about 1 or above.
  • the modified polymerase(s) typically exhibit at least a 2-fold, often 3-fold or greater increase in the ratio relative to a wild type polymerase.
  • Specificity can also be measured, for example, in a real-time PCR, where the difference in the Ct (threshold cycle) values (ΔCt) between the fully
  • the Ct value represents the number of cycles required to generate a detectable amount of DNA (a "detectable" amount of DNA is typically 2 times, usually 5 times, 10 times, 100 times or more above background).
  • a polymerase with enhanced specificity may be able to produce a detectable amount of DNA in a smaller number of cycles by more closely approaching the theoretical maximum amplification efficiency of PCR. Accordingly, a lower Ct value reflects a greater
  • Polymerase processivity can be measured by a variety of methods known to those of ordinary skill in the art. Polymerase processivity is generally defined as the number of nucleotides incorporated during a single binding event of a modifying enzyme to a primed template. For example, a 5'FAM-labeled primer is annealed to circular or linearized ssM13mp18 DNA to form a primed template. In measuring processivity, the primed template usually is present in significant molar excess to the polymerase so that the chance of any primed template being extended more than once by the polymerase is minimized.
  • the primed template is therefore mixed with the polymerase at a ratio such as approximately 4000:1 (primed DNA: DNA polymerase) in the presence of buffer and dNTPs.
  • MgCl2 is added to initiate DNA synthesis.
  • Samples are quenched at various times after initiation, and analyzed on a sequencing gel.
  • the length corresponds to the processivity of the enzyme.
  • the processivity of a protein of the invention i.e., a modified polymerase, is then compared to the processivity of the wild-type enzyme (an unmodified polymerase).
  • the modified polymerases of the invention are expected to exhibit increased processivity relative to the unmodified polymerase.
  • Enhanced efficiency can also be demonstrated by measuring the increased ability of an enzyme to produce product
  • Such an analysis measures the stability of the double-stranded nucleic acid duplex indirectly by determining the amount of product obtained in a reaction.
  • a PCR assay can be used to measure the amount of PCR product obtained with a short, e.g., 12 nucleotides in length, primer annealed at an elevated temperature, for example, 50°C.
  • enhanced efficiency is shown by the ability of a modified polymerase to produce more product in a PCR reaction using the 12-nucleotide primer annealed at 50°C in comparison to an unmodified polymerase.
  • Long PCR may be used as another means of demonstrating enhanced efficiency.
  • an enzyme with enhanced efficiency typically allows the amplification of a long amplicon (>5 kb) in a shorter extension time compared to an enzyme with relatively lower efficiency.
  • Assays such as salt sensitivity can also be used to demonstrate improvement in efficiency of a processive nucleic acid modifying enzyme of the invention.
  • a modified polymerase of the invention can exhibit increased tolerance to high salt concentrations, i.e., a processive enzyme with increased processivity can produce more product in higher salt concentrations.
  • a PCR analysis can be performed to determine the amount of product obtained in a reaction using a modified polymerase compared to an unmodified polymerase in reaction conditions with high salt, for example, 80 mM.
  • the fidelity of DNA polymerase refers to its ability to accurately replicate a template.
  • High-fidelity PCR utilizing modified DNA polymerase variants that couple low misincorporation rates with proofreading activity to give faithful replication of a DNA target are also contemplated.
  • Modified DNA polymerase variants selected from SEQ ID NOS: 3-242 that show high fidelity and proofreading activity are claimed. Additional variants with proofreading activity and low misincorporation rates are also contemplated.
  • Variants of SEQ ID NO: 1, that is, modified polymerases comprising one or more substitutions shown in Table 1, are anticipated to possess improved properties, relative to naturally occurring DNA polymerases, conferring higher activity in amplifying sequences from difficult-to-amplify templates including polynucleotide templates (DNA or RNA) from human cells.
  • Modified polymerase variants comprising one or more substitutions shown in Table 2 are anticipated to possess improved properties, relative to naturally occurring DNA polymerases, conferring higher activity in amplifying sequences from difficult-to-amplify templates including polynucleotide templates from human cells.
  • Modified polymerase variants comprising one or more substitutions shown in Table 3 are anticipated to possess improved properties, relative to naturally occurring DNA polymerases, conferring higher activity in amplifying sequences from difficult-to-amplify templates including polynucleotide templates from human cells.
  • Modified polymerase variants comprising one or more combinations of substitutions from Table 3 relative to SEQ ID NO: 1 are shown in Table 4 and are anticipated to possess improved properties, relative to naturally occurring DNA polymerases, conferring higher activity in amplifying sequences from difficult to amplify templates including polynucleotide templates from human cells.
  • DNA polymerase sequences SEQ ID NO: 3-242 are anticipated to possess improved properties conferring higher activity in amplifying sequences from difficult to amplify templates including polynucleotide templates from human cells.
  • the modified polymerases may be included in vectors and host cells comprising such vectors.
  • the modified polymerases may be provided as a vector or a purified protein as a component of a kit.
  • Kits may include other components such as buffers, salts, primers and may be used for applications such as amplification, e.g., PCR and/or sequencing (e.g., next-generation sequencing (NGS) platforms).
  • Modified polymerases may be used for quantification and quality assessment of human genomic DNA samples prior to NGS library construction.
  • the kit includes a qPCR Master mix with the modified polymerase, optimized for high-performance SYBR Green I-based qPCR, pre-diluted set of DNA standards and primer premixes targeting different portions of a highly conserved single-copy human locus. Absolute quantification is achieved with the primer pair defining the shortest fragment, whereas the additional primers are used to derive information about the amount of amplifiable template in the DNA sample.
  • Quality scores (or Q-ratios) generated with the kit may be used to predict the outcome of library construction, or tailor workflows for samples of variable quality, particularly formalin-fixed paraffin-embedded (FFPE) DNA, samples obtained by laser-capture microdissection of fresh, frozen, or FFPE tissue, DNA extracted from cells collected by flow cytometry, free circulating DNA from plasma or serum, forensic samples or any other low-concentration or precious clinical sample.
  • modified polymerase may be used for DNA library preparation. Modified polymerases that enable higher yields and lower amplification bias translate to higher library diversity, lower duplication rates and more uniform coverage.
  • a method comprising amplifying sequences from a target polynucleotide, for example, polymerase chain reaction (PCR) or reverse transcription using a modified polymerase.
  • the polymerase may be selected from SEQ ID NOS: 3-242, in particular SEQ ID NO: 190.
  • kits comprising modified polymerases, other than natural sequences, comprising one or more, two or more, three or more, four or more, five or more, six or more, seven or more, eight or more, nine or more, ten or more of the substitutions shown in Table 1, Table 2 or Table 3 may also be used.
  • Amplifying sequences from a target polynucleotide using modified polymerases, other than a natural sequence, that have at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.5% sequence identity to polymerase of SEQ ID NO: 190 is also
  • the target polynucleotide may be DNA or RNA.
  • the target polynucleotide may be from a human cell or a murine cell.
  • Modified DNA polymerase variants were expressed in E. coli using the rhamnose inducible system in vector pD861 (from ATUM/DNA2.0). Variants were purified in parallel using heat cut, host cell DNA removal and column chromatography and stored in appropriate storage buffer. Purified protein was characterized for i) purity on SDS-PAGE; ii) polymerase activity; iii) host cell DNA contamination test by 16S-PCR; iv) exonuclease activity and; v) thermostability test.
  • DNA polymerase activity was determined using a DNA polymerase assay based on the detection of incorporated radioisotope-labeled dNTP in a DNA elongation reaction. The reaction was incubated at 70°C for 10 minutes. Activity of the DNA polymerase (DNAP) variants was compared to other available polymerases Accura DNAP, GoTaq and KAPA (Table 6).
  • the parent DNA polymerase (Accura DNA polymerase) has an inherent 3'-5' proofreading exonuclease activity. Exonuclease activity of the modified DNA polymerase (DNAP) variants was measured to determine if the purified DNAP variants maintained the exonuclease activity. Nuclease activity of the modified DNA polymerase (DNAP) variants was compared to other available polymerases Accura DNAP, GoTaq and KAPA. The reaction mixture including 10 units of enzyme and 32P-labeled DNA fragment in 25 µl 1X DNAP buffer B (Lucigen) was incubated at 65°C for 16 hours. Reactions were analyzed by trichloroacetic acid precipitation method.
  • DNA polymerase variants were tested by incubating the enzymes in 1X DNAP buffer B (Lucigen) at 98°C for 2 minutes. Thermostability of the DNA polymerase (DNAP) variants was compared to other available polymerases Accura DNAP, GoTaq and KAPA (Table 0). The polymerase activity was determined as described in example 6.1.2.
  • Amplification properties of DNA polymerase variants were determined by PCR of a 2.8 kb PolA target DNA from E. coli genomic DNA (gDNA). PCR reactions were run at two denaturation temperatures 94°C and 98°C to determine thermostability. 25 µl PCR reactions were set up as follows: 1X HiF buffer (Lucigen), 200 µM dNTP, 10 ng E. coli gDNA, 1 µM each primer, 5 U DNA polymerase; cycling conditions: 94°C for 2 mins, 35 cycles - 94°C for 15 seconds; 60°C for 30 seconds; 72°C for 2 mins, 72°C for 10 mins, hold at 4°C.
  • a second PCR reaction set was run with cycling conditions: 94°C for 2 mins, 35 cycles - 98°C for 15 seconds; 60°C for 30 seconds; 72°C for 2 mins, 72°C for 10 mins, hold at 4°C.
  • Accura DNA polymerase and KAPA DNA polymerase were included as controls.
  • Amplification properties of DNA polymerase variants were determined by PCR of a 5 kb and 10 kb target DNA from E. coli genomic DNA (gDNA). PCR reactions were run at two denaturation temperatures 94°C and 98°C to determine thermostability. 25 µl PCR reactions were set up as follows: 1X HiF buffer (Lucigen), 200 µM dNTP, 10 ng E. coli gDNA, 1 µM each primer, 5 U DNA polymerase; cycling conditions: 94°C for 2 mins, 35 cycles - 94°C for 15 seconds; 60°C for 30 seconds; 72°C for 2 mins, 72°C for 10 mins, hold at 4°C.
  • a second PCR reaction set was run with cycling conditions: 94°C for 2 mins, 35 cycles - 98°C for 15 seconds; 60°C for 30 seconds; 72°C for 2 mins, 72°C for 10 mins, hold at 4°C.
  • Accura DNA polymerase and KAPA DNA polymerase were included as controls.
  • PCR reactions were analyzed by agarose gel electrophoresis.
  • V302 (SEQ ID NO: 148), V303 (SEQ ID NO: 149), V305 (SEQ ID NO: 151), V307 (SEQ ID NO: 153), V308 (SEQ ID NO: 154), V309 (SEQ ID NO: 155), V310 (SEQ ID NO: 156), V311 (SEQ ID NO: 157), V313 (SEQ ID NO: 159), V314 (SEQ ID NO: 160), V315 (SEQ ID NO: 161), V316 (SEQ ID NO: 162), V320 (SEQ ID NO: 166), V321 (SEQ ID NO: 167), V322 (SEQ ID NO: 168), V324 (SEQ ID NO: 170), V327 (SEQ ID NO: 173), V331 (SEQ ID NO: 177), V334 (SEQ ID NO: 180), V335 (SEQ ID NO: 181), V339 (SEQ ID NO: 148), V303 (SEQ ID NO:
  • Modified DNA polymerase variants efficiently amplified longer human DNA target (5 kb) by PCR.
  • 25 µl PCR reactions were set up as follows: 1X HiF buffer (Lucigen), 300 µM dNTP, 50 ng human DNA, 2.5 µM P5/P7 primer, 5 U DNA polymerase; cycling conditions: 95°C for 2 mins, 30 cycles - 98°C for 15 seconds; 60°C for 30 seconds; 72°C for 2.5 mins, 72°C for 10 mins, hold at 4°C; cycling conditions for KAPA polymerase: 95°C for 3 mins, 30 cycles - 98°C for 20 seconds; 65°C for 15 seconds; 72°C for 2.5 mins, 72°C for 10 mins, hold at 4°C.
  • Accura DNA polymerase and KAPA DNA polymerase were included as controls. PCR reactions were analyzed by agarose gel electrophoresis (Figure 6A) and on a Bioanalyser (Figure 6B).
  • Modified DNA polymerase variants identified show good NGS Human DNA library amplification. 25 µl PCR reactions were set up as follows: 1X HiF buffer (Lucigen), 300 µM dNTP, 20 pg human DNA library, 2.5 µM P5/P7 primer, 5 U DNA polymerase; cycling conditions: 95°C for 2 mins, 18 cycles - 98°C for 1 seconds; 60°C for 30 seconds; 72°C for 1 minute, 72°C for 10 mins, hold at 4°C; cycling conditions for KAPA polymerase: 95°C for 3 mins, 18 cycles - 98°C for 20 seconds; 65°C for 15 seconds; 72°C for 1 minute, 72°C for 10 mins, hold at 4°C. Accura DNA polymerase and KAPA DNA polymerase were included as controls.
  • Modified DNA polymerase variants V304 (SEQ ID NO: 150), V309 (SEQ ID NO: 155), V313 (SEQ ID NO: 159), V318 (SEQ ID NO: 164), V319 (SEQ ID NO: 165), V320 (SEQ ID NO: 166), V327 (SEQ ID NO: 173), V339 (SEQ ID NO: 185), V344 (SEQ ID NO: 190), V397 (SEQ ID NO: 242) efficiently amplified high GC-content NGS DNA library from Rhodobacter giving high yields of amplicons.
  • Modified DNA polymerase variants successfully amplified high AT-content NGS DNA library from Staphylococcus. 25 µl PCR reactions were set up as follows: 1X HiF buffer (Lucigen), 300 µM dNTP, 20 pg Staphylococcus DNA library, 2.5 µM P5/P7 primer, 5 U DNA polymerase; cycling conditions: 95°C for 2 mins, 18 cycles - 98°C for 15 seconds; 60°C for 30 seconds; 72°C for 1 minute, 72°C for 10 mins, hold at 4°C; cycling conditions for KAPA polymerase: 95°C for 3 mins, 18 cycles - 98°C for 20 seconds; 65°C for 15 seconds; 72°C for 1 min, 72°C for 10 mins, hold at 4°C. Accura (Acc) DNA polymerase and KAPA (KA) DNA polymerase were included as controls. PCR reactions were analyzed by agarose gel electrophoresis (Figure 9) and on a Bioanalyser (data
  • Modified DNA polymerase variants V313 (SEQ ID NO: 159), V320 (SEQ ID NO: 166), V344 (SEQ ID NO: 190), V368 (SEQ ID NO: 214) efficiently amplified high AT-content NGS DNA library from Staphylococcus giving high yields of amplicons.
  • Primer extension was set up as follows in a 25 µl reaction mixture containing 1X DNAP buffer B, 0.5 µg M13 single-stranded DNA (ssDNA), 0.4 µM M13 primer, 250 µM dNTPs, 0.1 µg/µl BSA and 5 U of polymerase. Reactions were incubated at 65°C for 30 minutes and analyzed by agarose gel electrophoresis (Figure 10).
  • RT-PCR 25 µl reaction mixture was set up as follows: 1X HF buffer, 200 µM dNTP, 1 ng MS2 RNA, 0.5 µM each primer, 5 mM PCF if needed, 2.5 U polymerase; RT-PCR cycling conditions: 65°C for 2 mins, 94°C for 2 mins; 35 cycles: 94°C for 15 seconds, 60°C for 30 seconds, 72°C for 1 minute; 72°C for 10 mins, hold at 4°C.
  • RT-PCR reactions were analyzed by agarose gel electrophoresis (Figures 11A and 11B); a subset of samples was re-analyzed on an agarose gel to confirm positives (Figure 11C).
  • PCR reaction and cycling conditions were the same as sections 6.1.5 and 6.1.6 above except using 2.5 U polymerase for the 3 kb and 5 kb templates.
  • 25 µl PCR reactions were set up as follows: 1X HiF buffer (Lucigen), 300 µM dNTP, 10 ng E. coli gDNA, 0.5 µM each primer, 2.5 U DNA polymerase; cycling conditions: 94°C for 2 mins, 35 cycles: 94°C for 15 seconds, 60°C for 30 seconds, 72°C for 5 mins; 72°C for 10 mins, hold at 4°C (Figure 15B panel I).
  • a second set was run with 30 cycles (Figure 15B panel II).
  • PCR reaction and cycling conditions were set up as follows: 1X HiF buffer (Lucigen), 300 µM dNTP, 10 ng human gDNA, 0.5 µM primer, 2.5 U DNA polymerase; cycling conditions: 95°C for 2 mins, 30 cycles: 98°C for 15 seconds; 60°C for 30 seconds; 72°C for 1.5 mins, 72°C for 10 mins, hold at 4°C; cycling conditions for KAPA polymerase: 95°C for 3 mins, 30 cycles - 98°C for 20 seconds; 65°C for 15 seconds; 72°C for 1.5 mins, 72°C for 10 mins, hold at 4°C.
  • Accura DNA polymerase and KAPA DNA polymerase were included as controls. PCR reactions were analyzed by agarose gel electrophoresis (Figure 16).
  • PCR reactions were set up as follows: 1X HiF buffer (Lucigen), 300 µM dNTP, 20 pg DNA library, 2.5 µM P5/P7 primer, 2.5 U DNA polymerase; cycling conditions: 95°C for 2 mins, 18 cycles: 98°C for 15 seconds; 60°C for 30 seconds; 72°C for 1 minute, 72°C for 10 mins, hold at 4°C; cycling conditions for KAPA polymerase: 95°C for 3 mins, 18 cycles - 98°C for 20 seconds; 65°C for 15 seconds; 72°C for 1 minute, 72°C for 10 mins, hold at 4°C. Accura DNA polymerase and KAPA DNA polymerase were included as controls. PCR reactions were analyzed by agarose gel electrophoresis (Figure 17) and on a Bioanalyser (data not shown).
  • PCR reactions were set up as follows: 1X HiF buffer (Lucigen), 300 µM dNTP, 20 pg Staphylococcus DNA library, 2.5 µM P5/P7 primer, 2.5 U DNA polymerase; cycling conditions: 95°C for 2 mins, 18 cycles - 98°C for 15 seconds; 60°C for 30 seconds; 72°C for 1 minute, 72°C for 10 mins, hold at 4°C; cycling conditions for KAPA polymerase: 95°C for 3 mins, 18 cycles - 98°C for 20 seconds; 65°C for 15 seconds; 72°C for 1 min, 72°C for 10 mins, hold at 4°C.
  • Accura (Acc) DNA polymerase and KAPA (KA) DNA polymerase were included as controls. PCR reactions were analyzed by agarose gel electrophoresis (Figure 18) and by Bioanalyser (data not shown).
  • Second round and third round variant screening is summarized in Table 12 below.
  • PCR optimization was carried out first in the presence of sugars, for example sucrose, trehalose, mannitol (Figure 19A) and sorbitol (Figure 19B), followed by addition of bovine serum albumin (BSA) or L-carnitine (Figure 20).
  • L-carnitine helped with a more productive PCR.
  • Sugar and L-carnitine together showed an additive effect on amplification of a 3 kb CometGFP DNA and a 5 kb human gDNA target (Figures 21A and 21B).
  • Buffer compositions, for example Tris-HCl (10-50 mM), Tris-acetate (10-50 mM) or Bis-Tris propane (10-50 mM) with a pH range of 8.0-9.3; salts, for example KCl (10-100 mM) and ammonium sulfate (5-50 mM); metal, for example magnesium chloride (1-5 mM) or magnesium sulfate (1-5 mM); non-ionic detergents, for example Triton X-100 (0.1-1%), NP-40 (0.1-1%), Tween 20 (0.1-1%), Brij58 (0.1-1%) or CHAPS (0.1-1%); and dNTPs (50-500 µM), were tested on 3 kb and 2.8 kb E. coli targets (Figures 22A and 22B, respectively); buffer combinations are shown in Table 13.
  • PCR reactions were set up as follows: 25 µl of 1X buffer as designed (Table 12), 5 pg of plasmid DNA, 0.5 µM primers, 2.5 U of hotstart polymerase V344 (SEQ ID NO: 190) with or without additive, cycling conditions: 95°C for 2 minutes; 30 cycles of 98°C for 15 seconds, 60°C for 30 seconds, 72°C for 1.5 minutes; 72°C for 10 min. A summary of results is shown in Table 14.
  • PCR reactions with standard PCR buffer were tested on CometGFP DNA as template with addition of sorbitol (Figure 23A), KCl (Figure 23B) or non-ionic detergents (Figure 23C).
  • PCR reactions were set up as follows: 25 µl of 1X basic buffer, pH range 8.0-9.3 (Figure 23A), 200 µM dNTP, 5 pg of CometGFP DNA, 0.5 µM primers, 15% sorbitol added to a subset of reactions, 2.5 U of hotstart polymerase V344 (SEQ ID NO: 190) with or without additive, cycling conditions: 95°C for 2 minutes; 30 cycles of 98°C for 15 seconds, 60°C for 30 seconds, 72°C for 1.5 minutes; 72°C for 10 min. Reactions were analyzed by agarose gel electrophoresis.
  • Models are based on the set of infologs described and assess the relative contribution of substitutions within the set. Model weights are shown for activities that are desirable in the modified variants and are used for selection of substitutions.
  • V645A 6.35 S571L 5.28 E511L 4.15 V601L 3.58
  • V602F 1.66 V343E 2.23 P560Y 1.78 F576I 1.48
  • V358A 0.05 V343S 0.02 Q513D 0.00 P495Y 0.00
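The use of model weights to select substitutions can be illustrated with a minimal sketch. It parses the substitution/weight pairs listed above and ranks them by weight. The function names and the `min_weight` cutoff are hypothetical and are not taken from the source, which does not state how a selection threshold was chosen.

```python
import re

# Substitution/weight pairs copied from the rows listed above.
ROWS = """
V645A 6.35 S571L 5.28 E511L 4.15 V601L 3.58
V602F 1.66 V343E 2.23 P560Y 1.78 F576I 1.48
V358A 0.05 V343S 0.02 Q513D 0.00 P495Y 0.00
"""

# One pair looks like "V645A 6.35": original residue, position,
# replacement residue, then the model weight.
PAIR = re.compile(r"([A-Z])(\d+)([A-Z])\s+(-?\d+(?:\.\d+)?)")


def parse_weights(text):
    """Return a dict mapping each substitution (e.g. 'V645A') to its weight."""
    return {
        f"{wt}{pos}{mut}": float(weight)
        for wt, pos, mut, weight in PAIR.findall(text)
    }


def rank_substitutions(weights, min_weight=0.0):
    """Return (substitution, weight) pairs above min_weight, highest first.

    min_weight is an illustrative cutoff, not a value from the source.
    """
    kept = [(sub, w) for sub, w in weights.items() if w > min_weight]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for sub, w in rank_substitutions(parse_weights(ROWS), min_weight=1.0):
        print(f"{sub}\t{w:.2f}")
```

Substitutions with weights near zero (for example Q513D and P495Y) fall below any positive cutoff and would not be carried forward under this scheme.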

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Wood Science & Technology (AREA)
  • Molecular Biology (AREA)
  • Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Organic Chemistry (AREA)
  • Zoology (AREA)
  • Genetics & Genomics (AREA)
  • Medicinal Chemistry (AREA)
  • Microbiology (AREA)
  • Biotechnology (AREA)
  • Biochemistry (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
  • Enzymes And Modification Thereof (AREA)

Abstract

The present invention provides DNA polymerase sequences for in vitro applications including PCR for creation of next generation sequencing libraries, and for reverse-transcriptase PCR.

Description

MODIFICATION OF DNA POLYMERASES FOR IN VITRO APPLICATIONS
CROSS REFERENCE TO RELATED APPLICATIONS
[0001] This application is a non-provisional of 62/359,059 filed July 6, 2016 and 62/360,627 filed July 11, 2016, each incorporated by reference in its entirety for all purposes.
REFERENCE TO A SEQUENCE LISTING
[0002] The application refers to sequences disclosed in a txt file named
AT20170616_ST25.TXT, of 1,347,000 bytes, created June 16, 2017, incorporated by reference.
FIELD OF THE INVENTION
[0003] The field of the present invention relates to novel compositions of DNA polymerases for in vitro applications.
BACKGROUND OF THE INVENTION
[0004] DNA polymerases (DNAP) are widely used for nucleic acid amplification, detection and sequencing. The most commonly used enzyme for Sanger sequencing is derived from Thermus aquaticus (Taq), whereas Bacillus stearothermophilus (Bst) DNAP is used in the 454/Roche pyrosequencing platform. Taq polymerase lacks proofreading activity and is unable to efficiently extend misincorporated bases. Mismatched base pairing generates truncated products that accumulate during PCR and contribute to reaction failure if the target is too long and/or the template DNA is supplied in low amounts. In contrast, proofreading high fidelity enzymes are extremely accurate, but do not perform well over longer target distances or with low template concentration because the 3'-5' exonuclease (proofreading) activity destroys primers and affects sensitivity.
[0005] Second and third generation instruments for massively parallel DNA sequencing can deliver megabases of data at a lower cost. The development of a DNAP to match the technical capabilities of new instrument platforms has not kept pace. Achieving long and accurate reads using new solid phase extension methods, terminator chemistries, and microfluidic flow technologies places new demands on currently used enzymes. Polymerases with increased template affinity for DNA or RNA could provide important improvements in sequencing, amplification and reverse transcription. We have engineered bacterial, archaeal, and viral polymerases to improve binding affinity, modifications that will be useful improvements for a variety of next generation sequencing (NGS) and amplification applications.
[0006] DNA polymerases are well known in the art and include both DNA-dependent polymerases and RNA-dependent polymerases such as reverse transcriptase. At least five families of DNA-dependent DNA polymerases are known, although most fall into families A, B and C. Other less characterized families include D, X, Y and RT. There is little or no structural or sequence similarity among the various families. Most family A polymerases are single chain proteins that can contain multiple enzymatic functions including polymerase, 3' to 5' exonuclease activity and 5' to 3' exonuclease activity. Family B polymerases typically have a single catalytic domain with polymerase and 3' to 5' exonuclease activity, as well as accessory factors. Family C polymerases are typically multi-subunit proteins with polymerizing and 3' to 5' exonuclease activity. In E. coli, three types of DNA polymerases have been found, DNA polymerases I (family A), II (family B), and III (family C). In eukaryotic cells, three different family B polymerases, DNA polymerases alpha, delta, and epsilon, are implicated in nuclear replication, and a family A polymerase, polymerase gamma, is used for mitochondrial DNA replication. Other types of DNA polymerases include phage polymerases.
[0007] Similarly, RNA polymerases typically include eukaryotic RNA polymerases I, II, and III, and bacterial RNA polymerases as well as phage and viral polymerases. RNA polymerases can be DNA-dependent and RNA-dependent.
[0008] The present invention is directed to eukaryotic DNA polymerases, in particular Pol eta, Pol iota and Pol kappa, members of Family Y DNA polymerases involved in DNA repair by translesion synthesis and encoded by genes POLH, POLI and POLK respectively. The DinB Pol κ subgroup proteins are ubiquitously present from bacteria to humans, but notably absent in the completely sequenced genomes of Saccharomyces cerevisiae and Drosophila melanogaster. The E. coli DinB protein was shown to have DNA polymerase activity (designated DNA polymerase IV, or Pol IV), independently of accessory proteins such as UmuD' and RecA which are required for UmuC-dependent DNA polymerase activity (designated Pol V). Members of Family Y have five common motifs to aid in binding the substrate and primer terminus and they all include the typical right-hand thumb, palm and finger domains with added domains like little finger (LF), polymerase-associated domain (PAD), or wrist. The active site, however, differs between family members due to the different lesions being repaired. Polymerases in Family Y are low-fidelity polymerases. The importance of these polymerases is evidenced by the fact that the gene encoding DNA polymerase η is referred to as XPV, because loss of this gene results in the disease Xeroderma Pigmentosum Variant. Pol η is particularly important for allowing accurate translesion synthesis of DNA damage resulting from ultraviolet radiation. The functionality of Pol κ is not completely understood, but researchers have found two probable functions. Pol κ is thought to act as an extender or an inserter of a specific base at certain DNA lesions.
SUMMARY OF THE INVENTION
[0009] The present invention provides modified DNA polymerases that have improved amplification properties and/or processivity over natural forms of the polymerases as well as other polymerases in commercial use. The invention also provides recombinant DNA sequences encoding such DNA polymerases, and vector plasmids and host cells suitable for the expression of these recombinant DNA sequences. A polynucleotide encoding a non-naturally occurring polymerase, wherein the polymerase of SEQ ID NO: 1 comprises one or more substitutions listed in Table 3. The polymerase sequence is selected from SEQ ID NOS: 3-242. The polymerase may be selected from SEQ ID NOS: 156, 159, 164, 165, 166, 173, 181, 190, 214, 218, 225 and 242. A polymerase, other than a naturally occurring polymerase, whose sequence comprises a polypeptide with one or more substitutions from Table 3.
[00010] The invention further provides a polynucleotide encoding a non-naturally occurring polymerase, wherein the polymerase has a sequence comprising SEQ ID NO: 1 modified by one or more substitutions listed in Table 3 and up to ten internal insertions, deletions or substitutions at positions other than those listed in Table 3. Optionally, the polymerase comprises at least 2, 3, 4, 5, 6, 7, 8, 9 or 10 substitutions listed in Table 3. Optionally, the polymerase comprises a combination of substitutions selected from the combinations listed in Table 4. Optionally, the polymerase has at least 90, 95 or 99% sequence identity to claim 1. Optionally, the polymerase has no substitutions other than those shown in Table 3 and conservative substitutions not affecting activity of the polymerase. The invention further provides a polynucleotide, which encodes an amino acid sequence at least 90, 95 or 99% identical to any of SEQ ID NOS: 3-242, provided any substitutions present in the amino acid sequence specified in Table 4 are retained.
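The identity condition above, a minimum percent identity combined with retention of specified substitutions, can be sketched as follows. This is a simplified illustration only: it assumes the two sequences are already aligned and of equal length (no gaps), and the function names, the short toy sequences and the substitution "V3A" are hypothetical and do not correspond to any SEQ ID NO or Table entry.

```python
def percent_identity(candidate, reference):
    """Percent of positions that match between two pre-aligned, equal-length sequences."""
    if len(candidate) != len(reference):
        raise ValueError("sequences must be pre-aligned to equal length")
    if not reference:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(candidate, reference))
    return 100.0 * matches / len(reference)


def retains_substitutions(candidate, substitutions):
    """Check that each substitution (e.g. 'V3A': residue 3 is A) is present.

    Positions are 1-based, following the usual substitution notation.
    """
    for sub in substitutions:
        position, new_residue = int(sub[1:-1]), sub[-1]
        if position > len(candidate) or candidate[position - 1] != new_residue:
            return False
    return True


def qualifies(candidate, reference, required_substitutions, threshold=90.0):
    """True if identity meets the threshold and all required substitutions are retained."""
    return (
        percent_identity(candidate, reference) >= threshold
        and retains_substitutions(candidate, required_substitutions)
    )


if __name__ == "__main__":
    variant = "MKALAAGTWE"       # toy variant carrying the hypothetical V3A
    print(qualifies("MKALAAGTWD", variant, ["V3A"]))  # one change elsewhere
    print(qualifies("MKVLAAGTWE", variant, ["V3A"]))  # substitution reverted
```

In the toy example both candidates are 90% identical to the variant, but only the first keeps the required substitution, so only the first satisfies both conditions. A real comparison would first require a proper sequence alignment, which this sketch does not perform.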
[00011] The polymerases selected can be thermostable, retain polymerase activity and exhibit reverse transcriptase activity. The polymerase variants can be selected for any or all of properties that include strand displacement activity, amplification of next generation sequencing (NGS) libraries, high fidelity amplification of target sequence, amplification of amplification resistant target sequences comprising direct repeats, inverted repeats, at least 65% G+C residues or A+T residues or a sequence greater than 2 kilobases. Preferably, any or all such properties are enhanced relative to the corresponding property of the polymerase having the amino acid sequence of SEQ ID NO: 1. The target sequence may be deoxyribonucleic acid (DNA) or ribonucleic acid (RNA).
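Two of the criteria named above for an amplification resistant target, base composition and length, lend themselves to a short illustrative check. The sketch below reads the composition criterion as "at least 65% G+C or at least 65% A+T", which is one interpretation of the wording; repeat detection is not attempted. The function names and default cutoffs as parameters are hypothetical conveniences, with the 65% and 2 kilobase values taken from the text.

```python
def base_fraction(sequence, bases):
    """Fraction of residues in sequence that belong to the given set of bases."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    sequence = sequence.upper()
    return sum(sequence.count(base) for base in bases) / len(sequence)


def is_amplification_resistant(sequence, composition_cutoff=0.65, length_cutoff=2000):
    """Flag a target by composition (G+C or A+T at or above the cutoff) or by length.

    U is counted with A and T so that RNA targets are handled as well.
    Direct and inverted repeats, also named in the text, are not checked here.
    """
    gc = base_fraction(sequence, "GC")
    at = base_fraction(sequence, "ATU")
    return (
        gc >= composition_cutoff
        or at >= composition_cutoff
        or len(sequence) > length_cutoff
    )


if __name__ == "__main__":
    print(is_amplification_resistant("GGGCCCGGAT"))   # 80% G+C
    print(is_amplification_resistant("ATGCATGCAT"))   # balanced, short
    print(is_amplification_resistant("ATGC" * 600))   # 2.4 kb
```

Such a check only sorts targets into categories; whether a given polymerase variant amplifies them is an experimental question addressed by the PCR assays described elsewhere in this document.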
[00012] DNA sequences encoding DNA polymerases and other motifs such as DNA binding proteins, antibodies and more are another aspect. The invention also provides a novel formulation of the DNA polymerases of the present invention and other thermostable DNA polymerases, which formulation of enzymes is capable of efficiently catalyzing the amplification by PCR (the polymerase chain reaction) of unusually long and faithful products. Also claimed are compositions comprising one or more non-natural polymerases selected from SEQ ID NOS: 3-242.
[00013] Also included is a method comprising amplifying sequences from a target polynucleotide, for example PCR or reverse transcription using a modified polymerase, other than a natural sequence selected from any of SEQ ID NOS: 3-242. The modified non-natural polymerase polynucleotide sequence is included in a construct operably linked to a promoter; the construct is included in a recombinant host cell. The target polynucleotide may be DNA or RNA and may be from a bacterial cell, a human cell or murine cell. The modified non-natural polymerases can amplify target polynucleotide sequences comprising amplification resistant sequence comprising direct repeats, inverted repeats, at least 65% G+C residues, or A+T residues or a sequence greater than 2 kilobases. The modified non-natural polymerases allow amplification of next generation sequencing (NGS) libraries. A kit comprising a polynucleotide encoding a non-natural polymerase, wherein the polymerase of SEQ ID NO: 1 comprises one or more substitutions listed in Table 3 and/or is selected from SEQ ID NOS: 3-242. A kit comprising a polymerase, other than a naturally occurring polymerase, whose sequence comprises a polypeptide with one or more substitutions from Table 3 and/or is selected from SEQ ID NOS: 3-242.
BRIEF DESCRIPTION OF THE DRAWINGS
[00014] FIGURE 1A: Purified modified polymerases were tested for purity by SDS-PAGE analysis; >95% purity was observed.
[00015] FIGURE 1B: Additional purified modified polymerases were tested for purity by SDS-PAGE analysis; >95% purity was observed.
[00016] FIGURE 2A: Purified modified DNA polymerase variants were tested for ability to amplify contaminating bacterial host DNA by the E. coli 16S PCR test. Reactions were set up as described in Section 6.1.5 and analyzed by agarose gel electrophoresis.
[00017] FIGURE 2B: Additional purified modified DNA polymerase variants were tested for ability to amplify contaminating bacterial host DNA by the E. coli 16S PCR test. Reactions were set up as described in Section 6.1.5 and analyzed by agarose gel electrophoresis.
[00018] FIGURE 3A: Amplification properties of modified DNA polymerase variants were determined by PCR of a 2.8 kb PolA target DNA from E. coli genomic DNA (gDNA). PCR reactions were run at two denaturation temperatures, 94°C and 98°C, to determine thermostability and analyzed by agarose gel electrophoresis. Polymerases Accura (Acc) DNAP, GoTaq and KAPA (KA) were run as controls.
[00019] FIGURE 3B: Additional PCR reactions from Figure 3A were run at two denaturation temperatures 94°C and 98°C to determine thermostability and analyzed by agarose gel electrophoresis as above. Polymerases Accura (Acc) DNAP and KAPA (KA) were run as controls.
[00020] FIGURE 4A: Amplification properties of modified DNA polymerase variants were determined by PCR of a 5 kb target DNA from E. coli genomic DNA (gDNA) at denaturation temperatures of 94°C and 98°C. PCR reactions were analyzed by agarose gel electrophoresis.
[00021] FIGURE 4B: Amplification properties of additional modified DNA polymerase variants were determined by PCR of a 5 kb target DNA from E. coli genomic DNA (gDNA) at denaturation temperatures of 94°C and 98°C. PCR reactions were analyzed by agarose gel electrophoresis.
[00022] FIGURE 5A: Amplification properties of modified DNA polymerase variants were determined by PCR of a 10 kb target DNA from E. coli genomic DNA (gDNA) at denaturation temperatures of 94°C and 98°C. PCR reactions were analyzed by agarose gel electrophoresis.
[00023] FIGURE 5B: Amplification properties of additional modified DNA polymerase variants were determined by PCR of a 10 kb target DNA from E. coli genomic DNA (gDNA) at denaturation temperatures of 94°C and 98°C. PCR reactions were analyzed by agarose gel electrophoresis.
[00024] FIGURE 6A: Modified DNA polymerase variants efficiently amplified a longer human DNA target (5 kb) by PCR; reactions were analyzed by agarose gel electrophoresis. Accura (Acc) DNA polymerase and KAPA (KA) DNA polymerase were included as controls.
[00025] FIGURE 6B: Modified DNA polymerase variants (V313 and V318 shown) efficiently amplified a longer human DNA target (5 kb) by PCR; reactions were analyzed on a Bioanalyser. KAPA (KA) DNA polymerase was included as control.
[00026] FIGURE 7 (panels A and B): Modified DNA polymerase variants identified show good NGS human DNA library amplification. PCR reactions were analyzed by agarose gel electrophoresis. Accura (Acc) DNA polymerase and KAPA (KA) DNA polymerase were included as controls. Several modified DNA polymerase variants successfully amplified a human DNA library giving high yields of amplicons.
[00027] FIGURE 8 panels I and II: Modified DNA polymerase variants successfully amplified high GC-content NGS DNA library from Rhodobacter. PCR reactions were analyzed by agarose gel electrophoresis and on a Bioanalyser (data not shown). Accura (Acc) DNA polymerase and KAPA (KA) DNA polymerase were included as controls.
[00028] FIGURE 9 panels I and II: Modified DNA polymerase variants successfully amplified high AT-content NGS DNA library from Staphylococcus. PCR reactions were analyzed by agarose gel electrophoresis. Accura (Acc) DNA polymerase and KAPA (KA) DNA polymerase were included as controls.
[00029] FIGURE 10: Panels A and B show modified DNA polymerase variants that show strong strand displacement activity by primer extension. Primer extension was set up and reactions analyzed by agarose gel electrophoresis.
[00030] FIGURE 11A: Modified DNA polymerase variants show high reverse transcriptase (RT) activity on a 520 base pair (bp) MS2 RNA target. RT-PCR reactions were run without (panel I) and with KF (panel II) and analyzed by agarose gel electrophoresis. Accura (Acc) DNA polymerase was run as control.
[00031] FIGURE 11B: Additional modified DNA polymerase variants show high reverse transcriptase (RT) activity on a 520 base pair (bp) MS2 RNA target. RT-PCR reactions were run without (panel I) and with KF (panel II) and analyzed by agarose gel electrophoresis. Accura (Acc) DNA polymerase was run as control.
[00032] FIGURE 11C: Modified DNA polymerase variants show high reverse transcriptase (RT) activity on a 520 base pair (bp) MS2 RNA target. A subset of samples from Figure 11A were re-analyzed on an agarose gel to confirm positives.
[00033] FIGURE 12: 12 DNA polymerase variants from Table 10 were tested in a HotStart version to show inhibition of polymerase activity by antibody.
[00034] FIGURE 13: Efficient amplification of a 3 kb plasmid E. coli target was shown for the 12 variants from round 1 and 2 round 2 variants SEQ ID NO: 101 (V103) and SEQ ID NO: 117 (V119). PCR reaction was set up and analyzed by agarose gel electrophoresis.
[00035] FIGURE 14: Efficient amplification of a 5 kb E. coli target was shown for the 12 variants from rounds 1 and 2, and round 2 variants V103 (SEQ ID NO: 101) and V119 (SEQ ID NO: 117). PCR reaction was set up and analyzed by agarose gel electrophoresis.
[00036] FIGURE 15A: Efficient amplification of a 10 kb E. coli target was shown for the 12 variants from rounds 1 and 2, and round 2 variants V103 (SEQ ID NO: 101) and V119 (SEQ ID NO: 117). PCR reaction was set up and analyzed by agarose gel electrophoresis.
[00037] FIGURE 15B: Efficient amplification of a 10 kb E. coli target was shown for the 12 variants from rounds 1 and 2, and round 2 variants V103 (SEQ ID NO: 101) and V119 (SEQ ID NO: 117). PCR reaction was set up and analyzed by agarose gel electrophoresis. Panel I shows reactions after 35 cycles of amplification. Panel II shows reactions after 30 cycles of amplification.
[00038] FIGURE 16: Efficient amplification of a 5 kb human DNA target was shown for the 12 modified polymerase variants from rounds 1 and 2, and round 2 variants V103 (SEQ ID NO: 101) and V119 (SEQ ID NO: 117). Accura DNA polymerase and KAPA DNA polymerase were included as controls. PCR reactions were analyzed by agarose gel electrophoresis.
[00039] FIGURE 17: Efficient amplification of NGS DNA libraries from human (panel I) and Rhodobacter (panel II) was shown for the 12 modified polymerase variants from Table 10 and two second round variants V103 (SEQ ID NO: 101) and V119 (SEQ ID NO: 117). Accura DNA polymerase and KAPA DNA polymerase were included as controls. PCR reactions were analyzed by agarose gel electrophoresis.
[00040] FIGURE 18: Efficient amplification of NGS DNA library from AT-rich Staphylococcus was shown for the 12 modified polymerase variants from Table 10 and two second round variants V103 (SEQ ID NO: 101) and V119 (SEQ ID NO: 117). Accura (Acc) DNA polymerase and KAPA (KA) DNA polymerase were included as controls. PCR reactions were analyzed by agarose gel electrophoresis.
[00041] FIGURE 19A: PCR reaction optimization was carried out for modified polymerase variants (SEQ ID NOS: 159 and 190) in the presence of sugars, for example sucrose, trehalose and mannitol, and reactions were analyzed by agarose gel electrophoresis.
[00042] FIGURE 19B: PCR optimization was carried out for modified polymerase variants (SEQ ID NOS: 159 and 190) in the presence of sugars, for example trehalose, mannitol and sorbitol, and reactions were analyzed by agarose gel electrophoresis.
[00043] FIGURE 20: PCR reaction optimization was carried out for modified polymerase variant (SEQ ID NO: 190) in the presence of BSA or L-carnitine and reactions analyzed by agarose gel electrophoresis.
[00044] FIGURE 21A: PCR reaction optimization was carried out for modified polymerase variants (SEQ ID NOS: 159 and 190) in the presence of sugars, for example sucrose, trehalose, mannitol and sorbitol, in the presence or absence of L-carnitine, and reactions were analyzed by agarose gel electrophoresis. Sugar and L-carnitine together showed an additive effect on amplification of a 3 kb CometGFP DNA target.
[00045] FIGURE 21B: PCR reaction optimization was carried out for modified polymerase variants (SEQ ID NOS: 159 and 190) in the presence of sugars, for example sucrose, trehalose, mannitol and sorbitol, in the presence or absence of L-carnitine, and reactions were analyzed by agarose gel electrophoresis. Sugar and L-carnitine together showed an additive effect on amplification of a 5 kb human gDNA target.
[00046] FIGURE 22A: Various combinations of buffer compositions were used to set up PCR on a 3 kb E. coli target DNA with modified polymerase variant (SEQ ID NO: 190). PCR reactions with or without additive were analyzed by agarose gel electrophoresis.
[00047] FIGURE 22B: Various combinations of buffer compositions were used to set up PCR on a 5 kb E. coli target DNA with modified polymerase variant (SEQ ID NO: 190). PCR reactions with or without additive were analyzed by agarose gel electrophoresis.
[00048] FIGURE 23A: PCR reactions with standard PCR buffer were tested on CometGFP DNA as template with addition of sorbitol and with modified polymerase variant (SEQ ID NO: 190). PCR reactions without and with addition of sorbitol were analyzed by agarose gel electrophoresis.
[00049] FIGURE 23B: PCR reactions with standard PCR buffer were tested on CometGFP DNA as template with addition of KCl and with modified polymerase variant (SEQ ID NO: 190). PCR reactions without and with addition of KCl were analyzed by agarose gel electrophoresis.
[00050] FIGURE 23C: PCR reactions with standard PCR buffer were tested on CometGFP DNA as template with addition of non-ionic detergents, for example Triton X-100, Tween 20, NP-40, CHAPS and Brij 8 and with modified polymerase variant (SEQ ID NO: 190). PCR reactions were analyzed by agarose gel electrophoresis.
DETAILED DESCRIPTION OF THE INVENTION
DEFINITIONS
[00051] Use of the singular forms "a," "an," and "the" includes plural references unless the context clearly dictates otherwise. Thus, for example, reference to "a polynucleotide" includes a plurality of polynucleotides, reference to "a substrate" includes a plurality of such substrates, reference to "a variant" includes a plurality of variants, and the like.
[00052] Terms such as "connected," "attached," "linked," and "conjugated" are used interchangeably herein and encompass direct as well as indirect connection, attachment, linkage or conjugation unless the context clearly dictates otherwise. Where a range of values is recited, it is to be understood that each intervening integer value, and each fraction thereof, between the recited upper and lower limits of that range is also specifically disclosed, along with each subrange between such values. The upper and lower limits of any range can independently be included in or excluded from the range, and each range where either, neither or both limits are included is also encompassed within the invention. Where a value being discussed has inherent limits, for example where a component can be present at a concentration of from 0 to 100%, or where the pH of an aqueous solution can range from 1 to 14, those inherent limits are specifically disclosed. Where a value is explicitly recited, it is to be understood that values which are about the same quantity or amount as the recited value are also within the scope of the invention. Where a combination is disclosed, each subcombination of the elements of that combination is also specifically disclosed and is within the scope of the invention. Conversely, where different elements or groups of elements are individually disclosed, combinations thereof are also disclosed. Where any element of an invention is disclosed as having a plurality of alternatives, examples of that invention in which each alternative is excluded singly or in any combination with the other alternatives are also hereby disclosed; more than one element of an invention can have such exclusions, and all combinations of elements having such exclusions are hereby disclosed.
[00053] Unless defined otherwise herein, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Singleton, et al., Dictionary of Microbiology and Molecular Biology, 2nd Ed., John Wiley and Sons, New York (1994), and Hale & Margham, The Harper Collins Dictionary of Biology, Harper Perennial, NY, 1991, provide one of skill with a general dictionary of many of the terms used in this invention. Although any methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, the preferred methods and materials are described. Unless otherwise indicated, nucleic acids are written left to right in 5' to 3' orientation; amino acid sequences are written left to right in amino to carboxy orientation. The terms defined immediately below are more fully defined by reference to the specification as a whole.
[00054] "Amplification reaction" refers to any in vitro means for multiplying the copies of a target sequence of nucleic acid. Such methods include but are not limited to polymerase chain reaction (PCR) (see U.S. Pat. Nos. 4,683,195 and 4,683,202; PCR Protocols: A Guide to Methods and Applications (Innis et al., eds, 1990)), DNA ligase chain reaction (LCR), QBeta RNA replicase, and RNA transcription-based (such as TAS and 3SR) amplification reactions as well as others known to those of skill in the art.
[00055] "Amplifying" refers to a step of submitting a solution to conditions sufficient to allow for amplification of a polynucleotide if all components of the reaction are intact. Components of an amplification reaction include, e.g., primers, a polynucleotide template, polymerase, nucleotides, and the like. The term "amplifying" typically refers to an "exponential" increase in target nucleic acid. However, "amplifying" as used herein can also refer to linear increases in the numbers of a select target sequence of nucleic acid, such as is obtained with cycle sequencing.
[00056] "Amplification reaction mixture" refers to an aqueous solution comprising the various reagents used to amplify a target nucleic acid. These include enzymes, aqueous buffers, salts, amplification primers, target nucleic acid, and nucleoside triphosphates. Depending upon the context, the mixture can be either a complete or incomplete amplification reaction mixture.
[00057] The terms "DNA sequence", "RNA sequence" or "polynucleotide sequence" mean a contiguous nucleic acid sequence. The sequence can range from an oligonucleotide of 2 to 20 nucleotides in length to a full-length genomic sequence of thousands or hundreds of thousands of base pairs.
[00058] "Domain" refers to a unit of a protein or protein complex, comprising a polypeptide subsequence, a complete polypeptide sequence, or a plurality of polypeptide sequences where that unit has a defined function. The function is understood to be broadly defined and can be ligand binding, catalytic activity or can have a stabilizing effect on the structure of the protein.
[00059] The term "DNA binding domain" refers to nucleic acid and both full-length polypeptides and fragments of the polypeptides that have sequence nonspecific double-stranded DNA binding activity.
[00060] The term "expression system" means any in vivo or in vitro biological system that is used to produce one or more gene products encoded by a polynucleotide.
[00061] "Efficiency" in the context of a nucleic acid modifying enzyme of this invention refers to the ability of the enzyme to perform its catalytic function under specific reaction conditions. Typically, "efficiency" as defined herein is indicated by the amount of product generated under given reaction conditions.
[00062] "Enhances" in the context of an enzyme refers to improving the activity of the enzyme, i.e., increasing the amount of product per unit enzyme per unit time.
[00063] Two elements are "heterologous" to one another if not naturally associated. For example, a nucleic acid sequence encoding a protein linked to a heterologous promoter means a promoter other than that which naturally drives expression of the protein. A heterologous nucleic acid flanked by transposon ends or ITRs means a heterologous nucleic acid not naturally flanked by those transposon ends or ITRs, such as a nucleic acid encoding a polypeptide other than a transposase, including an antibody heavy or light chain. A nucleic acid is heterologous to a cell if not normally found in the cell or in a different location (e.g., episomal or different genomic location) than the location naturally present within a cell.
[00064] The term "host" means any prokaryotic or eukaryotic organism that can be a recipient of a nucleic acid. A "host," as the term is used herein, includes prokaryotic or eukaryotic organisms that can be genetically engineered. For examples of such hosts, see Maniatis et al., Molecular Cloning. A Laboratory Manual, Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y. (1982). As used herein, the terms "host," "host cell," "host system" and "expression host" can be used interchangeably.
[00065] An 'isolated' polypeptide or polynucleotide means a polypeptide or polynucleotide that has been either removed from its natural environment, produced using recombinant techniques, or chemically or enzymatically synthesized. Preferably, a polypeptide or polynucleotide of this invention is purified, that is, it is essentially free from any other polypeptide or polynucleotide and associated cellular products or other impurities.
[00066] The term "identical" in the context of two nucleic acids or polypeptide sequences refers to the residues in the two sequences that are the same when aligned for maximum correspondence, as measured using a "sequence comparison algorithm".
[00067] The terms "nucleoside" and "nucleotide" include those moieties which contain not only the known purine and pyrimidine bases, but also other heterocyclic bases which have been modified. Such modifications include methylated purines or pyrimidines, acylated purines or pyrimidines, or other heterocycles. Modified nucleosides or nucleotides can also include modifications on the sugar moiety, for example, where one or more of the hydroxyl groups are replaced with halogen or aliphatic groups, or are functionalized as ethers, amines, or the like. The term "nucleotidic unit" is intended to encompass nucleosides and nucleotides.
[00068] "Next-generation sequencing" (NGS) refers to non-Sanger-based high-throughput DNA sequencing technologies. Millions or billions of DNA strands can be sequenced in parallel, yielding substantially more throughput and minimizing the need for the fragment-cloning methods that are often used in Sanger sequencing of genomes.
[00069] An "Open Reading Frame" or "ORF" means a portion of a polynucleotide that, when translated into amino acids, contains no stop codons. The genetic code reads DNA sequences in groups of three base pairs, which means that a double-stranded DNA molecule can be read in any of six possible reading frames: three in the forward direction and three in the reverse.
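The six reading frames described in [00069] can be enumerated with a minimal Python sketch. The helper names (reverse_complement, six_frames, is_open) are hypothetical, and the stop codons TAA, TAG and TGA of the standard genetic code are assumed; a frame is treated as open when none of its complete codons is a stop codon.

```python
# Assumptions: plain A/C/G/T DNA and the standard genetic code's stop codons.
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def six_frames(seq: str) -> dict:
    """Split a sequence into codons for each of the six reading frames.

    Keys are '+1', '+2', '+3' (forward strand) and '-1', '-2', '-3'
    (reverse strand); incomplete trailing codons are dropped.
    """
    seq = seq.upper()
    frames = {}
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            codons = [s[i:i + 3] for i in range(offset, len(s) - 2, 3)]
            frames[f"{strand}{offset + 1}"] = codons
    return frames


def is_open(codons: list) -> bool:
    """True if the codon list contains no stop codon."""
    return all(codon not in STOP_CODONS for codon in codons)
```

For the sequence ATGAAATAG, frame +1 reads ATG AAA TAG and is closed by the terminal stop codon, while frame +3 reads GAA ATA and contains no stop codon.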
[00070] The term "operably linked" refers to functional linkage between two sequences such that one sequence modifies the behavior of the other. For example, a first polynucleotide comprising a nucleic acid expression control sequence (such as a promoter, IRES sequence, enhancer or array of transcription factor binding sites) and a second polynucleotide are operably linked if the first polynucleotide affects transcription and/or translation of the second polynucleotide. Similarly, a first amino acid sequence comprising a secretion signal or a subcellular localization signal and a second amino acid sequence are operably linked if the first amino acid sequence causes the second amino acid sequence to be secreted or localized to a subcellular location.
[00071] A "promoter" means a nucleic acid sequence sufficient to direct transcription of an operably linked nucleic acid molecule. Also included in this definition are those transcription control elements (for example, enhancers) that are sufficient to render promoter-dependent gene expression controllable in a cell type-specific, tissue-specific, or temporal-specific manner, or that are inducible by external signals or agents; such elements, which are well-known to skilled artisans, may be found in a 5' or 3' region of a gene or within an intron. Desirably, a promoter is operably linked to a nucleic acid sequence, for example, a cDNA or a gene sequence, or an effector RNA coding sequence, in such a way as to enable expression of the nucleic acid sequence, or a promoter is provided in an expression cassette into which a selected nucleic acid sequence to be transcribed can be conveniently inserted.
[00072] "Polymerase" refers to an enzyme that performs template-directed synthesis of polynucleotides. The term encompasses both the full-length polypeptide and a domain that has polymerase activity.
[00073] "Processivity" refers to the ability of a polymerase to remain bound to the template or substrate and perform DNA synthesis. Processivity is measured by the number of catalytic events that take place per binding event.
[00074] "Polymerase chain reaction" or "PCR" refers to a method whereby a specific segment or subsequence of a target double-stranded DNA is amplified in a geometric progression. PCR is well known to those of skill in the art; see, e.g., U.S. Pat. Nos. 4,683,195 and 4,683,202; and PCR Protocols: A Guide to Methods and Applications, Innis et al., eds, 1990. Exemplary PCR reaction conditions typically comprise either two or three step cycles. Two step cycles have a denaturation step followed by a hybridization/elongation step. Three step cycles comprise a denaturation step followed by a hybridization step followed by a separate elongation step.
[00075] "Long PCR" refers to the amplification of a DNA fragment of 5 kb or longer in length. Long PCR is typically performed using specially-adapted polymerases or polymerase mixtures (see, e.g., U.S. Pat. Nos. 5,436,149 and 5,512,462) that are distinct from the polymerases conventionally used to amplify shorter products.
[00076] A "primer" refers to a polynucleotide sequence that hybridizes to a sequence on a target nucleic acid and serves as a point of initiation of nucleic acid synthesis. Primers can be of a variety of lengths and are often less than 50 nucleotides in length, for example 12-30 nucleotides in length. The length and sequences of primers for use in PCR can be designed based on principles known to those of skill in the art, see, e.g., Innis et al., supra.
[00077] The term "polymerase primer/template binding specificity" refers to the ability of a polymerase to discriminate between correctly matched primer/templates and mismatched primer/templates. An "increase in polymerase primer/template binding specificity" in this context refers to an increased ability of a variant modified fusion polymerase of the invention to discriminate between matched and mismatched primer/templates in comparison to a wildtype polymerase fusion protein.
[00078] An "improved polymerase" includes a modified polymerase and/or a sequence-non-specific double-stranded DNA binding domain joined to the polymerase or polymerase domain.
[00079] A "modified DNA polymerase", "modified DNA polymerase variant" or "DNA polymerase variant" refers to a DNA polymerase comprising one or more mutations that modulate one or more activities of the DNA polymerase including, but not limited to, DNA polymerization activity, base analog detection activities, 3'-5' or 5'-3' exonuclease activities, processivity, improved nucleotide analog incorporation activity, proofreading, fidelity, efficiency, specificity, thermostability and intrinsic hot start capability or decreased DNA polymerization at room temperature, decreased amplification slippage on templates with trinucleotide repeat stretches or homopolymeric stretches, decreased amplification cycles, decreased extension times, reduced sensitivity to inhibitors (e.g., high salt, nucleic acid purification reagents), altered optimal reaction conditions (e.g., pH, KCl) and a decrease in the amount of polymerase needed for the applications described.
[00080] PCR "sensitivity" refers to the ability to amplify a target nucleic acid that is present in low concentration. "Low concentration" refers to 10^4, often 10^3, 10^2, 10^1, or fewer, copies of the target sequence per microliter in the nucleic acid sample to be amplified.
[00081] The terms "polypeptide," "peptide" and "protein" are used interchangeably herein to refer to a polymer of amino acid residues. The terms apply to amino acid polymers in which one or more amino acid residue is an artificial chemical mimetic of a corresponding naturally occurring amino acid, as well as to naturally occurring amino acid polymers and non-naturally occurring amino acid polymers.
[00082] The term "selectable marker" means a polynucleotide segment that allows one to select for or against a molecule or a cell that contains it, often under particular conditions. These markers can encode an activity, such as, but not limited to, production of RNA, peptide, or protein, or can provide a binding site for RNA, peptides, proteins, inorganic and organic compounds or compositions. Examples of selectable markers include but are not limited to: (1) DNA segments that encode products which provide resistance against otherwise toxic compounds (e.g., antibiotics); (2) DNA segments that encode products which are otherwise lacking in the recipient cell (e.g., tRNA genes, auxotrophic markers); (3) DNA segments that encode products which suppress the activity of a gene product; (4) DNA segments that encode products which can be readily identified (e.g., phenotypic markers such as beta-galactosidase, green fluorescent protein (GFP), and cell surface proteins); (5) DNA segments that bind products which are otherwise detrimental to cell survival and/or function; (6) DNA segments that otherwise inhibit the activity of any of the DNA segments described in Nos. 1-5 above (e.g., antisense oligonucleotides); (7) DNA segments that bind products that modify a substrate (e.g., restriction endonucleases); (8) DNA segments that can be used to isolate a desired molecule (e.g., specific protein binding sites); (9) DNA segments that encode a specific nucleotide sequence which can be otherwise non-functional (e.g., for PCR amplification of subpopulations of molecules); and/or (10) DNA segments, which when absent, directly or indirectly confer sensitivity to particular compounds.
[00083] Sequence identity can be determined by aligning sequences using algorithms such as BESTFIT, FASTA, and TFASTA in the Wisconsin Genetics Software Package Release 7.0 (Genetics Computer Group, 575 Science Dr., Madison, Wis.), using default gap parameters, or by inspection, and selecting the best alignment (i.e., the one resulting in the highest percentage of sequence similarity over a comparison window). Percentage of sequence identity is calculated by comparing two optimally aligned sequences over a window of comparison, determining the number of positions at which identical residues occur in both sequences to yield the number of matched positions, dividing the number of matched positions by the total number of matched and mismatched positions not counting gaps in the window of comparison (i.e., the window size), and multiplying the result by 100 to yield the percentage of sequence identity. Unless otherwise indicated the window of comparison between two sequences is defined by the entire length of the shorter of the two sequences.
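The percentage calculation in [00083] can be illustrated with a minimal Python sketch that operates on two sequences that have already been aligned. The function name percent_identity and the use of "-" as the gap character are assumptions made for this example; the alignment step itself (for example by BESTFIT or FASTA) is not shown.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over a pre-aligned pair of sequences.

    Columns where either sequence has a gap are not counted, so the
    denominator is the number of matched plus mismatched positions.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matched = 0
    compared = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # gapped column: excluded from the window size
        compared += 1
        if a == b:
            matched += 1
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return 100.0 * matched / compared
```

For the aligned pair ACGT-A and ACCTGA, five columns are free of gaps and four of them match, so the sketch returns 80.0.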
[00084] The term "translation" refers to the process by which a polypeptide is synthesized by a ribosome 'reading' the sequence of a polynucleotide.
[00085] "Thermally stable polymerase" as used herein refers to any enzyme that catalyzes polynucleotide synthesis by addition of nucleotide units to a nucleotide chain using DNA or RNA as a template and has an optimal activity at a temperature above 45°C, or retains at least 50% of its activity at elevated temperatures, for example above 95°C.
[00086] "Thermostable" refers to an enzyme which is stable to heat, is heat resistant, and functions at high temperatures, e.g., 50 to 100°C, as compared, for example, to a non-thermostable form of an enzyme with a similar activity. For example, thermostable nucleic acid polymerases derived from thermophilic organisms such as P. furiosus, M. jannaschii, A. fulgidus or P. horikoshii are more stable and active at elevated temperatures as compared to a nucleic acid polymerase from E. coli. A representative thermostable nucleic acid polymerase isolated from P. furiosus (Pfu) is described in Lundberg et al., 1991, Gene, 108:1-6. Additional representative temperature stable polymerases include, e.g., polymerases extracted from the thermophilic bacteria Thermus flavus, Thermus ruber, Thermus thermophilus, Bacillus stearothermophilus (which has a somewhat lower temperature optimum than the others listed), Thermus lacteus, Thermus rubens, Thermotoga maritima, or from thermophilic archaea Thermococcus litoralis and Methanothermus fervidus.
[00087] A "temperature profile" refers to the temperature and lengths of time of the denaturation, annealing and/or extension steps of a PCR or cycle sequencing reaction. A temperature profile for a PCR or cycle sequencing reaction typically consists of 10 to 60 repetitions of similar or identical shorter temperature profiles; each of these shorter profiles may typically define a two-step or three-step cycle. Selection of a temperature profile is based on various considerations known to those of skill in the art, see, e.g., Innis et al., supra. In a long PCR reaction as described herein, the extension time required to obtain an amplification product of 5 kb or greater in length is reduced compared to conventional polymerase mixtures.
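One way to picture a temperature profile as defined in [00087] is as a short cycle of steps repeated a fixed number of times. The Python sketch below is illustrative only: the class names Step and TemperatureProfile are hypothetical, and the annealing and extension temperatures and all step durations are placeholder values that do not come from this specification.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Step:
    name: str
    temp_c: float
    seconds: int


@dataclass(frozen=True)
class TemperatureProfile:
    cycle: Tuple[Step, ...]  # the shorter profile (two-step or three-step cycle)
    repetitions: int         # number of times the cycle is repeated

    def total_seconds(self) -> int:
        """Total cycling time, ignoring ramp times between steps."""
        return self.repetitions * sum(step.seconds for step in self.cycle)


# Hypothetical three-step cycle; all values are placeholders for illustration.
three_step = TemperatureProfile(
    cycle=(
        Step("denaturation", 94.0, 30),
        Step("annealing", 55.0, 30),
        Step("extension", 72.0, 60),
    ),
    repetitions=30,
)
```

A two-step cycle would be represented the same way, with a single combined hybridization/elongation step in place of the separate annealing and extension steps.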
[00088] A "template" refers to a double stranded polynucleotide sequence that comprises the polynucleotide to be amplified, flanked by primer hybridization sites. Thus, a "target template" comprises the target polynucleotide sequence flanked by hybridization sites for a 5' primer and a 3' primer.
[00089] The term "vector" or "DNA vector" or "gene transfer vector" refers to a polynucleotide that is used to perform a "carrying" function for another polynucleotide. For example, vectors are often used to allow a polynucleotide to be propagated within a living cell, or to allow a polynucleotide to be packaged for delivery into a cell, or to allow a polynucleotide to be integrated into the genomic DNA of a cell. A vector may further comprise additional functional elements, for example it may comprise a transposon.
[00090] For purposes of classifying amino acid substitutions as conservative or nonconservative, amino acids are grouped as follows: Group I (hydrophobic side chains): met, ala, val, leu, ile; Group II (neutral hydrophilic side chains): cys, ser, thr; Group III (acidic side chains): asp, glu; Group IV (basic side chains): asn, gln, his, lys, arg; Group V (residues influencing chain orientation): gly, pro; and Group VI (aromatic side chains): trp, tyr, phe. Conservative substitutions involve substitutions between amino acids in the same class. Non-conservative substitutions constitute exchanging a member of one of these classes for a member of another.
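The grouping in [00090] lends itself to a simple lookup. The following Python sketch encodes the six groups with standard three-letter amino acid abbreviations and reports whether a substitution stays within one group; the names GROUPS, RESIDUE_TO_GROUP and is_conservative are hypothetical and chosen only for this example.

```python
# Groups I-VI as listed in the text, using standard three-letter abbreviations.
GROUPS = {
    "I": {"met", "ala", "val", "leu", "ile"},      # hydrophobic side chains
    "II": {"cys", "ser", "thr"},                   # neutral hydrophilic side chains
    "III": {"asp", "glu"},                         # acidic side chains
    "IV": {"asn", "gln", "his", "lys", "arg"},     # basic side chains
    "V": {"gly", "pro"},                           # residues influencing chain orientation
    "VI": {"trp", "tyr", "phe"},                   # aromatic side chains
}

# Invert the table so each residue maps to its group label.
RESIDUE_TO_GROUP = {
    residue: group for group, members in GROUPS.items() for residue in members
}


def is_conservative(original: str, replacement: str) -> bool:
    """True if both residues fall in the same group (case-insensitive)."""
    return RESIDUE_TO_GROUP[original.lower()] == RESIDUE_TO_GROUP[replacement.lower()]
```

Under this table a leu-to-ile substitution is conservative (both in Group I), whereas an asp-to-lys substitution is non-conservative (Group III to Group IV).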
[00091] Positions in a variant sequence are assigned the same numbers as the aligned positions in the corresponding reference sequence.
DESCRIPTION
SUBSTITUTION SELECTION METHODS
[00092] There are multiple methods for identifying amino acid residues within a protein whose substitution may modify the protein's function. One class of methods is alignment-based, and involves aligning a set of sequences, including naturally occurring sequences. This alignment may be used to derive a phylogenetic relationship between the sequences. This relationship can also be used to calculate conservation properties for each amino acid at each position in the protein. A second class of methods is structural, in which substitutions are tested computationally for their effect upon a known or calculated protein structure. All these methods yield quantitative measures of the predicted favorability of replacing an amino acid with a different amino acid. The favorabilities predicted by one method may be different from the favorabilities predicted by a different method, so it is often desirable to combine the results from different methods.
[00093] In one example of an alignment-based method, amino acid residue positions in a reference sequence are compared with the same position in an alignment of homologous sequences. Substitutions at positions that exhibit a high degree of variance among homologs may have a high probability of being active. One method of calculating the degree of amino acid variance is described by Gribskov (1987) Proc Natl Acad Sci USA 84, 4355. In some instances, a sequence alignment can serve as the basis of a Hidden Markov model that can be used to calculate the probability that one specific residue will be followed by a second specific residue. These models also include probabilities for gaps and insertions. See Krogh, "An introduction to Hidden Markov models for biological sequences," in Computational Methods in Molecular Biology, Salzberg et al., eds, Elsevier, Amsterdam. Such models can be used to calculate the probability that a particular substitution will be functional.
[00094] In another example of an alignment-based method, substitutions may be identified based upon the consensus sequence of the alignment.
[00095] Homologous sequences are often analogous functionally and structurally, although having been subjected separately to different selective pressures they are also likely to be optimized differently. Amino acids that differ between homologous sequences thus provide a guide to substitutions that are likely to yield functional though different proteins. Alignment of homologous sequences can therefore be used to identify candidate substitution positions. In one approach, homologous protein sequences may be aligned (e.g., by using clustalw; Thompson et al. (1994) Nucleic Acids Res 22: 4673-80) and then a phylogenetic tree reconstructed. Conservation indices can then be calculated for each site (e.g., Dopazo (1997) Comput Appl Biosci 13: 313-7) and the information content calculated for each site (e.g., Zhang (2002) J Comput Biol 9: 487-503). These scores can be exhaustively calculated for every position in the protein. The scores reflect the extent of tolerance to substitutions in the protein at each position. The scores can be normalized using the phylogenetic tree to eliminate bias in the homolog sequences found in databases (e.g., ease of access to certain template DNAs results in sequences from certain classes of organisms dominating the database). Scores for a given alignment can also be normalized to have an average value of 0.0 and a standard deviation of 1.0, or other standard procedures can be used to compare and combine scores from multiple methods. These values can then be used directly as a score. For example, all sites with a score above a certain threshold value can be selected, or all sites with a score below a certain threshold value can be eliminated. Alternatively, the most variable (e.g., least conserved) sites can be selected by ranking the sites in order of these scores, or the least variable (e.g., most conserved) sites can be eliminated by ranking the sites in order of these scores.
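The normalization and selection steps described above may be sketched as follows. This is a minimal Python illustration, assuming per-site scores are already available as a mapping from position number to score; the function names and the example positions are hypothetical and are not taken from the specification.

```python
from statistics import mean, pstdev


def normalize(scores):
    """Rescale per-site scores to mean 0.0 and standard deviation 1.0."""
    values = list(scores.values())
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation over all sites
    if sigma == 0:
        raise ValueError("all sites have the same score; nothing to normalize")
    return {pos: (s - mu) / sigma for pos, s in scores.items()}


def select_above(normalized, threshold):
    """Keep every site whose normalized score exceeds the threshold."""
    return sorted(pos for pos, z in normalized.items() if z > threshold)


def top_n(normalized, n):
    """Alternatively, rank sites by score and keep the n highest."""
    return sorted(normalized, key=normalized.get, reverse=True)[:n]


# Made-up variability scores for three positions, for illustration only.
example = normalize({10: 1.0, 11: 2.0, 12: 3.0})
```

Once scores from different methods are on the same scale, they can be compared or combined, for example by summing the normalized values per site.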
[00096] Amino acid diversity and tolerance at each site can be measured as a fitness property of each amino acid at every location. The most fit residue for that position carries a higher value (e.g., Koshi et al. (2001) Pac Symp Biocomput 191-202; O. Soyer, M.W. Dimmic, R.R. Neubig, and R.A. Goldstein; Pacific Symposium on Biocomputing 7:625-636 (2002)). Sites can be grouped into site-classes or treated independently. Sites and site classes most fit to change based on the substitution rate and the substitutions most favorable based on the fitness can be selected. These values of fitness may then be used directly as a score. All sites with a score above a certain threshold value could then be selected; for example, a cutoff (threshold) of 0.0 can be chosen (when the normalization of scores sets the wild type residue found in the reference to be 0.0). Alternatively, all sites with a score below a certain threshold value could be eliminated. Substitutions with values of 0.0 or below can be eliminated, thereby including only amino acid changes that have a higher fitness value than the reference wild type amino acid found in that position. Alternatively, the sites most tolerant to change could be selected by ranking the sites in order of these scores. For example, in the study of G-protein coupled receptors (GPCR) by Soyer et al. (O. Soyer, M.W. Dimmic, R.R. Neubig, and R.A. Goldstein; Pacific Symposium on Biocomputing 7:625-636 (2002)), using the 8-site class model the class #8 was identified to have the highest substitution rate and the property correlating with fitness of amino acids at these positions was identified to be "charge transfer" propensity of the amino acid. In the present invention, amino acids in the sites that carry a higher relative fitness compared to the wild type amino acid found in that position are identified as suitable for substitution.
[00097] Another example of an alignment based model is the use of a substitution matrix. A substitution matrix represents the probability of one amino acid being replaced by a second amino acid across a set of positions within a set of proteins. The matrix can be expressed in terms of probabilities or values derived from probabilities by mathematical transformation involving probabilities of transitions or substitutions (Pij) and observed frequencies of amino acids (Fi). Matrices using such transformations include scoring matrices such as PAM100, PAM250, and BLOSUM. Substitution matrices are derived from pairwise alignments of protein homologs from sequence databases. They constitute estimates of the probability that one amino acid will be changed to another while conserving function. Different substitution matrices are calculated from different sets of sequences. For example, they can be based on the structural environment of a residue (Overington (1992) Genet Eng (N Y) 14: 231-49; and Overington et al. (1992) Protein Sci 1: 216-26) or on additional factors including secondary structure, solvent accessibility, and residue chemistry (Luthy et al. (1992) Nature 356: 83-5). Different substitution matrices are calculated by selecting the protein families used to create the matrix, as well as the positions considered. A substitution matrix that best captures the observed sequences in the protein family of interest can be calculated using the Bayesian method developed by Goldstein et al. (Koshi et al. (1995) Protein Eng 8: 641-645) and used to score all candidate substitutions. Alternatively, a substitution or scoring matrix calculated by considering homologous proteins from many different protein families (e.g., Benner et al. (1994) Protein Eng 7: 1323-1332; and Tomii et al. (1996) Protein Eng 9: 27-36) can be used to score all candidate substitutions. Matrices derived from a variety of proteins are often used to evaluate and confirm homology of protein sequences and represent an approximation of protein evolution in general.
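One common transformation of this kind is a log-odds score. The Python sketch below is illustrative only and assumes the form S(i,j) = log2(Pij / (Fi x Fj)), where Pij is the observed frequency of residues i and j being aligned and Fi is the frequency of residue i; the text does not specify which transformation is used, and the function name and toy input are hypothetical.

```python
import math
from collections import Counter


def log_odds_matrix(aligned_pairs):
    """Build a symmetric log-odds scoring table from aligned residue pairs.

    Assumed form (not specified in the text): S[i, j] = log2(P_ij / (F_i * F_j)).
    """
    pair_counts = Counter()
    for a, b in aligned_pairs:
        # Count each aligned pair in both orientations so the table is symmetric.
        pair_counts[(a, b)] += 1
        pair_counts[(b, a)] += 1
    total = sum(pair_counts.values())
    p = {pair: count / total for pair, count in pair_counts.items()}

    # Marginal residue frequencies F_i derived from the pair frequencies.
    f = Counter()
    for (a, _), prob in p.items():
        f[a] += prob

    return {(a, b): math.log2(prob / (f[a] * f[b])) for (a, b), prob in p.items()}


# Toy aligned pairs (made up for illustration): L and I mostly align with themselves.
toy_pairs = [("L", "L")] * 6 + [("L", "I")] * 2 + [("I", "I")] * 2
toy_scores = log_odds_matrix(toy_pairs)
```

A positive score indicates a pairing observed more often than expected by chance from the residue frequencies alone; a negative score indicates a pairing observed less often.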
[00098] Another example of an alignment based model is the reconstruction of ancestral sequences. Evolutionary relationships between homologous sequences can be derived in the form of phylogenetic trees. Using evolutionary models, ancestral sequences can be probabilistically reconstructed. See, for example, Koshi and Goldstein (1996) J. Mol. Evol. 42, 313-320. Coupled with knowledge of functions of proteins, evolutionary analysis will also identify amino acid changes that occur in functionally distinct groups. See, for example, Zhang and Rosenberg (2002) PNAS 99, 5486-5491. Comparison of the rates of synonymous (Ks) versus non-synonymous substitutions (Ka) can also be used to quantify (e.g., using the Ka/Ks ratio) the type and degree of evolutionary constraint on substitutions. See, for example, Benner et al. (2000) Res Microbiol 151, 97-106. Here, Ka/Ks >1 means adaptive evolution and Ka/Ks <0.1 is observed for purifying selection. Methods to detect positive selection at single amino acid sites in order to infer residues critical for adaptation to new functions can also be applied. See, for example, Suzuki and Gojobori (1999) Mol. Biol. Evol. 16, 1315-1328. Together these analyses allow for the identification of functionally important conservations and changes, even those distant from an active site. Consider the case in which the function of a protein is dependent on the fact that the identity of a residue at a particular position in the protein is not altered. In such instances, the codon for this residue in the gene for the protein will tend to encode the same amino acid throughout the phylogenetic tree (synonymous substitutions, high Ks). On the other hand, when the function of a protein is capable of tolerating different amino acids at a particular position, then alterations of the corresponding codon in the gene will more frequently encode different amino acids throughout the phylogenetic tree (non-synonymous substitutions, Ka comparable with Ks). Thus, the ratio of the frequency with which a site is replaced by a synonymous codon to the frequency with which it is replaced by a non-synonymous codon in a reconstructed phylogenetic tree provides a measure of the selective pressure (on the function of the protein) acting to conserve the identity of the amino acid at that position. Often these ratios are calculated as averages for entire sequences. However, such ratios can also be limited to specific sites or groups of positions. These ratios can also be used to weight substitutions identified by other methods from a specific homolog.
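By way of illustration, the Ka/Ks interpretation given above can be written as a small Python function. The cutoffs of 1 and 0.1 are those stated in the text; the label "intermediate" for ratios between the two cutoffs, and the function name itself, are assumptions made for this sketch.

```python
def classify_selection(ka, ks):
    """Return (Ka/Ks ratio, label) using the cutoffs stated in the text.

    ka: rate of non-synonymous substitutions; ks: rate of synonymous substitutions.
    The 'intermediate' label is an assumption for ratios between the two cutoffs.
    """
    if ks <= 0:
        raise ValueError("Ks must be positive to form the Ka/Ks ratio")
    ratio = ka / ks
    if ratio > 1.0:
        label = "adaptive"      # Ka/Ks > 1: adaptive evolution
    elif ratio < 0.1:
        label = "purifying"     # Ka/Ks < 0.1: purifying selection
    else:
        label = "intermediate"  # neither cutoff met
    return ratio, label
```

The same function can be applied to whole-sequence averages or to rates estimated for individual sites or groups of positions.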
[00099] Another example of an alignment-based method for identifying amino acid substitutions that are most important in differentiating protein function is a dimension-reducing technique such as principal component analysis (PCA). This has been previously described (e.g., Casari et al., 1995, Nat Struct Biol 2: 171-178; Gogos et al., 2000, Proteins 40: 98-105; and del Sol Mesa et al., 2003, J Mol Biol 326: 1289-1302). PCA can identify sequence features and substitutions corresponding to the desired phenotype of the protein, and the scores ("loads") for these features in the direction of the desired phenotype are used as absolute scores or as filters to identify substitutions.
[000100] In an example of a structure-based method, the structures of many proteins and their variants are also available in the RCSB protein data bank (see (2002) Acta Cryst. D 58 (6:1), pp. 899-907; and Structural Bioinformatics (2003), P. E. Bourne and H. Weissig, Hoboken, NJ, John Wiley & Sons, Inc., pp. 181-198). The availability of structures can help identify amino acid changes that affect protein function. One way in which they can be used to do so is to avoid changes to a protein that will not be structurally tolerated. Changes computed in silico using energy functions and force fields correlate with experimentally measured free energy changes in the stabilities of proteins. See, for example, Privalov et al. (1988) Adv Protein Chem 39: 191-234; Lee (1993) Protein Sci 2: 733-8; Freire (2001) Methods Mol Biol 168: 37-68; and Guerois et al. (2002) J Mol Biol 320: 369-87. Therefore, candidate amino acid changes can be computationally modeled in the structure(s) and changes in the free energy computed. These computationally calculated changes in free energies resulting from the substitutions can then be used directly as a score. Alternatively, all changes can be selected that increase the free energy of the protein by less than a certain value. For example, all changes that would increase the free energy by less than 1 kcal/mol can be selected, all changes that would increase the free energy by less than 1.5 kcal/mol can be selected, all changes that would increase the free energy by less than 2 kcal/mol can be selected, or all changes that would increase the free energy by less than 2.5 kcal/mol can be selected.
[000101] In another structure-based method, multiple changes can be modeled into the structure(s) computationally and changes in the free energies resulting from the substitutions computed. These free energy values can be used to identify changes that are "valid" independently, but not together. Amino acid changes that are independent can be selected preferentially. Amino acid clashes that yield a higher free energy when compared to the free energies produced by modeling changes separately can be eliminated.
[000102] In another structure-based method, regions of the protein that differ structurally between homologs are considered more likely to tolerate change, while those regions that are structurally conserved are likely to be less tolerant. Structures can be directly obtained from the database or predicted using various structure modeling software packages. Structures of homologs and mutants can be superposed on the wild type structure. See, for example, May et al. (1994) Protein Eng 7: 475-85; and Ochagavia et al. (2002) Bioinformatics 18: 637-40. Structural conservation can be calculated as the root mean squared (RMS) deviations of the backbones of the superposed chains. This can be computed as the deviations of individual residues, or more preferably as the deviations of a running average stretch of the backbone (for example from two to ten backbone residues) between the target protein and one or more homologous proteins. These computationally calculated RMS deviations for every position between homologous structures can then be used directly as a score. In one example, RMS deviations between the alpha carbons (or backbone atoms) in the structure of the target protein and one or more homologous proteins that are greater than a threshold value can be considered structurally labile and these sites can be selected. This threshold RMS deviation between homologous structures can be greater than 2 Å, 2.5 Å, 3 Å, 3.5 Å, 4 Å, 4.5 Å, or 5 Å. Alternatively, RMS deviations between the alpha carbons in the structure of the target protein and one or more homologous proteins that are less than a threshold value can be considered structurally conserved and these sites can be eliminated. This threshold RMS deviation between homologous structures can be less than 2 Å, 2.5 Å, 3 Å, 3.5 Å, 4 Å, 4.5 Å, or 5 Å.
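The running-average deviation described above may be sketched as follows. This Python illustration assumes the two structures have already been superposed and that alpha-carbon coordinates (in Å) are supplied as equal-length lists of aligned (x, y, z) tuples; the function names, the window size and the toy coordinates are hypothetical.

```python
import math


def per_residue_deviation(coords_a, coords_b):
    """Distance between aligned alpha carbons of two superposed structures."""
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must be aligned and of equal length")
    return [math.dist(p, q) for p, q in zip(coords_a, coords_b)]


def windowed_rms(deviations, window=3):
    """RMS deviation over a sliding stretch of consecutive backbone residues."""
    return [
        math.sqrt(sum(d * d for d in deviations[i:i + window]) / window)
        for i in range(len(deviations) - window + 1)
    ]


def labile_windows(rms_values, threshold=2.0):
    """Indices of windows whose RMS deviation exceeds the threshold (in Å)."""
    return [i for i, value in enumerate(rms_values) if value > threshold]


# Toy coordinates (made up): the last two residues diverge between the structures.
target = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)]
homolog = [(0, 0, 0), (1, 0, 0), (2, 3, 0), (3, 4, 0)]
rms = windowed_rms(per_residue_deviation(target, homolog), window=2)
```

Windows returned by labile_windows would correspond to stretches considered structurally labile; the complement would correspond to structurally conserved stretches that can be eliminated.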
[000103] In another structure-based method, changes near catalytic and binding sites are highly likely to influence the activity of the protein and are good candidates for substitution. All amino acid substitutions that are found in one or more homologs can be tested for proximity to a binding or catalytic or regulatory site of the protein. The distance of an amino acid substitution from a binding or catalytic or regulatory site, in one or more homologs, can be used directly as a score. Alternatively, all amino acid substitutions that are found in one or more homologs and that are within a threshold distance of a binding or catalytic or regulatory site in the protein can be selected. This threshold distance can be less than 2 Å, 2.5 Å, 3 Å, 3.5 Å, 4 Å, 4.5 Å, 5 Å, 5.5 Å, 6 Å, 6.5 Å, or 7 Å. Alternatively, all amino acid substitutions that are found in one or more homologs and that are beyond a threshold distance of a binding or catalytic or regulatory site in the protein can be eliminated. This threshold distance can be more than 2 Å, 2.5 Å, 3 Å, 3.5 Å, 4 Å, 4.5 Å, 5 Å, 5.5 Å, 6 Å, 6.5 Å, or 7 Å. In still other alternatives, all amino acid substitutions that are found in one or more homologs can be ranked in order of proximity to a binding or catalytic or regulatory site in the protein and those that are closest to the binding or catalytic or regulatory site selected. For example, the substitution closest to the binding or catalytic or regulatory site can be selected, or between 2 and 20, between 10 and 100, or the top 200 substitutions closest to the binding or catalytic or regulatory site can be selected. In still other alternatives, all amino acid substitutions that are found in one or more homologs can be ranked in order of proximity to a binding or catalytic or regulatory site in the protein and those that are farthest from the binding or catalytic or regulatory site eliminated. For example, the substitution farthest from the binding or catalytic or regulatory site can be eliminated. In some embodiments, between 2 and 20, between 10 and 100, or the top 200 substitutions farthest from the binding or catalytic or regulatory site can be eliminated.
[000104] A preferred method uses infologs to overcome limitations in currently used strategies based on empirical screening of libraries of designed random mutants. Infologs are designed variants of a given gene, for example a polymerase, where substitutions are systematically incorporated to achieve high information content, enabling modern machine learning tools to de-convolute sequence-activity relationships. The use of infologs is the basis of our rational approach to protein engineering in which a matrix of well-defined amino acid substitutions is used to map the targeted fitness landscape. Using this approach, libraries of < 100 mutants (typically 96) are characterized for properties/activities of interest and serve as a basis for machine-learning tools to design the next generation of infologs. The initial set of infologs is designed to have the same number of substitutions (approximately 3), thereby probing regions at the same Hamming distance from the reference locus in sequence space. Substitutions in the mutants are selected from a pool of substitutions and each set of infologs contains several variants with the same substitution, albeit in the presence of two completely different mutations, thereby providing us with the ability to characterize the amino acid change with respect to its additivity and context dependence. The functional consequences of individual substitutions can be modeled and quantitatively evaluated. For example, an infolog library based on 59 amino-acid substitutions in a tau class glutathione transferase (GST) from wheat afforded increased activity against most of a number of herbicides tested (Govindarajan et al., 2015).
SELECTED POLYMERASE SUBSTITUTIONS
[000105] SEQ ID NO: 128971 was used to identify homologs from the GenBank database of non-redundant protein sequences using the BLAST program. The list of homologs used for identifying substitutions may not be limited to these homologs. The multiple sequence alignment of all homologs and SEQ ID NO: 1 was obtained using the clustalw program and a phylogenetic tree was constructed. The alignment was used to enumerate possible changes that can be made to SEQ ID NO: 1 that are seen in the alignments. DNA polymerase substitutions (relative to SEQ ID NO: 1) identified using combinations of the methods described above are shown in Tables 1 and 2. These changes were then scored based on the pattern of convergence and divergence on the tree and ranked for adaptability score (Liao et al., 2007; Ehren et al., 2008). The top 57 substitutions were chosen to be included in the first Infolog set. For the second set of infologs, model weights from the first Infolog set (Table 5) were used to select 24 substitutions. For the third Infolog set, SEQ ID NO: 117 (198576 R2V19), identified in round 2, was used as the base sequence on which a subset of 10 substitutions from the round 2 infolog set, chosen based on model weights (Table 5), and an additional 25 substitutions selected from Tables 1 and 2 were used to build variants (Table 4) with one or more combinations of these selected substitutions.
[000106] DNA polymerase variants were synthesized to incorporate the amino acid substitutions shown in Table 3. Specific activities of the synthesized polymerase variants were determined experimentally. The specific activities were individually modeled as a function of the substitutions by linear regression. The results provide relative weights of each substitution for activity and other properties such as thermostability and amplification of >3 kb target DNA (Table 5). Mutations that showed positive weights for selected activity profiles were used in different combinations with other substitutions to construct a library of polymerase variants. Sets of variants were designed to incorporate selected substitutions a uniform number of times. The substitutions were also distributed within a variant set so that the number of unique pairs of substitutions is high, which ensures that different substitutions are tested in a wide variety of contexts. The specific combinations of substitutions in the 3 variant sets (relative to SEQ ID NO: 1) are shown in Table 4. Amino acid sequences are given as SEQ ID NOS: 3-242.
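The regression step described above can be illustrated with a minimal ordinary least squares sketch in Python. It assumes a purely additive model (activity = baseline + sum of weights of the substitutions present); the actual model used to produce Table 5 is not specified beyond "linear regression". The function name is hypothetical, and the activity values in the usage example are made up for illustration and are not measured data.

```python
def fit_substitution_weights(variants, activities, substitutions):
    """Least-squares fit of activity ~ baseline + sum of per-substitution weights.

    variants: list of sets of substitution names present in each variant.
    Returns (baseline, {substitution: weight}). Assumes an additive model.
    """
    # Design matrix: leading 1 for the baseline, then 0/1 presence flags.
    x = [[1.0] + [1.0 if s in v else 0.0 for s in substitutions] for v in variants]
    n = len(substitutions) + 1
    # Normal equations (X^T X) w = X^T y, stored as an augmented matrix.
    a = [
        [sum(row[i] * row[j] for row in x) for j in range(n)]
        + [sum(row[i] * y for row, y in zip(x, activities))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("design is rank deficient; more variants are needed")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [u - factor * v for u, v in zip(a[r], a[col])]
    w = [a[i][n] / a[i][i] for i in range(n)]
    return w[0], dict(zip(substitutions, w[1:]))


# Illustrative input only: substitution names appear elsewhere in this document,
# but the activity values below are invented for the example.
subs = ["M354I", "L438I", "D468G"]
designs = [set(), {"M354I"}, {"L438I"}, {"D468G"}, {"M354I", "L438I"}, {"L438I", "D468G"}]
toy_activity = [1.0, 1.5, 1.2, 0.7, 1.7, 0.9]
baseline, weights = fit_substitution_weights(designs, toy_activity, subs)
```

Because each substitution appears in several different contexts, a positive fitted weight suggests that the substitution contributes favorably across those contexts; substitutions with positive weights would then be candidates for the next variant set.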
[000107] A nucleic acid encoding a non-natural polymerase, wherein the polymerase has at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.5% sequence identity with SEQ ID NO: 190 retaining the combination of substitutions specified in Table 4 present in SEQ ID NO: 190 or any subset thereof. Additionally, modified polymerases, other than natural sequences, comprising one or more, two or more, three or more, four or more, five or more, six or more, seven or more, eight or more, nine or more, ten or more of the substitutions shown in Table 1, Table 2 or Table 3 may also be used.
[000108] The invention thus provides non-naturally occurring polymerases having the sequence of SEQ ID NO:1 modified by one or more of the substitutions shown in Table 3 and having zero to ten (e.g., 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10) substitutions, deletions or internal substitutions at positions other than those shown in Table 3. The polymerases encoded by such polynucleotides can have at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or up to all of the substitutions present in Table 3. Preferably any modifications at positions other than those shown in Table 3 are conservative substitutions having no significant effect on polymerase activity (i.e., activity indistinguishable from an otherwise identical polymerase without the substitution within experimental error). Non-naturally occurring polymerases encoded by such polynucleotides are also provided.
[000109] The invention further provides polynucleotides encoding any polymerase having the sequence of SEQ ID NO:3-242 or variants having for example at least 85%, 90%, 95%, 99% or 100% identity therewith provided that the combination of substitutions present in the SEQ ID NO. in question specified in Table 4 is retained. Non-naturally occurring polymerases encoded by such polynucleotides are also provided. Any variations present in a sequence other than the substitutions shown in Table 4 for that sequence are preferably conservative substitutions not significantly affecting activity or substitutions shown in Table 3. The substitutions present in the preferred polymerase having the sequence of SEQ ID NO:159 are M354I, L438I, D468G and A494S. The substitutions present in the preferred polymerase having the sequence of SEQ ID NO:190 are Y342I, L438I and D468G.
[000110] The above mentioned polymerases preferably have an enhanced property relative to a base polymerase having the sequence of SEQ ID NO:1 measured under the same conditions. The enhanced property can be at least one of enhanced polymerase activity, enhanced reverse transcriptase activity, enhanced thermostability, enhanced strand-displacement activity, or any combination thereof, including all of these properties. An activity is considered enhanced if the change is beyond experimental error (in other words, statistically significant). The enhancement, if quantifiable, can be, for example, at least 10% to 1000%, including for example 10-500%, 10-200%, 10-100%, 20-1000%, 20-500%, 20-100%, and 20-50%.
[000111] Modified polymerases (e.g., SEQ ID NOS: 3-242) fused to various accessory proteins called processivity factors may also be used. Processivity factors assist polymerases in various ways, some by forming complexes with the polymerase itself (for example, thioredoxin) and some by encircling duplex DNA (for example, clamp protein), thereby ensuring a strong, stable binding with the template (Zhuang and Ai 2010). A stable association of the polymerase with the template DNA is crucial for the unfettered incorporation of nucleotides, and therefore, for an efficient PCR reaction. One strategy demonstrating enhanced processivity and improved PCR efficiency is the approach developed and patented by Wang in 2000 and published in 2004 (Wang et al. 2004), where the nonreplicative DNA polymerase was covalently linked to the sequence-nonspecific dsDNA binding protein called Sso7d, obtained from Sulfolobus solfataricus. Sso7d is a small protein (7 kD) capable of binding to dsDNA without any preference for specific sequences. The binding of the Sso7d domain to a DNA polymerase is optimal for sliding smoothly along the template. The covalent linking of the fusion protein with the polymerase does not entail any structural modifications; therefore the fusion does not interfere with the structural integrity and thermal stability and, consequently, with the catalytic activity of the enzyme. Sso7d can be linked to both family A and family B DNA polymerases and can bind with dsDNA at ambient temperature as well as high temperatures. It can therefore help enzymes that are thermostable or otherwise. Studies carried out using the Sso7d fusion polymerase have demonstrated that the fusion protein technique improves processivity without affecting the catalytic activity or thermal stability of the enzyme.
[000112] Processivity factors that help prevent non-specific amplifications, for example use of aptamers or antibodies fused to the modified polymerase (SEQ ID NOS: 3-242) are also used. The aptamer used is attached to the reactive site of the modified polymerase, and thus is inactive at room temperature. When the temperature of the reaction solution is increased to high temperature, the three-dimensional structure of the aptamer is modified so that it is separated from polymerase so as to be active, and the reverse transcription reaction of a specifically primed target RNA can be performed, and afterwards PCR can be performed.
[000113] Other processivity factors comprise sequences encoding polymerase fused to certain protein functional domains. Such protein functional domains can include, but are not limited to, one or more DNA binding domains, one or more nuclear localization signals, one or more flexible hinge regions that can facilitate one or more domain fusions, and combinations thereof. Fusions can be made either to the N-terminus, C-terminus, or internal regions of the polymerase protein so long as polymerase activity is retained. DNA binding domains used can include, but are not limited to, a helix-turn-helix domain, Zn-finger domain, a leucine zipper domain, or a helix-loop-helix domain. Specific DNA binding domains used can include, but are not limited to, a Gal4 DNA binding domain, a LexA DNA binding domain, or a Zif268 DNA binding domain. Nuclear localization signals (NLS) used can include, but are not limited to, consensus NLS sequences, viral NLS sequences, cellular NLS sequences, and combinations thereof. Flexible hinge regions used can include, but are not limited to, glycine/serine linkers and variants thereof.
[000114] A polymerase, other than a naturally occurring polymerase, whose sequence comprises a polypeptide with at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.5% identity to SEQ ID NO: 190 provided the combination of substitutions shown in Table 4 for SEQ ID NO: 190 or a subset thereof is retained. The modified DNA polymerase combines a function of polymerizing DNA using RNA as a template and a function of polymerizing DNA using DNA as a template. The modified polymerases are active on both DNA and RNA templates and can generate amplicons from any source, for example DNA or RNA from bacterial, human or murine hosts and more.
POLYMERASE PROPERTIES
[000115] We have identified substitutions predicted to improve polymerases, for example by improving amplification properties and/or by modulating activity, including both increased processivity compared to an unmodified polymerase and improved primer/template binding specificity. The intrinsic high processivity can result in significant improvements to yield, sensitivity, speed, target length, and the ability to amplify difficult templates, for example AT- and GC-rich templates. Improved primer/template binding specificity can show improved amplification of low abundance template and lower non-target competition. Increased processivity of the identified polymerases can show improved amplification properties for long PCR templates, for example the ability to amplify 2 kb to 50 kb DNA templates generating long PCR amplicons. In addition, the identified polymerases can show a lower error rate than wild-type Taq polymerase with the same or better performance as existing enzymes. The identified polymerases can show higher thermostability without any additives.
[000116] A modified polymerase often exhibits an increase in primer/template specificity in comparison to an unmodified polymerase comprising a wildtype sequence. Primer/template specificity is the ability of an enzyme to discriminate between matched primer/template duplexes and mismatched primer/template duplexes. Specificity can be determined, for example, by comparing the relative yield of two reactions, one of which employs a matched primer, and one of which employs a mismatched primer. An enzyme with increased discrimination has a higher relative yield with the matched primer than with the mismatched primer, i.e., the ratio of the yield in the reaction using the matched primer vs. the reaction using the mismatched primer is about 1 or above. This ratio can then be compared to the ratio obtained in a parallel set of reactions employing a modified polymerase. The modified polymerase(s) typically exhibit at least a 2-fold, often 3-fold or greater increase in the ratio relative to a wild type polymerase.
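The yield-ratio comparison described above can be written out as a short Python sketch. The function names are hypothetical, yields may be in any consistent unit (for example, ng of product), and the numbers in the usage example are invented for illustration.

```python
def specificity_ratio(matched_yield, mismatched_yield):
    """Ratio of product yield with the matched primer to yield with the mismatched primer."""
    if mismatched_yield <= 0:
        raise ValueError("mismatched-primer yield must be positive to form a ratio")
    return matched_yield / mismatched_yield


def fold_increase(modified_yields, wild_type_yields):
    """Fold increase in the specificity ratio of a modified vs. wild type polymerase.

    Each argument is a (matched_yield, mismatched_yield) pair from parallel reactions.
    """
    return specificity_ratio(*modified_yields) / specificity_ratio(*wild_type_yields)


# Invented example: modified enzyme 90 vs 10, wild type 60 vs 20 (arbitrary units).
example_fold = fold_increase((90.0, 10.0), (60.0, 20.0))
```

In this invented example the wild type ratio is 3 and the modified ratio is 9, so the modified polymerase would show a 3-fold increase in the ratio.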
[000117] Specificity can also be measured, for example, in a real-time PCR, where the difference in the Ct (threshold cycle) values (ΔCt) between the fully complementary primer/template and the mismatched primer/template can be used to measure primer/template binding specificity of different enzymes. The Ct value represents the number of cycles required to generate a detectable amount of DNA (a "detectable" amount of DNA is typically 2 times, usually 5 times, 10 times, 100 times or more above background). A polymerase with enhanced specificity may be able to produce a detectable amount of DNA in a smaller number of cycles by more closely approaching the theoretical maximum amplification efficiency of PCR. Accordingly, a lower Ct value reflects a greater amplification efficiency for the enzyme.
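A minimal sketch of the ΔCt comparison is given below. The Ct values are hypothetical, and the conversion of ΔCt to an approximate fold difference assumes a fixed per-cycle amplification efficiency (1.0 meaning perfect doubling), which is an assumption of this sketch rather than something stated in the text.

```python
def delta_ct(ct_mismatched, ct_matched):
    """Difference in threshold cycle between mismatched and matched primer/template."""
    return ct_mismatched - ct_matched


def approximate_fold_discrimination(dct, efficiency=1.0):
    """Convert a delta-Ct into a fold difference, assuming each cycle multiplies
    product by (1 + efficiency). efficiency=1.0 is ideal doubling (an assumption)."""
    return (1.0 + efficiency) ** dct


# Hypothetical Ct values for an unmodified and a modified polymerase.
wt_dct = delta_ct(ct_mismatched=24.0, ct_matched=20.0)
mod_dct = delta_ct(ct_mismatched=27.0, ct_matched=20.0)

print(wt_dct, approximate_fold_discrimination(wt_dct))
print(mod_dct, approximate_fold_discrimination(mod_dct))
```

Under these assumptions a larger ΔCt corresponds to stronger discrimination against the mismatched primer/template.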
[000118] Polymerase processivity can be measured by a variety of methods known to those of ordinary skill in the art. Polymerase processivity is generally defined as the number of nucleotides incorporated during a single binding event of a modifying enzyme to a primed template. For example, a 5'FAM-labeled primer is annealed to circular or linearized ssM13mp18 DNA to form a primed template. In measuring processivity, the primed template usually is present in significant molar excess to the polymerase so that the chance of any primed template being extended more than once by the polymerase is minimized. The primed template is therefore mixed with the polymerase at a ratio such as approximately 4000:1 (primed DNA: DNA polymerase) in the presence of buffer and dNTPs. MgCl2 is added to initiate DNA synthesis. Samples are quenched at various times after initiation, and analyzed on a sequencing gel. At a polymerase concentration where the median product length does not change with time or polymerase concentration, the length corresponds to the processivity of the enzyme. The processivity of a protein of the invention, i.e., a modified polymerase, is then compared to the processivity of the wild-type enzyme (an unmodified polymerase). The modified polymerases of the invention are expected to exhibit increased processivity relative to the unmodified polymerase.
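The median product length referred to above can be estimated from quantified gel bands roughly as sketched here. The band data are hypothetical, and the sketch assumes that band intensity is proportional to the number of product molecules (plausible for an end-labeled primer, since each product carries one label) and that lengths are expressed as nucleotides added to the primer.

```python
def median_extension_length(bands):
    """Return the extension length at which cumulative band intensity first
    reaches half of the total intensity (an intensity-weighted median).

    bands: iterable of (nucleotides_added, intensity) pairs.
    """
    ordered = sorted(bands)
    total = sum(intensity for _, intensity in ordered)
    if total <= 0:
        raise ValueError("total band intensity must be positive")
    running = 0.0
    for length, intensity in ordered:
        running += intensity
        if running >= total / 2:
            return length


# Hypothetical lanes: (nucleotides added, band intensity in arbitrary units).
unmodified_lane = [(20, 10), (40, 30), (60, 40), (80, 20)]
modified_lane = [(40, 10), (80, 20), (120, 40), (160, 30)]

print(median_extension_length(unmodified_lane))
print(median_extension_length(modified_lane))
```

In practice the calculation would be repeated across time points and enzyme concentrations to confirm that the median no longer changes, as the paragraph above requires.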
[000119] Enhanced efficiency can also be demonstrated by measuring the increased ability of an enzyme to produce product. Such an analysis measures the stability of the double-stranded nucleic acid duplex indirectly by determining the amount of product obtained in a reaction. For example, a PCR assay can be used to measure the amount of PCR product obtained with a short primer, e.g., 12 nucleotides in length, annealed at an elevated temperature, for example, 50°C. In this analysis, enhanced efficiency is shown by the ability of a modified polymerase to produce more product in a PCR reaction using the 12-nucleotide primer annealed at 50°C in comparison to an unmodified polymerase.
[000120] Long PCR may be used as another means of demonstrating enhanced efficiency. For example, an enzyme with enhanced efficiency typically allows the amplification of a long amplicon (>5 kb) in a shorter extension time compared to an enzyme with relatively lower efficiency.
[000121] Assays such as salt sensitivity can also be used to demonstrate improvement in efficiency of a processive nucleic acid modifying enzyme of the invention. A modified polymerase of the invention can exhibit increased tolerance to high salt concentrations, i.e., a processive enzyme with increased processivity can produce more product in higher salt concentrations. For example, a PCR analysis can be performed to determine the amount of product obtained in a reaction using a modified polymerase compared to an unmodified polymerase in reaction conditions with high salt, for example, 80 mM.
[000122] The fidelity of a DNA polymerase refers to its ability to accurately replicate a template. High-fidelity PCR utilizing modified DNA polymerase variants that couple low misincorporation rates with proofreading activity to give faithful replication of a DNA target is also contemplated. Modified DNA polymerase variants selected from SEQ ID NOS: 3-242 that show high fidelity and proofreading activity are claimed. Additional variants with proofreading activity and low misincorporation rates are also contemplated.
[000123] Other methods of assessing enhanced efficiency of the improved polymerases of the invention can be determined by those of ordinary skill in the art using standard assays of the enzymatic activity of a given modification enzyme.
[000124] Variants of SEQ ID NO: 1, that is modified polymerases comprising one or more substitution shown in Table 1 are anticipated to possess improved properties, relative to naturally occurring DNA polymerases, conferring higher activity in amplifying sequences from difficult to amplify templates including polynucleotide templates (DNA or RNA) from human cells. Modified polymerase variants comprising one or more substitution shown in Table 2 are anticipated to possess improved properties, relative to naturally occurring DNA polymerases, conferring higher activity in amplifying sequences from difficult to amplify templates including polynucleotide templates from human cells. Modified polymerase variants comprising one or more substitution shown in Table 3 are anticipated to possess improved properties, relative to naturally occurring DNA polymerases, conferring higher activity in amplifying sequences from difficult to amplify templates including polynucleotide templates from human cells. Modified polymerase variants comprising one or more combinations of substitutions from Table 3 relative to SEQ ID NO: 1 are shown in Table 4 and are anticipated to possess improved properties, relative to naturally occurring DNA polymerases, conferring higher activity in amplifying sequences from difficult to amplify templates including polynucleotide templates from human cells. DNA polymerase sequences SEQ ID NO: 3-242 are anticipated to possess improved properties conferring higher activity in amplifying sequences from difficult to amplify templates including polynucleotide templates from human cells. 
A DNA polymerase having at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% sequence identity to the polymerase of SEQ ID NO: 190, retaining each of the substitutions present in SEQ ID NO: 190 shown in Table 4 or any subset thereof, is anticipated to possess improved properties, relative to naturally occurring DNA polymerases, conferring higher activity in amplifying sequences from difficult to amplify templates including polynucleotide templates from human cells.
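The two conditions above (a minimum percent identity and retention of specified substitutions) can be checked along the following lines. This is a sketch under stated assumptions: the sequences are toy strings rather than SEQ ID NO: 190, the pair is assumed to be already aligned with "-" as the gap character, and identity is computed over all alignment columns, which is only one of several conventions in use.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of alignment columns in which both sequences carry the same residue.
    Columns that are gaps in both sequences are ignored."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def retains_substitutions(sequence, required):
    """required maps a 1-based position to the residue that must be present there."""
    return all(pos <= len(sequence) and sequence[pos - 1] == residue
               for pos, residue in required.items())


# Toy example: a 10-residue "variant" carrying L at position 6, and a candidate
# that differs from it at one other position.
variant = "MKTAYLAKQR"
candidate = "MKTAYLAKQK"
required_substitutions = {6: "L"}  # hypothetical substitution

identity = percent_identity(variant, candidate)
qualifies = identity >= 85.0 and retains_substitutions(candidate, required_substitutions)
print(identity, qualifies)
```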
KITS
[000125] The modified polymerases may be included in vectors and host cells comprising such vectors. The modified polymerases may be provided as a vector or a purified protein as a component of a kit. Kits may include other components such as buffers, salts, primers and may be used for applications such as amplification, e.g., PCR and/or sequencing (e.g., next-generation sequencing (NGS) platforms).
[000126] Modified polymerases may be used for quantification and quality assessment of human genomic DNA samples prior to NGS library construction. The kit includes a qPCR master mix with the modified polymerase, optimized for high-performance SYBR Green I-based qPCR, a pre-diluted set of DNA standards, and primer premixes targeting different portions of a highly conserved single-copy human locus. Absolute quantification is achieved with the primer pair defining the shortest fragment, whereas the additional primers are used to derive information about the amount of amplifiable template in the DNA sample. Quality scores (or Q-ratios) generated with the kit may be used to predict the outcome of library construction, or tailor workflows for samples of variable quality, particularly formalin-fixed paraffin-embedded (FFPE) DNA, samples obtained by laser-capture microdissection of fresh, frozen, or FFPE tissue, DNA extracted from cells collected by flow cytometry, free circulating DNA from plasma or serum, forensic samples or any other low-concentration or precious clinical sample.
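One way such a quality score might be computed is sketched below. The text does not define the Q-ratio, so this sketch assumes it is the concentration measured with a longer amplicon divided by the concentration measured with the shortest amplicon; the standard-curve slope, intercept and all function names are likewise hypothetical.

```python
def concentration_from_ct(ct, slope, intercept):
    """Invert a standard curve of the form Ct = slope * log10(conc) + intercept."""
    return 10 ** ((ct - intercept) / slope)


def q_ratio(long_amplicon_conc, short_amplicon_conc):
    """Assumed definition: amplifiable template seen by the longer amplicon
    relative to the absolute quantity seen by the shortest amplicon."""
    if short_amplicon_conc <= 0:
        raise ValueError("short-amplicon concentration must be positive")
    return long_amplicon_conc / short_amplicon_conc


# Hypothetical standard-curve parameters (one curve per primer pair in practice).
SLOPE, INTERCEPT = -3.32, 30.0

short_conc = concentration_from_ct(ct=24.0, slope=SLOPE, intercept=INTERCEPT)
long_conc = concentration_from_ct(ct=26.0, slope=SLOPE, intercept=INTERCEPT)
print(f"Q-ratio: {q_ratio(long_conc, short_conc):.2f}")
```

Under this assumed definition, a value near 1 would indicate mostly intact template, and lower values would indicate fragmented or damaged DNA such as that often recovered from FFPE samples.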
[000127] Next-generation sequencing (NGS) library amplification kits may contain the modified polymerase because of its ability to amplify complex DNA populations with high fidelity, high efficiency, decreased PCR duplication rates and very low bias. For other applications that demand precise control over library amplification, a uracil-tolerant variant polymerase may be used, for example for amplification of libraries constructed from bisulfite-treated DNA.
[000128] Optimally formulated modified polymerase may be used for DNA library preparation. Modified polymerases that enable higher yields and lower amplification bias translate to higher library diversity, lower duplication rates and more uniform coverage.
[000129] Also contemplated is a method comprising amplifying sequences from a target polynucleotide, for example by polymerase chain reaction (PCR) or reverse transcription, using a modified polymerase. The polymerase may be selected from SEQ ID NOS: 3-242, in particular SEQ ID NO: 190. Additionally, kits comprising modified polymerases, other than natural sequences, comprising one or more, two or more, three or more, four or more, five or more, six or more, seven or more, eight or more, nine or more, ten or more of the substitutions shown in Table 1, Table 2 or Table 3 may also be used. Amplifying sequences from a target polynucleotide using modified polymerases, other than a natural sequence, that have at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% sequence identity to the polymerase of SEQ ID NO: 190 is also contemplated. The target polynucleotide may be DNA or RNA. The target polynucleotide may be from a human cell or a murine cell.
[000130] All patent filings, websites, other publications, accession numbers and the like cited above or below are incorporated by reference in their entirety for all purposes to the same extent as if each individual item were specifically and individually indicated to be so incorporated by reference. If different versions of a sequence are associated with an accession number at different times, the version associated with the accession number at the effective filing date of this application is meant. The effective filing date means the earlier of the actual filing date or filing date of a priority application referring to the accession number if applicable. Likewise if different versions of a publication, website or the like are published at different times, the version most recently published at the effective filing date of the application is meant unless otherwise indicated. Any feature, step, element, embodiment, or aspect of the invention can be used in combination with any other unless specifically indicated otherwise. Although the present invention has been described in some detail by way of illustration and example for purposes of clarity and understanding, it will be apparent that certain changes and modifications may be practiced within the scope of the appended claims.
EXAMPLES
[000131] The following examples are intended to illustrate the methods, compositions and kits disclosed herein and should not be construed as limiting in any way. Various equivalents will be apparent to one skilled in the art from the following examples; such equivalents are also contemplated to be part of the invention disclosed herein.
CHARACTERIZATION OF POLYMERASE VARIANTS
[000132] We have identified modified polymerases with improved amplification properties, high processivity, primer/template affinity and thermostability. DNA polymerase substitutions were identified by the methods described in section 5.2.1 above and synthesized to incorporate the substitutions as described in section 5.2.2 above. Desired polymerase attributes, for example improved amplification properties, high processivity, primer/template affinity and thermostability, were measured by PCR, NGS library amplification and thermostability assays as described below. High reverse transcriptase activity was measured by a standard reverse transcriptase assay using poly(rA)/oligo(dT)25 as substrate and by RT-PCR. The RT assay reaction mixtures were incubated at 65°C for 10 minutes. Strong strand displacement activity was measured by primer extension assays using M13 single-stranded DNA as template.
[000133] Modified DNA polymerase variants were expressed in E. coli using the rhamnose inducible system in vector pD861 (from ATUM/DNA2.0). Variants were purified in parallel using heat cut, host cell DNA removal and column chromatography and stored in appropriate storage buffer. Purified protein was characterized for i) purity on SDS-PAGE; ii) polymerase activity; iii) host cell DNA contamination by 16S-PCR; iv) exonuclease activity; and v) thermostability.
PURITY ON SDS-PAGE
[000134] Purified polymerases were tested for purity by SDS-PAGE analysis. We observed yields of 200 μg to 1 mg of purified protein from 50 ml of culture with greater than 95% purity on SDS-PAGE (Figures 1A and 1B).
POLYMERASE ACTIVITY
[000135] Polymerase activity was determined using a DNA polymerase assay based on the detection of incorporated radioisotope-labeled dNTP in a DNA elongation reaction. The reaction was incubated at 70°C for 10 minutes. Activity of the DNA polymerase (DNAP) variants was compared to other available polymerases Accura DNAP, GoTaq and KAPA (Table 6).
TABLE 6
[000136] Specific activity of the DNA polymerases was measured (Tables 7 and 8).
TABLE 7
TABLE 8
[000137] We identified 9 modified DNA polymerase variants that showed greater than 2-fold increased specific activity over wild-type VA3173WT (SEQ ID NO: 1): polymerase variants V313 (SEQ ID NO: 159), V315 (SEQ ID NO: 161), V316 (SEQ ID NO: 162), V319 (SEQ ID NO: 165), V320 (SEQ ID NO: 166), V325 (SEQ ID NO: 171), V327 (SEQ ID NO: 173), V329 (SEQ ID NO: 175) and V337 (SEQ ID NO: 183). Thirty-two variants showed decreased (< 100,000 U/mg) specific activity.
NUCLEASE ASSAY
[000138] The parent DNA polymerase (Accura DNA polymerase) has an inherent 3'-5' proofreading exonuclease activity. Exonuclease activity of the modified DNA polymerase (DNAP) variants was measured to determine whether the purified DNAP variants maintained the exonuclease activity. Nuclease activity of the modified DNAP variants was compared to that of other available polymerases: Accura DNAP, GoTaq and KAPA. The reaction mixture, including 10 units of enzyme and 32P-labeled DNA fragment in 25 μl 1X DNAP buffer B (Lucigen), was incubated at 65°C for 16 hours. Reactions were analyzed by the trichloroacetic acid precipitation method. Almost all modified DNAP variants retained or increased nuclease activity at a level comparable to the Accura or KAPA DNA polymerases, except two variants, V357 (SEQ ID NO: 203) and V395 (SEQ ID NO: 240), which showed reduced nuclease activity (Table 9).
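The screen for reduced nuclease activity can be reproduced from the Table 9 values roughly as follows. The values are transcribed from Table 9 in SEQ ID NO order (147 through 242); the cutoff of half the Accura value is an assumption made only for this sketch, since the text does not state the criterion used.

```python
# % total cpm from Table 9, listed in order for SEQ ID NOS: 147-242.
VALUES = [
    104.1, 80, 79.1, 101.4, 80.8, 82.6, 88.5, 109, 101.3, 79.3, 78.8, 79.3,
    80.1, 80.4, 78.9, 97.9,                                   # 147-162
    84.5, 80.1, 80.1, 82.3, 78.4, 81, 81.2, 82.3, 83.3, 82.6, 83.7, 83,
    82.9, 83.1, 84.4, 81.9,                                   # 163-178
    83, 81.3, 83.9, 83.6, 78.9, 82, 82, 85.1, 83.5, 81, 73.9, 84.6,
    81.9, 83, 82.1, 84.9,                                     # 179-194
    102.7, 78.8, 79.5, 79.9, 80.8, 80.4, 79.8, 80.8, 28.1, 82.2, 84.2, 82.3,
    81.3, 82.7, 81.9, 95.4,                                   # 195-210
    100, 82.4, 83.7, 81.5, 80.1, 80, 81, 83.8, 102.1, 80.9, 81.9, 81.9,
    80, 80.4, 81, 81.6,                                       # 211-226
    99.9, 81.8, 67.4, 78.4, 79.9, 77.6, 85.6, 76.2, 105.3, 91, 89.4, 81,
    82.6, 22.6, 84.7, 102.2,                                  # 227-242
]
CONTROLS = {"Accura": 79.8, "GoTaq": 0.4, "KAPA": 92.2}

# Map each SEQ ID NO to its % total cpm.
cpm_by_seq_id = {147 + i: value for i, value in enumerate(VALUES)}

# Assumed criterion: "reduced" means below half of the parent (Accura) value.
cutoff = 0.5 * CONTROLS["Accura"]
reduced = sorted(seq_id for seq_id, value in cpm_by_seq_id.items() if value < cutoff)

print("SEQ ID NOS with reduced nuclease activity:", reduced)
```

With this cutoff the two SEQ ID NOS singled out are 203 and 240, matching the variants named in the paragraph above.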
SEQ ID NO  % Total cpm   SEQ ID NO  % Total cpm   SEQ ID NO  % Total cpm   SEQ ID NO  % Total cpm   SEQ ID NO  % Total cpm   SEQ ID NO  % Total cpm
147        104.1         163        84.5          179        83            195        102.7         211        100           227        99.9
148        80            164        80.1          180        81.3          196        78.8          212        82.4          228        81.8
149        79.1          165        80.1          181        83.9          197        79.5          213        83.7          229        67.4
150        101.4         166        82.3          182        83.6          198        79.9          214        81.5          230        78.4
151        80.8          167        78.4          183        78.9          199        80.8          215        80.1          231        79.9
152        82.6          168        81            184        82            200        80.4          216        80            232        77.6
153        88.5          169        81.2          185        82            201        79.8          217        81            233        85.6
154        109           170        82.3          186        85.1          202        80.8          218        83.8          234        76.2
155        101.3         171        83.3          187        83.5          203        28.1          219        102.1         235        105.3
156        79.3          172        82.6          188        81            204        82.2          220        80.9          236        91
157        78.8          173        83.7          189        73.9          205        84.2          221        81.9          237        89.4
158        79.3          174        83            190        84.6          206        82.3          222        81.9          238        81
159        80.1          175        82.9          191        81.9          207        81.3          223        80            239        82.6
160        80.4          176        83.1          192        83            208        82.7          224        80.4          240        22.6
161        78.9          177        84.4          193        82.1          209        81.9          225        81            241        84.7
162        97.9          178        81.9          194        84.9          210        95.4          226        81.6          242        102.2

Control    % Total cpm
Accura     79.8
GoTaq      0.4
KAPA       92.2
TABLE 9
THERMOSTABILITY TEST
[000139] Thermostability of the DNA polymerase variants was tested by incubating the enzymes in 1X DNAP buffer B (Lucigen) at 98°C for 2 minutes. Thermostability of the DNA polymerase (DNAP) variants was compared to that of other available polymerases: Accura DNAP, GoTaq and KAPA (Table 10). The polymerase activity was determined as described in example 6.1.2.
TABLE 10
[000140] We observed 8 variants (SEQ ID NOS: 155, 157, 159, 164, 190, 218, 225 and 233) that retained greater than 50% activity (Table 10).
TEST FOR HOST DNA CONTAMINATION
[000141] Purified DNA polymerase variants were tested for the ability to amplify contaminating bacterial host DNA by the E. coli 16S PCR test. PCR reactions were run in the presence of E. coli genomic DNA (gDNA) to determine the amplification ability of the DNA polymerases, as well as in its absence to test for contaminating bacterial 16S ribosomal DNA (rDNA). 25 μl PCR reactions were set up as follows: 1X HiF buffer (Lucigen), 200 μM dNTP, 10 ng E. coli gDNA, 1 μM each primer, 5 U DNA polymerase; cycling conditions: 94°C for 2 mins, 35 cycles: 94°C for 15 seconds, 60°C for 30 seconds, 72°C for 2 mins; 72°C for 10 mins, hold at 4°C. Purified DNA polymerase variants had low to no contaminating bacterial host DNA (Figures 2A and 2B).
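Pipetting volumes for a 25 μl reaction of this kind follow from C1·V1 = C2·V2. The sketch below uses the final amounts given above, but every stock concentration (5X buffer, 10 mM dNTP, 10 μM primers, 10 ng/μl gDNA, 5 U/μl polymerase) is a hypothetical value assumed for illustration and is not stated in this disclosure.

```python
def stock_volume_ul(final_conc, stock_conc, reaction_ul):
    """Volume of stock needed so that the final concentration is reached
    (C1*V1 = C2*V2). Both concentrations must be in the same units."""
    return final_conc * reaction_ul / stock_conc


REACTION_UL = 25.0

# Final amounts are from the text; all stock concentrations are assumptions.
volumes_ul = {
    "HiF buffer (5X -> 1X)": stock_volume_ul(1.0, 5.0, REACTION_UL),
    "dNTP (10 mM -> 200 uM)": stock_volume_ul(200.0, 10_000.0, REACTION_UL),
    "forward primer (10 uM -> 1 uM)": stock_volume_ul(1.0, 10.0, REACTION_UL),
    "reverse primer (10 uM -> 1 uM)": stock_volume_ul(1.0, 10.0, REACTION_UL),
    "E. coli gDNA (10 ng at 10 ng/ul)": 10.0 / 10.0,
    "DNA polymerase (5 U at 5 U/ul)": 5.0 / 5.0,
}
# Water makes up the remainder of the reaction volume.
volumes_ul["water"] = REACTION_UL - sum(volumes_ul.values())

for component, volume in volumes_ul.items():
    print(f"{component}: {volume:.2f} ul")
```

For the no-template control described above, the gDNA volume would simply be replaced by water.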
AMPLIFICATION OF POL A (2.8 kb) DNA TARGET
[000142] Amplification properties of DNA polymerase variants were determined by PCR of a 2.8 kb Pol A target DNA from E. coli genomic DNA (gDNA). PCR reactions were run at two denaturation temperatures, 94°C and 98°C, to determine thermostability. 25 μl PCR reactions were set up as follows: 1X HiF buffer (Lucigen), 200 μM dNTP, 10 ng E. coli gDNA, 1 μM each primer, 5 U DNA polymerase; cycling conditions: 94°C for 2 mins, 35 cycles: 94°C for 15 seconds, 60°C for 30 seconds, 72°C for 2 mins; 72°C for 10 mins, hold at 4°C. A second PCR reaction set was run with cycling conditions: 94°C for 2 mins, 35 cycles: 98°C for 15 seconds, 60°C for 30 seconds, 72°C for 2 mins; 72°C for 10 mins, hold at 4°C. Accura DNA polymerase and KAPA DNA polymerase were included as controls.
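The two cycling programs above can be written down as data, which makes it easy to confirm that they differ only in the denaturation temperature and to total the programmed hold time. This is an illustrative sketch: the class and function names are hypothetical, and ramp times and the final 4°C hold are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temp_c: float
    seconds: int


def programmed_seconds(initial, cycle, n_cycles, final):
    """Sum of programmed hold times, excluding ramping and the 4 C hold."""
    return initial.seconds + n_cycles * sum(s.seconds for s in cycle) + final.seconds


def profile(denature_temp_c):
    """Build the 35-cycle program described in the text for a given denaturation temperature."""
    initial = Step(94, 120)
    cycle = [Step(denature_temp_c, 15), Step(60, 30), Step(72, 120)]
    final = Step(72, 600)
    return initial, cycle, 35, final


for denature in (94, 98):
    total = programmed_seconds(*profile(denature))
    print(f"{denature} C denaturation: {total} s ({total / 60:.2f} min)")
```

Each program holds for 6495 s (about 108 minutes) in total, of which 35 × 15 s is spent at the denaturation temperature.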
[000143] We identified 42 positive variants in the 98°C PCR: V301 (SEQ ID NO: 147), V304 (SEQ ID NO: 150), V305 (SEQ ID NO: 151), V307 (SEQ ID NO: 153), V308 (SEQ ID NO: 154), V309 (SEQ ID NO: 155), V310 (SEQ ID NO: 156), V311 (SEQ ID NO: 157), V313 (SEQ ID NO: 159), V314 (SEQ ID NO: 160), V315 (SEQ ID NO: 161), V316 (SEQ ID NO: 162), V318 (SEQ ID NO: 164), V319 (SEQ ID NO: 165), V320 (SEQ ID NO: 166), V321 (SEQ ID NO: 167), V322 (SEQ ID NO: 168), V323 (SEQ ID NO: 169), V324 (SEQ ID NO: 170), V327 (SEQ ID NO: 173), V329 (SEQ ID NO: 175), V334 (SEQ ID NO: 180), V335 (SEQ ID NO: 181), V339 (SEQ ID NO: 185), V340 (SEQ ID NO: 186), V344 (SEQ ID NO: 190), V346 (SEQ ID NO: 192), V347 (SEQ ID NO: 193), V349 (SEQ ID NO: 195), V360 (SEQ ID NO: 206), V365 (SEQ ID NO: 211), V368 (SEQ ID NO: 214), V372 (SEQ ID NO: 218), V373 (SEQ ID NO: 219), V380 (SEQ ID NO: 225), V381 (SEQ ID NO: 226), V382 (SEQ ID NO: 227), V388 (SEQ ID NO: 233), V389 (SEQ ID NO: 234), V391 (SEQ ID NO: 236), V396 (SEQ ID NO: 241), and V397 (SEQ ID NO: 242) that were comparable to control polymerases Accura and KAPA (Figures 3A and 3B). We also show that the DNA polymerase variants survive 98°C denaturation in 35 cycles of PCR.
AMPLIFICATION OF LONGER E. coli DNA TARGETS
[000144] Amplification properties of DNA polymerase variants were determined by PCR of a 5 kb and a 10 kb target DNA from E. coli genomic DNA (gDNA). PCR reactions were run at two denaturation temperatures, 94°C and 98°C, to determine thermostability. 25 μl PCR reactions were set up as follows: 1X HiF buffer (Lucigen), 200 μM dNTP, 10 ng E. coli gDNA, 1 μM each primer, 5 U DNA polymerase; cycling conditions: 94°C for 2 mins, 35 cycles: 94°C for 15 seconds, 60°C for 30 seconds, 72°C for 2 mins; 72°C for 10 mins, hold at 4°C. A second PCR reaction set was run with cycling conditions: 94°C for 2 mins, 35 cycles: 98°C for 15 seconds, 60°C for 30 seconds, 72°C for 2 mins; 72°C for 10 mins, hold at 4°C. Accura DNA polymerase and KAPA DNA polymerase were included as controls. PCR reactions were analyzed by agarose gel electrophoresis.
[000145] We identified 32 positive modified DNA polymerase variants, V302 (SEQ ID NO: 148), V303 (SEQ ID NO: 149), V305 (SEQ ID NO: 151), V307 (SEQ ID NO: 153), V308 (SEQ ID NO: 154), V309 (SEQ ID NO: 155), V310 (SEQ ID NO: 156), V311 (SEQ ID NO: 157), V313 (SEQ ID NO: 159), V314 (SEQ ID NO: 160), V315 (SEQ ID NO: 161), V316 (SEQ ID NO: 162), V320 (SEQ ID NO: 166), V321 (SEQ ID NO: 167), V322 (SEQ ID NO: 168), V324 (SEQ ID NO: 170), V327 (SEQ ID NO: 173), V331 (SEQ ID NO: 177), V334 (SEQ ID NO: 180), V335 (SEQ ID NO: 181), V339 (SEQ ID NO: 185), V340 (SEQ ID NO: 186), V344 (SEQ ID NO: 190), V345 (SEQ ID NO: 191), V346 (SEQ ID NO: 192), V347 (SEQ ID NO: 193), V355 (SEQ ID NO: 201), V365 (SEQ ID NO: 211), V368 (SEQ ID NO: 214), V386 (SEQ ID NO: 231), V391 (SEQ ID NO: 236), and V396 (SEQ ID NO: 241), that efficiently amplified a 5 kb E. coli gDNA target (Figures 4A and 4B). Three DNA polymerase variants, V311 (SEQ ID NO: 157), V314 (SEQ ID NO: 160), and V335 (SEQ ID NO: 181), efficiently amplified a 10 kb E. coli gDNA target (Figures 5A and 5B). Additionally, we show that these DNA polymerase variants survive 98°C denaturation in 35 cycles of PCR.
AMPLIFICATION OF LONGER HUMAN DNA TARGETS
[000146] Modified DNA polymerase variants efficiently amplified a longer human DNA target (5 kb) by PCR. 25 μl PCR reactions were set up as follows: 1X HiF buffer (Lucigen), 300 μM dNTP, 50 ng human DNA, 2.5 μl P5/P7 primer, 5 U DNA polymerase; cycling conditions: 95°C for 2 mins, 30 cycles: 98°C for 15 seconds, 60°C for 30 seconds, 72°C for 2.5 mins; 72°C for 10 mins, hold at 4°C; cycling conditions for KAPA polymerase: 95°C for 3 mins, 30 cycles: 98°C for 20 seconds, 65°C for 15 seconds, 72°C for 2.5 mins; 72°C for 10 mins, hold at 4°C. Accura DNA polymerase and KAPA DNA polymerase were included as controls. PCR reactions were analyzed by agarose gel electrophoresis (Figure 6A) and on a Bioanalyser (Figure 6B).
[000147] We identified 5 positive modified DNA polymerase variants, V313 (SEQ ID NO: 159), V318 (SEQ ID NO: 164), V329 (SEQ ID NO: 175), V380 (SEQ ID NO: 225) and V388 (SEQ ID NO: 233), that efficiently amplified a 5 kb human DNA target (Figures 6A and 6B). Additionally, these variants survive a 98°C denaturation in 30 cycles of PCR.
AMPLIFICATION OF NGS HUMAN DNA LIBRARY
[000148] The modified DNA polymerase variants identified show good NGS human DNA library amplification. 25 μl PCR reactions were set up as follows: 1X HiF buffer (Lucigen), 300 μM dNTP, 20 pg human DNA library, 2.5 μl P5/P7 primer, 5 U DNA polymerase; cycling conditions: 95°C for 2 mins, 18 cycles: 98°C for 15 seconds, 60°C for 30 seconds, 72°C for 1 minute; 72°C for 10 mins, hold at 4°C; cycling conditions for KAPA polymerase: 95°C for 3 mins, 18 cycles: 98°C for 20 seconds, 65°C for 15 seconds, 72°C for 1 minute; 72°C for 10 mins, hold at 4°C. Accura DNA polymerase and KAPA DNA polymerase were included as controls.
[000149] Modified DNA polymerase variants successfully amplified a human DNA library (Figure 7) giving high yields of amplicons.
AMPLIFICATION OF HIGH GC TARGET DNA
[000150] Modified DNA polymerase variants successfully amplified a high GC-content NGS DNA library from Rhodobacter. 25 μl PCR reactions were set up as follows: 1X HiF buffer (Lucigen), 300 μM dNTP, 20 pg Rhodobacter DNA library, 2.5 μl P5/P7 primer, 5 U DNA polymerase; cycling conditions: 95°C for 2 mins, 18 cycles: 98°C for 15 seconds, 60°C for 30 seconds, 72°C for 1 minute; 72°C for 10 mins, hold at 4°C; cycling conditions for KAPA polymerase: 95°C for 3 mins, 18 cycles: 98°C for 20 seconds, 65°C for 15 seconds, 72°C for 1 minute; 72°C for 10 mins, hold at 4°C. Accura (Acc) DNA polymerase and KAPA (KA) DNA polymerase were included as controls. PCR reactions were analyzed by agarose gel electrophoresis (Figure 8) and on a Bioanalyser (data not shown).
[000151] Modified DNA polymerase variants V304 (SEQ ID NO: 150), V309 (SEQ ID NO: 155), V313 (SEQ ID NO: 159), V318 (SEQ ID NO: 164), V319 (SEQ ID NO: 165), V320 (SEQ ID NO: 166), V327 (SEQ ID NO: 173), V339 (SEQ ID NO: 185), V344 (SEQ ID NO: 190), V397 (SEQ ID NO: 242) efficiently amplified high GC-content NGS DNA library from Rhodobacter giving high yields of amplicons.
AMPLIFICATION OF HIGH AT TARGET DNA
[000152] Modified DNA polymerase variants successfully amplified a high AT-content NGS DNA library from Staphylococcus. 25 μl PCR reactions were set up as follows: 1X HiF buffer (Lucigen), 300 μM dNTP, 20 pg Staphylococcus DNA library, 2.5 μl P5/P7 primer, 5 U DNA polymerase; cycling conditions: 95°C for 2 mins, 18 cycles: 98°C for 15 seconds, 60°C for 30 seconds, 72°C for 1 minute; 72°C for 10 mins, hold at 4°C; cycling conditions for KAPA polymerase: 95°C for 3 mins, 18 cycles: 98°C for 20 seconds, 65°C for 15 seconds, 72°C for 1 minute; 72°C for 10 mins, hold at 4°C. Accura (Acc) DNA polymerase and KAPA (KA) DNA polymerase were included as controls. PCR reactions were analyzed by agarose gel electrophoresis (Figure 9) and on a Bioanalyser (data not shown).
[000153] Modified DNA polymerase variants V313 (SEQ ID NO: 159), V320 (SEQ ID NO: 166), V344 (SEQ ID NO: 190) and V368 (SEQ ID NO: 214) efficiently amplified a high AT-content NGS DNA library from Staphylococcus, giving high yields of amplicons.
PRIMER EXTENSION
[000154] We identified modified DNA polymerase variants that show strong strand displacement activity by primer extension. Primer extension was set up as follows in a 25 μl reaction mixture containing 1X DNAP buffer B, 0.5 μg M13 single-stranded DNA (ssDNA), 0.4 μM M13 primer, 250 μM dNTPs, 0.1 μg/μl BSA and 5 U of polymerase. Reactions were incubated at 65°C for 30 minutes and analyzed by agarose gel electrophoresis (Figure 10).
REVERSE TRANSCRIPTASE ACTIVITY
[000155] Modified DNA polymerase variants show high reverse transcriptase (RT) activity on a 520 base pair (bp) MS2 RNA target. The 25 μl RT-PCR reaction mixture was set up as follows: 1X HF buffer, 200 μM dNTP, 1 ng MS2 RNA, 0.5 μM each primer, 5 mM PCF if needed, 2.5 U polymerase; RT-PCR cycling conditions: 65°C for 2 mins, 94°C for 2 mins; 35 cycles: 94°C for 15 seconds, 60°C for 30 seconds, 72°C for 1 minute; 72°C for 10 mins, hold at 4°C. RT-PCR reactions were analyzed by agarose gel electrophoresis (Figures 11A and 11B); a subset of samples was re-analyzed on an agarose gel to confirm positives (Figure 11C).
[000156] We identified 12 DNA polymerase variants with improved activity, amplification of long targets and difficult to amplify targets, for example GC- and AT-rich templates, processivity and primer/template affinity that are shown in Table 11.
TABLE 11
ROUND 2 SCREENING
[000157] A second round of screening was performed on the 12 modified DNA polymerase variants from the first round (sections 6.1.1 to 6.1.9 above; Table 11) and on second round variants that were designed based on successful first round variants. It included making hot start versions of the variants identified in Table 11 and confirming results by PCR and NGS amplification as above. Additionally, PCR conditions were optimized.
[000158] 12 modified DNA polymerase variants from Table 11 (SEQ ID NOS: 156, 159, 164, 165, 166, 173, 181, 190, 214, 218, 225 and 242) were tested in a HotStart version to show inhibition of polymerase activity by antibody (Figure 12). Efficient amplification of a 3 kb plasmid target and of 5 kb and 10 kb E. coli targets (Figures 13-15A and B) was shown for the 12 variants from rounds 1 and 2, and round 2 variants V103 (SEQ ID NO: 101) and V119 (SEQ ID NO: 117). PCR reaction and cycling conditions were the same as in sections 6.1.5 and 6.1.6 above, except that 2.5 U polymerase was used for the 3 kb and 5 kb templates. For the 10 kb template, 25 μl PCR reactions were set up as follows: 1X HiF buffer (Lucigen), 300 μM dNTP, 10 ng E. coli gDNA, 0.5 μM each primer, 2.5 U DNA polymerase; cycling conditions: 94°C for 2 mins, 35 cycles: 94°C for 15 seconds, 60°C for 30 seconds, 72°C for 5 mins; 72°C for 10 mins, hold at 4°C (Figure 15B panel I). For comparison, a second set was run with 30 cycles (Figure 15B panel II).
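The extension step is the main cycling parameter that changed for the 10 kb template. A short calculation, using only the extension times and target lengths stated in this document, expresses each condition as extension time per kilobase; the function name is hypothetical.

```python
def seconds_per_kb(extension_seconds, target_kb):
    """Extension time allotted per kilobase of target."""
    return extension_seconds / target_kb


conditions = {
    "5 kb E. coli target, 2 min extension": seconds_per_kb(120, 5),
    "10 kb E. coli target, 2 min extension (round 1)": seconds_per_kb(120, 10),
    "10 kb E. coli target, 5 min extension (round 2)": seconds_per_kb(300, 10),
}

for label, rate in conditions.items():
    print(f"{label}: {rate:.0f} s/kb")
```

The round 2 program therefore allows 30 s/kb for the 10 kb target, compared with 12 s/kb when the 2-minute extension of the first round was applied to the same target length.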
[000159] Efficient amplification of a 5 kb human DNA target (Figure 16) was shown for the 12 variants from rounds 1 and 2 (SEQ ID NOS: 156, 159, 164, 165, 166, 173, 181, 190, 214, 218, 225, 242), and round 2 variants V103 (SEQ ID NO: 101) and V119 (SEQ ID NO: 117). PCR reaction and cycling conditions were set up as follows: 1X HiF buffer (Lucigen), 300 μM dNTP, 10 ng human gDNA, 0.5 μM primer, 2.5 U DNA polymerase; cycling conditions: 95°C for 2 mins, 30 cycles: 98°C for 15 seconds, 60°C for 30 seconds, 72°C for 1.5 mins; 72°C for 10 mins, hold at 4°C; cycling conditions for KAPA polymerase: 95°C for 3 mins, 30 cycles: 98°C for 20 seconds, 65°C for 15 seconds, 72°C for 1.5 mins; 72°C for 10 mins, hold at 4°C. Accura DNA polymerase and KAPA DNA polymerase were included as controls. PCR reactions were analyzed by agarose gel electrophoresis (Figure 16).
[000160] Efficient amplification of NGS DNA libraries from human and Rhodobacter was shown for the 12 variants from Table 11 and two second round variants, V103 (SEQ ID NO: 101) and V119 (SEQ ID NO: 117) (Figure 17). 25 μl PCR reactions were set up as follows: 1X HiF buffer (Lucigen), 300 μM dNTP, 20 pg DNA library, 2.5 μl P5/P7 primer, 2.5 U DNA polymerase; cycling conditions: 95°C for 2 mins, 18 cycles: 98°C for 15 seconds, 60°C for 30 seconds, 72°C for 1 minute; 72°C for 10 mins, hold at 4°C; cycling conditions for KAPA polymerase: 95°C for 3 mins, 18 cycles: 98°C for 20 seconds, 65°C for 15 seconds, 72°C for 1 minute; 72°C for 10 mins, hold at 4°C. Accura DNA polymerase and KAPA DNA polymerase were included as controls. PCR reactions were analyzed by agarose gel electrophoresis (Figure 17) and on a Bioanalyser (data not shown).
[000161] Efficient amplification of an NGS DNA library from AT-rich Staphylococcus was shown for the 12 variants from Table 11 (SEQ ID NOS: 156, 159, 164, 165, 166, 173, 181, 190, 214, 218, 225 and 242) and two second round variants, V103 (SEQ ID NO: 101) and V119 (SEQ ID NO: 117). 25 μl PCR reactions were set up as follows: 1X HiF buffer (Lucigen), 300 μM dNTP, 20 pg Staphylococcus DNA library, 2.5 μl P5/P7 primer, 2.5 U DNA polymerase; cycling conditions: 95°C for 2 mins, 18 cycles: 98°C for 15 seconds, 60°C for 30 seconds, 72°C for 1 minute; 72°C for 10 mins, hold at 4°C; cycling conditions for KAPA polymerase: 95°C for 3 mins, 18 cycles: 98°C for 20 seconds, 65°C for 15 seconds, 72°C for 1 minute; 72°C for 10 mins, hold at 4°C. Accura (Acc) DNA polymerase and KAPA (KA) DNA polymerase were included as controls. PCR reactions were analyzed by agarose gel electrophoresis (Figure 18) and by Bioanalyser (data not shown).
[000162] Second round and third round variant screening is summarized in Table 12 below.
TABLE 12
[000163] We identified DNA polymerase variants with greatly improved properties compared to wild-type after 3 rounds of screening. Further rounds of screening are contemplated to obtain properties that are comparable or better than existing DNA polymerases.
[000164] PCR optimization was carried out first in the presence of sugars, for example sucrose, trehalose, mannitol (Figure 19A) and sorbitol (Figure 19B), followed by addition of bovine serum albumin (BSA) or L-carnitine (Figure 20). Sugars and L-carnitine each gave a more productive PCR. Sugar and L-carnitine together showed an additive effect on amplification of a 3 kb CometGFP DNA and a 5 kb human gDNA target (Figures 21A and 21B). Various combinations of buffer compositions, for example Tris-HCl (10-50 mM), Tris-acetate (10-50 mM) or Bis-Tris propane (10-50 mM) with a pH range of 8.0-9.3; salts, for example KCl (10-100 mM) and ammonium sulfate (5-50 mM); metal, for example magnesium chloride (1-5 mM) or magnesium sulfate (1-5 mM); non-ionic detergents, for example Triton X-100 (0.1-1%), NP-40 (0.1-1%), Tween 20 (0.1-1%), Brij58 (0.1-1%) or CHAPS (0.1-1%); and dNTPs (50-500 μM), were tested on a 3 kb target and a 2.8 kb E. coli target (Figures 22A and 22B, respectively); the buffer combinations are shown in Table 13.
TABLE 13
[000165] PCR reactions were set up as follows: 25 μl of 1X buffer as designed (Table 12), 5 pg of plasmid DNA, 0.5 μM primers, 2.5 U of hotstart polymerase V344 (SEQ ID NO: 190) with or without additive; cycling conditions: 95°C for 2 minutes; 30 cycles of 98°C for 15 seconds, 60°C for 30 seconds, 72°C for 1.5 minutes; 72°C for 10 minutes. A summary of results is shown in Table 14.
Test Buffer; 3 kb Plasmid Target, ng (no Additive, with Additive); 2.8 kb E. coli Target, ng (no Additive, with Additive)
1 0 6.5 0 170.8
2 0 0 0 0
3 0 0 0 0
4 0 0 0 0
5 0 0 0 0
6 0 0 0 0
7 0 0 0 0
8 0 0 154.3 171.8
9 0 0 0 77.5
10 0 0 0 0
11 0 0 0 0
12 0 0 0 0
13 0 0 0 0
14 0 0 114.1 151.7
15 0 0 87.5 118.3
16 0 0 0 0
17 0 0 0 <1.0
18 0 0 0 0
19 0 0 68.7 149.1
20 0 0 0 72.8
21 0 0 0 149.5
22 0 0 0 0
23 0 0 0 66.3
24 0 0 119.6 224.4
TABLE 14
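As a non-limiting illustration, the 2.8 kb E. coli target results in Table 14 can be summarized by separating buffers that gave product only with additive from buffers that gave product in both conditions. The sketch below uses only values transcribed from Table 14; the function names are hypothetical, and the fold change is simply yield with additive divided by yield without additive.

```python
# Yields (ng) for the 2.8 kb E. coli target, transcribed from Table 14 as
# (no additive, with additive). Buffers not listed gave 0 ng in both columns.
# Buffer 17 ("<1.0" ng with additive) is omitted because it has no exact value.
YIELDS_NG = {
    1: (0.0, 170.8),
    8: (154.3, 171.8),
    9: (0.0, 77.5),
    14: (114.1, 151.7),
    15: (87.5, 118.3),
    19: (68.7, 149.1),
    20: (0.0, 72.8),
    21: (0.0, 149.5),
    23: (0.0, 66.3),
    24: (119.6, 224.4),
}


def additive_dependent(yields):
    """Buffers that gave product only when the additive was present."""
    return sorted(
        buffer
        for buffer, (without, with_additive) in yields.items()
        if without == 0 and with_additive > 0
    )


def fold_change(yields):
    """Yield with additive divided by yield without, where both are nonzero."""
    return {
        buffer: with_additive / without
        for buffer, (without, with_additive) in yields.items()
        if without > 0
    }


rescued = additive_dependent(YIELDS_NG)
folds = fold_change(YIELDS_NG)
best = max(YIELDS_NG, key=lambda buffer: YIELDS_NG[buffer][1])

print("product only with additive:", rescued)
for buffer in sorted(folds):
    print(f"buffer {buffer}: {folds[buffer]:.2f}-fold with additive")
print("highest yield with additive: buffer", best)
```

On the transcribed values, every buffer that produced the 2.8 kb product without additive produced more with it, and five further buffers produced product only with additive.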
[000166] PCR reactions with standard PCR buffer were tested on CometGFP DNA as template with addition of sorbitol (Figure 23A), KCl (Figure 23B) or non-ionic detergents (Figure 23C). PCR reactions were set up as follows: 25 μl of 1X basic buffer, pH range 8.0-9.3 (Figure 23A), 200 μM dNTP, 5 pg of CometGFP DNA, 0.5 μM primers, 15% sorbitol added to a subset of reactions, 2.5 U of hotstart polymerase V344 (SEQ ID NO: 190) with or without additive; cycling conditions: 95°C for 2 minutes; 30 cycles of 98°C for 15 seconds, 60°C for 30 seconds, 72°C for 1.5 minutes; 72°C for 10 minutes. Reactions were analyzed by agarose gel electrophoresis.
[000167] Additional screening for improved polymerase variants and further optimization of PCR reaction conditions is contemplated.
BRIEF DESCRIPTION OF TABLES
[000168] Table 1. DNA polymerase substitution scores (method 1).
[000169] Amino acid substitutions (Columns A, C, E, G, I, K and M) with scores (columns B, D, F, H, J, L, N) calculated for DNA polymerase variants based on SEQ ID NO: 1 homologs.
[000170] Table 2. DNA polymerase substitution scores (method 2).
[000171] Amino acid substitutions (Columns A, C, E, G, I, K, M) with scores (columns B, D, F, H, J, L, N) calculated for DNA polymerase variants based on SEQ ID NO: 1 homologs.
[000172] Table 3. Selected amino acid substitutions for incorporation into variants.
[000173] Amino acid substitutions selected to incorporate into SEQ ID NO: 1 variants, based on high scoring substitutions shown in Tables 1 and 2.
[000174] Table 4. Combinations of substitutions in synthetic DNA polymerases.
[000175] Combinations of substitutions from Table 3 incorporated into 240 variant DNA polymerases. Sequences of the polymerases are given as SEQ ID NOS: 3-242.
[000176] Table 5. Model weights for selected substitutions.
[000177] Models are based on the set of infologs described and assess the relative contribution of substitutions within the set. Model weights are shown for activities that are desirable in the modified variants and are used for selection of substitutions.
TABLES
Figure imgf000051_0001
Figure imgf000052_0001
Figure imgf000053_0001
Figure imgf000054_0001
Figure imgf000055_0001
Figure imgf000056_0001
Figure imgf000057_0001
Figure imgf000058_0001
TABLE 1
A B C D E F G H
Substitution Score Substitution Score Substitution Score Substitution Score
L347P 289.05 N552G 2.79 I517F 1.02 K559R 0.23
I5S0L 12.81 VS18R 5.54 I442L 4.50 F416A 3.68
L590A 12.97 V441A 4.43 V581I 3.79 F390R 2.99
E54SR 15.57 A613K 4.00 Q533D 3.11 I493M 2.15
G418R 6.37 T586A 4.39 I503A 2.95 YS15R 2.44
V5S5R 6.38 Y386H 2.65 Y555C 0.99 I404E 0.23
S367A 7.91 Q561R 5.52 H532L 4.47 A439L 3.68
S464A 8.62 i364L 4.41 D562G 3.79 A485L 2.90
Q350 6.87 K520R 3.67 N547A 2.99 P39SA 2.13
W558A 6.05 424R 4.03 M345A 2.94 E450A 2.42
[616L 5.88 K642P 2.44 F334L 0.78 N552S 0.21
Q.S67P 7.73 F448 5.35 D562E 4.43 G572A 3.63
F333A 8.17 S475T 4.41 I641V 3.74 E548A 2.90
E585 5.89 F370A 3.58 I481R 2.83 G387A 2.12
L374N 5.68 A535I 3.65 K523R 2.90 Q610E 2.37
F49SL 5.76 E574D 2.42 A494S 0.77 L575M 0.18
I600L 6.6S G382D 5.35 A573S 4.34 P446E 3.63
L413I 7.91 D546E 4.38 PS88D 3.69 K640E 2.89
Q480R 5.61 G387P 3.49 E548K 2.80 LS64I 2.11
T554R 5.03 I364A 3.64 E644I 2.76 F333I 2.36
P432S 5.62 I417F 2.35 522L 0.73 I490L 0.14
V593I 6.38 G492A 5.35 T462V 4.28 K578L 3.60
P495A 7.22 N470P 4.35 S381D 3.66 N444R 2.87
G497S 5.5S 591R 3.45 K460A 2.78 Y566N 2.09
L349A 5.01 V534A 3.51 E543R 2.76 328Q 2.33
6482A 5.00 L458I 2.33 Q454E 0.69 N337Q 0.12
V645A 6.35 S571L 5.28 E511L 4.15 V601L 3.58
K522F 6.28 A369D 4.33 E530A 3.64 A331G 2.87
V602L S.23 I507L 3.40 W443T 2.46 K523E 2.09
S484A 4.98 A604C 3.51 V638I 2.73 K341A 2.33
C401S 4.92 G440A 2.02 Y491F 0.41 N337P 0.08
A613S 5.90 L489I 5.21 E478D 4.14 F389Y 3.54
Q513E 5.98 V518I 4.27 Q363K 3.58 V343T 2.81
R412D 5.23 D403N 3.28 W626K 2.40 D611A 2.07
I490A 4.98 L582R 3.46 357A 2.63 C627K 2.32
F431Y 4.83 F452Y 1.70 Y555F 0.41 L413V 0.08
L563R 5.81 F448L 5.04 L374Y 4.05 N388S 3.53
E478Q 5.05 N505Y 4.21 N337E 3.52 F416L 2.80
M399F 4.85 A348V 3.28 A439I 2.30 T392A 1.97
E500A 4.94 N508D 3.43 K527A 2.S5 D589R 2.31
W521F 3.60 L575I 1.58 1616V 0.32 G482I 0.06
F539A 5.76 I592L 4.74 G497R 3.96 A485V 3.48
P588L 5.03 614A 4.13 V358I 3.49 L575G 2.71
A535Q 4.74 K496E 3.26 D605K 2.30 L582F 1.94
L59SA 4.92 A369N 3.39 D611E 2.55 V336I 2.30
E445D 3.57 A499S 1.32 V474I 0.31 E574F 0.05
Q407T 5.75 I528F 4.66 A557M 3.S4 T510S 3.37
L36SI 4.75 S415Q 4.12 L368K 3.33 V581A 2.70
619A 4.59 L413F 3.19 K332D 2.23 D468N 1.93
Y524A 4.67 I503R 3.13 Q533R 2.53 Q3S0A 2.30
D4236 3.45 L349I 1.16 L583F 0.30 D423N 0.05
Y52SF 5.62 K341E 4.66 1568V 3.75 K469I 3.19
L438i 4.74 A485I 3.94 L368R 3.09 R412K 2.66
CS02A 4.51 K617E 3.19 L628I 2.18 V545I 1.90
Y566A 4.63 D342E 3.10 M345K 2.51 K447A 2.29
T391Q 2.98 481L 1.15 L349V 0.25 G482S 0.05
1427V 5.60 T586K 4.66 A394T 3.69 N444G 3.18
L563A 4.57 S339T 3.89 509I 3.08 L426F 2.65
L461A 4.38 K469V 3.11 K559A 2.15 K617H 1.88
E58SA 4.51 A557N 3.02 L612A 2.50 S415E 2.21
A B C D E F G H
Substitution Score Substitution Score Substitution Score Substitution Score
1417V 0.03 T526K 2.20 T549N 1.41 A639G 1.04
TS26P 3.13 N542H 1.58 K447H 1.16 N444Q 0.96
E60SL 2.62 S381R 1.96 G418I 1.65 Q454A 1.39
K372E 1.88 F390H 2.59 R519E 2.08 K332A 1.69
E60SQ 2.20 I471Y 2.11 D562T 1.31 K447N 1.03
T526G 0.02 Q513A 1.57 Y555M 1.16 L643C 0.96
K424N 3.07 E608H 1.94 N505E 1.64 P446R 1.39
1517L 2.60 S09L 2.43 1409V 2.07 Q329E 1.66
I465D 1.88 E603D 1.94 G570S 1.29 L582E 1.02
R556T 2.18 NS47G 1.53 N32SD 1.1S K607E 0.92
N552D 0.01 T392I 1.94 P588R 1.62 K584H 1.36
F419A 3.03 TS10N 2.42 N542K 2.05 S464 1.66
520A 2.59 Y375L 1.85 K332S 1.27 D468G 1.00
T393G 1.82 E511I 1.38 E500R 1.10 G387S 0.88
D373R 2.17 P560F 1.91 D346A 1.60 E622A 1.33
I404G 0.01 S475N 2.42 T393K 2.03 V518L 1.6S
SS71T 2.96 P495E 1.83 K591N 1.24 V638L 0.99
K527G 2.51 S340N 1.37 A557H 1.09 I449L 0.88
E472D 1.81 S367E 1.88 F467L 1.58 R453N 1.32
E473D 2.16 E477D 2.41 I580M 2.00 I592G 1.61
I404P 0.01 L583I 1.82 N388H 1.23 E512D 0.99
S415R 2.92 I568 1.31 Q480K 1.09 E585T 0.85
V35SL 2.45 D611I 1.86 L615R 1.57 AS57S 1.32
Y5S5A 1.77 S396T 2.39 A623G 1.96 K559L 1.61
L615I 2.13 N508G 1.78 E647D 1.22 F576A 0.96
G482V 0.00 E530Q 1.31 N388D 1.07 A604V 0.83
W550L 2.91 O420A 1.85 T428G 1.54 V343R 1.31
E360D 2.42 1385V 2.36 Y324P 1.93 N470A 1.59
A529R 1.76 V441R 1.76 Y541K 1.16 F333V 0.96
K523A 2.13 L466A 1.30 TS10E 1.06 V336C 0.83
T526S 0.00 E516A 1.84 G335R 1.53 P395V 1.31
F467 2.87 I465L 2.33 TS49S 1.92 S47SD 1.56
K640Q 2.41 T392C 1.71 A369S 1.16 L347I 0.96
Q350R 1.76 N388R 1.28 W443A 1.05 C502I 0.82
K3S7L 2.11 E516G 1.84 L595T 1.48 E500Q 1.30
N552H 0.00 M399I 2.30 K496F 1.92 E530 1.55
S415G 2.82 E500N 1.60 L583V 1.13 F390Y 0.95
D376I 2.31 T330I 1.28 K379A 1.04 T428I 0.80
I471L 1.67 I517A 1.79 S402T 1.47 W626A 1.29
E365Q 2.10 K425Q 2.29 N325S 1.87 L489V 1.48
N552 0.00 V518M 1.57 L5S1D 1.12 K341D 0.95
F370I 2.81 I449T 1.25 424W 1.01 T526A 0.78
I377S 2.31 523D 1.78 I490M 1.47 D342A 1.29
V602F 1.66 V343E 2.23 P560Y 1.78 F576I 1.48
YS44A 2.05 K496A 1.55 A499M 1.12 F467Y 0.94
T526Y 0.00 W624A 1.23 A5S7R 1.00 E516I 0.78
D373N 2.81 A504R 1.77 E585R 1.46 E365A 1.29
L438V 2.29 K496R 2.18 K496N 1.76 G382N 1.39
T428A 1.65 D346N 1.46 S340D 1.12 E530R 0.92
E621I 2.04 Y386R 1.22 K619E 0.98 ES12K 0.76
T526V 0.00 E516R 1.76 E512A 1.42 P560R 1.26
1528V 2.73 F576L 2.14 P588T 1.73 LS63K 1.36
D546Q 2.23 L426I 1.45 V336L 1.12 N508S 0.87
S367R 1.59 1427L 1.19 E511R 0.98 K587L 0.76
YS36A 2.04 E622D 1.70 I456A 1.42 R538Q 1.24
A361I 2.71 DS89E 2.11 I481M 1.72 1592V 1.30
N594A 2.22 K614E 1.42 V343A 1.07 L461F 0.86
D403R 1.59 R519Q 1.17 L595S 0.97 K619R 0.76
L551F 2.03 K640D 1.66 S340R 1.42 A609L 1.22
I528L 2.69 L466I 2.10 N486Q 1.72 341R 1.29
Figure imgf000061_0001
Figure imgf000062_0001
A B C D E F G H
Substitution Score Substitution Score Substitution Score Substitution Score
I507M 0.06 Y544I 0.03 D546S 0.02 T392L 0.01
E516K 0.04 R637L 0.02 T330V 0.01 T554P 0.01
L628Y 0.25 I471P 0.20 E516V 0.15 E512R 0.11
N486T 0.10 V441K 0.04 R380Q 0.01 W624E 0.00
K425T 0.06 F333Y 0.03 K559M 0.02 E422H 0.00
S367N 0.04 A557Q 0.02 KS23C 0.01 R412E 0.01
T586N 0.25 A504G 0.20 L615T 0.15 P648S 0.11
F539S 0.09 F416Y 0.04 L368M 0.01 A613Q 0.00
A504S 0.05 A369T 0.03 N470Q 0.02 I409F 0.00
E500K 0.04 E360K 0.02 E608Y 0.01 W550I 0.01
I347E 0.25 I471M 0.20 N337R 0.14 356N 0.10
F487Y 0.09 E621V 0.03 D376R 0.01 T393Q 0.00
P588E 0.05 AS04V 0.03 K527H 0.02 E445V 0.00
L347D 0.04 N547E 0.02 T393L 0.01 T554Y 0.01
E60ST 0.24 RS38L 0.19 N325G 0.14 E632Q 0.10
K332Q 0.08 D562S 0.03 S484M 0.01 A639T 0.00
357E 0.05 K328R 0.03 Y544 0.02 D346E 0.00
Q561A 0.04 A348P 0.02 M509K 0.01 S415I 0.01
P560Q 0.24 G418K 0.19 D373Q 0.14 R637D 0.10
K607N 0.07 A535T 0.03 Y324I 0.01 D373T 0.00
R519D 0.05 W5S8 0.02 L461G 0.01 K379L 0.00
N470R 0.04 K460S 0.02 P648G 0.01 N470D 0.01
F390Q 0.24 K357H 0.19 Q533V 0.13 T586L 0.10
L575V 0.07 A485C 0.03 L368Y 0.01 S464K 0.00
G497A 0.04 A529E 0.02 K522Y 0.01 P588F 0.00
DS89Q 0.04 C502L 0.02 Y566M 0.01 M509F 0.01
D611L 0.24 L595C 0.18 E621L 0.13 S381I 0.10
1442V 0.06 Q480 0.03 A371F 0.01 K587F 0.00
P560H 0.04 S415Y 0.02 K496Y 0.01 P495I 0.00
Q480T 0.04 V602A 0.02 Q363G 0.01 K523G 0.01
K476R 0.24 NS47I 0.18 D562Q 0.13 I364C 0.10
E365D 0.06 R519T 0.03 E445I 0.01 D589Y 0.00
I409M 0.03 L582Y 0.02 E548Q 0.01 E360R 0.00
559G 0.03 L612 0.02 S484N 0.01 R453L 0.01
K614R 0.23 L612Y 0.18 1493R 0.13 G418N 0.10
S464C 0.06 A485 0.02 I364Y 0.01 QS31K 0.00
Y386Q 0.03 Y536N 0.02 Y524F 0.01 I364S 0.00
D468Q 0.03 A535N 0.02 T428Q 0.01 A557T 0.01
N444E 0.22 G572C 0.18 E537Q 0.13 E633Q 0.10
S606A 0.05 Q533H 0.02 S475V 0.00 I600M 0.00
LS64R 0.03 NS94G 0.02 D468K 0.01 E500S 0.00
R453H 0.03 E450R 0.02 K447L 0.01 6196 0.01
A378M 0.22 A378L 0.17 I456R 0.13 F634Q 0.10
D342R 0.05 P432G 0.02 A494T 0.00 A535V 0.00
V534 0.03 D346 0.02 L615V 0.01 E608I 0.00
G382S 0.03 K460D 0.01 F370 0.01 K520T 0.01
F333L 0.22 P327H 0.17 D346R 0.13 N636R 0.10
D373K 0.05 Q533K 0.02 F416G 0.00 D423H 0.00
E585H 0.03 D403E 0.02 I456L 0.01 P560I 0.00
A53S 0.03 E473Q 0.01 I481S 0.01 F333S 0.01
K523L 0.21 R411V 0.17 R453Q 0.12 F370L 0.10
V358A 0.05 V343S 0.02 Q513D 0.00 P495Y 0.00
P495G 0.03 V441D 0.02 R411G 0.01 E622W 0.00
Q561F 0.02 I471F 0.01 T330A 0.01 G383E 0.00
E622Q 0.20 K372R 0.16 D611T 0.12 K584N 0.09
K559P 0.04 N508A 0.02 V518F 0.00 L563Y 0.00
ES43K 0.03 S381Y 0.02 S402D 0.01 D403H 0.00
R5S6V 0.02 I493Q 0.01 L461H 0.01 A529L 0.00
W626R 0.20 K424S 0.15 N352C 0.11 S402V 0.09
KS59T 0.04 T554V 0.02 L368T 0.00 Q531W 0.00
Figure imgf000064_0001
A B C D E F G H
Substitution Score Substitution Score Substitution Score Substitution Score
S3S1T 0.00 N352S 0.01 V343I 0.00 T586V 0.00
N505S 0.02 D546G 0.00 W558Y 0.00 M354E 0.00
K619V 0.00 L413W 0.01 E530N 0.00 G335E 0.00
I4S6G 0.00 E450S 0.00 D546Y 0.00 D420K 0.00
E365T 0.02 S415V 0.01 A504D 0.00 E622H 0.00
E516L 0.00 R519H 0.00 D546V 0.00 E621T 0.00
M399Y 0.00 Y51SN 0.01 Y536G 0.00 1364V 0.00
A60 K 0.02 L582H 0.00 I377H 0.00 E537H 0.00
V336Y 0.00 T586D 0.01 D562L 0.00 I377E 0.00
DS89L 0.00 S367F 0.00 D420G 0.00 E585L 0.00
578C 0.02 D373L 0.01 584W 0.00 Y544R 0.00
A529F 0.00 Q350I 0.00 E585C 0.00 P446G 0.00
578S 0.00 E500L 0.01 352E 0.00 V534D 0.00
E532K 0.02 Q610S 0.00 G418Q 0.00 K614CI 0.00
F416V 0.00 Q454D 0.01 L374T 0.00 N508T 0.00
L349Y 0.00 K460L 0.00 P327T 0.00 NS08K 0.00
K520G 0.02 640H 0.01 D373G 0.00 D342W 0.00
Y524L 0.00 R538M 0.00 N542L 0.00 N388C 0.00
E360Q 0.02 V545G 0.01 Y544D 0.00 L612S 0.00
L5S2Q 0.00 YS24M 0.00 ES85N 0.00 W550R 0.00
S4S4 0.02 ES43S 0.01 L61S 0.00 YS24! 0.00
N547S 0.00 D611P 0.00 E537L 0.00 Y544L 0.00
Y515D 0.02 R538F 0.01 E633K 0.00 Y515E 0.00
Q610P 0.00 Q561S 0.00 F634L 0.00 N542I 0.00
N444D 0.02 Y375F 0.01 636V 0.00 K379G 0.00
K447I 0.00 K619H 0.00 Y536K 0.00 R637V 0.00
E450G 0.02 P446D 0.01 R538Y 0.00 E365N 0.00
578 0.00 I456M 0.00 I481Q 0.00 ES00V 0.00
M354L 0.02 F467V 0.01 V545T 0.00 K379E 0.00
P39SS 0.00 607T 0.00 D342L 0.00 A378V 0.00
I592F 0.02 T586P 0.01 I493G 0.00 E58SM 0.00
E450L 0.00 W558F 0.00 A331L 0.00 T510G 0.00
E585G 0.01 L349F 0.01 L628T 0.00 E622T 0.00
S367 0.00 N388K 0.00 T586H 0.00 E608G 0.00
A529G 0.01 ES37M 0.01 K372T 0.00 S381N 0.00
W5S0V 0.00 L582S 0.00 A378S 0.00 K520D 0.00
328F 0.01 D625Q 0.01 D346W 0.00 E512S 0.00
K578N 0.00 K578V 0.00 I465N 0.00 Y544C 0.00
P327S 0.01 V534Q 0.01 K425M 0.00
Y524K 0.00 Q3S0P 0.00 I493L 0.00
C627E 0.01 A378F 0.01 YS44N 0.00
379F 0.00 1456V 0.00 A369K 0.00
D342S 0.01 I354W 0.01 F390L 0.00
T526H 0.00 W558P 0.00 341S 0.00
A369G 0.01 K496P 0.00 V358 0.00
D611G 0.00 D546L 0.00 ES30I 0.00
E477K 0.01 A331H 0.00 A609T 0.00
RS38K 0.00 Y524S 0.00 E618T 0.00
M345 0.01 K540N 0.00 Y536L 0.00
N547T 0.00 D611Y 0.00 K614N 0.00
D373I 0.01 A378H 0.00 I427M 0.00
K619 0.00 WSS8S 0.00 E365F 0.00
R380I 0.01 T421A 0.00 K328S 0.00
Q350G 0.00 D546K 0.00 K460Q 0.00
I364Q 0.01 R453M 0.00 K584Y 0.00
F370S 0.00 578T 0.00 P495D 0.00
E365G 0.01 N636L 0.00 L413Y 0.00
K607G 0.00 W558T 0.00 354V 0.00
A378Y 0.01 K578E 0.00 R412L 0.00
WSS8H 0.00 DS46T 0.00 V534 0.00
TABLE 2
F322I L438I G482A Q350K I616L
Y324I L438V G482V S367A V645A
Y324L A439L S484A L374N
A331G A439M A485I L413I
V343T I442Y A485M G418R
M354I I442L A485V S464A
V358L P446E F487Y Q480R
A361Y F448M I493L G497S
A361H E450K A494S F498L
Q363E F452Y P495A K522F
I385V I456E Q513E E548R
P395A L458I A529K T554R
I404E K460R W550L Y555R
Q408N K460T L564V W558A
G418A D468G Y566T Q567P
G418F S475T S571T I580L
D420I K476P A604V E585K
T428D E478Q F333A L590A
P432S E478L L347P V593I
L436M E478D L349A I600L
TABLE 3
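As a non-limiting illustration, the substitution labels used in Tables 1 to 5 (wild-type residue, position, replacement residue, for example L438I) can be parsed and checked programmatically. The sketch below is an assumption-laden example and not part of the described method: the names `Substitution`, `parse_substitution` and `apply_substitutions` are hypothetical, and the five-residue sequence is a toy stand-in because SEQ ID NO: 1 is not reproduced in this section.

```python
import re
from typing import NamedTuple


class Substitution(NamedTuple):
    wild_type: str    # one-letter code of the original residue
    position: int     # residue number in the reference sequence
    replacement: str  # one-letter code of the new residue


_PATTERN = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitution(label):
    """Parse a label such as 'L438I'; reject truncated or garbled labels."""
    match = _PATTERN.match(label)
    if match is None:
        raise ValueError(f"not a substitution label: {label!r}")
    wild_type, position, replacement = match.groups()
    return Substitution(wild_type, int(position), replacement)


def apply_substitutions(sequence, labels, first_position=1):
    """Return the sequence with each labelled substitution applied.

    first_position is the residue number of sequence[0]; it is a
    hypothetical parameter for references that do not start at 1.
    """
    residues = list(sequence)
    for label in labels:
        sub = parse_substitution(label)
        index = sub.position - first_position
        if not 0 <= index < len(residues):
            raise ValueError(f"{label}: position outside the sequence")
        if residues[index] != sub.wild_type:
            raise ValueError(
                f"{label}: expected {sub.wild_type}, found {residues[index]}"
            )
        residues[index] = sub.replacement
    return "".join(residues)


# Toy sequence for demonstration only (not SEQ ID NO: 1).
toy_reference = "MKLVA"
print(parse_substitution("L438I"))
print(apply_substitutions(toy_reference, ["L3I", "A5S"]))  # MKIVS
```

Checking the wild-type residue before substituting catches labels that were mis-transcribed or numbered against a different reference sequence.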
SEQ ID NO. variant
1 wt insert
3 R1_V1 Y324I A439L A529K
4 R1_V2 P495A Y566T S571T
5 R1_V3 Q408N L438I A604V
6 R1_V4 P432S I493L L564V
7 R1_V5 V358L D420I K460R
8 R1_V6 F322I M354I A439M
9 R1_V7 A361Y E478Q A485I
10 R1_V8 Q408N I493L Y566T
11 R1_V9 L458I E478L A494S
12 R1_V10 F452Y A485I P495A
13 R1_V11 P395A G418A K460T
14 R1_V12 G418F I493L A494S
15 R1_V13 P432S I456E A485I
16 R1_V14 L436M L438V A439M
17 R1_V15 D420I L436M A485M
18 R1_V16 M354I Q363E G482A
19 R1_V17 M354I L438I I442Y
20 R1_V18 G418A E478D S484A
21 R1_V19 E450K K476P A494S
22 R1_V20 Y324L I404E G482V
23 R1_V21 Y324I I442L F448M
24 R1_V22 F322I Y324I P446E
25 R1_V23 A331G G482A Q513E
26 R1_V24 I385V A439 L564V
27 R1_V25 I442Y Q513E S571T
28 R1_V26 I442Y D468G K476P
29 R1_V27 A331G T428D P446E
30 R1_V28 A361H S484A A494S
31 R1_V29 V343T I385V I442Y
32 R1_V30 V358L I493L A529K
33 R1_V31 K460T A494S A604V
34 R1_V32 G418F L458I K460T
35 R1_V33 Q408N G418F W550L
36 R1_V34 F322I A331G F452Y
37 R1_V35 I385V E450K E478L
38 R1_V36 P446E K460T A529K
39 R1_V37 M354I A485V S571T
40 R1_V38 K460T D468G G482A
41 R1_V39 I404E I442L S484A
42 R1_V40 Y324L E450K G482A
43 R1_V41 F322I P395A Y566T
44 R1_V42 Y324I I385V S571T
45 R1_V43 K460R S484A A485I
46 R1_V44 G418F E478L A485
47 R1_V45 I456E S475T A529K
48 R1_V46 P432S L436M G482V
49 R1_V47 I456E K460R E478Q
50 R1_V48 I442Y F487Y L564V
51 R1_V49 Q363E D420I F487Y
52 R1_V50 V358L G482A F487Y
53 R1_V51
54 R1_V52 Y324L A361Y G418A
55 R1_V53 Y324L A331G A604V
56 R1_V54 L438I F448M K476P
57 R1_V55 D420I P446E E478D
58 R1_V56 A361H T428D K476P
59 R1_V57 A361Y F448M S475T
60 R1_V58 A439L W550L A604V
61 R1_V59 V343T D468G G482V
62 R1_V60 A361Y Q363E I442L
63 R1_V61 V343T L436M I493L
64 R1_V62 G482V A485I L564V
65 R1_V63 Y324I G418A E450K
66 R1_V64 V358L I404E D468G
67 R1_V65 G418F L438I I442L
68 R1_V66 Q408N K476P E478L
69 R1_V67 P432S L438V E478D
70 R1_V68 I385V L438V A439L
71 R1_V69 A361Y A485V P495A
72 R1_V70 A361H A439L F452Y
73 R1_V71 K460R E478D F487Y
74 R1_V72 G418A L438I L458I
75 R1_V73 Y324L T428D F452Y
76 R1_V74 A439M A485V W550L
77 R1_V75 G482V Q513E A604V
78 R1_V76 T428D L438V S571T
79 R1_V77 V343T Q513E Y566T
80 R1_V78 P395A F452Y E478D
81 R1_V79 P432S E478Q P495A
82 R1_V80 L438V S475T A485V
83 R1_V81 A331G S475T E478Q
84 R1_V82 I456E S484A W550L
85 R1_V83 F322I A361H A485M
86 R1_V84 P395A I456E E478L
87 R1_V85 A361H A439 P495A
88 R1_V86 Q363E I404E A485M
89 R1_V87 V358L F448M L564V
90 R1_V88 Q363E E450K S475T
91 R1_V89 V343T M354I F448M
92 R1_V90 D468G A485M A529K
93 R1_V91 I404E F487Y W550L
94 R1_V92 T428D I442L Y566T
95 R1_V93 P395A D420I A439L
96 R1_V94 L458I K460R Q513E
97 R1_V95 Q408N L436M L458I
98 R1_V96
99 R2_V01 Y324I V343T P446E
100 R2_V02 T428D K460T A485I
101 R2_V03 L436M E450K A494S
102 R2_V04 L436M D468G A485I
103 R2_V05 Q363E L438I S475T
104 R2_V06 T428D P446E E450K
105 R2_V07 Q408N A485M P495A
106 R2_V08 L438I P446E P495A
107 R2_V09 M354I F448M A494S
108 R2_V10 L438I E450K E478L
109 R2_V11 E450K K460T P495A
110 R2_V12 Y324I M354I K460T
111 R2_V13 A331G V343T K460T
112 R2_V14 Y324I Q408N A604V
113 R2_V15 T428D L458I S475T
114 R2_V16 Q408N L436M F448M
115 R2_V17 L438I D468G A485M
116 R2_V18 F448M E450K A604V
117 R2_V19 D468G S475T A494S
118 R2_V20 Y324I A494S P495A
119 R2_V21 Q363E S484A A604V
120 R2_V22 Y324I L438I S484A
121 R2_V23 Q408N G482A A494S
122 R2_V24 V343T Q363E T428D
123 R2_V25 T428D A485M A604V
124 R2_V26 M354I Q408N L438I
125 R2_V27 A331G T428D S484A
126 R2_V28 V343T L436M E478L
127 R2_V29 A331G F448M S475T
128 R2_V30 P446E S475T A485I
129 R2_V31 Q363E E478L A494S
130 R2_V32 Q408N P446E L458I
131 R2_V33 Q408N E450K S475T
132 R2_V34 A331G P446E A494S
133 R2_V35 P446E E478L A485
134 R2_V36 L438I L458I A604V
135 R2_V37 L438I K460T G482A
136 R2_V38 A331G E450K A485I
137 R2_V39 L436M P446E A604V
138 R2_V40 V343T Q408N A485I
139 R2_V41 V343T A494S A604V
140 R2_V42 M354I Q363E L436M
141 R2_V43 Y324I D468G G482A
142 R2_V44 A331G L458I P495A
143 R2_V45 Q363E K460T A485M
144 R2_V46 G482A S484A A485I
145 R2_V47 G482A P495A A604V
146 R2_V48 E450K G482A A485M A494S
147 R3_V01 Y324I M354I L438I A494S
148 R3_V02 Y324I M354I P446E A494S
149 R3_V03 Y324I M354I D468G A494S
150 R3_V04 Y324I M354I D468G A494S
151 R3_V05 Y324I L438I P446E A494S
152 R3_V06 Y324I L438I D468G A494S
153 R3_V07 Y324I L438I D468G A494S
154 R3_V08 Y324I P446E D468G A494S
155 R3_V09 Y324I P446E D468G A494S
156 R3_V10 Y324I D468G S475T A494S
157 R3_V11 M354I L438I P446E A494S
158 R3_V12 M354I L438I D468G A494S
159 R3_V13 M354I L438I D468G
160 R3_V14 M354I P446E D468G A494S
161 R3_V15 M354I P446E D468G A494S
162 R3_V16 M354I D468G S475T A494S
163 R3_V17 L438I P446E D468G A494S
164 R3_V18 L438I P446E D468G A494S
165 R3_V19 L438I D468G S475T A494S
166 R3_V20 P446E D468G S475T S475T G482A S484A A494S
167 R3_V21 Y324I M354I L438I S475T G482A S484A A494S P495A
168 R3_V22 Y324I M354I L438I
169 R3_V23
170 R3_V24 D468G S475T A494S
171 R3_V25 D468G A494S L374N F333A W558A
172 R3_V26 Y324I D468G S475T P495A Q350K
173 R3_V27 Y324I D468G S475T G418R
174 R3_V28 D468G S475T G482A I600L
175 R3_V29 Y324I D468G S475T S484A A494S
176 R3_V30 Y324I M354I P446E K522F
177 R3_V31 D468G S475T G482A P495A S464A
178 R3_V32 Y324I L438I D468G L374N
179 R3_V33 M354I D468G S475T S367A I600L
180 R3_V34 P446E D468G S475T
181 R3_V35 P446E D468G A494S T554R
182 R3_V36 D468G S475T A494S L590A
183 R3_V37 D468G S475T G482A E585K V593I
184 R3_V38 Y324I L438I D468G G497S I616L
185 R3_V39 L438I D468G S475T
186 R3_V40 P446E D468G S475T
187 R3_V41 Y324I D468G S475T E585K
188 R3_V42 M354I D468G S475T P495A Q567P
189 R3_V43 M354I L438I D468G A494S E548R
190 R3_V44 (substitutions illegible in source)
191 R3_V45 M354I S475T G482A E548R
192 R3_V46 D468G S475T A494S V593I
193 R3_V47 D468G S475T S484A L590A
194 R3_V48 P446E D468G S475T Q480R
195 R3_V49 M354I D468G S475T L347P
196 R3_V50 P446E D468G S475T
197 R3_V51 D468G A494S P495A
198 R3_V52 D468G S484A A494S
199 R3_V53 Y324I M354I D468G
200 R3_V54 S475T A494S G497S A494S L349A
201 R3_V55 Y324I P446E D468G Q480R
202 R3_V56 D468G S475T A494S L590A
203 R3_V57 M354I D468G S475T A494S L413I
204 R3_V58 Y324I M354I P446E I616L
205 R3_V59 Y324I D468G S475T I600L
206 R3_V60 M354I D468G S475T Q567P
207 R3_V61 Y324I D468G S475T
208 R3_V62 D468G S475T S484A
209 R3_V63 D468G S475T S484A A494S L374N
210 R3_V64 L438I P446E D468G I580L
211 R3_V65 D468G S475T S484A I580L
212 R3_V66 D468G S475T A494S V593I
213 R3_V67 D468G S475T A494S A494S S464A
214 R3_V68 M354I D468G S475T G497S
215 R3_V69 Y324I D468G S475T I600L
216 R3_V70 D468G S475T A494S
217 R3_V71 Y324I D468G G482A E548R
218 R3_V72 P446E D468G S475T
219 R3_V73 S475T A494S P495A S367A I580L
220 R3_V74 L438I D468G S475T
221 R3_V75 D468G S475T G482A K522F
222 R3_V77 L438I D468G S475T Y555R
223 R3_V78 D468G S475T A494S P495A L347P
224 R3_V79 L438I D468G S475T V593I
225 R3_V80 M354I D468G S475T
226 R3_V81 M354I D468G S475T
227 R3_V82 D468G S475T T554R E585K
228 R3_V83 D468G S475T A494S Y555R
229 R3_V84 L438I D468G S475T W558A
230 R3_V85 D468G S475T G482A I600L
231 R3_V86 D468G S475T G482A L590A
232 R3_V87 L438I D468G S475T A494S T554R
233 R3_V88 M354I L438I P446E
234 R3_V89 L438I D468G S475T
235 R3_V90 S475T A494S L413I Q480R
236 R3_V91 P446E D468G S475T F498L
237 R3_V92 L438I D468G S475T G497S W558A
238 R3_V93 M354I D468G S475T
239 R3_V94 D468G S475T A494S F498L Y555R
240 R3_V95 P446E D468G S475T I616L
241 R3_V96 D468G S475T G482A
242 R3_V97 S475T A494S
TABLE 4
Substitution R1-Activity R1-Units R1-16S PCR R1-RTPCR R2-NGS R3-NGS
F322I 0.10056167 0.005803527 -0.070818795 -0.06617972 na na
Y324I -0.0278636 -0.056476399 0.027857126 0.068656134 0.161242021 0.097617143
Y324L -0.271321 -0.123164391 0.015784906 -0.152080742 na na
A331G 0.01462625 0.002457362 0.095046493 0.099803268 -0.089732883 na
V343T -0.0796513 -0.01434754 0.144888572 0.123672468 -0.086701005 na
M354I 0.08936054 0.001014433 0.10071438 -0.025876884 0.247920402 0.058052536
V358L -0.0338797 0.043078889 0.035902245 0.002445977 na na
A361Y -0.5714315 -0.235304528 -0.444041821 0.03970759 na na
A361H -0.4132073 -0.196821495 -0.323033015 0.002603619 na na
Q363E -0.2358028 -0.123199014 -0.067106246 0.044835421 -0.210272249 na
I385V 0.02668118 0.041814199 -0.12431886 -0.078383991 na na
P395A 0.02451608 0.036248869 -0.606812237 0.05931173 na na
I404E -0.1332037 -0.094972003 -0.287293808 -0.053777831 na na
Q408N -0.1085216 -0.053776852 0.029934563 0.181918398 -0.050275102 na
G418A 0.09747157 0.043454736 -0.143408101 -0.119912992 na na
G418F -0.027504 0.00653928 -0.724262843 -0.118502148 na na
D420I -0.0184899 0.016881882 0.318427215 -0.065439014 na na
T428D 0.08535571 0.030192756 -0.006795918 0.102728817 0.001567223 na
P432S -0.1812351 -0.029243134 -0.385568982 -0.057393825 na na
L436M -0.1338661 -0.100004602 0.272868239 0.028593969 0.013264537 na
L438I -0.0167223 0.043487471 0.402039717 0.256711403 0.322462794 0.039904743
L438V 0.14602407 0.18771812 -0.125999275 0.02910583 na na
A439L 0.0939156 0.005974837 -0.45054743 -0.062814693 na na
A439M 0.00310547 0.04642413 -0.394277633 -0.094155499 na na
I442Y 0.13433254 0.057064081 -0.433919892 -0.091350873 na na
I442L 0.12571151 0.064220477 -0.231814386 -0.003719399 na na
P446E 0.00663555 0.058969694 0.234509387 0.171659051 0.113900896 0.113818089
F448M -0.0581406 0.060728055 0.156035471 -0.005494905 -0.057042255 na
E450K 0.14237433 0.065753295 0.090245188 0.215938694 -0.108212275 na
F452Y -0.194606 -0.133554949 0.286848315 -0.002442008 na na
I456E -0.1479296 -0.02483683 0.011115662 -0.000732625 na na
L458I -0.0664563 -0.067115856 0.091182583 -0.070126841 -0.261718833 na
K460R -0.0763708 -0.112922919 0.291240218 0.000375934 na na
K460T -0.1215414 -0.045098321 0.086694057 0.046189286 -0.036565399 na
D468G 0.02086381 0.063542648 -0.17124121 0.005635662 0.053163099 -0.239531475
S475T 0.16648586 0.058875725 0.436728942 0.04705338 0.15726969 0.068171316
K476P -0.1368214 -0.110981305 -0.144870228 -0.158444129 na na
E478Q 0.14391395 0.11471038 -0.150381239 -0.070759824 na na
E478L 0.09889418 0.060090514 0.242586646 -0.009278648 -0.310879274 na
E478D 0.03936929 0.001635567 -0.091693098 -0.094219878 na na
G482A 0.03855082 -0.01460685 0.346145787 0.026726722 0.115588289 0.015893941
G482V 0.09094415 -0.018667198 -0.491226096 -0.088990601 na na
S484A -0.0518219 -0.028942669 0.28011295 -0.064712569 0.07418668 0.294420325
A485I 0.11713759 0.016891091 0.055732955 0.084492214 -0.090263532 na
A485M -0.1286538 -0.03658848 -0.185333062 0.061120348 -0.269940944 na
A485V 0.06806633 0.094474947 -0.093727474 -0.013362591 na na
F487Y -0.0745313 0.041799733 0.163290444 0.026857055 na na
I493L -0.3848844 -0.106261642 0.010430713 -0.033787994 na na
A494S 0.10897749 0.026899989 0.237511513 0.132655373 0.658629722 0.027169375
P495A 0.26057662 0.007156974 0.028762525 0.047358454 0.056906757 -0.152929864
Q513E -0.0204814 0.004156861 -0.065064876 -0.011131051 na na
A529K 0.15875409 0.000961574 0.013280416 -0.11440042 na na
W550L 0.09923792 0.043259807 -0.06039135 -0.001835908 na na
L564V 0.30977623 0.15119552 -0.346878563 -0.038366162 na na
Y566T 0.04587504 0.063872131 0.007005457 -0.117099113 na na
S571T -0.0465726 -0.072665682 -0.440953799 -0.13354912 na na
A604V 0.06780759 0.040778512 0.042596244 0.123226517 -0.139732804 na
F333A na na na na na 0.026360906
L347P na na na na na -0.19608842
L349A na na na na na -0.022041998
Q350K na na na na na -0.016339702
S367A na na na na na -0.00896047
L374N na na na na na -0.281224552
L413I na na na na na -0.259902035
G418R na na na na na -0.166998963
S464A na na na na na -0.108122903
Q480R na na na na na -0.191160941
G497S na na na na na -0.017975667
F498L na na na na na -0.019878415
K522F na na na na na -0.03758523
E548R na na na na na 0.388814402
T554R na na na na na 0.18473505
Y555R na na na na na -0.279420604
W558A na na na na na -0.313663843
Q567P na na na na na -0.213881896
I580L na na na na na -0.192818161
E585K na na na na na -0.223871388
L590A na na na na na -0.200229351
V593I na na na na na 0.105084863
I600L na na na na na 0.030382288
I616L na na na na na 0.13225754
V645A na na na na na -0.187412168
TABLE 5
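As a non-limiting illustration of how the weights in Table 5 might be used to rank candidate combinations, the sketch below sums the R3-NGS weights of the substitutions in a variant. The additive form is an assumption made for this example only; this section does not state the functional form of the models. The function name `additive_score` is hypothetical, the weights are transcribed from the R3-NGS column of Table 5, and the two combinations are those listed for R3_V01 and R3_V02 in Table 4.

```python
# R3-NGS model weights transcribed from Table 5 (subset).
R3_NGS_WEIGHTS = {
    "Y324I": 0.097617143,
    "M354I": 0.058052536,
    "L438I": 0.039904743,
    "P446E": 0.113818089,
    "D468G": -0.239531475,
    "S475T": 0.068171316,
    "G482A": 0.015893941,
    "S484A": 0.294420325,
    "A494S": 0.027169375,
    "P495A": -0.152929864,
}


def additive_score(substitutions, weights):
    """Sum of per-substitution weights (assumed additive, for illustration)."""
    missing = [label for label in substitutions if label not in weights]
    if missing:
        raise KeyError(f"no weight available for: {missing}")
    return sum(weights[label] for label in substitutions)


# Combinations as listed in Table 4 for R3_V01 and R3_V02.
r3_v01 = ["Y324I", "M354I", "L438I", "A494S"]
r3_v02 = ["Y324I", "M354I", "P446E", "A494S"]

score_v01 = additive_score(r3_v01, R3_NGS_WEIGHTS)
score_v02 = additive_score(r3_v02, R3_NGS_WEIGHTS)

print(f"R3_V01: {score_v01:.6f}")
print(f"R3_V02: {score_v02:.6f}")
```

Under this assumed additive form, replacing L438I with P446E raises the summed R3-NGS weight, which is the kind of comparison that could guide the choice of substitutions for a further round of variants.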

Claims

1. A polynucleotide encoding a non-naturally occurring polymerase, wherein the polymerase has a sequence comprising SEQ ID NO: 1 modified by one or more substitutions listed in Table 3 and up to ten internal insertions, deletions or substitutions at positions other than those listed in Table 3.
2. The polynucleotide of claim 1 comprising at least 2, 3, 4, 5, 6, 7, 8, 9 or 10 substitutions listed in Table 3.
3. The polynucleotide of any preceding claim comprising a combination of substitutions selected from the combinations listed in Table 4.
4. The polynucleotide of any preceding claim having at least 90, 95 or 99% sequence identity to claim 1.
5. The polynucleotide of any preceding claim having no substitutions other than those shown in Table 3 and conservative substitutions not affecting activity of the polymerase.
6. The polynucleotide of any preceding claim, wherein the polymerase exhibits enhanced strand displacement activity relative to a polymerase having the sequence of SEQ ID NO: 1.
7. The polynucleotide of any preceding claim, wherein the polymerase has enhanced thermostability relative to a polymerase having the sequence of SEQ ID NO: 1.
8. The polynucleotide of any preceding claim, wherein the polymerase has enhanced polymerase activity relative to a polymerase having the sequence of SEQ ID NO: 1.
9. The polynucleotide of any preceding claim, wherein the polymerase has enhanced reverse transcriptase activity relative to a polymerase having the sequence of SEQ ID NO: 1.
10. The polynucleotide of any preceding claim, which encodes an amino acid sequence at least 90, 95 or 99% sequence identical to any of SEQ ID NOS: 3-242 provided any substitutions present in the amino acid sequence specified in Table 4 are retained.
11. The polynucleotide of any preceding claim encoding the sequence of any of SEQ ID NOS: 3-242.
12. A non-naturally occurring polymerase encoded by a polynucleotide of any preceding claim.
13. A method of synthesizing a copy or complement of a target polynucleotide template comprising expressing an isolated polynucleotide of claim 1 to produce a polymerase; contacting the polymerase with a target polynucleotide template, a primer and nucleotides wherein a copy or complement of the target polynucleotide template is synthesized.
14. The method of claim 13, wherein the sequence of the isolated polynucleotide comprises any of SEQ ID NOS: 3-242.
15. The method of claim 13, wherein the isolated polynucleotide is operably linked to a promoter in a construct.
16. The method of claim 13, wherein the construct is in a recombinant host cell.
17. The method of claim 13, wherein the target polynucleotide is DNA.
18. The method of claim 13, wherein the target polynucleotide is RNA.
19. The method of claim 13, wherein the target polynucleotide comprises an amplification resistant sequence comprising direct repeats, inverted repeats, at least 65% G+C residues or A+T residues or a sequence greater than 2 kilobases.
20. A kit comprising the polynucleotide of any of claims 1-12 or a polymerase encoded by the polynucleotide and at least one other reagent for an amplification or sequencing reaction.
PCT/US2017/040994 2016-07-06 2017-07-06 Modification of dna polymerases for in vitro applications Ceased WO2018009729A2 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US201662359059P 2016-07-06 2016-07-06
US62/359,059 2016-07-06
US201662360627P 2016-07-11 2016-07-11
US62/360,627 2016-07-11

Publications (2)

Publication Number Publication Date
WO2018009729A2 true WO2018009729A2 (en) 2018-01-11
WO2018009729A3 WO2018009729A3 (en) 2018-02-15

Family

ID=60913159

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2017/040994 Ceased WO2018009729A2 (en) 2016-07-06 2017-07-06 Modification of dna polymerases for in vitro applications

Country Status (1)

Country Link
WO (1) WO2018009729A2 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021242740A3 (en) * 2020-05-26 2022-03-10 Qiagen Beverly Llc Polymerase enzyme
CN115948364A (en) * 2022-10-24 2023-04-11 翌圣生物科技(上海)股份有限公司 Taq DNA polymerase mutant Taq001 and its coding gene, expression plasmid, prokaryotic expression host

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2007044671A2 (en) * 2005-10-06 2007-04-19 Lucigen Corporation Thermostable viral polymerases and methods of use
WO2010091203A2 (en) * 2009-02-04 2010-08-12 Lucigen Corporation Rna-and dna-copying enzymes

Also Published As

Publication number Publication date
WO2018009729A3 (en) 2018-02-15

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17824937

Country of ref document: EP

Kind code of ref document: A2

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17824937

Country of ref document: EP

Kind code of ref document: A2