WO2008118545A2 - Methods of generating novel stabilized proteins - Google Patents
- Publication number
- WO2008118545A2 (PCT/US2008/053344)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- seq
- segment
- sequence
- chimeras
- sequences
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N9/00—Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
- C12N9/0004—Oxidoreductases (1.)
- C12N9/0071—Oxidoreductases (1.) acting on paired donors with incorporation of molecular oxygen (1.14)
- C12N9/0077—Oxidoreductases (1.) acting on paired donors with incorporation of molecular oxygen (1.14) with a reduced iron-sulfur protein as one donor (1.14.15)
Definitions
- the invention relates to biomolecular engineering and design, including methods for the design and engineering of biopolymers such as proteins and nucleic acids.
- the disclosure provides a polypeptide comprising sequences from CYP102A1, CYP102A2 or CYP102A3 and having the general structure from N-terminus to C-terminus: [segment 1]-
- segment 1 is from about amino acid residue 1 to about x1 of SEQ ID NO:1 ("1"), SEQ ID NO:2 ("2") or SEQ ID NO:3 ("3"); segment 2 is from about amino acid residue x1 to about x2 of SEQ ID NO:1 ("1"), SEQ ID NO:2 ("2") or SEQ ID NO:3
- segment 3 is from about amino acid residue x2 to about x3 of SEQ ID NO:1 ("1"), SEQ ID NO:2 ("2") or SEQ ID NO:3 ("3"); segment 4 is from about amino acid residue x3 to about x4 of SEQ ID NO:1
- segment 5 is from about amino acid residue x4 to about x5 of SEQ ID NO:1 ("1"), SEQ ID NO:2 ("2") or SEQ ID NO:3 ("3");
- segment 6 is from about amino acid residue x5 to about x6 of SEQ ID NO:1 ("1"), SEQ ID NO:2 ("2") or SEQ ID NO:3 ("3");
- segment 7 is from about amino acid residue x6 to about x7 of SEQ ID NO:1 ("1"), SEQ ID NO:2 ("2") or SEQ ID NO:3 ("3"); and segment 8 is from about amino acid residue x7 to about x8 of SEQ ID NO:1 ("1"), SEQ ID NO:2 ("2") or SEQ ID NO:3 ("3"); wherein: x1 is residue 62, 63, 64, 65 or 66 of SEQ ID NO:1,
- the polypeptide comprises a heme domain and the heme domain is fused to a functional reductase domain having at least 50% identity to the reductase domain of SEQ ID NO:1, 2, or 3.
- the disclosure also provides a polypeptide having the general structure from N-terminus to C-terminus: [segment 1]-[segment 2]-[segment 3]-[segment 4]-[segment 5]-[segment 6]-[segment 7]-[segment 8], wherein segment 1 comprises at least 50-100% identity to the sequence of SEQ ID NO: 4, 5, or 6; wherein segment 2 comprises at least 50-100% identity to the sequence of SEQ ID NO: 7, 8, or 9; wherein segment 3 comprises at least 50-100% identity to the sequence of SEQ ID NO: 10, 11 or 12; segment 4 comprises at least 50-100% identity to the sequence of SEQ ID NO: 13, 14, or 15; segment 5 comprises at least 50-100% identity to the sequence of SEQ ID NO: 16, 17, or 18; segment 6 comprises at least 50-100% identity to the sequence of SEQ ID NO: 19, 20, or 21; segment 7 comprises at least 50-100% identity to the sequence of SEQ ID NO
- polynucleotide encoding a polypeptide as set forth herein and above.
- the polynucleotide can be contained in a vector, such as an expression vector, or in a host cell (either as part of the genome or within a vector within the host cell).
- the disclosure also provides an enzyme extract comprising a polypeptide produced from the host cell of the disclosure.
- the disclosure provides chimeric polypeptides of P450.
- the disclosure provides a polypeptide comprising sequences from CYP102A1 ("1"), A2 ("2") and A3 ("3") having the general formula 11311333, 11312331, 11312333, 21111133, 21111213, 21111231, 21111233, 21111311, 21111313, 21111331, 21111332, 21112131, 21112133, 21112211, 21112213, 21112231, 21112313, 21112323, 21113213, 21113231, 21113233, 21113311, 21113313, 21113331, 21113332, 21211133, 21211213, 21211231, 21211233, 21211311, 21211313, 21211331, 21211332, 21211333, 21212131, 21212211, 21212232, 21212311, 21212313, 21212323, 21212331, 21213213, 212
- the polypeptide comprises a first peptide segment comprising about 64 to 68 amino acids having at the C-terminus of the first peptide segment a sequence E(S or E or K)RFD (SEQ ID NO:29), a second peptide segment comprising about 56 to 60 amino acids having at the C-terminus of the second peptide segment a sequence K(G or D)YH(A or E or S) (SEQ ID NO:30), a third peptide segment comprising about 42 to 46 amino acids having at the C-terminus of the third peptide segment a sequence GFNYR (SEQ ID NO:31), a fourth peptide segment comprising about 48 to 52 amino acids having at the C-terminus of the fourth peptide segment a sequence (D or S)LVD(K or S or R) (SEQ ID NO:32), a fifth peptide segment comprising about 50 to 54 amino acids having at the C-terminus of the fifth peptide segment a sequence HETTS (SEQ ID NO:33
- thermostabilities of cytochrome P450 proteins assembled by structure-guided SCHEMA recombination were determined in order to identify relationships that would allow prediction of the stabilities of untested sequences.
- the disclosure shows that a chimera's thermostability can be predicted from the additive contributions of sequence fragments. Those contributions can be determined either by linear regression of stability-sequence data or, with less accuracy, from the frequencies with which the specific sequence fragments appear in folded vs. unfolded chimera populations. Using these observations as the basis for predicting highly stable sequences, a diverse family of 40 thermostable cytochrome P450s was generated whose half-lives of inactivation at 57 °C are as much as 100 times that of the most stable parent.
- the stable P450s are diverse, yet still retain catalytic activity. Some are significantly more active than the parent enzymes towards a nonnatural substrate, 2-phenoxyethanol.
- This stabilized protein family provides a unique ensemble for biotechnological applications and for studying sequence-stability-function relationships.
- FIG. 1 shows thermostabilities of parental and chimeric cytochromes P450.
- the distribution of T50 values for 185 chimeric cytochromes P450 has an average of 50.4 °C and standard deviation of 4.5 °C.
- Thermostabilities for parents A1, A2 and A3 are indicated (solid lines), with four experimental replicate measurements for A2 to examine measurement variability (dotted lines, standard deviation of 1.0 °C).
- Figure 2A-B shows sequence elements contribute additively to thermostabilities of chimeric cytochromes P450.
- (b) Linear model derived from the data in (a) accurately predicts stabilities of 20 additional chimeras, including the most-stable P450 (MTP) (top rightmost point).
- Figure 3A-B shows that relative frequencies of sequence elements among folded chimeras correlate with relative stability contributions.
- (a) Thermostability contributions of fragments from parents A1 and A3 relative to those from parent A2, obtained by linear regression analysis of 205 folded chimeras with measured T50.
- (b) Frequencies of fragments from parents A1 and A3 relative to those from parent A2 among folded chimeras.
- (c) Relative fragment thermostability contributions correlate with their relative frequencies among folded chimeras.
- Figure 4A-D shows chimera thermostabilities and folding status predicted from sequence element frequencies in multiple sequence alignments of folded and unfolded proteins.
- (a) Consensus energies computed from Boltzmann statistics and fragment frequencies of folded chimeras correlate with measured thermostabilities (T50s).
- (b) The distribution of consensus energies of 620 folded chimeras and 335 unfolded chimeras. Folded chimeras (dark grey) have lower consensus energies than unfolded chimeras (light grey). Overlap region is shown. The consensus energies were calculated as in (a).
- (c) Consensus energies computed from Boltzmann statistics and fragment frequencies using folded and unfolded chimeras correlate with measured thermostabilities (T50).
- (d) Folded chimeras (dark grey) have lower consensus energies than unfolded chimeras (light grey). Overlap region is shown. Consensus energies were calculated as in (c).
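The consensus energies described for Figure 4 can be illustrated with a minimal sketch. It assumes the simplest Boltzmann-style form, in which each block contributes the negative logarithm of the frequency of its parent among folded chimeras; the exact formula, the pseudocount, and the toy list of folded chimeras are assumptions made for illustration and are not taken from the disclosure.

```python
import math
from collections import Counter

def fragment_frequencies(folded, parents="123", pseudocount=1.0):
    """Per-block parent frequencies among folded chimeras.

    The pseudocount is an illustrative assumption that keeps unseen
    fragments from producing log(0).
    """
    n_blocks = len(folded[0])
    total = len(folded) + pseudocount * len(parents)
    freqs = []
    for i in range(n_blocks):
        counts = Counter(chimera[i] for chimera in folded)
        freqs.append({p: (counts[p] + pseudocount) / total for p in parents})
    return freqs

def consensus_energy(chimera, freqs):
    """Sum of -ln(frequency) over blocks; lower means closer to consensus."""
    return -sum(math.log(freqs[i][p]) for i, p in enumerate(chimera))

# Hypothetical folded set (toy data, not measured chimeras).
folded = ["21312333", "21313313", "22312333", "21112333", "11312333"]
freqs = fragment_frequencies(folded)
low = consensus_energy("21312333", freqs)   # built from common fragments
high = consensus_energy("13221111", freqs)  # built from rare fragments
```

Under this assumed form, a chimera assembled from fragments that are frequent among folded sequences receives a lower energy than one assembled from rare fragments, which is the direction of the correlation the figure describes.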
- Figure 5A-D shows linear regression analysis of protein stability. (a) Predicted T50 compared to experimental T50 for the training data set. The r value for the regression line is 0.901. Squares represent outlier points removed after training. (b) Predicted T50 compared to measured T50 for the test data set. The r value for the regression line is 0.856. (c) Prediction accuracy (indicated by correlation coefficient between predicted T50 and measured T50) depends on the number of chimeras used for regression analysis. (d) Prediction of T50s of 6,561 members of the synthetic protein library.
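The library size of 6,561 is consistent with three parents contributing one of eight segments each (3^8 = 6,561). A short sketch of that enumeration follows; the eight-character naming (one parent digit per segment) mirrors the chimera codes used in this document, while the variable names are illustrative.

```python
from itertools import product

PARENTS = "123"  # labels for CYP102A1, CYP102A2, CYP102A3
N_BLOCKS = 8     # segments per sequence

# Every possible assignment of a parent to each of the eight segments.
library = ["".join(blocks) for blocks in product(PARENTS, repeat=N_BLOCKS)]

# Sequences built from a single parent are the parents themselves, not chimeras.
chimeras_only = [code for code in library if len(set(code)) > 1]
```

The full enumeration includes the three parental sequences (11111111, 22222222, 33333333), so the number of true chimeras under this counting is three fewer.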
- Figure 6 shows that prediction accuracy (indicated by the Spearman rank-order correlation coefficient between predicted consensus energies and measured T50) is related to the number of chimeras used for consensus analysis.
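For readers unfamiliar with the Spearman rank-order coefficient named above, the following sketch computes it as the Pearson correlation of average ranks. The energy and T50 values are made-up illustrative numbers, not data from the disclosure.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)

def spearman(x, y):
    """Spearman rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical values: lower consensus energy paired with higher T50.
energies = [3.1, 4.0, 5.2, 6.8, 7.5]
t50s = [60.2, 57.9, 55.0, 51.3, 48.8]
rho = spearman(energies, t50s)
```

Because only rank order matters, a perfectly monotone decreasing relationship gives a coefficient of -1 regardless of how nonlinear it is.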
- Figure 7A-B shows sequence diversity for 40 stable chimeric cytochrome P450 heme domains and the three parent sequences.
- (a) Multi-dimensional scaling (XGOBI) was used to optimize a two-dimensional representation that minimizes the discrepancy between the Euclidean distances and the sequence differences.
- Figure 8 shows a comparison of the ranking performance using regression (circles) to the ranking performance using consensus (filled circles).
- the points represent the performance of each ranking method when partitioning the set of three parents and 205 chimeras with measured T50 values into the top 10, 20, 30...200.
- the y-positions of the leftmost points indicate that the consensus method correctly flags 4 of the top 10 chimeras while the regression method correctly flags 6.
- the x-positions of the leftmost points indicate that the consensus method correctly flags 96 of the bottom 99 chimeras while the regression method correctly flags 97.
- the regression model has superior ranking performance for all threshold choices.
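The comparison in Figure 8 rests on counting how many of the truly most stable sequences a method places in its own top group. A minimal sketch of that count is given below; the function name and the example values are hypothetical, and for a consensus method the energies would be negated first so that a higher score means a more stable prediction.

```python
def top_k_hits(measured, predicted, k):
    """Count items in the measured top k that are also in the predicted top k."""
    def top(scores):
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        return set(ranked[:k])
    return len(top(measured) & top(predicted))

# Hypothetical measured and predicted T50 values for six sequences.
measured = [61.0, 58.5, 55.2, 52.0, 49.7, 47.1]
predicted = [59.0, 60.1, 51.0, 54.3, 50.2, 46.0]
hits_top2 = top_k_hits(measured, predicted, 2)
hits_top3 = top_k_hits(measured, predicted, 3)
```

Repeating the count for each threshold (top 10, 20, 30 and so on) yields one point per threshold for each ranking method.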
- Figure 9 depicts the sequence domains.
- Figure 10 shows the amino acid sequence for CYP102A1.
- Figure 11 shows the amino acid sequence for CYP102A2.
- Figure 12 shows the amino acid sequence for CYP102A3.
- Figure 13A-B shows an alignment of SEQ ID NOs: 1-3.
- Figure 14 shows chemical structures and abbreviations. Substrates are grouped according to the pairwise correlations. Members of a group are highly correlated; intergroup correlations are low.
- Figure 15 shows a summary of normalized activities for all 56 enzymes acting on 11 substrates. Activities are shown using a color scale (white indicating highest and black lowest activity), with columns representing substrates and rows representing proteins. Not-analyzed A3, A3-R1 and A3-R2 proteins are shown in grey. Protein rows are ordered by their chimeric sequence first, and then by heme domain (R0) and R1-, R2- and R3-fusions.
- Figure 16A-D shows substrate-activity profiles for parent heme domain mono- and peroxygenases. Panel (A) shows parent peroxygenases, panel (B) parent holoenzyme monooxygenases profiles, panel (C) the A1 protein set and panel (D) the A2 protein set.
- Figure 17A-F shows that K-means clustering analysis separates chimeras into five clusters. All protein-activity profiles are depicted in (A). Panels (B) through (F) show profiles for sequences within each cluster. Panel (B) depicts 32312333-R1/R2, 32313233-R1/R2. Panel (C) depicts 22213132-R2, 21313111-R3, 21313311-R3. Panel (D) depicts A1-R1/R2, 12112333-R1/R2, 11113311-R1/R2 and 22213132-R1.
- Panel (E) depicts 21313111-R1/R2, 22313233-R2, 22312333-R2, 32312231-R2, 32312333-R0, 32312333-R3, 32313233-R0, and 32313233-R3.
- Panel (F) depicts the remaining sequences.
- Figure 18 shows the interface between the FMN backbone and heme domain based on the 1BVY structure. Residue indicate the degree of conservation. Hydrogen bonds are shown as dashed lines. The amino acids correspond to CYP102A1 numbering.
- Figure 19A-P shows substrate-activity profiles of all chimeras. The columns are coded as follows from front to back: heme domain (R0, front), R1-, R2-, R3-fusion protein.
- DP diphenyl ether
- PA ethyl phenoxyacetate
- Proteins fold into native structures determined by their amino acid sequences and thereby become biologically active. Stability of the native structure therefore plays a vital role in function, and also in protein turnover, genetic diseases, mutational tolerance, functional evolvability, and even the rate of evolution. Proteins with enhanced stability are of significant benefit in industrial applications, where they are better suited to formulation, long-term storage, and extended use in non-natural environments such as elevated temperature. Stabilized proteins are also better starting points for engineering, because their enhanced robustness to mutations makes it easier for them to acquire new functional properties.
- the cytochrome P450 family of heme-containing redox enzymes hydroxylates a wide range of substrates to generate products of significant medical and industrial importance.
- the disclosure demonstrates a new approach to making highly stable, functional proteins with diverse sequences by predicting the stable chimeras in a site-directed SCHEMA recombination library.
- That fragments of the primary sequence contribute additively to stability may appear surprising, considering the cooperative nature of protein folding and many tertiary contacts in the native structure.
- the high degree of additivity observed in this study may be a feature of SCHEMA library design.
- SCHEMA identifies those sequence fragments that minimize the number of contacts, or interactions, that can be broken upon recombination. Two residues in a chimera are defined to have a contact if any heavy atoms are within 4.5 Å; the contact is broken if they do not appear together in any parent at the same positions. Among a total of about 500 contacts for a P450 chimera, an average of fewer than 30 were broken for the sequences in the SCHEMA library.
- the fragments that were swapped in this library have a high number of internal contacts; the inter-fragment contacts are either few or are conserved among the parents. Therefore, the fragments function as pseudo-independent structural modules that make roughly additive contributions to stability.
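The broken-contact definition above can be sketched in a few lines. The sketch assumes the contact list (pairs of alignment positions whose heavy atoms lie within 4.5 Å) has already been derived from a structure; the function name and the four-residue toy sequences are hypothetical and serve only to show the counting rule.

```python
def broken_contacts(chimera, parents, contacts):
    """Count contacts whose residue pair in the chimera occurs in no parent.

    chimera  -- chimera sequence, aligned to the parents
    parents  -- dict of parent label -> aligned parent sequence
    contacts -- iterable of (i, j) 0-based position pairs in contact
    """
    broken = 0
    for i, j in contacts:
        pair = (chimera[i], chimera[j])
        if not any((seq[i], seq[j]) == pair for seq in parents.values()):
            broken += 1
    return broken

# Toy aligned parents; position 0 is conserved between them.
parents = {"1": "ACDE", "2": "AGHK"}
chimera = "ACHK"  # first half from parent 1, second half from parent 2
contacts = [(0, 1), (1, 2), (0, 3)]
n_broken = broken_contacts(chimera, parents, contacts)
```

In the toy case only the contact that spans the crossover and joins two non-conserved residues counts as broken; the contact involving the conserved position stays intact because the same residue pair already exists in one parent.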
- the additivity was strong enough to enable detection of sequencing errors based on deviations from additivity, prediction of thermostabilities for uncharacterized chimeras with high accuracy, and prediction of the T50 of the most stable chimera to within measurement error. This additivity enabled a new approach to stabilizing an entire protein family that does not require high throughput selection or screening.
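An additive stability model of the kind described can be sketched as an ordinary least-squares fit. The sketch assumes an encoding with an intercept plus one indicator per (segment, non-reference parent), with parent A2 as the reference as in the Figure 3 description; the training data are synthetic and noise-free, generated inside the example, so the coefficients carry no experimental meaning.

```python
import numpy as np

N_BLOCKS, PARENTS, REFERENCE = 8, "123", "2"

def encode(chimera):
    """Intercept plus one 0/1 indicator per (block, non-reference parent)."""
    row = [1.0]
    for block_parent in chimera:
        row.extend(float(block_parent == p) for p in PARENTS if p != REFERENCE)
    return np.array(row)

def fit(chimeras, t50s):
    """Least-squares estimate of the additive fragment contributions."""
    X = np.vstack([encode(c) for c in chimeras])
    coef, *_ = np.linalg.lstsq(X, np.asarray(t50s, dtype=float), rcond=None)
    return coef

def predict(coef, chimera):
    return float(encode(chimera) @ coef)

# Synthetic, hypothetical data: a known additive model with no noise.
rng = np.random.default_rng(0)
true_coef = np.concatenate([[45.0], rng.normal(0.0, 2.0, size=2 * N_BLOCKS)])
train = ["".join(rng.choice(list(PARENTS), size=N_BLOCKS)) for _ in range(200)]
t50 = [float(encode(c) @ true_coef) for c in train]
coef = fit(train, t50)
```

With this encoding the intercept is the predicted T50 of the all-reference sequence (22222222), and each remaining coefficient is the shift in T50 from swapping one segment to parent 1 or parent 3. Real measurements contain noise, so a fit to them would recover the contributions only approximately.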
- an "amino acid" is a molecule having the structure wherein a central carbon atom (the α-carbon atom) is linked to a hydrogen atom, a carboxylic acid group (the carbon atom of which is referred to herein as a "carboxyl carbon atom"), an amino group (the nitrogen atom of which is referred to herein as an "amino nitrogen atom"), and a side chain group, R.
- when incorporated into a peptide, polypeptide, or protein, an amino acid loses one or more atoms of its amino and carboxylic groups in the dehydration reaction that links one amino acid to another.
- when incorporated into a protein, an amino acid is referred to as an "amino acid residue."
- "protein" or "polypeptide" refers to any polymer of two or more individual amino acids (whether or not naturally occurring) linked via a peptide bond, which occurs when the carboxyl carbon atom of the carboxylic acid group bonded to the α-carbon of one amino acid (or amino acid residue) becomes covalently bound to the amino nitrogen atom of the amino group bonded to the α-carbon of an adjacent amino acid.
- protein is understood to include the terms “polypeptide” and “peptide” (which, at times may be used interchangeably herein) within its meaning.
- proteins comprising multiple polypeptide subunits (e.g., DNA polymerase III, RNA polymerase II) or other components (for example, an RNA molecule, as occurs in telomerase) will also be understood to be included within the meaning of "protein” as used herein.
- fragments of proteins and polypeptides are also within the scope of the invention and may be referred to herein as "proteins.”
- a stabilized protein comprises a chimera of two or more parental peptide segments.
- a "peptide segment” refers to a portion or fragment of a larger polypeptide or protein.
- a peptide segment need not on its own have functional activity, although in some instances, a peptide segment may correspond to a domain of a polypeptide wherein the domain has its own biological activity.
- a stability-associated peptide segment is a peptide segment found in a polypeptide that promotes stability, function, or folding compared to a related polypeptide lacking the peptide segment.
- a destabilizing-associated peptide segment is a peptide segment that is identified as causing a loss of stability, function or folding when present in a polypeptide.
- a particular amino acid sequence of a given protein is determined by the nucleotide sequence of the coding portion of a mRNA, which is in turn specified by genetic information, typically genomic DNA (including organelle DNA, e.g., mitochondrial or chloroplast DNA).
- Polynucleotide or “nucleic acid sequence” refers to a polymeric form of nucleotides. In some instances a polynucleotide refers to a sequence that is not immediately contiguous with either of the coding sequences with which it is immediately contiguous (one on the 5' end and one on the 3' end) in the naturally occurring genome of the organism from which it is derived.
- the term therefore includes, for example, a recombinant DNA which is incorporated into a vector; into an autonomously replicating plasmid or virus; or into the genomic DNA of a prokaryote or eukaryote, or which exists as a separate molecule (e.g., a cDNA) independent of other sequences.
- the nucleotides of the invention can be ribonucleotides, deoxyribonucleotides, or modified forms of either nucleotide.
- a polynucleotide as used herein refers to, among others, single- and double-stranded DNA, DNA that is a mixture of single- and double-stranded regions, single- and double-stranded RNA, RNA that is a mixture of single- and double-stranded regions, and hybrid molecules comprising DNA and RNA that may be single-stranded or, more typically, double-stranded or a mixture of single- and double-stranded regions.
- polynucleotide as used herein also refers to triple-stranded regions comprising RNA or DNA or both RNA and DNA.
- the strands in such regions may be from the same molecule or from different molecules.
- the regions may include all of one or more of the molecules, but more typically involve only a region of some of the molecules.
- One of the molecules of a triple-helical region often is an oligonucleotide.
- polynucleotide encompasses genomic DNA or RNA (depending upon the organism, i.e., RNA genome of viruses), as well as mRNA encoded by the genomic DNA, and cDNA.
- nucleic acid segment refers to a portion of a larger polynucleotide molecule.
- the polynucleotide segment need not correspond to an encoded functional domain of a protein; however, in some instances the segment will encode a functional domain of a protein.
- a polynucleotide segment can be about 6 nucleotides or more in length (e.g., 6-20, 20-50, 50-100, 100-200, 200-300, 300-400 or more nucleotides in length).
- a stability-associated peptide segment can be encoded by a stability-associated polynucleotide segment, wherein the peptide segment promotes stability, function, or folding compared to a polypeptide lacking the peptide segment.
- a chimera is a combination of at least two segments of at least two different parent proteins. As appreciated by one of skill in the art, the segments need not actually come from each of the parents, as it is the particular sequence that is relevant, and not the physical nucleic acids themselves.
- a chimeric P450 will have at least two segments from two different parent P450s. The two segments are connected so as to result in a new P450.
- a protein will not be a chimera if it has the identical sequence of either one of the parents.
- a chimeric protein can comprise more than two segments from two different parent proteins. For example, there may be 2, 3, 4, 5-10, 10-20, or more parents for each final chimera or library of chimeras.
- the segment of each parent enzyme can be very short or very long; the segments can range in length of contiguous amino acids from 1 to the entire length of the protein. In one embodiment, the minimum length is 10 amino acids. In one embodiment, a single crossover point is defined for two parents.
- the crossover location defines where one parent's amino acid segment will stop and where the next parent's amino acid segment will start.
- a simple chimera would only have one crossover location where the segment before that crossover location would belong to one parent and the segment after that crossover location would belong to the second parent.
- the chimera has more than one crossover location. For example, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11-30, or more crossover locations. How these crossover locations are named and defined are both discussed below.
- for example, a P450 chimera from CYP102A1 (hereinafter "A1") and CYP102A2 (hereinafter "A2"), with two crossovers at 100 and 150, could have the first 100 amino acids from A1, followed by the next 50 from A2, followed by the remainder of the amino acids from A1, all connected in one contiguous amino acid chain.
- alternatively, the P450 chimera could have the first 100 amino acids from A2, the next 50 from A1 and the remainder from A2.
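The two-crossover example can be written out as a short sketch. It assumes the parent sequences are already aligned to a common length and that crossover positions are given as 0-based exclusive end indices; the function name and the placeholder sequences (runs of a single letter standing in for real P450 sequences) are hypothetical.

```python
from typing import Dict, List

def assemble_chimera(parents: Dict[str, str], code: str, crossovers: List[int]) -> str:
    """Build a chimera from aligned, equal-length parent sequences.

    code       -- one parent label per segment, e.g. "121"
    crossovers -- end index (exclusive) of every segment except the last
    """
    lengths = {len(seq) for seq in parents.values()}
    if len(lengths) != 1:
        raise ValueError("parents must be aligned to the same length")
    if len(code) != len(crossovers) + 1:
        raise ValueError("need exactly one more segment than crossovers")
    bounds = [0, *crossovers, lengths.pop()]
    pieces = [parents[label][start:end]
              for label, start, end in zip(code, bounds, bounds[1:])]
    return "".join(pieces)

# Placeholder parents: 200 residues each, one letter per parent.
parents = {"1": "A" * 200, "2": "C" * 200}
chimera_121 = assemble_chimera(parents, "121", [100, 150])
chimera_212 = assemble_chimera(parents, "212", [100, 150])
```

The same function covers the eight-segment chimeras named elsewhere in this document by passing an eight-character code and seven crossover positions.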
- variants of chimeras exist as well as the exact sequences. Thus, not 100% of each segment need be present in the final chimera if it is a variant chimera.
- Protein stability is a key factor for industrial protein use (e.g., enzyme reaction) in denaturing conditions required for efficient product development and in therapeutic and diagnostic protein products.
- Methods for optimizing protein stability have included directed evolution and domain shuffling. However, screening and developing such recombinant libraries is difficult and time consuming.
- a method of identifying stabilizing mutations is a first step in removing or narrowing possible candidates. For this reason it is of value to be able to make multiple versions of a protein that are stabilized. If one has many stable variants to choose from, then those variants that exhibit all of the properties of interest can be identified by appropriate analysis of those properties.
- the disclosure provides a method for making many (e.g., from 1 to many thousand) variants of a protein having amino acid sequences that may differ at multiple amino acid positions and that are stabilized and thus are likely to be functional.
- Such techniques for generating libraries of stabilized proteins have not previously been provided in the art.
- a number of techniques are used for generating novel proteins including, for example, rational design, which uses computational methods to identify sites for introducing disulfide bonds; directed evolution; and consensus stabilization. The foregoing methods do not utilize a linear regression or consensus analysis to assist selectively designing stabilized proteins.
- Recombination has been widely applied to accelerate in vitro protein evolution. In this process, the genetic information of several genes is exchanged to produce a library of recombinant mutants. These mutants are screened for improvement in properties of interest, such as stability, activity, or altered substrate specificity.
- In vitro recombination methods include DNA shuffling, random-priming recombination, and the staggered extension process (StEP).
- in DNA shuffling, the parental DNA is enzymatically digested into fragments. The fragments can be reassembled into offspring genes.
- in the random-priming method, template DNA sequences are primed with random-sequence primers and then extended by DNA polymerase to create fragments. The template is removed and the fragments are reassembled into full-length genes, as in the final step of DNA shuffling.
- the number of cut points can be increased by starting with smaller fragments or by limiting the extension reaction.
- StEP recombination differs from the first two methods because it does not use gene fragments.
- the template genes are primed and extended before denaturation and reannealing. As the fragments grow, they reanneal to new templates and thus combine information from multiple parents. This process is cycled hundreds of times until a full-length offspring gene is formed.
- the foregoing methods are known in the art.
- as a first step in performing any recombination technique, a set of related polypeptides is identified.
- the relatedness of the polypeptides can be determined in any number of ways known in the art. For example, polypeptides may be related structurally either in their primary sequence or in the secondary or tertiary sequence. Methods of identifying sequence identity or 3D structural similarities are known and are further described herein. Another method to identify a related polypeptide is through evolutionary analysis. Evolutionary trees have been developed for a large number of proteins and are available to those of skill in the art.
- a parental sequence used as a basis for defining a set of related polypeptides can be provided by any of a number of mechanisms, including, but not limited to, sequencing, or querying a nucleic acid or protein database. Additionally, while the parental sequence can be provided in a physical sense (e.g., isolated or synthesized), typically the parental sequence or sequences are obtained in silico.
- the parental sequences typically are derived from a common family of proteins having similar three-dimensional structures (e.g., protein superfamilies).
- the nucleic acid sequences encoding these proteins might or might not share a high degree of sequence identity.
- the methods include assessing crossover positions using any number of techniques (e.g., SCHEMA etc.).
- Sequence similarity/identity of various stringency and length can be detected and recognized using a number of methods or algorithms known to one of skill in the art. For example, many identity or similarity determination methods have been designed for comparative analysis of sequences of biopolymers, for spell-checking in word processing, and for data retrieval from various databases.
- models that simulate annealing of complementary homologous polynucleotide strings can also be used as a foundation of sequence alignment or other operations typically performed on the character strings corresponding to the sequences herein (e.g., word-processing manipulations, construction of figures comprising sequence or subsequence character strings, output tables, etc.).
- An example of a software package for calculating sequence identity is BLAST, which can be adapted to the disclosure by inputting character strings corresponding to the sequences herein.
- sequences are aligned.
- a plurality of parental sequences are provided, which are then aligned with either a reference sequence, or with one another. Alignment and comparison of relatively short amino acid sequences (for example, less than about 30 residues) is typically straightforward. Comparison of longer sequences can require more sophisticated methods to achieve optimal alignment of two sequences.
- Optimal alignment of sequences can be performed, for example, by a number of available algorithms, including, but not limited to, the "local homology" algorithm of Smith and Waterman (Adv. Appl. Math. 2:482, 1981), the "homology alignment" algorithm of Needleman and Wunsch (J. Mol. Biol. 48:443, 1970), the "search for similarity" method of Pearson and Lipman (Proc. Natl. Acad. Sci.
- sequences can be aligned by inspection. Generally the best alignment (i.e., the relative positioning resulting in the highest percentage of sequence identity over the comparison window) generated by the various methods is selected. However, in certain embodiments of the disclosure, the best alignment may alternatively be a superpositioning of selected structural features, and not necessarily the highest sequence identity.
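As an illustration of global alignment in the style of Needleman and Wunsch, the following sketch computes the optimal alignment score by dynamic programming. It is a simplified form: the match, mismatch, and linear gap scores are arbitrary illustrative values, not those of any program named in this document, and a substitution matrix would normally replace the match/mismatch pair for proteins.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment score of strings a and b (linear gap cost)."""
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diagonal,            # align a[i-1] with b[j-1]
                            prev[j] + gap,       # gap in b
                            curr[j - 1] + gap))  # gap in a
        prev = curr
    return prev[-1]

identical = needleman_wunsch_score("GATTACA", "GATTACA")
one_gap = needleman_wunsch_score("ACGT", "AGT")
```

Only two rows of the score table are kept, which is enough for the score; recovering the alignment itself would require storing the full table or traceback pointers.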
- sequence identity means that two amino acid sequences are substantially identical (i.e., on an amino acid-by-amino acid basis) over a window of comparison.
- sequence similarity refers to similar amino acids that share the same biophysical characteristics.
- "percentage of sequence identity" or "percentage of sequence similarity" is calculated by comparing two optimally aligned sequences over the window of comparison, determining the number of positions at which the identical residues (or similar residues) occur in both polypeptide sequences to yield the number of matched positions, dividing the number of matched positions by the total number of positions in the window of comparison (i.e., the window size), and multiplying the result by 100 to yield the percentage of sequence identity (or percentage of sequence similarity).
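The calculation just described can be written directly. The sketch assumes the two sequences have already been aligned to equal length with "-" as the gap character, and that gap positions count toward the window size but never as matches; both conventions are assumptions for illustration.

```python
def percent_identity(aligned_a, aligned_b):
    """Matched positions divided by window size, times 100."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    if not aligned_a:
        raise ValueError("comparison window is empty")
    # A position matches when both residues are identical and neither is a gap.
    matches = sum(x == y and x != "-" for x, y in zip(aligned_a, aligned_b))
    return 100.0 * matches / len(aligned_a)

one_substitution = percent_identity("ACDEFG", "ACDQFG")
one_gap = percent_identity("AC-E", "ACDE")
```

Percentage of sequence similarity would follow the same pattern, with the equality test replaced by a check that the two residues fall in the same group of biophysically similar amino acids.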
- with respect to polynucleotide sequences, sequence identity and sequence similarity have comparable meaning as described for protein sequences, with the term "percentage of sequence identity" indicating that two polynucleotide sequences are identical (on a nucleotide-by-nucleotide basis) over a window of comparison.
- a percentage of polynucleotide sequence identity or percentage of polynucleotide sequence similarity, e.g., for silent substitutions or other substitutions, based upon the analysis algorithm
- Maximum correspondence can be determined by using one of the sequence algorithms described herein (or other algorithms available to those of ordinary skill in the art) or by visual inspection.
- the term substantial identity or substantial similarity means that two peptide sequences, when optimally aligned, such as by the programs BLAST, GAP or BESTFIT using default gap weights or by visual inspection, share sequence identity or sequence similarity.
- substantial identity or substantial similarity means that the two nucleic acid sequences, when optimally aligned, such as by the programs BLAST, GAP or BESTFIT using default gap weights (described in detail below) or by visual inspection, share sequence identity or sequence similarity.
- PILEUP creates a multiple sequence alignment from a group of related sequences using progressive, pairwise alignments to show relationship and percent sequence identity or percent sequence similarity. It also plots a tree or dendrogram showing the clustering relationships used to create the alignment. PILEUP uses a simplification of the progressive alignment method of Feng & Doolittle, (1987) J. Mol. Evol. 25:351-360. The method used is similar to the method described by Higgins & Sharp, CABIOS 5:151-153, 1989. The program can align up to 300 sequences, each of a maximum length of 5,000 nucleotides or amino acids.
- the multiple alignment procedure begins with the pairwise alignment of the two most similar sequences, producing a cluster of two aligned sequences. This cluster is then aligned to the next most related sequence or cluster of aligned sequences. Two clusters of sequences are aligned by a simple extension of the pairwise alignment of two individual sequences. The final alignment is achieved by a series of progressive, pairwise alignments.
- the program is run by designating specific sequences and their amino acid or nucleotide coordinates for regions of sequence comparison and by designating the program parameters.
- Using PILEUP, a reference sequence is compared to other test sequences to determine the percent sequence identity (or percent sequence similarity) relationship using the following parameters: default gap weight (3.00), default gap length weight (0.10), and weighted end gaps.
- PILEUP can be obtained from the GCG sequence analysis software package, e.g., version 7.0 (Devereux et al., (1984) Nuc. Acids Res. 12:387-395).
- Another example of an algorithm that is suitable for multiple DNA and amino acid sequence alignments is the CLUSTALW program (Thompson, J. D. et al., (1994) Nuc. Acids Res. 22:4673-4680). CLUSTALW performs multiple pairwise comparisons between groups of sequences and assembles them into a multiple alignment based on sequence identity. Gap open and Gap extension penalties were 10 and 0.05 respectively.
- the BLOSUM algorithm can be used as a protein weight matrix (Henikoff and Henikoff, (1992) Proc. Natl. Acad. Sci. USA 89:10915-10919).
- Another method of determining relatedness is through protein and polynucleotide alignments. Common methods include using sequence based searches available on-line and through various software distribution routes. Homology or identity at the amino acid or nucleotide level can be determined by BLAST (Basic Local Alignment Search Tool) and by ClustalW analysis using the algorithm employed by the programs blastp, blastn, blastx, tblastn and tblastx (Karlin et al., Proc. Natl. Acad. Sci.
- the search parameters for histogram, descriptions, alignments, expect i.e., the statistical significance threshold for reporting matches against database sequences
- cutoff, matrix and filter are at the default settings.
- the default scoring matrix used by blastp, blastx, tblastn, and tblastx is the BLOSUM62 matrix (Henikoff et al., Proc. Natl. Acad. Sci. USA 89, 10915-10919, 1992, fully incorporated by reference).
- the scoring matrix is set by the ratios of M (i.e., the reward score for a pair of matching residues) to N (i.e., the penalty score for mismatching residues), wherein the default values for M and N are 5 and -4, respectively.
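As an illustration of the match reward M and mismatch penalty N described above, the following sketch scores an ungapped nucleotide alignment with the stated defaults (M = 5, N = -4). The function name `ungapped_score` is hypothetical, and the sketch deliberately ignores gap penalties and the statistical evaluation that BLAST itself performs.

```python
def ungapped_score(seq_a: str, seq_b: str, match: int = 5, mismatch: int = -4) -> int:
    """Sum the reward M for each matching pair and the penalty N for each mismatch."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return sum(match if a == b else mismatch for a, b in zip(seq_a, seq_b))


# Three matches and one mismatch: 3 * 5 + 1 * (-4)
print(ungapped_score("ACGT", "ACGA"))
```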
- proteins can be identified.
- protein homology is determined primarily by sequence similarity (sequences are more similar than expected at random). Sequences that are as low as 15-20% similar by alignments are likely related and encode proteins with similar structures. Additional structural relatedness can be determined using any number of further techniques including, but not limited to, X-ray crystallography, NMR, searching a protein structure database, homology modeling, de novo protein folding, and computational protein structure prediction. Such additional techniques can be used alone or in addition to sequence-based alignment techniques.
- the degree of similarity/identity between two polypeptides (including peptide segments or domains) or polynucleotide sequences should be at least about 20% or more (e.g., 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 98% or 99%).
- parent sequences are chosen from a database of sequences, by a sequence homology search such as BLAST.
- Parental sequences will typically be between about 20% and 95% identical, typically between 35 and 80% identical.
- the lower the identity, the higher the mutation level (and possibly the greater the possible stability enhancement and functional variation in the resulting sequences) following recombination between parental strands.
- the higher the identity, the higher the probability that the sequences will fold and function.
- Thermodynamic stability is an important biological property that has evolved to an optimal level to fit the functional needs of proteins. Therefore, investigating the stability of proteins is important not only because it affords information about the physical chemistry of folding, but also because it can provide important biological insights. A proper understanding of protein stability is also useful for technological purposes. The ability to rationally make proteins of high stability, low aggregation or low degradation rates will be valuable for a number of applications. For example, proteins that can resist unfolding can be used in industrial processes that require enzyme catalysis at high temperatures (Van den Burg et al., Proc. Natl. Acad. Sci. U.S.A. 95(5): 2056-60, 1998); and the ability to produce proteins with low degradation rates within the cell can help to maximize production of recombinant proteins (Kwon et al., Protein Eng. 9(12): 1197-202, 1996).
- Stability measurements can also be used as probes of other biological phenomena.
- the most basic of these phenomena is biological activity.
- the ability of proteins to populate their native states is a universal requirement for function. Therefore, stability can be used as a convenient, first level assay for function.
- libraries of polypeptide sequences can be tested for stability in order to select for sequences that fold into stable conformations and might potentially be active (Sandberg et al., Biochem. 34: 11970-78, 1995).
- Heme domains of cytochromes can be assayed for proper folding using CO-binding to the iron/heme.
- the heme domain for SEQ ID NO:1 extends from about amino acid residue 1 to about 434; and for SEQ ID NO:2 or 3 extends from about amino acid residue 1 to about 436.
- Changes in stability can also be used to detect binding.
- a ligand binds to the native conformation of a protein
- the global stability of a protein is increased (Schellman, Biopolymers 14: 999-1018, 1975; Pace & McGrath, (1980) J. Biol. Chem. 255: 3862-65; Pace & Grimsley, Biochem. 27: 3242-46, 1988).
- the binding constant can be measured by analyzing the extent of the stability increase. This strategy has been used to analyze the binding of ions and small molecules to a number of proteins (Pace & McGrath, (1980) J. Biol. Chem. 255: 3862-65; Pace & Grimsley, (1988) Biochem.
- the expressed chimeric recombinant proteins are measured for stability and/or biological activity. Techniques for measuring stability and activity are known in the art and include, for example, the ability to retain function (e.g., enzymatic activity) at elevated temperature or under 'harsh' conditions of pH, salt, organic solvent, and the like; and/or the ability to maintain function for a longer period of time (e.g., in storage in normal conditions, or in harsh conditions).
- Function will of course depend upon the type of protein being generated and will be based upon its intended purpose. For example, P450 mutants can be tested for the ability to convert alkanes to alcohols under various conditions of pH, solvents and temperature.
- the best methodology for protein stabilization depends on the target protein and the relative ease with which folding status and stability are measured.
- the linear regression model uses stability data, which are often more difficult to obtain than a simple determination of folding status.
- the linear regression model requires fewer measurements and always predicted more true positives with fewer false positives than the consensus approach based on folding status (Fig. 8) .
- Although the linear regression model predicted absolute thermostabilities with higher accuracy than the consensus model, the latter nonetheless reliably predicted highly stable chimeras.
- the two approaches have significant overlaps in their predicted stable sequences, including the MTP. Eight of the top seventeen stable chimeras predicted by consensus have predicted T50 > 60 °C by linear regression (Table 3).
- thermophiles are poor enzymes at low temperature (e.g. room temperature) because they have evolved under pressure to function at high temperatures, just as proteins from mesophiles are marginally stable because they have never been selected to fold at higher temperatures.
- thermostable enzymes represents evolutionary statistics rather than an inherent biophysical tradeoff has anecdotal support from engineering experiments that have dramatically stabilized proteins while retaining their room-temperature activity.
- the current data in which a large set of proteins with varying stabilities has been generated by recombination without evolutionary selection for either activity or stability, provide a more rigorous test.
- Over half of the 40 thermostable chimeras in Table 3 are also more active on 2-phenoxyethanol than the most active parent, demonstrating that there is no fundamental biophysical tradeoff between stability and activity on this substrate.
- Such trade-offs, if they exist, must be connected to significantly more optimized functions.
- chimeric proteins exhibit a broad range of stabilities, and that stability of a given folded sequence can be predicted based on data (either stability or folding status) from a limited sampling of the chimeric library. Using this information, dozens of diverse, highly stable proteins were created.
- $T_{50} = a_0 + \sum_{i}\sum_{j} a_{ij} x_{ij}$
- thermostable chimeric cytochrome P450s. To construct a given stable chimera, two chimeras having parts of the targeted gene (e.g. 21311212 and 11312333 for the target chimera 21312333) were selected as templates. The target gene was constructed by overlap extension PCR, cloned into the pCWori expression vector, and transformed into the catalase-free E. coli strain SN0037. All constructs were confirmed by sequencing. Enzyme activity assay. Activity on 2-phenoxyethanol was analyzed in 96-well plates using the 4-aminoantipyrine (4-AAP) assay.
- a structure-guided SCHEMA recombination of the heme domains of CYP102A1 and its homologs CYP102A2 (A2) and CYP102A3 (A3) was used to create at least 2,300 new, properly folded and catalytically active enzymes.
- the folded chimeras exhibit a great deal of sequence diversity, differing from the closest parent sequence by an average of 72 amino acid substitutions.
- the SCHEMA library was constructed by site-directed recombination at seven crossover sites, so that a chimeric P450 sequence is made up of eight fragments, each chosen from one of the three parents.
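The library layout described above (eight fragments, each drawn from one of three parents) implies 3^8 = 6,561 possible chimeras. The sketch below enumerates them using the eight-digit naming that appears elsewhere in this document (e.g. 21312333), where each digit is taken to name the parent supplying that fragment; the function name `enumerate_library` is hypothetical.

```python
from itertools import product


def enumerate_library(parents: str = "123", n_fragments: int = 8) -> list[str]:
    """List every chimera name: one parent digit per fragment position."""
    return ["".join(choice) for choice in product(parents, repeat=n_fragments)]


library = enumerate_library()
print(len(library))           # total number of possible chimeras
print("21312333" in library)  # a chimera name used in this document
```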
- the thermostabilities of a subset of the folded chimeras were measured, and the relationship between sequence and stability was analyzed. Thermostability is well described by a model that assumes the contributions of the chimera's eight sequence fragments are additive.
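An additive model of this kind can be sketched as a constant term plus one contribution per fragment. In the sketch below, the function name `predict_t50`, the constant 45.0 and the two fragment contributions are hypothetical values chosen only for illustration; they are not the fitted parameters of Table 2. Fragments with no entry in `contributions` are treated as contributing zero, which corresponds to measuring every contribution relative to a reference parent.

```python
def predict_t50(chimera: str, a0: float, contributions: dict) -> float:
    """Additive prediction: constant term plus the contribution of each fragment.

    `contributions` maps (fragment position 1-8, parent digit) to a value in
    degrees C; missing keys count as zero (the reference parent).
    """
    if len(chimera) != 8 or any(parent not in "123" for parent in chimera):
        raise ValueError("expected an eight-digit chimera name using parents 1-3")
    return a0 + sum(
        contributions.get((position, parent), 0.0)
        for position, parent in enumerate(chimera, start=1)
    )


# Hypothetical parameters, for illustration only.
example_contributions = {(2, "1"): 2.0, (8, "3"): 1.5}
print(predict_t50("21312333", 45.0, example_contributions))
```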
- the sequences of 620 folded and 335 unfolded chimeras were examined, and it was found that the most thermostable chimeras tend to contain 'consensus fragments', or those appearing more frequently among the folded chimeras and less frequently in the unfolded ones.
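The idea of consensus fragments can be illustrated by counting how often each parent supplies each fragment among the folded chimeras. The sketch below is an assumption-laden simplification: it uses only the folded set, and it scores a chimera with a negative log frequency summed over positions as an energy-like quantity. The exact consensus energy definition used in this document is not given in this section, and all function names are hypothetical.

```python
import math
from collections import Counter


def fragment_frequencies(folded: list[str]) -> list[dict]:
    """For each fragment position, the fraction of folded chimeras using each parent."""
    n = len(folded)
    return [
        {parent: count / n for parent, count in Counter(seq[i] for seq in folded).items()}
        for i in range(len(folded[0]))
    ]


def consensus_sequence(freqs: list[dict]) -> str:
    """Pick the most frequent parent at every position."""
    return "".join(max(f, key=f.get) for f in freqs)


def consensus_energy(chimera: str, freqs: list[dict]) -> float:
    """Assumed score: sum of -ln(frequency); lower means closer to consensus."""
    total = 0.0
    for parent, f in zip(chimera, freqs):
        if parent not in f:
            return math.inf  # fragment never observed among folded chimeras
        total -= math.log(f[parent])
    return total


# Toy two-fragment example, not real library data.
toy_freqs = fragment_frequencies(["11", "12", "21", "11"])
print(consensus_sequence(toy_freqs))
```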
- Chimera thermostability can thus be predicted by determining either the folding status or thermostabilities of a small sampling of the library. Based on these results, 40 chimeric cytochrome P450s were predicted, constructed and characterized; they are highly stable, catalytically active and have sequences that are significantly different from any known P450.
- thermostabilities of 185 folded P450 chimeras were measured (Table 1) in the form of T50, the temperature at which 50% of the protein irreversibly denatured after incubation for ten minutes.
- the parental proteins have T50 values of 54.9 °C (A1), 43.6 °C (A2) and 49.1 °C (A3).
- the T50 distribution of the chimeras, shown in Fig. 1, has an average of 50.3 °C and a standard deviation of 4.5 °C.
- This subset of the folded P450s contains many that are more stable than the most stable parent (A1).
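Given the definition of T50 as the temperature at which half of the protein is irreversibly denatured, one simple way to estimate it from a set of incubation temperatures and remaining folded fractions is linear interpolation between the two points that bracket 50%. This is only a sketch under that assumption; the fitting procedure actually used for Table 1 is not described in this section, and the function name `estimate_t50` and the data are hypothetical.

```python
def estimate_t50(temperatures: list[float], fraction_folded: list[float]) -> float:
    """Interpolate the temperature at which the folded fraction crosses 0.5."""
    points = sorted(zip(temperatures, fraction_folded))
    for (t_lo, f_lo), (t_hi, f_hi) in zip(points, points[1:]):
        # Look for the interval where the curve drops through 50%.
        if f_lo >= 0.5 >= f_hi and f_lo != f_hi:
            return t_lo + (f_lo - 0.5) * (t_hi - t_lo) / (f_lo - f_hi)
    raise ValueError("folded fraction does not cross 50% in the measured range")


# Hypothetical data: fraction folded after incubation at 40, 50 and 60 degrees C.
print(estimate_t50([40.0, 50.0, 60.0], [1.0, 0.75, 0.25]))
```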
- Table 1. T50 values and sequences of 205 chimeric cytochromes P450. Columns: Sequence, T50 (°C).
- the first 185 chimeras are those for data training and testing, and the last 20 chimeras (bold) are those used to test the linear regression model.
- thermostability contribution of each fragment shown is relative to the corresponding fragment from parent A2, which was used as the reference.
- the data was randomly divided into two parts, a training set (140 data points) and a test set (45 data points).
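A random division into a 140-point training set and a 45-point test set can be sketched as below. The function name `split_train_test` and the fixed seed are assumptions made for reproducibility of the example; the source does not state how the random division was performed.

```python
import random


def split_train_test(items, n_train: int, seed: int = 0):
    """Shuffle a copy of the data and cut it into training and test parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


# Indices 0..184 stand in for the 185 measured chimeras.
train, test = split_train_test(range(185), n_train=140)
print(len(train), len(test))
```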
- r for the training set was improved from 0.847 to 0.901 (Fig. 5a) .
- Table 2 trained regression parameters
- r = 0.856, indicating that additive contributions derived from one group of proteins can be used to accurately predict thermostabilities of another group (Fig. 5b).
- the linear regression model was further confirmed by 10-fold cross-validation.
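For readers unfamiliar with 10-fold cross-validation, the sketch below shows how the data indices can be partitioned so that each point is held out exactly once. The generator name `k_fold_indices` and the seed are hypothetical, and fitting and scoring the regression model on each split are left out.

```python
import random


def k_fold_indices(n_items: int, k: int = 10, seed: int = 0):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal-sized folds
    for held_out in range(k):
        train = [j for f, fold in enumerate(folds) if f != held_out for j in fold]
        yield train, folds[held_out]


splits = list(k_fold_indices(185))
print(len(splits), [len(test) for _, test in splits])
```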
- Predicted and measured T50 values for all 20 new P450s, including the MTP, correlated extremely well (r = 0.949) (Fig. 2b).
- determining the weights for predictor variables in a regression model only requires making as many measurements as there are predictor variables. In the presence of noise, additional measurements will tend to increase the accuracy of the predictions.
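To make the count of predictor variables concrete, the sketch below encodes a chimera as one row of a regression design matrix. It assumes an intercept plus one indicator per fragment position for each non-reference parent, with parent 2 (A2) as the reference, which gives 1 + 8 × 2 = 17 columns. The function name `design_row` and this particular encoding are assumptions for illustration.

```python
def design_row(chimera: str, parents: str = "123", reference: str = "2") -> list[float]:
    """Intercept followed by 0/1 indicators for each non-reference fragment."""
    row = [1.0]  # intercept term
    for parent_at_position in chimera:
        for parent in parents:
            if parent == reference:
                continue  # the reference parent is absorbed into the intercept
            row.append(1.0 if parent_at_position == parent else 0.0)
    return row


print(len(design_row("21312333")))
```

Under this encoding, at least as many measured chimeras as there are columns would be needed to determine the weights, and more measurements help when the data are noisy.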
- a certain number of sequences from the 205 chimeras with measured T50s were randomly selected, and the ability of regression models based on these sequences to predict the T50s of the remaining chimeras was tested. 35 to 40 measurements were sufficient for accurate predictions of chimera stability, although slight improvements in prediction accuracy could be seen with more data points (Fig. 5c).
- Table 3 A stabilized cytochrome P450 heme domain family.
- the lowest consensus energy is the "consensus sequence", and should be the most stable chimera. Indeed, the consensus sequence has the highest measured stability among all 239 chimeras with known T50 and is also the MTP predicted by the linear regression model.
- Table 4 The 20 chimeras with lowest total consensus energies.
- Stability predictions identify errors in sequencing.
- the stability predictions were found sufficiently accurate to identify both sequencing errors and point mutations in the chimeras.
- the sequences of P450 chimeras were originally determined in high throughput by DNA probe hybridization, which has a ~3% error rate; small numbers of point mutations during library construction are also expected. Thus approximately 7 incorrect sequence readings are expected for the total set of 239 chimeras studied, and other sequences may have point mutations. From the original set of 190 chimeras whose T50s were measured and analyzed by linear regression, 13 chimeras with a prediction error of more than 4 °C were resequenced. Five either had incorrect sequences or contained point mutations (Table 5); these five chimeras were eliminated from the subsequent linear regression analysis to determine the model parameters in Table 2.
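The arithmetic behind the expected number of misread sequences, and the rule of flagging chimeras whose prediction error exceeds 4 °C, can be sketched as follows. The function names and the two example chimeras with their T50 values are hypothetical.

```python
def expected_misreads(n_sequences: int, error_rate: float) -> float:
    """Expected number of incorrect sequence readings."""
    return n_sequences * error_rate


def flag_for_resequencing(predicted: dict, measured: dict, threshold: float = 4.0) -> list[str]:
    """Names whose |predicted - measured| T50 exceeds the threshold (degrees C)."""
    return [name for name in predicted if abs(predicted[name] - measured[name]) > threshold]


print(expected_misreads(239, 0.03))  # about 7 of 239 chimeras

# Hypothetical predicted and measured T50 values.
predicted = {"21312333": 60.0, "11312333": 52.0}
measured = {"21312333": 61.0, "11312333": 46.5}
print(flag_for_resequencing(predicted, measured))
```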
- the stable P450 chimeras are also more tolerant to inactivation by denaturants .
- the 40 stable chimeras comprise a diverse family of sequences, differing from one another at 14 to 88 amino acid positions (49 on average) (Fig. 7) .
- the distance to the closest parent is as high as 100 amino acids.
- the 40 chimeras thus comprise a family of properly folded, highly stable cytochrome P450s that exhibit considerable sequence novelty.
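The sequence diversity figures above are counts of differing amino acid positions. A minimal sketch of that count, and of the distance to the closest parent, is given below; it assumes the sequences are already aligned to equal length, and the function names and toy sequences are hypothetical.

```python
def hamming(seq_a: str, seq_b: str) -> int:
    """Number of positions at which two equal-length aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def distance_to_closest_parent(seq: str, parents: list[str]) -> int:
    """Smallest number of substitutions separating a chimera from any parent."""
    return min(hamming(seq, parent) for parent in parents)


# Toy sequences, not real P450 data.
print(distance_to_closest_parent("MKTAYL", ["MKTAYV", "MRSGYV"]))
```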
- the activities of the stable chimeras were assessed in order to explore the relationship between activity and stability, and specifically to determine whether the increased stability came at the cost of catalytic function.
- thermostable chimeras were higher than those of the parent proteins. Most thermostable chimeras expressed well even without the inducing agent isopropyl-beta-D-thiogalactopyranoside (IPTG).
- Peroxygenase activities of the 16 heme domains were determined by assaying for product formation after a fixed reaction time in 96-well plates. Similar assays were used to determine monooxygenase activities for each of the fusion proteins. Final enzyme concentrations were fixed to 1 µM in order to reduce large errors associated with low expression and to allow us to compare chimera activities using absorbance values directly. Protein concentrations were re-assayed in 96-well format and Attorney Docket No. 1034345-000263
- Table 8 Average activity in absorbance units for each substrate-construct pair (maximal value for each substrate in bold/italic) .
- Table 9 Standard deviations/average of absorbance for each substrate-construct pair. Blanks indicate where the average absorbance equals zero.
- Table 10 Summary of error statistics for collected absorbance data sorted by substrates. The percent of the standard deviation divided by the average value and the percentage of data points retained for the analysis are measures of data quality.
- the best enzyme for each substrate is listed in Table 11. All the best enzymes are chimeras. Most of the best enzymes are also holoenzymes; only PE has a peroxygenase as the best catalyst.
- K-means clustering a statistical algorithm that partitions data into clusters based on data similarity, mutants exhibiting similar substrate specificities and protein fragments (4-7 residues) of similar structure and interacting nucleotide pairs with similar 3D structures.
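K-means clustering can be sketched in a few lines using Lloyd's algorithm: assign each activity profile to its nearest centroid, recompute the centroids, and repeat until the assignments stop changing. The implementation below is a generic illustration with random initial centroids and squared Euclidean distance; the function name `kmeans`, the seed and the toy profiles are assumptions, and the software and settings actually used for the clustering in this document are not stated in this section.

```python
import random


def kmeans(points: list, k: int, n_iter: int = 100, seed: int = 0):
    """Partition points (equal-length numeric tuples) into k clusters."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy activity profiles (two substrates per enzyme), not real data.
profiles = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
labels, _ = kmeans(profiles, k=2)
print(labels)
```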
- Cluster 1, consisting of chimeras 32312333-R1/R2 and 32313233-R1/R2 (Figure 17B), is characterized by low relative activities on CH, TB, PR and PN and high relative activities on all other substrates. In fact, two of these chimeras are the best enzymes on all the remaining substrates except PB and PE.
- Cluster 2 is made up of 22213132-R2, 21313111-R3, 21313311-R3, which are the most active enzymes on TB, CH and PR (Figure 17C).
- Cluster 2 enzymes are entirely inactive on PN and show low activity on most of the substrates that cluster 1 enzymes accept (PE, DP, PA and EB).
- Relative activities on the remaining substrates are moderate (although lower than cluster 1 chimeras).
- An exception is 21313111-R3, which is the best enzyme for PB and also fairly good on PE and DP.
- Cluster 3 contains chimeras A1-R1/R2, 12112333-R1/R2, 11113311-R1/R2 and 22213132-R1 (Figure 17D).
- the A1-like sequences are characterized by high relative activity on PN (on which 11113311-R1/R2 and A1-R1 are the three top-ranking enzymes), and moderate to high relative activity on PB and moderate activity on PE.
- Cluster 4 contains 21313111-R1/R2, 22313233-R2, 22312333-R2, 32312231-R2, 32312333-R0, 32312333-R3, 32313233-R0, and 32313233-R3 (Figure 17E).
- This cluster is characterized by having the highest relative activity on PE, in addition to moderate activities on PT, DP and ZX. The remaining chimeras appear in a fifth cluster with relatively low activity on everything except PN and PE (Figure 17F).
- This cluster contains parental sequences A1-
- the partition created by the clustering algorithm shows that the presence and identity of the reductase can alter the activity profile and thus the specificity of a heme domain sequence.
- the R1 and R2 fusions of 32312333 and 32313233 appear in cluster 1, whereas their R0 and R3 counterparts are in cluster 4.
- Sequences 22213132 and 21313111 also behave differently when fused to different reductases.
- 22213132-R2 displays pronounced peaks on substrates TB, CH and PR that are not present in the corresponding peroxygenase and R1/R3 profiles (Figure 19E) and is thus the only member with this heme domain sequence appearing in cluster 2.
- each of the 14 chimeric heme domains can be fused to a parental reductase to generate a functional monooxygenase.
- the resulting monooxygenases are generally more active under these conditions than the corresponding peroxygenases (see Figure 19).
- the R1 and R2 fusions tend to outperform R3 fusions. While altering reductase identity never completely deactivates the protein, it does affect specificity in some cases.
- Group A core substrates have cluster 1 chimeras as their top-performing enzymes, whereas substrates of group B have cluster 2 chimeras as their top-performing enzymes.
- the top catalysts for group C are three of the cluster 3 chimeras. Members of a substrate group thus share the same best-performing enzymes.
- the folded P450s contain an average of 72 mutations from their closest parent. A large fraction of the folded P450s are catalytically active, showing activity on a single substrate (PN). Eleven additional substrates were selected for characterization of 14 of the active chimeric heme domains and their fusions with each of the three parental reductase domains. Many of the chimeras were shown to be significantly more active. In fact, for every single substrate, including one widely used to assay CYP102A1 (PN), the top-performing enzyme is a chimera. Recombining mutations already accepted in natural homologs thus leads to a family of highly active enzymes that accept a broader range of substrates.
- Chimeric enzymes exhibit distinct specificities and can be partitioned into clusters based on their specificity.
- One cluster contains parent A1-R1 and all chimeras with A1-like profiles.
- Another cluster contains low activity chimeras and includes all remaining parental sequences. The remaining clusters represent highly active chimeras that have acquired new specificities.
- Members of a cluster are likely to exhibit common structural, physical or chemical features that account for their similar catalytic properties. If the library is large enough, statistical techniques can be used to determine how sequence elements relate to the observed profiles. In particular, if there are sufficient numbers of chimeras in each cluster, then powerful tools such as logistic regression or machine learning can be used to predict which cluster an untested sequence belongs to.
- Substrates were also partitioned into groups based on the linear correlations of substrate pairs. An enzyme active on one member of a substrate group is therefore likely to be active on another member of the same group.
- One group consists of the drug-like substrates TB, PR and CH (Figure 14). Another consists of PT, PA, EB and DP. If these correlations hold for the larger library of chimeric enzymes, we should be able to predict with reasonable accuracy the relative activities of a chimera on all the substrates in a group by testing activity on only one. This type of analysis can be expanded to a larger collection of substrates to identify additional groups or additional members of an existing group.
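Grouping substrates by the linear correlation of their activity profiles can be sketched as below: compute the Pearson correlation for every substrate pair across the same set of enzymes and keep the pairs above a chosen threshold. The 0.8 threshold, the function names and the toy activity values are assumptions for illustration and do not come from the source.

```python
import math
from itertools import combinations


def pearson_r(xs: list, ys: list) -> float:
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)  # undefined if either profile is constant


def correlated_pairs(activity: dict, threshold: float = 0.8) -> list:
    """Substrate pairs whose activity profiles correlate at or above the threshold."""
    return [
        (a, b)
        for a, b in combinations(sorted(activity), 2)
        if pearson_r(activity[a], activity[b]) >= threshold
    ]


# Toy activities of four enzymes on three substrates, not real data.
toy_activity = {"TB": [1, 2, 3, 4], "PR": [2, 4, 6, 8.5], "PN": [4, 3, 2, 1]}
print(correlated_pairs(toy_activity))
```

A substrate in a highly correlated pair could then serve as a surrogate for its partner when screening.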
- the observed correspondence between the three substrate groups and chimera clusters 1, 2 and 3 illustrates that each group
- Cluster 4 chimeras have peaks on only certain members of group A and are thus responsible for the lower correlations among group A substrates.
- Some cluster 2 and cluster 3 chimeras exhibit peaks on PB (on the edge of group A) as well as group B and C, respectively.
- Although PB correlates mostly with group A core substrates, it shares its top-performing enzymes with groups B and C and thus displays a hybrid behavior. This is why PB correlates less with group A than core substrates do and why it has higher correlations with group B and C members than any other substrate not belonging to these groups.
- the top enzymes for one member of a substrate group will usually be among the top ones for all members of that group.
- an approach to screening that is based on carefully chosen 'surrogate' substrates could significantly enhance our ability to identify useful catalysts.
- any member of a well-defined substrate group can be a surrogate for other members of that group. Further analysis may also help to identify the critical physical, structural or chemical properties of substrates belonging to a known group.
- the direct hydrogen bond occurs between the reductase backbone carbonyl of N573 and the side-chain hydroxyl group of heme domain residue S383.
- N573 is only conserved in R1 and R2, but because the interaction involves the backbone oxygen, the reductase side of the interface is not affected by changes in the side-chain identity.
- S383 is only conserved in parents A1 and A3. However, the corresponding residue in A2, D385, may also be capable of forming the hydrogen bond. This interaction may therefore be present in all the chimeras.
- the chimeric heme domains were fused to each of the three wildtype reductase domains after amino acid residue 463 when the last block originates from CYP102A1 and 466 for CYP102A2 and CYP102A3.
- the holoenzymes were constructed
- Proteins were expressed in E. coli as described previously and purified by anion exchange on Toyopearl SuperQ-650M from Tosoh. After binding of the proteins, the matrix was washed with a 30 mM NaCl buffer, and proteins were eluted with 150 mM NaCl (all buffers used for purification contained 25 mM phosphate buffer pH 8.0). Proteins were rebuffered into 100 mM phosphate buffer and concentrated using 30,000 MWCO Amicon Ultra centrifugal filter devices (Millipore). Proteins were stored at -20 °C in 50% glycerol. Protein concentration was measured by CO absorption at 450 nm.
- Protein concentration of 1 µM was chosen for the activity assays. Protein concentrations were re-assayed in 96-well format and determined to be 0.88 µM +/- 13% (SD/average).
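The SD/average figure quoted above is a coefficient of variation. A minimal sketch of that calculation is shown below; it assumes the sample standard deviation, since the source does not say whether the sample or population form was used, and the function name and concentration values are hypothetical.

```python
import statistics


def coefficient_of_variation(values: list) -> float:
    """Standard deviation divided by the mean (sample standard deviation assumed)."""
    return statistics.stdev(values) / statistics.mean(values)


# Hypothetical re-assayed concentrations in µM.
concentrations = [0.80, 0.88, 0.96]
print(f"{statistics.mean(concentrations):.2f} µM +/- "
      f"{100 * coefficient_of_variation(concentrations):.0f}%")
```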
- Proteins were assayed for mono- or peroxygenase activities in 96-well plates. Heme domains were assayed for peroxygenase activity using hydrogen peroxide as the oxygen and electron source. Reductase domain fusion proteins were assayed for monooxygenase activity, using molecular oxygen and NADPH. Reactions were carried out in 100 mM EPPS buffer pH 8, 1% acetone, 1% DMSO, 1 µM protein in 120 µl volumes.
- Substrate concentrations depended on their solubility under the assay conditions. Final concentrations were: 2-phenoxyethanol (PE), 100 mM; ethoxybenzene (EB), 50 mM; ethyl phenoxyacetate (PA), 10 mM; 3-phenoxytoluene (PT), 10 mM; ethyl 4-phenylbutyrate (PB), 5 mM; diphenyl ether (DP), 10 mM; zoxazolamine (ZX), 5 mM; propranolol (PR), 4 mM; chlorzoxazone (CH), 5 mM; tolbutamide (TB), 10 mM; 12-p-nitrophenoxycarboxylic acid (PN), 0.25 mM.
- the reaction was initiated by the addition of NADPH or hydrogen peroxide stock solution (final concentration of 500 µM NADPH or 2 mM hydrogen peroxide) and mixed briefly. After 2 hrs at room temperature, reactions with substrates 1-10 were quenched with 120 µl of 0.1 M NaOH and 4 M urea. Thirty-six µl of 0.6% (w/v) 4-aminoantipyrine (4-AAP) was then added. The 96-well plate reader was zeroed at 500 nm and 36 µl of 0.6% (w/v) potassium persulfate was added. After 20 min, the absorbance at 500 nm was
- thermostable phosphite dehydrogenase for NAD(P)H regeneration. Appl. Environ. Microb. 71, 5728-5734 (2005)
Abstract
The invention relates to methods for identifying and producing stabilized chimeric proteins.
Applications Claiming Priority (6)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US90022907P | 2007-02-08 | 2007-02-08 | |
| US60/900,229 | 2007-02-08 | ||
| US91852807P | 2007-03-16 | 2007-03-16 | |
| US60/918,528 | 2007-03-16 | ||
| US12/024,515 US20080248545A1 (en) | 2007-02-02 | 2008-02-01 | Methods for Generating Novel Stabilized Proteins |
| US12/024,515 | 2008-02-01 |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| WO2008118545A2 true WO2008118545A2 (fr) | 2008-10-02 |
| WO2008118545A3 WO2008118545A3 (fr) | 2009-12-30 |
Family
ID=39789216
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/US2008/053344 Ceased WO2008118545A2 (fr) | 2007-02-08 | 2008-02-07 | Methods for Generating Novel Stabilized Proteins |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2008118545A2 (fr) |
Cited By (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US7863030B2 (en) | 2003-06-17 | 2011-01-04 | The California Institute Of Technology | Regio- and enantioselective alkane hydroxylation with modified cytochrome P450 |
| US8026085B2 (en) | 2006-08-04 | 2011-09-27 | California Institute Of Technology | Methods and systems for selective fluorination of organic molecules |
| US8252559B2 (en) | 2006-08-04 | 2012-08-28 | The California Institute Of Technology | Methods and systems for selective fluorination of organic molecules |
| US8802401B2 (en) | 2007-06-18 | 2014-08-12 | The California Institute Of Technology | Methods and compositions for preparation of selectively protected carbohydrates |
| US9322007B2 (en) | 2011-07-22 | 2016-04-26 | The California Institute Of Technology | Stable fungal Cel6 enzyme variants |
Family Cites Families (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US7226768B2 (en) * | 2001-07-20 | 2007-06-05 | The California Institute Of Technology | Cytochrome P450 oxygenases |
| DK1660646T3 (en) * | 2003-08-11 | 2015-03-09 | California Inst Of Techn | Thermostable peroxide-driven cytochrome P450 oxygenase variants and methods of use |
-
2008
- 2008-02-07 WO PCT/US2008/053344 patent/WO2008118545A2/fr not_active Ceased
Cited By (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US7863030B2 (en) | 2003-06-17 | 2011-01-04 | The California Institute Of Technology | Regio- and enantioselective alkane hydroxylation with modified cytochrome P450 |
| US8343744B2 (en) | 2003-06-17 | 2013-01-01 | The California Institute Of Technology | Regio- and enantioselective alkane hydroxylation with modified cytochrome P450 |
| US8741616B2 (en) | 2003-06-17 | 2014-06-03 | California Institute Of Technology | Regio- and enantioselective alkane hydroxylation with modified cytochrome P450 |
| US9145549B2 (en) | 2003-06-17 | 2015-09-29 | The California Institute Of Technology | Regio- and enantioselective alkane hydroxylation with modified cytochrome P450 |
| US8026085B2 (en) | 2006-08-04 | 2011-09-27 | California Institute Of Technology | Methods and systems for selective fluorination of organic molecules |
| US8252559B2 (en) | 2006-08-04 | 2012-08-28 | The California Institute Of Technology | Methods and systems for selective fluorination of organic molecules |
| US8802401B2 (en) | 2007-06-18 | 2014-08-12 | The California Institute Of Technology | Methods and compositions for preparation of selectively protected carbohydrates |
| US9322007B2 (en) | 2011-07-22 | 2016-04-26 | The California Institute Of Technology | Stable fungal Cel6 enzyme variants |
Also Published As
| Publication number | Publication date |
|---|---|
| WO2008118545A3 (fr) | 2009-12-30 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| Tsuboyama et al. | Mega-scale experimental analysis of protein folding stability in biology and design | |
| US20080248545A1 (en) | Methods for Generating Novel Stabilized Proteins | |
| Otey et al. | Structure-guided recombination creates an artificial family of cytochromes P450 | |
| Landwehr et al. | Diversification of catalytic function in a synthetic family of chimeric cytochrome P450s | |
| Bloom et al. | Neutral genetic drift can alter promiscuous protein functions, potentially aiding functional evolution | |
| Gumulya et al. | Engineering highly functional thermostable proteins using ancestral sequence reconstruction | |
| US20120171693A1 (en) | Methods for Generating Novel Stabilized Proteins | |
| Perperopoulou et al. | Recent advances in protein engineering and biotechnological applications of glutathione transferases | |
| Park et al. | Energetics-based protein profiling on a proteomic scale: identification of proteins resistant to proteolysis | |
| Fox et al. | Old yellow enzyme at 2 Å resolution: overall structure, ligand binding, and comparison with related flavoproteins | |
| JP2021131901A (ja) | 酵素バリアントの自動スクリーニング | |
| Rembeza et al. | Experimental and computational investigation of enzyme functional annotations uncovers misannotation in the EC 1.1.3.15 enzyme class | |
| Govindarajan et al. | Mapping of amino acid substitutions conferring herbicide resistance in wheat glutathione transferase | |
| Nutschel et al. | Systematically scrutinizing the impact of substitution sites on thermostability and detergent tolerance for Bacillus subtilis lipase A | |
| WO2008118545A2 (fr) | Methods for generating novel stabilized proteins | |
| Prakinee et al. | Ancestral sequence reconstruction for designing biocatalysts and investigating their functional mechanisms | |
| Nakano et al. | Benchmark analysis of native and artificial NAD+-dependent enzymes generated by a sequence-based design method with or without phylogenetic data | |
| Saab‐Rincón et al. | Stabilization of the reductase domain in the catalytically self‐sufficient cytochrome P450BM3 by consensus‐guided mutagenesis | |
| Hu et al. | GRACE: Generative redesign in artificial computational enzymology | |
| WO2005017106A2 (fr) | Libraries of optimized cytochrome P450 enzymes and the optimized P450 enzymes | |
| Chen et al. | Rational design of loop dynamics for a barrel-shaped enzyme by introducing disulfide bonds | |
| Verma et al. | MAP2.03D: a sequence/structure based server for protein engineering | |
| Wan et al. | Discovery of alkaline laccases from basidiomycete fungi through machine learning-based approach | |
| Chopra et al. | Structure analysis and molecular docking studies of laccase from “Bacillus licheniformis NS2324” | |
| Sameer et al. | Elucidation of ligand binding and dimerization of NADPH: protochlorophyllide (Pchlide) oxidoreductase from pea (Pisum sativum L.) by structural analysis and simulations | |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 08780403; Country of ref document: EP; Kind code of ref document: A2 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | EP: PCT application non-entry in European phase | Ref document number: 08780403; Country of ref document: EP; Kind code of ref document: A2 |