PLANT POLYPEPTIDE PRODUCTION
FIELD OF THE INVENTION
The present invention is directed to increasing expression and production of plant polypeptides of interest in a filamentous fungus host cell. The invention discloses methods for obtaining increased yields of a plant polypeptide of interest, DNA sequences, DNA constructs, vectors and host cells.
BACKGROUND OF THE INVENTION
Several economically important polypeptides obtained from eukaryotes have been expressed in fungal hosts. Fungal polypeptides (e.g. fungal glucoamylase) are well expressed (up to 30 g/l), but many non-fungal polypeptides, e.g. enzymes obtained from plants, are expressed very poorly. Expression of plant polypeptides in fungi is described in Juge et al. (1998) Appl Micro Biotech 49:385-392 and in Moralejo et al. (2000) Appl Micro Biotech 54:772-777.
SUMMARY OF THE INVENTION
The invention relates to providing methods for producing a plant polypeptide of interest in a filamentous fungus host cell. In a first aspect the invention provides a method of producing a polypeptide of interest obtained from a plant in a filamentous fungus host cell, the method comprising, (a) exchanging at least one codon of the native DNA sequence, said codon encoding an amino acid residue in the polypeptide of interest, with another codon encoding the same amino acid residue of said polypeptide of interest, so that the amino acid sequence of the polypeptide of interest is unchanged, and so that the expression level of the polypeptide of interest is increased in comparison to the expression level in the same host cell and under the same conditions of the polypeptide encoded by the native DNA sequence, (b) introducing into and expressing said DNA sequence obtained in step (a) in the host cell, (c) culturing the cell obtained in step (b) under conditions conducive to the production of the polypeptide of interest, (d) isolating the polypeptide of interest.
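Step (a) of the first aspect can be illustrated with a minimal Python sketch. It is an illustration only, not the procedure used in the invention. It exchanges a single codon and rejects any exchange that would change the encoded amino acid, assuming the standard genetic code. The function names and the nine-nucleotide example sequence are hypothetical.

```python
from itertools import product

# Standard genetic code, built in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate a coding DNA sequence into one-letter amino acid codes."""
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(GENETIC_CODE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def exchange_codon(dna, index, new_codon):
    """Replace the codon at position `index` (0-based, counted in codons).

    Raises ValueError if the new codon encodes a different amino acid,
    so the amino acid sequence is guaranteed to stay unchanged.
    """
    dna, new_codon = dna.upper(), new_codon.upper()
    start = 3 * index
    old_codon = dna[start:start + 3]
    if GENETIC_CODE[old_codon] != GENETIC_CODE[new_codon]:
        raise ValueError(f"{old_codon} and {new_codon} are not synonymous")
    return dna[:start] + new_codon + dna[start + 3:]


# Hypothetical example: Met-Leu-Ser, exchanging the Leu codon TTA for CTC.
native = "ATGTTATCA"
modified = exchange_codon(native, 1, "CTC")
assert translate(modified) == translate(native)
```

Whether a given exchange actually raises the expression level is an empirical question that the sketch does not address; it only enforces the condition that the amino acid sequence is unchanged.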
In a second aspect the invention provides a method for production of a polypeptide of interest obtained from a plant in a filamentous fungus host cell, the method comprising, (a) determining the frequency of the codons of at least one native DNA sequence encoding a polypeptide expressed by said host cell, (b) providing a DNA sequence encoding the polypeptide of interest, wherein at least one first codon encoding an amino acid residue in the polypeptide of interest is exchanged with a second codon encoding the same amino acid residue, said second codon having a higher frequency than said first codon as determined in step (a), (c) introducing into and expressing said DNA sequence obtained in step (b) in the host cell, (d) culturing the host cell obtained in step (c) under conditions conducive to the production of the polypeptide of interest, and (e) isolating the polypeptide of interest.
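Step (b) of the second aspect might be sketched in Python as follows, taking the codon frequencies from step (a) as given. This is a simplified illustration under stated assumptions, not the method of the invention: it assumes the standard genetic code, and the frequency values and the example sequence are invented placeholders rather than measured host data. A codon is exchanged only when a synonymous codon has a strictly higher frequency.

```python
from itertools import product

# Standard genetic code, built in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def exchange_low_frequency_codons(dna, host_freq):
    """Swap each codon for the most frequent synonymous codon in the host.

    `host_freq` maps codons to frequencies; codons missing from the
    mapping are treated as having frequency 0.
    """
    dna = dna.upper()
    result = []
    for i in range(0, len(dna), 3):
        codon = dna[i:i + 3]
        amino_acid = GENETIC_CODE[codon]
        synonyms = [c for c, aa in GENETIC_CODE.items() if aa == amino_acid]
        best = max(synonyms, key=lambda c: host_freq.get(c, 0.0))
        # Exchange only for a second codon with a higher frequency.
        if host_freq.get(best, 0.0) > host_freq.get(codon, 0.0):
            codon = best
        result.append(codon)
    return "".join(result)


# Hypothetical frequencies (placeholders, not data from any organism).
example_freq = {"TTA": 5.0, "CTC": 24.0, "TCA": 10.0, "TCC": 20.0}
native_seq = "ATGTTATCA"  # hypothetical Met-Leu-Ser coding sequence
optimized_seq = exchange_low_frequency_codons(native_seq, example_freq)
```

Always choosing the single most frequent codon is only one possible design; the method as stated requires only that the second codon be more frequent than the first.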
In a third aspect the invention provides a method of producing a polypeptide of interest obtained from a plant in a filamentous fungus host cell, the method comprising introducing into and expressing in the filamentous fungus host cell a DNA sequence encoding the polypeptide of interest, said DNA sequence being codon-optimized for expression in said filamentous fungus host cell.
In a fourth aspect the invention provides a DNA sequence obtainable by the methods of the previous aspects of the invention. In a fifth aspect the invention provides a DNA construct comprising a DNA sequence according to the fourth aspect of the invention. In a sixth aspect the invention provides a recombinant expression vector comprising a DNA construct according to the fifth aspect of the invention. In a seventh aspect the invention provides a host cell, which is transformed with a DNA construct according to the fifth aspect of the invention. In further aspects the invention provides a use of the DNA sequence of the fourth aspect of the invention, a use of the DNA construct of the fifth aspect of the invention, and a use of the host cell of the seventh aspect of the invention in a method for producing plant polypeptides of interest in improved yields.
DETAILED DESCRIPTION OF THE INVENTION
Host cells
The host cell of the invention, either comprising a DNA construct or an expression vector of the invention, is advantageously used as a host cell in the recombinant production of a polypeptide of interest. The cell may be transformed with the DNA construct of the invention encoding the polypeptide of interest, conveniently by integrating the DNA construct (in one or more copies) in the host chromosome. This integration is generally considered to be an advantage as the DNA sequence is more likely to be stably maintained in the cell. Integration of the DNA construct into the host chromosome may be performed according to conventional methods, e.g., by homologous or heterologous recombination. Alternatively, the cell may be transformed with an expression vector.
Filamentous fungus host cells
In a preferred embodiment, the host cell is a filamentous fungus represented by the following groups of Ascomycota, including, e.g., Neurospora, Eupenicillium (=Penicillium), Emericella (=Aspergillus), Eurotium (=Aspergillus).
In a preferred embodiment, the filamentous fungi include all filamentous forms of the subdivisions Eumycota and Oomycota (as defined by Hawksworth et al., In, Ainsworth and Bisby's Dictionary of The Fungi, 8th edition, 1995, CAB International, University Press, Cambridge, UK). The filamentous fungi are characterized by a vegetative mycelium composed of chitin, cellulose, glucan, chitosan, mannan, and other complex polysaccharides. Vegetative growth is by hyphal elongation and carbon catabolism is obligately aerobic.
In a more preferred embodiment, the filamentous fungus host cell is a cell selected from, but not limited to, the group consisting of a strain belonging to a species of Aspergillus, preferably Aspergillus oryzae, Aspergillus niger, or Aspergillus awamori, or a strain of Fusarium, such as a strain of Fusarium oxysporum, Fusarium graminearum (in the perfect state named Gibberella zeae, previously Sphaeria zeae, synonymous with Gibberella roseum and Gibberella roseum f. sp. cerealis), Fusarium sulphureum (in the perfect state named Gibberella pulicaris, synonymous with Fusarium trichothecioides, Fusarium bactridioides, Fusarium sambucinum, Fusarium roseum, and Fusarium roseum var. graminearum), Fusarium cerealis (synonymous with Fusarium crookwellense), or Fusarium venenatum.
The host cell may be a wild type filamentous fungus host cell or a variant, a mutant or a genetically modified filamentous fungus host cell. In a preferred embodiment of the invention the host cell is a protease deficient or protease minus strain. This may be the protease deficient strain Aspergillus oryzae JaL 125 having the alkaline protease gene named "alp" deleted (described in WO 97/35956 or EP 429490), or the tripeptidyl aminopeptidase (TPAP) deficient strain of A. niger disclosed in WO 96/14404. Further, a host cell with reduced production of the transcriptional activator (prtT) as described in WO 01/68864 is also contemplated according to the invention. Another specifically contemplated host strain is Aspergillus oryzae BECh2, in which the three TAKA amylase genes present in the parent strain IFO4177 have been inactivated. In addition, two proteases, the alkaline protease and neutral metalloprotease II, have been destroyed by gene disruption. The ability to form the metabolites cyclopiazonic acid and kojic acid has been destroyed by mutation. BECh2 is described in WO 00/39322 and is derived from JaL228 (described in WO 98/12300), which in turn was a mutant of IFO4177 disclosed in US 5,766,912 as A1560.
Also specifically contemplated are Aspergillus strains, such as A. niger strains, genetically modified to disrupt or reduce expression of glucoamylase, acid-stable alpha-amylase, alpha-1,6 transglucosidase, and protease activities.
Transformation of filamentous fungus host cells
Filamentous fungus host cells may be transformed by a process involving protoplast formation, transformation of the protoplasts, and regeneration of the cell wall in a manner known in the art. Suitable procedures for transformation of Aspergillus host cells are described in EP 238 023, EP 184 438, and Yelton et al., 1984, Proceedings of the National Academy of Sciences USA 81:1470-1474. A suitable method of transforming Fusarium species is described by Malardier et al., 1989, Gene 78:147-156 or US 6,060,305.
Isolating and cloning a nucleic acid sequence
The techniques used to isolate or clone a nucleic acid sequence encoding a polypeptide of interest are known in the art and include isolation from genomic DNA, preparation from cDNA, or a combination thereof. The cloning of the nucleic acid sequences of the present invention from such genomic DNA can be effected, e.g., by using the well known polymerase chain reaction (PCR) or antibody screening of expression libraries to detect cloned DNA fragments with shared structural features. See, e.g., Innis et al., 1990, PCR: A Guide to Methods and Application, Academic Press, New York. Other nucleic acid amplification procedures such as ligase chain reaction (LCR), ligated activated transcription (LAT) and nucleic acid sequence-based amplification (NASBA) may be used.
Isolated nucleic acid sequence
The term "isolated nucleic acid sequence" as used herein refers to a nucleic acid sequence, which is essentially free of other nucleic acid sequences, e.g., at least about 20% pure, preferably at least about 40% pure, more preferably at least about 60% pure, even more preferably at least about 80% pure, and most preferably at least about 90% pure as determined by agarose electrophoresis.
For example, an isolated nucleic acid sequence can be obtained by standard cloning procedures used in genetic engineering to relocate the nucleic acid sequence from its natural location to a different site where it will be reproduced. The cloning procedures may involve excision and isolation of a desired nucleic acid fragment comprising the nucleic acid sequence encoding the polypeptide of interest, insertion of the fragment into a vector molecule, and incorporation of the recombinant vector into a host cell where multiple copies or clones of the nucleic acid sequence will be replicated. An isolated nucleic acid sequence may be manipulated in a variety of ways to provide for expression of the polypeptide of interest. Manipulation of the nucleic acid sequence prior to its insertion into a vector may be desirable or necessary depending on the expression vector. The techniques for modifying nucleic acid sequences utilizing recombinant DNA methods are well known in the art.
DNA construct
"DNA construct" is defined herein as a nucleic acid molecule, either single- or double-stranded, which is isolated from a naturally occurring gene or which has been modified to contain segments of nucleic acid, which are combined and juxtaposed in a manner, which would not otherwise exist in nature. The term DNA construct is synonymous
with the term expression cassette when the DNA construct contains all the control sequences required for expression of a coding sequence of the present invention.
Coding sequence
The term "coding sequence" is defined herein as a portion of a nucleic acid sequence, which directly specifies the amino acid sequence of its protein product. The boundaries of the coding sequence are generally determined by a ribosome binding site (prokaryotes) or by the ATG start codon (eukaryotes) located just upstream of the open reading frame at the 5' end of the mRNA and a transcription terminator sequence located just downstream of the open reading frame at the 3' end of the mRNA. A coding sequence can include, but is not limited to, DNA, cDNA, and recombinant nucleic acid sequences. If the coding sequence is intended for expression in a eukaryotic cell, a polyadenylation signal and transcription termination sequence will usually be located 3' to the coding sequence.
Operably linked
The term "operably linked" is defined herein as a configuration in which a control sequence is appropriately placed at a position relative to the coding sequence of the DNA sequence such that the control sequence directs the expression of a polypeptide of interest.
Control sequences
The term "control sequences" is defined herein to include all components that are necessary or advantageous for expression of the coding sequence of the DNA sequence encoding the polypeptide of interest. Each control sequence may be native or foreign to the DNA sequence encoding the polypeptide of interest. Such control sequences may include, but are not limited to, a leader, a polyadenylation sequence, a propeptide sequence, a promoter, a signal sequence, and a transcription terminator. The control sequences may be provided with linkers, e.g., for the purpose of introducing specific restriction sites facilitating ligation of the control sequences with the coding region of the DNA sequence encoding a polypeptide.
Promoters
The control sequence may be an appropriate promoter sequence, a DNA sequence which is recognized by a host cell for expression of the DNA sequence encoding the polypeptide of interest. The promoter sequence contains transcription and translation control sequences that mediate the expression of the polypeptide of interest. The promoter may be any nucleic acid sequence which shows transcriptional activity in the host cell of choice including mutant, truncated, and hybrid promoters, and may be obtained from genes encoding extracellular or intracellular polypeptides either homologous or heterologous to the host cell.
Expression
The present invention also relates to DNA constructs comprising a nucleic acid sequence of the present invention operably linked to one or more control sequences which direct the expression of the coding sequence in a suitable host cell under conditions compatible with the control sequences. Expression will be understood to include any step involved in the production of the polypeptide of interest including, but not limited to, transcription, post-transcriptional modification, translation, post-translational modification, and secretion.
Filamentous Fungus Promoters
Examples of suitable promoters for directing the transcription of the DNA constructs of the present invention in a filamentous fungus host cell are promoters obtained from the genes encoding Aspergillus oryzae TAKA amylase (TAKA promoter) (EP 238 023 or US 5,766,912), Rhizomucor miehei aspartic proteinase, Aspergillus niger neutral alpha-amylase (NA2 promoter), NA2-TPI (a hybrid of the promoters from the genes encoding Aspergillus niger neutral alpha-amylase and Aspergillus nidulans triose phosphate isomerase), Aspergillus niger acid stable alpha-amylase, Aspergillus niger or Aspergillus awamori glucoamylase (glaA), Rhizomucor miehei lipase, Aspergillus oryzae triose phosphate isomerase or Aspergillus nidulans triose phosphate isomerase (TPI-promoter), Aspergillus oryzae alkaline protease, Aspergillus nidulans acetamidase, Aspergillus oryzae and Aspergillus niger translation elongation factor (TEF1), Fusarium oxysporum trypsin-like protease (US 4,288,627), Fusarium venenatum amyloglucosidase, Fusarium oxysporum trypsin-like protease (WO 96/00787), and mutant, truncated, and hybrid promoters thereof.
Signal peptide
The control sequence may also be a signal peptide-coding region, which codes for an amino acid sequence linked to the amino terminus of the polypeptide which can direct the expressed polypeptide of interest into the cell's secretory pathway. The 5' end of the coding sequence of the DNA sequence may inherently contain a signal peptide-coding region naturally linked in translation reading frame with the segment of the coding region that encodes the secreted polypeptide of interest. Alternatively, the 5' end of the coding sequence may contain a signal peptide-coding region which is foreign to that portion of the coding sequence that encodes the secreted protein. A foreign signal peptide-coding region may be required where the coding sequence does not normally contain a signal peptide-coding region. Alternatively, a foreign signal peptide-coding region may simply replace the natural signal peptide-coding region in order to obtain enhanced secretion of the protein(s) relative to the natural signal peptide-coding region normally associated with the coding sequence. The signal peptide-coding region may be obtained from a glucoamylase or an amylase gene from an Aspergillus species, a lipase or proteinase gene from a Rhizomucor species, the gene for the alpha-factor from Saccharomyces cerevisiae, an amylase or a protease gene from a Bacillus species, or the calf preprochymosin gene. However, any signal peptide-coding region capable of directing the expressed protein into the secretory pathway of a host cell of choice may be used in the present invention.
Fungal Signal Peptide Sequences
Preferred signal peptide coding regions for filamentous fungus host cells are the signal peptide coding regions obtained from the Aspergillus oryzae TAKA amylase gene (EP 238 023), the Aspergillus niger neutral amylase gene, Aspergillus niger glucoamylase, the Rhizomucor miehei aspartic proteinase gene, the Humicola lanuginosa cellulase gene, Humicola insolens cellulase, Humicola insolens cutinase, the Candida antarctica lipase B gene or the Rhizomucor miehei lipase gene, and mutant, truncated, and hybrid signal sequences thereof.
Transcription terminators
The control sequence may also be a suitable transcription terminator sequence, a sequence recognized by a host cell to terminate transcription. The terminator sequence is operably linked to the 3' terminus of the DNA sequence encoding the polypeptide of interest.
Fungal transcription terminators
The control sequence may be a suitable transcription terminator sequence, a sequence recognized by a host cell to terminate transcription. The terminator sequence is operably linked to the 3' terminus of the nucleic acid sequence encoding the polypeptide of interest. Any terminator which is functional in the host cell of choice may be used in the present invention. Preferred terminators for filamentous fungus host cells are obtained from the genes encoding Aspergillus niger neutral alpha-amylase, Aspergillus oryzae TAKA amylase, Aspergillus niger glucoamylase, Aspergillus nidulans anthranilate synthase, Aspergillus niger alpha-glucosidase, and Fusarium oxysporum trypsin-like protease.
Translational initiator sequence
The control sequence may be an appropriate translational initiator sequence. The term "translational initiator sequence" is defined herein as the ten nucleotides immediately upstream of the initiator or start codon of the open reading frame of a polypeptide-encoding nucleic acid sequence. The initiator codon, the so-called "start" codon, encodes the amino acid methionine. The initiator codon is typically an ATG, but may also be any functional start codon such as GTG.
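The definition above can be made concrete with a short Python sketch that returns the ten nucleotides immediately upstream of a start codon. It is an illustration only; the function name, the set of accepted start codons (limited here to the two named in the text), and the example sequence are assumptions.

```python
# Start codons named in the text; other functional start codons could be added.
START_CODONS = {"ATG", "GTG"}


def translational_initiator(sequence, start_index, length=10):
    """Return the `length` nucleotides immediately upstream of the start codon.

    `start_index` is the 0-based position of the first nucleotide of the
    start codon within `sequence`.
    """
    sequence = sequence.upper()
    if sequence[start_index:start_index + 3] not in START_CODONS:
        raise ValueError("no start codon at the given position")
    if start_index < length:
        raise ValueError("fewer than `length` nucleotides upstream of the start codon")
    return sequence[start_index - length:start_index]


# Hypothetical sequence; the start codon ATG begins at index 12.
example = "GGCACCATCACCATGGCT"
initiator = translational_initiator(example, 12)
```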
Leader sequences
The control sequence may also be a suitable leader sequence, a non-translated region of mRNA, which is important for translation by the host cell. The leader sequence is operably linked to the 5' terminus of the DNA sequence encoding the polypeptide of interest. Any leader sequence, which is functional in the host cell of choice, may be used in the present invention.
Fungus Leader Sequences
Preferred leaders for filamentous fungus host cells are obtained from the genes encoding Aspergillus oryzae TAKA amylase and Aspergillus oryzae triose phosphate isomerase (TPI) and combinations thereof.
Polyadenylation sequences
The control sequence may also be a polyadenylation sequence, a sequence operably linked to the 3' terminus of the nucleic acid sequence and which, when transcribed, is recognized by the host cell as a signal to add polyadenosine residues to transcribed mRNA. Any polyadenylation sequence which is functional in the host cell of choice may be used in the present invention. Preferred polyadenylation sequences for filamentous fungus host cells are obtained from the genes for Aspergillus oryzae TAKA amylase, Aspergillus niger glucoamylase, Aspergillus nidulans anthranilate synthase, Fusarium oxysporum trypsin-like protease, and Aspergillus niger alpha-glucosidase.
Fungus Polyadenylation Sequences
Preferred polyadenylation sequences for filamentous fungus host cells are obtained from the genes encoding Aspergillus oryzae TAKA amylase (EP 238 023), Aspergillus niger glucoamylase, Aspergillus nidulans anthranilate synthase, and Aspergillus niger alpha-glucosidase.
Propeptide sequences
The control sequence may also be a propeptide coding region, which codes for an amino acid sequence positioned at the amino terminus of a polypeptide of interest. The resultant polypeptide is known as a proenzyme or propolypeptide (or a zymogen in some cases). A propolypeptide is often inactive and can be converted to mature active polypeptide by catalytic or autocatalytic cleavage of the propeptide from the propolypeptide. The propeptide coding region may be obtained from the Bacillus subtilis alkaline protease gene (aprE), the Bacillus subtilis neutral protease gene (nprT), the Saccharomyces cerevisiae alpha-factor gene, the Candida antarctica lipase B gene, or the Myceliophthora thermophila laccase gene (WO 95/33836).
The DNA constructs of the present invention may also comprise one or more DNA sequences, which encode one or more factors that are advantageous in the expression of the polypeptide of interest, e.g., an activator (e.g., a trans-acting factor), a chaperone, and a processing protease. Any factor that is functional in the host cell of choice may be used in the present invention. The nucleic acids encoding one or more of these factors are not necessarily in tandem with the DNA sequence encoding the polypeptide of interest.
An activator is a protein, which activates transcription of a nucleic acid sequence encoding a polypeptide (Kudla et al., 1990, EMBO Journal 9:1355-1364; Jarai and Buxton, 1994, Current Genetics 26:238-244; Verdier, 1990, Yeast 6:271-297). The nucleic acid sequence encoding an activator may be obtained from the genes encoding Bacillus stearothermophilus NprA (nprA), Saccharomyces cerevisiae heme activator protein 1 (hap1), Saccharomyces cerevisiae galactose metabolizing protein 4 (gal4), Aspergillus nidulans ammonia regulation protein (areA), and A. oryzae (amyR). For further examples, see Verdier, 1990, Yeast 6:271-297 and MacKenzie et al., 1993, Journal of General Microbiology 139:2295-2307.
A chaperone is a protein, which assists another polypeptide in folding properly (Hartl et al., 1994, TIBS 19:20-25; Bergeron et al., 1994, TIBS 19:124-128; Demolder et al., 1994, Journal of Biotechnology 32:179-189; Craig, 1993, Science 260:1902-1903; Gething and Sambrook, 1992, Nature 355:33-45; Puig and Gilbert, 1994, Journal of Biological Chemistry 269:7764-7771; Wang and Tsou, 1993, The FASEB Journal 7:1515-1517; Robinson et al., 1994, Bio/Technology 12:381-384). The nucleic acid sequence encoding a chaperone may be obtained from the genes encoding Bacillus subtilis GroE proteins, Aspergillus oryzae protein disulphide isomerase, Saccharomyces cerevisiae calnexin, Saccharomyces cerevisiae BiP/GRP78, and Saccharomyces cerevisiae Hsp70. For further examples, see Gething and Sambrook, 1992, Nature 355:33-45, and Hartl et al., 1994, TIBS 19:20-25.
A processing protease is a protease that cleaves a propeptide to generate a mature biochemically active polypeptide (Enderlin and Ogrydziak, 1994, Yeast 10:67-79; Fuller et al., 1989, Proceedings of the National Academy of Sciences USA 86:1434-1438; Julius et al., 1984, Cell 37:1075-1089; Julius et al., 1983, Cell 32:839-852). The nucleic acid sequence encoding a processing protease may be obtained from the genes encoding Aspergillus niger Kex2, Saccharomyces cerevisiae dipeptidylaminopeptidase, Saccharomyces cerevisiae Kex2, Yarrowia lipolytica dibasic processing endoprotease (xpr6), tripeptidyl aminopeptidase (TPAP) (WO 96/14404), and the A. oryzae dipeptidyl aminopeptidase.
Regulatory sequences
It may also be desirable to add regulatory sequences, which allow the regulation of the expression of the polypeptide of interest relative to the growth of the host cell. Examples of regulatory systems are those which cause the expression of the gene to be turned on or off in response to a chemical or physical stimulus, including the presence of a regulatory compound. In filamentous fungi, the TAKA alpha-amylase promoter, Aspergillus niger glucoamylase promoter, and the Aspergillus oryzae glucoamylase promoter may be used as regulatory sequences. Other examples of regulatory sequences are those, which allow for gene amplification. In eukaryotic systems, these include the dihydrofolate reductase gene, which is amplified in the presence of methotrexate, and the metallothionein genes, which are amplified with heavy metals. In these cases, the DNA sequence encoding the polypeptide of interest would be placed in tandem with the regulatory sequence.
Expression vectors
The present invention also relates to recombinant expression vectors which may comprise a DNA sequence of the present invention, a promoter, a signal peptide sequence, and transcriptional and translational stop signals. The various nucleic acid and control sequences described above may be joined together to produce a recombinant expression vector which may include one or more convenient restriction sites to allow for insertion or substitution of the nucleic acid sequence encoding the polypeptide at such sites. Alternatively, the nucleic acid sequence of the present invention may be expressed by inserting the nucleic acid sequence or a DNA construct comprising the sequence into an appropriate vector for expression. In creating the expression vector, the coding sequence is located in the vector so that the coding sequence is operably linked with the appropriate control sequences for expression, and possibly secretion. The recombinant expression vector may be any vector (e.g., a plasmid or virus), which can be conveniently subjected to recombinant DNA procedures and can bring about the expression of the DNA sequence. The choice of the vector will typically depend on the compatibility of the vector with the host cell into which the vector is to be introduced. The vectors may be linear or closed circular plasmids. The vector may be an autonomously replicating vector, i.e., a vector which exists as an extrachromosomal entity, the replication of which is independent of chromosomal replication, e.g., a plasmid, an extrachromosomal element, a minichromosome, a cosmid or an artificial chromosome. The vector may contain
any means for assuring self-replication. Alternatively, the vector may be one which, when introduced into the host cell, is integrated into the genome and replicated together with the chromosome(s) into which it has been integrated. The vector system may be a single vector or plasmid or two or more vectors or plasmids which together contain the total DNA to be introduced into the genome of the host cell, or a transposon.
Markers
The vectors of the present invention preferably contain one or more selectable markers, which permit easy selection of transformed cells. A selectable marker is a gene the product of which provides for biocide or viral resistance, resistance to heavy metals, prototrophy to auxotrophs, and the like.
Examples of selectable markers for use in a filamentous fungus host cell may be selected from the group including, but not limited to, amdS (acetamidase), argB (ornithine carbamoyltransferase), bar (phosphinothricin acetyltransferase), hygB (hygromycin phosphotransferase), niaD (nitrate reductase), pyrG (orotidine-5'-phosphate decarboxylase), sC (sulfate adenyltransferase), trpC (anthranilate synthase), and glufosinate resistance markers, as well as equivalents from other species. Preferred for use in an Aspergillus cell are the amdS and pyrG markers of Aspergillus nidulans or Aspergillus oryzae and the bar marker of Streptomyces hygroscopicus. Furthermore, selection may be accomplished by co-transformation, e.g., as described in WO 91/17243, where the selectable marker is on a separate vector.
The vectors of the present invention preferably contain an element(s) that permits stable integration of the vector into the host cell genome or autonomous replication of the vector in the cell independent of the genome of the cell. The vectors of the present invention may be integrated into the host cell genome when introduced into a host cell. For integration, the vector may rely on the DNA sequence encoding the polypeptide of interest or any other element of the vector for stable integration of the vector into the genome by homologous or non-homologous recombination. Alternatively, the vector may contain additional DNA sequences for directing integration by homologous recombination into the genome of the host cell. The additional DNA sequences enable the vector to be integrated into the host cell genome at a precise location(s) in the chromosome(s). To increase the likelihood of integration at a precise location, the integrational elements should preferably contain a sufficient number of nucleic acids, such as 100 to 1,500 base pairs, preferably 400 to 1,500 base pairs, and most preferably 800 to 1,500 base pairs, which are highly homologous with the corresponding target sequence to enhance the probability of homologous recombination. The integrational elements may be any sequence that is homologous with the target sequence in the genome of the host cell.
Furthermore, the integrational elements may be non-encoding or encoding nucleic acid sequences. On the other hand, the vector may be integrated into the genome of the host cell by non-homologous recombination. These nucleic acid sequences may be any sequence that is homologous with a target sequence in the genome of the host cell, and, furthermore, may be non-encoding or encoding sequences.
For autonomous replication, the vector may further comprise an origin of replication enabling the vector to replicate autonomously in the host cell in question.
The episomally replicating AMA1 plasmid vector disclosed in WO 00/24883 may be used. More than one copy of a DNA sequence encoding a polypeptide of interest may be inserted into the host cell to amplify expression of the DNA sequence. Stable amplification of the DNA sequence can be obtained by integrating at least one additional copy of the sequence into the host cell genome using methods well known in the art and selecting for transformants. The procedures used to ligate the elements described above to construct the recombinant expression vectors of the present invention are well known to one skilled in the art (see, e.g., Sambrook et al., 1989, Molecular Cloning, A Laboratory Manual, 2nd edition, Cold Spring Harbor, New York).
Source and type of the plant polypeptide of interest
The nucleic acid sequence encoding a polypeptide of interest may be obtained from any plant source and preferably from any genus within the Magnoliophyta, or preferably within the monocots, and more preferably within the dicots. Even more preferably it is obtained from any genus within the Fabaceae, any genus within the Papilionoideae, or any genus within the Phaseoleae, and in a most preferred embodiment from the genus Glycine, such as from the species G. max (L.) Merr. (syn. G. soja Sieb. & Zucc.). For purposes of the present invention, the term "obtained from" as used herein in connection with a given source shall mean that the polypeptide of interest is produced by the source or by a cell in which a gene obtained from the source has been inserted. The terms "a polypeptide obtained from a plant" and "a plant polypeptide" are used as equivalents and include the native polypeptides as well as variants produced by site-directed mutagenesis, random mutagenesis, shuffling, or other methods known in the art.
Preferably, the polypeptide of interest is an enzyme, a receptor or a portion thereof, an antibody or a portion thereof, or a reporter. In a more preferred embodiment, the polypeptide of interest is an industrial enzyme such as alpha-amylase, alpha-galactosidase, alpha-glucosidase, aminopeptidase, beta-amylase, beta-galactosidase, beta-glucosidase, carbohydrase, carbonyl hydrolase, carboxypeptidase, catalase, cellulase, chitinase, cutinase, cyclodextrin glycosyltransferase, deoxyribonuclease, esterase, galactanase, glucoamylase, glucose oxidase, hydrolase, invertase, isomerase, laccase, ligase, lipoxygenase, lyase, maltogenic alpha-amylase, mannosidase, mutanase, oxidase, oxidoreductase, pectinolytic enzyme, peptidase, peroxidase, phytase, polyphenoloxidase, protease, ribonuclease, transferase, transglutaminase, or xylanase.
In another preferred embodiment the polypeptide of interest is an enzyme having protease and/or peptidase activity, such as an enzyme having protease D3-beta activity, more preferably an enzyme obtained from soybean, and most preferably the polypeptide of interest is encoded by a DNA sequence at least 80% identical, at least 90% identical, preferably at least 93% identical, more preferably at least 95% identical, even more preferably at least 97% identical, and most preferably 100% identical to the DNA sequence in SEQ ID NO:2.
Determination of codon usage
Information on codon usage for a large number of species may be found in the "Codon Usage Database" at the web address http://www.kazusa.or.jp/codon/, which is the source of the data for codon usage of Aspergillus oryzae shown in table 1. These data for A. oryzae are based on 112 coding sequences with a total of 56051 codons.
In the context of this invention the term "codon usage" is defined as the frequency at which the different codons encoding the same amino acids of the native polypeptides of a host cell are used. Codon usage is based on the fact that several codons, i.e. triplets of nucleic acid residues, may encode the same amino acid, but that the frequency of the different codons encoding said amino acid may vary between organisms. Once the nucleotide sequence of one or more genes encoding a polypeptide in the host cell to be transformed is known, the frequencies of the various codons can be calculated. If only one or a few nucleotide sequences are used for calculation of the codon usage, each nucleotide sequence should preferably encode a polypeptide which is expressed in a high amount in the host cell to be transformed. The differences in codon usage between remotely related organisms, such as between a plant and a fungus, may be considerable, whereas the difference in codon usage will be less significant from species to species within the same genus, and even between different genera of filamentous fungi. Thus a DNA sequence encoding a plant polypeptide, which is modified according to the codon usage of an Aspergillus oryzae host cell, may be well expressed in an Aspergillus niger host cell and may even be well expressed in a Fusarium host cell, providing a higher yield of polypeptide than would the expression of the native DNA sequence obtained from a plant. However, the codon usage should preferably be determined for the genus of the host cell to be transformed.
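By way of illustration only, the calculation of codon usage from known host coding sequences may be sketched as follows. The sketch assumes that the coding sequences are supplied in frame as plain DNA strings and that the frequency of a codon is expressed relative to all codons encoding the same amino acid. The function name codon_usage and the toy input sequence are hypothetical and are not taken from the disclosure.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, built in TCAG order ("*" marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codon_usage(coding_sequences):
    """Return, for each observed codon, its frequency among the codons
    that encode the same amino acid (assumed definition of codon usage)."""
    counts = Counter()
    for seq in coding_sequences:
        seq = seq.upper().replace("U", "T")
        # Read complete in-frame triplets only; ignore a trailing partial codon.
        for i in range(0, len(seq) - len(seq) % 3, 3):
            codon = seq[i:i + 3]
            if codon in CODON_TABLE:
                counts[codon] += 1
    per_amino_acid = defaultdict(int)
    for codon, n in counts.items():
        per_amino_acid[CODON_TABLE[codon]] += n
    return {c: n / per_amino_acid[CODON_TABLE[c]] for c, n in counts.items()}


# Hypothetical toy input: Met-Ala-Ala-Ala-stop.
usage = codon_usage(["ATGGCCGCCGCTTAA"])
print(usage["GCC"], usage["GCT"])
```

In practice the input would be a set of highly expressed host genes, in line with the preference stated above for sequences encoding polypeptides expressed in a high amount.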
Codon-optimizing
The term "codon-optimized" as used herein refers to a nucleic acid sequence which has been modified wholly or partly according to the codon usage of the host cell to be transformed. When the amino acid sequence of the polypeptide of interest has been determined, a codon-optimized gene for expression in the host cell to be transformed can be synthesized in which one or more of the native first codons has been exchanged with a second codon encoding the same amino acid, the second codon having a higher frequency in the codon usage of the host cell to be transformed. A DNA sequence encoding a polypeptide of interest is considered to be codon-optimized when at least one native first codon, or at least 1%, 5%, 10%, 25%, 50%, 75%, 80%, 85%, 90%, or preferably at least 95% of the native first codons have been exchanged with a second codon, the second codon encoding the same amino acid as the first codon and having a higher frequency in the codon usage of the host cell to be transformed than the first codon.
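The exchange of first codons with more frequent second codons may, purely as an illustrative sketch, be expressed as follows. The function names codon_optimize and translate, the example sequence and the usage values are hypothetical assumptions; the usage values are not taken from table 1. The sketch exchanges a codon only when a synonymous codon has a strictly higher frequency, and reports the fraction of codons exchanged so that it can be compared with the percentages given above.

```python
from itertools import product

# Standard genetic code, built in TCAG order ("*" marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def split_codons(dna):
    dna = dna.upper()
    return [dna[i:i + 3] for i in range(0, len(dna) - len(dna) % 3, 3)]


def translate(dna):
    return "".join(CODON_TABLE[c] for c in split_codons(dna))


def codon_optimize(dna, usage):
    """Exchange each codon with the most frequent synonymous codon of the
    host, if that codon is more frequent than the native one.
    Returns (optimized sequence, fraction of codons exchanged)."""
    codons = split_codons(dna)
    if not codons:
        return "", 0.0
    out, exchanged = [], 0
    for codon in codons:
        synonyms = [c for c, aa in CODON_TABLE.items() if aa == CODON_TABLE[codon]]
        best = max(synonyms, key=lambda c: usage.get(c, 0.0))
        if usage.get(best, 0.0) > usage.get(codon, 0.0):
            out.append(best)
            exchanged += 1
        else:
            out.append(codon)
    return "".join(out), exchanged / len(codons)


# Hypothetical host usage values, for illustration only.
HOST_USAGE = {"ATG": 1.0, "GCC": 0.40, "GCT": 0.25, "CGC": 0.30,
              "AGA": 0.08, "CTC": 0.25, "TTA": 0.05, "TAA": 0.40}

native = "ATGGCTAGATTATAA"  # Met-Ala-Arg-Leu-stop
optimized, fraction = codon_optimize(native, HOST_USAGE)
print(optimized, fraction)
```

Because only synonymous codons are exchanged, translating the native and the optimized sequence gives the same amino acid sequence, which is the condition the definition above relies on.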
Codon usage of Aspergillus
The codon usage of Aspergillus oryzae is shown in table 1. The frequency of the codons used in the native protease D3-beta gene and in the protease D3-beta codon-optimized for Aspergillus is presented in table 2. More specifically the most frequently used, and thus the most preferred, codon for Alanine is GCC; for Arginine it is CGC; for Asparagine it is AAC; for Aspartic acid it is GAC; for Cysteine it is UGC; for Glutamine it is CAG; for Glutamic acid it is GAG; for Glycine it is GGC; for Histidine it is CAC; for Isoleucine it is AUC; for Leucine it is CUC; for Lysine it is AAG; for Methionine it is AUG; for Phenylalanine it is UUC; for Proline it is CCC; for Serine it is UCC; for Threonine it is ACC; for Tryptophan it is UGG; for Tyrosine it is UAC; and finally for Valine it is GUC.
Also considered within the scope of the invention is the embodiment wherein the host cell is an Aspergillus oryzae or an Aspergillus niger host cell and the modification of the DNA sequence encoding the polypeptide of interest includes one or more of the following substitutions: substituting an AGA, AGG, CGA, or CGG codon encoding Arginine with a CGC or a CGU codon, or substituting a GGG codon encoding Glycine with a GGC, GGA or a GGU codon, or substituting a CAU codon encoding Histidine with a CAC codon, or substituting an AUA codon encoding Isoleucine with an AUC or AUU codon, or substituting a CUA or a UUA codon encoding Leucine with a CUC, CUU, UUG or a CUG codon, or substituting a UUU codon encoding Phenylalanine with a UUC codon, or substituting an AGU or a UCA codon encoding Serine with a UCC, UCG, UCU or an AGC codon, or substituting a GUA codon encoding Valine with a GUC, GUG or a GUU codon.
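The substitutions listed above can be written down as a lookup table, for example as in the following sketch. The codons and their permitted replacements are those recited above, in the RNA alphabet; the names SUBSTITUTIONS and substitute_rare_codons, the example sequence, and the choice of always taking the first listed replacement (the most preferred codon) are assumptions made for illustration.

```python
from itertools import product

# Standard genetic code in the RNA alphabet, built in UCAG order.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Codon to be replaced -> permitted replacements, most preferred codon first.
ARG = ["CGC", "CGU"]
LEU = ["CUC", "CUU", "UUG", "CUG"]
SER = ["UCC", "UCG", "UCU", "AGC"]
SUBSTITUTIONS = {
    "AGA": ARG, "AGG": ARG, "CGA": ARG, "CGG": ARG,
    "GGG": ["GGC", "GGA", "GGU"],
    "CAU": ["CAC"],
    "AUA": ["AUC", "AUU"],
    "CUA": LEU, "UUA": LEU,
    "UUU": ["UUC"],
    "AGU": SER, "UCA": SER,
    "GUA": ["GUC", "GUG", "GUU"],
}


def substitutions_are_synonymous():
    """Check that every listed replacement encodes the same amino acid."""
    return all(CODON_TABLE[old] == CODON_TABLE[new]
               for old, news in SUBSTITUTIONS.items() for new in news)


def substitute_rare_codons(mrna):
    """Replace each listed codon with its first (most preferred) replacement."""
    mrna = mrna.upper().replace("T", "U")
    codons = [mrna[i:i + 3] for i in range(0, len(mrna) - len(mrna) % 3, 3)]
    return "".join(SUBSTITUTIONS.get(c, [c])[0] for c in codons)


print(substitutions_are_synonymous())
print(substitute_rare_codons("AUGAGAGGGUUUUAA"))  # Met-Arg-Gly-Phe-stop
```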
Cultivating the host cell
The present invention also relates to methods for producing a polypeptide of interest comprising cultivating a host cell under conditions suitable for production of the polypeptide of interest. According to the production methods of the present invention, the cells are cultivated in a nutrient medium suitable for production of the polypeptide of interest using methods known in the art. For example, the cell may be cultivated by shake flask cultivation, small-scale or large-scale fermentation (including continuous, batch, fed-batch, or solid state fermentations) in laboratory or industrial fermentors performed in a suitable medium and under conditions allowing the polypeptide of interest to be expressed and/or isolated. The cultivation may take place in a suitable nutrient medium comprising carbon and nitrogen sources and inorganic salts, using procedures known in the art. Suitable media are available from commercial suppliers or may be prepared according to published compositions (e.g., in catalogues of the American Type Culture Collection). If the polypeptide of interest is secreted into the nutrient medium, the polypeptide of interest can be recovered directly from the medium. If the polypeptide of interest is not secreted, it can be recovered from cell lysates. Preferably the polypeptide is secreted by the host cell into the culture medium.
Detection, recovery and purification of the polypeptide of interest
The polypeptide in question may be detected using methods known in the art that are specific for the polypeptide. These detection methods may include use of specific antibodies, formation of an enzyme product, or disappearance of an enzyme substrate. For example, an enzyme assay may be used to determine the activity of the polypeptide of interest. The resulting polypeptide of interest may be recovered by methods known in the art.
For example, the polypeptide of interest may be recovered from the nutrient medium by conventional procedures including, but not limited to, centrifugation, filtration, extraction, spray-drying, evaporation, or precipitation.
The polypeptide of interest may be purified by a variety of procedures known in the art including, but not limited to, chromatography (e.g., ion exchange, affinity, hydrophobic, chromatofocusing, and size exclusion), electrophoretic procedures (e.g., preparative isoelectric focusing), differential solubility (e.g., ammonium sulfate precipitation), SDS-PAGE, or extraction (see, e.g., Protein Purification, J.-C. Janson and L. Ryden, editors, VCH Publishers, New York, 1989).
Determination of sequence identity
The sequence identity referred to in this disclosure is understood as the degree of identity between two sequences indicating a derivation of the first sequence from the second. The sequence identity (i.e. the homology) may suitably be determined by means of computer programs known in the art such as GAP provided in the GCG program package (Program Manual for the Wisconsin Package, Version 8, August 1994, Genetics Computer Group, 575 Science Drive, Madison, Wisconsin, USA 53711; Needleman, S.B. and Wunsch, C.D. (1970), Journal of Molecular Biology, 48, 443-453). The following settings for sequence comparison are used: GAP creation penalty of 3.0 and GAP extension penalty of 0.1.
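As a simplified illustration, the degree of identity can be computed from two sequences that have already been aligned, for instance by GAP with the settings stated above. The following sketch does not reimplement GAP or the Needleman and Wunsch algorithm; it only counts identical positions in a given alignment. The function name percent_identity, the convention of ignoring columns that contain a gap, and the example alignment are assumptions.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of identical residues over the aligned, gap-free columns
    of two sequences of equal aligned length (assumed convention)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # columns containing a gap are not counted
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Hypothetical alignment: nine gap-free columns, eight of them identical.
print(round(percent_identity("ATGGCC-TAA", "ATGGCTATAA"), 1))
```

Other conventions for the denominator exist, for example the length of the shorter sequence, and would give slightly different percentages.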
Embodiments
As mentioned above the invention provides in a first, a second, and a third aspect methods of producing a polypeptide of interest obtained from a plant in a filamentous fungus host cell, and in further aspects to a DNA sequence, a DNA construct comprising the DNA sequence, a recombinant expression vector comprising the DNA construct and a host cell. In still further aspects the invention provides a use of the DNA sequence, a use of the DNA construct, and a use of the host cell in a method for producing a plant polypeptide of interest in improved yields. In one embodiment the DNA sequence encoding the amino acid sequence of the plant polypeptide of interest has had at least one codon, at least 1% of the codons native to the plant source, or at least 5%, 10%, 25%, 50%, 75%, 80%, 85%, 90%, 95% or preferably at least 99% of the codons native to the plant source and encoding an amino acid substituted with a second codon encoding the same amino acid, said second codon having a higher frequency than the native codon in the codon usage of the intended host cell.
In another embodiment the DNA construct of the invention comprises a TAKA/TPI promoter sequence, a TAKA signal sequence and a codon-optimized sequence encoding the polypeptide of interest.
In another embodiment the modification of the DNA sequence results in an increase by at least 1%, 5%, 10%, 25%, 50%, 100%, 200%, 300%, 400%, or preferably at least 500% of the expression level of the polypeptide of interest by the host cell, compared to the expression level of the polypeptide of interest by the host cell comprising the native DNA sequence encoding the polypeptide of interest.
In another preferred embodiment the modification of the DNA sequence results in a production of 0.1 g per liter, 0.2 g, 0.3 g, 0.4 g or more preferably 0.5 g per liter of enzyme protein of the polypeptide of interest by the host cell, compared to the expression level of the polypeptide of interest by the host cell comprising the native DNA sequence encoding the polypeptide of interest.
The present invention is further described by the following examples which should not be construed as limiting the scope of the invention.
EXAMPLES
Methods
Molecular cloning techniques are described in Sambrook, J., Fritsch, E.F., Maniatis, T. (1989) Molecular Cloning: A Laboratory Manual, 2nd edition. Cold Spring Harbor Laboratory, Cold Spring Harbor, New York.
Media and reagents
Chemicals used for buffers and substrates were commercial products of analytical grade.
AMG trace-element solution was composed of 2.5 g of CuSO4-5H2O, 6.8 g of ZnCl2, 0.24 g of NiCl2-6H2O, 13.9 g of FeSO4-7H2O, 13.6 g of MnSO4-5H2O, and 3.0 g of citric acid monohydrate (Wako No. 035-03495), water to 1 litre.
AnigsfR was composed of 45 g of glucose, 2 g of KH2PO4, 2 g of MgSO4-7H2O, 2 g of K2SO4, 2 g of citric acid monohydrate (Wako No. 035-03495), 3 g of yeast extract, 0.5 ml of AMG trace-element solution and 40 g of sodium succinate (pH 6.0), water to 1 litre. Before use 3 ml of 20% urea was added.
Cove plates were composed of 342.3 g of sucrose, 20 ml of Cove salt solution, 10 mM of acetamide, and 30 g of noble agar, water to 1 litre.
Cove salt solution was composed of 26 g of KCl, 26 g of MgSO4-7H2O, 76 g of KH2PO4 and 50 ml of Cove trace-element J solution, water to 1 litre.
Cove top agarose was composed of 342.3 g of sucrose, 20 ml of Cove salt solution, 10 mM of acetamide, and 10 g of low melt agarose, water to 1 litre.
Cove trace-element J solution was composed of 0.04 g of Na2B4O7-10H2O, 0.4 g of CuSO4-5H2O, 1.2 g of FeSO4-7H2O, 1.0 g of MnSO4-H2O, 0.8 g of Na2MoO4-2H2O, and 10.0 g of ZnSO4-7H2O, water to 1 litre.
Cove-2 plates were composed of 30 g of sucrose, 20 ml of Cove salt solution, 10 mM of acetamide, and 30 g of noble agar, water to 1 litre.
1/5 MDU-2Bp was composed of 9 g of maltose-H2O, 1.4 g of yeast extract, 2.4 g of KH2PO4, 0.2 g of MgSO4-7H2O, 0.4 g of K2SO4, 0.2 g of NaCl and 0.1 ml of AMG trace-element solution, water to 1 litre (pH 5.0).
MLC was composed of 50 g of soybean powder, 40 g of glucose, and 4 g of citric acid monohydrate (Wako No. 035-03495), water to 1 litre (pH 5.0).
MS-9 was composed of 30 g of soybean powder, 20 g of glycerol, water to 1 litre (pH 6.0).
STC buffer was composed of 0.8 M of sorbitol, 25 mM of Tris (pH 8), and 25 mM of CaCl2, water to 1 litre.
STPC buffer was composed of 40% PEG4000 in STC buffer.
YPD medium was composed of 20 g of peptone (Difco), 10 g of yeast extract (Difco) and 20 g of glucose, water to 1 litre.
YPG medium was composed of 4 g of yeast extract, 1 g of KH2PO4, 0.5 g of MgSO4-7H2O and 15 g of glucose, water to 1 litre (pH 6.0).
Enzymes, E. coli, plasmids and kits
Enzymes for DNA manipulations (e.g. restriction endonucleases, ligases etc.) were obtained from New England Biolabs, Inc. and used according to the manufacturer's instructions.
E. coli JDH5-alpha (Toyobo) was used for plasmid construction and amplification. The commercial plasmids/vectors pT7Blue (Invitrogen, Netherlands) and pUC19 (Genbank Accession No. X02514) were used for cloning of PCR fragments and for expression vector construction.
Amplified plasmids were recovered with Qiagen® Plasmid Kit (Qiagen). Ligation was performed with a DNA ligation kit (Takara) or T4 DNA ligase (Boehringer Mannheim). Polymerase Chain Reaction (PCR) was carried out either with rTth DNA polymerase (Applied Biosystems) or with Expand™ PCR system (Boehringer Mannheim). PCR was performed using the buffer provided with the applied DNA polymerase. QIAquick™ Gel Extraction Kit (Qiagen) was used for purification of PCR fragments and DNA fragments excised from agarose gels.
Expression host strain
The Aspergillus niger expression host was a strain genetically modified to disrupt expression of glucoamylase, acid-stable alpha-amylase and alpha-1,6 transglucosidase activities.
The Aspergillus oryzae expression host strain was the strain BECh2.
Expression plasmid
The expression cassette plasmid pHD464 comprised the A. oryzae TAKA promoter, the Aspergillus nidulans TPI leader sequences, and the Aspergillus niger glucoamylase terminator. The plasmid pHD464 was a derivative of pHD414 (disclosed in WO 93/11249) in which the last 65 bp of the TAKA promoter just upstream of the start codon ATG had been replaced with 71 bp of the A. nidulans TPI promoter. The sequence of the A. nidulans TPI promoter is available, e.g. at EMBL as D10019. The resulting sequence was termed the TAKA/TPI promoter sequence.
The plasmid pToC186 was a derivative of pToC90 wherein the amdS promoter has been modified with two mutations that increase the transcription of the amdS gene: amdI66, which is a duplication of a 17 bp stretch (Katz M. E. et al. (1990) Mol. Gen. Genet. 220: 373-376), and amdI9, which is a C to T point mutation (Todd R. B. (1998) EMBO 17, 2042-2054).
Transformation of Aspergillus sp.
The host strain was propagated in 100 ml of non-selective YPG medium at 32°C for 16 hrs on a rotary shaker at 120 rpm. Cells were collected by filtering, washed with 0.6 M KCl and resuspended in 20 ml of 0.6 M KCl containing a commercial beta-glucanase product (Glucanex, Novozymes A/S) at 600 microL/ml. The suspension was incubated at 32°C at 80 rpm until protoplasts were formed, then washed twice with STC buffer. The protoplasts were counted with a hemocytometer and adjusted in an 8:2:0.1 solution of STC:STPC:DMSO to a final concentration of 2.5x10^7 protoplasts/ml. About 3 microgram of DNA was added to 100 microL of protoplast suspension, mixed gently and incubated on ice for 20 min. One ml of STPC was added and the protoplast suspension was incubated for 30 min at 37°C. After the addition of 10 ml of 50°C Cove top agarose, the reaction was poured onto Cove agar plates and the plates were incubated at 32°C for 5 days. As untransformed cells cannot grow or grow very slowly on Cove medium, transformants were easily selected.
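The arithmetic behind the protoplast adjustment step can be sketched as follows, for illustration only. The ratio 8:2:0.1 and the target concentration of 2.5x10^7 protoplasts/ml are taken from the procedure above; the function names, the total volume of 10.1 ml and the protoplast count of 5x10^8 are hypothetical.

```python
def mix_volumes(total_ml, ratio=(8.0, 2.0, 0.1)):
    """Volumes of STC, STPC and DMSO (ml) that give total_ml of the
    8:2:0.1 STC:STPC:DMSO solution."""
    parts = sum(ratio)
    return tuple(total_ml * r / parts for r in ratio)


def resuspension_volume_ml(total_protoplasts, target_per_ml=2.5e7):
    """Volume in which a counted number of protoplasts must be suspended
    to reach the target concentration."""
    return total_protoplasts / target_per_ml


def protoplasts_per_transformation(volume_microl=100.0, per_ml=2.5e7):
    """Number of protoplasts contained in one transformation aliquot."""
    return per_ml * volume_microl / 1000.0


print(mix_volumes(10.1))               # hypothetical total volume
print(resuspension_volume_ml(5e8))     # hypothetical protoplast count
print(protoplasts_per_transformation())
```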
Assay of soy protease D3-beta activity
The assay method for protease D3-beta was modified after A. J. Barrett & H. Kirschke (1981), Methods Enzymol. 80:535. Reaction mixture contained 10 micromole Z-Phe-Arg-MCA (Carbobenzoyl-L-Phenylalanyl-L-Arginine 4-Methyl-Coumaryl-7-Amide, purchased from Peptide Institute, Inc.), 50 mM potassium phosphate (pH 6.5), 0.2 M NaCl, and 1 mM 2-mercaptoethanol. The enzyme reaction was performed at room temperature. Activity was determined by measuring the intensity of fluorescence from released AMC (7-Amino-4-Methyl-Coumarin, Standard AMC purchased from Peptide Institute, Inc.). Protease D3-beta cuts the peptide bond between Arg and MCA of Z-Phe-Arg-MCA and releases AMC. The wavelength of excitation and emission of AMC was 370 nm and 460 nm, respectively. One unit of activity was defined as the enzyme activity that can catalyze the release of 1 micromole of AMC in one minute (1 micromole of AMC/min).
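One possible way of converting a fluorescence reading into units of activity as defined above is sketched below. The sketch assumes a linear AMC standard curve passing through the origin and a linear reaction time course; the function names and all numerical values are hypothetical and serve only to show the arithmetic.

```python
def fit_slope_through_origin(amc_micromole, fluorescence):
    """Least-squares slope of fluorescence versus amount of standard AMC,
    for a line forced through the origin (assumed standard curve)."""
    numerator = sum(a * f for a, f in zip(amc_micromole, fluorescence))
    denominator = sum(a * a for a in amc_micromole)
    return numerator / denominator


def activity_units(delta_fluorescence, minutes, slope):
    """Units of activity: micromole of AMC released per minute."""
    released_micromole = delta_fluorescence / slope
    return released_micromole / minutes


# Hypothetical standard curve: 1000 fluorescence units per micromole of AMC.
slope = fit_slope_through_origin([0.0, 0.5, 1.0, 2.0], [0.0, 500.0, 1000.0, 2000.0])

# Hypothetical sample: fluorescence rises by 1500 units in 10 minutes.
print(activity_units(1500.0, 10.0, slope))
```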
Example 1; Plasmid construction for the expression of protease D3-beta in Aspergillus sp.
Synthesis of codon-optimized soy protease D3-beta gene
The DNA sequence of the native protease D3-beta gene was disclosed in JP1997121870 as SEQ ID NO:2 and may also be retrieved from EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl/index.html) as accession number E13052. It encodes a prepeptide, a propeptide, a mature peptide and a telopeptide sequence giving the amino acid sequence SEQ ID NO:1. For expression in Aspergillus sp. the original D3-beta prepeptide sequence was substituted with the TAKA signal peptide sequence. The sequence encoding the telopeptide was omitted as it was supposed to have no significant influence on the enzyme activity. Thus of the native protease D3-beta polypeptide the synthetic gene encoded the amino acid sequence from Tyr26 to Ser379, which fragment encompassed the propeptide and the mature peptide, here termed proD3-beta.
The DNA sequence of the complete synthetic sequence comprising the TAKA signal peptide sequence, the sequence encoding the propeptide and the mature peptide of protease D3-beta, the two latter with codon usage adjusted to be optimal for Aspergillus sp., is shown as SEQ ID NO:2. The codon usage of Aspergillus sp. is shown in table 1 while the codon usage in the native protease D3-beta gene and the synthesized gene is shown in table 2.
The codon-optimized proD3-beta gene was synthesized using the 46 oligonucleotides shown below and in SEQ ID NO:3 to SEQ ID NO:48. The oligonucleotides were synthesized with a thermocycler DNA Engine PTC-200 (MJ Research Inc) and assembled by PCR as follows.
SEQ ID NO:3 <prika1->;DNA;> atgaattcatgaagctgctgtccctgaccggcg
SEQ ID NO:4 <prika2->;DNA;> tcgccggcgtcctggccacctgcgtcgccgccacccccctggtcaagcgc
SEQ ID NO:5 <d31-50;DNA;> tacgactccgcccacgccgacaaggccgccaccctgcgcaccgaggagga
SEQ ID NO:6 <d351-100;DNA;> gctgatgtccatgtacgagcagtggctggtcaagcacggcaaggtctaca
SEQ ID NO:7 <d3101-150;DNA;> acgccctgggcgagaaggagaagcgcttccagatcttcaaggacaacctg
SEQ ID NO:8 <d3150-200;DNA;> cgcttcatcgacgaccacaactccgccgaggaccgcacctacaagctggg
SEQ ID NO:9 <d3201-250;DNA;> cctgaaccgcttcgccgacctgaccaacgaggagtaccgcgccaagtacc
SEQ ID NO:10 <d3251-300;DNA;> tgggcaccaagatcgaccccaaccgccgcctgggcaagaccccctccaac
SEQ ID NO:11 <d3301-350;DNA;> cgctacgccccccgcgtcggcgacaagctgcccgactccgtcgactggcg
SEQ ID NO:12 <d3351-400;DNA;> caaggagggcgccgtcccccccgtcaaggaccagggcggctgcggctcct
SEQ ID NO:13 <d3401-450;DNA;> gctgggccttctccgccatcggcgccgtcgagggcatcaacaagatcgtc
SEQ ID NO:14 <d3451-500;DNA;> accggcgagctgatctccctgtccgagcaggagctggtcgactgcgacac
SEQ ID NO:15 <d3501-550;DNA;> cggctacaaccagggctgcaacggcggcctgatggactacgccttcgagt
SEQ ID NO:16 <d3551-600;DNA;> tcatcatcaacaacggcggcatcgactccgacgaggactacccctaccgc
SEQ ID NO:17 <d3601-650;DNA;> ggcgtcgacggccgctgcgacacctaccgcaagaacgccaaggtcgtctc
SEQ ID NO:18 <d3651-700;DNA;> catcgacgactacgaggacgtccccgcctacgacgagctggccctgaaga
SEQ ID NO:19 <d3701-750;DNA;> aggccgtcgccaaccagcccgtctccgtcgccatcgagggcggcggccgc
SEQ ID NO:20 <d3751-800;DNA;> gagttccagctgtacgtctccggcgtcttcaccggccgctgcggcaccgc
SEQ ID NO:21 <d3801-850;DNA;> cctggaccacggcgtcgtcgccgtcggctacggcaccgccaagggccacg
SEQ ID NO:22 <d3851-900;DNA;> actactggatcgtccgcaactcctggggctcctcctggggcgaggacggc
SEQ ID NO:23 <d3901-950;DNA;> tacatccgcctggagcgcaacctggccaactcccgctccggcaagtgcgg
SEQ ID NO:24 <d3951-1000;DNA;> catcgcgatcgagccctcctaccccctgaagaacggccccaaccccccca
SEQ ID NO:25 <d31000-1055;DNA;> accccggcccctcccccccctcccccgtcaagccccccaacgtctaactcgagat
SEQ ID NO:26 <d31026-1055;DNA;> atctcgagttagacgttggggggcttgacg
SEQ ID NO:27 <d3976-1025;DNA;> ggggaggggggggaggggccggggttgggggggttggggccgttcttcag
SEQ ID NO:28 <d3926-975;DNA;> ggggtaggagggctcgatcgcgatgccgcacttgccggag
SEQ ID NO:29 <d3876-925;DNA;> cgggagttggccaggttgcgctccaggcggatgtagccgtcctcgcccca
SEQ ID NO:30 <d3826-875;DNA;> ggaggagccccaggagttgcggacgatccagtagtcgtggcccttggcgg
SEQ ID NO:31 <d3776-825;DNA;> tgccgtagccgacggcgacgacgccgtggtccagggcggtgccgcagcgg
SEQ ID NO:32 <d3726-775;DNA;> ccggtgaagacgccggagacgtacagctggaactcgcggccgccgccctc
SEQ ID NO:33 <d3676-725;DNA;> gatggcgacggagacgggctggttggcgacggccttcttcagggccagct
SEQ ID NO:34 <d3626-675;DNA;> cgtcgtaggcggggacgtcctcgtagtcgtcgatggagacgaccttggcg
SEQ ID NO:35 <d3576-625;DNA;> ttcttgcggtaggtgtcgcagcggccgtcgacgccgcggtaggggtagtc
SEQ ID NO:36 <d3526-575;DNA;> ctcgtcggagtcgatgccgccgttgttgatgatgaactcgaaggcgtagt
SEQ ID NO:37 <d3476-525;DNA;> ccatcaggccgccgttgcagccctggttgtagccggtgtcgcagtcgacc
SEQ ID NO:38 <d3426-475;DNA;> agctcctgctcggacagggagatcagctcgccggtgacgatcttgttgat
SEQ ID NO:39 <d3376-425;DNA;> gccctcgacggcgccgatggcggagaaggcccagcaggagccgcagccgc
SEQ ID NO:40 <d3326-375;DNA;> cctggtccttgacgggggggacggcgccctccttgcgccagtcgacggag
SEQ ID NO:41 <d3276-325;DNA;> tcgggcagcttgtcgccgacgcggggggcgtagcggttggagggggtctt
SEQ ID NO:42 <d3226-275;DNA;> gcccaggcggcggttggggtcgatcttggtgcccaggtacttggcgcggt
SEQ ID NO:43 <d3176-225;DNA;> actcctcgttggtcaggtcggcgaagcggttcaggcccagcttgtaggtg
SEQ ID NO:44 <d3126-175;DNA;> cggtcctcggcggagttgtggtcgtcgatgaagcgcaggttgtccttgaa
SEQ ID NO:45 <d376-125;DNA;> gatctggaagcgcttctccttctcgcccagggcgttgtagaccttgccgt
SEQ ID NO:46 <d326-75;DNA;> gcttgaccagccactgctcgtacatggacatcagctcctcctcggtgcgcagggtggcgg
SEQ ID NO:47 <d325<-;DNA;> ccttgtcggcgtgggcggagtcgtagcgcttgaccaggggggtggcggcgacgcaggt
SEQ ID NO:48 <prika3<-;DNA;> ggccaggacgccggcgacgccggtcagggacagcagcttcatgaattcat
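The oligonucleotides listed above cover both strands, so that neighbouring oligonucleotides can anneal to one another during PCR assembly. As an illustrative consistency check, the following sketch tests whether the reverse oligonucleotide d31026-1055 (SEQ ID NO:26) is the reverse complement of the 3' end of d31000-1055 (SEQ ID NO:25). The two sequences are copied from the list above; the function names are hypothetical.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    pairs = {"a": "t", "t": "a", "c": "g", "g": "c"}
    return "".join(pairs[base] for base in reversed(seq.lower()))


def anneals_at_3prime_end(forward, reverse):
    """True if the reverse oligonucleotide is fully complementary to the
    3' end of the forward oligonucleotide."""
    return forward.lower().endswith(reverse_complement(reverse))


SEQ_ID_NO_25 = "accccggcccctcccccccctcccccgtcaagccccccaacgtctaactcgagat"
SEQ_ID_NO_26 = "atctcgagttagacgttggggggcttgacg"

print(reverse_complement(SEQ_ID_NO_26))
print(anneals_at_3prime_end(SEQ_ID_NO_25, SEQ_ID_NO_26))
```

The same check could be applied to other pairs, allowing for the fact that most pairs overlap only over part of their length.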
The first PCR reaction mixture comprised 1 pmol of each of the oligonucleotides, 0.2 mM of dNTPs, and 1 unit of rTth DNA Polymerase in 50 microL of buffer.
The reaction was incubated in a thermocycler DNA Engine PTC-200 (MJ Research Inc) programmed as follows: 55 cycles of 94°C for 30 sec, 60°C for 30 sec and 68°C for 30 sec.
One microL of the resulting first PCR reaction solution was used for the second PCR with 0.4 mM dNTP, 100 pmol of each of the two oligonucleotides prika1 (SEQ ID NO:3) and d31026-1055 (SEQ ID NO:26), and 1.75 units of Expand™ polymerase in 50 microL of buffer.
The reaction was submitted to 25 cycles of 94°C for 30 sec, 60°C for 30 sec and 68°C for 1 minute. The duration of the 68°C extension step was prolonged by 10 sec per cycle. A 4°C hold step completed the program.
One microL of the resulting second PCR reaction solution was used for the third PCR with 0.4 mM dNTP, 100 pmol of each of the two oligonucleotides prika1 (SEQ ID NO:3) and d31026-1055 (SEQ ID NO:26), and 1.75 units of Expand™ polymerase in 50 microL of buffer. The reaction was submitted to 94°C for 2 minutes followed by 30 cycles of 94°C for 15 sec, 60°C for 30 sec and 72°C for 1 minute. From cycle 11 to 30 the duration of the 72°C extension step was prolonged by 20 sec per cycle. A final extension step of 72°C for 7 minutes followed by a 4°C hold step completed the program.
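The progressively prolonged extension steps can be tabulated as in the following sketch, which is given for illustration only. It assumes that in the second PCR the prolongation starts at cycle 2, and that in the third PCR cycle 11 is the first cycle to receive the additional 20 sec; the text does not state this explicitly. The function name extension_times is hypothetical.

```python
def extension_times(cycles, base_s, increment_s=0, first_incremented_cycle=1):
    """Extension time in seconds for each cycle, where the extension is
    prolonged by increment_s for every cycle from first_incremented_cycle on."""
    times = []
    for cycle in range(1, cycles + 1):
        steps = max(0, cycle - first_incremented_cycle + 1)
        times.append(base_s + steps * increment_s)
    return times


# Second PCR: 25 cycles, 1 minute at 68 degrees C, plus 10 sec per cycle.
second = extension_times(25, 60, increment_s=10, first_incremented_cycle=2)

# Third PCR: 30 cycles, 1 minute at 72 degrees C, plus 20 sec per cycle
# from cycle 11 onwards.
third = extension_times(30, 60, increment_s=20, first_incremented_cycle=11)

print(second[0], second[-1])
print(third[9], third[10], third[-1], sum(third))
```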
The amplified fragments of expected size were purified on 0.8% agarose gel and ligated into the pT7blue plasmid. The transformants were then screened by restriction digestion of extracted plasmid DNA with Xho I and EcoR I followed by plasmid extraction and sequencing as described above for control of correct sequence. Three plasmids designated pTD3#5, pTD3#19 and pTD3#38, each containing different errors, were obtained. The 0.25 kb Pst I and Bgl II fragment from pTD3#19 was ligated to the 3.75 kb Pst I and Bgl II fragment of pTD3#38 resulting in pTD3#19-38. The 0.32 kb Xcm I and Sac II fragment from pTD3#5 was ligated to the 3.68 kb Xcm I and Sac II fragment of pTD3#19-38 resulting in pTD3#19-38-5. The plasmid pTD3#19-38-5 contained the complete codon-optimized proD3-beta gene with no sequence error (SEQ ID NO:2).
Construction of the expression plasmid
To introduce the desired restriction enzyme sites and the signal peptide sequence of the TAKA amylase upstream of the proD3-beta gene, PCR amplification was performed using the primers prika4 (SEQ ID NO:49) and prika5 (SEQ ID NO:50). The two primers contain a Xho I and a BamH I restriction site respectively (shown as underlined). In addition the primer prika5 contains the TAKA signal peptide sequence (shown as bold).
prika4 (SEQ ID NO:49): atctcgagttaggagtagtagttgtcgcagacgttggggggcttgacgggggaggggggg
prika5 (SEQ ID NO:50): atggatccatgatggtcgcctggtggtccctgttcctgtacggcctgcaggtcgccgcccccgccctggcctacgactccgcccacgccgacaaggccgccaccctgcgcaccgaggagga
The reaction mixture comprised 10 ng of pTD3#19-38-5, 0.25 mM of dNTP, 100 pmol of each primer and 3.5 units of Expand™ polymerase in 100 microL of buffer with MgCl2. PCR was performed under the following conditions: 94°C for 2 minutes followed by 30 cycles of 94°C for 15 sec, 60°C for 30 sec and extension at 70°C for 1 minute. From cycle 11 to 30 the duration of the 70°C extension step was prolonged by 20 sec per cycle. A final extension step at 70°C for 7 minutes followed by a 4°C hold step completed the program.
The amplified proD3-beta gene including the DNA sequence encoding the TAKA signal peptide was gel purified and ligated into pT7blue and the resulting plasmid was termed pt-vTD3. A second PCR amplification was carried out with the primers prika6 (SEQ ID NO:51) and prika7 (SEQ ID NO:52). The two primers contain the restriction sites BamH I and Bgl II respectively (shown as underlined). In addition the primer prika6 introduced the 3 bases ACC in front of the TAKA signal peptide sequence (TAKA signal peptide sequence fragment shown as bold).
prika6 (SEQ ID NO:51): atggatccaccatgatggtcgcctggtggtccctgttcctgt
prika7 (SEQ ID NO:52): gaagatctggaagcgcttctccttctcgcccag
The reaction mixture comprised 10 ng of pt-vTD3 as template, 0.25 mM of dNTP, 100 pmol of each primer and 3.5 units of Expand™ polymerase in 100 microL of buffer with MgCl2. The reaction was submitted to 94°C for 2 minutes followed by 30 cycles of 94°C for 15 sec, 60°C for 30 sec and extension at 70°C for 45 sec. From cycle 11 to 30 the duration of the 70°C extension step was prolonged by 20 sec per cycle. A final extension step at 70°C for 7 minutes followed by a 4°C hold step completed the program.
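Since the underlining and bold type of the primer sequences are not reproduced in plain text, the restriction sites carried by the primers can be located with a short script such as the following illustrative sketch. The recognition sequences of Xho I (ctcgag), BamH I (ggatcc) and Bgl II (agatct) are the standard ones; the primer sequences are those of SEQ ID NO:49 to SEQ ID NO:52, with the garbled primer names read as prika5 and prika6, which is an assumption. The function name sites_in is hypothetical.

```python
# Recognition sequences of the restriction enzymes named in the text.
SITES = {"XhoI": "ctcgag", "BamHI": "ggatcc", "BglII": "agatct"}

PRIMERS = {
    "prika4": "atctcgagttaggagtagtagttgtcgcagacgttggggggcttgacgggggaggggggg",
    "prika5": ("atggatccatgatggtcgcctggtggtccctgttcctgtacggcctgcaggtcgccgcc"
               "cccgccctggcctacgactccgcccacgccgacaaggccgccaccctgcgcaccgaggagga"),
    "prika6": "atggatccaccatgatggtcgcctggtggtccctgttcctgt",
    "prika7": "gaagatctggaagcgcttctccttctcgcccag",
}


def sites_in(primer):
    """Names of the listed restriction sites that occur in a primer."""
    primer = primer.lower()
    return sorted(name for name, site in SITES.items() if site in primer)


for name, sequence in PRIMERS.items():
    print(name, sites_in(sequence))

# The three bases ACC introduced by prika6 lie between the BamH I site
# and the ATG start codon.
print("ggatccaccatg" in PRIMERS["prika6"])
```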
Following PCR the reactions were electrophoresed on a 0.8% agarose gel. The 0.21 kb fragment was excised from the gel, gel-purified with QIA gel extraction kit (Qiagen, Inc., Chatsworth, CA) following the manufacturer's instructions and digested with BamH I and Bgl II before being ligated into pt-vTD3 previously cut with BamH I and Bgl II. The resulting plasmid, termed pt-vkTD3, was sequenced for confirmation of correct sequence.
The 1.1 kbp fragment with the codon-optimized proD3-beta gene and the TAKA signal peptide sequence was excised from pt-vkTD3 by BamH I and Xho I. It was ligated to pHD464 cut by BamH I and Xho I. The resulting plasmid was termed pkTD3.
The expression plasmid containing the native protease D3-beta gene was constructed in a similar manner as the plasmid pkTD3. The resulting plasmid was termed pkTintD3.
Example 2; Expression of the native and the codon-optimized protease D3-beta gene in Aspergillus oryzae
The A. oryzae host strain BECh2 was transformed with the marker plasmid pToC186, and either the expression plasmid pkTintD3 or the expression plasmid pkTD3, and selection positive transformants were isolated. Transformants were cultured on Cove-2 agar at 32°C for 5 days and a piece of agar with growing culture was inoculated to 100 ml of MS-9. After cultivation on a rotary shaker at 220 rpm at 32°C for 24 hours, 3 ml of each culture was transferred to 100 ml of 1/5 MDU-2Bp and cultivated with shaking at 32°C for 3 days. The culture broth was centrifuged at 3500 rpm for 15 minutes and the supernatant was collected. The protease D3-beta activity of the supernatant was determined spectrophotometrically as described before. The best transformants with the native gene yielded 4.7x10^2 units of fluorescence and the best transformants with the codon-optimized gene yielded 1.5x10^4 units of fluorescence, while untransformed A. oryzae showed no activity (5 units of fluorescence).
Expression of the codon-optimized gene in A. oryzae thus resulted in an approximately 30-fold increase in production of the polypeptide of interest compared to the expression of the native gene.
Example 3; Expression of the native and the codon-optimized protease D3-beta gene in Aspergillus niger
The A. niger host strain was transformed with the marker plasmid pToC186 and with either the expression plasmid pkTintD3 or the expression plasmid pkTD3, and selection positive transformants were isolated. Transformants were cultured on Cove-2 agar at 32°C for 5 days and a piece of agar with growing culture was inoculated to a flask with 100 ml of MLC. After cultivation on a rotary shaker at 220 rpm at 30°C for 48 hours, 3 ml of each culture was transferred to 100 ml of AnigsfR in flasks and cultivated at 30°C for 3 days under shaking. Culture broth was centrifuged at 3500 rpm for 15 minutes and the supernatant was collected. The protease D3-beta activity of the supernatant was determined spectrophotometrically as described above. The best transformants with the native gene yielded 2.0x10^4 units of fluorescence, the best transformants with the codon-optimized gene yielded 2.1x10^6 units, while untransformed A. niger showed no activity (5 units of fluorescence). Culture supernatant was also subjected to SDS-PAGE analysis and the gel was stained with CBB. Protease D3-beta positive transformants showed a band of the expected size of 30 kDa, while this band was not detected in the broth of the negative transformant or in untransformed A. niger. Expression of the codon-optimized gene in A. niger thus resulted in a 105-fold increase in production of the polypeptide of interest compared to the expression of the native gene. Using the codon-optimized gene approximately 0.5 g of protease D3-beta enzyme protein was produced per litre.
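The increases reported in Examples 2 and 3 follow directly from the fluorescence values of the best transformants, as the following sketch shows. The fluorescence values are those stated in the examples; the function names are hypothetical, and no background correction is applied since none is described.

```python
def fold_increase(optimized_units, native_units):
    """Ratio of the activity obtained with the codon-optimized gene to the
    activity obtained with the native gene."""
    return optimized_units / native_units


def percent_increase(optimized_units, native_units):
    """The same comparison expressed as a percentage increase."""
    return 100.0 * (optimized_units - native_units) / native_units


# Units of fluorescence of the best transformants (Examples 2 and 3).
RESULTS = {
    "A. oryzae": {"native": 4.7e2, "optimized": 1.5e4},
    "A. niger": {"native": 2.0e4, "optimized": 2.1e6},
}

for host, r in RESULTS.items():
    print(host,
          round(fold_increase(r["optimized"], r["native"]), 1),
          round(percent_increase(r["optimized"], r["native"])))
```

The ratio for A. oryzae is about 31.9, consistent with the increase of approximately 30 times stated in Example 2, and the ratio for A. niger is 105.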