WO2024259585A1 - Method for co-expressing proteins - Google Patents

Method for co-expressing proteins

Info

Publication number
WO2024259585A1
Authority
WO
WIPO (PCT)
Prior art keywords
mpp
cleav
cell
cleavable
proteins
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/CN2023/101377
Other languages
French (fr)
Inventor
Jianguo Yang
Nan XIANG
Yiheng LIU
Chenyue GUO
Chenyu LI
Hui Li
Shuyi CAI
Ray DIXON
Yiping Wang
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University
Original Assignee
Peking University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University
Priority to PCT/CN2023/101377
Publication of WO2024259585A1
Anticipated expiration
Pending

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12P FERMENTATION OR ENZYME-USING PROCESSES TO SYNTHESISE A DESIRED CHEMICAL COMPOUND OR COMPOSITION OR TO SEPARATE OPTICAL ISOMERS FROM A RACEMIC MIXTURE
    • C12P21/00 Preparation of peptides or proteins
    • C12P21/02 Preparation of peptides or proteins having a known sequence of two or more amino acids, e.g. glutathione
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K14/00 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof
    • C07K14/37 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from fungi
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K7/00 Peptides having 5 to 20 amino acids in a fully defined sequence; Derivatives thereof
    • C07K7/04 Linear peptides containing only normal peptide links
    • C07K7/06 Linear peptides containing only normal peptide links having 5 to 11 amino acids
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/63 Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/70 Vectors or expression systems specially adapted for E. coli
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/63 Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/79 Vectors or expression systems specially adapted for eukaryotic hosts
    • C12N15/80 Vectors or expression systems specially adapted for eukaryotic hosts for fungi
    • C12N15/81 Vectors or expression systems specially adapted for eukaryotic hosts for fungi for yeasts
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/0004 Oxidoreductases (1.)
    • C12N9/0095 Oxidoreductases (1.) acting on iron-sulfur proteins as donor (1.18)
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14 Hydrolases (3)
    • C12N9/48 Hydrolases (3) acting on peptide bonds (3.4)
    • C12N9/50 Proteinases, e.g. Endopeptidases (3.4.21-3.4.25)
    • C12N9/58 Proteinases, e.g. Endopeptidases (3.4.21-3.4.25) derived from fungi
    • C12N9/60 Proteinases, e.g. Endopeptidases (3.4.21-3.4.25) derived from fungi from yeast
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12P FERMENTATION OR ENZYME-USING PROCESSES TO SYNTHESISE A DESIRED CHEMICAL COMPOUND OR COMPOSITION OR TO SEPARATE OPTICAL ISOMERS FROM A RACEMIC MIXTURE
    • C12P17/00 Preparation of heterocyclic carbon compounds with only O, N, S, Se or Te as ring hetero atoms
    • C12P17/16 Preparation of heterocyclic carbon compounds with only O, N, S, Se or Te as ring hetero atoms containing two or more hetero rings
    • C12P17/165 Heterorings having nitrogen atoms as the only ring heteroatoms
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12P FERMENTATION OR ENZYME-USING PROCESSES TO SYNTHESISE A DESIRED CHEMICAL COMPOUND OR COMPOSITION OR TO SEPARATE OPTICAL ISOMERS FROM A RACEMIC MIXTURE
    • C12P7/00 Preparation of oxygen-containing organic compounds
    • C12P7/02 Preparation of oxygen-containing organic compounds containing a hydroxy group
    • C12P7/04 Preparation of oxygen-containing organic compounds containing a hydroxy group acyclic
    • C12P7/16 Butanols
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Y ENZYMES
    • C12Y118/00 Oxidoreductases acting on iron-sulfur proteins as donors (1.18)
    • C12Y118/06 Oxidoreductases acting on iron-sulfur proteins as donors (1.18) with dinitrogen as acceptor (1.18.6)
    • C12Y118/06001 Nitrogenase (1.18.6.1)
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Y ENZYMES
    • C12Y304/00 Hydrolases acting on peptide bonds, i.e. peptidases (3.4)
    • C12Y304/24 Metalloendopeptidases (3.4.24)
    • C12Y304/24064 Mitochondrial processing peptidase (3.4.24.64)
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K2319/00 Fusion polypeptide
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K2319/00 Fusion polypeptide
    • C07K2319/01 Fusion polypeptide containing a localisation/targetting motif
    • C07K2319/02 Fusion polypeptide containing a localisation/targetting motif containing a signal sequence
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K2319/00 Fusion polypeptide
    • C07K2319/01 Fusion polypeptide containing a localisation/targetting motif
    • C07K2319/07 Fusion polypeptide containing a localisation/targetting motif containing a mitochondrial localisation signal
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K2319/00 Fusion polypeptide
    • C07K2319/50 Fusion polypeptide containing protease site
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12R INDEXING SCHEME ASSOCIATED WITH SUBCLASSES C12C - C12Q, RELATING TO MICROORGANISMS
    • C12R2001/00 Microorganisms ; Processes using microorganisms
    • C12R2001/01 Bacteria or Actinomycetales ; using bacteria or Actinomycetales
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12R INDEXING SCHEME ASSOCIATED WITH SUBCLASSES C12C - C12Q, RELATING TO MICROORGANISMS
    • C12R2001/00 Microorganisms ; Processes using microorganisms
    • C12R2001/01 Bacteria or Actinomycetales ; using bacteria or Actinomycetales
    • C12R2001/185 Escherichia
    • C12R2001/19 Escherichia coli
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12R INDEXING SCHEME ASSOCIATED WITH SUBCLASSES C12C - C12Q, RELATING TO MICROORGANISMS
    • C12R2001/00 Microorganisms ; Processes using microorganisms
    • C12R2001/645 Fungi ; Processes using fungi
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12R INDEXING SCHEME ASSOCIATED WITH SUBCLASSES C12C - C12Q, RELATING TO MICROORGANISMS
    • C12R2001/00 Microorganisms ; Processes using microorganisms
    • C12R2001/645 Fungi ; Processes using fungi
    • C12R2001/85 Saccharomyces
    • C12R2001/865 Saccharomyces cerevisiae

Definitions

  • the present invention relates to the field of bioengineering and, in particular, to Mitochondrial Processing Peptidase (MPP) cleavable sequences and uses thereof.
  • MPP Mitochondrial Processing Peptidase
  • Efficient co-expression of multiple proteins or their subunits finds many important applications in the field of life sciences and biotechnology. For instance, such a task is necessary for creating a complex biological system, constructing a biosynthetic pathway and producing a multimeric protein (such as an antibody, a biologically functional complex, etc.) in a host cell or a cell-free system.
  • co-expression of multiple proteins or their subunits mostly relies on construction of individual expression cassettes in which each of the proteins to be co-expressed is under the control of an individual promoter. This method limits the number of genes that can be co-expressed in a cell, especially in a eukaryotic cell.
  • RNAencoding proteins involve fusing the coding sequences of multiple proteins in tandem as a polycistronic DNA or a DNA encoding a polyprotein, in which the co-expression of multiple proteins is under the control of one promoter.
  • a ribosome binding site (RBS) or an internal ribosomal entry site (IRES) can be used to connect the coding sequences of multiple proteins to form a polycistronic DNA.
  • the polycistronic DNA will be transcribed into one polycistronic mRNA, and the translation of each protein is independent of the others.
  • a cleavable peptide can be used to connect multiple proteins to form a polyprotein.
  • TEVp Tobacco etch virus protease
  • IRES-mediated translational initiation is less efficient than 5’-cap-mediated initiation and results in uneven protein co-expression (Ha, S.H., Liang, Y.S., Jung, H., Ahn, M.J., Suh, S.C., Kweon, S.J., Kim, D.H., Kim, Y.M., and Kim, J.K. (2010) Application of two bicistronic systems involving 2A and IRES sequences to the biosynthesis of carotenoids in rice endosperm. Plant Biotechnol. J.
  • a Mitochondrial Processing Peptidase (MPP) cleavable sequence can be used to convert multiple proteins into fusion proteins, facilitating the co-expression of multiple proteins or their subunits in mitochondria, cytoplasm or cell-free systems.
  • the fusion proteins can be cleaved by an endogenous MPP located in mitochondria, releasing the individual proteins as functional proteins.
  • this method can be applied to other cellular or non-cellular environments, where the fusion proteins are co-expressed with an exogenous MPP.
  • with the MPP-based method, precise cleavage of fusion proteins by MPP and efficient co-expression of individual proteins can be achieved in either mitochondria or cytoplasm, whether in prokaryotes or eukaryotes.
  • This MPP-based method achieves functional co-expression of multiple proteins with a high level of protein accumulation.
  • the invention relates to a method for co-expressing two or more proteins in a cell or a cell-free system, comprising expressing the two or more proteins as a fusion protein in which at least two adjacent proteins are linked by a linker comprising a Mitochondrial Processing Peptidase (MPP) cleavable region, wherein cleavage of the MPP cleavable region by an MPP in the cell or the cell-free system results in release of the proteins linked by the linker.
  • each of the proteins to be co-expressed is linked to the adjacent protein(s) in the fusion protein by a linker comprising an MPP cleavable region or by an uncleavable linker, and wherein the linkers or the MPP cleavable regions between adjacent proteins are the same or different.
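The linking scheme described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the applicants' implementation: the function name `build_fusion` and the toy protein strings are hypothetical, and the flexible peptide is used here only as a stand-in for an uncleavable linker. The two linker sequences themselves are the ones listed elsewhere in this text.

```python
from typing import Sequence


def build_fusion(proteins: Sequence[str], linkers: Sequence[str]) -> str:
    """Concatenate proteins into one fusion, with one linker per junction.

    Linkers may be the same or different at each junction.
    """
    if len(proteins) < 2:
        raise ValueError("a fusion needs at least two proteins")
    if len(linkers) != len(proteins) - 1:
        raise ValueError("exactly one linker is required per junction")
    parts = [proteins[0]]
    for linker, protein in zip(linkers, proteins[1:]):
        parts.append(linker)
        parts.append(protein)
    return "".join(parts)


# Linker sequences taken from the text.
MPP_SITE = "RQAFQKRAFHT"          # listed as SEQ ID NO: 72
FLEXIBLE = "GGGGSGGGGSGGGGS"      # flexible peptide (assumed uncleavable here)

# Hypothetical toy sequences, not real proteins.
fusion = build_fusion(["MAAA", "MCCC", "MDDD"], [MPP_SITE, FLEXIBLE])
print(fusion)
```

In this sketch the first junction would be cleavable by MPP while the second would keep its two neighbours joined, mirroring the "same or different" linker wording.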
  • the cell co-expressing one or more proteins endogenously expresses an MPP.
  • the cell is a eukaryotic cell and the MPP is located at mitochondria of the cell, wherein the fusion protein is located at mitochondria of the cell.
  • the cell co-expressing one or more proteins exogenously expresses an MPP.
  • the cell is a prokaryotic cell or a eukaryotic cell and the MPP is located in the cytoplasm of the cell, and wherein the fusion protein is expressed in the cytoplasm of the cell.
  • the prokaryotic cell is a cell of genera Acetobacter, Azotobacter, Bacillus, Enterobacter, Escherichia, Klebsiella, Salmonella or Pseudomonas. In some preferred embodiments, the prokaryotic cell is a cell of Escherichia coli, Azotobacter vinelandii, Klebsiella oxytoca, Klebsiella pneumoniae, Pseudomonas fluorescens, Bacillus subtilis, Pseudomonas protegens, Pseudomonas putida, Pseudomonas veronii, Pseudomonas taetrolens, Pseudomonas balearica, Pseudomonas stutzeri, Pseudomonas aeruginosa, Pseudomonas syringae, Bacillus amyloliquefaciens, Burk
  • the eukaryotic cell is selected from the group consisting of a fungal cell, an algae cell, an animal cell or a plant cell.
  • the fungal cell is a yeast cell.
  • the yeast is selected from the group consisting of Saccharomyces, Aspergillus, Kluyveromyces, Schizosaccharomyces, Rhizopus, Candida, Yarrowia and Pichia, such as Saccharomyces cerevisiae, Schizosaccharomyces pombe, Kluyveromyces lactis, Kluyveromyces marxianus, Aspergillus terreus, Aspergillus niger, Rhizopus arrhizus, Rhizopus oryzae, Candida albicans, Candida boidinii, Candida sonorensis, Candida tropicalis, Yarrowia lipolytica or Pichia pastoris.
  • the cell-free system co-expressing one or more proteins comprises an effective amount of the MPP capable of cleaving the fusion protein at the MPP cleavable region.
  • MPP is derived from yeast, Arabidopsis, tobacco, rice or algae, such as Saccharomyces cerevisiae, Arabidopsis thaliana, Nicotiana tabacum, Oryza sativa or Chlamydomonas reinhardtii.
  • MPP is derived from Saccharomyces cerevisiae.
  • MPP is derived from Saccharomyces cerevisiae W303-1a strain.
  • MPP is derived from Saccharomyces cerevisiae S288C strain.
  • the linker or the MPP cleavable region comprises one or more MPP cleavable sequences. In some embodiments, the linker or the MPP cleavable region comprises two or more MPP cleavable sequences arranged in tandem, and wherein the MPP cleavable sequences within the same linker or MPP cleavable region are the same or different.
  • each of the MPP cleavable sequences independently comprises a wild type amino acid sequence, a fragment or a variant thereof that is cleavable by MPP, or an artificially designed amino acid sequence that is cleavable by MPP.
  • the wild type amino acid sequence is a wild type mitochondrial signal peptide which is responsible for targeting a mitochondrial endogenous protein into mitochondria.
  • the mitochondrial signal peptide is derived from a fungal cell, an algae cell, an animal cell or a plant cell.
  • the mitochondrial signal peptide is derived from a mitochondrial endogenous protein selected from the group consisting of: the subunit 9 of the F0 ATPase of Neurospora crassa, COX4 of Saccharomyces cerevisiae, HSP60/YLR259c of Saccharomyces cerevisiae, SSC1/YJR045c of Saccharomyces cerevisiae, CYB2/YML054C of Saccharomyces cerevisiae, mitochondrial ATP synthase of Nicotiana plumbaginifolia (encoded by the atp2-1 gene), mitochondrial import receptor subunit TOM20 of Arabidopsis thaliana and 2-oxoglutarate dehydrogenase subunit E1 of Arabidopsis thaliana.
  • the MPP cleavable sequence is derived from the subunit 9 of the F0 ATPase of Neurospora crassa. In some preferred embodiments, the MPP cleavable sequence comprises the amino acids of any one of SEQ ID NOs: 67-81.
  • the artificially designed amino acid sequence comprises a formula as follows:
  • X1 comprises RQAFQKRA or RQAFQRRA
  • X2 comprises YSS, FHT or FST.
  • X1 is RQAFQKRA
  • X2 is FHT or FST.
  • the artificially designed amino acid sequence is RQAFQKRAFHT (SEQ ID NO: 72) or RQAFQKRAFST (SEQ ID NO: 73).
  • the artificially designed amino acid sequence comprises a formula as follows:
  • X3 comprises R(G)nKRA or R(G)nRRA
  • the artificially designed amino acid sequence is selected from the group consisting of RGGRRAFHT (SEQ ID NO: 78), RGGGRRAFHT (SEQ ID NO: 79), RGGRRAFST (SEQ ID NO: 81) and RGGGRRAFST (SEQ ID NO: 80).
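The relationship between the building blocks (X1, X2, X3) and the listed sequences can be checked with a short Python sketch. Two things here are assumptions rather than statements from the text: the formula itself is not reproduced above, so direct concatenation (X1 followed by X2, or X3 followed by X2) is inferred from the listed examples, and the range of n (2 and 3) is taken only from those examples. The function names are hypothetical.

```python
from itertools import product

X1_OPTIONS = ("RQAFQKRA", "RQAFQRRA")
X2_OPTIONS = ("YSS", "FHT", "FST")


def x3_options(n_values=(2, 3)):
    """Expand R(G)nKRA and R(G)nRRA for the given glycine counts.

    The default n values are an assumption based on the listed examples.
    """
    return [f"R{'G' * n}{tail}" for n in n_values for tail in ("KRA", "RRA")]


def compose(prefixes, suffixes):
    """Concatenate every prefix with every suffix (assumed formula)."""
    return [prefix + suffix for prefix, suffix in product(prefixes, suffixes)]


x1_x2 = compose(X1_OPTIONS, X2_OPTIONS)
x3_x2 = compose(x3_options(), X2_OPTIONS)

# Sequences named in the text should appear among the composed candidates.
for listed in ("RQAFQKRAFHT", "RQAFQKRAFST"):
    print(listed, listed in x1_x2)
for listed in ("RGGRRAFHT", "RGGGRRAFHT", "RGGRRAFST", "RGGGRRAFST"):
    print(listed, listed in x3_x2)
```

Whether every composed candidate is actually cleavable by MPP is not something this enumeration can show; it only confirms that the listed sequences are consistent with the assumed concatenation.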
  • the two or more proteins to be co-expressed are functionally related, preferably the two or more proteins are involved in the same biosynthetic pathway or metabolic pathway, participate in the same signaling pathway, or belong to the same complex biological system.
  • the complex biological system is a nitrogenase system.
  • the two or more proteins are selected from the group consisting of NifH, NifD, NifK, NifY, NifE, NifN, NifB, NifU, NifS, NifV, NifM, NifJ, NifF, NifW, NifZ, NifT, NifX and NifQ.
  • the fusion protein comprises NifH-cleav-NifD-cleav-NifK, NifE-cleav-NifN ⁇ NifB, NifJ-cleav-NifV-cleav-NifW, NifF-cleav-NifM-cleav-NifY, and optionally, NifU-cleav-NifS, wherein “cleav” is the linker comprising the MPP cleavable region, and wherein “ ⁇ ” represents a flexible peptide. In some preferred embodiments, “ ⁇ ” represents “GGGGSGGGGSGGGGS”.
  • the biosynthetic pathway is violacein biosynthetic pathway.
  • the two or more proteins are selected from the group consisting of VioA, VioB, VioC, VioD and VioE.
  • the fusion protein comprises VioA-cleav-VioB-cleav-VioE and VioD-cleav-VioC, or VioA-cleav-VioB-cleav-VioE-cleav-VioD-cleav-VioC, wherein “cleav” is the linker comprising the MPP cleavable region.
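The "protein-cleav-protein" notation used for these constructs is regular enough to parse mechanically. The sketch below is illustrative only; the function name `parse_construct` is hypothetical, and it handles only the hyphen-separated "cleav" form shown for the violacein constructs.

```python
from typing import List


def parse_construct(notation: str, linker_token: str = "cleav") -> List[str]:
    """Return the ordered protein names from a 'A-cleav-B-cleav-C' string."""
    tokens = notation.split("-")
    if len(tokens) % 2 == 0:
        raise ValueError("construct must start and end with a protein")
    for index, token in enumerate(tokens):
        expects_linker = index % 2 == 1
        if (token == linker_token) != expects_linker:
            raise ValueError(f"unexpected token {token!r} at position {index}")
    return tokens[0::2]


# The two layouts described in the text for the violacein pathway.
two_fusions = ["VioA-cleav-VioB-cleav-VioE", "VioD-cleav-VioC"]
one_fusion = "VioA-cleav-VioB-cleav-VioE-cleav-VioD-cleav-VioC"

split_layout = [name for c in two_fusions for name in parse_construct(c)]
single_layout = parse_construct(one_fusion)
print(split_layout)
print(single_layout)
```

Both layouts yield the same five enzymes in the same order; they differ only in how many fusion proteins carry them.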
  • the biosynthetic pathway is isobutanol biosynthetic pathway.
  • the two or more proteins are selected from the group consisting of ILV2, ILV3, ILV5, ARO10 and ADHA.
  • the fusion protein comprises:
  • cleav is the linker comprising the MPP cleavable region.
  • the invention provides an MPP cleavable sequence comprising a wild type amino acid sequence, a fragment or a variant thereof that is cleavable by MPP, or an artificially designed amino acid sequence that is cleavable by MPP.
  • the wild type amino acid sequence is a wild type mitochondrial signal peptide which is responsible for targeting a mitochondrial endogenous protein into mitochondria.
  • the mitochondrial signal peptide is derived from a fungal cell, an algae cell, an animal cell or a plant cell.
  • the mitochondrial signal peptide is derived from a mitochondrial endogenous protein selected from the group consisting of: the subunit 9 of the F0 ATPase of Neurospora crassa, COX4 of Saccharomyces cerevisiae, HSP60/YLR259c of Saccharomyces cerevisiae, SSC1/YJR045c of Saccharomyces cerevisiae, CYB2/YML054C of Saccharomyces cerevisiae, mitochondrial ATP synthase of Nicotiana plumbaginifolia (encoded by the atp2-1 gene), mitochondrial import receptor subunit TOM20 of Arabidopsis thaliana and 2-oxoglutarate dehydrogenase subunit E1 of Arabidopsis thaliana.
  • the MPP cleavable sequence is derived from the subunit 9 of the F0 ATPase of Neurospora crassa. In some preferred embodiments, the MPP cleavable sequence comprises the amino acids of any one of SEQ ID NOs: 67-81.
  • the artificially designed amino acid sequence comprises a formula as follows:
  • X1 comprises RQAFQKRA or RQAFQRRA
  • X2 comprises YSS, FHT or FST.
  • X1 is RQAFQKRA
  • X2 is FHT or FST.
  • the artificially designed amino acid sequence is RQAFQKRAFHT (SEQ ID NO: 72) or RQAFQKRAFST (SEQ ID NO: 73).
  • the artificially designed amino acid sequence comprises a formula as follows:
  • X3 comprises R(G)nKRA or R(G)nRRA
  • the artificially designed amino acid sequence is selected from the group consisting of RGGRRAFHT (SEQ ID NO: 78), RGGGRRAFHT (SEQ ID NO: 79), RGGRRAFST (SEQ ID NO: 81) and RGGGRRAFST (SEQ ID NO: 80).
  • the invention provides a fusion protein comprising two or more proteins to be co-expressed in a cell or a cell-free system, wherein at least two adjacent proteins in the fusion protein are linked by a linker comprising an MPP cleavable region, wherein the linkers or the MPP cleavable regions between adjacent proteins are the same or different.
  • the linker or the MPP cleavable region comprises one or more of the MPP cleavable sequence according to the second aspect of the invention, wherein the MPP cleavable sequences within the same linker or MPP cleavable region are the same or different.
  • the two or more proteins to be co-expressed are functionally related, preferably the two or more proteins are involved in the same biosynthetic pathway or metabolic pathway, participate in the same signaling pathway, or belong to the same complex biological system.
  • the complex biological system is a nitrogenase system.
  • the two or more proteins are selected from the group consisting of NifH, NifD, NifK, NifY, NifE, NifN, NifB, NifU, NifS, NifV, NifM, NifJ, NifF, NifW, NifZ, NifT, NifX and NifQ.
  • the fusion protein comprises NifH-cleav-NifD-cleav-NifK, NifE-cleav-NifN ⁇ NifB, NifJ-cleav-NifV-cleav-NifW, NifF-cleav-NifM-cleav-NifY, and optionally, NifU-cleav-NifS, wherein “cleav” is the linker comprising the MPP cleavable region, and wherein “ ⁇ ” represents a flexible peptide. In some preferred embodiments, “ ⁇ ” represents “GGGGSGGGGSGGGGS”.
  • the biosynthetic pathway is violacein biosynthetic pathway.
  • the two or more proteins are selected from the group consisting of VioA, VioB, VioC, VioD and VioE.
  • the fusion protein comprises VioA-cleav-VioB-cleav-VioE and VioD-cleav-VioC, or VioA-cleav-VioB-cleav-VioE-cleav-VioD-cleav-VioC, wherein “cleav” is the linker comprising the MPP cleavable region.
  • the biosynthetic pathway is isobutanol biosynthetic pathway.
  • the two or more proteins are selected from the group consisting of ILV2, ILV3, ILV5, ARO10 and ADHA.
  • the fusion protein comprises:
  • cleav is the linker comprising the MPP cleavable region.
  • the invention provides a nucleic acid encoding the MPP cleavable sequence according to the second aspect of the invention or the fusion protein according to the third aspect of the invention.
  • the nucleic acid is a linear DNA fragment or an mRNA fragment.
  • the linear DNA fragment is free in the cytoplasm of a host cell or can be integrated into the genome of the host cell.
  • the invention provides a vector comprising one or more of the nucleic acids according to the fourth aspect of the invention.
  • the vector is free in the cytoplasm of a host cell or can be integrated into the genome of the host cell.
  • the vector is a DNA plasmid vector, a viral vector, a bacterial vector, a cosmid, or an artificial chromosome.
  • the invention provides a cell comprising the MPP cleavable sequence according to the second aspect of the invention, the fusion protein according to the third aspect of the invention, the nucleic acid according to the fourth aspect of the invention or the vector according to the fifth aspect of the invention, wherein the cell is a prokaryotic cell or a eukaryotic cell.
  • the prokaryotic cell is a cell of genera Acetobacter, Azotobacter, Bacillus, Enterobacter, Escherichia, Klebsiella, Salmonella or Pseudomonas. In some embodiments, the prokaryotic cell is a cell of Escherichia coli, Azotobacter vinelandii, Klebsiella oxytoca, Klebsiella pneumoniae, Pseudomonas fluorescens, Bacillus subtilis, Pseudomonas protegens, Pseudomonas putida, Pseudomonas veronii, Pseudomonas taetrolens, Pseudomonas balearica, Pseudomonas stutzeri, Pseudomonas aeruginosa, Pseudomonas syringae, Bacillus amyloliquefaciens, Burkholder
  • the eukaryotic cell is selected from the group consisting of a fungal cell, an algae cell, an animal cell or a plant cell.
  • the fungal cell is a yeast cell.
  • the yeast is selected from the group consisting of Saccharomyces, Aspergillus, Kluyveromyces, Schizosaccharomyces, Rhizopus, Candida, Yarrowia and Pichia, such as Saccharomyces cerevisiae, Schizosaccharomyces pombe, Kluyveromyces lactis, Kluyveromyces marxianus, Aspergillus terreus, Aspergillus niger, Rhizopus arrhizus, Rhizopus oryzae, Candida albicans, Candida boidinii, Candida sonorensis, Candida tropicalis, Yarrowia lipolytica or Pichia pastoris.
  • the yeast is Saccharomyces cerevisiae. In some preferred embodiments, the yeast is Saccharomyces cerevisiae W303-1a strain. In some preferred embodiments, the yeast is Saccharomyces cerevisiae S288C strain.
  • Figure 1 Vectors for hierarchical assembly of multiple parts.
  • the type IIS restriction enzymes BsaI and BpiI were used for hierarchical Golden Gate assembly.
  • Level 0 vectors were used for carrying expression elements, such as promoters, CDSs or terminators.
  • Level 1 vectors were used for carrying individual expression cassettes.
  • Level 2 vectors were used for assembling sub-gene clusters.
  • Level 3 vectors were used for assembling complete gene clusters.
  • FIG. 2 Schematic diagram showing the procedure of assembling Nif polyprotein gene cluster.
  • the nifHDK, nifUS, nifFMY, nifJVW, and nifENB expression cassettes were assigned to the Level 1 vectors 1, 2, 3, 8, and 10 respectively.
  • the MBP-TEVp (a maltose binding protein was fused to TEVp to enhance the solubility of the protein) expression cassette or Gap9 (a random sequence) was assigned to Level 1 vector 9.
  • These expression cassettes or gap sequences were assembled in Level 2 vectors to form sub-gene clusters.
  • the sub-gene clusters were further assembled in a Level 3 vector to form a complete Nif polyprotein gene cluster.
  • FIG. 3A-3B Engineering and assessment of TEVp-based Nif polyprotein system in yeast mitochondria.
  • A Schematic diagram of TEVp-based nif constructs for expression in mitochondria of Saccharomyces cerevisiae W303-1a strain. Symbol “ ⁇ ” is used to represent dual TEVp sites (ENLYFQSENLYFQS). The Leu2 marker is used for auxotrophic selection of transformants. Constructs were integrated in the YNRCΔ9 locus on chromosome XIV of the yeast genome with ~600 bp 5’YNRCΔ9 and 3’YNRCΔ9 flanking regions as homology arms.
  • B Immunoblotting of Nif proteins from yeast carrying TEVp-based Nif polyprotein system. Ec Nif indicates protein samples prepared from E. coli cells carrying an operon-based nif system, in which each Nif protein is translated independently. Mit Isolations indicates protein samples prepared from mitochondrial extracts. HSP60 was used as an internal reference.
  • FIG. 4A-4B Growth curves of yeast strains carrying TEVp-based Nif polyprotein system.
  • Sc_535 is a yeast strain transformed with an empty vector (EV) carrying the Leu2 selection marker, used as a control.
  • Sc_410 and Sc_411 are yeast strains carrying constructs pNG410 and pNG411 indicated in Figure 3A, respectively. “Growth rate” represents the relative maximum growth rate of each strain. The maximum growth rate of strain Sc_535 was assigned as 100%.
  • For panel (A), 2% glucose was added initially.
  • For panel (B), 0.4% glucose was used for pre-growth and 1.6% galactose was used for inducing protein expression.
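The "relative maximum growth rate" normalization in this legend can be illustrated in Python. The text does not say how the maximum growth rate was computed, so this sketch assumes it is the largest slope of ln(OD) between consecutive readings; the function names and all OD values are hypothetical and are not data from the application.

```python
import math
from typing import Sequence


def max_growth_rate(times_h: Sequence[float], od: Sequence[float]) -> float:
    """Largest specific growth rate (per hour) between consecutive readings.

    Assumes the rate is the slope of ln(OD) over time; this is one common
    convention, not necessarily the one used by the applicants.
    """
    rates = []
    for i in range(len(times_h) - 1):
        delta_ln = math.log(od[i + 1]) - math.log(od[i])
        rates.append(delta_ln / (times_h[i + 1] - times_h[i]))
    return max(rates)


def relative_rate(strain_rate: float, control_rate: float) -> float:
    """Express a strain's rate as a percentage of the control (100%)."""
    return 100.0 * strain_rate / control_rate


# Hypothetical readings for a control strain and a test strain.
times = [0.0, 2.0, 4.0, 6.0]
control_od = [0.1, 0.2, 0.4, 0.5]
test_od = [0.1, 0.15, 0.225, 0.3]

control = max_growth_rate(times, control_od)
test = max_growth_rate(times, test_od)
print(f"control: {relative_rate(control, control):.1f}%")
print(f"test:    {relative_rate(test, control):.1f}%")
```

Normalizing to the empty-vector strain in this way makes growth penalties from different constructs directly comparable within one experiment.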
  • FIG. 5 Schematic diagram showing plasmid constructs and design procedure for testing the cleavage efficiency of reconstituted MPP in E. coli.
  • GFP-RFP fusions linked by MPP cleavage sites were expressed from the σ54-dependent PnifH promoter from Klebsiella oxytoca and activated by NifA constitutively expressed from the Ptet promoter.
  • Expression of the MPP α subunit (labeled with an HA-tag) and β subunit (labeled with a His-tag) from Saccharomyces cerevisiae W303-1a strain was controlled by the inducible Ptac/lacO promoter.
  • the diamond symbol represents the MPP processing site.
  • Figure 6A-6E Screening for minimal MPP processing sites.
  • A Sequences of the designed MPP processing sites. “AA” is short for amino acid. The symbol “ ⁇ ” represents the proposed processing site of MPP.
  • Panels (B), (C), (D), and (E) show immunoblotting assays to determine the cleavage efficiency of the various MPP processing sites listed in (A).
  • the symbol “-” indicates the absence of the MPP expression module; “+” indicates the presence of the MPP expression module induced with 100 μM IPTG.
  • Figure 7 Assessment of the cleavage efficiency of MPP processing sites in mitochondria of S. cerevisiae W303-1a strain.
  • the “2×” symbol indicates that tandem MPP processing sites were used.
  • HSP60 was used as an internal reference for mitochondrial specificity.
  • FIG. 8A-8E Immunoblotting of S10 and S10S linked Nif polyproteins expressed in E. coli.
  • a single MPP-based giant gene was co-transformed with a plasmid containing the remaining nif genes in the operon-based system, and the cells were grown under nitrogen-fixing conditions.
  • Samples were immediately collected after the acetylene reduction assay (ARA) and subjected to immunoblotting with Nif protein specific antibodies.
  • “2×S10” indicates that dual S10 sites (RGGGRRAFHTRGGGRRAFHT) were used and “2×S10S” indicates that dual S10S sites (RGGGRRAFSTRGGGRRAFST) were used.
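The dual sites quoted in this legend are two copies of one processing site placed in tandem, which a few lines of Python can make explicit. The single-copy sequences below are inferred by halving the quoted dual sequences, and the helper name `tandem_linker` is hypothetical.

```python
from typing import Sequence

# Single-copy sites, inferred from the dual sequences quoted in the legend.
S10 = "RGGGRRAFHT"
S10S = "RGGGRRAFST"


def tandem_linker(sites: Sequence[str]) -> str:
    """Join one or more MPP cleavable sequences head to tail.

    The sites may be identical (e.g. a dual S10 linker) or different.
    """
    if not sites:
        raise ValueError("a linker needs at least one cleavable sequence")
    return "".join(sites)


dual_s10 = tandem_linker([S10, S10])
dual_s10s = tandem_linker([S10S, S10S])
print(dual_s10)
print(dual_s10s)
```

Passing a mixed list such as `[S10, S10S]` would model a linker whose cleavable sequences differ, which the text also allows.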
  • Ec Nif indicates protein samples prepared from E.
  • FIG. 9A-9B Nitrogenase activity of MPP-based polyprotein system.
  • A Schematic diagram showing the gene arrangement of the TEVp-based polyprotein system (Ver 1.0) and the MPP-based polyprotein systems (Ver 2.0 and Ver 2.1). Separate symbols are used to represent dual TEVp sites, dual MPP S10 sites and dual S10S sites, respectively. In each case, ARA activities were normalized to the activity exhibited by the TEVp-based polyprotein system (Ver 1.0), which was assigned as 100%. Error bars indicate the SD observed from at least two biological replicates.
  • B Diazotrophic growth promoted by TEVp-based and MPP-based polyprotein systems in E. coli. WT represents the reconstituted operon-based nif system. EV represents the empty vector (pBDS1549) used as a negative control. Ver 1.0 and Ver 2.0 represent the TEVp-based and MPP-based polyprotein systems shown in (A), respectively.
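The normalization described for panel (A), with Ver 1.0 set to 100% and error bars as SD, can be sketched as follows. The replicate values are hypothetical, the function name is an assumption, and the use of the sample standard deviation (`statistics.stdev`) is a choice made here, since the text does not say which SD estimator was used.

```python
from statistics import mean, stdev
from typing import Dict, Sequence, Tuple


def normalize_to_reference(
    activities: Dict[str, Sequence[float]], reference: str
) -> Dict[str, Tuple[float, float]]:
    """Return (mean %, SD %) per system, relative to the reference mean.

    Each system needs at least two replicates for the sample SD.
    """
    reference_mean = mean(activities[reference])
    normalized = {}
    for name, replicates in activities.items():
        normalized[name] = (
            100.0 * mean(replicates) / reference_mean,
            100.0 * stdev(replicates) / reference_mean,
        )
    return normalized


# Hypothetical acetylene reduction readings, two biological replicates each.
readings = {
    "Ver 1.0": [10.0, 12.0],
    "Ver 2.0": [20.0, 24.0],
}
result = normalize_to_reference(readings, "Ver 1.0")
for name, (pct, sd) in result.items():
    print(f"{name}: {pct:.1f}% (SD {sd:.1f}%)")
```

Scaling the SD by the same reference mean keeps the error bars in the same percentage units as the plotted activities.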
  • FIG. 10 Efficient cleavage of polyprotein by MPP in yeast mitochondria.
  • the polyprotein was expressed from P ScGAL1 promoter and targeted to mitochondria of yeast strain Sc_3682 with the help of Su9 signal peptide.
  • Ec Nif indicates protein samples prepared from E. coli cells carrying an operon-based nif system; Sc_535 is a yeast strain transformed with the empty vector pBDS535, which was used as a negative control.
  • Mit Isolations indicates protein samples prepared from mitochondrial extracts. Image J software was used for protein quantification, and relative expression levels are shown in red (in parentheses below each lane, displayed as a percentage relative to Ec Nif) .
  • Figure 11A-11B Assessment of MPP-based Nif polyprotein expression in yeast mitochondria.
  • A Schematic diagram showing parts used to create constructs pBDS3752 and pBDS3942. The only difference between these two constructs is the presence of promoter variants in pBDS3942 highlighted in red. Giant genes are highlighted in blue and the symbols and are used to represent dual S10 sites and dual S10S sites, respectively. The selection marker and integration site are the same as described in Figure 3A.
  • C Immunoblotting of yeast strains carrying MPP-based Nif polyprotein systems. One representative protein from each polyprotein was selected for the immunoblot assay. HSP60 was used as an internal reference.
  • Figure 12A-12C Comparative growth analysis of yeast strains carrying MPP-based and TEV-based Nif polyprotein systems.
  • A growth curves of strains grown in YPD (initial concentration of glucose was 2%) .
  • B growth curves of strains grown in YPDG (initial concentrations of glucose and galactose were 0.4% and 1.6%, respectively) .
  • C Table showing the strain information and the maximum growth rate of each strain in panels (A) and (B) .
  • a. The maximum growth rate of the Sc_535 strain was assigned as 100%.
  • b. Indicates SD values lower than 0.5.
  • Figure 13 Assembly of violacein biosynthesis and isobutanol biosynthetic pathways for expression in S. cerevisiae mitochondria. Symbol “ ⁇ ” is used to represent dual MPP S10 site.
  • Figure 14A-14C Construction of the violacein biosynthesis pathway in mitochondria of S. cerevisiae S288C strain using the MPP-based polyprotein strategy.
  • A Schematic diagram showing violacein biosynthesis pathway and gene arrangement in polyproteins.
  • B Petri dish experiment displaying the violet pigment biosynthesized by MPP-based Vio polyprotein in yeast mitochondria.
  • I yeast strain transformed with the empty vector (Sc_305) .
  • II yeast strain transformed with MPP-based vioA ⁇ B ⁇ E and vioD ⁇ C polyprotein system carrying dual S10 sites (Sc_297) , in which five violacein biosynthesis genes were assembled as two polyproteins encoded by giant genes.
  • III yeast strain transformed with MPP-based vioA ⁇ B ⁇ E and vioD ⁇ C polyprotein system carrying non-cleavable S10 site variants (Sc_299) .
  • the symbol represents dual S10 site variants, in which the arginine residues at positions -2 and -3 were replaced by two alanine residues (RGGGAAAFSTRGGGAAAFST) , to provide a negative control
  • IV yeast strain transformed with MPP-based vioA ⁇ B ⁇ E ⁇ D ⁇ C polyprotein system with tandem S10 sites (Sc_300) , in which five violacein biosynthesis genes were assembled as a giant gene encoding a single polyprotein.
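  • The negative-control variant described above replaces the arginines at positions -2 and -3 with alanines. The sketch below reproduces that substitution. It assumes the scissile bond lies between the alanine and the phenylalanine of the site, which is inferred from the stated positions of the two arginines, and it starts from the RGGGRRAFST site because the variant sequence printed in the legend ends in FST. The function name and its parameters are illustrative.

```python
def mutate_upstream(site, cleavage_index, positions=(-2, -3), new_residue="A"):
    """Replace residues at positions counted relative to the cleaved bond.

    `cleavage_index` is the 0-based index of the first residue downstream of
    the cleaved bond, so position -1 is the residue immediately upstream.
    """
    residues = list(site)
    for pos in positions:
        index = cleavage_index + pos
        if not 0 <= index < len(residues):
            raise IndexError(f"position {pos} falls outside the site")
        residues[index] = new_residue
    return "".join(residues)


SITE = "RGGGRRAFST"
CLEAVAGE_INDEX = 7  # assumed: bond between A (index 6) and F (index 7)

if __name__ == "__main__":
    variant = mutate_upstream(SITE, CLEAVAGE_INDEX)
    print(variant * 2)  # dual non-cleavable variant
```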
  • Figure 15A-15B Construction of the isobutanol biosynthesis pathway in mitochondria of S. cerevisiae S288C strain using the MPP-based polyprotein strategy.
  • A Schematic diagram showing genes in isobutanol biosynthesis pathway and the combinatorial strategy for constructing polyproteins to optimize isobutanol biosynthesis in yeast mitochondria.
  • B Isobutanol production from MPP-based polyprotein combinations shown in (A) .
  • EV indicates empty vector, which was used to assess the native level of isobutanol synthesized in yeast. Polyprotein combinations with the highest isobutanol production level are shown in red (4a and 6b combinations) .
  • the “fusion and cut” feature of polyprotein or fusion protein strategy enables co-expression of protein components in prokaryotes, eukaryotes or cell-free systems at stoichiometric levels, thus enabling the engineering of protein complexes and intricate biochemical pathways requiring balanced gene expression.
  • This strategy also dramatically reduces the number of expression parts for synthetic biology, thus decreasing the overall DNA cargo load and the requirement to diversify promoters and terminators in order to avoid homologous recombination.
  • Although the complexity of eukaryotic gene regulation and the consequent DNA burden can be reduced through the deployment of minimal orthologous synthetic promoters and terminators, there is a lack of well-characterized expression parts in higher eukaryotes, especially in plants.
  • the polyprotein or fusion protein strategy therefore has significant advantages for engineering complex biological pathways in eukaryotes, since it not only provides balanced protein expression, but also decreases the combinatorial complexity when selecting suitable expression parts for engineering multiple protein-coding sequences.
  • the present invention provides an MPP-based method for efficiently co-expressing multiple proteins or their subunits, whether in prokaryotes, eukaryotes or cell-free systems.
  • This method relies on a linker comprising an MPP cleavable region, which is used to link two or more proteins into fusion proteins.
  • the MPP cleavable region can be recognized and cleaved in the presence of an MPP, leading to the release of individual proteins within fusion proteins. Therefore, this method can be applied to protein co-expression in mitochondria, where an intrinsic MPP exists. Alternatively, this method can be applied to other cellular or non-cellular environments, where fusion proteins are co-expressed with an exogenous MPP.
  • the strategy disclosed herein could also be generalized to other co-expression methods involving a specific processing peptidase of organelles, such as chloroplast, endoplasmic reticulum (ER) and vacuole, since all these organelles utilize a specific processing peptidase for signal peptide cleavage (Owji, H., Nezafat, N., Negahdaripour, M., Hajiebrahimi, A. and Ghasemi, Y. (2018) A comprehensive review of signal peptides: Structure, roles, and applications. Eur. J. Cell Biol. 97 (6) : 422-441) .
  • nucleic acid refers to any compound and/or substance that is or can be incorporated into an oligonucleotide chain.
  • a nucleic acid is a compound and/or substance that is or can be incorporated into an oligonucleotide chain via a phosphodiester linkage.
  • nucleic acid refers to an individual nucleic acid residue (e.g., a nucleotide and/or nucleoside) ; in some embodiments, “nucleic acid” refers to an oligonucleotide chain comprising individual nucleic acid residues.
  • a “nucleic acid” may be, for example, double-stranded, partially double-stranded, or single-stranded. As single-stranded nucleic acid, the nucleic acid may be the sense or antisense strand. A “nucleic acid” may be circular or linear. As used herein, the term “nucleic acid” encompasses DNA and RNA, including genomes, pre-mRNA, mRNA, cDNA, recombinant or synthetic nucleic acids including vectors. In some embodiments, a “nucleic acid” is or comprises RNA; in some embodiments, a “nucleic acid” is or comprises DNA. In some embodiments, a nucleic acid is, comprises, or consists of one or more natural nucleic acid residues. In some embodiments, a nucleic acid is, comprises, or consists of one or more nucleic acid analogs.
  • coding sequence means a polynucleotide encoding the amino acid sequence of a protein or polypeptide.
  • the boundaries of a coding sequence are generally determined by an open reading frame, which begins with a start codon (such as ATG, GTG or TTG) and ends with a stop codon (such as TAA, TAG or TGA) .
  • the coding sequence may be derived from genomic DNA, synthetic DNA, or a combination thereof. A skilled person in the art will recognize that due to the degeneracy of the genetic code, several nucleic acids may encode polypeptides having the same amino acid sequence.
  • a codon preference table suitable for the target host cell may be used to modify the codons in the coding sequence of the protein to obtain optimal expression in a particular host cell, such as a prokaryotic cell or a eukaryotic cell. Codon preferences in various hosts (for example, in E. coli, yeast, Arabidopsis, tobacco, maize, insect, mouse, rat, human, etc. ) are well-known in the art.
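  • As an illustration of the open reading frame boundaries described above, the following minimal sketch checks whether a DNA string starts with one of the listed start codons, ends with one of the listed stop codons, and contains no in-frame stop codon in between. The function name is an assumption, and the check ignores real-world details such as alternative genetic codes.

```python
START_CODONS = {"ATG", "GTG", "TTG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_open_reading_frame(seq):
    """Return True if `seq` is a complete in-frame ORF under the listed codons."""
    seq = seq.upper()
    if len(seq) < 6 or len(seq) % 3 != 0:
        return False
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    if codons[0] not in START_CODONS or codons[-1] not in STOP_CODONS:
        return False
    # No premature in-frame stop codon between the start and the final stop.
    return not any(codon in STOP_CODONS for codon in codons[1:-1])


if __name__ == "__main__":
    for example in ("ATGGCTTAA", "ATGTAAGCTTAA", "ATGGCT"):
        print(example, is_open_reading_frame(example))
```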
  • protein and polypeptide are used interchangeably herein and generally refer to polymers of amino acid residues linked by peptide bonds, which have a specific function and/or an independent three-dimensional structure.
  • Protein and polypeptide encompass full-length proteins and fragments thereof. The term also includes post-expression modifications of protein or polypeptide, such as glycosylation, acetylation, phosphorylation, and the like.
  • protein or polypeptide can be further divided into functional subunits or functional fragments that have independent and distinct functions. Therefore, these functional subunits or functional fragments can also be deemed as “proteins” or “polypeptides” and can be expressed independently.
  • protein and polypeptide also refer to variants obtained after modification, such as deletion, addition, insertion, and substitution (such as conservative amino acid substitutions) , of the amino acid sequence of a wild type protein or polypeptide.
  • wild type refers to a nucleic acid, an amino acid sequence or a protein that naturally occurs in an organism.
  • a variant has at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99% sequence identity to a wild type nucleic acid, a wild type amino acid sequence or a wild type protein, provided that the variant retains the original function or activity of the wild type nucleic acid, the wild type amino acid sequence or the wild type protein.
  • Percent identity between two sequences is a function of the number of identical positions shared by the two sequences being compared, taking into account the number of gaps, and the length of each gap, which needs to be introduced for optimal alignment of the two sequences. Comparison of sequences and determination of percent identity between two sequences can be accomplished using a mathematical algorithm. For example, percent identity between two nucleotide or amino acid sequences can be determined using the algorithm of Meyers and Miller (CABIOS, 1989, 4: 11-17, which is herein incorporated by reference in its entirety) , which has been incorporated into the ALIGN program (version 2.0) . Other examples of such mathematical algorithms include the algorithm of Myers and Miller (1988) CABIOS 4: 11-17, the local homology algorithm of Smith et al. (1981) Adv.
  • Such programs include, but are not limited to, CLUSTAL of the PC/Gene program, ALIGN program (Version 2.0) , and GAP, BESTFIT, BLAST, FASTA, and TFASTA of the Wisconsin Genetics software package. Alignment using these programs may be performed, for example, by using initial parameters.
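  • To make the notion of percent identity concrete, the sketch below computes it for two sequences that have already been aligned, with “-” marking gaps. It is not an implementation of any of the alignment algorithms or programs named above. Dividing by the number of alignment columns is one common convention and is an assumption here, since tools differ in their choice of denominator.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of alignment columns that hold the same residue in both sequences.

    Both inputs must be pre-aligned strings of equal length. Columns that are
    gaps in both sequences are ignored; a gap opposite a residue is a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("alignment contains no comparable columns")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


if __name__ == "__main__":
    print(percent_identity("RGGGRRAFHT", "RGGGRRAFST"))
    print(percent_identity("RGG-RRAFHT", "RGGGRRAFHT"))
```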
  • expression refers to the transcription and/or translation of a gene so that a nucleic acid chain or an amino acid chain is synthesized.
  • a protein is “expressed” or “to be expressed” in the cytoplasm or an organelle of a cell means that the protein is finally located at the corresponding place and functions there.
  • co-express , “co-expressing” and “co-expression” can be used interchangeably herein and mean that two or more proteins or their subunits are expressed or required to be expressed simultaneously in a cell or a cell-free system.
  • the co-expressed proteins can be located at the same subcellular compartment, such as a cytoplasm or an organelle (e.g., mitochondrion, chloroplast, endoplasmic reticulum (ER) , Golgi, and vacuole) .
  • the co-expressed proteins can be located at different subcellular compartments, such as cytoplasm and/or organelle (s) (e.g., mitochondrion, chloroplast, endoplasmic reticulum (ER) , Golgi, and vacuole) .
  • fusion protein and “polyprotein” can be used interchangeably herein.
  • fusion protein or “polyprotein” refers to a polymer of individual proteins in which the C-terminal of an upstream protein is linked with the N-terminal of a downstream protein by a stretch of amino acid residues.
  • the coding sequence of one fusion protein or one polyprotein which is composed of two or more individual proteins will be co-transcribed into one polycistronic mRNA and then co-translated into one polypeptide chain.
  • the fusion protein or the polyprotein functions as a whole without further cleavage by a protease or a peptidase.
  • the fusion protein or the polyprotein undergoes post-translational cleavage by a protease or a peptidase at the region between an upstream protein and a downstream protein, releasing at least one of the individual proteins.
  • the fusion protein or the polyprotein comprises at least two copies of an individual protein.
  • protein A, protein B and protein C can be configured to a fusion protein as A-B-C, A-A-B-C, A-A-B-B-C and the like.
  • protease and peptidase can be used interchangeably herein and refer to a class of enzymes that are capable of recognizing a specific protein or peptide and breaking the amino acid sequence of that protein or peptide.
  • a protein or a peptide is “cleaved” or “processed” by a protease or a peptidase means that the amino acid sequence of the protein or the peptide is split by the protease or the peptidase into at least two parts.
  • protease or peptidases having the above functions are within the scope of the invention.
  • a “signal peptide” is a peptide present on proteins that are destined either to be secreted or to be targeted to a specific location of a cell, such as cell membrane, mitochondrion, chloroplast, endoplasmic reticulum (ER) , Golgi and vacuole, etc.
  • a “signal peptide” may occur at the N-terminal or C-terminal of a protein, or even inside the amino acid sequence of a protein.
  • As used herein, the terms “mitochondrial signal peptide” , “mitochondrial targeting peptide” , “mitochondrial leader peptide” and “presequence” can be used interchangeably and refer to a signal peptide present on a protein that is capable of targeting this protein into mitochondria. Most of the mitochondrial proteins are encoded by nuclear genes. These proteins are transcribed and translated outside the mitochondria, and enter mitochondria with the help of mitochondrial signal peptides. Some of the mitochondrial signal peptides can be found in protein annotations disclosed in protein databases, such as UniProt. Usually, mitochondrial signal peptides are removed from proteins after these proteins enter mitochondria.
  • MPP Mitochondrial Processing Peptidase
  • MPPβ is the catalytic subunit of MPP with a conserved Zn-binding motif.
  • In yeast and mammals, MPPs are free in the matrix of mitochondria, while in plants, MPPs are integrated into the cytochrome bc1 complex of the mitochondrial respiratory chain.
  • the term “MPP cleavable sequence” refers to an amino acid sequence unit that can be recognized and cleaved by MPP.
  • the term “MPP cleavable region” refers to an amino acid sequence comprising one or more MPP cleavable sequences, wherein the one or more MPP cleavable sequences are linked with each other directly by a peptide bond or by a flexible peptide, such as (GGS) n, (GGGS) n or (GGGGS) n in which n is 1-5.
  • the MPP cleavable sequences within one MPP cleavable region are the same or different.
  • the MPP cleavable sequence is at least 8, 9, 10, 11, 12, 13, 14, 15, 20, 25 or 30 amino acids in length. In some embodiments of the present invention, the MPP cleavable sequence is up to 30, 35, 40, 45, 50, 55, 60, 70, 80, 90 or 100 amino acids in length. In some embodiments of the present invention, the MPP cleavable sequence comprises at least 4, 5, 6, 7, 8, 9 or 10 amino acids upstream of a site at which a peptide bond is broken by MPP. In some embodiments of the present invention, the MPP cleavable sequence comprises at least 2 or 3 arginine residues upstream of a site at which a peptide bond is broken by MPP.
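  • The length and composition limits listed above can be expressed as simple checks. The sketch below uses the loosest thresholds from each list (at least 8 and up to 100 residues, at least 4 residues and at least 2 arginines upstream of the cleaved bond); the function name, the parameter names and the cleavage position used in the example are assumptions for illustration only.

```python
def check_cleavable_sequence(seq, cleavage_index, min_len=8, max_len=100,
                             min_upstream=4, min_upstream_arg=2):
    """Check a candidate sequence against the listed embodiment limits.

    `cleavage_index` is the 0-based index of the first residue downstream of
    the bond broken by MPP, so `seq[:cleavage_index]` is the upstream part.
    """
    upstream = seq[:cleavage_index]
    return {
        "length_ok": min_len <= len(seq) <= max_len,
        "upstream_ok": len(upstream) >= min_upstream,
        "arginine_ok": upstream.count("R") >= min_upstream_arg,
    }


if __name__ == "__main__":
    # Cleavage index 7 (between A and F) is an assumed position.
    print(check_cleavable_sequence("RGGGRRAFHT", 7))
    print(check_cleavable_sequence("RGGGAAAFST", 7))
```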
  • linker refers to an amino acid chain that links an upstream protein and a downstream protein via a peptide bond, i.e., the C-terminal of the upstream protein is linked to the N-terminal of the linker and the C-terminal of the linker is linked to the N-terminal of the downstream protein both via a peptide bond.
  • linker can be a flexible amino acid chain or a flexible peptide, such as (GGS) n, (GGGS) n or (GGGGS) n in which n is 1-5.
  • linker can be cleaved by a protease or a peptidase via breaking a peptide bond at one or more specific sites within the linker, thus the upstream protein and the downstream protein connected by the linker will be separated from each other.
  • linker comprises one or more protease or peptidase cleavable sequences, the protease or peptidase may be selected from the group consisting of thrombin, Factor Xa, enterokinase, Tobacco Etch Virus (TEV) protease, PreScission, HRV 3C protease, MPP, Stromal Processing Peptidase (SPP, responsible for recognizing and processing a signal peptide in chloroplast) and Signal Peptidase Complex (SPC, responsible for recognizing and processing a signal peptide in ER) .
  • linker comprises one or more MPP cleavable sequences as described above.
  • linker comprises an MPP cleavable region.
  • linker comprises an MPP cleavable region and a flexible amino acid sequence located at N-terminal and/or C-terminal of the linker, such as (GGS) n, (GGGS) n or (GGGGS) n in which n is 1-5.
  • linker is an MPP cleavable region.
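  • The linker layouts described above, an MPP cleavable region with an optional flexible peptide at either end, can be assembled as in the following sketch. The helper names are assumptions; the repeat units and the range of n are those given in the text.

```python
FLEXIBLE_UNITS = {"GGS", "GGGS", "GGGGS"}


def flexible_peptide(unit, n):
    """Return (unit)n for one of the listed flexible repeat units, n = 1-5."""
    if unit not in FLEXIBLE_UNITS:
        raise ValueError(f"unsupported repeat unit: {unit}")
    if not 1 <= n <= 5:
        raise ValueError("n must be between 1 and 5")
    return unit * n


def build_linker(cleavable_region, n_flank="", c_flank=""):
    """Join optional N- and C-terminal flexible flanks around a cleavable region."""
    return n_flank + cleavable_region + c_flank


if __name__ == "__main__":
    region = "RGGGRRAFHT" * 2  # dual S10 site
    print(build_linker(region))                                       # linker is the region itself
    print(build_linker(region, n_flank=flexible_peptide("GGS", 1)))   # flank at the N-terminal end
```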
  • EC expression cassette
  • EC refers to a nucleic acid or a polynucleotide that contains all the elements required for efficiently expressing a protein.
  • EC comprises a promoter, a Ribosome Binding Site (RBS) , a coding sequence and a terminator.
  • EC further comprises at least one of the elements selected from the group consisting of an enhancer, an intron, a 5’-UTR, a 3’-UTR and a poly (A) tail.
  • a promoter can be a native promoter or a synthetic promoter. In some embodiments, a promoter can be a constitutive promoter, an inducible promoter, and/or a tissue-specific promoter. The promoter may be a promoter commonly used in eukaryotic expression systems or a promoter used in prokaryotic expression systems.
  • promoters used in eukaryotic expression systems include, but are not limited to, CMV promoter (Cytomegalovirus promoter) , SV40 promoter (Simian virus 40 promoter) , PGK promoter (phosphoglycerate kinase promoter) , EF1α promoter (elongation factor 1-alpha promoter) , β-actin promoter, Ubc promoter (human ubiquitin C gene-derived promoter) , CAG promoter (hybrid mammalian promoter) , TRE promoter (tetracycline response element promoter) , UAS promoter (Drosophila promoter with Gal4 binding site) , Ac5 promoter (Drosophila actin 5c gene-derived insect promoter) , CaMKIIa promoter (Ca2+/calmodulin-dependent protein kinase II promoter) , GAL1 promoter (yeast galactokinase promoter) , GAL1 and GAL
  • promoters used in prokaryotic expression systems include, but are not limited to, T7 promoter (T7 phage-derived promoter) , T7lac promoter (T7 phage-derived promoter plus lac operator) , Sp6 promoter (Sp6 phage-derived promoter) , araBAD promoter (arabinose metabolism operon-derived promoter) , trp promoter (tryptophan operon-derived promoter) , lac promoter (lac operon-derived promoter) , Ptac promoter (a hybrid promoter of the lac promoter and the trp promoter) , Ptac/lacO promoter (Ptac promoter plus lac operator) and pL promoter (Lambda phage-derived promoter) .
  • vector refers to a nucleic acid or a polynucleotide that is capable of carrying an expression cassette of a gene of interest and facilitating the expression of a protein.
  • Vector can be a linear or a circular DNA or RNA with either single strand or double strands.
  • vector is free in cytoplasm of a cell after entering into the cell.
  • vector is integrated into the genome of a cell after entering into the cell.
  • vector further carries a selection marker, such as an antibiotic resistance gene and/or a fluorescence reporter gene.
  • the vector may be, for example, a vector derived from a bacterial plasmid, a viral vector, a vector derived from a yeast plasmid, a vector derived from a phage, a cosmid, a phagemid, or the like.
  • a reference to “A and/or B” can refer, in one embodiment, to A only (optionally including elements other than B) ; in another embodiment, to B only (optionally including elements other than A) ; in yet another embodiment, to both A and B (optionally including other elements) ; etc.
  • the “fusion and cut” feature of polyprotein or fusion protein strategy enables co-expression of protein components in prokaryotes, eukaryotes or cell-free systems at stoichiometric levels, thus enabling the engineering of protein complexes and intricate biochemical pathways requiring balanced gene expression.
  • the two or more proteins to be co-expressed can be linked as a fusion protein by a linker that is recognized and cleaved by an enzyme, such as a protease or peptidase.
  • the protease or peptidase may be an endogenous enzyme that naturally exists in a host cell or subcellular compartment, where the proteins of interest are expressed.
  • protease or peptidase may be an exogenous enzyme that is artificially introduced into a host cell or subcellular compartment, where the proteins of interest are expressed.
  • protease or peptidase include, but are not limited to thrombin, Factor Xa, enterokinase, Tobacco Etch Virus (TEV) protease, PreScission, HRV 3C protease, MPP, Stromal Processing Peptidase (SPP, responsible for recognizing and processing a signal peptide in chloroplast) and Signal Peptidase Complex (SPC, responsible for recognizing and processing a signal peptide in ER) .
  • an MPP cleavable sequence can be used to convert multiple proteins into fusion proteins, facilitating the co-expression of multiple proteins or their subunits in mitochondria, cytoplasm or cell-free systems.
  • the fusion proteins can be cleaved by an endogenous MPP located in mitochondria and the individual protein will be released and become functional proteins.
  • this method can be applied to other cellular or non-cellular environments, where the fusion proteins are co-expressed with an exogenous MPP. Precise cleavage of fusion proteins by MPP and efficient co-expression of individual proteins can be achieved in either mitochondria or cytoplasm, whether in prokaryotes or eukaryotes.
  • This MPP-based method achieves functional co-expression of multiple proteins with a high level of protein accumulation.
  • the invention relates to a method for co-expressing two or more proteins in a cell or a cell-free system, comprising expressing the two or more proteins as a fusion protein in which at least two adjacent proteins are linked by a linker comprising an MPP cleavable region, wherein cleavage of the MPP cleavable region by an MPP in the cell or the cell-free system results in release of the proteins linked by the linker.
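  • The “fusion and cut” logic of this method can be illustrated with a toy string model: proteins are joined through a linker, and cleavage is simulated by cutting inside every copy of the cleavable sequence. The protein sequences below are short hypothetical placeholders, the cleavage position (after the seventh residue of RGGGRRAFHT) is an assumption, and real MPP processing is not expected to be as deterministic as this model.

```python
import re


def build_fusion(proteins, linker):
    """Join protein sequences head to tail with the same linker between each pair."""
    return linker.join(proteins)


def cleave(fusion, site, cleavage_index):
    """Cut after the first `cleavage_index` residues of every occurrence of `site`."""
    fragments, start = [], 0
    for match in re.finditer(re.escape(site), fusion):
        cut = match.start() + cleavage_index
        fragments.append(fusion[start:cut])
        start = cut
    fragments.append(fusion[start:])
    return fragments


SITE = "RGGGRRAFHT"
PROTEINS = ["MAAAA", "MKKKK"]  # hypothetical placeholder sequences

if __name__ == "__main__":
    fusion = build_fusion(PROTEINS, SITE * 2)  # dual site as the linker
    for fragment in cleave(fusion, SITE, 7):
        print(fragment)
```

  • In this toy model each released fragment keeps a few linker residues at its ends, and a dual site also yields one short linker-only fragment.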
  • each of the proteins to be co-expressed is linked to the adjacent protein (s) in the fusion protein by a linker comprising an MPP cleavable region or an uncleavable linker, and wherein the linkers or the MPP cleavable regions between adjacent proteins are the same or different.
  • the cell co-expressing one or more proteins endogenously expresses an MPP.
  • the cell is a eukaryotic cell and the MPP is located at mitochondria of the cell, wherein the fusion protein is located at mitochondria of the cell.
  • the cell co-expressing one or more proteins exogenously expresses an MPP.
  • the cell is a prokaryotic cell or a eukaryotic cell and the MPP is located in the cytoplasm of the cell, and wherein the fusion protein is expressed in the cytoplasm of the cell.
  • the prokaryotic cell is a cell of genera Acetobacter, Azotobacter, Bacillus, Enterobacter, Escherichia, Klebsiella, Salmonella or Pseudomonas. In some preferred embodiments, the prokaryotic cell is a cell of Escherichia coli, Azotobacter vinelandii, Klebsiella oxytoca, Klebsiella pneumoniae, Pseudomonas fluorescens, Bacillus subtilis, Pseudomonas protegens, Pseudomonas putida, Pseudomonas veronii, Pseudomonas taetrolens, Pseudomonas balearica, Pseudomonas stutzeri, Pseudomonas aeruginosa, Pseudomonas syringae, Bacillus amyloliquefaciens, Burk
  • the eukaryotic cell is selected from the group consisting of a fungal cell, an algae cell, an animal cell or a plant cell.
  • the fungal cell is a yeast cell.
  • the yeast is selected from the group consisting of Saccharomyces, Aspergillus, Kluyveromyces, Schizosaccharomyces, Rhizopus, Candida, Yarrowia and Pichia, such as Saccharomyces cerevisiae, Schizosaccharomyces pombe, Kluyveromyces lactis, Kluyveromyces marxianus, Aspergillus terreus, Aspergillus niger, Rhizopus arrhizus, Rhizopus oryzae, Candida albicans, Candida boidinii, Candida sonorensis, Candida tropicalis, Yarrowia lipolytica or Pichia pastoris.
  • the yeast is Saccharomyces cerevisiae. In some preferred embodiments, the yeast is Saccharomyces cerevisiae W303-1a strain. In some preferred embodiments, the yeast is Saccharomyces cerevisiae S288C strain.
  • the algae is selected from the group consisting of Chlorella, Spirulina, Chlamydomonas, Dunaliella, Chaetoceros and Porphyridium, such as Dunaliella tertiolecta, Porphyridium sp., Dunaliella parva, Chlorella pyrenoidosa, Chlamydomonas reinhardtii or Chaetoceros muelleri.
  • the animal is selected from the group consisting of mouse, rat, chicken, rabbit, goat, donkey, monkey, pig, sheep and human.
  • the plant is selected from the group consisting of arabidopsis, tobacco, barley, rice, maize, wheat, sorghum, sweet corn, sugar cane, onions, tomatoes, strawberries and asparagus.
  • the plant is selected from the group consisting of Arabidopsis thaliana, Nicotiana tabacum, Oryza sativa, Triticum aestivum, Zea mays, Sorghum bicolor, Setaria italica, Solanum tuberosum, Ipomoea batatas, Arachis hypogaea, Brassica napus, Malva parviflora, Sesamum indicum, Olea europaea, Elaeis guineensis, Saccharum officinarum, Beta vulgaris and Gossypium spp.
  • the cell-free system co-expressing one or more proteins comprises an effective amount of the MPP capable of cleaving the fusion protein at the MPP cleavable region.
  • MPP is derived from yeast, arabidopsis, tobacco, rice or algae, such as Saccharomyces cerevisiae, Arabidopsis thaliana, Nicotiana tabacum, Oryza sativa or Chlamydomonas reinhardtii.
  • MPP is derived from Saccharomyces cerevisiae.
  • MPP is derived from Saccharomyces cerevisiae W303-1a strain.
  • MPP is derived from Saccharomyces cerevisiae S288C strain.
  • the linker or the MPP cleavable region comprises one or more MPP cleavable sequences. In some embodiments, the linker or the MPP cleavable region comprises two or more MPP cleavable sequences arranged in tandem, and wherein the MPP cleavable sequences within the same linker or MPP cleavable region are the same or different.
  • each of the MPP cleavable sequences independently comprises a wild type amino acid sequence, a fragment or a variant thereof that is cleavable by MPP, or an artificially designed amino acid sequence that is cleavable by MPP.
  • the wild type amino acid sequence is a wild type mitochondrial signal peptide which is responsible for targeting a mitochondrial endogenous protein into mitochondria.
  • the mitochondrial signal peptide is derived from a fungal cell, an algae cell, an animal cell or a plant cell.
  • the mitochondrial signal peptide is derived from a mitochondrial endogenous protein selected from the group consisting of: the subunit 9 of the F0 ATPase of Neurospora crassa, COX4 of Saccharomyces cerevisiae, HSP60/YLR259c of Saccharomyces cerevisiae, SSC1/YJR045c of Saccharomyces cerevisiae, CYB2/YML054C of Saccharomyces cerevisiae, mitochondrial ATP synthase of Nicotiana plumbaginifolia (encoded by atp2-1 gene) , mitochondrial import receptor subunit TOM20 of Arabidopsis thaliana and 2-oxoglutarate dehydrogenase subunit E1 of Arabidopsis thaliana.
  • the MPP cleavable sequence is at least 8, 9, 10, 11, 12, 13, 14, 15, 20, 25 or 30 amino acids in length. In some embodiments, the MPP cleavable sequence is up to 30, 35, 40, 45, 50, 55, 60, 70, 80, 90 or 100 amino acids in length. In some embodiments, the MPP cleavable sequence comprises at least 4, 5, 6, 7, 8, 9 or 10 amino acids upstream of a site at which a peptide bond is broken by MPP. In some embodiments, the MPP cleavable sequence comprises at least 2 or 3 arginine residues upstream of a site at which a peptide bond is broken by MPP.
  • the MPP cleavable sequence is derived from the subunit 9 of the F0 ATPase of Neurospora crassa. In some preferred embodiments, the MPP cleavable sequence comprises the amino acids of any one of SEQ ID NOs: 67-81.
  • MPP cleavable sequences used in a fusion protein are selected from any one of SEQ ID NOs: 67-81 or any combinations thereof.
  • one or more MPP cleavable sequences within one MPP cleavable region are linked by a peptide bond.
  • one or more MPP cleavable sequences within one MPP cleavable region are linked by a flexible peptide, such as (GGS) n, (GGGS) n or (GGGGS) n in which n is 1-5.
  • MPP cleavable sequences within one MPP cleavable region are the same or different.
  • linker comprises one or more MPP cleavable sequences as described above. In some embodiments, linker comprises an MPP cleavable region. In some embodiments, linker comprises an MPP cleavable region and a flexible amino acid sequence located at N-terminal and/or C-terminal of the linker, such as (GGS) n, (GGGS) n or (GGGGS) n in which n is 1-5. In some embodiments, linker is an MPP cleavable region.
  • the artificially designed amino acid sequence comprises a formula as follows:
  • X1 comprises RQAFQKRA or RQAFQRRA
  • X2 comprises YSS, FHT or FST.
  • X1 is RQAFQKRA
  • X2 is FHT or FST.
  • the artificially designed amino acid sequence is RQAFQKRAFHT (SEQ ID NO: 72) or RQAFQKRAFST (SEQ ID NO: 73) .
  • the artificially designed amino acid sequence comprises a formula as follows:
  • X3 comprises R (G) n KRA or R (G) n RRA
  • the artificially designed amino acid sequence is selected from the group consisting of RGGRRAFHT (SEQ ID NO: 78) , RGGGRRAFHT (SEQ ID NO: 79) , RGGRRAFST (SEQ ID NO: 81) or RGGGRRAFST (SEQ ID NO: 80) .
  • the “G” in the brackets of above formula can be replaced by any other amino acid residues.
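  • The formulas themselves are not reproduced in this text, so the sketch below rests on an assumption: that each designed sequence is the direct concatenation of X1 (or X3) followed by X2. Under that assumption the listed building blocks reproduce the sequences named above (SEQ ID NOs: 72, 73 and 78-81). The function names are illustrative, and n is limited here to the values 2 and 3 that appear in the listed sequences.

```python
X1_OPTIONS = ("RQAFQKRA", "RQAFQRRA")
X2_OPTIONS = ("YSS", "FHT", "FST")
X3_TAILS = ("KRA", "RRA")


def build_x3(n, tail):
    """Return R(G)n followed by KRA or RRA."""
    if tail not in X3_TAILS:
        raise ValueError(f"unsupported tail: {tail}")
    if n < 1:
        raise ValueError("n must be at least 1")
    return "R" + "G" * n + tail


def compose(upstream, x2):
    """Concatenate an upstream block (X1 or X3) with X2 (assumed formula)."""
    if x2 not in X2_OPTIONS:
        raise ValueError(f"unsupported X2: {x2}")
    return upstream + x2


if __name__ == "__main__":
    print(compose("RQAFQKRA", "FHT"))
    for n in (2, 3):
        for x2 in ("FHT", "FST"):
            print(compose(build_x3(n, "RRA"), x2))
```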
  • the two or more proteins to be co-expressed are functionally related, preferably the two or more proteins are involved in the same biosynthetic pathway or metabolic pathway, participating in the same signaling pathway, or belonging to the same complex biological system.
  • the complex biological system is nitrogenase system.
  • the two or more proteins are selected from the group consisting of NifH, NifD, NifK, NifY, NifE, NifN, NifB, NifU, NifS, NifV, NifM, NifJ, NifF, NifW, NifZ, NifT, NifX and NifQ.
  • the fusion protein comprises NifH-cleav-NifD-cleav-NifK, NifE-cleav-NifN ⁇ NifB, NifJ-cleav-NifV-cleav-NifW, NifF-cleav-NifM-cleav-NifY, and optionally, NifU-cleav-NifS, wherein “cleav” is the linker comprising the MPP cleavable region, and wherein “ ⁇ ” represents a flexible peptide, such as (GGS) n, (GGGS) n or (GGGGS) n in which n is 1-5. In some preferred embodiments, “ ⁇ ” represents “GGGGSGGGGSGGGGS” .
  • the biosynthetic pathway is the violacein biosynthetic pathway.
  • the two or more proteins are selected from the group consisting of VioA, VioB, VioC, VioD and VioE.
  • the fusion protein comprises VioA-cleav-VioB-cleav-VioE and VioD-cleav-VioC, or VioA-cleav-VioB-cleav-VioE-cleav-VioD-cleav-VioC, wherein “cleav” is the linker comprising the MPP cleavable region.
  • the biosynthetic pathway is the isobutanol biosynthetic pathway.
  • the two or more proteins are selected from the group consisting of ILV2, ILV3, ILV5, ARO10 and ADHA.
  • the fusion protein comprises:
  • cleav is the linker comprising the MPP cleavable region.
  • the two or more proteins to be co-expressed comprise fluorescent proteins, such as EGFP (enhanced green fluorescent protein) , ERFP (enhanced red fluorescent protein) , EBFP (enhanced blue fluorescent protein) , EYFP (enhanced yellow fluorescent protein) , ECFP (enhanced cyan fluorescent protein) , GFP (green fluorescent protein) , RFP (red fluorescent protein) , BFP (blue fluorescent protein) , YFP (yellow fluorescent protein) , CFP (cyan fluorescent protein) , FbFP (flavin mononucleotide-based fluorescent protein) , mCherry, dsRed, tdTomato and turbo-RFP.
  • the two or more proteins to be co-expressed comprise protein tags for use in protein purification or immunoblotting. These protein tags can be removed from fusion proteins after a purification step. Examples of protein tags include, but are not limited to, polyhistidine tag, GST tag, HA tag, FLAG tag, MBP tag, NusA tag, c-Myc tag and Strep tag.
  • the nucleic acid encoding a linker comprising an MPP cleavable region is connected in frame with the nucleic acid encoding an upstream protein and a downstream protein to obtain a coding sequence (CDS) of a fusion protein.
  • the nucleic acids encoding more than two proteins can be connected in the same manner as described above to obtain a CDS of a fusion protein carrying more individual proteins.
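  • The in-frame connection described above can be sketched as follows. This is a simplified illustration under stated assumptions: open reading frames are given as plain DNA strings, the stop codon of every upstream protein is removed so that translation continues through the linker, and the helper names `strip_stop` and `build_fusion_cds` are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def strip_stop(orf):
    """Remove a terminal stop codon so translation can read into the linker."""
    orf = orf.upper()
    return orf[:-3] if orf[-3:] in STOP_CODONS else orf


def build_fusion_cds(orfs, linker_dna):
    """Join ORFs in frame, placing linker_dna between each adjacent pair."""
    if len(orfs) < 2:
        raise ValueError("at least two ORFs are needed for a fusion protein")
    linker_dna = linker_dna.upper()
    # Only the last ORF keeps its stop codon.
    parts = [strip_stop(orf) for orf in orfs[:-1]] + [orfs[-1].upper()]
    for segment in parts + [linker_dna]:
        if len(segment) % 3 != 0:
            raise ValueError("segment length is not a multiple of 3; frame would shift")
    return linker_dna.join(parts)
```

  • The same call handles more than two proteins, because the linker is inserted between every adjacent pair. A real design would also need codon optimization and removal of unwanted restriction sites, which are outside the scope of this sketch.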
  • the CDS of a fusion protein is operably linked to an expression control sequence, such as a promoter.
  • a skilled person in the art will know whether a certain expression control sequence (such as a ribosome binding site, a terminator, an enhancer, an intron, a 5’-UTR, a 3’-UTR, a poly (A) tail, etc. ) is required to construct an expression cassette (EC) .
  • the EC will be ligated to a vector for expression.
  • these proteins can be divided into different groups, and the nucleic acids encoding proteins in each group will be connected into one CDS and subsequently constructed into one EC.
  • the multiple ECs can be inserted into one vector.
  • the multiple ECs can be inserted into multiple vectors to form a vector composition.
  • the vector or the vector composition can be introduced into a cell by transformation, transfection, electroporation or any other techniques well-known in the art.
  • the cells harboring vectors will be cultured under a condition that is suitable for protein expression.
  • the fusion proteins can be expressed in cells and then cleaved by either an endogenous MPP or an exogenous MPP to release individual proteins.
  • the EC encoding an exogenous MPP can be constructed on the same vector as fusion proteins, or on another vector.
  • a promoter can be a native promoter or a synthetic promoter.
  • a promoter can be a constitutive promoter, an inducible promoter, and/or a tissue-specific promoter.
  • the promoter may be a promoter commonly used in eukaryotic expression systems or a promoter used in prokaryotic expression systems.
  • promoters used in eukaryotic expression systems include, but are not limited to, CMV promoter (Cytomegalovirus promoter) , SV40 promoter (Simian virus 40 promoter) , PGK promoter (phosphoglycerate kinase promoter) , EF1α promoter (elongation factor 1-alpha promoter) , β-actin promoter, Ubc promoter (human ubiquitin C gene-derived promoter) , CAG promoter (hybrid mammalian promoter) , TRE promoter (tetracycline response element promoter) , UAS promoter (Drosophila promoter with Gal4 binding site) , Ac5 promoter (Drosophila actin 5c gene-derived insect promoter) , CaMKIIa promoter (Ca2+/calmodulin-dependent protein kinase II promoter) , GAL1 promoter (yeast galactokinase promoter) , GAL1 and GAL
  • promoters used in prokaryotic expression systems include, but are not limited to, T7 promoter (T7 phage-derived promoter) , T7lac promoter (T7 phage-derived promoter plus lac operator) , Sp6 promoter (Sp6 phage-derived promoter) , araBAD promoter (arabinose metabolism operon-derived promoter) , trp promoter (tryptophan operon-derived promoter) , lac promoter (lac operon-derived promoter) , Ptac promoter (a hybrid promoter of the lac promoter and the trp promoter) , Ptac/lacO promoter (Ptac promoter plus lac operator) and pL promoter (Lambda phage-derived promoter) .
  • Vector can be a linear or a circular DNA or RNA with either single strand or double strands.
  • vector is free in cytoplasm of a cell after entering into the cell.
  • vector is integrated into the genome of a cell after entering into the cell.
  • vector further carries a selectable marker, such as an antibiotic resistance gene and/or a fluorescence reporter gene.
  • the vector may be, for example, a vector derived from a bacterial plasmid, a viral vector, a vector derived from a yeast plasmid, a vector derived from a phage, a cosmid, a phagemid, or the like.
  • the invention provides an MPP cleavable sequence comprising a wild type amino acid sequence, a fragment or a variant thereof that is cleavable by MPP, or an artificially designed amino acid sequence that is cleavable by MPP.
  • MPP cleavable sequence usually refers to amino acid sequence, while in some cases, a nucleic acid encoding an amino acid sequence that can be cleaved by MPP is also within the meaning of an MPP cleavable sequence.
  • the wild type amino acid sequence is a wild type mitochondrial signal peptide which is responsible for targeting a mitochondrial endogenous protein into mitochondria.
  • the mitochondrial signal peptide is derived from a fungal cell, an algae cell, an animal cell or a plant cell.
  • the mitochondrial signal peptide is derived from a mitochondrial endogenous protein selected from the group consisting of: the subunit 9 of the F0 ATPase of Neurospora crassa, COX4 of Saccharomyces cerevisiae, HSP60/YLR259c of Saccharomyces cerevisiae, SSC1/YJR045c of Saccharomyces cerevisiae, CYB2/YML054C of Saccharomyces cerevisiae, mitochondrial ATP synthase of Nicotiana plumbaginifolia (encoded by atp2-1 gene) , mitochondrial import receptor subunit TOM20 of Arabidopsis thaliana and 2-oxoglutarate dehydrogenase subunit E1 of Arabidopsis thaliana.
  • the MPP cleavable sequence is at least 8, 9, 10, 11, 12, 13, 14, 15, 20, 25 or 30 amino acids in length. In some embodiments, the MPP cleavable sequence is up to 30, 35, 40, 45, 50, 55, 60, 70, 80, 90 or 100 amino acids in length. In some embodiments, the MPP cleavable sequence comprises at least 4, 5, 6, 7, 8, 9 or 10 amino acids upstream of a site at which a peptide bond is broken by MPP. In some embodiments, the MPP cleavable sequence comprises at least 2 or 3 arginine residues upstream of a site at which a peptide bond is broken by MPP.
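  • The length and composition limits above can be checked mechanically for a candidate sequence. The sketch below is an illustration only: the function name is hypothetical, the default thresholds are one choice among the alternatives listed above, and the position of the cleaved peptide bond must be supplied by the user because it cannot be derived from the sequence alone.

```python
def check_mpp_sequence(seq, cleavage_index, min_len=8, max_len=100,
                       min_upstream=4, min_arginines=2):
    """Check a candidate sequence against the stated limits.

    cleavage_index is the number of residues N-terminal to the peptide
    bond broken by MPP (a user-supplied assumption, not a prediction).
    """
    if not 0 < cleavage_index < len(seq):
        raise ValueError("cleavage_index must fall inside the sequence")
    upstream = seq[:cleavage_index].upper()
    return {
        "length_ok": min_len <= len(seq) <= max_len,
        "upstream_ok": len(upstream) >= min_upstream,
        "arginine_ok": upstream.count("R") >= min_arginines,
    }
```

  • For instance, calling `check_mpp_sequence("RQAFQKRAFHT", 8)` treats the first eight residues as upstream of the cleaved bond; the index 8 is assumed purely for illustration.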
  • the MPP cleavable sequence is derived from the subunit 9 of the F0 ATPase of Neurospora crassa. In some preferred embodiments, the MPP cleavable sequence comprises the amino acids of any one of SEQ ID NOs: 67-81.
  • the artificially designed amino acid sequence comprises a formula as follows:
  • X1 comprises RQAFQKRA or RQAFQRRA
  • X2 comprises YSS, FHT or FST.
  • X1 is RQAFQKRA
  • X2 is FHT or FST.
  • the artificially designed amino acid sequence is RQAFQKRAFHT (SEQ ID NO: 72) or RQAFQKRAFST (SEQ ID NO: 73) .
  • the artificially designed amino acid sequence comprises a formula as follows:
  • X3 comprises R (G) n KRA or R (G) n RRA
  • the artificially designed amino acid sequence is selected from the group consisting of RGGRRAFHT (SEQ ID NO: 78) , RGGGRRAFHT (SEQ ID NO: 79) , RGGRRAFST (SEQ ID NO: 81) or RGGGRRAFST (SEQ ID NO: 80) .
  • the “G” in the brackets of above formula can be replaced by any other amino acid residues.
  • MPP is derived from yeast, arabidopsis, tobacco, rice or algae, such as Saccharomyces cerevisiae, Arabidopsis thaliana, Nicotiana tabacum, Oryza sativa or Chlamydomonas reinhardtii.
  • MPP is derived from species of Saccharomyces, Aspergillus, Kluyveromyces, Schizosaccharomyces, Rhizopus, Candida, Yarrowia and Pichia, such as Saccharomyces cerevisiae, Schizosaccharomyces pombe, Kluyveromyces lactis, Kluyveromyces marxianus, Aspergillus terreus, Aspergillus niger, Rhizopus arrhizus, Rhizopus oryzae, Candida albicans, Candida boidinii, Candida sonorensis, Candida tropicalis, Yarrowia lipolytica or Pichia pastoris.
  • MPP is derived from Saccharomyces cerevisiae. In some preferred embodiments, MPP is derived from Saccharomyces cerevisiae W303-1a strain. In some preferred embodiments, MPP is derived from Saccharomyces cerevisiae S288C strain.
  • the invention provides a fusion protein comprising two or more proteins to be co-expressed in a cell or a cell-free system, wherein at least two adjacent proteins in the fusion protein are linked by a linker comprising an MPP cleavable region, wherein the linkers or the MPP cleavable regions between adjacent proteins are the same or different.
  • the linker or the MPP cleavable region comprises one or more of the MPP cleavable sequence, wherein the MPP cleavable sequences within the same linker or MPP cleavable region are the same or different.
  • the two or more proteins to be co-expressed are functionally related, preferably the two or more proteins are involved in the same biosynthetic pathway or metabolic pathway, participate in the same signaling pathway, or belong to the same complex biological system.
  • the complex biological system is a nitrogenase system.
  • the two or more proteins are selected from the group consisting of NifH, NifD, NifK, NifY, NifE, NifN, NifB, NifU, NifS, NifV, NifM, NifJ, NifF, NifW, NifZ, NifT, NifX and NifQ.
  • the fusion protein comprises NifH-cleav-NifD-cleav-NifK, NifE-cleav-NifN ⁇ NifB, NifJ-cleav-NifV-cleav-NifW, NifF-cleav-NifM-cleav-NifY, and optionally, NifU-cleav-NifS, wherein “cleav” is the linker comprising the MPP cleavable region, and wherein “ ⁇ ” represents a flexible peptide, such as (GGS) n, (GGGS) n or (GGGGS) n in which n is 1-5. In some preferred embodiments, “ ⁇ ” represents “GGGGSGGGGSGGGGS” .
  • the biosynthetic pathway is the violacein biosynthetic pathway.
  • the two or more proteins are selected from the group consisting of VioA, VioB, VioC, VioD and VioE.
  • the fusion protein comprises VioA-cleav-VioB-cleav-VioE and VioD-cleav-VioC, or VioA-cleav-VioB-cleav-VioE-cleav-VioD-cleav-VioC, wherein “cleav” is the linker comprising the MPP cleavable region.
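  • The "cleav" notation used for these fusion proteins lends itself to a simple consistency check. The sketch below is a hypothetical helper, not part of the disclosure: it splits an architecture string at each linker to list the proteins that would be released on complete cleavage, and tests whether a set of polyproteins covers the five violacein enzymes exactly once.

```python
VIOLACEIN_PATHWAY = {"VioA", "VioB", "VioC", "VioD", "VioE"}


def split_polyprotein(architecture, linker_token="cleav"):
    """List the proteins released if every linker is cleaved completely."""
    return architecture.split(f"-{linker_token}-")


def covers_pathway(architectures, pathway=VIOLACEIN_PATHWAY):
    """True if the polyproteins together encode each pathway protein once."""
    released = [p for arch in architectures for p in split_polyprotein(arch)]
    return set(released) == set(pathway) and len(released) == len(pathway)
```

  • Both arrangements named above, the two-polyprotein design and the single five-protein polyprotein, pass this check; the number of linkers in each polyprotein is one less than the number of proteins returned by `split_polyprotein`.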
  • the biosynthetic pathway is the isobutanol biosynthetic pathway.
  • the two or more proteins are selected from the group consisting of ILV2, ILV3, ILV5, ARO10 and ADHA.
  • the fusion protein comprises:
  • cleav is the linker comprising the MPP cleavable region.
  • the two or more proteins to be co-expressed comprise fluorescent proteins, such as EGFP (enhanced green fluorescent protein) , ERFP (enhanced red fluorescent protein) , EBFP (enhanced blue fluorescent protein) , EYFP (enhanced yellow fluorescent protein) , ECFP (enhanced cyan fluorescent protein) , GFP (green fluorescent protein) , RFP (red fluorescent protein) , BFP (blue fluorescent protein) , YFP (yellow fluorescent protein) , CFP (cyan fluorescent protein) , FbFP (flavin mononucleotide-based fluorescent protein) , mCherry, dsRed, tdTomato and turbo-RFP.
  • the two or more proteins to be co-expressed comprise protein tags for use in protein purification or immunoblotting. These protein tags can be removed from fusion proteins after a purification step. Examples of protein tags include, but are not limited to, polyhistidine tag, GST tag, HA tag, FLAG tag, MBP tag, NusA tag, c-Myc tag and Strep tag.
  • the invention provides a vector carrying a nucleic acid encoding a fusion protein of the present invention.
  • Vector can be a linear or a circular DNA or RNA with either single strand or double strands.
  • vector is free in cytoplasm of a cell after entering into the cell.
  • vector is integrated into the genome of a cell after entering into the cell.
  • vector further carries a selectable marker, such as an antibiotic resistance gene and/or a fluorescence reporter gene.
  • the vector may be, for example, a vector derived from a bacterial plasmid, a viral vector, a vector derived from a yeast plasmid, a vector derived from a phage, a cosmid, a phagemid, or the like.
  • origins of replication for bacteria are those of plasmids pBR322, pUC19, pACYC177, and pACYC184 that allow replication in E. coli, and plasmids pUB110, pE194, pTA1060, and pAMβ1 that allow replication in Bacillus.
  • the vector can be integrated into the genome by homologous recombination.
  • the vector may contain a polynucleotide for directing integration into the genome of the host cell at one or more precise locations on one or more chromosomes by homologous recombination.
  • the integration element should contain a sufficient number of nucleotides that have high sequence identity with the corresponding target sequence to enhance the possibility of homologous recombination.
  • These integration elements can be any sequence that is homologous to a target sequence in the host cell genome.
  • these integration elements may be non-coding polynucleotides or coding polynucleotides.
  • the vector can be integrated into the genome of the host cell by non-homologous recombination.
  • the vector may contain one or more selectable markers that allow easy selection of transformed cells, transfected cells, transduced cells, and the like.
  • a selectable marker is a gene whose product provides biocide resistance or virus resistance, resistance to heavy metals, prototrophy to auxotrophs, and the like.
  • bacterial selectable markers include the dal genes of Bacillus licheniformis or Bacillus subtilis, or markers conferring antibiotic resistance (such as ampicillin, chloramphenicol, kanamycin, neomycin, spectinomycin, or tetracycline resistance) .
  • Suitable markers for use in yeast host cells include but are not limited to ADE2, HIS3, LEU2, LYS2, MET3, TRP1, and URA3.
  • Selectable markers for use in a filamentous fungal host cell include, but are not limited to, adeA (phosphoribosylaminoimidazole-succinocarboxamide synthase) , adeB (phosphoribosyl-aminoimidazole synthase) , amdS (acetamidase) , argB (ornithine carbamoyltransferase) , bar (phosphinothricin acetyltransferase) , hph (hygromycin phosphotransferase) , niaD (nitrate reductase) , pyrG (orotidine-5'-phosphate decarboxylase) , sC (sulfate adenyltransferase) , and trpC (anthranilate synthase) , etc.
  • these proteins can be divided into different groups, and the nucleic acids encoding proteins in each group will be connected into one CDS and subsequently constructed into one expression cassette (EC) .
  • the multiple ECs can be inserted into one vector.
  • the multiple ECs can be inserted into multiple vectors to form a vector composition.
  • the vector or the vector composition can be introduced into a cell by transformation, transfection, electroporation or any other techniques well-known in the art.
  • the cells harboring vectors will be cultured under a condition that is suitable for protein expression.
  • the fusion proteins can be expressed in cells and then cleaved by either an endogenous MPP or an exogenous MPP to release individual proteins.
  • the EC encoding an exogenous MPP can be constructed on the same vector as fusion proteins, or on another vector.
  • vector carries a nucleic acid encoding GFP-cleav-RFP. In some embodiments, vector carries a nucleic acid encoding GFP-cleav-RFP and a nucleic acid encoding an MPP. In some embodiments, vector composition comprises a vector carrying a nucleic acid encoding GFP-cleav-RFP and a vector carrying a nucleic acid encoding an MPP. In some preferred embodiments, MPP is derived from Saccharomyces cerevisiae.
  • vector carries a nucleic acid encoding NifH-cleav-NifD-cleav-NifK, a nucleic acid encoding NifE-cleav-NifN ⁇ NifB, a nucleic acid encoding NifJ-cleav-NifV-cleav-NifW, a nucleic acid encoding NifF-cleav-NifM-cleav-NifY, a nucleic acid encoding NifU-cleav-NifS and optionally a nucleic acid encoding an MPP, wherein “cleav” is the linker comprising the MPP cleavable region, and wherein “ ⁇ ” represents a flexible peptide. In some preferred embodiments, “ ⁇ ” represents “GGGGSGGGGSGGGGS” . In some preferred embodiments, MPP is derived from Saccharomyces cerevisiae.
  • vector carries a nucleic acid encoding NifH-cleav-NifD-cleav-NifK, a nucleic acid encoding NifE-cleav-NifN ⁇ NifB, a nucleic acid encoding NifJ-cleav-NifV-cleav-NifW, a nucleic acid encoding NifF-cleav-NifM-cleav-NifY, a nucleic acid encoding NifU, a nucleic acid encoding NifS, and optionally a nucleic acid encoding an MPP, wherein “cleav” is the linker comprising the MPP cleavable region, and wherein “ ⁇ ” represents a flexible peptide. In some preferred embodiments, “ ⁇ ” represents “GGGGSGGGGSGGGGS” . In some preferred embodiments, MPP is derived from Saccharomyces cerevisiae.
  • each of the vectors in a vector composition carries at least one of the following nucleic acids: a nucleic acid encoding NifH-cleav-NifD-cleav-NifK, a nucleic acid encoding NifE-cleav-NifN ⁇ NifB, a nucleic acid encoding NifJ-cleav-NifV-cleav-NifW, a nucleic acid encoding NifF-cleav-NifM-cleav-NifY, a nucleic acid encoding NifU-cleav-NifS and optionally a nucleic acid encoding an MPP, wherein “cleav” is the linker comprising the MPP cleavable region, and wherein “ ⁇ ” represents a flexible peptide. In some preferred embodiments, “ ⁇ ” represents “GGGGSGGGGSGGGGS” . In some preferred embodiments, MPP is derived from Saccharomyces cerevisi
  • each of the vectors in a vector composition carries at least one of the following nucleic acids: a nucleic acid encoding NifH-cleav-NifD-cleav-NifK, a nucleic acid encoding NifE-cleav-NifN ⁇ NifB, a nucleic acid encoding NifJ-cleav-NifV-cleav-NifW, a nucleic acid encoding NifF-cleav-NifM-cleav-NifY, a nucleic acid encoding NifU, a nucleic acid encoding NifS, and optionally a nucleic acid encoding an MPP, wherein “cleav” is the linker comprising the MPP cleavable region, and wherein “ ⁇ ” represents a flexible peptide. In some preferred embodiments, “ ⁇ ” represents “GGGGSGGGGSGGGGS” . In some preferred embodiments, MPP is derived from Sacchar
  • vector carries a nucleic acid encoding VioA-cleav-VioB-cleav-VioE, a nucleic acid encoding VioD-cleav-VioC, and optionally a nucleic acid encoding an MPP, wherein “cleav” is the linker comprising the MPP cleavable region.
  • MPP is derived from Saccharomyces cerevisiae.
  • vector carries a nucleic acid encoding VioA-cleav-VioB-cleav-VioE-cleav-VioD-cleav-VioC, and optionally a nucleic acid encoding an MPP, wherein “cleav” is the linker comprising the MPP cleavable region.
  • MPP is derived from Saccharomyces cerevisiae.
  • each of the vectors in a vector composition carries at least one of the following nucleic acids: a nucleic acid encoding VioA-cleav-VioB-cleav-VioE, a nucleic acid encoding VioD-cleav-VioC, and optionally a nucleic acid encoding an MPP, wherein “cleav” is the linker comprising the MPP cleavable region.
  • MPP is derived from Saccharomyces cerevisiae.
  • vector composition comprises a vector carrying a nucleic acid encoding VioA-cleav-VioB-cleav-VioE-cleav-VioD-cleav-VioC and a vector carrying a nucleic acid encoding an MPP, wherein “cleav” is the linker comprising the MPP cleavable region.
  • MPP is derived from Saccharomyces cerevisiae.
  • vector carries a nucleic acid encoding ILV5-cleav-ILV3-cleav-ILV2, a nucleic acid encoding ARO10-cleav-ADHA, and optionally a nucleic acid encoding an MPP, wherein “cleav” is the linker comprising the MPP cleavable region.
  • MPP is derived from Saccharomyces cerevisiae.
  • vector carries a nucleic acid encoding ILV3-cleav-ILV5-cleav-ILV2, a nucleic acid encoding ARO10-cleav-ADHA, and optionally a nucleic acid encoding an MPP, wherein “cleav” is the linker comprising the MPP cleavable region.
  • MPP is derived from Saccharomyces cerevisiae.
  • vector carries a nucleic acid encoding ILV3-cleav-ILV5-cleav-ILV2, a nucleic acid encoding ADHA-cleav-ARO10, and optionally a nucleic acid encoding an MPP, wherein “cleav” is the linker comprising the MPP cleavable region.
  • MPP is derived from Saccharomyces cerevisiae.
  • each of the vectors in a vector composition carries at least one of the following nucleic acids: a nucleic acid encoding ILV5-cleav-ILV3-cleav-ILV2, a nucleic acid encoding ARO10-cleav-ADHA, and optionally a nucleic acid encoding an MPP, wherein “cleav” is the linker comprising the MPP cleavable region.
  • MPP is derived from Saccharomyces cerevisiae.
  • each of the vectors in a vector composition carries at least one of the following nucleic acids: a nucleic acid encoding ILV3-cleav-ILV5-cleav-ILV2, a nucleic acid encoding ARO10-cleav-ADHA, and optionally a nucleic acid encoding an MPP, wherein “cleav” is the linker comprising the MPP cleavable region.
  • MPP is derived from Saccharomyces cerevisiae.
  • each of the vectors in a vector composition carries at least one of the following nucleic acids: a nucleic acid encoding ILV3-cleav-ILV5-cleav-ILV2, a nucleic acid encoding ADHA-cleav-ARO10, and optionally a nucleic acid encoding an MPP, wherein “cleav” is the linker comprising the MPP cleavable region.
  • MPP is derived from Saccharomyces cerevisiae.
  • the invention provides a cell harboring a vector or a vector composition of the present invention, wherein the cell is a prokaryotic cell or a eukaryotic cell.
  • the prokaryotic cell is a cell of genera Acetobacter, Azotobacter, Bacillus, Enterobacter, Escherichia, Klebsiella, Salmonella or Pseudomonas. In some embodiments, the prokaryotic cell is a cell of Escherichia coli, Azotobacter vinelandii, Klebsiella oxytoca, Klebsiella pneumoniae, Pseudomonas fluorescens, Bacillus subtilis, Pseudomonas protegens, Pseudomonas putida, Pseudomonas veronii, Pseudomonas taetrolens, Pseudomonas balearica, Pseudomonas stutzeri, Pseudomonas aeruginosa, Pseudomonas syringae, Bacillus amyloliquefaciens, Burkholder
  • the eukaryotic cell is selected from the group consisting of a fungal cell, an algae cell, an animal cell or a plant cell.
  • KPM minimal medium (10.4 g/L of Na2HPO4, 3.4 g/L of KH2PO4, 26 mg/L of CaCl2·2H2O, 30 mg/L of MgSO4, 0.3 mg/L of MnSO4, 36 mg/L of ferric citrate, 10 mg/L of para-aminobenzoic acid, 30 nmol/L of Na2MoO4, 5 mg/L of biotin, 1 mg/L of vitamin B1, 0.05% Casamino acids, and 0.8% (wt/vol) glucose), supplemented with 10 mM ammonium sulfate (KPM-HN) for pregrowth or 0.1% glutamate (KPM-LN) for nitrogenase activity assays.
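  • For bench use, the recipe above can be scaled to any batch volume. The sketch below is a convenience calculation only: it covers the components given in g/L or mg/L plus the 10 mM ammonium sulfate supplement, converts the latter with the molar mass of (NH4)2SO4 (132.14 g/mol), and leaves out molybdate, Casamino acids and glutamate, whose amounts depend on stock solutions not described here. The function name is hypothetical.

```python
# Concentrations copied from the KPM recipe, expressed in mg per litre.
KPM_MG_PER_L = {
    "Na2HPO4": 10400.0,
    "KH2PO4": 3400.0,
    "CaCl2.2H2O": 26.0,
    "MgSO4": 30.0,
    "MnSO4": 0.3,
    "ferric citrate": 36.0,
    "para-aminobenzoic acid": 10.0,
    "biotin": 5.0,
    "vitamin B1": 1.0,
    "glucose": 8000.0,  # 0.8% (wt/vol)
}

AMMONIUM_SULFATE_MW = 132.14  # g/mol for (NH4)2SO4


def batch_masses_mg(volume_l, high_nitrogen=False):
    """Return the mass in mg of each component for a batch of volume_l litres."""
    masses = {name: conc * volume_l for name, conc in KPM_MG_PER_L.items()}
    if high_nitrogen:
        # 10 mmol/L x 132.14 g/mol = 1321.4 mg/L for KPM-HN
        masses["ammonium sulfate"] = 10.0 * AMMONIUM_SULFATE_MW * volume_l
    return masses
```

  • Calling `batch_masses_mg(0.5, high_nitrogen=True)` gives the amounts for 500 mL of KPM-HN.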
  • Three Level 2 vectors A-C were constructed for assembling EC1 to EC10 into sub-gene clusters.
  • Vectors A, B, and C were used for assembling EC1-3, EC4-7, and EC8-10, respectively.
  • Ten plasmids carrying gap1-10 sequences were constructed with identical BpiI scars to the corresponding Level 1 vectors 1-10.
  • Three plasmids carrying gapA-C sequences were also constructed with identical BsaI scars to the corresponding Level 2 vectors A-C. These gap sequences were used when no corresponding expression cassette or sub-gene cluster existed.
  • Two Level 3 vectors were constructed.
  • FIG. 2 depicts the procedure for assembling gene clusters encoding Nif polyprotein systems.
  • the su9 sequence was added to the first gene by overlap PCR.
  • Each PCR gene product flanked with coding sequences of specific processing sites was assembled on a Level 0 vector as giant gene modules.
  • promoter modules, terminator modules and giant gene modules were assembled on Level 1 vectors.
  • the nifHDK, nifUS, nifFMY, nifJVW, and nifENB giant genes were assigned to the Level 1 vectors 1, 2, 3, 8, and 10 respectively.
  • the MBP-TEVp gene module (a maltose binding protein was fused to TEVp to enhance the solubility of the protein) was assigned to Level 1 vector 9.
  • the LEU2 expression cassette with identical BsaI scars was assigned to Level 2 vector B and subsequently used for assembly to provide auxotrophic selection of positive transformants.
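  • The hierarchical assignment described above can be summarized as a small lookup. The sketch below is an illustration under stated assumptions: empty Level 1 positions are filled with the gap plasmid of the same number, a whole Level 2 position can be overridden (as for the LEU2 cassette at position B), and the function and variable names are hypothetical.

```python
# Level 1 positions grouped by the Level 2 vector that receives them.
LEVEL2_SLOTS = {"A": range(1, 4), "B": range(4, 8), "C": range(8, 11)}

# Level 1 assignments taken from the text.
LEVEL1_ASSIGNMENTS = {
    1: "nifHDK",
    2: "nifUS",
    3: "nifFMY",
    8: "nifJVW",
    9: "MBP-TEVp",
    10: "nifENB",
}


def level2_layout(assignments, overrides=None):
    """Return the content of each Level 2 position, using gaps for empty slots."""
    overrides = overrides or {}
    layout = {}
    for vector, slots in LEVEL2_SLOTS.items():
        if vector in overrides:
            layout[vector] = list(overrides[vector])
        else:
            layout[vector] = [assignments.get(i, f"gap{i}") for i in slots]
    return layout
```

  • With `overrides={"B": ["LEU2 cassette"]}` the layout has the three nif giant genes at position A, the selection marker at position B, and nifJVW, MBP-TEVp and nifENB at position C.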
  • the restriction enzyme XbaI was used for linearization of the plasmids prior to transformation.
  • Yeast transformations were carried out according to the lithium acetate (LiAc) method as described in Gietz R. D. and Schiestl R. H. (2007) High-efficiency yeast transformation using the LiAc/SS carrier DNA/PEG method. Nature Protocols 2 (1) : 31-34 with modifications.
  • yeast cells were grown in liquid YPD medium at 30°C and 200 rpm overnight. 1 mL of the yeast culture was added to 100 mL of YPD medium for further growth until the OD600nm reached 0.5 to 0.6.
  • The cells were harvested by centrifugation at 5,000 rpm for 5 min, resuspended in 25 mL of sterile water, and pelleted again by centrifugation at 5,000 rpm for 5 min.
  • the cells were resuspended in 1 mL of LiAc buffer (100 μL of TE buffer (containing 0.1 M Tris-HCl and 0.01 M EDTA, pH 7.5), 100 μL of 1 M LiAc (pH 7.5), 800 μL of sterile water).
  • 200 μL of resuspended yeast cells were mixed with about 1 μg DNA and 5 μL ssDNA (Solarbio, H1060).
  • the mixture was added into 1.2 mL of PEG buffer (960 μL of 40% PEG-3350, 120 μL of 1 M LiAc (pH 7.5) and 120 μL of TE buffer) in a 1.5 mL Eppendorf tube.
  • the cells were incubated at 30°C and 200 rpm for 30 min. The cells were then harvested by centrifugation at 5,000 rpm for 3 min. The supernatant was discarded and the cell pellet was resuspended in 200 μL of sterile water. Transformants were selected on solid synthetic dropout medium.
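  • The final composition of the two transformation buffers follows from the stock volumes given above. The sketch below is a plain dilution calculation with a hypothetical function name: molar concentrations are in M, PEG is in percent, and the volume contributed by the cells and DNA added later is ignored.

```python
def final_concentrations(components):
    """components: list of (volume_uL, {solute: stock_concentration}) pairs.

    Returns the total volume and the diluted concentration of each solute.
    """
    total = sum(volume for volume, _ in components)
    final = {}
    for volume, solutes in components:
        for solute, stock in solutes.items():
            final[solute] = final.get(solute, 0.0) + stock * volume / total
    return total, final


TE = {"Tris-HCl": 0.1, "EDTA": 0.01}  # stock TE buffer, in M

LIAC_BUFFER = [(100, TE), (100, {"LiAc": 1.0}), (800, {})]
PEG_BUFFER = [(960, {"PEG-3350": 40.0}), (120, {"LiAc": 1.0}), (120, TE)]

if __name__ == "__main__":
    for name, recipe in (("LiAc buffer", LIAC_BUFFER), ("PEG buffer", PEG_BUFFER)):
        print(name, final_concentrations(recipe))
```

  • Both buffers work out to the same LiAc and TE strength (0.1 M LiAc, 10 mM Tris-HCl, 1 mM EDTA), and the PEG buffer contains 32% PEG-3350.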
  • Mitochondria extraction was performed as described in K. Diekert, A.I.P. M. de Kroon, G. Kispal, R. Lill, “Isolation and subfractionation of mitochondria from the yeast Saccharomyces cerevisiae” in Methods in Cell Biology, L.A. Pon, E.A. Schon, Eds. (Academic Press, 2001) , Vol. 65, pp. 37–51 with modifications.
  • the spheroplasts were broken using a glass homogenizer, and total extracts were centrifuged at 1,500 × g for 5 min to remove the cellular debris and unbroken cells. Then the supernatants were centrifuged twice at 4,000 × g for 5 min to remove other unwanted cellular contents. The supernatants from the last step were centrifuged at 12,000 × g for 15 min to collect the crude mitochondria.
  • Precipitates (mitochondria pellets) from 10 mL of yeast cells (~0.5 g wet weight) were resuspended in 200 μL of PBS buffer and 50 μL of 5× Protein Loading Dye (Sangon Biotech; C508320). After boiling for 20 min, samples were cooled to room temperature and then centrifuged at 12,000 rpm for 2 min. Western Blotting assays were performed using 20 μL of each sample.
  • the secondary antibodies goat anti-rabbit IgG-HRP (ZSGB Biotech; ZB-2301) or goat anti-mouse IgG-HRP (ZSGB Biotech; ZB-2305) were used at 1:3,000 dilution and incubated for 2 h before membrane development.
  • the primary antibodies used for immunoblotting and quantification against Nif proteins were obtained from antiserum of rabbits immunized with specific proteins.
  • the primary antibodies against GFP (ZSGB Biotech, TA-06) , HSP60 (Proteintech, 15282-1-AP) and His tag (ZSGB Biotech, TA-02) are commercially available.
  • Plasmids carrying nitrogenase systems were transformed into the E. coli NCM3722 strain, and the transformants were spread on LB plates with appropriate concentrations of antibiotics. After incubation at 37°C for 16 h, single colonies were picked and streaked onto KPM-NN plates. Then the plates were moved to a 2.5-L anaerobic jar (Oxoid AG0025A; Thermo Fisher Scientific) equipped with anaerobic gas-generating sachets (Oxoid AN0025A; Thermo Fisher Scientific) and an oxygen indicator (Oxoid BR0055B; Thermo Fisher Scientific). Anaerobic jars were immediately locked and incubated at 30°C for 3 to 4 days.
  • Example 1 Testing TEVp-based Polyprotein System in Yeast Mitochondria.
  • TEVp-based nif gene-encoded polyproteins for expression in yeast were designed, each flanked by different promoters and terminators, with variant sequences encoding the signal peptide of subunit 9 of the Neurospora crassa F0-ATPase (Su9) fused to the 5′ end of the giant nif genes to enable import of polyproteins into yeast mitochondria (Figure 3A).
  • TEVp-based nif gene clusters were assembled with or without sequences designed to express and target an MBP-TEVp fusion protein to mitochondria (plasmids pNG410 and pNG411, respectively), which were subsequently transformed and integrated into chromosome XIV of S.
  • 1.1 TEVp-based nif system leads to less efficient protein targeting.
  • Yeast strains Sc_410 and Sc_411 were grown in liquid YPDG medium to induce the expression of polyproteins and MBP-TEVp under aerobic conditions.
  • When mitochondrial extracts were prepared from aerobically grown yeast cultures and analyzed by immunoblotting using antibodies against specific Nif proteins, negligible amounts of Nif components were detected in strain Sc_410, which contains the complete TEVp-based nif system, even though MBP-TEVp was clearly imported into mitochondria (Figure 3B).
  • The results suggest either that the polyproteins are not imported into mitochondria, potentially because they are unstable in the cytoplasm, or that they are rapidly degraded once imported into the mitochondrial matrix.
  • 1.2 TEVp is toxic to yeast when co-expressed with Nif polyproteins.
  • The growth curves of strains Sc_410 and Sc_411 were plotted to determine whether the expression of TEVp would impede the growth of yeast. It was observed that when expression of MBP-TEVp was induced with galactose in strain Sc_411, cell growth was arrested (Figure 4B). These results reflect severe challenges to the deployment of a TEVp-based nif polyprotein system in mitochondria.
  • the Mitochondrial Processing Peptidase was then evaluated for the purpose of engineering a polyprotein system, as it efficiently cleaves mitochondrial proteins after translocation into mitochondria.
  • the Su9 pre-sequence (residues 1 to 69 of ATP synthase subunit 9, from Neurospora crassa) was selected for screening the efficiency of MPP processing, on the basis that this sequence targets proteins to mitochondria and is efficiently removed by MPPs (Westermann B and Neupert W (2000) Mitochondria-targeted green fluorescent proteins: convenient tools for the study of organelle biogenesis in Saccharomyces cerevisiae. Yeast 16 (15) : 1421-1427) . It has been previously demonstrated that the Su9 pre-sequence is cleaved at two sites.
  • residues 40-69 (designated here as Su9.30, SEQ ID NO: 67), which contain the second processing site, were chosen as the starting sequence for investigation (Figure 6A).
  • the polyprotein was completely processed by reconstituted Sc MPP in E. coli and a single ~27 kDa band was detected with the anti-GFP antibody (Figure 6B).
  • Shortening the sequence by 5 residues at the N-terminus (Su9.25) gave similar results, but further removal of N-terminal residues (Su9.20, Su9.15 and Su9.11) decreased the processing efficiency in proportion to their length (Figure 6B).
  • 80% of pre-sequences contain an arginine residue either at position -2 (-2R) or -3 (-3R) (Teixeira P. F. and Glaser E. (2013) Processing peptidases in mitochondria and chloroplasts. Biochimica et Biophysica Acta (BBA) - Molecular Cell Research 1833 (2): 360-370) and, in addition, a flexible region between this conserved arginine and a more distal arginine located several residues upstream, which is also important for processing (Kojima K., Kitada S., Ogishima T. and Ito A. (2001) A Proposed Common Structure of Substrates Bound to Mitochondrial Processing Peptidase.
  • GFP-RFP fusions with internal S8, S9 and S10 sites and the Su9 signal peptide for targeting to mitochondria were constructed and expressed under the control of a strong constitutive TDH3 promoter in S. cerevisiae ( Figure 7) . Dual copies of the S8, S9 and S10 sequences in the GFP-RFP fusion proteins were also included in this study.
  • mitochondrial extracts from yeast strains expressing these constructs were assayed by immunoblotting with anti-GFP antibodies, only one band with similar migration to processed GFP was detectable (Figure 7) , suggesting that unprocessed GFP-RFP polyprotein is susceptible to degradation in mitochondria.
  • the TEVp-based polyprotein system consisting of five giant nif genes as used in Example 1 was re-engineered by replacing the TEVp cleavable sequences within polyproteins with MPP cleavable sequences while maintaining the same gene order in each polyprotein as depicted in Figure 3A.
  • the variant NifD-Y100Q protein with high activity was used to replace the wild-type NifD protein, as the wild-type NifD is susceptible to internal cleavage by MPP (Allen R.S., Gregg C.M., Okada S., Menon A., Hussain D., Gillespie V., Johnston E., Devilla R., Warden A.C., Taylor M., Byrne K., Colgrave M., Wood C.C. (2020) Plant expression of NifD protein variants resistant to mitochondrial degradation. Proc. Natl. Acad. Sci. 117 (37): 23165-23173).

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Organic Chemistry (AREA)
  • Health & Medical Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Engineering & Computer Science (AREA)
  • Wood Science & Technology (AREA)
  • Zoology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Biochemistry (AREA)
  • Biotechnology (AREA)
  • Microbiology (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Mycology (AREA)
  • Medicinal Chemistry (AREA)
  • Biophysics (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Chemical Kinetics & Catalysis (AREA)
  • General Chemical & Material Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Plant Pathology (AREA)
  • Gastroenterology & Hepatology (AREA)
  • Preparation Of Compounds By Using Micro-Organisms (AREA)
  • Peptides Or Proteins (AREA)

Abstract

The present invention provides a method for co-expressing two or more proteins. Specifically, the present invention provides a method for co-expressing two or more proteins in a cell or a cell-free system by using a linker comprising one or more Mitochondrial Processing Peptidase (MPP) cleavable sequences. This invention also provides an MPP cleavable sequence, a fusion protein comprising the MPP cleavable sequence, a nucleic acid or a vector encoding the MPP cleavable sequence or the fusion protein, and a cell expressing the MPP cleavable sequence or the fusion protein.

Description

METHOD FOR CO-EXPRESSING PROTEINS
FIELD OF THE INVENTION
The present invention relates to the field of bioengineering and, in particular, to Mitochondrial Processing Peptidase (MPP) cleavable sequences and uses thereof.
BACKGROUND OF THE INVENTION
Efficient co-expression of multiple proteins or their subunits has many important applications in the life sciences and biotechnology. For instance, it is necessary for creating a complex biological system, constructing a biosynthetic pathway, or producing a multimeric protein (such as an antibody or a biologically functional complex) in a host cell or a cell-free system. Currently, co-expression of multiple proteins or their subunits mostly relies on constructing individual expression cassettes in which each of the proteins to be co-expressed is under the control of its own promoter. This approach limits the number of genes that can be co-expressed in a cell, especially in a eukaryotic cell. Other methods for co-expressing proteins involve fusing the coding sequences of multiple proteins in tandem as a polycistronic DNA or a DNA encoding a polyprotein, so that co-expression of the multiple proteins is under the control of one promoter. For example, a ribosome binding site (RBS) or an internal ribosomal entry site (IRES) can be used to connect the coding sequences of multiple proteins to form a polycistronic DNA. The polycistronic DNA is transcribed into one polycistronic mRNA, and each protein is translated independently of the others. Alternatively, a cleavable peptide can be used to connect multiple proteins to form a polyprotein, and the individual proteins are released after post-translational cleavage by a protease. One example of such a protease is Tobacco etch virus protease (TEVp), which is one of the most widely used proteases in biotechnology. In this case, a 7-amino-acid TEVp cleavable peptide is often used to connect the proteins to be co-expressed.
However, the above-mentioned methods have their limitations. For example, IRES-mediated translational initiation is less efficient than 5’-cap-mediated initiation and results in uneven protein co-expression (Ha, S.H., Liang, Y.S., Jung, H., Ahn, M.J., Suh, S.C., Kweon, S.J., Kim, D.H., Kim, Y.M., and Kim, J.K. (2010) Application of two bicistronic systems involving 2A and IRES sequences to the biosynthesis of carotenoids in rice endosperm. Plant Biotechnol. J. 8, 928–938; Mizuguchi, H., Xu, Z., Ishii-Watabe, A., Uchida, E. and Hayakawa, T. (2000) IRES-dependent second gene expression is significantly lower than cap-dependent first gene expression in a bicistronic vector. Mol. Ther. 1, 376–382). When a TEVp-based method was used to achieve protein co-expression, the protein expression or accumulation levels were generally low (Marcos, J.F. and Beachy, R.N. (1997) Transgenic accumulation of two plant virus coat proteins on a single self-processing polypeptide. J. Gen. Virol. 78 (Pt 7): 1771–1778).
Therefore, there is still a need for novel approaches for efficiently co-expressing multiple proteins or their subunits, thus facilitating life-science research and industrial applications.
SUMMARY OF THE INVENTION
In this invention, the inventors have found that a Mitochondrial Processing Peptidase (MPP) cleavable sequence can be used to join multiple proteins into fusion proteins, facilitating the co-expression of multiple proteins or their subunits in mitochondria, cytoplasm or cell-free systems. When these fusion proteins are targeted to mitochondria, they can be cleaved by an endogenous MPP located in the mitochondria, and the individual proteins are released and become functional. This method can also be applied to other cellular or non-cellular environments, where the fusion proteins are co-expressed with an exogenous MPP. Precise cleavage of fusion proteins by MPP and efficient co-expression of individual proteins can be achieved in either mitochondria or cytoplasm, whether in prokaryotes or eukaryotes. This MPP-based method achieves functional expression of multiple proteins with a high level of protein accumulation under co-expression conditions.
Accordingly, in the first aspect, the invention relates to a method for co-expressing two or more proteins in a cell or a cell-free system, comprising expressing the two or more proteins as a fusion protein in which at least two adjacent proteins are linked by a linker comprising a Mitochondrial Processing Peptidase (MPP) cleavable region, wherein cleavage of the MPP cleavable region by an MPP in the cell or the cell-free system results in release of the proteins linked by the linker.
In some embodiments, each of the proteins to be co-expressed is linked to the adjacent protein(s) in the fusion protein by a linker comprising an MPP cleavable region or by an uncleavable linker, and wherein the linkers or the MPP cleavable regions between adjacent proteins are the same or different.
In some embodiments, the cell co-expressing one or more proteins endogenously expresses an MPP. In some embodiments, the cell is a eukaryotic cell and the MPP is located at mitochondria of the cell, wherein the fusion protein is located at mitochondria of the cell.
In some embodiments, the cell co-expressing one or more proteins exogenously expresses an MPP. In some embodiments, the cell is a prokaryotic cell or a eukaryotic cell and the MPP is located in the cytoplasm of the cell, and wherein the fusion protein is expressed in the cytoplasm of the cell.
In some embodiments, the prokaryotic cell is a cell of genera Acetobacter, Azotobacter, Bacillus, Enterobacter, Escherichia, Klebsiella, Salmonella or Pseudomonas. In some preferred embodiments, the prokaryotic cell is a cell of Escherichia coli, Azotobacter vinelandii, Klebsiella oxytoca, Klebsiella pneumoniae, Pseudomonas fluorescens, Bacillus subtilis, Pseudomonas protegens, Pseudomonas putida, Pseudomonas veronii, Pseudomonas taetrolens, Pseudomonas balearica, Pseudomonas stutzeri, Pseudomonas aeruginosa, Pseudomonas syringae, Bacillus amyloliquefaciens, Burkholderia phytofirmans, Gluconacetobacter diazotrophicus, Herbaspirillum seropedicae or Bacillus cereus.
In some embodiments, the eukaryotic cell is selected from the group consisting of a fungal cell, an algae cell, an animal cell or a plant cell.
In some embodiments, the fungal cell is a yeast cell. In some embodiments, the yeast is selected from the group consisting of Saccharomyces, Aspergillus, Kluyveromyces, Schizosaccharomyces, Rhizopus, Candida, Yarrowia and Pichia, such as Saccharomyces cerevisiae, Schizosaccharomyces pombe, Kluyveromyces lactis, Kluyveromyces marxianus, Aspergillus terreus, Aspergillus niger, Rhizopus arrhizus, Rhizopus oryzae, Candida albicans, Candida boidinii, Candida sonorensis, Candida tropicalis, Yarrowia lipolytica or Pichia pastoris.
In some embodiments, the cell-free system co-expressing one or more proteins comprises an effective amount of the MPP capable of cleaving the fusion protein at the MPP cleavable region.
In some embodiments, MPP is derived from yeast, arabidopsis, tobacco, rice or algae, such as Saccharomyces cerevisiae, Arabidopsis thaliana, Nicotiana tabacum, Oryza sativa or Chlamydomonas reinhardtii. In some preferred embodiments, MPP is derived from Saccharomyces cerevisiae. In some preferred embodiments, MPP is derived from Saccharomyces cerevisiae W303-1a strain. In some preferred embodiments, MPP is derived from Saccharomyces cerevisiae S288C strain.
In some embodiments, the linker or the MPP cleavable region comprises one or more MPP cleavable sequences. In some embodiments, the linker or the MPP cleavable region comprises two or more MPP cleavable sequences arranged in tandem, and wherein the MPP cleavable sequences within the same linker or MPP cleavable region are the same or different.
In some embodiments, each of the MPP cleavable sequences independently comprises a wild type amino acid sequence, a fragment or a variant thereof that is cleavable by MPP, or an artificially designed amino acid sequence that is cleavable by MPP.
In some embodiments, the wild type amino acid sequence is a wild type mitochondrial signal peptide which is responsible for targeting a mitochondrial endogenous protein into mitochondria. In some embodiments, the mitochondrial signal peptide is derived from a fungal cell, an algae cell, an animal cell or a plant cell. In some embodiments, the mitochondrial signal peptide is derived from a mitochondrial endogenous protein selected from the group consisting of: the subunit 9 of the F0 ATPase of Neurospora crassa, COX4 of Saccharomyces cerevisiae, HSP60/YLR259c of Saccharomyces cerevisiae, SSC1/YJR045c of Saccharomyces cerevisiae, CYB2/YML054C of Saccharomyces cerevisiae, mitochondrial ATP synthase of Nicotiana plumbaginifolia (encoded by atp2-1 gene), mitochondrial import receptor subunit TOM20 of Arabidopsis thaliana and 2-oxoglutarate dehydrogenase subunit E1 of Arabidopsis thaliana.
In some preferred embodiments, the MPP cleavable sequence is derived from the subunit 9 of the F0 ATPase of Neurospora crassa. In some preferred embodiments, the MPP cleavable sequence comprises the amino acids of any one of SEQ ID NOs: 67-81.
In some embodiments, the artificially designed amino acid sequence comprises a formula as follows:
X1-X2,
wherein X1 comprises RQAFQKRA or RQAFQRRA, and X2 comprises YSS, FHT or FST. In some embodiments, X1 is RQAFQKRA and X2 is FHT or FST. In some preferred embodiments, the artificially designed amino acid sequence is RQAFQKRAFHT (SEQ ID NO: 72) or RQAFQKRAFST (SEQ ID NO: 73).
In some embodiments, the artificially designed amino acid sequence comprises a formula as follows:
X3-X2,
wherein X3 comprises R(G)nKRA or R(G)nRRA, and X2 comprises YSS, FHT or FST, and wherein n=1, 2, 3, 4, 5 or 6. In some embodiments, X3 is R(G)nRRA and X2 is FHT or FST, and wherein n=2 or 3. In some preferred embodiments, the artificially designed amino acid sequence is selected from the group consisting of RGGRRAFHT (SEQ ID NO: 78), RGGGRRAFHT (SEQ ID NO: 79), RGGRRAFST (SEQ ID NO: 81) and RGGGRRAFST (SEQ ID NO: 80).
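The two formulas above define a small, enumerable family of sequences. The following Python sketch lists that family so the preferred sequences can be checked against it. It reads "comprises" narrowly (each X is taken to be exactly one of the listed options), and the function names are illustrative only and do not come from the application.

```python
from itertools import product

# Building blocks quoted from the formulas X1-X2 and X3-X2.
X1_OPTIONS = ("RQAFQKRA", "RQAFQRRA")
X2_OPTIONS = ("YSS", "FHT", "FST")


def x3_options(n_values=range(1, 7)):
    """Return R(G)nKRA and R(G)nRRA for every allowed n (1 to 6)."""
    return tuple(f"R{'G' * n}{basic}RA" for n in n_values for basic in ("K", "R"))


def designed_sequences():
    """Enumerate every X1-X2 and X3-X2 combination (narrow reading)."""
    heads = X1_OPTIONS + x3_options()
    return [head + tail for head, tail in product(heads, X2_OPTIONS)]


def tandem(site, copies=2):
    """Arrange identical cleavable sequences in tandem within one linker."""
    return site * copies


if __name__ == "__main__":
    for seq in designed_sequences():
        print(seq)
    print(tandem("RGGGRRAFHT"))
```

Under this narrow reading, the six preferred sequences (SEQ ID NOs: 72, 73 and 78-81) are all members of the enumerated set.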
In some embodiments, the two or more proteins to be co-expressed are functionally related, preferably the two or more proteins are involved in the same biosynthetic pathway or metabolic pathway, participating in the same signaling pathway, or belonging to the same complex biological system.
In some embodiments, the complex biological system is the nitrogenase system. In some embodiments, the two or more proteins are selected from the group consisting of NifH, NifD, NifK, NifY, NifE, NifN, NifB, NifU, NifS, NifV, NifM, NifJ, NifF, NifW, NifZ, NifT, NifX and NifQ. In some preferred embodiments, the fusion protein comprises NifH-cleav-NifD-cleav-NifK, NifE-cleav-NifN~NifB, NifJ-cleav-NifV-cleav-NifW, NifF-cleav-NifM-cleav-NifY, and optionally, NifU-cleav-NifS, wherein “cleav” is the linker comprising the MPP cleavable region, and wherein “~” represents a flexible peptide. In some preferred embodiments, “~” represents “GGGGSGGGGSGGGGS”.
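The “cleav” and “~” notation can be read mechanically. The sketch below parses an architecture string and lists which products would be expected after every cleavable linker has been cut, with proteins joined by “~” staying together. The function names are hypothetical, and the sketch assumes complete cleavage at every “cleav” position.

```python
import re


def parse_architecture(notation):
    """Split e.g. 'NifE-cleav-NifN~NifB' into protein names and joint types."""
    tokens = re.split(r"(-cleav-|~)", notation)
    proteins = tokens[0::2]
    joints = ["cleavable" if t == "-cleav-" else "flexible" for t in tokens[1::2]]
    return proteins, joints


def released_products(notation):
    """Group proteins that remain joined once all cleavable linkers are cut."""
    proteins, joints = parse_architecture(notation)
    products = [[proteins[0]]]
    for joint, protein in zip(joints, proteins[1:]):
        if joint == "cleavable":
            products.append([protein])   # a cut starts a new product
        else:
            products[-1].append(protein)  # flexible peptide keeps them fused
    return ["~".join(group) for group in products]


if __name__ == "__main__":
    for arch in ("NifH-cleav-NifD-cleav-NifK", "NifE-cleav-NifN~NifB"):
        print(arch, "->", released_products(arch))
```

For NifE-cleav-NifN~NifB this yields two products, NifE and the NifN~NifB fusion, which matches the use of “~” as an uncleaved flexible peptide.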
In some embodiments, the biosynthetic pathway is violacein biosynthetic pathway. In some embodiments, the two or more proteins are selected from the group consisting of VioA, VioB, VioC, VioD and VioE. In some preferred embodiments, the fusion protein comprises VioA-cleav-VioB-cleav-VioE and VioD-cleav-VioC, or VioA-cleav-VioB-cleav-VioE-cleav-VioD-cleav-VioC, wherein “cleav” is the linker comprising the MPP cleavable region.
In some embodiments, the biosynthetic pathway is isobutanol biosynthetic pathway. In some embodiments, the two or more proteins are selected from the group consisting of ILV2, ILV3, ILV5, ARO10 and ADHA. In some embodiments, the fusion protein comprises:
(a) ILV5-cleav-ILV3-cleav-ILV2 and ARO10-cleav-ADHA;
(b) ILV3-cleav-ILV5-cleav-ILV2 and ARO10-cleav-ADHA; or
(c) ILV3-cleav-ILV5-cleav-ILV2 and ADHA-cleav-ARO10;
wherein “cleav” is the linker comprising the MPP cleavable region.
In a second aspect, the invention provides an MPP cleavable sequence comprising a wild type amino acid sequence, a fragment or a variant thereof that is cleavable by MPP, or an artificially designed amino acid sequence that is cleavable by MPP.
In some embodiments, the wild type amino acid sequence is a wild type mitochondrial signal peptide which is responsible for targeting a mitochondrial endogenous protein into mitochondria. In some embodiments, the mitochondrial signal peptide is derived from a fungal cell, an algae cell, an animal cell or a plant cell. In some embodiments, the mitochondrial signal peptide is derived from a mitochondrial endogenous protein selected from the group consisting of: the subunit 9 of the F0 ATPase of Neurospora crassa, COX4 of Saccharomyces cerevisiae, HSP60/YLR259c of Saccharomyces cerevisiae, SSC1/YJR045c of Saccharomyces cerevisiae, CYB2/YML054C of Saccharomyces cerevisiae, mitochondrial ATP synthase of Nicotiana plumbaginifolia (encoded by atp2-1 gene), mitochondrial import receptor subunit TOM20 of Arabidopsis thaliana and 2-oxoglutarate dehydrogenase subunit E1 of Arabidopsis thaliana.
In some preferred embodiments, the MPP cleavable sequence is derived from the subunit 9 of the F0 ATPase of Neurospora crassa. In some preferred embodiments, the MPP cleavable sequence comprises the amino acids of any one of SEQ ID NOs: 67-81.
In some embodiments, the artificially designed amino acid sequence comprises a formula as follows:
X1-X2,
wherein X1 comprises RQAFQKRA or RQAFQRRA, and X2 comprises YSS, FHT or FST. In some embodiments, X1 is RQAFQKRA and X2 is FHT or FST. In some preferred embodiments, the artificially designed amino acid sequence is RQAFQKRAFHT (SEQ ID NO: 72) or RQAFQKRAFST (SEQ ID NO: 73).
In some embodiments, the artificially designed amino acid sequence comprises a formula as follows:
X3-X2,
wherein X3 comprises R(G)nKRA or R(G)nRRA, and X2 comprises YSS, FHT or FST, and wherein n=1, 2, 3, 4, 5 or 6. In some embodiments, X3 is R(G)nRRA and X2 is FHT or FST, and wherein n=2 or 3. In some preferred embodiments, the artificially designed amino acid sequence is selected from the group consisting of RGGRRAFHT (SEQ ID NO: 78), RGGGRRAFHT (SEQ ID NO: 79), RGGRRAFST (SEQ ID NO: 81) and RGGGRRAFST (SEQ ID NO: 80).
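To check whether a candidate linker fits the designed-sequence formulas, the two formulas can be folded into one regular expression. This is a sketch only: the pattern takes each X as exactly one of the listed options, and the helper names are hypothetical. A match says nothing about whether MPP actually cleaves the sequence.

```python
import re

# R + (QAFQ | 1-6 glycines) + (K | R) + RA + (YSS | FHT | FST)
DESIGNED_SITE = re.compile(r"R(?:QAFQ|G{1,6})[KR]RA(?:YSS|FHT|FST)")


def is_designed_site(seq):
    """True if seq is exactly one X1-X2 or X3-X2 sequence."""
    return DESIGNED_SITE.fullmatch(seq) is not None


def count_tandem_sites(linker):
    """Count back-to-back designed sites; 0 if the linker is not made only of them."""
    pos = 0
    count = 0
    while pos < len(linker):
        match = DESIGNED_SITE.match(linker, pos)
        if match is None:
            return 0
        pos = match.end()
        count += 1
    return count


if __name__ == "__main__":
    print(is_designed_site("RGGGRRAFST"))
    print(count_tandem_sites("RGGGRRAFHTRGGGRRAFHT"))
```

A variant in which the two arginines before the final alanine are replaced, such as RGGGAAAFST, falls outside the pattern.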
In a third aspect, the invention provides a fusion protein comprising two or more proteins to be co-expressed in a cell or a cell-free system, wherein at least two adjacent proteins in the fusion protein are linked by a linker comprising an MPP cleavable region, wherein the linkers or the MPP cleavable regions between adjacent proteins are the same or different.
In some embodiments, the linker or the MPP cleavable region comprises one or more of the MPP cleavable sequence according to the second aspect of the invention, wherein the MPP cleavable sequences within the same linker or MPP cleavable region are the same or different.
In some embodiments, the two or more proteins to be co-expressed are functionally related, preferably the two or more proteins are involved in the same biosynthetic pathway or metabolic pathway, participating in the same signaling pathway, or belonging to the same complex biological system.
In some embodiments, the complex biological system is the nitrogenase system. In some embodiments, the two or more proteins are selected from the group consisting of NifH, NifD, NifK, NifY, NifE, NifN, NifB, NifU, NifS, NifV, NifM, NifJ, NifF, NifW, NifZ, NifT, NifX and NifQ. In some preferred embodiments, the fusion protein comprises NifH-cleav-NifD-cleav-NifK, NifE-cleav-NifN~NifB, NifJ-cleav-NifV-cleav-NifW, NifF-cleav-NifM-cleav-NifY, and optionally, NifU-cleav-NifS, wherein “cleav” is the linker comprising the MPP cleavable region, and wherein “~” represents a flexible peptide. In some preferred embodiments, “~” represents “GGGGSGGGGSGGGGS”.
In some embodiments, the biosynthetic pathway is the violacein biosynthetic pathway. In some embodiments, the two or more proteins are selected from the group consisting of VioA, VioB, VioC, VioD and VioE. In some preferred embodiments, the fusion protein comprises VioA-cleav-VioB-cleav-VioE and VioD-cleav-VioC, or VioA-cleav-VioB-cleav-VioE-cleav-VioD-cleav-VioC, wherein “cleav” is the linker comprising the MPP cleavable region.
In some embodiments, the biosynthetic pathway is isobutanol biosynthetic pathway. In some embodiments, the two or more proteins are selected from the group consisting of ILV2, ILV3, ILV5, ARO10 and ADHA. In some preferred embodiments, the fusion protein comprises:
(a) ILV5-cleav-ILV3-cleav-ILV2 and ARO10-cleav-ADHA;
(b) ILV3-cleav-ILV5-cleav-ILV2 and ARO10-cleav-ADHA; or
(c) ILV3-cleav-ILV5-cleav-ILV2 and ADHA-cleav-ARO10;
wherein “cleav” is the linker comprising the MPP cleavable region.
In a fourth aspect, the invention provides a nucleic acid encoding the MPP cleavable sequence according to the second aspect of the invention or the fusion protein according to the third aspect of the invention.
In some embodiments, the nucleic acid is a linear DNA fragment or an mRNA fragment. In some embodiments, the linear DNA fragment is free in the cytoplasm of a host cell or can be integrated into the genome of the host cell.
In a fifth aspect, the invention provides a vector comprising one or more of the nucleic acids according to the fourth aspect of the invention.
In some embodiments, the vector is free in the cytoplasm of a host cell or can be integrated into the genome of the host cell. In some embodiments, the vector is a DNA plasmid vector, a viral vector, a bacterial vector, a cosmid, or an artificial chromosome.
In a sixth aspect, the invention provides a cell comprising the MPP cleavable sequence according to the second aspect of the invention, the fusion protein according to the third aspect of the invention, the nucleic acid according to the fourth aspect of the invention or the vector according to the fifth aspect of the invention, wherein the cell is a prokaryotic cell or a eukaryotic cell.
In some embodiments, the prokaryotic cell is a cell of genera Acetobacter, Azotobacter, Bacillus, Enterobacter, Escherichia, Klebsiella, Salmonella or Pseudomonas. In some embodiments, the prokaryotic cell is a cell of Escherichia coli, Azotobacter vinelandii, Klebsiella oxytoca, Klebsiella pneumoniae, Pseudomonas fluorescens, Bacillus subtilis, Pseudomonas protegens, Pseudomonas putida, Pseudomonas veronii, Pseudomonas taetrolens, Pseudomonas balearica, Pseudomonas stutzeri, Pseudomonas aeruginosa, Pseudomonas syringae, Bacillus amyloliquefaciens, Burkholderia phytofirmans, Gluconacetobacter diazotrophicus, Herbaspirillum seropedicae or Bacillus cereus.
In some embodiments, the eukaryotic cell is selected from the group consisting of a fungal cell, an algae cell, an animal cell or a plant cell.
In some embodiments, the fungal cell is a yeast cell. In some embodiments, the yeast is selected from the group consisting of Saccharomyces, Aspergillus, Kluyveromyces, Schizosaccharomyces, Rhizopus, Candida, Yarrowia and Pichia, such as Saccharomyces cerevisiae, Schizosaccharomyces pombe, Kluyveromyces lactis, Kluyveromyces marxianus, Aspergillus terreus, Aspergillus niger, Rhizopus arrhizus, Rhizopus oryzae, Candida albicans, Candida boidinii, Candida sonorensis, Candida tropicalis, Yarrowia lipolytica or Pichia pastoris. In some preferred embodiments, the yeast is Saccharomyces cerevisiae. In some preferred embodiments, the yeast is Saccharomyces cerevisiae W303-1a strain. In some preferred embodiments, the yeast is Saccharomyces cerevisiae S288C strain.
DESCRIPTION OF THE FIGURES
Figure 1: Vectors for hierarchical assembly of multiple parts. The type IIS restriction enzymes BsaI and BpiI were used for hierarchical Golden Gate assembly. Level 0 vectors were used for carrying expression elements, such as promoters, CDSs or terminators. Level 1 vectors were used for carrying individual expression cassettes. Level 2 vectors were used for assembling sub-gene clusters. Level 3 vectors were used for assembling complete gene clusters.
Figure 2: Schematic diagram showing the procedure for assembling the Nif polyprotein gene cluster. The nifHDK, nifUS, nifFMY, nifJVW, and nifENB expression cassettes were assigned to the Level 1 vectors 1, 2, 3, 8, and 10, respectively. The MBP-TEVp expression cassette (a maltose binding protein was fused to TEVp to enhance the solubility of the protein) or Gap9 (a random sequence) was assigned to Level 1 vector 9. These expression cassettes or gap sequences were assembled in Level 2 vectors to form sub-gene clusters. The sub-gene clusters were further assembled in a Level 3 vector to form a complete Nif polyprotein gene cluster.
Figure 3A-3B: Engineering and assessment of the TEVp-based Nif polyprotein system in yeast mitochondria. (A) Schematic diagram of TEVp-based nif constructs for expression in mitochondria of the Saccharomyces cerevisiae W303-1a strain. Symbol “ǒ” is used to represent dual TEVp sites (ENLYFQSENLYFQS). The Leu2 marker is used for auxotrophic selection of transformants. Constructs were integrated at the YNRCΔ9 locus on chromosome XIV of the yeast genome with ~600 bp 5’YNRCΔ9 and 3’YNRCΔ9 flanking regions as homology arms. (B) Immunoblotting of Nif proteins from yeast carrying the TEVp-based Nif polyprotein system. Ec Nif indicates protein samples prepared from E. coli cells carrying an operon-based nif system, in which each Nif protein is translated independently. Mit Isolations indicates protein samples prepared from mitochondrial extracts. HSP60 was used as an internal reference.
Figure 4A-4B: Growth curves of yeast strains carrying the TEVp-based Nif polyprotein system. Sc_535 is a yeast strain transformed with an empty vector (EV) carrying the Leu2 selection marker, used as a control. Sc_410 and Sc_411 are yeast strains carrying constructs pNG410 and pNG411 indicated in Figure 3A, respectively. “Growth rate” represents the relative maximum growth rate of each strain. The maximum growth rate of strain Sc_535 was assigned as 100%. For panel (A), 2% glucose was added initially. For panel (B), 0.4% glucose was used for pre-growth and 1.6% galactose was used for inducing protein expression.
Figure 5: Schematic diagram showing plasmid constructs and design procedure for testing the cleavage efficiency of reconstituted MPP in E. coli. GFP-RFP fusions linked by MPP cleavage sites were expressed from the σ54-dependent PnifH promoter from Klebsiella oxytoca and activated by NifA constitutively expressed from the Ptet promoter. MPP (α subunit labeled with HA-tag and β subunit labeled with His-tag) from the Saccharomyces cerevisiae W303-1a strain was controlled by the inducible Ptac/lacO promoter. The diamond symbol represents the MPP processing site.
Figure 6A-6E: Screening for minimal MPP processing sites. (A) Sequences of the designed MPP processing sites. “AA” is short for amino acid. The symbol “↓” represents the proposed processing site of MPP. Panels (B), (C), (D), and (E) show immunoblotting assays to determine the cleavage efficiency of the various MPP processing sites listed in (A). The symbol “-” indicates the absence of the MPP expression module; “+” indicates the presence of the MPP expression module induced with 100 μM IPTG.
Figure 7: Assessment of the cleavage efficiency of MPP processing sites in mitochondria of S. cerevisiae W303-1a strain. The “2×” symbol indicates that tandem MPP processing sites were used. HSP60 was used as an internal reference for mitochondrial specificity.
Figure 8A-8E: Immunoblotting of S10 and S10S linked Nif polyproteins expressed in E. coli. In each case, a single MPP-based giant gene was co-transformed with a plasmid containing the remaining nif genes in the operon-based system and grown under nitrogen-fixing conditions. Samples were collected immediately after the acetylene reduction assay (ARA) and subjected to immunoblotting with Nif protein specific antibodies. “2× S10” indicates that dual S10 sites (RGGGRRAFHTRGGGRRAFHT) were used and “2× S10S” indicates that dual S10S sites (RGGGRRAFSTRGGGRRAFST) were used. Ec Nif indicates protein samples prepared from E. coli cells carrying an operon-based nif system, with or without MPP. “-” indicates that the MPP coding sequence was absent from the strain and “+” indicates that expression of MPP was induced with 100 μM IPTG. Polyproteins tested were: (A) […] or […]; (B) NifUǐǐS or […]; (C) NifHǐǐDǐǐK or […]; (D) […] or […]; and (E) NifFǐǐMǐǐY or […], where […] stands for a construct name that appeared as an inline image and is not reproduced here. Symbol “ǐǐ” represents dual S10 sites (RGGGRRAFHTRGGGRRAFHT) and a second symbol (not reproduced here) represents dual S10S sites (RGGGRRAFSTRGGGRRAFST).
Figure 9A-9B: Nitrogenase activity of the MPP-based polyprotein system. (A) Schematic diagram showing the gene arrangement of the TEVp-based polyprotein system (Ver 1.0) and the MPP-based polyprotein systems (Ver 2.0 and Ver 2.1). Symbols “ǒ”, “ǐǐ” and a third symbol (not reproduced here) are used to represent dual TEVp sites, dual MPP S10 sites and dual S10S sites, respectively. In each case, ARA activities were normalized to the activity exhibited by the TEVp-based polyprotein system (Ver 1.0), which was assigned as 100%. Error bars indicate the SD observed from at least two biological replicates. (B) Diazotrophic growth promoted by TEVp-based and MPP-based polyprotein systems in E. coli. WT represents the reconstituted operon-based nif system. EV represents the empty vector (pBDS1549) used as a negative control. Ver 1.0 and Ver 2.0 represent the TEVp-based and MPP-based polyprotein systems as shown in (A), respectively.
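The normalization described for Figure 9A (activity expressed as a percentage of the Ver 1.0 reference, with SD across replicates) can be sketched as follows. The function name is hypothetical, the numbers in the usage lines are arbitrary placeholders rather than measured values, and the use of the sample standard deviation is an assumption, since the text does not say which SD estimator was applied.

```python
from statistics import mean, stdev


def normalize_to_reference(replicates, reference_replicates):
    """Express replicate activities as percent of the reference mean.

    Returns (mean percent, sample SD of the percents). At least two
    replicates are required, as stdev is undefined for a single value.
    """
    ref_mean = mean(reference_replicates)
    percents = [100.0 * value / ref_mean for value in replicates]
    return mean(percents), stdev(percents)


if __name__ == "__main__":
    # Placeholder activities in arbitrary units, NOT data from the figures.
    reference = [1.0, 3.0]
    test_system = [2.0, 4.0]
    print(normalize_to_reference(reference, reference))
    print(normalize_to_reference(test_system, reference))
```

By construction the reference system normalized against itself has a mean of 100%.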
Figure 10: Efficient cleavage of polyprotein by MPP in yeast mitochondria. The polyprotein was expressed from the PScGAL1 promoter and targeted to mitochondria of yeast strain Sc_3682 with the help of the Su9 signal peptide. Ec Nif indicates protein samples prepared from E. coli cells carrying an operon-based nif system; Sc_535 is a yeast strain transformed with the empty vector pBDS535, which was used as a negative control. “Mit Isolations” indicates protein samples prepared from mitochondrial extracts. ImageJ software was used for protein quantification, and relative expression levels are shown in red (in parentheses below each lane, displayed as a percentage relative to Ec Nif).
Figure 11A-11B: Assessment of MPP-based Nif polyprotein expression in yeast mitochondria. (A) Schematic diagram showing parts used to create constructs pBDS3752 and pBDS3942. The only difference between these two constructs is the presence of promoter variants in pBDS3942, highlighted in red. Giant genes are highlighted in blue, and the symbol “ǐǐ” and a second symbol (not reproduced here) are used to represent dual S10 sites and dual S10S sites, respectively. The selection marker and integration site are the same as described in Figure 3A. (B) Immunoblotting of yeast strains carrying MPP-based Nif polyprotein systems. One representative protein from each polyprotein was selected for the immunoblot assay. HSP60 was used as an internal reference.
Figure 12A-12C: Comparative growth analysis of yeast strains carrying MPP-based and TEVp-based Nif polyprotein systems. (A) Growth curves of strains grown in YPD (initial concentration of glucose was 2%). (B) Growth curves of strains grown in YPDG (initial concentrations of glucose and galactose were 0.4% and 1.6%, respectively). (C) Table showing the strain information and the maximum growth rate of each strain in panels (A) and (B). a. The maximum growth rate of the Sc_535 strain was assigned as 100%. b. Indicates SD values lower than 0.5.
Figure 13: Assembly of violacein and isobutanol biosynthetic pathways for expression in S. cerevisiae mitochondria. Symbol “ǐǐ” is used to represent dual MPP S10 sites.
Figure 14A-14C: Construction of the violacein biosynthesis pathway in mitochondria of the S. cerevisiae S288C strain using the MPP-based polyprotein strategy. (A) Schematic diagram showing the violacein biosynthesis pathway and gene arrangement in polyproteins. (B) Petri dish experiment displaying the violet pigment biosynthesized by MPP-based Vio polyproteins in yeast mitochondria. I: yeast strain transformed with the empty vector (Sc_305). II: yeast strain transformed with the MPP-based vioAǐǐBǐǐE and vioDǐǐC polyprotein system carrying dual S10 sites (Sc_297), in which five violacein biosynthesis genes were assembled as two polyproteins encoded by giant genes. III: yeast strain transformed with the MPP-based vioAǐǐBǐǐE and vioDǐǐC polyprotein system carrying non-cleavable S10 site variants (Sc_299). The symbol used for this construct (not reproduced here) represents dual S10 site variants, in which the arginine residues at positions -2 and -3 were replaced by two alanine residues (RGGGAAAFSTRGGGAAAFST), to provide a negative control. IV: yeast strain transformed with the MPP-based vioAǐǐBǐǐEǐǐDǐǐC polyprotein system with tandem S10 sites (Sc_300), in which five violacein biosynthesis genes were assembled as a giant gene encoding a single polyprotein. (C) Immunoblotting of yeast strains carrying the MPP-based violacein biosynthesis pathway used in (B). The N-terminal His tagged VioE was selected for the immunoblot assay to determine the cleavage efficiency. HSP60 was used as an internal reference.
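The non-cleavable control in Figure 14 can be reproduced from a cleavable site by substituting the residues at positions -2 and -3. The sketch below assumes that the processing bond lies immediately before the X2 tripeptide (FST, FHT or YSS), so that position -1 is the alanine preceding it. That placement is inferred from the -2/-3 numbering and the quoted control sequence, not taken from Figure 6A, and the function name is hypothetical.

```python
X2_TAILS = ("YSS", "FHT", "FST")


def knock_out_arginines(site):
    """Replace residues at positions -2 and -3 with alanine.

    Positions are counted upstream from the ASSUMED processing bond,
    which is placed directly before the X2 tripeptide.
    """
    for tail in X2_TAILS:
        if site.endswith(tail):
            cut = len(site) - len(tail)  # index of the first residue after the bond
            break
    else:
        raise ValueError("site does not end in a known X2 tripeptide")
    residues = list(site)
    for offset in (2, 3):
        residues[cut - offset] = "A"
    return "".join(residues)


if __name__ == "__main__":
    variant = knock_out_arginines("RGGGRRAFST")
    print(variant * 2)  # dual (tandem) variant sites
```

Applied to RGGGRRAFST and duplicated, this gives the control sequence quoted in the figure legend.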
Figure 15A-15B: Constructing of isobutanol biosynthesis pathway in mitochondria of S. cerevisiae S288C strain using MPP-based polyprotein strategy. (A) Schematic diagram showing genes in isobutanol biosynthesis pathway and the combinatorial strategy for constructing polyproteins to optimize isobutanol biosynthesis in yeast mitochondria. (B) Isobutanol production from MPP-based polyprotein combinations shown in (A) . EV indicates empty vector, which was used to assess the native level of isobutanol synthesized in yeast. Polyprotein combinations with the highest isobutanol production level are shown in red (4a and 6b combinations) .
DETAILED DESCRIPTION OF THE INVENTION
The “fusion and cut” feature of polyprotein or fusion protein strategy enables co-expression of protein components in prokaryotes, eukaryotes or cell-free systems at stoichiometric levels, thus enabling the engineering of protein complexes and intricate biochemical pathways requiring balanced gene expression. This strategy also dramatically reduces the number of expression parts for synthetic biology, thus decreasing the overall DNA cargo load and the requirement to diversify promoters and terminators in order to avoid homologous recombination. Although the complexity of eukaryotic gene regulation and the consequent DNA burden can be reduced through the deployment of minimal orthologous synthetic promoters and terminators, there is a lack of well-characterized expression parts in higher eukaryotes, especially in plants. The polyprotein or fusion protein strategy therefore has significant advantages for engineering complex biological pathways in eukaryotes, since it not only provides balanced protein expression, but also decreases the combinatorial  complexity when selecting suitable expression parts for engineering multiple protein-coding sequences.
The present invention provides an MPP-based method for efficiently co-expressing multiple proteins or their subunits, whether in prokaryotes, eukaryotes or cell-free systems. This method relies on a linker comprising an MPP cleavable region, which is used to link two or more proteins into fusion proteins. The MPP cleavable region can be recognized and cleaved in the presence of an MPP, leading to the release of the individual proteins within the fusion proteins. Therefore, this method can be applied to protein co-expression in mitochondria, where an intrinsic MPP exists. Alternatively, this method can be applied to other cellular or non-cellular environments, where the fusion proteins are co-expressed with an exogenous MPP.
Furthermore, the strategy disclosed herein could also be generalized to other co-expression methods involving a specific processing peptidase of other organelles, such as chloroplast, endoplasmic reticulum (ER) and vacuole, since all these organelles utilize a specific processing peptidase for signal peptide cleavage (Owji, H., Nezafat, N., Negahdaripour, M., Hajiebrahimi, A. and Ghasemi, Y. (2018) A comprehensive review of signal peptides: Structure, roles, and applications. Eur. J. Cell Biol. 97 (6): 422-441). A person skilled in the art will recognize that the signal peptides specific to these organelles could be employed to screen for a useful peptidase cleavable sequence according to the method as disclosed herein. Therefore, co-expression methods involving the utilization of other organelle-specific peptidases fall within the scope of the present disclosure.
Definitions
Unless otherwise defined herein, scientific and technical terms used in connection with the present disclosure shall have the meanings that are commonly understood by those of ordinary skill in the art. In case of conflict, the present specification, including definitions, will control. Further, unless otherwise  required by context, singular terms shall include pluralities and plural terms shall include the singular.
As used herein, the term “nucleic acid” or “polynucleotide” , in its broadest sense, refers to any compound and/or substance that is or can be incorporated into an oligonucleotide chain. In some embodiments, a nucleic acid is a compound and/or substance that is or can be incorporated into an oligonucleotide chain via a phosphodiester linkage. In some embodiments, “nucleic acid” refers to an individual nucleic acid residue (e.g., a nucleotide and/or nucleoside) ; in some embodiments, “nucleic acid” refers to an oligonucleotide chain comprising individual nucleic acid residues. A “nucleic acid” may be, for example, double-stranded, partially double-stranded, or single-stranded. As single-stranded nucleic acid, the nucleic acid may be the sense or antisense strand. A “nucleic acid” may be circular or linear. As used herein, the term “nucleic acid” encompasses DNA and RNA, including genomes, pre-mRNA, mRNA, cDNA, recombinant or synthetic nucleic acids including vectors. In some embodiments, a “nucleic acid” is or comprises RNA; in some embodiments, a “nucleic acid” is or comprises DNA. In some embodiments, a nucleic acid is, comprises, or consists of one or more natural nucleic acid residues. In some embodiments, a nucleic acid is, comprises, or consists of one or more nucleic acid analogs.
The term “coding sequence” means a polynucleotide encoding the amino acid sequence of a protein or polypeptide. The boundaries of a coding sequence are generally determined by an open reading frame, which begins with a start codon (such as ATG, GTG or TTG) and ends with a stop codon (such as TAA, TAG or TGA) . The coding sequence may be derived from genomic DNA, synthetic DNA, or a combination thereof. A skilled person in the art will recognize that due to the degeneracy of the genetic code, several nucleic acids may encode polypeptides having the same amino acid sequence. A codon preference table suitable for the target host cell may be used to modify the codons in the coding sequence of the protein to obtain optimal expression in a particular host cell, such as a prokaryotic cell or a eukaryotic cell. Codon preferences in  various hosts (for example, in E. coli, yeast, Arabidopsis, tobacco, maize, insect, mouse, rat, human, etc. ) are well-known in the art.
The terms “protein” and “polypeptide” are used interchangeably herein and generally refer to polymers of amino acid residues linked by peptide bonds, which have a specific function and/or an independent three-dimensional structure. “Protein” and “polypeptide” encompass full-length proteins and fragments thereof. The terms also include post-expression modifications of a protein or polypeptide, such as glycosylation, acetylation, phosphorylation, and the like. In some circumstances, a “protein” or “polypeptide” can be further divided into functional subunits or functional fragments that have independent and distinct functions. Therefore, these functional subunits or functional fragments can also be deemed “proteins” or “polypeptides” and can be expressed independently. In addition, for the purposes of the present invention, “protein” and “polypeptide” also refer to variants obtained after modification, such as deletion, addition, insertion, and substitution (such as conservative amino acid substitutions), of the amino acid sequence of a wild type protein or polypeptide.
As used herein, the term “wild type” refers to a nucleic acid, an amino acid sequence or a protein that naturally occurs in an organism. A variant has at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99% sequence identity to a wild type nucleic acid, a wild type amino acid sequence or a wild type protein, provided that the variant retains the original function or activity of the wild type nucleic acid, the wild type amino acid sequence or the wild type protein.
Percent identity between two sequences is a function of the number of identical positions shared by the two sequences being compared, taking into account the number of gaps, and the length of each gap, that need to be introduced for optimal alignment of the two sequences. Comparison of sequences and determination of percent identity between two sequences can be accomplished using a mathematical algorithm. For example, percent identity between two nucleotide or amino acid sequences can be determined using the algorithm of Myers and Miller (CABIOS, 1988, 4: 11-17, which is herein incorporated by reference in its entirety), which has been incorporated into the ALIGN program (version 2.0). Other examples of such mathematical algorithms include the local homology algorithm of Smith et al. (1981) Adv. Appl. Math. 2: 482, the homology alignment algorithm of Needleman and Wunsch (1970) J. Mol. Biol. 48: 443-453, the method for searching homology of Pearson and Lipman (1988) Proc. Natl. Acad. Sci. 85: 2444-2448, a modified version of the algorithm of Karlin and Altschul (1990) Proc. Natl. Acad. Sci. USA 87: 2264 and the algorithm described in Karlin and Altschul (1993) Proc. Natl. Acad. Sci. USA 90: 5873-5877. By using a program based on such a mathematical algorithm, sequence comparisons (i.e., alignments) for determining sequence identity can be performed. The program can be appropriately executed by a computer. Examples of such programs include, but are not limited to, CLUSTAL of the PC/Gene program, the ALIGN program (Version 2.0), and GAP, BESTFIT, BLAST, FASTA, and TFASTA of the Wisconsin Genetics software package. Alignment using these programs may be performed, for example, by using default parameters.
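By way of non-limiting illustration, percent identity over an existing pairwise alignment may be computed as sketched below. The function name `percent_identity` and the choice of denominator (the full alignment length, so that gap columns lower the score) are assumptions made for this example only; the alignment programs listed above may apply other conventions, and this sketch does not perform the alignment itself.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over the columns of a pairwise alignment.

    Assumption (illustrative only): identity = identical non-gap columns
    divided by the total alignment length, so every gap column counts
    against the score. Both inputs must already be aligned.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap
    )
    return 100.0 * matches / len(aligned_a)


# Two 11-residue sequences differing at a single position.
score = percent_identity("RQAFQKRAFHT", "RQAFQKRAFST")
```

Under this convention a variant would satisfy an "at least 90% sequence identity" threshold when the returned value is 90.0 or higher.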
As used herein, the term “expression” refers to the transcription and/or translation of a gene so that a nucleic acid chain or an amino acid chain is synthesized. A protein is said to be “expressed” or “to be expressed” in the cytoplasm or an organelle of a cell when the protein is ultimately located in that compartment and functions there.
The terms “co-express”, “co-expressing” and “co-expression” can be used interchangeably herein and mean that two or more proteins or their subunits are expressed or required to be expressed simultaneously in a cell or a cell-free system. The co-expressed proteins can be located in the same subcellular compartment, such as the cytoplasm or an organelle (e.g., mitochondrion, chloroplast, endoplasmic reticulum (ER), Golgi, and vacuole). The co-expressed proteins can also be located in different subcellular compartments, such as the cytoplasm and/or organelle(s) (e.g., mitochondrion, chloroplast, endoplasmic reticulum (ER), Golgi, and vacuole). For example, the circumstance that protein A is expressed in mitochondria and protein B is expressed in mitochondria belongs to the scope of co-expression, as long as the co-expressed proteins are present in a cell or a cell-free system at the same time. Furthermore, the co-expressed proteins can be targeted to the cell membrane or secreted into the extracellular environment.
The terms “fusion protein” and “polyprotein” can be used interchangeably herein. As used herein, “fusion protein” or “polyprotein” refers to a polymer of individual proteins in which the C-terminal of an upstream protein is linked with the N-terminal of a downstream protein by a stretch of amino acid residues. The coding sequence of one fusion protein or one polyprotein which is composed of two or more individual proteins will be co-transcribed into one polycistronic mRNA and then co-translated into one polypeptide chain. In some embodiments, the fusion protein or the polyprotein functions as a whole without further cleavage by a protease or a peptidase. In some embodiments, the fusion protein or the polyprotein undergoes post-translational cleavage by a protease or a peptidase at the region between an upstream protein and a downstream protein, releasing at least one of the individual proteins. In some embodiments, the fusion protein or the polyprotein comprises at least two copies of an individual protein. For example, protein A, protein B and protein C can be configured into a fusion protein as A-B-C, A-A-B-C, A-A-B-B-C and the like.
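By way of non-limiting illustration, the configurations A-B-C, A-A-B-C and the like may be sketched as a simple string assembly. The function name `assemble_fusion`, the placeholder protein sequences and the use of a single linker for every junction are hypothetical choices made for this example only.

```python
from typing import Dict, List


def assemble_fusion(order: List[str], sequences: Dict[str, str], linker: str) -> str:
    """Join individual protein sequences into one polypeptide chain.

    `order` lists protein names from N-terminal to C-terminal and may repeat
    a name to include several copies. The same linker is placed at every
    junction (an illustrative simplification).
    """
    missing = [name for name in order if name not in sequences]
    if missing:
        raise KeyError(f"no sequence provided for: {missing}")
    return linker.join(sequences[name] for name in order)


# Hypothetical placeholder sequences, not real proteins.
placeholder = {"A": "MAAA", "B": "MBBB", "C": "MCCC"}
fusion_abc = assemble_fusion(["A", "B", "C"], placeholder, linker="GGS")
fusion_aabc = assemble_fusion(["A", "A", "B", "C"], placeholder, linker="GGS")
```

Repeating a name in `order` is one way to change the relative copy number of a component within a single chain.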
The terms “protease” and “peptidase” can be used interchangeably herein and refer to a class of enzymes that are capable of recognizing a specific protein or peptide and breaking the amino acid sequence of the protein or peptide. A protein or a peptide that is “cleaved” or “processed” by a protease or a peptidase has its amino acid sequence split by the protease or the peptidase into at least two parts. Proteases or peptidases having the above functions are within the scope of the invention.
A “signal peptide” is a peptide present on proteins that are destined either to be secreted or to be targeted to a specific location of a cell, such as the cell membrane, mitochondrion, chloroplast, endoplasmic reticulum (ER), Golgi and vacuole, etc. A “signal peptide” may occur at the N-terminal or C-terminal of a protein, or even inside the amino acid sequence of a protein.
As used herein, the terms “mitochondrial signal peptide”, “mitochondrial targeting peptide”, “mitochondrial leader peptide” and “presequence” can be used interchangeably and refer to a signal peptide present on a protein that is capable of targeting this protein into mitochondria. Most mitochondrial proteins are encoded by nuclear genes. These proteins are transcribed and translated outside the mitochondria, and enter mitochondria with the help of mitochondrial signal peptides. Some mitochondrial signal peptides can be found in the protein annotations disclosed in protein databases, such as Uniprot. Usually, mitochondrial signal peptides are removed from proteins after these proteins enter mitochondria.
The Mitochondrial Processing Peptidase (MPP), which is located in mitochondria, is responsible for recognizing and removing the mitochondrial signal peptides from proteins. MPP is composed of two subunits, MPPα and MPPβ. MPPβ is the catalytic subunit of MPP and carries a conserved Zn-binding motif. In yeast and mammals, MPPs are free in the mitochondrial matrix, while in plants, MPPs are integrated into the cytbc1 complex of the mitochondrial respiratory chain. Sequence analysis of a large number of mitochondrial signal peptides shows that the sequence as a whole is poorly conserved, except for the arginine(s) located at the -2 and/or -3 positions (referred to as the R-2 motif and the R-3 motif) upstream of the site at which the peptide bond is broken by MPP (Ghifari, A.S., Huang, S., and Murcha, M.W. (2019). The peptidases involved in plant mitochondrial protein import. J. Exp. Bot. 70, 6005-6018; Vögtle F.N., Wortelkamp S., Zahedi R.P., Becker D., Leidhold C., Gevaert K., Kellermann J., Voos W., Sickmann A., Pfanner N., and Meisinger C. (2009). Global analysis of the mitochondrial N-proteome identifies a processing peptidase critical for protein stability. Cell. 139 (2): 428-439). These results suggest that the R-2 and/or R-3 motifs may be the key residues for MPP recognition and cleavage.
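By way of non-limiting illustration, the presence of the R-2 and R-3 motifs relative to a given cleavage site may be checked as sketched below. The function name `arginine_motifs` and the indexing convention are assumptions for this example. The cleavage position used for RGGGRRAFST (after the alanine, i.e. between RGGGRRA and FST) is inferred from the description of Figure 14, where the arginines replaced in the non-cleavable variant are said to lie at positions -2 and -3; it is not a measured value.

```python
from typing import Dict


def arginine_motifs(sequence: str, cleavage_index: int) -> Dict[str, bool]:
    """Report whether arginine occupies positions -2 and -3.

    `cleavage_index` is the number of residues on the N-terminal side of the
    scissile bond, so position -1 is sequence[cleavage_index - 1].
    """
    if not 3 <= cleavage_index <= len(sequence):
        raise ValueError("cleavage_index must leave at least 3 upstream residues")
    return {
        "R-2": sequence[cleavage_index - 2] == "R",
        "R-3": sequence[cleavage_index - 3] == "R",
    }


# Assumed cut after the 7th residue: RGGGRRA | FST
cleavable = arginine_motifs("RGGGRRAFST", cleavage_index=7)
# Variant in which both arginines are replaced by alanine: RGGGAAA | FST
control = arginine_motifs("RGGGAAAFST", cleavage_index=7)
```

In this sketch the first sequence carries both motifs and the alanine variant carries neither, which mirrors the negative-control design described for Figure 14.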
In the present invention, the term “MPP cleavable sequence” refers to an amino acid sequence unit that can be recognized and cleaved by MPP. The term “MPP cleavable region” refers to an amino acid sequence comprising one or more MPP cleavable sequences, wherein the one or more MPP cleavable sequences are linked with each other directly by a peptide bond or by a flexible peptide, such as (GGS)n, (GGGS)n or (GGGGS)n in which n is 1-5. In some embodiments, the MPP cleavable sequences within one MPP cleavable region are the same or different.
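By way of non-limiting illustration, an MPP cleavable region may be sketched as one or more cleavable sequences joined either directly or through a flexible peptide of the form (GGS)n, (GGGS)n or (GGGGS)n with n from 1 to 5. The helper names `flexible_peptide` and `build_cleavable_region` are hypothetical and are introduced only for this example.

```python
from typing import List

ALLOWED_UNITS = ("GGS", "GGGS", "GGGGS")


def flexible_peptide(unit: str = "GGGGS", n: int = 1) -> str:
    """Return a flexible peptide made of `n` repeats of `unit` (n = 1 to 5)."""
    if unit not in ALLOWED_UNITS:
        raise ValueError(f"unit must be one of {ALLOWED_UNITS}")
    if not 1 <= n <= 5:
        raise ValueError("n must be between 1 and 5")
    return unit * n


def build_cleavable_region(cleavable_sequences: List[str], spacer: str = "") -> str:
    """Join MPP cleavable sequences in tandem.

    An empty spacer models a direct peptide bond between adjacent
    cleavable sequences; a non-empty spacer models a flexible peptide.
    """
    if not cleavable_sequences:
        raise ValueError("at least one cleavable sequence is required")
    return spacer.join(cleavable_sequences)


# Two identical cleavable sequences linked directly.
dual_direct = build_cleavable_region(["RGGGRRAFST", "RGGGRRAFST"])
# Two different cleavable sequences separated by (GGS)2.
dual_spaced = build_cleavable_region(
    ["RGGGRRAFST", "RGGGRRAFHT"], spacer=flexible_peptide("GGS", 2)
)
```

The sequences passed in may be the same or different, consistent with the definition above.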
In some embodiments of the present invention, the MPP cleavable sequence is at least 8, 9, 10, 11, 12, 13, 14, 15, 20, 25 or 30 amino acids in length. In some embodiments of the present invention, the MPP cleavable sequence is up to 30, 35, 40, 45, 50, 55, 60, 70, 80, 90 or 100 amino acids in length. In some embodiments of the present invention, the MPP cleavable sequence comprises at least 4, 5, 6, 7, 8, 9 or 10 amino acids upstream of a site at which a peptide bond is broken by MPP. In some embodiments of the present invention, the MPP cleavable sequence comprises at least 2 or 3 arginine residues upstream of a site at which a peptide bond is broken by MPP.
As used herein, the term “linker” refers to an amino acid chain that links an upstream protein and a downstream protein via peptide bonds, i.e., the C-terminal of the upstream protein is linked to the N-terminal of the linker and the C-terminal of the linker is linked to the N-terminal of the downstream protein, each via a peptide bond. In some embodiments, the linker can be a flexible amino acid chain or a flexible peptide, such as (GGS)n, (GGGS)n or (GGGGS)n in which n is 1-5. In some embodiments, the linker can be cleaved by a protease or a peptidase that breaks a peptide bond at one or more specific sites within the linker, so that the upstream protein and the downstream protein connected by the linker are separated from each other.
In some embodiments, the linker comprises one or more protease or peptidase cleavable sequences, wherein the protease or peptidase may be selected from the group consisting of thrombin, Factor Xa, enterokinase, Tobacco Etch Virus (TEV) protease, PreScission, HRV 3C protease, MPP, Stromal Processing Peptidase (SPP, responsible for recognizing and processing a signal peptide in chloroplast) and Signal Peptidase Complex (SPC, responsible for recognizing and processing a signal peptide in ER). In some embodiments, the linker comprises one or more MPP cleavable sequences as described above. In some embodiments, the linker comprises an MPP cleavable region. In some embodiments, the linker comprises an MPP cleavable region and a flexible amino acid sequence located at the N-terminal and/or C-terminal of the linker, such as (GGS)n, (GGGS)n or (GGGGS)n in which n is 1-5. In some embodiments, the linker is an MPP cleavable region.
As used herein, the term “expression cassette (EC)” refers to a nucleic acid or a polynucleotide that contains all the elements required for efficiently expressing a protein. Usually, an EC comprises a promoter, a Ribosome Binding Site (RBS), a coding sequence and a terminator. In some embodiments, the EC further comprises at least one element selected from the group consisting of an enhancer, an intron, a 5’-UTR, a 3’-UTR and a poly(A) tail.
In some embodiments, a promoter can be a native promoter or a synthetic promoter. In some embodiments, a promoter can be a constitutive promoter, an inducible promoter, and/or a tissue-specific promoter. The promoter may be a promoter commonly used in eukaryotic expression systems or a promoter used in prokaryotic expression systems. Examples of promoters used in eukaryotic expression systems include, but are not limited to, CMV promoter (Cytomegalovirus promoter), SV40 promoter (Simian virus 40 promoter), PGK promoter (phosphoglycerate kinase promoter), EF1α promoter (elongation factor 1-alpha promoter), β-actin promoter, Ubc promoter (human ubiquitin C gene-derived promoter), CAG promoter (hybrid mammalian promoter), TRE promoter (tetracycline response element promoter), UAS promoter (Drosophila promoter with Gal4 binding site), Ac5 promoter (Drosophila actin 5c gene-derived insect promoter), CaMKIIa promoter (Ca2+/calmodulin-dependent protein kinase II promoter), GAL1 promoter (yeast galactokinase promoter), GAL1 and GAL10 promoters (yeast bidirectional promoter), GAL2 promoter (yeast galactose permease promoter), GAL7 promoter (yeast galactose-1-phosphate uridyl transferase promoter), GAL10 promoter (yeast UDP-glucose-4-epimerase promoter), TDH3 promoter (yeast triose-phosphate dehydrogenase promoter), TEF promoter (yeast transcription elongation factor promoter), GDS promoter (glyceraldehyde-3-phosphate dehydrogenase-derived yeast promoter), ADH1 promoter (yeast alcohol dehydrogenase I promoter), CaMV35S promoter (cauliflower mosaic virus-derived plant promoter), Ubi promoter (maize ubiquitin gene promoter), H1 promoter (human polymerase III-derived RNA promoter) and U6 promoter (human U6-derived small nuclear promoter).
Examples of promoters used in prokaryotic expression systems include, but are not limited to, T7 promoter (T7 phage-derived promoter) , T7lac promoter (T7 phage-derived promoter plus lac operator) , Sp6 promoter (Sp6 phage-derived promoter) , araBAD promoter (arabinose metabolism operon-derived promoter) , trp promoter (tryptophan operon-derived promoter) , lac promoter (lac operon-derived promoter) , Ptac promoter (a hybrid promoter of the lac promoter and the trp promoter) , Ptac/lacO promoter (Ptac promoter plus lac operator) and pL promoter (Lambda phage-derived promoter) .
As used herein, the term “vector” refers to a nucleic acid or a polynucleotide that is capable of carrying an expression cassette of a gene of interest and facilitating the expression of a protein. A vector can be a linear or circular DNA or RNA that is either single-stranded or double-stranded. In some embodiments, the vector is free in the cytoplasm of a cell after entering the cell. In some embodiments, the vector is integrated into the genome of a cell after entering the cell. In some embodiments, the vector further carries a selectable marker, such as an antibiotic resistance gene and/or a fluorescence reporter gene. The vector may be, for example, a vector derived from a bacterial plasmid, a viral vector, a vector derived from a yeast plasmid, a vector derived from a phage, a cosmid, a phagemid, or the like.
The term “and/or” as used herein should be understood to mean “either or both” of the elements so conjoined, i.e., elements that are conjunctively present in some cases and disjunctively present in other cases. Multiple elements listed  with “and/or” should be construed in the same fashion, i.e., “one or more” of the elements so conjoined. Other elements may optionally be present other than the elements specifically identified by the “and/or” clause, whether related or unrelated to those elements specifically identified. Thus, as a non-limiting example, a reference to “A and/or B” can refer, in one embodiment, to A only (optionally including elements other than B) ; in another embodiment, to B only (optionally including elements other than A) ; in yet another embodiment, to both A and B (optionally including other elements) ; etc.
Methods for co-expression of multiple proteins
The “fusion and cut” feature of the polyprotein or fusion protein strategy enables co-expression of protein components in prokaryotes, eukaryotes or cell-free systems at stoichiometric levels, thus enabling the engineering of protein complexes and intricate biochemical pathways requiring balanced gene expression. The two or more proteins to be co-expressed can be linked as a fusion protein by a linker that is recognized and cleaved by an enzyme, such as a protease or peptidase. The protease or peptidase may be an endogenous enzyme that naturally exists in the host cell or subcellular compartment where the proteins of interest are expressed. The protease or peptidase may be an exogenous enzyme that is artificially introduced into the host cell or subcellular compartment where the proteins of interest are expressed. Examples of the protease or peptidase include, but are not limited to, thrombin, Factor Xa, enterokinase, Tobacco Etch Virus (TEV) protease, PreScission, HRV 3C protease, MPP, Stromal Processing Peptidase (SPP, responsible for recognizing and processing a signal peptide in chloroplast) and Signal Peptidase Complex (SPC, responsible for recognizing and processing a signal peptide in ER).
In this invention, the inventors have found that an MPP cleavable sequence can be used to convert multiple proteins into fusion proteins, facilitating the co-expression of multiple proteins or their subunits in mitochondria, cytoplasm or cell-free systems. When these fusion proteins are targeted into mitochondria, they can be cleaved by an endogenous MPP located in the mitochondria, and the individual proteins are released and become functional proteins. This method can also be applied to other cellular or non-cellular environments, where the fusion proteins are co-expressed with an exogenous MPP. Precise cleavage of fusion proteins by MPP and efficient co-expression of individual proteins can be achieved in either mitochondria or cytoplasm, whether in prokaryotes or eukaryotes. This MPP-based method achieves functional expression of multiple proteins with a high level of protein accumulation under co-expression conditions.
Accordingly, in one aspect, the invention relates to a method for co-expressing two or more proteins in a cell or a cell-free system, comprising expressing the two or more proteins as a fusion protein in which at least two adjacent proteins are linked by a linker comprising an MPP cleavable region, wherein cleavage of the MPP cleavable region by an MPP in the cell or the cell-free system results in release of the proteins linked by the linker.
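By way of non-limiting illustration, the release of linked proteins upon cleavage may be modelled in silico as sketched below. The function name `cleave`, the placeholder protein sequences and the cut position within the cleavable sequence (after the seventh residue of RGGGRRAFST, inferred from the -2/-3 arginine positions described for Figure 14) are assumptions made for this example only and do not represent measured cleavage data.

```python
from typing import List


def cleave(fusion: str, site: str, cut_offset: int) -> List[str]:
    """Split a fusion sequence at every non-overlapping occurrence of `site`.

    `cut_offset` is the number of residues of `site` that remain on the
    upstream fragment. Residues on either side of the cut stay attached to
    the neighbouring fragments.
    """
    if not site:
        raise ValueError("site must not be empty")
    if not 1 <= cut_offset <= len(site):
        raise ValueError("cut_offset must lie within the site")
    fragments = []
    start = 0
    pos = fusion.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(fusion[start:cut])
        start = cut
        pos = fusion.find(site, pos + len(site))
    fragments.append(fusion[start:])
    return fragments


# Hypothetical placeholder proteins "MAAA" and "MBBB" joined by one site.
released = cleave("MAAA" + "RGGGRRAFST" + "MBBB", site="RGGGRRAFST", cut_offset=7)
```

In this model the upstream protein retains the residues of the cleavable sequence that precede the cut and the downstream protein retains those that follow it.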
In some embodiments, each of the proteins to be co-expressed is linked to the adjacent protein(s) in the fusion protein by a linker comprising an MPP cleavable region or by an uncleavable linker, and wherein the linkers or the MPP cleavable regions between adjacent proteins are the same or different.
In some embodiments, the cell co-expressing one or more proteins endogenously expresses an MPP. In some embodiments, the cell is a eukaryotic cell and the MPP is located at mitochondria of the cell, wherein the fusion protein is located at mitochondria of the cell.
In some embodiments, the cell co-expressing one or more proteins exogenously expresses an MPP. In some embodiments, the cell is a prokaryotic cell or a eukaryotic cell and the MPP is located in the cytoplasm of the cell, and wherein the fusion protein is expressed in the cytoplasm of the cell.
In some embodiments, the prokaryotic cell is a cell of genera Acetobacter, Azotobacter, Bacillus, Enterobacter, Escherichia, Klebsiella, Salmonella or Pseudomonas. In some preferred embodiments, the prokaryotic cell is a cell of  Escherichia coli, Azotobacter vinelandii, Klebsiella oxytoca, Klebsiella pneumoniae, Pseudomonas fluorescens, Bacillus subtilis, Pseudomonas protegens, Pseudomonas putida, Pseudomonas veronii, Pseudomonas taetrolens, Pseudomonas balearica, Pseudomonas stutzeri, Pseudomonas aeruginosa, Pseudomonas syringae, Bacillus amyloliquefaciens, Burkholderia phytofirmans, Gluconacetobacter diazotrophicus, Herbaspirillum seropedicae or Bacillus cereus. In some preferred embodiments, the prokaryotic cell is a cell of Escherichia coli.
In some embodiments, the eukaryotic cell is selected from the group consisting of a fungal cell, an algae cell, an animal cell or a plant cell.
In some embodiments, the fungal cell is a yeast cell. In some embodiments, the yeast is selected from the group consisting of Saccharomyces, Aspergillus, Kluyveromyces, Schizosaccharomyces, Rhizopus, Candida, Yarrowia and Pichia, such as Saccharomyces cerevisiae, Schizosaccharomyces pombe, Kluyveromyces lactis, Kluyveromyces marxianus, Aspergillus terreus, Aspergillus niger, Rhizopus arrhizus, Rhizopus oryzae, Candida albicans, Candida boidinii, Candida sonorensis, Candida tropicalis, Yarrowia lipolytica or Pichia pastoris. In some preferred embodiments, the yeast is Saccharomyces cerevisiae. In some preferred embodiments, the yeast is Saccharomyces cerevisiae W303-1a strain. In some preferred embodiments, the yeast is Saccharomyces cerevisiae S288C strain.
In some embodiments, the algae is selected from the group consisting of Chlorella, Spirulina, Chlamydomonas, Dunaliella, Chaetoceros and Porphyridium, such as Dunaliella tertiolecta, Porphyridium sp., Dunaliella parva, Chlorella pyrenoidosa, Chlamydomonas reinhardtii or Chaetoceros muelleri.
In some embodiments, the animal is selected from the group consisting of mouse, rat, chicken, rabbit, goat, donkey, monkey, pig, sheep and human.
In some embodiments, the plant is selected from the group consisting of arabidopsis, tobacco, barley, rice, maize, wheat, sorghum, sweet corn, sugar cane, onions, tomatoes, strawberries and asparagus. In some embodiments, the plant is selected from the group consisting of Arabidopsis thaliana, Nicotiana tabacum, Oryza sativa, Triticum aestivum, Zea mays, Sorghum bicolor, Setaria italica, Solanum tuberosum, Ipomoea batatas, Arachis hypogaea, Brassica napus, Malva parviflora, Sesamum indicum, Olea europaea, Elaeis guineensis, Saccharum officinarum, Beta vulgaris and Gossypium spp.
In some embodiments, the cell-free system co-expressing one or more proteins comprises an effective amount of the MPP capable of cleaving the fusion protein at the MPP cleavable region.
In some embodiments, MPP is derived from yeast, arabidopsis, tobacco, rice or algae, such as Saccharomyces cerevisiae, Arabidopsis thaliana, Nicotiana tabacum, Oryza sativa or Chlamydomonas reinhardtii. In some preferred embodiments, MPP is derived from Saccharomyces cerevisiae. In some preferred embodiments, MPP is derived from Saccharomyces cerevisiae W303-1a strain. In some preferred embodiments, MPP is derived from Saccharomyces cerevisiae S288C strain.
In some embodiments, the linker or the MPP cleavable region comprises one or more MPP cleavable sequences. In some embodiments, the linker or the MPP cleavable region comprises two or more MPP cleavable sequences arranged in tandem, and wherein the MPP cleavable sequences within the same linker or MPP cleavable region are the same or different.
In some embodiments, each of the MPP cleavable sequences independently comprises a wild type amino acid sequence, a fragment or a variant thereof that is cleavable by MPP, or an artificially designed amino acid sequence that is cleavable by MPP.
In some embodiments, the wild type amino acid sequence is a wild type mitochondrial signal peptide which is responsible for targeting a mitochondrial endogenous protein into mitochondria. In some embodiments, the mitochondrial signal peptide is derived from a fungal cell, an algae cell, an animal cell or a plant cell. In some embodiments, the mitochondrial signal peptide is derived from a mitochondrial endogenous protein selected from the group consisting of: the subunit 9 of the F0 ATPase of Neurospora crassa, COX4 of Saccharomyces cerevisiae, HSP60/YLR259c of Saccharomyces cerevisiae, SSC1/YJR045c of Saccharomyces cerevisiae, CYB2/YML054C of Saccharomyces cerevisiae, mitochondrial ATP synthase of Nicotiana plumbaginifolia (encoded by the atp2-1 gene), mitochondrial import receptor subunit TOM20 of Arabidopsis thaliana and 2-oxoglutarate dehydrogenase subunit E1 of Arabidopsis thaliana. In some embodiments, the MPP cleavable sequence is at least 8, 9, 10, 11, 12, 13, 14, 15, 20, 25 or 30 amino acids in length. In some embodiments, the MPP cleavable sequence is up to 30, 35, 40, 45, 50, 55, 60, 70, 80, 90 or 100 amino acids in length. In some embodiments, the MPP cleavable sequence comprises at least 4, 5, 6, 7, 8, 9 or 10 amino acids upstream of a site at which a peptide bond is broken by MPP. In some embodiments, the MPP cleavable sequence comprises at least 2 or 3 arginine residues upstream of a site at which a peptide bond is broken by MPP. In some preferred embodiments, the MPP cleavable sequence is derived from the subunit 9 of the F0 ATPase of Neurospora crassa. In some preferred embodiments, the MPP cleavable sequence comprises the amino acids of any one of SEQ ID NOs: 67-81.
In some embodiments, MPP cleavable sequences used in a fusion protein are selected from any one of SEQ ID NOs: 67-81 or any combinations thereof. In some embodiments, one or more MPP cleavable sequences within one MPP cleavable region are linked by a peptide bond. In some embodiments, one or more MPP cleavable sequences within one MPP cleavable region are linked by a flexible peptide, such as (GGS)n, (GGGS)n or (GGGGS)n in which n is 1-5. In some embodiments, MPP cleavable sequences within one MPP cleavable region are the same or different.
In some embodiments, the linker comprises one or more MPP cleavable sequences as described above. In some embodiments, the linker comprises an MPP cleavable region. In some embodiments, the linker comprises an MPP cleavable region and a flexible amino acid sequence located at the N-terminus and/or C-terminus of the linker, such as (GGS)n, (GGGS)n or (GGGGS)n in which n is 1-5. In some embodiments, the linker is an MPP cleavable region.
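By way of illustration only, the flexible peptides named above, (GGS)n, (GGGS)n and (GGGGS)n with n from 1 to 5, can be generated and placed on either side of an MPP cleavable region as in the following Python sketch. The function names and the use of SEQ ID NO: 72 as the cleavable region are assumptions made for this example and are not part of the disclosure.

```python
# Illustrative sketch only; function names are hypothetical.
FLEXIBLE_UNITS = ("GGS", "GGGS", "GGGGS")

def flexible_peptide(unit: str, n: int) -> str:
    """Return (unit)n for one of the repeat units named in the text, n in 1..5."""
    if unit not in FLEXIBLE_UNITS:
        raise ValueError(f"unknown repeat unit: {unit}")
    if not 1 <= n <= 5:
        raise ValueError("n must be between 1 and 5")
    return unit * n

def build_linker(cleavable_region: str, n_flex: str = "", c_flex: str = "") -> str:
    """Place optional flexible peptides at the N- and/or C-terminus of the region."""
    return n_flex + cleavable_region + c_flex

region = "RQAFQKRAFHT"  # SEQ ID NO: 72, chosen here only as an example
linker = build_linker(
    region,
    n_flex=flexible_peptide("GGS", 1),
    c_flex=flexible_peptide("GGGGS", 2),
)
print(linker)
```

Calling flexible_peptide("GGGGS", 3) in this sketch gives the 15-residue peptide GGGGSGGGGSGGGGS.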
In some embodiments, the artificially designed amino acid sequence comprises a formula as follows:
X1-X2,
wherein X1 comprises RQAFQKRA or RQAFQRRA, and X2 comprises YSS, FHT or FST. In some embodiments, X1 is RQAFQKRA and X2 is FHT or FST. In some preferred embodiments, the artificially designed amino acid sequence is RQAFQKRAFHT (SEQ ID NO: 72) or RQAFQKRAFST (SEQ ID NO: 73).
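For illustration, the minimal sequences covered by the formula X1-X2 can be enumerated as in the following Python sketch. It assumes that X1 and X2 consist only of the recited residues (the formula itself says "comprises", so longer sequences are also covered); the variable names are hypothetical.

```python
from itertools import product

# Options for X1 and X2 exactly as recited in the formula.
X1_OPTIONS = ("RQAFQKRA", "RQAFQRRA")
X2_OPTIONS = ("YSS", "FHT", "FST")

# Every minimal X1-X2 combination.
candidates = [x1 + x2 for x1, x2 in product(X1_OPTIONS, X2_OPTIONS)]

# The two preferred sequences named in the text.
preferred = {"RQAFQKRAFHT": "SEQ ID NO: 72", "RQAFQKRAFST": "SEQ ID NO: 73"}

for seq in candidates:
    print(seq, preferred.get(seq, ""))
```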
In some embodiments, the artificially designed amino acid sequence comprises a formula as follows:
X3-X2,
wherein X3 comprises R(G)nKRA or R(G)nRRA, and X2 comprises YSS, FHT or FST, and wherein n=1, 2, 3, 4, 5 or 6. In some embodiments, X3 is R(G)nRRA and X2 is FHT or FST, and wherein n=2 or 3. In some preferred embodiments, the artificially designed amino acid sequence is selected from the group consisting of RGGRRAFHT (SEQ ID NO: 78), RGGGRRAFHT (SEQ ID NO: 79), RGGRRAFST (SEQ ID NO: 81) and RGGGRRAFST (SEQ ID NO: 80). In some cases, the “G” in the brackets of the above formula can be replaced by any other amino acid residue.
In some embodiments, the two or more proteins to be co-expressed are functionally related; preferably, the two or more proteins are involved in the same biosynthetic pathway or metabolic pathway, participate in the same signaling pathway, or belong to the same complex biological system.
In some embodiments, the complex biological system is a nitrogenase system. In some embodiments, the two or more proteins are selected from the group consisting of NifH, NifD, NifK, NifY, NifE, NifN, NifB, NifU, NifS, NifV, NifM, NifJ, NifF, NifW, NifZ, NifT, NifX and NifQ. In some preferred embodiments, the fusion protein comprises NifH-cleav-NifD-cleav-NifK, NifE-cleav-NifN~NifB, NifJ-cleav-NifV-cleav-NifW, NifF-cleav-NifM-cleav-NifY, and optionally, NifU-cleav-NifS, wherein “cleav” is the linker comprising the MPP cleavable region, and wherein “~” represents a flexible peptide, such as (GGS)n, (GGGS)n or (GGGGS)n in which n is 1-5. In some preferred embodiments, “~” represents “GGGGSGGGGSGGGGS”.
In some embodiments, the biosynthetic pathway is violacein biosynthetic pathway. In some embodiments, the two or more proteins are selected from the group consisting of VioA, VioB, VioC, VioD and VioE. In some preferred embodiments, the fusion protein comprises VioA-cleav-VioB-cleav-VioE and VioD-cleav-VioC, or VioA-cleav-VioB-cleav-VioE-cleav-VioD-cleav-VioC, wherein “cleav” is the linker comprising the MPP cleavable region.
In some embodiments, the biosynthetic pathway is isobutanol biosynthetic pathway. In some embodiments, the two or more proteins are selected from the group consisting of ILV2, ILV3, ILV5, ARO10 and ADHA. In some embodiments, the fusion protein comprises:
(a) ILV5-cleav-ILV3-cleav-ILV2 and ARO10-cleav-ADHA;
(b) ILV3-cleav-ILV5-cleav-ILV2 and ARO10-cleav-ADHA; or
(c) ILV3-cleav-ILV5-cleav-ILV2 and ADHA-cleav-ARO10;
wherein “cleav” is the linker comprising the MPP cleavable region.
In some embodiments, the two or more proteins to be co-expressed comprise fluorescent proteins, such as EGFP (enhanced green fluorescent protein), ERFP (enhanced red fluorescent protein), EBFP (enhanced blue fluorescent protein), EYFP (enhanced yellow fluorescent protein), ECFP (enhanced cyan fluorescent protein), GFP (green fluorescent protein), RFP (red fluorescent protein), BFP (blue fluorescent protein), YFP (yellow fluorescent protein), CFP (cyan fluorescent protein), FbFP (flavin mononucleotide based fluorescent protein), mCherry, dsRed, tdTomato and turbo-RFP.
In some embodiments, the two or more proteins to be co-expressed comprise protein tags for use in protein purification or immunoblotting. These protein tags can be removed from fusion proteins after a purification step. Examples of protein tags include, but are not limited to, polyhistidine tag, GST tag, HA tag, FLAG tag, MBP tag, NusA tag, c-Myc tag and Strep tag.
For efficient co-expression of proteins, the nucleic acid encoding a linker comprising an MPP cleavable region is connected in frame with the nucleic acid encoding an upstream protein and the nucleic acid encoding a downstream protein to obtain a coding sequence (CDS) of a fusion protein. The nucleic acids encoding more than two proteins can be connected in the same manner as described above to obtain a CDS of a fusion protein carrying more individual proteins. The CDS of a fusion protein is operably linked to an expression control sequence, such as a promoter. A person skilled in the art will know whether a certain expression control sequence (such as a ribosome binding site, a terminator, an enhancer, an intron, a 5’-UTR, a 3’-UTR, a poly(A) tail, etc.) is required to construct an expression cassette (EC). The EC will be ligated to a vector for expression. Where a large number of proteins are to be co-expressed, these proteins can be divided into different groups, and the nucleic acids encoding proteins in each group will be connected into one CDS and subsequently constructed into one EC. The multiple ECs can be inserted into one vector. Alternatively, the multiple ECs can be inserted into multiple vectors to form a vector composition. The vector or the vector composition can be introduced into a cell by transformation, transfection, electroporation or any other techniques well-known in the art. The cells harboring vectors will be cultured under a condition that is suitable for protein expression. The fusion proteins can be expressed in cells and then cleaved by either an endogenous MPP or an exogenous MPP to release individual proteins. The EC encoding an exogenous MPP can be constructed on the same vector as fusion proteins, or on another vector.
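The in-frame connection described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the construction method used in the Examples: the two open reading frames are toy sequences, the codon chosen for each linker residue is arbitrary (valid in the standard genetic code but not codon-optimized), and all function names are hypothetical. The sketch removes the stop codon of each upstream protein, inserts the linker, and checks that no premature stop codon remains in the fused CDS.

```python
# Illustrative sketch; the ORFs are toy sequences and the codon choice is arbitrary.
STOP_CODONS = {"TAA", "TAG", "TGA"}

# One valid standard-code codon per residue of the example linker.
CODON = {"R": "CGT", "Q": "CAG", "A": "GCT", "F": "TTT",
         "K": "AAA", "H": "CAT", "T": "ACT"}

def back_translate(peptide: str) -> str:
    """Encode a peptide with the fixed codon table above."""
    return "".join(CODON[aa] for aa in peptide)

def fuse_in_frame(orfs: list[str], linker_nt: str) -> str:
    """Join ORFs with a linker so that one continuous reading frame results."""
    if len(linker_nt) % 3 != 0:
        raise ValueError("linker length must be a multiple of 3")
    parts = []
    for i, orf in enumerate(orfs):
        orf = orf.upper()
        if len(orf) % 3 != 0:
            raise ValueError("ORF length must be a multiple of 3")
        is_last = i == len(orfs) - 1
        if not is_last and orf[-3:] in STOP_CODONS:
            orf = orf[:-3]  # drop the stop codon of every upstream protein
        parts.append(orf)
        if not is_last:
            parts.append(linker_nt.upper())
    fused = "".join(parts)
    codons = [fused[j:j + 3] for j in range(0, len(fused), 3)]
    if any(c in STOP_CODONS for c in codons[:-1]):
        raise ValueError("premature stop codon inside the fused CDS")
    return fused

linker_nt = back_translate("RQAFQKRAFHT")  # SEQ ID NO: 72 as an example linker
cds = fuse_in_frame(["ATGGCTTAA", "ATGAAATAG"], linker_nt)
print(len(cds), cds)
```

More than two proteins can be passed in the same list, in which case the same linker is inserted at every junction; using different linkers per junction would need a list of linkers instead.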
A person skilled in the art would easily choose an appropriate promoter for driving the expression of a fusion protein. In some embodiments, a promoter can be a native promoter or a synthetic promoter. In some embodiments, a promoter can be a constitutive promoter, an inducible promoter, and/or a tissue-specific promoter. The promoter may be a promoter commonly used in eukaryotic expression systems or a promoter used in prokaryotic expression systems. Examples of promoters used in eukaryotic expression systems include, but are not limited to, CMV promoter (Cytomegalovirus promoter), SV40 promoter (Simian virus 40 promoter), PGK promoter (phosphoglycerate kinase promoter), EF1α promoter (elongation factor 1-alpha promoter), β-actin promoter, Ubc promoter (human ubiquitin C gene-derived promoter), CAG promoter (hybrid mammalian promoter), TRE promoter (tetracycline response element promoter), UAS promoter (Drosophila promoter with Gal4 binding site), Ac5 promoter (Drosophila actin 5c gene-derived insect promoter), CaMKIIa promoter (Ca2+/calmodulin-dependent protein kinase II promoter), GAL1 promoter (yeast galactokinase promoter), GAL1 and GAL10 promoters (yeast bidirectional promoter), GAL2 promoter (yeast galactose permease promoter), GAL7 promoter (yeast galactose-1-phosphate uridyl transferase promoter), GAL10 promoter (yeast UDP-glucose-4-epimerase promoter), TDH3 promoter (yeast triose-phosphate dehydrogenase promoter), TEF promoter (yeast transcription elongation factor promoter), GDS promoter (glyceraldehyde-3-phosphate dehydrogenase-derived yeast promoter), ADH1 promoter (yeast alcohol dehydrogenase I promoter), CaMV35S promoter (cauliflower mosaic virus-derived plant promoter), Ubi promoter (maize ubiquitin gene promoter), H1 promoter (human polymerase III-derived RNA promoter) and U6 promoter (human U6-derived small nuclear promoter).
Examples of promoters used in prokaryotic expression systems include, but are not limited to, T7 promoter (T7 phage-derived promoter) , T7lac promoter (T7 phage-derived promoter plus lac operator) , Sp6 promoter (Sp6 phage-derived promoter) , araBAD promoter (arabinose metabolism operon-derived promoter) , trp promoter (tryptophan operon-derived promoter) , lac promoter (lac operon-derived promoter) , Ptac promoter (a hybrid promoter of the lac promoter and the trp promoter) , Ptac/lacO promoter (Ptac promoter plus lac operator) and pL promoter (Lambda phage-derived promoter) .
A vector can be a linear or a circular DNA or RNA with either a single strand or double strands. In some embodiments, the vector is free in the cytoplasm of a cell after entering into the cell. In some embodiments, the vector is integrated into the genome of a cell after entering into the cell. In some embodiments, the vector further carries a selectable marker, such as an antibiotic resistance gene and/or a fluorescence reporter gene. The vector may be, for example, a vector derived from a bacterial plasmid, a viral vector, a vector derived from a yeast plasmid, a vector derived from a phage, a cosmid, a phagemid, or the like.
MPP cleavable sequence
In another aspect, the invention provides an MPP cleavable sequence comprising a wild type amino acid sequence, a fragment or a variant thereof that is cleavable by MPP, or an artificially designed amino acid sequence that is cleavable by MPP. MPP cleavable sequence usually refers to amino acid sequence, while in some cases, a nucleic acid encoding an amino acid sequence that can be cleaved by MPP is also within the meaning of an MPP cleavable sequence.
In some embodiments, the wild type amino acid sequence is a wild type mitochondrial signal peptide which is responsible for targeting a mitochondrial endogenous protein into mitochondria. In some embodiments, the mitochondrial signal peptide is derived from a fungal cell, an algae cell, an animal cell or a plant cell. In some embodiments, the mitochondrial signal peptide is derived from a mitochondrial endogenous protein selected from the group consisting of: the subunit 9 of the F0 ATPase of Neurospora crassa, COX4 of Saccharomyces cerevisiae, HSP60/YLR259c of Saccharomyces cerevisiae, SSC1/YJR045c of Saccharomyces cerevisiae, CYB2/YML054C of Saccharomyces cerevisiae, mitochondrial ATP synthase of Nicotiana plumbaginifolia (encoded by atp2-1 gene), mitochondrial import receptor subunit TOM20 of Arabidopsis thaliana and 2-oxoglutarate dehydrogenase subunit E1 of Arabidopsis thaliana.
In some embodiments, the MPP cleavable sequence is at least 8, 9, 10, 11, 12, 13, 14, 15, 20, 25 or 30 amino acids in length. In some embodiments, the MPP cleavable sequence is up to 30, 35, 40, 45, 50, 55, 60, 70, 80, 90 or 100 amino acids in length. In some embodiments, the MPP cleavable sequence comprises at least 4, 5, 6, 7, 8, 9 or 10 amino acids upstream of a site at which a peptide bond is broken by MPP. In some embodiments, the MPP cleavable sequence comprises at least 2 or 3 arginine residues upstream of a site at which a peptide bond is broken by MPP. In some preferred embodiments, the MPP cleavable sequence is derived from the subunit 9 of the F0 ATPase of Neurospora crassa. In some preferred embodiments, the MPP cleavable sequence comprises the amino acids of any one of SEQ ID NOs: 67-81.
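The length and composition features recited above can be checked for a candidate sequence with a short Python sketch such as the following. The function name and its defaults are hypothetical; the defaults take the least restrictive value of each recited range (at least 8 and up to 100 residues, at least 4 residues and at least 2 arginines upstream of the cleaved bond). The position of the cleaved bond is an input supplied by the user: the value 8 used for SEQ ID NO: 72 below is an assumption made purely for the example and is not stated in this passage.

```python
def check_mpp_sequence(seq: str, n_upstream: int, min_len: int = 8,
                       max_len: int = 100, min_upstream: int = 4,
                       min_arg: int = 2) -> dict:
    """Report whether a candidate meets the recited length and arginine features.

    n_upstream is the number of residues N-terminal to the cleaved peptide bond.
    """
    if not 0 < n_upstream <= len(seq):
        raise ValueError("n_upstream must lie within the sequence")
    upstream = seq[:n_upstream]
    n_arg = upstream.count("R")
    return {
        "length": len(seq),
        "length_ok": min_len <= len(seq) <= max_len,
        "upstream_ok": n_upstream >= min_upstream,
        "arginines_upstream": n_arg,
        "arginine_ok": n_arg >= min_arg,
    }

# Cleavage position 8 is an illustrative assumption, not a value from the text.
report = check_mpp_sequence("RQAFQKRAFHT", n_upstream=8)
print(report)
```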
In some embodiments, the artificially designed amino acid sequence comprises a formula as follows:
X1-X2,
wherein X1 comprises RQAFQKRA or RQAFQRRA, and X2 comprises YSS, FHT or FST. In some embodiments, X1 is RQAFQKRA and X2 is FHT or FST. In some preferred embodiments, the artificially designed amino acid sequence is RQAFQKRAFHT (SEQ ID NO: 72) or RQAFQKRAFST (SEQ ID NO: 73).
In some embodiments, the artificially designed amino acid sequence comprises a formula as follows:
X3-X2,
wherein X3 comprises R(G)nKRA or R(G)nRRA, and X2 comprises YSS, FHT or FST, and wherein n=1, 2, 3, 4, 5 or 6. In some embodiments, X3 is R(G)nRRA and X2 is FHT or FST, and wherein n=2 or 3. In some preferred embodiments, the artificially designed amino acid sequence is selected from the group consisting of RGGRRAFHT (SEQ ID NO: 78), RGGGRRAFHT (SEQ ID NO: 79), RGGRRAFST (SEQ ID NO: 81) and RGGGRRAFST (SEQ ID NO: 80). In some cases, the “G” in the brackets of the above formula can be replaced by any other amino acid residue.
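As an illustration, the formula X3-X2 can be expanded programmatically. The Python sketch below assumes that X3 and X2 consist only of the recited residues and that the bracketed residue is glycine; the function names are hypothetical. It reproduces the four preferred sequences for n=2 or 3.

```python
def x3(n: int, basic: str = "R") -> str:
    """Build R(G)nKRA (basic='K') or R(G)nRRA (basic='R') for n in 1..6."""
    if not 1 <= n <= 6:
        raise ValueError("n must be 1, 2, 3, 4, 5 or 6")
    if basic not in ("K", "R"):
        raise ValueError("basic must be 'K' or 'R'")
    return "R" + "G" * n + basic + "RA"

def designed_sequence(n: int, basic: str, x2: str) -> str:
    """Concatenate X3 and X2, with X2 restricted to the recited options."""
    if x2 not in ("YSS", "FHT", "FST"):
        raise ValueError("X2 must be YSS, FHT or FST")
    return x3(n, basic) + x2

# All minimal sequences covered by the formula: 6 values of n x 2 x 3.
all_sequences = [designed_sequence(n, b, x2)
                 for n in range(1, 7)
                 for b in ("K", "R")
                 for x2 in ("YSS", "FHT", "FST")]

# Preferred embodiments: X3 is R(G)nRRA with n = 2 or 3, X2 is FHT or FST.
preferred = [designed_sequence(n, "R", x2) for x2 in ("FHT", "FST") for n in (2, 3)]
print(preferred)
```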
In some embodiments, MPP is derived from yeast, arabidopsis, tobacco, rice or algae, such as Saccharomyces cerevisiae, Arabidopsis thaliana, Nicotiana tabacum, Oryza sativa or Chlamydomonas reinhardtii. In some embodiments, MPP is derived from species of Saccharomyces, Aspergillus, Kluyveromyces,  Schizosaccharomyces, Rhizopus, Candida, Yarrowia and Pichia, such as Saccharomyces cerevisiae, Schizosaccharomyces pombe, Kluyveromyces lactis, Kluyveromyces marxianus, Aspergillus terreus, Aspergillus niger, Rhizopus arrhizus, Rhizopus oryzae, Candida albicans, Candida boidinii, Candida sonorensis, Candida tropicalis, Yarrowia lipolytica or Pichia pastoris. In some embodiments, MPP is derived from Saccharomyces cerevisiae. In some preferred embodiments, MPP is derived from Saccharomyces cerevisiae W303-1a strain. In some preferred embodiments, MPP is derived from Saccharomyces cerevisiae S288C strain.
Fusion protein
In another aspect, the invention provides a fusion protein comprising two or more proteins to be co-expressed in a cell or a cell-free system, wherein at least two adjacent proteins in the fusion protein are linked by a linker comprising an MPP cleavable region, wherein the linkers or the MPP cleavable regions between adjacent proteins are the same or different.
In some embodiments, the linker or the MPP cleavable region comprises one or more of the MPP cleavable sequence, wherein the MPP cleavable sequences within the same linker or MPP cleavable region are the same or different.
In some embodiments, the two or more proteins to be co-expressed are functionally related; preferably, the two or more proteins are involved in the same biosynthetic pathway or metabolic pathway, participate in the same signaling pathway, or belong to the same complex biological system.
In some embodiments, the complex biological system is a nitrogenase system. In some embodiments, the two or more proteins are selected from the group consisting of NifH, NifD, NifK, NifY, NifE, NifN, NifB, NifU, NifS, NifV, NifM, NifJ, NifF, NifW, NifZ, NifT, NifX and NifQ. In some preferred embodiments, the fusion protein comprises NifH-cleav-NifD-cleav-NifK, NifE-cleav-NifN~NifB, NifJ-cleav-NifV-cleav-NifW, NifF-cleav-NifM-cleav-NifY, and optionally, NifU-cleav-NifS, wherein “cleav” is the linker comprising the MPP cleavable region, and wherein “~” represents a flexible peptide, such as (GGS)n, (GGGS)n or (GGGGS)n in which n is 1-5. In some preferred embodiments, “~” represents “GGGGSGGGGSGGGGS”.
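The notation used for these fusion proteins can be read mechanically, which the following Python sketch illustrates. It treats "-cleav-" as a junction cleaved by MPP and "~" as a flexible peptide that is not cleaved, so that NifN~NifB remains a single polypeptide after processing. The function names are hypothetical, and the sketch ignores any linker residues that may remain on the released proteins.

```python
import re

def parse_design(design: str) -> tuple[list[str], list[str]]:
    """Split a design string into protein names and the junction types between them."""
    tokens = re.split(r"(-cleav-|~)", design)
    proteins = tokens[0::2]
    joints = ["cleavable" if t == "-cleav-" else "flexible" for t in tokens[1::2]]
    return proteins, joints

def released_products(design: str) -> list[str]:
    """Units expected after MPP processing: only '-cleav-' junctions are cut."""
    return design.split("-cleav-")

designs = [
    "NifH-cleav-NifD-cleav-NifK",
    "NifE-cleav-NifN~NifB",
    "NifJ-cleav-NifV-cleav-NifW",
    "NifF-cleav-NifM-cleav-NifY",
]
for d in designs:
    print(d, "->", released_products(d))
```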
In some embodiments, the biosynthetic pathway is violacein biosynthetic pathway. In some embodiments, the two or more proteins are selected from the group consisting of VioA, VioB, VioC, VioD and VioE. In some preferred embodiments, the fusion protein comprises VioA-cleav-VioB-cleav-VioE and VioD-cleav-VioC, or VioA-cleav-VioB-cleav-VioE-cleav-VioD-cleav-VioC, wherein “cleav” is the linker comprising the MPP cleavable region.
In some embodiments, the biosynthetic pathway is isobutanol biosynthetic pathway. In some embodiments, the two or more proteins are selected from the group consisting of ILV2, ILV3, ILV5, ARO10 and ADHA. In some preferred embodiments, the fusion protein comprises:
(a) ILV5-cleav-ILV3-cleav-ILV2 and ARO10-cleav-ADHA;
(b) ILV3-cleav-ILV5-cleav-ILV2 and ARO10-cleav-ADHA; or
(c) ILV3-cleav-ILV5-cleav-ILV2 and ADHA-cleav-ARO10;
wherein “cleav” is the linker comprising the MPP cleavable region.
In some embodiments, the two or more proteins to be co-expressed comprise fluorescent proteins, such as EGFP (enhanced green fluorescent protein), ERFP (enhanced red fluorescent protein), EBFP (enhanced blue fluorescent protein), EYFP (enhanced yellow fluorescent protein), ECFP (enhanced cyan fluorescent protein), GFP (green fluorescent protein), RFP (red fluorescent protein), BFP (blue fluorescent protein), YFP (yellow fluorescent protein), CFP (cyan fluorescent protein), FbFP (flavin mononucleotide based fluorescent protein), mCherry, dsRed, tdTomato and turbo-RFP.
In some embodiments, the two or more proteins to be co-expressed comprise protein tags for use in protein purification or immunoblotting. These protein tags can be removed from fusion proteins after a purification step. Examples of protein tags include, but are not limited to, polyhistidine tag, GST tag, HA tag, FLAG tag, MBP tag, NusA tag, c-Myc tag and Strep tag.
Vector
In another aspect, the invention provides a vector carrying a nucleic acid encoding a fusion protein of the present invention. A vector can be a linear or a circular DNA or RNA with either a single strand or double strands. In some embodiments, the vector is free in the cytoplasm of a cell after entering into the cell. In some embodiments, the vector is integrated into the genome of a cell after entering into the cell. In some embodiments, the vector further carries a selectable marker, such as an antibiotic resistance gene and/or a fluorescence reporter gene. The vector may be, for example, a vector derived from a bacterial plasmid, a viral vector, a vector derived from a yeast plasmid, a vector derived from a phage, a cosmid, a phagemid, or the like.
The vector may be a transient transformation vector that cannot replicate in a cell. The vector may be an autonomous replication vector, that is, a vector that exists as an extrachromosomal entity, and its replication is independent of chromosomal replication, such as a plasmid, extrachromosomal element, minichromosome, or artificial chromosome. The vector may contain any element for ensuring self-replication. Alternatively, the vector may be an integration vector that, when introduced into a host cell, is integrated into the genome and replicated with one or more chromosomes.
Examples of origins of replication for bacteria are those of plasmids pBR322, pUC19, pACYC177, and pACYC184 that allow replication in E. coli, and plasmids pUB110, pE194, pTA1060, and pAMβ1 that allow replication in Bacillus.
Examples of origins of replication used in yeast host cells are 2 μm origin of replication, ARS1, ARS4, a combination of ARS1 and CEN3, and a combination of ARS4 and CEN6.
For a vector integrated into the host cell genome, the vector can be integrated into the genome by homologous recombination. In this case, the vector may contain a polynucleotide for directing integration into the genome of the host cell at one or more precise locations on one or more chromosomes by homologous recombination. To increase the possibility of integration at precise locations, the integration element should contain a sufficient number of nucleotides that have high sequence identity with the corresponding target sequence to enhance the possibility of homologous recombination. These integration elements can be any sequence that is homologous to a target sequence in the host cell genome. In addition, these integration elements may be non-coding polynucleotides or coding polynucleotides. In another case, the vector can be integrated into the genome of the host cell by non-homologous recombination.
The vector may contain one or more selectable markers that allow easy selection of transformed cells, transfected cells, transduced cells, and the like. A selectable marker is a gene whose product provides biocide resistance or virus resistance, resistance to heavy metals, prototrophy to auxotrophs, and the like.
Examples of bacterial selectable markers include markers for dal gene of Bacillus licheniformis or Bacillus subtilis, or those conferring antibiotic resistance (such as ampicillin, chloramphenicol, kanamycin, neomycin, spectinomycin, or tetracycline resistance) . Suitable markers for use in yeast host cells include but are not limited to ADE2, HIS3, LEU2, LYS2, MET3, TRP1, and URA3. Selectable markers for use in a filamentous fungal host cell include, but are not limited to, adeA (phosphoribosylaminoimidazole-succinocarboxamide synthase) , adeB (phosphoribosyl-aminoimidazole synthase) , amdS (acetamidase) , argB (ornithine carbamoyltransferase) , bar (phosphinothricin acetyltransferase) , hph (hygromycin phosphotransferase) , niaD (nitrate reductase) , pyrG (orotidine-5'-phosphate decarboxylase) , sC (sulfate adenyltransferase) , and trpC (anthranilate synthase) , etc.
For the circumstance that a large number of proteins are to be co-expressed, these proteins can be divided into different groups, and the nucleic acids encoding proteins in each group will be connected into one CDS and subsequently constructed into one expression cassette (EC) . The multiple ECs can be inserted  into one vector. Alternatively, the multiple ECs can be inserted into multiple vectors to form a vector composition. The vector or the vector composition can be introduced into a cell by transformation, transfection, electroporation or any other techniques well-known in the art. The cells harboring vectors will be cultured under a condition that is suitable for protein expression. The fusion proteins can be expressed in cells and then cleaved by either an endogenous MPP or an exogenous MPP to release individual proteins. The EC encoding an exogenous MPP can be constructed on the same vector as fusion proteins, or on another vector.
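The grouping scheme described above (proteins divided into groups, one CDS and one expression cassette per group, and the cassettes placed on one vector or spread over a vector composition) can be represented with a small data structure. The Python sketch below is only one possible representation: the class name, the promoter placeholder and the round-robin distribution of cassettes over vectors are assumptions made for the example, and the groups shown are a subset of the nif groups named in this document.

```python
from dataclasses import dataclass

@dataclass
class ExpressionCassette:
    """One EC: a promoter placeholder driving one fusion-protein CDS."""
    name: str
    proteins: list[str]
    promoter: str = "promoter (placeholder)"

    @property
    def design(self) -> str:
        # Adjacent proteins are joined by the MPP-cleavable linker.
        return "-cleav-".join(self.proteins)

def assign_to_vectors(cassettes: list[ExpressionCassette],
                      n_vectors: int) -> list[list[ExpressionCassette]]:
    """Distribute ECs over n_vectors in round-robin order (an arbitrary choice)."""
    if n_vectors < 1:
        raise ValueError("at least one vector is required")
    vectors: list[list[ExpressionCassette]] = [[] for _ in range(n_vectors)]
    for i, ec in enumerate(cassettes):
        vectors[i % n_vectors].append(ec)
    return vectors

groups = [["NifH", "NifD", "NifK"], ["NifJ", "NifV", "NifW"],
          ["NifF", "NifM", "NifY"], ["NifU", "NifS"]]
cassettes = [ExpressionCassette(f"EC{i + 1}", g) for i, g in enumerate(groups)]

single_vector = assign_to_vectors(cassettes, 1)       # all ECs on one vector
vector_composition = assign_to_vectors(cassettes, 2)  # ECs spread over two vectors
for ec in cassettes:
    print(ec.name, ec.design)
```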
In some embodiments, vector carries a nucleic acid encoding GFP-cleav-RFP. In some embodiments, vector carries a nucleic acid encoding GFP-cleav-RFP and a nucleic acid encoding an MPP. In some embodiments, vector composition comprises a vector carrying a nucleic acid encoding GFP-cleav-RFP and a vector carrying a nucleic acid encoding an MPP. In some preferred embodiments, MPP is derived from Saccharomyces cerevisiae.
In some embodiments, vector carries a nucleic acid encoding NifH-cleav-NifD-cleav-NifK, a nucleic acid encoding NifE-cleav-NifN~NifB, a nucleic acid encoding NifJ-cleav-NifV-cleav-NifW, a nucleic acid encoding NifF-cleav-NifM-cleav-NifY, a nucleic acid encoding NifU-cleav-NifS and optionally a nucleic acid encoding an MPP, wherein “cleav” is the linker comprising the MPP cleavable region, and wherein “~” represents a flexible peptide. In some preferred embodiments, “~” represents “GGGGSGGGGSGGGGS” . In some preferred embodiments, MPP is derived from Saccharomyces cerevisiae.
In some embodiments, vector carries a nucleic acid encoding NifH-cleav-NifD-cleav-NifK, a nucleic acid encoding NifE-cleav-NifN~NifB, a nucleic acid encoding NifJ-cleav-NifV-cleav-NifW, a nucleic acid encoding NifF-cleav-NifM-cleav-NifY, a nucleic acid encoding NifU, a nucleic acid encoding NifS, and optionally a nucleic acid encoding an MPP, wherein “cleav” is the linker comprising the MPP cleavable region, and wherein “~” represents a flexible peptide. In some preferred embodiments, “~” represents  “GGGGSGGGGSGGGGS” . In some preferred embodiments, MPP is derived from Saccharomyces cerevisiae.
In some embodiments, each of the vectors in a vector composition carries at least one of the following nucleic acids: a nucleic acid encoding NifH-cleav-NifD-cleav-NifK, a nucleic acid encoding NifE-cleav-NifN~NifB, a nucleic acid encoding NifJ-cleav-NifV-cleav-NifW, a nucleic acid encoding NifF-cleav-NifM-cleav-NifY, a nucleic acid encoding NifU-cleav-NifS and optionally a nucleic acid encoding an MPP, wherein “cleav” is the linker comprising the MPP cleavable region, and wherein “~” represents a flexible peptide. In some preferred embodiments, “~” represents “GGGGSGGGGSGGGGS” . In some preferred embodiments, MPP is derived from Saccharomyces cerevisiae.
In some embodiments, each of the vectors in a vector composition carries at least one of the following nucleic acids: a nucleic acid encoding NifH-cleav-NifD-cleav-NifK, a nucleic acid encoding NifE-cleav-NifN~NifB, a nucleic acid encoding NifJ-cleav-NifV-cleav-NifW, a nucleic acid encoding NifF-cleav-NifM-cleav-NifY, a nucleic acid encoding NifU, a nucleic acid encoding NifS, and optionally a nucleic acid encoding an MPP, wherein “cleav” is the linker comprising the MPP cleavable region, and wherein “~” represents a flexible peptide. In some preferred embodiments, “~” represents “GGGGSGGGGSGGGGS” . In some preferred embodiments, MPP is derived from Saccharomyces cerevisiae.
In some embodiments, vector carries a nucleic acid encoding VioA-cleav-VioB-cleav-VioE, a nucleic acid encoding VioD-cleav-VioC, and optionally a nucleic acid encoding an MPP, wherein “cleav” is the linker comprising the MPP cleavable region. In some preferred embodiments, MPP is derived from Saccharomyces cerevisiae. In some embodiments, vector carries a nucleic acid encoding VioA-cleav-VioB-cleav-VioE-cleav-VioD-cleav-VioC, and optionally a nucleic acid encoding an MPP, wherein “cleav” is the linker comprising the MPP cleavable region. In some preferred embodiments, MPP is derived from Saccharomyces cerevisiae.
In some embodiments, each of the vectors in a vector composition carries at least one of the following nucleic acids: a nucleic acid encoding VioA-cleav-VioB-cleav-VioE, a nucleic acid encoding VioD-cleav-VioC, and optionally a nucleic acid encoding an MPP, wherein “cleav” is the linker comprising the MPP cleavable region. In some preferred embodiments, MPP is derived from Saccharomyces cerevisiae. In some embodiments, vector composition comprises a vector carrying a nucleic acid encoding VioA-cleav-VioB-cleav-VioE-cleav-VioD-cleav-VioC and a vector carrying a nucleic acid encoding an MPP, wherein “cleav” is the linker comprising the MPP cleavable region. In some preferred embodiments, MPP is derived from Saccharomyces cerevisiae.
In some embodiments, vector carries a nucleic acid encoding ILV5-cleav-ILV3-cleav-ILV2, a nucleic acid encoding ARO10-cleav-ADHA, and optionally a nucleic acid encoding an MPP, wherein “cleav” is the linker comprising the MPP cleavable region. In some preferred embodiments, MPP is derived from Saccharomyces cerevisiae. In some embodiments, vector carries a nucleic acid encoding ILV3-cleav-ILV5-cleav-ILV2, a nucleic acid encoding ARO10-cleav-ADHA, and optionally a nucleic acid encoding an MPP, wherein “cleav” is the linker comprising the MPP cleavable region. In some preferred embodiments, MPP is derived from Saccharomyces cerevisiae. In some embodiments, vector carries a nucleic acid encoding ILV3-cleav-ILV5-cleav-ILV2, a nucleic acid encoding ADHA-cleav-ARO10, and optionally a nucleic acid encoding an MPP, wherein “cleav” is the linker comprising the MPP cleavable region. In some preferred embodiments, MPP is derived from Saccharomyces cerevisiae.
In some embodiments, each of the vectors in a vector composition carries at least one of the following nucleic acids: a nucleic acid encoding ILV5-cleav-ILV3-cleav-ILV2, a nucleic acid encoding ARO10-cleav-ADHA, and optionally a nucleic acid encoding an MPP, wherein “cleav” is the linker comprising the MPP cleavable region. In some preferred embodiments, MPP is derived from Saccharomyces cerevisiae. In some embodiments, each of the vectors in a vector composition carries at least one of the following nucleic acids: a nucleic acid  encoding ILV3-cleav-ILV5-cleav-ILV2, a nucleic acid encoding ARO10-cleav-ADHA, and optionally a nucleic acid encoding an MPP, wherein “cleav” is the linker comprising the MPP cleavable region. In some preferred embodiments, MPP is derived from Saccharomyces cerevisiae. In some embodiments, each of the vectors in a vector composition carries at least one of the following nucleic acids: a nucleic acid encoding ILV3-cleav-ILV5-cleav-ILV2, a nucleic acid encoding ADHA-cleav-ARO10, and optionally a nucleic acid encoding an MPP, wherein “cleav” is the linker comprising the MPP cleavable region. In some preferred embodiments, MPP is derived from Saccharomyces cerevisiae.
Cell
In another aspect, the invention provides a cell harboring a vector or a vector composition of the present invention, wherein the cell is a prokaryotic cell or a eukaryotic cell.
In some embodiments, the prokaryotic cell is a cell of genera Acetobacter, Azotobacter, Bacillus, Enterobacter, Escherichia, Klebsiella, Salmonella or Pseudomonas. In some embodiments, the prokaryotic cell is a cell of Escherichia coli, Azotobacter vinelandii, Klebsiella oxytoca, Klebsiella pneumoniae, Pseudomonas fluorescens, Bacillus subtilis, Pseudomonas protegens, Pseudomonas putida, Pseudomonas veronii, Pseudomonas taetrolens, Pseudomonas balearica, Pseudomonas stutzeri, Pseudomonas aeruginosa, Pseudomonas syringae, Bacillus amyloliquefaciens, Burkholderia phytofirmans, Gluconacetobacter diazotrophicus, Herbaspirillum seropedicae or Bacillus cereus. In some preferred embodiments, the prokaryotic cell is a cell of Escherichia coli.
In some embodiments, the eukaryotic cell is selected from the group consisting of a fungal cell, an algae cell, an animal cell or a plant cell.
In some embodiments, the fungal cell is a yeast cell. In some embodiments, the yeast is selected from the group consisting of Saccharomyces, Aspergillus, Kluyveromyces, Schizosaccharomyces, Rhizopus, Candida, Yarrowia and Pichia,  such as Saccharomyces cerevisiae, Schizosaccharomyces pombe, Kluyveromyces lactis, Kluyveromyces marxianus, Aspergillus terreus, Aspergillus niger, Rhizopus arrhizus, Rhizopus oryzae, Candida albicans, Candida boidinii, Candida sonorensis, Candida tropicalis, Yarrowia lipolytica or Pichia pastoris. In some preferred embodiments, the yeast is Saccharomyces cerevisiae. In some preferred embodiments, the yeast is Saccharomyces cerevisiae W303-1a strain. In some preferred embodiments, the yeast is Saccharomyces cerevisiae S288C strain.
In some embodiments, the algae is selected from the group consisting of Chlorella, Spirulina, Chlamydomonas, Dunaliella, Chaetoceros and Porphyridium, such as Dunaliella tertiolecta, Porphyridium sp., Dunaliella parva, Chlorella pyrenoidosa, Chlamydomonas reinhardtii or Chaetoceros muelleri.
In some embodiments, the animal is selected from the group consisting of mouse, rat, chicken, rabbit, goat, donkey, monkey, pig, sheep and human.
In some embodiments, the plant is selected from the group consisting of arabidopsis, tobacco, barley, rice, maize, wheat, sorghum, sweet corn, sugar cane, onions, tomatoes, strawberries and asparagus. In some embodiments, the plant is selected from the group consisting of Arabidopsis thaliana, Nicotiana tabacum, Oryza sativa, Triticum aestivum, Zea mays, Sorghum bicolor, Setaria italica, Solanum tuberosum, Ipomoea batatas, Arachis hypogaea, Brassica napus, Malva parviflora, Sesamum indicum, Olea europaea, Elaeis guineensis, Saccharum officinarum, Beta vulgaris and Gossypium spp.
EXAMPLES
The practice of the present invention will employ, unless otherwise indicated, conventional techniques of chemistry, biochemistry, molecular biology, microbiology and cell biology, which are within the capabilities of a person of ordinary skill in the art. The function and advantage of these and other embodiments of the present invention will be more fully understood from the Examples below. The following Examples are intended to illustrate the benefits of the present invention and to describe particular embodiments but are not intended to limit the scope of the invention. It should be understood that various changes and modifications to the Examples will become apparent to those skilled in the art and will fall within the scope of the present invention.
Strains and Media
Escherichia coli strains JM109 and DH5α were used for routine cloning and plasmid propagation. JM109 was used to screen functional MPP processing sites and measure nitrogenase activity by the acetylene reduction assay. E. coli strain NCM3722 was used as the host for diazotrophic growth experiments. Luria-Bertani broth for E. coli growth contained 10 g/L of tryptone, 5 g/L of yeast extract, and 10 g/L of NaCl (with or without 15 g/L agar). All nitrogen fixation assays were performed in KPM minimal medium (10.4 g/L of Na2HPO4, 3.4 g/L of KH2PO4, 26 mg/L of CaCl2·2H2O, 30 mg/L of MgSO4, 0.3 mg/L of MnSO4, 36 mg/L of ferric citrate, 10 mg/L of para-aminobenzoic acid, 30 nmol/L of Na2MoO4, 5 mg/L of biotin, 1 mg/L of vitamin B1, 0.05% Casamino acids, and 0.8% (wt/vol) glucose), supplemented with 10 mM ammonium sulfate (KPM-HN) for pregrowth or 0.1% glutamate (KPM-LN) for nitrogenase activity assays. Diazotrophic growth experiments were carried out with solid KPM-NN minimal medium (KPM-LN medium without 0.1% glutamate, para-aminobenzoic acid and Casamino acids, containing highly purified agarose (TsingKe BioTech; TSJ001) instead of agar).
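As a worked example of the recipe arithmetic, the Luria-Bertani composition given above (10 g/L tryptone, 5 g/L yeast extract, 10 g/L NaCl, optionally 15 g/L agar) can be scaled to any batch volume with the Python sketch below. The function and variable names are hypothetical, and the 0.5 L batch is chosen only for illustration.

```python
# Concentrations in g/L, as given for Luria-Bertani broth in the text.
LB_G_PER_L = {"tryptone": 10.0, "yeast extract": 5.0, "NaCl": 10.0}
AGAR_G_PER_L = 15.0

def lb_amounts(volume_l: float, solid: bool = False) -> dict:
    """Return grams of each component needed for the given volume in litres."""
    if volume_l <= 0:
        raise ValueError("volume must be positive")
    amounts = {name: conc * volume_l for name, conc in LB_G_PER_L.items()}
    if solid:
        amounts["agar"] = AGAR_G_PER_L * volume_l  # only for solid medium
    return amounts

batch = lb_amounts(0.5, solid=True)
for component, grams in batch.items():
    print(f"{component}: {grams} g")
```

For the 0.5 L solid batch this gives 5 g tryptone, 2.5 g yeast extract, 5 g NaCl and 7.5 g agar.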
Saccharomyces cerevisiae strain W303-1a (MATα ade2-1 leu2-3, 112 trp1-1 his3-11, 15 ura3-1) was used as the host to express polyprotein-based nif systems. The S. cerevisiae S288C URA3 minus strain (KanMX: : URA3) was used as the  host to express the polyprotein-based violacein biosynthesis and isobutanol biosynthesis pathways. YPD medium for S. cerevisiae growth containing 20 g/L peptone, 10 g/L yeast extract, 20 g/L glucose, and 100 mg/L adenine, with or without 20 g/L agar. Solid synthetic dropout medium (13.4 g/L yeast nitrogen base [BD Biosciences; 291920] , 0.69 g/L dropout mixture [-Leu Do supplement; BD Biosciences, 630414] , 20 g/L glucose, and 20 g/L agar) was used to select transformants. Unless otherwise stated, for inducing protein expression, S. cerevisiae was grown at 30 ℃ in YPDG medium containing 20 g/L peptone, 10 g/L yeast extract, 10 g/L glucose, 10 g/L galactose, and 100 mg/L adenine.
Plasmid Construction
The protein coding sequences and amino acid sequences used for constructing MPP-based polyprotein systems are listed in Table 1. The plasmids were verified by sequencing before being used for further experiments. The coding sequences of MPP (α subunit labeled with HA-tag and β subunit labeled with His-tag) were chemically synthesized by GenScript. Each of the MPP subunits was amplified by PCR, and a ribosome-binding site (RBS) was added. The promoter element (Ptac/lacO) , MPPβ and MPPα subunits, and terminator element (TrrnB) were assembled by Golden Gate assembly to form an operon. Genes encoding violacein biosynthesis were amplified from a plasmid kindly provided by Bingzhi Li (Tianjin University) . The genes ILV2, ILV3, ILV5 and ARO10 encoding the isobutanol biosynthetic pathway were directly amplified from S. cerevisiae genome and gene ADHA from Lactococcus lactis was chemically synthesized. Genes (for use in yeast) encoding each Nif component were chemically synthesized by GenScript Company according to the codon bias of S. cerevisiae nuclear genes. All giant genes and corresponding expression cassettes were assembled via Golden Gate assembly, with the coding sequences for MPP processing sites added by PCR.
Standardized vectors for constructing polyprotein systems by Golden Gate Assembly.
Figure 1 depicts the vectors used for hierarchical assembly of multiple parts. The type IIS restriction enzymes BsaI and BpiI were used for hierarchical Golden Gate assembly. BsaI and BpiI restriction sites located inside the sequences were removed either by point mutation or by overlap PCR. Identical BsaI scars were assigned to all Level 0 promoter modules (AGGT/AATG), terminator modules (TAAG/GCTT), and giant gene modules (ORF modules, AATG/TAAG). The Level 1 vectors (1-10) were constructed for assembling the promoter module, ORF module and terminator module into general expression cassettes (EC). Three Level 2 vectors, A-C, were constructed for assembling EC1 to EC10 into sub-gene clusters. Vectors A, B, and C were used for assembling EC1-3, EC4-7, and EC8-10, respectively. Ten plasmids carrying gap1-10 sequences were constructed with BpiI scars identical to those of the corresponding Level 1 vectors 1-10. Three plasmids carrying gapA-C sequences were also constructed with BsaI scars identical to those of the corresponding Level 2 vectors A-C. These gap sequences were used when no corresponding expression cassette or sub-gene cluster existed. Two Level 3 vectors were constructed. One of these, carrying homologous recombination arms, was used for constructing the Nif polyprotein system, and the other, carrying the CEN6/ARS4 replication origin and the URA3 expression cassette, was used for constructing the violacein biosynthetic pathway and the isobutanol biosynthetic pathway.
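The scar assignment described above implies that a Level 1 expression cassette can close only when the modules are joined in the order promoter, ORF, terminator. The following Python sketch illustrates that check under stated assumptions: only the four-base scars are taken from the description, while the dictionary layout and the helper names `assembles` and `cassette_ends` are hypothetical and introduced purely for illustration.

```python
# BsaI scars (upstream, downstream) for each Level 0 module class, as given in the text.
OVERHANGS = {
    "promoter": ("AGGT", "AATG"),
    "orf": ("AATG", "TAAG"),
    "terminator": ("TAAG", "GCTT"),
}

def assembles(order):
    """Return True if every adjacent pair of modules shares a scar,
    i.e. the downstream scar of one equals the upstream scar of the next."""
    return all(
        OVERHANGS[left][1] == OVERHANGS[right][0]
        for left, right in zip(order, order[1:])
    )

def cassette_ends(order):
    """Outer scars of the joined cassette; the receiving vector must present these."""
    return OVERHANGS[order[0]][0], OVERHANGS[order[-1]][1]
```

For example, `assembles(["promoter", "orf", "terminator"])` is true, whereas any other ordering of the three module classes leaves at least one junction without a matching scar.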
Methods for constructing Nif polyprotein gene clusters.
Figure 2 depicts the procedure for assembling gene clusters encoding Nif polyprotein systems. In each giant gene, the su9 sequence was added to the first gene by overlap PCR. The PCR products of the genes, each flanked by coding sequences of specific processing sites, were assembled on a Level 0 vector as giant gene modules. Next, promoter modules, terminator modules and giant gene modules were assembled on Level 1 vectors. The nifHDK, nifUS, nifFMY, nifJVW, and nifENB giant genes were assigned to Level 1 vectors 1, 2, 3, 8, and 10, respectively. The MBP-TEVp gene module (a maltose-binding protein was fused to TEVp to enhance the solubility of the protein) was assigned to Level 1 vector 9. The LEU2 expression cassette with identical BsaI scars was assigned to Level 2 vector B and subsequently used for assembly to provide auxotrophic selection of positive transformants. The restriction enzyme XbaI was used to linearize the plasmids prior to transformation.
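The slot assignments above, together with the gap plasmids described for the standardized vectors, can be summarized as a small lookup. The sketch below is an illustration only: the slot numbers, the EC1-3/EC4-7/EC8-10 grouping and the placement of LEU2 at position B come from the description, whereas the function name `level3_inputs`, the data layout, and the assumption that an empty slot is filled by the gap plasmid of the same number are hypothetical.

```python
# Level 1 slot assignments stated in the text.
LEVEL1_SLOTS = {1: "nifHDK", 2: "nifUS", 3: "nifFMY",
                8: "nifJVW", 9: "MBP-TEVp", 10: "nifENB"}
# Level 1 slots collected by each Level 2 vector (EC1-3, EC4-7, EC8-10).
LEVEL2_RANGES = {"A": range(1, 4), "B": range(4, 8), "C": range(8, 11)}
# In the Nif build, position B is occupied directly by the LEU2 cassette.
LEVEL2_OVERRIDES = {"B": "LEU2 cassette"}

def level3_inputs(slots):
    """List what goes into each Level 2 position; empty slots get a gap plasmid."""
    plan = {}
    for vector, positions in LEVEL2_RANGES.items():
        if vector in LEVEL2_OVERRIDES:
            plan[vector] = [LEVEL2_OVERRIDES[vector]]
        else:
            plan[vector] = [slots.get(i, f"gap{i}") for i in positions]
    return plan

# Hypothetical variant without the protease module, so slot 9 falls back to gap9.
without_tevp = {k: v for k, v in LEVEL1_SLOTS.items() if k != 9}
```

Calling `level3_inputs(LEVEL1_SLOTS)` lists the three giant genes for position A, the LEU2 cassette for position B, and nifJVW, MBP-TEVp and nifENB for position C.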
Yeast Transformation and Mitochondria Extraction
Yeast transformations were carried out according to the lithium acetate (LiAc) method as described in Gietz R. D. and Schiestl R. H. (2007) High-efficiency yeast transformation using the LiAc/SS carrier DNA/PEG method. Nature Protocols 2 (1): 31-34, with modifications. In brief, yeast cells were grown in liquid YPD medium at 30℃ and 200 rpm overnight. 1 mL of the yeast culture was added to 100 mL of YPD medium and grown until the OD600 reached 0.5 to 0.6. The cells were harvested by centrifugation at 5,000 rpm for 5 min, resuspended in 25 mL of sterile water, and pelleted again by centrifugation at 5,000 rpm for 5 min. The cells were resuspended in 1 mL of LiAc buffer (100 μL of TE buffer (containing 0.1 M Tris-HCl and 0.01 M EDTA, pH 7.5), 100 μL of 1 M LiAc (pH 7.5), 800 μL of sterile water). 200 μL of resuspended yeast cells were mixed with about 1 μg of DNA and 5 μL of ssDNA (Solarbio, H1060). The mixture was added to 1.2 mL of PEG buffer (960 μL of 40% PEG-3350, 120 μL of 1 M LiAc (pH 7.5) and 120 μL of TE buffer) in a 1.5 mL Eppendorf tube. The cells were incubated at 30℃ and 200 rpm for 30 min and then harvested by centrifugation at 5,000 rpm for 3 min. The supernatant was discarded and the cell pellet was resuspended in 200 μL of sterile water. Transformants were selected on solid synthetic dropout medium.
Mitochondria extraction was performed as described in K. Diekert, A.I.P. M. de Kroon, G. Kispal, R. Lill, “Isolation and subfractionation of mitochondria from the yeast Saccharomyces cerevisiae” in Methods in Cell Biology, L.A. Pon, E.A. Schon, Eds. (Academic Press, 2001), Vol. 65, pp. 37–51, with modifications. In brief, S. cerevisiae strains carrying the corresponding constructs were grown in flasks in YPDG medium (containing 20 g/L peptone, 10 g/L yeast extract, 10 g/L glucose, 10 g/L galactose, and 100 mg/L adenine) at 30℃ and 200 rpm for 36 hrs. Yeast cells were collected, washed with ddH2O, and resuspended in zymolyase buffer with an appropriate amount of zymolyase to convert the cells to spheroplasts. Subsequently, the spheroplasts were broken using a glass homogenizer, and total extracts were centrifuged at 1,500 ×g for 5 min to remove cellular debris and unbroken cells. The supernatants were then centrifuged twice at 4,000 ×g for 5 min to remove other unwanted cellular contents. The supernatants from the last step were centrifuged at 12,000 ×g for 15 min to collect the crude mitochondria. Precipitates (mitochondrial pellets) from 10 mL of yeast cells (~0.5 g wet weight) were resuspended in 200 μL of PBS buffer and 50 μL of 5× Protein Loading Dye (Sangon Biotech; C508320). After boiling for 20 min, samples were cooled to room temperature and then centrifuged at 12,000 rpm for 2 min. Western Blotting assays were performed using 20 μL of each sample.
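As a cross-check of the buffer recipes given in the transformation protocol, the working concentrations can be computed with the usual dilution relation C_final = C_stock × V_stock / V_total. The sketch below uses only the volumes and stock concentrations stated above; the function name `final_conc` and the dictionary keys are illustrative, and the percentage for PEG-3350 is assumed to be expressed on the same basis as the 40% stock.

```python
def final_conc(stock_conc, stock_volume_ul, total_volume_ul):
    """Concentration after dilution: C_final = C_stock * V_stock / V_total."""
    return stock_conc * stock_volume_ul / total_volume_ul

# PEG buffer: 960 uL of 40% PEG-3350 + 120 uL of 1 M LiAc + 120 uL of TE buffer.
peg_total_ul = 960 + 120 + 120
peg_buffer = {
    "PEG-3350 (%)": final_conc(40, 960, peg_total_ul),
    "LiAc (M)": final_conc(1.0, 120, peg_total_ul),
    "Tris-HCl (M)": final_conc(0.1, 120, peg_total_ul),
    "EDTA (M)": final_conc(0.01, 120, peg_total_ul),
}

# LiAc buffer: 100 uL of TE buffer + 100 uL of 1 M LiAc + 800 uL of sterile water.
liac_total_ul = 100 + 100 + 800
liac_buffer = {
    "LiAc (M)": final_conc(1.0, 100, liac_total_ul),
    "Tris-HCl (M)": final_conc(0.1, 100, liac_total_ul),
    "EDTA (M)": final_conc(0.01, 100, liac_total_ul),
}
```

Both buffers work out to 0.1 M LiAc, 10 mM Tris-HCl and 1 mM EDTA, and the PEG buffer additionally contains 32% PEG-3350, before the cell suspension is added.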
Western Blotting
Samples were loaded on 10% SDS-polyacrylamide gels (Thermo Fisher Scientific; NP0301, NP0302) with 5 μL of PageRuler Prestained Protein Ladder (Thermo Fisher Scientific; 26616) as a marker. Proteins on the gels were subsequently transferred to PVDF membranes (Thermo Fisher Scientific; IB24002) using an iBlot 2 (Thermo Fisher Scientific). The membranes were blocked with 5% skim milk (BD Difco 232100; BD Biosciences) in PBS buffer and then incubated with primary antibodies for 4 to 8 hrs, depending on the sensitivity of the antibody for each Nif protein or tag. The secondary antibodies, goat anti-rabbit IgG-HRP (ZSGB Biotech; ZB-2301) or goat anti-mouse IgG-HRP (ZSGB Biotech; ZB-2305), were used at 1:3,000 dilution and incubated for 2 hrs before membrane development. The primary antibodies against Nif proteins used for immunoblotting and quantification were obtained from antiserum of rabbits immunized with the specific proteins. The primary antibodies against GFP (ZSGB Biotech, TA-06), HSP60 (Proteintech, 15282-1-AP) and His tag (ZSGB Biotech, TA-02) are commercially available.
Acetylene Reduction Assay
To measure the nitrogenase activity, cells carrying nitrogenase systems were initially grown overnight in KPM-HN medium. The cells were then diluted into 2 mL of KPM-LN medium in 20-mL sealed tubes to a final OD600 of ~0.3, with or without 200 μM isopropyl-β-D-thiogalactoside (IPTG) to induce the expression of MPP. For the acetylene reduction assay, air in the tubes was repeatedly evacuated and replaced with argon. After incubation at 30℃ for 4 h, 2 mL of C2H2 was injected, and the gas phase was analyzed ~16 h later with a Shimadzu GC-2014 gas chromatograph. Data presented are mean values based on at least two biological replicates.
Diazotrophic Growth Assay
Plasmids carrying nitrogenase systems were transformed into E. coli NCM3722 strain, and the transformants were spread on LB plates with appropriate concentration of antibiotics. After incubation at 37℃ for 16 h, single colonies were picked and streaked onto KPM-NN plates. Then the plates were moved to a 2.5-L anaerobic jar (Oxoid AG0025A; Thermo Fisher Scientific) equipped with anaerobic gas-generating sachets (Oxoid AN0025A; Thermo Fisher Scientific) and an oxygen indicator (Oxoid BR0055B; Thermo Fisher Scientific) . Anaerobic jars were immediately locked and incubated at 30℃ for 3 to 4 days.
Example 1. Testing TEVp-based Polyprotein System in Yeast Mitochondria.
Previously, a TEVp-based Nif polyprotein system was established (Yang, J., Xie, X., Xiang, N., Tian, Z.X., Dixon, R., and Wang, Y.P. (2018) Polyprotein strategy for stoichiometric assembly of nitrogen fixation components for synthetic biology. Proc. Natl. Acad. Sci. 115 (36) : E8509–E8517) , in which the Klebsiella oxytoca (Ko) nif genes were regrouped into five giant genes whose products were expressed and subsequently cleaved by tobacco etch virus protease (TEVp) . This polyprotein system enabled nitrogenase activity and supported diazotrophic growth of Escherichia coli after polyprotein cleavage by TEVp.
To test this TEVp-based system in yeast, nif gene-encoded polyproteins were designed for expression in yeast, each flanked by different promoters and terminators, with variant sequences encoding the signal peptide of subunit 9 of the Neurospora crassa F0-ATPase (Su9) fused to the 5′ end of the giant nif genes to enable import of the polyproteins into yeast mitochondria (Figure 3A). TEVp-based nif gene clusters were assembled with or without sequences designed to express and target an MBP-TEVp fusion protein to mitochondria (plasmids pNG410 and pNG411, respectively). These were subsequently transformed and integrated into chromosome XIV of S. cerevisiae at the YNRCΔ9 locus to generate stable strains Sc_410 and Sc_411 (Figure 3A). The natural nif gene cluster (an operon-based nitrogenase system) from K. oxytoca was cloned and expressed in E. coli as a positive control (represented as Ec Nif).
1.1 TEVp-based nif system leads to less efficient protein targeting.
Yeast strains Sc_410 and Sc_411 were grown in liquid YPDG medium to induce the expression of polyproteins and MBP-TEVp under aerobic condition. However, when mitochondria extracts were prepared from aerobically grown yeast cultures and analyzed by immunoblotting using antibodies against specific Nif proteins, negligible amounts of Nif components were detected in strain Sc_410, which contains the complete TEVp-based nif system, even though MBP-TEVp was clearly imported into mitochondria (Figure 3B) . The results suggest that polyproteins are not imported into mitochondria, potentially because they are unstable in the cytoplasm, or once imported into the mitochondrial matrix they are rapidly degraded. In contrast, cross-reaction to the antibodies was notable in extracts from strain Sc_411 (Figure 3B) , which does not encode TEVp, suggesting that polyprotein import may have occurred, but the smearing of bands on the gel suggests they are unstable in mitochondria in the absence of processing. It can be seen that TEVp may lead to less efficient protein targeting of Nif proteins.
1.2 TEVp is toxic to yeast when co-expressed with Nif polyproteins.
The growth curves of strains Sc_410 and Sc_411 were plotted to determine whether the expression of TEVp would impede the growth of yeast. It was observed that when expression of MBP-TEVp was induced with galactose in strain Sc_410, which carries the MBP-TEVp module, cell growth was arrested (Figure 4B). These results reflect severe challenges to the deployment of a TEVp-based nif polyprotein system in mitochondria.
Example 2. Knowledge-based Engineering of Minimal Sequences for Cleavage by Mitochondrial Processing Peptidase (MPP)
The Mitochondrial Processing Peptidase (MPP) was then evaluated for the purpose of engineering a polyprotein system, as it efficiently cleaves mitochondrial proteins after translocation into mitochondria.
2.1 Screening for minimal sequences that can be efficiently cleaved by MPP.
As a first step towards engineering a polyprotein system based on MPP cleavage, it is necessary to identify a minimal cleavage site that would be as short as possible, to avoid the negative effects of residual tails on the activity of processed proteins, but would nevertheless satisfy the requirement for efficient processing. To detect the efficient cleavage by MPP, a two-plasmid system was used in E. coli, in which a GFP-RFP polyprotein containing variant MPP sites was co-expressed with an expression module encoding the α and β subunits of S. cerevisiae MPP (Sc MPP) (Figure 5) . Cleavage was identified by Western Blotting using anti-GFP antibodies. If putative processing sites are recognized and cleaved by MPP, a ~27 kDa GFP polypeptide will be released, whereas in the absence of processing, a ~55 kDa polypeptide representing the GFP-RFP fusion protein should be detected (Figure 5) .
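The readout of this screen reduces to the pattern of anti-GFP bands. The sketch below shows one way such a pattern could be scored; the two expected sizes (~27 kDa and ~55 kDa) come from the description, while the function name `classify`, the 3 kDa matching tolerance and the category labels are assumptions made for illustration and are not part of the disclosed method.

```python
GFP_KDA = 27     # released GFP polypeptide, as stated in the text
FUSION_KDA = 55  # unprocessed GFP-RFP fusion protein, as stated in the text

def classify(bands_kda, tolerance=3):
    """Score an anti-GFP blot lane from the apparent sizes of its bands (kDa).
    The tolerance is an illustrative assumption about sizing accuracy."""
    has_gfp = any(abs(b - GFP_KDA) <= tolerance for b in bands_kda)
    has_fusion = any(abs(b - FUSION_KDA) <= tolerance for b in bands_kda)
    if has_gfp and has_fusion:
        return "partial processing"
    if has_gfp:
        return "complete processing"
    if has_fusion:
        return "no processing"
    return "no signal"
```

A lane showing only a band near 27 kDa would be scored as complete processing, and a lane showing both bands as partial processing.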
The Su9 pre-sequence (residues 1 to 69 of ATP synthase subunit 9, from Neurospora crassa) was selected for screening the efficiency of MPP processing, on the basis that this sequence targets proteins to mitochondria and is efficiently removed by MPPs (Westermann B and Neupert W (2000) Mitochondria-targeted green fluorescent proteins: convenient tools for the study of organelle biogenesis in Saccharomyces cerevisiae. Yeast 16 (15): 1421-1427). It has been previously demonstrated that the Su9 pre-sequence is cleaved at two sites. To simplify the analysis, residues 40-69 (designated here as Su9.30, SEQ ID NO: 67), which contain the second processing site, were chosen as the starting sequence for investigation (Figure 6A). When GFP and RFP were linked by Su9.30, the polyprotein was completely processed by reconstituted Sc MPP in E. coli and a single ~27 kDa band was detected with the anti-GFP antibody (Figure 6B). Shortening the sequence by 5 residues at the N-terminus (Su9.25) gave similar results, but further removal of N-terminal residues (Su9.20, Su9.15 and Su9.11) decreased the processing efficiency in proportion to the length removed (Figure 6B).
The inventors then focused on residues at the C-terminus of Su9.30, which contain the cleavage site itself, particularly the amino acid sequence KRAYSS from residues 25-30. The C-terminal YSS sequence in Su9.11 was replaced with either FHT or FST to form 11-residue peptides designated Su9.11H and Su9.11S, respectively (Figure 6A) . Both of these substitutions enabled complete processing of the GFP-RFP polyprotein by MPP (Figure 6C) .
80% of pre-sequences contain an arginine residue at either position -2 (-2R) or -3 (-3R) (Teixeira P. F. and Glaser E. (2013) Processing peptidases in mitochondria and chloroplasts. Biochimica et Biophysica Acta (BBA) -Molecular Cell Research 1833 (2): 360-370). In addition, a flexible region between this conserved arginine and a more distal arginine, located several residues upstream, is also important for processing (Kojima K., Kitada S., Ogishima T. and Ito A. (2001) A Proposed Common Structure of Substrates Bound to Mitochondrial Processing Peptidase. Journal of Biological Chemistry 276 (3): 2115-2121). In an attempt to further shorten the scissile peptide for MPP cleavage, the QAFQ sequence of Su9.11H (located at the original positions 21-24 in Su9.30) was substituted with one, two or three glycine residues to generate peptides S8K, S9K and S10K, respectively (Figure 6A). However, none of the three sequences was efficiently cleaved (Figure 6D). Next, the lysine at the -3 position was replaced with arginine to form peptides S8, S9 and S10 (Figure 6A). Surprisingly, it was observed that both S9 and S10 could be completely processed by MPP in E. coli (Figure 6E).
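The two substitution steps described above can be written out as string operations. The sketch below is illustrative and rests on two stated assumptions: the Su9.11H string is not spelled out in this passage and is back-calculated here from the S10 sequence RGGGRRAFHT given elsewhere in this description (restoring GGG to QAFQ and the arginine at -3 to lysine), and the -3 position is taken to be the sixth residue from the C-terminus, which corresponds to cleavage between A and F. The helper names are hypothetical.

```python
# Assumption: back-calculated from S10 (RGGGRRAFHT); not quoted from the sequence listing.
SU9_11H = "RQAFQKRAFHT"

def replace_spacer(peptide, n_gly):
    """Swap the single QAFQ stretch for n_gly glycine residues."""
    if peptide.count("QAFQ") != 1:
        raise ValueError("expected exactly one QAFQ stretch")
    return peptide.replace("QAFQ", "G" * n_gly)

def lys_to_arg_at_minus3(peptide):
    """Replace the lysine six residues from the C-terminus (assumed -3) with arginine."""
    idx = len(peptide) - 6
    if peptide[idx] != "K":
        raise ValueError("expected lysine at the -3 position")
    return peptide[:idx] + "R" + peptide[idx + 1:]

# S8K, S9K, S10K: one, two or three glycines in place of QAFQ.
k_series = {f"S{7 + n}K": replace_spacer(SU9_11H, n) for n in (1, 2, 3)}
# S8, S9, S10: the same peptides with lysine at -3 replaced by arginine.
r_series = {name[:-1]: lys_to_arg_at_minus3(seq) for name, seq in k_series.items()}
```

Under these assumptions the peptide names match their lengths (8, 9 and 10 residues), and the derived S10 string agrees with the S10 sequence stated in this description.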
2.2 Testing the minimal cleavable sequences in yeast mitochondria.
To determine whether the artificial processing sites work in yeast mitochondria, GFP-RFP fusions with internal S8, S9 and S10 sites and the Su9 signal peptide for targeting to mitochondria were constructed and expressed under the control of a strong constitutive TDH3 promoter in S. cerevisiae (Figure 7). Dual copies of the S8, S9 and S10 sequences in the GFP-RFP fusion proteins were also included in this study. When mitochondrial extracts from yeast strains expressing these constructs were assayed by immunoblotting with anti-GFP antibodies, only one band with similar migration to processed GFP was detectable (Figure 7), suggesting that unprocessed GFP-RFP polyprotein is susceptible to degradation in mitochondria. This implies that the level of processed GFP observed in the immunoblot is proportional to the processing efficiency. Since the dual copy of S10 gave the strongest GFP signal (Figure 7), suggesting the highest processing efficiency in yeast mitochondria, this double copy of the S10 peptide (RGGGRRAFHT) was selected for further polyprotein construction.
Example 3. Functional Reconstitution of Nitrogenase in E. coli with an MPP-based Polyprotein System
3.1 Construction of an MPP-based polyprotein system.
To determine whether it is feasible to engineer an MPP-based polyprotein system that supports the biosynthesis and activity of nitrogenase, the TEVp-based polyprotein system consisting of five giant nif genes as used in Example 1 was re-engineered by replacing the TEVp cleavable sequences within the polyproteins with MPP cleavable sequences while maintaining the same gene order in each polyprotein as depicted in Figure 3A. Here, the highly active variant NifD-Y100Q was used to replace the wild-type NifD protein, as wild-type NifD is susceptible to internal cleavage by MPP (Allen R.S., Gregg C.M., Okada S., Menon A., Hussain D., Gillespie V., Johnston E., Devilla R., Warden A.C., Taylor M., Byrne K., Colgrave M., Wood C.C. (2020) Plant expression of NifD protein variants resistant to mitochondrial degradation. Proc. Natl. Acad. Sci. 117 (37): 23165-23173).
Since FHT and FST corresponding to residues 28-30 of Su9.30 could  support efficient cleavage by MPP, the FHT in S10 site (RGGGRRAFHT) was replaced by FST to generate a S10S site (RGGGRRAFST) . Both the dual copies of S10 sites and S10S sites were tested in each polyprotein to find out an optimal MPP-based polyprotein system. In each case, a single MPP-based giant gene was co-transformed with a plasmid containing the remaining nif genes in the operon-based system and grown under nitrogen-fixing conditions. The first protein in each polyprotein was selected for immunoblotting with Nif protein specific antibodies. According to the results of immunoblotting (Figure 8A-8E) , the optimal combination of Nif polyproteins was as follows:  NifJǐǐVǐǐW, NifFǐǐMǐǐY and  (Figure 9A) . The symbolsand are used to represent dual S10 sites and dual S10S sites, respectively. A completely MPP-based polyprotein system (designated version 2.0) was constructed using the optimal combination of Nif polyproteins.
3.2 Testing the nitrogenase activity of MPP-based polyprotein system by acetylene reduction assay (ARA) and diazotrophic growth assay.
The plasmids carrying the TEVp-based polyprotein system (version 1.0) and the MPP-based polyprotein system (version 2.0) were each co-transformed with a plasmid carrying the MPP expression module (shown in Figure 5). The acetylene reduction assay (ARA) was performed to measure the nitrogenase activity of each system. When the MPP-based nif polyprotein system was co-expressed with Sc MPP in E. coli, it retained about 82% of the nitrogenase activity of the TEVp-based system (version 1.0) (Figure 9A). A new version, with nifU and nifS expressed in an operon combined with the other four polyproteins, was further constructed (version 2.1). This version increased nitrogenase activity to ~127% of that of the TEVp-based version (Figure 9A).
The NCM3722 strains carrying version 1.0 or version 2.0 of polyprotein system were used for diazotrophic growth assay. As reported previously for TEVp-based polyprotein system (version 1.0) , the MPP-based version (version 2.0) also supported diazotrophic growth of E. coli, confirming that this nitrogenase system is also fully functional (Figure 9B) .
Example 4. Stoichiometric expression of Nif proteins in yeast mitochondria using MPP-based polyproteins.
The functionality of nitrogenase was retained when Nif polyproteins were processed by reconstituted MPP in E. coli. Next, it was determined whether the polyproteins could be targeted to S. cerevisiae mitochondria and correctly processed in situ. The NifHDK group, representing the key structural components of molybdenum nitrogenase, was first selected as an example. The giant gene, in which the nif genes were codon optimized for yeast expression (Table 1), was fused to the ScGAL1 promoter and the Su9 coding sequence for targeting to mitochondria (Figure 10). When assayed by immunoblotting, mitochondrial extracts from the engineered yeast strain Sc_3682 exhibited bands corresponding to NifH, NifD and NifK, indicating that the polyprotein is efficiently cleaved by native MPP after transport into mitochondria (Figure 10). To evaluate the stoichiometric ratio of the processed proteins, the loading volumes were adjusted to provide a similar band intensity for NifH in both the mitochondrial extract and the extract from E. coli expressing native Nif proteins. Quantification of the band intensities revealed that the stoichiometry of processed NifH, NifD, and NifK in mitochondria was similar to that of the components expressed from the native nifHDK operon in E. coli (Figure 10, values as indicated in parentheses).
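The stoichiometry comparison described above amounts to expressing each band intensity relative to the NifH band of the same sample. A minimal sketch of that normalization follows; the function name `relative_to` and the intensity values are hypothetical placeholders chosen for illustration, not data from Figure 10, and the description does not state which quantification software was used.

```python
def relative_to(intensities, reference="NifH"):
    """Express each band intensity as a ratio to the reference band of the same sample."""
    ref = intensities[reference]
    return {name: value / ref for name, value in intensities.items()}

# Placeholder intensities in arbitrary units (illustrative only, not measured values).
mito_extract = {"NifH": 200.0, "NifD": 100.0, "NifK": 50.0}
ratios = relative_to(mito_extract)
```

Ratios computed in this way for the mitochondrial extract and for the E. coli extract can then be compared protein by protein.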
To assemble the complete MPP-based nif polyprotein system for expression in yeast, the 5 giant nif genes shown in Figure 9A version 2.0 were co-expressed using the same promoters and terminators used for TEVp-based system and fused to variants of the Su9 signal peptide to target polyproteins to mitochondria (Figure 11A) . The final construct (pBDS3752) was transformed into S. cerevisiae to generate stable strain Sc_3752. In order to mimic the balance of Nif component expression observed in E. coli, three promoter variants were used to replace the promoters ofandto generate a new construct (pBDS3942) . This construct was also transformed into S. cerevisiae to generate stable strain Sc_3942.
Immunoblotting of NifH, NifU, NifE, NifV, and NifM, representing each polyprotein, demonstrated that all 5 polyproteins were successfully co-expressed and cleaved by MPP in the mitochondria of both strain Sc_3752 and strain Sc_3942 (Figure 11B). Compared with strain Sc_3752, strain Sc_3942 provided a more appropriate protein stoichiometric ratio, which is similar to that of the reconstituted operon-based Nif system in E. coli.
The growth curves of strains Sc_3752 and Sc_3942 were plotted to determine whether the expression of MPP-based polyproteins would impede the growth of yeast. Yeast strains were grown in YPD medium with 2% glucose (uninduced, Figure 12A) or 0.4% glucose plus 1.6% galactose (to induce the expression of polyproteins, Figure 12B). The maximum growth rate of strain Sc_535 (a yeast strain transformed with an empty vector) was assigned as 100% (Figure 12C). It was observed that galactose-induced expression of Nif polyproteins in the MPP-based polyprotein yeast strains Sc_3752 and Sc_3942 had minimal impact on growth compared with induced expression of the TEVp-based system. Therefore, the MPP-based polyprotein system is more suitable for expression in yeast mitochondria.
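The description does not state how the maximum growth rate was derived from the growth curves. One common approach, shown here purely as an assumption, is to take the largest specific growth rate between consecutive OD readings and express it as a percentage of the empty-vector control. The function names are hypothetical.

```python
import math

def max_growth_rate(times_h, od):
    """Largest specific growth rate (per hour) between consecutive OD readings,
    computed as the change in ln(OD) divided by the elapsed time."""
    rates = [
        (math.log(od[i + 1]) - math.log(od[i])) / (times_h[i + 1] - times_h[i])
        for i in range(len(od) - 1)
    ]
    return max(rates)

def percent_of_control(strain_rate, control_rate):
    """Growth rate of a strain as a percentage of the control strain's rate."""
    return 100.0 * strain_rate / control_rate
```

With this definition the control strain scores 100% by construction, mirroring the normalization to strain Sc_535 described above.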
Example 5. Functional Reconstitution of Violacein Biosynthesis Pathway in Mitochondria Using the MPP-based Polyprotein Strategy.
To further investigate whether the MPP-based polyprotein strategy can serve as a universal strategy for co-expression of functional components, the violacein biosynthesis pathway was selected for heterologous expression in yeast mitochondria. Violacein is an indolocarbazole pigment synthesized in bacteria by five enzymes encoded by the vioABCDE operon using tryptophan as a substrate (Jiang P.X., Wang H.S., Zhang C., Lou K., and Xing X.H. (2010) Reconstruction of the violacein biosynthetic pathway from Duganella sp. B2 in different heterologous hosts. Applied Microbiology and Biotechnology 86 (4): 1077-1088). Because catalysis in the pathway proceeds in the order VioA, VioB, VioE, followed by VioD and VioC (Figure 14A), the five enzymes were arranged in two polyproteins encoded by giant genes vioAǐǐBǐǐE and vioDǐǐC, each driven by a TDH3 promoter and targeted to mitochondria by the Su9 signal peptide (Figure 13). The method for constructing a polyprotein-based violacein gene cluster is similar to that described in “Methods for constructing Nif polyprotein gene clusters”. As a control, VioABE and VioDC polyproteins linked by a non-cleavable version of the S10 MPP processing site were also constructed, in which the arginine residues at positions -2 and -3 were replaced by alanine residues.
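The cleavable and non-cleavable linkers can be compared with a toy model of processing. The sketch below assumes that MPP cuts the S10 site (RGGGRRAFHT) between A and F, which is consistent with the 7-residue C-terminal tail described for this site in Example 6, and numbers positions -2 and -3 relative to that cut. The protein sequences are short placeholders, and the function names are hypothetical.

```python
S10 = "RGGGRRAFHT"
CUT_AFTER = 7  # assumed cut between A and F, leaving a 7-residue upstream tail

def noncleavable(site, cut_after=CUT_AFTER):
    """Replace the arginines at positions -2 and -3 (relative to the cut) with alanine."""
    residues = list(site)
    for pos in (2, 3):
        idx = cut_after - pos
        if residues[idx] != "R":
            raise ValueError("expected arginine at position -%d" % pos)
        residues[idx] = "A"
    return "".join(residues)

def cleave(polyprotein, site=S10, cut_after=CUT_AFTER):
    """Cut at every exact occurrence of the site; return the resulting fragments."""
    fragments, start = [], 0
    pos = polyprotein.find(site)
    while pos != -1:
        cut = pos + cut_after
        fragments.append(polyprotein[start:cut])
        start = cut
        pos = polyprotein.find(site, pos + len(site))
    fragments.append(polyprotein[start:])
    return fragments

# Placeholder proteins (not real sequences) joined by a dual S10 linker.
processed = cleave("MKKK" + S10 + S10 + "MDDD")
unprocessed = cleave("MKKK" + noncleavable(S10) + "MDDD")
```

In this model the upstream protein keeps the RGGGRRA tail, the downstream protein starts with FHT, and the alanine-substituted linker leaves the polyprotein as a single fragment.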
To assess violacein biosynthesis, yeast carrying the corresponding plasmids were picked and streaked onto YPD solid plates. Subsequently, the plates were incubated at 30℃ for 3 days. After incubation, the dark violet-colored violacein pigment was synthesized in yeast carrying S10 linked polyproteins, whereas no pigment was synthesized by polyproteins linked by non-cleavable S10 peptide (Figure 14B) . Furthermore, a giant gene, vioAǐǐBǐǐEǐǐDǐǐC, was assembled to express a longer polyprotein encompassing the complete violacein biosynthesis pathway. The synthesis of the violet-colored pigment was also observed, albeit lighter in color compared with yeast carrying the two separate giant genes (Figure 14B) . Immunoblotting against N-terminal His-tagged VioE indicated that the polyproteins comprising MPP cleavable S10 site were correctly and efficiently cleaved by MPP (Figure 14C, lanes Sc_297 and Sc_300) . The polyproteins with a non-cleavable version of the S10 site were undetectable, which means that the non-cleavable polyproteins may be unstable in mitochondria. These results demonstrate that all of the polyproteins with MPP cleavable sequences are correctly recognized and cleaved by MPP, thus releasing functional proteins and achieving the synthesis of violacein.
Example 6. Functional Reconstitution of Isobutanol Biosynthesis Pathway in Mitochondria Using the MPP-based Polyprotein Strategy.
In yeast, isobutanol can be synthesized from pyruvate by three genes, ILV2, ILV3, and ILV5, encoding proteins located in mitochondria, and two genes, ARO10 and ADHA, encoding proteins located in the cytosol (Park S.H., Kim S.  and Hahn J.S. (2016) Improvement of isobutanol production in Saccharomyces cerevisiae by increasing mitochondrial import of pyruvate through mitochondrial pyruvate carrier. Applied Microbiology and Biotechnology 100 (17) : 7591-7598) . A previous study revealed that compartmentalization of all proteins in this pathway in mitochondria remarkably increased isobutanol production (Avalos J.L., Fink G.R., and Stephanopoulos G. (2013) Compartmentalization of metabolic pathways in yeast mitochondria improves the production of branched-chain alcohols. Nature Biotechnology 31 (4) : 335-341) .
To determine whether isobutanol biosynthesis could be achieved by expressing the entire pathway as polyproteins targeted to mitochondria, the ILV2, ILV3, and ILV5 genes were arranged in one group and the ARO10 and ADHA genes were arranged in another group to create two polyproteins (Figure 13 and Figure 15A). The method for constructing a polyprotein-based isobutanol gene cluster is similar to that described in “Methods for constructing Nif polyprotein gene clusters”. To optimize expression levels and take into account tolerance to residual tails (since processing of the S10 site by MPP results in the addition of a 7-residue C-terminal tail to proteins located upstream of the cleavage site), the gene order in the constructs was permuted randomly to obtain a library of polyproteins with multiple rearrangements of coding sequences (Figure 15A).
To measure isobutanol biosynthesis, yeast carrying the corresponding plasmids were grown in synthetic uracil-minus medium with 2% glucose at 30℃ for 24 h. 5 mL of each culture was then centrifuged at 1,500 ×g for 3 min, the supernatant was discarded and the cells were resuspended in 5 mL of synthetic uracil-minus medium with 10% glucose in sterile 50-mL conical tubes. Cells were grown in this medium at 30℃ with 220 rpm agitation for 24 h, after which they were centrifuged at 1,500 ×g for 3 min. The supernatant was filtered through a 0.2 μm filter membrane and used for isobutanol quantification. Isobutanol was quantified using a gas chromatograph (Shimadzu GC-2014C) equipped with a flame ionization detector and an FFAP column (30 m × 0.25 mm × 0.25 μm), with nitrogen as the carrier gas. The injector and detector were maintained at 250℃ and 280℃, respectively. The column temperature was initially maintained at 70℃ for 1 min. It was then increased to 200℃ at a rate of 15℃/min and held at that temperature for 2 min.
Most permutations resulted in isobutanol biosynthesis, and optimal levels were produced by construct 4a (ILV5ǐǐ3ǐǐ2 combined with ARO10ǐǐADHA) or construct 6b (ILV3ǐǐ5ǐǐ2 combined with ADHAǐǐARO10), which yielded about 230 mg/L isobutanol after 48-hr fermentation (Figure 15B). These results demonstrate that all of the proteins assembled in the polyproteins are successfully released by MPP cleavage in mitochondria and function in the synthesis of isobutanol.
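The size of the design space for such a library can be seen by enumerating the gene orders within each group. The sketch below lists every ordering exhaustively, which is an illustrative substitute for the random permutation described above; the construct numbering used in Figure 15 is not reproduced, and the variable and function names are hypothetical.

```python
from itertools import permutations, product

GROUP_1 = ("ILV2", "ILV3", "ILV5")  # genes arranged in the first polyprotein
GROUP_2 = ("ARO10", "ADHA")         # genes arranged in the second polyprotein

def library():
    """All pairings of a gene order for the first polyprotein with one for the second."""
    return list(product(permutations(GROUP_1), permutations(GROUP_2)))

designs = library()
```

Three genes give six orders and two genes give two, so the exhaustive library contains twelve pairings, including ILV5-ILV3-ILV2 with ARO10-ADHA and ILV3-ILV5-ILV2 with ADHA-ARO10.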
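The oven program described above determines the length of each chromatographic run. The following sketch adds up the initial hold, the linear ramp and the final hold from the stated values; the function and parameter names are illustrative, and cool-down or equilibration time between injections is not included.

```python
def oven_program_minutes(start_c, hold_start_min, end_c, ramp_c_per_min, hold_end_min):
    """Total oven program time: initial hold + linear ramp + final hold."""
    ramp_min = (end_c - start_c) / ramp_c_per_min
    return hold_start_min + ramp_min + hold_end_min

# 70 C for 1 min, then 15 C/min to 200 C, then 2 min at 200 C.
run_time_min = oven_program_minutes(70, 1, 200, 15, 2)
```

The ramp from 70℃ to 200℃ at 15℃/min takes about 8.7 min, so the full program runs for roughly 11.7 min per sample.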
Other Embodiments
All of the features disclosed in this specification may be combined in any combination. Each feature disclosed in this specification may be replaced by an alternative feature serving the same, equivalent, or similar purpose. Thus, unless expressly stated otherwise, each feature disclosed is only an example of a generic series of equivalent or similar features.
From the above description, one skilled in the art can easily ascertain the essential characteristics of the present invention, and without departing from the spirit and scope thereof, can make various changes and modifications of the invention to adapt it to various usages and conditions. Thus, other embodiments are also within the claims.
Table 1. Sequence Information


Claims (70)

  1. A method for co-expressing two or more proteins in a cell or a cell-free system, comprising expressing the two or more proteins as a fusion protein in which at least two adjacent proteins are linked by a linker comprising a Mitochondrial Processing Peptidase (MPP) cleavable region, wherein cleavage of the MPP cleavable region by an MPP in the cell or the cell-free system results in release of the proteins linked by the linker.
  2. The method of claim 1, wherein each of the proteins is linked to the adjacent protein(s) in the fusion protein by a linker comprising an MPP cleavable region or an uncleavable linker, and wherein the linkers or the MPP cleavable regions between adjacent proteins are the same or different.
  3. The method of claim 1 or 2, wherein the cell endogenously expresses the MPP.
  4. The method of claim 3, wherein the cell is a eukaryotic cell and the MPP is located at mitochondria of the cell, wherein the fusion protein is located at mitochondria of the cell.
  5. The method of claim 1 or 2, wherein the cell exogenously expresses the MPP.
  6. The method of claim 5, wherein the cell is a prokaryotic cell or a eukaryotic cell and the MPP is located in the cytoplasm of the cell, and wherein the fusion protein is expressed in the cytoplasm of the cell.
  7. The method of claim 6, wherein the prokaryotic cell is a cell of genera Acetobacter, Azotobacter, Bacillus, Enterobacter, Escherichia, Klebsiella, Salmonella or Pseudomonas.
  8. The method of claim 7, wherein the prokaryotic cell is a cell of Escherichia coli, Azotobacter vinelandii, Klebsiella oxytoca, Klebsiella pneumoniae, Pseudomonas fluorescens, Bacillus subtilis, Pseudomonas protegens, Pseudomonas putida, Pseudomonas veronii, Pseudomonas taetrolens, Pseudomonas balearica, Pseudomonas stutzeri, Pseudomonas aeruginosa, Pseudomonas syringae, Bacillus amyloliquefaciens, Burkholderia phytofirmans, Gluconacetobacter diazotrophicus, Herbaspirillum seropedicae or Bacillus cereus.
  9. The method of any one of claims 3-6, wherein the eukaryotic cell is selected from the group consisting of a fungal cell, an algae cell, an animal cell or a plant cell.
  10. The method of claim 9, wherein the fungal cell is a yeast cell, wherein the yeast is selected from the group consisting of Saccharomyces, Aspergillus, Kluyveromyces, Schizosaccharomyces, Rhizopus, Candida, Yarrowia and Pichia, such as Saccharomyces cerevisiae, Schizosaccharomyces pombe, Kluyveromyces lactis, Kluyveromyces marxianus, Aspergillus terreus, Aspergillus niger, Rhizopus arrhizus, Rhizopus oryzae, Candida albicans, Candida boidinii, Candida sonorensis, Candida tropicalis, Yarrowia lipolytica or Pichia pastoris.
  11. The method of claim 1 or 2, wherein the cell-free system comprises an effective amount of the MPP capable of cleaving the fusion protein at the MPP cleavable region.
  12. The method of any one of claims 1-11, wherein the MPP is derived from yeast, Arabidopsis, tobacco, rice or algae, such as Saccharomyces cerevisiae, Arabidopsis thaliana, Nicotiana tabacum, Oryza sativa or Chlamydomonas reinhardtii.
  13. The method of any one of claims 1-12, wherein the linker or the MPP cleavable region comprises one or more MPP cleavable sequences.
  14. The method of claim 13, wherein the linker or the MPP cleavable region comprises two or more MPP cleavable sequences arranged in tandem, and wherein the MPP cleavable sequences within the same linker or MPP cleavable region are the same or different.
  15. The method of claim 13 or 14, wherein each of the MPP cleavable sequences independently comprises a wild type amino acid sequence, a fragment or a variant thereof that is cleavable by MPP, or an artificially designed amino acid sequence that is cleavable by MPP.
  16. The method of claim 15, wherein the wild type amino acid sequence is a wild type mitochondrial signal peptide which is responsible for targeting a mitochondrial endogenous protein into mitochondria.
  17. The method of claim 16, wherein the mitochondrial signal peptide is derived from a fungal cell, an algae cell, an animal cell or a plant cell.
  18. The method of claim 17, wherein the mitochondrial signal peptide is derived from a mitochondrial endogenous protein selected from the group consisting of: the subunit 9 of the F0 ATPase of Neurospora crassa, COX4 of Saccharomyces cerevisiae, HSP60/YLR259c of Saccharomyces cerevisiae, SSC1/YJR045c of Saccharomyces cerevisiae, CYB2/YML054C of Saccharomyces cerevisiae, mitochondrial ATP synthase of Nicotiana plumbaginifolia (encoded by atp2-1 gene), mitochondrial import receptor subunit TOM20 of Arabidopsis thaliana and 2-oxoglutarate dehydrogenase subunit E1 of Arabidopsis thaliana.
  19. The method of claim 18, wherein the MPP cleavable sequence is derived from the subunit 9 of the F0 ATPase of Neurospora crassa.
  20. The method of claim 19, wherein the MPP cleavable sequence comprises the amino acid sequence of any one of SEQ ID NOs: 67-81.
  21. The method of claim 15, wherein the artificially designed amino acid sequence comprises a formula as follows:
    X1-X2,
    wherein X1 comprises RQAFQKRA or RQAFQRRA, and X2 comprises YSS, FHT or FST.
  22. The method of claim 21, wherein X1 is RQAFQKRA and X2 is FHT or FST.
  23. The method of claim 22, wherein the artificially designed amino acid sequence is RQAFQKRAFHT (SEQ ID NO: 72) or RQAFQKRAFST (SEQ ID NO: 73).
  24. The method of claim 15, wherein the artificially designed amino acid sequence comprises a formula as follows:
    X3-X2,
    wherein X3 comprises R(G)nKRA or R(G)nRRA, and X2 comprises YSS, FHT or FST, and wherein n=1, 2, 3, 4, 5 or 6.
  25. The method of claim 24, wherein X3 is R(G)nRRA and X2 is FHT or FST, and wherein n=2 or 3.
  26. The method of claim 25, wherein the artificially designed amino acid sequence is selected from the group consisting of RGGRRAFHT (SEQ ID NO: 78), RGGGRRAFHT (SEQ ID NO: 79), RGGRRAFST (SEQ ID NO: 81) or RGGGRRAFST (SEQ ID NO: 80).
  27. The method of any one of claims 1-26, wherein the two or more proteins are functionally related, preferably the two or more proteins are involved in the same biosynthetic pathway or metabolic pathway, participate in the same signaling pathway, or belong to the same complex biological system.
  28. The method of claim 27, wherein the complex biological system is a nitrogenase system.
  29. The method of claim 28, wherein the two or more proteins are selected from the group consisting of NifH, NifD, NifK, NifY, NifE, NifN, NifB, NifU, NifS, NifV, NifM, NifJ, NifF, NifW, NifZ, NifT, NifX and NifQ.
  30. The method of claim 29, wherein the fusion protein comprises NifH-cleav-NifD-cleav-NifK, NifE-cleav-NifN~NifB, NifJ-cleav-NifV-cleav-NifW, NifF-cleav-NifM-cleav-NifY, and optionally, NifU-cleav-NifS, wherein “cleav” is the linker comprising the MPP cleavable region, and wherein “~” represents a flexible peptide, preferably a “GGGGSGGGGSGGGGS” flexible peptide.
  31. The method of claim 27, wherein the biosynthetic pathway is a violacein biosynthetic pathway.
  32. The method of claim 31, wherein the two or more proteins are selected from the group consisting of VioA, VioB, VioC, VioD and VioE.
  33. The method of claim 32, wherein the fusion protein comprises VioA-cleav-VioB-cleav-VioE and VioD-cleav-VioC, or VioA-cleav-VioB-cleav-VioE-cleav-VioD-cleav-VioC, wherein “cleav” is the linker comprising the MPP cleavable region.
  34. The method of claim 27, wherein the biosynthetic pathway is an isobutanol biosynthetic pathway.
  35. The method of claim 34, wherein the two or more proteins are selected from the group consisting of ILV2, ILV3, ILV5, ARO10 and ADHA.
  36. The method of claim 35, wherein the fusion protein comprises:
    (a) ILV5-cleav-ILV3-cleav-ILV2 and ARO10-cleav-ADHA;
    (b) ILV3-cleav-ILV5-cleav-ILV2 and ARO10-cleav-ADHA; or
    (c) ILV3-cleav-ILV5-cleav-ILV2 and ADHA-cleav-ARO10;
    wherein “cleav” is the linker comprising the MPP cleavable region.
  37. An MPP cleavable sequence comprising a wild type amino acid sequence, a fragment or a variant thereof that is cleavable by MPP, or an artificially designed amino acid sequence that is cleavable by MPP.
  38. The MPP cleavable sequence of claim 37, wherein the wild type amino acid sequence is a wild type mitochondrial signal peptide which is responsible for targeting a mitochondrial endogenous protein into mitochondria.
  39. The MPP cleavable sequence of claim 38, wherein the mitochondrial signal peptide is derived from a fungal cell, an algae cell, an animal cell or a plant cell.
  40. The MPP cleavable sequence of claim 39, wherein the mitochondrial signal peptide is derived from a mitochondrial endogenous protein selected from the group consisting of: the subunit 9 of the F0 ATPase of Neurospora crassa, COX4 of Saccharomyces cerevisiae, HSP60/YLR259c of Saccharomyces cerevisiae, SSC1/YJR045c of Saccharomyces cerevisiae, CYB2/YML054C of Saccharomyces cerevisiae, mitochondrial ATP synthase of Nicotiana plumbaginifolia (encoded by atp2-1 gene), mitochondrial import receptor subunit TOM20 of Arabidopsis thaliana and 2-oxoglutarate dehydrogenase subunit E1 of Arabidopsis thaliana.
  41. The MPP cleavable sequence of claim 40, wherein the MPP cleavable sequence is derived from the subunit 9 of the F0 ATPase of Neurospora crassa.
  42. The MPP cleavable sequence of claim 41, comprising the amino acid sequence of any one of SEQ ID NOs: 67-81.
  43. The MPP cleavable sequence of claim 37, wherein the artificially designed amino acid sequence comprises a formula as follows:
    X1-X2,
    wherein X1 comprises RQAFQKRA or RQAFQRRA, and X2 comprises YSS, FHT or FST.
  44. The MPP cleavable sequence of claim 43, wherein X1 is RQAFQKRA and X2 is FHT or FST.
  45. The MPP cleavable sequence of claim 44, wherein the artificially designed amino acid sequence is RQAFQKRAFHT (SEQ ID NO: 72) or RQAFQKRAFST (SEQ ID NO: 73).
  46. The MPP cleavable sequence of claim 37, wherein the artificially designed amino acid sequence comprises a formula as follows:
    X3-X2,
    wherein X3 comprises R(G)nKRA or R(G)nRRA, and X2 comprises YSS, FHT or FST, and wherein n=1, 2, 3, 4, 5 or 6.
  47. The MPP cleavable sequence of claim 46, wherein X3 is R(G)nRRA and X2 is FHT or FST, and wherein n=2 or 3.
  48. The MPP cleavable sequence of claim 47, wherein the artificially designed amino acid sequence is selected from the group consisting of RGGRRAFHT (SEQ ID NO: 78), RGGGRRAFHT (SEQ ID NO: 79), RGGRRAFST (SEQ ID NO: 81) or RGGGRRAFST (SEQ ID NO: 80).
  49. A fusion protein comprising two or more proteins to be co-expressed in a cell or a cell-free system, wherein at least two adjacent proteins in the fusion protein are linked by a linker comprising an MPP cleavable region, wherein the linkers or the MPP cleavable regions between adjacent proteins are the same or different.
  50. The fusion protein of claim 49, wherein the linker or the MPP cleavable region comprises one or more of the MPP cleavable sequences of any one of claims 37-48, wherein the MPP cleavable sequences within the same linker or MPP cleavable region are the same or different.
  51. The fusion protein of claim 49 or 50, wherein the two or more proteins are functionally related, preferably the two or more proteins are involved in the same biosynthetic pathway or metabolic pathway, participate in the same signaling pathway, or belong to the same complex biological system.
  52. The fusion protein of claim 51, wherein the complex biological system is a nitrogenase system.
  53. The fusion protein of claim 52, wherein the two or more proteins are selected from the group consisting of NifH, NifD, NifK, NifY, NifE, NifN, NifB, NifU, NifS, NifV, NifM, NifJ, NifF, NifW, NifZ, NifT, NifX and NifQ.
  54. The fusion protein of claim 53, comprising NifH-cleav-NifD-cleav-NifK, NifE-cleav-NifN~NifB, NifJ-cleav-NifV-cleav-NifW, NifF-cleav-NifM-cleav-NifY, and optionally, NifU-cleav-NifS, wherein “cleav” is the linker comprising the MPP cleavable region, and wherein “~” represents a flexible peptide, preferably a “GGGGSGGGGSGGGGS” flexible peptide.
  55. The fusion protein of claim 51, wherein the biosynthetic pathway is a violacein biosynthetic pathway.
  56. The fusion protein of claim 55, wherein the two or more proteins are selected from the group consisting of VioA, VioB, VioC, VioD and VioE.
  57. The fusion protein of claim 56, comprising VioA-cleav-VioB-cleav-VioE and VioD-cleav-VioC, or VioA-cleav-VioB-cleav-VioE-cleav-VioD-cleav-VioC, wherein “cleav” is the linker comprising the MPP cleavable region.
  58. The fusion protein of claim 51, wherein the biosynthetic pathway is an isobutanol biosynthetic pathway.
  59. The fusion protein of claim 58, wherein the two or more proteins are selected from the group consisting of ILV2, ILV3, ILV5, ARO10 and ADHA.
  60. The fusion protein of claim 59, comprising:
    (a) ILV5-cleav-ILV3-cleav-ILV2 and ARO10-cleav-ADHA;
    (b) ILV3-cleav-ILV5-cleav-ILV2 and ARO10-cleav-ADHA; or
    (c) ILV3-cleav-ILV5-cleav-ILV2 and ADHA-cleav-ARO10;
    wherein “cleav” is the linker comprising the MPP cleavable region.
  61. A nucleic acid encoding the MPP cleavable sequence of any one of claims 37-48 or the fusion protein of any one of claims 49-60.
  62. The nucleic acid of claim 61, wherein the nucleic acid is a linear DNA fragment or an mRNA fragment.
  63. The nucleic acid of claim 62, wherein the linear DNA fragment is free in the cytoplasm of a host cell or can be integrated into the genome of the host cell.
  64. A vector comprising one or more of the nucleic acids of any one of claims 61-63.
  65. The vector of claim 64, wherein the vector is free in the cytoplasm of a host cell or can be integrated into the genome of the host cell.
  66. The vector of claim 65, wherein the vector is a DNA plasmid vector, a viral vector, a bacterial vector, a cosmid, or an artificial chromosome.
  67. A cell comprising the MPP cleavable sequence of any one of claims 37-48, the fusion protein of any one of claims 49-60, the nucleic acid of any one of claims 61-63 or the vector of any one of claims 64-66, wherein the cell is a prokaryotic cell or a eukaryotic cell.
  68. The cell of claim 67, wherein the prokaryotic cell is a cell of genera Acetobacter, Azotobacter, Bacillus, Enterobacter, Escherichia, Klebsiella, Salmonella or Pseudomonas, such as a cell of Escherichia coli, Azotobacter vinelandii, Klebsiella oxytoca, Klebsiella pneumoniae, Pseudomonas fluorescens, Bacillus subtilis, Pseudomonas protegens, Pseudomonas putida, Pseudomonas veronii, Pseudomonas taetrolens, Pseudomonas balearica, Pseudomonas stutzeri, Pseudomonas aeruginosa, Pseudomonas syringae, Bacillus amyloliquefaciens, Burkholderia phytofirmans, Gluconacetobacter diazotrophicus, Herbaspirillum seropedicae or Bacillus cereus.
  69. The cell of claim 67, wherein the eukaryotic cell is selected from the group consisting of a fungal cell, an algae cell, an animal cell or a plant cell.
  70. The cell of claim 69, wherein the fungal cell is a yeast cell, wherein the yeast is selected from the group consisting of Saccharomyces, Aspergillus, Kluyveromyces, Schizosaccharomyces, Rhizopus, Candida, Yarrowia and Pichia, such as Saccharomyces cerevisiae, Schizosaccharomyces pombe, Kluyveromyces lactis, Kluyveromyces marxianus, Aspergillus terreus, Aspergillus niger, Rhizopus arrhizus, Rhizopus oryzae, Candida albicans, Candida boidinii, Candida sonorensis, Candida tropicalis, Yarrowia lipolytica or Pichia pastoris.
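The X1-X2 and X3-X2 formulas recited in the claims for artificially designed MPP cleavable sequences can be enumerated mechanically. The Python sketch below is illustrative only and is not part of the claims. Its function names are hypothetical, and it assumes the narrowest reading, in which X1, X2 and X3 consist of exactly the recited residues. The claims use "comprises", so the claimed sequences may contain additional residues that this sketch does not model.

```python
from itertools import product

# Building blocks recited in the claims (one-letter amino acid codes).
X1_OPTIONS = ("RQAFQKRA", "RQAFQRRA")
X2_OPTIONS = ("YSS", "FHT", "FST")
X3_TAILS = ("KRA", "RRA")   # X3 is R(G)nKRA or R(G)nRRA
N_VALUES = range(1, 7)      # n = 1, 2, 3, 4, 5 or 6


def x1_x2_sequences() -> list[str]:
    """All X1-X2 concatenations under the narrowest reading."""
    return [x1 + x2 for x1, x2 in product(X1_OPTIONS, X2_OPTIONS)]


def x3_options() -> list[str]:
    """All X3 blocks: 'R', then n glycines, then 'KRA' or 'RRA'."""
    return ["R" + "G" * n + tail for n in N_VALUES for tail in X3_TAILS]


def x3_x2_sequences() -> list[str]:
    """All X3-X2 concatenations under the narrowest reading."""
    return [x3 + x2 for x3, x2 in product(x3_options(), X2_OPTIONS)]


if __name__ == "__main__":
    for seq in x1_x2_sequences() + x3_x2_sequences():
        print(seq)
```

Under this reading the X1-X2 formula gives 6 sequences and the X3-X2 formula gives 36. Both lists include the specific sequences named in the claims, for example RQAFQKRAFHT (SEQ ID NO: 72) and RGGRRAFHT (SEQ ID NO: 78).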
PCT/CN2023/101377 2023-06-20 2023-06-20 Method for co-expressing proteins Pending WO2024259585A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2023/101377 WO2024259585A1 (en) 2023-06-20 2023-06-20 Method for co-expressing proteins

Publications (1)

Publication Number Publication Date
WO2024259585A1 true WO2024259585A1 (en) 2024-12-26

Family

ID=93934635

Country Status (1)

Country Link
WO (1) WO2024259585A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2000011175A1 (en) * 1998-08-18 2000-03-02 Syngenta Limited Genetic method for the expression of polyproteins in plants
WO2011110713A1 (en) * 2010-03-09 2011-09-15 Consejo Superior De Investigaciones Científicas (Csic) Vector for the coexpression of a plurality of heterologous proteins in equimolar amounts
WO2018141030A1 (en) * 2017-02-06 2018-08-09 Commonwealth Scientific And Industrial Research Organisation Expression of nitrogenase polypeptides in plant cells
US20210230607A1 (en) * 2018-05-11 2021-07-29 Peking University Method for reconstructing complex biological system on the basis of polyprotein, and use thereof in high activity super simplified nitrogen fixation system construction
CN113755459A (en) * 2020-06-05 2021-12-07 北京大学 Azotoxin variants

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23941892

Country of ref document: EP

Kind code of ref document: A1