
WO2025163019A1 - Tags for improved expression of recombinant proteins

Tags for improved expression of recombinant proteins

Info

Publication number
WO2025163019A1
WO2025163019A1 PCT/EP2025/052302 EP2025052302W WO2025163019A1 WO 2025163019 A1 WO2025163019 A1 WO 2025163019A1 EP 2025052302 W EP2025052302 W EP 2025052302W WO 2025163019 A1 WO2025163019 A1 WO 2025163019A1
Authority
WO
WIPO (PCT)
Prior art keywords
seq
poi
peptide
tag
acid sequence
Prior art date
Legal status
Pending
Application number
PCT/EP2025/052302
Other languages
English (en)
Inventor
Monika CSERJAN
Christoph KOEPPL
Christina KROESS
Rainer Schneider
Bernhard Sprenger
Gerald Striedner
Current Assignee
Boehringer Ingelheim RCV GmbH and Co KG
Original Assignee
Boehringer Ingelheim RCV GmbH and Co KG
Priority date
Filing date
Publication date
Application filed by Boehringer Ingelheim RCV GmbH and Co KG
Publication of WO2025163019A1

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/11 DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C12N15/62 DNA sequences coding for fusion proteins
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12P FERMENTATION OR ENZYME-USING PROCESSES TO SYNTHESISE A DESIRED CHEMICAL COMPOUND OR COMPOSITION OR TO SEPARATE OPTICAL ISOMERS FROM A RACEMIC MIXTURE
    • C12P21/00 Preparation of peptides or proteins
    • C12P21/02 Preparation of peptides or proteins having a known sequence of two or more amino acids, e.g. glutathione
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K2319/00 Fusion polypeptide
    • C07K2319/20 Fusion polypeptide containing a tag with affinity for a non-protein ligand
    • C07K2319/21 Fusion polypeptide containing a tag with affinity for a non-protein ligand containing a His-tag
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K2319/00 Fusion polypeptide
    • C07K2319/35 Fusion polypeptide containing a fusion for enhanced stability/folding during expression, e.g. fusions with chaperones or thioredoxin

Definitions

  • the present invention is in the field of recombinant biotechnology, in particular in the field of protein expression.
  • the invention generally relates to peptide tags capable of increasing the yield and/or titer and/or solubility of at least one protein of interest in a host cell, in particular difficult-to-express recombinant proteins. More specifically, the invention relates to fusion proteins comprising such peptide tags, to nucleotide sequences encoding such peptide tags, and to host cells, expression vectors and expression cassettes comprising such nucleotide sequences.
  • the invention further relates to a method of producing at least one protein of interest in a recombinant host cell using said peptide tags.
  • Escherichia coli is one of the most commonly used host organisms for the production of biopharmaceuticals, as it allows for cost-efficient and fast recombinant protein expression (McElawain et al. 2022, Baeshen et al. 2015).
  • While high yields of several proteins can often be achieved, the production efficiency of numerous proteins is still hampered, resulting in reduced production output and non-profitable manufacturing conditions.
  • POI: protein of interest
  • the initial coding nucleotide sequence has been proposed to contribute to translation efficiency and fidelity. While it was initially believed that the base-pair complementarity of the downstream box (DB) with the 16S ribosomal RNA stabilizes the interaction of the target transcript with the ribosome, it was later shown that base pairing between the DB and the 16S ribosomal RNA is dispensable for the DB-associated enhancer activity (Moll et al. 2001; O’Connor et al. 1999; Sprengart et al. 1996). As the precise mechanism of DB function remains elusive, the a priori design of effective DB sequences for a given gene is still unattainable. Moreover, the effects of the DB are target-gene dependent, necessitating empirical POI-specific DB optimization (Richter et al., 2018).
  • the present invention provides new tools to enhance the protein yield, solubility and/or titer of recombinant proteins in host cells that are also suitable for industrial production.
  • novel peptide tags for the production of recombinant proteins; the invention includes recombinant proteins containing said peptide tags and methods for the production and expression thereof, including expression vectors.
  • Recombinant protein production, while of the highest importance for the biopharmaceutical industry, is still hampered by several limitations. Among other features, translation efficiency has been shown to be a key aspect of protein production efficiency. Despite the importance of recombinant protein production, currently available expression systems and tools still suffer from low translation efficiency and expression titers, resulting in inefficient recombinant protein production conditions. It is the objective of the present invention to overcome current limitations in protein biosynthesis by introducing a novel peptide tag for the high-titer expression of proteins.
  • a peptide tag for expression of a recombinant protein of interest (POI) in a host cell comprising a first peptide sequence consisting of an amino acid sequence selected from SEQ ID NO:1 or SEQ ID NO:2.
  • the peptide tag comprises a second peptide sequence comprising an amino acid sequence selected from SEQ ID NO:5, SEQ ID NO:10 or SEQ ID NO:21, preferably wherein said second peptide sequence is C-terminal of the first peptide sequence.
  • the second peptide sequence comprises an amino acid sequence selected from SEQ ID NO:10 and SEQ ID NO:21.
  • the peptide tag comprises a second peptide sequence selected from SEQ ID NO:107 or SEQ ID NO:140, preferably wherein said second peptide sequence is C-terminal of the first peptide sequence.
  • the peptide tag comprises at least two first peptide sequences, selected from the group consisting of SEQ ID NO:1 or SEQ ID NO:2, preferably wherein said at least two first peptide sequences are directly linked.
  • the first peptide sequence of the peptide tag is directly linked to the second peptide sequence.
  • the first peptide sequence is directly linked to the N-terminus of the second peptide sequence.
  • the peptide tag further comprises one or more linker sequences.
  • the peptide tag described herein may further comprise a solubility and/or expression enhancement tag, a monitoring tag and/or an affinity tag.
  • the peptide tag further comprises a protease recognition and/or cleavage site, preferably at its C-terminus.
  • a host cell comprising the peptide tag described herein, wherein said host cell is a eukaryotic or prokaryotic host cell, preferably a yeast cell, a mammalian cell or a bacterial cell.
  • a peptide tag for expression of a recombinant protein of interest (POI) in a host cell comprising an amino acid sequence of SEQ ID NO:84, wherein a. X at position 2 is L or H; b. X at position 3 is V or S; c. X at position 16 is E or Q; and d. X at position 18 is E or Q.
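  • The four variable positions of SEQ ID NO:84 each admit two residues, so the definition covers 2 × 2 × 2 × 2 = 16 concrete sequences. The following minimal Python sketch enumerates them from a template. The template string is a hypothetical 18-residue placeholder (the actual SEQ ID NO:84 is given in Figure 1 and is not reproduced here), and the function name is an assumption made for illustration.

```python
from itertools import product

# Allowed residues at the variable positions (1-based), as listed in the text.
ALLOWED = {2: "LH", 3: "VS", 16: "EQ", 18: "EQ"}


def enumerate_variants(template: str, allowed: dict) -> list:
    """Expand a template whose variable positions are marked 'X'."""
    for pos in allowed:
        if template[pos - 1] != "X":
            raise ValueError(f"position {pos} is not marked X in the template")
    positions = sorted(allowed)
    variants = []
    # One choice per variable position; the last position varies fastest.
    for choice in product(*(allowed[p] for p in positions)):
        residues = list(template)
        for pos, aa in zip(positions, choice):
            residues[pos - 1] = aa
        variants.append("".join(residues))
    return variants


# Hypothetical placeholder template; NOT the actual SEQ ID NO:84.
PLACEHOLDER = "AXX" + "A" * 12 + "XAX"
variants = enumerate_variants(PLACEHOLDER, ALLOWED)
print(len(variants))
```

  • Replacing the placeholder with the real sequence from the sequence listing would yield the concrete tag variants covered by this definition.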
  • a peptide tag comprising an amino acid sequence selected from the group consisting of SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:22, SEQ ID NO:23, SEQ ID NO:32, SEQ ID NO:35, SEQ ID NO:42 and SEQ ID NO:47, or a functionally active variant thereof.
  • such peptides and functionally active variants thereof, when used as an expression tag, facilitate the expression of a POI.
  • a peptide tag comprising an amino acid sequence selected from the group consisting of SEQ ID NO:150 and SEQ ID NO:155, or a functionally active variant thereof.
  • such peptides and functionally active variants thereof, when used as an expression tag, facilitate the expression of a POI.
  • a peptide tag comprising an amino acid sequence selected from the group consisting of SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:22, SEQ ID NO:23, SEQ ID NO:32, SEQ ID NO:35, SEQ ID NO:42 and SEQ ID NO:47, or a functionally active variant thereof comprising an amino acid sequence having at least 95% sequence identity to SEQ ID NO:10, SEQ ID NO:21, SEQ ID NO:85, SEQ ID NO:86, SEQ ID NO:87 or SEQ ID NO:88 and comprising SEQ ID NO:1 or SEQ ID NO:2, preferably at its N-terminus.
  • a peptide tag comprising an amino acid sequence selected from the group consisting of SEQ ID NO:150 and SEQ ID NO:155, or a functionally active variant thereof comprising an amino acid sequence having at least 95% sequence identity to SEQ ID NO:160 and comprising SEQ ID NO:1 or SEQ ID NO:2, preferably at its N-terminus.
  • a functionally active variant of a peptide tag comprising an amino acid sequence selected from the group consisting of SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:22, SEQ ID NO:23, SEQ ID NO:32, SEQ ID NO:35, SEQ ID NO:42 and SEQ ID NO:47, said functionally active variant comprising SEQ ID NO:1 or SEQ ID NO:2 and having at least 95% sequence identity to SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:22, SEQ ID NO:23, SEQ ID NO:32, SEQ ID NO:35, SEQ ID NO:42 or SEQ ID NO:47.
  • a functionally active variant of a peptide tag comprising an amino acid sequence selected from the group consisting of SEQ ID NO:150 and SEQ ID NO:155, said functionally active variant comprising SEQ ID NO:150 or SEQ ID NO:155.
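  • The two structural conditions placed on a variant above (a minimum of 95% sequence identity to a reference, plus presence of the first peptide sequence) can be illustrated with a short Python sketch. It assumes the sequences are already aligned to equal length and counts identical positions without gaps; how identity is actually to be determined is not stated in this passage, so that choice is an assumption. All sequences and function names are hypothetical placeholders. Passing this check says nothing about functional activity, which would have to be shown experimentally.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of identical positions between two equal-length, pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    if not a:
        raise ValueError("sequences must not be empty")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def meets_structural_criteria(candidate: str, reference: str,
                              required_motif: str, threshold: float = 95.0) -> bool:
    """True if the candidate contains the motif and reaches the identity threshold."""
    return (required_motif in candidate
            and percent_identity(candidate, reference) >= threshold)


# Hypothetical 40-residue placeholders; not sequences from the sequence listing.
REFERENCE = "MKT" + "A" * 37
TWO_SUBSTITUTIONS = "MKT" + "A" * 35 + "GG"
THREE_SUBSTITUTIONS = "MKT" + "A" * 34 + "GGG"

print(percent_identity(TWO_SUBSTITUTIONS, REFERENCE))
print(meets_structural_criteria(TWO_SUBSTITUTIONS, REFERENCE, "MKT"))
print(meets_structural_criteria(THREE_SUBSTITUTIONS, REFERENCE, "MKT"))
```

  • For a 40-residue reference, at most two substitutions keep the identity at or above 95%; a third substitution drops it to 92.5%.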
  • the peptide tag described herein comprises SEQ ID NO:1 or SEQ ID NO:2 directly linked to the N-terminus of an amino acid sequence selected from the group consisting of SEQ ID NO:85, SEQ ID NO:86, SEQ ID NO:87 and SEQ ID NO:88.
  • the peptide tag described herein comprises SEQ ID NO:1 or SEQ ID NO:2 directly linked to the N-terminus of an amino acid sequence selected from the group consisting of SEQ ID NO:160 and SEQ ID NO:140.
  • nucleic acid sequences encoding a peptide tag of the invention are provided herein.
  • nucleic acid sequence encoding an amino acid sequence selected from the group consisting of SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:22, SEQ ID NO:23, SEQ ID NO:32, SEQ ID NO:35, SEQ ID NO:42 and SEQ ID NO:47, or a functionally active variant thereof comprising an amino acid sequence having at least 95% sequence identity to SEQ ID NO:10, SEQ ID NO:21, SEQ ID NO:85, SEQ ID NO:86, SEQ ID NO:87 or SEQ ID NO:88 and comprising SEQ ID NO:1 or SEQ ID NO:2, preferably at its N-terminus.
  • nucleic acid sequence encoding an amino acid sequence selected from the group consisting of SEQ ID NO:150 and SEQ ID NO:155, or a functionally active variant thereof comprising an amino acid sequence having at least 95% sequence identity to SEQ ID NO:160 and comprising SEQ ID NO:1 or SEQ ID NO:2, preferably at its N-terminus.
  • nucleic acid sequence encoding an amino acid sequence selected from the group consisting of SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:22, SEQ ID NO:23, SEQ ID NO:32, SEQ ID NO:35, SEQ ID NO:42 and SEQ ID NO:47, or a functionally active variant thereof comprising SEQ ID NO:1 or SEQ ID NO:2 and having at least 95% sequence identity to SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:22, SEQ ID NO:23, SEQ ID NO:32, SEQ ID NO:35, SEQ ID NO:42 or SEQ ID NO:47.
  • nucleic acid sequence encoding an amino acid sequence selected from the group consisting of SEQ ID NO:150 and SEQ ID NO:155, or a functionally active variant thereof comprising SEQ ID NO:1 or SEQ ID NO:2 and having at least 95% sequence identity to SEQ ID NO:150 or SEQ ID NO:155.
  • the peptide tag comprises the amino acid sequence SEQ ID NO:11 and is encoded by a nucleotide sequence selected from the group consisting of SEQ ID NO:13, SEQ ID NO:14, SEQ ID NO:15 and SEQ ID NO:16.
  • the peptide tag comprises the amino acid sequence SEQ ID NO:12 and is encoded by a nucleotide sequence selected from the group consisting of SEQ ID NO:17, SEQ ID NO:18, SEQ ID NO:19 and SEQ ID NO:20.
  • the peptide tag comprises the amino acid sequence SEQ ID NO:22 and is encoded by a nucleotide sequence selected from the group consisting of SEQ ID NO:24, SEQ ID NO:25, SEQ ID NO:26 and SEQ ID NO:27.
  • the peptide tag comprises the amino acid sequence SEQ ID NO:23 and is encoded by a nucleotide sequence selected from the group consisting of SEQ ID NO:28, SEQ ID NO:29, SEQ ID NO:30 and SEQ ID NO:31.
  • the peptide tag comprises the amino acid sequence SEQ ID NO:32 and is encoded by a nucleotide sequence selected from the group consisting of SEQ ID NO:33, SEQ ID NO:34, SEQ ID NO:38 and SEQ ID NO:39.
  • the peptide tag comprises the amino acid sequence SEQ ID NO:35 and is encoded by a nucleotide sequence selected from the group consisting of SEQ ID NO:36, SEQ ID NO:37, SEQ ID NO:40 and SEQ ID NO:41.
  • the peptide tag comprises the amino acid sequence SEQ ID NO:42 and is encoded by a nucleotide sequence selected from the group consisting of SEQ ID NO:43, SEQ ID NO:44, SEQ ID NO:45 and SEQ ID NO:46.
  • the peptide tag comprises the amino acid sequence SEQ ID NO:47 and is encoded by a nucleotide sequence selected from the group consisting of SEQ ID NO:48, SEQ ID NO:49, SEQ ID NO:50 and SEQ ID NO:51.
  • the peptide tag comprises the amino acid sequence SEQ ID NO:150 and is encoded by a nucleotide sequence selected from the group consisting of SEQ ID NO:151, SEQ ID NO:152, SEQ ID NO:153 and SEQ ID NO:154.
  • the peptide tag comprises the amino acid sequence SEQ ID NO:155 and is encoded by a nucleotide sequence selected from the group consisting of SEQ ID NO:156, SEQ ID NO:157, SEQ ID NO:158 and SEQ ID NO:159.
  • an expression vector for expression of a recombinant POI as a fusion protein comprising any of the peptide tags of the invention fused to said POI, preferably fused directly to the N- or C-terminus of said POI, wherein said expression vector comprises a nucleic acid sequence encoding said peptide tag and a nucleic acid sequence encoding said POI.
  • the expression vector described herein comprises a nucleic acid sequence encoding any one of the peptide tags described herein, and said nucleic acid sequence is selected from the group consisting of SEQ ID NO:13-20, SEQ ID NO:24-31, SEQ ID NO:33, SEQ ID NO:34, SEQ ID NO:36-41, SEQ ID NO:43-46 and SEQ ID NO:48-51.
  • the expression vector described herein comprises a nucleic acid sequence encoding any one of the peptide tags described herein, and said nucleic acid sequence is selected from the group consisting of SEQ ID NO:151-154 and SEQ ID NO:156-159.
  • the expression vector described herein comprises: a. a promoter; b. a ribosome binding site (RBS); c. a start codon; d. the sequence encoding any one of the peptide tags described herein, preferably wherein said sequence encoding the peptide tag is placed immediately after the start codon; and e. the sequence encoding the POI, preferably wherein said sequence encoding the POI follows immediately after the 3’ end of the sequence encoding the peptide tag.
  • RBS ribosome binding site
  • an expression vector for expression of a recombinant protein of interest (POI) as a fusion protein comprising: a. a promoter; b. a ribosome binding site (RBS); c. a start codon; d. a nucleic acid sequence encoding any one of the peptide tags described herein, preferably wherein said sequence encoding the peptide tag is placed immediately after the start codon; and e. a cloning site, preferably placed immediately downstream of the sequence encoding the peptide tag.
  • an expression cassette for expression of a recombinant protein of interest (POI) as a fusion protein comprising the peptide tag of the invention fused to one terminus of said POI wherein said expression cassette comprises a nucleic acid sequence encoding said peptide tag of the invention and a nucleic acid sequence encoding said POI.
  • the expression cassette described herein comprises a nucleic acid sequence encoding the peptide tag of the invention, which nucleic acid sequence is selected from the group consisting of SEQ ID NO:13-20, SEQ ID NO:24-31, SEQ ID NO:33, SEQ ID NO:34, SEQ ID NO:36-41, SEQ ID NO:43-46 and SEQ ID NO:48-51.
  • the expression cassette described herein comprises a nucleic acid sequence encoding the peptide tag of the invention, which nucleic acid sequence is selected from the group consisting of SEQ ID NO:151-154 and SEQ ID NO:156-159.
  • the expression cassette described herein comprises: a. a promoter; b. a ribosome binding site (RBS); c. a start codon; d. a nucleic acid sequence encoding the peptide tag of the invention, preferably wherein said sequence encoding the peptide tag is placed immediately after the start codon; and e. a nucleic acid sequence encoding the POI, preferably wherein said sequence encoding the POI is placed immediately after the sequence encoding the peptide tag.
  • an expression cassette for expression of a recombinant protein of interest (POI) as a fusion protein comprising: a. a promoter; b. a ribosome binding site (RBS); c. a start codon; d. a nucleic acid sequence encoding the peptide tag of the invention, preferably wherein said sequence encoding the peptide tag is placed immediately after the start codon; and e. a cloning site, preferably placed immediately downstream of the sequence encoding the peptide tag.
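  • The element order described for the expression cassette (promoter, RBS, start codon, tag-coding sequence placed immediately after the start codon, then the POI-coding sequence) can be sketched as a small Python data structure. This is an illustration under stated assumptions: the class and field names are hypothetical, the start codon is taken to be ATG, the optional spacer between RBS and start codon is an added assumption, and every nucleotide string in the usage example is an arbitrary placeholder rather than a sequence from the sequence listing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExpressionCassette:
    """Hypothetical container for the cassette elements, in 5' to 3' order."""
    promoter: str
    rbs: str
    tag_cds: str             # tag coding sequence, without the start codon
    poi_cds: str             # POI coding sequence, including its stop codon
    spacer: str = ""         # assumed optional spacer between RBS and start codon
    start_codon: str = "ATG"

    def orf(self) -> str:
        """Start codon, then the tag, then the POI, all in one reading frame."""
        for name, seq in (("tag_cds", self.tag_cds), ("poi_cds", self.poi_cds)):
            if len(seq) % 3 != 0:
                raise ValueError(f"{name} length is not a multiple of 3")
        return self.start_codon + self.tag_cds + self.poi_cds

    def sequence(self) -> str:
        """Full cassette: promoter, RBS, spacer, then the open reading frame."""
        return self.promoter + self.rbs + self.spacer + self.orf()


# Arbitrary placeholder sequences for illustration only.
cassette = ExpressionCassette(
    promoter="GGGG",
    rbs="AGGAGG",
    tag_cds="GCTGCAGCG",
    poi_cds="AAAGAATAA",
)
print(cassette.orf())
print(cassette.sequence())
```

  • For the variant that carries a cloning site instead of a POI-coding sequence, the same layout would apply with the cloning site placed immediately downstream of the tag-coding sequence.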
  • an expression vector comprising the expression cassette described herein.
  • a recombinant protein to which any one of the peptide tags described herein is linked, preferably at the N-terminus.
  • a fusion protein comprising a recombinant POI and the peptide tag of the invention, wherein the peptide tag of the invention is fused to the recombinant POI, preferably at the N-terminus of the POI.
  • nucleic acid sequences encoding a fusion protein described herein comprise a start codon.
  • said start codon encodes a methionine.
  • a method of producing a recombinant protein of interest comprising expressing the POI in the form of a fusion protein comprising any one of the peptide tags described herein fused to one terminus of said POI, by a bacterial, mammalian or a yeast expression system.
  • the bacterial expression system is an E. coli expression system, for example using a T7 promoter or an A1 promoter or a functionally active variant thereof.
  • the yeast expression system is an expression system using yeast cells of the genus Komagataella or Saccharomyces.
  • the mammalian expression system is an expression system using CHO cells.
  • a method of producing a recombinant protein of interest comprising expressing the POI in the form of a fusion protein described herein, wherein said fusion protein is expressed from an expression vector transiently inserted into a host cell or from an expression cassette stably integrated into the genome of a host cell.
  • a method of producing a recombinant protein of interest comprising expressing the POI in the form of a fusion protein described herein, wherein said fusion protein is expressed from an expression vector, preferably a plasmid, within a host cell or from an expression cassette stably integrated into the genome or chromosome of a host cell.
  • the method of producing a recombinant POI described herein comprises culturing the host cell for a period of time under conditions permitting expression of the POI in the form of a fusion protein and comprising any one or more of the steps of: a. recovering the fusion protein and/or POI; b. purifying the fusion protein and/or POI; c. removing the peptide tag; d. further purifying the fusion protein and/or POI; e. formulating the fusion protein and/or POI; f. modifying the fusion protein and/or POI.
  • a host cell comprising the peptide tag of the invention and/or a nucleic acid sequence encoding said peptide tag.
  • a host cell comprising a fusion protein comprising the peptide tag of the invention and a protein of interest (POI) and/or a nucleic acid sequence encoding said fusion protein.
  • a host cell comprising the expression vector described herein and/or the expression cassette described herein.
  • Figure 1 Amino acid and nucleotide sequences referred to herein.
  • Figure 2 Schematic overview of tested constructs. N-terminal variants of the tags and their minimal free folding energies of nucleotides -4 to +37.
  • Figure 3 Recombinant protein production kinetics for the PTH fusion proteins (A and B) and the hFGF2 fusion proteins (D and E) and biomass formation for the PTH fusion proteins (C) and the hFGF2 fusion proteins (F) over the time course of the fermentations.
  • Figure 4 Direct comparison of specific end-of-fermentation titres for all tested constructs. For amino acid sequences and nucleotide sequences of the fusion constructs see Table 4.
  • Figure 6 Soluble product formation kinetics for TNFa fusion proteins: Volumetric titer [g/L] of fermentations of TNFa fusion proteins comprising LED, PLV or PHS in the T7AC-tag.
  • Figure 7 Soluble product formation kinetics for hFGF-2 fusion proteins: Volumetric titer [g/L] of fermentations of hFGF-2 fusion proteins comprising LED, PLV or PHS N-terminal to the PERNKERKE-tag.
  • Figure 8 Soluble product formation kinetics for PTH fusion proteins: Volumetric titer [g/L] of fermentations of PTH fusion proteins comprising LED, PLV or PHS N-terminal to the PERNKERKE-tag.
  • a primary objective of the present invention is to enhance expression, solubility and/or proper folding of recombinant proteins of interest expressed in host cells. Accordingly, the present invention enables the production of biologically active non-prokaryotic and prokaryotic proteins within host cells in sufficient quantities, e.g. for industrial and/or medical applications.
  • prokaryotic cells, e.g. E. coli, represent an important example of host cells; however, low titres, solubility problems and the formation of inclusion bodies are well-known issues faced in eukaryotic host cells as well.
  • the present invention is based on the surprising finding that a peptide tag comprising the amino acid sequence SEQ ID NO:1 or SEQ ID NO:2 significantly enhances expression of target proteins fused to such a peptide tag.
  • the present invention thus provides means to efficiently improve existing strategies of recombinant protein production and to obtain a target protein at a high yield, such that active protein required for biotechnology and pharmaceuticals may be supplied in mass-production.
  • a peptide tag comprising a first peptide sequence consisting of an amino acid sequence selected from SEQ ID NO:1 or SEQ ID NO:2, which peptide tag is useful for expression of a recombinant protein of interest (POI) in a host cell.
  • the peptide tag described herein increases expression of at least one POI.
  • nucleic acid sequences encoding the peptide tag of the invention may comprise or consist of SEQ ID NO:3 or SEQ ID NO:4.
  • the peptide tag of the invention may be longer than the first peptide sequence of SEQ ID NO:1 or SEQ ID NO:2.
  • the peptide tag comprises at least about 5, 10, 15, 20, 25, 30, 35, 40, 45 or 50 amino acids.
  • the peptide tag comprises at least 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, or 50 or more amino acids.
  • additional amino acids include, but are not limited to, tag elements, linkers, protease recognition sites and/or protease cleavage sites as described herein.
  • the peptide tag comprises an amino acid elongation C-terminal to the first peptide sequence, so that the first peptide sequence is located at the N-terminus of the peptide tag, preferably only preceded by a starting methionine.
  • the peptide tag comprises additional amino acids at the C-terminus of the first peptide sequence.
  • the peptide tag comprises at least about 5, 10, 15, 20, 25, 30, 35, 40, 45 or 50 amino acids in addition to the first peptide sequence.
  • the peptide tag comprises at least 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, or 50 or more amino acids in addition to the first peptide sequence.
  • the peptide tag may comprise a second peptide sequence comprising SEQ ID NO:5, SEQ ID NO:10 or SEQ ID NO:21, or functionally active variants thereof.
  • the peptide tag may comprise a second peptide sequence comprising SEQ ID NO:107 or functionally active variants thereof.
  • first peptide sequence and the second peptide sequence are directly linked or they are connected via a linker as further described herein.
  • first peptide sequence and the second peptide sequence are directly linked.
  • the first peptide sequence is N-terminal of the second peptide sequence.
  • the peptide tag of the invention comprises a first peptide sequence of SEQ ID NO:1 directly linked to a second peptide sequence of SEQ ID NO:10.
  • the peptide tag of the invention thus may comprise or consist of the amino acid sequence of SEQ ID NO:11.
  • the peptide tag of the invention comprises a first peptide sequence of SEQ ID NO:1 directly linked to a second peptide sequence of SEQ ID NO:21.
  • the peptide tag of the invention thus may comprise or consist of the amino acid sequence of SEQ ID NO:22.
  • the peptide tag of the invention comprises a first peptide sequence of SEQ ID NO:2 directly linked to a second peptide sequence of SEQ ID NO:10.
  • the peptide tag of the invention thus may comprise or consist of the amino acid sequence of SEQ ID NO:12.
  • the peptide tag of the invention comprises a first peptide sequence of SEQ ID NO:2 directly linked to a second peptide sequence of SEQ ID NO:21.
  • the peptide tag of the invention thus may comprise or consist of the amino acid sequence of SEQ ID NO:23.
  • the peptide tag of the invention comprises a first peptide sequence of SEQ ID NO:1 directly linked to a second peptide sequence of SEQ ID NO:107.
  • the peptide tag of the invention thus may comprise or consist of the amino acid sequence of SEQ ID NO:111.
  • the peptide tag of the invention comprises a first peptide sequence of SEQ ID NO:2 directly linked to a second peptide sequence of SEQ ID NO:107.
  • the peptide tag of the invention thus may comprise or consist of the amino acid sequence of SEQ ID NO:113.
  • the peptide tag of the invention comprises a first peptide sequence of SEQ ID NO:1 directly linked to a second peptide sequence of SEQ ID NO:140.
  • the peptide tag of the invention comprises a first peptide sequence of SEQ ID NO:2 directly linked to a second peptide sequence of SEQ ID NO:140.
  • the nucleic acid sequence encoding the peptide tag described herein may comprise one or more silent mutation(s), preferably including the introduction of rare codons.
  • the nucleic acid sequence encoding the peptide tag described herein comprises at least 1, 2, 3, 4, 5, 6, 7, or 8 codons that have been modified to introduce silent mutations. Specifically, such modifications may be achieved by single or multiple nucleotide substitutions, deletions and/or insertions.
  • the nucleic acid sequence encoding the peptide tag described herein comprises at least 1 rare codon.
  • more than one of the silent mutations result in rare codons.
  • Rare codons typically are codons with a codon usage of less than 1% in the host cell, i.e. less than 10 in 1000 codons.
  • the nucleic acid sequence encoding the peptide tag described herein comprises at least 2, 3, 4, or 5 rare codons.
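  • The two notions used above, a silent mutation (a codon change that leaves the encoded amino acid unchanged) and a rare codon (usage below 10 per 1000 codons in the host), can be checked with a short Python sketch. It assumes the standard genetic code and handles codon substitutions only, not insertions or deletions. The usage values are invented for illustration and are not measured frequencies for any host; the function names and example sequences are likewise hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def codons(seq: str) -> list:
    """Split an in-frame nucleotide sequence into codons."""
    if len(seq) % 3 != 0:
        raise ValueError("sequence length is not a multiple of 3")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def translate(seq: str) -> str:
    return "".join(CODON_TABLE[c] for c in codons(seq))


def silent_changes(original: str, mutated: str) -> list:
    """Return 1-based indices of changed codons; fail if the protein changes."""
    if translate(original) != translate(mutated):
        raise ValueError("mutation is not silent: encoded protein differs")
    pairs = zip(codons(original), codons(mutated))
    return [i for i, (a, b) in enumerate(pairs, start=1) if a != b]


def rare_codons(seq: str, usage_per_thousand: dict, cutoff: float = 10.0) -> list:
    """Codons of seq whose usage is below the cutoff (per 1000 codons)."""
    return [c for c in codons(seq) if usage_per_thousand[c] < cutoff]


# Invented usage values for illustration only; not real host frequencies.
HYPOTHETICAL_USAGE = {"CTG": 50.0, "CTA": 4.0, "CGT": 20.0, "AGA": 3.0}

original = "CTGCGT"   # encodes Leu-Arg
mutated = "CTAAGA"    # also encodes Leu-Arg, using different codons
print(silent_changes(original, mutated))
print(rare_codons(mutated, HYPOTHETICAL_USAGE))
```

  • With a real codon usage table for the chosen host in place of the invented values, the same check would report how many rare codons a given tag-coding sequence contains.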
  • the peptide tag described herein comprises or consists of SEQ ID NO:11 and is encoded by a nucleic acid sequence comprising SEQ ID NO:13 or by a nucleic acid sequence comprising SEQ ID NO:13 comprising at least 1, 2, 3, 4 or 5 silent mutations.
  • examples of such nucleic acid sequences include, but are not limited to, SEQ ID NO:14, SEQ ID NO:15 and SEQ ID NO:16.
  • the peptide tag described herein comprises or consists of SEQ ID NO:12 and is encoded by a nucleic acid sequence comprising SEQ ID NO:17 or by a nucleic acid sequence comprising SEQ ID NO:17 comprising at least 1, 2, 3, 4 or 5 silent mutations.
  • examples of such nucleic acid sequences include, but are not limited to, SEQ ID NO:18, SEQ ID NO:19 and SEQ ID NO:20.
  • the peptide tag described herein comprises or consists of SEQ ID NO:22 and is encoded by a nucleic acid sequence comprising SEQ ID NO:24 or by a nucleic acid sequence comprising SEQ ID NO:24 comprising at least 1, 2, 3, 4 or 5 silent mutations.
  • examples of such nucleic acid sequences include, but are not limited to, SEQ ID NO:25, SEQ ID NO:26 and SEQ ID NO:27.
  • the peptide tag described herein comprises or consists of SEQ ID NO:23 and is encoded by a nucleic acid sequence comprising SEQ ID NO:28 or by a nucleic acid sequence comprising SEQ ID NO:28 comprising at least 1, 2, 3, 4 or 5 silent mutations.
  • examples of such nucleic acid sequences include, but are not limited to, SEQ ID NO:29, SEQ ID NO:30 and SEQ ID NO:31.
  • the peptide tag described herein comprises or consists of SEQ ID NO:111 and is encoded by a nucleic acid sequence comprising SEQ ID NO:112 or by a nucleic acid sequence comprising SEQ ID NO:112 comprising at least 1, 2, 3, 4 or 5 silent mutations.
  • examples of such nucleic acid sequences include, but are not limited to, SEQ ID NO:144, SEQ ID NO:145 and SEQ ID NO:146.
  • the peptide tag described herein comprises or consists of SEQ ID NO:113 and is encoded by a nucleic acid sequence comprising SEQ ID NO:114 or by a nucleic acid sequence comprising SEQ ID NO:114 comprising at least 1, 2, 3, 4 or 5 silent mutations.
  • examples of such nucleic acid sequences include, but are not limited to, SEQ ID NO:147, SEQ ID NO:148 and SEQ ID NO:149.
  • the peptide tag described herein may comprise the first peptide sequence and/or the second peptide sequence more than once. Such multiples of the first and/or second peptide sequence may be adjacent to each other or may be separated by one or more linkers or tag elements.
  • the fusion protein described herein may comprise multiple parts, and may comprise duplicates of such parts.
  • the fusion protein herein may comprise a first part comprising a peptide tag as described herein, a second part comprising a protease recognition site, another first part comprising the same or different peptide tag, another second part comprising the same or different protease recognition site and a third part comprising a POI.
  • the peptide tag described herein comprises a starting methionine.
  • the peptide tag described herein comprises one or more linkers.
  • Such linker may be placed between any individual part or tag element of the peptide tag.
  • the peptide tag described herein comprises one or more further tag elements such as solubility enhancement tags, monitoring tags and/or affinity tags.
  • the peptide tag described herein may comprise one or more tag elements more than once.
  • the peptide tag described herein comprises at least one protease recognition site. Specifically, the peptide tag described herein comprises at least one protease cleavage site. Specifically, the peptide tag described herein comprises at least one protease recognition and cleavage site. Specifically, the peptide tag described herein comprises a protease cleavage site at its C-terminus.
  • the peptide tag described herein additionally comprises C-terminal to the first or second peptide sequence, and preferably in that order: a. an affinity tag, preferably a His tag, even more preferably a 6-His-tag; b. a linker, preferably GSG, and c. a caspase-2 recognition and cleavage site, preferably VDVAD.
  • the peptide tag described herein comprises or consists of SEQ ID NO:42.
  • the peptide tag of SEQ ID NO:42 is encoded by a nucleic acid sequence selected from the group consisting of SEQ ID NO:43, SEQ ID NO:44, SEQ ID NO:45 and SEQ ID NO:46.
  • the peptide tag described herein comprises or consists of SEQ ID NO:47.
  • the peptide tag of SEQ ID NO:47 is encoded by a nucleic acid sequence selected from the group consisting of SEQ ID NO:48, SEQ ID NO:49, SEQ ID NO:50 and SEQ ID NO:51.
  • the peptide tag described herein comprises or consists of SEQ ID NO:150.
  • the peptide tag of SEQ ID NO:150 is encoded by a nucleic acid sequence selected from the group consisting of SEQ ID NO:151, SEQ ID NO:152, SEQ ID NO:153 and SEQ ID NO:154.
  • the peptide tag described herein comprises or consists of SEQ ID NO:155.
  • the peptide tag of SEQ ID NO:155 is encoded by a nucleic acid sequence selected from the group consisting of SEQ ID NO:156, SEQ ID NO:157, SEQ ID NO:158 and SEQ ID NO:159.
  • the peptide tag described herein additionally comprises C-terminal to the first or second peptide sequence, and preferably in the following order, a. a first affinity tag, preferably a His tag, even more preferably a 6-His-tag; b. a linker, preferably an SA linker; c. a second affinity tag different from the first affinity tag, preferably a Strep tag, even more preferably a Strep II tag; d. a linker, preferably a GSG linker, and e. a caspase-2 recognition and cleavage site, preferably VDVAD.
  • the peptide tag described herein comprises or consists of SEQ ID NO:32.
  • the peptide tag of SEQ ID NO:32 is encoded by a nucleic acid sequence selected from the group consisting of SEQ ID NO:33, SEQ ID NO:34, SEQ ID NO:38 and SEQ ID NO:39.
  • the peptide tag described herein comprises or consists of SEQ ID NO:35.
  • the peptide tag of SEQ ID NO:35 is encoded by a nucleic acid sequence selected from the group consisting of SEQ ID NO:36, SEQ ID NO:37, SEQ ID NO:40 and SEQ ID NO:41.
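The two C-terminal element orders described above (His tag, GSG linker, caspase-2 site; or His tag, SA linker, Strep II tag, GSG linker, caspase-2 site) can be illustrated with a minimal Python sketch. The function names are hypothetical. The core sequence `MKT` is a placeholder chosen for illustration only and is not SEQ ID NO:1, SEQ ID NO:2 or any other sequence of the listing. The Strep II sequence WSHPQFEK is the commonly used one and is assumed here.

```python
# Tag elements named in the text, written in one-letter amino acid code.
HIS6 = "HHHHHH"          # 6-His affinity tag
STREP_II = "WSHPQFEK"    # Strep II affinity tag (commonly used sequence, assumed)
SA_LINKER = "SA"
GSG_LINKER = "GSG"
CASPASE2_SITE = "VDVAD"  # caspase-2 recognition and cleavage site


def assemble_tag(core: str, *elements: str) -> str:
    """Concatenate a core peptide sequence with C-terminal tag elements, in order."""
    parts = [core, *elements]
    if not all(p.isalpha() and p.isupper() for p in parts):
        raise ValueError("each part must be a non-empty upper-case one-letter sequence")
    return "".join(parts)


def single_affinity_tag(core: str) -> str:
    """Core, then His tag, GSG linker and caspase-2 site."""
    return assemble_tag(core, HIS6, GSG_LINKER, CASPASE2_SITE)


def dual_affinity_tag(core: str) -> str:
    """Core, then His tag, SA linker, Strep II tag, GSG linker and caspase-2 site."""
    return assemble_tag(core, HIS6, SA_LINKER, STREP_II, GSG_LINKER, CASPASE2_SITE)


# Hypothetical placeholder core; NOT a sequence from the sequence listing.
PLACEHOLDER_CORE = "MKT"

if __name__ == "__main__":
    print(single_affinity_tag(PLACEHOLDER_CORE))
    print(dual_affinity_tag(PLACEHOLDER_CORE))
```

In both layouts the caspase-2 site sits at the C-terminus of the tag, so that a POI fused downstream directly follows the cleavage site.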
  • a reference to “a host cell” or “a method” includes one or more of such host cells or methods, respectively, and a reference to “the method” includes equivalent steps and methods, known to those of ordinary skill in the art, that could be modified or substituted.
  • a reference to “methods” or “host cells” includes “a host cell” or “a method”, respectively.
  • “less than 20” means < 20 and “more than 20” means > 20.
  • nucleic acid refers to deoxyribonucleic acid (DNA, e.g. a cDNA or genomic DNA), ribonucleic acid (RNA, e.g. an mRNA), or a DNA or RNA analog and polymers thereof, in either single- or double-stranded form, but preferably is double-stranded DNA, made of monomers (nucleotides) containing a sugar, phosphate and a base that is either a purine or pyrimidine.
  • a polynucleotide refers to deoxyribonucleotides in a polymeric unbranched form of any length.
  • nucleotides consist of a pentose sugar (deoxyribose), a nitrogenous base (adenine, guanine, cytosine or thymine/uracil) and a phosphate group.
  • “polynucleotide(s)” and “nucleic acid sequence(s)” are used interchangeably herein. Unless otherwise indicated, a particular nucleic acid sequence also encompasses complementary sequences.
  • isolated refers to material that is removed from its original or native environment (e.g. the natural environment if it is naturally occurring).
  • a naturally-occurring nucleic acid molecule or polypeptide present in a living animal is not isolated, but the same nucleic acid molecule or polypeptide, separated by human intervention from some or all of the co-existing materials in the natural system, is isolated.
  • nucleic acid molecules could be part of a vector and/or such nucleic acid molecules or polypeptides could be part of a composition, and still be isolated in that such vector or composition is not part of the environment in which the nucleic acid molecule or the polypeptide is found in nature.
  • nucleic acid molecule or polypeptide is considered to be "(in) isolated (form)" when, compared to its native biological source and/or the reaction medium or cultivation medium from which it has been obtained, it has been separated from at least one other component with which it is usually associated in said source or medium, such as another nucleic acid molecule, another polypeptide, another biological component or macromolecule or at least one contaminant, impurity or minor component.
  • Amino acid residues will be indicated according to the standard three-letter or one-letter amino acid code, as generally known and agreed upon in the art. There are twenty known naturally occurring amino acids encoded by sixty-one triplet codons. These 20 amino acids can be split into those that have neutral charges, positive charges, and negative charges:
  • Alanine (Ala, A) nonpolar, neutral;
  • Asparagine (Asn, N) polar, neutral;
  • Cysteine (Cys, C) polar, neutral;
  • Glutamine (Gln, Q) polar, neutral;
  • Glycine (Gly, G) nonpolar, neutral;
  • Isoleucine (Ile, I) nonpolar, neutral;
  • Leucine (Leu, L) nonpolar, neutral;
  • Methionine (Met, M) nonpolar, neutral;
  • Phenylalanine (Phe, F) nonpolar, neutral;
  • Proline (Pro, P) nonpolar, neutral;
  • Serine (Ser, S) polar, neutral;
  • Threonine (Thr, T) polar, neutral;
  • Tryptophan (Trp, W) nonpolar, neutral;
  • Tyrosine (Tyr, Y) polar, neutral;
  • Valine (Val, V) nonpolar, neutral;
  • Histidine (His, H) polar, positive (10%) neutral (90%).
  • the "positively” charged amino acids are:
  • Arginine (Arg, R) polar, positive
  • Lysine (Lys, K) polar, positive.
  • the "negatively” charged amino acids are:
  • Aspartic acid (Asp, D) polar, negative;
  • Glutamic acid (Glu, E) polar, negative.
  • peptide refers to the arrangement of amino acid residues in a polymer (polypeptide chain).
  • a peptide, polypeptide or protein can be composed of the standard 20 naturally occurring amino acids, and may include rare amino acids and synthetic amino acid analogs (e.g. non-canonical amino acids).
  • a protein can be a monomer, a dimer or a higher -mer comprising one, two or more polypeptide chains.
  • the protein can be a homo-di- or higher -mer comprising two or more identical polypeptide chains or a hetero-di- or higher -mer comprising two or more different polypeptide chains or a mixture thereof, such as a heteromer of a homomer or a homomer of a heteromer.
  • the protein and the polypeptide chains can be any chain of amino acids, regardless of length or post-translational modification (e.g. glycosylation or phosphorylation).
  • the peptide, polypeptide, protein or polypeptide chain(s) described herein includes recombinantly or synthetically produced peptide tags described herein, either individually or as part of a fusion with e.g. a POI.
  • the genetic code translates mRNA nucleotide sequences to amino acid sequences. Genetic information is coded using groups of three nucleotides along the mRNA, which are commonly known as “codons”. A given set of three nucleotides almost always encodes the same amino acid, with a few exceptions like UGA, which typically serves as a stop codon but can also encode tryptophan in mammalian mitochondria. Most amino acids are specified by multiple codons, demonstrating that the genetic code is degenerate: different codons result in the same amino acid. Codons that code for the same amino acid are termed synonyms or synonymous codons. Translation is initiated at the start codon, which typically is AUG.
  • the start codon is the first codon of an mRNA transcript translated by a ribosome and typically codes for a methionine.
  • the start codon is preferably preceded by a 5' untranslated region (5' UTR), which in prokaryotes preferably includes the ribosome binding site.
  • a point mutation is particularly understood as the engineering of a polynucleotide that results in the expression of an amino acid sequence that differs from the non-engineered amino acid sequence in the substitution or exchange, deletion or insertion of one or more single (non-consecutive) or doublets of amino acids for different amino acids.
  • silent mutation(s) refers to base substitutions that result in no change of the amino acid or amino acid functionality when the altered messenger RNA (mRNA) is translated. Specifically, it refers to a substitution of at least one nucleotide in a codon that does not result in a change of the respective amino acid. For example, if the codon AAA is altered to become AAG, the same amino acid - lysine - will be incorporated into the peptide chain.
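The definition of a silent mutation can be illustrated with a short Python sketch that translates two coding sequences with the standard genetic code and checks whether their nucleotide differences leave the encoded amino acid sequence unchanged. The function names are hypothetical, and the sketch assumes in-frame sequences of equal length without ambiguous bases.

```python
from itertools import product

# Standard genetic code in the conventional T, C, A, G ordering; "*" marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq: str) -> str:
    """Translate an in-frame DNA or mRNA coding sequence into one-letter amino acids."""
    dna = seq.upper().replace("U", "T")
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def count_substitutions(reference: str, variant: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    if len(reference) != len(variant):
        raise ValueError("sequences must have the same length")
    ref = reference.upper().replace("U", "T")
    var = variant.upper().replace("U", "T")
    return sum(1 for a, b in zip(ref, var) if a != b)


def has_only_silent_mutations(reference: str, variant: str) -> bool:
    """True if the variant differs from the reference but encodes the same amino acids."""
    differs = count_substitutions(reference, variant) > 0
    return differs and translate(reference) == translate(variant)


if __name__ == "__main__":
    # The example from the text: AAA and AAG both encode lysine (K).
    print(translate("AAA"), translate("AAG"))
    print(has_only_silent_mutations("ATGAAA", "ATGAAG"))
```

Counting substitutions separately makes it possible to check a variant against a limit such as "at least 1, 2, 3, 4 or 5 silent mutations".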
  • a “recombinant” as used herein shall mean “being prepared by or the result of genetic engineering”.
  • a “recombinant cell” or “recombinant host cell” refers to a cell or host cell that has been genetically altered to comprise a nucleic acid sequence which was not native to said cell.
  • a recombinant host specifically comprises a recombinant expression vector or cloning vector, or it has been genetically engineered to contain a recombinant nucleic acid sequence in its genome, such as its chromosome.
  • the term “homologous” means derived from the same cell or organism with the same genomic background.
  • heterologous means derived from a cell or organism with a different genomic background, or an artificial sequence not found in nature.
  • a heterologous nucleotide sequence, e.g. a DNA, can be a DNA encoding an artificial protein not found in nature.
  • endogenous means originating inside the organism or cell. Specifically, use of an endogenous DNA, e.g. for expression of a protein, means that the DNA originating inside the organism or cell is used for expression. As used herein the term “exogenous” means originating outside the organism or cell. Specifically, use of an exogenous sequence, such as DNA, for expression of a protein, means that the sequence is introduced into the cell, where expression from this sequence takes place.
  • a heterologous or a homologous gene for expression of a heterologous or homologous protein can be introduced into the cell by integrating a vector carrying the gene (there are one or more copies of the vector and thus one or more copies of the gene in the host cell) and being expressed from the vector or one or more copies of the gene can be integrated into the genome and/or chromosome of the host cell, from where the gene is expressed.
  • a homologous protein can also be expressed or overexpressed in a host cell from an endogenous gene.
  • the tag sequence described herein and optionally a promoter is integrated into the genome and/or chromosome of the host cell such that it is operably linked to the endogenous gene.
  • a recombinant protein also may be a homologous protein.
  • one or more copies of the polynucleotide encoding the homologous protein are introduced into the host cell by genetic manipulation.
  • heterologous nucleotide sequences are those not found in the same relationship to a host cell in nature (i.e., “not natively associated”). Any recombinant or artificial nucleotide sequence is understood to be heterologous.
  • the term “increasing the expression of at least one protein of interest” or “improving the expression of at least one protein of interest” means that the yield and/or the titer and/or secretion and/or the soluble yield and/or soluble titer of a POI which is expressed by a host cell is increased, when said POI is expressed from a fusion construct comprising a nucleotide sequence encoding a peptide tag of the invention compared to expression by the same host cell but from an expression construct comprising a nucleotide sequence encoding the POI without a peptide tag or encoding the POI in fusion to a peptide tag but wherein said peptide tag does not comprise SEQ ID NO: 1 or SEQ ID NO:2.
  • the peptide tags of the invention improve the expression of at least one POI, such as e.g. PTH or hFGF2, as compared to T7AC (SEQ ID NO:79) and/or as compared to T7A3 (SEQ ID NO:78). Specifically, expression is increased at least by about 1.1 fold, 1.2 fold, 1.3 fold, 1.4 fold, 1.5 fold,
  • yield, used synonymously herein with the terms “specific titer”, “spec. titer” or “POI [mg/g CDM]”, refers to the amount of POI(s) as described herein per cell or biomass, i.e. product per biomass in mg/g, and may be presented as mg POI/g biomass (biomass being measured as cell dry mass (CDM) or wet cell weight (WCW), preferably as CDM, so that “yield” is represented by mg POI/g CDM of a host cell).
  • volumetric titer refers similarly to the amount of produced POI or model protein(s) as described herein per volume and may be presented as product per volume in mg/L or mg POI/L culture supernatant or whole cell broth.
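The relation between the two measures can be sketched in Python. This is a minimal illustration and not a method taken from the description. It assumes that volumetric titer (mg POI/L) and biomass concentration (g CDM/L) are determined on the same culture sample, so that the specific titer in mg POI/g CDM is their quotient. The function names and all numeric values are hypothetical.

```python
def specific_titer(volumetric_titer_mg_per_l: float, biomass_g_cdm_per_l: float) -> float:
    """Yield (specific titer) in mg POI per g CDM from a volumetric titer in mg/L."""
    if biomass_g_cdm_per_l <= 0:
        raise ValueError("biomass concentration must be positive")
    return volumetric_titer_mg_per_l / biomass_g_cdm_per_l


def fold_change(tagged: float, reference: float) -> float:
    """Ratio of a titer obtained with a peptide tag to that of a reference construct."""
    if reference <= 0:
        raise ValueError("reference value must be positive")
    return tagged / reference


if __name__ == "__main__":
    # Hypothetical measurements, for illustration only.
    tagged_yield = specific_titer(1200.0, 30.0)     # tagged construct
    reference_yield = specific_titer(750.0, 30.0)   # reference construct
    print(tagged_yield, reference_yield, fold_change(tagged_yield, reference_yield))
```

A fold change computed this way can be compared against thresholds such as "at least about 1.1 fold".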
  • the expressed POI(s) can be located within the cytosol and/or periplasm (in soluble form or in insoluble form, specifically as Inclusion Bodies (“IB”)) of the host cell and/or the supernatant of the cell culture or cell broth.
  • the terms “titer” / “volumetric titer” and “yield” / “specific titer” may be qualified as “soluble titer” / “soluble volumetric titer” or “soluble yield” / “soluble specific titer” for the part of the POI or fusion protein which is expressed solubly, and as “total titer” / “total volumetric titer” or “total yield” / “total specific titer” for the whole amount of the POI or fusion protein including soluble and insoluble fractions
  • “IB titer” / “IB volumetric titer” or “IB yield” / “IB specific titer” for the part of the POI or fusion protein which is in
  • the protein may be a polypeptide not naturally occurring in the host cell, i.e. a heterologous protein.
  • the POI may further be a homologous protein to the host cell, but is produced, for example, upon integration by recombinant techniques of one or more copies of the nucleic acid sequence encoding the homologous POI into a host cell.
  • the POI may be produced upon integration of the nucleic acid sequence encoding the POI into the genome or chromosome of the host cell.
  • the POI can also be expressed in a host using a vector, more specifically a plasmid.
  • the POI can be expressed from a plasmid or from the chromosome or genome of the host cell.
  • a nucleic acid sequence comprising the tag sequence of the invention, and, optionally, a promoter different to the native promoter of the endogenous gene, may be integrated into the host cell so that it is operably linked to the endogenous gene encoding the homologous POI.
  • the POI can be a monomer, dimer or multimer, it can be a homomer or heteromer or a mixture thereof.
  • the POI can be any peptide, protein or polypeptide, whether naturally occurring or not.
  • the POI can also be a synthetic peptide, protein or polypeptide.
  • the POI may also comprise one or more rare amino acids and synthetic amino acid analogs (e.g. non-canonical amino acids).
  • proteins that can be produced by the method of the invention are, without limitation, enzymes, regulatory proteins, receptors, growth factors, hormones, peptides, e.g. peptide hormones, cytokines, membrane or transport proteins.
  • the POIs may also be antigens as used for vaccination, vaccines, antigen-binding proteins, immune stimulatory proteins, interleukins, interferons, allergens, full-length antibodies or antibody fragments or derivatives thereof or affinity scaffolds.
  • Antibody derivatives may be for example, but not limited to single chain variable fragments (scFv), Fab fragments or single domain antibodies or camelid antibodies or heavy chain antibodies or derivatives thereof such as VHH fragments or the like.
  • the POI can be an artificial protein comprising polypeptide chains comprising natural as well as artificial amino acid sequences and/or comprising non-canonical amino acids.
  • the POI can also exclusively comprise artificial sequences or can comprise more than one artificial polypeptide chains.
  • the DNA molecule encoding the protein of interest is also termed "gene of interest” or “GOI”.
  • the gene of interest encoding the POI can be a naturally existing DNA sequence or a non-natural DNA sequence.
  • One or more GOI(s) can be under the control of one promoter.
  • each gene of interest is under the control of its own promoter.
  • all different polypeptide chains may be under the control of the same promoter, or each polypeptide chain may be under the control of its own promoter or a mixture thereof.
  • the genes of interest resp. the genes encoding the polypeptide chains of a heteromer may all be on the same expression cassette or on multiple expression cassettes.
  • the POI can be modified in any way.
  • Non-limiting examples of modifications can be insertion or deletion of post-translational modification sites, insertion or deletion of targeting signals (e.g. leader peptides), fusion to tags (e.g. improving expression and/or solubility of the POI, affinity tags such as 6His-tag for IMAC purification, detection tags such as GFP and/or any tag having other functions), protease recognition and/or cleavage sites, proteins or protein fragments facilitating purification or detection, mutations affecting changes in stability or changes in solubility, or any other modification known in the art.
  • the recombinant protein is a biopharmaceutical product, which can be any protein suitable for therapeutic or prophylactic purposes in mammals.
  • the term “host cell” refers to a cell which is capable of protein expression and optionally protein secretion. Such host cell is applied in the methods of the present invention. For that purpose, for the host cell to produce a recombinant POI, a nucleotide sequence encoding a peptide tag as described herein is introduced or present in the cell. Typically, the term refers to viable cells, capable of growing in a cell culture and/or capable of expressing genes such as the GOI. Common cells that serve as hosts for expression of recombinant genes are prokaryotic and eukaryotic cells, such as bacteria, yeast, or mammalian cells.
  • eukaryotic cells include, but are not limited to, vertebrate cells, mammalian cells, human cells, animal cells, invertebrate cells, plant cells, nematodal cells, insect cells, stem cells, fungal cells, or yeast cells.
  • examples of prokaryotic cells include, but are not limited to, Escherichia coli, Bacillus species, Corynebacterium species, Pseudomonas species, Salmonella species and Streptomyces species.
  • E. coli strains include but are not limited to B strains and K strains, such as BL21, BL21(DE3), HMS174, HMS174(DE3).
  • the eukaryotic host cell may be a fungal cell. More preferred is a yeast host cell.
  • yeast cells include but are not limited to the Saccharomyces genus (e.g. Saccharomyces cerevisiae, Saccharomyces kluyveri, Saccharomyces uvarum), the Komagataella genus (Komagataella pastoris, Komagataella pseudopastoris or Komagataella phaffii), the Kluyveromyces genus (e.g. Kluyveromyces lactis, Kluyveromyces marxianus), the Candida genus (e.g. Candida utilis, Candida cacaoi), as well as Hansenula polymorpha.
  • the genus Pichia is of particular interest.
  • Pichia comprises a number of species, including the species Pichia pastoris, Pichia methanolica, Pichia kluyveri, and Pichia angusta. Most preferred is the species Pichia pastoris.
  • the former species Pichia pastoris has been divided and renamed to Komagataella pastoris, Komagataella phaffii and Komagataella pseudopastoris. Therefore, Pichia pastoris is a synonym for each of Komagataella pastoris, Komagataella phaffii and Komagataella pseudopastoris.
  • Non-limiting examples of mammalian cells include human, mouse, rat, monkey and rodent cell lines.
  • Specific mammalian cell lines available as host cells for expression are well known in the art and include, inter alia, Chinese hamster ovary (CHO) cells, NS0, SP2/0 cells, HeLa cells, baby hamster kidney (BHK) cells, monkey kidney cells (COS), human carcinoma cells (e.g., Hep G2 and A-549 cells), 3T3 cells or the derivatives/progenies of any such cell line.
  • the gene can be translated into protein using cell free translation systems, possibly coupled to an in vitro transcription system.
  • cell free translation systems provide all steps necessary to obtain protein from DNA by supplying the necessary enzymes and substrates in an in vitro reaction.
  • any living cell or organism can provide the necessary enzymes for this process and extraction protocols for obtaining such enzyme systems are known in the art.
  • Common systems used for in vitro transcription/translation are extracts or lysates from reticulocytes, wheat germ or Escherichia coli.
  • nucleic acid molecules containing a desired coding sequence of an expression product such as e.g., a peptide tag, a POI or a fusion protein as described herein, may be used for expression purposes.
  • Hosts transformed or transfected with or containing these sequences are capable of producing the encoded proteins.
  • the expression system may be included in a vector; however, the relevant DNA may also be integrated into the host chromosome or genome.
  • the term refers to a host cell and compatible vector under suitable conditions, e.g., for the expression of a protein coded for by foreign DNA carried by the vector and introduced to the host cell.
  • a POI or a fusion protein can be expressed as soluble protein within the cytoplasm or as soluble protein within the periplasm (e.g. upon fusion of a leader or signal sequence to the POI or fusion protein).
  • a POI or a fusion protein can also be expressed in soluble form and released to the culture supernatant by active transport of the POI or fusion protein from the cytosol, periplasm, or endoplasmic reticulum to the supernatant.
  • a leader or signal sequence (specifically in bacterial expression systems) or a pre and/or pro sequence (specifically in yeast expression systems) may be fused to the N-terminus of the POI or fusion protein.
  • the POI or fusion protein can also be released from the periplasm to the supernatant upon cell lysis or leakiness of the cell membrane. This may be achieved by addition of enzymes inducing lysis of the cells or by inducing expression of a cell lytic enzyme within the host cell during or after fermentation.
  • Leakiness of the cell membrane(s) can be achieved by mutagenesis of the cell and selection of leaky cells, genetic engineering of cells to make them leaky or by inducing the overexpression and/or underexpression/knock-down of proteins of the cell leading to leakiness of the cell.
  • a POI or fusion protein can also be expressed as inclusion bodies within the cytosol or, for example when an N-terminal signal or leader peptide is used, within the periplasm of the host cell.
  • operably linked refers to the association of nucleotide sequences on a single nucleic acid molecule, i.e. the vector, in a way such that the function of one or more nucleotide sequences is affected by at least one other nucleotide sequence present on said nucleic acid molecule.
  • a promoter is operably linked with a coding sequence encoding the protein of interest or a fusion protein comprising the POI, when it is capable of effecting the expression of that coding sequence.
  • nucleic acids operably linked to each other may be immediately linked, i.e. directly linked without further elements or nucleic acid sequences in between or may be indirectly linked with spacer sequences or other sequences in between.
  • lac operator being operably linked to a promoter refers to the ability of the lac operator to regulate the ability of the promoter to control expression of the coding sequence under specific conditions, such as the ability of the lac operator to inhibit promoter-dependent expression of the gene of interest when lac repressor protein is bound thereto.
  • vector includes autonomously replicating nucleotide sequences as well as genome integrating nucleotide sequences.
  • a common type of vector is a “plasmid”, which generally is a molecule of double-stranded DNA that can readily accept additional (foreign) DNA and which can readily be introduced into a suitable host cell.
  • a plasmid vector often contains coding DNA and promoter DNA and has one or more restriction sites suitable for inserting foreign DNA.
  • the term “vector” or “plasmid” refers to a vehicle by which a DNA or RNA sequence (e.g., a foreign gene) can be introduced into a host cell, so as to transform the host and promote expression (e.g., transcription and translation) of the introduced sequence.
  • the plasmid or expression vector can also be the 2µ plasmid for use in yeasts from which the POI or fusion protein is expressed.
  • “Expression vectors” or “vectors” as used herein are defined as DNA sequences that are required for the transcription of cloned recombinant nucleotide sequences, i.e. of recombinant genes and the translation of their mRNA in a suitable host organism.
  • a sequence encoding a desired expression product such as e.g. the fusion protein described herein, is typically cloned into an expression vector that contains a promoter to direct transcription.
  • Suitable bacterial and eukaryotic promoters are well known in the art. The promoter used to direct expression of a nucleic acid depends on the particular application.
  • an inducible or regulatable promoter is typically used for expression and purification of proteins such as e.g. fusion proteins.
  • Expression vectors comprise the expression cassette and additionally usually comprise an origin for autonomous replication in the host cells or a genome or chromosome integration site, one or more selectable markers (e.g., an amino acid synthesis gene or a gene conferring resistance to antibiotics such as zeocin, kanamycin, G418 or hygromycin), a number of restriction enzyme cleavage sites, a suitable promoter sequence, a cloning site or multiple cloning site and a transcription terminator, which components are operably linked together.
  • inducible promoters for use in bacteria include but are not limited to the T7, A1 (also termed “T7A1” or “T7AI”) and T5 promoter systems (for example, being inducible in connection with e.g. one or more lacO repressor binding sites), the Pm promoter/operator or the pBAD promoter/operator.
  • inducible promoters for use in yeast include the AOX1 promoter or derivatives thereof, such as those published in WO2006/089329A1.
  • the promoter can also be a constitutive promoter, such as but not limited to the GAP or ADH1 promoter for yeast, or a promoter which is repressible by a metabolite or carbon source and is activated upon depletion of the metabolite or carbon source.
  • the promoter can be any promoter with and without additional regulatory sequences (such as e.g. repressor binding sites) facilitating translation of the GOI.
  • An “expression cassette” refers to a DNA coding sequence or segment of DNA coding for an expression product that can be inserted into a vector at defined restriction sites.
  • the cassette restriction sites are designed to ensure insertion of the cassette in the proper reading frame.
  • DNA is inserted at one or more restriction sites of the vector DNA, and then is carried by the vector into a host cell along with the transmissible vector DNA.
  • a segment or sequence of DNA having inserted or added DNA, such as an expression vector, can also be called a “DNA construct”.
  • an expression cassette is a DNA construct comprising essentially a promoter, a gene of interest, and upstream of the gene of interest a Shine-Dalgarno (SD) sequence, also termed ribosome binding site (RBS).
  • expression cassette or “integration cassette” may also refer to a linear or circular DNA construct to be integrated, specifically “stably integrated”, into the host genome, such as the bacterial chromosome.
  • the expression host cell has an integrated expression cassette.
  • the expression cassette preferably also comprises two terminally flanking regions which are homologous to a genomic region and which enable homologous recombination.
  • the cassette may contain other sequences such as for example sequences coding for antibiotic selection markers, prototrophic selection markers or fluorescent markers, markers coding for a metabolic gene, genes which improve protein expression or two flippase recognition target sites (FRT) which enable the removal of certain sequences (e.g.
  • linear expression cassette provides the advantage that the genomic integration site can be freely chosen by the respective design of the flanking homologous regions of the cassette. Thereby, integration of the linear expression cassette allows for greater variability with regard to the genomic region.
  • Expression products such as the fusion protein described herein, can be expressed from an autonomously replicating nucleotide sequence, or from nucleotide sequences stably integrated into the genome or chromosome of a host cell.
  • Any of the known procedures for introducing foreign nucleotide sequences into host cells may be used. These include the use of calcium phosphate transfection, polybrene, protoplast fusion, electroporation, nucleofection, liposomes, microinjection, naked DNA, plasmid vectors, viral vectors, both episomal and integrative, and any of the other well-known methods for introducing cloned genomic DNA, cDNA, synthetic DNA or other foreign genetic material into a host cell (see, e.g., Sambrook et al.).
  • “functionally active variants” or “functionally active derivatives” of a peptide tag of the invention may be obtained by substituting, deleting, adding, inserting and/or modifying one or more amino acids of the peptide, which substitution(s), deletion(s), addition(s), insertion(s) and/or modification(s) preserve the function of the peptide according to the present invention.
  • the peptides of the invention or functionally active variants thereof when used as an expression tag, increase the expression of a protein of interest, optionally, as compared to T7AC (SEQ ID NO:79) and/or as compared to T7A3 (SEQ ID NO:78) and/or facilitate expression of a protein of interest.
  • the yield and/or titer is increased. Specifically, expression is increased at least by about 1.1 fold, 1.2 fold, 1.3 fold, 1.4 fold, 1.5 fold, 1.6 fold, 1.7 fold, 1.8 fold, 1.9 fold, 2 fold, 2.1 fold, 2.2 fold,
  • a functionally active variant preferably means a nucleotide sequence having a sequence different from the original nucleotide sequence, but which still codes for the same amino acid sequence, due to the degeneracy of the genetic code.
  • Functional variants of a protein, in particular the peptide tags described herein may be obtained by substituting one or more amino acids of the protein or peptide, which substitution(s) preserve(s) the function of the protein or peptide. Preferably, such substitutions are conservative amino acid substitutions.
  • functionally active variants of the peptide tags described herein comprise the first peptide sequence of SEQ ID NO:1 or SEQ ID NO:2, preferably at the same position as the peptide tag sequence from which the functionally active variant is derived.
  • a functionally active variant of the amino acid sequence as shown in SEQ ID NO:11 has at least about 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or even 100% amino acid sequence identity to the amino acid sequence as shown in SEQ ID NO:11 and comprises SEQ ID NO:1.
  • a functionally active variant of the amino acid sequence as shown in SEQ ID NO:12 has at least about 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or even 100% amino acid sequence identity to the amino acid sequence as shown in SEQ ID NO:12 and comprises SEQ ID NO:2.
  • a functionally active variant of the amino acid sequence as shown in SEQ ID NO:22 has at least about 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or even 100% amino acid sequence identity to the amino acid sequence as shown in SEQ ID NO:22 and comprises SEQ ID NO:1.
  • a functionally active variant of the amino acid sequence as shown in SEQ ID NO:23 has at least about 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or even 100% amino acid sequence identity to the amino acid sequence as shown in SEQ ID NO:23 and comprises SEQ ID NO:2.
  • a functionally active variant of the amino acid sequence as shown in SEQ ID NO:111 has at least about 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or even 100% amino acid sequence identity to the amino acid sequence as shown in SEQ ID NO:111 and comprises SEQ ID NO:1.
  • a functionally active variant of the amino acid sequence as shown in SEQ ID NO:113 has at least about 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or even 100% amino acid sequence identity to the amino acid sequence as shown in SEQ ID NO:113 and comprises SEQ ID NO:2.
  • a functionally active variant of the amino acid sequence as shown in SEQ ID NO:32 has at least about 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or even 100% amino acid sequence identity to the amino acid sequence as shown in SEQ ID NO:32 and comprises SEQ ID NO:1.
  • a functionally active variant of the amino acid sequence as shown in SEQ ID NO:35 has at least about 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or even 100% amino acid sequence identity to the amino acid sequence as shown in SEQ ID NO:35 and comprises SEQ ID NO:2.
  • a functionally active variant of the amino acid sequence as shown in SEQ ID NO:42 has at least about 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or even 100% amino acid sequence identity to the amino acid sequence as shown in SEQ ID NO:42 and comprises SEQ ID NO:1.
  • a functionally active variant of the amino acid sequence as shown in SEQ ID NO:47 has at least about 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or even 100% amino acid sequence identity to the amino acid sequence as shown in SEQ ID NO:47 and comprises SEQ ID NO:2.
  • a functionally active variant of the amino acid sequence as shown in SEQ ID NO:150 has at least about 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or even 100% amino acid sequence identity to the amino acid sequence as shown in SEQ ID NO:150 and comprises SEQ ID NO:1.
  • a functionally active variant of the amino acid sequence as shown in SEQ ID NO:155 has at least about 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or even 100% amino acid sequence identity to the amino acid sequence as shown in SEQ ID NO:155 and comprises SEQ ID NO:2.
  • a functionally active variant of the amino acid sequence as shown in SEQ ID NO:11 has at least about 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or even 100% amino acid sequence identity to the amino acid sequence as shown in SEQ ID NO:10 and in addition comprises SEQ ID NO:1, preferably at the N-terminus.
  • a functionally active variant of the amino acid sequence as shown in SEQ ID NO:12 has at least about 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or even 100% amino acid sequence identity to the amino acid sequence as shown in SEQ ID NO:10 and in addition comprises SEQ ID NO:2, preferably at the N-terminus.
  • a functionally active variant of the amino acid sequence as shown in SEQ ID NO:22 has at least about 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or even 100% amino acid sequence identity to the amino acid sequence as shown in SEQ ID NO:21 and in addition comprises SEQ ID NO:1, preferably at the N-terminus.
  • a functionally active variant of the amino acid sequence as shown in SEQ ID NO:23 has at least about 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or even 100% amino acid sequence identity to the amino acid sequence as shown in SEQ ID NO:21 and in addition comprises SEQ ID NO:2, preferably at the N-terminus.
  • a functionally active variant of the amino acid sequence as shown in SEQ ID NO:111 has at least about 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or even 100% amino acid sequence identity to the amino acid sequence as shown in SEQ ID NO:107 and in addition comprises SEQ ID NO:1, preferably at the N-terminus.
  • a functionally active variant of the amino acid sequence as shown in SEQ ID NO:113 has at least about 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or even 100% amino acid sequence identity to the amino acid sequence as shown in SEQ ID NO:107 and in addition comprises SEQ ID NO:2, preferably at the N-terminus.
  • a functionally active variant of the amino acid sequence as shown in SEQ ID NO:32 has at least about 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or even 100% amino acid sequence identity to the amino acid sequence as shown in SEQ ID NO:85 and in addition comprises SEQ ID NO:1, preferably at the N-terminus.
  • a functionally active variant of the amino acid sequence as shown in SEQ ID NO:35 has at least about 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or even 100% amino acid sequence identity to the amino acid sequence as shown in SEQ ID NO:85 and in addition comprises SEQ ID NO:2, preferably at the N-terminus.
  • a functionally active variant of the amino acid sequence as shown in SEQ ID NO:42 has at least about 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or even 100% amino acid sequence identity to the amino acid sequence as shown in SEQ ID NO:87 and in addition comprises SEQ ID NO:1, preferably at the N-terminus.
  • a functionally active variant of the amino acid sequence as shown in SEQ ID NO:47 has at least about 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or even 100% amino acid sequence identity to the amino acid sequence as shown in SEQ ID NO:87 and in addition comprises SEQ ID NO:2, preferably at the N-terminus.
  • a functionally active variant of the amino acid sequence as shown in SEQ ID NO:150 has at least about 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or even 100% amino acid sequence identity to the amino acid sequence as shown in SEQ ID NO:160 and in addition comprises SEQ ID NO:1, preferably at the N-terminus.
  • a functionally active variant of the amino acid sequence as shown in SEQ ID NO:155 has at least about 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or even 100% amino acid sequence identity to the amino acid sequence as shown in SEQ ID NO:160 and in addition comprises SEQ ID NO:2, preferably at the N-terminus.
  • sequence identity is understood as the relatedness between two amino acid sequences or between two nucleotide sequences and described by the degree of sequence identity or sequence complementarity.
  • sequence identity of a variant, homologue or orthologue as compared to a parent nucleotide or amino acid sequence indicates the degree of identity of two or more sequences.
  • Two or more amino acid sequences may have the same or conserved amino acid residues at a corresponding position, to a certain degree, up to 100%.
  • Two or more nucleotide sequences may have the same or conserved base pairs at a corresponding position, to a certain degree, up to 100%.
  • Sequence similarity searching is an effective and reliable strategy for identifying homologs with excess (e.g., at least 50%) sequence identity.
  • Sequence similarity search tools frequently used are e.g., BLAST, FASTA, and HMMER.
  • Sequence similarity searches can identify such homologous proteins or polynucleotides by detecting excess similarity, and statistically significant similarity that reflects common ancestry.
  • Homologues may encompass orthologues, which are herein understood as the same protein in different organisms, e.g., variants of such protein in different organisms or species.
  • one of the two sequences needs to be converted to its complementary sequence before the % complementarity can then be calculated as the % identity between the first sequence and the second converted sequences using the above-mentioned algorithm.
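  • As a minimal sketch of the conversion step just described, the second sequence can be turned into its reverse complement and then compared with the first. The function names are hypothetical, and the ungapped, position-by-position comparison of equal-length DNA sequences is a simplifying assumption; it stands in for the alignment algorithm referred to in the text, which is not reproduced here.

```python
# Watson-Crick pairing for DNA (assumption: unambiguous A/C/G/T bases only).
_COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq.upper()))


def percent_complementarity(first, second):
    """% complementarity = % identity between `first` and the converted `second`.

    Simplification (assumption): both sequences have the same length and are
    compared position by position without introducing gaps.
    """
    converted = reverse_complement(second)
    if not first or len(first) != len(converted):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(first.upper(), converted))
    return 100.0 * matches / len(first)
```

For sequences of unequal length, the converted sequence would instead be passed to a gapped alignment routine.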
  • Percent (%) identity with respect to an amino acid sequence, homologs and orthologues described herein is defined as the percentage of amino acid residues in a candidate sequence that are identical with the amino acid residues in the specific polypeptide sequence, after aligning the sequence and introducing gaps, if necessary, to achieve the maximum percent sequence identity, and not considering any conservative substitutions as part of the sequence identity.
  • Those skilled in the art can determine appropriate parameters for measuring alignment, including any algorithms needed to achieve maximal alignment over the full length of the sequences being compared.
  • sequence identity between two amino acid sequences can be determined using NCBI BLAST, specifically NCBI BLAST+ version 2.9.0 (Apr-02-2019).
  • Percent (%) identity with respect to a nucleotide sequence e.g., of a nucleic acid molecule or a part thereof, in particular a coding DNA sequence, is defined as the percentage of nucleotides in a candidate DNA sequence that is identical with the nucleotides in the DNA sequence, after aligning the sequence and introducing gaps, if necessary, to achieve the maximum percent sequence identity, and not considering any conservative substitutions as part of the sequence identity. Alignment for purposes of determining percent nucleotide sequence identity can be achieved in various ways that are within the skill in the art, for instance, using publicly available computer software. Those skilled in the art can determine appropriate parameters for measuring alignment, including any algorithms needed to achieve maximal alignment over the full length of the sequences being compared.
  • Optimal alignment may be determined with the use of any suitable algorithm for aligning sequences, non-limiting examples of which include the Smith-Waterman algorithm, the Needleman-Wunsch algorithm, algorithms based on the Burrows-Wheeler Transform (e.g., the Burrows-Wheeler Aligner), ClustalW, Clustal X, BLAT, Novoalign (Novocraft Technologies; available at novocraft.com), ELAND (Illumina, San Diego, CA), SOAP (available at soap.genomics.org.cn), and Maq (available at maq.sourceforge.net).
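  • The percent identity definition and the variant criteria above can be illustrated with a small Needleman-Wunsch global alignment. This is a sketch under stated assumptions, not the BLAST procedure named in the text: the scoring values (match +1, mismatch -1, linear gap -1), the choice of the alignment length (including gap columns) as denominator, and all function names are assumptions made for illustration. The example sequences in the usage note are arbitrary and are not SEQ ID NOs from this document.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align two sequences; return the two gapped strings."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(candidate, reference):
    """Identical aligned residues as a percentage of the alignment length."""
    if not candidate or not reference:
        raise ValueError("both sequences must be non-empty")
    al_c, al_r = needleman_wunsch(candidate, reference)
    identical = sum(1 for x, y in zip(al_c, al_r) if x == y and x != "-")
    return 100.0 * identical / len(al_c)


def meets_variant_criteria(candidate, reference, first_peptide, threshold=90.0):
    """Sequence-level check only: identity threshold plus presence of the
    first peptide sequence. Functional activity itself must be shown
    experimentally and is not assessed here."""
    return (first_peptide in candidate
            and percent_identity(candidate, reference) >= threshold)
```

With these assumptions, `percent_identity("ACDEFGHIKL", "ACDEFGHIKV")` evaluates to 90.0, because nine of the ten aligned positions are identical and no gaps are introduced.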
  • fusion protein refers to a POI comprising, preferably at its N-terminus, an engineered fusion sequence comprising the peptide tag(s) described herein.
  • fusion sequence, specifically the peptide tag described herein, is “fused” or “linked” to a POI, thus making up the fusion protein.
  • fusion sequence and POI linked to each other may be immediately linked, i.e. without further elements or nucleic acid sequences in between or may be indirectly linked with linkers or other sequences in between.
  • Such engineered fusion sequence may comprise a protease recognition and/or cleavage site, e.g. a caspase recognition and/or cleavage site or a TEV-protease recognition and/or cleavage site, to facilitate separation of the POI and the fusion sequence.
  • the fusion protein may be cleaved at the protease cleavage site by addition of a protease such as caspase or TEV-protease, e.g. to the cell culture, to remove the peptide tag from the POI.
  • the engineered fusion sequence described herein comprises one or more peptide tags as described herein, at least one caspase recognition site, and optionally one or more linkers.
  • the fusion protein comprises one or more peptide tags, optionally linked via linker sequences, one or more caspase recognition sites and one or more POIs.
  • the fusion protein provided herein comprises a first part, comprising one or more peptide tags, optionally linked via linker sequences; a second part, comprising a recognition site for target-specific proteolytic cleavage, e.g. by caspases, cp-caspase, caspase-2 or cp caspase-2, or variants thereof having one or more modifications of the amino acid sequence; and a third part, comprising a POI, where “cp” refers to a “circularly permuted” or reversed caspase.
  • the second part may be part of the peptide tag.
  • the fusion protein described herein may comprise each part more than once and in different order.
  • the fusion protein provided herein may comprise a first part comprising a tag, a second part comprising a caspase recognition site, another first part comprising the same or a different tag sequence, another second part comprising the same or a different recognition site and a third part comprising a POI.
  • the fusion protein described herein may comprise more than one POI separated by one or more fusion sequences comprising one or more recognition sites.
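  • The three-part architecture described above (tag, recognition site, POI) can be sketched as simple string assembly followed by an in-silico cleavage. All sequences here are illustrative assumptions: the tag `MKK` plus hexahistidine is a hypothetical stand-in and is not SEQ ID NO:1 or SEQ ID NO:2, `VDVAD` is a commonly cited caspase-2 recognition motif that is not taken from this document, and the POI is an arbitrary string. The function names are likewise hypothetical.

```python
# Illustrative placeholders only (assumptions, not sequences from this document).
HIS6 = "HHHHHH"           # hexahistidine affinity tag
CASPASE2_SITE = "VDVAD"   # commonly cited caspase-2 recognition motif


def assemble_fusion(tag, site, poi, linker=""):
    """Concatenate first part (tag, optional linker), second part (site), third part (POI)."""
    return tag + linker + site + poi


def cleave_after_site(fusion, site):
    """Split the fusion directly C-terminal of the first occurrence of `site`.

    Returns (tag portion including the site, released POI). Cutting after the
    site models a protease that leaves no extra residues on the POI.
    """
    start = fusion.find(site)
    if start < 0:
        raise ValueError("recognition site not found in fusion protein")
    cut = start + len(site)
    return fusion[:cut], fusion[cut:]
```

For example, assembling `"MKK" + HIS6`, a `GSG` linker, `CASPASE2_SITE` and the arbitrary POI `ACDEFGHIK`, then calling `cleave_after_site`, returns the POI with its original first residue at the N-terminus.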
  • the fusion protein described herein is encoded by a heterologous gene which is engineered in such a way that it is translated into protein by a host organism. Specifically, said heterologous gene comprises a nucleotide sequence encoding the peptide tag of the invention.
  • the peptide tag of the invention comprises a first peptide sequence consisting of an amino acid sequence selected from SEQ ID NO:1 or SEQ ID NO:2, that can be used to facilitate expression of a POI.
  • the peptide tag may comprise additional elements N-terminal or C-terminal of SEQ ID NO:1 and/or SEQ ID NO:2.
  • the peptide tag comprises further amino acids C-terminal of SEQ ID NO:1 and/or SEQ ID NO:2.
  • the peptide tag comprises at least about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49 or 50 additional amino acids C-terminal of SEQ ID NO:1 and/or SEQ ID NO:2.
  • the peptide tag comprises a second peptide sequence of 8 additional amino acids, for example SEQ ID NO:5.
  • the peptide tag comprises a second peptide sequence of 19 additional amino acids, for example SEQ ID NO:10 or SEQ ID NO:21.
  • the peptide tag comprises a second peptide sequence of 9 additional amino acids, for example SEQ ID NO:107.
  • the peptide tag described herein may comprise further tag elements, such as those selected from the group consisting of one or more affinity tags, one or more solubility enhancement tags, one or more monitoring tags, and may comprise one or more linkers, and one or more protease recognition and/or cleavage sites.
  • tag element refers to the first and second peptide sequences as described herein.
  • tag element as used herein further refers to elements such as solubility enhancement tags, monitoring tags and affinity tags.
  • the protease recognition and/or cleavage site refers to those recognized and/or cleaved by TEV protease or a caspase, cp-caspase or reversed caspase, preferably caspase-2, cp caspase-2 or reversed caspase-2.
  • Affinity tags are amino acid sequences that can be used, for example, to purify the proteins to which they are attached (fusion proteins carrying an affinity tag, e.g. at the N-terminus). These affinity tags have high affinity for appropriate ligands on a solid support, such as a chromatography resin, or for the resin itself. By selectively binding the fusion protein carrying the affinity tag to the particular resin, the fusion protein can be purified highly effectively in a single chromatography step.
  • affinity tag sequences used herein are selected from histidine (His) tag, specifically a poly-histidine tag, arginine-tag, specifically a poly-arginine tag, peptide substrate for antibodies, chitin binding domain, RNAse S peptide, protein A, β-galactosidase, FLAG tag, Strep II tag, streptavidin binding peptide (SBP) tag, calmodulin-binding peptide (CBP), glutathione S-transferase (GST), maltose-binding protein (MBP), S-tag, HA tag, or c-Myc tag or any other tag known to be useful for the efficient purification of a protein it is fused to.
  • the affinity tag is a His tag comprising one or more histidine (H) residues, specifically a hexahistidine tag.
  • fusion proteins comprising a poly-, or hexa-histidine tag can be captured and purified by IMAC, preferably using a Ni-NTA chromatography material.
  • Solubility and/or expression enhancement tags used herein may be calmodulin-binding peptide (CBP), poly Arg, poly Lys, GB1 domain, protein D, Z domain of Staphylococcal protein A, and thioredoxin or any other tag known to improve the solubility and/or expression of the protein it is fused to, e.g. during expression in a host cell.
  • the solubility tag is based on highly charged peptides of bacteriophage genes, for example such as those listed in US 8,535,908 B2.
  • the solubility enhancement tag is selected from the group consisting of T7C, T7B, T7B1, T7B2, T7B3, T7B4, T7B5, T7B6, T7B7, T7B8, T7B9, T7B10, T7B11, T7B12, T7B13, T7A, T7A1, T7A2, T7A3, T7A4, T7A5, T7AC, T3, N1, N2, N3, N4, N5, N6, N7, calmodulin-binding peptide (CBP), poly Arg, poly Lys, GB1 domain, protein D, Z domain of Staphylococcal protein A, DsbA, DsbC and thioredoxin.
  • the monitoring tag sequence used herein is mCherry, GFP or f-Actin or any other tag useful for detection or quantification of the POI and/or the fusion protein during production of the fusion protein, including fermentation, isolation and purification, by simple in-situ, in-line, on-line or at-line detectors, such as UV, IR, Raman, fluorescence and the like.
  • the peptide tag described herein comprises more than one additional tag element, specifically it comprises an affinity tag and a solubility enhancement tag. Specifically, it comprises an affinity tag, a solubility enhancement tag and a monitoring tag. Specifically, it comprises more than one tag element of the same functionality, specifically it comprises more than one affinity tag, more than one solubility enhancement tag and/or more than one monitoring tag, and any combination thereof.
  • first peptide sequence described herein may be linked to an amino acid sequence comprising the second peptide sequence and further tag elements, which amino acid sequence may be selected from the group consisting of SEQ ID NO:85, SEQ ID NO:86, SEQ ID NO:87 and SEQ ID NO:88.
  • tag elements may be immediately linked, i.e. without further elements or nucleic acid sequences in between or may be indirectly linked with linkers in between.
  • first peptide sequence described herein may be linked to an amino acid sequence comprising the second peptide sequence and further tag elements, which amino acid sequence may be SEQ ID NO:160.
  • first peptide sequence described herein may be linked to an amino acid sequence comprising the second peptide sequence and further tag elements, which amino acid sequence may be SEQ ID NO:140.
  • linkers refers to any amino acid sequence that does not interfere with the function of elements being linked.
  • Linkers may connect e.g., nucleotide sequences, or amino acid sequences.
  • Linkers can be used between the tag and the POI or between tag sequences.
  • the linkers may be used to engineer appropriate amounts of flexibility.
  • the linkers are short, e.g., 1-20 nucleotides or amino acids or even more and are typically flexible.
  • Amino acid linkers commonly used consist of a number of glycine, serine, and optionally alanine, in any order.
  • linkers usually have a length of at least any one of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or 20 amino acids, as required.
  • the linker comprises 1 to 12 amino acid residues, preferably it is a short linker.
  • the linker is a GS, GSG, GGSGG (SEQ ID NO:80), GSAGSAAGSG (SEQ ID NO:81), (GS)n, GSGSGSG (SEQ ID NO:82) or GGGGS (SEQ ID NO:83) linker or any combination thereof.
  • the linker comprises one or more units, repeats or copies of a motif, such as for example GS, GSG or G4S.
  • the linker may also consist of a series of the same or different amino acids.
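  • The repeat-based linkers described above, such as (GS)n or repeats of G4S, can be generated with a short helper. This is a sketch only: the function name `make_linker` and the restriction to glycine, serine and alanine residues are assumptions drawn from the commonly used linker composition mentioned in the text, not a requirement of the invention.

```python
# Residues commonly used in flexible linkers according to the text (G, S, optionally A).
ALLOWED_RESIDUES = set("GSA")


def make_linker(motif, repeats=1):
    """Build a linker by repeating `motif` (one-letter amino acid code) `repeats` times."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    if not motif or not set(motif) <= ALLOWED_RESIDUES:
        raise ValueError("motif must be non-empty and contain only G, S or A")
    return motif * repeats
```

For instance, `make_linker("GS", 3)` gives `GSGSGS`, and `make_linker("GGGGS", 2)` gives two copies of the G4S motif.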
  • the fusion protein described herein comprising at least one POI and the peptide tag of the invention, is cloned into an expression vector under operable linkage to a promoter.
  • Said expression vector is integrated into a host cell and the host cell is cultured under conditions allowing expression of the fusion protein, optionally following a growth phase for the accumulation of biomass before the recombinant protein is expressed. Integration may be stable integration into the genome or chromosome of the host cell or may be transient integration using plasmid(s).
  • the POI may be produced employing a fed-batch process as described herein, comprising an expression phase as described herein and optionally a growth phase as described herein.
  • the fusion protein comprising a protease recognition/cleavage site, is contacted with a protease after expression, to cleave off the peptide tag and produce a POI comprising the desired N-terminus, i.e. the natural or designed N-terminus without any unwanted amino acids attached.
  • the fusion protein is contacted with the protease enzyme after isolation of the fusion protein from the host cell culture.
  • the protease recognition site is recognized by a caspase, preferably caspase-2, and the fusion protein is contacted with the corresponding caspase, preferably caspase-2 or a variant thereof such as circularly permuted (cp) caspase-2 or mutated variants thereof having a higher P1' tolerance.
  • the POI may be further modified, purified and/or formulated.
  • the term “modifying the at least one protein of interest” means that the POI may be chemically, physically or enzymatically modified.
  • proteins can be coupled to carbohydrates or lipids.
  • the POI may be PEGylated (the POI is chemically coupled to polyethylene glycol) or HESylated (the POI is chemically coupled to hydroxyethyl starch) for half-life extension.
  • the POI may also be coupled with other moieties such as affinity domains for e.g. human serum albumin for half-life extension.
  • the POI also may be treated by a protease, e.g.
  • the POI may also be coupled to other moieties such as toxins, radioactive moieties or any other moiety.
  • the POI may further be treated under conditions to form dimers, trimers and the like.
  • the term “formulating the at least one protein of interest” refers to bringing the POI to conditions, where the POI can be stored for a longer time and/or for optimized pharmaceutical form and/or pharmaceutical application and/or to better adjust the concentration and/or to provide higher concentrations in liquid formulations.
  • Many different methods known in the art are available to stabilize proteins. By exchanging the buffer in which the POI is present after purification and/or modification, the POI can be brought under conditions where it is more stable. Different buffer substances and additives known in the art, such as sucrose, mild detergents, stabilizers and the like, can be used.
  • the POI can also be stabilized by lyophilization. For some POIs, formulation can be achieved by forming complexes of the POI with lipids or lipoproteins, such as polyplexes, and the like. Some proteins may be co-formulated with other proteins.
  • heterologous compounds refers to a compound which is either foreign, i.e. “exogenous”, such as not found in nature, to a given host cell; or that is naturally found in a given host cell, e.g., is “endogenous”, however, in the context of a heterologous construct, e.g., employing a heterologous nucleic acid, thus “not naturally-occurring”.
  • the heterologous nucleotide sequence as found endogenously may also be produced in an unnatural, e.g., greater than expected or greater than naturally found, amount in the cell.
  • heterologous nucleotide sequence or a nucleic acid comprising the heterologous nucleotide sequence, possibly differs in sequence from the endogenous nucleotide sequence but encodes the same protein as found endogenously.
  • heterologous nucleotide sequences are those not found in the same relationship to a host cell in nature (i.e., “not natively associated”). Any recombinant or artificial nucleotide sequence is understood to be heterologous.
  • Recombinant host cells according to the present invention can be obtained by introducing a vector or plasmid (such as an expression vector as mentioned above) comprising the target polynucleotide sequences, including the tag sequences described herein, into the cells.
  • Techniques for transfecting or transforming eukaryotic cells or transforming prokaryotic cells are well known in the art. These can include lipid vesicle mediated uptake, heat shock mediated uptake, calcium phosphate mediated transfection (calcium phosphate/DNA co-precipitation), viral infection, particularly using modified viruses such as, for example, modified adenoviruses, microinjection and electroporation.
  • Recombinant host cells according to the present invention may also comprise tag sequences described herein stably integrated into their genome.
  • a foreign or target polynucleotide such as the nucleic acid sequence encoding a peptide tag as described herein can be inserted into the chromosome by various means, e.g., by homologous recombination or by using a hybrid recombinase that specifically targets sequences at the integration sites.
  • the foreign or target polynucleotide comprising a tag sequence as described herein and/or a GOI is typically present in a vector (“inserting/integration vector”), preferably within an expression cassette. These vectors are typically circular and linearized before they are used for homologous recombination.
  • the foreign or target polynucleotides may be DNA fragments joined by fusion PCR or synthetically constructed DNA fragments which are then recombined into the host cell.
  • the vectors may also contain markers suitable for selection or screening, an origin of replication, and other elements. It is also possible to use heterologous recombination which results in random or non-targeted integration. Heterologous recombination refers to recombination between DNA molecules with significantly different sequences. Methods of recombination are known in the art and are described, for example, in Boer et al., Appl Microbiol Biotechnol (2007) 77:513-523. One may also refer to Principles of Gene Manipulation and Genomics by Primrose and Twyman (7th edition, Blackwell Publishing 2006) for genetic manipulation of yeast cells.
  • “culturing said host cell under conditions permitting” expression of a target protein refers to maintaining and/or growing host cells under conditions (e.g., but not limited to temperature, aeration, pressure, pH, induction, growth rate, culture medium, nutrients, duration of the cultivation, mode of nutrient feed(s) etc.) appropriate or sufficient to obtain production of the desired compound (fusion protein, POI).
  • a host cell may preferably first be cultivated at conditions to grow efficiently to a large cell number without the burden of expressing a protein.
  • suitable cultivation conditions are selected and optimized to produce the POI.
  • An inducible promoter may be used that becomes activated as soon as an inductive stimulus is applied to direct transcription of the gene under its control.
  • An inductive stimulus is preferably the addition of an appropriate agent (e.g. methanol for the AOX-promoter, or IPTG or lactose for promoters under the control of lac operators (binding sites for the lac repressor) such as the lac, T7, or A1 promoters) or the depletion of an appropriate nutrient (e.g., methionine for the MET3-promoter or phosphate for the phoA promoter).
  • the addition of ethanol, methylamine, cadmium or copper as well as heat or an osmotic pressure increasing agent can induce the expression depending on the promoters operably linked to the proteins of the invention and the POI(s).
  • Some inducible or de-repressible promoters can be induced by depletion of a certain medium component or metabolite, such as phosphate for the phoA promoter.
  • An inductive stimulus can also be a pH, temperature or osmotic pressure change.
  • Some promoters are activated by a combination of an inductive stimulus and de-repression by depletion or limitation of certain metabolites.
  • the host cell(s) according to the invention are cultivated in a bioreactor under optimized growth conditions to obtain a cell density of at least 1 g/L, preferably at least 10 g/L cell dry weight, more preferably at least 50 g/L cell dry weight. It is advantageous to achieve such yields of biomolecule production not only on a laboratory scale, but also on a pilot or industrial scale.
  • the host cells are cultivated in a minimal medium with a suitable carbon source, thereby further simplifying the isolation process significantly.
  • the minimal medium contains a utilizable carbon source (e.g. glucose, glycerol, ethanol or methanol), salts containing the macro elements (potassium, magnesium, calcium, ammonium, chloride, sulphate, phosphate) and trace elements (copper, iodide, manganese, molybdate, cobalt, zinc, and iron salts, and boric acid).
  • the cells may be transformed with one or more of the above-described expression vector(s) and cultured in conventional nutrient media, optionally modified as appropriate for inducing promoters, selecting transformants or amplifying the genes encoding the desired sequences.
  • a number of minimal media suitable for the growth of yeast are known in the art. Any of these media may be supplemented as necessary with salts (such as sodium chloride, calcium, magnesium, and phosphate), buffers (such as HEPES, citric acid and phosphate buffer), nucleosides (such as adenosine and thymidine), antibiotics, trace elements, vitamins, and glucose or an equivalent energy source. Any other necessary supplements may also be included at appropriate concentrations that would be known to those skilled in the art.
  • the culture conditions, such as temperature, pH and the like, are those previously used with the host cell selected for expression and are known to the ordinarily skilled artisan. Cell culture conditions for other types of host cells are also known and can be readily determined by the artisan. Descriptions of culture media for various microorganisms are contained, for example, in the handbook “Manual of Methods for General Bacteriology” of the American Society for Bacteriology (Washington D.C., USA, 1981).
  • Host cells can be cultured (e.g., maintained and/or grown) in liquid media and preferably are cultured, either continuously or intermittently, by conventional culturing methods such as standing culture, test tube culture, shaking culture (e.g., rotary shaking culture, shake flask culture, etc.), aeration spinner culture, or fermentation.
  • cells are cultured in shake flasks or deep well plates.
  • cells are cultured in a bioreactor (e.g., in a bioreactor cultivation process). Cultivation processes include, but are not limited to, batch, fed-batch and continuous methods of cultivation.
  • “batch process” and “batch cultivation” refer to a closed system in which the composition of media, nutrients, supplemental additives and the like is set at the beginning of the cultivation and not subject to alteration during the cultivation; however, attempts may be made to control such factors as pH and oxygen concentration to prevent excess media acidification and/or cell death.
  • “fed-batch process” and “fed-batch cultivation” refer to a batch cultivation with the exception that one or more substrates and/or supplements and/or inducers are added (e.g., added in increments or continuously) as the cultivation progresses in one or more feed solutions/suspensions.
  • “continuous process” and “continuous cultivation” refer to a system in which a defined cultivation medium is added continuously to a bioreactor and an equal amount of used or “conditioned” medium is simultaneously removed, for example, for recovery of the desired product.
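
The three definitions above differ mainly in what enters and leaves the vessel, which shows up directly in the liquid volume over time. The sketch below is a simplified illustration, not a process model: the function name and parameters are hypothetical, the feed rate is assumed constant, and small additions such as pH-control agents are ignored.

```python
def culture_volume(mode: str, v0_l: float, feed_l_per_h: float, hours: float) -> float:
    """Liquid volume (L) after `hours` for the three cultivation modes."""
    if mode == "batch":
        # Closed system: composition is set at the start, nothing is fed.
        return v0_l
    if mode == "fed-batch":
        # Feed is added and nothing is removed, so the volume grows.
        return v0_l + feed_l_per_h * hours
    if mode == "continuous":
        # Medium is added and an equal amount is removed simultaneously.
        return v0_l
    raise ValueError(f"unknown mode: {mode}")


for mode in ("batch", "fed-batch", "continuous"):
    print(mode, culture_volume(mode, v0_l=0.4, feed_l_per_h=0.03, hours=10))
```

Batch and continuous cultivation both keep the volume constant in this sketch, but for different reasons: the batch vessel receives no feed, whereas the continuous vessel receives feed that is balanced by an equal outflow.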
  • host cells are cultured for about 12 to 24 hours; in other embodiments, host cells are cultured for about 24 to 36 hours, about 36 to 48 hours, about 48 to 72 hours, about 72 to 96 hours, about 96 to 120 hours, about 120 to 144 hours, or for a duration greater than 144 hours. In yet other embodiments, culturing is continued for a time sufficient to reach desirable production yields of POI.
  • the above-mentioned methods may further comprise a step of isolating the at least one expressed target protein, e.g. the POI or fusion protein described herein, from the cell culture, optionally followed by a step of purifying the at least one POI or fusion protein.
  • it can be isolated and then purified from the culture medium using state of the art techniques. Secretion of the target protein from the cells is generally preferred, since the products are recovered from the culture supernatant rather than from the complex mixture of proteins that results when cells are disrupted to release intracellular proteins.
  • a protease inhibitor such as phenyl methyl sulfonyl fluoride (PMSF) or any other means may be useful to inhibit proteolytic degradation during purification, and antibiotics may be included to prevent the growth of adventitious contaminants.
  • the composition may be concentrated, filtered, dialyzed, etc., using methods known in the art.
  • the cell culture after fermentation/cultivation can be centrifuged using a separator or a tube centrifuge to separate the cells from the culture supernatant, or the cells can be separated from the supernatant by filtration, e.g. depth or tangential flow filtration.
  • the supernatant can then be filtered and/or concentrated by using a tangential flow filtration.
  • cultured host cells may also be ruptured sonically or mechanically (e.g. high pressure homogenisation), enzymatically or chemically or by heat to obtain a cell extract containing the desired POI, from which the POI may be isolated and purified.
  • the step of “removing the peptide tag” may be effected by cleavage of the tag using a protease as described herein, such as, for example, a cp caspase-2.
  • Isolation and purification methods, also referred to as methods of recovery, for obtaining the POI, including “recovering” and/or “purifying” the POI, may be based on: methods utilizing differences in solubility, such as salting out, solvent precipitation and heat precipitation; methods utilizing differences in molecular weight, such as size exclusion chromatography, ultrafiltration and gel electrophoresis; methods utilizing differences in electric charge, such as ion-exchange chromatography; methods utilizing specific affinity, such as affinity chromatography; methods utilizing differences in hydrophobicity, such as hydrophobic interaction chromatography and reverse phase high performance liquid chromatography; methods utilizing differences in isoelectric point, such as isoelectric focusing; and methods utilizing certain amino acids, such as IMAC (immobilized metal ion affinity chromatography).
  • the solubilized inclusion bodies need to be refolded and may be purified.
  • a His-tag such as e.g. the 6 His-tag, which can be included in the peptide tag described herein
  • IMAC may be used for capturing and purification of the fusion protein. After cleavage of the fusion protein to release the POI, the POI can be separated by IMAC in the flow through mode from the cleaved tag, un-cleaved fusion protein and the protease, if the protease also comprised a His-tag.
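
The flow-through separation described above follows from a single rule: species carrying a His-tag are retained on the IMAC resin, while species without one pass through. A minimal Python sketch of that rule follows. It assumes idealized binding, complete release of the tag from the POI, and a POI that does not itself bind the resin; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Species:
    name: str
    has_his_tag: bool


def imac_flow_through(mixture):
    """Split a mixture into (flow_through, bound) by presence of a His-tag."""
    flow_through = [s.name for s in mixture if not s.has_his_tag]
    bound = [s.name for s in mixture if s.has_his_tag]
    return flow_through, bound


# Composition after protease cleavage, as listed in the text.
cleavage_mixture = [
    Species("released POI", has_his_tag=False),
    Species("cleaved tag", has_his_tag=True),
    Species("un-cleaved fusion protein", has_his_tag=True),
    Species("His-tagged protease", has_his_tag=True),
]

flow_through, bound = imac_flow_through(cleavage_mixture)
print("flow-through:", flow_through)
print("bound:", bound)
```

If the protease did not carry a His-tag, it would appear in the flow-through together with the POI, which is why the text makes the separation conditional on the protease also comprising a His-tag.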
  • the isolated and purified POI can be identified by conventional methods such as Western Blotting or specific assays for POI activity.
  • the structure of the purified POI can be determined by amino acid analysis, amino-terminal peptide sequencing, primary structure analysis, for example by mass spectrometry, RP-HPLC, ion exchange-HPLC, ELISA and the like. It is preferred that the POI is obtainable in large amounts and at a high purity level, thus meeting the necessary requirements for being used as an active ingredient in pharmaceutical compositions or as a feed or food additive.
  • a peptide tag for expression of a recombinant protein of interest (POI) in a host cell comprising a first peptide sequence consisting of an amino acid sequence selected from SEQ ID NO:1 or SEQ ID NO:2.
  • the peptide tag of item 1 comprising a second peptide sequence comprising an amino acid sequence selected from SEQ ID NO:5, SEQ ID NO:10 or SEQ ID NO:21, preferably wherein said second peptide sequence is C-terminal of the first peptide sequence.
  • the peptide tag of item 1 or 2 comprising at least two first peptide sequences, each consisting of an amino acid sequence selected from SEQ ID NO:1 or SEQ ID NO:2, preferably wherein said at least two first peptide sequences are directly linked.
  • the peptide tag of any one of items 1 to 5 further comprising a solubility and/or expression enhancement tag, a monitoring tag and/or an affinity tag.
  • the peptide tag of any one of items 1 to 6 further comprising a protease recognition and/or cleavage site, preferably at its C-terminus.
  • the host cell is a eukaryotic or prokaryotic host cell, preferably a yeast cell, a mammalian cell or a bacterial cell.
  • the peptide tag of item 8 wherein the host cell is a mammalian host cell, preferably a CHO cell.
  • the peptide tag of any one of items 1 to 8, comprising an amino acid sequence selected from the group consisting of SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:22, SEQ ID NO:23, SEQ ID NO:32, SEQ ID NO:35, SEQ ID NO:42 and SEQ ID NO:47, or a functionally active variant thereof comprising an amino acid sequence having at least 95% sequence identity to SEQ ID NO:10, SEQ ID NO:21, SEQ ID NO:85, SEQ ID NO:86, SEQ ID NO:87 or SEQ ID NO:88 and comprising SEQ ID NO:1 or SEQ ID NO:2, preferably at its N-terminus.
  • peptide tag of item 9 wherein the peptide tag comprises the amino acid sequence SEQ ID NO:11 and is encoded by a nucleotide sequence selected from the group consisting of SEQ ID NO: 13, SEQ ID NO: 14, SEQ ID NO: 15 and SEQ ID NO:16.
  • the peptide tag of item 9 wherein the peptide tag comprises the amino acid sequence SEQ ID NO:12 and is encoded by a nucleotide sequence selected from the group consisting of SEQ ID NO:17, SEQ ID NO:18, SEQ ID NO:19 and SEQ ID NO:20.
  • peptide tag of item 9 wherein the peptide tag comprises the amino acid sequence SEQ ID NO:23 and is encoded by a nucleotide sequence selected from the group consisting of SEQ ID NO:28, SEQ ID NO:29, SEQ ID NO:30 and SEQ ID NO:31.
  • peptide tag of item 9 wherein the peptide tag comprises the amino acid sequence SEQ ID NO:32 and is encoded by a nucleotide sequence selected from the group consisting of SEQ ID NO:33, SEQ ID NO:34, SEQ ID NO:38, and SEQ ID NO:39.
  • peptide tag of item 9 wherein the peptide tag comprises the amino acid sequence SEQ ID NO:35 and is encoded by a nucleotide sequence selected from the group consisting of SEQ ID NO:36, SEQ ID NO:37, SEQ ID NO:40, and SEQ ID NO:41.
  • peptide tag of item 9 wherein the peptide tag comprises the amino acid sequence SEQ ID NO:42 and is encoded by a nucleotide sequence selected from the group consisting of SEQ ID NO:43, SEQ ID NO:44, SEQ ID NO:45, and SEQ ID NO:46.
  • peptide tag of item 9 wherein the peptide tag comprises the amino acid sequence SEQ ID NO:47 and is encoded by a nucleotide sequence selected from the group consisting of SEQ ID NO:48, SEQ ID NO:49, SEQ ID NO:50, and SEQ ID NO:51.
  • POI protein of interest
  • sequence encoding the peptide tag is placed immediately after the start codon; and e. the sequence encoding the POI, preferably wherein said sequence encoding the POI is placed immediately after the sequence encoding the peptide tag.
  • An expression vector for expression of a recombinant protein of interest (POI) as a fusion protein comprising: a. a promoter; b. a ribosome binding site (RBS); c. a start codon; d. a nucleic acid sequence encoding the peptide tag of any one of items 1 to 18, preferably wherein said sequence encoding the peptide tag is placed immediately after the start codon; and e. a cloning site, preferably placed immediately downstream of the sequence encoding the peptide tag.
  • POI recombinant protein of interest
  • An expression cassette for expression of a recombinant protein of interest (POI) as a fusion protein comprising: a. a promoter; b. a ribosome binding site (RBS); c. a start codon; d. a nucleic acid sequence encoding the peptide tag of any one of items 1 to 18, preferably wherein said sequence encoding the peptide tag is placed immediately after the start codon; and e. a cloning site, preferably placed immediately downstream of the sequence encoding the peptide tag.
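
The element list a. to e. of the expression vector and expression cassette items can be read as an ordered layout. The sketch below checks a construct against that layout. It assumes the elements are arranged 5' to 3' in the order listed, and it treats the two "preferably ... immediately" placements as required, which is stricter than the items themselves; `follows_preferred_layout` and the element labels are hypothetical names.

```python
CASSETTE_ORDER = ["promoter", "RBS", "start codon", "peptide tag", "cloning site"]


def follows_preferred_layout(elements):
    """True if all five elements occur in the listed order, the tag directly
    follows the start codon, and the cloning site directly follows the tag."""
    try:
        idx = [elements.index(name) for name in CASSETTE_ORDER]
    except ValueError:
        # At least one required element is missing.
        return False
    in_order = idx == sorted(idx)
    tag_after_start = idx[3] == idx[2] + 1
    site_after_tag = idx[4] == idx[3] + 1
    return in_order and tag_after_start and site_after_tag


print(follows_preferred_layout(CASSETTE_ORDER + ["terminator"]))
print(follows_preferred_layout(
    ["promoter", "RBS", "start codon", "spacer", "peptide tag", "cloning site"]))
```

The second call returns `False` because a hypothetical spacer separates the start codon from the tag, so the preferred "immediately after" placement is not met.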
  • a fusion protein comprising a recombinant protein of interest (POI) and the peptide tag of any one of items 1 to 18, wherein the peptide tag of any one of items 1 to 18 is fused to the recombinant POI, preferably at the N-terminus of the recombinant protein.
  • a method of producing a recombinant protein of interest comprising expressing the POI in the form of a fusion protein comprising the peptide tag of any one of items 1 to 18 fused to one terminus of said POI, by a prokaryotic or eukaryotic expression system, preferably by a bacterial, yeast or mammalian cell expression system.
  • yeast expression system is an expression system using yeast cells of the genus Komagataella or Saccharomyces.
  • any one of items 30 to 33 wherein said fusion protein is expressed from an expression vector, preferably a plasmid, within a host cell or from an expression cassette stably integrated into the genome or chromosome of a host cell.
  • The method of any one of items 30 to 34, comprising culturing the host cell for a period of time under conditions permitting expression of the POI in the form of a fusion protein and comprising any one or more of the steps of: a. recovering the fusion protein and/or POI; b. purifying the fusion protein and/or POI; c. removing the peptide tag; d. further purifying the fusion protein and/or POI; e. formulating the fusion protein and/or POI; f. modifying the fusion protein and/or POI.
  • a peptide tag for expression of a recombinant protein of interest (POI) in a host cell comprising an amino acid sequence of SEQ ID NO:84, wherein a. X at position 2 is L or H; b. X at position 3 is V or S; c. X at position 16 is E or Q; and d. X at position 18 is E or Q.
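
The four variable positions of SEQ ID NO:84 each allow two residues, so they span 2 × 2 × 2 × 2 = 16 combinations. The sketch below checks only those four positions of a candidate sequence. The fixed residues of SEQ ID NO:84 are not reproduced in this section, so they are not checked, and the function names are hypothetical.

```python
from itertools import product

# 1-based positions of the variable residues X and their allowed amino acids.
ALLOWED = {2: "LH", 3: "VS", 16: "EQ", 18: "EQ"}


def satisfies_x_positions(seq: str) -> bool:
    """Check the four variable positions only; all other residues are ignored."""
    if len(seq) < max(ALLOWED):
        return False
    return all(seq[pos - 1] in residues for pos, residues in ALLOWED.items())


def count_variants() -> int:
    """Number of distinct combinations of the four variable positions."""
    return len(list(product(*ALLOWED.values())))


print(count_variants())
```

A full conformity check would additionally compare every fixed position against the sequence listing, which this sketch does not attempt.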
  • a host cell comprising the peptide tag of any one of items 1 to 18 and/or a nucleic acid sequence encoding said peptide tag.
  • a host cell comprising a fusion protein comprising the peptide tag of any one of items 1 to 18 and a protein of interest (POI) and/or a nucleic acid sequence encoding said fusion protein.
  • a host cell comprising the expression vector of any one of items 19 to 23 or the expression cassette of any one of items 24 to 29.
  • peptide tag of item 9 wherein the peptide tag comprises the amino acid sequence SEQ ID NO:111 and is encoded by a nucleotide sequence selected from the group consisting of SEQ ID NO:112, SEQ ID NO:144, SEQ ID NO:145 and SEQ ID NO:146.
  • peptide tag of item 9 wherein the peptide tag comprises the amino acid sequence SEQ ID NO:113 and is encoded by a nucleotide sequence selected from the group consisting of SEQ ID NO:114, SEQ ID NO:147, SEQ ID NO:148 and SEQ ID NO:149.
  • peptide tag of item 9 wherein the peptide tag comprises the amino acid sequence SEQ ID NO:150 and is encoded by a nucleotide sequence selected from the group consisting of SEQ ID NO:151, SEQ ID NO:152, SEQ ID NO:153, and SEQ ID NO:154.
  • peptide tag of item 9 wherein the peptide tag comprises the amino acid sequence SEQ ID NO:155 and is encoded by a nucleotide sequence selected from the group consisting of SEQ ID NO:156, SEQ ID NO:157, SEQ ID NO:158, and SEQ ID NO:159.
  • the expression vector of item 46 wherein the nucleic acid sequence encoding the peptide tag of any one of items 42 to 45 is selected from the group consisting of SEQ ID NO:112 and 144-146, SEQ ID NO:114 and 147-149, SEQ ID NO:151-154 and SEQ ID NO:156-159.
  • the expression cassette of item 48, wherein the nucleic acid sequence encoding the peptide tag of any one of items 42 to 45 is selected from the group consisting of SEQ ID NO:112 and 144-146, SEQ ID NO:114 and 147-149, SEQ ID NO:151-154 and SEQ ID NO:156-159.
  • Expression vectors were created using backbones from existing pET30a-cer plasmids from a previous study by Köppl et al. 2022.
  • Q5® High-Fidelity DNA Polymerase, BsaI-HF®v2, DpnI and T4 DNA Ligase were purchased from NEB. Construction of the vector plasmids followed a standard cloning protocol. In brief, site-directed mutagenesis was carried out utilizing primers carrying the desired mutations.
  • a polymerase chain reaction (PCR) was performed employing Q5® High-Fidelity DNA Polymerase to amplify the original plasmid backbone and simultaneously insert the desired mutations.
  • the linear DNA fragment was purified using the Monarch® PCR & DNA Cleanup Kit (NEB), followed by restriction digest with BsaI-HF®v2 and DpnI (37 °C, overnight) to create specific overhangs for the following ligation step and to cut the template plasmid used for PCR amplification.
  • preparative agarose gel 2% agarose
  • the correctly sized band was excised and dissolved using the Monarch® DNA Gel Extraction Kit (NEB).
  • Ligation to generate circular plasmid DNA was performed at 16 °C overnight using T4 DNA Ligase (NEB) and the ligation mix was directly transformed into chemically competent cells according to the manufacturer’s instructions.
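
Because the cloning protocol above relies on BsaI to create the ligation overhangs, it can be useful to know where BsaI recognition sites (5'-GGTCTC-3') occur in a linear PCR product. The following Python sketch scans both strands. It is an illustrative helper rather than part of the described protocol; the function names are hypothetical and the example sequence is invented.

```python
BSAI_SITE = "GGTCTC"  # BsaI recognition sequence; the enzyme cuts outside this site


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_bsai_sites(seq: str):
    """Return sorted (0-based index, strand) pairs for every BsaI site."""
    seq = seq.upper()
    hits = []
    for strand, motif in (("+", BSAI_SITE), ("-", reverse_complement(BSAI_SITE))):
        start = seq.find(motif)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(motif, start + 1)
    return sorted(hits)


# Invented example: one site on each strand, as a primer pair might introduce.
print(find_bsai_sites("AAGGTCTCATTTTGAGACCAA"))
```

A site reported on the "-" strand appears as GAGACC when read along the top strand. The indices refer to the first base of the motif as read on the top strand.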
  • Cell lysis and separation of the soluble and insoluble intracellular protein fractions was carried out according to Fink et al. (2019) [7]. Additionally, the cell lysis buffer contained 4 mmol L⁻¹ of NuPAGE Sample Reducing Agent (10x) (Invitrogen, Waltham, MA, USA).
  • the CASPON™ tag comprises SEQ ID NO:52 (here shown with a starting methionine), including a T7AC tag at its N-terminus.
  • two novel peptide sequences, named DBv1 (SEQ ID NO:1) and DBv2 (SEQ ID NO:2), were incorporated at the N-terminus of the T7AC tag at amino acid positions +2 to +4 (see Figure 1).
  • performance of the novel peptide tags was assessed. The resulting amino acid and nucleotide sequences of the tags are shown in Table 3.
  • T7AC-tag means the T7AC-tag encoded by SEQ ID NO:53; T7ACnew-tag means the T7AC-tag encoded by SEQ ID NO:54.
  • the free folding energy at the mRNA nucleotide positions -4 to +37 of the tag sequences was calculated using the method developed by Zuker et al. 1980, resulting in e.g. a ΔGmin of -7.50 kcal/mol for T7AC and -1.50 kcal/mol for T7ACnew (see Fig. 2).
  • the different tag sequences were then introduced into the CASPON™ tag, fused to two different pharmaceutically relevant model proteins, parathyroid hormone (PTH) and human fibroblast growth factor 2 (hFGF2), and tested in a series of carbon-limited fed-batch fermentations.
  • Recombinant protein formation was higher with the DBv1-T7AC-tag and DBv2-T7AC-tag compared to the T7AC-tag for every tested model protein. The same has been observed with the DBv1-T7ACnew and DBv2-T7ACnew variants compared to the T7ACnew variant (see Table 5).
  • the comparative increase in recombinant soluble protein titre at the end of fermentation is dependent on the fusion tag variant used. With PTH, increases in specific titre of 1.26-fold and 3.14-fold have been observed for DBv1-T7AC and DBv2-T7AC, respectively, compared to the T7AC-tag.
  • the DB-T7ACnew variants increased expression 2.39-fold and 2.01-fold with constructs DBv1-T7ACnew and DBv2-T7ACnew, respectively, compared to T7ACnew.
  • hFGF2 showed an increase in titre of 1.82-fold and 1.75-fold comparing DBv2-T7AC with T7AC and DBv2-T7ACnew with T7ACnew, respectively.
  • Measured specific and volumetric recombinant protein titres are summarized in Table 5 above.
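
The fold changes reported above are ratios of the titre obtained with a DB variant to the titre obtained with the corresponding reference tag, so an x-fold change corresponds to a percent increase of (x − 1) × 100. The sketch below applies that conversion to the reported values. The dictionary keys are labels chosen here for readability, and the underlying titres from Table 5 are not reproduced.

```python
def fold_to_percent_increase(fold: float) -> float:
    """Convert an x-fold titre ratio (variant / reference) to a percent increase."""
    return (fold - 1.0) * 100.0


# Fold changes as reported in the text (variant vs. its reference tag).
reported_fold_changes = {
    "PTH, DBv1-T7AC vs T7AC": 1.26,
    "PTH, DBv2-T7AC vs T7AC": 3.14,
    "DBv1-T7ACnew vs T7ACnew": 2.39,
    "DBv2-T7ACnew vs T7ACnew": 2.01,
    "hFGF2, DBv2-T7AC vs T7AC": 1.82,
    "hFGF2, DBv2-T7ACnew vs T7ACnew": 1.75,
}

for label, fold in reported_fold_changes.items():
    print(f"{label}: {fold:.2f}-fold = +{fold_to_percent_increase(fold):.0f} %")
```

For example, the 3.14-fold value for DBv2-T7AC with PTH corresponds to a titre increase of about 214 % over the T7AC reference.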
  • ΔG is the free folding energy. Lower ΔG values (lower free folding energy) mean more tightly folded mRNA, whereas higher ΔG values mean less folding. At a value of 0 kcal/mol, there is no folding present on the mRNA. Previously, it was described that a high free folding energy (ΔGmin) would aid in protein production and lead to higher titers, because of better accessibility of the mRNA for the ribosome. Tuller et al., for example, describe that there is a global selection for non-folding structures at the beginning of E. coli and S.
  • Example 3 Novel peptide tags enhance production of hFGF-2, PTH and
  • the POI production using peptide tags comprising the newly identified peptide sequences DBv1 (SEQ ID NO:1) or DBv2 (SEQ ID NO:2) of the present invention is compared to the POI production using the T7AC-tag (SEQ ID NO:79) or the LED-PERNKERKE-tag (SEQ ID NO:109) in high cell density fermentations.
  • the T7AC-tag and the N-terminal part of the T7AC-tag, LED-PERNKERKE, comprise the amino acids LED (SEQ ID NO:139) at their N-termini.
  • the influence of the peptides DBv1 (PLV; SEQ ID NO:1) or DBv2 (PHS; SEQ ID NO:2) of the present invention on the titer of the POIs is tested by substituting LED (SEQ ID NO:139) with the peptides DBv1 (PLV; SEQ ID NO:1) or DBv2 (PHS; SEQ ID NO:2) in the T7AC- and the LED-PERNKERKE-tags (SEQ ID NO:79 and SEQ ID NO:109, respectively), resulting in the tags listed in Table 6 (amino acid sequences of the tags used for Example 3 and the nucleic acids encoding them).
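
The substitution described above swaps the N-terminal tripeptide LED for PLV (DBv1) or PHS (DBv2). The sketch below shows that operation on a string. It rests on one assumption that should be checked against the sequence listing: the tag name "LED-PERNKERKE" is read literally as the residue string `LEDPERNKERKE`, whereas the authoritative sequence is SEQ ID NO:109. The function name is hypothetical.

```python
def replace_n_terminal_tripeptide(tag: str, old: str, new: str) -> str:
    """Swap the N-terminal tripeptide `old` of `tag` for `new`."""
    if len(old) != 3 or len(new) != 3:
        raise ValueError("expected tripeptides")
    if not tag.startswith(old):
        raise ValueError(f"tag does not start with {old}")
    return new + tag[len(old):]


DB_PEPTIDES = {"DBv1": "PLV", "DBv2": "PHS"}

# Assumption: tag name read as a literal sequence; see SEQ ID NO:109 for the real one.
example_tag = "LEDPERNKERKE"

variants = {
    name: replace_n_terminal_tripeptide(example_tag, "LED", peptide)
    for name, peptide in DB_PEPTIDES.items()
}
print(variants)
```

Any starting methionine is omitted in this sketch; with a leading methionine the tripeptide would occupy amino acid positions +2 to +4.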
  • Table 8 Amino acid and nucleic acid sequences of the tested POI fusion proteins. See Fig. 9 for the respective sequences.
  • the nucleic acids encoding the POI fusion proteins were synthesized and cloned into the respective expression vectors (pET30a-cer derivative). The correct sequences of all expression constructs were confirmed by sequencing.
  • the expression vectors were transformed into E. coli BL21 (DE3). A research cell bank was produced. The expression of the POI-fusions was under the control of the T7 promoter system and the tZenit terminator (Mairhofer 2014).
  • All production strains were cultivated in a 1 L Multifors bioreactor with a working volume of 0.7 L and a starting batch volume of 0.4 L.
  • the preculture was inoculated with research cell bank and performed in shake flask containing preculture medium at 37 °C.
  • the batch medium in the 1 L bioreactor was inoculated with preculture in a ratio of 1:100 when the OD550 in the preculture reached a value of 2.
  • Temperature, pH (by addition of an ammonia solution), and dissolved oxygen (pO2; by stirrer speed and by increasing the concentration of O2 in the inlet air) in the bioreactor were controlled.
  • the pO2 was kept constant at 20 %, the aeration at 1 vvm and pressure at 1 bar.
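
The set-up values above imply a few derived quantities. The sketch below works them out under stated assumptions: the 1:100 inoculation ratio is taken as a volume ratio relative to the batch medium, the difference between working and batch volume is attributed entirely to feed, and 1 vvm is read as one volume of gas per volume of liquid per minute. The function names are hypothetical.

```python
WORKING_VOLUME_L = 0.7   # working volume stated in the text
BATCH_VOLUME_L = 0.4     # starting batch volume stated in the text
INOCULUM_RATIO = 1 / 100  # preculture : batch medium (assumed volume ratio)
AERATION_VVM = 1.0       # volumes of gas per volume of liquid per minute


def feed_volume_l(working_l: float, batch_l: float) -> float:
    """Volume available for feed, ignoring base addition and sampling."""
    return working_l - batch_l


def inoculum_ml(batch_l: float, ratio: float) -> float:
    """Preculture volume in mL for a given batch volume and inoculation ratio."""
    return batch_l * ratio * 1000.0


def gas_flow_l_per_min(liquid_l: float, vvm: float) -> float:
    """Gas flow corresponding to `vvm` at the given liquid volume."""
    return liquid_l * vvm


print(feed_volume_l(WORKING_VOLUME_L, BATCH_VOLUME_L))
print(inoculum_ml(BATCH_VOLUME_L, INOCULUM_RATIO))
print(gas_flow_l_per_min(BATCH_VOLUME_L, AERATION_VVM))
```

Under these assumptions, about 0.3 L is available for feed, the inoculum is about 4 mL, and 1 vvm corresponds to roughly 0.4 L/min at the start of the batch. Whether the gas flow was adjusted as the volume increased is not stated in the text.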
  • the WCW [g/L] (wet cell weight) was measured by weighing the pellet of 1 mL of culture broth after centrifugation.
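
Since the pellet comes from 1 mL of broth, the wet cell weight in g/L is the pellet mass in grams divided by the sample volume in litres. A minimal sketch of that conversion follows; the function name and the example pellet mass are hypothetical.

```python
def wet_cell_weight_g_per_l(pellet_mass_g: float, sample_volume_ml: float = 1.0) -> float:
    """Wet cell weight (g/L) from the pellet mass of a centrifuged broth sample."""
    if sample_volume_ml <= 0:
        raise ValueError("sample volume must be positive")
    return pellet_mass_g / (sample_volume_ml / 1000.0)


# Hypothetical pellet of 0.15 g from a 1 mL sample.
print(wet_cell_weight_g_per_l(0.15))
```

With a 1 mL sample, the numerical value in g/L is simply the pellet mass in milligrams.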
  • the pellets from the fermentation broth samples were resuspended in BugBuster™ from Novagen (including 1 µL Lysonase™ from Novagen) by vortexing until the pellet was completely resuspended, and were centrifuged after incubation for 15 minutes. The supernatant was used for the determination of the titer of the intracellular soluble POI fusion protein.
  • the titer was determined using SDS-PAGE.
  • scFvM single chain antibody

Abstract

The invention relates to a peptide tag for the expression of a recombinant protein of interest (POI) in a host cell, comprising a first peptide sequence consisting of an amino acid sequence selected from SEQ ID NO:1 or SEQ ID NO:2.
PCT/EP2025/052302 2024-02-01 2025-01-30 Tags for improved expression of recombinant proteins Pending WO2025163019A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP24155373 2024-02-01
EP24155373.4 2024-02-01

Publications (1)

Publication Number Publication Date
WO2025163019A1 (fr) 2025-08-07

Family

ID=89834444

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2025/052302 Pending WO2025163019A1 (fr) 2024-02-01 2025-01-30 Tags for improved expression of recombinant proteins

Country Status (1)

Country Link
WO (1) WO2025163019A1 (fr)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US908A (en) 1838-09-08 Spring-draft and bumper for railroad-cars
US8535A (en) 1851-11-18 Stove-geate
WO2006089329A2 (fr) 2005-02-23 2006-08-31 Technische Universität Graz Mutant AOX1 promoters
US20120264161A1 (en) * 2007-03-26 2012-10-18 Dako Denmark A/S Mhc peptide complexes and uses thereof in infectious diseases
US20200048634A1 (en) * 2018-08-09 2020-02-13 Washington University Methods to modulate protein translation efficiency
WO2021028590A1 (fr) * 2019-08-14 2021-02-18 Boehringer Ingelheim Rcv Gmbh & Co Kg Caspase-2 variants
WO2023008415A1 (fr) * 2021-07-27 2023-02-02 STAND Therapeutics株式会社 Peptide tag and nucleic acid encoding same

Non-Patent Citations (20)

* Cited by examiner, † Cited by third party
Title
BAESHEN MN, AL-HEJIN AM, BORA RS, AHMED MM, RAMADAN HA, SAINI KS, BAESHEN NA, REDWAN EM: "Production of Biopharmaceuticals in E. coli: Current Scenario and Future Perspectives.", J MICROBIOL BIOTECHNOL, vol. 25, 2015, pages 953 - 962
BOER ET AL., APPL MICROBIOL BIOTECHNOL, vol. 77, 2007, pages 513 - 523
CSERJAN-PUSCHMANN, M. ET AL.: "Production of Circularly Permuted Caspase-2 for Affinity Fusion-Tag Removal: Cloning, Expression in Escherichia coli, Purification, and Characterization", BIOMOLECULES, vol. 12, 2020, pages 10
FINK, M. ET AL.: "Integrated process development: The key to improve Fab production in E. coli.", BIOTECHNOLOGY JOURNAL, vol. 16, no. 6, 2021, pages 2000562
FINK, M. ET AL.: "Microbioreactor Cultivations of Fab-Producing Escherichia coli Reveal Genome-Integrated Systems as Suitable for Prospective Studies on Direct Fab Expression Effects.", BIOTECHNOLOGY JOURNAL, vol. 14, no. 11, 2019, pages 1800637
FINK, M.: "High-throughput microbioreactor provides a capable tool for early stage bioprocess development.", SCIENTIFIC REPORTS, vol. 11, no. 1, 2021, pages 2056
JANEWAY ET AL.: "Immunobiology", 2001, GARLAND SCIENCE
KEOWN ET AL., PROCESSES IN ENZYMOLOGY, vol. 185, 1990, pages 527 - 537
KÖPPL, C. ET AL.: "Fusion Tag Design Influences Soluble Recombinant Protein Production in Escherichia coli.", INT J MOL SCI, vol. 23, no. 14, 2022, pages 7678
KÖPPL CHRISTOPH ET AL: "Fusion Tag Design Influences Soluble Recombinant Protein Production in Escherichia coli", INTERNATIONAL JOURNAL OF MOLECULAR SCIENCES, vol. 23, no. 14, 12 July 2022 (2022-07-12), Basel, CH, pages 7678, XP093191740, ISSN: 1661-6596, DOI: 10.3390/ijms23147678 *
LINGG N, KRÖSS C, ENGELE P, OHLKNECHT C, KÖPPL C, FISCHER A, LIER B, LOIBL J, SPRENGER B, LIU J: "CASPON platform technology: Ultrafast circularly permuted caspase-2 cleaves tagged fusion proteins before all 20 natural amino acids at the N-terminus.", NEW BIOTECHNOLOGY, vol. 71, 2022, pages 37 - 46
MAIRHOFER JUERGEN ET AL.: "Preventing T7 RNA Polymerase Read-through Transcription - A Synthetic Termination Signal Capable of Improving Bioprocess Stability.", ACS SYNTHETIC BIOLOGY., vol. 4, no. 3, 2014, pages 256 - 273
MCELWAIN L, PHAIR K, KEALEY C, BRADY D.: "Current trends in biopharmaceuticals production in Escherichia coli.", BIOTECHNOL LETT., vol. 44, no. 8, 2022, pages 917 - 931, XP037927988, DOI: 10.1007/s10529-022-03276-5
MOLL I, HUBER M, GRILL S, SAIRAFI P, MUELLER F, BRIMACOMBE R, LONDEI P, BLASI U.: "Evidence against an Interaction between the mRNA downstream box and 16S rRNA in translation initiation.", J BACTERIOL., vol. 183, no. 11, 2001, pages 3499 - 505
O'CONNOR M, ASAI T, SQUIRES CL, DAHLBERG AE: "Enhancement of translation by the downstream box does not involve base pairing of mRNA with the penultimate stem sequence of 16S rRNA.", PROC NATL ACAD SCI U S A., vol. 96, no. 16, 1999, pages 8973 - 8
OOSTENBRINK C, CSERJAN-PUSCHMANN M ET AL.: "PROFICS: A bacterial selection system for directed evolution of proteases.", JOURNAL OF BIOLOGICAL CHEMISTRY, vol. 297, 2021, pages 101095
RICHTER LV, YANG H, YAZDANI M, HANSON MR, AHNER BA: "A downstream box fusion allows stable accumulation of a bacterial cellulase in Chlamydomonas reinhardtii chloroplasts.", BIOTECHNOL BIOFUELS., vol. 11, 2018, pages 133
SPRENGART, M.L., E. FUCHS, A.G. PORTER: "The downstream box: an efficient and independent translation initiation signal in Escherichia coli.", EMBO J, vol. 15, no. 3, 1996, pages 665 - 74
STARGARDT, P. ET AL.: "Bacteriophage Inspired Growth-Decoupled Recombinant Protein Production in Escherichia coli.", ACS SYNTHETIC BIOLOGY, vol. 9, no. 6, 2020, pages 1336 - 1348, XP055718823, DOI: 10.1021/acssynbio.0c00028
TULLER T. ET AL.: "Translation Efficiency is Determined by Both Codon Bias and Folding Energy.", PNAS, vol. 107, no. 8, 2010, pages 3645 - 3650, XP007918224, DOI: 10.1073/pnas.0909910107

Similar Documents

Publication Publication Date Title
AU2019294515B2 (en) Means and methods for increased protein expression by use of transcription factors
JP7621001B2 (ja) Aminoacyl-tRNA synthetase for efficiently introducing lysine derivatives
JP7246100B2 (ja) Preparation of a novel fusion protein and its use in improving protein synthesis
AU2020242724B2 (en) Aminoacyl-tRNA synthetase for efficiently introducing lysine derivative in protein
CN113388633B (zh) Preparation of human basic fibroblast growth factor using Bacillus subtilis and an endonuclease
CN110408636B (zh) DNA sequence with multiple tags in tandem and its application in protein expression and purification systems
CN113481226B (zh) Signal peptide-related sequences and their application in protein synthesis
CN110408635B (zh) Application of a nucleic acid construct containing a streptavidin element in protein expression and purification
CN110551745A (zh) A multiple-histidine sequence tag and its application in protein expression and purification
CN113631712A (zh) Introduction of unnatural amino acids into proteins using a dual-plasmid system
CN111850020B (zh) Introduction of unnatural amino acids into proteins using a plasmid system
CN104387473B (zh) Elastin-like polypeptide ELP for prokaryotic expression of the fusion protein Prx by a non-enzymatic-cleavage, non-chromatographic purification method
JP7266325B2 (ja) Fusion protein comprising fluorescent protein fragments and use thereof
WO2025163019A1 (fr) Tags for improved expression of recombinant proteins
WO2025104126A1 (fr) Tag for improved expression of recombinant proteins
CN103382496B (zh) A method for preparing S-adenosylmethionine
CN109136209B (zh) Enterokinase light chain mutant and its application
KR20130141001A (ko) Novel vector system for isolation and purification of a target protein
US20090239262A1 (en) Affinity Polypeptide for Purification of Recombinant Proteins
EP4067492A1 (fr) Polypeptide tag and its application in in vitro protein synthesis
RU2799794C2 (ru) Aminoacyl-tRNA synthetase for efficient introduction of a lysine derivative into a protein
CN114703168B (zh) A heparinase III
RU2790662C1 (ru) Aminoacyl-tRNA synthetase efficiently introducing lysine derivatives
EP4530291A1 (fr) Platform for expressing a protein of interest using a viral nucleocapsid
WO2025066687A1 (fr) Expression and purification method for recombinant human interleukin-15

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 25702549

Country of ref document: EP

Kind code of ref document: A1