WO2025058109A1 - Recombinant amino acid molecule, host cells for producing l-arginine, and methods for producing l-arginine using the same - Google Patents

Info

Publication number
WO2025058109A1
Authority
WO
WIPO (PCT)
Prior art keywords
seq
sequence
amino acid
acid molecule
host cell
Legal status
Pending
Application number
PCT/KR2023/013895
Other languages
French (fr)
Inventor
Ye-Eun Kim
Zeewon LEE
Hanhyoung LEE
Ju Eun Kim
Current Assignee
CJ CheilJedang Corp
Original Assignee
CJ CheilJedang Corp
Application filed by CJ CheilJedang Corp
Priority to PCT/KR2023/013895
Publication of WO2025058109A1

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00: Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09: Recombinant DNA-technology
    • C12N15/63: Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/74: Vectors or expression systems specially adapted for prokaryotic hosts other than E. coli, e.g. Lactobacillus, Micromonospora
    • C12N15/77: Vectors or expression systems specially adapted for prokaryotic hosts other than E. coli, e.g. Lactobacillus, Micromonospora for Corynebacterium; for Brevibacterium
    • C12N9/00: Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/93: Ligases (6)
    • C12P: FERMENTATION OR ENZYME-USING PROCESSES TO SYNTHESISE A DESIRED CHEMICAL COMPOUND OR COMPOSITION OR TO SEPARATE OPTICAL ISOMERS FROM A RACEMIC MIXTURE
    • C12P13/00: Preparation of nitrogen-containing organic compounds
    • C12P13/04: Alpha- or beta- amino acids
    • C12P13/10: Citrulline; Arginine; Ornithine
    • C12Y: ENZYMES
    • C12Y603/00: Ligases forming carbon-nitrogen bonds (6.3)
    • C12Y603/04: Other carbon-nitrogen ligases (6.3.4)
    • C12Y603/04005: Argininosuccinate synthase (6.3.4.5)

Definitions

  • The present disclosure relates to improved amino acid molecules, host cells expressing the amino acid molecules and their use to produce L-arginine, nucleic acid molecules having sequences encoding the amino acid molecule, vectors comprising such nucleic acid molecules, and methods of producing L-arginine and of increasing L-arginine production by a host cell expressing the amino acid molecule.
  • L-Arginine is an amino acid with extensive medical, food, animal-feed, and industrial applications. L-Arginine is produced by the body and acts, for example, as a vasodilator. L-Arginine may also be used as a feed additive. Various studies are currently being conducted to develop host cells and fermentation process technology that produce L-arginine with high efficiency.
  • One object of the present disclosure is to provide an amino acid molecule, for example, the amino acid molecule protein disclosed herein.
  • Another object of the present disclosure is to provide a host cell or microorganism expressing the amino acid molecule disclosed herein.
  • Another object of the present disclosure is to provide a nucleic acid molecule comprising a nucleic acid sequence encoding the amino acid molecule disclosed herein.
  • Another object of the present disclosure is to provide a vector comprising a nucleic acid molecule disclosed herein.
  • Another object of the present disclosure is to provide an engineered host cell expressing a recombinant amino acid molecule, including a recombinant protein that is an argininosuccinate synthase.
  • Another object of the present disclosure is to provide use of the engineered host cell or microorganism disclosed herein for producing L-arginine.
  • Another object of the present disclosure is to provide a method for producing L-arginine as disclosed herein.
  • Figure 1 depicts a protein sequence alignment of exemplary argininosuccinate synthase proteins from E. coli and Corynebacterium glutamicum, used to identify amino acid residues essential for substrate binding by argininosuccinate synthase.
  • Figures 2a to 2c depict a sequence alignment of exemplary argininosuccinate synthase proteins from different heterogeneous model microorganism species, used to identify the non-conserved structures of the substrate-binding site.
  • Figure 3 depicts an exemplary tertiary structure of an argininosuccinate synthase protein without the amino acid extension (derived from C. glutamicum) overlaid with a tertiary structure of an argininosuccinate synthase protein with the amino acid extension (derived from E. coli), to compare the structures and to predict the role of the amino acid extension in protein function.
  • Figures 4a to 4c list exemplary motifs identified among the argininosuccinate synthase proteins of the heterogeneous model microorganisms for which sequence alignment was carried out, along with generalized versions of the motif sequences.
  • Figures 5a to 5f depict the exemplary DNA sequences for the argG gene encoding the argininosuccinate synthase proteins of different heterogeneous model microorganism species.
  • Figure 6 depicts the exemplary DNA sequences for the argR deletion mutation and the argB (M54V) mutation of the argG gene encoding argininosuccinate synthase protein.
  • Unless otherwise indicated, the amino acid molecules (including argininosuccinate synthase proteins), host cells, nucleic acid molecules, vectors, and methods described herein may employ conventional techniques and descriptions of molecular biology (including recombinant techniques), cell biology, biochemistry, and microarray and sequencing technology, which are within the skill of practitioners in the art.
  • Such conventional techniques include polymerase chain reaction (PCR), protein sequence alignment, and sequencing of proteins and oligonucleotides.
  • The present disclosure relates to an improved amino acid molecule including, for example, an improved argininosuccinate synthase protein, the enzyme responsible for the rate-limiting reaction step in the production of L-arginine.
  • Argininosuccinate synthase catalyzes synthesis of argininosuccinate from substrates citrulline and aspartate. Argininosuccinate is then cleaved by the action of argininosuccinate lyase into fumaric acid and L-arginine.
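The two reaction steps described above can be checked for elemental balance with a short Python sketch. It assumes neutral (un-ionized) molecular formulas for citrulline (C6H13N3O3), aspartic acid (C4H7NO4), argininosuccinic acid (C10H18N4O6), L-arginine (C6H14N4O2), and fumaric acid (C4H4O4), and it deliberately leaves out the ATP to AMP + PPi cofactor chemistry of the synthase step, so that step is written as a net condensation releasing one water equivalent. The helper names are illustrative only.

```python
import re
from collections import Counter

def parse(formula: str) -> Counter:
    """Parse a simple formula such as 'C6H13N3O3' into element counts."""
    counts = Counter()
    for element, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(n) if n else 1
    return counts

def total(*formulas: str) -> Counter:
    """Sum element counts over one side of a reaction."""
    result = Counter()
    for f in formulas:
        result.update(parse(f))
    return result

# Neutral molecular formulas
CITRULLINE = "C6H13N3O3"
ASPARTIC_ACID = "C4H7NO4"
ARGININOSUCCINIC_ACID = "C10H18N4O6"
ARGININE = "C6H14N4O2"
FUMARIC_ACID = "C4H4O4"
WATER = "H2O"

# Synthase step as a net condensation (ATP -> AMP + PPi omitted by assumption)
synthase_balanced = total(CITRULLINE, ASPARTIC_ACID) == total(ARGININOSUCCINIC_ACID, WATER)
# Lyase step: argininosuccinate -> L-arginine + fumarate
lyase_balanced = total(ARGININOSUCCINIC_ACID) == total(ARGININE, FUMARIC_ACID)
print(synthase_balanced, lyase_balanced)  # True True
```

The lyase step balances exactly as written, which reflects that argininosuccinate lyase performs a simple elimination with no cofactor.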
  • L-Arginine is an amino acid necessary for the production of nitric oxide, which regulates vasodilation, vascular tone, and blood flow.
  • L-Arginine is recognized for its effects as a vasodilator and has been studied as an option for treating high blood pressure, angina, and erectile dysfunction.
  • The present disclosure further relates to recombinant and non-naturally occurring amino acid molecules, including argininosuccinate synthase proteins; engineered host cells capable of expressing the amino acid molecules and producing L-arginine; recombinant nucleic acid molecules and vectors for expressing the amino acid molecules; and uses and methods employing these proteins, host cells, vectors, and nucleic acid molecules to produce L-arginine or to increase its production.
  • A reference to "a nucleic acid" or "a protein" refers to one, more than one, or mixtures of such regions.
  • A reference to "an assay" may include equivalent steps and methods known to those skilled in the art, and so forth.
  • The term "heterologous", when used with reference to portions of a nucleic acid or protein, indicates that the nucleic acid or protein comprises two or more subsequences that are not found in the same relationship to each other in nature.
  • Such a nucleic acid is typically recombinantly produced, having two or more sequences from unrelated genes arranged to make a new functional nucleic acid, e.g., a promoter from one source and a coding region from another source, or coding regions from different sources.
  • Similarly, a heterologous protein comprises two or more subsequences that are not found in the same relationship to each other in nature.
  • The term "conservative amino acid substitutions" means amino acid sequence modifications that do not abrogate the function or a binding site of an enzyme.
  • Conservative amino acid substitutions include the substitution of an amino acid in one class by an amino acid of the same class, where a class is defined by common physicochemical amino acid side chain properties and high substitution frequencies in homologous proteins found in nature, as determined, for example, by a standard Dayhoff frequency exchange matrix or BLOSUM matrix.
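As an illustration of the matrix-based criterion above, the sketch below treats a substitution as conservative when its BLOSUM62 score is positive. That threshold is a common heuristic assumed here, not a definition taken from this document, and the dictionary holds only a small hand-copied excerpt of BLOSUM62 limited to residue pairs that occur as alternatives at variable positions of the motifs in this disclosure.

```python
# Small excerpt of BLOSUM62; scores are symmetric, so pairs are stored as frozensets.
BLOSUM62_EXCERPT = {
    frozenset("IV"): 3, frozenset("IL"): 2, frozenset("LV"): 1,
    frozenset("KR"): 2, frozenset("DE"): 2, frozenset("ST"): 1,
    frozenset("FY"): 3, frozenset("LM"): 2, frozenset("CP"): -3,
}

def is_conservative(a: str, b: str) -> bool:
    """Return True if a -> b is conservative under the positive-score heuristic."""
    if a == b:
        return True  # an unchanged residue is trivially conservative
    pair = frozenset((a, b))
    if pair not in BLOSUM62_EXCERPT:
        raise KeyError(f"{a}/{b} is not in this excerpt; consult the full matrix")
    return BLOSUM62_EXCERPT[pair] > 0

for a, b in [("I", "V"), ("K", "R"), ("C", "P")]:
    print(a, b, is_conservative(a, b))
```

Under this heuristic, the I/V alternative at X3 of SEQ ID NO: 1 (score +3) counts as conservative, whereas the C/P alternative at X14 of SEQ ID NO: 7 (score -3) does not.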
  • The term "variant" encompasses, but is not limited to, enzymes comprising an amino acid sequence that differs from the amino acid sequence of a reference protein by one or more substitutions, deletions, and/or additions at certain positions within or adjacent to the amino acid sequence of the reference protein.
  • The variant may comprise one or more conservative substitutions in its amino acid sequence as compared to the amino acid sequence of the reference protein. Conservative substitutions may involve, e.g., the substitution of similarly charged or uncharged amino acids.
  • The variant retains, for example, the ability to specifically bind a substrate of the reference protein.
  • The term "variant" also includes PEGylated proteins.
  • Nucleic acid sequences implicitly encompass conservatively modified variants thereof (e.g., degenerate codon substitutions) and complementary sequences, as well as the sequence explicitly indicated. Specifically, degenerate codon substitutions may be achieved by generating sequences in which the third position of one or more selected (or all) codons is substituted with mixed-base and/or deoxyinosine residues. Batzer, et al., Nucleic Acids Res. 1991, 19, 5081; Ohtsuka, et al., J. Biol. Chem. 1985, 260, 2605-2608; Rossolini, et al., Mol. Cell. Probes 1994, 8, 91-98. The term nucleic acid is used interchangeably with cDNA, mRNA, oligonucleotide, and polynucleotide.
  • Homology refers to the degree of similarity between two given amino acid sequences or nucleotide sequences and may be expressed as a percentage. The terms homology and identity are often used interchangeably.
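A minimal sketch of expressing identity as a percentage follows. It assumes the two sequences are already aligned to equal length with "-" marking gaps, and it divides by the number of aligned columns (excluding gap-gap columns); that denominator is one convention among several, and dividing by, for example, the length of the shorter sequence gives different values.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned sequences ('-' marks a gap)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not columns:
        raise ValueError("alignment has no informative columns")
    identical = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * identical / len(columns)

# SEQ ID NO: 2 vs SEQ ID NO: 3 differ only at the X1 and X2 positions (11 of 13 identical)
print(round(percent_identity("YKPWLDSAFIDEL", "YKPWLDQTFIDEL"), 2))  # 84.62
```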
  • The terms "correspond(s) to" and "corresponding to", as they relate to sequence alignment, refer to enumerated positions within the reference protein and to the positions in the sequence of interest that align with those positions of the reference protein.
  • The amino acids in the subject sequence that "correspond to" certain enumerated positions of the reference sequence are those that align with these positions of the reference sequence, but they are not necessarily at the same numerical positions as in the reference sequence.
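The point that a corresponding residue need not carry the same number as its reference position can be sketched by walking a pairwise alignment column by column and recording, for each 1-based reference position, the aligned position in the subject. The toy alignment below is hypothetical, and the alignment itself is assumed to come from a separate alignment tool.

```python
def corresponding_positions(ref_aln: str, sub_aln: str) -> dict:
    """Map 1-based reference positions to 1-based subject positions.

    Both strings are rows of one pairwise alignment ('-' = gap). A reference
    residue aligned to a gap in the subject maps to None.
    """
    if len(ref_aln) != len(sub_aln):
        raise ValueError("alignment rows must have equal length")
    mapping = {}
    ref_pos = sub_pos = 0
    for r, s in zip(ref_aln, sub_aln):
        if s != "-":
            sub_pos += 1
        if r != "-":
            ref_pos += 1
            mapping[ref_pos] = sub_pos if s != "-" else None
    return mapping

# Hypothetical toy alignment: the subject lacks the reference K and carries an extra A
print(corresponding_positions("MKV-LDE", "M-VALDE"))
# {1: 1, 2: None, 3: 2, 4: 4, 5: 5, 6: 6}
```

Here reference position 3 corresponds to subject position 2, so the two residues align even though their numbering differs.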
  • The term "amino acid molecule" generally refers to a chain of amino acids held together by peptide bonds (also called amide bonds).
  • The term "amino acid" refers to an organic molecule that contains both an amino group (i.e., -NH2) and a carboxylic acid group (i.e., -COOH).
  • The term "nucleotide molecule" refers to a DNA or RNA strand of at least a certain length, i.e., a polymer in which nucleotide monomers are covalently linked in a long chain; more specifically, a polynucleotide fragment encoding the protein.
  • The term "enhancement" of polypeptide activity means that the activity of a polypeptide is increased compared to its intrinsic activity.
  • The term "operably linked", in the context of a promoter sequence that initiates and mediates transcription of the polynucleotide encoding the target polypeptide of the present application, means that the promoter sequence and the polynucleotide sequence are functionally linked to each other.
  • The present disclosure provides an amino acid molecule such as, for example, an amino acid molecule protein.
  • The amino acid molecule may comprise a peptide sequence of SEQ ID NO: 1, YKPWLDX₁X₂FX₃X₄EL, in which each of X₁, X₂, and X₄ is any amino acid and X₃ is I or V.
  • In individual embodiments, X₁ of SEQ ID NO: 1 is S, Q, or T.
  • In individual embodiments, X₂ of SEQ ID NO: 1 is D, A, T, or Q.
  • In individual embodiments, X₃ of SEQ ID NO: 1 is V or I.
  • In some embodiments, X₄ of SEQ ID NO: 1 is D.
  • In some embodiments, SEQ ID NO: 1 consists of a sequence selected from the group consisting of YKPWLDSAFIDEL (SEQ ID NO: 2), YKPWLDQTFIDEL (SEQ ID NO: 3), YKPWLDQQFIDEL (SEQ ID NO: 4), and YKPWLDTDFIDEL (SEQ ID NO: 5).
  • In individual embodiments, SEQ ID NO: 1 consists of the sequence of SEQ ID NO: 2, SEQ ID NO: 3, SEQ ID NO: 4, or SEQ ID NO: 5.
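One way to test whether a peptide fits the generalized motif of SEQ ID NO: 1 is to translate each motif position into a regular-expression character class. The list-of-positions encoding and the helper names below are assumptions for illustration, and "any amino acid" is taken to mean the 20 standard residues.

```python
import re

ANY = "ACDEFGHIKLMNPQRSTVWY"  # assumption: "an amino acid" = 20 standard residues

# SEQ ID NO: 1, YKPWLD X1 X2 F X3 X4 EL, one entry per position
SEQ_ID_1 = ["Y", "K", "P", "W", "L", "D", ANY, ANY, "F", "IV", ANY, "E", "L"]

def motif_to_regex(motif):
    """Compile a motif into a regex; multi-residue positions become character classes."""
    return re.compile("".join(p if len(p) == 1 else f"[{p}]" for p in motif))

def matches(motif, peptide):
    """True if the whole peptide fits the motif (fullmatch anchors both ends)."""
    return motif_to_regex(motif).fullmatch(peptide) is not None

EXAMPLES = {2: "YKPWLDSAFIDEL", 3: "YKPWLDQTFIDEL",
            4: "YKPWLDQQFIDEL", 5: "YKPWLDTDFIDEL"}

for seq_id, peptide in EXAMPLES.items():
    print(f"SEQ ID NO: {seq_id}", matches(SEQ_ID_1, peptide))  # True for each
```

To locate the motif inside a longer protein sequence, `.search()` on the compiled pattern can be used in place of `.fullmatch()`.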
  • The amino acid molecules (e.g., the amino acid molecule proteins) may be "enhanced" in function and may yield higher quantities of L-arginine than their naturally occurring enzyme counterparts; e.g., they may have enhanced catalytic activity compared to the naturally occurring counterparts.
  • The amino acid molecules may be derived from a host cell or a microorganism.
  • The amino acid molecule may be derived from a microorganism that has been modified (e.g., artificially and/or specifically genetically modified) from its naturally occurring variant or strain.
  • The host cell or the microorganism described herein may be of the genus Corynebacterium, Escherichia, Bacillus, Streptomyces, Penicillium, Klebsiella, Erwinia, or Pantoea.
  • The host cell or the microorganism may be Acidobacterium capsulatum, Alcaligenes faecalis, Bacillus amyloliquefaciens, Burkholderia pyrrocinia, Corynebacterium ammoniagenes, Corynebacterium glutamicum, Cupriavidus necator, Escherichia coli, Mycobacterium smegmatis, or Neisseria weaveri.
  • The amino acid molecules described herein may also be derived specifically from a microorganism of the genus Corynebacterium, more specifically from Corynebacterium glutamicum, Corynebacterium deserti, Corynebacterium crenatum, Corynebacterium efficiens, Corynebacterium suranareeae, and the like, but are not limited thereto.
  • The terms "activation", "enhancement", "up-regulation", "overexpression", and "increase" may include both exhibiting an activity not originally possessed and exhibiting improved activity compared to the intrinsic activity, or to the activity before modification or alteration of the amino acid sequence of the polypeptide.
  • The term "intrinsic activity" refers to the activity of a specific polypeptide originally possessed by the parent strain before transformation, or by the unmodified host cell or microorganism, when a trait is changed by genetic mutation due to natural or artificial factors.
  • The term "intrinsic activity" may be used interchangeably with the term "activity before modification".
  • The activity of a polypeptide being "enhanced", "up-regulated", "overexpressed", or "increased" compared to its intrinsic activity means that the activity of the polypeptide is improved compared to the activity and/or concentration (expression level) of the specific polypeptide originally possessed by the parent strain before transformation or by the unmodified host cell or microorganism.
  • The enhancement may be achieved by introducing an exogenous polypeptide or by enhancing the activity and/or concentration (expression level) of the endogenous polypeptide. Whether the activity of a polypeptide is enhanced may be confirmed from an increase in the degree of activity or expression level of the polypeptide, or in the amount of product generated by the activity of the polypeptide.
  • The claimed peptide and/or the enhancement in activity is not limited, as long as the activity of the target polypeptide is enhanced compared to that in the host cell or microorganism before modification.
  • The procedures used to enhance the activity of the peptide may be genetic engineering and/or protein engineering modifications that are well known to those skilled in the art and routinely used in molecular biology, but are not limited thereto (see, for example, Sitnicka et al., Functional Analysis of Genes, Advances in Cell Biology, 2010, Vol. 2, 1-16; Sambrook et al., Molecular Cloning, 2012).
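  • The enhancement of the polypeptide activity of the present disclosure may be achieved, for example, by one or more of the following: (1) increasing the intracellular copy number of a polynucleotide encoding the polypeptide; (2) replacing the gene expression control region on the chromosome encoding the polypeptide with a sequence having strong activity; (3) modifying the nucleotide sequence encoding the start codon or 5'-UTR region of a gene transcript encoding the polypeptide; (4) modifying the amino acid sequence or polynucleotide sequence so that the activity of the polypeptide is enhanced; (5) introducing an exogenous polypeptide exhibiting the activity of the polypeptide, or an exogenous polynucleotide encoding the same; (6) codon optimization of a polynucleotide encoding the polypeptide; or (7) modification or chemical modification of an exposed site selected through analysis of the tertiary structure of the polypeptide.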
  • The increase in the intracellular copy number of a polynucleotide encoding a polypeptide may be achieved by introducing into a host cell a vector to which the polynucleotide encoding the polypeptide is operably linked and which is capable of replicating and functioning independently of the host.
  • Alternatively, the increase may be achieved by introducing one, two, or more copies of a polynucleotide encoding the polypeptide into a chromosome of the host cell.
  • The introduction into a chromosome may be performed by introducing into the host cell a vector capable of inserting the polynucleotide into the host-cell chromosome, but is not limited thereto.
  • The replacement of the gene expression control region (or expression control sequence) on the chromosome encoding a polypeptide with a sequence having strong activity may be achieved, for example, by introducing a mutation into the sequence by deletion, insertion, non-conservative or conservative substitution, or a combination thereof to further enhance the activity of the expression control region, or by replacing the region with a sequence having stronger activity.
  • The expression control region may include, but is not particularly limited to, a promoter, an operator sequence, a sequence encoding a ribosome-binding site, and sequences regulating the termination of transcription and translation.
  • The replacement may involve replacing the original promoter with a strong promoter, but is not limited thereto.
  • Strong promoters include, but are not limited to, CJ1 to CJ7 promoters (US 7662943 B2), lac promoter, trp promoter, trc promoter, tac promoter, lambda phage PR promoter, PL promoter, tet promoter, gapA promoter, SPL7 promoter, SPL13 (sm3) promoter (US 10584338 B2), O2 promoter (US 10273491 B2), tkt promoter, and yccA promoter.
  • Modification of the nucleotide sequence encoding the start codon or 5'-UTR region of a gene transcript encoding a polypeptide may include, for example, substitution with a nucleotide sequence encoding a different start codon that gives a higher polypeptide expression rate than the endogenous start codon.
  • The amino acid sequence or polynucleotide sequence may be modified, without limitation, by mutating the amino acid sequence of the polypeptide or the polynucleotide sequence encoding it by deletion, insertion, non-conservative or conservative substitution, or a combination thereof so that the activity of the polypeptide is enhanced, or by replacing the sequence with an amino acid sequence or polynucleotide sequence modified to have stronger or increased activity.
  • The replacement may specifically be performed by inserting a polynucleotide into a chromosome by homologous recombination, but is not limited thereto.
  • The vector used for this purpose may further include a selection marker for confirming chromosomal insertion.
  • Introduction of an exogenous polypeptide exhibiting the activity of a polypeptide, or of an exogenous polynucleotide encoding the same, may be the introduction into a host cell of an exogenous polynucleotide encoding a polypeptide whose activity is the same as or similar to that of the polypeptide.
  • The exogenous polynucleotide is not limited in origin or sequence as long as the encoded polypeptide exhibits activity the same as or similar to that of the polypeptide.
  • Any known transformation method may be appropriately selected by those skilled in the art.
  • When the introduced polynucleotide is expressed in the host cell, the polypeptide is produced and its activity may be increased.
  • Codon optimization of a polynucleotide encoding a polypeptide may be codon optimization of the endogenous polynucleotide to increase its transcription or translation in a host cell, or codon optimization of an exogenous polynucleotide so that its transcription and translation are optimized in the host cell.
  • Modification or chemical modification of an exposed site selected through analysis of the tertiary structure of a polypeptide may involve, for example, comparing the sequence information of the polypeptide to be analyzed with a database storing sequence information of known proteins, determining a template protein candidate according to the degree of sequence similarity, confirming the structure based on that template, selecting an exposed site to be modified or chemically modified, and modifying or chemically modifying that site.
  • Such enhancement of the amino acid molecule activity may be an increase in the activity or concentration (expression level) of the amino acid molecule compared to that of the amino acid molecule expressed in the wild-type host cell or the host cell before modification, or an increase in the amount of product produced by the amino acid molecule, but is not limited thereto.
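The codon-optimization option described above can be sketched as replacing every codon of an open reading frame with a single preferred synonymous codon, so that the encoded protein is unchanged. The preference table here is hypothetical and is not derived from measured codon usage of C. glutamicum or any other host; practical approaches often weigh additional factors (for example GC content or mRNA structure) that this sketch ignores. The genetic code table is the standard one (NCBI translation table 1).

```python
BASES = "TCAG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = dict(zip(CODONS, AMINO_ACIDS))  # standard code, NCBI table 1

# Hypothetical host preference: one preferred codon per amino acid ("*" = stop).
PREFERRED = {
    "A": "GCA", "C": "TGC", "D": "GAT", "E": "GAA", "F": "TTC", "G": "GGC",
    "H": "CAC", "I": "ATC", "K": "AAG", "L": "CTG", "M": "ATG", "N": "AAC",
    "P": "CCG", "Q": "CAG", "R": "CGT", "S": "TCC", "T": "ACC", "V": "GTG",
    "W": "TGG", "Y": "TAC", "*": "TAA",
}

def translate(orf: str) -> str:
    """Translate an in-frame, upper-case DNA sequence with the standard code."""
    return "".join(GENETIC_CODE[orf[i:i + 3]] for i in range(0, len(orf) - 2, 3))

def codon_optimize(orf: str) -> str:
    """Swap each codon for the preferred synonymous codon; the protein is unchanged."""
    if len(orf) % 3:
        raise ValueError("ORF length must be a multiple of 3")
    return "".join(PREFERRED[GENETIC_CODE[orf[i:i + 3]]] for i in range(0, len(orf), 3))

original = "ATGAAACCTTGGCTTGATTAG"  # toy ORF encoding M K P W L D stop
optimized = codon_optimize(original)
print(optimized)  # ATGAAGCCGTGGCTGGATTAA
print(translate(original) == translate(optimized))  # True
```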
  • The amino acid molecule is a recombinant protein such as, for example, a recombinant argininosuccinate synthase.
  • The amino acid molecule comprises the sequence of SEQ ID NO: 6, X₅PX₆X₇X₈X₉GX₁₀AFSGGLDTSX₁₁AX₁₂, in which X₅ is I, L, or V; X₆ is any amino acid; X₇ is A, E, or Q; X₈ is K or R; X₉ is I or V; X₁₀ is I or L; X₁₁ is A, T, or V; and X₁₂ is I, L, or V.
  • In individual embodiments, X₅ of SEQ ID NO: 6 is I, L, or V.
  • In individual embodiments, X₇ of SEQ ID NO: 6 is A, E, or Q.
  • In individual embodiments, X₈ of SEQ ID NO: 6 is K or R.
  • In individual embodiments, X₉ of SEQ ID NO: 6 is I or V.
  • In individual embodiments, X₁₀ of SEQ ID NO: 6 is I or L.
  • In individual embodiments, X₁₁ of SEQ ID NO: 6 is A, T, or V.
  • In individual embodiments, X₁₂ of SEQ ID NO: 6 is I, L, or V.
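To give a sense of how broad a generalized motif is, the number of distinct peptides covered by SEQ ID NO: 6 can be computed as the product of the number of allowed residues at each position (fixed residues contribute a factor of 1). This sketch assumes that X6 ("an amino acid") ranges over the 20 standard residues; the position-list encoding is illustrative.

```python
from math import prod

ANY = "ACDEFGHIKLMNPQRSTVWY"  # assumption: X6 ranges over the 20 standard residues

# SEQ ID NO: 6, X5 P X6 X7 X8 X9 G X10 AFSGGLDTS X11 A X12, one entry per position
SEQ_ID_6 = ["ILV", "P", ANY, "AEQ", "KR", "IV", "G", "IL", "A", "F", "S",
            "G", "G", "L", "D", "T", "S", "ATV", "A", "ILV"]

def degeneracy(motif):
    """Number of distinct peptides the motif describes."""
    return prod(len(position) for position in motif)

print(len(SEQ_ID_6), degeneracy(SEQ_ID_6))  # 20 12960
```

Most of that breadth comes from the unconstrained X6 position (a factor of 20); the remaining variable positions allow only two or three chemically similar residues each.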
  • The amino acid molecule comprises the sequence of SEQ ID NO: 7, X₁₃GAXₐX₁₄X₁₅X₁₆YTAX₁₇X₁₈GQX₁₉DE, in which X₁₃ is K or N; Xₐ is any amino acid; X₁₄ is C or P; X₁₅ is C or Y; X₁₆ is A, S, or T; X₁₇ is D or N; X₁₈ is I or L; and X₁₉ is A, P, or Y.
  • In individual embodiments, X₁₃ of SEQ ID NO: 7 is K or N.
  • In individual embodiments, X₁₄ of SEQ ID NO: 7 is C or P.
  • In individual embodiments, X₁₅ of SEQ ID NO: 7 is C or Y.
  • In individual embodiments, X₁₆ of SEQ ID NO: 7 is A, S, or T.
  • In individual embodiments, X₁₇ of SEQ ID NO: 7 is D or N.
  • In individual embodiments, X₁₈ of SEQ ID NO: 7 is I or L.
  • In individual embodiments, X₁₉ of SEQ ID NO: 7 is A, P, or Y.
  • The amino acid molecule comprises the sequence of SEQ ID NO: 8, X₂₀X₂₁X₂₂X₂₃X₂₄X₂₅X₂₆XₐLX₂₇, in which X₂₀ is A or S; X₂₁ is R or V; X₂₂ is I or L; X₂₃ is I or V; X₂₄ is E or D; X₂₅ is C or G; X₂₆ is K or R; Xₐ is any amino acid; and X₂₇ is A or V.
  • In individual embodiments, X₂₀ of SEQ ID NO: 8 is A or S.
  • In individual embodiments, X₂₁ of SEQ ID NO: 8 is R or V.
  • In individual embodiments, X₂₂ of SEQ ID NO: 8 is I or L.
  • In individual embodiments, X₂₃ of SEQ ID NO: 8 is I or V.
  • In individual embodiments, X₂₄ of SEQ ID NO: 8 is E or D.
  • In individual embodiments, X₂₅ of SEQ ID NO: 8 is C or G.
  • In individual embodiments, X₂₆ of SEQ ID NO: 8 is K or R.
  • In individual embodiments, X₂₇ of SEQ ID NO: 8 is A or V.
  • The amino acid molecule comprises the sequence of SEQ ID NO: 9, X₂₈AFX₂₉XₐXₐX₃₀X₃₁G, in which X₂₈ is G or N; X₂₉ is H or N; Xₐ is any amino acid; X₃₀ is S or T; and X₃₁ is A or G.
  • In individual embodiments, X₂₈ of SEQ ID NO: 9 is G or N.
  • In individual embodiments, X₂₉ of SEQ ID NO: 9 is H or N.
  • In individual embodiments, X₃₀ of SEQ ID NO: 9 is S or T.
  • In individual embodiments, X₃₁ of SEQ ID NO: 9 is A or G.
  • The amino acid molecule comprises the sequence of SEQ ID NO: 10, YFNTTPX₃₂GRAVX₃₃X₃₄TX₃₅LV, in which X₃₂ is I or L; X₃₃ is A or T; X₃₄ is A or G; and X₃₅ is L or M.
  • In individual embodiments, X₃₂ of SEQ ID NO: 10 is I or L.
  • In individual embodiments, X₃₃ of SEQ ID NO: 10 is A or T.
  • In individual embodiments, X₃₄ of SEQ ID NO: 10 is A or G.
  • In individual embodiments, X₃₅ of SEQ ID NO: 10 is L or M.
  • The amino acid molecule comprises the sequence of SEQ ID NO: 11, TX₃₆KGNDIERF, in which X₃₆ is F or Y.
  • In individual embodiments, X₃₆ of SEQ ID NO: 11 is F or Y.
  • In individual embodiments, X₃₇ of SEQ ID NO: 12 is L or V.
  • In individual embodiments, X₃₈ of SEQ ID NO: 12 is A, T, or V.
  • The amino acid molecule comprises the sequence of SEQ ID NO: 13, GGRXₐEMX₃₉X₄₀X₄₁X₄₂, in which Xₐ is any amino acid; X₃₉ is A or S; X₄₀ is A, E, or Q; X₄₁ is F, W, or Y; and X₄₂ is L or M.
  • In individual embodiments, X₃₉ of SEQ ID NO: 13 is A or S.
  • In individual embodiments, X₄₀ of SEQ ID NO: 13 is A, E, or Q.
  • In individual embodiments, X₄₁ of SEQ ID NO: 13 is F, W, or Y.
  • In individual embodiments, X₄₂ of SEQ ID NO: 13 is L or M.
  • The amino acid molecule comprises the sequence of SEQ ID NO: 14, EKAYSTDX₄₃NX₄₄X₄₅GATHE, in which X₄₃ is A or S; X₄₄ is I, L, or M; and X₄₅ is L or W.
  • In individual embodiments, X₄₃ of SEQ ID NO: 14 is A or S.
  • In individual embodiments, X₄₄ of SEQ ID NO: 14 is I, L, or M.
  • In individual embodiments, X₄₅ of SEQ ID NO: 14 is L or W.
  • The amino acid molecule comprises the sequence of SEQ ID NO: 15, VXₐPIMGVXₐX₄₆W, in which Xₐ is any amino acid and X₄₆ is F, H, or S.
  • In individual embodiments, X₄₆ of SEQ ID NO: 15 is F, H, or S.
  • The amino acid molecule comprises the sequence of SEQ ID NO: 16, X₄₇GGRHGX₄₈GX₄₉X₅₀DQIENRX₅₁IEA, in which X₄₇ is I or V; X₄₈ is L or M; X₄₉ is M or V; X₅₀ is A or S; and X₅₁ is I or V.
  • In individual embodiments, X₄₇ of SEQ ID NO: 16 is I or V.
  • In individual embodiments, X₄₈ of SEQ ID NO: 16 is L or M.
  • In individual embodiments, X₄₉ of SEQ ID NO: 16 is M or V.
  • In individual embodiments, X₅₀ of SEQ ID NO: 16 is A or S.
  • In individual embodiments, X₅₁ of SEQ ID NO: 16 is I or V.
  • The amino acid molecule comprises the sequence of SEQ ID NO: 17, KSRGIYEAPG.
  • The amino acid molecule comprises the sequence of SEQ ID NO: 18, X₅₂ALX₅₃X₅₄X₅₅AX₅₆ERLXₐX₅₇X₅₈IHNEDT, in which X₅₂ is L or M; X₅₃ is F or L; X₅₄ is F, H, or Y; X₅₅ is A or I; X₅₆ is F or Y; Xₐ is any amino acid; X₅₇ is N, S, or T; and X₅₈ is A or G.
  • In individual embodiments, X₅₂ of SEQ ID NO: 18 is L or M.
  • In individual embodiments, X₅₃ of SEQ ID NO: 18 is F or L.
  • In individual embodiments, X₅₄ of SEQ ID NO: 18 is F, H, or Y.
  • In individual embodiments, X₅₅ of SEQ ID NO: 18 is A or I.
  • In individual embodiments, X₅₆ of SEQ ID NO: 18 is F or Y.
  • In individual embodiments, X₅₇ of SEQ ID NO: 18 is N, S, or T.
  • In individual embodiments, X₅₈ of SEQ ID NO: 18 is A or G.
  • The amino acid molecule comprises the sequence of SEQ ID NO: 19, LGX₅₉LX₆₀YX₆₁GRWX₆₂DX₆₃QX₆₄X₆₅MX₆₆RX₆₇, in which X₅₉ is K or R; X₆₀ is L or M; X₆₁ is A, E, or Q; X₆₂ is F or L; X₆₃ is P or S; X₆₄ is A, S, or G; X₆₅ is I, L, or M; X₆₆ is I, L, or V; and X₆₇ is D or E.
  • In individual embodiments, X₅₉ of SEQ ID NO: 19 is K or R.
  • In individual embodiments, X₆₀ of SEQ ID NO: 19 is L or M.
  • In individual embodiments, X₆₁ of SEQ ID NO: 19 is A, E, or Q.
  • In individual embodiments, X₆₂ of SEQ ID NO: 19 is F or L.
  • In individual embodiments, X₆₃ of SEQ ID NO: 19 is P or S.
  • In individual embodiments, X₆₄ of SEQ ID NO: 19 is A, S, or G.
  • In individual embodiments, X₆₅ of SEQ ID NO: 19 is I, L, or M.
  • In individual embodiments, X₆₆ of SEQ ID NO: 19 is I, L, or V.
  • In individual embodiments, X₆₇ of SEQ ID NO: 19 is D or E.
  • the amino acid molecule comprises a sequence having SEQ ID NO:20, LRRGX a DX 68 X 69 X 70 , in which X a is an amino acid; X 68 is F or Y; X 69 is S or T; and X 70 is I or L.
  • X 68 of SEQ ID NO: 20 is F. In some embodiments, X 68 of SEQ ID NO: 20 is Y.
  • X 69 of SEQ ID NO: 20 is S. In some embodiments, X 69 of SEQ ID NO: 20 is T.
  • X 70 of SEQ ID NO: 20 is I. In some embodiments, X 70 of SEQ ID NO: 20 is L.
  • the amino acid molecule comprises a sequence having SEQ ID NO:21, X 71 X 72 X 73 X 74 X a X 75 X 76 X 77 LX 78 MEX 79 , in which X 71 is A, D or N; X 72 is F or L; X 73 is S or T; X 74 is F or Y; X a is an amino acid; X 75 is A, P or S; X 76 is A, E or D; X 77 is K or R; X 78 is S or T; and X 79 is K or R.
  • X 71 of SEQ ID NO: 21 is A. In some embodiments, X 71 of SEQ ID NO: 21 is D. In some embodiments, X 71 of SEQ ID NO: 21 is N.
  • X 72 of SEQ ID NO: 21 is F. In some embodiments, X 72 of SEQ ID NO: 21 is L.
  • X 73 of SEQ ID NO: 21 is S. In some embodiments, X 73 of SEQ ID NO: 21 is T.
  • X 74 of SEQ ID NO: 21 is F. In some embodiments, X 74 of SEQ ID NO: 21 is Y.
  • X 75 of SEQ ID NO: 21 is A. In some embodiments, X 75 of SEQ ID NO: 21 is P. In some embodiments, X 75 of SEQ ID NO: 21 is S.
  • X 76 of SEQ ID NO: 21 is A. In some embodiments, X 76 of SEQ ID NO: 21 is E. In some embodiments, X 76 of SEQ ID NO: 21 is D.
  • X 77 of SEQ ID NO: 21 is K. In some embodiments, X 77 of SEQ ID NO: 21 is R.
  • X 78 of SEQ ID NO: 21 is S. In some embodiments, X 78 of SEQ ID NO: 21 is T.
  • X 79 of SEQ ID NO: 21 is K. In some embodiments, X 79 of SEQ ID NO: 21 is R.
  • the amino acid molecule comprises a sequence having SEQ ID NO:22, DRIGQLX 80 MRX 81 X 82 DX 83 X a DX 84 R, in which X 80 is H or T; X 81 is L, N or T; X 82 is L or N; X 83 is I, L or V; X a is an amino acid; and X 84 is S or T.
  • X 80 of SEQ ID NO:22 is H. In some embodiments, X 80 of SEQ ID NO: 22 is T.
  • X 81 of SEQ ID NO:22 is L. In some embodiments, X 81 of SEQ ID NO: 22 is N. In some embodiments, X 81 of SEQ ID NO:22 is T.
  • X 82 of SEQ ID NO:22 is L. In some embodiments, X 82 of SEQ ID NO: 22 is N.
  • X 83 of SEQ ID NO:22 is I. In some embodiments, X 83 of SEQ ID NO: 22 is L. In some embodiments, X 83 of SEQ ID NO:22 is V.
  • X 84 of SEQ ID NO:22 is S. In some embodiments, X 84 of SEQ ID NO: 22 is T.
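The five consensus motifs defined above (SEQ ID NOs: 18-22) can be read as regular expressions over one-letter amino acid codes. The Python sketch below is an illustration only and is not part of the disclosure. It turns each variable position into a character class, treats each "X a" position as any one of the 20 standard amino acids (an assumption about how "an amino acid" is meant), and reports which motifs occur in a query protein. The function name `motifs_present` and the test strings are hypothetical.

```python
import re

# "X a" positions: assumed to be any one of the 20 standard amino acids.
AA = "[ACDEFGHIKLMNPQRSTVWY]"

# Consensus motifs transcribed from the definitions of SEQ ID NOs: 18-22.
MOTIFS = {
    18: f"[LM]AL[FL][FHY][AI]A[FY]ERL{AA}[NST][AG]IHNEDT",
    19: "LG[KR]L[LM]Y[AEQ]GRW[FL]D[PS]Q[ASG][ILM]M[ILV]R[DE]",
    20: f"LRRG{AA}D[FY][ST][IL]",
    21: f"[ADN][FL][ST][FY]{AA}[APS][AED][KR]L[ST]ME[KR]",
    22: f"DRIGQL[HT]MR[LNT][LN]D[ILV]{AA}D[ST]R",
}
COMPILED = {seq_id: re.compile(pattern) for seq_id, pattern in MOTIFS.items()}


def motifs_present(protein: str) -> list[int]:
    """Return the SEQ ID NOs (18-22) whose motif occurs anywhere in `protein`."""
    protein = protein.upper()
    return [seq_id for seq_id, rx in COMPILED.items() if rx.search(protein)]


# Hypothetical test string built to satisfy the SEQ ID NO: 18 pattern.
print(motifs_present("MKT" + "LALFFAAFERLKNAIHNEDT" + "GG"))  # [18]
```

Because the result is a list, `len(motifs_present(seq))` counts how many of these motifs a query carries. SEQ ID NOs: 6-17 are not defined in this excerpt, so they are not included in the sketch.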
  • the amino acid molecule comprises at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 or 17 sequence(s) selected from the group consisting of SEQ ID NO: 6-22.
  • the amino acid molecule comprises all of SEQ ID NO: 6, SEQ ID NO: 7, SEQ ID NO: 8, SEQ ID NO: 9, SEQ ID NO: 10, SEQ ID NO: 11, SEQ ID NO: 12, SEQ ID NO: 13, SEQ ID NO: 14, SEQ ID NO: 15, SEQ ID NO: 16, SEQ ID NO: 17, SEQ ID NO: 18, SEQ ID NO: 19, SEQ ID NO: 20, SEQ ID NO: 21, and SEQ ID NO: 22.
  • the amino acid molecule described herein may have a length of at least 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, or 171 amino acids and/or 200, 199, 198, 197, 196, 195, 194, 193, 192, 191, 190, 189, 188, 187, 186, 185, 184, 183, 182, 181, 180, 179, 178, 177, 176, 175, 174, 173, 172, 171, 170, 169, 168, 167, 166, 165, 164, 163, 162, 161, 160, 159, 158, 157, 156, 155, 154, 153, 152 or 151 amino acids or less.
  • the amino acid molecule comprises a sequence having at least 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99 or 100% sequence identity to the A. capsulatum sequence, SEQ ID NO: 23.
  • the amino acid molecule comprises a sequence corresponding to at least 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99 or 100% of positions 1-153 and 167-445 of SEQ ID NO: 23.
  • the amino acid molecule comprises a sequence corresponding to positions 1-153 and 167-445 of SEQ ID NO: 23.
  • the amino acid molecule comprises a sequence having at least 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99 or 100% sequence identity to the A. faecalis sequence, SEQ ID NO: 24.
  • the amino acid molecule comprises a sequence corresponding to at least 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99 or 100% of positions 1-153 and 167-444 of SEQ ID NO: 24.
  • the amino acid molecule comprises a sequence corresponding to positions 1-153 and 167-444 of SEQ ID NO: 24.
  • the amino acid molecule comprises a sequence having at least 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99 or 100% sequence identity to the B. pyrrocinia sequence, SEQ ID NO: 26.
  • the amino acid molecule comprises a sequence corresponding to at least 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99 or 100% of positions 1-153 and 167-445 of SEQ ID NO: 26.
  • the amino acid molecule comprises a sequence corresponding to positions 1-153 and 167-445 of SEQ ID NO: 26.
  • the amino acid molecule comprises a sequence having at least 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99 or 100% sequence identity to the C. necator sequence, SEQ ID NO: 29.
  • the amino acid molecule comprises a sequence corresponding to at least 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99 or 100% of positions 1-153 and 167-441 of SEQ ID NO: 29.
  • the amino acid molecule comprises a sequence corresponding to positions 1-153 and 167-441 of SEQ ID NO: 29.
  • the amino acid molecule comprises a sequence having at least 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99 or 100% sequence identity to the E. coli sequence, SEQ ID NO: 30.
  • the amino acid molecule comprises a sequence corresponding to at least 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99 or 100% of positions 1-153 and 167-447 of SEQ ID NO: 30.
  • the amino acid molecule comprises a sequence corresponding to positions 1-153 and 167-447 of SEQ ID NO: 30.
  • the amino acid molecule comprises a sequence having at least 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99 or 100% sequence identity to the N. weaveri sequence, SEQ ID NO: 32.
  • the amino acid molecule comprises a sequence corresponding to at least 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99 or 100% of positions 1-157 and 171-448 of SEQ ID NO: 32.
  • the amino acid molecule comprises a sequence corresponding to positions 1-157 and 171-448 of SEQ ID NO: 32.
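Several embodiments above recite a percentage "of positions 1-153 and 167-445" (or the analogous ranges for SEQ ID NOs: 24, 26, 29, 30 and 32). These ranges leave out an internal stretch of the reference. The following minimal Python sketch shows one possible reading of that count. It assumes the query has already been aligned to the reference numbering with no insertions or deletions; real comparisons would need an alignment step first. The function name and toy sequences are hypothetical, and this is not a definition taken from the disclosure.

```python
def positional_match_fraction(query: str, reference: str,
                              ranges: list[tuple[int, int]]) -> float:
    """Percentage of reference positions in `ranges` (1-based, inclusive) at
    which `query` carries the same residue as `reference`.

    Assumes `query` is already aligned to the reference numbering
    (same length, no insertions or deletions)."""
    if len(query) != len(reference):
        raise ValueError("sketch assumes an ungapped, equal-length alignment")
    positions = [p for start, end in ranges for p in range(start, end + 1)]
    if not positions:
        raise ValueError("no positions selected")
    matches = sum(1 for p in positions if query[p - 1] == reference[p - 1])
    return 100.0 * matches / len(positions)


# Toy example: the query differs from the reference only at positions 6-10,
# which are excluded from the count.
ref = "ACDEFGHIKLMNPQRSTVWY"
qry = "ACDEFAAAAAMNPQRSTVWY"
print(positional_match_fraction(qry, ref, [(1, 5), (11, 20)]))  # 100.0
print(positional_match_fraction(qry, ref, [(1, 20)]))           # 75.0
```

For SEQ ID NO: 23, the ranges would be `[(1, 153), (167, 445)]`, i.e. 432 positions. Positions 154-166 do not enter the count.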
  • the amino acid molecule comprises a sequence having at least 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99 or 100% sequence identity to SEQ ID NO: 23; having at least 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99 or 100% sequence identity to SEQ ID NO: 24; having at least 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99 or 100% sequence identity to SEQ ID NO: 25; having at least 80, 81, 82, 83, 84, 85, 86, 87, 88,
  • the present disclosure provides a host cell or microorganism expressing the amino acid molecule disclosed herein.
  • the present disclosure provides a nucleic acid molecule comprising a nucleic acid sequence encoding the amino acid molecule disclosed herein.
  • the present disclosure provides a nucleic acid molecule comprising a sequence having at least 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99 or 100% sequence identity to the A. capsulatum sequence, SEQ ID NO: 33; a sequence having at least 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99 or 100% sequence identity to the A. faecalis sequence, SEQ ID NO: 34; a sequence having at least 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99 or 100% sequence identity to the B. pyrrocinia sequence, SEQ ID NO: 35; a sequence having at least 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99 or 100% sequence identity to the C. necator sequence, SEQ ID NO: 36; a sequence having at least 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99 or 100% sequence identity to the E. coli sequence, SEQ ID NO: 37; or a sequence having at least 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99 or 100% sequence identity to the N. weaveri sequence, SEQ ID NO: 38, wherein the nucleic acid molecule does not occur in nature.
  • a vector comprising a nucleic acid molecule as disclosed herein.
  • a “vector” refers to a DNA construct containing the nucleotide sequence of a polynucleotide encoding a target polypeptide operably linked to a suitable expression control region (or expression control sequence), so that the target polypeptide can be expressed in a suitable host.
  • the expression control region may include a promoter capable of initiating transcription, an optional operator sequence for regulating such transcription, a sequence encoding a suitable mRNA ribosome binding site, and a sequence regulating the termination of transcription and translation.
  • after transformation into an appropriate host cell, the vector may replicate or function independently of the host genome, or may be integrated into the genome itself.
  • the vector used in the present disclosure is not particularly limited, and any vector known in the art may be used.
  • Examples of commonly used vectors include natural or recombinant plasmids, cosmids, viruses and bacteriophages.
  • pWE15, M13, λMBL3, λMBL4, λIXII, λASHII, λAPII, λt10, λt11, Charon4A, and Charon21A may be used as a phage vector or cosmid vector.
  • a pDZ system, a pBR system, a pUC system, a pBluescript II system, a pGEM system, a pTZ system, a pCL system, and a pET system may be used as a plasmid vector.
  • pDZ, pDC, pDCM2, pACYC177, pACYC184, pCL, pECCG117, pUC19, pBR322, pMW118, pCC1BAC vectors and the like may be used. Additional information about the vectors may be found in U.S. Patent Application Publication No. 2023/0134555, which is incorporated by reference in its entirety.
  • a polynucleotide encoding a target polypeptide may be inserted into a chromosome through a vector for intracellular chromosome insertion.
  • the insertion of a polynucleotide into a chromosome may be performed by any method known in the art, for example, homologous recombination, but is not limited thereto.
  • a selection marker for confirming chromosome insertion may be further included.
  • the selection marker is used to select cells transformed with a vector, that is, to confirm the insertion of a target nucleic acid molecule, and markers that confer selectable phenotypes such as drug resistance, auxotrophy, resistance to cytotoxic agents, and expression of surface polypeptides may be used. In an environment treated with a selective agent, only the cells expressing the selection marker can survive or exhibit other expression traits, and thus the transformed cells can be selected.
  • the modification of part or all of the polynucleotide in the host cell or microorganism may be induced by (a) homologous recombination using a vector for chromosome insertion into host cells or microorganisms, or genome editing using an engineered nuclease (e.g., CRISPR-Cas9), and/or (b) treatment with light, such as ultraviolet light, radiation, and/or chemicals, but is not limited thereto.
  • the method for modifying part or all of the gene may include a method by DNA recombination technology.
  • for example, by injecting a nucleotide sequence or vector including a nucleotide sequence homologous to the target gene into the host cell or microorganism to cause homologous recombination, part or all of the gene may be deleted.
  • the injected nucleotide sequence or vector may include a dominant selection marker, but is not limited thereto.
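The deletion strategy described above relies on sequences flanking the target gene (homology arms). The toy Python model below illustrates the idea and is not a protocol. It treats the chromosome as a string and assumes each arm occurs exactly once. It replaces whatever lies between the arms with an optional marker sequence, which is the net result of a double crossover with a construct of the form left arm + marker + right arm. The function name and sequences are hypothetical.

```python
def delete_by_double_crossover(chromosome: str, left_arm: str, right_arm: str,
                               marker: str = "") -> str:
    """Toy model of gene deletion by homologous recombination.

    Everything between the two homology arms is replaced by `marker`
    (empty for a markerless deletion). Assumes each arm occurs once."""
    if chromosome.count(left_arm) != 1 or chromosome.count(right_arm) != 1:
        raise ValueError("each homology arm must occur exactly once")
    left_end = chromosome.find(left_arm) + len(left_arm)
    right_start = chromosome.find(right_arm, left_end)
    if right_start == -1:
        raise ValueError("right arm not found downstream of the left arm")
    return chromosome[:left_end] + marker + chromosome[right_start:]


# "GGGGGG" stands in for the target gene between the two arms.
print(delete_by_double_crossover("AAAAGGGGGGTTTT", "AAAA", "TTTT"))  # AAAATTTT
```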
  • in the polynucleotide according to some embodiments of the present disclosure, various modifications may be made in the coding region without changing the amino acid sequence of the protein, in consideration of codon degeneracy or of the codons preferred by the organism in which the protein described herein is to be expressed.
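Because of codon degeneracy, different coding sequences can encode the same protein. The Python sketch below builds the standard genetic code, whose amino acid assignments are also those of NCBI translation table 11 used for bacteria. It translates a coding sequence and rewrites it using a caller-supplied table of "preferred" codons. The preference table in the example is hypothetical and is not a codon-usage table for Corynebacterium or any other organism.

```python
BASES = "TCAG"
# Amino acids for codons in TCAG x TCAG x TCAG order (standard code); "*" = stop.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codons(cds: str) -> list[str]:
    """Split a coding sequence into complete codons (a trailing partial codon is ignored)."""
    cds = cds.upper()
    return [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]


def translate(cds: str) -> str:
    return "".join(CODON_TABLE[c] for c in codons(cds))


def recode(cds: str, preferred: dict[str, str]) -> str:
    """Replace each codon by the preferred synonymous codon for its amino acid, if given."""
    out = []
    for codon in codons(cds):
        aa = CODON_TABLE[codon]
        new = preferred.get(aa, codon)
        if CODON_TABLE[new] != aa:
            raise ValueError(f"{new} does not encode {aa}")
        out.append(new)
    return "".join(out)


original = "ATGTTACGCTAA"
recoded = recode(original, {"L": "CTG", "R": "CGT"})  # hypothetical preferences
print(recoded)                                   # ATGCTGCGTTAA
print(translate(original), translate(recoded))   # MLR* MLR*
```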
  • Another aspect of the present disclosure is to provide a polynucleotide comprising a nucleic acid sequence encoding the protein described above, wherein the polynucleotide does not occur in nature.
  • sequence homology or identity of a conserved polynucleotide or polypeptide is determined by standard alignment algorithms, together with the default gap penalty established by the program being used.
  • Substantially homologous or identical sequences are generally capable of hybridizing with all or part of the sequence under moderately or highly stringent conditions. Such hybridization also includes hybridization with a polynucleotide containing common codons or codons selected in consideration of codon degeneracy.
  • Whether or not any two polynucleotide or polypeptide sequences have homology, similarity, or identity may be determined, for example, using known computer algorithms such as the “FASTA” program with default parameters, as in Pearson et al. (1988), Proc. Natl. Acad. Sci. USA 85: 2444.
  • the homology, similarity or identity may be determined using the Needleman-Wunsch algorithm (Needleman and Wunsch, 1970, J. Mol. Biol. 48: 443-453) as performed in the Needleman program of the EMBOSS package (EMBOSS: The European Molecular Biology Open Software Suite, Rice et al., 2000, Trends Genet.
  • the homology, similarity or identity of polynucleotides or polypeptides may be determined by comparing sequence information using, for example, a GAP computer program such as that of Needleman et al. (1970), J. Mol. Biol. 48: 443, as disclosed, for example, in Smith and Waterman, Adv. Appl. Math. (1981) 2: 482.
  • in the GAP program, homology, similarity or identity may be defined as the value obtained by dividing the number of similarly aligned symbols (namely, nucleotides or amino acids) by the total number of symbols in the shorter of the two sequences.
  • Default parameters for the GAP program may include (1) a binary comparison matrix (containing values of 1 for identity and 0 for non-identity) and the weighted comparison matrix of Gribskov et al. (1986), Nucl. Acids Res. 14: 6745, as disclosed in Schwartz and Dayhoff, eds., Atlas of Protein Sequence and Structure, National Biomedical Research Foundation, pp. 353-358 (1979) (or the EDNAFULL (EMBOSS version of NCBI NUC4.4) substitution matrix); (2) a penalty of 3.0 for each gap and an additional penalty of 0.10 for each symbol in each gap (or a gap opening penalty of 10 and a gap extension penalty of 0.5); and (3) no penalty for end gaps.
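The identity calculation described above can be sketched in Python: a global Needleman-Wunsch alignment, followed by the number of identical aligned symbols divided by the length of the shorter sequence. For brevity, the sketch uses a simple scoring scheme (match +1, mismatch -1, linear gap penalty -1), which is an assumption for illustration. EMBOSS needle instead uses affine gap penalties (by default a gap opening penalty of 10 and an extension penalty of 0.5) together with a substitution matrix such as EBLOSUM62 or EDNAFULL. Its scores, and occasionally its alignments, will therefore differ from this sketch.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -1) -> tuple[str, str]:
    """Global alignment with a linear gap penalty; returns the two gapped strings."""
    n, m = len(a), len(b)

    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a: str, b: str) -> float:
    """Identical aligned symbols divided by the length of the shorter sequence."""
    if not a or not b:
        raise ValueError("sequences must be non-empty")
    aln_a, aln_b = needleman_wunsch(a, b)
    identical = sum(1 for x, y in zip(aln_a, aln_b) if x == y and x != "-")
    return 100.0 * identical / min(len(a), len(b))


print(needleman_wunsch("ACGT", "AGT"))   # ('ACGT', 'A-GT')
print(percent_identity("ACGT", "AGT"))   # 100.0
```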
  • in a further aspect, the host cell includes both wild-type unmodified host cells and host cells in which genetic modification has occurred naturally or “transformation” has occurred artificially.
  • the host cell described herein may be a microorganism.
  • the microorganism (or strain) includes both wild-type unmodified microorganisms and microorganisms in which genetic modification has occurred naturally or “transformation” has occurred artificially.
  • the term “unmodified host cell” or “unmodified microorganism” does not exclude strains containing mutations that may occur naturally in the host cells or microorganisms, and may refer to a wild-type strain or a natural strain itself, or a strain before the trait is changed by genetic mutation due to natural or artificial factors.
  • the unmodified microorganism may refer to a strain in which the activity of the protein described herein is not enhanced or has not yet been enhanced.
  • the “unmodified microorganism” may be used interchangeably with “strain before modification”, “microorganism before modification”, “unmutated strain”, “unmodified strain”, “unmutated microorganism”, or “reference microorganism”.
  • the term “transformation” refers to introducing a vector including a polynucleotide encoding a target polypeptide into a host cell or microorganism so that the polypeptide encoded by the polynucleotide can be expressed in the host cell.
  • the transformed polynucleotide may be either inserted into and located in the chromosome of the host cell or located outside the chromosome, as long as it can be expressed in the host cell.
  • the polynucleotide includes DNA and/or RNA encoding a target polypeptide.
  • the polynucleotide may be introduced in any form as long as it can be introduced into and expressed in a host cell.
  • the polynucleotide may be introduced into a host cell in the form of an expression cassette, which is a gene construct including all elements required for self-expression.
  • the expression cassette may usually include a promoter operably linked to the polynucleotide, a transcription termination signal, a ribosome binding site, and a translation termination signal.
  • the expression cassette may be in the form of an expression vector capable of self-replication.
  • the polynucleotide may be introduced into a host cell in its own form and be operably linked to a sequence required for expression in the host cell, but is not limited thereto.
  • the engineered host cell or microorganism described herein may comprise one, two, three or more vectors, each comprising a different nucleotide sequence. In some embodiments, the engineered host cell or microorganism described herein may comprise one, two, three or more vectors, each comprising a nucleotide sequence encoding a different amino acid molecule.
  • Artificially modified host cells or microorganisms refer to host cells or microorganisms in which a specific mechanism is weakened or enhanced by causes such as insertion of an exogenous gene or enhancement or inactivation of the activity of an endogenous gene, containing genetic modification to produce a desired polypeptide, protein, or product.
  • Engineered host cells, microorganisms or strains in accordance with some embodiments herein may be transformed to express recombinant nucleic acid sequences and/or produce recombinant amino acid molecules or argininosuccinate synthase proteins.
  • the present disclosure provides a host cell, microorganism or strain expressing an argininosuccinate synthase disclosed herein.
  • the host cell or microorganism may be an engineered host cell or microorganism, in which the host cell or microorganism has been transformed, modified, or mutated artificially by techniques known to those of skill in the art to yield a host cell or microorganism that does not occur in nature.
  • an amino acid sequence for the amino acid molecule or argininosuccinate synthase described herein is extrinsic to the host cell.
  • an amino acid molecule or an argininosuccinate synthase disclosed herein is extrinsic to the host cell.
  • the "extrinsic" sequence means that the sequence does not occur in the host cell without any modification, and in particular, without any artificial modification.
  • the host cell may be modified to produce an amino acid molecule or an argininosuccinate synthase of another organism or a non-naturally occurring recombinant amino acid molecule or argininosuccinate synthase that does not naturally occur in the host cell (i.e., without any modification).
  • an amino acid sequence for an amino acid molecule or argininosuccinate synthase described herein is an intrinsic amino acid sequence of the microorganism.
  • the "intrinsic" sequence means that the sequence can occur in the host cell without any modification.
  • the host cell may be modified to produce a recombinant amino acid molecule or an argininosuccinate synthase having an intrinsic sequence of the host cell.
  • a nucleotide sequence encoding the argininosuccinate synthase protein is an extrinsic nucleotide sequence of the host cell. In further embodiments of the host cell according to the present disclosure, a nucleotide sequence encoding the argininosuccinate synthase protein is an intrinsic nucleotide sequence of the host cell.
  • the recombinant or modified host cell may be a host cell having increased L-arginine producing ability compared to a natural wild-type host cell, because the activity of the protein described herein or a polynucleotide encoding the same is improved in the recombinant or modified host cell compared to that in the natural wild-type host cell, but is not limited thereto.
  • the L-arginine-producing recombinant or modified host cell may be, e.g., of the genus Corynebacterium.
  • the L-arginine producing ability of the recombinant or modified host cell may be increased by about 0.3% or more, specifically by about 0.5% or more, about 1% or more, about 2% or more, about 3% or more, about 4% or more, about 5% or more, about 6% or more, about 7% or more, about 8% or more, about 9% or more, about 10% or more, about 10.7% or more, about 11% or more, about 12% or more, about 12.3% or more, about 13% or more, about 14% or more, about 14.4% or more, about 15% or more, about 15.1% or more, about 16% or more, or about 16.9% or more (the upper limit thereof is not particularly limited and the producing ability may be increased by, for example, about 200% or less, about 150% or less, about 100% or less, about 50% or less, about 40% or less, about 30% or less, or about 20% or less) compared to the L-arginine producing ability of the parent strain before mutation or unmodified host cell.
  • the L-arginine production by the engineered host cell described herein is increased by at least about 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40 or 50% and/or by about 50, 49, 48, 47, 46, 45, 44, 43, 42, 41 or 40% or less.
  • the producing ability is not limited thereto, as long as it is increased (i.e., the change is a positive value) compared to the producing ability of the parent strain before mutation or an unmodified host cell thereof.
  • the parent strain may be a cell prior to transforming an amino acid molecule described herein.
  • the L-arginine producing ability of the host cell having the increased producing ability may be increased by about 1.005 times or more, about 1.01 times or more, about 1.02 times or more, about 1.03 times or more, about 1.04 times or more, about 1.05 times or more, about 1.06 times or more, about 1.07 times or more, about 1.08 times or more, about 1.09 times or more, about 1.10 times or more, about 1.107 times or more, about 1.11 times or more, about 1.12 times or more, about 1.123 times or more, about 1.13 times or more, about 1.14 times or more, about 1.144 times or more, about 1.15 times or more, about 1.151 times or more, about 1.16 times or more, about 1.169 times or more, 1.2 times or more, 1.3 times or more, 1.4 times or more (the upper limit thereof is not particularly limited and the producing ability may be increased by, for example, about 10 times or less, about 5 times or less, about 3 times or less, or about 2 times or less) compared to the L-arginine producing ability of
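The percentage increases and the fold increases listed above express the same comparison: an increase of about 16.9% corresponds to about 1.169 times the parent strain's producing ability, since fold = 1 + percent / 100. A short Python sketch of the conversion follows. The titers are illustrative numbers only and are not data from the disclosure.

```python
def percent_increase(engineered: float, parent: float) -> float:
    """Relative increase of the engineered strain over the parent strain, in percent."""
    if parent <= 0:
        raise ValueError("parent producing ability must be positive")
    return 100.0 * (engineered - parent) / parent


def fold_change(engineered: float, parent: float) -> float:
    """Producing ability of the engineered strain as a multiple of the parent strain."""
    if parent <= 0:
        raise ValueError("parent producing ability must be positive")
    return engineered / parent


# Illustrative (hypothetical) titers, in the same units for both strains.
parent_titer, engineered_titer = 10.0, 11.69
print(round(percent_increase(engineered_titer, parent_titer), 1))  # 16.9
print(round(fold_change(engineered_titer, parent_titer), 3))       # 1.169
```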
  • the term “about” is relative to the actual value stated, as will be appreciated by those of skill in the art, and allows for approximations, inaccuracies and limits of measurement under the relevant circumstances.
  • the terms “about,” “substantially,” and “approximately” may provide an industry-accepted tolerance for their corresponding terms and/or relativity between items, such as a tolerance of from less than one percent to ten percent of the actual value stated, and other suitable tolerances.
  • the actual value stated may, for example, be that of lengths of nucleotide sequences, degrees of errors, dimensions, quantity of an ingredient in a composition, concentrations, volumes, process temperature, process time, yields, flow rates, pressures, and like and/or ranges thereof.
  • the tolerance may occur, for example, through typical measuring and handling procedures used for making compounds, compositions, concentrates or use formulations; through inadvertent error in these procedures; through differences in the manufacture, source, or purity of starting materials or ingredients used to carry out the methods; and like considerations.
  • the term “about” also encompasses amounts that differ due to aging of, for example, a composition, formulation, or cell culture with a particular initial concentration or mixture, and amounts that differ due to mixing or processing a composition or formulation with a particular initial concentration or mixture. Whether or not modified by the term “about,” the claims appended hereto include equivalents to these quantities.
  • the term “about” further may refer to a range of values that are similar to the stated reference value. In certain embodiments, the term “about” refers to a range of values that fall within, for example, 50, 25, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1 percent or less of the stated reference value.
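Read as a tolerance, "about" a stated value means within some percentage of that value, for example within 10%. The following minimal Python check illustrates that reading. The tolerance is a parameter chosen from the ranges given above, not a single value fixed by the disclosure, and the function name is hypothetical.

```python
def is_about(value: float, stated: float, tolerance_pct: float = 10.0) -> bool:
    """True if `value` lies within +/- tolerance_pct percent of `stated`."""
    return abs(value - stated) <= abs(stated) * tolerance_pct / 100.0


print(is_about(33.0, 30.0))        # True  (exactly 10% above)
print(is_about(34.0, 30.0))        # False
print(is_about(30.2, 30.0, 1.0))   # True  (within 1%)
```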
  • the present disclosure provides an engineered host cell expressing a recombinant amino acid molecule, including a recombinant protein that is an argininosuccinate synthase.
  • the amino acid molecule has an amino acid sequence comprising SEQ ID NO: 1 disclosed herein.
  • the engineered host cell described above may be an engineered microorganism.
  • the engineered microorganism is Corynebacterium.
  • the engineered microorganism is Corynebacterium glutamicum.
  • the engineered microorganism is Corynebacterium stationis.
  • the engineered microorganism is Corynebacterium crudilactis.
  • the engineered microorganism is Corynebacterium deserti.
  • the engineered microorganism is Corynebacterium efficiens.
  • the engineered microorganism is Corynebacterium callunae.
  • the engineered microorganism is Corynebacterium singulare. In some embodiments, the engineered microorganism is Corynebacterium halotolerans. In some embodiments, the engineered microorganism is Corynebacterium striatum. In some embodiments, the engineered microorganism is Corynebacterium ammoniagenes. In some embodiments, the engineered microorganism is Corynebacterium pollutisoli. In some embodiments, the engineered microorganism is Corynebacterium imitans. In some embodiments, the engineered microorganism is Corynebacterium testudinoris.
  • the engineered microorganism is Corynebacterium flavescens. In some embodiments, the engineered microorganism is Corynebacterium crenatum. In some embodiments, the engineered microorganism is Corynebacterium suranareeae. In some embodiments, the engineered microorganism is Escherichia. In some embodiments, the engineered microorganism is E. coli.
  • the engineered host cell produces L-arginine. In some embodiments, the engineered microorganism produces L-arginine.
  • the engineered host cell or microorganism is transformed with an argR gene comprising a sequence having at least 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99 or 100% sequence identity to SEQ ID NO: 39.
  • the engineered host cell or microorganism is transformed with an argB gene carrying the M54V mutation and comprising a sequence having at least 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99 or 100% sequence identity to SEQ ID NO: 40.
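Substitutions such as M54V use the usual notation: wild-type residue, 1-based position, and replacement residue. Here, methionine at position 54 is replaced by valine. The Python helper below is a hypothetical illustration and is not taken from the disclosure. It applies such a substitution to a protein sequence after checking the wild-type residue. The test sequence is a placeholder, not an actual ArgB sequence.

```python
import re


def apply_substitution(protein: str, mutation: str) -> str:
    """Apply a substitution written as <wild-type><position><mutant>, e.g. 'M54V'.

    Positions are 1-based; the wild-type residue is checked before replacement."""
    m = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if m is None:
        raise ValueError(f"unrecognized mutation string: {mutation!r}")
    wt, pos, mut = m.group(1), int(m.group(2)), m.group(3)
    if not 1 <= pos <= len(protein):
        raise ValueError("position outside the sequence")
    if protein[pos - 1] != wt:
        raise ValueError(f"expected {wt} at position {pos}, found {protein[pos - 1]}")
    return protein[:pos - 1] + mut + protein[pos:]


placeholder = "A" * 53 + "M" + "A" * 10   # hypothetical sequence with M at position 54
print(apply_substitution(placeholder, "M54V")[53])  # V
```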
  • the recombinant protein has an extrinsic amino acid sequence of the host cell or microorganism. In other embodiments, the recombinant protein has an intrinsic amino acid sequence of the host cell or microorganism.
  • the present disclosure provides use of the engineered host cell or microorganism disclosed herein for producing L-arginine.
  • a method for producing L-arginine comprises culturing an engineered host cell expressing a recombinant protein disclosed herein.
  • the host cell is cultured in a medium.
  • the term “culture” means growing the host cell described herein under properly controlled environmental conditions.
  • the culture process according to some embodiments of the present disclosure may be performed under appropriate medium and culture conditions known in the art. Such a culture process may be easily adjusted and used by those skilled in the art depending on the selected host cell.
  • the culture may be batch, continuous and/or fed-batch culture, but is not limited thereto.
  • the term “medium” refers to a material in which nutrients required for culturing the host cell described herein are mixed as a main component, and which supplies nutrients and growth factors, including water, that are essential for survival and growth.
  • any medium may be used without particular limitation as long as it is a medium used for culturing conventional host cells, but the host cell described herein may be cultured in a conventional medium containing appropriate carbon sources, nitrogen sources, phosphorus sources, inorganic compounds, amino acids, vitamins and/or the like under an aerobic condition while the temperature, pH, and the like are adjusted.
  • a culture medium for host cells of the genus Corynebacterium may be found in the literature [“Manual of Methods for General Bacteriology” by the American Society for Bacteriology (Washington D.C., USA, 1981)].
  • compounds such as ammonium hydroxide, potassium hydroxide, ammonia, phosphoric acid, sulfuric acid, and the like may be added to the medium in an appropriate manner to adjust the pH of the medium.
  • an antifoaming agent such as fatty acid polyglycol ester may be used to suppress bubble formation.
  • Oxygen or oxygen-containing gas may be injected into the medium in order to maintain the aerobic state of the medium; or gas may not be injected or nitrogen, hydrogen or carbon dioxide gas may be injected in order to maintain the anaerobic and microaerobic conditions, but the control of atmosphere is not limited thereto.
  • the culturing is performed with at least one source of carbon.
  • suitable sources of carbon include, but are not limited to, carbohydrates, sugar alcohols, organic acids, and amino acids.
  • Carbohydrates may include glucose, saccharose, lactose, fructose, sucrose, and maltose, among others.
  • Sugar alcohols may include mannitol and sorbitol, among others.
  • Organic acids may include pyruvic acid, lactic acid, and citric acid, among others; amino acids may include glutamic acid, methionine, and lysine, among others.
  • the culturing is performed with at least one source of nitrogen.
  • suitable sources of nitrogen include inorganic nitrogen sources, organic nitrogen sources, and combinations thereof.
  • Inorganic nitrogen sources include, but are not limited to, ammonia, ammonium sulfate, ammonium chloride, ammonium phosphate, and ammonium nitrate.
  • Organic nitrogen sources include, but are not limited to, organic acids including ammonium acetate and ammonium carbonate; amino acids such as glutamic acid, methionine, and glutamine; and other sources of organic nitrogen including peptone, NZ-amine, meat extract, yeast extract, malt extract, corn steep liquor, casein hydrolysates, fish or decomposition products thereof, and defatted soybean cake or decomposition products thereof. These nitrogen sources may be used singly or in combination of two or more, but the manner of use is not limited thereto.
  • the culturing is performed with at least one of monobasic potassium phosphate, dibasic potassium phosphate, monobasic sodium phosphate, dibasic sodium phosphate, sodium chloride, calcium chloride, iron chloride, magnesium sulfate, iron sulfate, manganese sulfate, calcium carbonate, and the like. Additionally, amino acids, vitamins, suitable precursors and/or the like may be added to the medium.
  • These sources of carbon, sources of nitrogen, additives, or precursors thereof may be added to the medium batchwise or continuously. However, the manner of addition is not limited thereto.
  • Another embodiment of the method of producing L-arginine as disclosed herein provides that the culturing is performed with at least one selected from the group consisting of potassium phosphate, dipotassium phosphate, a sodium-containing salt corresponding thereto, sodium chloride, calcium chloride, iron chloride, magnesium sulfate, iron sulfate, manganese sulfate, and calcium carbonate.
  • the culturing is performed at a temperature maintained from about 20 °C to about 45 °C. In some embodiments, the culturing is performed at a temperature maintained from about 25 °C to about 40 °C. In some embodiments, the culturing is performed at a temperature maintained from about 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35 or 36 °C to about 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44 or 45 °C. In still further embodiments, the culturing may be conducted for about 10 to about 160 hours. In still further embodiments, the culturing may be conducted for a period from about 10, 20, 30, 40, 50, 60, 70, 80, 90, 100 or 110 hours to about 100, 110, 120, 130, 140, 150 or 160 hours. However, the culture conditions are not limited thereto.
  • Another aspect of the present disclosure provides a method of increasing L-arginine production by a host cell.
  • the method comprises transforming the host cell to produce an engineered host cell expressing a recombinant amino acid molecule disclosed herein.
  • the method further includes preparing the host cell described herein, preparing a medium for culturing the host cell, or a combination thereof (in any order), for example, before the culturing step.
  • L-arginine produced by the culture according to some embodiments of the present disclosure may be secreted into the medium or may remain in the cells.
  • in the method for producing L-arginine comprising culturing an engineered host cell expressing a recombinant amino acid molecule such as, for example, a recombinant argininosuccinate synthase as set forth herein, the method further includes recovering L-arginine from the medium after the culturing step (the medium subjected to culture) or from the cultured host cell. The recovery step may be performed after the culture step.
  • the production method further comprises recovering L-arginine from the cultured engineered host cell and/or from the medium in which the engineered host cell is cultured.
  • Suitable methods for recovering the L-arginine include protocols known in the art according to the method for culturing a host cell described herein, for example, a batch, continuous or fed-batch culture method.
  • centrifugation, filtration, treatment with a crystallized protein precipitating agent (salting-out method)
  • extraction, sonication
  • ultrafiltration, dialysis
  • various kinds of chromatography, such as molecular sieve chromatography (gel filtration), adsorption chromatography, ion-exchange chromatography, and affinity chromatography; HPLC; or any combination thereof
  • the desired L-arginine may be recovered from the medium or host cell using a suitable method known in the art.
  • the production method further comprises a purification step.
  • the purification may be performed using a suitable method known in the art.
  • the recovery step and the purification step may be performed continuously or discontinuously in any order, or may be performed simultaneously or by being integrated into one step, but the manner of performance is not limited thereto.
  • the protein, polynucleotide, vector, host cell and the like are as described in the other aspects herein.
  • compositions for L-arginine production comprising a microorganism of the genus Corynebacterium; a medium in which the microorganism has been cultured; or a combination thereof.
  • the composition according to some embodiments of the present disclosure may further contain arbitrary suitable excipients commonly used in compositions for L-arginine production, and such excipients may include, for example, but are not limited to, preservatives, wetting agents, dispersing agents, suspending agents, buffering agents, stabilizing agents, and isotonic agents.
  • Another aspect provides a use of the L-arginine product characterized by one or more elements disclosed in the application.
  • a further aspect provides a use of the method characterized by one or more elements disclosed in the application.
  • the amino acid residues of the Corynebacterium glutamicum argininosuccinate synthase (SEQ ID NO: 28) that are essential for binding to substrates such as L-citrulline, aspartate, and ATP were identified through protein sequence alignment between that enzyme and a heterologous argininosuccinate synthase with a known structure. Although the tertiary structure of the Corynebacterium glutamicum argininosuccinate synthase has not been determined, the tertiary crystal structure and active-site residues of the argininosuccinate synthase derived from Escherichia coli (SEQ ID NO: 30) have been reported.
  • the argininosuccinate synthases of Acidobacterium capsulatum, Alcaligenes faecalis, Burkholderia pyrrocinia, Cupriavidus necator, Escherichia coli, and Neisseria weaveri have a 5-amino-acid extension around essential residues in the binding site, which is not present in Corynebacterium glutamicum (e.g., see FIG. 2 and SEQ ID NO: 23, SEQ ID NO: 24, SEQ ID NO: 26, SEQ ID NO: 29, SEQ ID NO: 30, and SEQ ID NO: 32, respectively).
  • the site where the amino acid extension occurs is located at the entrance of the active site in the tertiary structure of the protein and is particularly adjacent to the substrate binding site. Additionally, this amino acid extension may be involved in the formation of a helical secondary structure. In the Corynebacterium glutamicum-derived protein, where the extension is absent, the corresponding site exists as a loop without a specific secondary structure, whereas in the E. coli-derived protein, where the extension is present, the site forms an alpha helix. Furthermore, the distance between the corresponding region and the substrate may increase in the tertiary structure of the protein due to the difference in sequence between this extended portion and the portion preceding it.
  • the active site entrance may be expanded due to the formation of this secondary structure, which may facilitate substrate access to the active site and thereby contribute to improved catalytic efficiency of the protein.
  • Argininosuccinate synthase from Escherichia coli may be functional as a tetramer.
  • the amino acid extension may be located near the interface between argininosuccinate synthase monomers and may thus affect tetramer formation.
  • a protein percent identity matrix for the ten kinds of argininosuccinate synthase was also generated using the Clustal 2.1 program (Madeira, Fabio et al., Nucleic acids research 50(W1) W276-W279, 2022) and shown in Table 2 below.
  • argininosuccinate synthases having the amino acid extension were included, and the protein sequences of 20 argininosuccinate synthases having a 5 amino acid extension at the corresponding position were additionally collected using the NCBI BLAST program (Altschul, S F et al., Nucleic acids research 25(17) 3389-402, 1997) and included (SEQ ID NO: 1 through SEQ ID NO: 16). These sequences were searched for motifs using the MEME program, a protein motif search program (Bailey, T L, and C Elkan. Proceedings. International Conference on Intelligent Systems for Molecular Biology 2, 28-36, 1994).
  • MOTIF 01 [ILV]Px[AEQ][KR][IV]G[IL]AFSGGLDTS[ATV]A[ILV] (SEQ ID NO: 6)
    MOTIF 02 [KN]GAx[CP][CY][AST]YTA[DN][IL]GQ[APY]DE (SEQ ID NO: 7)
    MOTIF 03 [AS][RV][IL][IV][ED][CG][KR]xL[AV] (SEQ ID NO: 8)
    MOTIF 04 [GN]AF[HN]xx[ST][AG]G (SEQ ID NO: 9)
    MOTIF 05 YFNTTP[IL]GRAV[AT][AG]T[LM]LV (SEQ ID NO: 10)
    MOTIF 06 T[FY]KGNDIERF (SEQ ID NO: 11)
    MOTIF 07 YRYGL[LV][ATV]N (SEQ ID NO: 12)
    MOTIF 08 YKPWLDxxF[IV]XEL (
  • vectors were prepared, each having an argG gene encoding the argininosuccinate synthase enzyme as disclosed herein.
  • the argG genes were each derived from Acidobacterium capsulatum (SEQ ID NO:33), Alcaligenes faecalis (SEQ ID NO:34), Burkholderia pyrrocinia (SEQ ID NO:35), Bacillus amyloliquefaciens (SEQ ID NO:41), Corynebacterium ammoniagenes (SEQ ID NO:42), Corynebacterium glutamicum (SEQ ID NO:43), Cupriavidus necator (SEQ ID NO:36), Escherichia coli (SEQ ID NO:37), Mycobacterium smegmatis (SEQ ID NO:44), and Neisseria weaveri (SEQ ID NO:38).
  • each argG gene can be inserted into the chromosome of Corynebacterium glutamicum by homologous recombination, particularly at the BBD29_RS08210 site, and expressed under the constitutive Po2 promoter (Korean Patent No. 10-1632642).
  • Two DNA fragments containing a promoter and homologous regions targeting BBD29_RS08210 were prepared.
  • One DNA fragment contained the 5'-homologous region targeting BBD29_RS08210 and a promoter; the other DNA fragment contained the 3'-homologous region targeting BBD29_RS08210.
  • PCR was performed using the genomic DNA of Corynebacterium glutamicum ATCC13869 as a template and the primer pairs (SEQ ID NOS: 45 and 46, and SEQ ID NOS: 47 and 48) shown in Table 4 to obtain DNA fragments comprising, respectively, a 5'-homologous region of BBD29_RS08210 with a Po2 promoter and a 3'-homologous region of BBD29_RS08210 (hereinafter, "BBD29_RS08210 5'-DNA fragment" and "BBD29_RS08210 3'-DNA fragment").
  • the PCR was performed as follows: 30 cycles of denaturation at 95°C for 30 seconds, annealing at 55°C for 30 seconds, and elongation at 72°C for 2 minutes. Then, a DNA fragment of the argG gene was prepared.
  • PCR was performed using the primer pairs shown in Table 4 (SEQ ID NOS: 49 and 50 with the genomic DNA of Acidobacterium capsulatum strain KACC 14500, SEQ ID NOS: 51 and 52 with the genomic DNA of Bacillus amyloliquefaciens strain KACC 12067, SEQ ID NOS: 53 and 54 with the genomic DNA of Burkholderia pyrrocinia strain KACC 12018, SEQ ID NOS: 55 and 56 with the genomic DNA of Corynebacterium ammoniagenes strain ATCC 6872, SEQ ID NOS: 57 and 58 with the genomic DNA of Corynebacterium glutamicum strain ATCC 13869, SEQ ID NOS: 59 and 60 with the genomic DNA of Cupriavidus necator strain KCTC 22469, SEQ ID NOS: 61 and 62 with the genomic DNA of Escherichia coli K-12 substrain MG1655, SEQ ID NOS: 63 and 64 with the genomic DNA of Mycobacterium smegmati
  • the genomic DNAs used were distributed by KCTC and KACC.
  • the PCR was performed as follows: 30 cycles of denaturation at 95°C for 30 seconds, annealing at 55°C for 30 seconds, and elongation at 72°C for 2 minutes.
  • DNA fragments of the argG genes (hereinafter, argG (A. capsulatum), argG (B. amyloliquefaciens), argG (B. pyrrocinia), argG (C. ammoniagenes), argG (C. glutamicum), argG (C. necator), argG (E. coli), argG (M. smegmatis), and argG (N. weaveri)) were obtained.
  • the obtained DNA fragments and the linearized vectors were cloned.
  • the linearized vectors were prepared by treating pDCM2 (Korean Patent Application Publication No. 10-2020-0136813), which cannot replicate in Corynebacterium glutamicum, with the SmaI restriction enzyme.
  • the thus-obtained BBD29_RS08210 5'-DNA fragment, BBD29_RS08210 3'-DNA fragment, and each of the DNA fragments of argG gene were cloned by fusion.
  • the fusion cloning was performed using the In-Fusion® HD Cloning Kit (Clontech), and the resulting clones were transformed into E.
  • a Corynebacterium glutamicum strain CJR2 having the ability to produce L-arginine was prepared by serially introducing two mutations (ΔargR, argB (M54V)) into wild-type Corynebacterium glutamicum ATCC13869 (Ikeda, Masato et al., Applied and Environmental Microbiology 75(6) 1635-41, 2009).
  • vectors introducing argR deletion (SEQ ID NO:39) and argB (M54V) mutation (SEQ ID NO: 40) were prepared. See FIG. 6.
  • PCR was performed using the genomic DNA of Corynebacterium glutamicum ATCC13869 as a template and the primer pairs shown in Table 5 (SEQ ID NOS: 67 and 68, and SEQ ID NOS: 69 and 70), and overlapping PCR was performed using the primer pair of SEQ ID NOS: 67 and 70 so as to obtain homologous recombination fragments having a sequence of the argR deletion mutation (SEQ ID NO:39).
  • PCR was performed using the primer pairs shown in Table 5 (SEQ ID NOS: 71 and 72, and SEQ ID NOS: 73 and 74), and overlapping PCR was performed using SEQ ID NOS: 71 and 74 to obtain homologous recombination fragments having a sequence of the argB (M54V) mutation (SEQ ID NO:40).
  • the PCR was performed as follows: 30 cycles of denaturation at 95°C for 30 seconds, annealing at 55°C for 30 seconds, and elongation at 72°C for 2 minutes.
  • to introduce an argR deletion mutation, wild-type Corynebacterium glutamicum ATCC13869 was transformed by the electric pulse method with the pDCM2-ΔargR plasmid prepared above (van der Rest et al., Appl Microbiol Biotechnol 52:541-545, 1999). Subsequently, secondary recombination was performed on a solid plate medium containing 4% sucrose, followed by PCR using a primer pair (SEQ ID NOS: 67 and 70) on the transformed strains in which secondary recombination was complete, thereby confirming that a deletion mutation was introduced into the argR gene on the chromosome. In particular, the PCR was performed as follows: 30 cycles of denaturation at 95°C for 30 seconds, annealing at 55°C for 30 seconds, and elongation at 72°C for 2 minutes. The transformed strain was named CJR1.
  • the argB (M54V) mutation was introduced into the Corynebacterium glutamicum CJR1 in the same manner as above.
  • the pDCM2-argB (M54V) plasmid prepared above was used, and introduction of the M54V mutation into the argB gene on the chromosome was confirmed by PCR using a primer pair (SEQ ID NOS: 71 and 74) on the transformant in which secondary recombination was complete.
  • the transformed strain was named CJR2.
  • the solid plate medium was as follows: Composite Plate Medium (pH 7.0), glucose 10 g, peptone 10 g, beef extract 5 g, yeast extract 5 g, brain heart infusion 18.5 g, NaCl 2.5 g, urea 2 g, sorbitol 91 g, agar 20 g (per 1 L of distilled water).
  • Example 3-2 Preparation of strains introduced with argininosuccinate synthase (argG) based on CJR2 strain
  • Example 3-2 In order to compare the L-arginine-producing ability of the argG-introduced CJR2 strains prepared in Example 3-2 (i.e., CJR2-argG (A. capsulatum), CJR2-argG (B. amyloliquefaciens), CJR2-argG (B. pyrrocinia), CJR2-argG (C. ammoniagenes), CJR2-argG (C. glutamicum), CJR2-argG (C. necator), CJR2-argG (E. coli), CJR2-argG (M. smegmatis), and CJR2-argG (N. weaveri)), the strains were cultured by the method described below and the concentration of L-arginine in the culture medium was analyzed.
  • Corynebacterium glutamicum CJR2 (i.e., the parent strain)
  • the strains prepared in Example 3-2 were each inoculated into a 250 mL corner-baffle flask containing 25 mL of a production medium and then cultured with shaking at 200 rpm at 30°C for 44 hours.
  • The composition of the production medium is as shown below.
  • smegmatis not including the 5-amino-acid extension were introduced, were each measured to be 4.9 g/L and 0.7 g/L; therefore, compared to the former group, the L-arginine concentration was 23% lower on average, and the L-citrulline concentration was 350% higher on average. The average yield improvement of these strains compared to their parent strain was 3.3%. This indicates that the argininosuccinate synthase having the amino acid extension is relatively excellent in terms of catalytic efficiency compared to the enzymes not including the amino acid extension.
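The PCR program repeated in the steps above (30 cycles of 95 °C for 30 seconds, 55 °C for 30 seconds, and 72 °C for 2 minutes) can be written down as data, which makes the total cycling time easy to check. The following is a minimal Python sketch. The `Step` class and `cycling_seconds` function are illustrative names, not part of the disclosure. Ramp times, initial denaturation, and final extension are not stated in the text, so they are left out of the total.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One hold in a thermal-cycling program (illustrative helper)."""
    name: str
    temp_c: float
    seconds: int


# 30 cycles of 95 °C/30 s, 55 °C/30 s, 72 °C/2 min, as stated in the examples.
CYCLE = (
    Step("denaturation", 95.0, 30),
    Step("annealing", 55.0, 30),
    Step("elongation", 72.0, 120),
)
N_CYCLES = 30


def cycling_seconds(cycle: tuple[Step, ...], n_cycles: int) -> int:
    """Total hold time of the cycling phase in seconds.

    Ramp times, an initial denaturation, and a final extension are not
    stated in the text, so they are not included here.
    """
    return n_cycles * sum(step.seconds for step in cycle)


if __name__ == "__main__":
    total = cycling_seconds(CYCLE, N_CYCLES)
    print(f"{total} s = {total / 60:.0f} min")  # 5400 s = 90 min
```

Each cycle holds for 180 seconds, so the cycling phase alone takes about 90 minutes. The actual run on an instrument will be longer once ramping is included.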


Abstract

The present disclosure provides an amino acid molecule, host cells expressing the amino acid molecule, use of host cells to produce L-arginine, nucleic acid molecules with sequences encoding the amino acid molecule, vectors comprising nucleic acid molecules with sequences encoding the amino acid molecule, methods of producing L-arginine, and methods for increasing L-arginine production by a host cell.

Description

RECOMBINANT AMINO ACID MOLECULE, HOST CELLS FOR PRODUCING L-ARGININE, AND METHODS FOR PRODUCING L-ARGININE USING THE SAME
The present disclosure relates to improved amino acid molecules, host cells expressing the amino acid molecules and use thereof to produce L-arginine, nucleic acid molecules with sequences encoding the amino acid molecule, vectors comprising nucleic acid molecules with sequences encoding the amino acid molecule, and methods of producing L-arginine and for increasing L-arginine production by a host cell expressing the amino acid molecule.
L-arginine is an amino acid with extensive medical, food, animal feed, and industrial applications. L-arginine is produced by the body and acts, for example, as a vasodilator. L-arginine may also be used as a feed additive. Currently, various studies are being conducted to develop host cells and fermentation process technology that produce L-arginine at a high efficiency.
One object of the present disclosure is to provide an amino acid molecule such as, for example, an amino acid molecule protein disclosed herein.
Another object of the present disclosure is to provide a host cell or microorganism expressing the amino acid molecule disclosed herein.
Another object of the present disclosure is to provide a nucleic acid molecule comprising a nucleic acid sequence encoding the amino acid molecule disclosed herein.
Another object of the present disclosure is to provide a vector comprising a nucleic acid molecule disclosed herein.
Another object of the present disclosure is to provide an engineered host cell expressing a recombinant amino acid molecule, including a recombinant protein that is an argininosuccinate synthase.
Another object of the present disclosure is to provide use of the engineered host cell or microorganism disclosed herein for producing L-arginine.
Another object of the present disclosure is to provide a method for producing L-arginine disclosed herein.
Figure 1 depicts a protein sequence alignment of exemplary argininosuccinate synthase proteins from E. coli and Corynebacterium glutamicum, used to determine the amino acid residues essential for substrate binding by argininosuccinate synthase.
Figures 2a to 2c depict a sequence alignment carried out on exemplary argininosuccinate synthase proteins of different heterogeneous model microorganism species to determine the non-conserved structures of the substrate binding site.
Figure 3 depicts an exemplary tertiary structure of an argininosuccinate synthase protein without the amino acid extension (derived from C. glutamicum) overlaid with the tertiary structure of an argininosuccinate synthase protein with the amino acid extension (derived from E. coli), to compare the structures and predict the role of the amino acid extension in protein function.
Figures 4a to 4c list identified exemplary motifs amongst the argininosuccinate synthase proteins of the heterogeneous model microorganisms for which sequence alignment was carried out, along with generalized versions of the motif sequences.
Figures 5a to 5f depict the exemplary DNA sequences for the argG gene encoding the argininosuccinate synthase proteins of different heterogeneous model microorganism species.
Figure 6 depicts the exemplary DNA sequences for the argR deletion mutation and the argB (M54V) mutation.
[Detailed Description]
The amino acid molecules, including argininosuccinate synthase proteins, host cells, nucleic acid molecules, vectors, and methods described herein may employ, unless otherwise indicated, conventional techniques and descriptions of molecular biology (including recombinant techniques), cell biology, biochemistry, and microarray and sequencing technology, which are within the skill of those who practice in the art. Such conventional techniques include polymerase chain reaction (PCR), protein sequence alignment, and sequencing of proteins and oligonucleotides. Before the present compositions, research tools, and methods are described, it is to be understood that this invention is not limited to the specific methods, compositions, targets and uses described herein, as those may vary. It is also to be understood that the terminology used herein is for the purpose of describing particular aspects only and is not intended to limit the scope of the present invention, which will be limited only by the appended claims.
In some embodiments, the present disclosure relates to an improved amino acid molecule including, for example, an improved argininosuccinate synthase protein, which is an enzyme responsible for the rate-limiting reaction step in the process of producing L-arginine. Argininosuccinate synthase catalyzes the synthesis of argininosuccinate from the substrates citrulline and aspartate. Argininosuccinate is then cleaved by argininosuccinate lyase into fumaric acid and L-arginine. L-arginine is an amino acid that is necessary for the production of nitric oxide, which regulates vasodilation, vascular tone, and blood flow. Thus, L-arginine is recognized for its effects as a vasodilator and has been studied as an option for treating high blood pressure, angina, and erectile dysfunction.
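For reference, the two enzymatic steps above can be written as reaction equations. The ATP, AMP, and pyrophosphate terms reflect the well-established stoichiometry of argininosuccinate synthase, and ATP is also listed among the enzyme's substrates elsewhere in this disclosure. Protons are omitted.

```latex
\begin{aligned}
\text{L-citrulline} + \text{L-aspartate} + \text{ATP}
  &\xrightarrow{\text{argininosuccinate synthase}}
  \text{L-argininosuccinate} + \text{AMP} + \text{PP}_\text{i} \\
\text{L-argininosuccinate}
  &\xrightarrow{\text{argininosuccinate lyase}}
  \text{L-arginine} + \text{fumarate}
\end{aligned}
```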
The present disclosure further relates to recombinant and non-naturally occurring amino acid molecules including argininosuccinate synthase proteins, engineered host cells capable of expressing the amino acid molecules and producing L-arginine, recombinant nucleic acid molecules and vectors for expressing the amino acid molecules, and use and methods for employing these proteins, host cells, vectors, and nucleic acid molecules to produce L-arginine or to increase production of L-arginine.
The term "and/or" used herein is used to indicate any combination of the components. Moreover, the singular forms "a," "an," and "the" may further include plural referents unless the context clearly dictates otherwise. Thus, for example, reference to "a nucleotide region" refers to one, more than one, or mixtures of such regions, and reference to "an assay" may include reference to equivalent steps and methods known to those skilled in the art, and so forth. The term "heterologous" when used with reference to portions of a nucleic acid or protein indicates that the nucleic acid or protein comprises two or more subsequences that are not found in the same relationship to each other in nature. For instance, the nucleic acid is typically recombinantly produced, having two or more sequences from unrelated genes arranged to make a new functional nucleic acid, e.g., a promoter from one source and a coding region from another source, or coding regions from different sources. Similarly, a heterologous protein indicates that the protein comprises two or more subsequences that are not found in the same relationship to each other in nature.
The term "conservative amino acid substitutions" means amino acid sequence modifications which do not abrogate the function or a binding site of an enzyme. Conservative amino acid substitutions include the substitution of an amino acid in one class by an amino acid of the same class, where a class is defined by common physicochemical amino acid side chain properties and high substitution frequencies in homologous proteins found in nature, as determined, for example, by a standard Dayhoff frequency exchange matrix or BLOSUM matrix. Six general classes of amino acid side chains have been categorized and include: Class I (Cys); Class II (Ser, Thr, Pro, Ala, Gly); Class III (Asn, Asp, Gln, Glu); Class IV (His, Arg, Lys); Class V (Ile, Leu, Val, Met); and Class VI (Phe, Tyr, Trp). For example, substitution of an Asp for another class III residue such as Asn, Gln, or Glu, is a conservative substitution. Thus, a predicted nonessential amino acid residue in a protein is preferably replaced with another amino acid residue from the same class. Methods of identifying amino acid conservative substitutions which do not eliminate enzyme activity are well-known in the art.
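The six-class rule above can be applied mechanically. The following is a minimal Python sketch that encodes only the class table given in the definition, using one-letter codes. It does not look up Dayhoff or BLOSUM scores, and the function names are illustrative.

```python
# Six side-chain classes from the definition above (one-letter codes).
SIDE_CHAIN_CLASSES = {
    "I": "C",        # Cys
    "II": "STPAG",   # Ser, Thr, Pro, Ala, Gly
    "III": "NDQE",   # Asn, Asp, Gln, Glu
    "IV": "HRK",     # His, Arg, Lys
    "V": "ILVM",     # Ile, Leu, Val, Met
    "VI": "FYW",     # Phe, Tyr, Trp
}

# Invert the table: residue -> class label.
RESIDUE_TO_CLASS = {
    residue: label
    for label, residues in SIDE_CHAIN_CLASSES.items()
    for residue in residues
}


def side_chain_class(residue: str) -> str:
    """Return the class label (I-VI) for a one-letter amino acid code."""
    try:
        return RESIDUE_TO_CLASS[residue.upper()]
    except KeyError:
        raise ValueError(f"unknown residue: {residue!r}") from None


def is_conservative(original: str, replacement: str) -> bool:
    """True if both residues fall in the same side-chain class."""
    return side_chain_class(original) == side_chain_class(replacement)


if __name__ == "__main__":
    print(is_conservative("D", "N"))  # True  (both class III)
    print(is_conservative("D", "K"))  # False (class III vs class IV)
```

The table covers all 20 standard residues exactly once. Whether a particular same-class substitution actually preserves enzyme activity still has to be confirmed experimentally, as the definition notes.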
As used herein, the term "variant" encompasses but is not limited to enzymes which comprise an amino acid sequence which differs from the amino acid sequence of a reference protein by way of one or more substitutions, deletions and/or additions at certain positions within or adjacent to the amino acid sequence of the reference protein. The variant may comprise one or more conservative substitutions in its amino acid sequence as compared to the amino acid sequence of the reference protein. Conservative substitutions may involve, e.g., the substitution of similarly charged or uncharged amino acids. The variant retains the ability, for example, to specifically bind to a substrate of the reference protein. The term variant also includes pegylated proteins.
Nucleic acid sequences implicitly encompass conservatively modified variants thereof (e.g., degenerate codon substitutions) and complementary sequences, as well as the sequence explicitly indicated. Specifically, degenerate codon substitutions may be achieved by generating sequences in which the third position of one or more selected (or all) codons is substituted with mixed-base and/or deoxyinosine residues. Batzer, et al., Nucleic Acid Res. 1991, 19, 5081; Ohtsuka, et al., J. Biol. Chem. 1985, 260, 2605-2608; Rossolini, et al., Mol. Cell. Probes 1994, 8, 91-98. The term nucleic acid is used interchangeably with cDNA, mRNA, oligonucleotide, and polynucleotide.
As used herein, the term "homology" or "identity" refers to the degree of similarity between two given amino acid sequences or nucleotide sequences and may be expressed as a percentage. The terms homology and identity may often be used interchangeably.
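As a percentage, identity is typically computed from an alignment of the two sequences. The following is a minimal Python sketch of one common convention: columns where either sequence has a gap are skipped, and identity is the number of identical columns divided by the number of compared columns. Other tools divide by the alignment length or by the length of the shorter sequence, so values computed this way may differ from those reported by a given program such as Clustal. The function name is illustrative.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two equal-length aligned (gapped) sequences.

    Columns where either sequence has a gap are skipped; identity is
    identical columns / compared columns * 100.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # gap column: not counted under this convention
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


if __name__ == "__main__":
    # SEQ ID NO: 2 vs SEQ ID NO: 3 (no gaps; they differ at two positions).
    print(round(percent_identity("YKPWLDSAFIDEL", "YKPWLDQTFIDEL"), 1))  # 84.6
```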
As used herein, the terms "correspond(s) to" and "corresponding to," as they relate to sequence alignment, are intended to mean enumerated positions within the reference protein, and those positions in the sequence of interest that align with the positions on the reference protein. Thus, when the amino acid sequence of interest is aligned with the amino acid sequence of a reference sequence, the amino acids in the subject sequence that "correspond to" certain enumerated positions of the reference sequence are those that align with these positions of the reference sequence, but are not necessarily in these exact numerical positions of the reference sequence. Methods for aligning sequences for determining corresponding amino acids between sequences are well known in the art.
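The notion of "corresponding" positions can be made concrete by walking through an alignment column by column. The following is a minimal Python sketch, assuming the reference and the sequence of interest are given as equal-length gapped strings from any alignment tool. The function name and the toy sequences are hypothetical. Reference positions that align to a gap, such as residues of a 5-amino-acid extension that is absent in the other sequence, have no corresponding position and are omitted.

```python
def corresponding_positions(aligned_ref: str, aligned_query: str,
                            gap: str = "-") -> dict[int, int]:
    """Map 1-based reference positions to the 1-based query positions
    aligned with them; reference positions aligned to a gap are omitted."""
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have equal length")
    mapping: dict[int, int] = {}
    ref_pos = query_pos = 0
    for r, q in zip(aligned_ref, aligned_query):
        if r != gap:
            ref_pos += 1      # advance only over real reference residues
        if q != gap:
            query_pos += 1    # advance only over real query residues
        if r != gap and q != gap:
            mapping[ref_pos] = query_pos
    return mapping


if __name__ == "__main__":
    # Toy alignment: reference residue 3 corresponds to query residue 2.
    print(corresponding_positions("MKT-AYL", "M-TGAYL"))
    # {1: 1, 3: 2, 4: 4, 5: 5, 6: 6}
```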
The terms "amino acid molecule," "peptide," "polypeptide" and "protein" as used herein, generally refer to a chain of amino acids that are held together by peptide bonds (also called amide bonds).
An "amino acid" as used herein refers to an organic molecule that contains both an amino group (i.e., -NH2) and a carboxylic acid group (i.e., -COOH).
As used herein, the terms "nucleotide molecule," "polynucleotide," and "oligonucleotide" refer to a DNA or RNA strand of a certain length or longer, that is, a polymer of nucleotides in which nucleotide monomers are linked in a long chain by covalent bonds; more specifically, a polynucleotide fragment encoding the protein.
As used herein, the term "enhancement" of polypeptide activity means that the activity of a polypeptide is increased compared to the intrinsic activity.
In addition, the term "operably linked" in the context of a promoter sequence, which initiates and mediates transcription of the polynucleotide encoding the target polypeptide of the present application, means that the promoter sequence and the polynucleotide sequence are functionally linked to each other.
In one aspect, the present disclosure provides an amino acid molecule such as, for example, an amino acid molecule protein. In some embodiments, the amino acid molecule may comprise a peptide sequence of SEQ ID NO:1, YKPWLDX1X2FX3X4EL, in which each of X1, X2, and X4 is any amino acid and X3 is I or V.
In some embodiments, X1 of SEQ ID NO:1 is S. In some embodiments, X1 of SEQ ID NO:1 is Q. In some embodiments, X1 of SEQ ID NO:1 is T.
In some embodiments, X2 of SEQ ID NO:1 is D. In some embodiments, X2 of SEQ ID NO:1 is A. In some embodiments, X2 of SEQ ID NO:1 is T. In some embodiments, X2 of SEQ ID NO:1 is Q.
In some embodiments, X3 of SEQ ID NO:1 is V. In some embodiments, X3 of SEQ ID NO:1 is I.
In further embodiments, X4 of SEQ ID NO:1 is D.
In some embodiments, SEQ ID NO:1 consists of a sequence selected from the group consisting of YKPWLDSAFIDEL (SEQ ID NO: 2), YKPWLDQTFIDEL (SEQ ID NO: 3), YKPWLDQQFIDEL (SEQ ID NO: 4), and YKPWLDTDFIDEL (SEQ ID NO: 5).
In some embodiments, SEQ ID NO: 1 consists of the sequence YKPWLDSAFIDEL (SEQ ID NO: 2). In some embodiments, SEQ ID NO: 1 consists of the sequence YKPWLDQTFIDEL (SEQ ID NO: 3). In some embodiments, SEQ ID NO: 1 consists of the sequence YKPWLDQQFIDEL (SEQ ID NO: 4). In some embodiments, SEQ ID NO: 1 consists of the sequence YKPWLDTDFIDEL (SEQ ID NO: 5).
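SEQ ID NO:1 uses the same kind of notation as the motif list (a wildcard for any residue and a bracketed set such as [IV] for alternatives), so it can be checked with a regular expression. The following is a minimal Python sketch. It treats `x`/`X` as any of the 20 standard residues, and the function and variable names are illustrative. It confirms that SEQ ID NOs: 2 to 5 fit the pattern.

```python
import re

# The 20 standard one-letter amino acid codes, used for wildcard positions.
STANDARD_RESIDUES = "ACDEFGHIKLMNPQRSTVWY"


def motif_to_regex(motif: str) -> re.Pattern:
    """Compile motif notation ('x'/'X' = any standard residue,
    '[IV]' = any one of the bracketed residues) into a regex."""
    parts = []
    i = 0
    while i < len(motif):
        ch = motif[i]
        if ch == "[":
            end = motif.index("]", i)  # raises ValueError if unclosed
            parts.append("[" + motif[i + 1:end] + "]")
            i = end + 1
        elif ch in "xX":
            parts.append("[" + STANDARD_RESIDUES + "]")
            i += 1
        else:
            parts.append(re.escape(ch))
            i += 1
    return re.compile("".join(parts))


# SEQ ID NO: 1, YKPWLDX1X2FX3X4EL: X1, X2, X4 = any residue; X3 = I or V.
SEQ_ID_NO_1 = motif_to_regex("YKPWLDxxF[IV]xEL")

EXAMPLES = {
    2: "YKPWLDSAFIDEL",
    3: "YKPWLDQTFIDEL",
    4: "YKPWLDQQFIDEL",
    5: "YKPWLDTDFIDEL",
}

if __name__ == "__main__":
    for seq_id, peptide in EXAMPLES.items():
        # fullmatch checks the whole peptide; use .search() to scan a protein.
        print(seq_id, SEQ_ID_NO_1.fullmatch(peptide) is not None)  # True each
    print(SEQ_ID_NO_1.fullmatch("YKPWLDSAFLDEL") is not None)  # False: X3 = L
```

To locate the motif inside a full-length argininosuccinate synthase sequence, `SEQ_ID_NO_1.search(protein)` returns the first match and its position.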
The amino acid molecules (e.g., the amino acid molecule proteins) disclosed herein may be "enhanced" in function and may yield higher quantities of L-arginine than their naturally occurring enzyme counterparts; e.g., they may have enhanced catalytic activity in comparison to the naturally occurring counterparts. The amino acid molecules according to some embodiments may be derived from a host cell or a microorganism. Thus, for example, the amino acid molecule may be derived from a microorganism that is modified (e.g., artificially and/or specifically genetically modified) from its naturally occurring variant or strain. In some embodiments, the host cell or the microorganism described herein may be Corynebacterium, Escherichia, Bacillus, Streptomyces, Penicillium, Klebsiella, Erwinia, or Pantoea. In some embodiments, the host cell or the microorganism may be Acidobacterium capsulatum, Alcaligenes faecalis, Bacillus amyloliquefaciens, Burkholderia pyrrocinia, Corynebacterium ammoniagenes, Corynebacterium glutamicum, Cupriavidus necator, Escherichia coli, Mycobacterium smegmatis, or Neisseria weaveri. The amino acid molecules described herein also may be specifically derived from a microorganism of the genus Corynebacterium, more specifically from Corynebacterium glutamicum, Corynebacterium deserti, Corynebacterium crenatum, Corynebacterium efficiens, Corynebacterium suranareeae, and the like, but are not limited thereto.
The term "enhancement" of a polypeptide may be used interchangeably with terms such as activation, up-regulation, overexpression, and increase. Here, activation, enhancement, up-regulation, overexpression, and increase may include both exhibiting activity that is not originally possessed and exhibiting an improved activity compared to the intrinsic activity or activity before modification or alteration of the amino acid sequence of the polypeptide. The "intrinsic activity" refers to the activity of a specific polypeptide originally possessed by the parent strain before transformation or the unmodified host cell or microorganism when the trait is changed by genetic mutation due to natural or artificial factors. The term "intrinsic activity" may be used interchangeably with the term "activity before modification". Thus, the activity of a polypeptide being "enhanced, up-regulated, overexpressed, or increased" compared to the intrinsic activity, means that the activity of a polypeptide is improved compared to the activity and/or concentration (expression level) of a specific polypeptide originally possessed by the parent strain before transformation or unmodified host cell or microorganism.
The enhancement may be achieved by introducing an exogenous polypeptide or by enhancing the activity and/or concentration (expression level) of the endogenous polypeptide. Whether or not the activity of a polypeptide is enhanced may be confirmed from an increase in the degree of activity or the expression level of the polypeptide, or in the amount of product generated by the activity of the polypeptide.
For the enhancement of the activity of a polypeptide, various techniques, including combinations of well-understood procedures, may be utilized. The claimed peptide and/or the enhancement in activity are not limited as long as the activity of a target polypeptide has been enhanced compared to that in the host cell or microorganism before modification. Specifically, the procedures used for enhancing the activity of the polypeptide may be modifications using genetic engineering and/or protein engineering well known to those skilled in the art and routinely used in molecular biology, but are not limited thereto (for example, Sitnicka et al., Functional Analysis of Genes. Advances in Cell Biology. 2010, Vol. 2. 1-16; Sambrook et al., Molecular Cloning, 2012).
In some embodiments, the enhancement of the polypeptide activity of the present disclosure may be achieved by:
1) an increase in the intracellular copy number of a polynucleotide encoding a polypeptide. In additional embodiments, the increase in the intracellular copy number of a polynucleotide encoding a polypeptide may be achieved by introducing into a host cell a vector to which a polynucleotide encoding the polypeptide is operably linked and which is capable of replicating and functioning independently of the host. Alternatively, the increase may be achieved by introducing one, two, or more copies of a polynucleotide encoding the polypeptide into a chromosome of the host cell. The introduction into a chromosome may be performed by introducing into the host cell a vector capable of inserting the polynucleotide into the host cell chromosome, but is not limited thereto.
2) replacement of the gene expression control region on the chromosome encoding a polypeptide with a sequence having strong activity. The replacement of the gene expression control region (or expression control sequence) on the chromosome encoding a polypeptide with a sequence having strong activity may be, for example, occurrence of mutation in the sequence by deletion, insertion, non-conservative or conservative substitution or a combination thereof to further enhance the activity of the expression control region; or replacement with a sequence having stronger activity. The expression control region may include, but is not particularly limited to, a promoter, an operator sequence, a sequence encoding a ribosome binding site, and a sequence for regulating the termination of transcription and translation. As an example, the replacement may be to replace the original promoter with a strong promoter, but is not limited thereto. Examples of known strong promoters include, but are not limited to, CJ1 to CJ7 promoters (US 7662943 B2), lac promoter, trp promoter, trc promoter, tac promoter, lambda phage PR promoter, PL promoter, tet promoter, gapA promoter, SPL7 promoter, SPL13 (sm3) promoter (US 10584338 B2), O2 promoter (US 10273491 B2), tkt promoter, and yccA promoter.
3) modification of the nucleotide sequence encoding the start codon or 5'-UTR region of a gene transcript encoding a polypeptide. The modification of the nucleotide sequence encoding the start codon or 5'-UTR region of a gene transcript encoding a polypeptide may include, for example, substitution with a nucleotide sequence encoding another (i.e., a different) start codon having a higher polypeptide expression rate compared to the endogenous start codon.
4) modification of the amino acid sequence of a polypeptide to enhance the polypeptide activity or modification of a polynucleotide sequence encoding a polypeptide to enhance the polypeptide activity (for example, modification of the polynucleotide sequence of a polypeptide gene to encode a polypeptide that has been modified to enhance the activity of the polypeptide). The modification of the amino acid sequence or polynucleotide sequence may be, but is not limited to, effectuated by a mutation in the amino acid sequence of a polypeptide or the polynucleotide sequence encoding the polypeptide by deletion, insertion, non-conservative or conservative substitution, or a combination thereof so that the activity of the polypeptide is enhanced, or replacement of the sequence with an amino acid sequence or polynucleotide sequence modified to have stronger activity or an amino acid sequence or polynucleotide sequence modified to increase activity. The replacement may be specifically performed by inserting a polynucleotide into a chromosome by homologous recombination, but is not limited thereto. The vector used at this time may further include a selection marker for confirming chromosome insertion.
5) introduction of an exogenous polypeptide exhibiting the activity of a polypeptide or an exogenous polynucleotide encoding the same. The introduction of an exogenous polynucleotide exhibiting the activity of a polypeptide may be introduction of an exogenous polynucleotide encoding a polypeptide exhibiting activity the same as/similar to that of the polypeptide into a host cell. The exogenous polynucleotide is not limited in origin or sequence as long as it exhibits activity the same as/similar to that of the polypeptide. As the method used for the introduction, a known transformation method may be appropriately selected by those skilled in the art. As the introduced polynucleotide is expressed in the host cell, a polypeptide may be produced and its activity may be increased.
6) codon optimization of a polynucleotide encoding a polypeptide. The codon optimization of a polynucleotide encoding a polypeptide may be codon optimization of the endogenous polynucleotide so that the transcription or translation is increased in a host cell, or codon optimization of the exogenous polynucleotide so that the optimized transcription and translation are performed in a host cell.
7) modification or chemical modification of an exposed site selected through analysis of the tertiary structure of a polypeptide. This may include, for example, comparing the sequence information of the polypeptide to be analyzed with a database in which sequence information of known proteins is stored to determine a template protein candidate according to the degree of sequence similarity, confirming the structure based on the template, selecting an exposed site to be modified or chemically modified, and modifying or chemically modifying that site.
8) a combination of two or more selected from 1) to 7) above, but not particularly limited thereto.
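Of the approaches listed above, codon optimization (item 6) is the most directly algorithmic. The sketch below shows only its simplest form, under stated assumptions: each residue is back-translated to the single most frequent synonymous codon of the host. The usage table `HYPOTHETICAL_USAGE` and the function names are illustrative inventions, not measured Corynebacterium data and not the procedure used in this disclosure; practical tools also weigh mRNA structure, GC content, and restriction sites.

```python
from typing import Dict

# HYPOTHETICAL codon-usage frequencies (per 1,000 codons) for an imagined host.
# Illustrative numbers only; a real table must be derived from host genome data.
HYPOTHETICAL_USAGE: Dict[str, Dict[str, float]] = {
    "M": {"ATG": 25.0},
    "K": {"AAA": 12.0, "AAG": 30.0},
    "F": {"TTT": 18.0, "TTC": 22.0},
    "G": {"GGT": 20.0, "GGC": 35.0, "GGA": 8.0, "GGG": 6.0},
    "*": {"TAA": 1.5, "TAG": 0.4, "TGA": 1.1},  # stop codons
}


def most_frequent_codon(usage: Dict[str, Dict[str, float]], aa: str) -> str:
    """Return the synonymous codon with the highest usage frequency for `aa`."""
    codons = usage[aa]
    return max(codons, key=codons.get)


def codon_optimize(protein: str, usage: Dict[str, Dict[str, float]]) -> str:
    """Back-translate `protein` using the host's most frequent codon per residue."""
    missing = set(protein) - set(usage)
    if missing:
        raise KeyError(f"no codon usage data for: {sorted(missing)}")
    return "".join(most_frequent_codon(usage, aa) for aa in protein)


if __name__ == "__main__":
    print(codon_optimize("MKFG*", HYPOTHETICAL_USAGE))  # ATGAAGTTCGGCTAA
```

Because the table is hypothetical, the output shows only the mechanics of "one preferred codon per residue"; it says nothing about expression in any real strain.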
Such enhancement of the amino acid molecule activity may be an increase in the activity or concentration (expression level) of the corresponding amino acid molecule compared to the activity or concentration of the amino acid molecule expressed in the wild-type host cell or the host cell before modification, or an increase in the amount of product produced by the amino acid molecule, but is not limited thereto.
In a further embodiment, the amino acid molecule is a recombinant protein such as, for example, a recombinant argininosuccinate synthase.
In additional embodiments, the amino acid molecule comprises a sequence having SEQ ID NO:6, X5PX6X7X8X9GX10AFSGGLDTSX11AX12, in which X5 is I, L or V; X6 is an amino acid; X7 is A, E or Q; X8 is K or R; X9 is I or V; X10 is I or L; X11 is A, T or V; and X12 is I, L or V.
In some embodiments, X5 of SEQ ID NO:6 is I. In some embodiments, X5 of SEQ ID NO:6 is L. In some embodiments, X5 of SEQ ID NO:6 is V.
In some embodiments, X7 of SEQ ID NO:6 is A. In some embodiments, X7 of SEQ ID NO:6 is E. In some embodiments, X7 of SEQ ID NO:6 is Q.
In some embodiments, X8 of SEQ ID NO:6 is K. In some embodiments, X8 of SEQ ID NO:6 is R.
In some embodiments, X9 of SEQ ID NO:6 is I. In some embodiments, X9 of SEQ ID NO:6 is V.
In some embodiments, X10 of SEQ ID NO:6 is I. In some embodiments, X10 of SEQ ID NO:6 is L.
In some embodiments, X11 of SEQ ID NO:6 is A. In some embodiments, X11 of SEQ ID NO:6 is T. In some embodiments, X11 of SEQ ID NO:6 is V.
In some embodiments, X12 of SEQ ID NO:6 is I. In some embodiments, X12 of SEQ ID NO:6 is L. In some embodiments, X12 of SEQ ID NO:6 is V.
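A variable motif such as SEQ ID NO:6 can be read as a regular expression, one position per variable. The following is a minimal sketch, assuming that "an amino acid" (X6) means any of the 20 standard residues; `SEQ6_PARTS` and `find_seq6` are hypothetical names used only for illustration.

```python
import re
from typing import List, Tuple

ANY_AA = "[ACDEFGHIKLMNPQRSTVWY]"  # X6: assumed to be any standard amino acid

# SEQ ID NO:6, X5PX6X7X8X9GX10AFSGGLDTSX11AX12, written position by position.
SEQ6_PARTS = [
    "[ILV]",      # X5
    "P",
    ANY_AA,       # X6
    "[AEQ]",      # X7
    "[KR]",       # X8
    "[IV]",       # X9
    "G",
    "[IL]",       # X10
    "AFSGGLDTS",
    "[ATV]",      # X11
    "A",
    "[ILV]",      # X12
]
SEQ6_PATTERN = re.compile("".join(SEQ6_PARTS))


def find_seq6(protein: str) -> List[Tuple[int, str]]:
    """Return (1-based start position, matched text) for each SEQ ID NO:6 hit."""
    return [(m.start() + 1, m.group()) for m in SEQ6_PATTERN.finditer(protein.upper())]


if __name__ == "__main__":
    # Toy peptide (not a disclosed sequence) with the motif starting at position 3.
    print(find_seq6("MSIPGEKIGLAFSGGLDTSAAIKK"))  # [(3, 'IPGEKIGLAFSGGLDTSAAI')]
```

The same position-by-position translation applies to the other motifs (SEQ ID NOs: 7-22), substituting each variable's allowed residues.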
In additional embodiments, the amino acid molecule comprises a sequence having SEQ ID NO:7, X13GAXaX14X15X16YTAX17X18GQX19DE, in which X13 is K or N; Xa is an amino acid; X14 is C or P; X15 is C or Y; X16 is A, S or T; X17 is D or N; X18 is I or L; and X19 is A, P or Y.
In some embodiments, X13 of SEQ ID NO:7 is K. In some embodiments, X13 of SEQ ID NO:7 is N.
In some embodiments, X14 of SEQ ID NO:7 is C. In some embodiments, X14 of SEQ ID NO:7 is P.
In some embodiments, X15 of SEQ ID NO:7 is C. In some embodiments, X15 of SEQ ID NO:7 is Y.
In some embodiments, X16 of SEQ ID NO:7 is A. In some embodiments, X16 of SEQ ID NO:7 is S. In some embodiments, X16 of SEQ ID NO:7 is T.
In some embodiments, X17 of SEQ ID NO:7 is D. In some embodiments, X17 of SEQ ID NO:7 is N.
In some embodiments, X18 of SEQ ID NO:7 is I. In some embodiments, X18 of SEQ ID NO:7 is L.
In some embodiments, X19 of SEQ ID NO:7 is A. In some embodiments, X19 of SEQ ID NO:7 is P. In some embodiments, X19 of SEQ ID NO:7 is Y.
In additional embodiments, the amino acid molecule comprises a sequence having SEQ ID NO:8, X20X21X22X23X24X25X26XaLX27, in which X20 is A or S; X21 is R or V; X22 is I or L; X23 is I or V; X24 is E or D; X25 is C or G; X26 is K or R; Xa is an amino acid; and X27 is A or V.
In some embodiments, X20 of SEQ ID NO:8 is A. In some embodiments, X20 of SEQ ID NO:8 is S.
In some embodiments, X21 of SEQ ID NO:8 is R. In some embodiments, X21 of SEQ ID NO:8 is V.
In some embodiments, X22 of SEQ ID NO:8 is I. In some embodiments, X22 of SEQ ID NO:8 is L.
In some embodiments, X23 of SEQ ID NO:8 is I. In some embodiments, X23 of SEQ ID NO:8 is V.
In some embodiments, X24 of SEQ ID NO:8 is E. In some embodiments, X24 of SEQ ID NO:8 is D.
In some embodiments, X25 of SEQ ID NO:8 is C. In some embodiments, X25 of SEQ ID NO:8 is G.
In some embodiments, X26 of SEQ ID NO:8 is K. In some embodiments, X26 of SEQ ID NO:8 is R.
In some embodiments, X27 of SEQ ID NO:8 is A. In some embodiments, X27 of SEQ ID NO:8 is V.
In additional embodiments, the amino acid molecule comprises a sequence having SEQ ID NO:9, X28AFX29XaXaX30X31G, in which X28 is G or N; X29 is H or N; Xa is an amino acid; X30 is S or T; and X31 is A or G.
In some embodiments, X28 of SEQ ID NO:9 is G. In some embodiments, X28 of SEQ ID NO:9 is N.
In some embodiments, X29 of SEQ ID NO:9 is H. In some embodiments, X29 of SEQ ID NO:9 is N.
In some embodiments, X30 of SEQ ID NO:9 is S. In some embodiments, X30 of SEQ ID NO:9 is T.
In some embodiments, X31 of SEQ ID NO:9 is A. In some embodiments, X31 of SEQ ID NO:9 is G.
In additional embodiments, the amino acid molecule comprises a sequence having SEQ ID NO:10, YFNTTPX32GRAVX33X34TX35LV, in which X32 is I or L; X33 is A or T; X34 is A or G; and X35 is L or M.
In some embodiments, X32 of SEQ ID NO:10 is I. In some embodiments, X32 of SEQ ID NO:10 is L.
In some embodiments, X33 of SEQ ID NO:10 is A. In some embodiments, X33 of SEQ ID NO:10 is T.
In some embodiments, X34 of SEQ ID NO:10 is A. In some embodiments, X34 of SEQ ID NO:10 is G.
In some embodiments, X35 of SEQ ID NO:10 is L. In some embodiments, X35 of SEQ ID NO:10 is M.
In additional embodiments, the amino acid molecule comprises a sequence having SEQ ID NO:11, TX36KGNDIERF, in which X36 is F or Y.
In some embodiments, X36 of SEQ ID NO:11 is F. In some embodiments, X36 of SEQ ID NO:11 is Y.
In additional embodiments, the amino acid molecule comprises a sequence having SEQ ID NO:12, YRYGLX37X38N, in which X37 is L or V; and X38 is A, T or V.
In some embodiments, X37 of SEQ ID NO:12 is L. In some embodiments, X37 of SEQ ID NO:12 is V.
In some embodiments, X38 of SEQ ID NO:12 is A. In some embodiments, X38 of SEQ ID NO:12 is T. In some embodiments, X38 of SEQ ID NO:12 is V.
In additional embodiments, the amino acid molecule comprises a sequence having SEQ ID NO:13, GGRXaEMX39X40X41X42, in which Xa is an amino acid; X39 is A or S; X40 is A, E or Q; X41 is F, W or Y; and X42 is L or M.
In some embodiments, X39 of SEQ ID NO:13 is A. In some embodiments, X39 of SEQ ID NO:13 is S.
In some embodiments, X40 of SEQ ID NO:13 is A. In some embodiments, X40 of SEQ ID NO:13 is E. In some embodiments, X40 of SEQ ID NO:13 is Q.
In some embodiments, X41 of SEQ ID NO:13 is F. In some embodiments, X41 of SEQ ID NO:13 is W. In some embodiments, X41 of SEQ ID NO:13 is Y.
In some embodiments, X42 of SEQ ID NO:13 is L. In some embodiments, X42 of SEQ ID NO:13 is M.
In additional embodiments, the amino acid molecule comprises a sequence having SEQ ID NO:14, EKAYSTDX43NX44X45GATHE, in which X43 is A or S; X44 is I, L or M; and X45 is L or W.
In some embodiments, X43 of SEQ ID NO:14 is A. In some embodiments, X43 of SEQ ID NO:14 is S.
In some embodiments, X44 of SEQ ID NO:14 is I. In some embodiments, X44 of SEQ ID NO:14 is L. In some embodiments, X44 of SEQ ID NO:14 is M.
In some embodiments, X45 of SEQ ID NO:14 is L. In some embodiments, X45 of SEQ ID NO:14 is W.
In additional embodiments, the amino acid molecule comprises a sequence having SEQ ID NO:15, VXaPIMGVXaX46W, in which Xa is an amino acid and X46 is F, H or S.
In some embodiments, X46 of SEQ ID NO:15 is F. In some embodiments, X46 of SEQ ID NO:15 is H. In some embodiments, X46 of SEQ ID NO:15 is S.
In additional embodiments, the amino acid molecule comprises a sequence having SEQ ID NO:16, X47GGRHGX48GX49X50DQIENRX51IEA, in which X47 is I or V; X48 is L or M; X49 is M or V; X50 is A or S; and X51 is I or V.
In some embodiments, X47 of SEQ ID NO:16 is I. In some embodiments, X47 of SEQ ID NO:16 is V.
In some embodiments, X48 of SEQ ID NO:16 is L. In some embodiments, X48 of SEQ ID NO:16 is M.
In some embodiments, X49 of SEQ ID NO:16 is M. In some embodiments, X49 of SEQ ID NO:16 is V.
In some embodiments, X50 of SEQ ID NO:16 is A. In some embodiments, X50 of SEQ ID NO:16 is S.
In some embodiments, X51 of SEQ ID NO:16 is I. In some embodiments, X51 of SEQ ID NO:16 is V.
In additional embodiments, the amino acid molecule comprises a sequence having SEQ ID NO:17, KSRGIYEAPG.
In additional embodiments, the amino acid molecule comprises a sequence having SEQ ID NO:18, X52ALX53X54X55AX56ERLXaX57X58IHNEDT, in which X52 is L or M; X53 is F or L; X54 is F, H or Y; X55 is A or I; X56 is F or Y; Xa is an amino acid; X57 is N, S or T; and X58 is A or G.
In some embodiments, X52 of SEQ ID NO:18 is L. In some embodiments, X52 of SEQ ID NO:18 is M.
In some embodiments, X53 of SEQ ID NO:18 is F. In some embodiments, X53 of SEQ ID NO:18 is L.
In some embodiments, X54 of SEQ ID NO:18 is F. In some embodiments, X54 of SEQ ID NO:18 is H. In some embodiments, X54 of SEQ ID NO:18 is Y.
In some embodiments, X55 of SEQ ID NO:18 is A. In some embodiments, X55 of SEQ ID NO:18 is I.
In some embodiments, X56 of SEQ ID NO:18 is F. In some embodiments, X56 of SEQ ID NO:18 is Y.
In some embodiments, X57 of SEQ ID NO:18 is N. In some embodiments, X57 of SEQ ID NO:18 is S. In some embodiments, X57 of SEQ ID NO:18 is T.
In some embodiments, X58 of SEQ ID NO:18 is A. In some embodiments, X58 of SEQ ID NO:18 is G.
In additional embodiments, the amino acid molecule comprises a sequence having SEQ ID NO:19, LGX59LX60YX61GRWX62DX63QX64X65MX66RX67, in which X59 is K or R; X60 is L or M; X61 is A, E or Q; X62 is F or L; X63 is P or S; X64 is A, S or G; X65 is I, L or M; X66 is I, L or V; and X67 is D or E.
In some embodiments, X59 of SEQ ID NO: 19 is K. In some embodiments, X59 of SEQ ID NO: 19 is R.
In some embodiments, X60 of SEQ ID NO: 19 is L. In some embodiments, X60 of SEQ ID NO: 19 is M.
In some embodiments, X61 of SEQ ID NO: 19 is A. In some embodiments, X61 of SEQ ID NO: 19 is E. In some embodiments, X61 of SEQ ID NO: 19 is Q.
In some embodiments, X62 of SEQ ID NO: 19 is F. In some embodiments, X62 of SEQ ID NO: 19 is L.
In some embodiments, X63 of SEQ ID NO: 19 is P. In some embodiments, X63 of SEQ ID NO: 19 is S.
In some embodiments, X64 of SEQ ID NO: 19 is A. In some embodiments, X64 of SEQ ID NO: 19 is S. In some embodiments, X64 of SEQ ID NO: 19 is G.
In some embodiments, X65 of SEQ ID NO: 19 is I. In some embodiments, X65 of SEQ ID NO: 19 is L. In some embodiments, X65 of SEQ ID NO: 19 is M.
In some embodiments, X66 of SEQ ID NO: 19 is I. In some embodiments, X66 of SEQ ID NO: 19 is L. In some embodiments, X66 of SEQ ID NO: 19 is V.
In some embodiments, X67 of SEQ ID NO: 19 is D. In some embodiments, X67 of SEQ ID NO: 19 is E.
In additional embodiments, the amino acid molecule comprises a sequence having SEQ ID NO:20, LRRGXaDX68X69X70, in which Xa is an amino acid; X68 is F or Y; X69 is S or T; and X70 is I or L.
In some embodiments, X68 of SEQ ID NO: 20 is F. In some embodiments, X68 of SEQ ID NO: 20 is Y.
In some embodiments, X69 of SEQ ID NO: 20 is S. In some embodiments, X69 of SEQ ID NO: 20 is T.
In some embodiments, X70 of SEQ ID NO: 20 is I. In some embodiments, X70 of SEQ ID NO: 20 is L.
In additional embodiments, the amino acid molecule comprises a sequence having SEQ ID NO:21, X71X72X73X74XaX75X76X77LX78MEX79, in which X71 is A, D or N; X72 is F or L; X73 is S or T; X74 is F or Y; Xa is an amino acid; X75 is A, P or S; X76 is A, E or D; X77 is K or R; X78 is S or T; and X79 is K or R.
In some embodiments, X71 of SEQ ID NO: 21 is A. In some embodiments, X71 of SEQ ID NO: 21 is D. In some embodiments, X71 of SEQ ID NO: 21 is N.
In some embodiments, X72 of SEQ ID NO: 21 is F. In some embodiments, X72 of SEQ ID NO: 21 is L.
In some embodiments, X73 of SEQ ID NO: 21 is S. In some embodiments, X73 of SEQ ID NO: 21 is T.
In some embodiments, X74 of SEQ ID NO: 21 is F. In some embodiments, X74 of SEQ ID NO: 21 is Y.
In some embodiments, X75 of SEQ ID NO: 21 is A. In some embodiments, X75 of SEQ ID NO: 21 is P. In some embodiments, X75 of SEQ ID NO: 21 is S.
In some embodiments, X76 of SEQ ID NO: 21 is A. In some embodiments, X76 of SEQ ID NO: 21 is E. In some embodiments, X76 of SEQ ID NO: 21 is D.
In some embodiments, X77 of SEQ ID NO: 21 is K. In some embodiments, X77 of SEQ ID NO: 21 is R.
In some embodiments, X78 of SEQ ID NO: 21 is S. In some embodiments, X78 of SEQ ID NO: 21 is T.
In some embodiments, X79 of SEQ ID NO: 21 is K. In some embodiments, X79 of SEQ ID NO: 21 is R.
In additional embodiments, the amino acid molecule comprises a sequence having SEQ ID NO:22, DRIGQLX80MRX81X82DX83XaDX84R, in which X80 is H or T; X81 is L, N or T; X82 is L or N; X83 is I, L or V; Xa is an amino acid; and X84 is S or T.
In some embodiments, X80 of SEQ ID NO:22 is H. In some embodiments, X80 of SEQ ID NO: 22 is T.
In some embodiments, X81 of SEQ ID NO: 22 is L. In some embodiments, X81 of SEQ ID NO: 22 is N. In some embodiments, X81 of SEQ ID NO: 22 is T.
In some embodiments, X82 of SEQ ID NO:22 is L. In some embodiments, X82 of SEQ ID NO: 22 is N.
In some embodiments, X83 of SEQ ID NO:22 is I. In some embodiments, X83 of SEQ ID NO: 22 is L. In some embodiments, X83 of SEQ ID NO:22 is V.
In some embodiments, X84 of SEQ ID NO:22 is S. In some embodiments, X84 of SEQ ID NO: 22 is T.
In additional embodiments, the amino acid molecule comprises at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 or 17 sequence(s) selected from the group consisting of SEQ ID NOs: 6-22. In some embodiments, the amino acid molecule comprises all of SEQ ID NO: 6, SEQ ID NO: 7, SEQ ID NO: 8, SEQ ID NO: 9, SEQ ID NO: 10, SEQ ID NO: 11, SEQ ID NO: 12, SEQ ID NO: 13, SEQ ID NO: 14, SEQ ID NO: 15, SEQ ID NO: 16, SEQ ID NO: 17, SEQ ID NO: 18, SEQ ID NO: 19, SEQ ID NO: 20, SEQ ID NO: 21, and SEQ ID NO: 22.
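Checking whether a candidate sequence contains "at least k" of the motifs amounts to translating each motif definition into a pattern and counting hits. The sketch below is illustrative only: the parser `motif_to_regex`, the helper `motifs_present`, and the toy peptide are hypothetical, "Xa" is assumed to mean any of the 20 standard residues, and only five of the seventeen motifs (SEQ ID NOs: 10, 11, 12, 15 and 17) are encoded; the others can be added to `MOTIFS` in the same form.

```python
import re
from typing import Dict, List, Tuple

ANY_AA = "[ACDEFGHIKLMNPQRSTVWY]"  # "Xa": assumed to be any standard amino acid
TOKEN = re.compile(r"X(?:a|\d+)|[ACDEFGHIKLMNPQRSTVWY]")


def motif_to_regex(motif: str, variables: Dict[str, str]) -> re.Pattern:
    """Compile a motif such as 'TX36KGNDIERF' with {'X36': 'FY'} into a regex."""
    parts: List[str] = []
    pos = 0
    for tok in TOKEN.finditer(motif):
        if tok.start() != pos:  # reject characters the tokenizer skipped
            raise ValueError(f"unparsed text at index {pos} in {motif!r}")
        text, pos = tok.group(), tok.end()
        if text == "Xa":
            parts.append(ANY_AA)
        elif text.startswith("X"):
            parts.append("[" + variables[text] + "]")  # allowed residues for Xn
        else:
            parts.append(text)  # fixed residue
    if pos != len(motif):
        raise ValueError(f"unparsed trailing text in {motif!r}")
    return re.compile("".join(parts))


# Subset of the disclosed motifs: SEQ ID NO -> (motif, allowed residues per variable).
MOTIFS: Dict[int, Tuple[str, Dict[str, str]]] = {
    10: ("YFNTTPX32GRAVX33X34TX35LV", {"X32": "IL", "X33": "AT", "X34": "AG", "X35": "LM"}),
    11: ("TX36KGNDIERF", {"X36": "FY"}),
    12: ("YRYGLX37X38N", {"X37": "LV", "X38": "ATV"}),
    15: ("VXaPIMGVXaX46W", {"X46": "FHS"}),
    17: ("KSRGIYEAPG", {}),
}


def motifs_present(protein: str) -> List[int]:
    """Return the SEQ ID NOs (from MOTIFS) whose motif occurs in `protein`."""
    return sorted(
        seq_id for seq_id, (motif, variables) in MOTIFS.items()
        if motif_to_regex(motif, variables).search(protein.upper())
    )


if __name__ == "__main__":
    hits = motifs_present("MTYKGNDIERFAAKSRGIYEAPGLL")  # toy peptide
    print(hits, len(hits) >= 2)  # [11, 17] True
```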
In additional embodiments, the amino acid molecule described herein may have a length of at least 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, or 171 amino acids and/or 200, 199, 198, 197, 196, 195, 194, 193, 192, 191, 190, 189, 188, 187, 186, 185, 184, 183, 182, 181, 180, 179, 178, 177, 176, 175, 174, 173, 172, 171, 170, 169, 168, 167, 166, 165, 164, 163, 162, 161, 160, 159, 158, 157, 156, 155, 154, 153, 152, 151 amino acids or less.
In further embodiments, the amino acid molecule comprises a sequence having at least 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99 or 100% sequence identity to A. capsulatum sequence, SEQ ID NO: 23.
In a further embodiment, the amino acid molecule comprises a sequence corresponding to at least 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99 or 100% of positions 1-153 and 167-445 of SEQ ID NO: 23. In still a further embodiment, the amino acid molecule comprises a sequence corresponding to positions 1-153 and 167-445 of SEQ ID NO: 23.
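One way to read "at least X% of positions 1-153 and 167-445" is as percent identity counted only over reference positions inside those ranges, given an existing pairwise alignment. The following is a minimal sketch under that assumption; the function name is hypothetical, the alignment is taken as input (gaps written as "-"), and the denominator is the number of in-range reference positions present in the alignment.

```python
from typing import Iterable, Tuple


def identity_over_ref_ranges(aln_ref: str, aln_query: str,
                             ranges: Iterable[Tuple[int, int]]) -> float:
    """Percent of reference positions inside `ranges` (1-based, inclusive)
    at which the aligned query residue is identical to the reference."""
    if len(aln_ref) != len(aln_query):
        raise ValueError("aligned sequences must have equal length")
    wanted = set()
    for start, end in ranges:
        wanted.update(range(start, end + 1))
    ref_pos = matches = counted = 0
    for r, q in zip(aln_ref, aln_query):
        if r == "-":
            continue  # query insertion: no reference position consumed
        ref_pos += 1
        if ref_pos in wanted:
            counted += 1
            if q == r:
                matches += 1
    if counted == 0:
        raise ValueError("no reference positions fall inside the given ranges")
    return 100.0 * matches / counted


if __name__ == "__main__":
    # Toy alignment; for SEQ ID NO: 23 the ranges would be [(1, 153), (167, 445)].
    print(identity_over_ref_ranges("ACDEFGHIKL", "ACDQFG-IKL", [(1, 3), (7, 10)]))
```

In this toy case position 4 (E/Q) lies outside the ranges and is ignored, while the gap at position 7 counts as a non-identity, giving 6 of 7 positions.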
In further embodiments, the amino acid molecule comprises a sequence having at least 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99 or 100% sequence identity to A. faecalis sequence, SEQ ID NO: 24.
In a further embodiment, the amino acid molecule comprises a sequence corresponding to at least 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99 or 100% of positions 1-153 and 167-444 of SEQ ID NO: 24. In still a further embodiment, the amino acid molecule comprises a sequence corresponding to positions 1-153 and 167-444 of SEQ ID NO: 24.
In further embodiments, the amino acid molecule comprises a sequence having at least 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99 or 100% sequence identity to B. pyrrocinia sequence, SEQ ID NO: 26.
In a further embodiment, the amino acid molecule comprises a sequence corresponding to at least 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99 or 100% of positions 1-153 and 167-445 of SEQ ID NO: 26. In still a further embodiment, the amino acid molecule comprises a sequence corresponding to positions 1-153 and 167-445 of SEQ ID NO: 26.
In further embodiments, the amino acid molecule comprises a sequence having at least 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99 or 100% sequence identity to C. necator sequence, SEQ ID NO: 29.
In another embodiment, the amino acid molecule comprises a sequence corresponding to at least 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99 or 100% of positions 1-153 and 167-441 of SEQ ID NO: 29. In still a further embodiment, the amino acid molecule comprises a sequence corresponding to positions 1-153 and 167-441 of SEQ ID NO: 29.
In further embodiments, the amino acid molecule comprises a sequence having at least 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99 or 100% sequence identity to E. coli sequence, SEQ ID NO: 30.
In another embodiment, the amino acid molecule comprises a sequence corresponding to at least 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99 or 100% of positions 1-153 and 167-447 of SEQ ID NO: 30. In still a further embodiment, the amino acid molecule comprises a sequence corresponding to positions 1-153 and 167-447 of SEQ ID NO: 30.
In further embodiments, the amino acid molecule comprises a sequence having at least 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99 or 100% sequence identity to N. weaveri sequence, SEQ ID NO: 32.
In a further embodiment, the amino acid molecule comprises a sequence corresponding to at least 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99 or 100% of positions 1-157 and 171-448 of SEQ ID NO: 32. In still a further embodiment, the amino acid molecule comprises a sequence corresponding to positions 1-157 and 171-448 of SEQ ID NO: 32.
In further embodiments, the amino acid molecule comprises a sequence having at least 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99 or 100% sequence identity to SEQ ID NO: 23; having at least 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99 or 100% sequence identity to SEQ ID NO: 24; having at least 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99 or 100% sequence identity to SEQ ID NO: 25; having at least 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99 or 100% sequence identity to SEQ ID NO: 26; having at least 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99 or 100% sequence identity to SEQ ID NO: 27; and having at least 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99 or 100% sequence identity to SEQ ID NO: 28.
In another aspect, the present disclosure provides a host cell or microorganism expressing the amino acid molecule disclosed herein.
In another aspect, the present disclosure provides a nucleic acid molecule comprising a nucleic acid sequence encoding the amino acid molecule disclosed herein.
In another aspect, the present disclosure provides a nucleic acid molecule comprising a sequence having at least 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99 or 100% sequence identity to A. capsulatum sequence, SEQ ID NO: 33; a sequence having at least 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99 or 100% sequence identity to A. faecalis sequence, SEQ ID NO: 34; a sequence having at least 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99 or 100% sequence identity to B. pyrrocinia sequence, SEQ ID NO: 35; a sequence having at least 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99 or 100% sequence identity to C. necator sequence, SEQ ID NO: 36; a sequence having at least 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99 or 100% sequence identity to E. coli sequence SEQ ID NO: 37; or a sequence having at least 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99 or 100% sequence identity to N. weaveri sequence, SEQ ID NO: 38, wherein the nucleic acid molecule does not occur in nature.
In another aspect, a vector comprising a nucleic acid molecule as disclosed herein is provided. A "vector" according to some embodiments of the present disclosure refers to and may include a DNA construct containing a nucleotide sequence of a polynucleotide encoding a target polypeptide operably linked to a suitable expression control region (or expression control sequence) so that the target polypeptide can be expressed in a suitable host. The expression control region may include a promoter capable of initiating transcription, an optional operator sequence for regulating such transcription, a sequence encoding a suitable mRNA ribosome binding site, and a sequence regulating the termination of transcription and translation. After transformation into an appropriate host cell, the vector may replicate or function independently of the host genome, or may be integrated into the genome itself.
The vector used in the present disclosure is not particularly limited, and any vector known in the art may be used. Examples of commonly used vectors include natural or recombinant plasmids, cosmids, viruses and bacteriophages. For example, pWE15, M13, λMBL3, λMBL4, λIXII, λASHII, λAPII, λt10, λt11, Charon4A, and Charon21A may be used as a phage vector or cosmid vector, and a pDZ system, a pBR system, a pUC system, a pBluescript II system, a pGEM system, a pTZ system, a pCL system, and a pET system may be used as a plasmid vector. Specifically, pDZ, pDC, pDCM2, pACYC177, pACYC184, pCL, pECCG117, pUC19, pBR322, pMW118, pCC1BAC vectors and the like may be used. Additional information about the vectors may be found in U.S. Patent Application Publication No. 2023/0134555, which is incorporated by reference in its entirety.
As an example, a polynucleotide encoding a target polypeptide may be inserted into a chromosome through a vector for intracellular chromosome insertion. The insertion of a polynucleotide into a chromosome may be performed by any method known in the art, for example, homologous recombination, but is not limited thereto. A selection marker for confirming chromosome insertion may be further included. The selection marker is used to select cells transformed with a vector, that is, to confirm the insertion of a target nucleic acid molecule, and markers that confer selectable phenotypes such as drug resistance, auxotrophy, resistance to cytotoxic agents, and expression of surface polypeptides may be used. In an environment treated with a selective agent, only the cells expressing the selection marker can survive or exhibit other expression traits, and thus the transformed cells can be selected.
The modification of part or all of the polynucleotide in the host cell or microorganism according to some embodiments of the present disclosure may be induced by (a) homologous recombination using a vector for chromosome insertion into host cells or microorganisms, or genome editing using an engineered nuclease (e.g., CRISPR-Cas9), and/or (b) treatment with light, such as ultraviolet light, or other radiation and/or chemical treatment, but is not limited thereto. The method for modifying part or all of the gene may include a method using DNA recombination technology. For example, part or all of the gene may be deleted by introducing into the host cell or microorganism a nucleotide sequence, or a vector including a nucleotide sequence, homologous to the target gene so as to cause homologous recombination. The introduced nucleotide sequence or vector may include a dominant selection marker, but is not limited thereto.
In the polynucleotide according to some embodiments of the present disclosure, various modifications may be made in the coding region without changing the amino acid sequence of the protein, in consideration of codon degeneracy or the codons preferred by the organism in which the protein described herein is to be expressed. Another aspect of the present disclosure is to provide a polynucleotide comprising a nucleic acid sequence encoding the protein described above, wherein the polynucleotide does not occur in nature.
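Whether a coding-region change is silent (i.e., leaves the encoded amino acid sequence unchanged because of codon degeneracy) can be checked by translating both versions and comparing the products. The following is a minimal sketch using the standard genetic code (NCBI translation table 1); the function names are hypothetical, and the sketch deliberately ignores the fact that bacterial GTG/TTG start codons initiate with methionine.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons enumerated TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds: str) -> str:
    """Translate an in-frame coding sequence; '*' marks a stop codon."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def is_synonymous_variant(cds_a: str, cds_b: str) -> bool:
    """True if the two coding sequences encode the same amino acid sequence."""
    return translate(cds_a) == translate(cds_b)


if __name__ == "__main__":
    print(is_synonymous_variant("ATGAAAGGT", "ATGAAGGGC"))  # True: M-K-G in both
    print(is_synonymous_variant("ATGAAA", "ATGAGA"))        # False: K becomes R
```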
The sequence homology or identity of a conserved polynucleotide or polypeptide is determined by standard alignment algorithms, together with the default gap penalty established by the program being used. Substantially homologous or identical sequences are generally capable of hybridizing with all or part of the sequence under moderate or high stringency conditions. Hybridization also includes hybridization with a polynucleotide containing a common codon, or a codon that takes codon degeneracy into account.
Whether or not any two polynucleotide or polypeptide sequences have homology, similarity, or identity may be determined, for example, using known computer algorithms such as the "FASTA" program with default parameters, as in Pearson et al. (1988) Proc. Natl. Acad. Sci. USA 85: 2444. Alternatively, the homology, similarity, or identity may be determined using the Needleman-Wunsch algorithm (Needleman and Wunsch, 1970, J. Mol. Biol. 48: 443-453) as performed in the needle program of the EMBOSS package (EMBOSS: The European Molecular Biology Open Software Suite, Rice et al., 2000, Trends Genet. 16: 276-277), version 5.0.0 or later, or using the GCG program package (Devereux, J., et al., Nucleic Acids Research 12: 387 (1984)), BLASTP, BLASTN, or FASTA (Altschul, S. F., et al., J. Mol. Biol. 215: 403 (1990); Guide to Huge Computers, Martin J. Bishop, Ed., Academic Press, San Diego, 1994; and Carillo et al. (1988) SIAM J. Applied Math. 48: 1073). For example, BLAST of the National Center for Biotechnology Information or ClustalW may be used to determine the homology, similarity, or identity.
The homology, similarity, or identity of polynucleotides or polypeptides may be determined by comparing sequence information using, for example, a GAP computer program such as that of Needleman et al. (1970), J Mol Biol. 48: 443, as disclosed in Smith and Waterman, Adv. Appl. Math. (1981) 2: 482. In summary, the GAP program defines similarity as the number of similarly aligned symbols (namely, nucleotides or amino acids) divided by the total number of symbols in the shorter of the two sequences. Default parameters for the GAP program may include (1) a binary comparison matrix (containing values of 1 for identity and 0 for non-identity) and the weighted comparison matrix of Gribskov et al. (1986) Nucl. Acids Res. 14: 6745, as disclosed in Schwartz and Dayhoff, eds., Atlas Of Protein Sequence And Structure, National Biomedical Research Foundation, pp. 353-358 (1979) (or the EDNAFULL substitution matrix, the EMBOSS version of NCBI NUC4.4); (2) a penalty of 3.0 for each gap and an additional penalty of 0.10 for each symbol in each gap (or a gap opening penalty of 10 and a gap extension penalty of 0.5); and (3) no penalty for end gaps.
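Under the GAP definition, identity is the number of identical aligned residue pairs divided by the number of residues in the shorter of the two sequences. A minimal sketch, assuming the alignment has already been produced (with "-" as the gap symbol) and a binary comparison matrix, so that "similar" means "identical":

```python
def gap_style_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity: identical aligned residue pairs divided by the
    number of residues in the shorter of the two (ungapped) sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    identical = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap)
    shorter = min(len(aligned_a.replace(gap, "")), len(aligned_b.replace(gap, "")))
    return 100.0 * identical / shorter


# TGKGND (C. glutamicum) vs TYKGND (E. coli): 5 of 6 positions are identical.
print(round(gap_style_identity("TGKGND", "TYKGND"), 2))
# A gap column does not count as a mismatch against the shorter sequence.
print(gap_style_identity("AC-GT", "ACTGT"))
```

Gap penalties influence which alignment is chosen, but once the alignment is fixed they do not enter this ratio. A threshold such as "at least 80% sequence identity" would then be checked as `gap_style_identity(x, y) >= 80`.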
In a further aspect, a host cell, a microorganism, or a "strain" is disclosed herein. As used herein, the term "host cell" includes both wild-type unmodified host cells and host cells in which genetic modification has occurred naturally or "transformation" has been performed artificially. In some embodiments, the host cell described herein may be a microorganism. The term "microorganism (or strain)" likewise includes both wild-type unmodified microorganisms and microorganisms in which genetic modification has occurred naturally or "transformation" has been performed artificially. As used herein, the term "unmodified host cell" or "unmodified microorganism" does not exclude strains containing mutations that may occur naturally in the host cells or microorganisms, and may refer to a wild-type strain or a natural strain itself, or to a strain before its trait is changed by genetic mutation due to natural or artificial factors. For example, the unmodified microorganism may refer to a strain in which the activity of the protein described herein is not enhanced or has not yet been enhanced. The "unmodified microorganism" may be used interchangeably with "strain before modification", "microorganism before modification", "unmutated strain", "unmodified strain", "unmutated microorganism", or "reference microorganism".
As used herein, the term "transformation" refers to introducing a vector including a polynucleotide encoding a target polypeptide into a host cell or microorganism so that the polypeptide encoded by the polynucleotide can be expressed in the host cell. The transformed polynucleotide may be either inserted into and located in the chromosome of the host cell or located outside the chromosome, as long as it can be expressed in the host cell. In addition, the polynucleotide includes DNA and/or RNA encoding a target polypeptide. The polynucleotide may be introduced in any form, as long as it can be introduced into and expressed in a host cell. For example, the polynucleotide may be introduced into a host cell in the form of an expression cassette, which is a gene construct including all elements required for self-expression. The expression cassette may usually include a promoter operably linked to the polynucleotide, a transcription termination signal, a ribosome binding site, and a translation termination signal. The expression cassette may be in the form of an expression vector capable of self-replication. In addition, the polynucleotide may be introduced into a host cell as is and operably linked to a sequence required for expression in the host cell, but is not limited thereto.
In some embodiments, the engineered host cell or microorganism described herein may comprise one, two, three, or more vectors, each comprising a different nucleotide sequence. In some embodiments, the engineered host cell or microorganism described herein may comprise one, two, three, or more vectors, each comprising a nucleotide sequence encoding a different amino acid molecule.
Artificially modified host cells or microorganisms, i.e., "engineered host cells or microorganisms," refer to host cells or microorganisms in which a specific mechanism is weakened or enhanced, for example by insertion of an exogenous gene or by enhancement or inactivation of the activity of an endogenous gene, and which contain a genetic modification for producing a desired polypeptide, protein, or product. Engineered host cells, microorganisms, or strains in accordance with some embodiments herein may be transformed to express recombinant nucleic acid sequences and/or to produce recombinant amino acid molecules or argininosuccinate synthase proteins.
In one aspect, the present disclosure provides a host cell, microorganism or strain expressing an argininosuccinate synthase disclosed herein. The host cell or microorganism may be an engineered host cell or microorganism, in which the host cell or microorganism has been transformed, modified, or mutated artificially by techniques known to those of skill in the art to yield a host cell or microorganism that does not occur in nature.
In some embodiments of the host cell according to the present disclosure, an amino acid sequence for the amino acid molecule or argininosuccinate synthase described herein is extrinsic to the host cell. Specifically, in some embodiments, an amino acid molecule or an argininosuccinate synthase disclosed herein is extrinsic to the host cell. Herein, the "extrinsic" sequence means that the sequence does not occur in the host cell without any modification, and in particular, without any artificial modification. The host cell may be modified to produce an amino acid molecule or an argininosuccinate synthase of another organism or a non-naturally occurring recombinant amino acid molecule or argininosuccinate synthase that does not naturally occur in the host cell (i.e., without any modification). In some embodiments of the host cell according to the present disclosure, an amino acid sequence for an amino acid molecule or argininosuccinate synthase described herein is an intrinsic amino acid sequence of the microorganism. Herein, the "intrinsic" sequence means that the sequence can occur in the host cell without any modification. The host cell may be modified to produce a recombinant amino acid molecule or an argininosuccinate synthase having an intrinsic sequence of the host cell.
In further embodiments of the host cell according to the present disclosure, a nucleotide sequence encoding the argininosuccinate synthase protein is an extrinsic nucleotide sequence of the host cell. In further embodiments of the host cell according to the present disclosure, a nucleotide sequence encoding the argininosuccinate synthase protein is an intrinsic nucleotide sequence of the host cell.
In some embodiments of the present disclosure, the recombinant or modified host cell may be a host cell having increased L-arginine producing ability compared to a natural wild-type host cell, because the activity of the protein described herein or a polynucleotide encoding the same is improved in the recombinant or modified host cell compared to that in the natural wild-type host cell, but is not limited thereto. The L-arginine producing recombinant or modified host cell (e.g., of the genus Corynebacterium) of the present disclosure can produce L-arginine at a high yield, and can therefore be used for industrial production of L-arginine.
As an example, the L-arginine producing ability of the recombinant or modified host cell may be increased by about 0.3% or more, specifically by about 0.5% or more, about 1% or more, about 2% or more, about 3% or more, about 4% or more, about 5% or more, about 6% or more, about 7% or more, about 8% or more, about 9% or more, about 10% or more, about 10.7% or more, about 11% or more, about 12% or more, about 12.3% or more, about 13% or more, about 14% or more, about 14.4% or more, about 15% or more, about 15.1% or more, about 16% or more, or about 16.9% or more (the upper limit thereof is not particularly limited, and the producing ability may be increased by, for example, about 200% or less, about 150% or less, about 100% or less, about 50% or less, about 40% or less, about 30% or less, or about 20% or less) compared to the L-arginine producing ability of the parent strain before mutation or of the unmodified host cell. In some embodiments, the L-arginine production by the engineered host cell described herein is increased by at least about 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40 or 50% and/or by about 50, 49, 48, 47, 46, 45, 44, 43, 42, 41 or 40% or less. However, the producing ability is not limited thereto, as long as it is increased (i.e., shows a positive change) compared to the producing ability of the parent strain before mutation or of an unmodified host cell thereof. In some embodiments, the parent strain may be a cell prior to transformation with an amino acid molecule described herein.
In another example, the L-arginine producing ability of the host cell having the increased producing ability may be about 1.005 times or more, about 1.01 times or more, about 1.02 times or more, about 1.03 times or more, about 1.04 times or more, about 1.05 times or more, about 1.06 times or more, about 1.07 times or more, about 1.08 times or more, about 1.09 times or more, about 1.10 times or more, about 1.107 times or more, about 1.11 times or more, about 1.12 times or more, about 1.123 times or more, about 1.13 times or more, about 1.14 times or more, about 1.144 times or more, about 1.15 times or more, about 1.151 times or more, about 1.16 times or more, about 1.169 times or more, 1.2 times or more, 1.3 times or more, or 1.4 times or more (the upper limit thereof is not particularly limited and may be, for example, about 10 times or less, about 5 times or less, about 3 times or less, or about 2 times or less) the L-arginine producing ability of the parent strain before mutation or of an unmodified host cell thereof, but is not limited thereto.
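The percentage increases and the fold values above express the same ratio: a fold change of F corresponds to an increase of (F - 1) x 100%, so, for example, "about 16.9% or more" and "about 1.169 times or more" describe the same threshold. A minimal sketch of the conversion, using hypothetical titers that are for illustration only and are not data from this disclosure:

```python
def fold_change(modified: float, reference: float) -> float:
    """Ratio of the modified strain's L-arginine production to the reference strain's."""
    if reference <= 0:
        raise ValueError("reference production must be positive")
    return modified / reference


def percent_increase(modified: float, reference: float) -> float:
    """Relative increase in percent; equals (fold change - 1) x 100."""
    return (fold_change(modified, reference) - 1.0) * 100.0


# Hypothetical titers in g/L (illustrative values, not measured results).
parent, engineered = 10.0, 11.69
print(round(fold_change(engineered, parent), 3), round(percent_increase(engineered, parent), 1))
```

With these assumed values, the fold change is about 1.169 and the increase is about 16.9%.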
As used herein, the term "about" is relative to the actual value stated, as will be appreciated by those of skill in the art, and allows for approximations, inaccuracies, and limits of measurement under the relevant circumstances. In one or more aspects, the terms "about," "substantially," and "approximately" may provide an industry-accepted tolerance for their corresponding terms and/or relativity between items, such as a tolerance of from less than one percent to ten percent of the actual value stated, and other suitable tolerances. The actual value stated may, for example, be that of lengths of nucleotide sequences, degrees of errors, dimensions, quantity of an ingredient in a composition, concentrations, volumes, process temperature, process time, yields, flow rates, pressures, and the like, and/or ranges thereof. In some instances, the tolerance may occur, for example, through typical measuring and handling procedures used for making compounds, compositions, concentrates, or use formulations; through inadvertent error in these procedures; through differences in the manufacture, source, or purity of starting materials or ingredients used to carry out the methods; and the like. The term "about" also encompasses amounts that differ due to aging of, for example, a composition, formulation, or cell culture with a particular initial concentration or mixture, and amounts that differ due to mixing or processing a composition or formulation with a particular initial concentration or mixture. Whether or not modified by the term "about," the claims appended hereto include equivalents to these quantities. The term "about" further may refer to a range of values that are similar to the stated reference value. In certain embodiments, the term "about" refers to a range of values that fall within, for example, 50, 25, 10, 9, 8, 7, 6, 5, 4, 3, 2, or 1 percent or less of the stated reference value.
In one aspect, the present disclosure provides an engineered host cell expressing a recombinant amino acid molecule, including a recombinant protein that is an argininosuccinate synthase. The amino acid molecule has an amino acid sequence comprising SEQ ID NO: 1 disclosed herein.
In some embodiments, the engineered host cell described above may be an engineered microorganism. In some embodiments, the engineered microorganism is Corynebacterium. In some embodiments, the engineered microorganism is Corynebacterium glutamicum. In some embodiments, the engineered microorganism is Corynebacterium stationis. In some embodiments, the engineered microorganism is Corynebacterium crudilactis. In some embodiments, the engineered microorganism is Corynebacterium deserti. In some embodiments, the engineered microorganism is Corynebacterium efficiens. In some embodiments, the engineered microorganism is Corynebacterium callunae. In some embodiments, the engineered microorganism is Corynebacterium singulare. In some embodiments, the engineered microorganism is Corynebacterium halotolerans. In some embodiments, the engineered microorganism is Corynebacterium striatum. In some embodiments, the engineered microorganism is Corynebacterium ammoniagenes. In some embodiments, the engineered microorganism is Corynebacterium pollutisoli. In some embodiments, the engineered microorganism is Corynebacterium imitans. In some embodiments, the engineered microorganism is Corynebacterium testudinoris. In some embodiments, the engineered microorganism is Corynebacterium flavescens. In some embodiments, the engineered microorganism is Corynebacterium crenatum. In some embodiments, the engineered microorganism is Corynebacterium suranareeae. In some embodiments, the engineered microorganism is Escherichia. In some embodiments, the engineered microorganism is E. coli.
In some embodiments, the engineered host cell produces L-arginine. In some embodiments, the engineered microorganism produces L-arginine.
In some embodiments, the engineered host cell or microorganism is transformed with an argR gene comprising a sequence having at least 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98 or 100% sequence identity to SEQ ID NO: 39. In some embodiments, the engineered host cell or microorganism is transformed with an argB gene having an M54V mutation and comprising a sequence having at least 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98 or 100% sequence identity to SEQ ID NO: 40.
In some embodiments, the recombinant protein has an amino acid sequence extrinsic to the host cell or microorganism. In other embodiments, the recombinant protein has an amino acid sequence intrinsic to the host cell or microorganism.
In another aspect, the present disclosure provides use of the engineered host cell or microorganism disclosed herein for producing L-arginine.
In yet a further aspect, a method for producing L-arginine is provided herein. In some embodiments, the method comprises culturing an engineered host cell expressing a recombinant protein disclosed herein. In some embodiments, the host cell is cultured in a medium. In the present disclosure, the term "culture" means growing the host cell described herein under properly controlled environmental conditions. The culture process according to some embodiments of the present disclosure may be performed under appropriate medium and culture conditions known in the art. Such a culture process may be easily adjusted and used by those skilled in the art depending on the selected host cell. Specifically, the culture may be batch, continuous and/or fed-batch culture, but is not limited thereto.
As used herein, the term "medium" refers to a material in which the nutrients required for culturing the host cell described herein are mixed as main components, and which supplies nutrients and growth factors, including water, that are essential for survival and growth. Specifically, any medium and other culture conditions used for culturing conventional host cells may be used for culturing the host cell described herein without particular limitation; for example, the host cell described herein may be cultured in a conventional medium containing appropriate carbon sources, nitrogen sources, phosphorus sources, inorganic compounds, amino acids, vitamins, and/or the like under aerobic conditions while the temperature, pH, and the like are adjusted. For example, a culture medium for host cells of the genus Corynebacterium may be found in the literature ["Manual of Methods for General Bacteriology" by the American Society for Bacteriology (Washington D.C., USA, 1981)].
During the culture of the host cell according to some embodiments of the present disclosure, compounds such as ammonium hydroxide, potassium hydroxide, ammonia, phosphoric acid, sulfuric acid, and the like may be added to the medium in an appropriate manner to adjust the pH of the medium. During the culture, an antifoaming agent such as a fatty acid polyglycol ester may be used to suppress bubble formation. Oxygen or an oxygen-containing gas may be injected into the medium to maintain aerobic conditions; alternatively, no gas may be injected, or nitrogen, hydrogen, or carbon dioxide gas may be injected, to maintain anaerobic or microaerobic conditions, but the control of the atmosphere is not limited thereto.
In further embodiments of the method for producing L-arginine, the culturing is performed with at least one source of carbon. Suitable sources of carbon include, but are not limited to, carbohydrates, sugar alcohols, organic acids, and amino acids. Carbohydrates may include glucose, saccharose, lactose, fructose, sucrose, and maltose, among others. Sugar alcohols may include mannitol and sorbitol, among others. Organic acids may include pyruvic acid, lactic acid, and citric acid, among others; and amino acids may include glutamic acid, methionine, and lysine, among others. In addition, natural organic nutrients such as starch hydrolysates, molasses, blackstrap molasses, rice bran, cassava, sugar cane waste, and corn steep liquor may be used. In particular embodiments, carbohydrates such as glucose and sterilized pre-treated molasses (namely, molasses converted to reducing sugar) may be used, and appropriate amounts of other carbon sources may be variously used without limitation. These carbon sources may be used singly or in combination of two or more, but the manner of use is not limited thereto.
In further embodiments of the method for producing L-arginine, the culturing is performed with at least one source of nitrogen. Suitable sources of nitrogen include inorganic nitrogen sources, organic nitrogen sources, and combinations thereof. Inorganic nitrogen sources include, but are not limited to, ammonia, ammonium sulfate, ammonium chloride, ammonium phosphate, and ammonium nitrate. Organic nitrogen sources include, but are not limited to, organic acids including ammonium acetate and ammonium carbonate; amino acids such as glutamic acid, methionine, and glutamine; and other sources of organic nitrogen including peptone, NZ-amine, meat extract, yeast extract, malt extract, corn steep liquor, casein hydrolysates, fish or decomposition products thereof, and defatted soybean cake or decomposition products thereof. These nitrogen sources may be used singly or in combination of two or more, but the manner of use is not limited thereto.
In further embodiments of the method for producing L-arginine, the culturing is performed with at least one of monobasic potassium phosphate, dibasic potassium phosphate, monobasic sodium phosphate, dibasic sodium phosphate, sodium chloride, calcium chloride, iron chloride, magnesium sulfate, iron sulfate, manganese sulfate, calcium carbonate, and the like. Additionally, amino acids, vitamins, suitable precursors and/or the like may be added to the medium.
These sources of carbon, sources of nitrogen, additives, or precursors thereof may be added to the medium batchwise or continuously. However, the manner of addition is not limited thereto.
Another embodiment of the method of producing L-arginine as disclosed herein provides that the culturing is performed with at least one selected from the group consisting of potassium phosphate, dipotassium phosphate, a sodium-containing salt corresponding thereto, sodium chloride, calcium chloride, iron chloride, magnesium sulfate, iron sulfate, manganese sulfate, and calcium carbonate.
In further embodiments of the method for producing L-arginine, the culturing is performed at a temperature maintained between about 20 °C and about 45 °C. In some embodiments, the culturing is performed at a temperature maintained between about 25 °C and about 40 °C. In some embodiments, the culturing is performed at a temperature maintained from about 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35 or 36 °C to about 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44 or 45 °C. In still further embodiments, the culturing may be conducted for about 10 to about 160 hours. In still further embodiments, the culturing may be conducted for from about 10, 20, 30, 40, 50, 60, 70, 80, 90, 100 or 110 to about 100, 110, 120, 130, 140, 150 or 160 hours. However, the culture conditions are not limited thereto.
Another aspect of the present disclosure provides a method of increasing L-arginine production by a host cell. In some embodiments, the method comprises transforming the host cell to produce the engineered host cell expressing a recombinant amino acid molecule disclosed herein.
In further embodiments of the methods as set forth herein, the method further includes preparing the host cell described herein, preparing a medium for culturing the host cell, or a combination thereof (in any order), for example, before the culturing step.
L-arginine produced by the culture according to some embodiments of the present disclosure may be secreted into the medium or may remain in the cells. In a further embodiment of the method for producing L-arginine comprising culturing an engineered host cell expressing a recombinant amino acid molecule such as, for example, a recombinant argininosuccinate synthase as set forth herein, the method further includes recovering L-arginine from the medium after the culturing step (i.e., the medium subjected to culture) or from the cultured host cell. That is, a recovery step may be further included after the culture step.
In some embodiments of the methods for producing L-arginine comprising culturing an engineered host cell expressing a recombinant protein as set forth herein, the production method further comprises recovering L-arginine from the cultured engineered host cell and/or from the medium in which the engineered host cell is cultured. Suitable methods for recovering the L-arginine include protocols known in the art according to the method for culturing a host cell described herein, for example, a batch, continuous or fed-batch culture method. For example, centrifugation, filtration, treatment with a crystallized protein precipitating agent (salting-out method), extraction, sonication, ultrafiltration, dialysis, various kinds of chromatography such as molecular sieve chromatography (gel filtration), adsorption chromatography, ion-exchange chromatography, and affinity chromatography, HPLC, or any combination thereof may be used, and the desired L-arginine may be recovered from the medium or host cell using a suitable method known in the art.
In some embodiments of the methods for producing L-arginine as set forth herein, the production method further comprises a purification step. The purification may be performed using a suitable method known in the art. In an example, when the method for producing L-arginine of the present disclosure comprises both a recovery step and a purification step, the recovery step and the purification step may be performed continuously or discontinuously in any order, or may be performed simultaneously or by being integrated into one step, but the manner of performance is not limited thereto.
In the method according to some embodiments, the protein, polynucleotide, vector, host cell and the like are as described in the other aspects herein.
Another aspect of the present disclosure provides a composition for L-arginine production, comprising a microorganism of the genus Corynebacterium; a medium in which the microorganism has been cultured; or a combination thereof. The composition according to some embodiments of the present disclosure may further contain any suitable excipients commonly used in compositions for L-arginine production, and such excipients may include, for example, but are not limited to, preservatives, wetting agents, dispersing agents, suspending agents, buffering agents, stabilizing agents, and isotonic agents.
Each description and embodiment disclosed in the present disclosure may also be applied to other descriptions and embodiments. That is, all combinations of various elements disclosed in the present disclosure fall within the scope of the present application. Further, the scope of the present application is not limited by the specific description below. In addition, those skilled in the art will recognize or be able to ascertain using no more than routine experimentation, many equivalents to the specific aspects of the present application described herein. Further, these equivalents should be interpreted to fall within the scope of the present application.
Another aspect provides a use of the L-arginine product characterized by one or more elements disclosed in the application. A further aspect provides a use of the method characterized by one or more elements disclosed in the application.
Hereinafter, the present application will be described in more detail with reference to Examples. However, the following Examples are merely preferred embodiments for illustrating the present application, and therefore, the scope of the present application is not intended to be limited thereto. Meanwhile, technical matters not described in this specification can be sufficiently understood and easily implemented by those skilled in the art of the present application or similar technical fields.
Example 1. Comparison of protein sequences and tertiary structures of argininosuccinate synthase between genera
First, the amino acid residues of Corynebacterium glutamicum argininosuccinate synthase essential for binding to substrates such as L-citrulline, aspartate, and ATP were identified through protein sequence alignment between the argininosuccinate synthase of Corynebacterium glutamicum (SEQ ID NO: 28) and a heterologous argininosuccinate synthase with a known structure. Although the tertiary structure of the argininosuccinate synthase of Corynebacterium glutamicum has not been determined, the tertiary crystal structure and the active-site residues of the argininosuccinate synthase derived from Escherichia coli (SEQ ID NO: 30) have been reported. Among them, threonine at position 130, tyrosine at 131, lysine at 132, glycine at 133, asparagine at 134, and aspartic acid at 135 have been reported as essential residues for binding to the substrates (Lemke, C. T., and P. L. Howell, Structure 9(12): 1153-64, 2001). As a result of aligning the E. coli and Corynebacterium glutamicum argininosuccinate synthase sequences on this basis (see FIG. 1), threonine at position 119, glycine at 120, lysine at 121, glycine at 122, asparagine at 123, and aspartic acid at 124 of the Corynebacterium glutamicum enzyme were found to correspond to the residues described above and to be essential for binding to the substrates.
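Transferring residue numbers from a structurally characterized enzyme to a homolog amounts to walking the alignment column by column and counting residues (not gaps) in each row. A minimal sketch, assuming two aligned rows with "-" as the gap symbol and a known starting residue number for each row; the toy alignment used here is hypothetical and is not the alignment of FIG. 1:

```python
def map_positions(row_a: str, start_a: int, row_b: str, start_b: int,
                  gap: str = "-") -> dict[int, int]:
    """Map residue numbers of sequence A to residue numbers of sequence B
    for every alignment column in which both sequences have a residue."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    mapping = {}
    pos_a, pos_b = start_a, start_b
    for x, y in zip(row_a, row_b):
        if x != gap and y != gap:
            mapping[pos_a] = pos_b
        # Advance a sequence's residue counter only when that row has a residue.
        if x != gap:
            pos_a += 1
        if y != gap:
            pos_b += 1
    return mapping


# Hypothetical toy alignment: residue 2 of A and residue 3 of B fall in gap columns.
print(map_positions("ACD-EF", 1, "A-DGEF", 1))
```

The residue numbers fed into such a mapping must follow the same convention as the reference literature; for example, numbering taken from a crystal structure may differ from numbering of the full-length sequence if the structure omits the initiator methionine.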
Next, in order to identify non-conserved structures at the substrate binding site, the protein sequences of argininosuccinate synthase derived from several heterologous model microorganisms were subjected to multiple sequence alignment.
As a result, the argininosuccinate synthases of Acidobacterium capsulatum, Alcaligenes faecalis, Burkholderia pyrrocinia, Cupriavidus necator, Escherichia coli, and Neisseria weaveri were found to have a 5-amino-acid extension around the essential residues of the binding site, which is not present in Corynebacterium glutamicum (e.g., see FIG. 2 and SEQ ID NO: 23, SEQ ID NO: 24, SEQ ID NO: 26, SEQ ID NO: 29, SEQ ID NO: 30, and SEQ ID NO: 32, respectively).
In order to predict the role of this amino acid extension in protein function, the tertiary structures of argininosuccinate synthases with and without the amino acid extension were compared. As a tertiary structure lacking the amino acid extension, the Corynebacterium glutamicum-derived protein was used, and its tertiary structure was generated using the AlphaFold program (Jumper, J., Evans, R., Pritzel, A. et al. Nature 596, 583-589, 2021). As a tertiary structure including the amino acid extension, the E. coli-derived protein (PDB ID: 1K97) was used. For comparison, the two structures were superimposed using the VMD program (Humphrey, W et al., J Mol Graph. 14(1):33-8, 27-8, 1996). See FIG. 3.
As depicted in FIG. 3, the site of the amino acid extension is located at the entrance of the active site in the tertiary structure of the protein and is particularly adjacent to the substrate binding site. This amino acid extension may also be involved in the formation of a helical secondary structure: in the Corynebacterium glutamicum-derived protein, which lacks the extension, the corresponding site exists as a loop without a specific secondary structure, whereas in the E. coli-derived protein, which has the extension, the site forms an alpha helix. In addition, the distance between the corresponding region and the substrate may increase in the tertiary structure of the protein owing to the sequence difference between the extended portion and the portion preceding it. Overall, when the amino acid extension is present, the active-site entrance may be widened by the formation of a secondary structure, giving the substrate easier access to the active site, which may contribute to improved catalytic efficiency of the protein. Beyond easing substrate access, the extension may also contribute to catalytic efficiency by facilitating and stabilizing formation of the quaternary structure of the protein: argininosuccinate synthase from Escherichia coli may be functional as a tetramer, and the amino acid extension may be located near the interface between argininosuccinate synthase monomers and thus be capable of affecting tetramer formation.
The aligned sequences of the 10 kinds of argininosuccinate synthase shown in Table 1 below were analyzed for identity.
Organism | Aligned region | Last position (range)
Acidobacterium capsulatum | KEDDVNIWG DGS TFKGND IERF YRYGLLVNPDLKV YKPWLD SAFIDELG | 167 (Positions 119-167 of SEQ ID NO: 23)
Alcaligenes faecalis | KEDDVHIWG DGS TYKGND IERF YRYGLLTNPELKI YKPWLD QTFIDELG | 167 (Positions 119-167 of SEQ ID NO: 24)
Bacillus amyloliquefaciens | EKENAVAVA HGC TGKGND QVRF EVSIKSLNPDLEVIAPVREWQW-S--- | 151 (Positions 107-151 of SEQ ID NO: 25)
Burkholderia pyrrocinia | REDGVNIWG DGS TYKGND IERF YRYGLLVNPDLKI YKPWLD QTFIDELG | 167 (Positions 119-167 of SEQ ID NO: 26)
Corynebacterium ammoniagenes | QEFGGIHVS HGC TGKGND QVRF EVSFRALDPSLDIIAPARDYAW-T--- | 151 (Positions 107-151 of SEQ ID NO: 27)
Corynebacterium glutamicum | KQFNGTHVA HGC TGKGND QVRF EVGFMDTDPNLEIIAPARDFA----W- | 150 (Positions 107-150 of SEQ ID NO: 28)
Cupriavidus necator | KEDGVNIWG DGS TFKGND IERF YRYGLLTNPGLQI YKPWLD QQFIDELG | 167 (Positions 119-167 of SEQ ID NO: 29)
Escherichia coli | KEDGVNIWG DGS TYKGND IERF YRYGLLTNAELQI YKPWLD TDFIDELG | 167 (Positions 119-167 of SEQ ID NO: 30)
Mycobacterium smegmatis | REHGGGVVA HGC TGKGND QVRF EVGFASLAPDLQVLAPVRDYAW-T--- | 151 (Positions 107-151 of SEQ ID NO: 31)
Neisseria weaveri | KEDDVNIWG DGS TYKGND IERF YRYGLLTNPSLRI YKPWLD QTFIDELG | 171 (Positions 123-171 of SEQ ID NO: 32)
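Because each row of Table 1 gives both an aligned window and its position range, the two can be cross-checked: removing the block-separating spaces and the "-" gap characters should leave exactly as many residues as the range spans. The Python sketch below performs that check on the values transcribed from Table 1. The dictionary layout and helper names are illustrative assumptions, not part of the disclosed method.

```python
# Aligned regions transcribed from Table 1: (aligned string, first position, last position).
# Spaces only separate display blocks; "-" marks an alignment gap.
TABLE_1 = {
    "A. capsulatum": ("KEDDVNIWG DGS TFKGND IERF YRYGLLVNPDLKV YKPWLD SAFIDELG", 119, 167),
    "A. faecalis": ("KEDDVHIWG DGS TYKGND IERF YRYGLLTNPELKI YKPWLD QTFIDELG", 119, 167),
    "B. amyloliquefaciens": ("EKENAVAVA HGC TGKGND QVRF EVSIKSLNPDLEVIAPVREWQW-S---", 107, 151),
    "B. pyrrocinia": ("REDGVNIWG DGS TYKGND IERF YRYGLLVNPDLKI YKPWLD QTFIDELG", 119, 167),
    "C. ammoniagenes": ("QEFGGIHVS HGC TGKGND QVRF EVSFRALDPSLDIIAPARDYAW-T---", 107, 151),
    "C. glutamicum": ("KQFNGTHVA HGC TGKGND QVRF EVGFMDTDPNLEIIAPARDFA----W-", 107, 150),
    "C. necator": ("KEDGVNIWG DGS TFKGND IERF YRYGLLTNPGLQI YKPWLD QQFIDELG", 119, 167),
    "E. coli": ("KEDGVNIWG DGS TYKGND IERF YRYGLLTNAELQI YKPWLD TDFIDELG", 119, 167),
    "M. smegmatis": ("REHGGGVVA HGC TGKGND QVRF EVGFASLAPDLQVLAPVRDYAW-T---", 107, 151),
    "N. weaveri": ("KEDDVNIWG DGS TYKGND IERF YRYGLLTNPSLRI YKPWLD QTFIDELG", 123, 171),
}


def columns(aligned: str) -> int:
    """Number of alignment columns (residues plus gaps)."""
    return len(aligned.replace(" ", ""))


def residues(aligned: str) -> str:
    """Residues only: drop block separators and gap characters."""
    return aligned.replace(" ", "").replace("-", "")


def range_matches(aligned: str, start: int, end: int) -> bool:
    """True if the residue count equals the length of the inclusive range start..end."""
    return len(residues(aligned)) == end - start + 1


for name, (aligned, start, end) in TABLE_1.items():
    print(f"{name:22s} columns={columns(aligned)} residues={len(residues(aligned))} "
          f"range_ok={range_matches(aligned, start, end)}")
```

Every row spans 49 alignment columns and reports range_ok=True. The enzymes with the extension fill all 49 columns with residues, while the others have 44 or 45 residues plus gaps.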
A protein percent identity matrix for the ten argininosuccinate synthases was also generated using the Clustal 2.1 program (Madeira, Fabio et al., Nucleic acids research 50(W1) W276-W279, 2022) and is shown in Table 2 below.
Percent Identity (%) 01 02 03 04 05 06 07 08 09 10
01 A. capsulatum 100 81 28 82 27 26 82 79 29 80
02 A. faecalis 81 100 27 85 28 28 82 80 29 84
03 B. amyloliquefaciens 28 27 100 27 53 49 26 27 52 27
04 B. pyrrocinia 82 85 27 100 28 27 81 78 29 81
05 C. ammoniagenes 27 28 53 28 100 82 28 28 74 29
06 C. glutamicum 26 28 49 27 82 100 28 28 71 27
07 C. necator 82 82 26 81 28 28 100 80 30 80
08 E. coli 79 80 27 78 28 28 80 100 29 80
09 M. smegmatis 29 29 52 29 74 71 30 29 100 29
10 N. weaveri 80 84 27 81 29 27 80 80 29 100
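Table 2 can also be summarized programmatically. The sketch below assumes the matrix is transcribed correctly and treats the six enzymes whose Table 1 window contains the YKPWLD segment as one group. For each of those enzymes it reports the lowest identity to the other group members and the highest identity to the remaining four enzymes. Names and data layout are illustrative only.

```python
# Percent identity matrix transcribed from Table 2 (row/column order 01-10).
NAMES = ["A. capsulatum", "A. faecalis", "B. amyloliquefaciens", "B. pyrrocinia",
         "C. ammoniagenes", "C. glutamicum", "C. necator", "E. coli",
         "M. smegmatis", "N. weaveri"]
IDENTITY = [
    [100, 81, 28, 82, 27, 26, 82, 79, 29, 80],
    [81, 100, 27, 85, 28, 28, 82, 80, 29, 84],
    [28, 27, 100, 27, 53, 49, 26, 27, 52, 27],
    [82, 85, 27, 100, 28, 27, 81, 78, 29, 81],
    [27, 28, 53, 28, 100, 82, 28, 28, 74, 29],
    [26, 28, 49, 27, 82, 100, 28, 28, 71, 27],
    [82, 82, 26, 81, 28, 28, 100, 80, 30, 80],
    [79, 80, 27, 78, 28, 28, 80, 100, 29, 80],
    [29, 29, 52, 29, 74, 71, 30, 29, 100, 29],
    [80, 84, 27, 81, 29, 27, 80, 80, 29, 100],
]
# Enzymes whose Table 1 window contains the YKPWLD... extension.
WITH_EXTENSION = {"A. capsulatum", "A. faecalis", "B. pyrrocinia",
                  "C. necator", "E. coli", "N. weaveri"}


def group_summary(names, matrix, group):
    """For each member of `group`, return (min identity to the other members,
    max identity to non-members)."""
    inside = [i for i, n in enumerate(names) if n in group]
    outside = [i for i, n in enumerate(names) if n not in group]
    return {
        names[i]: (min(matrix[i][j] for j in inside if j != i),
                   max(matrix[i][j] for j in outside))
        for i in inside
    }


for name, (within, across) in group_summary(NAMES, IDENTITY, WITH_EXTENSION).items():
    print(f"{name:15s} min within group {within}%  max to other group {across}%")
```

The within-group minima (79, 80, 78, 80, 78 and 80% for A. capsulatum, A. faecalis, B. pyrrocinia, C. necator, E. coli and N. weaveri) appear to correspond to the per-organism identity thresholds recited in claim 5, while no cross-group value exceeds 30%.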
Additionally, a protein motif analysis was performed on the argininosuccinate synthases having the amino acid extension in order to derive motifs that define the structural difference. Protein motifs are short amino acid sequence patterns that recur across proteins and are attributable to particular molecular functions. The argininosuccinate synthases having the amino acid extension described above were included in the analysis, together with the protein sequences of 20 additional argininosuccinate synthases having a 5-amino-acid extension at the corresponding position, which were collected using the NCBI BLAST program (Altschul, S F et al., Nucleic acids research 25(17) 3389-402, 1997) (SEQ ID NO: 1 through SEQ ID NO: 16). These sequences were searched for motifs using the protein motif search program MEME (Bailey, T L, and C Elkan. Proceedings. International Conference on Intelligent Systems for Molecular Biology 2, 28-36, 1994).
As a result, a total of 18 motifs were obtained from the analysis of the argininosuccinate synthases described above. As is customary, the protein motifs are expressed as regular expressions, both here and in the drawings (Sigrist, Christian J A et al., Briefings in bioinformatics 3(3) 265-74, 2002). Notably, one motif, motif 8 (SEQ ID NO: 1), was found at exactly the position of the amino acid extension. This is an essential motif that describes the aforementioned amino acid extension in argininosuccinate synthases. Motifs 6 (SEQ ID NO: 11) and 13 (SEQ ID NO: 17) commonly appear in argininosuccinate synthase proteins.
Apart from these, another 15 motifs were identified within these argininosuccinate synthase proteins; these 15 additional motifs are commonly shared by argininosuccinate synthases having the amino acid extension. Therefore, proteins that always include motif 8 (SEQ ID NO: 1) and optionally include some of the remaining 17 motifs (SEQ ID NO: 6 through SEQ ID NO: 22) may have the characteristics of an argininosuccinate synthase having the amino acid extension. Table 3 lists the motifs found in the search (shown as SEQ ID NO: 1 and SEQ ID NO: 6 through SEQ ID NO: 22 in FIG. 4).
MOTIF 01 [ILV]Px[AEQ][KR][IV]G[IL]AFSGGLDTS[ATV]A[ILV] (SEQ ID NO: 6)
MOTIF 02 [KN]GAx[CP][CY][AST]YTA[DN][IL]GQ[APY]DE (SEQ ID NO: 7)
MOTIF 03 [AS][RV][IL][IV][ED][CG][KR]xL[AV] (SEQ ID NO: 8)
MOTIF 04 [GN]AF[HN]xx[ST][AG]G (SEQ ID NO: 9)
MOTIF 05 YFNTTP[IL]GRAV[AT][AG]T[LM]LV (SEQ ID NO: 10)
MOTIF 06 T[FY]KGNDIERF (SEQ ID NO: 11)
MOTIF 07 YRYGL[LV][ATV]N (SEQ ID NO: 12)
MOTIF 08 YKPWLDxxF[IV]xEL (SEQ ID NO: 1)
MOTIF 09 GGRxEM[AS][AEQ][FWY][LM] (SEQ ID NO: 13)
MOTIF 10 EKAYSTD[AS]N[ILM][LW]GATHE (SEQ ID NO: 14)
MOTIF 11 VxPIMGVx[FHS]W (SEQ ID NO: 15)
MOTIF 12 [IV]GGRHG[LM]G[MV][AS]DQIENR[IV]IEA (SEQ ID NO: 16)
MOTIF 13 KSRGIYEAPG (SEQ ID NO: 17)
MOTIF 14 [LM]AL[FL][FHY][AI]A[FY]ERLx[NST][AG]IHNEDT (SEQ ID NO: 18)
MOTIF 15 LG[KR]L[LM]Y[AEQ]GRW[FL]D[PS]Q[ASG][ILM]M[ILV]R[DE] (SEQ ID NO: 19)
MOTIF 16 LRRGxD[FY][ST][IL] (SEQ ID NO: 20)
MOTIF 17 [ADN][FL][ST][FY]x[APS][AED][KR]L[ST]ME[KR] (SEQ ID NO: 21)
MOTIF 18 DRIGQL[HT]MR[LNT][LN]D[ILV]xD[ST]R (SEQ ID NO: 22)
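Since the motifs in Table 3 are written as regular-expression-like patterns, they can be searched mechanically. The sketch below assumes the usual PROSITE-style reading of the notation: a bracketed set lists the residues allowed at one position, and a lowercase x stands for any residue. It applies motifs 6, 7 and 8 to four of the Table 1 windows and to the SEQ ID NO: 2-5 sequences listed in claim 2. The helper names are hypothetical.

```python
import re

# Motifs from Table 3 (assumed notation: [..] = allowed residues, lowercase x = any residue).
MOTIFS = {
    "MOTIF 06": "T[FY]KGNDIERF",
    "MOTIF 07": "YRYGL[LV][ATV]N",
    "MOTIF 08": "YKPWLDxxF[IV]xEL",
}

# Table 1 windows with spaces and gaps removed (two enzymes with, two without the extension).
WINDOWS = {
    "A. capsulatum": "KEDDVNIWGDGSTFKGNDIERFYRYGLLVNPDLKVYKPWLDSAFIDELG",
    "E. coli": "KEDGVNIWGDGSTYKGNDIERFYRYGLLTNAELQIYKPWLDTDFIDELG",
    "C. glutamicum": "KQFNGTHVAHGCTGKGNDQVRFEVGFMDTDPNLEIIAPARDFAW",
    "M. smegmatis": "REHGGGVVAHGCTGKGNDQVRFEVGFASLAPDLQVLAPVRDYAWT",
}


def motif_to_regex(motif: str) -> re.Pattern:
    """Translate a Table 3 motif into a Python regular expression.

    Bracketed classes such as [IV] already mean the same thing in regex syntax,
    so only the wildcard 'x' needs rewriting to '.'.
    """
    return re.compile(motif.replace("x", "."))


def motif_hits(seq: str) -> dict:
    """Report which motifs occur anywhere in `seq`."""
    return {name: motif_to_regex(m).search(seq) is not None for name, m in MOTIFS.items()}


for organism, seq in WINDOWS.items():
    print(organism, motif_hits(seq))

# The SEQ ID NO: 2-5 sequences from claim 2 should each match motif 8 end to end.
motif8 = motif_to_regex(MOTIFS["MOTIF 08"])
for s in ["YKPWLDSAFIDEL", "YKPWLDQTFIDEL", "YKPWLDQQFIDEL", "YKPWLDTDFIDEL"]:
    print(s, motif8.fullmatch(s) is not None)
```

Under this reading, all three motifs are found in the A. capsulatum and E. coli windows and none in the C. glutamicum and M. smegmatis windows. Each claim 2 sequence matches motif 8 in full.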
Example 2. Preparation of vectors introduced with foreign argininosuccinate synthase
Before preparing L-arginine-producing Corynebacterium glutamicum strains each containing an argininosuccinate synthase, vectors were prepared, each carrying an argG gene encoding an argininosuccinate synthase as disclosed herein. The argG genes were derived from Acidobacterium capsulatum (SEQ ID NO: 33), Alcaligenes faecalis (SEQ ID NO: 34), Burkholderia pyrrocinia (SEQ ID NO: 35), Bacillus amyloliquefaciens (SEQ ID NO: 41), Corynebacterium ammoniagenes (SEQ ID NO: 42), Corynebacterium glutamicum (SEQ ID NO: 43), Cupriavidus necator (SEQ ID NO: 36), Escherichia coli (SEQ ID NO: 37), Mycobacterium smegmatis (SEQ ID NO: 44), and Neisseria weaveri (SEQ ID NO: 38). See FIG. 5. All vectors were designed so that each argG gene could be inserted into the chromosome of Corynebacterium glutamicum by homologous recombination, specifically at the BBD29_RS08210 site, and expressed under the constitutive Po2 promoter (Korean Patent No. 10-1632642).
Among the vector components, two DNA fragments containing the promoter and the homologous regions targeting BBD29_RS08210 were prepared: one containing the 5'-homologous region targeting BBD29_RS08210 together with the promoter, and the other containing the 3'-homologous region targeting BBD29_RS08210. PCR was performed using the genomic DNA of Corynebacterium glutamicum ATCC13869 as a template and the primer pairs shown in Table 4 (SEQ ID NOS: 45 and 46, and SEQ ID NOS: 47 and 48) to obtain a DNA fragment containing the 5'-homologous region of BBD29_RS08210 with the Po2 promoter and a DNA fragment containing the 3'-homologous region of BBD29_RS08210 (hereinafter, "BBD29_RS08210 5'-DNA fragment" and "BBD29_RS08210 3'-DNA fragment", respectively). The PCR was performed as follows: 30 cycles of denaturation at 95°C for 30 seconds, annealing at 55°C for 30 seconds, and elongation at 72°C for 2 minutes. Then, DNA fragments of the argG genes were prepared. Specifically, PCR was performed using the primer pairs shown in Table 4 with the following genomic DNAs as templates: SEQ ID NOS: 49 and 50 with the genomic DNA of Acidobacterium capsulatum strain KACC 14500; SEQ ID NOS: 51 and 52 with the genomic DNA of Bacillus amyloliquefaciens strain KACC 12067; SEQ ID NOS: 53 and 54 with the genomic DNA of Burkholderia pyrrocinia strain KACC 12018; SEQ ID NOS: 55 and 56 with the genomic DNA of Corynebacterium ammoniagenes strain ATCC 6872; SEQ ID NOS: 57 and 58 with the genomic DNA of Corynebacterium glutamicum strain ATCC 13869; SEQ ID NOS: 59 and 60 with the genomic DNA of Cupriavidus necator strain KCTC 22469; SEQ ID NOS: 61 and 62 with the genomic DNA of Escherichia coli K-12 substrain MG1655; SEQ ID NOS: 63 and 64 with the genomic DNA of Mycobacterium smegmatis strain MC2 155; and SEQ ID NOS: 65 and 66 with the genomic DNA of Neisseria weaveri strain KCTC 23363. The genomic DNAs used were distributed by KCTC and KACC.
The PCR was performed as follows: 30 cycles of denaturation at 95°C for 30 seconds, annealing at 55°C for 30 seconds, and elongation at 72°C for 2 minutes. As a result, DNA fragments of the argG genes (hereinafter, argG (A. capsulatum), argG (B. amyloliquefaciens), argG (B. pyrrocinia), argG (C. ammoniagenes), argG (C. glutamicum), argG (C. necator), argG (E. coli), argG (M. smegmatis), and argG (N. weaveri)) were obtained.
No. Name DNA Sequence (5'-3')
SEQ ID NO: 45 BBD29_RS08210_5'_pF tgaattcgagctcggtacccctggtacaaggttgcccatg
SEQ ID NO: 46 BBD29_RS08210_5'_pR gcttttgcacgtttccattataccacataattctgttgctgccaaaattcacgattattgactagtaatgctggatcgttggcgaa
SEQ ID NO: 47 BBD29_RS08210_3'-F acgaaaggctcagtcgaaagactgggcctttcgtttttatcttctagtcccgggatccggctcaactatataaccgt
SEQ ID NO: 48 BBD29_RS08210_3'-R gtcgactctagaggatccccactctggcatccaccaacaa
SEQ ID NO: 49 argG-A.ca-F cagcaacagaattatgtggtataatggaaacgtgcaaaagcatagattattggaggagatcaaaacacatatgtctgtgattctggaaca
SEQ ID NO: 50 argG-A.ca-R ataaaaacgaaaggcccagtctttcgactgagcctttcgttttatttgatttaattcttttcgcccgagt
SEQ ID NO: 51 argG-B.am-F cagcaacagaattatgtggtataatggaaacgtgcaaaagcatagattattggaggagatcaaaacacatatggcagaacaaaaaaaggt
SEQ ID NO: 52 argG-B.am-R ataaaaacgaaaggcccagtctttcgactgagcctttcgttttatttgattcatgcttcaatctgctcct
SEQ ID NO: 53 argG-B.py-F cagcaacagaattatgtggtataatggaaacgtgcaaaagcatagattattggaggagatcaaaacacatatgagcacgattctcgaaag
SEQ ID NO: 54 argG-B.py-R ataaaaacgaaaggcccagtctttcgactgagcctttcgttttatttgatttactcgccgctatcgccct
SEQ ID NO: 55 argG-C.am-F cagcaacagaattatgtggtataatggaaacgtgcaaaagcatagattattggaggagatcaaaacacatatgaacgcacgtgttgtgct
SEQ ID NO: 56 argG-C.am-R ataaaaacgaaaggcccagtctttcgactgagcctttcgttttatttgatttactggccgtttgcgctgg
SEQ ID NO: 57 argG-C.gl-F cagcaacagaattatgtggtataatggaaacgtgcaaaagcatagattattggaggagatcaaaacacatatgactaaccgcatcgttct
SEQ ID NO: 58 argG-C.gl-R ataaaaacgaaaggcccagtctttcgactgagcctttcgttttatttgatttagttgttgccagcttcgc
SEQ ID NO: 59 argG-C.ne-F cagcaacagaattatgtggtataatggaaacgtgcaaaagcatagattattggaggagatcaaaacacatatgactaccatcctgcagca
SEQ ID NO: 60 argG-C.ne-R ataaaaacgaaaggcccagtctttcgactgagcctttcgttttatttgattcagtcttccagcttcggca
SEQ ID NO: 61 argG-E.co-F cagcaacagaattatgtggtataatggaaacgtgcaaaagcatagattattggaggagatcaaaacacatatgacgacgattctcaagca
SEQ ID NO: 62 argG-E.co-R ataaaaacgaaaggcccagtctttcgactgagcctttcgttttatttgatttactggcctttgttttcca
SEQ ID NO: 63 argG-M.sm-F cagcaacagaattatgtggtataatggaaacgtgcaaaagcatagattattggaggagatcaaaacacatatgtccgaacgcgtcatcct
SEQ ID NO: 64 argG-M.sm-R ataaaaacgaaaggcccagtctttcgactgagcctttcgttttatttgatctagatgcccaggtcgcgct
SEQ ID NO: 65 argG-N.we-F cagcaacagaattatgtggtataatggaaacgtgcaaaagcatagattattggaggagatcaaaacacatatgagcagtcagaaccatac
SEQ ID NO: 66 argG-N.we-R ataaaaacgaaaggcccagtctttcgactgagcctttcgttttatttgatttattttttatgtcccagct
Then, the obtained DNA fragments were cloned into the linearized vector. The linearized vector was prepared by digesting pDCM2 (Korean Patent Application Publication No. 10-2020-0136813), which cannot replicate in Corynebacterium glutamicum, with the SmaI restriction enzyme. The BBD29_RS08210 5'-DNA fragment, the BBD29_RS08210 3'-DNA fragment, and each argG DNA fragment obtained above were assembled by fusion cloning using the In-Fusion® HD Cloning Kit (Clontech). The resulting clones were transformed into E. coli DH5α and plated on LB solid medium containing kanamycin (25 mg/L). Colonies transformed with a plasmid carrying the inserted target gene were selected by PCR, and plasmid DNA was isolated using a DNA-spin plasmid DNA purification kit (iNtRON). In this way, vectors (recombinant plasmids) containing the BBD29_RS08210 5'-DNA fragment, the BBD29_RS08210 3'-DNA fragment, and the DNA fragment of each argG gene (i.e., pDCM2-△BBD29_RS08210::Po2_argG (A. capsulatum), pDCM2-△BBD29_RS08210::Po2_argG (B. amyloliquefaciens), pDCM2-△BBD29_RS08210::Po2_argG (B. pyrrocinia), pDCM2-△BBD29_RS08210::Po2_argG (C. ammoniagenes), pDCM2-△BBD29_RS08210::Po2_argG (C. glutamicum), pDCM2-△BBD29_RS08210::Po2_argG (C. necator), pDCM2-△BBD29_RS08210::Po2_argG (E. coli), pDCM2-△BBD29_RS08210::Po2_argG (M. smegmatis), and pDCM2-△BBD29_RS08210::Po2_argG (N. weaveri)) were prepared.
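In-Fusion cloning joins fragments through homologous ends (the manufacturer's guidelines call for roughly 15 bp of homology), so the primer tails in Table 4 have to make adjacent PCR products overlap. The sketch below checks the two internal junctions for two of the argG constructs. It models each PCR product only by its primer-defined ends: a product amplified with reverse primer R ends (top strand) with the reverse complement of R, and a product amplified with forward primer F starts with F. The template sequence between the primers is ignored. The vector-side junctions are not checked because the pDCM2 sequence is not given here. Function names are illustrative.

```python
COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def end_overlap(upstream_end: str, downstream_start: str) -> int:
    """Longest suffix of `upstream_end` that equals a prefix of `downstream_start`."""
    for k in range(min(len(upstream_end), len(downstream_start)), 0, -1):
        if upstream_end[-k:] == downstream_start[:k]:
            return k
    return 0


# Table 4 primers (5'->3').
SEQ46 = ("gcttttgcacgtttccattataccacataattctgttgctgccaaaattcacgattattg"
         "actagtaatgctggatcgttggcgaa")  # BBD29_RS08210_5'_pR
SEQ47 = ("acgaaaggctcagtcgaaagactgggcctttcgtttttatcttctagtcccgggatccgg"
         "ctcaactatataaccgt")  # BBD29_RS08210_3'-F
ARGG_PRIMERS = {
    "A. capsulatum": (  # SEQ ID NOS: 49 and 50
        "cagcaacagaattatgtggtataatggaaacgtgcaaaagcatagattattggaggagatcaaaacacatatgtctgtgattctggaaca",
        "ataaaaacgaaaggcccagtctttcgactgagcctttcgttttatttgatttaattcttttcgcccgagt"),
    "E. coli": (  # SEQ ID NOS: 61 and 62
        "cagcaacagaattatgtggtataatggaaacgtgcaaaagcatagattattggaggagatcaaaacacatatgacgacgattctcaagca",
        "ataaaaacgaaaggcccagtctttcgactgagcctttcgttttatttgatttactggcctttgttttcca"),
}

for organism, (fwd, rev) in ARGG_PRIMERS.items():
    # 5' fragment (top strand) ends with revcomp(SEQ46); the argG fragment starts with fwd.
    left = end_overlap(revcomp(SEQ46), fwd)
    # argG fragment (top strand) ends with revcomp(rev); the 3' fragment starts with SEQ47.
    right = end_overlap(revcomp(rev), SEQ47)
    print(f"{organism}: 5'-fragment/argG junction {left} nt, argG/3'-fragment junction {right} nt")
```

For both constructs the script reports a 41-nt overlap at the 5'-fragment/argG junction and a 40-nt overlap at the argG/3'-fragment junction, well above the roughly 15 bp guideline.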
Example 3. Preparation of L-arginine producing Corynebacterium glutamicum strains expressing foreign argininosuccinate synthase (argG) and evaluation of their L-arginine producing ability
Example 3-1. Preparation of Corynebacterium glutamicum CJR2 strain
A Corynebacterium glutamicum strain CJR2 having the ability to produce L-arginine was prepared by sequentially introducing two mutations (△argR and argB (M54V)) into wild-type Corynebacterium glutamicum ATCC13869 (Ikeda, Masato et al., Applied and Environmental Microbiology 75(6):1635-41, 2009).
First, vectors for introducing the argR deletion (SEQ ID NO: 39) and the argB (M54V) mutation (SEQ ID NO: 40) were prepared. See FIG. 6. To obtain homologous recombination fragments carrying the argR deletion mutation (SEQ ID NO: 39), PCR was performed using the genomic DNA of Corynebacterium glutamicum ATCC13869 as a template and the primer pairs shown in Table 5 (SEQ ID NOS: 67 and 68, and SEQ ID NOS: 69 and 70), followed by overlap PCR using the primer pair of SEQ ID NOS: 67 and 70. In the same manner, to obtain homologous recombination fragments carrying the argB (M54V) mutation (SEQ ID NO: 40), PCR was performed using the primer pairs shown in Table 5 (SEQ ID NOS: 71 and 72, and SEQ ID NOS: 73 and 74), followed by overlap PCR using the primer pair of SEQ ID NOS: 71 and 74. The PCR was performed as follows: 30 cycles of denaturation at 95°C for 30 seconds, annealing at 55°C for 30 seconds, and elongation at 72°C for 2 minutes. In the same manner as in Example 2, the linearized pDCM2 vector and each homologous recombination fragment were assembled by fusion cloning. The resulting vectors (recombinant plasmids) were named pDCM2-△argR and pDCM2-argB (M54V), respectively.
No. Name DNA Sequence (5'-3')
SEQ ID NO: 67 argR-5'-F tgaattcgagctcggtaccccactggtgaactccttgtcc
SEQ ID NO: 68 argR-5'-R ttgaactaggggcgctttaaaagttttccggtgttgacgg
SEQ ID NO: 69 argR-3'-F ccgtcaacaccggaaaacttttaaagcgcccctagttcaa
SEQ ID NO: 70 argR-3'-R gtcgactctagaggatcccccgttgaactgcttgccagcc
SEQ ID NO: 71 argB-5'-F tgaattcgagctcggtaccctgcggctcgcacggttgctc
SEQ ID NO: 72 argB-5'-R acggtgcgcaagaagaccacgtcggcagcaaaagcagcct
SEQ ID NO: 73 argB-3'-F ggctgcttttgctgccgacgtggtcttcttgcgcaccgtg
SEQ ID NO: 74 argB-3'-R gtcgactctagaggatccccctcttatcaggccaatcggt
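The overlap PCR in this step depends on the inner primers of each pair (SEQ ID NOS: 68/69 and 72/73) giving the two homology arms complementary ends. The sketch below measures that shared region under a simple end-only model: the upstream product's top strand ends with the reverse complement of its reverse primer, and the downstream product's top strand starts with its forward primer. It is an illustrative check with hypothetical helper names, not a reconstruction of the full recombination fragments.

```python
COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def end_overlap(upstream_end: str, downstream_start: str) -> int:
    """Longest suffix of `upstream_end` that equals a prefix of `downstream_start`."""
    for k in range(min(len(upstream_end), len(downstream_start)), 0, -1):
        if upstream_end[-k:] == downstream_start[:k]:
            return k
    return 0


# Inner primers from Table 5 (5'->3'): (reverse primer of upstream arm, forward primer of downstream arm).
JUNCTIONS = {
    "argR deletion": ("ttgaactaggggcgctttaaaagttttccggtgttgacgg",   # SEQ ID NO: 68
                      "ccgtcaacaccggaaaacttttaaagcgcccctagttcaa"),  # SEQ ID NO: 69
    "argB (M54V)": ("acggtgcgcaagaagaccacgtcggcagcaaaagcagcct",     # SEQ ID NO: 72
                    "ggctgcttttgctgccgacgtggtcttcttgcgcaccgtg"),    # SEQ ID NO: 73
}

for name, (inner_rev, inner_fwd) in JUNCTIONS.items():
    shared = end_overlap(revcomp(inner_rev), inner_fwd)
    print(f"{name}: {shared} nt shared between the two arms")
```

It reports 40 nt for the argR junction, where SEQ ID NOS: 68 and 69 are exact reverse complements of each other, and 39 nt for the argB junction.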
Then, the argR deletion mutation was introduced into wild-type Corynebacterium glutamicum ATCC13869 by transformation with the pDCM2-△argR plasmid prepared above using the electric pulse method (van der Rest et al., Appl Microbiol Biotechnol 52:541-545, 1999). Subsequently, secondary recombination was performed on a solid plate medium containing 4% sucrose. PCR using a primer pair (SEQ ID NOS: 67 and 70) was then performed on the transformed strains in which secondary recombination was complete, confirming that the deletion mutation had been introduced into the argR gene on the chromosome. The PCR was performed as follows: 30 cycles of denaturation at 95°C for 30 seconds, annealing at 55°C for 30 seconds, and elongation at 72°C for 2 minutes. The transformed strain was named CJR1.
Subsequently, the argB (M54V) mutation was introduced into Corynebacterium glutamicum CJR1 in the same manner as above, using the pDCM2-argB (M54V) plasmid prepared above. Introduction of the M54V mutation into the argB gene on the chromosome was confirmed by PCR using a primer pair (SEQ ID NOS: 71 and 74) on the transformant in which secondary recombination was complete. The transformed strain was named CJR2.
The composition of the solid plate medium was as follows. Composite Plate Medium (pH 7.0): glucose 10 g, peptone 10 g, beef extract 5 g, yeast extract 5 g, brain heart infusion 18.5 g, NaCl 2.5 g, urea 2 g, sorbitol 91 g, agar 20 g (per 1 L of distilled water).
Example 3-2. Preparation of strains introduced with argininosuccinate synthase (argG) based on CJR2 strain
To introduce a foreign argG gene into the CJR2 strain, mutant strains, each transformed with one of the vectors prepared in Example 2, were prepared in the same manner as in Example 3-1. These mutant strains were named CJR2-argG (A. capsulatum), CJR2-argG (B. amyloliquefaciens), CJR2-argG (B. pyrrocinia), CJR2-argG (C. ammoniagenes), CJR2-argG (C. glutamicum), CJR2-argG (C. necator), CJR2-argG (E. coli), CJR2-argG (M. smegmatis), and CJR2-argG (N. weaveri), respectively. The transformed strains in which secondary recombination was complete were subjected to PCR using a primer pair (SEQ ID NOS: 45 and 48), confirming that the Po2 promoter and each argG gene had been introduced into the BBD29_RS08210 gene on the chromosome. The PCR was performed as follows: 30 cycles of denaturation at 95°C for 30 seconds, annealing at 55°C for 30 seconds, and elongation at 72°C for 4 minutes.
Example 3-3. Evaluation of L-arginine producing ability
To compare the L-arginine producing ability of the argG-introduced CJR2 strains prepared in Example 3-2 (i.e., CJR2-argG (A. capsulatum), CJR2-argG (B. amyloliquefaciens), CJR2-argG (B. pyrrocinia), CJR2-argG (C. ammoniagenes), CJR2-argG (C. glutamicum), CJR2-argG (C. necator), CJR2-argG (E. coli), CJR2-argG (M. smegmatis), and CJR2-argG (N. weaveri)), each strain was cultured as described below and the L-arginine concentration in the culture medium was analyzed.
Corynebacterium glutamicum CJR2 (i.e., the parent strain) and the strains prepared in Example 3-2 were each inoculated into a 250 mL corner-baffled flask containing 25 mL of production medium and cultured with shaking at 200 rpm at 30°C for 44 hours. The composition of the production medium is shown below.
<Production medium (pH 7.2)> sucrose 50 g, ammonium sulfate 57 g, MgSO4·7H2O 2 g, beet molasses 5 g, calcium chloride 1 mg, cobalt chloride 1 mg, KH2PO4 2 g, biotin 0.01 mg, thiamine-HCl 0.1 mg, calcium pantothenate 2 mg, nicotinamide 3 mg, iron sulfate 10 mg, manganese sulfate 10 mg, zinc sulfate 0.02 mg, copper sulfate 0.5 mg, calcium carbonate 30 g (per 1 L of distilled water).
After completion of the culture, L-arginine production (concentration) was analyzed by HPLC (Waters 2478). The measured L-arginine and L-citrulline concentrations and the resulting improvement in L-arginine yield are shown in Table 6 below.
Name of Strain | L-Arginine (g/L): Batch 1, Batch 2, Batch 3, Avg | L-Citrulline (g/L): Batch 1, Batch 2, Batch 3, Avg | Degree of Yield Improvement (%): Batch 1, Batch 2, Batch 3, Avg
CJR2 | 4.8 4.8 4.7 4.7 | 0.9 0.9 0.9 0.9 | - - - -
CJR2-argG (A. capsulatum) | 6.2 6.0 6.2 6.1 | 0.4 0.5 0.4 0.4 | 30.1 26.7 30.2 29.0
CJR2-argG (B. amyloliquefaciens) | 5.2 5.1 4.9 5.1 | 0.6 0.5 0.7 0.6 | 8.4 8.0 2.7 6.4
CJR2-argG (B. pyrrocinia) | 6.7 6.6 6.6 6.6 | 0.2 0.2 0.2 0.2 | 40.6 38.9 40.2 39.9
CJR2-argG (C. ammoniagenes) | 4.9 4.8 4.8 4.8 | 0.7 0.8 0.8 0.8 | 3.4 0.4 2.1 2.0
CJR2-argG (C. glutamicum) | 5.1 5.4 5.2 5.3 | 0.5 0.6 0.5 0.5 | 8.2 14.3 10.8 11.1
CJR2-argG (C. necator) | 6.7 6.5 6.5 6.6 | 0.1 0.1 0.1 0.1 | 40.6 36.1 37.8 38.2
CJR2-argG (E. coli) | 5.9 5.9 5.9 5.9 | 0.3 0.3 0.3 0.3 | 23.8 23.9 24.3 24.0
CJR2-argG (M. smegmatis) | 4.5 4.5 4.4 4.4 | 0.7 0.7 0.7 0.7 | -6.1 -6.3 -6.8 -6.4
CJR2-argG (N. weaveri) | 6.4 6.6 6.5 6.5 | 0.2 0.1 0.1 0.1 | 35.4 38.9 36.4 36.9
As a result, the Corynebacterium glutamicum strains into which an argininosuccinate synthase having the 5-amino-acid extension of Example 1 was introduced showed higher L-arginine producing ability than the strains into which the other enzymes were introduced. As described in Example 1, the argininosuccinate synthases derived from A. capsulatum, B. pyrrocinia, C. necator, E. coli, and N. weaveri have the corresponding amino acid extension (FIGS. 1-3). The average L-arginine and L-citrulline concentrations in the fermentation broths of the CJR2-derived strains carrying these enzymes were 6.3 g/L and 0.2 g/L, respectively, and the average yield improvement of these strains over the parent strain CJR2 was 33.6%. In contrast, the average L-arginine and L-citrulline concentrations in the culture media of the strains carrying the argininosuccinate synthases derived from B. amyloliquefaciens, C. ammoniagenes, C. glutamicum, and M. smegmatis, which lack the 5-amino-acid extension, were 4.9 g/L and 0.7 g/L, respectively. Compared with the former group, the L-arginine concentration was therefore about 23% lower on average, and the L-citrulline concentration was about 3.5 times as high on average. The average yield improvement of these strains over their parent strain was 3.3%. These results indicate that an argininosuccinate synthase having the amino acid extension has higher catalytic efficiency than enzymes lacking the extension. They further suggest that, in L-arginine-producing Corynebacterium glutamicum, such improved catalytic efficiency of argininosuccinate synthase enhances L-arginine producing ability, thereby enabling large-scale production of L-arginine.
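The group averages quoted above can be recomputed from the Avg columns of Table 6. The sketch below takes a simple arithmetic mean over the five strains carrying an enzyme with the extension and the four carrying an enzyme without it. The grouping follows Example 1, and the variable names are illustrative.

```python
# Avg columns of Table 6: (L-arginine g/L, L-citrulline g/L, yield improvement %).
TABLE_6_AVG = {
    "A. capsulatum": (6.1, 0.4, 29.0),
    "B. amyloliquefaciens": (5.1, 0.6, 6.4),
    "B. pyrrocinia": (6.6, 0.2, 39.9),
    "C. ammoniagenes": (4.8, 0.8, 2.0),
    "C. glutamicum": (5.3, 0.5, 11.1),
    "C. necator": (6.6, 0.1, 38.2),
    "E. coli": (5.9, 0.3, 24.0),
    "M. smegmatis": (4.4, 0.7, -6.4),
    "N. weaveri": (6.5, 0.1, 36.9),
}
# Source organisms whose enzyme carries the amino acid extension (Example 1).
WITH_EXTENSION = {"A. capsulatum", "B. pyrrocinia", "C. necator", "E. coli", "N. weaveri"}


def group_means(rows, members):
    """Column-wise arithmetic mean over the rows whose key is in `members`."""
    selected = [values for name, values in rows.items() if name in members]
    return tuple(sum(column) / len(selected) for column in zip(*selected))


ext = group_means(TABLE_6_AVG, WITH_EXTENSION)
non = group_means(TABLE_6_AVG, set(TABLE_6_AVG) - WITH_EXTENSION)
print("with extension:    Arg %.2f g/L, Cit %.2f g/L, yield %.1f%%" % ext)
print("without extension: Arg %.2f g/L, Cit %.2f g/L, yield %.1f%%" % non)
print("L-arginine lower by %.1f%%" % (100 * (1 - non[0] / ext[0])))
print("L-citrulline ratio %.2f" % (non[1] / ext[1]))
```

This gives 6.34 vs. 4.90 g/L L-arginine (about 22.7% lower), 0.22 vs. 0.65 g/L L-citrulline, and mean yield improvements of 33.6% vs. about 3.3%. From these unrounded averages the L-citrulline ratio is about 2.95; a ratio of 3.5 follows from the rounded values of 0.7 and 0.2 g/L.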

Claims (67)

  1. An amino acid molecule comprising the amino acid sequence of SEQ ID NO: 1,
    YKPWLDX1X2FX3X4EL,
    in which each of X1, X2, and X4 is an amino acid; and X3 is I or V,
    wherein the amino acid molecule does not occur in nature.
  2. The amino acid molecule according to claim 1, wherein SEQ ID NO: 1 consists of a sequence selected from the group consisting of YKPWLDSAFIDEL (SEQ ID NO: 2), YKPWLDQTFIDEL (SEQ ID NO: 3), YKPWLDQQFIDEL (SEQ ID NO: 4), and YKPWLDTDFIDEL (SEQ ID NO: 5).
  3. The amino acid molecule according to any one of the preceding claims, wherein the amino acid sequence comprises at least one sequence selected from the group consisting of SEQ ID NO: 6-22,
    X5PX6X7X8X9GX10AFSGGLDTSX11AX12 (SEQ ID NO: 6),
    in which X5 is I, L or V; X6 is an amino acid; X7 is A, E or Q; X8 is K or R; X9 is I or V; X10 is I or L; X11 is A, T or V; and X12 is I, L or V,
    X13GAXaX14X15X16YTAX17X18GQX19DE (SEQ ID NO: 7),
    in which X13 is K or N; Xa is an amino acid; X14 is C or P; X15 is C or Y; X16 is A, S or T; X17 is D or N; X18 is I or L; and X19 is A, P or Y,
    X20X21X22X23X24X25X26XaLX27 (SEQ ID NO: 8),
    in which X20 is A or S; X21 is R or V; X22 is I or L; X23 is I or V; X24 is E or D; X25 is C or G; X26 is K or R; Xa is an amino acid; and X27 is A or V,
    X28AFX29XaXaX30X31G (SEQ ID NO: 9),
    in which X28 is G or N; X29 is H or N; Xa is an amino acid; X30 is S or T; and X31 is A or G,
    YFNTTPX32GRAVX33X34TX35LV (SEQ ID NO: 10),
    in which X32 is I or L; X33 is A or T; X34 is A or G; and X35 is L or M,
    TX36KGNDIERF (SEQ ID NO: 11),
    in which X36 is F or Y,
    YRYGLX37X38N (SEQ ID NO: 12),
    in which X37 is L or V; and X38 is A, T or V,
    GGRXaEMX39X40X41X42 (SEQ ID NO: 13),
    in which Xa is an amino acid; X39 is A or S; X40 is A, E or Q; X41 is F, W or Y; and X42 is L or M,
    EKAYSTDX43NX44X45GATHE (SEQ ID NO: 14),
    in which X43 is A or S; X44 is I, L or M; and X45 is L or W,
    VXaPIMGVXaX46W (SEQ ID NO: 15),
    in which Xa is an amino acid and X46 is F, H or S,
    X47GGRHGX48GX49X50DQIENRX51IEA (SEQ ID NO: 16),
    in which X47 is I or V; X48 is L or M; X49 is M or V; X50 is A or S; and X51 is I or V,
    KSRGIYEAPG (SEQ ID NO: 17),
    X52ALX53X54X55AX56ERLXaX57X58IHNEDT (SEQ ID NO: 18),
    in which X52 is L or M; X53 is F or L; X54 is F, H or Y; X55 is A or I; X56 is F or Y; Xa is an amino acid; X57 is N, S or T; and X58 is A or G,
    LGX59LX60YX61GRWX62DX63QX64X65MX66RX67 (SEQ ID NO: 19),
    in which X59 is K or R; X60 is L or M; X61 is A, E or Q; X62 is F or L; X63 is P or S; X64 is A, S or G; X65 is I, L or M; X66 is I, L or V; and X67 is D or E,
    LRRGXaDX68X69X70 (SEQ ID NO: 20),
    in which Xa is an amino acid; X68 is F or Y; X69 is S or T; and X70 is I or L,
    X71X72X73X74XaX75X76X77LX78MEX79 (SEQ ID NO: 21),
    in which X71 is A, D or N; X72 is F or L; X73 is S or T; X74 is F or Y; Xa is an amino acid; X75 is A, P or S; X76 is A, E or D; X77 is K or R; X78 is S or T; and X79 is K or R,
    DRIGQLX80MRX81X82DX83XaDX84R (SEQ ID NO: 22),
    in which X80 is H or T; X81 is L, N or T; X82 is L or N; X83 is I, L or V; Xa is an amino acid; and X84 is S or T.
  4. The amino acid molecule according to any one of the preceding claims, wherein the amino acid molecule comprises all of SEQ ID NO:6-22.
  5. The amino acid molecule according to any one of the preceding claims, wherein the amino acid molecule comprises at least one sequence selected from the group consisting of:
    a sequence having at least 79% sequence identity to A. capsulatum sequence, SEQ ID NO: 23;
    a sequence having at least 80% sequence identity to A. faecalis sequence, SEQ ID NO: 24;
    a sequence having at least 78% sequence identity to B. pyrrocinia sequence, SEQ ID NO: 25;
    a sequence having at least 80% sequence identity to C. necator sequence, SEQ ID NO: 26;
    a sequence having at least 78% sequence identity to E. coli sequence, SEQ ID NO: 27; and
    a sequence having at least 80% sequence identity to N. weaveri sequence, SEQ ID NO: 28.
  6. The amino acid molecule according to claim 5, wherein the amino acid molecule comprises the sequence having at least 80% sequence identity to SEQ ID NO: 26.
  7. The amino acid molecule according to claim 5, wherein the amino acid molecule comprises the sequence having at least 78% sequence identity to SEQ ID NO: 27.
  8. The amino acid molecule according to claim 5, wherein the amino acid molecule comprises a sequence that is a sequence having at least 79% sequence identity to SEQ ID NO: 23; having at least 80% sequence identity to SEQ ID NO: 24; having at least 78% sequence identity to SEQ ID NO: 25; having at least 80% sequence identity to SEQ ID NO: 26; having at least 78% sequence identity to SEQ ID NO: 27; and having at least 80% sequence identity to SEQ ID NO: 28.
  9. The amino acid molecule according to any one of the preceding claims, wherein the amino acid molecule comprises at least one sequence selected from the group consisting of
    a sequence corresponding to at least 79% of positions 1-153 and 167-445 of SEQ ID NO: 23;
    a sequence corresponding to at least 80% of positions 1-153 and 167-444 of SEQ ID NO: 24;
    a sequence corresponding to at least 78% of positions 1-153 and 167-445 of SEQ ID NO: 25;
    a sequence corresponding to at least 80% of positions 1-153 and 167-441 of SEQ ID NO: 26;
    a sequence corresponding to at least 78% of positions 1-153 and 167-447 of SEQ ID NO: 27; and
    a sequence corresponding to at least 80% of positions 1-157 and 171-448 of SEQ ID NO: 28.
  10. The amino acid molecule according to claim 9, wherein the amino acid molecule comprises the sequence corresponding to at least 80% of positions 1-153 and 167-441 of SEQ ID NO: 26.
  11. The amino acid molecule according to claim 9, wherein the amino acid molecule comprises the sequence corresponding to at least 78% of positions 1-153 and 167-447 of SEQ ID NO: 27.
  12. The amino acid molecule according to any one of the preceding claims, wherein the amino acid molecule comprises at least one sequence selected from the group consisting of
    a sequence corresponding to positions 1-153 and 167-445 of SEQ ID NO: 23;
    a sequence corresponding to positions 1-153 and 167-444 of SEQ ID NO: 24;
    a sequence corresponding to positions 1-153 and 167-445 of SEQ ID NO: 25;
    a sequence corresponding to positions 1-153 and 167-441 of SEQ ID NO: 26;
    a sequence corresponding to positions 1-153 and 167-447 of SEQ ID NO: 27; and
    a sequence corresponding to positions 1-157 and 171-448 of SEQ ID NO: 28.
  13. The amino acid molecule according to claim 12, wherein the amino acid molecule comprises the sequence corresponding to positions 1-153 and 167-441 of SEQ ID NO: 26.
  14. The amino acid molecule according to claim 12, wherein the amino acid molecule comprises the sequence corresponding to positions 1-153 and 167-447 of SEQ ID NO: 27.
  15. The amino acid molecule according to any one of the preceding claims, wherein X1 is S, Q, or T.
  16. The amino acid molecule according to any one of the preceding claims, wherein X2 is A, T, Q, or D.
  17. The amino acid molecule according to any one of the preceding claims, wherein X3 is I.
  18. The amino acid molecule according to any one of the preceding claims, wherein X4 is D.
  19. The amino acid molecule according to any one of the preceding claims, wherein the amino acid molecule is a recombinant protein.
  20. The amino acid molecule according to any one of the preceding claims, wherein the amino acid molecule is an argininosuccinate synthase.
  21. A host cell expressing the amino acid molecule of any one of the preceding claims.
  22. The host cell according to claim 21, wherein the host cell is a microorganism.
  23. The host cell according to any one of claims 21-22, wherein the host cell is Corynebacterium.
  24. The host cell according to any one of claims 21-23, wherein the host cell is selected from the group consisting of Corynebacterium glutamicum, Corynebacterium stationis, Corynebacterium crudilactis, Corynebacterium deserti, Corynebacterium efficiens, Corynebacterium callunae, Corynebacterium singulare, Corynebacterium halotolerans, Corynebacterium striatum, Corynebacterium ammoniagenes, Corynebacterium pollutisoli, Corynebacterium imitans, Corynebacterium testudinoris, Corynebacterium flavescens, Corynebacterium crenatum and Corynebacterium suranareeae.
  25. The host cell according to any one of claims 21-24, wherein the host cell is Corynebacterium glutamicum.
  26. The host cell according to any one of claims 21-25, wherein the host cell produces L-arginine.
  27. The host cell according to any one of claims 21-26, wherein the host cell is transformed with an argR gene comprising a sequence having at least 90% sequence identity to SEQ ID NO: 39 and an argB gene with M54V comprising a sequence having at least 90% sequence identity to SEQ ID NO: 40.
  28. A nucleic acid molecule comprising a nucleic acid sequence encoding the amino acid molecule of any one of claims 1-20.
  29. The nucleic acid molecule according to claim 28, comprising at least one sequence selected from the group consisting of
    a sequence having at least 79% sequence identity to A. capsulatum sequence, SEQ ID NO: 33;
    a sequence having at least 80% sequence identity to A. faecalis sequence, SEQ ID NO: 34;
    a sequence having at least 78% sequence identity to B. pyrrocinia sequence, SEQ ID NO: 35;
    a sequence having at least 80% sequence identity to C. necator sequence, SEQ ID NO: 36;
    a sequence having at least 78% sequence identity to E. coli sequence, SEQ ID NO: 37; and
    a sequence having at least 80% sequence identity to N. weaveri sequence, SEQ ID NO: 38.
  30. The nucleic acid molecule according to claim 28 or 29, comprising at least one sequence selected from the group consisting of SEQ ID NO: 33-38.
  31. A vector comprising the nucleic acid molecule of any one of claims 28-30.
  32. An engineered host cell expressing a recombinant amino acid molecule comprising the amino acid sequence of SEQ ID NO: 1,
    YKPWLDX1X2FX3X4EL,
    in which each of X1, X2, and X4 is an amino acid; and X3 is I or V.
  33. The engineered host cell according to claim 32, wherein the recombinant amino acid molecule comprises at least one sequence selected from the group consisting of SEQ ID NO:6-22,
    X5PX6X7X8X9GX10AFSGGLDTSX11AX12 (SEQ ID NO: 6),
    in which X5 is I, L or V; X6 is an amino acid; X7 is A, E or Q; X8 is K or R; X9 is I or V; X10 is I or L; X11 is A, T or V; and X12 is I, L or V,
    X13GAXaX14X15X16YTAX17X18GQX19DE (SEQ ID NO: 7),
    in which X13 is K or N; Xa is an amino acid; X14 is C or P; X15 is C or Y; X16 is A, S or T; X17 is D or N; X18 is I or L; and X19 is A, P or Y,
    X20X21X22X23X24X25X26XaLX27 (SEQ ID NO: 8),
    in which X20 is A or S; X21 is R or V; X22 is I or L; X23 is I or V; X24 is E or D; X25 is C or G; X26 is K or R; Xa is an amino acid; and X27 is A or V,
    X28AFX29XaXaX30X31G (SEQ ID NO: 9),
    in which X28 is G or N; X29 is H or N; Xa is an amino acid; X30 is S or T; and X31 is A or G,
    YFNTTPX32GRAVX33X34TX35LV (SEQ ID NO: 10),
    in which X32 is I or L; X33 is A or T; X34 is A or G; and X35 is L or M,
    TX36KGNDIERF (SEQ ID NO: 11),
    in which X36 is F or Y,
    YRYGLX37X38N (SEQ ID NO: 12),
    in which X37 is L or V; and X38 is A, T or V,
    GGRXaEMX39X40X41X42 (SEQ ID NO: 13),
    in which Xa is an amino acid; X39 is A or S; X40 is A, E or Q; X41 is F, W or Y; and X42 is L or M,
    EKAYSTDX43NX44X45GATHE (SEQ ID NO: 14),
    in which X43 is A or S; X44 is I, L or M; and X45 is L or W,
    VXaPIMGVXaX46W (SEQ ID NO: 15),
    in which Xa is an amino acid and X46 is F, H or S,
    X47GGRHGX48GX49X50DQIENRX51IEA (SEQ ID NO: 16),
    in which X47 is I or V; X48 is L or M; X49 is M or V; X50 is A or S; and X51 is I or V,
    KSRGIYEAPG (SEQ ID NO: 17),
    X52ALX53X54X55AX56ERLXaX57X58IHNEDT (SEQ ID NO: 18),
    in which X52 is L or M; X53 is F or L; X54 is F, H or Y; X55 is A or I; X56 is F or Y; Xa is an amino acid; X57 is N, S or T; and X58 is A or G,
    LGX59LX60YX61GRWX62DX63QX64X65MX66RX67 (SEQ ID NO: 19),
    in which X59 is K or R; X60 is L or M; X61 is A, E or Q; X62 is F or L; X63 is P or S; X64 is A, S or G; X65 is I, L or M; X66 is I, L or V; and X67 is D or E,
    LRRGXaDX68X69X70 (SEQ ID NO: 20),
    in which Xa is an amino acid; X68 is F or Y; X69 is S or T; and X70 is I or L,
    X71X72X73X74XaX75X76X77LX78MEX79 (SEQ ID NO: 21),
    in which X71 is A, D or N; X72 is F or L; X73 is S or T; X74 is F or Y; Xa is an amino acid; X75 is A, P or S; X76 is A, E or D; X77 is K or R; X78 is S or T; and X79 is K or R,
    DRIGQLX80MRX81X82DX83XaDX84R (SEQ ID NO: 22),
    in which X80 is H or T; X81 is L, N or T; X82 is L or N; X83 is I, L or V; Xa is an amino acid; and X84 is S or T.
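All motifs in claim 33 share one convention: fixed residues, numbered variable positions with a listed set of allowed residues, and "Xa" for any amino acid. The following minimal sketch compiles such a template into a regex. It assumes the 20 standard residues for unconstrained positions. `compile_motif` and `has_all_motifs` are hypothetical helper names, and only SEQ ID NO: 6, 11 and 17 are transcribed here as examples.

```python
import re

# Assumption: an unconstrained position ("Xa", or an X with no listed options)
# may be any of the 20 standard amino acids.
ANY_AA = "ACDEFGHIKLMNPQRSTVWY"
# A token is either a variable position (Xa, X5, X13, ...) or one fixed residue.
TOKEN = re.compile(r"X(?:a|\d+)|[A-Z]")


def compile_motif(template: str, allowed: dict[str, str]) -> re.Pattern:
    """Translate a claim-style motif template into a compiled regex."""
    parts = []
    for tok in TOKEN.findall(template.replace(" ", "")):
        if tok.startswith("X"):
            # Variable position: restrict to the listed residues, else allow any.
            parts.append("[" + allowed.get(tok, ANY_AA) + "]")
        else:
            parts.append(tok)  # fixed residue
    return re.compile("".join(parts))


def has_all_motifs(protein: str, motifs: list[re.Pattern]) -> bool:
    """True if every motif occurs somewhere in the protein (cf. claim 34)."""
    return all(m.search(protein) for m in motifs)


SEQ_ID_6 = compile_motif(
    "X5PX6X7X8X9GX10AFSGGLDTSX11AX12",
    {"X5": "ILV", "X7": "AEQ", "X8": "KR", "X9": "IV",
     "X10": "IL", "X11": "ATV", "X12": "ILV"},
)
SEQ_ID_11 = compile_motif("TX36KGNDIERF", {"X36": "FY"})
SEQ_ID_17 = compile_motif("KSRGIYEAPG", {})

print(bool(SEQ_ID_6.fullmatch("VPAEKIGLAFSGGLDTSAAI")))  # True
print(bool(SEQ_ID_6.fullmatch("VPAEEIGLAFSGGLDTSAAI")))  # False: X8 must be K or R
```

Extending the check to all of SEQ ID NO: 6-22 only requires transcribing the remaining templates and their allowed-residue tables.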
  34. The engineered host cell according to claim 32 or 33, wherein the recombinant amino acid molecule comprises all of SEQ ID NO: 6-22.
  35. The engineered host cell according to any one of claims 32-34, wherein the recombinant amino acid molecule comprises at least one sequence selected from the group consisting of
    a sequence having at least 79% sequence identity to A. capsulatum sequence, SEQ ID NO: 23;
    a sequence having at least 80% sequence identity to A. faecalis sequence, SEQ ID NO: 24;
    a sequence having at least 78% sequence identity to B. pyrrocinia sequence, SEQ ID NO: 25;
    a sequence having at least 80% sequence identity to C. necator sequence, SEQ ID NO: 26;
    a sequence having at least 78% sequence identity to E. coli sequence, SEQ ID NO: 27; and
    a sequence having at least 80% sequence identity to N. weaveri sequence, SEQ ID NO: 28.
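Checking a candidate against the per-species thresholds of claim 35 requires an alignment followed by a percentage. This section does not state how the application aligns sequences or which denominator it uses. The sketch below therefore makes explicit assumptions: a plain Needleman-Wunsch global alignment with unit scores (production tools typically use substitution matrices such as BLOSUM62), and identity taken as identical columns divided by alignment length including gaps. All function names are hypothetical.

```python
# Thresholds (percent) recited in claim 35, keyed by SEQ ID NO.
THRESHOLDS = {23: 79.0, 24: 80.0, 25: 78.0, 26: 80.0, 27: 78.0, 28: 80.0}


def global_align(a: str, b: str, match: int = 1, mismatch: int = -1,
                 gap: int = -1) -> tuple[str, str]:
    """Needleman-Wunsch global alignment with simple unit scores (an assumption)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(query: str, reference: str) -> float:
    """Identical aligned columns divided by alignment length (gaps included)."""
    aln_q, aln_r = global_align(query, reference)
    if not aln_q:
        raise ValueError("both sequences are empty")
    matches = sum(1 for x, y in zip(aln_q, aln_r) if x == y and x != "-")
    return 100.0 * matches / len(aln_q)


def meets_threshold(query: str, reference: str, seq_id: int) -> bool:
    """Compare a candidate with the claim 35 threshold for one reference SEQ ID."""
    return percent_identity(query, reference) >= THRESHOLDS[seq_id]


print(percent_identity("ACDEFGHIKL", "ACDEFGHIKV"))  # 90.0
```

The same alignment and identity functions apply unchanged to nucleotide sequences, so they would also serve for the nucleotide thresholds recited against SEQ ID NO: 33-38 in claims 45 and 64.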
  36. The engineered host cell according to claim 35, wherein the recombinant amino acid molecule comprises the sequence having at least 80% sequence identity to SEQ ID NO: 26.
  37. The engineered host cell according to claim 35, wherein the recombinant amino acid molecule comprises the sequence having at least 78% sequence identity to SEQ ID NO: 27.
  38. The engineered host cell according to claim 35, wherein the recombinant amino acid molecule comprises a sequence having at least 79% sequence identity to SEQ ID NO: 23, at least 80% sequence identity to SEQ ID NO: 24, at least 78% sequence identity to SEQ ID NO: 25, at least 80% sequence identity to SEQ ID NO: 26, at least 78% sequence identity to SEQ ID NO: 27, and at least 80% sequence identity to SEQ ID NO: 28.
  39. The engineered host cell according to any one of claims 32-38, wherein the recombinant amino acid molecule comprises at least one sequence selected from the group consisting of:
    a sequence corresponding to at least 79% of positions 1-153 and 167-445 of SEQ ID NO: 23;
    a sequence corresponding to at least 80% of positions 1-153 and 167-444 of SEQ ID NO: 24;
    a sequence corresponding to at least 78% of positions 1-153 and 167-445 of SEQ ID NO: 25;
    a sequence corresponding to at least 80% of positions 1-153 and 167-441 of SEQ ID NO: 26;
    a sequence corresponding to at least 78% of positions 1-153 and 167-447 of SEQ ID NO: 27; and
    a sequence corresponding to at least 80% of positions 1-157 and 171-448 of SEQ ID NO: 28.
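Claims 39 and 42 describe each reference sequence as two 1-based, inclusive position ranges. The arithmetic is simple. In every case a 13-residue segment is skipped (positions 154-166, or 158-170 for SEQ ID NO: 28), and the retained length follows from the endpoints. The reference sequences are not reproduced in this section, so the hypothetical `splice` helper in the sketch below is exercised on a placeholder string.

```python
# Retained position ranges (1-based, inclusive) recited in claims 39 and 42.
RETAINED = {
    23: [(1, 153), (167, 445)],
    24: [(1, 153), (167, 444)],
    25: [(1, 153), (167, 445)],
    26: [(1, 153), (167, 441)],
    27: [(1, 153), (167, 447)],
    28: [(1, 157), (171, 448)],
}


def retained_length(ranges: list[tuple[int, int]]) -> int:
    """Number of residues covered by 1-based inclusive ranges."""
    return sum(end - start + 1 for start, end in ranges)


def splice(seq: str, ranges: list[tuple[int, int]]) -> str:
    """Concatenate the given 1-based inclusive ranges of seq."""
    if ranges and max(end for _, end in ranges) > len(seq):
        raise ValueError("range extends past the end of the sequence")
    # Python slices are 0-based and end-exclusive, hence seq[start - 1:end].
    return "".join(seq[start - 1:end] for start, end in ranges)


for seq_id, ranges in RETAINED.items():
    print(f"SEQ ID NO: {seq_id}: {retained_length(ranges)} residues retained")

print(splice("ABCDEFGHIJ", [(1, 3), (6, 10)]))  # ABCFGHIJ
```

The loop reports 432, 431, 432, 428, 434 and 435 retained residues for SEQ ID NO: 23-28, respectively.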
  40. The engineered host cell according to claim 39, wherein the recombinant amino acid molecule comprises the sequence corresponding to at least 80% of positions 1-153 and 167-441 of SEQ ID NO: 26.
  41. The engineered host cell according to claim 39, wherein the recombinant amino acid molecule comprises the sequence corresponding to at least 78% of positions 1-153 and 167-447 of SEQ ID NO: 27.
  42. The engineered host cell according to any one of claims 32-41, wherein the recombinant amino acid molecule comprises at least one sequence selected from the group consisting of:
    a sequence corresponding to positions 1-153 and 167-445 of SEQ ID NO: 23;
    a sequence corresponding to positions 1-153 and 167-444 of SEQ ID NO: 24;
    a sequence corresponding to positions 1-153 and 167-445 of SEQ ID NO: 25;
    a sequence corresponding to positions 1-153 and 167-441 of SEQ ID NO: 26;
    a sequence corresponding to positions 1-153 and 167-447 of SEQ ID NO: 27; and
    a sequence corresponding to positions 1-157 and 171-448 of SEQ ID NO: 28.
  43. The engineered host cell according to claim 42, wherein the recombinant amino acid molecule comprises the sequence corresponding to positions 1-153 and 167-441 of SEQ ID NO: 26.
  44. The engineered host cell according to claim 42, wherein the recombinant amino acid molecule comprises the sequence corresponding to positions 1-153 and 167-447 of SEQ ID NO: 27.
  45. The engineered host cell according to any one of claims 32-44, wherein the recombinant amino acid molecule is encoded by a nucleotide molecule comprising at least one sequence selected from the group consisting of:
    a sequence having at least 79% sequence identity to A. capsulatum sequence, SEQ ID NO: 33;
    a sequence having at least 80% sequence identity to A. faecalis sequence, SEQ ID NO: 34;
    a sequence having at least 78% sequence identity to B. pyrrocinia sequence, SEQ ID NO: 35;
    a sequence having at least 80% sequence identity to C. necator sequence, SEQ ID NO: 36;
    a sequence having at least 78% sequence identity to E. coli sequence, SEQ ID NO: 37; and
    a sequence having at least 80% sequence identity to N. weaveri sequence, SEQ ID NO: 38.
  46. The engineered host cell according to claim 45, wherein the recombinant amino acid molecule is encoded by the nucleotide molecule comprising a sequence having at least 80% sequence identity to C. necator sequence, SEQ ID NO: 36.
  47. The engineered host cell according to claim 45, wherein the recombinant amino acid molecule is encoded by the nucleotide molecule comprising a sequence having at least 78% sequence identity to E. coli sequence, SEQ ID NO: 37.
  48. The engineered host cell according to any one of claims 32-47, wherein the recombinant amino acid molecule is an argininosuccinate synthase.
  49. The engineered host cell according to any one of claims 32-48, wherein the engineered host cell is a microorganism.
  50. The engineered host cell according to any one of claims 32-49, wherein the engineered host cell is Corynebacterium.
  51. The engineered host cell according to any one of claims 32-50, wherein the engineered host cell is selected from the group consisting of Corynebacterium glutamicum, Corynebacterium stationis, Corynebacterium crudilactis, Corynebacterium deserti, Corynebacterium efficiens, Corynebacterium callunae, Corynebacterium singulare, Corynebacterium halotolerans, Corynebacterium striatum, Corynebacterium ammoniagenes, Corynebacterium pollutisoli, Corynebacterium imitans, Corynebacterium testudinoris, Corynebacterium flavescens, Corynebacterium crenatum and Corynebacterium suranareeae.
  52. The engineered host cell according to any one of claims 32-51, wherein the engineered host cell is Corynebacterium glutamicum.
  53. The engineered host cell according to any one of claims 32-52, wherein the engineered host cell produces L-arginine.
  54. The engineered host cell according to any one of claims 32-53, wherein the engineered host cell is transformed with an argR gene comprising a sequence having at least 90% sequence identity to SEQ ID NO: 39 and an argB gene with M54V comprising a sequence having at least 90% sequence identity to SEQ ID NO: 40.
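"M54V" in claim 54 uses the conventional substitution notation: wild-type residue, 1-based position, replacement residue. Here it refers to a position in the encoded protein. The following minimal sketch applies such a substitution and checks the wild-type residue first. SEQ ID NO: 40 is not reproduced here, so the input is a hypothetical placeholder, and `apply_substitution` is an illustrative name.

```python
import re

# <wild-type residue><1-based position><replacement residue>, e.g. "M54V"
SUBSTITUTION = re.compile(r"([A-Z])(\d+)([A-Z])")


def apply_substitution(protein: str, mutation: str) -> str:
    """Return a copy of protein carrying the point substitution `mutation`."""
    m = SUBSTITUTION.fullmatch(mutation)
    if m is None:
        raise ValueError(f"unrecognised substitution: {mutation!r}")
    wild, pos, new = m.group(1), int(m.group(2)), m.group(3)
    if not 1 <= pos <= len(protein):
        raise ValueError(f"position {pos} outside sequence of length {len(protein)}")
    if protein[pos - 1] != wild:
        raise ValueError(f"expected {wild} at position {pos}, found {protein[pos - 1]}")
    return protein[: pos - 1] + new + protein[pos:]


# Hypothetical placeholder: 53 alanines, Met at position 54, then 10 alanines.
placeholder = "A" * 53 + "M" + "A" * 10
mutant = apply_substitution(placeholder, "M54V")
print(mutant[53], len(mutant) == len(placeholder))  # V True
```

Checking the wild-type residue before substituting catches off-by-one numbering errors, which are common when positions are counted with or without the initiator methionine.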
  55. The engineered host cell according to any one of claims 32-54, wherein the recombinant amino acid molecule has an extrinsic amino acid sequence of the host cell.
  56. Use of the amino acid molecule of any one of claims 1-20, the host cell of any one of claims 21-27, the nucleic acid molecule of any one of claims 28-30, the vector of claim 31, or the engineered host cell of any one of claims 32-55 for producing L-arginine.
  57. A method of producing L-arginine, comprising culturing the host cell of any one of claims 21-27 or the engineered host cell of any one of claims 32-55.
  58. The method according to claim 57, wherein the production method further comprises recovering L-arginine from the cultured host cell.
  59. The method according to claim 57 or 58, wherein the production method further comprises recovering L-arginine from medium in which the host cell is cultured.
  60. The method according to any one of claims 57-59, wherein the culturing is performed with at least one source of carbon selected from the group consisting of carbohydrates, sugar alcohols, organic acids, amino acids, starch hydrolyzate, molasses, blackstrap molasses, rice bran, cassava, sugar cane offal and corn steep liquor.
  61. The method according to any one of claims 57-60, wherein the culturing is performed with at least one source of nitrogen selected from the group consisting of ammonia, ammonium sulfate, ammonium chloride, ammonium acetate, ammonium phosphate, ammonium carbonate, ammonium nitrate, glutamic acid, methionine, glutamine, peptone, NZ-amine, meat extract, yeast extract, malt extract, corn steep liquor, casein hydrolyzate, fish or degradation products thereof, and defatted soybean cake or degradation products thereof.
  62. The method according to any one of claims 57-61, wherein the culturing is performed with at least one selected from the group consisting of potassium phosphate, dipotassium phosphate, a sodium-containing salt corresponding thereto, sodium chloride, calcium chloride, iron chloride, magnesium sulfate, iron sulfate, manganese sulfate, and calcium carbonate.
  63. A method of increasing L-arginine production by a host cell, comprising transforming the host cell to produce the engineered host cell of any one of claims 32-55.
  64. A nucleic acid molecule comprising at least one sequence selected from the group consisting of:
    a sequence having at least 79% sequence identity to A. capsulatum sequence, SEQ ID NO: 33;
    a sequence having at least 80% sequence identity to A. faecalis sequence, SEQ ID NO: 34;
    a sequence having at least 78% sequence identity to B. pyrrocinia sequence, SEQ ID NO: 35;
    a sequence having at least 80% sequence identity to C. necator sequence, SEQ ID NO: 36;
    a sequence having at least 78% sequence identity to E. coli sequence, SEQ ID NO: 37; and
    a sequence having at least 80% sequence identity to N. weaveri sequence, SEQ ID NO: 38,
    wherein the nucleic acid molecule does not occur in nature.
  65. The nucleic acid molecule according to claim 64, comprising a sequence having at least 80% sequence identity to C. necator sequence, SEQ ID NO: 36.
  66. The nucleic acid molecule according to claim 64 or 65, comprising a sequence having at least 78% sequence identity to E. coli sequence, SEQ ID NO: 37.
  67. Use of the product or method characterized by one or more elements disclosed in the application.

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/KR2023/013895 WO2025058109A1 (en) 2023-09-15 2023-09-15 Recombinant amino acid molecule, host cells for producing l-arginine, and methods for producing l-arginine using the same

Publications (1)

Publication Number Publication Date
WO2025058109A1 (en) 2025-03-20

Family

ID=95021491

Country Status (1)

Country Link
WO (1) WO2025058109A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030233675A1 (en) * 2002-02-21 2003-12-18 Yongwei Cao Expression of microbial proteins in plants for production of plants with improved properties
US20050260581A1 (en) * 2001-02-12 2005-11-24 Chiron Spa Gonococcal proteins and nucleic acids
WO2009037329A2 (en) * 2007-09-21 2009-03-26 Basf Plant Science Gmbh Plants with increased yield
US8541208B1 (en) * 2004-07-02 2013-09-24 Metanomics Gmbh Process for the production of fine chemicals
US9085765B2 (en) * 2005-08-20 2015-07-21 Scarab Genomics Llc Reduced genome E. coli
WO2019122936A1 (en) * 2017-12-22 2019-06-27 Cancer Research Technology Limited Fusion proteins

Similar Documents

Publication Publication Date Title
WO2021167414A1 (en) Purine nucleotide-producing microorganism and purine nucleotide production method using same
WO2022216088A1 (en) L-arginine-producing corynebacterium sp. microorganism and l-arginine production method using same
WO2022050671A1 (en) L-valine-producing microorganisms and l-valine producing method using same
WO2025023743A1 (en) Protein variant and method for producing l-lysine using same
WO2022239953A1 (en) Microorganism having enhanced activity of 3-methyl-2-oxobutanoate hydroxymethyltransferase and uses thereof
WO2022245176A1 (en) Microorganism producing purine nucleotide, and purine nucleotide production method using same
WO2022255839A1 (en) Novel yhhs variant and method for producing o-phosphoserine, cysteine, and derivate of cysteine using same
WO2022191633A1 (en) Novel citrate synthase variant and method for producing o-acetyl-l-homoserine or l-methionine using same
WO2021177731A1 (en) Glutamine synthetase mutant-type polypeptide and l-glutamine production method using same
WO2025058109A1 (en) Recombinant amino acid molecule, host cells for producing l-arginine, and methods for producing l-arginine using the same
WO2024144283A1 (en) Microorganism into which heterologous glutamine synthetase is introduced, and method for producing l-tryptophan by using same
WO2023277307A1 (en) Strain for producing highly concentrated l-glutamic acid, and l-glutamic acid production method using same
WO2022163904A1 (en) Novel protein variant and method for producing l-lysine using same
WO2022191630A1 (en) Novel citrate synthase variant and method for producing l-valine using same
WO2022108383A1 (en) Microorganism having enhanced l-glutamine producing ability, and l-glutamine producing method using same
WO2022124511A1 (en) Mutant atp-dependent protease, and method for producing l-amino acid using same
WO2023048343A1 (en) Novel glutamine hydrolysis gmp synthase variant and method for producing purine nucleotides using same
WO2025249709A1 (en) Protein variant and l-arginine production method using same
WO2022225254A1 (en) L-amino-acid-producing corynebacterium sp. microorganism, and method for producing l-amino acids by using same
WO2023249421A1 (en) L-histidine export protein and method for producing l-histidine by using same
WO2023063763A1 (en) Corynebacterium genus microorganism producing l-arginine, and method for producing l-arginine using same
WO2022186487A1 (en) Isopropylmalate synthase variant and method for producing l-leucine by using same
WO2025183467A1 (en) Microorganism comprising nicotinamide nucleotide transhydrogenase derived from edwardsiella tarda, and method for producing l-amid acids or derivatives thereof using same
WO2024248498A1 (en) O-phosphoserine sulfhydrylase variant and method for producing cysteine using same
WO2025244452A1 (en) Microorganism having increased activity of glycine cleavage system, and use thereof

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 23952349

Country of ref document: EP

Kind code of ref document: A1

WWE WIPO information: entry into national phase

Ref document number: KR1020257037430

Country of ref document: KR