
WO2024092769A1 - Modified covalently-linked pili and recombinant bacteria comprising the same - Google Patents

Modified covalently-linked pili and recombinant bacteria comprising the same

Info

Publication number
WO2024092769A1
Authority
WO
WIPO (PCT)
Prior art keywords
polypeptide
gca
carrier protein
seq
spa2
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2022/130033
Other languages
French (fr)
Other versions
WO2024092769A9 (en
Inventor
Chao Zhong
Yuanyuan Huang
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Institute of Advanced Technology of CAS
Original Assignee
Shenzhen Institute of Advanced Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Institute of Advanced Technology of CAS
Priority to CN202280102906.7A (patent CN120476135A)
Priority to PCT/CN2022/130033 (patent WO2024092769A1)
Publication of WO2024092769A1
Publication of WO2024092769A9
Anticipated expiration
Legal status: Ceased

Classifications

    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K14/00 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof
    • C07K14/195 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from bacteria
    • C07K14/34 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from bacteria from Corynebacterium (G)
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K2319/00 Fusion polypeptide
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12R INDEXING SCHEME ASSOCIATED WITH SUBCLASSES C12C - C12Q, RELATING TO MICROORGANISMS
    • C12R2001/00 Microorganisms; Processes using microorganisms
    • C12R2001/01 Bacteria or Actinomycetales; using bacteria or Actinomycetales
    • C12R2001/15 Corynebacterium

Definitions

  • the present disclosure relates to biological engineering.
  • the present disclosure relates to engineered bacteria, such as Corynebacterium glutamicum, comprising modified covalently-linked pili (CLP).
  • CLP covalently-linked pili
  • Engineered living materials (ELMs) are engineered biomaterials with distinctive “living” attributes, such as autonomous growth, self-healing and environmental responsiveness, that are otherwise found only in natural living materials. A wide range of remarkable ELMs have been developed for applications in biosensors, bioremediation, biomedicine, biomanufacturing, wearable devices, and electronics.
  • ELMs can be produced either by harnessing engineered cells to simultaneously make the material and incorporate novel functionalities into it (known as self-organizing living materials or biological ELMs) or by embedding living cells in an organic or inorganic matrix (referred to as hybrid living materials) .
  • Self-organizing living materials aim to recapitulate the autonomous, adaptive, and versatile properties of natural living materials, and represent opportunities to harness engineered biological systems for new capabilities.
  • Some Gram-positive bacteria comprise covalently-linked pili (CLP) .
  • the CLP monomer subunits are typically joined via intermolecular isopeptide bonds catalyzed by sortase, conferring enormous tensile strength (McConnell, S. A. et al., Protein labeling via a specific lysine-isopeptide bond using the pilin polymerizing sortase from Corynebacterium diphtheriae. J. Am.
  • the CLP subunits contain auto-catalyzed intramolecular isopeptide bonds that are less susceptible to proteolytic cleavage and can dissipate mechanical energy (Ramirez, N.A. et al., 2020) imparting the robustness of CLP.
  • several pilin proteins in the CLP structure of different strains contain additional disulfide bonds that further enhance stability (Kang, H. J. et al., The Corynebacterium diphtheriae shaft pilin SpaA is built of tandem Ig-like modules with stabilizing isopeptide and disulfide bonds. Proc. Natl. Acad. Sci. U.S.A. 106, 16967-16971, 2009) .
  • the inventors develop an integrative technological platform for ELMs based on the discovery of the biosynthetic gene cluster (BGC) of the covalently-linked pili (CLP) fiber in the industrial workhorse Corynebacterium glutamicum.
  • BGC biosynthetic gene cluster
  • the present disclosure provides a fusion polypeptide comprising a carrier protein and a polypeptide of interest, wherein the polypeptide of interest is fused to a terminus of the carrier protein or inserted into the carrier protein, and wherein the carrier protein is a pilin of covalently-linked pili (CLP) from a microorganism.
  • the microorganism is a gram-positive bacterium, such as a bacterium selected from Corynebacterium glutamicum, Bifidobacterium breve, Lactococcus lactis, Lacticaseibacillus paracasei, Bacillus thuringiensis, and Lacticaseibacillus paracasei; preferably, Corynebacterium glutamicum.
  • the carrier protein is a major pilin.
  • the polypeptide of interest is fused to a terminus of the carrier protein. In some embodiments, the polypeptide of interest is fused to the N terminus of the carrier protein.
  • the polypeptide of interest is inserted into the carrier protein. In some embodiments, the polypeptide of interest is inserted into a loop in the carrier protein.
  • the carrier protein is a major pilin from Corynebacterium glutamicum.
  • the polypeptide of interest is inserted into the M domain of the major pilin.
  • the polypeptide of interest replaces the M domain of the major pilin or a part thereof.
  • the carrier protein comprises an amino acid sequence of SEQ ID NO: 1, 2, 3, or 4.
  • the polypeptide of interest is fused to the N terminus of the carrier protein, or is inserted between positions corresponding to G215 and L216 of SEQ ID NO: 1, between positions corresponding to G236 and E237 of SEQ ID NO: 1, or between positions corresponding to G336 and T337 of SEQ ID NO: 1.
  • the carrier protein comprises amino acids 35 to 509 of SEQ ID NO: 1.
  • the polypeptide of interest is fused to the N terminus of the carrier protein, or is inserted between G215 and L216, between G236 and E237, or between G336 and T337 of SEQ ID NO: 1.
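  • As a minimal illustration of the fusion and insertion options described above, the following Python sketch splices a polypeptide of interest into a carrier sequence after a given 1-based residue position and checks the flanking residues (for example, G and L for a G215/L216 site). The function names `insert_poi` and `fuse_n_terminal`, the toy carrier sequence, and the toy insert are hypothetical placeholders; they are not SEQ ID NO: 1 or any sequence from the disclosure, and no linker is modelled.

```python
def insert_poi(carrier: str, after_pos: int, poi: str,
               left: str, right: str) -> str:
    """Insert `poi` between residues `after_pos` and `after_pos + 1` (1-based)."""
    if not 1 <= after_pos < len(carrier):
        raise ValueError("insertion position outside the carrier sequence")
    # Verify the residues flanking the site, e.g. left="G", right="L".
    if carrier[after_pos - 1] != left or carrier[after_pos] != right:
        raise ValueError("flanking residues do not match the expected site")
    return carrier[:after_pos] + poi + carrier[after_pos:]


def fuse_n_terminal(carrier: str, poi: str) -> str:
    """Fuse `poi` directly to the N terminus of the carrier (no linker)."""
    return poi + carrier


# Toy data only (hypothetical): a 9-residue "carrier" with a G4/L5 site.
toy_carrier = "MKTGLAGEW"
toy_poi = "HHHHHH"
inserted = insert_poi(toy_carrier, 4, toy_poi, left="G", right="L")
fused = fuse_n_terminal(toy_carrier, toy_poi)
print(inserted)
print(fused)
```

  Checking the flanking residues before splicing is a simple guard against off-by-one errors when positions are given relative to a full-length sequence rather than to a truncated carrier.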
  • the present disclosure provides a polynucleotide encoding the fusion polypeptide of the present disclosure, and a vector comprising the polynucleotide, as well as a host cell comprising the polypeptide, the polynucleotide or the vector of the present disclosure.
  • the present disclosure provides a recombinant cell comprising a polynucleotide encoding a fusion polypeptide, wherein the fusion polypeptide comprises a carrier protein and a polypeptide of interest, wherein the polypeptide of interest is fused to a terminus of the carrier protein or inserted into the carrier protein, wherein the carrier protein is a pilin of CLP, and wherein the recombinant cell is capable of expressing the polynucleotide and displaying a modified CLP comprising the fusion polypeptide.
  • the recombinant cell is a gram-positive bacterium, such as a bacterium selected from Corynebacterium glutamicum, Bifidobacterium breve, Lactococcus lactis, Lacticaseibacillus paracasei, Bacillus thuringiensis, and Lacticaseibacillus paracasei; preferably, Corynebacterium glutamicum.
  • the carrier protein is a major pilin.
  • the polypeptide of interest is fused to a terminus of the carrier protein. In some embodiments, the polypeptide of interest is fused to the N terminus of the carrier protein.
  • the polypeptide of interest is inserted into the carrier protein. In some embodiments, the polypeptide of interest is inserted into a loop in the carrier protein.
  • the carrier protein is a major pilin from Corynebacterium glutamicum.
  • the polypeptide of interest is inserted into the M domain of the major pilin.
  • the polypeptide of interest replaces the M domain of the major pilin or a part thereof.
  • the carrier protein comprises an amino acid sequence of SEQ ID NO: 1, 2, 3, or 4.
  • the polypeptide of interest is fused to the N terminus of the carrier protein, or is inserted between positions corresponding to G215 and L216 of SEQ ID NO: 1, between positions corresponding to G236 and E237 of SEQ ID NO: 1, or between positions corresponding to G336 and T337 of SEQ ID NO: 1.
  • the carrier protein comprises amino acids 35-509 of SEQ ID NO: 1, and the polypeptide of interest is fused to the N terminus of carrier protein, or is inserted between G215 and L216, between G236 and E237, or between G336 and T337 of SEQ ID NO: 1.
  • the recombinant cell comprises two or more polynucleotides respectively encoding two or more fusion polypeptides, each comprising a different polypeptide of interest, and the modified CLP comprises the two or more polypeptides.
  • the present disclosure provides a method of preparing the recombinant cell of the present disclosure, comprising introducing a polynucleotide encoding the fusion polypeptide of the present disclosure into a host cell derived from a microorganism having CLP.
  • the native major pilin is knocked out in the host cell.
  • the method comprises a step of knocking out the native major pilin.
  • the present disclosure provides a modified covalently-linked pili (CLP) comprising a plurality of the fusion polypeptides of the present disclosure.
  • the present disclosure provides a method of preparing a modified CLP comprising the steps of
  • the fusion polypeptide is provided by transcribing and/or translating the polynucleotide of the present disclosure.
  • the activity of sortase is provided by transcribing and/or translating one or more polynucleotides encoding a sortase.
  • the sortase is encoded by a gene which is identified to be present in the same cluster with the gene encoding the carrier protein in nature.
  • the sortase is class C type sortase, such as srtC1 and/or srtC2, preferably wherein the srtC1 and srtC2 are encoded by genes from the same cluster.
  • the method is an in vitro method.
  • the present disclosure provides a polynucleotide construct or a combination of polynucleotide constructs comprising the polynucleotide of the present disclosure, and one or more polynucleotides encoding a sortase.
  • the sortase is encoded by a gene which is identified to be present in the same cluster with the gene encoding the carrier protein in nature.
  • the sortase is class C type sortase, such as srtC1 and/or srtC2, preferably wherein the srtC1 and srtC2 are encoded by genes from the same cluster.
  • Fig. 1 shows the map of plasmid pEK-spa2.
  • Fig. 2 shows the workflow for constructing the tandem of two cassettes.
  • Fig. 3 shows the maps of plasmids comprising the tandem of two cassettes.
  • Fig. 4 shows the map of plasmid pZ9-dxs_crtEBI.
  • Fig. 5 shows the map of plasmid pET-28a-Spa2.
  • Fig. 6 shows the Cg CLP biosynthetic gene cluster (BGC) encoding the sortase genes srtC1 and srtC2, and the sortase-catalyzed pilin genes spa1, spa2, and spa3.
  • Fig. 7 shows TEM and AFM images demonstrating that the major pilin Spa2 is indispensable for Cg CLP fiber structure formation.
  • the scale bars in the TEM and AFM images are 200 nm and 400 nm, respectively.
  • Fig. 8 shows the identification of the composition of CLP in C. glutamicum (CgCLP) by immunogold labelling.
  • the cartoon shows that Cg CLP fibers comprise two minor pilins (Spa1 and Spa3) and a major pilin of Spa2.
  • the immunogold labelling and TEM images show the constitution and distribution of Cg CLP pilins indicating that Spa2 is the major pilin.
  • For single immunogold labelling of Cg CLP, primary polyclonal antibodies against Spa1, Spa2, and Spa3 (α-Spa1, α-Spa2, and α-Spa3, respectively) were used, and gold-decorated goat anti-rabbit IgG was used as the secondary antibody for labelling the target pilin.
  • Fig. 9 shows that deletion of both the srtC1 and srtC2 genes abrogates pili formation.
  • the bars in the TEM (a) and AFM (b) images are 200 nm and 400 nm, respectively.
  • α-Spa2 is the primary antibody
  • the 10 nm gold-decorated goat anti-rabbit IgG is the secondary antibody.
  • Each ELISA experiment was performed at least in triplicate, and the standard error is shown.
  • Fig. 10 shows the isolation of Cg CLP fibers for mass spectrometry analysis.
  • SDS-PAGE gel electrophoresis analysis of the nickel affinity chromatography purified Cg CLP fibers showed the high-molecular Cg CLP polymers were eluted under 100 mM imidazole.
  • Fig. 11 shows the identification of intermolecular isopeptide bonds for the polymerization of Spa2 monomers in Cg CLP. Fragmentation spectra of the parent ion at m/z 832.9 2+ containing the intermolecular isopeptide bond (green font) between Spa2 i Lys194 (blue font) and Spa2 i+1 Thr477 (red font) are shown.
  • Fig. 12 shows the identification of the signal peptide of Spa2 by liquid chromatography-tandem mass spectrometry (LC-MS/MS).
  • the cartoon shows the amino acid sequence of Spa2cut (in which residues 470-509 at the C-terminus of Spa2 are replaced with 6His), which prevents polymerization so that Spa2 is secreted into the medium as a monomer.
  • SDS-PAGE gel electrophoresis shows the purified Spa2cut.
  • LC-MS/MS identified residues 1-34 at the N-terminus of Spa2 as the signal peptide.
  • This figure shows an MS/MS spectrum of the peptide with m/z 916.4538 2+ generated from chymotrypsin digest of Spa2.
  • Predicted b- and y-type ions (not all included) are listed above and below the peptide sequence, respectively. Matched ions are labelled in the spectrum.
  • Fig. 13 shows the accurate molecular weight of Spa2cut measured by quadrupole time-of-flight mass spectrometry.
  • the measured molecular weight is ~54.7 Da less than the calculated value for Spa2cut, indicating that three intramolecular isopeptide bonds and two disulfide bonds exist in the monomeric Spa2.
  • Formation of an intramolecular isopeptide bond loses one molecule of ammonia (~17 Da); formation of a disulfide bond loses two hydrogen atoms (~2 Da).
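  • The mass bookkeeping behind this interpretation can be sketched in a few lines of Python. The average masses used for ammonia and for two hydrogen atoms are standard textbook values assumed here, not figures taken from the disclosure, and the function name `expected_mass_deficit` is a hypothetical helper.

```python
# Average masses in Da (standard values, assumed here; not from the disclosure).
NH3_MASS = 17.031   # ammonia lost per intramolecular isopeptide bond
H2_MASS = 2.016     # two hydrogen atoms lost per disulfide bond


def expected_mass_deficit(n_isopeptide: int, n_disulfide: int) -> float:
    """Mass lost (Da) relative to the sequence-calculated molecular weight."""
    return n_isopeptide * NH3_MASS + n_disulfide * H2_MASS


# Three isopeptide bonds and two disulfide bonds, as inferred for Spa2cut.
deficit = expected_mass_deficit(n_isopeptide=3, n_disulfide=2)
print(f"expected deficit: {deficit:.1f} Da")
```

  Under these assumed masses the expected deficit is about 55.1 Da, which is close to the reported difference of roughly 54.7 Da.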
  • Fig. 14 shows crystals of Spa2 diffracted to resolution on the BL18U1 beamline at the Shanghai Synchrotron Radiation Facility (Shanghai, China) .
  • Fig. 15 shows the X-ray crystal structure of Spa2 which is arranged in three tandem Ig-like domains, N-domain (pink) , M-domain (blue) , and C-domain (green) . Residues involved in the formation of three intramolecular isopeptide bonds (yellow) and two disulfide bonds (red) are shown as sticks.
  • Fig. 16 shows the comparison of Spa2 in the crystal structure with the prediction from AlphaFold2 and crystal structure of 3HR6 and 4HSS.
  • Cα alpha-carbon
  • RMSD root-mean-square deviation
  • Fig. 17 shows omit electron density maps demonstrating the presence of internal covalent bonds in the crystal structure of Spa2. The 2mFo-DFc omit electron density maps of three isopeptide bonds (a) and two disulfide bonds (b) are shown in blue mesh, contoured at 1.0 σ.
  • the omit electron density maps were generated using Phenix composite omit map.
  • Fig. 18 shows the identification of disulfide bond and intramolecular isopeptide bond formation at appropriate sequence locations in Spa2 by LC-MS/MS analysis.
  • the cartoon shows the critical features in Spa2, including three intramolecular isopeptide bonds in individual domains, two disulfide bonds in the N-domain (C97-C128) and the C-domain (C380-C432) , the pilin motif of YPKN in N-domain, and the sortase cleavage sorting signal motif of LPLTG in C-domain.
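  • As a rough illustration of how the two sequence motifs named above could be located in a pilin sequence, the following Python sketch reports 1-based motif positions and the position of the threonine within the LPLTG sorting signal. The toy sequence and the helper name `find_motif` are hypothetical; the sequence is not Spa2 and the positions it yields have no biological meaning.

```python
import re

PILIN_MOTIF = "YPKN"       # pilin motif named in the text
SORTING_SIGNAL = "LPLTG"   # sortase cleavage sorting signal named in the text


def find_motif(sequence: str, motif: str) -> list:
    """Return the 1-based start position of every occurrence of `motif`."""
    return [m.start() + 1 for m in re.finditer(re.escape(motif), sequence)]


# Toy sequence only (hypothetical), built to contain both motifs once.
toy_sequence = "MAAYPKNGGSCLPLTGKK"
pilin_sites = find_motif(toy_sequence, PILIN_MOTIF)
sorting_sites = find_motif(toy_sequence, SORTING_SIGNAL)
# The threonine is the fourth residue of LPLTG, i.e. start position + 3.
threonine_sites = [start + 3 for start in sorting_sites]
print(pilin_sites, sorting_sites, threonine_sites)
```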
  • Figs. 19 and 20 show the genetic manipulation in Δspa2 strains (harboring a plasmid that expressed Spa2 or Spa2 variants of K194A, LPLTG 474LALAA478, E158A, D246A, E435A, D246A/E435A, C97A, C380A, and C97A/C380A, respectively) to assess the key residues promoting the formation of inter- and intramolecular isopeptide bonds, and disulfide bonds, in Spa2 by TEM bio-imaging (Fig. 19) and quantitative analysis of the amount of Cg CLP fiber by whole-cell filtration ELISA (detection by anti-Spa2 antibody) (Fig. 20).
  • Results are presented as mean ± s.d. in Fig. 20.
  • Not significant (NS) P > 0.05, *P < 0.05, **P < 0.01, ***P < 0.001, ****P < 0.0001.
  • Statistics were derived using a t-test. The bars in Fig. 19 are 200 nm.
  • Fig. 21 shows the accurate molecular weight of Spa2 cut mutant variants determined by quadrupole time-of-flight mass spectrometry.
  • the measured molecular weights of E158Acut (a), D246Acut (b), E435Acut (c), and D246A/E435Acut (d) are ~54.9, 37.3, 21.4, and 4.0 Da less than the calculated values of the respective variants, indicating that three, two, one and no intramolecular isopeptide bonds are retained in the corresponding monomeric mutants, respectively.
  • The Spa2cut mutant variants E158Acut, D246Acut, E435Acut, and D246A/E435Acut were expressed in Δspa2 and purified by nickel-affinity chromatography.
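  • The step from a measured mass difference to a bond count can be sketched as follows. This is only an illustrative calculation: it assumes that both disulfide bonds remain in every variant, uses standard average masses for ammonia and two hydrogen atoms (not values from the disclosure), and the helper name `estimate_isopeptide_bonds` is hypothetical.

```python
# Average masses in Da (standard values, assumed; not from the disclosure).
NH3_MASS = 17.031   # lost per intramolecular isopeptide bond
H2_MASS = 2.016     # lost per disulfide bond


def estimate_isopeptide_bonds(measured_deficit: float,
                              n_disulfide: int = 2) -> int:
    """Estimate the isopeptide bond count from a measured mass deficit (Da)."""
    # Remove the contribution of the disulfide bonds (assumed intact).
    remaining = measured_deficit - n_disulfide * H2_MASS
    # Each remaining ~17 Da corresponds to one lost ammonia molecule.
    return max(0, round(remaining / NH3_MASS))


# Mass deficits reported in the text for the four variants.
deficits = {
    "E158Acut": 54.9,
    "D246Acut": 37.3,
    "E435Acut": 21.4,
    "D246A/E435Acut": 4.0,
}
estimates = {name: estimate_isopeptide_bonds(d) for name, d in deficits.items()}
for name, count in estimates.items():
    print(name, count)
```

  With these assumptions the estimates come out as three, two, one and zero bonds, matching the interpretation given in the text.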
  • Fig. 22 shows the rational engineering of the Cg CLP protein scaffold through a modular genetic design strategy: the cartoon shows a polymerized Spa2 major pilin functionalized by incorporating a protein-of-interest (POI) (e.g., mCherry, a fluorescent reporter protein) at candidate insertion sites (including Q35 (E1) at the N-terminus, and G215 (E2) , G236 (E3) and G336 (E4) in the M-domain lacking a disulfide bond) based on structural verification.
  • POI protein-of-interest
  • Fig. 24 shows the TEM morphologies of the assembled mCherry-Spa2 fusion proteins associated with cell surfaces based on immunogold labelling.
  • TEM images of Δspa2 cells (a), E1 cells (b), E2 cells (c), E3 cells (d) and E4 cells (e).
  • the TEM samples were collected from the Δspa2 strain harboring a plasmid that expresses various mCherry-Spa2 fusions under the native constitutive promoter of the spa2 gene.
  • α-Spa2 is the primary antibody
  • the 10 nm gold-decorated goat anti-rabbit IgG is the secondary antibody. Scale bars, 200 nm.
  • Fig. 25 shows the extracellular secretion and assembly of R-Spa2 pilins into CgCLP fiber at the cell-surfaces of engineered C. glutamicum cells: a series of R-Spa2 fusion protein constructs comprising functional R peptides/proteins with different amino acid sequences.
  • Fig. 27 shows the functional characterization of engineered Cg CLP with various fusion domains.
  • (a) TEM images showed that Ni-NTA-decorated AuNPs were anchored onto 6His-Spa2 Cg CLP.
  • (b) Confocal microscopic images showed the green fluorescence emitted from SpyTag-Spa2 Cg CLP cells to which SpyCatcher-EGFP protein binding partners were covalently attached via SpyTag-SpyCatcher interaction pairs.
  • (c) Confocal microscopic images showed the green fluorescence emitted from SpyCatcher-Spa2 Cg CLP cells to which SpyTag-EGFP protein binding partners were covalently attached via SpyTag-SpyCatcher interaction pairs.
  • Fig. 28 is a schematic showing simultaneous expression of the two Spa2 pilin fusion proteins, N-Ven-Spa2 and C-Ven-Spa2 (N-Ven-Spa2+C-Ven-Spa2 strain), containing the N-terminus (N-Ven) and C-terminus (C-Ven) module of the split-Venus system, resulting in co-assembly of the split-Venus components into the final functional Cg CLP structures.
  • Fig. 29 shows the TEM morphologies of the assembled split-Venus components fused with Spa2 associated with cell surfaces based on immunogold labelling.
  • N-Ven+C-Ven cells expressing co-secreted split-Venus system (a) , N-Ven-Spa2 cells expressing the Spa2 pilin fusion protein of N-Venus-Spa2 (b) , C-Ven-Spa2 cells expressing the Spa2 pilin fusion protein of C-Venus-Spa2 (c) , and N-Ven-Spa2+C-Ven-Spa2 cells for simultaneous expression of two Spa2 pilin fusion proteins, N-Ven-Spa2 and C-Ven-Spa2 (d) .
  • TEM samples were collected from the Δspa2 strain harboring a plasmid that expresses various Spa2 fusion proteins under the native constitutive promoter of the spa2 gene.
  • α-Spa2 is the primary antibody
  • 10 nm gold-decorated goat anti-rabbit IgG is the secondary antibody.
  • Scale bars, 200 nm.
  • Fig. 30 shows the co-assembly of split-Venus components into the Cg CLP fibers leading to increased fluorescence intensity.
  • the engineered C. glutamicum cells show greater fluorescence intensity only in the N-Ven-Spa2+C-Ven-Spa2 strain, and
  • (b) confocal microscopy of C. glutamicum cells showing that the strongest Venus fluorescence signal appeared at the extracellular sites of the N-Ven-Spa2+C-Ven-Spa2 strain (scale bar, 2 μm).
  • Fig. 31 is a schematic illustrating engineered C. glutamicum living materials transforming cellulosic biomass into the value-added product lycopene by combining extracellular cellulose degradation capacity with intracellular bioconversion ability.
  • extracellular cellulose degradation (Step 1), endo-1,4-β-glucanase from T. reesei (TrEgl) and a β-glucosidase from S.
  • SdBgl Spa2 pilin
  • TrEgl-Spa2+SdBgl-Spa2 Spa2 pilin
  • Step 2: the glucose is used for lycopene production in the pathway-engineered C. glutamicum C003 strain upon induction with IPTG.
  • G3P glyceraldehyde-3-phosphate
  • IPP isopentenyl pyrophosphate.
  • Fig. 32 shows the lycopene production from biowastes with engineered C. glutamicum harboring modified CLPs.
  • (a) TEM images show that cells of C003, which contain the P2 plasmid, enabled co-assembly of TrEgl and SdBgl into the Cg CLP structure, while the cells of C001, C002, and C004 did not.
  • Cg CLP was labeled with 10 nm gold particles by immunogold labelling. Scale bars, 200 nm.
  • ELMs can degrade CMC-Na in a medium from a viscous gel to a thin solution only when both TrEgl and SdBgl were co-assembled into the CgCLP structure (TrEgl-Spa2+SdBgl-Spa2, C003 strain), outperforming the case of the secreted free enzymes (TrEgl+SdBgl, C004 strain).
  • Δspa2Δdec (C001 strain) is the negative control strain.
  • the C003 strain showed 4-fold higher enzyme activity than the C004 strain.
  • covalently-linked pili or “CLP” refers to pili in which the monomers are linked to each other via covalent bonds.
  • the engineered living materials herein refers to the pili formed by the engineered monomers, i.e., the fusion polypeptide of the present disclosure, or recombinant bacterium forming the pili.
  • C. glutamicum a Gram-positive bacterium
  • GRAS generally regarded as safe
  • “peptide”, which can be used interchangeably with “polypeptide” and “protein”, means a chain comprising at least two amino acids linked by peptide bonds, such as ten or more amino acid residues.
  • the chemical formulas or sequences of all the peptides and polypeptide herein are written in left-to-right order, showing the direction from the amino terminal to the carboxyl terminal.
  • “Peptide” , “polypeptide” and “protein” can include, but are not limited to, an enzyme, an antibody, a hormone, a ligand, a receptor, etc.
  • amino acid includes amino acids naturally occurring in proteins and unnatural amino acids.
  • the conventional one-letter and three-letter nomenclature for the amino acids naturally occurring in proteins is employed, which can be found in Sambrook, et al. (Molecular Cloning: A Laboratory Manual, 2nd ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N. Y., 1989).
  • fusion polypeptide is a recombinant product comprising two or more peptide fragments which are not present in a single natural polypeptide.
  • the fragments can be fused directly or via a linker, such as a flexible linker, e.g., GS linkers.
  • a fusion polypeptide can be produced by the expression of a polynucleotide comprising nucleotide sequences encoding the two or more peptide fragments and the linker, if present, in desired order.
  • polynucleotide generally refers to a nucleic acid molecule (e.g., 100 nucleotides and up to 30k nucleotides in length) and a sequence that is either complementary (antisense) or identical (sense) to the sequence of a messenger RNA (mRNA) or miRNA fragment or molecule.
  • mRNA messenger RNA
  • miRNA fragment or molecule usually refers to DNA or RNA molecules that are either transcribed or non-transcribed.
  • polynucleotide construct refers to a single-stranded or double-stranded polynucleotide, which is isolated from a naturally occurring gene or modified to contain a nucleic acid segment that does not naturally occur.
  • When the polynucleotide construct contains the control sequences required to express the coding sequence of the present disclosure, the polynucleotide construct comprises an “expression cassette”.
  • exogenous polynucleotide refers to a nucleotide sequence that does not originate from the host in which it is placed. It may be identical or heterologous to the host’s DNA. An example is a sequence of interest inserted into a vector. Such exogenous DNA sequences may be derived from a variety of sources including DNA, cDNA, synthetic DNA, and RNA. Exogenous polynucleotides also encompass DNA sequences that encode antisense oligonucleotides.
  • expression cassette refers to a polynucleotide segment comprising a polynucleotide encoding a polypeptide operably linked to additional nucleotides provided for the expression of the polynucleotide, for example, control sequence.
  • the term “encoding” means that a polynucleotide directly specifies the amino acid sequence of its protein product.
  • the boundaries of the coding sequence are generally determined by an open reading frame, which generally starts with the ATG start codon or other start codons such as GTG and TTG, and ends with a stop codon such as TAA, TAG and TGA.
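  • The open-reading-frame boundaries described above can be illustrated with a minimal Python sketch that scans a DNA string for the first start codon (ATG, GTG or TTG) and reads in-frame until the first stop codon (TAA, TAG or TGA). The function name `find_first_orf` and the toy input are hypothetical, and the sketch scans only the forward strand; it is not a gene-prediction method.

```python
START_CODONS = {"ATG", "GTG", "TTG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_first_orf(dna: str):
    """Return the first ORF (start codon through stop codon), or None."""
    dna = dna.upper()
    for start in range(len(dna) - 2):
        if dna[start:start + 3] not in START_CODONS:
            continue
        # Walk codon by codon in the same reading frame until a stop codon.
        for pos in range(start + 3, len(dna) - 2, 3):
            if dna[pos:pos + 3] in STOP_CODONS:
                return dna[start:pos + 3]
    return None


# Toy input (hypothetical): two leading bases, then ATG AAA TAG.
orf = find_first_orf("CCATGAAATAGGG")
print(orf)
```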
  • the coding sequence can be a DNA, cDNA or recombinant nucleotide sequence.
  • expression includes any step involved in the production of a polypeptide, including but not limited to transcription, post-transcriptional modification, translation, post-translational modification, and secretion.
  • control sequence includes all elements necessary or beneficial for the expression of the polynucleotide encoding the polypeptide of the present disclosure.
  • Each control sequence may be natural or foreign to the nucleotide sequence encoding the polypeptide, or natural or foreign to each other.
  • control sequences include, but are not limited to, leader sequence, polyadenylation sequence, propeptide sequence, promoter, enhancer, signal peptide sequence, and transcription terminator.
  • control sequences include a promoter and signals for the termination of transcription and translation.
  • control sequence may be a suitable promoter sequence, a nucleotide sequence recognized by the host cell to express the polynucleotide encoding the polypeptide of the present disclosure.
  • the promoter sequence contains a transcription control sequence that mediates the expression of the polypeptide.
  • the promoter may be any nucleotide sequence that exhibits transcriptional activity in the selected host cell, for example, lac operon of E. coli.
  • the promoters also include mutant, truncated and hybrid promoters, and can be obtained from genes encoding extracellular or intracellular polypeptides, which are homologous or heterologous to the host cell.
  • operably linked refers to a configuration in which a control sequence is placed at an appropriate position relative to the coding sequence of the polynucleotide sequence, whereby the control sequence directs the expression of the polypeptide coding sequence.
  • the polynucleotide encoding a polypeptide of interest can be subjected to various manipulations to improve the expression of the polypeptide. Before the insertion thereof into a vector, manipulation of the polynucleotide according to the expression vector or the host, such as codon optimization, is desirable or necessary. Techniques for modifying polynucleotide sequences with recombinant DNA methods are well known in the art.
  • recombinant refers to nucleic acids, vectors, polypeptides, or proteins that have been generated using DNA recombination (cloning) methods and are distinguishable from native or wild-type nucleic acids, vectors, polypeptides, or proteins.
  • hybridization means that nucleotide sequences which are at least about 90%, preferably at least about 95%, more preferably at least about 96%, and more preferably at least 98% homologous to each other generally remain hybridized to each other under given stringent hybridization and washing conditions.
  • the sequences are aligned for the purpose of optimal comparison (e.g., a gap can be introduced into the first amino acid or nucleic acid sequence for the optimal alignment with the second amino acid or nucleic acid sequence) . Then, the amino acid residues or nucleotides at the corresponding amino acid positions or nucleotide positions are compared. When a position in the first sequence is occupied by the same amino acid residue or nucleotide at the corresponding position in the second sequence, these molecules are identical at this position.
  • the two sequences are identical in length.
  • Identity percentage or “sequence identity percentage” refers to the comparison between the amino acids of two polypeptides or nucleotides between two polynucleotides, and when optimally aligned, the two polypeptides or polynucleotides have approximately the specified percentage of identical amino acids.
  • 95% identity refers to the comparison between the amino acids of two polypeptides or nucleotides between two polynucleotides, and when optimally aligned, 95% of the amino acids in the two polypeptides or 95% of the nucleotides in the two polynucleotides are identical.
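  • The position-by-position comparison described above can be sketched in Python for two sequences that have already been aligned. Dividing by the full alignment length (gap columns included) is one common convention and is an assumption here, as are the function name `percent_identity` and the toy sequences; the alignment step itself is not shown.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns holding the same residue in both sequences.

    Both inputs must already be aligned and of equal length; '-' marks a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("empty alignment")
    # Count columns where both sequences carry the same non-gap character.
    matches = sum(1 for x, y in zip(aligned_a, aligned_b)
                  if x == y and x != "-")
    return 100.0 * matches / len(aligned_a)


# Toy sequences (hypothetical): 20 columns, one mismatch at the last position.
identity = percent_identity("ACDEFGHIKLMNPQRSTVWY", "ACDEFGHIKLMNPQRSTVWA")
print(identity)
```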
  • polynucleotide of the present disclosure does not include a polynucleotide that only hybridizes to a poly A sequence (such as the 3' end poly (A) of mRNA) or a complementary stretch of poly T (or U) residues.
  • the term “host cell” refers to, for example microorganisms, yeast cells, insect cells, and mammalian cells, that can be, or have been, used as recipients of vectors.
  • the term includes the progeny of the original cell which has been transduced.
  • a “host cell” as used herein generally refers to a cell which has been transduced with an exogenous DNA sequence. It is understood that the progeny of a single parental cell may not necessarily be completely identical in morphology or in genomic or total DNA complement to the original parent, due to natural, accidental, or deliberate mutation.
  • Spa2 protein is identified as the major pilin of the CLP fiber structure.
  • Through structure-guided design, the inventors developed a new type of engineerable extracellular protein scaffold that can be genetically appended with diverse functional peptides or proteins at multiple sites of the Spa2 protein.
  • the present disclosure provides a fusion polypeptide comprising a carrier protein and a polypeptide of interest, wherein the polypeptide of interest is fused to a terminus of the carrier protein or inserted into the carrier protein, and wherein the carrier protein is a pilin of covalently-linked pili (CLP) from a microorganism.
  • the microorganism is a gram-positive bacterium, such as a bacterium selected from Corynebacterium glutamicum, Bifidobacterium breve, Lactococcus lactis, Lacticaseibacillus paracasei, and Bacillus thuringiensis; preferably, Corynebacterium glutamicum.
  • the bacterium can include, but are not limited to, a bacterium selected from Corynebacterium glutamicum strain BE (GenBank assembly accession: GCA_013046805.1) , Corynebacterium glutamicum ATCC 14067 (GenBank assembly accession: GCA_002243555.1) , Corynebacterium glutamicum strain YI (GenBank assembly accession: GCA_001643035.1) , Corynebacterium glutamicum strain ATCC 13869 (GenBank assembly accession: GCA_001687645.1) , Corynebacterium glutamicum AJ1511 (GenBank assembly accession: GCA_002355675.1) , Corynebacterium glutamicum strain XV (GenBank assembly accession: GCA_001936195.1) , Corynebacterium glutamicum strain CP (GenBank assembly accession: GCA_001447865.2) , Corynebacterium glutamicum R (GenBank assembly accession: GCA
  • cremoris NZ9000 GenBank assembly accession: GCA_000143205.1
  • cremoris MG1363 GenBank assembly accession: GCA_000009425.1
  • cremoris A76 (GenBank assembly accession: GCA_000236475.1) , Lactococcus lactis strain SRCM103457 (GenBank assembly accession: GCA_004194355.1) , Lactococcus lactis strain CBA3619 (GenBank assembly accession: GCA_007954765.1) , Lactococcus lactis strain WiKim0098 (GenBank assembly accession: GCA_016406265.1) , Lactococcus lactis strain K_LL005 (GenBank assembly accession: GCA_014334715.1) , Lactococcus lactis subsp.
  • lactis strain G121 (GenBank assembly accession: GCA_013395015.1) , Lactococcus lactis strain N8 (GenBank assembly accession: GCA_014884605.1) , Lactococcus lactis subsp. lactis IO-1 (GenBank assembly accession: GCA_000344575.1) , Lactococcus lactis subsp. lactis strain F44 (GenBank assembly accession: GCA_002804185.1) , Lactococcus lactis subsp. lactis bv.
  • Lactococcus lactis strain S50 (GenBank assembly accession: GCA_003627395.2) , Lactococcus lactis strain FDAARGOS_1064 (GenBank assembly accession: GCA_016127135.1) , Lactococcus lactis strain FDAARGOS_887 (GenBank assembly accession: GCA_016027975.1) , Lactococcus lactis subsp.
  • lactis strain UC77 (GenBank assembly accession: GCA_002078615.2) , Lactococcus lactis strain FDAARGOS_866 (GenBank assembly accession: GCA_016028815.1) , Lactococcus lactis strain IL1403 (GenBank assembly accession: GCA_003722275.1) , Lactococcus lactis strain FDAARGOS_865 (GenBank assembly accession: GCA_016028835.1) , Lactococcus lactis subsp.
  • cremoris IBB477 (GenBank assembly accession: GCA_001856165.1) , Lacticaseibacillus paracasei strain TD 062 (GenBank assembly accession: GCA_009834405.1) , Lacticaseibacillus paracasei strain HM1 (GenBank assembly accession: GCA_018064185.1) , Bacillus thuringiensis strain FDAARGOS_794 (GenBank assembly accession: GCA_013267795.1) , Bacillus thuringiensis strain XL6 (GenBank assembly accession: GCA_000774075.2) , Bacillus thuringiensis strain Bt-GS57 (GenBank assembly accession: GCA_017751245.1) , Bacillus thuringiensis strain HER1410 (GenBank assembly accession: GCA_013340745.1) , Bacillus thuringiensis serovar tolworthi (GenBank assembly accession: GCA_001548175.1) , Bac
  • tolerans strain MGB0734 (GenBank assembly accession: GCA_015476135.1) , Lacticaseibacillus paracasei subsp. tolerans strain MGB0747 (GenBank assembly accession: GCA_015476175.1) , Lacticaseibacillus paracasei strain CBA3611 (GenBank assembly accession: GCA_007292115.1) , Lacticaseibacillus paracasei subsp. paracasei strain GR0548 (GenBank assembly accession: GCA_019175405.1) , Lacticaseibacillus paracasei subsp.
  • tolerans strain MGB0625 (GenBank assembly accession: GCA_015476155.1) , Lacticaseibacillus paracasei strain 10266 (GenBank assembly accession: GCA_008329845.1) , Lacticaseibacillus paracasei subsp. tolerans strain S-NB (GenBank assembly accession: GCA_016757695.1) , Lacticaseibacillus paracasei strain Lp02 (GenBank assembly accession: GCA_013307125.1) , Lacticaseibacillus paracasei strain ZFM54 (GenBank assembly accession: GCA_003627255.1) , Lacticaseibacillus paracasei subsp.
  • paracasei strain BD5115 GenBank assembly accession: GCA_018596415.1
  • Paracasei JCM 8130 GenBank assembly accession: GCA_000829035.1
  • preferably, Corynebacterium glutamicum ATCC 14067.
  • the carrier protein is a major pilin.
  • the fusion or insertion of the polypeptide of interest does not influence the formation of the intermolecular isopeptide bond, disulfide bond, or intramolecular isopeptide bond in the carrier protein.
  • the polypeptide of interest is fused to a terminus of the carrier protein. In some embodiments, the polypeptide of interest is fused to the N terminus of the carrier protein.
  • the polypeptide of interest is inserted into the carrier protein. In some embodiments, the polypeptide of interest is inserted into a loop in the carrier protein.
  • the carrier protein is a major pilin from Corynebacterium glutamicum (Spa2 protein). It is observed that the Spa2 protein (SEQ ID NO: 1) comprises three tandem Ig-like domains, namely the N-domain (residues 36-197), the M-domain (residues 198-343), and the C-domain (residues 344-469), which is consistent with other major pilins. It is also observed that deletion of the M-domain does not influence the formation of CLP.
  • the polypeptide of interest is inserted into the M domain of the major pilin. In some embodiments, the polypeptide of interest replaces the M domain of the major pilin or a part thereof.
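As a quick consistency check on the domain boundaries given for Spa2 (SEQ ID NO: 1), the following sketch maps a residue position to its domain. The dictionary name `SPA2_DOMAINS` and the helper `domain_of` are illustrative; only the residue ranges come from the text. It can be used to confirm that the insertion sites named elsewhere in this disclosure (G215/L216, G236/E237, G336/T337) all fall inside the M-domain.

```python
from typing import Optional

# Domain boundaries of Spa2 (SEQ ID NO: 1), 1-based and inclusive, as stated in the text.
SPA2_DOMAINS = {
    "N": (36, 197),
    "M": (198, 343),
    "C": (344, 469),
}


def domain_of(position: int) -> Optional[str]:
    """Return the Ig-like domain containing a 1-based residue position, or None."""
    for name, (start, end) in SPA2_DOMAINS.items():
        if start <= position <= end:
            return name
    return None


if __name__ == "__main__":
    for left in (215, 236, 336):
        # Each insertion site lies between residue `left` and residue `left + 1`.
        print(left, domain_of(left), domain_of(left + 1))
```

Positions outside the three ranges, such as the LPLTG sorting motif at 474-478, return `None`.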
  • the carrier protein comprises an amino acid sequence of SEQ ID NO: 1, 2, 3, or 4, or an amino acid sequence at least 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99% or 99.5% identical to SEQ ID NO: 1, 2, 3, or 4.
  • the carrier protein comprises an amino acid sequence of SEQ ID NO: 1, 2, 3, or 4, or an amino acid sequence at least 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99% or 99.5% identical to SEQ ID NO: 1, 2, 3, or 4, with the residues corresponding to residues C97, C128, K194, C380, C432, and LPLTG (474-478), and optionally E158, D246, and/or E435 of SEQ ID NO: 1 unchanged.
  • the carrier protein can be the mature form of SEQ ID NO: 1, 2, 3, or 4, i.e., with the deletion of the signal peptide.
  • the carrier protein comprises amino acids 35 to 509 of SEQ ID NO: 1, amino acids 34 to 520 of SEQ ID NO: 2, amino acids 34 to 530 of SEQ ID NO: 3, or amino acids 34 to 519 of SEQ ID NO: 4, or an amino acid sequence at least 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99% or 99.5% identical to amino acids 35 to 509 of SEQ ID NO: 1, amino acids 34 to 520 of SEQ ID NO: 2, amino acids 34 to 530 of SEQ ID NO: 3, or amino acids 34 to 519 of SEQ ID NO: 4.
  • the carrier protein comprises amino acids 35 to 509 of SEQ ID NO: 1, amino acids 34 to 520 of SEQ ID NO: 2, amino acids 34 to 530 of SEQ ID NO: 3, or amino acids 34 to 519 of SEQ ID NO: 4, or an amino acid sequence at least 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99% or 99.5% identical to amino acids 35 to 509 of SEQ ID NO: 1, amino acids 34 to 520 of SEQ ID NO: 2, amino acids 34 to 530 of SEQ ID NO: 3, or amino acids 34 to 519 of SEQ ID NO: 4, with the residues corresponding to residues C97, C128, E158, K194, D246, C380, C432, E435, and LPLTG (474-478), and optionally E158, D246, and/or E435 of SEQ ID NO: 1 unchanged.
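The requirement that certain residues remain unchanged in a variant carrier protein can be checked mechanically. The sketch below is illustrative only: it assumes the variant is supplied in full-length SEQ ID NO: 1 numbering with no insertions or deletions (a real comparison would first align the variant to SEQ ID NO: 1), and it uses a synthetic placeholder sequence because the sequence listing is not reproduced here. The names `REQUIRED`, `OPTIONAL`, `changed_positions`, and `toy_reference` are hypothetical.

```python
from typing import List

# Residues of SEQ ID NO: 1 named in the text (1-based positions).
REQUIRED = {97: "C", 128: "C", 194: "K", 380: "C", 432: "C"}
OPTIONAL = {158: "E", 246: "D", 435: "E"}
SORTING_MOTIF_START, SORTING_MOTIF = 474, "LPLTG"  # residues 474-478


def changed_positions(variant: str, include_optional: bool = False) -> List[int]:
    """List the conserved positions whose residue differs in the variant.

    Assumes `variant` shares SEQ ID NO: 1 numbering (no indels).
    """
    expected = dict(REQUIRED)
    if include_optional:
        expected.update(OPTIONAL)
    for offset, residue in enumerate(SORTING_MOTIF):
        expected[SORTING_MOTIF_START + offset] = residue
    return sorted(pos for pos, residue in expected.items()
                  if pos > len(variant) or variant[pos - 1] != residue)


def toy_reference(length: int = 509) -> str:
    """Synthetic stand-in for SEQ ID NO: 1: alanines plus the named residues."""
    seq = ["A"] * length
    for pos, residue in {**REQUIRED, **OPTIONAL}.items():
        seq[pos - 1] = residue
    start = SORTING_MOTIF_START - 1
    seq[start:start + len(SORTING_MOTIF)] = list(SORTING_MOTIF)
    return "".join(seq)


if __name__ == "__main__":
    ref = toy_reference()
    mutant = ref[:96] + "S" + ref[97:]  # C97S in the placeholder sequence
    print(changed_positions(ref), changed_positions(mutant))
```

An empty list indicates that every checked residue is unchanged; passing `include_optional=True` extends the check to E158, D246, and E435.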
  • the polypeptide of interest is fused to the N terminus of the carrier protein, or is inserted between positions corresponding to G215 and L216 of SEQ ID NO: 1, between positions corresponding to G236 and E237 of SEQ ID NO: 1, or between positions corresponding to G336 and T337 of SEQ ID NO: 1.
  • the carrier protein comprises amino acids 35 to 509 of SEQ ID NO: 1.
  • the polypeptide of interest is fused to the N terminus of the carrier protein, or is inserted between G215 and L216, between G236 and E237, or between G336 and T337 of SEQ ID NO: 1.
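An insertion "between G215 and L216" can be expressed as a string operation on the carrier sequence, which also makes the 1-based numbering explicit. The following is a minimal sketch under stated assumptions: the helper name `insert_between` is hypothetical, the carrier is given in full-length SEQ ID NO: 1 numbering, and the example uses a toy sequence rather than the actual Spa2 sequence.

```python
def insert_between(carrier: str, left_pos: int, left_res: str,
                   right_res: str, insert: str) -> str:
    """Insert `insert` between 1-based residues left_pos and left_pos + 1.

    The flanking residues are verified first so that a numbering mismatch
    (for example, a sequence lacking the signal peptide) fails loudly.
    """
    right_pos = left_pos + 1
    if left_pos < 1 or right_pos > len(carrier):
        raise ValueError("insertion site lies outside the carrier sequence")
    if carrier[left_pos - 1] != left_res or carrier[right_pos - 1] != right_res:
        raise ValueError(
            f"expected {left_res}{left_pos}/{right_res}{right_pos}, found "
            f"{carrier[left_pos - 1]}{left_pos}/{carrier[right_pos - 1]}{right_pos}")
    return carrier[:left_pos] + insert + carrier[left_pos:]


if __name__ == "__main__":
    # Toy carrier with G at position 215 and L at position 216.
    toy_carrier = "A" * 214 + "GL" + "A" * 20
    fused = insert_between(toy_carrier, 215, "G", "L", "WWW")
    print(fused[212:221])
```

The same call with `(236, "G", "E")` or `(336, "G", "T")` would model the other two sites, provided the supplied sequence actually carries those residues at those positions.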
  • the polypeptide of interest is directly linked to the N terminus of the carrier polypeptide. In some embodiments, the polypeptide of interest is linked to the N terminus of the carrier polypeptide via a peptide linker such as a flexible linker.
  • a peptide linker is generally a short peptide of about 4-20 or more amino acids; a combination of Ser and Gly residues is a conventional flexible linker.
  • the peptide linker used in the present disclosure is (G4S)2, i.e., SEQ ID NO: 22.
  • the peptide linker is a C10 linker of SEQ ID NO: 23.
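The N-terminal fusion with a flexible linker can be sketched as simple concatenation. This example assumes that (G4S)2 denotes two repeats of GGGGS, as the notation conventionally implies; whether SEQ ID NO: 22 is exactly that string is not verified here because the sequence listing is not reproduced. The function names and the toy peptides are hypothetical, and signal-peptide placement is not modeled.

```python
G4S_UNIT = "GGGGS"  # one Gly4-Ser repeat


def g4s_linker(repeats: int) -> str:
    """Build a (G4S)n flexible linker with the given number of repeats."""
    if repeats < 0:
        raise ValueError("repeats must be non-negative")
    return G4S_UNIT * repeats


def n_terminal_fusion(poi: str, carrier: str, linker: str = "") -> str:
    """Place the polypeptide of interest N-terminal to the carrier.

    An empty linker models the direct fusion described in the text.
    """
    return poi + linker + carrier


if __name__ == "__main__":
    # Toy sequences, not the disclosed polypeptides.
    print(n_terminal_fusion("MKT", "ACDEF", g4s_linker(2)))
```

Keeping the linker as a separate argument makes it straightforward to compare direct fusion with linker-mediated fusion in the same construct design.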
  • the polypeptide of interest can be selected according to the desired application of the fusion polypeptide.
  • the fusion polypeptide is provided to bind, capture or enrich a target molecule
  • the polypeptide of interest is a polypeptide that can recognize a target peptide, including but not limited to a ligand, a receptor, an antigen, and an antibody such as an scFv or a nanobody.
  • the fusion polypeptide is provided to capture a protein comprising a SpyTag (SEQ ID NO: 37)
  • the polypeptide of interest comprises SpyCatcher (SEQ ID NO: 15), or vice versa.
  • the fusion polypeptide is provided as an adhesive agent, and the polypeptide of interest is an adhesive peptide, e.g., Mfp35 (SEQ ID NO: 38) .
  • the fusion polypeptide is provided to catalyze chemical or biochemical reactions, and the polypeptide of interest is an enzyme.
  • the fusion polypeptide is provided to degrade carbohydrates such as cellulose, and the polypeptide of interest can be the endo-1,4-β-glucanase, e.g., from Trichoderma reesei (TrEgl, SEQ ID NO: 19) and/or β-glucosidase, e.g., from Saccharophagus degradans (SdBgl, SEQ ID NO: 21).
  • the fusion polypeptide is provided to degrade refractory organics, such as plastics, and the polypeptide of interest is an enzyme responsible for the degradation, such as a PETase.
  • the present disclosure provides a polynucleotide encoding the fusion polypeptide of the present disclosure.
  • the polynucleotide of the present disclosure can be amplified with cDNA, mRNA or genomic DNA as the template and suitable oligonucleotide primers according to standard PCR amplification techniques.
  • the nucleic acid amplified as above can be cloned into a suitable vector and characterized by DNA sequence analysis.
  • the polynucleotide of the present disclosure can be prepared by standard synthesis techniques, for example, by using an automated DNA synthesizer.
  • a nucleic acid molecule that is complementary to other nucleotide sequence is a molecule that is sufficiently complementary to the nucleotide sequence so that it can hybridize with the other nucleotide sequences to form a stable duplex.
  • a polynucleotide construct and a vector comprising the polynucleotide of the present disclosure, such as an expression vector.
  • the polynucleotide of the present disclosure is operably linked to a promoter.
  • the promoter is a constitutive promoter, such as the native promoter driving Spa2 gene in Corynebacterium glutamicum.
  • the promoter is an inducible promoter.
  • the expression vector comprises a Lac operon.
  • the polynucleotide encoding the polypeptide of the present disclosure can be subjected to various manipulations to allow the expression of the polypeptide. Before the insertion thereof into a vector, manipulation of the polynucleotide according to the expression vector is desirable or necessary. Techniques for modifying polynucleotide sequences with recombinant DNA methods are well known in the art.
  • the vector of the present disclosure preferably contains one or more selectable markers, which allow simple selection of transformed, transfected, transduced, etc. cells.
  • a selectable marker is a gene, of which the product provides biocide or virus resistance, heavy metal resistance, supplemental auxotrophs, etc.
  • the bacterial selectable marker is the dal gene from Bacillus subtilis or Bacillus licheniformis, or a marker that confers antibiotic resistance such as ampicillin, kanamycin, chloramphenicol or tetracycline resistance.
  • the vector of the present disclosure can be integrated into the genome of the host cell or autonomously replicate in the cell, which is independent of the genome.
  • the elements required for the integration into the genome of the host cell or the autonomous replication are known in the art (see, for example, the aforementioned Sambrook et al., 1989) .
  • the present disclosure provides a recombinant cell comprising a polynucleotide encoding a fusion polypeptide, wherein the fusion polypeptide comprises a carrier protein and a polypeptide of interest, wherein the polypeptide of interest is fused to a terminus of the carrier protein or inserted into the carrier protein, wherein the carrier protein is a pilin of CLP, and wherein the recombinant cell is capable of expressing the polynucleotide and displaying a modified CLP comprising the fusion polypeptide.
  • the carrier protein in the fusion polypeptide is the native major pilin of the recombinant cell.
  • the recombinant cell is a recombinant gram-positive bacterium, such as a bacterium selected from Corynebacterium glutamicum, Bifidobacterium breve, Lactococcus lactis, Lacticaseibacillus paracasei, and Bacillus thuringiensis; preferably, Corynebacterium glutamicum.
  • the bacterium can include, but are not limited to, a bacterium selected from Corynebacterium glutamicum strain BE (GenBank assembly accession: GCA_013046805.1) , Corynebacterium glutamicum ATCC 14067 (GenBank assembly accession: GCA_002243555.1) , Corynebacterium glutamicum strain YI (GenBank assembly accession: GCA_001643035.1) , Corynebacterium glutamicum strain ATCC 13869 (GenBank assembly accession: GCA_001687645.1) , Corynebacterium glutamicum AJ1511 (GenBank assembly accession: GCA_002355675.1) , Corynebacterium glutamicum strain XV (GenBank assembly accession: GCA_001936195.1) , Corynebacterium glutamicum strain CP (GenBank assembly accession: GCA_001447865.2) , Corynebacterium glutamicum R (GenBank assembly accession: GCA
  • cremoris NZ9000 GenBank assembly accession: GCA_000143205.1
  • cremoris MG1363 GenBank assembly accession: GCA_000009425.1
  • cremoris A76 (GenBank assembly accession: GCA_000236475.1) , Lactococcus lactis strain SRCM103457 (GenBank assembly accession: GCA_004194355.1) , Lactococcus lactis strain CBA3619 (GenBank assembly accession: GCA_007954765.1) , Lactococcus lactis strain WiKim0098 (GenBank assembly accession: GCA_016406265.1) , Lactococcus lactis strain K_LL005 (GenBank assembly accession: GCA_014334715.1) , Lactococcus lactis subsp.
  • lactis strain G121 (GenBank assembly accession: GCA_013395015.1) , Lactococcus lactis strain N8 (GenBank assembly accession: GCA_014884605.1) , Lactococcus lactis subsp. lactis IO-1 (GenBank assembly accession: GCA_000344575.1) , Lactococcus lactis subsp. lactis strain F44 (GenBank assembly accession: GCA_002804185.1) , Lactococcus lactis subsp. lactis bv.
  • Lactococcus lactis strain S50 (GenBank assembly accession: GCA_003627395.2) , Lactococcus lactis strain FDAARGOS_1064 (GenBank assembly accession: GCA_016127135.1) , Lactococcus lactis strain FDAARGOS_887 (GenBank assembly accession: GCA_016027975.1) , Lactococcus lactis subsp.
  • lactis strain UC77 (GenBank assembly accession: GCA_002078615.2) , Lactococcus lactis strain FDAARGOS_866 (GenBank assembly accession: GCA_016028815.1) , Lactococcus lactis strain IL1403 (GenBank assembly accession: GCA_003722275.1) , Lactococcus lactis strain FDAARGOS_865 (GenBank assembly accession: GCA_016028835.1) , Lactococcus lactis subsp.
  • cremoris IBB477 (GenBank assembly accession: GCA_001856165.1) , Lacticaseibacillus paracasei strain TD 062 (GenBank assembly accession: GCA_009834405.1) , Lacticaseibacillus paracasei strain HM1 (GenBank assembly accession: GCA_018064185.1) , Bacillus thuringiensis strain FDAARGOS_794 (GenBank assembly accession: GCA_013267795.1) , Bacillus thuringiensis strain XL6 (GenBank assembly accession: GCA_000774075.2) , Bacillus thuringiensis strain Bt-GS57 (GenBank assembly accession: GCA_017751245.1) , Bacillus thuringiensis strain HER1410 (GenBank assembly accession: GCA_013340745.1) , Bacillus thuringiensis serovar tolworthi (GenBank assembly accession: GCA_001548175.1) , Bac
  • tolerans strain MGB0734 (GenBank assembly accession: GCA_015476135.1) , Lacticaseibacillus paracasei subsp. tolerans strain MGB0747 (GenBank assembly accession: GCA_015476175.1) , Lacticaseibacillus paracasei strain CBA3611 (GenBank assembly accession: GCA_007292115.1) , Lacticaseibacillus paracasei subsp. paracasei strain GR0548 (GenBank assembly accession: GCA_019175405.1) , Lacticaseibacillus paracasei subsp.
  • tolerans strain MGB0625 (GenBank assembly accession: GCA_015476155.1) , Lacticaseibacillus paracasei strain 10266 (GenBank assembly accession: GCA_008329845.1) , Lacticaseibacillus paracasei subsp. tolerans strain S-NB (GenBank assembly accession: GCA_016757695.1) , Lacticaseibacillus paracasei strain Lp02 (GenBank assembly accession: GCA_013307125.1) , Lacticaseibacillus paracasei strain ZFM54 (GenBank assembly accession: GCA_003627255.1) , Lacticaseibacillus paracasei subsp.
  • paracasei strain BD5115 GenBank assembly accession: GCA_018596415.1
  • Paracasei JCM 8130 GenBank assembly accession: GCA_000829035.1
  • preferably, Corynebacterium glutamicum ATCC 14067.
  • the carrier protein is a major pilin. In some embodiments, the carrier protein is the native major pilin of the bacterium.
  • the fusion or insertion of the polypeptide of interest does not influence the formation of the intermolecular isopeptide bond, disulfide bond, or intramolecular isopeptide bond in the carrier protein.
  • the polypeptide of interest is fused to a terminus of the carrier protein. In some embodiments, the polypeptide of interest is fused to the N terminus of the carrier protein.
  • the polypeptide of interest is inserted into the carrier protein. In some embodiments, the polypeptide of interest is inserted into a loop in the carrier protein.
  • the carrier protein is a major pilin from Corynebacterium glutamicum (Spa2 protein). It is observed that the Spa2 protein (SEQ ID NO: 1) comprises three tandem Ig-like domains, namely the N-domain (residues 36-197), the M-domain (residues 198-343), and the C-domain (residues 344-469), which is consistent with other major pilins. It is also observed that deletion of the M-domain does not influence the formation of CLP.
  • the polypeptide of interest is inserted into the M domain of the major pilin. In some embodiments, the polypeptide of interest replaces the M domain of the major pilin or a part thereof.
  • the carrier protein comprises an amino acid sequence of SEQ ID NO: 1, 2, 3, or 4, or an amino acid sequence at least 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99% or 99.5% identical to SEQ ID NO: 1, 2, 3, or 4.
  • the carrier protein comprises an amino acid sequence of SEQ ID NO: 1, 2, 3, or 4, or an amino acid sequence at least 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99% or 99.5% identical to SEQ ID NO: 1, 2, 3, or 4, with the residues corresponding to residues C97, C128, K194, C380, C432, and LPLTG (474-478), and optionally E158, D246, and/or E435 of SEQ ID NO: 1 unchanged.
  • the carrier protein can be the mature form of SEQ ID NO: 1, 2, 3, or 4, i.e., with the deletion of the signal peptide.
  • the carrier protein comprises amino acids 35 to 509 of SEQ ID NO: 1, amino acids 34 to 520 of SEQ ID NO: 2, amino acids 34 to 530 of SEQ ID NO: 3, or amino acids 34 to 519 of SEQ ID NO: 4, or an amino acid sequence at least 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99% or 99.5% identical to amino acids 35 to 509 of SEQ ID NO: 1, amino acids 34 to 520 of SEQ ID NO: 2, amino acids 34 to 530 of SEQ ID NO: 3, or amino acids 34 to 519 of SEQ ID NO: 4.
  • the carrier protein comprises amino acids 35 to 509 of SEQ ID NO: 1, amino acids 34 to 520 of SEQ ID NO: 2, amino acids 34 to 530 of SEQ ID NO: 3, or amino acids 34 to 519 of SEQ ID NO: 4, or an amino acid sequence at least 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99% or 99.5% identical to amino acids 35 to 509 of SEQ ID NO: 1, amino acids 34 to 520 of SEQ ID NO: 2, amino acids 34 to 530 of SEQ ID NO: 3, or amino acids 34 to 519 of SEQ ID NO: 4, with the residues corresponding to residues C97, C128, E158, K194, D246, C380, C432, E435, and LPLTG (474-478), and optionally E158, D246, and/or E435 of SEQ ID NO: 1 unchanged.
  • the polypeptide of interest is fused to the N terminus of the carrier protein, or is inserted between positions corresponding to G215 and L216 of SEQ ID NO: 1, between positions corresponding to G236 and E237 of SEQ ID NO: 1, or between positions corresponding to G336 and T337 of SEQ ID NO: 1.
  • the carrier protein comprises amino acids 35 to 509 of SEQ ID NO: 1.
  • the polypeptide of interest is fused to the N terminus of the carrier protein, or is inserted between G215 and L216, between G236 and E237, or between G336 and T337 of SEQ ID NO: 1.
  • the polypeptide of interest is directly linked to the N terminus of the carrier polypeptide. In some embodiments, the polypeptide of interest is linked to the N terminus of the carrier polypeptide via a peptide linker such as a flexible linker.
  • a peptide linker is generally a short peptide of about 4-20 or more amino acids; a combination of Ser and Gly residues is a conventional flexible linker.
  • the peptide linker used in the present disclosure is (G4S)2, i.e., SEQ ID NO: 22.
  • the peptide linker is a C10 linker of SEQ ID NO: 23.
  • the polypeptide of interest can be selected according to the desired application of the fusion polypeptide.
  • the fusion polypeptide is provided to degrade carbohydrates such as cellulose, and the polypeptide of interest can be the endo-1,4-β-glucanase from Trichoderma reesei (TrEgl, SEQ ID NO: 19) and/or β-glucosidase from Saccharophagus degradans (SdBgl, SEQ ID NO: 21).
  • the recombinant cell comprises two or more polynucleotides respectively encoding two or more fusion polypeptides each comprising a different polypeptide of interest, and the modified CLP comprises the two or more polypeptides.
  • the recombinant cell is provided to bind, capture or enrich a target molecule
  • the polypeptide of interest is a polypeptide that can recognize a target peptide, including but not limited to a ligand, a receptor, an antigen, and an antibody such as an scFv or a nanobody.
  • the recombinant cell is provided to capture a protein comprising a SpyTag (SEQ ID NO: 37)
  • the polypeptide of interest comprises SpyCatcher (SEQ ID NO: 15), or vice versa.
  • the recombinant cell is provided as an adhesive agent, and the polypeptide of interest is an adhesive peptide, e.g., Mfp35 (SEQ ID NO: 38) .
  • the recombinant cell is provided to catalyze chemical or biochemical reactions, and the polypeptide of interest is an enzyme.
  • the recombinant cell is provided to degrade carbohydrates such as cellulose, and the polypeptide of interest can be the endo-1,4-β-glucanase, e.g., from Trichoderma reesei (TrEgl, SEQ ID NO: 19) and/or β-glucosidase, e.g., from Saccharophagus degradans (SdBgl, SEQ ID NO: 21).
  • the recombinant cell is provided to degrade refractory organics, such as plastics, and the polypeptide of interest is an enzyme responsible for the degradation, such as a PETase.
  • the present disclosure provides a method of preparing the recombinant cell of the present disclosure, comprising introducing a polynucleotide encoding the fusion polypeptide of the present disclosure into a host cell.
  • the carrier protein in the fusion polypeptide is the native major pilin of the host cell.
  • the host cell is a gram-positive bacterium. In some embodiments, the host cell is a bacterium selected from Corynebacterium glutamicum, Bifidobacterium breve, Lactococcus lactis, Lacticaseibacillus paracasei, and Bacillus thuringiensis; preferably, Corynebacterium glutamicum.
  • the bacterium can include, but are not limited to, a bacterium selected from Corynebacterium glutamicum strain BE (GenBank assembly accession: GCA_013046805.1) , Corynebacterium glutamicum ATCC 14067 (GenBank assembly accession: GCA_002243555.1) , Corynebacterium glutamicum strain YI (GenBank assembly accession: GCA_001643035.1) , Corynebacterium glutamicum strain ATCC 13869 (GenBank assembly accession: GCA_001687645.1) , Corynebacterium glutamicum AJ1511 (GenBank assembly accession: GCA_002355675.1) , Corynebacterium glutamicum strain XV (GenBank assembly accession: GCA_001936195.1) , Corynebacterium glutamicum strain CP (GenBank assembly accession: GCA_001447865.2) , Corynebacterium glutamicum R (GenBank assembly accession: GCA
  • cremoris NZ9000 GenBank assembly accession: GCA_000143205.1
  • cremoris MG1363 GenBank assembly accession: GCA_000009425.1
  • cremoris A76 (GenBank assembly accession: GCA_000236475.1) , Lactococcus lactis strain SRCM103457 (GenBank assembly accession: GCA_004194355.1) , Lactococcus lactis strain CBA3619 (GenBank assembly accession: GCA_007954765.1) , Lactococcus lactis strain WiKim0098 (GenBank assembly accession: GCA_016406265.1) , Lactococcus lactis strain K_LL005 (GenBank assembly accession: GCA_014334715.1) , Lactococcus lactis subsp.
  • lactis strain G121 (GenBank assembly accession: GCA_013395015.1) , Lactococcus lactis strain N8 (GenBank assembly accession: GCA_014884605.1) , Lactococcus lactis subsp. lactis IO-1 (GenBank assembly accession: GCA_000344575.1) , Lactococcus lactis subsp. lactis strain F44 (GenBank assembly accession: GCA_002804185.1) , Lactococcus lactis subsp. lactis bv.
  • Lactococcus lactis strain S50 (GenBank assembly accession: GCA_003627395.2) , Lactococcus lactis strain FDAARGOS_1064 (GenBank assembly accession: GCA_016127135.1) , Lactococcus lactis strain FDAARGOS_887 (GenBank assembly accession: GCA_016027975.1) , Lactococcus lactis subsp.
  • lactis strain UC77 (GenBank assembly accession: GCA_002078615.2) , Lactococcus lactis strain FDAARGOS_866 (GenBank assembly accession: GCA_016028815.1) , Lactococcus lactis strain IL1403 (GenBank assembly accession: GCA_003722275.1) , Lactococcus lactis strain FDAARGOS_865 (GenBank assembly accession: GCA_016028835.1) , Lactococcus lactis subsp.
  • cremoris IBB477 (GenBank assembly accession: GCA_001856165.1) , Lacticaseibacillus paracasei strain TD 062 (GenBank assembly accession: GCA_009834405.1) , Lacticaseibacillus paracasei strain HM1 (GenBank assembly accession: GCA_018064185.1) , Bacillus thuringiensis strain FDAARGOS_794 (GenBank assembly accession: GCA_013267795.1) , Bacillus thuringiensis strain XL6 (GenBank assembly accession: GCA_000774075.2) , Bacillus thuringiensis strain Bt-GS57 (GenBank assembly accession: GCA_017751245.1) , Bacillus thuringiensis strain HER1410 (GenBank assembly accession: GCA_013340745.1) , Bacillus thuringiensis serovar tolworthi (GenBank assembly accession: GCA_001548175.1) , Bac
  • tolerans strain MGB0734 (GenBank assembly accession: GCA_015476135.1) , Lacticaseibacillus paracasei subsp. tolerans strain MGB0747 (GenBank assembly accession: GCA_015476175.1) , Lacticaseibacillus paracasei strain CBA3611 (GenBank assembly accession: GCA_007292115.1) , Lacticaseibacillus paracasei subsp. paracasei strain GR0548 (GenBank assembly accession: GCA_019175405.1) , Lacticaseibacillus paracasei subsp.
  • tolerans strain MGB0625 (GenBank assembly accession: GCA_015476155.1) , Lacticaseibacillus paracasei strain 10266 (GenBank assembly accession: GCA_008329845.1) , Lacticaseibacillus paracasei subsp. tolerans strain S-NB (GenBank assembly accession: GCA_016757695.1) , Lacticaseibacillus paracasei strain Lp02 (GenBank assembly accession: GCA_013307125.1) , Lacticaseibacillus paracasei strain ZFM54 (GenBank assembly accession: GCA_003627255.1) , Lacticaseibacillus paracasei subsp.
  • paracasei strain BD5115 GenBank assembly accession: GCA_018596415.1
  • Paracasei JCM 8130 GenBank assembly accession: GCA_000829035.1
  • preferably, Corynebacterium glutamicum ATCC 14067.
  • the host cell is modified to inactivate the native major pilin.
  • the method comprises a step of knocking out the native major pilin.
  • the endogenous polynucleotide encoding the major pilin can also be replaced by the polynucleotide encoding the fusion polypeptide via homologous recombination.
  • the present disclosure provides a modified covalently-linked pili (CLP) comprising a plurality of the fusion polypeptides of the present disclosure.
  • the modified CLP is cell-free.
  • the present disclosure further provides a method of preparing a modified CLP comprising the steps of a) providing the fusion polypeptide of the present disclosure; and b) providing an activity of sortase.
  • the modified CLP is cell-free.
  • the fusion polypeptide is provided by transcribing and/or translating the polynucleotide of the present disclosure.
  • the activity of sortase is provided by transcribing and/or translating one or more polynucleotides encoding a sortase.
  • the sortase is encoded by a gene which is identified to be present in the same cluster with the gene encoding the carrier protein in nature.
  • the method comprises contacting the fusion polypeptide of the present disclosure with the sortase protein.
  • the sortase is a class C sortase, such as srtC1 and/or srtC2, preferably wherein the srtC1 and srtC2 are encoded by genes from the same cluster.
  • the method is an in vitro method.
  • the present disclosure provides a polynucleotide construct or a combination of polynucleotide constructs comprising the polynucleotide of the present disclosure, and one or more polynucleotides encoding a sortase.
  • the sortase is encoded by a gene which is identified to be present in the same cluster with the gene encoding the carrier protein in nature.
  • the sortase is a class C sortase, such as srtC1 and/or srtC2, preferably wherein the srtC1 and srtC2 are encoded by genes from the same cluster.
  • the modified CLP and recombinant cell enable enzyme cascade reactions and improve the catalytic efficiency of a multi-enzyme system.
  • the immobilization of enzymes onto CLP and recombinant cells can achieve whole-cell catalysis.
  • the original DNA sequence was fully synthesized (Genewiz, Nanjing, China) or PCR-generated. All PCR products were generated by KOD DNA polymerase (TOYOBO, Japan) . All plasmid construction was performed using the T4 DNA ligase (New England BioLabs, Boston, MA) for ligations or the NEB Builder HiFi DNA Assembly Master Mix (New England BioLabs, Boston, MA) for assembly. All plasmids or markerless strains were confirmed by DNA sequencing (GENEWIZ, Guangzhou, China) . Primers used in the Examples are listed in Table 1.
  • C. glutamicum ATCC14067 was provided by Dr. Zheng’s research group at the South China University of Technology.
  • C. glutamicum ATCC14067 was grown in BHI liquid medium for recovery (37 g L⁻¹ brain heart infusion (Becton, Dickinson and Company)) at 30 °C, 250 rpm, overnight.
  • C. glutamicum ATCC14067 was inoculated into M63 liquid medium (15.6 g L⁻¹ M63 Broth (Sangon Biotech, Guangzhou, China), supplemented with 1 mM MgSO4 and 0.2% (wt/vol) glucose) and cultivated in an incubator at 30 °C without shaking for 2-3 days.
  • Antibiotics for C. glutamicum culture were kanamycin (25 μg mL⁻¹) and chloramphenicol (7.5 μg mL⁻¹).
  • Isopropyl-β-D-thiogalactoside (IPTG) at 1 mM or 0.5 mM, or theophylline at 1 mM, was used to induce gene expression.
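The working concentrations stated for C. glutamicum cultures can be converted into pipetting volumes with the dilution relation C1·V1 = C2·V2. The sketch below is illustrative only: the stock concentrations (50 mg mL⁻¹ kanamycin, 30 mg mL⁻¹ chloramphenicol, 1 M IPTG) and the 50 mL culture volume are assumptions chosen for the example and are not taken from the source.

```python
def stock_volume_ul(final_conc: float, stock_conc: float,
                    culture_volume_ml: float) -> float:
    """Volume of stock (in µL) needed to reach final_conc in the culture.

    final_conc and stock_conc must share the same unit (e.g. µg/mL or mM).
    The small volume added is neglected relative to the culture volume.
    """
    if stock_conc <= 0 or final_conc < 0 or culture_volume_ml < 0:
        raise ValueError("concentrations and volume must be non-negative, stock > 0")
    return final_conc * culture_volume_ml * 1000.0 / stock_conc


if __name__ == "__main__":
    culture_ml = 50.0  # assumed culture volume
    # (final concentration from the text, assumed stock concentration), same units
    additions = {
        "kanamycin (µg/mL)": (25.0, 50_000.0),
        "chloramphenicol (µg/mL)": (7.5, 30_000.0),
        "IPTG (mM)": (1.0, 1_000.0),
    }
    for name, (final, stock) in additions.items():
        print(f"{name}: {stock_volume_ul(final, stock, culture_ml):.1f} µL")
```

Substituting the laboratory's actual stock concentrations and culture volume gives the corresponding volumes for any of the stated working concentrations.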
  • Trans1-T1 TransGen Biotech, Shenzhen, China
  • E. coli BL21 (DE3) (New England BioLabs, Boston, MA) was used for protein expression.
  • E. coli was cultured in Luria-Bertani medium (10 g L⁻¹ peptone, 5 g L⁻¹ yeast extract, 10 g L⁻¹ NaCl) at 37 °C, or at 16 °C when applicable for protein expression.
  • Antibiotics for E. coli culture were kanamycin (50 μg mL⁻¹) and chloramphenicol (30 μg mL⁻¹).
  • the markerless deletion strains of C. glutamicum ATCC 14067 were constructed using the RecET-Cre/loxP system. Detailed methods for markerless deletion are described in Huang, Y. et al. (Recombineering using RecET in Corynebacterium glutamicum ATCC14067 via a self-excisable cassette. Sci. Rep. 7, 1-8, 2017).
  • dsDNA fragments, including the Cre-Kan cassette and the left and right homologous fragments, were used for subsequent fusion PCR to generate a ~4,385 bp linear self-excisable dsDNA cassette with the primer pair clpL-S/clpR-A.
  • primer pairs spa1L-S/A, spa1R-S/A, ck-S/A and spa1L-S/spa1R-A were used to amplify the left and right homologous fragments, Cre-Kan cassette, and the linear self-excisable dsDNA cassettes, respectively.
  • primer pairs spa2L-S/A, spa2R-S/A, ck-S/A and spa2L-S/spa2R-A were used to amplify the left and right homologous fragments, Cre-Kan cassette, and the linear self-excisable dsDNA cassettes, respectively.
  • primer pairs spa3L-S/A, spa3R-S/A, ck-S/A and spa3L-S/spa3R-A were used to amplify the left and right homologous fragments, Cre-Kan cassette, and the linear self-excisable dsDNA cassettes, respectively.
  • primer pairs srtC1L-S/A, srtC2R-S/A, ck-S/A and srtC1L-S/srtC2R-A were used to amplify the left and right homologous fragments, Cre-Kan cassette, and the linear self-excisable dsDNA cassettes, respectively.
  • primer pairs srtAL-S/A, srtAR-S/A, ck-S/A and srtAL-S/srtAR-A were used to amplify the left and right homologous fragments, Cre-Kan cassette, and the linear self-excisable dsDNA cassettes, respectively.
  • primer pairs decL-S/A, decR-S/A, ck-S/A and decL-S/decR-A were used to amplify the left and right homologous fragments, Cre-Kan cassette, and the linear self-excisable dsDNA cassette, respectively.
  • the self-excisable dsDNA cassettes for markerless deletion of different genes were transformed by electroporation into competent C. glutamicum ATCC 14067 cells expressing the exonuclease-recombinase pair RecE/T, yielding multiple Kan-resistant colonies on BHI agar plates.
  • the cell-plasmid DNA/dsDNA mixture was transferred to an ice-cold electroporation cuvette (0.1 cm electrode gap).
  • Electroporation was performed with a Bio-Rad MicroPulser using three pulses of 1.8 kV/cm (Ec1) (see Huang et al., Recombineering using RecET in Corynebacterium glutamicum ATCC14067 via a self-excisable cassette, Sci Rep 7, 7916 (2017)).
  • Expression of the Cre enzyme was induced by adding 1 mM theophylline, and the selectable marker was excised by Cre/lox site-specific recombination. Finally, PCR fragments amplified from the genomic DNA of the mutants were sequenced for further identification.
  • the resultant mutant strains used in this study were referred to as C. glutamicum ATCC 14067 Δclp (Δclp), C. glutamicum ATCC 14067 Δspa1 (Δspa1), C. glutamicum ATCC 14067 Δspa2 (Δspa2), C. glutamicum ATCC 14067 Δspa3 (Δspa3), and C. glutamicum ATCC 14067 ΔsrtC1ΔsrtC2 (ΔsrtC1ΔsrtC2).
  • the C. glutamicum ATCC 14067 Δspa1Δspa3 (Δspa1Δspa3) mutant was constructed by transforming the Δspa3-cassette into the Δspa1 strain.
  • the C. glutamicum ATCC 14067 Δspa2ΔsrtA (Δspa2ΔsrtA) and C. glutamicum ATCC 14067 Δspa2Δdec (Δspa2Δdec) mutants were constructed by transforming the ΔsrtA-cassette and the Δdec-cassette into the Δspa2 strain, respectively, as described above.
  • the pEC-XK99E plasmid was used as an original plasmid.
  • DNA fragments of the pEC-XK99E backbone (GENEWIZ, China), the coding sequence of Spa2 or various recombinant Spa2 (SEQ ID NOs: 1, 5, 8-14, and 24, respectively), and the native promoter (SEQ ID NO: 25) of the spa2 gene were amplified via PCR, and then all the DNA fragments were assembled by NEB Builder HiFi DNA Assembly Master Mix to construct the plasmids pEK-spa2, pEK-spa2cut, pEK-E1/mCherry-spa2, pEK-E2/mCherry-spa2, pEK-E3/mCherry-spa2, pEK-E4/mCherry-spa2, pEK-6his-spa2, pEK-SpyTagSpa2, pEK-Mfp3Spep-Spa2,
  • the two basic plasmids 203 and 204 were constructed based on pEC-XK99E backbone with additional restriction sites of SmaI, XbaI, NcoI, BamHI, SpeI and SalI by Gibson assembly with NEB Builder HiFi DNA Assembly Master Mix.
  • SmaI, XbaI, and NcoI were used to fuse proteins with Spa2 pilin, and SpeI and SalI (Takara) were used to insert another independent expression cassette for fusion protein.
  • CDSs coding sequences of SpyCatcher, Venus, CcEgl, N-Ven, and TrEgl
  • the CDSs of N-Ven and TrEgl were inserted into the linearized backbone of 203 (digestion with SmaI and SpeI, Takara) via Gibson assembly.
  • CDSs of C-Ven and SdBgl were cloned into the SmaI and XbaI sites in 204 by ligation.
  • CDSs of C-Ven and SdBgl were inserted into the linearized backbone of 204 (digestion with SmaI and SalI, Takara) via Gibson assembly.
  • the C-Ven-Spa2 cassette was obtained by digesting pEK-C-Ven-Spa2 with SpeI and SalI, and then, cloned into the plasmid of pEK-N-Ven-Spa2 (digested with SpeI and SalI, Takara) to construct tandem expression plasmids of pEK-N-Ven-Spa2_C-Ven-Spa2 (see Fig. 3) .
  • The coding sequence of Spa2 (SEQ ID NO: 6) was amplified from the genome of C. glutamicum ATCC 14067, and then assembled into the pET-28a (+) backbone (Novagen, Madison, WI) by Gibson assembly (see Fig. 5).
  • C. glutamicum cells cultured for 2-3 days in M63 medium were collected and washed twice in PBS buffer, and 20 μL of liquid culture in M63 (OD600 ~1) was deposited onto carbon-coated TEM grids for 5-10 min.
  • the samples were washed twice with 50 μL PBS buffer and three times with 20 μL water, and then the excess solution was quickly wicked away with filter paper.
  • the cells were deposited onto the copper wire mesh, negatively stained with 15 μL of 2% (w/v) uranyl acetate solution for 1 min and dried for 10 min under an infrared lamp. Samples were examined in a JEOL JEM-1400 transmission electron microscope at an accelerating voltage of 120 kV.
  • C. glutamicum strains were cultured for 48 h in M63 liquid medium, and the cultures were collected, washed and diluted to an OD600 of 0.1 in Tris-buffered saline with 0.1% ProClin™ 300 (Sigma, 48912-U) on ice.
  • the recombinant Spa2 was expressed as an N-terminally His-tagged protein.
  • E. coli BL21 (DE3) transformed with plasmid pET-28a-Spa2 (CaCl2 method) was grown overnight at 37 °C to provide a starter culture for expression.
  • a total of 1 L medium with 50 μg mL⁻¹ kanamycin was inoculated with 1% (v/v) of the starter culture and grown at 37 °C.
  • the cultivation temperature was lowered to 16 °C and IPTG was added to a final concentration of 0.5 mM to induce protein overexpression.
  • cells were collected by centrifugation, and the cell pellets were suspended in buffer A (50 mM Tris-HCl, 150 mM NaCl, pH 8.0) and lysed by high-pressure homogenization. The cell lysates were centrifuged at 12,000 rpm for 30 min at 4 °C.
  • the resulting supernatant was loaded onto a Nickel-affinity column (5 mL, GE) pre-equilibrated with buffer A (50 mM Tris-HCl, 150 mM NaCl, pH 8.0) .
  • His-tagged Spa2 protein was eluted with buffer A with 50 mM imidazole.
  • the His-tagged Spa2 protein was buffer-exchanged into buffer A and subjected to tag removal by HRV3C (SEQ ID NO: 34, 1 mg/50 mg Spa2) at 4 °C overnight.
  • the digested product was loaded onto the 5-mL Ni-NTA column (GE) and eluted with a buffer A/buffer B (buffer A + 500 mM imidazole) step gradient (5% buffer B, 10% buffer B, 20% buffer B and 100% buffer B).
  • the flow-through at 10% buffer B was collected.
  • the final purified protein was concentrated to 20 mg mL⁻¹ in 10 mM Tris-HCl pH 8.0 and 50 mM NaCl for crystallization.
  • the sitting drop vapor diffusion technique (http://soft-matter.seas.harvard.edu/index.php/Vapor_Diffusion_Method) was used to crystallize the Spa2 protein. Crystals were obtained by mixing 4 μL of Spa2 protein with 4 μL reservoir solution (0.2 M sodium sulfate, 0.1 M Bis-Tris propane pH 7.5, 20% (w/v) PEG 3350) and incubating the mixture at 18 °C for 1-2 weeks.
  • the crystals were soaked in a cryo-protectant solution consisting of the reservoir solution and 20% (v/v) glycerol and then quickly frozen with liquid nitrogen. Diffraction data were collected on the BL18U1 beamline at the Shanghai Synchrotron Radiation Facility (Shanghai, China) with flash-frozen crystals (at 100 K in a stream of nitrogen gas). The data were processed by XDS (ref. 9) and then further processed using STARANISO (ref. 10; a server of Global Phasing Company).
  • the structure was solved by the molecular replacement method using PHASER (ref. 11), with the Spa2 coordinates predicted by AlphaFold Colab (ref. 12) as the template. Further manual model building was carried out using COOT (ref. 13). The model was refined with PHENIX (ref. 14). Data collection, phasing and refinement statistics are given in Table 3. Structure figures were prepared using PyMOL 2.3.4 (https://pymol.org/2/).
  • C. glutamicum colonies were inoculated into 10 mL BHI and cultured for 12 h. Then cells were transferred into M63 medium with an initial OD600 of 0.1 and cultured for 3 days at 30 °C without shaking. Cells were collected by centrifugation at 5,000 rpm, washed three times with PBS and diluted with PBS (OD600 ~0.5). Exactly 200 μL of each sample was transferred to a flat-bottom 96-well black plate and analyzed on a Tecan Infinite Pro 200 Plate Reader, with excitation/emission wavelengths of 580/610 nm for mCherry fluorescence intensity, and 510/545 nm for Venus fluorescence intensity. The normalized fluorescence intensity is the fluorescence intensity divided by the OD600 absorbance.
  • Fluorescence (confocal) microscopy imaging: cells prepared for plate-reader measurements were dropped onto a glass slide and imaged under a Nikon TI2-E inverted microscope. Microscope light source power, detector gain, and image processing settings were consistent among different samples.
  • Strains expressing SpyTag-Spa2, SpyCatcher-Spa2 and Spa2 (strain Δspa2 transformed with pEK-SpyTagSpa2, pEK-SpyCatcherSpa2, and pEK-spa2, respectively) were cultured in glass-bottom dishes in M63 for 3 days. The dishes were then gently washed three times with PBS containing 0.5% Tween 80 (PBST) and blocked in PBST with 1% BSA for 1 h.
  • the group of SpyTag-Spa2 and Spa2 were incubated with purified GFP-SpyCatcher (SEQ ID NO: 35) , and the group of SpyCatcher-Spa2 and Spa2 were incubated with purified GFP-SpyTag (SEQ ID NO: 36) for 1 h at room temperature. All samples were washed three times with PBS buffer and imaged under a Nikon TI2-E inverted microscope.
  • the Spa2 strain or the Mfp3Spep-Spa2 strain was cultured in M63 medium (3 mL) supplemented with 200 μL of green-fluorescent PS microsphere solution in 35-mm Petri dishes containing 2-3 glass slides for 3 days at 30 °C without shaking. The settled glass slides were then taken out and gently flushed to wash away the microspheres that had not adhered. The binding capacity of different samples was compared with water jetting at a constant discharge pressure of 5 psi for 15 s, performed on a pressure-flow controller (PG-MFC-8CH, PreciGenome). Fluorescence images were recorded before and after the mechanical challenge with water jetting.
  • the pEK-spa2cut plasmid was transferred into Δspa2 by electroporation as described above to construct the strain Δspa2-pEK-spa2cut, which was used to express the monomer of Spa2cut (SEQ ID NO: 5).
  • Cells were inoculated into M63 medium with 25 μg mL⁻¹ kanamycin and cultured for 3 days.
  • Supernatants (200 mL) were collected, concentrated to 1 mL and then purified by nickel-affinity chromatography as previously described in the section “Expression and purification of recombinant Spa2”.
  • Spa2cut was eluted with 100 mM imidazole.
  • the final purified protein was buffer-exchanged into 10 mM Tris-HCl, 100 mM NaCl, pH 8.0.
  • a similar process was followed for expression and purification of Spa2cut mutant variants of E158Acut, D246Acut, E435Acut, and D246A/E435Acut.
  • the Δspa2ΔsrtA-pEK-6his-spa2 strain secretes the expressed 6His-Cg CLP into the culture medium because it lacks sortase A.
  • 6His-Cg CLP polymers: Δspa2ΔsrtA-pEK-6his-spa2 cells were inoculated into M63 medium with 25 μg mL⁻¹ kanamycin and cultured for 3 days.
  • 6His-Cg CLP purification: supernatants (500 mL) were collected, concentrated to 5 mL in a buffer of 10 mM Tris-HCl, 100 mM NaCl, pH 8.0, and purified by nickel-affinity chromatography.
  • the 6His-Cg CLP polymers were eluted with 100 mM imidazole. Purified 6His-Cg CLP fibers were then boiled in SDS sample buffer (6× Protein Loading Buffer, TransGen Biotech, DL101-02) and subjected to SDS-PAGE. The high-molecular-weight Cg CLP polymer bands were excised from Coomassie brilliant blue-stained SDS-PAGE gels and prepared for intermolecular isopeptide bond identification.
  • the Spa2cut solution was precipitated with acetone (1:4) and the pellets were dried using a SpeedVac (room temperature) for 1-2 min. The pellets were then dissolved in 100 mM Tris-HCl (pH 8.5) supplemented with 8 M urea. TCEP (5 mM; Thermo Scientific) for reduction and iodoacetamide (10 mM; Sigma) for alkylation were added, and the mixture was incubated at room temperature for 30 min. The protein mixture was diluted (1:4) and digested overnight with chymotrypsin at 1:40 (w/w). The protease-digested peptide solution was desalted using a MonoSpin™ C18 column (GL Science, Tokyo, Japan) and dried with a SpeedVac.
  • the Spa2cut sample was processed following the same protocol as previously described for signal peptide identification.
  • the Spa2cut sample was processed following a similar protocol except that pepsin (Promega) was purposely added for digestion, while addition of 5 mM TCEP (Thermo Scientific) was avoided to ensure that the disulfide bond, if any, was kept intact.
  • the Coomassie brilliant blue-stained SDS-PAGE gel band of Cg CLP fibers was excised into small pieces and washed in water, followed by 50 mM NH4HCO3 in 50% acetonitrile and 100% acetonitrile.
  • the sample was reduced with 10 mM TCEP (Thermo Scientific) in 100 mM NH4HCO3 at 55 °C for 1 h and alkylated with 55 mM iodoacetamide (Sigma) in 100 mM NH4HCO3 at 37 °C in the dark for 30 min.
  • the gel pieces were then washed with 100 mM NH4HCO3 and 100% acetonitrile, and dried.
  • the sample was first digested with 3 μg trypsin (Promega) in 50 mM NH4HCO3 at 37 °C overnight, then 1 μg of Asp-N endoproteinase (Promega) was added for another overnight incubation. Digested peptides were extracted twice with 50% acetonitrile containing 5% formic acid.
  • protease-digested peptides were analyzed by LC-MS/MS using an Easy-nLC 1200 nano HPLC (Thermo Scientific) coupled to a Q Exactive Orbitrap mass spectrometer (Thermo Scientific). Peptides were separated on a 30 cm-long pulled-tip analytical column (75 μm ID, packed with ReproSil-Pur C18-AQ 1.9 μm resin, Dr. Maisch GmbH) in 0.1% aqueous formic acid (buffer A) and 0.1% formic acid in 80% acetonitrile (buffer B) at 55 °C with a flow rate of 300 nL/min using a 120 min linear gradient.
  • CMC-Na: carboxymethylcellulose sodium salt
  • DNS: 3,5-dinitrosalicylic acid
  • TrEgl-Spa2_SdBgl-Spa2 C003 strain
  • TrEgl_SdBgl C004 strain
  • the lycopene-producing plasmid pZ9-dxs_crtEBI was transferred into strains TrEgl-Spa2_SdBgl-Spa2 and TrEgl_SdBgl to construct the recombinant strains C003 and C004, respectively, for the utilization of cellulose to produce lycopene.
  • C003 and C004 strains were inoculated into 10 mL BHI with 25 μg mL⁻¹ kanamycin and 7.5 μg mL⁻¹ chloramphenicol, and cultured for 12 h at 30 °C at 200 rpm.
  • modified M63 medium (15.6 g L⁻¹ M63 broth, supplemented with 1 mM MgSO4 and 2% (wt/vol) CMC-Na) with an initial OD600 of 3 for 2 days at 30 °C, with or without the addition of 1 mM IPTG.
  • lycopene production was carried out according to Li, C. et al. (Heterologous production of β-Carotene in Corynebacterium glutamicum using a multi-copy chromosomal integration method. Bioresour. Technol. 341, 125782, 2021).
  • IPTG-induced and uninduced cells (1 mL) were separately collected into 2 mL tubes of lysing matrix Y (M. P. Biomedicals) by centrifugation at 12,000 rpm for 5 min.
  • the pellets were resuspended in a 60% hexane and 40% acetone mixture and lysed using the FastPrep-24 5G bead beating grinder and lysis system (M. P. Biomedicals) for lycopene extraction.
  • lysis was performed in six cycles of 30 s each, with a 1 min interval between cycles.
  • the samples were centrifuged at 14,000 rpm for 10 min at 4 °C, and the resulting supernatant was then transferred to brown 2 mL screw cap glass vials (Agilent Technologies) and directly subjected to HPLC analysis.
  • the quantification of lycopene was performed on an Agilent 1260 series HPLC system (Agilent Technologies) using a YMC Carotenoid column (250 × 4.6 mm I.D., YMC), with detection by a diode array detector (DAD) at 450 nm.
  • binary gradient elution was applied to change the eluent from 100% eluent A, methanol/methyl tert-butyl ether/water (81/15/4), to 100% eluent B, methanol/methyl tert-butyl ether/water (7/90/3), over 90 min at a flow rate of 1.0 mL·min⁻¹ at 20 °C with an injection volume of 10 μL (eluent A for 2 min, eluent B from 2 min to 95 min, and eluent A from 95 min to 100 min).
  • This Example was carried out to investigate the CLP assembly in the industrial workhorse C. glutamicum ATCC 14067 (referred to as Cg CLP) .
  • the industrial workhorse C. glutamicum is a ‘generally recognized as safe’ (GRAS) strain with well-established gene editing tools that is widely used for the industrial-scale production of valued products such as amino acids, diamines, terpenoids, and other chemicals (Zhao, N. et al. Development of a Transcription Factor-Based Diamine Biosensor in Corynebacterium glutamicum. ACS Synth. Biol. 10, 3074-3083, 2021; and Xu, X. et al., Ledesma-Amaro, R. & Liu, L. Microbial chassis development for natural product biosynthesis. Trends Biotechnol. 38, 779-796, 2020).
  • CLP BGC contains three pilin-encoding genes, spa1, spa2, and spa3, as well as two sortase-coding genes, srtC1 and srtC2 (Fig. 6), which is similar to the SpaH-type (a relatively less well-studied pilus type) CLP gene cluster in the pathogenic C. diphtheriae (Mandlik, A. et al., Pili in Gram-positive bacteria: assembly, involvement in colonization and biofilm development. Trends Microbiol. 16, 33-40, 2008).
  • the composition of Cg CLP was determined with polyclonal antibodies against Spa1, Spa2, and Spa3, respectively.
  • TEM images of the Cg CLP with immunogold labelling showed that the Cg CLP fibers comprise two minor pilins of Spa1 and Spa3 and a major pilin of Spa2 (Fig. 8) .
  • TEM and AFM imaging used to assess the specific roles of the three pilins in the Cg CLP assembly showed that cells defective for Spa1 (Δspa1 strain), Spa3 (Δspa3 strain), or both (Δspa1Δspa3 strain) could still produce fibers (Fig. 7).
  • cells lacking Spa2 (Δspa2) could not produce any fibers, and overexpression of Spa2 (Spa2) promoted the formation of abundant long fibers throughout the cell surface (Fig. 7).
  • TEM and AFM images also showed that the loss of both SrtC1 and SrtC2 (ΔsrtC1ΔsrtC2) completely blocked fiber formation (Fig. 9).
  • the purified Cg CLP polymers were excised from Coomassie blue-stained SDS-PAGE gels (Fig. 10) and then digested in-gel with trypsin (Promega) and AspN endoproteinase (Promega) .
  • Liquid chromatography-tandem mass spectrometry was used to analyze the digestion products, and verify the presence of the intermolecular isopeptide bond (bond formation results in the elimination of a water molecule and thus a slight decrease of molecular weight) .
  • the peptide peak with m/z 832.9 (2+) (Fig. 11 and Table 2) suggested that the major pilin Spa2 was cross-linked between K194 in the N-terminus of Spa2(i) and T477 in the C-terminus of Spa2(i+1) (Lys194-Thr477).
  • This detected mass is consistent with the loss of three NH3 units and two H2 units, indicating the formation of three intramolecular isopeptide bonds (each with loss of one molecule of ammonia, ~17 Da) and two disulfide bonds (each with loss of two hydrogen atoms, ~2 Da) in Spa2.
  • a Values in parentheses correspond to the outermost shell of data.
  • d R free
  • Spa2 is arranged in three tandem Ig-like domains, including N-domain (residues 36-197, pink) , M-domain (residues 198-343, blue) , and C-domain (residues 344-469, green) , giving an elongated molecule in length (Fig. 15) .
  • These three tandem Ig-like domains of Spa2 are similar to the major pilin of SpaA (PDB ID: 3HR6, root-mean-square deviation (RMSD) over 270 alpha-carbon (Cα) atoms, Fig. 16b) and SpaD (PDB ID: 4HSS, RMSD over 311 Cα atoms, Fig. 16c) from human pathogen C.
  • glutamicum is similar to the feature of the major pilin SpaD from the pathogenic C. diphtheriae (Kang, H. J. et al., 2014 above) , but is quite different from the major pilin SpaA from the pathogenic C. diphtheriae lacking isopeptide bonds in the N-terminal domain (Kang, H.J. et al., 2009 above) .
  • two disulfide bonds were formed in the N-domain between Cys97 and Cys128 and the C-domain between Cys380 and Cys432, respectively (Fig. 17b) .
  • the presence of two disulfide bonds in Spa2 is unusual in comparison with other major pilins in human pathogens, such as Spy0128 (PDB ID: 3B2M) from Streptococcus pyogenes (ref. 37) and BcpA (PDB ID: 3KPT) from Bacillus cereus (ref. 38), which lack disulfide bonds, and SpaA and SpaD from C. diphtheriae, which contain only one disulfide bond in the C-terminal domain (Kang, H. J. et al., 2009 and 2014 above).
  • the CLP structure may serve as an attractive building block for various applications because these extracellular fibers have extraordinarily high tensile strength owing to their extensive inter- and intramolecular isopeptide bonds. Moreover, as an extracellular matrix, CLP fibers can be conveniently and reliably positioned directly outside cells. Finally, their proteinaceous nature makes them potentially amenable to elaboration using genetic engineering.
  • This Example was carried out to determine suitable fusion sites to append peptides/proteins to Spa2. According to both the Spa2 crystal structure and the characterization of specific functional domains within Spa2 observed in Example 2, four different positions were selected to test the fusion of a protein-of-interest (POI), with one site at the N-terminus of Spa2 and three sites in the M-domain, which lacks a disulfide bond (Fig. 22).
  • the CLP-defective strain C. glutamicum ATCC 14067 Δspa2 (Δspa2) with abrogated extracellular Cg CLP formation was transformed with the exogenous expression plasmid (pEK-E1/mCherry-spa2, pEK-E2/mCherry-spa2, pEK-E3/mCherry-spa2, or pEK-E4/mCherry-spa2) for Spa2 fusion protein expression to test the restored Cg CLP fiber production.
  • the fluorescent reporter protein mCherry was fused at the interrogated positions for generating functional fusion proteins (SEQ ID NOs: 8-11) while retaining the sortase-catalyzed covalently-linked pili formation capacity of Spa2.
  • four sites were tested for mCherry addition/insertion, including Q35 (E1) at the N-terminus of Spa2, G215 in loop 1 of the M-domain (E2), G236 in loop 2 of the M-domain (E3), and G336 in the β23-sheet of the M-domain (E4).
  • Quantitative analysis showed that the cells expressing each of the fusion proteins fluoresced and enabled the formation of fiber (Fig. 23a) .
  • Spa2 fusion proteins (six POIs, each fused at the E1 position via a linker of SEQ ID NO: 23) (see Fig. 25) were expressed by Δspa2 strains transformed with plasmids pEK-6his-spa2, pEK-SpyTagSpa2, pEK-Mfp3Spep-Spa2, pEK-SpyCatcher-Spa2, pEK-Venus-Spa2, and pEK-CcEgl-Spa2, respectively. All of these fusion proteins were successfully expressed, secreted, and formed Cg CLP (Fig. 26).
  • TEM images showed that Ni-NTA-decorated AuNPs were anchored onto 6His-Spa2 Cg CLP (Fig. 27a) .
  • Confocal microscopic images showed the green fluorescence emitted from SpyTag-Spa2 Cg CLP cells to which SpyCatcher-EGFP protein binding partners were covalently attached via SpyTag-SpyCatcher interaction pairs (Fig. 27b).
  • Confocal microscopic images showed the green fluorescence emitted from SpyCatcher-Spa2 Cg CLP cells to which SpyTag-EGFP protein binding partners were covalently attached via SpyTag-SpyCatcher interaction pairs (Fig. 27c).
  • the Δspa2 strain was transformed with plasmids pEK-N-Ven-Spa2, pEK-C-Ven-Spa2 and pEK-N-Ven-Spa2_C-Ven-Spa2, respectively; the Δspa2 strain transformed with pEK-N-Ven_C-Ven was used as a control.
  • This Example was carried out to verify the co-assembly of multiple cellulases into a catalytic cascade for extracellular degradation of cellulose into glucose to support production of specific chemicals of interest (e.g., lycopene) in C. glutamicum ATCC 14067 Δspa2 (Fig. 31).
  • endo-1,4-β-glucanase from Trichoderma reesei (TrEgl, SEQ ID NO: 19) and β-glucosidase from Saccharophagus degradans (SdBgl, SEQ ID NO: 21) were co-assembled in the Cg CLP fiber; these two enzymes are known to work in concert to degrade cellulose into glucose via enzyme cascade reactions.
  • Lycopene can be produced via the methylerythritol phosphate (MEP) pathway by engineered C. glutamicum (Li, C. et al. Heterologous production of β-Carotene in Corynebacterium glutamicum using a multi-copy chromosomal integration method. Bioresour. Technol. 341, 125782, 2021).
  • a C001 chassis Δspa2Δdec
  • spa2 Δspa2Δdec
  • CEY17_RS03380 for the abrogation of Cg CLP formation
  • CEY17_RS03560 Δdec, for accumulation of the precursor for lycopene production
  • the basal lycopene-producing strain C002 was constructed by transforming strain C001 with plasmid pZ9-dxs_crtEBI for IPTG-inducible expression of the dxs gene and crtEBI gene cluster. Then, the C002 strain was transformed with plasmids pEC-TrEgl-Spa2_SdBgl-Spa2, and pEC-TrEgl_SdBgl, respectively, resulting in the strains C003 and C004.
  • the C003 strain co-assembled TrEgl and SdBgl in Cg CLP fiber on the cell surface (Fig. 32a) and enabled the degradation of carboxymethylcellulose sodium (CMC-Na, the ether derivative of cellulose) in the medium, as shown by the medium turning from a viscous gel to a thin solution (Fig. 32b).
  • Strain C004, which secreted both TrEgl and SdBgl without anchoring them to the Cg CLP scaffold, did not show similar behavior.
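
The primer pairs listed above for the markerless deletion cassettes follow a regular naming pattern (for example spa1L-S/A, spa1R-S/A, ck-S/A and spa1L-S/spa1R-A). As a minimal sketch of that pattern, the Python snippet below generates the four pair labels for a target gene. The function name `deletion_primer_pairs` and the dictionary keys are hypothetical and not part of the disclosure; the snippet only reproduces labels, it does not design primer sequences.

```python
from typing import Dict, Optional


def deletion_primer_pairs(left_gene: str, right_gene: Optional[str] = None) -> Dict[str, str]:
    """Return the primer-pair labels for one self-excisable deletion cassette.

    left_gene and right_gene name the genes bordering the deleted region; they
    differ only when two adjacent genes are removed together (srtC1 and srtC2).
    """
    right_gene = right_gene or left_gene
    return {
        # pair amplifying the left homologous fragment
        "left_homology_fragment": f"{left_gene}L-S/A",
        # pair amplifying the right homologous fragment
        "right_homology_fragment": f"{right_gene}R-S/A",
        # pair amplifying the Cre-Kan cassette (same for every target)
        "cre_kan_cassette": "ck-S/A",
        # outer pair used in fusion PCR for the full linear cassette
        "linear_cassette": f"{left_gene}L-S/{right_gene}R-A",
    }


if __name__ == "__main__":
    targets = [("spa1",), ("spa2",), ("spa3",), ("srtC1", "srtC2"), ("srtA",), ("dec",)]
    for genes in targets:
        print("-".join(genes), deletion_primer_pairs(*genes))
```

Passing two gene names covers the srtC1/srtC2 double deletion, where the left fragment is named after srtC1 and the right fragment after srtC2.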
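
The buffer A/buffer B steps used after tag removal (5%, 10%, 20% and 100% buffer B, where buffer B is buffer A plus 500 mM imidazole) can be converted into imidazole concentrations with a short calculation. The sketch below assumes that buffer A contains no imidazole, which matches its stated composition; the function and constant names are illustrative only.

```python
BUFFER_B_IMIDAZOLE_MM = 500.0  # buffer B = buffer A + 500 mM imidazole
BUFFER_A_IMIDAZOLE_MM = 0.0    # assumption: buffer A contains no imidazole


def imidazole_mm(percent_b: float) -> float:
    """Imidazole concentration (mM) of a buffer A/buffer B mixture."""
    if not 0.0 <= percent_b <= 100.0:
        raise ValueError("percent_b must lie between 0 and 100")
    fraction_b = percent_b / 100.0
    return (1.0 - fraction_b) * BUFFER_A_IMIDAZOLE_MM + fraction_b * BUFFER_B_IMIDAZOLE_MM


if __name__ == "__main__":
    for step in (5, 10, 20, 100):
        print(f"{step:>3}% buffer B -> {imidazole_mm(step):.0f} mM imidazole")
```

Under this assumption the four steps correspond to 25, 50, 100 and 500 mM imidazole, so the 10% step has the same imidazole concentration as the 50 mM elution used for the His-tagged protein.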
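
The plate-reader normalization described above (fluorescence intensity divided by the OD600 absorbance) can be written as a one-line function. The following Python sketch is illustrative: the function name and the example readings are hypothetical and are not data from the disclosure.

```python
def normalized_fluorescence(fluorescence: float, od600: float) -> float:
    """Fluorescence intensity divided by OD600, as in the plate-reader assay."""
    if od600 <= 0:
        raise ValueError("OD600 must be positive")
    return fluorescence / od600


if __name__ == "__main__":
    # Hypothetical readings: (raw fluorescence in arbitrary units, OD600)
    readings = {"fusion_strain": (12000.0, 0.50), "control_strain": (300.0, 0.48)}
    for name, (fluor, od) in readings.items():
        print(name, round(normalized_fluorescence(fluor, od), 1))
```

Dividing by OD600 makes samples with slightly different cell densities comparable, which is why the cells are first diluted to a similar OD600 before measurement.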
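
Two mass calculations are implicit in the mass spectrometry results above: converting the doubly charged peptide peak at m/z 832.9 into a neutral mass, and adding up the mass lost when three intramolecular isopeptide bonds and two disulfide bonds form. The Python sketch below works through both. It uses the nominal losses stated in the text (17 Da per NH3, 2 Da per H2) and the standard proton mass; the function names are hypothetical, and the sketch does not reproduce the authors' analysis software.

```python
PROTON_MASS_DA = 1.007276  # mass of a proton
NH3_LOSS_DA = 17.0         # nominal loss per intramolecular isopeptide bond (from the text)
H2_LOSS_DA = 2.0           # nominal loss per disulfide bond (from the text)


def neutral_mass(mz: float, charge: int) -> float:
    """Neutral mass of a positive ion observed at m/z with the given charge."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return charge * (mz - PROTON_MASS_DA)


def expected_mass_shift(n_isopeptide: int, n_disulfide: int) -> float:
    """Mass change (Da) relative to the unmodified polypeptide; negative means loss."""
    return -(n_isopeptide * NH3_LOSS_DA + n_disulfide * H2_LOSS_DA)


if __name__ == "__main__":
    print(f"neutral mass of m/z 832.9 (2+): {neutral_mass(832.9, 2):.2f} Da")
    print(f"shift for 3 isopeptide + 2 disulfide bonds: {expected_mass_shift(3, 2):.0f} Da")
```

With nominal values the combined shift is 55 Da; more precise masses (about 17.03 Da for NH3 and 2.02 Da for H2) change the total only slightly.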

Abstract

Provided is a fusion polypeptide comprising a carrier protein and a polypeptide of interest, wherein the polypeptide of interest is fused to a terminus of the carrier protein or inserted into the carrier protein, and wherein the carrier protein is a pilin of covalently-linked pili (CLP) from a microorganism. Also provided is a recombinant cell comprising a modified CLP comprising the fusion polypeptide, as well as the modified CLP.

Description

Modified Covalently-linked Pili and Recombinant Bacteria Comprising the Same
Technical Field
The present disclosure relates to biological engineering. In particular, the present disclosure relates to engineered bacteria, such as Corynebacterium glutamicum, comprising modified covalently-linked pili (CLP).
Background
The engineered living materials (ELMs) relate to engineered biomaterials with distinctive “living” attributes such as autonomous growth, self-healing and environmental responsiveness that are only found in natural living materials, a wide range of remarkable ELMs had been developed for the applications in biosensors, bioremediation, biomedicine, biomanufacturing, wearable devices, and electronics. Depending on the source of their structural components, ELMs can be produced either by harnessing engineered cells to simultaneously make the material and incorporate novel functionalities into it (known as self-organizing living materials or biological ELMs) or by embedding living cells in an organic or inorganic matrix (referred to as hybrid living materials) . Self-organizing living materials aim to recapitulate the autonomous, adaptive, and versatile properties of natural living materials, and represent opportunities to harness engineered biological systems for new capabilities.
Despite the advances in ELMs, further development and application of self-organizing living materials faces challenges due to the lack of engineerable chassis and the limited access to programmable endogenous biopolymers in microorganisms, particularly the non-pathogens. At present, only model microbial systems, such as Escherichia coli and Bacillus subtilis along with their extracellular amyloid fibers, and several non-model systems including bacterial cellulose-producing K. rhaeticus, the surface-layer protein-containing Caulobacter crescentus and the dominant bacterial component of Pantoea agglomerans in native feedstocks of fungus have been successfully harnessed in ELMs design (Tang, T. -C. et al., Materials design by synthetic biology. Nat. Rev. Mater. 6, 332-350, 2021; Caro-Astorga, J. et al., Bacterial cellulose spheroids asbuilding blocks for 3D and patterned living materials and for regeneration. Nat. Commun. 12, 1-9, 2021; Charrier, M. et al. Engineering the S-layer of Caulobacter crescentus as a foundation for stable, high-density, 2D living materials. ACS Synth. Biol. 8, 181-190, 2018; and Huang, J. et al. Programmable and printable Bacillus subtilis biofilms as engineered living materials. Nat. Chem. Biol. 15, 34-41, 2019) .
Some Gram-positive bacteria comprise covalently-linked pili (CLP) . Unlike the non-covalently linked pili produced in Gram-negative bacteria (Ramirez, N. A. et al., New paradigms of pilus assembly mechanisms in gram-positive actinobacteria. Trends Microbiol. 28, 999-1009, 2020) , the CLP monomer subunits are typically joined via intermolecular isopeptide bond catalyzed by sortase conferring enormous tensile strength (McConnell, S. A. et al., Protein labeling via a specific lysine-isopeptide bond using the pilin polymerizing sortase from Corynebacterium diphtheriae. J. Am. Chem. Soc. 140, 8420-8423, 2018) . Furthermore, the CLP subunits contain auto-catalyzed intramolecular isopeptide bonds that are less susceptible to proteolytic cleavage and can dissipate mechanical energy (Ramirez,  N.A. et al., 2020) imparting the robustness of CLP. In addition, several pilin proteins in the CLP structure of different strains contain additional disulfide bonds that further enhance stability (Kang, H. J. et al., The Corynebacterium diphtheriae shaft pilin SpaA is built of tandem Ig-like modules with stabilizing isopeptide and disulfide bonds. Proc. Natl. Acad. Sci. U.S.A. 106, 16967-16971, 2009) .
Therefore, there remains a need to develop new chassis for ELMs, such as self-organizing living materials, preferably a bacterium forming CLP.
Summary of the Invention
The inventors have developed an integrative technological platform for ELMs based on the discovery of the biosynthetic gene cluster (BGC) of the covalently-linked pili (CLP) fiber in the industrial workhorse Corynebacterium glutamicum.
In the first aspect, the present disclosure provides a fusion polypeptide comprising a carrier protein and a polypeptide of interest, wherein the polypeptide of interest is fused to a terminus of the carrier protein or inserted into the carrier protein, and wherein the carrier protein is a pilin of covalently-linked pili (CLP) from a microorganism.
In some embodiments, the microorganism is a Gram-positive bacterium, such as a bacterium selected from Corynebacterium glutamicum, Bifidobacterium breve, Lactococcus lactis, Lacticaseibacillus paracasei, and Bacillus thuringiensis; preferably, Corynebacterium glutamicum. In some embodiments, the carrier protein is a major pilin.
In some embodiments, the polypeptide of interest is fused to a terminus of the carrier protein. In some embodiments, the polypeptide of interest is fused to the N terminus of the carrier protein.
In some embodiments, the polypeptide of interest is inserted into the carrier protein. In some embodiments, the polypeptide of interest is inserted into a loop in the carrier protein.
In some embodiments, the carrier protein is a major pilin from Corynebacterium glutamicum. In some embodiments, the polypeptide of interest is inserted into the M domain of the major pilin. In some embodiments, the polypeptide of interest replaces the M domain of the major pilin or a part thereof. In some embodiments, the carrier protein comprises an amino acid sequence of SEQ ID NO: 1, 2, 3, or 4. In some embodiments, the polypeptide of interest is fused to the N terminus of the carrier protein, or is inserted between positions corresponding to G215 and L216 of SEQ ID NO: 1, between positions corresponding to G236 and E237 of SEQ ID NO: 1, or between positions corresponding to G336 and T337 of SEQ ID NO: 1.
In some embodiments, the carrier protein comprises amino acids 35 to 509 of SEQ ID NO: 1. In some embodiments, the polypeptide of interest is fused to the N terminus of the carrier protein, or is inserted between G215 and L216, between G236 and E237, or between G336 and T337 of SEQ ID NO: 1.
In the second aspect, the present disclosure provides a polynucleotide encoding the fusion polypeptide of the present disclosure, and a vector comprising the polynucleotide, as well as a host cell comprising the polypeptide, the polynucleotide or the vector of the present disclosure.
In the third aspect, the present disclosure provides a recombinant cell comprising a polynucleotide encoding a fusion polypeptide, wherein the fusion polypeptide comprises a carrier protein and a polypeptide of interest, wherein the polypeptide of interest is fused to a terminus of the carrier protein or inserted into the carrier protein, wherein the carrier protein is a pilin of CLP, and wherein the recombinant cell is capable of expressing the polynucleotide and displaying a modified CLP comprising the fusion polypeptide.
In some embodiments, the recombinant cell is a Gram-positive bacterium, such as a bacterium selected from Corynebacterium glutamicum, Bifidobacterium breve, Lactococcus lactis, Lacticaseibacillus paracasei, and Bacillus thuringiensis; preferably, Corynebacterium glutamicum. In some embodiments, the carrier protein is a major pilin.
In some embodiments, the polypeptide of interest is fused to a terminus of the carrier protein. In some embodiments, the polypeptide of interest is fused to the N terminus of the carrier protein.
In some embodiments, the polypeptide of interest is inserted into the carrier protein. In some embodiments, the polypeptide of interest is inserted into a loop in the carrier protein.
In some embodiments, the carrier protein is a major pilin from Corynebacterium glutamicum. In some embodiments, the polypeptide of interest is inserted into the M domain of the major pilin. In some embodiments, the polypeptide of interest replaces the M domain of the major pilin or a part thereof. In some embodiments, the carrier protein comprises an amino acid sequence of SEQ ID NO: 1, 2, 3, or 4. In some embodiments, the polypeptide of interest is fused to the N terminus of the carrier protein, or is inserted between positions corresponding to G215 and L216 of SEQ ID NO: 1, between positions corresponding to G236 and E237 of SEQ ID NO: 1, or between positions corresponding to G336 and T337 of SEQ ID NO: 1.
In some embodiments, the carrier protein comprises amino acids 35-509 of SEQ ID NO: 1, and the polypeptide of interest is fused to the N terminus of the carrier protein, or is inserted between G215 and L216, between G236 and E237, or between G336 and T337 of SEQ ID NO: 1.
In some embodiments, the recombinant cell comprises two or more polynucleotides respectively encoding two or more fusion polypeptides each comprising a different polypeptide of interest, and the modified CLP comprises the two or more fusion polypeptides.
In the fourth aspect, the present disclosure provides a method of preparing the recombinant cell of the present disclosure, comprising introducing a polynucleotide encoding the fusion polypeptide of the present disclosure into a host cell derived from a microorganism having CLP.
In some embodiments, the host cell is a knock-out of the native major pilin. In some embodiments, the method comprises a step of knocking out the native major pilin.
In the fifth aspect, the present disclosure provides a modified covalently-linked pili (CLP) comprising a plurality of the fusion polypeptides of the present disclosure.
In the sixth aspect, the present disclosure provides a method of preparing a modified CLP comprising the steps of
a) providing the fusion polypeptide of the present disclosure; and
b) providing an activity of sortase.
In some embodiments, the fusion polypeptide is provided by transcribing and/or translating the polynucleotide of the present disclosure. In some embodiments, the activity of sortase is provided by transcribing and/or translating one or more polynucleotides encoding a sortase. In some embodiments, the sortase is encoded by a gene which is identified to be present in the same cluster with the gene encoding the carrier protein in nature. In some embodiments, the sortase is a class C type sortase, such as srtC1 and/or srtC2, preferably wherein the srtC1 and srtC2 are encoded by genes from the same cluster. In some embodiments, the method is an in vitro method.
In the seventh aspect, the present disclosure provides a polynucleotide construct or a combination of polynucleotide constructs comprising the polynucleotide of the present disclosure, and one or more polynucleotides encoding a sortase.
In some embodiments, the sortase is encoded by a gene which is identified to be present in the same cluster with the gene encoding the carrier protein in nature. In some embodiments, the sortase is a class C type sortase, such as srtC1 and/or srtC2, preferably wherein the srtC1 and srtC2 are encoded by genes from the same cluster.
Brief Description of the Drawings
Fig. 1 shows the map of plasmid pEK-spa2.
Fig. 2 shows the workflow for constructing the tandem of two cassettes.
Fig. 3 shows the maps of plasmids comprising the tandem of two cassettes.
Fig. 4 shows the map of plasmid pZ9-dxs_crtEBI.
Fig. 5 shows the map of plasmid pET-28a-Spa2.
Fig. 6 shows the CgCLP biosynthetic gene cluster (BGC) encoding the sortase genes srtC1 and srtC2, and the sortase-catalyzed pilin genes spa1, spa2, and spa3.
Fig. 7 shows TEM and AFM images demonstrating that the major pilin Spa2 is indispensable for CgCLP fiber structure formation. The bars in the TEM and AFM images are 200 nm and 400 nm, respectively.
Fig. 8 shows the identification of the composition of CLP in C. glutamicum (CgCLP) by immunogold labelling. (a) The cartoon shows that CgCLP fibers comprise two minor pilins (Spa1 and Spa3) and a major pilin, Spa2. (b) The immunogold labelling and TEM images show the constitution and distribution of CgCLP pilins, indicating that Spa2 is the major pilin. For single immunogold labelling of CgCLP with primary polyclonal antibodies against Spa1, Spa2, and Spa3 (α-Spa1, α-Spa2, and α-Spa3, respectively), gold-decorated goat anti-rabbit IgG was used as the secondary antibody for labelling the target pilin. For double immunogold labelling of CgCLP with both α-Spa1 and α-Spa3, the 30 nm and 5 nm gold-decorated goat anti-rabbit IgG were used to label Spa1 and Spa3, respectively. For double labelling of CgCLP with both α-Spa2 and α-Spa3, the 15 nm and 5 nm gold-decorated goat anti-rabbit IgG were used to label Spa2 and Spa3, respectively. (c) Quantification analysis of CgCLP composition via whole-cell filtration ELISA (detection by the antibodies α-Spa1, α-Spa2, and α-Spa3, respectively). The quantified results also show that Spa2 is the main component of CgCLP. Each experiment was performed at least in triplicate, and the standard error is shown. The bars in the TEM images indicate 200 nm.
Fig. 9 shows that the deletion of both the srtC1 and srtC2 genes abrogates pili formation. The TEM images (detection by α-Spa2) (a), AFM images (b) and whole-cell filtration ELISA quantification analysis (c) of the CgCLP fiber of the ΔsrtC1ΔsrtC2 strain. The bars in the TEM (a) and AFM (b) images are 200 nm and 400 nm, respectively. For immunogold labelling, α-Spa2 is the primary antibody, and the 10 nm gold-decorated goat anti-rabbit IgG is the secondary antibody. Each ELISA experiment was performed at least in triplicate, and the standard error is shown.
Fig. 10 shows the isolation of CgCLP fibers for mass spectrometry analysis. SDS-PAGE analysis of the CgCLP fibers purified by nickel affinity chromatography showed that the high-molecular-weight CgCLP polymers were eluted at 100 mM imidazole.
Fig. 11 shows the identification of intermolecular isopeptide bonds for the polymerization of Spa2 monomers in CgCLP. Fragmentation spectra of the parent ion at m/z 832.9 (2+) containing the intermolecular isopeptide bond (green font) between Spa2(i) Lys194 (blue font) and Spa2(i+1) Thr477 (red font) are shown.
Fig. 12 shows that liquid chromatography-tandem mass spectrometry (LC-MS/MS) identifies the signal peptide of Spa2. (a) The cartoon shows the amino acid sequence of Spa2 cut (in which residues 470-509 at the C-terminus of Spa2 are replaced with 6His), so that the Spa2 monomer is not polymerized and is secreted as a monomer into the medium. (b) SDS-PAGE analysis of the purified Spa2 cut. (c) LC-MS/MS identified residues 1-34 at the N-terminus of Spa2 as the signal peptide. This figure shows an MS/MS spectrum of the peptide with m/z 916.4538 (2+) generated from a chymotrypsin digest of Spa2. Predicted b- and y-type ions (not all included) are listed above and below the peptide sequence, respectively. Matched ions are labelled in the spectrum.
Fig. 13 shows the accurate molecular weight of Spa2 cut measured by quadrupole time-of-flight mass spectrometry. The measured molecular weight is ≈54.7 Da less than the calculated value of Spa2 cut, indicating that three intramolecular isopeptide bonds and two disulfide bonds exist in the monomeric Spa2. The formation of an intramolecular isopeptide bond loses one molecule of ammonia (≈17 Da), and the formation of a disulfide bond loses two hydrogen atoms (≈2 Da), giving an expected loss of 3 × 17 + 2 × 2 ≈ 55 Da.
Fig. 14 shows that crystals of Spa2 diffracted to a resolution of [value given as image PCTCN2022130033-appb-000001] on the BL18U1 beamline at the Shanghai Synchrotron Radiation Facility (Shanghai, China).
Fig. 15 shows the X-ray crystal structure of Spa2 which is arranged in three tandem Ig-like domains, N-domain (pink) , M-domain (blue) , and C-domain (green) . Residues involved in the formation of three intramolecular isopeptide bonds (yellow) and two disulfide bonds (red) are shown as sticks.
Fig. 16 shows the comparison of Spa2 in the crystal structure with the prediction from AlphaFold2 and with the crystal structures of 3HR6 and 4HSS. (a) Chain A in the Spa2 crystal structure (yellow) is superimposed with the AlphaFold2 predicted structure of Spa2 (blue) by PyMOL Align. The structures are superimposed using alpha-carbon (Cα) atoms of 410 residues with a root-mean-square deviation (RMSD) of [value given as image PCTCN2022130033-appb-000002], indicating that AlphaFold2 accurately predicted the Spa2 fold of the individual domains. Chain A in the Spa2 crystal structure (yellow) is superimposed with the crystal structures of 3HR6 (pink) (b) and 4HSS (green) (c), and the RMSD values are [value given as image PCTCN2022130033-appb-000003] (270 Cα atoms) and [value given as image PCTCN2022130033-appb-000004] (311 Cα atoms), respectively.
Fig. 17 shows omit electron density maps indicating the presence of internal covalent bonds in the crystal structure of Spa2. 2mFo-DFc omit electron density maps of three isopeptide bonds (a) and two disulfide bonds (b) are shown in blue mesh, contoured at 1.0σ. The omit electron density maps were generated using Phenix composite omit map.
Fig. 18 shows the identification of the disulfide bonds and intramolecular isopeptide bonds formed at the appropriate sequence locations in Spa2 by LC-MS/MS analysis. (a) The cartoon shows the critical features in Spa2, including three intramolecular isopeptide bonds in individual domains, two disulfide bonds in the N-domain (C97-C128) and the C-domain (C380-C432), the pilin motif of YPKN in the N-domain, and the sortase cleavage sorting signal motif of LPLTG in the C-domain. (b) MS/MS spectrum of the peptide with m/z 1407.4 (4+) generated from a pepsin digest of Spa2 containing the disulfide bond between Cys97 and Cys128. (c) MS/MS spectrum of the peptide with m/z 1583.7 (2+) generated from a pepsin digest of Spa2 containing the disulfide bond between Cys380 and Cys432. (d) MS/MS spectrum of the peptide with m/z 1326.9 (4+) generated from a pepsin digest of Spa2 containing the internal isopeptide bond between Lys57 and Asn195. (e) MS/MS spectrum of the peptide with m/z 1324.6 (3+) generated from a pepsin digest of Spa2 containing the internal isopeptide bond between Lys203 and Asn318. (f) MS/MS spectrum of the peptide with m/z 754.6 (4+) generated from a pepsin digest of Spa2 containing the internal isopeptide bond between Lys355 and Asp466. For (b)-(f), predicted b- and y-type ions (not all included) are listed above and below the peptide sequence, respectively; the disulfide bonds and intramolecular isopeptide bonds are shown as red and yellow bars, respectively.
Figs. 19 and 20 show the genetic manipulation in Δspa2 strains (harboring a plasmid that expressed Spa2 or the Spa2 variants K194A, LPLTG 474LALAA478, E158A, D246A, E435A, D246A/E435A, C97A, C380A, and C97A/C380A, respectively) to assess the key residues promoting the formation of inter- and intramolecular isopeptide bonds and disulfide bonds in Spa2, by TEM bio-imaging (Fig. 19) and by quantitative analysis of the amount of CgCLP fiber using whole-cell filtration ELISA (detection by anti-Spa2 antibody) (Fig. 20). Results are presented as mean ± s.d. in Fig. 20. The P values of the Spa2 mutant strains vs the Spa2 strain, from left to right in Fig. 20, are P < 0.0001, P < 0.0001, P = 0.4664, P = 0.8673, P = 0.7137, P = 0.0011, P = 0.0008, P = 0.0004 and P < 0.0001, respectively. Not significant (NS) P > 0.05, *P < 0.05, **P < 0.01, ***P < 0.001, ****P < 0.0001. Statistics were derived using a t-test. The bars in Fig. 19 are 200 nm.
Fig. 21 shows the accurate molecular weight of Spa2 cut mutant variants determined by quadrupole time-of-flight mass spectrometry. The measured molecular weights of E158A cut (a), D246A cut (b), E435A cut (c), and D246A/E435A cut (d) are ≈54.9, 37.3, 21.4, and 4.0 Da less than the calculated values of the related variants, indicating that three, two, one and no intramolecular isopeptide bonds are retained in the corresponding monomeric mutants, respectively. The Spa2 cut mutant variants E158A cut, D246A cut, E435A cut, and D246A/E435A cut were expressed in Δspa2 and purified by nickel-affinity chromatography.
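The mass-deficit reasoning used for Figs. 13 and 21 can be illustrated with a minimal Python sketch. It uses only the approximate losses quoted for Fig. 13 (≈17 Da per intramolecular isopeptide bond, ≈2 Da per disulfide bond). The function names are hypothetical, and the assumption that both disulfide bonds are retained in every variant is made here for illustration only.

```python
# Approximate mass losses quoted in the description of Fig. 13.
AMMONIA_LOSS_DA = 17.0    # one NH3 lost per intramolecular isopeptide bond
DISULFIDE_LOSS_DA = 2.0   # two H atoms lost per disulfide bond


def expected_deficit(n_isopeptide: int, n_disulfide: int) -> float:
    """Expected mass deficit (Da) relative to the unmodified sequence."""
    return n_isopeptide * AMMONIA_LOSS_DA + n_disulfide * DISULFIDE_LOSS_DA


def infer_isopeptide_bonds(measured_deficit: float, n_disulfide: int = 2) -> int:
    """Nearest whole number of isopeptide bonds consistent with a deficit.

    Assumption (illustrative): n_disulfide disulfide bonds are present.
    """
    remaining = measured_deficit - n_disulfide * DISULFIDE_LOSS_DA
    return max(0, round(remaining / AMMONIA_LOSS_DA))


# Measured deficits (Da) as reported for Figs. 13 and 21.
measured = {
    "Spa2 cut": 54.7,
    "E158A cut": 54.9,
    "D246A cut": 37.3,
    "E435A cut": 21.4,
    "D246A/E435A cut": 4.0,
}

for name, deficit in measured.items():
    print(f"{name}: {infer_isopeptide_bonds(deficit)} isopeptide bond(s)")
```

Under these assumptions the sketch reproduces the bond counts stated for the figures: three for Spa2 cut and E158A cut, two for D246A cut, one for E435A cut, and none for D246A/E435A cut.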
Fig. 22 shows the rational engineering of the  CgCLP protein scaffold through a modular genetic design strategy: the cartoon shows a polymerized Spa2 major pilin functionalized by incorporating a protein-of-interest (POI) (e.g., mCherry, a fluorescent reporter protein) at candidate insertion sites (including Q35 (E1) at the N-terminus, and G215 (E2) , G236 (E3) and G336 (E4) in the M-domain lacking a disulfide bond) based on structural verification.
Fig. 23 shows the fluorescence intensity and quantitative analysis of the amount of  CgCLP fiber by whole-cell filtration ELISA (detection by anti-Spa2 antibody) (a) ; and confocal microscopy imaging (b) (scale bar = 2 μm) of engineered cells containing Spa2-mCherry fusion proteins inserted at different sites.
Fig. 24 shows the TEM morphologies of the assembled mCherry-Spa2 fusion proteins associated with cell surfaces based on immunogold labelling. TEM images of Δspa2 cells (a) , E1 cells (b) , E2 cells (c) , E3 cells (d) and E4 cells (e) . The TEM samples were collected from the Δspa2 strain harboring a plasmid that expresses various mCherry-Spa2 fusions under the native constitutive promoter of the spa2 gene. For immunogold labelling, α-Spa2 is the primary antibody, and the 10 nm gold-decorated goat anti-rabbit IgG is the secondary antibody. Scale bars, 200 nm.
Fig. 25 shows the extracellular secretion and assembly of R-Spa2 pilins into CgCLP fiber at the cell surfaces of engineered C. glutamicum cells: a series of R-Spa2 fusion protein constructs comprising functional R peptides/proteins with different amino acid sequences.
Fig. 26 shows the morphologies of assembled R-Spa2 CgCLP on the cell surfaces based on immunogold labelling and TEM imaging, scale bar = 200 nm.
Fig. 27 shows the functional characterization of engineered CgCLP with various fusion domains. (a) TEM images show that Ni-NTA-decorated AuNPs were anchored onto 6His-Spa2 CgCLP. (b) Confocal microscopic images show the green fluorescence emitted from SpyTag-Spa2 CgCLP cells to which SpyCatcher-EGFP protein binding partners were covalently attached via SpyTag-SpyCatcher interaction pairs. (c) Confocal microscopic images show the green fluorescence emitted from SpyCatcher-Spa2 CgCLP cells to which SpyTag-EGFP protein binding partners were covalently attached via SpyTag-SpyCatcher interaction pairs. (d) Confocal microscopic images show the green fluorescence emitted from Venus-Spa2 CgCLP cells. (e) Fluorescent images and quantification analysis of the immobilization ability of Mfp3Spep-Spa2 CgCLP cells. Immobilized microspheres (left) on the substrates before (top) and after (bottom) challenge with water jetting at a constant discharge pressure of 5 psi. Quantification analysis of the relative capabilities of different cells (right) with immobilized PS microspheres on the substrate. (f) The degradation of carboxymethyl cellulose into glucose by CcEgl-Spa2 CgCLP cells was detected by a 3,5-dinitrosalicylic acid (DNS) assay. Each experiment was performed at least in triplicate, and the standard error is shown. Scale bars, 200 nm in a, 2 μm in b, c, and e, 100 μm in d.
Fig. 28 is a schematic showing the simultaneous expression of the two Spa2 pilin fusion proteins, N-Ven-Spa2 and C-Ven-Spa2 (N-Ven-Spa2+C-Ven-Spa2 strain), containing the N-terminus (N-Ven) and C-terminus (C-Ven) modules of the split-Venus system, resulting in co-assembly of the split-Venus components into the final functional CgCLP structures.
Fig. 29 shows the TEM morphologies of the assembled split-Venus components fused with Spa2 associated with cell surfaces based on immunogold labelling. N-Ven+C-Ven cells expressing co-secreted split-Venus system (a) , N-Ven-Spa2 cells expressing the Spa2 pilin fusion protein of N-Venus-Spa2 (b) , C-Ven-Spa2 cells expressing the Spa2 pilin fusion protein of C-Venus-Spa2 (c) , and N-Ven-Spa2+C-Ven-Spa2 cells for simultaneous expression of two Spa2 pilin fusion proteins, N-Ven-Spa2 and C-Ven-Spa2 (d) . The TEM samples were collected from the Δspa2 strain harboring a plasmid that expresses various Spa2 fusion proteins under the native constitutive promoter of the spa2 gene. For immunogold labelling, α-Spa2 is the primary antibody, and the 10 nm gold-decorated goat anti-rabbit IgG is the secondary antibody. Scale bars, 200 nm.
Fig. 30 shows the co-assembly of split-Venus components into the  CgCLP fibers leading to increased fluorescence intensity. (a) The engineered C. glutamicum cells show greater fluorescence intensity only in the N-Ven-Spa2+C-Ven-Spa2 strain, and (b) confocal microscopy of C. glutamicum cells showing that the strongest Venus fluorescence signal appeared at the extracellular sites of the N-Ven-Spa2+C-Ven-Spa2 strain (scale bar = 2 μm) .
Fig. 31 is a schematic illustrating engineered C. glutamicum living materials that transform cellulosic biomass into the value-added product lycopene by combining extracellular cellulose degradation capacity with intracellular bioconversion ability. Specifically, for extracellular cellulose degradation (Step 1), an endo-1,4-β-glucanase from T. reesei (TrEgl) and a β-glucosidase from S. degradans (SdBgl) were simultaneously fused with Spa2 pilin (TrEgl-Spa2+SdBgl-Spa2) and co-assembled into a CgCLP structure, potentially forming a catalytic cascade for the extracellular degradation of cellulose into glucose. For intracellular transformation (Step 2), the glucose was used for lycopene production in the pathway-engineered C. glutamicum strain C003 upon induction with IPTG. G3P, glyceraldehyde-3-phosphate; IPP, isopentenyl pyrophosphate.
Fig. 32 shows the lycopene production from biowastes with engineered C. glutamicum harboring modified CLPs. a, TEM images show that cells of C003, which contain the P2 plasmid, enabled co-assembly of TrEgl and SdBgl into the CgCLP structure, while the cells of C001, C002, and C004 did not. CgCLP was labeled with 10 nm gold particles by immunogold labelling. Scale bars, 200 nm. b, ELMs can degrade CMC-Na in a medium from a viscous gel to a thin solution only when both TrEgl and SdBgl were co-assembled into the CgCLP structure (TrEgl-Spa2+SdBgl-Spa2, C003 strain), outperforming the case of the secreted free enzymes (TrEgl+SdBgl, C004 strain). Δspa2Δdec (C001 strain) is the negative control strain. c, Degradation assays using CMC-Na as the substrate. The C003 strain showed 4-fold higher enzyme activity than the C004 strain. d, HPLC assay for lycopene production with the C003 strain cultured in M63 medium in which the glucose carbon source was replaced by CMC-Na, with lycopene production induced by the addition of IPTG. Results are presented as mean ± s.d. The P values of the C003 and C004 strains vs the C001 strain in c are P < 0.0001 and P = 0.8629, respectively. Not significant (NS) P > 0.05, ****P < 0.0001. Statistics were derived using a t-test. Each experiment was performed at least in triplicate.
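The significance notation used in the descriptions of Figs. 20 and 32 (NS, *, **, ***, ****) can be expressed as a small lookup. The following Python sketch is illustrative only: the function name is hypothetical, and treating a P value of exactly 0.05 as not significant is an assumption, since the legends define only P > 0.05 and P < 0.05.

```python
def significance_label(p: float) -> str:
    """Map a P value to the star notation given in the figure legends."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("P value must lie between 0 and 1")
    if p < 0.0001:
        return "****"
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    # Assumption: P == 0.05 is grouped with "not significant".
    return "NS"


# Exact P values reported for Fig. 20 and Fig. 32c.
for p in (0.4664, 0.8673, 0.7137, 0.0011, 0.0008, 0.0004, 0.8629):
    print(p, significance_label(p))
```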
Detailed Description of the Invention
1. Definitions
Unless otherwise indicated, all terms used herein have the same meaning as they would to one skilled in the art, and the practice of the present disclosure will employ conventional techniques of microbiology and recombinant DNA technology, which are within the knowledge of those of skill in the art.
As used herein, the term “covalently-linked pili” or “CLP” refers to pili in which the monomers are linked to each other via covalent bonds. The term “engineered living materials” herein refers to the pili formed by the engineered monomers, i.e., the fusion polypeptide of the present disclosure, or to a recombinant bacterium forming the pili.
As an example of the CLP-forming bacteria, C. glutamicum, a Gram-positive bacterium, is “generally recognized as safe” (GRAS); this bacterium presents a potential platform for various products, such as amino acids and lycopene.
As used herein, the term “peptide”, which is used interchangeably with “polypeptide” and “protein”, means a chain comprising at least two amino acids linked by peptide bonds, such as ten or more amino acid residues. The chemical formulas or sequences of all the peptides and polypeptides herein are written in left-to-right order, showing the direction from the amino terminus to the carboxyl terminus. “Peptide”, “polypeptide” and “protein” can include, but are not limited to, an enzyme, an antibody, a hormone, a ligand, a receptor, etc.
The term “amino acid” includes amino acids naturally occurring in proteins and unnatural amino acids. The conventional nomenclature (one-letter and three-letter) of the amino acids naturally occurring in proteins is employed, which can be found in Sambrook et al. (Molecular Cloning: A Laboratory Manual, 2nd ed., Cold Spring Harbor Laboratory, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 1989).
Figure PCTCN2022130033-appb-000005
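Because the present text refers to residues in both notations (e.g., Lys194 and K194A, Cys97 and C97), a small conversion helper may be useful when reading it. The Python sketch below covers only the 20 standard amino acids under their conventional one-letter and three-letter codes; it is not a transcription of the table image above, whose content is not recoverable here, and the function names are hypothetical.

```python
import re

# Conventional three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}
ONE_TO_THREE = {one: three for three, one in THREE_TO_ONE.items()}


def to_one_letter(label: str) -> str:
    """Convert a residue label such as 'Lys194' to 'K194'."""
    match = re.fullmatch(r"([A-Z][a-z]{2})(\d+)", label)
    if match is None or match.group(1) not in THREE_TO_ONE:
        raise ValueError(f"unrecognized residue label: {label!r}")
    return THREE_TO_ONE[match.group(1)] + match.group(2)


def to_three_letter(label: str) -> str:
    """Convert a residue label such as 'G215' to 'Gly215'."""
    match = re.fullmatch(r"([A-Z])(\d+)", label)
    if match is None or match.group(1) not in ONE_TO_THREE:
        raise ValueError(f"unrecognized residue label: {label!r}")
    return ONE_TO_THREE[match.group(1)] + match.group(2)


print(to_one_letter("Lys194"))   # K194
print(to_three_letter("G215"))   # Gly215
```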
As used herein, the term “fusion polypeptide” is a recombinant product comprising two or more peptide fragments which are not present in a single natural polypeptide. The fragments can be fused directly or via a linker, such as a flexible linker, e.g., GS linkers. Generally, a fusion polypeptide can be produced by the expression of a polynucleotide comprising nucleotide sequences encoding the two or more peptide fragments and the linker, if present, in desired order.
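As a rough illustration of the two fusion layouts described herein (fusion to the N terminus of the carrier, or insertion between two adjacent carrier residues), the following Python sketch assembles a fusion amino acid sequence as a string. The sequences are toy placeholders and not SEQ ID NO: 1. The function names, the choice to flank an inserted polypeptide with a linker on both sides, and the omission of signal peptide handling are all assumptions made for illustration.

```python
def fuse_n_terminal(carrier: str, poi: str, linker: str = "") -> str:
    """Place the polypeptide of interest (poi) before the carrier."""
    return poi + linker + carrier


def insert_after(carrier: str, poi: str, position: int, linker: str = "") -> str:
    """Insert poi between residue `position` and `position + 1` (1-based).

    Assumption: the optional linker is added on both sides of the insert.
    """
    if not 1 <= position < len(carrier):
        raise ValueError("position must fall between two carrier residues")
    return carrier[:position] + linker + poi + linker + carrier[position:]


# Toy example: hypothetical 8-residue carrier, 6His tag, GS linker.
carrier = "MKTAGLQE"
poi = "HHHHHH"

print(fuse_n_terminal(carrier, poi, linker="GS"))
print(insert_after(carrier, poi, position=5, linker="GS"))
```

For a real carrier, `position` would follow the numbering of the reference sequence, e.g., 215 for an insertion between G215 and L216, provided the string passed in starts at residue 1 of that numbering.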
As used herein, the term “polynucleotide” generally refers to a nucleic acid molecule (e.g., 100 nucleotides and up to 30k nucleotides in length) and a sequence that is either complementary (antisense) or identical (sense) to the sequence of a messenger RNA (mRNA) or miRNA fragment or molecule. The term can also refer to DNA or RNA molecules that are either transcribed or non-transcribed.
As used herein, the term “polynucleotide construct” refers to a single-stranded or double-stranded polynucleotide, which is isolated from a naturally occurring gene or modified  to contain a nucleic acid segment that does not naturally occur. When the polynucleotide construct contains the control sequences required to express the coding sequence of the present disclosure, the polynucleotide construct comprises an “expression cassette” .
The term “exogenous polynucleotide” as used herein refers to a nucleotide sequence that does not originate from the host in which it is placed. It may be identical or heterologous to the host’s DNA. An example is a sequence of interest inserted into a vector. Such exogenous DNA sequences may be derived from a variety of sources including DNA, cDNA, synthetic DNA, and RNA. Exogenous polynucleotides also encompass DNA sequences that encode antisense oligonucleotides.
As used herein, the term “expression cassette” refers to a polynucleotide segment comprising a polynucleotide encoding a polypeptide operably linked to additional nucleotides provided for the expression of the polynucleotide, for example, control sequence.
As used herein, the term “encoding” means that a polynucleotide directly specifies the amino acid sequence of its protein product. The boundaries of the coding sequence are generally determined by an open reading frame, which generally starts with the ATG start codon or other start codons such as GTG and TTG, and ends with a stop codon such as TAA, TAG and TGA. The coding sequence can be a DNA, cDNA or recombinant nucleotide sequence.
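The open-reading-frame boundaries described above can be sketched in a few lines of Python. This is a simplified illustration, not a gene-prediction method: it scans only the given strand, returns the first start codon (ATG, GTG or TTG) that is followed in frame by a stop codon (TAA, TAG or TGA), and the function name is hypothetical.

```python
START_CODONS = {"ATG", "GTG", "TTG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_first_orf(dna: str):
    """Return the first start-to-stop coding sequence found, or None.

    The returned string includes both the start codon and the stop codon.
    """
    dna = dna.upper()
    for start in range(len(dna) - 2):
        if dna[start:start + 3] not in START_CODONS:
            continue
        # Walk codon by codon in the frame set by this start codon.
        for pos in range(start + 3, len(dna) - 2, 3):
            if dna[pos:pos + 3] in STOP_CODONS:
                return dna[start:pos + 3]
    return None


print(find_first_orf("ccATGAAATAGgg"))  # ATGAAATAG
```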
As used herein, the term “expression” includes any step involved in the production of a polypeptide, including but not limited to transcription, post-transcriptional modification, translation, post-translational modification, and secretion.
A “control sequence” includes all elements necessary or beneficial for the expression of the polynucleotide encoding the polypeptide of the present disclosure. Each control sequence may be natural or foreign to the nucleotide sequence encoding the polypeptide, or natural or foreign to each other. Such control sequences include, but are not limited to, leader sequence, polyadenylation sequence, propeptide sequence, promoter, enhancer, signal peptide sequence, and transcription terminator. At a minimum, control sequences include a promoter and signals for the termination of transcription and translation.
For example, the control sequence may be a suitable promoter sequence, a nucleotide sequence recognized by the host cell to express the polynucleotide encoding the polypeptide of the present disclosure. The promoter sequence contains a transcription control sequence that mediates the expression of the polypeptide. The promoter may be any nucleotide sequence that exhibits transcriptional activity in the selected host cell, for example, lac operon of E. coli. The promoters also include mutant, truncated and hybrid promoters, and can be obtained from genes encoding extracellular or intracellular polypeptides, which are homologous or heterologous to the host cell.
As used herein, the term “operably linked” herein refers to a configuration in which a control sequence is placed at an appropriate position relative to the coding sequence of the polynucleotide sequence, whereby the control sequence directs the expression of the  polypeptide coding sequence.
The polynucleotide encoding a polypeptide of interest can be subjected to various manipulations to improve the expression of the polypeptide. Before the insertion thereof into a vector, manipulation of the polynucleotide according to the expression vector or the host, such as codon optimization, is desirable or necessary. Techniques for modifying polynucleotide sequences with recombinant DNA methods are well known in the art.
The term “recombinant” as used herein refers to nucleic acids, vectors, polypeptides, or proteins that have been generated using DNA recombination (cloning) methods and are distinguishable from native or wild-type nucleic acids, vectors, polypeptides, or proteins.
As used herein, the term “hybridization” means that nucleotide sequences which are at least about 90%, preferably at least about 95%, more preferably at least about 96%, and more preferably at least 98% homologous to each other generally remain hybridized to each other under given stringent hybridization and washing conditions.
For the present disclosure, in order to determine the percentage identity between two amino acid sequences or two nucleic acid sequences, the sequences are aligned for the purpose of optimal comparison (e.g., a gap can be introduced into the first amino acid or nucleic acid sequence for the optimal alignment with the second amino acid or nucleic acid sequence) . Then, the amino acid residues or nucleotides at the corresponding amino acid positions or nucleotide positions are compared. When a position in the first sequence is occupied by the same amino acid residue or nucleotide at the corresponding position in the second sequence, these molecules are identical at this position. The percentage identity between two sequences is a function of the number of identical positions shared by the sequences (i.e., percentage identity=number of identical positions/total number of positions (i.e., the overlapping positions) ×100) . Preferably, the two sequences are identical in length.
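The percentage identity formula above can be written out as a short Python sketch operating on two sequences that have already been aligned. The function name is hypothetical, the gap character "-" is a convention chosen here, and reading “overlapping positions” as alignment columns in which neither sequence has a gap is an assumption.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage identity = identical positions / overlapping positions x 100."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Assumption: a position overlaps only if neither sequence has a gap there.
    overlapping = [
        (a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap
    ]
    if not overlapping:
        raise ValueError("no overlapping positions to compare")
    identical = sum(1 for a, b in overlapping if a == b)
    return 100.0 * identical / len(overlapping)


print(percent_identity("ACGT", "ACGA"))    # 75.0
print(percent_identity("AC-GT", "ACTGA"))  # 75.0
```

In practice the alignment itself would come from one of the computer programs mentioned below; the sketch only covers the counting step.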
A person skilled in the art knows that various computer programs can be used to determine the identity between two sequences.
“Identity percentage” or “sequence identity percentage” refers to the comparison between the amino acids of two polypeptides or nucleotides between two polynucleotides, and when optimally aligned, the two polypeptides or polynucleotides have approximately the specified percentage of identical amino acids. For example, “95% identity” refers to the comparison between the amino acids of two polypeptides or nucleotides between two polynucleotides, and when optimally aligned, 95% of the amino acids in the two polypeptides or 95% of the nucleotides in the two polynucleotides are identical.
A person skilled in the art knows various conditions for hybridization, such as stringent hybridization conditions and highly stringent hybridization conditions. See, for example, Sambrook et al., 1989, Molecular Cloning, A Laboratory Manual, Cold Spring Harbor Press, N.Y.; and Ausubel et al. (eds.), 1995, Current Protocols in Molecular Biology, John Wiley & Sons, N.Y.
Of course, the polynucleotide of the present disclosure does not include a polynucleotide that only hybridizes to a poly A sequence (such as the 3' end poly (A) of mRNA) or a complementary stretch of poly T (or U) residues.
As used herein, the term “host cell” refers to, for example, microorganisms, yeast cells, insect cells, and mammalian cells that can be, or have been, used as recipients of vectors. The term includes the progeny of the original cell which has been transduced. Thus, a “host cell” as used herein generally refers to a cell which has been transduced with an exogenous DNA sequence. It is understood that the progeny of a single parental cell may not necessarily be completely identical in morphology or in genomic or total DNA complement to the original parent, due to natural, accidental, or deliberate mutation.
2. Fusion polypeptide
Through genetic manipulation, bio-imaging, and structural characterization, Spa2 protein is identified as the major pilin of the CLP fiber structure. Using structure-guided design, the inventor developed a new type of engineerable extracellular protein scaffold that can be genetically appended with diverse functional peptides or proteins at multiple sites of Spa2 protein.
The present disclosure provides a fusion polypeptide comprising a carrier protein and a polypeptide of interest, wherein the polypeptide of interest is fused to a terminus of the carrier protein or inserted into the carrier protein, and wherein the carrier protein is a pilin of covalently-linked pili (CLP) from a microorganism.
In some embodiments, the microorganism is a gram-positive bacterium, such as a bacterium selected from Corynebacterium glutamicum, Bifidobacterium breve, Lactococcus lactis, Lacticaseibacillus paracasei, and Bacillus thuringiensis; preferably, Corynebacterium glutamicum. The bacterium can include, but is not limited to, a bacterium selected from Corynebacterium glutamicum strain BE (GenBank assembly accession: GCA_013046805.1), Corynebacterium glutamicum ATCC 14067 (GenBank assembly accession: GCA_002243555.1), Corynebacterium glutamicum strain YI (GenBank assembly accession: GCA_001643035.1), Corynebacterium glutamicum strain ATCC 13869 (GenBank assembly accession: GCA_001687645.1), Corynebacterium glutamicum AJ1511 (GenBank assembly accession: GCA_002355675.1), Corynebacterium glutamicum strain XV (GenBank assembly accession: GCA_001936195.1), Corynebacterium glutamicum strain CP (GenBank assembly accession: GCA_001447865.2), Corynebacterium glutamicum R (GenBank assembly accession: GCA_000010225.1), Corynebacterium glutamicum strain USDA-ARS-USMARC-56828 (GenBank assembly accession: GCA_001518935.2), Bifidobacterium breve strain LMC520 (GenBank assembly accession: GCA_001990225.1), Bifidobacterium breve strain BR3 (GenBank assembly accession: GCA_001281425.1), Bifidobacterium breve strain NRBB51 (GenBank assembly accession: GCA_002838405.1), Bifidobacterium breve strain NRBB09 (GenBank assembly accession: GCA_002838325.1), Bifidobacterium breve 12L (GenBank assembly accession: GCA_000568955.1), Bifidobacterium breve strain DRBB26 (GenBank assembly accession: GCA_002838225.1), Bifidobacterium breve strain 180W83 (GenBank assembly accession: GCA_002838525.1), Bifidobacterium breve strain JSRL01 (GenBank assembly accession: GCA_009498435.1), Bifidobacterium breve 689b (GenBank assembly accession: GCA_000569055.1), Bifidobacterium breve strain DRBB29 (GenBank assembly accession: GCA_002838705.1), Bifidobacterium breve strain DRBB27 (GenBank assembly accession: GCA_002838445.1), Bifidobacterium breve strain JR01 (GenBank assembly accession: GCA_009931415.1), Bifidobacterium breve S27 (GenBank assembly accession: GCA_000569075.1), Bifidobacterium breve ACS-071-V-Sch8b (GenBank assembly accession: GCA_000213865.1), Bifidobacterium breve strain NRBB56 (GenBank assembly accession: GCA_002838425.1), Bifidobacterium breve DSM 20213 = JCM 1192 (GenBank assembly accession: GCA_001025175.1), Bifidobacterium breve strain NRBB01 (GenBank assembly accession: GCA_002838245.1), Bifidobacterium breve strain FDAARGOS_561 (GenBank assembly accession: GCA_003813065.1), Bifidobacterium breve strain NCTC11815 (GenBank assembly accession: GCA_900637145.1), Bifidobacterium breve strain NRBB52 (GenBank assembly accession: GCA_002838385.1), Bifidobacterium breve strain 082W48 (GenBank assembly accession: GCA_002838545.1), Bifidobacterium breve strain lw01 (GenBank assembly accession: GCA_003860285.1), Bifidobacterium breve UCC2003 (GenBank assembly accession: GCA_000220135.1), Bifidobacterium breve strain NRBB11 (GenBank assembly accession: GCA_002838305.1), Bifidobacterium breve strain NRBB04 (GenBank assembly accession: GCA_002838285.1), Bifidobacterium breve NCFB 2258 (GenBank assembly accession: GCA_000569035.1), Bifidobacterium breve strain NRBB20 (GenBank assembly accession: GCA_002838645.1), Bifidobacterium breve strain NRBB27 (GenBank assembly accession: GCA_002838665.1), Bifidobacterium breve strain NRBB49 (GenBank assembly accession: GCA_002838685.1), Bifidobacterium breve strain NRBB18 (GenBank assembly accession: GCA_002838605.1), Bifidobacterium breve strain NRBB02 (GenBank assembly accession: GCA_002838265.1), Bifidobacterium breve strain NRBB19 (GenBank assembly accession: GCA_002838625.1), Bifidobacterium breve strain 017W439 (GenBank assembly accession: GCA_002838465.1), Bifidobacterium breve JCM 7017 (GenBank assembly accession: GCA_000568975.1), Bifidobacterium breve strain NRBB50 (GenBank assembly accession: GCA_002838365.1), Bifidobacterium breve strain 139W423 (GenBank assembly accession: GCA_002838565.1), Bifidobacterium breve strain DRBB28 (GenBank assembly accession: GCA_002838505.1), Bifidobacterium breve strain CNCM I-4321 (GenBank assembly accession: GCA_002838585.1), Bifidobacterium breve strain DRBB30 (GenBank assembly accession: GCA_002838725.1), Bifidobacterium breve strain NRBB57 (GenBank assembly accession: GCA_002838345.1), Bifidobacterium breve strain 215W447a (GenBank assembly accession: GCA_002838485.1), Lactococcus lactis subsp. cremoris NZ9000 (GenBank assembly accession: GCA_000143205.1), Lactococcus lactis subsp. cremoris MG1363 (GenBank assembly accession: GCA_000009425.1), Lactococcus lactis subsp. cremoris A76 (GenBank assembly accession: GCA_000236475.1), Lactococcus lactis strain SRCM103457 (GenBank assembly accession: GCA_004194355.1), Lactococcus lactis strain CBA3619 (GenBank assembly accession: GCA_007954765.1), Lactococcus lactis strain WiKim0098 (GenBank assembly accession: GCA_016406265.1), Lactococcus lactis strain K_LL005 (GenBank assembly accession: GCA_014334715.1), Lactococcus lactis subsp. lactis strain G121 (GenBank assembly accession: GCA_013395015.1), Lactococcus lactis strain N8 (GenBank assembly accession: GCA_014884605.1), Lactococcus lactis subsp. lactis IO-1 (GenBank assembly accession: GCA_000344575.1), Lactococcus lactis subsp. lactis strain F44 (GenBank assembly accession: GCA_002804185.1), Lactococcus lactis subsp. lactis bv. diacetylactis strain S50 (GenBank assembly accession: GCA_003627395.2), Lactococcus lactis strain FDAARGOS_1064 (GenBank assembly accession: GCA_016127135.1), Lactococcus lactis strain FDAARGOS_887 (GenBank assembly accession: GCA_016027975.1), Lactococcus lactis subsp. lactis strain UC77 (GenBank assembly accession: GCA_002078615.2), Lactococcus lactis strain FDAARGOS_866 (GenBank assembly accession: GCA_016028815.1), Lactococcus lactis strain IL1403 (GenBank assembly accession: GCA_003722275.1), Lactococcus lactis strain FDAARGOS_865 (GenBank assembly accession: GCA_016028835.1), Lactococcus lactis subsp. cremoris IBB477 (GenBank assembly accession: GCA_001856165.1), Lacticaseibacillus paracasei strain TD 062 (GenBank assembly accession: GCA_009834405.1), Lacticaseibacillus paracasei strain HM1 (GenBank assembly accession: GCA_018064185.1), Bacillus thuringiensis strain FDAARGOS_794 (GenBank assembly accession: GCA_013267795.1), Bacillus thuringiensis strain XL6 (GenBank assembly accession: GCA_000774075.2), Bacillus thuringiensis strain Bt-GS57 (GenBank assembly accession: GCA_017751245.1), Bacillus thuringiensis strain HER1410 (GenBank assembly accession: GCA_013340745.1), Bacillus thuringiensis serovar tolworthi (GenBank assembly accession: GCA_001548175.1), Bacillus thuringiensis strain BT62 (GenBank assembly accession: GCA_003054785.2), Bacillus thuringiensis strain HD12 (GenBank assembly accession: GCA_001598095.1), Bacillus thuringiensis serovar alesti strain BGSC 4C1 (GenBank assembly accession: GCA_001640965.1), Bacillus thuringiensis LM1212 (GenBank assembly accession: GCA_003546665.1), Lacticaseibacillus paracasei strain 347-16 (GenBank assembly accession: GCA_012955485.1), Lacticaseibacillus paracasei subsp. tolerans strain MGB0734 (GenBank assembly accession: GCA_015476135.1), Lacticaseibacillus paracasei subsp. tolerans strain MGB0747 (GenBank assembly accession: GCA_015476175.1), Lacticaseibacillus paracasei strain CBA3611 (GenBank assembly accession: GCA_007292115.1), Lacticaseibacillus paracasei subsp. paracasei strain GR0548 (GenBank assembly accession: GCA_019175405.1), Lacticaseibacillus paracasei subsp. paracasei strain IBB3423 (GenBank assembly accession: GCA_009739485.1), Lacticaseibacillus paracasei strain NFFJ04 (GenBank assembly accession: GCA_014905075.1), Lacticaseibacillus paracasei strain HL182 (GenBank assembly accession: GCA_017638905.1), Lacticaseibacillus paracasei strain Lpc10 (GenBank assembly accession: GCA_003199005.1), Lacticaseibacillus paracasei subsp. tolerans strain AO356 (GenBank assembly accession: GCA_003957435.1), Lacticaseibacillus paracasei subsp. tolerans strain MGB0625 (GenBank assembly accession: GCA_015476155.1), Lacticaseibacillus paracasei strain 10266 (GenBank assembly accession: GCA_008329845.1), Lacticaseibacillus paracasei subsp. tolerans strain S-NB (GenBank assembly accession: GCA_016757695.1), Lacticaseibacillus paracasei strain Lp02 (GenBank assembly accession: GCA_013307125.1), Lacticaseibacillus paracasei strain ZFM54 (GenBank assembly accession: GCA_003627255.1), Lacticaseibacillus paracasei subsp. paracasei strain TMW 1.1434 (GenBank assembly accession: GCA_002813615.1), Lacticaseibacillus paracasei strain SRCM103299 (GenBank assembly accession: GCA_004141835.1), Lacticaseibacillus paracasei strain NJ (GenBank assembly accession: GCA_007637635.1), Lacticaseibacillus paracasei strain EG9 (GenBank assembly accession: GCA_003177075.1), Lacticaseibacillus paracasei strain TK-P4A (GenBank assembly accession: GCA_015377585.1), Lacticaseibacillus paracasei subsp. paracasei strain BD5115 (GenBank assembly accession: GCA_018596415.1), and Lacticaseibacillus paracasei subsp. paracasei JCM 8130 (GenBank assembly accession: GCA_000829035.1); preferably, Corynebacterium glutamicum ATCC 14067.
In some embodiments, the carrier protein is a major pilin.
Preferably, the fusion or insertion of the polypeptide of interest does not influence the formation of the intermolecular isopeptide bonds, disulfide bonds, or intramolecular isopeptide bonds in the carrier protein.
In some embodiments, the polypeptide of interest is fused to a terminus of the carrier protein. In some embodiments, the polypeptide of interest is fused to the N terminus of the carrier protein.
In some embodiments, the polypeptide of interest is inserted into the carrier protein. In some embodiments, the polypeptide of interest is inserted into a loop in the carrier protein.
In some embodiments, the carrier protein is a major pilin from Corynebacterium glutamicum (Spa2 protein). It is observed that the Spa2 protein (SEQ ID NO: 1) comprises three tandem Ig-like domains, namely an N-domain (residues 36-197), an M-domain (residues 198-343), and a C-domain (residues 344-469), which is consistent with other major pilins. It is also observed that deletion of the M-domain does not influence the formation of CLP. In some embodiments, the polypeptide of interest is inserted into the M-domain of the major pilin. In some embodiments, the polypeptide of interest replaces the M-domain of the major pilin or a part thereof.
The Spa2 protein from different Corynebacterium glutamicum strains may vary in sequence. In some embodiments, the carrier protein comprises an amino acid sequence of SEQ ID NO: 1, 2, 3, or 4, or an amino acid sequence at least 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99% or 99.5% identical to SEQ ID NO: 1, 2, 3, or 4. In some embodiments, the carrier protein comprises an amino acid sequence of SEQ ID NO: 1, 2, 3, or 4, or an amino acid sequence at least 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99% or 99.5% identical to SEQ ID NO: 1, 2, 3, or 4 with the residues corresponding to residues C97, C128, K194, C380, C432, and LPLTG (474-478), and optionally E158, D246, and/or E435 of SEQ ID NO: 1 unchanged. The carrier protein can be the mature form of SEQ ID NO: 1, 2, 3, or 4, i.e., with the deletion of the signal peptide. In some embodiments, the carrier protein comprises amino acids 35 to 509 of SEQ ID NO: 1, amino acids 34 to 520 of SEQ ID NO: 2, amino acids 34 to 530 of SEQ ID NO: 3, or amino acids 34 to 519 of SEQ ID NO: 4, or an amino acid sequence at least 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99% or 99.5% identical to amino acids 35 to 509 of SEQ ID NO: 1, amino acids 34 to 520 of SEQ ID NO: 2, amino acids 34 to 530 of SEQ ID NO: 3, or amino acids 34 to 519 of SEQ ID NO: 4. In some embodiments, the carrier protein comprises amino acids 35 to 509 of SEQ ID NO: 1, amino acids 34 to 520 of SEQ ID NO: 2, amino acids 34 to 530 of SEQ ID NO: 3, or amino acids 34 to 519 of SEQ ID NO: 4, or an amino acid sequence at least 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99% or 99.5% identical to amino acids 35 to 509 of SEQ ID NO: 1, amino acids 34 to 520 of SEQ ID NO: 2, amino acids 34 to 530 of SEQ ID NO: 3, or amino acids 34 to 519 of SEQ ID NO: 4 with the residues corresponding to residues C97, C128, E158, K194, D246, C380, C432, E435, and LPLTG (474-478), and optionally E158, D246, and/or E435 of SEQ ID NO: 1 unchanged.
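Whether a variant keeps "the residues corresponding to" given positions of SEQ ID NO: 1 unchanged can be checked from a pairwise alignment. The following is a minimal sketch, assuming the reference and the variant are supplied as pre-aligned, equal-length strings with "-" as the gap character; the function names and the toy sequences are hypothetical, and the actual sequences of SEQ ID NO: 1 to 4 are not reproduced here.

```python
from typing import Dict, Iterable, Tuple

GAP = "-"

# Positions of SEQ ID NO: 1 named in the text: C97, C128, K194, C380, C432
# and the LPLTG motif at 474-478 (1-based numbering).
SPA2_KEY_POSITIONS = [97, 128, 194, 380, 432] + list(range(474, 479))


def residues_at_reference_positions(
    aligned_ref: str, aligned_var: str, positions: Iterable[int]
) -> Dict[int, Tuple[str, str]]:
    """Map each requested 1-based reference position to (reference residue, variant residue)."""
    if len(aligned_ref) != len(aligned_var):
        raise ValueError("aligned sequences must have the same length")
    wanted = set(positions)
    found: Dict[int, Tuple[str, str]] = {}
    ref_pos = 0
    for ref_res, var_res in zip(aligned_ref, aligned_var):
        if ref_res == GAP:
            continue  # insertion in the variant: no reference position consumed
        ref_pos += 1
        if ref_pos in wanted:
            found[ref_pos] = (ref_res, var_res)
    return found


def positions_unchanged(aligned_ref: str, aligned_var: str, positions: Iterable[int]) -> bool:
    """True only if every requested position exists and carries the same residue in both sequences."""
    wanted = set(positions)
    found = residues_at_reference_positions(aligned_ref, aligned_var, wanted)
    return len(found) == len(wanted) and all(r == v for r, v in found.values())


if __name__ == "__main__":
    # Toy alignment (not Spa2): the variant has one inserted residue and one substitution.
    ref, var = "AC-DEC", "ACGDKC"
    print(positions_unchanged(ref, var, [2, 5]))  # both cysteines retained
    print(positions_unchanged(ref, var, [4]))     # E replaced by K
```

With a real alignment of a candidate carrier protein against SEQ ID NO: 1, `SPA2_KEY_POSITIONS` could be passed as the `positions` argument.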
In some embodiments, the polypeptide of interest is fused to the N terminus of the carrier protein, or is inserted between positions corresponding to G215 and L216 of SEQ ID NO: 1, between positions corresponding to G236 and E237 of SEQ ID NO: 1, or between positions corresponding to G336 and T337 of SEQ ID NO: 1.
In some embodiments, the carrier protein comprises amino acids 35 to 509 of SEQ ID NO: 1. In some embodiments, the polypeptide of interest is fused to the N terminus of the carrier protein, or is inserted between G215 and L216, between G236 and E237, or between G336 and T337 of SEQ ID NO: 1.
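The domain boundaries and insertion sites given above lend themselves to a small sanity check. The sketch below uses only the residue numbers stated in the text (1-based numbering of SEQ ID NO: 1); the helper names `domain_of` and `insert_after`, and the toy sequence used in the example, are assumptions made for illustration.

```python
from typing import Optional

# Spa2 domain boundaries as stated in the text (1-based, inclusive).
SPA2_DOMAINS = {
    "N-domain": (36, 197),
    "M-domain": (198, 343),
    "C-domain": (344, 469),
}

# Insertion sites named in the text; the value is the residue after which
# the polypeptide of interest is inserted.
INSERTION_SITES = {"G215/L216": 215, "G236/E237": 236, "G336/T337": 336}


def domain_of(position: int) -> Optional[str]:
    """Return the name of the domain containing a 1-based residue position, if any."""
    for name, (start, end) in SPA2_DOMAINS.items():
        if start <= position <= end:
            return name
    return None


def insert_after(carrier: str, insert: str, position: int) -> str:
    """Insert `insert` between 1-based residues `position` and `position + 1` of `carrier`."""
    if not 1 <= position < len(carrier):
        raise ValueError("position must lie inside the carrier sequence")
    return carrier[:position] + insert + carrier[position:]


if __name__ == "__main__":
    for site, position in INSERTION_SITES.items():
        print(site, domain_of(position), domain_of(position + 1))
    # Toy sequences only; real use would pass the full-length SEQ ID NO: 1.
    print(insert_after("ABCDEF", "xx", 3))
```

Under these numbers, all three insertion sites fall within the M-domain, in line with the statement that the polypeptide of interest can be inserted into the M-domain.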
In some embodiments, the polypeptide of interest is directly linked to the N terminus of the carrier polypeptide. In some embodiments, the polypeptide of interest is linked to the N terminus of the carrier polypeptide via a peptide linker such as a flexible linker.
A peptide linker is generally a short peptide of about 4-20 or more amino acids, such as a combination of Ser and Gly residues, which forms a conventional flexible linker. In some embodiments, the peptide linker used in the present disclosure is (G4S)n, where n = 1-4. In some embodiments, the peptide linker used in the present disclosure is (G3S)n, where n = 1-4. In some embodiments, the peptide linker used in the present disclosure is (G4S)2, i.e., SEQ ID NO: 22. In some embodiments, the peptide linker is a C10 linker of SEQ ID NO: 23.
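The repeat notation for the flexible linkers can be expanded mechanically. The following minimal sketch assumes the conventional reading of G4S as the five-residue unit GGGGS and of G3S as GGGS; the function names are hypothetical, and the C10 linker of SEQ ID NO: 23 is not covered because its sequence is not given in this section.

```python
G4S = "GGGGS"  # four glycines followed by one serine
G3S = "GGGS"   # three glycines followed by one serine


def repeat_linker(unit: str, n: int) -> str:
    """Expand a linker written as (unit)n into its amino acid sequence."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return unit * n


def fuse_n_terminal(poi: str, carrier: str, linker: str = "") -> str:
    """Place the polypeptide of interest at the N terminus of the carrier, optionally via a linker."""
    return poi + linker + carrier


if __name__ == "__main__":
    for n in range(1, 5):
        print(f"(G4S){n}", repeat_linker(G4S, n))
        print(f"(G3S){n}", repeat_linker(G3S, n))
```

Over n = 1-4 these linkers span 4 to 20 residues, which matches the length range mentioned for short peptide linkers.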
The polypeptide of interest can be selected according to the desired application of the fusion polypeptide.
In some embodiments, the fusion polypeptide is provided to bind, capture or enrich a target molecule, and the polypeptide of interest is a polypeptide that can recognize a target peptide, including but not limited to a ligand, a receptor, an antigen, and an antibody such as an scFv or a nanobody. For example, the fusion polypeptide is provided to capture a protein comprising a SpyTag (SEQ ID NO: 37), and the polypeptide of interest comprises SpyCatcher (SEQ ID NO: 15), or vice versa.
In some embodiments, the fusion polypeptide is provided as an adhesive agent, and the polypeptide of interest is an adhesive peptide, e.g., Mfp35 (SEQ ID NO: 38) .
In some embodiments, the fusion polypeptide is provided to catalyze chemical or biochemical reactions, and the polypeptide of interest is an enzyme. In some embodiments, the fusion polypeptide is provided to degrade carbohydrates such as cellulose, and the polypeptide of interest can be an endo-1,4-β-glucanase, e.g., from Trichoderma reesei (TrEgl, SEQ ID NO: 19), and/or a β-glucosidase, e.g., from Saccharophagus degradans (SdBgl, SEQ ID NO: 21). In some embodiments, the fusion polypeptide is provided to degrade refractory organics, such as plastics, and the polypeptide of interest is an enzyme responsible for the degradation, such as a PETase.
3. Polynucleotide and Vector
The present disclosure provides a polynucleotide encoding the fusion polypeptide of the present disclosure.
The polynucleotide of the present disclosure can be amplified with cDNA, mRNA or genomic DNA as the template and suitable oligonucleotide primers according to standard PCR amplification techniques. The nucleic acid amplified as above can be cloned into a suitable vector and characterized by DNA sequence analysis.
The polynucleotide of the present disclosure can be prepared by standard synthesis techniques, for example, by using an automated DNA synthesizer.
The present disclosure also relates to the complementary strand of the nucleic acid molecule described herein. A nucleic acid molecule that is complementary to another nucleotide sequence is a molecule that is sufficiently complementary to that nucleotide sequence that the two can hybridize to form a stable duplex.
In order to express the fusion polypeptide of the present disclosure, also provided are a polynucleotide construct and a vector comprising the polynucleotide of the present disclosure, such as an expression vector.
In some embodiments, the polynucleotide of the present disclosure is operably linked to a promoter. In some embodiments, the promoter is a constitutive promoter, such as the native promoter driving the Spa2 gene in Corynebacterium glutamicum. In some embodiments, the promoter is an inducible promoter.
In some embodiments, the expression vector comprises a lac operon.
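For a DNA sequence, the perfectly complementary strand can be derived as shown in the minimal sketch below. It covers only exact Watson-Crick complementarity of the bases A, C, G and T, whereas the text requires only that a molecule be sufficiently complementary to form a stable duplex; the function name is an assumption made for illustration.

```python
# Translation table for Watson-Crick pairing; case is preserved.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Return the complementary strand of `dna`, written 5' to 3'."""
    return dna.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    strand = "ATGC"
    print(strand, reverse_complement(strand))
```

Applying the function twice returns the original strand, which is a quick way to check a sequence for characters outside A, C, G and T being silently passed through unchanged.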
The polynucleotide encoding the polypeptide of the present disclosure can be subjected to various manipulations to allow the expression of the polypeptide. Before the insertion thereof into a vector, manipulation of the polynucleotide according to the expression vector is desirable or necessary. Techniques for modifying polynucleotide sequences with recombinant DNA methods are well known in the art.
In order to identify and select host cells comprising the expression vector of the present disclosure, the vector of the present disclosure preferably contains one or more selectable markers, which allow simple selection of transformed, transfected, transduced, or similar cells. A selectable marker is a gene whose product provides biocide or virus resistance, heavy metal resistance, prototrophy to auxotrophs, etc. For example, the bacterial selectable marker is the dal gene from Bacillus subtilis or Bacillus licheniformis, or a marker that confers antibiotic resistance such as ampicillin, kanamycin, chloramphenicol or tetracycline resistance.
The vector of the present disclosure can be integrated into the genome of the host cell or can replicate autonomously in the cell, independently of the genome. The elements required for integration into the genome of the host cell or for autonomous replication are known in the art (see, for example, the aforementioned Sambrook et al., 1989).
4. Recombinant cell
The present disclosure provides a recombinant cell comprising a polynucleotide encoding a fusion polypeptide, wherein the fusion polypeptide comprises a carrier protein and a polypeptide of interest, wherein the polypeptide of interest is fused to a terminus of the carrier protein or inserted into the carrier protein, wherein the carrier protein is a pilin of CLP, and wherein the recombinant cell is capable of expressing the polynucleotide and displaying a modified CLP comprising the fusion polypeptide.
In some embodiments, the carrier protein in the fusion polypeptide is the native major pilin of the recombinant cell.
In some embodiments, the recombinant cell is a recombinant gram-positive bacterium, such as a bacterium selected from Corynebacterium glutamicum, Bifidobacterium breve, Lactococcus lactis, Lacticaseibacillus paracasei, and Bacillus thuringiensis; preferably, Corynebacterium glutamicum. The bacterium can include, but is not limited to, a bacterium selected from the strains listed with their GenBank assembly accessions for the microorganism in section 2 (Fusion polypeptide) above; preferably, Corynebacterium glutamicum ATCC 14067.
In some embodiments, the carrier protein is a major pilin. In some embodiments, the carrier protein is the native major pilin of the bacterium.
Preferably, the fusion or insertion of the polypeptide of interest does not influence the formation of the intermolecular isopeptide bonds, disulfide bonds, or intramolecular isopeptide bonds in the carrier protein.
In some embodiments, the polypeptide of interest is fused to a terminus of the carrier protein. In some embodiments, the polypeptide of interest is fused to the N terminus of the carrier protein.
In some embodiments, the polypeptide of interest is inserted into the carrier protein. In some embodiments, the polypeptide of interest is inserted into a loop in the carrier protein.
In some embodiments, the carrier protein is a major pilin from Corynebacterium glutamicum (Spa2 protein) . It is observed that the Spa2 protein (SEQ ID NO: 1) comprises three tandem Ig-like domains, including N-domain (residues 36-197) , M-domain (residues 198-343) , and C-domain (residues 344-469) which is consistent with other major pilin. It is also observed that the deletion of M-domain does not influence the formation of CLP. In some embodiments, the polypeptide of interest is inserted into the M domain of the major pilin. In some embodiments, the polypeptide of interest replaces the M domain of the major pilin or a part thereof.
The Spa2 protein from different Corynebacterium glutamicum strains may vary in sequence. In some embodiments, the carrier protein comprises an amino acid sequence of SEQ ID NO: 1, 2, 3, or 4, or an amino acid sequence at least 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99%or 99.5%identical to SEQ ID NO: 1, 2, 3, or 4. In some embodiments, the carrier protein comprises an amino acid sequence of SEQ ID NO: 1, 2, 3, or 4, or an amino acid sequence at least 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99%or 99.5%identical to SEQ ID NO: 1, 2, 3, or 4 with the residues corresponding to residues C97, C128, K194, C380, C432, and LPLTG (474-478) , and optionally E158, D246, and/or E435 of SEQ ID NO: 1 unchanged.
The carrier protein can be the mature form of SEQ ID NO: 1, 2, 3, or 4, i.e., with the deletion of the signal peptide. In some embodiments, the carrier protein comprises amino acids 35 to 509 of SEQ ID NO: 1, amino acids 34 to 520 of SEQ ID NO: 2, amino acids 34 to 530 of SEQ ID NO: 3, or amino acids 34 to 519 of SEQ ID NO: 4, or an amino acid sequence at least 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99% or 99.5% identical to amino acids 35 to 509 of SEQ ID NO: 1, amino acids 34 to 520 of SEQ ID NO: 2, amino acids 34 to 530 of SEQ ID NO: 3, or amino acids 34 to 519 of SEQ ID NO: 4. In some embodiments, the carrier protein comprises amino acids 35 to 509 of SEQ ID NO: 1, amino acids 34 to 520 of SEQ ID NO: 2, amino acids 34 to 530 of SEQ ID NO: 3, or amino acids 34 to 519 of SEQ ID NO: 4, or an amino acid sequence at least 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99% or 99.5% identical to amino acids 35 to 509 of SEQ ID NO: 1, amino acids 34 to 520 of SEQ ID NO: 2, amino acids 34 to 530 of SEQ ID NO: 3, or amino acids 34 to 519 of SEQ ID NO: 4, with the residues corresponding to residues C97, C128, E158, K194, D246, C380, C432, E435, and LPLTG (474-478), and optionally E158, D246, and/or E435, of SEQ ID NO: 1 unchanged.
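By way of non-limiting illustration, the residue ranges recited above are 1-based and inclusive, so extracting a mature region from a sequence held as a string requires shifting the start index by one. The sketch below is not part of the disclosed method; the function name `mature_region` and the placeholder sequence are hypothetical, and the actual sequences are those of SEQ ID NOs: 1-4.

```python
# Mature-region boundaries (1-based, inclusive) as recited in the text,
# keyed by SEQ ID NO.
MATURE_RANGES = {1: (35, 509), 2: (34, 520), 3: (34, 530), 4: (34, 519)}


def mature_region(seq: str, seq_id_no: int) -> str:
    """Return residues start..end (1-based, inclusive) for the given SEQ ID NO."""
    start, end = MATURE_RANGES[seq_id_no]
    if len(seq) < end:
        raise ValueError(f"sequence has {len(seq)} residues; need at least {end}")
    # Python slices are 0-based and end-exclusive, hence start - 1 and end.
    return seq[start - 1:end]


# Placeholder only: this is NOT SEQ ID NO: 1, just a string of the right length.
placeholder = "M" * 509
print(len(mature_region(placeholder, 1)))  # 475 residues (35..509 inclusive)
```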
In some embodiments, the polypeptide of interest is fused to the N terminus of the carrier protein, or is inserted between positions corresponding to G215 and L216 of SEQ ID NO: 1, between positions corresponding to G236 and E237 of SEQ ID NO: 1, or between positions corresponding to G336 and T337 of SEQ ID NO: 1.
In some embodiments, the carrier protein comprises amino acids 35 to 509 of SEQ ID NO: 1. In some embodiments, the polypeptide of interest is fused to the N terminus of the carrier protein, or is inserted between G215 and L216, between G236 and E237, or between G336 and T337 of SEQ ID NO: 1.
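As a non-limiting illustration, the insertion sites recited above can be checked against the domain boundaries given for SEQ ID NO: 1 (N-domain 36-197, M-domain 198-343, C-domain 344-469). The sketch below only restates those numbers; the helper name `domain_of` is hypothetical.

```python
# Domain boundaries of Spa2 (SEQ ID NO: 1), 1-based inclusive, from the text.
SPA2_DOMAINS = {"N": (36, 197), "M": (198, 343), "C": (344, 469)}

# Insertion sites recited in the text, as (residue before, residue after).
INSERTION_SITES = [(215, 216), (236, 237), (336, 337)]


def domain_of(position: int):
    """Return the domain name containing a residue position, or None."""
    for name, (start, end) in SPA2_DOMAINS.items():
        if start <= position <= end:
            return name
    return None


for before, after in INSERTION_SITES:
    # Each line prints the two flanking residues and their domains.
    print(before, after, domain_of(before), domain_of(after))
```

Under these boundaries, all three internal insertion sites fall within the M-domain, which agrees with the embodiments in which the polypeptide of interest is inserted into the M-domain.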
In some embodiments, the polypeptide of interest is directly linked to the N terminus of the carrier polypeptide. In some embodiments, the polypeptide of interest is linked to the N terminus of the carrier polypeptide via a peptide linker such as a flexible linker.
A peptide linker is generally a short peptide of about 4-20 or more amino acids, such as a combination of Ser and Gly residues, which is a conventional flexible linker. In some embodiments, the peptide linker used in the present disclosure is (G4S)n, n=1-4. In some embodiments, the peptide linker used in the present disclosure is (G3S)n, n=1-4. In some embodiments, the peptide linker used in the present disclosure is (G4S)2, i.e., SEQ ID NO: 22. In some embodiments, the peptide linker is a C10 linker of SEQ ID NO: 23.
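For illustration only, the repeat notation used for the flexible linkers can be expanded into one-letter amino acid strings as sketched below. The function name `repeat_linker` is hypothetical, and the output shown is simply what the (G4S)2 notation implies; the sequence listing itself is not reproduced here.

```python
G4S = "GGGGS"  # one (G4S) unit: four Gly followed by one Ser
G3S = "GGGS"   # one (G3S) unit: three Gly followed by one Ser


def repeat_linker(unit: str, n: int) -> str:
    """Expand (unit)n for n = 1-4, the range recited in the text."""
    if not 1 <= n <= 4:
        raise ValueError("n must be between 1 and 4")
    return unit * n


print(repeat_linker(G4S, 2))  # GGGGSGGGGS
print(repeat_linker(G3S, 3))  # GGGSGGGSGGGS
```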
The polypeptide of interest can be selected according to the desired application of the fusion polypeptide. In some embodiments, the fusion polypeptide is provided to degrade carbohydrates such as cellulose, and the polypeptide of interest can be the endo-1,4-β-glucanase from Trichoderma reesei (TrEgl, SEQ ID NO: 19) and/or β-glucosidase from Saccharophagus degradans (SdBgl, SEQ ID NO: 21).
In some embodiments, the recombinant cell comprises two or more polynucleotides respectively encoding two or more fusion polypeptides each comprising a different polypeptide of interest, and the modified CLP comprises the two or more polypeptides.
In some embodiments, the recombinant cell is provided to bind, capture or enrich a target molecule, and the polypeptide of interest is a polypeptide that can recognize a target peptide, including but not limited to a ligand, a receptor, an antigen and an antibody such as an scFv or a nanobody. For example, the recombinant cell is provided to capture a protein comprising a SpyTag (SEQ ID NO: 37), and the polypeptide of interest comprises SpyCatcher (SEQ ID NO: 15), or vice versa.
In some embodiments, the recombinant cell is provided as an adhesive agent, and the polypeptide of interest is an adhesive peptide, e.g., Mfp35 (SEQ ID NO: 38) .
In some embodiments, the recombinant cell is provided to catalyze chemical or biochemical reactions, and the polypeptide of interest is an enzyme. In some embodiments, the recombinant cell is provided to degrade carbohydrates such as cellulose, and the polypeptide of interest can be an endo-1,4-β-glucanase, e.g., from Trichoderma reesei (TrEgl, SEQ ID NO: 19) and/or a β-glucosidase, e.g., from Saccharophagus degradans (SdBgl, SEQ ID NO: 21). In some embodiments, the recombinant cell is provided to degrade refractory organics, such as plastics, and the polypeptide of interest is an enzyme responsible for the degradation, such as a PETase.
The present disclosure provides a method of preparing the recombinant cell of the present disclosure, comprising introducing a polynucleotide encoding the fusion polypeptide of the present disclosure into a host cell.
In some embodiments, the carrier protein in the fusion polypeptide is the native major pilin of the host cell.
In some embodiments, the host cell is a gram-positive bacterium. In some embodiments, the host cell is a bacterium selected from Corynebacterium glutamicum, Bifidobacterium breve, Lactococcus lactis, Lacticaseibacillus paracasei, and Bacillus thuringiensis; preferably, Corynebacterium glutamicum. The bacterium can include, but is not limited to, a bacterium selected from Corynebacterium glutamicum strain BE (GenBank assembly accession: GCA_013046805.1) , Corynebacterium glutamicum ATCC 14067 (GenBank assembly accession: GCA_002243555.1) , Corynebacterium glutamicum strain YI (GenBank assembly accession: GCA_001643035.1) , Corynebacterium glutamicum strain ATCC 13869 (GenBank assembly accession: GCA_001687645.1) , Corynebacterium glutamicum AJ1511 (GenBank assembly accession: GCA_002355675.1) , Corynebacterium glutamicum strain XV (GenBank assembly accession: GCA_001936195.1) , Corynebacterium glutamicum strain CP (GenBank assembly accession: GCA_001447865.2) , Corynebacterium glutamicum R (GenBank assembly accession: GCA_000010225.1) , Corynebacterium glutamicum strain USDA-ARS-USMARC-56828 (GenBank assembly accession: GCA_001518935.2) , Bifidobacterium breve strain LMC520 (GenBank assembly accession: GCA_001990225.1) , Bifidobacterium breve strain BR3 (GenBank assembly accession: GCA_001281425.1) , Bifidobacterium breve strain NRBB51 (GenBank assembly accession: GCA_002838405.1) , Bifidobacterium breve strain NRBB09 (GenBank assembly accession: GCA_002838325.1) , Bifidobacterium breve 12L (GenBank assembly accession: GCA_000568955.1) , Bifidobacterium breve strain DRBB26 (GenBank assembly accession: GCA_002838225.1) , Bifidobacterium breve strain 180W83 (GenBank assembly accession: GCA_002838525.1) , Bifidobacterium breve strain JSRL01 (GenBank assembly accession: GCA_009498435.1) , Bifidobacterium breve 689b (GenBank assembly accession: GCA_000569055.1) , Bifidobacterium breve strain DRBB29 (GenBank assembly accession:
GCA_002838705.1) , Bifidobacterium breve strain DRBB27 (GenBank assembly accession: GCA_002838445.1) , Bifidobacterium breve strain JR01 (GenBank assembly accession: GCA_009931415.1) , Bifidobacterium breve S27 (GenBank assembly accession: GCA_000569075.1) , Bifidobacterium breve ACS-071-V-Sch8b (GenBank assembly accession: GCA_000213865.1) , Bifidobacterium breve strain NRBB56 (GenBank assembly accession: GCA_002838425.1) , Bifidobacterium breve DSM 20213 = JCM 1192 (GenBank assembly accession: GCA_001025175.1) , Bifidobacterium breve strain NRBB01 (GenBank assembly accession: GCA_002838245.1) , Bifidobacterium breve strain FDAARGOS_561 (GenBank assembly accession: GCA_003813065.1) , Bifidobacterium breve strain NCTC11815 (GenBank assembly accession: GCA_900637145.1) , Bifidobacterium breve strain NRBB52 (GenBank assembly accession: GCA_002838385.1) , Bifidobacterium breve strain 082W48 (GenBank assembly accession: GCA_002838545.1) , Bifidobacterium breve strain lw01 (GenBank assembly accession: GCA_003860285.1) , Bifidobacterium breve UCC2003 (GenBank assembly accession: GCA_000220135.1) , Bifidobacterium breve strain  NRBB11 (GenBank assembly accession: GCA_002838305.1) , Bifidobacterium breve strain NRBB04 (GenBank assembly accession: GCA_002838285.1) , Bifidobacterium breve NCFB 2258 (GenBank assembly accession: GCA_000569035.1) , Bifidobacterium breve strain NRBB20 (GenBank assembly accession: GCA_002838645.1) , Bifidobacterium breve strain NRBB27 (GenBank assembly accession: GCA_002838665.1) , Bifidobacterium breve strain NRBB49 (GenBank assembly accession: GCA_002838685.1) , Bifidobacterium breve strain NRBB18 (GenBank assembly accession: GCA_002838605.1) , Bifidobacterium breve strain NRBB02 (GenBank assembly accession: GCA_002838265.1) , Bifidobacterium breve strain NRBB19 (GenBank assembly accession: GCA_002838625.1) , Bifidobacterium breve strain 017W439 (GenBank assembly accession: GCA_002838465.1) , Bifidobacterium breve JCM 7017 (GenBank assembly 
accession: GCA_000568975.1) , Bifidobacterium breve strain NRBB50 (GenBank assembly accession: GCA_002838365.1) , Bifidobacterium breve strain 139W423 (GenBank assembly accession: GCA_002838565.1) , Bifidobacterium breve strain DRBB28 (GenBank assembly accession: GCA_002838505.1) , Bifidobacterium breve strain CNCM I-4321 (GenBank assembly accession: GCA_002838585.1) , Bifidobacterium breve strain DRBB30 (GenBank assembly accession: GCA_002838725.1) , Bifidobacterium breve strain NRBB57 (GenBank assembly accession: GCA_002838345.1) , Bifidobacterium breve strain 215W447a (GenBank assembly accession: GCA_002838485.1) , Lactococcus lactis subsp. cremoris NZ9000 (GenBank assembly accession: GCA_000143205.1) , Lactococcus lactis subsp. cremoris MG1363 (GenBank assembly accession: GCA_000009425.1) , Lactococcus lactis subsp. cremoris A76 (GenBank assembly accession: GCA_000236475.1) , Lactococcus lactis strain SRCM103457 (GenBank assembly accession: GCA_004194355.1) , Lactococcus lactis strain CBA3619 (GenBank assembly accession: GCA_007954765.1) , Lactococcus lactis strain WiKim0098 (GenBank assembly accession: GCA_016406265.1) , Lactococcus lactis strain K_LL005 (GenBank assembly accession: GCA_014334715.1) , Lactococcus lactis subsp. lactis strain G121 (GenBank assembly accession: GCA_013395015.1) , Lactococcus lactis strain N8 (GenBank assembly accession: GCA_014884605.1) , Lactococcus lactis subsp. lactis IO-1 (GenBank assembly accession: GCA_000344575.1) , Lactococcus lactis subsp. lactis strain F44 (GenBank assembly accession: GCA_002804185.1) , Lactococcus lactis subsp. lactis bv. diacetylactis strain S50 (GenBank assembly accession: GCA_003627395.2) , Lactococcus lactis strain FDAARGOS_1064 (GenBank assembly accession: GCA_016127135.1) , Lactococcus lactis strain FDAARGOS_887 (GenBank assembly accession: GCA_016027975.1) , Lactococcus lactis subsp. 
lactis strain UC77 (GenBank assembly accession: GCA_002078615.2) , Lactococcus lactis strain FDAARGOS_866 (GenBank assembly accession: GCA_016028815.1) , Lactococcus lactis strain IL1403 (GenBank assembly accession: GCA_003722275.1) , Lactococcus lactis strain FDAARGOS_865 (GenBank assembly accession: GCA_016028835.1) , Lactococcus lactis subsp. cremoris IBB477 (GenBank assembly accession: GCA_001856165.1) , Lacticaseibacillus paracasei strain TD 062 (GenBank assembly accession: GCA_009834405.1) , Lacticaseibacillus paracasei strain HM1 (GenBank assembly accession: GCA_018064185.1) , Bacillus thuringiensis strain FDAARGOS_794 (GenBank assembly accession: GCA_013267795.1) , Bacillus thuringiensis strain XL6 (GenBank assembly accession: GCA_000774075.2) , Bacillus thuringiensis strain Bt-GS57 (GenBank assembly accession: GCA_017751245.1) , Bacillus thuringiensis strain HER1410 (GenBank assembly accession: GCA_013340745.1) , Bacillus thuringiensis serovar tolworthi (GenBank assembly accession: GCA_001548175.1) , Bacillus thuringiensis strain  BT62 (GenBank assembly accession: GCA_003054785.2) , Bacillus thuringiensis strain HD12 (GenBank assembly accession: GCA_001598095.1) , Bacillus thuringiensis serovar alesti strain BGSC 4C1 (GenBank assembly accession: GCA_001640965.1) , Bacillus thuringiensis LM1212 (GenBank assembly accession: GCA_003546665.1) , Lacticaseibacillus paracasei strain 347-16 (GenBank assembly accession: GCA_012955485.1) , Lacticaseibacillus paracasei subsp. tolerans strain MGB0734 (GenBank assembly accession: GCA_015476135.1) , Lacticaseibacillus paracasei subsp. tolerans strain MGB0747 (GenBank assembly accession: GCA_015476175.1) , Lacticaseibacillus paracasei strain CBA3611 (GenBank assembly accession: GCA_007292115.1) , Lacticaseibacillus paracasei subsp. paracasei strain GR0548 (GenBank assembly accession: GCA_019175405.1) , Lacticaseibacillus paracasei subsp. 
paracasei strain IBB3423 (GenBank assembly accession: GCA_009739485.1) , Lacticaseibacillus paracasei strain NFFJ04 (GenBank assembly accession: GCA_014905075.1) , Lacticaseibacillus paracasei strain HL182 (GenBank assembly accession: GCA_017638905.1) , Lacticaseibacillus paracasei strain Lpc10 (GenBank assembly accession: GCA_003199005.1) , Lacticaseibacillus paracasei subsp. tolerans strain AO356 (GenBank assembly accession: GCA_003957435.1) , Lacticaseibacillus paracasei subsp. tolerans strain MGB0625 (GenBank assembly accession: GCA_015476155.1) , Lacticaseibacillus paracasei strain 10266 (GenBank assembly accession: GCA_008329845.1) , Lacticaseibacillus paracasei subsp. tolerans strain S-NB (GenBank assembly accession: GCA_016757695.1) , Lacticaseibacillus paracasei strain Lp02 (GenBank assembly accession: GCA_013307125.1) , Lacticaseibacillus paracasei strain ZFM54 (GenBank assembly accession: GCA_003627255.1) , Lacticaseibacillus paracasei subsp. paracasei strain TMW 1.1434 (GenBank assembly accession: GCA_002813615.1) , Lacticaseibacillus paracasei strain SRCM103299 (GenBank assembly accession: GCA_004141835.1) , Lacticaseibacillus paracasei strain NJ (GenBank assembly accession: GCA_007637635.1) , Lacticaseibacillus paracasei strain EG9 (GenBank assembly accession: GCA_003177075.1) , Lacticaseibacillus paracasei strain TK-P4A (GenBank assembly accession: GCA_015377585.1) , Lacticaseibacillus paracasei subsp. paracasei strain BD5115 (GenBank assembly accession: GCA_018596415.1) , and Lacticaseibacillus paracasei subsp. Paracasei JCM 8130 (GenBank assembly accession: GCA_000829035.1) , preferably, Corynebacterium glutamicum ATCC 14067.
In some embodiments, the host cell is modified to inactivate the native major pilin. In some embodiments, the method comprises a step of knocking out the native major pilin. The endogenous polynucleotide encoding the major pilin can also be replaced by the polynucleotide encoding the fusion polypeptide via homologous recombination.
5. Modified CLP
The present disclosure provides a modified covalently-linked pili (CLP) comprising a plurality of the fusion polypeptides of the present disclosure. In some embodiments, the modified CLP is cell-free.
The present disclosure further provides a method of preparing a modified CLP comprising the steps of a) providing the fusion polypeptide of the present disclosure; and b) providing an activity of sortase. In some embodiments, the modified CLP is cell-free.
In some embodiments, the fusion polypeptide is provided by transcribing and/or translating the polynucleotide of the present disclosure. In some embodiments, the activity of sortase is provided by transcribing and/or translating one or more polynucleotides encoding a sortase.
In some embodiments, the sortase is encoded by a gene which is identified to be present in the same cluster with the gene encoding the carrier protein in nature. In some embodiments, the method comprises contacting the fusion polypeptide of the present disclosure with the sortase protein. In some embodiments, the sortase is a class C type sortase, such as srtC1 and/or srtC2, preferably wherein the srtC1 and srtC2 are encoded by genes from the same cluster. In some embodiments, the method is an in vitro method.
The present disclosure provides a polynucleotide construct or a combination of polynucleotide constructs comprising the polynucleotide of the present disclosure, and one or more polynucleotides encoding a sortase.
In some embodiments, the sortase is encoded by a gene which is identified to be present in the same cluster with the gene encoding the carrier protein in nature. In some embodiments, the sortase is a class C type sortase, such as srtC1 and/or srtC2, preferably wherein the srtC1 and srtC2 are encoded by genes from the same cluster.
Benefits
The modified CLP and recombinant cell achieve the cascade reaction of enzymes and improve the catalytic efficiency of a multi-enzyme system. The immobilization of enzymes onto CLP and recombinant cells can achieve whole-cell catalysis.
Examples
Example 1. Materials and Methods
Unless otherwise indicated, the experiments in the Examples are conventional in the art, and the experiments employing commercially available kits or reagents were carried out according to the manufacturer’s instructions.
1.1. Strains, plasmids, and media.
General Method
The original DNA sequence was fully synthesized (Genewiz, Nanjing, China) or PCR-generated. All PCR products were generated by KOD DNA polymerase (TOYOBO, Japan). All plasmid construction was performed using the T4 DNA ligase (New England BioLabs, Boston, MA) for ligations or the NEB Builder HiFi DNA Assembly Master Mix (New England BioLabs, Boston, MA) for assembly. All plasmids or markerless strains were confirmed by DNA sequencing (GENEWIZ, Guangzhou, China). Primers used in the Examples are listed in Table 1.
Table 1. Primers
Figure PCTCN2022130033-appb-000006
Figure PCTCN2022130033-appb-000007
Growth Media
C. glutamicum ATCC14067⁵ was provided by Dr. Zheng’s research group at the South China University of Technology. C. glutamicum ATCC14067 was grown in BHI liquid medium for recovery (37 g L⁻¹ brain heart infusion (Becton, Dickinson and Company)) at 30 ℃, 250 rpm, overnight. For CgCLP formation, C. glutamicum ATCC14067 was inoculated into M63 liquid medium (15.6 g L⁻¹ M63 Broth (Sangon Biotech, Guangzhou, China), supplemented with 1 mM MgSO4 and 0.2% (wt/vol) glucose) and cultivated in an incubator at 30 ℃ without shaking for 2-3 days. Antibiotics for C. glutamicum culture were kanamycin (25 μg mL⁻¹) and chloramphenicol (7.5 μg mL⁻¹).
Isopropyl-β-D-thiogalactoside (IPTG) at 1 mM or 0.5 mM, or theophylline at 1 mM, was used to induce gene expression. Trans1-T1 (TransGen Biotech, Shenzhen, China) was used as the cloning host for plasmid manipulation, and E. coli BL21 (DE3) (New England BioLabs, Boston, MA) was used for protein expression. E. coli was cultured in Luria-Bertani medium (10 g L⁻¹ peptone, 5 g L⁻¹ yeast extract, 10 g L⁻¹ NaCl) at 37 ℃, or at 16 ℃ when applicable for protein expression. Antibiotics for E. coli culture were kanamycin (50 μg mL⁻¹) and chloramphenicol (30 μg mL⁻¹).
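As a worked illustration of the concentrations given in this subsection, the following sketch converts them into amounts for a chosen batch volume. It is only bench arithmetic; the function names, the 0.5 L batch size, the 10 mL culture volume and the 50 mg mL⁻¹ kanamycin stock concentration are assumptions introduced for the example and are not stated in the text.

```python
def powder_grams(conc_g_per_l: float, volume_l: float) -> float:
    """Mass of a dry component for a medium specified in g per litre."""
    return conc_g_per_l * volume_l


def percent_wv_grams(percent_wv: float, volume_l: float) -> float:
    """Mass for a % (wt/vol) component: x % means x g per 100 mL, i.e. 10x g per litre."""
    return percent_wv * 10.0 * volume_l


def stock_volume_ul(final_ug_per_ml: float, culture_ml: float,
                    stock_mg_per_ml: float) -> float:
    """Volume of stock (in microlitres) from C1*V1 = C2*V2.

    A stock in mg/mL is numerically equal to ug/uL, so the result is in uL.
    """
    return final_ug_per_ml * culture_ml / stock_mg_per_ml


batch_l = 0.5  # assumed batch volume
print("M63 Broth powder (g):", powder_grams(15.6, batch_l))
print("Glucose for 0.2% wt/vol (g):", percent_wv_grams(0.2, batch_l))
# Kanamycin at 25 ug/mL in an assumed 10 mL culture from an assumed 50 mg/mL stock.
print("Kanamycin stock (uL):", stock_volume_ul(25, 10, 50))
```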
Strain construction
The markerless deletion strains of C. glutamicum ATCC 14067 were achieved by the RecET-Cre/loxP system. Detailed methods for markerless deletion are described in Huang, Y. et al. (Recombineering using RecET in Corynebacterium glutamicum ATCC14067 via a self-excisable cassette. Sci. Rep. 7, 1-8, 2017) .
Briefly, to create a CLP-defective strain, Δclp, we first constructed a self-excisable cassette, the Δclp-cassette. Primer pair ck-S/A was used to amplify the fragment of the Cre-Kan cassette from the PBS-Cre-Kan plasmid. Primer pairs clpL-S/A and clpR-S/A were used to amplify ~800 bp left and right homologous fragments from the genome of C. glutamicum ATCC 14067. Finally, all dsDNA fragments, including the Cre-Kan cassette and the left and right homologous fragments, were used for subsequent fusion PCR to generate a ~4,385 bp linear self-excisable dsDNA cassette with primer pair clpL-S/clpR-A.
Similarly, to construct the Δspa1-cassette, primer pairs spa1L-S/A, spa1R-S/A, ck-S/A and spa1L-S/spa1R-A were used to amplify the left and right homologous fragments, Cre-Kan cassette, and the linear self-excisable dsDNA cassettes, respectively.
For the Δspa2-cassette, primer pairs spa2L-S/A, spa2R-S/A, ck-S/A and spa2L-S/spa2R-A were used to amplify the left and right homologous fragments, Cre-Kan cassette, and the linear self-excisable dsDNA cassettes, respectively.
For the Δspa3-cassette, primer pairs spa3L-S/A, spa3R-S/A, ck-S/A and spa3L-S/spa3R-A were used to amplify the left and right homologous fragments, Cre-Kan cassette, and the linear self-excisable dsDNA cassettes, respectively.
For the ΔsrtC1ΔsrtC2-cassette, primer pairs srtC1L-S/A, srtC2R-S/A, ck-S/A and srtC1L-S/srtC2R-A were used to amplify the left and right homologous fragments, Cre-Kan cassette, and the linear self-excisable dsDNA cassettes, respectively.
For the ΔsrtA-cassette, primer pairs srtAL-S/A, srtAR-S/A, ck-S/A and srtAL-S/srtAR-A were used to amplify the left and right homologous fragments, Cre-Kan cassette, and the linear self-excisable dsDNA cassettes, respectively.
For the Δdec-cassette, primer pairs decL-S/A, decR-S/A, ck-S/A and decL-S/decR-A were used to amplify the left and right homologous fragments, Cre-Kan cassette, and the linear self-excisable dsDNA cassette, respectively.
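The primer naming used for the cassettes above follows a regular pattern, which can be summarized in a short sketch. The code below is illustrative only: it assumes that "X-S/A" denotes the primer pair X-S and X-A, the function and class names are hypothetical, and the primer sequences themselves are those of Table 1 and are not reproduced.

```python
from typing import NamedTuple, Optional, Tuple


class CassettePrimers(NamedTuple):
    left_arm: Tuple[str, str]   # amplifies the left homologous fragment
    right_arm: Tuple[str, str]  # amplifies the right homologous fragment
    cre_kan: Tuple[str, str]    # amplifies the Cre-Kan cassette
    fusion: Tuple[str, str]     # outer primers used for the fusion PCR


def cassette_primers(left: str, right: Optional[str] = None) -> CassettePrimers:
    """Build primer-pair names for a deletion cassette.

    `right` differs from `left` only for the double deletion, where the left
    arm is named after srtC1 and the right arm after srtC2.
    """
    right = right or left
    return CassettePrimers(
        left_arm=(f"{left}L-S", f"{left}L-A"),
        right_arm=(f"{right}R-S", f"{right}R-A"),
        cre_kan=("ck-S", "ck-A"),
        fusion=(f"{left}L-S", f"{right}R-A"),
    )


print(cassette_primers("spa2").fusion)            # ('spa2L-S', 'spa2R-A')
print(cassette_primers("srtC1", "srtC2").fusion)  # ('srtC1L-S', 'srtC2R-A')
```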
Then the self-excisable dsDNA cassettes for markerless deletion of the different genes were transformed by electroporation into competent cells of C. glutamicum ATCC 14067 expressing the exonuclease-recombinase pair RecE/T, yielding multiple Kan-resistant colonies on BHI agar plates. In particular, the cell-plasmid DNA/dsDNA mixture was transferred to an ice-cold electroporation cuvette (0.1 cm electrode gap). Electroporation was performed with a Bio-Rad MicroPulser using three 1.8 kV/cm (Ec1) pulses (see Huang et al., Recombineering using RecET in Corynebacterium glutamicum ATCC14067 via a self-excisable cassette, Sci Rep 7, 7916 (2017)).
To achieve markerless deletion mutants, expression of the Cre enzyme was induced by adding 1 mM theophylline, and the selectable marker was excised by Cre/lox site-specific recombination. Finally, PCR fragments amplified from the genomes of the mutants were sequenced for further identification. The resultant mutant strains used in this study were referred to as C. glutamicum ATCC 14067 Δclp (Δclp), C. glutamicum ATCC 14067 Δspa1 (Δspa1), C. glutamicum ATCC 14067 Δspa2 (Δspa2), C. glutamicum ATCC 14067 Δspa3 (Δspa3), and C. glutamicum ATCC 14067 ΔsrtC1ΔsrtC2 (ΔsrtC1ΔsrtC2). The C. glutamicum ATCC 14067 Δspa1Δspa3 (Δspa1Δspa3) mutant was constructed by transforming the Δspa3-cassette into the Δspa1 strain. The C. glutamicum ATCC 14067 Δspa2ΔsrtA (Δspa2ΔsrtA) and C. glutamicum ATCC 14067 Δspa2Δdec (Δspa2Δdec) mutants were constructed by transforming the ΔsrtA-cassette and the Δdec-cassette into the Δspa2 strain, respectively, as described above.
Plasmid construction
i) Construction of plasmids for constitutive expression of Spa2 pilin and different fusion proteins
[Rectified under Rule 91, 16.01.2023]
The pEC-XK99E plasmid was used as the original plasmid. DNA fragments of the pEC-XK99E backbone (GENEWIZ, China), the coding sequence of Spa2 or various recombinant Spa2 (SEQ ID NOs: 1, 5, 8-14, and 24, respectively), and the native promoter (SEQ ID NO: 25) of the spa2 gene were obtained via PCR, and then all the DNA fragments were assembled by NEB Builder HiFi DNA Assembly Master Mix to construct the plasmids pEK-spa2, pEK-spa2cut, pEK-E1/mCherry-spa2, pEK-E2/mCherry-spa2, pEK-E3/mCherry-spa2, pEK-E4/mCherry-spa2, pEK-6his-spa2, pEK-SpyTagSpa2, pEK-Mfp3Spep-Spa2, and pEK-N-mCherry-C (see Fig. 1 for example).
ii) Construction of pEK-SpyCatcher-Spa2, pEK-Venus-Spa2, pEK-CcEgl-Spa2, pEK-N-Ven_C-Ven, pEK-N-Ven-Spa2, pEK-C-Ven-Spa2, pEK-N-Ven-Spa2_C-Ven-Spa2, pEC-TrEgl_SdBgl and pEC-TrEgl-Spa2_SdBgl-Spa2 plasmids
The two basic plasmids 203 and 204 (see Fig. 2) were constructed based on pEC-XK99E backbone with additional restriction sites of SmaI, XbaI, NcoI, BamHI, SpeI and SalI by Gibson assembly with NEB Builder HiFi DNA Assembly Master Mix. SmaI, XbaI, and NcoI were used to fuse proteins with Spa2 pilin, and SpeI and SalI (Takara) were used to insert another independent expression cassette for fusion protein.
To create the plasmids of pEK-SpyCatcher-Spa2, pEK-Venus-Spa2, pEK-CcEgl-Spa2, pEK-N-Ven-Spa2, and pEK-TrEgl-Spa2, the coding sequences (CDSs) of SpyCatcher, Venus, CcEgl, N-Ven, and TrEgl (SEQ ID NOs: 15-19) were cloned into the SmaI and XbaI sites in 203 by ligation.
To construct the plasmids of pEK-N-Ven and pEK-TrEgl, the CDSs of N-Ven and TrEgl (SEQ ID NOs: 18 and 19) were inserted into the linearized backbone of 203 (digestion with SmaI and SpeI, Takara) via Gibson assembly.
To create the plasmids of pEK-C-Ven-Spa2 and pEK-SdBgl-Spa2, the CDSs of C-Ven and SdBgl (SEQ ID NOs: 20 and 21) were cloned into the SmaI and XbaI sites in 204 by ligation.
To construct the plasmids pEK-C-Ven and pEK-SdBgl, the CDSs of C-Ven and SdBgl (SEQ ID NOs: 20 and 21) were inserted into the linearized backbone of 204 (digestion with SmaI and SalI, Takara) via Gibson assembly.
Finally, the C-Ven-Spa2 cassette was obtained by digesting pEK-C-Ven-Spa2 with SpeI and SalI, and then, cloned into the plasmid of pEK-N-Ven-Spa2 (digested with SpeI and SalI, Takara) to construct tandem expression plasmids of pEK-N-Ven-Spa2_C-Ven-Spa2 (see Fig. 3) .
A similar strategy was used to construct the other tandem expression plasmids pEK-TrEgl-Spa2_SdBgl-Spa2, pEK-N-Ven_C-Ven, and pEK-TrEgl_SdBgl. pEC-TrEgl-Spa2_SdBgl-Spa2 and pEC-TrEgl_SdBgl were constructed by replacing the kanamycin resistance with chloramphenicol resistance (see Fig. 3).
iii) Construction of the pZ9-dxs_crtEBI plasmid
[Rectified under Rule 91, 16.01.2023]
The gene fragments of dxs (SEQ ID NO: 26) and crtEBI (crtE, SEQ ID NO: 27, and crtBI, SEQ ID NO: 28) were amplified from the genome of C. glutamicum ATCC 13032 with primer pairs dxs-A/dxs-S, crtE-S/crtE-A, and crtBI-S/crtBI-A, respectively; the Ptac promoter (SEQ ID NO: 30) driving dxs and crtEBI was amplified with primer pair ptrc-S/ptrc-A; and the lacI fragment (SEQ ID NO: 29) was amplified from pEC-XK99E with primer pair lacI-S/lacI-A.
Then, the dxs, crtEBI and lacI fragments were assembled into the pZ9 backbone (GENEWIZ, China) by Gibson assembly to construct the pZ9-dxs_crtEBI plasmid (Fig. 4) .
iv) Construction of the pET-28a-Spa2 plasmid
The coding sequence of Spa2 (SEQ ID NO: 6) was amplified from the genome of C. glutamicum ATCC 14067, and then assembled into the pET-28a (+) backbone (Novagen, Madison, WI) by Gibson assembly (see Fig. 5) .
Transmission electron microscopy and immunogold labelling.
Transmission electron microscope imaging. C. glutamicum cells cultured for 2-3 days in M63 medium were collected and washed twice in PBS buffer, and 20 μL of liquid culture in M63 (OD600 ≈ 1) was deposited onto carbon-coated TEM grids for 5-10 min. The samples were washed two times with 50 μL PBS buffer and three times with 20 μL water, and then the excess solution was quickly wicked away with filter paper. The cells deposited onto the copper wire mesh were negatively stained with 15 μL of 2% (w/v) uranyl acetate solution for 1 min and dried for 10 min under an infrared lamp. Samples were examined in a JEOL JEM-1400 transmission electron microscope at an accelerating voltage of 120 kV.
[Rectified under Rule 91, 16.01.2023]
Immunogold labelling. Partial CDSs of the CgCLP pilins Spa1 (SEQ ID NO: 31, Spa1-Ab), Spa2 (SEQ ID NO: 32, Spa2-Ab) and Spa3 (SEQ ID NO: 33, Spa3-Ab) were expressed in E. coli, purified and injected into rabbits to prepare the specific polyclonal antibodies α-Spa1, α-Spa2 and α-Spa3 (Your Bio-Tech Partner, Shanghai, China), respectively.
For immunogold labelling, 20 μL of liquid culture of C. glutamicum in M63 (OD600 ≈ 1) was placed on carbon-coated grids for 10 min, and washed two times with PBS buffer and three times with water. The samples were blocked with PBS containing 1% bovine serum albumin (BSA; Sangon Biotech, A600332-0100) for 30 min. The solution was wicked off with filter paper, and the cells deposited onto the copper wire mesh were stained with a pilin primary antibody (the polyclonal antibodies above) diluted 1:200 in PBS with 1% BSA for 1 h, followed by washing and blocking (PBS + 1% BSA). Samples were stained with 10 nm gold-decorated goat anti-rabbit IgG (Bioss, Beijing, China) diluted 1:50 in PBS with 1% BSA for 45 min, followed by washing three times with PBS and five times with water. Then, negative staining as described above, drying and imaging were performed. Double immunogold labelling experiments were performed according to Budzik, J. M. et al. (Assembly of pili on the surface of Bacillus cereus vegetative cells. Mol. Microbiol. 66, 495-510, 2007) with some modifications. Briefly, after the incubation with the primary antibody, samples were incubated with PBS containing 3% paraformaldehyde and 2% glutaraldehyde for 2 h at room temperature. Samples were washed three times with PBS and incubated with 0.02 M glycine in PBS for 10 min at room temperature. The immunogold labelling process was then performed with the second pilin antibody and different sizes (5 nm, 15 nm or 30 nm) of gold-decorated goat anti-rabbit IgG (Bioss, Beijing, China), followed by negative staining, drying and imaging.
Quantitative assay of CLP via whole-cell filtration ELISA.
The quantitative assay of CLP was performed by whole-cell filtration ELISA, a method described for detecting extracellular amyloids (see Nguyen, P. Q. et al., Programmable biofilm-based materials from engineered curli nanofibres. Nat. Commun. 5, 1-10, 2014). Briefly, C. glutamicum strains were cultured for 48 h in M63 liquid medium, and the cultures were collected, washed and diluted to an OD600 of 0.1 in Tris-buffered saline with 0.1% ProClin™ 300 (Sigma, 48912-U) on ice. Then, 25 μL of the diluted culture was loaded in a MultiScreen-GV 96-well filter plate (0.22 μm pore size; EMD Millipore), followed by washing (TBST (Sangon Biotech, C520009-0005) + 0.1% ProClin™ 300), blocking (TBST + 0.1% ProClin™ 300 + 1% bovine serum albumin + 0.01% H2O2), incubating with α-Spa2 (diluted 1:5,000 in TBST + 0.1% ProClin™ 300), washing and blocking as above, and incubating with goat anti-rabbit HRP-conjugated secondary antibody (Sangon Biotech, Guangzhou, China; diluted 1:5,000 in TBST + 0.1% ProClin™ 300). Subsequently, a chromogenic reaction was performed with Ultra-TMB (3,3′,5,5′-tetramethylbenzidine, Thermo Fisher, 34028), which was terminated by the addition of 2 M H2SO4. Finally, the absorbance of the product was measured at 450 nm (with a reference wavelength of 650 nm) with a Cytation reader (BioTek).
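As a hedged illustration of how a 450 nm reading with a 650 nm reference wavelength is commonly reduced to a single signal, the sketch below subtracts the reference reading from the measurement reading for each well and then subtracts the mean of blank wells. The text does not state how the readings were processed, so both the subtraction convention and the blank subtraction are assumptions; the function names and the absorbance values are hypothetical.

```python
from statistics import mean


def reference_corrected(a450: float, a650: float) -> float:
    """Assumed convention: subtract the reference-wavelength reading."""
    return a450 - a650


def well_signal(sample_wells, blank_wells) -> float:
    """Mean corrected absorbance of sample wells minus that of blank wells.

    Each well is given as an (A450, A650) pair.
    """
    sample = mean(reference_corrected(a, r) for a, r in sample_wells)
    blank = mean(reference_corrected(a, r) for a, r in blank_wells)
    return sample - blank


# Hypothetical readings, for illustration only.
samples = [(0.85, 0.05), (0.95, 0.05)]
blanks = [(0.10, 0.04), (0.12, 0.06)]
print(round(well_signal(samples, blanks), 3))
```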
AFM imaging.
In total, 2 mL of culture in M63 liquid medium was incubated on a mica surface for 2-4 h to allow sample deposition. Excess solution was removed with a pipette, and the surface was washed two times with water. The samples were then dried with nitrogen gas and immediately collected for AFM imaging. ScanAsyst mode AFM was performed on a Dimension FastScan™ AFM (Bruker) using silica cantilevers (SCANASYST-AIR, Bruker, k = 0.4 N/m, ~70 kHz).
Expression and purification of recombinant Spa2.
The recombinant Spa2 was expressed as an N-terminally His-tagged protein. E. coli BL21 (DE3) cells transformed with plasmid pET-28a-Spa2 (CaCl2 method) were grown overnight at 37 ℃ to provide a starter culture for expression. A total of 1 L medium with 50 μg mL⁻¹ kanamycin was inoculated with 1% (v/v) of the starter culture and grown at 37 ℃. When the OD600 reached 0.8, the cultivation temperature was lowered to 16 ℃ and IPTG was added to a final concentration of 0.5 mM to induce protein overexpression. After 16 h, cells were collected by centrifugation, and the cell pellets were suspended in buffer A (50 mM Tris-HCl, 150 mM NaCl, pH 8.0) and lysed by high pressure homogenization. The cell lysates were centrifuged at 12,000 rpm for 30 min at 4 ℃.
The resulting supernatant was loaded onto a nickel-affinity column (5 mL, GE) pre-equilibrated with buffer A (50 mM Tris-HCl, 150 mM NaCl, pH 8.0). His-tagged Spa2 protein was eluted with buffer A containing 50 mM imidazole. The His-tagged Spa2 protein was buffer-exchanged into buffer A and subjected to tag removal by HRV3c (SEQ ID NO: 34, 1 mg/50 mg Spa2) at 4 ℃ overnight. The digested product was loaded onto the 5-mL Ni-NTA column (GE) and eluted with a buffer A/buffer B (buffer A + 500 mM imidazole) gradient (5% buffer B, 10% buffer B, 20% buffer B and 100% buffer B). The flow-through at 10% buffer B was collected.
Further purification was performed via ion-exchange chromatography (HiTrap Q HP, 5 mL, Cytiva) and size-exclusion chromatography (Uniondex 75 pg 16/60, UNION-BIOTECH, China). The whole procedure of protein purification was carried out at 4 ℃.
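For orientation, the imidazole concentration at each step of the buffer A/buffer B gradient follows directly from the stated composition of buffer B (buffer A + 500 mM imidazole), assuming buffer A itself contains no imidazole as its recipe indicates. The sketch below is simple arithmetic; the function name is hypothetical.

```python
IMIDAZOLE_IN_BUFFER_B_MM = 500.0  # buffer B = buffer A + 500 mM imidazole
GRADIENT_STEPS_PERCENT_B = [5, 10, 20, 100]


def imidazole_mm(percent_b: float) -> float:
    """Imidazole concentration (mM) of an A/B mixture at a given % buffer B."""
    return IMIDAZOLE_IN_BUFFER_B_MM * percent_b / 100.0


for step in GRADIENT_STEPS_PERCENT_B:
    print(f"{step}% buffer B -> {imidazole_mm(step):.0f} mM imidazole")
# 5% -> 25 mM, 10% -> 50 mM, 20% -> 100 mM, 100% -> 500 mM
```

On this arithmetic, the fraction collected at 10% buffer B corresponds to 50 mM imidazole, the same concentration used for the first elution of the His-tagged protein.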
Protein crystallization and structure determination.
The final purified protein was concentrated to 20 mg mL-1 in 10 mM Tris-HCl pH 8.0 and 50 mM NaCl for crystallization. The sitting drop vapor diffusion technique (http://soft-matter.seas.harvard.edu/index.php/Vapor_Diffusion_Method) was used to crystallize the Spa2 protein. Crystals were obtained by mixing 4 μL of Spa2 protein with 4 μL reservoir solution (0.2 M sodium sulfate, 0.1 M Bis-Tris propane pH 7.5, 20% w/v PEG 3350) and incubating the mixture at 18 ℃ for 1-2 weeks. The crystals were soaked in a cryo-protectant solution consisting of the reservoir solution and 20% (v/v) glycerol and then flash-frozen in liquid nitrogen. Diffraction data were collected on the BL18U1 beamline at the Shanghai Synchrotron Radiation Facility (Shanghai, China) with flash-frozen crystals (at 100 K in a stream of nitrogen gas). The data were processed by XDS (ref. 9) and then further processed using STARANISO (ref. 10; a server of Global Phasing).
The recombinant Spa2 crystal form diffracted to 
Figure PCTCN2022130033-appb-000008
resolution (Fig. 14) and belongs to the space group P2₁2₁2₁, with unit-cell parameters 
Figure PCTCN2022130033-appb-000009
α=β=γ=90.0° and two molecules in the asymmetric unit. The structure was solved by the molecular replacement method using PHASER (ref. 11) with the Spa2 coordinates predicted by AlphaFold Colab (ref. 12) as the template. Further manual model building was carried out using COOT (ref. 13). The model was refined with PHENIX (ref. 14). Data collection, phasing and refinement statistics are given in Table 3. Structure figures were prepared using PyMOL 2.3.4 (https://pymol.org/2/).
Fluorescence measurements.
Plate-reader measurements. C. glutamicum colonies were inoculated into 10 mL BHI and cultured for 12 h. Cells were then transferred into M63 medium at an initial OD600 of 0.1 and cultured for 3 days at 30 ℃ without shaking. Cells were collected by centrifugation at 5,000 rpm, washed three times with PBS and diluted with PBS (OD600 ≈ 0.5). Exactly 200 μL of each sample was transferred to a flat-bottom 96-well black plate and analyzed on a Tecan Infinite Pro 200 Plate Reader, with excitation/emission wavelengths of 580/610 nm for mCherry fluorescence intensity and 510/545 nm for Venus fluorescence intensity. The normalized fluorescence intensity was calculated as the fluorescence intensity divided by the OD600 of the sample.
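The normalization described above is a single division per well. A minimal Python sketch is given here for illustration; the function name and the example numbers are hypothetical and are not taken from the measurements.

```python
def normalized_fluorescence(fluorescence, od600):
    """Return fluorescence intensity per unit of optical density.

    fluorescence -- raw plate-reader intensity (arbitrary units)
    od600        -- optical density of the same sample at 600 nm
    """
    if od600 <= 0:
        raise ValueError("OD600 must be positive")
    return fluorescence / od600


# Hypothetical well: 1200 a.u. measured at OD600 = 0.5.
print(normalized_fluorescence(1200.0, 0.5))
```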
Fluorescence (confocal) microscopy imaging. Cells prepared for plate-reader measurements were dripped on a glass slide and imaged under a Nikon TI2-E inverted microscope. Microscope light source power, detector gain, and image processing settings were consistent among different samples.
Strains expressing SpyTag-Spa2, SpyCatcher-Spa2 and Spa2 (strain Δspa2 transformed with pEK-SpyTagSpa2, pEK-SpyCatcherSpa2, and pEK-spa2, respectively) were cultured in glass-bottom dishes in M63 for 3 days. The dishes were then gently washed three times with PBS containing 0.5% Tween 80 (PBST) and blocked in PBST with 1% BSA for 1 h. The SpyTag-Spa2 and Spa2 group was incubated with purified GFP-SpyCatcher (SEQ ID NO: 35), and the SpyCatcher-Spa2 and Spa2 group was incubated with purified GFP-SpyTag (SEQ ID NO: 36) for 1 h at room temperature. All samples were washed three times with PBS buffer and imaged under a Nikon TI2-E inverted microscope.
Microsphere binding tests.
Spa2 strain or the Mfp3Spep-Spa2 strain was cultured in the M63 medium (3 mL) supplemented with 200 μL of green-fluorescent PS microsphere solution in 35-mm Petri dishes containing 2-3 glass slides for 3 days at 30℃ without shaking. The settled glass slides were then taken out and gently flushed to wash away the microspheres that had not adhered. The binding capacity of different samples was compared with water jetting at a constant discharge pressure of 5 psi for 15 s, performed on a pressure-flow controller (PG-MFC-8CH, PreciGenome) . Fluorescence images were recorded before and after the mechanical challenge with water jetting.
Mass spectrometry analysis.
1) Preparation of samples.
i) Preparation of Spa2cut and its mutant variants.
The pEK-spa2cut plasmid was transferred into Δspa2 by electroporation as described above to construct the strain Δspa2-pEK-spa2cut, which was used to express the monomer of Spa2cut (SEQ ID NO: 5) . Cells were inoculated into M63 medium with 25 μg mL-1 kanamycin and cultured for 3 days. Supernatants (200 mL) were collected and concentrated into 1 mL and then purified by nickel-affinity chromatography as previously described in the section of “Expression and purification of recombinant Spa2” . Spa2cut was eluted with 100 mM imidazole. The final purified protein was buffer-exchanged into 10 mM Tris-HCl, 100 mM NaCl, pH 8.0. A similar process was followed for expression and purification of Spa2cut mutant variants of E158Acut, D246Acut, E435Acut, and D246A/E435Acut.
ii) Isolation of  CgCLP.
A method for isolation of SpaA pili of C. diphtheriae was adopted for the collection of CgCLP fibers (see Kang, H. J. et al., The Corynebacterium diphtheriae shaft pilin SpaA is built of tandem Ig-like modules with stabilizing isopeptide and disulfide bonds. Proc. Natl. Acad. Sci. U.S.A. 106, 16967-16971, 2009). Specifically, engineered CgCLP for polymer purification was produced by transforming the plasmid pEK-6his-spa2 into the Δspa2ΔsrtA strain, which lacks the spa2 gene and the housekeeping sortase-encoding gene srtA. Because it lacks sortase A, the Δspa2ΔsrtA-pEK-6his-spa2 strain secretes the expressed 6His-CgCLP into the culture medium. For the expression of 6His-CgCLP polymers, Δspa2ΔsrtA-pEK-6his-spa2 cells were inoculated into M63 medium with 25 μg mL-1 kanamycin and cultured for 3 days. For 6His-CgCLP purification, 500 mL of supernatant was collected, concentrated to 5 mL in a buffer of 10 mM Tris-HCl, 100 mM NaCl, pH 8.0, and purified by nickel-affinity chromatography. The 6His-CgCLP polymers were eluted with 100 mM imidazole. Purified 6His-CgCLP fibers were then boiled in SDS sample buffer (6× Protein Loading Buffer, TransGen Biotech, DL101-02) and subjected to SDS-PAGE. The high-molecular-weight CgCLP polymer bands were excised from Coomassie brilliant blue-stained SDS-PAGE gels and prepared for intermolecular isopeptide bond identification.
2) Protein precipitation and digestion.
i) Samples processed for signal peptide identification.
The Spa2cut solution was precipitated with acetone (1:4) and the pellets were dried using a SpeedVac (room temperature) for 1-2 min. The pellets were then dissolved in 100 mM Tris-HCl (pH 8.5) supplemented with 8 M urea. TCEP (5 mM; Thermo Scientific) for reduction and iodoacetamide (10 mM; Sigma) for alkylation were added, and the mixture was incubated at room temperature for 30 min. The protein mixture was diluted (1:4) and digested overnight with chymotrypsin at 1:40 (w/w). The protease-digested peptide solution was desalted using a MonoSpin™ C18 column (GL Science, Tokyo, Japan) and dried with a SpeedVac.
ii) Samples processed for intramolecular covalent bond identification.
For the identification of the intramolecular isopeptide bond, the Spa2cut sample was processed following the same protocol as described above for signal peptide identification. For the identification of the disulfide bond, the Spa2cut sample was processed following a similar protocol, except that pepsin (Promega) was used for digestion and the addition of 5 mM TCEP (Thermo Scientific) was omitted to ensure that the disulfide bond, if any, was kept intact.
iii) Samples processed for inter-molecular isopeptide bond identification.
The Coomassie brilliant blue-stained SDS-PAGE gel band of CgCLP fibers was excised into small pieces and washed in water, followed by 50 mM NH4HCO3 in 50% acetonitrile and 100% acetonitrile. The sample was reduced with 10 mM TCEP (Thermo Scientific) in 100 mM NH4HCO3 at 55 ℃ for 1 h and alkylated with 55 mM iodoacetamide (Sigma) in 100 mM NH4HCO3 at 37 ℃ in the dark for 30 min. The gel pieces were then washed with 100 mM NH4HCO3 and 100% acetonitrile, and dried. The sample was first digested with 3 μg trypsin (Promega) in 50 mM NH4HCO3 at 37 ℃ overnight, then 1 μg of Asp-N endoproteinase (Promega) was added for another overnight incubation. Digested peptides were extracted twice with 50% acetonitrile containing 5% formic acid.
3) LC/tandem MS (MS/MS) analysis of peptide.
The protease-digested peptides were analyzed by LC-MS/MS using an Easy-nLC 1200 nano HPLC (Thermo Scientific) coupled to a Q Exactive Orbitrap mass spectrometer (Thermo Scientific). Peptides were separated on a 30 cm-long pulled-tip analytical column (75 μm ID, packed with ReproSil-Pur C18-AQ 1.9 μm resin, Dr. Maisch GmbH) using 0.1% aqueous formic acid (buffer A) and 0.1% formic acid in 80% acetonitrile (buffer B) at 55 ℃ with a flow rate of 300 nL/min and a 120 min linear gradient. In each cycle, one full-scan MS spectrum (m/z 300-1800) was acquired, followed by the top 20 MS/MS events, sequentially generated on the first to the 20th most intense ions selected from the full MS spectrum at a 30% normalized collision energy. The peptide validation for signal peptide identification was automatically performed in PEAKS AB v2.0 (Tran, N. H. et al. Complete de novo assembly of monoclonal antibody sequences. Sci. Rep. 6, 1-10, 2016). Peptides containing isopeptide bonds were identified using pLink 2 software (pFind Team, Beijing, China) (Lu, S. et al. Mapping native disulfide bonds at a proteome scale. Nat. Methods 12, 329-331, 2015). Peptides from the digestion of CgCLP containing the intermolecular isopeptide bond were manually analyzed from MS/MS data according to the theoretical m/z of the product ions of predicted peptides containing the isopeptide linkage.
4) Accurate molecular mass determination.
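The "top 20" acquisition cycle described above amounts to ranking the ions of each full scan by intensity and fragmenting the most intense ones. The Python sketch below illustrates only that ranking step on a peak list; it is a simplification that ignores instrument features such as dynamic exclusion and charge-state filtering, and the function name and data layout are hypothetical.

```python
def select_top_n_precursors(peaks, n=20, mz_min=300.0, mz_max=1800.0):
    """Pick the n most intense ions of one full-scan spectrum.

    peaks  -- list of (mz, intensity) tuples
    n      -- number of MS/MS events per cycle (20 in the text)
    mz_min -- lower bound of the full-scan window (m/z 300)
    mz_max -- upper bound of the full-scan window (m/z 1800)
    """
    in_window = [p for p in peaks if mz_min <= p[0] <= mz_max]
    # Most intense first, so MS/MS events follow the 1st..nth ranking.
    ranked = sorted(in_window, key=lambda p: p[1], reverse=True)
    return ranked[:n]


# Hypothetical five-peak spectrum; two peaks fall outside the window.
spectrum = [(250.0, 9e6), (400.0, 1e5), (500.0, 5e5),
            (1900.0, 8e6), (600.0, 3e5)]
print(select_top_n_precursors(spectrum, n=2))
```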
Accurate molecular masses were determined for Spa2cut and its variants by Quadrupole time-of-flight mass spectrometry (Agilent 6550 iFunnel Q-TOF) using a linear gradient by an HPLC system. The raw MS data were deconvoluted by the BioConfirm algorithm integrated into MassHunter software.
Enzymatic activity assay.
The enzyme activity of cellulases against carboxymethylcellulose sodium salt (CMC-Na, Sigma, USA) was detected using a 3,5-dinitrosalicylic acid (DNS) assay (Dong, C. et al. Engineering Pichia pastoris with surface-display minicellulosomes for carboxymethyl cellulose hydrolysis and ethanol production. Biotechnol. Biofuels 13, 1-9, 2020). Cells of TrEgl-Spa2_SdBgl-Spa2 (C003 strain) and TrEgl_SdBgl (C004 strain) at 10 OD were concentrated to 500 μL and incubated in 2 mL 50 mM acetic acid (pH 4.8) with 1% (w/v) CMC-Na substrate at 50 ℃ for 30 min. The reaction was stopped by adding DNS and boiling for 10 min; reducing sugars were detected at 540 nm. One unit of enzyme activity was defined as the amount of cells that released 1 μmol of glucose from cellulose at 50 ℃ in 1 min.
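The unit definition above can be turned into a short calculation once the amount of released reducing sugar is known. The Python sketch below assumes the reducing sugar has already been quantified as a glucose-equivalent mass (for example from a DNS standard curve, which is not described in the source); the molar mass of glucose is a standard value, and the function name and example numbers are hypothetical.

```python
GLUCOSE_MOLAR_MASS = 180.16  # g/mol, standard value (not from the source)


def enzyme_units(glucose_mg, minutes):
    """Return activity in units (micromoles of glucose released per minute).

    glucose_mg -- glucose-equivalent reducing sugar released, in mg
    minutes    -- reaction time in minutes (30 min in the assay above)
    """
    if minutes <= 0:
        raise ValueError("reaction time must be positive")
    micromoles = glucose_mg / GLUCOSE_MOLAR_MASS * 1000.0  # mg -> umol
    return micromoles / minutes


# Hypothetical example: 1.8016 mg glucose released in 30 min.
print(round(enzyme_units(1.8016, 30), 4))
```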
Quantitative analysis of lycopene by HPLC.
The lycopene-producing plasmid pZ9-dxs_crtEBI was transferred into strain TrEgl_SdBgl to construct the recombinant strains C003 and C004 for the utilization of cellulose to produce lycopene. C003 and C004 strains were inoculated into 10 mL BHI with 25 μg mL-1 kanamycin and 7.5 μg mL-1 chloramphenicol, and cultured for 12 h at 30 ℃ at 200 rpm. Cells were then transferred into 50 mL modified M63 medium (15.6 g L-1 M63 broth, supplemented with 1 mM MgSO4 and 2% (wt/vol) CMC-Na) at an initial OD600 of 3 and cultured for 2 days at 30 ℃, with or without the addition of 1 mM IPTG.
The quantitative analysis of lycopene production was carried out according to Li, C. et al. (Heterologous production of α-Carotene in Corynebacterium glutamicum using a multi-copy chromosomal integration method. Bioresour. Technol. 341, 125782, 2021). IPTG-induced and uninduced cells (1 mL) were separately collected into 2 mL tubes of lysing matrix Y (M. P. Biomedicals) by centrifugation at 12,000 rpm for 5 min. The pellets were resuspended in a 60% hexane and 40% acetone mixture and lysed using the FastPrep®-24 5G bead beating grinder and lysis system (M. P. Biomedicals) for lycopene extraction. Lysis was performed in six cycles of 30 s each, with a 1 min interval between cycles.
The samples were centrifuged at 14,000 rpm for 10 min at 4 ℃, and the resulting supernatant was then transferred to brown 2 mL screw-cap glass vials (Agilent Technologies) and directly subjected to HPLC analysis. The quantification of lycopene was performed on an Agilent 1260 series HPLC system (Agilent Technologies) using a YMC Carotenoid column (250 × 4.6 mm I.D., YMC) and detected via a diode array detector (DAD) at 450 nm. For separation, binary gradient elution was applied to change the eluent from 100% eluent A of methanol/methyl tert-butyl ether/water (81/15/4) to 100% eluent B of methanol/methyl tert-butyl ether/water (7/90/3) over 90 min at a flow rate of 1.0 mL min-1 at 20 ℃ with an injection volume of 10 μL (eluent A for 2 min, eluent B from 2 min to 95 min, and eluent A from 95 min to 100 min).
Example 2. Probing the molecular assembly of the CLP structure in C. glutamicum
This Example was carried out to investigate the CLP assembly in the industrial workhorse C. glutamicum ATCC 14067 (referred to as  CgCLP) .
2.1. Determination of the essential building block in CLP assembly in C. glutamicum
The industrial workhorse C. glutamicum is a 'generally recognized as safe' (GRAS) strain with well-established gene editing tools that is widely used for the industrial-scale production of valued products such as amino acids, diamines, terpenoids, and other chemicals (Zhao, N. et al. Development of a Transcription Factor-Based Diamine Biosensor in Corynebacterium glutamicum. ACS Synth. Biol. 10, 3074-3083, 2021; and Xu, X. et al. Microbial chassis development for natural product biosynthesis. Trends Biotechnol. 38, 779-796, 2020).
In C. glutamicum, we predicted that the CLP BGC contains three pilin-encoding genes, spa1, spa2, and spa3, as well as two sortase-encoding genes, srtC1 and srtC2 (Fig. 6), which is similar to the SpaH-type (a relatively less well-studied pili type) CLP gene cluster in the pathogenic C. diphtheriae (Mandlik, A. et al., Pili in Gram-positive bacteria: assembly, involvement in colonization and biofilm development. Trends Microbiol. 16, 33-40, 2008).
TEM and AFM imaging revealed no filamentous structures at the C. glutamicum cell surface upon deletion of the CLP BGC, whereas the filamentous phenotype was rescued upon complementation with the CLP BGC (Fig. 7), indicating that the CgCLP BGC is responsible for fiber formation.
The composition of  CgCLP was determined with polyclonal antibodies against Spa1, Spa2, and Spa3, respectively. TEM images of the  CgCLP with immunogold labelling showed that the  CgCLP fibers comprise two minor pilins of Spa1 and Spa3 and a major pilin of Spa2 (Fig. 8) . TEM and AFM imaging used to assess the specific roles of the three pilins in the  CgCLP assembly showed that the cells, which were defective for Spa1 (Δspa1 strain) , Spa3 (Δspa3 strain) , or both (Δspa1Δspa3 strain) , could still produce fibers (Fig. 7) . In contrast, cells lacking Spa2 (Δspa2) could not produce any fiber, and overexpression of Spa2 (Spa2) promoted the formation of abundant long fibers throughout the cell surface (Fig. 7) .
TEM and AFM images also showed that fiber formation was completely blocked in cells lacking both SrtC1 and SrtC2 (ΔsrtC1ΔsrtC2) (Fig. 9).
Collectively, these results verified that the major pilin Spa2 is an indispensable building block for sortase-catalyzed CgCLP assembly and production, similar to the role of the well-studied SpaA in pili assembly in the pathogenic C. diphtheriae. Despite this similarity, the wide variation in the size and sequence of major pilin proteins from diverse Gram-positive pathogens (ref. 35) makes it challenging to predict whether the structural principles characterized for the CLP of other hosts are also applicable to CgCLP.
2.2. Isopeptide bond and disulfide bond during the  CgCLP assembly
Having identified the Spa2 major pilin as the essential building block for  CgCLP fiber production, experiments were performed to identify the formation of intermolecular isopeptide bond, disulfide bond, or intramolecular isopeptide bond during the  CgCLP assembly.
First, the purified CgCLP polymers were excised from Coomassie blue-stained SDS-PAGE gels (Fig. 10) and then digested in-gel with trypsin (Promega) and AspN endoproteinase (Promega). Liquid chromatography-tandem mass spectrometry (LC-MS/MS) was used to analyze the digestion products and verify the presence of the intermolecular isopeptide bond (bond formation results in the elimination of a water molecule and thus a slight decrease in molecular weight). Specifically, the peptide peak at m/z 832.9 (2+) (Fig. 11 and Table 2) suggested that the major pilin Spa2 was cross-linked between K194 in the N-terminus of Spa2(i) and T477 in the C-terminus of Spa2(i+1) (Lys194-Thr477).
Table 2. Daughter ions produced during MS/MS of a peptide at m/z 832.9 (2+) containing the Lys194-Thr477 intermolecular isopeptide bond of Spa2
Figure PCTCN2022130033-appb-000010
a Monoisotopic masses of observed ions.
b Theoretical ions. Monoisotopic masses were calculated using the Fragment Ion Calculator (http://db.systemsbiology.net:8080/proteomicsToolkit/FragIonServlet.html).
c Difference between observed ion mass and theoretical ion mass.
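For readers comparing the doubly charged precursor at m/z 832.9 with monoisotopic masses such as those in Table 2, the conversion between m/z and neutral mass can be sketched as follows. This Python snippet is illustrative only: it assumes protonated ions, uses the standard proton mass (a value not given in the source), and the function names are hypothetical.

```python
PROTON_MASS = 1.007276  # Da, standard value (not from the source)


def neutral_mass(mz, charge):
    """Neutral monoisotopic mass of a protonated ion [M + zH]z+."""
    return (mz - PROTON_MASS) * charge


def mz_from_mass(mass, charge):
    """m/z of a protonated ion with the given neutral mass and charge."""
    return mass / charge + PROTON_MASS


# Precursor reported in the text: m/z 832.9 with charge 2+.
print(round(neutral_mass(832.9, 2), 2))
```

Under these assumptions the precursor corresponds to a neutral mass of roughly 1663.8 Da.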
Quadrupole time-of-flight mass spectrometry analysis of a recombinant variant of Spa2 (Spa2cut, SEQ ID NO: 5) (Fig. 12) secreted by C. glutamicum cells indicated a molecular weight of 46,504.6 Da (Fig. 13), which is about 54.7 Da less than the expected value calculated from the secreted Spa2cut amino acid sequence. This detected mass is consistent with the loss of three NH3 units and two H2 units, indicating the formation of three intramolecular isopeptide bonds (each with loss of one molecule of ammonia, ≈17 Da) and two disulfide bonds (each with loss of two hydrogen atoms, ≈2 Da) in Spa2.
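The mass bookkeeping in the paragraph above can be checked with a few lines of Python. This is an illustrative sketch: the average masses of NH3 and H2 are standard values that are not stated in the source, and the function name is hypothetical.

```python
NH3_MASS = 17.031  # Da, average mass lost per Lys-Asn isopeptide bond
H2_MASS = 2.016    # Da, average mass lost per disulfide bond


def expected_mass_loss(n_isopeptide, n_disulfide):
    """Mass lost when the given numbers of bonds form within one protein."""
    return n_isopeptide * NH3_MASS + n_disulfide * H2_MASS


predicted = expected_mass_loss(n_isopeptide=3, n_disulfide=2)
observed_deficit = 54.7  # Da, reported difference from the sequence mass
print(round(predicted, 3), round(predicted - observed_deficit, 3))
```

Under these assumptions, three isopeptide bonds and two disulfide bonds predict a loss of about 55.1 Da, within about 0.5 Da of the reported deficit.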
2.3. The structural features of major pilin Spa2
The X-ray crystal structure of Spa2 at
Figure PCTCN2022130033-appb-000011
resolution (PDB ID: 7WOI) (Fig. 15 and Table 3) was solved by the molecular replacement method using PHASER with the coordinates predicted by AlphaFold Colab as a template (Fig. 16a).
Table 3. Data collection and refinement statistics
Figure PCTCN2022130033-appb-000012
a Values in parentheses correspond to the outermost shell of data.
b $R_{\mathrm{merge}} = \sum_{h}\sum_{i} |I(h)_i - \langle I(h) \rangle| / \sum_{h}\sum_{i} |I(h)_i|$, where $\langle I(h) \rangle$ is the mean equivalent intensity.
c $R_{\mathrm{work}} = \sum |F_o - F_c| / \sum |F_o|$, where $F_o$ and $F_c$ are the observed and calculated structure factor amplitudes, respectively.
d $R_{\mathrm{free}} = \sum |F_o - F_c| / \sum |F_o|$. This value was calculated using a test data set comprising 5% of the total data that was randomly selected from the observed reflections.
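The R-factor definitions in footnotes c and d share one formula and differ only in which reflections are summed. The Python sketch below illustrates that formula on plain lists of amplitudes; the function name and the example values are hypothetical, and real refinement programs apply scaling and other corrections that are not shown here.

```python
def r_factor(f_obs, f_calc):
    """Return sum(|Fo - Fc|) / sum(|Fo|) over paired amplitudes.

    f_obs  -- observed structure factor amplitudes
    f_calc -- calculated structure factor amplitudes (same order)
    """
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("need two non-empty lists of equal length")
    numerator = sum(abs(fo - fc) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator


# Hypothetical amplitudes, for illustration only.
print(round(r_factor([100.0, 200.0, 300.0], [90.0, 210.0, 300.0]), 4))
```

Applying the same function to the working reflections gives R work, and applying it to the held-out 5% test set gives R free.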
Spa2 is arranged in three tandem Ig-like domains, including N-domain (residues 36-197, pink) , M-domain (residues 198-343, blue) , and C-domain (residues 344-469, green) , giving an elongated molecule
Figure PCTCN2022130033-appb-000013
in length (Fig. 15). These three tandem Ig-like domains of Spa2 are similar to those of the major pilin SpaA (PDB ID: 3HR6, root-mean-square deviation (RMSD) 
Figure PCTCN2022130033-appb-000014
over 270 alpha-carbon (Cα) atoms, Fig. 16b) and SpaD (PDB ID: 4HSS, RMSD
Figure PCTCN2022130033-appb-000015
over 311 Cα atoms, Fig. 16c) from the human pathogen C. diphtheriae (Kang, H. J. et al., 2009 above, and Kang, H. J. et al. A slow-forming isopeptide bond in the structure of the major pilin SpaD from Corynebacterium diphtheriae has implications for pilus assembly. Acta Crystallogr. D Biol. Crystallogr. 70, 1190-1201, 2014). The Spa2 molecules in the crystal adopt head-to-tail stacking such that the N-domain in Spa2(i) abuts against the C-domain in Spa2(i+1) (Fig. 15), which is consistent with the result that the Spa2 monomers are joined via the intermolecular isopeptide bond between K194 in the N-terminus of Spa2(i) and T477 in the C-terminus of Spa2(i+1) (Fig. 11). Together these results imply that the biological assembly of the CgCLP fiber occurs via the head-to-tail polymerization of Spa2 monomers.
Furthermore, interpretation of electron density maps clearly showed three common isopeptide bonds and two unique disulfide bonds in the structure of Spa2 (Fig. 17). Formation of multiple covalent bonds was also verified by LC-MS/MS analysis of the pepsin-digested Spa2cut products (Fig. 18). The isopeptide bonds linked Lys57 and Asn195 with catalytic Glu158 in the N-domain; Lys203 and Asn318 with catalytic Asp246 in the M-domain; and Lys355 and Asn466 with catalytic Glu435 in the C-domain (Fig. 17a). Notably, the presence of three intramolecular isopeptide bonds distributed across the three domains of the major pilin Spa2 in C. glutamicum is similar to the major pilin SpaD from the pathogenic C. diphtheriae (Kang, H. J. et al., 2014 above), but is quite different from the major pilin SpaA from the pathogenic C. diphtheriae, which lacks isopeptide bonds in the N-terminal domain (Kang, H. J. et al., 2009 above). In addition, two disulfide bonds were formed, one in the N-domain between Cys97 and Cys128 and one in the C-domain between Cys380 and Cys432 (Fig. 17b). Notably, the presence of two disulfide bonds in Spa2 is unusual in comparison with other major pilins in human pathogens, such as Spy0128 (PDB ID: 3B2M) from Streptococcus pyogenes (ref. 37) and BcpA (PDB ID: 3KPT) from Bacillus cereus (ref. 38), which lack disulfide bonds, and SpaA and SpaD from C. diphtheriae, which contain only one disulfide bond in the C-terminal domain (Kang, H. J. et al., 2009 and 2014 above).
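The RMSD values quoted above summarize how far matched Cα atoms lie from one another after two structures have been superposed. The Python sketch below shows only the final RMSD formula for coordinates that are already superposed and paired; the optimal superposition step is not shown, and the function name and example coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between paired 3D coordinates.

    coords_a, coords_b -- equal-length lists of (x, y, z) tuples for
    matched atoms, assumed to be superposed beforehand.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty lists of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Two hypothetical atoms, each displaced by 1 unit along z.
print(rmsd([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
           [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]))
```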
2.4. The intermolecular polymerization between Spa2 monomers
Functional assays with various Spa2 mutant variants expressed in the Δspa2 strain were conducted to explore their roles in CgCLP formation in vivo. Mutagenesis experiments showed that the K194A and LPLTG474-478LALAA variants blocked CgCLP production, confirming that both Lys194 in the N-domain and LPLTG 474-478 in the C-domain participate in Spa2 monomer polymerization (Figs. 19 and 20).
A series of Spa2 variants were generated to further test how the intramolecular isopeptide bond and disulfide bond in Spa2 monomer contribute to the formation and stabilization of  CgCLP.
First, variants of Spa2 with alanine substitutions of Glu158, Asp246, and Glu435 (E158A, D246A, E435A), the residues that catalyze Lys-Asn isopeptide bond formation in each domain, were constructed in the Δspa2 strain (each substitution was introduced into the Spa2-encoding sequence in pEK-Spa2). The LC-MS/MS analysis, bio-imaging characterization, and ELISA quantification analysis showed that E158A, D246A, and E435A abolished one or two intramolecular isopeptide bonds (Fig. 21a-c), none of which had any obvious impact on CgCLP production (Figs. 19 and 20). Only the double mutant D246A/E435A abolished all three intramolecular isopeptide bonds in Spa2 (Fig. 21d) and produced only 44.9% of the CgCLP produced by Spa2 cells (Fig. 20).
Second, the C97A and C380A variants abrogated the disulfide bonds in the N- and C-domains of Spa2, respectively. TEM showed the influence of these mutations in Spa2 on CgCLP formation (Fig. 19), and ELISA showed a dramatic reduction in the extent of CgCLP formation for these Spa2 variants (Fig. 20). We also found that CgCLP formation was completely blocked in a C97A/C380A double mutant variant (Figs. 19 and 20).
Taken together, these results suggest that both isopeptide and disulfide bonds contribute to the formation of CLP in C. glutamicum, with the disulfide bond appearing as the most important element for stabilization of the  CgCLP structure.
Example 3. Engineering  CgCLP as a programmable extracellular protein scaffold
The CLP structure may serve as an attractive building block for various applications because these extracellular fibers have extraordinarily high tensile strength owing to their extensive inter-and intra-molecular isopeptide bonds. Moreover, as an extracellular matrix, CLP fibers can be conveniently and reliably positioned directly outside cells. Finally, their proteinaceous nature makes them potentially amenable for elaboration using genetic engineering.
This Example was carried out to determine suitable fusion sites for appending peptides/proteins to Spa2. Based on both the Spa2 crystal structure and the characterization of specific functional domains within Spa2 in Example 2, four different positions were selected to test the fusion of a protein-of-interest (POI), with one site at the N-terminus of Spa2 and three sites in the M-domain, which lacks a disulfide bond (Fig. 22).
The CLP-defective strain C. glutamicum ATCC 14067 Δspa2 (Δspa2) with abrogated extracellular  CgCLP formation was transformed with the exogenous expression plasmid (pEK-E1/mCherry-spa2, pEK-E2/mCherry-spa2, pEK-E3/mCherry-spa2, or pEK-E4/mCherry-spa2) for Spa2 fusion protein expression to test the restored  CgCLP fiber production.
The fluorescent reporter protein mCherry was fused at the interrogated positions for generating functional fusion proteins (SEQ ID NOs: 8-11) while retaining the sortase-catalyzed covalently-linked pili formation capacity of Spa2. As shown in Fig. 22, four sites were tested for mCherry addition/insertion, including Q35 (E1) at the N-terminus of Spa2, G215 in loop 1 of the M-domain (E2) , G236 in the loop 2 of the M-domain (E3) , and G336 in the β23-sheet of the M-domain (E4) . Quantitative analysis (ELISA) showed that the cells expressing each of the fusion proteins fluoresced and enabled the formation of fiber (Fig. 23a) .
Confocal microscopy showed that mCherry fluorescence was detected for all engineered variants, with fluorescence evident at extracellular sites on the C. glutamicum cells (Fig. 23b), consistent with TEM imaging results showing that mCherry-functionalized CgCLP fibers formed on the surface of cells (Fig. 24). Combining the results of ELISA, fluorescence intensity, confocal microscopy and TEM imaging, it is concluded that E1 and E2 are the more suitable sites for fusion of a functional POI, yielding abundant functionalized CgCLP fibers.
A variety of Spa2 fusion proteins (six POIs, each fused at the E1 position via a linker of SEQ ID NO: 23) (see Fig. 25) were expressed by Δspa2 strains transformed with plasmids pEK-6his-spa2, pEK-SpyTagSpa2, pEK-Mfp3Spep-Spa2, pEK-SpyCatcher-Spa2, pEK-Venus-Spa2, and pEK-CcEgl-Spa2, respectively. All of these fusion proteins were successfully expressed, secreted, and formed  CgCLP (Fig. 26) .
TEM images showed that Ni-NTA-decorated AuNPs were anchored onto 6His-Spa2 CgCLP (Fig. 27a). Confocal microscopic images showed the green fluorescence emitted from SpyTag-Spa2 CgCLP cells to which SpyCatcher-EGFP protein binding partners were covalently attached via SpyTag-SpyCatcher interaction pairs (Fig. 27b). Confocal microscopic images showed the green fluorescence emitted from SpyCatcher-Spa2 CgCLP cells to which SpyTag-EGFP protein binding partners were covalently attached via SpyTag-SpyCatcher interaction pairs (Fig. 27c). Confocal microscopic images showed the green fluorescence emitted from Venus-Spa2 CgCLP cells (Fig. 27d). Fluorescence images and quantification analysis showed the immobilization ability of Mfp3Spep-Spa2 CgCLP cells (Fig. 27e): immobilized microspheres on the substrates (left) were imaged before (top) and after (bottom) challenge with water jetting at a constant discharge pressure of 5 psi, and the relative capabilities of different cells to immobilize PS microspheres on the substrate were quantified (right). The degradation of carboxymethyl cellulose into glucose by CcEgl-Spa2 CgCLP cells was detected by a 3,5-dinitrosalicylic acid (DNS) assay (Fig. 27f).
These findings indicate that the sortase-mediated polymerization is not disrupted by fusion of POIs to Spa2 monomers, especially fusion of POIs in the N-terminus of Spa2 and loop 1 of the M-domain, and various types and sizes of proteins can be engineered into a generally programmable extracellular protein scaffold of  CgCLP.
To assess whether the programmable CgCLP extracellular protein scaffold can support the co-assembly of multiple heterologous proteins, we conducted experiments in the Δspa2 strain with the well-established split-Venus system (see Fig. 28, and Kodama, Y. & Hu, C.-D. An improved bimolecular fluorescence complementation assay with a high signal-to-noise ratio. BioTechniques 49, 793-805, 2010).
The Δspa2 strain was transformed with plasmids pEK-N-Ven-Spa2, pEK-C-Ven-Spa2 and pEK-N-Ven-Spa2_C-Ven-Spa2, respectively. The Δspa2 strain transformed with pEK-N-Ven_C-Ven was used as a control.
As indicated by TEM images of the transformed cells, co-assembly of two distinct proteins did not disturb  CgCLP assembly (Fig. 29) . The fluorescence intensity assay and confocal microscopy imaging showed that the highest fluorescence intensity was observed in cells  where the split-Venus components were simultaneously fused with Spa2 (Fig. 30) . Almost no fluorescence was detected when only N-Ven and C-Ven were simultaneously secreted without anchoring to the  CgCLP scaffold (Fig. 30) . These results indicated that the split components can be co-assembled in the extracellular  CgCLP scaffold.
Example 4. Engineering living materials to degrade cellulosic biomass into valued chemicals
This Example was carried out to verify the co-assembly of multiple cellulases into a catalytic cascade for extracellular degradation of cellulose into glucose to support production of specific chemicals of interest (e.g., lycopene) in C. glutamicum ATCC 14067 Δspa2 (Fig. 31) .
In particular, the endo-1, 4-β-glucanase from Trichoderma reesei (TrEgl, SEQ ID NO: 19) and β-glucosidase from Saccharophagus degradans (SdBgl, SEQ ID NO: 21) were co-assembled in the  CgCLP fiber; these two enzymes are known to work in concert to degrade cellulose into glucose via enzyme cascade reactions.
Lycopene can be produced via the methylerythritol phosphate (MEP) pathway by engineered C. glutamicum (Li, C. et al. Heterologous production of α-Carotene in Corynebacterium glutamicum using a multi-copy chromosomal integration method. Bioresour. Technol. 341, 125782, 2021). A C001 chassis (Δspa2Δdec) with deletion of both the spa2 gene (Δspa2, for the abrogation of CgCLP formation) and a 43,702 bp region between CEY17_RS03380 and CEY17_RS03560 (Δdec, for accumulation of the precursor for lycopene production) (Heider, S. A. et al., Carotenoid biosynthesis and overproduction in Corynebacterium glutamicum. BMC Microbiol. 12, 1-11, 2012) was constructed as described in Example 1. The basal lycopene-producing strain C002 was constructed by transforming strain C001 with plasmid pZ9-dxs_crtEBI for IPTG-inducible expression of the dxs gene and crtEBI gene cluster. Then, the C002 strain was transformed with plasmids pEC-TrEgl-Spa2_SdBgl-Spa2 and pEC-TrEgl_SdBgl, respectively, resulting in the strains C003 and C004.
As shown in Fig. 32, the C003 strain co-assembled TrEgl and SdBgl in the CgCLP fiber on the cell surface (Fig. 32a) and enabled the degradation of carboxymethylcellulose sodium (CMC-Na, an ether derivative of cellulose) in the medium, as indicated by the medium turning from a viscous gel to a thin solution (Fig. 32b). Strain C004, which simultaneously secreted both TrEgl and SdBgl without anchoring them to the CgCLP scaffold, did not show similar behavior.
Extracellular cellulase activity assays showed that the C003 strain produced a 4-fold higher yield of reducing sugars than strain C004 (Fig. 32c). As shown in Fig. 32d, the lycopene production titer in the C003 strain reached 0.83 mg/g dry cell weight (DCW) after 36 h of culture in an M63 medium with CMC-Na as the sole carbon source.
SEQUENCES
SEQ ID NO: 1 Wildtype Spa2
Figure PCTCN2022130033-appb-000016
SEQ ID NO: 2 Wildtype Spa2
Figure PCTCN2022130033-appb-000017
SEQ ID NO: 3 Wildtype Spa2
Figure PCTCN2022130033-appb-000018
SEQ ID NO: 4 Wildtype Spa2
Figure PCTCN2022130033-appb-000019
SEQ ID NO: 5 Spa2 cut
Figure PCTCN2022130033-appb-000020
Figure PCTCN2022130033-appb-000021
SEQ ID NO: 6 Recombinant Spa2
Figure PCTCN2022130033-appb-000022
SEQ ID NO: 7 mCherry
Figure PCTCN2022130033-appb-000023
SEQ ID NO: 8 E1/mCherry-spa2
Figure PCTCN2022130033-appb-000024
SEQ ID NO: 9 E2/mCherry-spa2
Figure PCTCN2022130033-appb-000025
SEQ ID NO: 10 E3/mCherry-spa2
Figure PCTCN2022130033-appb-000026
SEQ ID NO: 11 E4/mCherry-spa2
Figure PCTCN2022130033-appb-000027
SEQ ID NO: 12 6his-spa2
Figure PCTCN2022130033-appb-000028
SEQ ID NO: 13 SpyTagSpa2
Figure PCTCN2022130033-appb-000029
SEQ ID NO: 14 Mfp3Spep-Spa2
Figure PCTCN2022130033-appb-000030
SEQ ID NO: 15 SpyCatcher
Figure PCTCN2022130033-appb-000031
SEQ ID NO: 16 Venus
Figure PCTCN2022130033-appb-000032
SEQ ID NO: 17 CcEgl
Figure PCTCN2022130033-appb-000033
SEQ ID NO: 18 N-Ven
Figure PCTCN2022130033-appb-000034
SEQ ID NO: 19 TrEgl
Figure PCTCN2022130033-appb-000035
SEQ ID NO: 20 C-Ven
Figure PCTCN2022130033-appb-000036
SEQ ID NO: 21 SdBgl
Figure PCTCN2022130033-appb-000037
SEQ ID NO: 22 Linker 1 (GS)
Figure PCTCN2022130033-appb-000038
SEQ ID NO: 23 Linker 2 (C10)
Figure PCTCN2022130033-appb-000039
SEQ ID NO: 24 N-mCherry-C
Figure PCTCN2022130033-appb-000040
[Rectified under Rule 91, 16.01.2023]
SEQ ID NO: 25 Spa2 promoter
Figure PCTCN2022130033-appb-000041
[Rectified under Rule 91, 16.01.2023]
SEQ ID NO: 26 dxs
Figure PCTCN2022130033-appb-000042
[Rectified under Rule 91, 16.01.2023]
SEQ ID NO: 27 crtE
Figure PCTCN2022130033-appb-000043
Figure PCTCN2022130033-appb-000044
[Rectified under Rule 91, 16.01.2023]
SEQ ID NO: 28 crtBI
Figure PCTCN2022130033-appb-000045
[Rectified under Rule 91, 16.01.2023]
SEQ ID NO: 29 lacI
Figure PCTCN2022130033-appb-000046
[Rectified under Rule 91, 16.01.2023]
SEQ ID NO: 30 Ptac promoter
Figure PCTCN2022130033-appb-000047
[Rectified under Rule 91, 16.01.2023]
SEQ ID NO: 31 Spa1-Ab
Figure PCTCN2022130033-appb-000048
Figure PCTCN2022130033-appb-000049
[Rectified under Rule 91, 16.01.2023]
SEQ ID NO: 32 Spa2-Ab
Figure PCTCN2022130033-appb-000050
SEQ ID NO: 33 Spa3-Ab
Figure PCTCN2022130033-appb-000051
SEQ ID NO: 34 HRV3c
Figure PCTCN2022130033-appb-000052
SEQ ID NO: 35 GFP-SpyCatcher
Figure PCTCN2022130033-appb-000053
SEQ ID NO: 36 GFP-SpyTag
Figure PCTCN2022130033-appb-000054
SEQ ID NO: 37 SpyTag
Figure PCTCN2022130033-appb-000055
SEQ ID NO: 38 Mfp35
Figure PCTCN2022130033-appb-000056

Claims (39)

  1. A fusion polypeptide comprising a carrier protein and a polypeptide of interest, wherein the polypeptide of interest is fused to a terminus of the carrier protein or inserted into the carrier protein, and wherein the carrier protein is a pilin of covalently-linked pili (CLP) from a microorganism.
  2. The fusion polypeptide of claim 1, wherein the microorganism is a gram-positive bacterium, such as a bacterium selected from Corynebacterium glutamicum, Bifidobacterium breve, Lactococcus lactis, Lacticaseibacillus paracasei, and Bacillus thuringiensis; preferably, Corynebacterium glutamicum.
  3. The fusion polypeptide of claim 1 or 2, wherein the carrier protein is a major pilin.
  4. The fusion polypeptide of any of claims 1-3, wherein the polypeptide of interest is fused to the N or C terminus of the carrier protein.
  5. The fusion polypeptide of claim 4, wherein the polypeptide of interest is fused to the N terminus of the carrier protein.
  6. The fusion polypeptide of any of claims 1-3, wherein the polypeptide of interest is inserted into the carrier protein.
  7. The fusion polypeptide of claim 6, wherein the polypeptide of interest is inserted into a loop in the carrier protein.
  8. The fusion polypeptide of claim 6, wherein the carrier protein is the major pilin from Corynebacterium glutamicum, and wherein the polypeptide of interest is inserted into the M domain of the major pilin.
  9. The fusion polypeptide of any of claims 1-8, wherein the carrier protein comprises amino acids 36 to 509 of SEQ ID NO: 1, amino acids 34 to 520 of SEQ ID NO: 2, amino acids 34 to 530 of SEQ ID NO: 3, or amino acids 34 to 519 of SEQ ID NO: 4.
  10. The fusion polypeptide of any of claims 1-9, wherein the polypeptide of interest is fused to the N terminus of the carrier protein, or is inserted between positions corresponding to G215 and L216 of SEQ ID NO: 1, between positions corresponding to G236 and E237 of SEQ ID NO: 1, or between positions corresponding to G336 and T337 of SEQ ID NO: 1.
  11. The fusion polypeptide of any of claims 1-10, wherein the carrier protein comprises amino acids 36 to 509 of SEQ ID NO: 1, and wherein the polypeptide of interest is fused to the N terminus of the carrier protein, or is inserted between G215 and L216, between G236 and E237, or between G336 and T337 of SEQ ID NO: 1.
  12. A polynucleotide encoding the fusion polypeptide of any of claims 1-11.
  13. A vector comprising the polynucleotide of claim 12.
  14. A host cell comprising the polypeptide of any of claims 1-11, the polynucleotide of claim 12 or the vector of claim 13.
  15. A recombinant cell comprising a polynucleotide encoding a fusion polypeptide, wherein the fusion polypeptide comprises a carrier protein and a polypeptide of interest, wherein the polypeptide of interest is fused to a terminus of the carrier protein or inserted into the carrier protein, wherein the carrier protein is a pilin of CLP, and wherein the recombinant cell is capable of expressing the polynucleotide and displaying a modified CLP comprising the fusion polypeptide.
  16. The recombinant cell of claim 15, wherein the recombinant cell is a gram-positive bacterium.
  17. The recombinant cell of claim 15 or 16, wherein the bacterium is selected from Corynebacterium glutamicum, Bifidobacterium breve, Lactococcus lactis, Lacticaseibacillus paracasei, and Bacillus thuringiensis; preferably, Corynebacterium glutamicum.
  18. The recombinant cell of any of claims 15-17, wherein the carrier protein is a major pilin.
  19. The recombinant cell of any of claims 15-18, wherein the polypeptide of interest is fused to the N or C terminus of the carrier protein.
  20. The recombinant cell of claim 19, wherein the polypeptide of interest is fused to the N terminus of the carrier protein.
  21. The recombinant cell of any of claims 15-18, wherein the polypeptide of interest is inserted into the carrier protein.
  22. The recombinant cell of claim 21, wherein the polypeptide of interest is inserted into a loop in the carrier protein.
  23. The recombinant cell of claim 22, wherein the carrier protein is a major pilin from Corynebacterium glutamicum, and wherein the polypeptide of interest is inserted into the M domain of the major pilin.
  24. The recombinant cell of any of claims 15-23, wherein the carrier protein comprises amino acids 36 to 509 of SEQ ID NO: 1, amino acids 34 to 520 of SEQ ID NO: 2, amino acids 34 to 530 of SEQ ID NO: 3, or amino acids 34 to 519 of SEQ ID NO: 4.
  25. The recombinant cell of any of claims 15-24, wherein the polypeptide of interest is fused to the N terminus of the carrier protein, or is inserted between positions corresponding to G215 and L216 of SEQ ID NO: 1, between positions corresponding to G236 and E237 of SEQ ID NO: 1, or between positions corresponding to G336 and T337 of SEQ ID NO: 1.
  26. The recombinant cell of any of claims 15-25, wherein the carrier protein comprises amino acids 36-509 of SEQ ID NO: 1, and wherein the polypeptide of interest is fused to the N terminus of the carrier protein, or is inserted between G215 and L216, between G236 and E237, or between G336 and T337 of SEQ ID NO: 1.
  27. The recombinant cell of any of claims 15-26, wherein the recombinant cell comprises two or more polynucleotides respectively encoding two or more fusion polypeptides each comprising a different polypeptide of interest, and the modified CLP comprises the two or more fusion polypeptides.
  28. A method of preparing the recombinant cell of any of claims 15-27, comprising introducing a polynucleotide of claim 12 or the vector of claim 13 into a host cell.
  29. The method of claim 28, wherein the host cell is a bacterium having a native CLP.
  30. The method of claim 28 or 29, wherein the host cell is a gram-positive bacterium.
  31. The method of any of claims 28-30, wherein the method comprises a step of knocking out the native major pilin of the host cell.
  32. A modified covalently-linked pili (CLP) comprising a plurality of the fusion polypeptides of any of claims 1-11.
  33. A method of preparing a modified CLP comprising the steps of
    a) providing the fusion polypeptide of any of claims 1-11; and
    b) providing an activity of sortase.
  34. The method of claim 33, wherein the sortase is encoded by a gene which is identified to be present in the same cluster with the gene encoding the carrier protein in nature.
  35. The method of claim 33 or 34, wherein the sortase is class C type sortase, such as srtC1 and/or srtC2, preferably wherein the srtC1 and srtC2 are encoded by genes from the same cluster.
  36. The method of any of claims 33-35, wherein the method is an in vitro method.
  37. A polynucleotide construct or a combination of polynucleotide constructs comprising the polynucleotide of claim 12, and one or more polynucleotides encoding a sortase.
  38. The polynucleotide construct or a combination of polynucleotide constructs of claim 37, wherein the sortase is encoded by a gene which is identified to be present in the same cluster with the gene encoding the carrier protein in nature.
  39. The polynucleotide construct or a combination of polynucleotide constructs of claim 37 or 38, wherein the sortase is class C type sortase, such as srtC1 and/or srtC2, preferably wherein the srtC1 and srtC2 are encoded by genes from the same cluster.
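
The insertion sites recited in claims 10, 11, 25 and 26 (for example, between G215 and L216 of SEQ ID NO: 1) use 1-based residue numbering. The sketch below is illustrative only and forms no part of the claims: it shows how such a site maps onto 0-based string indexing, and it verifies the flanking residues before inserting. The function name `insert_between` and the 8-residue toy carrier are hypothetical; the toy sequence is not SEQ ID NO: 1, which is reproduced only as a figure in this document.

```python
def insert_between(carrier: str, insert: str, left_pos: int,
                   left_res: str, right_res: str) -> str:
    """Insert `insert` between 1-based residues left_pos and left_pos + 1.

    left_res and right_res are the one-letter codes expected at those two
    positions (e.g. "G" and "L" for a site written as G215/L216).
    """
    if not 1 <= left_pos < len(carrier):
        raise ValueError("left_pos must leave a residue on each side")
    # 1-based residue left_pos is index left_pos - 1; its neighbor is index left_pos.
    if carrier[left_pos - 1] != left_res or carrier[left_pos] != right_res:
        raise ValueError("flanking residues do not match the expected site")
    return carrier[:left_pos] + insert + carrier[left_pos:]


# Toy example (hypothetical sequences): a G4/L5 site in an 8-residue carrier.
toy_carrier = "MKAGLTEV"
toy_insert = "HHHHHH"
toy_fusion = insert_between(toy_carrier, toy_insert, 4, "G", "L")
```

Checking the flanking residues guards against off-by-one errors when a site defined on the full-length sequence is applied to a truncated carrier, such as amino acids 36 to 509, whose numbering is shifted.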
PCT/CN2022/130033 2022-11-04 2022-11-04 Modified covalently-linked pili and recombinant bacteria comprising the same Ceased WO2024092769A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202280102906.7A CN120476135A (en) 2022-11-04 2022-11-04 Modified covalently cross-linked pili and recombinant bacteria containing the same
PCT/CN2022/130033 WO2024092769A1 (en) 2022-11-04 2022-11-04 Modified covalently-linked pili and recombinant bacteria comprising the same

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/130033 WO2024092769A1 (en) 2022-11-04 2022-11-04 Modified covalently-linked pili and recombinant bacteria comprising the same

Publications (2)

Publication Number Publication Date
WO2024092769A1 true WO2024092769A1 (en) 2024-05-10
WO2024092769A9 WO2024092769A9 (en) 2025-04-17

Family

ID=90929428

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/130033 Ceased WO2024092769A1 (en) 2022-11-04 2022-11-04 Modified covalently-linked pili and recombinant bacteria comprising the same

Country Status (2)

Country Link
CN (1) CN120476135A (en)
WO (1) WO2024092769A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN119613570A (en) * 2024-12-20 2025-03-14 合肥工业大学 Dextran enzyme fusion modified body SpyCatcher-padex, encoding gene and application thereof

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2009137763A2 (en) * 2008-05-08 2009-11-12 Emory University Methods and compositions for the display of polypeptides on the pili of gram-positive bacteria
WO2017003305A1 (en) * 2015-07-01 2017-01-05 Auckland Uniservices Limited Peptides and uses thereof
WO2019213262A1 (en) * 2018-05-01 2019-11-07 The Regents Of The University Of California Reagent to label proteins via lysine isopeptide bonds

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160304567A1 (en) * 2007-12-19 2016-10-20 Emory University Methods and compositions for the display of polypeptides on the pili of gram- positive bacteria
WO2009137763A2 (en) * 2008-05-08 2009-11-12 Emory University Methods and compositions for the display of polypeptides on the pili of gram-positive bacteria
US20110189236A1 (en) * 2008-05-08 2011-08-04 Emory University Methods and Compositions for the Display of Polypeptides on the Pili of Gram-Positive Bacteria
WO2017003305A1 (en) * 2015-07-01 2017-01-05 Auckland Uniservices Limited Peptides and uses thereof
WO2019213262A1 (en) * 2018-05-01 2019-11-07 The Regents Of The University Of California Reagent to label proteins via lysine isopeptide bonds

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HUNG TON‐THAT: "Sortases and pilin elements involved in pilus assembly of Corynebacterium diphtheriae", MOLECULAR MICROBIOLOGY, WILEY-BLACKWELL PUBLISHING LTD, GB, vol. 53, no. 1, 1 July 2004 (2004-07-01), GB , pages 251 - 261, XP093168778, ISSN: 0950-382X, DOI: 10.1111/j.1365-2958.2004.04117.x *

Also Published As

Publication number Publication date
CN120476135A (en) 2025-08-12
WO2024092769A9 (en) 2025-04-17

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 22964074; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
WWE Wipo information: entry into national phase (Ref document number: 202280102906.7; Country of ref document: CN)
WWP Wipo information: published in national office (Ref document number: 202280102906.7; Country of ref document: CN)
122 Ep: pct application non-entry in european phase (Ref document number: 22964074; Country of ref document: EP; Kind code of ref document: A1)