WO2024092769A1 - Modified covalently-linked pili and recombinant bacteria comprising the same

Info

Publication number: WO2024092769A1
Application number: PCT/CN2022/130033
Authority: WO
Inventors: Chao Zhong; Yuanyuan Huang
Original assignee: Shenzhen Institute of Advanced Technology of CAS
Current assignee: Shenzhen Institute of Advanced Technology of CAS
Priority date: 2022-11-04
Filing date: 2022-11-04
Publication date: 2024-05-10
Anticipated expiration: 2025-05-04
Also published as: CN120476135A; WO2024092769A9

Abstract

Provided is a fusion polypeptide comprising a carrier protein and a polypeptide of interest, wherein the polypeptide of interest is fused to a terminus of the carrier protein or inserted into the carrier protein, and wherein the carrier protein is a pilin of covalently-linked pili (CLP) from a microorganism. Also provided is a recombinant cell comprising a modified CLP comprising the fusion polypeptide, as well as the modified CLP.

Description

Modified Covalently-linked Pili and Recombinant Bacteria Comprising the Same

Technical Field

The present disclosure relates to biological engineering. In particular, the present disclosure relates to engineered bacteria, such as Corynebacterium glutamicum, comprising modified covalently-linked pili (CLP).

Background

Engineered living materials (ELMs) are engineered biomaterials with distinctive “living” attributes, such as autonomous growth, self-healing and environmental responsiveness, that are otherwise found only in natural living materials. A wide range of remarkable ELMs has been developed for applications in biosensors, bioremediation, biomedicine, biomanufacturing, wearable devices, and electronics. Depending on the source of their structural components, ELMs can be produced either by harnessing engineered cells to simultaneously make the material and incorporate novel functionalities into it (known as self-organizing living materials or biological ELMs) or by embedding living cells in an organic or inorganic matrix (referred to as hybrid living materials). Self-organizing living materials aim to recapitulate the autonomous, adaptive, and versatile properties of natural living materials, and represent opportunities to harness engineered biological systems for new capabilities.

Despite the advances in ELMs, further development and application of self-organizing living materials face challenges due to the lack of engineerable chassis and the limited access to programmable endogenous biopolymers in microorganisms, particularly non-pathogens. At present, only model microbial systems, such as Escherichia coli and Bacillus subtilis along with their extracellular amyloid fibers, and several non-model systems, including bacterial cellulose-producing K. rhaeticus, the surface-layer protein-containing Caulobacter crescentus and the dominant bacterial component of Pantoea agglomerans in native feedstocks of fungus, have been successfully harnessed in ELM design (Tang, T.-C. et al., Materials design by synthetic biology. Nat. Rev. Mater. 6, 332-350, 2021; Caro-Astorga, J. et al., Bacterial cellulose spheroids as building blocks for 3D and patterned living materials and for regeneration. Nat. Commun. 12, 1-9, 2021; Charrier, M. et al., Engineering the S-layer of Caulobacter crescentus as a foundation for stable, high-density, 2D living materials. ACS Synth. Biol. 8, 181-190, 2018; and Huang, J. et al., Programmable and printable Bacillus subtilis biofilms as engineered living materials. Nat. Chem. Biol. 15, 34-41, 2019).

Some Gram-positive bacteria comprise covalently-linked pili (CLP). Unlike the non-covalently linked pili produced in Gram-negative bacteria (Ramirez, N. A. et al., New paradigms of pilus assembly mechanisms in gram-positive actinobacteria. Trends Microbiol. 28, 999-1009, 2020), the CLP monomer subunits are typically joined via intermolecular isopeptide bonds catalyzed by sortase, conferring enormous tensile strength (McConnell, S. A. et al., Protein labeling via a specific lysine-isopeptide bond using the pilin polymerizing sortase from Corynebacterium diphtheriae. J. Am. Chem. Soc. 140, 8420-8423, 2018). Furthermore, the CLP subunits contain auto-catalyzed intramolecular isopeptide bonds that are less susceptible to proteolytic cleavage and can dissipate mechanical energy (Ramirez, N. A. et al., 2020), contributing to the robustness of CLP. In addition, several pilin proteins in the CLP structure of different strains contain additional disulfide bonds that further enhance stability (Kang, H. J. et al., The Corynebacterium diphtheriae shaft pilin SpaA is built of tandem Ig-like modules with stabilizing isopeptide and disulfide bonds. Proc. Natl. Acad. Sci. U.S.A. 106, 16967-16971, 2009).

Therefore, there remains a need to develop new chassis for ELMs, such as self-organizing living materials, preferably a bacterium forming CLP.

Summary of the Invention

The inventors have developed an integrative technological platform for ELMs based on the discovery of the biosynthetic gene cluster (BGC) of the covalently-linked pili (CLP) fiber in the industrial workhorse Corynebacterium glutamicum.

In the first aspect, the present disclosure provides a fusion polypeptide comprising a carrier protein and a polypeptide of interest, wherein the polypeptide of interest is fused to a terminus of the carrier protein or inserted into the carrier protein, and wherein the carrier protein is a pilin of covalently-linked pili (CLP) from a microorganism.

In some embodiments, the microorganism is a Gram-positive bacterium, such as a bacterium selected from Corynebacterium glutamicum, Bifidobacterium breve, Lactococcus lactis, Lacticaseibacillus paracasei, and Bacillus thuringiensis; preferably, Corynebacterium glutamicum. In some embodiments, the carrier protein is a major pilin.

In some embodiments, the polypeptide of interest is fused to a terminus of the carrier protein. In some embodiments, the polypeptide of interest is fused to the N terminus of the carrier protein.

In some embodiments, the polypeptide of interest is inserted into the carrier protein. In some embodiments, the polypeptide of interest is inserted into a loop in the carrier protein.

In some embodiments, the carrier protein is a major pilin from Corynebacterium glutamicum. In some embodiments, the polypeptide of interest is inserted into the M domain of the major pilin. In some embodiments, the polypeptide of interest replaces the M domain of the major pilin or a part thereof. In some embodiments, the carrier protein comprises an amino acid sequence of SEQ ID NO: 1, 2, 3, or 4. In some embodiments, the polypeptide of interest is fused to the N terminus of the carrier protein, or is inserted between positions corresponding to G215 and L216 of SEQ ID NO: 1, between positions corresponding to G236 and E237 of SEQ ID NO: 1, or between positions corresponding to G336 and T337 of SEQ ID NO: 1.

In some embodiments, the carrier protein comprises amino acids 35 to 509 of SEQ ID NO: 1. In some embodiments, the polypeptide of interest is fused to the N terminus of the carrier protein, or is inserted between G215 and L216, between G236 and E237, or between G336 and T337 of SEQ ID NO: 1.
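The insertion-site language above (for example, "between G215 and L216") can be illustrated at the sequence level. The following is a minimal sketch, not the inventors' cloning procedure: the function name `insert_poi`, the toy carrier string, and the optional flanking-residue check are all hypothetical, and SEQ ID NO: 1 itself is not reproduced here.

```python
def insert_poi(carrier: str, poi: str, after_residue: int,
               expected_pair: str = "", linker: str = "") -> str:
    """Insert `poi` between 1-based residues `after_residue` and `after_residue + 1`.

    `expected_pair` (e.g. "GL" for a G215/L216 site) is an optional sanity check
    on the two residues flanking the site. `linker` (e.g. "GS") is placed on
    both sides of the insert. All names here are illustrative assumptions.
    """
    if not 1 <= after_residue < len(carrier):
        raise ValueError("insertion site must lie inside the carrier sequence")
    # Residues at 1-based positions after_residue and after_residue + 1.
    flanking = carrier[after_residue - 1:after_residue + 1]
    if expected_pair and flanking != expected_pair:
        raise ValueError(f"expected {expected_pair} at the site, found {flanking}")
    return carrier[:after_residue] + linker + poi + linker + carrier[after_residue:]


# Toy example: an 8-residue stand-in carrier with a G5/L6 site (not Spa2).
toy_carrier = "MKTAGLSE"
fusion = insert_poi(toy_carrier, "WWW", after_residue=5,
                    expected_pair="GL", linker="GS")
print(fusion)
```

With a real carrier sequence, the same call with `after_residue=215` and `expected_pair="GL"` would fail loudly if the numbering of the supplied sequence did not match the numbering used in the text.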

In the second aspect, the present disclosure provides a polynucleotide encoding the fusion polypeptide of the present disclosure, and a vector comprising the polynucleotide, as well as a host cell comprising the polypeptide, the polynucleotide or the vector of the present disclosure.

In the third aspect, the present disclosure provides a recombinant cell comprising a polynucleotide encoding a fusion polypeptide, wherein the fusion polypeptide comprises a carrier protein and a polypeptide of interest, wherein the polypeptide of interest is fused to a terminus of the carrier protein or inserted into the carrier protein, wherein the carrier protein is a pilin of CLP, and wherein the recombinant cell is capable of expressing the polynucleotide and displaying a modified CLP comprising the fusion polypeptide.

In some embodiments, the recombinant cell is a Gram-positive bacterium, such as a bacterium selected from Corynebacterium glutamicum, Bifidobacterium breve, Lactococcus lactis, Lacticaseibacillus paracasei, and Bacillus thuringiensis; preferably, Corynebacterium glutamicum. In some embodiments, the carrier protein is a major pilin.

In some embodiments, the carrier protein comprises amino acids 35-509 of SEQ ID NO: 1, and the polypeptide of interest is fused to the N terminus of the carrier protein, or is inserted between G215 and L216, between G236 and E237, or between G336 and T337 of SEQ ID NO: 1.

In some embodiments, the recombinant cell comprises two or more polynucleotides respectively encoding two or more fusion polypeptides each comprising a different polypeptide of interest, and the modified CLP comprises the two or more fusion polypeptides.

In the fourth aspect, the present disclosure provides a method of preparing the recombinant cell of the present disclosure, comprising introducing a polynucleotide encoding the fusion polypeptide of the present disclosure into a host cell derived from a microorganism having CLP.

In some embodiments, the native major pilin is knocked out in the host cell. In some embodiments, the method comprises a step of knocking out the native major pilin.

In the fifth aspect, the present disclosure provides a modified covalently-linked pili (CLP) comprising a plurality of the fusion polypeptides of the present disclosure.

In the sixth aspect, the present disclosure provides a method of preparing a modified CLP comprising the steps of

a) providing the fusion polypeptide of the present disclosure; and

b) providing an activity of sortase.

In some embodiments, the fusion polypeptide is provided by transcribing and/or translating the polynucleotide of the present disclosure. In some embodiments, the activity of sortase is provided by transcribing and/or translating one or more polynucleotides encoding a sortase. In some embodiments, the sortase is encoded by a gene which is identified to be present in the same cluster with the gene encoding the carrier protein in nature. In some embodiments, the sortase is a class C type sortase, such as srtC1 and/or srtC2, preferably wherein the srtC1 and srtC2 are encoded by genes from the same cluster. In some embodiments, the method is an in vitro method.

In the seventh aspect, the present disclosure provides a polynucleotide construct or a combination of polynucleotide constructs comprising the polynucleotide of the present disclosure, and one or more polynucleotides encoding a sortase.

In some embodiments, the sortase is encoded by a gene which is identified to be present in the same cluster with the gene encoding the carrier protein in nature. In some embodiments, the sortase is class C type sortase, such as srtC1 and/or srtC2, preferably wherein the srtC1 and srtC2 are encoded by genes from the same cluster.

Brief Description of the Drawings

Fig. 1 shows the map of plasmid pEK-spa2.

Fig. 2 shows the workflow for constructing the tandem of two cassettes.

Fig. 3 shows the maps of plasmids comprising the tandem of two cassettes.

Fig. 4 shows the map of plasmid pZ9-dxs_crtEBI.

Fig. 5 shows the map of plasmid pET-28a-Spa2.

Fig. 6 shows the ^CgCLP biosynthetic gene cluster (BGC) encoding the sortase genes srtC1 and srtC2, and the sortase-catalyzed pilin genes spa1, spa2, and spa3.

Fig. 7 shows TEM and AFM images demonstrating that the major pilin Spa2 is indispensable for ^CgCLP fiber structure formation. The bars in the TEM and AFM images are 200 nm and 400 nm, respectively.

Fig. 8 shows the identification of the composition of CLP in C. glutamicum (CgCLP) by immunogold labelling. (a) The cartoon shows that ^CgCLP fibers comprise two minor pilins (Spa1 and Spa3) and a major pilin, Spa2. (b) The immunogold labelling and TEM images show the constitution and distribution of ^CgCLP pilins, indicating that Spa2 is the major pilin. For single immunogold labelling of ^CgCLP with primary polyclonal antibodies against Spa1, Spa2, and Spa3 (α-Spa1, α-Spa2, and α-Spa3, respectively), gold-decorated goat anti-rabbit IgG was used as the secondary antibody for labelling the target pilin. For double immunogold labelling of ^CgCLP with both α-Spa1 and α-Spa3, the 30 nm and 5 nm gold-decorated goat anti-rabbit IgG were used to label Spa1 and Spa3, respectively. For double labelling of CgCLP with both α-Spa2 and α-Spa3, the 15 nm and 5 nm gold-decorated goat anti-rabbit IgG were used to label Spa2 and Spa3, respectively. (c) Quantification analysis of CgCLP composition via whole-cell filtration ELISA (detection by the antibodies α-Spa1, α-Spa2, and α-Spa3, respectively). The quantified results also show that Spa2 is the main component of CgCLP. Each experiment was performed at least in triplicate, and the standard error is shown. The bars in the TEM images indicate 200 nm.

Fig. 9 shows that deletion of both the srtC1 and srtC2 genes abrogates pili formation. The TEM images (detection by α-Spa2) (a), AFM images (b) and whole-cell filtration ELISA quantification analysis (c) of the ^CgCLP fiber of the ΔsrtC1ΔsrtC2 strain are shown. The bars in the TEM (a) and AFM (b) images are 200 nm and 400 nm, respectively. For immunogold labelling, α-Spa2 is the primary antibody, and the 10 nm gold-decorated goat anti-rabbit IgG is the secondary antibody. Each ELISA experiment was performed at least in triplicate, and the standard error is shown.

Fig. 10 shows the isolation of ^CgCLP fibers for mass spectrometry analysis. SDS-PAGE analysis of the ^CgCLP fibers purified by nickel affinity chromatography showed that the high-molecular-weight ^CgCLP polymers were eluted with 100 mM imidazole.

Fig. 11 shows the identification of intermolecular isopeptide bonds for the polymerization of Spa2 monomers in ^CgCLP. Fragmentation spectra of the parent ion at m/z 832.9 ²⁺ containing the intermolecular isopeptide bond (green font) between Spa2 _i Lys194 (blue font) and Spa2 _i+1 Thr477 (red font) are shown.

Fig. 12 shows that liquid chromatography-tandem mass spectrometry (LC-MS/MS) identifies the signal peptide of Spa2. (a) The cartoon shows the amino acid sequence of Spa2 ^cut (in which residues 470-509 at the C-terminus of Spa2 are replaced with 6His), which prevents polymerization so that Spa2 is secreted into the medium as a monomer. (b) SDS-PAGE shows the purified Spa2 ^cut. (c) LC-MS/MS identified residues 1-34 at the N-terminus of Spa2 as the signal peptide. This figure shows an MS/MS spectrum of the peptide with m/z 916.4538 ²⁺ generated from a chymotrypsin digest of Spa2. Predicted b- and y-type ions (not all included) are listed above and below the peptide sequence, respectively. Matched ions are labelled in the spectrum.

Fig. 13 shows the accurate molecular weight of Spa2 ^cut measured by quadrupole time-of-flight mass spectrometry. The measured molecular weight is ≈54.7 Da less than the calculated value of Spa2 ^cut, indicating that three intramolecular isopeptide bonds and two disulfide bonds exist in the monomeric Spa2. Formation of an intramolecular isopeptide bond loses one molecule of ammonia (≈17 Da); formation of a disulfide bond loses two hydrogen atoms (≈2 Da).
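The arithmetic behind this interpretation can be checked directly. The sketch below uses only the approximate per-bond losses stated in the caption (≈17 Da per intramolecular isopeptide bond, ≈2 Da per disulfide bond); the function and constant names are illustrative, not part of the disclosure.

```python
# Approximate mass losses per bond, as stated in the Fig. 13 description.
ISOPEPTIDE_LOSS_DA = 17.0  # one ammonia lost per intramolecular isopeptide bond
DISULFIDE_LOSS_DA = 2.0    # two hydrogen atoms lost per disulfide bond


def expected_mass_deficit(n_isopeptide: int, n_disulfide: int) -> float:
    """Expected difference (calculated minus measured mass) in Da."""
    return n_isopeptide * ISOPEPTIDE_LOSS_DA + n_disulfide * DISULFIDE_LOSS_DA


expected = expected_mass_deficit(n_isopeptide=3, n_disulfide=2)
measured = 54.7  # deficit reported for Spa2^cut
print(f"expected deficit: {expected:.1f} Da, measured: {measured:.1f} Da")
```

Three isopeptide bonds and two disulfide bonds give 3 × 17 + 2 × 2 = 55 Da, which agrees with the reported ≈54.7 Da to within the rounding of the per-bond values.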

Fig. 14 shows crystals of Spa2 diffracted to [...] resolution on the BL18U1 beamline at the Shanghai Synchrotron Radiation Facility (Shanghai, China).

Fig. 15 shows the X-ray crystal structure of Spa2 which is arranged in three tandem Ig-like domains, N-domain (pink) , M-domain (blue) , and C-domain (green) . Residues involved in the formation of three intramolecular isopeptide bonds (yellow) and two disulfide bonds (red) are shown as sticks.

Fig. 16 shows the comparison of the Spa2 crystal structure with the AlphaFold2 prediction and with the crystal structures of 3HR6 and 4HSS. (a) Chain A in the Spa2 crystal structure (yellow) is superimposed with the AlphaFold2 predicted structure of Spa2 (blue) by PyMOL Align. The structures are superimposed using alpha-carbon (Cα) atoms of 410 residues with a root-mean-square deviation (RMSD) of [...], indicating that AlphaFold2 accurately predicted the Spa2 fold of the individual domains. Chain A in the Spa2 crystal structure (yellow) is superimposed with the crystal structures of 3HR6 (pink) (b) and 4HSS (green) (c), and the RMSD values are [...] (270 Cα atoms) and [...] (311 Cα atoms), respectively.

Fig. 17 shows omit electron density maps revealing the presence of internal covalent bonds in the crystal structure of Spa2. The 2mFo-DFc omit electron density maps of three isopeptide bonds (a) and two disulfide bonds (b) are shown in blue mesh, contoured at 1.0σ. The omit electron density maps were generated using Phenix composite omit map.

Fig. 18 shows the identification of disulfide bond and intramolecular isopeptide bond formation at the appropriate sequence locations in Spa2 by LC-MS/MS analysis. (a) The cartoon shows the critical features in Spa2, including three intramolecular isopeptide bonds in individual domains, two disulfide bonds in the N-domain (C97-C128) and the C-domain (C380-C432), the pilin motif YPKN in the N-domain, and the sortase cleavage sorting signal motif LPLTG in the C-domain. (b) MS/MS spectrum of the peptide with m/z 1407.4 ⁴⁺ generated from a pepsin digest of Spa2 containing the disulfide bond between Cys97 and Cys128. (c) MS/MS spectrum of the peptide with m/z 1583.7 ²⁺ generated from a pepsin digest of Spa2 containing the disulfide bond between Cys380 and Cys432. (d) MS/MS spectrum of the peptide with m/z 1326.9 ⁴⁺ generated from a pepsin digest of Spa2 containing the internal isopeptide bond between Lys57 and Asn195. (e) MS/MS spectrum of the peptide with m/z 1324.6 ³⁺ generated from a pepsin digest of Spa2 containing the internal isopeptide bond between Lys203 and Asn318. (f) MS/MS spectrum of the peptide with m/z 754.6 ⁴⁺ generated from a pepsin digest of Spa2 containing the internal isopeptide bond between Lys355 and Asp466. For (b)-(f), predicted b- and y-type ions (not all included) are listed above and below the peptide sequence, respectively; the disulfide bonds and intramolecular isopeptide bonds are shown as red and yellow bars, respectively.

Figs. 19 and 20 show the genetic manipulation in Δspa2 strains (harboring a plasmid that expressed Spa2 or Spa2 variants of K194A, LPLTG _474LALAA478, E158A, D246A, E435A, D246A/E435A, C97A, C380A, and C97A/C380A, respectively) to assess the key residues promoting the formation of inter- and intramolecular isopeptide bonds, and disulfide bonds, in Spa2 by TEM bio-imaging (Fig. 19) and quantitative analysis of the amount of ^CgCLP fiber by whole-cell filtration ELISA (detection by anti-Spa2 antibody) (Fig. 20). Results are presented as mean ± s.d. in Fig. 20. The P values of the Spa2 mutated strains vs the Spa2 strain from left to right in Fig. 20 are P < 0.0001, P < 0.0001, P = 0.4664, P = 0.8673, P = 0.7137, P = 0.0011, P = 0.0008, P = 0.0004 and P < 0.0001, respectively. Not significant (NS) P > 0.05, *P < 0.05, **P < 0.01, ***P < 0.001, ****P < 0.0001. Statistics were derived using a t-test. The bars in Fig. 19 are 200 nm.
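The significance legend can be applied mechanically to the reported P values. The following sketch parses strings of the form "P < 0.0001" or "P = 0.4664" and returns the corresponding label; the function name and the decision to reject bounds that cannot be classified are assumptions made for illustration, and the legend itself does not say how a value of exactly 0.05 would be labelled (it is treated as NS here).

```python
import re

# Thresholds from the figure legend, strictest first.
THRESHOLDS = [(0.0001, "****"), (0.001, "***"), (0.01, "**"), (0.05, "*")]


def significance_label(p_text: str) -> str:
    """Map a reported P value such as 'P < 0.0001' or 'P = 0.0011' to a label."""
    match = re.fullmatch(r"\s*P\s*([<=])\s*([0-9.]+)\s*", p_text)
    if match is None:
        raise ValueError(f"unrecognized P value: {p_text!r}")
    operator, value = match.group(1), float(match.group(2))
    for threshold, label in THRESHOLDS:
        # 'P < v' guarantees P < threshold when v <= threshold;
        # 'P = v' requires v itself to be strictly below the threshold.
        if (operator == "<" and value <= threshold) or \
           (operator == "=" and value < threshold):
            return label
    if operator == "<":
        # An upper bound above 0.05 does not determine the label.
        raise ValueError(f"cannot classify an upper bound of {value}")
    return "NS"


reported = ["P < 0.0001", "P < 0.0001", "P = 0.4664", "P = 0.8673", "P = 0.7137",
            "P = 0.0011", "P = 0.0008", "P = 0.0004", "P < 0.0001"]
labels = [significance_label(p) for p in reported]
print(labels)
```

Applied to the nine values above, in the order given, the three isopeptide single mutants (E158A, D246A, E435A) come out as NS, and all other variants are significant at the ** level or stricter.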

Fig. 21 shows the accurate molecular weights of Spa2 ^cut mutant variants determined by quadrupole time-of-flight mass spectrometry. The measured molecular weights of E158A ^cut (a), D246A ^cut (b), E435A ^cut (c), and D246A/E435A ^cut (d) are ≈54.9, 37.3, 21.4, and 4.0 Da less than the calculated values of the related variants, indicating that three, two, one and no intramolecular isopeptide bonds are retained in the corresponding monomeric mutants, respectively. The Spa2 ^cut mutant variants E158A ^cut, D246A ^cut, E435A ^cut, and D246A/E435A ^cut were expressed in Δspa2 and purified by nickel-affinity chromatography.
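The bond counts in this caption follow from the measured mass deficits if one assumes that both disulfide bonds (≈2 Da each) are present in every variant and that each retained isopeptide bond accounts for ≈17 Da, as stated for Fig. 13. A minimal sketch of that inference, with hypothetical function and variable names:

```python
def infer_isopeptide_bonds(deficit_da: float, n_disulfide: int = 2,
                           max_bonds: int = 3) -> int:
    """Estimate retained intramolecular isopeptide bonds from a mass deficit.

    Assumes ~2 Da per disulfide bond and ~17 Da per isopeptide bond, and that
    `n_disulfide` disulfide bonds are intact (an assumption, not a measurement).
    """
    residual = deficit_da - n_disulfide * 2.0
    estimate = round(residual / 17.0)
    return min(max(estimate, 0), max_bonds)


# Mass deficits reported for the four Spa2^cut variants.
deficits = {"E158A": 54.9, "D246A": 37.3, "E435A": 21.4, "D246A/E435A": 4.0}
inferred = {name: infer_isopeptide_bonds(d) for name, d in deficits.items()}
print(inferred)
```

The rounded estimates reproduce the three, two, one and zero retained bonds given in the caption.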

Fig. 22 shows the rational engineering of the ^CgCLP protein scaffold through a modular genetic design strategy: the cartoon shows a polymerized Spa2 major pilin functionalized by incorporating a protein-of-interest (POI) (e.g., mCherry, a fluorescent reporter protein) at candidate insertion sites (including Q35 (E1) at the N-terminus, and G215 (E2) , G236 (E3) and G336 (E4) in the M-domain lacking a disulfide bond) based on structural verification.

Fig. 23 shows the fluorescence intensity and quantitative analysis of the amount of ^CgCLP fiber by whole-cell filtration ELISA (detection by anti-Spa2 antibody) (a) ; and confocal microscopy imaging (b) (scale bar = 2 μm) of engineered cells containing Spa2-mCherry fusion proteins inserted at different sites.

Fig. 24 shows the TEM morphologies of the assembled mCherry-Spa2 fusion proteins associated with cell surfaces based on immunogold labelling. TEM images of Δspa2 cells (a) , E1 cells (b) , E2 cells (c) , E3 cells (d) and E4 cells (e) . The TEM samples were collected from the Δspa2 strain harboring a plasmid that expresses various mCherry-Spa2 fusions under the native constitutive promoter of the spa2 gene. For immunogold labelling, α-Spa2 is the primary antibody, and the 10 nm gold-decorated goat anti-rabbit IgG is the secondary antibody. Scale bars, 200 nm.

Fig. 25 shows the extracellular secretion and assembly of R-Spa2 pilins into CgCLP fiber at the cell-surfaces of engineered C. glutamicum cells: a series of R-Spa2 fusion protein constructs comprising functional R peptides/proteins with different amino acid sequences.

Fig. 26 shows the morphologies of assembled R-Spa2 ^CgCLP on the cell-surfaces based on immunogold labelling and TEM imaging, scale bar = 200 nm.

Fig. 27 shows the functional characterization of engineered ^CgCLP with various fusion domains. (a) TEM images show that Ni-NTA-decorated AuNPs were anchored onto 6His-Spa2 ^CgCLP. (b) Confocal microscopic images show the green fluorescence emitted from SpyTag-Spa2 ^CgCLP cells to which SpyCatcher-EGFP protein binding partners were covalently attached via SpyTag-SpyCatcher interaction pairs. (c) Confocal microscopic images show the green fluorescence emitted from SpyCatcher-Spa2 ^CgCLP cells to which SpyTag-EGFP protein binding partners were covalently attached via SpyTag-SpyCatcher interaction pairs. (d) Confocal microscopic images show the green fluorescence emitted from Venus-Spa2 ^CgCLP cells. (e) Fluorescent images and quantification analysis of the immobilization ability of Mfp3Spep-Spa2 ^CgCLP cells. Immobilized microspheres (left) on the substrates before (top) and after (bottom) challenge with water jetting at a constant discharge pressure of 5 psi. Quantification analysis of the relative capabilities of different cells (right) with immobilized PS microspheres on the substrate. (f) The degradation of carboxymethyl cellulose into glucose by CcEgl-Spa2 ^CgCLP cells was detected by a 3,5-dinitrosalicylic acid (DNS) assay. Each experiment was performed at least in triplicate, and the standard error is shown. Scale bars, 200 nm in a, 2 μm in b, c, and e, 100 μm in d.

Fig. 28 is a schematic showing simultaneous expression of the two Spa2 pilin fusion proteins, N-Ven-Spa2 and C-Ven-Spa2 (N-Ven-Spa2+C-Ven-Spa2 strain), containing the N-terminal (N-Ven) and C-terminal (C-Ven) modules of the split-Venus system, resulting in co-assembly of the split-Venus components into the final functional ^CgCLP structures.

Fig. 29 shows the TEM morphologies of the assembled split-Venus components fused with Spa2 associated with cell surfaces based on immunogold labelling. N-Ven+C-Ven cells expressing co-secreted split-Venus system (a) , N-Ven-Spa2 cells expressing the Spa2 pilin fusion protein of N-Venus-Spa2 (b) , C-Ven-Spa2 cells expressing the Spa2 pilin fusion protein of C-Venus-Spa2 (c) , and N-Ven-Spa2+C-Ven-Spa2 cells for simultaneous expression of two Spa2 pilin fusion proteins, N-Ven-Spa2 and C-Ven-Spa2 (d) . The TEM samples were collected from the Δspa2 strain harboring a plasmid that expresses various Spa2 fusion proteins under the native constitutive promoter of the spa2 gene. For immunogold labelling, α-Spa2 is the primary antibody, and the 10 nm gold-decorated goat anti-rabbit IgG is the secondary antibody. Scale bars, 200 nm.

Fig. 30 shows the co-assembly of split-Venus components into the ^CgCLP fibers leading to increased fluorescence intensity. (a) The engineered C. glutamicum cells show greater fluorescence intensity only in the N-Ven-Spa2+C-Ven-Spa2 strain, and (b) confocal microscopy of C. glutamicum cells showing that the strongest Venus fluorescence signal appeared at the extracellular sites of the N-Ven-Spa2+C-Ven-Spa2 strain (scale bar = 2 μm) .

Fig. 31 is a schematic illustrating engineered C. glutamicum living materials transforming cellulosic biomass into the value-added product lycopene by combining extracellular cellulose degradation capacity and intracellular bioconversion ability. Specifically, for extracellular cellulose degradation (Step 1), an endo-1,4-β-glucanase from T. reesei (TrEgl) and a β-glucosidase from S. degradans (SdBgl) were simultaneously fused with Spa2 pilin (TrEgl-Spa2+SdBgl-Spa2) and co-assembled into a ^CgCLP structure, potentially forming a catalytic cascade for the extracellular degradation of cellulose into glucose. For intracellular transformation (Step 2), the glucose was used for lycopene production in the pathway-engineered C. glutamicum C003 strain upon induction with IPTG. G3P, glyceraldehyde-3-phosphate; IPP, isopentenyl pyrophosphate.

Fig. 32 shows the lycopene production from biowastes with engineered C. glutamicum harboring modified CLPs. a, TEM images show that cells of C003, which contain the P2 plasmid, enabled co-assembly of TrEgl and SdBgl into the ^CgCLP structure, while the cells of C001, C002, and C004 did not. ^CgCLP was labeled with 10 nm gold particles by immunogold labelling. Scale bars, 200 nm. b, ELMs can degrade CMC-Na in a medium from a viscous gel to a thin solution only when both TrEgl and SdBgl were co-assembled into the CgCLP structure (TrEgl-Spa2+SdBgl-Spa2, C003 strain), outperforming the case of the secreted free enzymes (TrEgl+SdBgl, C004 strain). Δspa2Δdec (C001 strain) is the negative control strain. c, Degradation assays using CMC-Na as the substrate. The C003 strain showed 4-fold higher enzyme activity than the C004 strain. d, HPLC assay for lycopene production with the C003 strain cultured in M63 medium in which glucose was replaced by CMC-Na as the carbon source, with lycopene production induced by the addition of IPTG. Results are presented as mean ± s.d. The P values of the C003 and C004 strains vs the C001 strain in c are P < 0.0001 and P = 0.8629, respectively. Not significant (NS) P > 0.05, ****P < 0.0001. Statistics were derived using a t-test. Each experiment was performed at least in triplicate.

Detailed Description of the Invention

1. Definitions

Unless otherwise indicated, all terms used herein have the same meaning as they would to one skilled in the art, and the practice of the present disclosure will employ conventional techniques of microbiology and recombinant DNA technology, which are within the knowledge of those of skill in the art.

As used herein, the term “covalently-linked pili” or “CLP” refers to pili in which the monomers are linked to each other via covalent bonds. The engineered living materials herein refer to the pili formed by the engineered monomers, i.e., the fusion polypeptide of the present disclosure, or to a recombinant bacterium forming the pili.

As an example of CLP-forming bacteria, C. glutamicum, a Gram-positive bacterium, is “generally recognized as safe” (GRAS); this bacterium presents a potential platform for various products, such as amino acids and lycopene.

As used herein, the term “peptide” is used interchangeably with “polypeptide” and “protein”, and means a chain comprising at least two amino acids linked by peptide bonds, such as ten or more amino acid residues. The chemical formulas or sequences of all the peptides and polypeptides herein are written in left-to-right order, showing the direction from the amino terminus to the carboxyl terminus. “Peptide”, “polypeptide” and “protein” can include, but are not limited to, an enzyme, an antibody, a hormone, a ligand, a receptor, etc.

The term “amino acid” includes amino acids naturally occurring in proteins and unnatural amino acids. The conventional nomenclature (one-letter and three-letter) of the amino acids naturally occurring in proteins is employed, which can be seen in Sambrook, et al. (Molecular Cloning: A Laboratory Manual, 2nd ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 1989).

As used herein, the term “fusion polypeptide” is a recombinant product comprising two or more peptide fragments which are not present in a single natural polypeptide. The fragments can be fused directly or via a linker, such as a flexible linker, e.g., GS linkers. Generally, a fusion polypeptide can be produced by the expression of a polynucleotide comprising nucleotide sequences encoding the two or more peptide fragments and the linker, if present, in desired order.

As used herein, the term “polynucleotide” generally refers to a nucleic acid molecule (e.g., 100 nucleotides and up to 30k nucleotides in length) and a sequence that is either complementary (antisense) or identical (sense) to the sequence of a messenger RNA (mRNA) or miRNA fragment or molecule. The term can also refer to DNA or RNA molecules that are either transcribed or non-transcribed.

As used herein, the term “polynucleotide construct” refers to a single-stranded or double-stranded polynucleotide, which is isolated from a naturally occurring gene or modified to contain a nucleic acid segment that does not naturally occur. When the polynucleotide construct contains the control sequences required to express the coding sequence of the present disclosure, the polynucleotide construct comprises an “expression cassette” .

The term “exogenous polynucleotide” as used herein refers to a nucleotide sequence that does not originate from the host in which it is placed. It may be identical or heterologous to the host’s DNA. An example is a sequence of interest inserted into a vector. Such exogenous DNA sequences may be derived from a variety of sources including DNA, cDNA, synthetic DNA, and RNA. Exogenous polynucleotides also encompass DNA sequences that encode antisense oligonucleotides.

As used herein, the term “expression cassette” refers to a polynucleotide segment comprising a polynucleotide encoding a polypeptide operably linked to additional nucleotides provided for the expression of the polynucleotide, for example, a control sequence.

As used herein, the term “encoding” means that a polynucleotide directly specifies the amino acid sequence of its protein product. The boundaries of the coding sequence are generally determined by an open reading frame, which generally starts with the ATG start codon or other start codons such as GTG and TTG, and ends with a stop codon such as TAA, TAG and TGA. The coding sequence can be a DNA, cDNA or recombinant nucleotide sequence.
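The coding-sequence boundaries described above can be illustrated with a small open reading frame scan. This is a minimal sketch under simplifying assumptions: it searches the forward strand only, returns the first start codon that is followed by an in-frame stop codon, and the function name `find_first_orf` is hypothetical.

```python
START_CODONS = {"ATG", "GTG", "TTG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_first_orf(dna: str):
    """Return (start, end) of the first forward-strand ORF, or None.

    `start` is the 0-based index of the start codon and `end` is the index
    just past the stop codon, so dna[start:end] is the full coding sequence.
    """
    dna = dna.upper()
    for start in range(len(dna) - 2):
        if dna[start:start + 3] not in START_CODONS:
            continue
        # Walk codon by codon in the frame set by the start codon.
        for pos in range(start + 3, len(dna) - 2, 3):
            if dna[pos:pos + 3] in STOP_CODONS:
                return start, pos + 3
    return None


example = "CCATGAAATTTTAGGG"
bounds = find_first_orf(example)
if bounds is not None:
    print(example[bounds[0]:bounds[1]])
```

Real gene prediction also considers the reverse strand, minimum ORF length and ribosome binding sites, none of which is modelled here.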

As used herein, the term “expression” includes any step involved in the production of a polypeptide, including but not limited to transcription, post-transcriptional modification, translation, post-translational modification, and secretion.

A “control sequence” includes all elements necessary or beneficial for the expression of the polynucleotide encoding the polypeptide of the present disclosure. Each control sequence may be natural or foreign to the nucleotide sequence encoding the polypeptide, or natural or foreign to each other. Such control sequences include, but are not limited to, leader sequence, polyadenylation sequence, propeptide sequence, promoter, enhancer, signal peptide sequence, and transcription terminator. At a minimum, control sequences include a promoter and signals for the termination of transcription and translation.

For example, the control sequence may be a suitable promoter sequence, i.e., a nucleotide sequence recognized by the host cell for expression of the polynucleotide encoding the polypeptide of the present disclosure. The promoter sequence contains a transcription control sequence that mediates the expression of the polypeptide. The promoter may be any nucleotide sequence that exhibits transcriptional activity in the selected host cell, for example, the promoter of the lac operon of E. coli. The promoters also include mutant, truncated and hybrid promoters, and can be obtained from genes encoding extracellular or intracellular polypeptides, which are homologous or heterologous to the host cell.

As used herein, the term “operably linked” refers to a configuration in which a control sequence is placed at an appropriate position relative to the coding sequence of the polynucleotide sequence, whereby the control sequence directs the expression of the polypeptide coding sequence.

The polynucleotide encoding a polypeptide of interest can be subjected to various manipulations to improve the expression of the polypeptide. Before the insertion thereof into a vector, manipulation of the polynucleotide according to the expression vector or the host, such as codon optimization, is desirable or necessary. Techniques for modifying polynucleotide sequences with recombinant DNA methods are well known in the art.

The term “recombinant” as used herein refers to nucleic acids, vectors, polypeptides, or proteins that have been generated using DNA recombination (cloning) methods and are distinguishable from native or wild-type nucleic acids, vectors, polypeptides, or proteins.

As used herein, the term “hybridization” means that nucleotide sequences which are at least about 90%, preferably at least about 95%, more preferably at least about 96%, and even more preferably at least 98% homologous to each other generally remain hybridized to each other under given stringent hybridization and washing conditions.

For the present disclosure, in order to determine the percentage identity between two amino acid sequences or two nucleic acid sequences, the sequences are aligned for the purpose of optimal comparison (e.g., a gap can be introduced into the first amino acid or nucleic acid sequence for the optimal alignment with the second amino acid or nucleic acid sequence). Then, the amino acid residues or nucleotides at the corresponding amino acid positions or nucleotide positions are compared. When a position in the first sequence is occupied by the same amino acid residue or nucleotide as the corresponding position in the second sequence, the molecules are identical at that position. The percentage identity between two sequences is a function of the number of identical positions shared by the sequences (i.e., percentage identity = number of identical positions / total number of positions (i.e., the overlapping positions) × 100). Preferably, the two sequences are identical in length.
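As an illustration of the formula above, the following minimal Python sketch computes the percentage identity of two sequences that have already been aligned. It assumes that gaps are written as "-" and that "overlapping positions" means the alignment columns in which both sequences carry a residue; the function name `percent_identity` and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percentage identity of two pre-aligned sequences (gaps as '-').

    identity = identical positions / overlapping positions x 100,
    where a position overlaps if neither sequence has a gap there
    (an assumption made for this sketch).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    overlapping = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # gap column: not an overlapping position
        overlapping += 1
        if a == b:
            identical += 1
    if overlapping == 0:
        raise ValueError("no overlapping positions")
    return 100.0 * identical / overlapping


# Toy alignment: 5 overlapping columns, 4 of them identical.
print(percent_identity("MKT-LV", "MKSALV"))
```

The alignment step itself is left to an established alignment program; the sketch only covers the counting step.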

A person skilled in the art knows that various computer programs can be used to determine the identity between two sequences.

“Identity percentage” or “sequence identity percentage” refers to a comparison between the amino acids of two polypeptides or the nucleotides of two polynucleotides in which, when optimally aligned, the two polypeptides or polynucleotides have approximately the specified percentage of identical amino acids or nucleotides. For example, “95% identity” means that, when the two polypeptides or polynucleotides are optimally aligned, 95% of the amino acids in the two polypeptides or 95% of the nucleotides in the two polynucleotides are identical.

A person skilled in the art knows various conditions for hybridization, such as stringent hybridization conditions and highly stringent hybridization conditions. See, for example, Sambrook et al., 1989, Molecular Cloning, A Laboratory Manual, Cold Spring Harbor Press, N.Y.; and Ausubel et al. (eds.), 1995, Current Protocols in Molecular Biology, John Wiley & Sons, N.Y.

Of course, the polynucleotide of the present disclosure does not include a polynucleotide that only hybridizes to a poly(A) sequence (such as the 3'-end poly(A) of mRNA) or a complementary stretch of poly(T) (or poly(U)) residues.

As used herein, the term “host cell” refers to a cell, for example a microorganism, a yeast cell, an insect cell, or a mammalian cell, that can be, or has been, used as a recipient of a vector. The term includes the progeny of the original cell which has been transduced. Thus, a “host cell” as used herein generally refers to a cell which has been transduced with an exogenous DNA sequence. It is understood that the progeny of a single parental cell may not necessarily be completely identical in morphology or in genomic or total DNA complement to the original parent, due to natural, accidental, or deliberate mutation.

2. Fusion polypeptide

Through genetic manipulation, bio-imaging, and structural characterization, the Spa2 protein was identified as the major pilin of the CLP fiber structure. Using structure-guided design, the inventors developed a new type of engineerable extracellular protein scaffold that can be genetically appended with diverse functional peptides or proteins at multiple sites of the Spa2 protein.

The present disclosure provides a fusion polypeptide comprising a carrier protein and a polypeptide of interest, wherein the polypeptide of interest is fused to a terminus of the carrier protein or inserted into the carrier protein, and wherein the carrier protein is a pilin of covalently-linked pili (CLP) from a microorganism.

In some embodiments, the microorganism is a gram-positive bacterium, such as a bacterium selected from Corynebacterium glutamicum, Bifidobacterium breve, Lactococcus lactis, Lacticaseibacillus paracasei, Bacillus thuringiensis, and Lacticaseibacillus paracasei; preferably, Corynebacterium glutamicum. The bacterium can include, but are not limited to, a bacterium selected from Corynebacterium glutamicum strain BE (GenBank assembly accession: GCA_013046805.1) , Corynebacterium glutamicum ATCC 14067 (GenBank assembly accession: GCA_002243555.1) , Corynebacterium glutamicum strain YI (GenBank assembly accession: GCA_001643035.1) , Corynebacterium glutamicum strain ATCC 13869 (GenBank assembly accession: GCA_001687645.1) , Corynebacterium glutamicum AJ1511 (GenBank assembly accession: GCA_002355675.1) , Corynebacterium glutamicum strain XV (GenBank assembly accession: GCA_001936195.1) , Corynebacterium glutamicum strain CP (GenBank assembly accession: GCA_001447865.2) , Corynebacterium glutamicum R (GenBank assembly accession: GCA_000010225.1) , Corynebacterium glutamicum strain USDA-ARS-USMARC-56828 (GenBank assembly accession: GCA_001518935.2) , Bifidobacterium breve strain LMC520 (GenBank assembly accession: GCA_001990225.1) , Bifidobacterium breve strain BR3 (GenBank assembly accession: GCA_001281425.1) , Bifidobacterium breve strain NRBB51 (GenBank assembly accession: GCA_002838405.1) , Bifidobacterium breve strain NRBB09 (GenBank assembly accession: GCA_002838325.1) , Bifidobacterium breve 12L (GenBank assembly accession: GCA_000568955.1) , Bifidobacterium breve strain DRBB26 (GenBank assembly accession: GCA_002838225.1) , Bifidobacterium breve strain 180W83 (GenBank assembly accession: GCA_002838525.1) , Bifidobacterium breve strain JSRL01 (GenBank assembly accession: GCA_009498435.1) , Bifidobacterium breve 689b (GenBank assembly accession: GCA_000569055.1) , Bifidobacterium breve strain DRBB29 (GenBank assembly accession: GCA_002838705.1) , Bifidobacterium 
breve strain DRBB27 (GenBank assembly accession: GCA_002838445.1) , Bifidobacterium breve strain JR01 (GenBank assembly accession: GCA_009931415.1) , Bifidobacterium breve S27 (GenBank assembly accession: GCA_000569075.1) , Bifidobacterium breve ACS-071-V-Sch8b (GenBank assembly accession: GCA_000213865.1) , Bifidobacterium breve strain NRBB56 (GenBank assembly accession: GCA_002838425.1) , Bifidobacterium breve DSM 20213 = JCM 1192 (GenBank assembly accession: GCA_001025175.1) , Bifidobacterium breve strain NRBB01 (GenBank assembly accession: GCA_002838245.1) , Bifidobacterium breve strain FDAARGOS_561 (GenBank assembly accession: GCA_003813065.1) , Bifidobacterium breve strain NCTC11815 (GenBank assembly accession: GCA_900637145.1) , Bifidobacterium breve strain NRBB52 (GenBank assembly accession: GCA_002838385.1) , Bifidobacterium breve strain 082W48 (GenBank assembly accession: GCA_002838545.1) , Bifidobacterium breve strain lw01 (GenBank assembly accession: GCA_003860285.1) , Bifidobacterium breve UCC2003 (GenBank assembly accession: GCA_000220135.1) , Bifidobacterium breve strain NRBB11 (GenBank assembly accession: GCA_002838305.1) , Bifidobacterium breve strain NRBB04 (GenBank assembly accession: GCA_002838285.1) , Bifidobacterium breve NCFB 2258 (GenBank assembly accession: GCA_000569035.1) , Bifidobacterium breve strain NRBB20 (GenBank assembly accession: GCA_002838645.1) , Bifidobacterium breve strain NRBB27 (GenBank assembly accession: GCA_002838665.1) , Bifidobacterium breve strain NRBB49 (GenBank assembly accession: GCA_002838685.1) , Bifidobacterium breve strain NRBB18 (GenBank assembly accession: GCA_002838605.1) , Bifidobacterium breve strain NRBB02 (GenBank assembly accession: GCA_002838265.1) , Bifidobacterium breve strain NRBB19 (GenBank assembly accession: GCA_002838625.1) , Bifidobacterium breve strain 017W439 (GenBank assembly accession: GCA_002838465.1) , Bifidobacterium breve JCM 7017 (GenBank assembly accession: GCA_000568975.1) , 
Bifidobacterium breve strain NRBB50 (GenBank assembly accession: GCA_002838365.1) , Bifidobacterium breve strain 139W423 (GenBank assembly accession: GCA_002838565.1) , Bifidobacterium breve strain DRBB28 (GenBank assembly accession: GCA_002838505.1) , Bifidobacterium breve strain CNCM I-4321 (GenBank assembly accession: GCA_002838585.1) , Bifidobacterium breve strain DRBB30 (GenBank assembly accession: GCA_002838725.1) , Bifidobacterium breve strain NRBB57 (GenBank assembly accession: GCA_002838345.1) , Bifidobacterium breve strain 215W447a (GenBank assembly accession: GCA_002838485.1) , Lactococcus lactis subsp. cremoris NZ9000 (GenBank assembly accession: GCA_000143205.1) , Lactococcus lactis subsp. cremoris MG1363 (GenBank assembly accession: GCA_000009425.1) , Lactococcus lactis subsp. cremoris A76 (GenBank assembly accession: GCA_000236475.1) , Lactococcus lactis strain SRCM103457 (GenBank assembly accession: GCA_004194355.1) , Lactococcus lactis strain CBA3619 (GenBank assembly accession: GCA_007954765.1) , Lactococcus lactis strain WiKim0098 (GenBank assembly accession: GCA_016406265.1) , Lactococcus lactis strain K_LL005 (GenBank assembly accession: GCA_014334715.1) , Lactococcus lactis subsp. lactis strain G121 (GenBank assembly accession: GCA_013395015.1) , Lactococcus lactis strain N8 (GenBank assembly accession: GCA_014884605.1) , Lactococcus lactis subsp. lactis IO-1 (GenBank assembly accession: GCA_000344575.1) , Lactococcus lactis subsp. lactis strain F44 (GenBank assembly accession: GCA_002804185.1) , Lactococcus lactis subsp. lactis bv. diacetylactis strain S50 (GenBank assembly accession: GCA_003627395.2) , Lactococcus lactis strain FDAARGOS_1064 (GenBank assembly accession: GCA_016127135.1) , Lactococcus lactis strain FDAARGOS_887 (GenBank assembly accession: GCA_016027975.1) , Lactococcus lactis subsp. 
lactis strain UC77 (GenBank assembly accession: GCA_002078615.2) , Lactococcus lactis strain FDAARGOS_866 (GenBank assembly accession: GCA_016028815.1) , Lactococcus lactis strain IL1403 (GenBank assembly accession: GCA_003722275.1) , Lactococcus lactis strain FDAARGOS_865 (GenBank assembly accession: GCA_016028835.1) , Lactococcus lactis subsp. cremoris IBB477 (GenBank assembly accession: GCA_001856165.1) , Lacticaseibacillus paracasei strain TD 062 (GenBank assembly accession: GCA_009834405.1) , Lacticaseibacillus paracasei strain HM1 (GenBank assembly accession: GCA_018064185.1) , Bacillus thuringiensis strain FDAARGOS_794 (GenBank assembly accession: GCA_013267795.1) , Bacillus thuringiensis strain XL6 (GenBank assembly accession: GCA_000774075.2) , Bacillus thuringiensis strain Bt-GS57 (GenBank assembly accession: GCA_017751245.1) , Bacillus thuringiensis strain HER1410 (GenBank assembly accession: GCA_013340745.1) , Bacillus thuringiensis serovar tolworthi (GenBank assembly accession: GCA_001548175.1) , Bacillus thuringiensis strain BT62 (GenBank assembly accession: GCA_003054785.2) , Bacillus thuringiensis strain HD12 (GenBank assembly accession: GCA_001598095.1) , Bacillus thuringiensis serovar alesti strain BGSC 4C1 (GenBank assembly accession: GCA_001640965.1) , Bacillus thuringiensis LM1212 (GenBank assembly accession: GCA_003546665.1) , Lacticaseibacillus paracasei strain 347-16 (GenBank assembly accession: GCA_012955485.1) , Lacticaseibacillus paracasei subsp. tolerans strain MGB0734 (GenBank assembly accession: GCA_015476135.1) , Lacticaseibacillus paracasei subsp. tolerans strain MGB0747 (GenBank assembly accession: GCA_015476175.1) , Lacticaseibacillus paracasei strain CBA3611 (GenBank assembly accession: GCA_007292115.1) , Lacticaseibacillus paracasei subsp. paracasei strain GR0548 (GenBank assembly accession: GCA_019175405.1) , Lacticaseibacillus paracasei subsp. 
paracasei strain IBB3423 (GenBank assembly accession: GCA_009739485.1) , Lacticaseibacillus paracasei strain NFFJ04 (GenBank assembly accession: GCA_014905075.1) , Lacticaseibacillus paracasei strain HL182 (GenBank assembly accession: GCA_017638905.1) , Lacticaseibacillus paracasei strain Lpc10 (GenBank assembly accession: GCA_003199005.1) , Lacticaseibacillus paracasei subsp. tolerans strain AO356 (GenBank assembly accession: GCA_003957435.1) , Lacticaseibacillus paracasei subsp. tolerans strain MGB0625 (GenBank assembly accession: GCA_015476155.1) , Lacticaseibacillus paracasei strain 10266 (GenBank assembly accession: GCA_008329845.1) , Lacticaseibacillus paracasei subsp. tolerans strain S-NB (GenBank assembly accession: GCA_016757695.1) , Lacticaseibacillus paracasei strain Lp02 (GenBank assembly accession: GCA_013307125.1) , Lacticaseibacillus paracasei strain ZFM54 (GenBank assembly accession: GCA_003627255.1) , Lacticaseibacillus paracasei subsp. paracasei strain TMW 1.1434 (GenBank assembly accession: GCA_002813615.1) , Lacticaseibacillus paracasei strain SRCM103299 (GenBank assembly accession: GCA_004141835.1) , Lacticaseibacillus paracasei strain NJ (GenBank assembly accession: GCA_007637635.1) , Lacticaseibacillus paracasei strain EG9 (GenBank assembly accession: GCA_003177075.1) , Lacticaseibacillus paracasei strain TK-P4A (GenBank assembly accession: GCA_015377585.1) , Lacticaseibacillus paracasei subsp. paracasei strain BD5115 (GenBank assembly accession: GCA_018596415.1) , and Lacticaseibacillus paracasei subsp. Paracasei JCM 8130 (GenBank assembly accession: GCA_000829035.1) , preferably, Corynebacterium glutamicum ATCC 14067.

In some embodiments, the carrier protein is a major pilin.

Preferably, the fusion or insertion of the polypeptide of interest does not influence the formation of an intermolecular isopeptide bond, a disulfide bond, or an intramolecular isopeptide bond in the carrier protein.

In some embodiments, the carrier protein is a major pilin from Corynebacterium glutamicum (Spa2 protein). It is observed that the Spa2 protein (SEQ ID NO: 1) comprises three tandem Ig-like domains, namely the N-domain (residues 36-197), the M-domain (residues 198-343), and the C-domain (residues 344-469), which is consistent with other major pilins. It is also observed that deletion of the M-domain does not influence the formation of CLP. In some embodiments, the polypeptide of interest is inserted into the M-domain of the major pilin. In some embodiments, the polypeptide of interest replaces the M-domain of the major pilin or a part thereof.

The Spa2 protein from different Corynebacterium glutamicum strains may vary in sequence. In some embodiments, the carrier protein comprises an amino acid sequence of SEQ ID NO: 1, 2, 3, or 4, or an amino acid sequence at least 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99% or 99.5% identical to SEQ ID NO: 1, 2, 3, or 4. In some embodiments, the carrier protein comprises an amino acid sequence of SEQ ID NO: 1, 2, 3, or 4, or an amino acid sequence at least 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99% or 99.5% identical to SEQ ID NO: 1, 2, 3, or 4 with the residues corresponding to residues C97, C128, K194, C380, C432, and LPLTG (474-478), and optionally E158, D246, and/or E435 of SEQ ID NO: 1 unchanged. The carrier protein can be the mature form of SEQ ID NO: 1, 2, 3, or 4, i.e., with the signal peptide deleted. In some embodiments, the carrier protein comprises amino acids 35 to 509 of SEQ ID NO: 1, amino acids 34 to 520 of SEQ ID NO: 2, amino acids 34 to 530 of SEQ ID NO: 3, or amino acids 34 to 519 of SEQ ID NO: 4, or an amino acid sequence at least 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99% or 99.5% identical to amino acids 35 to 509 of SEQ ID NO: 1, amino acids 34 to 520 of SEQ ID NO: 2, amino acids 34 to 530 of SEQ ID NO: 3, or amino acids 34 to 519 of SEQ ID NO: 4. In some embodiments, the carrier protein comprises amino acids 35 to 509 of SEQ ID NO: 1, amino acids 34 to 520 of SEQ ID NO: 2, amino acids 34 to 530 of SEQ ID NO: 3, or amino acids 34 to 519 of SEQ ID NO: 4, or an amino acid sequence at least 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99% or 99.5% identical to amino acids 35 to 509 of SEQ ID NO: 1, amino acids 34 to 520 of SEQ ID NO: 2, amino acids 34 to 530 of SEQ ID NO: 3, or amino acids 34 to 519 of SEQ ID NO: 4 with the residues corresponding to residues C97, C128, E158, K194, D246, C380, C432, E435, and LPLTG (474-478), and optionally E158, D246, and/or E435 of SEQ ID NO: 1 unchanged.
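The requirement that certain residues remain "unchanged" in a variant can be checked, for illustration, with the following minimal Python sketch. It assumes that the reference and the candidate sequence have already been aligned with gaps written as "-", and it maps 1-based positions of the ungapped reference onto alignment columns. The function name `conserved_positions_unchanged` is hypothetical, and the toy sequences and positions stand in for SEQ ID NO: 1 and residues such as C97, C128 and K194, whose actual sequence context is not reproduced here.

```python
def conserved_positions_unchanged(aligned_ref, aligned_cand, ref_positions):
    """Map each requested reference position to True/False.

    `ref_positions` are 1-based positions in the ungapped reference.
    A position is True when the candidate carries the same residue in
    the corresponding alignment column. Illustrative helper only.
    """
    if len(aligned_ref) != len(aligned_cand):
        raise ValueError("aligned sequences must have the same length")
    wanted = set(ref_positions)
    result = {}
    ref_index = 0
    for ref_res, cand_res in zip(aligned_ref, aligned_cand):
        if ref_res == "-":
            continue  # insertion in the candidate: no reference position
        ref_index += 1
        if ref_index in wanted:
            result[ref_index] = (ref_res == cand_res)
    return result


# Toy example: position 3 of the reference (K) is changed to R.
print(conserved_positions_unchanged("MCKAC", "MCRAC", [2, 3, 5]))
```

A candidate would pass such a check only if every required position maps to True; how the alignment is produced is outside the scope of the sketch.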

In some embodiments, the polypeptide of interest is fused to the N terminus of the carrier protein, or is inserted between positions corresponding to G215 and L216 of SEQ ID NO: 1, between positions corresponding to G236 and E237 of SEQ ID NO: 1, or between positions corresponding to G336 and T337 of SEQ ID NO: 1.
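Insertion between two defined positions, as described above, can be sketched at the sequence level in Python. The helper below checks the two flanking residues before inserting, which mirrors the way sites such as G215/L216 are named by residue and position. The function name `insert_between`, the toy carrier sequence and the toy insert are hypothetical; SEQ ID NO: 1 itself is not reproduced here.

```python
def insert_between(carrier, left_pos, left_res, right_res, insert):
    """Insert `insert` between 1-based positions left_pos and left_pos + 1.

    The flanking residues are verified first so that a wrong numbering
    scheme is detected instead of silently producing a wrong fusion.
    Illustrative helper only.
    """
    if not 1 <= left_pos < len(carrier):
        raise ValueError("left_pos is outside the carrier sequence")
    if carrier[left_pos - 1] != left_res or carrier[left_pos] != right_res:
        raise ValueError("flanking residues do not match the carrier")
    return carrier[:left_pos] + insert + carrier[left_pos:]


# Toy carrier with G at position 3 and L at position 4.
toy_carrier = "MAGLK"
print(insert_between(toy_carrier, 3, "G", "L", "HHH"))
```

With a real carrier sequence, the same call pattern would be `insert_between(carrier, 215, "G", "L", insert)` for the G215/L216 site, assuming the carrier is numbered as in SEQ ID NO: 1.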

In some embodiments, the polypeptide of interest is directly linked to the N terminus of the carrier protein. In some embodiments, the polypeptide of interest is linked to the N terminus of the carrier protein via a peptide linker such as a flexible linker.

A peptide linker is generally a short peptide of about 4-20 or more amino acids; a conventional flexible linker consists of combinations of Ser and Gly residues. In some embodiments, the peptide linker used in the present disclosure is (G4S)n, n=1-4. In some embodiments, the peptide linker used in the present disclosure is (G3S)n, n=1-4. In some embodiments, the peptide linker used in the present disclosure is (G4S)2, i.e., SEQ ID NO: 22. In some embodiments, the peptide linker is a C10 linker of SEQ ID NO: 23.
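The repeat notation for flexible linkers can be expanded mechanically, as in the following minimal Python sketch. It builds (G4S)n or (G3S)n linkers and joins a polypeptide of interest to a carrier through the linker. The function names `gs_linker` and `n_terminal_fusion` and the toy sequences are hypothetical, and the sketch ignores the signal peptide and any other sequence context of an actual construct.

```python
def gs_linker(n, glycines=4):
    """Return a (G{glycines}S)n flexible linker, e.g. (G4S)2."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return ("G" * glycines + "S") * n


def n_terminal_fusion(poi, carrier, linker=""):
    """Concatenate polypeptide of interest, optional linker and carrier.

    Sequence-level illustration only; signal peptide placement is not
    modelled here.
    """
    return poi + linker + carrier


print(gs_linker(2))              # (G4S)2 pattern
print(gs_linker(1, glycines=3))  # (G3S)1 pattern
print(n_terminal_fusion("HHH", "MAGLK", gs_linker(1)))
```

Passing an empty linker reproduces the direct linkage described in the preceding paragraph.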

The polypeptide of interest can be selected according to the desired application of the fusion polypeptide.

In some embodiments, the fusion polypeptide is provided to bind, capture or enrich a target molecule, and the polypeptide of interest is a polypeptide that can recognize the target molecule, including but not limited to a ligand, a receptor, an antigen, and an antibody such as an scFv or a nanobody. For example, the fusion polypeptide is provided to capture a protein comprising a SpyTag (SEQ ID NO: 37), and the polypeptide of interest comprises SpyCatcher (SEQ ID NO: 15), or vice versa.

In some embodiments, the fusion polypeptide is provided as an adhesive agent, and the polypeptide of interest is an adhesive peptide, e.g., Mfp35 (SEQ ID NO: 38).

In some embodiments, the fusion polypeptide is provided to catalyze chemical or biochemical reactions, and the polypeptide of interest is an enzyme. In some embodiments, the fusion polypeptide is provided to degrade carbohydrates such as cellulose, and the polypeptide of interest can be an endo-1,4-β-glucanase, e.g., from Trichoderma reesei (TrEgl, SEQ ID NO: 19), and/or a β-glucosidase, e.g., from Saccharophagus degradans (SdBgl, SEQ ID NO: 21). In some embodiments, the fusion polypeptide is provided to degrade refractory organics, such as plastics, and the polypeptide of interest is an enzyme responsible for the degradation, such as a PETase.

3. Polynucleotide and Vector

The present disclosure provides a polynucleotide encoding the fusion polypeptide of the present disclosure.

The polynucleotide of the present disclosure can be amplified with cDNA, mRNA or genomic DNA as the template and suitable oligonucleotide primers according to standard PCR amplification techniques. The nucleic acid amplified as above can be cloned into a suitable vector and characterized by DNA sequence analysis.

The polynucleotide of the present disclosure can be prepared by standard synthesis techniques, for example, by using an automated DNA synthesizer.

The present disclosure also relates to the complementary strand of the nucleic acid molecule described herein. A nucleic acid molecule that is complementary to another nucleotide sequence is a molecule that is sufficiently complementary to that nucleotide sequence to hybridize with it and form a stable duplex.

In order to express the fusion polypeptide of the present disclosure, also provided are a polynucleotide construct and a vector, such as an expression vector, comprising the polynucleotide of the present disclosure.

In some embodiments, the polynucleotide of the present disclosure is operably linked to a promoter. In some embodiments, the promoter is a constitutive promoter, such as the native promoter driving the Spa2 gene in Corynebacterium glutamicum. In some embodiments, the promoter is an inducible promoter.

In some embodiments, the expression vector comprises a lac operon.

The polynucleotide encoding the polypeptide of the present disclosure can be subjected to various manipulations to allow the expression of the polypeptide. Before the insertion thereof into a vector, manipulation of the polynucleotide according to the expression vector is desirable or necessary. Techniques for modifying polynucleotide sequences with recombinant DNA methods are well known in the art.

In order to identify and select host cells comprising the expression vector of the present disclosure, the vector of the present disclosure preferably contains one or more selectable markers, which allow simple selection of transformed, transfected, or transduced cells. A selectable marker is a gene whose product provides biocide or virus resistance, heavy metal resistance, prototrophy to auxotrophs, etc. For example, a bacterial selectable marker can be the dal gene from Bacillus subtilis or Bacillus licheniformis, or a marker that confers antibiotic resistance such as ampicillin, kanamycin, chloramphenicol or tetracycline resistance.

The vector of the present disclosure can be integrated into the genome of the host cell or can replicate autonomously in the cell, independently of the genome. The elements required for integration into the genome of the host cell or for autonomous replication are known in the art (see, for example, the aforementioned Sambrook et al., 1989).

4. Recombinant cell

The present disclosure provides a recombinant cell comprising a polynucleotide encoding a fusion polypeptide, wherein the fusion polypeptide comprises a carrier protein and a polypeptide of interest, wherein the polypeptide of interest is fused to a terminus of the carrier protein or inserted into the carrier protein, wherein the carrier protein is a pilin of CLP, and wherein the recombinant cell is capable of expressing the polynucleotide and displaying a modified CLP comprising the fusion polypeptide.

In some embodiments, the carrier protein in the fusion polypeptide is the native major pilin of the recombinant cell.

In some embodiments, the recombinant cell is a recombinant gram-positive bacterium, such as a bacterium selected from Corynebacterium glutamicum, Bifidobacterium breve, Lactococcus lactis, Lacticaseibacillus paracasei, Bacillus thuringiensis, and Lacticaseibacillus paracasei; preferably, Corynebacterium glutamicum. The bacterium can include, but are not limited to, a bacterium selected from Corynebacterium glutamicum strain BE (GenBank assembly accession: GCA_013046805.1) , Corynebacterium glutamicum ATCC 14067 (GenBank assembly accession: GCA_002243555.1) , Corynebacterium glutamicum strain YI (GenBank assembly accession: GCA_001643035.1) , Corynebacterium glutamicum strain ATCC 13869 (GenBank assembly accession: GCA_001687645.1) , Corynebacterium glutamicum AJ1511 (GenBank assembly accession: GCA_002355675.1) , Corynebacterium glutamicum strain XV (GenBank assembly accession: GCA_001936195.1) , Corynebacterium glutamicum strain CP (GenBank assembly accession: GCA_001447865.2) , Corynebacterium glutamicum R (GenBank assembly accession: GCA_000010225.1) , Corynebacterium glutamicum strain USDA-ARS-USMARC-56828 (GenBank assembly accession: GCA_001518935.2) , Bifidobacterium breve strain LMC520 (GenBank assembly accession: GCA_001990225.1) , Bifidobacterium breve strain BR3 (GenBank assembly accession: GCA_001281425.1) , Bifidobacterium breve strain NRBB51 (GenBank assembly accession: GCA_002838405.1) , Bifidobacterium breve strain NRBB09 (GenBank assembly accession: GCA_002838325.1) , Bifidobacterium breve 12L (GenBank assembly accession: GCA_000568955.1) , Bifidobacterium breve strain DRBB26 (GenBank assembly accession: GCA_002838225.1) , Bifidobacterium breve strain 180W83 (GenBank assembly accession: GCA_002838525.1) , Bifidobacterium breve strain JSRL01 (GenBank assembly accession: GCA_009498435.1) , Bifidobacterium breve 689b (GenBank assembly accession: GCA_000569055.1) , Bifidobacterium breve strain DRBB29 (GenBank assembly accession: GCA_002838705.1) , 
Bifidobacterium breve strain DRBB27 (GenBank assembly accession: GCA_002838445.1) , Bifidobacterium breve strain JR01 (GenBank assembly accession: GCA_009931415.1) , Bifidobacterium breve S27 (GenBank assembly accession: GCA_000569075.1) , Bifidobacterium breve ACS-071-V-Sch8b (GenBank assembly accession: GCA_000213865.1) , Bifidobacterium breve strain NRBB56 (GenBank assembly accession: GCA_002838425.1) , Bifidobacterium breve DSM 20213 = JCM 1192 (GenBank assembly accession: GCA_001025175.1) , Bifidobacterium breve strain NRBB01 (GenBank assembly accession: GCA_002838245.1) , Bifidobacterium breve strain FDAARGOS_561 (GenBank assembly accession: GCA_003813065.1) , Bifidobacterium breve strain NCTC11815 (GenBank assembly accession: GCA_900637145.1) , Bifidobacterium breve strain NRBB52 (GenBank assembly accession: GCA_002838385.1) , Bifidobacterium breve strain 082W48 (GenBank assembly accession: GCA_002838545.1) , Bifidobacterium breve strain lw01 (GenBank assembly accession: GCA_003860285.1) , Bifidobacterium breve UCC2003 (GenBank assembly accession: GCA_000220135.1) , Bifidobacterium breve strain NRBB11 (GenBank assembly accession: GCA_002838305.1) , Bifidobacterium breve strain NRBB04 (GenBank assembly accession: GCA_002838285.1) , Bifidobacterium breve NCFB 2258 (GenBank assembly accession: GCA_000569035.1) , Bifidobacterium breve strain NRBB20 (GenBank assembly accession: GCA_002838645.1) , Bifidobacterium breve strain NRBB27 (GenBank assembly accession: GCA_002838665.1) , Bifidobacterium breve strain NRBB49 (GenBank assembly accession: GCA_002838685.1) , Bifidobacterium breve strain NRBB18 (GenBank assembly accession: GCA_002838605.1) , Bifidobacterium breve strain NRBB02 (GenBank assembly accession: GCA_002838265.1) , Bifidobacterium breve strain NRBB19 (GenBank assembly accession: GCA_002838625.1) , Bifidobacterium breve strain 017W439 (GenBank assembly accession: GCA_002838465.1) , Bifidobacterium breve JCM 7017 (GenBank assembly accession: 
GCA_000568975.1) , Bifidobacterium breve strain NRBB50 (GenBank assembly accession: GCA_002838365.1) , Bifidobacterium breve strain 139W423 (GenBank assembly accession: GCA_002838565.1) , Bifidobacterium breve strain DRBB28 (GenBank assembly accession: GCA_002838505.1) , Bifidobacterium breve strain CNCM I-4321 (GenBank assembly accession: GCA_002838585.1) , Bifidobacterium breve strain DRBB30 (GenBank assembly accession: GCA_002838725.1) , Bifidobacterium breve strain NRBB57 (GenBank assembly accession: GCA_002838345.1) , Bifidobacterium breve strain 215W447a (GenBank assembly accession: GCA_002838485.1) , Lactococcus lactis subsp. cremoris NZ9000 (GenBank assembly accession: GCA_000143205.1) , Lactococcus lactis subsp. cremoris MG1363 (GenBank assembly accession: GCA_000009425.1) , Lactococcus lactis subsp. cremoris A76 (GenBank assembly accession: GCA_000236475.1) , Lactococcus lactis strain SRCM103457 (GenBank assembly accession: GCA_004194355.1) , Lactococcus lactis strain CBA3619 (GenBank assembly accession: GCA_007954765.1) , Lactococcus lactis strain WiKim0098 (GenBank assembly accession: GCA_016406265.1) , Lactococcus lactis strain K_LL005 (GenBank assembly accession: GCA_014334715.1) , Lactococcus lactis subsp. lactis strain G121 (GenBank assembly accession: GCA_013395015.1) , Lactococcus lactis strain N8 (GenBank assembly accession: GCA_014884605.1) , Lactococcus lactis subsp. lactis IO-1 (GenBank assembly accession: GCA_000344575.1) , Lactococcus lactis subsp. lactis strain F44 (GenBank assembly accession: GCA_002804185.1) , Lactococcus lactis subsp. lactis bv. diacetylactis strain S50 (GenBank assembly accession: GCA_003627395.2) , Lactococcus lactis strain FDAARGOS_1064 (GenBank assembly accession: GCA_016127135.1) , Lactococcus lactis strain FDAARGOS_887 (GenBank assembly accession: GCA_016027975.1) , Lactococcus lactis subsp. 
lactis strain UC77 (GenBank assembly accession: GCA_002078615.2) , Lactococcus lactis strain FDAARGOS_866 (GenBank assembly accession: GCA_016028815.1) , Lactococcus lactis strain IL1403 (GenBank assembly accession: GCA_003722275.1) , Lactococcus lactis strain FDAARGOS_865 (GenBank assembly accession: GCA_016028835.1) , Lactococcus lactis subsp. cremoris IBB477 (GenBank assembly accession: GCA_001856165.1) , Lacticaseibacillus paracasei strain TD 062 (GenBank assembly accession: GCA_009834405.1) , Lacticaseibacillus paracasei strain HM1 (GenBank assembly accession: GCA_018064185.1) , Bacillus thuringiensis strain FDAARGOS_794 (GenBank assembly accession: GCA_013267795.1) , Bacillus thuringiensis strain XL6 (GenBank assembly accession: GCA_000774075.2) , Bacillus thuringiensis strain Bt-GS57 (GenBank assembly accession: GCA_017751245.1) , Bacillus thuringiensis strain HER1410 (GenBank assembly accession: GCA_013340745.1) , Bacillus thuringiensis serovar tolworthi (GenBank assembly accession: GCA_001548175.1) , Bacillus thuringiensis strain BT62 (GenBank assembly accession: GCA_003054785.2) , Bacillus thuringiensis strain HD12 (GenBank assembly accession: GCA_001598095.1) , Bacillus thuringiensis serovar alesti strain BGSC 4C1 (GenBank assembly accession: GCA_001640965.1) , Bacillus thuringiensis LM1212 (GenBank assembly accession: GCA_003546665.1) , Lacticaseibacillus paracasei strain 347-16 (GenBank assembly accession: GCA_012955485.1) , Lacticaseibacillus paracasei subsp. tolerans strain MGB0734 (GenBank assembly accession: GCA_015476135.1) , Lacticaseibacillus paracasei subsp. tolerans strain MGB0747 (GenBank assembly accession: GCA_015476175.1) , Lacticaseibacillus paracasei strain CBA3611 (GenBank assembly accession: GCA_007292115.1) , Lacticaseibacillus paracasei subsp. paracasei strain GR0548 (GenBank assembly accession: GCA_019175405.1) , Lacticaseibacillus paracasei subsp. 
paracasei strain IBB3423 (GenBank assembly accession: GCA_009739485.1) , Lacticaseibacillus paracasei strain NFFJ04 (GenBank assembly accession: GCA_014905075.1) , Lacticaseibacillus paracasei strain HL182 (GenBank assembly accession: GCA_017638905.1) , Lacticaseibacillus paracasei strain Lpc10 (GenBank assembly accession: GCA_003199005.1) , Lacticaseibacillus paracasei subsp. tolerans strain AO356 (GenBank assembly accession: GCA_003957435.1) , Lacticaseibacillus paracasei subsp. tolerans strain MGB0625 (GenBank assembly accession: GCA_015476155.1) , Lacticaseibacillus paracasei strain 10266 (GenBank assembly accession: GCA_008329845.1) , Lacticaseibacillus paracasei subsp. tolerans strain S-NB (GenBank assembly accession: GCA_016757695.1) , Lacticaseibacillus paracasei strain Lp02 (GenBank assembly accession: GCA_013307125.1) , Lacticaseibacillus paracasei strain ZFM54 (GenBank assembly accession: GCA_003627255.1) , Lacticaseibacillus paracasei subsp. paracasei strain TMW 1.1434 (GenBank assembly accession: GCA_002813615.1) , Lacticaseibacillus paracasei strain SRCM103299 (GenBank assembly accession: GCA_004141835.1) , Lacticaseibacillus paracasei strain NJ (GenBank assembly accession: GCA_007637635.1) , Lacticaseibacillus paracasei strain EG9 (GenBank assembly accession: GCA_003177075.1) , Lacticaseibacillus paracasei strain TK-P4A (GenBank assembly accession: GCA_015377585.1) , Lacticaseibacillus paracasei subsp. paracasei strain BD5115 (GenBank assembly accession: GCA_018596415.1) , and Lacticaseibacillus paracasei subsp. Paracasei JCM 8130 (GenBank assembly accession: GCA_000829035.1) , preferably, Corynebacterium glutamicum ATCC 14067.

In some embodiments, the carrier protein is a major pilin. In some embodiments, the carrier protein is the native major pilin of the bacterium.

The Spa2 protein from different Corynebacterium glutamicum strains may vary in sequence. In some embodiments, the carrier protein comprises an amino acid sequence of SEQ ID NO: 1, 2, 3, or 4, or an amino acid sequence at least 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99% or 99.5% identical to SEQ ID NO: 1, 2, 3, or 4. In some embodiments, the carrier protein comprises an amino acid sequence of SEQ ID NO: 1, 2, 3, or 4, or an amino acid sequence at least 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99% or 99.5% identical to SEQ ID NO: 1, 2, 3, or 4, with the residues corresponding to residues C97, C128, K194, C380, C432, and LPLTG (474-478), and optionally E158, D246, and/or E435, of SEQ ID NO: 1 unchanged.
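The two conditions in the preceding paragraph (a minimum percent identity and a set of unchanged residues) can be illustrated with a minimal Python sketch. It assumes the candidate sequence has already been aligned to SEQ ID NO: 1 by some external tool and is supplied as two gapped strings of equal length; the function names are hypothetical, and identity is counted over the reference residues, which is only one of several definitions in use.

```python
# Residue numbers of SEQ ID NO: 1 listed in the text as unchanged
# (C97, C128, K194, C380, C432, and the LPLTG motif at 474-478).
REQUIRED_POSITIONS = [97, 128, 194, 380, 432, *range(474, 479)]


def percent_identity(aligned_ref: str, aligned_query: str) -> float:
    """Percent of reference residues matched by the query in a pairwise alignment.

    Assumption: columns where the reference has a gap ("-") are ignored.
    """
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have equal length")
    ref_len = sum(1 for r in aligned_ref if r != "-")
    if ref_len == 0:
        raise ValueError("reference contains no residues")
    matches = sum(
        1 for r, q in zip(aligned_ref, aligned_query) if r != "-" and r == q
    )
    return 100.0 * matches / ref_len


def positions_unchanged(aligned_ref: str, aligned_query: str, positions) -> bool:
    """True if every listed 1-based reference position is identical in the query."""
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have equal length")
    wanted = set(positions)
    seen = set()
    ref_pos = 0
    for r, q in zip(aligned_ref, aligned_query):
        if r == "-":
            continue  # insertion in the query; no reference residue here
        ref_pos += 1
        if ref_pos in wanted:
            seen.add(ref_pos)
            if r != q:
                return False
    if seen != wanted:
        raise ValueError("reference is shorter than the requested positions")
    return True
```

With a real alignment, a candidate would satisfy the second embodiment if `percent_identity(...)` meets the chosen threshold and `positions_unchanged(..., REQUIRED_POSITIONS)` is true.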

The carrier protein can be the mature form of SEQ ID NO: 1, 2, 3, or 4, i.e., with the deletion of the signal peptide. In some embodiments, the carrier protein comprises amino acids 35 to 509 of SEQ ID NO: 1, amino acids 34 to 520 of SEQ ID NO: 2, amino acids 34 to 530 of SEQ ID NO: 3, or amino acids 34 to 519 of SEQ ID NO: 4, or an amino acid sequence at least 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99% or 99.5% identical to amino acids 35 to 509 of SEQ ID NO: 1, amino acids 34 to 520 of SEQ ID NO: 2, amino acids 34 to 530 of SEQ ID NO: 3, or amino acids 34 to 519 of SEQ ID NO: 4. In some embodiments, the carrier protein comprises amino acids 35 to 509 of SEQ ID NO: 1, amino acids 34 to 520 of SEQ ID NO: 2, amino acids 34 to 530 of SEQ ID NO: 3, or amino acids 34 to 519 of SEQ ID NO: 4, or an amino acid sequence at least 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99% or 99.5% identical to amino acids 35 to 509 of SEQ ID NO: 1, amino acids 34 to 520 of SEQ ID NO: 2, amino acids 34 to 530 of SEQ ID NO: 3, or amino acids 34 to 519 of SEQ ID NO: 4, with the residues corresponding to residues C97, C128, E158, K194, D246, C380, C432, E435, and LPLTG (474-478), and optionally E158, D246, and/or E435, of SEQ ID NO: 1 unchanged.
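As an illustration of the residue ranges given above, the following Python sketch extracts the mature region from a full-length precursor sequence. The ranges are taken from the text and treated as 1-based and inclusive; the function name is hypothetical and the sequences themselves are not reproduced here.

```python
# 1-based, inclusive residue ranges of the mature forms, as stated in the text,
# keyed by SEQ ID NO.
MATURE_RANGES = {1: (35, 509), 2: (34, 520), 3: (34, 530), 4: (34, 519)}


def mature_region(precursor: str, seq_id_no: int) -> str:
    """Return the mature-form residues of a full-length precursor sequence."""
    start, end = MATURE_RANGES[seq_id_no]
    if len(precursor) < end:
        raise ValueError(
            f"sequence has {len(precursor)} residues, expected at least {end}"
        )
    # Convert the 1-based inclusive range to a 0-based half-open slice.
    return precursor[start - 1:end]
```

For SEQ ID NO: 1, the returned region spans 509 - 35 + 1 = 475 residues.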

The polypeptide of interest can be selected according to the desired application of the fusion polypeptide. In some embodiments, the fusion polypeptide is provided to degrade carbohydrates such as cellulose, and the polypeptide of interest can be the endo-1,4-β-glucanase from Trichoderma reesei (TrEgl, SEQ ID NO: 19) and/or the β-glucosidase from Saccharophagus degradans (SdBgl, SEQ ID NO: 21).

In some embodiments, the recombinant cell is provided to bind, capture or enrich a target molecule, and the polypeptide of interest is a polypeptide that can recognize the target molecule, including but not limited to a ligand, a receptor, an antigen and an antibody such as an scFv or a nanobody. For example, the recombinant cell is provided to capture a protein comprising a SpyTag (SEQ ID NO: 37), and the polypeptide of interest comprises SpyCatcher (SEQ ID NO: 15), or vice versa.

In some embodiments, the recombinant cell is provided as an adhesive agent, and the polypeptide of interest is an adhesive peptide, e.g., Mfp35 (SEQ ID NO: 38) .

In some embodiments, the recombinant cell is provided to catalyze chemical or biochemical reactions, and the polypeptide of interest is an enzyme. In some embodiments, the recombinant cell is provided to degrade carbohydrates such as cellulose, and the polypeptide of interest can be an endo-1,4-β-glucanase, e.g., from Trichoderma reesei (TrEgl, SEQ ID NO: 19), and/or a β-glucosidase, e.g., from Saccharophagus degradans (SdBgl, SEQ ID NO: 21). In some embodiments, the recombinant cell is provided to degrade refractory organics, such as plastics, and the polypeptide of interest is an enzyme responsible for the degradation, such as a PETase.

The present disclosure provides a method of preparing the recombinant cell of the present disclosure, comprising introducing a polynucleotide encoding the fusion polypeptide of the present disclosure into a host cell.

In some embodiments, the carrier protein in the fusion polypeptide is the native major pilin of the host cell.

In some embodiments, the host cell is a Gram-positive bacterium. In some embodiments, the host cell is a bacterium selected from Corynebacterium glutamicum, Bifidobacterium breve, Lactococcus lactis, Lacticaseibacillus paracasei, and Bacillus thuringiensis; preferably, Corynebacterium glutamicum. The bacterium can be, but is not limited to, a bacterium selected from Corynebacterium glutamicum strain BE (GenBank assembly accession: GCA_013046805.1) , Corynebacterium glutamicum ATCC 14067 (GenBank assembly accession: GCA_002243555.1) , Corynebacterium glutamicum strain YI (GenBank assembly accession: GCA_001643035.1) , Corynebacterium glutamicum strain ATCC 13869 (GenBank assembly accession: GCA_001687645.1) , Corynebacterium glutamicum AJ1511 (GenBank assembly accession: GCA_002355675.1) , Corynebacterium glutamicum strain XV (GenBank assembly accession: GCA_001936195.1) , Corynebacterium glutamicum strain CP (GenBank assembly accession: GCA_001447865.2) , Corynebacterium glutamicum R (GenBank assembly accession: GCA_000010225.1) , Corynebacterium glutamicum strain USDA-ARS-USMARC-56828 (GenBank assembly accession: GCA_001518935.2) , Bifidobacterium breve strain LMC520 (GenBank assembly accession: GCA_001990225.1) , Bifidobacterium breve strain BR3 (GenBank assembly accession: GCA_001281425.1) , Bifidobacterium breve strain NRBB51 (GenBank assembly accession: GCA_002838405.1) , Bifidobacterium breve strain NRBB09 (GenBank assembly accession: GCA_002838325.1) , Bifidobacterium breve 12L (GenBank assembly accession: GCA_000568955.1) , Bifidobacterium breve strain DRBB26 (GenBank assembly accession: GCA_002838225.1) , Bifidobacterium breve strain 180W83 (GenBank assembly accession: GCA_002838525.1) , Bifidobacterium breve strain JSRL01 (GenBank assembly accession: GCA_009498435.1) , Bifidobacterium breve 689b (GenBank assembly accession: GCA_000569055.1) , Bifidobacterium breve strain DRBB29 (GenBank assembly accession: 
GCA_002838705.1) , Bifidobacterium breve strain DRBB27 (GenBank assembly accession: GCA_002838445.1) , Bifidobacterium breve strain JR01 (GenBank assembly accession: GCA_009931415.1) , Bifidobacterium breve S27 (GenBank assembly accession: GCA_000569075.1) , Bifidobacterium breve ACS-071-V-Sch8b (GenBank assembly accession: GCA_000213865.1) , Bifidobacterium breve strain NRBB56 (GenBank assembly accession: GCA_002838425.1) , Bifidobacterium breve DSM 20213 = JCM 1192 (GenBank assembly accession: GCA_001025175.1) , Bifidobacterium breve strain NRBB01 (GenBank assembly accession: GCA_002838245.1) , Bifidobacterium breve strain FDAARGOS_561 (GenBank assembly accession: GCA_003813065.1) , Bifidobacterium breve strain NCTC11815 (GenBank assembly accession: GCA_900637145.1) , Bifidobacterium breve strain NRBB52 (GenBank assembly accession: GCA_002838385.1) , Bifidobacterium breve strain 082W48 (GenBank assembly accession: GCA_002838545.1) , Bifidobacterium breve strain lw01 (GenBank assembly accession: GCA_003860285.1) , Bifidobacterium breve UCC2003 (GenBank assembly accession: GCA_000220135.1) , Bifidobacterium breve strain NRBB11 (GenBank assembly accession: GCA_002838305.1) , Bifidobacterium breve strain NRBB04 (GenBank assembly accession: GCA_002838285.1) , Bifidobacterium breve NCFB 2258 (GenBank assembly accession: GCA_000569035.1) , Bifidobacterium breve strain NRBB20 (GenBank assembly accession: GCA_002838645.1) , Bifidobacterium breve strain NRBB27 (GenBank assembly accession: GCA_002838665.1) , Bifidobacterium breve strain NRBB49 (GenBank assembly accession: GCA_002838685.1) , Bifidobacterium breve strain NRBB18 (GenBank assembly accession: GCA_002838605.1) , Bifidobacterium breve strain NRBB02 (GenBank assembly accession: GCA_002838265.1) , Bifidobacterium breve strain NRBB19 (GenBank assembly accession: GCA_002838625.1) , Bifidobacterium breve strain 017W439 (GenBank assembly accession: GCA_002838465.1) , Bifidobacterium breve JCM 7017 (GenBank assembly 
accession: GCA_000568975.1) , Bifidobacterium breve strain NRBB50 (GenBank assembly accession: GCA_002838365.1) , Bifidobacterium breve strain 139W423 (GenBank assembly accession: GCA_002838565.1) , Bifidobacterium breve strain DRBB28 (GenBank assembly accession: GCA_002838505.1) , Bifidobacterium breve strain CNCM I-4321 (GenBank assembly accession: GCA_002838585.1) , Bifidobacterium breve strain DRBB30 (GenBank assembly accession: GCA_002838725.1) , Bifidobacterium breve strain NRBB57 (GenBank assembly accession: GCA_002838345.1) , Bifidobacterium breve strain 215W447a (GenBank assembly accession: GCA_002838485.1) , Lactococcus lactis subsp. cremoris NZ9000 (GenBank assembly accession: GCA_000143205.1) , Lactococcus lactis subsp. cremoris MG1363 (GenBank assembly accession: GCA_000009425.1) , Lactococcus lactis subsp. cremoris A76 (GenBank assembly accession: GCA_000236475.1) , Lactococcus lactis strain SRCM103457 (GenBank assembly accession: GCA_004194355.1) , Lactococcus lactis strain CBA3619 (GenBank assembly accession: GCA_007954765.1) , Lactococcus lactis strain WiKim0098 (GenBank assembly accession: GCA_016406265.1) , Lactococcus lactis strain K_LL005 (GenBank assembly accession: GCA_014334715.1) , Lactococcus lactis subsp. lactis strain G121 (GenBank assembly accession: GCA_013395015.1) , Lactococcus lactis strain N8 (GenBank assembly accession: GCA_014884605.1) , Lactococcus lactis subsp. lactis IO-1 (GenBank assembly accession: GCA_000344575.1) , Lactococcus lactis subsp. lactis strain F44 (GenBank assembly accession: GCA_002804185.1) , Lactococcus lactis subsp. lactis bv. diacetylactis strain S50 (GenBank assembly accession: GCA_003627395.2) , Lactococcus lactis strain FDAARGOS_1064 (GenBank assembly accession: GCA_016127135.1) , Lactococcus lactis strain FDAARGOS_887 (GenBank assembly accession: GCA_016027975.1) , Lactococcus lactis subsp. 
lactis strain UC77 (GenBank assembly accession: GCA_002078615.2) , Lactococcus lactis strain FDAARGOS_866 (GenBank assembly accession: GCA_016028815.1) , Lactococcus lactis strain IL1403 (GenBank assembly accession: GCA_003722275.1) , Lactococcus lactis strain FDAARGOS_865 (GenBank assembly accession: GCA_016028835.1) , Lactococcus lactis subsp. cremoris IBB477 (GenBank assembly accession: GCA_001856165.1) , Lacticaseibacillus paracasei strain TD 062 (GenBank assembly accession: GCA_009834405.1) , Lacticaseibacillus paracasei strain HM1 (GenBank assembly accession: GCA_018064185.1) , Bacillus thuringiensis strain FDAARGOS_794 (GenBank assembly accession: GCA_013267795.1) , Bacillus thuringiensis strain XL6 (GenBank assembly accession: GCA_000774075.2) , Bacillus thuringiensis strain Bt-GS57 (GenBank assembly accession: GCA_017751245.1) , Bacillus thuringiensis strain HER1410 (GenBank assembly accession: GCA_013340745.1) , Bacillus thuringiensis serovar tolworthi (GenBank assembly accession: GCA_001548175.1) , Bacillus thuringiensis strain BT62 (GenBank assembly accession: GCA_003054785.2) , Bacillus thuringiensis strain HD12 (GenBank assembly accession: GCA_001598095.1) , Bacillus thuringiensis serovar alesti strain BGSC 4C1 (GenBank assembly accession: GCA_001640965.1) , Bacillus thuringiensis LM1212 (GenBank assembly accession: GCA_003546665.1) , Lacticaseibacillus paracasei strain 347-16 (GenBank assembly accession: GCA_012955485.1) , Lacticaseibacillus paracasei subsp. tolerans strain MGB0734 (GenBank assembly accession: GCA_015476135.1) , Lacticaseibacillus paracasei subsp. tolerans strain MGB0747 (GenBank assembly accession: GCA_015476175.1) , Lacticaseibacillus paracasei strain CBA3611 (GenBank assembly accession: GCA_007292115.1) , Lacticaseibacillus paracasei subsp. paracasei strain GR0548 (GenBank assembly accession: GCA_019175405.1) , Lacticaseibacillus paracasei subsp. 
paracasei strain IBB3423 (GenBank assembly accession: GCA_009739485.1) , Lacticaseibacillus paracasei strain NFFJ04 (GenBank assembly accession: GCA_014905075.1) , Lacticaseibacillus paracasei strain HL182 (GenBank assembly accession: GCA_017638905.1) , Lacticaseibacillus paracasei strain Lpc10 (GenBank assembly accession: GCA_003199005.1) , Lacticaseibacillus paracasei subsp. tolerans strain AO356 (GenBank assembly accession: GCA_003957435.1) , Lacticaseibacillus paracasei subsp. tolerans strain MGB0625 (GenBank assembly accession: GCA_015476155.1) , Lacticaseibacillus paracasei strain 10266 (GenBank assembly accession: GCA_008329845.1) , Lacticaseibacillus paracasei subsp. tolerans strain S-NB (GenBank assembly accession: GCA_016757695.1) , Lacticaseibacillus paracasei strain Lp02 (GenBank assembly accession: GCA_013307125.1) , Lacticaseibacillus paracasei strain ZFM54 (GenBank assembly accession: GCA_003627255.1) , Lacticaseibacillus paracasei subsp. paracasei strain TMW 1.1434 (GenBank assembly accession: GCA_002813615.1) , Lacticaseibacillus paracasei strain SRCM103299 (GenBank assembly accession: GCA_004141835.1) , Lacticaseibacillus paracasei strain NJ (GenBank assembly accession: GCA_007637635.1) , Lacticaseibacillus paracasei strain EG9 (GenBank assembly accession: GCA_003177075.1) , Lacticaseibacillus paracasei strain TK-P4A (GenBank assembly accession: GCA_015377585.1) , Lacticaseibacillus paracasei subsp. paracasei strain BD5115 (GenBank assembly accession: GCA_018596415.1) , and Lacticaseibacillus paracasei subsp. paracasei JCM 8130 (GenBank assembly accession: GCA_000829035.1) , preferably, Corynebacterium glutamicum ATCC 14067.

In some embodiments, the host cell is modified to inactivate the native major pilin. In some embodiments, the method comprises a step of knocking out the native major pilin. The endogenous polynucleotide encoding the major pilin can also be replaced by the polynucleotide encoding the fusion polypeptide via homologous recombination.

5. Modified CLP

The present disclosure provides a modified covalently-linked pili (CLP) comprising a plurality of the fusion polypeptides of the present disclosure. In some embodiments, the modified CLP is cell-free.

The present disclosure further provides a method of preparing a modified CLP comprising the steps of a) providing the fusion polypeptide of the present disclosure; and b) providing an activity of sortase. In some embodiments, the modified CLP is cell-free.

In some embodiments, the fusion polypeptide is provided by transcribing and/or translating the polynucleotide of the present disclosure. In some embodiments, the activity of sortase is provided by transcribing and/or translating one or more polynucleotides encoding a sortase.

In some embodiments, the sortase is encoded by a gene that is found in nature in the same cluster as the gene encoding the carrier protein. In some embodiments, the method comprises contacting the fusion polypeptide of the present disclosure with the sortase protein. In some embodiments, the sortase is a class C sortase, such as srtC1 and/or srtC2, preferably wherein the srtC1 and srtC2 are encoded by genes from the same cluster. In some embodiments, the method is an in vitro method.

The present disclosure provides a polynucleotide construct or a combination of polynucleotide constructs comprising the polynucleotide of the present disclosure, and one or more polynucleotides encoding a sortase.

Benefits

The modified CLP and recombinant cell enable enzyme cascade reactions and improve the catalytic efficiency of a multi-enzyme system. The immobilization of enzymes onto CLP and recombinant cells can achieve whole-cell catalysis.

Examples

Example 1. Materials and Methods

Unless otherwise indicated, the experiments in the Examples are conventional in the art, and the experiments employing commercially available kits or reagents were carried out according to the manufacturer’s instructions.

1.1. Strains, plasmids, and media.

General Method

The original DNA sequence was fully synthesized (GENEWIZ, Nanjing, China) or PCR-generated. All PCR products were generated with KOD DNA polymerase (TOYOBO, Japan). All plasmid construction was performed using T4 DNA ligase (New England BioLabs, Boston, MA) for ligations or NEBuilder HiFi DNA Assembly Master Mix (New England BioLabs, Boston, MA) for assembly. All plasmids and markerless strains were confirmed by DNA sequencing (GENEWIZ, Guangzhou, China). Primers used in the Examples are listed in Table 1.

Table 1. Primers

Growth Media

C. glutamicum ATCC 14067 was provided by Dr. Zheng’s research group at the South China University of Technology. For recovery, C. glutamicum ATCC 14067 was grown overnight in BHI liquid medium (37 g L⁻¹ brain heart infusion (Becton, Dickinson and Company)) at 30 ℃ and 250 rpm. For ^CgCLP formation, C. glutamicum ATCC 14067 was inoculated into M63 liquid medium (15.6 g L⁻¹ M63 Broth (Sangon Biotech, Guangzhou, China), supplemented with 1 mM MgSO₄ and 0.2% (wt/vol) glucose) and cultivated in an incubator at 30 ℃ without shaking for 2-3 days. Antibiotics for C. glutamicum culture were kanamycin (25 μg mL⁻¹) and chloramphenicol (7.5 μg mL⁻¹).

Isopropyl-β-D-thiogalactoside (IPTG) at 1 mM or 0.5 mM, or theophylline at 1 mM, was used to induce gene expression. Trans1-T1 (TransGen Biotech, Shenzhen, China) was used as the cloning host for plasmid manipulation, and E. coli BL21 (DE3) (New England BioLabs, Boston, MA) was used for protein expression. E. coli was cultured in Luria-Bertani medium (10 g L⁻¹ peptone, 5 g L⁻¹ yeast extract, 10 g L⁻¹ NaCl) at 37 ℃, or at 16 ℃ when applicable for protein expression. Antibiotics for E. coli culture were kanamycin (50 μg mL⁻¹) and chloramphenicol (30 μg mL⁻¹).
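The medium and antibiotic concentrations above scale linearly with culture volume. The following Python sketch shows one way to work out the quantities for an arbitrary batch. The working concentrations come from the text; the function names and the stock concentration in the usage note are hypothetical, since the text does not state how stocks were prepared.

```python
# Working antibiotic concentrations stated in the text (micrograms per mL).
ANTIBIOTICS_UG_PER_ML = {
    "C. glutamicum": {"kanamycin": 25.0, "chloramphenicol": 7.5},
    "E. coli": {"kanamycin": 50.0, "chloramphenicol": 30.0},
}


def m63_recipe(volume_l: float) -> dict:
    """Amounts needed for a given volume of supplemented M63 medium."""
    return {
        "M63 broth (g)": 15.6 * volume_l,
        "glucose (g)": 2.0 * volume_l,   # 0.2% (wt/vol) equals 2 g per litre
        "MgSO4 (mmol)": 1.0 * volume_l,  # 1 mM final concentration
    }


def stock_volume_ul(final_ug_per_ml: float, culture_ml: float,
                    stock_mg_per_ml: float) -> float:
    """Microlitres of antibiotic stock needed for the stated final concentration.

    1 mg/mL equals 1 microgram per microlitre, so dividing the total mass in
    micrograms by the stock concentration in mg/mL gives microlitres.
    """
    total_ug = final_ug_per_ml * culture_ml
    return total_ug / stock_mg_per_ml
```

For example, assuming a hypothetical 50 mg/mL kanamycin stock, `stock_volume_ul(25.0, 100.0, 50.0)` gives the volume to add to 100 mL of C. glutamicum culture.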

Strain construction

The markerless deletion strains of C. glutamicum ATCC 14067 were constructed using the RecET-Cre/loxP system. Detailed methods for markerless deletion are described in Huang, Y. et al. (Recombineering using RecET in Corynebacterium glutamicum ATCC14067 via a self-excisable cassette. Sci. Rep. 7, 1-8, 2017).

Briefly, to create the CLP-defective strain Δclp, we first constructed a self-excisable cassette, the Δclp-cassette. Primer pair ck-S/A was used to amplify the Cre-Kan cassette fragment from the PBS-Cre-Kan plasmid. Primer pairs clpL-S/A and clpR-S/A were used to amplify the ~800 bp left and right homologous fragments from the genome of C. glutamicum ATCC 14067. Finally, all dsDNA fragments, i.e., the Cre-Kan cassette and the left and right homologous fragments, were joined by fusion PCR with primer pair clpL-S/clpR-A to generate a ~4,385 bp linear self-excisable dsDNA cassette.

Similarly, to construct the Δspa1-cassette, primer pairs spa1L-S/A, spa1R-S/A, ck-S/A and spa1L-S/spa1R-A were used to amplify the left and right homologous fragments, Cre-Kan cassette, and the linear self-excisable dsDNA cassettes, respectively.

For the Δspa2-cassette, primer pairs spa2L-S/A, spa2R-S/A, ck-S/A and spa2L-S/spa2R-A were used to amplify the left and right homologous fragments, Cre-Kan cassette, and the linear self-excisable dsDNA cassettes, respectively.

For the Δspa3-cassette, primer pairs spa3L-S/A, spa3R-S/A, ck-S/A and spa3L-S/spa3R-A were used to amplify the left and right homologous fragments, Cre-Kan cassette, and the linear self-excisable dsDNA cassettes, respectively.

For the ΔsrtC1ΔsrtC2-cassette, primer pairs srtC1L-S/A, srtC2R-S/A, ck-S/A and srtC1L-S/srtC2R-A were used to amplify the left and right homologous fragments, Cre-Kan cassette, and the linear self-excisable dsDNA cassettes, respectively.

For the ΔsrtA-cassette, primer pairs srtAL-S/A, srtAR-S/A, ck-S/A and srtAL-S/srtAR-A were used to amplify the left and right homologous fragments, Cre-Kan cassette, and the linear self-excisable dsDNA cassettes, respectively.

For the Δdec-cassette, primer pairs decL-S/A, decR-S/A, ck-S/A and decL-S/decR-A were used to amplify the left and right homologous fragments, Cre-Kan cassette, and the linear self-excisable dsDNA cassette, respectively.
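The cassette descriptions above follow one naming pattern: a left-arm pair, a right-arm pair, the shared ck-S/A pair, and an outer pair for fusion PCR. The Python sketch below reproduces that pattern as a lookup; the function and dictionary keys are hypothetical, and the actual primer sequences are those of Table 1.

```python
def cassette_primers(left_gene: str, right_gene: str = "") -> dict:
    """Primer names for one self-excisable deletion cassette.

    left_gene names the left homologous arm; right_gene names the right arm
    and defaults to the same gene (the double srtC1/srtC2 deletion uses
    different genes for the two arms).
    """
    right_gene = right_gene or left_gene
    return {
        "left_arm": (f"{left_gene}L-S", f"{left_gene}L-A"),
        "right_arm": (f"{right_gene}R-S", f"{right_gene}R-A"),
        "cre_kan": ("ck-S", "ck-A"),
        # Outer primers used for fusion PCR of the three fragments.
        "fusion": (f"{left_gene}L-S", f"{right_gene}R-A"),
    }


# Targets described in the text, keyed by cassette name.
CASSETTES = {
    "Δclp": cassette_primers("clp"),
    "Δspa1": cassette_primers("spa1"),
    "Δspa2": cassette_primers("spa2"),
    "Δspa3": cassette_primers("spa3"),
    "ΔsrtC1ΔsrtC2": cassette_primers("srtC1", "srtC2"),
    "ΔsrtA": cassette_primers("srtA"),
    "Δdec": cassette_primers("dec"),
}
```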

Then, the self-excisable dsDNA cassettes for markerless deletion of the different genes were transformed by electroporation into competent cells (C. glutamicum ATCC 14067) expressing the exonuclease-recombinase pair RecE/T, yielding multiple Kan-resistant colonies on BHI agar plates. In particular, the cell-plasmid DNA/dsDNA mixture was transferred to an ice-cold electroporation cuvette (0.1 cm electrode gap). Electroporation was performed with a Bio-Rad MicroPulser using three pulses at the Ec1 setting (1.8 kV) (see Huang et al., Recombineering using RecET in Corynebacterium glutamicum ATCC14067 via a self-excisable cassette, Sci Rep 7, 7916 (2017)).

To obtain markerless deletion mutants, Cre expression was induced by adding 1 mM theophylline, and the selectable marker was excised by Cre/loxP site-specific recombination. Finally, PCR fragments amplified from the genomes of the mutants were sequenced for further identification. The resultant mutant strains used in this study were referred to as C. glutamicum ATCC 14067 Δclp (Δclp), C. glutamicum ATCC 14067 Δspa1 (Δspa1), C. glutamicum ATCC 14067 Δspa2 (Δspa2), C. glutamicum ATCC 14067 Δspa3 (Δspa3), and C. glutamicum ATCC 14067 ΔsrtC1ΔsrtC2 (ΔsrtC1ΔsrtC2). The C. glutamicum ATCC 14067 Δspa1Δspa3 (Δspa1Δspa3) mutant was constructed by transforming the Δspa3-cassette into the Δspa1 strain. The C. glutamicum ATCC 14067 Δspa2ΔsrtA (Δspa2ΔsrtA) and C. glutamicum ATCC 14067 Δspa2Δdec (Δspa2Δdec) mutants were constructed by transforming the ΔsrtA-cassette and the Δdec-cassette, respectively, into the Δspa2 strain, as described above.

Plasmid construction

i) Construction of plasmids for constitutive expression of Spa2 pilin and different fusion proteins

[Rectified under Rule 91, 16.01.2023]
The pEC-XK99E plasmid was used as the original plasmid. DNA fragments of the pEC-XK99E backbone (GENEWIZ, China), the coding sequence of Spa2 or of the various recombinant Spa2 proteins (SEQ ID NOs: 1, 5, 8-14, and 24, respectively), and the native promoter (SEQ ID NO: 25) of the spa2 gene were generated via PCR, and all the DNA fragments were then assembled with NEBuilder HiFi DNA Assembly Master Mix to construct the plasmids pEK-spa2, pEK-spa2cut, pEK-E1/mCherry-spa2, pEK-E2/mCherry-spa2, pEK-E3/mCherry-spa2, pEK-E4/mCherry-spa2, pEK-6his-spa2, pEK-SpyTagSpa2, pEK-Mfp3Spep-Spa2, and pEK-N-mCherry-C (see Fig. 1 for an example).

ii) Construction of pEK-SpyCatcher-Spa2, pEK-Venus-Spa2, pEK-CcEgl-Spa2, pEK-N-Ven_C-Ven, pEK-N-Ven-Spa2, pEK-C-Ven-Spa2, pEK-N-Ven-Spa2_C-Ven-Spa2, pEC-TrEgl_SdBgl and pEC-TrEgl-Spa2_SdBgl-Spa2 plasmids

The two basic plasmids 203 and 204 (see Fig. 2) were constructed from the pEC-XK99E backbone by adding restriction sites for SmaI, XbaI, NcoI, BamHI, SpeI and SalI via Gibson assembly with NEBuilder HiFi DNA Assembly Master Mix. SmaI, XbaI, and NcoI were used to fuse proteins with the Spa2 pilin, and SpeI and SalI (Takara) were used to insert another independent expression cassette for a fusion protein.

To create the plasmids of pEK-SpyCatcher-Spa2, pEK-Venus-Spa2, pEK-CcEgl-Spa2, pEK-N-Ven-Spa2, and pEK-TrEgl-Spa2, the coding sequences (CDSs) of SpyCatcher, Venus, CcEgl, N-Ven, and TrEgl (SEQ ID NOs: 15-19) were cloned into the SmaI and XbaI sites in 203 by ligation.

To construct the plasmids of pEK-N-Ven and pEK-TrEgl, the CDSs of N-Ven and TrEgl (SEQ ID NOs: 18 and 19) were inserted into the linearized backbone of 203 (digestion with SmaI and SpeI, Takara) via Gibson assembly.

To create the plasmids of pEK-C-Ven-Spa2 and pEK-SdBgl-Spa2, the CDSs of C-Ven and SdBgl (SEQ ID NOs: 20 and 21) were cloned into the SmaI and XbaI sites in 204 by ligation.

To construct the plasmids pEK-C-Ven and pEK-SdBgl, the CDSs of C-Ven and SdBgl (SEQ ID NOs: 20 and 21) were inserted into the linearized backbone of 204 (digestion with SmaI and SalI, Takara) via Gibson assembly.

Finally, the C-Ven-Spa2 cassette was obtained by digesting pEK-C-Ven-Spa2 with SpeI and SalI and was then cloned into the pEK-N-Ven-Spa2 plasmid (digested with SpeI and SalI, Takara) to construct the tandem expression plasmid pEK-N-Ven-Spa2_C-Ven-Spa2 (see Fig. 3).

A similar strategy was used to construct the other tandem expression plasmids pEK-TrEgl-Spa2_SdBgl-Spa2, pEK-N-Ven_C-Ven, and pEK-TrEgl_SdBgl. pEC-TrEgl-Spa2_SdBgl-Spa2 and pEC-TrEgl_SdBgl were constructed by replacing the kanamycin resistance with chloramphenicol resistance (see Fig. 3).
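The single-insert constructs of section ii) differ only in the insert, the basic plasmid, the restriction sites and the joining method, so they can be summarized as a small lookup table. The Python sketch below is one such summary. The record type and helper are hypothetical, and the SEQ ID NO assigned to each insert assumes that the ranges quoted in the text (15-19 and 20-21) follow the order in which the inserts are listed.

```python
from typing import NamedTuple


class Construct(NamedTuple):
    insert: str
    seq_id_no: int   # assumed to follow the listing order in the text
    backbone: str    # basic plasmid 203 or 204
    sites: tuple     # restriction sites used to open the backbone
    method: str


CONSTRUCTS = {
    "pEK-SpyCatcher-Spa2": Construct("SpyCatcher", 15, "203", ("SmaI", "XbaI"), "ligation"),
    "pEK-Venus-Spa2": Construct("Venus", 16, "203", ("SmaI", "XbaI"), "ligation"),
    "pEK-CcEgl-Spa2": Construct("CcEgl", 17, "203", ("SmaI", "XbaI"), "ligation"),
    "pEK-N-Ven-Spa2": Construct("N-Ven", 18, "203", ("SmaI", "XbaI"), "ligation"),
    "pEK-TrEgl-Spa2": Construct("TrEgl", 19, "203", ("SmaI", "XbaI"), "ligation"),
    "pEK-N-Ven": Construct("N-Ven", 18, "203", ("SmaI", "SpeI"), "Gibson assembly"),
    "pEK-TrEgl": Construct("TrEgl", 19, "203", ("SmaI", "SpeI"), "Gibson assembly"),
    "pEK-C-Ven-Spa2": Construct("C-Ven", 20, "204", ("SmaI", "XbaI"), "ligation"),
    "pEK-SdBgl-Spa2": Construct("SdBgl", 21, "204", ("SmaI", "XbaI"), "ligation"),
    "pEK-C-Ven": Construct("C-Ven", 20, "204", ("SmaI", "SalI"), "Gibson assembly"),
    "pEK-SdBgl": Construct("SdBgl", 21, "204", ("SmaI", "SalI"), "Gibson assembly"),
}


def by_backbone(backbone: str) -> list:
    """Names of all constructs built on the given basic plasmid, sorted."""
    return sorted(name for name, c in CONSTRUCTS.items() if c.backbone == backbone)
```

In this summary, the Spa2 fusions are built by ligation into the SmaI and XbaI sites, whereas the constructs without Spa2 are built by Gibson assembly into a backbone opened with SmaI and either SpeI (203) or SalI (204).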

iii) Construction of the pZ9-dxs_crtEBI plasmid

[Rectified under Rule 91, 16.01.2023]
The gene fragments of dxs (SEQ ID NO: 26) and crtEBI (crtE, SEQ ID NO: 27, and crtBI, SEQ ID NO: 28) were amplified from the genome of C. glutamicum ATCC 13032 with primer pairs dxs-A/dxs-S, crtE-S/crtE-A, and crtBI-S/crtBI-A, respectively; the Ptac promoter (SEQ ID NO: 30) used to drive dxs and crtEBI was amplified with primer pair ptrc-S/ptrc-A; and the lacI fragment (SEQ ID NO: 29) was amplified from pEC-XK99E with primer pair lacI-S/lacI-A.

Then, the dxs, crtEBI and lacI fragments were assembled into the pZ9 backbone (GENEWIZ, China) by Gibson assembly to construct the pZ9-dxs_crtEBI plasmid (Fig. 4) .

iv) Construction of the pET-28a-Spa2 plasmid

The coding sequence of Spa2 (SEQ ID NO: 6) was amplified from the genome of C. glutamicum ATCC 14067, and then assembled into the pET-28a (+) backbone (Novagen, Madison, WI) by Gibson assembly (see Fig. 5) .

Transmission electron microscopy and immunogold labelling.

Transmission electron microscope imaging. C. glutamicum cells cultured for 2-3 days in M63 medium were collected and washed twice in PBS buffer, and 20 μL of liquid culture in M63 (OD600 ≈ 1) was deposited onto carbon-coated TEM grids for 5-10 min. The samples were washed two times with 50 μL PBS buffer and three times with 20 μL water, and the excess solution was then quickly wicked away with filter paper. The cells deposited onto the copper wire mesh were negatively stained with 15 μL of 2% (w/v) uranyl acetate solution for 1 min and dried for 10 min under an infrared lamp. Samples were examined in a JEOL JEM-1400 transmission electron microscope at an accelerating voltage of 120 kV.

[Rectified under Rule 91, 16.01.2023]
Immunogold labelling. Partial CDSs of the ^CgCLP pilins Spa1 (SEQ ID NO: 31, Spa1-Ab), Spa2 (SEQ ID NO: 32, Spa2-Ab) and Spa3 (SEQ ID NO: 33, Spa3-Ab) were expressed in E. coli, and the products were purified and injected into rabbits to prepare the specific polyclonal antibodies α-Spa1, α-Spa2 and α-Spa3 (Your Bio-Tech Partner, Shanghai, China), respectively.

For immunogold labelling, 20 μL of liquid culture of C. glutamicum in M63 (OD600 ≈ 1) was placed on carbon-coated grids for 10 min and washed two times with PBS buffer and three times with water. The samples were blocked with PBS containing 1% bovine serum albumin (BSA; Sangon Biotech, A600332-0100) for 30 min. The solution was wicked off with filter paper, and the cells deposited onto the copper wire mesh were stained with a pilin primary antibody (the polyclonal antibodies above) diluted 1:200 in PBS with 1% BSA for 1 h, followed by washing and blocking (PBS + 1% BSA). Samples were stained with 10 nm gold-decorated goat anti-rabbit IgG (Bioss, Beijing, China) diluted 1:50 in PBS with 1% BSA for 45 min, followed by washing three times with PBS and five times with water. Then, negative staining as described above, drying and imaging were performed. Double immunogold labelling experiments were performed according to Budzik, J. M. et al. (Assembly of pili on the surface of Bacillus cereus vegetative cells. Mol. Microbiol. 66, 495-510, 2007) with some modifications. Briefly, after the incubation with the primary antibody, samples were incubated with PBS containing 3% paraformaldehyde and 2% glutaraldehyde for 2 h at room temperature. Samples were washed three times with PBS and incubated with 0.02 M glycine in PBS for 10 min at room temperature. The immunogold labelling process was then performed with the second pilin antibody and a different size (5 nm, 15 nm or 30 nm) of gold-decorated goat anti-rabbit IgG (Bioss, Beijing, China), followed by negative staining, drying and imaging.

Quantitative assay of CLP via whole-cell filtration ELISA.

CLP was quantified by whole-cell filtration ELISA, following a method originally described for detecting extracellular amyloids (see Nguyen, P. Q. et al., Programmable biofilm-based materials from engineered curli nanofibres. Nat. Commun. 5, 1-10, 2014). Briefly, C. glutamicum strains were cultured for 48 h in M63 liquid medium, and the cultures were collected, washed and diluted to an OD600 of 0.1 in Tris-buffered saline with 0.1% ProClin™ 300 (Sigma, 48912-U) on ice. Then, 25 μL of the diluted culture was loaded into a Multiscreen-GV 96-well filter plate (0.22 μm pore size; EMD Millipore), followed by washing (TBST (Sangon Biotech, C520009-0005) + 0.1% ProClin™ 300), blocking (TBST + 0.1% ProClin™ 300 + 1% bovine serum albumin + 0.01% H₂O₂), incubating with α-Spa2 (diluted 1:5,000 in TBST + 0.1% ProClin™ 300), washing and blocking as above, and incubating with goat anti-rabbit HRP-conjugated secondary antibody (Sangon Biotech, Guangzhou, China; diluted 1:5,000 in TBST + 0.1% ProClin™ 300). Subsequently, a chromogenic reaction was performed with Ultra-TMB (3,3′,5,5′-tetramethylbenzidine; Thermo Fisher, 34028) and terminated by the addition of 2 M H₂SO₄. Finally, the absorbance of the product was measured at 450 nm (with a reference wavelength of 650 nm) on a Cytation reader (BioTek).
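Two small calculations are implied by the ELISA protocol above: diluting each culture to an OD600 of 0.1, and combining the 450 nm reading with the 650 nm reference reading. The Python sketch below illustrates both. It assumes that the reference reading is simply subtracted from the signal reading, which is a common convention but is not spelled out in the text; the function names are hypothetical.

```python
def dilution_to_target_od(od_measured: float, od_target: float = 0.1,
                          final_volume_ul: float = 1000.0) -> tuple:
    """Volumes (culture, buffer) in microlitres to reach the target OD600."""
    if od_measured < od_target:
        raise ValueError("culture is already below the target OD600")
    culture_ul = final_volume_ul * od_target / od_measured
    return culture_ul, final_volume_ul - culture_ul


def reference_corrected(a450: float, a650: float) -> float:
    """Signal absorbance minus reference absorbance (assumed convention)."""
    return a450 - a650
```

For example, a washed culture measured at OD600 = 2.0 would be diluted 20-fold to reach 0.1.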

AFM imaging.

In total, 2 mL of culture in M63 liquid medium was incubated on a mica surface for 2-4 h to allow sample deposition. Excess solution was removed with a pipette, and the sample was washed two times with water. The samples were then dried with nitrogen gas and immediately used for AFM imaging. ScanAsyst mode AFM was performed on a Dimension FastScan™ AFM (Bruker) using silica cantilevers (SCANASYST-AIR, Bruker, K = 0.4 N/m, ~70 kHz).

Expression and purification of recombinant Spa2.

The recombinant Spa2 was expressed as an N-terminally His-tagged protein. E. coli BL21 (DE3) cells transformed with plasmid pET-28a-Spa2 (CaCl₂ process) were grown overnight at 37 ℃ to provide a starter culture for expression. A total of 1 L of medium with 50 μg mL⁻¹ kanamycin was inoculated with 1% (v/v) of the starter culture and grown at 37 ℃. When the OD600 reached 0.8, the cultivation temperature was lowered to 16 ℃ and IPTG was added to a final concentration of 0.5 mM to induce protein overexpression. After 16 h, cells were collected by centrifugation, and the cell pellets were suspended in buffer A (50 mM Tris-HCl, 150 mM NaCl, pH 8.0) and lysed by high-pressure homogenization. The cell lysates were centrifuged at 12,000 rpm for 30 min at 4 ℃.

The resulting supernatant was loaded onto a nickel-affinity column (5 mL, GE) pre-equilibrated with buffer A (50 mM Tris-HCl, 150 mM NaCl, pH 8.0). His-tagged Spa2 protein was eluted with buffer A containing 50 mM imidazole. The His-tagged Spa2 protein was buffer-exchanged into buffer A and subjected to tag removal by HRV 3C protease (SEQ ID NO: 34; 1 mg per 50 mg Spa2) at 4 ℃ overnight. The digested product was loaded onto the 5-mL Ni-NTA column (GE) and eluted with a stepwise buffer A/buffer B (buffer A + 500 mM imidazole) gradient (5% buffer B, 10% buffer B, 20% buffer B and 100% buffer B). The flow-through at 10% buffer B was collected.
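Because buffer A contains no imidazole and buffer B is buffer A plus 500 mM imidazole, each gradient step corresponds to a fixed imidazole concentration. The Python sketch below makes that conversion explicit and also scales the stated protease ratio; the function names are hypothetical.

```python
BUFFER_B_IMIDAZOLE_MM = 500.0             # buffer B = buffer A + 500 mM imidazole
GRADIENT_STEPS_PERCENT_B = (5, 10, 20, 100)


def imidazole_mm(percent_b: float) -> float:
    """Imidazole concentration (mM) of a buffer A/buffer B mixture."""
    if not 0 <= percent_b <= 100:
        raise ValueError("percent_b must be between 0 and 100")
    return BUFFER_B_IMIDAZOLE_MM * percent_b / 100.0


def protease_mg(spa2_mg: float) -> float:
    """HRV 3C protease needed at the stated ratio of 1 mg per 50 mg Spa2."""
    return spa2_mg / 50.0
```

The four steps therefore correspond to 25, 50, 100 and 500 mM imidazole, so the collected 10% buffer B fraction is at 50 mM imidazole.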

Further purification was performed via ion-exchange chromatography (HiTrap Q HP, 5 mL, Cytiva) and size-exclusion chromatography (Uniondex 75 pg 16/60, UNION-BIOTECH, China). The whole procedure of protein purification was carried out at 4 ℃.

Protein crystallization and structure determination.

The final purified protein was concentrated to 20 mg mL⁻¹ in 10 mM Tris-HCl pH 8.0 and 50 mM NaCl for crystallization. The sitting drop vapor diffusion technique (http://soft-matter.seas.harvard.edu/index.php/Vapor_Diffusion_Method) was used to crystallize the Spa2 protein. Crystals were obtained by mixing 4 μL of Spa2 protein with 4 μL of reservoir solution (0.2 M sodium sulfate, 0.1 M Bis-Tris propane pH 7.5, 20% (w/v) PEG 3350) and incubating the mixture at 18 ℃ for 1-2 weeks. The crystals were soaked in a cryo-protectant solution consisting of the reservoir solution and 20% (v/v) glycerol and then quickly frozen in liquid nitrogen. Diffraction data were collected on the BL18U1 beamline at the Shanghai Synchrotron Radiation Facility (Shanghai, China) with flash-frozen crystals (at 100 K in a stream of nitrogen gas). The data were processed with XDS⁹ and then further processed using STARANISO¹⁰ (a server provided by Global Phasing).
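Mixing equal volumes of protein and reservoir solution halves every component at the moment the drop is set, before vapor diffusion concentrates it again. The Python sketch below computes these starting concentrations from the values in the text; the helper name and the dictionary labels are hypothetical.

```python
def mix(solution_a: dict, volume_a: float, solution_b: dict, volume_b: float) -> dict:
    """Concentrations after mixing two solutions (volume-weighted average)."""
    total = volume_a + volume_b
    names = set(solution_a) | set(solution_b)
    return {
        name: (solution_a.get(name, 0.0) * volume_a
               + solution_b.get(name, 0.0) * volume_b) / total
        for name in names
    }


# Compositions as stated in the text.
PROTEIN_SOLUTION = {"Spa2 (mg/mL)": 20.0, "Tris-HCl (mM)": 10.0, "NaCl (mM)": 50.0}
RESERVOIR = {
    "sodium sulfate (M)": 0.2,
    "Bis-Tris propane (M)": 0.1,
    "PEG 3350 (% w/v)": 20.0,
}

# Initial composition of a drop made from 4 uL protein plus 4 uL reservoir.
INITIAL_DROP = mix(PROTEIN_SOLUTION, 4.0, RESERVOIR, 4.0)
```

Under these values the drop starts at 10 mg/mL Spa2 and 10% (w/v) PEG 3350.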

The recombinant Spa2 crystal form diffracted to […] resolution (Fig. 14) and belongs to the space group P2₁2₁2₁, with unit-cell parameters […], α=β=γ=90.0° and two molecules in the asymmetric unit. The structure was solved by the molecular replacement method using PHASER^11, with the Spa2 coordinates predicted by AlphaFold Colab^12 as the template. Further manual model building was carried out using COOT^13. The model was refined by PHENIX^14. Data collection, phasing and refinement statistics are given in Table 3. Structure figures were prepared using PyMOL 2.3.4 (https://pymol.org/2/).

Fluorescence measurements.

Plate-reader measurements. C. glutamicum colonies were inoculated into 10 mL BHI and cultured for 12 h. Then cells were transferred into M63 medium with an initial OD600 of 0.1 and cultured for 3 days at 30℃ without shaking. Cells were collected by centrifugation at 5,000 rpm, washed three times with PBS and diluted with PBS (OD600 ≈ 0.5). Exactly 200 μL of each sample was transferred to a flat-bottom 96-well black plate and analyzed on a Tecan Infinite Pro 200 Plate Reader, with excitation/emission wavelengths of 580/610 nm for mCherry fluorescence intensity, and 510/545 nm for Venus fluorescence intensity. The normalized fluorescence intensity is the fluorescence intensity divided by the OD600 absorbance of the sample.
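The normalization described above can be sketched as follows. The function name and the example readings are hypothetical and are shown only to make the calculation concrete; no blank subtraction is applied because none is described in the text.

```python
def normalized_fluorescence(fluorescence: float, od600: float) -> float:
    """Fluorescence intensity divided by the OD600 of the same sample."""
    if od600 <= 0:
        raise ValueError("OD600 must be positive")
    return fluorescence / od600


# Hypothetical plate-reader readings (arbitrary units) for two wells.
wells = {
    "sample_1": {"fluorescence": 12000.0, "od600": 0.50},
    "sample_2": {"fluorescence": 300.0, "od600": 0.48},
}

normalized = {
    name: normalized_fluorescence(w["fluorescence"], w["od600"])
    for name, w in wells.items()
}

for name, value in normalized.items():
    print(f"{name}: {value:.1f} a.u. per OD600")
```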

Fluorescence (confocal) microscopy imaging. Cells prepared for plate-reader measurements were dripped on a glass slide and imaged under a Nikon TI2-E inverted microscope. Microscope light source power, detector gain, and image processing settings were consistent among different samples.

Strains expressing SpyTag-Spa2, SpyCatcher-Spa2 and Spa2 (strain Δspa2 transformed with pEK-SpyTagSpa2, pEK-SpyCatcherSpa2, and pEK-spa2, respectively) were cultured in glass-bottom dishes in M63 for 3 days. The dishes were then gently washed three times with PBS containing 0.5% Tween 80 (PBST) and blocked in PBST with 1% BSA for 1 h. The SpyTag-Spa2 and Spa2 group was incubated with purified GFP-SpyCatcher (SEQ ID NO: 35), and the SpyCatcher-Spa2 and Spa2 group was incubated with purified GFP-SpyTag (SEQ ID NO: 36), for 1 h at room temperature. All samples were washed three times with PBS buffer and imaged under a Nikon TI2-E inverted microscope.

Microsphere binding tests.

Spa2 strain or the Mfp3Spep-Spa2 strain was cultured in the M63 medium (3 mL) supplemented with 200 μL of green-fluorescent PS microsphere solution in 35-mm Petri dishes containing 2-3 glass slides for 3 days at 30℃ without shaking. The settled glass slides were then taken out and gently flushed to wash away the microspheres that had not adhered. The binding capacity of different samples was compared with water jetting at a constant discharge pressure of 5 psi for 15 s, performed on a pressure-flow controller (PG-MFC-8CH, PreciGenome) . Fluorescence images were recorded before and after the mechanical challenge with water jetting.

Mass spectrometry analysis.

1) Preparation of samples.

i) Preparation of Spa2cut and its mutant variants.

The pEK-spa2cut plasmid was transferred into Δspa2 by electroporation as described above to construct the strain Δspa2-pEK-spa2cut, which was used to express the monomer of Spa2cut (SEQ ID NO: 5). Cells were inoculated into M63 medium with 25 μg mL^-1 kanamycin and cultured for 3 days. Supernatants (200 mL) were collected, concentrated to 1 mL and then purified by nickel-affinity chromatography as previously described in the section “Expression and purification of recombinant Spa2”. Spa2cut was eluted with 100 mM imidazole. The final purified protein was buffer-exchanged into 10 mM Tris-HCl, 100 mM NaCl, pH 8.0. A similar process was followed for expression and purification of the Spa2cut mutant variants E158Acut, D246Acut, E435Acut, and D246A/E435Acut.

ii) Isolation of ^CgCLP.

A method for the isolation of SpaA pili of C. diphtheriae was adopted for the collection of ^CgCLP fibers (see Kang, H. J. et al., The Corynebacterium diphtheriae shaft pilin SpaA is built of tandem Ig-like modules with stabilizing isopeptide and disulfide bonds. Proc. Natl. Acad. Sci. U.S.A. 106, 16967-16971, 2009). Specifically, engineered ^CgCLP for polymer purification was produced by transforming the plasmid pEK-6his-spa2 into the Δspa2ΔsrtA strain, which lacks the spa2 gene and srtA, the gene encoding the housekeeping sortase. Because it lacks sortase A, the Δspa2ΔsrtA-pEK-6his-spa2 strain secretes the expressed 6His- ^CgCLP into the culture medium. For the expression of 6His- ^CgCLP polymers, Δspa2ΔsrtA-pEK-6his-spa2 cells were inoculated into M63 medium with 25 μg mL^-1 kanamycin and cultured for 3 days. For 6His- ^CgCLP purification, 500 mL of supernatant was collected, concentrated to 5 mL in a buffer of 10 mM Tris-HCl, 100 mM NaCl, pH 8.0, and purified by nickel-affinity chromatography. The 6His- ^CgCLP polymers were eluted with 100 mM imidazole. Purified 6His- ^CgCLP fibers were then boiled in SDS sample buffer (6× Protein Loading Buffer, TransGen Biotech, DL101-02) and resolved by SDS-PAGE. The high-molecular-weight ^CgCLP polymer bands were excised from Coomassie brilliant blue stained SDS-PAGE gels and prepared for intermolecular isopeptide bond identification.

2) Protein precipitation and digestion.

i) Samples processed for signal peptide identification.

The Spa2cut solution was precipitated with acetone (1:4) and the pellets were dried using a SpeedVac (room temperature) for 1-2 min. The pellets were then dissolved in 100 mM Tris-HCl (pH 8.5) supplemented with 8 M urea. TCEP (5 mM, Thermo Scientific) for reduction and iodoacetamide (10 mM, Sigma) for alkylation were added and the mixture was incubated at room temperature for 30 min. The protein mixture was diluted (1:4) and digested overnight with chymotrypsin at 1:40 (w/w). The protease-digested peptide solution was desalted using a MonoSpin™ C18 column (GL Science, Tokyo, Japan) and dried with a SpeedVac.

ii) Samples processed for intramolecular covalent bond identification.

For the identification of the intramolecular isopeptide bond, the Spa2cut sample was processed following the same protocol as previously described for signal peptide identification. For the identification of the disulfide bond, the Spa2cut sample was processed following a similar protocol except that pepsin (Promega) was used for digestion and the addition of 5 mM TCEP (Thermo Scientific) was omitted to ensure that the disulfide bond, if any, was kept intact.

iii) Samples processed for inter-molecular isopeptide bond identification.

The Coomassie brilliant blue stained SDS-PAGE gel band of ^CgCLP fibers was excised into small pieces and washed in water, followed by 50 mM NH₄HCO₃ in 50% acetonitrile and 100% acetonitrile. The sample was reduced with 10 mM TCEP (Thermo Scientific) in 100 mM NH₄HCO₃ at 55 ℃ for 1 h and alkylated with 55 mM iodoacetamide (Sigma) in 100 mM NH₄HCO₃ at 37 ℃ in the dark for 30 min. The gel pieces were then washed with 100 mM NH₄HCO₃ and 100% acetonitrile, and dried. The sample was first digested with 3 μg trypsin (Promega) in 50 mM NH₄HCO₃ at 37 ℃ overnight, then 1 μg of Asp-N endoproteinase (Promega) was added for another overnight incubation. Digested peptides were extracted twice with 50% acetonitrile containing 5% formic acid.

3) LC/tandem MS (MS/MS) analysis of peptide.

The protease-digested peptides were analyzed by LC-MS/MS using an Easy-nLC 1200 nano HPLC (Thermo Scientific) coupled to a Q Exactive Orbitrap mass spectrometer (Thermo Scientific). Peptides were separated on a 30 cm-long pulled-tip analytical column (75 μm ID packed with ReproSil-Pur C18-AQ 1.9 μm resin, Dr. Maisch GmbH) in 0.1% aqueous formic acid (buffer A) and 0.1% formic acid in 80% acetonitrile (buffer B) at 55 ℃ with a flow rate of 300 nL/min using a 120 min linear gradient. Each cycle consisted of one full-scan MS spectrum (m/z 300-1800) followed by the top 20 MS/MS events, sequentially generated on the first to the 20th most intense ions selected from the full MS spectrum at a 30% normalized collision energy. The peptide validation for signal peptide identification was automatically performed in PEAKS AB v2.0 (Tran, N. H. et al. Complete de novo assembly of monoclonal antibody sequences. Sci. Rep. 6, 1-10, 2016). Peptides containing isopeptide bonds were identified using pLink 2 software (pFind Team, Beijing, China) (Lu, S. et al. Mapping native disulfide bonds at a proteome scale. Nat. Methods 12, 329-331, 2015). Peptides from the digestion of ^CgCLP containing the intermolecular isopeptide bond were manually analyzed from MS/MS data according to the theoretical m/z of the product ions of predicted peptides containing the isopeptide linkage.
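As an illustration of the top-20 selection described above, the sketch below picks the most intense ions within the m/z 300-1800 window of a single full scan. The peak list and function name are hypothetical, and instrument-side behavior that the text does not state (for example dynamic exclusion or charge-state filtering) is deliberately not modeled.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def select_precursors(
    full_scan: List[Peak],
    top_n: int = 20,
    mz_min: float = 300.0,
    mz_max: float = 1800.0,
) -> List[Peak]:
    """Return the top_n most intense peaks inside the scan window, most intense first."""
    in_window = [peak for peak in full_scan if mz_min <= peak[0] <= mz_max]
    return sorted(in_window, key=lambda peak: peak[1], reverse=True)[:top_n]


# Hypothetical full scan: 25 peaks, some of which fall outside m/z 300-1800.
toy_scan = [(250.0 + 70.0 * i, float(i + 1)) for i in range(25)]
selected = select_precursors(toy_scan)

print(len(selected), "precursors selected")
print("most intense:", selected[0])
```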

4) Accurate molecular mass determination.

Accurate molecular masses were determined for Spa2cut and its variants by quadrupole time-of-flight mass spectrometry (Agilent 6550 iFunnel Q-TOF) coupled to an HPLC system running a linear gradient. The raw MS data were deconvoluted by the BioConfirm algorithm integrated into MassHunter software.

Enzymatic activity assay.

The enzyme activity of cellulases against carboxymethylcellulose sodium salt (CMC-Na, Sigma, USA) was detected using a 3,5-dinitrosalicylic acid (DNS) assay (Dong, C. et al. Engineering Pichia pastoris with surface-display minicellulosomes for carboxymethyl cellulose hydrolysis and ethanol production. Biotechnol. Biofuels 13, 1-9, 2020). Cells of TrEgl-Spa2_SdBgl-Spa2 (C003 strain) and TrEgl_SdBgl (C004 strain) at 10 OD were concentrated to 500 μL and incubated in 2 mL 50 mM acetic acid (pH 4.8) with 1% (w/v) CMC-Na substrate at 50 ℃ for 30 min. The reaction was stopped by adding DNS and boiling for 10 min; reducing sugars were detected at 540 nm. One unit of enzyme activity was defined as the amount of cells that released 1 μmol of glucose from cellulose at 50 ℃ in 1 min.
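The unit definition above can be turned into a short calculation, sketched below. The function names and the example glucose reading are hypothetical; the 2.5 mL reaction volume assumes that the 500 μL of cells and the 2 mL of substrate solution are simply combined, which the text does not state explicitly. The molar mass of glucose is taken as 180.16 g/mol.

```python
GLUCOSE_MOLAR_MASS_G_PER_MOL = 180.16


def glucose_umol(glucose_mg_per_ml: float, reaction_volume_ml: float) -> float:
    """Convert a glucose-equivalent concentration (mg/mL) into micromoles in the reaction."""
    mass_mg = glucose_mg_per_ml * reaction_volume_ml
    # mg divided by g/mol gives mmol; multiply by 1000 to obtain micromoles.
    return mass_mg / GLUCOSE_MOLAR_MASS_G_PER_MOL * 1000.0


def enzyme_units(released_umol: float, reaction_time_min: float) -> float:
    """Units of activity: micromoles of glucose released per minute."""
    if reaction_time_min <= 0:
        raise ValueError("reaction time must be positive")
    return released_umol / reaction_time_min


# Hypothetical reading from a DNS standard curve: 1.0 mg/mL glucose equivalents.
reaction_volume_ml = 0.5 + 2.0  # assumed total volume (cells + substrate solution)
released = glucose_umol(1.0, reaction_volume_ml)
activity = enzyme_units(released, reaction_time_min=30.0)  # 30 min incubation from the text

print(f"{released:.2f} umol released, {activity:.3f} U")
```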

Quantitative analysis of lycopene by HPLC.

The lycopene-producing plasmid pZ9-dxs_crtEBI was transferred into strain TrEgl_SdBgl to construct the recombinant strains C003 and C004 for the utilization of cellulose to produce lycopene. C003 and C004 strains were inoculated into 10 mL BHI with 25 μg mL^-1 kanamycin and 7.5 μg mL^-1 chloramphenicol, and cultured for 12 h at 30 ℃ with shaking at 200 rpm. Then cells were transferred into 50 mL modified M63 medium (15.6 g L^-1 M63 broth, supplemented with 1 mM MgSO₄ and 2% (wt/vol) CMC-Na) with an initial OD600 of 3 and cultured for 2 days at 30℃, with or without the addition of 1 mM IPTG.

The quantitative analysis of lycopene production was carried out according to Li, C. et al. (Heterologous production of α-Carotene in Corynebacterium glutamicum using a multi-copy chromosomal integration method. Bioresour. Technol. 341, 125782, 2021). IPTG-induced and un-induced cells (1 mL) were separately collected into 2 mL tubes of lysing matrix Y (M. P. Biomedicals) by centrifugation at 12,000 rpm for 5 min. The pellets were resuspended in a 60% hexane and 40% acetone mixture and lysed using the FastPrep-24 5G bead beating grinder and lysis system (M. P. Biomedicals) for lycopene extraction. Lysis was performed in six cycles of 30 s each, with a 1 min interval between cycles.

The samples were centrifuged at 14,000 rpm for 10 min at 4 ℃, and the resulting supernatant was then transferred to brown 2 mL screw cap glass vials (Agilent Technologies) and directly subjected to HPLC analysis. The quantification of lycopene was performed on an Agilent 1260 series HPLC system (Agilent Technologies) using a YMC Carotenoid column (250 × 4.6 mm I.D., YMC) and detected via a diode array detector (DAD) at 450 nm. For separation, binary gradient elution was applied to change the eluent from 100% eluent A of methanol/methyl tert-butyl ether/water (81/15/4) to 100% eluent B of methanol/methyl tert-butyl ether/water (7/90/3) over 90 min at a flow rate of 1.0 mL min^-1 at 20 ℃ with an injection volume of 10 μL (eluent A for 2 min, eluent B from 2 min to 95 min, and eluent A from 95 min to 100 min).

Example 2. Probing the molecular assembly of the CLP structure in C. glutamicum

This Example was carried out to investigate the CLP assembly in the industrial workhorse C. glutamicum ATCC 14067 (referred to as ^CgCLP) .

2.1. Determination of the essential building block in CLP assembly in C. glutamicum

The industrial workhorse C. glutamicum is a ‘generally recognized as safe’ (GRAS) strain with well-established gene editing tools that is widely used for the industrial-scale production of valued products such as amino acids, diamines, terpenoids, and other chemicals (Zhao, N. et al. Development of a Transcription Factor-Based Diamine Biosensor in Corynebacterium glutamicum. ACS Synth. Biol. 10, 3074-3083, 2021; and Xu, X. et al., Ledesma-Amaro, R. & Liu, L. Microbial chassis development for natural product biosynthesis. Trends Biotechnol. 38, 779-796, 2020).

In C. glutamicum, we predicted that the CLP BGC contains three pilin-encoding genes, spa1, spa2, and spa3, as well as two sortase-encoding genes, srtC1 and srtC2 (Fig. 6), which is similar to the SpaH-type (a relatively less well-studied pili type) CLP gene cluster in the pathogenic C. diphtheriae (Mandlik, A. et al., Pili in Gram-positive bacteria: assembly, involvement in colonization and biofilm development. Trends Microbiol. 16, 33-40, 2008).

Upon TEM and AFM imaging, no filamentous structures were observed at the C. glutamicum cell surface upon deletion of the CLP BGC, while the filamentous structure phenotype was rescued upon complementing the CLP BGC (Fig. 7), indicating that the ^CgCLP BGC is responsible for fiber formation.

The composition of ^CgCLP was determined with polyclonal antibodies against Spa1, Spa2, and Spa3, respectively. TEM images of the ^CgCLP with immunogold labelling showed that the ^CgCLP fibers comprise two minor pilins of Spa1 and Spa3 and a major pilin of Spa2 (Fig. 8) . TEM and AFM imaging used to assess the specific roles of the three pilins in the ^CgCLP assembly showed that the cells, which were defective for Spa1 (Δspa1 strain) , Spa3 (Δspa3 strain) , or both (Δspa1Δspa3 strain) , could still produce fibers (Fig. 7) . In contrast, cells lacking Spa2 (Δspa2) could not produce any fiber, and overexpression of Spa2 (Spa2) promoted the formation of abundant long fibers throughout the cell surface (Fig. 7) .

TEM and AFM images also showed that fiber formation was completely blocked in cells lacking both SrtC1 and SrtC2 (ΔsrtC1ΔsrtC2) (Fig. 9).

Collectively, it was verified that the major pilin Spa2 is an indispensable building block for the sortase-catalyzed ^CgCLP assembly and production, similar to the role of the well-studied SpaA in pili assembly in the pathogenic C. diphtheriae. Despite this similarity, the wide variation in the size and sequences of major pilin proteins from diverse Gram-positive pathogens^35 makes it challenging to predict whether the structural principles characterized for the CLP of other hosts are also applicable to ^CgCLP.

2.2. Isopeptide bond and disulfide bond during the ^CgCLP assembly

Having identified the Spa2 major pilin as the essential building block for ^CgCLP fiber production, experiments were performed to identify the formation of intermolecular isopeptide bond, disulfide bond, or intramolecular isopeptide bond during the ^CgCLP assembly.

First, the purified ^CgCLP polymers were excised from Coomassie blue-stained SDS-PAGE gels (Fig. 10) and then digested in-gel with trypsin (Promega) and AspN endoproteinase (Promega). Liquid chromatography-tandem mass spectrometry (LC-MS/MS) was used to analyze the digestion products and to verify the presence of the intermolecular isopeptide bond (bond formation results in the elimination of a water molecule and thus a slight decrease of molecular weight). Specifically, the peptide peak with m/z 832.9^2+ (Fig. 11 and Table 2) suggested that the major pilin Spa2 was cross-linked between K194 in the N-terminus of Spa2_i and T477 in the C-terminus of Spa2_i+1 (Lys194-Thr477).
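The relationship between the observed doubly charged peak and the neutral mass of the cross-linked peptide can be sketched as follows. The function names are hypothetical; the proton mass (1.007276 Da) and the monoisotopic mass of water (18.010565 Da) are standard values, and the sketch makes no claim about which peptide sequences give rise to the peak.

```python
PROTON_MASS_DA = 1.007276
WATER_MONOISOTOPIC_DA = 18.010565


def neutral_mass_from_mz(mz: float, charge: int) -> float:
    """Neutral monoisotopic mass of a positively charged ion observed at a given m/z."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return charge * (mz - PROTON_MASS_DA)


def mz_from_neutral_mass(mass: float, charge: int) -> float:
    """m/z of an ion carrying `charge` protons."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return mass / charge + PROTON_MASS_DA


def crosslinked_mass(peptide_a: float, peptide_b: float) -> float:
    """Mass of two peptides joined by an isopeptide bond (one water molecule eliminated)."""
    return peptide_a + peptide_b - WATER_MONOISOTOPIC_DA


# Peak reported in the text: m/z 832.9 with charge 2+.
observed_neutral = neutral_mass_from_mz(832.9, 2)
print(f"neutral mass = {observed_neutral:.3f} Da")
```

A candidate pair of peptides would be consistent with the peak if `crosslinked_mass` of their individual monoisotopic masses matched this neutral mass within the instrument tolerance.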

Table 2. Daughter ions produced during MS/MS of a peptide at m/z 832.9^2+ containing the Lys194-Thr477 intermolecular isopeptide bond of Spa2

^a Monoisotopic masses of observed ions.

^b Theoretical ions. Monoisotopic masses were calculated using the Fragment Ion Calculator (http://db.systemsbiology.net:8080/proteomicsToolkit/FragIonServlet.html).

^c Difference between observed ion mass and theoretical ion mass.

Quadrupole time-of-flight mass spectrometry analysis of a recombinant variant of Spa2 (Spa2^cut, SEQ ID NO: 5) (Fig. 12) secreted by C. glutamicum cells indicated a molecular weight of 46,504.6 Da (Fig. 13), which is about 54.7 Da less than the expected value calculated from the secreted Spa2^cut amino acid sequence. This detected mass is consistent with the loss of three NH₃ units and two H₂ units, indicating the formation of three intramolecular isopeptide bonds (each with loss of one molecule of ammonia, ≈17 Da) and two disulfide bonds (each with loss of two hydrogen atoms, ≈2 Da) in Spa2.
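The mass bookkeeping behind this interpretation can be checked with a short sketch. Average masses of 17.0305 Da for NH₃ and 2.0159 Da for H₂ are used; the search limits (at most three isopeptide bonds, one per domain, and at most two disulfide bonds, from the four cysteines named elsewhere in this disclosure) are assumptions made for the illustration, and the function names are hypothetical.

```python
from itertools import product

NH3_AVERAGE_DA = 17.0305  # lost per Lys-Asn isopeptide bond
H2_AVERAGE_DA = 2.0159    # lost per disulfide bond


def expected_mass_loss(n_isopeptide: int, n_disulfide: int) -> float:
    """Mass lost relative to the unmodified sequence for a given number of bonds."""
    return n_isopeptide * NH3_AVERAGE_DA + n_disulfide * H2_AVERAGE_DA


def best_bond_count(observed_loss: float, max_isopeptide: int = 3, max_disulfide: int = 2):
    """Bond combination whose expected loss is closest to the observed loss."""
    candidates = product(range(max_isopeptide + 1), range(max_disulfide + 1))
    return min(candidates, key=lambda c: abs(expected_mass_loss(*c) - observed_loss))


observed_loss = 54.7  # Da, from the text
best = best_bond_count(observed_loss)
residual = observed_loss - expected_mass_loss(*best)

print(f"best fit: {best[0]} isopeptide bonds, {best[1]} disulfide bonds")
print(f"expected loss {expected_mass_loss(*best):.2f} Da, residual {residual:.2f} Da")
```

Within these limits, three isopeptide bonds plus two disulfide bonds give the expected loss closest to the observed 54.7 Da.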

2.3. The structural features of major pilin Spa2

The X-ray crystal structure of Spa2 was determined at […] resolution (PDB ID: 7WOI) (Fig. 15 and Table 3) by the molecular replacement method using PHASER, with the coordinates predicted by AlphaFold Colab as a template (Fig. 16a).

Table 3. Data collection and refinement statistics

^a Values in parentheses correspond to the outermost shell of data.

^b R_merge = ΣΣ_i |I(h)_i − <I(h)>| / ΣΣ_i |I(h)_i|, where <I(h)> is the mean equivalent intensity.

^c R_work = Σ|Fo − Fc| / Σ|Fo|, where Fo and Fc are the observed and calculated structure factor amplitudes, respectively.

^d R_free = Σ|Fo − Fc| / Σ|Fo|. This value was calculated using a test data set comprising 5% of the total data that was randomly selected from the observed reflections.
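The residuals defined in these footnotes can be written out as a minimal sketch. The function names and the toy numbers are hypothetical and are not taken from Table 3; R_free uses the same expression as R_work but is evaluated only on the reflections held out as the test set.

```python
from typing import Dict, List, Sequence


def r_factor(f_obs: Sequence[float], f_calc: Sequence[float]) -> float:
    """R = sum|Fo - Fc| / sum|Fo| over a set of reflections (R_work or R_free)."""
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("f_obs and f_calc must be non-empty and of equal length")
    numerator = sum(abs(fo - fc) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator


def r_merge(measurements: Dict[str, List[float]]) -> float:
    """R_merge over repeated intensity measurements I(h)_i of each reflection h."""
    numerator = 0.0
    denominator = 0.0
    for intensities in measurements.values():
        mean_intensity = sum(intensities) / len(intensities)
        numerator += sum(abs(i - mean_intensity) for i in intensities)
        denominator += sum(abs(i) for i in intensities)
    return numerator / denominator


# Toy structure-factor amplitudes (hypothetical values).
toy_r = r_factor([100.0, 50.0, 25.0], [90.0, 55.0, 25.0])

# Toy repeated intensity measurements for two reflections (hypothetical values).
toy_r_merge = r_merge({"h1": [10.0, 12.0], "h2": [5.0, 5.0]})

print(f"R = {toy_r:.4f}, R_merge = {toy_r_merge:.4f}")
```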

Spa2 is arranged in three tandem Ig-like domains, including the N-domain (residues 36-197, pink), M-domain (residues 198-343, blue), and C-domain (residues 344-469, green), giving an elongated molecule […] in length (Fig. 15). These three tandem Ig-like domains of Spa2 are similar to those of the major pilins SpaA (PDB ID: 3HR6, root-mean-square deviation (RMSD) […] over 270 alpha-carbon (C_α) atoms, Fig. 16b) and SpaD (PDB ID: 4HSS, RMSD […] over 311 C_α atoms, Fig. 16c) from the human pathogen C. diphtheriae (Kang, H. J. et al., 2009 above, and Kang, H. J. et al. A slow-forming isopeptide bond in the structure of the major pilin SpaD from Corynebacterium diphtheriae has implications for pilus assembly. Acta Crystallogr. D Biol. Crystallogr. 70, 1190-1201, 2014). The crystals of Spa2 adopt head-to-tail stacking such that the N-domain in Spa2_i abuts against the C-domain in Spa2_i+1 (Fig. 15), which is consistent with the result that the Spa2 monomers are joined via the intermolecular isopeptide bond between K194 in the N-terminus of Spa2_i and T477 in the C-terminus of Spa2_i+1 (Fig. 11). Together these results imply that the biological assembly of the ^CgCLP fiber occurs via the head-to-tail polymerization of Spa2 monomers.

Furthermore, interpretation of electron density maps clearly showed three common isopeptide bonds and two unique disulfide bonds in the structure of Spa2 (Fig. 17). Formation of multiple covalent bonds was also verified by LC-MS/MS analysis of the pepsin-digested Spa2^cut products (Fig. 18). The isopeptide bonds linked Lys57 and Asn195 with catalytic Glu158 in the N-domain; Lys203 and Asn318 with catalytic Asp246 in the M-domain; and Lys355 and Asn466 with catalytic Glu435 in the C-domain (Fig. 17a). Notably, the presence of three intramolecular isopeptide bonds distributed across the three domains of the major pilin Spa2 in C. glutamicum is similar to the feature of the major pilin SpaD from the pathogenic C. diphtheriae (Kang, H. J. et al., 2014 above), but is quite different from the major pilin SpaA from the pathogenic C. diphtheriae, which lacks isopeptide bonds in the N-terminal domain (Kang, H. J. et al., 2009 above). In addition, two disulfide bonds were formed, one in the N-domain between Cys97 and Cys128 and one in the C-domain between Cys380 and Cys432 (Fig. 17b). Notably, the presence of two disulfide bonds in Spa2 is unusual in comparison with other major pilins in human pathogens, such as Spy0128 (PDB ID: 3B2M) from Streptococcus pyogenes^37 and BcpA (PDB ID: 3KPT) from Bacillus cereus^38, which lack disulfide bonds, and SpaA and SpaD from C. diphtheriae, which contain only one disulfide bond in the C-terminal domain (Kang, H. J. et al., 2009 and 2014 above).
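The residue assignments above can be cross-checked against the domain boundaries stated for Spa2 (N-domain 36-197, M-domain 198-343, C-domain 344-469). The sketch below is illustrative only; the data structures and function name are hypothetical, and residue numbers outside the three stated ranges are simply reported as unassigned.

```python
from typing import Optional

# Domain boundaries and bonded residues as stated in the text.
DOMAINS = {"N": (36, 197), "M": (198, 343), "C": (344, 469)}
ISOPEPTIDE_TRIADS = [  # (Lys, Asn, catalytic Glu/Asp)
    (57, 195, 158),
    (203, 318, 246),
    (355, 466, 435),
]
DISULFIDES = [(97, 128), (380, 432)]


def domain_of(residue: int) -> Optional[str]:
    """Return the domain label containing the residue, or None if it lies outside all three."""
    for name, (start, end) in DOMAINS.items():
        if start <= residue <= end:
            return name
    return None


# Each bond should lie entirely within a single domain.
triad_domains = [{domain_of(r) for r in triad} for triad in ISOPEPTIDE_TRIADS]
disulfide_domains = [{domain_of(r) for r in pair} for pair in DISULFIDES]

for triad, doms in zip(ISOPEPTIDE_TRIADS, triad_domains):
    print(f"isopeptide triad {triad}: domain(s) {sorted(doms)}")
for pair, doms in zip(DISULFIDES, disulfide_domains):
    print(f"disulfide {pair}: domain(s) {sorted(doms)}")
```

The same lookup places Lys194 in the N-domain, while Thr477 of the LPLTG motif falls after the last residue of the stated C-domain range.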

2.4. The intermolecular polymerization between Spa2 monomers

Functional assays with various Spa2 mutant variants expressed in Δspa2 were conducted to explore their roles in ^CgCLP formation in vivo. Indeed, mutagenesis experiments with the K194A and LPLTG_474LALAA_478 variants blocked ^CgCLP production, confirming that both Lys194 in the N-domain and LPLTG_474-478 in the C-domain participate in Spa2 monomer polymerization (Figs. 19 and 20).

A series of Spa2 variants were generated to further test how the intramolecular isopeptide bond and disulfide bond in Spa2 monomer contribute to the formation and stabilization of ^CgCLP.

First, variants of Spa2 with alanine substitutions of Glu158, Asp246, and Glu435 (E158A, D246A, E435A), the residues that catalyze Lys-Asn isopeptide bond formation in each domain, were constructed in the Δspa2 strain (each substitution was introduced into the Spa2-encoding sequence in pEK-Spa2). The LC-MS/MS analysis, bio-imaging characterization, and ELISA quantification analysis showed that E158A, D246A, and E435A abolished one or two intramolecular isopeptide bonds (Fig. 21a-c), none of which had any obvious impact on ^CgCLP production (Figs. 19 and 20). Only the double-mutation variant D246A/E435A abolished all three intramolecular isopeptide bonds in Spa2 (Fig. 21d), and it produced only 44.9% of the ^CgCLP produced by Spa2 cells (Fig. 20).

Second, the variants C97A and C380A abrogated the disulfide bonds in the N- and C-domains of Spa2, respectively. TEM imaging showed the influence of these mutations in Spa2 on ^CgCLP formation (Fig. 19). ELISA showed a dramatic reduction in the extent of ^CgCLP formation for these Spa2 variants (Fig. 20). We also found that ^CgCLP formation was completely blocked in a C97A/C380A double mutant variant (Figs. 19 and 20).

Taken together, these results suggest that both isopeptide and disulfide bonds contribute to the formation of CLP in C. glutamicum, with the disulfide bond appearing as the most important element for stabilization of the ^CgCLP structure.

Example 3. Engineering ^CgCLP as a programmable extracellular protein scaffold

The CLP structure may serve as an attractive building block for various applications because these extracellular fibers have extraordinarily high tensile strength owing to their extensive inter-and intra-molecular isopeptide bonds. Moreover, as an extracellular matrix, CLP fibers can be conveniently and reliably positioned directly outside cells. Finally, their proteinaceous nature makes them potentially amenable for elaboration using genetic engineering.

This Example was carried out to determine suitable fusion sites to append peptides/proteins to Spa2. According to both the Spa2 crystal structure and the characterization of specific functional domains within Spa2 observed in Example 2, four different positions were selected to test the fusion of a protein-of-interest (POI), with one site in the N-terminus of Spa2 and three sites in the M-domain, which lacks a disulfide bond (Fig. 22).

The CLP-defective strain C. glutamicum ATCC 14067 Δspa2 (Δspa2) with abrogated extracellular ^CgCLP formation was transformed with the exogenous expression plasmid (pEK-E1/mCherry-spa2, pEK-E2/mCherry-spa2, pEK-E3/mCherry-spa2, or pEK-E4/mCherry-spa2) for Spa2 fusion protein expression to test the restored ^CgCLP fiber production.

The fluorescent reporter protein mCherry was fused at the interrogated positions for generating functional fusion proteins (SEQ ID NOs: 8-11) while retaining the sortase-catalyzed covalently-linked pili formation capacity of Spa2. As shown in Fig. 22, four sites were tested for mCherry addition/insertion, including Q35 (E1) at the N-terminus of Spa2, G215 in loop 1 of the M-domain (E2) , G236 in the loop 2 of the M-domain (E3) , and G336 in the β23-sheet of the M-domain (E4) . Quantitative analysis (ELISA) showed that the cells expressing each of the fusion proteins fluoresced and enabled the formation of fiber (Fig. 23a) .

Confocal microscopy showed that mCherry fluorescence was detected for all engineered variants, with fluorescence evident at extracellular sites on the C. glutamicum cells (Fig. 23b), consistent with TEM imaging results showing that mCherry-functionalized ^CgCLP fibers formed on the surface of cells (Fig. 24). Combining the results of ELISA, fluorescence intensity, confocal microscopy and TEM imaging, it is concluded that E1 and E2 are the most suitable sites for fusion of a functional POI, yielding abundant functionalized ^CgCLP fibers.

A variety of Spa2 fusion proteins (six POIs, each fused at the E1 position via a linker of SEQ ID NO: 23) (see Fig. 25) were expressed by Δspa2 strains transformed with plasmids pEK-6his-spa2, pEK-SpyTagSpa2, pEK-Mfp3Spep-Spa2, pEK-SpyCatcher-Spa2, pEK-Venus-Spa2, and pEK-CcEgl-Spa2, respectively. All of these fusion proteins were successfully expressed, secreted, and formed ^CgCLP (Fig. 26) .

TEM images showed that Ni-NTA-decorated AuNPs were anchored onto 6His-Spa2 ^CgCLP (Fig. 27a). Confocal microscopic images showed green fluorescence emitted from SpyTag-Spa2 ^CgCLP cells to which SpyCatcher-EGFP binding partners were covalently attached via the SpyTag-SpyCatcher interaction pair (Fig. 27b), and from SpyCatcher-Spa2 ^CgCLP cells to which SpyTag-EGFP binding partners were covalently attached in the same way (Fig. 27c). Confocal microscopic images also showed green fluorescence emitted from Venus-Spa2 ^CgCLP cells (Fig. 27d). The immobilization ability of Mfp3Spep-Spa2 ^CgCLP cells was assessed by fluorescence imaging and quantification (Fig. 27e): the left panels show the PS microspheres immobilized on the substrates before (top) and after (bottom) the challenge with water jetting at a constant discharge pressure of 5 psi, and the right panel quantifies the relative capability of the different cells to immobilize PS microspheres on the substrate. The degradation of carboxymethyl cellulose into glucose by CcEgl-Spa2 ^CgCLP cells was detected by a 3,5-dinitrosalicylic acid (DNS) assay (Fig. 27f).

These findings indicate that the sortase-mediated polymerization is not disrupted by fusion of POIs to Spa2 monomers, especially fusion of POIs in the N-terminus of Spa2 and loop 1 of the M-domain, and various types and sizes of proteins can be engineered into a generally programmable extracellular protein scaffold of ^CgCLP.

To assess whether the programmable ^CgCLP extracellular protein scaffold can support the co-assembly of multiple heterologous proteins, we conducted experiments in the Δspa2 strain with the well-established split-Venus system (see Fig. 28, and Kodama, Y. & Hu, C.-D. An improved bimolecular fluorescence complementation assay with a high signal-to-noise ratio. BioTechniques 49, 793-805, 2010).

The Δspa2 strain was transformed with plasmids pEK-N-Ven-Spa2, pEK-C-Ven-Spa2 and pEK-N-Ven-Spa2_C-Ven-Spa2, respectively. The Δspa2 strain transformed with pEK-N-Ven_C-Ven was used as a control.

As indicated by TEM images of the transformed cells, co-assembly of two distinct proteins did not disturb ^CgCLP assembly (Fig. 29) . The fluorescence intensity assay and confocal microscopy imaging showed that the highest fluorescence intensity was observed in cells where the split-Venus components were simultaneously fused with Spa2 (Fig. 30) . Almost no fluorescence was detected when only N-Ven and C-Ven were simultaneously secreted without anchoring to the ^CgCLP scaffold (Fig. 30) . These results indicated that the split components can be co-assembled in the extracellular ^CgCLP scaffold.

Example 4. Engineering living materials to degrade cellulosic biomass into valued chemicals

This Example was carried out to verify the co-assembly of multiple cellulases into a catalytic cascade for extracellular degradation of cellulose into glucose to support production of specific chemicals of interest (e.g., lycopene) in C. glutamicum ATCC 14067 Δspa2 (Fig. 31) .

In particular, the endo-1,4-β-glucanase from Trichoderma reesei (TrEgl, SEQ ID NO: 19) and β-glucosidase from Saccharophagus degradans (SdBgl, SEQ ID NO: 21) were co-assembled in the ^CgCLP fiber; these two enzymes are known to work in concert to degrade cellulose into glucose via enzyme cascade reactions.

Lycopene can be produced via the methylerythritol phosphate (MEP) pathway by engineered C. glutamicum (Li, C. et al. Heterologous production of α-Carotene in Corynebacterium glutamicum using a multi-copy chromosomal integration method. Bioresour. Technol. 341, 125782, 2021). A C001 chassis (Δspa2Δdec) with deletion of both the spa2 gene (Δspa2, for the abrogation of ^CgCLP formation) and a 43,702 bp region between CEY17_RS03380 and CEY17_RS03560 (Δdec, for accumulation of the precursor for lycopene production) (Heider, S. A. et al., Carotenoid biosynthesis and overproduction in Corynebacterium glutamicum. BMC Microbiol. 12, 1-11, 2012) was constructed as described in Example 1. The basal lycopene-producing strain C002 was constructed by transforming strain C001 with plasmid pZ9-dxs_crtEBI for IPTG-inducible expression of the dxs gene and crtEBI gene cluster. Then, the C002 strain was transformed with plasmids pEC-TrEgl-Spa2_SdBgl-Spa2 and pEC-TrEgl_SdBgl, respectively, resulting in the strains C003 and C004.

As shown in Fig. 32, the C003 strain co-assembled TrEgl and SdBgl in the ^CgCLP fiber on the cell surface (Fig. 32a) and enabled the degradation of carboxymethylcellulose sodium (CMC-Na, an ether derivative of cellulose) in the medium, as shown by the medium turning from a viscous gel to a thin solution (Fig. 32b). Strain C004, which secreted both TrEgl and SdBgl without anchoring them to the ^CgCLP scaffold, did not show similar behavior.

The extracellular cellulase activity assays showed that the C003 strain produced a 4-fold higher yield of reducing sugars than strain C004 (Fig. 32c). As shown in Fig. 32d, the lycopene production titer in the C003 strain reached 0.83 mg/g dry cell weight (DCW) after 36 h of culture in an M63 medium with CMC-Na as the sole carbon source.

SEQUENCES

SEQ ID NO: 1 Wildtype Spa2

SEQ ID NO: 2 Wildtype Spa2

SEQ ID NO: 3 Wildtype Spa2

SEQ ID NO: 4 Wildtype Spa2

SEQ ID NO: 5 Spa2 ^cut

SEQ ID NO: 6 Recombinant Spa2

SEQ ID NO: 7 mCherry

SEQ ID NO: 8 E1/mCherry-spa2

SEQ ID NO: 9 E2/mCherry-spa2

SEQ ID NO: 10 E3/mCherry-spa2

SEQ ID NO: 11 E4/mCherry-spa2

SEQ ID NO: 12 6his-spa2

SEQ ID NO: 13 SpyTagSpa2

SEQ ID NO: 14 Mfp3Spep-Spa2

SEQ ID NO: 15 SpyCatcher

SEQ ID NO: 16 Venus

SEQ ID NO: 17 CcEgl

SEQ ID NO: 18 N-Ven

SEQ ID NO: 19 TrEgl

SEQ ID NO: 20 C-Ven

SEQ ID NO: 21 SdBgl

SEQ ID NO: 22 Linker 1 (GS)

SEQ ID NO: 23 Linker 2 (C10)

SEQ ID NO: 24 N-mCherry-C

[Rectified under Rule 91, 16.01.2023]
SEQ ID NO: 25 Spa2 promoter

[Rectified under Rule 91, 16.01.2023]
SEQ ID NO: 26 dxs

[Rectified under Rule 91, 16.01.2023]
SEQ ID NO: 27 crtE

[Rectified under Rule 91, 16.01.2023]
SEQ ID NO: 28 crtBI

[Rectified under Rule 91, 16.01.2023]
SEQ ID NO: 29 lacI

[Rectified under Rule 91, 16.01.2023]
SEQ ID NO: 30 Ptac promoter

[Rectified under Rule 91, 16.01.2023]
SEQ ID NO: 31 Spa1-Ab

[Rectified under Rule 91, 16.01.2023]
SEQ ID NO: 32 Spa2-Ab

SEQ ID NO: 33 Spa3-Ab

SEQ ID NO: 34 HRV3c

SEQ ID NO: 35 GFP-SpyCatcher

SEQ ID NO: 36 GFP-SpyTag

SEQ ID NO: 37 SpyTag

SEQ ID NO: 38 Mfp35

Claims

A fusion polypeptide comprising a carrier protein and a polypeptide of interest, wherein the polypeptide of interest is fused to a terminus of the carrier protein or inserted into the carrier protein, and wherein the carrier protein is a pilin of covalently-linked pili (CLP) from a microorganism.
The fusion polypeptide of claim 1, wherein the microorganism is a Gram-positive bacterium, such as a bacterium selected from Corynebacterium glutamicum, Bifidobacterium breve, Lactococcus lactis, Lacticaseibacillus paracasei, and Bacillus thuringiensis; preferably, Corynebacterium glutamicum.
The fusion polypeptide of claim 1 or 2, wherein the carrier protein is a major pilin.
The fusion polypeptide of any of claims 1-3, wherein the polypeptide of interest is fused to the N or C terminus of the carrier protein.
The fusion polypeptide of claim 4, wherein the polypeptide of interest is fused to the N terminus of the carrier protein.
The fusion polypeptide of any of claims 1-3, wherein the polypeptide of interest is inserted into the carrier protein.
The fusion polypeptide of claim 6, wherein the polypeptide of interest is inserted into a loop in the carrier protein.
The fusion polypeptide of claim 6, wherein the carrier protein is the major pilin from Corynebacterium glutamicum, and wherein the polypeptide of interest is inserted into the M domain of the major pilin.
The fusion polypeptide of any of claims 1-8, wherein the carrier protein comprises amino acids 36 to 509 of SEQ ID NO: 1, amino acids 34 to 520 of SEQ ID NO: 2, amino acids 34 to 530 of SEQ ID NO: 3, or amino acids 34 to 519 of SEQ ID NO: 4.
The fusion polypeptide of any of claims 1-9, wherein the polypeptide of interest is fused to the N terminus of the carrier protein, or is inserted between positions corresponding to G215 and L216 of SEQ ID NO: 1, between positions corresponding to G236 and E237 of SEQ ID NO: 1, or between positions corresponding to G336 and T337 of SEQ ID NO: 1.
The fusion polypeptide of any of claims 1-10, wherein the carrier protein comprises amino acids 36 to 509 of SEQ ID NO: 1, and wherein the polypeptide of interest is fused to the N terminus of the carrier protein, or is inserted between G215 and L216, between G236 and E237, or between G336 and T337 of SEQ ID NO: 1.
A polynucleotide encoding the fusion polypeptide of any of claims 1-11.
A vector comprising the polynucleotide of claim 12.
A host cell comprising the fusion polypeptide of any of claims 1-11, the polynucleotide of claim 12 or the vector of claim 13.
A recombinant cell comprising a polynucleotide encoding a fusion polypeptide, wherein the fusion polypeptide comprises a carrier protein and a polypeptide of interest, wherein the polypeptide of interest is fused to a terminus of the carrier protein or inserted into the carrier protein, wherein the carrier protein is a pilin of CLP, and wherein the recombinant cell is capable of expressing the polynucleotide and displaying a modified CLP comprising the fusion polypeptide.
The recombinant cell of claim 15, wherein the recombinant cell is a Gram-positive bacterium.
The recombinant cell of claim 15 or 16, wherein the bacterium is selected from Corynebacterium glutamicum, Bifidobacterium breve, Lactococcus lactis, Lacticaseibacillus paracasei, and Bacillus thuringiensis; preferably, Corynebacterium glutamicum.
The recombinant cell of any of claims 15-17, wherein the carrier protein is a major pilin.
The recombinant cell of any of claims 15-18, wherein the polypeptide of interest is fused to the N or C terminus of the carrier protein.
The recombinant cell of claim 19, wherein the polypeptide of interest is fused to the N terminus of the carrier protein.
The recombinant cell of any of claims 15-18, wherein the polypeptide of interest is inserted into the carrier protein.
The recombinant cell of claim 21, wherein the polypeptide of interest is inserted into a loop in the carrier protein.
The recombinant cell of claim 22, wherein the carrier protein is a major pilin from Corynebacterium glutamicum, and wherein the polypeptide of interest is inserted into the M domain of the major pilin.
The recombinant cell of any of claims 15-23, wherein the carrier protein comprises amino acids 36 to 509 of SEQ ID NO: 1, amino acids 34 to 520 of SEQ ID NO: 2, amino acids 34 to 530 of SEQ ID NO: 3, or amino acids 34 to 519 of SEQ ID NO: 4.
The recombinant cell of any of claims 15-24, wherein the polypeptide of interest is fused to the N terminus of the carrier protein, or is inserted between positions corresponding to G215 and L216 of SEQ ID NO: 1, between positions corresponding to G236 and E237 of SEQ ID NO: 1, or between positions corresponding to G336 and T337 of SEQ ID NO: 1.
The recombinant cell of any of claims 15-25, wherein the carrier protein comprises amino acids 36 to 509 of SEQ ID NO: 1, and wherein the polypeptide of interest is fused to the N terminus of the carrier protein, or is inserted between G215 and L216, between G236 and E237, or between G336 and T337 of SEQ ID NO: 1.
The recombinant cell of any of claims 15-26, wherein the recombinant cell comprises two or more polynucleotides respectively encoding two or more fusion polypeptides each comprising a different polypeptide of interest, and the modified CLP comprises the two or more fusion polypeptides.
A method of preparing the recombinant cell of any of claims 15-27, comprising introducing the polynucleotide of claim 12 or the vector of claim 13 into a host cell.
The method of claim 28, wherein the host cell is a bacterium having a native CLP.
The method of claim 28 or 29, wherein the host cell is a Gram-positive bacterium.
The method of any of claims 28-30, wherein the method comprises a step of knocking out the native major pilin of the host cell.
A modified covalently-linked pili (CLP) comprising a plurality of the fusion polypeptides of any of claims 1-11.
A method of preparing a modified CLP, comprising the steps of:

a) providing the fusion polypeptide of any of claims 1-11; and

b) providing an activity of sortase.
The method of claim 33, wherein the sortase is encoded by a gene which is identified to be present in the same cluster with the gene encoding the carrier protein in nature.
The method of claim 33 or 34, wherein the sortase is a class C sortase, such as srtC1 and/or srtC2, preferably wherein the srtC1 and srtC2 are encoded by genes from the same cluster.
The method of any of claims 33-35, wherein the method is an in vitro method.
A polynucleotide construct or a combination of polynucleotide constructs comprising the polynucleotide of claim 12, and one or more polynucleotides encoding a sortase.
The polynucleotide construct or a combination of polynucleotide constructs of claim 37, wherein the sortase is encoded by a gene which is identified to be present in the same cluster with the gene encoding the carrier protein in nature.
The polynucleotide construct or a combination of polynucleotide constructs of claim 37 or 38, wherein the sortase is a class C sortase, such as srtC1 and/or srtC2, preferably wherein the srtC1 and srtC2 are encoded by genes from the same cluster.
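
By way of illustration only, and not as part of the claimed subject matter, the residue numbering recited in the claims above (for example "amino acids 36 to 509" and insertion "between G215 and L216") may be read as 1-based, inclusive positions along the reference sequence. The following minimal Python sketch shows how such a range and such an insertion site could map onto a sequence string under that assumption. The helper names `residue_range` and `insert_between` and the ten-residue toy carrier are hypothetical; the actual SEQ ID NO: 1 is not reproduced here, and a hexahistidine stretch stands in for the polypeptide of interest.

```python
def residue_range(seq: str, first: int, last: int) -> str:
    """Return residues first..last, counted 1-based and inclusive."""
    if not 1 <= first <= last <= len(seq):
        raise ValueError("range lies outside the sequence")
    return seq[first - 1:last]


def insert_between(seq: str, left: str, right: str, payload: str) -> str:
    """Insert payload between two adjacent residues named like 'G215' and 'L216'.

    The one-letter codes are checked against the sequence so that a
    numbering mismatch is reported instead of silently shifting the site.
    """
    left_aa, left_pos = left[0], int(left[1:])
    right_aa, right_pos = right[0], int(right[1:])
    if right_pos != left_pos + 1:
        raise ValueError("flanking residues must be adjacent")
    if not 1 <= left_pos < len(seq):
        raise ValueError("insertion site lies outside the sequence")
    if seq[left_pos - 1] != left_aa or seq[right_pos - 1] != right_aa:
        raise ValueError("sequence does not match the named flanking residues")
    return seq[:left_pos] + payload + seq[left_pos:]


# Hypothetical toy carrier (NOT SEQ ID NO: 1): M1 K2 T3 G4 L5 A6 G7 E8 P9 T10
TOY_CARRIER = "MKTGLAGEPT"
HIS_TAG = "HHHHHH"  # stand-in for a polypeptide of interest

fusion = insert_between(TOY_CARRIER, "G4", "L5", HIS_TAG)
print(fusion)
print(residue_range(TOY_CARRIER, 3, 6))
```

With a real reference sequence, the same check on the flanking residues would flag a case where the numbering of a homologous pilin differs from that of SEQ ID NO: 1, which is why the claims refer to "positions corresponding to" the named residues.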