
WO2025021112A1 - Engineered cells for developing MHC neoantigens - Google Patents

Engineered cells for developing MHC neoantigens

Info

Publication number
WO2025021112A1
Authority
WO
WIPO (PCT)
Prior art keywords
domain
chain
hla
cell
protein
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/CN2024/107278
Other languages
English (en)
French (fr)
Inventor
姜威
王建铭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jwe Beijing Science Technology Inc
Original Assignee
Jwe Beijing Science Technology Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jwe Beijing Science Technology Inc
Publication of WO2025021112A1
Legal status: Pending (current)
Anticipated expiration


Classifications

    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61P SPECIFIC THERAPEUTIC ACTIVITY OF CHEMICAL COMPOUNDS OR MEDICINAL PREPARATIONS
    • A61P37/00 Drugs for immunological or allergic disorders
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K14/00 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof
    • C07K14/435 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from animals; from humans
    • C07K14/46 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from animals; from humans from vertebrates
    • C07K14/47 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from animals; from humans from vertebrates from mammals
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/11 DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C12N15/62 DNA sequences coding for fusion proteins
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/63 Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/79 Vectors or expression systems specially adapted for eukaryotic hosts
    • C12N15/80 Vectors or expression systems specially adapted for eukaryotic hosts for fungi
    • C12N15/81 Vectors or expression systems specially adapted for eukaryotic hosts for fungi for yeasts
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00 Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48 Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/50 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
    • G01N33/68 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving proteins, peptides or amino acids

Definitions

  • the present application relates to cell surface display technology, and in particular to an improved technical solution for displaying MHC-II and its trimer with antigen peptide on the surface of yeast cells.
  • MHC-II: major histocompatibility complex class II
  • the peptide-binding groove of MHC-II, formed by the α1 and β1 domains, contains several pocket-like spatial structures that accommodate the side chains of specific "anchor" residues within the register of an antigenic peptide. The peptide registers that MHC-II can accommodate are therefore somewhat promiscuous; in other words, a given MHC-II can bind and present many different antigenic peptides, but the complementary convex-concave geometry of the anchor residues and pockets limits, to a certain extent, the types of peptides that a given MHC-II presents.
  • the human MHC or human leukocyte antigen (HLA)
  • HLA: human leukocyte antigen
  • This polymorphism is associated with the presence of multiple naturally occurring HLA haplotypes, gene subtypes (DR, DQ, DP), and thousands of alleles.
  • DR, DQ, DP: gene subtypes
  • the MHC protein translated and expressed from each allele can bind and present multiple peptides. Studying the peptide-binding properties of the many MHC-II molecules and mapping the epitopes of the many peptide antigens are therefore extremely challenging problems that require high-throughput, high-content genetic and protein engineering methods.
  • MHC-II molecules are purified from different expression systems (such as B cell lines, insect cells, yeast, or E. coli), and then the binding strength of these molecules with different peptides is analyzed; other studies focus on evaluating the binding of peptides to MHC-II expressed on the surface of APCs.
  • the peptide fragments used in these methods are either solid-phase synthesized by chemical methods or produced by genetic methods such as phage display.
  • eukaryotic cell surface display, which is well suited to directed evolution, can be used to express specified MHC-II alleles on the cell surface for peptide-binding assays.
  • yeast allows simple molecular cloning and expresses proteins carrying eukaryotic post-translational modifications; it is therefore a preferred platform for developing this approach.
  • the present application provides an engineered cell that can display a major histocompatibility complex and a trimer thereof with an antigenic peptide.
  • the engineered cells can be used for accurate and efficient high-throughput screening of MHC-II target peptides and affinity analysis of MHC-II and target peptides.
  • a cell comprising a major histocompatibility complex (MHC) and a single-chain domain, wherein the complex comprises an α chain and a β chain, wherein:
  • MHC: major histocompatibility complex
  • the beta chain is linked to a second protein binding domain to form a second fusion protein
  • the single-chain domain non-covalently binds to the α chain and the β chain
  • the first fusion protein binds to the second fusion protein to form the complex
  • the complex and the single-chain domain form a trimer
  • the first protein binding domain binds to the second protein binding domain to enhance or promote the formation of a trimer formed between the complex and the single-chain domain.
  • a cell according to item 1 wherein the first protein binding domain is non-covalently bound to the second protein binding domain; or the first protein binding domain is covalently linked to the second protein binding domain, and the covalent bond is preferably a disulfide bond.
  • a cell according to item 9 wherein the amino acid sequence of Aga2p is as shown in SEQ ID NO:1.
  • a cell according to item 1, wherein the single-chain domain is a peptide segment 9 to 30 amino acids in length that contains at least one contiguous 9-amino-acid register peptide (register) specifically recognized by the complex, or a register peptide variant (variant) that can be bound by the complex; preferably, the register peptide or its variant is selected from the amino acid sequences shown in any one of SEQ ID NO: 20-38.
  • a cell according to item 11 wherein the amino acid sequence of the single-chain domain is as shown in any one of SEQ ID NO: 2-6 or 46-77.
  • first protein binding domain is a first leucine zipper domain
  • second protein binding domain is a second leucine zipper domain
  • first leucine zipper domain and the second leucine zipper domain that form a complete leucine zipper can be used interchangeably in the first fusion protein and the second fusion protein;
  • the first protein binding domain is an FcA domain
  • the second protein binding domain is an FcB domain.
  • the FcA domain and the FcB domain that form the FcAB dimer can be used interchangeably in the first fusion protein and the second fusion protein.
  • the FcA domain comprises only the first CH3 domain
  • the FcB domain comprises only the second CH3 domain
  • the single-chain domain comprises a peptide, or a mutant of a peptide, or a library of mutants of a peptide, or a mixture of peptides.
  • the first protein binding domain is covalently bound to the second protein binding domain, and preferably the covalent binding is through a disulfide bond.
  • the first protein binding domain and the second protein binding domain form an FcAB domain
  • the FcA domain comprises both a first CH2 domain and a first CH3 domain
  • the FcB domain comprises both a second CH2 domain and a second CH3 domain
  • first CH2 domain and the first CH3 domain of the first protein-binding domain are connected via a linker
  • second CH2 domain and the second CH3 domain of the second protein-binding domain are connected via a linker
  • the modification can achieve any of the following:
  • FIG. 3 is similar to FIG. 2 and shows a schematic diagram of the target gene region of the selectable plasmid, wherein the target gene is the DR4 allele (HLA-DRA*01:01/DRB1*04:01).
  • FIG. 6 is similar to FIG. 5 and shows a schematic diagram of the target gene region of the selectable plasmid, wherein the target gene is the DR4 allele (HLA-DRA*01:01/DRB1*04:01).
  • FIG. 8 shows immunofluorescence labeling and flow cytometry of surface DR or DQ protein conformations in four yeast strains: one that only secretes HLA-II (background control), one displaying the LZ modification (denoted HLA-II-LZ), one displaying the FcAB modification (denoted HLA-II-Fc), and one displaying the FcAB modification with the AGA2 SP signal peptide to enhance extracellular delivery (denoted HLA-II-Fc (w/AGA2 SP)).
  • the flow cytometry results are displayed as histograms; the median fluorescence intensity (MFI) of the DR or DQ protein for each HLA-II α/β allele under each condition is shown in the figure.
  • MFI: median fluorescence intensity
  • FIG. 9 shows immunofluorescence co-labeling and flow cytometry of surface HA tags and DR protein conformations in six yeast strains: one that only secretes DR1 (no pep + DR background control); one that only displays the DR-specific peptide (+HA 308-316, no DR control); one that displays the HA 308-316 peptide and secretes LZ-modified DR1 (DR1-LZ); one that displays the HA 308-316 peptide and secretes Fc-modified DR1 with the AGA2 SP signal peptide for enhanced extracellular delivery; one that displays the HA 308-316 peptide and secretes LZ-modified DR4 (DR4-LZ); and one that displays the HA 308-316 peptide and secretes Fc-modified DR4 with the AGA2 SP signal peptide for enhanced extracellular delivery.
  • the flow cytometry results are shown as dot plots (SSC vs. HA label) or histograms.
  • the dot plots indicate the percentage of the gated HA+ cell subpopulation, and the histograms indicate the median fluorescence intensity (MFI) of DR α/β protein in the HA+ subpopulation.
  • DR display levels can be quantitatively analyzed as shown in Figure 8.
  • FIG. 10 shows immunofluorescence co-labeling and flow-cytometric sorting, by surface HA tag and DR protein conformation, of the parent yeast library (+pep) that both displays HA 308-316 peptide variants and secretes DR1.
  • FIG. 11 shows yeast PCR and sequencing of the target peptide region for 10 single clones selected from the yeast sub-library that both displays HA 308-316 peptide variants and secretes DR1.
  • the results are compared with the library DNA primers and with the wild-type gene and its peptide amino acid sequence. Except for clones #1 and #4, the sequencing chromatograms show obvious overlapping peaks, indicating that a single yeast cell can harbor multiple recombinant plasmids formed during transformation. Consequently, although the dominant-peak DNA sequences of the peptide variants in clones #5 and #7, and the products they encode, are identical, it cannot be confirmed that the peptide-variant display plasmids obtained by transformation in these two yeast clones are exactly the same.
  • Figure 12 shows, for four selected single clones (#1, #7, #9, #10), extraction of the yeast plasmids, transformation into E. coli, and sequencing of the target peptide region in one to three E. coli plasmids recovered per yeast clone. The results are compared with the wild-type gene and its peptide amino acid sequence.
  • the selected peptide display plasmid was further transformed into yeast to construct a strain that both displays the HA 308-316 peptide variant and secretes DR1, and immunofluorescence co-labeling and flow cytometry of the surface HA tag and DR protein conformation were then performed.
  • the flow cytometry results are displayed in dot plots (SSC scatter vs HA tag) or histograms.
  • the dot plots are annotated with the percentage of gated HA+ positive cell subpopulations
  • the histograms are annotated with the median fluorescence intensity (MFI) of the HA+ subpopulation DR α/β protein.
  • the DR display level can be quantitatively analyzed as shown in Figure 8.
  • Figure 14: EBY100 yeast secreting soluble HLA-II (background control, BC), displaying LZ-modified HLA-II (W/LZ), or displaying FcAB-modified HLA-II (W/Fc) was subjected to immunofluorescence labeling and flow cytometry of surface DR or DQ protein levels.
  • the flow cytometry results are displayed as histograms; the median fluorescence intensity of the DR or DQ protein for each HLA-II α/β allele under each condition is shown in the figure.
  • the flow cytometry results are displayed in Dot-plot or histogram format.
  • FIG. 19: EBY100 yeast displaying LZ-modified DR1 (DR-LZ), or displaying DR1 with the FcAB modification and AGA2 signal peptide-guided extracellular delivery (DR1-Fc), was incubated with 10 μM biotinylated indicator peptide (Bio-HA 306-318) together with different concentrations of the competitor peptide (HA 306-318), followed by immunofluorescence co-labeling and flow cytometry of surface DR protein level and indicator peptide level. The flow cytometry results were analyzed as shown in FIG. 18, and single-point competition binding curves were then fitted. The IC50 calculated from each fitted curve is shown in the figure.
  • the present application provides a scheme for making yeast co-express an antigen peptide gene and an MHC-II allele, and co-displaying the expressed protein product, i.e., peptide/MHC-II trimer, on the yeast surface.
  • This scheme not only achieves co-expression of complex MHC-II alleles in yeast, but also solves the problem of low co-display efficiency of the peptide/MHC-II trimer. It greatly improves the efficiency of research and development of neoantigens related to MHC-II/antigen peptides, provides new ideas for developing various vaccines, including oral vaccines, and offers solutions for developing targets for precision immunotherapy.
  • the scheme of the present application uses a yeast co-display system (Figure 1), named CODAAH (CO-Display of Antigen-ligand and Associated HLA-II).
  • CODAAH: CO-Display of Antigen-ligand and Associated HLA-II
  • the present application first tethers the peptide product to the yeast surface by genetic fusion of the peptide gene. Because the peptides are derived from T cell antigens, the MHC-II complex secreted in soluble form can bind non-covalently to the yeast-displayed peptides, thereby achieving co-display of the peptide/MHC-II trimer on the yeast surface.
  • the comparison is performed using default parameters.
  • a preferred comparison program is BLAST.
  • Preferred programs are BLASTN and BLASTP. Details of these programs can be found at the following Internet address: ncbi.nlm.nih.gov/cgi-bin/BLAST.
  • complementarity of nucleic acids refers to the ability of a nucleic acid to form hydrogen bonds with another nucleic acid through traditional Watson-Crick base pairing. Percent complementarity is the percentage of residues in a nucleic acid molecule that can form hydrogen bonds (i.e., Watson-Crick base pairs) with another nucleic acid molecule (e.g., 5, 6, 7, 8, 9 or 10 paired residues out of 10 correspond to 50%, 60%, 70%, 80%, 90% and 100% complementarity, respectively). "Complete complementarity" means that all contiguous residues of a nucleic acid sequence form hydrogen bonds with the same number of contiguous residues in a second nucleic acid sequence.
  • substantially complementary refers to a degree of complementarity of at least about 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100% in a region of about 40, 50, 60, 70, 80, 100, 150, 200, 250 or more nucleotides, or refers to two nucleic acids that hybridize under stringent conditions.
  • for single bases or single nucleotides, under the Watson-Crick base-pairing principle, A pairing with T or U and C pairing with G or I (and vice versa) is called complementary or matching; all other base pairings are called non-complementary.
  • the "complementary polynucleotide sequence" of a polynucleotide sequence refers to a polynucleotide sequence that is completely complementary to the polynucleotide sequence.
  • a "conservative substitution variant" of a protein, polypeptide or amino acid sequence is one in which one or more amino acid residues are substituted without changing the overall conformation and function of the protein or enzyme, including but not limited to substitution of amino acids in the parent protein's amino acid sequence in the manner of the "conservative substitution" described above. Accordingly, two proteins or amino acid sequences with similar functions may still differ in sequence, for example having 70% to 99% similarity (identity) as determined by the MEGALIGN algorithm.
  • the "coding sequence" of the present application may further include polynucleotide sequences encoding proteins, functional nucleic acids, or fragments thereof, such as miRNA, shRNA, dsRNA, guide RNA, Poly (A) tail, 5'UTR, 3'UTR, etc.
  • a DNA molecule containing genetic information that can be transcribed into an RNA molecule is called the "coding nucleic acid” of the RNA molecule; an RNA molecule containing genetic information that can be translated into an amino acid sequence is called the "coding nucleic acid” of the amino acid sequence.
  • "HLA-DQ6 beta chain" or "HLA-DQ6 beta chain protein" as provided herein includes any recombinant or naturally occurring form of the human leukocyte antigen (HLA) DQ beta 1 protein (HLA-DQ6 beta chain), also known as MHC-II class DQB1 or HLA-DQB1, or variants or homologs thereof that retain HLA-DQ6 beta chain protein activity (e.g., at least 50%, 80%, 90%, 95%, 96%, 97%, 98%, 99% or 100% of the activity of the HLA-DQ6 beta chain).
  • HLA-DQ6 beta chain also known as MHC-II class DQB1, HLA-DQB1, or variants or homologs thereof that retain HLA-DQ6 beta chain protein activity (e.g., at least 50%, 80%, 90%, 95%, 96%, 97%, 98%, 99% or 100% of the activity compared to HLA-DQ6 beta chain).
  • the designated proteins include naturally occurring forms, variants, or homologs of any protein that retain protein transcription factor activity (e.g., at least 50%, 80%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% activity compared to the native protein).
  • the variant or homolog has at least 90%, 95%, 96%, 97%, 98%, 99%, or 100% amino acid sequence identity over the entire sequence or a portion of the sequence (e.g., 50, 100, 150, or 200 consecutive amino acid portions) compared to the naturally occurring form.
  • the protein is a protein identified by its NCBI sequence reference.
  • the protein is a protein identified by its NCBI sequence reference, a homolog thereof, or a functional fragment thereof.
  • "recombinant", when referring to, for example, a cell, nucleic acid, protein, or vector, means that the cell, nucleic acid, protein, or vector has been modified or altered by the introduction of a heterologous nucleic acid or protein, or is derived from a cell so modified.
  • a recombinant protein is a protein produced by a recombinant nucleic acid molecule.
  • Nucleic acid molecules can include genetic material from a variety of sources, thereby including non-naturally occurring sequences.
  • Recombinant DNA can be produced by methods known in the art of molecular biology or by synthetic methods.
  • expression includes any step involved in the production of the polypeptide, including but not limited to transcription, post-transcriptional modification, translation, post-translational modification and secretion. Expression can be detected using conventional techniques for detecting proteins (e.g., ELISA, Western blotting, flow cytometry, immunofluorescence, immunohistochemistry, etc.).
  • Control refers to a sample, measurement or value used as a reference, usually a known reference, for comparison with a test sample, measurement or value.
  • a test sample can be obtained from a patient suspected of having a given disease and compared with a known normal (undiseased) individual (e.g., a standard control subject).
  • a standard control can also represent an average measurement or value collected from a group of healthy individuals without the given disease (i.e., a standard control population), for example individuals with a similar medical background, age, and weight.
  • Standard control values can also be obtained from the same individual, such as a patient sample obtained earlier before the onset of the disease.
  • controls can be designed to compare therapeutic benefit based on pharmacological data (e.g., half-life) or treatment measures (e.g., comparison of side effects). Controls are also valuable for determining the significance of data. For example, if the value of a given parameter varies widely in the controls, a change in the test sample will not be considered significant. Those skilled in the art will recognize that standard controls can be designed to evaluate any number of quantitative parameters (e.g., RNA levels, protein levels, specific cell types, specific body fluids, specific tissues, etc.).
  • the present application provides an engineered cell comprising a major histocompatibility complex (MHC) and a single-chain domain, wherein the complex comprises an α chain and a β chain, wherein:
  • the α chain is linked to a first protein binding domain to form a first fusion protein
  • the beta chain is linked to a second protein binding domain to form a second fusion protein
  • the single-chain domain non-covalently binds to the α chain and the β chain
  • the first fusion protein binds to the second fusion protein, driving the α chain and the β chain to form the complex, so that the complex and the single-chain domain together constitute a trimer
  • the first protein binding domain binds to the second protein binding domain to enhance or promote the formation of the trimer formed between the complex and the single-chain domain.
  • the cell is a eukaryotic cell. In some embodiments, the cell is a cell derived from a unicellular organism. In some embodiments, the cell is derived from a unicellular eukaryotic organism. In some embodiments, the cell is a yeast cell. In some embodiments, the yeast is brewer's yeast (Saccharomyces cerevisiae). In some embodiments, the yeast belongs to strain EBY100 (Boder & Wittrup, Yeast surface display for screening combinatorial polypeptide libraries, 1997).
  • the single-chain domain is bound to the surface of the cell by covalent fusion with the molecule on the cell surface. In some embodiments, the single-chain domain is bound to the surface of the cell by a peptide bond with the molecule on the cell surface. In some embodiments, the single-chain domain is bound to the surface of the cell by a peptide chain with the molecule on the cell surface.
  • the α chain or the β chain is attached to the cell-surface molecule through non-covalent binding to the single-chain domain. In some embodiments, the α chain and the β chain are attached to the cell-surface molecule through non-covalent binding to the single-chain domain.
  • the molecule comprises a structure that can anchor the yeast cell wall. In some embodiments, the molecule comprises a structure that can anchor the yeast cell wall and a secretion signal peptide (or secretion signal region). In some embodiments, the structure that can anchor the yeast cell wall can be directly or indirectly bound to the cell wall glucan or cell wall mannan. In some embodiments, the structure that can be anchored to the yeast cell wall can be covalently or non-covalently bound to the cell wall glucan or the cell wall mannan directly or indirectly. In some embodiments, the structure that can be anchored to the yeast cell wall can be covalently bound to the cell wall glucan or non-covalently bound to the cell wall mannan directly or indirectly.
  • the structure that can be anchored to the yeast cell wall and/or the secretion signal peptide are derived from the α-agglutinin system and/or the flocculin agglutinin system (i.e., the glycosylphosphatidylinositol anchor system and the flocculin domain anchor system).
  • the structure that can anchor the yeast cell wall is directly or indirectly combined with or connected to the GPI anchor attachment signal region or the flocculation functional region.
  • the molecule is endogenous to the cell. In some embodiments, the molecule is selected from: Aga2p, a-agglutinin, α-agglutinin, flocculin, Cwp1p, Cwp2p or Tip1p.
  • the protein is Aga2p.
  • Aga2p is a protein encoded by a gene with a GENE ID of 852851 in the NCBI database.
  • Aga2p: Aga2 protein
  • Aga2p is the binding subunit of a-agglutinin and is linked to the a-agglutinin core subunit Aga1 protein through disulfide bonds.
  • Aga2p has 1, 2, 3, 4, 5, 6 or more mutations in its amino acid sequence compared to the protein encoded by the gene with GENE ID 852851 in the NCBI database.
  • Aga2p has no mutation in its amino acid sequence compared to the S288C protein encoded by the gene with GENE ID 852851 in the NCBI database.
  • the amino acid sequence of Aga2p comprises the amino acid sequence as shown in SEQ ID NO: 1, or a conservatively substituted variant of the amino acid sequence as shown in SEQ ID NO: 1, or an amino acid sequence having at least 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% sequence identity with the amino acid sequence as shown in SEQ ID NO: 1.
  • the amino acid sequence of Aga2p is as shown in SEQ ID NO: 1.
  • the single-chain domain is a peptide segment of 9 to 30 (e.g., 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29) amino acids in length, and comprises at least one contiguous register peptide (register) with a fixed length of 9 amino acids that can be specifically recognized by the complex, or a variant thereof that can be bound by the complex.
  • the amino acid sequence of the single-chain domain is as shown in any one of SEQ ID NO: 2-6.
  • the amino acid sequence of the single-chain domain is a synonymous mutant of the amino acid sequence shown in any one of SEQ ID NO: 2-6.
  • the amino acid sequence of the single-chain domain has at least 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% sequence identity with the amino acid sequence shown in any one of SEQ ID NO: 2-6.
  • introducing mutations into the α chain and/or the β chain can make the register peptide accommodated by the resulting MHC-II peptide-binding domain longer or shorter. Engineered cells containing such mutated α and/or β chains are also within the scope of the present application, and, correspondingly, the register peptide contained in the peptide chain may be longer or shorter than 9 amino acids, for example, 4, 5, 6, 7, 8, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20 amino acids.
  • the amino acid sequence of the register peptide or its variant is as shown in SEQ ID NO: 20-38.
  • the first protein binding domain is non-covalently bound to the second protein binding domain. In some embodiments, the first protein binding domain is covalently bound to the second protein binding domain.
  • the first protein binding domain and the second protein binding domain may be the same or different, and may be homologous proteins or non-homologous proteins. In some embodiments, the first protein binding domain and the second protein binding domain comprise fragments from the same antibody. In some embodiments, the first protein binding domain and the second protein binding domain may be bound to each other by any method for forming antibody homologous or heterologous dimers (or multimers).
  • the methods for forming antibody homodimers or heterodimers (or multimers) include, but are not limited to: the "knobs-into-holes" technology (see, e.g., U.S. Patent No. 5,731,168), engineered electrostatic steering technology (see, e.g., PCT Application No. WO 2009/089004A1), leucine zipper technology (see, e.g., Kostelny et al., J. Immunol., 148(5): 1547-1553, 1992), and those described in U.S. Patent No. 4,676,980 and Brennan et al., Science, 229: 81, 1985.
  • knobs-into-holes: see, e.g., U.S. Patent No. 5,731,168
  • engineered electrostatic steering technology: see, e.g., PCT Application No. WO 2009/089004A1
  • leucine zipper technology: see, e.g., Kostelny et al., J. Immunol.
  • the first protein binding domain and the second protein binding domain may contain mutations for the corresponding binding structure of the above method.
  • the first protein binding domain and the second protein binding domain are bound to each other via a disulfide bond.
  • the first protein binding domain is a first leucine zipper domain
  • the second protein binding domain is a second leucine zipper domain, wherein the first leucine zipper domain and the second leucine zipper domain that form a complete leucine zipper can be used interchangeably in the first fusion protein and the second fusion protein; or the first protein binding domain is an FcA domain, and the second protein binding domain is an FcB domain, wherein the FcA domain and the FcB domain that form an FcAB dimer can be used interchangeably in the first fusion protein and the second fusion protein.
  • when the first protein binding domain is the first leucine zipper domain, the second protein binding domain is the second leucine zipper domain, and when the first protein binding domain is the second leucine zipper domain, the second protein binding domain is the first leucine zipper domain; or, when the first protein binding domain is FcA, the second protein binding domain is FcB, and when the first protein binding domain is FcB, the second protein binding domain is FcA.
  • the first leucine zipper domain and the second leucine zipper domain are both acidic leucine domains (Acid zipper). In some embodiments, the first leucine zipper domain and the second leucine zipper domain are both basic leucine domains (Basic zipper). In some embodiments, the first leucine zipper domain is an acidic leucine domain (Acid zipper), and the second leucine zipper domain is a basic leucine domain (Basic zipper). In some embodiments, the first leucine zipper domain and the second leucine zipper domain are both c-Jun leucine zipper domains (Jun Zipper).
  • the first leucine zipper domain is Jun Zipper and the second leucine zipper domain is c-Fos leucine zipper domain (Fos Zipper).
  • the acidic leucine domain comprises an amino acid sequence as set forth in SEQ ID NO:7, or a conservatively substituted variant of the amino acid sequence as set forth in SEQ ID NO:7, or an amino acid sequence having at least 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity with the amino acid sequence as set forth in SEQ ID NO:7.
  • the amino acid sequence of the acidic leucine domain is an amino acid sequence as set forth in SEQ ID NO:7, or a conservatively substituted variant of the amino acid sequence as set forth in SEQ ID NO:7, or an amino acid sequence having at least 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity with the amino acid sequence as set forth in SEQ ID NO:7.
  • the basic leucine domain comprises an amino acid sequence as set forth in SEQ ID NO: 8, or a conservatively substituted variant of the amino acid sequence as set forth in SEQ ID NO: 8, or an amino acid sequence having at least 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% sequence identity with the amino acid sequence as set forth in SEQ ID NO: 8.
  • the amino acid sequence of the basic leucine domain is an amino acid sequence as set forth in SEQ ID NO: 8, or a conservatively substituted variant of the amino acid sequence as set forth in SEQ ID NO: 8, or an amino acid sequence having at least 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% sequence identity with the amino acid sequence as set forth in SEQ ID NO: 8.
  • the Fos Zipper comprises an amino acid sequence as shown in SEQ ID NO: 9, or a conservative substitution variant of the amino acid sequence as shown in SEQ ID NO: 9, or an amino acid sequence having at least 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% sequence identity with the amino acid sequence as shown in SEQ ID NO: 9.
  • the amino acid sequence of the Fos Zipper is an amino acid sequence as shown in SEQ ID NO: 9, or a conservative substitution variant of the amino acid sequence as shown in SEQ ID NO: 9, or an amino acid sequence having at least 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% sequence identity with the amino acid sequence as shown in SEQ ID NO: 9.
  • the Jun Zipper comprises an amino acid sequence as set forth in SEQ ID NO: 10, or a conservatively substituted variant of the amino acid sequence as set forth in SEQ ID NO: 10, or an amino acid sequence having at least 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% sequence identity with the amino acid sequence as set forth in SEQ ID NO: 10.
  • the amino acid sequence of the Jun Zipper is an amino acid sequence as set forth in SEQ ID NO: 10, or a conservatively substituted variant of the amino acid sequence as set forth in SEQ ID NO: 10, or an amino acid sequence having at least 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% sequence identity with the amino acid sequence as set forth in SEQ ID NO: 10.
  • the amino acid sequence of the first leucine zipper domain is shown in SEQ ID NO: 7, SEQ ID NO: 8, SEQ ID NO: 9 or SEQ ID NO: 10; and
  • the amino acid sequence of the second leucine zipper domain is shown in SEQ ID NO: 8, SEQ ID NO: 7, SEQ ID NO: 10 or SEQ ID NO: 9.
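The sequence-identity thresholds recited above ("at least 80%, 85%, 90% ... or 99%") can be illustrated with a minimal Python sketch. It assumes the two sequences have already been aligned to equal length. Identity is computed as identical positions divided by aligned columns, and columns that are gaps in both sequences are skipped. The patent does not specify an alignment method or denominator, so this definition is an assumption. The toy peptides are hypothetical and are not SEQ ID NO: 7-10.

```python
"""Percent identity of two pre-aligned sequences (illustrative sketch)."""

# Identity thresholds recited in the embodiments, in percent.
THRESHOLDS = [80, 85, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Return percent identity of two equal-length, pre-aligned sequences.

    A gap ('-') opposite a residue counts as a mismatch; columns that are
    gaps in both sequences are ignored. This is one common convention,
    assumed here for illustration.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = identical = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" and b == "-":
            continue  # empty column: no residue in either sequence
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("alignment contains no residues")
    return 100.0 * identical / compared


def thresholds_met(seq_a: str, seq_b: str) -> list[int]:
    """List the recited thresholds the pair satisfies ("at least" means >=)."""
    pid = percent_identity(seq_a, seq_b)
    return [t for t in THRESHOLDS if pid >= t]


if __name__ == "__main__":
    # Hypothetical 10-residue toy peptides differing at one position.
    print(percent_identity("LEIEAAFLEK", "LEIEAAFLER"))  # 90.0
    print(thresholds_met("LEIEAAFLEK", "LEIEAAFLER"))    # [80, 85, 90]
```

Real comparisons would first align the sequences, for example with a global alignment tool. The choice of denominator can shift borderline cases across a threshold.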
  • the first protein binding domain and the second protein binding domain both comprise the Fc domain of an immunoglobulin.
  • after papain digestion, an immunoglobulin molecule is cleaved into Fc and Fab fragments, wherein the Fc fragment includes the portion of the heavy-chain constant region other than the CH1 domain.
  • the term "Fc domain" refers to a monomer, which may be any portion of the constant region of any immunoglobulin heavy chain other than the CH1 domain.
  • the FcA and FcB may be derived from the Fc segment of the same immunoglobulin, or from the Fc segment of different immunoglobulins.
  • the immunoglobulins from which FcA and FcB are derived may be selected from: IgG, IgE, IgM, IgD and IgA. In some embodiments, the immunoglobulins from which FcA and FcB are derived may be selected from: IgG1, IgG2, IgG3, IgG4, IgA1 and IgA2.
  • the FcA and FcB are selected from the group consisting of amino acid sequences as set forth in SEQ ID NOs: 11-19, conservative substitution variants of the amino acid sequences as set forth in SEQ ID NOs: 11-19, and amino acid sequences having at least 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% sequence identity with the amino acid sequences as set forth in SEQ ID NOs: 11-19.
  • the amino acid sequences of FcA and FcB are both as set forth in SEQ ID NO: 11, or are conservative substitution variants of the amino acid sequence as set forth in SEQ ID NO: 11, or have at least 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% sequence identity with the amino acid sequence as set forth in SEQ ID NO: 11.
  • the amino acid sequence of FcA is as shown in SEQ ID NO: 12, or is a conservatively substituted variant of the amino acid sequence as shown in SEQ ID NO: 12, or has at least 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% sequence identity with the amino acid sequence as shown in SEQ ID NO: 12.
  • the amino acid sequence of FcB is as shown in SEQ ID NO: 13, or is a conservative substitution variant of the amino acid sequence as shown in SEQ ID NO: 13, or has at least 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% sequence identity with the amino acid sequence as shown in SEQ ID NO: 13.
  • the amino acid sequence of FcA is as shown in SEQ ID NO: 13, or is a conservative substitution variant of the amino acid sequence as shown in SEQ ID NO: 13, or has at least 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% sequence identity with the amino acid sequence as shown in SEQ ID NO: 13; and the amino acid sequence of FcB is as shown in SEQ ID NO: 12, or is a conservative substitution variant of the amino acid sequence as shown in SEQ ID NO: 12, or has at least 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% sequence identity with the amino acid sequence as shown in SEQ ID NO: 12.
  • the amino acid sequence of FcA is as shown in SEQ ID NO:14, or is a conservatively substituted variant of the amino acid sequence as shown in SEQ ID NO:14, or has at least 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% sequence identity with the amino acid sequence as shown in SEQ ID NO:14; and the amino acid sequence of FcB is as shown in SEQ ID NO:15, or is a conservatively substituted variant of the amino acid sequence as shown in SEQ ID NO:15, or has at least 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% sequence identity with the amino acid sequence as shown in SEQ ID NO:15.
  • the amino acid sequence of FcA is as shown in SEQ ID NO:15, or is a conservative substitution variant of the amino acid sequence as shown in SEQ ID NO:15, or has at least 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% sequence identity with the amino acid sequence as shown in SEQ ID NO:15; and the amino acid sequence of FcB is as shown in SEQ ID NO:14, or is a conservative substitution variant of the amino acid sequence as shown in SEQ ID NO:14, or has at least 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% sequence identity with the amino acid sequence as shown in SEQ ID NO:14.
  • the amino acid sequence of FcA is as shown in SEQ ID NO:16, or is a conservative substitution variant of the amino acid sequence as shown in SEQ ID NO:16, or has at least 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% sequence identity with the amino acid sequence as shown in SEQ ID NO:16; and the amino acid sequence of FcB is as shown in SEQ ID NO:17, or is a conservative substitution variant of the amino acid sequence as shown in SEQ ID NO:17, or has at least 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% sequence identity with the amino acid sequence as shown in SEQ ID NO:17.
  • the amino acid sequence of FcA is as shown in SEQ ID NO:17, or is a conservative substitution variant of the amino acid sequence as shown in SEQ ID NO:17, or has at least 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% sequence identity with the amino acid sequence as shown in SEQ ID NO:17; and the amino acid sequence of FcB is as shown in SEQ ID NO:16, or is a conservative substitution variant of the amino acid sequence as shown in SEQ ID NO:16, or has at least 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% sequence identity with the amino acid sequence as shown in SEQ ID NO:16.
  • the amino acid sequence of FcA is as shown in SEQ ID NO: 18, or is a conservatively substituted variant of the amino acid sequence as shown in SEQ ID NO: 18, or has at least 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% sequence identity with the amino acid sequence as shown in SEQ ID NO: 18; and the amino acid sequence of FcB is as shown in SEQ ID NO: 19, or is a conservatively substituted variant of the amino acid sequence as shown in SEQ ID NO: 19, or has at least 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% sequence identity with the amino acid sequence as shown in SEQ ID NO: 19.
  • the amino acid sequence of FcA is as shown in SEQ ID NO:19, or is a conservative substitution variant of the amino acid sequence as shown in SEQ ID NO:19, or has at least 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% sequence identity with the amino acid sequence as shown in SEQ ID NO:19; and the amino acid sequence of FcB is as shown in SEQ ID NO:18, or is a conservative substitution variant of the amino acid sequence as shown in SEQ ID NO:18, or has at least 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% sequence identity with the amino acid sequence as shown in SEQ ID NO:18.
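When read as pairs, the FcA/FcB embodiments above list each heterodimer in both orientations. That symmetry is what "can be used interchangeably" implies. The small Python sketch below encodes the (FcA, FcB) SEQ ID NO combinations and checks the symmetry. Reading SEQ ID NO: 12 with 13 (and 13 with 12) as pairs is an assumption drawn from the surrounding pattern. The sketch handles only SEQ ID numbers, not the sequences themselves.

```python
"""Check that FcA/FcB SEQ ID NO pairings are interchangeable (sketch)."""

# (FcA SEQ ID NO, FcB SEQ ID NO) combinations, read from the embodiments.
FC_PAIRS = [
    (11, 11),
    (12, 13), (13, 12),
    (14, 15), (15, 14),
    (16, 17), (17, 16),
    (18, 19), (19, 18),
]


def is_interchangeable(pairs: list[tuple[int, int]]) -> bool:
    """True if every (FcA, FcB) pair also occurs with FcA and FcB swapped."""
    pair_set = set(pairs)
    return all((fcb, fca) in pair_set for fca, fcb in pairs)


def partner_of(fca_seq_id: int, pairs: list[tuple[int, int]] = FC_PAIRS):
    """Return the FcB SEQ ID NO paired with a given FcA SEQ ID NO, or None."""
    for fca, fcb in pairs:
        if fca == fca_seq_id:
            return fcb
    return None


if __name__ == "__main__":
    print(is_interchangeable(FC_PAIRS))  # True
    print(partner_of(14))                # 15
```

The homodimer entry (11, 11) is trivially symmetric. Every other entry is a heterodimer whose swapped form must also be listed.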
  • the FcA domain and the FcB domain can increase the displayed expression level and/or copy number of the complex.
  • the FcA domain includes only the first CH3 domain, and the FcB domain includes only the second CH3 domain; wherein the first CH3 domain and the second CH3 domain can be from the same immunoglobulin subtype or different immunoglobulin subtypes.
  • the FcA domain includes both the first CH2 domain and the first CH3 domain, and the FcB domain includes both the second CH2 domain and the second CH3 domain.
  • the first CH3 domain and the second CH3 domain may be from the same immunoglobulin subtype or different immunoglobulin subtypes
  • the first CH2 domain and the second CH2 domain may be from the same immunoglobulin subtype or different immunoglobulin subtypes
  • the first CH2 domain and the first CH3 domain may be from the same immunoglobulin subtype or different immunoglobulin subtypes
  • the second CH2 domain and the second CH3 domain may be from the same immunoglobulin subtype or different immunoglobulin subtypes.
  • the FcA domain may further include a first CH4 domain, and the FcB domain may further include a second CH4 domain.
  • the first CH2 domain and the first CH3 domain of the first protein binding domain are connected directly, or indirectly through a linker.
  • the second CH2 domain and the second CH3 domain of the second protein binding domain are connected directly, or indirectly through a linker.
  • the first CH2 domain and the first CH3 domain of the first protein binding domain are indirectly connected through a linker, and the second CH2 domain and the second CH3 domain of the second protein binding domain are indirectly connected through a linker.
  • the linker is a cleavable or non-cleavable linker.
  • the linker is a short peptide or a connecting peptide
  • the connecting peptide can be a flexible connecting peptide or a rigid connecting peptide.
  • linker peptides of different lengths or functions can be selected. For their types, functions and characteristics, see, for example, Reddy Chichili VP, Kumar V, Sivaraman J. Linkers in the structural biology of protein-protein interactions. Protein Sci. 2013; 22(2): 153-167. doi: 10.1002/pro.2206.
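Fusing CH2 and CH3 can be done directly or through a flexible or rigid connecting peptide. The following Python sketch illustrates both options using two widely used linker types: the flexible (GGGGS)n linker and the rigid, helix-forming A(EAAAK)nA linker. The patent does not name specific linker sequences. Both linker choices and the short placeholder domain strings are therefore assumptions for illustration only.

```python
"""Assemble CH2-linker-CH3 fusions with example linkers (illustrative sketch)."""


def flexible_linker(repeats: int) -> str:
    """Return a (GGGGS)n linker, a commonly used flexible peptide linker."""
    if repeats < 1:
        raise ValueError("repeats must be >= 1")
    return "GGGGS" * repeats


def rigid_linker(repeats: int) -> str:
    """Return an A(EAAAK)nA linker, a commonly used rigid, helix-forming linker."""
    if repeats < 1:
        raise ValueError("repeats must be >= 1")
    return "A" + "EAAAK" * repeats + "A"


def join_domains(ch2: str, ch3: str, linker: str = "") -> str:
    """Fuse CH2 to CH3 directly (empty linker) or indirectly through a linker."""
    return ch2 + linker + ch3


if __name__ == "__main__":
    ch2, ch3 = "KVSNK", "GQPRE"  # hypothetical placeholder fragments, not real domains
    print(join_domains(ch2, ch3))                      # KVSNKGQPRE
    print(join_domains(ch2, ch3, flexible_linker(2)))  # KVSNKGGGGSGGGGSGQPRE
```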
  • either or both of the FcA domain and the FcB domain comprise one or more amino acid modifications that can (1) enhance or stabilize the FcA domain and/or the FcB domain itself ("FcS"), and/or (2) enhance or stabilize the covalent or non-covalent binding between the FcA domain and the FcB domain ("FcM"), and/or (3) reduce or prevent possible mispairings of the α chain and β chain of claim 1 caused by mispairing of the FcA domain and the FcB domain ("FcN").
  • the first protein binding domain is attached to the C-terminus of the α chain.
  • the second protein binding domain is attached to the C-terminus of the β chain.
  • the signal peptide that directs expression of the first fusion protein and the second fusion protein is SPP SP or AGA2 SP; further preferably, the amino acid sequence of AGA2 SP is as shown in SEQ ID NO: 39, and the amino acid sequence of SPP SP is as shown in SEQ ID NO: 45.
  • the complex is an MHC class I molecule or an MHC class II molecule. In some embodiments, the complex is an HLA class I molecule or an HLA class II molecule.
  • the α chain is encoded by HLA-DRA*01 or another allele of the HLA-DRA family, and the β chain is encoded by HLA-DRB1*01, HLA-DRB1*03, HLA-DRB1*04, HLA-DRB1*15 or another allele of the HLA-DRB family; or
  • the α chain is encoded by HLA-DQA1*01, HLA-DQA1*03, HLA-DQA1*05 or another allele of the HLA-DQA1 family, and the β chain is encoded by HLA-DQB1*02, HLA-DQB1*03, HLA-DQB1*05, HLA-DQB1*06 or another allele of the HLA-DQB1 family; or
  • the α chain is encoded by HLA-DPA1*01:03, HLA-DPA1*02:02 or another allele of the HLA-DPA1 family, and the β chain is encoded by HLA-DPB1*01:01, HLA-DPB1*02:01, HLA-DPB1*04:01, HLA-DPB1*04:02 or another allele of the HLA-DPB1 family.
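The allele names above follow the WHO HLA nomenclature. In that scheme, the gene name precedes an asterisk and colon-separated fields follow: the first field is the allele group and the second is the specific HLA protein. The Python sketch below parses such names and infers whether a class II gene encodes the α or β chain from the A/B letter after DR, DQ or DP. It assumes only names of the form shown in this section and ignores suffixes such as expression-status letters.

```python
"""Parse HLA class II allele names such as HLA-DPB1*04:01 (sketch)."""
import re

# Gene, then '*', then one or more colon-separated numeric fields.
_HLA_NAME = re.compile(r"^HLA-(?P<gene>[A-Z0-9]+)\*(?P<fields>\d{2,}(?::\d{2,})*)$")


def parse_hla(name: str) -> dict:
    """Split an HLA allele name into gene, allele group and protein field."""
    match = _HLA_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognized HLA allele name: {name!r}")
    fields = match.group("fields").split(":")
    return {
        "gene": match.group("gene"),
        "allele_group": fields[0],
        "protein": fields[1] if len(fields) > 1 else None,
    }


def class_ii_chain(gene: str) -> str:
    """Return 'alpha' or 'beta' for DR/DQ/DP genes (e.g. DRA, DQB1)."""
    if gene[:2] in ("DR", "DQ", "DP") and len(gene) > 2:
        if gene[2] == "A":
            return "alpha"
        if gene[2] == "B":
            return "beta"
    raise ValueError(f"not an HLA class II alpha/beta gene: {gene!r}")


if __name__ == "__main__":
    for allele in ("HLA-DRA*01", "HLA-DRB1*15", "HLA-DPB1*04:01"):
        info = parse_hla(allele)
        print(allele, info["gene"], class_ii_chain(info["gene"]), info["protein"])
```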
  • an engineered nucleic acid molecule encoding a major histocompatibility complex (MHC), wherein the complex comprises an α chain and a β chain, the α chain is connected to a first protein binding domain, and the β chain is connected to a second protein binding domain; wherein the α chain and the β chain are non-covalently bound to a single-chain domain, and the first protein binding domain binds to the second protein binding domain to enhance or promote formation of a trimer composed of the complex and the single-chain domain.
  • the present application also provides a nucleic acid molecule encoding the trimer.
  • the trimer is a trimer composed of the MHC and the single-chain domain contained in the engineered cell described in the first aspect of the present application.
  • the MHC is the MHC contained in the engineered cell described in the first aspect of the present application.
  • the engineered nucleic acid molecule is an engineered DNA molecule.
  • the DNA molecule can be replicated and/or expressed in a cell.
  • the DNA molecule can be replicated and/or expressed in a eukaryotic cell.
  • the DNA molecule can be replicated and/or expressed in a prokaryotic cell.
  • the DNA molecule can be expressed in a eukaryotic cell and can be replicated in a prokaryotic cell. Therefore, in addition to a nucleic acid fragment encoding the MHC or the trimer, the DNA molecule also comprises genetic manipulation or regulatory elements for replication and/or expression in prokaryotic and/or eukaryotic cells.
  • the engineered DNA molecule further comprises a marker gene or its fragment and/or a reporter gene or its fragment and a unique restriction endonuclease site that allows insertion of DNA elements, preferably a restriction endonuclease site in the form of a multiple cloning site (MCS).
  • the marker gene facilitates identification of cells containing a plasmid that carries it and can be, for example, an antibiotic resistance gene.
  • Each restriction endonuclease site in the MCS can be specifically recognized by different restriction endonucleases.
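A multiple cloning site is useful only if each of its restriction sites occurs once in the vector. A quick check is therefore to count every recognition site on both strands. The Python sketch below assumes a circular plasmid supplied as a plain sequence string. It uses four well-known recognition sequences; the enzyme list and the toy sequence are illustrative assumptions.

```python
"""Find restriction sites that occur exactly once in a plasmid (sketch)."""

_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G", "N": "N"}

# Recognition sequences (5'->3') of a few common enzymes.
ENZYMES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
    "XhoI": "CTCGAG",
}


def reverse_complement(seq: str) -> str:
    return "".join(_COMPLEMENT[base] for base in reversed(seq.upper()))


def count_site(seq: str, site: str, circular: bool = True) -> int:
    """Count occurrences of a recognition site on either strand of seq."""
    seq, site = seq.upper(), site.upper()
    # For a circular plasmid, append the first len(site)-1 bases so that
    # sites spanning the end/start junction are also found.
    search = seq + seq[: len(site) - 1] if circular else seq
    hits = set()
    for motif in {site, reverse_complement(site)}:  # set: palindromes counted once
        start = search.find(motif)
        while start != -1:
            hits.add((start, motif))
            start = search.find(motif, start + 1)
    return len(hits)


def unique_sites(seq: str, enzymes: dict[str, str], circular: bool = True) -> list[str]:
    """Names of enzymes whose site occurs exactly once (usable in an MCS)."""
    return [name for name, site in enzymes.items() if count_site(seq, site, circular) == 1]


if __name__ == "__main__":
    toy = "GAATTCAAGGATCCTTGGATCC"  # hypothetical sequence
    print(unique_sites(toy, ENZYMES))  # ['EcoRI']
```

In the toy sequence, BamHI occurs twice and HindIII and XhoI are absent, so only EcoRI qualifies as a unique site.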
  • the DNA molecule is a DNA plasmid.
  • a DNA plasmid refers to a plasmid composed of double-stranded DNA.
  • the "plasmid" is a circular DNA molecule.
  • the "plasmid" can also encompass linear DNA molecules.
  • the term "plasmid" also encompasses linear molecules obtained by, for example, cutting (linearizing) a circular plasmid with a restriction endonuclease, as well as linear molecules that can be replicated in prokaryotes.
  • plasmids can replicate, i.e., be amplified, in cells independently of the genomic genetic information stored in the nucleoid of prokaryotes, and can be used for cloning, i.e., for amplifying genetic information in bacterial cells.
  • the DNA plasmid according to the present invention is a medium-copy or high-copy plasmid, more preferably a high-copy plasmid.
  • high-copy plasmids are vectors based on pUC or pTZ plasmids, or on any other plasmid (e.g., pMB1, pColE1) that contains an origin of replication (ORI) supporting a high plasmid copy number.
  • the engineered DNA molecule is a DNA molecule, or a fragment thereof, that constitutes a prokaryotic nucleoid or a eukaryotic genome; that is, the sequence comprising the coding sequence of the aforementioned MHC or trimer, or its complementary sequence, can be replicated along with the host genome.
  • the engineered DNA molecule can be transcribed into mRNA.
  • the engineered DNA molecule also comprises a coding sequence of an element that can be used to start or regulate the expression of the protein, polypeptide or its fragment after transcription, and the element includes but is not limited to 5'UTR, 3'UTR, poly (A) tail (or tailing signal), etc.
  • the engineered DNA molecule comprises a coding sequence of at least one untranslated region (UTR).
  • the engineered DNA molecule comprises at least a coding sequence of 5'UTR and a coding sequence of the protein, polypeptide or its fragment.
  • the 5'UTR usually contains at least one ribosome binding site (RBS), such as the Shine-Dalgarno sequence in prokaryotes, or at least one translation initiation site, such as the Kozak sequence in eukaryotes.
  • the RBS promotes efficient and accurate translation of the mRNA molecule by recruiting ribosomes at the translation start site. Its activity can be optimized by changing the length and sequence of a given RBS or translation initiation site and its distance from the start codon.
  • the 5'UTR includes an internal ribosome entry site (IRES).
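In eukaryotes, the Kozak context around the start codon is often summarized by two key positions: a purine (A or G) at -3 and a G at +4, counting the A of AUG as +1. The Python sketch below scores a start codon on those two positions and lists ATG/AUG triplets that occur upstream of the main start codon within the 5'UTR. The "strong/adequate/weak" labels follow Kozak's usual terminology, but the two-position score is a simplification. The consensus was defined mainly for vertebrate mRNAs, while yeast favors an A-rich context, so treat the score as a rough guide.

```python
"""Score start-codon context and find upstream ATGs (illustrative sketch)."""


def _as_dna(seq: str) -> str:
    return seq.upper().replace("U", "T")


def kozak_strength(seq: str, atg_index: int) -> str:
    """Classify the context of the ATG/AUG starting at atg_index (0-based).

    Position +1 is the A of the start codon, so -3 is atg_index - 3 and
    +4 is atg_index + 3. A purine at -3 and a G at +4 each add one point.
    """
    dna = _as_dna(seq)
    if dna[atg_index:atg_index + 3] != "ATG":
        raise ValueError("no start codon at atg_index")
    minus3 = dna[atg_index - 3] if atg_index >= 3 else ""
    plus4 = dna[atg_index + 3] if atg_index + 3 < len(dna) else ""
    score = int(minus3 in ("A", "G")) + int(plus4 == "G")
    return ("weak", "adequate", "strong")[score]


def upstream_atgs(seq: str, cds_start: int) -> list[int]:
    """0-based positions of ATG/AUG triplets starting before the main start codon."""
    dna = _as_dna(seq)
    return [i for i in range(cds_start) if dna[i:i + 3] == "ATG"]


if __name__ == "__main__":
    mrna = "AAUGCCGCCACCAUGG"  # hypothetical 5'UTR plus start codon
    print(upstream_atgs(mrna, 12))   # [1]
    print(kozak_strength(mrna, 12))  # strong
```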
  • the 3'UTR may contain one or more regulatory sequences, such as binding sites for proteins that enhance the stability of the mRNA molecule, binding sites for regulatory RNA molecules (such as miRNA molecules), and/or signal sequences involved in the intracellular transport of the mRNA molecule.
  • the target gene fragment further comprises one or more additional regulatory sequences, such as binding sites for proteins that enhance the stability of the mRNA molecule, binding sites for proteins that enhance translation of the mRNA molecule, regulatory elements (such as riboswitches), and/or nucleotide sequences that have a positive impact on translation initiation.
  • within the 5'UTR, there is preferably no functional upstream open reading frame, out-of-frame upstream translation initiation site, out-of-frame upstream start codon, and/or nucleotide sequence that produces a secondary structure that reduces or
  • the coding sequence of the MHC or trimer comprises codons that can be translated into an amino acid sequence. All of the codons in the coding sequence may be naturally occurring codons encoding amino acids, or some or all of them may be artificially synthesized codons. In some embodiments, some or all of the codons are codon-optimized. In some embodiments, some or all of the codons encode unnatural amino acids.
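Codon optimization is commonly done by back-translating a protein with codons preferred by the expression host. The Python sketch below does this with a small, hypothetical preferred-codon table. Each entry is a valid codon for its amino acid in the standard genetic code, but the table is not a measured yeast codon-usage table. The sketch also confirms that the result translates back to the input protein. Real optimization tools additionally weigh GC content, repeats and mRNA secondary structure.

```python
"""Back-translate a protein with a preferred-codon table (illustrative sketch)."""

# Hypothetical one-codon-per-residue table; each codon is correct for its
# amino acid in the standard code, but the choices are illustrative only.
PREFERRED_CODON = {
    "M": "ATG", "W": "TGG", "K": "AAA", "E": "GAA", "L": "TTG",
    "G": "GGT", "S": "TCT", "A": "GCT", "*": "TAA",
}
# Reverse lookup used to check the back-translation.
CODON_TO_AA = {codon: aa for aa, codon in PREFERRED_CODON.items()}


def back_translate(protein: str) -> str:
    """Return a DNA coding sequence using one preferred codon per residue."""
    try:
        return "".join(PREFERRED_CODON[aa] for aa in protein.upper())
    except KeyError as err:
        raise ValueError(f"no codon defined for residue {err.args[0]!r}") from None


def translate(dna: str) -> str:
    """Translate DNA built only from codons present in the table."""
    if len(dna) % 3:
        raise ValueError("length is not a multiple of 3")
    return "".join(CODON_TO_AA[dna[i:i + 3]] for i in range(0, len(dna), 3))


if __name__ == "__main__":
    cds = back_translate("MKLEGS*")
    print(cds)  # ATGAAATTGGAAGGTTCTTAA
    assert translate(cds) == "MKLEGS*"
```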
  • the engineered DNA molecule further comprises, on the 5' side of the target gene fragment, a structural element necessary for initiating or regulating transcription of the RNA; such structural elements are known in the art.
  • the structural element comprises at least a promoter. Promoters and their sequences are known in the art, including weak promoters, medium strength promoters, strong promoters, mini promoters or core promoters, etc. In some specific embodiments, the promoter is a strong promoter.
  • the promoter can initiate transcription of the coding sequence of the MHC or the trimer in prokaryotes. In some embodiments, the promoter can initiate transcription of the coding sequence of the MHC or the trimer in eukaryotic cells.
  • the "promoter" comprises at least one transcription recognition site followed by a transcription factor binding site.
  • the recognition site and the binding site can interact with proteins that mediate or regulate transcription. The binding site is located closer to the aforementioned target gene fragment than the recognition site is.
  • the binding site can be, for example, a Pribnow box in prokaryotes or a TATA box in eukaryotes.
  • when the Pribnow box is used, the transcription recognition site can be located about 35 bp upstream of the transcription start site, and the transcription factor binding site can be located about 10 bp upstream of the transcription start site.
  • the promoter comprises at least one other regulatory element, such as an AT-rich upstream element located about 40 and/or 60 nucleotides upstream of the transcription start site, and/or an additional regulatory element between the recognition site and the binding site that enhances promoter activity.
  • the promoter is a strong promoter, i.e., the promoter comprises a sequence that promotes the transcription of the aforementioned RNA coding sequence. Strong promoters are known to those skilled in the art, such as OXB18, OXB19 and OXB20 promoters derived from the RecA promoter of Escherichia coli, or can be identified or synthesized by conventional laboratory procedures.
  • the promoter is a T7 promoter.
  • the promoter is further preceded by other regulatory elements, such as an enhancer that can promote transcription of the aforementioned RNA coding sequence in a DNA plasmid.
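The arrangement of a recognition site about 35 bp and a binding site about 10 bp upstream of the transcription start site can be illustrated for a bacterial promoter. The Python sketch below searches upstream of a known start site for the best pair of hexamers. Each candidate pair is scored by mismatches to the E. coli σ70 consensus sequences TTGACA (-35) and TATAAT (-10), with a 15-19 bp spacer. The consensus sequences and spacer range are textbook values. The assumed separation of 5 to 9 nt between the -10 box and the start site, and the toy sequence, are illustrative assumptions.

```python
"""Locate candidate -35/-10 promoter boxes upstream of a TSS (sketch)."""

CONSENSUS_35 = "TTGACA"
CONSENSUS_10 = "TATAAT"


def mismatches(a: str, b: str) -> int:
    """Hamming distance between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def best_boxes(seq: str, tss_index: int, spacers=range(15, 20), gaps=range(5, 10)):
    """Return (score, start_35, start_10) for the best-matching box pair, or None.

    tss_index is the 0-based index of the +1 nucleotide. `gap` is the number
    of nucleotides between the -10 box and the TSS; `spacer` is the number
    of nucleotides between the two hexamers. Lower score is better.
    """
    seq = seq.upper()
    best = None
    for gap in gaps:
        start_10 = tss_index - gap - 6
        if start_10 < 0:
            continue
        box_10 = seq[start_10:start_10 + 6]
        for spacer in spacers:
            start_35 = start_10 - spacer - 6
            if start_35 < 0:
                continue
            box_35 = seq[start_35:start_35 + 6]
            score = mismatches(box_35, CONSENSUS_35) + mismatches(box_10, CONSENSUS_10)
            if best is None or score < best[0]:
                best = (score, start_35, start_10)
    return best


if __name__ == "__main__":
    toy = "TTGACA" + "G" * 17 + "TATAAT" + "C" * 7 + "AGG"  # TSS 'A' at index 36
    print(best_boxes(toy, 36))  # (0, 0, 23)
```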
  • the eukaryotic cell is a yeast cell.
  • the DNA molecule is a yeast display vector.
  • the present application also provides an RNA molecule encoding the above-mentioned MHC.
  • the RNA molecule is transcribed from the above-mentioned engineered DNA molecule.
  • the engineered nucleic acid molecule may also be a hybrid molecule of DNA and RNA.
  • the present application also covers any cells comprising the above-mentioned engineered nucleic acid molecules.
  • the present application also relates to a method for preparing a major histocompatibility complex (MHC), or a trimer formed by non-covalent binding of the MHC and a single-chain domain, the method comprising transforming a cell with the engineered nucleic acid molecule of the second aspect and culturing the cell under conditions that allow expression of the complex or the trimer.
  • the cell is a yeast cell, such as a Saccharomyces cerevisiae cell.
  • the cell is a yeast cell and the engineered nucleic acid molecule is a yeast display vector.
  • the cell belongs to the yeast strain EBY100.
  • the present application also relates to a method for identifying a peptide that binds to a major histocompatibility complex (MHC), the method comprising:
  • the single-chain domain comprises a peptide, or a mutant of a peptide, or a library of mutants of a peptide, or a mixture of peptides.
  • the peptide, or a mutant of a peptide, or a library of mutants of a peptide, or a mixture of peptides non-covalently binds to the complex.
  • the peptide is 9 to 30 (e.g., 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29) amino acids in length and comprises at least one contiguous register of fixed length 9 amino acids that can be specifically recognized by the complex, or a variant thereof that can be bound by the complex.
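A peptide of 9 to 30 residues contains (length - 9 + 1) overlapping 9-residue windows, and each window is a candidate register for binding by the complex. The Python sketch below enumerates these windows. Treating every contiguous 9-mer as a candidate core is an assumption for illustration, since MHC-II binding also depends on flanking residues. The 13-residue peptide is a toy input.

```python
"""Enumerate candidate 9-residue binding registers in a peptide (sketch)."""

CORE_LEN = 9


def core_registers(peptide: str, min_len: int = 9, max_len: int = 30) -> list[tuple[int, str]]:
    """Return (offset, 9-mer) for every contiguous 9-residue window.

    The default 9-30 residue limits mirror the length range stated in the text.
    """
    peptide = peptide.upper()
    if not min_len <= len(peptide) <= max_len:
        raise ValueError(f"peptide length {len(peptide)} outside {min_len}-{max_len}")
    return [(i, peptide[i:i + CORE_LEN]) for i in range(len(peptide) - CORE_LEN + 1)]


if __name__ == "__main__":
    for offset, core in core_registers("PKYVKQNTLKLAT"):
        print(offset, core)
```

Each window could then be scored or tested experimentally. The sketch only enumerates candidates and makes no prediction about binding.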
  • the inventors expressed human MHC alleles as native, non-covalent α/β dimers on the surface of yeast cells, allowing rapid flow cytometry-based screening of peptide ligands directly from selected antigens and improving the sensitivity and accuracy of peptide/MHC affinity assays.
  • the present application establishes the RIPAAH (Rapid Identification of Peptide Antigen Associated by HLA-II) method and related cells and nucleic acids, which use a yeast display platform to eliminate labor-intensive expression/purification steps.
  • as a unicellular eukaryotic organism, the yeast cell combines the rapid and easy cloning characteristics of Escherichia coli with post-translational modification machinery similar to that of mammalian or insect cells.
  • the method connects one chain (α or β) of a given MHC-II allele to a yeast surface protein and allows the other chain (β or α) to be secreted by the same yeast cell as a soluble component.
  • the C-termini of the α and β chains can be modified to carry the amino acid sequence of the crystallizable fragment of the antibody heavy chain (FcAB), which promotes pairing of the soluble secreted chain with the other, surface-anchored chain.
  • This method not only effectively improves the sensitivity and accuracy of determining the affinity between a target peptide and the yeast-displayed MHC-II protein, but also relies on competitive binding of a test peptide and a reference peptide to the yeast-displayed MHC-II protein to achieve rapid, efficient, high-throughput determination.
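One way to read out competitive binding uses a titration of the test peptide against a fixed amount of labeled reference peptide. The reference signal (for example, flow cytometry fluorescence) is normalized between background and the no-competitor maximum. The concentration that gives 50% inhibition is then taken as an IC50. The Python sketch below estimates this value by linear interpolation on log10 concentration. The readout, the normalization and the data are assumptions for illustration, not the patent's protocol.

```python
"""Estimate IC50 from a competitive binding titration (illustrative sketch)."""
import math


def normalized_signal(signal: float, background: float, maximum: float) -> float:
    """Scale a raw signal so background maps to 0 and the no-competitor maximum to 1."""
    return (signal - background) / (maximum - background)


def estimate_ic50(concs: list[float], fractions: list[float]):
    """Concentration at which the normalized reference signal crosses 0.5.

    Interpolates linearly in log10(concentration) between the two bracketing
    points and returns None if the curve never crosses 0.5. All
    concentrations must be > 0.
    """
    points = sorted(zip(concs, fractions))
    for (c1, f1), (c2, f2) in zip(points, points[1:]):
        if f1 >= 0.5 >= f2 and f1 != f2:
            x1, x2 = math.log10(c1), math.log10(c2)
            x = x1 + (f1 - 0.5) * (x2 - x1) / (f1 - f2)
            return 10 ** x
    return None


if __name__ == "__main__":
    concs = [0.1, 1.0, 10.0, 100.0]     # hypothetical competitor concentrations
    fractions = [0.95, 0.8, 0.2, 0.05]  # hypothetical normalized reference signal
    print(round(estimate_ic50(concs, fractions), 3))  # 3.162
```

Fitting a full dose-response model, such as a four-parameter logistic curve, is a more robust alternative when enough titration points are available.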
  • the inventors can more effectively create yeast cell clone libraries, so that different human MHC-II alleles will be expressed in their natural form on the surface of different yeast clones for TCR epitope peptide screening.
  • the emergence of this method will provide an effective solution for the rapid development of new vaccines and cell therapies based on T cell immunity.
  • high-throughput, high-content single-cell determination of MHC-II/peptide ligand affinity on the yeast surface will promote artificial intelligence computational methods for discovering T cell epitope maps.
  • the present application provides a cell comprising a major histocompatibility complex (MHC), wherein the complex comprises an α chain and a β chain, wherein:
  • the α chain is linked to the first protein binding domain to form a first fusion protein;
  • the β chain is connected to a second protein binding domain to form a second fusion protein;
  • the first fusion protein binds to the second fusion protein to form the complex; and
  • the first protein binding domain and the second protein binding domain are used to enhance the binding of the α chain and the β chain to form the MHC.
  • the cells are engineered cells.
  • the cell is a eukaryotic cell. In some embodiments, the cell is a cell derived from a unicellular organism. In some embodiments, the cell is derived from a unicellular eukaryotic organism. In some embodiments, the cell is a yeast cell. In some embodiments, the yeast is brewer's yeast (Saccharomyces cerevisiae). In some embodiments, the yeast belongs to strain EBY100 (Boder & Wittrup, 1997).
  • the complex is bound to the surface of the cell by connecting the α chain or the β chain to a molecule on the cell surface.
  • the other chain of the complex (the β chain or the α chain) is held on the cell surface through non-covalent binding to the α chain or the β chain that is connected to the cell-surface molecule. That is, α and β chains synthesized by different cells can bind to each other.
  • the α chain synthesized by one cell can bind to the β chain synthesized by another cell, and the β chain synthesized by the one cell can also bind to the α chain synthesized by another cell, so that a complete MHC can be present on the cell membrane.
  • at least one of the α chain and the β chain, or more precisely, at least one of the first fusion protein and the second fusion protein, is secretable.
  • the molecule can be any molecule that allows the first protein binding domain and/or the second protein binding domain to bind to the cell surface. In some embodiments, the molecule and the first protein binding domain and/or the second protein binding domain are mutually bound by covalent or non-covalent bonds. In some embodiments, the molecule is a protein. In some embodiments, the molecule is a protein, and the first protein binding domain and/or the second protein binding domain are bound to the molecule by a peptide bond. In some embodiments, the molecule comprises a structure that can anchor the yeast cell wall. In some embodiments, the molecule comprises a structure that can anchor the yeast cell wall and a secretion signal peptide (or secretion signal region).
  • the structure that can anchor the yeast cell wall can be bound, directly or indirectly, to a cell wall glucan or a cell wall mannan. In some embodiments, the structure that can anchor the yeast cell wall can be covalently or non-covalently bound, directly or indirectly, to a cell wall glucan or a cell wall mannan. In some embodiments, the structure that can anchor the yeast cell wall can be covalently bound, directly or indirectly, to a cell wall glucan or a cell wall mannan. In some embodiments, the structure that can anchor the yeast cell wall can be non-covalently bound, directly or indirectly, to a cell wall glucan or a cell wall mannan.
  • the structure and/or secretion signal peptide that can anchor the yeast cell wall is derived from the ⁇ -condensin system and/or the flocculin system (glycosylphosphatidylinositol anchor system and flocculin domain anchor system).
  • the structure that can anchor the yeast cell wall is directly or indirectly combined with or connected to the GPI anchor attachment signal region or the flocculation functional region.
  • the molecule is endogenous to the cell.
  • the molecule is selected from: Aga2p, a-agglutinin, α-agglutinin, flocculin, Cwp1p, Cwp2p or Tip1p.
  • the protein is Aga2p.
  • Aga2p is a protein encoded by a gene with a GENE ID of 852851 in the NCBI database.
  • Aga2p, i.e., the Aga2 protein, is the binding subunit of a-agglutinin and can be linked to the a-agglutinin core subunit Aga1 protein (Aga1p) through disulfide bonds.
  • Aga2p has 1, 2, 3, 4, 5, 6 or more mutations in its amino acid sequence compared to the protein encoded by the gene with a GENE ID of 852851 in the NCBI database. In some embodiments, Aga2p does not have an S288C mutation in its amino acid sequence compared to the protein encoded by the gene with a GENE ID of 852851 in the NCBI database. In some embodiments, the amino acid sequence of Aga2p comprises the amino acid sequence as shown in SEQ ID NO: 1, or a conservatively substituted variant of the amino acid sequence as shown in SEQ ID NO: 1, or a variant with a S288C mutation in its amino acid sequence compared to the amino acid sequence as shown in SEQ ID NO: 1.
  • the amino acid sequence of Aga2p has at least 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% sequence identity with the amino acid sequence shown in SEQ ID NO: 1.
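The percent-identity thresholds recited above can be checked once two sequences have been aligned. The sketch below is a minimal illustration, assuming the sequences are already aligned (gaps written as `-`) and that identity is counted over alignment columns that are not gaps in both sequences. The application does not state which alignment tool or identity definition it uses, SEQ ID NO: 1 is not reproduced here, and the helper names are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences ('-' marks a gap)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" and y == "-":
            continue  # column is empty in both sequences; ignore it
        columns += 1
        if x == y:
            matches += 1
    if columns == 0:
        raise ValueError("alignment has no informative columns")
    return 100.0 * matches / columns


def meets_identity_threshold(aligned_a: str, aligned_b: str, threshold: float) -> bool:
    """True if the aligned pair reaches the given percent-identity threshold."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Placeholder sequences (not SEQ ID NO: 1): one mismatch in ten columns.
print(percent_identity("ACDEFGHIKL", "ACDEFGHIKV"))  # 90.0
```

A gap opposite a residue counts as a mismatch here; other conventions (for example, dividing by the length of the reference sequence) give different numbers, so the identity definition should be fixed before applying any threshold.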
  • the first protein binding domain is non-covalently bound to the second protein binding domain. In some embodiments, the first protein binding domain is covalently bound to the second protein binding domain.
  • the first protein binding domain and the second protein binding domain can be the same or different, and can be homologous or non-homologous proteins. In some embodiments, the first protein binding domain and the second protein binding domain comprise fragments from the same antibody. In some embodiments, the first protein binding domain and the second protein binding domain can be bound to each other by any method for forming antibody homodimers or heterodimers (or multimers).
  • the methods for forming antibody homodimers or heterodimers (or multimers) include, but are not limited to, "knobs-into-holes" technology (see, e.g., U.S. Patent No. 5,731,168), engineered electrostatic steering technology (see, e.g., PCT Application No. WO 2009/089004A1), leucine zipper technology (see, e.g., Kostelny et al., J. Immunol., 148(5): 1547-1553, 1992), and those described in U.S. Patent No. 4,676,980 and Brennan et al., Science, 229: 81, 1985.
  • the first protein binding domain and the second protein binding domain may contain the mutations that create the corresponding binding structures used in the above methods.
  • the first protein binding domain and the second protein binding domain are bound to each other via a disulfide bond.
  • the first protein binding domain and the second protein binding domain both comprise the Fc domain of an immunoglobulin.
  • after digestion by papain, an immunoglobulin molecule is cleaved into an Fc fragment and Fab fragments, wherein the Fc fragment comprises the portion of the heavy-chain constant region other than the CH1 domain.
  • as used herein, the term "Fc domain" refers to a monomer, which may be any portion of the constant region of any immunoglobulin heavy chain other than the CH1 domain.
  • the Fc domain in the first protein binding domain is referred to as FcA
  • the Fc domain in the second protein binding domain is referred to as FcB
  • the FcA domain and the FcB domain are the same or different amino acid sequences, and the FcA domain and the FcB domain forming the FcAB dimer can be used interchangeably in the first fusion protein and the second fusion protein. That is, when the first protein binding domain is FcA, the second protein binding domain is FcB; and when the first protein binding domain is FcB, the second protein binding domain is FcA.
  • the FcA and FcB may be derived from the Fc segment of the same immunoglobulin, or may be derived from the Fc segment of different immunoglobulins.
  • the immunoglobulins from which the FcA and FcB are derived may be selected from: IgG1, IgG, IgE, IgM, IgD and IgA.
  • the immunoglobulins from which the FcA and FcB are derived may be selected from: IgG1, IgG2, IgG3, IgG4, IgA1 and IgA2.
  • the FcA and FcB are each selected from the group consisting of the amino acid sequences as set forth in SEQ ID NOs: 11-19, conservatively substituted variants of the amino acid sequences as set forth in SEQ ID NOs: 11-19, and amino acid sequences having at least 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% sequence identity with the amino acid sequences as set forth in SEQ ID NOs: 11-19.
  • the amino acid sequences of the FcA and FcB are both as set forth in SEQ ID NO: 11, or are conservatively substituted variants of the amino acid sequence as set forth in SEQ ID NO: 11, or have at least 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% sequence identity with the amino acid sequence as set forth in SEQ ID NO: 11.
  • the amino acid sequence of FcA is as set forth in SEQ ID NO: 12; and the amino acid sequence of FcB is as shown in SEQ ID NO: 13, or is a conservatively substituted variant of the amino acid sequence as shown in SEQ ID NO: 13, or has at least 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% sequence identity with the amino acid sequence as shown in SEQ ID NO: 13.
  • the amino acid sequence of FcA is as shown in SEQ ID NO:13, or is a conservative substitution variant of the amino acid sequence as shown in SEQ ID NO:13, or has at least 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% sequence identity with the amino acid sequence as shown in SEQ ID NO:13; and the amino acid sequence of FcB is as shown in SEQ ID NO:12, or is a conservative substitution variant of the amino acid sequence as shown in SEQ ID NO:12, or has at least 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% sequence identity with the amino acid sequence as shown in SEQ ID NO:12.
  • the amino acid sequence of FcA is as shown in SEQ ID NO:14, or is a conservatively substituted variant of the amino acid sequence as shown in SEQ ID NO:14, or has at least 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% sequence identity with the amino acid sequence as shown in SEQ ID NO:14; and the amino acid sequence of FcB is as shown in SEQ ID NO:15, or is a conservatively substituted variant of the amino acid sequence as shown in SEQ ID NO:15, or has at least 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% sequence identity with the amino acid sequence as shown in SEQ ID NO:15.
  • the amino acid sequence of FcA is as shown in SEQ ID NO:15, or is a conservative substitution variant of the amino acid sequence as shown in SEQ ID NO:15, or has at least 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% sequence identity with the amino acid sequence as shown in SEQ ID NO:15; and the amino acid sequence of FcB is as shown in SEQ ID NO:14, or is a conservative substitution variant of the amino acid sequence as shown in SEQ ID NO:14, or has at least 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% sequence identity with the amino acid sequence as shown in SEQ ID NO:14.
  • the amino acid sequence of FcA is as shown in SEQ ID NO:16, or is a conservative substitution variant of the amino acid sequence as shown in SEQ ID NO:16, or has at least 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% sequence identity with the amino acid sequence as shown in SEQ ID NO:16; and the amino acid sequence of FcB is as shown in SEQ ID NO:17, or is a conservative substitution variant of the amino acid sequence as shown in SEQ ID NO:17, or has at least 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% sequence identity with the amino acid sequence as shown in SEQ ID NO:17.
  • the amino acid sequence of FcA is as shown in SEQ ID NO:17, or is a conservative substitution variant of the amino acid sequence as shown in SEQ ID NO:17, or has at least 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% sequence identity with the amino acid sequence as shown in SEQ ID NO:17; and the amino acid sequence of FcB is as shown in SEQ ID NO:16, or is a conservative substitution variant of the amino acid sequence as shown in SEQ ID NO:16, or has at least 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% sequence identity with the amino acid sequence as shown in SEQ ID NO:16.
  • the amino acid sequence of FcA is as shown in SEQ ID NO: 18, or is a conservatively substituted variant of the amino acid sequence as shown in SEQ ID NO: 18, or has at least 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% sequence identity with the amino acid sequence as shown in SEQ ID NO: 18; and the amino acid sequence of FcB is as shown in SEQ ID NO: 19, or is a conservatively substituted variant of the amino acid sequence as shown in SEQ ID NO: 19, or has at least 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% sequence identity with the amino acid sequence as shown in SEQ ID NO: 19.
  • the amino acid sequence of FcA is as shown in SEQ ID NO:19, or is a conservative substitution variant of the amino acid sequence as shown in SEQ ID NO:19, or has at least 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% sequence identity with the amino acid sequence as shown in SEQ ID NO:19; and the amino acid sequence of FcB is as shown in SEQ ID NO:18, or is a conservative substitution variant of the amino acid sequence as shown in SEQ ID NO:18, or has at least 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% sequence identity with the amino acid sequence as shown in SEQ ID NO:18.
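The FcA/FcB combinations listed above can be summarized as a small lookup table. This is a hypothetical bookkeeping sketch (the names `FC_PAIRS`, `is_listed_pair` and `is_interchangeable` are illustrative). It covers only the exact SEQ ID NO pairings, ignores the conservative-variant and percent-identity alternatives, and checks that every listed pairing also appears with FcA and FcB swapped, consistent with the statement that the two domains can be used interchangeably.

```python
# (FcA SEQ ID NO, FcB SEQ ID NO) pairings recited in the bullets above.
FC_PAIRS = {
    (11, 11),
    (12, 13), (13, 12),
    (14, 15), (15, 14),
    (16, 17), (17, 16),
    (18, 19), (19, 18),
}


def is_listed_pair(fca_seq_id: int, fcb_seq_id: int) -> bool:
    """True if this exact FcA/FcB combination is among the listed pairings."""
    return (fca_seq_id, fcb_seq_id) in FC_PAIRS


def is_interchangeable(pairs: set) -> bool:
    """True if every pairing is also listed with FcA and FcB swapped."""
    return all((fcb, fca) in pairs for fca, fcb in pairs)


print(is_listed_pair(14, 15), is_interchangeable(FC_PAIRS))
```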
  • the FcA domain and the FcB domain can increase the expression level, display level and/or copy number of the complex.
  • the FcA domain only comprises the first CH3 domain
  • the FcB domain only comprises the second CH3 domain; wherein the first CH3 domain and the second CH3 domain can be from the same immunoglobulin subtype or different immunoglobulin subtypes.
  • the FcA domain includes both the first CH2 domain and the first CH3 domain
  • the FcB domain includes both the second CH2 domain and the second CH3 domain
  • the first CH3 domain and the second CH3 domain may be from the same immunoglobulin subtype or different immunoglobulin subtypes
  • the first CH2 domain and the second CH2 domain may be from the same immunoglobulin subtype or different immunoglobulin subtypes
  • the first CH2 domain and the first CH3 domain may be from the same immunoglobulin subtype or different immunoglobulin subtypes
  • the second CH2 domain and the second CH3 domain may be from the same immunoglobulin subtype or different immunoglobulin subtypes.
  • the FcA domain may further include a first CH4 domain
  • the FcB domain may further include a second CH4 domain.
  • the first CH2 domain and the first CH3 domain of the first protein binding domain are connected directly, or indirectly through a linker.
  • the second CH2 domain and the second CH3 domain of the second protein binding domain are connected directly, or indirectly through a linker.
  • the first CH2 domain and the first CH3 domain of the first protein binding domain are indirectly connected through a linker, and the second CH2 domain and the second CH3 domain of the second protein binding domain are indirectly connected through a linker.
  • the linker is a cleavable or non-cleavable linker.
  • the linker is a short peptide or a connecting peptide
  • the connecting peptide can be a flexible connecting peptide or a rigid connecting peptide.
  • linker peptides of different lengths or functions can be selected; for their types, functions and characteristics, see, for example, Reddy Chichili VP, Kumar V, Sivaraman J. Linkers in the structural biology of protein-protein interactions. Protein Sci. 2013; 22(2): 153-167. doi: 10.1002/pro.2206.
  • either or both of the FcA domain and the FcB domain comprise one or more amino acid modifications that enhance or stabilize the FcA domain and/or the FcB domain itself ("FcS"), and/or enhance or stabilize the covalent or non-covalent binding between the FcA domain and the FcB domain ("FcM"), and/or reduce or avoid mispairing of the α chain and β chain of the present application caused by mispairing of the FcA domain and the FcB domain ("FcN").
  • the first protein binding domain is linked to the C-terminus of the α chain, and the second protein binding domain is linked to the C-terminus of the β chain.
  • the signal peptide that directs expression of the first fusion protein and the second fusion protein is SPP SP or AGA2 SP; more preferably, the amino acid sequence of AGA2 SP is as shown in SEQ ID NO: 39 and the amino acid sequence of SPP SP is as shown in SEQ ID NO: 45.
  • the complex is an MHC class I molecule or an MHC class II molecule. In some embodiments, the complex is an HLA class I molecule or an HLA class II molecule.
  • the α chain is encoded by HLA-DRA*01 or another allele of the HLA-DRA family, and the β chain is encoded by HLA-DRB1*01, HLA-DRB1*03, HLA-DRB1*04, HLA-DRB1*15 or another allele of the HLA-DRB family; or
  • the α chain is encoded by HLA-DQA1*01, HLA-DQA1*03, HLA-DQA1*05 or another allele of the HLA-DQA1 family, and the β chain is encoded by HLA-DQB1*02, HLA-DQB1*03, HLA-DQB1*05, HLA-DQB1*06 or another allele of the HLA-DQB1 family; or
  • the α chain is encoded by HLA-DPA1*01:03, HLA-DPA1*02:02 or another allele of the HLA-DPA1 family, and the β chain is encoded by HLA-DPB1*01:01, HLA-DPB1*02:01, HLA-DPB1*04:01, HLA-DPB1*04:02 or another allele of the HLA-DPB1 family.
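The allele lists above follow the WHO naming scheme for HLA alleles (gene name, an asterisk, then colon-separated numeric fields). A helper like the one below could be used to sanity-check that an α-chain allele and a β-chain allele chosen for a construct come from matching DR, DQ or DP gene families. It is a hypothetical sketch: the regular expression is a simplification of the official nomenclature, and the pairing table only encodes the DRA/DRB, DQA1/DQB1 and DPA1/DPB1 combinations listed here.

```python
import re

# Simplified pattern for WHO-style HLA allele names, e.g. "HLA-DRB1*04:01":
# gene name, '*', one to four colon-separated numeric fields, optional suffix letter.
_HLA_NAME = re.compile(
    r"^HLA-(?P<gene>[A-Z]+[0-9]*)\*(?P<fields>\d{2,3}(?::\d{2,3}){0,3})(?P<suffix>[NLSCAQ]?)$"
)

# Class II alpha-chain gene -> prefix of the beta-chain gene it pairs with here.
_CLASS_II_PARTNER = {"DRA": "DRB", "DQA1": "DQB1", "DPA1": "DPB1"}


def parse_hla(name: str):
    """Split an HLA allele name into (gene, [fields])."""
    match = _HLA_NAME.match(name)
    if match is None:
        raise ValueError(f"not a recognized HLA allele name: {name!r}")
    return match.group("gene"), match.group("fields").split(":")


def is_cognate_pair(alpha_allele: str, beta_allele: str) -> bool:
    """True if the alpha and beta alleles come from matching DR, DQ or DP families."""
    alpha_gene, _ = parse_hla(alpha_allele)
    beta_gene, _ = parse_hla(beta_allele)
    partner = _CLASS_II_PARTNER.get(alpha_gene)
    return partner is not None and beta_gene.startswith(partner)


print(is_cognate_pair("HLA-DRA*01:01", "HLA-DRB1*04:01"))  # True
print(is_cognate_pair("HLA-DQA1*01", "HLA-DRB1*01:01"))    # False
```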
  • the second aspect of the present application provides an engineered nucleic acid molecule encoding a major histocompatibility complex (MHC), wherein the MHC is the major histocompatibility complex contained in the cell of the first aspect of the present application.
  • the engineered nucleic acid molecule encoding the MHC refers to an engineered nucleic acid molecule comprising the MHC coding sequence and/or its complementary sequence.
  • the complex comprises an α chain and a β chain, wherein the α chain is connected to a first protein binding domain and the β chain is connected to a second protein binding domain, wherein the first protein binding domain and the second protein binding domain are capable of binding to form the complex, and the first protein binding domain and the second protein binding domain form an FcAB dimer that enhances the binding of the α chain and the β chain to form the MHC.
  • the engineered nucleic acid molecule is an engineered DNA molecule.
  • the DNA molecule can be replicated and/or expressed in a cell.
  • the DNA molecule can be replicated and/or expressed in a eukaryotic cell.
  • the DNA molecule can be replicated and/or expressed in a prokaryotic cell.
  • the DNA molecule can be expressed in a eukaryotic cell and can be replicated in a prokaryotic cell.
  • in addition to a nucleic acid fragment encoding the major histocompatibility complex (MHC), the DNA molecule also comprises gene-manipulation or regulatory elements for replication and/or expression in prokaryotic and/or eukaryotic cells.
  • the engineered DNA molecules further comprise a marker gene or fragment thereof and/or a reporter gene or fragment thereof, and a unique restriction endonuclease site that allows insertion of DNA elements, preferably a restriction endonuclease site in the form of a multiple cloning site (MCS).
  • the marker gene facilitates identification and selection of cells containing the plasmid; the marker gene can be, for example, an antibiotic resistance gene.
  • Each restriction endonuclease site in the MCS can be specifically recognized by different restriction endonucleases.
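Whether a candidate cloning site is actually unique in a vector can be checked by counting its recognition sequence around the circular plasmid. The sketch below is illustrative only: the enzyme table lists a few well-known palindromic recognition sequences (for which scanning one strand is sufficient), the function names are hypothetical, and the application does not specify which enzymes its MCS uses.

```python
# A few common palindromic recognition sequences (illustrative, not from the application).
RECOGNITION_SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
    "XhoI": "CTCGAG",
    "NotI": "GCGGCCGC",
}


def count_sites(plasmid: str, site: str, circular: bool = True) -> int:
    """Count (possibly overlapping) occurrences of a palindromic site on one strand."""
    seq = plasmid.upper()
    if circular:
        seq += seq[: len(site) - 1]  # catch sites that span the numbering origin
    return sum(1 for i in range(len(seq) - len(site) + 1) if seq[i:i + len(site)] == site)


def unique_cutters(plasmid: str, sites: dict = RECOGNITION_SITES, circular: bool = True) -> list:
    """Names of enzymes whose recognition site occurs exactly once in the plasmid."""
    return sorted(name for name, site in sites.items() if count_sites(plasmid, site, circular) == 1)


toy_plasmid = "GAATTC" + "A" * 10 + "GGATCC" + "T" * 10 + "GGATCC" + "CC"
print(unique_cutters(toy_plasmid))  # BamHI occurs twice, so only EcoRI is unique
```

Non-palindromic sites would also need a scan of the reverse complement, which this sketch deliberately leaves out.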
  • the DNA molecule is a DNA plasmid.
  • DNA plasmid refers to a plasmid consisting of a double-stranded DNA molecule.
  • the "plasmid" is a circular DNA molecule.
  • the "plasmid" can also encompass linear DNA molecules.
  • the term "plasmid" also encompasses linear molecules obtained, for example, by cutting a circular plasmid with a restriction endonuclease (i.e., linearizing it), as well as linear molecules that can be replicated in prokaryotes.
  • plasmids can replicate, i.e., be amplified, in cells independently of the genomic information stored in the chromosome or nucleoid of prokaryotes, and can be used for cloning, i.e., for amplifying genetic information in bacterial cells.
  • the DNA plasmid according to the present invention is a medium copy or high copy plasmid, more preferably a high copy plasmid.
  • high-copy plasmids are vectors based on pUC or pTZ plasmids, or on any other plasmid (e.g., pMB1 or ColE1), that contain an origin of replication (ORI) supporting a high plasmid copy number.
  • the engineered DNA molecule is a DNA molecule, or a fragment thereof, constituting a prokaryotic nucleoid or a eukaryotic genome; that is, the coding sequence of the aforementioned MHC, or its complementary sequence, can be replicated along with the host genome.
  • the engineered DNA molecule can be transcribed into mRNA.
  • the engineered DNA molecule also comprises a coding sequence of an element that can be used to start or regulate the expression of the protein, polypeptide or its fragment after transcription, and the element includes but is not limited to 5'UTR, 3'UTR, poly (A) tail (or tailing signal), etc.
  • the engineered DNA molecule comprises a coding sequence of at least one untranslated region (UTR).
  • the engineered DNA molecule comprises at least a coding sequence of 5'UTR and a coding sequence of the protein, polypeptide or its fragment.
  • the engineered DNA molecule comprises, from 5' to 3', at least a 5'UTR coding sequence, the MHC coding sequence, a 3'UTR coding sequence and a tailing signal (or a DNA sequence corresponding to the poly(A) tail). The MHC coding sequence may also be flanked by a start codon (5' end) and a stop codon (3' end), which are respectively the first three and the last three nucleotides of the translated region of the mRNA molecule.
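The 5' to 3' layout described above (5'UTR, start codon, MHC coding sequence, stop codon, 3'UTR, tailing signal) can be expressed as a simple assembly step with basic open-reading-frame checks. This is a hypothetical sketch on DNA strings; the element sequences in the usage line are short placeholders, not sequences from the application.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def assemble_transcription_unit(utr5: str, cds: str, utr3: str, tail_signal: str) -> str:
    """Join 5'UTR + CDS + 3'UTR + tailing signal after checking the CDS is a clean ORF."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    if not cds.startswith("ATG"):
        raise ValueError("coding sequence must begin with the ATG start codon")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    if codons[-1] not in STOP_CODONS:
        raise ValueError("coding sequence must end with a stop codon")
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        raise ValueError("coding sequence contains an in-frame internal stop codon")
    return utr5.upper() + cds + utr3.upper() + tail_signal.upper()


# Placeholder elements: a Kozak-like 5'UTR, a two-codon ORF, no 3'UTR, an AATAAA signal.
print(assemble_transcription_unit("GCCACC", "ATGGCTTAA", "", "AATAAA"))
```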
  • the 5'UTR usually contains at least one ribosome binding site (RBS), such as the Shine-Dalgarno sequence in prokaryotes, or at least one translation initiation site, such as the Kozak sequence in eukaryotes.
  • the RBS promotes efficient and accurate translation of mRNA molecules by recruiting ribosomes at the translation start site. Its activity can be optimized by changing the length and sequence of a given RBS or translation initiation site and its distance from the start codon.
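The ribosome-binding-site and Kozak considerations above lend themselves to simple sequence checks. The sketch below is a hypothetical illustration, not a method from the application: `kozak_context` scores the two positions most often cited for eukaryotic start-codon context (a purine at -3 and G at +4), and `shine_dalgarno_spacing` measures the distance from the closest exact AGGAGG match to a prokaryotic start codon. Practical RBS design uses richer models; the consensus strings here are simplifications.

```python
def kozak_context(mrna: str, start: int) -> str:
    """Classify the context of the AUG at index `start` using the -3 and +4 positions."""
    seq = mrna.upper().replace("U", "T")
    if seq[start:start + 3] != "ATG":
        raise ValueError("no AUG start codon at the given position")
    minus3 = seq[start - 3] if start >= 3 else ""
    plus4 = seq[start + 3] if start + 3 < len(seq) else ""
    score = int(minus3 in ("A", "G")) + int(plus4 == "G")
    return {2: "strong", 1: "adequate", 0: "weak"}[score]


def shine_dalgarno_spacing(mrna: str, start: int, sd: str = "AGGAGG"):
    """Nucleotides between the 3' end of the closest upstream SD match and the start codon."""
    seq = mrna.upper().replace("U", "T")
    sd = sd.upper().replace("U", "T")
    idx = seq.rfind(sd, 0, start)  # match must lie entirely upstream of the start codon
    if idx < 0:
        return None
    return start - (idx + len(sd))


print(kozak_context("GCCACCAUGG", 6))                 # strong
print(shine_dalgarno_spacing("AAGGAGGACAGCUAUG", 13))  # 6
```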
  • the 5'UTR includes an internal ribosome entry site or IRES.
  • the 3'UTR may contain one or more regulatory sequences, such as binding sites for proteins that enhance the stability of mRNA molecules, binding sites for regulatory RNA molecules (such as miRNA molecules), and/or signal sequences that participate in the intracellular transport of mRNA molecules.
  • the target gene fragment further comprises one or more additional regulatory sequences, such as binding sites for proteins that enhance the stability of mRNA molecules, binding sites for proteins that enhance the translation of mRNA molecules, regulatory elements (such as riboswitches), and/or nucleotide sequences that have a positive impact on translation initiation.
  • with respect to nucleotide sequences that have a positive impact on translation initiation, the 5'UTR preferably contains no functional upstream open reading frame, no out-of-frame upstream translation initiation site, no out-of-frame upstream start codon, and/or no nucleotide sequence that forms a secondary structure that reduces or …
  • the coding sequence of the MHC comprises codons that can be translated into an amino acid sequence. All the codons contained in the coding sequence may be naturally occurring codons encoding amino acids, or may be partially or entirely composed of artificially synthesized codons. In some embodiments, some or all of the codons have been codon-optimized. In some embodiments, some or all of the codons encode non-natural amino acids.
  • the engineered DNA molecule further comprises, at the 5' end of the target gene fragment, a structural element known in the art that is necessary for initiating or regulating transcription of the RNA.
  • the structural element comprises at least a promoter. Promoters and their sequences are known in the art, including weak promoters, medium strength promoters, strong promoters, mini promoters or core promoters, etc. In some specific embodiments, the promoter is a strong promoter.
  • the promoter can initiate transcription of the coding sequence of the MHC in prokaryotes. In some embodiments, the promoter can initiate transcription of the coding sequence of the MHC in eukaryotic cells.
  • the "promoter" comprises at least one transcription recognition site followed by a transcription factor binding site.
  • the recognition and binding sites can interact with proteins that mediate or regulate transcription. The binding site is closer to the aforementioned target gene fragment than the recognition site is.
  • the binding site can be, for example, a Pribnow box in prokaryotes or a TATA box in eukaryotes.
  • when the Pribnow box is used, the transcription recognition site can be located about 35 bp upstream of the transcription start site, and the transcription factor binding site about 10 bp upstream of it.
  • the promoter comprises at least one other regulatory element, such as an AT-rich upstream element located about 40 to 60 nucleotides upstream of the transcription start site, and/or an additional regulatory element between the recognition site and the binding site that enhances promoter activity.
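The -35/-10 arrangement of a prokaryotic promoter can be located in a sequence by searching for the two hexamers with a spacer between them. This is a hypothetical sketch that uses only the exact E. coli σ70 consensus hexamers (TTGACA and TATAAT) and an assumed spacer of 15 to 19 nt. Natural and engineered promoters usually deviate from the consensus, so a practical tool would score mismatches rather than require exact matches.

```python
def find_sigma70_boxes(seq: str, min_spacer: int = 15, max_spacer: int = 19):
    """Return (start of -35 box, start of -10 box) pairs for exact consensus matches."""
    seq = seq.upper()
    minus35, minus10 = "TTGACA", "TATAAT"
    hits = []
    for i in range(len(seq) - len(minus35) + 1):
        if seq[i:i + len(minus35)] != minus35:
            continue
        # Try each allowed spacer length between the end of the -35 box and the -10 box.
        for spacer in range(min_spacer, max_spacer + 1):
            j = i + len(minus35) + spacer
            if seq[j:j + len(minus10)] == minus10:
                hits.append((i, j))
    return hits


# Synthetic example: consensus boxes separated by a 17-nt spacer.
promoter = "TTGACA" + "A" * 17 + "TATAAT" + "GC"
print(find_sigma70_boxes(promoter))  # [(0, 23)]
```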
  • the promoter is a strong promoter, i.e., the promoter comprises a sequence that promotes the transcription of the aforementioned RNA coding sequence. Strong promoters are known to those skilled in the art, such as OXB18, OXB19 and OXB20 promoters derived from the RecA promoter of Escherichia coli, or can be identified or synthesized by conventional laboratory procedures.
  • the promoter is a T7 promoter.
  • the promoter is further preceded by other regulatory elements, such as an enhancer that can promote transcription of the aforementioned RNA coding sequence in a DNA plasmid.
  • the eukaryotic cell is a yeast cell.
  • the DNA molecule is a yeast display vector.
  • the present application also provides an RNA molecule encoding the above-mentioned MHC.
  • the RNA molecule is transcribed from the above-mentioned engineered DNA molecule.
  • the engineered nucleic acid molecule may also be a hybrid molecule of DNA and RNA.
  • the present application also covers any cells comprising the above-mentioned engineered nucleic acid molecules.
  • the present application also relates to a method for preparing a major histocompatibility complex (MHC), the method comprising: transforming a cell with the engineered nucleic acid molecule of the second aspect, and culturing the cell under conditions expressing the complex.
  • the cell is a yeast cell, such as a Saccharomyces cerevisiae cell.
  • the cell is a yeast cell and the engineered nucleic acid molecule is a yeast display vector.
  • the present application also relates to a method for identifying a peptide that binds to a major histocompatibility complex (MHC), the method comprising:
  • the peptide comprises a mixture of peptides.
  • the peptide competes with a reference peptide for binding to the complex.
  • the "reference peptide" can be any known peptide that can bind to MHC, such as an antigen epitope peptide.
  • Example 1 Yeast cell co-display dependent on peptide binding and HLA-linked C-terminal domains
  • the protein stability of the heterodimer formed by the α and β chains of MHC-II depends to a certain extent on the cooperation of the transmembrane and intracellular domains of each α and β chain.
  • the protein heterodimers corresponding to the most representative MHC-II human leukocyte antigen (HLA) alleles, such as HLA-DR1 (HLA-DRA*01:01/HLA-DRB1*01:01, as shown in SEQ ID NO: 40 and SEQ ID NO: 41), HLA-DR4 (HLA-DRA*01:01/HLA-DRB1*04:01, as shown in SEQ ID NO: 40 and SEQ ID NO: 42) and HLA-DQ6 (HLA-DQA1*01:02/HLA-D…
  • the bidirectional GAL1-10 promoter in the designed vector plasmid can drive simultaneous expression of the α and β chains. A leucine zipper domain (LZA or LZB, which together form a stable LZ dimer; the sequences selected in the examples are shown in SEQ ID NO: 9 and SEQ ID NO: 10) or an antibody heavy-chain constant domain (FcA or FcB, which together form a stable Fc dimer; the sequences selected in the examples are shown in SEQ ID NO: 15 and SEQ ID NO: 16) is fused to the C-terminus of each expressed α and β chain, making it easier for the LZ- or Fc-modified HLA-II α/β chains to form a native HLA complex (Figures 2-7).
  • the synthetic SPP SP (as shown in SEQ ID NO: 45) signal peptide is further replaced by the yeast endogenous AGA2 SP (as shown in SEQ ID NO: 39) signal peptide at the N-terminal of the Fc-modified DR ⁇ chain or DQ ⁇ chain.
  • the inventors then selected a peptide fragment that can specifically bind to both DR1 and DR4, namely influenza hemagglutinin (HA) 306-318 (as shown in SEQ ID NO: 2) to verify the display of peptide/MHC-II trimer by yeast cells.
  • expression of the antigenic peptide and of the HLA-DR α/β heterodimer was driven by two sets of yeast shuttle vector plasmids.
  • progeny yeast cells transformed with such plasmids can display the single-chain fusion antigenic peptide on the cell surface, with the peptide fused to the C-terminus of the endogenous yeast adhesion receptor subunit Aga2p (FIG. 9).
  • the vector for directing the expression of the single-chain fusion antigenic peptide gene used the traditional yeast display method (see, for example, Boder & Wittrup, Yeast surface display for screening combinatorial polypeptide libraries, 1997).
  • to enable the selected representative HLA allele proteins to form dimers and be displayed on the yeast surface through non-covalent binding to the single-chain fusion antigenic peptide (rather than by direct display), the inventors first deleted the gene encoding Aga2p from the HLA display plasmid described above, constructing an HLA secretory vector that secretes, but does not display, the C-terminally modified and stabilized HLA dimer scaffold (Figures 2-7).
  • this dimeric fusion protein is secreted from the yeast cells independently of the vector expressing the single-chain fusion antigenic peptide, then binds to the antigenic peptide in the cell, ultimately anchoring the peptide/HLA trimer on the cell surface (Figure 1).
  • the vector that guides the display of the single-chain fusion antigen peptide and the HLA secretory vector with enhanced C-terminal modification were transformed into Saccharomyces cerevisiae EBY100 for protein expression and trimer co-display.
  • using dual-antibody fluorescent labeling and flow cytometry, the inventors demonstrated that the CODAAH system can display peptide/HLA trimers on the cell surface (Figure 9).
  • the expression abundance and display level of the peptide/HLA trimer formed by DR1 on the yeast surface were greatly improved: the mean or median fluorescence intensity (MFI) increased more than 2-fold.
  • this method can enable some HLA allele proteins that are otherwise difficult to co-display on the yeast surface to form peptide/HLA trimers (measured as the mean or median fluorescence intensity, going from no detectable signal to a detectable signal).
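The fold improvement in display reported above is a ratio of fluorescence summaries between two samples. A minimal sketch, assuming per-event fluorescence intensities have already been exported from the cytometer; the function names are hypothetical and the intensity values in the usage line are made up for illustration, not data from the application.

```python
from statistics import mean, median
from typing import Sequence


def mfi(intensities: Sequence[float], use_median: bool = True) -> float:
    """Summarize one sample's per-event fluorescence as a median (default) or mean."""
    if not intensities:
        raise ValueError("no events to summarize")
    return median(intensities) if use_median else mean(intensities)


def fold_change(test: Sequence[float], reference: Sequence[float], use_median: bool = True) -> float:
    """Ratio of the test sample's MFI to the reference sample's MFI."""
    return mfi(test, use_median) / mfi(reference, use_median)


# Hypothetical intensities: medians 450 vs. 200 give a 2.25-fold change.
print(fold_change([100, 450, 900], [100, 200, 300]))  # 2.25
```

Median and mean summaries can disagree noticeably for skewed cytometry distributions, which is one reason to report which statistic underlies a fold-change figure.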
  • Example 2 Construction of a display peptide library to achieve CODAAH system optimization and antigen peptide development
  • co-display of the peptide/HLA trimer by the CODAAH system depends on the modification and stabilization of the HLA-II α/β heterodimer protein itself and on the antigenic-peptide preference of the specific HLA-II molecule.
  • the inventors designed a peptide variant library based on analysis of the amino acid sequence of HA306-318.
  • the library randomly mutated the five amino acid residues that mainly contact DR1 in the HA306-318 core binding register YVKQNTLKL (as shown in SEQ ID NO: 20), with the aim of screening, by flow cytometry sorting, for peptide variants with higher affinity for DR1, so that more DR1 dimers can be displayed on the yeast surface through binding to these variants.
  • the anchor residues that YVKQNTLKL binds to DR1 are mainly P1Y, P4Q, P6T, P7L, and P9L.
  • the inventors used a PCR primer carrying five degenerate NNS codons to amplify the insert fragment, and used double restriction-enzyme digestion to obtain a linearized empty plasmid.
  • a high concentration of the insert fragment carrying the sequence encoding the HA306-318 peptide variants (as shown in SEQ ID NO: 46) and the linearized empty plasmid were transformed into yeast cells that secrete and express DR1.
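The degenerate NNS scheme above (N = A/C/G/T, S = C/G) can be enumerated directly to see what it encodes at each randomized position. The sketch below is illustrative, not the inventors' design tool: it uses the standard genetic code, and `library_template` simply masks the five DR1 anchor positions (P1, P4, P6, P7, P9) of YVKQNTLKL named above. All names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): AMINO_ACIDS[i] for i, c in enumerate(product(BASES, repeat=3))}

# Subset of IUPAC degenerate-base codes needed here.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT", "S": "CG"}


def expand_degenerate(codon: str) -> list:
    """All concrete codons represented by a degenerate codon such as 'NNS'."""
    return ["".join(bases) for bases in product(*(IUPAC[ch] for ch in codon))]


# 1-based register positions of the DR1 anchor residues in YVKQNTLKL.
ANCHOR_POSITIONS = (1, 4, 6, 7, 9)


def library_template(core: str, positions=ANCHOR_POSITIONS) -> str:
    """Replace the randomized positions of the core peptide with 'X'."""
    return "".join("X" if i in positions else aa for i, aa in enumerate(core, start=1))


nns = expand_degenerate("NNS")
encoded = {CODON_TABLE[c] for c in nns}
print(len(nns), len(encoded - {"*"}))             # codon count, amino-acid count
print([c for c in nns if CODON_TABLE[c] == "*"])  # the only stop codon in NNS
print(library_template("YVKQNTLKL"))
```

By this count, NNS gives 32 codons covering all 20 amino acids plus the TAG stop codon, so five NNS positions correspond to 32⁵ ≈ 3.4 × 10⁷ codon combinations and up to 20⁵ = 3.2 × 10⁶ distinct full-length peptides.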
  • the insert fragment and the linearized empty plasmid formed a plasmid displaying the peptide variant through homologous recombination in the progeny yeast cells, so that each yeast cell in the library can display at least one peptide variant and can simultaneously secrete DR1 molecules.
  • the library constructed in this way has a capacity of >10⁸.
  • Flow cytometry sorting was used to screen for double positivity of the peptide variant (the signal can be represented by the HA tag) and DR (the signal can be represented by the L243 monoclonal antibody) to obtain the target yeast clone.
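The HA+/DR+ double-positive selection amounts to a two-threshold gate on each event. A minimal sketch, assuming each event has already been reduced to an (HA-tag signal, L243 signal) pair and that gate thresholds were chosen from controls; the function names, thresholds and event values below are hypothetical.

```python
from typing import Iterable, List, Tuple

Event = Tuple[float, float]  # (HA-tag fluorescence, DR/L243 fluorescence)


def double_positive(events: Iterable[Event], ha_gate: float, dr_gate: float) -> List[Event]:
    """Keep events above both gates (HA+ and DR+)."""
    return [(ha, dr) for ha, dr in events if ha > ha_gate and dr > dr_gate]


def double_positive_fraction(events: List[Event], ha_gate: float, dr_gate: float) -> float:
    """Fraction of all events that fall in the HA+/DR+ gate."""
    if not events:
        raise ValueError("no events")
    return len(double_positive(events, ha_gate, dr_gate)) / len(events)


events = [(50, 40), (900, 30), (1200, 800), (20, 950), (1500, 1100)]
print(double_positive_fraction(events, ha_gate=500, dr_gate=500))  # 0.4
```

Real sorting gates are usually drawn on compensated, log-scaled data and may be polygons rather than rectangles; the rectangular gate here is the simplest case.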
  • the HA+/DR+ double positive signal of the progeny library (about 0.01% of the parental library capacity, ⁇ 10 4 yeast clones) was significantly improved compared with the double positive signal of the parental library ( Figure 10).
  • the sequencing results of the plasmids contained in the sub-library single clones showed that 90% of the clones carried peptide variants with base mutations ( Figure 11).
  • the designed degenerate codon positions mostly showed overlapping peak sequencing results (8 out of 10, except #1 and #4), indicating that the HA+/DR+ double-positive single yeast cells contain multiple peptide variant display plasmids, and the peptide variants expressed by each plasmid are not completely the same at the five anchor residue positions. It is further inferred that the dominant codon DNA of the peptide variants of yeast clone #5 and clone #7 is the same as the product corresponding to the dominant codon, and it cannot be confirmed that the peptide variant display recombinant plasmids contained in these two yeast clones are exactly the same.
  • the inventor extracted the plasmids contained in the single clone yeasts numbered #1, #7, #9, and #10 and transformed Escherichia coli.
  • the results of extracting plasmids and sequencing from multiple selected Escherichia coli verified this conjecture: a single yeast cell can accommodate at least 3 peptide variant display plasmids, and the peptide variants expressed by each plasmid can bind to DR1 ( Figure 12).
  • Yeast colony PCR sequencing of clone #1 showed no overlapping peaks, and this clone in fact contains only one peptide variant (plasmids #1-1 and #1-2 extracted from E. coli are identical to the yeast #1 PCR sequencing result). In contrast, clones #7, #9, and #10, which showed overlapping peaks, each contain more than one peptide variant.
  • This suggests that the actual diversity of the constructed parental library may be larger than its theoretical capacity, possibly by as much as an order of magnitude, approaching the industrial-grade library capacity of 10^9.
  • Accordingly, the number of DR1-binding peptide variants in the HA+/DR+ double-positive sub-library may reach 10^5. About 90% of these variants may carry genuine amino acid substitutions at the anchor positions that improve DR1 affinity, thereby promoting formation of co-displayed peptide/DR1 trimers in the CODAAH system.
  • Peptide variant #1 improved binding not only to DR1 but also to DR4, in progeny strains of DR4-secreting yeast transformants (Figure 13).
  • The Fc modification in the DR4-Fc fusion protein clearly allowed flow cytometry to detect higher levels of DR4 protein on the surface of transformed yeast cells (Figure 13). This again confirms the superior stabilizing effect of C-terminal Fc modification on HLA-II α/β heterodimers.
  • A variant library built with peptide variant #1 as the wild-type template can therefore serve as a method for discovering new DR4 antigens.
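The library arithmetic above (five NNS positions at the P1, P4, P6, P7, and P9 anchors of YVKQNTLKL, >10^8 transformants) can be checked with a short calculation. The sketch below is illustrative only. The helper names are hypothetical, and the coverage estimate assumes that every DNA variant is equally represented and that transformants sample variants independently (a Poisson approximation), which real libraries only approximate.

```python
import math

# Hypothetical helpers for the HA306-318 anchor-randomization library (illustrative only).
WILD_TYPE_REGISTER = "YVKQNTLKL"    # 9-residue core register, SEQ ID NO: 20
ANCHOR_POSITIONS = (1, 4, 6, 7, 9)  # P1, P4, P6, P7, P9 (1-based register positions)

CODONS_PER_NNS = 4 * 4 * 2  # N = A/C/G/T at codon positions 1-2, S = C/G at position 3
STOP_CODONS_PER_NNS = 1     # TAG is the only stop codon matching NNS
AMINO_ACIDS_PER_NNS = 20    # NNS still encodes all 20 standard amino acids


def degenerate_template(register: str, anchors: tuple) -> str:
    """Show the register with each randomized anchor position replaced by 'X'."""
    return "".join("X" if pos in anchors else aa
                   for pos, aa in enumerate(register, start=1))


def library_diversity(n_positions: int) -> dict:
    """Theoretical DNA and peptide diversity for n independent NNS positions."""
    return {
        "dna_variants": CODONS_PER_NNS ** n_positions,
        "peptide_variants": AMINO_ACIDS_PER_NNS ** n_positions,
        # Fraction of clones with no stop codon at any randomized position.
        "stop_free_fraction": ((CODONS_PER_NNS - STOP_CODONS_PER_NNS) / CODONS_PER_NNS) ** n_positions,
    }


def expected_coverage(library_size: float, n_variants: int) -> float:
    """Poisson estimate of the fraction of variants sampled at least once."""
    return 1.0 - math.exp(-library_size / n_variants)


if __name__ == "__main__":
    print(degenerate_template(WILD_TYPE_REGISTER, ANCHOR_POSITIONS))  # XVKXNXXKX
    diversity = library_diversity(len(ANCHOR_POSITIONS))
    print(diversity)
    print(round(expected_coverage(1e8, diversity["dna_variants"]), 3))  # 0.949
```

Under these assumptions, five NNS positions give 32^5 ≈ 3.4 × 10^7 DNA variants encoding 20^5 = 3.2 × 10^6 peptides. A library of 10^8 transformants would then sample about 95% of DNA variants at least once, and about 15% of clones would be expected to carry at least one TAG stop codon.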
  • The entire gene expression cassette (GAL1-10//AGA2-HA//scFv 4-4-20//MFα Term.) was excised from pCT302 with KpnI and SacI and subcloned into the yeast shuttle vector pRS315, partially digested with KpnI/SacI, to form a yeast surface display plasmid carrying the LEU nutritional marker gene.
  • An oligonucleotide encoding an antigenic peptide, e.g., influenza virus hemagglutinin residues HA306-318 (SEQ ID NO: 2)
  • GGGS short spacer
  • The vector plasmids ptDR4-LZ and ptDQ6-LZ (Liu, Jiang, & Mellins, Yeast display of MHC-II enables rapid identification of peptide ligands from protein antigens (RIPPA), 2021) direct yeast display of C-terminal LZ-enhanced HLA-DR4 and HLA-DQ6 extracellular-domain heterodimers, respectively.
  • To obtain the DR1 vector, the β-chain gene HLA-DRB1*04:01 in ptDR4-LZ was replaced with HLA-DRB1*01:01.
  • To obtain vectors that secrete the extracellular-domain heterodimers, the sequences encoding AGA2 and the HA tag downstream (3' end) of the displayed DR or DQ6 chain gene in the three HLA-II display plasmids were replaced with in-frame stop codons.
  • The two resulting sets of vectors can display or secrete, respectively, three representative HLA fusion proteins (DR1, DR4, and DQ6) carrying C-terminal leucine zipper enhancement (see Figures 2-4).
  • In both sets of three vectors, the first and second leucine zipper fragments downstream (3' end) of each HLA α and β gene were then replaced with antibody crystallizable fragments.
  • The resulting two new sets of three vectors direct expression of DR1, DR4, and DQ6 fusion proteins carrying C-terminal antibody crystallizable fragment (Fc) enhancement (see Figures 5-7).
  • The sequence encoding the AGA2 signal peptide (AGA2 SP, SEQ ID NO: 39) was cloned at the N-terminus of the Fc-modified DR or DQ display chain, replacing the sequence encoding the SPP signal peptide (SPP SP, SEQ ID NO: 45).
  • The TRP+ vector plasmids secreting or displaying HLA-II were transformed into Saccharomyces cerevisiae strain EBY100 (GAL1-AGA1:URA3 ura3-52 trp1 leu2Δ1 his3Δ200 pep4:HIS2 prb1Δ1.6R can1 GAL), either together with the LEU+ vectors directing expression of the single-chain antigen peptide fusion proteins or each plasmid separately. Electroporation was performed with a MicroPulser electroporator (BioRad), following the BioRad MicroPulser manual.
  • SD-SCAA medium contained 2% (w/v) glucose, 0.67% (w/v) yeast nitrogen base without amino acids, an appropriate amount of amino acid dropout supplement lacking Trp and/or Leu (and optionally Ura) (Clontech), 38 mM Na2HPO4, and 62 mM NaH2PO4, pH 6.
  • Induced cells were first washed 1-2 times with 400 µL Tris-buffered saline (137 mM NaCl, 20 mM Tris-Cl, pH 7.6). They were then incubated either in 20 µL reducing buffer [50 mM Tris-Cl pH 8.0, 1 mM DTT (Sigma; added before use)] at 4°C with gentle shaking for 24 hours, or in 20 µL Factor Xa buffer (100 mM NaCl, 2 mM CaCl2, 20 mM Tris-Cl, pH 8) containing 20 µg/mL Factor Xa protease (New England Biolabs) at 23°C for at least 48 hours.
  • Before primary labeling, cells were pelleted by centrifugation and washed at least once with 400 µL ice-cold PBS + 1% BSA. The pellet was resuspended in 25 µL PBS + 1% BSA containing mouse anti-DR monoclonal antibody L243 (BD Biosciences) or anti-DQ monoclonal antibody SPV-L3 (BD Biosciences), together with rabbit anti-HA polyclonal antibody (Sigma). Cells were incubated at room temperature for 30 minutes and then on ice for 10 minutes.
  • The cells were washed again with ice-cold PBS + 1% BSA and resuspended in 40 µL PBS + 1% BSA containing highly cross-adsorbed secondary antibodies (Thermo Fisher): Alexa Fluor 647 goat anti-mouse IgG (H+L) (1:80) and Alexa Fluor 488 goat anti-rabbit IgG (H+L) (1:80). They were incubated on ice for 40 minutes. Finally, the cells were washed with ice-cold PBS + 1% BSA and resuspended in 500-700 µL PBS + 1% BSA for flow cytometry.
  • Alternatively, mouse anti-V5 mAb (Thermo Fisher) at a 1:30 dilution was used as the primary labeling reagent in place of mAb L243, with all other labeling steps unchanged.
  • Primer P0 introduces random mutations at the five HA306-318 anchor residue positions.
  • Primers P1 and P2 amplify the insert carrying the peptide mutations and add, at both ends of the fragment, sequences homologous to the two ends of the linearized vector.
  • The linearized peptide display vector was obtained by XmaI/NheI double digestion. After centrifugal concentration and drying, a total of 50 µg of insert fragment and linearized vector (1:1 by mass) was electroporated into 800 µL of DR1-secreting yeast competent cells to obtain the yeast library used for co-display.
  • Immunofluorescence labeling of the yeast library was scaled up proportionally from the labeling method for individual yeast strains described above.
  • Yeast cells co-labeled for the HA tag and DR were sorted on a FACSAriaIIu flow cytometer.
  • The sorted sub-library was re-expanded in SD medium and then induced in SG medium for flow cytometric analysis or further sorting.
  • A sample of the final sorted and expanded sub-library was plated on agar medium for single-colony yeast PCR and sequencing. This was followed by yeast plasmid extraction, E. coli transformation, and plasmid extraction and sequencing of multiple E. coli colonies corresponding to each yeast clone.
  • Sequence-verified peptide variant display plasmids were re-transformed into DR1- or DR4-secreting yeast competent cells to verify trimer co-display.
  • Verification used the flow cytometric analysis described above and the quantitative analysis method below.
  • The flow cytometers used were FACSCalibur and FACSAriaIIu (BD Biosciences). Flow cytometric data of yeast strains co-displaying peptide/HLA trimers were analyzed with FlowJo software (BD). The display level of a peptide (e.g., influenza virus hemagglutinin peptide HA306-318) on the yeast surface is taken as proportional to its mean fluorescence intensity after background correction and normalization to the background intensity, denoted cMFI.
  • Peptide-binding-dependent HLA display levels can be calculated correctly from the normalized, background-corrected anti-DR fluorescence relative to the peptide display level (DR-ratio). Anti-DQ staining is calculated in the same way.
  • (+) and (-) represent yeast with co-display and yeast with peptide display only, respectively.
  • MFI(DR) and cMFI denote, respectively, the mean fluorescence intensity of the Alexa Fluor 647 emission from the DR label and the display level of HA306-318 on the surface of the corresponding yeast strain. Normalization minimizes between-experiment error caused by laser power output, detector amplification, and other cytometer parameters.
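The equations for cMFI and the DR-ratio did not survive extraction from the original page. The sketch below encodes one plausible reading of the prose above. It takes a background-corrected signal normalized to its background as (S - B)/B. It then defines the DR-ratio as the normalized anti-DR signal of co-displaying yeast (+), using peptide-only yeast (-) as background, divided by the peptide display level cMFI. The function names and the exact form of both expressions are assumptions, not the patent's formulas.

```python
def normalized_signal(mfi_sample: float, mfi_background: float) -> float:
    """Background-corrected MFI normalized to background: (S - B) / B.

    Assumed form of the 'background-corrected, normalized' signal in the text.
    """
    if mfi_background <= 0:
        raise ValueError("background MFI must be positive")
    return (mfi_sample - mfi_background) / mfi_background


def dr_ratio(mfi_dr_codisplay: float, mfi_dr_peptide_only: float, cmfi: float) -> float:
    """Normalized anti-DR signal of co-displaying yeast (+), using peptide-only
    yeast (-) as background, divided by the peptide display level cMFI."""
    if cmfi <= 0:
        raise ValueError("cMFI must be positive")
    return normalized_signal(mfi_dr_codisplay, mfi_dr_peptide_only) / cmfi


if __name__ == "__main__":
    # Made-up MFI values for illustration only (not data from the document).
    cmfi = normalized_signal(mfi_sample=1100.0, mfi_background=100.0)
    print(cmfi, dr_ratio(mfi_dr_codisplay=500.0, mfi_dr_peptide_only=100.0, cmfi=cmfi))
```

Dividing by cMFI expresses DR capture per unit of displayed peptide. Under this reading, strains that display more peptide are not credited with better HLA binding simply because they display more peptide.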
  • Example 3: This method effectively increases the display of "empty" MHC-II complexes
  • The peptide-binding groove of nascent MHC-II (human leukocyte antigen class II, HLA-II) is generally occupied by its partner, the invariant chain (Ii). Ii stabilizes the HLA-II protein and prevents HLA-II from binding interfering intracellular peptides.
  • Ii can be trimmed by proteolytic enzymes to produce CLIP peptides.
  • CLIP remains bound to HLA-II with lower affinity until a higher-affinity antigenic peptide replaces it through peptide exchange.
  • MIIC: MHC class II compartment
  • HLA-DM: peptide-exchange catalyst in the MHC class II compartment
  • The inventors modified the C-termini of both the α and β chains to carry the amino acid sequence of the antibody heavy-chain crystallizable fragment (FcAB, abbreviated Fc when describing HLA-II fusion proteins). This effectively promotes pairing of the soluble secreted chain with the surface-anchored chain and significantly increases the display level and/or copy number of the MHC-II complex.
  • FcAB: crystallizable fragment of the antibody heavy chain
  • HLA-DR1: HLA-DRA*01:01/HLA-DRB1*01:01 (SEQ ID NO: 40 and SEQ ID NO: 41)
  • HLA-DR4: HLA-DRA*01:01/HLA-DRB1*04:01 (SEQ ID NO: 40 and SEQ ID NO: 42)
  • HLA-DR15: HLA-DRA*01:01/HLA-DRB1*15:01 (SEQ ID NO: 40 and SEQ ID NO: 81)
  • HLA-DQ6: HLA-DQA1*01:02/HLA-DQB1*06:02 (SEQ ID NO: 43 and SEQ ID NO: 44)
  • The bidirectional GAL1-10 promoter directs simultaneous expression of the α and β chains.
  • LZ: leucine zipper
  • The inventors created new yeast strains by transforming the constructed yeast shuttle vectors into the parental Saccharomyces cerevisiae strain EBY100 (Boder & Wittrup, 1997). After induction of protein expression, the yeast cells were immunofluorescently labeled and analyzed by flow cytometry.
  • The detection antibodies were the mouse monoclonal antibodies L243 (anti-DR) and SPV-L3 (anti-DQ), which recognize conformational epitopes of correctly folded HLA-II α/β complexes.
  • Example 4: This method further optimizes and increases the display of "empty" MHC-II complexes
  • Secreted proteins of eukaryotic cells usually carry a signal peptide (SP) at the N-terminus of the polypeptide chain.
  • SP: signal peptide
  • The MHC-II heterodimer displayed by yeast likewise requires a signal peptide.
  • The syn-pre-pro synthetic signal peptide had been used at the N-termini of both the α and β chains.
  • the inventors replaced the syn-pre-pro synthetic signal peptide at the N-terminus of the yeast display chain with the yeast endogenous AGA2 signal peptide ( Figure 15).
  • Immunofluorescence labeling and flow cytometry of the newly constructed yeast showed that, with AGA2 SP, surface expression of FcAB-modified DR1 and DR4 α/β was greatly increased. It clearly exceeded that of LZ-modified HLA-II α/β (MFI_Fc > MFI_LZ). The positive fluorescence signal increased by 146.6% and 96.1%, respectively, calculated as (MFI_Fc - MFI_LZ)/(MFI_LZ - MFI_BC) × 100% (Figure 15).
  • Together, FcAB modification and signal peptide optimization allowed all "empty" MHC-II proteins to assemble correctly on the yeast surface with greatly increased display levels. This lays a solid foundation for more sensitive and accurate identification of MHC-II peptide ligands.
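The percentage increase reported above is defined relative to the LZ signal above background. The minimal sketch below implements that arithmetic. The function names are hypothetical, and the MFI values in the example are invented to illustrate the formula; they are not taken from Figure 15.

```python
def delta_mfi(mfi_display: float, mfi_background: float) -> float:
    """Signal-to-background difference, e.g. MFI_Fc - MFI_BC or MFI_LZ - MFI_BC."""
    return mfi_display - mfi_background


def percent_gain_fc_over_lz(mfi_fc: float, mfi_lz: float, mfi_bc: float) -> float:
    """(MFI_Fc - MFI_LZ) / (MFI_LZ - MFI_BC) x 100%, as defined in the text."""
    lz_signal = delta_mfi(mfi_lz, mfi_bc)
    if lz_signal <= 0:
        raise ValueError("LZ display signal must exceed background")
    return (mfi_fc - mfi_lz) / lz_signal * 100.0


if __name__ == "__main__":
    # Hypothetical MFI values chosen only to show the arithmetic.
    print(round(percent_gain_fc_over_lz(mfi_fc=1333.0, mfi_lz=600.0, mfi_bc=100.0), 1))  # 146.6
```

Because the denominator is the LZ signal above background rather than the raw LZ MFI, the percentage compares the two constructs' specific display signals and is not diluted by background fluorescence.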
  • Example 5: This method effectively improves the sensitivity and accuracy of identifying MHC-II peptide ligands
  • Identification of MHC-II peptide ligands involves two aspects: computational prediction and experimental verification. Because machine learning methods are still developing, computational prediction models still rely heavily on high-quality experimental data.
  • There are two types of experimental methods for identifying MHC-II peptide ligands. One uses mass spectrometry to quantify peptides eluted from MHC-II molecules immunoprecipitated from lysed cells (EL for short). The other detects the binding of synthetic peptides to recombinant MHC-II proteins and measures affinity (BA for short).
  • Physiological peptide loading occurs in the acidic MIIC (pH ~5) at body temperature (37°C). In contrast, "empty" MHC-II displayed on the yeast surface can bind specific target peptides under a wide range of conditions, for example at pH 5.0 or pH 7.4 and at 30°C or 37°C.
  • The target peptides used here are biotinylated (indicator peptides), allowing flow cytometric detection and quantitative analysis.
  • Kinetic data show that the target peptides generally reach equilibrium with the yeast-displayed "empty" MHC-II after about 15-20 hours. Therefore, pH 5 for 20 h was the preferred condition for comparing the yeast display levels of HLA-II-Fc and HLA-II-LZ and the sensitivity and accuracy of their peptide binding.
  • The inventors used the influenza hemagglutinin (HA) 306-318 peptide as the specific target antigenic peptide for HLA-DR1.
  • HA: hemagglutinin
  • In the comparative experiments, the expression level (display amount) of DR1-Fc on the yeast surface was much higher than that of DR1-LZ. The level of bound indicator peptide was also significantly higher than with DR1-LZ (Figure 16).
  • Affinity constant: e.g., the apparent equilibrium dissociation constant (KD,app)
  • The half-maximal inhibitory concentration (IC50) measured with FcAB-modified "empty" DR1 α/β on the yeast surface is more accurate than that measured with the LZ-modified complex. At competitor peptide concentrations above 50 µM, the indicator peptide signal on DR-LZ-displaying yeast is nearly indistinguishable from background, which greatly increases the curve-fitting error (Figure 19).
  • This comparison again shows that the yeast strains created by this method offer higher sensitivity and accuracy for identifying peptide ligands than the LZ-modified strains used for comparison.
  • The indicator peptides Bio-HA306-318 (biotin-PKYVKQNTLKLAT) and Bio-aI (negative control, a DQ2.5-binding peptide) and the competitor peptides were synthesized by GenScript.
  • Mouse anti-DR (clone L243) and mouse anti-DQ (clone ) monoclonal antibodies, Alexa Fluor 647-conjugated streptavidin, and highly cross-adsorbed secondary antibodies were purchased from Thermo Fisher Scientific. The secondary antibodies included Alexa Fluor 488 goat anti-mouse IgG (H+L) and Alexa Fluor 647 goat anti-mouse IgG (H+L).
  • DR4-Fc and "empty" DQ6-Fc display plasmids: the Fos or Jun leucine zipper dimerization motif at the C-terminus of the α or β chain in the previously used ptDR4-LZ (TRP+) and ptDQ6-LZ (TRP+) (Liu et al., 2021) was replaced with the FcA or FcB domain (SEQ ID NO: 11-19; the sequences used in this embodiment are SEQ ID NO: 14 and SEQ ID NO: 15).
  • DR4 can be replaced by DR1 or DR15 to construct yeast display vectors for DR1-Fc or DR15-Fc.
  • The plasmid carrying the tryptophan nutritional marker gene was electroporated into the parental yeast strain EBY100 (URA+, TRP-) following the BioRad MicroPulser manual. After 2 days at 30°C, single colonies appeared on solid tryptophan-dropout agar plates.
  • The medium was SD-CAA (2% w/v glucose, 0.67% w/v yeast nitrogen base without amino acids, 0.062% w/v Ura/Trp-dropout casamino acids, 38 mM Na2HPO4, 62 mM NaH2PO4, pH 6.0).
  • A single yeast colony was then used to inoculate 2 mL of SD-CAA liquid medium and grown overnight at 30°C with shaking at 225 rpm to an OD600 of 2.5-5.0.
  • To induce GAL1-10-driven protein expression, 10^7 harvested cells were transferred to 2 mL of SG-CAA medium (glucose replaced with galactose). After 18 hours of induction at 30°C, sufficient yeast cells were collected by centrifugation at 2,500 g for 3 minutes, washed, and prepared for analysis of protein expression or peptide binding.
  • Yeast cells were then washed with 300 µL ice-cold PBS + 1% w/v bovine serum albumin (BSA). They were incubated on ice for 1 hour with highly cross-adsorbed secondary antibodies, either Alexa Fluor 488 goat anti-mouse IgG (H+L) or Alexa Fluor 647 goat anti-mouse IgG (H+L) (1:100 dilution), or with secondary antibodies from other species. After labeling, the cells were analyzed on a flow cytometer to detect fluorescence corresponding to expression of MHC-II proteins or epitope tags. At least 20,000 cells were collected per sample. Flow cytometric data were analyzed with FlowJo software (BD).
  • Binding of target antigenic peptides to yeast-displayed "empty" MHC-II
  • Reaction tubes were sealed with Parafilm before incubation. This prevents volume changes that would alter the final peptide concentration in time-course or concentration-titration studies.
  • The yeast cells were washed twice with 300 µL ice-cold PBS + 1% BSA and then stained on ice for one hour with streptavidin-AF647 diluted 1:200 in 50 µL PBS + 1% BSA.
  • The cells were then washed twice with 300 µL ice-cold PBS + 1% BSA and finally resuspended in 300 µL ice-cold PBS + 1% BSA for flow cytometric analysis.
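For convenience, the SD-CAA recipe given above can be converted into weighed amounts. This sketch is an illustration, not part of the original protocol. It assumes anhydrous Na2HPO4 (141.96 g/mol) and NaH2PO4 (119.98 g/mol). Hydrated salts, which are common in practice, require recalculation with their own molar masses, and the final pH should still be checked.

```python
# Assumed anhydrous molar masses (g/mol); hydrated salts need different masses.
MOLAR_MASS_G_PER_MOL = {"Na2HPO4": 141.96, "NaH2PO4": 119.98}

# SD-CAA composition as listed in the text.
PERCENT_WV = {
    "glucose": 2.0,
    "yeast nitrogen base w/o amino acids": 0.67,
    "Ura/Trp-dropout casamino acids": 0.062,
}
MILLIMOLAR = {"Na2HPO4": 38.0, "NaH2PO4": 62.0}


def grams_from_percent_wv(percent: float, volume_l: float) -> float:
    """% w/v means grams per 100 mL, i.e. percent * 10 grams per litre."""
    return percent * 10.0 * volume_l


def grams_from_millimolar(salt: str, mmol_per_l: float, volume_l: float) -> float:
    """Mass = concentration (mol/L) x volume (L) x molar mass (g/mol)."""
    return mmol_per_l / 1000.0 * volume_l * MOLAR_MASS_G_PER_MOL[salt]


def sd_caa_recipe(volume_l: float) -> dict:
    """Grams of each SD-CAA component for the requested volume."""
    recipe = {name: grams_from_percent_wv(p, volume_l) for name, p in PERCENT_WV.items()}
    recipe.update({salt: grams_from_millimolar(salt, c, volume_l)
                   for salt, c in MILLIMOLAR.items()})
    return recipe


if __name__ == "__main__":
    for component, grams in sd_caa_recipe(1.0).items():
        print(f"{component}: {grams:.2f} g")
```

For 1 L this gives 20.00 g glucose, 6.70 g yeast nitrogen base, 0.62 g dropout casamino acids, about 5.39 g Na2HPO4, and about 7.44 g NaH2PO4. Amounts for other volumes scale linearly.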

Abstract

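Disclosed are engineered cells for developing MHC neoantigens: engineered cells that can display trimers of a major histocompatibility complex and an antigenic peptide, nucleic acids encoding the trimers, and uses and preparation methods of the engineered cells. The engineered cells can be used for accurate, efficient, high-throughput screening of MHC-II target peptides and for the development of neoantigens and related vaccines and immunotherapies. Also disclosed are engineered cells that can display a major histocompatibility complex, nucleic acids encoding the MHC, and uses and preparation methods of those engineered cells. These engineered cells can be used for accurate and efficient screening of MHC-II target peptides and applied to the development of vaccines and T-cell immunotherapies.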

Description

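Engineered cells for developing MHC neoantigens
Technical Field
The present application relates to cell surface display technology, and in particular to an improved technical solution for displaying MHC-II, and trimers of MHC-II with antigenic peptides, on the surface of yeast cells.
Background
T cell-regulated immune responses play an important role in many diseases, including tumors, autoimmunity, transplant rejection, and infectious diseases. As the central regulators of adaptive immunity, CD4+ T cells are activated when their T-cell receptors (TCRs) specifically recognize antigenic or pathogen peptides bound and presented by major histocompatibility complex class II (MHC-II) molecules. MHC-II is a heterodimeric transmembrane protein composed of an α chain and a β chain, each containing two domains. These proteins capture antigenic peptides processed inside professional antigen-presenting cells (APCs) and present them on the APC surface for recognition by CD4+ T cells, initiating downstream immune responses. The peptide-binding groove of MHC-II, formed by the α1 and β1 domains, contains several pocket-like structures, and these "pockets" preferentially accommodate particular side chains of "anchor" residues within the register of the antigenic peptide. MHC-II is therefore somewhat promiscuous in the peptide registers it accepts; in other words, it can bind and present many different antigenic peptides. However, the complementary shapes of the anchors and pockets limit, to some extent, the range of peptides a given MHC-II can present. A thorough understanding of the peptide-binding function of MHC-II, and in turn of the residue preferences at each position of the peptide register, is therefore essential for tumor neoantigen discovery, autoimmune regulation, vaccine design, infectious disease prevention, and mitigation of transplant rejection.
Human MHC, also called human leukocyte antigen (HLA), is among the most polymorphic glycosylated transmembrane proteins known in nature. This polymorphism arises from the many naturally occurring HLA haplotypes, the gene isotypes (DR, DQ, DP), and thousands of alleles; moreover, the MHC protein expressed from each allele can bind and present a wide variety of peptide fragments. Characterizing the peptide-binding properties of the many MHC-II molecules and mapping peptide antigen epitopes is therefore highly challenging and calls for high-throughput, high-content genetic and protein engineering methods. In conventional approaches, some studies purify soluble recombinant MHC-II molecules from various expression systems (such as B-cell lines, insect cells, yeast, or Escherichia coli) and then measure how strongly these molecules bind different peptides. Other studies focus on evaluating peptide binding to MHC-II expressed on the APC surface. The peptides used in these methods are produced either by chemical solid-phase synthesis or by genetic methods such as phage display. Inefficient peptide synthesis, labor-intensive soluble protein preparation, lengthy binding assays, and non-quantitative readouts (such as peptide abundance) limit the efficiency and throughput of these methods for mapping specific binding epitopes across the large number of existing MHC-II alleles. As an alternative, eukaryotic cell surface display, with its advantages for directed evolution, can be used to express a given MHC-II allele so that peptide-binding assays can be performed on the cell surface. As a eukaryotic single-cell system, yeast offers simple molecular cloning and protein expression with eukaryotic post-translational modifications, which makes it a preferred platform for developing such a method.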
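Summary of the Invention
The present application provides engineered cells that can display a major histocompatibility complex and its trimers with antigenic peptides, nucleic acids encoding the complex and its trimers with antigenic peptides, and uses and preparation methods of the engineered cells. The engineered cells can be used for accurate, efficient, high-throughput screening of MHC-II target peptides and for affinity analysis of MHC-II with target peptides.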
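Specifically, the present application relates to:
1. A cell comprising a major histocompatibility complex (MHC) and a single-chain domain, wherein the complex comprises an α chain and a β chain, wherein
the α chain is linked to a first protein-binding domain to form a first fusion protein,
the β chain is linked to a second protein-binding domain to form a second fusion protein,
the single-chain domain binds the α chain and β chain non-covalently, the first fusion protein binds the second fusion protein to form the complex, and the complex and the single-chain domain together form a trimer, wherein
the first protein-binding domain binds the second protein-binding domain to enhance or promote formation of the trimer of the complex and the single-chain domain.
2. The cell according to item 1, wherein the first protein-binding domain binds the second protein-binding domain non-covalently; or the first protein-binding domain is covalently linked to the second protein-binding domain, the covalent bond preferably being a disulfide bond.
3. The cell according to item 1, wherein the cell is a yeast cell.
4. The cell according to item 1, wherein the single-chain domain is bound to the surface of the cell by covalent fusion with a molecule on the cell surface.
5. The cell according to item 4, wherein the α chain or the β chain is bound to the molecule on the cell surface through non-covalent binding to the single-chain domain.
6. The cell according to item 4, wherein the molecule is a protein.
7. The cell according to item 6, wherein the protein is endogenous to the cell.
8. The cell according to item 6, wherein the protein is Aga2p, a-agglutinin, α-agglutinin, flocculin, Cwp1p, Cwp2p, or Tip1p.
9. The cell according to item 6, wherein the protein is Aga2p.
10. The cell according to item 9, wherein the amino acid sequence of Aga2p is as shown in SEQ ID NO: 1.
11. The cell according to item 1, wherein the single-chain domain is a peptide 9 to 30 amino acids in length and comprises at least one contiguous 9-amino-acid register peptide (register) specifically recognized by the complex, or a register peptide variant (variant) that can be bound by the complex; preferably, the register peptide or variant thereof is selected from the amino acid sequences shown in any one of SEQ ID NO: 20-38.
12. The cell according to item 11, wherein the amino acid sequence of the single-chain domain is as shown in any one of SEQ ID NO: 2-6 or 46-77.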
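13. The cell according to item 1, wherein the first protein-binding domain is a first leucine zipper domain and the second protein-binding domain is a second leucine zipper domain, wherein the first and second leucine zipper domains, which together form a complete leucine zipper, can be used interchangeably in the first fusion protein and the second fusion protein; or
the first protein-binding domain is an FcA domain and the second protein-binding domain is an FcB domain, wherein the FcA domain and the FcB domain, which together form an FcAB dimer, can be used interchangeably in the first fusion protein and the second fusion protein.
14. The cell according to item 13, wherein the amino acid sequence of the first leucine zipper domain is as shown in SEQ ID NO: 7, SEQ ID NO: 8, SEQ ID NO: 9, or SEQ ID NO: 10;
correspondingly, the amino acid sequence of the second leucine zipper domain is as shown in SEQ ID NO: 8, SEQ ID NO: 7, SEQ ID NO: 10, or SEQ ID NO: 9.
15. The cell according to item 13, wherein the FcA domain and the FcB domain have the same or different amino acid sequences, preferably selected from any of the amino acid sequences shown in SEQ ID NO: 11-19;
preferably, the FcA domain and the FcB domain are capable of increasing the expression/display level and/or copy number of the complex;
preferably, the FcA domain comprises only a first CH3 domain and the FcB domain comprises only a second CH3 domain; or
preferably, the FcA domain comprises a first CH2 domain and a first CH3 domain, and the FcB domain comprises a second CH2 domain and a second CH3 domain;
further preferably, the first CH2 domain and the first CH3 domain of the first protein-binding domain are connected by a linker, and the second CH2 domain and the second CH3 domain of the second protein-binding domain are connected by a linker.
16. The cell according to item 15, wherein either or both of the FcA domain and the FcB domain comprise one or more amino acid modifications that achieve any one, two, or all three of the following:
(1) enhancing or stabilizing the FcA domain and/or the FcB domain itself ("FcS"),
(2) enhancing or stabilizing covalent or non-covalent binding between the FcA domain and the FcB domain ("FcM"),
(3) reducing or avoiding mispairing of the α chain and β chain caused by mispairing of the FcA domain and the FcB domain ("FcN").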
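17. The cell according to item 1, wherein the first protein-binding domain is linked to the C-terminus of the α chain and the second protein-binding domain is linked to the C-terminus of the β chain; preferably, the signal peptide directing expression of the first and second fusion proteins is SPP SP or AGA2 SP; further preferably, the amino acid sequence of AGA2 SP is as shown in SEQ ID NO: 39 and that of SPP SP is as shown in SEQ ID NO: 45.
18. The cell according to item 1, wherein the complex is an MHC class I molecule or an MHC class II molecule, preferably an HLA class I molecule or an HLA class II molecule.
19. The cell according to item 18, wherein the α chain is encoded by HLA-DRA*01 or another allele of the HLA-DRA family, and the β chain is encoded by HLA-DRB1*01, HLA-DRB1*03, HLA-DRB1*04, HLA-DRB1*15, or another allele of the HLA-DRB family; or
the α chain is encoded by HLA-DQA1*01, HLA-DQA1*03, HLA-DQA1*05, or another allele of the HLA-DQA1 family, and the β chain is encoded by HLA-DQB1*02, HLA-DQB1*03, HLA-DQB1*05, HLA-DQB1*06, or another allele of the HLA-DQB1 family; or
the α chain is encoded by HLA-DPA1*01:03, HLA-DPA1*02:02, or another allele of the HLA-DPA1 family, and the β chain is encoded by HLA-DPB1*01:01, HLA-DPB1*02:01, HLA-DPB1*04:01, HLA-DPB1*04:02, or another allele of the HLA-DPB1 family.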
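20. A nucleic acid encoding a major histocompatibility complex (MHC), wherein the complex comprises an α chain and a β chain, the α chain is linked to a first protein-binding domain and the β chain is linked to a second protein-binding domain, the α chain and β chain bind a single-chain domain non-covalently, and the first protein-binding domain binds the second protein-binding domain to enhance or promote formation of the trimer of the complex and the single-chain domain.
21. The nucleic acid according to item 20, wherein the encoded major histocompatibility complex is the major histocompatibility complex (MHC) according to any one of items 1 to 19.
22. A method for preparing a major histocompatibility complex (MHC), the method comprising:
transforming a cell with the nucleic acid according to item 20 or 21 and a nucleic acid encoding a single-chain domain,
and culturing the cell under conditions for expressing the complex according to any one of items 1 to 19.
23. The method according to item 22, wherein the cell is a yeast cell.
24. A method for identifying a peptide that binds a major histocompatibility complex (MHC), the method comprising:
i) causing cells according to any one of items 1 to 19 to display a library of single-chain domains,
and ii) detecting the trimer on the surface of single clones in the cell library displaying the single-chain domains according to any one of items 1 to 19, thereby identifying peptides that bind the complex according to any one of items 1 to 19.
25. The method according to item 24, wherein the single-chain domain comprises a peptide, a peptide mutant, a library of peptide mutants, or a mixture of peptides.
26. The method according to item 24 or 25, wherein the peptide, peptide mutant, library of peptide mutants, or mixture of peptides binds the complex non-covalently.
27. A register peptide or variant thereof capable of binding a major histocompatibility complex (MHC), wherein the register peptide or variant thereof is selected from the amino acid sequences shown in any one of SEQ ID NO: 20-38.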
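Items 11 and 27 describe a single-chain domain of 9 to 30 residues that contains at least one contiguous 9-residue register. The minimal sketch below enumerates candidate 9-mer registers within such a domain. The function is hypothetical: it only lists windows and does not predict which register a given MHC-II actually binds. The example uses the HA306-318 sequence of the indicator peptide described in Example 5 (PKYVKQNTLKLAT), whose core register YVKQNTLKL starts at the third residue.

```python
REGISTER_LENGTH = 9
MIN_DOMAIN_LENGTH, MAX_DOMAIN_LENGTH = 9, 30


def register_windows(peptide: str, core: int = REGISTER_LENGTH) -> list:
    """Return (offset, 9-mer) pairs for every contiguous register window."""
    if not MIN_DOMAIN_LENGTH <= len(peptide) <= MAX_DOMAIN_LENGTH:
        raise ValueError("single-chain domain must be 9-30 residues long")
    return [(i, peptide[i:i + core]) for i in range(len(peptide) - core + 1)]


if __name__ == "__main__":
    for offset, window in register_windows("PKYVKQNTLKLAT"):
        print(offset, window)  # offset 2 is the core register YVKQNTLKL
```

A 13-residue peptide yields five overlapping 9-mers. In a peptide library, each of these windows is a possible binding register, which is one reason why randomizing only the known anchor positions of a single register keeps the search focused.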
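The present application also relates to an engineered cell that can display a major histocompatibility complex (MHC), a nucleic acid encoding the MHC, and uses and preparation methods of the engineered cell. The engineered cell improves the sensitivity and accuracy of measuring the affinity between a target peptide and the yeast-displayed MHC-II protein. By relying on competitive binding of a test peptide and a reference peptide to the yeast-displayed MHC-II protein, it also enables rapid, efficient, high-throughput measurement.
The present application further relates to:
28. A cell comprising a major histocompatibility complex (MHC), wherein the complex comprises an α chain and a β chain, wherein
the α chain is linked to a first protein-binding domain to constitute a first fusion protein,
the β chain is linked to a second protein-binding domain to constitute a second fusion protein,
the first fusion protein binds the second fusion protein to form the complex,
and the first protein-binding domain and the second protein-binding domain enhance binding of the α chain and β chain to form the MHC.
29. The cell according to item 28, wherein the cell is a yeast cell.
30. The cell according to item 28, wherein the complex is bound to the surface of the cell by linking the α chain or the β chain to a molecule on the cell surface.
31. The cell according to item 30, wherein the other chain of the complex (the α chain or the β chain) is bound to the molecule on the cell surface through non-covalent binding to the β chain or the α chain, respectively.
32. The cell according to item 30, wherein the molecule is a protein.
33. The cell according to item 32, wherein the protein is endogenous to the cell.
34. The cell according to item 32, wherein the protein is Aga2p, a-agglutinin, α-agglutinin, flocculin, Cwp1p, Cwp2p, or Tip1p.
35. The cell according to item 32, wherein the protein is Aga2p.
36. The cell according to item 35, wherein the amino acid sequence of Aga2p is as shown in SEQ ID NO: 1.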
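37. The cell according to item 28, wherein the first protein-binding domain binds the second protein-binding domain non-covalently; or
the first protein-binding domain binds the second protein-binding domain covalently, the covalent binding preferably being via a disulfide bond.
38. The cell according to item 28, wherein
the first protein-binding domain and the second protein-binding domain form an FcAB domain;
preferably, the first protein-binding domain is an FcA domain or an FcB domain and the second protein-binding domain is an FcB domain or an FcA domain, wherein
further preferably, the FcA domain and the FcB domain are capable of increasing the expression/display level and/or copy number of the complex.
39. The cell according to item 38, wherein
the FcA domain comprises only a first CH3 domain and the FcB domain comprises only a second CH3 domain; or
the FcA domain comprises both a first CH2 domain and a first CH3 domain, and the FcB domain comprises both a second CH2 domain and a second CH3 domain;
further preferably, the first CH2 domain and the first CH3 domain of the first protein-binding domain are connected by a linker, and the second CH2 domain and the second CH3 domain of the second protein-binding domain are connected by a linker.
40. The cell according to item 39, wherein
the FcA domain and the FcB domain are each independently selected from any of the amino acid sequences shown in SEQ ID NO: 11-19, or have at least 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity with an amino acid sequence shown in SEQ ID NO: 11-19.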
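41. The cell according to item 39 or 40, wherein either or both of the FcA domain and the FcB domain comprise one or more amino acid modifications,
the modifications achieving any of the following:
enhancing or stabilizing the FcA domain and/or the FcB domain itself ("FcS"),
enhancing or stabilizing covalent or non-covalent binding between the FcA domain and the FcB domain ("FcM"),
reducing or avoiding mispairing of the α chain and β chain caused by mispairing of the FcA domain and the FcB domain ("FcN").
42. The cell according to any one of items 28 to 41, wherein the first protein-binding domain is linked to the C-terminus of the α chain and the second protein-binding domain is linked to the C-terminus of the β chain; preferably, the signal peptide directing expression of the first and second fusion proteins is SPP SP or AGA2 SP; further preferably, the amino acid sequence of AGA2 SP is as shown in SEQ ID NO: 39 and that of SPP SP is as shown in SEQ ID NO: 45.
43. The cell according to any one of items 28 to 42, wherein the complex is an MHC class I molecule or an MHC class II molecule, preferably an HLA class I molecule or an HLA class II molecule.
44. The cell according to item 43, wherein
the α chain is encoded by HLA-DRA*01 or another allele of the HLA-DRA family, and the β chain is encoded by HLA-DRB1*01, HLA-DRB1*03, HLA-DRB1*04, HLA-DRB1*15, or another allele of the HLA-DRB family; or
the α chain is encoded by HLA-DQA1*01, HLA-DQA1*03, HLA-DQA1*05, or another allele of the HLA-DQA1 family, and the β chain is encoded by HLA-DQB1*02, HLA-DQB1*03, HLA-DQB1*05, HLA-DQB1*06, or another allele of the HLA-DQB1 family; or
the α chain is encoded by HLA-DPA1*01:03, HLA-DPA1*02:02, or another allele of the HLA-DPA1 family, and the β chain is encoded by HLA-DPB1*01:01, HLA-DPB1*02:01, HLA-DPB1*04:01, HLA-DPB1*04:02, or another allele of the HLA-DPB1 family.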
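45. A nucleic acid encoding a major histocompatibility complex (MHC), wherein the complex comprises an α chain and a β chain, the α chain is linked to a first protein-binding domain and the β chain is linked to a second protein-binding domain, the first and second protein-binding domains can bind each other to form the complex, and the first and second protein-binding domains form an FcAB dimer that enhances binding of the α chain and β chain to form the MHC.
46. The nucleic acid according to item 45, wherein the encoded major histocompatibility complex is the major histocompatibility complex (MHC) according to any one of items 1 to 17.
47. A method for preparing a major histocompatibility complex (MHC), the method comprising:
transforming a cell with the nucleic acid according to item 45 or 46,
and culturing the cell under conditions for expressing the complex according to any one of items 28 to 44.
48. The method according to item 47, wherein the cell is a yeast cell.
49. A method for identifying a peptide that binds a major histocompatibility complex (MHC), the method comprising:
i) contacting the cell according to any one of items 28 to 44 with a peptide,
and ii) detecting binding of the peptide to the complex according to any one of items 28 to 44, thereby identifying peptides that bind the complex.
50. The method according to item 49, wherein the peptide comprises a mixture of peptides.
51. The method according to item 49 or 50, wherein the peptide competes with a reference peptide for binding to the complex.
Brief Description of the Drawings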
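Figure 1: Design of the CODAAH system. The peptide fused to a-agglutinin subunit 2 (Aga2p) carries an HA epitope tag at its N-terminus and a V5 epitope tag at its C-terminus. Peptide expression and cell-surface display levels can be detected by immunofluorescence staining with an antibody specific for either tag (for example, indirect labeling with an anti-HA antibody). The extracellular domains of the MHC-II α and MHC-II β chains, carrying C-terminal dimerization modifications, are expressed from a separate secretory vector, independent of the peptide display vector. Using the classic yeast display method, the peptide is fused to the a-agglutinin protein secreted and assembled by yeast (in which the Aga1p and Aga2p subunits are joined by two disulfide (-S-S-) bonds formed from -SH groups) and anchored on the cell surface. The C-terminal dimerization enhancement increases the assembly efficiency of the MHC-II α/β heterodimer, which is anchored on the cell surface through non-covalent binding to its specific antigenic peptide, achieving yeast co-display of the peptide/MHC-II α/β trimer. The relative fluorescence levels of the different fluorophores conjugated to the anti-tag and anti-MHC-II reagents indicate the degree to which the specifically binding peptides are saturated with bound MHC-II. EBY100: the parental yeast strain used for yeast display. GPI: glycosylphosphatidylinositol, which anchors the yeast endogenous protein expressed from the yeast genome.
Figure 2: Schematic of the target gene region of an optional plasmid, in which the target gene is the DR1 allele (HLA-DRA*01:01/DRB1*01:01). With this plasmid, yeast can express the extracellular domains of the MHC-II complex. The C-termini of MHC-II are modified with leucine zipper fragments (LZA or LZB) to enhance dimerization. Each expression cassette comprises a GAL1 or GAL10 promoter in the orientation shown, a fusion protein of one HLA-II chain carrying a C-terminal modification, and an MFα terminator.
Figure 3: Schematic, similar to Figure 2, of the target gene region of an optional plasmid, in which the target gene is the DR4 allele (HLA-DRA*01:01/DRB1*04:01).
Figure 4: Schematic, similar to Figure 2, of the target gene region of an optional plasmid, in which the target gene is the DQ6 allele (HLA-DQA1*01:02/DQB1*06:02).
Figure 5: Schematic, similar to Figure 2, of the target gene region of an optional plasmid, in which the target gene is the DR1 allele (HLA-DRA*01:01/DRB1*01:01). The C-termini of MHC-II are modified with antibody crystallizable fragments (FcA or FcB) to enhance dimerization.
Figure 6: Schematic, similar to Figure 5, of the target gene region of an optional plasmid, in which the target gene is the DR4 allele (HLA-DRA*01:01/DRB1*04:01).
Figure 7: Schematic, similar to Figure 5, of the target gene region of an optional plasmid, in which the target gene is the DQ6 allele (HLA-DQA1*01:02/DQB1*06:02).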
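Figure 8: Four types of yeast strains were each subjected to immunofluorescence labeling of surface DR or DQ protein conformation and to flow cytometry: strains secreting HLA-II only (Background control), displaying LZ-modified HLA-II (labeled HLA-II-LZ), displaying FcAB-modified HLA-II (labeled HLA-II-Fc), and displaying FcAB-modified HLA-II with extracellular delivery enhanced by the AGA2 SP signal peptide (labeled HLA-II-Fc (w/AGA2 SP)). Flow cytometry results are shown as histograms, with the HLA-II α/β alleles and the median fluorescence intensity (MFI) of DR or DQ protein under each condition indicated. The yeast display level of HLA-II-LZ or HLA-II-Fc can be quantified by the signal-to-background difference (ΔMFI = MFI_LZ - MFI_BC or MFI_Fc - MFI_BC).
Figure 9: Six types of yeast strains were each subjected to immunofluorescence co-labeling of the surface HA tag and DR protein conformation and to flow cytometry: strains secreting DR1 only (no pep + DR background control); displaying the DR-specific peptide only (+HA308-316 no DR control); displaying the HA308-316 peptide and secreting LZ-modified DR1-LZ; displaying the HA308-316 peptide and secreting Fc-modified DR1 with extracellular delivery enhanced by the AGA2 SP signal peptide; displaying the HA308-316 peptide and secreting LZ-modified DR4-LZ; and displaying the HA308-316 peptide and secreting Fc-modified DR4 with extracellular delivery enhanced by the AGA2 SP signal peptide. Flow cytometry results are shown as dot plots (SSC side scatter vs. HA tag) or histograms. The dot plots indicate the percentage of the gated HA+ cell subpopulation, and the histograms indicate the median fluorescence intensity (MFI) of DR α/β protein in the HA+ subpopulation. DR display levels can be quantified as in Figure 8.
Figure 10: The parental yeast library displaying HA308-316 peptide variants and secreting DR1 (+pep) was subjected to immunofluorescence co-labeling of the surface HA tag and DR protein conformation and to flow cytometric sorting. The HA+/DR+ double-positive populations from four rounds of sorting (S1, S2, S3, S4), and the HA-/DR- double-negative background control used in each round (yeast secreting DR1 only, no pep), are shown in HA tag vs. DR protein dot plots. In addition, the SSC side scatter vs. HA tag dot plots indicate the percentage of the gated HA+ cell subpopulation, and the histograms indicate the median fluorescence intensity (MFI) of DR α/β protein in the HA+ subpopulation. The DR display level, which increases with each round, can be quantified as in Figure 8.
Figure 11: Yeast colony PCR and sequencing of the target peptide region for 10 single clones selected from the yeast sub-library displaying HA308-316 peptide variants and secreting DR1. The results are compared with the library-construction DNA primers and with the wild-type gene and its peptide amino acid sequence. Except for #1 and #4, the sequencing chromatograms show clear overlapping peaks, indicating that a single yeast cell can harbor multiple recombinant plasmids formed by transformation. Therefore, although the dominant-codon DNA of the peptide variants of clone #5 and clone #7, and the products corresponding to the dominant codons, are the same, it cannot be concluded that the peptide variant display plasmids obtained in these two yeast clones are completely identical.
Figure 12: For four selected single clones (#1, #7, #9, #10), yeast plasmids were extracted and transformed into E. coli. Plasmids from one to three E. coli colonies per yeast clone were then extracted and their target peptide region sequenced. The results are compared with the wild-type gene and its peptide amino acid sequence. Selected peptide display plasmids were re-transformed into yeast to construct strains that display HA308-316 peptide variants and secrete DR1, which were subjected to immunofluorescence co-labeling of the surface HA tag and DR protein conformation and to flow cytometry. Flow cytometry results are shown as dot plots (SSC side scatter vs. HA tag) or histograms. The dot plots indicate the percentage of the gated HA+ cell subpopulation, and the histograms indicate the median fluorescence intensity (MFI) of DR α/β protein in the HA+ subpopulation. DR display levels can be quantified as in Figure 8.
Figure 13: The display plasmid corresponding to peptide #1 was transformed into yeast to construct a strain that displays peptide #1 and secretes DR4. This strain was subjected to immunofluorescence co-labeling of the surface HA tag and DR protein conformation and to flow cytometry. Flow cytometry results are shown as dot plots (SSC side scatter vs. HA tag) or histograms. The dot plots indicate the percentage of the gated HA+ cell subpopulation, and the histograms indicate the median fluorescence intensity (MFI) of DR α/β protein in the HA+ subpopulation. DR display levels can be quantified as in Figure 8.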
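Figure 14: EBY100 yeast secreting soluble HLA-II (background control, or BC), displaying LZ-modified HLA-II (W/LZ), or displaying FcAB-modified HLA-II (W/Fc) were subjected to immunofluorescence labeling of surface DR or DQ protein levels and to flow cytometry. Flow cytometry results are shown as histograms, with the HLA-II α/β alleles and the median fluorescence intensity of DR or DQ protein under each condition indicated. The yeast display level of HLA-II-LZ or HLA-II-Fc can be quantified by the signal-to-background difference (ΔMFI = MFI_LZ - MFI_BC or MFI_Fc - MFI_BC).
Figure 15: EBY100 yeast secreting soluble HLA-II (BC), displaying LZ-modified HLA-II (W/LZ), or displaying FcAB-modified HLA-II delivered extracellularly under the AGA2 signal peptide (W/Fc) were subjected to immunofluorescence labeling of surface DR protein levels and to flow cytometry. Flow cytometry results are shown as histograms, with the DR α/β alleles and the median fluorescence intensity of DR protein under each condition indicated. The yeast display level of DR-LZ or DR-Fc can be quantified by the signal-to-background difference (ΔMFI = MFI_LZ - MFI_BC or MFI_Fc - MFI_BC).
Figure 16: EBY100 yeast displaying LZ-modified DR1 (DR-LZ) or displaying FcAB-modified DR1 delivered extracellularly under the AGA2 signal peptide (DR1-Fc) were subjected to immunofluorescence co-labeling of surface DR protein and indicator peptide levels and to flow cytometry. Flow cytometry results are shown as dot plots or histograms. The figure shows the conditions without peptide (no peptide) or with the positive indicator peptide (Bio-HA306-318) or the negative control peptide (Bio-aI), the gating of cell populations in each plot, and the median fluorescence intensity of DR protein (analyzed in the DR-positive yeast subpopulation). The biotin level of the biotinylated indicator peptide bound on the yeast surface can be quantified by the signal-to-background difference (ΔMFI = MFI_Bio-pep - MFI_no peptide).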
图17EBY100酵母展示LZ修饰DR1(DR-LZ)或展示FcAB修饰并由AGA2信号肽引导胞外递送展示DR1(DR1-Fc),分别与不同浓度的生物素化指示肽(Bio-HA306-318)结合并进行表面DR蛋白水平和指示肽水平的免疫荧光共标记和流式细胞测定。流式细胞检测结果如图16方式分析后进行单点结合动力学曲线拟合。拟合曲线计算的表观平衡解离常数KD,app如图所示。
图18EBY100酵母展示LZ修饰DR1(DR-LZ)或展示FcAB修饰并由AGA2信号肽引导胞外递送展示DR1(DR1-Fc),分别进行表面DR蛋白水平和指示肽水平的免疫荧光共标记和流式细胞测定。流式细胞检测结果以Dot-plot或histogram方式显示。其中,各条件下减(no indicator)加指示肽(Bio-HA306-318)及减(no competitor)加过量阳性竞争肽(10倍HA306-318)或阴性竞争肽(10倍aI),各图细胞群门控分析,以及DR蛋白荧光强度中位数(选取DR阳性酵母亚群分析)分别如图所示。酵母表面结合的生物素化指示肽的生物素水平可由信噪差(ΔMFI=MFI_Bio-pep-MFI_no indicator)定量分析。
图19EBY100酵母展示LZ修饰DR1(DR-LZ)或展示FcAB修饰并由AGA2信号肽引导胞外递送展示DR1(DR1-Fc),分别与10μM生物素化指示肽(Bio-HA306-318)及不同浓度竞争肽(HA306-318)结合并进行表面DR蛋白水平和指示肽水平的免疫荧光共标记和流式细胞测定。流式细胞检测结果如图18方式分析后进行单点竞争结合动力学曲线拟合。拟合曲线计算的IC50如图所示。
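The readouts in Figures 9 to 19 reduce to a few simple operations: gating a positive population, taking its median fluorescence intensity (MFI), subtracting a background MFI (ΔMFI), and fitting a one-site binding curve to obtain K_D,app. The Python sketch below illustrates those steps under stated assumptions: events are represented as hypothetical dicts of channel intensities, the gate is a single threshold, and K_D,app is fitted to the equilibrium model B = Bmax·[L]/(K_D + [L]) by a log-spaced grid search. All function names are hypothetical; the gating strategy and fitting software actually used for the figures are not specified in the source.

```python
from statistics import median


def gate_positive(events, channel, threshold):
    """Keep events whose intensity in `channel` exceeds `threshold` (simple 1-D gate)."""
    return [e for e in events if e[channel] > threshold]


def percent_positive(events, channel, threshold):
    """Percentage of all events that fall inside the positive gate."""
    if not events:
        return 0.0
    return 100.0 * len(gate_positive(events, channel, threshold)) / len(events)


def mfi(events, channel):
    """Median fluorescence intensity (MFI) of `channel` in a population."""
    return median(e[channel] for e in events)


def delta_mfi(mfi_sample, mfi_background):
    """Signal-to-background difference, e.g. MFI_Fc - MFI_BC."""
    return mfi_sample - mfi_background


def one_site(conc, bmax, kd):
    """Equilibrium one-site binding model: B = Bmax * [L] / (KD + [L])."""
    return bmax * conc / (kd + conc)


def fit_kd_app(concs, signals, lo=1e-3, hi=1e3, steps=2001):
    """Fit KD,app by grid search over log-spaced KD values.

    For each candidate KD the model is linear in Bmax, so Bmax is solved by
    least squares and the KD with the smallest squared error is kept.
    Concentrations and KD share whatever unit the caller uses (e.g. uM).
    """
    best = None
    for i in range(steps):
        kd = lo * (hi / lo) ** (i / (steps - 1))
        f = [c / (kd + c) for c in concs]
        bmax = sum(y * x for y, x in zip(signals, f)) / sum(x * x for x in f)
        sse = sum((y - bmax * x) ** 2 for y, x in zip(signals, f))
        if best is None or sse < best[0]:
            best = (sse, kd, bmax)
    return best[1], best[2]
```

In practice one would pass the ΔMFI values (indicator-peptide MFI minus the no-peptide MFI of the DR-positive gate) at each indicator concentration as `signals`. A grid search is used only to keep the sketch dependency-free; a nonlinear least-squares routine would serve equally well.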
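Detailed Description
This application provides a scheme in which yeast co-express an antigen peptide gene and an MHC-II allele and co-display the expressed protein products, i.e., the peptide/MHC-II trimer, on the yeast surface. The scheme not only enables yeast to co-express complex MHC-II alleles but also solves the problem of low co-display levels of the peptide/MHC-II trimer. It greatly improves the efficiency of neoantigen development involving MHC-II/antigen peptides, offers a new approach to developing vaccines of various kinds, including oral vaccines, and provides a solution for target development in precision immunotherapy.
Specifically, the scheme uses a yeast co-display system (Figure 1), named here CODAAH (CO-Display of Antigen-ligand and Associated HLA-II). To co-display the peptide fragment and the MHC-II complex on the cell surface as a trimer, the peptide fragment gene is first genetically fused so that its protein product is tethered to the yeast surface. Because the peptide fragment is derived from a T cell antigen, the MHC-II complex, secreted in soluble form, can bind the yeast-displayed peptide non-covalently, achieving co-display of the peptide/MHC-II trimer on the yeast surface. By modifying the C-termini of these MHC-II proteins, including fusion of leucine zipper fragments or of antibody heavy-chain crystallizable fragments (Fc), display of the peptide/MHC-II trimer on the surface of Saccharomyces cerevisiae is significantly increased.
Definitions
Although various embodiments and aspects of the invention are shown and described herein, it will be apparent to those skilled in the art that these embodiments and aspects are provided by way of example only. Numerous variations, changes, and substitutions will now occur to those skilled in the art without departing from the invention. It should be understood that various alternatives to the embodiments of the invention described herein may be employed in practicing the invention.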
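As used herein, percent "identity", e.g., 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 98.5%, 99%, or 99.5% identity, means that the degree of similarity between amino acid sequences or between nucleotide sequences, as determined by sequence alignment, is 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 98.5%, 99%, or 99.5%, respectively. For example, after the two sequences have been aligned so that identical residues occupy as many positions as possible (e.g., by introducing gaps), percent identity is the number of positions with identical bases or amino acid residues divided by the total number of positions. Percent identity can be determined with software programs known in the art, preferably using default parameters. A preferred alignment program is BLAST, and the preferred programs are BLASTN and BLASTP. Details of these programs are available at ncbi.nlm.nih.gov/cgi-bin/BLAST.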
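The ratio in this definition (identical positions over total aligned positions) can be computed once an alignment exists. The minimal sketch below assumes the two sequences are already aligned to equal length with "-" as the gap character; it does not perform the alignment itself (that is the job of BLAST or a similar tool), and the convention that a one-sided gap counts as a compared position is an assumption, since tools differ in the denominator they report. The variant "YVKQNALKL" is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two sequences that are already aligned (equal length).

    Columns where both sequences have a gap are ignored; columns with a gap in
    only one sequence count as compared positions but never as matches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap and b == gap:
            continue
        compared += 1
        if a == b:  # both-gap columns were skipped, so equal symbols are residues
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# A register and a hypothetical single-substitution variant: 8 of 9 positions match.
print(round(percent_identity("YVKQNTLKL", "YVKQNALKL"), 2))
```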
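As used herein, "complementarity" of nucleic acids refers to the ability of one nucleic acid to form hydrogen bonds with another nucleic acid by conventional Watson-Crick base pairing. Percent complementarity indicates the percentage of residues in a nucleic acid molecule that can form hydrogen bonds (i.e., Watson-Crick base pairs) with a second nucleic acid molecule (e.g., about 5, 6, 7, 8, 9, or 10 out of 10 corresponds to about 50%, 60%, 70%, 80%, 90%, and 100% complementarity, respectively). "Perfectly complementary" means that all contiguous residues of a nucleic acid sequence form hydrogen bonds with the same number of contiguous residues in a second nucleic acid sequence. As used herein, "substantially complementary" refers to a degree of complementarity of at least about any of 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% over a region of about 40, 50, 60, 70, 80, 100, 150, 200, 250, or more nucleotides, or to two nucleic acids that hybridize under stringent conditions. For a single base or nucleotide, A paired with T or U, and C paired with G or I, is called complementary or matched, and vice versa; all other base pairings are non-complementary. In this application, the "complementary polynucleotide sequence" of a given polynucleotide sequence is the polynucleotide sequence that is fully complementary to it.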
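Percent complementarity as defined above can be illustrated by laying two strands antiparallel and counting positions that form the allowed pairs (A with T or U, C with G or I). The sketch below is a minimal illustration assuming both sequences are written 5' to 3' and have equal length; the function names and example sequences are hypothetical, and it does not model hybridization under stringent conditions.

```python
# Base pairs counted as complementary in the definition above:
# A with T or U, and C with G or I (inosine), in either orientation.
_PAIRS = {("A", "T"), ("A", "U"), ("C", "G"), ("C", "I")}


def is_complementary(x: str, y: str) -> bool:
    """True if the two bases form one of the allowed Watson-Crick pairs."""
    return (x, y) in _PAIRS or (y, x) in _PAIRS


def percent_complementarity(strand: str, other: str) -> float:
    """Percent of residues in `strand` that pair with `other` laid antiparallel.

    Both sequences are written 5'->3' and must have equal length, so residue i
    of `strand` is compared with residue -(i+1) of `other`.
    """
    if len(strand) != len(other):
        raise ValueError("sequences must have the same length")
    if not strand:
        return 0.0
    pairs = zip(strand.upper(), reversed(other.upper()))
    paired = sum(is_complementary(a, b) for a, b in pairs)
    return 100.0 * paired / len(strand)


# 8 of 10 positions pair -> 80% complementary, matching the "8 of 10" example above.
print(percent_complementarity("ACGTACGTAC", "AAACGTACGT"))
```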
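As used herein, a "conservative substitution variant" of a protein, polypeptide, or amino acid sequence is one in which one or more amino acid residues have been substituted without changing the overall conformation and function of the protein or enzyme, including but not limited to substitution of amino acids in the amino acid sequence of the parent protein in the manner described above for "conservative substitutions". Accordingly, the similarity between two proteins or amino acid sequences of similar function may vary, e.g., from 70% to 99% similarity (identity) based on the MEGALIGN algorithm. "Conservative substitution variants" also include polypeptides or enzymes that have more than 60% amino acid identity as determined by the BLAST or FASTA algorithm, more preferably more than 75%, even more preferably more than 85%, and most preferably more than 90%, and that have the same or substantially similar properties or functions as the native or parent protein or enzyme.
Herein, "encoding" means that i) a DNA sequence contains genetic information that can be transcribed into an RNA molecule, and/or ii) an RNA molecule contains genetic information that can be translated into an amino acid sequence. Thus, as used herein, "coding sequence" may refer to the ribonucleotide (RNA) sequence, or a fragment thereof, in a pre-mRNA or mature mRNA that can be translated into protein, or to the complement, or a fragment thereof, of the deoxyribonucleotide (DNA) sequence that serves as the template for transcription of the pre-mRNA or mature mRNA. In addition, a "coding sequence" in this application may further include polynucleotide sequences encoding proteins, functional nucleic acids, or fragments thereof, such as miRNA, shRNA, dsRNA, guide RNA, poly(A) tails, 5'UTRs, and 3'UTRs. A DNA molecule containing genetic information that can be transcribed into an RNA molecule is called the "encoding nucleic acid" of that RNA molecule; an RNA molecule containing genetic information that can be translated into an amino acid sequence is called the "encoding nucleic acid" of that amino acid sequence.
The term "MHC-II" or "MHC-II complex" as used herein refers to major histocompatibility complex class II (MHC-II) molecules typically found on antigen-presenting cells, including dendritic cells, B cells, and phagocytes. MHC-II complexes help regulate the immune system, for example by presenting antigenic peptides. An MHC-II complex comprises two human leukocyte antigen (HLA) proteins, referred to herein as the "alpha chain" or "α chain" and the "beta chain" or "β chain", which associate to form a heterodimer. The α1 region of the α chain and the β1 region of the β chain come together to form the peptide-binding domain. The α2 and β2 regions lie closer to the cell membrane and form immunoglobulin-like domains. The N-terminal regions of the α and β chains include the α1 and β1 regions, and the C-terminal regions include the α2 and β2 regions. The α and β chains are typically attached to the cell surface through transmembrane domains.
The term "human leukocyte antigen" or "HLA" refers to the group of proteins encoded by the human major histocompatibility complex (MHC) gene complex. The HLA genes encoding these proteins have different alleles, giving the gene products different functions. HLA proteins of MHC class II (e.g., HLA-DP, HLA-DQ, HLA-DR) typically present peptides derived from extracellular antigens. The MHC class II proteins, including HLA-DP, HLA-DQ, and HLA-DR, are heterodimeric cell-surface receptors composed of an α chain and a β chain.
The terms "HLA-DR1, DR4, DR15 α chain" or "HLA-DR1, DR4, DR15 α chain protein" as provided herein include any recombinant or naturally occurring form of the human leukocyte antigen (HLA) HLA-DR α chain protein, also known as MHC class II HLA-DRA, or variants or homologs thereof that maintain HLA-DR α chain protein activity (e.g., within at least 50%, 80%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% of the activity of HLA-DRA*01:01). In some aspects, the variants or homologs have at least 90%, 95%, 96%, 97%, 98%, 99%, or 100% amino acid sequence identity across the whole sequence or a portion of the sequence (e.g., a 50, 100, 150, or 200 contiguous amino acid portion) compared with a naturally occurring HLA-DR4 α chain polypeptide. In embodiments, the HLA-DR α chain is the protein identified by UniProt reference P01903, or a homolog or functional fragment thereof.
The terms "HLA-DR β chain" or "HLA-DR β chain protein" as provided herein include any recombinant or naturally occurring form of the human leukocyte antigen (HLA) HLA-DR β chain protein, which may be the protein expressed by MHC class II HLA-DRB1*01:01, HLA-DRB1*04:01, HLA-DRB1*15:01, or another allele of the HLA-DRB family, or variants or homologs thereof that maintain HLA-DR1, DR4, DR15, or other DR β chain protein activity (e.g., within at least 50%, 80%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% of the activity of the β chain of the protein expressed by HLA-DRB1*01:01, HLA-DRB1*04:01, HLA-DRB1*15:01, or another HLA-DRB allele). In some aspects, the variants or homologs have at least 90%, 95%, 96%, 97%, 98%, 99%, or 100% amino acid sequence identity across the whole sequence or a portion of the sequence (e.g., a 50, 100, 150, or 200 contiguous amino acid portion) compared with a naturally occurring HLA-DR4 β chain polypeptide. In embodiments, the HLA-DR4 β chain is the protein identified by UniProt reference P13762, or a homolog or functional fragment thereof.
The terms "HLA-DQ6 α chain" or "HLA-DQ6 α chain protein" as provided herein include any recombinant or naturally occurring form of the human leukocyte antigen (HLA) DQ α1 chain (HLA-DQ6 α chain), also known as DC-1 α chain, DC-α, HLA-DCA, or MHC class II DQA1, or variants or homologs thereof that maintain HLA-DQ6 α chain protein activity (e.g., within at least 50%, 80%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% of the activity of the HLA-DQ6 α chain). In some aspects, the variants or homologs have at least 90%, 95%, 96%, 97%, 98%, 99%, or 100% amino acid sequence identity across the whole sequence or a portion of the sequence (e.g., a 50, 100, 150, or 200 contiguous amino acid portion) compared with a naturally occurring HLA-DQ6 α chain polypeptide. In embodiments, the HLA-DQ6 α chain is the protein identified by UniProt reference P01909, or a homolog or functional fragment thereof.
The terms "HLA-DQ6 β chain" or "HLA-DQ6 β chain protein" as provided herein include any recombinant or naturally occurring form of the human leukocyte antigen (HLA) DQ β1 chain protein (HLA-DQ6 β chain), also known as MHC class II DQB1 or HLA-DQB1, or variants or homologs thereof that maintain HLA-DQ6 β chain protein activity (e.g., within at least 50%, 80%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% of the activity of the HLA-DQ6 β chain). In some aspects, the variants or homologs have at least 90%, 95%, 96%, 97%, 98%, 99%, or 100% amino acid sequence identity across the whole sequence or a portion of the sequence (e.g., a 50, 100, 150, or 200 contiguous amino acid portion) compared with a naturally occurring HLA-DQ6 β chain polypeptide. In embodiments, the HLA-DQ6 β chain is the protein identified by UniProt reference P01920, or a homolog or functional fragment thereof.
For the specific proteins described herein, a named protein includes any naturally occurring form, variant, or homolog of the protein that retains the protein's activity (e.g., within at least 50%, 80%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% of the activity of the native protein). In some embodiments, the variant or homolog has at least 90%, 95%, 96%, 97%, 98%, 99%, or 100% amino acid sequence identity across the whole sequence or a portion of the sequence (e.g., a 50, 100, 150, or 200 contiguous amino acid portion) compared with the naturally occurring form. In other embodiments, the protein is the protein identified by its NCBI sequence reference, or a homolog or functional fragment thereof.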
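"Contacting" is used in accordance with its plain and ordinary meaning and refers to the process of allowing at least two distinct species (e.g., a peptide and an MHC-II complex) to become sufficiently proximal to react, interact, or physically touch. It should be understood that the resulting reaction product may be produced directly from a reaction between the added reagents, or from an intermediate of one or more of the added reagents that can be produced in the reaction mixture. The term "contacting" may include allowing two species to react, interact, or physically touch, where the two species may be, for example, a nucleic acid provided herein and a cell. In embodiments, contacting includes, for example, allowing a nucleic acid described herein to enter a cell.
As used herein, "cell" refers to a cell that carries out metabolic or other functions sufficient to preserve or replicate its genomic DNA. Cells can be identified by methods well known in the art, including, for example, the presence of an intact membrane, staining by a particular dye, or the ability to produce progeny. Cells may include prokaryotic and eukaryotic cells. Prokaryotic cells include but are not limited to bacteria. Eukaryotic cells include but are not limited to yeast cells and cells derived from plants and animals, such as mammalian, insect (e.g., Spodoptera), and human cells.
The term "recombinant", when used with reference to, e.g., a cell, nucleic acid, protein, or vector, indicates that the cell, nucleic acid, protein, or vector has been modified by the introduction of a heterologous nucleic acid or protein or by the alteration of a native nucleic acid or protein, or that the cell is derived from a cell so modified. For example, a recombinant protein is a protein produced from a recombinant nucleic acid molecule. A nucleic acid molecule may include genetic material from multiple sources and thus include non-naturally occurring sequences. Recombinant DNA can be produced by methods known in molecular biology or by synthetic methods. Thus, for example, recombinant cells express genes not found in the native (non-recombinant) form of the cell, or express native genes that are otherwise abnormally expressed, under-expressed, or not expressed at all. Transgenic cells and plants are those that express a heterologous gene or coding sequence, typically as a result of recombinant methods.
The term "heterologous", when used with reference to portions of a nucleic acid, indicates that the nucleic acid comprises two or more subsequences that are not found in the same relationship to each other in nature. For example, the nucleic acid is typically produced recombinantly and has two or more sequences from unrelated genes arranged to make a new functional nucleic acid, e.g., a promoter from one source and a coding region from another source. Similarly, a heterologous protein indicates that the protein comprises two or more subsequences that are not found in the same relationship to each other in nature (e.g., a fusion protein).
The term "exogenous" refers to a molecule or substance (e.g., a compound, nucleic acid, or protein) that originates from outside a given cell or organism. For example, an "exogenous promoter" as referred to herein is a promoter that does not originate from the cell or organism in which it is expressed. Conversely, the terms "endogenous" or "endogenous promoter" refer to a molecule or substance that is native to, or originates within, a given cell or organism. For example, an endogenous yeast protein is a protein naturally expressed by the yeast cell. A nucleic acid encoding an endogenous protein can be introduced into a cell to allow expression of the protein; for example, a nucleic acid encoding Aga2p can be introduced into yeast cells so that Aga2p is expressed.
The term "expression" includes any step involved in the production of a polypeptide, including but not limited to transcription, post-transcriptional modification, translation, post-translational modification, and secretion. Expression can be detected with conventional techniques for detecting proteins (e.g., ELISA, Western blotting, flow cytometry, immunofluorescence, immunohistochemistry).
"Biological sample" or "sample" refers to material obtained or derived from a subject or patient. Biological samples include tissue sections, such as biopsy and autopsy samples, and frozen sections taken for histological purposes. Such samples include body fluids such as blood and blood fractions or products (e.g., serum, plasma, platelets, red blood cells), sputum, tissue, cultured cells (e.g., primary cultures, explants, and transformed cells), stool, urine, synovial fluid, joint tissue, synovial tissue, synovial cells, immune cells, hematopoietic cells, fibroblasts, macrophages, T cells, etc. Biological samples are typically obtained from eukaryotic organisms, such as mammals, e.g., primates such as chimpanzees or humans; cows; dogs; cats; rodents, e.g., guinea pigs, rats, or mice; rabbits; birds; reptiles; or fish.
"Control" or "standard control" refers to a sample, measurement, or value that serves as a reference, usually a known reference, for comparison with a test sample, measurement, or value. For example, a test sample may be taken from a patient suspected of having a given disease and compared with a known normal (non-diseased) individual (e.g., a standard control subject). A standard control may also represent an average measurement or value gathered from a population of similar individuals without the given disease (i.e., a standard control population), e.g., healthy individuals with a similar medical background, the same age, and the same weight. A standard control value may also be obtained from the same individual, e.g., from a sample obtained from the patient earlier, before onset of the disease. For example, controls can be designed to compare therapeutic benefit based on pharmacological data (e.g., half-life) or therapeutic measures (e.g., comparison of side effects). Controls are also valuable for determining the significance of data: for example, if values for a given parameter vary widely in controls, variation in test samples will not be considered significant. Those skilled in the art will recognize that standard controls can be designed to assess any number of parameters (e.g., RNA levels, protein levels, specific cell types, specific body fluids, specific tissues, etc.).
Those skilled in the art will understand which standard controls are most appropriate in a given situation and will be able to analyze data based on comparison with standard control values. Standard controls are also valuable for determining the significance (e.g., statistical significance) of data.
It should be understood that this application encompasses the various aspects and embodiments described herein and combinations of those aspects and/or embodiments. The foregoing description and the following examples are intended to illustrate, not to limit, the scope of this application. Other aspects, improvements, and modifications within the scope of this application will be apparent to those skilled in the art to which this application pertains. Accordingly, those of ordinary skill in the art will recognize that the scope of this application also includes such improvements and modifications of the described aspects and embodiments.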
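Cells
In one aspect, this application provides an engineered cell comprising a major histocompatibility complex (MHC) and a single-chain domain, wherein the complex comprises an α chain and a β chain, wherein:
the α chain is linked to a first protein-binding domain to form a first fusion protein;
the β chain is linked to a second protein-binding domain to form a second fusion protein;
the single-chain domain non-covalently binds the α chain and the β chain, and the first fusion protein binds the second fusion protein so that the α and β chains form the complex, whereby the complex and the single-chain domain together form a trimer, wherein the first protein-binding domain binds the second protein-binding domain to enhance or promote formation of the trimer formed by the complex and the single-chain domain.
In some embodiments, the cell is a eukaryotic cell. In some embodiments, the cell is derived from a unicellular organism. In some embodiments, the cell is derived from a unicellular eukaryotic organism. In some embodiments, the cell is a yeast cell. In some embodiments, the yeast is Saccharomyces cerevisiae. In some embodiments, the yeast is strain EBY100 (Boder & Wittrup, Yeast surface display for screening combinatorial polypeptide libraries, 1997).
In some embodiments, the single-chain domain is attached to the surface of the cell by covalent fusion with a molecule on the cell surface. In some embodiments, the single-chain domain is attached to the cell surface through a peptide bond with the molecule on the cell surface. In some embodiments, the single-chain domain is attached to the cell surface through a peptide chain with the molecule on the cell surface. Thus, in some embodiments, the α chain or the β chain is bound to the molecule on the cell surface through non-covalent binding to the single-chain domain. In some embodiments, both the α chain and the β chain are bound to the molecule on the cell surface through non-covalent binding to the single-chain domain. In some embodiments, the molecule comprises a structure capable of anchoring to the yeast cell wall. In some embodiments, the molecule comprises a yeast cell wall anchoring structure and a secretion signal peptide (or secretion signal region). In some embodiments, the yeast cell wall anchoring structure can bind directly or indirectly to cell wall glucan or cell wall mannan. In some embodiments, it can bind directly or indirectly, covalently or non-covalently, to cell wall glucan or cell wall mannan. In some embodiments, it can bind directly or indirectly to cell wall glucan covalently or to cell wall mannan non-covalently. In some embodiments, the yeast cell wall anchoring structure and/or the secretion signal peptide is derived from the α-agglutinin system and/or the flocculin system (the glycosylphosphatidylinositol (GPI) anchoring system and the flocculin-domain anchoring system). In some embodiments, the yeast cell wall anchoring structure is directly or indirectly bound or linked to a GPI anchor attachment signal region or a flocculation functional region. In some embodiments, the molecule is endogenous to the cell. In some embodiments, the molecule is selected from Aga2p, a-agglutinin, α-agglutinin, flocculin, Cwp1p, Cwp2p, or Tip1p. In some embodiments, the protein is Aga2p. Aga2p (Aga2 protein) is the binding subunit of a-agglutinin and can be linked to the a-agglutinin core subunit Aga1 protein through disulfide bonds. In some embodiments, Aga2p is the protein encoded by the gene with GENE ID 852851 in the NCBI database. In some embodiments, the amino acid sequence of Aga2p has 1, 2, 3, 4, 5, 6, or more mutations compared with the protein encoded by the gene with NCBI GENE ID 852851. In some embodiments, compared with the protein encoded by the gene with NCBI GENE ID 852851, the amino acid sequence of Aga2p does not carry the S288C mutation. In some embodiments, the amino acid sequence of Aga2p comprises the amino acid sequence shown in SEQ ID NO:1, a conservative substitution variant thereof, or an amino acid sequence having at least 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity to SEQ ID NO:1. In some embodiments, the amino acid sequence of Aga2p is as shown in SEQ ID NO:1.
In some embodiments, the single-chain domain is a peptide of 9 to 30 amino acids (e.g., 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, or 29 amino acids) and comprises at least one contiguous peptide register of fixed length 9 amino acids that can be specifically recognized by the complex, or a variant of that register that can be bound by the complex. In some embodiments, the amino acid sequence of the single-chain domain is as shown in any one of SEQ ID NO:2-6. In some embodiments, the amino acid sequence of the single-chain domain is a synonymous variant of the amino acid sequence shown in any one of SEQ ID NO:2-6. In some embodiments, the amino acid sequence of the single-chain domain has at least 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity to the amino acid sequence shown in any one of SEQ ID NO:2-6. It should be understood that introducing mutations into the α chain and/or the β chain may allow the peptide-binding domain of the resulting MHC-II complex to accommodate a longer or shorter register; engineered cells comprising such mutated α and β chains also fall within the scope of this application, and, correspondingly, the register contained in the peptide chain may be longer or shorter than 9 amino acids, e.g., 4, 5, 6, 7, 8, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 amino acids.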
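Because the single-chain domain is 9 to 30 residues long and must contain at least one contiguous 9-residue register, every 9-residue window of a candidate peptide is a register the MHC-II groove could in principle accommodate. The sketch below only enumerates those windows; it does not predict which one actually binds. The function name is hypothetical, and the example sequence PKYVKQNTLKLAT is the widely cited influenza HA306-318 sequence, assumed (not confirmed here) to correspond to SEQ ID NO:2.

```python
def candidate_registers(peptide, core_len=9, min_len=9, max_len=30):
    """Return every contiguous window of `core_len` residues in `peptide`.

    The length limits mirror the 9-30 residue single-chain domain described
    above; `core_len` can be changed for engineered grooves that accommodate
    shorter or longer registers.
    """
    if not (min_len <= len(peptide) <= max_len):
        raise ValueError(f"peptide length must be between {min_len} and {max_len}")
    if core_len > len(peptide):
        raise ValueError("core_len exceeds peptide length")
    return [peptide[i:i + core_len] for i in range(len(peptide) - core_len + 1)]


# 13 residues -> 5 possible 9-mer registers; YVKQNTLKL is the third window.
print(candidate_registers("PKYVKQNTLKLAT"))
```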
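In some embodiments, the amino acid sequence of the register or its mutant is as shown in SEQ ID NO:20-38.
In some embodiments, the first protein-binding domain is non-covalently bound to the second protein-binding domain. In some embodiments, the first protein-binding domain is covalently bound to the second protein-binding domain. The first and second protein-binding domains may be the same or different, and may be homologous or non-homologous proteins. In some embodiments, the first and second protein-binding domains comprise fragments from the same antibody. In some embodiments, the first and second protein-binding domains can bind each other by any method used to form homo- or heterodimers (or multimers) of antibodies, including but not limited to "knob-in-hole" technology (see, e.g., US 5,731,168), engineered electrostatic steering (see, e.g., PCT application WO 2009/089004A1), leucine zipper technology (see, e.g., Kostelny et al., J. Immunol., 148(5):1547-1553, 1992), and those described in US 4,676,980 and in Brennan et al., Science, 229:81, 1985. To bind each other by these methods, the first and second protein-binding domains may contain the mutations that create the corresponding binding structures. In some embodiments, the first and second protein-binding domains are bound to each other through disulfide bonds.
In some embodiments, the first protein-binding domain is a first leucine zipper domain and the second protein-binding domain is a second leucine zipper domain, and the first and second leucine zipper domains, which together form a complete leucine zipper, can be used interchangeably in the first and second fusion proteins; or the first protein-binding domain is an FcA domain and the second protein-binding domain is an FcB domain, and the FcA and FcB domains, which together form an FcAB dimer, can be used interchangeably in the first and second fusion proteins. That is, when the first protein-binding domain is the first leucine zipper domain, the second protein-binding domain is the second leucine zipper domain, and when the first protein-binding domain is the second leucine zipper domain, the second protein-binding domain is the first leucine zipper domain; likewise, when the first protein-binding domain is FcA, the second protein-binding domain is FcB, and when the first protein-binding domain is FcB, the second protein-binding domain is FcA.
In some embodiments, the first and second leucine zipper domains are both acidic leucine zipper domains (Acid zipper). In some embodiments, both are basic leucine zipper domains (Basic zipper). In some embodiments, the first leucine zipper domain is an acidic leucine zipper domain (Acid zipper) and the second is a basic leucine zipper domain (Basic zipper). In some embodiments, both are c-Jun leucine zipper domains (Jun Zipper). In some embodiments, the first leucine zipper domain is a Jun Zipper and the second is a c-Fos leucine zipper domain (Fos Zipper). In some embodiments, the acidic leucine zipper domain comprises, or in some embodiments consists of, the amino acid sequence shown in SEQ ID NO:7, a conservative substitution variant thereof, or an amino acid sequence having at least 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity to SEQ ID NO:7.
In some embodiments, the basic leucine zipper domain comprises, or in some embodiments consists of, the amino acid sequence shown in SEQ ID NO:8, a conservative substitution variant thereof, or an amino acid sequence having at least 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity to SEQ ID NO:8.
In some embodiments, the Fos Zipper comprises, or in some embodiments consists of, the amino acid sequence shown in SEQ ID NO:9, a conservative substitution variant thereof, or an amino acid sequence having at least 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity to SEQ ID NO:9.
In some embodiments, the Jun Zipper comprises, or in some embodiments consists of, the amino acid sequence shown in SEQ ID NO:10, a conservative substitution variant thereof, or an amino acid sequence having at least 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity to SEQ ID NO:10.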
In some embodiments, the amino acid sequence of the first leucine zipper domain is as shown in SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:9, or SEQ ID NO:10;
correspondingly, the amino acid sequence of the second leucine zipper domain is as shown in SEQ ID NO:8, SEQ ID NO:7, SEQ ID NO:10, or SEQ ID NO:9, respectively.
In some embodiments, the first and second protein-binding domains each comprise an immunoglobulin Fc domain. Papain digestion cleaves an immunoglobulin molecule into Fc and Fab fragments, the Fc fragment comprising the part of the heavy-chain constant region other than CH1. In this application, the term "Fc domain" denotes a monomer and may refer to any portion of the constant region of either immunoglobulin heavy chain other than CH1. In some embodiments, FcA and FcB may be derived from the Fc fragment of the same immunoglobulin or of different immunoglobulins. In some embodiments, the immunoglobulins from which FcA and FcB are derived may be selected from IgG1, IgG, IgE, IgM, IgD, and IgA. In some embodiments, they may be selected from IgG1, IgG2, IgG3, IgG4, IgA1, and IgA2. In some embodiments, FcA and FcB are one or two sequences selected from the amino acid sequences shown in SEQ ID NO:11-19, conservative substitution variants thereof, and amino acid sequences having at least 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity thereto. In specific embodiments, FcA and FcB are paired as follows: FcA and FcB are both SEQ ID NO:11; FcA is SEQ ID NO:12 and FcB is SEQ ID NO:13; FcA is SEQ ID NO:13 and FcB is SEQ ID NO:12; FcA is SEQ ID NO:14 and FcB is SEQ ID NO:15; FcA is SEQ ID NO:15 and FcB is SEQ ID NO:14; FcA is SEQ ID NO:16 and FcB is SEQ ID NO:17; FcA is SEQ ID NO:17 and FcB is SEQ ID NO:16; FcA is SEQ ID NO:18 and FcB is SEQ ID NO:19; or FcA is SEQ ID NO:19 and FcB is SEQ ID NO:18. In each of these pairings, each domain may alternatively be a conservative substitution variant of the indicated sequence or have at least 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity to it.
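The explicit pairings above, and the statement that the two partner domains can be swapped between the first and second fusion proteins, amount to a small symmetric lookup table. The sketch below encodes only the leucine zipper and FcA/FcB SEQ ID NO pairings listed in the two preceding paragraphs; it deliberately leaves out the homotypic zipper combinations mentioned earlier (e.g., two acidic or two Jun zippers) and the identity-based variants. The table and function names are hypothetical.

```python
# Partner SEQ ID NO for the second protein-binding domain, keyed by the SEQ ID NO
# of the first protein-binding domain, as listed in the paragraphs above.
ZIPPER_PARTNERS = {7: 8, 8: 7, 9: 10, 10: 9}
FC_PARTNERS = {11: 11, 12: 13, 13: 12, 14: 15, 15: 14,
               16: 17, 17: 16, 18: 19, 19: 18}


def is_listed_pair(first_id, second_id, partners):
    """True if (first domain, second domain) is one of the explicitly listed pairings."""
    return partners.get(first_id) == second_id


def is_interchangeable(partners):
    """Swapping the two domains between the fusion proteins yields another listed pair."""
    return all(partners.get(second) == first for first, second in partners.items())


print(is_listed_pair(12, 13, FC_PARTNERS), is_interchangeable(FC_PARTNERS))
```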
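In some embodiments, the FcA and FcB domains can increase the expression/display level and/or copy number of the complex. In some embodiments, the FcA domain contains only a first CH3 domain and the FcB domain contains only a second CH3 domain, where the first and second CH3 domains may come from the same or different immunoglobulin subtypes. In some embodiments, the FcA domain contains both a first CH2 domain and a first CH3 domain, and the FcB domain contains both a second CH2 domain and a second CH3 domain; the first and second CH3 domains may come from the same or different immunoglobulin subtypes; the first and second CH2 domains may come from the same or different immunoglobulin subtypes; and the first CH2 and first CH3 domains may come from the same or different immunoglobulin subtypes, as may the second CH2 and second CH3 domains. Depending on the source immunoglobulin, for example when FcA and/or FcB are derived from IgE or IgM, in some embodiments the FcA domain may further comprise a first CH4 domain and the FcB domain may further comprise a second CH4 domain.
In some embodiments, the first CH2 and first CH3 domains of the first protein-binding domain are connected directly or indirectly via a linker. In some embodiments, the second CH2 and second CH3 domains of the second protein-binding domain are connected directly or indirectly via a linker. In some embodiments, the first CH2 and first CH3 domains of the first protein-binding domain are connected via a linker, and the second CH2 and second CH3 domains of the second protein-binding domain are connected via a linker. In some embodiments, the linker is cleavable or non-cleavable. In some embodiments, the linker is a short peptide, also called a linker peptide, which may be flexible or rigid. Those skilled in the art can choose linker peptides of different lengths or functions as needed; for their types, functions, and characteristics see, e.g., Reddy Chichili VP, Kumar V, Sivaraman J. Linkers in the structural biology of protein-protein interactions. Protein Sci. 2013;22(2):153-167. doi:10.1002/pro.2206.
In some embodiments, either or both of the FcA and FcB domains comprise one or more amino acid modifications that can (1) enhance or stabilize the FcA and/or FcB domain itself ("FcS"), and/or (2) enhance or stabilize covalent or non-covalent binding between the FcA and FcB domains ("FcM"), and/or (3) reduce or avoid the various possible mismatches of the α and β chains of claim 1 caused by FcA/FcB mismatching ("FcN").
In some embodiments, the first protein-binding domain is linked to the C-terminus of the α chain and the second protein-binding domain is linked to the C-terminus of the β chain.
Preferably, the signal peptide directing expression of the first and second fusion proteins is SPP SP or AGA2 SP; more preferably, the amino acid sequence of AGA2 SP is as shown in SEQ ID NO:39 and that of SPP SP is as shown in SEQ ID NO:45.
In some embodiments, the complex is an MHC class I or MHC class II molecule. In some embodiments, the complex is an HLA class I or HLA class II molecule.
In some embodiments, the α chain is encoded by HLA-DRA*01 or another allele of the HLA-DRA family, and the β chain is encoded by HLA-DRB1*01, HLA-DRB1*03, HLA-DRB1*04, HLA-DRB1*15, or another allele of the HLA-DRB family; or
the α chain is encoded by HLA-DQA1*01, HLA-DQA1*03, HLA-DQA1*05, or another allele of the HLA-DQA1 family, and the β chain is encoded by HLA-DQB1*02, HLA-DQB1*03, HLA-DQB1*05, HLA-DQB1*06, or another allele of the HLA-DQB1 family; or
the α chain is encoded by HLA-DPA1*01:03, HLA-DPA1*02:02, or another allele of the HLA-DPA1 family, and the β chain is encoded by HLA-DPB1*01:01, HLA-DPB1*02:01, HLA-DPB1*04:01, HLA-DPB1*04:02, or another allele of the HLA-DPB1 family.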
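Engineered nucleic acid molecules
In a second aspect, this application provides an engineered nucleic acid molecule encoding a major histocompatibility complex (MHC), wherein the complex comprises an α chain and a β chain, the α chain being linked to a first protein-binding domain and the β chain to a second protein-binding domain, wherein the α and β chains bind a single-chain domain non-covalently and the first protein-binding domain binds the second protein-binding domain to enhance or promote formation of the trimer formed by the complex and the single-chain domain. This application also provides nucleic acid molecules encoding the trimer, where the trimer is the one formed by the MHC and the single-chain domain contained in the engineered cell of the first aspect, and the MHC is the MHC contained in that engineered cell.
In some embodiments, the engineered nucleic acid molecule is an engineered DNA molecule. In some embodiments, the DNA molecule can be replicated and/or expressed in cells. In some embodiments, the DNA molecule can be replicated and/or expressed in eukaryotic cells. In some embodiments, the DNA molecule can be replicated and/or expressed in prokaryotic cells. In some embodiments, the DNA molecule can be expressed in eukaryotic cells and replicated in prokaryotic cells. Accordingly, in addition to the nucleic acid fragment encoding the MHC or the trimer, the DNA molecule comprises genetic manipulation or regulatory elements for replication and/or expression in prokaryotic and/or eukaryotic cells.
The structural elements required for replication, or efficient replication, of the engineered DNA molecule in cells are known in the art and include, for example, an origin of replication (ORI). In some embodiments, the engineered DNA molecule further comprises a marker gene or fragment thereof and/or a reporter gene or fragment thereof, and unique restriction sites that allow insertion of DNA elements, preferably in the form of a multiple cloning site (MCS). The marker gene facilitates identification of cells containing a plasmid that carries it and may be selected from, for example, antibiotic resistance genes. Each restriction site in the MCS is specifically recognized by a different restriction enzyme.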
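An MCS site is only useful for inserting DNA elements if the enzyme cuts the plasmid exactly once. The sketch below checks that property by counting recognition sites on both strands, optionally treating the plasmid as circular so sites spanning the origin are found. The four enzymes and their recognition sequences (EcoRI GAATTC, BamHI GGATCC, NheI GCTAGC, XhoI CTCGAG) are an illustrative subset chosen here, not enzymes named in the source, and the function names are hypothetical.

```python
# Recognition sequences of a few common type II restriction enzymes (illustrative subset).
ENZYMES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "NheI": "GCTAGC", "XhoI": "CTCGAG"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.upper().translate(_COMPLEMENT)[::-1]


def count_sites(seq, site, circular=True):
    """Count occurrences of `site` on either strand of `seq`."""
    s = seq.upper()
    if circular:
        # Append the first len(site)-1 bases so sites spanning the origin are found.
        s += s[:len(site) - 1]
    count = 0
    # A set avoids double-counting palindromic sites, whose reverse complement is identical.
    for motif in {site.upper(), reverse_complement(site)}:
        start = s.find(motif)
        while start != -1:
            count += 1
            start = s.find(motif, start + 1)
    return count


def unique_cutters(seq, enzymes=ENZYMES, circular=True):
    """Names of enzymes whose site occurs exactly once, i.e. usable MCS sites."""
    return sorted(name for name, site in enzymes.items()
                  if count_sites(seq, site, circular) == 1)


# BamHI appears twice in this toy sequence, so only EcoRI and NheI are unique cutters.
print(unique_cutters("AAGAATTCAAGGATCCAAGGATCCAAGCTAGCAA"))
```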
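In some embodiments, the DNA molecule is a DNA plasmid. As used herein, the term "DNA plasmid" refers to a plasmid consisting of double-stranded DNA. In some embodiments, the "plasmid" is a circular DNA molecule. In some embodiments, the "plasmid" may also encompass linear DNA molecules. Specifically, the term "plasmid" also covers the molecule obtained by linearizing a circular plasmid, e.g., by cutting it with a restriction enzyme, as well as linear molecules that can replicate in prokaryotes. A plasmid can replicate, i.e., be amplified in a cell independently of the genomic genetic information stored in the prokaryotic nucleoid, and can be used for cloning, i.e., for amplifying genetic information in bacterial cells. Preferably, the DNA plasmid according to the invention is a medium- or high-copy plasmid, more preferably a high-copy plasmid. Examples of such high-copy plasmids are vectors based on pUC or pTZ plasmids, or on any other plasmid containing an ORI that supports high copy number (e.g., pMB1, pColE1).
In some embodiments, the engineered DNA molecule is a DNA molecule, or a fragment thereof, that constitutes the nucleoid of a prokaryote or the genome of a eukaryote, i.e., the coding sequence of the MHC or the trimer, or its complement, is replicated together with the host genome.
In some embodiments, the engineered DNA molecule can be transcribed into mRNA. In some embodiments, the engineered DNA molecule further comprises coding sequences of elements that can initiate or regulate expression of the protein, polypeptide, or fragment thereof after transcription, including but not limited to a 5'UTR, a 3'UTR, and a poly(A) tail (or polyadenylation signal). In some embodiments, the engineered DNA molecule comprises the coding sequence of at least one untranslated region (UTR). In some embodiments, it comprises at least the coding sequence of a 5'UTR and the coding sequence of the protein, polypeptide, or fragment thereof. In some embodiments, from 5' to 3' it comprises, in order, at least the coding sequence of a 5'UTR, the coding sequence of the MHC or the trimer, the coding sequence of a 3'UTR, and a polyadenylation signal (or the DNA sequence corresponding to the poly(A) tail), and the coding sequence of the MHC or the trimer may be flanked by a start codon (5' end) and a stop codon (3' end), which are respectively the first three and the last three nucleotides of the translated region of the mRNA. The 5'UTR usually contains at least one ribosome binding site (RBS), such as the Shine-Dalgarno sequence in prokaryotes, or at least one translation initiation site, such as the Kozak sequence in eukaryotes. The RBS promotes efficient and accurate translation of the mRNA by recruiting ribosomes at translation initiation. The activity of a given RBS or translation initiation site can be optimized by changing its length and sequence and its distance from the start codon. Alternatively or optionally, the 5'UTR includes an internal ribosome entry site (IRES). The 3'UTR may contain one or more regulatory sequences, such as binding sites for proteins that enhance mRNA stability, binding sites for regulatory RNA molecules (e.g., miRNAs), and/or signal sequences involved in intracellular transport of the mRNA.
Building on the preceding embodiments, in some implementations the target gene fragment further comprises one or more additional regulatory sequences, such as binding sites for proteins that enhance mRNA stability, binding sites for proteins that enhance mRNA translation, regulatory elements (e.g., riboswitches), and/or nucleotide sequences that positively affect translation initiation. In addition, the 5'UTR preferably contains no functional upstream open reading frames, out-of-frame upstream translation initiation sites, out-of-frame upstream start codons, and/or nucleotide sequences forming secondary structures that reduce or prevent translation, because the presence of such sequences in the 5'UTR can negatively affect translation.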
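The 5'-to-3' element order and the preference for a 5'UTR without upstream start codons can be expressed as a few sequence checks. The sketch below assembles a hypothetical cassette (5'UTR, ATG-initiated CDS ending in a stop codon, 3'UTR, polyadenylation signal) and rejects a 5'UTR containing any ATG. Flagging every upstream ATG regardless of reading frame is a conservative simplification of the stated preference, and all sequences and names are placeholders rather than elements from the source.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def upstream_atgs(utr5):
    """0-based positions of ATG triplets in a 5'UTR (potential upstream start codons)."""
    u = utr5.upper()
    return [i for i in range(len(u) - 2) if u[i:i + 3] == "ATG"]


def assemble_cassette(utr5, cds, utr3, polya_signal):
    """Concatenate 5'UTR + CDS + 3'UTR + polyadenylation signal after basic checks."""
    cds = cds.upper()
    if len(cds) % 3 != 0 or not cds.startswith("ATG") or cds[-3:] not in STOP_CODONS:
        raise ValueError("CDS must start with ATG, end in a stop codon and be a multiple of 3 nt")
    internal = (cds[i:i + 3] for i in range(3, len(cds) - 3, 3))
    if any(codon in STOP_CODONS for codon in internal):
        raise ValueError("CDS contains a premature in-frame stop codon")
    if upstream_atgs(utr5):
        raise ValueError("5'UTR contains an upstream ATG that could start translation early")
    return utr5.upper() + cds + utr3.upper() + polya_signal.upper()


# Placeholder sequences only; real UTRs and signals depend on the host organism.
print(assemble_cassette("GCCACC", "ATGAAATAA", "GGG", "AATAAA"))
```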
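The coding sequence of the MHC or the trimer comprises codons that can be translated into an amino acid sequence. All of its codons may be naturally occurring amino-acid-encoding codons, or some or all of them may be artificially designed. In some embodiments, some or all of the codons are codon-optimized. In some embodiments, some or all of the codons encode unnatural amino acids.
In some embodiments, on the 5' side of the target gene fragment, the engineered DNA molecule further comprises structural elements, known in the art, that are required to initiate or regulate transcription of the RNA. In some embodiments, these structural elements comprise at least a promoter. Promoters and their sequences are known in the art and include weak, medium-strength, strong, mini, and core promoters. In some specific embodiments, the promoter is a strong promoter. In some embodiments, the promoter can initiate transcription of the coding sequence of the MHC or the trimer in prokaryotic cells. In some embodiments, it can initiate such transcription in eukaryotic cells. A "promoter" comprises at least one transcription recognition site followed by a transcription factor binding site. The recognition and binding sites can interact with proteins that mediate or regulate transcription. The binding site lies closer to the target gene fragment than the recognition site and may be, for example, the Pribnow box in prokaryotes or the TATA box in eukaryotes. For example, in some embodiments, when a Pribnow box is used, the transcription recognition site may be located about 35 bp upstream of the transcription start site and the transcription factor binding site about 10 bp upstream. In some embodiments, the promoter comprises at least one additional regulatory element, such as an AT-rich upstream element located about 40 and/or 60 nucleotides upstream of the transcription start site, and/or an additional element located between the recognition and binding sites that enhances promoter activity. In some implementations, the promoter is a strong promoter, i.e., it contains sequences that promote transcription of the RNA coding sequence. Strong promoters are known to those skilled in the art, for example the OXB18, OXB19, and OXB20 promoters derived from the E. coli RecA promoter, or can be identified or synthesized by routine laboratory procedures. In some embodiments, the promoter is a T7 promoter. In some embodiments, additional regulatory elements are present upstream of the promoter, such as an enhancer contained in the DNA plasmid that promotes transcription of the RNA coding sequence.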
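The prokaryotic layout described above (a recognition element about 35 bp upstream and a Pribnow box about 10 bp upstream of the transcription start site) can be illustrated by scanning for the two elements at a fixed spacing. The sketch below assumes the E. coli sigma70 consensus hexamers TTGACA (-35) and TATAAT (-10) and a spacer of 15 to 19 bp (17 bp being typical); these values are assumptions not stated in the source, exact-match scanning is a simplification because natural promoters usually match the consensus only partially, and the function name is hypothetical.

```python
MINUS35 = "TTGACA"  # assumed sigma70 -35 consensus (recognition element)
MINUS10 = "TATAAT"  # assumed Pribnow box (-10) consensus (binding element)


def find_consensus_promoters(seq, spacers=range(15, 20)):
    """Return (start of -35 hexamer, start of -10 hexamer) pairs for exact consensus
    matches separated by a spacer whose length is in `spacers`."""
    s = seq.upper()
    hits = []
    i = s.find(MINUS35)
    while i != -1:
        for gap in spacers:
            j = i + len(MINUS35) + gap
            if s[j:j + len(MINUS10)] == MINUS10:
                hits.append((i, j))
        i = s.find(MINUS35, i + 1)
    return hits


# Toy sequence with a 17 bp spacer between the two consensus boxes.
toy = "GG" + "TTGACA" + "A" * 17 + "TATAAT" + "GGGG"
print(find_consensus_promoters(toy))
```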
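In some embodiments, the eukaryotic cell is a yeast cell. In some embodiments, the DNA molecule is a yeast display vector.
In addition, this application provides RNA molecules encoding the above MHC. In some embodiments, the RNA molecule is obtained by transcription of the engineered DNA molecule above.
In some implementations, the engineered nucleic acid molecule may also be a DNA/RNA hybrid molecule.
In addition, this application encompasses any cell comprising the above engineered nucleic acid molecule.
Preparation and identification methods
This application also relates to a method for preparing a major histocompatibility complex (MHC), or a trimer formed by non-covalent binding of the MHC and a single-chain domain, the method comprising transforming cells with the engineered nucleic acid molecule of the second aspect and culturing the cells under conditions in which the complex or the trimer is expressed. In some embodiments, the cells are yeast cells, e.g., Saccharomyces cerevisiae cells. In some embodiments, the cells are yeast cells and the engineered nucleic acid molecule is a yeast display vector. In some embodiments, the cells belong to yeast strain EBY100.
This application also relates to a method for identifying peptides that bind a major histocompatibility complex (MHC), the method comprising:
i) causing the cells of the first aspect to display a library of single-chain domains, and
ii) detecting the trimer on the surface of single clones in the cell library displaying the single-chain domains, thereby identifying peptides that bind the complex. In some embodiments, the single-chain domain comprises a peptide, a peptide mutant, a library of peptide mutants, or a mixture of peptides. In some embodiments, the peptide, peptide mutant, library of peptide mutants, or mixture of peptides binds the complex non-covalently. In some embodiments, the peptide is 9 to 30 amino acids long (e.g., 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, or 29 amino acids) and comprises at least one contiguous 9-amino-acid register that can be specifically recognized by the complex, or a variant of that register that can be bound by the complex.
The inventors expressed human MHC alleles on the yeast cell surface as native, non-covalent α/β dimers, enabling direct and rapid flow-cytometry-based screening of peptide ligands from selected antigens and improving the sensitivity and accuracy of peptide/MHC affinity measurements.
This application establishes the RIPAAH (Rapid Identification of Peptide Antigen Associated by HLA-II) method, together with related cells and nucleic acids, which uses a yeast display platform to eliminate labor-intensive expression/purification steps. As single-celled eukaryotes, yeast cells combine the fast and simple cloning properties of E. coli with post-translational modification machinery similar to that of mammalian or insect cells. The method links one chain (α or β) of a given MHC-II allele to a yeast surface protein and allows the other chain (α or β) to be secreted as a soluble component by the same yeast cell. In this scheme, the two C-termini of the α/β chains can be modified to carry antibody heavy-chain crystallizable fragment sequences (FcAB), which promote pairing of the soluble secreted chain with the surface-anchored partner chain. This design for surface expression of the non-covalent MHC-II α/β heterodimer requires no covalently linked positioning peptide, interchain linker, or mutation of MHC-II amino acid residues. The method successfully displays correctly folded, fully functional MHC-II proteins on the yeast cell surface. It improves the sensitivity and accuracy of affinity measurements between target peptides and yeast-displayed MHC-II proteins and, based on competitive binding of test and reference peptides to yeast-displayed MHC-II proteins, enables rapid and efficient high-throughput measurement.
With this method and the associated cells and nucleic acids, the inventors can more efficiently create yeast clone libraries in which different human MHC-II alleles are each expressed in native form on the surface of different yeast clones for TCR epitope peptide screening. The method is expected to provide an effective way to rapidly develop new vaccines and cell therapies based on T cell immunity. In addition, high-throughput, high-content single-cell measurement of yeast-surface MHC-II/peptide ligand affinity will advance artificial intelligence methods for discovering T cell epitope maps.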
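Cells
In one aspect, this application provides a cell comprising a major histocompatibility complex (MHC), wherein the complex comprises an α chain and a β chain, wherein:
the α chain is linked to a first protein-binding domain to form a first fusion protein;
the β chain is linked to a second protein-binding domain to form a second fusion protein;
the first fusion protein binds the second fusion protein to form the complex; and
the first protein-binding domain and the second protein-binding domain associate (e.g., by forming an FcAB dimer) to enhance binding of the α and β chains to form the MHC.
The cell is an engineered cell.
In some embodiments, the cell is a eukaryotic cell. In some embodiments, the cell is derived from a unicellular organism. In some embodiments, the cell is derived from a unicellular eukaryotic organism. In some embodiments, the cell is a yeast cell. In some embodiments, the yeast is Saccharomyces cerevisiae. In some embodiments, the yeast is strain EBY100 (Boder & Wittrup, 1997).
In some embodiments, the complex is bound to the cell surface by linking the α chain or the β chain to a molecule on the cell surface. In some embodiments, the other chain of the complex (the β chain or the α chain) is bound to the molecule on the cell surface through non-covalent binding to the anchored chain. That is, α and β chains synthesized by different cells can bind each other: an α chain synthesized by one cell can bind a β chain synthesized by another cell, and a β chain synthesized by the first cell can likewise bind an α chain synthesized by another cell, so that a complete MHC can be present on the cell surface. To allow such combination of α and β chains secreted by different cells, in some embodiments at least one of the α and β chains, or more precisely at least one of the first and second fusion proteins, is secretable.
The molecule may be any molecule that allows the first and/or second protein-binding domain to be attached to the cell surface. In some embodiments, the molecule and the first and/or second protein-binding domain are bound to each other by covalent or non-covalent bonds. In some embodiments, the molecule is a protein. In some embodiments, the molecule is a protein and the first and/or second protein-binding domain is joined to it by a peptide bond. In some embodiments, the molecule comprises a structure capable of anchoring to the yeast cell wall. In some embodiments, the molecule comprises a yeast cell wall anchoring structure and a secretion signal peptide (or secretion signal region). In some embodiments, the yeast cell wall anchoring structure can bind directly or indirectly to cell wall glucan or cell wall mannan. In some embodiments, it can bind directly or indirectly, covalently or non-covalently, to cell wall glucan or cell wall mannan. In some embodiments, it can bind directly or indirectly to cell wall glucan covalently or to cell wall mannan non-covalently. In some embodiments, the yeast cell wall anchoring structure and/or the secretion signal peptide is derived from the α-agglutinin system and/or the flocculin system (the glycosylphosphatidylinositol (GPI) anchoring system and the flocculin-domain anchoring system). In some embodiments, the yeast cell wall anchoring structure is directly or indirectly bound or linked to a GPI anchor attachment signal region or a flocculation functional region. In some embodiments, the molecule is endogenous to the cell. In some embodiments, the molecule is selected from Aga2p, a-agglutinin, α-agglutinin, flocculin, Cwp1p, Cwp2p, or Tip1p. In some embodiments, the protein is Aga2p. Aga2p (Aga2 protein) is the binding subunit of a-agglutinin and can be linked to the a-agglutinin core subunit Aga1 protein through disulfide bonds. In some embodiments, Aga2p is the protein encoded by the gene with GENE ID 852851 in the NCBI database. In some embodiments, the amino acid sequence of Aga2p has 1, 2, 3, 4, 5, 6, or more mutations compared with the protein encoded by the gene with NCBI GENE ID 852851. In some embodiments, compared with the protein encoded by the gene with NCBI GENE ID 852851, the amino acid sequence of Aga2p does not carry the S288C mutation. In some embodiments, the amino acid sequence of Aga2p comprises the amino acid sequence shown in SEQ ID NO:1, a conservative substitution variant thereof, or an amino acid sequence having at least 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity to SEQ ID NO:1. In some embodiments, the amino acid sequence of Aga2p is as shown in SEQ ID NO:1.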
In some embodiments, the first protein-binding domain is non-covalently bound to the second protein-binding domain. In some embodiments, the first protein-binding domain is covalently bound to the second protein-binding domain. The first and second protein-binding domains may be the same or different, and may be homologous or non-homologous proteins. In some embodiments, the first and second protein-binding domains comprise fragments from the same antibody. In some embodiments, the first and second protein-binding domains can bind each other by any method used to form homo- or heterodimers (or multimers) of antibodies, including but not limited to "knob-in-hole" technology (see, e.g., US 5,731,168), engineered electrostatic steering (see, e.g., PCT application WO 2009/089004A1), leucine zipper technology (see, e.g., Kostelny et al., J. Immunol., 148(5):1547-1553, 1992), and those described in US 4,676,980 and in Brennan et al., Science, 229:81, 1985. To bind each other by these methods, the first and second protein-binding domains may contain the mutations that create the corresponding binding structures. In some embodiments, the first and second protein-binding domains are bound to each other through disulfide bonds.
In some embodiments, the first and second protein-binding domains each comprise an immunoglobulin Fc domain. Papain digestion cleaves an immunoglobulin molecule into Fc and Fab fragments, the Fc fragment comprising the part of the heavy-chain constant region other than CH1. In this application, the term "Fc domain" denotes a monomer and may refer to any portion of the constant region of either immunoglobulin heavy chain other than CH1. In this application, when the first and second protein-binding domains both comprise an immunoglobulin Fc domain, the Fc domain in the first protein-binding domain is called FcA and the Fc domain in the second protein-binding domain is called FcB. FcA and FcB may have the same or different amino acid sequences, and the FcA and FcB domains that form an FcAB dimer can be used interchangeably in the first and second fusion proteins. That is, when the first protein-binding domain is FcA, the second protein-binding domain is FcB, and when the first protein-binding domain is FcB, the second protein-binding domain is FcA. In some embodiments, FcA and FcB may be derived from the Fc fragment of the same immunoglobulin or of different immunoglobulins. In some embodiments, the immunoglobulins from which FcA and FcB are derived may be selected from IgG1, IgG, IgE, IgM, IgD, and IgA. In some embodiments, they may be selected from IgG1, IgG2, IgG3, IgG4, IgA1, and IgA2. In some embodiments, FcA and FcB are one or two sequences selected from the amino acid sequences shown in SEQ ID NO:11-19, conservative substitution variants thereof, and amino acid sequences having at least 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity thereto. In specific embodiments, FcA and FcB are paired as follows: FcA and FcB are both SEQ ID NO:11; FcA is SEQ ID NO:12 and FcB is SEQ ID NO:13; FcA is SEQ ID NO:13 and FcB is SEQ ID NO:12; FcA is SEQ ID NO:14 and FcB is SEQ ID NO:15; FcA is SEQ ID NO:15 and FcB is SEQ ID NO:14; FcA is SEQ ID NO:16 and FcB is SEQ ID NO:17; FcA is SEQ ID NO:17 and FcB is SEQ ID NO:16; FcA is SEQ ID NO:18 and FcB is SEQ ID NO:19; or FcA is SEQ ID NO:19 and FcB is SEQ ID NO:18. In each of these pairings, each domain may alternatively be a conservative substitution variant of the indicated sequence or have at least 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity to it.
In some embodiments, the FcA and FcB domains can increase the expression/display level and/or copy number of the complex. In some embodiments, the FcA domain contains only a first CH3 domain and the FcB domain contains only a second CH3 domain, where the first and second CH3 domains may come from the same or different immunoglobulin subtypes. In some embodiments, the FcA domain contains both a first CH2 domain and a first CH3 domain, and the FcB domain contains both a second CH2 domain and a second CH3 domain; the first and second CH3 domains may come from the same or different immunoglobulin subtypes; the first and second CH2 domains may come from the same or different immunoglobulin subtypes; and the first CH2 and first CH3 domains may come from the same or different immunoglobulin subtypes, as may the second CH2 and second CH3 domains. Depending on the source immunoglobulin, for example when FcA and/or FcB are derived from IgE or IgM, in some embodiments the FcA domain may further comprise a first CH4 domain and the FcB domain may further comprise a second CH4 domain.
In some embodiments, the first CH2 and first CH3 domains of the first protein-binding domain are connected directly or indirectly via a linker. In some embodiments, the second CH2 and second CH3 domains of the second protein-binding domain are connected directly or indirectly via a linker. In some embodiments, the first CH2 and first CH3 domains of the first protein-binding domain are connected via a linker, and the second CH2 and second CH3 domains of the second protein-binding domain are connected via a linker. In some embodiments, the linker is cleavable or non-cleavable. In some embodiments, the linker is a short peptide, also called a linker peptide, which may be flexible or rigid. Those skilled in the art can choose linker peptides of different lengths or functions as needed; for their types, functions, and characteristics see, e.g., Reddy Chichili VP, Kumar V, Sivaraman J. Linkers in the structural biology of protein-protein interactions. Protein Sci. 2013;22(2):153-167. doi:10.1002/pro.2206.
In some embodiments, either or both of the FcA and FcB domains comprise one or more amino acid modifications that can enhance or stabilize the FcA and/or FcB domain itself ("FcS"), and/or enhance or stabilize covalent or non-covalent binding between the FcA and FcB domains ("FcM"), and/or reduce or avoid the various possible mismatches of the α and β chains referred to in this application caused by FcA/FcB mismatching ("FcN").
In some embodiments, the first protein-binding domain is linked to the C-terminus of the α chain and the second protein-binding domain is linked to the C-terminus of the β chain.
Preferably, the signal peptide directing expression of the first and second fusion proteins is SPP SP or AGA2 SP; more preferably, the amino acid sequence of AGA2 SP is as shown in SEQ ID NO:39 and that of SPP SP is as shown in SEQ ID NO:45.
In some embodiments, the complex is an MHC class I or MHC class II molecule. In some embodiments, the complex is an HLA class I or HLA class II molecule.
In some embodiments, the α chain is encoded by HLA-DRA*01 or another allele of the HLA-DRA family, and the β chain is encoded by HLA-DRB1*01, HLA-DRB1*03, HLA-DRB1*04, HLA-DRB1*15, or another allele of the HLA-DRB family; or
the α chain is encoded by HLA-DQA1*01, HLA-DQA1*03, HLA-DQA1*05, or another allele of the HLA-DQA1 family, and the β chain is encoded by HLA-DQB1*02, HLA-DQB1*03, HLA-DQB1*05, HLA-DQB1*06, or another allele of the HLA-DQB1 family; or
the α chain is encoded by HLA-DPA1*01:03, HLA-DPA1*02:02, or another allele of the HLA-DPA1 family, and the β chain is encoded by HLA-DPB1*01:01, HLA-DPB1*02:01, HLA-DPB1*04:01, HLA-DPB1*04:02, or another allele of the HLA-DPB1 family.
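Engineered nucleic acid molecules
In a second aspect, this application provides an engineered nucleic acid molecule encoding a major histocompatibility complex (MHC), wherein the MHC is the histocompatibility complex contained in the cell of the first aspect. An engineered nucleic acid molecule encoding the MHC is an engineered nucleic acid molecule comprising the MHC coding sequence and/or its complementary sequence.
Specifically, in some embodiments, the complex comprises an α chain and a β chain, wherein the α chain is linked to a first protein-binding domain and the β chain is linked to a second protein-binding domain; the first and second protein-binding domains can bind to form the complex, and they form an FcAB dimer to enhance binding of the α and β chains to form the MHC.
In some embodiments, the engineered nucleic acid molecule is an engineered DNA molecule. In some embodiments, the DNA molecule can be replicated and/or expressed in cells. In some embodiments, the DNA molecule can be replicated and/or expressed in eukaryotic cells. In some embodiments, the DNA molecule can be replicated and/or expressed in prokaryotic cells. In some embodiments, the DNA molecule can be expressed in eukaryotic cells and replicated in prokaryotic cells. Accordingly, in addition to the nucleic acid fragment encoding the major histocompatibility complex (MHC), the DNA molecule comprises genetic manipulation or regulatory elements for replication and/or expression in prokaryotic and/or eukaryotic cells.
The structural elements required for replication, or efficient replication, of the engineered DNA molecule in cells are known in the art and include, for example, an origin of replication (ORI). In some embodiments, the engineered DNA molecule further comprises a marker gene or fragment thereof and/or a reporter gene or fragment thereof, and unique restriction sites that allow insertion of DNA elements, preferably in the form of a multiple cloning site (MCS). The marker gene facilitates identification of cells containing a plasmid that carries it and may be selected from, for example, antibiotic resistance genes. Each restriction site in the MCS is specifically recognized by a different restriction enzyme.
In some embodiments, the DNA molecule is a DNA plasmid. As used herein, the term "DNA plasmid" refers to a plasmid consisting of double-stranded DNA. In some embodiments, the "plasmid" is a circular DNA molecule. In some embodiments, the "plasmid" may also encompass linear DNA molecules. Specifically, the term "plasmid" also covers the molecule obtained by linearizing a circular plasmid, e.g., by cutting it with a restriction enzyme, as well as linear molecules that can replicate in prokaryotes. A plasmid can replicate, i.e., be amplified in a cell independently of the genomic genetic information stored in the prokaryotic nucleoid, and can be used for cloning, i.e., for amplifying genetic information in bacterial cells. Preferably, the DNA plasmid according to the invention is a medium- or high-copy plasmid, more preferably a high-copy plasmid. Examples of such high-copy plasmids are vectors based on pUC or pTZ plasmids, or on any other plasmid containing an ORI that supports high copy number (e.g., pMB1, pColE1).
In some embodiments, the engineered DNA molecule is a DNA molecule, or a fragment thereof, that constitutes the nucleoid of a prokaryote or the genome of a eukaryote, i.e., the coding sequence of the MHC, or its complement, is replicated together with the host genome.
In some embodiments, the engineered DNA molecule can be transcribed into mRNA. In some embodiments, the engineered DNA molecule further comprises coding sequences of elements that can initiate or regulate expression of the protein, polypeptide, or fragment thereof after transcription, including but not limited to a 5'UTR, a 3'UTR, and a poly(A) tail (or polyadenylation signal). In some embodiments, the engineered DNA molecule comprises the coding sequence of at least one untranslated region (UTR). In some embodiments, it comprises at least the coding sequence of a 5'UTR and the coding sequence of the protein, polypeptide, or fragment thereof. In some embodiments, from 5' to 3' it comprises, in order, at least the coding sequence of a 5'UTR, the coding sequence of the MHC, the coding sequence of a 3'UTR, and a polyadenylation signal (or the DNA sequence corresponding to the poly(A) tail), and the coding sequence of the MHC may be flanked by a start codon (5' end) and a stop codon (3' end), which are respectively the first three and the last three nucleotides of the translated region of the mRNA. The 5'UTR usually contains at least one ribosome binding site (RBS), such as the Shine-Dalgarno sequence in prokaryotes, or at least one translation initiation site, such as the Kozak sequence in eukaryotes. The RBS promotes efficient and accurate translation of the mRNA by recruiting ribosomes at translation initiation. The activity of a given RBS or translation initiation site can be optimized by changing its length and sequence and its distance from the start codon. Alternatively or optionally, the 5'UTR includes an internal ribosome entry site (IRES). The 3'UTR may contain one or more regulatory sequences, such as binding sites for proteins that enhance mRNA stability, binding sites for regulatory RNA molecules (e.g., miRNAs), and/or signal sequences involved in intracellular transport of the mRNA.
Building on the preceding embodiments, in some implementations the target gene fragment further comprises one or more additional regulatory sequences, such as binding sites for proteins that enhance mRNA stability, binding sites for proteins that enhance mRNA translation, regulatory elements (e.g., riboswitches), and/or nucleotide sequences that positively affect translation initiation. In addition, the 5'UTR preferably contains no functional upstream open reading frames, out-of-frame upstream translation initiation sites, out-of-frame upstream start codons, and/or nucleotide sequences forming secondary structures that reduce or prevent translation, because the presence of such sequences in the 5'UTR can negatively affect translation.
The coding sequence of the MHC comprises codons that can be translated into an amino acid sequence. All of its codons may be naturally occurring amino-acid-encoding codons, or some or all of them may be artificially designed. In some embodiments, some or all of the codons are codon-optimized. In some embodiments, some or all of the codons encode unnatural amino acids.
In some embodiments, on the 5' side of the target gene fragment, the engineered DNA molecule further comprises structural elements, known in the art, that are required to initiate or regulate transcription of the RNA. In some embodiments, these structural elements comprise at least a promoter. Promoters and their sequences are known in the art and include weak, medium-strength, strong, mini, and core promoters. In some specific embodiments, the promoter is a strong promoter. In some embodiments, the promoter can initiate transcription of the coding sequence of the MHC in prokaryotic cells. In some embodiments, it can initiate such transcription in eukaryotic cells. A "promoter" comprises at least one transcription recognition site followed by a transcription factor binding site. The recognition and binding sites can interact with proteins that mediate or regulate transcription. The binding site lies closer to the target gene fragment than the recognition site and may be, for example, the Pribnow box in prokaryotes or the TATA box in eukaryotes. For example, in some embodiments, when a Pribnow box is used, the transcription recognition site may be located about 35 bp upstream of the transcription start site and the transcription factor binding site about 10 bp upstream. In some embodiments, the promoter comprises at least one additional regulatory element, such as an AT-rich upstream element located about 40 and/or 60 nucleotides upstream of the transcription start site, and/or an additional element located between the recognition and binding sites that enhances promoter activity. In some implementations, the promoter is a strong promoter, i.e., it contains sequences that promote transcription of the RNA coding sequence. Strong promoters are known to those skilled in the art, for example the OXB18, OXB19, and OXB20 promoters derived from the E. coli RecA promoter, or can be identified or synthesized by routine laboratory procedures. In some embodiments, the promoter is a T7 promoter. In some embodiments, additional regulatory elements are present upstream of the promoter, such as an enhancer contained in the DNA plasmid that promotes transcription of the RNA coding sequence.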
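In some embodiments, the eukaryotic cell is a yeast cell. In some embodiments, the DNA molecule is a yeast display vector.
In addition, this application provides RNA molecules encoding the above MHC. In some embodiments, the RNA molecule is obtained by transcription of the engineered DNA molecule above.
In some implementations, the engineered nucleic acid molecule may also be a DNA/RNA hybrid molecule.
In addition, this application encompasses any cell comprising the above engineered nucleic acid molecule.
Preparation and identification methods
This application also relates to a method for preparing a major histocompatibility complex (MHC), the method comprising transforming cells with the engineered nucleic acid molecule of the second aspect above and culturing the cells under conditions in which the complex is expressed. In some embodiments, the cells are yeast cells, e.g., Saccharomyces cerevisiae cells. In some embodiments, the cells are yeast cells and the engineered nucleic acid molecule is a yeast display vector.
This application also relates to a method for identifying peptides that bind a major histocompatibility complex (MHC), the method comprising:
i) contacting the cells of the first aspect with a peptide,
and ii) detecting binding of the peptide to the complex, thereby identifying peptides that bind the complex.
In some embodiments, the peptide comprises a mixture of peptides.
In some embodiments, the peptide competes with a reference peptide for binding to the complex. The "reference peptide" may be any known peptide that binds MHC, e.g., an antigen epitope peptide.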
Examples
Example 1: Yeast cell co-display that depends on peptide binding and is reinforced by C-terminal domains fused to HLA
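MHC-II is a transmembrane protein, and the stability of its α/β heterodimer depends in part on cooperation between the transmembrane and intracellular domains of the α and β chains. The inventors aimed to express stable human MHC-II extracellular domains (without transmembrane and intracellular domains) in a eukaryotic unicellular organism such as yeast. They first selected protein heterodimers encoded by the most representative MHC-II human leukocyte antigen alleles, among them HLA-DR1 (HLA-DRA*01:01/HLA-DRB1*01:01, SEQ ID NO:40 and SEQ ID NO:41), HLA-DR4 (HLA-DRA*01:01/HLA-DRB1*04:01, SEQ ID NO:40 and SEQ ID NO:42) and HLA-DQ6 (HLA-DQA1*01:02/HLA-DQB1*06:02, SEQ ID NO:43 and SEQ ID NO:44). They then constructed yeast shuttle vectors expressing recombinant extracellular-domain complexes of DR1, DR4 or DQ6. In the designed plasmids, the bidirectional GAL1-10 promoter directs simultaneous expression of the α and β chains. The C terminus of each expressed chain is fused to one of two dimerization domains, so that the LZ- or Fc-modified HLA-II α/β chains more readily form native HLA complexes (Figs. 2-7):
- a leucine zipper domain (LZA or LZB, forming a stable LZ dimer; the sequences used in the examples are SEQ ID NO:9 and SEQ ID NO:10);
- an antibody heavy-chain constant domain (FcA or FcB, forming a stable Fc dimer; the sequences used in the examples are SEQ ID NO:15 and SEQ ID NO:16).

In addition, at the N terminus of the Fc-modified DRβ or DQα chain, the synthetic SPP SP signal peptide (SEQ ID NO:45) was replaced with the yeast endogenous AGA2 SP signal peptide (SEQ ID NO:39). Both C-terminally fused LZ and Fc promote stable heterodimerization of the HLA α and β chains. Even so, flow cytometry showed that surface expression of Fc-modified recombinant HLA on yeast was markedly higher than that of LZ-modified recombinant HLA (Fig. 8). This result confirms that the C-terminal modification of the designed HLA fusion proteins effectively promotes yeast expression of HLA extracellular domains. This holds especially for Fc modification and/or an AGA2 SP signal peptide that directs extracellular delivery.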
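The inventors then validated display of peptide/MHC-II trimers on yeast cells using influenza hemagglutinin (HA) 306-318 (SEQ ID NO:2), a peptide that binds specifically to both DR1 and DR4. Expression of the antigenic peptide and of the HLA-DR α/β heterodimer was driven by two separate sets of yeast shuttle plasmids.

Peptide display. The antigenic peptide is expressed from its own peptide-display plasmid. Every progeny yeast cell transformed with such a plasmid therefore displays the single-chain fused antigenic peptide on the cell surface, as a fusion to the C terminus of the endogenous yeast adhesion receptor subunit Aga2p, whether or not a peptide/MHC-II trimer forms (Fig. 9). The vector directing expression of the single-chain fused antigenic peptide follows the conventional yeast display method (see e.g. Boder & Wittrup, Yeast surface display for screening combinatorial polypeptide libraries, 1997).

HLA secretion vectors. The selected representative HLA allele proteins needed to form dimers and to be co-displayed on the yeast surface by non-covalently binding the single-chain fused antigenic peptide, rather than by direct display. To achieve this, the inventors first deleted the gene encoding the Aga2p protein used for HLA display from the HLA-display plasmids described above. This produced HLA-secretion vectors that secrete, but do not display, the HLA dimer scaffold carrying the C-terminal reinforcing modification (Figs. 2-7). The resulting dimeric fusion protein is secreted from the yeast cell independently of the peptide expression vector, binds the antigenic peptide inside the cell, and is finally anchored on the cell surface as a peptide/HLA trimer (Fig. 1).

Co-display. The vector directing display of the single-chain fused antigenic peptide and the HLA-secretion vector with the C-terminal reinforcing modification were transformed into S. cerevisiae EBY100 for protein expression and trimer co-display. Using dual-antibody fluorescent labeling and flow cytometry, the inventors showed that the CODAAH system can display peptide/HLA trimers on the cell surface (Fig. 9). Importantly, C-terminal modification of HLA, especially Fc modification, greatly increased the expression abundance and display level of DR1 peptide/HLA trimers on the yeast surface. The signal, expressed as the mean or median fluorescence intensity (MFI), rose more than 2-fold. The method also enabled some HLA allele proteins that are otherwise difficult to co-display on yeast to form peptide/HLA trimers; for these, the MFI signal went from undetectable to detectable.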
Example 2: Displayed peptide libraries for CODAAH system optimization and antigenic peptide development
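Co-display of peptide/HLA trimers by the CODAAH system depends on two factors: modification and stabilization of the HLA-II α/β heterodimer itself, and selection of HLA-II-specific antigenic peptides. Based on an analysis of the HA306-318 amino acid sequence, the inventors designed a peptide variant library.

Library design. The library randomized the five residues of the HA306-318 register peptide YVKQNTLKL (SEQ ID NO:20) that mainly mediate DR1 binding. The aim was to use flow cytometric sorting to isolate peptide variants with higher affinity for DR1, so that more DR1 dimers could be displayed on the yeast surface by binding those variants. The anchor residues through which YVKQNTLKL binds DR1 are mainly P1Y, P4Q, P6T, P7L and P9L.

Library construction. Insert fragments were amplified with PCR primers carrying five degenerate codons (NNS), and linearized empty plasmid was obtained by double restriction digestion. The inventors transformed a high concentration of these inserts, which encode HA306-318 peptide variants (SEQ ID NO:46), together with the linearized empty plasmid into yeast cells secreting DR1. In progeny yeast cells, the insert and linearized plasmid recombine homologously to form peptide-variant display plasmids. Each yeast cell in the library therefore displays at least one peptide variant while also secreting DR1 molecules. The library constructed in this way had a size of >10^8.

Sorting. Flow cytometric sorting selected yeast clones double-positive for the peptide variant (signal reported by the HA tag) and for DR (signal reported by mAb L243). After four rounds of enrichment and sorting, the HA+/DR+ double-positive signal of the resulting sub-library (about 0.01% of the parental library size, ~10^4 yeast clones) was markedly higher than that of the parental library (Fig. 10). Sequencing of plasmids from single clones of the sub-library showed that up to 90% of clones carried peptide variants with nucleotide mutations (Fig. 11).

Multiple plasmids per cell. In colony PCR sequencing performed directly on yeast clones, most degenerate-codon positions gave overlapping peaks (8 of 10 clones, all except #1 and #4). This indicates that a single HA+/DR+ double-positive yeast cell contained multiple peptide-variant display plasmids whose variants differed at the five anchor positions. It follows that clones #5 and #7 need not carry identical peptide-variant display plasmids, even though their dominant codons, and the products those codons encode, are identical. To confirm this, the inventors extracted plasmids from single yeast clones #1, #7, #9 and #10 and transformed them into E. coli. Plasmid extraction and sequencing from multiple E. coli colonies per yeast clone confirmed the hypothesis: a single yeast cell can harbor at least three peptide-variant display plasmids, and the variant expressed by each plasmid can bind DR1 (Fig. 12). Clone #1, whose yeast colony PCR showed no overlapping peaks, indeed contained only one peptide variant. The sequences of E. coli plasmid extracts #1-1 and #1-2 were identical to the yeast #1 PCR sequence. Clones #7, #9 and #10, which showed overlapping peaks, each contained more than one peptide variant.

Implications. The actual size of the parental library may exceed its theoretical size, possibly by an order of magnitude, approaching an industrial-scale library size of 10^9. The HA+/DR+ double-positive sub-library may then contain on the order of 10^5 DR1-binding peptide variants. Up to 90% of these may carry real amino acid substitutions at anchor positions that raise DR1 affinity and thereby promote co-display of peptide/DR1 trimers by the CODAAH system.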
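The library arithmetic can be sanity-checked with a short calculation, assuming standard NNS codons (N = A/C/G/T, S = C/G) at each of the five randomized anchor positions. The library size (>10^8) and the sub-library fraction (~0.01%) come from the text. The oversampling ratio is only a back-of-envelope estimate of how well the DNA sequence space is covered, not a result reported in the examples.

```python
from itertools import product

N = "ACGT"
S = "CG"
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code
N_POSITIONS = 5                      # P1, P4, P6, P7, P9

# All 32 NNS codons; NNS covers all 20 amino acids plus a single stop (TAG).
codons = ["".join(c) for c in product(N, N, S)]
stop_codons = [c for c in codons if c in STOP_CODONS]

dna_diversity = len(codons) ** N_POSITIONS  # distinct DNA sequences
aa_diversity = 20 ** N_POSITIONS            # distinct peptide sequences
# Probability that a random clone has no stop codon at any randomized position.
p_no_stop = (1 - len(stop_codons) / len(codons)) ** N_POSITIONS

library_size = 1e8                           # ">10^8" from the text
oversampling = library_size / dna_diversity  # rough coverage of DNA space
sub_library = library_size * 0.0001          # ~0.01% retained after sorting

print(len(codons), stop_codons, dna_diversity, aa_diversity)
print(f"P(no stop) = {p_no_stop:.3f}, oversampling = {oversampling:.1f}x, "
      f"sub-library = {sub_library:.0f} clones")
```

Under these assumptions, 10^8 transformants sample the roughly 3.4×10^7 possible DNA sequences about three times over. This is consistent with the text's point that the effective diversity may be larger still when single cells carry several plasmids.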
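Peptide variants selected from the library promote peptide/HLA trimer co-display, and this benefit can extend the CODAAH system to other, related peptide/DR co-display applications. For example, compared with wild-type HA306-318 (Fig. 9), peptide variant #1 bound DR1 better. It also bound DR4 better in progeny strains transformed to secrete DR4 (Fig. 13). Importantly, flow cytometry detected markedly higher DR4 levels on the surface of transformed yeast cells with the Fc-modified DR4-Fc fusion protein than with DR4-LZ (Fig. 13). This again confirms that C-terminal Fc modification is the most effective way to stabilize the HLA-II α/β heterodimer. The HA306-318 variant library was used to discover new DR1 antigenic peptides; in the same way, a variant library built with peptide variant #1 as the wild type can be used to discover new DR4 antigens. Notably, a BLAST analysis of peptide variant #1 revealed a match in E. coli. The five DR-binding anchor residues (P1Y, P4L, P6P, P7A, P9A) of its register peptide YVKLNPAKA (SEQ ID NO:25) are identical to the five anchor residues of a potential DR register peptide from the E. coli NADH-quinone oxidoreductase subunit NuoG, YIKLNPADA (SEQ ID NO:38). This finding can guide prediction of E. coli-derived neoantigenic peptides, and of possibly related immune responses, in DR1- or DR4-positive individuals. In summary, these results demonstrate the high throughput and efficiency of the CODAAH system for peptide selection and for discovering new HLA antigenic peptides.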
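The anchor comparison above can be reproduced with simple string handling. This sketch compares only the five anchor positions named in the text (P1, P4, P6, P7, P9) of 9-residue register peptides. It is not a substitute for BLAST or for a binding predictor. The helper names `anchors`, `mismatches` and `locate_register` are hypothetical.

```python
ANCHOR_POSITIONS = (1, 4, 6, 7, 9)  # P1, P4, P6, P7, P9 of a 9-mer register


def anchors(register: str) -> str:
    """Residues at the anchor positions of a 9-residue register peptide."""
    if len(register) != 9:
        raise ValueError("expected a 9-residue register peptide")
    return "".join(register[p - 1] for p in ANCHOR_POSITIONS)


def mismatches(a: str, b: str) -> list[int]:
    """1-based register positions where two equal-length peptides differ."""
    return [i + 1 for i, (x, y) in enumerate(zip(a, b)) if x != y]


def locate_register(peptide: str, register: str) -> int:
    """0-based start of a register inside a longer peptide, or -1."""
    return peptide.find(register)


wild_type = "YVKQNTLKL"  # HA306-318 register (SEQ ID NO:20)
variant_1 = "YVKLNPAKA"  # peptide variant #1 (SEQ ID NO:25)
nuog = "YIKLNPADA"       # E. coli NuoG candidate register (SEQ ID NO:38)

print(anchors(wild_type), anchors(variant_1), anchors(nuog))
print("non-identical positions, variant #1 vs NuoG:", mismatches(variant_1, nuog))
print("register offset in Bio-HA306-318:", locate_register("PKYVKQNTLKLAT", wild_type))
```

The two peptides differ only at P2 and P8, which are not anchor positions in this scheme.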
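Materials and methods used in Examples 1 and 2
Unless otherwise specified, all materials and reagents used are commercially available, and all experimental methods are conventional.
Construction and electroporation of shuttle vectors for yeast co-display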
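The entire gene expression cassette (GAL1-10//AGA2-HA//scFv 4-4-20//MFαTerm.) was excised from pCT302 with KpnI and SacI. It was subcloned into the yeast shuttle vector pRS315 partially digested with KpnI/SacI, yielding a yeast surface display plasmid carrying the LEU auxotrophic marker. The DNA encoding scFv 4-4-20 in this plasmid was then replaced with an oligonucleotide encoding three parts: an antigenic peptide (e.g. influenza hemagglutinin residues HA306-318, SEQ ID NO:2), a short spacer (GGGS) and a V5 epitope tag. The result was a vector displaying the Aga2-single-chain antigenic peptide fusion protein.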
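Plasmids ptDR4-LZ and ptDQ6-LZ (Liu, Jiang, & Mellins, Yeast display of MHC-II enables rapid identification of peptide ligands from protein antigens (RIPPA), 2021) display HLA-DR4 and HLA-DQ6 extracellular-domain heterodimers, respectively, on yeast, each carrying the C-terminal LZ reinforcing modification.
- LZ display vector for DR1: the β gene HLA-DRB1*04:01 in ptDR4-LZ was replaced with HLA-DRB1*01:01, creating a vector for yeast display of the C-terminally LZ-modified HLA-DR1 extracellular-domain heterodimer.
- LZ secretion vectors: in each of the three HLA-II display plasmids above, the sequences encoding AGA2 and the HA tag downstream (3') of the DRβ gene or DQ6α gene were replaced with an in-frame stop codon. This yielded vectors for yeast secretion of C-terminally modified HLA extracellular-domain heterodimers.

The resulting two sets of vectors display or secrete, respectively, three representative HLA fusion proteins, DR1, DR4 and DQ6, carrying the C-terminal leucine-zipper reinforcing modification (see Figs. 2-4).
- Fc vectors: by molecular cloning, the first and second leucine zipper fragments downstream (3') of the HLA α and β genes were replaced with antibody crystallizable fragments (Fc). This was done in both sets of three vectors (three display vectors and three secretion vectors). The resulting two new sets of three vectors direct expression of DR1, DR4 and DQ6 fusion proteins carrying the C-terminal antibody crystallizable constant-domain reinforcing modification (see Figs. 5-7).
- Signal peptide swap: finally, the sequence encoding the AGA2 SP signal peptide (SEQ ID NO:39) was cloned at the N terminus of the Fc-modified DRβ or DQα chain, replacing the original sequence encoding the SPP SP signal peptide (SEQ ID NO:45).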
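The TRP+ plasmids for secreting or displaying HLA-II and the LEU+ vectors directing expression of the single-chain antigenic peptide fusion protein were transformed, together or individually, into S. cerevisiae strain EBY100 (GAL1-AGA1:URA3 ura3-52 trp1 leu2Δ1 his3Δ200 pep4:HIS2 prb1Δ1.6R can1 GAL). Electroporation was performed with a MicroPulser electroporator (Bio-Rad), essentially following the Bio-Rad MicroPulser manual. Single clones that received TRP+ and/or LEU+ vectors formed visible single colonies after 2-3 days at 30 °C on solid agar plates lacking tryptophan and/or leucine. SD-SCAA medium contains:
- 2% (wt/vol) glucose;
- 0.67% (wt/vol) yeast nitrogen base without amino acids;
- an appropriate amino acid dropout supplement mix (Trp− and/or Leu−, and optionally Ura−) (Clontech);
- 38 mM Na2HPO4 and 62 mM NaH2PO4, pH 6.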
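Preparation of co-displaying yeast for flow cytometry
Culture and induction. A single yeast colony was used to inoculate 2 mL SD-SCAA medium and grown in a shaker at 30 °C to a density of 2.5-5.0×10^7 cells/mL (OD600 2.5-5.0). To induce GAL1-10-driven protein expression, 10^7 cells were collected by centrifugation and transferred into 2 mL SG-SCAA medium (glucose replaced with galactose). After induction at 30 °C for 16-18 h, 10^6 cells per sample were harvested by centrifugation for immunofluorescent labeling.

Protein stripping (when required). Induced cells were first washed 1-2 times with 400 μL Tris-buffered saline (137 mM NaCl, 20 mM Tris-Cl, pH 7.6). They were then treated in one of two ways:
- reduction: gentle rocking in 20 μL reducing buffer [50 mM Tris-Cl pH 8.0, 1 mM DTT (Sigma; added before use)] at 4 °C for 24 h;
- protease cleavage: incubation in 20 μL Factor Xa buffer (100 mM NaCl, 2 mM CaCl2, 20 mM Tris-Cl, pH 8) with 20 μg/mL Factor Xa protease (New England Biolabs) at 23 °C for at least 48 h.

Primary labeling. Cells were pelleted by centrifugation and washed at least once with 400 μL ice-cold PBS + 1% BSA. The pellet was resuspended in 25 μL PBS + 1% BSA containing rabbit anti-HA polyclonal antibody (Sigma) plus either mouse anti-DR mAb L243 (BD Biosciences) or anti-DQ mAb SPV-L3 (BD Biosciences). Cells were incubated at room temperature for 30 min and then on ice for 10 min.

Secondary labeling. After the primary reagents were removed by centrifugation, cells were washed again with ice-cold PBS + 1% BSA. They were resuspended in 40 μL PBS + 1% BSA containing highly cross-adsorbed secondary antibodies (Thermo Fisher): Alexa Fluor 647 goat anti-mouse IgG (H+L) (1:80) and Alexa Fluor 488 goat anti-rabbit IgG (H+L) (1:80). Cells were incubated on ice for 40 min.

Analysis. After a final wash with ice-cold PBS + 1% BSA, cells were resuspended in 500-700 μL PBS + 1% BSA for flow cytometry. To detect the HA tag and the V5 tag flanking the peptide simultaneously, mouse anti-V5 mAb (Thermo Fisher) at a 1:30 dilution replaced mAb L243 as the primary labeling reagent; the other labeling steps were unchanged.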
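The harvest volumes in this protocol follow from the stated density range (2.5-5.0×10^7 cells/mL at OD600 2.5-5.0), which implies roughly 1×10^7 cells/mL per OD600 unit. That conversion factor is inferred from the text and varies between spectrophotometers and strains. The helper below (a hypothetical name) is therefore a sketch for planning, not a calibrated conversion.

```python
# Inferred from "2.5-5.0x10^7 cells/mL (OD600 2.5-5.0)"; an assumption, not a calibration.
CELLS_PER_ML_PER_OD600 = 1e7


def volume_for_cells_ml(target_cells: float, od600: float,
                        cells_per_ml_per_od: float = CELLS_PER_ML_PER_OD600) -> float:
    """Culture volume (mL) expected to contain target_cells at the given OD600."""
    if od600 <= 0:
        raise ValueError("OD600 must be positive")
    return target_cells / (od600 * cells_per_ml_per_od)


# Volume of overnight culture holding the 10^7 cells moved into 2 mL of SG medium.
for od in (2.5, 5.0):
    print(f"OD600 {od}: {volume_for_cells_ml(1e7, od):.2f} mL")
```

At the two ends of the stated density range, this gives 0.40 mL and 0.20 mL of culture, respectively.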
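Construction of peptide-variant/HLA trimer co-display yeast libraries, and single-clone sorting and analysis
To display HA306-318 peptide variants on yeast in the library and select variants that bind DR1, the inventors designed the following primers:
P0 introduces random mutations at the five anchor-residue positions of the HA306-318 peptide. P1 and P2 amplify the insert carrying the peptide mutants and add, at both ends of the fragment, sequences homologous to the two ends of the linearized vector.

Library transformation. The linearized peptide-display vector was obtained by XmaI/NheI double digestion. A total of 50 μg of insert and linearized vector (1:1 mass ratio) was concentrated and dried by centrifugation. It was then electroporated into 800 μL of competent yeast cells secreting DR1 to obtain the co-display yeast library.

Labeling and sorting. The library was immunofluorescently labeled by scaling up the labeling method described above for yeast strains. Yeast cells co-labeled for the HA tag and DR were sorted on a FACSAria IIu cell sorter. Sorted sub-libraries were expanded in SD medium and then induced in SG medium for flow cytometric analysis or further sorting.

Clone analysis. A sample of the sub-library expanded after the final sort was plated on agar medium for single-clone yeast colony PCR and sequencing. This was followed by yeast plasmid extraction, E. coli transformation, and plasmid extraction and sequencing of multiple E. coli colonies corresponding to each yeast clone. Sequence-verified peptide-variant display plasmids were re-transformed into competent yeast cells secreting DR1 or DR4 to verify trimer co-display. Verification followed the flow cytometric analysis described above and the quantitative analysis described below.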
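Flow cytometric detection and quantitative analysis of relative binding
At least 10,000 cell events were collected per sample and gated on forward and side scatter. The flow cytometers used were the FACSCalibur and the FACSAria IIu (BD Biosciences). Flow cytometry data from yeast strains co-displaying peptide/HLA trimers were analyzed with FlowJo software (BD). The display level of a peptide (e.g. the influenza hemagglutinin peptide HA306-318) on the yeast surface is proportional to the background-corrected mean fluorescence intensity normalized to the background intensity. It is expressed as cMFI: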
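where MFI(HA−) and MFI(HA+) are the mean fluorescence intensities of Alexa Fluor 488 emission from HA-tag labeling of the negative and positive cell populations, respectively, as calculated in FlowJo. Analyzing peptide display levels on the yeast surface with anti-V5-tag staining gives cMFI values comparable to those from the above method.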
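The defining equation for cMFI is not reproduced in this text. The description reads "background-corrected MFI normalized to the background intensity". One plausible literal form of that is cMFI = (MFI(HA+) − MFI(HA−)) / MFI(HA−). The sketch below assumes that form. The function name and the MFI values in the example are hypothetical.

```python
def cmfi(mfi_pos: float, mfi_neg: float) -> float:
    """Assumed form: (MFI(HA+) - MFI(HA-)) / MFI(HA-).

    mfi_pos: mean Alexa Fluor 488 intensity of the HA-positive population.
    mfi_neg: mean Alexa Fluor 488 intensity of the HA-negative population.
    """
    if mfi_neg <= 0:
        raise ValueError("background MFI must be positive")
    return (mfi_pos - mfi_neg) / mfi_neg


# Hypothetical MFI values, for illustration only.
print(cmfi(550.0, 50.0))
```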
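The peptide-binding-dependent HLA display level can be calculated correctly from two normalized, background-corrected quantities: the anti-DR staining fluorescence and the peptide display level. The result is the DR ratio. Anti-DQ staining is handled analogously:
where (+) and (−) denote co-displaying yeast and yeast displaying only the peptide, respectively. MFI(DR) is the mean fluorescence intensity of Alexa Fluor 647 emission from DR labeling. cMFI is the display level of HA306-318 on the surface of the corresponding yeast strain. Normalization minimizes inter-experiment error arising from laser power output, detector amplification and other cytometer parameters.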
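Example 3: The method effectively increases display of "empty" MHC-II complexes
In the endoplasmic reticulum of human professional antigen-presenting cells (APCs), the chaperone invariant chain (Ii) generally occupies the peptide-binding groove of nascent MHC-II (also called human leukocyte antigen class II, HLA-II). Ii stabilizes the HLA-II protein structure and prevents mismatched binding of interfering intracellular peptides. Proteases can trim Ii to produce the CLIP peptide. CLIP remains bound to HLA-II with relatively low affinity until a higher-affinity antigenic peptide replaces it through peptide exchange. This typically occurs in the MHC class II compartment (MIIC), where HLA-DM usually catalyzes peptide exchange. These observations indicate that MHC-II proteins with an "empty" peptide-binding groove are unstable, which has also been confirmed in many protein expression systems.

To increase yeast expression of "empty" MHC-II efficiently, the inventors used molecular cloning to attach one chain (α or β) of a given MHC-II allele to a yeast surface protein. The other chain (α or β) is secreted as a soluble component by the same yeast cell. The inventors modified both C termini of the α/β chains to carry the amino acid sequence of an antibody heavy-chain crystallizable fragment (FcAB, or simply Fc in the names of HLA-II fusion proteins). This sequence effectively promotes pairing of the soluble secreted chain with the surface-anchored chain, markedly increasing the display level and/or copy number of MHC-II complexes.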
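Results
The inventors used three representative DR alleles:
- HLA-DR1 (HLA-DRA*01:01/HLA-DRB1*01:01, SEQ ID NO:40 and SEQ ID NO:41);
- HLA-DR4 (HLA-DRA*01:01/HLA-DRB1*04:01, SEQ ID NO:40 and SEQ ID NO:42);
- HLA-DR15 (HLA-DRA*01:01/HLA-DRB1*15:01, SEQ ID NO:40 and SEQ ID NO:81).

HLA-DQ6 (HLA-DQA1*01:02/HLA-DQB1*06:02, SEQ ID NO:43 and SEQ ID NO:44) served as a representative DQ allele. In the yeast shuttle vectors constructed to express DR1, DR4, DR15 or DQ6 complexes, the bidirectional GAL1-10 promoter directs simultaneous expression of the α and β chains. The leucine zipper (LZ) motif had previously been used to promote dimerization. Compared with it, HLA-II α/β chains carrying the C-terminal FcAB modification more readily formed "empty" HLA complexes (Fig. 14).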
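The inventors created new yeast strains by transforming each constructed shuttle vector into the parental S. cerevisiae strain EBY100 (Boder & Wittrup, 1997). After induction of protein expression, the yeast cells were immunofluorescently labeled and analyzed by flow cytometry. The detection antibody was L243, a mouse monoclonal antibody that recognizes a conformational epitope of correctly folded HLA-II αβ complexes. Flow cytometry showed that FcAB-modified HLA-II α/β was generally expressed on the yeast surface at higher levels than LZ-modified HLA-II α/β (MFI_Fc > MFI_LZ), including for DR1, DR15 and DQ6. For DR15 and DQ6, the HLA-II-positive fluorescence signal rose significantly, by 172.7% and 86.6%, respectively. The increase was calculated as (MFI_Fc − MFI_LZ)/(MFI_LZ − MFI_BC) × 100% (Fig. 14). This higher display level is attributed to the high affinity and high binding specificity between the FcA and FcB binding domains.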
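The percentage increases reported in Examples 3 and 4 use (MFI_Fc − MFI_LZ)/(MFI_LZ − MFI_BC) × 100%. The text does not define MFI_BC. This sketch treats it as a background-control MFI, which is an assumption. The MFI values are hypothetical, because the raw intensities behind the 172.7%, 86.6%, 146.6% and 96.1% figures are not given here.

```python
def percent_increase(mfi_fc: float, mfi_lz: float, mfi_bc: float) -> float:
    """(MFI_Fc - MFI_LZ) / (MFI_LZ - MFI_BC) * 100.

    mfi_bc is assumed to be a background-control MFI, so the denominator
    is the background-corrected LZ signal.
    """
    denom = mfi_lz - mfi_bc
    if denom <= 0:
        raise ValueError("LZ signal must exceed background")
    return (mfi_fc - mfi_lz) / denom * 100.0


# Hypothetical values: the Fc signal exceeds LZ by one background-corrected LZ unit.
print(percent_increase(300.0, 200.0, 100.0))
```

Subtracting the background only in the denominator expresses the gain relative to the specific LZ signal rather than to the raw LZ MFI.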
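Example 4: The method can further optimize and increase display of "empty" MHC-II complexes
Introduction
Secreted eukaryotic proteins typically carry a signal peptide (SP) at the N terminus of the polypeptide, and the two MHC-II chains displayed on yeast need similar signal peptides. In the yeast shuttle vectors previously constructed to express DR1, DR4, DR15 or DQ6 complexes, the synthetic syn-pre-pro signal peptide was used at the N terminus of both the α and β chains. To increase yeast surface display of "empty" HLA α/β complexes further, the inventors replaced the syn-pre-pro synthetic signal peptide at the N terminus of the displayed chain with the yeast endogenous AGA2 signal peptide (Fig. 15).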
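Results
After induction of protein expression, the newly constructed yeast cells were immunofluorescently labeled and analyzed by flow cytometry. With the AGA2 SP, surface expression of FcAB-modified DR1 or DR4 α/β on yeast increased substantially. It was clearly higher than that of LZ-modified HLA-II α/β (MFI_Fc > MFI_LZ). The positive fluorescence signal increased by 146.6% and 96.1%, respectively, calculated as (MFI_Fc − MFI_LZ)/(MFI_LZ − MFI_BC) × 100% (Fig. 15).
In summary, FcAB modification combined with signal peptide selection allowed all of the "empty" MHC-II proteins to assemble correctly on the yeast surface and greatly increased their display levels. This lays a solid foundation for more sensitive and accurate identification of MHC-II peptide ligands.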
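Example 5: The method effectively improves the sensitivity and accuracy of MHC-II peptide ligand identification
Identifying MHC-II peptide ligands involves both computational prediction and experimental validation. Artificial intelligence and machine learning are still maturing, so predictive algorithms and models still depend heavily on highly accurate experimental data. Current experimental methods for identifying MHC-II peptide ligands fall into two classes:
- EL: mass spectrometry quantifies peptides eluted from MHC-II molecules that were immunoprecipitated from lysed cells.
- BA: binding of synthetic peptides to recombinant MHC-II proteins is measured and the affinity determined.

Both classes share the same limitations.
- Single-allele cell lines: to identify peptides bound by the protein of a particular MHC-II allele, the best approach for either class is a cell line expressing only that allele, not human primary cells. Human primary cells usually express multiple MHC-II alleles, co-dominantly, from three human leukocyte antigen (HLA) loci: HLA-DR, HLA-DQ and HLA-DP.
- Purification: both classes require purifying or enriching the expressed MHC-II protein before ligand identification.
- Time: establishing single-allele MHC-II cell lines and purifying MHC-II proteins usually takes 4-6 months, which severely limits the pace of research and development.
- Qualitative read-out: EL methods are mostly qualitative and have a higher false-positive rate than BA methods.

By contrast, identifying and measuring peptide ligands with "empty" MHC-II displayed on the yeast surface can take as little as one week, and the approach is both qualitative and quantitative. More importantly, the higher the yield and copy number of "empty" MHC-II displayed on yeast, the greater the sensitivity and accuracy of MHC-II peptide ligand identification. FcAB-modified "empty" MHC-II proteins therefore offer a major advantage in this application.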
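At human body temperature (37 °C), physiological peptide loading occurs in the acidic MIIC (pH ~5). By contrast, "empty" MHC-II displayed on the yeast cell surface can bind specific target peptides under relatively broad conditions, for example at pH 5.0 or pH 7.4 and at 30 °C or 37 °C. Here the target peptide is biotinylated (the indicator peptide) for flow cytometric detection and quantitative analysis. Kinetic evidence shows that binding of the target peptide to "empty" MHC-II displayed on yeast generally reaches equilibrium at about 15-20 h. Therefore, pH ~5 and 20 h were the preferred conditions for comparing HLA-II-Fc with HLA-II-LZ, both in yeast display level and in the sensitivity and accuracy of peptide binding.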
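Results
For "empty" HLA-DR1 complexes displayed on the yeast surface, the inventors used the influenza hemagglutinin (HA) 306-318 peptide as the DR1-specific target antigenic peptide. At the same concentration of the Bio-HA306-318 indicator peptide, DR1-Fc on the yeast surface had a much higher expression (display) level than DR1-LZ. It also bound significantly more indicator peptide than DR1-LZ (Fig. 16). This shows that FcAB-modified "empty" HLA-II α/β on the yeast surface has a higher apparent activity (peptide-binding capacity per yeast cell) than the LZ-modified protein. In other words, yeast expressing FcAB-modified HLA-II α/β reach a given fluorescence signal (MFI) at a lower indicator peptide concentration than yeast expressing LZ-modified HLA-II α/β (Figs. 16 and 17). Yeast strains created by this method are therefore more sensitive than other yeast strains for identifying peptide ligands. Higher sensitivity also makes affinity constants determined by titrating the indicator peptide (e.g. the apparent equilibrium dissociation constant KDapp) more accurate, i.e. closer to those determined by conventional BA methods. For example, DR-Fc-displaying yeast gave a KDapp for Bio-HA306-318 of 6.81 μM, whereas DR-LZ-displaying yeast gave 15.14 μM. The DR-Fc value is clearly closer to the KD of HA306-318 determined by conventional BA methods, which is on the order of ~100 nM (Fig. 17).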
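In further experiments, the inventors ran MHC-II competition binding assays using non-biotinylated HA306-318 (competitor peptide) together with biotinylated Bio-HA306-318 (indicator peptide). HA306-318 competed Bio-HA306-318 off both constructs, but the resulting signal difference was clearly larger for yeast-displayed DR-Fc than for yeast-displayed DR-LZ (ΔMFI = 18.5 vs ΔMFI = 3.81) (Fig. 18). In addition, the half-maximal inhibitory concentration (IC50) determined with FcAB-modified "empty" DR1 α/β on the yeast surface was more accurate than that determined with the LZ-modified complex. At competitor concentrations above 50 μM, the positive indicator-peptide signal on DR-LZ-displaying yeast could hardly be distinguished from background, which greatly increased the data-fitting error (Fig. 19). This comparison again shows that yeast strains created by this method identify peptide ligands with higher sensitivity and accuracy than other yeast strains.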
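Materials and methods for Examples 3-5
Materials
The indicator peptides Bio-HA306-318 (biotin-PKYVKQNTLKLAT) and Bio-aI (negative control, a DQ2.5-binding peptide) and the competitor peptides were synthesized by GenScript. The following reagents were purchased from Thermo Fisher:
- monoclonal antibodies (mAbs): mouse anti-DRαβ (clone L243) and mouse anti-DQαβ (clone );
- Alexa Fluor 647-conjugated streptavidin;
- highly cross-adsorbed secondary antibodies: Alexa Fluor 488 goat anti-mouse IgG (H+L) and Alexa Fluor 647 goat anti-mouse IgG (H+L).
Methods
Construction of yeast display vectors
To construct plasmids for displaying "empty" DR4-Fc and "empty" DQ6-Fc, the inventors modified ptDR4-LZ (TRP+) and ptDQ6-LZ (TRP+), as used previously (Liu et al., 2021). The Fos or Jun leucine zipper dimerization motifs at the C termini of the α or β chains were replaced with FcA or FcB domains, respectively (SEQ ID NO:11-19; the sequences used in the examples are SEQ ID NO:14 and SEQ ID NO:15). Replacing DR4 with DR1 or DR15 yields vectors for yeast display of DR1-Fc or DR15-Fc.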
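Yeast transformation, culture and protein expression
Plasmids carrying the tryptophan auxotrophic marker (TRP+) were electroporated into the parental yeast strain EBY100 (URA+, TRP−) following the Bio-Rad MicroPulser manual. After 2 days at 30 °C, single colonies appeared on solid agar plates lacking tryptophan. The medium was SD-CAA: 2% w/v glucose, 0.67% w/v yeast nitrogen base without amino acids, 0.062% w/v Ura/Trp-dropout casamino acids, 38 mM Na2HPO4 and 62 mM NaH2PO4, pH 6.0. A single colony was used to inoculate 2 mL liquid SD-CAA and grown overnight at 30 °C with shaking at 225 rpm to an OD600 of 2.5-5.0. To induce GAL1-10-driven protein expression, 10^7 harvested cells were switched to 2 mL SG-CAA medium (glucose replaced with galactose). After induction at 30 °C for 18 h, enough yeast cells for analysis were collected by centrifugation at 2,500 g for 3 min. They were washed and prepared for analysis of protein expression or peptide binding.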
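Immunofluorescent staining and flow cytometry
Expression and display levels of "empty" MHC-II were assessed by flow cytometry with immunofluorescent labeling.
- Primary labeling: galactose-induced yeast cells were labeled with primary mAbs (about 10 μg/mL each), namely mouse mAb L243 (for DR) or (for DQ), or other tag-specific antibodies. Incubation was 30 min at room temperature (RT) and then 30 min on ice.
- Wash: cells were washed with 300 μL ice-cold PBS + 1% w/v bovine serum albumin (BSA).
- Secondary labeling: cells were incubated on ice for 1 h with highly cross-adsorbed secondary antibodies, either Alexa Fluor 488 goat anti-mouse IgG (H+L) or Alexa Fluor 647 goat anti-mouse IgG (H+L) (1:100 dilution), or with secondary antibodies against other host species.
- Acquisition: yeast cells were analyzed on a flow cytometer to detect the fluorescence signals corresponding to MHC-II protein or epitope-tag expression. At least 20,000 cells were collected per sample, and data were analyzed with FlowJo software (BD).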
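Binding of target antigenic peptides to "empty" MHC-II on yeast
Basic binding. 6×10^5 galactose-induced yeast cells expressing "empty" MHC-II were collected by centrifugation at 2,500 g for 3 min. They were resuspended in 40 μL of one of two solutions:
- citrate buffer: 40 mM citric acid and sodium citrate, 150 mM sodium chloride, pH 5.0;
- phosphate-buffered saline (PBS): 137 mM NaCl, 2.7 mM KCl, 10.1 mM Na2HPO4, 1.8 mM KH2PO4, pH 7.4.

Indicator peptide was added to the solution and incubated under the desired conditions.

Kinetics. 6×10^5 galactose-induced yeast cells were collected and resuspended in 40 μL of 40 mM citrate buffer (pH 5.0). They were incubated with indicator peptide for different times and then collected by centrifugation at 2,500 g for 3 min for flow cytometric analysis.

Apparent affinity. Different concentrations of indicator peptide were incubated with yeast cells displaying "empty" MHC-II in 40 mM citrate buffer (pH 5.0) at 30 °C for 20 h. Reaction tubes were sealed with Parafilm before incubation. This prevented changes in culture volume that could alter the final peptide concentration in time-course or titration studies.

Staining. After incubation, yeast cells were washed twice with 300 μL ice-cold PBS + 1% BSA. They were then stained on ice for 1 h with streptavidin-AF647 diluted 1:200 in 50 μL PBS + 1% BSA. Cells were washed twice more with 300 μL ice-cold PBS + 1% BSA and resuspended in 300 μL ice-cold PBS + 1% BSA for flow cytometry. To detect cell-surface MHC-II protein and indicator peptide simultaneously, cells with bound indicator peptide were first stained on ice for 30 min with mouse mAb L243 (for DR) or (for DQ). They were then double-labeled on ice for 1 h with highly cross-adsorbed Alexa Fluor 488 goat anti-mouse IgG (H+L) and streptavidin-AF647 in PBS + 1% BSA.

Quantification. Peptide binding was quantified as the background-subtracted streptavidin staining signal, MFI_SA,indicator − MFI_SA,neg. This signal was plotted against incubation time or peptide concentration to determine kinetic or thermodynamic parameters. To obtain the apparent equilibrium dissociation constant (KDapp) for indicator peptide binding to "empty" MHC-II on the yeast surface, the data were fitted by nonlinear regression to the one-site specific binding equation in GraphPad Prism.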
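GraphPad Prism's one-site specific binding model is Y = Bmax·X/(KD + X). The code below is a minimal, dependency-free sketch of such a fit. It is not Prism's own algorithm, which performs iterative nonlinear least squares. Instead, it uses the fact that Bmax enters linearly for a fixed KD, so only KD needs a one-dimensional search, done here on a log scale with golden-section search. The titration data are synthetic and noise-free, generated with KD = 6.81 μM (the DR-Fc value reported in Example 5) and an arbitrary Bmax. They are not measurements from the examples.

```python
import math


def one_site(x: float, bmax: float, kd: float) -> float:
    """One-site specific binding: Y = Bmax * X / (KD + X)."""
    return bmax * x / (kd + x)


def _profile(xs, ys, kd):
    """For fixed KD the model is linear in Bmax, so least-squares Bmax has a
    closed form; return (bmax, sum of squared residuals)."""
    f = [x / (kd + x) for x in xs]
    bmax = sum(fi * yi for fi, yi in zip(f, ys)) / sum(fi * fi for fi in f)
    sse = sum((yi - bmax * fi) ** 2 for fi, yi in zip(f, ys))
    return bmax, sse


def fit_one_site(xs, ys, kd_lo=1e-3, kd_hi=1e3, iters=100):
    """Golden-section search for KD on a log10 scale, assuming a single
    minimum inside [kd_lo, kd_hi]; returns (kd, bmax)."""
    g = (math.sqrt(5) - 1) / 2
    a, b = math.log10(kd_lo), math.log10(kd_hi)
    c, d = b - g * (b - a), a + g * (b - a)
    for _ in range(iters):
        if _profile(xs, ys, 10 ** c)[1] < _profile(xs, ys, 10 ** d)[1]:
            b, d = d, c              # minimum lies in [a, d]
            c = b - g * (b - a)
        else:
            a, c = c, d              # minimum lies in [c, b]
            d = a + g * (b - a)
    kd = 10 ** ((a + b) / 2)
    return kd, _profile(xs, ys, kd)[0]


# Synthetic indicator-peptide titration (concentrations in uM).
conc = [0.5, 1, 2, 5, 10, 20, 50, 100]
signal = [one_site(x, bmax=1000.0, kd=6.81) for x in conc]
kd_app, bmax_fit = fit_one_site(conc, signal)
print(f"KDapp = {kd_app:.3f} uM, Bmax = {bmax_fit:.1f}")
```

With real, noisy MFI data you would also want confidence intervals. A general nonlinear least-squares routine (as in Prism) reports these directly, whereas this sketch does not.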
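Peptide competition assay using yeast-displayed MHC-II
To determine suitable competitor concentrations for the competition assay, 6×10^5 galactose-induced yeast cells were incubated with 10 μM indicator peptide and various concentrations of competitor peptide at pH 5.0 and 30 °C for 20 h. After incubation, yeast cells were washed with PBS + 1% BSA, stained with streptavidin-AF647, and analyzed by flow cytometry as described above. Percent binding in the presence of competitor was quantified as [(MFI_with competitor − background)/(MFI_without competitor − background)] × 100%. It was plotted against competitor concentration to determine the half-maximal inhibitory concentration (IC50). The data were fitted by nonlinear regression to the one-site fit log IC50 equation in GraphPad Prism.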
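The percent-binding normalization and a one-site log IC50 fit can be sketched as follows. Prism's one-site "Fit logIC50" model is Y = Bottom + (Top − Bottom)/(1 + 10^(X − LogIC50)), with X = log10 of the competitor concentration. As a simplifying assumption, Top and Bottom are fixed here at 100% and 0%, whereas Prism normally fits them as free parameters. With the plateaus fixed, each residual is minimized at the true LogIC50, so the error surface has a single minimum and a golden-section search suffices. The synthetic data use an arbitrary IC50 of 5 μM, not a value from the examples.

```python
import math


def percent_binding(mfi_with: float, mfi_without: float, background: float) -> float:
    """[(MFI_with - background) / (MFI_without - background)] * 100."""
    return (mfi_with - background) / (mfi_without - background) * 100.0


def one_site_log_ic50(log_x: float, log_ic50: float,
                      top: float = 100.0, bottom: float = 0.0) -> float:
    """Y = Bottom + (Top - Bottom) / (1 + 10 ** (X - LogIC50))."""
    return bottom + (top - bottom) / (1 + 10 ** (log_x - log_ic50))


def fit_log_ic50(log_xs, ys, lo=-3.0, hi=3.0, iters=100):
    """Golden-section search for LogIC50 with Top/Bottom fixed at 100/0."""
    def sse(l):
        return sum((y - one_site_log_ic50(x, l)) ** 2 for x, y in zip(log_xs, ys))

    g = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    for _ in range(iters):
        if sse(c) < sse(d):
            b, d = d, c
            c = b - g * (b - a)
        else:
            a, c = c, d
            d = a + g * (b - a)
    return (a + b) / 2


# Synthetic competition curve (competitor in uM), noise-free.
competitor = [0.1, 0.3, 1, 3, 10, 30, 100]
log_c = [math.log10(c) for c in competitor]
binding = [one_site_log_ic50(x, math.log10(5.0)) for x in log_c]
ic50 = 10 ** fit_log_ic50(log_c, binding)
print(f"IC50 = {ic50:.3f} uM")
```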
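The sequences used in the above examples of the present application are shown in the sequence listing below. The following sequences are merely exemplary sequences of embodiments of the present application and do not limit the present application in any way.
Sequence Listing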


Claims (51)

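  1. A cell comprising a major histocompatibility complex (MHC) and a single-chain domain, wherein the complex comprises an α chain and a β chain, wherein
    the α chain is linked to a first protein-binding domain to form a first fusion protein,
    the β chain is linked to a second protein-binding domain to form a second fusion protein,
    the single-chain domain non-covalently binds the α chain and the β chain, the first fusion protein binds the second fusion protein to form the complex, and the complex and the single-chain domain form a trimer, wherein
    the first protein-binding domain binds the second protein-binding domain to enhance or promote formation of the trimer formed by the complex and the single-chain domain.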
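  2. The cell according to claim 1, wherein
    the first protein-binding domain non-covalently binds the second protein-binding domain; or the first protein-binding domain is covalently linked to the second protein-binding domain, the covalent bond preferably being a disulfide bond.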
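  3. The cell according to claim 1, wherein the cell is a yeast cell.
  4. The cell according to claim 1, wherein the single-chain domain is bound to the surface of the cell by covalent fusion to a molecule on the cell surface.
  5. The cell according to claim 4, wherein the α chain or the β chain is bound to the molecule on the cell surface by non-covalently binding the single-chain domain.
  6. The cell according to claim 4, wherein the molecule is a protein.
  7. The cell according to claim 6, wherein the protein is endogenous to the cell.
  8. The cell according to claim 6, wherein the protein is Aga2p, a-agglutinin, α-agglutinin, flocculin, Cwp1p, Cwp2p or Tip1p.
  9. The cell according to claim 6, wherein the protein is Aga2p.
  10. The cell according to claim 9, wherein the amino acid sequence of the Aga2p is shown in SEQ ID NO:1.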
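  11. The cell according to claim 1, wherein the single-chain domain is a peptide of 9 to 30 amino acids in length and comprises at least one contiguous 9-amino-acid register peptide that can be specifically recognized by the complex, or a register peptide variant that can be bound by the complex; preferably the register peptide or variant thereof is selected from the amino acid sequences shown in any one of SEQ ID NO:20-38.
  12. The cell according to claim 11, wherein the amino acid sequence of the single-chain domain is shown in any one of SEQ ID NO:2-6 or 46-77.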
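  13. The cell according to claim 1, wherein
    the first protein-binding domain is a first leucine zipper domain and the second protein-binding domain is a second leucine zipper domain, wherein the first and second leucine zipper domains that together form a complete leucine zipper are interchangeable between the first fusion protein and the second fusion protein; or
    the first protein-binding domain is an FcA domain and the second protein-binding domain is an FcB domain, wherein the FcA and FcB domains that together form an FcAB dimer are interchangeable between the first fusion protein and the second fusion protein.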
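  14. The cell according to claim 13, wherein
    the amino acid sequence of the first leucine zipper domain is shown in SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:9 or SEQ ID NO:10;
    correspondingly, the amino acid sequence of the second leucine zipper domain is shown in SEQ ID NO:8, SEQ ID NO:7, SEQ ID NO:10 or SEQ ID NO:9.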
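  15. The cell according to claim 13, wherein the FcA domain and the FcB domain have identical or different amino acid sequences, preferably selected from any one of the amino acid sequences shown in SEQ ID NO:11-19;
    preferably the FcA domain and the FcB domain are capable of increasing the expression/display level and/or copy number of the complex;
    preferably the FcA domain comprises only a first CH3 domain and the FcB domain comprises only a second CH3 domain; or
    preferably the FcA domain comprises a first CH2 domain and a first CH3 domain, and the FcB domain comprises a second CH2 domain and a second CH3 domain;
    further preferably the first CH2 domain and the first CH3 domain of the first protein-binding domain are connected by a linker, and the second CH2 domain and the second CH3 domain of the second protein-binding domain are connected by a linker.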
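  16. The cell according to claim 15, wherein
    either or both of the FcA domain and the FcB domain comprise one or more amino acid modifications that achieve any one, two or all three of the following:
    (1) enhancing or stabilizing the FcA domain and/or the FcB domain itself ("FcS"),
    (2) enhancing or stabilizing covalent or non-covalent binding between the FcA domain and the FcB domain ("FcM"),
    (3) reducing or avoiding mispairing of the α chain and β chain caused by mispairing of the FcA domain and the FcB domain ("FcN").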
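  17. The cell according to claim 1, wherein the first protein-binding domain is linked to the C terminus of the α chain and the second protein-binding domain is linked to the C terminus of the β chain; preferably the signal peptide directing expression of the first fusion protein and the second fusion protein is SPP SP or AGA2 SP; further preferably the amino acid sequence of the AGA2 SP is shown in SEQ ID NO:39 and the amino acid sequence of the SPP SP is shown in SEQ ID NO:45.
  18. The cell according to claim 1, wherein the complex is an MHC class I molecule or an MHC class II molecule, preferably an HLA class I molecule or an HLA class II molecule.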
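  19. The cell according to claim 18, wherein
    the α chain is encoded by HLA-DRA*01 or another allele of the HLA-DRA family, and the β chain is encoded by HLA-DRB1*01, HLA-DRB1*03, HLA-DRB1*04, HLA-DRB1*15 or another allele of the HLA-DRB family; or
    the α chain is encoded by HLA-DQA1*01, HLA-DQA1*03, HLA-DQA1*05 or another allele of the HLA-DQA1 family, and the β chain is encoded by HLA-DQB1*02, HLA-DQB1*03, HLA-DQB1*05, HLA-DQB1*06 or another allele of the HLA-DQB1 family; or
    the α chain is encoded by HLA-DPA1*01:03, HLA-DPA1*02:02 or another allele of the HLA-DPA1 family, and the β chain is encoded by HLA-DPB1*01:01, HLA-DPB1*02:01, HLA-DPB1*04:01, HLA-DPB1*04:02 or another allele of the HLA-DPB1 family.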
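  20. A nucleic acid encoding a major histocompatibility complex (MHC), wherein the complex comprises an α chain and a β chain, the α chain being linked to a first protein-binding domain and the β chain being linked to a second protein-binding domain, wherein the α chain and the β chain non-covalently bind a single-chain domain, and the first protein-binding domain binds the second protein-binding domain to enhance or promote formation of the trimer formed by the complex and the single-chain domain.
  21. The nucleic acid according to claim 20, wherein the encoded major histocompatibility complex is the major histocompatibility complex (MHC) referred to in any one of claims 1 to 19.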
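  22. A method of preparing a major histocompatibility complex (MHC), the method comprising:
    transforming a cell with the nucleic acid according to claim 20 or 21 and with a nucleic acid encoding a single-chain domain,
    and culturing the cell under conditions in which the complex referred to in any one of claims 1 to 19 is expressed.
  23. The method according to claim 22, wherein the cell is a yeast cell.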
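  24. A method of identifying a peptide that binds a major histocompatibility complex (MHC), the method comprising:
    i) causing the cell according to any one of claims 1 to 19 to display a library of single-chain domains,
    and ii) detecting the trimer on the surface of single clones of the cell library displaying the single-chain domain referred to in any one of claims 1 to 19, thereby identifying a peptide that binds the complex referred to in any one of claims 1 to 19.
  25. The method according to claim 24, wherein the single-chain domain comprises a peptide, a mutant of a peptide, a library of peptide mutants, or a mixture of peptides.
  26. The method according to claim 24 or 25, wherein the peptide, mutant of a peptide, library of peptide mutants, or mixture of peptides non-covalently binds the complex.
  27. A register peptide or variant thereof capable of binding a major histocompatibility complex (MHC), wherein the register peptide or variant thereof is selected from the amino acid sequences shown in any one of SEQ ID NO:20-38.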
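  28. A cell comprising a major histocompatibility complex (MHC), wherein the complex comprises an α chain and a β chain, wherein
    the α chain is linked to a first protein-binding domain to constitute a first fusion protein,
    the β chain is linked to a second protein-binding domain to constitute a second fusion protein,
    the first fusion protein binds the second fusion protein to form the complex,
    and the first protein-binding domain and the second protein-binding domain enhance binding of the α chain and the β chain to form the MHC.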
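  29. The cell according to claim 28, wherein the cell is a yeast cell.
  30. The cell according to claim 28, wherein the complex is bound to the surface of the cell by linking the α chain or the β chain to a molecule on the cell surface.
  31. The cell according to claim 30, wherein the other chain of the complex (the α chain or the β chain) is bound to the molecule on the cell surface by non-covalently binding the β chain or the α chain, respectively.
  32. The cell according to claim 30, wherein the molecule is a protein.
  33. The cell according to claim 32, wherein the protein is endogenous to the cell.
  34. The cell according to claim 32, wherein the protein is Aga2p, a-agglutinin, α-agglutinin, flocculin, Cwp1p, Cwp2p or Tip1p.
  35. The cell according to claim 32, wherein the protein is Aga2p.
  36. The cell according to claim 35, wherein the amino acid sequence of the Aga2p is shown in SEQ ID NO:1.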
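  37. The cell according to claim 28, wherein the first protein-binding domain non-covalently binds the second protein-binding domain; or
    the first protein-binding domain covalently binds the second protein-binding domain, the covalent binding preferably being via a disulfide bond.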
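  38. The cell according to claim 28, wherein
    the first protein-binding domain and the second protein-binding domain form an FcAB domain;
    preferably the first protein-binding domain is an FcA domain or an FcB domain, and the second protein-binding domain is an FcB domain or an FcA domain, wherein
    further preferably the FcA domain and the FcB domain are capable of increasing the expression/display level and/or copy number of the complex.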
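  39. The cell according to claim 38, wherein
    the FcA domain comprises only a first CH3 domain and the FcB domain comprises only a second CH3 domain; or
    the FcA domain comprises both a first CH2 domain and a first CH3 domain, and the FcB domain comprises both a second CH2 domain and a second CH3 domain;
    further preferably the first CH2 domain and the first CH3 domain of the first protein-binding domain are connected by a linker, and the second CH2 domain and the second CH3 domain of the second protein-binding domain are connected by a linker.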
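  40. The cell according to claim 39, wherein
    the FcA domain and the FcB domain are each independently selected from any one of the amino acid sequences shown in SEQ ID NO:11-19, or have at least 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% sequence identity to an amino acid sequence shown in SEQ ID NO:11-19.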
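  41. The cell according to claim 39 or 40, wherein either or both of the FcA domain and the FcB domain comprise one or more amino acid modifications,
    the modifications achieving any of the following:
    enhancing or stabilizing the FcA domain and/or the FcB domain itself ("FcS"),
    enhancing or stabilizing covalent or non-covalent binding between the FcA domain and the FcB domain ("FcM"),
    reducing or avoiding mispairing of the α chain and β chain caused by mispairing of the FcA domain and the FcB domain ("FcN").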
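  42. The cell according to any one of claims 28 to 41, wherein the first protein-binding domain is linked to the C terminus of the α chain and the second protein-binding domain is linked to the C terminus of the β chain; preferably the signal peptide directing expression of the first fusion protein and the second fusion protein is SPP SP or AGA2 SP; further preferably the amino acid sequence of the AGA2 SP is shown in SEQ ID NO:39 and the amino acid sequence of the SPP SP is shown in SEQ ID NO:45.
  43. The cell according to any one of claims 28 to 42, wherein the complex is an MHC class I molecule or an MHC class II molecule, preferably an HLA class I molecule or an HLA class II molecule.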
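  44. The cell according to claim 43, wherein
    the α chain is encoded by HLA-DRA*01 or another allele of the HLA-DRA family, and the β chain is encoded by HLA-DRB1*01, HLA-DRB1*03, HLA-DRB1*04, HLA-DRB1*15 or another allele of the HLA-DRB family; or
    the α chain is encoded by HLA-DQA1*01, HLA-DQA1*03, HLA-DQA1*05 or another allele of the HLA-DQA1 family, and the β chain is encoded by HLA-DQB1*02, HLA-DQB1*03, HLA-DQB1*05, HLA-DQB1*06 or another allele of the HLA-DQB1 family; or
    the α chain is encoded by HLA-DPA1*01:03, HLA-DPA1*02:02 or another allele of the HLA-DPA1 family, and the β chain is encoded by HLA-DPB1*01:01, HLA-DPB1*02:01, HLA-DPB1*04:01, HLA-DPB1*04:02 or another allele of the HLA-DPB1 family.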
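  45. A nucleic acid encoding a major histocompatibility complex (MHC), wherein the complex comprises an α chain and a β chain, the α chain being linked to a first protein-binding domain and the β chain being linked to a second protein-binding domain, wherein the first protein-binding domain and the second protein-binding domain are capable of binding to form the complex, and the first and second protein-binding domains form an FcAB dimer to enhance binding of the α chain and the β chain to form the MHC.
  46. The nucleic acid according to claim 45, wherein the encoded major histocompatibility complex is the major histocompatibility complex (MHC) referred to in any one of claims 1 to 17.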
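  47. A method of preparing a major histocompatibility complex (MHC), wherein the method comprises:
    transforming a cell with the nucleic acid according to claim 45 or 46,
    and culturing the cell under conditions in which the complex referred to in any one of claims 28 to 44 is expressed.
  48. The method according to claim 47, wherein the cell is a yeast cell.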
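  49. A method of identifying a peptide that binds a major histocompatibility complex (MHC), wherein the method comprises:
    i) contacting the cell according to any one of claims 28 to 44 with a peptide,
    and ii) detecting binding of the peptide to the complex referred to in any one of claims 28 to 44, thereby identifying a peptide that binds the complex.
  50. The method according to claim 49, wherein the peptide comprises a mixture of peptides.
  51. The method according to claim 49 or 50, wherein the peptide competes with a reference peptide for binding to the complex.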
PCT/CN2024/107278 2023-07-24 2024-07-24 Engineered cells for developing MHC neoantigens Pending WO2025021112A1 (zh)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN202310913088 2023-07-24
CN202310913088.5 2023-07-24
CN202310943938.6 2023-07-28
CN202310943938 2023-07-28

Publications (1)

Publication Number Publication Date
WO2025021112A1 true WO2025021112A1 (zh) 2025-01-30

Family

ID=94374197

Country Status (1)

Country Link
WO (1) WO2025021112A1 (zh)

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 24844803

Country of ref document: EP

Kind code of ref document: A1