
WO2025236237A1 - Dual-restriction-region nanopore protein complex and its applications - Google Patents

Dual-restriction-region nanopore protein complex and its applications

Info

Publication number
WO2025236237A1
Authority
WO
WIPO (PCT)
Prior art keywords
protein
dual
nanoporous
hfab
restriction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/CN2024/093629
Other languages
English (en)
French (fr)
Inventor
贝伟伟
王阳芷
邵雪雪
张满丰
张雪梅
黄亿华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Polyseq Biotech Co Ltd
Institute of Biophysics of CAS
Original Assignee
Beijing Polyseq Biotech Co Ltd
Institute of Biophysics of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Polyseq Biotech Co Ltd and Institute of Biophysics of CAS
Priority to PCT/CN2024/093629
Publication of WO2025236237A1
Legal status: Pending

Classifications

    • C: CHEMISTRY; METALLURGY
    • C07: ORGANIC CHEMISTRY
    • C07K: PEPTIDES
    • C07K14/00: Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof
    • C07K14/195: Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from bacteria
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869: Methods for sequencing

Definitions

  • This invention relates to the field of biotechnology, and in particular to a dual-restricted-region nanoporous protein complex and its applications.
  • Nanopore sequencing, with its advantages of real-time processing, portability, potentially low cost, ultra-long read lengths, and the ability to detect epigenetic modifications on nucleic acids and perform direct RNA sequencing, demonstrates significant superiority as a fourth-generation sequencing technology and represents the future direction of nucleic acid sequencing.
  • The principle of nanopore sequencing involves using an electric field to drive single-stranded nucleic acids sequentially through nanoscale pores embedded in an insulating artificial membrane.
  • Compared to next-generation sequencing, nanopore sequencing currently suffers from lower accuracy, particularly in its ability to distinguish homopolymers.
  • Nanopore proteins are key factors determining the accuracy of nanopore sequencing, and the modification and optimization of single-restriction nanopore proteins can improve sequencing current properties and thus increase accuracy to some extent.
  • Dual-restriction nanopore protein complexes offer a new approach to improving nanopore sequencing accuracy, potentially further enhancing its ability to distinguish homopolymers.
  • dual-restriction nanopore protein complexes are scarce.
  • This invention addresses the problem of the scarcity of dual-restriction nanoporous protein complexes and the low accuracy of nanopore sequencing by providing a novel dual-restriction nanoporous protein complex through in vitro assembly, thereby enriching the variety of nanoporous proteins.
  • the present invention provides a dual-restriction nanoporous protein complex comprising HfaB protein and truncated HfaA protein, wherein the truncated HfaA protein is linked to the HfaB protein and forms a restriction region in the nanoporous protein complex.
  • the truncated HfaA protein is any of the following polypeptides: (a1) a polypeptide with an amino acid sequence representing the N-terminal 23-35 amino acids (e.g., 23, 27, 31, or 35) of the HfaA protein; (a2) a polypeptide with the same function obtained by substituting and/or deleting and/or adding one or more amino acids to the amino acid sequence representing the N-terminal 23-35 amino acids (e.g., 23, 27, 31, or 35) of the HfaA protein; (a3) a polypeptide with more than 80% identity to the amino acid sequence defined in (a1)-(a2) and having the same function; (a4) a fusion polypeptide obtained by attaching a tag to the end of any of the polypeptides defined in (a1)-(a3).
  • the amino acid sequence of the HfaA protein has the GenBank accession number SAMN05880561_102762.
  • the truncated HfaA protein amino acid sequence is shown in SEQ ID NO:37.
  • the truncated HfaA protein is inserted into the cavity of the HfaB protein.
  • the HfaB protein comprises nine HfaB protein monomers.
  • the ratio of the HfaB protein monomer to the truncated HfaA protein in the nanoporous protein is 1:1.
  • the HfaB protein monomer is any of the following: (b1) a protein monomer with the amino acid sequence shown in SEQ ID NO: 1; (b2) a protein monomer with the same function obtained by substituting and/or deleting and/or adding one or more amino acid residues of the amino acid sequence shown in SEQ ID NO: 1; (b3) a protein monomer with more than 80% identity to the amino acid sequence defined in (b1)-(b2) and with the same function; (b4) a fusion protein monomer obtained by attaching a tag to the end of any of the protein monomers defined in (b1)-(b3).
  • the protein monomer (b2) comprises at least one of the following substitutions: the serine at position 79 is replaced by asparagine, tryptophan, isoleucine, alanine, valine, leucine, tyrosine, glutamic acid, or lysine; and the glutamic acid at position 80 is replaced by glutamine, asparagine, serine, tryptophan, alanine, or isoleucine.
  • amino acid sequences of the protein monomers described in (b2) are as shown in SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:7, SEQ ID NO:9, SEQ ID NO:11, SEQ ID NO:13, SEQ ID NO:15, SEQ ID NO:17, SEQ ID NO:19, SEQ ID NO:21, SEQ ID NO:23, SEQ ID NO:25, SEQ ID NO:27, SEQ ID NO:29, SEQ ID NO:31, SEQ ID NO:33 and/or SEQ ID NO:35.
  • connection can be covalent or non-covalent.
  • each truncated HfaA protein can be linked to multiple HfaB protein monomers to improve the stability of the dual-restriction nanoporous protein complex.
  • each truncated HfaA protein is linked to three adjacent HfaB protein monomers.
  • connection is made by means of at least one amino acid residue at positions 81-242 of the HfaB protein monomer.
  • the HfaB protein monomer and the truncated HfaA protein are linked non-covalently via residues at positions corresponding to one or more of the following pairs of positions: Glu5 and Arg184, Arg7 and Glu81, Arg12 and Asp233, Glu16 and Lys235, Arg17 and Asp233, and Arg17 and Glu173, respectively.
  • the HfaB protein monomer and the truncated HfaA protein are also linked non-covalently via residues at positions corresponding to one or more of the following pairs of positions on the truncated HfaA protein and HfaB protein monomer, respectively: Asn1 and Gln241, Asn1 and Val215, Asn1 and Val188, Asn1 and Ala187, Asn1 and Glu242, Tyr9 and Ser169, Tyr9 and Gly171, Phe11 and Phe220, Phe11 and Ile172, Phe11 and Glu173, Phe11 and Ala183, Arg12 and Ser222, Arg12 and Glu173, Leu20 and Phe224, Glu24 and Phe226, Thr26 and Phe226, Leu30 and Phe226, Leu30 and Asp229, Gln31 and Asp229.
  • the structure of the above-described dual-restricted nanoporous protein complex is shown in Figure 1, comprising: (1) an HfaB protein, the HfaB protein comprising a first opening, an intermediate segment, a second opening, and an inner cavity extending from the first opening through the intermediate segment to the second opening, wherein the inner surface of the intermediate segment defines a restriction region (i.e., the first restriction region in the figure); and (2) a plurality of truncated HfaA proteins, each truncated HfaA protein containing an HfaB binding region, wherein the plurality of truncated HfaA proteins form another restriction region (i.e., the second restriction region in the figure) within the intermediate segment of the HfaB protein, and the two restriction regions are coaxially spaced apart within the intermediate segment of the HfaB protein.
  • polypeptides or protein monomers can be obtained by first synthesizing their encoding genes and then expressing them biologically, or they can be synthesized artificially through complete chemical processes.
  • the tag may refer to a polypeptide or protein expressed by fusion with a target protein using in vitro DNA recombination technology, to facilitate the expression, detection, tracing, and/or purification of the target protein.
  • the protein tag may be a Strep-TagII tag, Flag tag, His tag, MBP tag, HA tag, myc tag, GST tag, and/or SUMO tag, etc.
  • identity refers to the identity of the amino acid sequences.
  • the identity of amino acid sequences can be determined using homology search sites on the Internet, such as the BLAST page on the NCBI homepage. For example, in Advanced BLAST 2.1, using blastp as the program, setting the Expect value to 10, setting all filters to OFF, using BLOSUM62 as the matrix, setting the Gap existence cost, Per residue gap cost, and Lambda ratio to 11, 1, and 0.85 (default values) respectively, and performing an identity search on a pair of amino acid sequences, the identity value (%) can then be obtained.
  • the 80% or more identity can be at least 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identity.
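As a rough illustration of what a percent-identity figure means, the following Python sketch computes identity for a pair of sequences that have already been aligned. It is not BLAST and does not perform the alignment itself. The function names `percent_identity` and `meets_identity_threshold`, the gap character `-`, and the toy sequences in the usage note are assumptions made for this example. Gap columns are counted in the denominator, which is one common convention; other tools may count differently.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns holding the same residue in both sequences.

    Both inputs must already be aligned (equal length, '-' marks a gap).
    Columns containing a gap count as non-identical.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x == y and x != "-"
    )
    return 100.0 * matches / len(aligned_a)


def meets_identity_threshold(aligned_a: str, aligned_b: str,
                             threshold: float = 80.0) -> bool:
    """True when identity is at least `threshold` percent (80% by default)."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

For example, `percent_identity("ACDEFGHIKL", "ACDEFGHIKV")` compares ten columns of a made-up alignment, nine of which match.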
  • the aforementioned dual-restriction nanoporous protein complexes may specifically be H4E-P6, H5C-P6, H5D-P6, H4G-P6, H4K-P6, or P1A-P6 prepared in the following examples.
  • the present invention also provides a method for generating the above-described dual-restriction nanoporous protein complex, the method comprising: A1 or A2 as follows: A1 co-expressing one or more HfaB protein monomers and a truncated HfaA protein in a host cell, thereby allowing the formation of the dual-restriction nanoporous protein complex in the cell; A2 contacting one or more HfaB protein monomers with the truncated HfaA protein, thereby allowing the formation of the dual-restriction nanoporous protein complex in vitro.
  • the molar ratio of HfaB protein monomer to truncated HfaA protein in A2 is 1:2.
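The stoichiometry stated above can be sketched as simple arithmetic. The three constants come from the text (nine HfaB monomers per pore, a 1:1 monomer-to-peptide ratio in the assembled complex, and a 1:2 molar ratio for in vitro assembly); the function names and the assumption that every monomer ends up paired with exactly one peptide are illustrative only.

```python
HFAB_MONOMERS_PER_PORE = 9         # from the text: nine HfaB monomers per pore
PEPTIDES_PER_MONOMER_IN_PORE = 1   # from the text: 1:1 ratio in the complex
PEPTIDES_PER_MONOMER_IN_MIX = 2    # from the text: 1:2 molar ratio in step A2


def peptides_per_pore() -> int:
    """Number of truncated HfaA peptides in one assembled complex."""
    return HFAB_MONOMERS_PER_PORE * PEPTIDES_PER_MONOMER_IN_PORE


def peptide_nmol_for_assembly(hfab_monomer_nmol: float) -> float:
    """Peptide amount to mix with a given amount of HfaB monomer (1:2 ratio)."""
    return hfab_monomer_nmol * PEPTIDES_PER_MONOMER_IN_MIX


def peptide_excess_nmol(hfab_monomer_nmol: float) -> float:
    """Peptide left over if every monomer binds exactly one peptide (assumption)."""
    incorporated = hfab_monomer_nmol * PEPTIDES_PER_MONOMER_IN_PORE
    return peptide_nmol_for_assembly(hfab_monomer_nmol) - incorporated
```

Under these assumptions, mixing 9 nmol of HfaB monomer calls for 18 nmol of peptide, of which at most 9 nmol would be incorporated.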
  • biomaterials related to the aforementioned dual-restriction nanoporous protein complex also fall within the scope of protection of this invention.
  • the related biomaterials are any one of the following: a1) a nucleic acid molecule encoding the aforementioned dual-restriction nanoporous protein complex; a2) an expression cassette containing the nucleic acid molecule described in a1); a3) a recombinant vector containing the nucleic acid molecule described in a1), or a recombinant vector containing the expression cassette described in a2); a4) recombinant cells containing the nucleic acid molecule described in a1), or recombinant cells containing the expression cassette described in a2), or recombinant cells containing the recombinant vector described in a3).
  • the nucleic acid molecule can be DNA, such as cDNA, genomic DNA or recombinant DNA; the nucleic acid molecule can also be RNA, such as mRNA, siRNA, shRNA, sgRNA, miRNA or antisense RNA.
  • the expression cassette refers to DNA capable of expressing genes in host cells.
  • This DNA may include not only promoters that initiate gene transcription but also terminators that terminate gene transcription.
  • the expression cassette may also include enhancer sequences.
  • the use of the above-mentioned dual-restriction nanoporous protein complex and related biomaterials in detecting the presence, absence, or one or more features of a target analyte, or in preparing products for such detection, also falls within the scope of protection of this invention.
  • the present invention also provides a method for determining the presence, absence, or one or more characteristics of a target analyte, the method comprising: A. contacting the target analyte with the aforementioned dual-restriction nanoporin complex, causing the target analyte to move relative to the dual-restriction nanoporin complex; B. acquiring one or more measurements as the target analyte moves relative to the dual-restriction nanoporin complex, thereby determining the presence, absence, or one or more characteristics of the target analyte.
  • the present invention also provides a kit or apparatus for determining the presence, absence, or one or more characteristics of a target analyte.
  • the kit comprises the aforementioned dual-restriction nanoporous protein complex or related biomaterials, and a membrane.
  • the apparatus comprises the aforementioned dual-restriction nanoporous protein complex, and a membrane.
  • the membrane and the dual-restricted-region nanoporin complex can be packaged independently, or the dual-restricted-region nanoporin complex can be embedded in the membrane.
  • the membrane can be any membrane existing in the prior art, preferably a lipid bilayer.
  • the membrane is a lipid bilayer formed by the self-assembly of block copolymers/phospholipid molecules.
  • Rate-controlling proteins may include one or more of nucleic acid-binding proteins, helicases, exonucleases, telomerases, topoisomerases, transcriptases, translocases, and/or polymerases.
  • the helicase is selected from Hel308 family helicases and modified Hel308 family helicases, RecD helicase and its variants, TrwC helicase and its variants, Dda helicase and its variants, TraI Eco and its variants, XPD Mbu and its variants, Pif1 helicase and its variants.
  • the target analyte is one or more of nucleotides, nucleic acids, amino acids, oligopeptides, polypeptides, and proteins.
  • the one or more features are selected from at least one of (i) the length of the target analyte; (ii) the identity of the target analyte; (iii) the sequence of the target analyte; (iv) the secondary structure of the target analyte; and (v) whether the target analyte is modified.
  • Identity refers to the similarity between sequences. Identity can be evaluated visually or by computer software. Using computer software, the identity between two or more sequences can be expressed as a percentage (%), which can be used to evaluate the identity between related sequences.
  • the nucleic acid can be naturally occurring or artificially synthesized.
  • the nucleic acid can be natural DNA, RNA, or modified DNA or RNA, or it can be artificially synthesized nucleic acid, such as peptide nucleic acid (PNA), glycerol nucleic acid (GNA), threose nucleic acid (TNA), locked nucleic acid (LNA), or other synthetic polymers with nucleoside side chains.
  • the nucleic acid is single-stranded, double-stranded, or at least partially double-stranded.
  • the nucleic acid can be of any length.
  • the length of the nucleic acid can be at least 10, at least 50, at least 100, at least 150, at least 200, at least 250, at least 300, at least 400 or at least 500 nucleotides or nucleotide pairs, or it can be 1000 or more nucleotides or nucleotide pairs, 5000 or more nucleotides or nucleotide pairs, or 100000 or more nucleotides or nucleotide pairs.
  • one or more nucleotides in the nucleic acid may be modified, such as methylated, oxidized, damaged, abasic, protein-labeled, tagged, or linked to a spacer within the polynucleotide sequence.
  • the protein or nucleic acid may be composed of the sequence, or may have additional amino acids or nucleotides at one or both ends of the protein or nucleic acid, but still have the activity described in this invention.
  • the amino acids and their three-letter and one-letter abbreviations are as follows: Histidine (His, H); Serine (Ser, S); Glutamic acid (Glu, E); Glutamine (Gln, Q); Glycine (Gly, G); Threonine (Thr, T); Phenylalanine (Phe, F); Aspartic acid (Asp, D); Tyrosine (Tyr, Y); Leucine (Leu, L); Isoleucine (Ile, I); Arginine (Arg, R); Alanine (Ala, A); Valine (Val, V); Tryptophan (Trp, W); Methionine (Met, M); Asparagine (Asn, N); Cysteine (Cys, C); Lysine (Lys, K); Proline (Pro, P). Standard substitution notation is also used; for example, E80Q means that the E at position 80 of the sequence is replaced by Q.
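The substitution notation can be made concrete with a short Python sketch that parses a string such as E80Q and applies it to a sequence using 1-based positions. The function name `apply_substitution` and the five-residue toy sequence in the usage note are hypothetical; the actual sequences (for example SEQ ID NO: 1) are not reproduced here.

```python
import re

# One-letter original residue, 1-based position, one-letter replacement.
_SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_substitution(sequence: str, notation: str) -> str:
    """Return `sequence` with the substitution described by `notation` applied.

    Raises ValueError if the notation is malformed, the position is out of
    range, or the residue at that position is not the expected original.
    """
    match = _SUBSTITUTION.match(notation)
    if match is None:
        raise ValueError(f"not a substitution in the form E80Q: {notation!r}")
    original, position, replacement = (
        match.group(1), int(match.group(2)), match.group(3)
    )
    if not 1 <= position <= len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    if sequence[position - 1] != original:
        raise ValueError(
            f"expected {original} at position {position}, "
            f"found {sequence[position - 1]}"
        )
    return sequence[:position - 1] + replacement + sequence[position:]
```

With the toy sequence `"MKSEA"`, `apply_substitution("MKSEA", "E4Q")` replaces the E at position 4 with Q. Checking the original residue before substituting guards against off-by-one numbering mistakes.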
  • the restriction region (also referred to as the contraction region) refers to an opening defined by the inner lumen surface of a pore or pore complex, which serves to allow ions and target analytes (e.g., but not limited to polynucleotides or single nucleotides) to pass through the pore complex channel.
  • the restriction region is the narrowest opening within the pore or pore complex.
  • HfaB protein is a novel single-restriction-region nonameric nanopore protein with only 25.1% amino acid sequence similarity to CsgG protein.
  • HfaA protein also has only 28% amino acid sequence similarity to CsgF protein.
  • This invention directly synthesizes the P6 polypeptide (a polypeptide composed of the N-terminal 35 amino acids of the mature HfaA protein), and obtains a novel HfaB-P6 dual-restriction nanoporin complex via in vitro assembly.
  • the structure of the H4G-P6 dual-restriction nanoporin complex was then determined.
  • the dual-restriction nanoporin complex exhibits greater stability in both sequencing current and composition compared to the pre-assembly single-restriction nanoporin oligomer state.
  • Mutation modification of the HfaB protein addresses the issue of low DNA sample capture efficiency of the nanoporin complex, thereby improving sequencing current properties.
  • This invention addresses the scarcity of dual-restriction nanoporin complexes and the resulting low accuracy in nanopore sequencing.
  • a novel dual-restriction nanoporin complex has been developed, enriching the variety of nanoporins.
  • the dual-restriction nanoporin complex exhibits greater stability in both oligomer and sequencing current compared to the pre-assembly single-restriction nanoporin oligomers.
  • Mutation modification has resolved the issue of low DNA sample capture efficiency in dual-restriction nanoporin complexes, improving sequencing current properties, with the P1A-P6 nanoporin complex being the preferred choice.
  • Figure 1 is a schematic diagram of the HfaB-P6 dual-restriction region nanoporous protein complex.
  • Figure 2 shows an SDS-PAGE gel image of wild-type HfaB protein.
  • Figure 3 shows the SDS-PAGE gel image of the H4G/H4H/H4I/H4J/H4K/H4L mutant proteins.
  • Figure 4 shows the SDS-PAGE gel image of the H4C/H4D/H4E/H4F/H5C/H5D/H5E/H5F/H5G/H5H mutant proteins.
  • Figure 5 shows the current signal of wild-type HfaB nanopores.
  • Figure 6A shows the pore current signal of the nanopores in the H4G/H4H/H4I/H4J/H4K/H4L mutants.
  • Figure 6B shows the through-pore current signal of the H4G/H4H/H4I/H4J/H4K/H4L mutant nanopores.
  • Figure 7A shows the pore current signal of the nanopores of the H4C/H4E/H5C/H5D/H5E/H5F/H5G/H5H mutants.
  • Figure 7B shows the through-pore current signal of the H4C/H4E/H5C/H5D/H5E/H5F/H5G/H5H mutant nanopores.
  • Figure 8 shows the current signal of the H4E-P6 dual-restriction nanoporous protein complex.
  • Figure 9 shows the current signal of the H5C-P6 and H5D-P6 dual-restriction nanoporous protein complex.
  • Figure 10 shows the current signal of the H4G-P6 and H4K-P6 dual-restriction nanoporous protein complex.
  • Figure 11 shows the size-exclusion (molecular sieve) chromatogram of the H4G mutant.
  • Figure 12 shows the SDS-PAGE gel images of H4G and H4G-P6 proteins.
  • Figure 13 is a cryo-electron microscopy image of the H4G-P6 dual-restriction region nanoporous protein complex.
  • Figure 14 is a cryo-electron microscopy density map of the H4G-P6 dual-restriction region nanoporous protein complex.
  • Figure 15 is a schematic diagram of the structure of the H4G-P6 dual-restriction nanoporous protein complex.
  • Figure 16 shows the conformation of the dual confinement region of the H4G-P6 nanoporous protein complex.
  • Figure 17 shows the structure of the P6 polypeptide and its interaction sites with H4G.
  • Figure 18 shows the current signal of the P1A-P6 dual-restriction region nanoporous protein complex.
  • Wild-type HfaB protein was derived from Rhizobium sp. RU33A (Uniprot ACCESSION: A0A1N6RVG5_9HYPH).
  • the gene encoding the HfaB protein was synthesized and codon-optimized for expression in *E. coli*.
  • a Strep-TagII tag was added to the C-terminus of the protein.
  • Forward and reverse primers (HfaB-F and HfaB-R, as shown in Table 1) were designed, and the target gene was amplified by PCR.
  • the amplification product was ligated into the pQlink vector via a seamless cloning reaction to obtain the pQlink-HfaB Strep vector.
  • the pQlink-HfaB Strep vector contains the HfaB gene (sequence shown in SEQ ID NO: 2) and expresses the wild-type HfaB protein with the Strep-TagII tag.
  • the wild-type HfaB protein sequence is shown in SEQ ID NO: 1.
  • Ser79 at the narrowest point of the HfaB nanopore protein restriction region and the negatively charged Glu80 are key amino acids that determine the sequencing current properties of HfaB single-restriction nanopores. Mutation analysis was performed on Ser79 and Glu80.
  • Mutant vectors were constructed using single-point mutation PCR. Using the pQlink-HfaB Strep vector as a template, forward and reverse primers were designed (primers for the corresponding mutant vectors are shown in Table 1). PCR amplification of the pQlink-HfaB Strep vector yielded the corresponding PCR products. The PCR products were digested with DpnI enzyme (NEB, catalog number: R0176L), and the digested PCR products were transformed into DH5α competent cells for positive clone screening. Single colonies were picked for sequencing, and plasmids extracted using the kit were stored at -20°C for later use.
  • H4G, H4H, H4I, H4J, H4K, H4L, H4C, H4D, H4E, H4F, H5C, H5D, H5E, H5F, H5G, and H5H mutant vectors were prepared using the aforementioned method.
  • the H4G mutant vector contains the H4G mutant encoding gene (sequence shown in SEQ ID NO: 4) and expresses the H4G mutant with a Strep-TagII tag (sequence shown in SEQ ID NO: 3).
  • the only difference between the H4G mutant and the wild-type HfaB protein is the E80Q mutation.
  • the H4H mutant vector contains the H4H mutant coding gene (sequence shown in SEQ ID NO: 6) and expresses the H4H mutant with a Strep-TagII tag (sequence shown in SEQ ID NO: 5).
  • the only difference between the H4H mutant and the wild-type HfaB protein is the E80N mutation.
  • the H4I mutant vector contains the H4I mutant encoding gene (sequence shown in SEQ ID NO: 8) and expresses the H4I mutant with a Strep-TagII tag (sequence shown in SEQ ID NO: 7).
  • the only difference between the H4I mutant and the wild-type HfaB protein is the E80S mutation.
  • the H4J mutant vector contains the H4J mutant encoding gene (sequence shown in SEQ ID NO: 10) and expresses the H4J mutant with a Strep-TagII tag (sequence shown in SEQ ID NO: 9).
  • the only difference between the H4J mutant and the wild-type HfaB protein is the E80W mutation.
  • the H4K mutant vector contains the H4K mutant encoding gene (sequence shown in SEQ ID NO: 12) and expresses the H4K mutant with a Strep-TagII tag (sequence shown in SEQ ID NO: 11).
  • the only difference between the H4K mutant and the wild-type HfaB protein is the E80A mutation.
  • the H4L mutant vector contains the H4L mutant encoding gene (sequence shown in SEQ ID NO: 14) and expresses the H4L mutant with a Strep-TagII tag (sequence shown in SEQ ID NO: 13).
  • the only difference between the H4L mutant and the wild-type HfaB protein is the E80I mutation.
  • the H4C mutant vector contains the H4C mutant encoding gene (sequence shown in SEQ ID NO: 16) and expresses the H4C mutant with a Strep-TagII tag (sequence shown in SEQ ID NO: 15).
  • the only difference between the H4C mutant and the wild-type HfaB protein is the S79N mutation.
  • the H4D mutant vector contains the H4D mutant encoding gene (sequence shown in SEQ ID NO: 18) and expresses the H4D mutant with a Strep-TagII tag (sequence shown in SEQ ID NO: 17).
  • the only difference between the H4D mutant and the wild-type HfaB protein is the S79W mutation.
  • the H4E mutant vector contains the H4E mutant encoding gene (sequence shown in SEQ ID NO: 20) and expresses the H4E mutant with a Strep-TagII tag (sequence shown in SEQ ID NO: 19).
  • the only difference between the H4E mutant and the wild-type HfaB protein is the S79I mutation.
  • the H4F mutant vector contains the H4F mutant encoding gene (sequence shown in SEQ ID NO: 22) and expresses the H4F mutant with a Strep-TagII tag (sequence shown in SEQ ID NO: 21).
  • the only difference between the H4F mutant and the wild-type HfaB protein is the S79A mutation.
  • the H5C mutant vector contains the H5C mutant encoding gene (sequence shown in SEQ ID NO: 24) and expresses the H5C mutant with a Strep-TagII tag (sequence shown in SEQ ID NO: 23).
  • the only difference between the H5C mutant and the wild-type HfaB protein is the S79V mutation.
  • the H5D mutant vector contains the H5D mutant coding gene (sequence shown in SEQ ID NO: 26) and expresses the H5D mutant with a Strep-TagII tag (sequence shown in SEQ ID NO: 25).
  • the only difference between the H5D mutant and the wild-type HfaB protein is the S79L mutation.
  • the H5E mutant vector contains the H5E mutant coding gene (sequence shown in SEQ ID NO: 28) and expresses the H5E mutant with a Strep-TagII tag (sequence shown in SEQ ID NO: 27).
  • the only difference between the H5E mutant and the wild-type HfaB protein is the S79Q mutation.
  • the H5F mutant vector contains the H5F mutant coding gene (sequence shown in SEQ ID NO: 30) and expresses the H5F mutant with a Strep-TagII tag (sequence shown in SEQ ID NO: 29).
  • the only difference between the H5F mutant and the wild-type HfaB protein is the S79Y mutation.
  • the H5G mutant vector contains the H5G mutant encoding gene (sequence shown in SEQ ID NO: 32) and expresses the H5G mutant with a Strep-TagII tag (sequence shown in SEQ ID NO: 31).
  • the only difference between the H5G mutant and the wild-type HfaB protein is the S79E mutation.
  • the H5H mutant vector contains the H5H mutant coding gene (sequence shown in SEQ ID NO: 34) and expresses the H5H mutant with a Strep-TagII tag (sequence shown in SEQ ID NO: 33).
  • the only difference between the H5H mutant and the wild-type HfaB protein is the S79K mutation.
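The sixteen single-point mutants listed above can be collected into a lookup table, which makes it easy to see which names correspond to changes at Ser79 and which to changes at Glu80. The mutant names and substitutions are taken from the text; the dictionary name `HFAB_MUTANTS` and the helper `mutants_at` are assumptions introduced for this sketch.

```python
# Mutant name -> single substitution relative to wild-type HfaB (from the text).
HFAB_MUTANTS = {
    "H4G": "E80Q", "H4H": "E80N", "H4I": "E80S",
    "H4J": "E80W", "H4K": "E80A", "H4L": "E80I",
    "H4C": "S79N", "H4D": "S79W", "H4E": "S79I", "H4F": "S79A",
    "H5C": "S79V", "H5D": "S79L", "H5E": "S79Q", "H5F": "S79Y",
    "H5G": "S79E", "H5H": "S79K",
}


def mutants_at(position: int) -> list:
    """Names of the mutants whose substitution is at the given position."""
    return sorted(
        name
        for name, substitution in HFAB_MUTANTS.items()
        if int(substitution[1:-1]) == position  # digits between the two letters
    )
```

Calling `mutants_at(80)` returns the six Glu80 mutants and `mutants_at(79)` the ten Ser79 mutants.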
  • (1) Bacterial expansion culture and induction of expression.
  • the vector prepared in Example 1 was transferred into OMP8 competent cells (OMP8 is disclosed in Coupling site-directed mutagenesis with high-level expression: large scale production of mutant porins from E. coli (2016), specifically BL21(DE3)omp8, which is incorporated herein by reference).
  • The seed culture was incubated overnight at 37°C and 200 rpm. 1 mL of the seed culture was inoculated into 1 L of LB medium for expansion culture. When the OD600 reached 1, the temperature was lowered to 26°C and expression was induced overnight with 0.2 mM IPTG (isopropyl β-D-1-thiogalactopyranoside).
  • (2) Collection of cell membranes. Bacterial cells were collected by centrifugation at 4000 rpm.
  • Each 1 L of cells was resuspended in 20 mL of lysis buffer and sonicated for 2 min. The lysate was centrifuged at 18000 rpm and 4°C for 1 h to collect the cell membranes.
  • (3) Solubilization of the cell membrane. The membrane components were resuspended in a glass homogenizer using membrane solubilization buffer (15 mL of buffer was used to resuspend the cell membrane from every 1 L of bacteria). The membrane was thoroughly solubilized by magnetic stirring at 4°C for 1 h, so that the membrane proteins were fully extracted from the cell membrane by the detergent. The membrane protein components were collected by centrifugation at 18000 rpm and 4°C for 1 h.
  • (4) Strep column affinity chromatography.
  • the supernatant was incubated with Strep beads (Streptactin Beads 4FF, brand: Tiandi Renhe, catalog number: SA053250) at 4 °C for 45 min.
  • the mixture of supernatant and beads was loaded onto the column and passed through twice by gravity flow. 10 column volumes of Wash buffer were used to remove non-specifically bound contaminating proteins, and 5 column volumes of Elution buffer were used to elute the target protein. Finally, the proportion and purity of each component were determined by SDS-PAGE gel electrophoresis.
  • Lysis buffer: 20 mM Tris-HCl pH 8.0, 150 mM NaCl.
  • Membrane solubilization buffer: 20 mM Tris-HCl pH 8.0, 150 mM NaCl, 1% LDAO.
  • Wash buffer: 20 mM Tris-HCl pH 8.0, 150 mM NaCl, 0.3% LDAO.
  • Elution buffer: 20 mM Tris-HCl pH 8.0, 150 mM NaCl, 0.1% LDAO, 2.5 mM desthiobiotin.
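The four buffer recipes can be turned into pipetting volumes with the usual dilution relation C1·V1 = C2·V2. The final concentrations below are the ones listed in the text; the stock concentrations in `STOCKS`, the short buffer keys, and the function name `stock_volumes_ml` are hypothetical values chosen for this sketch and would need to be replaced with the stocks actually available.

```python
# Hypothetical stock concentrations (assumptions, not from the source).
# Each stock uses the same unit as the matching final concentration.
STOCKS = {
    "Tris-HCl pH 8.0": 1000.0,  # mM
    "NaCl": 5000.0,             # mM
    "LDAO": 10.0,               # percent
    "desthiobiotin": 100.0,     # mM
}

# Final concentrations as listed in the text.
_BASE = {"Tris-HCl pH 8.0": 20.0, "NaCl": 150.0}
BUFFERS = {
    "lysis": dict(_BASE),
    "membrane": dict(_BASE, LDAO=1.0),
    "wash": dict(_BASE, LDAO=0.3),
    "elution": dict(_BASE, LDAO=0.1, desthiobiotin=2.5),
}


def stock_volumes_ml(buffer_name: str, final_volume_ml: float) -> dict:
    """Volume of each stock (and water) needed for `final_volume_ml` of buffer."""
    recipe = BUFFERS[buffer_name]
    volumes = {
        component: final_volume_ml * target / STOCKS[component]
        for component, target in recipe.items()
    }
    water = final_volume_ml - sum(volumes.values())
    if water < 0:
        raise ValueError("stocks are too dilute for this recipe")
    volumes["water"] = water
    return volumes
```

With these assumed stocks, `stock_volumes_ml("wash", 100.0)` gives roughly 2 mL Tris-HCl, 3 mL NaCl, 3 mL LDAO, and 92 mL water.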
  • the SDS-PAGE gel detection results are shown in Figures 2-4. Wild-type HfaB protein and sixteen mutant proteins can be expressed and purified normally. On the SDS-PAGE gel, when not heated (protein samples placed at 25°C for 10 minutes), there are two states: oligomers and monomers, with most being oligomers. When heated (protein samples heated at 100°C for 10 minutes), all of them become monomers. Therefore, wild-type HfaB protein and sixteen mutant proteins can form stable nanopore channels.
  • an artificial membrane and a single nanoporin system were constructed to test the current passing through the nanoporin.
  • the block copolymers/phospholipids were induced to self-assemble into a lipid bilayer by passing the oil and liquid phases twice through the surface of a microwell support array.
  • the bilayer membrane was then stably preserved in a buffer solution (200 mM KCl, 100 mM K3[Fe(CN)6], 150 mM K4[Fe(CN)6], 25 mM PBS, pH 8.0).
  • nanoporin proteins i.e., the wild-type HfaB protein and sixteen mutant proteins prepared in Example 2
  • excess residual nanopores were removed by passing 2 mL of the above buffer solution through the system.
  • the nanoporin pore current signal was recorded at 150 mV. If the pore current is relatively stable, the DNA sequencing properties of the nanoporin can then be tested.
  • 4 µL of anchoring buffer 50 nM DNA tether, 200 mM KCl, 25 mM PBS, pH 8.0
  • sequencing buffer 500 mM KCl, 30 mM MgCl2, 30 mM ATP, 25 mM PBS, pH 8.0
  • the DNA sample to be tested was assembled with the T4 Dda mutant protein (the T4 Dda mutant protein is a helicase or rate-controlling protein, specifically the T4 Dda-E94C/C109A/C136A/A360C described in patent WO2014135838A1) according to the method recorded in patent WO2014135838A1.
  • the DNA sample sequence is shown in SEQ ID NO: 38.
  • Figure 5 shows the current signal of wild-type HfaB nanoporin, with the upper figure representing the pore current signal and the lower figure representing the DNA sequencing properties.
  • Wild-type HfaB nanoporin exhibits uniform electrophysiological properties, but suffers from unstable current states, spontaneous blockage, and significant spike-like noise. Furthermore, the sequencing signal amplitude is relatively small when single-stranded DNA translocates through the pore's contraction region.
  • wild-type HfaB nanoporin possesses base recognition capabilities and can be used for DNA sequencing, but its overall sequencing performance still needs improvement.
  • FIG. 6A is the pore current signal diagram
  • Figure 6B is the characterization of DNA sequencing properties.
  • the pore current noise of the Glu 80 mutant H4G-L is reduced compared to the wild-type HfaB, but it is still unstable.
  • the amplitude of the transpore signal is small and there is no significant improvement compared to the wild-type HfaB.
  • Figures 7A and 7B show the pore current signal diagrams of H4C, H4E, H5C, H5D, H5E, H5F, H5G, and H5H mutant nanopore proteins.
  • Figure 7A shows the pore current signal diagram
  • Figure 7B shows the DNA sequencing properties.
  • the H4E (S79I) and H5C (S79V) mutants showed significantly improved DNA sequencing properties, with the H4E mutant showing particularly significant improvement, resolving the current instability issue and stabilizing the pore current at 0.25-0.3 nA. Simultaneously, the sequencing accuracy of H4E and H5C was significantly improved, with a marked increase in both sequencing signal amplitude and step number.
  • H4C, H5D, H5E, H5F, H5G, and H5H mutants were not improved; they still exhibited problems such as unstable current states, susceptibility to spontaneous blockage, significant spike-like noise, and small sequencing signal amplitude.
  • H4D and H4F were difficult to reconstitute into block copolymer artificial membranes and could not be used for nanopore sequencing.
  • P6 peptide is a polypeptide consisting of the N-terminal 35 amino acids of the mature HfaA protein, with the sequence shown in SEQ ID NO: 37.
  • the average molecular weight of P6 peptide (95% purity) is 4186.58 g/mol, and the arithmetic mean of hydrophobicity is -1.16, making it a hydrophilic peptide.
  • P6 peptide (95% purity) was synthesized and dissolved in ddH2O to a final concentration of 0.5 mg/ml.
  • Nanoporins H4E, H5C, H5D, H4G, and H4K prepared in Example 2 were incubated with P6 peptide at a molar ratio of 1:2 (nanoporin:peptide) overnight to obtain a protein mixture.
  • the protein mixture was concentrated and excess peptide was removed to obtain a dual-restriction nanoporin complex.
  • H4E-P6, H5C-P6, H5D-P6, H4G-P6, and H4K-P6 dual-restriction nanoporin complexes were prepared.
  • the DNA sequencing results of the H4E-P6 dual-restriction nanoporin complex are shown in Figure 8; the results of the H5C-P6 and H5D-P6 dual-restriction nanoporin complexes are shown in Figure 9; and the results of the H4G-P6 and H4K-P6 dual-restriction nanoporin complexes are shown in Figure 10.
  • the results show that the P6 peptides can stably assemble into H4E, H5C, H5D, H4G, or H4K mutants to form dual-restriction nanoporin complexes.
  • the dual-restriction nanoporin complexes exhibit significantly smaller pore currents, significantly improved stability, and significantly reduced fluctuations in current size.
  • H4E-P6, H5C-P6, and H5D-P6 show extremely low DNA sample capture efficiency. For example, with H4E-P6 it took approximately 40 minutes to observe a single DNA translocation signal, and that signal was noisy; with H5C-P6 and H5D-P6 no DNA translocation signal was observed within a short period.
  • The DNA sample capture efficiency of the H4G-P6 and H4K-P6 dual-restriction nanoporin complexes is comparable to that of the corresponding mutants before assembly, which resolves the low DNA sample capture efficiency of dual-restriction nanoporin complexes.
  • The DNA translocation signals of the H4G-P6 and H4K-P6 dual-restriction nanoporin complexes are similar to those of the H4G and H4K single-restriction nanoporin mutants before assembly, and the sequencing signal amplitude is likewise relatively small when single-stranded DNA translocates through the porin restriction region.
  • Example 5 Atomic-level structural analysis of the H4G-P6 dual-restriction nanoporin complex
  • The sequencing property tests in Example 4 confirmed that the P6 peptide can be assembled onto the H4G mutant in vitro. This example therefore attempts to obtain the H4G-P6 dual-restriction nanoporin complex by in vitro assembly and to solve its structure. Structural analysis of the interaction sites between the P6 peptide and H4G, and of the amino acid composition of the second restriction region formed by the P6 peptide, guides further modification and optimization of the H4G-P6 complex to improve nanopore sequencing accuracy, especially the resolution of homopolymers.
  • the sample preparation method for cryo-electron microscopy is as follows:
  • H4G-P6 assembly Take the peak tip sample from molecular sieve chromatography and incubate it overnight with P6 peptide at a molar ratio of 1:2 (H4G mutant: P6 peptide) to obtain a protein mixture. Concentrate the protein mixture to 5 mg/ml.
  • cryo-electron microscopy sample observation and screening The prepared cryo-electron microscopy samples were loaded onto the Talos F200C for observation, and samples with good contrast, good dispersion and little contamination were screened for data collection.
  • the molecular weight of the P6 polypeptide monomer is only 4.2kD, and it cannot be distinguished in the SDS-PAGE gel image.
  • all the oligomers of the H4G mutant protein were completely converted into monomers after heating at 100°C for 10 min. After the P6 polypeptide assembly, under the same heating conditions, only some oligomers were converted into monomers, indicating that the P6 polypeptide and the H4G mutant successfully assembled to form the H4G-P6 nanoporous protein complex, and the oligomeric state of the H4G mutant became more stable after polypeptide assembly. This phenomenon corroborates the decrease in pore current and significant improvement in pore current stability after P6 peptide assembly in Example 4.
  • the three amino acids Phe77, Ser79, and Gln80 form the first restriction region of the H4G-P6 dual-restriction nanoporin complex, with a diameter of [missing information].
  • Nine P6 peptides are obliquely inserted into the β-barrel of the H4G nanoporin.
  • the Asn15 residues of these nine P6 peptides form the second restriction region of the H4G-P6 dual-restriction nanoporin complex, with a diameter of [missing information].
  • the first and second restriction regions are coaxially spaced apart within the H4G nanopore channel.
  • the N-terminus of the P6 peptide is a long loop domain, and the C-terminus is an α-helix followed by a short loop domain extending out of the lumen.
  • Each P6 peptide interacts with three adjacent H4G monomers, enhancing the stability of the dual-restriction nanoporin complex.
  • the interactions between the P6 peptide and the H4G monomers mainly include electrostatic interactions, hydrogen bonds, hydrophobic interactions, and van der Waals forces.
  • Glu5, Arg7, Arg12, and Glu16 of the P6 peptide interact electrostatically with Arg184, Glu81, Asp233, and Lys235 of the H4G monomers, respectively.
  • Arg17 of the P6 peptide interacts electrostatically with Asp233 and Glu173 of the H4G monomers.
  • Asn1, Tyr9, Arg12, and Glu24 of the P6 peptide interact with Glu242, Ser169, Ser222, and Phe226 of the H4G monomers via hydrogen bonds, respectively.
  • Hydrophobic interactions mainly exist between Leu20 and Leu30 of the P6 peptide and Phe224 and Phe226 of the H4G monomer, and between Phe11 of the P6 peptide and Phe220 and Ile172.
  • Van der Waals forces exist between Asn1 of the P6 peptide and Gln241, Val215, Val188 and Ala187 of the H4G monomer, and van der Waals forces exist between Tyr9, Phe11, Thr26, Leu30 and Gln31 of the P6 peptide and Gly171, Glu173, Phe226, Asp229 and Asp229 of the H4G monomer, respectively ( Figure 17).
  • a mutant vector was constructed using single-point mutation PCR.
  • primers P1A-F and P1A-R
  • PCR amplification of the vector yielded PCR products.
  • the PCR products were digested with DpnI enzyme (NEB, catalog number: R0176L), and the digested PCR products were transformed into DH5α competent cells for positive clone screening. Single colonies were picked for sequencing, and plasmids extracted using a kit were stored at -20°C for later use.
  • a P1A mutant protein vector was prepared using the aforementioned method.
  • This vector contains the P1A mutant encoding gene (sequence shown in SEQ ID NO: 36) and expresses the P1A mutant with a Strep-TagII tag (sequence shown in SEQ ID NO: 35). Compared to the wild-type HfaB protein, the P1A mutant differs in that it contains S79I and E80Q mutations.
  • P1A-R GTTGCCCGCTTCCTGAATGCTATAGCGGCCGGTC.
  • the P1A mutant protein was prepared using the same method as in Example 2.
  • the corresponding P6 peptide (95% purity) was synthesized and dissolved in ddH2O to a final concentration of 0.5 mg/ml.
  • the P1A mutant protein and the peptide were incubated overnight at a molar ratio of 1:2 (P1A mutant protein: peptide) to obtain a protein mixture.
  • the protein mixture was concentrated and excess peptide was removed to obtain the dual-restriction nanoporous protein complex P1A-P6.
  • Pore current analysis showed a marked decrease in the pore current of the P1A-P6 dual-restriction nanoporin complex, along with reduced noise and a lower amplitude of spike-like noise, indicating that P1A and P6 were successfully reconstituted into the P1A-P6 dual-restriction nanoporin complex.
  • Translocation signal analysis showed a significantly improved DNA sample capture rate and clearly better sequencing stability than H4E, while the sequencing accuracy remained comparable to H4E.

Landscapes

  • Chemical & Material Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Organic Chemistry (AREA)
  • Health & Medical Sciences (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Biochemistry (AREA)
  • General Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Genetics & Genomics (AREA)
  • Biophysics (AREA)
  • Zoology (AREA)
  • Molecular Biology (AREA)
  • Wood Science & Technology (AREA)
  • Analytical Chemistry (AREA)
  • Microbiology (AREA)
  • Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Immunology (AREA)
  • Biotechnology (AREA)
  • Gastroenterology & Hepatology (AREA)
  • Medicinal Chemistry (AREA)
  • Peptides Or Proteins (AREA)

Abstract

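Provided are a dual-restriction nanopore protein complex and use thereof. The nanopore protein complex comprises an HfaB protein and a truncated HfaA protein; the truncated HfaA protein is attached to the HfaB protein and forms a restriction region in the nanopore protein complex. By in vitro assembly, a novel dual-restriction nanopore protein complex is provided, enriching the available types of nanopore proteins. Compared with the single-restriction nanopore protein before assembly, the dual-restriction nanopore protein complex has both a more stable oligomeric state and a more stable sequencing current.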

Description

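A dual-restriction nanopore protein complex and use thereof
Technical Field
The present invention relates to the field of biotechnology, and in particular to a dual-restriction nanopore protein complex and use thereof.
Background
In 2020, statistics from the General Administration of Customs showed that imports accounted for 81.2% of gene sequencers and related kits, and sequencing technology has been listed as a first-tier-risk "chokepoint" technology in the biomedical field. Nanopore sequencing is real-time, portable, and potentially low-cost, offers ultra-long reads, and can detect epigenetic modifications on nucleic acids and sequence RNA directly. As a fourth-generation sequencing technology it shows great advantages and has become a future direction for nucleic acid sequencing. Principle of nanopore sequencing: an electric field drives a single nucleic acid strand sequentially through a nanoscale channel protein embedded in an insulating artificial membrane. Because bases differ in size and chemical properties, they produce current signals with different blockade amplitudes and blockade durations as they pass through the nanopore, from which the base information of the nucleic acid molecule is obtained and the single-stranded nucleic acid is sequenced. ONT has successively launched several commercial nanopore sequencers (MinION, GridION, and PromethION), and the domestic development of high-accuracy biological nanopore sequencers is urgent.
Compared with second-generation sequencing, nanopore sequencing currently has lower accuracy, and in particular a limited ability to resolve homopolymers. The nanopore protein is a key determinant of nanopore sequencing accuracy. Engineering and optimizing single-restriction nanopore proteins can improve the sequencing current properties and the sequencing accuracy to some extent. Dual-restriction nanopore protein complexes offer a new route to further improve accuracy, especially the ability of the nanopore protein to resolve homopolymers. At present, dual-restriction nanopore protein complexes are scarce.
Summary of the Invention
To address the current scarcity of dual-restriction nanopore protein complexes and the low accuracy of nanopore sequencing, the present invention provides, by in vitro assembly, a novel dual-restriction nanopore protein complex, enriching the available types of nanopore proteins.
The present invention provides a dual-restriction nanopore protein complex comprising an HfaB protein and a truncated HfaA protein, wherein the truncated HfaA protein is attached to the HfaB protein and forms a restriction region in the nanopore protein complex.
Optionally, in the above dual-restriction nanopore protein complex, the truncated HfaA protein is any one of the following polypeptides: (a1) a polypeptide whose amino acid sequence consists of the N-terminal 23-35 (for example 23, 27, 31, or 35) amino acids of the HfaA protein; (a2) a polypeptide having the same function, obtained by substitution and/or deletion and/or addition of one or several amino acids in the amino acid sequence of the N-terminal 23-35 (for example 23, 27, 31, or 35) amino acids of the HfaA protein; (a3) a polypeptide having 80% or more identity to the amino acid sequence defined in any of (a1)-(a2) and having the same function; (a4) a fusion polypeptide obtained by attaching a tag to a terminus of the polypeptide defined in any of (a1)-(a3).
The GenBank accession number of the HfaA protein amino acid sequence is SAMN05880561_102762.
Optionally, in the above dual-restriction nanopore protein complex, the amino acid sequence of the truncated HfaA protein is as shown in SEQ ID NO: 37.
Optionally, in the above dual-restriction nanopore protein complex, the truncated HfaA protein is inserted into the lumen of the HfaB protein.
Optionally, in the above dual-restriction nanopore protein complex, the HfaB protein comprises 9 HfaB protein monomers.
Optionally, in the above dual-restriction nanopore protein complex, the ratio of HfaB protein monomers to truncated HfaA proteins in the nanopore protein is 1:1.
Optionally, in the above dual-restriction nanopore protein complex, the HfaB protein monomer is any one of the following protein monomers: (b1) a protein monomer whose amino acid sequence is as shown in SEQ ID NO: 1; (b2) a protein monomer having the same function, obtained by substitution and/or deletion and/or addition of one or several amino acid residues in the amino acid sequence shown in SEQ ID NO: 1; (b3) a protein monomer having 80% or more identity to the amino acid sequence defined in any of (b1)-(b2) and having the same function; (b4) a fusion protein monomer obtained by attaching a tag to a terminus of the protein monomer defined in any of (b1)-(b3).
Optionally, in the above dual-restriction nanopore protein complex, the protein monomer of (b2) comprises at least one of the following substitutions: the serine at position 79 is replaced by asparagine, tryptophan, isoleucine, alanine, valine, leucine, tyrosine, glutamic acid, or lysine; the glutamic acid at position 80 is replaced by glutamine, asparagine, serine, tryptophan, alanine, or isoleucine.
Optionally, the amino acid sequence of the protein monomer of (b2) is as shown in SEQ ID NO: 3, SEQ ID NO: 5, SEQ ID NO: 7, SEQ ID NO: 9, SEQ ID NO: 11, SEQ ID NO: 13, SEQ ID NO: 15, SEQ ID NO: 17, SEQ ID NO: 19, SEQ ID NO: 21, SEQ ID NO: 23, SEQ ID NO: 25, SEQ ID NO: 27, SEQ ID NO: 29, SEQ ID NO: 31, SEQ ID NO: 33, and/or SEQ ID NO: 35.
Optionally, the attachment is covalent or non-covalent. Each truncated HfaA protein may be attached to multiple HfaB protein monomers to improve the stability of the nanopore protein complex. For example, each truncated HfaA protein is attached to three adjacent HfaB protein monomers.
Optionally, the attachment is by means of at least one amino acid residue at positions 81-242 of the HfaB protein monomer. For example, the HfaB protein monomer and the truncated HfaA protein are non-covalently attached via residues at positions corresponding to one or more of the following pairs of positions in the truncated HfaA protein (SEQ ID NO: 37) and the HfaB protein monomer (SEQ ID NO: 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, or 35), respectively: Glu5 and Arg184, Arg7 and Glu81, Arg12 and Asp233, Glu16 and Lys235, Arg17 and Asp233, Arg17 and Glu173. More specifically, the HfaB protein monomer and the truncated HfaA protein are also non-covalently attached via residues at positions corresponding to one or more of the following pairs of positions in the truncated HfaA protein and the HfaB protein monomer, respectively: Asn1 and Gln241, Asn1 and Val215, Asn1 and Val188, Asn1 and Ala187, Asn1 and Glu242, Tyr9 and Ser169, Tyr9 and Gly171, Phe11 and Phe220, Phe11 and Ile172, Phe11 and Glu173, Phe11 and Ala183, Arg12 and Ser222, Arg12 and Glu173, Leu20 and Phe224, Glu24 and Phe226, Thr26 and Phe226, Leu30 and Phe226, Leu30 and Asp229, Gln31 and Asp229.
In some embodiments, the structure of the above dual-restriction nanopore protein complex is as shown in Figure 1 and comprises: (1) an HfaB protein comprising a first opening, a middle section, a second opening, and a lumen extending from the first opening through the middle section to the second opening, wherein the lumen surface of the middle section defines one restriction region (the first restriction region in the figure); and (2) a plurality of truncated HfaA proteins, each containing an HfaB-binding region, wherein the plurality of truncated HfaA proteins form another restriction region (the second restriction region in the figure) within the middle section of the HfaB protein, and the two restriction regions are coaxially spaced apart within the middle section of the HfaB protein.
The above polypeptides or protein monomers can be obtained by first synthesizing their coding genes and then expressing them biologically, or by total chemical synthesis.
For the above polypeptides or protein monomers, the tag may be a polypeptide or protein that is expressed as a fusion with the target protein using in vitro DNA recombination technology, to facilitate expression, detection, tracing, and/or purification of the target protein. The protein tag may be a Strep-TagII tag, Flag tag, His tag, MBP tag, HA tag, myc tag, GST tag, and/or SUMO tag, among others.
For the above polypeptides or protein monomers, identity means amino acid sequence identity. It can be determined using homology search sites on the Internet, such as the BLAST page of the NCBI website. For example, in Advanced BLAST 2.1, a pair of amino acid sequences can be searched using blastp as the program, with the Expect value set to 10, all Filters set to OFF, BLOSUM62 as the Matrix, and Gap existence cost, Per residue gap cost, and Lambda ratio set to 11, 1, and 0.85 (the default values), respectively; the identity value (%) is then obtained from the calculation.
Herein, identity of 80% or more may be at least 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identity.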
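The 80% identity threshold can be illustrated with a minimal sketch. This is not BLAST: it assumes the two sequences have already been aligned (gaps written as `-`) and simply counts identical columns. The function names and the toy sequences in the usage note are hypothetical, and BLAST's own reported identity can differ depending on the alignment it chooses.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned columns holding the same residue in both sequences.

    Assumes a pre-computed pairwise alignment with '-' for gaps.
    Columns that are gaps in both sequences are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def meets_identity_threshold(aligned_a: str, aligned_b: str,
                             threshold: float = 80.0) -> bool:
    """True if the aligned pair reaches the given identity threshold (%)."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

For example, `percent_identity("ACDEFGHIKL", "ACDEFGHIKV")` counts 9 identical columns out of 10, so this toy pair would satisfy an 80% threshold.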
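The above dual-restriction nanopore protein complex may specifically be H4E-P6, H5C-P6, H5D-P6, H4G-P6, H4K-P6, or P1A-P6 as prepared in the Examples below.
The present invention also provides a method for producing the above dual-restriction nanopore protein complex, the method comprising A1 or A2 as follows: A1, co-expressing one or more HfaB protein monomers and the truncated HfaA protein in a host cell, thereby allowing the dual-restriction nanopore protein complex to form in the cell; A2, contacting one or more HfaB protein monomers with the truncated HfaA protein, thereby allowing the dual-restriction nanopore protein complex to form in vitro.
Optionally, in the above method, the molar ratio of HfaB protein monomer to truncated HfaA protein in A2 is 1:2.
Biological materials related to the above dual-restriction nanopore protein complex also fall within the scope of protection of the present invention. The related biological material is any one of the following: a1) a nucleic acid molecule encoding the above dual-restriction nanopore protein complex; a2) an expression cassette containing the nucleic acid molecule of a1); a3) a recombinant vector containing the nucleic acid molecule of a1), or a recombinant vector containing the expression cassette of a2); a4) a recombinant cell containing the nucleic acid molecule of a1), or a recombinant cell containing the expression cassette of a2), or a recombinant cell containing the recombinant vector of a3).
In the above biological materials, the nucleic acid molecule may be DNA, such as cDNA, genomic DNA, or recombinant DNA; the nucleic acid molecule may also be RNA, such as mRNA, siRNA, shRNA, sgRNA, miRNA, or antisense RNA.
In the above biological materials, the expression cassette refers to DNA capable of expressing a gene in a host cell; this DNA may include not only a promoter that initiates transcription of the gene but also a terminator that terminates transcription of the gene. The expression cassette may further include an enhancer sequence.
Use of the above dual-restriction nanopore protein complex or the above related biological materials in detecting the presence, absence, or one or more characteristics of a target analyte, or in preparing a product for detecting the presence, absence, or one or more characteristics of a target analyte, also falls within the scope of protection of the present invention.
The present invention also provides a method for determining the presence, absence, or one or more characteristics of a target analyte, the method comprising: A. contacting the target analyte with the above dual-restriction nanopore protein complex such that the target analyte moves relative to the dual-restriction nanopore protein complex; B. taking one or more measurements as the target analyte moves relative to the dual-restriction nanopore protein complex, thereby determining the presence, absence, or one or more characteristics of the target analyte.
The present invention also provides a kit or device for determining the presence, absence, or one or more characteristics of a target analyte. The kit comprises the above dual-restriction nanopore protein complex or the above related biological materials, and a membrane. The device comprises the above dual-restriction nanopore protein complex and a membrane.
In the above kit or device, the membrane and the dual-restriction nanopore protein complex may be packaged separately, or the dual-restriction nanopore protein complex may be embedded in the membrane.
The membrane may be any membrane known in the prior art, preferably a lipid bilayer. For example, the membrane is a lipid bilayer formed by self-assembly of block copolymer/phospholipid molecules.
The above kit or device may further include a rate-controlling protein. The rate-controlling protein may include one or a combination of nucleic acid binding proteins, helicases, exonucleases, telomerases, topoisomerases, transcriptases, translocases, and/or polymerases.
Optionally, the helicase is selected from Hel308 family helicases and modified Hel308 family helicases, RecD helicases and variants thereof, TrwC helicases and variants thereof, Dda helicases and variants thereof, TraI Eco and variants thereof, XPD Mbu and variants thereof, and Pif1 helicases and variants thereof.
Optionally, the target analyte is one or more of nucleotides, nucleic acids, amino acids, oligopeptides, polypeptides, and proteins.
Optionally, the one or more characteristics are selected from at least one of (i) the length of the target analyte; (ii) the identity of the target analyte; (iii) the sequence of the target analyte; (iv) the secondary structure of the target analyte; and (v) whether the target analyte is modified. "Identity" refers to the similarity between sequences. Identity can be evaluated by eye or with computer software. Using computer software, the identity between two or more sequences can be expressed as a percentage (%), which can be used to evaluate the identity between related sequences.
Optionally, the nucleic acid may be naturally occurring or artificially synthesized. Specifically, the nucleic acid may be natural DNA or RNA, or modified DNA or RNA, or an artificially synthesized nucleic acid such as peptide nucleic acid (PNA), glycerol nucleic acid (GNA), threose nucleic acid (TNA), locked nucleic acid (LNA), or another synthetic polymer with nucleoside side chains.
Optionally, the nucleic acid is single-stranded, double-stranded, or at least partially double-stranded.
Optionally, the nucleic acid may be of any length. For example, the nucleic acid may be at least 10, at least 50, at least 100, at least 150, at least 200, at least 250, at least 300, at least 400, or at least 500 nucleotides or nucleotide pairs in length, or 1000 or more, 5000 or more, or 100000 or more nucleotides or nucleotide pairs in length.
Optionally, one or more nucleotides in the nucleic acid may be modified, for example methylated, oxidized, damaged, abasic, protein-labeled, tagged, or carrying a spacer inserted within the polynucleotide sequence.
Use of the above truncated HfaA protein in preparing a dual-restriction nanopore protein complex also falls within the scope of protection of the present invention.
When "comprising" or "including" is used in this application to describe the sequence of a protein or nucleic acid, the protein or nucleic acid may consist of that sequence, or may have additional amino acids or nucleotides at one or both ends while still having the activity described in the present invention.
Herein, the amino acids and their three-letter and one-letter abbreviations are as follows: histidine (His, H); serine (Ser, S); glutamic acid (Glu, E); glutamine (Gln, Q); glycine (Gly, G); threonine (Thr, T); phenylalanine (Phe, F); aspartic acid (Asp, D); tyrosine (Tyr, Y); leucine (Leu, L); isoleucine (Ile, I); arginine (Arg, R); alanine (Ala, A); valine (Val, V); tryptophan (Trp, W); methionine (Met, M); asparagine (Asn, N); cysteine (Cys, C); lysine (Lys, K); proline (Pro, P). Standard substitution notation is also used: E80Q means that E at position 80 of the sequence is replaced by Q.
Herein, a restriction region (also called a constriction region) is an aperture defined by the lumen surface of a pore or pore complex, whose function is to allow ions and target analytes (for example, but not limited to, polynucleotides or single nucleotides) to pass through the channel of the pore complex. In some embodiments, the restriction region is the narrowest aperture in the pore or pore complex.
Dual-restriction nanopore protein complexes currently used for nanopore sequencing are scarce and mainly comprise the CsgG-CsgF protein complex. The HfaB protein is a single-restriction nonameric nanopore protein whose amino acid sequence similarity to the CsgG protein is only 25.1%, making it a novel single-restriction nanopore protein. The amino acid sequence similarity between the HfaA protein and the CsgF protein is likewise only 28%.
In the Examples of the present invention, the P6 peptide (a polypeptide consisting of the N-terminal 35 amino acids of the mature HfaA protein) was synthesized directly, the novel HfaB-P6 dual-restriction nanopore protein complex was obtained by in vitro reconstitution, and the structure of the H4G-P6 dual-restriction nanopore protein complex was solved. Compared with the single-restriction nanopore protein before assembly, the dual-restriction nanopore protein complex has both a more stable oligomeric state and a more stable sequencing current. Mutational engineering of the HfaB protein resolved the low DNA sample capture efficiency of the nanopore protein complex and improved the sequencing current properties.
To address the current scarcity of dual-restriction nanopore protein complexes and the low accuracy of nanopore sequencing, the present invention provides, by in vitro assembly, a novel dual-restriction nanopore protein complex, enriching the available types of nanopore proteins. Compared with the single-restriction nanopore protein before assembly, the dual-restriction nanopore protein complex has both a more stable oligomeric state and a more stable sequencing current. Mutational engineering resolved the low DNA sample capture efficiency of the dual-restriction nanopore protein complex and improved the sequencing current properties; the P1A-P6 nanopore protein complex is preferred. The subsequent structural analysis of the H4G-P6 dual-restriction nanopore protein complex identified the key amino acids involved in the interaction between the P6 peptide and the H4G single-restriction nanopore protein, as well as the key amino acids of the second restriction region formed by the P6 peptide. Mutational engineering targeting these key amino acids should help improve nanopore sequencing accuracy, especially the resolution of homopolymers.
Brief Description of the Drawings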
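Figure 1 is a schematic diagram of the HfaB-P6 dual-restriction nanopore protein complex.
Figure 2 is an SDS-PAGE gel image of the wild-type HfaB protein.
Figure 3 is an SDS-PAGE gel image of the H4G/H4H/H4I/H4J/H4K/H4L mutant proteins.
Figure 4 is an SDS-PAGE gel image of the H4C/H4D/H4E/H4F/H5C/H5D/H5E/H5F/H5G/H5H mutant proteins.
Figure 5 shows the current signals of the wild-type HfaB nanopore.
Figure 6A shows the pore current signals of the H4G/H4H/H4I/H4J/H4K/H4L mutant nanopores.
Figure 6B shows the translocation current signals of the H4G/H4H/H4I/H4J/H4K/H4L mutant nanopores.
Figure 7A shows the pore current signals of the H4C/H4E/H5C/H5D/H5E/H5F/H5G/H5H mutant nanopores.
Figure 7B shows the translocation current signals of the H4C/H4E/H5C/H5D/H5E/H5F/H5G/H5H mutant nanopores.
Figure 8 shows the current signals of the H4E-P6 dual-restriction nanopore protein complex.
Figure 9 shows the current signals of the H5C-P6 and H5D-P6 dual-restriction nanopore protein complexes.
Figure 10 shows the current signals of the H4G-P6 and H4K-P6 dual-restriction nanopore protein complexes.
Figure 11 is the size-exclusion chromatogram of the H4G mutant.
Figure 12 is an SDS-PAGE gel image of the H4G and H4G-P6 proteins.
Figure 13 is a cryo-electron micrograph of the H4G-P6 dual-restriction nanopore protein complex.
Figure 14 is the cryo-electron microscopy density map of the H4G-P6 dual-restriction nanopore protein complex.
Figure 15 is a structural diagram of the H4G-P6 dual-restriction nanopore protein complex.
Figure 16 shows the conformation of the two restriction regions of the H4G-P6 nanopore protein complex.
Figure 17 shows the structure of the P6 peptide and its interaction sites with H4G.
Figure 18 shows the current signals of the P1A-P6 dual-restriction nanopore protein complex.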
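Detailed Description
The present invention is described in further detail below with reference to specific embodiments. The examples are given only to illustrate the invention and not to limit its scope. The examples provided below may serve as a guide for further improvement by those of ordinary skill in the art and do not limit the invention in any way.
Unless otherwise specified, the experimental methods in the following examples are conventional methods, carried out according to the techniques or conditions described in the literature in the field or according to the product instructions. Unless otherwise specified, the materials and reagents used in the following examples are commercially available. All quantitative experiments in the following examples were performed in triplicate, and the results were averaged.
Example 1 Construction of nanopore protein vectors
(1) Construction of the wild-type HfaB nanopore protein vector
The wild-type HfaB protein is derived from Rhizobium sp. RU33A (UniProt accession: A0A1N6RVG5_9HYPH). The HfaB expression gene was synthesized with codon optimization for expression in E. coli. Using the synthesized HfaB gene as template, a Strep-TagII tag was added at the C-terminus of the protein; forward and reverse primers (HfaB-F and HfaB-R, shown in Table 1) were designed, the target gene was amplified by PCR, and the amplification product was ligated into the pQlink vector by a seamless cloning reaction to give the pQlink-HfaBStrep vector. The pQlink-HfaBStrep vector contains the HfaB gene (sequence shown in SEQ ID NO: 2) and expresses the wild-type HfaB protein with a Strep-TagII tag; the wild-type HfaB protein sequence is shown in SEQ ID NO: 1.
(2) Construction of mutant nanopore protein vectors
Ser79, at the narrowest point of the HfaB restriction region, and the negatively charged Glu80 are the key amino acids determining the sequencing current properties of the single-restriction HfaB nanopore, so Ser79 and Glu80 were subjected to mutational analysis.
Mutant vectors were constructed by single-point mutation PCR. Using the pQlink-HfaBStrep vector as template, forward and reverse primers were designed for each mutant (the primers for each mutant vector are shown in Table 1) and the pQlink-HfaBStrep vector was amplified by PCR to give the corresponding PCR product. The PCR product was digested with DpnI enzyme (NEB, catalog number: R0176L), and the digested PCR product was transformed into DH5α competent cells for positive clone screening. Single colonies were picked for sequencing, and plasmids extracted using a kit were stored at -20°C for later use. The H4G, H4H, H4I, H4J, H4K, H4L, H4C, H4D, H4E, H4F, H5C, H5D, H5E, H5F, H5G, and H5H mutant vectors were each prepared by this method.
The primer sequences are shown in the table below:
Table 1 Primer sequences
The H4G mutant vector contains the H4G mutant coding gene (sequence shown in SEQ ID NO: 4) and expresses the H4G mutant with a Strep-TagII tag (sequence shown in SEQ ID NO: 3). Compared with the wild-type HfaB protein, the H4G mutant differs only by the E80Q mutation.
The H4H mutant vector contains the H4H mutant coding gene (sequence shown in SEQ ID NO: 6) and expresses the H4H mutant with a Strep-TagII tag (sequence shown in SEQ ID NO: 5). Compared with the wild-type HfaB protein, the H4H mutant differs only by the E80N mutation.
The H4I mutant vector contains the H4I mutant coding gene (sequence shown in SEQ ID NO: 8) and expresses the H4I mutant with a Strep-TagII tag (sequence shown in SEQ ID NO: 7). Compared with the wild-type HfaB protein, the H4I mutant differs only by the E80S mutation.
The H4J mutant vector contains the H4J mutant coding gene (sequence shown in SEQ ID NO: 10) and expresses the H4J mutant with a Strep-TagII tag (sequence shown in SEQ ID NO: 9). Compared with the wild-type HfaB protein, the H4J mutant differs only by the E80W mutation.
The H4K mutant vector contains the H4K mutant coding gene (sequence shown in SEQ ID NO: 12) and expresses the H4K mutant with a Strep-TagII tag (sequence shown in SEQ ID NO: 11). Compared with the wild-type HfaB protein, the H4K mutant differs only by the E80A mutation.
The H4L mutant vector contains the H4L mutant coding gene (sequence shown in SEQ ID NO: 14) and expresses the H4L mutant with a Strep-TagII tag (sequence shown in SEQ ID NO: 13). Compared with the wild-type HfaB protein, the H4L mutant differs only by the E80I mutation.
The H4C mutant vector contains the H4C mutant coding gene (sequence shown in SEQ ID NO: 16) and expresses the H4C mutant with a Strep-TagII tag (sequence shown in SEQ ID NO: 15). Compared with the wild-type HfaB protein, the H4C mutant differs only by the S79N mutation.
The H4D mutant vector contains the H4D mutant coding gene (sequence shown in SEQ ID NO: 18) and expresses the H4D mutant with a Strep-TagII tag (sequence shown in SEQ ID NO: 17). Compared with the wild-type HfaB protein, the H4D mutant differs only by the S79W mutation.
The H4E mutant vector contains the H4E mutant coding gene (sequence shown in SEQ ID NO: 20) and expresses the H4E mutant with a Strep-TagII tag (sequence shown in SEQ ID NO: 19). Compared with the wild-type HfaB protein, the H4E mutant differs only by the S79I mutation.
The H4F mutant vector contains the H4F mutant coding gene (sequence shown in SEQ ID NO: 22) and expresses the H4F mutant with a Strep-TagII tag (sequence shown in SEQ ID NO: 21). Compared with the wild-type HfaB protein, the H4F mutant differs only by the S79A mutation.
The H5C mutant vector contains the H5C mutant coding gene (sequence shown in SEQ ID NO: 24) and expresses the H5C mutant with a Strep-TagII tag (sequence shown in SEQ ID NO: 23). Compared with the wild-type HfaB protein, the H5C mutant differs only by the S79V mutation.
The H5D mutant vector contains the H5D mutant coding gene (sequence shown in SEQ ID NO: 26) and expresses the H5D mutant with a Strep-TagII tag (sequence shown in SEQ ID NO: 25). Compared with the wild-type HfaB protein, the H5D mutant differs only by the S79L mutation.
The H5E mutant vector contains the H5E mutant coding gene (sequence shown in SEQ ID NO: 28) and expresses the H5E mutant with a Strep-TagII tag (sequence shown in SEQ ID NO: 27). Compared with the wild-type HfaB protein, the H5E mutant differs only by the S79Q mutation.
The H5F mutant vector contains the H5F mutant coding gene (sequence shown in SEQ ID NO: 30) and expresses the H5F mutant with a Strep-TagII tag (sequence shown in SEQ ID NO: 29). Compared with the wild-type HfaB protein, the H5F mutant differs only by the S79Y mutation.
The H5G mutant vector contains the H5G mutant coding gene (sequence shown in SEQ ID NO: 32) and expresses the H5G mutant with a Strep-TagII tag (sequence shown in SEQ ID NO: 31). Compared with the wild-type HfaB protein, the H5G mutant differs only by the S79E mutation.
The H5H mutant vector contains the H5H mutant coding gene (sequence shown in SEQ ID NO: 34) and expresses the H5H mutant with a Strep-TagII tag (sequence shown in SEQ ID NO: 33). Compared with the wild-type HfaB protein, the H5H mutant differs only by the S79K mutation.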
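The sixteen single-point mutants and the substitution notation (e.g. E80Q) can be collected into a small lookup table. The sketch below is illustrative only: the mutant-to-substitution mapping is taken from Example 1, but the helper names and the toy sequence in the usage note are hypothetical, since the real SEQ ID NO: 1 sequence is not reproduced in this text.

```python
import re

# Mutant name -> substitution relative to wild-type HfaB, as stated in Example 1.
HFAB_MUTANTS = {
    "H4G": "E80Q", "H4H": "E80N", "H4I": "E80S",
    "H4J": "E80W", "H4K": "E80A", "H4L": "E80I",
    "H4C": "S79N", "H4D": "S79W", "H4E": "S79I", "H4F": "S79A",
    "H5C": "S79V", "H5D": "S79L", "H5E": "S79Q", "H5F": "S79Y",
    "H5G": "S79E", "H5H": "S79K",
}


def parse_substitution(code: str) -> tuple[str, int, str]:
    """Split 'E80Q' into (original residue, 1-based position, new residue)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", code)
    if match is None:
        raise ValueError(f"not a single-residue substitution: {code!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def apply_substitution(sequence: str, code: str) -> str:
    """Return a copy of `sequence` with the substitution applied.

    Raises ValueError if the position is out of range or the residue
    found there does not match the expected original residue.
    """
    original, position, replacement = parse_substitution(code)
    if not 1 <= position <= len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    if sequence[position - 1] != original:
        raise ValueError(
            f"expected {original} at position {position}, "
            f"found {sequence[position - 1]}"
        )
    return sequence[:position - 1] + replacement + sequence[position:]
```

With a placeholder sequence such as `"G" * 78 + "SE" + "G" * 5` (Ser at 79, Glu at 80, everything else hypothetical), `apply_substitution(toy, HFAB_MUTANTS["H4G"])` changes only position 80, and the residue check guards against off-by-one numbering mistakes.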
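Example 2 Expression and purification of wild-type HfaB and mutant nanopore proteins
(1) Bacterial scale-up culture and induction of expression. The vectors prepared in Example 1 were each transformed into OMP8 competent cells (OMP8 is disclosed in "Coupling site-directed mutagenesis with high-level expression: large scale production of mutant porins from E. coli" (2018), where it is BL21(DE3)omp8; the reference is incorporated into this disclosure by reference). A seed culture was grown overnight at 37°C and 200 rpm, and 1 mL was inoculated into 1 L of LB medium for scale-up. When the OD600 reached 1, the temperature was lowered to 26°C and expression was induced overnight with 0.2 mM IPTG (isopropyl thiogalactoside).
(2) Collection of cell membranes. Cells were harvested at 4000 rpm, resuspended in 20 mL of lysis buffer per 1 L of culture, and disrupted by sonication for 2 min. The lysate was centrifuged at 18000 rpm and 4°C for 1 hour to collect the cell membranes.
(3) Membrane solubilization. The membrane fraction was resuspended in membrane solubilization buffer using a glass homogenizer (15 mL of buffer for the membranes from each 1 L of culture) and stirred magnetically at 4°C for 1 h, so that the detergent thoroughly extracted the membrane proteins from the cell membrane. The mixture was centrifuged at 18000 rpm and 4°C for 1 hour, and the supernatant containing the membrane protein fraction was collected.
(4) Strep affinity chromatography. The supernatant was incubated with Strep beads (Streptactin Beads 4FF, brand: 天地人和 (Smart-Lifesciences), catalog number: SA053250) at 4°C for 45 min. The mixture of supernatant and beads was loaded onto a column and passed through twice by gravity flow. Non-specifically bound contaminant proteins were removed with 10 column volumes of Wash buffer, and the target protein was eluted with 5 column volumes of Elution buffer. Finally, the proportion and purity of each fraction were checked by SDS-PAGE.
Lysis buffer: 20 mM Tris-HCl pH 8.0, 150 mM NaCl. Membrane solubilization buffer: 20 mM Tris-HCl pH 8.0, 150 mM NaCl, 1% LDAO. Wash buffer: 20 mM Tris-HCl pH 8.0, 150 mM NaCl, 0.3% LDAO. Elution buffer: 20 mM Tris-HCl pH 8.0, 150 mM NaCl, 0.1% LDAO, 2.5 mM desthiobiotin.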
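The four purification buffers share a Tris/NaCl base and differ mainly in LDAO concentration, so their preparation can be sketched with the dilution relation C1·V1 = C2·V2. The stock concentrations below (1 M Tris-HCl, 5 M NaCl, 30% LDAO) are assumptions made for illustration and are not given in the source; desthiobiotin is left out of the calculation because the source does not say how it is added.

```python
# Assumed stock solutions (hypothetical; not specified in the source).
STOCKS = {
    "Tris-HCl pH 8.0": 1000.0,  # mM
    "NaCl": 5000.0,             # mM
    "LDAO": 30.0,               # percent
}

# Final concentrations from Example 2, in the same units as STOCKS.
# The elution buffer additionally contains 2.5 mM desthiobiotin,
# which is not handled by this sketch.
BUFFERS = {
    "lysis": {"Tris-HCl pH 8.0": 20.0, "NaCl": 150.0},
    "solubilization": {"Tris-HCl pH 8.0": 20.0, "NaCl": 150.0, "LDAO": 1.0},
    "wash": {"Tris-HCl pH 8.0": 20.0, "NaCl": 150.0, "LDAO": 0.3},
    "elution": {"Tris-HCl pH 8.0": 20.0, "NaCl": 150.0, "LDAO": 0.1},
}


def stock_volumes_ml(buffer_name: str, final_volume_ml: float) -> dict[str, float]:
    """Volume of each stock (mL) plus water needed for `final_volume_ml`.

    Uses C1 * V1 = C2 * V2 for every component and assumes additive volumes.
    """
    recipe = BUFFERS[buffer_name]
    volumes = {
        component: final_volume_ml * target / STOCKS[component]
        for component, target in recipe.items()
    }
    volumes["water"] = final_volume_ml - sum(volumes.values())
    return volumes
```

Under these assumed stocks, `stock_volumes_ml("wash", 100.0)` calls for about 2 mL Tris-HCl, 3 mL NaCl, 1 mL LDAO, and 94 mL water.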
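The SDS-PAGE results are shown in Figures 2-4. The wild-type HfaB protein and all sixteen mutant proteins were expressed and purified normally. Without heating (protein samples held at 25°C for 10 minutes), both oligomers and monomers were present on the gel, with oligomers predominating; after heating (protein samples held at 100°C for 10 minutes), all protein ran as monomer. The wild-type HfaB protein and the sixteen mutant proteins can therefore all form stable nanopore channels.
Example 3 Measurement of sequencing currents of wild-type HfaB and mutant single-restriction nanopore proteins
To test the sequencing properties of the single-restriction nanopore proteins, an artificial membrane containing a single nanopore protein was constructed to measure the current passing through the nanopore protein. Taking advantage of the ability of block copolymer/phospholipid molecules to self-assemble into a bilayer, the oil-phase/aqueous-phase interface was passed twice across the surface of a microwell support array so that the block copolymer/phospholipid molecules self-assembled into a lipid bilayer. The bilayer membrane was finally kept stable in buffer (200 mM KCl, 100 mM K3[Fe(CN)6], 150 mM K4[Fe(CN)6], 25 mM PBS, pH 8.0).
After a single nanopore protein (i.e., the wild-type HfaB protein or one of the sixteen mutant proteins prepared in Example 2) had been assembled into the bilayer membrane, 2 mL of the above buffer was flowed through the system to remove excess residual nanopores. The pore current signal of the nanopore protein was recorded at 150 mV. If the pore current was relatively stable, the DNA sequencing properties were then characterized. 4 μL of anchoring buffer (50 nM DNA tether, 200 mM KCl, 25 mM PBS, pH 8.0) was mixed with 196 μL of sequencing buffer (500 mM KCl, 30 mM MgCl2, 30 mM ATP, 25 mM PBS, pH 8.0), flowed into the system, and incubated for 5 minutes. Then 100 μL of sequencing buffer containing the DNA test sample assembled with the T4 Dda mutant protein (200 ng of DNA test sample assembled with the T4 Dda mutant protein, 500 mM KCl, 30 mM MgCl2, 30 mM ATP, 25 mM PBS, pH 8.0) was added to the system, and the system was run at 150 mV for 2 h, yielding resolvable DNA translocation signals.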
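Mixing 4 μL of anchoring buffer with 196 μL of sequencing buffer is a 50-fold dilution of the tether, which a short calculation makes explicit. This is a sketch that assumes ideal, additive volumes and ignores any buffer already present in the flow cell; the function and variable names are hypothetical.

```python
def mix(components):
    """Combine solutions and return (total volume in uL, final concentrations).

    `components` is a list of (volume_uL, {solute: concentration}) pairs.
    A solute missing from a component is treated as absent (concentration 0).
    """
    total = sum(volume for volume, _ in components)
    solutes = set()
    for _, concentrations in components:
        solutes.update(concentrations)
    final = {
        solute: sum(volume * conc.get(solute, 0.0)
                    for volume, conc in components) / total
        for solute in solutes
    }
    return total, final


# Compositions as given in Example 3 (units are part of the key names).
ANCHORING = (4.0, {"DNA tether (nM)": 50.0, "KCl (mM)": 200.0, "PBS (mM)": 25.0})
SEQUENCING = (196.0, {"KCl (mM)": 500.0, "MgCl2 (mM)": 30.0,
                      "ATP (mM)": 30.0, "PBS (mM)": 25.0})

total_ul, final = mix([ANCHORING, SEQUENCING])
```

Under these assumptions the 200 μL mixture contains about 1 nM tether and 494 mM KCl, with MgCl2 and ATP each diluted slightly to 29.4 mM.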
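The DNA test sample was assembled with the T4 Dda mutant protein according to the method recorded in patent WO2014135838A1 (the T4 Dda mutant protein is a helicase, i.e. a rate-controlling protein, specifically the T4 Dda-E94C/C109A/C136A/A360C described in patent WO2014135838A1). The sequence of the DNA test sample is shown in SEQ ID NO: 38.
The current signals of the wild-type HfaB nanopore protein are shown in Figure 5, where the upper panel shows the pore current signal and the lower panel characterizes the DNA sequencing properties. The wild-type HfaB nanopore protein has uniform electrophysiological properties, but the current state is unstable, the pore is prone to spontaneous blockage, and there is considerable spike-like noise. In addition, the sequencing signal amplitude is small when single-stranded DNA translocates through the constriction region of the pore protein. In summary, the wild-type HfaB nanopore protein can discriminate bases and can be used for DNA sequencing, but its overall sequencing properties still need improvement.
The current signals of the H4G, H4H, H4I, H4J, H4K, and H4L mutant nanopore proteins are shown in Figures 6A and 6B; Figure 6A shows the pore current signals and Figure 6B characterizes the DNA sequencing properties. The Glu80 mutants H4G through H4L show less pore current noise than wild-type HfaB, but the current is still unstable, and the translocation signal amplitude when single-stranded DNA passes through the restriction region of the pore protein is small and not noticeably improved over wild-type HfaB.
The current signals of the H4C, H4E, H5C, H5D, H5E, H5F, H5G, and H5H mutant nanopore proteins are shown in Figures 7A and 7B; Figure 7A shows the pore current signals and Figure 7B characterizes the DNA sequencing properties. The DNA sequencing properties of the H4E (S79I) and H5C (S79V) mutants were markedly improved. The effect was especially clear for H4E, which resolved the current instability and stabilized the pore current at 0.25-0.3 nA. The sequencing precision of H4E and H5C also improved noticeably, with clear increases in both sequencing signal amplitude and number of steps. The sequencing properties of the H4C, H5D, H5E, H5F, H5G, and H5H mutants were not improved; they still showed an unstable current state, susceptibility to spontaneous blockage, considerable spike-like noise, and small sequencing signal amplitude. H4D and H4F were difficult to reconstitute into block copolymer artificial membranes and could not be used for nanopore sequencing.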
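To compare pores recorded at different voltages, the stabilized H4E pore current can be expressed as a conductance, G = I / V. The sketch below assumes simple ohmic behaviour at the 150 mV recording voltage, which the source does not claim; the function name is hypothetical.

```python
def conductance_nS(current_nA: float, voltage_mV: float) -> float:
    """Open-pore conductance in nanosiemens, assuming ohmic behaviour.

    nA / mV equals microsiemens, so the ratio is multiplied by 1000
    to express the result in nS.
    """
    if voltage_mV == 0:
        raise ValueError("voltage must be non-zero")
    return current_nA / voltage_mV * 1000.0


# H4E pore current range reported in Example 3, recorded at 150 mV.
low = conductance_nS(0.25, 150.0)
high = conductance_nS(0.30, 150.0)
```

Under this assumption the 0.25-0.3 nA range corresponds to roughly 1.7-2.0 nS.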
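Example 4 Preparation and testing of dual-restriction nanopore protein complexes
The P6 peptide is a polypeptide consisting of the N-terminal 35 amino acids of the mature HfaA protein, with the sequence shown in SEQ ID NO: 37. The average molecular weight of the P6 peptide (95% purity) is 4186.58 g/mol and its mean hydrophobicity is -1.16, so it is a hydrophilic peptide.
The P6 peptide (95% purity) was synthesized and dissolved in ddH2O to a final concentration of 0.5 mg/ml. The nanopore proteins H4E, H5C, H5D, H4G, and H4K prepared in Example 2 were each incubated overnight with the P6 peptide at a molar ratio of 1:2 (nanopore protein:peptide) to give a protein mixture. The protein mixture was concentrated and excess peptide was removed to give the dual-restriction nanopore protein complex. The H4E-P6, H5C-P6, H5D-P6, H4G-P6, and H4K-P6 dual-restriction nanopore protein complexes were prepared in this way.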
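The 1:2 molar ratio can be turned into a pipetting volume once the peptide stock is expressed in molar units. The sketch below uses the stated molecular weight (4186.58 g/mol) and stock concentration (0.5 mg/ml). The 10 nmol protein amount in the usage note is a hypothetical example, and counting the nanopore protein per monomer is an assumption the source does not spell out.

```python
P6_MW_G_PER_MOL = 4186.58   # average molecular weight stated for P6
P6_STOCK_MG_PER_ML = 0.5    # stock concentration stated in Example 4


def stock_molarity_uM(mg_per_ml: float, mw_g_per_mol: float) -> float:
    """Convert a mass concentration to micromolar (mg/mL equals g/L)."""
    return mg_per_ml / mw_g_per_mol * 1e6


def peptide_volume_uL(protein_nmol: float, peptide_per_protein: float = 2.0) -> float:
    """Volume of P6 stock (uL) giving the requested peptide:protein molar ratio.

    `protein_nmol` is the amount of nanopore protein; whether this should be
    counted per monomer or per oligomer is an assumption left to the user.
    """
    peptide_nmol = protein_nmol * peptide_per_protein
    stock_uM = stock_molarity_uM(P6_STOCK_MG_PER_ML, P6_MW_G_PER_MOL)
    return peptide_nmol / stock_uM * 1000.0  # nmol / (nmol per mL) -> mL -> uL
```

The stock works out to roughly 119 μM, so a hypothetical 10 nmol of protein would need about 167 μL of peptide stock to reach the 1:2 ratio.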
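The assembled dual-restriction nanopore protein complexes were tested for their current behavior using the method described in Example 3.
The DNA sequencing results for the H4E-P6 dual-restriction nanopore protein complex are shown in Figure 8, those for the H5C-P6 and H5D-P6 complexes in Figure 9, and those for the H4G-P6 and H4K-P6 complexes in Figure 10. The results show that the P6 peptide can be stably assembled onto the H4E, H5C, H5D, H4G, or H4K mutant to form a dual-restriction nanopore protein complex. Compared with the corresponding single-restriction mutant nanopore proteins, the dual-restriction complexes show a clearly smaller pore current, clearly improved stability, and clearly fewer fluctuations in current level. However, H4E-P6, H5C-P6, and H5D-P6 show extremely low DNA sample capture efficiency. For example, with H4E-P6 about 40 minutes elapsed before a single DNA translocation signal was observed, and that signal was noisy; with H5C-P6 and H5D-P6 no DNA translocation signal was observed within a short period. The DNA sample capture efficiency of the H4G-P6 and H4K-P6 complexes is comparable to that before assembly, which resolves the low DNA sample capture efficiency of dual-restriction nanopore protein complexes. The DNA translocation signals of the H4G-P6 and H4K-P6 complexes are similar to those of the H4G and H4K single-restriction mutants before assembly, and the sequencing signal amplitude is likewise small when single-stranded DNA translocates through the restriction region of the pore protein.
Example 5 Atomic-level structural analysis of the H4G-P6 dual-restriction nanopore protein complex
The sequencing property tests in Example 4 confirmed that the P6 peptide can be assembled onto the H4G mutant in vitro. This example therefore attempts to obtain the H4G-P6 dual-restriction nanopore protein complex by in vitro assembly and to solve its structure. Structural analysis of the interaction sites between the P6 peptide and H4G, and of the amino acid composition of the second restriction region formed by the P6 peptide, guides further modification and optimization of the H4G-P6 complex to improve nanopore sequencing accuracy, especially the resolution of homopolymers.
The cryo-electron microscopy samples were prepared as follows:
(1) Size-exclusion chromatography. A Superose 6 10/300 GL gel filtration column was equilibrated with size-exclusion buffer. The H4G mutant protein prepared in Example 2 was drawn up with a syringe and loaded into the sample loop, the oligomeric protein was collected, and the proportion and purity of each fraction were checked by SDS-PAGE.
Size-exclusion buffer: 20 mM Tris-HCl pH 8.0, 150 mM NaCl, 0.06% LDAO
(2) H4G-P6 assembly. The peak-apex fraction from size-exclusion chromatography was incubated overnight with the P6 peptide at a molar ratio of 1:2 (H4G mutant:P6 peptide) to give a protein mixture, which was concentrated to 5 mg/ml.
(3) Cryo-EM sample preparation. Liquid ethane was prepared and the temperature, humidity, and other parameters of the EMGP instrument were set. After the grid had been made hydrophilic, 3.5 μL of the concentrated protein mixture from step (2) was applied to the grid, and the cryo-EM sample was prepared using the EMGP instrument.
(4) Cryo-EM sample observation and screening. The prepared cryo-EM samples were loaded onto a Talos F200C for observation, and samples with good contrast, good dispersion, and little contamination were selected for data collection.
(5) Cryo-EM data collection and processing. Once samples suitable for data collection had been selected, the data were collected on a Titan2 electron microscope. After a suitable sample had been chosen, the positions to be imaged were saved and the microscope was adjusted, mainly by aligning the microscope, subtracting the background, and setting the basic data collection parameters (defocus from -1.2 μm to -2.0 μm, electron dose, 32 frames per exposure, and pixel size), before data collection was started. The data were processed with cryoSPARC through particle picking, 2D classification, and 3D classification to give the final electron density map.
(6) Atomic model building. Starting from the previously solved HfaB structure, the model was adjusted and optimized in Coot and then refined using Real-space refinement in the Phenix software, giving the final atomic model of the H4G-P6 dual-restriction nanopore protein complex.
To obtain cryo-EM samples with a homogeneous oligomeric state, the H4G mutant protein sample was further separated and purified by size-exclusion chromatography; the chromatogram is shown in Figure 11. The SDS-PAGE gel of H4G after size-exclusion chromatography and of the H4G-P6 protein concentrated in step (2) is shown in Figure 12. The P6 peptide monomer has a molecular weight of only 4.2 kD and cannot be resolved on the SDS-PAGE gel. Before P6 assembly, heating the H4G mutant protein at 100°C for 10 min converted all oligomers completely into monomers; after P6 assembly, the same heating conditions converted only part of the oligomers into monomers. This indicates that the P6 peptide and the H4G mutant assembled successfully into the H4G-P6 nanopore protein complex and that the oligomeric state of the H4G mutant became more stable after peptide assembly. This observation is consistent with the smaller pore current and clearly improved pore current stability seen after P6 assembly in Example 4.
Screening of the cryo-EM samples yielded samples with good contrast, good dispersion, and little contamination that were suitable for data collection (Figure 13; circles mark H4G-P6 nanopore protein complex particles). After data collection, processing, and atomic model building, the atomic-level structure of the H4G-P6 dual-restriction nanopore protein complex was solved. The electron density map is shown in Figure 14, the atomic model in Figure 15, and the conformation of the two restriction regions in Figure 16.
Nine H4G monomers form a stable H4G nanopore protein, in which the three amino acids Phe77, Ser79, and Gln80 form the first restriction region of the H4G-P6 dual-restriction nanopore protein complex, with a diameter of [missing information]. Nine P6 peptides are inserted obliquely into the β-barrel of the H4G nanopore protein, and the Asn15 residues of the nine P6 peptides form the second restriction region of the complex, with a diameter of [missing information]. The first and second restriction regions are coaxially spaced apart within the H4G nanopore channel. The N-terminus of the P6 peptide is a long loop domain, and the C-terminus is an α-helix followed by a short loop domain extending out of the lumen. Each P6 peptide interacts with three adjacent H4G monomers, which enhances the stability of the dual-restriction nanopore protein complex.
The interactions between the P6 peptide and the H4G monomers mainly comprise electrostatic interactions, hydrogen bonds, hydrophobic interactions, and van der Waals forces. Glu5, Arg7, Arg12, and Glu16 of the P6 peptide interact electrostatically with Arg184, Glu81, Asp233, and Lys235 of the H4G monomers, respectively, and Arg17 of the P6 peptide interacts electrostatically with Asp233 and Glu173 of the H4G monomers. Asn1, Tyr9, Arg12, and Glu24 of the P6 peptide form hydrogen bonds with Glu242, Ser169, Ser222, and Phe226 of the H4G monomers, respectively. Hydrophobic interactions occur mainly between Leu20 and Leu30 of the P6 peptide and Phe224 and Phe226 of the H4G monomer, and between Phe11 of the P6 peptide and Phe220 and Ile172. Van der Waals forces exist between Asn1 of the P6 peptide and Gln241, Val215, Val188, and Ala187 of the H4G monomer, and between Tyr9, Phe11, Thr26, Leu30, and Gln31 of the P6 peptide and Gly171, Glu173, Phe226, Asp229, and Asp229 of the H4G monomer, respectively (Figure 17).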
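The P6-H4G contacts listed above can be tabulated to check two statements made elsewhere in the description: that the contacting HfaB residues fall within positions 81-242, and that some P6 residues make several contacts. The sketch below transcribes the pairs from Example 5; pairing Leu20 with Phe224 and Leu30 with Phe226 follows the pair list in the Summary, and the variable and function names are hypothetical.

```python
import re
from collections import Counter

# (P6 residue, H4G residue) pairs grouped by interaction type, from Example 5.
INTERACTIONS = {
    "electrostatic": [
        ("Glu5", "Arg184"), ("Arg7", "Glu81"), ("Arg12", "Asp233"),
        ("Glu16", "Lys235"), ("Arg17", "Asp233"), ("Arg17", "Glu173"),
    ],
    "hydrogen bond": [
        ("Asn1", "Glu242"), ("Tyr9", "Ser169"),
        ("Arg12", "Ser222"), ("Glu24", "Phe226"),
    ],
    "hydrophobic": [
        ("Leu20", "Phe224"), ("Leu30", "Phe226"),
        ("Phe11", "Phe220"), ("Phe11", "Ile172"),
    ],
    "van der Waals": [
        ("Asn1", "Gln241"), ("Asn1", "Val215"), ("Asn1", "Val188"),
        ("Asn1", "Ala187"), ("Tyr9", "Gly171"), ("Phe11", "Glu173"),
        ("Thr26", "Phe226"), ("Leu30", "Asp229"), ("Gln31", "Asp229"),
    ],
}


def position(residue: str) -> int:
    """Extract the residue number from a label such as 'Arg184'."""
    match = re.fullmatch(r"[A-Z][a-z]{2}(\d+)", residue)
    if match is None:
        raise ValueError(f"unrecognised residue label: {residue!r}")
    return int(match.group(1))


all_pairs = [pair for pairs in INTERACTIONS.values() for pair in pairs]
hfab_positions = sorted({position(h4g) for _, h4g in all_pairs})
contacts_per_p6_residue = Counter(p6 for p6, _ in all_pairs)
```

With this table, `hfab_positions` spans 81 to 242, matching the range given in the Summary, and `contacts_per_p6_residue` shows that Asn1 takes part in the most listed contacts.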
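Example 6 Preparation of the P1A-P6 dual-restriction nanopore protein complex
(1) Preparation of the P1A mutant protein vector
The mutant vector was constructed by single-point mutation PCR. Using the H4E vector prepared in Example 1 as template, primers (P1A-F and P1A-R) were designed and the vector was amplified by PCR to give the PCR product. The PCR product was digested with DpnI enzyme (NEB, catalog number: R0176L), and the digested PCR product was transformed into DH5α competent cells for positive clone screening. Single colonies were picked for sequencing, and plasmids extracted using a kit were stored at -20°C for later use. The P1A mutant protein vector prepared by this method contains the P1A mutant coding gene (sequence shown in SEQ ID NO: 36) and expresses the P1A mutant with a Strep-TagII tag (sequence shown in SEQ ID NO: 35). Compared with the wild-type HfaB protein, the P1A mutant differs by the S79I and E80Q mutations.
P1A-F: CAGGAAGCGGGCAACTATCTGC;
P1A-R: GTTGCCCGCTTCCTGAATGCTATAGCGGCCGGTC.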
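A quick sanity check on a mutagenesis primer pair is to reverse-complement the reverse primer and see how the two primers overlap on the sense strand. The sketch below does this for the P1A-F and P1A-R sequences quoted above; the helper names are hypothetical. Interpreting the adjacent ATT and CAG triplets as the Ile79 and Gln80 codons is an assumption about the reading frame, since the coding sequence (SEQ ID NO: 36) is not reproduced here.

```python
P1A_F = "CAGGAAGCGGGCAACTATCTGC"
P1A_R = "GTTGCCCGCTTCCTGAATGCTATAGCGGCCGGTC"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA sequence (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def overlap_length(forward: str, reverse: str) -> int:
    """Length of the longest stretch shared by the two primers.

    Measured on the sense strand: the longest suffix of the reverse
    primer's reverse complement that is also a prefix of the forward primer.
    """
    sense = reverse_complement(reverse)
    for k in range(min(len(forward), len(sense)), 0, -1):
        if sense.endswith(forward[:k]):
            return k
    return 0


sense_of_reverse = reverse_complement(P1A_R)
shared = overlap_length(P1A_F, P1A_R)
```

Here the reverse primer reads `GACCGGCCGCTATAGCATTCAGGAAGCGGGCAAC` on the sense strand, which contains the adjacent triplets ATT (a codon for Ile) and CAG (a codon for Gln), and the two primers share their 5'-terminal 15 nucleotides.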
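(2) Preparation of the P1A mutant protein
The P1A mutant protein was prepared by the same method as in Example 2.
(3) Preparation of the P1A-P6 dual-restriction nanopore protein complex
The corresponding P6 peptide (95% purity) was synthesized and dissolved in ddH2O to a final concentration of 0.5 mg/ml. The P1A mutant protein and the peptide were incubated overnight at a molar ratio of 1:2 (P1A mutant protein:peptide) to give a protein mixture. The protein mixture was concentrated and excess peptide was removed to give the dual-restriction nanopore protein complex P1A-P6.
Example 7 Testing of the P1A-P6 dual-restriction nanopore protein complex
Using the same test method as in Example 3, P1A-P6 was tested for properties that indicate successful reconstitution, such as a clearly reduced pore current. Once successful reconstitution had been confirmed, its DNA sequencing properties were tested.
The results are shown in Figure 18. The pore current of the P1A-P6 dual-restriction nanopore protein complex was clearly reduced, with less noise and a lower amplitude of spike-like noise, indicating that P1A and P6 were successfully reconstituted into the P1A-P6 dual-restriction nanopore protein complex. The translocation signals show that the DNA sample capture rate of the P1A-P6 complex was significantly improved and its sequencing stability was clearly better than that of H4E, while its sequencing precision was comparable to that of H4E.
The present invention has been described in detail above. Those skilled in the art can practice the invention over a wide range of equivalent parameters, concentrations, and conditions without departing from the spirit and scope of the invention and without undue experimentation. Although specific embodiments have been given, it should be understood that the invention can be further modified. In general, in accordance with the principles of the invention, this application is intended to cover any variations, uses, or adaptations of the invention, including departures from the scope disclosed in this application that are made using conventional techniques known in the art. Some of the essential features may be applied within the scope of the appended claims.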

Claims (10)

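  1. A dual-restriction nanopore protein complex, characterized in that the nanopore protein complex comprises an HfaB protein and a truncated HfaA protein, wherein the truncated HfaA protein is attached to the HfaB protein and forms a restriction region in the nanopore protein complex.
  2. The dual-restriction nanopore protein complex according to claim 1, characterized in that the truncated HfaA protein is any one of the following polypeptides:
    (a1) a polypeptide whose amino acid sequence consists of the N-terminal 23-35 amino acids of the HfaA protein;
    (a2) a polypeptide having the same function, obtained by substitution and/or deletion and/or addition of one or several amino acids in the amino acid sequence of the N-terminal 23-35 amino acids of the HfaA protein;
    (a3) a polypeptide having 80% or more identity to the amino acid sequence defined in any of (a1)-(a2) and having the same function;
    (a4) a fusion polypeptide obtained by attaching a tag to a terminus of the polypeptide defined in any of (a1)-(a3).
    Preferably, the truncated HfaA protein is inserted into the lumen of the HfaB protein.
  3. The dual-restriction nanopore protein complex according to claim 1 or 2, characterized in that the HfaB protein comprises 9 HfaB protein monomers.
    Preferably, the ratio of HfaB protein monomers to truncated HfaA proteins in the nanopore protein is 1:1.
  4. The dual-restriction nanopore protein complex according to claim 3, characterized in that the HfaB protein monomer is any one of the following protein monomers:
    (b1) a protein monomer whose amino acid sequence is as shown in SEQ ID NO: 1;
    (b2) a protein monomer having the same function, obtained by substitution and/or deletion and/or addition of one or several amino acid residues in the amino acid sequence shown in SEQ ID NO: 1;
    (b3) a protein monomer having 80% or more identity to the amino acid sequence defined in any of (b1)-(b2) and having the same function;
    (b4) a fusion protein monomer obtained by attaching a tag to a terminus of the protein monomer defined in any of (b1)-(b3).
    Preferably, the protein monomer of (b2) comprises at least one of the following substitutions:
    the serine at position 79 is replaced by asparagine, tryptophan, isoleucine, alanine, valine, leucine, tyrosine, glutamic acid, or lysine;
    the glutamic acid at position 80 is replaced by glutamine, asparagine, serine, tryptophan, alanine, or isoleucine.
  5. A method for producing the dual-restriction nanopore protein complex according to any one of claims 1-4, characterized in that the method comprises A1 or A2 as follows:
    A1, co-expressing one or more HfaB protein monomers and the truncated HfaA protein in a host cell, thereby allowing the dual-restriction nanopore protein complex to form in the cell;
    A2, contacting one or more HfaB protein monomers with the truncated HfaA protein, thereby allowing the dual-restriction nanopore protein complex to form in vitro.
  6. A biological material related to the dual-restriction nanopore protein complex according to any one of claims 1-4, characterized in that the related biological material is any one of the following:
    c1) a nucleic acid molecule encoding the dual-restriction nanopore protein complex according to any one of claims 1-4;
    c2) an expression cassette containing the nucleic acid molecule of c1);
    c3) a recombinant vector containing the nucleic acid molecule of c1), or a recombinant vector containing the expression cassette of c2);
    c4) a recombinant cell containing the nucleic acid molecule of c1), or a recombinant cell containing the expression cassette of c2), or a recombinant cell containing the recombinant vector of c3).
  7. Use of the dual-restriction nanopore protein complex according to any one of claims 1-4 or the related biological material according to claim 6 in detecting the presence, absence, or one or more characteristics of a target analyte, or in preparing a product for detecting the presence, absence, or one or more characteristics of a target analyte;
    preferably, the target analyte is one or more of nucleotides, nucleic acids, amino acids, oligopeptides, polypeptides, and proteins.
  8. A method for determining the presence, absence, or one or more characteristics of a target analyte, characterized in that the method comprises:
    A. contacting the target analyte with the dual-restriction nanopore protein complex according to any one of claims 1-4 such that the target analyte moves relative to the dual-restriction nanopore protein complex;
    B. taking one or more measurements as the target analyte moves relative to the dual-restriction nanopore protein complex, thereby determining the presence, absence, or one or more characteristics of the target analyte;
    preferably, the target analyte is one or more of nucleotides, nucleic acids, amino acids, oligopeptides, polypeptides, and proteins.
  9. A kit or device for determining the presence, absence, or one or more characteristics of a target analyte, characterized in that the kit comprises the dual-restriction nanopore protein complex according to any one of claims 1-4 or the related biological material according to claim 6, and a membrane;
    the device comprises the dual-restriction nanopore protein complex according to any one of claims 1-4, and a membrane;
    preferably, the target analyte is one or more of nucleotides, nucleic acids, amino acids, oligopeptides, polypeptides, and proteins.
  10. Use of the truncated HfaA protein as defined in claim 1 in preparing a dual-restriction nanopore protein complex.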
PCT/CN2024/093629 2024-05-16 2024-05-16 A dual-restriction nanopore protein complex and use thereof Pending WO2025236237A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2024/093629 WO2025236237A1 (zh) 2024-05-16 2024-05-16 A dual-restriction nanopore protein complex and use thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2024/093629 WO2025236237A1 (zh) 2024-05-16 2024-05-16 A dual-restriction nanopore protein complex and use thereof

Publications (1)

Publication Number Publication Date
WO2025236237A1 (zh) 2025-11-20

Family

ID=97719224

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2024/093629 Pending WO2025236237A1 (zh) 2024-05-16 2024-05-16 A dual-restriction nanopore protein complex and use thereof

Country Status (1)

Country Link
WO (1) WO2025236237A1 (zh)
