WO2025166050A1 - Nanopore-based sequencing of peptides
- Publication number
- WO2025166050A1 (PCT/US2025/013852)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- amino acid
- molecule
- peptide
- linker
- polymerizable
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N33/00—Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
- G01N33/48—Biological material, e.g. blood, urine; Haemocytometers
- G01N33/50—Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
- G01N33/68—Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving proteins, peptides or amino acids
- G01N33/6803—General methods of protein analysis not limited to specific proteins or families of proteins
- G01N33/6818—Sequencing of polypeptides
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N33/00—Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
- G01N33/48—Biological material, e.g. blood, urine; Haemocytometers
- G01N33/483—Physical analysis of biological material
- G01N33/487—Physical analysis of biological material of liquid biological material
- G01N33/48707—Physical analysis of biological material of liquid biological material by electrical means
- G01N33/48721—Investigating individual macromolecules, e.g. by translocation through nanopores
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N33/00—Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
- G01N33/48—Biological material, e.g. blood, urine; Haemocytometers
- G01N33/50—Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
- G01N33/68—Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving proteins, peptides or amino acids
- G01N33/6803—General methods of protein analysis not limited to specific proteins or families of proteins
- G01N33/6818—Sequencing of polypeptides
- G01N33/6824—Sequencing of polypeptides involving N-terminal degradation, e.g. Edman degradation
Definitions
- Protein signaling underpins a variety of cellular processes and serves important functions in viruses, cells, and living organisms.
- current technologies for studying proteins are limited in selectivity, sensitivity, or throughput, or require a priori knowledge.
- new approaches for characterizing and analyzing proteins are needed.
- a method of the present disclosure may comprise attaching a plurality of polymerizable molecules to amino acids of a peptide, thereby generating modified amino acids.
- the modified amino acids may be generated via an intramolecular expansion process.
- One or more processes described herein may involve sequencing via a nanopore or nanogap sequencer, allowing for identification of individual amino acids of the peptide in the order in which they appear or occur in the peptide.
- the methods, systems, compositions, and kits provided herein enable high-throughput single-molecule protein sequencing with high accuracy.
- a method for sequencing a peptide comprising a plurality of amino acids comprising: (a) providing a plurality of modified amino acids generated from at least a subset of the plurality of amino acids, wherein the plurality of modified amino acids comprises a plurality of polymerizable molecules; and (b) sequencing the plurality of modified amino acids, thereby determining an amino acid identity of each modified amino acid of the plurality of modified amino acids; wherein the sequencing has an average read accuracy that is greater than 80% for at least 3 different modified amino acid types.
- a method for sequencing a peptide at sub-attomole resolution comprising a plurality of amino acids, comprising: (a) providing a plurality of modified amino acids generated from at least a subset of the plurality of amino acids, wherein the plurality of modified amino acids comprises a plurality of polymerizable molecules; and (b) sequencing the plurality of modified amino acids, thereby determining an amino acid identity of each modified amino acid of the plurality of modified amino acids; wherein the sequencing has an individual identification accuracy that is greater than 80% for at least 3 different modified amino acid types.
- the plurality of polymerizable molecules is covalently linked.
- the plurality of polymerizable molecules comprises a plurality of nucleic acid molecules.
- the plurality of nucleic acid molecules comprises partially double-stranded DNA molecules.
- the plurality of nucleic acid molecules comprises branched nucleic acid molecules.
- the plurality of modified amino acids comprises a modified amino acid that comprises a non-naturally occurring chemical modification.
- the non-naturally occurring chemical modification is a protecting group.
- the non-naturally occurring chemical modification comprises phenylisothiocyanate, a xanthate, a guanidinylating agent, a dithioester, or a thiocarbamoyl.
- the non-naturally occurring chemical modification comprises a click chemistry moiety.
- the sequencing is performed using a nanopore or a nanogap.
- the nanopore comprises a transmembrane protein.
- the nanogap comprises an inorganic material.
- (b) comprises translocating the plurality of modified amino acids through or adjacent to the nanopore or the nanogap; measuring a plurality of signals from the plurality of modified amino acids; and using the plurality of signals to determine the amino acid identity of each modified amino acid.
- the sequencing is performed under conditions sufficient to reduce a translocation speed of the plurality of modified amino acids through the nanopore as compared to a control condition.
- the conditions sufficient to reduce the translocation speed comprise an increased viscosity, an addition of a radical agent, an addition of a DNA-binding peptide, an addition of ATP inhibitors, an addition of metal ions, an addition of an intercalating dye, a decreased temperature, an altered pH, an altered voltage, an addition of a strand displacing enzyme, replacement or addition of a helicase, addition of replication protein A, addition of a DNA repair or replication protein, addition of a chromatin remodeling protein, or addition of a denaturant, or a combination thereof, as compared to the control condition.
- the conditions sufficient to reduce said translocation speed comprise an increased viscosity, an addition of a radical agent, an addition of a DNA-binding peptide, an addition of ATP inhibitors, an addition of metal ions, an addition of an intercalating dye, a decreased temperature, an altered pH, an altered voltage, an addition of a strand displacing enzyme, replacement or addition of a helicase, addition of replication protein A, addition of a denaturant, or a combination thereof as compared to the control condition.
- the method further comprises, prior to (a), generating the plurality of modified amino acids.
- the generating comprises (I) providing a linker and a polymerizable molecule, (II) coupling the linker and the polymerizable molecule to an amino acid of a peptide, thereby generating an amino acid-linker complex, and (III) cleaving the amino acid, thereby generating the modified amino acid, wherein the modified amino acid comprises a cleaved amino acid, the linker, and the polymerizable molecule.
- (II) comprises coupling the linker to (i) the amino acid of the peptide and (ii) the polymerizable molecule.
- the linker is pre-coupled to the polymerizable molecule.
- the method further comprises contacting the modified amino acid with a helper molecule, wherein the helper molecule comprises a charged moiety, a chelator, or a hydrophobic or hydrophilic moiety.
- the method further comprises repeating (I)-(III).
- the method further comprises derivatizing the modified amino acid.
- the method further comprises (IV) coupling the amino acid-linker complex to a capture moiety.
- the capture moiety is coupled to a substrate. In some embodiments, (I), (II), (III), (IV) or a combination thereof are performed on the substrate.
- the capture moiety comprises a cleavable moiety and further comprising, cleaving the cleavable moiety. In some embodiments, the cleaving occurs subsequent to (II) and prior to (III). In some embodiments, the method further comprises, (V) removing the amino acid-linker complex from the substrate. In some embodiments, the removing comprises an enzymatic digestion, heat denaturation, or toehold-mediated strand displacement. In some embodiments, the capture moiety is coupled to the substrate via an anchor molecule. In some embodiments, the anchor molecule or the capture moiety comprises a PEG linker.
- the substrate comprises a plurality of anchor molecules configured to couple to the capture moiety, wherein an average distance between the plurality of anchor molecules is greater than 100 nanometers.
- the anchor molecule comprises a peptide nucleic acid (PNA).
- the capture moiety is coupled to the peptide.
- the capture moiety is coupled to a C-terminus of the peptide.
- the capture moiety comprises a nucleic acid barcode molecule.
- the nucleic acid barcode molecule identifies the peptide.
- the nucleic acid barcode molecule comprises temporal information. In some embodiments, (IV) is performed prior to (III).
- the method further comprises repeating (I)-(IV) on the peptide. In some embodiments, the repeating yields a stacked plurality of modified amino acids. In some embodiments, the repeating yields a plurality of detectable products, wherein the plurality of detectable products comprises a plurality of modified amino acids that are not coupled to one another. In some embodiments, the method further comprises concatemerizing at least a portion of the detectable products, thereby generating a stacked plurality of modified amino acids. In some embodiments, the method further comprises cleaving the modified amino acid from the capture moiety. In some embodiments, the capture moiety comprises a cleavable moiety.
- the polymerizable molecule and the capture moiety comprise nucleic acid molecules.
- the coupling of (IV) comprises hybridization.
- the hybridization is performed using a splint oligonucleotide.
- the splint oligonucleotide comprises a hairpin nucleic acid molecule.
- the capture moiety or the polymerizable molecule comprises a hairpin nucleic acid molecule.
- the coupling of (IV) comprises ligation.
- the capture moiety or the polymerizable molecule comprises a hairpin nucleic acid molecule.
- the ligation is performed using a splint oligonucleotide.
- the splint oligonucleotide comprises a hairpin nucleic acid molecule.
- the linker comprises an isothiocyanate, a xanthate, a guanidinylating agent, a dithioester, or a thiocarbamoyl.
- the linker is coupled to the polymerizable molecule.
- the linker is coupled to the polymerizable molecule using click chemistry.
- the polymerizable molecule is a DNA molecule comprising a click chemistry moiety.
- the click chemistry moiety is coupled to a nucleobase of the DNA molecule. In some embodiments, the click chemistry moiety is coupled to a backbone of the DNA molecule. In some embodiments, the amino acid is a terminal amino acid. In some embodiments, the linker comprises a charged moiety.
- a method for sequencing a peptide comprising a plurality of amino acids comprising: (a) providing the peptide; and (b) sequencing the peptide, thereby identifying at least 2 contiguous amino acids of the peptide; wherein the sequencing has an average read accuracy that is greater than 80% for at least 3 different amino acid types.
- (b) comprises generating a plurality of modified amino acids from the peptide and identifying the plurality of modified amino acids.
- the generating comprises (I) providing a linker and a polymerizable molecule, (II) coupling the linker to (i) an amino acid of a peptide and (ii) the polymerizable molecule to generate an amino acid-linker complex, and (III) cleaving the amino acid, thereby generating the modified amino acid, wherein the modified amino acid comprises a cleaved amino acid, the linker, and the polymerizable molecule.
- the method further comprises contacting the modified amino acid with a helper molecule, wherein the helper molecule comprises a charged moiety, a chelator, or a hydrophobic or hydrophilic moiety.
- the linker comprises a charged moiety.
- the method further comprises repeating (I)-(III).
- the method further comprises derivatizing the modified amino acid.
- the method further comprises (IV) coupling the amino acid-linker complex to a capture moiety.
- the capture moiety is coupled to a substrate.
- (I), (II), (III), (IV) or a combination thereof are performed on the substrate.
- the capture moiety comprises a cleavable moiety and further comprising cleaving the cleavable moiety. In some embodiments, the cleaving occurs subsequent to (II) and prior to (III). In some embodiments, the method further comprises (V) removing the amino acid-linker complex from the substrate. In some embodiments, the removing comprises an enzymatic digestion, heat denaturation, or toehold-mediated strand displacement. In some embodiments, the capture moiety is coupled to the substrate via an anchor molecule. In some embodiments, the anchor molecule or the capture moiety comprises a PEG linker.
- the substrate comprises a plurality of anchor molecules configured to couple to the capture moiety, wherein an average distance between the plurality of anchor molecules is greater than 100 nanometers.
- the anchor molecule comprises a peptide nucleic acid (PNA).
- the capture moiety is coupled to the peptide.
- the capture moiety is coupled to a C-terminus of the peptide.
- the capture moiety comprises a nucleic acid barcode molecule.
- the nucleic acid barcode molecule identifies the peptide.
- the nucleic acid barcode molecule comprises temporal information. In some embodiments, (IV) is performed prior to (III).
- the method further comprises repeating (I)-(IV) on the peptide. In some embodiments, the repeating yields a stacked plurality of modified amino acids. In some embodiments, the repeating yields a plurality of detectable products, wherein the plurality of detectable products comprises a plurality of modified amino acids that are not coupled to one another. In some embodiments, the method further comprises concatemerizing at least a portion of the detectable products, thereby generating a stacked plurality of modified amino acids. In some embodiments, the method further comprises cleaving the modified amino acid from the capture moiety. In some embodiments, the capture moiety comprises a cleavable moiety.
- the polymerizable molecule and the capture moiety comprise nucleic acid molecules.
- the coupling of (IV) comprises hybridization.
- the hybridization is performed using a splint oligonucleotide.
- the splint oligonucleotide comprises a hairpin nucleic acid molecule.
- the capture moiety or the polymerizable molecule comprises a hairpin nucleic acid molecule.
- the coupling of (IV) comprises ligation.
- the capture moiety or the polymerizable molecule comprises a hairpin nucleic acid molecule.
- the ligation is performed using a splint oligonucleotide.
- the splint oligonucleotide comprises a hairpin nucleic acid molecule.
- the linker comprises an isothiocyanate, a xanthate, a guanidinylating agent, a dithioester, or a thiocarbamoyl.
- the linker is coupled to the polymerizable molecule.
- the linker is coupled to the polymerizable molecule using click chemistry.
- the polymerizable molecule is a DNA molecule comprising a click chemistry moiety.
- the click chemistry moiety is coupled to a nucleobase of the DNA molecule. In some embodiments, the click chemistry moiety is coupled to a backbone of the DNA molecule. In some embodiments, the amino acid is a terminal amino acid. In some embodiments, the linker comprises a charged moiety.
- a method for sequencing a peptide comprising: (a) providing a modified amino acid generated from the peptide, wherein the modified amino acid comprises a polymerizable molecule; (b) translocating the modified amino acid through or adjacent to a nanopore, wherein (b) is performed under conditions sufficient to reduce a translocation speed of the modified amino acid through the nanopore as compared to a control condition; (c) measuring a signal generated from the modified amino acid during (b); and (d) using the signal generated from the modified amino acid to determine an identity of the modified amino acid.
- the polymerizable molecule comprises a nucleic acid molecule.
- the nucleic acid molecule is a partially double-stranded DNA molecule.
- the nucleic acid molecule is a branched nucleic acid molecule.
- the nucleic acid molecule comprises a modified base or modified nucleic acid backbone.
- the modified base or modified nucleic acid backbone is selected from the group consisting of a locked nucleic acid (LNA), a phosphoramidite, a click chemistry-conjugated base or click chemistry-conjugated sugar backbone, a spacer moiety, and a combination thereof.
- (b) occurs at ambient temperature.
- the conditions sufficient to reduce the translocation speed comprise an increased viscosity, an addition of a radical agent, an addition of a DNA-binding peptide, an addition of ATP inhibitors, an addition of metal ions, an addition of an intercalating dye, a decreased temperature, an altered pH, an altered voltage, an addition of a strand displacing enzyme, replacement or addition of a helicase, addition of replication protein A, or addition of a denaturant, as compared to the control condition.
- the modified amino acid comprises a non-naturally occurring chemical modification.
- the non-naturally occurring chemical modification is a protecting group.
- the non-naturally occurring chemical modification comprises phenylisothiocyanate, a xanthate, a guanidinylating agent, a dithioester, or a thiocarbamoyl.
- the non-naturally occurring chemical modification comprises a click chemistry moiety.
- the nanopore comprises a transmembrane protein.
- the method further comprises, prior to (a), generating the modified amino acid.
- the generating comprises (I) providing a linker and a polymerizable molecule, (II) coupling the linker to (i) an amino acid of the peptide and (ii) the polymerizable molecule to generate an amino acid-linker complex, and (III) cleaving the amino acid, thereby generating the modified amino acid, wherein the modified amino acid comprises a cleaved amino acid, the linker, and the polymerizable molecule.
- the method further comprises contacting the modified amino acid with a helper molecule, wherein the helper molecule comprises a charged moiety, a chelator, or a hydrophobic or hydrophilic moiety.
- the linker comprises a charged moiety.
- the method further comprises repeating (I)-(III).
- the method further comprises derivatizing the modified amino acid.
- the method further comprises (IV) coupling the amino acid-linker complex to a capture moiety.
- the capture moiety is coupled to a substrate.
- (I), (II), (III), (IV) or a combination thereof are performed on the substrate.
- the capture moiety comprises a cleavable moiety and further comprising cleaving the cleavable moiety. In some embodiments, the cleaving occurs subsequent to (II) and prior to (III). In some embodiments, the method further comprises (V) removing the amino acid-linker complex from the substrate. In some embodiments, the removing comprises an enzymatic digestion, heat denaturation, or toehold-mediated strand displacement. In some embodiments, the capture moiety is coupled to the substrate via an anchor molecule. In some embodiments, the anchor molecule or the capture moiety comprises a PEG linker.
- the substrate comprises a plurality of anchor molecules configured to couple to the capture moiety, wherein an average distance between the plurality of anchor molecules is greater than 100 nanometers.
- the anchor molecule comprises a peptide nucleic acid (PNA).
- the capture moiety is coupled to the peptide.
- the capture moiety is coupled to a C-terminus of the peptide.
- the capture moiety comprises a nucleic acid barcode molecule.
- the nucleic acid barcode molecule identifies the peptide.
- the nucleic acid barcode molecule comprises temporal information. In some embodiments, (IV) is performed prior to (III).
- the method further comprises repeating (I)-(IV) on the peptide. In some embodiments, the repeating yields a stacked plurality of modified amino acids. In some embodiments, the repeating yields a plurality of detectable products, wherein the plurality of detectable products comprises a plurality of modified amino acids that are not coupled to one another. In some embodiments, the method further comprises concatemerizing at least a portion of the detectable products, thereby generating a stacked plurality of modified amino acids. In some embodiments, the method further comprises cleaving the modified amino acid from the capture moiety. In some embodiments, the capture moiety comprises a cleavable moiety.
- the polymerizable molecule and the capture moiety comprise nucleic acid molecules.
- the coupling of (IV) comprises hybridization.
- the hybridization is performed using a splint oligonucleotide.
- the splint oligonucleotide comprises a hairpin nucleic acid molecule.
- the capture moiety or the polymerizable molecule comprises a hairpin nucleic acid molecule.
- the coupling of (IV) comprises ligation.
- the capture moiety or the polymerizable molecule comprises a hairpin nucleic acid molecule.
- the ligation is performed using a splint oligonucleotide.
- the splint oligonucleotide comprises a hairpin nucleic acid molecule.
- the linker comprises an isothiocyanate, a xanthate, a guanidinylating agent, a dithioester, or a thiocarbamoyl.
- the linker is coupled to the polymerizable molecule.
- the linker is coupled to the polymerizable molecule using click chemistry.
- the polymerizable molecule is a DNA molecule comprising a click chemistry moiety.
- the click chemistry moiety is coupled to a nucleobase of the DNA molecule. In some embodiments, the click chemistry moiety is coupled to a backbone of the DNA molecule. In some embodiments, the amino acid is a terminal amino acid. In some embodiments, the linker comprises a charged moiety.
- (d) comprises determining a probability that a modified amino acid is an amino acid type or a subset of amino acid types.
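The probability-based identification in (d) can be illustrated with a minimal sketch. It assumes each modified amino acid type produces a roughly Gaussian signal level and that all types are equally likely a priori; neither assumption comes from the disclosure. The `REFERENCE` values and the function names are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical reference signal levels (mean, standard deviation) per modified
# amino acid type. Illustrative numbers only; they are not from the disclosure.
REFERENCE = {
    "A": (50.0, 2.0),
    "G": (55.0, 2.0),
    "K": (62.0, 3.0),
}


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and std at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def type_probabilities(signal, reference=REFERENCE):
    """Posterior probability of each type for one signal, assuming a uniform prior."""
    likelihoods = {
        aa: gaussian_pdf(signal, mean, std) for aa, (mean, std) in reference.items()
    }
    total = sum(likelihoods.values())
    if total == 0.0:
        # The signal is so far from every reference that all densities underflow.
        raise ValueError("signal is too far from all reference levels")
    return {aa: lk / total for aa, lk in likelihoods.items()}


def subset_probability(signal, subset, reference=REFERENCE):
    """Probability that the signal came from any amino acid type in `subset`."""
    probs = type_probabilities(signal, reference)
    return sum(probs[aa] for aa in subset)
```

For example, `type_probabilities(50.0)` assigns most of the probability to the hypothetical type "A", and `subset_probability(50.0, {"G", "K"})` gives the combined probability for a subset of types.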
- a method of processing a peptide comprising: (a) providing the peptide and a linker, wherein the linker is capable of coupling to an amino acid of the peptide and wherein the linker is coupled to a nucleic acid molecule; (b) coupling the linker to the amino acid of the peptide to generate an amino acid-linker complex; (c) coupling the nucleic acid molecule to a capture moiety; (d) cleaving the amino acid from the peptide to yield a modified amino acid comprising a cleaved amino acid, the linker, and the nucleic acid molecule; and (e) performing a nucleic acid extension reaction of the nucleic acid molecule, thereby generating a detectable product comprising the modified amino acid.
- the method further comprises contacting the modified amino acid or the detectable product with a helper molecule, wherein the helper molecule comprises a charged moiety, a chelator, or a hydrophobic or hydrophilic moiety.
- the linker comprises a charged moiety.
- the modified amino acid comprises a non-naturally occurring chemical modification.
- the non-naturally occurring chemical modification is a protecting group.
- the non-naturally occurring chemical modification comprises phenylisothiocyanate, a xanthate, a guanidinylating agent, a dithioester, or a thiocarbamoyl.
- the non-naturally occurring chemical modification comprises a click chemistry moiety.
- the nucleic acid molecule is a branched nucleic acid molecule.
- the method further comprises sequencing the detectable product.
- the sequencing is performed using a nanopore or a nanogap.
- the nanopore comprises a transmembrane protein.
- the nanogap comprises an inorganic material.
- the sequencing comprises translocating the detectable product through or adjacent to the nanopore or the nanogap; measuring a signal from the detectable product; and using the signal to determine an amino acid identity of the detectable product.
- the sequencing is performed under conditions sufficient to reduce a translocation speed of the detectable product through the nanopore as compared to a control condition.
- the conditions sufficient to reduce the translocation speed comprise an increased viscosity, an addition of a radical agent, an addition of a DNA-binding peptide, an addition of ATP inhibitors, an addition of metal ions, an addition of an intercalating dye, a decreased temperature, an altered pH, an altered voltage, an addition of a strand displacing enzyme, replacement or addition of a helicase, addition of replication protein A, or addition of a denaturant, as compared to the control condition.
- the capture moiety is coupled to a substrate. In some embodiments, the capture moiety is coupled to the substrate via a PEG linker. In some embodiments, the substrate comprises a plurality of capture moieties, wherein an average distance between the plurality of capture moieties is greater than 100 nanometers.
- the capture moiety is coupled to the peptide. In some embodiments, the capture moiety is coupled to a C-terminus of the peptide.
- the capture moiety comprises a nucleic acid barcode molecule.
- the nucleic acid barcode molecule identifies the peptide.
- the nucleic acid barcode molecule comprises temporal information.
- the method further comprises repeating (a)-(e) on the peptide.
- the repeating yields a plurality of detectable products that are not coupled to one another.
- the method further comprises releasing the modified amino acid from the capture moiety.
- the capture moiety comprises a cleavable moiety.
- the releasing comprises dehybridization.
- the capture moiety comprises an additional nucleic acid molecule.
- the coupling of (c) comprises hybridization.
- the hybridization is performed using a splint oligonucleotide.
- the coupling of (c) comprises ligation.
- the linker comprises an isothiocyanate, a xanthate, a guanidinylating agent, a dithioester, or a thiocarbamoyl.
- the linker is coupled to the nucleic acid molecule using click chemistry.
- the amino acid is a terminal amino acid.
- a method of generating increased reads of a modified amino acid comprising (a) providing the modified amino acid, wherein the modified amino acid comprises a polymerizable molecule; (b) translocating the modified amino acid through or adjacent to a nanopore or a nanogap; (c) circularizing the polymerizable molecule thereby generating a circularized, modified amino acid; and (d) translocating the circularized, modified amino acid through or adjacent to the nanopore or the nanogap.
- the method further comprises, during (b) and (d), measuring a signal generated from the nanopore or nanogap while the modified amino acid translocates through the nanopore or the nanogap. In some embodiments, the method further comprises using the signal to determine an identity of the modified amino acid.
- the polymerizable molecule comprises a nucleic acid molecule.
- the modified amino acid comprises a non-naturally occurring chemical modification.
- the non-naturally occurring chemical modification is a protecting group.
- the non-naturally occurring chemical modification comprises phenylisothiocyanate, a xanthate, a guanidinylating agent, a dithioester, or a thiocarbamoyl.
- the non-naturally occurring chemical modification comprises a click chemistry moiety.
- the nanopore comprises a transmembrane protein. In some embodiments, the nanogap comprises an inorganic material.
- (b) is performed under conditions sufficient to reduce a translocation speed of the modified amino acid through the nanopore as compared to a control condition.
- the conditions sufficient to reduce the translocation speed comprise an increased viscosity, an addition of a radical agent, an addition of a DNA-binding peptide, an addition of ATP inhibitors, an addition of metal ions, an addition of an intercalating dye, a decreased temperature, an altered pH, an altered voltage, an addition of a strand displacing enzyme, replacement or addition of a helicase, addition of replication protein A, or addition of a denaturant, as compared to the control condition.
- the method further comprises generating the modified amino acid, wherein the generating comprises (I) providing a linker and a polymerizable molecule, (II) coupling the linker to (i) an amino acid of a peptide and (ii) the polymerizable molecule to generate an amino acid-linker complex, and (III) cleaving the amino acid, thereby generating the modified amino acid, wherein the modified amino acid comprises a cleaved amino acid, the linker, and the polymerizable molecule.
- the method further comprises contacting the modified amino acid with a helper molecule, wherein the helper molecule comprises a charged moiety, a chelator, or a hydrophobic or hydrophilic moiety. In some embodiments, the method further comprises repeating (I)-(III) on the peptide.
- a method for sequencing a peptide comprising: (a) providing a modified amino acid generated from the peptide, wherein the modified amino acid comprises a polymerizable molecule comprising M monomers, wherein M is a positive integer; (b) translocating the modified amino acid through or adjacent to a nanopore, wherein the translocating comprises ratcheting of a first monomer of the M monomers through or adjacent to the nanopore; (c) during (b), measuring a first state of a first set of N monomers of the M monomers, wherein N < M; wherein the first state is associated with the ratcheting of the first monomer; (d) ratcheting a second monomer of the M monomers through the nanopore; (e) measuring a second state of a second set of N monomers of the M monomers; wherein the second state is associated with the ratcheting of the second monomer; (f) repeating (d)-(e)
- the N measured states correspond to N contiguous monomers of the M monomers. In some embodiments, the N measured states correspond to N monomers of the M monomers, wherein a subset of the N monomers is non-contiguous. In some embodiments, the N measured states correspond to fewer than N monomers. In some embodiments, the N measured states correspond to a non-integer value of monomers. In some embodiments, the measuring comprises measuring an ionic current blockade. In some embodiments, the N measured states represent a conformational or orientational state of the N monomers. In some embodiments, (g) comprises comparing the N measured states to a reference measurement of known modified amino acid types.
- the method further comprises repeating (a)-(g) for a plurality of modified amino acids generated from the peptide, thereby sequencing the peptide.
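The ratcheting scheme above, in which each one-monomer step exposes a new set of N monomers to measurement, can be pictured as a sliding window over the M monomers. The sketch below is a toy model under stated assumptions: windows are contiguous, N does not exceed M, the measured state is the mean of made-up per-monomer blockade contributions, and comparison to references uses a squared-difference score. The `BLOCKADE` table and all function names are hypothetical.

```python
# Hypothetical per-monomer contributions to an ionic current blockade.
# Illustrative numbers only; they are not from the disclosure.
BLOCKADE = {"A": 1.0, "C": 2.0, "G": 3.0, "T": 4.0}


def window_states(monomers, n):
    """List the contiguous n-monomer windows seen as the polymer ratchets by one monomer."""
    m = len(monomers)
    if not 0 < n <= m:
        raise ValueError("window size n must satisfy 0 < n <= number of monomers")
    return [tuple(monomers[i:i + n]) for i in range(m - n + 1)]


def measure_states(monomers, n):
    """Toy signal model: one state per ratchet step, the mean blockade over the window."""
    return [sum(BLOCKADE[b] for b in window) / n for window in window_states(monomers, n)]


def closest_reference(measured, references, n):
    """Return the name of the reference whose state trace best matches `measured`."""
    best_name, best_score = None, float("inf")
    for name, monomers in references.items():
        trace = measure_states(monomers, n)
        if len(trace) != len(measured):
            continue  # only traces with the same number of ratchet steps are comparable
        score = sum((a - b) ** 2 for a, b in zip(measured, trace))
        if score < best_score:
            best_name, best_score = name, score
    return best_name
```

With a window of 2, the polymer "ACGT" yields three states, one per ratchet step, and `closest_reference` selects the known reference whose trace is nearest to the measured one.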
- a method for sequencing at sub-attomole resolution a peptide comprising a plurality of amino acids, comprising: (a) providing a plurality of modified amino acids generated from at least a subset of the plurality of amino acids, wherein the plurality of modified amino acids comprises a plurality of polymerizable molecules; and (b) sequencing the plurality of modified amino acids, thereby determining an amino acid identity of each modified amino acid of the plurality of modified amino acids; wherein the sequencing has an average identification accuracy that is greater than 80% for at least 3 different amino acid types of the 20 canonical proteinogenic amino acid types.
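
One way to read the accuracy criterion above is sketched below. The sketch assumes that accuracy is computed per true amino acid type as the fraction of correct calls, and that the criterion is met when at least three types each exceed 80%. This is only one possible interpretation of "average identification accuracy"; the function names and the example calls are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def per_type_accuracy(calls: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Fraction of correct calls for each true amino acid type.

    Each item in `calls` is a (true_type, called_type) pair.
    """
    total = defaultdict(int)
    correct = defaultdict(int)
    for true_type, called_type in calls:
        total[true_type] += 1
        if true_type == called_type:
            correct[true_type] += 1
    return {t: correct[t] / total[t] for t in total}


def meets_criterion(calls: Iterable[Tuple[str, str]],
                    threshold: float = 0.80,
                    min_types: int = 3) -> bool:
    """True if at least `min_types` types have accuracy strictly above `threshold`."""
    accuracy = per_type_accuracy(calls)
    return sum(1 for a in accuracy.values() if a > threshold) >= min_types


# Hypothetical calls: Ala 9/10 correct, Gly 5/5, Lys 4/5.
example_calls = ([("Ala", "Ala")] * 9 + [("Ala", "Gly")]
                 + [("Gly", "Gly")] * 5
                 + [("Lys", "Lys")] * 4 + [("Lys", "Arg")])
```

In this example only Ala and Gly exceed 80% (Lys is exactly 80%), so the criterion is not met until a third type clears the threshold.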
- Another aspect of the present disclosure provides a method for peptide sequencing comprising sequencing a peptide or plurality of peptides at sub-attomole resolution, wherein the sequencing is capable of discriminating all 20 proteinogenic amino acids.
- a method for peptide sequencing comprising sequencing a peptide or plurality of peptides using a nanopore, wherein the sequencing is capable of discriminating all 20 proteinogenic amino acids.
- the sequencing is capable of discriminating post-translationally modified amino acids, which may be synthetic or naturally occurring.
- (a) comprises generating a plurality of modified amino acids from the peptide and identifying the plurality of modified amino acids.
- the generating comprises (I) providing a linker and a polymerizable molecule, (II) coupling the linker to (i) an amino acid of the peptide and (ii) the polymerizable molecule to generate an amino acid-linker complex, and (III) cleaving the amino acid, thereby generating a modified amino acid of the plurality of modified amino acids, wherein the modified amino acid comprises a cleaved amino acid, the linker, and the polymerizable molecule.
- the method further comprises contacting the modified amino acid with a helper molecule, wherein the helper molecule comprises a charged moiety, a chelator, or a hydrophobic or hydrophilic moiety.
- the linker comprises a charged moiety. In some embodiments, the method further comprises repeating (I)-(III). In some embodiments, the method further comprises derivatizing the modified amino acid. In some embodiments, the method further comprises (IV) coupling the amino acid-linker complex to a capture moiety. In some embodiments, the capture moiety is coupled to a substrate. In some embodiments, the capture moiety is coupled to the substrate via a PEG linker. In some embodiments, (I), (II), (III), (IV) or a combination thereof are performed on the substrate. In some embodiments, (I), (II), (III), (IV) or a combination thereof are performed apart from the substrate.
- the capture moiety comprises a cleavable moiety and further comprising cleaving the cleavable moiety. In some embodiments, the cleaving occurs subsequent to (II) and prior to (III). In some embodiments, the method further comprises (V) removing the amino acid-linker complex from the substrate. In some embodiments, the removing comprises an enzymatic digestion, heat denaturation, or toehold-mediated strand displacement. In some embodiments, the capture moiety is coupled to the substrate via an anchor molecule. In some embodiments, the anchor molecule or the capture moiety comprises a PEG linker.
- the substrate comprises a plurality of anchor molecules configured to couple to the capture moiety, wherein an average distance between the plurality of anchor molecules is greater than 100 nanometers.
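
The spacing condition above can be checked numerically. The sketch below assumes that "average distance" means the mean nearest-neighbour distance between anchor positions on a planar substrate, with coordinates in nanometers. That reading, the function name, and the example coordinates are assumptions for illustration only.

```python
import math
from typing import Sequence, Tuple


def mean_nearest_neighbor_nm(points: Sequence[Tuple[float, float]]) -> float:
    """Mean distance (nm) from each anchor to its closest neighbouring anchor."""
    if len(points) < 2:
        raise ValueError("at least two anchor positions are required")
    total = 0.0
    for i, p in enumerate(points):
        # Distance from anchor i to the nearest of all other anchors.
        total += min(math.dist(p, q) for j, q in enumerate(points) if j != i)
    return total / len(points)


# Hypothetical anchors on a 150 nm square grid.
grid = [(0.0, 0.0), (150.0, 0.0), (0.0, 150.0), (150.0, 150.0)]
sparse_enough = mean_nearest_neighbor_nm(grid) > 100.0
```

The brute-force loop is quadratic in the number of anchors, which is adequate for a small illustrative set; a spatial index would be the natural choice for a real surface map.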
- the anchor molecule comprises a peptide nucleic acid (PNA).
- the capture moiety is coupled to the peptide.
- the capture moiety is coupled to a C-terminus of the peptide.
- the capture moiety comprises a nucleic acid barcode molecule.
- the nucleic acid barcode molecule identifies the peptide.
- the nucleic acid barcode molecule comprises temporal information. In some embodiments, (IV) is performed prior to (III).
- the method further comprises repeating (I)-(IV) on the peptide. In some embodiments, the repeating yields a stacked plurality of modified amino acids. In some embodiments, the repeating yields a plurality of detectable products, wherein the plurality of detectable products comprises a plurality of modified amino acids that are not coupled to one another. In some embodiments, the method further comprises concatemerizing at least a portion of the detectable products, thereby generating a stacked plurality of modified amino acids. In some embodiments, the method further comprises cleaving the modified amino acid from the capture moiety. In some embodiments, the capture moiety comprises a cleavable moiety.
- the polymerizable molecule and the capture moiety comprise nucleic acid molecules.
- the coupling of (IV) comprises hybridization.
- the hybridization is performed using a splint oligonucleotide.
- the splint oligonucleotide comprises a hairpin nucleic acid molecule.
- the capture moiety or the polymerizable molecule comprises a hairpin nucleic acid molecule.
- the coupling of (IV) comprises ligation.
- the capture moiety or the polymerizable molecule comprises a hairpin nucleic acid molecule.
- the ligation is performed using a splint oligonucleotide.
- the splint oligonucleotide comprises a hairpin nucleic acid molecule.
- the linker comprises an isothiocyanate, a xanthate, a guanidinylating agent, a dithioester, or a thiocarbamoyl.
- the linker is coupled to the polymerizable molecule.
- the linker is coupled to the polymerizable molecule using click chemistry.
- the polymerizable molecule is a DNA molecule comprising a click chemistry moiety.
- Another aspect of the present disclosure provides a system comprising one or more computer processors and computer memory coupled thereto.
- the computer memory comprises machine executable code that, upon execution by the one or more computer processors, implements any of the methods above or elsewhere herein.
- FIG. 1A schematically shows an example workflow for processing polymeric analyte molecules (e.g., peptides) described herein.
- FIG. 1B schematically shows another example workflow for processing polymeric analytes in solution or on a substrate.
- FIG. 1C schematically shows another example workflow for processing polymeric analytes in solution or on a substrate.
- FIG. 1D schematically shows yet another example workflow for processing polymeric analytes in solution or on a substrate.
- FIG. 1E schematically shows a workflow for analyzing a polymeric analyte.
- FIG. 1F schematically shows contacting a helper molecule with a modified amino acid or stacked plurality of modified amino acids.
- FIG. 1G schematically shows another workflow for generating a stacked plurality of modified monomers.
- FIG. 1H schematically shows a workflow for reducing intermolecular crosstalk as described herein.
- FIG. 2A schematically shows an exemplary linker for coupling polymerizable molecules to polymeric analytes.
- FIG. 2B shows another exemplary linker for coupling polymerizable molecules to polymeric analytes.
- FIG. 3 schematically shows an example modified amino acid, as described herein.
- FIG. 4A schematically shows a workflow for increasing the number of reads of a modified amino acid or stacked plurality of modified amino acids.
- FIG. 4B schematically shows a workflow for sequencing a stacked plurality of modified amino acids in which the modified amino acids are spatially separated.
- FIG. 4C schematically shows a workflow for generating a plurality of stacked plurality of modified amino acids.
- FIG. 4D schematically shows a workflow for increasing the number of reads from a stacked plurality of modified amino acids.
- FIG. 5 schematically shows a computer system described herein.
- FIG. 6A shows example data of a current profile generated from stacked pluralities of modified amino acids.
- FIG. 6B shows example current traces of individual stacked pluralities of modified amino acids.
- FIG. 7 shows example current traces of a stacked plurality of modified amino acids.
- FIG. 8 shows example data of classification accuracy of a stacked plurality of modified amino acids.
- FIG. 9 shows example data of classification accuracy of different modified amino acid types.
- FIG. 10 shows example data of classification accuracy as a function of an applied confidence threshold.
- FIG. 11 shows example data of classification accuracy of different modified amino acids comprising post-translational modifications or other modifications.
- FIG. 12 shows example data of a stacked plurality of five modified amino acids from a peptide.
- FIG. 13 shows example current traces of a stacked plurality of five modified amino acids from a peptide.
- FIG. 14 shows example current trace data from a tripeptide and a modified amino acid.
- FIG. 15 shows an example workflow for processing proteins or peptides in preparation for peptide sequencing.
- FIG. 16 shows example data of a sample processing approach to conjugate one or more linkers to a processed peptide.
- FIG. 17 shows example data of conjugation of a capture moiety to a processed peptide.
- FIG. 18 shows example data of cleavage of a processed peptide comprising a linker coupled thereto.
- FIG. 19 shows example data of current traces of individual modified amino acids comprising a spacer moiety.
- FIG. 20 shows example data of alternative polymerizable molecules for processing polymeric analytes described herein.
- FIG. 21 shows an example scheme for chemical expansion of a peptide described herein.
- FIG. 22A shows example alternative workflows for processing polymeric analytes.
- FIG. 22B shows example data from one example alternative workflow.
- FIG. 22C shows example data from another example alternative workflow.
- FIG. 22D shows example data of yet another example alternative workflow.
- FIG. 22E shows example data of yet another example alternative workflow.
- FIG. 23 shows an example scheme of reducing intermolecular crosstalk as described herein.
- references to “one embodiment,” “an embodiment,” “example embodiment,” “some embodiments,” “certain embodiments,” “various embodiments,” etc. indicate that the embodiment(s) of the disclosed technology so described may include a particular feature, structure, or characteristic, but not every embodiment necessarily includes the particular feature, structure, or characteristic. Further, repeated use of the phrase “in one embodiment” does not necessarily refer to the same embodiment, although it may.
- Ranges may be expressed herein as from “about” or “approximately” or “substantially” one particular value and/or to “about” or “approximately” or “substantially” another particular value. When such a range is expressed, other exemplary embodiments include from the one particular value and/or to the other particular value. Further, the term “about” means within an acceptable error range for the particular value as determined by one of ordinary skill in the art, which will depend in part on how the value is measured or determined, i.e., the limitations of the measurement system. For example, “about” can mean within an acceptable standard deviation, per the practice in the art.
- “about” can mean a range of up to ±20%, preferably up to ±10%, more preferably up to ±5%, and more preferably still up to ±1% of a given value.
- the term can mean within an order of magnitude, preferably within 2-fold, of a value.
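
The two numeric readings of “about” given above (a percentage band, or a fold range) can be expressed as simple checks. This is a minimal sketch; the function names are hypothetical, and the fold check assumes strictly positive values.

```python
def within_percent(measured: float, nominal: float, percent: float) -> bool:
    """True if `measured` lies within plus or minus `percent` % of `nominal`."""
    return abs(measured - nominal) <= abs(nominal) * percent / 100.0


def within_fold(measured: float, nominal: float, fold: float) -> bool:
    """True if `measured` is within `fold`-fold of `nominal`.

    fold=2 gives the "within 2-fold" reading; fold=10 gives the
    "within an order of magnitude" reading. Values must be positive.
    """
    if measured <= 0 or nominal <= 0 or fold < 1:
        raise ValueError("measured and nominal must be positive and fold >= 1")
    ratio = measured / nominal
    return 1.0 / fold <= ratio <= fold


# Hypothetical values: 108 against a nominal value of 100.
ok_10_percent = within_percent(108.0, 100.0, 10.0)
ok_5_percent = within_percent(108.0, 100.0, 5.0)
```

Here 108 is “about” 100 under the ±10% band but not under the ±5% band, which shows why the applicable tolerance has to be stated or inferred from the measurement method.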
- protein generally refers to a molecule comprising two or more amino acids joined by a peptide bond.
- a protein may also be referred to as a “polypeptide”, “oligopeptide”, or “peptide”.
- a protein can be a naturally occurring molecule, or a synthetic molecule.
- a protein may include one or more non-natural amino acids, modified amino acids, or non-amino acid linkers.
- a protein may contain D-amino acid enantiomers, L-amino acid enantiomers or both. Amino acids of a protein may be modified naturally or synthetically, such as by post-translational modifications or by chemical modification.
- proteins may be distinguished from each other based on different genes from which they are expressed in an organism, different primary sequence length or different primary sequence composition. Proteins expressed from the same gene may nonetheless be different proteoforms, for example, being distinguished based on non-identical length, non-identical amino acid sequence or non-identical post-translational modifications. Different proteins can be distinguished based on one or both of gene of origin and proteoform state.
- peptide may refer to any short, single peptide chain.
- a peptide may be no more than about 100, 95, 90, 85, 80, 75, 70, 65, 60, 55, 50, 45, 40, 35, 30, 25, 20, 15, 10, 5, or less than about 5 amino acids in length.
- a peptide may have a known or unknown biological function or activity.
- Peptides can include natural, synthetic, modified, or degraded proteins or peptides, or a combination thereof.
- Peptides can include proteinogenic, natural, synthetic, or modified amino acids or amino acid residues, or a combination thereof.
- single analyte may refer to an analyte that is individually manipulated or distinguished from other analytes.
- a single analyte may comprise a biomolecule or a synthetic molecule.
- a single analyte may comprise a small molecule.
- a single analyte can be a single molecule (e.g., a single biomolecule such as a single protein, nucleic acid molecule, affinity reagent, lipid, carbohydrate, metabolite, hapten, small molecule, pharmaceutical compound, nanoparticle, amino acid derivative, synthetic amino acid, etc.), a single complex of two or more molecules (e.g., a multimeric protein having two or more separable subunits, a single protein attached to a nucleic acid molecule or a single protein attached to an affinity reagent), a single particle, or the like.
- polypeptide refers to two or more amino acids linked together by a peptide bond.
- polypeptide includes proteins that have a C-terminal end and an N-terminal end as generally known in the art and may be synthetic in origin or naturally occurring.
- at least a portion of the polypeptide refers to 2 or more amino acids of the polypeptide.
- a polypeptide may comprise one or more peptides.
- a portion of the polypeptide includes at least: 1, 5, 10, 20, 30 or 50 amino acids, either consecutive or with gaps, of the complete amino acid sequence of the polypeptide, or the full amino acid sequence of the polypeptide.
- affixed refers to a connection between a polypeptide and a substrate such that at least a portion of the polypeptide and the substrate are held in physical proximity.
- the term “affixed” encompasses both an indirect or direct connection and may be reversible or irreversible, for example the connection is optionally a covalent bond or a non-covalent bond.
- sample refers to a collected substance or material that comprises or is suspected to comprise one or more analytes of interest (e.g., biomolecules, e.g., polypeptides).
- a sample may be modified for purposes such as storage or stability.
- a sample may be naturally occurring or synthetic.
- a sample may be processed to separate or remove unwanted fractions or impurities from the analyte(s) of interest.
- a sample may be enriched or purified.
- a sample may comprise a fraction of a separation process (e.g., chromatography, fractionation, electrophoresis, etc.).
- a sample may not be subjected to processing that separates or removes any unwanted fractions or impurities from the analyte(s) of interest.
- a sample may be obtained from any suitable source or location, including from organisms, cells, tissues, cell preparations, cell-free compositions, or the environment (e.g., air, water, dirt, soil, agriculture, dust).
- a sample may be obtained from an organism or part of an organism, such as from a fluid, tissue, or cell.
- a sample may include biological and/or non-biological components.
- biological sample or “biological source” refer to a sample that is derived from a predominantly biological system or organism, such as one or more viral particles, cells (e.g. individualized cells), organelles (e.g. individualized organelles), tissues, organs, bodily fluids, bone, cartilage, and exoskeleton.
- A biological sample may comprise a majority of biological material on a mass basis, excluding the weight of fluid within the sample.
- Biological samples may comprise one or more proteins, referred to herein as protein samples.
- Biological samples can be acquired from various sources, e.g.
- a biological sample may be processed to purify and retain one or more biomolecules (e.g., proteins, nucleic acids, carbohydrates, lipids, glycoproteins, lipoproteins, metabolites, etc.) from the biological sample.
- a biological sample e.g., a protein sample
- a biological sample can also result from tissue specimens, such as biopsy samples, which may optionally be processed to liberate biomolecules (e.g., proteins) contained therein.
- tissue samples may also be derived from in vivo specimens, including fresh, frozen, acute, and fixed tissues.
- a sample or a biological sample may comprise non-biological molecules, including but not limited to nanoparticles, polymers, haptens, small molecules, chemicals, fluorescent reagents, inert materials, pharmaceuticals, food additives, environmental contaminants, solvents, industrial chemicals, nanomaterials, radioisotopes, by-products from non-biological molecules.
- antibody and “immunoglobulin” may generally refer to proteins that can recognize and bind to a specific antigen.
- An antibody or immunoglobulin may refer to an antibody isotype, fragments of antibodies including, but not limited to, Fab, Fv, scFv, vHH, and Fd fragments, chimeric antibodies, humanized antibodies, single-chain antibodies, and fusion proteins including an antigen-binding portion of an antibody and a non-antibody protein.
- the antibodies may be detectably labeled, e.g., with a fluorophore, radioisotope, enzyme (e.g., a peroxidase), epitope tag, which generates a detectable product, fluorescent protein, nucleic acid barcode sequence, and the like.
- the antibodies may be further conjugated to other moieties, such as members of specific binding pairs, e.g., biotin (member of biotin-avidin specific binding pair), and the like.
- Also encompassed by the terms are Fab', Fv, F(ab')2, and other antibody fragments that retain specific binding to antigen.
- Antibodies may exist in a variety of other forms including for example, Fv, Fab, and (Fab)2, as well as bi-functional (i.e., bi-specific) hybrid antibodies (e.g., Lanzavecchia et al., Eur. J. Immunol. 17, 105 (1987)) and in single chains (e.g., Huston et al., Proc. Natl. Acad. Sci. U.S.A., 85, 5879-5883 (1988) and Bird et al., Science, 242, 423-426 (1988), which are incorporated herein by reference).
- Hood et al., Immunology, Benjamin, N.Y., 2nd ed. (1984), and
- Binding generally refers to a covalent or non-covalent interaction between two molecules (referred to herein as “binding partners”, e.g., a substrate and an enzyme or an antibody and an epitope). Binding between binding partners may be specific or non-specific. Binding between binding partners may involve one or more additional molecules (e.g., biomolecules) or enhancer molecules or substrates.
- binding partners e.g., a binding partner and a cognate molecule
- a specific binding interaction may entail a binding partner that binds to a cognate molecule.
- the specific binding interaction may entail the binding of the binding partner to its cognate molecule at a significantly or substantially higher level or with greater affinity as compared to the binding of the binding partner to a non-cognate molecule.
- a specific binding interaction may entail a first binding partner that has greater selectivity of binding to the cognate molecule as compared to a non-cognate molecule.
- nucleic acid refers to a polymeric form of naturally occurring or synthetic nucleotides, or analogs thereof, of any length.
- a nucleic acid molecule may comprise one or more deoxyribonucleotides, deoxynucleotide triphosphates, dideoxynucleotide triphosphates, deoxynucleotide hexaphosphates, dideoxynucleotide hexaphosphates, ribonucleotides, hexitol nucleotides, cyclohexane nucleotides, or analogs or combinations thereof.
- a nucleic acid molecule may comprise, e.g., DNA, RNA, HNA, CeNA, and modified forms thereof.
- a nucleic acid molecule may comprise nucleotides that are linked by phosphodiester bonds.
- A nucleic acid molecule may have any two- or three-dimensional structure, and may perform any function, known or unknown.
- a nucleic acid molecule may be single stranded, double stranded, or partially double stranded.
- Non-limiting examples of polynucleotides include a gene, a gene fragment, exons, introns, messenger RNA (mRNA), transfer RNA, ribosomal RNA, noncoding RNA, small interfering RNA, short hairpin RNA, micro RNA, scaRNA, ribozymes, riboswitches, viral RNA, complementary DNA (cDNA), cosmid DNA, mitochondrial DNA, chromosomal or genomic DNA, viral DNA, recombinant polynucleotides, branched polynucleotides, plasmids, vectors, isolated DNA of any sequence, control regions, isolated RNA of any sequence, nucleic acid probes, nucleic acid adapters, and primers.
- the nucleic acid molecule may be linear, circular, or any other geometry.
- polynucleotide analogs include but are not limited to xeno nucleic acid (XNA), bridged nucleic acid (BNA), glycol nucleic acid (GNA), hexitol nucleic acid (HNA), cyclohexane nucleic acid (CeNA), peptide nucleic acids (PNAs), γPNAs, morpholino polynucleotides, locked nucleic acids (LNAs), threose nucleic acid (TNA), 2'-O-methyl polynucleotides, 2'-O-alkyl ribosyl substituted polynucleotides, phosphorothioate polynucleotides, and boronophosphate polynucleotides.
- a polynucleotide analog may possess purine or pyrimidine analogs, including for example, 7-deaza purine analogs, 8-halopurine analogs, 5-halopyrimidine analogs, inverted base, or universal base analogs that can pair with any base, including hypoxanthine, nitroazoles, isocarbostyril analogues, azole carboxamides, and aromatic triazole analogues, or base analogs with additional functionality, such as a biotin moiety for affinity binding.
- amino acid generally refers to an organic compound that combines to form a protein or peptide.
- An amino acid generally comprises an amine group, a carboxylic acid group, and a side-chain specific to each amino acid, which serve as a monomeric subunit of a peptide.
- An amino acid may include the 20 standard, naturally occurring or canonical amino acids as well as non-standard or non-canonical amino acids.
- the standard, naturally-occurring or canonical amino acids include Alanine (A or Ala), Cysteine (C or Cys), Aspartic Acid (D or Asp), Glutamic Acid (E or Glu), Phenylalanine (F or Phe), Glycine (G or Gly), Histidine (H or His), Isoleucine (I or Ile), Lysine (K or Lys), Leucine (L or Leu), Methionine (M or Met), Asparagine (N or Asn), Proline (P or Pro), Glutamine (Q or Gln), Arginine (R or Arg), Serine (S or Ser), Threonine (T or Thr), Valine (V or Val), Tryptophan (W or Trp), and Tyrosine (Y or Tyr).
- amino acid may be an L-amino acid or a D-amino acid.
- Non-standard amino acids may be modified amino acids, amino acid analogs, amino acid mimetics, non-standard proteinogenic amino acids, or non-proteinogenic amino acids that occur naturally or are chemically synthesized.
- non-standard amino acids include, but are not limited to, selenocysteine, pyrrolysine, and N-formylmethionine, β-amino acids, homo-amino acids, proline and pyruvic acid derivatives, 3-substituted alanine derivatives, glycine derivatives, ring-substituted phenylalanine and tyrosine derivatives, linear core amino acids, and N-methyl amino acids.
- amino acid type generally refers to one of the standard, naturally-occurring or canonical amino acids, e.g., one member of the group consisting of Alanine (A or Ala), Cysteine (C or Cys), Aspartic Acid (D or Asp), Glutamic Acid (E or Glu), Phenylalanine (F or Phe), Glycine (G or Gly), Histidine (H or His), Isoleucine (I or Ile), Lysine (K or Lys), Leucine (L or Leu), Methionine (M or Met), Asparagine (N or Asn), Proline (P or Pro), Glutamine (Q or Gln), Arginine (R or Arg), Serine (S or Ser), Threonine (T or Thr), Valine (V or Val), Tryptophan (W or Trp), Tyrosine (Y or Tyr), derivatives thereof, and modified forms of any of the aforementioned amino acids.
- amino acid type may be used herein to distinguish a plurality of amino acids that comprise different side chain groups, rather than a plurality of amino acids that are identical (e.g., different positional amino acids of a single peptide that have the same side chain).
- An amino acid type may comprise a modified version of one of the standard, naturally-occurring or canonical amino acids e.g., post translational modifications, an epigenetic modification, or chemical or enzymatic modifications. In some instances, an amino acid type can include non-canonical amino acids.
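
The distinction drawn above between amino acid types and positional amino acids can be made concrete with a short sketch. It covers only the 20 canonical types using their standard one- and three-letter codes; modified and non-canonical types, which the definition also permits, are deliberately left out. The names `CANONICAL` and `amino_acid_types` are hypothetical.

```python
# Standard one-letter to three-letter codes for the 20 canonical amino acids.
CANONICAL = {
    "A": "Ala", "C": "Cys", "D": "Asp", "E": "Glu", "F": "Phe",
    "G": "Gly", "H": "His", "I": "Ile", "K": "Lys", "L": "Leu",
    "M": "Met", "N": "Asn", "P": "Pro", "Q": "Gln", "R": "Arg",
    "S": "Ser", "T": "Thr", "V": "Val", "W": "Trp", "Y": "Tyr",
}


def amino_acid_types(peptide: str) -> set:
    """Return the set of distinct canonical amino acid types in a peptide
    written in one-letter code; repeated residues count once."""
    unknown = set(peptide) - set(CANONICAL)
    if unknown:
        raise ValueError(f"non-canonical one-letter codes: {sorted(unknown)}")
    return {CANONICAL[residue] for residue in peptide}


# Hypothetical peptide: five positions but only three amino acid types.
types_in_peptide = amino_acid_types("GAGKA")
```

The peptide `GAGKA` has five positional amino acids, yet the two glycines and the two alanines share side chains, so only three types are present.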
- post-translational modification refers to modifications that occur on a peptide subsequent to translation.
- a post-translational modification may be a covalent modification or enzymatic modification.
- post-translational modifications include, but are not limited to, acylation, acetylation, alkylation (including methylation), benzoylation, biotinylation, butyrylation, carbamylation, carbonylation, carboxylation, crotonylation, deamidation, deimination, dimethylation, diphthamide formation, disulfide bridge formation, eliminylation, flavin attachment, formylation, gamma-carboxylation, glutamylation, glutarylation, glycylation, glycosylation, glypiation, heme C attachment, hydroxylation, hypusine formation, iodination, isoprenylation, lipidation, lipoylation, malonylation, methylation, myristoylation, nitration, oxidation, palmitoylation, pegylation, phosphopantetheinylation, phosphorylation, pren
- a post-translational modification includes modifications of the amino terminus and/or the carboxyl terminus of a peptide.
- Modifications (both naturally occurring and synthetic) of the terminal amino group include, but are not limited to, des-amino, N-lower alkyl, N-di-lower alkyl, and N-acyl modifications, N-terminal cyclization, deamination, oxidation, ubiquitination, SUMOylation, Neddylation, ISGylation, pupylation, eliminylation, biotinylation, lipidation, N-terminal methylation, N-terminal acetylation, N-terminal propionylation, N-terminal butyrylation, N-terminal crotonylation, N-terminal myristoylation, N-terminal palmitoylation, N-terminal stearoylation, and N-terminal benzoylation.
- Modifications of the terminal carboxy group include, but are not limited to, amide, lower alkyl amide, dialkyl amide, and lower alkyl ester modifications (e.g., wherein lower alkyl is C1-C4 alkyl).
- a post-translational modification also includes modifications, such as but not limited to those described above, of amino acids falling between the amino and carboxy termini.
- the term post-translational modification can also include peptide modifications that include one or more detectable labels.
- a post-translational modification may be naturally occurring or synthetic.
- binding agent refers to a molecule, e.g., a nucleic acid molecule, a peptide, a polypeptide, a protein, carbohydrate, a synthetic molecule, or a small molecule that binds to, associates with, unites with, recognizes, or combines with another molecule.
- the binding agent may bind to a macromolecule or a component or feature of a macromolecule.
- a binding agent may form a covalent association or non-covalent association with a molecule, a macromolecule, or a component or feature of a macromolecule.
- a binding agent may also be a chimeric binding agent, composed of two or more types of molecules, such as a nucleic acid molecule-peptide chimeric binding agent, a carbohydrate-peptide chimeric binding agent, or a lipid-peptide chimeric binding agent.
- a binding agent may be a naturally occurring, synthetically produced, or recombinantly expressed molecule.
- a binding agent may bind to a single monomer or subunit of a macromolecule (e.g., a single amino acid of a peptide) or bind to a plurality of linked subunits of a macromolecule (e.g., a di-peptide, tri-peptide, or higher order peptide of a longer peptide, polypeptide, or protein molecule).
- a binding agent may bind to a linear molecule or a molecule having a three-dimensional structure (also referred to as conformation).
- an antibody binding agent may bind to linear peptide, polypeptide, or protein, or bind to a conformational peptide, polypeptide, or protein.
- a binding agent may bind to an N-terminal peptide, a C-terminal peptide, or an intervening peptide of a peptide, polypeptide, or protein molecule.
- a binding agent may bind to an N-terminal amino acid, C-terminal amino acid, or an intervening amino acid of a peptide molecule.
- a binding agent may preferably bind to a chemically modified or labeled amino acid over a non-modified or unlabeled amino acid.
- a binding agent may preferably bind to an amino acid that has been modified with an acetyl moiety, guanyl moiety, dansyl moiety, PTC moiety, DNP moiety, SNP moiety, etc., over an amino acid that does not possess such a moiety.
- a binding agent may bind to a post-translational modification of a peptide molecule.
- a binding agent may exhibit selective binding to a component or feature of a macromolecule (e.g., a binding agent may selectively bind to one of the 20 possible natural amino acid residues and bind with very low affinity or not at all to the other 19 natural amino acid residues).
- a binding agent may exhibit less selective binding, where the binding agent is capable of binding a plurality of components or features of a macromolecule (e.g., a binding agent may bind with similar affinity to two or more different amino acid residues).
- a binding agent may comprise a tag, which may be coupled to the binding agent via a linker.
- linker generally refers to a molecule or moiety that is involved in joining two or more molecules.
- a linker may facilitate a covalent or noncovalent interaction of two or more molecules.
- a linker may be a crosslinker.
- the linker can be unifunctional, bifunctional, trifunctional, quadrifunctional, or polyfunctional.
- a linker can be or comprise a nucleotide, a nucleotide analog, an amino acid, a peptide, a polypeptide, or a nonnucleotide chemical moiety, such as an organic or inorganic compound.
- a linker may comprise a polymer, such as a polyethylene glycol (PEG), polyethylene, polypropylene, polyvinyl chloride, polystyrene or other organic or inorganic polymer.
- a linker may comprise one or more reactive ends, e.g., an amine-reactive group, a carboxyl-reactive group, a sulfhydryl-reactive group, a hydroxyl-reactive group, etc. Alternatively, a linker may not comprise a reactive end.
- a linker may be used to join different molecule types, e.g., different biomolecule types such as a peptide with a nucleic acid molecule, a lipid with a peptide, a carbohydrate with a peptide, etc.; non-biomolecule types; or a biomolecule to a non-biomolecule.
- a linker may be used to join a binding agent with a tag, a tag with a macromolecule (e.g., peptide, nucleic acid molecule), a macromolecule with a solid support, a tag with a solid support, etc.
- A linker may join two molecules via an enzymatic reaction or a chemical reaction (e.g., click chemistry).
- a linker may join more than two molecules, e.g., via enzymatic or chemical reactions.
- a linker can be relatively linear or non-linear, e.g., cyclic or circularized, branched, polygonal, etc.
- conjugated generally refers to a covalent or ionic interaction between two entities, e.g., molecules, compounds, or combinations thereof.
- the term “tag” generally refers to a molecule or moiety that is conjugated to a molecule.
- A tag may comprise a detectable label, e.g., a fluorophore or fluorescent protein, a radioactive isotope, an enzyme (e.g., a chromogenic or fluorescent protein, proteins that can catalyze chromogenic substrates), a mass tag, a hapten (e.g., biotin, digoxigenin, urushiol, fluorescein), a vibrational or FTIR tag (e.g., alkyne group).
- a tag may comprise a biomolecule, such as a nucleic acid molecule, a protein, a lipid, a carbohydrate, or a combination thereof.
- a tag may comprise one or more nucleic acid molecules, which may optionally encode information regarding the tag or the molecule onto which a tag is conjugated (e.g., a binding agent, such as an antibody).
- a tag may comprise a nucleic acid barcode molecule.
- a tag may comprise an organic compound or an inorganic compound.
- barcode generally refers to an identifying feature that may be used to distinguish similar items.
- a barcode may comprise a nucleic acid molecule of about 2 to about 150 bases.
- a barcode may comprise a nucleic acid molecule of about 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95,
- a barcode can be an artificial sequence or a naturally occurring sequence including peptides, proteins, protein complexes, carbohydrates, and synthetic polymeric materials such as peptoids, polysaccharides, polymers, fluorescent tags, chemical tags, magnetic tags, isobaric tags, Raman spectroscopic tags, quantum dots, etc.
- each barcode within a population of barcodes is different.
- a portion of barcodes in a population of barcodes is different, e.g., at least about 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, or 99% of the barcodes in a population of barcodes is different.
- a population of barcodes may be randomly generated or non-randomly generated.
- a population of barcodes may comprise error correcting barcodes.
- Barcodes can be used to computationally deconvolute sequence reads derived from an individual molecule, sample, library, etc. Barcodes may comprise multiplexed information, e.g., arising from different samples, compartments, individual molecules, etc.
- a barcode can also be used for deconvolution of a collection of molecules that have been distributed into small compartments for enhanced mapping. For example, rather than mapping a peptide back to the proteome, the peptide can be mapped back to its originating protein molecule or protein complex, a sample or partition from which it originated, etc.
- a barcode may comprise any useful sequence, including repeat sequences (e.g., a poly-A, poly-T, poly-C, poly-G region) or the barcode may comprise non-repeat sequences.
- a barcode may encode information including, but not limited to, time, lineage, sample types, cell number, beads, single molecule information, metadata, space/location (e.g., a slide, well, tissue), proximity (e.g., to other molecules, cells, metabolites, DNA, RNA), patient info, biological sample information, library information, computer data, weather, and physical parameters such as temperature, humidity, precipitation.
- “sample barcode,” also referred to as “sample tag,” generally refers to a barcode molecule comprising identifying information of a sample from which a barcoded molecule derives.
- a “spatial barcode” generally refers to a barcode molecule comprising identifying information of a region of a 2-D or 3-D sample (e.g., a tissue section) from which a molecule originates or is derived. Spatial barcodes may be used for molecular pathology on tissue sections. A spatial barcode may allow for multiplex sequencing of a plurality of samples or libraries from tissue section(s).
- a “temporal barcode” generally refers to a barcode molecule comprising time-based information relating to the barcoded molecule.
- the types of time-based data encoded in a temporal barcode can include information such as a lifetime of a barcoded molecule, a time of collection of a sample, a time or duration since the beginning of an experiment or induction with a stimulus, information on the age of a cell or tissue, a sequence of interactions between molecules, a time or cycle or round (e.g., of an iterative process) in which the barcode molecule is provided, among others. It is possible for different types of barcodes (e.g., spatial, temporal, cell-specific) to be combined in one multiplexed barcode.
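The barcode statements above mention error-correcting barcodes and the computational deconvolution of reads by barcode. A minimal Python sketch of that idea follows, assuming equal-length nucleic acid barcodes and a nearest-match rule based on Hamming distance. The barcode sequences, sample labels, function names, and the one-mismatch tolerance are illustrative assumptions and are not taken from the disclosure.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def assign_barcode(observed: str, known: Dict[str, str],
                   max_mismatches: int = 1) -> Optional[str]:
    """Return the label of the unique closest known barcode.

    Returns None when no barcode is within max_mismatches or when
    two barcodes are equally close (ambiguous assignment).
    """
    best_label, best_dist, tie = None, max_mismatches + 1, False
    for barcode, label in known.items():
        if len(barcode) != len(observed):
            continue  # skip barcodes that cannot be compared position by position
        d = hamming(observed, barcode)
        if d < best_dist:
            best_label, best_dist, tie = label, d, False
        elif d == best_dist and d <= max_mismatches:
            tie = True
    return None if tie else best_label


def demultiplex(reads: Iterable[Tuple[str, str]], known: Dict[str, str],
                max_mismatches: int = 1) -> Dict[Optional[str], List[str]]:
    """Group read payloads by assigned label; unassigned reads go under None."""
    groups: Dict[Optional[str], List[str]] = defaultdict(list)
    for barcode, payload in reads:
        groups[assign_barcode(barcode, known, max_mismatches)].append(payload)
    return dict(groups)


# Hypothetical barcode set; every pair differs at all 6 positions,
# so a single substitution error can still be assigned unambiguously.
KNOWN = {"ACGTAC": "sample_A", "TGCATG": "sample_B", "GATCGA": "sample_C"}

if __name__ == "__main__":
    reads = [("ACGTAC", "r1"), ("TGCATC", "r2"), ("AAAAAA", "r3")]
    print(demultiplex(reads, KNOWN))
```

The tolerance is deliberately tied to the minimum pairwise distance of the barcode set: a set whose barcodes are closer together would need a smaller `max_mismatches` to avoid ambiguous assignments.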
- “nucleic acid sequence” or “oligonucleotide sequence” generally refers to a contiguous string of nucleotide bases and may refer to the particular placement of nucleotide bases in relation to each other as they appear in an oligonucleotide.
- “polypeptide sequence” or “amino acid sequence” refers to a contiguous string of amino acids and may refer to the particular placement of amino acids in relation to each other as they appear in a polypeptide.
- a “nucleotide sequence” may include any polymer or oligomer of nucleotides such as pyrimidine and purine bases, such as cytosine, thymine, and uracil, and adenine and guanine, respectively and combinations thereof.
- the nucleotide sequence may comprise any deoxyribonucleotide, ribonucleotide, hexitol-nucleotide, cyclohexanenucleotide, peptide nucleic acid component, and any chemical variants thereof, such as methylated, 7-deaza purine analogs, 8-halopurine analogs, hydroxymethylated or glycosylated forms of these bases, and the like.
- the polymers or oligomers may be heterogeneous or homogenous in composition and may be isolated from naturally occurring sources or may be artificially or synthetically produced.
- a nucleotide sequence may be DNA, RNA, HNA, CeNA or a mixture thereof, and may exist permanently or transitionally in single-stranded or double-stranded form, including homoduplex, heteroduplex, and hybrid states.
- complementarity refers to polynucleotides (i.e., a sequence of nucleotides) related by base-pairing rules.
- Complementarity may be “partial,” in which only some of the nucleic acids’ bases are matched according to the base pairing rules, or there may be “complete” or “total” complementarity between the nucleic acids.
- the degree of complementarity between nucleic acid strands can have significant effects on the efficiency and strength of hybridization between nucleic acid strands under defined conditions.
- hybridization is used in reference to the pairing of complementary nucleic acids. Hybridization and the strength of hybridization (e.g., the strength of the association between the nucleic acids) are influenced by such factors as the degree of complementarity between the nucleic acids, the stringency of the conditions involved, and the melting temperature of the formed hybrid. Hybridization methods involve the annealing of one nucleic acid to another, complementary nucleic acid, e.g., based on Watson-Crick base pairing.
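The distinction between partial and complete complementarity can be illustrated with a short Python sketch. It assumes DNA strands of equal length written 5' to 3', Watson-Crick pairing only, and an ungapped antiparallel alignment; the function names are hypothetical, and the score is a simple positional fraction, not a hybridization-strength or melting-temperature model.

```python
# Watson-Crick pairing rules for DNA (assumption: no modified or ambiguous bases).
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the strand that would pair perfectly with seq, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def fraction_complementary(strand_a: str, strand_b: str) -> float:
    """Fraction of positions at which strand_b base-pairs with strand_a.

    Both strands are given 5'->3' and are aligned antiparallel without gaps.
    1.0 corresponds to complete complementarity; values between 0 and 1
    correspond to partial complementarity.
    """
    if len(strand_a) != len(strand_b):
        raise ValueError("strands must have equal length")
    if not strand_a:
        raise ValueError("strands must be non-empty")
    perfect_partner = reverse_complement(strand_a)
    matches = sum(x == y for x, y in zip(perfect_partner, strand_b.upper()))
    return matches / len(strand_a)


if __name__ == "__main__":
    print(reverse_complement("ATGC"))              # GCAT
    print(fraction_complementary("ATGC", "GCAT"))  # 1.0 (complete)
    print(fraction_complementary("ATGC", "GCAA"))  # 0.75 (partial)
```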
- proteomics generally refers to quantitative and/or qualitative analysis of the proteome within a sample, such as biological sample, e.g., from cells, tissues, or bodily fluids.
- proteomics may include the analysis of spatial distributions of proteins within a sample (e.g., cell and/or tissues).
- proteomics may include studies of the dynamic state of the proteome, e.g., how one or more proteins change in time.
- N-terminal amino acid: the terminal amino acid at one end of the peptide chain that has a free amino group may be referred to herein as the “N-terminal amino acid” (NTAA).
- C-terminal amino acid: the terminal amino acid at the other end of the chain that has a free carboxyl group may be referred to herein as the “C-terminal amino acid” (CTAA).
- the amino acids making up a peptide may be numbered in order, with the peptide being “n” amino acids in length.
- the NTAA may be considered the nth amino acid (also referred to herein as the “n NTAA”).
- the next amino acid is the n-1 amino acid, then the n-2 amino acid, and so on down the length of the peptide from the N-terminal end to the C-terminal end.
- the CTAA may alternatively be considered the nth amino acid (also referred to herein as the “n CTAA”).
- the next amino acid is the n-1 amino acid, then the n-2 amino acid, and so on down the length of the peptide from the C-terminal end to the N-terminal end.
- An NTAA, CTAA, or both may be modified or labeled with a chemical moiety.
- determining: as used herein, the terms “determining,” “measuring,” “assessing,” and “assaying” are used interchangeably and include both quantitative and qualitative determinations.
- UMI: unique molecular identifier.
- A UMI may comprise a nucleic acid molecule of about 3 to about 150 bases (3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22,
- a UMI may provide a unique identifier tag for each molecule (e.g., peptide, binding agent, a nucleic acid molecule) that comprises or is coupled to a UMI.
- a UMI may comprise a random sequence (e.g., a random N-mer).
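As a rough illustration of a UMI built from a random N-mer, the Python sketch below draws random nucleotide strings and uses them to collapse duplicate reads into unique molecules. The alphabet, the helper names, and the (UMI, payload) read representation are assumptions made for this example only.

```python
import random
from typing import Iterable, Tuple

ALPHABET = "ACGT"  # assumption: UMIs are DNA N-mers over the four canonical bases


def random_umi(length: int, rng: random.Random) -> str:
    """Draw one random N-mer of the given length."""
    return "".join(rng.choice(ALPHABET) for _ in range(length))


def umi_space_size(length: int) -> int:
    """Number of distinct N-mers available at this length (4**N)."""
    return len(ALPHABET) ** length


def count_unique_molecules(tagged_reads: Iterable[Tuple[str, str]]) -> int:
    """Collapse reads that share both UMI and payload into one molecule."""
    return len({(umi, payload) for umi, payload in tagged_reads})


if __name__ == "__main__":
    rng = random.Random(0)  # seeded only to make the demonstration repeatable
    print(random_umi(8, rng), umi_space_size(8))
    reads = [("AAAA", "pep1"), ("AAAA", "pep1"), ("CCCC", "pep1")]
    print(count_unique_molecules(reads))  # 2
```

Longer UMIs enlarge the identifier space, which lowers the chance that two different molecules receive the same random tag.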
- a “derivative” of a nucleic acid molecule generally refers to a nucleic acid molecule that is derived from an originating nucleic acid molecule.
- the derivative may have the same or substantially the same nucleotide sequence as the originating nucleic acid molecule, or the derivative may comprise a complement or partial complement of the originating nucleic acid molecule.
- a derivative may be the same type of nucleic acid (e.g., DNA or RNA) as the originating nucleic acid molecule, or the derivative may be a different type of nucleic acid (e.g., cDNA generated from an RNA molecule).
- a nucleic acid molecule derivative may display sequence identity to the originating nucleic acid molecule.
- the derivative nucleic acid molecule may also be subjected to additional processing from the originating nucleic acid molecule, e.g., chemical or enzymatic modification, splicing, ligation, polymerization, fragmentation, tagmentation (e.g., using a transposase), digestion, etc.
- a derivative polypeptide or peptide may be derived from an originating polypeptide (or peptide).
- a derivative may comprise the same amino acid sequence as the originating polypeptide, or the sequence may be different.
- the derivative polypeptide may result from or be subjected to additional processing from the originating polypeptide, e.g., chemical or enzymatic modification.
- the derivative polypeptide may comprise one or more tags, nucleic acid molecules, barcode molecules, labels (e.g., detectable labels), fluorophores, probes, linkers, post-translational modifications, chemical protecting groups, or other chemical moieties.
- compartment or partition generally refers to a physical area or volume that separates or isolates a subset of molecules from a sample of molecules.
- a compartment or partition may separate an individual cell from other cells, or a subset of a sample’s proteome from the rest of the sample’s proteome.
- a compartment or partition may be an aqueous compartment (e.g., microfluidic droplet), a solid compartment (e.g., picotiter well or microtiter well on a plate, tube, vial, gel bead), a liquid-liquid phase separation, a liquid condensate, a subcellular region, or a separated region on a surface.
- a compartment may comprise one or more beads to which macromolecules may be immobilized.
- a compartment may be transient.
- solid support: as used herein, the term “solid support”, “solid surface”, “solid substrate”, or “substrate” refers to any solid material, including porous and non-porous materials, to which a molecule can be associated directly or indirectly.
- the molecule may be associated with the substrate by covalent or non-covalent interactions, or a combination thereof.
- a substrate may be two-dimensional (e.g., planar surface) or three-dimensional (e.g., gel matrix or bead).
- a solid support may comprise, in non-limiting examples, a bead, a microbead, an array, a glass surface, a silicon surface, a plastic surface, a filter, a membrane, nylon or other polymer, a silicon wafer chip, a flow through chip, a flow cell, a microfluidic device or chip or a surface thereof, a biochip including signal transducing electronics, a channel, a microtiter well, an ELISA plate, a spinning interferometry disc, a nitrocellulose membrane, a nitrocellulose-based polymer surface, a polymer matrix, a nanoparticle, or a microsphere.
- Materials for a solid support include but are not limited to acrylamide, agarose, cellulose, nitrocellulose, glass, gold, quartz, polystyrene, polyethylene vinyl acetate, polypropylene, polymethacrylate, polyethylene, polyethylene oxide, polysilicates, polycarbonates, Teflon, fluorocarbons, nylon, silicon rubber, polyanhydrides, polyglycolic acid, polylactic acid, polyorthoesters, functionalized silane, polypropylfumerate, collagen, glycosaminoglycans, polyamino acids, dextran, or any combination thereof.
- Solid supports further include thin film, membrane, bottles, dishes, fibers, woven fibers, shaped polymers such as tubes, (e.g., nanotubes), particles, beads, DNA origami, microspheres, microparticles, or any combination thereof.
- the bead can include, but is not limited to, a ceramic bead, polystyrene bead, a polymer bead, a methylstyrene bead, an agarose bead, an acrylamide bead, a solid core bead, a porous bead, a paramagnetic bead, a glass bead, or a controlled pore bead.
- A bead may be spherical or irregularly shaped.
- A bead’s size may range from nanometers, e.g., 1 nm, 10 nm, 100 nm, to millimeters, e.g., 1 mm.
- beads range in size from about 0.2 micron to about 200 microns, or from about 0.5 micron to about 5 microns.
- beads can be about 1, 1.5, 2, 2.5, 2.8, 3, 3.5, 4, 4.5, 5, 5.5, 6, 6.5, 7, 7.5, 8, 8.5, 9, 9.5, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, or 100 µm in diameter.
- sequencing generally refers to determining the order and identity of (A) nucleotides (base sequences) in a nucleic acid sample, e.g., DNA or RNA; or (B) amino acids in all or part of a polymer, such as a protein, peptide, or other multimeric molecule.
- Many techniques are available for nucleic acid sequencing, such as Sanger sequencing or High Throughput Sequencing technologies (HTS).
- Sanger sequencing may involve sequencing via detection through (capillary) electrophoresis, in which up to 384 capillaries may be sequence analyzed in one run.
- High throughput sequencing involves the parallel sequencing of thousands or millions or more sequences at once.
- HTS can be defined as Next Generation sequencing (NGS), i.e., techniques based on solid phase pyrosequencing, or as Next-Next Generation sequencing based on single molecule real time sequencing (SMRT).
- HTS technologies are available such as offered by Roche, Illumina and Applied Biosystems (Life Technologies). Further high throughput sequencing technologies are described by and/or available from Helicos, Pacific Biosciences, Complete Genomics, Ion Torrent Systems, Oxford Nanopore Technologies, Nabsys, ZS Genetics, GnuBio.
- next generation sequencing refers to high-throughput sequencing methods that allow the sequencing of millions to billions of molecules in parallel.
- next generation sequencing methods include sequencing by synthesis, sequencing by ligation, sequencing by hybridization, polony sequencing, ion semiconductor sequencing, nanopore sequencing, and pyrosequencing.
- By attaching primers to a solid substrate and a complementary sequence to a nucleic acid molecule, a nucleic acid molecule can be hybridized to the solid substrate via the primer and then multiple copies can be generated in a discrete area on the solid substrate by using polymerase to amplify (these groupings are sometimes referred to as polymerase colonies or polonies).
- a nucleotide at a particular position can be sequenced multiple times (e.g., hundreds or thousands of times) — this depth of coverage is referred to as “deep sequencing.”
- Examples of high throughput nucleic acid sequencing technology include platforms provided by Illumina, BGI, Qiagen, ThermoFisher, and Roche, including formats such as parallel bead arrays, sequencing by synthesis, sequencing by ligation, capillary electrophoresis, electronic microchips, “biochips,” microarrays, parallel microchips, and single-molecule arrays, as reviewed by Service (Science 311:1544-1546, 2006).
- analyzing means to quantify, characterize, distinguish, or a combination thereof, all or a portion of the components of a molecule (e.g., a macromolecule, a biological molecule such as a protein, amino acid, nucleic acid molecule, etc.).
- analyzing a peptide, polypeptide, or protein may comprise determining all or a portion of the amino acid sequence (contiguous or non-continuous) of the peptide.
- Analyzing a macromolecule may include partial identification of a component of the macromolecule. For example, partial identification of amino acids in a protein sequence can identify an amino acid in the protein as belonging to a subset of possible amino acids.
- Analysis may be performed sequentially, e.g., beginning with analysis of the n NTAA, and then proceeding to the next amino acid of the peptide (i.e., n-1, n-2, n-3, and so forth).
- sequencing may be performed by cleavage of the n NTAA, thereby converting the n-1 amino acid of the peptide to an N-terminal amino acid (referred to herein as the “n-1 NTAA”).
- analysis of a peptide may begin from the C-terminus towards the N-terminus, with each round of cleavage from the C-terminus creating a new CTAA.
- Cleavage of the n CTAA converts the n-1 amino acid of the peptide to a C-terminal amino acid, referred to herein as an “n-1 CTAA”.
- Analyzing the peptide may also include determining a presence and frequency of post-translational modifications on the peptide, which may or may not include information regarding the sequential order of the post-translational modifications on the peptide.
- Analyzing the peptide may also include determining the presence and frequency of epitopes in the peptide, which may or may not include information regarding the sequential order or location of the epitopes within the peptide.
- Analyzing the peptide may include combining different types of analysis, for example obtaining epitope information, amino acid sequence information, post-translational modification information, or any combination thereof.
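The sequential analysis described above, in which each cleavage round exposes a new terminal residue labeled n, n-1, n-2, and so on, can be sketched in Python as follows. This is a bookkeeping illustration only: the peptide is a plain string of one-letter codes, identification of each residue is assumed to be perfect, and the function name and its `terminus` parameter are hypothetical. No cleavage or detection chemistry is modeled.

```python
from typing import List, Tuple


def sequential_readout(peptide: str, terminus: str = "N") -> List[Tuple[str, str]]:
    """List (label, residue) pairs produced by stepwise terminal cleavage.

    terminus="N" reads from the N-terminal end (labels "n NTAA", "n-1 NTAA", ...);
    terminus="C" reads from the C-terminal end (labels "n CTAA", "n-1 CTAA", ...).
    """
    if terminus not in ("N", "C"):
        raise ValueError("terminus must be 'N' or 'C'")
    suffix = "NTAA" if terminus == "N" else "CTAA"
    remaining = peptide
    rounds = []
    for k in range(len(peptide)):
        label = f"n {suffix}" if k == 0 else f"n-{k} {suffix}"
        if terminus == "N":
            residue, remaining = remaining[0], remaining[1:]    # cleave N-terminal residue
        else:
            residue, remaining = remaining[-1], remaining[:-1]  # cleave C-terminal residue
        rounds.append((label, residue))
    return rounds


if __name__ == "__main__":
    for label, residue in sequential_readout("MKV", terminus="N"):
        print(label, residue)
```

For the peptide `MKV` read from the N-terminus, the rounds report M as the n NTAA, K as the n-1 NTAA, and V as the n-2 NTAA, following the numbering convention given in the definitions.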
- the term “array” generally refers to a population of molecules that is attached to one or more solid supports such that the molecules at one address can be distinguished from molecules at other addresses.
- An array can include different molecules that are each located at different addresses on a solid support.
- an array can include separate solid supports each functioning as an address that bears a different molecule, wherein the different molecules can be identified according to the locations of the solid supports on a surface to which the solid supports are attached, or according to the locations of the solid supports in a liquid such as a fluid stream.
- the molecules of the array can be, for example, nucleic acids such as SNAPs, polypeptides, proteins, peptides, oligopeptides, enzymes, ligands, or receptors such as antibodies, functional fragments of antibodies or aptamers.
- the addresses of an array can optionally be optically observable, and, in some configurations, adjacent addresses can be optically distinguishable when detected using a method or apparatus set forth herein.
- a functionalized material or substance may be naturally or synthetically functionalized.
- a polypeptide can be naturally functionalized with a phosphate group, oligosaccharide (e.g., glycosyl, glycosylphosphatidylinositol or phosphoglycosyl), nitrosyl, methyl, acetyl, lipid (e.g., glycosyl phosphatidylinositol, myristoyl or prenyl), ubiquitin or other naturally occurring post-translational modification.
- a functionalized material or substance may be functionalized for any given purpose, including altering chemical properties (e.g., altering hydrophobicity or changing surface charge density) or altering reactivity (e.g., capable of reacting with a moiety or reagent to form a covalent bond to the moiety or reagent).
- click reaction refers to a single-step, thermodynamically favorable conjugation reaction utilizing biocompatible reagents.
- a click reaction may utilize no toxic or biologically incompatible reagents (e.g., acids, bases, heavy metals) or generate no toxic or biologically incompatible byproducts.
- a click reaction may utilize an aqueous solvent or buffer (e.g., phosphate buffer solution, Tris buffer, saline buffer, MOPS, etc.).
- a click reaction may be thermodynamically favorable if it has a negative Gibbs free energy of reaction, for example a Gibbs free energy of reaction of less than about -5 kiloJoules/mole (kJ/mol), -10 kJ/mol, -25 kJ/mol, -50 kJ/mol, -100 kJ/mol, -200 kJ/mol, -300 kJ/mol, -400 kJ/mol, or less than -500 kJ/mol.
- Exemplary bioorthogonal and click reactions are described in detail in WO 2019/195633A1, which is herein incorporated by reference in its entirety.
- Exemplary click reactions may include metal-catalyzed azide-alkyne cycloaddition, strain-promoted azide-alkyne cycloaddition, strain-promoted azide-nitrone cycloaddition, strained alkene reactions, thiol-ene reaction, Diels-Alder reaction, inverse electron demand Diels-Alder reaction, [3+2] cycloaddition, [4+1] cycloaddition, nucleophilic substitution, dihydroxylation, thiol-yne reaction, photoclick, nitrone dipole cycloaddition, norbornene cycloaddition, oxanorbornadiene cycloaddition, tetrazine ligation, and tetrazole photoclick reactions.
- Exemplary functional groups or reactive handles utilized to perform click reactions may include alkenes (e.g., linear alkenes or cyclic alkenes such as trans-cyclooctene (TCO)), alkynes (e.g., linear alkynes or cycloalkynes (e.g., cyclooctynes or derivatives thereof, e.g., aza-dimethoxycyclooctyne (DIMAC), symmetrical pyrrolocyclooctyne (SYPCO), pyrrolocyclooctyne (PYRROC), difluorocyclooctyne (DIFO), a,a-bis(trifluoromethyl)pyrrolocyclooctyne
- TRIPCO bicyclo[6.1.0]nonyne
- BCN dibenzocyclooctyne
- DIBO difluorinated cyclooctyne
- DIFBO difluorobenzocyclooctyne
- DBCO dibenzoazacyclo-octyne
- F2-DIBAC difluoro-aza-dibenzocyclooctyne
- BARAC biaryl-azacyclooctynone
- BARAC difluorodimethoxydibenzocyclooctynol
- FMDIBO difluorodimethoxydibenzocyclooctynone
- TMTH 3,3,6,6-tetramethylthiacycloheptyne
- TMTH-sulfoximine TMSI
- the click chemistry moieties may be subjected to conditions sufficient to react the first click chemistry moiety to the second click chemistry moiety, e.g., provision of metal catalysts, appropriate solvents, pH, temperature, ionic concentration, or light/energy, for any useful duration of time.
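The thermodynamic favorability criterion given above for click reactions (a negative Gibbs free energy of reaction, e.g., less than about -5 kJ/mol) can be related to an equilibrium constant through the standard relation K = exp(-ΔG°/RT). The Python sketch below applies that textbook relation; treating the quoted values as standard free energies at 298.15 K is an assumption of this example, and the function names are hypothetical.

```python
import math

R_J_PER_MOL_K = 8.314462618  # molar gas constant, J/(mol*K)


def equilibrium_constant(delta_g_kj_per_mol: float, temperature_k: float = 298.15) -> float:
    """Equilibrium constant from a standard Gibbs free energy of reaction.

    Uses K = exp(-dG / (R * T)), with dG converted from kJ/mol to J/mol.
    """
    return math.exp(-delta_g_kj_per_mol * 1000.0 / (R_J_PER_MOL_K * temperature_k))


def is_thermodynamically_favorable(delta_g_kj_per_mol: float,
                                   threshold_kj_per_mol: float = 0.0) -> bool:
    """True when the free energy of reaction is below the chosen threshold."""
    return delta_g_kj_per_mol < threshold_kj_per_mol


if __name__ == "__main__":
    for dg in (-5.0, -25.0, -50.0):
        print(dg, is_thermodynamically_favorable(dg), equilibrium_constant(dg))
```

Under these assumptions, -5 kJ/mol corresponds to an equilibrium constant of roughly 7.5, and each further decrease in free energy raises the constant exponentially.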
- “group” and “moiety” are intended to be synonymous when used in reference to the structure of a molecule.
- the terms refer to a component or part of the molecule.
- the terms do not necessarily denote the relative size of the component or part compared to the molecule, unless indicated otherwise.
- the terms do not necessarily denote the relative size of the component or part compared to any other component or part of the molecule, unless indicated otherwise.
- a group or moiety can contain one or more atoms.
- primers generally refer to nucleic acid molecules which can prime the synthesis of a nucleic acid molecule (e.g., DNA or RNA).
- a primer may be single stranded.
- a primer may comprise one or more recognition sites for a protein (e.g., a polymerizing enzyme, a restriction enzyme, a cleaving enzyme, etc.) to bind to the primer or a primer hybridized to a template strand.
- a primer may comprise DNA, RNA, or other nucleic acid analogs or noncanonical bases (e.g., spacer moieties, uracils, abasic sites).
- a primer may optionally comprise any number of functional sequences such as sequencing primer sequences (e.g., P5 or P7 sequences), sequencing primer-binding sequences, read sequences (e.g., R1 or R2 sequences), restriction sites, transposition sites (e.g., mosaic end sequences), etc.
- Amplification or amplifying generally refers to a polynucleotide amplification reaction, namely, a population of polynucleotides that are replicated from one or more starting sequences.
- Amplifying may refer to a variety of amplification reactions, including but not limited to polymerase chain reaction (PCR), linear polymerase reactions, nucleic acid sequence-based amplification, rolling circle amplification and similar reactions.
- An amplification reaction may generate an amplicon.
- Amplification or amplifying may refer to an increase in quantity of a measurable output, for example, signal amplification.
- An “adapter” as referred to herein, generally refers to a short nucleic acid molecule (e.g., about 10 to about 100 base pairs in length).
- An adapter may comprise a short double-stranded DNA molecule.
- An adapter may be attached, e.g., via polymerization or ligation, to an end of a DNA fragment or amplicon.
- Adapters may comprise synthetic oligonucleotides, e.g., oligonucleotides that have nucleotide sequences which are at least partially complementary to each other.
- An adapter may have blunt ends, may have staggered ends (also referred to herein as a 3’ or 5’ “overhang sequence” or “sticky end”), or may have a blunt end and a staggered end.
- Adapters may be attached (e.g., via ligation) to fragments to provide an adapter-ligated fragment; the adapter-ligated fragment may serve as a starting point for subsequent manipulation, e.g., for amplification or sequencing.
- An adapter may be functionalized, e.g., conjugated with a tag, probe, detectable label, affinity capture reagent (e.g., biotin or streptavidin).
- translocation generally refers to the movement of a molecule through a medium (e.g., a gas, a liquid, a solid, or a multiphase medium). Translocation of a molecule may occur spontaneously (e.g., through diffusion, Brownian motion, etc.). Alternatively, or in addition, translocation of a molecule may occur with an application of force or pressure, e.g., using frictional force, tension force, a normal force, air resistance force, spring force, a temperature gradient, gravitational force, electrical force, magnetic force, acoustic force (e.g., acoustophoresis), etc.
- translocation of a molecule may be achieved by application of pressure-driven flow or electrophoretic forces. Translocation may occur through a liquid or through a solid or semi-solid substrate (e.g., through a pore or gap) or adjacent or in proximity to the solid or semi-solid substrate.
- alanine (A, Ala); arginine (R, Arg); asparagine (N, Asn); aspartic acid (D, Asp); cysteine (C, Cys); glutamic acid (E, Glu); glutamine (Q, Gln); glycine (G, Gly); histidine (H, His); isoleucine (I, Ile); leucine (L, Leu); lysine (K, Lys); methionine (M, Met); phenylalanine (F, Phe); proline (P, Pro); serine (S, Ser); threonine (T, Thr); tryptophan (W, Trp); tyrosine (Y, Tyr); valine (V, Val).
- X can indicate any amino acid.
- X can be asparagine (N), glutamine (Q), histidine (H), lysine (K), or arginine (R).
- References to these amino acids are also in the form of “[amino acid] [residue/residues]” (e.g., lysine residue, lysine residues, leucine residue, leucine residues, etc.).
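For readers who work with these abbreviations programmatically, the standard one-letter and three-letter codes for the 20 natural amino acids can be held in a small lookup table. The Python sketch below is illustrative only; the names `AMINO_ACIDS` and `to_three_letter` are assumptions of this example, and the wildcard X is deliberately left out because its meaning depends on context.

```python
from typing import Dict, List, Tuple

# one-letter code -> (three-letter code, full name) for the 20 natural amino acids
AMINO_ACIDS: Dict[str, Tuple[str, str]] = {
    "A": ("Ala", "alanine"),       "R": ("Arg", "arginine"),
    "N": ("Asn", "asparagine"),    "D": ("Asp", "aspartic acid"),
    "C": ("Cys", "cysteine"),      "E": ("Glu", "glutamic acid"),
    "Q": ("Gln", "glutamine"),     "G": ("Gly", "glycine"),
    "H": ("His", "histidine"),     "I": ("Ile", "isoleucine"),
    "L": ("Leu", "leucine"),       "K": ("Lys", "lysine"),
    "M": ("Met", "methionine"),    "F": ("Phe", "phenylalanine"),
    "P": ("Pro", "proline"),       "S": ("Ser", "serine"),
    "T": ("Thr", "threonine"),     "W": ("Trp", "tryptophan"),
    "Y": ("Tyr", "tyrosine"),      "V": ("Val", "valine"),
}


def to_three_letter(sequence: str) -> List[str]:
    """Convert a string of one-letter codes to a list of three-letter codes."""
    try:
        return [AMINO_ACIDS[code][0] for code in sequence.upper()]
    except KeyError as exc:
        raise ValueError(f"unknown amino acid code: {exc.args[0]}") from None


if __name__ == "__main__":
    print(to_three_letter("MKV"))  # ['Met', 'Lys', 'Val']
```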
- a method of the present disclosure comprises providing a polymeric analyte, such as a peptide, and determining the identity of the individual (or clusters of) monomers comprised by the polymeric analyte.
- the present disclosure may provide for methods of identifying a peptide or protein without determining the identity of the individual monomers.
- the methods, systems, compositions, and kits provided herein entail coupling polymerizable molecules to monomers (e.g., amino acids), thereby generating modified monomers (e.g., modified amino acids), and detecting the modified monomers (e.g., modified amino acids).
- the methods provided herein entail repeating one or more operations on a single or plurality of peptides to generate a plurality of modified monomers, which optionally may be tethered together (e.g., in a stacked plurality of modified monomers), and analyzing or detecting the plurality of modified monomers.
- the detecting is performed using nanoscale objects (e.g., nanopores or nanogaps).
- the polymerizable molecule may alter or modulate a property of the monomer, such that the monomer (e.g., amino acid) is more accurately identifiable, e.g., while translocating adjacent to or through the nanoscale object.
- the methods, systems, and compositions disclosed herein provide a more facile and highly parallelized approach to protein sequencing with increased accuracy.
- the methods, systems, and compositions disclosed herein may enable high-resolution sequencing of polymeric analytes, e.g., at single-molecule, sub-zeptomole, sub-attomole, sub-femtomole, or sub-picomole resolution.
- a method for sequencing a peptide comprising a plurality of amino acids comprising (a) providing a plurality of modified amino acids generated from at least a subset of the plurality of amino acids, and
- the methods provided herein may advantageously provide for highly accurate identification and sequencing of individual amino acids from a peptide; for instance, sequencing of the plurality of modified amino acids may have an average read accuracy that is greater than 80% for at least 2 different modified amino acid types.
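The source does not state how read accuracy is computed. One plausible reading, sketched below in Python, is the fraction of calls that match a known ground truth, tallied separately for each modified amino acid type and then compared against the 80% figure. The function names, the one-letter representation of types, and the example data are all assumptions of this illustration.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def per_type_accuracy(truth: Sequence[str], calls: Sequence[str]) -> Dict[str, float]:
    """Fraction of correct calls for each true amino acid type."""
    if len(truth) != len(calls):
        raise ValueError("truth and calls must have equal length")
    totals: Dict[str, int] = defaultdict(int)
    correct: Dict[str, int] = defaultdict(int)
    for expected, called in zip(truth, calls):
        totals[expected] += 1
        if expected == called:
            correct[expected] += 1
    return {aa: correct[aa] / totals[aa] for aa in totals}


def types_above_threshold(accuracy: Dict[str, float], threshold: float = 0.80) -> List[str]:
    """Sorted list of types whose accuracy strictly exceeds the threshold."""
    return sorted(aa for aa, value in accuracy.items() if value > threshold)


if __name__ == "__main__":
    truth = list("KKKKKLLLLLAA")  # hypothetical ground-truth types
    calls = list("KKKKKLLLLLAG")  # hypothetical calls; the last A is miscalled as G
    accuracy = per_type_accuracy(truth, calls)
    print(accuracy)
    print(types_above_threshold(accuracy))
```

In this made-up data set, K and L are each called correctly every time while A is called correctly half the time, so two types exceed the 80% threshold.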
- the plurality of modified amino acids is derived from two contiguous amino acids of a peptide.
- the modified amino acids herein comprise polymerizable molecules.
- the methods provided herein comprise providing a modified amino acid, in which the modified amino acid comprises a polymerizable molecule and translocating the modified amino acid or derivative thereof through or adjacent to a nanopore or a nanogap.
- the method may further comprise sensing the modified amino acid as it translocates through or adjacent to the nanopore or nanogap.
- the sensing comprises measuring a signal from the nanopore or the nanogap as the modified amino acid or derivative thereof translocates through the nanopore or nanogap.
- the method further comprises using the signal to determine the identity of the modified amino acid or derivative thereof.
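As a minimal sketch of using a measured signal to determine identity, the example below assigns a translocation event to the reference type whose mean blockade level is closest. The nearest-level rule, the reference values, and all names are assumptions for illustration; the disclosure does not specify a classification method at this point.

```python
from statistics import mean

# Hypothetical reference blockade levels (fraction of open-pore current
# blocked) for three modified amino acid types. Values are invented.
REFERENCE_BLOCKADE = {"Gly": 0.20, "Leu": 0.35, "Trp": 0.55}

def call_identity(samples, reference=REFERENCE_BLOCKADE):
    """Return the reference type closest to the mean of the event samples."""
    if not samples:
        raise ValueError("an event needs at least one sample")
    level = mean(samples)
    return min(reference, key=lambda name: abs(reference[name] - level))

event = [0.33, 0.36, 0.37]    # made-up samples from one event
called = call_identity(event)  # closest reference level is Leu
```

A practical base caller would use richer features (dwell time, noise, level transitions), but the lookup structure is the same.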
- the polymerizable molecule facilitates the translocation or modifies an interaction between the modified amino acid and the nanopore or nanogap.
- the interaction may be a covalent or noncovalent interaction and can, in some instances, induce a biophysical change within the nanopore or nanogap.
- the polymerizable molecule modulates the translocation speed of the modified amino acid through the nanopore or nanogap, which may render the modified amino acid more detectable as compared to an unmodified amino acid.
- the modulation of the interaction of the modified amino acid may control or cause a change in the translocation speed of the modified amino acid through the nanopore or nanogap.
- the translocation of the modified amino acid through the nanopore is performed under conditions sufficient to reduce the translocation speed of the modified amino acid through the nanopore as compared to a control condition.
- the reduced translocation speed of the modified amino acid through the nanopore or nanogap may improve the measured signal (e.g., current blockade) or increase the signal-to-noise ratio of the measured signal as compared to an unmodified amino acid or the control condition.
- the reduced translocation speed of the modified amino acid through the nanopore or nanogap improves the accuracy of identification of the modified amino acids.
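One way to see why slower translocation can help is that a longer dwell yields more current samples per event, and averaging more samples reduces the uncertainty of the estimated blockade level. The sketch below assumes independent noise per sample with a fixed standard deviation, which is a simplifying assumption and not a statement from the disclosure; the names and numbers are hypothetical.

```python
import math

def samples_during_dwell(dwell_time_s, sampling_rate_hz):
    """Number of current samples recorded while the molecule is in the pore."""
    return max(1, round(dwell_time_s * sampling_rate_hz))

def standard_error_of_mean(noise_sd, n_samples):
    """Uncertainty of the mean level, assuming independent noise per sample."""
    return noise_sd / math.sqrt(n_samples)

NOISE_SD_PA = 5.0          # hypothetical per-sample noise, in picoamperes
SAMPLING_RATE_HZ = 10_000  # hypothetical sampling rate

fast = standard_error_of_mean(
    NOISE_SD_PA, samples_during_dwell(0.001, SAMPLING_RATE_HZ))
slow = standard_error_of_mean(
    NOISE_SD_PA, samples_during_dwell(0.004, SAMPLING_RATE_HZ))
# A 4x longer dwell gives 4x the samples and halves the standard error.
```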
- the methods provided herein comprise performing an intramolecular expansion of a polymeric analyte, e.g., a peptide, to generate a modified monomer (e.g., modified amino acid).
- Such an intramolecular expansion process may comprise providing a linker, coupling the linker to a monomer of the polymeric analyte to generate a monomer-linker complex, coupling the linker or the monomer-linker complex to a capture moiety, and cleaving the monomer from the polymeric analyte, thereby yielding a modified monomer.
- the method may further comprise repeating the intramolecular expansion or one or more operations of the intramolecular expansion on the next monomer of the polymeric analyte to generate another modified monomer.
- the monomer-linker complex or the modified monomer resulting from a round or cycle of intramolecular expansion may be coupled to that of the previous round or cycle of intramolecular expansion to generate a stacked plurality of modified monomers, e.g., linked by a polymerizable molecule backbone.
- the modified monomer or stacked plurality of modified monomers is detected using a nanopore sequencer, thereby outputting the identity of the individual monomers of the polymeric analyte.
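The cycle described above (couple a linker, couple to a capture moiety, cleave, repeat, and stack) can be sketched as a simple bookkeeping loop. This is a toy model under stated assumptions: processing starts at the N-terminus, every step succeeds, and the dictionary fields are hypothetical names chosen for illustration.

```python
def intramolecular_expansion(peptide, cycles=None):
    """Simulate iterative expansion of a peptide given as a one-letter string.

    Returns (stacked, remaining): the ordered list of modified monomers and
    the part of the peptide that has not yet been processed.
    """
    remaining = list(peptide)
    stacked = []
    n_cycles = len(remaining) if cycles is None else min(cycles, len(remaining))
    for cycle in range(1, n_cycles + 1):
        # Couple a linker to the terminal monomer (monomer-linker complex),
        # couple it to the capture moiety, then cleave it from the peptide.
        terminal = remaining.pop(0)
        modified = {"cycle": cycle, "amino_acid": terminal, "linker": True}
        # Append to the products of earlier cycles to form the stack.
        stacked.append(modified)
    return stacked, "".join(remaining)

stacked, remaining = intramolecular_expansion("GLV", cycles=2)
read_out = "".join(m["amino_acid"] for m in stacked)  # order of detection
```

Because each modified monomer records its cycle number, the original order of the monomers can be recovered from the stack.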
- Modified amino acids may originate from or be part of a protein or peptide; for example, the modified amino acid may comprise or be derived from an amino acid located at a terminus (N-terminus or C-terminus) of a peptide, or the modified amino acid may comprise or be derived from an amino acid located within the peptide.
- the modified amino acid may comprise a proteinogenic amino acid or derivative thereof with any number of modifications. Non-limiting examples of modifications include chemical modifications (e.g., protecting groups), biological modifications (e.g., post-translational modifications, modifications introduced by enzymatic treatment or digestion), physical modifications (e.g., mutations introduced by irradiation, heat, etc.), and the like.
- the modified amino acid or derivative thereof comprises or is coupled to a binding agent, such as an antibody, antibody fragment, nanobody, aptamer, peptide, a small molecule, an inorganic compound, a polymer, or any variations or combinations thereof.
- the modified amino acid may comprise a covalent or noncovalent modification.
- the modified amino acid comprises a non-naturally occurring chemical modification.
- the modified amino acid may comprise a protecting group, such as, in non-limiting examples, a methyl, formyl, ethyl, acetyl, t-butyl, anisyl, benzyl, trifluoroacetyl, N-hydroxysuccinimide, t-butyloxycarbonyl, benzoyl, 4-methylbenzyl, thioanisyl, thiocresyl, benzyloxymethyl, 4-nitrophenyl, benzyloxycarbonyl, 2-nitrobenzoyl, 2-nitrophenylsulphenyl, 4-toluenesulphonyl, pentafluorophenyl, diphenylmethyl, 2-chlorobenzyloxycarbonyl, 2,4,5-trichlorophenyl, 2-bromobenzyloxycarbonyl, 9-fluorenylmethyloxycarbonyl, triphenylmethyl, or 2,2,5,7,8-pent,
- modifications include fluorescence labeling, isotope labeling, PEGylation, alkylation, arylation, acylation, benzylation, carbamoylation, guanidination, eliminylation, iodination, seleniation, thiolation, acetoacetylation, aminoethylation, carbonylation, cyanylation, glucuronidation, glucosylation, hydroxymethylation, lipoylation, mannosylation, naphthoquinone addition, nucleotide addition, phosphopantetheinylation, polysialylation, and tyramine addition.
- the modified amino acid comprises a proteinogenic amino acid or derivative thereof that is coupled to a polymerizable molecule. In some instances, the modified amino acid comprises a proteinogenic amino acid or derivative thereof, a linker, and a polymerizable molecule.
- the modified amino acid may comprise any useful modification. Modifications may be naturally-occurring (e.g., post translational modifications) or non-naturally occurring, such as by labeling or tagging, e.g., with an amino acid- or amine-reactive agent or linker comprising the amino acid- or amine-reactive agent.
- examples of amino acid- or amine-reactive agents include isothiocyanates (e.g., PITC, NITC), 1-fluoro-2,4-dinitrobenzene (DNFB), dansyl chloride, 4-sulfonyl-2-nitrofluorobenzene (SNFB), an acetylating agent, an acylating agent, an alkylating agent, a guanidination agent, a thioacetylation agent, a thioacylation agent, a thiobenzoylation agent, or a derivative or combination thereof.
- the one or more modified amino acids may comprise an adduct (e.g., a polymer such as PEG, a polymerizable molecule such as a nucleic acid molecule, a nanoparticle or nanotube, a peptide or protein), a lipid, a carbohydrate, a metabolite, a fluorophore, a hapten, a peptide, a synthetic peptide, a peptoid, a quencher, a tag (e.g., a fluorescent tag, a magnetic tag, a radioactive tag), a barcode, or other moiety.
- a modified amino acid may comprise a modification that facilitates recruitment of an enzyme (or ribozyme or DNAzyme) to recognize or cleave a terminal amino acid, e.g., a NTAA or CTAA of a peptide.
- a terminal amino acid of a peptide may be modified with a saccharide in order to recruit a lectin or lectin-bound protease.
- one or more modified amino acids may comprise or be coupled to a nucleic acid molecule having a first sequence that is complementary to a second sequence comprised by an oligo-bound protease.
- Hybridization of the first sequence to the second sequence may facilitate local recruitment of the protease to the amino acid to be cleaved.
- a peptide may be modified with phenylisothiocyanate (PITC), which may allow for recruitment and cleavage of the modified amino acid by an Edmanase.
- a peptide may be modified with a peptide or protein, which may allow for recruitment and cleavage of the modified amino acid by a protease.
- a peptide may be modified with a tag that may allow for recruitment and cleavage of the modified amino acid by an enzyme.
- a peptide may be modified with a functional moiety that is recognized by a specific cleaving enzyme; recognition and binding of the cleaving enzyme to the functional moiety may result in cleavage of the modified amino acid.
- modifications to amino acids may include epitope tags, which can facilitate binding of a binding agent to the modified amino acid. Examples of such epitope tags include fluorophores, nucleic acid molecules, peptides, haptens, polymers, chemical moieties, or other adduct molecules.
- the methods described herein may further comprise generating the modified amino acid.
- the modified amino acid comprises a proteinogenic amino acid or derivative thereof, a linker, and a polymerizable molecule.
- an amino acid of a peptide may be contacted with a linker that comprises (i) a first reactive moiety capable of reacting with the amino acid and (ii) a second reactive moiety.
- a polymerizable molecule comprising a third reactive moiety that is capable of reacting with the second reactive moiety may be provided.
- the second and third reactive moieties may comprise click chemistry moieties that can react with one another (e.g., azide and DBCO, azide and BCN, alkyne and DBCO, TCO and tetrazine, etc.).
- the reaction of the amino acid with the linker and the linker to the polymerizable molecule may thus yield a modified amino acid comprising the amino acid, the linker, and the polymerizable molecule.
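The pairing between the second and third reactive moieties can be sketched as a lookup of compatible click chemistry partners. The table below encodes only a subset of well-known pairs (strain-promoted azide-cyclooctyne, azide-alkyne, and TCO-tetrazine reactions); the table contents and function names are illustrative assumptions, not an exhaustive or authoritative list from the disclosure.

```python
# Unordered pairs of moieties assumed to react with one another.
REACTIVE_PAIRS = {
    frozenset(pair)
    for pair in [
        ("azide", "DBCO"),
        ("azide", "BCN"),
        ("azide", "alkyne"),
        ("TCO", "tetrazine"),
    ]
}

def can_couple(moiety_a, moiety_b):
    """True if the two moieties form a listed reactive pair (order-free)."""
    return frozenset((moiety_a, moiety_b)) in REACTIVE_PAIRS

def assemble(amino_acid, linker_moiety, polymer_moiety):
    """Return a description of the modified amino acid, or None if the
    linker and polymerizable molecule carry incompatible moieties."""
    if not can_couple(linker_moiety, polymer_moiety):
        return None
    return f"{amino_acid}-linker({linker_moiety})-polymer({polymer_moiety})"

product = assemble("Leu", "azide", "DBCO")
```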
- the polymerizable molecule may comprise the amino acid-reactive moiety (optionally via a linker) and may react directly with the amino acid.
- cleavage of the amino acid from the peptide may be performed, and the modified amino acid may comprise the cleaved product comprising the cleaved, and optionally derivatized, amino acid, the linker (if present), and the polymerizable molecule.
- the amino acid is a terminal amino acid.
- generating the modified amino acid may comprise (i) contacting an amino acid of a peptide with a polymerizable molecule comprising an amino acid reactive group and (ii) polymerizing the polymerizable molecule, thereby generating the modified amino acid.
- the polymerizable molecule may comprise a modified nucleotide comprising an amino acid reactive moiety (e.g., isothiocyanate, guanidinylating group, dithioester, xanthate, etc.).
- the amino acid reactive group may react with the amino acid (e.g., a terminal amino acid).
- the modified nucleotide may then be subjected to a nucleic acid reaction such as ligation (e.g., using ligase or chemical ligation such as click chemistry) or an extension reaction, e.g., using polymerase or terminal deoxynucleotidyl transferase (TdT), thereby generating the modified amino acid. See, e.g., FIG. 1B (inset).
- a modified amino acid may comprise a single amino acid or a plurality of amino acids (e.g., dipeptide, tripeptide, etc.).
- the modified amino acid may comprise 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more amino acids.
- the plurality of amino acids comprised by the modified amino acid may be of the same amino acid type (e.g., leu-leu, val-val, ile-ile) or different amino acid types (e.g., val-pro, arg-his, gly-leu).
- the plurality of amino acids comprised by the modified amino acid may comprise any number of modifications, e.g., as described elsewhere herein.
- the polymerizable molecules described herein may comprise any useful polymerizable moiety.
- the polymerizable moiety may comprise a naturally occurring or synthetic polymer (organic or inorganic) or biopolymer.
- the polymer may comprise one or more monomer types, which may assemble covalently or noncovalently.
- the polymer may comprise one or more repeating monomers.
- the polymer may comprise non-repeating units. In instances where more than one monomer type is used, the polymer may form an alternating copolymer structure, a periodic copolymer structure, a random copolymer structure, a block copolymer structure, a chained or grafted copolymer, or any other useful structure.
- the polymer may be linear or non-linear.
- the polymer may comprise polyethylene glycol (PEG), PEG-diacrylate, PEG-acrylate, PEG-thiol, PEG-azide, PEG-alkyne, polyacrylamide, agarose, collagen, fibrin, gelatin, chitosan, hyaluronic acid, alginate, polyvinyl alcohol, or another polymer.
- a polymerizable molecule may comprise one or more repeating units (e.g., monomers), or a polymerizable molecule may not comprise repeatable units.
- a polymerizable molecule may be expandable or extendible (e.g., capable of being polymerized) or the polymerizable molecule may not be expandable or extendible (e.g., comprises a terminal monomer or a terminal end that cannot be expanded).
- the polymerizable molecule may comprise a scaffold.
- the polymerizable molecule comprises a biomolecule, e.g., a protein or peptide, a nucleic acid molecule, e.g., DNA, RNA, XNA, PNA, LNA, a carbohydrate or lipid chain, or a combination thereof.
- the polymerizable molecule comprises a nucleic acid molecule, which can comprise any useful number and type of nucleotides, e.g., including canonical and noncanonical bases, and the number of nucleotides may be modulated based on the intended purpose.
- the length or sequence of the nucleic acid molecules may be modulated to alter a property of the amino acid (e.g., volume, aspect ratio, charge, etc.) to which the polymerized molecule is tethered.
- the nucleic acid molecule may additionally or alternatively comprise any useful functional sequences or moieties, including but not limited to barcode sequences or other identifying sequences, UMI sequences, template or recognition sequences for DNA repair or replication, 5' blocking groups, 3' blocking groups, protecting groups, enzyme recognition sites (e.g., transposition sites, restriction sites), spacer sequences or spacer moieties (e.g., dSpacer, 3' C3 Spacer, 2-aminopurine, hexanediol, Spacer 9, Spacer 18, etc.), sequencing primer sequences, read sequences, or primer sequences.
- the nucleic acid molecule may comprise canonical bases, noncanonical or modified bases, naturally occurring bases, synthetic bases, abasic sites, nucleotide analogs, or a combination thereof.
- the nucleic acid molecule can be single stranded, double stranded, or partially double stranded.
- the nucleic acid molecule may comprise RNA, DNA, or a hybrid of RNA and DNA.
- the modified amino acid or derivative thereof is generated using an iterative process, as described elsewhere herein; accordingly, the nucleic acid molecule may comprise information on the round or cycle number of the iterative process.
- the nucleic acid molecule comprises a nucleic acid barcode molecule and may comprise useful information on the identity of the amino acid, temporal information, spatial information, etc.
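As one hypothetical way a nucleic acid barcode could carry the round or cycle number, the sketch below writes the cycle number in base 4 using a fixed-length DNA string. The encoding scheme, the alphabet order, and the function names are assumptions for illustration only; the disclosure does not specify how such information is encoded.

```python
BASES = "ACGT"  # digit 0 -> A, 1 -> C, 2 -> G, 3 -> T (arbitrary choice)

def encode_cycle(cycle, length=4):
    """Encode a non-negative cycle number as a fixed-length DNA barcode."""
    if not 0 <= cycle < 4 ** length:
        raise ValueError("cycle number does not fit in the barcode length")
    digits = []
    for _ in range(length):
        cycle, remainder = divmod(cycle, 4)
        digits.append(BASES[remainder])
    return "".join(reversed(digits))  # most significant digit first

def decode_cycle(barcode):
    """Recover the cycle number from a barcode produced by encode_cycle."""
    value = 0
    for base in barcode:
        value = value * 4 + BASES.index(base)
    return value

barcode = encode_cycle(6)  # 6 is 0012 in base 4, so "AACG"
```

A real design would likely add error tolerance (for example, barcodes separated by a minimum edit distance), which this sketch omits.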
- the polymerizable molecules described herein may be any useful type of polymerizable molecule.
- the polymerizable molecules may be naturally occurring, such as biological polymers (e.g., nucleic acid molecules, peptides, polysaccharides, fatty acids), or other naturally occurring polymers, e.g., rubber, cellulose, starches, polyhydroxyalkanoates, chitosan, dextran, structural proteins (e.g., collagen, hyaluronic acid, glycosaminoglycans), agarose, carrageenan, ispaghula, acacia, agar, gelatin, shellac, xanthan gum, guar gum, alginate, etc.
- the polymerizable molecules may be synthetic, e.g., acrylics, nylons, silicones, viscose, rayon, polyesters, polycarboxylic acids, polyvinyl acetate, polyacrylamide, polyacrylate, polyethylene glycol, polyurethane, polylactic acid, silica, polystyrene, polyacrylonitrile, polybutadiene, polycarbonate, polyethylene terephthalate, poly(chlorotrifluoroethylene), poly(ethylene oxide), poly(ethylene terephthalate), polyethylene, polyisobutylene, poly(methyl methacrylate), poly(oxymethylene), polyformaldehyde, polypropylene, polystyrene, poly(tetrafluoroethylene), poly(vinyl acetate), poly(vinyl alcohol), poly(vinyl chloride), poly(vinylidene dichloride), poly(vinylidene difluoride), poly(vinyl fluoride) and combinations thereof.
- the polymerizable molecule may be charged (positively charged, negatively charged) uniformly or nonuniformly.
- the polymerizable molecule may comprise both a positively charged region and a negatively charged region.
- the polymerizable molecules may comprise one or more reactive moieties (e.g., radical groups) to initiate polymerization or to add a functional group, or the polymerizable molecules may be polymerized by contact with an initiating agent (e.g., ammonium persulfate, peroxide, or other radicalizing agent).
- the polymerizable molecules may be polymerizable via an enzymatic reaction or by contact with an enzyme (e.g., a polymerizing enzyme such as a polymerase), ribozyme, or DNAzyme.
- the polymerizable molecules may be polymerizable via self-assembly.
- the polymerizable molecules may comprise a single polymer type (e.g., a homopolymer) or more than one polymer type (e.g., a copolymer) and may comprise random or arranged monomers.
- the polymerizable molecules may be a block polymer, alternating copolymer, periodic copolymer, statistical copolymer, stereoblock copolymer, gradient copolymer, branched copolymer, graft copolymer, etc.
- the polymerizable molecule may comprise one or more non-repeat monomers.
- the polymerizable molecules may encompass any useful geometry or shape.
- the polymerizable molecule may comprise a linear, circular, branched or other polymer.
- the polymerizable molecules comprise one or more nucleic acid molecules, which may have any useful shape or geometry, e.g., single-stranded, double-stranded, partially double-stranded, hairpin, 2D or 3D structure (e.g., DNA origami).
- a first polymerizable molecule may be a nucleic acid molecule.
- a second polymerizable molecule may be a peptide.
- both the first polymerizable molecule and the second polymerizable molecule are nucleic acid molecules.
- the first polymerizable molecule may be coupled to the second polymerizable molecule via ligation, hybridization, an extension reaction, or a combination thereof.
- the first polymerizable molecule may comprise a first nucleic acid sequence and the second polymerizable molecule may comprise a second nucleic acid sequence.
- the first nucleic acid sequence may be complementary or partially complementary to the second nucleic acid sequence, and the coupling may comprise hybridizing the first nucleic acid sequence or portion thereof to the second nucleic acid sequence or portion thereof.
- the first nucleic acid sequence and the second nucleic acid sequence may be complementary to two sequences of a splint or bridge oligonucleotide, and coupling may be mediated via hybridization to the splint oligo.
- the first nucleic acid sequence may be ligated to the second nucleic acid sequence, either chemically (e.g., via click chemistry approaches in which the first polymerizable molecule and the second polymerizable molecule each comprise one member of a click chemistry pair) or enzymatically (e.g., using a ligase).
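The hybridization and splint-mediated coupling described above can be sketched with a reverse-complement check. The example assumes DNA sequences written 5' to 3', perfect Watson-Crick pairing, and a splint that spans exactly the junction formed by the first sequence followed by the second; the function names and sequences are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]

def is_complementary(first, second):
    """True if the two sequences can hybridize along their full length."""
    return second.upper() == reverse_complement(first)

def splint_bridges(first, second, splint):
    """True if the splint anneals across the first-second junction,
    holding the 3' end of `first` next to the 5' end of `second`."""
    return splint.upper() == reverse_complement(first + second)

direct = is_complementary("GATTACA", "TGTAATC")
bridged = splint_bridges("AAGC", "TTCA", "TGAAGCTT")
```

Partial complementarity, mismatches, and melting temperature are ignored here; they would matter in any real design.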
- the polymerizable molecules may comprise functional portions or functional groups.
- the polymerizable molecules may comprise a nucleic acid molecule comprising a functional sequence, such as a primer sequence (e.g., universal priming site), a sequencing sequence, a read sequence, a unique molecular identifier (UMI), a barcode sequence, a cleavage sequence (e.g., a restriction site, a Cas-binding sequence), a transposition sequence (e.g., a mosaic end sequence), a spacer moiety, a primer-binding sequence, or a combination thereof.
- the polymerizable molecule may comprise a linker or linking moiety, e.g., a reactive or cross-linking moiety such as click chemistry moieties (e.g., alkyne, azide, DBCO, BCN, tetrazine, TCO), photoreactive groups (e.g., benzophenone), 1-ethyl-3-(3-dimethylaminopropyl)carbodiimide hydrochloride (EDC), N-hydroxysuccinimide (NHS), Sulfo-NHS, or NHS-esters, sulfhydryls, amine groups, maleimides, hydrazines, hydroxylamines, thiols, biotin or streptavidin, cystamine, glutaraldehyde, formaldehyde, succinimidyl 4-(N-maleimidomethyl)cyclohexane-1-carboxylate (SMCC), Sulfo-SMCC,
- the polymerizable molecule may comprise an amino acid reactive group, e.g., an isothiocyanate, an aldehyde, a guanidinylating agent, a xanthate, a dithioester or thiocarbamoyl, or other amino acid reactive group, as described elsewhere herein.
- the polymerizable molecules may comprise a tag, which may be useful in identifying, enriching, or purifying the polymerizable molecule.
- a tag may be a barcode, e.g., a nucleic acid barcode molecule, fluorophore or fluorescent protein, bioluminescent tag, chemiluminescent tag, isobaric tag, radioisotope, mass tag, or other detectable moiety, or a combination thereof.
- the tag may be used in enriching or purifying the polymerizable molecule and may comprise, for example, an affinity tag (e.g., an antibody, an aptamer, a biotin molecule, a streptavidin molecule, etc.).
- a polymerizable molecule may comprise a biotin tag, which may enable pulldown or purification using streptavidin (e.g., streptavidin beads).
- the polymerizable molecule may be cleavable or comprise a cleavable moiety.
- the cleavage of the polymerizable molecule or cleavable moiety may be achieved using a stimulus.
- the stimulus can be, for example, a chemical stimulus (e.g., application of an acid or base, a reducing agent), a biological stimulus (e.g., a cleaving enzyme, a protease, a nuclease), a thermal stimulus (e.g., application of heat), a photo-stimulus, a physical or mechanical stimulus, or other type of stimulus or a combination of stimuli, as described elsewhere herein.
- the polymerizable molecule may comprise a cleavable tag, e.g., a photocleavable biotin moiety, which can be useful for purifying or enriching the polymerizable molecule from a mixture (e.g., during an operation of intramolecular expansion of a polymeric analyte).
- the polymerizable molecule comprises a nucleic acid sequence that is cleavable using a chemical stimulus; for example, the nucleic acid sequence may comprise only purine nucleobases and may be cleavable using acid-catalyzed depurination and cleavage (e.g., using trifluoroacetic acid or boron triflate etherate).
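A design check for the purine-only case can be sketched as follows. It assumes a DNA sequence over the letters A, C, G, and T, with adenine and guanine as the purines; the function name and example sequences are hypothetical.

```python
PURINES = frozenset("AG")

def is_purine_only(seq):
    """True if a non-empty DNA sequence contains only adenine and guanine."""
    seq = seq.upper()
    return bool(seq) and all(base in PURINES for base in seq)

cleavable_candidate = is_purine_only("GAGGAAGA")  # hypothetical sequence
mixed_sequence = is_purine_only("GAGTAAGA")       # contains a thymine
```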
- the polymerizable molecules may be any useful size.
- the polymerizable molecules may be about 1 angstrom, about 2 angstrom, about 3 angstrom, about 4 angstrom, about 5 angstrom, about 6 angstrom, about 7 angstrom, about 8 angstrom, about 9 angstrom, about 10 angstrom, about 20 angstrom, about 30 angstrom, about 40 angstrom, about 50 angstrom, about 60 angstrom, about 70 angstrom, about 80 angstrom, about 90 angstrom, about 100 angstrom, about 200 angstrom, about 300 angstrom, about 400 angstrom, about 500 angstrom, about 600 angstrom, about 700 angstrom, about 800 angstrom, about 900 angstrom, about 1000 angstrom, about 10,000 angstrom, about 100,000 angstrom or greater in size, length, or another dimension.
- the polymerizable molecule (e.g., the first polymerizable molecule or the second polymerizable molecule) comprises a nucleic acid molecule comprising one or more nucleotide bases.
- the polymerizable molecule may comprise any useful number of nucleotide bases, e.g., about 1 base, about 2 bases, about 3 bases, about 4 bases, about 5 bases, about 6 bases, about ?
- the polymerizable molecules comprise a nucleic acid molecule or analog thereof.
- the nucleic acid molecule can be single stranded, double stranded, or partially double-stranded.
- the nucleic acid molecule may comprise a crosslinked nucleic acid molecule.
- the nucleic acid molecule may comprise a modified nucleotide or non-canonical base.
- the polymerizable molecules may comprise a pseudo-complementary base, a bridged nucleic acid (BNA), a xenonucleic acid (XNA), a locked nucleic acid (LNA), a peptide nucleic acid (PNA), a gamma-PNA molecule, a morpholino, or a combination thereof.
- the polymerizable molecule comprises a PNA, which may be advantageous in increasing the stability of the molecule under reaction conditions (e.g., acidic cleavage).
- a polymerizable molecule may comprise a hexitol nucleic acid (HNA) or a cyclohexyl nucleic acid (CeNA), which may be useful in rendering the polymerizable molecule more resistant to acid degradation (e.g., as used in conventional Edman degradation).
- a polymerizable molecule may comprise naturally occurring bases that are more resistant to acid degradation, e.g., be composed of primarily thymine or cytosine or analogs thereof.
- a nucleic acid molecule may comprise at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, or 100% thymines or cytosines, which can render the nucleic acid molecule more acid resistant as compared to a nucleic acid molecule comprising adenines or guanines.
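The composition thresholds above can be checked with a short helper that computes the fraction of thymines and cytosines in a sequence. This is a minimal sketch assuming a DNA sequence over A, C, G, and T; the names and the example sequence are hypothetical, and the check says nothing about actual acid stability.

```python
def tc_fraction(seq):
    """Fraction of bases in a DNA sequence that are thymine or cytosine."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return sum(base in "TC" for base in seq) / len(seq)

def meets_tc_threshold(seq, threshold=0.5):
    """True if at least `threshold` of the bases are thymine or cytosine."""
    return tc_fraction(seq) >= threshold

candidate = "TTCCTTCCAG"            # hypothetical sequence, 8 of 10 are T or C
fraction = tc_fraction(candidate)
passes_80 = meets_tc_threshold(candidate, 0.8)
passes_90 = meets_tc_threshold(candidate, 0.9)
```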
- the polymerizable molecules may comprise more than one molecule type.
- the polymerizable molecule may comprise a conjugate of two or more polymer types, such as two or more biopolymers, a biopolymer and a synthetic polymer, two or more synthetic polymers, etc.
- the polymerizable molecule may comprise a peptide-oligo conjugate, a peptide-polymer conjugate, an oligo-polymer conjugate, etc.
- the modified amino acid may comprise a polymerizable molecule coupled covalently or non-covalently thereto.
- the coupling may be performed using any suitable chemistries and reaction conditions and may comprise the use of a linker.
- a polymerizable molecule such as a nucleic acid molecule may comprise a first reactive group, e.g., a first click chemistry moiety, as described elsewhere herein, and may be contacted with a linker comprising a second reactive group, e.g., a second click chemistry moiety that is able to react with the first reactive group.
- the linker may also comprise an additional reactive group that is able to tether to an amino acid (e.g., a terminal amino acid) and optionally, cleave the amino acid from a peptide.
- the additional reactive group may be a thiocyanate conjugate, e.g., an isothiocyanate (ITC) such as phenyl isothiocyanate (PITC) or naphthylisothiocyanate (NITC), or an aldehyde group, e.g., ortho-phthalaldehyde (OPA), 2,3-naphthalenedicarboxyaldehyde (NDA), a guanidinylating agent, dinitrofluorobenzene (DNFB), dansyl chloride, a dithioester, a thiobenzoyl, a thioacetyl, a xanthate, or other amino acid-reactive group.
- the linker may be reacted with an amino acid of a peptide, e.g., the NTAA or CTAA.
- a linker comprising at least two reactive groups may allow for (i) tethering of the amino acid to the linker and (ii) tethering of the linker to the polymerizable molecule (such as a nucleic acid molecule; see, e.g., FIG. 2A-2B).
- the linker may be provided pre-tethered to the polymerizable molecule (e.g., nucleic acid molecule) prior to contacting with the amino acid.
- the conjugation of the polymerizable molecule to the amino acid may change the chemical structure of the amino acid.
- the amino acid may be derivatized to a thiocarbamyl group (e.g., under alkaline conditions), a thiazolone group (e.g., under acid conditions), a thiohydantoin group, or other chemical moiety, thereby generating a modified amino acid comprising a polymerizable molecule coupled thereto.
- the polymerizable molecule comprises a linker that can couple to an amino acid-linker complex.
- the polymerizable molecule may comprise a nucleic acid molecule that comprises a linker that comprises a click chemistry moiety.
- the linker of the polymerizable molecule may be coupled via a synthetic nucleobase or to the backbone.
- an octadiynyl deoxynucleotide or ribonucleotide or an ethynyl deoxynucleotide or ribonucleotide may be incorporated into a DNA molecule, thereby yielding a linker-conjugated DNA molecule.
- the linker-conjugated DNA molecule may then react, e.g., via a click chemistry reaction, with an amino acid-linker complex or another linker (e.g., a linker comprising an amino acid reactive moiety or that is already reacted to an amino acid and additionally comprises a click chemistry moiety).
- the polymerizable molecule of the modified amino acid may facilitate translocation of the modified amino acid through the nanopore or nanogap.
- the polymerizable molecule may be able to couple to a processive enzyme that allows for ratcheting of the polymerizable molecule through the nanopore or nanogap.
- the polymerizable molecule comprises a nucleic acid molecule, and translocation of the modified amino acid through the nanopore or nanogap is facilitated using a DNA or RNA-processing enzyme, such as a helicase, polymerase, topoisomerase, or other enzyme.
- the ratcheting enzyme may be naturally derived (e.g., from a virus such as poxvirus helicase-primase D5, E1 (papillomaviridae), or herpesviridae (UL5), poxvirus DNA polymerase holoenzyme, E9, A20, D4, bacteria, mammal), or the enzyme may be an engineered variant.
- a modified amino acid having the same amino acid type may be coupled to the same or different linkers and/or polymerizable molecules.
- a first modified amino acid and a second modified amino acid may comprise the same polymerizable molecule but different linkers (e.g., different click chemistry moieties), or the same linker but different polymerizable molecules (e.g., a protein and a nucleic acid molecule, two different nucleic acid sequences, etc.).
- Linkers: One or more linkers may be used to couple molecules, e.g., the polymerizable molecule to an amino acid or derivative thereof to thereby generate the modified amino acid or a precursor or derivative thereof.
- the linker comprises a click chemistry moiety.
- the click chemistry moiety may comprise any suitable clickable or bioorthogonal moieties, as described elsewhere herein, e.g., alkenes, alkynes (e.g., alkyne, cycloalkynes such as DBCO and BCN), azides, epoxides, amines, thiols, nitrones, isonitriles, isocyanides, aziridines, activated esters, and tetrazines, and combinations, variations, or derivatives thereof.
- the linker may be subjected to conditions sufficient to react the click chemistry moiety of the linker to an additional click chemistry moiety (e.g., comprised by the polymerizable molecule), e.g., provision of metal catalysts, appropriate solvents, pH, temperature, ionic concentration, or light/energy for any useful duration of time.
- the coupling of the linker to a monomer of a polymeric analyte (e.g., an amino acid of a peptide) and/or to the polymerizable molecule or to a capture moiety may be covalent or noncovalent.
- a linker may comprise a first reactive group that is able to couple to a monomer of the polymeric analyte (e.g., an amino acid of a peptide) and optionally, cleave the amino acid from a peptide.
- the first reactive group may be an amino acid-reactive group, e.g., an isothiocyanate (ITC) such as isothiocyanate, phenyl isothiocyanate (PITC), 3-pyridyl isothiocyanate (PYITC), 2-piperidinoethyl isothiocyanate (PEITC), 3-(4-morpholino)propyl isothiocyanate (MPITC), 3-(diethylamino)propyl isothiocyanate (DEPTIC) or naphthyl isothiocyanate (NITC), fluorescein isothiocyanate (FITC), ammonium thiocyanate, potassium thiocyanate, trimethylsilyl isothiocyanate (TMS-ITC), phenyl phosphoroisothiocyanatidate, acetyl isothiocyanate (AITC), or an aldehyde group, e.g., orthophthalaldeh
- the linker may additionally comprise a second reactive group that is capable of coupling, either directly or indirectly, to the capture moiety.
- the capture moiety may comprise a click chemistry moiety (e.g., alkyne)
- the second reactive group of the linker may comprise an additional click chemistry moiety (e.g., azide) that can react with the click chemistry moiety of the capture moiety.
- the linker may be coupled indirectly to the capture moiety, e.g., via noncovalent interaction or via an intermediate linking molecule.
- the intermediate linking molecule may comprise a polymerizable molecule (e.g., a polymer or nucleic acid molecule) that can couple the linker to the capture moiety.
- the polymerizable molecule may comprise (i) a third reactive group that is capable of coupling to the second reactive group (e.g., via alkyne-azide click chemistry) of the linker and (ii) a moiety that can couple to the capture moiety (e.g., another orthogonal click chemistry reaction, avidin-biotin interaction, nucleic acid coupling or hybridization).
- the linking polymerizable molecule comprises a nucleic acid molecule that comprises (i) a click chemistry moiety (e.g., alkyne) that can conjugate to a reactive group (e.g., azide) of the linker and (ii) a nucleic acid sequence that can couple to the capture moiety, e.g., via ligation, splint ligation, or hybridization.
- the linking polymerizable molecule may comprise the linker comprising an amino acid-reactive moiety.
- the linker comprises a linking nucleic acid molecule that comprises a self-splinting moiety.
- the linker may be directly coupled to the polymerizable molecule (e.g., the polymerizable molecule comprises the linker that can couple to a monomer of the polymeric analyte).
- the polymerizable molecule may comprise an amino acid reactive moiety, which may enable direct coupling of the polymerizable molecule to the monomer of the polymeric analyte.
- the polymerizable molecule comprises a nucleic acid molecule that comprises an amino acid-reactive moiety, e.g., a guanidinylating group, a dithioester, an isothiocyanate (e.g., PITC), etc., which can be used to conjugate the polymerizable molecule to an amino acid, e.g., an N-terminal amino acid of a peptide analyte.
- the peptide analyte may comprise or be coupled to an additional nucleic acid molecule; in such instances, the additional nucleic acid molecule may be used to localize (e.g., via hybridization) the polymerizable molecule to enable more efficient reaction of the amino acid-reactive group with the amino acid.
- the click chemistry moieties of the linker and capture moiety or intermediate linking molecule may comprise useful clickable moieties, as described elsewhere herein, e.g., alkenes, alkynes, azides, epoxides, amines, thiols, nitrones, isonitriles, isocyanides, aziridines, activated esters, and tetrazines, and combinations, variations, or derivatives thereof.
- the linker may be subjected to conditions sufficient to react the first click chemistry moiety to the second click chemistry moiety, e.g., provision of metal catalysts, appropriate solvents, pH, temperature, ionic concentration, or light/energy for any useful duration of time.
- the linker may comprise an amino acid-reactive moiety.
- the amino acid-reactive moiety of the linker may be any useful moiety that enables the reactive moiety to conjugate to and optionally cleave an amino acid.
- the first reactive moiety can react with a terminal amino acid (e.g., NTAA or CTAA).
- the first reactive moiety may comprise any primary amine or carboxylic group reactive group, including but not limited to isocyanates, acyl azides, NHS esters, sulfonyl chlorides, aldehydes, glyoxals, epoxides, oxiranes, carbonates, aryl halides, acyl halides, aldehydes, imidoesters, carbodiimides, anhydrides, phenyl esters, isothiocyanates (e.g., phenyl isothiocyanate, sodium isothiocyanate, ammonium isothiocyanates (e.g., tetrabutylammonium isothiocyanate, tetrabutylammonium isothiocyanate), diphenylphosphoryl isothiocyanate), acetyl chloride, cyanogen bromide, carboxypeptidases, azide, alkyne, DBCO, maleimide,
- linkers or sequencing reagents comprising amino acid reactive groups are provided in U.S. Pat. Pub. No. 2020/0217853, International Patent App. Nos. PCT/US2023/079684, filed November 14, 2023, and PCT/US2024/013211, filed January 26, 2024, and U.S. Provisional Patent App. No. 63/601,389, filed November 21, 2023, each of which is incorporated by reference herein in its entirety.
- the linker may comprise any additional useful moieties.
- the linker may comprise a releasable or cleavable moiety, which may facilitate removal of the monomer from the polymeric analyte, or portion thereof, or from the substrate.
- Such a releasable or cleavable moiety may comprise, for example, a disulfide bond, which may be releasable by contacting with a reducing agent (e.g., DTT, TCEP).
- the linker may couple to the third polymerizable molecule via the releasable or cleavable moiety, alternatively or in addition to the coupling via click chemistry moieties. As such, the coupling between the polymerizable molecule and the linker may be reversible.
- the linker may additionally or alternatively comprise any number of spacing moieties, e.g., polymers (e.g., PEG, PVA, polyacrylamide), peptide nucleic acids (PNAs), aminohexanoic acid, nucleic acids, alkyl chains, etc.
- spacing moieties may increase the distance between any other moieties of the linker, e.g., the amino acid-reactive group and the polymerizable molecule-reactive group.
- the spacing moiety may comprise any useful properties, e.g., comprise one or more moieties that are charged, polar, nonpolar, hydrophobic, hydrophilic, or a combination thereof.
- the linker may comprise a carbon, ethylene glycol, acetylene, isothianaphthene, dimethyl silane, urethane, glycolic acid, lactic acid, dioxanone, methyl methacrylate, hydroxy ethyl methacrylate, vinyl chloride, tetrafluoroethylene, propylene, ethylene, ether ketone, ether sulfone, fluorene, aniline, phenylene, polypyrrole, phenylenevinylene, fluorene, thiophene, or 3,4-ethylenedioxythiophene.
- the linker may comprise or be coupled to a detectable moiety, e.g., a fluorophore, radioisotope, mass tag, nucleic acid molecule (which can also act as a releasable or cleavable moiety), or other detectable moiety.
- the linker comprises a fluorophore, which can enable localization and visualization of the linker using single-molecule imaging.
- the monomer may be labeled with a first fluorophore and the linker may comprise a second fluorophore to enable localization and visualization of the linker and the monomer (e.g., using two-channel imaging or FRET).
- the linker comprises a barcode, e.g., a nucleic acid barcode or a peptide barcode.
- a linker comprising two reactive groups may allow for coupling of the linker to (i) the monomer of the polymeric analyte and (ii) a linking polymerizable molecule (e.g., linking nucleic acid molecule) or (iii) the capture moiety.
- the linker may be pre-coupled to the linking polymerizable molecule.
- a precursor linker may comprise a monomer-binding group (e.g., PITC) and a click chemistry moiety (e.g., azide), which may be reacted with a polymerizable molecule (e.g., oligonucleotide) comprising complementary click chemistry moiety (e.g., alkyne) to generate a linker that is capable of coupling to the monomer and the capture moiety (e.g., another oligonucleotide).
- the linking polymerizable molecule may comprise an amino acid reactive group (e.g., an isothiocyanate such as PITC, guanidinylating agent, xanthate, dithioester, etc.).
- the linker may be provided pre-coupled to the linking polymerizable molecule.
- FIG. 2A schematically shows an example linker that may be used in sequencing polymeric analytes such as peptides.
- FIG. 2A Panel A shows a bifunctional linker 203 (e.g., 1-(but-3-yn-1-yl)-4-isothiocyanatobenzene) comprising an amino acid reactive moiety (e.g., PITC) and an alkyne click chemistry moiety, which may be reacted with a polymerizable molecule 201 (e.g., a linking nucleic acid molecule) comprising a complementary azide click chemistry moiety.
- the bifunctional linker may also comprise a spacer moiety, e.g., an alkyl chain (an ethyl group is depicted) of any length, a polymer (e.g., PEG) of any length, etc.
- the spacer moiety may be located between the amino acid reactive moiety and the click chemistry moiety.
- FIG. 2A Panel B shows the product of a click chemistry cycloaddition reaction between the azide and alkyne groups to generate a linker molecule comprising the polymerizable molecule and the amino acid reactive moiety.
- the conjugation of the polymerizable molecule 201 to the bifunctional linker 203 may occur at any useful or convenient step.
- the bifunctional linker 203 may comprise an azide group, e.g., 1-(2-azidoethyl)-4-isothiocyanatobenzene, which can be reacted with a polymerizable molecule 201 comprising an alkyne moiety.
- FIG. 2B illustrates another example linker that can be used in sequencing polymeric analytes such as peptides.
- the linker comprises a guanidinylating group, which can react with and conjugate to an N-terminal amino acid with high efficiency under relatively mild conditions.
- the linker also comprises a spacer moiety that comprises a charged linker (ammonium) moiety that connects the guanidinylating group with a click chemistry moiety (modified tetrazine comprising dihydro-2H-pyran).
- the charged linker moiety may be useful in increasing the attraction (decreasing electrostatic repulsion) of the linker to another charged polymerizable molecule, e.g., a linking DNA molecule, and/or decreasing the translocation speed of the linker (or the modified amino acid comprising the linker) through a nanopore sequencer.
- the modified tetrazine can react with another molecule, e.g., a polymerizable molecule or capture moiety that comprises a trans-cyclooctene (TCO) moiety, in a highly efficient click chemistry reaction.
- the polymeric analyte comprises a peptide that comprises amino acid monomers
- the coupling of the linker to an amino acid changes the chemical structure of the amino acid.
- the amino acid may be derivatized to a thiocarbamyl group (e.g., under mildly alkaline conditions) during or subsequent to contact with the isothiocyanate moiety.
- the amino acid or amino acid derivative (e.g., thiocarbamyl-derivatized amino acid)
- the amino acid or amino acid derivative may be further derivatized to a thiazolone group (e.g., under acid conditions), a thiohydantoin group, or other chemical moiety.
- a thiazolone group or thiohydantoin group may be further derivatized to a thiocarbamyl group.
- the polymerizable molecule comprises a linker.
- the linker may be used, for instance, for coupling a reactive moiety to the polymerizable molecule, which reactive moiety can react with that of another linker or with a monomer (e.g., terminal amino acid).
- a polymerizable molecule e.g., nucleic acid molecule
- the linker comprising the click chemistry moiety may be coupled to the polymerizable molecule using any useful approach, e.g., by incorporation of a linker-conjugated nucleotide or nucleoside and may be located at any useful position (e.g., at a 5’ end, at a 3’ end, in the center or other position of the polymerizable molecule).
- a click-functionalized nucleotide or nucleoside e.g., ethynyl deoxyuridine, octadiynyl deoxyuridine, can be incorporated into a DNA or RNA molecule.
- the click chemistry moiety of the polymerizable molecule may then couple to another linker that comprises a complementary click chemistry moiety and also an amino acid reactive group (e.g., isothiocyanate, dansyl chloride, DNFB, xanthate, dithioester, guanidinylating agent, NHS ester, etc.).
- the polymerizable molecule may comprise a free amine (e.g., a modified nucleobase or backbone with a free amine).
- the free amine may be able to react with another linker that comprises, for example, an NHS ester group and also an amino acid reactive group.
- the polymerizable molecule may comprise the amino acid reactive group.
- the linker may be coupled to a backbone of a polymerizable molecule.
- the linker may form a phosphodiester or analogous bond with a DNA polymerizable molecule, or the linker may form an amide bond with a peptide polymerizable molecule.
- the polymerizable molecule may comprise any useful number of linkers; for example, the polymerizable molecule may comprise a first linker that can couple to a second linker that can couple to a third linker that is coupled to or configured to couple to a monomer of a polymeric analyte, e.g., an amino acid of a peptide.
- the modified amino acid (e.g., comprising the cleaved amino acid and the polymerizable molecule)
- the linker may comprise any number of spacing moieties, e.g., alkyl chains, polymer spacers (e.g., PEG), nucleic acid or oligo spacers, or other useful spacing moieties which may be useful in modulating the size or molecular weight of the linker.
- the linker may comprise at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, or a greater number of spacing moieties (e.g., hydrocarbon units, PEG units, nucleotides, spacer sequences, etc.).
- the linker may comprise at most about 100, at most about 10, at most about 9, at most about 8, at most about 7, at most about 6, at most about 5, at most about 4, at most about 3, at most about 2, or at most 1 spacing moiety.
- the linker may comprise any useful number of functional groups, e.g., for attachment to multiple molecules.
- the linker may comprise at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, or a greater number of functional groups.
- the linker may be modulated to achieve a size range or relative size as compared to another molecule in the system. For instance, it may be advantageous to use a linker that has a ratiometric length or radius as compared to a nanopore or nanogap channel, e.g., to increase dwell time in a nanopore. Accordingly, the linker size or molecular weight may be adjusted to achieve an optimal length or molecular weight, e.g., via addition of spacer moieties, monomers, or by linking multiple linkers together.
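The idea of tuning linker length relative to a nanopore channel can be illustrated with a minimal sketch. The per-unit lengths below and the channel length used in the usage note are illustrative assumptions and are not values given in this document; the function names are hypothetical.

```python
import math

# Hypothetical per-unit contour lengths in nanometres (illustrative assumptions only).
UNIT_LENGTH_NM = {"PEG": 0.35, "CH2": 0.125, "nt": 0.6}


def linker_length_nm(units):
    """Sum the contour length of a linker described as {unit_name: count}."""
    return sum(UNIT_LENGTH_NM[name] * count for name, count in units.items())


def length_ratio(units, channel_length_nm):
    """Ratio of linker contour length to the channel length of a nanopore or nanogap."""
    return linker_length_nm(units) / channel_length_nm


def units_needed(target_nm, unit):
    """Smallest count of a single spacer unit whose total length reaches target_nm."""
    return math.ceil(target_nm / UNIT_LENGTH_NM[unit])
```

For example, under these assumed values, `units_needed(5.0, "PEG")` gives the number of PEG units needed to span an assumed 5 nm channel, and `length_ratio({"PEG": 20}, 5.0)` reports how a 20-unit PEG spacer compares to that channel.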
- the linker may comprise a charged moiety (e.g., cation, anion, polyatomic ion, a nucleic acid molecule) or any other useful functional groups such as hydrophilic moieties, hydrophobic moieties, chelators, or lipophilic moieties.
- the linker may comprise a carboxylic moiety, a primary amine, a secondary amine, a tertiary amine, or a quaternary amine.
- the modified amino acid or derivative thereof may be produced from a peptide using an intramolecular expansion process, e.g., using one or more linkers and polymerizable molecules.
- in intramolecular expansion, individual amino acids, clusters of amino acids, or small peptides (e.g., a dipeptide, tripeptide, or quadripeptide) of a protein (e.g., a peptide or protein analyte) may be sequentially removed and re-tethered together, such that the distance (e.g., a chemical distance such as the number of atoms or a physical distance) between the individual amino acids or clusters of amino acids is increased.
- performing an intramolecular expansion process of one or more amino acids of a peptide may obviate or overcome several issues with nanopore sequencing of peptides. For instance, by increasing the spacing between amino acids, fewer amino acids may enter the nanopore at a given instant, thereby reducing the quantity of superimposed signals arising from the number of amino acids within the nanopore. Further, increasing the spacing between amino acids may disrupt intramolecular interactions that convolve the current blockade signal, thereby allowing for higher-accuracy signal output from the nanopore. In some instances, the intramolecular expansion process may sufficiently separate the amino acids, such that only one amino acid or derivative thereof is present in a sensing region of a nanopore or nanogap at a given moment.
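The effect of spacing on how many residues can occupy the sensing region at once can be sketched with simple geometry. This is a toy one-dimensional model, not a method described in this document: residues are treated as points at a uniform spacing along a straight chain, and the sensing-region length and spacings in the usage note are assumed values.

```python
import math


def max_residues_in_region(sensing_length_nm, spacing_nm):
    """Upper bound on evenly spaced residues that fit inside a sensing region.

    Points spaced `spacing_nm` apart can place at most floor(L / d) + 1 of
    themselves inside a closed interval of length L.
    """
    if sensing_length_nm < 0 or spacing_nm <= 0:
        raise ValueError("lengths must be non-negative and spacing positive")
    return math.floor(sensing_length_nm / spacing_nm) + 1


def single_occupancy(sensing_length_nm, spacing_nm):
    """True when the spacing guarantees at most one residue in the region."""
    return max_residues_in_region(sensing_length_nm, spacing_nm) == 1
```

With an assumed 1.0 nm sensing region, an assumed native spacing of 0.35 nm allows up to three residues in the region at once, whereas an expanded spacing of 5.0 nm allows only one.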
- a method for intramolecular expansion may comprise providing a peptide comprising a plurality of amino acids, a linker (e.g., as described elsewhere herein), a polymerizable molecule (e.g., as described elsewhere herein), and a capture moiety.
- the linker may be configured to couple to (i) an amino acid (e.g., NTAA, CTAA, or an internal amino acid) of the peptide and (ii) the polymerizable molecule.
- the method may further comprise contacting the linker with the amino acid and the polymerizable molecule.
- the linker may be provided pre-tethered to the polymerizable molecule and subsequently reacted with the amino acid.
- the linker may couple to the amino acid of the peptide to generate an amino acid-linker complex, which may or may not comprise the polymerizable molecule.
- the amino acid-linker complex may then couple to the capture moiety via the polymerizable molecule.
- the polymerizable molecule and the capture moieties may both comprise nucleic acid molecules, which may be coupled via hybridization, ligation, an extension reaction, or combination thereof.
- the method may further comprise cleaving the amino acid from the peptide to yield a modified amino acid that comprises the amino acid-linker complex, and optionally repeating the process.
- an additional linker may be provided which is configured to couple to (i) an additional amino acid of the peptide (e.g., the n-1 NTAA or n-1 CTAA) and (ii) an additional polymerizable molecule.
- the additional linker may be provided pre-coupled to the additional polymerizable molecule.
- the method may further comprise contacting the additional linker with the additional amino acid to generate an additional amino acid-linker complex.
- the additional polymerizable molecule may be coupled to the additional linker prior to, during, or subsequent to the coupling of the additional linker to the additional amino acid.
- the additional polymerizable molecule may be configured to couple to the (first) modified amino acid, e.g., via coupling of the polymerizable molecule and the additional polymerizable molecule.
- the additional linker-additional amino acid complex may couple to the modified amino acid, thereby generating a stacked plurality of modified amino acids (see, e.g., FIGs. 1A-1B and 1D).
- the additional amino acid may be cleaved from the peptide prior to, during, or subsequent to generation of the stacked plurality of modified amino acids.
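The cycle of coupling a linker to the terminal amino acid, cleaving it, and stacking it onto the previously generated modified amino acid can be sketched as a toy string model. This is only a bookkeeping illustration under stated assumptions: the class, the function names, and the `-[linker-oligo]-` placeholder are hypothetical, only N-terminal processing is modeled, and the recorded cycle index is an illustrative addition rather than a feature of any specific chemistry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModifiedAminoAcid:
    residue: str  # one-letter code of the cleaved amino acid
    cycle: int    # round in which the residue was tagged and cleaved
    spacer: str   # placeholder for the linker plus polymerizable molecule


def expand(peptide, spacer="-[linker-oligo]-"):
    """Tag and cleave the N-terminal residue each cycle, stacking the products in order."""
    remaining = peptide
    stack = []
    cycle = 1
    while remaining:
        ntaa, remaining = remaining[0], remaining[1:]  # cleave the current NTAA
        stack.append(ModifiedAminoAcid(ntaa, cycle, spacer))
        cycle += 1
    return stack


def render(stack):
    """Write the stacked modified amino acids as one expanded string."""
    return "".join(m.residue + m.spacer for m in stack)
```

Calling `render(expand("GAV"))` shows each residue of the toy peptide separated by the placeholder spacer, which is the string analogue of increasing the distance between residues.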
- intramolecular expansion of the peptide or protein may occur across a plurality of capture moieties.
- a substrate may be provided that comprises a plurality of capture moieties, and, in some instances, the capture moieties are located adjacent to the peptide or protein.
- a first amino acid (e.g., n NTAA or n CTAA) of the peptide or protein may be coupled to a first capture moiety (e.g., via a first linker and a first polymerizable molecule), a second amino acid (e.g., n-1 NTAA or n-1 CTAA) may be coupled to a second capture moiety (e.g., via a second linker and a second polymerizable molecule), and a third amino acid (e.g., n-2 NTAA or n-2 CTAA) may be coupled to a third capture moiety (e.g., via a third linker and third polymerizable molecule).
- a first amino acid (e.g., n NTAA) may be coupled to a first capture moiety
- a second amino acid e.g., n-1 NTAA
- the modified amino acid, e.g., from the n NTAA
- a third amino acid e.g., n-2 NTAA
- the polymerizable molecule may comprise temporal information, e.g., a barcode on the round or cycle number in which the polymerizable molecule is provided.
- any number of amino acids (or modified amino acids) may be coupled to any number of capture moieties.
- intramolecular expansion is performed without the use of a polymerizable molecule.
- intramolecular expansion may be performed using a chemical expansion process, e.g., using amide skeletal elongation via amino acid insertion, as described by Z. Liu et al. 2023. Chemistry - A European Journal. Vol. 29, Issue 46, which is incorporated by reference herein in its entirety.
- a peptide may be intramolecularly expanded by chemical processing (see, e.g., FIG. 21).
- a “modified amino acid” as used herein may refer to the amino acid-linker complex, the amino acid-linker-polymerizable molecule complex, or derivatives thereof.
- the modified amino acid may be used to refer to the amino acid-linker complex or the amino acid-linker-polymerizable molecule complex before or after cleavage.
- the modified amino acid may refer to a portion of the amino acid-linker complex or the amino acid-linker-polymerizable molecule complex (e.g., just the comprised amino acid portion, just the amino acid-linker complex portion, etc.).
- a capture moiety may couple to the amino acid, the linker, the polymerizable molecule, or a combination thereof.
- the coupling of the amino acid, the linker, or the polymerizable molecule to the capture moiety may comprise a covalent interaction or a noncovalent interaction.
- the coupling may occur by interaction of binding pairs, e.g., biotin and avidin (or streptavidin), antigen or epitope and antibody or antibody fragment, cyclodextrins and small hydrophobic molecules (e.g., alkanes, benzene, polycyclics), cucurbiturils and adamantaneammonium or trimethylammoniomethyl ferrocene, cyclophane (e.g., calixarenes, cavitands, pillararenes, tetralactams), etc.
- the coupling of the amino acid, the linker, or the polymerizable molecule to the capture moiety occurs through coupling of nucleic acid molecules (e.g., hybridization to one another or to a splint molecule, ligation, or a nucleic acid extension reaction).
- the capture moiety comprises an additional polymerizable molecule (e.g., a nucleic acid molecule or peptide).
- both the polymerizable molecule of the modified amino acid and the capture moiety may comprise nucleic acid molecules.
- the nucleic acid molecules may be coupled to one another, e.g., via complementary base pairing directly or via a splint or bridge molecule and optional ligation (e.g., enzymatic or chemical ligation).
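Coupling two nucleic acid molecules through a splint relies on the splint base-pairing with the end of one molecule and the start of the other. A minimal sketch of that complementarity check follows, assuming plain DNA (A, C, G, T only), perfect Watson-Crick pairing, and sequences written 5' to 3'; the function names and the example sequences in the usage note are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def can_hybridize(a, b):
    """True when `b` is perfectly complementary to `a` in antiparallel orientation."""
    return reverse_complement(a) == b.upper()


def design_splint(upstream, downstream, arm=6):
    """Splint that pairs with the last `arm` bases of `upstream` and the first `arm` of `downstream`."""
    junction = upstream[-arm:] + downstream[:arm]
    return reverse_complement(junction)
```

For instance, `design_splint("AAACCCGGG", "TTTAAACCC", arm=3)` bridges the junction `GGGTTT` formed by the two toy sequences, and `can_hybridize` confirms that the resulting splint pairs with that junction.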
- the splint molecule may comprise a hairpin nucleic acid molecule.
- the nucleic acid molecules may be coupled via a nucleic acid extension or amplification reaction.
- the capture moiety and the polymerizable molecule comprise click chemistry moieties or reactive moieties which can allow for chemical ligation of the capture moiety to the polymerizable molecule.
- Non-exhaustive examples of chemical attachment of oligonucleotides can be found in M. Greenberg. Current Protocols in Nucleic Acid Chemistry. (2000). 1.4.1-4.5.19, which is incorporated by reference in its entirety.
- the capture moiety or the polymerizable molecule, or, if applicable, a splint molecule can comprise any naturally occurring, non-naturally occurring or engineered nucleotide base.
- the capture moiety or the polymerizable molecule may comprise a nucleic acid molecule or analog thereof, which may comprise a pseudo-complementary base, a bridged nucleic acid, a xenonucleic acid, a locked nucleic acid, a peptide nucleic acid (PNA), a gamma-PNA, a morpholino, etc., as is described elsewhere herein.
- the capture moiety may comprise one or more functional sequences, including, but not limited to, a priming sequence, sequencing sequence (e.g., P5 or P7 sequence), sequencing read sequence (e.g., R1 or R2 sequence), a protein binding site such as a mosaic end sequence, a transposase recognition sequence, a transcription factor binding site, a cleavage site (e.g., restriction site), a UMI, a blocking group, a spacer sequence, a barcode sequence, or other functional sequence.
- the capture moiety comprises a cleavable or releasable moiety, e.g., a restriction enzyme recognition site, an abasic site, a uracil which can be cleaved using USER® or uracil DNA glycosylase, a disulfide bond that can be releasable upon addition of a reducing agent, a photocleavable moiety that can be cleaved using a photostimulus, a thermolabile bond that is cleaved using a thermal stimulus, etc.
- the capture moiety comprises a partial restriction site; e.g., the capture moiety may comprise a first partial restriction site and the polymerizable molecule may comprise a second partial restriction site; upon coupling or ligation of the polymerizable molecule to the capture moiety, the two partial restriction sites may generate a complete restriction site, such that the individual molecules (capture moiety and polymerizable molecule) are not cleavable by restriction digest individually but the ligated or coupled product is.
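The partial-restriction-site idea can be checked on sequence strings. The sketch below uses the EcoRI recognition sequence GAATTC purely as an example; the choice of enzyme, the toy sequences in the usage note, and the function names are assumptions for illustration. Only one strand is modeled, which suffices here because the recognition site is palindromic.

```python
ECORI_SITE = "GAATTC"  # EcoRI recognition sequence, used only as an example


def has_site(seq, site=ECORI_SITE):
    """True when the recognition site occurs in the sequence."""
    return site in seq.upper()


def ligate(capture, polymerizable):
    """Model ligation as joining the 3' end of `capture` to the 5' end of `polymerizable`."""
    return capture + polymerizable


def site_created_by_ligation(capture, polymerizable, site=ECORI_SITE):
    """True when neither molecule carries the site alone but the ligated product does."""
    product = ligate(capture, polymerizable)
    return (
        not has_site(capture, site)
        and not has_site(polymerizable, site)
        and has_site(product, site)
    )
```

With a toy capture sequence ending in `GAA` (for example `ACGTACGAA`) and a toy polymerizable sequence starting with `TTC` (for example `TTCGGATCA`), neither molecule contains the full site, while the ligated product does, so only the coupled product would be a substrate for the digest.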
- the capture moiety comprises a barcode sequence that comprises any useful information, e.g., the identity of the peptide that is to be analyzed, temporal information, spatial information, the origin of the peptide (e.g., from a sample, partition, protein, cell, experiment, etc.).
- the barcode may encode for information on a protein, cell type, demographic, patient, organism, cell state, phenotype, disease, age, sex, ancestral lineage, fitness or athletic performance, nutrient or metabolic state, paternity or kinship, drug-association, behavioral or cognitive state, pharmacokinetics, a synthetic molecule, etc.
- the capture moiety is provided coupled to a substrate or is configured to couple to a substrate (e.g., couple to an anchor moiety or molecule on the substrate).
- the substrate comprises one or more identical capture nucleic acid molecules; these identical capture nucleic acid molecules may act as a capture moiety for coupling to a polymerizable molecule of a modified amino acid, e.g., for a modified amino acid comprising a terminal amino acid, an n-1 amino acid, etc.
- substrates, e.g., beads (e.g., DNA beads or barcoded beads), flow cells, or chips (e.g., Illumina® HiSeq, iSeq, MiniSeq, NextSeq, NovaSeq, etc.).
- the capture moieties may comprise additional useful sequences, e.g., primer sequences (e.g., P5 or P7 sequences) or read sequences (e.g., R1 or R2).
- the capture moiety may be configured to couple to a substrate.
- the substrate may comprise one or more anchor molecules to which the capture moiety binds.
- capture-anchor molecule binding pairs include biotin-streptavidin, nucleic acid coupling, and click chemistry pairs (e.g., azide-alkyne, TCO-tetrazine, etc.).
- the capture moiety may be coupled to a substrate using any useful approach.
- the capture moiety comprises a substrate-tethering group or linker or additional functional group.
- the capture moiety comprises a nucleic acid molecule that comprises a substrate-tethering group, e.g., biotin, a click chemistry moiety such as an azide, that can couple to a substrate, e.g., a substrate comprising streptavidin or a complementary click chemistry moiety that can react with that of the substrate-tethering group.
- the capture moiety may comprise a nucleic acid molecule, which may be coupled to a substrate-tethered nucleic acid molecule (e.g., an anchor nucleic acid molecule).
- the capture moiety may additionally comprise a binding sequence, to which another nucleic acid molecule (e.g., a polymerizable molecule that is part of or coupled to the modified amino acid) can bind.
- the capture moiety comprises a single-stranded oligonucleotide or a single-stranded region in which a complementary oligonucleotide can hybridize.
- the complementary oligonucleotide may comprise a detectable label (e.g., fluorophore) that allows for detection of the capture moiety.
- the capture moiety may be directly coupled to the polymeric analyte (e.g., peptide) that is to be analyzed or is undergoing intramolecular expansion.
- the capture moiety may additionally comprise a nucleic acid barcode molecule that encodes the identity of the peptide or the originating sample or partition from which the peptide originated.
- the capture moiety may be coupled to any useful segment of the peptide, e.g., at a terminus (e.g., C-terminus) or at an internal residue (e.g.
- the capture moiety may be provided in a solution and may not be coupled to the substrate or the peptide.
- the capture moiety is coupled to or configured to couple to both the polymeric analyte and the substrate.
- the capture moiety may be coupled to the polymeric analyte and optionally comprise information (e.g., a barcode molecule) about the polymeric analyte or that identifies the polymeric analyte.
- the capture moiety may also be coupled to or configured to couple to a substrate, e.g., via a linker, coupling of affinity or interactive pairs (e.g., biotin-streptavidin, antibody-antigen), nucleic acid coupling, etc.
- the capture moiety comprises a nucleic acid molecule which can be linked to a peptide analyte, e.g., at a terminus or a side chain of the peptide analyte using, for example, click chemistry, linkers, or reaction of amine groups of lysine side chains or carboxyl groups of aspartic acid or glutamic acid residues.
- the capture moiety may couple to a substrate comprising one or more anchor nucleic acid molecules, e.g., via hybridization or ligation. See, e.g., FIG. 1B.
- the capture moiety may be coupled to the polymeric analyte using any useful attachment approach, e.g., click chemistry moieties (e.g., alkyne-azide coupling), photoreactive groups (e.g., benzophenone), 1-ethyl-3-(3-dimethylaminopropyl)carbodiimide hydrochloride (EDC) (e.g., to couple amino-oligos or peptides), N-hydroxysuccinimide (NHS), Sulfo-NHS, or NHS-esters (e.g., to couple sulfhydryl oligos), maleimides, hydrazines, hydroxyl amines, thiols, biotin-streptavidin interactions, cystamine, glutaraldehyde, formaldehyde, succinimidyl 4-(N-maleimidomethyl)cyclohexane-1-carboxylate (SM
- a polymeric analyte may comprise an alkyne such as dibenzocyclooctyne (DBCO), which may be configured to react with an amine (e.g., DBCO-alcohol, DBCO-Boc, DBCO-NHS), a carboxyl or carbonyl (e.g., DBCO, DBCO-silane), a sulfhydryl, etc.
- An azide-functionalized capture moiety (e.g., a capture nucleic acid molecule) may react with DBCO to link the polymeric analyte and the capture moiety.
- linkers such as bifunctional linkers may be used to attach a polymeric analyte to a capture moiety; such bifunctional linkers may comprise the same reactive moiety on both ends or a different moiety at each end (e.g., heterobifunctional linker). Additional examples of linkers are described elsewhere herein.
- the capture moiety and the polymeric analyte are coupled using a linker.
- terminal amino acid residue attachment can be achieved by reacting the peptide with a linker comprising (i) an amine-reactive group (e.g., isothiocyanates such as PITC, guanidinylating agents, dithioesters, xanthates, NHS esters, etc.) and (ii) a reactive group (e.g., click chemistry group).
- the linker can be, for example, PITC-conjugated click chemistry moieties.
- the linker reacts with and “blocks” the primary amines (e.g., modifies lysines), including the N-terminus. Subsequent cleavage of the N-terminal amino acid (e.g., using an Edman reagent, such as an acid) can be performed, and one of the remaining modified lysines may be attached to the capture moiety (e.g., using the click chemistry moiety coupled to the amine-reactive group).
- the peptide may be treated with a protease, e.g., LysC, which cleaves peptides such that a remaining peptide has a C-terminal lysine and such that the remaining peptide comprises a primary amine only at the C-terminal lysine residue and the N-terminus.
- the capture moiety may comprise the amine-reactive group, which can then couple to the N-terminal amine or to the amine of a lysine side chain.
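The LysC treatment described above can be illustrated in silico. The following Python sketch is a simplification under stated assumptions: it splits a one-letter peptide sequence after every lysine (K) and lists the primary amine sites left on each fragment. The function names `lysc_digest` and `primary_amine_sites` are hypothetical, and the sketch assumes complete digestion with no missed cleavages, which a real digest would not guarantee.

```python
def lysc_digest(sequence: str) -> list[str]:
    """Split a one-letter peptide sequence after every lysine (K).

    Assumption: idealized, complete cleavage on the C-terminal side of K.
    """
    fragments: list[str] = []
    current = ""
    for residue in sequence:
        current += residue
        if residue == "K":
            fragments.append(current)
            current = ""
    if current:  # trailing fragment without a C-terminal lysine
        fragments.append(current)
    return fragments


def primary_amine_sites(fragment: str) -> list[str]:
    """List the primary amines of a fragment: the N-terminus plus lysine side chains."""
    sites = ["N-terminus"]
    sites += [f"K{i + 1} side chain" for i, aa in enumerate(fragment) if aa == "K"]
    return sites
```

For a fragment produced by this idealized digest, `primary_amine_sites` returns the N-terminus and at most one lysine side chain, which matches the two candidate attachment points named in the text.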
- reactions for coupling the capture moiety and the polymeric peptide include, but are not limited to: thiol-based reactions (e.g., disulfide bonds with cysteine residues, Michael addition or retro-Michael reaction with maleimides or electrophilic alkenes), imine-based reactions (e.g., lysine residues forming reversible imines with aldehydes or ketones), oxime-based reactions (e.g., lysine residues forming oxime bonds with hydroxylamine derivatives or aminooxy compounds), hydrazone/oxime-based reactions (e.g., ketone or aldehyde moieties from N-terminal residues or modified side chains forming reversible hydrazone or oxime bonds with hydrazides or aminooxy compounds), boronic acid-based reactions (e.g., 1,2- or 1,3-diol containing residues such as serine, threonine or tyrosine may react
- generating the modified amino acid further comprises cleaving the amino acid or the amino acid-linker complex from the peptide.
- the cleaving of the amino acid or amino acid-linker complex may be achieved using any suitable mechanism, such as via application of a stimulus.
- the stimulus can be, for example, a chemical stimulus, a biological stimulus, a thermal stimulus (e.g., application of heat), a photo-stimulus, a physical or mechanical stimulus, or other type of stimulus or a combination of stimuli.
- the stimulus comprises a chemical stimulus, e.g., a change in pH, application of an acid (e.g., trifluoroacetic acid, heptafluorobutyric acid, formic acid, phosphoric acid, acetic acid) or base, addition of a lytic agent, initiating agent, radical-generating agent, reducing agent, etc.
- the chemical stimulus comprises application of a Lewis acid (e.g., boron triflate, boron trifluoride etherate, boron trichloride, boron tribromide, boron triiodide, or scandium triflate).
- the stimulus comprises a biological stimulus, e.g., enzyme (e.g., Edmanase, protease, nuclease such as endonuclease or exonuclease) or ribozyme or DNAzyme that can cleave or catalyze cleavage of the amino acid or amino acid-linker complex.
- the stimulus may be neutralized subsequent to the cleaving using any useful technique, e.g., neutralization of an acid, heat-killing of enzyme, nucleic acid removal or degradation, buffer exchange, or other approach.
- the methods provided herein may comprise using a linker comprising an amino acid reactive group (e.g., PITC, a xanthate, a guanidinylating agent, a dithioester or thiocarbamoyl) and coupling the amino acid reactive group of the linker with the amino acid and cleaving the amino acid from the peptide using a stimulus (e.g., change in pH, temperature).
- the linker may comprise a PITC moiety that couples to an NTAA under mildly alkaline conditions to generate a phenylthiocarbamoyl (PTC) derivative of the NTAA, and cleavage of the NTAA from the peptide may be achieved using an Edman degradation reaction (e.g., application of an acid such as trifluoroacetic acid or boron triflate, optionally with heat), to generate a thiazolinone (ATZ) derivative or a phenylthiohydantoin (PTH) derivative.
- the linker may comprise a moiety or molecule (e.g., polymerizable molecule such as a nucleic acid molecule) that can also couple to the capture moiety such that the amino acid-linker complex may be coupled to the capture moiety, thereby generating an amino acid-linker-capture moiety complex.
- nucleic acid molecules may comprise predominantly pyrimidines (e.g., thymines, cytosines, uracils), which are more resistant to acid degradation and heat as compared to purines (e.g., adenine and guanine).
- a nucleic acid molecule may comprise at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, or 100% thymines or cytosines.
- canonical nucleotides may be substituted or may comprise acid-resistant nucleotide analogs, e.g., hexitol nucleic acids.
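The pyrimidine-content thresholds above (at least 50%, 60%, and so on) can be checked for a candidate sequence with a few lines of Python. This is a minimal sketch, not part of the described methods: the names `pyrimidine_fraction` and `meets_threshold` are hypothetical, and uracil is counted alongside thymine and cytosine as an assumption based on the pyrimidines listed in the text.

```python
PYRIMIDINES = frozenset("CTU")  # cytosine, thymine, uracil (assumed set)


def pyrimidine_fraction(sequence: str) -> float:
    """Return the fraction of bases in the sequence that are pyrimidines."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return sum(base in PYRIMIDINES for base in seq) / len(seq)


def meets_threshold(sequence: str, minimum: float = 0.5) -> bool:
    """Check whether the pyrimidine fraction is at least `minimum`."""
    return pyrimidine_fraction(sequence) >= minimum
```

Such a check could be used as a simple design filter when choosing polymerizable nucleic acid sequences intended to survive acidic cleavage conditions.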
- Milder degradation under basic conditions for N-terminal amino acid removal can include the use of triethylamine acetate in acetonitrile or other solvent such as water, N,N-dimethylformamide (DMF), or a mixture of solvents.
- degradation may be achieved using a thioacylation approach, the use of milder acid reagents, e.g., trichloroacetic acid (pKa of 0.66) or dichloroacetic acid (pKa of 1.35), acetic acid, or alternative basic reaction conditions, e.g., using acid-base pairs such as N,N-diisopropylethylamine (DIPEA), pyridine, acetic acid derivatives, etc.
- a mild base can be used to cleave the amino acid-linker complex.
- C-terminal degradation strategies are also provided herein.
- C-terminal degradation may comprise Edman-like degradation approaches.
- C-terminal degradation may employ the use of activating reagents that react with the C-terminal carboxyl group of a peptide, and a derivatizing agent (e.g., a thiocyanate to generate a peptide-thiocyanate or peptide-thiohydantoin).
- activating reagents include acetyl chloride and acetic anhydride.
- single-step C-terminal derivatization of a peptide to a peptidyl-thiohydantoin may be performed, e.g., using the Schlack-Kumpf approach, in which a peptide is reacted with thiocyanic acid (e.g., in acetone) to generate a peptidyl-thiohydantoin.
- the peptide-thiohydantoin may be cleaved, e.g., using basic conditions, to generate an amino acid thiohydantoin and remaining peptide.
- Cleavage of amino acids may also be achieved using enzymatic or enzyme-analog (e.g., ribozyme or DNAzyme) approaches.
- Example enzymatic cleavage may include the use of Edmanases (e.g., modified cruzain), aminopeptidases (e.g., Pfu aminopeptidase I, PhTET aminopeptidases, P. horikoshii aminopeptidases), metalloenzymatic aminopeptidases, acylpeptide hydrolases, tRNA synthetases, endopeptidases, carboxypeptidases, and the like.
- the enzymes or ribozymes or DNAzymes may be modified or engineered to recognize a modified amino acid, e.g., an amino acid that has a chemical moiety attached thereto (e.g., PITC, NITC, dansyl chloride, SNFB, DNP, SNP, guanidinyl group, biotin, streptavidin, nucleic acid molecules, lipids, carbohydrates, acetyl groups, acyl groups, guanidinylation agents, etc.).
- One or more reactions may be accelerated by application of energy or radiation, e.g., electromagnetic radiation.
- degradation or cleavage of the terminal amino acid of a peptide may be facilitated by applying microwave energy to accelerate the reaction kinetics.
- hydrolysis of proteins may be facilitated by application of microwave energy, e.g., as described in Margolis et al., 1991, Journal of Automatic Chemistry. Vol 13, No. 3, pp 93-95, which is incorporated by reference herein.
- further processing of cleaved amino acids may also be performed. For instance, cleavage of the amino acid may result in generation of stereoisomers.
- the cleaved products may be treated to remove the stereocenter during the cleavage reaction.
- stereoisomers may be enzymatically converted to a single isomer, e.g., using an isomerase.
- more than one amino acid may be cleaved from the peptide per cleavage event.
- the cleaving may comprise cleaving 2 amino acids, 3 amino acids, 4 amino acids, 5 amino acids, 6 amino acids, 7 amino acids, 8 amino acids, 9 amino acids, 10 amino acids, or more.
- the polymeric analyte may comprise a peptide comprising a plurality of amino acids, and single amino acids, di-peptides, tri-peptides, quadri-peptides, or larger may be cleaved in the methods described herein.
- At most about 10 amino acids, at most about 9 amino acids, at most about 8 amino acids, at most about 7 amino acids, at most about 6 amino acids, at most about 5 amino acids, at most about 4 amino acids, at most about 3 amino acids, or fewer amino acids may be cleaved in a given cleavage event.
- cleavage of greater than one amino acid may be mediated using an enzyme (e.g., Edmanase, protease) or ribozyme or DNAzyme that is capable of recognizing or cleaving more than a single amino acid.
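To make the idea of removing more than one residue per cleavage event concrete, the sketch below enumerates the pieces released when a fixed number of residues is cleaved per event. It is only an illustration under simplifying assumptions: cleavage is modeled from the N-terminus, the same number of residues is removed in every event, and the function name `cleave_in_events` is hypothetical.

```python
def cleave_in_events(peptide: str, residues_per_event: int) -> list[str]:
    """Return the pieces released by successive cleavage events.

    Assumptions: cleavage proceeds from the N-terminus and removes the same
    number of residues per event; the final piece may be shorter.
    """
    if residues_per_event < 1:
        raise ValueError("residues_per_event must be at least 1")
    return [
        peptide[i:i + residues_per_event]
        for i in range(0, len(peptide), residues_per_event)
    ]
```

With `residues_per_event=1` the result corresponds to single amino acids, with 2 to di-peptides, with 3 to tri-peptides, and so on.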
- Cleavage of the amino acid from the peptide may be conducted using a biological stimulus, such as an enzyme or ribozyme or DNAzyme.
- the enzyme can be any useful cleaving enzyme, e.g., a protease, such as an Edmanase, hydrolase, lyase, transferase, cruzain, a cleaving protein (e.g., ClpS, ClpX), Proteinase K, exopeptidase, aminopeptidase, diaminopeptidase, serine protease, cysteine protease, threonine protease, aspartic protease, glutamic protease, metalloprotease, asparagine peptide lyase, pepsin, trypsin, pancreatin, Lys-C, Arg-C, Glu-C, Asp-N, chymotrypsin, carboxypeptidase (e.g., carboxypeptidase A, carboxypeptidase B, carboxypeptidase Y), SUMO prote
- the cleaving enzyme or ribozyme or DNAzyme may be configured or engineered to cleave a terminal amino acid or plurality of amino acids; alternatively, the cleaving enzyme or ribozyme or DNAzyme may be configured or engineered to cleave off-site at a non-terminal location of the peptide, e.g., at an internal amino acid at an n-1, n-2, n-3, n-4, n-5, n-6, n-7, n-8, n-9, n-10, etc. position, where n is the number of amino acids in the peptide.
- additional reagents may be provided to catalyze or induce the cleavage.
- metalloproteases, aminopeptidases, or exopeptidases may facilitate cleavage of an amino acid or plurality of amino acids in the presence of a catalyst, e.g., metal or metal ion (e.g., cobalt).
- a catalyst may be provided in order to facilitate the binding of the enzyme to an amino acid or the subsequent cleavage of the amino acid from the peptide.
- cleavage may be mediated by an apo-enzyme, which is inactive in the absence of a metal catalyst or cofactor, and cleavage may be controlled by addition of metal or metal ions.
- cleaving stimuli include: a photo stimulus (e.g., application of UV, X-rays, gamma rays, or other wavelength of light), mechanical stimulus (e.g., sonication, high pressure, electromagnetic energy), thermal stimulus (e.g., application of heat), or chemical stimulus.
- the peptide may comprise or be altered to comprise a cleavable or labile bond that can be cleaved upon application of the appropriate stimulus, e.g., disulfide bonds (e.g., cleavable upon application of a chemical stimulus such as a reducing agent), ester linkages (e.g., cleavable with a change of pH), a vicinal-diol linkage (e.g., cleavable with sodium periodate), a Diels-Alder linkage (e.g., cleavable upon application of heat), a sulfone linkage (e.g., cleavable via a base), a silyl ether linkage (e.g., cleavable via an acid), a glycosidic linkage (e.g., cleavable via an amylase), a peptide linkage (e.g., cleavable via a protease), or
- the capture moiety may be cleaved from the peptide or the substrate.
- the cleaving may occur at any useful or convenient step, e.g., after generation of the modified amino acid or stacked plurality of modified amino acids.
- cleavage of the capture moiety may occur subsequent to the formation of a stacked plurality of modified amino acids, and the cleaved product may be sequenced, e.g., using nanopore sequencing or imaging approaches described elsewhere herein.
- Nanopores/Nanogaps
- Characterization of the modified amino acid may be performed using nanoscale technologies such as nanopores, nanogaps, or nanochannels.
- a nanopore, nanogap, or nanochannel may be provided on a membrane in an ionic solution.
- individual analytes enter the nanopore under an applied electric potential, thereby altering the flow of ions through the nanopore in a time-dependent manner.
- Measurement of the modulation of the ionic current as the individual analytes translocate through the nanopore can be performed, and the measured signal can be computationally decoded, e.g., to yield a DNA sequence of a DNA analyte.
- a nanopore sequencer may be used to characterize or analyze the detectable complexes described herein, e.g., modified amino acids or stacked pluralities of modified amino acids. Additional example methods and systems for nanopore sequencing of intramolecularly expanded peptides can be found in International Patent App. No. PCT/US2023/071456, which is incorporated by reference herein.
- a signal may be measured from the nanopore, membrane, or surrounding solution. For example, a conductance, current, current blockage, current density, current change, voltage, impedance, resistance, inductance, capacitance, frequency, phase, power, electric field, magnetic field, or other parameter within the nanopore may be monitored as a function of time.
- the current signal may be an ionic current signal, a cross-pore or transverse-to-pore drain current, or a source current.
- a change in the conductance, current, impedance, or other parameter may occur and provide information (e.g., size, charge, aspect ratio, volume, hydrophobicity, chemical structure) on the molecule.
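One simple way to extract such changes from a recorded current trace is threshold-based blockade detection. The Python sketch below is an illustrative assumption, not the decoding method of this disclosure: it estimates the open-pore baseline as the median of the trace (reasonable only if the pore is unoccupied most of the time) and reports contiguous runs of samples that fall below a chosen fraction of that baseline. The name `detect_blockades` and the parameters `drop_fraction` and `min_samples` are hypothetical.

```python
from statistics import median


def detect_blockades(current_pa, drop_fraction=0.2, min_samples=2):
    """Find runs of samples below (1 - drop_fraction) * baseline.

    Returns a list of (start_index, end_index_exclusive, mean_current) tuples.
    Assumption: the median of the trace approximates the open-pore current.
    """
    baseline = median(current_pa)
    threshold = baseline * (1.0 - drop_fraction)
    events = []
    start = None
    for i, value in enumerate(current_pa):
        if value < threshold and start is None:
            start = i  # blockade begins
        elif value >= threshold and start is not None:
            if i - start >= min_samples:  # ignore very short spikes
                segment = current_pa[start:i]
                events.append((start, i, sum(segment) / len(segment)))
            start = None
    # Close an event that is still open at the end of the trace.
    if start is not None and len(current_pa) - start >= min_samples:
        segment = current_pa[start:]
        events.append((start, len(current_pa), sum(segment) / len(segment)))
    return events
```

The duration (end minus start) and mean blockade level of each event are the kind of per-molecule features that could then be related to size, charge, or chemical structure.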
- Each amino acid or modified amino acid, or a subset of amino acids or modified amino acids may generate a unique signal that is distinguishable from other amino acids or modified amino acids.
- the unique signal signatures may be assigned to the amino acids or modified amino acids in order to determine the identity of the amino acids, polymerizable molecules, or both.
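Assigning a measured signal to an amino acid can be sketched as a nearest-reference lookup. The example below assumes each amino acid (or modified amino acid) has already been characterized by a single reference blockade level; the reference values shown are made-up placeholders, not measured data, and the name `assign_signature` is hypothetical. Practical classifiers would likely use richer features and statistical models.

```python
# Hypothetical reference levels (arbitrary units); placeholders, not measurements.
EXAMPLE_REFERENCE_LEVELS = {"A": 60.0, "G": 40.0, "K": 75.0}


def assign_signature(level, reference_levels, max_distance=None):
    """Return the label whose reference level is closest to `level`.

    If `max_distance` is given and the closest reference is farther away
    than that, return None to mark the event as unassigned.
    """
    label, reference = min(
        reference_levels.items(), key=lambda item: abs(item[1] - level)
    )
    if max_distance is not None and abs(reference - level) > max_distance:
        return None
    return label
```

The optional `max_distance` cut-off reflects a design choice: leaving an ambiguous event unassigned is often preferable to forcing it onto the nearest label.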
- a modified amino acid (e.g., an individual modified amino acid or a modified amino acid comprised by a stacked plurality of modified amino acids) may be analyzed numerous times, e.g., via translocation and measuring of a current or conductance of the modified amino acid, through the same or different nanopore or nanogap.
- iterative reading of a modified amino acid may be beneficial in improving the accuracy of the reads or identification of the modified amino acid (or polymerizable molecule).
- the modified amino acid may be translocated through one or more nanopores at least 2 times, at least 3 times, at least 4 times, at least 5 times, at least 6 times, at least 7 times, at least 8 times, at least 9 times, at least 10 times, at least 20 times, at least 50 times, at least 100 times, at least 200 times, or even greater.
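A simple way to see why repeated reads can improve identification is a majority vote over the individual calls. The sketch below assumes each translocation yields one independent label for the same modified amino acid; the function name `consensus_call` is hypothetical, and majority voting is an illustrative choice rather than the consensus method of this disclosure.

```python
from collections import Counter


def consensus_call(reads):
    """Return (most common label, fraction of reads supporting it).

    Ties are resolved in favor of the label that was observed first.
    """
    if not reads:
        raise ValueError("at least one read is required")
    counts = Counter(reads)
    label, count = counts.most_common(1)[0]
    return label, count / len(reads)
```

The support fraction gives a crude confidence measure; a caller could require it to exceed a chosen cut-off before accepting the identification.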
- the modified amino acid may be ratcheted back and forth within the nanopore, e.g., using a processive enzyme such as phi29 polymerase, as described in Cherf et al. 2012. Nat. Biotechnol. 30(4):344-348, which is incorporated by reference herein in its entirety.
- a motor protein may be altered or modified to induce slipping of the motor protein as it translocates along the modified amino acid (or polymerizable molecule).
- “flossing” of the modified amino acid may be performed, e.g., using a dual nanopore device and competing voltage forces, e.g., as described in Liu et al. Small. 2020. 16(3):e1905379.
- Flossing of the modified amino acid may also be performed by reversing the electric field, modulating the electric current flow (e.g., switching from alternating current to direct current or from direct current to alternating current), applying a magnetic field, modulating impedance, by inclusion of structural elements on the modified amino acids (e.g., comprised by or attached to the polymerizable molecules) such as DNA hairpins, hybridizing primers (e.g., for generating double-stranded regions), dumbbell shaped complexes (e.g., as described by Kasianowicz. 2004. Nature Materials. 3, 355-356), use of magnetic particles (e.g., attached to a portion or an end of the modified amino acid or stacked plurality of modified amino acids), and the like.
- slowing or stuttering of the modified amino acid during translocation through the nanopore may be performed, for instance, via introducing non-canonical or non-incorporable nucleotides (e.g., sulfur modified nucleotides).
- a polymerase may be used as a molecular motor protein to ratchet the modified amino acid through the nanopore, and the addition of the non-incorporable nucleotides may change the translocation speed (e.g., slow down) or introduce stutter, skipping, backstepping, or other useful ratcheting manner of the modified amino acid.
- a peptide or portion thereof may be analyzed using a nanopore or nanogap.
- a peptide may be provided, and a modified amino acid or stacked plurality of modified amino acids may be generated from the peptide using the methods described herein.
- the remaining peptide, e.g., after any number of cycles of generating modified amino acids or a stacked plurality of modified amino acids, may be analyzed using the nanopore or nanogap (see, e.g., FIG. 1E).
- the signal (e.g., from the current or other parameter such as impedance) may be measured, which may output information such as size (e.g., molecular weight, number of amino acids, hydrodynamic radius, length), conformation, charge, polarity, hydrophobicity, hydrophilicity, composition (e.g., amino acid sequence) or other parameter about the remaining peptide.
- the peptide may not be subjected to intramolecular expansion and may be directly translocated through a nanopore or nanogap for signal measurement and analysis.
- a peptide may be provided, digested or fragmented, and then the digested or fragmented peptides or individual amino acids can be analyzed using a nanopore or nanogap.
- the peptide may be digested or fragmented using any suitable approach, e.g., enzymatic approach (e.g., using a protease or proteasome, LysC), mechanical approach (e.g., sonication, mechanical shearing), chemical approach (e.g., using an Edman degradation reaction), or other useful approach.
- the peptide may be digested or fragmented into smaller length peptides or individual amino acids.
- the fragmented or digested product may then translocate through a nanopore or nanogap and be analyzed, e.g., via measurement of a signal (e.g., current) and using the measured signal to identify or characterize the fragmented or digested product.
- the fragmented or digested product may be contacted with a helper molecule, which may assist with translocating the fragmented or digested product through the nanopore.
- the helper molecule may comprise, for example, a charged molecule (e.g., a DNA molecule) and a linker that comprises an amino acid reactive moiety (e.g., an amine-reactive moiety such as isothiocyanate, guanidinyl group, dithioester, xanthate, etc. or a carboxyl-reactive moiety such as carbodiimide, EDC, etc.).
- the amino acid reactive group may be used to couple the helper molecule to the fragmented or digested product and facilitate translocation (e.g., via electrophoresis) of the helper molecule-digested product complex through the nanopore.
- the nanopore, nanogap, or nanochannel may be generated from an organic or inorganic material and may comprise a protein, e.g., a pore-forming protein or a transmembrane protein.
- a protein may be naturally occurring, synthetic, or engineered.
- naturally occurring organic nanopores include wild-type aerolysin, alpha-hemolysin, mycobacterial porins (e.g., MspA porin), Phi29 connector channels, Fragaceatoxin C, Cytolysin A, Ferric hydroxamate uptake component A, Curli specific gene G, outer membrane porin G, viral DNA packaging motors, etc.
- the nanopore, nanochannel, or nanogap may comprise an engineered variant of a naturally occurring nanopore.
- engineered variants can be generated, for example, using protein engineering approaches (e.g., directed evolution approaches such as yeast-surface, bacterial-surface, or phage display, ribosomal display, CIS display, droplet directed evolution, etc.), genetic engineering, or chemical modification (e.g., using maleimide reaction with cysteine residues, isothiocyanate or NHS ester reaction with lysine residues).
- the engineered variant may comprise a chemical modification such as addition of a chelator molecule (e.g., nickel), e.g., as described by Wang et al. 2023. Nature Methods.
- a nanopore may be engineered to change the pore or lumen size.
- the nanopore, nanogap, or nanochannel is comprised of an inorganic material.
- solid-state nanopores may be made from dielectric materials such as a silicon compound (e.g., silicon nitride, silicon dioxide), an aluminum compound (e.g., aluminum oxide), a titanium compound (e.g., titanium oxide), a molybdenum compound (e.g., molybdenum sulfate), hafnium, graphene, etc.
- the nanopore, nanochannel, or nanogap may assume any useful form factor or geometry, e.g., gaps or channels within membranes, capillaries, etc., and may be generated using any suitable process, e.g., ion-beam sculpting, electron beam exposure.
- the nanopore, nanochannel, or nanogap may comprise an elastomeric material.
- the nanopore, nanogap, or nanochannel is coupled to or configured to couple to a protein.
- the protein may be a molecular motor, which may facilitate movement of the modified amino acid or portion thereof (e.g., the polymerizable material) through the nanopore.
- the modified amino acid may comprise an amino acid coupled, optionally via a linker, to a nucleic acid molecule, and the nanopore may be coupled to a ratcheting enzyme such as a helicase or a molecule comprising helicase activity.
- the helicase may be used to translocate (ratchet) the nucleic acid molecule and the coupled amino acid through the nanopore.
- Non-limiting examples of helicases include PcrAl, Rep A, UvrD, Dda, HSV UL5, HSV UL9, DnaB, PriA, T7gp4A and 4B, T4gp41, SV40, TAG, Polyoma TAG, BPV E1, MCM 4/6/7, Dna2, FFA-1, RecD, TraI, NS3, RecQL4, UvrD, UvrAB, PcrA, Rad3, helicase E, XPD, XPB, Dna2, RecD2, BACH1, HDH II, RecQ, WRN, Rtel1, BLM, RuvB, Mph1, CHD4, CMG helicase, RecBCD, RecG, RecQ, RuvAB, PriA, UvrD, T4 UvsW, HDH II, HDH IV, WRN, TraI, Rho, PDH65, BLM, Srs2, Sgs1, Rtel1, SWI2, S
- the molecular motor may increase, decrease, or otherwise change the translocation speed of the modified amino acid through the nanopore as compared to an unmodified amino acid.
- the molecular motor may comprise a topoisomerase, a polymerase, a nuclease (e.g., endonucleases such as restriction endonucleases or Cas proteins, or exonuclease), an unfoldase, mitotic spindle protein (e.g., nuclear mitotic apparatus protein, kinesin, dynein), or other motor protein (e.g., myosin), a G-protein coupled receptor or signaling protein, or a combination or a variant thereof, or other polymer-processing protein.
- the protein comprises a protease (e.g., exopeptidase, Edmanase) or proteasome, which can enable “chop-n-drop” or cleaving of the modified amino acid or portion thereof prior to translocation in the nanopore.
- the protein may not be coupled to the nanopore, nanogap, or nanochannel but may be adjacent or in proximity to the nanopore, nanogap, or nanochannel.
- the protein and the nanopore may be provided separately in an array (e.g., a planar substrate, a microwell or nanowell array), such that the protein may contact the modified amino acid (or stacked plurality of modified amino acids) in proximity to the nanopore, nanogap, or nanochannel.
- the modified amino acid or the stacked plurality of modified amino acids may comprise a nucleic acid molecule comprising non-canonical nucleotides, which can be processed by a motor protein adjacent to the nanopore, nanogap, or nanochannel.
- the modified amino acid or stacked plurality of modified amino acids may comprise one or more nucleotide hexaphosphate groups.
- the modified amino acid or stacked plurality of modified amino acids may be contacted with a molecular motor protein that accepts hexaphosphates (e.g., an exonuclease, a 5’ exo polymerase such as Bst or engineered variant thereof, or a DNA polymerase from G. kaustophilus or engineered variant thereof), which, in some instances, may cleave the nucleotide comprising the amino acid portion, thereby generating a liberated portion of a modified amino acid.
- the liberated portion of the modified amino acid may then translocate through the nanopore, nanogap or nanochannel and be detected.
- translocation of the molecules described herein (e.g., modified amino acids, polymerizable molecules, etc.) through a medium or through a nanopore or nanogap may be facilitated by application of a force.
- a molecule may be translocated by application of pressure (e.g., pressure-driven flow), an electric field (e.g., via electrophoresis, electroosmotic flow, isoelectric focusing), a magnetic field (e.g., using a ferromagnetic fluid, magnetic particles), light (e.g., optoelectronics, optical tweezers), hydrodynamic force, centrifugal force, or gravitational force.
- a commercially available nanopore system may be used in the methods described herein.
- a nanopore system from Oxford Nanopore Technologies such as the MinION, VolTRAX, GridION, PromethION, MinIT, Flongle, or Q-Line products may be used to identify and characterize the modified amino acids described herein.
- a solid-state nanopore system may be used, e.g., from Northern Nanopore Instruments, Norcada, or IMEC.
- a method provided herein may comprise providing a peptide, and generating a modified amino acid from the peptide.
- the generating the modified amino acid may comprise providing a linker and a polymerizable molecule, coupling the linker to the amino acid of the peptide and to the polymerizable molecule, thereby generating an amino acid-linker complex, and cleaving the amino acid from the peptide, thereby generating the modified amino acid comprising the linker and the polymerizable molecule.
- the amino acid-linker complex or the optionally-cleaved modified amino acid is coupled to a capture moiety.
- One or more operations of the process may be repeated, thereby generating a stacked plurality of modified amino acids comprising a plurality of stacked polymerizable molecules.
- the modified amino acids or stacked plurality of modified amino acids may then be analyzed.
- the modified amino acid is characterized or identified using a binding agent comprising a detectable label and detecting the detectable label, thereby identifying an amino acid type of the modified amino acid.
- the modified amino acid is analyzed using super-resolution imaging.
- the method may comprise attaching a portion of the modified amino acid to a substrate, linearizing the modified amino acid or portion thereof, and attaching an additional portion of the modified amino acid to the substrate.
- the method comprises attaching a first sequence of a DNA molecule or modified DNA molecule (e.g., a DNA molecule comprised by a modified amino acid) to a substrate; linearizing the DNA molecule and attaching a second sequence of the DNA molecule or the modified DNA molecule to the substrate.
- FIGs. 1A-1D schematically show example workflows for generating a modified monomer from a polymeric analyte (e.g., a modified amino acid from a peptide) either on a substrate (FIG. 1A) or with or without a substrate (FIG. 1B and FIG. 1C).
- a polymeric analyte 103 (e.g., a peptide) and a capture moiety 105 are provided, which optionally are coupled to a substrate 101.
- the capture moiety 105 may comprise a first nucleic acid molecule (e.g., DNA).
- a linker 109 and a polymerizable molecule (e.g., a linking nucleic acid molecule 111) are provided.
- the linker 109 is pre-tethered to the polymerizable molecule (depicted as a linking nucleic acid molecule 111); alternatively, the linker 109 and the polymerizable molecule may be provided separately.
- the linker 109 may couple to a monomer, e.g., an amino acid (e.g., NTAA) of the polymeric analyte 103 (e.g., a peptide) to generate a monomer-linker complex.
- the monomer-linker complex may couple to the capture moiety 105, thereby generating a monomer-capture moiety complex. Coupling of the monomer-linker complex to the capture moiety 105 may be mediated by the polymerizable molecule, e.g., the linking nucleic acid molecule 111.
- the monomer-linker complex and the capture moiety 105 may be covalently linked together (e.g., the linking nucleic acid molecule 111 may be covalently linked to the capture moiety 105), using chemical (e.g., click chemistry) or enzymatic (e.g., a ligase) approaches.
- the polymerizable molecule may comprise a first sequence that is complementary and may hybridize to a second sequence of the capture moiety 105 (not shown), or the polymerizable molecule may be linked to the capture moiety 105 via a splint or bridge molecule, which may comprise sequences that are complementary to the first sequence of the polymerizable molecule and the second sequence of the capture moiety 105 (not shown).
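- The hybridization and splint-mediated coupling described above can be illustrated with a minimal Python sketch. It assumes plain A/C/G/T strings written 5' to 3' and exact complementarity; the function names and the example sequences are hypothetical and do not come from the specification.

```python
# Illustrative sketch only: names and sequences are hypothetical.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def can_hybridize(first: str, second: str) -> bool:
    """True if `first` is the exact reverse complement of `second`."""
    return first.upper() == reverse_complement(second)


def splint_bridges(splint: str, capture_end: str, polymerizable_end: str) -> bool:
    """True if the splint holds the two ends side by side.

    The splint must contain the reverse complement of the capture-moiety
    end immediately followed (5' to 3' on the joined strand) by the
    polymerizable-molecule end, so the junction is positioned for ligation.
    """
    joined = capture_end.upper() + polymerizable_end.upper()
    return reverse_complement(joined) in splint.upper()


if __name__ == "__main__":
    capture_end = "GATTACA"        # hypothetical second sequence
    polymerizable_end = "CCGGA"    # hypothetical first sequence
    splint = reverse_complement(capture_end + polymerizable_end)
    print(can_hybridize("TGTAATC", capture_end))
    print(splint_bridges(splint, capture_end, polymerizable_end))
```

- Real hybridization tolerates mismatches and depends on temperature and salt, so exact string matching here is a deliberate simplification.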
- the monomer may be cleaved from the polymeric analyte 103, thereby providing a modified monomer that comprises the cleaved monomer-capture moiety complex; the modified monomer may comprise the cleaved monomer coupled to the linker 109, the polymerizable molecule (shown as a linking nucleic acid molecule 111), the capture moiety 105, or a combination thereof (e.g., the modified monomer may comprise the cleaved monomer, the linker, and the polymerizable molecule).
- Any of the processes, e.g., 106, 112, and 113 may be iterated and repeated any number of times (“rounds”) using additional linkers 109 and polymerizable molecules (e.g., linking nucleic acid molecules optionally comprising cycle/round information), and tethering the additional polymerizable molecules together (e.g., tethering an additional polymerizable molecule to the polymerizable molecule of the monomer-capture moiety complex).
- rounds may be performed until all or a subset of the monomers in the polymeric analyte 103 are cleaved and tethered together.
- processes 106, 112, and 113 may be iterated to generate a stacked plurality of modified amino acids 123 comprising a set of cleaved monomers, e.g., a concatenated set of modified monomers that each comprise a polymerizable molecule coupled thereto.
- the stacked plurality of modified amino acids 123 may comprise a stacked set of polymerizable molecules (e.g., via coupling of the polymerizable molecule from a second round to the polymerizable molecule from a first round and coupling of the polymerizable molecule from the third round to that of the second round, and so on) from the individual modified monomers.
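- The iteration of processes 106, 112, and 113 can be sketched as a simple loop. This is a toy model under stated assumptions: process 106 is taken to be linker coupling to the N-terminal amino acid and process 113 to be cleavage (only process 112 is defined explicitly in the text), every round succeeds, and all class and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ModifiedMonomer:
    residue: str  # cleaved amino acid, one-letter code
    cycle: int    # round in which its polymerizable molecule was provided


@dataclass
class StackedPlurality:
    capture_moiety: str  # e.g., a peptide-identifying barcode (hypothetical label)
    monomers: List[ModifiedMonomer] = field(default_factory=list)


def run_rounds(peptide: str, capture_moiety: str,
               max_rounds: Optional[int] = None) -> Tuple[StackedPlurality, str]:
    """Simulate repeated rounds; return the stack and the uncleaved remainder."""
    stack = StackedPlurality(capture_moiety)
    remaining = peptide
    rounds = len(peptide) if max_rounds is None else min(max_rounds, len(peptide))
    for cycle in range(1, rounds + 1):
        ntaa = remaining[0]                     # assumed process 106: linker couples to the NTAA
        monomer = ModifiedMonomer(ntaa, cycle)  # polymerizable molecule carries the cycle number
        stack.monomers.append(monomer)          # process 112: tether to capture moiety / prior round
        remaining = remaining[1:]               # assumed process 113: cleave, exposing a new NTAA
    return stack, remaining


if __name__ == "__main__":
    stack, rest = run_rounds("MKT", capture_moiety="BC01", max_rounds=2)
    print([(m.cycle, m.residue) for m in stack.monomers], rest)
```

- Stopping early with `max_rounds` corresponds to tethering only a subset of the monomers.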
- the stacked plurality of modified amino acids 123 may comprise polymerizable molecules that are coupled or concatenated in a linear fashion (e.g., polymerizable molecules that form a linear polymerizable molecule “backbone”), or in a non-linear fashion (e.g., random or semi-random arrangement, branched coupling).
- the stacked plurality of modified amino acids 123 may be circularized; for example, the stacked plurality of modified amino acids 123 forming a linear polymerizable backbone may be joined at the ends to form a circularized product. In some instances, the stacked plurality of amino acids 123 may be cleaved from the substrate.
- the capture moiety 105 or the polymerizable molecule may comprise a restriction or cleavable site that can be cleaved upon addition of the proper cleaving reagent, e.g., a restriction enzyme, a reducing agent (for disulfide bonds), etc.
- the stacked set of polymerizable molecules may be subject to amplification to generate amplicons of the polymerizable molecules coupled to or comprised by the stacked plurality of modified amino acids.
- the stacked plurality of modified amino acids 123 may be subjected to analysis or characterization, e.g., by contacting the stacked plurality of modified amino acids 123 with a library of binding agents, which can bind to their respective monomer targets (e.g., an amino acid type), and detecting the binding agents, or by using a nanopore sequencer.
- FIG. 1B schematically illustrates another example workflow of generating a modified monomer, e.g., modified amino acid, in presence or absence of a substrate.
- a polymeric analyte 103 (e.g., a peptide) and a capture moiety 105 are provided.
- the capture moiety 105 may comprise a first nucleic acid molecule (e.g., DNA molecule) and may comprise identifying information of the polymeric analyte 103, e.g., an identifying barcode sequence.
- the capture moiety 105 may additionally comprise a releasable or cleavable moiety.
- the polymeric analyte and the capture moiety 105 may, in some instances, be coupled to a substrate (FIG. 1B inset), e.g., via an anchor molecule (e.g., anchor nucleic acid molecule).
- the polymeric analyte may be coupled to the capture moiety, either directly or indirectly (e.g., both the polymeric analyte and the capture moiety may be coupled to a substrate).
- the capture moiety 105 may comprise a nucleic acid sequence that is complementary to a sequence on a substrate (e.g., bead, flat surface), or the capture moiety may be coupled to the substrate via a linker (not shown).
- the capture moiety 105 may comprise a nucleic acid sequence that is partially complementary to a splint oligonucleotide that is also partially complementary to an anchor oligonucleotide on the substrate, and proximity ligation (e.g., using ligase or chemical ligation) may be performed to couple the capture moiety 105 to the substrate (not shown).
- a linker 109 and polymerizable molecule, such as a linking nucleic acid molecule 111, are provided.
- the linker 109 is pre-tethered to the polymerizable molecule (linking nucleic acid molecule 111); alternatively, the linker 109 and the polymerizable molecule (linking nucleic acid molecule 111) may be provided separately.
- the polymerizable molecule may comprise identifying temporal information, e.g., the cycle or round in which it is provided.
- the linker 109 may couple to a monomer, e.g., an amino acid (e.g., NTAA) of the polymeric analyte 103 (e.g., peptide) to generate a monomer-linker complex.
- the monomer-linker complex may couple to the capture moiety 105.
- Coupling of the monomer-linker complex to the capture moiety 105 may be mediated by the polymerizable molecule and optionally an additional polymerizable molecule 116.
- the additional polymerizable molecule 116 may comprise a cleavable tag moiety, which can enable rapid and precise purification.
- the additional polymerizable molecule 116 may comprise a photocleavable biotin moiety, which may allow for purification of the monomer-linker complex using streptavidin pulldown. Subsequently, the photocleavable biotin moiety may be cleaved and removed at any convenient or useful step.
- the capture moiety 105 may be directly hybridized or ligated to the polymerizable molecule (linking nucleic acid molecule 111).
- the monomer-linker complex and the capture moiety may be covalently linked together (e.g., via ligation).
- the polymerizable molecule may comprise a first sequence that is complementary and may hybridize to a second sequence of the capture moiety 105 (not shown), or the polymerizable molecule may be linked to the capture moiety 105 via a splint or bridge molecule, which may comprise sequences that are complementary to the first sequence of the polymerizable molecule and the second sequence of the capture moiety 105 (not shown).
- the monomer may be cleaved from the polymeric analyte 103 to generate the modified monomer that comprises the monomer-capture moiety complex.
- the modified monomer may comprise the cleaved monomer, the linker 109, the polymerizable molecule (e.g., linking nucleic acid molecule 111), the capture moiety 105, or a combination thereof (e.g., the cleaved monomer, the linker, and the polymerizable molecule, just the cleaved monomer, or just the cleaved monomer-linker complex).
- Processes 106, 112, and 113 may be iterated and repeated any number of times (“rounds”) using additional linkers 109 and polymerizable molecules, and tethering the additional polymerizable molecules together (e.g., tethering an additional (e.g., second) polymerizable molecule to the monomer-capture moiety complex, tethering a polymerizable molecule from the third round to that of the second round, and so on). Multiple rounds may continue until all or a subset of the monomers of the polymeric analyte 103 are tethered together.
- the process may be iterated to generate a stacked plurality of modified monomers (e.g., stacked plurality of amino acids 123) comprising a set of concatenated modified monomers, e.g., concatenated monomer-linker-polymerizable complexes.
- the polymerizable molecules (e.g., linking nucleic acid molecules 111) of the stacked plurality of modified monomers (e.g., stacked plurality of amino acids 123) may be identical molecules (e.g., same nucleic acid sequence), or they may be different.
- the stacked plurality of modified amino acids 123 may comprise polymerizable molecules that are coupled or concatenated in a linear fashion (e.g., polymerizable molecules that form a linear polymerizable molecule “backbone”), or in a non-linear fashion (e.g., random or semi-random arrangement, branched coupling).
- the stacked plurality of modified amino acids 123 may be circularized; for example, the stacked plurality of modified amino acids 123 forming a linear polymerizable backbone may be joined at the ends to form a circularized product.
- the stacked plurality of modified monomers may be cleaved from or at the capture moiety 105, e.g., using the cleavable moiety.
- the stacked set of polymerizable molecules may be subject to amplification to generate amplicons of the polymerizable molecules coupled to or comprised by the stacked plurality of modified amino acids.
- the stacked plurality of modified amino acids may be removed from the substrate, e.g., via cleavage of the capture moiety or dehybridization (e.g., of a nucleic acid capture moiety that is annealed to an anchor nucleic acid molecule of the substrate).
- the stacked plurality of modified amino acids 123 may be subjected to analysis or characterization, e.g., by contacting the stacked plurality of modified amino acids 123 with a library of binding agents, which can bind to their respective monomer targets (e.g., an amino acid type), and detecting the binding agents or using a nanopore sequencer.
- generating the modified monomer may comprise providing a polymerizable molecule comprising a reactive moiety, reacting the reactive moiety with the monomer, and polymerizing the polymerizable molecule.
- a polymerizable molecule comprising an incorporable monomer, e.g., a modified nucleotide, that has an amino acid reactive moiety (e.g., isothiocyanate, PITC) is provided and reacted with the modified amino acid.
- a linking polymerizable molecule 111 (e.g., linking nucleic acid molecule)
- the linking polymerizable molecule 111 may act as a splint molecule to hybridize the capture moiety 105 or portion thereof.
- the linking polymerizable molecule 111 may additionally comprise a terminated end such that it is not extendable.
- an extension reaction is performed, e.g., using polymerase or a TdT enzyme to elongate the capture moiety 105 to comprise a complementary sequence to a portion of the linking polymerizable molecule 111.
- the extension reaction may incorporate canonical or noncanonical nucleotides (e.g., hexaphosphate nucleotides, deaza-nucleotides), ribonucleotides, or a combination thereof.
- the reaction may be catalyzed by addition of cations (e.g., manganese ions, magnesium ions, calcium ions, etc.).
- the linking polymerizable molecule 111 may, in some instances, comprise identifying temporal information, e.g., the cycle or round in which it is provided, which may be copied to the extended nucleic acid molecule.
- the modified nucleotide may be ligated to the capture moiety 105.
- the remaining operations of workflow 100b (e.g., process 113, iteration, etc.) may be performed.
- FIG. 1C schematically illustrates another example workflow of generating a modified monomer, e.g., modified amino acid, or a detectable product comprising the modified monomer, in presence or absence of a substrate.
- a polymeric analyte 103 (e.g., a peptide)
- a capture moiety 105 may comprise a first nucleic acid molecule (e.g., DNA molecule) and may comprise identifying information of the polymeric analyte 103, e.g., an identifying barcode sequence such as a peptide-identifying barcode.
- the capture moiety 105 may additionally comprise a releasable or cleavable moiety.
- the polymeric analyte and the capture moiety 105 may, in some instances, be coupled to a substrate (FIG. 1C inset).
- the polymeric analyte may be coupled to the capture moiety, either directly or indirectly (e.g., both the polymeric analyte and the capture moiety may be coupled to a substrate).
- the capture moiety 105 may comprise a nucleic acid sequence that is complementary to a sequence on a substrate (e.g., bead, flat surface), or the capture moiety may be coupled to the substrate via a linker (not shown).
- a linker 109 and polymerizable molecule such as a linking nucleic acid molecule 111
- the linker 109 is pre-tethered to the polymerizable molecule (linking nucleic acid molecule 111); alternatively, the linker 109 and the polymerizable molecule (linking nucleic acid molecule 111) may be provided separately.
- the polymerizable molecule may comprise identifying temporal information, e.g., the cycle or round in which it is provided.
- the linker 109 may couple to a monomer, e.g., an amino acid (e.g., NTAA) of the polymeric analyte 103 (e.g., peptide) to generate a monomer-linker complex.
- the monomer-linker complex may couple to the capture moiety 105, e.g., a portion of the capture moiety 105 may hybridize to the polymerizable molecule (linking nucleic acid molecule 111).
- a nucleic acid extension reaction, e.g., using a polymerase, may be performed, e.g., to copy a sequence (e.g., a barcode sequence) of the capture moiety 105 to the polymerizable molecule (linking nucleic acid molecule 111).
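- The barcode-copying extension step can be sketched in Python as follows. The sketch assumes that the 3' end of the linking nucleic acid molecule anneals to a site on the capture moiety and is extended toward the 5' end of that template, so that the complement of the barcode is appended. The layout of the capture moiety (barcode followed by annealing site), the sequences, and the function names are hypothetical.

```python
# Illustrative sketch only: names, sequences, and template layout are hypothetical.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def extend_on_template(primer: str, template: str) -> str:
    """Extend `primer` (5' to 3') along `template` (5' to 3').

    The primer must anneal to the template through an exactly complementary
    site; extension then copies everything 5' of that site on the template.
    """
    site = reverse_complement(primer)
    idx = template.upper().find(site)
    if idx == -1:
        raise ValueError("primer does not hybridize to the template")
    upstream = template[:idx]  # template region that will be copied
    return primer.upper() + reverse_complement(upstream)


if __name__ == "__main__":
    barcode = "AACCGT"                 # hypothetical peptide-identifying barcode
    annealing_site = "TTGACA"          # hypothetical site on the capture moiety
    capture_moiety = barcode + annealing_site
    linking_3prime = reverse_complement(annealing_site)
    extended = extend_on_template(linking_3prime, capture_moiety)
    # The extended strand now ends with the complement of the barcode.
    print(extended, extended.endswith(reverse_complement(barcode)))
```

- Because the copy is made by extension rather than ligation, the capture moiety itself is left intact and can serve as a template again in a later round.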
- the monomer may be cleaved from the polymeric analyte 103 to generate the modified monomer that comprises the cleaved monomer, the linker 109, the polymerizable molecule (e.g., linking nucleic acid molecule 111), the capture moiety 105, or a combination thereof (e.g., the cleaved monomer, the linker, and the polymerizable molecule, just the cleaved monomer, or just the cleaved monomer-linker complex).
- the cleavage of the monomer results in formation of a detectable product 131, which comprises a complementary sequence 105’ to a portion of the capture moiety 105, the linking nucleic acid molecule 111, the linker 109, and the cleaved monomer (e.g., cleaved amino acid).
- the detectable product 131 may subsequently be dehybridized or removed (e.g., via cleavage) from the capture moiety 105, thereby regenerating the capture moiety 105, which can be used in subsequent iterations.
- Processes 106, 112, and 113 may be iterated and repeated any number of times (“rounds”) using additional linkers 109 and polymerizable molecules and tethering the additional polymerizable molecules to the capture moiety 105 (e.g., via ligation, hybridization, extension), cleaving the additional monomers, removing the detectable product, and repeating. Multiple rounds may continue until all or a subset of the monomers of the polymeric analyte 103 are generated into discrete detectable products. For example, the process may be iterated to generate a plurality of detectable products 131 that each comprise a modified monomer.
- the detectable products may be collected, optionally processed, and analyzed, e.g., using a nanopore sequencer.
- the discrete detectable products may be combined or concatemerized, e.g., hybridized or ligated to one another, to generate a stacked plurality of modified amino acids.
- the stacked plurality of modified amino acids 123 may comprise polymerizable molecules that are coupled or concatenated in a linear fashion (e.g., polymerizable molecules that form a linear polymerizable molecule “backbone”), or in a nonlinear fashion (e.g., random or semi-random arrangement, branched coupling, circularized molecule).
- the stacked plurality of modified amino acids 123 may be circularized; for example, the stacked plurality of modified amino acids 123 forming a linear polymerizable backbone may be joined at the ends to form a circularized product.
- a branched polymerizable molecule may be used or constructed to generate the modified monomer.
- FIG. 1D schematically illustrates another example workflow of generating a modified monomer, e.g., modified amino acid, using a branched linking polymerizable molecule.
- a polymeric analyte 103 (e.g., a peptide) and a capture moiety 105 are provided.
- the capture moiety 105 may comprise a first nucleic acid molecule (e.g., DNA molecule) and may comprise identifying information of the polymeric analyte 103, e.g., an identifying barcode sequence such as a peptide-identifying barcode.
- the capture moiety 105 may additionally comprise a releasable or cleavable moiety.
- the polymeric analyte and the capture moiety 105 may, in some instances, be coupled to a substrate (see, e.g., FIG. 1C inset).
- the polymeric analyte may be coupled to the capture moiety, either directly or indirectly (e.g., both the polymeric analyte and the capture moiety may be coupled to a substrate).
- the capture moiety 105 may comprise a nucleic acid sequence that is complementary to a sequence on a substrate (e.g., bead, flat surface).
- a linker 109 and polymerizable molecule such as a linking nucleic acid molecule 111, are provided.
- the polymerizable molecule is a branched polymerizable molecule (e.g., a branched DNA molecule).
- the branched DNA molecule may comprise a first nucleic acid molecule that comprises a first click chemistry moiety (e.g., an ethynyl or octadiynyl nucleobase or nucleotide analogue) that can conjugate to a second nucleic acid molecule comprising a second click chemistry moiety (e.g., azide), thereby generating the branched DNA molecule.
- the linker 109 is pre-tethered to the polymerizable molecule; alternatively, the linker 109 and the polymerizable molecule (e.g., linking nucleic acid molecule 111) may be provided separately.
- the polymerizable molecule may comprise identifying temporal information, e.g., the cycle or round in which it is provided.
- the linker 109 may couple to a monomer, e.g., an amino acid (e.g., NTAA) of the polymeric analyte 103 (e.g., peptide) to generate a monomer-linker complex.
- the monomer-linker complex may couple to the capture moiety 105, e.g., a portion of the capture moiety 105 may hybridize to the polymerizable molecule (linking nucleic acid molecule 111, hybridization not shown) or alternatively, the polymerizable molecule may be coupled to the capture moiety 105 using a splint molecule (not shown) and ligated.
- the coupling of the capture moiety 105 and the polymerizable molecule (linking nucleic acid molecule 111) may be performed using click chemistry.
- a nucleic acid extension reaction, e.g., using a polymerase, may be performed (not shown).
- a different order of operations may be performed; for instance, the coupling of the polymerizable molecule to the capture moiety 105 may occur first, followed by coupling of the linker 109 to the polymerizable molecule.
- the monomer is cleaved from the polymeric analyte 103 to generate the modified monomer that comprises the cleaved monomer, the linker 109, and the polymerizable molecule (e.g., linking nucleic acid molecule 111), which altogether may be coupled to the capture moiety 105.
- Processes 106, 112, and 113 may be iterated and repeated any number of times (“rounds”) using additional linkers 109 and polymerizable molecules, and tethering the additional polymerizable molecules together (e.g., tethering a second polymerizable molecule provided in a second round to the polymerizable molecule of the first round).
- Multiple rounds may continue until all or a subset of the monomers of the polymeric analyte 103 are tethered together.
- the process may be iterated to generate a stacked plurality of modified monomers (e.g., stacked plurality of amino acids 123) comprising a set of concatenated modified monomers, e.g., concatenated monomer-linker-polymerizable complexes.
- the polymerizable molecule from a second round may be coupled to the polymerizable molecule from the first round, and the polymerizable molecule from a third round may couple to that of the second round, and so on.
- the polymerizable molecules from the multiple rounds may thus form a polymerizable molecule backbone that comprises pendant, individual modified monomers.
- the polymerizable molecules (e.g., linking nucleic acid molecules 111) of the stacked plurality of modified monomers (e.g., stacked plurality of amino acids 123) may be identical molecules (e.g., same nucleic acid sequence), or they may be different.
- the stacked plurality of modified amino acids 123 may comprise polymerizable molecules that are coupled or concatenated in a linear fashion (e.g., polymerizable molecules that form a linear polymerizable molecule “backbone”), or in a non-linear fashion (e.g., random arrangement, branched coupling).
- the stacked plurality of modified amino acids 123 may be circularized; for example, the stacked plurality of modified amino acids 123 forming a linear polymerizable backbone may be joined at the ends to form a circularized product.
- the stacked plurality of modified monomers e.g., stacked plurality of amino acids 123
- the stacked set of polymerizable molecules may be subject to amplification to generate amplicons of the polymerizable molecules coupled to or comprised by the stacked plurality of modified amino acids.
- the stacked plurality of modified amino acids 123 may be subjected to analysis or characterization, e.g., by contacting the stacked plurality of modified amino acids 123 with a library of binding agents, which can bind to their respective monomer targets (e.g., an amino acid type), and detecting the binding agents, or using a nanopore sequencer.
- the polymerizable molecule (e.g., linking nucleic acid molecule 111) comprises temporal information on the cycle in which it is provided; as such, the temporal information may be used for reconstructing the sequence of the peptide or for quality control.
- For example, referring to FIG. 1B, if a particular cycle number is missing, then it can be inferred that an amino acid is missing or was not present in the peptide, that cleavage of the amino acid did not occur, or that another error occurred.
- For reconstruction purposes, the presence of a temporal (e.g., round or cycle number) barcode and a peptide-identifying barcode can be used to attribute a particular identified modified amino acid to the order or position (using the temporal barcode) in which it occurs in a specific peptide (using the peptide-identifying barcode).
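- The use of the two barcodes for reconstruction and quality control can be sketched in Python. The sketch assumes that each detected modified amino acid has already been decoded into a (peptide barcode, cycle number, amino acid) triple; that record layout, the gap symbol, and the function name are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Read = Tuple[str, int, str]  # (peptide barcode, cycle number, amino acid); assumed layout


def reconstruct(reads: Iterable[Read], gap: str = "?") -> Dict[str, Tuple[str, List[int]]]:
    """Group reads by peptide barcode and order them by cycle number.

    Returns, per peptide barcode, the reconstructed sequence and the list of
    cycle numbers for which no read was observed (flagged for quality control).
    If two reads share a barcode and a cycle, the later read overwrites the earlier.
    """
    by_peptide: Dict[str, Dict[int, str]] = defaultdict(dict)
    for peptide_bc, cycle, amino_acid in reads:
        by_peptide[peptide_bc][cycle] = amino_acid

    result = {}
    for peptide_bc, calls in by_peptide.items():
        last_cycle = max(calls)
        cycles = range(1, last_cycle + 1)
        sequence = "".join(calls.get(c, gap) for c in cycles)
        missing = [c for c in cycles if c not in calls]
        result[peptide_bc] = (sequence, missing)
    return result


if __name__ == "__main__":
    reads = [("P1", 2, "K"), ("P1", 1, "M"), ("P2", 1, "G"), ("P2", 3, "A")]
    for barcode, (sequence, missing) in reconstruct(reads).items():
        print(barcode, sequence, missing)
```

- A missing cycle is reported rather than silently skipped, since it may reflect an absent amino acid, failed cleavage, or another error.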
- a method for processing a peptide may comprise (a) providing the peptide and a linker, wherein the linker is coupled to a polymerizable molecule (e.g., nucleic acid molecule); (b) coupling the linker to the amino acid of the peptide to generate an amino acid-linker complex; (c) coupling the polymerizable molecule to a capture moiety, e.g., via hybridization; (d) cleaving the amino acid from the peptide to yield a modified amino acid comprising the cleaved amino acid, the linker, and the polymerizable
- an extension reaction e.g., nucleic acid extension reaction
- the modified amino acid may comprise other additional modifications that are not depicted, such as posttranslational modifications or chemical modifications (e.g., protecting groups), described elsewhere herein.
- the modified amino acid comprises a derivatized amino acid.
- the linker may comprise a PITC moiety as the amino acid reactive group, and upon conjugation of PITC to an amino acid (e.g., NTAA) of a peptide under mildly basic conditions, a phenylthiocarbamoyl (PTC) derivative of the amino acid is generated.
- the PTC-derivatized amino acid may be treated with acid (e.g., TFA or a Lewis acid) to generate a cleaved cyclic 2-anilino-5(4)-thiazolinone (ATZ)-derivatized amino acid, leaving a new N-terminus on the remaining peptide.
- the ATZ-derivatized amino acid may be converted to a phenylthiohydantoin (PTH) derivative or PTC derivative.
- the polymerizable molecules and capture moieties described herein may comprise other molecule types, e.g., peptides, lipids, carbohydrates, polymers (both naturally occurring and synthetic), or a combination thereof.
- the capture moiety 105 and the polymerizable molecule (e.g., shown as a linking nucleic acid molecule 111) may each comprise a peptide, optionally comprising a peptide barcode sequence.
- process 112 (coupling of the monomer-linker complex to the capture moiety) may be mediated using a peptide enzyme such as sortase A.
- the capture moiety (or the polymerizable molecule) may comprise an oligoglycine or poly-glycine peptide sequence at the C-terminus and the polymerizable molecule (or capture moiety) may comprise a sortase recognition sequence (e.g., LPXTG, where X is any amino acid) at the N-terminus and optionally, an oligo-glycine or poly-glycine peptide sequence at the C-terminus (to facilitate further attachment). Sortase A may then be used to catalyze the formation of a peptide bond between the capture moiety and the polymerizable molecule.
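- As a small illustration, the two sequence features named above (an LPXTG recognition sequence and a terminal glycine run) can be located in one-letter peptide strings with Python's standard `re` module. The helper names and example peptides are hypothetical, and the sketch checks sequence features only; it does not model whether sortase A would act on a given substrate.

```python
import re
from typing import List

# LPXTG, where X is any amino acid (any uppercase letter in this sketch).
SORTASE_MOTIF = re.compile(r"LP[A-Z]TG")


def find_sortase_sites(peptide: str) -> List[int]:
    """Return 0-based start positions of every LPXTG motif in the peptide."""
    return [m.start() for m in SORTASE_MOTIF.finditer(peptide.upper())]


def glycine_run_length(peptide: str, terminus: str = "C") -> int:
    """Count consecutive glycines at the N- or C-terminus of the peptide."""
    if terminus not in ("N", "C"):
        raise ValueError("terminus must be 'N' or 'C'")
    residues = peptide.upper()
    if terminus == "C":
        residues = residues[::-1]  # walk inward from the C-terminal end
    run = 0
    for residue in residues:
        if residue != "G":
            break
        run += 1
    return run


if __name__ == "__main__":
    print(find_sortase_sites("AALPETGGG"))      # hypothetical tagged peptide
    print(glycine_run_length("AKGGG", "C"))     # hypothetical oligoglycine tail
```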
- a stacked plurality of modified monomers 123 that are connected via a peptide backbone may be generated.
- the polymerizable molecule and/or capture moiety may comprise a protecting group that can be deprotected at any useful or convenient step.
- the use of a peptide backbone may allow for performing harsher reaction conditions (e.g., traditional Edman degradation using strong acids) and for assisting in nanopore readout of the individual monomers spaced along the peptide backbone, e.g., as described by K. Motone et al. Multi-pass, single-molecule nanopore reading of long protein strands. Nature (2024), which is incorporated by reference herein in its entirety.
- a “derivative” of the modified amino acid may generally refer to a molecule that is derived from the modified amino acid.
- a derivative may be a product of a reaction (e.g., chemical, enzymatic) or interaction of the modified amino acid with another molecule. Further processing of the modified amino acid may optionally be performed to arrive at the “derivative thereof.” For example, further chemical or enzymatic treatment, cleavage of the amino acid, an extension or amplification reaction, cleavage or removal from a substrate, or physical processes such as mechanical shearing or fragmentation may be performed on the modified amino acid to obtain a derivative of the amino acid or derivative of the modified amino acid.
- a modified amino acid may be derivatized by chemical reaction, e.g., using maleimide to react with cysteine residues, NHS esters or isothiocyanates to react with lysine residues, etc.
- the derivative may result from performing a nucleic acid reaction (e.g., nucleic acid extension reaction, amplification, ligation, transposition, hybridization, dehybridization, etc.).
- one or more of the operations described herein may be iterated or repeated. Iteration of the operations may allow for sequential processing, analysis, or identification of the individual monomers of the polymeric analyte, which can allow for reconstruction of the entire polymeric analyte.
- the operations of the workflow 100a, 100b, 100c, and 100d may be conducted to generate a modified amino acid sequentially for each terminal monomer (e.g., NTAA) of the polymeric analyte (e.g., peptide).
- the individual modified amino acids may then be immobilized to a substrate (e.g., the same or separate substrate as shown in FIG. 1A), linearized, optionally immobilized (e.g., at another end), and detected.
- the individual modified amino acids may be combined, e.g., via hybridization or ligation of the polymerizable molecules to generate the stacked plurality of modified amino acids.
- the stacked plurality of modified amino acids may comprise a plurality of polymerizable molecules from multiple rounds or cycles.
- nth) cycle may be configured to only couple to the first (or second, third, fourth, ...n-1th) polymerizable molecule.
- the first cycle polymerizable molecule may comprise a unique binding sequence that is absent on the capture moiety of the substrate, and to which the second cycle polymerizable molecule can bind. Accordingly, the second cycle polymerizable molecule may only bind to the first cycle polymerizable molecule and not to any of the capture moieties.
- a bridging polymerizable molecule may be provided that encodes for a null event but comprises the unique binding sequence, such that subsequent rounds may continue, even if a null event occurs.
- the polymerizable molecules across cycles or rounds may comprise the same binding sequence (e.g., an adapter sequence).
- a plurality of bioorthogonal click chemistry moieties may be used in the polymerizable molecules for different cycles.
- the polymerizable molecule, e.g., a linking nucleic acid molecule
- the linker may react with a linker provided in the first cycle that comprises TCO.
- the polymerizable molecule and the linker may comprise moieties for an orthogonal click chemistry to that of the first cycle, e.g., sulfur fluoride exchange (SuFEx) click chemistry.
- the polymerizable molecules may additionally encode temporal information, e.g., the cycle or iteration number, such that the order of the individual monomers may be determined.
- the terminal amino acid may be coupled to a polymerizable molecule that comprises a barcode sequence that identifies the cycle number (e.g., cycle 1) (not shown).
- the information encoded by the barcode sequence may be coupled to an adjacent (additional) capture moiety (not shown).
- the workflow may be repeated for the n-1 terminal amino acid, which may again be coupled to a capture moiety via a linker and barcoded polymerizable molecule and cleaved.
- the barcoded polymerizable molecule may comprise the cycle number (e.g., cycle 2).
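By way of non-limiting illustration, the use of cycle-number barcodes to recover monomer order can be sketched as follows. The barcode sequences, the `CYCLE_BARCODES` table, and the function name `reconstruct_order` are hypothetical and chosen only for this sketch; cycles for which no read is available (e.g., a null event) are reported as `X`.

```python
from typing import Dict, List, Tuple

# Hypothetical cycle barcodes; actual barcode sequences are not specified in the text.
CYCLE_BARCODES: Dict[str, int] = {"ACGT": 1, "TGCA": 2, "GATC": 3, "CTAG": 4}


def reconstruct_order(reads: List[Tuple[str, str]], n_cycles: int) -> str:
    """Order amino acid calls by the cycle number decoded from each barcode.

    reads: (cycle_barcode, amino_acid_call) pairs, in arbitrary order.
    Cycles with no read are reported as 'X' (unknown or null event).
    """
    calls: Dict[int, str] = {}
    for barcode, amino_acid in reads:
        cycle = CYCLE_BARCODES.get(barcode)
        if cycle is None:
            continue  # unrecognized barcode: skip rather than guess
        calls[cycle] = amino_acid
    return "".join(calls.get(c, "X") for c in range(1, n_cycles + 1))


# Reads arrive out of order; cycle 2 produced no read.
reads = [("GATC", "W"), ("ACGT", "D"), ("CTAG", "Y")]
print(reconstruct_order(reads, 4))  # DXWY
```

Because the order is carried by the barcode rather than by physical position, the reads may be detected in any order and still be placed correctly.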
- temporal information may be provided separately.
- a temporal barcode may be provided that can couple to the polymerizable molecule 111 or to the capture moiety 105, or a combination thereof.
- the temporal barcode may comprise any useful agent, including a nucleic acid molecule, a peptide, a lipid, a carbohydrate, an enzyme (e.g., a chromogenic or fluorogenic enzyme) or a ribozyme or DNAzyme, a fluorophore, a dye, an intercalating agent, a dideoxynucleotide, a fluorescent nucleic acid molecule or nucleotide, a radioisotope, a mass tag, or other detectable label that can indicate the time or cycle (or iteration) number in which it is provided.
- the temporal barcode comprises a cycle-specific nucleic acid barcode molecule, which can couple to the polymerizable molecule 111 or to the capture moiety 105.
- the temporal barcode may comprise any additional useful functional sequences, e.g., primer sites, sequencing sites, restriction sites, abasic or cleavable sites, etc.
- the temporal barcode may comprise an amplification site that allows for bridge amplification of the temporal barcode and optionally, the coupled polymerizable molecules, to other capture or polymerizable molecules.
- FIG. 1E schematically illustrates an example workflow for analyzing a remaining peptide after performing intramolecular expansion, e.g., as shown in FIGs. 1A-1D.
- the polymeric analyte 103 e.g., a peptide
- the capture moiety 105 may optionally be or remain coupled to a substrate, such as a bead.
- the capture moiety 105 may comprise a cleavable moiety, e.g., a restriction site, a uracil, an abasic site, a photocleavable or photolabile moiety, a disulfide bond, etc.
- a linker and a polymerizable molecule (e.g., a linking nucleic acid molecule) are provided.
- the linker is pre-tethered to the polymerizable molecule (e.g., a linking nucleic acid molecule).
- the linker and the polymerizable molecule may be provided separately.
- the linker may couple to a monomer, e.g., an amino acid (e.g., NTAA) of the polymeric analyte 103 (e.g., the remaining peptide subsequent to one or more cycles of intramolecular expansion) to generate a monomer-linker complex.
- the monomer-linker complex may couple to the stacked plurality of modified amino acids 123. Coupling of the monomer-linker complex to the stacked plurality of modified amino acids 123 may be mediated by the polymerizable molecule (e.g., linking nucleic acid molecule).
- the monomer-linker complex and the stacked plurality of modified amino acids 123 may be covalently linked together using chemical (e.g., click chemistry) or enzymatic (e.g., a ligase, polymerase) approaches, or they may be noncovalently linked (e.g., via hybridization).
- cleavage of the capture moiety 105 may be performed by application of a stimulus, e.g., a biological stimulus (e.g., restriction enzyme, UDG), photo-stimulus, chemical stimulus (e.g., reducing agent), thereby providing a liberated stacked plurality of modified amino acids 123 coupled to the remainder of the polymeric analyte 103.
- a portion 105’ of the capture moiety may remain tethered to the polymeric analyte 103.
- the portion 105’ of the capture moiety may serve as an attachment site for an additional adapter 125 (e.g., a sequencing adapter), e.g., via ligation, hybridization, or an extension reaction.
- the remaining polymeric analyte 103 and the stacked plurality of modified amino acids 123 may be translocated through a nanopore for readout or analysis.
- readout of the remaining polymeric analyte may provide additional multiplexed information on the polymeric analyte.
- readout of the stacked plurality of modified amino acids 123 may yield information on the primary structure (amino acid sequence) of a portion of the peptide.
- the combination of information may be used for peptide identification or fingerprinting in a rapid and accurate manner.
- the remaining polymeric analyte may be further processed or treated prior to or during translocation through the nanopore.
- the polymeric analyte may be digested with an enzyme (e.g., fragmenting enzyme, aminopeptidase, trypsin, peptidase, or other enzyme), degraded or cleaved (e.g., using Edman degradation).
- the methods, systems, kits, and compositions described herein may advantageously improve nanopore sequencing readout and identification of the modified monomers (e.g., modified amino acids) described herein.
- the methods, systems, kits, and compositions described herein may improve the accuracy of identification of an amino acid type of a modified amino acid.
- the improvement to identification accuracy may be achieved by modulating the translocation of the modified monomer adjacent to or through the nanopore.
- the translocation of the modified amino acid through the nanopore can be modulated by altering the conditions in which the translocation occurs.
- a commercial nanopore sequencer system may conventionally use a particular set of conditions (which may be referred to herein as “control condition”) for translocation of a molecule through the nanopore.
- reducing the translocation speed of the analyte of interest (e.g., modified amino acid) through the nanopore may improve the accuracy of the readout.
- the conditions sufficient to reduce a translocation speed of the analyte of interest include alterations to a sequencing run buffer (a “control” solution).
- alterations may include an increased viscosity (e.g., addition of PEG, glycerol, glucose, dextran, bovine serum albumin, methylcellulose, dextrose, etc.), an increased concentration of ATP inhibitors or addition of ATP competition agents (e.g., ATP analogs such as adenylyl-imidodiphosphate), which may aid in slowing down the ratcheting enzyme (e.g., helicase), addition of cofactors, an increased ionic strength, e.g., addition of cations, metal ions, metal salts, halide salts, including but not limited to lithium chloride (LiCl), potassium chloride (KCl), magnesium chloride (MgCl2), sodium chloride (NaCl), caesium
- Translocation speed of the analyte of interest may also be modulated using molecular sieving mechanisms, e.g., incorporation of gel matrices (e.g., polyacrylamide, agarose, photo-crosslinkable gels, thermosensitive gels such as SOL gels).
- the nanopore or nanogap may be embedded within a gel matrix, which may help to control translocation of the modified amino acids through the nanopore or nanogap.
- the modified amino acid or stacked plurality of modified amino acids is contacted with one or more helper molecules, which can aid in reducing the translocation speed, modulate or alter the interaction with the nanopore, or otherwise improve the accuracy of the readout obtained from the nanopore or nanogap sequencing.
- the one or more helper molecules may bind to the modified amino acid or stacked plurality of modified amino acids, or a portion thereof, and may comprise one or more moieties that can interact with the amino acid portion of the modified amino acid or the amino acid portions of the stacked plurality of modified amino acids.
- the helper molecule may comprise one or more moieties that are configured to interact with a portion of the nanopore or nanogap sequencer.
- the helper molecule may be configured to interact with a portion of the sensing region (e.g., a sensing region in the lumen) of a biological nanopore.
- the interaction of the helper molecule with the modified amino acids or the nanopore or nanogap sequencer may comprise a covalent or non-covalent (e.g., ionic, dipole-dipole interaction, electrostatic interaction, hydrophobic or van der Waals) bond.
- the helper molecule may comprise a hydrophobic moiety (e.g., a hydrocarbon, a lipophilic moiety, a cholesterol moiety) that can interact with nonpolar or hydrophobic residues, or a polar or charged moiety (e.g., a cation, an anion, a polyatomic ion, ionic compounds that can interact with polar or oppositely-charged residues), an electron-withdrawing group (e.g., fluorine, chlorine, or other halogen), an amine group, an oxygen group, or other moiety.
- the helper molecule comprises an amphiphilic moiety.
- the helper molecule comprises a chelator.
- the one or more helper molecules may comprise any useful molecule type, e.g., a protein or peptide, a nucleic acid, a lipid, a carbohydrate, a synthetic polymer (e.g., PEG, PVA, polyacrylamide), etc.
- the one or more helper molecules comprises a DNA molecule that can hybridize to the polymerizable molecule (e.g., another DNA molecule) of the modified amino acid.
- the linker comprises the one or more helper molecules.
- the helper molecule may comprise one or more reactive groups that can couple or link to additional moieties that may interact with the modified amino acids.
- the helper molecule may comprise a DNA molecule that comprises one or more reactive groups.
- the DNA molecule may hybridize to the polymerizable molecule (e.g., another DNA molecule) of a modified amino acid or stacked plurality of amino acids.
- a library of molecules with different properties may be contacted with the helper molecule; the library of molecules may selectively bind to different amino acid types based on the amino acid property (e.g., a charged molecule may bind to a modified amino acid comprising an oppositely charged residue, a hydrophobic molecule may bind to a hydrophobic residue) and may comprise another reactive group which can couple or link to the reactive group of the helper molecule (e.g., via click chemistry, affinity binding, etc.).
- the helper molecule, and not the stacked plurality of amino acids, may be analyzed using the nanopore sequencer, obviating the need to analyze the modified amino acid or stacked plurality of amino acids.
- the helper molecule may be useful in nanopore sequencing identification of the modified amino acids (e.g., the amino acid type comprised by the modified amino acids).
- the helper molecule may aid in further distinction of the signals generated from a nanopore sequencer from the different amino acid types.
- the helper molecules comprising hydrophobic regions may bind to hydrophobic amino acid residues, thereby increasing the detectability (e.g., signal amplitude, signal intensity, current deflection or blockade, etc.) of one or subsets of the modified amino acids comprising hydrophobic residues.
- the helper molecule may change a property, e.g., hydrophobicity, hydrophilicity, flexibility or rigidity, size, shape, conformation, charge, etc., of the modified amino acid, which may render the modified amino acid more detectable, e.g., by increasing the distinction of the signal generated from the different amino acid types.
- FIG. 1F schematically illustrates binding of one or more helper molecules with a stacked plurality of modified amino acids.
- the stacked plurality of modified amino acids 123 may be generated using any useful technique or combination of techniques, e.g., as shown in FIGs. 1A-1D.
- the stacked plurality of modified amino acids 123 is contacted with one or more helper molecules 131.
- the one or more helper molecules 131 may be identical or different.
- a plurality of helper molecules comprising different functional groups or moieties may be provided.
- a first helper molecule may comprise a hydrophobic moiety disposed at one end of the molecule and a polar or charged moiety disposed at the other end.
- a second helper molecule may comprise two ends, each having a polar or charged moiety.
- a third helper molecule may comprise two ends, each having a hydrophobic or nonpolar moiety.
- the hydrophobic moieties may be configured to form hydrogen bonds or van der Waals interactions with nonpolar side chains (e.g., Gly, Ala, Val, Leu, Ile, Met, Phe, Trp, etc.).
- the charged or polar moieties may be configured to form ionic or electrostatic interactions with polar or charged side chains (e.g., Asn, Ser, Gln, Asp, Glu, Lys, Arg, His).
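By way of non-limiting illustration, the pairing of helper-molecule moieties with side-chain classes can be expressed as a simple lookup. The two residue groups below are taken from the example lists given above (using standard three-letter codes); the function name `helper_moiety_for` and the returned labels are hypothetical.

```python
from typing import Optional

# Residue groupings as listed in the text (standard three-letter codes).
NONPOLAR = {"Gly", "Ala", "Val", "Leu", "Ile", "Met", "Phe", "Trp"}
POLAR_OR_CHARGED = {"Asn", "Ser", "Gln", "Asp", "Glu", "Lys", "Arg", "His"}


def helper_moiety_for(residue: str) -> Optional[str]:
    """Return the helper moiety class expected to interact with a residue.

    Returns None for residues outside the two example lists, since the text
    marks those lists as non-exhaustive.
    """
    if residue in NONPOLAR:
        return "hydrophobic"
    if residue in POLAR_OR_CHARGED:
        return "polar_or_charged"
    return None


for aa in ("Trp", "Asp", "Cys"):
    print(aa, helper_moiety_for(aa))
```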
- the helper molecules comprise a polymerizable molecule, e.g., nucleic acid molecule, which can hybridize to a complementary sequence of the polymerizable molecules (e.g., DNA molecules) of the stacked plurality of modified amino acids.
- the helper molecules bound to the stacked plurality of modified amino acids may then be translocated through a nanopore (e.g., a biological or solid-state nanopore) or nanogap for detection and can aid in decreasing the translocation time of the individual modified amino acids through the nanopore or nanogap.
- the helper molecules may comprise single nucleotides, di-nucleotides, or other small nucleic acid molecules, which may be configured to hybridize with a polymerizable molecule of the modified amino acid or stacked plurality of modified amino acids at any useful position or location, e.g., in contact with the amino acid, adjacent to the amino acid, upstream or downstream along the polymerizable molecule backbone, etc.
- the helper molecules may be configured to interact with the nanopore or nanogap sequencer.
- the helper molecule may comprise a moiety that interacts with a moiety or binding pocket within the lumen or sensing region of a biological nanopore.
- the nanopore or nanogap can comprise a modification, e.g., a binding moiety, chelator molecule (e.g., nickel modification), etc., which can interact with the moiety of the helper molecule.
- the helper molecule may induce a structural or chemical change in the nanopore, which may improve sequencing or identification accuracy, e.g., by decreasing the translocation speed of the modified amino acid through the nanopore.
- the linkers described herein may comprise different moieties that can facilitate or modulate the interaction of the modified amino acids with the nanopore.
- a library of linkers with different properties e.g., charged, polar, hydrophobic, etc.
- the library of linkers may have a variety of moieties (e.g., a charged moiety, a hydrophobic moiety, etc.), which may couple preferentially to particular amino acid types (e.g., hydrophobic moieties may interact preferentially with hydrophobic residues, negatively charged groups may interact with positively charged residues, etc.) and may, in some instances, facilitate the conjugation of the linker to the amino acid.
- the library of linkers may additionally comprise the same or different reactive groups (e.g., click chemistry moieties, see, e.g., FIGs. 2A-2
- the nanopore, nanogap, or nanochannel may be modified or engineered to comprise a moiety that can interact with the modified amino acid (or portion thereof) or a helper molecule.
- a biological nanopore may be engineered using protein engineering techniques to improve pi-pi interactions between the linker comprised by the modified amino acid and the nanopore (e.g., in the lumen or sensing region of the nanopore).
- the nanopore lumen may comprise one or more phenylalanine residues, which can then interact with a phenyl group of the modified amino acid (e.g., added to a portion of the linker or the polymerizable molecule).
- a coordination interaction between the modified amino acid and the nanopore lumen may occur.
- the modified amino acid e.g., the linker
- the nanopore lumen may comprise a modification to facilitate coordination chemistry, e.g., a lumen-exposed charged residue such as histidine, aspartic acid, or glutamic acid or a polar side chain.
- the coordination interaction may be facilitated by a metal ion, e.g., Cu(II), Ca(II), Co(II), Mg(II), Zn(II), Fe(II), or other metal ion.
- the metal ion may coordinate 6 bonds, any number of which may arise from the modified nanopore and the modified amino acid; for example, the ratio of bonds arising from the modified nanopore to the modified amino acid may be 5:1, 4:2, 3:3, 2:4, or 1:5.
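By way of non-limiting illustration, the listed ratios are simply every way of dividing six coordination bonds between the two partners such that each contributes at least one bond. The function name `coordination_splits` is hypothetical.

```python
from typing import List, Tuple


def coordination_splits(total_bonds: int = 6) -> List[Tuple[int, int]]:
    """List (nanopore_bonds, amino_acid_bonds) pairs summing to total_bonds.

    Each partner contributes at least one bond, so 6:0 and 0:6 are excluded.
    """
    return [(k, total_bonds - k) for k in range(total_bonds - 1, 0, -1)]


print(coordination_splits())  # [(5, 1), (4, 2), (3, 3), (2, 4), (1, 5)]
```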
- Interaction of the modified amino acid and the nanopore, nanogap, or nanochannel may be facilitated via hydrogen bonding, van der Waals interaction, or entropic forces.
- either or both the nanopore and the modified amino acid or portion thereof e.g., the linker, the polymerizable molecule
- the nanopore, nanogap, or nanochannel may comprise one or more binding agents or moieties, which can facilitate the interaction of the nanopore, nanogap, or nanochannel with the modified amino acid.
- the nanopore, nanogap, or nanochannel may comprise an antibody, nanobody, antibody fragment, aptamer, biotin, streptavidin, integrin or binding moiety (e.g., an RGD tripeptide), DNA, RNA, polymer, nanoparticle, fluorophore. Additional examples of binding agents are described elsewhere herein.
- low-affinity binders may be attached to the nanopore or nanogap to enable weak interactions of the modified amino acids during translocation and/or measurement.
- the nanopore may comprise a surface coating that can facilitate interaction with the modified amino acid or portion thereof.
- the nanopore may comprise any number of useful moieties such as charged moieties, hydrophobic moieties, polar moieties, or nonpolar moieties.
- the polymerizable molecule may be treated with a radical or radical-generating agent, e.g., UV rays, hydrogen peroxide, ozone, or other reactive oxygen species.
- the radical agent may be provided prior to or during translocation of the modified amino acid through the nanopore or nanogap, e.g., to induce damage on the polymerizable molecule, which may be useful in improving the detection of the nanopore or nanogap sequencer, e.g., increasing the measured current signal or blockade, producing a more distinct signal, etc.
- the damage or change caused to the polymerizable molecule decreases the translocation speed of the polymerizable molecule.
- Modifications to the polymerizable molecules may also be useful in improving the sequencing accuracy.
- a polymerizable molecule that comprises DNA may comprise, or be treated with, an intercalating dye (e.g., SYBR) or a polyamine (e.g., spermine, spermidine), which may change the charge or another property of the DNA molecule, thereby rendering the polymerizable molecule and/or the modified monomer (e.g., modified amino acid) comprising the polymerizable molecule more detectable.
- the change in charge or other property may modulate the translocation speed of the modified monomer.
- Modifications to the polymerizable molecules may be performed using a chemical or enzymatic reaction or by changing the buffer or surrounding solution conditions.
- the modification of the polymerizable molecule comprises inducing DNA damage, e.g., irradiation such as treatment with UV, gamma, or X-rays, treatment with a mutagen (e.g., ethidium bromide), an oxidizing agent or reactive oxygen species, heat, a radical agent (e.g., peroxide), etc.
- the polymerizable molecule or the modified monomer, e.g., modified amino acid, may comprise or be coupled to other structural or chemical elements (e.g., provided separately or comprised by a helper molecule); such structural or chemical elements may beneficially assist in enhancing sequencing or molecular identification, e.g., by decreasing the translocation speed of the polymerizable molecule as compared to a control molecule (e.g., a polymerizable molecule that does not have the structural or chemical element).
- the structural or chemical element may be useful in analysis or determining the identity of the modified amino acid.
- the structural or chemical element may incur a change in the measured signal of a portion of the modified amino acid or stacked plurality of modified amino acids during translocation through the nanopore; such a signal change can be used as a distinctive signal or signal fiducial marker, e.g., for signal processing or alignment purposes, an indicator of a discrete modified amino acid or a stacked plurality of modified amino acids, or other spatiotemporal marker.
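By way of non-limiting illustration, the use of a fiducial signal to delimit discrete modified amino acids in a current trace can be sketched as follows. The sketch assumes, purely for illustration, that the fiducial element produces a run of samples at or below a known current level; the function name `split_on_fiducial`, the threshold, and the minimum run length are hypothetical.

```python
from typing import List


def split_on_fiducial(
    trace: List[float], fiducial_max: float, min_run: int = 3
) -> List[List[float]]:
    """Split a current trace into segments separated by fiducial runs.

    A fiducial run is assumed to be at least min_run consecutive samples at
    or below fiducial_max. Shorter dips are kept as ordinary signal.
    """
    segments: List[List[float]] = []
    current: List[float] = []
    run: List[float] = []
    for sample in trace:
        if sample <= fiducial_max:
            run.append(sample)
            continue
        if run:
            if len(run) >= min_run:
                # A marker was crossed: close the segment that preceded it.
                if current:
                    segments.append(current)
                current = []
            else:
                current.extend(run)  # too short to be a marker
            run = []
        current.append(sample)
    if run and len(run) < min_run:
        current.extend(run)
    if current:
        segments.append(current)
    return segments


trace = [50, 52, 10, 10, 10, 48, 47, 9, 46, 10, 10, 10, 10, 55]
print(split_on_fiducial(trace, fiducial_max=15))
# [[50, 52], [48, 47, 9, 46], [55]]
```

Each returned segment can then be analyzed as the signal of one modified amino acid, or of one stacked plurality, depending on where the fiducial elements were placed.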
- the structural or chemical element may include, for example, a locked nucleic acid (LNA), peptide nucleic acid (PNA), a spacer, a loop sequence, a hairpin sequence, a coiled or super-coiled DNA molecule, a charged linker or adduct, a modified base or backbone, a lesion (e.g., UV-induced DNA lesion), a biotin moiety (e.g., a biotinylated nucleobase), an avidin or streptavidin moiety (e.g., a streptavidin-conjugated nucleobase), an inverted DNA base, a modified DNA moiety such as an Int 5-Nitroindole, Int 5-TAMRA™ (Azide), Int Super G®, 2'-Deoxy-P-nucleoside-5'-Triphosphate (N-2037), Pyrene-dU-CE Phosphoramidite, Perylene-d
- the structural or chemical element may comprise a DNA adduct (e.g., an adduct introduced chemically or using a mutant polymerase), such as Pt-(GpG), N2-benzo[a]pyrene diolepoxide-2’-deoxyguanosine, 8-oxo-7,8-dihydro-2’-deoxyguanosine, abasic sites, 5-guanidinohydantoin, 2’-deoxyinosine, DNA mismatch sites, 2’-deoxycytidine derivatives, O6-carboxymethyl-2’-deoxyguanosine, an epigenetic mark (e.g., 5-methyl-2’-deoxycytidine and 5-hydroxymethyl-2’-deoxycytidine), or other DNA adduct, e.g., as described by Nookaew et al. 2020.
- bulkier DNA adducts may be used or attached to the polymerizable molecule, e.g., a DNA binding protein (e.g., recombinase or other single stranded binding proteins, strand displacing enzymes, streptavidin, traptavidin, biotin, etc.).
- the structural or chemical element may comprise an electronic tag, e.g., a molecule comprising saturated double bonds, triple bonds, dye molecules, etc.
- the structural or chemical element may comprise a single- or double-stranded break, which optionally may be introduced at any convenient step or process, e.g., using non-homologous end joining, homologous recombination, polymerase repair, etc.
- the structural or chemical element may change an interaction of the polymerizable molecule or modified monomer (e.g., modified amino acid) with the nanopore.
- the structural or chemical element may induce a structural change in the nanopore (e.g., biological nanopore), such that the measured signal from the nanopore (e.g., current) is altered.
- the polymerizable molecule may comprise a modification site, e.g., in proximity to the attachment point of the linker or the amino acid-linker complex, which may enable conjugation of another molecule (e.g., a helper molecule, as described elsewhere herein) that can interact with the amino acid portion of a modified amino acid.
- the polymerizable molecule may comprise single-stranded DNA, double-stranded DNA, or partially double-stranded DNA with any useful number or variety of structural or chemical elements, which may assist in increasing the nanopore sequencing accuracy, for example by decreasing the translocation speed of the polymerizable molecule.
- the polymerizable molecule comprises a DNA molecule that comprises a particular sequence of nucleotides or nucleotide analogs that decreases the translocation speed of the DNA molecule through the nanopore.
- the accuracy of detection and identification of the modified amino acid through the nanopore sequencer includes alteration of an operating parameter of the nanopore sequencer. For instance, a reduced or decreased temperature, different voltage (e.g., change in voltage, altering voltage, oscillating voltage, etc.), or different sampling rate may be used.
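By way of non-limiting illustration, the relationship between sampling rate, translocation speed, and the number of current samples collected per monomer can be worked through as follows. The sketch assumes a constant translocation rate expressed in monomers per second; the function names and the numerical values are hypothetical and are not taken from any particular instrument.

```python
def samples_per_monomer(sampling_rate_hz: float, monomers_per_second: float) -> float:
    """Average number of current samples recorded while one monomer is sensed."""
    if monomers_per_second <= 0:
        raise ValueError("translocation rate must be positive")
    return sampling_rate_hz / monomers_per_second


def slowed_rate(monomers_per_second: float, percent_decrease: float) -> float:
    """Translocation rate after a fractional slowdown (0 <= percent < 100)."""
    if not 0 <= percent_decrease < 100:
        raise ValueError("percent_decrease must be in [0, 100)")
    return monomers_per_second * (1 - percent_decrease / 100)


# Hypothetical values: 5 kHz sampling and 400 monomers/s under a control condition.
control = samples_per_monomer(5000, 400)
slowed = samples_per_monomer(5000, slowed_rate(400, 50))
print(control, slowed)  # 12.5 25.0
```

Under these assumptions, halving the translocation rate has the same effect on samples per monomer as doubling the sampling rate, which is why either parameter may be adjusted.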
- the translocation may be performed at freezing or sub-freezing temperatures.
- additional reagents such as glycerol or cryoprotective agents, may be utilized to facilitate the translocation of the analytes and preservation of the nanopore (e.g., biological nanopore).
- the linker of the polymerizable molecule may have any useful properties that may aid in distinction, discrimination, or identification of the modified amino acids.
- the polymerizable molecule may have a designated size, molecular weight, hydrodynamic radius, hydrodynamic volume, hydrophobicity, hydrophilicity, hygroscopicity, charge, polarity, flexibility, rigidity, etc.
- FIG. 3 schematically shows an example of a modified amino acid, which can be generated by coupling a polymerizable molecule having a linker to an amino acid-linker complex.
- the polymerizable molecule comprises an alkyne-linked DNA molecule; the alkyne is provided as a synthetic nucleobase, octadiynyl deoxyuridine, that is incorporated into a DNA molecule.
- the amino acid-linker complex is generated using a bifunctional linker, 1-(2-azidoethyl)-4-isothiocyanatobenzene, which comprises (1) a PITC moiety (amino acid reactive group) and (2) an azide moiety that is capable of reacting with the alkyne of the polymerizable molecule.
- Three different example amino acid-linker complexes are shown comprising Asp, Trp, and Tyr that have each been reacted with the bifunctional linker.
- the amino acid-linker complexes may react with the alkyne-linked DNA molecule, thereby generating a modified amino acid.
- the alkyne-linked DNA molecule may first be
- the modified amino acid may comprise a plurality of linkers that are linked together.
- the polymerizable molecule may comprise a backbone linker (e.g., a phosphodiester backbone) comprising a functional moiety that can couple to another linker which may be capable of coupling to an amino acid-linker complex, to generate a modified amino acid comprising multiple linkers.
- the polymerizable molecule comprises a backbone linker (e.g., on a phosphate or sugar of a nucleic acid molecule) that comprises a free amine group.
- a second linker comprising an amine-reactive group, such as NHS ester and a click chemistry moiety (e.g., BCN, DBCO, azide, alkyne, etc.) may couple to the free amine group of the polymerizable molecule.
- an amino acid-linker complex e.g., generated by reacting an N-terminal amino acid of a peptide with a third linker comprising an amino acid reactive group (e.g., PITC, a xanthate, a guanidinylating agent, a dithioester, etc.) may be provided.
- the amino acid-linker complex may comprise an additional click chemistry moiety that can react to the click chemistry moiety of the second linker.
- reaction of the amino acid-linker complex (the third linker) with the second linker may yield a modified amino acid that comprises the backbone linker, the second linker (coupled to the backbone linker via NHS ester and amine reaction), and the amino acid-linker complex (comprising the third linker coupled to the second linker via click chemistry).
- the translocation of the modified amino acid may be modulated or controlled using a mutant helicase or other ratcheting enzyme, e.g., polymerase, exonuclease, single stranded and double stranded binding protein, or topoisomerase, such as gyrases.
- translocation of the modified amino acid may be modulated by the applied voltage. For instance, voltage-gated pausing may be implemented to pause a modified amino acid as it translocates through the nanopore.
- improvements to nanopore sequencing accuracy may be obtained by modulating the translocation speed of an analyte (e.g., a modified amino acid) through the nanopore.
- the translocation speed of the modified amino acid may be higher or lower than the translocation speed of the unmodified amino acid or a modified amino acid that does not comprise a polymerizable molecule.
- the translocation speed of the modified amino acid is lower than that of the unmodified amino acid or the modified amino acid that does not comprise the polymerizable molecule.
- the translocation speed of the modified amino acid may be decreased by at least about 0.1%, at least about 1%, at least about 2%, at least about 3%, at least about 4%, at least about 5%, at least about 6%, at least about 7%, at least about 8%, at least about 9%, at least about 10%, at least about 20%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 90%, at least about 100%, at least about 200%, at least about 500%, at least about 1000%, at least about 5000% or greater.
- the translocation speed of the modified amino acid may be decreased by a range of percentages, e.g., from about 5%-20%, from about 500%-3000%, etc.
- the translocation speed of the modified amino acid is non-uniform, e.g., may fall within a range of percentages by which the translocation speed changes, and may depend on a property (e.g., size, charge, polarity, etc.) of the polymerizable molecule, the amino acid type encompassed by the modified amino acid, the linker, or any other component of the molecule.
- the change in translocation speed of the modified amino acid, as compared to an unmodified amino acid may render the modified amino acid more detectable.
- the polymerizable molecule coupled to an amino acid may decrease or alter the translocation speed as it translocates through the nanopore or nanogap, such that a more accurate current reading may be obtained.
- the polymerizable molecule may alter the charge, size, mass-to-charge ratio, or aspect ratio of the amino acid to which it is coupled, which can alter the current signature to improve detectability of the amino acid, the polymerizable molecule, or both.
- the media (e.g., liquid, buffer, solution) with which the nanopore is in contact may comprise one or more agents that can alter or modulate the translocation speed of the modified amino acid.
- the media may comprise ions, salts, or other molecules which may selectively or preferentially alter the interaction between the modified amino acid and the nanopore.
- a change in the buffer composition may result in increased retention or dwell times of the modified amino acid within the sensing region of the nanopore, thereby producing a more detectable signal that can be used to identify or detect the modified amino acid.
- a measured signal (e.g., current blockade) of the modified amino acid as it translocates through the nanopore or nanogap is substantially different than that of an unmodified amino acid or a modified amino acid that does not comprise a polymerizable molecule.
- a measured signal of the modified amino acid as it translocates through the nanopore or nanogap is substantially different than that of the polymerizable molecule alone.
- the difference of the measured signal may be measured by any useful metric, e.g., fold-change, percentage change, absolute current measurement, signal-to-noise ratio, signal amplitude difference, signal signature, frequency, etc.
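The metrics listed above can be illustrated with a minimal sketch. The function name `compare_signals`, the trace values, and the noise figure are hypothetical and chosen only for illustration; the source does not specify how any of these metrics is computed.

```python
import statistics


def compare_signals(modified, reference, baseline_noise_sd):
    """Summarize how one current trace differs from a reference trace.

    modified, reference: lists of current samples in picoamperes (assumed units).
    baseline_noise_sd: standard deviation of the baseline noise in picoamperes.
    """
    mean_mod = statistics.fmean(modified)
    mean_ref = statistics.fmean(reference)
    amplitude_diff = mean_mod - mean_ref
    return {
        "amplitude_difference_pA": amplitude_diff,
        "fold_change": mean_mod / mean_ref,
        "percent_change": 100.0 * amplitude_diff / mean_ref,
        "signal_to_noise": abs(amplitude_diff) / baseline_noise_sd,
    }


# Hypothetical traces: a modified amino acid versus an unmodified amino acid.
metrics = compare_signals([40.0, 42.0, 38.0], [80.0, 78.0, 82.0], baseline_noise_sd=2.0)
print(metrics)
```

Any one of these values, or several together, could serve as the criterion for calling two signals "substantially different"; the threshold itself is left open by the text.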
- the methods, systems, compositions, and kits provided herein may allow for high-accuracy sequencing of polymeric analytes, such as peptides. For instance, for a given amino acid type, the methods provided herein may provide an average read accuracy of greater than 50%, greater than 60%, greater than 70%, greater than 80%, greater than 90% or higher. For a plurality of amino acid types, the methods provided herein may provide an average read accuracy of greater than 50%, greater than 60%, greater than 70%, greater than 80%, greater than 90% or higher for each of the amino acid types of the plurality of amino acid types.
- the plurality of amino acid types may include 2 amino acid types, 3 amino acid types, 4 amino acid types, 5 amino acid types, 6 amino acid types, 7 amino acid types, 8 amino acid types, 9 amino acid types, 10 amino acid types, 11 amino acid types, 12 amino acid types, 13 amino acid types, 14 amino acid types, 15 amino acid types, 16 amino acid types, 17 amino acid types, 18 amino acid types, 19 amino acid types, or 20 amino acid types of the 20 proteinogenic amino acids.
- the plurality of amino acid types may include greater than 20 amino acid types, e.g., including post-translationally modified amino acids or noncanonical amino acids.
- the average read accuracy may fall within a range of accuracies based on the number of amino acid types being identified.
- the average read accuracy for identifying a smaller set of amino acid types may be greater than the average read accuracy for identifying a larger set of amino acid types (e.g., 10 or more amino acid types, all 20 proteinogenic amino acid types, more than 20 proteinogenic amino acid types and post-translational modifications). Accordingly, it will be appreciated that the average read accuracy may fall within a range, depending on the number of amino acid types that are identified.
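As a rough illustration of how accuracy per amino acid type and an average across types might be tabulated, the sketch below assumes that accuracy means the fraction of reads of a known type that are called correctly, and that the average is an unweighted mean over types. Both definitions are assumptions; the source does not define them. The function names and the example calls are hypothetical.

```python
from collections import defaultdict


def per_type_accuracy(calls):
    """calls: iterable of (true_type, predicted_type) pairs."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for true_type, predicted_type in calls:
        total[true_type] += 1
        if predicted_type == true_type:
            correct[true_type] += 1
    return {aa: correct[aa] / total[aa] for aa in total}


def average_accuracy(accuracy_by_type):
    """Unweighted mean over amino acid types (an assumed definition)."""
    return sum(accuracy_by_type.values()) / len(accuracy_by_type)


# Hypothetical calls for three amino acid types.
calls = [
    ("Leu", "Leu"), ("Leu", "Ile"),
    ("Arg", "Arg"), ("Arg", "Arg"),
    ("Val", "Val"), ("Val", "Leu"), ("Val", "Val"), ("Val", "Val"),
]
by_type = per_type_accuracy(calls)
print(by_type, average_accuracy(by_type))
```

With more types in the set, near-neighbors such as leucine and isoleucine are more likely to be confused, which is one way the accuracy for a larger set could fall below that for a smaller set.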
- the methods, systems, compositions, and kits provided herein may allow for high individual identification accuracies of amino acids (or an amino acid type comprised by a modified amino acid).
- the individual identification accuracy of a modified amino acid type may be greater than 50%, greater than 60%, greater than 70%, greater than 80%, greater than 90% or higher.
- the methods provided herein may provide individual identification accuracies of greater than 50%, greater than 60%, greater than 70%, greater than 80%, greater than 90% or higher for each of the amino acid types of the plurality of amino acid types.
- the plurality of amino acid types may include 2 amino acid types, 3 amino acid types, 4 amino acid types, 5 amino acid types, 6 amino acid types, 7 amino acid types, 8 amino acid types, 9 amino acid types, 10 amino acid types, 11 amino acid types, 12 amino acid types, 13 amino acid types, 14 amino acid types, 15 amino acid types, 16 amino acid types, 17 amino acid types, 18 amino acid types, 19 amino acid types, or 20 amino acid types.
- the plurality of amino acid types may include greater than 20 amino acid types, e.g., post-translationally modified amino acids or noncanonical amino acids.
- the individual identification accuracy may fall within a range of accuracies based on the number of amino acid types being identified.
- the individual identification accuracy for identifying a smaller set of amino acid types may be greater than the individual identification accuracy for identifying a larger set of amino acid types (e.g., 10 or more amino acid types, all 20 proteinogenic amino acid types, more than 20 proteinogenic amino acid types and post-translational modifications). Accordingly, it will be appreciated that the individual identification accuracy may fall within a range, depending on the number of amino acid types that are identified.
- the methods, systems, compositions, and kits provided herein may also allow for high average identification accuracies of amino acids (or an amino acid type comprised by a modified amino acid).
- the average identification accuracy of a given modified amino acid type may be greater than 50%, greater than 60%, greater than 70%, greater than 80%, greater than 90% or higher.
- the methods provided herein may provide an average identification accuracy of greater than 50%, greater than 60%, greater than 70%, greater than 80%, greater than 90% or higher for each of the amino acid types of the plurality of amino acid types.
- the plurality of amino acid types may include 2 amino acid types, 3 amino acid types, 4 amino acid types, 5 amino acid types, 6 amino acid types, 7 amino acid types, 8 amino acid types, 9 amino acid types, 10 amino acid types, 11 amino acid types, 12 amino acid types, 13 amino acid types, 14 amino acid types, 15 amino acid types, 16 amino acid types, 17 amino acid types, 18 amino acid types, 19 amino acid types, or 20 amino acid types.
- the plurality of amino acid types may include greater than 20 amino acid types, e.g., post-translationally modified amino acids or noncanonical amino acids.
- the average identification accuracy may fall within a range of accuracies based on the number of amino acid types being identified.
- the average identification accuracy for identifying a smaller set of amino acid types may be greater than the average identification accuracy for identifying a larger set of amino acid types (e.g., 10 or more amino acid types, all 20 proteinogenic amino acid types, more than 20 proteinogenic amino acid types and post-translational modifications). Accordingly, it will be appreciated that the average identification accuracy may fall within a range, depending on the number of amino acid types that are identified.
- the individual or average identification accuracy may be determined probabilistically.
- an individual identification accuracy or an average identification accuracy may be characterized by a probabilistic determination of a modified amino acid belonging to an amino acid type or subset of amino acid types.
- the probabilistic determination may comprise determining a probabilistic distribution of a range of classes that correspond to amino acid types (e.g., leucine, isoleucine, valine, arginine, etc.) or determining the probability that a modified amino acid is derived from an amino acid type or a subset of amino acid types.
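One way to picture such a probabilistic determination is a simple Gaussian likelihood model, sketched below. The model, the reference means and standard deviations in `REFERENCE`, and the function names are all illustrative assumptions; the source does not state which statistical model is used.

```python
import math

# Hypothetical reference values: (mean blockade in pA, standard deviation in pA).
REFERENCE = {
    "Leu": (42.0, 2.0),
    "Ile": (44.0, 2.0),
    "Arg": (60.0, 3.0),
}


def gaussian_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def class_probabilities(measurement, reference=REFERENCE):
    """Return a probability distribution over amino acid types (equal priors assumed)."""
    likelihoods = {aa: gaussian_pdf(measurement, mean, sd)
                   for aa, (mean, sd) in reference.items()}
    total = sum(likelihoods.values())
    return {aa: value / total for aa, value in likelihoods.items()}


def subset_probability(probabilities, subset):
    """Probability that the measurement derives from any type in the subset."""
    return sum(probabilities[aa] for aa in subset)


probs = class_probabilities(43.0)
print(probs)
print(subset_probability(probs, {"Leu", "Ile"}))
```

In this toy case the measurement cannot separate leucine from isoleucine, yet it assigns the pair a high combined probability, which matches the idea of identifying a subset of amino acid types rather than a single type.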
- calibration molecules e.g., modified amino acids of known identity
- an adapter sequence comprising one or more known modified amino acids (e.g., a cleaved amino acid that is linked to a known DNA sequence) may be attached to each or a subset of polymeric analytes.
- the calibrating molecule may serve as a reference standard for the individual monomers of the polymeric analytes.
- the calibration molecules may comprise one or more small molecules, which may vary in one or more properties such as hydrophobicity, size, charge, flexibility, polarity, etc.
- a calibration molecule may comprise a polymerizable molecule and optionally, a linker, without an amino acid attached thereto. In some instances, the calibration molecules may comprise similar, but not identical, modified amino acids.
- a calibration molecule may comprise an amino acid, a polymerizable molecule (e.g., nucleic acid molecule) that is used in the intramolecular expansion process of a peptide analyte, but which may comprise a different linker type (e.g., isothiocyanate instead of PITC, isothiocyanate instead of a guanidinylating agent), etc.
- a library of calibration molecules may be used to enable calibration across different runs, devices, flow cells, etc.
- a modified amino acid or stacked plurality of modified amino acids may comprise or be appended to a polymerizable molecule that is identical to that of the modified amino acid (or the polymerizable molecules comprised by the stacked plurality of modified amino acids), but without the amino acid or linker.
- the polymerizable molecule (“DNA backbone”) may comprise a nucleic acid sequence which comprises a substituted, alkyne-containing base (e.g., octadiynyl dU) to which the amino acid-linker complex may bind.
- the DNA backbone may also comprise or be coupled to a calibration molecule, which comprises the same sequence as the DNA backbone but with a conventional nucleotide (e.g., A, C, T, G, U) instead of the substituted base.
- the calibration molecule (sequence) may serve as a baseline sequence for alignment or comparison of the read arising from the modified amino acid portion.
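A minimal sketch of how calibration molecules could be used to put signals from different runs or devices on a common scale is given below. It assumes a linear relationship between the levels observed for the calibrants in a given run and their expected reference levels; that assumption, the function names, and all numbers are hypothetical and not taken from the source.

```python
def fit_linear_calibration(observed, expected):
    """Least-squares fit of expected = slope * observed + intercept."""
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_exp = sum(expected) / n
    sxx = sum((o - mean_obs) ** 2 for o in observed)
    sxy = sum((o - mean_obs) * (e - mean_exp) for o, e in zip(observed, expected))
    slope = sxy / sxx
    intercept = mean_exp - slope * mean_obs
    return slope, intercept


def apply_calibration(signal, slope, intercept):
    """Rescale raw current values onto the reference scale."""
    return [slope * value + intercept for value in signal]


# Hypothetical calibrant levels (pA): this run reads about 10% high.
observed_calibrants = [22.0, 44.0, 66.0]
expected_calibrants = [20.0, 40.0, 60.0]
slope, intercept = fit_linear_calibration(observed_calibrants, expected_calibrants)
corrected = apply_calibration([55.0], slope, intercept)
print(slope, intercept, corrected)
```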
- Read accuracy can also be improved computationally. For instance, accuracy can be improved by imposing confidence thresholds of individual reads, such that higher-confidence reads are output, and lower-confidence reads are removed. Confidence can be, for example, a probability that is assigned by a computational algorithm for the predicted amino acid type.
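The thresholding step can be sketched as a simple filter. The read structure (a dictionary with `call` and `confidence` keys), the function name, and the 0.9 cutoff are hypothetical choices for illustration.

```python
def filter_reads(reads, min_confidence=0.9):
    """Split reads into those kept and those removed by a confidence threshold."""
    kept = [read for read in reads if read["confidence"] >= min_confidence]
    removed = [read for read in reads if read["confidence"] < min_confidence]
    return kept, removed


# Hypothetical reads with classifier-assigned probabilities.
reads = [
    {"call": "Leu", "confidence": 0.97},
    {"call": "Ile", "confidence": 0.55},
    {"call": "Arg", "confidence": 0.92},
]
kept, removed = filter_reads(reads, min_confidence=0.9)
print([read["call"] for read in kept])
```

A higher threshold trades yield for accuracy: fewer reads are output, but those that remain carry higher assigned probabilities.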
- the identification accuracy of a modified amino acid may be improved by increasing the read count of a particular modified amino acid.
- the modified amino acid may be ratcheted back and forth through a particular nanopore, thereby obtaining a plurality of reads of the modified amino acid. Additional approaches for ratcheting a modified amino acid back and forth (“flossing”) through a particular nanopore are described elsewhere herein.
- the read count may be increased by circularizing the polymerizable molecule or plurality of polymerizable molecules of a modified amino acid or stacked plurality of amino acids.
- a method of generating increased reads of a modified amino acid may comprise: providing the modified amino acid comprising a polymerizable molecule; translocating the modified amino acid through or adjacent to a nanopore or a nanogap; circularizing the polymerizable molecule, thereby generating a circularized modified amino acid; and translocating the circularized, modified amino acid through or adjacent to the nanopore or the nanogap.
- FIG. 4A schematically shows an example workflow for generating increased reads (repeat reads) for a single modified amino acid or stacked plurality of modified amino acids.
- a stacked plurality of modified amino acids 401 is provided, and can be generated using the methods described herein, see, e.g., FIGs. 1A-1D.
- the stacked plurality of modified amino acids may comprise a plurality of modified amino acids that are linked together by a polymerizable molecule backbone (e.g., a nucleic acid backbone) that comprises linked polymerizable molecules 111.
- single modified amino acids comprising a polymerizable molecule 111 may be provided.
- the modified amino acid or stacked plurality of modified amino acids may be translocated through a nanopore 405, optionally using a ratcheting enzyme 403 such as a polymerase, helicase, or other processive enzyme.
- one or more measurements are made (e.g., current blockade, current signal, impedance, current amplitude, etc.).
- the one or more measurements may be deconvolved computationally to output the identity of the modified amino acid.
- In FIG. 4A, Panel B, multiple reads are generated from a single modified amino acid or stacked plurality of modified amino acids by circularizing the polymerizable molecule or plurality of stacked polymerizable molecules.
- the polymerizable molecule or plurality of stacked polymerizable molecules may translocate counter-directionally (e.g., in the trans to cis direction) through an adjacent nanopore.
- One portion of the polymerizable molecule or plurality of stacked polymerizable molecules may then self-ligate to another portion of the polymerizable molecule or plurality of stacked polymerizable molecules, thereby generating a circularized molecule (a circularized modified amino acid or circularized stacked plurality of modified amino acids).
- additional polymerizable molecules e.g., splint or bridge oligos
- the circularized molecule may then iteratively translocate through both pores in a cyclic manner, and the signal may be measured from one or both nanopores.
- the measured signal may thus generate multiple reads of the same circularized molecule, which reads can be output and used to identify the modified amino acid or plurality of modified amino acids.
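One possible way to combine repeat reads of the same molecule is sketched below: per-read class probabilities are multiplied (summed in log space) and renormalized. This treats the reads as independent and requires strictly positive probabilities; both are simplifying assumptions, and the function name and values are hypothetical.

```python
import math


def consensus(read_probabilities):
    """Combine per-read class probabilities, assuming independent reads.

    read_probabilities: list of dicts mapping amino acid type to a probability > 0.
    """
    classes = list(read_probabilities[0].keys())
    log_scores = {c: sum(math.log(read[c]) for read in read_probabilities)
                  for c in classes}
    peak = max(log_scores.values())  # subtract the maximum for numerical stability
    weights = {c: math.exp(score - peak) for c, score in log_scores.items()}
    total = sum(weights.values())
    return {c: weight / total for c, weight in weights.items()}


# Three hypothetical passes of one circularized molecule through the pore.
passes = [{"Leu": 0.6, "Ile": 0.4}] * 3
combined = consensus(passes)
print(combined)
```

Each individual pass only weakly favors leucine, but the combined estimate is more decisive, which illustrates why a higher read count can improve identification accuracy.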
- the stacked plurality of modified amino acids may be positioned along a polymerizable molecule backbone such that individual reads are obtained for a given modified amino acid without overlapping or interfering signal from another modified amino acid of the stacked plurality of modified amino acids.
- FIG. 4B schematically shows nanopore sequencing of a stacked plurality of modified amino acids that are spaced along a polymerizable molecule backbone, e.g., as generated using workflow 100d of FIG. 1D.
- the stacked plurality of modified amino acids may comprise individual modified amino acids 402 that are spatially resolved from one another and spaced along the polymerizable molecule backbone such that only approximately one modified amino acid translocates through a given nanopore at a given instant.
- a molecular motor protein such as a ratcheting enzyme, e.g., a polymerase or helicase, or engineered variant thereof, may be used to translocate the polymerizable molecule backbone (e.g., a single or double-stranded DNA molecule) adjacent to the nanopore.
- the modified amino acids may translocate through the nanopore (e.g., using electrophoretic force) as the polymerizable molecule backbone is ratcheted through the ratcheting enzyme (shown in FIG. 4B as occurring in an orthogonal direction to the ratcheting).
- the ionic current signal can be measured through the nanopore as the translocation event occurs.
- the ratcheting enzyme may have cleaving activity (e.g., an engineered Bst polymerase that has been modified to cleave hexaphosphates or other modified nucleotides or nucleosides), such that a portion of the individual modified amino acids 402 may be released from the polymerizable molecule backbone and may translocate (e.g., via diffusion or active transport such as electrophoresis) through the nanopore.
- the ratcheting enzyme is engineered at the ATP binding site.
- an increase in read count may be obtained by increasing the number of engagement sites of the modified amino acid or stacked plurality of modified amino acids with one or more nanopores.
- introducing more structures that can engage with a motor or ratcheting enzyme (e.g., helicase) or to facilitate entry to the nanopore may be performed.
- FIG. 4C schematically shows a workflow to introduce additional 5’ double-stranded or partially double-stranded ends that can engage with or enter a nanopore using hybridization chain reaction.
- a stacked plurality of modified amino acids 423 may be coupled (e.g., using ligation) to different hairpin molecules at each end of the stacked plurality of modified amino acids 423.
- a plurality of the stacked plurality of modified amino acids 423 may then be concatenated or joined together by introduction of helper molecules, such as additional hairpin molecules that can promote the hybridization chain reaction. Accordingly, a plurality of the stacked plurality of modified amino acids may be formed, with each stacked plurality of modified amino acids comprising a 5’ double-stranded or partially double-stranded end that can facilitate entry into the nanopore.
- Hybridization chain reaction may additionally be used to generate or introduce additional structural elements, e.g., loop or hairpin regions, A-tailed regions, etc. to the plurality of stacked plurality of modified amino acids (not shown).
- FIG. 4D schematically shows nanopore sequencing of a circularized stacked plurality of modified monomers, e.g., modified amino acids.
- the left-hand schematic illustrates a circularized stacked plurality of modified monomers, e.g., individual modified amino acids 402, which may be connected to one another along a polymerizable molecule backbone.
- the circularized stacked plurality of modified monomers may be generated using an intramolecular expansion process to generate a linear stacked plurality of modified monomers and then circularizing the linear stacked plurality of modified monomers (e.g., via ligation of one end to another).
- the polymerizable molecule may be provided as a circularized molecule comprising a plurality of individual linkers (e.g., comprising amino acid reactive groups) to which the monomers may be coupled individually and then cleaved.
- a portion of the polymeric analyte e.g., peptide
- the circularized stacked plurality of modified monomers may be contacted with a nanopore 405 and a locally coupled molecular motor protein, such as a ratcheting enzyme 403.
- the ratcheting enzyme 403 may facilitate translocation of the circularized stacked plurality of modified monomers adjacent to the nanopore 405, such that the individual modified amino acids 402 may translocate through the nanopore 405 for detection.
- the individual modified amino acids 402 may comprise a charged moiety (e.g., along the polymerizable molecule or via a linker), which can aid in translocation of the individual modified amino acids 402 through the nanopore 405.
- a remaining portion of the polymeric analyte may be present, which may also be translocated through the nanopore 405 for detection, or alternatively, detected using other approaches such as imaging (e.g., super-resolution imaging).
- Detection of the remaining portion of the polymeric analyte may yield useful information such as the size or length, identity, charge, etc. of the remaining portion of the polymeric analyte, which, in combination with identification of the individual modified monomers (modified amino acids), may identify the originating analyte (e.g., identify the starting protein or peptide).
- imaging may be used to detect the remaining polymeric analyte.
- a dye or fluorophore may be coupled to the N-terminus of a remaining peptide after intramolecular expansion and to the capture moiety coupled to the remaining peptide, and imaging (e.g., confocal microscopy or super-resolution imaging) may be used, e.g., to determine a distance between the capture moiety and the N-terminus, thereby yielding information on the number of amino acids in the remaining polymeric analyte.
- the N-terminus of the remaining peptide may be labeled with a fluorophore and the nanopore may also comprise a fluorophore, and FRET can be used to determine the number of amino acids in the remaining peptide.
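For orientation, the standard Förster relation E = 1 / (1 + (r / R0)^6) links a measured FRET efficiency E to the donor-acceptor distance r. The sketch below inverts that relation and then converts the distance into a rough residue count. The Förster radius of 5.0 nm and the spacing of 0.36 nm per residue (a fully extended chain) are illustrative assumptions, not values from the source; a real peptide is rarely fully extended, so the count should be read as a coarse estimate.

```python
def fret_distance_nm(efficiency, forster_radius_nm):
    """Invert E = 1 / (1 + (r / R0)**6) to obtain the distance r."""
    if not 0.0 < efficiency < 1.0:
        raise ValueError("efficiency must lie strictly between 0 and 1")
    return forster_radius_nm * (1.0 / efficiency - 1.0) ** (1.0 / 6.0)


def estimate_residue_count(distance_nm, nm_per_residue=0.36):
    """Rough residue count, assuming an extended chain (hypothetical spacing)."""
    return distance_nm / nm_per_residue


distance = fret_distance_nm(efficiency=0.5, forster_radius_nm=5.0)
print(distance, estimate_residue_count(distance))
```

At 50% efficiency the distance equals the Förster radius by definition, which gives a convenient sanity check on the inversion.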
- Multi-state measurements: the methods provided herein may also employ measuring multiple states of polymeric analytes.
- a measured state may correspond to any useful measured parameter, e.g., a measured ionic current or current blockade, that can be indicative of a characteristic of a portion of the polymeric analyte that is translocating through or adjacent to the nanopore or nanogap.
- a method for sequencing a peptide may comprise (a) providing a modified amino acid generated from the peptide, wherein the modified amino acid comprises a polymerizable molecule; (b) translocating at least a portion of the modified amino acid through or adjacent to a nanopore or nanogap; (c) measuring one or more states of the modified amino acid or portion thereof.
- the method may comprise: (a) providing a modified amino acid generated from the peptide, wherein the modified amino acid comprises a polymerizable molecule comprising M monomers, wherein M is a positive integer;
- translocating comprises ratcheting of a first monomer of the M monomers through the nanopore;
- N may be any useful integer or non-integer number.
- N is a characteristic number that represents the number of monomers that is present in the nanopore or in a sensing or interacting portion of the nanopore in a given instant.
- N may represent the number or average number of monomers of a polymerizable molecule that traverse through the nanopore in a given instant.
- N measured states may correspond to the N contiguous monomers in the nanopore in a given instant (e.g., the ratio of the measured states to the number of monomers measured is 1 : 1).
- the N measured states may correspond to N measurements taken from N discrete, non-contiguous monomers, e.g., resulting from a slip in the ratcheting of the monomers (e.g., the ratio of the measured states to the number of discrete monomers measured is 1:1 for non-contiguous monomers).
- the N measured states may correspond to fewer than N monomers, e.g., resulting from a repeat read of one or more monomers (e.g., the ratio of the measured states to the number of discrete monomers measured is greater than 1).
- the N measured states may correspond to a non-integer value of monomers. For instance, if a non-integer number of monomers (e.g., 1.2, 1.5, etc.) is ratcheted through the nanopore, the N measured states may correspond to a measurement obtained from a non-integer value of monomers.
- the measured state may be derived from a measurement of the current signal, current blockade, or other useful measurement.
- the measured state can be indicative of a positional, orientational, or conformational state of the N monomers present in the nanopore in a given instant.
- the measured state may be an indication of a property of the collective or individual monomers present in the nanopore in a given instant, e.g., size such as a length or hydrodynamic volume, hydrophobicity, hydrophilicity, flexibility, hygroscopicity, charge, isoelectric point, or other property.
- the N measured states may be used to determine an identity of a modified amino acid or a group of possible amino acid types. For instance, the N measured states may be compared to a reference measurement or set of reference measurements of known modified amino acids.
- the reference measurements may comprise a plurality of measured states (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100 or more) for a given modified amino acid.
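The comparison against reference measurements could be sketched as a nearest-reference ranking. The distance metric (mean squared difference), the requirement that measured and reference state vectors have equal length, and all names and values below are assumptions made for illustration.

```python
def rank_references(measured_states, references):
    """Rank known modified amino acids by similarity to the measured states.

    measured_states: list of N state values (e.g., current levels in pA).
    references: dict mapping a label to a reference list of the same length.
    """
    def mean_squared_difference(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

    return sorted(references,
                  key=lambda label: mean_squared_difference(measured_states,
                                                            references[label]))


# Hypothetical reference state sequences for two known modified amino acids.
references = {
    "Leu-modified": [50.0, 42.0, 35.0, 48.0],
    "Arg-modified": [50.0, 30.0, 25.0, 47.0],
}
ranking = rank_references([51.0, 41.0, 36.0, 47.0], references)
print(ranking)
```

Returning the full ranking rather than a single label leaves room for reporting a group of possible amino acid types when the top candidates are close.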
- a modified amino acid may be generated using the methods described herein and may comprise a cleaved amino acid coupled, e.g., via one or more linkers, to a polymerizable molecule, such as a DNA molecule.
- the modified amino acid may comprise the sequence of X1-X2-X3-X4-X5-X6-Y-X7-X8-X9-X10, where each X is a nucleotide, such as an A, G, C, or T basepair, and Y is a modified base or DNA backbone comprising the cleaved amino acid.
- the modified amino acid may be contacted with a molecular motor protein such as a ratcheting enzyme, e.g., a helicase, polymerase, or topoisomerase, that can ratchet individual base pairs through a nanopore.
- the current blockade is measured through the nanopore.
- a state measurement may correspond to a ratcheting event of the ratcheting enzyme, e.g., ratcheting of one base pair, or the state measurement may correspond to a non-integer number of basepairs (e.g., non-integer ratcheting of basepairs).
- one or more consecutive measurements comprises a repeat, e.g., both a second state measurement measures X2-X3-X4-X5-X6-Y-X7-X8 and a third state measurement measures X2-X3-X4-X5-X6-Y-X7-X8 (e.g., a stalling of the ratcheting enzyme or a repeat read).
- one or more basepairs may be excluded in a state measurement, e.g., a first state measurement measures X2-X3-X4-X5-X6-Y-X7-X8 and a second state measurement measures X4-X5-X6-Y-X7-X8-X9-X10.
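The repeat and skip examples above can be reproduced with a small sliding-window sketch. It assumes a sensing window of eight positions, which is inferred from the eight-element examples in the text, and integer ratchet steps only; non-integer steps are not modeled. The function name and step encoding are hypothetical.

```python
SEQUENCE = ["X1", "X2", "X3", "X4", "X5", "X6", "Y", "X7", "X8", "X9", "X10"]


def state_windows(sequence, window, steps):
    """List the monomers covered by each successive state measurement.

    steps: ratchet step sizes between measurements
           (0 = stall or repeat read, 1 = normal step, 2 = one position skipped).
    """
    start = 0
    windows = [sequence[start:start + window]]
    for step in steps:
        start += step
        if start + window > len(sequence):
            break  # the window would run off the end of the molecule
        windows.append(sequence[start:start + window])
    return windows


# A normal step, then a stall (repeat read), then a two-position slip.
windows = state_windows(SEQUENCE, window=8, steps=[1, 0, 2])
for measured in windows:
    print("-".join(measured))
```

The second and third windows are identical (the stall case), and the fourth jumps to X4 through X10, so X3 never begins a window.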
- N and M can be any useful number; for instance, M can be sufficiently large such that adjacent modified amino acids of a stacked plurality of modified amino acids are spaced apart to resolve the modified amino acids.
- N may comprise any integer or non-integer number and can, in some instances, be associated with the ratcheting behavior of the ratcheting enzyme.
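The spacing condition can be checked with a short sketch. It assumes integer N and treats two modified positions as resolved when no window of N contiguous monomers can contain both, that is, when adjacent modified positions are at least N monomers apart. The function name and the example positions are hypothetical.

```python
def sites_resolved(site_positions, window):
    """True if no window of `window` contiguous monomers contains two modified sites.

    site_positions: monomer indices of the modified bases along the backbone.
    """
    ordered = sorted(site_positions)
    return all(later - earlier >= window
               for earlier, later in zip(ordered, ordered[1:]))


# Hypothetical backbones with a sensing window of N = 8 monomers.
well_spaced = sites_resolved([6, 17, 28], window=8)
too_close = sites_resolved([6, 10], window=8)
print(well_spaced, too_close)
```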
- Polymeric Analytes: The sequencing approaches provided herein may be used for analyzing and characterizing peptides or other types of polymeric analytes.
- the methods outlined herein may be useful in sequencing or analyzing a biomolecule, macromolecule, or synthetic molecule, or a combination thereof, e.g., chimeric molecules.
- the polymeric analyte may be a biomolecule or other biological molecule that comprises one or more monomers.
- Non-limiting examples of polymeric biomolecules include nucleic acid molecules (e.g., DNA molecule, RNA molecule, DNA:RNA hybrids, aptamers), peptides and proteins, polysaccharides, and lipid polymers (e.g., diglycerides, triglycerides and other fatty acids).
- the polymeric analyte may be a synthetic molecule, e.g., a peptoid or synthetic polymer, or a peptidomimetic (e.g., a peptoid, a beta-peptide, a D-peptide peptidomimetic).
- Non-limiting examples of synthetic polymers include acrylics, nylons, silicones, viscose, rayon, polyesters, polycarboxylic acids, polyvinyl acetate, polyacrylamide, polyacrylate, polyethylene glycol, polyurethane, polylactic acid, silica, polystyrene, polyacrylonitrile, polybutadiene, polycarbonate, polyethylene terephthalate, poly(chlorotrifluoroethylene), poly(ethylene oxide), poly(ethylene terephthalate), polyethylene, polyisobutylene, poly(methyl methacrylate), poly(oxymethylene), polyformaldehyde, polypropylene, polystyrene, poly(tetrafluoroethylene), poly(vinyl acetate), poly(vinyl alcohol), poly(vinyl chloride), poly(vinylidene dichloride), poly(vinylidene difluoride), poly(vinyl fluoride), or a combination thereof.
- the polymeric analyte may comprise any useful linkage, including carbon, ethylene glycol, acetylene, isothianaphthene, dimethyl silane, urethane, glycolic acid, lactic acid, dioxanone, methyl methacrylate, hydroxyethyl methacrylate, vinyl chloride, tetrafluoroethylene, propylene, ethylene, ether ketone, ether sulfone, fluorene, aniline, phenylene, polypyrrole, phenylenevinylene, fluorene, thiophene, or 3,4-ethylenedioxythiophene linkages.
- the polymeric analytes may comprise a single polymer type (e.g., a homopolymer) or more than one polymer type (e.g., a copolymer) and may comprise random or arranged monomers.
- the polymeric analytes may be a block polymer, alternating copolymer, periodic copolymer, statistical copolymer, stereoblock copolymer, gradient copolymer, branched copolymer, graft copolymer, etc.
- the polymeric analytes may be any size or comprise a range of sizes.
- the polymeric analyte may be about 1 nanometer (nm), about 5 nm, about 10 nm, about 20 nm, about 30 nm, about 40 nm, about 50 nm, about 60 nm, about 70 nm, about 80 nm, about 90 nm, about 100 nm, about 200 nm, about 300 nm, about 400 nm, about 500 nm, about 600 nm, about 700 nm, about 800 nm, about 900 nm, about 1 micrometer (µm), about 10 µm, about 100 µm, about 1 millimeter (mm) in size, or greater.
- a plurality of polymeric analytes may comprise polymeric analytes of similar size or within a range of sizes, e.g., between about 10 nm to about 100 nm, between about 50 nm to about 1 µm.
- the polymeric analytes may have any molecular weight or range of molecular weights.
- the polymeric analytes may be about 10 daltons (Da), 100 Da, 500 Da, 1 kilodalton (kDa), 10 kDa, 100 kDa, 1,000 kDa, 10,000 kDa, 100,000 kDa, or greater.
- the polymeric analytes may comprise polymeric analytes of similar molecular weight or within a range of molecular weights.
- the monomers of the polymeric analytes may comprise any size or range of sizes that is less than that of the entire polymeric analyte.
- a monomer may be about 0.1 nanometer (nm), about 0.5 nm, about 1 nm, about 5 nm, about 10 nm, about 20 nm, about 30 nm, about 40 nm, about 50 nm, about 60 nm, about 70 nm, about 80 nm, about 90 nm, about 100 nm, about 200 nm, about 300 nm, about 400 nm, about 500 nm, about 600 nm, about 700 nm, about 800 nm, about 900 nm, about 1 micrometer (µm), about 10 µm, about 100 µm, about 1 millimeter (mm) in size, or greater.
- the monomers may have any molecular weight or range of molecular weights.
- the monomers may be about 1 dalton (Da), 10 Da, 100 Da, 500 Da, 1 kilodalton (kDa), 10 kDa, 100 kDa, 1,000 kDa, 10,000 kDa, 100,000 kDa, or greater.
- the monomers or polymeric analytes may range in size or molecular weight; for example, a polymeric analyte may comprise a peptide comprising amino acid monomers, which may vary in molecular weight from 75 Da (glycine) to 204 Da (tryptophan).
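As a worked example of how monomer weights relate to the weight of the polymer, the sketch below sums the average masses of free amino acids and subtracts one water molecule for each peptide bond formed. Only four amino acids are tabulated to keep the example short; the table name and function name are hypothetical.

```python
WATER_DA = 18.015  # mass of one water molecule lost per peptide bond

# Average molecular weights of free amino acids, in daltons.
FREE_AMINO_ACID_DA = {
    "G": 75.07,   # glycine
    "A": 89.09,   # alanine
    "S": 105.09,  # serine
    "W": 204.23,  # tryptophan
}


def peptide_mass_da(sequence):
    """Approximate mass of a linear peptide from its one-letter sequence."""
    total = sum(FREE_AMINO_ACID_DA[residue] for residue in sequence)
    return total - WATER_DA * (len(sequence) - 1)


dipeptide = peptide_mass_da("GW")
print(round(dipeptide, 2))
```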
- the polymeric analytes may comprise any number of monomers.
- the polymeric analytes may comprise about 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1,000, 2,000, 3,000, 4,000, 5,000, 6,000, 7,000, 8,000, 9,000, 10,000, 50,000, 100,000 or more monomers.
- the polymeric analytes may comprise at least about 2, at least about 3, at least about 4, at least about 5, at least about 6, at least about 7, at least about 8, at least about 9, at least about 10, at least about 20, at least about 30, at least about 40, at least about 50, at least about 60, at least about 70, at least about 80, at least about 90, at least about 100, at least about 500, at least about 1,000, at least about 5,000, at least about 10,000, at least about 50,000, at least about 100,000, or more monomers.
- the polymeric analytes may comprise at most about 100,000, at most about 50,000, at most about 10,000, at most about 5,000, at most about 1,000, at most about 500, at most about 100, at most about 50, at most about 10, at most about 5, or fewer monomers.
- the polymeric analytes may comprise a range of monomers; for example, a polymeric analyte may comprise about 5 monomers whereas another polymeric analyte may comprise about 500 monomers.
- the polymeric analyte comprises a peptide comprising amino acid monomeric units.
- the peptide may be naturally occurring or synthetic.
- the peptide may comprise any number of amino acids.
- the amino acids may be one of 20 proteinogenic amino acids and may comprise any number of post-translational modifications.
- the peptides or any of the constituent amino acids may be processed, e.g., contacted with protecting groups, alkylated, subjected to beta-elimination of phosphate groups, etc., as is described elsewhere herein.
- the peptides are derived from larger peptides or proteins and are fragmented.
- Substrates: One or more operations described herein may be performed using a substrate.
- one or more molecules described herein may be coupled to a substrate.
- the polymeric analyte, capture moiety, and one or more polymerizable molecules may be provided coupled to one or more substrates.
- the polymeric analyte and a capture moiety are coupled to a substrate.
- the substrate may comprise one or more anchor molecules that may couple to the polymeric analyte or the capture moiety.
- more than one substrate may be used. In such cases, the substrates may comprise the same material or different material.
- the substrate may be made from any suitable material, e.g., glass, silicon, gel (e.g., a hydrogel including reversible hydrogels), polymer, etc., as is described elsewhere herein.
- the substrate may be a bead or a gel bead (e.g., polyacrylamide, agarose, or TentaGel® bead).
- the substrate may comprise a flow cell, a microfluidic device, or one or more surfaces disposed thereon.
- the substrate may be functionalized.
- One or more molecules may be coupled to the substrate via a covalent or non-covalent interaction.
- the capture moiety and polymeric analyte can be coupled to the substrate using any suitable chemistry, e.g., click chemistry moieties (e.g., alkyne-azide coupling), photoreactive groups (e.g., benzophenone), 1-ethyl-3-(3-dimethylaminopropyl)carbodiimide hydrochloride (EDC) (e.g., to couple amino-oligos or peptides), N-hydroxysuccinimide (NHS), Sulfo-NHS, or NHS-esters (e.g., to couple sulfhydryl oligos), maleimides, hydrazines, hydroxyl amines, thiol
- the substrate may be functionalized to comprise a coupling chemistry to couple the polymeric analyte or the capture moiety.
- a substrate (e.g., bead or surface) may comprise an alkyne such as dibenzocyclooctyne (DBCO), which may be configured to react with an amine (e.g., DBCO-alcohol, DBCO-Boc, DBCO-NHS), a carboxyl or carbonyl (e.g., DBCO, DBCO-silane), a sulfhydryl, etc.
- An azide-functionalized nucleic acid or protein may react with DBCO to link the nucleic acid or protein to the DBCO substrate.
- linkers such as bifunctional linkers may be used to attach a molecule to a substrate; such bifunctional linkers may comprise the same reactive moiety on both ends or a different moiety at each end (e.g., heterobifunctional linker). Additional examples of linkers are described elsewhere herein.
- a molecule (e.g., polymeric analytes such as peptides, capture moieties, polymerizable molecules) may be coupled using an enzymatic approach, e.g., as described elsewhere herein.
- a chemical linker or moiety such as a click chemistry moiety may be attached to a polymeric analyte (e.g., peptide) using an enzyme.
- the chemical linker or moiety may be able to react with another chemical linker or moiety (e.g., click chemistry moiety) of a substrate, capture moiety, or polymerizable molecule.
- the substrates may be coupled to any useful number of molecules (e.g., polymeric analytes, modified monomers, stacked plurality of modified monomers, capture moieties, polymerizable molecules).
- the substrate may be coupled to 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 500, 1000, 5000, 10000, 100000, 500000, 1000000, 10000000 or more molecules.
- a substrate may comprise a plurality of polymeric analytes (e.g., peptides) and a plurality of anchor molecules, which may be provided at any useful ratio or density.
- the ratio of polymeric analytes, modified monomers, or stacked plurality of modified monomers to anchor molecules may be about 1:1, 1:5, 1:10, 1:20, 1:100, 1:1,000, 1:10,000, 1:100,000, 1:1,000,000 or lower.
- the ratio of polymeric analytes to anchor molecules may be at most about 1:1, at most about 1:5, at most about 1:10, at most about 1:20, at most about 1:100, at most about 1:1,000, at most about 1:10,000, at most about 1:100,000, at most about 1:1,000,000 or lower.
- the molecules may be coupled to the substrate at any useful density, for example about 1 molecule/square micron (µm²), about 10 molecules/µm², about 100 molecules/µm², about 1,000 molecules/µm², about 10,000 molecules/µm², about 100,000 molecules/µm², about 1,000,000 molecules/µm², about 10,000,000 molecules/µm², about 100,000,000 molecules/µm², about 1,000,000,000 molecules/µm², about 10,000,000,000 molecules/µm², about 100,000,000,000 molecules/µm², or greater.
- the polymeric analytes, capture moieties, anchor molecules, and/or polymerizable molecules may be coupled to the substrate at a range of densities, e.g., from about 100 to about 10,000 molecules/µm², or from about 10 to about 1,000 molecules/µm².
- the density of the polymeric analytes, capture moieties, anchor molecules, and/or polymerizable molecules may be the same or different.
- the density of the polymerizable molecules may be 1-fold, 2-fold, 3-fold, 4-fold, 5-fold, 6-fold, 7-fold, 8-fold, 9-fold, 10-fold, 100-fold, 1000-fold, 10,000-fold, 100,000-fold, 1,000,000-fold or greater-fold lower than that of the polymeric analyte.
- the molecules coupled to the substrate may be spaced apart at a designated or controlled distance.
- the anchor molecules, the capture moieties, or the processed polymeric analyte (e.g., the stacked plurality of modified monomers) may be spaced at an average pitch of about 1 nanometer (nm), about 2 nm, about 3 nm, about 4 nm, about 5 nm, about 6 nm, about 8 nm, about 9 nm, about 10 nm, about 20 nm, about 30 nm, about 40 nm, about 50 nm, about 60 nm, about 70 nm, about 80 nm, about 90 nm, about 100 nm, about 500 nm, about 1 µm, about 5 µm, about 10 µm or greater.
- the spacing between or pitch of the molecules (e.g., anchor molecules, capture moieties, polymeric analyte, or processed polymeric analyte such as the stacked plurality of modified monomers) may be at most about 10 µm, at most about 5 µm, at most about 1 µm, at most about 500 nm, at most about 100 nm, at most about 90 nm, at most about 80 nm, at most about 70 nm, at most about 60 nm, at most about 50 nm, at most about 40 nm, at most about 30 nm, at most about 20 nm, at most about 10 nm, at most about 5 nm, or less.
- the spacing or distance between a polymeric analyte and a polymerizable molecule or capture moiety may be about 1 nanometer (nm), about 2 nm, about 3 nm, about 4 nm, about 5 nm, about 6 nm, about 8 nm, about 9 nm, about 10 nm, about 20 nm, about 30 nm, about 40 nm, about 50 nm, about 60 nm, about 70 nm, about 80 nm, about 90 nm, about 100 nm, about 500 nm, about 1 µm or greater.
- the average spacing between the capture moiety and the polymeric analyte coupled to the substrate may be at most about 1 µm, at most about 500 nm, at most about 100 nm, at most about 90 nm, at most about 80 nm, at most about 70 nm, at most about 60 nm, at most about 50 nm, at most about 40 nm, at most about 30 nm, at most about 20 nm, at most about 10 nm, at most about 5 nm, or less.
- a range of average distances between the polymerizable molecules from one another or from the polymeric analytes may be used, e.g., from about 1 nm to about 40 nm, from about 2 nm to about 10 nm, etc.
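The density and spacing figures above are two views of the same quantity. As a rough illustration only, the following sketch converts between a surface density (molecules per square micron) and an average pitch, under the simplifying assumption (not stated in the source) that the molecules sit on an evenly spaced square grid. The function names are hypothetical.

```python
import math


def mean_pitch_nm(density_per_um2: float) -> float:
    """Approximate centre-to-centre pitch (nm) for a given surface density.

    Assumes an idealised square grid, which is an illustrative
    simplification; random deposition gives a distribution of spacings.
    """
    if density_per_um2 <= 0:
        raise ValueError("density must be positive")
    # Each molecule occupies 1/density square microns; the side of that
    # square is the pitch in microns.
    pitch_um = 1.0 / math.sqrt(density_per_um2)
    return pitch_um * 1000.0  # 1 micron = 1000 nm


def density_per_um2(pitch_nm: float) -> float:
    """Inverse of mean_pitch_nm under the same square-grid assumption."""
    if pitch_nm <= 0:
        raise ValueError("pitch must be positive")
    pitch_um = pitch_nm / 1000.0
    return 1.0 / (pitch_um ** 2)
```

Under this assumption, a density of about 100 molecules per square micron corresponds to a pitch of roughly 100 nm, and a pitch of about 10 nm corresponds to roughly 10,000 molecules per square micron.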
- the substrate may be coupled to only a single molecule.
- the molecules may be spaced or distributed unevenly across the substrate (e.g., with varying distances between the molecules).
- the concentration or density of the molecules attached to the substrate may be modulated using one or more suitable approaches, including patterning or random deposition approaches. Examples of methods to control the concentration or density of the molecules attached to the substrate include limited dilution, addition of chaotropes (e.g., guanidine, formamide, urea), using metal organic compounds, etc.
- the molecules may be attached to the substrate in a patterned fashion, e.g., using self-assembling monolayers, photopatterning, lithography, etching, or a combination thereof, or the molecules may be randomly arranged.
- the substrate may comprise any useful size or dimension (e.g., length, width, height, diameter, radius), surface area, volume, or ratio or combination thereof.
- the substrate may comprise a bead or particle that may comprise a diameter of about 1 nanometer (nm), about 2 nm, about 3 nm, about 4 nm, about 5 nm, about 6 nm, about 8 nm, about 9 nm, about 10 nm, about 20 nm, about 30 nm, about 40 nm, about 50 nm, about 60 nm, about 70 nm, about 80 nm, about 90 nm, about 100 nm, about 500 nm, about 1 µm, about 2 µm, about 3 µm, about 4
- the substrate may comprise a surface area of about 1 square nanometer (nm²), about 10 nm², about 100 nm², about 1,000 nm², about 10,000 nm², about 100,000 nm², about 1 µm², about 10 µm², about 100 µm², about 1,000 µm², about 10,000 µm², about 100,000 µm², about 1 mm², about 10 mm², about 100 mm², about 1,000 mm², about 10,000 mm², about 100,000 mm², about 1,000,000 mm² or greater.
- the molecules may be coupled to the substrate in an ordered, semi-ordered, or random arrangement.
- the molecules may be patterned using any conventional approach such as lithography (e.g., soft lithography, photolithography), etching (e.g., ion etching, photo etching), or other patterning approach.
- linkers (e.g., bifunctional linkers) used to attach the molecules (e.g., polymeric analytes, polymerizable molecules, capture moieties) may be patterned using any useful technique such as self-assembling monolayers, photopatterning, lithography, or etching.
- the molecules may be coupled to the substrate in a random arrangement.
- the molecules may be provided at a stoichiometric ratio or controlled concentration to couple the molecules at any useful ratio or density.
- the substrate may comprise topographical or patterned features which may facilitate attachment of linkers to the patterned features.
- the methods provided herein may comprise using a plurality of substrates.
- the preparation of the modified monomers or stacked plurality of modified monomers may be performed using a substrate (e.g., as shown in FIG. 1A).
- the modified monomers or stacked plurality of modified monomers may then be removed from the substrate and contacted with an additional substrate for coupling and detection.
- a modified monomer or stacked plurality of modified monomers may be contacted with a flow cell comprising one or more attachment or anchor molecules (e.g., anchor nucleic acid molecule).
- the modified monomer or stacked plurality of modified monomers may be coupled to the flow cell via one of the anchor molecules, linearized (e.g., using flow or electrophoretic force), and attached at another point to another anchor molecule.
- the linearized molecule may then be detected, e.g., using fluorescently labeled binders.
- any useful operation may be performed on or off a substrate.
- any of operations 106, 112, 113 may be performed on or apart from a substrate.
- Process 106 may occur on the substrate (e.g., provision of a linker that couples to a terminal amino acid).
- Prior to coupling of the polymerizable molecule (linking nucleic acid molecule 111) to the capture moiety, the polymeric analyte may be removed from the substrate, e.g., by enzymatic (e.g., endonuclease, UDG) digestion of a portion of the capture moiety, chemical or heat denaturation (e.g., for double-stranded nucleic acid molecules), toehold-mediated strand displacement, chemical cleavage, or other approach.
- Such a decoupling of the polymeric analyte from the substrate may be useful for improving reaction kinetics (e.g., reactions occurring in solution rather than coupled to a surface of a substrate), preventing molecular crowding, or preventing intermolecular interactions or cross-talk among different polymeric analytes present on the same substrate.
- the polymeric analyte may be re-coupled to the same or different substrate (e.g., bead), e.g., to facilitate purification, enrichment, or separation of the polymeric analytes.
- the polymeric analyte may be re-coupled to the same or different substrate subsequent to process 113.
- the reaction kinetics, e.g., of coupling of the linker to the polymerizable molecule or to the terminal monomer, may be improved using a substrate, e.g., by reducing dimensionality.
- the polymeric analyte 103 and the capture moiety 105 may be provided coupled to a substrate.
- the substrate may comprise or be contacted with the linker 109 and the polymerizable molecule (e.g., a linking nucleic acid molecule 111), which may adsorb to the substrate.
- the coupling of the polymeric analyte 103 to the linker 109 may occur in two dimensions (e.g., across the surface of the substrate) instead of three (in solution), which may beneficially improve the reaction kinetics of the coupling reaction.
- intermolecular crosstalk may be reduced using secondary structures of nucleic acid molecules.
- the polymerizable molecule (e.g., linking nucleic acid molecule) and the capture moiety may comprise nucleic acid molecules, and coupling of the two may occur using a splint or bridge oligonucleotide.
- the splint or bridge oligonucleotide may comprise secondary structures, e.g., partially double-stranded regions, a hairpin molecule, etc. to favorably couple the capture moiety and polymerizable molecule linked to a particular polymeric analyte and not across different polymeric analytes. See, e.g., FIG. 23 and Example 12.
- the linking nucleic acid molecule and/or the capture moiety may comprise secondary structure (e.g., a hairpin molecule) to facilitate intramolecular expansion and coupling of the capture moiety and the linking nucleic acid molecule of a single polymeric analyte rather than from multiple polymeric analytes.
- Intermolecular crosstalk among different polymeric analytes may be reduced or eliminated using any useful approach.
- intermolecular crosstalk may be decreased by decreasing the density of or increasing the distance between polymeric analytes (e.g., on a substrate, in solution, etc.).
- the intermolecular crosstalk is reduced by condensing or reducing the hydrodynamic radius of the polymerizable molecules.
- the polymerizable molecule (e.g., linking nucleic acid molecule) or the capture moiety (e.g., another nucleic acid molecule) may be condensed using molecular staples such as DNA staples or DNA origami, as shown schematically in FIG. 1H, or hairpin nucleic acid molecules (not shown).
- the molecular staples may be reversible or cleavable, e.g., using toehold mediated strand displacement or restriction enzyme digest, a reducing agent (e.g., for molecular staples comprising disulfide bonds), oxidative cleavage of vicinal diol linkages, etc.
- intramolecular interactions may be used to facilitate intramolecular folding, e.g., using complementary base pair segments or multimeric DNA-binding molecules such as histones or DNA binding proteins.
- One or more substrates may be used for purification or enrichment.
- one or more purification or enrichment operations may be performed.
- a bead comprising a complementary nucleic acid sequence to at least a portion of the capture moiety 105, the linking nucleic acid molecule, or the additional polymerizable molecule may be used to purify or enrich for the desired product.
- the coupled product may be purified from the sample using a magnetic bead comprising a complementary sequence to the capture moiety 105. Enrichment or purification may occur at any useful step or operation, e.g., after each of or a subset of processes 106, 112, and 113.
- linking nucleic acid molecule 111 may be coupled to the capture moiety 105 via a splint oligonucleotide comprising a biotin moiety; subsequent to the coupling, the coupled (ligated and/or circularized) products may be enriched or collected using streptavidin beads.
- Nanopore sensing/sequencing: The modified amino acid or derivative thereof may be subjected to nanopore sensing or sequencing to characterize and/or determine the identity (e.g., an amino acid type) of the modified amino acid and, optionally, of the polymerizable molecule.
- the nanopore sensing or sequencing may be performed using a commercially available nanopore system, e.g., Oxford Nanopore Technologies, Genia Technologies, NobleGen, Northern Nanopore Instruments, Norcada, or Quantum Biosystems.
- Nanopore sequencing may be performed to determine the identity of different components of the modified amino acids; for example, a modified amino acid comprising a derivatized amino acid (e.g., a thiocyanate-conjugated amino acid or a thiocarbamyl, thiazolinone, or thiohydantoin derivative) coupled to a polymerizable molecule (e.g., a nucleic acid molecule) may be subjected to nanopore sequencing, which may output the identity of which amino acid type (e.g., which of the 20 proteinogenic amino acids or post-translationally modified amino acids) the modified amino acid comprises or is derived from or a subset of types (e.g., a hydrophobic residue, a charged residue, etc.) and, optionally, the identity of individual monomers of the polymerizable molecule (e.g., the nucleic acid sequence of the nucleic acid molecule).
- the polymerizable molecule comprises a nucleic acid molecule that encodes additional information (e.g., comprises barcode sequences, UMIs, cycle information, spatial information etc.)
- the sequencing of both the derivatized amino acid and the nucleic acid molecules may generate multiplexed information.
- the polymerizable molecule may comprise multiple barcode sequence segments to generate a full barcode sequence.
- a modified amino acid may comprise a polymerizable molecule (e.g., a nucleic acid molecule) and an amino acid or derivative thereof.
- the polymerizable molecule may comprise multiple types of information.
- the polymerizable molecule may comprise sequences that encode cycle or other temporal information, as described above, as well as spatial information.
- an array of peptides and capture moieties may be provided on a substrate.
- the array may comprise a plurality of individually addressable units, in which each (or a subset of) individually addressable units of the array comprises a peptide to be sequenced and a capture moiety.
- Linkers may be provided across the array, which may be capable of linking to an amino acid of the peptides. A plurality of polymerizable molecules may be provided prior to, during, or subsequent to the provision of the linkers.
- the plurality of polymerizable molecules may comprise spatial information (e.g., spatial barcode sequences) which uniquely identify the individually addressable units and thus the location within the array.
- the polymerizable molecules or capture moieties may comprise information on the peptide, e.g., a barcode that identifies the peptide, a partition, sample, population of cells, organism, etc. from which the peptide originates.
- the polymerizable molecules or capture moieties may additionally comprise temporal information (e.g., a cycle barcode that indicates the round or iteration in which the polymerizable molecule or capture moiety is provided).
- Subsequent sequencing may be used to reveal the spatial or temporal information (e.g., the originating location in the array of a peptide or amino acid, the cycle or timing in which a polymerizable molecule or capture moiety is provided, etc.).
- the capture moiety may comprise multiplexed information (e.g., a barcode sequence, e.g., for identification of the peptide or the sample or partition from which the peptide originated, spatial information, temporal information, a UMI, etc.).
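To make the idea of multiplexed barcode information concrete, the following minimal sketch splits a nucleic acid read into sample, spatial, cycle, and UMI segments. The fixed-length layout, the segment order, the segment lengths, and the names `LAYOUT`, `DecodedTag`, and `decode_tag` are all hypothetical and chosen only for illustration; the source does not specify how the segments are arranged.

```python
from typing import NamedTuple

# Hypothetical segment layout as (name, length in nucleotides).
# Neither the order nor the lengths come from the source.
LAYOUT = (("sample", 8), ("spatial", 8), ("cycle", 4), ("umi", 10))


class DecodedTag(NamedTuple):
    sample: str   # identifies the peptide, sample, or partition
    spatial: str  # identifies the individually addressable unit
    cycle: str    # identifies the round in which the molecule was added
    umi: str      # unique molecular identifier


def decode_tag(read: str) -> DecodedTag:
    """Split the start of a read into its barcode segments."""
    needed = sum(length for _, length in LAYOUT)
    if len(read) < needed:
        raise ValueError(f"read shorter than {needed} nt")
    fields = {}
    pos = 0
    for name, length in LAYOUT:
        fields[name] = read[pos:pos + length]
        pos += length
    return DecodedTag(**fields)
```

In practice the segments might be separated by constant spacer sequences or located by alignment rather than fixed offsets; fixed offsets are used here only to keep the sketch short.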
- binding agents may be used, which can add additional multiplexed analytical approaches to peptide sequencing.
- the use of binding agents coupled to a modified amino acid may be useful in modulating the translocation of the modified amino acid through a nanopore.
- the binding agents may comprise additional information that can be read out using nanopore sequencing. For example, different binding agents may be used that recognize specific amino acid residues (or modified amino acids, or amino acid-linker complexes); the different binding agents, alone or in combination with the modified amino acids to which they are coupled, may each be associated with a unique current signature when translocated through the nanopore and thus may be useful in identifying particular residues.
- the binding agents may further comprise additional coding information, e.g., via conjugated barcode molecules or detectable tags.
- the binding agents may have a barcode nucleic acid molecule coupled thereto, which barcode may identify the binding agent or the binding partner of the binding agent (e.g., an amino acid residue).
- the binding agents may comprise spatial or temporal information, as described above. Accordingly, when the modified amino acids comprising binding agents associated therewith are translocated through the nanopore, multiple sets of information may be obtained, e.g., spatial, temporal, encoded barcode information from the binding agent, the amino acid or modified amino acid, the polymerizable molecule, the capture moiety, or a combination thereof.
- the methods provided herein may yield multiplexed information in a streamlined workflow that obviates the need for multiple analytical instruments.
- obtaining multiplexed data may occur substantially simultaneously.
- the identity of an amino acid (or modified amino acid) and the identity of a polymerizable molecule coupled thereto may be obtained on the order of at most about 5 minutes, about 1 minute, about 50 seconds, about 40 seconds, about 30 seconds, about 20 seconds, about 10 seconds, about 1 second, about 900 milliseconds, about 800 milliseconds, about 700 milliseconds, about 600 milliseconds, about 500 milliseconds, about 400 milliseconds, about 300 milliseconds, about 200 milliseconds, about 100 milliseconds, about 900 microseconds, about 800 microseconds, about 700 microseconds, about 600 microseconds, about 500 microseconds, about 400 microseconds, about 300 microseconds, about 200 microseconds, about 100 microseconds, about 900 nanoseconds, about 800 microseconds, about 300 microseconds, about
- the sequence or identity of the polymerizable molecule need not be determined.
- the polymerizable molecule may comprise a nucleic acid molecule or other charged polymeric molecule that may facilitate translocation of the modified amino acid through the nanopore, but the identity of the polymerizable molecule may not be determined.
- the polymerizable molecule may comprise one or more fiducial markers, e.g., known DNA sequences, polymers (e.g., PEG, PVA, polyacrylamide) or chemical moieties that generate a known or predictable current deflection, which may facilitate read alignment of the amino acid sequences, amino acid identification, mapping of the amino acids back to a position within the peptide, or aid in computation of the read sequences.
- Sequencing may output the identity of the amino acid type comprised by the modified amino acids or stacked plurality of modified amino acids.
- Sequencing reads may be assembled using a de novo approach to identify the peptide or protein. For instance, fragmented peptides arising from a common parent protein may be labeled with a common peptide barcode sequence. Putative peptide reads can thus be assembled based on the common barcode sequence, amino acid identity, and, if applicable, cycle number. Erroneous reads may be identified through probabilistic modeling of the accuracy of reads, resulting in reconstructed, fragmentary peptide sequences (contigs) with possible gaps for missed or unidentified rounds or amino acids.
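The barcode-and-cycle assembly described above can be sketched as follows. This is an illustrative simplification, not the disclosed method: it assumes each call arrives as a `(peptide_barcode, cycle, amino_acid)` tuple with cycles numbered from 1, places each call at its cycle position, and writes a gap character for missing cycles. Conflicting calls at one position are simply marked as a gap here, standing in for the probabilistic error modeling the text mentions. The function name and tuple format are hypothetical.

```python
from collections import defaultdict


def assemble_by_barcode(calls, gap="-"):
    """Group amino acid calls by peptide barcode and order them by cycle.

    calls: iterable of (peptide_barcode, cycle, amino_acid) tuples,
           with cycle numbers starting at 1 (an assumed convention).
    Returns a dict mapping each barcode to a contig string in which
    missing or conflicting positions are written as `gap`.
    """
    positions = defaultdict(dict)
    for barcode, cycle, amino_acid in calls:
        seen = positions[barcode].get(cycle)
        # Keep the call if it is new or agrees with the earlier call;
        # otherwise treat the position as unresolved.
        positions[barcode][cycle] = (
            amino_acid if seen in (None, amino_acid) else gap
        )
    return {
        barcode: "".join(
            by_cycle.get(c, gap) for c in range(1, max(by_cycle) + 1)
        )
        for barcode, by_cycle in positions.items()
    }
```

For example, calls for cycles 1, 2, and 4 of one barcode yield a four-residue contig with a gap at position 3, which matches the "possible gaps for missed or unidentified rounds" described above.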
- An alternative option for de novo read reconstruction may employ end-to-end, unsupervised machine learning based reconstruction of peptide reads.
- This option may employ a machine learning algorithm, such as a deep-learning based model that takes as its input NGS sequencing reads associated with a parent protein/peptide barcode, and outputs the likely reconstruction of peptide reads (contigs). Training of the model can be conducted with protein sequencing
- Reconstruction may output reconstructed, fragmentary, peptide sequences (contigs) with a probability assigned to each amino acid as well as the assembled peptide sequence.
- a k-mer or de Bruijn approach may be used for peptide sequence reconstruction. For example, reads arising from each nucleic acid molecule may be broken down into shorter k-mer sequences. The k-mer sequences from the pool of reads may be assembled into longer contig sequences.
- a de Bruijn graph may be generated, e.g., to represent splice variants, post-translational modifications, or other proteoforms.
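A minimal sketch of the k-mer approach is shown below, assuming error-free, overlapping reads given as plain strings. It builds a de Bruijn graph whose nodes are (k-1)-mers and whose edges are k-mers, then follows the single unambiguous path from a node with no predecessor. Branch handling (for example, for splice variants or proteoforms), error tolerance, and the function names are outside what the source specifies and are illustrative only.

```python
from collections import defaultdict


def kmers(read, k):
    """Return all overlapping substrings of length k."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def de_bruijn(reads, k):
    """Map each (k-1)-mer prefix to the set of (k-1)-mer suffixes that follow it."""
    graph = defaultdict(set)
    for read in reads:
        for kmer in kmers(read, k):
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def walk_contig(graph):
    """Follow the unambiguous path from a node that has no predecessor.

    Stops at the first branch, dead end, or repeated node, so it returns
    one contig rather than resolving alternative paths.
    """
    successors = {s for nxt in graph.values() for s in nxt}
    starts = sorted(node for node in graph if node not in successors)
    if not starts:
        return ""
    node = starts[0]
    contig = node
    visited = {node}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in visited:
            break
        contig += nxt[-1]
        visited.add(nxt)
        node = nxt
    return contig
```

With three overlapping reads such as `"MKTAY"`, `"TAYIA"`, and `"YIAKQ"` and `k = 3`, the walk extends one residue per edge until it reaches a node with no outgoing edge.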
- the isoforms may be assembled, and the expression level may be determined using a Bayesian approach.
- the assembled isoforms of proteins may be subjected to evaluation and error correction, e.g., by comparison with standard proteins that are spiked in samples, and assessing for missing segments of sequences, incorrect or redundant assembly, uniform coverage, etc.
- Other clustering methods may also be used to improve sequencing accuracy. For example, for a population of sequenced molecules (e.g., modified amino acids or stacked plurality of modified amino acids), clustering may be performed such that a population of reads can be clustered together and can be used to correct for individual inaccuracies across the same or different population of reads.
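One simple way to use a cluster of reads for error correction is a per-position majority vote. The sketch below assumes the reads have already been clustered (for example, by a shared barcode) and aligned to equal length; both assumptions, and the function name, are illustrative and not taken from the source. It returns the consensus sequence together with the fraction of reads supporting each consensus residue.

```python
from collections import Counter


def consensus(reads):
    """Per-position majority vote over equal-length, pre-aligned reads.

    Returns (consensus_sequence, support), where support[i] is the
    fraction of reads agreeing with the consensus at position i.
    """
    if not reads:
        raise ValueError("at least one read is required")
    length = len(reads[0])
    if any(len(read) != length for read in reads):
        raise ValueError("reads must be aligned to equal length")
    residues = []
    support = []
    for column in zip(*reads):
        residue, count = Counter(column).most_common(1)[0]
        residues.append(residue)
        support.append(count / len(reads))
    return "".join(residues), support
```

Positions with low support can then be flagged for review or masked, and individual reads that disagree with the consensus can be corrected against it.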
- the identity of the modified amino acids may be obtained without use of a sequencing approach.
- probes may be used to couple to particular regions of a polymerizable molecule or a modified amino acid.
- the probes may comprise nucleic acid probes with probe sequences that can be used to specifically detect a type of monomer.
- the polymeric analyte comprises a peptide, and an individual amino acid (monomeric unit) may be coupled to a capture moiety and cleaved from the peptide.
- the monomer-capture moiety complex may be contacted with a binding agent (e.g., antibody, nanobody, scFv) comprising a nucleic acid barcode molecule (polymerizable molecule) that identifies the binding agent.
- the binding agent may be specific to one amino acid (e.g., of the 20 proteinogenic amino acids) and as such, the nucleic acid barcode molecule encodes for one specific amino acid.
- a nucleic acid probe having a complementary sequence to the nucleic acid barcode molecule of the binding agent may be used to identify the presence of the binding agent (e.g., via in situ hybridization).
- the probes may comprise detectable labels or moieties, e.g., a fluorophore, radioisotope, mass tag, etc.
- hybridization-based assays such as SeqFISH or Nanostring may be performed to probe or assay particular regions of a polymerizable molecule to determine its identity.
- an amplification-based approach may be used to determine the presence and identity of a polymerizable molecule.
- PCR or nested PCR approaches may be used to selectively probe for a particular sequence of a polymerizable molecule.
- the binding agent may comprise a detectable label or moiety.
- the binding agent may comprise a fluorophore, radioisotope, mass tag, chromogenic enzyme (e.g., horseradish peroxidase), etc., which may be detectable using the appropriate imaging technique.
- Different binding agents e.g., binding agents that recognize different monomers or amino acids
- fluorophore-labelled binding agents can be detected using single molecule imaging (e.g., total internal reflection, confocal, wide-field, or super resolution microscopy (e.g., PALM, STORM, STED)).
- the substrates comprising the polymerizable molecules may be provided on an array for sequencing.
- a plurality of beads comprising polymerizable molecules that encode for amino acids of a plurality of peptides may be provided on an array for sequencing.
- the plurality of beads may be directly or indirectly coupled to an additional substrate (e.g., planar substrate, such as microscope slides or multi-well plates), and sequencing may be performed using image-based sequencing approaches (e.g., using sequencing by synthesis or in situ hybridization probes and a single-molecule resolution imaging system), amplification-based sequencing, or both.
- the plurality of beads may be coupled to the additional substrate using any suitable technique such as nucleic acid attachment using the polymerizable molecules or capture molecules, magnetic attachment (using a magnetic field and magnetic beads), optoelectronics, digital microfluidics, application of an electric field, gravity settling, centrifugation, capillary force, hydrogen bonding, electrostatic interactions or other suitable approach.
- the methods and systems described herein may be particularly useful in achieving highly accurate sequencing, or identification of individual monomers (e.g., amino acids), of polymeric analytes (e.g., peptides).
- because the sequencing approaches disclosed herein provide for local tethering and cleavage of monomers and intramolecular expansion (that is, the monomers are removed and tethered locally), the local environment problem (variability of detection or readout of individual amino acids caused by neighboring amino acids) is avoided, which can thus achieve highly accurate reads of single monomers (e.g., amino acids) with single-molecule sensitivity.
- the methods outlined herein may achieve an individual read accuracy, e.g., the probability of correctly identifying a single monomer (e.g., an amino acid) of greater than 10%, greater than 20%, greater than 30%, greater than 40%, greater than 50%, greater than 60%, greater than 70%, greater than 80%, greater than 90%, or higher.
- the individual read accuracy may remain relatively constant as multiple iterations of the operations are performed. In other words, the individual read accuracy may not vary substantially depending on how many rounds or iterations of sequencing are performed.
- a high individual read accuracy may be obtained for at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 20, at least 30, at least 40, at least 50, at least 60, at least 70, at least 80, at least 90, at least 100, at least 500, at least 1000 or greater number of iterations of sequencing for identification of individual monomers (e.g., amino acids) of a given polymeric analyte.
- the methods provided herein may allow for distinction of individual amino acids from other amino acids with high specificity.
- At least one single amino acid residue may be distinguishable from the other 19 proteinogenic amino acid residues, or, in some cases, amino acids comprising post-translational modifications or unnatural amino acids.
- the methods described herein may allow for specific identification of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 or all 20 proteinogenic amino acids.
- the individual read accuracy may, in some instances, be defined by a probability function.
- the number of incorrect or failed identifications can include, for example, an incorrect amino acid identification (incorrect A), an insertion (incorrect X), or a deletion (incorrect X), or a combination thereof.
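As an illustrative sketch only (the formula is not taken from the disclosure), one simple probability-style definition treats the individual read accuracy as the fraction of identification events that are neither incorrect identifications, insertions, nor deletions. The function name and the counting convention below are assumptions made for this example.

```python
def individual_read_accuracy(correct: int, substitutions: int,
                             insertions: int, deletions: int) -> float:
    """Fraction of identification events that were correct.

    Assumed convention (hypothetical): each incorrect identification,
    insertion, and deletion counts as one failed event.
    """
    failed = substitutions + insertions + deletions
    total = correct + failed
    if total == 0:
        raise ValueError("no identification events to score")
    return correct / total


# 18 correct calls, 1 wrong residue, 0 insertions, 1 missed residue
accuracy = individual_read_accuracy(18, 1, 0, 1)
print(accuracy)  # 0.9
```

Other conventions (for example, weighting deletions differently from substitutions) would give different values; the point is only that the failure types listed above enter the denominator.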
- an individual read accuracy may be characterized by the probability of correctly identifying a class or group of amino acids.
- a class or group of amino acids may be classified, for example, based on physical, chemical, biological, physicochemical, or other properties. For example, aliphatic side chains (e.g., G, A, V, L, I), hydroxyl side chains or sulfur/selenium-containing side chains (e.g., S, C, U, T, M), aromatic side chains (e.g., F, Y, W), basic side chains (e.g., H, K, R), and acidic side chains or their amide derivatives (e.g., D, E, N, Q) may each constitute a class of amino acids.
- positively charged side chains (e.g., arginine, histidine, lysine)
- negatively charged side chains (e.g., aspartic acid, glutamic acid)
- polar uncharged side chains (e.g., serine, threonine, asparagine, glutamine)
- hydrophobic side chains (e.g., alanine, valine, isoleucine, leucine, methionine, phenylalanine, tyrosine, tryptophan)
- certain types of post-translational modifications may constitute a class of amino acids.
- a class of amino acids may comprise 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more members, and any number of classes comprising the same or different numbers of members may be used to characterize individual read accuracy.
- a first class of amino acids may comprise 2 amino acid types (e.g., leucine and isoleucine)
- a second class of amino acids may comprise 3 amino acid types (e.g., tryptophan, phenylalanine, and tyrosine).
- a relative (e.g., positional) individual read accuracy may also be ascribed to the individual monomers or to the polymeric analyte.
- the relative individual read accuracy may be the accuracy of relative positioning of each monomer in a sequence, but not the absolute position. For instance, for a peptide comprising the sequence A-G-N, if A is first correctly identified, G is not identified, and N is then correctly identified, then the relative individual read accuracy, or the correct identification of the order of a subset of amino acids (A and N) may be assigned (e.g., 100%) even though not every single amino acid is identified.
- the relative individual read accuracy relies on the correct identification of the order or relative positioning of a subset of amino acids of a peptide, but not necessarily the absolute position of each of those identified amino acids.
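The A-G-N example can be sketched as a subsequence test: the identified residues are checked for correct relative order while unidentified positions are simply skipped. The function name is hypothetical and the sketch assumes one-letter residue codes.

```python
def is_order_preserved(true_seq: str, called: list[str]) -> bool:
    """Return True if the called residues occur in true_seq in the same order.

    Positions that were not identified may be skipped; only the relative
    order of the identified subset is checked.
    """
    remaining = iter(true_seq)
    # "residue in remaining" advances the iterator past the first match,
    # so each later residue must be found after the previous one.
    return all(residue in remaining for residue in called)


print(is_order_preserved("AGN", ["A", "N"]))  # True: G skipped, order kept
print(is_order_preserved("AGN", ["N", "A"]))  # False: order reversed
```

Under this sketch, a read that identifies only A and then N from A-G-N scores as fully correct in relative terms, even though G was never identified.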
- Fingerprinting. The methods described herein may be useful in complete de novo protein or peptide sequencing (e.g., the identification of each amino acid in a peptide), or for fingerprinting a protein (e.g., identifying only a subset of amino acid types in a peptide and inferring from, or mapping the identified amino acids to, a reference database, to identify the peptide or protein). For fingerprinting, a subset of amino acids may be identified, e.g., using the approaches described herein. For example, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, or 19 different identified modified amino acids arising from a peptide may be sufficient to determine the identity of a protein or peptide.
- reference-based reconstruction may be performed by simulating NGS reads that would be generated from the set of possible peptides in the workflow. For each possible peptide, a simulation can produce NGS reads mimicking the output of this protein sequencing system. Next, the real (experimental) NGS reads from a run can be matched to simulated reads from candidate peptides from a database based on likelihood. This results in reconstructed, fragmentary peptide sequences (contigs) with a probability assigned to each assembled peptide sequence.
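A minimal sketch of likelihood-based matching follows. It does not simulate NGS reads; instead it swaps in a much simpler per-residue error model (each call is correct with a fixed probability, and errors are spread evenly over the other 19 residues). The database contents, the function names, the accuracy value, and the uniform prior are all assumptions made for illustration.

```python
import math


def log_likelihood(observed: str, candidate: str, accuracy: float = 0.8) -> float:
    """Log-probability of the observed calls given a candidate sequence."""
    if not 0.0 < accuracy < 1.0:
        raise ValueError("accuracy must be strictly between 0 and 1")
    if len(observed) != len(candidate):
        return float("-inf")  # simplification: no insertions or deletions
    miss = (1.0 - accuracy) / 19.0  # error mass spread over the other residues
    return sum(math.log(accuracy if o == c else miss)
               for o, c in zip(observed, candidate))


def rank_candidates(observed: str, database: dict[str, str],
                    accuracy: float = 0.8) -> list[tuple[str, float]]:
    """Rank candidates by normalized likelihood (uniform prior assumed)."""
    scores = {name: log_likelihood(observed, seq, accuracy)
              for name, seq in database.items()}
    finite = {n: s for n, s in scores.items() if s != float("-inf")}
    if not finite:
        return []
    top = max(finite.values())  # log-sum-exp for numerical stability
    norm = top + math.log(sum(math.exp(s - top) for s in finite.values()))
    ranked = [(n, math.exp(s - norm)) for n, s in finite.items()]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Hypothetical three-entry reference database
database = {"pepA": "AGNLK", "pepB": "AGQLK", "pepC": "MKT"}
ranking = rank_candidates("AGNIK", database)
print(ranking[0][0])  # pepA: one mismatch versus two for pepB
```

The returned values play the role of the probability assigned to each candidate; a fuller treatment would replace the toy error model with reads simulated from the actual workflow.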
- mapping accuracy or the accuracy of correctly identifying a protein (e.g., using a protein database) may be, for example, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, or greater, by correctly identifying only a subset of amino acids. For instance, only 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, or 19 amino acids may need to be identified in order to identify the protein. In some instances, the individual read accuracy for single amino acids need not be highly accurate in order to identify a protein with high accuracy.
- the individual read accuracy of a set of individual amino acids may be on average around 50%, around 60%, around 70%, around 80%, around 90%, etc. in order to yield a correct identification of the protein or peptide with at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, or greater accuracy or confidence.
- the individual read accuracy for a single amino acid need not be above a threshold percentage in order to accurately identify a protein.
- a binding agent may have off-target binding to chemically similar residues (e.g., a binding agent against leucine may have off-target binding to isoleucine), which would result in poor individual read accuracy but still allow for correct identification of a peptide.
- in such cases, the exact amino acid (e.g., leucine) may not be determined; instead, the position (e.g., an "X" position) or relative position can be denoted as one of a number of amino acids (e.g., either leucine or isoleucine).
- Contextual clues and homology may then be used against a reference protein database to identify the protein, knowing that the X-position amino acid is one of a number of amino acids (e.g., either leucine or isoleucine). For example, knowing the exact NTAA, the exact n-1 NTAA, and the X position amino acid (as one of a number of amino acids) may be sufficient to correctly identify the protein. Accordingly, the accuracy of correctly identifying a peptide may be high, even if the individual read accuracy is not.
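The idea of combining exact calls with an ambiguous X position can be sketched as a per-position set match against a reference database. The database entries and function names below are hypothetical, and the sketch assumes the observation covers the first residues from the N-terminus.

```python
def matches(observation: list[set[str]], sequence: str) -> bool:
    """True if each observed position allows the residue found in sequence.

    observation[i] is the set of residues compatible with position i,
    e.g. {"L", "I"} for an X position that is leucine or isoleucine.
    """
    if len(sequence) < len(observation):
        return False
    return all(sequence[i] in allowed for i, allowed in enumerate(observation))


def identify(observation: list[set[str]], database: dict[str, str]) -> list[str]:
    """Names of all database entries compatible with the observation."""
    return [name for name, seq in database.items() if matches(observation, seq)]


# Exact NTAA (M), exact next residue (K), then an X position that is L or I
observation = [{"M"}, {"K"}, {"L", "I"}]
database = {"protein_1": "MKLVT", "protein_2": "MRLVT", "protein_3": "MKAVT"}
print(identify(observation, database))  # ['protein_1']
```

Here a single candidate survives even though the third residue was never resolved beyond leucine or isoleucine, which is the sense in which peptide identification can be accurate when the individual read accuracy is not.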
- a collection of reads may be used to identify a protein or information about the protein. For instance, a plurality of reads originating from a sample or across samples can be pooled to determine a consensus sequence or a probable sequence of the protein. Alternatively, or in addition, sequences may be concatenated to sequence a protein, e.g., via mapping of overlapping sequences between peptides.
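Pooling reads into a consensus can be sketched as a per-position majority vote. This assumes the reads are already aligned and of equal length, with a placeholder symbol marking unidentified positions; both the placeholder and the function name are choices made for this example.

```python
from collections import Counter


def consensus(reads: list[str], unknown: str = "X") -> str:
    """Per-position majority vote over pre-aligned, equal-length reads."""
    if not reads:
        raise ValueError("at least one read is required")
    length = len(reads[0])
    if any(len(read) != length for read in reads):
        raise ValueError("reads must be aligned to the same length")
    called = []
    for column in zip(*reads):
        # Unidentified positions do not vote.
        votes = Counter(residue for residue in column if residue != unknown)
        called.append(votes.most_common(1)[0][0] if votes else unknown)
    return "".join(called)


print(consensus(["AGNK", "AXNK", "AGQK"]))  # AGNK
```

A weighted vote using per-call confidence would be a natural refinement, but it requires quality values that this sketch does not assume.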
- Arrays. The methods and systems provided herein may include the use of arrays, e.g., for massively parallel sample processing or sequencing.
- a substrate may comprise an array of individually addressable units comprising a plurality of peptide analytes.
- a subset of the individually addressable units comprises a single peptide analyte.
- the individually addressable units may additionally comprise a single or plurality of capture moieties.
- the subset of the individually addressable units comprising a single peptide each comprise at least one capture moiety.
- the array may be patterned (e.g., individually addressable units are arranged in a pattern), or it may be random (e.g., individually addressable units are randomly distributed across the substrate).
- An array may comprise any useful number of individually addressable units.
- an array may have at least 5, at least 10, at least 100, at least 1,000, at least 10,000, at least 100,000, at least 1,000,000, at least 10,000,000, at least 100,000,000, at least 1,000,000,000 or greater number of individually addressable units.
- an array may have at most about 1,000,000,000, at most about 100,000,000, at most about 10,000,000, at most about 1,000,000, at most about 100,000, at most about 10,000, at most about 1,000, at most about 100, at most about 10, or at most about 5 individually addressable units.
- the array may have a range of number of individually addressable units, e.g., from about 1,000 to about 100,000, or from about 50,000 to about 1,000,000 individually addressable units.
- High Throughput Sequencing/Parallelization. The methods described herein may be conducted in a parallelized, high-throughput format. Such parallelization may be achieved by having substrates comprising multiple polymeric analytes coupled thereto and performing the operations (e.g., intramolecular expansion) iteratively across the substrate. The methods described herein may allow for parallel analysis of 1, 10, 100, 1,000, 10,000, 100,000, 1,000,000, 10,000,000, 100,000,000, 1,000,000,000, or a greater number of polymeric analytes.
- a library of binding agents may be used to recognize different monomer types (e.g., different amino acids of a peptide analyte or derivatized amino acids), such that different polymeric analytes (e.g., different peptides) may be processed on a single substrate, thereby increasing multiplexing or detection of multiple different analytes.
- the methods provided herein may include the use of a binding agent.
- a binding agent may be or comprise a protein or peptide (e.g., an antibody, antibody fragment, nanobody), a nucleic acid molecule (e.g., an aptamer), a polymer, an inorganic compound, a small molecule, or derivatives (e.g., engineered variants) or combinations thereof.
- the binding agent can bind to a modified amino acid or portion thereof.
- the use of binding agents may be useful in the identification of individual amino acids (or modified amino acids) of a peptide or to modulate the translocation of a modified amino acid through a nanopore or nanogap.
- the binding agent may comprise a recognition site that specifically recognizes an amino acid, modified amino acid, or a derivatized (and optionally modified) amino acid.
- the binding agent may be configured to recognize a moiety of a modified amino acid, such as a specific amino acid residue, the residue-linker complex, or derivatized amino acid (e.g., a thiocarbamyl-derivatized residue, a thiazolone-derivatized residue, a thiohydantoin-derivatized residue, etc.), or a portion of a modified amino acid.
- the binding agent may be derived or engineered from a naturally-occurring enzyme, e.g., an aminopeptidase or tRNA synthetase.
- a binding agent may be provided as part of the intramolecular expansion process; for example, the binding agent may specifically recognize and bind to a modified amino acid (e.g., an amino acid coupled to a polymerizable molecule) or a portion thereof. In some instances, the binding agent may recognize and bind to a moiety of the modified amino acid, e.g., the amino acid-linker conjugate or derivative thereof (e.g., an isothiocyanate-amino acid or a thiocarbamyl, thiazolone, or thiohydantoin derivative thereof).
- the binding agent may comprise a detectable moiety (e.g., fluorophore, barcode molecule, radioisotope, or other tag), which may facilitate detection and identification of individual amino acids.
- the binding agent may be recognized by an additional binding agent (e.g., a secondary antibody) comprising a detectable moiety; binding of the additional binding agent to the binding agent may be representative of the presence of the modified amino acid.
- the polymerizable molecule may comprise or be provided pre-coupled to the linker (e.g., as shown in process 106 of FIGs. 1A-1D), or the polymerizable molecule may be provided prior to, during, or subsequent to provision of the linker.
- the coupling of the polymerizable molecule to the capture moiety may occur prior to, during, or subsequently to the coupling of the linker to the amino acid.
- a substrate may be provided comprising a plurality of capture moieties or anchor molecules that can couple to a capture moiety.
- a polymerizable molecule comprising or coupled to a linker may be provided and may couple to one of the plurality of capture moieties or anchor molecules of the substrate.
- the linker may then couple to the monomer (e.g., terminal amino acid) of the polymeric analyte (e.g., peptide).
- an additional polymerizable molecule (e.g., an oligonucleotide) may be contacted and used to couple the cleaved molecule with the capture moiety.
- the removal (e.g., cleavage) of the amino acid from the peptide may occur prior to, during, or subsequent to the coupling of the polymerizable molecule to the capture moiety.
- the binding agent may be introduced prior to, during, or subsequent to the intramolecular process.
- the operations of workflows 100a and 100b may be performed and the stacked plurality of modified amino acids 123 may be contacted with a plurality of binding agents that have specificity to individual modified amino acids or to oligomers (e.g., dipeptides or tripeptides) of different modified amino acids (e.g., 10, 20, or more different binding agents, each with specificity to a different amino acid or modified amino acid).
- the plurality of binding agents may comprise a detectable label which may be used for amino acid identification (e.g., using single-molecule imaging); alternatively, or in addition, the stacked plurality of modified amino acids and binding agents may be introduced into a nanopore or nanogap for sequencing.
- the amino acid-linker complex or amino acid-capture moiety complex may, in some instances, be cleaved or removed from the substrate or peptide (e.g., via cleavage, enzymatic digestion, chemical dissociation) prior to, during, or subsequent to iteration of the workflow 100a or 100b.
- the cleaved molecule (e.g., comprising the modified amino acid and the polymerizable molecule) may be translocated through a nanopore for protein sequencing.
- a stacked plurality of modified amino acids may be cleaved from the capture moiety prior to translocation through the nanopore.
- a plurality of stacked pluralities of modified amino acids may be coupled to one another, e.g., via ligation, extension, hybridization, or chemical coupling methods.
- the stacked pluralities of modified amino acids may arise from a common origin, e.g., the same peptide, the same cell, the same biological sample, etc., or they may arise from separate or different origins.
- the coupling may occur in an ordered fashion or in a stochastic fashion.
- the length or size of the modified amino acid or stacked plurality of modified amino acids may be increased, or the sequence may be otherwise modified, e.g., using Gibson assembly, HIV integrase, a Cas protein (e.g., Cas9, Cas12), homologous recombination, non-homologous end joining, or other recombination, expansion, or polymerization approach, which may be beneficial in improving the sequencing readout, throughput, or efficiency.
- a divider molecule or a fiducial marker that can be identified may be placed between the individual stacked plurality of modified amino acids to indicate a distinction of the individual stacked plurality of modified amino acids or otherwise discretize them.
- a divider molecule may comprise a nucleic acid molecule that comprises a detectable moiety (e.g., an adduct, a hairpin, a structural or chemical element).
- the divider molecule may be coupled to the modified amino acids or individual stacked plurality of modified amino acids via ligation, a nucleic acid extension reaction, hybridization, or a combination thereof.
- additional cleaving or dissociation reactions may be performed to separate the amino acid from the polymerizable molecule.
- the separated polymerizable molecule or the amino acid may be translocated through the nanopore.
- one or more additional processing operations may be performed in the methods provided herein.
- processing operations include wash steps, library clean-up, end repair, A-tailing, amplification, purification or extraction (e.g., using gel extraction, SPRI, chromatography (e.g., using columns), HPLC), enzyme treatment (e.g., contacting with a restriction enzyme, treatment with uracil DNA glycosylase or USER, proteinase K treatment, transposase, exonuclease, or other nuclease, ligase, phosphatase, kinase such as T4 polynucleotide kinase, deadenylase, polymerase, helicase, topoisomerase, methyltransferase, glycosylase, primase, telomerase, repair enzymes such as photolyases, AP endonucleases, reverse
- the additional processing operations may include treatment with a denaturing agent, e.g., sodium hydroxide, sodium dodecyl sulfate, or with formamide.
- the products of intramolecular expansion may be further expanded, e.g., via Gibson assembly.
- the polymerizable molecules may be amplified (e.g., using nucleic acid amplification approaches such as polymerase chain reaction (PCR), isothermal amplification, ligation-mediated amplification, transcription-based amplification, etc.) to generate amplicons for sequencing.
- Amplification may be performed, for example, using the capture moieties or polymerizable molecules as primer binding sites.
- an adapter sequence comprising a primer binding site may be added to the polymerizable molecules.
- additional processing may include nucleic acid reactions (e.g., ligation, extension, amplification, tagmentation, restriction enzyme cleavage, enzymatic treatment (e.g., using exonucleases, RNase, CRISPR, Argonaute, terminal transferase)), barcoding, addition of adapters, enzymatic treatment, etc.
- the polymerizable molecules, or the substrates comprising the polymerizable molecules may be filtered based on any useful characteristic or properties. Filtering based on a characteristic or property may achieve higher accuracy or less noise by removing poor quality molecules or enriching for higher quality polymerizable molecules prior to sequencing.
- polymerizable molecules or substrates (e.g., beads or particles) may be filtered by size or length, quantity, presence of particular sequences (e.g., primer sequences, sequences of interest), GC content, polarity, polarization, birefringence, fluorescence (or other optical property), anisotropy, charge, secondary structure (e.g., hairpins), or other useful metric, characteristic, or property, or combinations thereof.
- Such filtration or enrichment may be performed using any suitable approach, e.g., affinity or hybridization approaches (e.g., bead-based affinity sequences or hybridization assays, which can enrich particular sequences), chromatography, size-based filtration, electrophoresis, electrofocusing, optoelectronics, dielectrophoresis, digital fluidics, magnetic activated sorting, fluorescence activated sorting, flow cytometry, or other suitable technique.
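The physical enrichment steps above have a computational counterpart once sequences are available. The following sketch filters nucleic acid polymerizable molecules by length, GC content, and presence of a required sequence; all thresholds, defaults, and function names are placeholders chosen for illustration, not values from the disclosure.

```python
from typing import Optional


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a nucleic acid sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def passes_filter(seq: str, min_len: int = 20, max_len: int = 200,
                  gc_range: tuple[float, float] = (0.3, 0.7),
                  required: Optional[str] = None) -> bool:
    """Keep a molecule only if it meets every (hypothetical) criterion."""
    if not min_len <= len(seq) <= max_len:
        return False
    low, high = gc_range
    if not low <= gc_content(seq) <= high:
        return False
    if required is not None and required.upper() not in seq.upper():
        return False
    return True


molecules = ["ACGT" * 10, "AT" * 20, "ACGT"]
kept = [m for m in molecules if passes_filter(m)]
print(len(kept))  # 1: only the 40-mer with 50% GC passes
```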
- FIG. 1G shows an example workflow for generating a stacked plurality of modified monomers (e.g., a stacked plurality of modified amino acids) using fewer operations.
- a polymeric analyte 103 (e.g., a peptide)
- the capture moiety 105 may comprise a first nucleic acid molecule (e.g., DNA).
- a polymerizable molecule (e.g., a linking nucleic acid molecule 111) is provided.
- the polymerizable molecule comprises a plurality of linkers 109.
- the linkers 109 are pre-tethered to the polymerizable molecule (depicted as a linking nucleic acid molecule 111); alternatively, the linkers 109 and the polymerizable molecule may be provided separately.
- a first linker of the plurality of linkers 109 may couple to the terminal monomer (e.g., terminal amino acid) of the polymeric analyte 103 and subsequent cleavage may be performed.
- a second linker of the plurality of linkers 109 may couple to the next terminal monomer of the polymeric analyte 103 and subsequent cleavage may be performed.
- the linking nucleic acid molecule 111 may comprise, for each linker 109, an annealed nucleic acid primer which can sterically inhibit the reaction of the linker until a complementary oligonucleotide (e.g., a cycle barcode) is added.
- the complementary oligonucleotide may preferentially bind to the nucleic acid primer and thereby dehybridize the nucleic acid primer from the linking nucleic acid molecule 111, thus revealing the linker 109 and/or increasing the reactivity of the linker 109 to the terminal monomer of the polymeric analyte 103.
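The gating logic can be sketched at the sequence level: a cycle barcode reveals only the linker whose blocking primer is its reverse complement. This is a toy model that ignores hybridization thermodynamics and partial matches; the sequences, linker identifiers, and function names are invented for illustration.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence given in A/C/G/T."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def reveal_linkers(blocking_primers: dict[str, str], cycle_barcode: str) -> list[str]:
    """IDs of linkers whose blocking primer is fully complementary to the barcode.

    Assumption: only a perfect full-length complement displaces the primer.
    """
    target = reverse_complement(cycle_barcode)
    return [linker_id for linker_id, primer in blocking_primers.items()
            if primer.upper() == target]


# Hypothetical blocking primers annealed next to two linkers
primers = {"linker_1": "ACGTTGCA", "linker_2": "GGATCCAA"}
barcode_cycle_2 = reverse_complement(primers["linker_2"])
print(reveal_linkers(primers, barcode_cycle_2))  # ['linker_2']
```

Adding one barcode per cycle would then expose the linkers one at a time, which matches the intent of keeping each linker sterically inhibited until its cycle.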
- any of the operations described herein may occur under any useful physical conditions. For instance, one or more operations described herein may occur at an elevated or lowered temperature, either of which may aid in reaction speeds, reaction completion, translocation speed or dwell time of a molecule through the nanopore or nanogap, etc.
- nanopore or nanogap sequencing may be performed on ice or at a lowered temperature, which can beneficially increase the measured signal-to-noise ratio of the current signal, modulate the translocation speed of the molecules through the nanopore or nanogap, or otherwise improve readout or detection.
- the operations, reactions, and compositions may be performed at any useful temperature.
- the temperature may be above or below ambient temperature.
- the temperature may be at or below freezing.
- the operations, reactions, and compositions may be performed at a temperature of about -273 degrees Celsius, of about -200 degrees Celsius, of about -150 degrees Celsius, of about -100 degrees Celsius, of about -50 degrees Celsius, of about -40 degrees Celsius, of about -30 degrees Celsius, of about -20 degrees Celsius, of about -10 degrees Celsius, of about 0 degrees Celsius, of about 10 degrees Celsius, of about 20 degrees Celsius, of about 30 degrees Celsius, of about 40 degrees Celsius, of about 50 degrees Celsius, of about 100 degrees Celsius, of about 150 degrees Celsius, of about 200 degrees Celsius, of about 300 degrees Celsius, of about 400 degrees Celsius, or higher.
- a range of temperatures may be used in the operations, reactions, and compositions provided herein, e.g., between about 20 and 40 degrees Celsius, between about -5 and 0 degrees Celsius, etc.
- Additional methods and devices for sequencing peptides and/or polymerizable molecules may be used alternatively or in addition to nanopore or nanogap sequencing.
- the methods provided herein may utilize one or more imaging systems for identifying a molecule (e.g., a modified amino acid or a polymerizable molecule).
- optical imaging systems and methods include but are not limited to: epifluorescence microscopy, confocal microscopy, total internal reflection fluorescence (TIRF) microscopy, expansion microscopy, two-photon microscopy, integrated correlative microscopy, stimulated emission depletion (STED) microscopy, stochastic optical reconstruction microscopy (STORM), reversible saturable optically linear fluorescence transitions (RESOLFT), spatially modulated illumination, spectral precision distance microscopy (SPDM), photoactivated localization microscopy (PALM), fluorescence PALM (FPALM), structured illumination microscopy (SIM), saturated SIM (SSIM), and fluorescence lifetime imaging microscopy (FLIM), among others.
- imaging may be used to identify a modified amino acid or polymerizable molecule.
- a binding agent (e.g., an antibody) may carry a fluorophore or other optically detectable tag, and the binding agent may selectively bind to a particular residue of a modified amino acid.
- the fluorophore may identify the binding agent, such that presence of the fluorophore indicates the presence of the binding agent (and thus the amino acid residue to which the binding agent binds).
- a secondary binding agent (e.g., a secondary antibody) comprising a fluorophore or other detectable agent may bind to the binding agent (e.g., a primary antibody) to indicate the presence of the binding agent (and thus the amino acid residue to which the primary antibody binds).
- the sequence of the nucleic acid molecule may be determined using any suitable imaging technique, e.g., fluorescence in situ hybridization, or sequencing by synthesis.
- other proteomic tools may be used to identify the modified amino acids and polymerizable molecules.
- a modified amino acid (e.g., an amino acid that has been contacted with a PITC linker coupled to a DNA molecule) may be analyzed using mass spectrometry, immunochemistry (e.g., via ELISA, immunoblotting, fluorescence imaging), or other analytical technique (e.g., traditional Edman degradation, Raman spectroscopy, atomic force microscopy, FTIR, light scattering, quartz-crystal microbalance, circular dichroism spectrometry, calorimetry such as isothermal titration calorimetry, protein crystallization).
- further processing of the modified amino acid may be performed prior to analysis.
- such further processing may include detachment of the modified amino acid from a substrate, enrichment or purification, or addition of tags, adapters, or other moieties.
- the modified amino acid may be subjected to a separation technique prior to analysis, e.g., chromatography (e.g., liquid chromatography, HPLC, nano-LC), electrophoretic separation (e.g., electrophoresis, isoelectric focusing, dielectrophoresis, 2-D electrophoresis, electroosmotic flow), filtration, evaporation, distillation, or other separation technique.
- the analysis method may comprise the use of nanopores or nanogaps for sequencing or identification.
- the nanopore or nanogap sequencing approach can be useful in generation of multiplexed data, such as (i) sequencing of a polymerizable molecule (e.g., sequencing a nucleic acid molecule to generate sequencing reads) and (ii) identification of the analyte (e.g., identification of an amino acid residue).
- Such generation of multiplexed data may occur substantially simultaneously, e.g., within about 5 minutes, about 1 minute, about 1 second, about 1 millisecond, about 1 microsecond, about 1 picosecond, or less.
- the sequencing and identification may not occur substantially simultaneously, but rather signal generation from the polymerizable molecule and the amino acid may occur substantially simultaneously.
- the modified amino acids or derivatives thereof (e.g., derivatized modified amino acids, stacked plurality of modified amino acids, etc.) may be translocated through a nanopore.
- one or more signals (e.g., current signal, current blockade, impedance) or a parameter of a signal (e.g., amplitude, frequency, time-frequency distribution) may be measured.
- Such one or more signals may then be deconvolved using computational approaches to determine the identity of the polymerizable molecules (e.g., nucleic acid sequences) and the amino acid types of the modified amino acids or derivatives thereof.
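One very simple computational approach, shown here only as a sketch, is to assign each pre-segmented translocation event to the reference level closest to its mean blockade. This is a nearest-reference classifier swapped in for illustration, not the deconvolution method of the disclosure; the reference values are made-up placeholders, not measured data.

```python
def classify_events(event_means: list[float],
                    reference_levels: dict[str, float]) -> list[str]:
    """Label each event with the reference whose level is closest to its mean."""
    if not reference_levels:
        raise ValueError("at least one reference level is required")
    labels = []
    for mean in event_means:
        nearest = min(reference_levels,
                      key=lambda name: abs(reference_levels[name] - mean))
        labels.append(nearest)
    return labels


# Placeholder fractional blockade levels for three residue types (hypothetical)
reference = {"A": 0.30, "G": 0.45, "N": 0.60}
print(classify_events([0.31, 0.58, 0.44], reference))  # ['A', 'N', 'G']
```

In practice the same trace would also carry the nucleic acid signal, so a real pipeline would need event segmentation and a model that separates the two contributions before a step like this one.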
- the use of nanopore or nanogap sequencing for generation of multiplexed data may obviate the need for multiple analysis techniques or instruments.
- the signals obtained from the nanopore may be used to directly determine the protein identity or proteoform type without identification of the individual constituent amino acids.
- Also provided herein are systems, kits, and compositions for characterizing analytes (e.g., peptides).
- the systems, kits, and compositions provided herein may be useful in implementing any of the described methods or may be provided in complement to the described methods.
- kits of the present disclosure may comprise a substrate, a capture moiety, a linker, a polymerizable molecule, a peptide-conjugation reagent, an enzyme (e.g., a polymerizing or ligating enzyme, a cleaving enzyme, a restriction enzyme, a nicking enzyme, an exonuclease, a repair enzyme such as a uracil DNA glycosylase), a detection or labeling agent, or any combination thereof.
- the kits may comprise buffers, reagents, binding agents, catalysts, or other chemicals or biological molecules (e.g., enzymes) necessary for conducting a chemical or enzymatic reaction.
- kits of the present disclosure may further comprise instructions for using the components of the kit or for implementing any of the methods and processes described herein.
- the kit may comprise instructions for conjugating a peptide onto a substrate using a peptide-conjugation reaction.
- the kit may comprise instructions for conjugating a capture moiety to a substrate, or alternatively, the substrate may comprise the capture moieties coupled thereto.
- the kit may comprise instructions for performing intramolecular expansion, as described herein, e.g., to generate a modified amino acid comprising a polymerizable molecule (e.g., DNA) coupled thereto.
- Also provided herein are compositions that may be used to characterize an analyte (e.g., a peptide).
- a composition may comprise a linker covalently attached to a polymerizable molecule (e.g., a nucleic acid molecule), which may, for example, be useful in intramolecular expansion of a peptide for peptide sequencing.
- a composition may comprise (A) a linker comprising (i) a first moiety that can couple to an amino acid (e.g., a CTAA or NTAA of a peptide), (ii) a second moiety that can couple to a nucleic acid molecule (e.g., DNA), and optionally, (iii) a releasable or cleavable moiety, which may be the same or different moiety as (ii), and also optionally, (iv) a spacer moiety, and (B) a nucleic acid molecule.
- the composition may comprise a linker that is covalently coupled to a nucleic acid molecule; such a linker may comprise a moiety that can couple to and optionally cleave an amino acid (e.g., a CTAA or NTAA of a peptide), and the covalently coupled nucleic acid molecule may be configured to tether to another nucleic acid molecule (e.g., a capture moiety), which may, in some instances, be provided attached to a substrate.
- a system of the present disclosure may comprise a sequencing instrument that is configured to receive a modified amino acid or derivative thereof, as described herein, and to provide sequencing reads of the modified amino acid or derivative thereof.
- a system of the present disclosure may be configured to process, prepare, or sequence a modified amino acid.
- the system may be configured to provide the peptide, the capture moiety, and a linker comprising a polymerizable molecule; couple the linker to an amino acid of the peptide; cleave the amino acid from the peptide; and optionally, iterate one or more operations or processes.
- systems of the present disclosure may comprise any useful apparatuses or tools, including but not limited to mixers, liquid handlers, vortexes, centrifuges, heating or cooling elements, mechanical stages, microfluidic chambers or devices and fluidic controls.
- the system e.g., a sequencing instrument
- the system is configured to directly couple to a nanopore or nanochannel sequencer.
- a system of the present disclosure may comprise a nanopore or nanochannel sequencing system that is configured to sequence or analyze a modified amino acid.
- the modified amino acid may comprise or be coupled to a polymerizable molecule, as is described elsewhere herein.
- the system may be configured to output multiplexed data regarding the modified amino acid (e.g., which of the 20 proteinogenic amino acid residues a modified amino acid is or is derived from), and the polymerizable molecule (e.g., a nucleic acid sequence).
- the modified amino acid e.g., which of the 20 proteinogenic amino acid residues a modified amino acid is or is derived from
- the polymerizable molecule e.g., a nucleic acid sequence
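- As an illustration of what multiplexed output of this kind could look like downstream, the following is a minimal sketch that pairs each amino acid call with the nucleic acid sequence read from its polymerizable molecule and groups the calls by that sequence. The `Read` record, its field names, and `group_by_barcode` are hypothetical names introduced here; the disclosure does not specify a data format.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Read:
    """One sequencing read (hypothetical record layout, not from the disclosure)."""
    amino_acid: str  # one-letter code called for the modified amino acid
    barcode: str     # nucleic acid sequence read from the polymerizable molecule


def group_by_barcode(reads):
    """Collect amino acid calls that share the same nucleic acid sequence."""
    groups = defaultdict(list)
    for read in reads:
        groups[read.barcode].append(read.amino_acid)
    return dict(groups)


# Toy reads; the sequences and calls are made up for illustration.
reads = [Read("K", "ACGT"), Read("G", "TTAG"), Read("A", "ACGT")]
grouped = group_by_barcode(reads)
```

- Grouping by the nucleic acid sequence is only one possible way to use the two data channels together; how the sequence relates to peptide identity or residue position depends on the embodiment.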
- a system of the present disclosure may comprise a substrate, a capture moiety, a linker, a polymerizable molecule, a peptide-conjugation reagent, an enzyme (e.g., a polymerizing or ligating enzyme, a cleaving enzyme, a restriction enzyme, a nicking enzyme, an exonuclease, a repair enzyme such as a uracil DNA glycosylase), a detection or labeling agent, buffers, reagents, binding agents, catalysts, or other chemicals or biological molecules (e.g., enzymes) necessary for conducting a chemical or enzymatic reaction, or any combination thereof.
- the system may further comprise one or more detection (e.g., imaging or mass spectrometry) systems, separation systems (e.g., HPLC), or other analytical instruments.
- a composition of the present disclosure may comprise any useful items or reagents for processing or sequencing a peptide or for generating the modified amino acids or derivatives thereof.
- a composition may comprise a capture moiety configured to couple to a polymerizable molecule.
- the composition may comprise a linker (e.g., to couple to an amino acid), which linker may comprise a polymerizable molecule, a cleaving agent, a ligating agent (e.g., ligase), a capture moiety, reagents for conducting a reaction, or a combination thereof.
- a composition comprises a linker comprising (e.g., covalently attached thereto) a nucleic acid barcode molecule, wherein the linker comprises an amino acid-reactive group.
- the nucleic acid barcode molecule may comprise any useful barcode sequences, e.g., spatial, temporal, peptide-identifying, partition-identifying, sample-identifying sequences, a UMI, or other functional sequence.
- Kits of the present disclosure may comprise any useful reagents for processing, analyzing, or sequencing a peptide.
- a kit may comprise a reagent for providing the peptide and a capture moiety, a reagent for coupling the peptide and capture moiety to a substrate, a reagent for cleaving an amino acid, a reagent for coupling the peptide to a linker comprising an amino acid reactive group and either an additional reactive group or a polymerizable molecule, a reagent for coupling the capture moiety to the polymerizable molecule, a reagent for providing a binding agent, a cleaving reagent, a stabilizing reagent, a wash buffer, or a combination thereof.
- the kit may further comprise reagents for removing or decoupling an amino acid from the peptide, or for removing or decoupling an amino acid-linker complex or a modified amino acid from a substrate or from the capture moiety.
- the kit may further comprise reagents for removing or decoupling an amino acid-linker complex from a substrate, or the kit may comprise reagents for conducting a nucleic acid extension reaction.
- the kit may include any relevant reagents, e.g., buffers, detergents, chelating agents, cofactors, enzymes, ribozymes or DNAzymes, acids, bases, salts, metal ions, primers, nucleic acid molecules, nucleotides, proteins, polynucleotides, binding agents (e.g., antibodies, aptamers, nanobodies, antibody fragments), lipids, carbohydrates, ribozymes, riboswitches, probes, fluorophores, oxidizing agents, reducing agents, nuclease or protease inhibitors, dyes, organic molecules, inorganic molecules, emulsifiers, surfactants, stabilizers, polymers, water, small molecules, therapeutics, radioactive materials, preservatives, or other useful reagents.
- the kits of the present disclosure may also provide instructions for the use of the contents of the kit.
- the present disclosure provides methods for coupling molecules (e.g., polymeric analytes, e.g., biomolecules such as nucleic acid molecules, peptides, lipids, carbohydrates, etc.) to other molecules or to a substrate.
- molecules e.g., polymeric analytes, e.g., biomolecules such as nucleic acid molecules, peptides, lipids, carbohydrates, etc.
- the other molecules or substrate may be functionalized to allow for covalent or noncovalent coupling of the molecules thereto.
- the substrate or other molecules may comprise any useful functional moiety, e.g., a reactive moiety, that can couple or conjugate to a molecule or another reactive moiety.
- a reactive moiety may comprise a click chemistry moiety, such as an azide, alkyne, nitrone, alkene (e.g., a strained alkene), tetrazine, methyltetrazine, triazole, tetrazole, phosphite, phosphine, etc.
- a click chemistry moiety such as an azide, alkyne, nitrone, alkene (e.g., a strained alkene), tetrazine, methyltetrazine, triazole, tetrazole, phosphite, phosphine, etc.
- a click chemistry moiety may be reactive in copper-catalyzed Huisgen cycloaddition or the 1,3-dipolar cycloaddition between an azide and a terminal alkyne, a Diels-Alder reaction (e.g., a cycloaddition between a diene and a dienophile), or a nucleophilic substitution reaction in which one of the reactive species is an epoxy or aziridine.
- a molecule that is to be coupled to a substrate may comprise a complementary click chemistry moiety to that of the substrate; for example, the substrate may comprise an alkyne moiety and the molecule to be coupled may comprise an azide moiety, which can react with the alkyne moiety of the substrate to generate a covalent linkage.
- the substrate may comprise dibenzocyclooctyne (DBCO) moieties to which azide-comprising molecules (e.g., azide-DNA, azide-polymers, azide-peptides) can react and conjugate.
- DBCO dibenzocyclooctyne
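- The notion of complementary click chemistry moieties can be sketched as a simple lookup of reactive pairs. The azide/alkyne and azide/DBCO pairs come from the text above; pairing tetrazine with a strained alkene is an added assumption based on the well-known inverse electron-demand Diels-Alder reaction, and the names `COMPLEMENTARY_PAIRS` and `can_couple` are hypothetical.

```python
# Unordered reactive pairs. frozenset makes (a, b) and (b, a) equivalent.
COMPLEMENTARY_PAIRS = {
    frozenset({"azide", "alkyne"}),               # from the text
    frozenset({"azide", "DBCO"}),                 # from the text (strained alkyne)
    frozenset({"tetrazine", "strained alkene"}),  # assumption: Diels-Alder type pair
}


def can_couple(substrate_moiety, molecule_moiety):
    """Return True if the two moieties form a listed complementary pair."""
    return frozenset({substrate_moiety, molecule_moiety}) in COMPLEMENTARY_PAIRS


dbco_bead_binds_azide_dna = can_couple("DBCO", "azide")
azide_binds_azide = can_couple("azide", "azide")
```

- The table is deliberately incomplete; it is meant only to show that coupling requires a matched pair, not to enumerate every chemistry contemplated herein.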
- the reactive moiety may comprise a photoreactive moiety that may be activated when exposed to a photostimulus (e.g., light such as UV or visible light).
- photostimulus e.g., light such as UV or visible light.
- photoreactive moieties include aryl (phenyl) azides (e.g., phenyl azide, ortho-hydroxyphenyl azide, meta-hydroxyphenyl azide, tetrafluorophenyl azide, ortho-nitrophenyl azide, meta-nitrophenyl azide), diazirines, azido-methyl-coumarins, benzophenones, anthraquinones, diazo compounds, psoralen, 3-cyanovinylcarbazole phosphoramidite (CNVK), and analogs or derivatives thereof.
- the reactive moiety may comprise a carboxyl-reactive crosslinker group, such as diazo compounds (e.g., diazomethane and diazoacetyl), carbonyldiimidazole, or carbodiimides (e.g., 1-ethyl-3-(3-dimethylaminopropyl)carbodiimide hydrochloride (EDC), dicyclohexylcarbodiimide (DCC)), or an amine-reactive group (e.g., N-hydroxysuccinimide (NHS), Sulfo-NHS, or NHS-esters).
- the reactive group may comprise a crosslinking agent, which may comprise an NHS group, an EDC group, a maleimide (e.g., a Michael acceptor for coupling with a thiol), a thiol, a cystamine, an aldehyde, a succinimidyl group, an epoxide, or an acrylate.
- crosslinking agents include, for example, NHS (N-hydroxysuccinimide); sulfo-NHS (N-hydroxysulfosuccinimide); EDC (1-ethyl-3-[3-dimethylaminopropyl]carbodiimide hydrochloride); SMCC (succinimidyl 4-(N-maleimidomethyl)cyclohexane-1-carboxylate); sulfo-SMCC; DSS (disuccinimidyl suberate); DSG (disuccinimidyl glutarate); DFDNB (1,5-difluoro-2,4-dinitrobenzene); BS3 (bis(sulfosuccinimidyl)suberate); TSAT (tris-(succinimidyl)aminotriacetate); BS(PEG)5 (PEGylated bis(sulfosuccinimidyl)suberate
- Molecules may also be attached to other molecules or to substrates using linkers.
- the linkers can have any useful number of functional groups or reactive groups and may be unifunctional (having one functional group), bi-functional, tri-functional, quadri-functional, or comprise a greater number of functional groups.
- a molecule e.g., nucleic acid molecule, peptide, or polymer
- the heterobifunctional linker may comprise any useful functional group, as described herein.
- heterobifunctional linkers include: p-Azidobenzoyl hydrazide (ABH), N-5-Azido-2-nitrobenzoyloxysuccinimide (ANB-NOS), N-[4-(p-Azidosalicylamido)butyl]-3'-(2'-pyridyldithio)propionamide (APDP), p-Azidophenyl Glyoxal monohydrate (APG), Bis[β-(4-azidosalicylamido)ethyl]disulfide (BASED), Bis[2-(Succinimidooxycarbonyloxy)ethyl] Sulfone (BSOCOES), BMPS, 1,4-Di[3'-(2'-pyridyldithio)propionamido] Butane (DPDPB), Dithiobis(succinimidyl Propionate) (DSP), Dis
- conjugation reactions that may be used to attach molecules to one another or to substrates include Castro-Stevens coupling, Larock indole synthesis, Miyaura borylation, Sonogashira cross-coupling, a Grubbs reaction, a Diels-Alder reaction, Staudinger ligation, Oxime ligation, Hydrazone formation, Thiol-ene reaction, Thiol-yne reaction, Thiol-maleimide reaction, Thiol-bromomaleimide reaction, Thiol-haloacetyl reaction, Disulfide formation, Thioether formation, Suzuki coupling, Sonogashira coupling, Heck reaction, Buchwald-Hartwig amination, Chan-Lam coupling, Negishi coupling, Kumada coupling, Stille coupling, Hiyama coupling, Ullmann coupling, Cadiot-Chodkiewicz coupling, Glaser coupling, Wurtz coupling, Williamson ether synthesis, Mitsunobu reaction
- a substrate may be coupled to nucleic acid molecules and peptides.
- a substrate may be coupled to only one type of molecule (e.g., only nucleic acid molecules, only peptides, only lipids, only carbohydrates, etc.).
- a substrate may be coupled to any useful combination of molecules, linkers, reactive moieties or functional groups, which may be coupled at any useful density, as described elsewhere herein.
- a multifunctional linker may be used to attach both a nucleic acid barcode molecule and a peptide to the substrate.
- the substrate may comprise a plurality of bifunctional linkers that can conjugate to different molecules.
- a substrate may comprise a linker and reactive sites; the linker may be used to attach one type of molecule (e.g., peptides or nucleic acid molecules), whereas the reactive sites may be used to attach another type of molecule (e.g., nucleic acid molecules or peptides).
- the linker may be used to attach one type of molecule (e.g., peptides or nucleic acid molecules)
- the reactive sites may be used to attach another type of molecule (e.g., nucleic acid molecules or peptides).
- Linkers can comprise other functional portions, such as spacers (e.g., polymer chains, e.g., PEG, alkyl chains, etc.), cleavable moieties (e.g., disulfide bridges that are cleavable upon application of a chemical stimulus, photocleavable or thermocleavable moieties, etc.), enzyme recognition sites, etc.
- spacers e.g., polymer chains, e.g., PEG, alkyl chains, etc.
- cleavable moieties e.g., disulfide bridges that are cleavable upon application of a chemical stimulus, photocleavable or thermocleavable moieties, etc.
- the spacer comprises a charged linker.
- the proximity of a molecule coupled to a substrate to its nearest neighbor may be controlled using a variety of approaches, e.g., self-assembling monolayers, patterning approaches, linking moieties, etc.
- approaches e.g., self-assembling monolayers, patterning approaches, linking moieties, etc.
- it may be advantageous to have two molecules in close proximity (e.g., two polymerizable molecules, such as a peptide and a nucleic acid molecule, or two nucleic acid molecules).
- capture moieties may be used to couple a monomer of a polymeric analyte, and subsequent to monomer cleavage, an additional polymerizable molecule or plurality of polymerizable molecules may be required to be in proximity to the capture moiety to allow for transfer of information encoded by polymerizable molecules of binding agents.
- the proximity of the molecules e.g., capture moiety and polymerizable molecules
- the distance between a molecule coupled to a substrate from a surface of the substrate may be modulated, e.g., via a linker.
- the distance of an end of a molecule to a surface of a substrate may be about 0.1 nanometer (nm), about 0.5 nm, about 1 nm, about 2 nm, about 3 nm, about 4 nm, about 5 nm, about 6 nm, about 7 nm, about 8 nm, about 9 nm, about 10 nm, about 20 nm, about 30 nm, about 40 nm, about 50 nm, about 60 nm, about 70 nm, about 80 nm, about 90 nm, about 100 nm, about 200 nm, about 300 nm, about 400 nm, about 500 nm, about 600 nm, about 700 nm, about 800 nm, about 900 nm, or about 1000 nm.
- the distance of an end of a molecule to a surface of a substrate or the distances of a plurality of ends of molecules to one or more surfaces of one or more substrates may fall in a range of values, e.g., from about 5 nm to about 100 nm, from about 1 nm to about 1 micrometer, etc.
- Nucleic acid molecules may be coupled to a substrate by direct coupling.
- the substrate or the nucleic acid molecules may comprise functional moieties that can interact.
- the substrate and nucleic acid molecules may comprise a complementary click chemistry pair, e.g., alkyne and azide.
- a substrate may comprise alkyne moieties (e.g., DBCO), which can be reacted with azide-functionalized nucleic acid molecules.
- the nucleic acid molecules may be reacted with the alkyne moieties in a click chemistry reaction to covalently link the substrate to the nucleic acid molecules.
- the substrate may comprise avidin or streptavidin moieties, to which biotinylated nucleic acid molecules may interact and bind non-covalently.
- the substrate may comprise a nucleic acid molecule to which additional nucleic acid molecules (e.g., nucleic acid analytes, nucleic acid linkers) are conjugated using hybridization, ligation, click chemistry, crosslinking (e.g., photocrosslinking such as CNVK).
- the nucleic acid molecules may be coupled to a substrate using a linker, e.g., as described elsewhere herein.
- the linker may comprise at least two functional groups (e.g., a heterobifunctional linker) that can couple to both the substrate and the nucleic acid molecules.
- the substrate may comprise an amine group, and alkyne-functionalized DNA primers (e.g., DBCO-DNA primers) may be attached using a linker such as azidoacetic acid NHS ester.
- amine-functionalized substrates may be coupled to azide-functionalized DNA primers using a DBCO-NHS ester or DBCO-PEG-NHS ester linker.
- the linkers may comprise additional functional moieties (e.g., cleavage sites, spacers such as polymer or alkyl chains).
- peptides may be coupled to a substrate by direct coupling or by using a linker.
- a peptide may be coupled to a substrate at a terminus of the peptide (e.g., C terminus or N terminus), at an internal residue or amino acid of the peptide, or at multiple locations along the peptide.
- a peptide may be functionalized with a moiety that can interact with a moiety of the substrate (e.g., click chemistry pair, avidin-biotin).
- the substrate and peptides may comprise a complementary click chemistry pair, e.g., alkyne and azide, or binding partners such as avidin and biotin.
- a substrate may comprise alkyne moieties (e.g., DBCO), which can be reacted with azide-functionalized peptides.
- the peptides may be reacted with the alkyne moieties in a click chemistry reaction to covalently link the substrate to the peptides.
- the substrate may comprise avidin or streptavidin moieties, to which biotinylated peptides may interact and bind non-covalently.
- the peptides may be coupled to a substrate using a linker, e.g., as described elsewhere herein.
- the linker may comprise at least two functional groups (e.g., a heterobifunctional linker) that can couple to both the substrate and the peptides.
- the substrate may comprise an amine group, and alkyne-functionalized peptides may be attached using a linker such as azidoacetic acid NHS ester.
- amine-functionalized substrates may be coupled to azide-functionalized peptides using a DBCO-NHS ester or DBCO-PEG-NHS ester linker.
- substrates comprising an amine group may be coupled to an azide-functionalized peptide using EDC and Sulfo-NHS.
- a peptide may be functionalized with a functional moiety to enable attachment or coupling of the peptide to a capture moiety or to a substrate.
- the functional moiety may comprise a silane, e.g., aminosilane (e.g., APTES), amino-PEG-silane, click chemistry moiety or other linking moiety and can be attached to the peptide at a peptide terminus (N-terminus or C-terminus), at an internal amino acid, or at multiple locations (e.g., multiple internal amino acids, one or both termini, etc.).
- Chemical approaches to functionalize peptides can include C-terminal-specific conjugation (e.g., via C-terminal decarboxylative alkylation) using photoredox catalysis, e.g., as described by Bloom et al., Nature Chemistry 10, 205-211 (2018), and Zhang et al., ACS Chem. Biol. 2021, 16, 11, 2595-2603, each of which is incorporated by reference herein in its entirety, or amide coupling to an amine-functionalized surface.
- N-terminal attachment may comprise amide coupling of the N-terminus amine group to a carboxylic group functionalized surface or using 2-pyridinecarboxaldehyde variants.
- functionalization of terminal ends of peptides may be achieved enzymatically or using enzyme analogs such as ribozymes or DNAzymes.
- enzyme analogs such as ribozymes or DNAzymes.
- carboxypeptidases or amidases are used for C-terminal functionalization (e.g., as described in Xu et al, ACS Chem Biol. 2011 Oct 21; 6(10): 1015-1020; Zhu et al, Chinese Chemical Letters. 2018, Vol 29 Issue 7, Pages 1116-1118; and Zhu et al, ACS Catal.
- the click chemistry-functionalized peptides may then be directly attached to the substrate via another clickable group (e.g., BCN-azide or DBCO-azide coupling), or, in other instances, may be reacted with another linker or polymerizable molecule (e.g., a bait nucleic acid molecule with a clickable group) that can then link to the substrate directly or indirectly (e.g., using a capture nucleic acid molecule and hybridizing the bait nucleic acid molecule).
- another clickable group e.g., BCN-azide or DBCO-azide coupling
- another linker or polymerizable molecule e.g., a bait nucleic acid molecule with a clickable group
- ubiquitin ligase can be used to attach ubiquitin proteins with linker moieties to substrates. These linker moieties can then be used to chemically attach proteins to ubiquitin-coupled substrates.
- glycosylating enzymes may be used to conjugate functionalized sugar groups (e.g., click chemistry functionalized sugars, polymer-conjugated sugars, biotinylated sugars) to amino acid residues, which can allow for attachment to a substrate (e.g., via click chemistry, polymer crosslinking or nucleic acid hybridization, avidin-biotin interactions), etc.
- functionalized sugar groups e.g., click chemistry functionalized sugars, polymer-conjugated sugars, biotinylated sugars
- Internal amino acid residues or post-translationally modified residues may be coupled to substrates using, for example, thiol labeling, amide coupling using EDC/NHS chemistry or DMT-MM to glutamate or aspartate residues, esterifying glutamate or aspartate residues, alkylation or disulfide bridge labeling of cysteines, or amide coupling to lysine residues.
- a peptide may be treated prior to, during, or subsequent to coupling of the peptide to a substrate.
- a peptide is conjugated with a tag that enables attachment to the substrate, e.g., using His tags, SNAP-tags, CLIP-tags, SpyCatcher, SpyTag, nucleic acid tags (e.g., bait oligos which can attach to capture oligos of the substrate).
- single-point (e.g., C-terminal) selective attachment of peptides can be achieved by reacting the peptide with a linker comprising an amine-reactive group (e.g., isothiocyanates such as PITC) and a reactive group (e.g., click chemistry group).
- the linker can be, for example, a PITC-conjugated click chemistry moiety such as PITC-azide or PITC-alkyne, optionally with spacer moieties in between (e.g., PITC-alkyl-azide, PITC-PEG-azide, PITC-alkyl-alkyne).
- the linker reacts with and “blocks” the primary amines (e.g., modifies lysines), including the N-terminus.
- Subsequent cleavage of the N-terminal amino acid e.g., using an Edman reagent, such as acid
- one of the remaining modified lysines may be attached to a substrate (e.g., using the click chemistry moiety coupled to the amine-reactive group).
- the peptide may be treated with a protease, e.g., LysC, which cleaves peptides such that a remaining peptide has a C-terminal lysine and such that the remaining peptide comprises a primary amine only at the C-terminal lysine residue and the N-terminus; such a cleavage may be performed prior to reacting the amine-reactive group, e.g., as shown by Xie et al. Langmuir 2022, 38, 30, 9119-9128, which is incorporated by reference herein in its entirety. See also, Example 6 c/ FIG. 15.
- a protease e.g., LysC
- the peptide may be treated with ArgC, which cleaves peptides such that a remaining peptide has a C-terminal arginine.
- the C-terminal arginine may then be functionalized (e.g., with a click chemistry moiety) using an arginine-reactive group, e.g., a dicarbonyl compound.
- the peptide may be treated with AspN, which cleaves peptides such that a remaining peptide has an N-terminal aspartic acid residue.
- the N-terminal aspartic acid may then be functionalized (e.g., with a click chemistry moiety) using a carboxy-specific chemistry.
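- The effect of the three proteases named above on a peptide sequence can be sketched with a simplified in silico digestion. This sketch assumes idealized specificity (LysC cuts after every lysine, ArgC after every arginine, AspN before every aspartic acid) and ignores missed cleavages and sequence-context effects; the function name `digest` and the test sequence are hypothetical.

```python
# Simplified cleavage rules: (target residue, side of the residue that is cut).
# "C" = cut after the residue (it becomes a new C-terminus),
# "N" = cut before the residue (it becomes a new N-terminus).
CLEAVAGE_RULES = {
    "LysC": ("K", "C"),
    "ArgC": ("R", "C"),
    "AspN": ("D", "N"),
}


def digest(peptide, protease):
    """Split a one-letter peptide sequence using an idealized protease rule."""
    residue, side = CLEAVAGE_RULES[protease]
    fragments = []
    start = 0
    for index, amino_acid in enumerate(peptide):
        if amino_acid != residue:
            continue
        cut = index + 1 if side == "C" else index
        # Skip cuts that would produce an empty fragment.
        if start < cut < len(peptide):
            fragments.append(peptide[start:cut])
            start = cut
    fragments.append(peptide[start:])
    return fragments


lysc_fragments = digest("AGKTLDKRS", "LysC")
argc_fragments = digest("AGKTLDKRS", "ArgC")
aspn_fragments = digest("AGKTLDKRS", "AspN")
```

- Note that the last fragment of a LysC or ArgC digest retains the original C-terminus of the peptide, so it ends in lysine or arginine only if the parent peptide did.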
- attachment of a peptide to a substrate may be facilitated using one or more nucleic acid molecules.
- a linker comprising an amine-reactive group (e.g., isothiocyanates such as PITC) and a reactive group (e.g., click chemistry group) may be coupled to the peptide (e.g., at an N-terminus, a lysine side chain); and an oligonucleotide comprising an additional reactive group (e.g., click chemistry group) can react with that of the linker, thereby coupling the oligonucleotide to the peptide via the linker.
- an amine-reactive group e.g., isothiocyanates such as PITC
- a reactive group e.g., click chemistry group
- the oligonucleotide may then couple to a substrate (e.g., bead, flow cell) which comprises an anchor oligonucleotide, e.g., via ligation, extension, hybridization, or other nucleic acid reaction.
- a substrate e.g., bead, flow cell
- an anchor oligonucleotide e.g., via ligation, extension, hybridization, or other nucleic acid reaction.
- carboxylic groups can be reacted in a way to enable C-terminal or internal residue attachment.
- carboxyl groups may be labeled with a C-terminal sequencing reagent, such as isothiocyanate, when treated with an activating reagent (e.g., acetic anhydride) to generate a peptide-thiohydantoin (at the C-terminus) and “blocked” carboxyl groups on the aspartic acid and glutamic acid residues.
- an activating reagent e.g., acetic anhydride
- the thiohydantoin may then be reacted to couple to a substrate.
- cleavage of the C-terminal amino acid via a single round of C-terminal sequencing degradation, or via a protease exposes only a single reactive carboxylic group at the C-terminal amino acid.
- the single reactive C-terminal carboxylic group can then be used as a reactive moiety for a single attachment site.
- a peptide or protein can be attached via the N-terminus using the specific reactivities of the N-terminus amine group.
- Amine-based reactions, such as amide coupling, can be carried out at low pH where only the N-terminal amine group is active.
- 2-pyridinecarboxaldehyde and variants can be used to react with the N-terminal amine group.
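- The pH selectivity for the N-terminal amine can be illustrated with the Henderson-Hasselbalch relationship, under which the unprotonated (nucleophilic) fraction of an amine is 1 / (1 + 10^(pKa - pH)). The pKa values below (about 8.0 for a peptide N-terminal amine and about 10.5 for a lysine side-chain amine) are typical textbook approximations and are assumptions of this sketch, as are the chosen pH of 6.5 and the function name `unprotonated_fraction`; actual values vary with sequence and conditions.

```python
def unprotonated_fraction(pka, ph):
    """Fraction of an amine in its unprotonated form (Henderson-Hasselbalch)."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


# Approximate textbook pKa values (assumptions, not from the disclosure).
PKA_N_TERMINAL_AMINE = 8.0
PKA_LYSINE_SIDE_CHAIN = 10.5

ph = 6.5  # illustrative mildly acidic reaction pH
n_terminal_reactive = unprotonated_fraction(PKA_N_TERMINAL_AMINE, ph)
lysine_reactive = unprotonated_fraction(PKA_LYSINE_SIDE_CHAIN, ph)

# How much more of the N-terminal amine is available compared with lysine.
selectivity = n_terminal_reactive / lysine_reactive
print(f"N-terminus: {n_terminal_reactive:.4f}, lysine: {lysine_reactive:.6f}, "
      f"ratio: {selectivity:.0f}")
```

- Under these assumed values, a few percent of the N-terminal amine is unprotonated at pH 6.5 while the lysine side chains remain almost fully protonated, which is the basis for the selectivity described above.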
- a peptide may be conjugated to a substrate using a polymerization reaction, e.g., a free radical polymerization, such as using PEGylated peptides, methacrylamide-modified peptides, Michael-type addition of maleimide-terminated oligo-NIPAAM-conjugated peptides; photocrosslinking of azophenyl-conjugated peptides, or other polymerization reactions with monomer-conjugated peptides, e.g., as described by Krishna et al. Biopolymers. 2010; 94(1): 32-48, which is incorporated by reference herein in its entirety.
- the substrate may comprise, coupled thereto, any combination of molecules, including but not limited to peptides, proteins (e.g., enzymes, ribozymes, DNAzymes, antibodies, nanobodies, antibody fragments), nucleic acid molecules, lipids, carbohydrates or sugars, metabolites, small molecules, polymers, metals, viral particles, biotin, avidin, streptavidin, neutravidin, etc.
- the multiple types of molecules may be attached simultaneously to the substrate or in a sequential manner.
- a substrate may be treated to conjugate nucleic acid molecules and subsequently treated to conjugate peptides, or alternatively, the substrate may be treated to conjugate peptides prior to the nucleic acid molecules.
- Any number of conjugation or attachment chemistries may be used.
- any number of conjugation chemistries may be used for each type of molecule.
- a substrate, or portion thereof, may be subjected to conditions sufficient to passivate the substrate or portion thereof.
- Passivation of a substrate may be useful for a variety of purposes, such as preventing nonspecific binding of binding agents, altering the surface density of a molecule (e.g., increasing the density of nucleic acid molecules or peptides), blocking reactive sites (e.g., blocking available click chemistry moieties subsequent to conjugation of the molecules on the substrate), etc.
- Passivation may be achieved using chemical approaches, e.g., deposition of blocking agents such as proteins (e.g., albumin), Tween-20, polymers, metals or metal oxides, or biochemical approaches, e.g., using metal microbes.
- Substrates comprising reactive moieties may also be passivated following molecule conjugation (e.g., coupling of nucleic acid molecules, peptides, etc.) by reacting any unreacted sites with an appropriate molecule.
- a substrate comprising click chemistry moieties e.g., DBCO beads
- molecules of interest e.g., polymerizable molecules, such as nucleic acid molecules, peptides
- click chemistry e.g., azide-nucleic acid molecules, azide-peptides
- Unreacted sites may be passivated by providing and reacting complementary click-chemistry molecules, e.g., azide-polymers (e.g., PEG-azide), which may reduce downstream nonspecific interactions.
- Substrate passivation may occur at any useful time or step. For instance, passivation to block unreacted DBCO sites may be performed prior to, during, or subsequent to conjugation of analytes or other molecules of interest (e.g., peptides and nucleic acid molecules). The passivation may be controlled by stoichiometry or densities of the passivating agent relative to the molecules of interest, or by physical approaches, e.g., photopatterning, self-assembling monolayers, etc.
- analytes or other molecules of interest e.g., peptides and nucleic acid molecules.
- One or more methods for processing samples may comprise preparation of biological samples for analysis, which, in some instances, includes partitioning of cells for conducting single-cell analysis.
- a method for processing a biological sample may comprise extraction or isolation of one or more peptides or proteins from the biological sample for further processing and analysis, as is described elsewhere herein.
- Preparation of Cell Suspensions for Single-Cell Analysis may involve preparation of single cell suspensions from a biological sample. Single cell suspensions may be prepared from biological samples by dissociating cells and optionally, culturing them in a liquid medium. In some instances, biological samples comprise a liquid sample.
- a biological sample may comprise a bacterial liquid culture, a mammalian liquid culture, a bacterial cell culture, a fungal cell culture, a blood, plasma, or serum sample. Processing of such liquid samples may include centrifugation (e.g., to isolate cells), resuspension of cells in a suitable medium, such as Dulbecco's Phosphate-Buffered Saline (DPBS), and optional culturing of the isolated cells.
- a suitable medium such as Dulbecco's Phosphate-Buffered Saline (DPBS)
- a biological sample may comprise cultured cells, e.g., cells cultured in suspension, or cells adhered to a solid surface, such as petri dishes or tissue culture dishes, or semi-solid medium. Cultured adherent cell samples may be treated to generate a cell suspension, e.g., via a protease such as trypsin, to detach the cells from the surface.
- a biological sample may comprise a tissue or biopsy sample. A tissue or biopsy sample may be processed mechanically or enzymatically to generate a cell suspension.
- Such processing may include sonication (mechanical treatment) or enzymatic treatment, such as the use of pronase, collagenase, hyaluronidase, metalloproteinases, lysozymes, zymolyase, trypsin, or other enzymes that digest extracellular matrix components.
- sonication mechanical treatment
- enzymatic treatment such as the use of pronase, collagenase, hyaluronidase, metalloproteinases, lysozymes, zymolyase, trypsin, or other enzymes that digest extracellular matrix components.
- the dissociated cells can then be stored in a suitable buffer, such as DPBS.
- Cell Sorting A biological sample or a cell suspension may be subjected to sorting to isolate a cell of interest. Sorting may be performed to select or isolate a cell based on a quality or characteristic of the cell, e.g., expression of a protein target, size, deformability, fluorescence or other optical property, or other physical or phenotypic property of the cell.
- Sorting may be accomplished using any number of approaches, e.g., using immunosorting (e.g., fluorescence activated cell sorting (FACS) or magnetic activated cell sorting (MACS)), electrophoretic approaches, chromatography, microfluidic approaches (e.g., using inertial focusing, cell traps, electrophoresis, isoelectric focusing), acoustic sorting, optical sorting (e.g., optoelectronic tweezers), mechanical cell picking (e.g., using manual or robotic pipettes) or passive approaches (e.g., gravitational settling). Sorting may be performed based on a displayed protein on a cell surface, e.g., a protein anchored to the cytoplasmic membrane.
- Cells of a biological sample or cell suspension may be partitioned into individual partitions such that at least a subset of the individual partitions comprises a single cell.
- the individual partitions may comprise a barcode molecule (e.g., fluorophore or set of fluorophores, nucleic acid barcode molecules, etc.).
- Barcode molecules may be unique to the partition, such that each individual partition comprises a different barcode sequence than other partitions.
- the barcode molecules may be loaded into the individual partitions at any useful ratio of barcode molecules to sample species (e.g., cells, proteins, nucleic acid molecules).
- the barcode molecules may be loaded into partitions such that about 0.0001, 0.001, 0.1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 50, 100, 500, 1000, 5000, 10000, or 200000 barcodes are loaded per sample species. In some cases, the barcodes are loaded into partitions such that more than about 0.0001, 0.001, 0.1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 50, 100, 500, 1000, 5000, 10000, or 200000 barcodes are loaded per sample species. In some cases, the barcodes are loaded in the partitions so that less than about 0.0001, 0.001, 0.1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 50, 100, 500, 1000, 5000, 10000, or 200000 barcodes are loaded per sample species.
- a partition may assume any useful geometry such as a droplet, a microwell, a solid substrate, a gel (e.g., a cell encapsulated in a gel bead), a bead, a flask, a tube, a spot, a capsule, a channel, a chamber, or other compartment or vessel.
- a partition may be part of an array of partitions, e.g., a droplet in a microfluidic device, a microwell of a microwell plate, a spot on a multi-spot array, etc.
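- As a rough illustration of partitioning cells so that a subset of partitions holds a single cell, the sketch below assumes that cells distribute at random across partitions (Poisson loading). That assumption, the function names, and the example mean occupancies are illustrative only and are not taken from the disclosure.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability that one partition receives exactly k cells."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def occupancy_summary(lam: float) -> dict:
    """Expected partition fractions at mean occupancy lam (cells per partition)."""
    if not lam > 0:
        raise ValueError("mean occupancy must be positive")
    empty = poisson_pmf(0, lam)
    single = poisson_pmf(1, lam)
    multi = 1.0 - empty - single
    return {
        "empty": empty,
        "single": single,
        "multi": multi,
        # Among partitions holding at least one cell, the share with exactly one.
        "single_given_occupied": single / (1.0 - empty),
    }


if __name__ == "__main__":
    for lam in (0.05, 0.1, 0.5, 1.0):
        s = occupancy_summary(lam)
        print(f"lam={lam}: single={s['single']:.3f}, "
              f"single|occupied={s['single_given_occupied']:.3f}")
```

Under this model, lowering the mean occupancy raises the share of occupied partitions that hold exactly one cell, at the cost of more empty partitions.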
- Single cells may be processed to obtain one or more analytes contained therein.
- a method for processing a single cell may comprise lysing the cell to release the contents into the individual compartment or partition.
- Lysis may be performed using a detergent (e.g., Triton-X 100, sodium dodecyl sulfate, sodium deoxycholate, CHAPS), RIPA buffer, a change in temperature (e.g., elevated or lower temperature, freezing, freeze-thawing), enzymes, ribozymes, DNAzymes, mechanical lysis (e.g., sonication, application of mechanical force such as bead beating), electrical lysis, or a combination thereof. Lysis may be performed in the presence of protease inhibitors to prevent degradation or digestion of the proteins from the cell. The contents may optionally be further processed, e.g., subjected to purification or extraction, denaturation of proteins or peptides, enzyme or chemical digestion, etc.
- the contents may be subjected to enzymatic digestion to remove nucleic acid molecules, e.g., using nucleases such as DNAse or RNAse.
- a cell may be fixed (e.g., using a fixative), crosslinked, and/or permeabilized.
- examples of fixatives include aldehydes (e.g., glutaraldehyde, formaldehyde, paraformaldehyde), alcohols (e.g., methanol, ethanol), acetone, acids (e.g., acetic acid, Davidson’s AFA), oxidizing agents (e.g., osmium tetroxide, potassium dichromate, chromic acid, permanganate salts), Zenker’s fixative, picrates, Hepes-glutamic acid buffer-mediated organic solvent protection effect (HOPE), or Karnovsky fixative.
- Cell permeabilization may be achieved mechanically (e.g., using sonication, electroporation, shearing, osmotic lysis) or chemically (e.g., using an organic solvent such as methanol or acetone or detergents such as saponin, Tween-20, Triton X-100).
- the biological sample (or single cell suspensions or partitioned cells) may be further processed to enable proteomic analysis. For example, de-aggregation of proteins in the sample may be performed, e.g., using chemical or mechanical approaches.
- Chemical de-aggregation methods can include but are not limited to the use of sodium dodecyl sulfate (SDS), Triton-X 100, 3-((3-cholamidopropyl)dimethylammonio)-1-propanesulfonate (CHAPS), ethylene carbonate, or formamide.
- Mechanical de-aggregation methods can include but are not limited to sonication or high temperature treatment.
- the biological sample (or single cell suspensions or partitioned cells) may be subjected to conditions sufficient to denature one or more proteins.
- Denaturation may be achieved using heat, chemicals (e.g., SDS, urea, guanidine, formamide, metal organic compounds), reducing agents (e.g., dithiothreitol (DTT), beta mercaptoethanol, TCEP), urea, chaotropes, enzymes (e.g., ClpX, ClpS, unfoldases), ribozymes or DNAzymes.
- the peptides or proteins may be subjected to conditions to solubilize the peptides or proteins in
- Proteins or peptides may be extracted from a sample, e.g., using liquid-liquid extraction, solid-phase extraction, immunoaffinity extraction, magnetic bead extraction.
- the peptides or proteins may be enriched or purified; in an example, the peptides or proteins of interest may be precipitated using trichloroacetic acid, chloroform, TRIzol, or other chemical reagent.
- proteins e.g., DNAse, RNAse, DNA glycosylases, restriction endonucleases, transposases, micrococcal nucleases, Cas proteins.
- minimal protein processing is performed, e.g., to maintain native states or conformations of the proteins.
- Proteins or peptides may be filtered, e.g., using ultrafiltration, diafiltration, tangential flow filtration, nanofiltration, or other filtration method.
- Peptides or proteins may be fragmented prior to analysis. Fragmenting proteins may be useful in reducing the size of the proteins and may allow for efficient processing of peptides, as is described elsewhere herein. Fragmentation may be performed using proteases, e.g., trypsin, chymotrypsin, pepsin, Lys-C, Glu-C, Asp-N, Proteinase K, furin, thrombin, endopeptidase, dipeptidase, papain, subtilisin, elastase, enterokinase, genenase, endoproteinase, metalloproteases, or with chemical treatment, e.g., cyanogen bromide, hydrazine, hydroxylamine, formic acid, BNPS-skatole, iodosobenzoic acid, 2-nitro-5-thiocyanobenzoic acid, etc. Alternatively, or in addition, fragmentation may be performed using mechanical methods,
- Enrichment of proteins or peptides in a biological sample may be performed, e.g., for separating proteins and peptides from cellular debris or other types of analytes (e.g., nucleic acids, lipids, carbohydrates, metabolites).
- Such enrichment may include, for example, the use of affinity columns (e.g., ion exchange), size exclusion columns, affinity precipitation (e.g., ammonium sulfate precipitation, ethanol precipitation, acetone precipitation, trichloroacetic acid precipitation, immunoprecipitation or co-immunoprecipitation), chemical precipitation (e.g., using trichloroacetic acid, chloroform, TRIzol), chromatography (e.g., HPLC, SEC, IEX, affinity chromatography, RPC, HIC, FPLC, GC), centrifugation, filtration, crystallization (e.g., vapor diffusion, batch crystallization, dialysis crystallization), field-flow fractionation, membrane separation, or electrophoresis (e.g., PAGE, agarose gel electrophoresis, capillary electrophoresis, 2D-PAGE, isoelectric focusing, isotachophoresis, dielectrophores
- the enrichment may be performed using microbeads, affinity microcolumns, affinity beads, etc.
- fractionation may be performed on the proteins or peptides, which may be used to separate the proteins by size, hydrophobicity, charge, affinity, mass, density, etc.
- Peptides may be barcoded, in bulk or in partitions. Peptides may be barcoded with any useful type of barcode molecule, e.g., spectral or fluorescent barcodes, mass tags, nucleic acid barcode molecules, etc.
- the barcode molecules may allow for identification of an originating peptide, a partition, a sample, a cell, or cell compartment.
- a cell sample may be partitioned such that a partition comprises at most one cell; the partition may comprise a unique barcode molecule (e.g., nucleic acid barcode molecule) that identifies the partition and thus the cell.
- a substrate may comprise nucleic acid molecules comprising a unique barcode sequence that differs from barcode sequences of other substrates. As such, the barcode sequence may be used to identify the substrate.
- barcoded substrates may be partitioned with cell samples, such that at least a subset of the partitions comprise a single cell and a single barcoded substrate. As such, the peptides arising from the single cell and transferred to the barcoded substrate may all be identifiable as originating from the single cell.
- Barcode molecules may comprise additional useful functional sequences, e.g., UMIs, primer sites, restriction sites, cleavage sites, transposition sites, sequencing sites, read sites, etc.
- Individual proteins or a plurality of proteins may be barcoded, in bulk, or in partitions.
- a protein may be barcoded in partitions using a split-pool technique.
- a population of proteins e.g., arising from a same cell, a same sample, a same origin, or different cells, samples, origins
- the tagged population of proteins may then be partitioned (“split”) into a plurality of partitions comprising unique partial barcode sequences, which can then be attached (e.g., via ligation, hybridization, nucleic acid extension, click chemistry, etc.) to the tag of each protein to generate barcoded or partially barcoded proteins.
- the contents of the partitions may then be pooled together, and the barcoded or partially barcoded proteins may be partitioned again into another plurality of partitions comprising unique partial barcode sequences.
- the unique partial barcode sequences may then be attached to the barcoded proteins or partially barcoded proteins, thereby generating combinatorial barcoded proteins. This process can be repeated any useful number of times, to provide individual or subsets of proteins with unique combinations of barcode sequences.
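- The split-pool procedure above can be sketched as a small simulation. This is a minimal sketch that assumes each protein lands in a uniformly random partition in every round; the function names, the partition count of 96, and the number of rounds are hypothetical choices for illustration, not values from the disclosure.

```python
import random


def split_pool(n_proteins: int, n_partitions: int, n_rounds: int, seed: int = 0) -> list:
    """Return one combinatorial barcode (a tuple of partial barcodes) per protein."""
    rng = random.Random(seed)
    barcodes = [() for _ in range(n_proteins)]
    for _ in range(n_rounds):
        # "Split": each protein lands in a random partition and receives that
        # partition's partial barcode. "Pool": all proteins are mixed again
        # before the next round, which the fresh random draw stands in for.
        barcodes = [bc + (rng.randrange(n_partitions),) for bc in barcodes]
    return barcodes


def barcode_space(n_partitions: int, n_rounds: int) -> int:
    """Number of distinct combinatorial barcodes that can be generated."""
    return n_partitions ** n_rounds


def expected_unique_fraction(n_proteins: int, n_partitions: int, n_rounds: int) -> float:
    """Chance that a given protein shares its full barcode with no other protein."""
    space = barcode_space(n_partitions, n_rounds)
    return (1.0 - 1.0 / space) ** (n_proteins - 1)


if __name__ == "__main__":
    print(barcode_space(96, 3))
    print(round(expected_unique_fraction(10_000, 96, 3), 4))
```

Each additional round multiplies the barcode space by the number of partitions, which is why repeating the process lets a modest number of partitions label many proteins with distinct combinations.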
- nucleic acid molecules e.g., capture nucleic acid molecules, nucleic acid barcode molecules
- C-terminal conjugation of nucleic acid molecules may be achieved by amide coupling of amine-conjugated DNA barcode molecules to peptides or by thiol alkylation, e.g., reacting a thiolated peptide with an alkylated (e.g., iodoacetamide) DNA barcode molecule.
- N-terminal conjugation can be achieved, for instance, using 2-pyridinecarboxaldehyde labeling of a DNA barcode and reacting with the N-terminus of a peptide.
- Internal residues, e.g., glutamate, can also be labeled with amine-conjugated DNA barcode molecules or carboxylated DNA barcodes (e.g., to react with primary amines in lysine).
- Individual peptides may be barcoded at multiple locations for a given peptide.
- a peptide may be labeled at multiple sites with the same or different barcode sequences.
- a peptide may be partitioned into a partition comprising a plurality of identical barcode molecules that comprise a barcode sequence that is unique to the partition.
- the peptide may be labeled at a single or multiple sites with the unique partition barcode sequence, optionally each comprising a unique molecular identifier (UMI), such that subsequent downstream analysis (e.g., sequencing) may be attributable to the same peptide using the barcode sequence.
- a terminus of the peptide e.g., N-terminus or C-terminus
- an internal amino acid may be labeled with a barcode.
- the peptide may be fragmented prior to analysis or sequencing; accordingly, upstream attachment of multiple identical barcode molecules to the same peptide may allow for attribution of the sequence analysis back to a single peptide. Barcoding of peptides may occur prior to, during, or subsequent to fragmentation.
- Peptides may be labeled with barcodes (e.g., nucleic acid barcode molecules) using any suitable chemistry, e.g., as described above, or using bifunctional or trifunctional linkers comprising multiple linking moieties, e.g., as described elsewhere herein, such as click chemistry moieties, NHS-esters, EDC, etc.
- C-terminal attachment may comprise amide coupling to C-terminus carboxylic group or photoredox tagging of C-terminus carboxylic group (e.g., to add an electrophile tag).
- N-terminal attachment may comprise amide coupling to N-terminus amine group, where specific attachment can occur at low pH, or using 2-pyridinecarboxaldehyde variants for specific attachment to N-terminus.
- Internal attachment may comprise, for example, amide coupling using EDC/NHS chemistry or DMT-MM to Glutamate or Aspartate; alkylation or disulfide bridge labeling of cysteines; or amide coupling to lysine residues.
- a peptide may be labeled with different barcode molecules, which can be indexed by proximity to one another, e.g., using primers that can anneal to adjacent barcode molecules.
- proximity-based polymerase extension may be used to copy and associate the sequence of adjacent barcodes.
- each barcode molecule may comprise a primer binding site, to which a dual-primer linker sequence comprising two sequences is annealed. The dual primer linker sequence can bind to the primer binding sites of two adjacent barcodes.
- An extension reaction, e.g., using a polymerase, may extend and copy the barcode sequences of the adjacent barcodes.
- the dual primer linker sequence, which now has copies of the two adjacent barcodes, may be removed and sequenced.
- an adjacency matrix of barcode sequences may be generated (e.g., to correspond barcode sequences on a single dual primer linker as spatially adjacent). Accordingly, each of the barcode sequences may be associated with a nearby adjacent barcode sequence, and as such, peptide portions may be aligned or attributed as being adjacent. Such an approach may be useful in instances where the peptide is fragmented, such that individual fragments of a peptide may be corresponded with the nearest neighbor using the barcode sequences.
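- One way to picture the adjacency matrix is as a table of how often two barcodes are read from the same dual primer linker, followed by a walk along the resulting chain. The sketch below is illustrative only: the input layout (one barcode pair per sequenced linker), the function names, and the requirement that the barcodes form a single unbranched chain are assumptions, not details from the disclosure.

```python
from collections import defaultdict


def build_adjacency(pairs):
    """Count how often each pair of barcodes was copied onto the same linker."""
    adj = defaultdict(lambda: defaultdict(int))
    for a, b in pairs:
        if a == b:
            continue  # a linker carrying one barcode twice says nothing about neighbors
        adj[a][b] += 1
        adj[b][a] += 1
    return adj


def order_chain(adj, min_count=1):
    """Order barcodes along a linear peptide from pairwise adjacency counts."""
    # Keep only neighbor relations supported by at least min_count linker reads.
    neighbors = {a: {b for b, c in nbrs.items() if c >= min_count}
                 for a, nbrs in adj.items()}
    ends = sorted(a for a, n in neighbors.items() if len(n) == 1)
    if len(ends) != 2:
        raise ValueError("barcodes do not form a single linear chain")
    order, prev = [ends[0]], None
    while True:
        candidates = [b for b in neighbors[order[-1]] if b != prev]
        if not candidates:
            break
        if len(candidates) > 1:
            raise ValueError("branching adjacency; cannot order uniquely")
        prev = order[-1]
        order.append(candidates[0])
    if len(order) != len(neighbors):
        raise ValueError("some barcodes are not connected to the chain")
    return order


if __name__ == "__main__":
    reads = [("B", "C"), ("A", "B"), ("C", "D"), ("B", "A")]
    print(order_chain(build_adjacency(reads)))
```

The walk starts from the alphabetically first chain end, so the result is the barcode order up to orientation; deciding which end is the N-terminus would need additional information.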
- a peptide may be barcoded at multiple locations for a given peptide using bridge amplification.
- a peptide or protein may be labeled at multiple sites with a nucleic acid primer.
- a nucleic acid barcode molecule may be provided, which can anneal to the nucleic acid primer (not shown) or be ligated to the nucleic acid primer. Subsequent rounds of bridge amplification may be performed in order to copy the nucleic acid barcode molecule to the other primers located at other sites of the given peptide.
- a peptide may be tagged with multiple copies of the nucleic acid primer, and barcode sequences may be provided sparsely, such that only one nucleic acid primer per peptide is extended by polymerase extension. Subsequent rounds of bridge amplification can result in a peptide having the same barcode sequence at each nucleic acid primer. Subsequent fragmenting of peptides may be performed, such that peptide fragments comprise, on average, a single barcode. Accordingly, in some cases, the output of such an amplification approach may be peptides with individual barcodes generated from fragmenting multi-labeled proteins, where peptides from the same protein have the same barcodes.
- a sample of cells may be partitioned into individual partitions or compartments (e.g., droplets, microwells) such that at least a subset of the partitions comprise a single cell.
- the partitions may then be treated with a lysing agent to lyse the cells and release the proteins from the cells into the partition.
- the proteins may then be labeled with a partition-specific barcode (e.g., using a barcode bead), such that all peptides or proteins arising from a single compartment comprise the same barcode.
- the barcodes comprise nucleic acid barcode molecules, and the barcode sequence can be used in downstream processing, e.g., via sequencing to identify the partition or cell from which a peptide originated.
- the nucleic acid barcode molecule may comprise any additional useful sequences, e.g., UMIs, primer sequences, etc.
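- Downstream use of partition barcodes and UMIs can be sketched as a grouping step. The record layout below (partition barcode, UMI, peptide read) and the function name are hypothetical; the sketch simply treats reads that share a barcode and UMI as copies of one molecule and makes no claim about how the disclosed system stores or processes its data.

```python
from collections import defaultdict


def group_by_partition(reads):
    """Group peptide reads by partition barcode, collapsing repeated UMIs.

    reads: iterable of (partition_barcode, umi, peptide_read) tuples.
    Returns {partition_barcode: sorted list of peptide reads, one per UMI}.
    """
    cells = defaultdict(dict)
    for barcode, umi, peptide in reads:
        # Same barcode and same UMI: treated as a duplicate of one molecule,
        # so only the first read seen for that UMI is kept.
        cells[barcode].setdefault(umi, peptide)
    return {bc: sorted(by_umi.values()) for bc, by_umi in cells.items()}


if __name__ == "__main__":
    example = [
        ("AAC", "u1", "WYD"),
        ("AAC", "u1", "WYD"),  # duplicate read of the same molecule
        ("AAC", "u2", "GQK"),
        ("GGT", "u1", "QRK"),
    ]
    print(group_by_partition(example))
```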
- a biological sample may be processed in bulk.
- a biological sample may be processed to obtain a suspension of cells, which may be directly lysed in the suspension, without partitioning of cells in individual compartments.
- Cells may be lysed in bulk using any useful approach, e.g., as described above and optionally subjected to further processing, e.g., homogenization, protease inhibition, denaturation, protein processing (e.g., chemical treatment, fragmentation), or a combination thereof.
- a biological sample may be subjected to pre-processing prior to cell lysis or protein extraction. Such pre-processing may include removal of debris, purification, filtration, concentration, or sorting.
- a biological sample may comprise a tissue sample comprising multiple cells. Tissue samples may be processed using an approach to retain spatial information (e.g., to identify peptides from individual cells), e.g., using spatial barcodes.
- a 2-D or 3-D tissue sample may be provided, and individual cells or locations within a tissue sample may be contacted with a plurality of spatial barcodes (e.g., nucleic acid barcode molecules) comprising different barcode sequences.
- the different barcode sequences may be attributed to a particular location in the 2-D or 3-D tissue sample, which may correspond with a location of a cell.
- spatial barcodes may be provided using deterministic methods such as two-photon patterning, or stochastic methods such as PCR, to assign different segments of the 2-D or 3-D tissue sample with unique spatial barcodes. Accordingly, peptides that are labeled with spatial barcodes may be attributed back to a single location within a tissue sample, or back to a single cell.
- the tissue sample comprising multiple cells (illustrated as a 2x2 array of cells) may be provided.
- the tissue sample may be subjected to lysis or fixation and permeabilization to provide access to the proteins contained within the multiple cells.
- Spatial barcodes, e.g., nucleic acid barcode molecules, may be provided.
- the spatial barcodes may comprise coordinate or location information.
- each cell may be contacted with a different spatial barcode, or portions of cells may be contacted with different spatial barcodes, which may optionally be pre-indexed (e.g., using imaging, or deterministic spatial barcoding approaches). Further processing of the peptides may be performed, as described elsewhere herein.
- each peptide having a spatial barcode may be attributed back to its originating coordinate or location, which can help identify the originating cell from which a peptide arises.
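- Attribution of peptides to locations can be sketched as a lookup against a pre-indexed barcode map. The barcode strings, the 2x2 coordinate grid, and the function name below are made up for illustration; how the index is actually built (e.g., by imaging or deterministic patterning) is outside the scope of this sketch.

```python
from collections import defaultdict


def assign_locations(barcode_to_xy, labeled_peptides):
    """Map each (spatial_barcode, peptide) pair to its indexed coordinate.

    Returns (peptides grouped by coordinate, pairs whose barcode is not indexed).
    """
    by_location = defaultdict(list)
    unassigned = []
    for barcode, peptide in labeled_peptides:
        xy = barcode_to_xy.get(barcode)
        if xy is None:
            unassigned.append((barcode, peptide))
        else:
            by_location[xy].append(peptide)
    return dict(by_location), unassigned


if __name__ == "__main__":
    # Hypothetical pre-indexed 2x2 array: one spatial barcode per cell position.
    index = {"SB00": (0, 0), "SB01": (0, 1), "SB10": (1, 0), "SB11": (1, 1)}
    observed = [("SB00", "WYD"), ("SB11", "GQK"), ("SB00", "QRK"), ("SBxx", "AAA")]
    located, unknown = assign_locations(index, observed)
    print(located)
    print(unknown)
```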
- a spatial barcode array may be provided on a substrate (e.g., a glass microscope slide, a hydrogel mesh).
- the spatial barcodes may be directly conjugated to the substrate, or they may be provided on barcoded beads.
- a plurality of beads each comprising different barcode sequences may be arranged in an array on a substrate.
- Each bead may comprise a spatial barcode comprising a spatial barcode sequence, and optionally, a unique molecular identifier (UMI).
- a tissue sample e.g., a fixed tissue sample
- the tissue sample may then be subjected to conditions sufficient to transfer the peptides or proteins to the spatial barcode array.
- the peptides or proteins may be transferred via passive transport, e.g., diffusion or Brownian motion, or via active transport, e.g., electrophoresis, pressure-driven flow, etc.
- the peptides or proteins may be attached to the spatial barcodes, e.g., using a linker (e.g., comprising amine-reactive groups, or click chemistry groups such as azide, alkyne, or other functional moieties such as aldehyde groups or NHS or carboxylic groups), conjugation chemistry, or an anchoring agent.
- anchoring agents include fixatives, such as formaldehyde, paraformaldehyde, glutaraldehyde, or monomers for incorporation into a hydrogel, e.g., Acryloyl-X, acrylamide, N-(3-Aminopropyl)methacrylamide, or N-(3-Aminoethyl)methacrylamide, or benzophenone.
- Anchoring agents may also comprise multi-functional linkers, e.g., Acryloyl-X, Biotin-NHS, Biotin-PEG-Amine, DBCO-NHS, DBCO-amine. For bead arrays, the plurality of beads may be collected from the sample for further processing.
- FIG. 5 shows a computer system 501 that is programmed or otherwise configured to receive sequencing data and to output information on an identity of a polymerizable molecule or amino acid residues of modified amino acids.
- the computer system 501 can regulate various aspects of generating sequencing reads of the present disclosure, such as, for example, receiving one or more sets of sequencing data, using an algorithm to process the sequencing data, and outputting one or more sequencing results.
- the computer system 501 can be an electronic device of a user or a computer system that is remotely located with respect to the electronic device.
- the electronic device can be a mobile electronic device.
- the computer system 501 includes a central processing unit (CPU, also “processor” and “computer processor” herein) 505, which can be a single core or multi core processor, or a plurality of processors for parallel processing.
- the computer system 501 also includes memory or memory location 510 (e.g., random-access memory, read-only memory, flash memory), electronic storage unit 515 (e.g., hard disk), communication interface 520 (e.g., network adapter) for communicating with one or more other systems, and peripheral devices 525, such as cache, other memory, data storage and/or electronic display adapters.
- the memory 510, storage unit 515, interface 520 and peripheral devices 525 are in communication with the CPU 505 through a communication bus (solid lines), such as a motherboard.
- the storage unit 515 can be a data storage unit (or data repository) for storing data.
- the computer system 501 can be operatively coupled to a computer network (“network”) 530 with the aid of the communication interface 520.
- the network 530 can be the Internet, an internet and/or extranet, or an intranet and/or extranet that is in communication with the Internet.
- the network 530 in some cases is a telecommunication and/or data network.
- the network 530 can include one or more computer servers, which can enable distributed computing, such as cloud computing.
- the network 530, in some cases with the aid of the computer system 501, can implement a peer-to-peer network, which may enable devices coupled to the computer system 501 to behave as a client or a server.
- the CPU 505 can execute a sequence of machine-readable instructions, which can be embodied in a program or software.
- the instructions may be stored in a memory location, such as the memory 510.
- the instructions can be directed to the CPU 505, which can subsequently program or otherwise configure the CPU 505 to implement methods of the present disclosure. Examples of operations performed by the CPU 505 can include fetch, decode, execute, and writeback.
- the CPU 505 can be part of a circuit, such as an integrated circuit.
- One or more other components of the system 501 can be included in the circuit.
- the circuit is an application specific integrated circuit (ASIC).
- the storage unit 515 can store files, such as drivers, libraries and saved programs.
- the storage unit 515 can store user data, e.g., user preferences and user programs.
- the computer system 501 in some cases can include one or more additional data storage units that are external to the computer system 501, such as located on a remote server that is in communication with the computer system 501 through an intranet or the Internet.
- the computer system 501 can communicate with one or more remote computer systems through the network 530.
- the computer system 501 can communicate with a remote computer system of a user.
- remote computer systems include personal computers (e.g., portable PC), slate or tablet PC’s (e.g., Apple® iPad, Samsung® Galaxy Tab), telephones, Smart phones (e.g., Apple® iPhone, Android-enabled device, Blackberry®), or personal digital assistants.
- the user can access the computer system 501 via the network 530.
- Methods as described herein can be implemented by way of machine (e.g., computer processor) executable code stored on an electronic storage location of the computer system 501, such as, for example, on the memory 510 or electronic storage unit 515.
- the machine executable or machine-readable code can be provided in the form of software. During use, the code can be executed by the processor 505. In some cases, the code can be retrieved from the storage unit 515 and stored on the memory 510 for ready access by the processor 505. In some situations, the electronic storage unit 515 can be precluded, and machine-executable instructions are stored on memory 510.
- the code can be pre-compiled and configured for use with a machine having a processor adapted to execute the code or can be compiled during runtime.
- the code can be supplied in a programming language that can be selected to enable the code to execute in a precompiled or as-compiled fashion.
- aspects of the systems and methods provided herein, such as the computer system 501, can be embodied in programming.
- Various aspects of the technology may be thought of as “products” or “articles of manufacture” typically in the form of machine (or processor) executable code and/or associated data that is carried on or embodied in a type of machine-readable medium.
- Machine-executable code can be stored on an electronic storage unit, such as memory (e.g., read- only memory, random-access memory, flash memory) or a hard disk.
- “Storage” type media can include any or all of the tangible memory of the computers, processors or the like, or associated modules thereof, such as various semiconductor memories, tape drives, disk drives and the like, which may provide non-transitory storage at any time for the software programming. All or portions of the software may at times be communicated through the Internet or various other telecommunication networks. Such communications, for example, may enable loading of the software from one computer or processor into another, for example, from a management server or host computer into the computer platform of an application server.
- another type of media that may bear the software elements includes optical, electrical and electromagnetic waves, such as used across physical interfaces between local devices, through wired and optical landline networks and over various air-links.
- a machine-readable medium such as computer-executable code
- a tangible storage medium such as computer-executable code
- Non-volatile storage media include, for example, optical or magnetic disks, such as any of the storage devices in any computer(s) or the like, such as may be used to implement the databases, etc. shown in the drawings.
- Volatile storage media include dynamic memory, such as main memory of such a computer platform.
- Tangible transmission media include coaxial cables; copper wire and fiber optics, including the wires that comprise a bus within a computer system.
- Carrier-wave transmission media may take the form of electric or electromagnetic signals, or acoustic or light waves such as those generated during radio frequency (RF) and infrared (IR) data communications.
- Common forms of computer-readable media therefore include for example: a floppy disk, a flexible disk, hard disk, magnetic tape, any other magnetic medium, a CD-ROM, DVD or DVD-ROM, any other optical medium, punch cards, paper tape, any other physical storage medium with patterns of holes, a RAM, a ROM, a PROM and EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave transporting data or instructions, cables or links transporting such a carrier wave, or any other medium from which a computer may read programming code and/or data.
- the computer system 501 can include or be in communication with an electronic display 535 that comprises a user interface (UI) 540 for providing, for example, sequencing data results, the identity of the modified amino acids, the identity of the polymerizable molecules, or the nucleic acid sequence of a nucleic acid molecule.
- Examples of UI’s include, without limitation, a graphical user interface (GUI) and web-based user interface.
- Methods and systems of the present disclosure can be implemented by way of one or more algorithms.
- An algorithm can be implemented by way of software upon execution by the central processing unit 505.
- the algorithm can, for example, process the sequencing data (e.g., nanopore or nanogap ion current signals), to generate one or more sequencing outputs (e.g., identification of an amino acid or a monomer of a polymerizable molecule, e.g., a nucleic acid sequence of a nucleic acid molecule).
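- To make the idea of processing ion current signals concrete, the sketch below segments a current trace into plateaus and assigns each plateau to the nearest reference level. It is a deliberately simple stand-in, not the algorithm of the disclosure: the jump threshold, minimum plateau length, reference currents, and the labels "A", "B", and "C" are all invented for illustration.

```python
def segment_levels(trace, jump=5.0, min_len=3):
    """Split a current trace into plateaus and return the mean of each plateau.

    A new plateau starts when a sample departs from the running plateau mean
    by more than `jump`, provided the current plateau has at least `min_len` samples.
    """
    if not trace:
        return []
    means = []
    current = [trace[0]]
    for x in trace[1:]:
        mean = sum(current) / len(current)
        if abs(x - mean) > jump and len(current) >= min_len:
            means.append(mean)
            current = [x]
        else:
            current.append(x)
    means.append(sum(current) / len(current))
    return means


def call_levels(segment_means, reference):
    """Label each plateau with the reference entry whose current is closest."""
    return [min(reference, key=lambda name: abs(reference[name] - m))
            for m in segment_means]


if __name__ == "__main__":
    # Hypothetical trace (arbitrary current units) with three plateaus.
    trace = [50.0, 50.5, 49.5, 50.0, 30.0, 30.5, 29.5, 40.0, 40.0, 40.0]
    reference = {"A": 50.0, "B": 30.0, "C": 41.0}  # made-up reference levels
    print(call_levels(segment_levels(trace), reference))
```

Real traces are noisy and dwell times vary, so a practical pipeline would likely need a more robust change-point or model-based method; the sketch only shows the shape of the signal-to-identity mapping.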
- a starting peptide having sequence (from N to C terminus) WYDGQKGQRK is intramolecularly expanded into a stacked plurality of modified amino acids that contains the first three amino acids of the peptide (W-Y-D), e.g., using the workflow shown in FIG. 1B.
- the peptide comprises a click chemistry moiety on the C-terminal lysine, which allows conjugation at the C-terminus to an oligonucleotide capture moiety using click chemistry (1:50 dilution of 0.
- the peptide-oligonucleotide capture moiety conjugates are purified using a cleanup kit (Oligo Clean & Concentrator™ from Zymo Research) with one or more modifications to the kit protocol, including (i) adding 20% more 100% ethanol after use of the oligo binding buffer, (ii) adding additional DNA buffer washes, (iii) centrifuging for 10 seconds at 16,000 × g, or (iv) eluting 2-3 times.
- the enriched peptide-oligonucleotide capture moiety conjugates are coupled to a bifunctional linker, 1-(2-azidoethyl)-4-isothiocyanatobenzene, which can react with the N-terminal amino group of the peptide-oligonucleotide capture moiety.
- the linker is mixed in a 50:50 ratio of 75% pyridine to water, pH 9.5, flushed with inert gas, and incubated at 50 degrees Celsius for one hour to conjugate the linker to the terminal amino acid (of the peptide-oligonucleotide capture moiety complex).
- the reaction is then cleaned using an organic phase extraction, comprising an ethyl ether wash 6 times, removal of the ethyl ether, and an inert gas flush for 3 minutes.
- the amino acid-linker complex (linker-peptide-oligonucleotide capture moiety complex) is then purified using a Zymo cleanup kit, as described above.
- a linking nucleic acid molecule is conjugated to the amino acid-linker complex.
- the linking nucleic acid molecule comprises an alkyne moiety (octadiynyl linker) that can react with the azide moiety of the bifunctional linker of the linker-peptide-oligonucleotide capture moiety complex, using the same click chemistry reaction conditions described above.
- the resultant conjugate is then validated using DNA gel electrophoresis or HPLC and purified using a Zymo cleanup kit.
- the linking nucleic acid molecule is then circularized and ligated to the oligonucleotide capture moiety using T4 ligase buffer and 5 microliters of 2000 U/microliter of T4 ligase for every nanomole of starting material.
- the ligation reaction is incubated at room temperature for one hour.
- the ligated product is then validated using gel electrophoresis and purified using a Zymo cleanup kit.
- The resultant product comprises a circularized linking nucleic acid molecule-linker-peptide-oligonucleotide capture moiety complex (see, e.g., FIG. 1B).
- Cleavage of the N-terminal amino acid is carried out using neat TFA flushed with inert gas at 50 degrees Celsius for 30 minutes. Subsequent to cleavage, the TFA is flushed out using a 50 degrees Celsius incubation and application of a continuous flow of inert gas. The TFA is then neutralized using 100 microliters of water and 5 microliters of triethylamine, with continuous addition of triethylamine until the pH is > 7. The mixture is then washed with ethyl ether, the ethyl ether is removed, and an inert gas flush for 3 minutes is performed to remove residual ethyl ether. Cleavage yields a modified amino acid that comprises the cleaved N-terminal amino acid and the linking nucleic acid molecule that are coupled to the oligonucleotide capture moiety.
- the resultant stacked plurality of modified amino acids is further processed, including ligation to a barcode oligonucleotide, second strand synthesis, and gel purification, as described elsewhere herein.
- The processed stacked plurality of modified amino acids is then run through a commercially available nanopore sequencer (Oxford Nanopore Technologies MinION).
- The resultant current levels (background subtracted) are shown in FIG. 6A as a function of time (top) or basepair position (bottom). Disruptions of the measured current (dotted boxes) are visible at the expected basepair positions, indicating that the presence of the (cleaved) amino acids causes a detectable change in the measured current signal.
- FIG. 6B shows plots of the mean measured current level (background subtracted) as a function of basepair position (left plot) and four individual current traces from individual nanopore measurements as a function of basepair position. Drops in the measured current signal are visible at the expected basepair positions (140, 240, and 340).
- A synthetic construct of a stacked (6-mer) plurality of modified amino acids comprising 6 modified amino acids is generated (P-P-D-R-P-D).
- Individual modified amino acids are first generated by conjugating barcoded oligonucleotides comprising an alkyne moiety (octadiynyl dU) with a synthetic PTC-derivatized amino acid (P, R, or D) that is conjugated to a bifunctional linker (1-(2-azidoethyl)-4-isothiocyanatobenzene).
- Individual modified amino acids are then ligated to one another in pairs to generate a stacked 2-mer plurality of modified amino acids D-R, P-P, and P-D.
- The ligation reaction is performed by combining 5 micromolar of the two modified amino acids with 5.5 micromolar of a splint oligonucleotide and 1x T4 ligase, annealing on a thermal cycler at 80 degrees Celsius, and then slow cooling to 16 degrees Celsius.
- The 2-mers are then cleaned up using SPRI (Ampure XP beads). Beads are added at an appropriate ratio, incubated on a rotator for 5 minutes at room temperature, pelleted, supernatant removed, and then washed with 80% ethanol twice. The residual ethanol is removed and the beads are dried.
- the resultant pellet is resuspended in nuclease-free water for 2 minutes, pelleted for 1 minute, and then eluted.
- The purified 2-mers are then combined and ligated together to generate the 6-mer using T4 ligase, in 1x T4 ligase buffer in water, incubating at 37 degrees Celsius for 30 minutes to form the 6-mer.
- the purified product is then visualized using gel electrophoresis (2% agarose gel).
- second strand synthesis is performed to improve the nanopore sequencing readout or for DNA quantification.
- Second strand synthesis is performed using a polymerase (e.g., Bst polymerase in ThermoPol buffer and dNTP mix), and gel extraction is performed thereafter for purifying the product (double-stranded 6-mer), followed by quantification (Qubit Fluorometer, Thermo Fisher Scientific).
- End repair is performed using Ultra II End Prep kits (New England Biolabs). Briefly, nanofiltered water (25.9 microliters) is combined with 24.1 microliters of DNA sample, 1 microliter of a DNA control sample, 3 microliters of Ultra II End prep reaction buffer, and 3 microliters of Ultra II end prep enzyme mix, incubated at 20 degrees Celsius for 5 minutes and 65 degrees Celsius for 5 minutes. 60 microliters of Ampure XP beads (Beckman Coulter) are added and rotated on a rotator for 5 minutes. The beads are then pelleted, supernatant removed, and washed with 80% ethanol twice.
- The pellet is then spun down with a magnet and residual ethanol is removed, followed by resuspension in nuclease-free water, pelleting for 1 minute, and elution.
- the eluted product is measured using a Qubit fluorometer.
- The stacked (6-mer) plurality of modified amino acids is prepared for nanopore sequencing using an Oxford Nanopore Technologies MinION nanopore sequencer.
- An adapter ligation reaction is performed: Ligation Adapter, Quick T4 ligase, ligation buffer, and the sample are combined, rotated for 10 minutes, then purified using Ampure XP beads, washing in Short Fragment Buffer, pelleting, resuspending in Elution Buffer, pelleting, and then eluting the purified (adapter-ligated) sample.
- the adapter-ligated sample (6-mer) is then loaded on a SpotON flow cell (Oxford Nanopore Technologies) according to the manufacturer’s instructions and run for ⁇ 12 hours.
- FIG. 7 shows plots of the mean measured current level (background subtracted) as a function of time (top) and basepair position (bottom). Drops in the measured current signal are visible at the expected basepair positions (140, 240, 340, 440, 540, and 640). The results indicate that the individual amino acids produce detectable disruptions in the current signatures.
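As a rough illustration of how the drops described above could be checked programmatically, the following Python sketch scans a background-subtracted mean current trace, indexed by basepair position, for a drop near each expected site. The function name `dips_at_sites`, the window half-width, and the threshold value are illustrative assumptions and are not taken from the disclosure.

```python
def dips_at_sites(current_by_bp, expected_sites, half_window=5, threshold=-5.0):
    """Report whether the current drops below `threshold` near each expected site.

    current_by_bp  : list of background-subtracted current values, one per basepair.
    expected_sites : basepair positions where a modified amino acid is expected.
    half_window    : number of basepairs searched on each side (assumed value).
    threshold      : current level counted as a dip (assumed value, arbitrary units).
    Returns a list of (site, minimum_current_near_site, is_dip) tuples.
    """
    results = []
    for site in expected_sites:
        lo = max(site - half_window, 0)
        hi = min(site + half_window + 1, len(current_by_bp))
        segment = current_by_bp[lo:hi]
        # An empty segment means the site lies outside the trace.
        minimum = min(segment) if segment else None
        is_dip = minimum is not None and minimum <= threshold
        results.append((site, minimum, is_dip))
    return results


if __name__ == "__main__":
    # Synthetic trace: flat baseline with dips placed near positions 140 and 240.
    trace = [0.0] * 700
    trace[141] = -12.0
    trace[240] = -8.0
    for site, minimum, is_dip in dips_at_sites(trace, [140, 240, 340]):
        print(site, minimum, is_dip)
```

In practice the threshold would need to be chosen relative to the noise level of the measured traces; the fixed value here only keeps the example short.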
- Example 2 - Classification of a stacked plurality of modified amino acids using nanopore sequencing.
- A stacked plurality (3-mer) of modified amino acids is generated from a peptide sequence and run through a commercially available nanopore sequencer (Oxford Nanopore Technologies MinION).
- the sequencing reads are input into a custom in-house machine learning algorithm for identification of the modified amino acids (the amino acid type comprised by the modified amino acid).
- Each of the three modified amino acid types is then barcoded by ligating a barcode oligonucleotide and adapter-ligated by ligating an ExoR-containing adapter oligonucleotide. To do so, 2.5 micromolar purified modified amino acid is combined with 3.5 micromolar barcode oligonucleotide, 3 micromolar of two splint oligonucleotides, 3.5 micromolar of the adapter oligonucleotide, and T4 ligation buffer; the reactions are annealed at 80 degrees Celsius for 2 minutes, slow cooled to 16 degrees Celsius, and then ligated using T4 ligase (1x) diluted in T4 ligation buffer and water.
- the mixture is incubated at 25 degrees Celsius for 30 minutes, then 65 degrees Celsius for 10 minutes.
- the ligation is performed at 16 degrees Celsius for 30 minutes, then 25 degrees Celsius for 90 minutes, 37 degrees Celsius for 30 minutes, 65 degrees Celsius for 10 minutes, then held at 16 degrees Celsius.
- the resultant product is validated using an E-gel, treated with Thermolabile Exo I and Exo III by incubating for 10 minutes at 37 degrees Celsius, followed by addition of 18 microliters of 0.5 M EDTA. Cleanup is performed using a Zymo cleanup kit. Second-strand synthesis is performed, followed by end repair, adapter ligation, and cleanup, as described above.
- The purified samples are then run on a SpotON flow cell (Oxford Nanopore Technologies MinION), and the sequencing reads are screened for quality and input into a custom in-house machine learning algorithm for training the algorithm for identification of the modified amino acids (the amino acid type comprised by the modified amino acid).
- Raw current traces corresponding to the modified amino acids or stacked plurality of modified amino acids are segmented and converted into a DNA base sequence using a basecalling algorithm.
- The current segmentation values are used to create a mapping between the raw current values and the corresponding DNA sequence (basecalls).
- the basecalls are aligned to the expected reference DNA sequence using an open-source sequence alignment algorithm.
- The segmentation and alignment information are then used to extract the portion of the raw current traces corresponding to 30bp windows surrounding the amino acid sites. These extracted traces are then median background subtracted and low-pass filtered, after which they are used for training or testing the custom in-house machine learning algorithm.
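The pre-processing steps described above (window extraction, median background subtraction, and low-pass filtering) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the in-house implementation: the `base_to_sample` list is a hypothetical stand-in for the segmentation-to-basecall mapping, and a centred moving average is used as the low-pass filter because the disclosure does not specify the filter type.

```python
from statistics import median


def extract_window(trace, base_to_sample, site_bp, half_width_bp=15):
    """Return the raw samples covering a 30bp window centred on `site_bp`.

    base_to_sample[i] is assumed to hold the index of the first raw sample
    assigned to base i (hypothetical representation of the segmentation).
    """
    first_base = max(site_bp - half_width_bp, 0)
    last_base = min(site_bp + half_width_bp, len(base_to_sample) - 1)
    return trace[base_to_sample[first_base]:base_to_sample[last_base]]


def subtract_median(window):
    """Median background subtraction: shift the window so its median is zero."""
    background = median(window)
    return [value - background for value in window]


def moving_average(window, width=5):
    """Simple low-pass filter (assumed): centred moving average, shorter at the edges."""
    half = width // 2
    smoothed = []
    for i in range(len(window)):
        span = window[max(i - half, 0):i + half + 1]
        smoothed.append(sum(span) / len(span))
    return smoothed


def preprocess(trace, base_to_sample, site_bp):
    """Extract, background-subtract, and smooth the window around one site."""
    window = extract_window(trace, base_to_sample, site_bp)
    return moving_average(subtract_median(window))


if __name__ == "__main__":
    # Synthetic example: 100 bases, 4 raw samples per base, flat signal of 80.
    base_to_sample = [4 * i for i in range(100)]
    trace = [80.0] * 400
    print(len(preprocess(trace, base_to_sample, site_bp=50)))
```

The resulting fixed-length windows are the kind of input a classifier could be trained on; any resampling to a common length is left out of this sketch.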
- A synthetic construct of a stacked (3-mer) plurality of modified amino acids comprising 3 modified amino acids is generated (R-P-D).
- the individual modified amino acids (PTC-derivatized) R, P, and D, are ligated to one another, and further processed (e.g., cleanup, purification, second-strand synthesis, adapter ligation), as described above, to generate the stacked (3-mer) plurality of modified amino acids.
- The stacked (3-mer) plurality of modified amino acids is run through the nanopore sequencer, and the sequencing reads are pre-processed and input into the custom in-house machine learning algorithm for identification of the modified amino acids (the amino acid type comprised by the modified amino acid).
- FIG. 8 shows normalized confusion matrices of the predicted amino acid type on the x-axis and the actual amino acid type of the modified amino acid on the y-axis.
- the left-hand plot indicates the training data set (individual modified amino acids R, P, and D). 50,000 reads are input into the model, 90% of which are used to train the model, and 10% of which are used to test the model. The overall average accuracy of the 3 amino acid types is 67.4%.
- the right-hand plot indicates the normalized confusion matrix of the stacked (3-mer) plurality of modified amino acids (R-P-D) predicted from the model, with an overall average accuracy of 48.5%.
- the schematic of the stacked (3-mer) plurality of modified amino acids is shown on the top. The overall average accuracy of the 3 amino acid types can be improved by imposing a confidence threshold, as described further below.
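To make the evaluation described for FIG. 8 concrete, the Python sketch below shows one way a 90%/10% train/test split, a row-normalized confusion matrix, and an average accuracy could be computed. It is a sketch under assumptions: the function names are hypothetical, and "overall average accuracy" is interpreted here as the mean of the diagonal of the row-normalized matrix, which the disclosure does not state explicitly.

```python
import random


def split_reads(reads, train_fraction=0.9, seed=0):
    """Shuffle reads reproducibly and split them into train and test subsets."""
    shuffled = list(reads)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def normalized_confusion(actual, predicted, labels):
    """Rows are actual types, columns are predicted types; each row sums to 1."""
    index = {label: i for i, label in enumerate(labels)}
    counts = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        counts[index[a]][index[p]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


def mean_per_class_accuracy(matrix):
    """Average of the diagonal entries of a row-normalized confusion matrix."""
    return sum(matrix[i][i] for i in range(len(matrix))) / len(matrix)


if __name__ == "__main__":
    train, test = split_reads(range(100))
    print(len(train), len(test))
    actual = ["R", "R", "P", "P", "D", "D"]
    predicted = ["R", "P", "P", "P", "D", "R"]
    matrix = normalized_confusion(actual, predicted, ["R", "P", "D"])
    print(matrix, mean_per_class_accuracy(matrix))
```

Averaging per class rather than per read keeps an over-represented amino acid type from dominating the reported accuracy; whether the reported figures were computed this way is not stated in the text.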
- Example 3 - Detection of 23 distinct modified amino acids and PTM-modified amino acids
- 18 different modified amino acids are generated and run through a commercially available nanopore sequencer (Oxford Nanopore Technologies MinION).
- the sequencing reads are input into a custom in-house machine learning algorithm for identification of the modified amino acids (the amino acid type comprised by the modified amino acid).
- Individual modified amino acids are first generated by conjugating an oligonucleotide comprising an alkyne moiety (octadiynyl dU) with a synthetic amino acid-linker complex, which comprises a PTC-derivatized amino acid (one of the 18 listed above) that is conjugated to a bifunctional linker (1-(2-azidoethyl)-4-isothiocyanatobenzene).
- 4 microliters of 400 micromolar oligonucleotides is combined with 20 microliters of 10 mM of the amino acid-linker complex, 5 microliters of THPTA (60 mM), 5 microliters of CuSO4 (20 mM), 5 microliters of sodium ascorbate (100 mM), 5 microliters of PBS, and 5 microliters of water.
- the mixture is incubated at 50 degrees Celsius for 30 minutes.
- The mixture is then cleaned up using a Zymo kit, as described above.
- The purified modified amino acids are then tested for purity and quantity using HPLC, a nanodrop analyzer, and/or gel electrophoresis.
- A barcode oligonucleotide is ligated to each of the modified amino acid types by combining 1 micromolar purified modified amino acid with 1.2 micromolar barcode oligonucleotide, 1.1 micromolar of a splint oligonucleotide, T4 ligation buffer, and 100 U/microliter T4 ligase in water. The mixture is incubated at 25 degrees Celsius for 30 minutes, then 65 degrees Celsius for 10 minutes. The barcoded modified amino acids are run on a SYBR Gold DNA gel for confirmation. The barcoded modified amino acids are then treated with Exonuclease I and prepared for nanopore sequencing using an Oxford Nanopore Technologies MinION nanopore sequencer. The samples are treated with T4 PNK and then annealed to an adapter, heat annealing at 94 degrees Celsius for 2 minutes and cooling at room temperature for 20 minutes.
- the adapter-ligated sample is then loaded on a SpotON flow cell according to the manufacturer’s instructions.
- the sequencing reads are input into a custom in-house machine learning algorithm for identification of the modified amino acids (the amino acid type comprised by the modified amino acid).
- The model is trained using the reads from the nanopore sequencing. 135,000 reads are input into the model, 90% of which are used to train the model, and 10% of which are used to test the model.
- FIG. 9 shows normalized confusion matrices of the predicted amino acid type on the x-axis and the actual amino acid type of the modified amino acid on the y-axis.
- the left-hand plot shows the normalized confusion matrix in which all data, regardless of predicted confidence (the probability that is assigned by the machine-learning algorithm for the predicted amino acid type), is included.
- the overall average accuracy of the 18 amino acid types is 37%.
- The middle plot indicates the output in which a confidence threshold is set at 0.6, improving the overall average accuracy to 58.2%.
- the right-hand plot indicates the output in which the confidence threshold is set to 0.75, yielding an overall average accuracy of 67.2%.
- FIG. 10 shows a plot of prediction accuracy as a function of the confidence threshold for each of the 18 amino acid types. For most amino acid types, increasing the confidence threshold increases the overall accuracy. The right-hand plot shows the number of reads that are removed or below the confidence threshold as well as the overall accuracy (of all 18 amino acid types) as a function of the confidence threshold.
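The effect of a confidence threshold described for FIGS. 9 and 10 can be illustrated with the short Python sketch below. It is a simplified sketch: the function name and the record layout are assumptions, and accuracy is computed as the fraction of retained reads that are classified correctly, which may differ from the per-type averaging used for the reported figures.

```python
def accuracy_at_threshold(records, threshold):
    """Compute accuracy over reads whose predicted confidence meets the threshold.

    records   : list of (actual_type, predicted_type, confidence) tuples, where
                confidence is the probability assigned to the predicted type.
    threshold : reads with confidence below this value are discarded.
    Returns (accuracy_of_retained_reads_or_None, number_of_removed_reads).
    """
    kept = [(a, p) for a, p, c in records if c >= threshold]
    removed = len(records) - len(kept)
    if not kept:
        return None, removed
    correct = sum(1 for a, p in kept if a == p)
    return correct / len(kept), removed


if __name__ == "__main__":
    # Made-up example records, for illustration only.
    records = [
        ("R", "R", 0.90),
        ("R", "P", 0.50),
        ("P", "P", 0.70),
        ("D", "R", 0.65),
        ("D", "D", 0.80),
    ]
    for threshold in (0.0, 0.6, 0.75):
        print(threshold, accuracy_at_threshold(records, threshold))
```

Raising the threshold trades read yield for accuracy, which is the relationship the right-hand plot of FIG. 10 is described as showing.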
- A modified amino acid comprising a post-translational modification is generated, as described herein, and run through a commercially available nanopore sequencer (Oxford Nanopore Technologies MinION).
- PTM-containing modified amino acids: Synthetic modified amino acids are generated for 3 different post-translational modifications: Acetyl-Lys, Methyl-Arg, and Phospho-Tyr. A modified oxidized Cys and PITC-conjugated Lys are also included.
- the individual PTM-containing modified amino acids are first generated as described above and each comprise an oligonucleotide.
- The modified amino acids are processed as described above, run through a nanopore sequencing system (Oxford Nanopore Technologies MinION), and the sequencing reads are input into a custom in-house machine learning algorithm for identification of the modified amino acids (the amino acid type comprised by the modified amino acid).
- FIG. 11 shows a normalized confusion matrix of the classified molecule, PITC-conjugated Lys (K-P), oxidized Cys (C), acetyl-Lys (Ac-K), methyl-Arg (mR), phospho-Tyr (pY), and a negative control (alkyne linker control) on the x-axis and the actual molecule on the y-axis. 7,349 reads are input into the model, 90% of which are used to train the model, and 10% of which are used to test the model. As can be visualized, classification of the correct molecule is achieved with modest accuracy, 57.3%.
- the right-hand plot shows the normalized confusion matrix for all 18 modified amino acids, the three PTM-containing modified amino acids, and the negative control.
- 10,000 reads are input into the model, 90% of which are used to train the model, and 10% of which are used to test the model.
- The overall average accuracy is approximately 41.5% (confidence threshold set at 0).
- This set of experiments aims to demonstrate the feasibility of sequentially capturing and removing 5 amino acids from the N-terminus of a peptide through an intramolecular expansion process and the feasibility of measuring the intramolecularly expanded molecule on a commercially available nanopore detection system (Oxford Nanopore Technologies MinION).
- Materials and Methods. Materials, buffers, and enzymes: Magnetic beads, Zymo oligo clean and concentrator kit (Zymo Research #D4060), EDC (Thermo Scientific #PG82079), BF3 (Sigma-Aldrich #175501), MES buffer (Thermo Scientific #28930), T4 DNA ligase buffer (NEB #B0202S), PBST (Thermo Scientific #28352), ClickP conjugation buffer (45% pyridine, 25% dimethylformamide, 20% water, 5% triethylamine), T4 DNA ligase (NEB #M0202M), T4 PNK (NEB #M0201L), FastDigest EarI (Thermo Scientific #FD0234), E-gel EX 2% agarose gel (Thermo Scientific #G402022), Thermolabile Proteinase K (NEB #P8111S), Qubit HS DNA kit (Thermo Scientific #Q32854),
- Magnetic beads are covalently conjugated with a 3’ amine anchor DNA (“anchor DNA”) using EDC. Beads are washed twice with MES buffer and then conjugated at 50 mg/mL with 0.25 mM anchor DNA and 0.25 M EDC in MES buffer for 3 hours at room temperature. After anchor DNA conjugation, the beads are washed three times with TT buffer (250 mM Tris pH 8, 0.01% Tween-20) and then stored at 10 mg/mL in TE buffer (10 mM Tris pH 8, 1 mM EDTA).
- A synthetic peptide with the sequence ERPGQ{K-TMR}GQR{K-AZIDE} (comprising a TMR fluorophore and a C-terminal azidolysine) is conjugated to a capture moiety, another DNA molecule, via the C-terminal azidolysine on the peptide and an internal DBCO on the DNA (capture moiety).
- the peptide-DNA (capture moiety) complex is purified using a Zymo Oligo clean and concentrator.
- the peptide-DNA (capture moiety) is covalently linked to the anchor-DNA on the bead surface with T4 DNA ligase. Ligation is performed for 1 hour at 37C and includes 750 pmols of peptide-DNA (capture moiety) conjugate and 5 mg of anchor DNA conjugated beads, along with complementary DNA sequences to support ligation.
- Amino acid expansion step 1 - ClickP conjugation: Beads are washed twice with PBST and then resuspended in a linker “ClickP” conjugation buffer with 100 mM ClickP and incubated at 50C for 1 hour.
- the ClickP is a bifunctional linker comprising a click chemistry moiety and an amino acid reactive group.
- ClickP refers to 1-(2-azidoethyl)-4-isothiocyanatobenzene, but it will be appreciated that other amino acid reactive groups (e.g., isothiocyanates, guanidinyl groups, dithioesters, etc.) and other reactive moieties (e.g., click chemistry moieties) are contemplated.
- Beads are washed with methanol and PBST three times each. Excess complementary DNA is added in PBST to reanneal to any single-stranded peptide-DNA (capture moiety) or anchor-DNA and linking DNA (see next step) on the bead surface.
- Amino acid expansion step 2 - linking DNA conjugation: Beads are washed twice with PBST. Beads are then contacted with a click-functionalized polymerizable molecule, a linking DNA oligo comprising an alkyne (DBCO), by resuspension of the beads in a solution containing 200 uM of the linking DNA oligo in PBST. The copper-free click reaction is run at 37C for approximately 1.5-3 hours. Beads are washed twice with PBST, then once with TE buffer.
- Amino acid expansion step 3 - linking DNA circularization: Beads are washed twice with PBST. Beads are resuspended in a ligase circularization solution with T4 DNA Ligase, T4 PNK, and 10 uM DNA splint in T4 DNA Ligase buffer. One half of the splint is designed to be partially complementary to the 5’ end of the peptide DNA (capture moiety) (cycle 1) or the 5’ end of the linking DNA oligo used in the preceding cycle (cycles 2-5); the other half of the splint is designed to be partially complementary to the 3’ end of the new linking DNA oligo added in this cycle. The ligation is run for 30 minutes at 37C. Beads are washed with PBST twice, then once with TE buffer.
- Amino acid expansion step 4 - chemical peptide cleavage: Beads are washed twice with PBST. Beads are washed with methanol and anhydrous acetonitrile. Beads are resuspended in 80 mM BF3 (Sigma-Aldrich #175501) in acetonitrile and incubated for 30 minutes at 50C. Beads are washed twice with TT buffer (250 mM Tris pH 8, 0.01% Tween-20), twice with PBST, and once with TE buffer. Cleavage of the N-terminal amino acid results in a modified amino acid that is covalently linked to the linking DNA oligo and the capture moiety. Then the cycling process continues again from step 1 (ClickP conjugation), linking the linking DNA oligo from each cycle to that of the previous cycle, until five rounds are completed.
- DNA gel analysis of QC aliquots: After each step of the expansion process, an aliquot with 0.1 mg of beads is removed to perform gel analysis of the cycling process. After the process is complete, the collected aliquots are washed twice with freshly prepared 200 mM sodium hydroxide, then washed twice with PBST and once with TE. After this wash, excess complementary DNA is added in PBST to reanneal to any single-stranded peptide-DNA (capture moiety), anchor-DNA, and linking DNA oligo on the bead surface. For this reannealing reaction, beads are incubated at 50C for 10 min, 37C for 5 min, and room temperature for 5 min.
- Beads are washed twice with PBST and twice with TE. Beads are then resuspended in a 10 uL FastDigest EarI restriction enzyme digest and incubated for 1 hour at 37C, 5 min at 80C, then cooled to 4C. After the digest is complete, 4 uL of each supernatant is run on a 2% agarose E-gel for visualization.
- Nanopore sequencing analysis of 5 cycle products: After five cycles of the expansion process are complete, an aliquot of 2 mg of beads is removed for nanopore sequencing. Before removal for sequencing, the material on the beads is ligated with a DNA adapter and sample barcode using T4 DNA ligase and T4 PNK in T4 DNA ligase buffer; this ligation is performed for 2 hours at 37C. The beads are washed twice with freshly prepared 200 mM sodium hydroxide, then twice with PBST. Beads are then treated with thermolabile proteinase K in PBST for 1 hour at 37C to remove excess C-terminal peptide residues.
- FIG. 12 shows analysis of the linking DNA oligo length from the ONT nanopore sequencing output after trimming the common adapters at the 5’ and 3’ ends of the molecule.
- the region between these two adapters contains the accumulated linking DNA oligo sequences representing up to five amino acids extracted sequentially from the N-terminus of the test peptide.
- dotted lines are used to indicate the length corresponding to one, two, three, four, and five successful ClickP-amino acid DNA capture events (i.e., generation of a stacked plurality of modified amino acids comprising five modified amino acids).
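The length-based readout described for FIG. 12 can be sketched in Python as follows: the trimmed insert length of each read is compared against multiples of the linking DNA oligo length to estimate how many capture events it contains. The oligo length, the tolerance, and the function names are hypothetical placeholders chosen for illustration; the disclosure does not give these values.

```python
from collections import Counter


def count_capture_events(insert_length, oligo_length, offset=0, tolerance=10, max_events=5):
    """Estimate the number of linking-oligo units in a trimmed read.

    insert_length : length (in bases) between the two trimmed adapters.
    oligo_length  : length of one linking DNA oligo (hypothetical value).
    offset        : constant extra sequence between the adapters, if any (assumed 0).
    tolerance     : allowed deviation in bases from the expected length (assumed).
    Returns the number of capture events, or None if no multiple fits.
    """
    if oligo_length <= 0:
        raise ValueError("oligo_length must be positive")
    n = round((insert_length - offset) / oligo_length)
    if n < 1 or n > max_events:
        return None
    expected = offset + n * oligo_length
    return n if abs(insert_length - expected) <= tolerance else None


def event_histogram(insert_lengths, oligo_length, **kwargs):
    """Tally reads by estimated number of capture events (None = unassigned)."""
    return Counter(count_capture_events(length, oligo_length, **kwargs) for length in insert_lengths)


if __name__ == "__main__":
    # Made-up insert lengths, assuming a 100-base linking oligo for illustration.
    lengths = [98, 203, 305, 402, 498, 350, 40]
    print(event_histogram(lengths, oligo_length=100))
```

Reads that fall between the expected lengths are left unassigned rather than forced into the nearest bin, which mirrors reading peaks against the dotted reference lines in the figure.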
- FIG. 13 shows an overlay of extracted ONT nanopore current traces for reads corresponding to five successful ClickP-AA DNA capture events (a stacked plurality of modified amino acids comprising five modified amino acids). Below the current trace overlay, four individual single-molecule traces are shown. The current traces show the background-subtracted current level as a function of basepair position, plotted in the 5’ to 3’ direction. The five dips in the signal correspond to the five sites of ClickP-AA capture. Due to the orientation of the plot, the leftmost dip corresponds to the fifth amino acid and the rightmost dip corresponds to the first amino acid.
- a peptide attached to an oligonucleotide is run through a nanopore sequencer and the current blockade level is measured.
- The modified amino acid or the uncleaved tripeptide conjugated to the linking DNA oligo is ligated to barcode sequences and adaptor sequences for a sequencing run on a MinION Nanopore sequencer as described in Example 3. Nanopore sequencing is carried out as described in Example 3.
- FIG. 14 shows example current traces obtained from a nanopore for the modified amino acids and the uncleaved tripeptide conjugated to the linking DNA oligo.
- the tripeptide conjugated to the linking DNA oligo generates larger current blockade than the modified amino acids comprising the individual amino acids linked to the linking DNA oligo when run on a nanopore.
- Panel A shows a plot of overlaid current traces from nanopore runs of the modified amino acids and the tripeptides conjugated to the linking DNA oligo.
- Panel B shows a scatter plot showing the signal duration and current blockade amplitude for the four different types of constructs. Each point represents an individual molecule.
- Panel C shows a bar plot showing the mean current blockade amplitude for the four different types of constructs. Error bars are standard deviation.
- protein or peptide analytes may be processed prior to analysis.
- The processing of the protein analytes comprises fragmentation (e.g., via protease digestion) and attachment of fragmented peptides to a substrate for conducting single-molecule peptide sequencing, e.g., via intramolecular expansion and nanopore analysis.
- FIG. 15 schematically shows an example workflow for preparing proteins for single-molecule peptide sequencing.
- a sample protein of interest is provided.
- The sample protein may be a protein that is provided as part of a protein mixture, e.g., arising from a biological sample such as a cell lysate or a tissue sample.
- The sample protein is fragmented, e.g., using an enzymatic approach such as LysC digestion to yield peptide fragments comprising C-terminal lysine residues.
- further modifications or processing of the sample protein or fragmented proteins is performed, e.g., alkylation (e.g., using iodoacetamide).
- a linker can be used to functionalize the amine groups (e.g., N-terminal amino acid and lysine residues) of the processed peptide fragments.
- the linker is the same linker that is used in the intramolecular expansion process described herein.
- the linker is different than the linker used in intramolecular expansion.
- the linker may comprise an amine-reactive moiety (e.g., isothiocyanate, PITC, guanidinylating group, etc.) and an additional reactive moiety, e.g., click chemistry moiety.
- the linker is conjugated to the N-terminal amino acid and the C-terminal lysine residues of the processed peptide fragments.
- additional amino acid reactions may be performed (shown as process 1507), which may occur at any useful step, e.g., before fragmentation, before a first processing (e.g., alkylation), etc.
- an Edman degradation reaction may be performed to remove the N-terminal amino acid.
- A capture moiety (e.g., a capture oligonucleotide) is provided.
- the capture moiety may comprise a reactive moiety (e.g., a click chemistry moiety) that can react with the additional reactive moiety of the linker.
- the capture moiety may be reacted with the linker located on the C-terminal lysine residue.
- the processed peptide fragments can be attached to a substrate (e.g., a DNA bead).
- the capture moiety may be ligated to an anchor DNA molecule on the bead.
- the processed peptide fragments may then be prepared for protein sequencing, e.g., using an intramolecular expansion process and nanopore sequencing readout, as described elsewhere herein.
- the expected ion masses of the digested peptides are extracted using Agilent MassHunter Qualitative Analysis software.
- Undigested or LysC-digested proteins are separated by SDS-PAGE using Bio-Rad 4-15% precast TGX (Tris-Glycine extended) stain-free gel in Tris-Glycine buffer. Protein and peptide bands are visualized by staining with Coomassie stain (Bio-Rad, 1610786).
- Iodoacetamide (IAM) is then added to a final concentration of 4 mM and incubated at 37°C for 30 minutes.
- The reactions are diluted more than fourfold by adding 220 µL of 50 mM Tris (pH 8). Subsequently, 5 pmol of LysC is added to all the reactions, followed by incubation at 37°C for 18 hours.
- the reactions are then concentrated in a speed vacuum concentrator before running LC-MS. 50 pmol of the digested peptides are separated, ionized, and analyzed by LC-MS. The expected ion masses of the digested peptides are extracted using Agilent MassHunter Qualitative Analysis software.
- the extracted peptide mass counts indicate successful cleavage of the peptides at the internal lysine residues (results not shown).
- the modification of peptides with iodoacetamide is also evaluated to maximize modification while preventing side reactions.
- Peptides (PLP(178-191), Endothelin 1, Eps) are prepared at 1 mM concentrations. For the reduction step, 5 µL of peptide (5 nmol) is incubated with 1 µL 2 M ammonium bicarbonate, 1 µL 100 mM TCEP, and 3 µL water for 10 minutes at room temperature.
- Alkylation is performed by adding 1 µL 100 mM iodoacetamide to the reduction mixture (or the starting peptide) and incubating for 30 minutes, followed by quenching with 1 µL 100 mM DTT.
- Samples are prepared by diluting 2.5 nmol of peptide to a final volume of 50 µL with water. Control samples without reduction are also analyzed. A total of 12 samples (three peptides under reduced and non-reduced, alkylated and non-alkylated conditions) are analyzed by LC-MS, along with peptide-specific controls. Data analysis is performed to determine conversion percentage and side reactions. Results (not shown) indicate quantitative reactions: the desired product is obtained without observable side reactions.
- FIG. 16 shows the overlaid analyzed extracted peptide mass intensity (EIC) peaks for three representative expected UBB LysC-digested peptides before and after ClickP conjugation. Based on the expected masses, the shown digested peptide indicates conjugation of two ClickP molecules (“+2x CP”) per digested peptide, as expected.
- Conjugation of ClickP is also performed on a MOG peptide (MEVGWYRPPFSRVVHLYRNGK). Briefly, a reaction mixture is prepared by combining 100 µL of a 1 mM MOG stock solution (resulting in 100 nmol MOG), 10 µL of a 2 M ClickP stock solution (equating to 20 µmol ClickP, a 200-fold excess), 200 µL of 0.5 M NaPi buffer at pH 7.2, 90 µL of water, and 600 µL of acetonitrile. This yields a final reaction volume of 1 mL with MOG at a concentration of 100 µM, ClickP at 20 mM, and NaPi buffer at 100 mM.
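The final concentrations and the molar excess quoted for the MOG reaction can be checked with a short calculation. The following is a minimal sketch, not part of the described method; the function and variable names are illustrative, and the stock concentrations and volumes are the ones stated above (volumes in µL, concentrations in mM).

```python
def final_concentration_mM(stock_mM, volume_uL, total_uL):
    """Concentration after diluting volume_uL of stock into total_uL."""
    return stock_mM * volume_uL / total_uL

# Component volumes (uL) and stock concentrations (mM) as stated in the text.
volumes_uL = {"MOG": 100, "ClickP": 10, "NaPi": 200, "water": 90, "acetonitrile": 600}
stocks_mM = {"MOG": 1, "ClickP": 2000, "NaPi": 500}

total_uL = sum(volumes_uL.values())
final_mM = {
    name: final_concentration_mM(stocks_mM[name], volumes_uL[name], total_uL)
    for name in stocks_mM
}

# Amounts in nmol (mM * uL = nmol), used for the molar excess of ClickP over MOG.
mog_nmol = stocks_mM["MOG"] * volumes_uL["MOG"]
clickp_nmol = stocks_mM["ClickP"] * volumes_uL["ClickP"]
molar_excess = clickp_nmol / mog_nmol

print(total_uL, final_mM, molar_excess)
```

With the stated inputs the total volume is 1000 µL, and the computed values agree with 100 µM MOG, 20 mM ClickP, 100 mM NaPi, and a 200-fold excess.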
- the next step of sample processing for protein sequencing is conjugation of capture DNA oligos to the ClickP-peptides (e.g., as depicted schematically in 1511 of FIG. 15, which may be performed prior to 1209).
- the capture DNA oligos comprise an alkyne (octadiynyl dU) and TMR fluorophore for visualization.
- The capture DNA oligo is conjugated to ClickP-peptides from the previous step. Briefly, 2.5 µL of 40 µM ClickP-peptides is combined with 1.4 µL of 10x serial dilutions of oligonucleotide (1 mM, 100 µM, 10 µM, and 1 µM), 1.3 µL of a premixed equal volume of BTTES (60 mM) and CuSO4 (20 mM), 0.65 µL of sodium ascorbate (20 mg/mL), and 0.65 µL of PBS (10x). The mixture is incubated at room temperature for 90 minutes. The reactions are run on a 10% urea DNA gel for confirmation. The peptide-capture DNA oligonucleotide conjugates are visualized through staining with SYBR Gold.
- FIG. 17 shows capture DNA oligo conjugation to the ClickP-peptide complexes.
- Lane 1 shows a DNA ladder.
- Lane 2 shows the capture DNA oligo.
- Lane 3 shows the conjugated product: the bottom band represents the unconjugated capture DNA oligo, the middle band represents one capture DNA oligo attached to the ClickP-peptide complex, and the top band represents two capture DNA oligos attached to the ClickP-peptide complex (one at the N-terminus and one at the C-terminal lysine side chain).
- Subsequent processing can include performing an Edman degradation cycle to remove the N-terminal amino acid (and capture DNA oligonucleotide), and attachment of the C-terminal capture DNA oligonucleotide to a substrate (e.g., bead or flow cell), which can facilitate the intramolecular expansion process.
- Edman degradation of the N-terminal amino acid may be performed (e.g., as depicted in 1509 of FIG. 15), prior to conjugation of the processed peptide fragment with the capture DNA oligonucleotides.
- The processed peptide fragment may comprise two ClickP linkers, one at the N-terminus and one at the C-terminal lysine side chain; subsequent Edman degradation results in a processed peptide fragment with one removed amino acid (N-terminal amino acid) and one ClickP linker remaining at the C-terminal lysine side chain.
- conjugation with a capture DNA oligo may be performed, yielding a processed peptide fragment comprising a capture DNA oligo at the C-terminal lysine residue.
- 500 µL of a 1:1 TFA/H2O mixture is added and incubated at 50°C for 30 minutes.
- The solvent is evaporated under nitrogen gas and the residue is dissolved in 200 µL of a 1:1 DMSO/water mixture.
- The solution is injected into the LC-MS.
- FIG. 18 shows the LC-MS TIC trace for the starting material (top; the ClickP-peptide fragment having two ClickP molecules, one at the N-terminus and one at the C-terminal lysine side chain) and for the product of the Edman degradation cleavage (bottom; the ClickP-peptide fragment with the N-terminal amino acid and its ClickP removed).
- As expected, a shift is observed, indicating successful N-terminal cleavage.
- Subsequent processing may be performed, e.g., conjugation of the capture DNA oligo to the C-terminal lysine ClickP linker, and attachment of the peptide-capture DNA oligo to a substrate for intramol
- spacer moieties to increase the distance between a peptide and the capture DNA oligonucleotide may improve reaction kinetics, such as for linker conjugation.
- a PEG spacer is placed between the peptide and the C-terminal capture DNA oligo.
- A synthetic peptide containing a TMR-labeled residue is used. Four conditions are tested: (1) peptide linked to the capture DNA oligonucleotide (control); (2) peptide linked to the capture DNA oligonucleotide with a PEG4 molecule located in between; (3) peptide linked to the capture DNA oligonucleotide with a PEG12 molecule located in between; and (4) peptide linked to the capture DNA oligonucleotide with a PEG24 molecule located in between.
- Bead Preparation: First, input beads are prepared by annealing magnetic beads containing anchor oligo to a splint oligo. To prepare the annealing mix, the splint oligo is used at a stock concentration of 1000 µM to achieve a target concentration of 10 µM. For each 0.1 mg of beads, 0.1 µL of splint oligo is added, corresponding to 4.4 µL for 44 reactions. PBST is added at 9.9 µL per 0.1 mg, totaling 435.6 µL for 44 reactions. The final volume of the mix is 10 µL per 0.1 mg of beads, resulting in a total of 440 µL for 44 reactions.
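The scaling of the splint annealing mix can be expressed as a small helper. This is an illustrative sketch only; the function name and the returned dictionary keys are assumptions, while the per-reaction volumes (0.1 µL splint and 9.9 µL PBST per 0.1 mg of beads) and the 1000 µM stock are taken from the text.

```python
def splint_annealing_mix(n_reactions, splint_uL=0.1, pbst_uL=9.9, stock_uM=1000.0):
    """Scale the per-reaction annealing mix (one reaction = 0.1 mg beads)."""
    per_reaction_uL = splint_uL + pbst_uL
    return {
        "splint_uL": round(n_reactions * splint_uL, 3),
        "pbst_uL": round(n_reactions * pbst_uL, 3),
        "total_uL": round(n_reactions * per_reaction_uL, 3),
        # Splint concentration after dilution into the mix.
        "splint_final_uM": round(stock_uM * splint_uL / per_reaction_uL, 3),
    }

mix = splint_annealing_mix(44)
print(mix)
```

For 44 reactions this reproduces the stated 4.4 µL of splint, 435.6 µL of PBST, a 440 µL total, and a 10 µM final splint concentration.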
- Beads are prepared by pelleting and resuspending 400 µL (4 mg) of anchor oligo-conjugated magnetic beads in 400 µL of the splint annealing mix. The bead batch is then shaken at 50°C for 10 minutes at 1400 rpm. Following this, the beads are pelleted and washed twice with 500 µL of PBST, then resuspended in 400 µL of PBST.
- Peptide Oligo Annealing and Ligation: For each peptide, annealing and ligation mixes are prepared to ensure accurate target concentrations and volumes for each peptide condition listed above.
- The peptide-capture DNA conjugates, with or without PEG spacers, are diluted to 1 and 10 µM. 6.25 µL of the 1 or 10 µM dilution of each peptide-capture DNA conjugate is then spiked into each reaction and mixed.
- Annealing is performed by gradually decreasing the temperature from 80°C to 16°C at a rate of -0.1°C/s. Subsequently, 0.5 µL of T4 DNA ligase is added to each peptide mix.
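For planning purposes, the duration of the slow-cool annealing ramp follows directly from the temperature span and the ramp rate. A minimal sketch, assuming a linear ramp; the function name is hypothetical and not taken from any thermocycler software.

```python
def ramp_duration_s(start_c, end_c, rate_c_per_s):
    """Time in seconds for a linear temperature ramp from start_c to end_c."""
    if rate_c_per_s == 0:
        raise ValueError("ramp rate must be non-zero")
    duration = (end_c - start_c) / rate_c_per_s
    if duration < 0:
        raise ValueError("ramp rate has the wrong sign for this temperature change")
    return duration

# 80 C down to 16 C at -0.1 C/s, as in the annealing step above.
seconds = ramp_duration_s(80.0, 16.0, -0.1)
print(f"{seconds:.0f} s (about {seconds / 60:.1f} min)")
```

Under these assumptions the ramp takes about 640 seconds, a little under 11 minutes.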
- 50 µL of the beads are aliquoted into 8 PCR tubes and resuspended in 25 µL of each mix.
- the peptide-capture DNA conjugate is then ligated for 30 minutes at room temperature on a rotator.
- The beads are then washed twice with 200 µL of PBST and resuspended in 125 µL of TE buffer. Two replicates are created from each condition by aliquoting 54 µL into 9 wells of a PCR strip. A 2 µL aliquot of beads is taken from each reaction and diluted in 98 µL of 0.1% Tween-20 for input qPCR quality control.
- Linker (“ClickP”) and Linking DNA Oligo Conjugation: Beads are washed twice with 100 µL of PBST. In a fume hood, the beads are pelleted and resuspended in 47.5 µL of conjugation buffer. To each tube, 2.5 µL of 2 M ClickP in ACN is added, and the reactions are thoroughly mixed. The beads are incubated at 50°C for 60 minutes on a multi-therm shaker. Following this, the beads undergo three cycles of washing, each consisting of pelleting and resuspending in 200 µL of methanol, followed by 200 µL of PBST.
- The reannealing mix is prepared, and all conditions are resuspended in 10 µL of this mix.
- the beads are shaken at 50°C for 10 minutes at 1400 rpm, then shaken for 5 minutes at 37°C at 1400 RPM, and finally incubated for 5 minutes at room temperature on the rotator.
- The beads are resuspended in 52 µL of 1x TE buffer.
- A 2 µL aliquot of beads is taken and diluted in 98 µL of 0.1% Tween-20 for post-ClickP quality control.
- The beads are washed twice with 100 µL of PBST.
- The enzyme is heat-killed on a PCR machine at 80°C for 5 minutes, followed by slow cooling to 16°C at a rate of -0.1°C/s. Each sample is then mixed with 18 µL of RNA buffer and incubated at 80°C for 5 minutes. The samples are loaded onto a 5% PAGE gel and run at 300 V for 20 minutes. The gel is analyzed for bands around 60 nt (unconjugated) and 120 nt (conjugated). The intensity of each band is quantified from the gel image using ImageJ software. % Clicked is then calculated by dividing the intensity of the “conjugated” band by the total intensity of the “conjugated” and “unconjugated” bands.
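The % Clicked calculation described above can be written as a one-line ratio. The sketch below is illustrative; the function name is an assumption, and the band intensities in the usage line are made-up placeholder values in arbitrary units, not measured data.

```python
def percent_clicked(conjugated_intensity, unconjugated_intensity):
    """Percent of material in the conjugated band, from two band intensities."""
    total = conjugated_intensity + unconjugated_intensity
    if total <= 0:
        raise ValueError("total band intensity must be positive")
    return 100.0 * conjugated_intensity / total

# Placeholder intensities (arbitrary units) for the ~120 nt and ~60 nt bands.
example = percent_clicked(conjugated_intensity=750.0, unconjugated_intensity=250.0)
print(f"% Clicked = {example:.1f}")
```

Because the value is a ratio within a single lane, it does not depend on the absolute loading of that lane, only on the background-corrected intensities of the two bands.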
- Translocation speed of a modified amino acid may be modulated using one or more different approaches. It is hypothesized that insertion of a spacer moiety into the polymerizable molecule of a modified amino acid may affect translocation of the modified amino acid through a nanopore. To test this hypothesis, amino acid-DNA constructs comprising varying lengths of a PEG spacer are run through nanopores in the Oxford Nanopore Technologies MinION device.
- Oligonucleotides with Spacers: Oligonucleotides with Int C3 Spacer, Int Spacer 9 (triethylene glycol), and Int Spacer 18 (hexaethylene glycol) spacers at varying positions relative to the site of the amino acid are synthesized by Integrated DNA Technologies.
- the modified amino acids are covalently linked to the anchor-DNA on the bead surface with T4 DNA ligase. Ligation is performed for 1 hour at 37C and includes 50 pmols of modified amino acid and 0.5 mg of anchor DNA conjugated beads, along with complementary DNA sequences to support ligation. A unique barcode for each modified amino acid is then ligated onto each bead construct, and T4 PNK is used to phosphorylate the 5’ end of the construct. The T4 ligase and T4 PNK are added to the beads and incubated in T4 ligase buffer according to the NEB T4 PNK protocol. Beads are then washed twice with NaOH, incubating for 3-5 minutes each time while shaking to remove any non-ligated fragments.
- Second Strand Synthesis and End Repair: A short strand of DNA is then reannealed onto the molecule to serve as a primer, and the beads are incubated with Bst large fragment in ThermoPol buffer, with 0.2 mM dNTP and 2 mM dATP to ensure the creation of a 5’ A-overhang for adapter ligation. After incubation at 60C in this solution for 10 minutes, beads are washed twice in PBST, and end repair is performed. Beads are then washed twice with TE buffer to prepare for digestion.
- FIG. 19 shows the mean signal duration at each base for each spacer arrangement.
- The top-left plot shows the oligo with no spacers (control condition); the top-right plot shows the oligo that has two Int C3 spacers, centered around the 10th basepair from the OctdU on the 3’ end.
- the bottom-left plot shows the oligo that has 2, Int C3 spacers centered around the 6th basepair from the OctdU on the 3’ end and 2, 3 Int C3 spacers centered around the 14th.
- The bottom-right plot shows the oligo that has 4 Int C3 spacers, each 1 basepair apart, centered around the 10th basepair from the OctdU.
- The amino acid is located around the 30th basepair. All three spacer constructs show that the mean translocation time increases around the location of the amino acid, indicating that the spacers slowed down the amino acid’s translocation through the pore.
- the delay profile for each spacer arrangement varies slightly, showing that different arrangements of spacers can delay the translocation in different ways.
- the polymerizable molecules described herein may be linked directly to amino acids (e.g., an N-terminal amino acid) to generate a modified amino acid or modified amino acid precursor.
- Generating a modified amino acid comprises contacting a polymerizable molecule comprising an amino acid reactive group with an amino acid (e.g., an N-terminal amino acid), and polymerizing the polymerizable molecule.
- the polymerizable molecule comprises a modified nucleotide comprising an amino acid reactive group (e.g., isothiocyanate, such as PITC).
- the modified nucleotide may comprise a modified uracil that comprises a PITC moiety, referred to hereinafter as “ClickP-dUTP”.
- ClickP-dUTP is a hydrophilic molecule that can ameliorate solubility issues of ClickP.
- ClickP-dUTP can also obviate the need for a click reaction and enables the use of a polymerase to connect two different DNA strands, thereby generating a modified amino acid comprising a DNA backbone (e.g., as depicted schematically in FIG. 1B (inset)).
- ClickP-dUTP conjugation is carried out in pH buffered water with ClickP-dUTP at 100 mM concentration.
- The beads are pelleted and resuspended in 100 µL of MeCN + TEA pre-wash solution, incubated at room temperature for 5 minutes while shaking, then pelleted and the supernatant removed.
- The beads are resuspended in 9 µL of coupling reaction solvent (DMF with 10% borate buffer, pH 9.5) and 1 µL of ClickP or ClickP-dUTP is added.
- The mixture is incubated at 50°C for 1 hour while shaking, followed by three cycles of washing with 100 µL of methanol and PBST.
- ACN and 80 mM BF3 in ACN are collected from sealed stock bottles using a nitrogen balloon method. Beads are pelleted and resuspended in 100 µL of methanol, then pelleted again and resuspended in 100 µL of anhydrous ACN. The beads are resuspended in 100 µL of 80 mM BF3 (anhydrous in ACN), mixed by pipetting, purged with nitrogen for ~30 seconds before sealing, and incubated at 50°C for 30 minutes at 1400 rpm on a multitherm. The beads are then washed twice with 100 µL of TT buffer, followed by washing twice with 100 µL of PBST. The beads are resuspended in 100 µL of PBST.
- FIG. 20 shows a plot of TMR fluorescence signal observed from the peptide on the bead when ClickP or ClickP-dUTP is provided at a 0 mM (negative control) condition and at 100 mM concentration.
- a drop of the TMR signal is observed, corresponding with cleavage of the N-terminal amino acid.
- ClickP and ClickP-dUTP have similar combined conjugation and cleavage efficiency, indicating that using ClickP-dUTP may be a viable alternative to ClickP. Further experiments will test polymerase extension of the ClickP-dUTP to generate a modified amino acid comprising a DNA backbone.
- The ClickP-dUTP is polymerized, using a polymerase, after conjugation to the N-terminal amino acid of a peptide, to generate the modified amino acid (e.g., as depicted schematically in FIG. 1B inset) comprising a DNA backbone.
- the beads can be washed with a wash buffer solution. Polymerase buffer is then used to wash the beads twice before adding the polymerase and primer mix. Bst polymerase (80 units) and a primer (100 pmoles) may then be added to the reaction mixture, and the beads are incubated on a shaker at 37°C for 30 minutes. Post-reaction, beads may be washed with water five times.
- the modified amino acid is generated via a chemical expansion process that does not require a polymerizable molecule.
- FIG. 21 schematically shows an example mechanism of performing intramolecular expansion of a peptide via incorporation of an amino acid between individual amino acids of a peptide.
- A peptide is reacted with thionyl chloride (SOCl2) at elevated temperature (e.g., 70 degrees Celsius) for 3 hours, then reacted with a spacer molecule, such as another amino acid (e.g., glycine), that is to be incorporated between the individual amino acids of the peptide.
- Glycine is combined with triethylamine and 1,2-dichloroethane and reacted with the peptide at 70 degrees Celsius for 12 hours.
- the resulting product incorporates the glycine between the individual amino acids of the peptide.
- This process may be repeated N times, with the same or different spacer molecule, to achieve any useful spacing or distance between the individual amino acids of the peptide.
- FIG. 22A schematically illustrates alternative example workflows to those shown in FIG. 1B.
- a polymeric analyte 2203 and a capture moiety 2205 are provided.
- the polymeric analyte 2203 is a peptide
- The capture moiety 2205 comprises a DNA molecule (“capture DNA molecule”) that comprises an EarI restriction site and is covalently coupled to the peptide.
- the peptide is coupled to a substrate, a magnetic bead, via the capture DNA molecule, which is ligated to an anchor oligonucleotide on the bead.
- a linker 2209 and polymerizable molecule 2211 are provided.
- the linker 2209 is pre-conjugated to the polymerizable molecule 2211, which is a linking nucleic acid molecule.
- the linker 2209 comprises an isothiocyanate moiety that can conjugate directly to the N-terminal amino acid of the peptide.
- The peptide and the linker 2209 and polymerizable molecule 2211 coupled thereto are removed from the bead using an EarI restriction digest.
- In process 2212, the linking nucleic acid molecule is coupled via ligation to the capture DNA molecule (not shown) or to the amino acid-linker complex of the previous round.
- In process 2214, the ligated construct is re-attached to the bead.
- In process 2213, the N-terminal amino acid of the peptide is cleaved, revealing a new N-terminal amino acid.
- Example Workflows 2, 3, and 4 show additional alternative workflows.
- the DNA capture molecule is hybridized to an anchor oligonucleotide on the bead. Conjugation to a linker and linking nucleic acid molecule is performed, as described above.
- In process 2207b, the peptide, linker, and linking nucleic acid molecule coupled thereto are removed from the bead using dehybridization (e.g., heat denaturation). Processes 2212, 2213, and 2214 are then sequentially performed.
- Example Workflow 3 shows an additional alternative workflow similar to that of Example Workflow 2 but with process 2214 being performed prior to process 2213.
- The capture DNA molecule and/or the anchor molecule may comprise a more reaction-resistant moiety, e.g., PNA.
- Example Workflow 4 shows an additional alternative workflow similar to that of Example Workflow 3.
- In process 2207c, the peptide, linker, and linking nucleic acid molecule coupled thereto are removed from the bead using toehold displacement.
- Processes 2212, 2214, and 2213 are then sequentially performed.
- a digestion of the strand-displacing molecule is performed in process 2214 to facilitate reattachment of the ligated construct to the bead.
- Magnetic beads are covalently conjugated with an amine anchor DNA (“anchor DNA”), as described above.
- a test peptide-DNA conjugate (representing a peptide polymeric analyte coupled to an oligonucleotide capture moiety) is coupled to the magnetic bead via splinted ligation.
- The peptide has the sequence DRGWSMNGQGQ{Lys(TMR)}GQGQ{Lys(N3)} and is conjugated to an oligo /5Phos/CTTCCCTCCTCCTCCTTTTTCTCCCTTCCTTTTCCCTCCCTTCTCTCCTC/iAmMC6T/TTCCCTC/i5OctdU/CCCTTTCTCTTCT, via the C-terminal azidolysine on the peptide and an internal DBCO on the oligo.
- The peptide-DNA also contains an ATTO647N label, which allows for imaging of the beads via flow cytometry.
- the peptide-DNA conjugate is covalently linked to the anchor DNA on the bead surface with T4 DNA ligase.
- Ligation is performed for 30 minutes at 37C and includes 12.5 pmol/mg of peptide-DNA conjugate and 4.5 mg of anchor DNA conjugated beads, along with complementary DNA sequences to support ligation. Following ligation, the beads are washed twice in 10 mM phosphate buffer pH 8 with 0.01% Tween-20 (“PT buffer”), then 0.1 M NaOH, then PT buffer again.
- Amino acid expansion step 1 - ClickP-oligo conjugation: Beads are resuspended in a conjugation buffer comprising a linker-linking nucleic acid conjugate (“ClickP-oligo”).
- The linker, “ClickP”, comprises an isothiocyanate, which is an amino acid reactive group. Beads are incubated at 65C for 5 hours, then slow cooled to 10C and left overnight. Following conjugation, beads are washed with PT buffer.
- Amino acid expansion step 2 - removal from beads, linking DNA circularization, recapture:
- Beads are pelleted and resuspended in a removal buffer to remove the peptide-DNA conjugates from the bead.
- The removal buffer comprises FastDigest Buffer and EarI enzyme.
- the beads are incubated at 37C on a shaker at 900 RPM for 60 minutes.
- The supernatant (containing the peptide-DNA conjugates) is collected, and the beads are treated with water to further elute any residual peptide-DNA conjugates.
- the supernatant is combined with the previously collected supernatant.
- Ligation is then performed in solution: a DNA splint is added in an annealing buffer, the EarI is heat-killed at 80C for 5 minutes followed by a slow cool, and the sample is then incubated in a ligase circularization solution comprising T4 DNA Ligase and T4 DNA Ligase buffer.
- A portion of the splint is designed to be partially complementary to the 5’ end of the peptide DNA (capture moiety) (cycle 1) or the 5’ end of the linking DNA oligo used in the preceding cycle (cycles 2-5); the other half of the splint is designed to be partially complementary to the 3’ end of the new linking DNA oligo added in this cycle.
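The splint geometry described above can be illustrated with a short sequence-design sketch. This is a simplified assumption-laden example, not the design procedure used here: it assumes the splint is fully complementary over a fixed arm length on each side of the ligation junction (the text only says "partially complementary"), the arm length of 8 nt is arbitrary, and the linking oligo sequence is a made-up placeholder. The capture strand prefix is the first 15 bases of the capture oligo given earlier.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an unmodified DNA sequence written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def design_splint(new_linker, previous_strand, arm=8):
    """Splint (5'->3') bridging the 3' end of new_linker and the 5' end of previous_strand."""
    if arm > len(new_linker) or arm > len(previous_strand):
        raise ValueError("arm is longer than one of the strands")
    # Junction as read 5'->3' on the ligated product:
    # last `arm` bases of the new linker, then first `arm` bases of the previous strand.
    junction = new_linker[-arm:] + previous_strand[:arm]
    return reverse_complement(junction)

# Hypothetical linking oligo; capture-strand prefix copied from the oligo listed above.
linker = "GATTACAGGCATCGTACCAG"
capture_5prime = "CTTCCCTCCTCCTCC"
splint = design_splint(linker, capture_5prime, arm=8)
print(splint)
```

In later cycles the same function would be called with the previous cycle's linking oligo as `previous_strand`, matching the cycle 1 versus cycles 2-5 distinction in the text.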
- the ligation is run for 30 minutes at 37C.
- the peptide-DNA conjugates, now circularized and ligated to the linking DNA oligo, are then re-attached to or recaptured onto a new set of the anchor DNA conjugated beads.
- the anchor DNA-conjugated beads are first pelleted and washed in PT Buffer and then resuspended in a ligation buffer comprising T4 ligase buffer and T4 ligase.
- the beads are added to the ligation mixture comprising the peptide-DNA conjugates; the mixture is then mixed and incubated at 37C on a multitherm.
- the mixture is then pelleted and resuspended in 1 mL of PT Buffer and washed twice.
- the beads are then resuspended with a buffer comprising T4 ligase buffer and T4 PNK.
- Amino acid expansion step 4 - chemical peptide cleavage: Beads are washed with methanol, followed by a wash in anhydrous acetonitrile. Beads are resuspended in a cleavage solution comprising either trifluoroacetic acid (TFA) or 80 mM BF3 (Sigma-Aldrich #175501) and incubated for 15 minutes at 50C. Cleavage of the N-terminal amino acid results in a modified amino acid that is covalently linked to the linking DNA oligo and the DNA capture moiety. Beads are washed twice with acetonitrile and then washed 2-3 times, for 5 minutes each at room temperature, with a solution of 0.1 M NaOH and 100 mM DTT. The beads are then pelleted and washed in PT buffer. A sample is then aliquoted for gel analysis and flow cytometry analysis.
- DNA gel analysis and flow cytometry analysis of QC aliquots: After each step of the expansion process, an aliquot of beads is removed to perform gel analysis of the cycling process. Beads are then resuspended in a 10 uL FastDigest EarI restriction enzyme digest mix and incubated for 1 hour at 37C at 1400 RPM. After the digest is complete, formamide is added to the tubes, which are incubated at 98C for two minutes, and then 5 uL of each supernatant is run on a 5% urea gel for visualization. An aliquot of beads is also analyzed using flow cytometry in the 647 channel.
- FIG. 22B shows example data of the flow cytometry and DNA gel analysis after one cycle of Example Workflow 1.
- The flow cytometry plot shows a histogram of the output from each of the operations (from top to bottom: negative control, input of peptide-oligo, post-ClickP-oligo conjugation, post-ligation and recapture, and post-cleavage).
- the relative amount of material retained on the beads relative to the input (100%) is 92%, 39% and 37%, respectively.
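The retention values above are cumulative relative to the input, so the per-operation yield is the ratio of consecutive values. The following sketch assumes the three reported percentages correspond, in order, to post-ClickP-oligo conjugation, post-ligation and recapture, and post-cleavage; the function name is illustrative.

```python
def stepwise_yields(cumulative_percent):
    """Per-step yield (%) from cumulative retention values, input first."""
    return [
        100.0 * later / earlier
        for earlier, later in zip(cumulative_percent, cumulative_percent[1:])
    ]

# Input (100%) followed by the three reported retention values.
retained = [100.0, 92.0, 39.0, 37.0]
steps = ["ClickP-oligo conjugation", "ligation and recapture", "cleavage"]
yields = stepwise_yields(retained)

for name, value in zip(steps, yields):
    print(f"{name}: {value:.1f}% of the preceding step retained")
```

Read this way, the conjugation and cleavage operations each retain more than 90% of the material present at the start of the operation, while ligation and recapture retains roughly 42%, which accounts for most of the loss in the cycle.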
- The right-hand panel shows the DNA gel data for the output from each of the operations (from left to right: input of peptide-oligo, post-ClickP-oligo conjugation, post-ligation and recapture, post-cleavage, and after a second cycle of conjugation).
- Magnetic beads are covalently conjugated with a 5’amine anchor DNA (“anchor DNA”), as described above.
- A test peptide-DNA (capture moiety) conjugate (representing a peptide polymeric analyte coupled to an oligonucleotide capture moiety) is coupled to the magnetic bead via complementary annealing.
- The peptide has the sequence DRGWSMNGQGQ{Lys(TMR)}GQGQ{Lys(N3)} and is conjugated to an oligonucleotide capture moiety having the sequence /5Phos/CTTCCCTCCTCCTCCTTTTTCTTCCTTTTCCCTCCCTTCTCTCCTC/iAmMC6T/TTCCCTC/i5OctdU/CCCTTTCTCTTCT via the C-terminal azidolysine on the peptide and an internal DBCO on the oligo.
- The peptide-DNA also contains an ATTO647N label, which allows for imaging of the beads via flow cytometry.
- The peptide-DNA is annealed to the complementary anchor DNA on the bead surface. Annealing is performed by first combining 18.75 picomoles of peptide-DNA conjugate and 0.3 mg of anchor DNA conjugated beads (final concentration of 6.25 picomoles per mg). After mixing, the beads are incubated at 50C on a shaker at 900 RPM for 10 minutes, followed by a slow cool to 16C. Following annealing, the beads are washed with PBST and then resuspended in PBST to 10 mg/mL.
- Amino acid expansion step 1 - ClickP-oligo conjugation: Beads are resuspended in a conjugation buffer comprising a linker-linking nucleic acid conjugate (“ClickP-oligo”).
- the ClickP comprises phenylisothiocyanate, which is an amino acid reactive group. Beads are incubated at 65C for 5 hours, then slow cooled to 10C and left overnight. Following conjugation, beads are washed with PBST buffer and resuspended in PBST to 10 mg/mL.
- Amino acid expansion step 2 - removal from beads, linking DNA circularization, recapture:
- Beads are pelleted and resuspended in water.
- the beads are then incubated at 98C for two minutes to elute the peptide-DNA.
- The supernatant (containing the peptide-DNA) is collected into a separate tube.
- the original beads are resuspended in 100 uL of water for later analysis. Ligation is then performed in-solution using an annealing buffer comprising a DNA splint diluted in a ligase circularization solution comprising T4 DNA Ligase and T4 DNA Ligase buffer.
- A portion of the splint is designed to be partially complementary to the 5’ end of the peptide DNA (cycle 1) or the 5’ end of the linking nucleic acid molecule (“linking DNA oligo”) used in the preceding cycle (cycles 2 onward); the other half of the splint is designed to be partially complementary to the 3’ end of the new linking DNA oligo added in this cycle.
- the ligation is run for 30 minutes at 37C.
- the sample is then incubated at 65C for ten minutes to heat kill the T4 ligase.
- A PNK treatment is then performed by spiking T4 PNK enzyme into each reaction to a final concentration of 200 u/mL, followed by a 30-minute incubation at 37C. This treatment converts the hydroxyl on the 5’ end of the new linking DNA oligo to a phosphate group to allow for addition of the next linker in future cycles.
- Amino acid expansion step 4 - chemical peptide cleavage: Trifluoroacetic acid (TFA) is added to each supernatant to a final concentration of 10 or 20% and incubated for 30 minutes at 50C. Cleavage of the N-terminal amino acid results in a modified amino acid that is covalently linked to the linking DNA oligo and the oligo capture moiety. Each reaction is neutralized by adding 10 M imidazole with either 3.68 M or 7.36 M NaOH, for the 10% and 20% reactions, respectively.
- The peptide-DNA conjugates, now ligated to the linking DNA oligo and linearized, are then re-attached to or recaptured onto a new set of the anchor DNA conjugated beads.
- The anchor DNA-conjugated beads are first pelleted and washed in 10 mM phosphate buffer and 0.01% Tween-20 and resuspended to a concentration of 10 mg/mL.
- the beads are resuspended in the neutralized reaction and mixed.
- the peptide-DNA conjugates are then annealed to the complementary anchor by incubating at 50C for 10 minutes while shaking at 900RPM, followed by a slow cool to 16C.
- DNA gel analysis and flow cytometry analysis of QC aliquots: After each step of the expansion process, an aliquot of beads is removed to perform gel analysis of the cycling process. Beads are then resuspended in 20 uL of formamide and incubated for 2 minutes at 98C to elute DNA. After elution, 10 uL of each supernatant is run on a 5% urea gel for visualization. An aliquot of beads is also analyzed using flow cytometry in the 647 channel.
- FIG. 22C shows example data of the flow cytometry and DNA gel analysis through 2 cycles of Example Workflow 2.
- The flow cytometry plot shows a histogram of the output from each of the operations (from left to right: input, conjugation 1, recapture 1, conjugation 2, recapture 2, post-elution 1, and post-elution 2; this sample order is repeated for 20 and 10% TFA). Approximately 17% of the original material is retained on the bead after one cycle.
- the right-hand panel shows the DNA gel data showing the output from each of the operations (from left to right: input of peptide-oligo, post-ClickP-oligo conjugation, post-ligation, cleavage and recapture, post-ClickP-oligo cycle 2 conjugation, and post cycle 2 ligation, cleavage, and recapture).
- the average efficiency of ClickP-oligo conjugation is 78.6%.
- Magnetic beads are covalently conjugated with an N-terminal amine anchor peptide nucleic acid (“anchor PNA”), as described above.
- the anchor PNA sequence (AGAAGAGAAAGGGAG) is designed to hybridize to a portion of the test peptide-DNA conjugate so that the test peptide-DNA conjugate (representing a peptide polymeric analyte coupled to an oligonucleotide capture moiety) is coupled to the magnetic bead via complementary annealing.
- A DNA oligo “AP604” is synthesized with an internal amine and alkyne moiety (/5Phos/CTTCCCTCCTCCTCCTTTTTCTCCCTTCCTTTTCCCTCCCTTCTCTCCTC/iAmMC6T/TTCCCTC/i5OctdU/CCCTTTCTCTTCT), the internal alkyne (DBCO) is conjugated to an Azide-Atto647N fluorophore, and then the internal amine is conjugated to an NHS-PEG4-DBCO spacer.
- The peptide has the sequence DRGWSMNGQGQ{Lys(TMR)}GQGQ{Lys(N3)} and is conjugated to the DNA sequence AP604 via the C-terminal azidolysine on the peptide and the internal PEG4-DBCO.
- the resulting peptide-DNA contains an ATTO647N label which allows for imaging of the beads via flow cytometry.
- the peptide-DNA is annealed to the complementary anchor PNA on the bead surface. Annealing is performed by first combining 25 picomoles of peptide-DNA conjugate and 0.5 mg of anchor PNA conjugated beads (Final concentration of 25 picomoles per mg).
- the beads are incubated at 50C on a shaker at 900 RPM for 10 minutes, followed by a slow cool to 16C. Following annealing, the beads are washed with PT buffer (10 mM phosphate pH 8 with 0.01% Tween-20) and then resuspended in PT buffer to 10 mg/mL.
- Amino acid expansion step 1 - ClickP-oligo conjugation: Beads are resuspended in a conjugation buffer comprising a linker-linking nucleic acid conjugate (“ClickP-oligo”).
- the ClickP comprises an isothiocyanate, which is an amino acid reactive group. Beads are incubated at 62C for 3 hours, then slow cooled to 10C and left overnight. Following conjugation, beads are washed with PT buffer and resuspended in PT to 10 mg/mL.
- Amino acid expansion step 2 - removal from beads, linking DNA circularization, recapture:
- Beads are pelleted and resuspended in water with 7.5% DMSO.
- the beads are then incubated at 98C for two minutes to elute the peptide-DNA conjugates.
- the supernatant (containing the peptide-DNA conjugates) is collected into a separate tube.
- the original beads are resuspended in 50 uL of PT buffer for later analysis. Ligation is then performed in-solution using an annealing buffer comprising a DNA splint diluted in a ligase circularization solution comprising T4 DNA Ligase and T4 DNA Ligase buffer.
- a portion of the splint is designed to be partially complementary to the 5’ end of the peptide-DNA conjugate (cycle 1) or the 5’ end of the linking DNA oligo used in the preceding cycle (cycles 2 onward); the other half of the splint is designed to be partially complementary to the 3’ end of the new linking DNA oligo added in this cycle.
- the ligation is run for 30 minutes at 37C.
- the sample is then incubated at 65C for ten minutes to heat kill the T4 ligase.
- a PNK treatment is then performed by spiking T4 PNK enzyme into each reaction to a final concentration of 200 u/mL, followed by a 30-minute incubation at 37C. This treatment converts the hydroxyl on the 5’ end of the new linking DNA oligo to a phosphate group to allow for addition of the next linker in future cycles.
- the peptide-DNA conjugates, now circularized and ligated to the linking DNA, are then re-attached to or recaptured onto a new set of the anchor PNA conjugated beads.
- the PNA-conjugated beads are first pelleted and washed in PT buffer (10 mM phosphate buffer and 0.01% Tween-20) and then added to the reaction with peptide-DNA molecules. After mixing, the beads are incubated at 80C on a shaker at 900 RPM for 5 minutes, followed by a slow cool to 16C.
- Cleavage of the N-terminal amino acid results in a modified amino acid that is covalently linked to the linking DNA oligo and the DNA capture moiety.
- the cleavage reaction is cooled and then 100 uL of 2M sodium phosphate pH 7 is added to neutralize the reaction. Following neutralization, the beads are washed with PT buffer (10 mM phosphate pH 8 with 0.01% Tween-20) and then resuspended in PT buffer to 10 mg/mL.
- DNA gel analysis and flow cytometry analysis of QC aliquots. After each step of the expansion process, an aliquot of beads is removed to perform gel analysis of the cycling process. Beads are then resuspended in 20 uL of formamide and incubated for 2 minutes at 98C to elute DNA. After elution, 10 uL of each supernatant is run on a 5% urea gel for visualization. An aliquot of beads is also analyzed using flow cytometry in the 647 channel.
- FIG. 22D shows example data of the flow cytometry and DNA gel analysis through one cycle of Example Workflow 3.
- the flow cytometry plot shows a histogram of the output from each of the operations (from top to bottom: input, conjugation, elution, PNA recapture, cleavage, and negative control beads). After one cycle approximately 40.1% of the original material is retained based on bead intensity compared to the input.
- the right-hand panel shows the DNA gel data showing the output from each of the operations (from left to right: input of peptide-oligo, post-ClickP-oligo conjugation, post-ligation and recapture, post-cleavage).
- Based on gel densitometry analysis, it is estimated that the average efficiency of ClickP-oligo conjugation is 42.1%.
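To put the single-cycle retention figure in context, the sketch below projects how much input material would remain after several cycles. It assumes, purely for illustration, that every cycle retains the same fraction as the one measured cycle (about 40.1%); the source reports data for one cycle only, and the function name `cumulative_retention` is hypothetical.

```python
def cumulative_retention(per_cycle: float, cycles: int) -> float:
    """Fraction of input remaining after `cycles` cycles.

    Assumption (not from the source): each cycle retains the same
    fraction of material, so losses compound multiplicatively.
    """
    if not 0.0 <= per_cycle <= 1.0:
        raise ValueError("per_cycle must be between 0 and 1")
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return per_cycle ** cycles


# 0.401 is the one-cycle retention reported for FIG. 22D.
for n in range(1, 4):
    print(n, round(cumulative_retention(0.401, n), 4))
```

Under this assumption the retained fraction falls off geometrically, which is why per-cycle recapture and conjugation efficiencies are tracked at each step.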
- Magnetic beads with the peptide-DNA conjugate are prepared as described in Example Workflow 3, albeit with a different peptide-oligo sequence.
- the peptide-oligo comprises an ATTO647N label which allows for visualization of the beads via flow cytometry. 10 microliters of the bead suspension are prepared for each sample condition.
- An RNA displacement oligonucleotide is used to elute or remove the peptide-oligo conjugates from the bead via toehold-mediated strand displacement.
- the RNA displacement oligo is prepared at a concentration of 100 micromolar in T4 ligase buffer and water (the 2X displacement buffer). A final concentration of 50 micromolar of the RNA displacement oligo is used based on a previous experiment which demonstrated 95% elution efficiency (data not shown).
- 10 uL of the 2X displacement buffer is combined with the beads to form a 1X displacement buffer with beads. The beads are incubated at 98°C for 2 minutes followed by 80°C for 5 minutes. The beads are then transferred to a heater shaker and incubated at 37°C for 15 minutes while shaking.
- One well for displacement control is set aside to analyze via flow cytometry.
- RNA displacement oligonucleotide is digested using RNase to promote reannealing (recapture) of the peptide-oligo conjugates to the bead.
- RNAse A and RNAse I are both tested at varying input volumes (5, 1, 0.2, 0.04, 0.008, and 0 (control) microliters).
- the RNAse A reactions are conducted at 65C for 30 mins and the RNAse I reactions at 37C for 30 mins on the heater shaker. After incubating, 0.5 uL of 500 mM EDTA and 180 uL of water are added to all reactions to achieve a 1.25 mM final EDTA concentration.
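The concentrations quoted in this workflow follow from simple dilution arithmetic (stock concentration times stock volume, divided by total volume). The sketch below checks two of them. It assumes a nominal 20 microliter reaction volume before the EDTA and water are added, and that the 2X displacement buffer is mixed with an equal 10 microliter volume of bead suspension; the helper name `final_concentration` is hypothetical.

```python
def final_concentration(stock_conc: float, stock_ul: float,
                        other_volumes_ul: list[float]) -> float:
    """Concentration after diluting `stock_ul` of stock into the other volumes.

    The result is in the same unit as `stock_conc`.
    """
    total_ul = stock_ul + sum(other_volumes_ul)
    if total_ul <= 0:
        raise ValueError("total volume must be positive")
    return stock_conc * stock_ul / total_ul


# EDTA: 0.5 uL of 500 mM stock, 180 uL water, and an assumed 20 uL reaction.
edta_mM = final_concentration(500.0, 0.5, [180.0, 20.0])
print(f"EDTA: {edta_mM:.3f} mM")

# Displacement oligo: 10 uL of 2X buffer (100 uM) plus an assumed 10 uL of beads.
oligo_uM = final_concentration(100.0, 10.0, [10.0])
print(f"Displacement oligo: {oligo_uM:.1f} uM")
```

With these assumed volumes the EDTA works out to just under 1.25 mM and the displacement oligo to 50 micromolar, consistent with the values stated in the text.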
- the samples are then vortexed and then incubated at 37C while shaking in order to recapture the peptide-oligo conjugates. All reactions are washed twice with 100 uL of PT buffer, followed by pelleting the beads and removing the supernatant after each wash. The beads are resuspended in 100 uL of PT buffer.
- FIG. 22E shows the data from the flow cytometer in which the fluorescence of the Atto647 oligo is measured on the bead. All values are normalized by subtracting the value of the autofluorescence of the PNA beads with no oligo hybridized onto them. Conditions with no RNAse added also serve as a displacement control in which ~99% of all the peptide-oligo conjugates are removed from the beads.
- RNAse I has similar performance to RNAse A. The optimal volume of RNAse A and RNAse I is 1 uL for a 20 uL reaction and yields ~94% recapture efficiency.
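The background subtraction described for FIG. 22E can be sketched as follows. The source states only that bead autofluorescence is subtracted; expressing recapture as the ratio of background-subtracted sample signal to background-subtracted input signal is an assumption made here for illustration, and the intensity values are made up rather than taken from the figure.

```python
def recapture_efficiency(sample: float, input_signal: float,
                         autofluorescence: float) -> float:
    """Background-subtracted sample signal as a fraction of input signal.

    Assumption: efficiency is computed relative to beads measured before
    displacement, after subtracting bare-bead autofluorescence from both.
    """
    corrected_input = input_signal - autofluorescence
    if corrected_input <= 0:
        raise ValueError("input signal must exceed autofluorescence")
    return (sample - autofluorescence) / corrected_input


# Hypothetical median intensities in arbitrary units (not data from the source).
bare_beads = 100.0
before_displacement = 10_100.0
after_recapture = 9_500.0
print(f"{recapture_efficiency(after_recapture, before_displacement, bare_beads):.1%}")
```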
- intramolecular circularization may occur during the intramolecular expansion process, e.g., ligation of a polymerizable molecule (e.g., linking nucleic acid molecule) to the capture moiety or to a modified amino acid comprising a cleaved amino acid, linker, and polymerizable molecule from a previous round/cycle.
- intermolecular crosstalk e.g., ligation of a polymerizable molecule coupled to one polymeric analyte to a capture moiety of an adjacent polymeric analyte
- the four oligos with an internal azide moiety are (CTCTTTTTCCTCAGCGCTAGAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGCTT/iAzideN/CTCCTCCTCTTCT, CTCTTTTTCCTCGATATCGAAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGCTT/iAzideN/CTCCTCCTCTTCT, CTCTTTTTCCTCCGCAGACGAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGCTT/iAzideN/CTCCTCCTCTTCT, CTCTTTTTCCTCTATGAGTAAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGCTT/iAzideN/CTCCTCCTCTTCT)
- DBCO oligo which are each conjugated to a corresponding barcoded 5’ DBCO oligo (/5DBCON/CCTGTGACTGGAGTTCAGACGTGTAACCGCGGTCTCCTTT, /5DBCON/CCTGTGACTGGAGTTCAGACGTGTGGTTATAATCTCCTTT, /5DBCON/CCTGTGACTGGAGTTCAGACGTGTCCAAGTCCTCTCCTTT, /5DBCON/CCTGTGACTGGAGTTCAGACGTGTTTGGACTTTCTCTCCTTT) using a copper-free click reaction, and corresponding Azide complementary oligos (AAGCACTCTTTCCCTACACGACGCTCTTCCGATCTCTAGCGCT, AAGCACTCTTTCCCTACACGACGCTCTTCCGATCTTCGATATC, AAGCACTCTTTCCCTACACGACGCTCTTCCGATCTCTGCG, AAGCACTCTTTCCCTACACGACGCTCTTCCGATCTCTGCG, AAGCACTCTTTCCCTACACGACGCTCTT
- Magnetic beads that have been covalently conjugated with anchor DNA are coupled to the oligonucleotide constructs (described in previous section) via splinted ligation.
- the oligonucleotide construct is covalently linked to the anchor DNA on the bead surface with T4 ligase. Ligation is performed for 30 minutes at room temperature and includes 3.125 picomoles of the oligo construct pool for every 0.25 mg of anchor DNA conjugated beads, along with complementary DNA sequences to support ligation.
- Beads are pelleted and resuspended in a removal buffer to remove the oligonucleotide construct from the bead.
- the removal buffer comprises FastDigest Buffer and EarI enzyme.
- the beads are incubated at 37C on a shaker at 900 RPM for 60 minutes.
- the supernatant (containing the oligonucleotide construct) is collected.
- Ligation is performed in solution by incubating in a ligase circularization solution containing T4 ligase, T4 DNA ligase buffer, T4 PNK, and a DNA splint (either linear or hairpin).
- a portion of the splint is designed to be partially complementary to the 5’ end of the azide-containing DNA oligo, and a different portion of the splint is designed to be complementary to the 3’ end of the 5’ DBCO oligo.
- the ligation is run at 37C for 30 minutes, and the solution is incubated at 65C for 10 minutes in order to heat kill the T4 ligase and T4 PNK.
- the solution is purified using a Zymo Oligo Clean and Concentrator Kit, and is eluted in nuclease free water.
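The splint geometry described above can be illustrated with a short sketch: a linear splint that bridges a ligation junction is the reverse complement of the 3’ end of one oligo followed by the 5’ end of the other. The end sequences used below are the 3’ terminus shared by the 5’ DBCO oligos and the 5’ terminus shared by the azide-containing oligos listed in this example; the arm lengths (8 and 12 nt) and the function names are assumptions chosen for illustration, not a statement of how the splints in this example were designed.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def linear_splint(upstream_3prime_end: str, downstream_5prime_end: str) -> str:
    """Splint that base-pairs across the junction upstream-3' | 5'-downstream.

    The ligated strand reads (5'->3') upstream end then downstream end, so
    the bridging splint is the reverse complement of that junction.
    """
    return reverse_complement(upstream_3prime_end + downstream_5prime_end)


dbco_oligo_3prime_end = "TCTCCTTT"       # last 8 nt of each 5' DBCO oligo
azide_oligo_5prime_end = "CTCTTTTTCCTC"  # first 12 nt of each azide oligo
print(linear_splint(dbco_oligo_3prime_end, azide_oligo_5prime_end))
```

Because both end sequences are common to all four barcoded constructs, a single splint of this form would bridge any of the four junctions.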
- the indexed PCR products from each reaction condition are then pooled into a library and cleaned using a Zymo DNA clean and concentrator kit.
- the cleaned PCR library is then prepared for nanopore sequencing using the ONT Ligation Sequencing Kit and sequenced using an ONT MinION nanopore sequencer.
- FIG. 23 shows data from nanopore sequencing used to determine intermolecular crosstalk rate in the peptide-free model system. Due to the barcode pairs in the oligonucleotide constructs, a PCR product containing a matching barcode pair indicates intramolecular circularization and ligation, while a PCR product containing a mismatch of barcode pairs indicates intermolecular crosstalk.
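The barcode-pair logic above can be sketched as a small read classifier. It assumes each mapped read has already been reduced to a pair of barcode labels (one from the azide-containing oligo, one from the 5’ DBCO oligo) and that the crosstalk rate is the fraction of reads whose pair is mismatched; both the labels and that definition of the rate are illustrative assumptions, since the source does not spell out the calculation.

```python
from collections import Counter


def crosstalk_rate(read_pairs: list[tuple[str, str]],
                   matched_pairs: set[tuple[str, str]]) -> float:
    """Fraction of reads whose barcode pair is not an expected matched pair.

    Matched pair   -> intramolecular circularization and ligation.
    Mismatched pair -> intermolecular crosstalk.
    """
    counts = Counter(
        "matched" if pair in matched_pairs else "mismatched"
        for pair in read_pairs
    )
    total = counts["matched"] + counts["mismatched"]
    if total == 0:
        raise ValueError("no reads to classify")
    return counts["mismatched"] / total


# Hypothetical barcode labels; the real constructs carry sequence barcodes.
expected = {("A1", "B1"), ("A2", "B2"), ("A3", "B3"), ("A4", "B4")}
reads = [("A1", "B1"), ("A2", "B2"), ("A1", "B3"), ("A4", "B4")]
print(crosstalk_rate(reads, expected))
```

In this toy input one of four reads carries a mismatched pair, so the function returns 0.25.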
- GAGGAAAAAGAGAAAGGAGACGTGACttttttGTCACG
- one control condition without a splint are 32.6%, 20.1%, 22.6%, 23.5%, 24.2%, 15.1%, 19.0%, and 32.5%, respectively.
- the data shows that the splint AP665 (a splint containing the hairpin loop sequence but no stem) results in the lowest crosstalk rate of 15.087%, and that all the hairpin splints result in a lower crosstalk rate than the AP337 linear splint.
- AP666 (the splint containing a 6-base stem) also has a low crosstalk rate of 18.950%.
- the control condition with no splint resulted in only 1551 mapped reads, as compared to >40,000 reads for all other conditions, suggesting a noise floor of ~1500 reads.
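One of the splint sequences listed in this example contains a lowercase `tttttt` run flanked by two short segments that look like a hairpin stem. The sketch below counts how many bases on either side of such a loop can pair with each other. The function name and the convention that the loop is written in lowercase are assumptions for illustration; the source does not state which named splint this sequence corresponds to.

```python
_WATSON_CRICK = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def hairpin_stem_length(seq: str, loop: str) -> int:
    """Count consecutive base pairs flanking `loop`, walking outward from it.

    The search for `loop` is case-sensitive, so a lowercase loop is not
    confused with uppercase T runs elsewhere in the sequence.
    """
    start = seq.find(loop)
    if start < 0:
        raise ValueError("loop not found in sequence")
    left = seq[:start].upper()
    right = seq[start + len(loop):].upper()
    paired = 0
    while (paired < min(len(left), len(right))
           and (left[-1 - paired], right[paired]) in _WATSON_CRICK):
        paired += 1
    return paired


splint = "GAGGAAAAAGAGAAAGGAGACGTGACttttttGTCACG"
print(hairpin_stem_length(splint, "tttttt"))
```

For the sequence shown, the six bases before the loop (CGTGAC) pair with the six bases after it (GTCACG), giving a stem length of 6.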
- Embodiment 1 A method for sequencing at sub-attomole resolution a peptide comprising a plurality of amino acids, comprising:
- Embodiment 2 The method of embodiment 1, wherein the individual identification accuracy is greater than 80% for at least 2 different modified amino acid types of the 20 proteinogenic amino acid types.
- Embodiment 3 The method of embodiment 1, wherein said plurality of polymerizable molecules is covalently linked.
- Embodiment 4 The method of any one of embodiments 1-3, wherein said plurality of polymerizable molecules comprises a plurality of nucleic acid molecules.
Landscapes
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Engineering & Computer Science (AREA)
- Chemical & Material Sciences (AREA)
- Physics & Mathematics (AREA)
- Molecular Biology (AREA)
- Biomedical Technology (AREA)
- Hematology (AREA)
- Urology & Nephrology (AREA)
- Immunology (AREA)
- Biophysics (AREA)
- Pathology (AREA)
- General Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Biochemistry (AREA)
- Analytical Chemistry (AREA)
- Medicinal Chemistry (AREA)
- Food Science & Technology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Microbiology (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Biotechnology (AREA)
- Cell Biology (AREA)
- Nanotechnology (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
Disclosed are methods for high-accuracy sequencing of polymeric analytes, such as peptides, using nanopores or nanogaps. A method of the present disclosure may comprise generating a modified amino acid from an amino acid of the peptide, translocating the modified amino acid through or adjacent to a nanopore, and identifying the modified amino acid.
Applications Claiming Priority (8)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202463627214P | 2024-01-31 | 2024-01-31 | |
| US63/627,214 | 2024-01-31 | ||
| US202463558344P | 2024-02-27 | 2024-02-27 | |
| US63/558,344 | 2024-02-27 | ||
| US202463683941P | 2024-08-16 | 2024-08-16 | |
| US63/683,941 | 2024-08-16 | ||
| US202463694299P | 2024-09-13 | 2024-09-13 | |
| US63/694,299 | 2024-09-13 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2025166050A1 (fr) | 2025-08-07 |
Family
ID=94924945
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/US2025/013852 (Pending; WO2025166050A1 (fr)) | Nanopore-based sequencing of peptides | 2024-01-31 | 2025-01-30 |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2025166050A1 (fr) |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20180298436A1 (en) * | 2015-10-30 | 2018-10-18 | Universal Sequencing Technology Corporation | Methods and Systems for Controlling DNA, RNA and Other Biological Molecules Passing Through Nanopores |
| WO2019195633A1 (fr) | 2018-04-04 | 2019-10-10 | Ignite Biosciences, Inc. | Methods of generating nanoarrays and microarrays |
| US20200217853A1 (en) | 2019-01-08 | 2020-07-09 | Massachusetts Institute Of Technology | Single-Molecule Protein and Peptide Sequencing |
| WO2023114732A2 (fr) | 2021-12-14 | 2023-06-22 | Glyphic Biotechnologies, Inc. | Single-molecule peptide sequencing through molecular barcoding and ex-situ analysis |
-
2025
- 2025-01-30 WO PCT/US2025/013852 patent/WO2025166050A1/fr active Pending
Patent Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20180298436A1 (en) * | 2015-10-30 | 2018-10-18 | Universal Sequencing Technology Corporation | Methods and Systems for Controlling DNA, RNA and Other Biological Molecules Passing Through Nanopores |
| WO2019195633A1 (fr) | 2018-04-04 | 2019-10-10 | Ignite Biosciences, Inc. | Methods of generating nanoarrays and microarrays |
| US20200217853A1 (en) | 2019-01-08 | 2020-07-09 | Massachusetts Institute Of Technology | Single-Molecule Protein and Peptide Sequencing |
| US11499979B2 (en) | 2019-01-08 | 2022-11-15 | Massachusetts Institute Of Technology | Single-molecule protein and peptide sequencing |
| WO2023114732A2 (fr) | 2021-12-14 | 2023-06-22 | Glyphic Biotechnologies, Inc. | Single-molecule peptide sequencing through molecular barcoding and ex-situ analysis |
Non-Patent Citations (22)
| Title |
|---|
| BIRD ET AL., SCIENCE, vol. 242, 1988, pages 423 - 426 |
| BLOOM ET AL., NATURE CHEMISTRY, vol. 10, 2018, pages 205 - 211 |
| CHERF ET AL., NAT. BIOTECHNOL., vol. 30, no. 4, 2012, pages 344 - 348 |
| HOOD ET AL.: "Immunology", 1984, BENJAMIN |
| HUNKAPILLER, HOOD, NATURE, vol. 323, 1986, pages 15 - 16 |
| HUSTON ET AL., PROC. NATL. ACAD. SCI. U.S.A., vol. 85, 1988, pages 5879 - 5883 |
| K. MOTONE ET AL.: "Multi-pass, single-molecule nanopore reading of long protein strands", NATURE, 2024 |
| KASIANOWICZ, NATURE MATERIALS, vol. 3, 2004, pages 355 - 356 |
| KRISHNA ET AL., BIOPOLYMERS, vol. 94, no. 1, 2010, pages 32 - 48 |
| LANZAVECCHIA ET AL., EUR. J. IMMUNOL., vol. 17, 1987, pages 105 |
| LIU ET AL., SMALL, vol. 16, no. 3, 2020, pages e1905379 |
| MARGOLIS ET AL., JOURNAL OF AUTOMATIC CHEMISTRY, vol. 13, no. 3, 1991, pages 93 - 95 |
| N. KJÆRSGAARD ET AL., CHEMBIOCHEM, vol. 23, 2022, pages e202200245 |
| NOOKAEW ET AL., CHEMICAL RESEARCH IN TOXICOLOGY, vol. 33, no. 12, 2020, pages 2944 - 2952 |
| SCIENCE, vol. 311, 2006, pages 1544 - 1546 |
| WANG ET AL., NATURE METHODS, vol. 21, 2023, pages 92 - 101 |
| XIE ET AL., LANGMUIR, vol. 38, no. 30, 2022, pages 9119 - 9128 |
| XU ET AL., ACS CHEM BIOL., vol. 6, no. 10, 21 October 2011 (2011-10-21), pages 1015 - 1020 |
| Z. LIU ET AL., CHEMISTRY A EUROPEAN JOURNAL, vol. 29, 2023 |
| ZHANG ET AL., ACS CHEM. BIOL., vol. 16, no. 11, 2021, pages 2595 - 2603 |
| ZHU ET AL., ACS CATAL., vol. 12, no. 13, 2022, pages 8019 - 8026 |
| ZHU ET AL., CHINESE CHEMICAL LETTERS, vol. 29, 2018, pages 1116 - 1118 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| JP7333975B2 (ja) | Macromolecule analysis employing nucleic acid encoding | |
| US12292446B2 (en) | Kits for analysis using nucleic acid encoding and/or label | |
| JP7253833B2 (ja) | Methods and kits using nucleic acid encoding and/or label | |
| US20250155447A1 (en) | Methods and systems for processing polymeric analytes | |
| US20240409995A1 (en) | Single-molecule peptide sequencing through molecular barcoding and ex-situ analysis | |
| US12399180B2 (en) | Protein sequencing via coupling of polymerizable molecules | |
| US20250327811A1 (en) | Single-molecule peptide sequencing using dithioester and thiocarbamoyl amino acid reactive groups | |
| WO2024159162A1 (fr) | Single-molecule peptide sequencing using guanidinylation agents | |
| WO2025166050A1 (fr) | Nanopore-based sequencing of peptides | |
| US20250188538A1 (en) | Single-molecule peptide sequencing using xanthate amino acid reactive groups | |
| WO2025147587A1 (fr) | Protein sequencing using super-resolution imaging | |
| CN118679389A (zh) | Single-molecule peptide sequencing through molecular barcoding and ex-situ analysis |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 25710577; Country of ref document: EP; Kind code of ref document: A1 |