WO2025240502A1 - High-throughput analysis of N-linked glycosylation site occupancy in proteins and peptides
- Publication number
- WO2025240502A1 (PCT/US2025/029176)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- polypeptide
- binder
- residue
- immobilized
- amino acid
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/25—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving enzymes not classifiable in groups C12Q1/26 - C12Q1/66
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N33/00—Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
- G01N33/48—Biological material, e.g. blood, urine; Haemocytometers
- G01N33/50—Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
- G01N33/58—Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving labelled substances
Definitions
- the present disclosure generally relates to biotechnology, in particular to methods for analysis of proteins and peptides, in particular, analysis of occupancy of N-linked glycosylation sites in proteins and peptides.
- the disclosure finds utility in large-scale profiling of N-linked glycosylation sites, as well as in monitoring changes in glycosylation patterns associated with numerous disease conditions.
- N-linked glycosylation is also called N-glycosylation.
- changes in N-linked glycosylation involve changes where the same site on a protein may or may not be glycosylated.
- This variability in glycosylation at specific sites within a protein can significantly impact its biological activity.
- N-glycosylation sites in proteins may experience changes in their glycosylation patterns, such as when they lose or gain a glycan attachment.
- variations in N-glycosylation site occupancy within the same protein can be associated with numerous diseases, including cancer and certain genetic disorders, because such changes can affect the structure, stability, and function of proteins.
- N-glycosylation of alpha-1-antitrypsin affects its function as a protease inhibitor, impacting disease progression of liver diseases (McCarthy C, et al., The role and importance of glycosylation of acute phase proteins with focus on alpha-1 antitrypsin in acute and chronic inflammatory conditions. J Proteome Res. 2014 Jul 3;13(7):3131-43). N-glycans have been suggested to have a major role in preventing the impairment of glucose-stimulated insulin secretion by modulating cell surface expression of glucose transporters (Stambuk T, Gornik O. Protein Glycosylation in Diabetes. Adv Exp Med Biol. 2021;1325:285-305).
- N-glycosylation site occupancy in specific proteins is important for understanding glycoprotein function, as well as for disease research and drug development, as N-glycosylation site occupancy could be affected during a disease condition or modified by an applied drug.
- the degree of N-glycosylation site occupancy by itself may correlate with progression or severity of the disease.
- Mass spectrometry has been used to analyze N-glycosylation site occupancy, where both label-free and labeling methods are applied (Zhu Z, Go EP, Desaire H. Absolute quantitation of glycosylation site occupancy using isotopically labeled standards and LC-MS. J Am Soc Mass Spectrom.
- Labeling methodologies include SILAC (stable isotope labeling by amino acids in cell culture), TMT (tandem mass tags), and iTRAQ (isobaric tags for absolute and relative quantification).
- the present disclosure describes sensitive and reliable analytical methods to identify occupancy of N-glycosylation sites in multiple proteins that may originate from different biological samples, providing a large-scale N-linked glycoproteomic analysis. Attempts to identify in vivo N-glycosylation sites on a proteome level have been reported, mapping 6367 N-glycosylation sites on 2352 proteins in four mouse tissues and blood plasma using high-accuracy mass spectrometry (Zielinska DF, et al., Precision mapping of an in vivo N-glycoproteome reveals rigid topological and sequence constraints. Cell. 2010 May 28;141(5):897-907).
- The vast majority of sites have the consensus sequence motif N-X-S/T-X, where N is an asparagine residue and X is not a proline residue.
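As an illustration of the consensus motif, candidate sites can be located in a sequence with a short pattern search. This is a minimal sketch, assuming the motif is read as N, any residue except proline, S or T, any residue except proline; the function name `find_sequons` and the example sequence are hypothetical and not part of the disclosure.

```python
import re

# Assumed reading of N-X-S/T-X: Asn, non-Pro, Ser or Thr, non-Pro.
# The lookahead lets overlapping motifs be reported separately.
SEQUON = re.compile(r"(?=(N[^P][ST][^P]))")


def find_sequons(sequence: str) -> list[int]:
    """Return 0-based positions of Asn residues that start an N-X-S/T-X motif."""
    return [match.start() for match in SEQUON.finditer(sequence.upper())]


if __name__ == "__main__":
    # Hypothetical sequence: only the Asn at index 1 satisfies the motif.
    print(find_sequons("MNGSAANPTVNKTP"))
```

Under this reading, an Asn too close to the C-terminus to have a fourth motif residue is not reported; a looser pattern would be needed if such sites are of interest.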
- evolutionary conservation of both solvent-exposed glycosylated asparagine residues and the canonical asparagine glycosylation motif sequences was observed (Park C, Zhang J. Genome-wide evolutionary conservation of N-glycosylation sites. Mol Biol Evol. 2011 Aug;28(8):2351-7), demonstrating functional importance of N-glycosylation and highlighting diagnostic potential for high-throughput N-glycosylation monitoring.
- the present disclosure utilizes previously reported techniques for high-throughput polypeptide analysis that involve molecular barcoding, use of binders that bind to specific terminal amino acid residues and encoding of specific binding events (see, e.g., US 11513126 B2, US 2023/0136966 A1, US 2023/0054691 A1, US 2019/0145982 A1, each incorporated herein by reference).
- NGPS next-generation peptide sequencing
- each cycle adds a barcode identifying the corresponding binder to the extended recording tag attached to the polypeptide.
- sequencing of the recording tag extended after several rounds of encoding allows identification of the specific binders that were bound to each terminal amino acid residue formed at the beginning of each encoding cycle. If specificities of binders utilized in the assay are known, then identities of each of the encoded terminal amino acid residues on the polypeptide may be predicted with a certain probability, and the identity of the polypeptide can be derived by matching amino acid sequence variants predicted from the encoding assay to a theoretical collection of peptides potentially present in the sample, which can be obtained from a genomic or proteomic database.
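The probabilistic matching described above can be sketched as a likelihood ranking over candidate peptides. This is only an illustration under stated assumptions: encoding cycles are treated as independent, all candidates are equally likely a priori, and the `specificity` table, binder names, and candidate peptides are hypothetical values rather than data from the disclosure.

```python
from math import prod


def peptide_likelihood(observed_binders, peptide, specificity):
    """P(observed binder series | peptide), assuming independent cycles.

    specificity[binder][residue] is the assumed probability that `binder`
    is recorded when `residue` is the terminal amino acid in that cycle.
    """
    if len(peptide) < len(observed_binders):
        return 0.0
    return prod(
        specificity[binder].get(peptide[cycle], 0.0)
        for cycle, binder in enumerate(observed_binders)
    )


def rank_candidates(observed_binders, candidates, specificity):
    """Return (peptide, posterior) pairs, best first, under a uniform prior."""
    scores = {p: peptide_likelihood(observed_binders, p, specificity) for p in candidates}
    total = sum(scores.values())
    if total == 0:
        return []
    ranked = [(p, s / total) for p, s in scores.items()]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical specificities for three binders.
    specificity = {
        "L-binder": {"L": 0.9},
        "N-binder": {"N": 0.8, "D": 0.1},
        "D-binder": {"N": 0.1, "D": 0.8},
    }
    print(rank_candidates(["L-binder", "D-binder"], ["LDK", "LNK", "ADK"], specificity))
```

Because the N-binder and D-binder appear in the same table as the other binders, the same ranking both identifies the peptide and favors the N or D variant at the glycosylation site.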
- the described stepwise encoding of sequential amino acid residues at the N-terminus and/or C-terminus of a polypeptide can be adopted to enable high-throughput N-glycosylation site occupancy detection.
- the present disclosure utilizes treatment of polypeptides with a PNGase enzyme, a deglycosylating enzyme that hydrolyzes N-linked glycan moieties from glycopolypeptides and yields polypeptides containing aspartic acid residues in place of the original asparagine residues, effectively changing the glycan attachment residue of N-linked glycosylation sites of polypeptides (see, e.g., Kuhn P, et al., Active site and oligosaccharide recognition residues of peptide-N4-(N-acetyl-beta-D-glucosaminyl)asparagine amidase F.
- Amino acid identity of the glycan attachment residue of N-linked glycosylation sites may be determined following treatment with the PNGase enzyme by utilizing two binders, such as a first binder that preferentially binds to an Asn (N) residue over an Asp (D) residue as the glycan attachment residue, and a second binder that preferentially binds to the Asp (D) residue over the Asn (N) residue as the glycan attachment residue.
- amino acid identity of the glycan attachment residue may be determined following treatment with the PNGase enzyme by obtaining or retaining identifying information regarding the first binder or the second binder that bind to the glycan attachment residue.
- the described approach can be performed in a very high-throughput manner (e.g., for thousands or millions of polypeptides in a single assay), and may be combined with high-throughput polypeptide identification, because the same binders may be utilized to determine amino acid sequence information regarding the analyzed polypeptides.
- a particular N-glycosylation site is partially occupied by a glycan molecule (i.e., some polypeptide molecules are glycosylated at this site while other molecules are not glycosylated at this site)
- treatment with the PNGase enzyme will produce a heterogeneous population of polypeptide molecules, where the previously non-glycosylated polypeptide molecules will still contain original asparagine residues at the analyzed glycosylation site(s), while the previously glycosylated molecules will contain aspartic acid residues at the analyzed glycosylation site(s) generated after the PNGase treatment.
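The effect of PNGase treatment on a partially occupied site can be modeled in silico as an Asn-to-Asp substitution applied only to molecules that carried a glycan. The sketch below assumes complete deglycosylation and ignores spontaneous deamidation; `pngase_convert` and the example peptide are hypothetical.

```python
def pngase_convert(sequence: str, glycosylated_sites: set[int]) -> str:
    """Replace Asn with Asp at each 0-based position that carried an N-glycan."""
    residues = list(sequence)
    for position in glycosylated_sites:
        if residues[position] != "N":
            raise ValueError(f"position {position} is {residues[position]}, not Asn")
        residues[position] = "D"
    return "".join(residues)


if __name__ == "__main__":
    # Hypothetical population: two molecules glycosylated at index 2, one not.
    population = [("AANGTK", {2}), ("AANGTK", {2}), ("AANGTK", set())]
    print([pngase_convert(seq, sites) for seq, sites in population])
```

After treatment, the mixture of N-containing and D-containing molecules at the same position is what the N-binder and D-binder are meant to distinguish.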
- multiple polypeptides are formed by digesting proteins from one or more biological samples, followed by treatment of the polypeptides with the PNGase enzyme and the NGPS analysis of the polypeptides that utilizes binders specific for “N” and “D” residues (referred to as N-binder and D-binder).
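The digestion step can be sketched computationally. The example below assumes trypsin is the protease (it is the enzyme named for FIG. 8) and uses the commonly applied rule of cleaving after K or R unless the next residue is P; the function name and input sequence are hypothetical, and missed cleavages are not modeled.

```python
import re

# Zero-width split point: after K or R, but not when the next residue is P.
TRYPSIN_SITE = re.compile(r"(?<=[KR])(?!P)")


def trypsin_digest(protein: str) -> list[str]:
    """Return fully cleaved tryptic peptides of `protein` in sequence order."""
    return [peptide for peptide in TRYPSIN_SITE.split(protein.upper()) if peptide]


if __name__ == "__main__":
    print(trypsin_digest("AAKRPGGRNLTK"))
```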
- the attachment residue of the N-linked glycan within the N-linked glycosylation site, i.e., the original “N” residue
- a binder is able to preferentially bind to an Asn (N) residue over an Asp (D) residue at the attachment site.
- the attachment residue of the N-linked glycan within the N-linked glycosylation site is a terminal residue of a polypeptide analyte, and binders that specifically bind to terminal “N” or “D” residues are utilized.
- sequential cleavage of terminal residues of a polypeptide analyte is utilized, which eventually exposes the N/D residue within the analyzed N-glycosylation site treated with PNGase as a terminal amino acid residue, and its identity may be determined using the available N-binder and D-binder.
- One particular advantage of the disclosed methods is that, when sequential cleavage of terminal residues of a polypeptide analyte is utilized, identification of a polypeptide (e.g., determining polypeptide amino acid sequence through probabilistic identification of individual terminal amino acid residues) can be combined with detecting the presence of a glycan in a particular N-glycosylation site, thereby achieving high-throughput polypeptide identification and characterization.
- the described methods can be used to quantify the percentage of polypeptide molecules glycosylated at a particular site by comparing encoding yield for N-binder and D-binder to encode the N or D residue within the analyzed N-glycosylation site in a population of the polypeptide molecules.
- the described process can be applied for analysis of at least 1000, 10000, 100000 or more individual polypeptides simultaneously, which allows N-glycosylation site occupancy to be analyzed at a proteome level.
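The comparison of encoding yields can be sketched as a simple ratio. This is a minimal illustration, assuming that read counts for each binder at the site are available and that each binder's encoding efficiency has been calibrated separately; the parameter names and the default efficiency of 1.0 are assumptions, and no correction for spontaneous deamidation is applied.

```python
def site_occupancy(
    n_binder_reads: int,
    d_binder_reads: int,
    n_efficiency: float = 1.0,
    d_efficiency: float = 1.0,
) -> float:
    """Estimate the fraction of molecules glycosylated at one site.

    Reads are divided by the assumed per-binder encoding efficiency, then
    the D (formerly glycosylated) share of the total is returned.
    """
    n_corrected = n_binder_reads / n_efficiency
    d_corrected = d_binder_reads / d_efficiency
    total = n_corrected + d_corrected
    if total == 0:
        raise ValueError("no encoding events observed for this site")
    return d_corrected / total


if __name__ == "__main__":
    # Hypothetical counts: 300 N-binder reads and 700 D-binder reads.
    print(f"{site_occupancy(300, 700):.0%}")
```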
- the high multiplexing feature of the described process makes it superior in comparison with mass spectrometry-based glycosylation identification.
- a method for assessing an attachment of an N-linked glycan to a glycan attachment residue of a N-linked glycosylation site of a polypeptide comprising:
- each binder of the set of binders is attached to a detectable label that comprises identifying information regarding the binder, and the set of binders comprises: a first binder that preferentially binds to an Asn (N) residue over an Asp (D) residue as the glycan attachment residue of the N-linked glycosylation site; and a second binder that preferentially binds to the Asp (D) residue over the Asn (N) residue as the glycan attachment residue of the N-linked glycosylation site, wherein the first binder or the second binder of the set of binders binds to the glycan attachment residue of the N-linked glycosylation site of the immobilized polypeptide treated with the PNGase; and (c) determining amino acid identity of the glycan attachment residue of the N-linked glycosylation site treated with the PNGase by
- a method for assessing an attachment of an N-linked glycan to a glycan attachment residue of a N-linked glycosylation site of a polypeptide comprising:
- each binder of the set of binders is attached to a detectable label that comprises identifying information regarding the binder, and the set of binders comprises: a first binder that preferentially binds to an Asn (N) residue over an Asp (D) residue as the TAA residue; and a second binder that preferentially binds to the Asp (D) residue over the Asn (N) residue as the TAA residue, wherein the first binder or the second binder of the set of binders binds to the TAA residue of the immobilized and cleaved polypeptide treated with the PNGase; and (d) determining amino acid identity of the glycan attachment residue of the N-linked glycosylation site treated with the PNGase by obtaining or retaining identifying information regarding the first binder or the second binder that bind to the TAA residue of the immobilized and clea
- each binder of the set of binders is attached to a detectable label that comprises identifying information regarding the binder, and the set of binders comprises: a first binder that binds to a terminal Asn (N) residue; a second binder that binds to a terminal Asp (D) residue; and a third binder that binds to a terminal amino acid (TAA) residue other than Asn
- two binders are needed for implementation of the disclosed approach: a first binder that preferentially binds to an Asn (N) residue over an Asp (D) residue as the glycan attachment residue of the N-linked glycosylation site, and a second binder that preferentially binds to the Asp (D) residue over the Asn (N) residue as the glycan attachment residue of the N-linked glycosylation site.
- a method for detecting N-linked glycosylation of polypeptides comprising a first polypeptide having a N-linked glycosylation site, the method comprising:
- each binder specifically binds to a terminal amino acid (TAA) or a modified TAA of a polypeptide immobilized on the solid support;
- the set of binders comprises a first binder that specifically binds to Asn TAA and a second binder that specifically binds to Asp TAA; and the first binder or the second binder bind to Asn TAA or Asp TAA being a glycan attachment residue of the N-linked glycosylation site of the immobilized first polypeptide;
- step (c) determining whether at least some of the immobilized first polypeptide molecules comprise Asp residue as the first residue of the N-linked glycosylation site based on analysis of the identifying information regarding the first binder and/or regarding the second binder obtained or retained in step (b) for the immobilized first polypeptide molecules, thereby detecting N-linked glycosylation in the N-linked glycosylation site of the first polypeptide.
- a polypeptide comprising a N-linked glycosylation site comprising:
- each binder of the set of binders and the subsequent set of binders specifically binds to an amino acid residue or sequence and is attached to a detectable label that comprises identifying information regarding the binder
- the set of binders and/or the subsequent set of binders comprise: a first binder that preferentially binds to an Asn (N) residue over an Asp (D) residue; and a second binder that preferentially binds to the Asp (D) residue over the Asn (N) residue, and wherein the at least partial amino acid sequence is determined by analyzing the detectable labels attached to the binders that bind to the immobilized polypeptide or the cleaved immobilized polypeptide, thereby analyzing N-linked glycosylation in the polypeptide.
- the identifying information regarding the first binder or the second binder that bind to the glycan attachment residue may be obtained or retained depending on particular way of detecting the detectable labels of the binders.
- the detectable label of the binder is detected in situ upon the binding to the glycan attachment residue of a polypeptide treated with PNGase.
- detecting in situ upon binding allows the identifying information regarding the binder to be obtained, which may be stored and utilized later to decode amino acid identity of the glycan attachment residue.
- the detectable label of the binder comprises a nucleic acid coding tag that comprises an encoder barcode that comprises identifying information regarding the binder.
- the encoder barcode may be analyzed in situ (at the polypeptide location) such as by sequencing in situ or in situ amplification.
- the encoder barcode sequence information is transferred between the nucleic acid coding tag of the binder and a recording tag attached to the analyzed polypeptide.
- such transfer of nucleic acid information generates an extended nucleic acid construct comprising nucleic acid sequence information of the encoder barcode of the binder and nucleic acid sequence information of the nucleic acid recording tag attached to the analyzed polypeptide.
- such transfer of nucleic acid information allows the identifying information regarding the first binder or the second binder that bind to the glycan attachment residue to be retained, e.g., in the extended nucleic acid construct.
- the extended nucleic acid construct is an extended recording tag attached to the analyzed polypeptide or an extended coding tag attached to the binder.
- the extended nucleic acid construct is collected after each cycle of nucleic acid sequence information transfer.
- the extended nucleic acid construct is not collected after each cycle of nucleic acid sequence information transfer (e.g., after each cycle, the extended nucleic acid construct retains the identifying information regarding the first binder or the second binder that bind to the glycan attachment residue), but instead collected and analyzed by a nucleic acid sequencing method after completion of several cycles of nucleic acid sequence information transfer.
- several cycles of nucleic acid sequence information transfer generate an extended recording tag attached to the analyzed polypeptide or fragment thereof (if the polypeptide is cleaved during the assay), which is then analyzed by a nucleic acid sequencing method to decode identities of binders that were bound to the analyzed polypeptide during sequence information transfer cycles. Accordingly, the identities of binders may then be used to obtain amino acid sequence information of the analyzed polypeptide, which also includes assessment of occupancy of N-linked glycosylation sites of the analyzed polypeptide based on the methods disclosed herein.
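Decoding an extended recording tag into a series of binder identities can be sketched as follows. The layout is an assumption made for illustration only: each cycle is taken to append one fixed-length encoder barcode followed by a fixed spacer, and the barcode sequences, spacer, and binder names are hypothetical.

```python
# Hypothetical encoder barcodes (all the same length) and spacer.
BARCODE_TO_BINDER = {
    "ACGTAC": "N-binder",
    "TGCATG": "D-binder",
    "GATCGA": "L-binder",
}
SPACER = "TT"
BARCODE_LENGTH = 6


def decode_recording_tag(extension: str) -> list[str]:
    """Return the binder recorded in each cycle, in order of encoding.

    `extension` is the part of the sequencing read added during encoding;
    barcodes that are not in the lookup table are reported as "unknown".
    """
    block = BARCODE_LENGTH + len(SPACER)
    binders = []
    for start in range(0, len(extension), block):
        barcode = extension[start:start + BARCODE_LENGTH]
        binders.append(BARCODE_TO_BINDER.get(barcode, "unknown"))
    return binders


if __name__ == "__main__":
    read = "GATCGA" + SPACER + "TGCATG" + SPACER
    print(decode_recording_tag(read))
```

A D-binder call at the cycle that exposes the glycan attachment residue would then indicate that the site was occupied before PNGase treatment.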
- FIG. 1 depicts an exemplary polypeptide sequencing assay with terminal amino acid (TAA)-specific binders.
- FIG. 2 depicts an exemplary variation of the polypeptide sequencing assay with terminal amino acid (TAA)-specific binders as described in FIG. 1.
- FIG. 3 depicts another exemplary polypeptide sequencing assay with terminal amino acid (TAA)-specific binders each conjugated with a specific detectable label.
- TAA terminal amino acid
- FIG. 4 depicts a native dipeptidyl carboxypeptidase (DCP) and FIG. 5 depicts exemplary design of an engineered C-terminal Cleavase (Clv-C).
- Native dipeptidyl carboxypeptidase DCP can be engineered to cleave a C-terminal dipeptide from a peptide modified at the C-terminus with an amino acid-like label.
- the P1’ residue and the added X-label act as a dipeptide unit cleavable by an engineered Clv-C.
- the residues in Clv-C enzyme that normally bind the COOH are engineered to bind the unnatural amino acid and label. Possible modifications include amide, acetylation and others.
- FIG. 6 depicts exemplary encoding reactions for two test peptides (SEQ ID NO: 25 and SEQ ID NO: 26) using a set of N-binder (SEQ ID NO: 2) and D-binder (SEQ ID NO: 6). Fractions of encoded recording tags were evaluated by NGS and showed specific encoding results for both binders.
- FIG. 7 depicts an exemplary result of the NGPS peptide sequencing assay using a selected plasma protein (HSA) and a 7-binder mix as described in Example 4. Particular reads (e.g., LZNNN, LZLYY, etc.) indicated on the x axis were each assigned to a particular peptide indicated after each read (separated by ::ALB:).
- HSA plasma protein
- FIG. 8 depicts trypsin generated fragments of Haptoglobin, which are known to be glycosylated.
- the N amino acid residues shown in bold are known to be glycosylated, and they are converted to D residues by the PNGase F treatment.
- FIG. 9 shows experimental design to address spontaneous deamidation of asparagine residues at N-glycosylated sites for quantification of glycosylation (see Example 10).
- FIG. 10 depicts an exemplary variation of the approach for assessing an attachment of an N-linked glycan to a glycan attachment residue of a N-linked glycosylation site of a polypeptide, which includes analysis of the attachment amino acid residue of the polypeptide via specific binding agents, wherein the amino acid identity of the amino acid residue of the polypeptide treated with the PNGase is determined by detecting a signal generated via performing an in situ amplification of at least a portion of the nucleic acid coding tag of the binder that binds to the attachment amino acid residue. The signal may be recorded providing information regarding the polypeptide analyte at a location on a support. A nucleic acid hairpin attached to the polypeptide is attached to the support.
- a restriction enzyme is used to create a recording tag with 3’ overhang attached to the polypeptide and to the support.
- the recording tag - polypeptide conjugate is contacted with a binder attached to a circularized coding tag comprising the encoder barcode (Binder ID).
- Binder ID the encoder barcode
- a portion of the recording tag is hybridized to a complementary region within the circularized coding tag.
- This formed double-stranded region is used by the phi29 polymerase to initiate a rolling circle amplification (RCA) reaction to amplify the coding tag which includes the encoder barcode, followed by detection of amplified copies of the encoder barcode in situ by fluorescently labeled probes.
- RCA rolling circle amplification
- the amplified structure is then released via the restriction enzyme cut which re-generates the recording tag with 3’ overhang.
- the double-stranded region used to initiate the RCA reaction may be generated by introducing a recognition site for a Type IIS restriction enzyme together with a recognition site for a nicking enzyme into the recording tag.
- FIG. 11 depicts an exemplary approach using a plurality of fluorescently labeled probes for detection of barcode sequences within nucleic acid structures amplified in situ by methods described in FIG. 10. Standard methods known in the art may be used to detect fluorescently labeled probes attached to barcode sequences followed by signal recording.
- analyte refers to “polypeptide analyte”, and at least partial amino acid sequence, identity and/or specific feature (e.g., a presence of N-glycosylation) of the polypeptide analyte are determined by the methods disclosed herein. In some embodiments, one, two or more amino acid residues of a polypeptide analyte each are individually determined with a certain probability, which may be sufficient for determining identity or a feature of the polypeptide analyte. Polypeptide analytes are substrates of specific binders disclosed herein.
- sample refers to anything which may contain an analyte for which an analyte assay is desired.
- a “sample” can be a solution, a suspension, liquid, powder, a paste, aqueous, non-aqueous or any combination thereof.
- the sample may be a biological sample, such as a biological fluid or a biological tissue. Examples of biological fluids include urine, blood, plasma, serum, saliva, semen, stool, sputum, cerebral spinal fluid, tears, mucus, amniotic fluid or the like.
- macromolecule encompasses large molecules composed of smaller subunits.
- macromolecules include, but are not limited to peptides, polypeptides, proteins, nucleic acids, carbohydrates, lipids, macrocycles, or a combination or complex thereof.
- polypeptide is used interchangeably with the term “peptide” and encompasses peptides and proteins, referring to a molecule comprising a chain of three or more amino acids joined by peptide bonds.
- a polypeptide comprises 3 to 50 amino acid residues.
- a peptide does not comprise a secondary, tertiary, or higher structure.
- the polypeptide is a protein.
- a protein comprises 30 or more amino acids, e.g., having more than 50 amino acids.
- in addition to a primary structure, a protein comprises a secondary, tertiary, or higher structure.
- the amino acids of the polypeptides are most typically L-amino acids, but may also be D-amino acids, modified amino acids, amino acid analogs, amino acid mimetics, or any combination thereof.
- Polypeptides may be naturally occurring, synthetically produced, or recombinantly expressed. Polypeptides may be synthetically produced, isolated, recombinantly expressed, or be produced by a combination of methodologies as described above. Polypeptides may also comprise additional groups modifying the amino acid chain, for example, functional groups added via post-translational modification.
- the polymer may be linear or branched, it may comprise modified amino acids, and it may be interrupted by non-amino acids.
- the term also encompasses an amino acid polymer that has been modified naturally or by intervention; for example, disulfide bond formation, glycosylation, lipidation, acetylation, phosphorylation, or any other manipulation or modification, such as conjugation with a detectable label.
- amino acid refers to an organic compound, which serves as a monomeric subunit of a peptide.
- An amino acid includes the 20 standard, naturally occurring or canonical amino acids as well as non-standard amino acids.
- the standard, naturally-occurring amino acids include Alanine (A or Ala), Cysteine (C or Cys), Aspartic Acid (D or Asp), Glutamic Acid (E or Glu), Phenylalanine (F or Phe), Glycine (G or Gly), Histidine (H or His), Isoleucine (I or Ile), Lysine (K or Lys), Leucine (L or Leu), Methionine (M or Met), Asparagine (N or Asn), Proline (P or Pro), Glutamine (Q or Gln), Arginine (R or Arg), Serine (S or Ser), Threonine (T or Thr), Valine (V or Val), Tryptophan (W or Trp), and Tyrosine (Y or Tyr).
- amino acid may be an L-amino acid or a D-amino acid.
- Non-standard amino acids may be modified amino acids, amino acid analogs, amino acid mimetics, non-standard proteinogenic amino acids, or non-proteinogenic amino acids that occur naturally or are chemically synthesized.
- binder refers to a nucleic acid molecule, a polypeptide, a protein, or a small molecule that binds to, associates, unites with, recognizes, or combines with a binding target, e.g., a component of a polypeptide.
- a binder may form a covalent association or non-covalent association with the component of a polypeptide to which it binds.
- a binder may also be a chimeric binder, composed of two or more types of molecules, such as a nucleic acid molecule-peptide chimeric binder.
- a binder may be a naturally occurring, synthetically produced, or recombinantly expressed molecule.
- a binder binds to a single monomer or subunit of a polypeptide, such as a single terminal amino acid of a polypeptide.
- a binder may bind to an N-terminal amino acid residue or a C-terminal amino acid residue of a polypeptide.
- a binder may preferably bind to a chemically modified or labeled terminal amino acid residue (e.g., an amino acid that has been labeled or modified by a modifier agent, such as an N-terminal modifier agent) over an unlabeled or non-modified amino acid residue.
- a binder may exhibit selective binding to a component of a polypeptide (e.g., a binder may selectively bind to one of the 20 possible natural amino acid residues and bind with very low affinity or not at all to the other 19 natural amino acid residues).
- a binder may exhibit less selective binding, where the binder is capable of binding or configured to bind to a plurality of components of a polypeptide (e.g., a binder may bind with similar affinity to two or more different terminal amino acid residues).
- a binder may be attached to a detectable label or a coding tag, which may be joined to the binder by a linker.
- an engineered binder specifically binds to a particular target moiety (e.g., a TAA or a modified TAA) more readily than it would bind to a random target moiety (e.g., there is a detectable relative increase in the binding of the binder to a specific target moiety or to a group of target moieties (e.g., a group of TAA residues)).
- binders in the set of binders have medium specificity towards a particular target moiety such that a binder binds more than one target moiety and there is a significant probability of incorrect moiety identification based on a single encoding event.
- an engineered binder is at least twice as likely to bind to a cognate target moiety as to a random, non-cognate target moiety (a 2:1 ratio of specific to non-specific binding).
- Non-specific binding refers to background binding, and is the amount of signal that is produced in a binding assay between an engineered binder and a non-cognate target moiety.
- specific binding refers to binding between an engineered binder and a cognate target moiety (e.g., a TAA or a modified TAA) with a dissociation constant (Kd) of 500 nM or less.
- Kd dissociation constant
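The two criteria above (a Kd of 500 nM or less, and a 2:1 ratio of specific to non-specific binding) can be expressed as simple checks. The sketch below also includes the standard single-site equilibrium expression for fraction bound, which assumes the binder is in excess over the target; the function names are hypothetical and the disclosure does not prescribe how the two criteria are combined.

```python
KD_THRESHOLD_NM = 500.0   # from the text: Kd of 500 nM or less
MIN_SIGNAL_RATIO = 2.0    # from the text: 2:1 specific to non-specific


def fraction_bound(binder_conc_nm: float, kd_nm: float) -> float:
    """Equilibrium fraction of target occupied, single-site model, binder in excess."""
    return binder_conc_nm / (binder_conc_nm + kd_nm)


def is_specific_by_kd(kd_nm: float) -> bool:
    """True if the dissociation constant meets the 500 nM criterion."""
    return kd_nm <= KD_THRESHOLD_NM


def is_specific_by_ratio(cognate_signal: float, noncognate_signal: float) -> bool:
    """True if cognate signal is at least twice the non-cognate (background) signal."""
    return cognate_signal >= MIN_SIGNAL_RATIO * noncognate_signal


if __name__ == "__main__":
    print(fraction_bound(500.0, 500.0), is_specific_by_kd(120.0), is_specific_by_ratio(0.6, 0.2))
```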
- the term “selectivity” refers to the ability of a binder to preferentially bind to one or to several terminal amino acid residues of a peptide analyte, optionally modified with a chemical modification.
- “selectivity” describes preferential binding of a binder to a single NTAA or CTAA residue, or to a small group of NTAA or CTAA residues (e.g., structurally related residues).
- a binder may exhibit selective binding to a particular terminal amino acid residue.
- a binder may exhibit selective binding to a particular class or type of terminal amino acid residues.
- a binder may exhibit particular binding kinetics (e.g., higher association rate constant and/or lower dissociation rate constant) to a particular class or type of terminal amino acid residues or modified terminal amino acid residues, compared to other terminal amino acid residues or modified terminal amino acid residues.
- selectivity of each binder towards specific NTAA or CTAA residues of peptide analytes is determined in advance, before performing contacting steps of the disclosed methods.
- linker refers to one or more of a nucleotide, a nucleotide analog, an amino acid, a peptide, a polypeptide, a polymer, or a non-nucleotide chemical moiety that is used to join two molecules.
- a linker may be used to join a binder with a coding tag, a recording tag with a polypeptide, a polypeptide with a support, a recording tag with a solid support, etc.
- a linker joins two molecules via enzymatic reaction or chemistry reaction (e.g., click chemistry).
- barcode refers to a nucleic acid molecule of about 2 to about 30 bases (e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29 or 30 bases) providing a unique identifier tag or origin information for a polypeptide, a binder, a set of binders from one encoding cycle (when sets are changed between cycles), a sample of polypeptides, a set of samples, or polypeptides within a compartment (e.g., droplet, bead, or separated location).
- a barcode can be an artificial sequence or a naturally occurring sequence. In certain embodiments, each barcode within a population of barcodes is different.
- the term “detectable label” or “identifying detectable label” refers to a substance which can indicate the presence of another substance when associated with it.
- the detectable label can be a substance that is linked to or incorporated into the substance to be detected.
- a detectable label is suitable for allowing for detection and also quantification, for example, a detectable label that emits a detectable and measurable signal.
- Detectable labels include any labels that can be attached to binders and are compatible with the provided methods, and include, but are not limited to, a bioluminescent label, a biotin/avidin label, a chemiluminescent label, a nucleic acid tag comprising a UMI and/or a barcode, a chromophore, a coenzyme, a dye, an electro-active group, an electrochemiluminescent label, an enzymatic label (e.g.
- a detectable label is a nucleic acid tag that comprises a UMI and/or a barcode, and which can be detected via nucleic acid sequencing. In other embodiments, a detectable label does not comprise a nucleic acid tag.
- coding tag refers to a polynucleotide or a sequenceable polymer molecule (see, e.g., Niu et al., 2013, Nat. Chem. 5:282-292; Roy et al., 2015, Nat. Commun. 6:7237; Lutz, 2015, Macromolecules 48:4759-4767; each of which is incorporated by reference), having any suitable length, e.g., a nucleic acid molecule of about 2 bases to about 100 bases, including any integer including 2 and 100 and in between, that comprises identifying information for its associated binder.
- a coding tag may comprise an encoder sequence (e.g., barcode that comprises identifying information regarding the binder), which is optionally flanked by one spacer on one side or optionally flanked by a spacer on each side.
- a coding tag may also be comprised of an optional UMI and/or an optional encoding cycle-specific barcode.
- a coding tag may be single stranded or double-stranded.
- a double-stranded coding tag may comprise blunt ends, overhanging ends, or both.
- a coding tag may refer to the coding tag that is directly attached to a binder, to a complementary sequence hybridized to the coding tag directly attached to a binder (e.g., for double-stranded coding tags), or to coding tag information present in an extended recording tag.
- the term “recording tag” or “RT” refers to a nucleic acid molecule, or a sequenceable polymer molecule (see, e.g., Niu et al., 2013, Nat. Chem. 5:282-292; Roy et al., 2015, Nat. Commun. 6:7237; Lutz, 2015, Macromolecules 48:4759-4767; each of which is incorporated by reference) to which identifying information of a coding tag can be transferred, or from which identifying information about the polypeptide (e.g., UMI information) attached to the recording tag can be transferred to the coding tag.
- Identifying information can comprise any information characterizing a molecule such as information pertaining to identity, partition, spatial location, cycle number, etc. Additionally, the presence of UMI information can also be classified as identifying information.
- information from a coding tag attached to the binder can be transferred to the recording tag attached to the polypeptide while the binder is bound to the polypeptide.
- a recording tag may be directly linked to a polypeptide, linked to a polypeptide via a linker, or attached to a polypeptide by virtue of its proximity (or co-localization) on a support.
- a recording tag may be linked via its 5’ end or 3’ end or at an internal site, as long as the linkage is compatible with the method used to transfer coding tag information to the recording tag or vice versa.
- a recording tag may optionally comprise other functional components, e.g., a universal priming site, unique molecular identifier, a barcode (e.g., a sample barcode, a fraction barcode, spatial barcode, a compartment tag, etc.), a spacer sequence that is complementary to a spacer sequence of a coding tag, or any combination thereof.
- the spacer sequence of a recording tag is optionally at the 3’-end of the recording tag in embodiments where polymerase extension is used to transfer coding tag information to the recording tag.
- spacer refers to a nucleic acid molecule of about 1 base to about 20 bases (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 bases) in length that is present on a terminus of a recording tag or coding tag.
- a spacer sequence flanks an encoder sequence of a coding tag on one end or both ends.
- spacer sequences within a set of binders possess the same number of bases.
- a common (shared or identical) spacer may be used in a set of binders.
- a spacer sequence may have a “cycle specific” sequence in order to track binders used in a particular encoding cycle (i.e., the contacting-transferring-releasing steps of the methods disclosed herein form an “encoding cycle”).
- the spacer sequence (Sp) can be constant across all encoding cycles, be specific for a particular class of polypeptides, or be encoding cycle number specific. In some embodiments, only the sequential binding of correct cognate pairs of RT and CT results in interacting spacer elements and effective primer extension.
- a spacer sequence may comprise a sufficient number of bases to anneal to a complementary spacer sequence in a recording tag to initiate a primer extension (also referred to as polymerase extension) reaction, or provide a “splint” for a ligation reaction, or mediate a “sticky end” ligation reaction.
- primer extension also referred to as “polymerase extension” and “extension” refers to a reaction catalyzed by a nucleic acid polymerase (e.g., DNA polymerase) whereby a nucleic acid molecule (e.g., oligonucleotide primer, spacer sequence) that anneals to a complementary strand is extended by the nucleic acid polymerase, using the complementary strand as template.
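The spacer-primed extension described above can be illustrated with a toy string model. This is a sketch under stated assumptions, not the disclosed chemistry: both tags are written 5'→3' as plain DNA strings, annealing is reduced to an exact reverse-complement match of a fixed-length spacer, and the function names and example sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def primer_extend(recording_tag, coding_tag, spacer_len):
    """Extend the 3' end of recording_tag using coding_tag as the template.

    The 3'-terminal spacer of the recording tag must be exactly complementary
    to the 3'-terminal region of the coding tag; otherwise no extension occurs
    and the recording tag is returned unchanged.
    """
    rt_spacer = recording_tag[-spacer_len:]
    ct_anneal_site = coding_tag[-spacer_len:]
    if reverse_complement(rt_spacer) != ct_anneal_site:
        return recording_tag  # spacers do not anneal: nothing is copied
    # The polymerase copies the template region lying 5' of the annealing site.
    template = coding_tag[:-spacer_len]
    return recording_tag + reverse_complement(template)


# Hypothetical sequences: the coding tag is spacer' + encoder + spacer'.
rt = "GATTACA" + "ACTGAG"
ct = "CTCAGT" + "TTGACC" + "CTCAGT"
extended = primer_extend(rt, ct, spacer_len=6)
```

In this model the extended tag again ends in the same spacer, which is what would allow a further encoding cycle to prime from it.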
- UMI unique molecular identifier
- a polypeptide UMI can be used to computationally deconvolute sequencing data from a plurality of extended recording tags to identify extended recording tags that originated from an individual polypeptide.
- a polypeptide UMI can be used to accurately count originating polypeptide molecules by collapsing NGS reads to unique UMIs.
- a binder UMI can be used to identify each individual molecular binder that binds to a particular polypeptide. For example, a UMI can be used to identify the number of individual binding events for a binder specific for a single amino acid that occurs for a particular peptide molecule. It is understood that when UMI and barcode are both referenced in the context of a binder or polypeptide, the barcode refers to identifying information other than the UMI for the individual binder or polypeptide (e.g., sample barcode, compartment barcode, encoding cycle barcode).
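Collapsing NGS reads to unique polypeptide UMIs, as described above, can be sketched as follows. The read representation (a pair of UMI and extended-recording-tag payload), the majority-vote consensus, and the function name are assumptions made for illustration; the sketch also ignores sequencing errors within the UMI itself.

```python
from collections import Counter, defaultdict


def collapse_reads_by_umi(reads):
    """Group reads by polypeptide UMI and keep one consensus payload per UMI.

    reads: iterable of (umi, payload) pairs, where payload stands for the
    barcode content read from an extended recording tag.
    Returns a dict mapping each UMI to its most frequently observed payload.
    """
    groups = defaultdict(Counter)
    for umi, payload in reads:
        groups[umi][payload] += 1
    return {umi: counts.most_common(1)[0][0] for umi, counts in groups.items()}


reads = [("AAGT", "bc1"), ("AAGT", "bc1"), ("CCTA", "bc2"), ("AAGT", "bc3")]
collapsed = collapse_reads_by_umi(reads)
molecule_count = len(collapsed)  # one originating molecule per unique UMI
```

Counting unique UMIs rather than raw reads keeps amplification duplicates from inflating the estimate of originating polypeptide molecules.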
- universal priming site or “universal primer” or “universal priming sequence” refers to a nucleic acid molecule, which may be used for analysis, amplification, and/or for sequencing of extended recording tags.
- a universal priming site may include, but is not limited to, a priming site (primer sequence) for PCR amplification, flow cell adaptor sequences that anneal to complementary oligonucleotides on flow cell surfaces enabling bridge amplification in some next generation sequencing platforms, a sequencing priming site, or a combination thereof.
- extended recording tag molecules may be circularized and a universal priming site used for rolling circle amplification to form DNA nanoballs that can be used as sequencing templates (Drmanac et al., 2009, Science 327:78-81).
- recording tag molecules may be circularized and sequenced directly by polymerase extension from universal priming sites (Korlach et al., 2008, Proc. Natl. Acad. Sci. 105:1176-1181).
- extended recording tag refers to a recording tag to which information of at least one binder’s coding tag (or its complementary sequence) has been transferred following binding of the binder to a polypeptide.
- Information between the coding tag and the recording tag may be transferred directly (e.g., ligation) or indirectly (e.g., primer extension).
- Information between the coding tag and the recording tag may be transferred enzymatically or chemically (e.g., by chemical ligation).
- An extended recording tag may comprise binder information of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 40, 50 or more coding tags.
- the base sequence of an extended recording tag may reflect the temporal and sequential order of binding of the binders identified by their coding tags, may reflect a partial sequential order of binding of the binders identified by the coding tags, or may not reflect any order of binding of the binders identified by the coding tags.
- solid support refers to any solid material, including porous and non-porous materials, to which a polypeptide can be associated directly or indirectly, by any means known in the art, including covalent and non-covalent interactions, or any combination thereof.
- a solid support may be two-dimensional (e.g., planar surface) or three-dimensional (e.g., gel matrix or bead).
- a solid support can be any support surface including, but not limited to, a bead, a microbead, an array, a glass surface, a silicon surface, a plastic surface, a filter, a membrane, a PTFE membrane, a nitrocellulose-based polymer surface, a silicon wafer chip, a flow through chip, a flow cell, a biochip including signal transducing electronics, a channel, a microtiter well, an ELISA plate, a spinning interferometry disc, a nitrocellulose membrane, a polymer matrix, a nanoparticle, a microparticle, or a microsphere.
- Materials for a solid support include but are not limited to acrylamide, agarose, cellulose, dextran, nitrocellulose, glass, gold, quartz, polystyrene, polyethylene vinyl acetate, polypropylene, polyester, polymethacrylate, polyacrylate, polyethylene, polyethylene oxide, polysilicates, polycarbonates, polyvinyl alcohol (PVA), Teflon, fluorocarbons, nylon, silicone rubber, polyanhydrides, polyglycolic acid, polyvinylchloride, polylactic acid, polyorthoesters, functionalized silane, polypropylfumarate, collagen, glycosaminoglycans, polyamino acids, or any combination thereof.
- the bead can include, but is not limited to, a ceramic bead, a polystyrene bead, a polymer bead, a polyacrylate bead, a methylstyrene bead, an agarose bead, a cellulose bead, a dextran bead, an acrylamide bead, a solid core bead, a porous bead, a paramagnetic bead, a glass bead, a controlled pore bead, a silica-based bead, or any combinations thereof.
- a bead may be spherical or irregularly shaped.
- a bead’s size may range from nanometers, e.g., 10 nm, to millimeters, e.g., 1 mm.
- beads range in size from about 0.2 micron to about 200 microns, or from about 0.5 micron to about 5 microns.
- “a bead” solid support may refer to an individual bead or a plurality of beads.
- the nanoparticles range in size from about 10 nm to about 500 nm in diameter.
- nucleic acid or “polynucleotide” refers to a single- or double-stranded polynucleotide containing deoxyribonucleotides or ribonucleotides that are linked by 3’-5’ phosphodiester bonds, as well as polynucleotide analogs.
- a nucleic acid molecule includes, but is not limited to, DNA, RNA, and cDNA.
- a polynucleotide analog may possess a backbone other than a standard phosphodiester linkage found in natural polynucleotides and, optionally, a modified sugar moiety or moieties other than ribose or deoxyribose.
- Polynucleotide analogs contain bases capable of hydrogen bonding by Watson-Crick base pairing to standard polynucleotide bases, where the analog backbone presents the bases in a manner to permit such hydrogen bonding in a sequence-specific fashion between the oligonucleotide analog molecule and bases in a standard polynucleotide.
- polynucleotide analogs include, but are not limited to, xeno nucleic acid (XNA), bridged nucleic acid (BNA), glycol nucleic acid (GNA), peptide nucleic acids (PNAs), γPNAs, morpholino polynucleotides, locked nucleic acids (LNAs), threose nucleic acid (TNA), 2’-O-Methyl polynucleotides, 2’-O-alkyl ribosyl substituted polynucleotides, phosphorothioate polynucleotides, and boronophosphate polynucleotides.
- XNA xeno nucleic acid
- BNA bridged nucleic acid
- GNA glycol nucleic acid
- PNAs peptide nucleic acids
- a polynucleotide analog may possess purine or pyrimidine analogs, including for example, 7-deaza purine analogs, 8-halopurine analogs, 5-halopyrimidine analogs, or universal base analogs that can pair with any base, including hypoxanthine, nitroazoles, isocarbostyril analogues, azole carboxamides, and aromatic triazole analogues, or base analogs with additional functionality, such as a biotin moiety for affinity binding.
- the nucleic acid molecule or oligonucleotide is a modified oligonucleotide, such as it contains a modified nucleotide.
- the nucleic acid molecule or oligonucleotide is a DNA with pseudo-complementary bases. In some embodiments, the nucleic acid molecule or oligonucleotide is backbone modified, sugar modified, or nucleobase modified.
- nucleic acid sequencing means the determination of the order of nucleotides in a nucleic acid molecule or a sample of nucleic acid molecules, and refers to any possible sequencing method from a variety of sequencing methods known in the art. Examples of sequencing methods include, without limitation, next generation sequencing, sequencing by synthesis, sequencing by ligation, sequencing by hybridization, polony sequencing, ion semiconductor sequencing, nanopore sequencing, single molecule sequencing and pyrosequencing.
- nucleic acid sequencing technologies include platforms provided by Illumina, BGI, Qiagen, Thermo-Fisher, and Roche, including formats such as parallel bead arrays, sequencing by synthesis, sequencing by ligation, capillary electrophoresis, electronic microchips, “biochips,” microarrays, parallel microchips, and single-molecule arrays (See e.g., Service, Science (2006) 311:1544-1546).
- Some sequencing methods rely on amplification to clone many nucleic acid (e.g., DNA) molecules in parallel for sequencing in a phased approach.
- Single molecule sequencing interrogates single molecules of DNA and does not require amplification or synchronization. Examples of single molecule sequencing methods include single molecule real-time sequencing (Pacific Biosciences), nanopore-based sequencing (Oxford Nanopore), duplex interrupted nanopore sequencing, and direct imaging of DNA using advanced microscopy.
- analyzing means to identify, detect, quantify, characterize, distinguish, or a combination thereof, all or a portion of the components of the polypeptide.
- analyzing a peptide, polypeptide, or protein includes determining all or a portion of the amino acid sequence (contiguous or non-continuous) of the peptide.
- Analyzing a polypeptide also includes partial identification of a component of the polypeptide (e.g., one or more terminal residues of the polypeptide). For example, partial identification of an amino acid residue in the polypeptide sequence can identify an amino acid in the polypeptide as belonging to a subset of possible amino acid residues.
- polypeptide analysis typically begins with analysis of the n NTAA, and then proceeds to the next amino acid residue of the polypeptide (i.e., n-1, n-2, n-3, and so forth). This is accomplished by removal of the n NTAA, thereby converting the n-1 amino acid residue of the polypeptide to an N-terminal amino acid (referred to herein as the “n-1 NTAA”).
- Analyzing the polypeptide may also include determining the presence and frequency of a post-translational modification on the peptide, such as glycosylation. Analyzing the polypeptide may include combining different types of analysis, for example obtaining epitope information, amino acid sequence information, post-translational modification information, or any combination thereof.
- the term “wild-type” or “native”, as used herein in connection with biological materials such as nucleic acid molecules and polypeptides, refers to those which are found in nature and not modified by human intervention.
- an engineered binder is a polypeptide having an altered amino acid sequence, relative to an unmodified or wild-type polypeptide, such as a starting scaffold, or a portion thereof.
- An engineered binder is a polypeptide which differs from a wild-type scaffold sequence, or a portion thereof, by one or more amino acid substitutions, deletions, additions, or combinations thereof.
- The sequence of a binder can contain 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 50, 100 or more amino acid differences (e.g., mutations) compared to the sequence of the starting scaffold.
- a binder generally exhibits at least 30%, 40%, 50%, 60%, 70%, 80%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more sequence identity to a corresponding wild-type starting scaffold.
- Non-naturally occurring amino acids as well as naturally occurring amino acids are included within the scope of permissible substitutions or additions.
- a binder is not limited to any binders made or generated by a particular method of making and includes, for example, a binder made or generated by genetic selection, polypeptide engineering, chemical synthesis, directed evolution, de novo recombinant DNA techniques, or combinations thereof.
- In some embodiments, variants of a binder displaying only non-substantial or negligible differences in structure can be generated by making conservative amino acid substitutions in the binder.
- binder variants that comprise a sequence having at least 80% (85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, and 99%) sequence identity with the binder sequences can be generated, retaining at least one functional activity of the binder, e.g., ability to specifically bind an N-terminal amino acid (NTAA) residue of the polypeptide analyte.
- NTAA N-terminal amino acid
- Examples of conservative amino acid changes are known in the art.
- non-conservative amino acid changes that are likely to cause major changes in polypeptide structure are those that cause substitution of (a) a hydrophilic residue, e.g., serine or threonine, for (or by) a hydrophobic residue, e.g., leucine, isoleucine, phenylalanine, valine or alanine; (b) a cysteine or proline for (or by) any other residue; (c) a residue having an electropositive side chain, e.g., lysine, arginine, or histidine, for (or by) an electronegative residue, e.g., glutamic acid or aspartic acid; or (d) a residue having a bulky side chain, e.g., phenylalanine, for (or by) one not having a side chain, e.g., glycine.
- amino acid sequence variants can be prepared by mutations in the DNA.
- Methods for polynucleotide alterations are well known in the art, for example, Kunkel et al. (1987) Methods in Enzymol. 154:367-382; U.S. Pat. No. 4,873,192 and the references cited therein.
- identifying means to predict the identity of the peptide with a certain probability. It can be done by identifying a component (e.g., one or more amino acid residues) of the peptide. It can also be done by predicting certain amino acid residues of the peptide and their positions with a certain probability, thus creating a peptide signature, and then bioinformatically matching the resulting peptide signature with corresponding signatures of peptides that may be present in the sample (e.g., by matching the peptide signature with peptide sequences from a proteomic or genomic database).
- existing selectivity of a binder is not enough to determine the NTAA residue to which the binder is bound with certainty.
- identity of the NTAA residue can be determined with a certain probability (such as being D, E or H and not A, G, I or L).
- Subsequent similar determination of adjacent amino acid residues creates an array of possible variants for the peptide based on variants in the assayed amino acid residues, and by matching this array of variants with theoretical possibilities determined from a proteomic or genomic database, it can be narrowed down to a particular sequence, if enough amino acid residues were assayed.
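The narrowing-down step described above can be sketched by treating each assayed position as a set of residues the binders could not rule out, and filtering a candidate list against it. This is a simplified illustration: real assignments would carry probabilities rather than hard sets, and the function names and the small in-memory candidate list (standing in for a proteomic or genomic database) are hypothetical.

```python
def matches_signature(peptide, signature):
    """Check a peptide against a per-position signature.

    signature: list of sets; signature[i] holds the residues considered
    possible at position i, counted from the N-terminus.
    """
    if len(peptide) < len(signature):
        return False
    return all(res in allowed for res, allowed in zip(peptide, signature))


def candidate_peptides(signature, database):
    """Return the database peptides that are consistent with the signature."""
    return [p for p in database if matches_signature(p, signature)]


# First position "D, E or H", then two further partially resolved positions.
signature = [set("DEH"), set("AGIL"), set("FWY")]
database = ["DAFK", "EGWR", "AGFK", "HLYTT", "DK"]
hits = candidate_peptides(signature, database)
# Assaying one more residue shrinks the candidate list further.
narrowed = candidate_peptides(signature + [set("K")], database)
```

Each additional assayed position can only shrink the candidate list, which is why enough cycles can resolve a single sequence even when no individual call is certain.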
- sequence identity is a measure of identity between polypeptides at the amino acid level, and a measure of identity between nucleic acids at the nucleotide level.
- the polypeptide sequence identity may be determined by comparing the amino acid sequence in a given position in each sequence when the sequences are aligned.
- the nucleic acid sequence identity may be determined by comparing the nucleotide sequence in a given position in each sequence when the sequences are aligned.
- Sequence identity means the percentage of identical subunits at corresponding positions in two sequences when the two sequences are aligned to maximize subunit matching, i.e., taking into account gaps and insertions.
- the BLAST algorithm calculates percent sequence identity and performs a statistical analysis of the similarity and identity between the two sequences.
- NCBI National Center for Biotechnology Information
- the statement that nucleotides or amino acid positions “correspond to” nucleotides or amino acid positions of a disclosed sequence refers to nucleotides or amino acid positions identified in the polynucleotide or in the polypeptide upon alignment with the disclosed sequence using a standard alignment algorithm, such as the BLAST algorithm (NCBI).
- any given amino acid residue in a given polypeptide at a position corresponding to a particular position of a reference sequence can be identified by performing alignment of the polypeptide sequence with the reference sequence (for example, by using BLASTP publicly available through the NCBI website), matching the corresponding position of the reference sequence with the position in polypeptide sequence and thus identifying the amino acid residue within the polypeptide.
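Percent identity over an existing alignment can be computed as in the sketch below. It assumes the two sequences have already been aligned (for example by BLASTP) and are supplied as equal-length strings with "-" marking gaps; it uses the number of alignment columns as the denominator, which is one common convention and an assumption here. The function name is hypothetical and this is not the BLAST implementation.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of alignment columns holding identical, non-gap subunits."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Ignore columns that are gaps in both sequences.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == gap and b == gap)]
    if not columns:
        raise ValueError("alignment has no informative columns")
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)
```

For example, `percent_identity("ACD-FG", "ACDEFG")` counts five identical columns out of six, so a single gap lowers identity in the same way as a single mismatch under this convention.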
- joining means connecting or linking these substances together utilizing one or more covalent bond(s) and/or non-covalent interactions.
- non-covalent interactions include hydrogen bonding, hydrophobic binding, and Van der Waals forces.
- Joining can be direct or indirect, such as via a linker or via another moiety. In preferred embodiments, joining two or more substances together would not impair structure or functional activities of the joined substances. Attachment can be direct or indirect.
- indirect attachment means include attachment via a linker (e.g., flexible linker), or attachment via a solid support (i.e., when two moieties to be attached are independently coupled to the solid support).
- Recording tags can be attached to polypeptides pre- or post-immobilization to the solid support.
- polypeptides can be first labeled with recording tags and then immobilized to a solid surface via a recording tag comprising two functional moieties for coupling. One functional moiety of the recording tag couples to the polypeptide, and the other functional moiety immobilizes the recording tag- labeled polypeptide to a solid support.
- polypeptides are immobilized to a solid support prior to labeling with recording tags.
- polypeptides can first be derivatized with reactive groups such as click chemistry moieties. The activated polypeptides molecules can then be attached to a suitable solid support and then labeled with recording tags using the complementary click chemistry moiety.
- polypeptides derivatized with alkyne and mTet moieties may be immobilized to a flow cell derivatized with azide and transcyclooctene (TCO) and attached to recording tags labeled with azide and TCO.
- TCO transcyclooctene
- Other click chemistry reactions may also be utilized. It is understood that the methods provided herein for attaching peptides to the solid support may also be used to attach recording tags to the solid support or attach recording tags to peptides.
- m-tetrazine or phenyl tetrazine is used in an iEDDA click chemistry reaction.
- a target polypeptide is labeled with a bifunctional click chemistry reagent, such as alkyne-NHS ester (acetylene-PEG-NHS ester) reagent or alkyne-benzophenone to generate an alkyne-labeled polypeptide.
- an alkyne can also be a strained alkyne, such as cyclooctynes including Dibenzocyclooctyl (DBCO).
- DBCO Dibenzocyclooctyl
- the term “macromolecule comprises a moiety” refers to a situation where the moiety is either a part of the macromolecule, or directly attached to the macromolecule by means of one or more covalent bond(s), which unite them into a single molecule.
- the term “macromolecule associated with a moiety” indicates that the moiety may or may not be directly attached to the macromolecule by means of one or more covalent bond(s), and can be associated with the macromolecule by means of non-covalent interactions.
- macromolecule is associated with a recording tag
- association between the macromolecule and the recording tag encompasses various possible ways for association between the macromolecule and the recording tag (either direct, covalent or non-covalent association, or indirect association, such as association via a linker or via another object, such as via solid support).
- peptide bond refers to a chemical bond formed between two molecules (such as two amino acids) when the carboxyl group of one molecule reacts with the amino group of the other molecule, releasing a water molecule (H2O).
- PNGase refers to Peptide-N4-(N-acetyl-β-glucosaminyl)asparagine amidase, which is a type of deglycosylating enzyme that hydrolyzes N-linked (i.e., asparagine residue-linked) glycan moieties from glycopolypeptides and yields polypeptides containing aspartic acid residues in place of the original asparagine residues.
- PNGase may be a natural enzyme, such as enzymes that belong to enzyme classification number EC 3.5.1.52.
- Non-limiting examples of PNGase include PNGase F (such as derived from Elizabethkingia miricola with protein sequence found at UniProt ID: P21163), variants of PNGase F (e.g., PNGase F comprising one or more amino acid substitutions selected from D100N, E158Q, E246Q, or a combination thereof), PNGase A, PNGase H+, and variants thereof (see, e.g., Guo RR, et al., PNGase H+ variant from Rudaea cellulosilytica with improved deglycosylation efficiency for rapid analysis of eukaryotic N-glycans and hydrogen deuterium exchange mass spectrometry analysis of glycoproteins.
- the methods of the present disclosure can also be performed using a functional fragment of a natural PNGase enzyme, wherein the fragment is configured to release N-glycans from asparagine residues.
- an engineered enzyme having similar functionality to a natural PNGase enzyme, i.e., one that hydrolyzes N-linked glycan moieties from glycopolypeptides and yields polypeptides containing aspartic acid residues in place of the original asparagine residues, may also be used in the disclosed methods.
- an engineered PNGase used in the disclosed methods comprises an amino acid sequence having at least 30%, 40%, 50%, 60%, 70%, 80%, or 90% sequence identity to any one of SEQ ID NOs: 34-36.
- PNGase deglycosylates polypeptide analytes by removing N-glycans from a glycan-containing polypeptide.
- A single PNGase or mixtures of PNGase enzymes may be used to deglycosylate a polypeptide. Cleaving “substantially all” glycans results in a completely deglycosylated protein where “complete deglycosylation” refers to >70%, >80%, >90%, >98% or >99% deglycosylation by a PNGase as determined by SDS-PAGE or by mass spectrometry.
- glycan attachment residue of an N-linked glycosylation site refers to either the asparagine residue to which an N-linked glycan attaches, or to the aspartic acid residue to which the asparagine residue is converted following PNGase treatment. Amino acid identity of the glycan attachment residue may be determined by specific binders as disclosed herein.
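Because PNGase treatment leaves Asp where a glycan was attached and leaves unoccupied Asn unchanged, per-molecule residue calls at a site can be turned into an occupancy estimate. The sketch below is illustrative only: the function name and the input format (one "N" or "D" call per analyzed molecule) are assumptions, and it assumes every Asp call arises from PNGase conversion, ignoring other sources of Asp at the site.

```python
from collections import Counter


def site_occupancy(calls):
    """Estimate the fraction of molecules that carried a glycan at one site.

    calls: iterable of residue calls at the glycan attachment position after
    PNGase treatment, "D" (converted, formerly glycosylated) or "N"
    (unconverted, unoccupied). Any other call is ignored.
    Returns None when there are no usable calls.
    """
    counts = Counter(calls)
    n_calls, d_calls = counts["N"], counts["D"]
    total = n_calls + d_calls
    if total == 0:
        return None
    return d_calls / total
```

With three "D" calls and one "N" call at a site, the estimated occupancy is 0.75.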
- a high throughput approach for detecting the presence of a glycan in a particular N-glycosylation site in proteins or peptides is disclosed. This approach can be combined with high throughput peptide identification, so that N-glycosylation sites are analyzed without referring to a specific peptide with a known sequence, but rather in all (or most) peptides that are present in a particular sample, or even in multiple samples.
- Exemplary high throughput approaches that can be used for both peptide identification and the analysis of N-glycosylation sites are shown in FIG. 1-FIG. 3.
- the exemplary assays described in FIG. 1-FIG. 3 are referred to as next generation peptide sequencing (NGPS) assays, because each of them allows thousands, millions or more peptide molecules to be processed in parallel.
- NGPS next generation peptide sequencing
- FIG. 1 depicts an exemplary peptide sequencing assay with terminal amino acid (TAA)-specific binders.
- Peptide molecules are each attached to a nucleic acid (e.g., a DNA) recording tag (RT) and attached to beads at a low peptide/RT pair (e.g., a peptide/RT conjugate) density, a sparsity that permits only intra-peptide/RT pair information transfer to occur.
- the peptide and its RT can form a conjugate, and in cases where the peptide is covalently attached to the RT to form a single molecule, the conjugate sparsity on a bead permits only intramolecular information transfer, and not information transfer between adjacent conjugates on the bead.
- the peptide terminal amino acid (TAA) residues are labeled with a terminal modification (TM) to provide greater affinity to binders.
- TM terminal modification
- immobilized and labeled peptides are contacted with a set of binders each specific for labeled TAA residue(s) (e.g., labeled F-specific binder is shown).
- Each binder comprises a nucleic acid (e.g., DNA) coding tag (CT) that comprises a barcode with identifying information regarding the binding moiety of the binder.
- CT coding tag
- the coding tag barcode is transferred enzymatically (via extension and/or ligation, such as primer extension followed by ligation) to the recording tag, generating an extended RT.
- the labeled TAA is removed, e.g., by using mild Edman-like elimination chemistry or by a Cleavase enzyme.
- the cycle 1-2-3 is repeated n times.
- the extended RT containing barcodes that represent the n amino acid residues of the peptide is formed and can be sequenced by NGS. A representative structure of the extended RT after 7 cycles is shown.
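Reading the encoded residues back from a sequenced extended RT can be sketched as a barcode lookup. The sketch assumes the region added during encoding is a simple repeat of barcode followed by a shared spacer, that no barcode contains the spacer sequence, and that barcodes appear in cycle order; the sequences, the barcode-to-residue table, and the function name are hypothetical.

```python
def decode_extended_tag(extended_region, spacer, barcode_to_residue):
    """Translate the barcodes in an extended recording tag into residue calls.

    extended_region: the part of the extended RT appended over the encoding
    cycles, modeled as (barcode + spacer) repeated once per cycle.
    Unknown barcodes are reported as "X".
    """
    residues = []
    for unit in extended_region.split(spacer):
        if not unit:
            continue  # empty piece left after the final spacer
        residues.append(barcode_to_residue.get(unit, "X"))
    return "".join(residues)


spacer = "ACTGAG"
barcodes = {"GGTCAA": "F", "TTCGCA": "L", "CATGCC": "D"}
region = "GGTCAA" + spacer + "TTCGCA" + spacer + "CATGCC" + spacer
calls = decode_extended_tag(region, spacer, barcodes)
```

Since the first cycle interrogates the original NTAA, the decoded string reads from the N-terminus inward under these assumptions.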
- FIG. 2 depicts an exemplary variation of the peptide sequencing assay with terminal amino acid (TAA)-specific binders as described in FIG. 1.
- the variation is in the use of a set of binders each specific for non-labeled TAA residue(s) (e.g., F-specific binder is shown). Labeling of the TAA occurs after the barcode with identifying information regarding the binding moiety is transferred from the coding tag attached to the binder that was bound to the TAA to the recording tag attached to the peptide. More details regarding the NGPS approaches shown in FIG. 1-FIG. 2 are described in the US patent applications US 2023/0054691 A1, US 11513126 B2, US 2022/0227889 A1 and US 2022/0283175 A1, each incorporated by reference herein.
- FIG. 3 depicts another exemplary peptide sequencing assay with terminal amino acid (TAA)-specific binders.
- Peptide molecules are each attached to a nucleic acid (e.g., a DNA) recording tag (RT) and attached to beads at a low peptide/RT pair (e.g., a peptide/RT conjugate) density, a sparsity that permits only intra-peptide/RT pair information transfer to occur.
- the peptide and its RT can form a conjugate, and in cases where the peptide is covalently attached to the RT to form a single molecule, the conjugate sparsity on a bead permits only intramolecular information transfer, and not information transfer between adjacent conjugates on the bead.
- More details regarding the NGPS approaches shown in FIG. 3 are described in the US patent applications US 20200209255 A1, US 20210139973 A1 and US 20210364527 A1, each incorporated by reference herein.
- a method for assessing an attachment of an N-linked glycan to a glycan attachment residue of a N-linked glycosylation site of a polypeptide comprising:
- each binder of the set of binders is attached to a detectable label that comprises identifying information regarding the binder, and the set of binders comprises: a first binder that preferentially binds to an Asn (N) residue over an Asp (D) residue as the glycan attachment residue of the N-linked glycosylation site; and a second binder that preferentially binds to the Asp (D) residue over the Asn (N) residue as the glycan attachment residue of the N-linked glycosylation site, wherein the first binder or the second binder of the set of binders binds to the glycan attachment residue of the N-linked glycosylation site of the immobilized polypeptide treated with the PNGase; and (c) determining amino acid identity of the glycan attachment residue of the N-linked glycosylation site treated with the PNGase by
- a method for assessing an attachment of an N-linked glycan to a glycan attachment residue of a N-linked glycosylation site of a polypeptide comprising: (a) contacting the polypeptide with a PNGase, wherein the PNGase cleaves the N-linked glycan from the glycan attachment residue of the N-linked glycosylation site, and wherein the polypeptide is immobilized on a solid support before or after the contacting with the PNGase, thereby providing the immobilized polypeptide treated with the PNGase;
- each binder of the set of binders is attached to a detectable label that comprises identifying information regarding the binder, and the set of binders comprises: a first binder that preferentially binds to an Asn (N) residue over an Asp (D) residue as the TAA residue; and a second binder that preferentially binds to the Asp (D) residue over the Asn (N) residue as the TAA residue, wherein the first binder or the second binder of the set of binders binds to the TAA residue of the immobilized and cleaved polypeptide treated with the PNGase; and (d) determining amino acid identity of the glycan attachment residue of the N-linked glycosylation site treated with the PNGase by obtaining or retaining identifying information regarding the first binder or the second binder that bind to the TAA residue of the immobilized and clea
- each binder of the set of binders is attached to a detectable label that comprises identifying information regarding the binder, and the set of binders comprises: a first binder that binds to a terminal Asn (N) residue; a second binder that binds to a terminal Asp (D) residue; and a third binder that binds to a terminal amino acid (TAA) residue other than Asn (N) and Asp (D), wherein the first binder, the second binder, or the third binder binds to the TAA residue of the immobilized polypeptide;
- the TAA residues are N-terminal amino acid (NTAA) residues. In some embodiments, the TAA residues are modified.
- the TAA is an N-terminal amino acid (NTAA). In other embodiments of the disclosed methods, the TAA is a C-terminal amino acid (CTAA).
- the cleavage of the polypeptide is performed by a cleaving enzyme such as an enzyme described in US 11,427,814 B2.
- the disclosed methods further comprise quantifying degree of the attachment of the N-linked glycan to the glycan attachment residue of the N-linked glycosylation site of the polypeptide by determining what fraction of molecules of the polypeptide or the cleaved polypeptide bind to the first binder and/or the second binder based on analysis of the identifying information regarding the first binder and/or the second binder.
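The quantification described above reduces to a ratio of binder counts. The sketch below assumes that, after PNGase treatment, every formerly glycosylated attachment residue reads as Asp and every unoccupied one reads as Asn, and that both binders report with equal efficiency; the function name and inputs are hypothetical.

```python
def occupancy_fraction(asn_binder_reads: int, asp_binder_reads: int) -> float:
    """Fraction of molecules whose attachment residue reads as Asp after PNGase."""
    total = asn_binder_reads + asp_binder_reads
    if total == 0:
        raise ValueError("no reads recorded for this glycosylation site")
    return asp_binder_reads / total


# 25 molecules called Asn and 75 called Asp suggest roughly 75% site occupancy.
print(occupancy_fraction(25, 75))
```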
- additional 1000 or more different polypeptides each comprising a N-linked glycosylation site are analyzed in parallel utilizing the first binder and the second binder.
- attachments of an N-linked glycan to a glycan attachment residue of the N-linked glycosylation site of each polypeptide of the additional 1000 or more different polypeptides are assessed.
- the identifying information regarding the first binder and/or the second binder is analyzed by an optical method.
- the identifying information regarding the first binder and/or the second binder is analyzed by a nucleic acid sequencing method.
- the N-linked glycosylation site comprises any one of the following amino acid sequences: AsnXxxSer, AsnXxxThr or AsnXxxCys, wherein Xxx is any standard, naturally occurring amino acid residue.
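As a hedged illustration, candidate sites matching the AsnXxxSer, AsnXxxThr or AsnXxxCys patterns can be located in a one-letter sequence with a regular expression. The sketch follows the text above and lets Xxx be any of the 20 standard residues; many sequon definitions additionally exclude proline at that position, which would be a one-character change to `STANDARD`. The names used here are illustrative.

```python
import re

STANDARD = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids, one-letter code
# Lookahead so that overlapping sites (e.g. "NNTS") are all reported.
SEQUON = re.compile(rf"(?=N[{STANDARD}][STC])")


def find_sequons(sequence: str) -> list:
    """Return 0-based positions of Asn residues that start an N-X-S/T/C motif."""
    return [match.start() for match in SEQUON.finditer(sequence)]


print(find_sequons("MNGSANATK"))
```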
- one or more additional molecules of the polypeptide are (i) immobilized on the solid support; (ii) not contacted with PNGase; and (iii) analyzed as described in (b) to identify a shift from an Asn residue to an Asp residue as the first residue of the N-linked glycosylation site of the immobilized polypeptide or the cleaved immobilized polypeptide after the contacting with the PNGase.
- contacting with the PNGase occurs after the immobilization of the polypeptide to the solid support.
- the detectable labels of the first binder and/or the second binder each comprise a fluorescently labeled probe, and the amino acid identity of the amino acid residue treated with the PNGase is determined by detecting a signal generated from the fluorescently labeled probe.
- the detectable labels of the first binder and/or the second binder each comprise an epitope, which is later detected by a specific antibody.
- the detectable labels of the first binder and/or the second binder each comprise a nucleic acid coding tag that comprises an encoder barcode that comprises identifying information regarding the binder.
- the amino acid identity of the amino acid residue treated with the PNGase is determined by detecting a signal generated via performing an in situ nucleic acid sequencing of the encoder barcode of the first binder or the second binder.
- the amino acid identity of the amino acid residue treated with the PNGase is determined by detecting a signal generated via performing an in situ amplification of at least a portion of the nucleic acid coding tag that comprises the encoder barcode.
- the in situ amplification is a rolling circle amplification.
- the disclosed methods further comprise hybridizing a fluorescent oligonucleotide probe to the amplified portion of the nucleic acid coding tag and detecting a signal from the fluorescent oligonucleotide probe.
- (i) the immobilized polypeptide is attached to a nucleic acid recording tag before contacting the immobilized polypeptide treated with the PNGase with the set of binders; and (ii) following binding of the first binder or the second binder to the glycan attachment residue of the N-linked glycosylation site of the immobilized polypeptide, generating an extended nucleic acid construct comprising nucleic acid sequence information of the encoder barcode of the first binder or the second binder and nucleic acid sequence information of the nucleic acid recording tag attached to the immobilized polypeptide; and wherein analyzing the detectable label of the first binder or the second binder comprises determining a nucleic acid sequence of at least a portion of the extended nucleic acid construct, wherein the portion comprises the nucleic acid sequence information of the encoder barcode of the first binder or the second binder.
- the nucleic acid sequence of the portion of the extended nucleic acid construct is determined using a DNA sequencer.
- the support is a flow cell.
- determining the amino acid identity of the amino acid residue treated with the PNGase comprises determining a likelihood of a particular type of the amino acid residue.
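One way to express such a likelihood, offered only as a sketch, is Bayes' rule over the two candidate residues given which binder's barcode was recorded. The cross-reactivity values in `P_SIGNAL` are invented for illustration and would in practice come from binder characterization; the function name and the uniform default prior are also assumptions.

```python
# Hypothetical P(barcode recorded | true residue) for each binder.
P_SIGNAL = {
    "N_binder": {"N": 0.90, "D": 0.05},
    "D_binder": {"N": 0.10, "D": 0.85},
}


def residue_posterior(binder: str, prior=None) -> dict:
    """Posterior probability of Asn vs Asp given the binder that was observed."""
    prior = prior or {"N": 0.5, "D": 0.5}
    joint = {res: P_SIGNAL[binder][res] * p for res, p in prior.items()}
    total = sum(joint.values())
    return {res: value / total for res, value in joint.items()}


print(residue_posterior("D_binder"))
```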
- the disclosed methods further comprise determining amino acid identities of one or more additional amino acid residues of the polypeptide treated with the PNGase by obtaining or retaining identifying information regarding the first binder or the second binder.
- the disclosed methods further comprise determining an amino acid sequence of the polypeptide treated with the PNGase based on the determined amino acid identities of the one or more additional amino acid residues of the polypeptide.
- the disclosed methods further comprise comparing the amino acid sequence of the polypeptide treated with the PNGase to an amino acid sequence of the polypeptide determined without contacting the polypeptide with the PNGase.
- the first binder binds to a terminal Asn (N) residue, and/or the second binder binds to a terminal Asp (D) residue.
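The comparison of treated and untreated sequences can be illustrated with a position-by-position scan for Asn-to-Asp changes. This sketch assumes both sequences were called at every position and are aligned and of equal length; the function name is hypothetical.

```python
def asn_to_asp_shifts(untreated: str, treated: str) -> list:
    """Return 0-based positions read as Asn without PNGase and as Asp with PNGase."""
    if len(untreated) != len(treated):
        raise ValueError("sequences must be aligned and of equal length")
    return [
        i
        for i, (before, after) in enumerate(zip(untreated, treated))
        if before == "N" and after == "D"
    ]


# Position 1 shifts from N to D, which is consistent with an occupied site.
print(asn_to_asp_shifts("ANGSK", "ADGSK"))
```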
- the set of binders further comprises a third binder that binds to a terminal amino acid (TAA) residue other than Asn (N) and Asp (D).
- the method does not comprise use of mass spectrometry.
- the first binder of the set of binders is configured to specifically bind to an Asn (N) residue within a motif, such that it also binds one or more neighboring amino acid residues of the polypeptide analyte.
- the second binder is configured to specifically bind to an Asp (D) residue within a motif, such that it also binds one or more neighboring amino acid residues of the polypeptide analyte.
- the motif may be a terminal motif or internal motif.
- the first binder and/or the second binder may comprise an antibody or a functional fragment thereof which is configured to recognize a certain motif (e.g., 2 amino acid residues, 3 amino acid residues, 4 amino acid residues, or 5 amino acid residues) within a polypeptide sequence.
- such binders may be generated as described in US11970693B2 and US11282586B2, incorporated by reference herein.
- binders used in the disclosed methods can specifically bind to an amino acid residue that serves as an attachment site of an N-linked glycan of a N-linked glycosylation site of a polypeptide.
- the amino acid residues that serve as an attachment site of N-glycans can be a terminal amino acid residue (TAA) or internal amino acid residue. Terminal amino acid residues may be specifically detected, because they are located in an unstructured “tail,” which can make them readily accessible by binders.
- a polypeptide analyte is cleaved to make the attachment residue of a N-linked glycosylation site a terminal residue.
- amino acid residues of a polypeptide analyte are sequentially cleaved until the attachment residue of a N-linked glycosylation site becomes a terminal residue.
- N-linked glycosylation sites in proteins often comprise the Asn-X-Ser/Thr sequence motif, which usually sits in a solvent-exposed, flexible region (a loop or turn) rather than in regular secondary structure.
- Loop or turn regions are more solvent-exposed, which enhances glycosylation efficiency and makes the glycan accessible for quality-control lectins, folding chaperones, and maturation enzymes. Accordingly, the attachment site residue of a N-linked glycosylation site may be specifically recognized by a binder even when such residue is an internal residue of a polypeptide analyte.
- the first binder is configured to specifically bind to a terminal Asn (N) residue.
- the second binder is configured to specifically bind to a terminal Asp (D) residue.
- the set of binders used in the assay comprises a third binder that binds to a terminal amino acid (TAA) residue other than Asn (N) and Asp (D).
- Such binders are known in the art, see, e.g., U.S. Patent Nos. 9,566,335, 10,852,305, 11,959,920 and 9,435,810, incorporated by reference, and Example 1 below.
- the N-linked glycan attached to the attachment residue in a N-linked glycosylation site of a polypeptide is a glycan found on human glycoproteins and efficiently released by PNGase under appropriate conditions.
- the N- linked glycan is a high-mannose type glycan such as a glycan having a chitobiose core (GlcNAca) and 3-7 mannose residues (e.g., MamGlcNAca, MamGlcNAca).
- proteins or peptides in each sample may be barcoded by installing a sample-specific barcode as a part of a recording tag attached to a particular peptide. Peptides from multiple samples, each attached to a recording tag, are mixed together and processed according to the methods described herein. During parallel analysis of extended recording tags that were attached to peptides during the encoding assay, sample-specific barcode information is extracted and decoded, so the identity and glycosylation status of each analyzed peptide can be combined with the origin of the peptide (i.e., the sample from which the peptide originated). Barcoding methods that permit sample multiplexing are described in the US patent applications US 2019/0145982 A1, US 2022/0214353 A1, and US 2022/0235405 A1, each incorporated by reference herein.
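Decoding the sample of origin from pooled reads can be sketched as a simple demultiplexing step. The read layout assumed here (a fixed-length sample barcode at the start of each extended recording tag read, followed by the binder barcodes) and the barcode values are hypothetical and chosen only to make the example concrete.

```python
from collections import defaultdict

SAMPLE_BARCODES = {"AAC": "sample_1", "GGT": "sample_2"}  # hypothetical values
SAMPLE_BC_LEN = 3  # assumed fixed barcode length at the 5' end of each read


def demultiplex(reads: list) -> dict:
    """Group reads by sample barcode, returning the remainder of each read."""
    by_sample = defaultdict(list)
    for read in reads:
        sample = SAMPLE_BARCODES.get(read[:SAMPLE_BC_LEN])
        if sample is None:
            continue  # unrecognized barcode: drop the read rather than guess
        by_sample[sample].append(read[SAMPLE_BC_LEN:])
    return dict(by_sample)


print(demultiplex(["AACTTAG", "GGTGGCA", "CCCAAAA"]))
```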
- each binder of the set of binders and the subsequent set of binders specifically binds to an amino acid residue or sequence and is attached to a detectable label that comprises identifying information regarding the binder
- the set of binders and/or the subsequent set of binders comprise: a first binder that preferentially binds to an Asn (N) residue over an Asp (D) residue; and a second binder that preferentially binds to the Asp (D) residue over the Asn (N) residue, and wherein the at least partial amino acid sequence is determined by analyzing the detectable labels attached to the binders that bind to the immobilized polypeptide or the cleaved immobilized polypeptide, thereby analyzing N-linked glycosylation in the polypeptide.
- the disclosed methods are for analysis and/or sequencing of multiple polypeptide analytes simultaneously (multiplexing).
- Multiplexing refers to analysis of a plurality of polypeptide analytes in the same assay.
- the plurality of polypeptide analytes can be derived from the same sample or different samples.
- the plurality of polypeptide analytes can be derived from the same subject or different subjects.
- the plurality of polypeptide analytes that are analyzed can be different polypeptide analytes, or the same polypeptide analyte derived from different samples.
- a plurality of polypeptide analytes includes 2 or more polypeptide analytes, 5 or more polypeptide analytes, 10 or more polypeptide analytes, 50 or more polypeptide analytes, 100 or more polypeptide analytes, 500 or more polypeptide analytes, 1000 or more polypeptide analytes, 5,000 or more polypeptide analytes, 10,000 or more polypeptide analytes, 50,000 or more polypeptide analytes, 100,000 or more polypeptide analytes, 500,000 or more polypeptide analytes, or 1,000,000 or more polypeptide analytes.
- the disclosed methods are for analyzing a large number of polypeptides (e.g., at least 1000, 10000, 100000, 1000000 or more polypeptide molecules which comprise molecules of at least 100, 1000, 10000 or more different polypeptides) in a single assay.
- the disclosed methods are for analyzing a large number of N-linked glycosylation sites within a plurality of polypeptides (e.g., at least 100, 1000, 10000, 100000, or more N-linked glycosylation sites present in 1000, 10000, 100000, 1000000 or more polypeptide molecules).
- the first polypeptide is generated by fragmenting a protein from a sample, such as a biological sample.
- the disclosed method further comprises quantifying degree of glycosylation for the N-linked glycosylation site of the first polypeptide by determining what fraction of the immobilized first polypeptide molecules comprise Asp TAA as the first residue of the N-linked glycosylation site based on analysis of the identifying information regarding the first binder and/or regarding the second binder obtained or retained in step (b) for the immobilized first polypeptide molecules.
- immobilized polypeptide molecules comprising immobilized first polypeptide molecules obtained in (a) are: (i) not treated with the PNGase in (a), and (ii) analyzed as described in (b), followed by utilizing data obtained by the analysis of immobilized polypeptide molecules not treated with the PNGase in (c) to identify a shift from Asn residue to Asp residue as the first residue of the N-linked glycosylation site of the immobilized first polypeptide molecules after treatment with the PNGase.
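The use of the untreated control can be illustrated by subtracting the Asp fraction seen without PNGase from the Asp fraction seen with PNGase. This is a simplified sketch that assumes the background Asp signal (for example, pre-existing Asp or miscalls) is additive and identical in both aliquots; the function and parameter names are hypothetical.

```python
def corrected_occupancy(asp_treated: int, total_treated: int,
                        asp_control: int, total_control: int) -> float:
    """Asn-to-Asp shift attributable to PNGase, floored at zero."""
    if total_treated == 0 or total_control == 0:
        raise ValueError("both aliquots need at least one analyzed molecule")
    fraction_treated = asp_treated / total_treated
    fraction_control = asp_control / total_control
    return max(0.0, fraction_treated - fraction_control)


# 80% Asp after PNGase versus 5% Asp without it gives a shift of about 0.75.
print(corrected_occupancy(80, 100, 5, 100))
```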
- the method does not comprise use of mass spectrometry.
- the first polypeptide is a pre-selected target polypeptide. In other embodiments, the first polypeptide is a random, previously unknown polypeptide present in a sample, such as a biological sample.
- in addition to the first polypeptide, degrees of glycosylation for 100, 200, 500, 1000, 10000 or more different polypeptides each comprising a N-linked glycosylation site are determined in (c) utilizing the first binder and the second binder.
- the TAA is an N-terminal amino acid (NTAA) and the new TAA is a new NTAA.
- the TAA is a C-terminal amino acid (CTAA) and the new TAA is a new CTAA.
- each binder of the plurality binds to a modified TAA of an immobilized polypeptide.
- the modified TAA is a modified N-terminal amino acid (NTAA).
- the modified TAA is a modified C-terminal amino acid (CTAA).
- the modified NTAA of the immobilized polypeptide is obtained by modifying the immobilized polypeptide with an N-terminal modifier agent before contacting the solid support with the set of binders.
- the N-terminal modifier agent is selected from the group consisting of compounds of the following Formulas (10)-(13): wherein M is a metal binding group that comprises sulfonamide, hydroxamic acid, sulfamate, or sulfamide; the group is a 5 or 6 membered aromatic ring containing up to three heteroatoms selected from N, O, and S as ring members, and is optionally substituted by R; R represents one or two optional substituents selected from the group consisting of F, Cl, CH3, CF2H, CF3, OH, OCH3, OCF3, NH2, N(CH3)2, NO2, SCH3, SO2CH3, CH2OH, B(
- the first binder and/or the second binder each comprises a peptide or an aptamer.
- each binder of the plurality comprises an identifying detectable label (i.e., detectable label that identifies the binder to which it is attached);
- the identifying information regarding the binder that binds to the TAA or modified TAA of the polypeptide in (c)(ii) is obtained by detecting the identifying detectable label attached to the binder.
- the identifying detectable label comprises a fluorescent moiety
- the identifying information regarding the binder that binds to the TAA or modified TAA of the polypeptide in (c)(ii) is obtained by detecting a signal attached to the fluorescent moiety.
- (i) the immobilized polypeptides comprising the immobilized first polypeptide are each independently attached to a nucleic acid recording tag; (ii) each binder of the plurality is attached to a nucleic acid coding tag that comprises identifying information regarding the binder; and (iii) the identifying information regarding the binder that binds to the TAA or modified TAA of the polypeptide in (b)(ii) is retained in the nucleic acid recording tag attached to the polypeptide upon transfer from the nucleic acid coding tag, wherein the transfer comprises primer extension and/or ligation.
- the immobilized polypeptides comprising the immobilized first polypeptide are each independently attached to a nucleic acid recording tag that comprises a unique molecular identifier (UMI);
- each binder of the plurality is attached to a nucleic acid coding tag that comprises a barcode comprising identifying information regarding the binder; and
- the UMI from the polypeptide is transferred from the recording tag attached to the polypeptide to the coding tag of the binder, wherein the transfer comprises primer extension and/or ligation.
- Coding tags of binders are collected after each binding cycle, and the identifying information regarding binders that were bound to polypeptides in each binding cycle is obtained by sequencing of the coding tags and analyzing the UMIs present in the coding tags in connection with the barcodes present in the coding tags.
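When information flows from the recording tag to the coding tag, the per-cycle pools of coding tags have to be regrouped by UMI to recover each polypeptide's binding history. The sketch below assumes each sequenced coding tag has already been parsed into a `(umi, binder_barcode)` pair; that parsing step, the data layout, and the function name are assumptions.

```python
from collections import defaultdict


def binding_history(cycles: list) -> dict:
    """Map each UMI to {cycle number: binder barcode} across all binding cycles.

    cycles: one entry per binding cycle, each a list of (umi, binder_barcode) pairs.
    """
    history = defaultdict(dict)
    for cycle_number, coding_tags in enumerate(cycles, start=1):
        for umi, barcode in coding_tags:
            history[umi][cycle_number] = barcode
    return {umi: dict(by_cycle) for umi, by_cycle in history.items()}


cycles = [
    [("UMI_1", "BC_N"), ("UMI_2", "BC_F")],  # coding tags collected after cycle 1
    [("UMI_1", "BC_G")],                     # coding tags collected after cycle 2
]
print(binding_history(cycles))
```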
- the analysis of the identifying information regarding the first binder and/or regarding the second binder retained in step (b) for the immobilized first polypeptide molecules is performed by nucleic acid sequencing of recording tags attached to the first polypeptide molecules and extended upon transfer from the nucleic acid coding tags.
- the first binder specifically binds to Asn (N) TAA of an immobilized polypeptide
- the second binder specifically binds to Asp (D) TAA of an immobilized polypeptide.
- the first binder specifically binds not only to Asn (N) TAA of an immobilized polypeptide, but also to a penultimate terminal amino acid residue of the immobilized polypeptide.
- the second binder specifically binds not only to Asp (D) TAA of an immobilized polypeptide, but also to a penultimate terminal amino acid residue of the immobilized polypeptide.
- the first binder and/or the second binder specifically bind(s) not only to TAA of an immobilized polypeptide, but also to one or more neighboring amino acid residues of the immobilized polypeptide.
- the first binder preferentially binds to an Asn (N) residue over an Asp (D) residue of an immobilized polypeptide
- the second binder preferentially binds to the Asp (D) residue over the Asn (N) residue of an immobilized polypeptide.
- the Asn (N) residue and the Asp (D) residue to which the binders are bound are terminal amino acid residues of immobilized polypeptide molecules.
- each of the immobilized polypeptides comprising the immobilized first polypeptide in (a) is covalently joined to the solid support.
- the solid support is a bead, such as a porous bead.
- the solid support comprises a plurality of nucleic acid recording tags covalently attached to the support and configured to be associated directly or indirectly with analyzed polypeptides including the first polypeptide, wherein adjacent nucleic acid recording tags on the support are spaced apart from each other on a surface or within a volume of the support at an average distance of about 50 nm or greater. This is beneficial in embodiments where multiple different polypeptides are immobilized on the same support.
- Different polypeptides can be spaced appropriately (e.g., about 50 nm or greater) to reduce the occurrence of or prevent a cross-binding or inter-molecular event, e.g., where a binder binds to a first polypeptide and its coding tag information (i.e., the identifying barcode) is transferred to a recording tag attached to a neighboring polypeptide rather than the recording tag attached to the first polypeptide.
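To give a feel for what an average spacing of about 50 nm implies, the following back-of-the-envelope sketch estimates how many recording tags fit on the outer surface of a spherical bead. It assumes a smooth, non-porous sphere and a square-grid packing estimate (one tag per spacing squared); the interior volume that a porous bead would add is ignored, and the function name is hypothetical.

```python
import math


def max_tags_on_bead(bead_diameter_um: float, spacing_nm: float = 50.0) -> int:
    """Rough upper bound on surface sites for a given average tag spacing."""
    diameter_nm = bead_diameter_um * 1000.0
    surface_area_nm2 = math.pi * diameter_nm ** 2   # sphere surface area = pi * d^2
    return int(surface_area_nm2 // spacing_nm ** 2)  # one tag per spacing^2 patch


# A 1 um bead at 50 nm spacing holds on the order of a thousand tags.
print(max_tags_on_bead(1.0))
```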
- the analysis comprises determining identity of amino acid residues present at the N-linked glycosylation site of different immobilized first polypeptide molecules.
- the N-linked glycosylation site comprises any one of the following sequences: AsnXxxSer, AsnXxxThr or AsnXxxCys.
- the TAA or the modified TAA are removed using an enzyme.
- the treatment with the PNGase occurs after the attachment to the solid support.
- steps (i)-(iii) or (i)-(ii) are repeated at least two, at least three, at least four or more times.
- the disclosed methods further comprise determining identity of the first polypeptide based on analysis of (i) the identifying information regarding the first binder and/or regarding the second binder obtained or retained in step (b) for the immobilized first polypeptide molecules, and (ii) identifying information regarding at least one additional binder of the set of binders that was bound to some of first polypeptide molecules during step (b).
- each binder of the set of binders is attached to a nucleic acid coding tag that comprises identifying information regarding the binder. In some embodiments of the disclosed methods, each binder of the set of binders is covalently attached to the associated nucleic acid coding tag.
- the identifying information regarding a binder may be obtained or retained, depending on the detection method used.
- the identifying information regarding a binder is obtained, such as an identified detectable label of the binder is detected or analyzed after the binder binds to the target polypeptide.
- the identifying information regarding a binder is retained, such as retained in a nucleic acid recording tag attached to a target polypeptide after the binder binds to the target polypeptide; the retained identifying information may be analyzed and decoded later after completion of the encoding assay (i.e., after analyzing sequentially at least some individual amino acid residues of the immobilized polypeptide).
- the retained identifying information is analyzed using a nucleic acid sequencing of the recording tag attached to the target polypeptide and extended after the encoding assay.
- At least one binder of the set of binders is or comprises an engineered serine carboxypeptidase. In other embodiments of the disclosed methods, at least one binder of the set of binders is or comprises an engineered cysteine carboxypeptidase.
- a DNA polymerase that is used for primer extension during information transfer possesses strand-displacement activity and has limited or no 3'-5' exonuclease activity.
- examples of such polymerases include Klenow exo- (Klenow fragment of DNA Pol I), T4 DNA polymerase exo-, T7 DNA polymerase exo- (Sequenase 2.0), Pfu exo-, Vent exo-, Deep Vent exo-, Bst DNA polymerase large fragment exo-, Bca Pol, and Phi29 Pol exo-.
- the DNA polymerase is active at room temperature and up to 45 °C.
- in another embodiment, a “warm start” version of a thermophilic polymerase is employed such that the polymerase is activated and used at about 40 °C-50 °C.
- An exemplary warm start polymerase is Bst 2.0 Warm Start DNA Polymerase (New England Biolabs).
- Barcode information of a coding tag attached to a specific binder may be transferred to a recording tag attached to the immobilized polypeptide via ligation.
- Ligation may be a blunt end ligation or sticky end ligation.
- Ligation may be an enzymatic ligation reaction. Examples of ligases include, but are not limited to CV DNA ligase, T4 DNA ligase, T7 DNA ligase, T3 DNA ligase, Taq DNA ligase, E. coli DNA ligase.
- a ligation may be a chemical ligation reaction.
- a spacer-less ligation is accomplished by using hybridization of a “recording helper” sequence with an arm on the coding tag.
- annealed complement sequences are chemically ligated using standard chemical ligation or “click chemistry” (Gunderson et al., Genome Res (1998) 8(11): 1142-1153; Litovchick et al., Artif DNA PNA XNA (2014) 5(1): e27896; Roloff et al., Methods Mol Biol (2014) 1050:131-141).
- a natural PNGase enzyme is used in the disclosed methods to treat polypeptides having N-linked glycan moieties to remove the N-linked glycan moieties from the polypeptides and yield polypeptides containing aspartic acid residues in places of the original asparagine residues in the N-linked glycosylation sites.
- A variety of PNGase enzymes are described in the literature, are available commercially, and can be used in the disclosed methods.
- Natural PNGases can be divided into three groups: PNGase F-like (bacterial) enzymes, acidic PNGases and cytoplasmic PNGases (see, e.g., Wang T, Voglmeir J. PNGases as valuable tools in glycoprotein analysis. Protein Pept Lett. 2014;21(10):976-85). All PNGases catalyze the same enzymatic reaction, but their protein structures are different, which reflects their different biological roles.
- PNGase F-like enzymes can be produced in E. coli, and are commercially available (e.g., PNGase F and rapid PNGase F from New England Biolabs, Ipswich, U.S., Cat. No P0704 and P0710).
- the catalytic structure of PNGase F-like enzymes is well-known (Norris GE, Stillman TJ, Anderson BF, Baker EN. PNGase F, a glycosylasparaginase from Flavobacterium meningosepticum. Structure. 1994 Nov 15;2(11):1049-59), and improved versions of the enzyme were made and used.
- PNGase de-glycosylates polypeptide molecules at N-linked glycosylation sites. In some embodiments, PNGase de-glycosylates at least 50%, 60%, 70%, 80%, 90%, 95% or more polypeptide molecules at the N-linked glycosylation site present within the analyzed polypeptide.
- a suitable acidic PNGase (PNGase A) or cytoplasmic PNGase (e.g., yPngl from yeast) is used in the disclosed methods.
- PNGase used in the disclosed methods is selected from the group consisting of PNGase A (see SEQ ID NO: 36), PNGase Ar, PNGase Le, PNGase F (see SEQ ID NO: 34 or SEQ ID NO: 35), a derivative thereof (i.e., having amino acid sequence comprising at least 30%, 50%, 70% or 90% identity to any one of SEQ ID NO: 34-36), a fragment thereof, or a combination thereof.
- PNGase is expressed as a recombinant protein before use in the disclosed methods, such as expressed in animal cells, yeast cells, or insect cells.
- a host cell such as yeast capable of secreting PNGase is selected although the PNGase could also be purified from the lysate of the host cells (see US 9964548 B2, incorporated by reference herein).
- suitable host cells for expressing a plant-derived PNGase may include yeast such as Kluyveromyces lactis or Pichia pastoris.
- where PNGase is synthesized in a non-native host cell with a high-mannose N-linked glycan incorporated into the PNGase during expression, it may be deglycosylated using a suitable high-mannose N-linked glycan cleavage enzyme such as Endo H.
- a non-natural, engineered PNGase enzyme is used in the disclosed methods.
- polypeptides are generated by fragmenting one or more proteins from a sample before treating the polypeptides with PNGase. Loss of three-dimensional protein structure during fragmentation greatly improves the efficiency of deglycosylation by PNGase. Fragmentation may occur enzymatically (e.g., by protease digestion, such as with trypsin) or mechanically (e.g., by ultrasound). In some embodiments, a surfactant is further added to aid deglycosylation by PNGase (see US 9964548 B2, incorporated by reference herein). In some embodiments, deglycosylation by PNGase comprises heating the sample to about 37-50 °C for 10 minutes.
- each binder of the set of binders is modified to be conjugated to an identifiable detectable label.
- the detectable label is a fluorescent label.
- the detectable label is a magnetic label.
- an identifiable detectable label is a nucleic acid barcode.
- an identifiable detectable label is an affinity tag (e.g., Flag tag, HA tag or biotin tag).
- the number of spaces on the support occupied by an identified portion of a polypeptide analyte is counted to quantify the level of that polypeptide analyte in the sample.
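The counting step above can be sketched as follows. This is a minimal illustration, assuming each occupied position on the support has already been assigned a polypeptide identity; the function name and the position-to-identity input format are hypothetical and are not taken from the disclosure.

```python
from collections import Counter


def count_analytes(position_to_identity):
    """Count support positions occupied by each identified polypeptide analyte.

    position_to_identity maps a support position (any hashable id) to the
    identified polypeptide, or to None when no identification was made.
    """
    return Counter(
        identity
        for identity in position_to_identity.values()
        if identity is not None
    )


# Hypothetical example: five positions, one of them unidentified.
calls = {0: "P1", 1: "P2", 2: "P1", 3: None, 4: "P1"}
counts = count_analytes(calls)
```

The resulting counts are relative measures; converting them into absolute levels would need calibration that is outside this sketch.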
- polypeptide analytes may be mixed, spotted, dropped, pipetted, flowed, or otherwise applied to the support.
- the support has been functionalized with a chemical moiety such as an NHS ester or other amine-specific reagent before polypeptide analytes are applied to the support. This allows polypeptide analytes to be immobilized on the support through their N-termini (see also Example 3 below).
- adjacent polypeptide analytes of the disclosed methods attached to the solid support are spaced apart from each other on a surface or within a volume of the support at an average distance of about 50 nm or greater.
- selectivity of each binder used during the encoding assay towards NTAA or CTAA residues of polypeptide analytes is determined in advance, before performing contacting steps of the disclosed methods.
- Each binder may be tested against a panel of peptides each having a different NTAA or CTAA residue and an associated recording tag to characterize selectivity and, optionally, binding kinetics of the binder for each of the 20 natural CTAA residues.
- a set comprising a minimum number of binders may be selected that covers all, or a maximum number, of the 20 natural NTAA or CTAA residues.
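Selecting a small binder set that covers as many residues as possible can be framed as a set-cover problem. The sketch below uses a greedy heuristic, which is an assumption made for illustration and is not stated in the disclosure; the heuristic does not guarantee the true minimum. The binder names and target sets are hypothetical.

```python
def select_binder_set(binder_targets, residues):
    """Greedy set-cover heuristic (illustrative, not guaranteed minimal).

    binder_targets maps a binder name to the set of terminal residues it
    binds selectively. Returns the chosen binders and any residues that no
    available binder covers.
    """
    uncovered = set(residues)
    chosen = []
    while uncovered:
        # Pick the binder that covers the most still-uncovered residues.
        best = max(binder_targets, key=lambda b: len(binder_targets[b] & uncovered))
        gained = binder_targets[best] & uncovered
        if not gained:
            break  # remaining residues cannot be covered by any binder
        chosen.append(best)
        uncovered -= gained
    return chosen, uncovered


# Hypothetical selectivity data (one-letter amino acid codes).
binder_targets = {
    "B1": {"F", "W", "Y"},
    "B2": {"L", "I", "V"},
    "B3": {"F", "L"},
    "B4": {"A"},
}
chosen, uncovered = select_binder_set(binder_targets, set("FWYLIVAG"))
```

Here `B3` is never chosen because its targets are already covered by `B1` and `B2`, and glycine is reported as uncovered.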
- the provided methods are for generating a nucleic acid encoded library representation of the binding history of each polypeptide of the plurality of target polypeptides (i.e., polypeptide analytes).
- This nucleic acid encoded library can be amplified, and analyzed using high-throughput next generation digital sequencing methods, enabling millions to billions of molecules to be analyzed per run.
- the creation of a nucleic acid encoded library of binding information is useful in another way in that it enables enrichment, subtraction, and normalization by DNA-based techniques that make use of hybridization. These DNA-based methods are easily and rapidly scalable and customizable, and more cost-effective than those available for direct manipulation of other types of macromolecule libraries, such as polypeptide libraries.
- nucleic acid encoded libraries of binding information can be processed prior to sequencing by one or more techniques to enrich and/or subtract and/or normalize the representation of sequences. This enables information of maximum interest to be extracted much more efficiently, rapidly and cost-effectively from very large libraries whose individual members may initially vary in abundance over many orders of magnitude.
- after transferring the identifying information regarding the binder between the nucleic acid coding tag and the nucleic acid recording tag, the binder is released from the cleaved polypeptide analyte.
- the release is controllable.
- the release is induced by changing the reaction conditions, such as buffer conditions.
- the release is controlled by a nucleic acid cleaving reagent used to generate a cleaved extended recording tag on the support attached to the polypeptide analyte.
- binder is engineered from an endopeptidase, such as serine, threonine or cysteine endopeptidase, which binds to several amino acid residues and the acylated intermediate (binder-polypeptide) is formed by the covalent bond between the binder and a residue distal to carboxy or amino terminus of the polypeptide analyte.
- the distal residue becomes a new CTAA of the polypeptide analyte in the next cycle of encoding.
- a method of identifying a large plurality of polypeptide analytes e.g., at least 1000, 10000, 100000, 1000000 or more polypeptide molecules which comprise molecules of at least 100, 1000, 10000 or more different polypeptides
- proteins from a sample can be fractionated into a plurality of fractions, and proteins in each fraction can be fragmented to polypeptides followed by barcoding of the polypeptides (e.g., by introducing a sample barcode into an associated recording tag for each polypeptide). Then, barcoded polypeptides from different fractions, each conjugated to a recording tag, can be pooled together and analyzed using methods and compositions disclosed herein. Fractionation, barcoding and pooling techniques are beneficial for analysis of complex biological samples, such as samples having proteins of vastly different abundances (e.g., plasma). Techniques for fractionation, barcoding and pooling are known in the art and disclosed, for example, in US 20190145982 A1, incorporated by reference herein.
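After pooling, reads can be assigned back to their fraction by the sample barcode. The following is a minimal sketch, assuming (hypothetically) that the sample barcode is a fixed-length prefix of each recording tag read; the actual tag layout is not specified in this passage, and all names and sequences are illustrative.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_fraction, barcode_len):
    """Group recording-tag reads by the fraction encoded in their sample barcode.

    Assumes the sample barcode is the first barcode_len bases of each read
    (an assumption made for this sketch only).
    """
    by_fraction = defaultdict(list)
    unassigned = []
    for read in reads:
        barcode, payload = read[:barcode_len], read[barcode_len:]
        fraction = barcode_to_fraction.get(barcode)
        if fraction is None:
            unassigned.append(read)  # unknown barcode, keep the whole read
        else:
            by_fraction[fraction].append(payload)
    return dict(by_fraction), unassigned


# Hypothetical barcodes and reads.
barcodes = {"ACGT": "fraction_1", "TTAG": "fraction_2"}
reads = ["ACGTGGCC", "TTAGAACC", "ACGTTTTT", "GGGGAAAA"]
assigned, unassigned = demultiplex(reads, barcodes, barcode_len=4)
```

Exact matching is used here for simplicity; tolerating sequencing errors in the barcode would need a mismatch-aware lookup.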
- individual polypeptide molecules are attached to a solid support, and at least some individual amino acid residues of the immobilized polypeptides are analyzed sequentially, wherein the analysis comprises the following steps for each analyzed immobilized polypeptide:
- each binder specifically binds to a terminal amino acid (TAA) or a modified TAA of immobilized polypeptides;
- binders of the set of binders have different specificities towards TAA residues or modified TAA residues which the binders are engineered to bind.
- the first binding profile is determined by generating a digital signature that shows (i) which binders of the set of binders were bound to the modified TAA of polypeptide in the performed binding cycles, and (ii) which binders of the set of binders were not bound to the modified TAA of polypeptide in the performed binding cycles.
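A digital signature of this kind can be represented as one row of bound/not-bound flags per binding cycle. The sketch below is an illustration under assumed inputs: the binder names and the per-cycle sets of detected binders are hypothetical.

```python
def binding_signature(binders, bound_per_cycle):
    """Build a binary binding profile.

    binders is the ordered set of binders used in every cycle.
    bound_per_cycle holds, for each cycle, the set of binders detected as bound.
    Returns one tuple per cycle with 1 (bound) or 0 (not bound) per binder,
    in the same order as binders.
    """
    return [
        tuple(1 if binder in bound else 0 for binder in binders)
        for bound in bound_per_cycle
    ]


# Hypothetical example: three binders, three binding cycles.
binders = ["B1", "B2", "B3"]
observed = [{"B1"}, {"B2", "B3"}, set()]
signature = binding_signature(binders, observed)
```

Each row records both which binders were bound and which were not, matching items (i) and (ii) of the profile.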
- each binder comprises a detectable label.
- the detectable label of each binder comprises a fluorescently labeled probe.
- the detectable label of each binder comprises a unique epitope.
- unique epitopes of binders of the set of binders are distinguished and detected by antibodies that each bind to a unique epitope.
- the signal may be further amplified using methods known in the art, such as using secondary antibodies to decorate primary antibodies upon binding to the cognate epitope.
- the detectable label of each binder comprises a nucleic acid coding tag that comprises an encoder barcode that comprises identifying information regarding the binder.
- nucleic acid tags as detectable labels provide unique advantages because nucleic acid-based barcodes provide much greater opportunity for multiplexing compared to fluorescent labels.
- nucleic acid tags as detectable labels can be amplified quickly and efficiently, for example, by any known nucleic acid amplification method.
- the detectable labels of the first binder and/or the second binder each comprise a fluorescently labeled probe, and the amino acid identity of the amino acid residue treated with the PNGase is determined by detecting a signal generated from the fluorescently labeled probe.
- “label” and “detectable label” comprise a directly or indirectly detectable moiety that is associated with (e.g., conjugated to) a molecule to be detected, e.g., a detectable probe, comprising, but not limited to, fluorophores, radioactive isotopes, fluorescers, chemiluminescers, enzymes, enzyme substrates, enzyme cofactors, enzyme inhibitors, chromophores, dyes, metal ions, metal sols, ligands (e.g., biotin or haptens) and the like.
- the detectable labels of the first binder and/or the second binder each comprise a fluorophore, which is a substance that is capable of exhibiting fluorescence in the detectable range.
- labels that may be used in accordance with the provided embodiments comprise, but are not limited to, phycoerythrin, Alexa dyes, fluorescein, YPet, CyPet, Cascade blue, allophycocyanin, Cy3, Cy5, Cy7, rhodamine, dansyl, umbelliferone, Texas red, luminol, acridinium esters, biotin, green fluorescent protein (GFP), enhanced green fluorescent protein (EGFP), yellow fluorescent protein (YFP), enhanced yellow fluorescent protein (EYFP), blue fluorescent protein (BFP), red fluorescent protein (RFP), firefly luciferase, Renilla luciferase, NADPH, beta-gal
- obtaining or retaining identifying information regarding the binder bound to the glycan attachment residue comprises analyzing the detectable label of the binder.
- analyzing the detectable label of the binder comprises generating an amplification product (e.g., RCA product or bridge amplification product) (see, e.g., FIG.
- the methods further comprise incubating the detection probes comprising labels with the amplification product, washing unbound detection probes, and detecting the labels, e.g., by imaging (see, e.g., FIG. 11).
- the method comprises sequential hybridization of detectably labelled probes to probes (e.g., at the overhangs) hybridized to the amplification products, thereby generating a spatiotemporal signal signature or code that identifies or corresponds to an encoder barcode sequence in the amplification product, which can be used to identify the binder that binds to the TAA residue of the immobilized and cleaved polypeptide.
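Decoding a spatiotemporal signal code amounts to looking up the ordered series of signals seen at one location across hybridization rounds. The sketch below assumes a hypothetical codebook that maps each signal series to a binder; the label names, location ids and binder names are illustrative only.

```python
def decode_signal_codes(observed_rounds, codebook):
    """Map each location's ordered per-round signals to a binder.

    observed_rounds maps a support location to the list of labels detected
    there in successive hybridization rounds. codebook maps a tuple of labels
    to the binder whose encoder barcode produces that code. Locations whose
    code is not in the codebook map to None.
    """
    return {
        location: codebook.get(tuple(signals))
        for location, signals in observed_rounds.items()
    }


# Hypothetical three-round, two-color codebook.
codebook = {
    ("red", "green", "red"): "binder_Phe",
    ("green", "green", "red"): "binder_Leu",
}
observed_rounds = {
    "spot_1": ["red", "green", "red"],
    "spot_2": ["green", "green", "red"],
    "spot_3": ["red", "red", "red"],
}
decoded = decode_signal_codes(observed_rounds, codebook)
```

With c distinguishable labels and r rounds, up to c^r codes are available, which is why sequential hybridization scales to large binder sets with few fluorophores.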
- fluorescence microscopy is used for detection and imaging of binder’s detectable label or a detection probe.
- a fluorescence microscope is an optical microscope that uses fluorescence and phosphorescence instead of, or in addition to, reflection and absorption to study properties of organic or inorganic substances.
- a sample e.g., a support comprising a plurality of polypeptides each attached to a location in a plurality of spatially separated locations of the support
- the fluoresced light, which is usually at a longer wavelength than the illumination, is then imaged through a microscope objective.
- Two filters may be used in this technique: an illumination (or excitation) filter, which ensures the illumination is near monochromatic and at the correct wavelength, and a second emission (or barrier) filter, which ensures none of the excitation light reaches the detector.
- the “fluorescence microscope” comprises any microscope that uses fluorescence to generate an image, whether it is a simpler setup like an epifluorescence microscope, or a more complicated design such as a confocal microscope, which uses optical sectioning to get better resolution of the fluorescent image.
- confocal microscopy is used for detection and imaging of binder’s detectable label. Confocal microscopy uses point illumination and a pinhole in an optically conjugate plane in front of the detector to eliminate out-of-focus signal.
- the binder’s detectable label is a fluorescent probe, which is detected using an optical detection system, which may include, without limitation, near-field scanning microscopy, far-field confocal microscopy, a charge-coupled device (CCD), total internal reflection fluorescence (TIRF) microscopy, super-resolution fluorescence microscopy, dark field microscopy, photoconversion, single and/or multiphoton excitation, spectral wavelength discrimination, fluorophore identification, evanescent wave illumination, and single-molecule localization microscopy.
- methods include detection of laser-activated fluorescence using a microscope equipped with a camera. Suitable photon detection systems include, without limitation, photodiodes and intensified CCD cameras.
- imaging systems known in the art and used during detection of fluorescent signals in DNA sequencing methods may be used in methods disclosed herein.
- Exemplary imaging systems are disclosed in US9897791B2, US7589315B2, US8039817B2, US8412467B2 and US11834756B2, each of which is incorporated herein by reference.
- polypeptides to be analyzed are attached to a solid support.
- polypeptide analytes form an array on the solid support.
- polypeptide analytes form an ordered array on the solid support.
- polypeptide analytes form an unordered array on the solid support.
- polypeptide analytes are covalently immobilized on the solid support.
- polypeptide analytes are covalently attached to the solid support via nucleic acid molecules such as described in US 11,634,709 B2, incorporated by reference.
- the solid support is a solid support or surface used for immobilization of nucleic acid analytes during NGS nucleic acid sequencing.
- a variety of such supports are known in the art and used in NGS nucleic acid sequencing instruments, such as instruments from Illumina, Pacific Biosciences, or other companies which produce and sell NGS instruments.
- the solid support is a planar flow cell.
- the solid support is a glass support with patterned or unpatterned nanowells containing capture nucleic acid probes that can be used in methods disclosed herein for polypeptide immobilization and analysis.
- the solid support is a membrane, such as a nylon membrane. Exemplary materials and other parameters of solid support that can be used in methods disclosed herein are disclosed in US9902951 B2, US9758825 B2, US12104281 B2, US11203612B2, US5846719 A, US5667976A, US8698102B2, and US 11732301 B2, each of which is incorporated herein by reference.
- the amplification product is sequenced using sequencing by synthesis in situ at the location of the polypeptide on the support, where a first population of detectably labeled nucleotides (e.g., dNTPs) are introduced to contact a template nucleotide (e.g., a barcode sequence in an RCA product or a bridge amplification product) hybridized to a sequencing primer, and a first detectably labeled nucleotide (e.g., A, T, C, or G nucleotide) is incorporated by a polymerase to extend the sequencing primer in the 5' to 3' direction using a complementary nucleotide (a first nucleotide residue) in the template nucleotide as template.
- a signal from the first detectably labeled nucleotide can then be detected.
- the first population of nucleotides may be continuously introduced, but in order for a second detectably labeled nucleotide to incorporate into the extended sequencing primer, nucleotides in the first population of nucleotides that have not incorporated into a sequencing primer are generally removed (e.g., by washing), and a second population of detectably labeled nucleotides are introduced into the reaction.
- a second detectably labeled nucleotide e.g., A, T, C, or G nucleotide
- a complementary nucleotide a second nucleotide residue
- the amplification product is sequenced using sequencing by synthesis which comprises contacting the amplification product with a nucleotide mix comprising a fluorescently labeled nucleotide and a nucleotide that is not fluorescently labeled.
- a cognate nucleotide is incorporated by the polymerase into the sequencing primer or an extension product thereof, and the cognate nucleotide may or may not be fluorescently labeled. Sequencing by synthesis methods comprise those described in, for example, but not limited to, US 2007/0166705, US 2006/0188901, U.S. Pat. No.
- the amplification product is sequenced in situ at the location of the polypeptide on the support using a polymerase that is fluorescently labeled. In some embodiments, the amplification product is sequenced in situ at the location of the polypeptide on the support using a polymerase-nucleotide conjugate comprising a fluorescently labeled polymerase linked to a nucleotide moiety that is not fluorescently labeled.
- nucleic acid hybridization can be used for multiplex detecting amplification products, using labeled nucleic acid decoder probes that are complementary to at least a portion of a barcode sequence. Multiplex decoding can be performed with pools of many different probes with distinguishable labels.
- Non-limiting examples of nucleic acid hybridization sequencing are described for example in U.S. Pat. No. 8,460,865, and in Gunderson et al., Genome Research 14:870-877 (2004).
- the amplification product is sequenced using sequencing by binding, using a polymerase that is fluorescently labeled and one or more nucleotides that are not fluorescently labeled.
- a cognate nucleotide is not incorporated by the polymerase into the sequencing primer or an extension product thereof.
- incorporation of a cognate nucleotide by the polymerase into the sequencing primer or an extension product thereof is attenuated or inhibited.
- Various aspects of sequencing by binding are described in U.S. Pat. No. 10,655,176 B2, the content of which is herein incorporated by reference in its entirety.
- sequencing by binding comprises performing repetitive cycles of detecting a stabilized complex that forms at each position along the template nucleic acid to be sequenced (e.g. a ternary complex that includes the primed template nucleic acid, a polymerase, and a cognate nucleotide for the position), under conditions that prevent covalent incorporation of the cognate nucleotide into the primer, and then extending the primer to allow detection of the next position along the template nucleic acid.
- detection of the nucleotide at each position of the template occurs prior to extension of the primer to the next position.
- the methodology is used to distinguish the four different nucleotide types that can be present at positions along a nucleic acid template by uniquely labelling each type of ternary complex (i.e. different types of ternary complexes differing in the type of nucleotide it contains) or by separately delivering the reagents needed to form each type of ternary complex.
- the labeling may comprise fluorescence labeling of, e.g., the cognate nucleotide or the polymerase that participate in the ternary complex.
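One way to picture base calling when the reagents for each ternary complex are delivered separately is to compare the signal seen for each nucleotide type at a position and call the strongest one. The sketch below is only an illustration: the intensity values, the `min_signal` threshold and the use of "N" for a no-call are assumptions, not details from the disclosure or from the cited patent.

```python
def call_bases(examination_signals, min_signal=0.0):
    """Call one base per template position from per-nucleotide signals.

    examination_signals is a list with one dict per position, mapping each
    nucleotide type to the ternary-complex signal measured when that
    nucleotide was delivered. Positions whose best signal does not exceed
    min_signal are reported as "N".
    """
    calls = []
    for per_nucleotide in examination_signals:
        base, signal = max(per_nucleotide.items(), key=lambda item: item[1])
        calls.append(base if signal > min_signal else "N")
    return "".join(calls)


# Hypothetical signals for three template positions.
signals = [
    {"A": 0.90, "C": 0.10, "G": 0.05, "T": 0.10},
    {"A": 0.10, "C": 0.20, "G": 0.80, "T": 0.10},
    {"A": 0.10, "C": 0.10, "G": 0.10, "T": 0.10},
]
sequence = call_bases(signals, min_signal=0.5)
```

The called bases are those of the cognate nucleotides, so they are complementary to the template at each examined position.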
- the detection by probe hybridization comprises sequential hybridization of fluorescently labeled probes to the amplification product (see, e.g., FIG. 10 and FIG. 11).
- the analysis of the detectable label comprises detecting a polymer generated by a hybridization chain reaction (HCR), see e.g., US 2017/0009278, which is incorporated herein by reference in its entirety, for exemplary probes and HCR reaction components.
- the analysis of the detectable label comprises hybridizing to the amplification product a detection oligonucleotide (e.g., a detection probe) labeled with a fluorophore, an isotope, a mass tag, or a combination thereof.
- the detection or determination comprises imaging the amplification product, e.g., while the amplification product is attached to a location in a plurality of spatially separated locations of a support such as a flow cell.
- the analysis of binder’s detectable label in the provided methods comprises imaging the amplification product (e.g., RCA product or a rolling circle transcription product) via binding of the detection probe to the amplification product and detecting a label associated with the detection probe.
- the label associated with the detection probe can be measured and quantitated.
- each of the immobilized polypeptides is attached to a nucleic acid recording tag.
- each binder of the set of binders has a certain level of specificity towards one or more TAA residues or modified TAA residues which the binder is engineered to bind.
- binder specificity towards a TAA residue implies a combination of affinity, which is a strength of interaction, and selectivity, which is whether the binder prefers the one or more target TAA residues over other TAA residues of polypeptides.
- Certain level of affinity between the binder and the one or more target TAA residues is required in order to generate an extended nucleic acid construct following binding of the binder to the TAA residue of the polypeptide, because merely transient binding may not be enough to enable “productive” interaction between the coding tag of the binder and recording tag of the polypeptide.
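The distinction between affinity and selectivity can be made concrete with dissociation constants. The sketch below assumes, for illustration only, that affinity is expressed as a Kd per terminal residue (lower means tighter binding) and that selectivity is summarized as the ratio of the tightest off-target Kd to the weakest on-target Kd. This ratio is a choice made for this sketch and is not a metric defined in the disclosure; all values are hypothetical.

```python
def selectivity_ratio(kd_by_residue, targets):
    """Ratio of the tightest off-target Kd to the weakest on-target Kd.

    Values above 1 mean every target residue is bound more tightly than
    every non-target residue.
    """
    weakest_target_kd = max(kd_by_residue[residue] for residue in targets)
    off_target_kds = [
        kd for residue, kd in kd_by_residue.items() if residue not in targets
    ]
    if not off_target_kds:
        raise ValueError("no off-target residues were measured")
    return min(off_target_kds) / weakest_target_kd


# Hypothetical Kd values in molar units.
kd = {"F": 1e-8, "Y": 5e-8, "A": 1e-5, "G": 2e-5}
ratio = selectivity_ratio(kd, targets={"F", "Y"})
```

A binder can be highly selective yet still too weak for productive encoding, which is why the text treats the two properties separately.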
- a binder binds to a modified or labeled terminal amino acid (e.g., an NTAA that has been functionalized or modified). In some embodiments, a binder binds to a chemically or enzymatically modified terminal amino acid.
- a modified or labeled NTAA can be one that is functionalized with phenylisothiocyanate (PITC), 1-fluoro-2,4-dinitrobenzene (Sanger’s reagent, DNFB), benzyloxycarbonyl chloride or carbobenzoxy chloride (Cbz-Cl), N-(Benzyloxycarbonyloxy)succinimide (Cbz-OSu or Cbz-O-NHS), dansyl chloride (DNS-Cl, or 1-dimethylaminonaphthalene-5-sulfonyl chloride), 4-sulfonyl-2-nitrofluorobenzene (SNFB), N-Acetyl-Isatoic Anhydride, Isatoic Anhydride, 2-Pyridinecarboxaldehyde, 2-Formylphenylboronic acid, 2-Acetylphenylboronic acid, 1-Fluoro-2,4-dinitrobenzene,
- a modifier agent configured to modify a terminal amino acid (TAA) of a polypeptide to yield the modified TAA comprises a compound of any one of Formulas (1)-(10), wherein:
- R 6 and R 7 are each independently C1-6 alkyl, -CO2C1-4 alkyl, -OR k , aryl, heteroaryl, cycloalkyl or heterocyclyl, wherein the C1-6 alkyl, -CO2C1-4 alkyl, -OR k , aryl, and cycloalkyl are each unsubstituted or substituted; and
- R k is H, C1-6 alkyl, or heterocyclyl, wherein the C1-6 alkyl and heterocyclyl are each unsubstituted or substituted.
- Heterocyclyl can be 5-8 membered ring comprising one or two heteroatoms selected from N, O and S as ring members.
- Heteroaryl can be a 5-6 membered single ring or 8-10 membered bicyclic ring, each of which comprises one to three heteroatoms selected from N, O and S as ring members;
- Formula (2) is: wherein: each R is independently C1-4 alkyl, optionally substituted with up to three groups selected from halo, C1-2 alkoxy, C1-2 haloalkyl, and C1-2 haloalkoxy; and two R groups on the same N can optionally cyclize to form a 5-7 membered ring, optionally containing an additional heteroatom selected from N, O and S as a ring member, and optionally substituted with one or two groups selected from oxo, C1-2 alkyl, C1-2 alkoxy,
- G is selected from halo, benzotriazolyloxy, halobenzotriazolyloxy, pyridinotriazolyloxy, benzotriazolyl-N-oxide, pyridinotriazolyl-N-oxide, -O-(N-succinimide), 1-cyano-2-ethoxy-2-oxoethylideneaminooxy, and -O-(N-phthalimide);
- Cy is a 5 to 7 membered ring or an 8-10 membered bicyclic ring system, and Cy may be absent or present; when present, ring Cy may be saturated, unsaturated, or aromatic, and the dashed bond may be a single bond, double bond, or aromatic bond; when Cy is present, it may be a carbocyclic ring, or it may contain one to three heteroatoms selected from N, O, B, and S as ring members; and Cy is optionally substituted with one to six groups (or with one to four groups when Cy is aromatic) selected from halo, CN, NH2, NH(CH 3 ), N(CH 3 ) 2 , protected amine (e.g., N 3 , NO 2 , NHFmoc, NHBoc), C(O)NR 2 , NHC(O)R, B(OR) 2 , aryl, -SR 4 , -S(O) n R 4 , -NR 4 SO 2 R 4 , -SO 2 N(R
- n at each occurrence is independently 1 or 2; and each R and R 4 is independently selected from H, C1-2 alkyl, and C1-C2 haloalkyl;
- W is a bond or a group selected from alkyl, cycloalkyl, heterocyclyl, aryl, heteroaryl, and bicyclic heteroaryl, each of which is optionally substituted with up to four groups independently selected from halo, OH, cyano, azido, -SR 4 , -S(O) n R 4 , -NR 4 SO2R 4 , -SO2N(R 4 )2, -B(OR 4 )2, oxo (unless W is aromatic), amino, C1-C2 alkyl, C1-C2 haloalkyl, C1-C2 alkoxy, and C1-C2 haloalkoxy; when W is a ring, ring W may be saturated, unsaturated, or aromatic; when W is a heterocyclic or heteroaromatic ring, it may contain one or two heteroatoms selected from N, O and S as ring members; represents an optional linkage connecting R 10 and L 2 into a 5-6 membere
- R 10 is selected from H, halo, CN, NH 2 , NH(CH 3 ), N(CH 3 ) 2 , NO 2 , NHFmoc, NHBoc, C(O)NR 2 , NHC(O)R, NHC(O)OR, B(OR) 2 , aryl, heteroaryl, C1-C2 alkyl, C1-C2 haloalkyl, C1-C2 haloalkoxy, and -OR 4 ; and R 10 is absent when W is a bond;
- Cy is a 5 to 7 membered ring or an 8-10 membered bicyclic ring system, and Cy may be absent or present; when present, ring Cy may be saturated, unsaturated, or aromatic, and the dashed bond may be a single bond, double bond, or aromatic bond; when Cy is present, it may be a carbocyclic ring, or it may contain one to three heteroatoms selected from N, O,B and S as ring members; and Cy is optionally substituted with one to six groups (or with one to four groups when Cy is aromatic) selected from halo, CN, NH2, NH(CH 3 ), N(CH 3 ) 2 , protected amine (e.g., N 3 , NO 2 , NHFmoc, NHBoc), C(O)NR 2 , NHC(O)R, B(OR)2, aryl, heteroaryl, C1-C2 alkyl, C1-C2 haloalkyl, C1-C2 haloalkoxy,
- R 4 is independently selected at each occurrence from H, C1-C2 alkyl, and C1-C2 haloalkyl;
- Cy is a 5 to 7 membered ring or an 8-10 membered bicyclic ring system, and Cy may be absent or present; when present, ring Cy may be saturated, unsaturated, or aromatic, and the dashed bond may be a single bond, double bond, or aromatic bond; when Cy is present, it may be a carbocyclic ring, or it may contain one to three heteroatoms selected from N, O, B, and S as ring members; and Cy is optionally substituted with one to six groups (or with one to four groups when Cy is aromatic) selected from halo, CN, NH2, NH(CH 3 ), N(CH 3 ) 2 , protected amine (e.g., N 3 , NO 2 , NHFmoc, NHBoc), C(O)NR 2 , NHC(O)R, B(OR) 2 , aryl, heteroaryl, C1-C2 alkyl, C1-C2 haloalkyl, C1-C2 haloalk
- R 2 and R 2 can each be the side chain of an amino acid, e.g. one of the side chains of the 20 common amino acid side chains, optionally protected amino acid side chains, post- translationally modified amino acid side chains, unnatural amino acid sidechains; or
- R 2 and R 2 can each be H or a group selected from aryl, heteroaryl, bicyclic aryl, bicyclic heteroaryl, and heterocyclyl, each of which is optionally substituted with one to six groups (or with one to four groups when R 2 or R 2 is aromatic) selected from halo, CN, NH 2 , NH(CH 3 ), N(CH 3 ) 2 , protected amine (e.g., N 3 , NO 2 , NHFmoc, NHBoc), C(O)NR 2 , NHC(O)R, B(OR) 2 , aryl, heteroaryl, C1-C2 alkyl, C1-C2 haloalkyl, C1-C2 haloalkoxy, and -OR 4 ; each R and R 4 is independently selected at each occurrence from H, C1-C2 alkyl, and C1-C2 haloalkyl; n at each occurrence is independently 1 or 2; and
- R 5 is independently selected at each occurrence from H, C1-C2 alkyl, C1-C2 haloalkyl, C1-C2 alkoxy, and C1-C2 haloalkoxy;
- M is a cationic counterion
- G1-G5 are each independently selected from CH, CJ, BN, BO, and N, provided not more than 3 of G1-G5 are N; the dashed bonds can be single bonds or double bonds;
- R 2 and R 2 can each be the side chain of an amino acid, e.g. one of the side chains of the 20 common amino acid side chains, optionally protected amino acid side chains, post- translationally modified amino acid side chains, unnatural amino acid sidechains; or
- R 2 and R 2 can each be H or a group selected from aryl, heteroaryl, bicyclic aryl, bicyclic heteroaryl, and heterocyclyl, each of which is optionally substituted with one to six groups (or with one to four groups when R 2 or R 2 is aromatic) selected from halo, CN, NH2, NH(CH3), N(CH 3 ) 2 , protected amine (e.g., N3, NO 2 , NHFmoc, NHBoc), C(O)NR 2 , NHC(O)R, B(OR) 2 , aryl, heteroaryl, C1-C2 alkyl, C1-C2 haloalkyl, C1-C2 haloalkoxy, and -OR 4 ; each R, R 4 and R 8 is independently selected at each occurrence from H, C1-C2 alkyl, and C1-C2 haloalkyl; and n at each occurrence is independently 1 or 2; and
- R 9 is H, CH3, benzyl, substituted benzyl
- G1-G5 are each independently selected from CH, CJ, BN, BO, and N, provided not more than 3 of G1-G5 are N;
- R 2 and R 2 can each be the side chain of an amino acid, e.g. one of the side chains of the 20 common amino acid side chains, optionally protected amino acid side chains, post- translationally modified amino acid side chains, unnatural amino acid sidechains; or R 2 and R 2 can each be H or a group selected from aryl, heteroaryl, bicyclic aryl, bicyclic heteroaryl, and heterocyclyl, each of which is optionally substituted with one to six groups (or with one to four groups when R 2 or R 2 is aromatic) selected from halo, CN, NH2, NH(CH 3 ), N(CH 3 ) 2 , protected amine (e.g., N 3 , NO 2 , NHFmoc, NHBoc), C(O)NR 2 , NHC(O)R, B(OR)2, aryl, heteroaryl, C1-C2 alkyl, C1-C2 haloalkyl, C1-C2 haloalkoxy, and -
- R 11 is H, CH 3 , benzyl, or substituted benzyl
- M is a cationic counterion
- G1-G4 are each independently selected from CH, CJ, BN, BO, and N, provided not more than 3 of G1-G4 are N;
- R 2 and R 2 can each be the side chain of an amino acid, e.g. one of the side chains of the 20 common amino acid side chains, optionally protected amino acid side chains, post- translationally modified amino acid side chains, unnatural amino acid sidechains; or R 2 and R 2 can each be H or a group selected from aryl, heteroaryl, bicyclic aryl, bicyclic heteroaryl, and heterocyclyl, each of which is optionally substituted with one to six groups (or with one to four groups when R 2 or R 2 is aromatic) selected from halo, CN, NH2, NH(CH 3 ), N(CH 3 ) 2 , protected amine (e.g., N 3 , NO 2 , NHFmoc, NHBoc), C(O)NR 2 , NHC(O)R, B(OR)2, aryl, heteroaryl, C1-C2 alkyl, C1-C2 haloalkyl, C1-C2 haloalkoxy, and -
- R 12 represents one or two optional substituents on the pyridinium ring, which are independently selected from C1-C2 alkyl, C1-C2 haloalkyl, C1-C2 haloalkoxy, and halo;
- G1-G4 are each independently selected from CH, CJ, and N, provided not more than 3 of G1-G4 are N;
- R 13 is selected from H, C1-C2 alkyl, C1-C2 alkoxy, C1-C2 haloalkyl, and C1-C2 haloalkoxy;
- M is a metal binding group selected from the group consisting of sulfonamide, hydroxamic acid, sulfamate, and sulfamide; the group is a 5 or 6 membered aromatic ring containing up to three heteroatoms selected from N, O, and S as ring members, and is optionally substituted by R; R represents one or two optional substituents selected from the group consisting of F, Cl, CH3, CF2H, CF3, OH, OCH3, OCF 3 , NH 2 , N(CH 3 ) 2 , NO 2 , SCH 3 , SO2CH3, CH2OH, B(OH) 2 , CN, CONH2, and CONHCH3; and LG is a leaving group.
- the N-terminal modifier agent is selected from the group consisting of compounds of the following Formulas (10)-(13):
- M is a metal binding group that comprises sulfonamide, hydroxamic acid, sulfamate, or sulfamide; the group is a 5 or 6 membered aromatic ring containing up to three heteroatoms selected from N, O, and S as ring members, and is optionally substituted by R;
- R represents one or two optional substituents selected from the group consisting of F, Cl, CH3, CF 2 H, CF 3 , OH, OCH 3 , OCF 3 , NH 2 , N(CH 3 ) 2 , NO 2 , SCH 3 , SO 2 CH 3 , CH 2 OH, B(OH) 2 , CN, CONH 2 , CO 2 H, CN 4 H, and CONHCH3;
- LG is OH, ORQ, or OCC, each RQ is independently aryl or heteroaryl, each of which is optionally substituted with one or more groups selected from halo, nitro, cyano, s
- binders that specifically bind to terminal amino acid residues of polypeptides, which are used in the NGPS assay.
- Such binders may be developed by a number of approaches as disclosed in the art, and some of them are outlined below in Example 1.
- nucleic acid barcode may be used to decode the identity of the TAA residue by using known information regarding binding kinetics and/or specificity of the binders bound to the polypeptide at a given binding cycle.
- a first group of the binders is determined, each of which was bound to an individual TAA residue of each polypeptide analyte.
- both the known specificities of binders for NTAA or CTAA residues and their order of binding to a polypeptide analyte are used to decode the identity of the polypeptide analyte.
- the nucleic acid barcode may be used as an input to a probabilistic neural network which was trained to relate the sequence of the barcode to amino acid identity.
- Training can be performed by testing each binder individually (optionally, conjugated to a coding tag) against a panel of peptides each having a different NTAA or CTAA residue (optionally, with an associated recording tag), collecting sequence information of the recording tags extended after the binding, and feeding the collected information to the probabilistic neural network.
- training can be performed by testing a set of binders (optionally, each conjugated to a coding tag) against the panel of peptides, collecting sequence information of the recording tags extended after the binding, and feeding the collected information to the probabilistic neural network.
- each immobilized polypeptide is back-translated into a series of unique nucleic acid barcodes on the corresponding recording tag attached to the immobilized peptide.
- sequence of the extended recording tag can be analyzed to extract the abovementioned nucleic acid barcodes that correspond to each encoding cycle.
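The barcode extraction step described above can be sketched as follows. This is a minimal illustration, not the disclosed read layout: it assumes (hypothetically) that each encoding cycle appends a fixed spacer followed by a fixed-length encoder barcode to the recording tag, and the names `split_encoder_barcodes`, `spacer`, and `barcode_len` are invented for the example.

```python
def split_encoder_barcodes(read, spacer, barcode_len):
    """Return the barcodes that directly follow each spacer in a read.

    Assumed (hypothetical) layout of an extended recording tag:
        <prefix><spacer><barcode cycle 1><spacer><barcode cycle 2>...
    Barcodes are returned in the order they occur, i.e. in cycle order.
    """
    barcodes = []
    pos = read.find(spacer)
    while pos != -1:
        start = pos + len(spacer)
        barcode = read[start:start + barcode_len]
        # Keep only full-length barcodes; a truncated read ends the list.
        if len(barcode) == barcode_len:
            barcodes.append(barcode)
        # Resume the search after the barcode so barcode bases that happen
        # to resemble the spacer are not misread as a new cycle.
        pos = read.find(spacer, start + barcode_len)
    return barcodes


def decode_cycles(barcodes, barcode_to_binder):
    """Map each barcode to a binder name; unknown barcodes map to None."""
    return [barcode_to_binder.get(bc) for bc in barcodes]
```

With a hypothetical lookup table such as `{"AAAAAA": "N-binder", "CCCCCC": "D-binder"}`, `decode_cycles` yields the ordered list of binders that bound the polypeptide, one entry per encoding cycle.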
- an artificial intelligence (AI) model can be applied to calculate probabilities of occurrence of specific types of amino acid residues in corresponding places in amino acid sequence of the analyzed peptide.
- the AI model can be pre-trained using multiple known peptide sequences, which were used to generate encoding nucleic acid data on associated recording tags. Modeling encoding of multiple known peptides using known binders allows for training the AI model to faithfully predict amino acid residues based on provided barcode nucleic acid sequences.
- the generated DNA barcodes on the extended recording tag of each polypeptide analyte are input to a probabilistic neural network (PNN) which will learn to relate the sequence of a DNA barcode to an amino acid identity.
- PNN probabilistic neural network
- Probabilistic neural networks can approach Bayes optimal classification for multiclass problems such as amino acid identification from DNA barcodes (Klocker, J., et al., Bayesian Neural Networks for Aroma Classification.
- a classifier based on PNN is guaranteed to learn and converge to an optimal classifier as the size of the representative data set increases.
- Probabilistic neural networks have parallel structure such that data from any amino acid residue are used to learn all other amino acid residues.
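To make the PNN idea concrete, the following is a minimal sketch of a probabilistic neural network in its classical form (a Gaussian Parzen-window classifier). It is an illustration under stated assumptions, not the trained model of the disclosure: barcodes are one-hot encoded, all barcodes have equal length, class priors are taken as equal, and the function names and the smoothing parameter `sigma` are hypothetical.

```python
import math
from collections import defaultdict

BASES = "ACGT"


def one_hot(barcode):
    """Encode a DNA barcode as a flat one-hot vector (4 values per base)."""
    return [1.0 if base == b else 0.0 for base in barcode for b in BASES]


def pnn_posteriors(training_pairs, query_barcode, sigma=0.5):
    """Estimate amino acid probabilities for one barcode.

    training_pairs: list of (barcode, amino_acid) from peptides of known
    sequence. Each class density is the mean Gaussian kernel between the
    query and that class's training barcodes (pattern + summation layers);
    normalizing across classes gives posteriors under equal priors.
    """
    q = one_hot(query_barcode)
    kernel_sum = defaultdict(float)
    n_examples = defaultdict(int)
    for barcode, amino_acid in training_pairs:
        x = one_hot(barcode)
        dist_sq = sum((a - b) ** 2 for a, b in zip(q, x))
        kernel_sum[amino_acid] += math.exp(-dist_sq / (2.0 * sigma ** 2))
        n_examples[amino_acid] += 1
    density = {aa: kernel_sum[aa] / n_examples[aa] for aa in kernel_sum}
    total = sum(density.values())
    if total == 0.0:
        # Query is too far from every example; fall back to a uniform guess.
        return {aa: 1.0 / len(density) for aa in density}
    return {aa: d / total for aa, d in density.items()}


def pnn_classify(training_pairs, query_barcode, sigma=0.5):
    """Return the amino acid with the highest estimated posterior."""
    post = pnn_posteriors(training_pairs, query_barcode, sigma)
    return max(post, key=post.get)
```

Because the output is a probability per residue type rather than a single call, the same sketch also illustrates how an amino acid identity can be reported with an associated likelihood.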
- the disclosed methods are used for peptide sequence determination based on probabilistic neural network ensembles.
- the machine learning method is characterized in that the sequence determination can be realized by the following steps: i) the peptide fragments of proteins are encoded using binders into stretches of DNA sequences based on the physicochemical properties of amino acid residues; ii) a group of probabilistic neural network sub-classifiers are established, peptide fragments of proteins with known sequence are used to perform amino acid classification training and obtain a group of trained amino acid classification models; iii) the obtained models are utilized to determine peptide amino acid sequences in the test data sets; iv) the classification results output by the models are counted to generate amino acid candidate sets; v) the methods showing highest accuracy are combined to determine the amino acid sequence of protein peptide fragment; and vi) the algorithmic amino acid determination result is verified through k-fold cross-validation, where k is an integer.
- k-fold cross-validation operates as follows. In k-fold cross-validation, the dataset is shuffled and divided randomly into k groups without overlap or replacement. This means each group is unique and is used for model evaluation only once. The data groups are carried through the following steps to perform the k-fold cross-validation:
- a unique group is taken as a test data set
- a model is built using the training set and is evaluated using the test set;
- Steps 1-4 are repeated until all k groups are used for model evaluation;
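The k-fold procedure outlined above can be sketched in a few lines. This is a generic illustration rather than the disclosed implementation: `fit` and `score` are hypothetical callables supplied by the user (for example, a barcode classifier and its accuracy), and the fixed `seed` is only there to make the shuffle reproducible.

```python
import random


def k_fold_indices(n_items, k, seed=0):
    """Yield (train_indices, test_indices) for each of k disjoint folds."""
    if not 2 <= k <= n_items:
        raise ValueError("k must be between 2 and the number of items")
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)      # shuffle once, no replacement
    folds = [indices[i::k] for i in range(k)]  # k non-overlapping groups
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def cross_validate(data, k, fit, score, seed=0):
    """Average the score over k folds; each group is the test set once."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(data), k, seed):
        model = fit([data[i] for i in train_idx])
        scores.append(score(model, [data[i] for i in test_idx]))
    return sum(scores) / len(scores)
```

Each item appears in exactly one test fold, so every group is used for evaluation once and for training in the remaining k-1 rounds.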
- the nucleic acid barcodes on the extended recording tag of each polypeptide analyte are input to a probabilistic neural network (PNN), which will learn to relate the nucleic acid sequence of a barcode to an amino acid identity of the analyzed polypeptide.
- other statistical models e.g., hidden Markov models
- machine learning methods e.g., random forest models
- the binder further comprises one or more detectable labels such as fluorescent labels, in addition to the binding moiety that recognizes a terminal amino acid residue in polypeptides.
- the binder does not comprise a polynucleotide such as a coding tag.
- the binder comprises a synthetic or natural antibody.
- the binder comprises an aptamer.
- the binder comprises an engineered protein binder, such as one disclosed in Example 1 below, and a detectable label.
- the detectable label is optically detectable.
- the detectable label comprises a fluorescent moiety, a color-coded nanoparticle, a quantum dot or any combination thereof.
- the label comprises a polystyrene dye encompassing a core dye molecule such as a FluoSphere™, Nile Red, fluorescein, rhodamine, derivatized rhodamine dyes, such as TAMRA, phosphor, polymethadine dye, fluorescent phosphoramidite, TEXAS RED, green fluorescent protein, acridine, cyanine, cyanine 5 dye, cyanine 3 dye, 5-(2'-aminoethyl)-aminonaphthalene-1-sulfonic acid (EDANS), BODIPY, ALEXA or a derivative or modification of any of the foregoing.
- the detectable label is resistant to photobleaching while producing signal (such as photons) at a unique and easily detectable wavelength, with high signal-to-noise ratio.
- each binder comprises a detectable label
- the detectable label is used to record binding between a polypeptide analyte and the binder.
- the identity of the terminal amino acid residue can be determined with certain probability.
- N-binder and D-binder comprise different distinguishable detectable labels, and upon detection of the signal from one of these detectable labels, information regarding glycosylation at the glycosylation site can be inferred.
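The inference from the two labels can be illustrated with a short calculation. PNGase treatment converts a glycan-bearing Asn to Asp, so, under the simplifying assumption that Asp at the site arises only from deglycosylation (no spontaneous deamidation and no binder cross-reactivity), the fraction of molecules read by the D-binder estimates site occupancy. The function name, the Wilson score interval, and the default `z` are choices made for this sketch and are not taken from the disclosure.

```python
import math


def site_occupancy(n_binder_count, d_binder_count, z=1.96):
    """Estimate glycosylation site occupancy from binder signal counts.

    n_binder_count: molecules whose site was read as Asn (unoccupied).
    d_binder_count: molecules whose site was read as Asp after PNGase
                    treatment (assumed to have carried an N-glycan).
    Returns (occupancy, lower, upper) with a Wilson score interval,
    or None when no molecules were observed.
    """
    total = n_binder_count + d_binder_count
    if total == 0:
        return None
    p = d_binder_count / total
    denom = 1.0 + z * z / total
    center = (p + z * z / (2.0 * total)) / denom
    half_width = z * math.sqrt(
        p * (1.0 - p) / total + z * z / (4.0 * total * total)
    ) / denom
    return p, max(0.0, center - half_width), min(1.0, center + half_width)
```

For example, 75 D-binder events and 25 N-binder events at one site give a point estimate of 0.75, with the interval narrowing as more molecules are counted.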
- the polypeptide is joined to a support before performing the encoding reaction.
- a support with a large carrying capacity to immobilize a large number of polypeptides.
- the preparation of the polypeptides including joining the polypeptide to a support is performed prior to performing the binding reaction.
- the preparation of the polypeptide, including joining the polypeptide to a nucleic acid molecule or an oligonucleotide, may be performed prior to or after immobilizing the polypeptide.
- a plurality of polypeptides are attached to a support prior to the binding reaction and contacting with a binder.
- the support may comprise any suitable solid material, including porous and non-porous materials, to which a polypeptide can be associated directly or indirectly, by any means known in the art, including covalent and non-covalent interactions, or any combination thereof.
- Exemplary reactions include click chemistry reactions, such as the copper catalyzed reaction of an azide and alkyne to form a triazole (Huisgen 1,3-dipolar cycloaddition), strain-promoted azide alkyne cycloaddition (SPAAC), reaction of a diene and dienophile (Diels-Alder), strain-promoted alkyne-nitrone cycloaddition, reaction of a strained alkene with an azide, tetrazine or tetrazole, alkene and azide [3+2] cycloaddition, alkene and tetrazine inverse electron demand Diels-Alder (IEDDA) reaction (e.g., m-tetrazine (mTet) or phenyl tetrazine (pTet) and trans-cyclooctene (TCO); or pTet and an alkene), alkene and tetrazol
- Exemplary displacement reactions include reaction of an amine with: an activated ester; an N-hydroxysuccinimide ester; an isocyanate; an isothiocyanate; an aldehyde; an epoxide; or the like.
- iEDDA click chemistry is used for immobilizing polypeptides to a support since it is rapid and delivers high yields at low input concentrations.
- m-tetrazine rather than tetrazine is used in an iEDDA click chemistry reaction, as m-tetrazine has improved bond stability.
- phenyl tetrazine (pTet) is used in an iEDDA click chemistry reaction.
- a polypeptide is labeled with a bifunctional click chemistry reagent, such as alkyne-NHS ester (acetylene-PEG-NHS ester) reagent or alkyne-benzophenone to generate an alkyne-labeled polypeptide.
- an alkyne can also be a strained alkyne, such as cyclooctynes including Dibenzocyclooctyl (DBCO).
- DBCO Dibenzocyclooctyl
- Similar methods e.g., click chemistry reactions, bioorthogonal reactions
- Such attachments can be achieved by introducing reactive moiety or moieties on one or on both attachment partners.
- a plurality of different polypeptides is immobilized on a solid support, wherein each polypeptide of the plurality of different polypeptides is attached to a nucleic acid recording tag.
- a recording tag may be directly linked to the polypeptide, linked to a polypeptide via a linker, via a multifunctional linker, or attached to a polypeptide by virtue of its proximity (or colocalization) on the support.
- the recording tag is attached to the support, and the polypeptide is immobilized on the support via the recording tag.
- a linker is attached to the support, and the polypeptide and the recording tag are independently attached to the linker, thereby generating immobilization on the support and association of the polypeptide with the recording tag.
- Other immobilization and association variants are possible.
- the recording tag can include a sample identifying barcode.
- a sample barcode is useful in the multiplexed analysis of a set of samples in a single reaction vessel or immobilized to a single solid support (e.g., a bead or a planar substrate) or collection of solid supports.
- a recording tag comprises an optional unique molecular identifier (UMI), which provides a unique identifier tag for each polypeptide with which the UMI is associated.
- UMI unique molecular identifier
- a UMI can be used to de-convolute sequencing data from a plurality of extended recording tags to identify sequence reads from individual polypeptides.
- each polypeptide is attached to a single recording tag, with each recording tag comprising a unique UMI.
- multiple copies of a recording tag are attached to a single polypeptide, with each copy of the recording tag comprising the same UMI.
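UMI-based deconvolution of sequencing reads can be sketched as a simple grouping step. The read layout here is an assumption made for illustration (the UMI occupies the first `umi_len` bases of each read, followed by the encoded payload), and the majority-vote consensus is one possible way to collapse reads that share a UMI; neither is stated in the source.

```python
from collections import Counter, defaultdict


def group_by_umi(reads, umi_len):
    """Group read payloads by their leading UMI (hypothetical layout)."""
    groups = defaultdict(list)
    for read in reads:
        if len(read) > umi_len:          # skip reads too short to hold a payload
            groups[read[:umi_len]].append(read[umi_len:])
    return dict(groups)


def consensus_per_umi(reads, umi_len):
    """Collapse reads sharing a UMI to their most frequent payload.

    Reads with the same UMI are taken to derive from the same polypeptide
    molecule, so disagreement among them is treated as sequencing or
    amplification error and resolved by majority vote.
    """
    return {
        umi: Counter(payloads).most_common(1)[0][0]
        for umi, payloads in group_by_umi(reads, umi_len).items()
    }
```

The number of distinct UMIs then gives a count of individual polypeptide molecules, independent of how many times each extended recording tag was amplified or sequenced.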
- a recording tag comprises a universal priming site, e.g., a forward or 5’ universal priming site.
- a universal priming site is a nucleic acid sequence that may be used for priming a library amplification reaction and/or for sequencing.
- a universal priming site may include, but is not limited to, a priming site for PCR amplification, flow cell adaptor sequences that anneal to complementary oligonucleotides on flow cell surfaces (e.g., Illumina next generation sequencing), a sequencing priming site, or a combination thereof.
- a universal priming site comprises an Illumina P5 primer or an Illumina P7 primer for NGS.
- a polypeptide can be immobilized to a support by an affinity capture reagent (and optionally covalently crosslinked), wherein the recording tag is attached to the affinity capture reagent directly, or alternatively, the polypeptide can be directly immobilized to the support with a recording tag.
- the polypeptide is attached to a bait nucleic acid which hybridizes to a capture nucleic acid and is ligated to a capture nucleic acid which comprises a reactive coupling moiety for attaching to the support (see, e.g., US 2022/0049246 A1, incorporated by reference herein).
- the polypeptide molecules can be spaced appropriately to accommodate methods of performing the binding reaction and any downstream analysis steps to be used to assess the polypeptide. For example, it may be advantageous to space the polypeptide molecules optimally to allow a nucleic acid-based method for assessing and sequencing the proteins to be performed.
- the method for assessing and sequencing protein polypeptides involves a binder which binds to the polypeptide molecules and the binder comprises a coding tag with information that is transferred to a nucleic acid attached to the polypeptide molecules.
- spacing of the polypeptides on the support is determined based on the consideration that information transfer from a coding tag of a binder bound to one polypeptide molecule may reach a neighboring molecule.
- the surface of the support is passivated (blocked).
- a “passivated” surface refers to a surface that has been treated with an outer layer of material.
- Methods of passivating surfaces include standard methods from the fluorescent single molecule analysis literature, including passivating surfaces with polymers such as polyethylene glycol (PEG) (Pan et al., 2015, Phys. Biol. 12:045006), polysiloxane (e.g., Pluronic F-127), star polymers (e.g., star PEG) (Groll et al., 2010, Methods Enzymol.
- the density of functional coupling groups for attaching the polypeptide may be titrated on the support surface.
- multiple polypeptide molecules are spaced apart on the surface or within the volume (e.g., porous supports) of a support such that adjacent molecules are spaced apart at a distance of about 50 nm to about 500 nm.
- multiple molecules are spaced apart on the surface of a support with an average distance of at least 50 nm, at least 100 nm, at least 200 nm, or at least 500 nm.
- appropriate spacing of the polypeptide molecules on the support is accomplished by titrating the ratio of available attachment molecules on the support surface.
- the support surface e.g., bead surface
- the support surface is functionalized with a carboxyl group (COOH) which is treated with an activating agent (e.g., EDC and Sulfo-NHS).
- a mixture of mPEGn-NH2 and NH2-PEGn-mTet is added to the activated beads (wherein n is any number, such as 1-100).
- the ratio between the mPEG3-NH2 (not available for coupling) and NH2-PEG24-mTet (available for coupling) is titrated to generate an appropriate density of functional moieties available to attach the polypeptides on the support surface.
- the mean spacing between coupling moieties (e.g., NH2-PEG4-mTet) on the solid support is at least 50 nm, at least 100 nm, or at least 500 nm.
- the ratio of NH2-PEGn-mTet to mPEG3-NH2 is about or greater than 1:1000, or about or greater than 1:10,000.
- the recording tag attaches to the NH2-PEGn-mTet.
- the spacing of the polypeptide molecules on the support is achieved by controlling the concentration and/or number of available functional groups on the support.
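The link between the titration ratio and the resulting spacing can be estimated with a back-of-the-envelope model. The sketch below assumes that coupling-competent sites are scattered at random on a flat surface (a two-dimensional Poisson process), for which the mean nearest-neighbor distance is 1/(2·sqrt(density)). The total grafting density of one PEG chain per nm² used in the usage note is a hypothetical value chosen only to make the arithmetic easy; real supports would need to be characterized.

```python
import math


def mean_spacing_nm(total_sites_per_nm2, active_fraction):
    """Mean nearest-neighbor distance (nm) between active coupling sites.

    total_sites_per_nm2: grafted PEG chains per nm^2 (assumed value).
    active_fraction: fraction of chains carrying the coupling moiety,
                     e.g. about 1e-4 for a 1:10,000 titration.
    Assumes random (Poisson) placement on a planar surface.
    """
    density = total_sites_per_nm2 * active_fraction
    return 1.0 / (2.0 * math.sqrt(density))


def active_fraction_for_spacing(total_sites_per_nm2, spacing_nm):
    """Invert the relation: active fraction needed for a target spacing."""
    return 1.0 / (4.0 * spacing_nm ** 2 * total_sites_per_nm2)
```

Under these assumptions, an active fraction of 1e-4 at one chain per nm² corresponds to a mean spacing of about 50 nm, and reaching 500 nm would require an active fraction of about 1e-6.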
- a method for assessing an attachment of an N-linked glycan to a glycan attachment residue of a N-linked glycosylation site of a polypeptide comprising:
- each binder of the set of binders is attached to a detectable label that comprises identifying information regarding the binder, and the set of binders comprises: a first binder that preferentially binds to an Asn (N) residue over an Asp (D) residue as the glycan attachment residue of the N-linked glycosylation site; and a second binder that preferentially binds to the Asp (D) residue over the Asn (N) residue as the glycan attachment residue of the N-linked glycosylation site, wherein the first binder or the second binder of the set of binders binds to the glycan attachment residue of the N-linked glycosylation site of the immobilized polypeptide treated with the PNGase; and
- each binder of the set of binders is attached to a detectable label that comprises identifying information regarding the binder, and the set of binders comprises: a first binder that preferentially binds to an Asn (N) residue over an Asp (D) residue as the TAA residue; and a second binder that preferentially binds to the Asp (D) residue over the Asn (N) residue as the TAA residue, wherein the first binder or the second binder of the set of binders binds to the TAA residue of the immobilized and cleaved polypeptide treated with the PNGase; and
- a method for analyzing a polypeptide comprising:
- each binder of the set of binders is attached to a detectable label that comprises identifying information regarding the binder, and the set of binders comprises: a first binder that binds to a terminal Asn (N) residue; a second binder that binds to a terminal Asp (D) residue; and a third binder that binds to a terminal amino acid (TAA) residue other than Asn (N) and Asp (D), wherein the first binder, the second binder, or the third binder binds to the TAA residue of the immobilized polypeptide;
- TAA residues are N-terminal amino acid (NTAA) residues.
- TAA N-terminal amino acid
- TAA is a C-terminal amino acid
- the N-linked glycosylation site comprises any one of the following amino acid sequences: AsnXxxSer, AsnXxxThr or AsnXxxCys, wherein Xxx is any standard, naturally occurring amino acid residue.
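Candidate N-linked glycosylation sites matching the motifs above can be located in a sequence with a short pattern search. This is a minimal sketch using one-letter amino acid codes; the function name is hypothetical, and the middle position accepts any of the 20 standard residues, following the wording of the text (some analyses additionally exclude proline at that position).

```python
import re

# Asn, then any standard residue, then Ser, Thr, or Cys.
# The lookahead lets overlapping motifs (e.g. "NNSS") all be reported.
SEQUON = re.compile(r"(?=(N[ACDEFGHIKLMNPQRSTVWY][STC]))")


def find_sequons(sequence):
    """Return (1-based position of Asn, motif) for every sequon found."""
    return [(m.start() + 1, m.group(1)) for m in SEQUON.finditer(sequence)]


if __name__ == "__main__":
    for position, motif in find_sequons("MNGTANNSSK"):
        print(position, motif)
```

The reported position is that of the Asn residue, which is the glycan attachment residue that the N-binder and D-binder interrogate.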
- detectable labels of the first binder and/or the second binder each comprise a fluorescently labeled probe, and the amino acid identity of the amino acid residue treated with the PNGase is determined by detecting a signal generated from the fluorescently labeled probe.
- detectable labels of the first binder and/or the second binder each comprise a nucleic acid coding tag that comprises an encoder barcode that comprises identifying information regarding the binder.
- nucleic acid sequence of the portion of the extended nucleic acid construct is determined using a DNA sequencer.
- determining amino acid identity of the amino acid residue treated with the PNGase comprises determining a likelihood of a particular type of the amino acid residue.
- a method for detecting N-linked glycosylation of polypeptides comprising a first polypeptide having a N-linked glycosylation site comprising:
- each binder specifically binds to a terminal amino acid (TAA) or a modified TAA of a polypeptide immobilized on the solid support;
- the set of binders comprises a first binder that specifically binds to Asn TAA and a second binder that specifically binds to Asp TAA; and the first binder or the second binder binds to Asn TAA or Asp TAA being a glycan attachment residue of the N-linked glycosylation site of the immobilized first polypeptide;
- step (c) determining whether at least some of the immobilized first polypeptide molecules comprise Asp residue as the first residue of the N-linked glycosylation site based on analysis of the identifying information regarding the first binder and/or regarding the second binder obtained or retained in step (b) for the immobilized first polypeptide molecules, thereby detecting N-linked glycosylation in the N-linked glycosylation site of the first polypeptide.
- TAA is an N-terminal amino acid (NTAA) and the new TAA is a new NTAA.
- TAA is a C-terminal amino acid (CTAA) and the new TAA is a new CTAA.
- each binder of the plurality binds to a modified TAA of an immobilized polypeptide.
- modified TAA is a modified N-terminal amino acid (NTAA).
- N-terminal modifier agent is selected from the group consisting of compounds of the following Formulas (10)-(13): wherein M is a metal binding group that comprises sulfonamide, hydroxamic acid, sulfamate, or sulfamide; the group is a 5 or 6 membered aromatic ring containing up to three heteroatoms selected from N, O, and S as ring members, and is optionally substituted by R; R represents one or two optional substituents selected from the group consisting of F, Cl, CH3, CF 2 H, CF 3 , OH, OCH 3 , OCF 3 , NH 2 , N(CH 3 ) 2 , NO 2 , SCH 3 , SO 2 CH 3 , CH 2 OH, B(OH) 2 , CN, CONH 2 , CO 2 H, CN 4 H, and CONHCH3; LG is OH, ORQ, or OCC, each RQ is independently aryl or heteroaryl, each RQ is independently aryl or heteroary
- each binder of the plurality comprises an identifying detectable label; (ii) the identifying information regarding the binder that binds to the TAA or modified TAA of the polypeptide in (c)(ii) is obtained by detecting the identifying detectable label.
- the identifying detectable label comprises a fluorescent moiety
- the identifying information regarding the binder that binds to the TAA or modified TAA of the polypeptide in (c)(ii) is obtained by detecting a signal from the fluorescent moiety.
- step (b) for the immobilized first polypeptide molecules is performed by nucleic acid sequencing of recording tags attached to the first polypeptide molecules and extended upon transfer from the nucleic acid coding tags.
- N-linked glycosylation site comprises any one of the following sequences: AsnXxxSer, AsnXxxThr or AsnXxxCys.
- steps (i)-(iii) or (i)-(ii) are repeated at least two, at least three, at least four or more times.
- any one of embodiments 35-59 further comprising determining identity of the first polypeptide based on analysis of (i) the identifying information regarding the first binder and/or regarding the second binder obtained or retained in step (b) for the immobilized first polypeptide molecules, and (ii) identifying information regarding at least one additional binder of the set of binders that was bound to some of first polypeptide molecules during step (b).
- a method for analyzing a polypeptide comprising a N-linked glycosylation site comprising:
- each binder of the set of binders and the subsequent set of binders specifically binds to an amino acid residue or sequence and is attached to a detectable label that comprises identifying information regarding the binder, wherein the set of binders and/or the subsequent set of binders comprise: a first binder that preferentially binds to an Asn (N) residue over an Asp (D) residue; and a second binder that preferentially binds to the Asp (D) residue over the Asn (N) residue, and wherein the at least partial amino acid sequence is determined by analyzing the detectable labels attached to the binders that bind to the immobilized polypeptide or the cleaved immobilized polypeptide, thereby analyzing N-linked glycosylation in the polypeptid
- each binder specifically binds to a terminal amino acid (TAA) or a modified TAA residue of the polypeptide immobilized on the solid support.
- TAA terminal amino acid
- TAA N-terminal amino acid
- TAA is a C-terminal amino acid
- determining comprises determining identity of at least one amino acid residue present at the N-linked glycosylation site of different immobilized polypeptide molecules.
- N-linked glycosylation site comprises any one of the following sequences: AsnXxxSer, AsnXxxThr or AsnXxxCys.
- steps (ii)-(iii) are repeated one or more times by cleaving the cleaved immobilized polypeptide generating a further cleaved immobilized polypeptide, and contacting the further cleaved immobilized polypeptide with a new subsequent set of binders, wherein each binder of the new subsequent set of binders specifically binds to an amino acid residue or sequence and is attached to a detectable label that comprises identifying information regarding the binder.
- Example 1 Generation of binders that specifically bind to terminal amino acid residues of polypeptides.
- binders that specifically bind to terminal amino acid residues of polypeptides, which are used in the NGPS assay.
- binders may be developed by a number of approaches as disclosed in the art, and some of them are briefly outlined below.
- US 2022/0283175 A1 discloses methods for generating binders based on metalloprotein scaffolds, and in particular human carbonic anhydrase scaffolds. It discloses selection and design of engineered binders suitable for the NGPS assay and capable of binding NTM-P1 with minimal P2 bias, where P1 and P2 are the first two N-terminal amino acids, and NTM is a modification of P1 by an N-terminal modifier agent (see, e.g., Example 2 of US 2022/0283175 A1).
- each binder of the set of binders used in the disclosed methods specifically binds to a particular terminal residue of polypeptides, wherein the binder comprises an amino acid sequence X1-C/H/D/E-X2-C/H/D/E-X3-C/H/D/E-X4, wherein C/H/D/E is any single amino acid residue independently selected from the group consisting of amino acid residues C (Cys), H (His), D (Asp), and E (Glu); X1, X2, X3 and X4 are each any amino acid sequence independently comprising between 0 and 500 amino acid residues in length, and wherein the amino acid sequence X1-C/H/D/E-X2-C/H/D/E-X3-C/H/D/E-X4 chelates the zinc metal cation M with a thermodynamic dissociation constant of 0.5 nM or less.
- A variety of NTMs have been tested to increase affinity of an engineered binder towards the P1 residue, with M64 being an exemplary NTM (see Example 2 below).
- Specific binders were engineered as described in US 2022/0283175 A1 using computational modeling, phage display screening and affinity maturation (see also Examples 3-5 for a brief description).
- exemplary binders generated by methods described in US 2022/0283175 A1 and specific for a particular modified NTAA of polypeptides are shown in SEQ ID NOs: 1, 3, 5, 7, 8, 11, 13, 15, 17, and 19.
- US 10852305 B2 discloses methods for generating binders based on different tRNA synthetase scaffolds.
- tRNA synthetases RS
- RS tRNA synthetases
- binding pocket in these molecules can be modified to permit the entry of peptides presenting the specifically bound amino acid (as demonstrated in US 10852305 B2).
- specific N-binder and D-binder may be derived from SEQ ID NO: 4 and SEQ ID NO: 9, respectively, as disclosed in US 10852305 B2, Table A. Binders specific for other non-modified NTAA residues were also disclosed.
- US 20200219590 A1 discloses methods for generating binders specific for non-modified NTAA residues to use in high-throughput protein sequencing.
- Table 1 of US 20200219590 A1 provides a list of binder sequences together with the amino acid binding preferences of each molecule with respect to amino acid identity at a terminal position of a polypeptide.
- specific D-binder may be derived from SEQ ID NO: 10, as disclosed in US 20200219590 A1, Table 1.
- binders capable of binding to all possible NTAA residues (as well as modified NTAA residues) of polypeptide analytes with a certain level of selectivity are known in the art (see e.g., U.S. Patent Nos. 9,566,335, 10,852,305, and 9,435,810; patent publications US 20190145982 A1, US 2022/0283175 A1, WO 2022/072560 A1, WO 2010/065322 A1, each incorporated by reference in its entirety).
- binders capable of binding to CTAA residue of a polypeptide comprise engineered catalytically inactive carboxypeptidases.
- the catalytic residues of the carboxypeptidase can be mutated to create a catalytically inactive enzyme, which still retains its binding ability.
- subtilisin serine proteases comprised of a canonical Ser-His-Asp catalytic triad, in which any or all of the catalytic residues can be mutated to alanine to render the enzyme largely catalytically inactive without affecting binding Km’s (disclosed in Carter P, Wells JA.
- Exemplar carboxypeptidases suitable for engineering include the MEROPS (Rawlings, N., et al., MEROPS: the database of proteolytic enzymes, their substrates and inhibitors. Nucleic Acids Res (2014) 42: D503-D509) family S10, M14, M20 and M32 members; IUPAC classifications include EC 3.4.16 (serine carboxypeptidases), EC 3.4.17 (metallo-carboxypeptidases), and EC 3.4.18 (cysteine carboxypeptidases).
- the serine carboxypeptidases, the metallocarboxypeptidases, and the cysteine carboxypeptidases can be rendered catalytically inactive by replacing residues within the Ser-His-Asp catalytic triad, the His-Xaa-Xaa-Glu (M14) or His-Glu-X-X-His (M32) motif, or the Cys-His motif with alanine, respectively.
- selective binders can be derived from a carboxypeptidase.
- Carboxypeptidases are proteolytic enzymes that remove C-terminal amino acids/peptides from proteins. Two enzymatic mechanisms for carboxypeptidase activity have been identified: metalloproteases that employ zinc ions to affect catalysis and serine carboxypeptidases that employ an activated serine nucleophile (Breddam, K. Serine carboxypeptidases. A review. Carlsberg Res. Commun. (1986) 51, 83). Carboxypeptidases are a diverse enzyme family that generally demonstrate substrate sequence specificity derived from interactions between the substrate and enzyme active site.
- Metallo-carboxypeptidases are classified in the MEROPS peptidase database as families M14, M15, M20, M28B, and M32, whereas serine carboxypeptidases belong to families S10, S11, S12, S28, S41, and S66.
- C-terminal specificity vs. aminopeptidase activity
- Carboxypeptidases vary in C-terminal amino acid specificity, and highly specific enzymes may represent particularly useful candidates for binder derivation.
- catalytic activity may be removed through genetic engineering or biochemical regulation (e.g. addition of inhibitor or metal chelator).
- selective binders can be derived from the S10 family, which comprises serine proteases with yeast Carboxypeptidase Y (CPY) as an exemplar member.
- CPY is used in peptide sequencing since it is processive and has very little bias for amino acid residues it removes from the C-terminus of the peptide (Patterson, D., et al., C-terminal ladder sequencing via matrix-assisted laser desorption mass spectrometry coupled with carboxypeptidase Y time-dependent and concentration-dependent digestions.
- the CPY binding pocket for the C-terminal amino acids is comprised of mostly hydrophobic residues including Trp49, Asn51, Gly52, Cys56, Thr60, Phe64, Glu65, Glu145, Tyr256, Tyr269, Leu272, Ser297, Cys298 and Met398.
- CPY exhibits a broad specificity to the C-terminal residue of polypeptide substrates, accommodating hydrophobic, hydrophilic, and aliphatic residues due to its large binding pocket at the S1’ site in CPY. In contrast, CPY exhibits much greater specificity for the P1 residue (penultimate to C-terminus).
- the S1 subsite is a deep pocket mainly constructed of hydrophobic residues, Tyr147, Leu178, Tyr185, Tyr188, Asn241, Leu245, Trp312, Ile340, and Cys341, rendering a hydrophobic preference for the C-terminal penultimate residue (disclosed in US 5,945,329 Customized Protease; US 5,985,627 Modified Carboxypeptidase).
- the C-terminal recognition of CPY is accomplished strictly by hydrogen bonding.
- the carboxyl terminus of the peptide forms hydrogen bonds with the backbone amide of Gly52 and the side chains of Asn51 and Glu145 in CPY
- selective binders can be derived from the M14 family, which is comprised of metallo-carboxypeptidases including Carboxypeptidase A (CPA), Carboxypeptidase B (CPB), and Carboxypeptidase T (CPT) such as the thermophilic bacterial carboxypeptidase from Thermoactinomyces vulgaris (Gomis-Ruth, Structure and Mechanism of Metallocarboxypeptidases, Critical Reviews in Biochemistry and Molecular Biology, (2008) 43:5, 319-345).
- the compact globular shape of the funnelin carboxypeptidases and cone-like entrance to the binding pocket are well suited to being engineered as C-terminal binders.
- Thermoactinomyces vulgaris CPT has broad substrate specificity against hydrophobic, hydrophilic, and charged residues at the C-terminus. This contrasts with the narrow substrate specificity of the CPA and CPB families which hydrolyze hydrophobic and positively charged residues, respectively, from the C-termini of the peptides (Akparov, V., et al., Structural insights into the broad substrate specificity of carboxypeptidase T from Thermoactinomyces vulgaris.
- the specificity for the C-terminal residue is largely determined by the identity of the amino acids comprising the specificity/binding pocket.
- the M14 funnelin family of carboxypeptidases can have their specificity/binding pocket altered through directed evolution by mutating residues in the specificity/binding pocket.
- residues (CPA numbering) at locations 194, 203, 207, 243, 247, 248, 250, 253-255, and 268 play critical roles in C-terminal amino acid specificity (Gomis-Ruth, Structure and Mechanism of Metallocarboxypeptidases, Critical Reviews in Biochemistry and Molecular Biology, (2008) 43:5, 319-345).
- thermophilic CPT such as P halophilum, Th. vulgaris, or L. thermophila is used as a scaffold for binder.
- binders capable of binding to a terminal amino acid residue are engineered from aminoacyl tRNA synthetases (aaRSs).
- the aaRSs are a class of proteins with extraordinary amino acid binding specificity (US 9435810 B2).
- the set of 20+ aaRSs exhibit various modes of binding to the amino acids (AAs) including hydrophobic binding, hydrogen bonding, salt-bridges, and pi-pi stacking (Kaiser F, et al., The structural basis of the genetic code: amino acid recognition by aminoacyl-tRNA synthetases).
- aaRSs are N-terminal binders
- engineered aaRSs also offer the capability of generating high affinity interactions with amino acid residues having an exposed C-terminus.
- the aaRSs activate their cognate amino acids through C-terminal conjugation of ATP to produce an aminoacyl adenylate intermediate, which involves significant binding energy generation due to hydrophobic and hydrogen bonding.
- the activated amino acid is subsequently transferred to a respective tRNA to “charge” the tRNA for subsequent protein synthesis.
- an engineered catalytic portion of aaRS is used for CTAA residue binding.
- an engineered enzyme portion of the aaRS is used for CTAA residue binding (Pham, Yen et al., Tryptophanyl-tRNA Synthetase Enzyme, Journal of Biological Chemistry, Volume 285, Issue 49, 38590 - 38601).
- Example 2 Exemplary origin, synthesis and installation of NTMs on NTAA residues of peptides by N-terminal modifier agents.
- N-terminal modifier agent for NTM M64 (in the ester form).
- Peptides, in solution or on solid-support, were dissolved in 25 µL of 0.4 M MOPS buffer, pH 7.6 and 25 µL of acetonitrile (ACN).
- 50 µL of the active ester stock solution was added to the peptide-ACN:MOPS solution and incubated at 65 °C for 60 minutes.
- the peptides were functionalized with the respective modification as shown in the above schemes.
- a surfactant-aqueous coupled system can be employed to install NTM (M64) onto the N-terminal amino acid of peptides.
- NTMs have been similarly installed on N-terminal amino acids of peptides to increase affinity of N-terminal binders.
- Exemplary NTM materials and syntheses were described in US 2022/0283175 A1 and in U.S. provisional application 63/525,347 filed July 06, 2023, each incorporated herein by reference.
- Example 3 Binder engineering from relevant scaffolds.
- Binder engineering involves improving affinities of potential binding sites through rational, structure-based approaches on a parental scaffold and generating libraries that contain degenerate NNK codons at multiple, defined positions using Kunkel mutagenesis and phage display selection.
- Kunkel mutagenesis is a known site-directed mutagenesis strategy that introduces point mutations by annealing mutation-containing oligonucleotides to uracil-containing single-stranded DNA (dU-ssDNA) templates.
- Exemplary Kunkel mutagenesis and phage display selection methods are described in US 9102711 B2; US 10906968 B2; and Kunkel, Proc. Natl. Acad. Sci. USA, 1985, 82(2):488-492.
- phage libraries using NNK variant site encoding were constructed targeting residue positions within the substrate-binding pockets of the selected metalloenzymes.
- Phosphorylated primers were obtained that possess degenerate codons at intended positions and were annealed to uracilated ssDNA containing the parental sequence of the same binder of interest with introduced SacII sites.
- the heteroduplex DNA was transformed into custom TG1RM cells (Lucigen TG1 Electrocompetent Cells containing a pCDF-1b plasmid expressing SacII enzyme), which removed undesired template DNA with SacII sites, resulting in libraries with sizes of 10^9 to 10^10.
- Monovalent phage libraries were packaged using standard helper phage and precipitated using PEG/NaCl solution. Using standard protocols, phage libraries were panned against different N-terminally modified target peptides. NTAA modification was applied to target peptides during binder screening and maturation to increase substrate surface available for interaction with the binder, which would result in selection of binders with higher affinity and P1 specificity.
- Luminex enables analysis of binding of phage libraries against multiple peptide targets immobilized on beads in a single assay well. This is accomplished by spatially separating immunoassays performed on beads that contain unique fluorophore cores that exhibit distinct excitation/emission profiles. Multiple target peptide-specific beads are combined in a single well of a multi-well microplate to detect and quantify multiple targets simultaneously. Specific binders were isolated against a variety of N-terminally modified target peptides. Based on the sequence identities after enrichment, consensus mutations or mutational hotspots were identified, and binders were expressed and purified for testing in the encoding assay.
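The NNK randomization used for the libraries above can be illustrated with a short enumeration. This is a minimal sketch, not part of the disclosed method: the function names are hypothetical, and the only facts used are the standard genetic code and the IUPAC meaning of N (A/C/G/T) and K (G/T).

```python
from itertools import product

# Standard genetic code, with codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def nnk_codons():
    """Enumerate all NNK codons (N = A/C/G/T, K = G/T)."""
    return ["".join(c) for c in product("ACGT", "ACGT", "GT")]


def nnk_summary():
    """Return (codon count, encoded amino acids, stop codons) for NNK."""
    codons = nnk_codons()
    amino_acids = sorted({CODON_TABLE[c] for c in codons} - {"*"})
    stops = [c for c in codons if CODON_TABLE[c] == "*"]
    return len(codons), amino_acids, stops


def library_diversity(n_positions):
    """Number of distinct DNA sequences for n randomized NNK positions."""
    return 32 ** n_positions


if __name__ == "__main__":
    count, amino_acids, stops = nnk_summary()
    print(count, len(amino_acids), stops)
    for n in range(4, 8):
        print(n, library_diversity(n))
```

Under these assumptions, NNK covers all 20 amino acids with 32 codons and a single stop codon, and six fully randomized positions give 32^6 (about 1.07 x 10^9) DNA variants, which is on the order of the library sizes reported above.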
- Binder maturation for affinity and specificity involved multiple cycles of error prone PCR prior to library construction via Kunkel mutagenesis and phage display selection, performed essentially as described in Example 3. Briefly, 60-90 cycles of error prone PCR on a parental binder generated PCR amplicons with an average of 4-6 random amino acid mutations per 100 amino acids. The dsDNA amplicon was digested by lambda exonuclease into “megaprimer” ssDNA, which was used to generate heteroduplex DNA by annealing to uracilated ssDNA of the vector containing the parental sequence of the same binder of interest with introduced SacII sites.
- the heteroduplex DNA was transformed into custom TG1RM cells (Lucigen TG1 Electrocompetent Cells containing a pCDF-1b plasmid expressing SacII enzyme), which removed undesired template DNA with SacII sites, resulting in libraries with sizes of 10^9 to 10^10.
- Monovalent phage libraries were packaged using standard helper phage and precipitated using PEG/NaCl solution. For each round of phage display selection, precipitated phage in the presence of peptide and protein competitors were first depleted against beads coated with off-target peptides for 1 hour at 24 °C and then panned against beads coated with target peptides for 1 hour at 24 °C.
- Plasmid DNA was received from a vendor-generated source containing the identified engineered binders conjugated with an N-terminal hexa-histidine tag and a C-terminal SpyCatcher domain (sequences of selected binders used in Example 7 are shown in SEQ ID NOs: 2, 6, 12, 14, 16, 18 and 20). Plasmids were transformed into chemically competent E. coli cells using standard methods. Recovery was done by adding 150 µL of warm SOC and incubation for 1 hour at 30 °C. After recovery, 80 µL of transformed culture was added to 1 mL 2YT containing corresponding antibiotic. The culture was grown overnight and then used to generate stock in glycerol.
- the stock was then used to inoculate an overnight culture of 2YT containing corresponding antibiotic, and the culture was grown overnight for ~20 hours at 37 °C. This culture was subsequently used to inoculate another larger volume culture of 2YT containing corresponding antibiotic at a 100-fold dilution. The culture was then left at 37 °C for 3-4 hours until an optical density of 0.6 was reached. Temperature was then lowered to 15 °C and protein expression was induced with a final concentration of 0.5 mM IPTG. The cultures were grown for an additional 16-20 hours and the cells were harvested by centrifugation at 4,000 rpm for 20 min. The cellular pellets were stored at -80 °C until ready for use.
- the size-exclusion buffer was 25 mM PO4 pH 7.4 with 150 mM NaCl, and after elution and concentration, glycerol was added to final concentration of 10%. Proteins were aliquoted, frozen, and stored at -80 °C.
- Example 6 Removal of terminal amino acid residues from polypeptide analytes. [0257] During the described NGPS-type encoding assay, progression from one encoding cycle to another is performed by cleavage of the terminal amino acid residue encoded at the current cycle, which would expose the next terminal amino acid residue to be encoded in the next encoding cycle.
- an engineered enzyme that catalyzes the removal of a labeled (e.g., chemically labeled or modified) N-terminal amino acid can be used. Generation of such enzymes is disclosed in U.S. patent No. 11,427,814 B2, incorporated herein by reference.
- an M64-specific cleavase set comprising 3 engineered enzymes having amino acid sequences set forth in SEQ ID NOs: 21-23, which were engineered from dipeptidyl peptidase from Thermomonas hydrothermalis (sequence set forth in SEQ ID NO: 24) as described in US 11,427,814 B2.
- Modification of NTAA residues of polypeptide analytes is beneficial to limit cleavage of terminal residues to just a single residue per encoding cycle.
- Modified Cleavase enzymes were evolved to accommodate the installed NTM in their substrate binding pocket along with the NTAA residue, preventing progressive cleavage of the penultimate terminal amino acid residue of the polypeptide analyte, unless the NTM is installed again on the terminal residue formed after the cleavage of the original NTAA by the modified Cleavase enzyme (see US 11,427,814 B2). Therefore, in preferred embodiments, terminal amino acid residues of polypeptide analytes are modified before the cleavage of them could occur.
- the N-terminal amino acid is removed or eliminated using any of the chemical methods as described in US 2020/0348307 A1, US 2022/0227889 A1, and US 11,499,979 B2, all of which are incorporated herein by reference.
- the polypeptide is cleaved with a CTAA cleaving enzyme, such as C-terminal exopeptidases.
- the rate of cleavage of a C-terminal exopeptidase can be controlled to ensure cleavage of a single or few terminal amino acid residues.
- the first strategy employs a C-terminal carboxypeptidase (CP) to remove a single terminal amino acid residue from polypeptide analytes per one cycle.
- the modified carboxypeptidase is preferably engineered to be dependent on a C-terminal modification that enables stepwise cyclic sequencing with removal of only one CTAA per cycle.
- the engineering process can be performed as described in US 11,427,814 B2, where similar modified Cleavases were engineered to recognize and cleave only modified NTAA residues from polypeptide analytes.
- the cleavase enzyme can be comprised of an engineered dipeptidyl carboxypeptidase which removes a dipeptide from the C-terminus of polypeptide analytes.
- the modified dipeptidyl CP (DCP) is preferably engineered to be dependent on a C-terminal modification to enable stepwise cyclic sequencing (see FIG. 5).
- when the C-terminal modification also includes installation of an amino acid (native or unnatural) or bulky residue, the activity of the dipeptidyl CP will remove the resultant modified C-terminal dipeptide comprised of the modification and original CTAA. In this way, either an engineered CP or DCP can be used for stepwise C-terminal sequencing.
- C-terminal Cleavase enzymes are engineered from an M32 carboxypeptidase.
- the M32 family of CPs has two subfamilies: one subfamily is restricted to peptide substrates of 10 amino acid residues or less, whereas the other subfamily has no limitation on peptide substrate length.
- TaqCP is an example of a peptide length restricted member of the first subfamily.
- BsuCP is a non-length restricted M32 CP of subfamily 2 (ypwA peptidase, MEROPS) (Lee, et al., “Insight into the Substrate Length Restriction of M32 Carboxypeptidases: Characterization of Two Distinct Subfamilies.” Proteins 77 (3): 647-57).
- BsuCP is engineered into a C-terminal Cleavase with ability to work on any length peptide substrate with a free C-terminal end.
- C-terminal Cleavase may be engineered by modifying residues in the CTAA binding site using homology to TaqCP family homologues.
- Example 7 Exemplary N and D amino acid residues detection using encoding assay.
- binders were used to specifically bind to nucleic acid-polypeptide conjugates immobilized on a solid support.
- One binder binds to N-terminal asparagine residues of polypeptides (N-binder; SEQ ID NO: 1), and the other binder binds to N-terminal aspartate residues of polypeptides (D-binder; SEQ ID NO: 5).
- Both binders were engineered from human carbonic anhydrase I and II scaffolds by directed evolution, specifically as described in Example 2 of US 2022/0283175 A1.
- the binders were conjugated to corresponding nucleic acid coding tags comprising barcodes with identifying information regarding the binder.
- the coding tags specific for each binder were attached to SpyTag via a PEG linker, and the resulting fusions were conjugated with binder-SpyCatcher fusion protein via SpyTag-SpyCatcher interaction, essentially as described in US 2021/0208150 A1.
- peptides including two test macromolecules (M64-NAEIAGDVAGGK(azide), hereafter M64-premodified N-peptide (SEQ ID NO: 25), and M64-DAEIAGDVAGGK(azide), hereafter M64-premodified D-peptide (SEQ ID NO: 26)) were attached individually to a 5’-phosphorylated, internally DBCO-modified short oligonucleotide, and each resulting peptide-DNA chimera was joined to a 5’-phosphorylated oligonucleotide including a barcode sequence using T4 DNA ligase.
- the barcoded peptide-DNA chimeras were pooled and immobilized on bead-attached capture DNA (SEQ ID NO: 27) using a ligation reaction.
- The pool of barcoded DNA-polypeptide conjugates (200 nM) was mixed in 1x Quick ligation buffer with T4 DNA ligase and added to the capture DNA-attached beads. After a 30-minute incubation at 25 °C, the beads were washed with PBST, 20% formamide in PBST, and twice with PBST.
- the recording tags in the conjugates contain a barcode for the macromolecule, 2 nt overhang complementary region, Type II restriction enzyme binding region, and flanking region.
- the coding tags attached to the N-binder and D-binder each form a loop with 8 bp duplex and 2 nt overhang at the 3’, which is complementary to the 3’ overhang of the recording tag on the beads.
- the coding tags contain unique barcodes for identification of the binders, and also have the BtsI-v2 binding sequence and the 2 nt complementary overhang region for the next binding cycle.
- the two binders (300 nM each) were incubated with the peptide-DNA chimera-immobilized beads in 50 mM MOPS buffer, pH 7.5, 33 mM sodium sulfate, 1 mM EDTA, at 25 °C for 30 min.
- end capping was performed to introduce a primer site for downstream PCR that will amplify extended recording tags for analysis.
- Capping oligos were introduced that contain a loop DNA with a 2 nt 3’ overhang complementary to the 3’ overhang of the extended or unextended recording tags.
- a primer site for downstream PCR can be introduced during the extension reaction using longer coding tags that contain a complementary primer sequence.
- Exemplary encoding results generated by the described encoding method are shown in FIG. 6.
- the corresponding target peptides, M64-modified NAEIAGDVAGGK(azide) (SEQ ID NO: 25) and M64-modified DAEIAGDVAGGK (azide) (SEQ ID NO: 26) were encoded with a mix of N-binder (SEQ ID NO: 2) and D binder (SEQ ID NO: 6).
- Fractions of encoded recording tags (percentage of extended recording tags to total amount of recording tags on the beads (both extended and unextended)) were evaluated by NGS sequencing and showed specific encoding results for both binders.
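The encoded fraction defined above (extended recording tags over all recording tags for a given peptide) can be computed from NGS read counts along the following lines. This is a hedged sketch: the function names, the use of `None` as the key for unextended tags, and the read counts are illustrative assumptions, not data from FIG. 6.

```python
def encoded_fraction(extended, unextended):
    """Fraction of recording tags that were extended in the encoding step."""
    total = extended + unextended
    if total == 0:
        raise ValueError("no recording tag reads for this peptide")
    return extended / total


def per_binder_fractions(counts):
    """Split the encoded fraction by binder barcode.

    counts maps a binder barcode to its read count; the key None holds
    reads of unextended recording tags (an assumed convention).
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no recording tag reads for this peptide")
    return {b: n / total for b, n in counts.items() if b is not None}


if __name__ == "__main__":
    # Hypothetical read counts for one barcoded peptide.
    n_peptide_counts = {"N-binder": 450, "D-binder": 50, None: 500}
    print(encoded_fraction(500, 500))
    print(per_binder_fractions(n_peptide_counts))
```

Comparing the per-binder fractions across the N-peptide and the D-peptide is one simple way to express the encoding specificity of each binder.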
- Example 8 Exemplary human plasma protein detection using NGPS assay.
- NGPS Next Generation Protein Sequencing
- Plasma proteins were digested to peptide fragments, modified with azide and loaded to beads, then sequenced by NGPS assay.
- the binders were each conjugated to corresponding nucleic acid coding tags comprising barcodes with identifying information flanked by a 2 nt overhang at the 5’ terminus used for information transfer ligation reactions as described below. Coding tags specific for each binder were attached to a SpyTag peptide via a PEG linker, and the resulting fusion was conjugated with binder-SpyCatcher fusion protein via SpyTag-SpyCatcher interaction. [0274] Each human plasma protein (100 µg) was processed using EasyPep MS Sample Prep Kits (Thermo Scientific, USA).
- Plasma protein was dissolved in 100 µL lysis solution, then reduced and alkylated at 95 °C for 10 min after adding 50 µL of reduction solution and 50 µL of alkylation solution. The resulting mixture was mixed with 50 µL of Trypsin/Lys-C protease mix, then incubated at 37 °C for 3 hours. After incubation, 50 µL of digestion stop solution was added. The digested peptide fragments were purified using C18 columns. The digested peptide fragments solution was bound to C18 resin, washed once with solution A, twice with solution B, and eluted with 300 µL of elution solution.
- the purified peptide fragments solution was dried using speedvac, then dissolved in acetonitrile.
- the azide-modification was performed in 50 µL solution including 84% acetonitrile/16% water, 20 mM Lys(N3) and 0.96 mg/mL carboxypeptidase Y (Sigma-Aldrich, USA) at 30 °C for 16 hours.
- the azide-modified peptide fragments were purified by HPLC.
- Phosphorylated nucleic acid-polypeptide conjugates (200 nM) were annealed and ligated to the hairpin DNAs attached to beads in 1x Quick ligation buffer with T4 DNA ligase by 30 minutes incubation at 25 °C. The beads were washed twice with PBST and resuspended in 50 µL of PBST.
- the recording tags in the conjugates contain a barcode for the plasma protein sample, 2 nt overhang complementary region, Type II restriction enzyme binding region, and flanking region.
- exemplary method of installing M64 NTM onto NTAA residues of immobilized peptides is provided below.
- the active ester reagent was prepared from M64 and dissolved in 25 µL DMA and 25 µL ACN to a concentration of 0.05 M stock solution.
- 50 µL of the active ester stock solution was added to the peptide-ACN:MOPS solution and incubated at 45 °C for 60 minutes.
- the peptides were functionalized with the respective modification as shown in the scheme below. Beads with immobilized and M64-modified recording tag-polypeptide conjugates were washed to remove excess M64 reagent.
- a mixture of seven binders comprising: a) L binder (SEQ ID NO: 12) having specificity for N-terminally modified L (leucine) amino acid residues of peptides; b) N binder (SEQ ID NO: 2) having specificity for N-terminally modified N (asparagine) amino acid residues of peptides; c) D-E binder (SEQ ID NO: 6) having specificity for N-terminally modified D (aspartic acid) and E (glutamic acid) amino acid residues of peptides; d) F binder (SEQ ID NO: 16) having specificity for N-terminally modified F (phenylalanine) amino acid residues of peptides; e) V-I binder (SEQ ID NO: 18) having specificity for N-terminally modified V (valine) and I (isoleucine) amino acid residues of peptides; f) Y binder (SEQ ID NO: 20) having specificity for N-terminally modified
- the binders (400 nM for L binder, 600 nM for N binder, 600 nM for D-E binder, 400 nM F binder, 600 nM V-I binder, 600 nM Y binder and 600 nM S binder) were incubated with the beads in 50 mM MOPS buffer, 33 mM Na2SO4, 1 mM EDTA and 0.1% Tween 20, pH 7.5 at 25 °C for 30 min.
- Cycle capping oligonucleotides were used to provide extension of recording tags for peptide analytes that did not participate in interaction with the binders followed by encoding (recording tags that were non-extended in a given cycle), which generates compatible termini in all recording tags attached to peptide analytes, suitable for the next encoding cycle.
- the encoded beads were incubated with the cleavase mix that is configured to cleave the majority of modified NTAA residues, which includes 20 nM Z11, 2 nM Z13 and 2 nM Z15 in 0.2x TBE buffer at 45 °C for 1 hour to cleave the N-terminal M64-modified amino acids of peptides.
- the cleavase-treated beads were used for the next cycle N-terminal modification step.
- four cycles of N-terminal M64 modification, encoding and cleavage steps, followed by an additional N-terminal M64 modification and encoding, were carried out to sequence 5 amino acid residues.
- the end capping step was carried out in this experiment.
- the goal of the end capping step is to introduce primers for amplification of the extended recording tags.
- the conditions for the end capping step were as follows: 1x Quick ligation buffer, 12.5 U/µL of T4 DNA ligase, 0.125 U/µL Klenow fragment exo-, 125 µM dNTP each, 0.1% Tween 20, and 0.4 µM each of the end cap oligos at 25 °C for 15 min.
- identifying information regarding specific binders bound to the M64-modified NTAAs was recorded in extended recording tags.
- M64-modified NTAAs of immobilized polypeptides were cleaved, exposing a new NTAA residue for each immobilized polypeptide.
- the “binding-encoding” cycle was repeated with a new mixture of binders specific for M64-modified NTAAs, and identifying information regarding specific binders bound to the new M64-modified NTAAs was recorded in extended recording tags. After that, a new M64-modified NTAA cleavage step occurred, and so on.
- analyzing extended recording tags containing the whole “binding history” of corresponding polypeptide analytes by NGS allows for deducing the identity of sequential terminal amino acid residues of polypeptide analytes, if binders have at least a level of specificity for a particular modified NTAA over other modified NTAAs.
- FIG. 7 indicates encoding results for human serum albumin (HSA) protein.
- particular peptides may be identified with certain probability using a known binding pattern for each binder, and based on identification of peptides, a protein from a biological sample may be identified from which the peptides were generated (HSA in this case).
- binders in the set of binders have only medium specificity towards a particular target moiety (e.g., TAA of polypeptides) such that a binder binds more than one target moiety and there is a significant probability of incorrect moiety identification based on a single binding event.
- amino acid sequence of a polypeptide may be inferred based on (i) binding profiles of the binders from the set of binders that correspond to encoder barcode sequences present in extended recording tag sequences, and (ii) calculated probability scores of an association between a string of encoder barcode sequences that correspond to binders from the set of binders that bind to the polypeptide and one or more amino acid sequences of polypeptides of the plurality of polypeptides, as described in US patent application 18/951,277 filed on November 11, 2024.
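The idea of scoring candidate sequences against a string of encoder barcodes can be sketched as follows. This is only an illustration under stated assumptions, not the scoring method of the referenced application: the binding profiles, the floor probability for unlisted residues, the uniform prior over candidates, and all names are hypothetical.

```python
from math import prod

# Hypothetical binding profiles: P(binder barcode recorded | modified NTAA).
PROFILES = {
    "L": {"L": 0.80, "I": 0.20},
    "N": {"N": 0.90, "D": 0.10},
    "D-E": {"D": 0.50, "E": 0.50},
}


def score_candidates(observed_binders, candidates, profiles, floor=0.01):
    """Return normalized scores for candidate peptide sequences.

    observed_binders: binder barcodes read from one extended recording
        tag, one per encoding cycle (cycle 1 corresponds to residue 1).
    candidates: candidate peptide sequences (one-letter codes).
    floor: assumed probability of a binder encoding a residue that is
        absent from its profile (off-target binding).
    """
    raw = {}
    for peptide in candidates:
        # Multiply per-cycle likelihoods, assuming cycles are independent.
        raw[peptide] = prod(
            profiles[binder].get(residue, floor)
            for binder, residue in zip(observed_binders, peptide)
        )
    total = sum(raw.values())
    return {peptide: score / total for peptide, score in raw.items()}


if __name__ == "__main__":
    scores = score_candidates(["L", "D-E"], ["LDK", "IEK", "NAK"], PROFILES)
    for peptide, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(peptide, round(score, 4))
```

Because each binder may recognize more than one residue, a single cycle rarely identifies a residue on its own; multiplying likelihoods over several cycles is what separates the candidates in this sketch.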
- Example 9 Detection of N-glycosylated site occupancy.
- a deglycosylation step was performed before performing the NGPS assay.
- the beads were treated with recombinant PNGase F (New England Biolabs, USA) to remove the N-linked oligosaccharides from glycosylated peptides attached to beads.
- Plasma peptide-loaded beads (5000 beads) were resuspended in 10 µL of 1x Deglycosylation Mix Buffer 1 (New England Biolabs, USA) including 1 µL of Protein Deglycosylation Mix II. The beads mixture was incubated at 25 °C for 30 minutes, then at 37 °C for 16 hours.
- releasing N-linked glycans from polypeptides comprises treating polypeptides with PNGase F (1,000 U per 20 µg protein) in 50 mM sodium phosphate, pH 7.5 or 50 mM Tris-HCl, pH 8.0, and 1% NP-40 at 37 °C for 2-16 hours (or overnight) with gentle mixing.
- the deglycosylation treatment by PNGase F converts N-glycosylated N residues at the N-glycosylation sites into D residues.
- Haptoglobin is known to be glycosylated at four different sites.
- Three trypsin-generated fragments were made (SEQ ID NO: 28 - SEQ ID NO: 30), which contain these 4 sites. If the glycosylation sites are occupied by glycans, PNGase F treatment generates 3 different peptides that contain “D” residues instead of “N” (SEQ ID NO: 31 - SEQ ID NO: 33, see FIG. 8).
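The N-to-D conversion at occupied sites can be mimicked in silico to generate the expected deglycosylated peptide for a given native peptide. The sketch below is an assumption-laden illustration: it uses the general N-X-S/T sequon rule (X not proline), which is background knowledge rather than something stated in this disclosure, and the example peptide is made up and is not one of SEQ ID NOs: 28-33.

```python
import re

# N-X-S/T sequon, X != P. The lookahead lets overlapping sequons be found.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(peptide):
    """Return 0-based positions of Asn residues in N-X-S/T sequons."""
    return [m.start() for m in SEQUON.finditer(peptide)]


def deglycosylated_form(peptide, occupied_positions):
    """Replace N with D at each occupied (glycosylated) position."""
    residues = list(peptide)
    for pos in occupied_positions:
        if residues[pos] != "N":
            raise ValueError(f"position {pos} is not an asparagine")
        residues[pos] = "D"
    return "".join(residues)


if __name__ == "__main__":
    peptide = "VNGTLNPSK"  # hypothetical peptide, not from the sequence listing
    sites = find_sequons(peptide)
    print(sites)
    print(deglycosylated_form(peptide, sites))
```

Only asparagines that actually carried a glycan are converted by PNGase F, so the occupied positions are passed in explicitly rather than inferred from the sequon alone.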
- the NGPS method as described in Example 8 identifies which of the peptides having sequences set forth in SEQ ID NO: 28 - SEQ ID NO: 33 are present in a sample containing Haptoglobin. If the amounts of each peptide are determined, the ratio of the deglycosylated form (e.g., peptide having sequence SEQ ID NO: 31) to the total amount (deglycosylated (SEQ ID NO: 31) and native (SEQ ID NO: 28)) can be used to quantify the percent glycosylation site occupancy (in this case, N184 of Haptoglobin).
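The occupancy ratio described above is a simple proportion. A minimal sketch, assuming that the amounts of the deglycosylated ("D") and native ("N") forms of one peptide are available as counts; the function name and the example counts are hypothetical.

```python
def site_occupancy_percent(deglycosylated, native):
    """Percent occupancy = deglycosylated / (deglycosylated + native) * 100."""
    total = deglycosylated + native
    if total == 0:
        raise ValueError("peptide was not detected in either form")
    return 100.0 * deglycosylated / total


if __name__ == "__main__":
    # Hypothetical counts for the D-form and N-form of one tryptic peptide.
    print(site_occupancy_percent(deglycosylated=75, native=25))
```

This ratio does not yet account for asparagines that deamidated spontaneously, which would also be read as "D".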
- Example 10 Addressing spontaneous deamidation of asparagine residues at N-glycosylated sites during quantification of glycosylation.
- To quantify degree of glycosylation for the N-linked glycosylation site of polypeptide analytes in a sample, the method described in Example 9 is modified as follows.
- the beads with immobilized polypeptide molecules are divided in half, wherein one half is treated with PNGase F, while the other half is not treated with PNGase F.
- Each fraction is barcoded with a barcode nucleic acid sequence by attaching the fraction-specific barcode sequence to a corresponding associated nucleic acid recording tag attached to each polypeptide, which allows tracking, during the subsequent analysis, of whether a particular polypeptide molecule originated from the “treated” or “untreated” fraction.
- the same reference peptide(s) from glycosylated and non-glycosylated fractions may be analyzed to calculate possible differences between two fractions.
- if polypeptide molecules not treated with the PNGase are identified during the analysis as having a “D” residue at an N-glycosylated site, this indicates spontaneous deamidation, because normally, polypeptide molecules not treated with the PNGase should contain only an “N” residue at the N-glycosylated site.
- the presence of a glycan at the N-glycosylated site precludes identification of the “N” residue at an N-glycosylated site since the N-binder would not recognize the N-glycan.
- FIG. 9 illustrates possible experimental design for glycosylation quantification, showing a case of heterogeneity at a particular N-glycosylation site of a polypeptide.
- A1-C1 are different polypeptide molecules of a polypeptide in fraction 1.
- A2-C2 are different polypeptide molecules of the same polypeptide in fraction 2. Presence of either “N” or “D” is shown at an N-glycosylated site of the polypeptide in each species.
- C1 and C2 represent spontaneous deamidation of asparagine residues at the N-glycosylated site.
- NGPS assay can be used to estimate the amounts of C1 (the amount of A1 could not be determined), A2+C2 (the total amount of species having “D”), and B2 species.
- the fraction of protein molecules glycosylated (FG) is FG = [(A2+C2) - C1]/(A2+B2+C2), and can be calculated from the NGPS experiment described in Example 9.
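The deamidation-corrected calculation can be sketched from fraction-barcoded residue calls. This is an illustrative sketch, not the disclosed implementation: the read representation, the barcode labels "treated" and "untreated", and the counts are assumptions. It also assumes the two fractions contain equal numbers of molecules (the beads are divided in half), so that C1 from the untreated fraction can stand in for C2.

```python
from collections import Counter


def tally_calls(reads):
    """Count (fraction_barcode, residue_call) pairs for one site."""
    return Counter(reads)


def fraction_glycosylated(d_treated, n_treated, d_untreated):
    """FG = [(A2 + C2) - C1] / (A2 + B2 + C2).

    d_treated   = A2 + C2: "D" calls in the PNGase F-treated fraction
    n_treated   = B2:      "N" calls in the PNGase F-treated fraction
    d_untreated = C1:      "D" calls in the untreated fraction
    """
    total_treated = d_treated + n_treated
    if total_treated == 0:
        raise ValueError("no calls for this site in the treated fraction")
    return (d_treated - d_untreated) / total_treated


if __name__ == "__main__":
    # Hypothetical residue calls at one N-glycosylation site.
    reads = (
        [("treated", "D")] * 60
        + [("treated", "N")] * 40
        + [("untreated", "D")] * 10
        + [("untreated", "N")] * 40
    )
    counts = tally_calls(reads)
    fg = fraction_glycosylated(
        counts[("treated", "D")],
        counts[("treated", "N")],
        counts[("untreated", "D")],
    )
    print(fg)
```

In the untreated fraction, glycosylated molecules (A1) are not called at all, which is why only the "D" count from that fraction enters the formula.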
- SEQ ID NO: 1 - N-binder; S222 0397 (hCAII scaffold sequence only): SHHWGYGKHNGPEHWHKDFPIAKGERQSPVDIDTHTAKYDPSLKPLSVSYDQATSLRIL NDGHAFAVEFDDSQDKAVLKGGPLDGTYRLFQFHFHWGSLDGQGSEHTVDKKKYAA ELHLVHWNTKYGDFGKAKQQPDGLAVLGIFLKVGSAKPGLQKWDVLDSIKTKGKSA DFTNFDPRGLLPESLDYWTYPGSQTVPPLLESVTWIVLKEPISVSSEQVLKFRKLNFNGE GEPEELMVDNWRPAQPLKNRQIKASFK [0295] SEQ ID NO: 2 - N-binder-SpyCatcher Fusion; S222_0397:
- SEQ ID NO: 4 N-binder candidate; E.coli; AsnNAAB - from US 20210072252
- SEQ ID NO: 34 - PNGase F (Elizabethkingia miricola (Chryseobacterium miricola)):
Abstract
The present disclosure relates to methods for high-throughput analysis of proteins and peptides employing a set of engineered binders that recognize terminal amino acid residues of peptide analytes. In particular, disclosed herein is the analysis of occupancy of N-linked glycosylation sites in proteins and peptides. The disclosure finds utility at large-scale profiling of N-linked glycosylation sites, as well as monitoring changes in glycosylation patterns associated with numerous disease conditions.
Description
HIGH-THROUGHPUT ANALYSIS OF N-LINKED GLYCOSYLATION SITE OCCUPANCY IN PROTEINS AND PEPTIDES
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] The present application claims priority to U.S. provisional patent application No. 63/647,550, filed on May 14, 2024, the disclosure and content of which is incorporated herein by reference in its entirety for all purposes.
REFERENCE TO AN ELECTRONIC SEQUENCE LISTING
[0002] The contents of the electronic sequence listing (File Name: ENCODIA0056- OWO_ST26.xml; Size: 45,135 bytes; and Date of Creation: May 12, 2025) is herein incorporated by reference in its entirety.
TECHNICAL FIELD
[0003] The present disclosure generally relates to biotechnology, in particular to methods for analysis of proteins and peptides, in particular, analysis of occupancy of N-linked glycosylation sites in proteins and peptides. The disclosure finds utility at large-scale profiling of N-linked glycosylation sites, as well as monitoring changes in glycosylation patterns associated with numerous disease conditions.
BACKGROUND
[0004] Variations in N-linked glycosylation (also called N-glycosylation) involve changes where the same site on a protein may or may not be glycosylated. This variability in glycosylation at specific sites within a protein can significantly impact its biological activity. Under different conditions, such as disease states or different cellular environments, N-glycosylation sites in proteins may experience changes in their glycosylation patterns, such as when they lose or gain an attached glycan. It is well documented that variations in N-glycosylation site occupancy within the same protein can be associated with numerous diseases, including cancer and certain genetic disorders, because such changes can affect the structure, stability, and function of proteins.
[0005] In one example, changes in the N-glycosylation pattern of immunoglobulins have been linked to autoimmune diseases, where glycosylation profile modifications impact their ability to regulate the immune system (Reusch D, Tejada ML. Fc glycans of therapeutic antibodies as critical quality attributes. Glycobiology. 2015 Dec;25(12):1325-34). In another example, altered glycosylation in mucins can influence cell adhesion, signaling, and immune response, contributing to cancer progression and metastasis (Laubli H, Borsig L. Altered Cell Adhesion and Glycosylation Promote Cancer Immune Suppression and Metastasis. Front Immunol. 2019 Sep 6;10:2120). In yet another example, changes in N-glycosylation of alpha-1-antitrypsin affect its function as a protease inhibitor, impacting disease progression of liver diseases (McCarthy C, et al., The role and importance of glycosylation of acute phase proteins with focus on alpha-1 antitrypsin in acute and chronic inflammatory conditions. J Proteome Res. 2014 Jul 3;13(7):3131-43). N-glycans have been suggested to have a major role in preventing the impairment of glucose-stimulated insulin secretion by modulating cell surface expression of glucose transporters (Stambuk T, Gornik O. Protein Glycosylation in Diabetes. Adv Exp Med Biol. 2021;1325:285-305).
[0006] Characterizing N-glycosylation site occupancy in specific proteins is important for understanding glycoprotein function, as well as for disease research and drug development, as N-glycosylation site occupancy could be affected during a disease condition or modified by an applied drug. The degree of N-glycosylation site occupancy by itself may correlate with progression or severity of the disease. Mass spectrometry (MS) has been used to analyze N-glycosylation site occupancy, where both label-free and labeling methods are applied (Zhu Z, Go EP, Desaire H. Absolute quantitation of glycosylation site occupancy using isotopically labeled standards and LC-MS. J Am Soc Mass Spectrom. 2014 Jun;25(6):1012-7; Zhang S, et al., Quantification of N-glycosylation site occupancy status based on labeling/label-free strategies with LC-MS/MS. Talanta, 2017, 170: 509-513). Labeling methodologies include SILAC (stable isotope labeling by amino acids in cell culture), TMT (tandem mass tags), and iTRAQ (isobaric tags for absolute and relative quantification). Despite the existence of MS-based methods, their throughput remains limited. Thus, there is a need for high-throughput methods for analysis of N-glycosylation site occupancy on a large scale, such as a proteome-wide or population-wide scale.
[0007] The present disclosure describes methods for high-throughput N-glycosylation site occupancy analysis that fulfill this and other needs. These and other embodiments of the invention will be apparent upon reference to the following detailed description. To this end, various references are set forth herein which describe in more detail certain background information, methods, and/or compositions, and are each hereby incorporated by reference in their entireties.
BRIEF SUMMARY
[0008] The summary is not intended to be used to limit the scope of the claimed subject matter. Other features, details, utilities, and advantages of the claimed subject matter will be apparent from the detailed description including those embodiments disclosed in the accompanying drawings and in the appended claims.
[0009] The present disclosure describes sensitive and reliable analytical methods to identify occupation of N-glycosylation sites in multiple proteins that may originate from different biological samples, providing a large-scale N-linked glycoproteomic analysis. Attempts to identify in vivo N-glycosylation sites on a proteome level have been reported, mapping 6367 N-glycosylation sites on 2352 proteins in four mouse tissues and blood plasma using high-accuracy mass spectrometry (Zielinska DF, et al., Precision mapping of an in vivo N-glycoproteome reveals rigid topological and sequence constraints. Cell. 2010 May 28;141(5):897-907). The vast majority of sites have a consensus sequence motif N-X-S/T-X, where N is an asparagine residue and X is not a proline residue. Importantly, evolutionary conservation of both solvent-exposed glycosylated asparagine residues and the canonical asparagine glycosylation motif sequences was observed (Park C, Zhang J. Genome-wide evolutionary conservation of N-glycosylation sites. Mol Biol Evol. 2011 Aug;28(8):2351-7), demonstrating the functional importance of N-glycosylation and highlighting the diagnostic potential of high-throughput N-glycosylation monitoring.
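By way of non-limiting illustration, candidate N-glycosylation sites matching the consensus motif described above may be located in a polypeptide sequence computationally. The following Python sketch is not part of the disclosed methods; the function name and the example peptide are hypothetical, and the pattern simply encodes N-X-S/T-X with X being any residue other than proline. A motif whose fourth position falls beyond the end of the sequence is not reported by this simplified pattern.

```python
import re

# N-X-S/T-X with X != proline, as quoted in the text above.
# The lookahead lets overlapping motifs (e.g. in "NNSTS") all be reported.
SEQUON = re.compile(r"(?=(N[^P][ST][^P]))")


def find_sequons(sequence):
    """Return (1-based position of the Asn, matched 4-residue motif) pairs.

    Assumes an upper-case one-letter amino acid sequence.
    """
    return [(m.start() + 1, m.group(1)) for m in SEQUON.finditer(sequence)]


# Hypothetical peptide, not taken from the disclosure.
print(find_sequons("MKNGSAVNPTQNLTK"))
```

In this hypothetical peptide, the asparagine followed by proline is skipped, consistent with the stated constraint that X is not a proline residue.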
[0010] In some embodiments, the present disclosure utilizes previously reported techniques for high-throughput polypeptide analysis that involve molecular barcoding, use of binders that bind to specific terminal amino acid residues and encoding of specific binding events (see, e.g., US 11513126 B2, US 2023/0136966 A1, US 2023/0054691 A1, US 2019/0145982 A1, each incorporated herein by reference). The methods enable a next-generation peptide sequencing (NGPS) assay, which comprises several steps performed in a cyclical progression where in each encoding cycle, the NTAA residue of a polypeptide to be analyzed is contacted with a specific binder having a detectable label (e.g., a coding tag with identifying barcode); following binding, the detectable label is analyzed (e.g., identifying barcode is transferred into a recording tag attached to the polypeptide); and the NTAA residue is cleaved to form a new NTAA residue of the polypeptide (see, e.g., US12235276B2 and US11782062B2, incorporated herein). In some embodiments, each cycle adds a barcode identifying the corresponding binder to the extended recording tag attached to the polypeptide. In some embodiments, sequencing of the recording tag extended after several rounds of encoding allows identification of the specific binders that were bound to each terminal amino acid residue formed at the beginning of each encoding cycle. If specificities of binders utilized in the assay are known, then identities of each of the encoded terminal amino acid residues on the polypeptide may be predicted with a certain probability, and the identity of the polypeptide can be derived by matching amino acid sequence variants predicted from the encoding assay to a theoretical collection of peptides potentially present in the sample, which can be obtained from a genomic or proteomic database.
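By way of non-limiting illustration, the probabilistic matching of encoded residues to a theoretical collection of peptides may be sketched as follows. The scoring scheme (a sum of log-probabilities over encoding cycles), the function names, the probability floor, and all numerical values are assumptions made for illustration only and are not taken from the disclosure or from the incorporated references.

```python
import math


def log_likelihood(peptide, cycle_probs, floor=1e-6):
    """Sum of log P(residue) over the encoded cycles for one candidate peptide.

    cycle_probs[i] maps a residue to its predicted probability at cycle i.
    Residues absent from a cycle's map get a small floor probability.
    """
    if len(peptide) < len(cycle_probs):
        return float("-inf")  # candidate is shorter than the number of cycles
    return sum(
        math.log(probs.get(residue, floor))
        for residue, probs in zip(peptide, cycle_probs)
    )


def best_match(candidates, cycle_probs):
    """Return the candidate peptide with the highest log-likelihood."""
    return max(candidates, key=lambda p: log_likelihood(p, cycle_probs))


# Hypothetical per-cycle residue probabilities from three encoding cycles.
cycle_probs = [
    {"L": 0.7, "I": 0.2, "V": 0.1},
    {"D": 0.8, "N": 0.1, "E": 0.1},
    {"F": 0.6, "Y": 0.3, "W": 0.1},
]
# Hypothetical database of candidate peptides.
candidates = ["LDFK", "INYR", "VEWK"]
print(best_match(candidates, cycle_probs))
```

Only the first three residues of each candidate are scored here because three cycles were encoded; residues beyond the encoded cycles do not contribute to the score.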
[0011] The described stepwise encoding of sequential amino acid residues at the N-terminus and/or C-terminus of a polypeptide can be adopted to enable high-throughput N-glycosylation site occupancy detection. Instead of developing binders specific for glycosylated amino acid residues, the present disclosure utilizes treatment of polypeptides with a PNGase enzyme, a deglycosylating enzyme that hydrolyzes N-linked glycan moieties from glycopolypeptides and yields polypeptides containing aspartic acid residues in place of the original asparagine residues, effectively changing the glycan attachment residue of N-linked glycosylation sites of polypeptides (see, e.g., Kuhn P, et al., Active site and oligosaccharide recognition residues of peptide-N4-(N-acetyl-beta-D-glucosaminyl)asparagine amidase F. J Biol Chem. 1995 Dec 8;270(49):29493-7). Amino acid identity of the glycan attachment residue of N-linked glycosylation sites may be determined following treatment with the PNGase enzyme by utilizing two binders, such as a first binder that preferentially binds to an Asn (N) residue over an Asp (D) residue as the glycan attachment residue, and a second binder that preferentially binds to the Asp (D) residue over the Asn (N) residue as the glycan attachment residue. In some embodiments, amino acid identity of the glycan attachment residue may be determined following treatment with the PNGase enzyme by obtaining or retaining identifying information regarding the first binder or the second binder that binds to the glycan attachment residue. The described approach can be performed in a very high-throughput manner (e.g., for thousands or millions of polypeptides in a single assay), and may be combined with high-throughput polypeptide identification, because the same binders may be utilized to determine amino acid sequence information regarding the analyzed polypeptides.
[0012] In some embodiments, if a particular N-glycosylation site is partially occupied by a glycan molecule (i.e., some polypeptide molecules are glycosylated at this site while other molecules are not glycosylated at this site), treatment with the PNGase enzyme will produce a heterogeneous population of polypeptide molecules, where the previously non-glycosylated polypeptide molecules will still contain original asparagine residues at the analyzed glycosylation site(s), while the previously glycosylated molecules will contain aspartic acid residues at the analyzed glycosylation site(s) generated after the PNGase treatment. In some embodiments, multiple polypeptides are formed by digesting proteins from one or more biological samples, followed by treatment of the polypeptides with the PNGase enzyme and the NGPS analysis of the polypeptides that utilizes binders specific for “N” and “D” residues (referred to as N-binder and D-binder). In some embodiments, the attachment residue of the N-linked glycan within the N-linked glycosylation site (i.e., the original “N” residue) is an internal residue in a polypeptide analyte, and a binder is able to preferentially bind to an Asn (N) residue over an Asp (D) residue at the attachment site. In other embodiments, the attachment residue of the N-linked glycan within the N-linked glycosylation site (i.e., the original “N” residue) is a terminal residue of a polypeptide analyte, and binders that specifically bind to terminal “N” or “D” residues are utilized.
[0013] In some embodiments, sequential cleavage of terminal residues of a polypeptide analyte is utilized, which eventually exposes the N/D residue within the analyzed N-glycosylation site treated with PNGase as a terminal amino acid residue, and its identity may be determined using the available N-binder and D-binder. One particular advantage of the disclosed methods is that, when sequential cleavage of terminal residues of a polypeptide analyte is utilized, identification of a polypeptide (e.g., determining polypeptide amino acid sequence through probabilistic identification of individual terminal amino acid residues) can be combined with detecting the presence of a glycan in a particular N-glycosylation site, thereby achieving high-throughput polypeptide identification and characterization. In some embodiments, the described methods can be used to quantify the percentage of polypeptide molecules glycosylated at a particular site by comparing the encoding yields of the N-binder and the D-binder for the N or D residue within the analyzed N-glycosylation site in a population of the polypeptide molecules. The higher the percentage of D residues identified at the analyzed N-glycosylation site in the polypeptide molecules, the higher the glycosylation rate of the polypeptide at this site was before the PNGase treatment. A 100% glycosylation would result in 100% conversion of N residues to D residues at the analyzed N-glycosylation site in the analyzed polypeptide molecules. Another important advantage of the described process is that it allows simultaneous identification of a peptide and of glycan presence in a high-throughput manner, that is, analyzing multiple, if not all, peptides present in a particular sample. The described process can be applied for analysis of at least 1000, 10000, 100000 or more individual polypeptides simultaneously, which allows analysis of N-glycosylation site occupancy at a proteome level. The high multiplexing feature of the described process makes it superior in comparison with mass spectrometry-based glycosylation identification.
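By way of non-limiting illustration, the comparison of N-binder and D-binder encoding yields may be expressed as a simple ratio. In the following Python sketch, the function name, the optional per-binder encoding efficiency parameters, and the read counts are hypothetical; the sketch assumes that every Asp residue at the site arose from PNGase treatment and therefore ignores any spontaneous deamidation of asparagine.

```python
def site_occupancy(n_reads, d_reads, n_efficiency=1.0, d_efficiency=1.0):
    """Estimate the fraction of molecules glycosylated at one site.

    n_reads / d_reads: encoding events attributed to the N-binder and the
    D-binder at the analyzed site after PNGase treatment.
    n_efficiency / d_efficiency: assumed relative encoding efficiencies of
    the two binders (1.0 means no correction is applied).
    """
    n_corrected = n_reads / n_efficiency
    d_corrected = d_reads / d_efficiency
    total = n_corrected + d_corrected
    if total == 0:
        raise ValueError("no encoding events observed for this site")
    return d_corrected / total


# Hypothetical counts: 250 N-binder events and 750 D-binder events.
print(site_occupancy(250, 750))
```

With equal assumed efficiencies, 750 D-binder events out of 1000 total events correspond to an estimated occupancy of 75% at the analyzed site.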
[0014] Furthermore, additional challenges are known to exist when mass spectrometry is applied to detect the described mass differences between N residues and D residues following conversion from the PNGase treatment. The mass spectrometry signal of the deglycosylated peptide (having “D”) cannot be directly compared to the signal of the non-glycosylated peptide (having “N”), because the deglycosylated peptide showed reduced signal intensity of up to 50% compared with the non-glycosylated counterpart of equal molar concentration for certain peptide sequences (Stavenhagen K, et al, “Quantitative mapping of glycoprotein micro-heterogeneity and macro-heterogeneity: an evaluation of mass spectrometry signal strengths using synthetic peptides and glycopeptides”, J. Mass Spectrom. 2013; 48:627-639), making the quantification of glycosylation based on mass spectrometry methods inaccurate. The methods described below avoid this limitation.
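By way of non-limiting illustration, the effect of an unequal signal response on an uncorrected occupancy estimate can be worked through numerically. The following Python sketch uses a hypothetical function name and assumes, for the worst case mentioned above, that the deglycosylated (“D”) peptide gives 50% of the signal of the non-glycosylated (“N”) peptide at equal molar concentration; the actual reduction depends on the peptide sequence.

```python
def apparent_occupancy(true_occupancy, d_response=0.5):
    """Occupancy that would be inferred from raw signal ratios.

    true_occupancy: actual molar fraction of molecules carrying Asp (D).
    d_response: assumed signal of the D peptide relative to the N peptide
    at equal molar concentration (0.5 corresponds to a 50% reduction).
    """
    d_signal = true_occupancy * d_response
    n_signal = 1.0 - true_occupancy
    return d_signal / (d_signal + n_signal)


# A site that is truly 50% occupied appears to be about 33% occupied.
print(round(apparent_occupancy(0.5), 3))
```

Under this assumption, a true occupancy of 50% would be read as approximately 33% from uncorrected signal intensities, which illustrates why direct comparison of the two signals is unreliable.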
[0015] In some embodiments, disclosed herein is a method for assessing an attachment of an N-linked glycan to a glycan attachment residue of a N-linked glycosylation site of a polypeptide, the method comprising:
(a) contacting the polypeptide with a PNGase, wherein the PNGase cleaves the N-linked glycan from the glycan attachment residue of the N-linked glycosylation site, and wherein the polypeptide is immobilized on a solid support before or after the contacting with the PNGase, thereby providing the immobilized polypeptide treated with the PNGase;
(b) contacting the immobilized polypeptide treated with the PNGase with a set of binders, wherein each binder of the set of binders is attached to a detectable label that comprises identifying information regarding the binder, and the set of binders comprises: a first binder that preferentially binds to an Asn (N) residue over an Asp (D) residue as the glycan attachment residue of the N-linked glycosylation site; and a second binder that preferentially binds to the Asp (D) residue over the Asn (N) residue as the glycan attachment residue of the N-linked glycosylation site, wherein the first binder or the second binder of the set of binders binds to the glycan attachment residue of the N-linked glycosylation site of the immobilized polypeptide treated with the PNGase; and
(c) determining amino acid identity of the glycan attachment residue of the N-linked glycosylation site treated with the PNGase by obtaining or retaining identifying information regarding the first binder or the second binder that bind to the glycan attachment residue, thereby determining the attachment of the N-linked glycan to the glycan attachment residue of the N-linked glycosylation site.
[0016] In some embodiments, disclosed herein is also a method for assessing an attachment of an N-linked glycan to a glycan attachment residue of a N-linked glycosylation site of a polypeptide, the method comprising:
(a) contacting the polypeptide with a PNGase, wherein the PNGase cleaves the N-linked glycan from the glycan attachment residue of the N-linked glycosylation site, and wherein the polypeptide is immobilized on a solid support before or after the contacting with the PNGase, thereby providing the immobilized polypeptide treated with the PNGase;
(b) cleaving the polypeptide treated with the PNGase to generate a cleaved polypeptide comprising the glycan attachment residue of the N-linked glycosylation site as a terminal amino acid (TAA) residue of the cleaved polypeptide;
(c) contacting the immobilized and cleaved polypeptide with a set of binders, wherein each binder of the set of binders is attached to a detectable label that comprises identifying information regarding the binder, and the set of binders comprises: a first binder that preferentially binds to an Asn (N) residue over an Asp (D) residue as the TAA residue; and a second binder that preferentially binds to the Asp (D) residue over the Asn (N) residue as the TAA residue, wherein the first binder or the second binder of the set of binders binds to the TAA residue of the immobilized and cleaved polypeptide treated with the PNGase; and
(d) determining amino acid identity of the glycan attachment residue of the N-linked glycosylation site treated with the PNGase by obtaining or retaining identifying information regarding the first binder or the second binder that bind to the TAA residue of the immobilized and cleaved polypeptide, thereby determining the attachment of the N-linked glycan to the glycan attachment residue of the N-linked glycosylation site.
[0017] In some embodiments, disclosed herein is also a method for analyzing a polypeptide, the method comprising:
(a) contacting the polypeptide with a PNGase, wherein the PNGase cleaves an N-linked glycan from a glycan attachment residue of a N-linked glycosylation site of the polypeptide, and wherein the polypeptide is immobilized on a solid support before or after the contacting with the PNGase, thereby providing the immobilized polypeptide treated with the PNGase;
(b) contacting the immobilized polypeptide treated with the PNGase with a set of binders, wherein each binder of the set of binders is attached to a detectable label that comprises identifying information regarding the binder, and the set of binders comprises: a first binder that binds to a terminal Asn (N) residue; a second binder that binds to a terminal Asp (D) residue; and a third binder that binds to a terminal amino acid (TAA) residue other than Asn (N) and Asp (D), wherein the first binder, the second binder, or the third binder binds to the TAA residue of the immobilized polypeptide;
(c) cleaving the immobilized polypeptide to expose a new TAA residue;
(d) contacting the immobilized and cleaved polypeptide with the set of binders, wherein the first binder, the second binder, or the third binder binds to the new TAA residue of the immobilized and cleaved polypeptide;
(e) optionally repeating (c) - (d) in one or more cycles;
(f) determining an amino acid sequence of the polypeptide by obtaining or retaining identifying information regarding the binders that bind to the TAA residues in (b), (d), and optionally (e); and
(g) comparing the amino acid sequence determined in (f) to an amino acid sequence of the polypeptide determined without contacting the polypeptide with the PNGase, thereby detecting one or more N-linked glycosylation sites in the polypeptide.
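By way of non-limiting illustration, the comparison recited in step (g) may be sketched as a position-by-position comparison of two sequences. In the following Python sketch, the function name and the example sequences are hypothetical, and it is assumed that both sequences cover the same residues and that the sequence determined without PNGase treatment reports Asn (N) at the glycosylation site.

```python
def converted_sites(untreated, treated):
    """Return 1-based positions where N (untreated) reads as D (treated).

    Both arguments are one-letter amino acid sequences of equal length
    covering the same stretch of the polypeptide.
    """
    if len(untreated) != len(treated):
        raise ValueError("sequences must have the same length")
    return [
        position
        for position, (before, after) in enumerate(zip(untreated, treated), start=1)
        if before == "N" and after == "D"
    ]


# Hypothetical peptide: the Asn at position 2 reads as Asp after PNGase.
print(converted_sites("VNGTK", "VDGTK"))
```

Positions at which both sequences report N are not flagged, consistent with an unoccupied site remaining as asparagine after the PNGase treatment.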
[0018] In preferred embodiments, two binders are needed for implementation of the disclosed approach: a first binder that preferentially binds to an Asn (N) residue over an Asp (D) residue as the glycan attachment residue of the N-linked glycosylation site, and a second binder that preferentially binds to the Asp (D) residue over the Asn (N) residue as the glycan attachment residue of the N-linked glycosylation site.
[0019] In some embodiments, disclosed herein is a method for detecting N-linked glycosylation of polypeptides comprising a first polypeptide having a N-linked glycosylation site, the method comprising:
(a) attaching polypeptides comprising the first polypeptide to a solid support, wherein the polypeptides are treated with a PNGase before or after the attachment of the polypeptides to the solid support, thereby obtaining immobilized polypeptides comprising immobilized first polypeptide de-glycosylated at the N-linked glycosylation site;
(b) analyzing sequentially at least some individual amino acid residues of the immobilized polypeptides, wherein the analysis comprises the following steps for each analyzed immobilized polypeptide:
(i) contacting the solid support with a set of binders, wherein each binder specifically binds to a terminal amino acid (TAA) or a modified TAA of a polypeptide immobilized on the solid support;
(ii) following binding of a binder of the set of binders to the polypeptide, obtaining or retaining identifying information regarding the binder;
(iii) removing the TAA or the modified TAA to expose a new TAA, thereby generating a cleaved polypeptide, and, optionally, modifying the new TAA to yield a newly modified TAA; and
(iv) repeating steps (i)-(iii) or (i)-(ii) at least one time,
wherein the immobilized first polypeptide is analyzed; the set of binders comprises a first binder that specifically binds to Asn TAA and a second binder that specifically binds to Asp TAA; and the first binder or the second binder bind to Asn TAA or Asp TAA being a glycan attachment residue of the N-linked glycosylation site of the immobilized first polypeptide;
(c) determining whether at least some of the immobilized first polypeptide molecules comprise Asp residue as the first residue of the N-linked glycosylation site based on analysis of the identifying information regarding the first binder and/or regarding the second binder obtained or retained in step (b) for the immobilized first polypeptide molecules, thereby detecting N-linked glycosylation in the N-linked glycosylation site of the first polypeptide.
[0020] In some embodiments, disclosed herein is also a method for analyzing a polypeptide comprising a N-linked glycosylation site, comprising:
(a) contacting the polypeptide with a PNGase, wherein the polypeptide is attached to a solid support before or after the contacting with the PNGase, thereby immobilizing the polypeptide on the solid support;
(b) determining at least partial amino acid sequence of the immobilized polypeptide, wherein the determining comprises:
(i) contacting the immobilized polypeptide treated with the PNGase with a set of binders,
(ii) cleaving the immobilized polypeptide to generate a cleaved immobilized polypeptide, and
(iii) contacting the cleaved immobilized polypeptide with a subsequent set of binders, wherein each binder of the set of binders and the subsequent set of binders specifically binds to an amino acid residue or sequence and is attached to a detectable label that comprises identifying information regarding the binder, wherein the set of binders and/or the subsequent set of binders comprise: a first binder that preferentially binds to an Asn (N) residue over an Asp (D) residue; and a second binder that preferentially binds to the Asp (D) residue over the Asn (N) residue, and wherein the at least partial amino acid sequence is determined by analyzing the detectable labels attached to the binders that bind to the immobilized polypeptide or the cleaved immobilized polypeptide, thereby analyzing N-linked glycosylation in the polypeptide.
[0021] In some embodiments, the identifying information regarding the first binder or the second binder that bind to the glycan attachment residue may be obtained or retained depending on the particular way of detecting the detectable labels of the binders. In some embodiments, the detectable label of the binder is detected in situ upon the binding to the glycan attachment residue of a polypeptide treated with PNGase. In some embodiments, detecting in situ upon binding allows the identifying information regarding the binder to be obtained, which may be stored and utilized later to decode amino acid identity of the glycan attachment residue. In some embodiments, the detectable label of the binder comprises a nucleic acid coding tag that comprises an encoder barcode that comprises identifying information regarding the binder. In some embodiments, the encoder barcode may be analyzed in situ (at the polypeptide location) such as by sequencing in situ or in situ amplification. In some embodiments, the encoder barcode sequence information is transferred between the nucleic acid coding tag of the binder and a recording tag attached to the analyzed polypeptide. In some embodiments, such transfer of nucleic acid information generates an extended nucleic acid construct comprising nucleic acid sequence information of the encoder barcode of the binder and nucleic acid sequence information of the nucleic acid recording tag attached to the analyzed polypeptide. In some embodiments, such transfer of nucleic acid information allows the identifying information regarding the first binder or the second binder that bind to the glycan attachment residue to be retained, e.g., in the extended nucleic acid construct. In some embodiments, the extended nucleic acid construct is an extended recording tag attached to the analyzed polypeptide or an extended coding tag attached to the binder. In some embodiments, the extended nucleic acid construct is collected after each cycle of nucleic acid sequence information transfer. In some embodiments, the extended nucleic acid construct is not collected after each cycle of nucleic acid sequence information transfer (e.g., after each cycle, the extended nucleic acid construct retains the identifying information regarding the first binder or the second binder that bind to the glycan attachment residue), but is instead collected and analyzed by a nucleic acid sequencing method after completion of several cycles of nucleic acid sequence information transfer. In one example, several cycles of nucleic acid sequence information transfer generate an extended recording tag attached to the analyzed polypeptide or fragment thereof (if the polypeptide is cleaved during the assay), which is then analyzed by a nucleic acid sequencing method to decode identities of binders that were bound to the analyzed polypeptide during sequence information transfer cycles. Accordingly, the identities of binders may then be used to obtain amino acid sequence information of the analyzed polypeptide, which also includes assessment of occupancy of N-linked glycosylation sites of the analyzed polypeptide based on the methods disclosed herein.
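By way of non-limiting illustration, decoding binder identities from a sequenced extended recording tag may be sketched as a lookup of consecutive encoder barcodes. The following Python sketch is a deliberate simplification: the barcode sequences, the fixed barcode length, the binder names, and the absence of spacer or other non-barcode sequence are all hypothetical assumptions and do not reflect any particular recording tag design.

```python
# Hypothetical encoder barcodes and the binders they identify.
BARCODE_TO_BINDER = {
    "ACGT": "N-binder",
    "TGCA": "D-binder",
    "GATC": "L-binder",
}


def decode_recording_tag(barcode_region, barcode_len=4):
    """Split the barcode region into fixed-length barcodes, one per cycle,
    and return the binder identified by each barcode in cycle order.
    Barcodes that are not in the lookup table are reported as "unknown".
    """
    barcodes = [
        barcode_region[i:i + barcode_len]
        for i in range(0, len(barcode_region), barcode_len)
    ]
    return [BARCODE_TO_BINDER.get(barcode, "unknown") for barcode in barcodes]


# Hypothetical read: cycle 1 encoded by the L-binder, cycle 2 by the D-binder.
print(decode_recording_tag("GATCTGCA"))
```

In this hypothetical read, a D-binder barcode recorded at the cycle corresponding to a glycosylation site would indicate that the site carried a glycan before the PNGase treatment.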
BRIEF DESCRIPTION OF THE DRAWINGS
[0022] Non-limiting embodiments of the present invention will be described by way of example with reference to the accompanying figures, which are schematic and are not intended to be drawn to scale. For purposes of illustration, not every component is labeled in every figure, nor is every component of each embodiment of the invention shown where illustration is not necessary to allow those of ordinary skill in the art to understand the invention.
[0023] FIG. 1 depicts an exemplary polypeptide sequencing assay with terminal amino acid (TAA)-specific binders.
[0024] FIG. 2 depicts an exemplary variation of the polypeptide sequencing assay with terminal amino acid (TAA)-specific binders as described in FIG. 1.
[0025] FIG. 3 depicts another exemplary polypeptide sequencing assay with terminal amino acid (TAA)-specific binders each conjugated with a specific detectable label.
[0026] FIG. 4 depicts a native dipeptidyl carboxypeptidase (DCP) and FIG. 5 depicts an exemplary design of an engineered C-terminal Cleavase (Clv-C). Native dipeptidyl carboxypeptidase DCP can be engineered to cleave a C-terminal dipeptide from a peptide modified at the C-terminus with an amino acid-like label. The P1' residue and the added X-label act as a dipeptide unit cleavable by an engineered Clv-C. The residues in the Clv-C enzyme that normally bind the COOH are engineered to bind the unnatural amino acid and label. Possible modifications include amide, acetylation and others.
[0027] FIG. 6 depicts exemplary encoding reactions for two test peptides (SEQ ID NO: 25 and SEQ ID NO: 26) using a set of N-binder (SEQ ID NO: 2) and D binder (SEQ ID NO: 6). Fractions of encoded recording tags were evaluated by NGS and showed specific encoding results for both binders.
[0028] FIG. 7 depicts an exemplary result of the NGPS peptide sequencing assay using a selected plasma protein (HSA) and a 7-binder mix as described in Example 4. Particular reads (e.g., LZNNN, LZLYY, etc.) indicated on the x axis were each assigned to a particular peptide indicated after each read (separated by ::ALB:).
[0029] FIG. 8 depicts trypsin-generated fragments of Haptoglobin, which are known to be glycosylated. The N amino acid residues shown in bold are known to be glycosylated, and they are converted to D residues by the PNGase F treatment.
[0030] FIG. 9 shows experimental design to address spontaneous deamidation of asparagine residues at N-glycosylated sites for quantification of glycosylation (see Example 10).
[0031] FIG. 10 depicts an exemplary variation of the approach for assessing an attachment of an N-linked glycan to a glycan attachment residue of a N-linked glycosylation site of a polypeptide, which includes analysis of the attachment amino acid residue of the polypeptide via specific binding agents, wherein the amino acid identity of the amino acid residue of the polypeptide treated with the PNGase is determined by detecting a signal generated via performing an in situ amplification of at least a portion of the nucleic acid coding tag of the binder that binds to the attachment amino acid residue. The signal may be recorded, providing information regarding the polypeptide analyte at a location on a support. A nucleic acid hairpin attached to the polypeptide is attached to the support. A restriction enzyme is used to create a recording tag with a 3' overhang attached to the polypeptide and to the support. The recording tag-polypeptide conjugate is contacted with a binder attached to a circularized coding tag comprising the encoder barcode (Binder ID). Upon binding of the binder to the attachment residue of the polypeptide (which may be a TAA residue), a portion of the recording tag is hybridized to a complementary region within the circularized coding tag. The double-stranded region thus formed is used by the phi29 polymerase to initiate a rolling circle amplification (RCA) reaction to amplify the coding tag, which includes the encoder barcode, followed by detection of amplified copies of the encoder barcode in situ by fluorescently labeled probes. The amplified structure is then released via the restriction enzyme cut, which re-generates the recording tag with the 3' overhang. In some embodiments, the double-stranded region used to initiate the RCA reaction may be generated by introducing a recognition site for a Type IIS restriction enzyme together with a recognition site for a nicking enzyme into the recording tag.
[0032] FIG. 11 depicts an exemplary approach using a plurality of fluorescently labeled probes for detection of barcode sequences within nucleic acid structures amplified in situ by methods described in FIG. 10. Standard methods known in the art may be used to detect fluorescently labeled probes attached to barcode sequences, followed by signal recording.
DETAILED DESCRIPTION
DEFINITIONS
[0033] Unless defined otherwise, all technical and scientific terms used herein have the same meaning as is commonly understood by one of ordinary skill in the art to which the present disclosure belongs. If a definition set forth in this section is contrary to or otherwise inconsistent with a definition set forth in the patents, applications, published applications and other publications that are herein incorporated by reference, the definition set forth in this section prevails over the definition that is incorporated herein by reference.
[0034] As used herein, the singular forms “a,” “an” and “the” include plural referents unless the context clearly dictates otherwise. Thus, for example, reference to “a peptide” includes one or more peptides, or mixtures of peptides. Also, and unless specifically stated or obvious from context, as used herein, the term “or” is understood to be inclusive and covers both “or” and “and”.
[0035] The term “about” as used herein refers to the usual error range for the respective value readily known to the skilled person in this technical field. Reference to “about” a value or parameter herein includes (and describes) embodiments that are directed to that value or parameter per se. For example, description referring to “about X” includes description of “X”.
[0036] As used herein, the term “analyte” refers to a substance whose chemical constituents are being identified and/or measured. In preferred embodiments, “analyte” refers to “polypeptide analyte”, and at least a partial amino acid sequence, an identity, and/or a specific feature (e.g., the presence of N-glycosylation) of the polypeptide analyte is determined by the methods disclosed herein. In some embodiments, one, two or more amino acid residues of a polypeptide analyte are each individually determined with a certain probability, which may be sufficient for determining the identity or a feature of the polypeptide analyte. Polypeptide analytes are substrates of specific binders disclosed herein.
[0037] As used herein, the term “sample” refers to anything which may contain an analyte for which an analyte assay is desired. As used herein, a “sample” can be a solution, a suspension, liquid, powder, a paste, aqueous, non-aqueous or any combination thereof. The sample may be a biological sample, such as a biological fluid or a biological tissue. Examples of biological fluids include urine, blood, plasma, serum, saliva, semen, stool, sputum, cerebral spinal fluid, tears, mucus, amniotic fluid or the like.
[0038] As used herein, the term “macromolecule” encompasses large molecules composed of smaller subunits. Examples of macromolecules include, but are not limited to peptides, polypeptides, proteins, nucleic acids, carbohydrates, lipids, macrocycles, or a combination or complex thereof.
[0039] As used herein, the term “polypeptide” is used interchangeably with the term “peptide” and encompasses peptides and proteins, referring to a molecule comprising a chain of three or more amino acids joined by peptide bonds. In some embodiments, a polypeptide comprises 3 to 50 amino acid residues. In some embodiments, a peptide does not comprise a secondary, tertiary, or higher structure. In some embodiments, the polypeptide is a protein. In some embodiments, a protein comprises 30 or more amino acids, e.g., having more than 50 amino acids. In some embodiments, in addition to a primary structure, a protein comprises a secondary, tertiary, or higher structure. The amino acids of the polypeptides are most typically L-amino acids, but may also be D-amino acids, modified amino acids, amino acid analogs, amino acid mimetics, or any combination thereof. Polypeptides may be naturally occurring, synthetically produced, or recombinantly expressed. Polypeptides may be synthetically produced, isolated, recombinantly expressed, or be produced by a combination of methodologies as described above. Polypeptides may also comprise additional groups modifying the amino acid chain, for example, functional groups added via post-translational modification. The
polymer may be linear or branched, it may comprise modified amino acids, and it may be interrupted by non-amino acids. The term also encompasses an amino acid polymer that has been modified naturally or by intervention; for example, disulfide bond formation, glycosylation, lipidation, acetylation, phosphorylation, or any other manipulation or modification, such as conjugation with a detectable label.
[0040] As used herein, the term “amino acid” refers to an organic compound, which serves as a monomeric subunit of a peptide. An amino acid includes the 20 standard, naturally occurring or canonical amino acids as well as non-standard amino acids. The standard, naturally occurring amino acids include Alanine (A or Ala), Cysteine (C or Cys), Aspartic Acid (D or Asp), Glutamic Acid (E or Glu), Phenylalanine (F or Phe), Glycine (G or Gly), Histidine (H or His), Isoleucine (I or Ile), Lysine (K or Lys), Leucine (L or Leu), Methionine (M or Met), Asparagine (N or Asn), Proline (P or Pro), Glutamine (Q or Gln), Arginine (R or Arg), Serine (S or Ser), Threonine (T or Thr), Valine (V or Val), Tryptophan (W or Trp), and Tyrosine (Y or Tyr). An amino acid may be an L-amino acid or a D-amino acid. Non-standard amino acids may be modified amino acids, amino acid analogs, amino acid mimetics, non-standard proteinogenic amino acids, or non-proteinogenic amino acids that occur naturally or are chemically synthesized.
[0041] As used herein, the term “binder” refers to a nucleic acid molecule, a polypeptide, a protein, or a small molecule that binds to, associates, unites with, recognizes, or combines with a binding target, e.g., a component of a polypeptide. A binder may form a covalent association or non-covalent association with the component of a polypeptide to which it binds. A binder may also be a chimeric binder, composed of two or more types of molecules, such as a nucleic acid molecule-peptide chimeric binder. A binder may be a naturally occurring, synthetically produced, or recombinantly expressed molecule. In preferred embodiments, a binder binds to a single monomer or subunit of a polypeptide, such as a single terminal amino acid of a polypeptide. A binder may bind to an N-terminal amino acid residue or a C-terminal amino acid residue of a polypeptide. A binder may preferably bind to a chemically modified or labeled terminal amino acid residue (e.g., an amino acid that has been labeled or modified by a modifier agent, such as an N-terminal modifier agent) over an unlabeled or non-modified amino acid residue. A binder may exhibit selective binding to a component of a polypeptide (e.g., a binder may selectively bind to one of the 20 possible natural amino acid residues and bind with very low affinity or not at all to the other 19 natural amino acid residues). A binder may exhibit less selective binding, where the binder is capable of binding or configured to bind to a plurality of components of a polypeptide (e.g., a binder may bind with similar affinity to two or more different terminal amino acid residues). A binder may be attached to a detectable label or a coding tag, which may be joined to the binder by a linker.
[0042] The terms “specific binding” and “specificity of binding” are used herein to qualify the relative affinity and selectivity by which an engineered binder binds to a cognate target moiety (e.g., a TAA residue it is engineered to bind). Generally, an engineered binder specifically binds to a particular target moiety (e.g., a TAA or a modified TAA) more readily than it would bind to a random target moiety (e.g., there is a detectable relative increase in the binding of the binder to a specific target moiety or to a group of target moieties (e.g., a group of TAA residues)). In some embodiments of the disclosed methods, binders in the set of binders have medium specificity towards a particular target moiety such that a binder binds more than one target moiety and there is a significant probability of incorrect moiety identification based on a single encoding event. In some embodiments, an engineered binder is at least twice as likely to bind to a cognate target moiety as to a random, non-cognate target moiety (a 2:1 ratio of specific to non-specific binding). Non-specific binding refers to background binding, and is the amount of signal that is produced in a binding assay between an engineered binder and a non-cognate target moiety. In some embodiments, specific binding refers to binding between an engineered binder and a cognate target moiety (e.g., a TAA or a modified TAA) with a dissociation constant (Kd) of 500 nM or less.
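As a purely illustrative sketch, and not a method stated in this disclosure, the relationship between a dissociation constant and a specific to non-specific binding ratio can be estimated under a simple 1:1 equilibrium binding model with the binder in excess. The function names, the model itself, and the example non-cognate Kd of 5000 nM are assumptions introduced only for illustration.

```python
def fraction_bound(binder_nM: float, kd_nM: float) -> float:
    """Equilibrium fraction of target occupied, assuming 1:1 binding and
    free binder concentration approximately equal to total binder."""
    if binder_nM <= 0 or kd_nM <= 0:
        raise ValueError("concentration and Kd must be positive")
    return binder_nM / (binder_nM + kd_nM)


def specific_to_nonspecific_ratio(binder_nM: float,
                                  kd_cognate_nM: float,
                                  kd_noncognate_nM: float) -> float:
    """Ratio of occupancy on a cognate target to occupancy on a
    non-cognate target at the same binder concentration."""
    return (fraction_bound(binder_nM, kd_cognate_nM)
            / fraction_bound(binder_nM, kd_noncognate_nM))


# Hypothetical values: cognate Kd at the 500 nM threshold named in the text,
# and an assumed ten-fold weaker non-cognate Kd.
half_occupancy = fraction_bound(500.0, 500.0)
ratio = specific_to_nonspecific_ratio(100.0, 500.0, 5000.0)
print(half_occupancy, ratio)
```

Under these assumed values the ratio exceeds the 2:1 threshold mentioned above; real binding assays may deviate from this idealized model.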
[0043] As used herein, the term “selectivity” refers to the ability of a binder to preferentially bind to one or to several terminal amino acid residues of a peptide analyte, optionally modified with a chemical modification. In preferred embodiments, “selectivity” describes preferential binding of a binder to a single NTAA or CTAA residue, or to a small group of NTAA or CTAA residues (e.g., structurally related residues). In some embodiments, a binder may exhibit selective binding to a particular terminal amino acid residue. In some embodiments, a binder may exhibit selective binding to a particular class or type of terminal amino acid residues. In some embodiments, a binder may exhibit particular binding kinetics (e.g., higher association rate constant and/or lower dissociation rate constant) to a particular class or type of terminal amino acid residues or modified terminal amino acid residues, compared to other terminal amino acid residues or modified terminal amino acid residues. In some embodiments, selectivity of each binder towards specific NTAA or CTAA residues of peptide analytes is determined in advance, before performing contacting steps of the disclosed methods.
[0044] As used herein, the term “linker” refers to one or more of a nucleotide, a nucleotide analog, an amino acid, a peptide, a polypeptide, a polymer, or a non-nucleotide chemical moiety that is used to join two molecules. A linker may be used to join a binder with a coding tag, a recording tag with a polypeptide, a polypeptide with a support, a recording tag with a solid support, etc. In certain embodiments, a linker joins two molecules via an enzymatic reaction or a chemical reaction (e.g., click chemistry).
[0045] As used herein, the term “barcode” refers to a nucleic acid molecule of about 2 to about 30 bases (e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29 or 30 bases) providing a unique identifier tag or origin information for a polypeptide, a binder, a set of binders from one encoding cycle (when sets are changed between cycles), a sample of polypeptides, a set of samples, or polypeptides within a compartment (e.g., droplet, bead, or separated location). A barcode can be an artificial sequence or a naturally occurring sequence. In certain embodiments, each barcode within a population of barcodes is different. In other embodiments, a portion of barcodes in a population of barcodes is different. A population of barcodes may be randomly generated or non-randomly generated. In certain embodiments, the barcodes in a population are error-correcting or error-tolerant barcodes. Barcodes can be used to computationally deconvolute the multiplexed sequencing data and identify sequence reads derived from an individual polypeptide, sample, library, etc. A barcode can also be used for deconvolution of a collection of polypeptides that have been distributed into small compartments for enhanced mapping. For example, rather than mapping a peptide back to the proteome, the peptide is mapped back to its originating protein molecule or protein complex.
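The computational deconvolution described above can be sketched as follows. This is a minimal illustration, assuming that each read begins with a fixed-length sample barcode and that a single mismatch is tolerated; the barcode sequences, sample names, read layout, and function names are hypothetical and are not taken from this disclosure.

```python
from collections import defaultdict


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def assign_barcode(observed, known, max_mismatches=1):
    """Return the one known barcode within max_mismatches of the observed
    sequence, or None if there is no hit or the hit is ambiguous."""
    hits = [bc for bc in known
            if len(bc) == len(observed)
            and hamming(bc, observed) <= max_mismatches]
    return hits[0] if len(hits) == 1 else None


def demultiplex(reads, known, barcode_len, max_mismatches=1):
    """Group read payloads by the sample their leading barcode maps to."""
    groups = defaultdict(list)
    for read in reads:
        bc = assign_barcode(read[:barcode_len], known, max_mismatches)
        if bc is not None:
            groups[known[bc]].append(read[barcode_len:])
    return dict(groups)


# Hypothetical 6-base sample barcodes and toy reads.
SAMPLE_BARCODES = {"ACGTAC": "sample_1", "TGCATG": "sample_2"}
reads = ["ACGTACGGGTTT", "ACGAACCCCAAA", "TGCATGAAAA", "GGGGGGTTTT"]
groups = demultiplex(reads, SAMPLE_BARCODES, barcode_len=6)
print(groups)
```

The second read carries one barcode error and is still assigned, while the last read matches no barcode closely enough and is discarded. Dedicated error-correcting barcode designs would allow stronger guarantees than this simple distance threshold.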
[0046] As used herein, the term “detectable label” or “identifying detectable label” refers to a substance which can indicate the presence of another substance when associated with it. The detectable label can be a substance that is linked to or incorporated into the substance to be detected. In some embodiments, a detectable label is suitable for allowing for detection and also quantification, for example, a detectable label that emits a detectable and measurable signal. Detectable labels include any labels that can be attached to binders and are compatible with the provided methods and include, but are not limited to, a bioluminescent label, a biotin/avidin label, a chemiluminescent label, a nucleic acid tag comprising a UMI and/or a barcode, a chromophore, a coenzyme, a dye, an electro-active group, an electrochemiluminescent label, an enzymatic label (e.g., alkaline phosphatase, luciferase or horseradish peroxidase), a fluorescent label, a latex particle, a magnetic particle, a metal, a metal chelate, a phosphorescent dye, a protein label, a radioactive element or moiety, and a stable radical. In some embodiments, a detectable label is a nucleic acid tag that comprises a UMI and/or a barcode, and which can be detected via nucleic acid sequencing. In other embodiments, a detectable label does not comprise a nucleic acid tag.
[0047] As used herein, the term “coding tag” or “CT” refers to a polynucleotide or a sequenceable polymer molecule (see, e.g., Niu et al., 2013, Nat. Chem. 5:282-292; Roy et al., 2015, Nat. Commun. 6:7237; Lutz, 2015, Macromolecules 48:4759-4767; each of which is incorporated by reference), having any suitable length, e.g., a nucleic acid molecule of about 2 bases to about 100 bases, including any integer including 2 and 100 and in between, that comprises identifying information for its associated binder. A coding tag may comprise an encoder sequence (e.g., barcode that comprises identifying information regarding the binder), which is optionally flanked by one spacer on one side or optionally flanked by a spacer on each side. A coding tag may also comprise an optional UMI and/or an optional encoding cycle-specific barcode. A coding tag may be single-stranded or double-stranded. A double-stranded coding tag may comprise blunt ends, overhanging ends, or both. A coding tag may refer to the coding tag that is directly attached to a binder, to a complementary sequence hybridized to the coding tag directly attached to a binder (e.g., for double-stranded coding tags), or to coding tag information present in an extended recording tag.
[0048] As used herein, the term “recording tag” or “RT” refers to a nucleic acid molecule, or a sequenceable polymer molecule (see, e.g., Niu et al., 2013, Nat. Chem. 5:282-292; Roy et al., 2015, Nat. Commun. 6:7237; Lutz, 2015, Macromolecules 48:4759-4767; each of which is incorporated by reference) to which identifying information of a coding tag can be transferred, or from which identifying information about the polypeptide (e.g., UMI information) attached to the recording tag can be transferred to the coding tag. Identifying information can comprise any information characterizing a molecule such as information pertaining to identity, partition, spatial location, cycle number, etc. Additionally, the presence of UMI information can also be classified as identifying information. In certain embodiments, after a binder binds to a polypeptide, information from a coding tag attached to the binder can be transferred to the recording tag attached to the polypeptide while the binder is bound to the polypeptide. A recording tag may be directly linked to a polypeptide, linked to a polypeptide via a linker, or attached to a polypeptide by virtue of its proximity (or co-localization) on a support. A recording tag may be linked via its 5’ end or 3’ end or at an internal site, as long as the linkage is compatible with the method used to transfer coding tag information to the recording tag or vice versa. A recording tag may optionally comprise other functional components, e.g., a universal priming site, unique molecular identifier, a barcode (e.g., a sample barcode, a fraction barcode, spatial barcode, a compartment tag, etc.), a spacer sequence that is complementary to a spacer sequence of a coding tag, or any combination thereof. The spacer sequence of a recording tag is optionally at the 3’-end of the recording tag in embodiments where polymerase extension is used to transfer coding tag information to the recording tag.
[0049] As used herein, the term “spacer” (Sp) refers to a nucleic acid molecule of about 1 base to about 20 bases (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 bases) in length that is present on a terminus of a recording tag or coding tag. In some embodiments, a spacer sequence flanks an encoder sequence of a coding tag on one end or both ends. Following binding of a binder to a polypeptide, annealing between complementary spacer sequences on their associated coding tag and recording tag, respectively, allows transfer of binding information through a primer extension reaction and/or ligation between the recording tag and coding tag. Preferably, spacer sequences within a set of binders possess the same number of bases. A common (shared or identical) spacer may be used in a set of binders. A spacer sequence may have a “cycle specific” sequence in order to track binders used in a particular encoding cycle (i.e., the contacting-transferring-releasing steps of the methods disclosed herein form an “encoding cycle”). The spacer sequence (Sp) can be constant across all encoding cycles, be specific for a particular class of polypeptides, or be encoding cycle number specific. In some embodiments, only the sequential binding of correct cognate pairs of RT and CT results in interacting spacer elements and effective primer extension. A spacer sequence may comprise a sufficient number of bases to anneal to a complementary spacer sequence in a recording tag to initiate a primer extension (also referred to as polymerase extension) reaction, or provide a “splint” for a ligation reaction, or mediate a “sticky end” ligation reaction.
[0050] As used herein, the term “primer extension”, also referred to as “polymerase extension” and “extension”, refers to a reaction catalyzed by a nucleic acid polymerase (e.g., DNA polymerase) whereby a nucleic acid molecule (e.g., oligonucleotide primer, spacer sequence) that anneals to a complementary strand is extended by the nucleic acid polymerase, using the complementary strand as template. Various polymerases capable of performing the extension are known in the art.
[0051] As used herein, the term “unique molecular identifier” or “UMI” refers to a nucleic acid molecule of about 3 to about 40 bases (3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, or 40 bases) in length providing a unique identifier tag for each polypeptide or binder to which the UMI is linked. A polypeptide UMI can be used to computationally deconvolute sequencing data from a plurality of extended recording tags to identify extended recording tags that originated from an individual polypeptide. A polypeptide UMI can be used to accurately count originating polypeptide molecules by collapsing NGS reads to unique UMIs. A binder UMI can be used to identify each individual molecular binder that binds to a particular polypeptide. For example, a UMI can be used to identify the number of individual binding events for a binder specific for a single amino acid that occurs for a particular peptide molecule. It is understood that when a UMI and a barcode are both referenced in the context of a binder or polypeptide, the barcode refers to identifying information other than the UMI for the individual binder or polypeptide (e.g., sample barcode, compartment barcode, encoding cycle barcode).
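Collapsing reads to unique UMIs, as described above, can be sketched in a few lines. This is an illustrative example only: the read representation as (polypeptide UMI, binder barcode) pairs, the toy sequences, and the function name are assumptions, and no UMI error correction is attempted.

```python
from collections import defaultdict


def count_molecules_per_binder(reads):
    """Collapse reads to unique polypeptide UMIs and count, for each binder
    barcode, how many distinct originating molecules were observed."""
    umis_per_binder = defaultdict(set)
    for umi, binder_barcode in reads:
        umis_per_binder[binder_barcode].add(umi)
    return {binder: len(umis) for binder, umis in umis_per_binder.items()}


# Hypothetical reads: duplicates of the same UMI typically arise from
# amplification and should count as a single molecule.
reads = [
    ("AAAC", "binderA"),
    ("AAAC", "binderA"),
    ("GGTT", "binderA"),
    ("CCCA", "binderB"),
    ("GGTT", "binderA"),
]
molecule_counts = count_molecules_per_binder(reads)
total_unique_umis = len({umi for umi, _ in reads})
print(molecule_counts, total_unique_umis)
```

Five reads collapse to three unique UMIs here, so counting reads directly would overestimate the number of originating polypeptide molecules.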
[0052] As used herein, the term “universal priming site” or “universal primer” or “universal priming sequence” refers to a nucleic acid molecule, which may be used for analysis, amplification, and/or for sequencing of extended recording tags. A universal priming site may include, but is not limited to, a priming site (primer sequence) for PCR amplification, flow cell adaptor sequences that anneal to complementary oligonucleotides on flow cell surfaces enabling bridge amplification in some next generation sequencing platforms, a sequencing priming site, or a combination thereof. In some embodiments, extended recording tag molecules may be circularized and a universal priming site used for rolling circle amplification to form DNA nanoballs that can be used as sequencing templates (Drmanac et al., 2009, Science 327:78-81). Alternatively, recording tag molecules may be circularized and sequenced directly by polymerase extension from universal priming sites (Korlach et al., 2008, Proc. Natl. Acad. Sci. 105:1176-1181).
[0053] As used herein, the term “extended recording tag” refers to a recording tag to which information of at least one binder’s coding tag (or its complementary sequence) has been transferred following binding of the binder to a polypeptide. Information between the coding tag and the recording tag may be transferred directly (e.g., ligation) or indirectly (e.g., primer extension). Information between the coding tag and the recording tag may be transferred enzymatically or chemically (e.g., by chemical ligation). An extended recording tag may comprise binder information of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 40, 50 or more coding tags. The base sequence of an extended recording tag may reflect the temporal and sequential order of binding of the binders identified by their coding tags, may reflect a partial sequential order of binding of the binders identified by the coding tags, or may not reflect any order of binding of the binders identified by the coding tags.
[0054] As used herein, the term “solid support” or “support” refers to any solid material, including porous and non-porous materials, to which a polypeptide can be associated directly or indirectly, by any means known in the art, including covalent and non-covalent interactions, or any combination thereof. A solid support may be two-dimensional (e.g., planar surface) or three-dimensional (e.g., gel matrix or bead). A solid support can be any support surface including, but not limited to, a bead, a microbead, an array, a glass surface, a silicon surface, a plastic surface, a filter, a membrane, a PTFE membrane, a nitrocellulose-based polymer surface, a silicon wafer chip, a flow through chip, a flow cell, a biochip including signal transducing electronics, a channel, a microtiter well, an ELISA plate, a spinning interferometry disc, a nitrocellulose membrane, a polymer matrix, a nanoparticle, a microparticle, or a microsphere. Materials for a solid support include but are not limited to acrylamide, agarose, cellulose, dextran, nitrocellulose, glass, gold, quartz, polystyrene, polyethylene vinyl acetate, polypropylene, polyester, polymethacrylate, polyacrylate, polyethylene, polyethylene oxide, polysilicates, polycarbonates, polyvinyl alcohol (PVA), Teflon, fluorocarbons, nylon, silicone rubber, polyanhydrides, polyglycolic acid, polyvinylchloride, polylactic acid, polyorthoesters, functionalized silane, polypropylfumarate, collagen, glycosaminoglycans, polyamino acids, or any combination thereof. For example, when the solid support is a bead, such as a microbead, the bead can include, but is not limited to, a ceramic bead, a polystyrene bead, a polymer bead, a polyacrylate bead, a methylstyrene bead, an agarose bead, a cellulose bead, a dextran bead, an acrylamide bead, a solid core bead, a porous bead, a paramagnetic bead, a glass bead, a controlled pore bead, a silica-based bead, or any combinations thereof. A bead may be spherical or irregularly shaped. A bead’s size may range from nanometers, e.g., 10 nm, to millimeters, e.g., 1 mm. In certain embodiments, beads range in size from about 0.2 micron to about 200 microns, or from about 0.5 micron to about 5 microns. In certain embodiments, “a bead” solid support may refer to an individual bead or a plurality of beads. In certain embodiments, the nanoparticles range in size from about 10 nm to about 500 nm in diameter.
[0055] As used herein, the term “nucleic acid” or “polynucleotide” refers to a single- or double-stranded polynucleotide containing deoxyribonucleotides or ribonucleotides that are linked by 3’-5’ phosphodiester bonds, as well as polynucleotide analogs. A nucleic acid molecule includes, but is not limited to, DNA, RNA, and cDNA. A polynucleotide analog may possess a backbone other than a standard phosphodiester linkage found in natural polynucleotides and, optionally, a modified sugar moiety or moieties other than ribose or deoxyribose. Polynucleotide analogs contain bases capable of hydrogen bonding by Watson-Crick base pairing to standard polynucleotide bases, where the analog backbone presents the bases in a manner to permit such hydrogen bonding in a sequence-specific fashion between the oligonucleotide analog molecule and bases in a standard polynucleotide. Examples of polynucleotide analogs include, but are not limited to, xeno nucleic acid (XNA), bridged nucleic acid (BNA), glycol nucleic acid (GNA), peptide nucleic acids (PNAs), γPNAs, morpholino polynucleotides, locked nucleic acids (LNAs), threose nucleic acid (TNA), 2’-O-methyl polynucleotides, 2’-O-alkyl ribosyl substituted polynucleotides, phosphorothioate polynucleotides, and boronophosphate polynucleotides. A polynucleotide analog may possess purine or pyrimidine analogs, including for example, 7-deaza purine analogs, 8-halopurine analogs, 5-halopyrimidine analogs, or universal base analogs that can pair with any base, including hypoxanthine, nitroazoles, isocarbostyril analogues, azole carboxamides, and aromatic triazole analogues, or base analogs with additional functionality, such as a biotin moiety for affinity binding. In some embodiments, the nucleic acid molecule or oligonucleotide is a modified oligonucleotide, e.g., one that contains a modified nucleotide. In some embodiments, the nucleic acid molecule or oligonucleotide is a DNA with pseudo-complementary bases. In some embodiments, the nucleic acid molecule or oligonucleotide is backbone modified, sugar modified, or nucleobase modified.
[0056] As used herein, “nucleic acid sequencing” means the determination of the order of nucleotides in a nucleic acid molecule or a sample of nucleic acid molecules, and refers to any possible sequencing method from a variety of sequencing methods known in the art. Examples of sequencing methods include, without limitation, next generation sequencing, sequencing by synthesis, sequencing by ligation, sequencing by hybridization, polony sequencing, ion semiconductor sequencing, nanopore sequencing, single molecule sequencing and pyrosequencing. Further examples of nucleic acid sequencing technologies include platforms provided by Illumina, BGI, Qiagen, Thermo-Fisher, and Roche, including formats such as parallel bead arrays, sequencing by synthesis, sequencing by ligation, capillary electrophoresis, electronic microchips, “biochips,” microarrays, parallel microchips, and single-molecule arrays (see, e.g., Service, Science (2006) 311:1544-1546). Some sequencing methods rely on amplification to clone many nucleic acid (e.g., DNA) molecules in parallel for sequencing in a phased approach. Single molecule sequencing interrogates single molecules of DNA and does not require amplification or synchronization. Examples of single molecule sequencing methods include single molecule real-time sequencing (Pacific Biosciences), nanopore-based sequencing (Oxford Nanopore), duplex interrupted nanopore sequencing, and direct imaging of DNA using advanced microscopy.
[0057] As used herein, “analyzing” the polypeptide means to identify, detect, quantify, characterize, distinguish, or a combination thereof, all or a portion of the components of the polypeptide. For example, analyzing a peptide, polypeptide, or protein includes determining all or a portion of the amino acid sequence (contiguous or non-contiguous) of the peptide. Analyzing a polypeptide also includes partial identification of a component of the polypeptide (e.g., one or more terminal residues of the polypeptide). For example, partial identification of an amino acid residue in the polypeptide sequence can identify an amino acid in the polypeptide as belonging to a subset of possible amino acid residues. In preferred embodiments, polypeptide analysis typically begins with analysis of the n NTAA, and then proceeds to the next amino acid residue of the polypeptide (i.e., n-1, n-2, n-3, and so forth). This is accomplished by removal of the n NTAA, thereby converting the n-1 amino acid residue of the polypeptide to an N-terminal amino acid (referred to herein as the “n-1 NTAA”). Analyzing the polypeptide may also include determining the presence and frequency of a post-translational modification on the peptide, such as glycosylation. Analyzing the polypeptide may include combining different types of analysis, for example obtaining epitope information, amino acid sequence information, post-translational modification information, or any combination thereof.
[0058] The term “unmodified” (also “wild-type” or “native”), as used herein in connection with biological materials such as nucleic acid molecules and polypeptides, refers to those which are found in nature and not modified by human intervention.
[0059] The term “modified”, or “engineered”, or “variant” as used in reference to nucleic acid molecules and polypeptide molecules, e.g., binders, implies that such molecules are created by human intervention and/or they are non-naturally occurring. For example, an engineered binder is a polypeptide having an altered amino acid sequence, relative to an unmodified or wild-type polypeptide, such as a starting scaffold, or a portion thereof. An engineered binder is a polypeptide which differs from a wild-type scaffold sequence, or a portion thereof, by one or more amino acid substitutions, deletions, additions, or combinations thereof. The sequence of a binder can contain 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 50, 100 or more amino acid differences (e.g., mutations) compared to the sequence of the starting scaffold. A binder generally exhibits at least 30%, 40%, 50%, 60%, 70%, 80%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more sequence identity to a corresponding wild-type starting scaffold. Non-naturally occurring amino acids as well as naturally occurring amino acids are included within the scope of permissible substitutions or additions. A binder is not limited to any binders made or generated by a particular method of making and includes, for example, a binder made or generated by genetic selection, polypeptide engineering, chemical synthesis, directed evolution, de novo recombinant DNA techniques, or combinations thereof.
[0060] In some embodiments, variants of a binder displaying only non-substantial or negligible differences in structure can be generated by making conservative amino acid substitutions in the binder. By doing this, binder variants that comprise a sequence having at least 80% (e.g., 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99%) sequence identity with the binder sequences can be generated, retaining at least one functional activity of the binder, e.g., the ability to specifically bind an N-terminal amino acid (NTAA) residue of the polypeptide analyte. Examples of conservative amino acid changes are known in the art. Examples of non-conservative amino acid changes that are likely to cause major changes in polypeptide structure are those that cause substitution of (a) a hydrophilic residue, e.g., serine or threonine, for (or by) a hydrophobic residue, e.g., leucine, isoleucine, phenylalanine, valine or alanine; (b) a cysteine or proline for (or by) any other residue; (c) a residue having an electropositive side chain, e.g., lysine, arginine, or histidine, for (or by) an electronegative residue, e.g., glutamic acid or aspartic acid; or (d) a residue having a bulky side chain, e.g., phenylalanine, for (or by) one not having a side chain, e.g., glycine. Methods of making targeted amino acid substitutions, deletions, truncations, and insertions are generally known in the art. For example, amino acid sequence variants can be prepared by mutations in the DNA. Methods for polynucleotide alterations are well known in the art, for example, Kunkel et al. (1987) Methods in Enzymol. 154:367-382; U.S. Pat. No. 4,873,192 and the references cited therein.
[0061] As used herein, “identifying” a peptide means to predict the identity of the peptide with a certain probability. It can be done by identifying a component (e.g., one or more amino acid residues) of the peptide. It can also be done by predicting certain amino acid residues of the peptide and their positions with a certain probability, thus creating a peptide signature, and then bioinformatically matching the resulting peptide signature with corresponding signatures of peptides that may be present in the sample (e.g., by matching the peptide signature with peptide sequences from a proteomic or genomic database). For example, in some embodiments, the existing selectivity of a binder is not enough to determine with certainty the NTAA residue to which the binder is bound. In these cases, the identity of the NTAA residue can be determined with a certain probability (such as being D, E or H and not A, G, I or L). Subsequent similar determination of adjacent amino acid residues creates an array of possible variants for the peptide based on variants in the assayed amino acid residues, and by matching this array of variants with theoretical possibilities determined from a proteomic or genomic database, it can be narrowed down to a particular sequence, if enough amino acid residues were assayed.
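The narrowing-down of candidates described above can be illustrated with a small sketch. It assumes, for simplicity, that each encoding cycle yields a set of allowed residues for one position counted from the N-terminus, rather than a full probability distribution; the peptide names, sequences, and function names are hypothetical.

```python
def matches(signature, sequence):
    """True if every assayed position of the sequence falls within the
    residue set allowed by the signature at that position."""
    if len(sequence) < len(signature):
        return False
    return all(res in allowed for res, allowed in zip(sequence, signature))


def candidates(signature, database):
    """Names of database peptides consistent with the signature."""
    return [name for name, seq in database.items() if matches(signature, seq)]


# Hypothetical database of peptide sequences (N-terminus first).
database = {"pep1": "DKLAG", "pep2": "EALGG", "pep3": "AKLAG", "pep4": "HKIAG"}

# Hypothetical signature: position 1 is D, E or H; position 2 is K or R;
# position 3 is L or M.
signature = [set("DEH"), set("KR"), set("LM")]

after_one = candidates(signature[:1], database)
after_two = candidates(signature[:2], database)
after_three = candidates(signature, database)
print(after_one, after_two, after_three)
```

Each additional assayed position removes candidates, so a binder of limited selectivity can still support identification once enough residues have been assayed. A probabilistic scoring scheme would be a natural extension of this set-based sketch.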
[0062] The term “sequence identity” is a measure of identity between polypeptides at the amino acid level, and a measure of identity between nucleic acids at nucleotide level. The polypeptide sequence identity may be determined by comparing the amino acid sequence in a given position in each sequence when the sequences are aligned. Similarly, the nucleic acid sequence identity may be determined by comparing the nucleotide sequence in a given position in each sequence when the sequences are aligned. "Sequence identity" means the percentage of
identical subunits at corresponding positions in two sequences when the two sequences are aligned to maximize subunit matching, i.e., taking into account gaps and insertions. For example, the BLAST algorithm (NCBI) calculates percent sequence identity and performs a statistical analysis of the similarity and identity between the two sequences. The software for performing BLAST analysis is publicly available through the National Center for Biotechnology Information (NCBI) website.
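As a non-limiting sketch of the definition above, percent identity can be computed from two sequences that have already been aligned (for example, by BLAST). The function name and the choice of denominator (the number of alignment columns, gaps included) are assumptions made for illustration; alignment tools may report identity over a different denominator.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of alignment columns holding identical, non-gap subunits.

    Both inputs must be pre-aligned strings of equal length in which
    gaps are written as `gap`. Columns that are gaps in both sequences
    are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b) for a, b in zip(aligned_a, aligned_b)
        if not (a == gap and b == gap)
    ]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


# Hypothetical aligned pair: one mismatch (T/S) and one gap column.
identity = percent_identity("NGTK-A", "NGSKLA")
```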
[0063] The terms “corresponding to position(s)” or “position(s) ... with reference to position(s)” of or within a polypeptide or a polynucleotide, such as recitation that nucleotides or amino acid positions “correspond to” nucleotides or amino acid positions of a disclosed sequence, such as a sequence set forth in the Sequence Listing, refer to nucleotides or amino acid positions identified in the polynucleotide or in the polypeptide upon alignment with the disclosed sequence using a standard alignment algorithm, such as the BLAST algorithm (NCBI). One skilled in the art can identify any given amino acid residue in a given polypeptide at a position corresponding to a particular position of a reference sequence, such as set forth in the Sequence Listing, by performing alignment of the polypeptide sequence with the reference sequence (for example, by using BLASTP publicly available through the NCBI website), matching the corresponding position of the reference sequence with the position in the polypeptide sequence and thus identifying the amino acid residue within the polypeptide.
[0064] The term “joining” or “attaching” one substance to another substance means connecting or linking these substances together utilizing one or more covalent bond(s) and/or non-covalent interactions. Some examples of non-covalent interactions include hydrogen bonding, hydrophobic binding, and Van der Waals forces. Joining can be direct or indirect, such as via a linker or via another moiety. In preferred embodiments, joining two or more substances together would not impair structure or functional activities of the joined substances. Attachment can be direct or indirect. In some embodiments, indirect attachment means include attachment via a linker (e.g., flexible linker), or attachment via a solid support (i.e., when two moieties to be attached are independently coupled to the solid support). Recording tags can be attached to polypeptides pre- or post-immobilization to the solid support. For example, polypeptides can be first labeled with recording tags and then immobilized to a solid surface via a recording tag comprising two functional moieties for coupling. One functional moiety of the recording tag couples to the polypeptide, and the other functional moiety immobilizes the recording tag-labeled polypeptide to a solid support. Alternatively, polypeptides are immobilized to a solid support prior to labeling with recording tags. For example, polypeptides can first be derivatized with reactive groups such as click chemistry moieties. The activated polypeptide molecules can
then be attached to a suitable solid support and then labeled with recording tags using the complementary click chemistry moiety. As an example, polypeptides derivatized with alkyne and mTet moieties may be immobilized to a flow cell derivatized with azide and transcyclooctene (TCO) and attached to recording tags labeled with azide and TCO. Other click chemistry reactions may also be utilized. It is understood that the methods provided herein for attaching peptides to the solid support may also be used to attach recording tags to the solid support or attach recording tags to peptides.
[0065] Exemplary suitable reactions that can be used in the described attachment approaches include, without limitation, the copper catalyzed reaction of an azide and alkyne to form a triazole (Huisgen 1,3-dipolar cycloaddition), strain-promoted azide alkyne cycloaddition (SPAAC), reaction of a diene and dienophile (Diels-Alder), strain-promoted alkyne-nitrone cycloaddition, reaction of a strained alkene with an azide, tetrazine or tetrazole, alkene and azide [3+2] cycloaddition, alkene and tetrazine inverse electron demand Diels-Alder (IEDDA) reaction (e.g., m-tetrazine (mTet) or phenyl tetrazine (pTet) and trans-cyclooctene (TCO); or pTet and an alkene), alkene and tetrazole photoreaction, Staudinger ligation of azides and phosphines, and various displacement reactions, such as displacement of a leaving group by nucleophilic attack on an electrophilic atom (Horisawa 2014, Knall, Hollauf et al. 2014). In some embodiments, m-tetrazine or phenyl tetrazine (pTet) is used in an IEDDA click chemistry reaction. In one case, a target polypeptide is labeled with a bifunctional click chemistry reagent, such as alkyne-NHS ester (acetylene-PEG-NHS ester) reagent or alkyne-benzophenone to generate an alkyne-labeled polypeptide. In some embodiments, an alkyne can also be a strained alkyne, such as cyclooctynes including dibenzocyclooctyl (DBCO).
[0066] As used herein, the term “macromolecule comprises a moiety” refers to a situation where the moiety is either a part of the macromolecule, or directly attached to the macromolecule by means of one or more covalent bond(s), which unite them into a single molecule. In contrast, the term “macromolecule associated with a moiety” indicates that the moiety may or may not be directly attached to the macromolecule by means of one or more covalent bond(s), and can be associated with the macromolecule by means of non-covalent interactions. For example, “macromolecule is associated with a recording tag” encompasses various possible ways for association between the macromolecule and the recording tag (either direct, covalent or non-covalent association, or indirect association, such as association via a linker or via another object, such as via solid support).
[0067] The term “peptide bond” as used herein refers to a chemical bond formed between two molecules (such as two amino acids) when the carboxyl group of one molecule reacts with the amino group of the other molecule, releasing a water molecule (H2O).
[0068] As used herein, the term “PNGase” refers to peptide-N4-(N-acetyl-β-glucosaminyl)asparagine amidase, which is a type of deglycosylating enzyme that hydrolyzes N-linked (i.e., asparagine residue-linked) glycan moieties from glycopolypeptides and yields polypeptides containing aspartic acid residues in place of the original asparagine residues. PNGase may be a natural enzyme, such as enzymes that belong to enzyme classification number EC 3.5.1.52. Non-limiting examples of PNGase include PNGase F (such as derived from Elizabethkingia miricola with protein sequence found at UniProt ID: P21163), variants of PNGase F (e.g., PNGase F comprising one or more amino acid substitutions selected from D100N, E158Q, E246Q, or a combination thereof), PNGase A, PNGase H+, and variants thereof (see, e.g., Guo RR, et al., PNGase H+ variant from Rudaea cellulosilytica with improved deglycosylation efficiency for rapid analysis of eukaryotic N-glycans and hydrogen deuterium exchange mass spectrometry analysis of glycoproteins. Rapid Commun Mass Spectrom. 2022 Nov 15; 36(21):e9376 and references therein). The methods of the present disclosure can also be performed using a functional fragment of a natural PNGase enzyme, wherein the fragment is configured to release N-glycans from asparagine residues. In some embodiments, an engineered enzyme having similar functionality as a natural PNGase enzyme (i.e., hydrolyzes N-linked glycan moieties from glycopolypeptides and yields polypeptides containing aspartic acid residues in place of the original asparagine residues) is used, such as, for example, an engineered PNGase having one or more conservative amino acid substitutions in the amino acid sequence of a naturally occurring PNGase. In some embodiments, an engineered PNGase used in the disclosed methods comprises an amino acid sequence having at least 30%, 40%, 50%, 60%, 70%, 80%, or 90% sequence identity to any one of SEQ ID NOs: 34-36.
[0069] As used herein, PNGase deglycosylates polypeptide analytes by removing N-glycans from a glycan-containing polypeptide. A single PNGase or mixtures of PNGase enzymes may be used to deglycosylate a polypeptide. Cleaving “substantially all” glycans results in a completely deglycosylated protein where “complete deglycosylation” refers to >70%, >80%, >90%, >98% or >99% deglycosylation by a PNGase as determined by SDS-PAGE or by mass spectrometry. In preferred embodiments, PNGase treatment results in “complete deglycosylation”, that is at least 70% of N-glycans are removed from N-linked glycosylation sites of polypeptide analytes.
[0070] As used herein, “N-linked glycosylation site” refers to a sequence motif of a polypeptide comprising an asparagine residue to which a N-linked glycan moiety may be
attached, and which may be converted to an aspartic acid residue following PNGase treatment. Examples of N-linked glycosylation sites within polypeptides comprise AsnXxxSer, AsnXxxThr or AsnXxxCys, wherein Xxx refers to any amino acid except proline. As used herein, “glycan attachment residue” of a N-linked glycosylation site refers to either the asparagine residue to which an N-linked glycan attaches, or to the aspartic acid residue to which the asparagine residue is converted following PNGase treatment. Amino acid identity of the glycan attachment residue may be determined by specific binders as disclosed herein.
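By way of non-limiting illustration, candidate N-linked glycosylation sites matching the AsnXxxSer, AsnXxxThr or AsnXxxCys motifs (Xxx other than proline) can be located in a one-letter amino acid sequence with a regular expression. The function name and the example sequence are hypothetical; the sketch finds motif occurrences only and says nothing about whether a glycan is actually attached.

```python
import re

# Asn, then any residue except Pro, then Ser/Thr/Cys. The lookahead keeps
# the match to the Asn alone so that overlapping motifs are all reported.
SEQUON = re.compile(r"N(?=[^P][STC])")


def find_sequons(sequence):
    """Return 1-based positions of Asn residues within N-X-S/T/C motifs."""
    return [m.start() + 1 for m in SEQUON.finditer(sequence.upper())]


# Hypothetical sequence: N2 (NGT) and N9 (NKC) match; N6 (NPS) does not.
positions = find_sequons("MNGTANPSNKCQ")
```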
[0071] Throughout this disclosure, various aspects of this invention are presented in a range format. It should be understood that the description in range format is merely for convenience and brevity and should not be construed as an inflexible limitation on the scope of the invention. Accordingly, the description of a range should be considered to have specifically disclosed all the possible sub-ranges as well as individual numerical values within that range. For example, description of a range such as from 1 to 6 should be considered to have specifically disclosed sub-ranges such as from 1 to 3, from 1 to 4, from 1 to 5, from 2 to 4, from 2 to 6, from 3 to 6 etc., as well as individual numbers within that range, for example, 1, 2, 3, 4, 5, and 6. This applies regardless of the breadth of the range.
[0072] Use of ordinal terms such as “first”, “second”, “third”, etc., in the claims to modify a claim element does not by itself connote any priority, precedence, or order of one claim element over another or the temporal order in which acts of a method are performed, but are used merely as labels to distinguish one claim element having a certain name from another element having a same name (but for use of the ordinal term) to distinguish the claim elements. Similarly, use of a), b), etc., or i), ii), etc. does not by itself connote any priority, precedence, or order of steps in the claims. Similarly, the use of these terms in the specification does not by itself connote any required priority, precedence, or order.
[0073] Other objects, advantages and features of the present invention will become apparent from the following specification taken in conjunction with the accompanying drawings.
DESCRIPTION OF EXEMPLARY EMBODIMENTS
[0074] In this disclosure, a high throughput approach for detecting the presence of a glycan in a particular N-glycosylation site in proteins or peptides is disclosed. This approach can be combined with high throughput peptide identification, so that N-glycosylation sites are analyzed without referring to a specific peptide with a known sequence, but rather in all (or most) peptides that are present in a particular sample, or even in multiple samples.
[0075] Exemplary high throughput approaches that can be used for both peptide identification and the analysis of N-glycosylation sites are shown in FIG. 1-FIG. 3. The
exemplary assays described in FIG. 1-FIG. 3 are referred to as next generation peptide sequencing (NGPS) assays, because each of them allows processing of thousands, millions or more peptide molecules in parallel.
[0076] FIG. 1 depicts an exemplary peptide sequencing assay with terminal amino acid (TAA)-specific binders. (1) Peptide molecules are each attached to a nucleic acid (e.g., a DNA) recording tag (RT) and attached to beads at a low peptide/RT pair (e.g., a peptide/RT conjugate) density, a sparsity that permits only intra-peptide/RT pair information transfer to occur. For instance, the peptide and its RT can form a conjugate, and in cases where the peptide is covalently attached to the RT to form a single molecule, the conjugate sparsity on a bead permits only intramolecular information transfer, and not information transfer between adjacent conjugates on the bead. The peptide terminal amino acid (TAA) residues are labeled with a terminal modification (TM) to provide greater affinity to binders. (2) Next, immobilized and labeled peptides are contacted with a set of binders each specific for labeled TAA residue(s) (e.g., labeled F-specific binder is shown). Each binder comprises a nucleic acid (e.g., DNA) coding tag (CT) that comprises a barcode with identifying information regarding the binding moiety of the binder. After binding and washing, the coding tag’s barcode is transferred enzymatically (via extension and/or ligation, such as primer extension followed by ligation) to the recording tag, generating an extended RT. (3) The labeled TAA is removed, e.g., by using mild Edman-like elimination chemistry or by a Cleavase enzyme. The cycle 1-2-3 is repeated n times. After n cycles, the extended RT containing barcodes that represent the n amino acid residues of the peptide is formed and can be sequenced by NGS. A representative structure of the extended RT after 7 cycles is shown.
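As a non-limiting sketch of how the barcode portion of an extended recording tag might be read out after NGS, the snippet below maps consecutive fixed-length barcodes to the residues recognized by the corresponding binders. The barcode sequences, the 4-nucleotide barcode length, and the absence of spacers or primer sites are all assumptions made for illustration and do not reflect the actual tag design.

```python
# Hypothetical barcode-to-residue table (one barcode per binder).
BARCODE_TO_RESIDUE = {"ACGT": "F", "TTAG": "N", "GGCA": "D", "CATC": "L"}
BARCODE_LEN = 4  # assumed fixed barcode length


def decode_extended_tag(barcode_region):
    """Translate concatenated per-cycle barcodes into a residue string.

    barcode_region: the part of the extended recording tag read that holds
    the barcodes, in cycle order. Unrecognized barcodes are reported as "X".
    """
    if len(barcode_region) % BARCODE_LEN != 0:
        raise ValueError("barcode region is not a whole number of barcodes")
    residues = []
    for start in range(0, len(barcode_region), BARCODE_LEN):
        barcode = barcode_region[start:start + BARCODE_LEN]
        residues.append(BARCODE_TO_RESIDUE.get(barcode, "X"))
    return "".join(residues)


# Three encoding cycles: F, then D, then L.
decoded = decode_extended_tag("ACGTGGCACATC")
```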
[0077] FIG. 2 depicts an exemplary variation of the peptide sequencing assay with terminal amino acid (TAA)-specific binders as described in FIG. 1. The variation is in use of a set of binders each specific for non-labeled TAA residue(s) (e.g., F-specific binder is shown). Labeling of the TAA occurs after the barcode with identifying information regarding the binding moiety is transferred from the coding tag attached to the binder that was bound to the TAA to the recording tag attached to the peptide. More details regarding the NGPS approaches shown in FIG. 1-FIG. 2 are described in the US patent applications US 2023/0054691 A1, US 11513126 B2, US 2022/0227889 A1 and US 2022/0283175 A1, each incorporated by reference herein.
[0078] FIG. 3 depicts another exemplary peptide sequencing assay with terminal amino acid (TAA)-specific binders. (1) Peptide molecules are each attached to a nucleic acid (e.g., a DNA) recording tag (RT) and attached to beads at a low peptide/RT pair (e.g., a peptide/RT conjugate) density, a sparsity that permits only intra-peptide/RT pair information transfer to occur. For
instance, the peptide and its RT can form a conjugate, and in cases where the peptide is covalently attached to the RT to form a single molecule, the conjugate sparsity on a bead permits only intramolecular information transfer, and not information transfer between adjacent conjugates on the bead. More details regarding the NGPS approaches shown in FIG. 3 are described in the US patent applications US 20200209255 A1, US 20210139973 A1 and US 20210364527 A1, each incorporated by reference herein.
[0079] Utilizing a high throughput NGPS approach that involves use of terminal amino acid-specific binders, such as any one of the approaches shown in FIG. 1-FIG. 2, allows combining peptide identification with the analysis of N-glycosylation sites, by an exemplary method shown below.
[0080] In some embodiments, disclosed herein is a method for assessing an attachment of an N-linked glycan to a glycan attachment residue of a N-linked glycosylation site of a polypeptide, the method comprising:
(a) contacting the polypeptide with a PNGase, wherein the PNGase cleaves the N-linked glycan from the glycan attachment residue of the N-linked glycosylation site, and wherein the polypeptide is immobilized on a solid support before or after the contacting with the PNGase, thereby providing the immobilized polypeptide treated with the PNGase;
(b) contacting the immobilized polypeptide treated with the PNGase with a set of binders, wherein each binder of the set of binders is attached to a detectable label that comprises identifying information regarding the binder, and the set of binders comprises: a first binder that preferentially binds to an Asn (N) residue over an Asp (D) residue as the glycan attachment residue of the N-linked glycosylation site; and a second binder that preferentially binds to the Asp (D) residue over the Asn (N) residue as the glycan attachment residue of the N-linked glycosylation site, wherein the first binder or the second binder of the set of binders binds to the glycan attachment residue of the N-linked glycosylation site of the immobilized polypeptide treated with the PNGase; and
(c) determining amino acid identity of the glycan attachment residue of the N-linked glycosylation site treated with the PNGase by obtaining or retaining identifying information regarding the first binder or the second binder that binds to the glycan attachment residue, thereby determining the attachment of the N-linked glycan to the glycan attachment residue of the N-linked glycosylation site.
[0081] In some embodiments, disclosed herein is also a method for assessing an attachment of an N-linked glycan to a glycan attachment residue of a N-linked glycosylation site of a polypeptide, the method comprising:
(a) contacting the polypeptide with a PNGase, wherein the PNGase cleaves the N-linked glycan from the glycan attachment residue of the N-linked glycosylation site, and wherein the polypeptide is immobilized on a solid support before or after the contacting with the PNGase, thereby providing the immobilized polypeptide treated with the PNGase;
(b) cleaving the polypeptide to generate a cleaved polypeptide comprising the glycan attachment residue of the N-linked glycosylation site as a terminal amino acid (TAA) residue of the cleaved polypeptide;
(c) contacting the immobilized and cleaved polypeptide with a set of binders, wherein each binder of the set of binders is attached to a detectable label that comprises identifying information regarding the binder, and the set of binders comprises: a first binder that preferentially binds to an Asn (N) residue over an Asp (D) residue as the TAA residue; and a second binder that preferentially binds to the Asp (D) residue over the Asn (N) residue as the TAA residue, wherein the first binder or the second binder of the set of binders binds to the TAA residue of the immobilized and cleaved polypeptide treated with the PNGase; and
(d) determining amino acid identity of the glycan attachment residue of the N-linked glycosylation site treated with the PNGase by obtaining or retaining identifying information regarding the first binder or the second binder that binds to the TAA residue of the immobilized and cleaved polypeptide, thereby determining the attachment of the N-linked glycan to the glycan attachment residue of the N-linked glycosylation site.
[0082] In some embodiments, disclosed herein is also a method for analyzing a polypeptide, the method comprising:
(a) contacting the polypeptide with a PNGase, wherein the PNGase cleaves an N-linked glycan from a glycan attachment residue of a N-linked glycosylation site of the polypeptide, and wherein the polypeptide is immobilized on a solid support before or after the contacting with the PNGase, thereby providing the immobilized polypeptide treated with the PNGase;
(b) contacting the immobilized polypeptide treated with the PNGase with a set of binders, wherein each binder of the set of binders is attached to a detectable label that comprises identifying information regarding the binder, and the set of binders comprises: a first binder that binds to a terminal Asn (N) residue; a second binder that binds to a terminal Asp (D) residue; and a third binder that binds to a terminal amino acid (TAA) residue other than Asn (N) and Asp (D),
wherein the first binder, the second binder, or the third binder binds to the TAA residue of the immobilized polypeptide;
(c) cleaving the immobilized polypeptide to expose a new TAA residue;
(d) contacting the immobilized and cleaved polypeptide with the set of binders, wherein the first binder, the second binder, or the third binder binds to the new TAA residue of the immobilized and cleaved polypeptide;
(e) optionally repeating (c) - (d) in one or more cycles;
(f) determining an amino acid sequence of the polypeptide by obtaining or retaining identifying information regarding the binders that bind to the TAA residues in (b), (d), and optionally (e); and
(g) comparing the amino acid sequence determined in (f) to an amino acid sequence of the polypeptide determined without contacting the polypeptide with the PNGase, thereby detecting one or more N-linked glycosylation sites in the polypeptide.
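By way of non-limiting illustration, the comparison in step (g) may be sketched as a position-by-position check for an Asn in the sequence determined without PNGase that reads as Asp in the sequence determined after PNGase treatment. The function name and the example sequences are hypothetical, and the sketch assumes that both sequences cover the same residues in the same order and that the untreated sequence reports Asn at the attachment position.

```python
def detect_glycosylation_sites(untreated_seq, treated_seq):
    """Return 1-based positions where Asn (untreated) reads as Asp (treated)."""
    if len(untreated_seq) != len(treated_seq):
        raise ValueError("sequences must cover the same positions")
    return [
        index + 1
        for index, (before, after) in enumerate(zip(untreated_seq, treated_seq))
        if before == "N" and after == "D"
    ]


# Hypothetical reads of the same peptide without and with PNGase treatment.
sites = detect_glycosylation_sites("ANGTKNLS", "ADGTKNLS")
```

In this toy example only position 2 shifts from N to D; the Asn at position 6 is unchanged, which under these assumptions would be read as an unoccupied site.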
[0083] Various embodiments apply equally to the aspects provided herein but will for the sake of brevity be recited only once. Thus, various of the following embodiments may be applicable to all the embodiments recited above.
[0084] In some embodiments of the disclosed methods, the TAA residues are N-terminal amino acid (NTAA) residues. In some embodiments, the TAA residues are modified.
[0085] In some embodiments of the disclosed methods, the TAA is an N-terminal amino acid (NTAA). In other embodiments of the disclosed methods, the TAA is a C-terminal amino acid (CTAA).
[0086] In some embodiments, the cleavage of the polypeptide is performed by a cleaving enzyme such as an enzyme described in US 11,427,814 B2.
[0087] In some embodiments, the disclosed methods further comprise quantifying degree of the attachment of the N-linked glycan to the glycan attachment residue of the N-linked glycosylation site of the polypeptide by determining what fraction of molecules of the polypeptide or the cleaved polypeptide bind to the first binder and/or the second binder based on analysis of the identifying information regarding the first binder and/or the second binder.
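As a non-limiting sketch of the quantification described above, the degree of attachment at one site can be estimated as the fraction of analyzed molecules whose glycan attachment residue was bound by the Asp-preferring (second) binder. The function name and the example calls are hypothetical; the estimate assumes complete PNGase deglycosylation and no Asn-to-Asp conversion from other causes.

```python
from collections import Counter


def site_occupancy(calls):
    """Fraction of molecules called Asp ("D") among all "N"/"D" calls.

    calls: iterable of per-molecule residue calls at one glycosylation site,
    decoded from the identifying information of the bound binder.
    Returns None when no molecule yielded an N or D call.
    """
    counts = Counter(calls)
    total = counts["N"] + counts["D"]
    if total == 0:
        return None
    return counts["D"] / total


# Hypothetical calls for four molecules of the same polypeptide.
occupancy = site_occupancy(["D", "D", "N", "D"])
```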
[0088] In some embodiments of the disclosed methods, in addition to the polypeptide, 1000 or more additional different polypeptides each comprising a N-linked glycosylation site are analyzed in parallel utilizing the first binder and the second binder.
[0089] In some embodiments of the disclosed methods, attachments of an N-linked glycan to a glycan attachment residue of the N-linked glycosylation site of each polypeptide of the additional 1000 or more different polypeptides are assessed.
[0090] In some embodiments of the disclosed methods, the identifying information regarding the first binder and/or the second binder are analyzed by an optical method.
[0091] In some embodiments of the disclosed methods, the identifying information regarding the first binder and/or the second binder are analyzed by a nucleic acid sequencing method.
[0092] In some embodiments of the disclosed methods, before contacting with the PNGase, the N-linked glycosylation site comprises any one of the following amino acid sequences: AsnXxxSer, AsnXxxThr or AsnXxxCys, wherein Xxx is any standard, naturally occurring amino acid residue.
[0093] In some embodiments of the disclosed methods, one or more additional molecules of the polypeptide are (i) immobilized on the solid support; (ii) not contacted with PNGase; and (iii) analyzed as described in (b) to identify a shift from Asn residue to Asp residue as the first residue of the N-linked glycosylation site of the immobilized polypeptide or the cleaved immobilized polypeptide after the contacting with the PNGase.
[0094] In some embodiments of the disclosed methods, contacting with the PNGase occurs after the immobilization of the polypeptide to the solid support.
[0095] In some embodiments of the disclosed methods, the detectable labels of the first binder and/or the second binder each comprise a fluorescently labeled probe, and the amino acid identity of the amino acid residue treated with the PNGase is determined by detecting a signal generated from the fluorescently labeled probe.
[0096] In some embodiments of the disclosed methods, the detectable labels of the first binder and/or the second binder each comprise an epitope, which is later detected by a specific antibody.
[0097] In some embodiments of the disclosed methods, the detectable labels of the first binder and/or the second binder each comprise a nucleic acid coding tag that comprises an encoder barcode that comprises identifying information regarding the binder.
[0098] In some embodiments of the disclosed methods, the amino acid identity of the amino acid residue treated with the PNGase is determined by detecting a signal generated via performing an in situ nucleic acid sequencing of the encoder barcode of the first binder or the second binder.
[0099] In some embodiments of the disclosed methods, the amino acid identity of the amino acid residue treated with the PNGase is determined by detecting a signal generated via performing an in situ amplification of at least a portion of the nucleic acid coding tag that
comprises the encoder barcode. In some embodiments, the in situ amplification is a rolling circle amplification.
[0100] In some embodiments, the disclosed methods further comprise hybridizing a fluorescent oligonucleotide probe to the amplified portion of the nucleic acid coding tag and detecting a signal from the fluorescent oligonucleotide probe.
[0101] In some embodiments of the disclosed methods, (i) the immobilized polypeptide is attached to a nucleic acid recording tag before contacting the immobilized polypeptide treated with the PNGase with the set of binders; and (ii) following binding of the first binder or the second binder to the glycan attachment residue of the N-linked glycosylation site of the immobilized polypeptide, generating an extended nucleic acid construct comprising nucleic acid sequence information of the encoder barcode of the first binder or the second binder and nucleic acid sequence information of the nucleic acid recording tag attached to the immobilized polypeptide; and wherein analyzing the detectable label of the first binder or the second binder comprises determining a nucleic acid sequence of at least a portion of the extended nucleic acid construct, wherein the portion comprises the nucleic acid sequence information of the encoder barcode of the first binder or the second binder.
[0102] In some embodiments of the disclosed methods, the nucleic acid sequence of the portion of the extended nucleic acid construct is determined using a DNA sequencer.
[0103] In some embodiments of the disclosed methods, the support is a flow cell.
[0104] In some embodiments of the disclosed methods, the determining amino acid identity of the amino acid residue treated with the PNGase comprises determining a likelihood of a particular type of the amino acid residue.
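By way of non-limiting illustration, one way to express such a likelihood is a Bayesian update that combines the binding profile of the observed binder with a prior over residue types. The function name, the binding probabilities, and the prior below are hypothetical values chosen for illustration and are not measured properties of any binder disclosed herein.

```python
def residue_posterior(bind_prob, prior):
    """Posterior probability of each residue type given one binding event.

    bind_prob: P(observed binder signal | residue type) for each residue.
    prior:     prior probability of each residue type at this position.
    """
    unnormalized = {
        residue: bind_prob.get(residue, 0.0) * p for residue, p in prior.items()
    }
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("observation is impossible under the given model")
    return {residue: value / total for residue, value in unnormalized.items()}


# Hypothetical Asn-preferring binder with some cross-reactivity to Asp.
posterior = residue_posterior(
    bind_prob={"N": 0.9, "D": 0.1},
    prior={"N": 0.5, "D": 0.5},
)
```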
[0105] In some embodiments, the disclosed methods further comprise determining amino acid identities of one or more additional amino acid residues of the polypeptide treated with the PNGase by obtaining or retaining identifying information regarding the first binder or the second binder.
[0106] In some embodiments, the disclosed methods further comprise determining an amino acid sequence of the polypeptide treated with the PNGase based on the determined amino acid identities of the one or more additional amino acid residues of the polypeptide.
[0107] In some embodiments, the disclosed methods further comprise comparing the amino acid sequence of the polypeptide treated with the PNGase to an amino acid sequence of the polypeptide determined without contacting the polypeptide with the PNGase.
[0108] In some embodiments of the disclosed methods, the first binder binds to a terminal Asn (N) residue, and/or the second binder binds to a terminal Asp (D) residue.
[0109] In some embodiments of the disclosed methods, the set of binders further comprises a third binder that binds to a terminal amino acid (TAA) residue other than Asn (N) and Asp (D).
[0110] In some embodiments of the disclosed methods, the method does not comprise use of mass spectrometry.
[0111] In some embodiments, the first binder of the set of binders is configured to specifically bind to an Asn (N) residue within a motif such that it also binds one or more neighboring amino acid residues of the polypeptide analyte. In some embodiments, the second binder is configured to specifically bind to an Asp (D) residue within a motif such that it also binds one or more neighboring amino acid residues of the polypeptide analyte. The motif may be a terminal motif or internal motif. For example, the first binder and/or the second binder may comprise an antibody or a functional fragment thereof which is configured to recognize a certain motif (e.g., 2 amino acid residues, 3 amino acid residues, 4 amino acid residues, or 5 amino acid residues) within a polypeptide sequence. In one example, such binders may be generated as described in US11970693B2 and US11282586B2, incorporated by reference herein. Antibodies that specifically bind to epitopes of two, three or four amino acid residues are known in the art (see, e.g., US8859741B2 describing monoclonal antibodies that bind PCSK9 by contacting at least one of two internal residues (residue 237 or residue 238 of PCSK9), and WO2013040564A2 describing antibodies recognizing an arbitrarily designed epitope of three amino acid residues) and may be generated using well-known methods, such as those in Antibodies: A Laboratory Manual, Second edition; 2014, 847 pp, Cold Spring Harbor Laboratory Press, which provides step-by-step protocols for generating both polyclonal and monoclonal antibodies against a desired target.
[0112] In some embodiments, binders used in the disclosed methods can specifically bind to an amino acid residue that serves as an attachment site of an N-linked glycan of a N-linked glycosylation site of a polypeptide. The amino acid residues that serve as an attachment site of N-glycans can be a terminal amino acid residue (TAA) or internal amino acid residue.
Terminal amino acid residues may be specifically detected, because they are located in an unstructured “tail,” which can make them readily accessible by binders. In some embodiments, a polypeptide analyte is cleaved to make the attachment residue of a N-linked glycosylation site a terminal residue. In one example, amino acid residues of a polypeptide analyte are sequentially cleaved until the attachment residue of a N-linked glycosylation site becomes a terminal residue.
[0113] When an amino acid residue that serves as attachment site of the N-glycan within a N-linked glycosylation site of a polypeptide is an internal amino acid residue, such amino acid residue is often located within a locally unstructured loop or a structural turn allowing the
binders to bind the residue without steric interference from neighboring structural motifs. Indeed, N-linked glycosylation sites in proteins often comprise the Asn-X-Ser/Thr sequence motif, which usually sits in a solvent-exposed, flexible region (a loop or turn) rather than in regular secondary structure. For example, an in silico survey of 53 glycoprotein motifs classified each motif residue as α-helix, β-strand, structured turn, or non-structured loop, and showed a strong bias toward non-structured loop regions for the glycosylated Asn residues (Silverman JM, Imperiali B. Bacterial N-Glycosylation Efficiency Is Dependent on the Structural Context of Target Sequons. J Biol Chem. 2016 Oct 14;291(42):22001-22010). In another example, a survey of 506 glycoproteins in the PDB (2592 N-linked glycosylation sites, 1683 occupied) found that only ~10% of occupied sites fall in helices and ~17% in β-strands, whereas ~27% are in turns, ~17.5% in bends and the remainder in “random coil” loops (Petrescu AJ, et al., Statistical analysis of the protein environment of N-glycosylation sites: implications for occupancy, structure, and folding. Glycobiology. 2004 Feb;14(2):103-14). Loop or turn regions are more solvent-exposed, which enhances glycosylation efficiency and makes the glycan accessible for quality-control lectins, folding chaperones, and maturation enzymes. Accordingly, the attachment site residue of a N-linked glycosylation site may be specifically recognized by a binder even when such residue is an internal residue of a polypeptide analyte.
[0114] In some embodiments, the first binder is configured to specifically bind to a terminal Asn (N) residue. In some embodiments, the second binder is configured to specifically bind to a terminal Asp (D) residue. In some embodiments, the set of binders used in the assay comprises a third binder that binds to a terminal amino acid (TAA) residue other than Asn (N) and Asp (D). Such binders are known in the art, see, e.g., U.S. Patent Nos. 9,566,335, 10,852,305, 11,959,920 and 9,435,810, incorporated by reference, and Example 1 below.
[0115] In some embodiments, the N-linked glycan attached to the attachment residue in a N-linked glycosylation site of a polypeptide is a glycan found on human glycoproteins and efficiently released by PNGase under appropriate conditions. In some embodiments, the N-linked glycan is a high-mannose type glycan such as a glycan having a chitobiose core (GlcNAc2) and 3-7 mannose residues (e.g., MamGlcNAca, MamGlcNAca). In some embodiments, the N-linked glycan is a “hybrid type” glycan such as a glycan having a branch that remains “high-mannose” (mannose-only) and one branch elaborated with N-acetylglucosamine (GlcNAc) and sometimes further sugars (e.g., Man5GlcNAc3, which has core GlcNAc2Man3 and two additional mannoses on the α1-3 arm and a GlcNAc on the α1-6 arm). In some embodiments, the N-linked glycan is a bi-antennary type glycan such as a glycan having both the α1-3 and α1-6 mannose arms extended with GlcNAc, galactose (Gal), and often terminated with sialic acid (Neu5Ac). Importantly, PNGase F (Peptide N-Glycosidase F) cleaves the bond between the innermost GlcNAc and the asparagine of virtually all mammalian N-linked glycans, so the attachment of these glycans to an amino acid residue (i.e., attachment residue) of the N-linked glycosylation site of a polypeptide may be assessed by the disclosed methods.
[0116] In some embodiments of the disclosed methods, to analyze multiple samples, proteins or peptides in each sample may be barcoded by installing a sample-specific barcode as a part of a recording tag attached to a particular peptide. Peptides from multiple samples, each attached to a recording tag, are mixed together and processed according to the methods described herein. During parallel analysis of extended recording tags that were attached to peptides during the encoding assay, sample-specific barcode information is extracted and decoded, so the identity and glycosylation status of each analyzed peptide can be combined with the origin of the peptide (e.g., the sample from which the peptide originated). Barcoding methods that permit sample multiplexing were described in the US patent applications US 2019/0145982 A1, US 2022/0214353 A1, and US 2022/0235405 A1, each incorporated by reference herein.
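As an illustration of the decoding step described above, the following minimal sketch groups sequencing reads of extended recording tags by a sample barcode. The barcode sequences, the barcode length, and the assumption that the barcode occupies the first bases of each read are hypothetical; actual recording tag layouts are assay-specific.

```python
from collections import defaultdict

# Hypothetical sample barcodes; real sequences and tag layout are assay-specific.
SAMPLE_BARCODES = {"ACGT": "sample_A", "TGCA": "sample_B"}
BARCODE_LEN = 4


def demultiplex(reads):
    """Group extended-recording-tag reads by the sample barcode at their start."""
    by_sample = defaultdict(list)
    for read in reads:
        sample = SAMPLE_BARCODES.get(read[:BARCODE_LEN], "unassigned")
        # Keep the remainder of the read, which carries the binder-derived information.
        by_sample[sample].append(read[BARCODE_LEN:])
    return dict(by_sample)


print(demultiplex(["ACGTGGG", "TGCATTT", "NNNNCCC"]))
```

Reads with an unrecognized barcode are kept under a separate key rather than discarded, so that the rate of barcode errors can be inspected.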
[0117] Provided herein is also a method for detecting N-linked glycosylation within one or more polypeptides, which comprise a first polypeptide having a N-linked glycosylation site, the method comprising:
(a) attaching polypeptides comprising the first polypeptide to a solid support, wherein the polypeptides are treated with a PNGase before or after the attachment of the polypeptides to the solid support, thereby obtaining immobilized polypeptides comprising immobilized first polypeptide de-glycosylated at the N-linked glycosylation site;
(b) analyzing sequentially at least some individual amino acid residues of the immobilized polypeptides, wherein the analysis comprises the following steps for each analyzed immobilized polypeptide:
(i) contacting the solid support with a set of binders, wherein each binder specifically binds to a terminal amino acid (TAA) or a modified TAA of a polypeptide immobilized on the solid support;
(ii) following binding of a binder of the set of binders to the polypeptide, obtaining or retaining identifying information regarding the binder;
(iii) removing the TAA or the modified TAA to expose a new TAA, thereby generating a cleaved polypeptide, and, optionally, modifying the new TAA to yield a newly modified TAA; and
(iv) repeating steps (i)-(iii) or (i)-(ii) at least one time, wherein the immobilized first polypeptide is analyzed; the set of binders comprises a first binder that specifically binds to Asn TAA and a second binder that specifically binds to Asp TAA; and
the first binder or the second binder binds to Asn TAA or Asp TAA being a glycan attachment residue of the N-linked glycosylation site of the immobilized first polypeptide after treatment with the PNGase;
(c) determining whether at least some of the immobilized first polypeptide molecules comprise Asp residue as the first residue of the N-linked glycosylation site based on analysis of the identifying information regarding the first binder and/or regarding the second binder obtained or retained in step (b) for the immobilized first polypeptide molecules, thereby detecting N-linked glycosylation in the N-linked glycosylation site of the first polypeptide.
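The determination in step (c) can be sketched for a single immobilized molecule as follows. This is an illustrative sketch only: it assumes that the binder calls have already been decoded into one-letter residue codes per cycle, that the cycle at which the attachment residue becomes the TAA is known, and that an Asp at that position arises from PNGase treatment rather than from spontaneous deamidation. The function and argument names are hypothetical.

```python
def classify_site(binder_calls, site_cycle):
    """Classify one molecule from its per-cycle binder calls.

    binder_calls: list of one-letter residue codes, one per binding cycle,
                  naming the residue recognized by the binder that bound.
    site_cycle:   zero-based cycle at which the attachment residue is the TAA.
    """
    call = binder_calls[site_cycle] if site_cycle < len(binder_calls) else None
    if call == "D":   # second binder (Asp TAA) bound: site was glycosylated
        return "glycosylated"
    if call == "N":   # first binder (Asn TAA) bound: site was unoccupied
        return "not glycosylated"
    return "undetermined"


print(classify_site(["G", "D", "A"], 1))  # glycosylated
```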
[0118] Provided herein is also a method for analyzing a polypeptide comprising a N-linked glycosylation site, comprising:
(a) contacting the polypeptide with a PNGase, wherein the polypeptide is attached to a solid support before or after the contacting with the PNGase, thereby immobilizing the polypeptide on the solid support;
(b) determining at least partial amino acid sequence of the immobilized polypeptide, wherein the determining comprises:
(i) contacting the immobilized polypeptide treated with the PNGase with a set of binders,
(ii) cleaving the immobilized polypeptide to generate a cleaved immobilized polypeptide, and
(iii) contacting the cleaved immobilized polypeptide with a subsequent set of binders, wherein each binder of the set of binders and the subsequent set of binders specifically binds to an amino acid residue or sequence and is attached to a detectable label that comprises identifying information regarding the binder, wherein the set of binders and/or the subsequent set of binders comprise: a first binder that preferentially binds to an Asn (N) residue over an Asp (D) residue; and a second binder that preferentially binds to the Asp (D) residue over the Asn (N) residue, and wherein the at least partial amino acid sequence is determined by analyzing the detectable labels attached to the binders that bind to the immobilized polypeptide or the cleaved immobilized polypeptide, thereby analyzing N-linked glycosylation in the polypeptide.
[0119] In preferred embodiments, the disclosed methods are for analysis and/or sequencing of multiple polypeptide analytes simultaneously (multiplexing).
[0120] Multiplexing as used herein refers to analysis of a plurality of polypeptide analytes in the same assay. The plurality of polypeptide analytes can be derived from the same sample or
different samples. The plurality of polypeptide analytes can be derived from the same subject or different subjects. The plurality of polypeptide analytes that are analyzed can be different polypeptide analytes, or the same polypeptide analyte derived from different samples. A plurality of polypeptide analytes includes 2 or more polypeptide analytes, 5 or more polypeptide analytes, 10 or more polypeptide analytes, 50 or more polypeptide analytes, 100 or more polypeptide analytes, 500 or more polypeptide analytes, 1000 or more polypeptide analytes, 5,000 or more polypeptide analytes, 10,000 or more polypeptide analytes, 50,000 or more polypeptide analytes, 100,000 or more polypeptide analytes, 500,000 or more polypeptide analytes, or 1,000,000 or more polypeptide analytes.
[0121] In preferred embodiments, the disclosed methods are for analyzing a large number of polypeptides (e.g., at least 1000, 10000, 100000, 1000000 or more polypeptide molecules which comprise molecules of at least 100, 1000, 10000 or more different polypeptides) in a single assay. In preferred embodiments, the disclosed methods are for analyzing a large number of N-linked glycosylation sites within a plurality of polypeptides (e.g., at least 100, 1000, 10000, 100000, or more N-linked glycosylation sites present in 1000, 10000, 100000, 1000000 or more polypeptide molecules).
[0122] In some embodiments of the disclosed methods, before step (a), the first polypeptide is generated by fragmenting a protein from a sample, such as a biological sample.
[0123] In some embodiments, the disclosed method further comprises quantifying degree of glycosylation for the N-linked glycosylation site of the first polypeptide by determining what fraction of the immobilized first polypeptide molecules comprise Asp TAA as the first residue of the N-linked glycosylation site based on analysis of the identifying information regarding the first binder and/or regarding the second binder obtained or retained in step (b) for the immobilized first polypeptide molecules.
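The quantification described above reduces to a ratio of binder calls at the glycosylation site. The following is a minimal sketch, assuming one decoded call per immobilized molecule ("D" when the second binder bound, "N" when the first binder bound, anything else when the call was uninformative); the function name is hypothetical.

```python
from collections import Counter


def site_occupancy(calls):
    """Fraction of informative molecules that show Asp at the attachment residue.

    calls: iterable with one entry per molecule, "D", "N", or any other value
           for molecules without an informative call at the site.
    Returns None when no molecule gave an informative call.
    """
    counts = Counter(calls)
    informative = counts["D"] + counts["N"]
    if informative == 0:
        return None
    return counts["D"] / informative


print(site_occupancy(["D", "D", "N", "X"]))  # 2 of 3 informative calls are Asp
```

Molecules without an Asn or Asp call are excluded from the denominator in this sketch; whether to exclude them is a design choice and not a requirement of the disclosed methods.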
[0124] In some embodiments, some of immobilized polypeptide molecules comprising immobilized first polypeptide molecules obtained in (a) are: (i) not treated with the PNGase in (a), and (ii) analyzed as described in (b), followed by utilizing data obtained by the analysis of immobilized polypeptide molecules not treated with the PNGase in (c) to identify a shift from Asn residue to Asp residue as the first residue of the N-linked glycosylation site of the immobilized first polypeptide molecules after treatment with the PNGase.
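The comparison with molecules not treated with the PNGase can be sketched as a difference of Asp fractions. This is an illustrative calculation under the assumption that counts of Asp and Asn calls at the site are available for both the treated and untreated portions; the function names and the example counts are hypothetical.

```python
def asp_fraction(n_asp, n_asn):
    """Fraction of Asp calls among Asn and Asp calls at the site."""
    total = n_asp + n_asn
    if total == 0:
        raise ValueError("no informative calls at this site")
    return n_asp / total


def asn_to_asp_shift(treated, untreated):
    """Increase in Asp fraction after PNGase treatment.

    treated, untreated: (n_asp, n_asn) count pairs for the same site.
    """
    return asp_fraction(*treated) - asp_fraction(*untreated)


# Hypothetical counts: 80 Asp / 20 Asn after PNGase, 5 Asp / 95 Asn without.
print(asn_to_asp_shift((80, 20), (5, 95)))
```

An Asp signal in the untreated portion may reflect, for example, binder cross-reactivity or deamidation unrelated to glycosylation, so subtracting it gives a more conservative estimate of the PNGase-dependent shift.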
[0125] In preferred embodiments, the method does not comprise use of mass spectrometry.
[0126] In some embodiments, the first polypeptide is a pre-selected target polypeptide. In other embodiments, the first polypeptide is a random, previously unknown polypeptide present in a sample, such as a biological sample.
[0127] In some embodiments, in addition to the first polypeptide, degrees of glycosylation for 100, 200, 500, 1000, 10000 or more different polypeptides each comprising a N-linked glycosylation site are determined in (c) utilizing the first binder and the second binder.
[0128] In some embodiments of the disclosed methods, the TAA is an N-terminal amino acid (NTAA) and the new TAA is a new NTAA.
[0129] In some embodiments of the disclosed methods, the TAA is a C-terminal amino acid (CTAA) and the new TAA is a new CTAA.
[0130] In some embodiments of the disclosed methods, each binder of the plurality binds to a modified TAA of an immobilized polypeptide. In some of these embodiments, the modified TAA is a modified N-terminal amino acid (NTAA). In other of these embodiments, the modified TAA is a modified C-terminal amino acid (CTAA).
[0131] In some embodiments of the disclosed methods, the modified NTAA of the immobilized polypeptide is obtained by modifying the immobilized polypeptide with an N-terminal modifier agent before contacting the solid support with the set of binders. In some embodiments, the N-terminal modifier agent is selected from the group consisting of compounds of the following Formulas (10)-(13):
wherein M is a metal binding group that comprises sulfonamide, hydroxamic acid, sulfamate, or sulfamide; the group
is a 5 or 6 membered aromatic ring containing up to three heteroatoms selected from N, O, and S as ring members, and is optionally substituted by R; R represents one or two optional substituents selected from the group consisting of F, Cl, CH3, CF2H, CF3, OH, OCH3, OCF3, NH2, N(CH3)2, NO2, SCH3, SO2CH3, CH2OH, B(OH)2, CN, CONH2, CO2H, CN4H, and CONHCH3; LG is OH, ORQ, or OCC, each RQ is independently aryl or heteroaryl, each of which is optionally substituted with one or more groups selected from halo, nitro, cyano, sulfonate, carboxylate, alkylsulfonyl, and N of heteroaryl is optionally oxidized; or RQ can be -C(=O)R or -C(=O)-OR; CC is a cationic counterion; X is one of the following: O, S, Se, or NH.
[0132] In some embodiments of the disclosed methods, the first binder and/or the second binder each comprises a peptide or an aptamer.
[0133] In some embodiments of the disclosed methods, (i) each binder of the plurality comprises an identifying detectable label (i.e., detectable label that identifies the binder to which it is attached); (ii) the identifying information regarding the binder that binds to the TAA or modified TAA of the polypeptide in (b)(ii) is obtained by detecting the identifying detectable label attached to the binder. In some embodiments, the identifying detectable label comprises a fluorescent moiety, and the identifying information regarding the binder that binds to the TAA or modified TAA of the polypeptide in (b)(ii) is obtained by detecting a signal from the fluorescent moiety.
[0134] In some embodiments of the disclosed methods, (i) the immobilized polypeptides comprising the immobilized first polypeptide are each independently attached to a nucleic acid recording tag; (ii) each binder of the plurality is attached to a nucleic acid coding tag that comprises identifying information regarding the binder; and (iii) the identifying information regarding the binder that binds to the TAA or modified TAA of the polypeptide in (b)(ii) is retained in the nucleic acid recording tag attached to the polypeptide upon transfer from the nucleic acid coding tag, wherein the transfer comprises primer extension and/or ligation.
[0135] In other embodiments of the disclosed methods, (i) the immobilized polypeptides comprising the immobilized first polypeptide are each independently attached to a nucleic acid recording tag that comprises a unique molecular identifier (UMI); (ii) each binder of the plurality is attached to a nucleic acid coding tag that comprises a barcode comprising identifying information regarding the binder; and (iii) upon binding of a binder to a polypeptide, the UMI from the polypeptide is transferred from the recording tag attached to the polypeptide to the coding tag of the binder, wherein the transfer comprises primer extension and/or ligation. Coding tags of binders are collected after each binding cycle, and the identifying information regarding binders that were bound to polypeptides in each binding cycle is obtained by sequencing of the coding tags and analyzing the UMIs present in the coding tags in connection with the barcodes present in the coding tags.
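The reconstruction of per-polypeptide binding information from collected coding tags can be sketched as a grouping by UMI. This minimal sketch assumes that each sequenced coding tag has already been parsed into a (cycle, UMI, binder barcode) tuple; the parsing itself, the tuple layout, and the example identifiers are hypothetical.

```python
from collections import defaultdict


def binding_histories(coding_tag_reads):
    """Rebuild the ordered list of binder barcodes for each polypeptide UMI.

    coding_tag_reads: iterable of (cycle, umi, binder_barcode) tuples, where
    cycle is the binding cycle after which the coding tags were collected.
    If the same (cycle, umi) pair occurs more than once, the last read wins.
    """
    per_umi = defaultdict(dict)
    for cycle, umi, barcode in coding_tag_reads:
        per_umi[umi][cycle] = barcode
    return {umi: [calls[c] for c in sorted(calls)] for umi, calls in per_umi.items()}


reads = [(2, "umi1", "bc_D"), (1, "umi1", "bc_G"), (1, "umi2", "bc_N")]
print(binding_histories(reads))
```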
[0136] In some embodiments of the disclosed methods, the analysis of the identifying information regarding the first binder and/or regarding the second binder retained in step (b) for the immobilized first polypeptide molecules is performed by nucleic acid sequencing of recording tags attached to the first polypeptide molecules and extended upon transfer from the nucleic acid coding tags.
[0137] In some embodiments of the disclosed methods, the first binder specifically binds to Asn (N) TAA of an immobilized polypeptide, and the second binder specifically binds to Asp (D) TAA of an immobilized polypeptide.
[0138] In some embodiments of the disclosed methods, the first binder specifically binds not only to Asn (N) TAA of an immobilized polypeptide, but also to a penultimate terminal amino acid residue of the immobilized polypeptide. In some embodiments, the second binder specifically binds not only to Asp (D) TAA of an immobilized polypeptide, but also to a penultimate terminal amino acid residue of the immobilized polypeptide. In some embodiments of the disclosed methods, the first binder and/or the second binder specifically bind(s) not only to TAA of an immobilized polypeptide, but also to one or more neighboring amino acid residues of the immobilized polypeptide.
[0139] In some embodiments of the disclosed methods, the first binder preferentially binds to an Asn (N) residue over an Asp (D) residue of an immobilized polypeptide, and the second binder preferentially binds to the Asp (D) residue over the Asn (N) residue of an immobilized polypeptide. In preferred embodiments of the disclosed methods, the Asn (N) residue and the Asp (D) residue to which the binders are bound are terminal amino acid residues of immobilized polypeptide molecules.
[0140] In some embodiments of the disclosed methods, each of the immobilized polypeptides comprising the immobilized first polypeptide in (a) is covalently joined to the solid support.
[0141] In some embodiments of the disclosed methods, the solid support is a bead, such as a porous bead.
[0142] In some embodiments of the disclosed methods, the solid support comprises a plurality of nucleic acid recording tags covalently attached to the support and configured to be associated directly or indirectly with analyzed polypeptides including the first polypeptide, wherein adjacent nucleic acid recording tags on the support are spaced apart from each other on a surface or within a volume of the support at an average distance of about 50 nm or greater. This is beneficial in embodiments where multiple different polypeptides are immobilized on the same support. Different polypeptides can be spaced appropriately (e.g., about 50 nm or greater) to reduce the occurrence of or prevent a cross-binding or inter-molecular event, e.g., where a binder binds to a first polypeptide and its coding tag information (i.e., the identifying barcode) is transferred to a recording tag attached to a neighboring polypeptide rather than the recording tag attached to the first polypeptide.
[0143] In some embodiments of the disclosed methods, in (b), the analysis comprises determining identity of amino acid residues present at the N-linked glycosylation site of different immobilized first polypeptide molecules.
[0144] In some embodiments of the disclosed methods, the N-linked glycosylation site comprises any one of the following sequences: AsnXxxSer, AsnXxxThr or AsnXxxCys.
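Candidate sites matching the sequences listed above can be located in a known polypeptide sequence with a regular expression. This is a minimal sketch using one-letter amino acid codes; it follows the AsnXxxSer/Thr/Cys pattern exactly as stated and places no restriction on Xxx, although some published sequon definitions exclude proline at that position. The function name is hypothetical.

```python
import re

# Lookahead so that overlapping candidate sites (e.g., "NNSS") are all reported.
SEQUON = re.compile(r"(?=(N[A-Z][STC]))")


def find_sequons(sequence):
    """Return (1-based position of Asn, matched tripeptide) for each candidate site."""
    return [(m.start() + 1, m.group(1)) for m in SEQUON.finditer(sequence.upper())]


print(find_sequons("MNGTANNSSC"))  # [(2, 'NGT'), (6, 'NNS'), (7, 'NSS')]
```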
[0145] In some embodiments of the disclosed methods, in (b)(iii), the TAA or the modified TAA are removed using an enzyme.
[0146] In some embodiments of the disclosed methods, in (a), the treatment with the PNGase occurs after the attachment to the solid support.
[0147] In some embodiments of the disclosed methods, in (b), steps (i)-(iii) or (i)-(ii) are repeated at least two, at least three, at least four or more times.
[0148] In some embodiments, the disclosed methods further comprise determining identity of the first polypeptide based on analysis of (i) the identifying information regarding the first binder and/or regarding the second binder obtained or retained in step (b) for the immobilized first polypeptide molecules, and (ii) identifying information regarding at least one additional binder of the set of binders that was bound to some of first polypeptide molecules during step (b).
[0149] In some embodiments of the disclosed methods, each binder of the set of binders comprises a detectable label or a signal-generating moiety. In some embodiments of the disclosed methods, each binder of the set of binders comprises a fluorescent label.
[0150] In some embodiments of the disclosed methods, each binder of the set of binders is attached to a nucleic acid coding tag that comprises identifying information regarding the binder. In some embodiments of the disclosed methods, each binder of the set of binders is covalently attached to the associated nucleic acid coding tag.
[0151] The identifying information regarding a binder may be obtained or retained, depending on the detection method used. In some embodiments of the disclosed methods, the identifying information regarding a binder is obtained; for example, an identifying detectable label of the binder is detected or analyzed after the binder binds to the target polypeptide. In other embodiments of the disclosed methods, the identifying information regarding a binder is retained, such as retained in a nucleic acid recording tag attached to a target polypeptide after the binder binds to the target polypeptide; the retained identifying information may be analyzed and decoded later, after completion of the encoding assay (i.e., after analyzing sequentially at least some individual amino acid residues of the immobilized polypeptide). In some embodiments, the retained identifying information is analyzed using nucleic acid sequencing of the recording tag attached to the target polypeptide and extended during the encoding assay.
[0152] In some embodiments of the disclosed methods, each binder of the set of binders comprises a catalytic residue that participates in a catalytic dyad or catalytic triad, wherein the catalytic dyad or catalytic triad mediates the binder acylation and formation of a covalent acyl bond between the catalytic residue and the new CTAA residue of the cleaved polypeptide, wherein the covalent acyl bond is subsequently hydrolyzed in a controllable manner to release the catalytic residue. In preferred embodiments, the catalytic residue is a serine, threonine or cysteine residue.
[0153] In some embodiments of the disclosed methods, at least one binder of the set of binders is or comprises an engineered serine carboxypeptidase. In other embodiments of the disclosed methods, at least one binder of the set of binders is or comprises an engineered cysteine carboxypeptidase.
[0154] Barcode information of a coding tag attached to a specific binder may be transferred to a recording tag using a variety of methods. In any of the disclosed embodiments, the transfer of identifying information (e.g., from a coding tag to a recording tag, or from a recording tag to a coding tag) can be accomplished by ligation (e.g., an enzymatic or chemical ligation, a splint ligation, a sticky end ligation, a single-strand (ss) ligation such as a ssDNA ligation, or any combination thereof), a polymerase-mediated reaction (e.g., primer extension of single-stranded nucleic acid or double-stranded nucleic acid), or any combination thereof.
[0155] In some embodiments, a DNA polymerase that is used for primer extension during information transfer possesses strand-displacement activity and has limited, or is devoid of, 3′-5′ exonuclease activity. Examples of such polymerases include Klenow exo- (Klenow fragment of DNA Pol I), T4 DNA polymerase exo-, T7 DNA polymerase exo- (Sequenase 2.0), Pfu exo-, Vent exo-, Deep Vent exo-, Bst DNA polymerase large fragment exo-, Bca Pol, and Phi29 Pol exo-. In a preferred embodiment, the DNA polymerase is active at room temperature and up to 45 °C. In another embodiment, a “warm start” version of a thermophilic polymerase is employed such that the polymerase is activated and is used at about 40 °C-50 °C. An exemplary warm start polymerase is Bst 2.0 WarmStart DNA Polymerase (New England Biolabs).
[0156] Barcode information of a coding tag attached to a specific binder may be transferred to a recording tag attached to the immobilized polypeptide via ligation. Ligation may be a blunt end ligation or sticky end ligation. Ligation may be an enzymatic ligation reaction. Examples of ligases include, but are not limited to, CV DNA ligase, T4 DNA ligase, T7 DNA ligase, T3 DNA ligase, Taq DNA ligase, and E. coli DNA ligase. Alternatively, a ligation may be a chemical ligation reaction. In one embodiment, a spacer-less ligation is accomplished by using hybridization of a “recording helper” sequence with an arm on the coding tag. The annealed complement sequences are chemically ligated using standard chemical ligation or “click chemistry” (Gunderson et al., Genome Res (1998) 8(11):1142-1153; Litovchick et al., Artif DNA PNA XNA (2014) 5(1):e27896; Roloff et al., Methods Mol Biol (2014) 1050:131-141).
[0157] Various aspects of coding tag and recording tag compositions, as well as aspects of transferring identifying information from a coding tag to a recording tag, are disclosed in the earlier published applications US 2019/0145982 A1 and US 2020/0348308 A1, incorporated herein by reference.
[0158] In some embodiments, a natural PNGase enzyme is used in the disclosed methods to treat polypeptides having N-linked glycan moieties to remove the N-linked glycan moieties from the polypeptides and yield polypeptides containing aspartic acid residues in place of the original asparagine residues in the N-linked glycosylation sites. A variety of PNGase enzymes are described in the literature, are available commercially, and can be used in the disclosed methods. Natural PNGases can be divided into three groups: PNGase F-like (bacterial) enzymes, acidic PNGases and cytoplasmic PNGases (see, e.g., Wang T, Voglmeir J. PNGases as valuable tools in glycoprotein analysis. Protein Pept Lett. 2014;21(10):976-85). All PNGases catalyze the same enzymatic reaction, but their protein structures are different, which reflects their different biological roles. PNGase F-like enzymes can be produced in E. coli, and are commercially available (e.g., PNGase F and Rapid PNGase F from New England Biolabs, Ipswich, U.S., Cat. Nos. P0704 and P0710). The catalytic structure of PNGase F-like enzymes is well-known (Norris GE, Stillman TJ, Anderson BF, Baker EN. The three-dimensional structure of PNGase F, a glycosylasparaginase from Flavobacterium meningosepticum. Structure. 1994 Nov 15;2(11):1049-59), and improved versions of the enzyme were made and used.
[0159] In preferred embodiments, PNGase de-glycosylates polypeptide molecules at N-linked glycosylation sites. In some embodiments, PNGase de-glycosylates at least 50%, 60%, 70%, 80%, 90%, 95% or more of the polypeptide molecules at the N-linked glycosylation site present within the analyzed polypeptide.
[0160] In some embodiments, a suitable acidic PNGase (PNGase A) or cytoplasmic PNGase (e.g., yPngl from yeast) is used in the disclosed methods. In some embodiments, PNGase used in the disclosed methods is selected from the group consisting of PNGase A (see SEQ ID NO: 36), PNGase Ar, PNGase Le, PNGase F (see SEQ ID NO: 34 or SEQ ID NO: 35), a derivative thereof (i.e., having amino acid sequence comprising at least 30%, 50%, 70% or 90% identity to any one of SEQ ID NO: 34-36), a fragment thereof, or a combination thereof.
[0161] In some embodiments, PNGase is expressed as a recombinant protein before use in the disclosed methods, such as expressed in animal cells, yeast cells, or insect cells. In some embodiments, a host cell such as yeast capable of secreting PNGase is selected, although the PNGase could also be purified from the lysate of the host cells (see US 9964548 B2, incorporated by reference herein). Examples of suitable host cells for expressing a plant-derived PNGase may include yeast such as Kluyveromyces lactis or Pichia pastoris. If PNGase is synthesized in the non-native host cell with a high-mannose N-linked glycan incorporated into PNGase during the expression, it may be deglycosylated using a suitable enzyme that cleaves high-mannose N-linked glycans, such as Endo H. In some embodiments, a non-natural, engineered PNGase enzyme is used in the disclosed methods.
[0162] In some embodiments, polypeptides are generated by fragmenting one or more proteins from a sample before treating the polypeptides with PNGase. Loss of three-dimensional protein structure during fragmentation greatly improves the efficiency of deglycosylation by PNGase. Fragmentation may occur enzymatically (e.g., by digestion with a protease, such as trypsin) or mechanically (e.g., by ultrasound). In some embodiments, a surfactant is further added to aid deglycosylation by PNGase (see US 9964548 B2, incorporated by reference herein). In some embodiments, deglycosylation by PNGase comprises heating the sample to about 37-50 °C for 10 minutes.
[0163] In some embodiments, each binder of the set of binders is modified to be conjugated to an identifiable detectable label. In some embodiments, the detectable label is a fluorescent label. In some embodiments, the detectable label is a magnetic label. In some embodiments, an identifiable detectable label is a nucleic acid barcode. In some embodiments, an identifiable detectable label is an affinity tag (e.g., Flag tag, HA tag or biotin tag). In some embodiments, the number of spaces on the support occupied by an identified portion of a polypeptide analyte is counted to quantify the level of that polypeptide analyte in the sample.
[0164] In some embodiments, polypeptide analytes may be mixed, spotted, dropped, pipetted, flowed, or otherwise applied to the support. In some embodiments, the support has been functionalized with a chemical moiety such as an NHS ester or other amine-specific reagent before polypeptide analytes are applied to the support. This allows polypeptide analytes to be immobilized on the support through the N-terminus (see also Example 3 below). In some embodiments, adjacent polypeptide analytes of the disclosed methods attached to the solid support are spaced apart from each other on a surface or within a volume of the support at an average distance of about 50 nm or greater.
[0165] In preferred embodiments, selectivity of each binder used during the encoding assay towards NTAA or CTAA residues of polypeptide analytes is determined in advance, before performing contacting steps of the disclosed methods. Each binder may be tested against a panel of peptides each having a different NTAA or CTAA residue and an associated recording tag to characterize selectivity and, optionally, binding kinetics of the binder for each of the 20 natural CTAA residues. When multiple alternative binders exist, a set comprising a minimum number of binders may be selected that would cover all or a maximum number of the 20 natural NTAA or CTAA residues.
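Selecting a small set of binders that covers the terminal residues of interest is an instance of the set cover problem. The following sketch uses a greedy heuristic, which is a common approximation and is not guaranteed to return the true minimum set; it is offered as an illustration and not as the selection procedure of the disclosed methods. The binder names and coverage sets are hypothetical and would in practice come from the selectivity panel described above.

```python
def greedy_binder_set(coverage):
    """Pick binders until every residue recognized by any binder is covered.

    coverage: dict mapping binder name -> set of one-letter residue codes
              that the binder was shown to recognize.
    Ties are broken alphabetically by binder name.
    """
    uncovered = set().union(*coverage.values()) if coverage else set()
    chosen = []
    while uncovered:
        # Choose the binder that covers the most still-uncovered residues.
        best = max(sorted(coverage), key=lambda b: len(coverage[b] & uncovered))
        chosen.append(best)
        uncovered -= coverage[best]
    return chosen


panel = {"b1": {"N", "D"}, "b2": {"D"}, "b3": {"A", "G", "L"}, "b4": {"N"}}
print(greedy_binder_set(panel))  # ['b3', 'b1']
```

For the Asn/Asp discrimination central to the disclosed methods, a single binder that recognizes both N and D would not be sufficient on its own, so a practical selection would additionally require binders that distinguish the two residues.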
[0166] In some embodiments, the provided methods are for generating a nucleic acid encoded library representation of the binding history of each polypeptide of the plurality of target polypeptides (i.e., polypeptide analytes). This nucleic acid encoded library can be amplified, and analyzed using high-throughput next generation digital sequencing methods, enabling millions to billions of molecules to be analyzed per run. The creation of a nucleic acid encoded library of binding information is useful in another way in that it enables enrichment, subtraction, and normalization by DNA-based techniques that make use of hybridization. These DNA-based methods are easily and rapidly scalable and customizable, and more cost-effective than those available for direct manipulation of other types of macromolecule libraries, such as polypeptide libraries. Thus, nucleic acid encoded libraries of binding information can be processed prior to sequencing by one or more techniques to enrich and/or subtract and/or normalize the representation of sequences. This enables information of maximum interest to be extracted much more efficiently, rapidly and cost-effectively from very large libraries whose individual members may initially vary in abundance over many orders of magnitude.
[0167] In some embodiments of the disclosed methods, after transferring the identifying information regarding the binder between the nucleic acid coding tag and the nucleic acid recording tag, the binder is released from the cleaved polypeptide analyte. In preferred embodiments, the release is controllable. In some embodiments, the release is induced by changing the reaction conditions, such as buffer conditions. In some embodiments, the release is controlled by a nucleic acid cleaving reagent used to generate a cleaved extended recording tag on the support attached to the polypeptide analyte.
[0168] In some embodiments, a binder is engineered from an endopeptidase, such as a serine, threonine or cysteine endopeptidase, which binds to several amino acid residues, and the acylated intermediate (binder-polypeptide) is formed by the covalent bond between the binder and a residue distal to the carboxy or amino terminus of the polypeptide analyte. In these embodiments, the distal residue becomes a new CTAA of the polypeptide analyte in the next cycle of encoding.
[0169] In some embodiments, provided herein is a method of identifying a large plurality of polypeptide analytes (e.g., at least 1000, 10000, 100000, 1000000 or more polypeptide molecules which comprise molecules of at least 100, 1000, 10000 or more different polypeptides) in a single assay, and also simultaneously detecting and/or quantifying N-glycosylation site occupancies in these polypeptide analytes.
[0170] In some embodiments, proteins from a sample can be fractionated into a plurality of fractions, and proteins in each fraction of the plurality of fractions can be fragmented into polypeptides, followed by barcoding of the polypeptides (e.g., by introducing a sample barcode into an associated recording tag for each polypeptide). Then, barcoded polypeptides from different fractions, each conjugated to a recording tag, can be pooled together and analyzed using methods and compositions disclosed herein. Fractionation, barcoding and pooling techniques are beneficial for analysis of complex biological samples, such as samples having proteins of vastly different abundances (e.g., plasma). Techniques for fractionation, barcoding and pooling are known in the art and disclosed, for example, in US 20190145982 A1, incorporated by reference herein.
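By way of non-limiting illustration only, the bookkeeping behind fractionation, barcoding and pooling can be sketched in Python as follows. The barcode sequences, fraction names and function names are hypothetical and are not taken from this disclosure or from the incorporated reference; the sketch only shows how a sample barcode carried in a recording tag might allow pooled polypeptides to be assigned back to their fraction of origin.

```python
from collections import defaultdict

# Hypothetical sample barcodes, one per fraction (sequences are illustrative only).
FRACTION_BARCODES = {"fraction_1": "ACGT", "fraction_2": "TGCA"}


def tag_and_pool(fractions):
    """Attach the fraction's barcode to each polypeptide's recording tag, then pool."""
    pooled = []
    for fraction, peptides in fractions.items():
        barcode = FRACTION_BARCODES[fraction]
        for peptide_id in peptides:
            pooled.append({"recording_tag": barcode, "peptide": peptide_id})
    return pooled


def demultiplex(pooled):
    """Group pooled records by fraction using the barcode in the recording tag."""
    barcode_to_fraction = {bc: name for name, bc in FRACTION_BARCODES.items()}
    by_fraction = defaultdict(list)
    for record in pooled:
        fraction = barcode_to_fraction[record["recording_tag"]]
        by_fraction[fraction].append(record["peptide"])
    return dict(by_fraction)
```

In this toy model, calling demultiplex on the output of tag_and_pool recovers the original assignment of polypeptides to fractions.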
[0171] During the disclosed methods, individual polypeptide molecules are attached to a solid support, and at least some individual amino acid residues of the immobilized polypeptides are analyzed sequentially, wherein the analysis comprises the following steps for each analyzed immobilized polypeptide:
(i) contacting immobilized polypeptides with a set of binders, wherein each binder specifically binds to a terminal amino acid (TAA) or a modified TAA of immobilized polypeptides;
(ii) following binding of a binder of the set of binders to the polypeptide, obtaining or retaining identifying information regarding the binder;
(iii) removing the TAA or the modified TAA to expose a new TAA, and, optionally, modifying the new TAA to yield a newly modified TAA, thereby generating a cleaved polypeptide; and
(iv) repeating steps (i)-(iii) or (i)-(ii) at least one time.
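By way of non-limiting illustration only, steps (i)-(iv) above can be sketched as a loop in Python. This is a toy model under stated assumptions: each polypeptide is represented as a list of residue labels with the terminal residue first, each binder is represented by a hypothetical name mapped to the set of residues it recognizes, and binding is treated as deterministic. None of these names or simplifications is taken from this disclosure.

```python
def analyze_polypeptide(residues, binders, cycles):
    """Simulate steps (i)-(iv) for one immobilized polypeptide.

    residues: list of residue labels, terminal residue first (toy representation).
    binders:  dict mapping a hypothetical binder name to the set of residues it binds.
    Returns, for each cycle, the sorted list of binder names that bound the TAA.
    """
    remaining = list(residues)
    history = []
    for _ in range(cycles):  # (iv) repeat the cycle
        if not remaining:
            break
        taa = remaining[0]
        # (i) contact the polypeptide with the binder set;
        # (ii) retain identifying information for every binder that bound
        bound = sorted(name for name, targets in binders.items() if taa in targets)
        history.append(bound)
        # (iii) remove the TAA to expose a new TAA
        remaining.pop(0)
    return history
```

For example, with hypothetical binders {"B_DE": {"D", "E"}, "B_N": {"N"}} and residues ["N", "D", "G"], three cycles record "B_N" in the first cycle, "B_DE" in the second, and no binder in the third.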
[0172] In some embodiments, binders of the set of binders have different specificities towards TAA residues or modified TAA residues which the binders are engineered to bind.
[0173] In some embodiments, the first binding profile is determined by generating a digital signature that shows (i) which binders of the set of binders were bound to the modified TAA of the polypeptide in the performed binding cycles, and (ii) which binders of the set of binders were not bound to the modified TAA of the polypeptide in the performed binding cycles.
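By way of non-limiting illustration only, one possible representation of such a digital signature is a row of 0/1 flags per binding cycle, with one flag per binder of the set. The Python sketch below assumes that the binders detected in each cycle are already available as a list of names; the function name, binder names and the 0/1 encoding are hypothetical choices made for the example.

```python
def digital_signature(bound_per_cycle, binder_names):
    """Build a per-cycle bound / not-bound signature.

    bound_per_cycle: list with one entry per cycle, each a collection of the
                     binder names detected as bound in that cycle.
    binder_names:    ordered list of all binders in the set.
    Returns a list of tuples; position j of tuple i is 1 if binder j was bound
    in cycle i and 0 if it was not.
    """
    return [
        tuple(1 if name in bound else 0 for name in binder_names)
        for bound in bound_per_cycle
    ]
```

Because both the bound and the not-bound binders are recorded, two polypeptides that share a positive binder in a given cycle can still be distinguished if they differ in which other binders failed to bind.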
[0174] In some embodiments, each binder comprises a detectable label. In some embodiments, the detectable label of each binder comprises a fluorescently labeled probe.
[0175] In some embodiments, the detectable label of each binder comprises a unique epitope. In some embodiments, unique epitopes of binders of the set of binders are distinguished and detected by antibodies that each bind to a unique epitope. Using this approach, the signal may be further amplified using methods known in the art, such as using secondary antibodies to decorate primary antibodies upon binding to the cognate epitope.
[0176] In some embodiments, the detectable label of each binder comprises a nucleic acid coding tag that comprises an encoder barcode that comprises identifying information regarding the binder. Utilizing nucleic acid tags as detectable labels provides unique advantages because nucleic acid-based barcodes provide much greater opportunity for multiplexing compared to fluorescent labels. In addition, nucleic acid tags as detectable labels can be amplified quickly and efficiently, for example, by any known nucleic acid amplification method.
[0177] In some embodiments of the disclosed methods, the detectable labels of the first binder and/or the second binder each comprise a fluorescently labeled probe, and the amino acid identity of the amino acid residue treated with the PNGase is determined by detecting a signal generated from the fluorescently labeled probe.
[0178] The terms “label” and “detectable label” comprise a directly or indirectly detectable moiety that is associated with (e.g., conjugated to) a molecule to be detected, e.g., a detectable probe, comprising, but not limited to, fluorophores, radioactive isotopes, fluorescers, chemiluminescers, enzymes, enzyme substrates, enzyme cofactors, enzyme inhibitors, chromophores, dyes, metal ions, metal sols, ligands (e.g., biotin or haptens) and the like.
[0179] In some embodiments of the disclosed methods, the detectable labels of the first binder and/or the second binder each comprise a fluorophore, which is a substance that is capable of exhibiting fluorescence in the detectable range. Particular examples of labels that may be used in accordance with the provided embodiments comprise, but are not limited to, phycoerythrin, Alexa dyes, fluorescein, YPet, CyPet, Cascade blue, allophycocyanin, Cy3, Cy5, Cy7, rhodamine, dansyl, umbelliferone, Texas red, luminol, acridinium esters, biotin, green fluorescent protein (GFP), enhanced green fluorescent protein (EGFP), yellow fluorescent protein (YFP), enhanced yellow fluorescent protein (EYFP), blue fluorescent protein (BFP), red fluorescent protein (RFP), firefly luciferase, Renilla luciferase, NADPH, beta-galactosidase, horseradish peroxidase, glucose oxidase, alkaline phosphatase, chloramphenicol acetyl transferase, and urease. Labels can also include quantum dots.
[0180] In some embodiments, obtaining or retaining identifying information regarding the binder bound to the glycan attachment residue comprises analyzing the detectable label of the binder. In some embodiments, analyzing the detectable label of the binder comprises generating an amplification product (e.g., RCA product or bridge amplification product) (see, e.g., FIG. 10). In some embodiments, the methods further comprise incubating the detection probes comprising labels with the amplification product, washing away unbound detection probes, and detecting the labels, e.g., by imaging (see, e.g., FIG. 11). In some aspects, the method comprises sequential hybridization of detectably labelled probes to probes (e.g., at the overhangs) that hybridize to the amplification products, thereby generating a spatiotemporal signal signature or code that identifies or corresponds to an encoder barcode sequence in the amplification product, which can be used to identify the binder that binds to the TAA residue of the immobilized and cleaved polypeptide.
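By way of non-limiting illustration only, decoding a spatiotemporal signal signature can be sketched as a codebook lookup in Python. The sketch assumes that each spatially separated location (a "spot") yields one observed color per hybridization round and that the ordered sequence of colors corresponds to one encoder barcode, and therefore to one binder. The codebook contents, color names, spot identifiers and function name are all hypothetical.

```python
# Hypothetical codebook: ordered colors across hybridization rounds -> binder identity.
CODEBOOK = {
    ("red", "green", "red"): "binder_Asn",
    ("green", "green", "red"): "binder_Asp",
    ("red", "red", "green"): "binder_Gly",
}


def decode_spots(rounds):
    """Decode the binder at each spot from sequential hybridization rounds.

    rounds: list with one dict per hybridization round, each mapping a spot id
            to the color observed at that spot in that round.
    Returns a dict mapping spot id to a binder name, or to None when the
    observed signature is missing from the codebook (e.g., a dropped signal).
    """
    spots = set().union(*(r.keys() for r in rounds))
    decoded = {}
    for spot in spots:
        # The signature is the ordered tuple of colors seen at this spot.
        signature = tuple(r.get(spot) for r in rounds)
        decoded[spot] = CODEBOOK.get(signature)
    return decoded
```

With k distinguishable labels and n rounds, up to k^n signatures are available in such a scheme, which is one way sequential hybridization can distinguish more barcodes than there are labels.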
[0181] In some embodiments, fluorescence microscopy is used for detection and imaging of the binder’s detectable label or a detection probe. In some aspects, a fluorescence microscope is an optical microscope that uses fluorescence and phosphorescence instead of, or in addition to, reflection and absorption to study properties of organic or inorganic substances. In fluorescence microscopy, a sample (e.g., a support comprising a plurality of polypeptides each attached to a location in a plurality of spatially separated locations of the support) is illuminated with light of a wavelength which excites fluorescence in the sample. The fluoresced light, which is usually at a longer wavelength than the illumination, is then imaged through a microscope objective. Two filters may be used in this technique: an illumination (or excitation) filter which ensures the illumination is near monochromatic and at the correct wavelength, and a second emission (or barrier) filter which ensures none of the excitation light source reaches the detector. Alternatively, these functions may both be accomplished by a single dichroic filter. The “fluorescence microscope” comprises any microscope that uses fluorescence to generate an image, whether it is a simpler setup like an epifluorescence microscope, or a more complicated design such as a confocal microscope, which uses optical sectioning to get better resolution of the fluorescent image. In some embodiments, confocal microscopy is used for detection and imaging of the binder’s detectable label. Confocal microscopy uses point illumination and a pinhole in an optically conjugate plane in front of the detector to eliminate out-of-focus signal.
[0182] In some embodiments, the binder’s detectable label is a fluorescent probe, which is detected using an optical detection system, which may include, without limitation, near-field scanning microscopy, far-field confocal microscopy, charge-coupled device (CCD), total internal reflection fluorescence (TIRF) microscopy, super-resolution fluorescence microscopy, dark field microscopy, photoconversion, single and/or multiphoton excitation, spectral wavelength discrimination, fluorophore identification, evanescent wave illumination, and single-molecule localization microscopy. In some embodiments, methods include detection of laser-activated fluorescence using a microscope equipped with a camera. Suitable photon detection systems include, without limitation, photodiodes and intensified CCD cameras.
[0183] In some embodiments, imaging systems known in the art and used during detection of fluorescent signals in DNA sequencing methods may be used in methods disclosed herein. Exemplary imaging systems are disclosed in US9897791B2, US7589315B2, US8039817B2, US8412467B2 and US11834756B2, each of which is incorporated herein by reference.
[0184] In preferred embodiments of disclosed methods, polypeptides to be analyzed are attached to a solid support. In some embodiments, polypeptide analytes form an array on the solid support. In some embodiments, polypeptide analytes form an ordered array on the solid support. In some embodiments, polypeptide analytes form an unordered array on the solid support. In some embodiments, polypeptide analytes are covalently immobilized on the solid support. In some embodiments, polypeptide analytes are covalently attached to the solid support via nucleic acid molecules such as described in US 11,634,709 B2, incorporated by reference.
[0185] In some embodiments, the solid support is a solid support or surface used for immobilization of nucleic acid analytes during NGS nucleic acid sequencing. A variety of such supports are known in the art and used in NGS nucleic acid sequencing instruments, such as instruments from Illumina, Pacific Biosciences, or other companies which produce and sell NGS instruments. There are methods known in the art that allow for high-density immobilization of nucleic acids on solid supports, and those can be used in methods disclosed herein for high-density polypeptide immobilization on the support. In some embodiments, the solid support is a planar flow cell. In some embodiments, the solid support is a glass support with patterned or unpatterned nanowells containing capture nucleic acid probes that can be used in methods disclosed herein for polypeptide immobilization and analysis. In some embodiments, the solid support is a membrane, such as a nylon membrane. Exemplary materials and other parameters of solid support that can be used in methods disclosed herein are disclosed in US9902951 B2, US9758825 B2, US12104281 B2, US11203612B2, US5846719 A, US5667976A, US8698102B2, and US 11732301 B2, each of which is incorporated herein by reference.
[0186] In some embodiments of the disclosed methods, the amino acid identity of the amino acid residue treated with the PNGase is determined by detecting a signal generated via performing an in situ nucleic acid sequencing of the encoder barcode of the first binder or the second binder. In other embodiments of the disclosed methods, the amino acid identity of the amino acid residue treated with the PNGase is determined by detecting a signal generated via performing an in situ amplification of at least a portion of the nucleic acid coding tag that comprises the encoder barcode.
[0187] In some embodiments, the amplification product is sequenced using sequencing by synthesis in situ at the location of the polypeptide on the support, where a first population of detectably labeled nucleotides (e.g., dNTPs) is introduced to contact a template nucleotide (e.g., a barcode sequence in an RCA product or a bridge amplification product) hybridized to a sequencing primer, and a first detectably labeled nucleotide (e.g., A, T, C, or G nucleotide) is incorporated by a polymerase to extend the sequencing primer in the 5' to 3' direction using a complementary nucleotide (a first nucleotide residue) in the template nucleotide as template. A signal from the first detectably labeled nucleotide can then be detected. The first population of nucleotides may be continuously introduced, but in order for a second detectably labeled nucleotide to incorporate into the extended sequencing primer, nucleotides in the first population of nucleotides that have not incorporated into a sequencing primer are generally removed (e.g., by washing), and a second population of detectably labeled nucleotides is introduced into the reaction. Then, a second detectably labeled nucleotide (e.g., A, T, C, or G nucleotide) is incorporated by the same or a different polymerase to extend the already extended sequencing primer in the 5' to 3' direction using a complementary nucleotide (a second nucleotide residue) in the template nucleotide as template. Thus, in some embodiments, cycles of introducing and removing detectably labeled nucleotides are performed.
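By way of non-limiting illustration only, the cycle structure of sequencing by synthesis described above can be sketched in Python. This is an idealized model that assumes exactly one error-free incorporation per cycle and ignores chemistry, signal noise and phasing; the function name and the representation of the template as a string read in the 3' to 5' direction starting immediately past the primer are assumptions made for the example.

```python
# Watson-Crick pairing for DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def sequence_by_synthesis(template_3to5, cycles):
    """Return the bases called over the given number of cycles.

    template_3to5: template bases in the order the polymerase reads them
                   (3' to 5'), starting just past the sequencing primer.
    cycles:        number of introduce / detect / wash cycles to perform.
    The returned string is the primer extension written 5' to 3'.
    """
    calls = []
    for base in template_3to5[:cycles]:
        # Introduce labeled nucleotides; the complementary one is incorporated.
        incorporated = COMPLEMENT[base]
        # Detect the label of the incorporated nucleotide, then wash away
        # unincorporated nucleotides before the next cycle.
        calls.append(incorporated)
    return "".join(calls)
```

Each loop iteration stands in for one cycle of introducing a population of labeled nucleotides, detecting the incorporated nucleotide, and removing the nucleotides that were not incorporated.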
[0188] In some embodiments, the amplification product is sequenced using sequencing by synthesis which comprises contacting the amplification product with a nucleotide mix comprising a fluorescently labeled nucleotide and a nucleotide that is not fluorescently labeled. In some embodiments, a cognate nucleotide is incorporated by the polymerase into the sequencing primer or an extension product thereof, and the cognate nucleotide may or may not be fluorescently labeled. Sequencing by synthesis methods comprise those described in, for example, but not limited to, US 2007/0166705, US 2006/0188901, U.S. Pat. No. 7,057,026, US 2006/0240439, US 2006/0281109, US 2011/0059865, US 2005/0100900, U.S. Pat. No. 9,217,178, US 2009/0118128, US 2012/0270305, US 2013/0260372, and US 2013/0079232, all of which are herein incorporated by reference in their entireties.
[0189] In some embodiments, the amplification product is sequenced in situ at the location of the polypeptide on the support using a polymerase that is fluorescently labeled. In some embodiments, the amplification product is sequenced in situ at the location of the polypeptide on the support using a polymerase-nucleotide conjugate comprising a fluorescently labeled polymerase linked to a nucleotide moiety that is not fluorescently labeled.
[0190] In some embodiments, nucleic acid hybridization can be used for multiplex detection of amplification products, using labeled nucleic acid decoder probes that are complementary to at least a portion of a barcode sequence. Multiplex decoding can be performed with pools of many different probes with distinguishable labels. Non-limiting examples of nucleic acid hybridization sequencing are described, for example, in U.S. Pat. No. 8,460,865, and in Gunderson et al., Genome Research 14:870-877 (2004).
[0191] In some embodiments, the amplification product is sequenced using sequencing by binding, using a polymerase that is fluorescently labeled and one or more nucleotides that are not fluorescently labeled. In some embodiments, during sequencing by binding, a cognate nucleotide is not incorporated by the polymerase into the sequencing primer or an extension product thereof. In some embodiments, incorporation of a cognate nucleotide by the polymerase into the sequencing primer or an extension product thereof is attenuated or inhibited. Various aspects of sequencing by binding are described in U.S. Pat. No. 10,655,176 B2, the content of which is herein incorporated by reference in its entirety. In some embodiments, sequencing by binding comprises performing repetitive cycles of detecting a stabilized complex that forms at each position along the template nucleic acid to be sequenced (e.g. a ternary complex that includes the primed template nucleic acid, a polymerase, and a cognate nucleotide for the position), under conditions that prevent covalent incorporation of the cognate nucleotide into the primer, and then extending the primer to allow detection of the next position along the template nucleic acid. In the sequencing by binding approach, detection of the nucleotide at each position of the template occurs prior to extension of the primer to the next position. Generally, the methodology is used to distinguish the four different nucleotide types that can be present at positions along a nucleic acid template by uniquely labelling each type of ternary complex (i.e. different types of ternary complexes differing in the type of nucleotide it contains) or by separately delivering the reagents needed to form each type of ternary complex. In some instances, the labeling may comprise fluorescence labeling of, e.g., the cognate nucleotide or the polymerase that participate in the ternary complex.
[0192] In some embodiments, the detection by probe hybridization comprises sequential hybridization of fluorescently labeled probes to the amplification product (see, e.g., FIG. 10 and FIG. 11). In some embodiments, the analysis of the detectable label comprises detecting a polymer generated by a hybridization chain reaction (HCR) reaction, see e.g., US 2017/0009278, which is incorporated herein by reference in its entirety, for exemplary probes and HCR reaction components. In some embodiments, the analysis of the detectable label comprises hybridizing to the amplification product a detection oligonucleotide (e.g., a detection probe) labeled with a fluorophore, an isotope, a mass tag, or a combination thereof. In some embodiments, the detection or determination comprises imaging the amplification product, e.g., while the amplification product is attached to a location in a plurality of spatially separated locations of a support such as a flow cell.
[0193] In some embodiments, the analysis of binder’s detectable label in the provided methods comprises imaging the amplification product (e.g., RCA product or a rolling circle transcription product) via binding of the detection probe to the amplification product and detecting a label associated with the detection probe. In some embodiments, the label associated with the detection probe can be measured and quantitated.
[0194] In some embodiments, the detectable label of each binder comprises a unique epitope. In some embodiments, unique epitopes of binders of the set of binders are distinguished and detected by antibodies that each bind to a unique epitope. Using this approach, the signal may be further amplified using methods known in the art, such as using secondary antibodies to decorate primary antibodies upon binding to the cognate epitope.
[0195] In some embodiments of the disclosed methods, each of the immobilized polypeptides is attached to a nucleic acid recording tag.
[0196] In some embodiments, each binder of the set of binders has a certain level of specificity towards one or more TAA residues or modified TAA residues which the binder is engineered to bind. As used herein, binder specificity towards a TAA residue implies a combination of affinity, which is the strength of interaction, and selectivity, which is whether the binder prefers the one or more target TAA residues over other TAA residues of polypeptides. A certain level of affinity between the binder and the one or more target TAA residues is required in order to generate an extended nucleic acid construct following binding of the binder to the TAA residue of the polypeptide, because merely transient binding may not be enough to enable “productive” interaction between the coding tag of the binder and the recording tag of the polypeptide.
[0197] In some embodiments, a binder binds to a modified or labeled terminal amino acid (e.g., an NTAA that has been functionalized or modified). In some embodiments, a binder binds to a chemically or enzymatically modified terminal amino acid. A modified or labeled NTAA can be one that is functionalized with phenylisothiocyanate (PITC), 1-fluoro-2,4-dinitrobenzene (Sanger’s reagent, DNFB), benzyloxycarbonyl chloride or carbobenzoxy chloride (Cbz-Cl), N-(Benzyloxycarbonyloxy)succinimide (Cbz-OSu or Cbz-O-NHS), dansyl chloride (DNS-Cl, or 1-dimethylaminonaphthalene-5-sulfonyl chloride), 4-sulfonyl-2-nitrofluorobenzene (SNFB), N-Acetyl-Isatoic Anhydride, Isatoic Anhydride, 2-Pyridinecarboxaldehyde, 2-Formylphenylboronic acid, 2-Acetylphenylboronic acid, 1-Fluoro-2,4-dinitrobenzene, Succinic anhydride, 4-Chloro-7-nitrobenzofurazan, Pentafluorophenylisothiocyanate, 4-(Trifluoromethoxy)-phenylisothiocyanate, 4-(Trifluoromethyl)-phenylisothiocyanate, 3-(Carboxylic acid)-phenylisothiocyanate, 3-(Trifluoromethyl)-phenylisothiocyanate, 1-Naphthylisothiocyanate, N-nitroimidazole-1-carboximidamide, N,N'-Bis(pivaloyl)-1H-pyrazole-1-carboxamidine, N,N'-Bis(benzyloxycarbonyl)-1H-pyrazole-1-carboxamidine, an acetylating reagent, a guanidinylation reagent, a thioacylation reagent, a thioacetylation reagent, a thiobenzylation reagent, or a diheterocyclic methanimine reagent. In some embodiments, the binder binds an amino acid labeled by contacting with a reagent or using a method as described in US 2020/0348307 A1, US 2023/0220589 A1, or US 2020/0348307 A1, each incorporated herein by reference. In some embodiments, the binder binds an amino acid labeled by an amine modifying reagent.
[0198] In some embodiments, a modifier agent configured to modify a terminal amino acid (TAA) of a polypeptide to yield the modified TAA comprises a compound of any one of Formulas (1)-(10), wherein:
Formula (1) is
or a salt or conjugate thereof, wherein
R6 and R7 are each independently C1-6 alkyl, -CO2C1-4 alkyl, -ORk, aryl, heteroaryl, cycloalkyl or heterocyclyl, wherein the C1-6 alkyl, -CO2C1-4 alkyl, -ORk, aryl, and cycloalkyl are each unsubstituted or substituted; and
Rk is H, C1-6 alkyl, or heterocyclyl, wherein the C1-6 alkyl and heterocyclyl are each unsubstituted or substituted. Heterocyclyl can be a 5-8 membered ring comprising one or two heteroatoms selected from N, O and S as ring members. Heteroaryl can be a 5-6 membered single ring or 8-10 membered bicyclic ring, each of which comprises one to three heteroatoms selected from N, O and S as ring members;
Formula (2) is:
wherein: each R is independently C1-4 alkyl, optionally substituted with up to three groups selected from halo, C1-2 alkoxy, C1-2 haloalkyl, and C1-2 haloalkoxy; and two R groups on the same N can optionally cyclize to form a 5-7 membered ring, optionally containing an additional heteroatom selected from N, O and S as a ring member,
and optionally substituted with one or two groups selected from oxo, C1-2 alkyl, C1-2 alkoxy, C1-2 haloalkyl, and C1-2 haloalkoxy; and
G is selected from halo, benzotriazolyloxy, halobenzotriazolyloxy, pyridinotriazolyloxy, benzotriazolyl-N-oxide, pyridinotriazolyl-N-oxide, -O-(N-succinimide), 1-cyano-2-ethoxy-2-oxoethylideneaminooxy, and -O-(N-phthalimide);
Formula (3) is:
wherein Q is ORQ, OH, or OM, where M is a cationic counterion; each RQ is independently aryl or heteroaryl, each of which is optionally substituted with one or more groups selected from halo, nitro, cyano, sulfonate, carboxylate, alkylsulfonyl, and N of heteroaryl is optionally oxidized; or RQ can be -C(=O)R or -C(=O)-OR;
Cy is a 5 to 7 membered ring or an 8-10 membered bicyclic ring system, and Cy may be absent or present; when present, ring Cy may be saturated, unsaturated, or aromatic, and the dashed bond may be a single bond, double bond, or aromatic bond; when Cy is present, it may be a carbocyclic ring, or it may contain one to three heteroatoms selected from N, O, B, and S as ring members; and Cy is optionally substituted with one to six groups (or with one to four groups when Cy is aromatic) selected from halo, CN, NH2, NH(CH3), N(CH3)2, protected amine (e.g., N3, NO2, NHFmoc, NHBoc), C(O)NR2, NHC(O)R, B(OR)2, aryl, -SR4, -S(O)nR4, -NR4SO2R4, -SO2N(R4)2, heteroaryl, C1-C2 alkyl, C1-C2 haloalkyl, C1-C2 haloalkoxy, and -OR4; when ring Cy is absent, the dashed bond may be a single bond or a double bond, and the dashed bond is optionally substituted by one or two groups selected from halo, CN, C1-C2 alkyl, C1-C2 haloalkyl, C1-C2 haloalkoxy, CO2R4, and -OR4; each L1 is independently a bond or C1-C2 alkylene, C1-C2 haloalkylene, NHC(O), SO2, or NHSO2;
R2 and R2 can each be H or a side chain of an amino acid, e.g. one of the side chains of the 20 common amino acid side chains, optionally protected amino acid side chains, post-translationally modified amino acid side chains, unnatural amino acid sidechains;
or R2 or R2 can be an aryl, heteroaryl, bicyclic aryl, or bicyclic heteroaryl, each of which is optionally substituted with up to three groups independently selected from halo, cyano, azido, amino, C1-C2 alkyl, C1-C2 haloalkyl, C1-C2 alkoxy, and C1-C2 haloalkoxy;
' represents an optional link between R2 and L1, forming a 5-6 membered ring; n at each occurrence is independently 1 or 2; and each R and R4 is independently selected from H, C1-2 alkyl, and C1-C2 haloalkyl;
Formula (4) is:
wherein Q is OH, ORQ or OM, each RQ is independently aryl or heteroaryl, each of which is optionally substituted with one or more groups selected from halo, nitro, cyano, sulfonate, carboxylate, alkylsulfonyl, and N of heteroaryl is optionally oxidized; or RQ can be -C(=O)R or -C(=O)-OR; and M is a cationic counterion;
W is a bond or a group selected from alkyl, cycloalkyl, heterocyclyl, aryl, heteroaryl, and bicyclic heteroaryl, each of which is optionally substituted with up to four groups independently selected from halo, OH, cyano, azido, -SR4, -S(O)nR4, -NR4SO2R4, -SO2N(R4)2, -B(OR4)2, oxo (unless W is aromatic), amino, C1-C2 alkyl, C1-C2 haloalkyl, C1-C2 alkoxy, and C1-C2 haloalkoxy; when W is a ring, ring W may be saturated, unsaturated, or aromatic; when W is a heterocyclic or heteroaromatic ring, it may contain one or two heteroatoms selected from N, O and S as ring members; represents an optional linkage connecting R10 and L2 into a 5-6 membered ring, optionally including an additional N, O or S as a ring member;
R10 is selected from H, halo, CN, NH2, NH(CH3), N(CH3)2, NO2, NHFmoc, NHBoc, C(O)NR2, NHC(O)R, NHC(O)OR, B(OR)2, aryl, heteroaryl, C1-C2 alkyl, C1-C2 haloalkyl, C1-C2 haloalkoxy, and -OR4; and R10 is absent when W is a bond;
L2 and L3 are independently selected from a bond, CH2, SO2R, NHSO2R, C(=O)R, RNHC(=O), RNCH3C(=O), C1-C2 alkylene, C1-C2 haloalkylene, or triazole;
each R is independently selected from C1-6 alkyl, phenyl, and benzyl, each of which is optionally substituted with up to three groups selected from halo, CN, C1-C2 haloalkyl, C1-C2 haloalkoxy, CO2R4, and -OR4;
Cy is a 5 to 7 membered ring or an 8-10 membered bicyclic ring system, and Cy may be absent or present; when present, ring Cy may be saturated, unsaturated, or aromatic, and the dashed bond may be a single bond, double bond, or aromatic bond; when Cy is present, it may be a carbocyclic ring, or it may contain one to three heteroatoms selected from N, O, B, and S as ring members; and Cy is optionally substituted with one to six groups (or with one to four groups when Cy is aromatic) selected from halo, CN, NH2, NH(CH3), N(CH3)2, protected amine (e.g., N3, NO2, NHFmoc, NHBoc), C(O)NR2, NHC(O)R, B(OR)2, aryl, heteroaryl, C1-C2 alkyl, C1-C2 haloalkyl, C1-C2 haloalkoxy, -SR4, -S(O)nR4, -NR4SO2R4, -SO2N(R4)2, and -OR4; when ring Cy is absent, the dashed bond may be a single bond or a double bond, and the dashed bond is optionally substituted by one or two groups selected from halo, CN, C1-C2 alkyl, C1-C2 haloalkyl, C1-C2 haloalkoxy, CO2R4, and -OR4; each L1 is independently a bond or C1-C2 alkylene, C1-C2 haloalkylene, NHC(O), SO2, or NHSO2; n at each occurrence is independently 1 or 2; and
R4 is independently selected at each occurrence from H, C1-C2 alkyl, and C1-C2 haloalkyl;
Formula (5) is:
wherein Q is OH, ORQ or OM, each RQ is independently aryl or heteroaryl, each of which is optionally substituted with one or more groups selected from halo, nitro, cyano, sulfonate, carboxylate, alkylsulfonyl, and N of heteroaryl is optionally oxidized; or RQ can be -C(=O)R or -C(=O)-OR; and M is a cationic counterion;
' represents an optional link between R2 and nitrogen, forming a 5-6 membered ring: when the optional link is present, R5 is absent;
Cy is a 5 to 7 membered ring or an 8-10 membered bicyclic ring system, and Cy may be absent or present; when present, ring Cy may be saturated, unsaturated, or aromatic, and the dashed bond may be a single bond, double bond, or aromatic bond; when Cy is present, it may be a carbocyclic ring, or it may contain one to three heteroatoms selected from N, O, B, and S as ring members; and Cy is optionally substituted with one to six groups (or with one to four groups when Cy is aromatic) selected from halo, CN, NH2, NH(CH3), N(CH3)2, protected amine (e.g., N3, NO2, NHFmoc, NHBoc), C(O)NR2, NHC(O)R, B(OR)2, aryl, heteroaryl, C1-C2 alkyl, C1-C2 haloalkyl, C1-C2 haloalkoxy, -SR4, -S(O)nR4, -NR4SO2R4, -SO2N(R4)2, and -OR4; when ring Cy is absent, the dashed bond may be a single bond or a double bond, and the dashed bond is optionally substituted by one or two groups selected from halo, CN, C1-C2 alkyl, C1-C2 haloalkyl, C1-C2 haloalkoxy, CO2R4, and -OR4;
R2 and R2 can each be the side chain of an amino acid, e.g. one of the side chains of the 20 common amino acid side chains, optionally protected amino acid side chains, post-translationally modified amino acid side chains, unnatural amino acid sidechains; or
R2 and R2 can each be H or a group selected from aryl, heteroaryl, bicyclic aryl, bicyclic heteroaryl, and heterocyclyl, each of which is optionally substituted with one to six groups (or with one to four groups when R2 or R2 is aromatic) selected from halo, CN, NH2, NH(CH3), N(CH3)2, protected amine (e.g., N3, NO2, NHFmoc, NHBoc), C(O)NR2, NHC(O)R, B(OR)2, aryl, heteroaryl, C1-C2 alkyl, C1-C2 haloalkyl, C1-C2 haloalkoxy, and -OR4; each R and R4 is independently selected at each occurrence from H, C1-C2 alkyl, and C1-C2 haloalkyl; n at each occurrence is independently 1 or 2; and
R5 is independently selected at each occurrence from H, C1-C2 alkyl, C1-C2 haloalkyl, C1-C2 alkoxy, and C1-C2 haloalkoxy;
Formula (6) is:
wherein Q is OH, ORQ or OM, each RQ is independently aryl or heteroaryl, each of which is optionally substituted with one or more groups selected from halo, nitro, cyano, sulfonate, carboxylate, alkylsulfonyl, and N of heteroaryl is optionally oxidized; or RQ can be -C(=O)R or -C(=O)-OR;
M is a cationic counterion;
G1-G5 are each independently selected from CH, CJ, BN, BO, and N, provided not more than 3 of G1-G5 are N; the dashed bonds can be single bonds or double bonds;
J at each occurrence is independently selected from H, C1-C2 alkyl, NO2, C1-C2 haloalkyl, C1-C2 haloalkoxy, halo, -OR8, -N(R8)2, -SR8, -S(O)nR8, -NR8SO2R8, -SO2N(R8)2, SO3R8, -B(OR8)2, C(=O)R8, CN, CON(R8)2, -COOR8, -C(=O)Ar, and tetrazole, where Ar represents a phenyl or 5-6 membered heteroaryl ring that is optionally substituted with one or two groups selected from halo, CN, R8 and OR8;
R2 and R2 can each be the side chain of an amino acid, e.g. one of the side chains of the 20 common amino acid side chains, optionally protected amino acid side chains, post-translationally modified amino acid side chains, unnatural amino acid sidechains; or
R2 and R2 can each be H or a group selected from aryl, heteroaryl, bicyclic aryl, bicyclic heteroaryl, and heterocyclyl, each of which is optionally substituted with one to six groups (or with one to four groups when R2 or R2 is aromatic) selected from halo, CN, NH2, NH(CH3), N(CH3)2, protected amine (e.g., N3, NO2, NHFmoc, NHBoc), C(O)NR2, NHC(O)R, B(OR)2, aryl, heteroaryl, C1-C2 alkyl, C1-C2 haloalkyl, C1-C2 haloalkoxy, and -OR4; each R, R4 and R8 is independently selected at each occurrence from H, C1-C2 alkyl, and C1-C2 haloalkyl; and n at each occurrence is independently 1 or 2; and
R9 is H, CH3, benzyl, or substituted benzyl;
Formula (7) is:
wherein Q is OH, ORQ or OM,
and when Q is OH or OM, the chemical reagent also comprises a peptide coupling reagent such as a compound of formula (1) or formula (2); each RQ is independently aryl or heteroaryl, each of which is optionally substituted with one or more groups selected from halo, nitro, cyano, sulfonate, carboxylate, alkylsulfonyl, and N of heteroaryl is optionally oxidized; or RQ can be -C(=O)R or -C(=O)-OR; in some embodiments, RQ is 4-nitrophenyl, 2,4-dinitrophenyl, 4-fluorophenyl, 2,4-difluorophenyl, 2,3,4,5,6-pentafluorophenyl, 2,3,5,6-tetrafluorophenyl, 4-sulfo-2,3,5,6-tetrafluorophenyl, halogen, imidazole, pyrazole, benzotriazole, or triazole; and M is a cationic counterion;
G1-G5 are each independently selected from CH, CJ, BN, BO, and N, provided not more than 3 of G1-G5 are N;
' represents an optional link between R2 and the nitrogen atom, forming a 5-6 membered ring: when the link is present, R11 is absent;
J at each occurrence is independently selected from H, C1-C2 alkyl, NO2, C1-C2 haloalkyl, C1-C2 haloalkoxy, halo, -OR8, -N(R8)2, -SR8, -S(O)nR8, -NR8SO2R8, -SO2N(R8)2, SO3R8, -B(OR8)2, C(=O)R8, CN, CON(R8)2, -COOR8, -C(=O)Ar, and tetrazole, where Ar represents a phenyl or 5-6 membered heteroaryl ring that is optionally substituted with one or two groups selected from halo, CN, R8 and OR8;
R2 and R2 can each be the side chain of an amino acid, e.g. one of the side chains of the 20 common amino acid side chains, optionally protected amino acid side chains, post-translationally modified amino acid side chains, unnatural amino acid side chains; or R2 and R2 can each be H or a group selected from aryl, heteroaryl, bicyclic aryl, bicyclic heteroaryl, and heterocyclyl, each of which is optionally substituted with one to six groups (or with one to four groups when R2 or R2 is aromatic) selected from halo, CN, NH2, NH(CH3), N(CH3)2, protected amine (e.g., N3, NO2, NHFmoc, NHBoc), C(O)NR2, NHC(O)R, B(OR)2, aryl, heteroaryl, C1-C2 alkyl, C1-C2 haloalkyl, C1-C2 haloalkoxy, and -OR4; each R, R4 and R8 is independently selected at each occurrence from H, C1-C2 alkyl, and C1-C2 haloalkyl; n at each occurrence is independently 1 or 2; and
R11 is H, CH3, benzyl, or substituted benzyl;
Formula (8) is:
wherein Q is OH, ORQ or OM, each RQ is independently aryl or heteroaryl, each of which is optionally substituted with one or more groups selected from halo, nitro, cyano, sulfonate, carboxylate, alkylsulfonyl, and N of heteroaryl is optionally oxidized; or RQ can be -C(=O)R or -C(=O)-OR;
M is a cationic counterion;
G1-G4 are each independently selected from CH, CJ, BN, BO, and N, provided not more than 3 of G1-G4 are N;
J at each occurrence is independently selected from H, C1-C2 alkyl, NO2, C1-C2 haloalkyl, C1-C2 haloalkoxy, halo, -OR8, -N(R8)2, -SR8, -S(O)nR8, -NR8SO2R8, -SO2N(R8)2, SO3R8, -B(OR8)2, C(=O)R8, CN, CON(R8)2, -COOR8, -C(=O)Ar, and tetrazole, where Ar represents a phenyl or 5-6 membered heteroaryl ring that is optionally substituted with one or two groups selected from halo, CN, R8 and OR8;
R2 and R2 can each be the side chain of an amino acid, e.g. one of the side chains of the 20 common amino acid side chains, optionally protected amino acid side chains, post-translationally modified amino acid side chains, unnatural amino acid side chains; or R2 and R2 can each be H or a group selected from aryl, heteroaryl, bicyclic aryl, bicyclic heteroaryl, and heterocyclyl, each of which is optionally substituted with one to six groups (or with one to four groups when R2 or R2 is aromatic) selected from halo, CN, NH2, NH(CH3), N(CH3)2, protected amine (e.g., N3, NO2, NHFmoc, NHBoc), C(O)NR2, NHC(O)R, B(OR)2, aryl, heteroaryl, C1-C2 alkyl, C1-C2 haloalkyl, C1-C2 haloalkoxy, and -OR4; each R, R4 and R8 is independently selected at each occurrence from H, C1-C2 alkyl, and C1-C2 haloalkyl; n at each occurrence is independently 1 or 2; and
R12 represents one or two optional substituents on the pyridinium ring, which are independently selected from C1-C2 alkyl, C1-C2 haloalkyl, C1-C2 haloalkoxy, and halo;
Formula (9) is:
wherein:
G1-G4 are each independently selected from CH, CJ, and N, provided not more than 3 of G1-G4 are N;
J at each occurrence is independently selected from H, C1-C2 alkyl, NO2, C1-C2 haloalkyl, C1-C2 haloalkoxy, halo, -OR8, -N(R8)2, -SR8, -S(O)nR8, -NR8SO2R8, -SO2N(R8)2, SO3R8, -B(OR8)2, C(=O)R8, CN, CON(R8)2, -COOR8, -C(=O)Ar, and tetrazole, where Ar represents a phenyl or 5-6 membered heteroaryl ring that is optionally substituted with one or two groups selected from halo, CN, R8 and OR8; each R8 is independently selected from H, C1-C2 alkyl, and C1-C2 haloalkyl; n at each occurrence is independently 1 or 2; and
R13 is selected from H, C1-C2 alkyl, C1-C2 alkoxy, C1-C2 haloalkyl, and C1-C2 haloalkoxy; and
Formula (10) is
wherein:
M is a metal binding group selected from the group consisting of sulfonamide, hydroxamic acid, sulfamate, and sulfamide; the group is a 5 or 6 membered aromatic ring containing up to three heteroatoms selected from N, O, and S as ring members, and is optionally substituted by R; R represents one or two optional substituents selected from the group consisting of F, Cl, CH3, CF2H, CF3, OH, OCH3, OCF3, NH2, N(CH3)2, NO2, SCH3, SO2CH3, CH2OH, B(OH)2, CN, CONH2, and CONHCH3; and LG is a leaving group.
[0199] In preferred embodiments of the disclosed methods, the N-terminal modifier agent is selected from the group consisting of compounds of the following Formulas (10)-(13):
(10), (11), (12), and (13), wherein M is a metal binding group that comprises sulfonamide, hydroxamic acid, sulfamate, or sulfamide; the group
is a 5 or 6 membered aromatic ring containing up to three heteroatoms selected from N, O, and S as ring members, and is optionally substituted by R; R represents one or two optional substituents selected from the group consisting of F, Cl, CH3, CF2H, CF3, OH, OCH3, OCF3, NH2, N(CH3)2, NO2, SCH3, SO2CH3, CH2OH, B(OH)2, CN, CONH2, CO2H, CN4H, and CONHCH3; LG is OH, ORQ, or OCC, each RQ is independently aryl or heteroaryl, each of which is optionally substituted with one or more groups selected from halo, nitro, cyano, sulfonate, carboxylate, alkylsulfonyl, and N of heteroaryl is optionally oxidized; or RQ can be -C(=O)R or -C(=O)-OR; CC is a cationic counterion; X is one of the following: O, S, Se, or NH. In preferred embodiments, the metal binding group is a Zn cation binding group.
[0200] The methods disclosed herein employ binders that specifically bind to terminal amino acid residues of polypeptides, which are used in the NGPS assay. Such binders may be developed by a number of approaches as disclosed in the art, and some of them are outlined below in Example 1.
[0201] In preferred embodiments of the disclosed methods, given that the selectivity of each binder towards terminal amino acid (TAA) residues is known, information regarding the identity of the TAA residue of the analyzed immobilized polypeptide is encoded in a unique nucleic acid barcode present in the extended recording tag. This nucleic acid barcode may be used to decode the identity of the TAA residue by using known information regarding the binding kinetics and/or specificity of the binders bound to the polypeptide at a given binding cycle.
[0202] In some embodiments, a first group of binders is determined, each of which bound to an individual TAA residue of each polypeptide analyte. In some embodiments, for each polypeptide analyte to be identified and for each binding cycle, one can determine both the binders that bind to the polypeptide analyte and the binders that do not. In some embodiments, one can use one or more deconvolution methods based on the known binding properties of the binders to match the group of binders to a sequence of a polypeptide, thereby determining the identity of each of the polypeptide analytes present in a sample. In preferred embodiments, both the known specificities of the binders for NTAA or CTAA residues and their order of binding to a polypeptide analyte are used to decode the identity of the polypeptide analyte.
[0203] In some embodiments, the nucleic acid barcode may be used as an input to a probabilistic neural network which was trained to relate the sequence of the barcode to amino acid identity. Training can be performed by testing each binder individually (optionally, conjugated to a coding tag) against a panel of peptides each having a different NTAA or CTAA residue (optionally, with an associated recording tag), collecting sequence information of the recording tags extended after the binding, and feeding the collected information to the probabilistic neural network. Alternatively, training can be performed by testing a set of binders (optionally, each conjugated to a coding tag) against the panel of peptides, collecting sequence information of the recording tags extended after the binding, and feeding the collected information to the probabilistic neural network.
[0204] In some embodiments, after several cycles of contacting/transferring (also known as “encoding”), each immobilized polypeptide is back-translated into a series of unique nucleic acid barcodes on the corresponding recording tag attached to the immobilized peptide. During the analysis step, the sequence of the extended recording tag can be analyzed to extract the abovementioned nucleic acid barcodes that correspond to each encoding cycle. Then, to associate the extracted nucleic acid barcodes with corresponding amino acid residues, an artificial intelligence (AI) model can be applied to calculate probabilities of occurrence of specific types of amino acid residues at corresponding positions in the amino acid sequence of the analyzed peptide. In preferred embodiments, the AI model can be pre-trained using multiple known peptide sequences, which were used to generate encoding nucleic acid data on associated recording tags. Modeling the encoding of multiple known peptides using known binders allows the AI model to be trained to faithfully predict amino acid residues based on provided barcode nucleic acid sequences.
[0205] In some embodiments, the generated DNA barcodes on the extended recording tag of each polypeptide analyte are input to a probabilistic neural network (PNN) which will learn to relate the sequence of a DNA barcode to an amino acid identity. Probabilistic neural networks (Mohebali, B., et al., Chapter 14 - Probabilistic neural networks: a brief overview of theory, implementation, and application, in Handbook of Probabilistic Models, P. Samui, et al., Editors. 2020, Butterworth-Heinemann, p. 347-367) can approach Bayes optimal classification for multiclass problems such as amino acid identification from DNA barcodes (Klocker, J., et al., Bayesian Neural Networks for Aroma Classification. Journal of Chemical Information and Computer Sciences, 2002. 42(6): p. 1443-1449). A classifier based on PNN is guaranteed to learn and converge to an optimal classifier as the size of the representative data set increases. Probabilistic neural networks have parallel structure such that data from any amino acid residue are used to learn all other amino acid residues.
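A probabilistic neural network of the kind referred to above can be sketched as a Parzen-window classifier: a pattern layer of Gaussian kernels centered on the training examples, a summation layer that averages the kernel responses per class, and an output layer that picks the most probable class. The following is a minimal illustration only, not the implementation of the disclosed methods. The feature representation (one numeric vector per extended recording tag, e.g., the fraction of reads carrying each binder barcode), the function name pnn_predict, the kernel width sigma, and the toy data are all assumptions introduced for this sketch.

```python
import numpy as np

def pnn_predict(train_x, train_y, query_x, sigma=0.2):
    """Classify query rows with a basic probabilistic neural network.

    train_x: (n_train, n_features) numeric features (assumed encoding).
    train_y: (n_train,) class labels, e.g. amino acid letters.
    query_x: (n_query, n_features) features to classify.
    sigma:   Gaussian kernel width (a tunable assumption).
    """
    train_x = np.asarray(train_x, dtype=float)
    query_x = np.asarray(query_x, dtype=float)
    train_y = np.asarray(train_y)
    classes = np.unique(train_y)

    # Pattern layer: Gaussian kernel between each query and each training example.
    sq_dist = ((query_x[:, None, :] - train_x[None, :, :]) ** 2).sum(axis=2)
    kernel = np.exp(-sq_dist / (2.0 * sigma ** 2))

    # Summation layer: mean kernel response per class.
    scores = np.stack(
        [kernel[:, train_y == c].mean(axis=1) for c in classes], axis=1
    )

    # Output layer: normalize (equal class priors assumed) and pick the best class.
    total = scores.sum(axis=1, keepdims=True)
    probs = scores / np.where(total > 0, total, 1.0)
    return classes[probs.argmax(axis=1)], probs

# Toy data: hypothetical read fractions for three binder barcodes.
train_x = [[0.90, 0.10, 0.00], [0.80, 0.15, 0.05],
           [0.10, 0.85, 0.05], [0.20, 0.80, 0.00]]
train_y = ["N", "N", "D", "D"]
labels, probs = pnn_predict(train_x, train_y, [[0.85, 0.10, 0.05]])
print(labels[0], probs[0])
```

Because every training example contributes a kernel, adding more representative training data refines the estimated class densities without any retraining step, which is the property the paragraph above relies on.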
[0206] In some embodiments, the disclosed methods are used for peptide sequence determination based on probabilistic neural network ensembles. The machine learning method is characterized in that the sequence determination can be realized by the following steps: i) the peptide fragments of proteins are encoded using binders into stretches of DNA sequences based on the physicochemical properties of amino acid residues; ii) a group of probabilistic neural network sub-classifiers are established, peptide fragments of proteins with known sequence are used to perform amino acid classification training and obtain a group of trained amino acid classification models; iii) the obtained models are utilized to determine peptide amino acid sequences in the test data sets; iv) the classification results output by the models are counted to generate amino acid candidate sets; v) the methods showing highest accuracy are combined to determine the amino acid sequence of protein peptide fragment; and vi) the algorithmic amino acid determination result is verified through k-fold cross-validation, where k is an integer.
[0207] In some embodiments, k-fold cross-validation operates as follows. In k-fold cross-validation, the dataset is shuffled and divided randomly into k groups without overlap or replacement. This means each group is unique and is used for model evaluation only once. The data groups are carried through the following steps to perform the k-fold cross-validation:
1) A unique group is taken as a test data set;
2) The remaining (k-1) groups are used as a training data set;
3) A model is built using the training set and is evaluated using the test set;
4) The evaluation score is retained and the model is discarded;
5) Steps 1-4 are repeated until all k groups are used for model evaluation;
6) The mean of the k evaluation scores is output as the k-fold cross-validated model performance score.
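The six steps above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the procedure used in the disclosure: the function names (k_fold_score, fit_centroids, accuracy) are hypothetical, and a simple nearest-centroid classifier stands in for whatever amino acid classification model is actually being validated.

```python
import numpy as np

def k_fold_score(x, y, k, fit, score, seed=0):
    """Return the mean evaluation score over k disjoint test folds."""
    x, y = np.asarray(x, dtype=float), np.asarray(y)
    if not 2 <= k <= len(x):
        raise ValueError("k must be between 2 and the number of samples")
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(x))        # shuffle the dataset once
    folds = np.array_split(order, k)       # k groups, no overlap
    scores = []
    for i, test_idx in enumerate(folds):   # step 1: one group is the test set
        # Step 2: the remaining (k-1) groups form the training set.
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        # Step 3: build the model on the training set and evaluate on the test set.
        model = fit(x[train_idx], y[train_idx])
        # Step 4: keep the score; the model goes out of scope (is discarded).
        scores.append(score(model, x[test_idx], y[test_idx]))
    # Step 5 is the loop itself; step 6 is the mean of the k scores.
    return float(np.mean(scores))

def fit_centroids(x, y):
    """Stand-in model (assumption): one mean feature vector per class."""
    return {label: x[y == label].mean(axis=0) for label in np.unique(y)}

def accuracy(model, x, y):
    """Fraction of samples whose nearest class centroid matches the true label."""
    labels = list(model)
    centroids = np.stack([model[label] for label in labels])
    nearest = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
    return float((np.array(labels)[nearest] == y).mean())

# Toy, well-separated data for two residue classes.
x = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
y = ["N", "N", "N", "D", "D", "D"]
print(k_fold_score(x, y, k=3, fit=fit_centroids, score=accuracy))
```

Passing fit and score as arguments keeps the fold logic independent of the model, so the same routine could wrap a probabilistic neural network or any other classifier.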
[0208] In some embodiments, the nucleic acid barcodes on the extended recording tag of each polypeptide analyte are input to a probabilistic neural network (PNN), which will learn to relate the nucleic acid sequence of a barcode to an amino acid identity of the analyzed polypeptide. In other embodiments, other statistical models (e.g., hidden Markov models) and machine learning methods (e.g., random forest models) can be used for classifying an NGS read from each extended recording tag into a specific amino acid residue.
[0209] In certain embodiments, the binder further comprises one or more detectable labels such as fluorescent labels, in addition to the binding moiety that recognizes a terminal amino acid residue in polypeptides. In some embodiments, the binder does not comprise a polynucleotide such as a coding tag. Optionally, the binder comprises a synthetic or natural antibody. In some embodiments, the binder comprises an aptamer. In one embodiment, the binder comprises an engineered protein binder, such as one disclosed in Example 1 below, and a detectable label. In one embodiment, the detectable label is optically detectable. In some embodiments, the detectable label comprises a fluorescent moiety, a color-coded nanoparticle, a quantum dot or any combination thereof. In one embodiment, the label comprises a polystyrene dye encompassing a core dye molecule such as a FluoSphere™, Nile Red, fluorescein, rhodamine, derivatized rhodamine dyes, such as TAMRA, phosphor, polymethadine dye, fluorescent phosphoramidite, TEXAS RED, green fluorescent protein, acridine, cyanine, cyanine 5 dye, cyanine 3 dye, 5-(2'-aminoethyl)-aminonaphthalene-1-sulfonic acid (EDANS), BODIPY, 120 ALEXA or a derivative or modification of any of the foregoing. In one embodiment, the detectable label is resistant to photobleaching while producing signal (such as photons) at a unique and easily detectable wavelength, with high signal-to-noise ratio.
[0210] In some embodiments of the disclosed methods, each binder comprises a detectable label, and the detectable label is used to record binding between a polypeptide analyte and the binder. Upon binding of a binder to a terminal amino acid residue of a polypeptide, by measuring fluorescent signal (e.g., intensity, lifetime, etc.) of the detectable label of the binder on an integrated semiconductor chip, the identity of the terminal amino acid residue can be determined with certain probability. In some embodiments, N-binder and D-binder comprise different distinguishable detectable labels, and upon detection of the signal from one of these detectable labels, information regarding glycosylation at the glycosylation site can be inferred. Further optical detection methods used for identification of terminal amino acid residues by binders conjugated with a detectable label are disclosed in the patent publications US 20210364527 A1, US 20210139973 A1, US 20200209257 A1, US 11549942 B2, US 11282586 B2, US 20210239705 A1, each of which is incorporated by reference herein.
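As an illustration of how per-molecule N-binder and D-binder calls might be turned into a site occupancy estimate, the sketch below counts the calls observed at one glycosylation site across many molecules of the same polypeptide. It rests on the premise, used throughout the disclosed methods, that PNGase treatment leaves an Asp (D) residue where a glycan had been attached and leaves an unglycosylated Asn (N) residue unchanged. The function name site_occupancy and the input format (a list of "N"/"D" calls) are assumptions for this example, and background effects such as non-enzymatic deamidation or binder cross-reactivity are deliberately ignored.

```python
from collections import Counter

def site_occupancy(calls):
    """Estimate the fraction of molecules glycosylated at one site.

    calls: iterable of per-molecule residue calls for the glycan attachment
           residue after PNGase treatment; "D" is read as formerly
           glycosylated and "N" as never glycosylated. Other values
           (e.g. ambiguous calls) are ignored.
    """
    counts = Counter(calls)
    n_calls, d_calls = counts["N"], counts["D"]
    if n_calls + d_calls == 0:
        raise ValueError("no N or D calls available for this site")
    return d_calls / (n_calls + d_calls)

# Three molecules called D and one called N give an estimated occupancy of 0.75.
print(site_occupancy(["D", "D", "N", "D", "?"]))
```

In practice the raw ratio would likely need correction with control samples (for example, a sample not treated with PNGase), which this sketch does not attempt.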
Attachment to the support
[0211] In some embodiments, the polypeptide is joined to a support before performing the encoding reaction. In some cases, it is desirable to use a support with a large carrying capacity to immobilize a large number of polypeptides. In some embodiments, it is preferred to immobilize the polypeptides using a three-dimensional support (e.g., a porous matrix or a porous bead). For example, the preparation of the polypeptides including joining the polypeptide to a support is performed prior to performing the binding reaction. In some examples, the preparation of the polypeptide including joining the polypeptide to a nucleic acid molecule or an oligonucleotide may be performed prior to or after immobilizing the polypeptide. In some embodiments, a plurality of polypeptides are attached to a support prior to the binding reaction and contacting with a binder.
[0212] In some embodiments, the support may comprise any suitable solid material, including porous and non-porous materials, to which a polypeptide can be associated directly or indirectly, by any means known in the art, including covalent and non-covalent interactions, or any combination thereof.
[0213] Various reactions may be used to attach the polypeptide analytes to a support. The polypeptides may be attached directly or indirectly to the support. In some cases, the polypeptides are attached to the support via a nucleic acid (e.g., via a nucleic acid recording tag). Exemplary reactions include click chemistry reactions, such as the copper catalyzed reaction of an azide and alkyne to form a triazole (Huisgen 1,3-dipolar cycloaddition), strain-promoted azide alkyne cycloaddition (SPAAC), reaction of a diene and dienophile (Diels-Alder), strain-promoted alkyne-nitrone cycloaddition, reaction of a strained alkene with an azide, tetrazine or tetrazole, alkene and azide [3+2] cycloaddition, alkene and tetrazine inverse electron demand Diels-Alder (iEDDA) reaction (e.g., m-tetrazine (mTet) or phenyl tetrazine (pTet) and trans-cyclooctene (TCO); or pTet and an alkene), alkene and tetrazole photoreaction, Staudinger ligation of azides and phosphines, and various displacement reactions, such as displacement of a leaving group by nucleophilic attack on an electrophilic atom (Horisawa 2014, Knall, Hollauf et al. 2014). Exemplary displacement reactions include reaction of an amine with: an activated ester; an N-hydroxysuccinimide ester; an isocyanate; an isothiocyanate; an aldehyde; an epoxide; or the like. In some embodiments, iEDDA click chemistry is used for immobilizing polypeptides to a support since it is rapid and delivers high yields at low input concentrations. In another embodiment, m-tetrazine rather than tetrazine is used in an iEDDA click chemistry reaction, as m-tetrazine has improved bond stability. In another embodiment, phenyl tetrazine (pTet) is used in an iEDDA click chemistry reaction. In one case, a polypeptide is labeled with a bifunctional click chemistry reagent, such as alkyne-NHS ester (acetylene-PEG-NHS ester) reagent or alkyne-benzophenone to generate an alkyne-labeled polypeptide. In some embodiments, an alkyne can also be a strained alkyne, such as cyclooctynes including dibenzocyclooctyne (DBCO).
[0214] Similar methods (e.g., click chemistry reactions, bioorthogonal reactions) can be used to attach the polypeptide analyte to the associated nucleic acid recording tag, or to attach the binder to the associated detectable label or nucleic acid coding tag. Such attachments can be achieved by introducing a reactive moiety or moieties on one or on both attachment partners. Some specific examples are described in the Examples below.
[0215] In some embodiments of the disclosed methods, a plurality of different polypeptides is immobilized on a solid support, wherein each polypeptide of the plurality of different polypeptides is attached to a nucleic acid recording tag. Various possible ways exist for association between an immobilized polypeptide and the associated nucleic acid recording tag. A recording tag may be directly linked to the polypeptide, linked to a polypeptide via a linker, via a multifunctional linker, or attached to a polypeptide by virtue of its proximity (or colocalization) on the support. In some embodiments, the recording tag is attached to the support, and the polypeptide is immobilized on the support via the recording tag. In some embodiments, a linker is attached to the support, and the polypeptide and the recording tag are independently attached to the linker, thereby generating immobilization on the support and association of the polypeptide with the recording tag. Other immobilization and association variants are possible.
[0216] In some embodiments, at least one recording tag is associated or co-localized directly or indirectly with the polypeptide. In another embodiment, multiple recording tags are attached to the polypeptide, such as to the lysine residues or peptide backbone. In some embodiments, a polypeptide labeled with multiple recording tags is fragmented or digested into smaller peptides, with each peptide labeled on average with one recording tag. A recording tag may be single stranded, or partially or completely double stranded. In some embodiments, the recording tag may comprise a unique molecular identifier, a compartment tag, a partition barcode, a sample barcode, a fraction barcode, a spacer sequence, a universal priming site, or any combination thereof. In some embodiments, the recording tag may comprise a blocking group, such as at the 3’-terminus of the recording tag. In some cases, the 3’-terminus of the recording tag is blocked to prevent extension of the recording tag by a polymerase.
[0217] In some embodiments, the recording tag can include a sample identifying barcode. A sample barcode is useful in the multiplexed analysis of a set of samples in a single reaction vessel or immobilized to a single solid support (e.g., a bead or a planar substrate) or collection of solid supports. For example, polypeptides from many different samples can be labeled with recording tags with sample-specific barcodes, and then all the samples pooled together prior to immobilization to a support, cyclic binding of the binder, and recording tag analysis. In certain embodiments, a recording tag comprises an optional unique molecular identifier (UMI), which provides a unique identifier tag for each polypeptide with which the UMI is associated. A UMI can be used to de-convolute sequencing data from a plurality of extended recording tags to identify sequence reads from individual polypeptides. In some embodiments, within a library of polypeptides, each polypeptide is attached to a single recording tag, with each recording tag comprising a unique UMI. In other embodiments, multiple copies of a recording tag are attached to a single polypeptide, with each copy of the recording tag comprising the same UMI. In certain embodiments, a recording tag comprises a universal priming site, e.g., a forward or 5’ universal priming site. A universal priming site is a nucleic acid sequence that may be used for priming a library amplification reaction and/or for sequencing. A universal priming site may include, but is not limited to, a priming site for PCR amplification, flow cell adaptor sequences that anneal to complementary oligonucleotides on flow cell surfaces (e.g., Illumina next generation sequencing), a sequencing priming site, or a combination thereof. In some embodiments, a universal priming site comprises an Illumina P5 primer or an Illumina P7 primer for NGS.
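The use of sample barcodes and UMIs to de-convolute pooled sequencing data can be illustrated with a short sketch. It assumes that each sequencing read has already been parsed into a sample barcode, a UMI, and the ordered series of encoding-cycle barcodes; that parsing step, the tuple layout, the barcode names, and the function name collapse_by_umi are all hypothetical and depend on a recording tag design not specified here. Reads sharing a sample barcode and UMI are treated as copies of one polypeptide's extended recording tag and are collapsed to the most frequent barcode series.

```python
from collections import Counter, defaultdict

def collapse_by_umi(reads):
    """Group parsed reads by (sample barcode, UMI) and return a consensus.

    reads: iterable of (sample_barcode, umi, cycle_barcodes) tuples, where
           cycle_barcodes is the ordered series of binder barcodes read
           from one extended recording tag (assumed pre-parsed).
    Returns a dict mapping (sample_barcode, umi) to the most frequently
    observed barcode series for that molecule.
    """
    grouped = defaultdict(Counter)
    for sample, umi, cycle_barcodes in reads:
        grouped[(sample, umi)][tuple(cycle_barcodes)] += 1
    return {
        key: series_counts.most_common(1)[0][0]
        for key, series_counts in grouped.items()
    }

# Toy reads: the same UMI in two samples denotes two different molecules.
reads = [
    ("S1", "AAC", ("bcN", "bcL")),
    ("S1", "AAC", ("bcN", "bcL")),
    ("S1", "AAC", ("bcD", "bcL")),   # minority read, e.g. a sequencing error
    ("S2", "AAC", ("bcD",)),
]
for molecule, series in collapse_by_umi(reads).items():
    print(molecule, series)
```

A majority vote is only one possible consensus rule; a probabilistic classifier could instead consume the per-UMI barcode counts directly.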
[0218] In certain embodiments, a polypeptide can be immobilized to a support by an affinity capture reagent (and optionally covalently crosslinked), wherein the recording tag is attached to the affinity capture reagent directly, or alternatively, the polypeptide can be directly immobilized to the support with a recording tag. In one embodiment, the polypeptide is attached to a bait nucleic acid which hybridizes to a capture nucleic acid and is ligated to a capture nucleic acid which comprises a reactive coupling moiety for attaching to the support (see, e.g., US 2022/0049246 A1, incorporated by reference herein). In some examples, the bait or capture nucleic acid may serve as a recording tag to which information regarding the polypeptide can be transferred. In some embodiments, the polypeptide is attached to a bait nucleic acid to form a nucleic acid-polypeptide conjugate. In some embodiments, the immobilization methods comprise bringing the nucleic acid-polypeptide conjugate into proximity with a support by hybridizing the bait nucleic acid to a capture nucleic acid attached to the support, and covalently coupling the nucleic acid-polypeptide conjugate to the solid support. In some cases, the nucleic acid-polypeptide conjugate is coupled indirectly to the solid support, such as via a linker. In some embodiments, a plurality of the nucleic acid-polypeptide conjugates is coupled on the solid support and any adjacently coupled nucleic acid-polypeptide conjugates are spaced apart from each other at an average distance of about 50 nm or greater.
[0219] In certain embodiments where multiple polypeptides are immobilized on the same support, the polypeptide molecules can be spaced appropriately to accommodate methods of performing the binding reaction and any downstream analysis steps to be used to assess the polypeptide. For example, it may be advantageous to space the polypeptide molecules so as to optimally allow a nucleic acid-based method for assessing and sequencing the proteins to be performed. In some embodiments, the method for assessing and sequencing polypeptides involves a binder which binds to the polypeptide molecules, and the binder comprises a coding tag with information that is transferred to a nucleic acid attached to the polypeptide molecules. In some cases, spacing of the polypeptides on the support is determined based on the consideration that information transfer from a coding tag of a binder bound to one polypeptide molecule may reach a neighboring molecule.
[0220] In some embodiments, the surface of the support is passivated (blocked). A “passivated” surface refers to a surface that has been treated with an outer layer of material. Methods of passivating surfaces include standard methods from the fluorescent single molecule analysis literature, including passivating surfaces with polymers such as polyethylene glycol (PEG) (Pan et al., 2015, Phys. Biol. 12:045006), polysiloxane (e.g., Pluronic F-127), star polymers (e.g., star PEG) (Groll et al., 2010, Methods Enzymol. 472:1-18), hydrophobic dichlorodimethylsilane (DDS) + self-assembled Tween-20 (Hua et al., 2014, Nat. Methods 11:1233-1236), diamond-like carbon (DLC), DLC + PEG (Stavis et al., 2011, Proc. Natl. Acad. Sci. USA 108:983-988), and zwitterionic moieties (see, e.g., US 2006/0183863). In addition to covalent surface modifications, a number of passivating agents can be employed as well, including surfactants such as Tween-20, polysiloxane in solution (Pluronic series), polyvinyl alcohol (PVA), and proteins such as BSA and casein. Alternatively, the density of polypeptides can be titrated on the surface or within the volume of a solid support by spiking in a competitor or “dummy” reactive molecule when immobilizing the polypeptides on the solid support.
[0221] To control spacing of the immobilized polypeptides on the support, the density of functional coupling groups for attaching the polypeptide (e.g., TCO or carboxyl groups (COOH)) may be titrated on the support surface. In some embodiments, multiple polypeptide molecules are spaced apart on the surface or within the volume (e.g., porous supports) of a support such that adjacent molecules are spaced apart at a distance of about 50 nm to about 500 nm. In some embodiments, multiple molecules are spaced apart on the surface of a support with an average distance of at least 50 nm, at least 100 nm, at least 200 nm, or at least 500 nm.
[0222] In some embodiments, appropriate spacing of the polypeptide molecules on the support is accomplished by titrating the ratio of available attachment molecules on the support surface. In some examples, the support surface (e.g., bead surface) is functionalized with a carboxyl group (COOH) which is treated with an activating agent (e.g., EDC and Sulfo-NHS). In some examples, the support surface (e.g., bead surface) comprises NHS moieties. In some embodiments, a mixture of mPEGn-NH2 and NH2-PEGn-mTet is added to the activated beads (wherein n is any number, such as 1-100). The ratio between the mPEG3-NH2 (not available for coupling) and NH2-PEG24-mTet (available for coupling) is titrated to generate an appropriate density of functional moieties available to attach the polypeptides on the support surface. In certain embodiments, the mean spacing between coupling moieties (e.g., NH2-PEG4-mTet) on the solid support is at least 50 nm, at least 100 nm, or at least 500 nm. In some specific embodiments, the ratio of NH2-PEGn-mTet to mPEG3-NH2 is about or greater than 1:1000, or about or greater than 1:10,000. In some further embodiments, the recording tag attaches to the NH2-PEGn-mTet. In some embodiments, the spacing of the polypeptide molecules on the support is achieved by controlling the concentration and/or number of available functional groups on the support.
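The relationship between a target mean spacing and the titration ratio can be approximated with a simple geometric estimate. The sketch below is an illustration under explicit assumptions rather than a protocol from this disclosure: coupling sites are taken to be randomly (Poisson) distributed on a locally flat surface, for which the mean nearest-neighbor distance is 1/(2·sqrt(ρ)) at areal density ρ; both amine reagents are assumed to couple with equal efficiency; and the total grafting density of one PEG chain per nm² used in the example is a hypothetical placeholder that would need to be measured for a real support.

```python
def site_density_for_spacing(mean_spacing_nm):
    """Areal density (sites per nm^2) giving the requested mean
    nearest-neighbor spacing for randomly placed sites on a flat surface.

    Uses E[d] = 1 / (2 * sqrt(rho)) for a 2D Poisson point process,
    so rho = 1 / (4 * d^2).
    """
    if mean_spacing_nm <= 0:
        raise ValueError("spacing must be positive")
    return 1.0 / (4.0 * mean_spacing_nm ** 2)

def active_fraction(mean_spacing_nm, total_sites_per_nm2):
    """Fraction of grafted chains that should carry the coupling moiety
    (e.g. mTet), assuming both reagents couple with equal efficiency."""
    return site_density_for_spacing(mean_spacing_nm) / total_sites_per_nm2

# Hypothetical total grafting density: 1 PEG chain per nm^2.
for spacing_nm in (50, 100, 500):
    fraction = active_fraction(spacing_nm, total_sites_per_nm2=1.0)
    print(f"{spacing_nm} nm spacing -> about 1 active chain per {1 / fraction:,.0f}")
```

Under these assumptions a 50 nm mean spacing corresponds to roughly 100 coupling sites per square micrometer, or about one active chain in 10,000, which is of the same order as the ratios mentioned above; actual ratios would have to be determined empirically.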
[0223] More immobilization examples are described in US 2023/0054691 A1, incorporated by reference herein.
EXEMPLARY EMBODIMENTS
[0224] The following enumerated embodiments represent certain embodiments and examples of the invention:
1. A method for assessing an attachment of an N-linked glycan to a glycan attachment residue of an N-linked glycosylation site of a polypeptide, the method comprising:
(a) contacting the polypeptide with a PNGase, wherein the PNGase cleaves the N-linked glycan from the glycan attachment residue of the N-linked glycosylation site, and wherein the polypeptide is immobilized on a solid support before or after the contacting with the PNGase, thereby providing the immobilized polypeptide treated with the PNGase;
(b) contacting the immobilized polypeptide treated with the PNGase with a set of binders, wherein each binder of the set of binders is attached to a detectable label that comprises identifying information regarding the binder, and the set of binders comprises: a first binder that preferentially binds to an Asn (N) residue over an Asp (D) residue as the glycan attachment residue of the N-linked glycosylation site; and a second binder that preferentially binds to the Asp (D) residue over the Asn (N) residue as the glycan attachment residue of the N-linked glycosylation site, wherein the first binder or the second binder of the set of binders binds to the glycan attachment residue of the N-linked glycosylation site of the immobilized polypeptide treated with the PNGase; and
(c) determining the amino acid identity of the glycan attachment residue of the N-linked glycosylation site treated with the PNGase by obtaining or retaining identifying information regarding the first binder or the second binder that binds to the glycan attachment residue, thereby determining the attachment of the N-linked glycan to the glycan attachment residue of the N-linked glycosylation site.
2. A method for assessing an attachment of an N-linked glycan to a glycan attachment residue of an N-linked glycosylation site of a polypeptide, the method comprising:
(a) contacting the polypeptide with a PNGase, wherein the PNGase cleaves the N-linked glycan from the glycan attachment residue of the N-linked glycosylation site, and wherein the polypeptide is immobilized on a solid support before or after the contacting with the PNGase, thereby providing the immobilized polypeptide treated with the PNGase;
(b) cleaving the polypeptide treated with the PNGase to generate a cleaved polypeptide comprising the glycan attachment residue of the N-linked glycosylation site as a terminal amino acid (TAA) residue of the cleaved polypeptide;
(c) contacting the immobilized and cleaved polypeptide with a set of binders, wherein each binder of the set of binders is attached to a detectable label that comprises identifying information regarding the binder, and the set of binders comprises: a first binder that preferentially binds to an Asn (N) residue over an Asp (D) residue as the TAA residue; and a second binder that preferentially binds to the Asp (D) residue over the Asn (N) residue as the TAA residue, wherein the first binder or the second binder of the set of binders binds to the TAA residue of the immobilized and cleaved polypeptide treated with the PNGase; and
(d) determining amino acid identity of the glycan attachment residue of the N-linked glycosylation site treated with the PNGase by obtaining or retaining identifying information regarding the first binder or the second binder that bind to the TAA residue of the immobilized and cleaved polypeptide, thereby determining the attachment of the N-linked glycan to the glycan attachment residue of the N-linked glycosylation site.
3. A method for analyzing a polypeptide, the method comprising:
(a) contacting the polypeptide with a PNGase, wherein the PNGase cleaves an N-linked glycan from a glycan attachment residue of a N-linked glycosylation site of the polypeptide, and wherein the polypeptide is immobilized on a solid support before or after the contacting with the PNGase, thereby providing the immobilized polypeptide treated with the PNGase;
(b) contacting the immobilized polypeptide treated with the PNGase with a set of binders, wherein each binder of the set of binders is attached to a detectable label that comprises identifying information regarding the binder, and the set of binders comprises: a first binder that binds to a terminal Asn (N) residue; a second binder that binds to a terminal Asp (D) residue; and
a third binder that binds to a terminal amino acid (TAA) residue other than Asn (N) and Asp (D), wherein the first binder, the second binder, or the third binder binds to the TAA residue of the immobilized polypeptide;
(c) cleaving the immobilized polypeptide to expose a new TAA residue;
(d) contacting the immobilized and cleaved polypeptide with the set of binders, wherein the first binder, the second binder, or the third binder binds to the new TAA residue of the immobilized and cleaved polypeptide;
(e) optionally repeating (c) - (d) in one or more cycles;
(f) determining an amino acid sequence of the polypeptide by obtaining or retaining identifying information regarding the binders that bind to the TAA residues in (b), (d), and optionally (e); and
(g) comparing the amino acid sequence determined in (f) to an amino acid sequence of the polypeptide determined without contacting the polypeptide with the PNGase, thereby detecting one or more N-linked glycosylation sites in the polypeptide.
4. The method of embodiment 3, further comprising following binding of a binder of the set of binders to the polypeptide or the cleaved polypeptide, obtaining or retaining identifying information regarding the binder.
5. The method of embodiment 3 or embodiment 4, wherein the TAA residues are N-terminal amino acid (NTAA) residues.
6. The method of any one of embodiments 3-5, wherein the TAA residues are modified.
7. The method of embodiment 2, wherein the TAA is an N-terminal amino acid (NTAA).
8. The method of embodiment 2, wherein the TAA is a C-terminal amino acid (CTAA).
9. The method of any one of embodiments 2-8, wherein the cleavage of the polypeptide is performed by a cleaving enzyme.
10. The method of any one of embodiments 1-9, further comprising quantifying degree of the attachment of the N-linked glycan to the glycan attachment residue of the N-linked glycosylation site of the polypeptide by determining what fraction of molecules of the polypeptide or the cleaved polypeptide bind to the first binder and/or the second binder based on analysis of the identifying information regarding the first binder and/or the second binder.
11. The method of any one of embodiments 1-10, wherein in addition to the polypeptide, additional 1000 or more different polypeptides each comprising a N-linked glycosylation site are analyzed in parallel utilizing the first binder and the second binder.
12. The method of embodiment 11, wherein attachments of an N-linked glycan to a glycan attachment residue of the N-linked glycosylation site of each polypeptide of the additional 1000 or more different polypeptides are assessed.
13. The method of any one of embodiments 1-12, wherein the identifying information regarding the first binder and/or the second binder are analyzed by an optical method.
14. The method of any one of embodiments 1-12, wherein the identifying information regarding the first binder and/or the second binder are analyzed by a nucleic acid sequencing method.
15. The method of any one of embodiments 1-14, wherein before contacting with the PNGase, the N-linked glycosylation site comprises any one of the following amino acid sequences: AsnXxxSer, AsnXxxThr or AsnXxxCys, wherein Xxx is any standard, naturally occurring amino acid residue.
16. The method of any one of embodiments 1-15, wherein one or more additional molecules of the polypeptide are (i) immobilized on the solid support; (ii) are not contacted with PNGase; and (iii) analyzed as described in (b) to identify a shift from Asn residue to Asp residue as the first residue of the N-linked glycosylation site of the immobilized polypeptide or the cleaved immobilized polypeptide after the contacting with the PNGase.
17. The method of any one of embodiments 1-16, wherein in (a), the contacting with the PNGase occurs after the immobilization of the polypeptide to the solid support.
18. The method of any one of embodiments 13-17, wherein the detectable labels of the first binder and/or the second binder each comprise a fluorescently labeled probe, and the amino acid identity of the amino acid residue treated with the PNGase is determined by detecting a signal generated from the fluorescently labeled probe.
19. The method of any one of embodiments 1-17, wherein the detectable labels of the first binder and/or the second binder each comprise an epitope.
20. The method of any one of embodiments 1-17, wherein the detectable labels of the first binder and/or the second binder each comprise a nucleic acid coding tag that comprises an encoder barcode that comprises identifying information regarding the binder.
21. The method of embodiment 20, wherein the amino acid identity of the amino acid residue treated with the PNGase is determined by detecting a signal generated via performing an in situ nucleic acid sequencing of the encoder barcode of the first binder or the second binder.
22. The method of embodiment 20, wherein the amino acid identity of the amino acid residue treated with the PNGase is determined by detecting a signal generated via performing an in situ amplification of at least a portion of the nucleic acid coding tag that comprises the encoder barcode.
23. The method of embodiment 22, wherein the in situ amplification is a rolling circle amplification.
24. The method of embodiment 22 or embodiment 23, further comprising hybridizing a fluorescent oligonucleotide probe to the amplified portion of the nucleic acid coding tag and detecting a signal from the fluorescent oligonucleotide probe.
25. The method of embodiment 20, wherein (i) the immobilized polypeptide is attached to a nucleic acid recording tag before contacting the immobilized polypeptide treated with the PNGase with the set of binders; and (ii) following binding of the first binder or the second binder to the glycan attachment residue of the N-linked glycosylation site of the immobilized polypeptide, generating an extended nucleic acid construct comprising nucleic acid sequence information of the encoder barcode of the first binder or the second binder and nucleic acid sequence information of the nucleic acid recording tag attached to the immobilized polypeptide; and wherein obtaining or retaining identifying information regarding the first binder or the second binder comprises determining a nucleic acid sequence of at least a portion of the extended nucleic acid construct, wherein the portion comprises the nucleic acid sequence information of the encoder barcode of the first binder or the second binder.
26. The method of embodiment 25, wherein the nucleic acid sequence of the portion of the extended nucleic acid construct is determined using a DNA sequencer.
27. The method of any one of embodiments 1-26, wherein the support is a flow cell.
28. The method of any one of embodiments 1-27, wherein the determining amino acid identity of the amino acid residue treated with the PNGase comprises determining a likelihood of a particular type of the amino acid residue.
29. The method of any one of embodiments 2-28, further comprising determining amino acid identities of one or more additional amino acid residues of the polypeptide treated with the PNGase by utilizing the first binder or the second binder.
30. The method of embodiment 29, further comprising determining an amino acid sequence of the polypeptide treated with the PNGase based on the determined amino acid identities of the one or more additional amino acid residues of the polypeptide.
31. The method of embodiment 30, further comprising comparing the amino acid sequence of the polypeptide treated with the PNGase to an amino acid sequence of the polypeptide determined without contacting the polypeptide with the PNGase.
32. The method of any one of embodiments 1-31, wherein the first binder binds to a terminal Asn (N) residue, and/or second binder binds to a terminal Asp (D) residue.
33. The method of any one of embodiments 1-32, wherein the set of binders further comprises a third binder that binds to a terminal amino acid (TAA) residue other than Asn (N) and Asp (D).
34. The method of any one of embodiments 1-33, wherein the method does not comprise use of mass spectrometry.
35. A method for detecting N-linked glycosylation of polypeptides comprising a first polypeptide having a N-linked glycosylation site, the method comprising:
(a) attaching polypeptides comprising the first polypeptide to a solid support, wherein the polypeptides are treated with a PNGase before or after the attachment of the polypeptides to the solid support, thereby obtaining immobilized polypeptides comprising immobilized first polypeptide de-glycosylated at the N-linked glycosylation site;
(b) analyzing sequentially at least some individual amino acid residues of the immobilized polypeptides, wherein the analysis comprises the following steps for each analyzed immobilized polypeptide:
(i) contacting the solid support with a set of binders, wherein each binder specifically binds to a terminal amino acid (TAA) or a modified TAA of a polypeptide immobilized on the solid support;
(ii) following binding of a binder of the set of binders to the polypeptide, obtaining or retaining identifying information regarding the binder;
(iii) removing the TAA or the modified TAA to expose a new TAA, thereby generating a cleaved polypeptide, and, optionally, modifying the new TAA to yield a newly modified TAA; and (iv) repeating steps (i)-(iii) or (i)-(ii) at least one time, wherein the immobilized first polypeptide is analyzed; the set of binders comprises a first binder that specifically binds to Asn TAA and a second binder that specifically binds to Asp TAA; and the first binder or the second binder bind to Asn TAA or Asp TAA being a glycan attachment residue of the N-linked glycosylation site of the immobilized first polypeptide;
(c) determining whether at least some of the immobilized first polypeptide molecules comprise Asp residue as the first residue of the N-linked glycosylation site based on analysis of the identifying information regarding the first binder and/or regarding the second binder obtained or retained in step (b) for the immobilized first polypeptide molecules, thereby detecting N-linked glycosylation in the N-linked glycosylation site of the first polypeptide.
36. The method of embodiment 35, further comprising quantifying degree of glycosylation for the N-linked glycosylation site of the first polypeptide by determining what fraction of the immobilized first polypeptide molecules comprise Asp TAA as the first residue of the N-linked glycosylation site based on analysis of the identifying information regarding the first binder and/or regarding the second binder obtained or retained in step (b) for the immobilized first polypeptide molecules.
37. The method of embodiment 35 or embodiment 36, wherein some of immobilized polypeptide molecules comprising immobilized first polypeptide molecules obtained in (a) are: (i) not treated with the PNGase in (a), and (ii) analyzed as described in (b), followed by utilizing data obtained by the analysis of immobilized polypeptide molecules not treated with the PNGase in (c) to identify a shift from Asn residue to Asp residue as the first residue of the N-linked glycosylation site of the immobilized first polypeptide molecules after treatment with the PNGase.
38. The method of any one of embodiments 35-37, wherein before (a), the first polypeptide is generated by fragmenting a protein from a sample.
39. The method of any one of embodiments 35-38, wherein the method does not comprise use of mass spectrometry.
40. The method of any one of embodiments 35-39, wherein the first polypeptide is a pre-selected target polypeptide.
41. The method of any one of embodiments 36-40, wherein in addition to the first polypeptide, degrees of glycosylation for 100, 200, 500, 1000, 5000 or more different polypeptides each comprising a N-linked glycosylation site are determined in (c) utilizing the first binder and the second binder.
42. The method of any one of embodiments 35-41, wherein the TAA is an N-terminal amino acid (NTAA) and the new TAA is a new NTAA.
43. The method of any one of embodiments 35-41, wherein the TAA is a C-terminal amino acid (CTAA) and the new TAA is a new CTAA.
44. The method of any one of embodiments 35-43, wherein each binder of the plurality binds to a modified TAA of an immobilized polypeptide.
45. The method of embodiment 44, wherein the modified TAA is a modified N-terminal amino acid (NTAA).
46. The method of embodiment 45, wherein the modified NTAA of the immobilized polypeptide is obtained by modifying the immobilized polypeptide with an N-terminal modifier agent before contacting the solid support with the set of binders.
47. The method of embodiment 46, wherein the N-terminal modifier agent is selected from the group consisting of compounds of the following Formula (10)-(13):
wherein M is a metal binding group that comprises sulfonamide, hydroxamic acid, sulfamate, or sulfamide; the group
is a 5 or 6 membered aromatic ring containing up to three heteroatoms selected from N, O, and S as ring members, and is optionally substituted by R; R represents one or two optional substituents selected from the group consisting of F, Cl, CH3, CF2H, CF3, OH, OCH3, OCF3, NH2, N(CH3)2, NO2, SCH3, SO2CH3, CH2OH, B(OH)2, CN, CONH2, CO2H, CN4H, and CONHCH3; LG is OH, ORQ, or OCC, each RQ is independently aryl or heteroaryl, each of which is optionally substituted with one or more groups selected from halo, nitro, cyano, sulfonate, carboxylate, alkylsulfonyl, and N of heteroaryl is optionally oxidized; or RQ can be -C(=O)R or -C(=O)-OR; CC is a cationic counterion; X is one of the following: O, S, Se, or NH.
48. The method of any one of embodiments 35-47, wherein the first binder and/or the second binder each comprises a peptide or an aptamer.
49. The method of any one of embodiments 35-48, wherein (i) each binder of the plurality comprises an identifying detectable label; (ii) the identifying information regarding the binder that binds to the TAA or modified TAA of the polypeptide in (b)(ii) is obtained by detecting the identifying detectable label.
50. The method of embodiment 49, wherein the identifying detectable label comprises a fluorescent moiety, and the identifying information regarding the binder that binds to the TAA or modified TAA of the polypeptide in (b)(ii) is obtained by detecting a signal from the fluorescent moiety.
51. The method of any one of embodiments 35-50, wherein (i) the immobilized polypeptides comprising the immobilized first polypeptide are each independently attached to a nucleic acid recording tag; (ii) each binder of the plurality is attached to a nucleic acid coding tag that comprises identifying information regarding the binder; and (iii) the identifying information regarding the binder that binds to the TAA or modified TAA of the polypeptide in (b)(ii) is retained in the nucleic acid recording tag attached to the polypeptide upon transfer from the nucleic acid coding tag, wherein the transfer comprises primer extension or ligation.
52. The method of embodiment 51, wherein the analysis of the identifying information regarding the first binder and/or regarding the second binder retained in step (b) for the immobilized first polypeptide molecules is performed by nucleic acid sequencing of recording tags attached to the first polypeptide molecules and extended upon transfer from the nucleic acid coding tags.
53. The method of any one of embodiments 35-52, wherein each of the immobilized polypeptides comprising the immobilized first polypeptide in (a) is covalently joined to the solid support.
54. The method of any one of embodiments 35-53, wherein the solid support is a bead.
55. The method of any one of embodiments 35-54, wherein in (b), the analysis comprises determining identity of amino acid residues present at the N-linked glycosylation site of different immobilized first polypeptide molecules.
56. The method of any one of embodiments 35-55, wherein the N-linked glycosylation site comprises any one of the following sequences: AsnXxxSer, AsnXxxThr or AsnXxxCys.
57. The method of any one of embodiments 35-56, wherein in (b)(iii), the TAA or the modified TAA are removed using an enzyme.
58. The method of any one of embodiments 35-57, wherein in (a), the treatment with the PNGase occurs after the attachment to the solid support.
59. The method of any one of embodiments 35-58, wherein in (b), steps (i)-(iii) or (i)-(ii) are repeated at least two, at least three, at least four or more times.
60. The method of any one of embodiments 35-59, further comprising determining identity of the first polypeptide based on analysis of (i) the identifying information regarding the first binder and/or regarding the second binder obtained or retained in step (b) for the immobilized first polypeptide molecules, and (ii) identifying information regarding at least one additional binder of the set of binders that was bound to some of first polypeptide molecules during step (b).
61. A method for analyzing a polypeptide comprising a N-linked glycosylation site, comprising:
(a) contacting the polypeptide with a PNGase, wherein the polypeptide is attached to a solid support before or after the contacting with the PNGase, thereby immobilizing the polypeptide on the solid support;
(b) determining at least partial amino acid sequence of the immobilized polypeptide, wherein the determining comprises:
(i) contacting the immobilized polypeptide treated with the PNGase with a set of binders,
(ii) cleaving the immobilized polypeptide to generate a cleaved immobilized polypeptide, and
(iii) contacting the cleaved immobilized polypeptide with a subsequent set of binders, wherein each binder of the set of binders and the subsequent set of binders specifically binds to an amino acid residue or sequence and is attached to a detectable label that comprises identifying information regarding the binder, wherein the set of binders and/or the subsequent set of binders comprise: a first binder that preferentially binds to an Asn (N) residue over an Asp (D) residue; and a second binder that preferentially binds to the Asp (D) residue over the Asn (N) residue, and wherein the at least partial amino acid sequence is determined by analyzing the detectable labels attached to the binders that bind to the immobilized polypeptide or the cleaved immobilized polypeptide, thereby analyzing N-linked glycosylation in the polypeptide.
62. The method of embodiment 61, further comprising quantifying degree of glycosylation for the N-linked glycosylation site of the polypeptide by determining what fraction of the immobilized polypeptide molecules or the cleaved immobilized polypeptide molecules binds to the first binder and/or the second binder at the N-linked glycosylation site based on analysis of the detectable labels attached to the binders.
63. The method of embodiment 61 or embodiment 62, wherein in addition to the polypeptide, additional 100, 200, 500, 1000, 5000 or more different polypeptides each comprising a N-linked glycosylation site are analyzed in parallel utilizing the first binder and the second binder.
64. The method of any one of embodiments 61-63, wherein the method does not comprise use of mass spectrometry.
65. The method of any one of embodiments 61-64, wherein each binder specifically binds to a terminal amino acid (TAA) or a modified TAA residue of the polypeptide immobilized on the solid support.
66. The method of embodiment 65, wherein the TAA is an N-terminal amino acid (NTAA).
67. The method of embodiment 65, wherein the TAA is a C-terminal amino acid (CTAA).
68. The method of any one of embodiments 61-67, wherein the cleaving step (ii) removes a terminal amino acid (TAA) or a modified TAA residue of the immobilized polypeptide to generate a new TAA residue of the cleaved immobilized polypeptide.
69. The method of embodiment 68, wherein the cleavage of the TAA or modified TAA residue is performed by a cleaving enzyme.
70. The method of any one of embodiments 65-69, wherein the modified TAA of the immobilized polypeptide is obtained by modifying the immobilized polypeptide with an N-terminal or C-terminal modifier agent before contacting the polypeptide with the binders.
71. The method of any one of embodiments 61-70, wherein the detectable labels attached to the binders that bind to the immobilized polypeptide or the cleaved immobilized polypeptide are analyzed after performing step (iii).
72. The method of any one of embodiments 61-70, wherein the detectable labels attached to the binders that bind to the immobilized polypeptide or the cleaved immobilized polypeptide are analyzed before performing step (iii).
73. The method of any one of embodiments 61-72, wherein the detectable labels attached to the binders that bind to the immobilized polypeptide or the cleaved immobilized polypeptide are analyzed by an optical method.
74. The method of any one of embodiments 61-72, wherein the detectable labels attached to the binders that bind to the immobilized polypeptide or the cleaved immobilized polypeptide are analyzed by a sequencing method, such as a nucleic acid sequencing method.
75. The method of any one of embodiments 61-74, wherein the immobilized polypeptide is covalently joined to the solid support, such as a bead.
76. The method of any one of embodiments 61-75, wherein the determining comprises determining identity of at least one amino acid residue present at the N-linked glycosylation site of different immobilized polypeptide molecules.
77. The method of any one of embodiments 61-76, wherein the N-linked glycosylation site comprises any one of the following sequences: AsnXxxSer, AsnXxxThr or AsnXxxCys.
78. The method of any one of embodiments 61-77, wherein in (a), the contacting with the PNGase occurs after the attachment to the solid support.
79. The method of any one of embodiments 61-78, wherein the set of binders is essentially the same as the subsequent set of binders.
80. The method of any one of embodiments 61-79, wherein after step (iii), steps (ii)-(iii) are repeated one or more times by cleaving the cleaved immobilized polypeptide generating a further cleaved immobilized polypeptide, and contacting the further cleaved immobilized polypeptide with a new subsequent set of binders, wherein each binder of the new subsequent set of binders specifically binds to an amino acid residue or sequence and is attached to a detectable label that comprises identifying information regarding the binder.
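By way of illustration only, the site detection and occupancy quantification recited in embodiments 10, 15, 16, 36, 37 and 62 can be sketched in a few lines of Python. The sketch assumes that each analyzed molecule has already been reduced to a single call at the glycan attachment residue ("N" when the first binder was detected, "D" when the second binder was detected). The function names, the call encoding, and the simple background subtraction against molecules not treated with PNGase are hypothetical choices made for this sketch and are not taken from the disclosure.

```python
import re
from collections import Counter

# Asn-Xxx-Ser/Thr/Cys motif, with Xxx any standard amino acid (embodiment 15).
# The lookahead lets overlapping motifs be reported individually.
SEQUON = re.compile(r"(?=N[ACDEFGHIKLMNPQRSTVWY][STC])")


def find_sequons(sequence):
    """Return 0-based positions of the Asn in each Asn-Xxx-Ser/Thr/Cys motif."""
    return [m.start() for m in SEQUON.finditer(sequence)]


def asp_fraction(calls):
    """Fraction of molecules called Asp at the glycan attachment residue.

    `calls` holds one entry per molecule: "N" (first binder detected) or
    "D" (second binder detected). Any other value is treated as
    uninformative and ignored. Returns None if no N or D call is present.
    """
    counts = Counter(c for c in calls if c in ("N", "D"))
    total = counts["N"] + counts["D"]
    if total == 0:
        return None
    return counts["D"] / total


def asn_to_asp_shift(treated_calls, untreated_calls):
    """Asp fraction after PNGase minus Asp fraction without PNGase.

    Assumption of this sketch: Asp calls in the untreated molecules are
    background, so the difference estimates the occupied fraction.
    Returns None if either group has no informative call.
    """
    treated = asp_fraction(treated_calls)
    untreated = asp_fraction(untreated_calls)
    if treated is None or untreated is None:
        return None
    return treated - untreated
```

For example, `find_sequons` applied to the reference sequence of a polypeptide gives the candidate attachment positions, and `asp_fraction` applied to the per-molecule calls at one of those positions gives the fraction of molecules whose attachment residue reads as Asp after PNGase treatment.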
EXAMPLES
[0225] The following examples are offered to illustrate but not to limit the methods, compositions, and uses provided herein. Certain aspects of the present invention, including, but not limited to, embodiments for high-throughput peptide analysis, the Proteocode™ peptide assay, methods of generating specific binders recognizing terminal residues of peptides, agents configured for removing modified NTAA residues from the N-terminally modified target peptides, information transfer between coding tags and recording tags, methods of making polynucleotide-peptide conjugates, methods for attachment of nucleotide-peptide conjugates to a support, methods of generating barcodes, and methods for analyzing extended recording tags, were disclosed in the earlier published applications US 2019/0145982 A1, US 2020/0348308 A1, US 2020/0348307 A1, US 2021/0208150 A1, US 2022/0049246 A1, US 2022/0283175 A1, US 2022/0144885 A1, US 2022/0227889 A1, US 2021/0214701 A1 and US 2023/0136966 A1, the contents of which are incorporated herein by reference in their entireties.
[0226] Example 1. Generation of binders that specifically bind to terminal amino acid residues of polypeptides.
[0227] The methods disclosed herein employ binders that specifically bind to terminal amino acid residues of polypeptides, which are used in the NGPS assay. Such binders may be developed by a number of approaches as disclosed in the art, and some of them are briefly outlined below.
[0228] US 2022/0283175 A1 discloses methods for generating binders based on metalloprotein scaffolds, and in particular human carbonic anhydrase scaffolds. It discloses selection and design of engineered binders suitable for the NGPS assay and capable of binding NTM-P1 with minimal P2 bias, where P1 and P2 are the first two N-terminal amino acids, and NTM is a modification of P1 by an N-terminal modifier agent (see, e.g., Example 2 of US 2022/0283175 A1). In some embodiments, each binder of the set of binders used in the disclosed methods specifically binds to a particular terminal residue of polypeptides, wherein the binder comprises an amino acid sequence X1-C/H/D/E-X2-C/H/D/E-X3-C/H/D/E-X4, wherein C/H/D/E is any single amino acid residue independently selected from the group consisting of amino acid residues C (Cys), H (His), D (Asp), and E (Glu); X1, X2, X3 and X4 are each any amino acid sequence independently comprising between 0 and 500 amino acid residues in length, and wherein the amino acid sequence X1-C/H/D/E-X2-C/H/D/E-X3-C/H/D/E-X4 chelates the zinc metal cation M with a thermodynamic dissociation constant of 0.5 nM or less.
[0229] A variety of NTMs have been tested to increase affinity of an engineered binder towards the P1 residue, with M64 being an exemplary NTM (see Example 2 below). Specific binders were engineered as described in US 2022/0283175 A1 using computational modeling, phage display screening and affinity maturation (see also Examples 3-5 for brief description). Sequences of exemplary binders generated by methods described in US 2022/0283175 A1 and specific for a particular modified NTAA of polypeptides are shown in SEQ ID NOs: 1, 3, 5, 7, 8, 11, 13, 15, 17, and 19.
[0230] US 10852305 B2 discloses methods for generating binders based on different tRNA synthetase scaffolds. tRNA synthetases (RS) have intrinsic specificity for free amino acids, so they are useful scaffolds for developing NTAA-specific binders for use in high-throughput protein sequencing. By introducing one or more mutations in its amino acid-binding domain, the inherent specificity of RS protein may be retained, while broadening the binding capability of the RS protein from free monomers to peptides. The binding pocket in these molecules can be modified to permit the entry of peptides presenting the specifically bound amino acid (as demonstrated in US 10852305 B2). In particular, specific N-binder and D-binder may be derived from SEQ ID NO: 4 and SEQ ID NO: 9, respectively, as disclosed in US 10852305 B2, Table A. Binders specific for other non-modified NTAA residues were also disclosed.
[0231] US 20200219590 A1 discloses methods for generating binders specific for non-modified NTAA residues to use in high-throughput protein sequencing. Table 1 of US 20200219590 A1 provides a list of binder sequences together with the amino acid binding preferences of each molecule with respect to amino acid identity at a terminal position of a polypeptide. In particular, specific D-binder may be derived from SEQ ID NO: 10, as disclosed in US 20200219590 A1, Table 1.
[0232] Examples of binders capable of binding to all possible NTAA residues (as well as modified NTAA residues) of polypeptide analytes with a certain level of selectivity are known in the art (see e.g., U.S. Patent Nos. 9,566,335, 10,852,305, and 9,435,810; patent publications US 20190145982 A1, US 2022/0283175 A1, WO 2022/072560 A1, WO 2010/065322 A1, each incorporated by reference in its entirety).
[0233] In some embodiments, binders capable of binding to CTAA residue of a polypeptide comprise engineered catalytically inactive carboxypeptidases. To act as a binder, the catalytic residues of the carboxypeptidase can be mutated to create a catalytically inactive enzyme, which still retains its binding ability. This is exemplified with the subtilisin serine proteases comprised of a canonical Ser-His-Asp catalytic triad, in which any or all of the catalytic residues can be mutated to alanine to render the enzyme largely catalytically inactive without affecting binding Km’s (disclosed in Carter P, Wells JA. Dissecting the catalytic triad of a serine protease. Nature. 1988 Apr 7;332(6164):564-8). Exemplar carboxypeptidases suitable for engineering include the MEROPS (Rawlings, N., et al., MEROPS: the database of proteolytic enzymes, their substrates and inhibitors. Nucleic Acids Res (2014) 42: D503-D509) family S10, M14, M20 and M32 members; IUPAC classifications include EC 3.4.16 (serine carboxypeptidases), EC 3.4.17 (metallo-carboxypeptidases), and EC 3.4.18 (cysteine carboxypeptidases). The serine carboxypeptidases, the metallocarboxypeptidases, and the cysteine carboxypeptidases can be rendered catalytically inactive by replacing residues within the Ser-His-Asp catalytic triad, the His-Xaa-Xaa-Glu (M14) or His-Glu-X-X-His (M32) motif, or the Cys-His motif with alanine, respectively.
[0234] In some embodiments, selective binders can be derived from a carboxypeptidase. Carboxypeptidases are proteolytic enzymes that remove C-terminal amino acids/peptides from proteins. Two enzymatic mechanisms for carboxypeptidase activity have been identified: metalloproteases that employ zinc ions to effect catalysis and serine carboxypeptidases that employ an activated serine nucleophile (Breddam, K. Serine carboxypeptidases. A review. Carlsberg Res. Commun. (1986) 51, 83). Carboxypeptidases are a diverse enzyme family that generally demonstrate substrate sequence specificity derived from interactions between the substrate and enzyme active site. Metallo-carboxypeptidases are classified in the MEROPS peptidase database as families M14, M15, M20, M28B, and M32, whereas serine carboxypeptidases belong to the S10, S11, S12, S28, S41, and S66 families. In many carboxypeptidases, C-terminal specificity (vs. aminopeptidase activity) is mediated by interactions between the C-terminal carboxylate on the substrate and an arginine in the metalloprotease. Carboxypeptidases vary in C-terminal amino acid specificity, and highly specific enzymes may represent particularly useful candidates for binder derivation. Importantly, catalytic activity may be removed through genetic engineering or biochemical regulation (e.g. addition of inhibitor or metal chelator).
[0235] In some embodiments, selective binders can be derived from the S10 family, which comprises serine proteases with yeast Carboxypeptidase Y (CPY) as an exemplar member. CPY is used in peptide sequencing since it is processive and has very little bias for the amino acid residues it removes from the C-terminus of the peptide (Patterson, D., et al., C-terminal ladder sequencing via matrix-assisted laser desorption mass spectrometry coupled with carboxypeptidase Y time-dependent and concentration-dependent digestions. Anal. Chem. (1995) 67:3971-3978; Jung G, et al., Carboxypeptidase Y: structural basis for protein sorting and catalytic triad. J Biochem. 1999 Jul; 126(1): 1-6). The CPY binding pocket for the C-terminal amino acid (P1’ residue) is comprised of mostly hydrophobic residues including Trp49, Asn51, Gly52, Cys56, Thr60, Phe64, Glu65, Glu145, Tyr256, Tyr269, Leu272, Ser297, Cys298 and Met398. CPY exhibits a broad specificity for the C-terminal residue of polypeptide substrates, accommodating hydrophobic, hydrophilic, and aliphatic residues due to its large binding pocket at the S1’ site. In contrast, CPY exhibits much greater specificity for the P1 residue (penultimate to the C-terminus). The S1 subsite is a deep pocket mainly constructed of hydrophobic residues, Tyr147, Leu178, Tyr185, Tyr188, Asn241, Leu245, Trp312, Ile340, and Cys341, rendering a hydrophobic preference for the C-terminal penultimate residue (disclosed in US 5,945,329 Customized Protease; US 5,985,627 Modified Carboxypeptidase). The C-terminal recognition of CPY is accomplished strictly by hydrogen bonding. The carboxyl terminus of the peptide forms hydrogen bonds with the backbone amide of Gly52 and the side chains of Asn51 and Glu145 in CPY.
[0236] In some embodiments, selective binders can be derived from the M14 family, which is comprised of metallo-carboxypeptidases including Carboxypeptidase A (CPA), Carboxypeptidase B (CPB), and Carboxypeptidase T (CPT) such as the thermophilic bacterial carboxypeptidase from Thermoactinomyces vulgaris (Gomis-Ruth, Structure and Mechanism of Metallocarboxypeptidases, Critical Reviews in Biochemistry and Molecular Biology, (2008) 43:5, 319-345). The compact globular shape of the funnelin carboxypeptidases and cone-like entrance to the binding pocket are well suited to being engineered as C-terminal binders. Various members of the M14 family exhibit different C-terminus specificities. For instance, Thermoactinomyces vulgaris CPT has broad substrate specificity against hydrophobic, hydrophilic, and charged residues at the C-terminus. This contrasts with the narrow substrate specificity of the CPA and CPB families which hydrolyze hydrophobic and positively charged residues, respectively, from the C-termini of the peptides (Akparov, V., et al., Structural insights into the broad substrate specificity of carboxypeptidase T from Thermoactinomyces vulgaris. FEBS Journal, (2015), 282(7), 1214-1224; Akparov, V., et al., Structural principles of the wide substrate specificity of Thermoactinomyces vulgaris carboxypeptidase T. reconstruction of the carboxypeptidase B primary specificity pocket. Biochemistry. Biokhimiia, (2007), 72(4), 416-423).
[0237] The specificity for the C-terminal residue is largely determined by the identity of the amino acids comprising the specificity/binding pocket. As such, the M14 funnelin family of carboxypeptidases can have their specificity/binding pocket altered through directed evolution by mutating residues in the specificity/binding pocket. In particular, residues (CPA numbering) at locations 194, 203, 207, 243, 247, 248, 250, 253-255, and 268 play critical roles in C-terminal amino acid specificity (Gomis-Ruth, Structure and Mechanism of Metallocarboxypeptidases, Critical Reviews in Biochemistry and Molecular Biology, (2008) 43:5, 319-345). This is exemplified by the specificity conferred by residue 255, which is isoleucine in CPA (hydrophobic residue preference), aspartate in CPB (positively charged residue preference), threonine in CPT (broad specificity), and arginine in M14 N/E-type carboxypeptidases (negatively charged residue specificity) (Akparov et al., Structural principles of the wide substrate specificity of Thermoactinomyces vulgaris carboxypeptidase T. reconstruction of the carboxypeptidase B primary specificity pocket. Biokhimiia, (2007), 72(4), 416-423). Other residues such as Arg145, Tyr248, and Asn144 involved in binding to the C-terminal carboxylate should remain unaltered since they are involved in formation of a salt bridge with the C-terminal carboxylate of the substrate. In a preferred embodiment, a thermophilic CPT such as P halophilum, Th. vulgaris, or L. thermophila is used as a scaffold for the binder.
[0238] Human leukocyte antigens (HLAs) in humans (also known as major histocompatibility complex (MHC) in animals) are part of a naturally evolved mechanism to derive antibodies against a particular (protein derived) antigen. Upon intracellular digestion of a foreign protein, resulting peptides are bound to HLA molecules and displayed on the extracellular surface for recognition by host T-lymphocytes. Significant binding energy is derived from interactions with the C-terminal region of the peptide (Guillaume P, et al., Proceedings of the National Academy of Sciences, 2018, 115 (20) 5083-5088), indicating these proteins may provide useful scaffolds to derive C-terminal binders. While N-terminal degrons were the first identified, a number of C-terminal degrons have been identified recently (Guillaume P, et al., Proceedings of the National Academy of Sciences, 2018, 115 (20) 5083-5088). E3 ubiquitin ligases of the cullin-RING (CRL) family recognize C-terminal amino acids or motifs to facilitate targeted protein degradation (Varshavsky A. N-degron and C-degron pathways of protein degradation. Proc Natl Acad Sci U S A. 2019; 116(2):358-366). C-degron pathways for glycine, arginine, aspartate, alanine, and valine have been identified, and these proteins are therefore potential candidates to derive tethered amino acid binder scaffolds (Timms and Koren, “Tying up loose ends: the N-degron and C-degron pathways of protein degradation”. Biochem Soc Trans 28 August 2020; 48 (4): 1557-1567).
[0239] In some embodiments, binders capable of binding to a terminal amino acid residue are engineered from aminoacyl tRNA synthetases (aaRSs). The aaRSs are a class of proteins with exquisite amino acid binding specificity (US 9435810 B2). The set of 20+ aaRSs exhibit various modes of binding to the amino acids (AAs) including hydrophobic binding, hydrogen bonding, salt bridges, and pi-pi stacking (Kaiser F, et al. The structural basis of the genetic code: amino acid recognition by aminoacyl-tRNA synthetases. Sci Rep (2020) 10, 12647; Borgo, B, "Strategies for Computational Protein Design with Application to the Development of a Biomolecular Tool-kit for Single Molecule Protein Sequencing" (2014)). These references describe using engineered aaRSs as N-terminal binders, but engineered aaRSs also offer the capability of generating high affinity interactions with amino acid residues having an exposed C-terminus. The aaRSs activate their cognate amino acids through C-terminal conjugation of ATP to produce an aminoacyl adenylate intermediate, which involves significant binding energy generation due to hydrophobic and hydrogen bonding. The activated amino acid is subsequently transferred to a respective tRNA to “charge” the tRNA for subsequent protein synthesis. As such, adenylation of the C-terminal amino acid on the coupler-amino acid complex during aaRS binding and subsequent adenylation should lead to greatly increased affinity (Carter CW Jr, et al. “The Rodin-Ohno hypothesis that two enzyme superfamilies descended from one ancestral gene: an unlikely scenario for the origins of translation that will not be dismissed”. Biol Direct. 2014;9: 11). In one embodiment, an engineered catalytic portion of aaRS is used for CTAA residue binding. In another embodiment, an engineered enzyme portion of the aaRS is used for CTAA residue binding (Pham, Yen et al., Tryptophanyl-tRNA Synthetase Enzyme, Journal of Biological Chemistry, Volume 285, Issue 49, 38590 - 38601).
[0240] Example 2. Exemplary origin, synthesis and installation of NTMs on NTAA residues of peptides by N-terminal modifier agents.
[0241] Structures, origin and installation methods for exemplary N-terminal modifier agents used for modification of NTAA residues of peptides are shown below.
[0242] Exemplary N-terminal modifier agent for NTM=M64 (in the ester form).
[0243] Exemplary method of installing M64 onto N-terminal amino acid of a peptide, shown as NTAA-PP. Peptides, in solution or on solid-support, were dissolved in 25 µL of 0.4 M MOPS buffer, pH=7.6 and 25 µL of acetonitrile (ACN). Separately, the active ester reagent was prepared from M64 and dissolved in 25 µL DMA and 25 µL ACN to a concentration of 0.05 M stock solution. Then, 50 µL of the active ester stock solution was added to the peptide-ACN:MOPS solution and incubated at 65 °C for 60 minutes. Upon completion, the peptides were functionalized with the respective modification as shown in the above schemes.
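As a worked check of the quantities in the installation protocol above, the following Python sketch computes the final reagent and buffer concentrations after mixing. It assumes that volumes are additive and that the peptide solution consists only of the 25 µL of MOPS buffer and 25 µL of ACN described; the function name `final_concentration` is illustrative and not part of the disclosure.

```python
def final_concentration(stock_molar, added_volume_ul, total_volume_ul):
    """Concentration after dilution, assuming additive volumes (C1*V1 = C2*V2)."""
    return stock_molar * added_volume_ul / total_volume_ul


# Volumes taken from the protocol text, in microliters.
mops_ul, acn_ul, ester_stock_ul = 25, 25, 50
total_ul = mops_ul + acn_ul + ester_stock_ul

# 0.05 M active ester stock and 0.4 M MOPS are the stated stock concentrations.
ester_final = final_concentration(0.05, ester_stock_ul, total_ul)
mops_final = final_concentration(0.4, mops_ul, total_ul)

# mol/L multiplied by microliters gives micromoles.
ester_umol = 0.05 * ester_stock_ul

print(f"total volume: {total_ul} uL")
print(f"active ester: {ester_final:.3f} M ({ester_umol:.1f} umol)")
print(f"MOPS: {mops_final:.2f} M")
```

Under these assumptions the reaction contains the active ester at about half its stock concentration in a 100 µL volume.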
[0244] Alternatively, a surfactant-aqueous coupled system can be employed to install NTM (M64) onto the N-terminal amino acid of peptides. Using a 10 mM solution of 5% DMSO in 2% TPGS-750-M in water containing 1% 2,6-lutidine, the peptides are modified to completion in 20 minutes at 40 °C.
[0245] Multiple other NTMs have been similarly installed on N-terminal amino acids of peptides to increase affinity of N-terminal binders. Exemplary NTM materials and syntheses were described in US 2022/0283175 A1 and in U.S. provisional application 63/525,347 filed July 06, 2023, each incorporated herein by reference.
[0246] Example 3. Binder engineering from relevant scaffolds.
[0247] Scaffolds disclosed in Example 1 above can be utilized to generate high affinity binders specific for particular terminal amino acid residues of polypeptides. The procedure is described in US 2022/0283175 A1 and briefly summarized below.
[0248] Binder engineering involves improving affinities of potential binding sites through rational, structure-based approaches on a parental scaffold and generating libraries that contain degenerate NNK codons at multiple, defined positions using Kunkel mutagenesis and phage display selection. Kunkel mutagenesis is a known site-directed mutagenesis strategy that introduces point mutations by annealing mutation-containing oligonucleotides to uracil-containing single-stranded DNA (dU-ssDNA) templates. Exemplary Kunkel mutagenesis and phage display selection methods are described in US 9102711 B2; US 10906968 B2; and Kunkel, Proc. Natl. Acad. Sci. USA, 1985, 82(2):488-492.
[0249] In this example, high diversity (~10^10) phage libraries using NNK variant site encoding were constructed targeting residue positions within the substrate-binding pockets of the selected metalloenzymes. Phosphorylated primers possessing degenerate codons at the intended positions were obtained and annealed to uracilated ssDNA containing the parental sequence of the same binder of interest with introduced SacII sites. After polymerase extension and ligation, the heteroduplex DNA was transformed into custom TG1RM cells (Lucigen TG1 Electrocompetent Cells containing a pCDF-1b plasmid expressing SacII enzyme), which removed undesired template DNA with SacII sites, resulting in libraries of 10^9-10^10 members. Monovalent phage libraries were packaged using standard helper phage and precipitated using PEG/NaCl solution. Using standard protocols, phage libraries were panned against different N-terminally modified target peptides. NTAA modification was applied to target peptides during binder screening and maturation to increase the substrate surface available for interaction with the binder, which would result in selection of binders with higher affinity and P1 specificity.
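To illustrate how NNK encoding relates to the stated library sizes, the Python sketch below enumerates the NNK codon set (N = A/C/G/T, K = G/T) with the standard genetic code and estimates how many fully randomized positions a library of a given size could cover at the DNA level. The helper names are illustrative, and the number of randomized positions actually used in each library is not stated here, so the position counts are an assumption-based estimate only.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def nnk_codons():
    """Return every codon matched by the degenerate NNK pattern."""
    return ["".join(c) for c in product("ACGT", "ACGT", "GT")]


def nnk_summary():
    """Return (number of codons, distinct amino acids, stop codons) for NNK."""
    translated = [CODON_TABLE[c] for c in nnk_codons()]
    amino_acids = {aa for aa in translated if aa != "*"}
    return len(translated), len(amino_acids), translated.count("*")


def max_positions_covered(library_size):
    """Largest number of NNK positions whose codon diversity fits in the library."""
    n_codons, _, _ = nnk_summary()
    n = 0
    while n_codons ** (n + 1) <= library_size:
        n += 1
    return n


if __name__ == "__main__":
    print(nnk_summary())
    for size in (10**9, 10**10):
        print(size, max_positions_covered(size))
```

Sampling statistics (the oversampling needed to observe most variants) are ignored in this sketch, so the practical number of positions covered is lower than the theoretical figure.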
[0250] For each round of phage display selection, precipitated phage in the presence of peptide and protein competitors were first depleted against beads coated with off-target peptides for 1 hour at 24 °C and then panned against beads coated with target peptides for 1 hour at 24 °C. After washing 6 times with PBST, bead-bound phages were eluted using 0.2 M glycine, pH=2.2, for 10 min at 24 °C and then used to infect mid-log phase TG1 cells. Once the final round of selection was complete, the output was profiled in a phage-based, multiplexed binding assay (Luminex, DiaSorin, USA) against a panel of N-terminally modified target peptides and underwent next-generation sequencing to obtain clone enrichment sequence information. Luminex enables analysis of binding of phage libraries against multiple peptide targets immobilized on beads in a single assay well. This is accomplished by spatially separating immunoassays performed on beads that contain unique fluorophore cores that exhibit distinct excitation/emission profiles. Multiple target peptide-specific beads are combined in a single well of a multi-well microplate to detect and quantify multiple targets simultaneously. Specific binders were isolated against a variety of N-terminally modified target peptides. Based on the sequence identities after enrichment, consensus mutations or mutational hotspots were identified, and binders were expressed and purified for testing in the encoding assay.
[0251] Example 4. Binder maturation.
[0252] Binder maturation for affinity and specificity involved multiple cycles of error prone PCR prior to library construction via Kunkel mutagenesis and phage display selection, performed essentially as described in Example 3. Briefly, 60-90 cycles of error prone PCR on a parental binder generated PCR amplicons with an average of 4-6 random amino acid mutations per 100 amino acids. The dsDNA amplicon was digested by lambda exonuclease into “megaprimer” ssDNA, which was used to generate heteroduplex DNA by annealing to uracilated ssDNA of the vector containing the parental sequence of the same binder of interest with introduced SacII sites. After polymerase extension and ligation, the heteroduplex DNA was transformed into custom TG1RM cells (Lucigen TG1 Electrocompetent Cells containing a pCDF-1b plasmid expressing SacII enzyme), which removed undesired template DNA with SacII sites, resulting in libraries of 10^9-10^10 members. Monovalent phage libraries were packaged using standard helper phage and precipitated using PEG/NaCl solution. For each round of phage display selection, precipitated phage in the presence of peptide and protein competitors were first depleted against beads coated with off-target peptides for 1 hour at 24 °C and then panned against beads coated with target peptides for 1 hour at 24 °C. After washing 6 times with PBST, bead-bound phages were eluted using 0.2 M glycine, pH=2.2, for 10 min at 24 °C and then used to infect mid-log phase TG1 cells. Once the final round of selection was completed, the output was profiled in a phage-based, multiplexed binding assay (Luminex, DiaSorin, USA) against a panel of N-terminally modified target peptides and underwent next-generation sequencing to obtain clone enrichment sequence information. Based on the sequence identities after enrichment, consensus mutations or mutational hotspots were identified, and binders were expressed and purified for testing in the encoding assay.
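The stated mutation load of 4-6 amino acid mutations per 100 amino acids can be translated into a per-clone expectation with the Python sketch below. It assumes, purely for illustration, that mutations are distributed independently along the sequence (a Poisson model) and uses a hypothetical binder length of 260 residues; neither assumption is taken from the procedure above.

```python
import math


def expected_mutations(protein_length, mutations_per_100_aa):
    """Mean number of amino acid mutations per clone."""
    return protein_length * mutations_per_100_aa / 100.0


def poisson_pmf(k, mean):
    """Probability of exactly k mutations under a Poisson model (an assumption)."""
    return mean ** k * math.exp(-mean) / math.factorial(k)


def fraction_unmutated(protein_length, mutations_per_100_aa):
    """Probability that a clone carries no amino acid mutation at all."""
    return poisson_pmf(0, expected_mutations(protein_length, mutations_per_100_aa))


if __name__ == "__main__":
    length = 260  # hypothetical binder length in residues
    for rate in (4, 5, 6):
        mean = expected_mutations(length, rate)
        print(f"{rate} per 100 aa: mean {mean:.1f}, "
              f"unmutated fraction {fraction_unmutated(length, rate):.2e}")
```

Under this model essentially every clone in the maturation library differs from the parental binder, which is consistent with relying on selection and sequencing to identify the beneficial subset of mutations.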
[0253] Example 5. Binder expression and purification.
[0254] Plasmid DNA encoding the identified engineered binders, each fused to an N-terminal hexa-histidine tag and a C-terminal SpyCatcher domain, was obtained from a vendor (sequences of selected binders used in Example 7 are shown in SEQ ID NOs: 2, 6, 12, 14, 16, 18 and 20). Plasmids were transformed into chemically competent E. coli cells using standard methods. Recovery was done by adding 150 µL of warm SOC and incubating for 1 hour at 30 °C. After recovery, 80 µL of transformed culture was added to 1 mL 2YT containing the corresponding antibiotic. The culture was grown overnight and then used to generate a glycerol stock. The stock was then used to inoculate an overnight culture of 2YT containing the corresponding antibiotic, and the culture was grown overnight for ~20 hours at 37 °C. This culture was subsequently used to inoculate a larger volume culture of 2YT containing the corresponding antibiotic at a 100-fold dilution. The culture was then left at 37 °C for 3-4 hours until an optical density of 0.6 was reached. The temperature was then lowered to 15 °C and protein expression was induced with a final concentration of 0.5 mM IPTG. The cultures were grown for an additional 16-20 hours and the cells were harvested by centrifugation at 4,000 rpm for 20 min. The cellular pellets were stored at -80 °C until ready for use.
[0255] Stored cellular pellets were resuspended in 25 mM Tris pH=7.9, 500 mM NaCl, and 10 mM imidazole with included protease inhibitor and were lysed by sonication. The clarified lysate was loaded onto an AKTA FPLC using a tandem purification method of nickel affinity and size-exclusion chromatography. The retained protein was eluted from the nickel affinity column using 25 mM Tris pH 7.9, 500 mM NaCl, 300 mM imidazole directly onto the size-exclusion column. The size-exclusion buffer was 25 mM PO4 pH 7.4 with 150 mM NaCl, and after elution and concentration, glycerol was added to a final concentration of 10%. Proteins were aliquoted, frozen, and stored at -80 °C.
[0256] Example 6. Removal of terminal amino acid residues from polypeptide analytes.
[0257] During the described NGPS-type encoding assay, progression from one encoding cycle to the next is achieved by cleaving the terminal amino acid residue encoded in the current cycle, which exposes the next terminal amino acid residue to be encoded in the following cycle.
[0258] Several approaches have been disclosed for removal of N-terminal amino acid residues of polypeptides, including chemical cleavage and enzymatic cleavage. In some embodiments, an engineered enzyme (“Cleavase”) that catalyzes the removal of a labeled (e.g., chemically labeled or modified) N-terminal amino acid can be used. Generation of such enzymes is disclosed in U.S. patent No. 11,427,814 B2, incorporated herein by reference. When such enzymes have biases towards the cleavage of particular NTAA residues, a set of the engineered enzymes can be employed that would cover the cleavage reaction for all (or most) standard NTAA residues (see US 11,427,814 B2). In Example 7 below, an M64-specific cleavase set is used, comprising 3 engineered enzymes having amino acid sequences set forth in SEQ ID NOs: 21-23, which were engineered from dipeptidyl peptidase from Thermomonas hydrothermalis (sequence set forth in SEQ ID NO: 24) as described in US 11,427,814 B2.
[0259] Modification of NTAA residues of polypeptide analytes is beneficial to limit cleavage of terminal residues to just a single residue per encoding cycle. Modified Cleavase enzymes were evolved to accommodate the installed NTM in their substrate binding pocket along with the NTAA residue, preventing progressive cleavage of the penultimate terminal amino acid residue of the polypeptide analyte, unless the NTM is installed again on the terminal residue formed after the cleavage of the original NTAA by the modified Cleavase enzyme (see US 11,427,814 B2). Therefore, in preferred embodiments, terminal amino acid residues of polypeptide analytes are modified before they can be cleaved.
[0260] In some embodiments, the N-terminal amino acid is removed or eliminated using any of the chemical methods described in US 2020/0348307 A1, US 2022/0227889 A1, and US 11,499,979 B2, all of which are incorporated herein by reference.
[0261] In other embodiments, the polypeptide is cleaved with a CTAA cleaving enzyme, such as C-terminal exopeptidases. The rate of cleavage of a C-terminal exopeptidase can be controlled to ensure cleavage of a single or few terminal amino acid residues.
[0262] A few different strategies may be employed to generate C-terminal cleavase enzymes capable of removing a C-terminal amino acid residue from polypeptide analytes. The first strategy employs a C-terminal carboxypeptidase (CP) to remove a single terminal amino acid residue from polypeptide analytes per cycle. The modified carboxypeptidase is preferably engineered to be dependent on a C-terminal modification that enables stepwise cyclic sequencing with removal of only one CTAA per cycle. The engineering process can be performed as described in US 11,427,814 B2, where similar modified Cleavases were engineered to recognize and cleave only modified NTAA residues from polypeptide analytes. Alternatively, the cleavase enzyme can be comprised of an engineered dipeptidyl carboxypeptidase which removes a dipeptide from the C-terminus of polypeptide analytes. Again, the modified dipeptidyl CP (DCP) is preferably engineered to be dependent on a C-terminal modification to enable stepwise cyclic sequencing (see FIG. 5). Furthermore, if the C-terminal modification also includes installation of an amino acid (native or unnatural) or bulky residue, the activity of the dipeptidyl CP will remove the resultant modified C-terminal dipeptide comprised of the modification and original CTAA. In this way, either an engineered CP or DCP can be used for stepwise C-terminal sequencing.
[0263] In some embodiments, C-terminal Cleavase enzymes are engineered from an M32 carboxypeptidase. The M32 family of CPs has two subfamilies: one subfamily is restricted to peptide substrates of 10 amino acid residues or less, whereas the other subfamily has no limitation on peptide substrate length. TaqCP is an example of a peptide length restricted member of the first subfamily, and BsuCP is a non-length restricted M32 CP of subfamily 2 (ypwA peptidase, MEROPS) (Lee, et al., “Insight into the Substrate Length Restriction of M32 Carboxypeptidases: Characterization of Two Distinct Subfamilies.” Proteins 77 (3): 647-57). In some embodiments, BsuCP is engineered into a C-terminal Cleavase with the ability to work on a peptide substrate of any length with a free C-terminal end. A C-terminal Cleavase may be engineered by modifying residues in the CTAA binding site using homology to TaqCP family homologues.
[0264] Example 7. Exemplary N and D amino acid residues detection using encoding assay.
[0265] In this example two exemplary binders were used to specifically bind to nucleic acid-polypeptide conjugates immobilized on a solid support. One binder binds to N-terminal asparagine residues of polypeptides (N-binder; SEQ ID NO: 1), and the other binder binds to N-terminal aspartate residues of polypeptides (D-binder; SEQ ID NO: 5). Both binders were engineered from human carbonic anhydrase I and II scaffolds by directed evolution, specifically as described in Example 2 of US 2022/0283175 A1. The binders were conjugated to corresponding nucleic acid coding tags comprising barcodes with identifying information regarding the binder. The coding tags specific for each binder were attached to SpyTag via a PEG linker, and the resulting fusions were conjugated with binder-SpyCatcher fusion protein via SpyTag-SpyCatcher interaction, essentially as described in US 2021/0208150 A1.
[0266] For the encoding assay, peptides including two test macromolecules (M64-NAEIAGDVAGGK(azide), hereafter M64-premodified N-peptide (SEQ ID NO: 25), and M64-DAEIAGDVAGGK(azide), hereafter M64-premodified D-peptide (SEQ ID NO: 26)) were individually attached to a 5’-phosphorylated and internal DBCO-modified short oligonucleotide, and each resulting peptide-DNA chimera was joined to a 5’-phosphorylated oligonucleotide including a barcode sequence using T4 DNA ligase. The barcoded peptide-DNA chimeras were pooled and immobilized on bead-attached capture DNA (SEQ ID NO: 27) using a ligation reaction. The barcoded DNA-polypeptide conjugate pool (200 nM) was mixed in 1x Quick ligation buffer with T4 DNA ligase and added to the capture DNA-attached beads. After a 30-minute incubation at 25 °C, the beads were washed once with PBST, once with 20% formamide in PBST, and twice with PBST. The recording tags in the conjugates contain a barcode for the macromolecule, a 2 nt overhang complementary region, a Type II restriction enzyme binding region, and a flanking region. After the chimeras were immobilized on the capture DNA-attached beads, the samples were incubated with Klenow fragment (3’->5’ exo-) (0.125 U/µL, MCLAB, USA), BtsI-v2 (0.5 U/µL, New England Biolabs, USA), dNTPs (each at 125 µM), and CutSmart buffer (50 mM Potassium acetate, 20 mM Tris-acetate, 10 mM Magnesium acetate, 100 µg/ml BSA, pH 7.9, New England Biolabs, USA) at 25 °C for 30 min and washed twice with PBST. As a result, a 2 nt overhang at the 3’ end was generated.
[0267] The coding tags attached to the N-binder and D-binder each form a loop with an 8 bp duplex and a 2 nt overhang at the 3’ end, which is complementary to the 3’ overhang of the recording tag on the beads. The coding tags contain unique barcodes for identification of the binders, and also have the BtsI-v2 binding sequence and the 2 nt complementary overhang region for the next binding cycle. The two binders (300 nM each) were incubated with the peptide-DNA chimera-immobilized beads in 50 mM MOPS buffer, pH 7.5, 33 mM Sodium Sulfate, 1 mM EDTA, at 25 °C for 30 min. After washing once with TBST, beads were incubated with 1.563 U/µL T4 DNA ligase (New England Biolabs, USA), 0.125 U/µL of Klenow fragment (3’->5’ exo-) (MCLAB, USA), dNTP mixture (125 µM of each) and 50 nM cycle-capping oligonucleotide in the quick ligase buffer (66 mM Tris-HCl, 10 mM MgCl2, 1 mM Dithiothreitol, 1 mM ATP, 7.5% Polyethylene glycol (PEG6000), pH 7.6) at 25 °C for 30 min, followed by two washes with TBST. During extension, barcode information was transferred from the coding tag to the recording tag, resulting in extended recording tags. The beads with extended recording tags were incubated with BtsI-v2 (0.5 U/µL, New England Biolabs, USA) and CutSmart buffer (50 mM Potassium acetate, 20 mM Tris-acetate, 10 mM Magnesium acetate, 100 µg/ml BSA, pH 7.9, New England Biolabs, USA) at 25 °C for 30 min and washed twice with PBST. As a result, a 2 nt overhang at the 3’ end was generated.
[0268] After the binding cycle, end capping was performed to introduce a primer site for downstream PCR that will amplify extended recording tags for analysis. Capping oligos were introduced that contain a loop DNA with a 2 nt 3’ overhang complementary to the 3’ overhang of the extended or unextended recording tags. Instead of performing a separate capping step, a primer site for downstream PCR can be introduced during the extension reaction using longer coding tags that contain a complementary primer sequence. 400 nM of end capping oligos were incubated with the beads in the presence of 12.5 U/µL of T4 DNA ligase, 0.125 U/µL of Klenow fragment (3’->5’ exo-) (MCLAB, USA) and dNTP mixture (125 µM of each) in 1x quick ligase buffer (66 mM Tris-HCl, 10 mM MgCl2, 1 mM Dithiothreitol, 1 mM ATP, 7.5% Polyethylene glycol 6000, pH 7.6) at 25 °C for 15 min. The samples were washed twice with TBST. The extended recording tags of the assay were subjected to PCR amplification and analyzed by next-generation sequencing (NGS), revealing barcode information about the binders that interacted with the macromolecules.
[0269] Exemplary encoding results generated by the described encoding method are shown in FIG. 6. The corresponding target peptides, M64-modified NAEIAGDVAGGK(azide) (SEQ ID NO: 25) and M64-modified DAEIAGDVAGGK (azide) (SEQ ID NO: 26) were encoded with a mix of N-binder (SEQ ID NO: 2) and D binder (SEQ ID NO: 6). Fractions of encoded recording tags (percentage of extended recording tags to total amount of recording tags on the beads (both extended and unextended)) were evaluated by NGS sequencing and showed specific encoding results for both binders.
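The encoding fraction defined above (extended recording tags as a percentage of all recording tags, extended and unextended) can be computed from sequencing reads as in the following Python sketch. The read representation (a binder barcode label per read, or `None` for an unextended recording tag) and the example counts are hypothetical and are not data from FIG. 6.

```python
from collections import Counter


def encoding_fractions(reads):
    """Percentage of recording tags extended with each binder barcode.

    `reads` holds one entry per sequenced recording tag: a barcode label for
    an extended tag, or None for an unextended tag (hypothetical format).
    """
    total = len(reads)
    if total == 0:
        raise ValueError("no recording tag reads supplied")
    counts = Counter(label for label in reads if label is not None)
    return {label: 100.0 * n / total for label, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical reads for an N-peptide encoded with a mix of N and D binders.
    reads = ["N-binder"] * 30 + ["D-binder"] * 2 + [None] * 68
    print(encoding_fractions(reads))
```

Computing the fraction separately for each peptide barcode gives the per-target specificity comparison described for the two binders.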
[0270] Example 8. Exemplary human plasma protein detection using NGPS assay.
[0271] A Next Generation Protein Sequencing (NGPS) assay, similar to the NGPS assay disclosed in US 20190145982 A1, was performed with plasma proteins. Binding of binders specific for M64-modified NTAA residues to immobilized polypeptide analytes, followed by encoding of the identifying information regarding the binders to recording tags attached to the polypeptide analytes, generates extended recording tags containing the whole “binding history” of the corresponding polypeptide analytes. Essentially, information about all sequential binding events for a polypeptide analyte accumulated on a single nucleic acid recording tag attached to this polypeptide analyte. The sequential addition of coding tag barcodes to the same recording tag attached to the polypeptide analyte offers a significant technical advantage when analyzing large numbers of macromolecules in a single assay (high-throughput analysis): it increases the throughput capability of the assay and greatly reduces the amount of sequencing needed to analyze a large plurality of polypeptides (e.g., only a single final extended recording tag needs to be analyzed for each polypeptide analyte).
[0272] Plasma proteins were digested to peptide fragments, modified with azide and loaded to beads, then sequenced by NGPS assay.
[0273] The binders were each conjugated to corresponding nucleic acid coding tags comprising barcodes with identifying information flanked by a 2 nt overhang at the 5’ terminus used for information transfer ligation reactions as described below. Coding tags specific for each binder were attached to a SpyTag peptide via a PEG linker, and the resulting fusion was conjugated with binder-SpyCatcher fusion protein via SpyTag-SpyCatcher interaction.
[0274] Each human plasma protein (100 µg) was processed using EasyPep MS Sample Prep Kits (Thermo Scientific, USA). Plasma protein was dissolved in 100 µL lysis solution, then reduced and alkylated at 95 °C for 10 min after adding 50 µL of reduction solution and 50 µL of alkylation solution. The resulting mixture was mixed with 50 µL of Trypsin/Lys-C protease mix, then incubated at 37 °C for 3 hours. After incubation, 50 µL of digestion stop solution was added. The digested peptide fragments were purified using C18 columns. The digested peptide fragment solution was bound to C18 resin, washed once with solution A, twice with solution B, and eluted with 300 µL of elution solution. The purified peptide fragment solution was dried using a SpeedVac, then dissolved in acetonitrile. The azide modification was performed in 50 µL of solution including 84% acetonitrile/16% water, 20 mM Lys(N3) and 0.96 mg/mL carboxypeptidase Y (Sigma-Aldrich, USA) at 30 °C for 16 hours. The azide-modified peptide fragments were purified by HPLC.
[0275] Azide-modified peptide fragments were attached to a short nucleic acid via dibenzocyclooctyne (DBCO) copper-free click chemistry, and the resulting peptide-DNA chimeras were barcoded and loaded onto hairpin capture DNA beads. The capture DNAs were conjugated to the beads using trans-cyclooctene (TCO) and methyltetrazine (mTet)-based click chemistry. TCO-modified short hairpin capture DNAs (16 basepair stem, 4 base loop, 17 base 5’ overhang) were reacted with mTet-coated beads. Phosphorylated nucleic acid-polypeptide conjugates (200 nM) were annealed and ligated to the hairpin DNAs attached to beads in 1x Quick ligation buffer with T4 DNA ligase by a 30-minute incubation at 25 °C. The beads were washed twice with PBST and resuspended in 50 µL of PBST. The recording tags in the conjugates contain a barcode for the plasma protein sample, a 2 nt overhang complementary region, a Type II restriction enzyme binding region, and a flanking region. After the chimeras were immobilized on the capture DNA-attached beads, the samples were incubated with Klenow fragment (3’->5’ exo-) (0.125 U/µL, MCLAB, USA), BtsI-v2 (0.5 U/µL, New England Biolabs, USA), dNTPs (each at 125 µM), and CutSmart buffer (50 mM Potassium acetate, 20 mM Tris-acetate, 10 mM Magnesium acetate, 100 µg/ml BSA, pH 7.9, New England Biolabs, USA) at 25 °C for 30 min and washed twice with PBST. As a result, a 2 nt overhang at the 3’ end was generated.
[0276] After termini preparation, an exemplary N-terminal modifier (NTM) M64 was installed on N-terminal amino acid (NTAA) residues of peptides immobilized on beads, providing higher affinity and specificity during binding reactions with metalloprotein-based binders that recognize NTM-modified NTAA residues (a proper size of NTM that fits a binding pocket of the binding metalloprotein binder is required, see also US 2022/0283175 A1, incorporated herein by reference), and also compatibility with removal of the NTM-modified NTAA residues after binding by Cleavase enzymes (see U.S. patent No. 11,427,814 B2, incorporated herein by reference).
[0277] An exemplary method of installing the M64 NTM onto NTAA residues of immobilized peptides, shown as NTAA-PP, is provided below. Beads with immobilized peptides were treated with 25 µL of 0.4 M MOPS buffer, pH=7.6 and 25 µL of acetonitrile (ACN). Separately, the active ester reagent was prepared from M64 and dissolved in 25 µL DMA and 25 µL ACN to a concentration of 0.05 M stock solution. Then, 50 µL of the active ester stock solution was added to the peptide-ACN:MOPS solution and incubated at 45 °C for 60 minutes. Upon completion, the peptides were functionalized with the respective modification as shown in the scheme below. Beads with immobilized and M64-modified recording tag-polypeptide conjugates were washed to remove excess M64 reagent.
[0278] A mixture of seven binders was used comprising: a) L binder (SEQ ID NO: 12) having specificity for N-terminally modified L (leucine) amino acid residues of peptides; b) N binder (SEQ ID NO: 2) having specificity for N-terminally modified N (asparagine) amino acid residues of peptides; c) D-E binder (SEQ ID NO: 6) having specificity for N-terminally modified D (aspartic acid) and E (glutamic acid) amino acid residues of peptides; d) F binder (SEQ ID NO: 16) having specificity for N-terminally modified F (phenylalanine) amino acid residues of peptides; e) V-I binder (SEQ ID NO: 18) having specificity for N-terminally modified V (valine) and I (isoleucine) amino acid residues of peptides; f) Y binder (SEQ ID NO: 20) having specificity for N-terminally modified Y (tyrosine) amino acid residues of peptides; g) S binder (SEQ ID NO: 14) having specificity for N-terminally modified S (serine) amino acid residues of peptides. Each binder was conjugated to a corresponding nucleic acid coding tag comprising a barcode with identifying information regarding the binder as described above.
[0279] The binders (400 nM L binder, 600 nM N binder, 600 nM D-E binder, 400 nM F binder, 600 nM V-I binder, 600 nM Y binder and 600 nM S binder) were incubated with the beads in 50 mM MOPS buffer, 33 mM Na2SO4, 1 mM EDTA and 0.1% Tween 20, pH 7.5 at 25 °C for 30 min. After washing with TBST, 50 nM of cycle capping oligonucleotides, 1x quick ligation buffer, 12.5 U/µL T4 DNA ligase, 0.125 U/µL Klenow fragment exo-, 125 µM dNTP each, and 0.1% Tween 20 were used for incubation at 25 °C for 30 min. After washing twice with TBST, the resulting beads were incubated with digestion reagent (1x rCutSmart Buffer, 0.05 U/µL BtsI-v2, 0.1% Tween 20) at 37 °C for 30 minutes. Cycle capping oligonucleotides were used to extend the recording tags of peptide analytes that did not interact with a binder and were therefore not encoded (recording tags that were non-extended in a given cycle). This generates compatible termini in all recording tags attached to peptide analytes, making them suitable for the next encoding cycle.
[0280] After washing with TBST, the encoded beads were incubated with the cleavase mix that is configured to cleave the majority of modified NTAA residues, which includes 20 nM Z11, 2 nM Z13 and 2 nM Z15 in 0.2x TBE buffer, at 45 °C for 1 hour to cleave the N-terminal M64-modified amino acids of peptides. The cleavase-treated beads were used for the next cycle's N-terminal modification step. In this experiment, four cycles of N-terminal M64 modification, encoding and cleavage, followed by an additional N-terminal M64 modification and encoding, were carried out to sequence 5 amino acid residues.
[0281] Before analyzing the extended recording tags by NGS, the end capping step was carried out in this experiment. The goal of the end capping step is to introduce primers for amplification of the extended recording tags. The conditions for the end capping step were as follows: 1x quick ligation buffer, 12.5 U/µL of T4 DNA ligase, 0.125 U/µL Klenow fragment exo-, 125 µM dNTP each, 0.1% Tween 20, and 0.4 µM each of the end cap oligos at 25 °C for 15 min. After washing, 1x rCutSmart Buffer, 0.02 U/µL of USER enzyme (New England Biolabs, USA) and 0.1% Tween 20 were applied to the samples at 37 °C for 30 min. After washing, the extended recording tags were amplified by PCR. The final products were analyzed by next-generation sequencing (NGS).
[0282] During the first “binding-encoding” cycle, identifying information regarding specific binders bound to the M64-modified NTAAs was recorded in extended recording tags. After the first “binding-encoding” cycle was complete, M64-modified NTAAs of immobilized polypeptides were cleaved, exposing a new NTAA residue for each immobilized polypeptide. Then, the “binding-encoding” cycle was repeated with a new mixture of binders specific for M64-modified NTAAs, and identifying information regarding specific binders bound to the new M64-modified NTAAs was recorded in extended recording tags. After that, a new M64-modified NTAA cleavage step occurred, and so on. Finally, analyzing extended recording tags containing the whole “binding history” of corresponding polypeptide analytes by NGS allows for deducing the identity of sequential terminal amino acid residues of polypeptide analytes, if binders have at least a level of specificity for a particular modified NTAA over other modified NTAAs.
[0283] The result of this experiment is shown in FIG. 7, which indicates encoding results for human serum albumin (HSA) protein. Although each binder has specificity towards a particular NTAA, each of the binders might still recognize several other NTAA residues, creating ambiguity in recognition, especially when binders compete with each other for a target residue. Particular reads (e.g., LZNNN, LZLYY, etc.) indicated on the horizontal axis of FIG. 7 were each assigned during the analysis to a particular peptide indicated after each read (separated by ::ALB:). Although no perfect matching of binders to particular NTAAs was observed, particular peptides may be identified with a certain probability using a known binding pattern for each binder, and based on identification of peptides, the protein from which the peptides were generated may be identified in a biological sample (HSA in this case).
[0284] In some embodiments, binders in the set of binders have only medium specificity towards a particular target moiety (e.g., TAA of polypeptides) such that a binder binds more than one target moiety and there is a significant probability of incorrect moiety identification based on a single binding event. In these embodiments, amino acid sequence of a polypeptide may be inferred based on (i) binding profiles of the binders from the set of binders that correspond to encoder barcode sequences present in extended recording tag sequences, and (ii) calculated probability scores of an association between a string of encoder barcode sequences that correspond to binders from the set of binders that bind to the polypeptide and one or more amino acid sequences of polypeptides of the plurality of polypeptides, as described in US patent application 18/951,277 filed on November 11, 2024.
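The probability-based inference outlined above can be illustrated with a short Python sketch. It is a simplified illustration only and does not reproduce the scoring method of the referenced application: it assumes that binding events in successive cycles are independent and that all candidate peptides are equally likely a priori. The binding profile values, the three binder labels and the candidate sequences are hypothetical and were chosen only to make the arithmetic easy to follow.

```python
from math import prod

# Hypothetical binding profiles: PROFILE[true_residue][binder_label] is the
# assumed probability that the given binder is recorded for that NTAA.
PROFILE = {
    "L": {"L": 0.8, "N": 0.1, "D": 0.1},
    "N": {"L": 0.1, "N": 0.7, "D": 0.2},
    "D": {"L": 0.1, "N": 0.2, "D": 0.7},
}


def read_likelihood(read, peptide_prefix, profile):
    """Probability of observing `read` (one binder label per cycle) given the
    N-terminal residues of a candidate peptide, assuming independent cycles."""
    if len(read) > len(peptide_prefix):
        return 0.0
    return prod(
        profile.get(residue, {}).get(label, 0.0)
        for label, residue in zip(read, peptide_prefix)
    )


def score_candidates(read, candidates, profile):
    """Normalize the likelihoods over the candidates (uniform prior assumed)."""
    likelihoods = {c: read_likelihood(read, c, profile) for c in candidates}
    total = sum(likelihoods.values())
    if total == 0:
        return {c: 0.0 for c in candidates}
    return {c: value / total for c, value in likelihoods.items()}


# Hypothetical three-cycle read and candidate N-terminal sequences.
posterior = score_candidates("LDN", ["LDN", "LNN", "DLL"], PROFILE)
best_candidate = max(posterior, key=posterior.get)
```

With these assumed profiles, the read is most strongly associated with the candidate whose residues match the recorded binders, while the candidate differing at one position keeps a non-negligible score, which mirrors the ambiguity discussed for FIG. 7.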
[0285] Example 9. Detection of N-glycosylated site occupancy.
[0286] In order to switch the “analysis” method described in Example 8 to occupancy detection of N-glycosylated site(s), a deglycosylation step was performed before performing the NGPS assay. The beads were treated with recombinant PNGase F (New England Biolabs, USA) to remove the N-linked oligosaccharides from glycosylated peptides attached to beads. Plasma peptide-loaded beads (5000 beads) were resuspended in 10 µL of 1x Deglycosylation Mix Buffer 1 (New England Biolabs, USA) including 1 µL of Protein Deglycosylation Mix II. The bead mixture was incubated at 25 °C for 30 minutes, then at 37 °C for 16 hours. After incubation, the beads were washed once with TBST, once with TBST including 500 mM NaCl, and twice with TBST. Then, the beads were used in the NGPS assay as described in Example 8.
[0287] In some embodiments, releasing N-linked glycans from polypeptides comprises treating polypeptides with PNGase F (1,000 U per 20 µg protein) in 50 mM sodium phosphate, pH 7.5 or 50 mM Tris-HCl, pH 8.0, and 1% NP-40 at 37 °C for 2-16 hours (or overnight) with gentle mixing.
[0288] The deglycosylation treatment by PNGase F converts N-glycosylated N residues at the N-glycosylation sites into D residues. For example, Haptoglobin is known to be glycosylated at four different sites. Trypsin digestion of Haptoglobin generates three tryptic fragments (SEQ ID NO: 28-SEQ ID NO: 30) that together contain these four sites. If the glycosylation sites are occupied by glycans, PNGase F treatment generates three different peptides that contain “D” residues instead of “N” (SEQ ID NO: 31-SEQ ID NO: 33, see FIG. 8). The NGPS method as described in Example 8 identifies which of the peptides having sequences set forth in SEQ ID NO: 28-SEQ ID NO: 33 are present in a sample containing Haptoglobin. If the amounts of each peptide are determined, the ratio of the deglycosylated form (e.g., the peptide having sequence SEQ ID NO: 31) to the total amount (deglycosylated (SEQ ID NO: 31) plus native (SEQ ID NO: 28)) can be used to quantify the percent occupancy of the glycosylation site (in this case, N184 of Haptoglobin).
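The N-to-D conversion and the occupancy ratio described above can be sketched in Python as follows. This is a minimal illustration rather than part of the disclosed method. The function names and the read counts are hypothetical, and the sequon rule used to locate glycosylation sites (N-X-S/T, where X is any residue except proline) is the commonly used consensus motif, which is an assumption not stated in this Example. The sketch also assumes that every sequon in a peptide is occupied.

```python
import re

# Consensus N-glycosylation sequon N-X-S/T (X != P). The lookahead leaves the
# following residues unconsumed, so adjacent or overlapping sequons are found.
SEQUON = re.compile(r"N(?=[^P][ST])")


def deglycosylated_form(peptide):
    """Sequence expected after PNGase F treatment if every sequon was occupied."""
    return SEQUON.sub("D", peptide)


def site_occupancy(deglycosylated_count, native_count):
    """Deglycosylated ('D') form divided by the total of 'D' and native 'N' forms."""
    total = deglycosylated_count + native_count
    if total == 0:
        raise ValueError("no molecules observed for this site")
    return deglycosylated_count / total


# Haptoglobin tryptic peptides, keyed by SEQ ID NO (28-30).
NATIVE = {
    28: "MVSHHNLTTGATLINEQWLLTTAK",
    29: "NLFLNHSENATAK",
    30: "WVLHPNYSQVDIGLIK",
}
predicted = {seq_id: deglycosylated_form(seq) for seq_id, seq in NATIVE.items()}

# Hypothetical counts of SEQ ID NO: 31 ('D' form) and SEQ ID NO: 28 ('N' form).
occupancy_n184 = site_occupancy(deglycosylated_count=30, native_count=70)
```

Applied to SEQ ID NO: 28-SEQ ID NO: 30, the conversion yields the sequences listed as SEQ ID NO: 31-SEQ ID NO: 33; with the hypothetical counts above, the estimated occupancy of the site is 30%.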
[0289] Example 10. Addressing spontaneous deamidation of asparagine residues at N-glycosylated sites during quantification of glycosylation.
[0290] One of the challenges in detection and quantitation of glycosylation site occupancy is spontaneous deamidation of asparagine residues, especially those in close proximity to a glycine residue or other small amino acid (see e.g., Palmisano G, et al., Chemical deamidation: a common pitfall in large-scale N-linked glycoproteomic mass spectrometry-based analyses. J Proteome Res. 2012 Mar 2; 11(3): 1949-57). Accurate quantification of glycosylation by the method described in Example 9 rests on the assumption that there is no spontaneous deamidation of asparagine residues; any such deamidation would impair the quantification results.
[0291] To quantify the degree of glycosylation for the N-linked glycosylation site of polypeptide analytes in a sample, the method described in Example 9 is modified as follows. The beads with immobilized polypeptide molecules are divided in half, wherein one half is treated with PNGase F, while the other half is not treated with PNGase F. Each fraction is barcoded with a barcode nucleic acid sequence by attaching the fraction-specific barcode sequence to a corresponding associated nucleic acid recording tag attached to each polypeptide, which allows tracking during the following analysis of whether a particular polypeptide molecule originated from the “treated” or “untreated” fraction. After that, both fractions are again combined and processed by NGPS as described in Example 9. In addition, to enable better quantification, the same reference peptide(s) from glycosylated and non-glycosylated fractions may be analyzed to calculate possible differences between the two fractions.
[0292] When polypeptide molecules not treated with the PNGase are identified during the analysis as having a “D” residue at an N-glycosylated site, this indicates spontaneous deamidation, because normally, polypeptide molecules not treated with the PNGase should contain only an “N” residue at the N-glycosylated site. The presence of a glycan at the N-glycosylated site precludes identification of the “N” residue at the N-glycosylated site, since the N-binder would not recognize the N-glycan. Accordingly, by comparing amounts of polypeptide molecules having a “D” residue at the N-glycosylated site in each of the two fractions (the fractions are distinguished by the fraction-specific barcode), glycosylation at the N-glycosylated site may be quantified more accurately. FIG. 9 illustrates a possible experimental design for glycosylation quantification, showing a case of heterogeneity at a particular N-glycosylation site of a polypeptide. A1-C1 are different polypeptide molecules of a polypeptide in fraction 1, and A2-C2 are different polypeptide molecules of the same polypeptide in fraction 2. Presence of either “N” or “D” is shown at the N-glycosylated site of the polypeptide in each species. C1 and C2 represent spontaneous deamidation of asparagine residues at the N-glycosylated site. The NGPS assay can be used to estimate the amounts of C1 (the amount of A1 could not be determined), A2+C2 (the total amount of species having “D”), and B2 species. To estimate what fraction of protein molecules is glycosylated (FG), one may use the equation FG = A2/(A2+B2+C2). While A2 could not be directly measured by NGPS, it may be estimated as A2 = (A2+C2) - C1, if we assume that the amounts of C1 and C2 remain approximately equal and that the PNGase treatment per se does not induce deamidation (in this experiment, fraction 1 is incubated in the same conditions as fraction 2, but without PNGase). Thus, the fraction of protein molecules glycosylated is FG = [(A2+C2) - C1]/(A2+B2+C2), and can be calculated from the NGPS experiment described in Example 9.
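The deamidation-corrected estimate FG = [(A2+C2) - C1]/(A2+B2+C2) can be written as a short Python sketch. It is illustrative only: the function name, the optional `scale` factor (standing in for a normalization between the two fractions, for example one derived from reference peptides) and the read counts are hypothetical. Clamping a negative estimate of A2 to zero is likewise an assumption of this sketch, not a step of the described method.

```python
def glycosylated_fraction(d_treated, n_treated, d_untreated, scale=1.0):
    """Estimate FG = [(A2 + C2) - C1] / (A2 + B2 + C2).

    d_treated   -- molecules with "D" at the site in the PNGase-treated fraction (A2 + C2)
    n_treated   -- molecules with "N" at the site in the PNGase-treated fraction (B2)
    d_untreated -- molecules with "D" at the site in the untreated fraction (C1)
    scale       -- hypothetical factor that puts untreated counts on the scale
                   of the treated fraction (1.0 means equal sampling is assumed)
    """
    total_treated = d_treated + n_treated
    if total_treated <= 0:
        raise ValueError("no molecules observed in the treated fraction")
    # A2 is not measured directly; it is estimated by subtracting the
    # spontaneous-deamidation background seen without PNGase (C1 ~ C2).
    a2_estimate = d_treated - scale * d_untreated
    return max(a2_estimate, 0.0) / total_treated


# Hypothetical counts: 450 "D" and 550 "N" molecules in the treated fraction,
# 50 "D" molecules in the untreated fraction.
fg_corrected = glycosylated_fraction(d_treated=450, n_treated=550, d_untreated=50)
fg_uncorrected = 450 / (450 + 550)
```

With these hypothetical counts, ignoring spontaneous deamidation would give 45% glycosylation, whereas the corrected estimate is 40%.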
[0293] Sequences
[0294] SEQ ID NO: 1 - N-binder; S222 0397 (hCAII scaffold sequence only): SHHWGYGKHNGPEHWHKDFPIAKGERQSPVDIDTHTAKYDPSLKPLSVSYDQATSLRIL NDGHAFAVEFDDSQDKAVLKGGPLDGTYRLFQFHFHWGSLDGQGSEHTVDKKKYAA ELHLVHWNTKYGDFGKAKQQPDGLAVLGIFLKVGSAKPGLQKWDVLDSIKTKGKSA DFTNFDPRGLLPESLDYWTYPGSQTVPPLLESVTWIVLKEPISVSSEQVLKFRKLNFNGE GEPEELMVDNWRPAQPLKNRQIKASFK
[0295] SEQ ID NO: 2 - N-binder-SpyCatcher Fusion; S222_0397:
MGSSHHHHHHSSGSHHWGYGKHNGPEHWHKDFPIAKGERQSPVDIDTHTAKYDPSLK PLSVSYDQATSLRILNDGHAFAVEFDDSQDKAVLKGGPLDGTYRLFQFHFHWGSLDGQ GSEHTVDKKKYAAELHLVHWNTKYGDFGKAKQQPDGLAVLGIFLKVGSAKPGLQKV VDVLDSIKTKGKSADFTNFDPRGLLPESLDYWTYPGSQTVPPLLESVTWIVLKEPISVSS EQVLKFRKLNFNGEGEPEELMVDNWRPAQPLKNRQIKASFKGGSGGGSGGGSGGSVDT LSGLSSEQGQSGDMTIEEDSATHIKFSKRDEDGKELAGATMELRDSSGKTISTWISDGQV
KDFYLYPGKYTFVETAAPDGYEVATAITFTVNEQGQVTVNGKATKGDAHI
[0296] SEQ ID NO: 3 - N-binder; S222 0400:
SHHWGYGKHNGPEHWHKDFPIAKGERQSPVDIDTHTAKYDPSLKPLSVSYDQATSLRIL
NDGHAFLVEFDDSQDKAVLKGGPLDGTYRLKQFHFHWGSLDGQGSEHTVDKKKYAA ELHLVHWNTKYGDFGKAKQQPDGLAVLGIFLKVGSAKPGLQKWDVLDSIKTKGKSA DFTNFDPRGLLPESLDYWTYPGSQTIPPLLESVTWIVLKEPISVSSEQVLKFRKLNFNGEG EPEELMVDNWRPAQPLKNRQIKASFK
[0297] SEQ ID NO: 4 - N-binder candidate; E. coli; AsnNAAB - from US 20210072252 A1, asparagine-tRNA synthetase (Table A; SEQ ID NO: 1):
SIEYLREVAHLRPRTNLIGAVARVRHTLAQALHRFFNEQGFFWVSTPLITASDTEGAGE MFRVSTLDLENLPRNDQGKVDFDKDFFGKESFLTVSGQLNGETYACALSKIYTFGPTFR AENSNTSRHLAEFWMLEPEVAFANLNDIAGLAEAMLKYVFKAVLEERADDMKFFAER VDKDAVSRLERFIEADFAQVDYTDAVTILENCGRKFENPVYWGVDLSSEHERYLAEEH FKAPVWKNYPKDIKAFYMRLNEDGKTVAAMDVLAPGIGEIIGGSQREERLDVLDERM LEMGLNKEDYWWYRDLRRYGTVPHSGFGLGFERLIAYVTGVQNVRDVIPFPRTP
[0298] SEQ ID NO: 5 - D-binder; S254 0083 (hCAI scaffold sequence only):
ASPDWGYDDKNGPEQWSKLYPIANGNNQSPVDIKTSETKHDTSLKPISVSYNPATAKEII NVGHSFRVNFEDNDNRSVLKGGPFSDSYRLYQFHFHWGSTNEHGSEHTVDGVKYSAEL HWHWNSAKYSSLAEALSKADGIAVIGVLMKVGEANPKLQKVLDALQAIKTKGKRAP FTNFDPSTLLPSSLDFWTYPGSWTAPPLYEIVTWIILKESISVSSEQLAQFRSLLSNVEGDN AVPMQHNNRPTQPLKGRTVRASF
[0299] SEQ ID NO: 6 - D-binder-SpyCatcher Fusion; S254 0083:
MGSSHHHHHHSSGASPDWGYDDKNGPEQWSKLYPIANGNNQSPVDIKTSETKHDTSLK PISVSYNPATAKEIINVGHSFRVNFEDNDNRSVLKGGPFSDSYRLYQFHFHWGSTNEHGS EHTVDGVKYSAELHVVHWNSAKYSSLAEALSKADGIAVIGVLMKVGEANPKLQKVLD ALQAIKTKGKRAPFTNFDPSTLLPSSLDFWTYPGSWTAPPLYEIVTWIILKESISVSSEQL
AQFRSLLSNVEGDNAVPMQHNNRPTQPLKGRTVRASFGGSGGGSGGGSGGSVDTLSGL
SSEQGQSGDMTIEEDSATHIKFSKRDEDGKELAGATMELRDSSGKTISTWISDGQVKDF
YLYPGKYTFVETAAPDGYEVATAITFTVNEQGQVTVNGKATKGDAHI
[0300] SEQ ID NO: 7 - D-binder; S222 0336:
SHHWGYGKHNGPEHWHKDFPIAKGERQSPVDIDTHTAKYDPSLKPLSVSYDQATSLRIL NNGHTFNVEFDDSQDKAVLKGGPLDGTYRLFQFHFHWGSLDGQGSEHTVDKKKYAAE LHLVHWNTKYGDFGKAIQQPDGIAILGIFLKVGSAKPGLQKWDVLDSIKTKGKSADFT NFDPRGLLPESLDYWTYPGSMTIPPLLESVTWIVLKEPISVSSEQVLKFRKLNFNGEGEPE ELMVDNWRPAQPLKNRQIKASFK
[0301] SEQ ID NO: 8 - D-binder; S222 0195:
SHHWGYGKHNGPEHWHKDFPIAKGERQSPVDIDTHTAKYDPSLKPLSVSYDQATSLRIL NNGHAFNVEFDDSQDKAVLKGGPLDGTYRLIQFHFHWGSLDGQGSEHTVDKKKYAAE LHLVHWNTKYGDFRKAVQQPDGLAVLGIFLKVGSAKPGLQKVVDVLDSIKTKGKSAD FTNFDPRGLLPESLDYWTYPGSRTTPPLWESVTWIVLKEPISVSSEQVLKFRKLNFNGEG EPEELMVDNWRPAQPLKNRQIKASFK
[0302] SEQ ID NO: 9 - D-binder; E. coli; AspNAAB - from US 20210072252 A1, aspartic acid-tRNA synthetase (Table A; SEQ ID NO: 14):
LPLDSNHVNTEEARLKYRYLDLRRPEMAQRLKTRAKITSLVRRFMDDHGFLDIETPMLT KATPEGARDYLVPSRVHKGKFYALPQSPQLFKQLLMMSGFDRYYQIVKCFRDEDLRAD RQPEFTQIDVETSFMTAPQVREVMEALVRHLWLEVKGVDLGDFPVMTFAEAERRYGSD KPDLRNPMELTDVADLLRSVEFAVFAGPANDPKGRVAALRVPGGASLTRKQIDEYDNF VKIYGAKGLAYIKVNERAKGLEGINSPVAKFLNAEHEAILDRTAAQDGDMIFFGADNK KIVADAMGALRLKVGKDLGLTDESKWAPLWVIDFPMFEDDGEGGLTAMHHPFTSPKD MTAAELKAAPENAVANAYDMVINGYEVGGGSVRIHNGDMQQTVFGILGINEEEQREK FGFLLDALKYGTPPHAGLAFGLDRLTMLLTGTDNIRDVIAFPK
[0303] SEQ ID NO: 10 - D-binder; Vibrio vulnificus from US 20200219590 A1 (Table 1; SEQ ID: 17):
MSSDIHQIKIGLTDNHPCSYLPERKERVAVALEADMHTADNYEVLLANGFRRSGNTIYK PHCDSCHSCQPIRISVPDIELSRSQKRLLAKARSLSWSMKRNMDENWFDLYSRYIVARH RNGTMYPPKKDDFAHFSRNQWLTTQFLHIYEGQRLIAVAVTDIMDHCASAFYTFFEPEH ELSLGTLAVLFQLEFCQEEKKQWLYLGYQIDECPAMNYKVRFHRHQKLVNQRWQ
[0304] SEQ ID NO: 11 - L-binder; S222 0151 (hCAII scaffold sequence only):
SHHWGYGKHNGPEHWHKDFPIAKGERQSPVDIDTHTAKYDPSLKPLSVSYDQATSLRIL
NNGHVFGVEFDDSQDKAVLKGGPLDGTYRLNQFHFHWGSLDGQGSEHTVDKKKYAA ELHLVHWNTKYGDLGKAVQQPDGIAILGIFLKVGSAKPGLQKWDVLDSIKTKGKSAD FTNFDPRGLLPESLDYWTYPGSLTIPPLLESVTWIVLKEPISVSSEQVLKFRKLNFNGEGE PEELMVDNWRPAQPLKNRQIKASFK
[0305] SEQ ID NO: 12 - L-binder-SpyCatcher Fusion; S222 0151 :
MGSSHHHHHHSSGSHHWGYGKHNGPEHWHKDFPIAKGERQSPVDIDTHTAKYDPSLK PLSVSYDQATSLRILNNGHVFGVEFDDSQDKAVLKGGPLDGTYRLNQFHFHWGSLDGQ GSEHTVDKKKYAAELHLVHWNTKYGDLGKAVQQPDGIAILGIFLKVGSAKPGLQKW DVLDSIKTKGKSADFTNFDPRGLLPESLDYWTYPGSLTIPPLLESVTWIVLKEPISVSSEQ VLKFRKLNFNGEGEPEELMVDNWRPAQPLKNRQIKASFKGGSGGGSGGGSGGSVDTLS GLSSEQGQSGDMTIEEDSATHIKFSKRDEDGKELAGATMELRDSSGKTISTWISDGQVK DFYLYPGKYTFVETAAPDGYEVATAITFTVNEQGQVTVNGKATKGDAHI
[0306] SEQ ID NO: 13 - S-binder; S222 0415 (hCAII scaffold sequence only):
SHHWGYGKHNGPEHWHKDFPIAKGERQSPVDIDTHTAKYDPSLKPLSVSYDQATSLRIL NNGHAFDVEFDDSQDKAVLKGGPLDGTYRLHQFHFHWGSLDGQGSEHTVDKKKYAA ELHLVHWNTKYGDYGKAAQQPDGIAILGIFLKVGSAKPGLQKVVDVLDSIKTKGKSAD FTNFDPRGLLPESLDYWTYPGSMTIPPLLESVTWIVLKEPISVSSEQVLKFRKLNFNGEGE PEELMVDNWRPAQPLKNRQIKASFK
[0307] SEQ ID NO: 14 - S-binder-SpyCatcher Fusion; S222 0415:
MGSSHHHHHHSSGSHHWGYGKHNGPEHWHKDFPIAKGERQSPVDIDTHTAKYDPSLK PLSVSYDQATSLRILNNGHAFDVEFDDSQDKAVLKGGPLDGTYRLHQFHFHWGSLDGQ GSEHTVDKKKYAAELHLVHWNTKYGDYGKAAQQPDGIAILGIFLKVGSAKPGLQKW DVLDSIKTKGKSADFTNFDPRGLLPESLDYWTYPGSMTIPPLLESVTWIVLKEPISVSSEQ VLKFRKLNFNGEGEPEELMVDNWRPAQPLKNRQIKASFKGGSGGGSGGGSGGSVDTLS GLSSEQGQSGDMTIEEDSATHIKFSKRDEDGKELAGATMELRDSSGKTISTWISDGQVK DFYLYPGKYTFVETAAPDGYEVATAITFTVNEQGQVTVNGKATKGDAHI
[0308] SEQ ID NO: 15 - F-binder; S222 0903 (hCAII scaffold sequence only):
SHHWGYGKHNGPEHWHKDFPIAKGERQSPVDIDTHTAKYDPSLKPLSVSYDQATSLRIL NNGHTFIVEFDDSQDKAVLKGGPLDGTYRLVQFHFHWGSLDGQGSEHTVDKKKYAAE LHLVHWNTKYGDLGKALQQPDGLAILGIFLKVGSAKPGLQKVVDVLDSIKTKGKSADF TNFDPRGLLPESLDYWTYPGSWTVPPLLESVTWIVLKEPISVSSEQVLKFRKLNFNGEGE PEELMVDNWRPAQPLKNRQIKASFK
[0309] SEQ ID NO: 16 - F-binder-SpyCatcher Fusion; S222 0903:
MGSSHHHHHHSSGSHHWGYGKHNGPEHWHKDFPIAKGERQSPVDIDTHTAKYDPSLK
PLSVSYDQATSLRILNNGHTFIVEFDDSQDKAVLKGGPLDGTYRLVQFHFHWGSLDGQ
GSEHTVDKKKYAAELHLVHWNTKYGDLGKALQQPDGLAILGIFLKVGSAKPGLQKW
DVLDSIKTKGKSADFTNFDPRGLLPESLDYWTYPGSWTVPPLLESVTWIVLKEPISVSSE
QVLKFRKLNFNGEGEPEELMVDNWRPAQPLKNRQIKASFKGGSGGGSGGGSGGSVDTL SGLSSEQGQSGDMTIEEDSATHIKFSKRDEDGKELAGATMELRDSSGKTISTWISDGQV KDFYLYPGKYTFVETAAPDGYEVATAITFTVNEQGQVTVNGKATKGDAHI
[0310] SEQ ID NO: 17 - V-binder; S254 0366 (hCAI scaffold sequence only):
ASPDWGYDDKNGPEQWSKLYPIANGNNQSPVDIKTSETKHDTSLKPISVSYNPATAKEII
NVGHSFWVNFEDNDNRSVLKGGPFSDSYRLFQFHFHWGSTNEHGSEHTVDGVKYSAE
LHVAHWNSAKYSSLAEAQSKADGVAIIGVLMKVGEANPKLQKVLDALQAIKTKGKRA PFTNFDPSTLLPSSLDFWTYPGSLTVPPLFESVTWIILKESISVSSEQLAQFRSLLSNVEGD NAVPMQHNNRPTQPLKGRTVRASF
[0311] SEQ ID NO: 18 - V-binder-SpyCatcher Fusion; S254 0366:
MGSSHHHHHHSSGASPDWGYDDKNGPEQWSKLYPIANGNNQSPVDIKTSETKHDTSLK
PISVSYNPATAKEIINVGHSFWVNFEDNDNRSVLKGGPFSDSYRLFQFHFHWGSTNEHG
SEHTVDGVKYSAELHVAHWNSAKYSSLAEAQSKADGVAIIGVLMKVGEANPKLQKVL
DALQAIKTKGKRAPFTNFDPSTLLPSSLDFWTYPGSLTVPPLFESVTWIILKESISVSSEQL
AQFRSLLSNVEGDNAVPMQHNNRPTQPLKGRTVRASFGGSGGGSGGGSGGSVDTLSGL SSEQGQSGDMTIEEDSATHIKFSKRDEDGKELAGATMELRDSSGKTISTWISDGQVKDF YLYPGKYTFVETAAPDGYEVATAITFTVNEQGQVTVNGKATKGDAHI
[0312] SEQ ID NO: 19 - Y-binder; S222 0395 (hCAII scaffold sequence only):
SHHWGYGKHNGPEHWHKDFPIAKGERQSPVDIDTHTAKYDPSLKPLSVSYDQATSLRIL
NDGHAFKVEFDDSQDKAVLKGGPLDGTYRLEQFHFHWGSLDGQGSEHTVDKKKYAA ELHLVHWNTKYGDLGKAMQQPDGLAILGIFLKVGSAKPGLQKWDVLDSIKTKGKSA DFTNFDPRGLLPESLDYWTYPGSQTIPPLIESVTWIVLKEPISVSSEQVLKFRKLNFNGEG EPEELMVDNWRPAQPLKNRQIKASFK
[0313] SEQ ID NO: 20 - Y-binder-SpyCatcher Fusion; S222 0395:
MGSSHHHHHHSSGSHHWGYGKHNGPEHWHKDFPIAKGERQSPVDIDTHTAKYDPSLK
PLSVSYDQATSLRILNDGHAFKVEFDDSQDKAVLKGGPLDGTYRLEQFHFHWGSLDGQ GSEHTVDKKKYAAELHLVHWNTKYGDLGKAMQQPDGLAILGIFLKVGSAKPGLQKVV DVLDSIKTKGKSADFTNFDPRGLLPESLDYWTYPGSQTIPPLIESVTWIVLKEPISVSSEQ
VLKFRKLNFNGEGEPEELMVDNWRPAQPLKNRQIKASFKGGSGGGSGGGSGGSVDTLS
GLSSEQGQSGDMTIEEDSATHIKFSKRDEDGKELAGATMELRDSSGKTISTWISDGQVK
DFYLYPGKYTFVETAAPDGYEVATAITFTVNEQGQVTVNGKATKGDAHI
[0314] SEQ ID NO: 21 - Z11 Cleavase enzyme:
DEGMWVPQQLPEIAGALKKAGLKLDPKQLSDLTGDPMGAWSLGGCTGSFVSPQGLV
ATNHHCAYGAIQLNSTPEKNLIKDGFNAPTQADELSAGPNARIYVLEGITDVTAQAKAA
MAAAGNDPVARANALEAFEKKITSDCEAEPGYRCRVYSFMGGITYRLFKNLEIKDVRL
VYAPPSSVGKFGGDIDMWMWPSHTGDFSFYRAYVGKDGKPAPYSKDNVPYRPKHWL
KIADTPLGEGDFVMVAGYPGRTDRYALVAEFENTQRWLYPAISKAYKDQIALVEAAAK
DNPEIAVKYAAALAGWNPTSKNFDGQLEGFKRNDVLAIKRREEAAVLRWLRARGKAG
TPALEAHAALVKLVADTARTQERDLVLGSFNRTGIIGVAVNLYRLAIERQKPDAEREPG
YQQRDLPVIEGSLKQMERRYVPAMDRQLRAYWLRRYVALPAAQHVAAVDAWLGGSD
KAAAEAALARLDQSRLGSLEERLKWFNADRAAFEASTDPAIQYAVAVMPTLLAMEQQ
AKTRYGVALEARPRYLQAWDYKKSKGQAVYPDANSTLRITYGHVKGYTGLNGKVYT
PFTTLEEVAAKCTGVEPFDCPKALLEAVAAKRYAGLADARLGTVPVNFLADLDITGGN
SGSPVLDANGRLVGLAFQVTLESVASNWVFDPVLTRMISVDQRYMRWIMQEVMPAPQ LLEELGVPPRQ
[0315] SEQ ID NO: 22 - Z13 Cleavase enzyme:
DEGMWVPQQLPEIAGALKKAGLKLDPKQLSDLTGDPMGAWSLGGCTGSFVSPQGLV
ATNHHCAYGAIQLNSTPEKNLIKDGFNAPTQADELSAGPNARIYVLEGITDVTAQAKAA
MAAAGNDPVARANALEAFEKKITSDCEAEPGYRCRVYSFMGGITYRLFKNLEIKDVRL
VYAPPSSVGKFGGDIDMWMWPSHTGDFSFYRAYVGKDGKPAPYSKDNVPYRPKHWL
KIADTPLGEGDFVMVAGYPGRTDRYALVAEFENTQRWLYPAISKAYKDQIALVEAAAK
DNPEIAVKYAAALAGWNPTSKLFDGQLEGFKRNDVLAIKRREEAAVLRWLRARGKAG
TPALEAHAALVKLVADTARTQERDLVLGSFNRTGIIGVAVNLYRLAIERQKPDAEREPG
YQQRDLPVIEGSLKQMERRYVPAMDRQLRAYWLRRYVALPAAQHVAAVDAWLGGSD
KAAAEAALARLDQSRLGSLEERLKWFNADRAAFEASTDPAIQYAVAVMPTLLAMEQQ
AKTRYGVALEARPRYLQAWDYKKSKGQAVYPDANSTLRITYGHVKGYTGLNGKVYT
PFTTLEEVAAKCTGVEPFDCPKALLEAVAAKRYAGLADARLGTVPVNFLADLDGTGGN
SGSPVLDANGRLVGLVFQGTLESVASDWVFDPVLTRCISVDQRYMRWIMQEVMPAPQ
LLEELGVPPRQ
[0316] SEQ ID NO: 23 - Z15 Cleavase enzyme:
DEGMWVPQQLPEIAGALKKAGLKLDPKQLSDLTGDPMGAWSLGGCTGSFVSPQGLV
ATNHHCAYGAIQLNSTPEKNLIKDGFNAPTQADELSAGPNARIYVLEGITDVTAQAKAA
MAAAGNDPVARANALEAFEKKITSDCEAEPGYRCRVYSFMGGITYRLFKNLEIKDVRL VYAPPSSVGKFGGDIDMWMWPSHTGDFSFYRAYVGKDGKPAPYSKDNVPYRPKHWL KIADTPLGEGDFVMVAGYPGRTDRYALVAEFENTQRWLYPAISKAYKDQIALVEAAAK DNPEIAVKYAAALAGWNPTSKLFDGQLEGFKRNDVLAIKRREEAAVLRWLRARGKAG TPALEAHAALVKLVADTARTQERDLVLGSFNRTGIIGVAVNLYRLAIERQKPDAEREPG YQQRDLPVIEGSLKQMERRYVPAMDRQLRAYWLRRYVALPAAQHVAAVDAWLGGSD KAAAEAALARLDQSRLGSLEERLKWFNADRAAFEASTDPAIQYAVAVMPTLLAMEQQ AKTRYGVALEARPRYLQAWDYKKSKGQAVYPDANSTLRITYGHVKGYTGLNGKVYT
PFTTLEEVAAKCTGVEPFDCPKALLEAVAAKRYAGLADARLGTVPVNFLADLDTTGGN SGSPVLDANGRLVGLGFQRTLESVASNWVFDPVLTRSISVDQRYMRWIMQEVMPAPQL LEELGVPPRQ
[0317] SEQ ID NO: 24 - Dipeptidyl peptidase from Thermomonas hydrothermalis (without the signal peptide) S46 Peptidase:
DEGMWVPQQLPEIAGALKKAGLKLDPKQLSDLTGDPMGAWSLGGCTGSFVSPQGLV ATNHHCAYGAIQLNSTPEKNLIKDGFNAPTQADELSAGPNARIYVLEGITDVTAQAKAA MAAAGNDPVARANALEAFEKKITSDCEAEPGYRCRVYSFMGGITYRLFKNLEIKDVRL VYAPPSSVGKFGGDIDNWMWPRHTGDFSFYRAYVGKDGKPAPYSKDNVPYRPKHWL KIADTPLGEGDFVMVAGYPGRTDRYALVAEFENTQNWLYPAISKAYKDQIALVEAAAK DNPEIAVKYAAALAGWNNTSKNFDGQLEGFKRNDVLAIKRREEAAVLEWLRARGKAG TPALEAHAALVKLVADTARTQERDLVLGSFNRTGIIGVAVNLYRLAIERQKPDAEREPG YQQRDLPVIEGSLKQMERRYVPAMDRQLRAYWLDRYVALPAAQHVAAVDAWLGGSD
KAAAEAALARLDQSRLGSLEERLKWFNADRAAFEASTDPAIQYAVAVMPTLLAMEQQ AKTRYGVALEARPRYLQAWDYKKSKGQAVYPDANSTLRITYGHVKGYTGLNGKVYT PFTTLEEVAAKETGVEPFDNPKALLEAVAAKRYAGLADARLGTVPVNFLADLDITGGN SGSPVLDANGRLVGLAFDGTLESVASNWVFDPVLTRMISVDQRYMRWIMQEVMPAPQ LLEELGVPPRQ
[0318] SEQ ID NO: 25 - NAEIAGDVAGGK(azide) - test peptide, synthetic
[0319] SEQ ID NO: 26 - DAEIAGDVAGGK(azide) - test peptide, synthetic
[0320] SEQ ID NOS 27 and 37, respectively - /5Phos/TGT AGG GAA AGA GTG TTT /iAmMC6T/T/iSpC3/A CAC TCT TTC CCT ACA CGA CGC TCTTCC GAT CT - Capture DNA, synthetic
[0321] SEQ ID NO: 28 - MVSHHNLTTGATLINEQWLLTTAK - test peptide, synthetic
[0322] SEQ ID NO: 29 - NLFLNHSENATAK - test peptide, synthetic
[0323] SEQ ID NO: 30 - WVLHPNYSQVDIGLIK - test peptide, synthetic
[0324] SEQ ID NO: 31 - MVSHHDLTTGATLINEQWLLTTAK - test peptide, synthetic
[0325] SEQ ID NO: 32 - NLFLDHSEDATAK - test peptide, synthetic
[0326] SEQ ID NO: 33 - WVLHPDYSQVDIGLIK - test peptide, synthetic
[0327] SEQ ID NO: 34 - PNGase F (Elizabethkingia miricola (Chryseobacterium miricola)):
MRKLLIFSISAYLMAGIVSCKGVDSATPVTEDRLALNAVNAPADNTVNIKTFDKVKNAF
GDGLSQSAEGTFTFPADVTTVKTIKMFIKNECPNKTCDEWDRYANVYVKNKTTGEWY
EIGRFITPYWVGTEKLPRGLEIDVTDFKSLLSGNTELKIYTETWLAKGREYSVDFDIVYG
TPDYKYSAWPVIQYNKSSIDGVPYGKAHTLGLKKNIQLPTNTEKAYLRTTISGWGHAK
PYDAGSRGCAEWCFRTHTIAINNANTFQHQLGALGCSANPINNQSPGNWAPDRAGWCP
GMAVPTRIDVLNNSLTGSTFSYEYKFQSWTNNGTNGDAFYAISSFVIAKSNTPISAPVVT N
[0328] SEQ ID NO: 35 - PNGase F, Elizabethkingia bruuniana:
MRKLLIFSISAYLMAGIVSCKGVDSATPVTEDRLALNAVNAPADNTVNIKTFDKVKNAF
GDGLSQSAEGTFTFPADVTTVKTIKMFIKNECPNKTCDEWDRYANVYVKNKTTGEWY
EIGRFITPYWVGTEKLPRGLEIDVTDFKSLLSGNTELKIYTETWLAKGREYSVDFDIVYG
TPDYKYSAWPVIQYNKSSIDGVPYGKAHTLGLKKNIQLPTNTEKAYLRTTISGWGHAK
PYDAGSRGCAEWCFRTHTIAINNANTFQHQLGALGCSANPINNQSPGNWAPDRAGWCP
GMAVPTRIDVLNNSLTGSTFSYEYKFQSWTNNGTNGDAFYAISSFVIAKSNTPISAPVVT N
[0329] SEQ ID NO: 36 - PNGase A, Prunus dulcis (Almond) (Amygdalus dulcis):
EPTPLHDTPPTVFFEVTKPIEVPKTKPCSQLILQHDFAYTYGQAPVFANYTPPSDCPSQTF
STIVLEWKATCRRRQFDRIFGVWLGGVEILRSCTAEPRPNGIVWTVEKDITRYYSLLKSN
QTLAVYLGNLIDKTYTGIYHVNISLHFYPAKEKLNSFQQKLDNLASGYHSWADLILPISR
NLPLNDGLWFEVQNSNDTELKEFKIPQNAYRAVLEVYVSFHENDEFWYSNLPNEYIAA NNLSGTPGNGPFREVWSLDGEWGAVWPFTVIFTGGINPLLWRPITAIGSFDLPTYDIEI
TPFLGKILDGKSHKFGFNVTNALNVWYVDANLHLWLDKQSTKTEGKLSKHSSLPLWS
LVSDFKGLNGTFLTRTSRSVSSTGWVKSSYGNITTRSIQDFYYSNSMVLGKDGNMQIVN
QKIIFNDSVYINLPSSYVHSLTSHKTFPLYLYTDFLGQGNGTYLLITNVDLGFIEKKSGLG
FSNSSLRNLRSAEGNMWKNNLWSGLESTQQIYRYDGGKFCYFRNISSSNYTILYDKV
GSKCNKKSLSNLDFVLSRLWPFGARMNFAGLRFT
[0330] The present disclosure is not intended to be limited in scope to the particular disclosed embodiments, which are provided, for example, to illustrate various aspects of the invention. Various modifications to the compositions and methods described will become apparent from the description and teachings herein. Such variations may be practiced without departing from the true scope and spirit of the disclosure and are intended to fall within the scope of the present disclosure. These and other changes can be made to the embodiments in light of the above-detailed description. In general, in the following claims, the terms used should not be construed to limit the claims to the specific embodiments disclosed in the specification and the claims, but should be construed to include all possible embodiments along with the full scope of equivalents to which such claims are entitled. Accordingly, the claims are not limited by the disclosure.
Claims
1. A method for assessing an attachment of an N-linked glycan to a glycan attachment residue of a N-linked glycosylation site of a polypeptide, the method comprising:
(a) contacting the polypeptide with a PNGase, wherein the PNGase cleaves the N-linked glycan from the glycan attachment residue of the N-linked glycosylation site, and wherein the polypeptide is immobilized on a solid support before or after the contacting with the PNGase, thereby providing the immobilized polypeptide treated with the PNGase;
(b) contacting the immobilized polypeptide treated with the PNGase with a set of binders, wherein each binder of the set of binders is attached to a detectable label that comprises identifying information regarding the binder, and the set of binders comprises: a first binder that preferentially binds to an Asn (N) residue over an Asp (D) residue as the glycan attachment residue of the N-linked glycosylation site; and a second binder that preferentially binds to the Asp (D) residue over the Asn (N) residue as the glycan attachment residue of the N-linked glycosylation site, wherein the first binder or the second binder of the set of binders binds to the glycan attachment residue of the N-linked glycosylation site of the immobilized polypeptide treated with the PNGase; and
(c) determining amino acid identity of the glycan attachment residue of the N-linked glycosylation site treated with the PNGase by obtaining or retaining identifying information regarding the first binder or the second binder that bind to the glycan attachment residue, thereby determining the attachment of the N-linked glycan to the glycan attachment residue of the N-linked glycosylation site.
2. A method for assessing an attachment of an N-linked glycan to a glycan attachment residue of a N-linked glycosylation site of a polypeptide, the method comprising:
(a) contacting the polypeptide with a PNGase, wherein the PNGase cleaves the N-linked glycan from the glycan attachment residue of the N-linked glycosylation site, and wherein the polypeptide is immobilized on a solid support before or after the contacting with the PNGase, thereby providing the immobilized polypeptide treated with the PNGase;
(b) cleaving the polypeptide to generate a cleaved polypeptide comprising the glycan attachment residue of the N-linked glycosylation site as a terminal amino acid (TAA) residue of the cleaved polypeptide;
(c) contacting the immobilized and cleaved polypeptide with a set of binders, wherein each binder of the set of binders is attached to a detectable label that comprises identifying information regarding the binder, and the set of binders comprises: a first binder that preferentially binds to an Asn (N) residue over an Asp (D) residue as the TAA residue; and a second binder that preferentially binds to the Asp (D) residue over the Asn (N) residue as the TAA residue, wherein the first binder or the second binder of the set of binders binds to the TAA residue of the immobilized and cleaved polypeptide treated with the PNGase; and
(d) determining amino acid identity of the glycan attachment residue of the N-linked glycosylation site treated with the PNGase by obtaining or retaining identifying information regarding the first binder or the second binder that bind to the TAA residue of the immobilized and cleaved polypeptide, thereby determining the attachment of the N-linked glycan to the glycan attachment residue of the N-linked glycosylation site.
3. A method for analyzing a polypeptide, the method comprising:
(a) contacting the polypeptide with a PNGase, wherein the PNGase cleaves an N-linked glycan from a glycan attachment residue of a N-linked glycosylation site of the polypeptide, and wherein the polypeptide is immobilized on a solid support before or after the contacting with the PNGase, thereby providing the immobilized polypeptide treated with the PNGase;
(b) contacting the immobilized polypeptide treated with the PNGase with a set of binders, wherein each binder of the set of binders is attached to a detectable label that comprises identifying information regarding the binder, and the set of binders comprises: a first binder that binds to a terminal Asn (N) residue; a second binder that binds to a terminal Asp (D) residue; and a third binder that binds to a terminal amino acid (TAA) residue other than Asn (N) and Asp (D), wherein the first binder, the second binder, or the third binder binds to the TAA residue of the immobilized polypeptide;
(c) cleaving the immobilized polypeptide to expose a new TAA residue;
(d) contacting the immobilized and cleaved polypeptide with the set of binders, wherein the first binder, the second binder, or the third binder binds to the new TAA residue of the immobilized and cleaved polypeptide;
(e) optionally repeating (c) - (d) in one or more cycles;
(f) determining an amino acid sequence of the polypeptide by obtaining or retaining identifying information regarding the binders that bind to the TAA residues in (b), (d), and optionally (e); and
(g) comparing the amino acid sequence determined in (f) to an amino acid sequence of the polypeptide determined without contacting the polypeptide with the PNGase, thereby detecting one or more N-linked glycosylation sites in the polypeptide.
4. The method of claim 3, further comprising following binding of a binder of the set of binders to the polypeptide or the cleaved polypeptide, obtaining or retaining identifying information regarding the binder.
5. The method of claim 3 or claim 4, wherein the TAA residues are N-terminal amino acid (NTAA) residues.
6. The method of any one of claims 3-5, wherein the TAA residues are modified.
7. The method of claim 2, wherein the TAA is an N-terminal amino acid (NTAA).
8. The method of claim 2, wherein the TAA is a C-terminal amino acid (CTAA).
9. The method of any one of claims 2-8, wherein the cleavage of the polypeptide is performed by a cleaving enzyme.
10. The method of any one of claims 1-9, further comprising quantifying degree of the attachment of the N-linked glycan to the glycan attachment residue of the N-linked glycosylation site of the polypeptide by determining what fraction of molecules of the polypeptide or the cleaved polypeptide bind to the first binder and/or the second binder based on analysis of the identifying information regarding the first binder and/or the second binder.
11. The method of any one of claims 1-10, wherein in addition to the polypeptide, additional 1000 or more different polypeptides each comprising a N-linked glycosylation site are analyzed in parallel utilizing the first binder and the second binder.
12. The method of claim 11, wherein attachments of an N-linked glycan to a glycan attachment residue of the N-linked glycosylation site of each polypeptide of the additional 1000 or more different polypeptides are assessed.
13. The method of any one of claims 1-12, wherein the identifying information regarding the first binder and/or the second binder are analyzed by an optical method.
14. The method of any one of claims 1-12, wherein the identifying information regarding the first binder and/or the second binder are analyzed by a nucleic acid sequencing method.
15. The method of any one of claims 1-14, wherein before contacting with the PNGase, the N-linked glycosylation site comprises any one of the following amino acid sequences:
AsnXxxSer, AsnXxxThr or AsnXxxCys, wherein Xxx is any standard, naturally occurring amino acid residue.
16. The method of any one of claims 1-15, wherein one or more additional molecules of the polypeptide are (i) immobilized on the solid support; (ii) are not contacted with PNGase; and (iii) analyzed as described in (b) to identify a shift from Asn residue to Asp residue as the first residue of the N-linked glycosylation site of the immobilized polypeptide or the cleaved immobilized polypeptide after the contacting with the PNGase.
17. The method of any one of claims 1-16, wherein in (a), the contacting with the PNGase occurs after the immobilization of the polypeptide to the solid support.
18. The method of any one of claims 13-17, wherein the detectable labels of the first binder and/or the second binder each comprise a fluorescently labeled probe, and the amino acid identity of the amino acid residue treated with the PNGase is determined by detecting a signal generated from the fluorescently labeled probe.
19. The method of any one of claims 1-17, wherein the detectable labels of the first binder and/or the second binder each comprise an epitope.
20. The method of any one of claims 1-17, wherein the detectable labels of the first binder and/or the second binder each comprise a nucleic acid coding tag that comprises an encoder barcode that comprises identifying information regarding the binder.
21. The method of claim 20, wherein the amino acid identity of the amino acid residue treated with the PNGase is determined by detecting a signal generated via performing an in situ nucleic acid sequencing of the encoder barcode of the first binder or the second binder.
22. The method of claim 20, wherein the amino acid identity of the amino acid residue treated with the PNGase is determined by detecting a signal generated via performing an in situ amplification of at least a portion of the nucleic acid coding tag that comprises the encoder barcode.
23. The method of claim 22, wherein the in situ amplification is a rolling circle amplification.
24. The method of claim 22 or claim 23, further comprising hybridizing a fluorescent oligonucleotide probe to the amplified portion of the nucleic acid coding tag and detecting a signal from the fluorescent oligonucleotide probe.
25. The method of claim 20, wherein (i) the immobilized polypeptide is attached to a nucleic acid recording tag before contacting the immobilized polypeptide treated with the PNGase with the set of binders; and (ii) following binding of the first binder or the second binder to the glycan attachment residue of the N-linked glycosylation site of the immobilized polypeptide, generating an extended nucleic acid construct comprising nucleic acid sequence information of the encoder barcode of the first binder or the second binder and nucleic acid sequence information of the nucleic acid recording tag attached to the immobilized polypeptide; and wherein obtaining or retaining identifying information regarding the first binder or the second binder comprises determining a nucleic acid sequence of at least a portion of the extended nucleic acid construct, wherein the portion comprises the nucleic acid sequence information of the encoder barcode of the first binder or the second binder.
26. The method of claim 25, wherein the nucleic acid sequence of the portion of the extended nucleic acid construct is determined using a DNA sequencer.
27. The method of any one of claims 1-26, wherein the support is a flow cell.
28. The method of any one of claims 1-27, wherein the determining amino acid identity of the amino acid residue treated with the PNGase comprises determining a likelihood of a particular type of the amino acid residue.
29. The method of any one of claims 2-28, further comprising determining amino acid identities of one or more additional amino acid residues of the polypeptide treated with the PNGase by utilizing the first binder or the second binder.
30. The method of claim 29, further comprising determining an amino acid sequence of the polypeptide treated with the PNGase based on the determined amino acid identities of the one or more additional amino acid residues of the polypeptide.
31. The method of claim 30, further comprising comparing the amino acid sequence of the polypeptide treated with the PNGase to an amino acid sequence of the polypeptide determined without contacting the polypeptide with the PNGase.
32. The method of any one of claims 1-31, wherein the first binder binds to a terminal Asn (N) residue, and/or second binder binds to a terminal Asp (D) residue.
33. The method of any one of claims 1-32, wherein the set of binders further comprises a third binder that binds to a terminal amino acid (TAA) residue other than Asn (N) and Asp (D).
34. The method of any one of claims 1-33, wherein the method does not comprise use of mass spectrometry.
35. A method for detecting N-linked glycosylation of polypeptides comprising a first polypeptide having a N-linked glycosylation site, the method comprising:
(a) attaching polypeptides comprising the first polypeptide to a solid support, wherein the polypeptides are treated with a PNGase before or after the attachment of the polypeptides to the solid support, thereby obtaining immobilized polypeptides comprising immobilized first polypeptide de-glycosylated at the N-linked glycosylation site;
(b) analyzing sequentially at least some individual amino acid residues of the immobilized polypeptides, wherein the analysis comprises the following steps for each analyzed immobilized polypeptide:
(i) contacting the solid support with a set of binders, wherein each binder specifically binds to a terminal amino acid (TAA) or a modified TAA of a polypeptide immobilized on the solid support;
(ii) following binding of a binder of the set of binders to the polypeptide, obtaining or retaining identifying information regarding the binder;
(iii) removing the TAA or the modified TAA to expose a new TAA, thereby generating a cleaved polypeptide, and, optionally, modifying the new TAA to yield a newly modified TAA; and
(iv) repeating steps (i)-(iii) or (i)-(ii) at least one time, wherein the immobilized first polypeptide is analyzed; the set of binders comprises a first binder that specifically binds to Asn TAA and a second binder that specifically binds to Asp TAA; and the first binder or the second binder bind to Asn TAA or Asp TAA being a glycan attachment residue of the N-linked glycosylation site of the immobilized first polypeptide;
(c) determining whether at least some of the immobilized first polypeptide molecules comprise Asp residue as the first residue of the N-linked glycosylation site based on analysis of the identifying information regarding the first binder and/or regarding the second binder obtained or retained in step (b) for the immobilized first polypeptide molecules, thereby detecting N-linked glycosylation in the N-linked glycosylation site of the first polypeptide.
36. The method of claim 35, further comprising quantifying degree of glycosylation for the N-linked glycosylation site of the first polypeptide by determining what fraction of the immobilized first polypeptide molecules comprise Asp TAA as the first residue of the N-linked glycosylation site based on analysis of the identifying information regarding the first binder and/or regarding the second binder obtained or retained in step (b) for the immobilized first polypeptide molecules.
37. The method of claim 35 or claim 36, wherein some of immobilized polypeptide molecules comprising immobilized first polypeptide molecules obtained in (a) are: (i) not treated with the PNGase in (a), and (ii) analyzed as described in (b), followed by utilizing data obtained by the analysis of immobilized polypeptide molecules not treated with the PNGase in (c) to identify a shift from Asn residue to Asp residue as the first residue of the N-linked glycosylation site of the immobilized first polypeptide molecules after treatment with the PNGase.
38. The method of any one of claims 35-37, wherein before (a), the first polypeptide is generated by fragmenting a protein from a sample.
39. The method of any one of claims 35-38, wherein the method does not comprise use of mass spectrometry.
40. The method of any one of claims 35-39, wherein the first polypeptide is a pre-selected target polypeptide.
41. The method of any one of claims 36-40, wherein in addition to the first polypeptide, degrees of glycosylation for 100, 200, 500, 1000, 5000 or more different polypeptides each comprising a N-linked glycosylation site are determined in (c) utilizing the first binder and the second binder.
42. The method of any one of claims 35-41, wherein the TAA is an N-terminal amino acid (NTAA) and the new TAA is a new NTAA.
43. The method of any one of claims 35-41, wherein the TAA is a C-terminal amino acid (CTAA) and the new TAA is a new CTAA.
44. The method of any one of claims 35-43, wherein each binder of the plurality binds to a modified TAA of an immobilized polypeptide.
45. The method of claim 44, wherein the modified TAA is a modified N-terminal amino acid (NTAA).
46. The method of claim 45, wherein the modified NTAA of the immobilized polypeptide is obtained by modifying the immobilized polypeptide with an N-terminal modifier agent before contacting the solid support with the set of binders.
47. The method of claim 46, wherein the N-terminal modifier agent is selected from the group consisting of compounds of the following Formulas (10)-(13):
wherein M is a metal binding group that comprises sulfonamide, hydroxamic acid, sulfamate, or sulfamide; the group is a 5 or 6 membered aromatic ring containing up to three heteroatoms selected from N, O, and S as ring members, and is optionally substituted by R; R represents one or two optional substituents selected from the group consisting of F, Cl, CH3, CF2H, CF3, OH, OCH3, OCF3, NH2, N(CH3)2, NO2, SCH3, SO2CH3, CH2OH, B(OH)2, CN, CONH2, CO2H, CN4H, and CONHCH3; LG is OH, ORQ, or OCC, each RQ is independently aryl or heteroaryl, each of which is optionally substituted with one or more groups selected from halo, nitro, cyano, sulfonate, carboxylate, alkylsulfonyl, and N of heteroaryl is optionally oxidized; or RQ can be -C(=O)R or -C(=O)-OR; CC is a cationic counterion; X is one of the following: O, S, Se, or NH.
48. The method of any one of claims 35-47, wherein the first binder and/or the second binder each comprises a peptide or an aptamer.
49. The method of any one of claims 35-48, wherein (i) each binder of the plurality comprises an identifying detectable label; (ii) the identifying information regarding the binder that binds to the TAA or modified TAA of the polypeptide in (c)(ii) is obtained by detecting the identifying detectable label.
50. The method of claim 49, wherein the identifying detectable label comprises a fluorescent moiety, and the identifying information regarding the binder that binds to the TAA or modified TAA of the polypeptide in (c)(ii) is obtained by detecting a signal from the fluorescent moiety.
51. The method of any one of claims 35-50, wherein (i) the immobilized polypeptides comprising the immobilized first polypeptide are each independently attached to a nucleic acid recording tag; (ii) each binder of the plurality is attached to a nucleic acid coding tag that comprises identifying information regarding the binder; and (iii) the identifying information regarding the binder that binds to the TAA or modified TAA of the polypeptide in (b)(ii) is retained in the nucleic acid recording tag attached to the polypeptide upon transfer from the nucleic acid coding tag, wherein the transfer comprises primer extension or ligation.
52. The method of claim 51, wherein the analysis of the identifying information regarding the first binder and/or regarding the second binder retained in step (b) for the immobilized first polypeptide molecules is performed by nucleic acid sequencing of recording tags attached to the first polypeptide molecules and extended upon transfer from the nucleic acid coding tags.
53. The method of any one of claims 35-52, wherein each of the immobilized polypeptides comprising the immobilized first polypeptide in (a) is covalently joined to the solid support.
54. The method of any one of claims 35-53, wherein the solid support is a bead.
55. The method of any one of claims 35-54, wherein in (b), the analysis comprises determining identity of amino acid residues present at the N-linked glycosylation site of different immobilized first polypeptide molecules.
56. The method of any one of claims 35-55, wherein the N-linked glycosylation site comprises any one of the following sequences: AsnXxxSer, AsnXxxThr or AsnXxxCys.
57. The method of any one of claims 35-56, wherein in (b)(iii), the TAA or the modified TAA are removed using an enzyme.
58. The method of any one of claims 35-57, wherein in (a), the treatment with the PNGase occurs after the attachment to the solid support.
59. The method of any one of claims 35-58, wherein in (b), steps (i)-(iii) or (i)-(ii) are repeated at least two, at least three, at least four or more times.
60. The method of any one of claims 35-59, further comprising determining identity of the first polypeptide based on analysis of (i) the identifying information regarding the first binder and/or regarding the second binder obtained or retained in step (b) for the immobilized first polypeptide molecules, and (ii) identifying information regarding at least one additional binder of the set of binders that was bound to some of first polypeptide molecules during step (b).
61. A method for analyzing a polypeptide comprising a N-linked glycosylation site, comprising:
(a) contacting the polypeptide with a PNGase, wherein the polypeptide is attached to a solid support before or after the contacting with the PNGase, thereby immobilizing the polypeptide on the solid support;
(b) determining at least partial amino acid sequence of the immobilized polypeptide, wherein the determining comprises:
(i) contacting the immobilized polypeptide treated with the PNGase with a set of binders,
(ii) cleaving the immobilized polypeptide to generate a cleaved immobilized polypeptide, and
(iii) contacting the cleaved immobilized polypeptide with a subsequent set of binders, wherein each binder of the set of binders and the subsequent set of binders specifically binds to an amino acid residue or sequence and is attached to a detectable label that comprises identifying information regarding the binder, wherein the set of binders and/or the subsequent set of binders comprise: a first binder that preferentially binds to an Asn (N) residue over an Asp (D) residue; and a second binder that preferentially binds to the Asp (D) residue over the Asn (N) residue, and wherein the at least partial amino acid sequence is determined by analyzing the detectable labels attached to the binders that bind to the immobilized polypeptide or the cleaved immobilized polypeptide, thereby analyzing N-linked glycosylation in the polypeptide.
62. The method of claim 61, further comprising quantifying degree of glycosylation for the N-linked glycosylation site of the polypeptide by determining what fraction of the immobilized polypeptide molecules or the cleaved immobilized polypeptide molecules binds to the first binder and/or the second binder at the N-linked glycosylation site based on analysis of the detectable labels attached to the binders.
63. The method of claim 61 or claim 62, wherein in addition to the polypeptide, additional 100, 200, 500, 1000, 5000 or more different polypeptides each comprising a N-linked glycosylation site are analyzed in parallel utilizing the first binder and the second binder.
64. The method of any one of claims 61-63, wherein the method does not comprise use of mass spectrometry.
65. The method of any one of claims 61-64, wherein each binder specifically binds to a terminal amino acid (TAA) or a modified TAA residue of the polypeptide immobilized on the solid support.
66. The method of claim 65, wherein the TAA is an N-terminal amino acid (NTAA).
67. The method of claim 65, wherein the TAA is a C-terminal amino acid (CTAA).
68. The method of any one of claims 61-67, wherein the cleaving step (ii) removes a terminal amino acid (TAA) or a modified TAA residue of the immobilized polypeptide to generate a new TAA residue of the cleaved immobilized polypeptide.
69. The method of claim 68, wherein the cleavage of the TAA or modified TAA residue is performed by a cleaving enzyme.
70. The method of any one of claims 65-69, wherein the modified TAA of the immobilized polypeptide is obtained by modifying the immobilized polypeptide with an N-terminal or C-terminal modifier agent before contacting the polypeptide with the binders.
71. The method of any one of claims 61-70, wherein the detectable labels attached to the binders that bind to the immobilized polypeptide or the cleaved immobilized polypeptide are analyzed after performing step (iii).
72. The method of any one of claims 61-70, wherein the detectable labels attached to the binders that bind to the immobilized polypeptide or the cleaved immobilized polypeptide are analyzed before performing step (iii).
73. The method of any one of claims 61-72, wherein the detectable labels attached to the binders that bind to the immobilized polypeptide or the cleaved immobilized polypeptide are analyzed by an optical method.
74. The method of any one of claims 61-72, wherein the detectable labels attached to the binders that bind to the immobilized polypeptide or the cleaved immobilized polypeptide are analyzed by a sequencing method, such as a nucleic acid sequencing method.
75. The method of any one of claims 61-74, wherein the immobilized polypeptide is covalently joined to the solid support, such as a bead.
76. The method of any one of claims 61-75, wherein the determining comprises determining identity of at least one amino acid residue present at the N-linked glycosylation site of different immobilized polypeptide molecules.
77. The method of any one of claims 61-76, wherein the N-linked glycosylation site comprises any one of the following sequences: AsnXxxSer, AsnXxxThr or AsnXxxCys.
78. The method of any one of claims 61-77, wherein in (a), the contacting with the PNGase occurs after the attachment to the solid support.
79. The method of any one of claims 61-78, wherein the set of binders is essentially the same as the subsequent set of binders.
80. The method of any one of claims 61-79, wherein after step (iii), steps (ii)-(iii) are repeated one or more times by cleaving the cleaved immobilized polypeptide generating a further cleaved immobilized polypeptide, and contacting the further cleaved immobilized polypeptide with a new subsequent set of binders, wherein each binder of the new subsequent set of binders specifically binds to an amino acid residue or sequence and is attached to a detectable label that comprises identifying information regarding the binder.
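The data-analysis side of the claims above (locating AsnXxxSer/Thr/Cys sites, detecting an Asn-to-Asp shift after PNGase treatment, and quantifying the fraction of molecules called Asp) can be illustrated with a minimal Python sketch. It is not part of the claims or of the applicant's disclosure: the function names, the one-letter sequence strings, and the per-molecule call lists are hypothetical stand-ins for whatever the binder-label readout actually produces, and the occupancy estimate assumes that Asp calls at the site arise only from PNGase deglycosylation.

```python
import re
from collections import Counter

# Site pattern as worded in claims 15, 56 and 77: Asn-Xxx-Ser/Thr/Cys,
# with Xxx any standard amino acid. The lookahead allows overlapping sites.
SEQUON = re.compile(r"(?=N[ACDEFGHIKLMNPQRSTVWY][STC])")


def find_sequons(sequence):
    """Return 0-based positions of Asn residues that start a candidate site."""
    return [m.start() for m in SEQUON.finditer(sequence)]


def shifted_sites(untreated, treated):
    """Positions read as Asn without PNGase and as Asp after PNGase.

    Both arguments are hypothetical one-letter sequences decoded from the
    binder labels for the same polypeptide, with and without treatment.
    """
    if len(untreated) != len(treated):
        raise ValueError("sequences must have the same length")
    return [i for i in find_sequons(untreated) if treated[i] == "D"]


def occupancy(calls):
    """Fraction of Asp calls among Asn/Asp calls at one site.

    `calls` is a hypothetical list with one residue call per analyzed
    molecule; calls other than "N" or "D" are ignored.
    """
    counts = Counter(c for c in calls if c in ("N", "D"))
    total = counts["N"] + counts["D"]
    if total == 0:
        raise ValueError("no Asn or Asp calls at this site")
    return counts["D"] / total
```

For example, `shifted_sites("AANGSKNPTD", "AADGSKNPTD")` reports position 2 only, and `occupancy(["D", "D", "N", "D", "X"])` counts three Asp calls out of four usable calls. An untreated control, as in claims 16 and 37, would be needed to separate PNGase-derived Asp from Asp present for other reasons.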
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202463647550P | 2024-05-14 | 2024-05-14 | |
| US63/647,550 | 2024-05-14 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2025240502A1 (en) | 2025-11-20 |
Family
ID=97720630
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/US2025/029176 Pending WO2025240502A1 (en) | 2024-05-14 | 2025-05-13 | High-throughput analysis of n-linked glycosylation site occupancy in proteins and peptides |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2025240502A1 (en) |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US9896713B2 (en) * | 2013-12-25 | 2018-02-20 | Tosoh Corporation | Method for determining site having N-linked sugar chain added thereto or proportion of said addition |
| US20220127754A1 (en) * | 2019-01-21 | 2022-04-28 | Encodia, Inc. | Methods and compositions of accelerating reactions for polypeptide analysis and related uses |
| WO2025102003A1 (en) * | 2023-11-09 | 2025-05-15 | Encodia, Inc. | Methods for identifying polypeptides |
-
2025
- 2025-05-13 WO PCT/US2025/029176 patent/WO2025240502A1/en active Pending
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US12292446B2 (en) | Kits for analysis using nucleic acid encoding and/or label | |
| US12019078B2 (en) | Macromolecule analysis employing nucleic acid encoding | |
| KR102567902B1 (en) | Modified Cleivases, Their Uses and Related Kits | |
| US20240409995A1 (en) | Single-molecule peptide sequencing through molecular barcoding and ex-situ analysis | |
| US20230016396A1 (en) | Methods of polypeptide sequencing | |
| US20250101496A1 (en) | Methods for balancing encoding signals of analytes | |
| US12259393B2 (en) | Protein sequencing via coupling of polymerizable molecules | |
| WO2022072560A1 (en) | Polypeptide terminal binders and uses thereof | |
| US20240158829A1 (en) | Methods for biomolecule analysis employing multi-component detection agent and related kits | |
| CN114127281A (en) | Proximity interaction analysis | |
| WO2025240502A1 (en) | High-throughput analysis of n-linked glycosylation site occupancy in proteins and peptides | |
| US12474346B2 (en) | Methods and kits for multicycle encoding assay | |
| US20250102513A1 (en) | Kits for analysis using nucleic acid encoding and/or label |