WO2025151826A1 - Determination of protein information by recoding amino acid polymers into dna polymers with metadata tagging - Google Patents

Determination of protein information by recoding amino acid polymers into dna polymers with metadata tagging

Info

Publication number
WO2025151826A1
Authority
WO
WIPO (PCT)
Prior art keywords
peptide
amino acid
recode
tag
nucleic acid
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/US2025/011251
Other languages
French (fr)
Inventor
Christopher Macdonald
Michael Graige
Brian Mather
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Abrus Bio Inc
Original Assignee
Abrus Bio Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Abrus Bio Inc

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00 Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48 Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/50 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
    • G01N33/68 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving proteins, peptides or amino acids
    • G01N33/6803 General methods of protein analysis not limited to specific proteins or families of proteins
    • G01N33/6818 Sequencing of polypeptides
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6804 Nucleic acid analysis using immunogens
    • C CHEMISTRY; METALLURGY
    • C40 COMBINATORIAL TECHNOLOGY
    • C40B COMBINATORIAL CHEMISTRY; LIBRARIES, e.g. CHEMICAL LIBRARIES
    • C40B20/00 Methods specially adapted for identifying library members
    • C40B20/04 Identifying library members by means of a tag, label, or other readable or detectable entity associated with the library members, e.g. decoding processes
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N2440/00 Post-translational modifications [PTMs] in chemical analysis of biological material

Definitions

  • the present disclosure relates to or includes compositions of matter, methods, and systems for analyzing polymeric macromolecules, including peptides, polypeptides, and proteins, in a highly-parallel and high-throughput manner via recoding their sequences into DNA polymers.
  • assay methods such as assay methods including dynamic range compression.
  • assay methods comprising: contacting peptides with a binding agent that targets or binds to a high-abundance peptide of the peptides, wherein the binding agent comprises a nucleic acid having a recognition site for a restriction enzyme or a complement of a recognition site for a restriction enzyme; and performing reverse translation protein sequencing of the peptides to generate nucleic acid sequences representing the peptides, wherein the reverse translation protein sequencing comprises: incorporating the nucleic acid sequence with said recognition site for a restriction enzyme or complement thereof into the nucleic acid sequence representing a high-abundance peptide, introducing a restriction enzyme to the nucleic acid sequences; thereby cleaving nucleic acid sequence representing high-abundance peptides and thereby depleting representation of high-abundance peptides from among the
  • a method for analyzing a peptide having a post-translational modification comprising: contacting a peptide having a post-translational modification with a binding agent that targets or binds to the post-translational modification, wherein the binding agent comprises a nucleic acid having a sequence representing the post-translational modification; transferring the sequence information or its complement to the peptide or a position proximal to the immobilized peptide; removing the post-translational modification from the peptide; performing reverse translation protein sequencing of the peptide to generate a nucleic acid sequence representing the peptide, wherein the information representing said post-translational modification is incorporated into the nucleic acid sequence, and the nucleic acid sequence is sequenced.
  • Performing reverse translation protein sequencing may comprise performing a method herein such as a method for determining identity and positional information of an amino acid residue of a peptide coupled to a solid support.
  • a method for determining identity and positional information of an amino acid residue of a peptide coupled to a solid support comprising: (a) coupling the peptide to the solid support such that an N-terminal amino acid residue of the peptide is not directly coupled to the solid support and is exposed to reaction conditions; (b) performing (b1) or (b2): (b1) providing a chemically-reactive conjugate comprising (x) a cycle tag comprising a cycle nucleic acid associated with a cycle number; (y) a reactive moiety for binding and cleaving the N-terminal amino acid residue of the peptide; and (z) an immobilizing moiety for immobilization to the solid support; or (b2) coupling the N-terminal amino acid with a first reactive
  • an ITC conjugate and providing a chemically-reactive conjugate comprising (x) a cycle tag comprising a cycle nucleic acid associated with a cycle number; (y) a second reactive moiety that binds or reacts with the first reactive moiety (e.g.
  • Some embodiments include: based on the obtained sequence information, determining identity and positional information of the N-terminal modification or post-translational modification.
  • the peptide is immobilized by being bound to a solid support.
  • the solid support comprises a bead, plate, chip, slide, glass, silica, resin, gel, hydrogel, membrane, polystyrene, metal, nitrocellulose, mineral, plastic, polyacrylamide, latex, or ceramic.
  • FIG. 9 schematically illustrates the assembly of an exemplary configuration of a recode block, according to embodiments of the present disclosure.
  • FIG. 10 schematically illustrates the assembly of an exemplary configuration of a recode block, according to embodiments of the present disclosure.
  • FIG. 11 schematically illustrates the transfer of amino acid identity information from a binding agent’s recode tag to an immobilized conjugate’s cycle tag to form a recode block via ligation, according to embodiments of the present disclosure.
  • FIG. 12 schematically illustrates an iterative process for assembling recode blocks, according to embodiments of the present disclosure.
  • FIG. 13 schematically illustrates the relative sizes of various constituents in the process of FIG. 3, according to embodiments of the present disclosure.
  • FIG. 15A-15B schematically illustrate assembly of a recode block according to some methods described herein.
  • FIG. 15A shows one embodiment where the cycle tag oligonucleotide of an immobilized amino acid complex is utilized to splint the ligation of a cognate recode tag oligonucleotide and a solution oligonucleotide. In some embodiments, no ligation occurs without a cognate binding agent interaction with an isolated amino acid complex.
  • FIG. 18 schematically illustrates immobilization of multiple isolated amino acid complexes having cycle and amino acid identity information to a surface in proximity to the immobilization point of an immobilized peptide.
  • FIG. 19 schematically illustrates the assembly of a single memory oligo for subsequent DNA sequencing analysis, according to embodiments of the present disclosure.
  • FIG. 23 schematically illustrates the release of memory oligos and conjugate complexes from a solid support, according to embodiments of the present disclosure.
  • FIG. 39 schematically illustrates a 2-step assembly of a CRC with the N-terminus of a peptide.
  • FIG. 40 schematically illustrates a 2-step assembly of a CRC with the N-terminus of a peptide.
  • FIG. 41 illustrates 1) an ITC-conjugate (ITCC) having an amine-reactive moiety (RG) for binding the NTAA and a functional group (X) for coupling to a chemically-reactive conjugate (CRC), and 2) a chemically-reactive conjugate comprising a cycle tag (Ct), an immobilizing moiety (SI), and a functional group (X’) that can couple with the ITC-conjugate.
  • FIG. 44 shows functionality of PPO: relative fluorescence units (RFU) of a fluorescent oligo complementary to the oligo on PPO immobilized to an azide-modified surface via Cu-catalyzed Huisgen cycloaddition.
  • FIG. 45 shows functionality of PPO: relative fluorescence units (RFU) of PPO immobilized to a different azide-modified surface via Cu-catalyzed Huisgen cycloaddition followed by reaction with amine-labelled fluorescein.
  • FIG. 46A-46D show example simulations of the binding of a commercially-available binder to an immobilized PTH-ligand.
  • FIG. 51 illustrates the steps of an exemplary workflow to deplete DNA information associated with targeted protein(s) from a population of DNA representing the proteins of a sample.
  • the process may be applied to polymeric macromolecules of a biological sample, including polymeric macromolecules such as peptides and proteins of a blood plasma sample, according to embodiments of the present disclosure.
  • FIG. 52 schematically illustrates exemplary operations of the process in FIG. 51, according to embodiments of the present disclosure.
  • a metadata conjugate detects an abundant protein to be depleted, whether intact protein, denatured protein, or modified protein (e.g. human albumin has a unique 4-amino-acid sequence at the N-terminus).
  • the depletion target is incorporated into a memory oligo which is subsequently depleted from the DNA sample prior to NGS sequencing by the action of a restriction endonuclease.
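The depletion step described above can be mimicked in silico as a filter over memory oligo sequences. The following is a minimal sketch, not the disclosed wet-lab procedure: it assumes the depletion marker is a restriction recognition site (the EcoRI site GAATTC is used purely as an illustration), and the function names and example reads are hypothetical.

```python
# Illustrative only: GAATTC (the EcoRI recognition site) stands in for whatever
# recognition site a depletion tag would carry; the source does not name one.
RECOGNITION_SITE = "GAATTC"

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def deplete(reads, site=RECOGNITION_SITE):
    """Keep only reads lacking the site on either strand.

    This models the outcome of a digest: oligos carrying the site are cut
    and therefore drop out of the population that gets sequenced.
    """
    rc_site = reverse_complement(site)
    return [r for r in reads if site not in r and rc_site not in r]

# Hypothetical memory oligos; two carry the depletion marker.
reads = ["ACGTGAATTCACGT", "ACGTACGTACGT", "TTTTGAATTCAAAA"]
kept = deplete(reads)
print(kept)
```

Only the read without the marker survives, which is the dynamic range compression effect: sequences tagged as high-abundance are removed before they consume sequencing capacity.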
  • FIG. 53 schematically illustrates various post-translational modifications of proteins, and more particularly, N-terminal modifications thereof, as examples of molecular attributes that can be identified, in addition to sequence, by the novel compositions, systems, and methods for polymeric macromolecule analysis described herein.
  • FIG. 55 schematically illustrates exemplary operations of the process in FIG. 54, according to embodiments of the present disclosure.
  • a metadata conjugate may detect an N-terminal modification such as N-terminal ubiquitination, and a metadata tag may be immobilized for subsequent incorporation into a memory oligo that also contains amino acid sequence information of the associated protein, peptide, or macromolecule. Afterwards, the N-terminal modification may be removed to allow reverse translation of the peptide sequence and analysis.
  • FIG. 56 illustrates a concept of Peptide Identification Numbers. Random degenerate bases placed at a priori defined positions within a recode block following assembly into a memory oligo may be analyzed in silico to provide a uniquely identifying sequence.
  • non-shaded areas represent nucleotide cycle codes, shaded areas represent amino acid codes, and m represents a metatag.
  • 1 represents any recode block, i is any other recode block, j is any appropriate integer, k is any appropriate integer not j, and N# represents a degenerate random base.
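The in silico analysis of degenerate bases described for FIG. 56 can be sketched as follows. This is a minimal illustration under stated assumptions: the positions of the degenerate bases are known in advance, and the position list, the example reads, and the function name are all hypothetical.

```python
from collections import Counter

def peptide_identification_number(memory_oligo, positions):
    """Concatenate the bases found at predefined degenerate positions."""
    return "".join(memory_oligo[p] for p in positions)

# Hypothetical 0-based positions of degenerate bases within a memory oligo.
positions = [2, 5, 9]

# Hypothetical reads: the first two would derive from the same peptide molecule.
reads = ["AAGTCCATGGTA", "AAGTCCATGGTA", "AATTCAATGCTA"]

pins = [peptide_identification_number(r, positions) for r in reads]
counts = Counter(pins)
print(counts)
```

Reads that share an identifier can be grouped as copies of one original molecule, which is how such an identifier could support counting molecules rather than amplified copies.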
  • FIG. 57 schematically illustrates an exemplary mechanism for modifying a carrier protein or peptide to render it unresponsive to sequencing operations, according to embodiments of the present disclosure.
  • the assay methods are useful for reverse translation protein sequencing of peptides.
  • the assay methods may include tagging or identifying nucleic acid sequences representing high-abundance peptides, and cleaving the nucleic acid sequences representing high-abundance peptides.
  • Some embodiments relate to methods for analyzing post-translationally modified peptides. Such analysis may include reverse translation protein sequencing of the post-translationally modified peptides, and tagging or identifying sequences representing the post-translationally modified peptides or post-translationally modified amino acids of the post-translationally modified peptides.
  • Some methods and compositions described herein may be useful for determining identity and positional information of an amino acid residue of a peptide.
  • the peptide may be coupled to a solid support, contacted with a chemically-reactive conjugate which cleaves an N-terminal amino acid of the peptide and couples the N-terminal amino acid to the solid support with a cycle tag.
  • This may then be contacted with a binding agent, such as one specific for the N-terminal amino acid.
  • the binding agent may include a recode tag.
  • the cycle tag and recode tag may include nucleic acid information which may be sequenced to obtain the identity and positional information of the N-terminal amino acid. The process may be repeated for various amino acids of the peptide.
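How sequenced cycle and recode information maps back to a peptide can be sketched as a lookup followed by ordering on cycle number. The codebooks below are hypothetical placeholders, since the source does not give actual tag sequences, and the function name is an assumption made for illustration.

```python
# Hypothetical codebooks: real cycle and recode tag sequences are not given in the source.
CYCLE_CODES = {"ACGT": 1, "CGTA": 2, "GTAC": 3}      # cycle nucleic acid -> cycle number
AA_CODES = {"TTGA": "M", "GGCA": "K", "CATC": "L"}   # recode nucleic acid -> amino acid

def decode_recode_blocks(blocks):
    """Turn (cycle_seq, aa_seq) pairs into a peptide string ordered by cycle."""
    calls = {}
    for cycle_seq, aa_seq in blocks:
        calls[CYCLE_CODES[cycle_seq]] = AA_CODES[aa_seq]
    # Cycle 1 corresponds to the original N-terminal residue, cycle 2 to the next, and so on.
    return "".join(calls[c] for c in sorted(calls))

# Blocks may be read in any order; the cycle code carries the position.
blocks = [("GTAC", "CATC"), ("ACGT", "TTGA"), ("CGTA", "GGCA")]
peptide = decode_recode_blocks(blocks)
print(peptide)
```

Because each block pairs identity with position, the order in which blocks are read does not affect the reconstructed sequence.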
  • positional and identity information of amino acid residues of proteins may be recoded using nucleic acids and obtained upon sequencing the nucleic acids.
  • the PTC-modified amino group is treated with acid (typically anhydrous TFA) to yield an ATZ-modified (2-anilino-5(4)-thiazolinone) amino acid, separating the amino acid from the polymer and creating a next N-terminus on the polypeptide.
  • the cyclic ATZ-amino acid is converted to a PTH-amino acid derivative and analyzed via chromatography.
  • FIG. 1 illustrates a segmentation of the field of proteomics by technology.
  • the current landscape for proteomic analysis includes the following general approaches: 1) Edman degradation followed by chromatography; 2) fragmentation followed by advanced separation and mass spectrometry techniques; and 3) recognition of proteins via affinity molecules. While these (and other) approaches can provide useful information for researchers, they do not provide such information at the scale, throughput, or cost needed to unlock transformative applications in research, diagnostics, or therapeutics.
  • Some more particular challenges associated with current approaches include: a. Protein folding is dynamic, and proteins can lose their characteristic shape. When they do, recognition-based methods become inaccurate.
  • Some embodiments include determining identity information of an amino acid residue of a peptide. Some embodiments include determining positional information of an amino acid residue of a peptide.
  • the peptide may be coupled to a solid support. Some embodiments include providing the peptide. Some embodiments include coupling the peptide to the solid support. The peptide may be coupled to the solid support such that an N-terminal amino acid residue of the peptide is not directly coupled to the solid support. The N-terminal amino acid (NTAA) residue may be exposed to reaction conditions. Some embodiments include providing a chemically-reactive conjugate (CRC).
  • the CRC may include a cycle tag.
  • the cycle tag may include a cycle nucleic acid, which may be associated with a cycle number.
  • the CRC may include a reactive moiety.
  • the reactive moiety may bind the NTAA.
  • the reactive moiety may cleave the NTAA.
  • the CRC may include an immobilizing moiety, which may be for immobilization to the solid support.
  • Some embodiments include coupling the N-terminal amino acid with a first reactive moiety (e.g. ITC) or a first reactive moiety conjugate (e.g. an ITC conjugate).
  • Some embodiments include providing a CRC comprising a second reactive moiety that binds the first reactive moiety or first reactive moiety conjugate. The second reactive moiety may react with the first reactive moiety or first reactive moiety conjugate.
  • Some embodiments include contacting the peptide with the CRC.
  • Some embodiments include coupling the CRC to the NTAA of the peptide to form a conjugate complex. Some embodiments include coupling the CRC to the first reactive moiety or first reactive moiety conjugate coupled to the NTAA of the peptide to form the conjugate complex. Some embodiments include immobilizing the conjugate complex to the solid support, e.g. via the immobilizing moiety. Some embodiments include cleaving and thereby separating the N-terminal amino acid residue from the peptide. Some embodiments include exposing the next amino acid residue as an NTAA residue on the cleaved peptide. Some embodiments include providing an immobilized amino acid complex. The immobilized amino acid complex may include the cleaved and separated N-terminal amino acid residue.
  • Some embodiments include contacting the immobilized amino acid complex with a binding agent.
  • the binding agent may include a binding moiety, which may preferentially bind to the immobilized amino acid complex.
  • the binding agent may include a recode tag.
  • the recode tag may include a recode nucleic acid.
  • the recode nucleic acid may correspond with the binding agent.
  • Some embodiments include forming an affinity complex.
  • the affinity complex may include an immobilized amino acid complex.
  • the affinity complex may include a binding agent.
  • Some embodiments include bringing a cycle tag into proximity with a recode tag.
  • Some embodiments include bringing a cycle tag into proximity with a recode tag within each formed affinity complex.
  • Some embodiments include transferring information of a recode tag.
  • Some embodiments of the methods described herein include any of the following steps: 1: binding to substrate; 2: functionalized PITC conjugation to amino acid; 3: immobilization of PITC conjugate to hydrogel substrate; 4: cleavage of amino acid via Edman degradation; 4a: nucleotide deprotection; 5: build recode blocks with binders; 6: memory oligo assembly; and 7: release of oligo for sequencing.
  • FIG. 3 schematically illustrates various operations of the workflow of FIG. 2, according to embodiments of the present disclosure. More particularly, FIG. 3 illustrates primary stages of the “recoding” operations of FIG. 2 as a process 300. As shown, there are three distinct and separable stages for the recoding process 300, and each stage is depicted in a row of operations.
  • the PITC conjugate reacts with the N-terminal amino acid to form a phenylthiocarbamoyl-amino acid (PTC) conjugate.
  • a stringent wash removes unreacted PITC conjugate, and then, at operation 3, activation of an orthogonal chemistry used to tether the conjugate to the support is initiated to immobilize the PTC conjugate in proximity to the anchor point of the associated analyte.
  • PTC-thiol conjugates or PTC-alkyne conjugates may be immobilized to the solid support.
  • the second row of FIG. 3 depicts an operation of the iterative process whereby recode blocks are built.
  • amino acid information is associated with cycle information.
  • a plurality of binding agents that recognize the immobilized conjugate-AA-cycle tag complexes are introduced at operation 5a and bind to their cognate target at operation 5b.
  • the binding agents are engineered to preferentially recognize specific conjugates based on differences in the cognate amino acid of the immobilized conjugate. Those agents that possess both the cognate AA and the cognate cycle information will thereby direct ligation of AA information to a cycle tag of the corresponding conjugate complex (operations 5c and 5d).
  • the formed recode blocks are assembled into a memory oligo (e.g. combined into a single memory oligo).
  • This oligo is capable of being amplified on the solid support or in solution, then analyzed using DNA sequencing methods to determine a sequence and/or abundance of the immobilized analytes.
  • the co-localized recode blocks interact based on their complementary DNA sequences to assemble a DNA oligonucleotide that represents the sequence of the original macromolecule.
  • the process is similar to g-block assembly of a gene product. Assembly may be facilitated by a polymerase extension-ligation process, or by a ligation process.
  • the surface is seeded with macromolecule analytes such that, predominantly, they are spatially separated and reactants that interact with one macromolecule do not interact with another.
  • a volume element is defined by the radius circumscribed by the length of the macromolecular polymer, and the lengths of the linkers of the conjugate complexes and the binding agents.
  • FIG. 5 schematically illustrates the interaction of chemically-reactive conjugates with terminal amino acids of immobilized peptides during operation 2 of recoding process 300 in FIG. 3, according to embodiments of the present disclosure.
  • the conjugate has 3 functions: 1) bind to a terminal amino acid and cleave the peptide bond between the terminal amino acid and the next amino acid in the polymer (for N-terminal reactions, this is equivalent to the classical function of Edman’s reagent); 2) immobilize the conjugate to the solid support; and 3) carry a cycle tag oligo. Note the 1:1 relationship of an immobilized peptide with a conjugate in a given cycle.
  • Conjugates that react with and become bound to the terminal amino acid in operation 2 of the recoding process 300 are shown as filled triangles in FIG. 5.
  • the PITC conjugate reacts with the N-terminal amino acid to form a phenylthiocarbamoyl-amino acid (PTC) conjugate.
  • Unreacted conjugates are shown as open triangles. Any unreacted PITC conjugate can be washed from the surface of the solid support prior to triggering the chemistry that joins the PTC-conjugate to the surface.
  • FIG. 6 schematically illustrates the immobilization of chemically-reactive conjugates onto a solid support during operation 3 of recoding process 300 in FIG. 3, according to embodiments of the present disclosure.
  • the conjugate immobilization reaction can be triggered by light, catalyst addition, or by modifying the buffer properties or temperature to control the rate of reaction. For example, reducing redox potential allows formation of stable disulfide linkages.
  • a stringent wash removes unreacted PITC conjugate, and then activation of an orthogonal chemistry used to join the conjugate to the solid support is initiated to immobilize PTC conjugate in proximity to the anchor point of an associated peptide.
  • PTC-thiol conjugates or PTC-alkyne conjugates may be immobilized to the solid support.
  • the length of the peptide defines a volume element around the anchor point with the support, and conjugates associated with the specific peptide are co-localized to that anchor point.
  • a conjugate-reactive scavenger may be added to cap the reactivity of residual conjugate that was not reacted to an N-terminal amino acid, was incompletely washed, and became attached to the solid support.
  • FIG. 7 schematically illustrates the cleavage of terminal amino acids (e.g., the cleavage of peptide bonds) at operation 4 of recoding process 300 in FIG. 3, according to embodiments of the present disclosure.
  • In Edman degradation chemistry, this is accomplished by a change in pH from basic to harsh acidic conditions, sometimes in organic solvents. Accordingly, the hydrogel and conjugation reactions are designed to withstand the peptide bond cleavage conditions. Also for this reason, the cycle tag nucleic acids and any other nucleic acids immobilized to the solid support will comprise protecting groups that prevent degradation of amines or other reactive moieties of the nucleic acid.
  • FIG. 8 schematically illustrates the result of iteratively repeating the operations of FIG. 5-7, according to embodiments of the present disclosure. More particularly, FIG. 8 illustrates the iteration of operations 2-4 of the recoding process 300.
  • a series of co-localized conjugates, each having a cycle tag that carries information related to the relative position of an amino acid in one immobilized peptide analyte, are spatially isolated from the conjugates of other peptide analytes. The details of creation of each conjugate are independent of the information carried by the conjugate.
  • immobilized conjugates carrying information derived from carboxy-terminus chemistry and immobilized conjugates carrying information derived from amine-terminus chemistry may be combined in downstream steps.
  • Binding agents that possess both the cognate AA and the cognate cycle information will direct ligation of AA information to the cycle tag, as shown in the bottom panel.
  • components for recognition of all AA conjugates for all cycles are present simultaneously to concurrently create recode blocks. Discrimination may be enhanced under “competitive” conditions in concert with slow annealing to find global max binding energies for the combined affinity binding moiety and the nucleic acids. Note that recognition of the immobilized PTC-conjugate on the solid support avoids near-neighbor effects from amino acids that were adjacent on the original peptide.
  • FIG. 10 schematically illustrates preparatory operations of an exemplary process for assembling the recode block of FIG. 9, according to embodiments of the present disclosure.
  • the bottom panel in FIG. 10 shows a binding agent comprising a binding moiety and a recode tag, as well as a conjugate having a cycle tag.
  • There are several possible interactions between binding agents and immobilized conjugates that may exist, since binding agents for recognition of all AA conjugates for all cycles are present simultaneously.
  • Stringent wash conditions remove weakly bound binding agents from the surface. These may be due to cross-reactive binding of binding moieties to non-cognate PTC-AA-cycle tag conjugate complexes, and include interactions classified as either (c), (d) or (e).
  • FIG. 21 schematically illustrates various oligonucleotide constituents within a sample volume during recode block assembly, according to embodiments of the present disclosure. Accounting for interactions and tuning reaction conditions facilitates accurate and complete assembly during the recoding process 300.
  • Within the volume element surrounding the anchor point of a protein or peptide, there will exist immobilized PTC-AA-cycle tag-conjugate complexes that have: (a) same AA, different cycle information, and (b) different AA, different cycle information, but no complexes with (c) different AA, same cycle information or (d) same AA, same cycle information.
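The constraint on which complexes can coexist around one anchor point amounts to each cycle number appearing at most once. A minimal sketch of that check follows; the tuple representation of a complex and the function name are assumptions made for illustration.

```python
from collections import Counter

def cycles_are_unique(complexes):
    """Return True if no two complexes share a cycle number.

    complexes: list of (amino_acid, cycle_number) tuples, a hypothetical
    representation of the immobilized complexes around one anchor point.
    """
    counts = Counter(cycle for _, cycle in complexes)
    return all(n == 1 for n in counts.values())

# Same AA in different cycles (case a) and different AA in different cycles (case b) are allowed.
allowed = cycles_are_unique([("M", 1), ("K", 2), ("M", 3)])

# Two complexes sharing cycle 1 (cases c and d) should not occur for one peptide.
disallowed = cycles_are_unique([("M", 1), ("K", 1)])
print(allowed, disallowed)
```

A check of this kind could flag sequenced data in which conjugates from neighboring peptides were not spatially separated.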
  • “Group 1” constituents are present to support assembly of cycle 1 information.
  • the effective concentrations of constituents are high due to the co-localization within the volume element defined by the length of the macromolecular analyte and the length of the linkers of the associated recode blocks.
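The effect of co-localization on effective concentration can be estimated with simple arithmetic: one molecule confined to a sphere of radius r has a molar concentration of 1 / (N_A · V). The sketch below assumes a spherical volume element and uses a 10 nm radius purely as an illustrative value; the source does not state dimensions.

```python
import math

AVOGADRO = 6.02214076e23  # molecules per mole

def effective_concentration_molar(radius_nm, copies=1):
    """Molar concentration of `copies` molecules confined to a sphere of the given radius."""
    radius_m = radius_nm * 1e-9
    volume_liters = (4.0 / 3.0) * math.pi * radius_m ** 3 * 1000.0  # 1 m^3 = 1000 L
    return copies / (AVOGADRO * volume_liters)

# Illustrative radius only; the actual volume element depends on peptide and linker lengths.
conc = effective_concentration_molar(10.0)
print(f"{conc:.2e} M")
```

Under these assumptions a single tethered oligo is present at roughly 0.4 mM, which is orders of magnitude above typical free-solution oligo concentrations and is consistent with the statement that co-localization makes effective concentrations high.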
  • the complexity of oligos is not high.
  • cycle codes C1, C2, ..., Cn
  • amino acid codes AA1, AA2, ..., AAn
  • mismatch ligation is unlikely. Note that even “incorrect assembly” that results from mismatch ligation produces an oligo with useful macromolecular analyte sequence information, since the cycle information flanks the amino acid information.
  • FIG. 23 schematically illustrates the release of memory oligos and conjugate complexes from a solid support at operation 8 of the recoding process 300, according to embodiments of the present disclosure.
  • An exemplary memory oligo is shown in FIG. 23 having P7 and P5 adapters.
  • the memory oligo may also comprise a sample index, a UMI, a CRISPR PAM or spacer sequence, or other identifying nucleic acid sequence that may be incorporated during the NGS library preparation steps. Cleaving the tethers (or a subset of tethers) to the solid support is an optional step to improve the efficiency of PCR extensions involving the memory oligo.
  • Conjugate removal from the surface is an optional process to clean-up the solid support prior to its use for downstream steps such as cluster generation, and NGS sequencing.
  • In FIG. 23, reduction of a disulfide bond is depicted, which can be mediated by addition of dithiothreitol to a solution contacting the support surface.
  • starter and terminator sequences may be designed to include sequencing adapters (e.g. P5 and P7) and sequencing primer binding sites directly in their sequences. Including these elements is useful for simplifying downstream library preparation for sequencing by eliminating additional adapter ligation steps.
  • a starter sequence may comprise, from 5' to 3': a P5 adapter sequence, a sequencing primer binding site, and a universal assembly sequence (U-AS).
  • a terminator sequence can include a U-AS complement, a sequencing primer binding site, and a P7 adapter sequence. This design can ensure that fully assembled memory oligos automatically contain any necessary sequences for cluster generation and sequencing.
  • Some embodiments include a terminator sequence.
  • the terminator sequence may include an assembly sequence complement such as a U-AS complement.
  • the terminator sequence may include a primer binding site such as a sequencing primer binding site.
  • the terminator sequence may include an adapter sequence such as a P7 adapter sequence.
  • the terminator sequence may include the assembly sequence, primer binding site, and adapter sequence.
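The starter and terminator layout described above can be sketched as string concatenation. All sequences below are short placeholders invented for illustration, not real adapter, primer, or assembly sequences, and the single-strand treatment is a simplification of how the pieces would actually anneal and extend.

```python
def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

# Placeholder sequences (hypothetical); substitute real adapter and primer sequences in practice.
P5_ADAPTER = "AAAACCCC"
P7_ADAPTER = "GGGGTTTT"
READ1_PRIMER_SITE = "ACACAC"
READ2_PRIMER_SITE = "TGTGTG"
U_AS = "GATTACA"  # universal assembly sequence

# Starter, 5' to 3': P5 adapter, sequencing primer binding site, U-AS.
starter = P5_ADAPTER + READ1_PRIMER_SITE + U_AS

# Terminator: U-AS complement, sequencing primer binding site, P7 adapter.
terminator = reverse_complement(U_AS) + READ2_PRIMER_SITE + P7_ADAPTER

def has_sequencing_elements(oligo):
    """Check that an assembled oligo is flanked by both adapters."""
    return oligo.startswith(P5_ADAPTER) and oligo.endswith(P7_ADAPTER)

# A hypothetical recode-block payload placed between starter and terminator.
memory_oligo = starter + "ACGTTTGA" + terminator
print(has_sequencing_elements(memory_oligo))
```

An oligo that assembled fully carries both adapters, so a check like this distinguishes complete assemblies from partial ones without a separate adapter ligation step.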
  • oligo designs for a cycle tag, recode tag, recode block, and/or memory oligo may include CRISPR PAM and spacer sequences (or other) specific to albumin, e.g., NGG, C1-AAtagMet-C2-AAtagLys, to preferentially deplete recoded albumin peptide sequences via cutting of the memory oligo amplicon with a CRISPR nuclease or other enzyme.
  • FIG. 29A-29B depict fluorescence values obtained throughout execution of steps 1 through 4 of FIG. 3.
  • Relative fluorescence units (RFU) of fluorescent oligonucleotides complementary to cycle tags mark progress through advancing steps of the method.
  • each bar shows a measurement of fluorescence in an advancing step.
  • Bars 1 demonstrate minimal autofluorescence of the peptide and solid support used in the study.
  • Bars 3 demonstrate capture of fluorescent oligonucleotides by CRCs immobilized to the solid support via the reaction of their reactive moiety (PITC) with the N-terminal amino acid of immobilized peptides.
  • PITC reactive moiety
  • Low signal for bars 2 supports that signal is not related to unbound fluorescent oligonucleotides in solution, and is consistent with a signal emanating from fluorescent oligos captured by CRCs reacted to immobilized peptides on solid support.
  • Bars 4 demonstrate the signal from fluorescent oligos released from the surface upon exposing the surface to mild chemical conditions that promote dehybridization of oligonucleotides. Relative values for bars 3 and 4 can be explained by a difference in volume during the measurements. Bars 5 corroborate the dehybridization of fluorescent oligonucleotides from the surface.
  • CRCs of sample B were immobilized to the surface via a Cu-catalyzed Huisgen cycloaddition reaction. Also, between measurement of bars 5 and bars 6, the surface was subjected to anhydrous acid under conditions that support cleavage of the N-terminal amino acid, exposing the next amino acid residue as an N-terminal amino acid residue on the cleaved peptide. Bars 6 show the progression of contacting a second CRC having a different cycle tag sequence to the surface via the reactive moiety (PITC). The CRC will be reactive toward newly exposed N-terminal amino acids of the immobilized peptides following the cleavage of the first N-terminal amino acid with acid.
  • Bars 8 demonstrate capture of the new fluorescent oligo by CRC immobilized to the solid support via the reaction of its reactive moiety (PITC) with the new terminal amino acid of an immobilized peptide.
  • Low signal for bars 7 supports that signal is not related to unbound fluorescent oligonucleotides in solution, and is consistent with a signal emanating from fluorescent oligos captured by the CRC reacted to the immobilized peptide on solid support.
  • Bars 9 demonstrate the signal from fluorescent oligos released from the surface upon exposing the surface to mild chemical conditions that promote dehybridization of oligonucleotides. Relative values for bars 8 and 9 can be explained by a difference in volume during the measurements.
  • Bars 10 corroborate the dehybridization of fluorescent oligonucleotides from the surface.
  • the progression of fluorescence signals confirms reaction, capture, and cleavage of an N-terminal amino acid residue of a peptide using matter and methods disclosed herein.
  • Strong signals for bars in steps 3, 4, 8, and 9 confirm functionality of the CRC to perform steps 2-4 of FIG. 3.
  • In FIG. 29B, each bar shows fluorescence in an advancing step of the method.
  • the conditions and conclusion are the same as for bars B of FIG. 29A, with the exception that the starting azide-functionalized silica surface was supplied by a commercial source.
  • FIG. 30 schematically illustrates how efficiency of memory oligo assembly may be adjusted, according to the methods described herein.
  • the large sphere in FIG. 30 represents a volume as defined by the length of an analyte polymer, e.g., an amino acid polymer.
  • Within the large sphere are many smaller spheres. Each of these smaller spheres may represent a volume as defined by the binding agents and conjugates utilized during the recoding process, and more particularly the binding agents and conjugates utilized during operation 5 described above. Such volume is primarily dependent on the linker lengths of both binding agents and conjugates.
  • the polymer (representative larger sphere) may be collapsed via a known polymer collapse mechanism, such as those described in, e.g., Leonid Ionov, Hydrogel-based actuators: possibilities and limitations, Materials Today, 17, 10, 494 (2014), which is herein incorporated in its entirety.
  • the binding agents and conjugates (representative smaller spheres) may be expanded, e.g., by utilizing expandable spacers, linking oligos, and/or deconvolution of rare events in silico, as described elsewhere herein, thereby facilitating communication of neighboring recode blocks.
  • the recode blocks may be linked in any sequential order to create a memory oligo.
  • Expandable spacers may include molecules that comprise multiple thiol groups. When disulfide bonds are formed, the range of the spacer is shortened, and when the cross-linkers are reduced, e.g., by addition of DTT, the spacer range is increased.
  • FIG. 31 illustrates the utilization of universal sequences to facilitate linking of recode blocks during memory oligo assembly without regard to any specific order, according to embodiments of the present disclosure.
  • recode blocks may be linked in any sequential order to create a memory oligo. This is due to cycle information being immediately adjacent to amino acid information in assembled recode blocks, regardless of whether the recode blocks are in sequential or non-sequential order within a memory oligo. While assembly of recode blocks in the correct sequential order of an analyte may be efficient, the adjacent nature of the cycle and amino acid information in the recode blocks may cause redundancies. Thus, to avoid these redundancies while also relaxing the criteria for memory oligo assembly, the recode blocks may be assembled in random order.
  • universal assembly sequences may be utilized during the recoding process. Such universal sequences may be attached to the 5’ and/or 3’ ends of cycle tags and/or recode tags prior to introduction of these tags to the anchored analyte(s). Attaching complementary universal sequences to two or more cycle tags and/or recode tags facilitates the random linking (e.g., ligation) of resulting recode blocks during memory oligo assembly, without regard to sequential order, and a correct macromolecule analyte sequence may be assigned during post-sequencing analysis.
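The post-sequencing assignment step can be illustrated with a small decoder. This is a sketch under stated assumptions: the universal sequence, the 4-base cycle tags, and the 4-base amino acid tags are all hypothetical toy codes, and each recode block is assumed to read as a cycle tag followed directly by an amino acid tag.

```python
# Hypothetical toy codebooks; real tag designs are not specified here.
U_AS = "GGCC"                                        # universal assembly sequence
CYCLE_TAGS = {"AAAC": 1, "AACA": 2, "ACAA": 3}       # cycle tag -> cycle number
AA_TAGS = {"TTTA": "M", "TTAT": "K", "TATT": "L"}    # amino acid tag -> residue


def decode_memory_oligo(read: str) -> str:
    """Recover the peptide order from recode blocks linked in any order."""
    calls = {}
    for block in read.split(U_AS):
        if len(block) != 8:
            continue  # skip empty or malformed fragments
        cycle = CYCLE_TAGS.get(block[:4])
        residue = AA_TAGS.get(block[4:])
        if cycle is None or residue is None:
            continue  # skip blocks with unrecognized tags
        calls[cycle] = residue
    # Cycle information travels with each block, so sorting restores order.
    return "".join(calls[c] for c in sorted(calls))
```

For a read whose blocks were linked in the order cycle 3, cycle 1, cycle 2 (`"ACAATATT" + "GGCC" + "AAACTTTA" + "GGCC" + "AACATTAT"`), the decoder returns `"MKL"`, because each block carries its own cycle number regardless of where it sits in the memory oligo.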
  • FIG. 32 schematically illustrates transfer of information from a location oligo to a recode block.
  • a peptide is attached to a solid support via a location linker, which may include any molecule configured to attach the peptide to the solid support, and further configured to bind to a nucleic acid.
  • the nucleic acid can include any suitable type of nucleic acid sequence that carries code information related to the location of immobilized PTC-conjugates isolated on the solid support.
  • the nucleic acid could be directly joined to hydrogel.
  • This nucleic acid may be referred to as the “location oligo.”
  • Location oligos may be attached to location linkers before or after binding of the peptide thereto, and/or before or after immobilization of the peptide to the solid support.
  • a PCR-like thermal cycling process may be performed to sequentially transfer location oligo information via polymerase extension onto a plurality of proximal recode blocks.
  • the transfer of information involves a quantitative polymerase chain reaction. In some embodiments, the transfer of information involves a ligase chain reaction. In some embodiments, the transfer of information involves a helicase-dependent amplification. In some embodiments, the transfer of information involves a strand displacement amplification. In some embodiments, the transfer of information involves a loop-mediated isothermal amplification. In some embodiments, the transfer of information involves a rolling circle amplification. In some embodiments, the transfer of information involves a recombinase polymerase amplification. In some embodiments, the transfer of information involves a nicking enzyme amplification reaction. In some embodiments, the transfer of information involves a whole genome amplification.
  • the transfer of information involves a transcription-mediated amplification. In some embodiments, the transfer of information involves a multiple displacement amplification. In some embodiments, the transfer of information involves a multiple annealing and looping-based amplification cycles. In some embodiments, the transfer of information involves a nucleic acid sequence-based amplification.
  • said transferring information comprises joining the recode nucleic acid or a reverse complement of the recode nucleic acid with the cycle nucleic acid. In some embodiments, said transferring of information comprises joining the recode nucleic acid or a reverse complement of the nucleic acid with a complement of the cycle nucleic acid. In some embodiments, said transferring of information comprises joining the recode nucleic acid or a reverse complement of the nucleic acid with a complement separate nucleic acid that corresponds to a cycle nucleic acid.
  • a recode nucleic acid or a reverse complement thereof may be joined with a cycle nucleic acid or a reverse complement thereof.
  • the joining may form a recode block, or may be used to form a memory oligonucleotide.
  • the joining may be included in a method described herein, such as a method for determining protein information such as amino acid location or identity.
  • joining comprises enzymatic ligation. In some embodiments, joining comprises splint ligation. In some embodiments, joining comprises chemical ligation. In some embodiments, joining comprises template-assisted ligation. In some embodiments, joining comprises the use of a ligase enzyme. In some embodiments, joining comprises the use of a splint oligonucleotide. In some embodiments, joining comprises the use of a catalyst. In some embodiments, joining comprises the use of a bridging molecule. In some embodiments, joining comprises the use of a condensation agent. In some embodiments, joining comprises the use of a coupling reagent. In some embodiments, joining comprises the use of a polymerase enzyme.
  • joining comprises the use of a complementary nucleic acid sequence. In some embodiments, joining comprises the use of a nicking enzyme. In some embodiments, joining comprises the use of a nucleic acid modifying enzyme. In some embodiments, joining comprises the use of a recombinase. In some embodiments, joining comprises the use of a strand-displacing polymerase. In some embodiments, joining comprises the use of a single-strand binding protein. In some embodiments, joining comprises a click chemistry reaction. In some embodiments, joining comprises a phosphodiester bond formation. In some embodiments, joining comprises a peptide nucleic acid-mediated ligation. In some embodiments, each binding agent comprises recode tags with a unique nucleic acid sequence. In some embodiments, a plurality of binding agents comprises recode tags with the same nucleic acid sequence. In some embodiments, binding agents comprise recode tags which may have a unique sequence portion and a common sequence portion.
  • joining the recode nucleic acid or a sequence of the recode nucleic acid with the cycle nucleic acid or a sequence of the cycle nucleic acid to generate a recode block comprises: (i) joining the recode nucleic acid with the cycle nucleic acid, (ii) joining the recode nucleic acid with a sequence of the cycle nucleic acid, (iii) joining a sequence of the recode nucleic acid with the cycle nucleic acid, or (iv) joining a sequence of the recode nucleic acid with a sequence of the cycle nucleic acid.
  • Some embodiments include performing a nucleic acid sequence-based amplification to generate the sequence of the recode nucleic acid or the sequence of the cycle nucleic acid. Some embodiments include performing polymerase chain reaction (PCR) to generate the sequence of the recode nucleic acid or the sequence of the cycle nucleic acid.
  • the PCR comprises real-time PCR, digital PCR, multiplex PCR, nested PCR, hot-start PCR, touchdown PCR, or quantitative PCR.
  • Some embodiments include performing or conducting a ligase chain reaction, a helicase-dependent amplification, a strand displacement amplification, a loop-mediated isothermal amplification, a rolling circle amplification, a recombinase polymerase amplification, a nicking enzyme amplification reaction, a whole genome amplification, a transcription-mediated amplification, a multiple displacement amplification, or multiple annealing and looping-based amplification cycles, to generate the sequence of the recode nucleic acid or the sequence of the cycle nucleic acid.
  • the joining comprises enzymatic ligation, splint ligation, chemical ligation, template-assisted ligation, use of a ligase enzyme, use of a splint oligonucleotide, use of a catalyst, use of a bridging molecule, use of a condensation agent, use of a coupling reagent, use of a polymerase enzyme, use of a complementary nucleic acid sequence, use of a nicking enzyme, use of a nucleic acid modifying enzyme, use of a recombinase, use of a strand-displacing polymerase, use of a single-strand binding protein, a click chemistry reaction, a phosphodiester bond formation, or a peptide nucleic acid-mediated ligation.
  • Some embodiments include contacting an additional immobilized amino acid complex with a second binding agent.
  • the binding agent and the second binding agent comprise distinct recode tags having different recode nucleic acids from each other.
  • the binding agent and the second binding agent comprise recode tags having identical recode nucleic acids as each other.
  • the binding agent and the second binding agent comprise distinct recode tags having recode nucleic acids that have different sequences from each other, and that have a portion of the recode nucleic acids that are identical.
  • said transferring information comprises joining or combining the recode nucleic acid, a sequence of the recode nucleic acid, or a reverse complement of the sequence of the recode nucleic acid with the cycle nucleic acid, a sequence of the cycle nucleic acid, or a reverse complement of the sequence of the cycle nucleic acid, to generate a recode block.
  • said transferring information comprises joining or combining a sequence corresponding to a recode nucleic acid, a sequence of the recode nucleic acid, or a reverse complement of the sequence of the recode nucleic acid with a sequence corresponding to the cycle nucleic acid, a sequence of the cycle nucleic acid, or a reverse complement of the sequence of the cycle nucleic acid, to generate a recode block.
  • FIG. 16, FIG. 17 and FIG. 18 illustrate methods of co-localization of nucleic acids.
  • FIG. 16a illustrates steps in an alternate embodiment where cycle and amino acid information are aggregated in serial steps.
  • the symbol ‘a’ may represent one or more amino acids of an immobilized protein, or an isolated amino acid.
  • the “C/AA” moiety of the binding agent is a nucleic acid that represents cycle and amino acid identity, is stable throughout serial peptide degradation reactions, and is capable to hybridize with other nucleic acids.
  • the “AC” moiety of the binding agent represents an activatable coupling for immobilization of C/AA to a solid support.
  • Binding agents comprising a C/AA nucleic acid that represents cycle and amino acid information, and that comprise a reactive moiety that may be joined to the solid support, contact either the N-terminal amino acid or a di- or tri-peptide of the analyte.
  • the triangle indicates chemical cleavage location, for example the location of a di-sulfide bridge. Alternate embodiments may also utilize the overwhelming advantages of localized assembly methods as described below.
  • the process is repeated, as shown in FIG. 16c, to create a plurality of co-localized nucleic acids.
  • the number of C/AA conjugates in an analysis is equal to, or a subset of, the number of cycles multiplied by the number of amino acids or amino acid derivatives in the analysis.
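The upper bound on the conjugate library size follows directly from this statement. The sketch below simply enumerates every (cycle, amino acid) pair; the 20-letter alphabet is an assumption for illustration, and an actual analysis may use a subset or include derivatized residues.

```python
from itertools import product

# Assumed alphabet for illustration: the 20 standard amino acids.
STANDARD_AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def conjugate_library(n_cycles, amino_acids=STANDARD_AMINO_ACIDS):
    """List every (cycle, amino acid) pair a full C/AA library would cover."""
    return list(product(range(1, n_cycles + 1), amino_acids))
```

With 3 cycles and 20 amino acids, the full library has 3 x 20 = 60 members; a design that covers only some residues per cycle would use a subset of this list.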
  • the C/AA oligonucleotide is comprised of amino acid information alone, and further comprised of a reactive moiety capable to join with cycle information. In this aspect, cycle information is added prior to the next cycle.
  • a PNA molecule encoding cycle information and having a complement to the reactive moiety of the C/AA PNA oligonucleotide of a binding agent is brought into contact with an immobilized binding agent.
  • the reaction joins the two PNA molecules which together comprise amino acid and cycle information.
  • FIG. 15A illustrates a method to assemble recode blocks, wherein the cycle tag is used as a splint to facilitate ligation of amino acid and cycle information.
  • U-AS is a unifying assembly sequence to facilitate memory oligo assembly
  • U-HS is a unifying hybridization sequence to facilitate recode block assembly
  • C is a cycle tag sequence
  • i denotes the cycle number
  • X is a group that blocks 3’ extension
  • denotes a possible position of cleavage. Since the construct is designed to be used as a polymerization template during memory oligo assembly, cleavage should leave a blocked 3’ end.
  • AA denotes a sequence that represents amino acid identity. Sections of LNA and RNA are shown as a preferred embodiment that improves efficiency by increasing hybridization energy and allowing facile digestion, respectively, during memory oligo assembly.
  • FIG. 15B illustrates a method for assembly of recode blocks, wherein the cycle tag is used as a capture agent to facilitate ligation of amino acid and cycle information and localization of the product.
  • the only function required of the cycle tag nucleic acid is specific homoduplex or heteroduplex hybridization.
  • U-AS is a unifying assembly sequence to facilitate memory oligo assembly
  • U-HS is a unifying hybridization sequence to facilitate recode block assembly
  • C is a cycle tag sequence
  • i denotes the cycle number
  • X is a group that blocks 3’ extension
  • denotes a possible position of cleavage.
  • AA denotes a sequence that represents amino acid identity
  • cycCode is a sequence that represents cycle information.
  • Sections of PNA, LNA and RNA are shown as a preferred embodiment that improves efficiency by increasing hybridization energy and allowing facile digestion during memory oligo assembly.
  • the resulting cognate product associates amino acid and cycle information, and includes sequences useful for memory oligo assembly. Assembled recode block complexes remain localized by virtue of specific nucleic acid hybridization interactions.
  • the binding agent may be cleaved either chemically or enzymatically, as indicated in FIG. 15C and FIG. 15B.
  • the cleavage could be directed to the structure linking the binding moiety to the recodeTag, or the recodeTag itself, or to any appropriate natural or engineered position within the binding agent.
  • base pair mismatches may be used to adjust binding energies (Tm) and/or specificity of nucleic acids of the assembly process.
  • Crowding agents are known to increase the reaction speed of several enzymes, including DNA modifying enzymes.
  • crowding agents such as PEG, hydrophilic polysaccharides, and dextrans, may be used to improve efficiency of nucleic acid assembly operations.
Assembling Localized Nucleic Acids into Memory Oligos
  • Assembly of localized nucleic acids can be designed to occur in a stepwise or parallel manner, and in each case, can be designed to produce ordered or random assemblies of information.
  • Methods for stepwise and parallel assembly of ordered information are shown in FIGs. 3, 14-19 and FIGs. 21-24.
  • a simplified approach for parallel random assembly is illustrated in FIG. 25.
  • each nucleotide block of information shares a common sequence that allows enzymatic ligation via a common splint.
  • the issue with this simple parallel method for random assembly is that cyclization of the memory oligo constructs may occur, thereby limiting throughput. Ordered methods avoid this pitfall, but may be kinetically inefficient. Stepwise random assembly avoids throughput limitations and increases efficiency over ordered assembly methods.
  • stepwise and parallel assembly are illustrated.
  • FIG. 24A-24F illustrate methods of Initiation for random assembly of memory oligos.
  • FIG. 24C further shows that although the Initiation CRC may be recognized by a binding agent in subsequent steps of the reverse translation process, that binding agent is a spectator, in that no complementarity exists between its recode block sequence and the hyb tag or extended initiation oligo.
  • bold arrows indicate polymerase extension
  • thin arrows indicate RNAse-H digestion.
  • a hyb tag may be introduced at any appropriate step during processing. For example, it may be introduced before amino acids are isolated into immobilized amino acid complexes, or after the last immobilized amino acid complex is created, or after assembly of recode blocks. Extension of the hyb tag may occur at any appropriate step during processing. For example, it may be extended at any step after harsh chemical cleavage of amino acids from the immobilized analyte are complete. There may be a single or multiple hyb tags introduced per analyte.
  • nucleic acid configurations may be employed.
  • In FIG. 24D, a configuration having a hyb tag bridge oligo is demonstrated.
  • the hyb tag segment can be long to ensure essentially irreversible hybridization.
  • the hyb tag bridge oligo 3’ end is not blocked, so that it can be extended and facilitate subsequent extension during assembly of localized nucleic acid blocks.
  • the hyb tag ligation oligo serves as a template and is blocked at its 3’ end (or is composed of RNA), such that it does not participate in further extension reactions.
  • FIG. 24E shows configurations whereby location information may be assembled with amino acid identity and cycle information.
  • contacting oligos having a sequence complementary to the UEI of an initiation oligo with an initiation oligo and a co-localized recode block results in association of the information. Association may be catalyzed by ligation, polymerization, a combination of the two, or any appropriate method.
  • optional sequences that may be useful for analysis include a primer sequence. Increasing temperature, reducing salt, adding organic solvents or other means may be employed to dissociate the formed product.
  • a copy of the product may be copied into solution by adding appropriate PCR primers, polymerase, and associated reagents.
  • a protein analyte is labeled with a stable recognition element (e.g., biotin, DNP, inorganic compound, cage compound complement, or element that can be joined directly or indirectly to a nucleic acid) either prior to or subsequent to steps 1-4 of the disclosed process.
  • the recognition element may facilitate assembly of a memory oligo by coupling to a nucleic acid that performs the function of the hyb tag nucleic acid.
  • the original peptide analyte may be degraded using proteinase K or any suitable endo- or exopeptidase prior to joining the recognition element to the hyb tag nucleic acid.
  • an amine reactive moiety such as an ITC conjugated to the hyb tag nucleic acid may be applied to the surface to form a thiourea bond joining the hyb tag to the N-terminal amino acid of a peptide.
  • This is particularly effective at any step following the step(s) of cleaving and thereby separating the N-terminal amino acid residue(s) from the peptide, thereby providing an immobilized amino acid complex, the immobilized amino acid complex comprising the cleaved and separated N-terminal amino acid residue.
  • the hyb tag may be randomly seeded, or may be immobilized near the anchor point of a peptide analyte using the process steps described herein.
  • Initiation and Extension constructs are shown. Initiator construct generation was described in FIG. 24A-24F. Recode block Extension constructs may be generated using the steps described in FIG. 3, FIG. 9-FIG. 12, FIG. 15-FIG. 18, and FIG. 33. Cycle i indicates any recode block of the associated protein.
  • U-AS is a unifying assembly sequence for memory oligo assembly
  • U-HS is the unifying hybridization sequence used for recode block assembly
  • a preferred nucleic acid composition is depicted. The assembly is designed such that there is one U-AS’ sequence per analyte.
  • U-AS from the recode block interacts with U-AS’ of the initiator.
  • events are as follows: 1) hybridization of U-AS to U-AS’ facilitated by the strong interaction between LNA and DNA nucleobases; 2) extension of U-AS’ by a polymerase having strand displacement capability; 3) RNAse-H action on the newly formed double-stranded RNA-DNA duplex to expose a new U-AS’ 3’ end; and 4) interaction of a U-AS sequence of a next recode block with the newly formed single-stranded U-AS’.
  • Non-ligated LO, e.g., due to lack of BA recognition
  • U-AS bases of the LO are RNA, so when they form the duplex it may be chewed up, leaving insignificant homology of a DNA:DNA duplex competing with strong LNA:DNA interactions.
  • RNA is in competition with LNA for binding to the landing pad (U-AS’), so temperature may be used to modulate specificity.
  • Using a displacing polymerase effectively releases the bridge oligo from the growing memory oligo strand, and thereby increases the mobility of the growing memory oligo.
  • Designed mismatches of nucleobases may be used to optimize relative Tm and specificity. The method is highly efficient to assemble nucleic acid information blocks in random order, avoiding circularization and termination events.
  • more than one Initiator is associated with a given protein analyte. This provides the flexibility to assemble multiple recode blocks in parallel to create multiple memory oligos from a single analyte.
  • more than one U-AS sequence may be employed. This provides the opportunity to orthogonally assemble more than one memory oligo for an analyte.
  • cycles 1 through 100 may utilize a first U-AS sequence
  • cycles 101-200 may utilize a second U-AS sequence, resulting in recode blocks 1-100 being assembled independently from recode blocks 101-200.
  • Different U-AS sequences may be applied in any order, and in the case of ordered assembly, the number of U-AS may be equal to the number of cycles. In certain embodiments, this may serve to change the average length of the assembled memory oligo or the average number of recode blocks per memory oligo.
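The partitioning of cycles across orthogonal U-AS sequences can be sketched as a simple lookup. The function names and the tuple representation of a recode block are assumptions for illustration; the 100-cycle group size mirrors the example above and is a parameter, not a requirement.

```python
from collections import defaultdict


def u_as_index(cycle, cycles_per_u_as=100):
    """Map a 1-based cycle number to the 0-based index of its U-AS sequence."""
    if cycle < 1:
        raise ValueError("cycle numbers start at 1")
    return (cycle - 1) // cycles_per_u_as


def partition_recode_blocks(blocks, cycles_per_u_as=100):
    """Group (cycle, amino_acid) blocks by the U-AS they would share.

    Blocks in different groups carry different U-AS sequences and would
    therefore assemble into separate memory oligos.
    """
    pools = defaultdict(list)
    for cycle, amino_acid in blocks:
        pools[u_as_index(cycle, cycles_per_u_as)].append((cycle, amino_acid))
    return dict(pools)
```

Under this scheme, cycles 1 and 100 map to the first U-AS (index 0) and cycles 101 and 200 to the second (index 1). Setting `cycles_per_u_as=1` corresponds to the ordered-assembly case in which the number of U-AS sequences equals the number of cycles.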
  • multiple initiator sequences may be strategically employed to control memory oligo length and optimize sequencing performance.
  • initiators are introduced at defined intervals, such as every 10 amino acids, whereby the resulting memory oligos may be tuned to an optimal size for a chosen sequencing platform. Further examples of defined intervals may be or include intervals of 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 35, 40, 45, or 50 amino acids, or a range of any two of the aforementioned numbers of amino acids.
  • a peptide comprises 300 amino acids and each recode block comprises approximately 50 bases
  • placing initiators every 10 amino acids may generate memory oligos of approximately 500 bases on average, thereby providing compatibility with short-read sequencing platforms.
  • Such controlled oligonucleotide length provides improved sequencing quality by maintaining read lengths within optimal ranges for chosen platforms.
  • the controlled length further provides increased flexibility to accommodate various sequencing technologies.
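The length estimate in the example above can be reproduced with a few lines of arithmetic. This sketch assumes one recode block per amino acid, that every block is assembled, and that adapter, initiator, and U-AS overhead is ignored; the function name is illustrative.

```python
import math


def memory_oligo_stats(peptide_length, initiator_interval, bases_per_block):
    """Estimate memory oligo count and mean length for evenly spaced initiators.

    Returns (number_of_memory_oligos, mean_length_in_bases).
    """
    n_oligos = math.ceil(peptide_length / initiator_interval)
    mean_blocks_per_oligo = peptide_length / n_oligos
    return n_oligos, mean_blocks_per_oligo * bases_per_block
```

For a 300 amino acid peptide with initiators every 10 residues and 50-base recode blocks, `memory_oligo_stats(300, 10, 50)` gives 30 memory oligos averaging 500 bases, in line with the figure stated above. Widening the interval to 20 residues would double the mean length to about 1,000 bases.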
  • a location oligo sequence or complement may be transferred using methods described herein to the more than one Initiator, creating information for the shared group of localized Initiators.
  • nucleic acid elements may be incorporated that promote location oligo assembly into adjacent strands.
  • a ligation oligo may comprise a sequence complementary to a sequence of the initiation oligo and be used to facilitate extension, ligation or otherwise transfer information of one memory oligo to an adjacent memory oligo.
  • Extension occurs in serial steps that may include temperature modulation, and in other embodiments the process may be run in a single isothermal step.
  • base pair mismatches may be used to adjust binding energies (Tm) and/or specificity of nucleic acids of the assembly process.
  • endonuclease nickase or similar enzymes may replace RNAse-H to provide a single-stranded U-AS’
  • Further Extension operations for stepwise assembly into random memory oligos are illustrated in FIG. 27.
  • This embodiment may be useful when assembling elements that do not effectively participate in all types of enzymatically-catalyzed reactions, but have the capability to form specific interactions.
  • PNA is not effective to serve as a template for PCR or ligation, but is highly capable to create strong specific heteroduplexes with various nucleic acids including DNA.
  • Initiation and Extension constructs are shown. Initiator construct generation was described in FIG. 24A-24F. Recode block Extension constructs may be generated using the steps described in FIG. 3, FIG. 9-FIG. 12, FIG. 15-FIG. 18, and FIG. 33.
  • Cycle i indicates any recode block of the associated protein.
  • U-AS is a unifying assembly sequence for memory oligo assembly
  • C/AA i,j is a C/AA tag where i represents a cycle and j represents an amino acid or modified or derivatized amino acid
  • C/AA code i,j is the associated nucleic acid sequence representing the amino acid and cycle information of the C/AA tag.
  • the sequence of C/AA code i,j may or may not be the same as the sequence of the C/AA tag.
  • a preferred nucleic acid composition is depicted. The assembly is designed such that there is one U-AS’ sequence per analyte. During the Extension 1 operation, U-AS from the recode block interacts with U-AS’ of the initiator.
  • events are as follows: 1) hybridization of U-AS to U-AS’ facilitated by the strong interaction between LNA and DNA nucleobases; 2) extension of U-AS’ by a polymerase having strand displacement capability; 3) RNAse-H action on the newly formed double-stranded RNA-DNA duplex to expose a new U-AS’ 3’ end; and 4) interaction of a U-AS sequence of a next recode block with the newly formed single-stranded U-AS’.
  • FIG. 28 schematically illustrates various detailed steps of one such workflow.
  • An exemplary workflow starts after operations 1-5d of FIG. 3, recoding process 200, where recode blocks are assembled on a surface.
  • FIG. 28A shows an azide-functionalized surface of a solid support following isolation of amino acid complexes on the surface according to methods described herein.
  • FIG. 28B shows the surface subsequently grafted with a primer lawn.
  • the primers may comprise sequences that are the same or complementary to initiation oligo sequences, such as a unifying assembly sequence (U-AS). They may contain sequences that are the same or complementary to location oligo sequences. They may comprise compound structures, such as those depicted in FIG. 15B.
  • an initiation oligo is randomly seeded onto the surface, in analogy to random seeding of oligonucleotides onto a next-generation sequencing flowcell surface, such as an Illumina NGS flowcell surface.
  • the initiation oligo may be randomly seeded, or may in some embodiments be immobilized near the anchor point of a peptide analyte as described for hyb oligos using process steps in analogy to those described herein.
  • A bridge amplification-like process proceeds as illustrated in FIG. 28D-28F.
  • priming occurs as the opposite end of an extended fragment bends over and “bridges” to another complementary oligo on the surface.
  • Repeated denaturation and extension cycles result in amplicons that may be further analyzed using next-generation DNA sequencing technologies.
  • FIG. 28G-FIG. 28I show one aspect wherein a memory oligo is assembled by repeatedly “bridging” to a complementary segment of the recode block on the surface, and undergoing polymerase extension.
  • polymerase with strand displacement capability disassembles the recode block information as its information is transferred to the growing memory oligo, avoiding repeated inclusion into a growing memory oligo.
  • Illumina patterning and ExAmp technology may be employed to improve efficiency. Patterned surfaces may aid in isolation of proteins and may reduce cross-talk during memory oligo assembly.
  • additional steps reduce or eliminate repeated copying of the same cycle and amino acid information during the assembly process.
  • a non-polymerizable molecule such as PNA
  • blocker oligonucleotides may be introduced to hybridize to the freed-up PNA sequence, effectively removing it from further interaction with a memory oligo.
  • the memory oligonucleotide may include multiple recode blocks, reverse complement of multiple recode blocks, or one or more recode blocks and the reverse complement of one or more recode blocks.
  • the memory oligonucleotide may be used in a method described herein, such as a method for determining protein information such as amino acid position or identity.
  • obtaining the sequence information for the recode block comprises performing sequencing. In some embodiments, obtaining the sequence information for the memory oligonucleotide comprises performing sequencing.
  • the memory oligonucleotide may include a recode block or multiple recode blocks.
  • the sequencing comprises Sanger sequencing. In some embodiments, the sequencing comprises Next-Generation Sequencing.
  • the sequencing comprises pyrosequencing, sequencing by synthesis, sequencing by ligation, Illumina sequencing, Ion Torrent sequencing, Pacific Biosciences sequencing, Oxford Nanopore sequencing, SOLiD sequencing, nanopore sequencing, Single Molecule Real-Time (SMRT) sequencing, 454 sequencing, Complete Genomics sequencing, Helicos sequencing, MinION sequencing, direct RNA sequencing, Linked-Read sequencing, mate-pair sequencing, or targeted gene sequencing.
  • the sequence information for the memory oligonucleotide is obtained by sequencing. In some embodiments, the sequence information for the memory oligonucleotide is obtained by Sanger sequencing. In some embodiments, the sequence information for the memory oligonucleotide is obtained by Next-Generation Sequencing. In some embodiments, the sequence information for the memory oligonucleotide is obtained by pyrosequencing. In some embodiments, the sequence information for the memory oligonucleotide is obtained by sequencing by synthesis. In some embodiments, the sequence information for the memory oligonucleotide is obtained by sequencing by ligation.
  • the sequence information for the memory oligonucleotide is obtained by Illumina sequencing. In some embodiments, the sequence information for the memory oligonucleotide is obtained by Ion Torrent sequencing. In some embodiments, the sequence information for the memory oligonucleotide is obtained by Pacific Biosciences sequencing. In some embodiments, the sequence information for the memory oligonucleotide is obtained by Oxford Nanopore sequencing. In some embodiments, the sequence information for the memory oligonucleotide is obtained by SOLiD sequencing. In some embodiments, the sequence information for the memory oligonucleotide is obtained by nanopore sequencing.
  • the sequence information for the memory oligonucleotide is obtained by Single Molecule Real-Time (SMRT) sequencing. In some embodiments, the sequence information for the memory oligonucleotide is obtained by 454 sequencing. In some embodiments, the sequence information for the memory oligonucleotide is obtained by Complete Genomics sequencing. In some embodiments, the sequence information for the memory oligonucleotide is obtained by Helicos sequencing. In some embodiments, the sequence information for the memory oligonucleotide is obtained by MinION sequencing. In some embodiments, the sequence information for the memory oligonucleotide is obtained by direct RNA sequencing.
  • SMRT Single Molecule Real-Time
  • sequence information for the memory oligonucleotide is obtained by Linked-Read sequencing. In some embodiments, the sequence information for the memory oligonucleotide is obtained by mate-pair sequencing. In some embodiments, the sequence information for the memory oligonucleotide is obtained by targeted gene sequencing.
  • Some embodiments include aggregation of information from only a subset of cycles. Some embodiments include analysis of peptide information that does not include all amino acids of a peptide, for example using sequencing information generated through a recode process (e.g. from a memory oligonucleotide formed from sequences of recode tags and cycle tags) that does not include all amino acids of the peptide. In some embodiments, only some amino acids of a protein are recoded into recode blocks. A memory oligo may include recode blocks corresponding to all, or only some of the amino acids, of a peptide. The missing amino acid information may be taken into account when reconstructing a peptide, or identifying a peptide. Some memory oligonucleotides include recode blocks with recode tag and cycle tag sequences.
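Handling of missing cycles during reconstruction can be sketched in a few lines of Python. This is a hedged illustration only: the fixed-length block layout (cycle tag followed by recode tag), the two codebooks, and the use of `X` for an unread position are assumptions invented for the example, not encodings taken from this disclosure.

```python
# Hypothetical codebooks; real cycle and recode tags would be longer.
CYCLE_TAGS = {"AAC": 1, "AGT": 2, "CAG": 3, "CTA": 4, "GAT": 5}
RECODE_TAGS = {"ACGT": "G", "TGCA": "A", "GGCC": "K", "TTAA": "L"}
CYCLE_LEN, RECODE_LEN = 3, 4


def decode_memory_oligo(memory: str) -> dict:
    """Split a memory oligo into blocks and return {cycle: amino acid}."""
    block_len = CYCLE_LEN + RECODE_LEN
    if len(memory) % block_len:
        raise ValueError("memory oligo is not a whole number of blocks")
    calls = {}
    for start in range(0, len(memory), block_len):
        cycle = CYCLE_TAGS[memory[start:start + CYCLE_LEN]]
        residue = RECODE_TAGS[memory[start + CYCLE_LEN:start + block_len]]
        calls[cycle] = residue
    return calls


def reconstruct(calls: dict, length: int, unknown: str = "X") -> str:
    """Order calls by cycle number, marking unrecorded cycles as unknown."""
    return "".join(calls.get(c, unknown) for c in range(1, length + 1))


def consistent(pattern: str, candidate: str, unknown: str = "X") -> bool:
    """True if a candidate peptide agrees with every known position."""
    return len(pattern) == len(candidate) and all(
        p == unknown or p == c for p, c in zip(pattern, candidate)
    )
```

Because each block carries its own cycle tag, the blocks may appear in any order within the memory oligo, and a partial read such as `GXKLX` can still be compared against candidate peptides position by position.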
  • the solid support comprises a bead, a plate, or a chip.
  • the solid support comprises glass slide, silica, a resin, a gel, a membrane, polystyrene, a metal, nitrocellulose, a mineral, plastic, polyacrylamide, latex, or ceramic.
  • the solid support is a bead, a plate, or a chip.
  • the solid support is a magnetic bead.
  • the solid support is a glass slide.
  • the solid support is a microarray chip.
  • the solid support is a nanoparticle.
  • the solid support is a silica gel.
  • the solid support is a resin.
  • the solid support is a polystyrene bead.
  • the solid support is a gold plate.
  • the solid support is a silicon chip.
  • the solid support is a nitrocellulose membrane.
  • Disclosed herein, in some embodiments, are peptides.
  • the peptide may be the subject of a method which seeks to obtain information about the peptide, such as information on an identity or location of one or more amino acids of the peptide.
  • the peptide may be included in a method described herein, such as a method for determining protein information such as amino acid location or identity.
  • the peptide is an enzyme. In some embodiments, the peptide is an antibody. In some embodiments, the peptide is a viral protein. In some embodiments, the peptide is a bacterial protein. In some embodiments, the peptide is a synthetic peptide. In some embodiments, the peptide is a bioactive peptide. In some embodiments, the peptide is a peptide hormone. In some embodiments, the peptide is an oligopeptide. In some embodiments, the peptide is a polypeptide. In some embodiments, the peptide is a fusion protein. In some embodiments, the peptide is a cyclic peptide.
  • Disclosed herein, in some embodiments, are peptides coupled to a solid support.
  • the peptide is coupled to the solid support such that a N-terminal amino acid residue of the peptide is not directly coupled to the solid support.
  • the peptide may be coupled directly by a C-terminal amino acid residue to the solid support, or may be coupled directly by an internal (e.g. non-N-terminal and non-C-terminal) amino acid residue to the solid support.
  • the N-terminus of the peptide is linked or coupled indirectly to the solid support via a chain of other amino acids of the peptide.
  • the peptide is derived from a human, plant, bacterium, fungus, animal, virus, mammal, bird, marine organism, insect, reptile, amphibian, synthetic source, protist, yeast, primate, cell culture, parasite, patient sample, environmental sample, or genetically modified organism.
  • the peptide is derived from a cell lysate, blood sample, plasma sample, serum sample, tissue biopsy, saliva sample, urine sample, cerebrospinal fluid sample, sweat sample, synovial fluid sample, fecal sample, gut microbiome sample, environmental water sample, soil sample, bacterial culture, viral culture, organoid, tumor biopsy, sputum sample, or hair sample.
  • the peptide is derived from a reptile. In some embodiments, the peptide is derived from an amphibian. In some embodiments, the peptide is derived from a synthetic source. In some embodiments, the peptide is derived from a protist. In some embodiments, the peptide is derived from a yeast. In some embodiments, the peptide is derived from a primate. In some embodiments, the peptide is derived from a cell culture. In some embodiments, the peptide is derived from a parasite. In some embodiments, the peptide is derived from a patient sample. In some embodiments, the peptide is derived from an environmental sample. In some embodiments, the peptide is derived from a genetically modified organism.
  • the peptide is derived from a cell lysate. In some embodiments, the peptide is derived from a plasma sample. In some embodiments, the peptide is derived from a tissue biopsy. In some embodiments, the peptide is derived from a serum sample. In some embodiments, the peptide is derived from a saliva sample. In some embodiments, the peptide is derived from a urine sample. In some embodiments, the peptide is derived from a cerebrospinal fluid sample. In some embodiments, the peptide is derived from a sweat sample. In some embodiments, the peptide is derived from a synovial fluid sample.
  • the peptide is associated with a disease state.
  • the peptide is associated with a cancerous disease state, an autoimmune disease state, a neurodegenerative disease state, a cardiovascular disease state, a metabolic disease state, a genetic disease state, a viral infection, a bacterial infection, a fungal infection, a parasitic infection, an inflammatory condition, an endocrine disorder, an immunodeficiency, a respiratory disorder, a skin disorder, a gastrointestinal disorder, a psychiatric disorder, an aging process, a muscular disorder, or a renal disorder.
  • the peptide is associated with a parasitic infection. In some embodiments, the peptide is associated with an inflammatory condition. In some embodiments, the peptide is associated with an endocrine disorder. In some embodiments, the peptide is associated with an immunodeficiency. In some embodiments, the peptide is associated with a respiratory disorder. In some embodiments, the peptide is associated with a skin disorder. In some embodiments, the peptide is associated with a gastrointestinal disorder. In some embodiments, the peptide is associated with a psychiatric disorder. In some embodiments, the peptide is associated with an aging process. In some embodiments, the peptide is associated with a muscular disorder. In some embodiments, the peptide is associated with a renal disorder.
  • the peptide is a biomarker for a disease or condition, a drug target for a disease or condition, an antigen for the development of a vaccine, used for patient stratification in a clinical trial, a therapeutic agent for a disease or condition, used in the production of a biosimilar or generic drug, used for evaluating the efficacy of a drug treatment, used in personalized medicine for a specific disease or condition, used in immuno-oncology research, used in the validation of a diagnostic test, used in the development of a peptide-based therapeutic, a component of a cell signaling pathway, used in a structure-activity relationship study, used in the development of an immunoassay, used in the study of protein-protein interactions, used in the design of a drug delivery system, used in a high-throughput screening assay, used in a pharmacokinetic study, used in the formulation of a nutraceutical product, used in the development of a probiotic product, or used in a proteomics study.
  • the peptide is a biomarker for a disease or condition.
  • the peptide is a drug target for a specific disease or condition.
  • the peptide is an antigen for the development of a vaccine.
  • the peptide is used for patient stratification in a clinical trial.
  • the peptide is a therapeutic agent for a specific disease or condition.
  • the peptide is used in the production of a biosimilar or generic drug.
  • the peptide is used for evaluating the efficacy of a drug treatment.
  • the peptide is used in personalized medicine for a specific disease or condition.
  • the peptide is used in immuno-oncology research. In some embodiments, the peptide is used in the validation of a diagnostic test. In some embodiments, the peptide is used in the development of a peptide-based therapeutic. In some embodiments, the peptide is a component of a cell signaling pathway. In some embodiments, the peptide is used in a structure-activity relationship study. In some embodiments, the peptide is used in the development of an immunoassay. In some embodiments, the peptide is used in the study of protein-protein interactions. In some embodiments, the peptide is used in the design of a drug delivery system. In some embodiments, the peptide is used in a high-throughput screening assay.
  • the peptide is used in a pharmacokinetic study. In some embodiments, the peptide is used in the formulation of a nutraceutical product. In some embodiments, the peptide is used in the development of a probiotic product. In some embodiments, the peptide is used in a proteomics study.
Chemically-Reactive Conjugates
  • CRCs chemically-reactive conjugates
  • the CRC may be used in a method described herein, such as a method for determining protein information such as amino acid sequence, identity, or location.
  • the chemically-reactive conjugate (CRC) may include a nucleic acid sequence tag.
  • the chemically-reactive conjugate may include a reactive moiety.
  • the reactive moiety may bind and cleave a N-terminal amino acid residue from a peptide.
  • the chemically-reactive conjugate may include an immobilizing moiety.
  • the immobilizing moiety may bind to a solid support, and thus may be useful for immobilization to a solid support.
  • the chemically-reactive conjugate may include (A) a cycle tag; (B) a reactive moiety for binding and cleaving a N-terminal amino acid residue from a peptide; and (C) an immobilizing moiety for immobilization to a solid support.
  • the CRC may include the following structure: (Formula I).
  • the CRC may include the following structure: (Formula II).
  • CRC may include the structure of Formula I or Formula II, or any suitable structure connecting A, B, and C.
  • A is, or includes, a cycle tag
  • B is, or includes, a reactive moiety (e.g. for binding and cleaving a N-terminal amino acid residue from a peptide)
  • C is, or includes, an immobilizing moiety (e.g. for immobilization to a solid support).
  • LA, LB, and LC are optional linkers in Formula I.
  • Formula I may comprise a central moiety.
  • LAB and LBC are optional linkers in Formula II. Additional arms or aspects may be included or added to Formula I or II.
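The two arrangements can be captured as simple data structures. The Python sketch below is an illustration under stated assumptions: the formula drawings are not reproduced in this text, so the branched layout for Formula I (three arms on a central moiety) and the linear layout for Formula II are inferred from the surrounding description, and the class names, field names, and example values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class BranchedCRC:
    """Formula I-like layout (assumed): A, B, and C on separate arms."""
    cycle_tag: str                   # A
    reactive_moiety: str             # B
    immobilizing_moiety: str         # C
    central_moiety: str = "central carbon"
    linker_a: Optional[str] = None   # LA (optional)
    linker_b: Optional[str] = None   # LB (optional)
    linker_c: Optional[str] = None   # LC (optional)

    def arms(self) -> Dict[str, List[str]]:
        """Each arm as read outward from the central moiety."""
        raw = {
            "A": [self.linker_a, self.cycle_tag],
            "B": [self.linker_b, self.reactive_moiety],
            "C": [self.linker_c, self.immobilizing_moiety],
        }
        return {k: [p for p in v if p is not None] for k, v in raw.items()}


@dataclass(frozen=True)
class LinearCRC:
    """Formula II-like layout (assumed): A-LAB-B-LBC-C."""
    cycle_tag: str
    reactive_moiety: str
    immobilizing_moiety: str
    linker_ab: Optional[str] = None  # LAB (optional)
    linker_bc: Optional[str] = None  # LBC (optional)

    def parts(self) -> List[str]:
        """Components in chain order, omitting absent linkers."""
        chain = [self.cycle_tag, self.linker_ab, self.reactive_moiety,
                 self.linker_bc, self.immobilizing_moiety]
        return [p for p in chain if p is not None]
```

For example, `LinearCRC("cycle-tag-3", "PITC", "azide", linker_bc="PEG4").parts()` lists the chain with only the linker that is present, which mirrors the statement that each linker is optional.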
  • the chemically reactive conjugate may include a central moiety.
  • the central moiety may be or include a central carbon.
  • the central carbon may be attached to other carbons, such as to 3 other carbons, and link to the arms of the chemically-reactive conjugate.
  • the central moiety may include a heterocycle, a carbocycle, or a trivalent nitrogen.
  • the trivalent nitrogen may include an amine.
  • the amine may include a tertiary amine.
  • the central moiety may include a trivalent boron, a tri- or higher valency phosphorus, a tetravalent silicon, a polyhedral oligomeric silsesquioxane (POSS), a siloxane, a branched siloxane, a polyether, a phosphazene, a phosphonium, an ammonium, an imidazolium, a methane, a propane, a butane, a pentane, a hexane, a C1-C24 alkyl, a benzene, a toluene, a xylene, a phenol, an N,N-disubstituted aniline, an anisole, a trihydroxybenzene, a benzenetricarboxylic acid, a phthalic acid, a trimesic acid, a cyclopropane, a glycol, a glycerol,
  • the chemically reactive conjugate may comprise a branched or modified oligonucleotide having one or more modified nucleobases or functional groups at the 5’, 3’ or internal positions, wherein they function as immobilization moiety(s), reactive moiety(s) or ITC-conjugate reactive moiety(s).
  • modified nucleobases may be used to join a N-terminal amino acid reactive group, such as a modified PITC or ITC-conjugate. Any set of nucleobases of the CRC may function as a linker, a spacer, and/or the cycle tag.
  • the oligonucleotides may be branched or cyclic or possess secondary structures such as hairpins.
  • these chemically reactive conjugates may comprise this range of nucleotides wherein the bases and/or backbone functional groups (e.g., extracyclic amines, phosphate, thiophosphate) have protecting groups present.
  • Nucleotides which possess enhanced stability to the conditions of peptide sequencing (for example, strong acid) may be preferred embodiments (Watts, J.K.; Katolik, A.; Viladoms, J.; Damha, M.J. Org. Biomol. Chem., 2009, 7, 1904-1910).
  • the chemically-reactive conjugate is prepared by an organic synthesis method.
  • Some examples of multicomponent reaction schemes are shown in FIG. 38-42B.
  • chemically-reactive conjugate comprising (A) a cycle tag; (B) a reactive moiety; and (C) an immobilizing moiety.
  • (A), (B), and (C) are oriented linearly in relation to each other.
  • (A), (B), and (C) are oriented in any of the following orders: (A)-(B)-(C) (like Formula II), (A)-(C)-(B), or (B)-(A)-(C).
  • (A), (B), and (C) are arranged linearly, as in Formula II, and include optional linkers between (A),
  • (A), (B), and (C) are arranged linearly, as in Formula II, and include optional linkers between (A), (B), and (C), but in the following order: (B)-(A)-(C).
  • each of (A), (B) and (C) are on independent arms in relation to each other.
  • the CRC is linear in the order (A)-(B)-(C). In some embodiments, the CRC is linear in the order (A)-(C)-(B). In some embodiments, the CRC is linear in the order (B)-(A)-(C).
  • in the CRC, each of (A), (B), and (C) is on an independent arm.
  • Some embodiments include a cleavable group between (A) and (B), between (B) and (C), between (A) and (C), between (A) and (B+C), between (B) and (A+C), or between (C) and (A+B), or any combination thereof.
  • Some embodiments include a cleavable group between (A) and (B).
  • Some embodiments include a cleavable group between (B) and (C).
  • Some embodiments include a cleavable group between (A) and (C).
  • Some embodiments include a cleavable group between (A) and (B+C).
  • Some embodiments include a cleavable group between (B) and (A+C).
  • Some embodiments include a cleavable group between (C) and (A+B).
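The cleavable-group placements follow a regular pattern: three placements between two single elements, and three between one element and the remaining pair. A short Python sketch, offered only as a way to enumerate them (the function names are hypothetical), makes the pattern and the number of possible combinations explicit.

```python
from itertools import combinations

ELEMENTS = ("A", "B", "C")


def cleavable_placements() -> list:
    """List the six placements: element-vs-element, then element-vs-rest."""
    pairwise = list(combinations(ELEMENTS, 2))
    one_vs_rest = [
        (x, "+".join(e for e in ELEMENTS if e != x)) for x in ELEMENTS
    ]
    return pairwise + one_vs_rest


def count_nonempty_combinations(n_placements: int) -> int:
    """Number of ways to use at least one of the placements."""
    return 2 ** n_placements - 1
```

With six placements, "any combination thereof" covers 63 non-empty selections.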
  • Some embodiments include a non-nucleic acid label (e.g. element A).
  • the detectable label comprises a fluorophore, a radioactive label, an isotopic label, a mass tag, a chemiluminescent tag, or an imaging tag.
  • Some embodiments include a detectable label.
  • the detectable label is a fluorophore.
  • the detectable label is a radioactive label.
  • the CRC comprises a pre-nucleic acid sequence tag comprising a group for attaching a nucleic acid sequence.
  • said group for attaching a nucleic acid sequence comprises an oxyamine group, a tetrazine, an azide, an alkyne, an alkene, a trans-cyclooctene, a DBCO, a bicyclononyne, a norbornene, a strained alkyne, a strained alkene, or derivative thereof.
  • said group for attaching a nucleic acid sequence is subsequently used to attach a nucleic acid sequence.
  • the nucleic acid sequence tag is generated upon conjugating the nucleic acid sequence to a group for attaching a nucleic acid sequence comprising an oxyamine group, a tetrazine, an azide, an alkyne, an alkene, a trans-cyclooctene, a DBCO, a bicyclononyne, a norbornene, a strained alkyne, or a strained alkene, or a derivative thereof.
  • the nucleic acid sequence tag is generated upon conjugating the nucleic acid sequence to a group for attaching a nucleic acid sequence comprising a protected oxyamine group, a protected thiol, a protected amine, a protected hydrazine, a tetrazine, an azide, an alkyne, an alkene, a trans-cyclooctene, a DBCO, a bicyclononyne, a norbornene, a strained alkyne, or a strained alkene, or a derivative thereof.
  • the conjugation occurs prior to the peptide sequencing steps.
  • the conjugation occurs after the CRC is reacted to the N-terminal amino acid. In some embodiments, the conjugation occurs after the CRC is reacted to and then cleaved from the N-terminal amino acid, but prior to initiation of the next cycle.
  • a reaction scheme comprising: 1) an ITC-conjugate comprising an amine-reactive moiety (RG) for binding the NTAA and a functional group (X) and 2) a chemically reactive conjugate, comprising a cycle tag (Ct), an immobilizing moiety (SI) that can react with functional groups on a solid support, and a functional group (X’) that can couple to the ITC-conjugate.
  • RG amine-reactive moiety
  • X functional group
  • Examples of pairs of functional groups that may be suitable for joining the ITC-conjugate and CRC are shown in Table 2. The joining operation may be spontaneous, triggered (e.g., by light, temperature, solution composition change or other environmental change), or catalyzed.
  • the ITC-conjugate comprises a linker.
  • Suitable chemical linkers include but are not limited to alkyl, aryl, ether, amide, cycloalkyl, branched or linear structures or combinations thereof.
  • the linker may include a -C(O)-, -O-, -S-, -S(O)-, -NH-, -C(O)O-, -C(O)C1-C10 alkyl, -C(O)C1-C10 alkyl-O-, -C(O)C1-C10 alkyl-CO2-, -C(O)C1-C10 alkyl-NH-, -C(O)C1-C10 alkyl-S-, -C(O)C1-C10 alkyl-C(O)-NH-, -C(O)C1-10 alkyl-NH-C(O)-, -C1-C10 alkyl-, -
  • the cycle tag portion of the chemically reactive conjugate may be joined to the CRC between steps of the sequencing operation.
  • the cycle tag may be joined at any point between steps (a) and (f) of Process 200.
  • general reaction schemes may comprise any of the following, where ITCC represents the ITC-conjugate, RG represents the reactive group toward the peptide N-terminus (e.g. isothiocyanate), Ct represents the cycle tag, X:X’ and Y:Y’ represent pairs of reactive functional groups for attaching the CRC to the ITC-conjugate or the cycle tag to the CRC respectively, and SI represents the surface immobilization moiety:
  • the surface immobilization group is formed via a subsequent reaction after chemically-reactive conjugate is attached to the peptide N-terminus.
  • the chemically reactive conjugate may have 2 or more surface immobilization moieties.
  • X is not amine reactive or is much less amine reactive than the reactive group RG; cross reactions between X and RG are minimal or absent; all coupled linkages have some stability to the conditions of peptide sequencing (except the intended cleavage of the peptide NTAA from the remainder of the peptide); the newly formed linkages are not amine reactive and/or not reactive toward surface functional groups; X:X’ and Y:Y’ are not, or are only minimally, reactive with surface functional groups or surface immobilizing groups; and the surface functional groups and surface immobilization moieties require an activation step or catalyst in order to couple with each other to a significant extent.
  • the cycle tag may be associated with a cycle number.
  • the cycle number may correspond with an amino acid number, for example an amino acid number of a peptide when numbered from N to C.
  • the cycle tag may be a part of a chemically-reactive conjugate.
  • nucleic acid tags may be included within a chemically reactive conjugate.
  • the nucleic acid tag of the chemically reactive conjugate may be referred to as, or be an example of, a cycle nucleic acid tag.
  • the nucleic acid sequence tag comprises a DNA or RNA sequence.
  • the nucleic acid sequence tag comprises at least 10 nucleotides.
  • the nucleic acid sequence tag is ligated or bound to an additional oligonucleotide.
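A set of cycle tags can be checked in software before use. The Python sketch below assumes a small hypothetical tag set keyed by cycle number. The 10-nucleotide minimum and the DNA-or-RNA alphabet come from the statements above, whereas the pairwise Hamming-distance check is an added design consideration (tags that differ at many positions are harder to confuse after a read error) and is not a requirement stated in this disclosure.

```python
from itertools import combinations

DNA = set("ACGT")
RNA = set("ACGU")


def is_valid_tag(seq: str, min_len: int = 10) -> bool:
    """Tag must be all-DNA or all-RNA and at least min_len nucleotides."""
    s = seq.upper()
    return len(s) >= min_len and (set(s) <= DNA or set(s) <= RNA)


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length tags."""
    if len(a) != len(b):
        raise ValueError("tags must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(tags: dict) -> int:
    """Smallest Hamming distance between any two tags in the set."""
    return min(hamming(a, b) for a, b in combinations(tags.values(), 2))


# Hypothetical tags: cycle number -> sequence. Cycle n corresponds to
# amino acid n of the peptide when numbered from N to C.
EXAMPLE_TAGS = {1: "ACGTACGTAC", 2: "TGCATGCATG", 3: "GGAACCTTGG"}
```

A tag set would pass this check when every entry satisfies `is_valid_tag` and `min_pairwise_distance` is above whatever threshold the sequencing error rate calls for.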
  • the reactive moiety may be included as part of a chemically-reactive conjugate.
  • the reactive moiety comprises an Edman degradation reagent. In some embodiments, the reactive moiety comprises a phenyl isothiocyanate (PITC). In some embodiments, the reactive moiety comprises an isothiocyanate (ITC) or some derivative thereof. In some embodiments, the reactive moiety comprises dansyl chloride or some derivative thereof. In some embodiments, the reactive moiety comprises dinitrofluorobenzene (DNFB) or some derivative thereof. In some embodiments, the reactive moiety comprises an enzyme or peptide. In some embodiments, the reactive moiety is an enzyme. In some embodiments, the reactive moiety is a peptide. In some embodiments, the reactive moiety specifically cleaves at a specific amino acid.
  • PITC phenyl isothiocyanate
  • ITC isothiocyanate
  • DNFB dinitrofluorobenzene
  • the reactive moiety specifically cleaves at a specific amino acid that is not N-terminal. In some embodiments, the reactive moiety specifically cleaves at a specific amino acid that is not the N-terminal amino acid. In some embodiments, the enzyme or peptide has aminopeptidase activity. In some embodiments, the enzyme or peptide is a modified aminopeptidase. In some embodiments, the reactive moiety cleaves more than a single amino acid. In some embodiments, the reactive moiety cleaves 2, 3, 4, 5 or more amino acids. In some embodiments, the reactive moiety cleaves amino acids at a specific motif.
  • reactive moieties that may bind and cleave more than a single amino acid may include a peptidyldipeptidase, or a modified peptidyldipeptidase, such as a modified angiotensin-converting enzyme (ACE).
  • ACE angiotensin-converting enzyme
  • the reactive moiety may include ACE or a modified ACE.
  • Some embodiments comprise C-terminal peptide degradation, for example following the alkylated thiohydantoin method described by Dupont et al. (Dupont DR, Bozzini M, Boyd VL).
  • the C-terminal carboxyl may be converted to a thiohydantoin via treatment with acetic anhydride followed by thiocyanate ion under acidic conditions.
  • the reactive moiety comprises a group on the CRC for attaching to a cleavable derivatized N-terminal amino acid, comprising a tetrazine, an azide, an alkene, an alkyne, a trans-cyclooctene, a DBCO, a bicyclononyne, a norbornene, a strained alkyne, or a strained alkene, or a derivative thereof.
  • the immobilizing moiety comprises a thiol group, an amine group, or a carboxyl group.
  • the immobilizing moiety comprises a protected thiol group, a protected amine group, or a carboxyl group, an azide, an alkyne, an alkene, an aryl boronic acid, an aryl halide, a haloalkyne, a silylalkyne, a Si-H group, a protected or photoprotected reactive group, or a photoactivated reactive group.
  • the immobilizing moiety is an azide, an alkyne, an alkene, an aryl boronic acid, an aryl halide, a haloalkyne, a silylalkyne, a Si-H group, a protected or photoprotected reactive group, or a photoactivated reactive group.
  • the immobilizing moiety may include a thiol.
  • the immobilizing moiety may include an amine.
  • the immobilizing moiety may include an alkyne.
  • the immobilizing moiety may include an azide.
  • the immobilizing moiety may include an alkene.
  • the immobilizing moiety may include an aryl boronic acid.
  • the immobilizing moiety may include an aryl halide.
  • the immobilizing moiety may include a haloalkyne.
  • the immobilizing moiety may include a silylalkyne.
  • the immobilizing moiety may include a Si-H group.
  • the immobilizing moiety may include a protected or photoprotected reactive group (such as a pyridyl disulfide, a phenylacyl protected thiol, a nitrobenzyl protected thiol, a photocaged DBCO).
  • the immobilizing moiety may include a photoactivated reactive group (such as an azirine, a tetrazole, a sydnone, a 3-hydroxynaphthalen-2-ol).
  • the linker comprises an alkylene.
  • the alkylene is a C1-C20 alkylene or a derivative thereof.
  • the C1-C20 alkylene may optionally be substituted variants thereof.
  • the alkylene is a C1-C10 alkylene or a derivative thereof.
  • the linker comprises a heteroalkylene.
  • the heteroalkylene comprises a PEG1-n, wherein n is any suitable integer.
  • n is an integer from 2-100.
  • n is an integer from 2-50.
  • n is an integer from 2-25.
  • n is an integer from 2-20.
  • the heteroalkylene comprises a PEG1-20 (e.g. 1 to 20 units of polyethylene glycol) or a derivative thereof.
  • the PEG1-20 may optionally be substituted variants thereof.
  • the linker may comprise an oligoethylene glycol, a peptide, an oligopropylene glycol, an oligoamide, an oligosaccharide, a siloxane, a fully-alkylated polyamine, a polyol, an oligomeric polyester, a nucleic acid, or an oligomeric poly(tetramethylene oxide).
  • Any or all of the linkers may independently include or be selected from any of the aforementioned linkers.
  • LA may be cleavable.
  • LB may be cleavable.
  • LC may be cleavable.
  • LAB may be cleavable.
  • LBC may be cleavable. Any combination of the aforementioned linkers may be used.
  • a cleavable moiety may be cleaved by light, under acidic conditions, under basic conditions, an enzyme, or a combination thereof.
  • the light may comprise UV light, visible light, IR light, laser, or a combination thereof.
  • the cleavable moiety may be a photocleavable moiety.
  • the photocleavable moiety may comprise an electron-withdrawing group, such as, but not limited to, a nitro group or halide group.
  • the cleavable moiety may be an enzymatically cleavable moiety.
  • the chemically reactive conjugate and/or conjugate complex comprises a spacer associated with a reactive moiety used for immobilization of the chemically-reactive conjugate complex to the hydrogel surface, and the spacer comprises a restriction endonuclease cleavage sequence capable of releasing the PITC-AA moiety and/or cycle tag from the conjugate complex.
  • the chemically reactive conjugate and/or conjugate complex comprises a spacer associated with the reactive moiety used to bind and cleave terminal amino acids, and that spacer contains a restriction endonuclease cleavage sequence capable of releasing the cycle tag and/or the reactive moiety used for immobilization from the conjugate complex.
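Locating such a cleavage sequence in a spacer, and predicting the released fragments, can be sketched in Python. The disclosure does not name a particular enzyme here, so EcoRI (recognition sequence GAATTC, cut after the first G) is used purely as a familiar illustration, and the spacer sequences are invented. Only the top strand of the duplex is modeled.

```python
def cut_top_strand(seq: str, site: str = "GAATTC", offset: int = 1) -> list:
    """Split the top strand at every occurrence of a recognition site.

    `offset` is the cut position within the site; the defaults describe
    EcoRI (G^AATTC) and are an illustrative choice, not part of the source.
    """
    fragments = []
    start = 0
    pos = seq.find(site)
    while pos != -1:
        fragments.append(seq[start:pos + offset])
        start = pos + offset
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])
    return fragments


def has_site(seq: str, site: str = "GAATTC") -> bool:
    """True if the spacer carries at least one recognition site."""
    return site in seq
```

A spacer with exactly one site yields two fragments, so the portion carrying the cycle tag separates from the portion that remains on the surface. A design check would also confirm that the site does not occur anywhere else in the tag sequences.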
  • the chemically reactive conjugate may be in a pro-form, meaning that it is able, through additions, activations, cleavage reactions or other manipulations, to perform the functions of cycle identification (e.g., cycle tag), binding and cleavage of amino acids (e.g., PITC), and reaction to a surface, such as a hydrogel coated surface.
  • transferring the information of the recode tag to the recode block is mediated by a DNA polymerase, or by a combination of a DNA polymerase and ligase.
  • transferring the information of the recode tag to the recode block is mediated by chemical ligation.
  • a plurality of macromolecules and associated conjugate complexes are joined to a solid support.
  • the plurality of macromolecules are spaced apart on the solid support at an average distance >100 nm.
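As a rough illustration of what an average spacing above 100 nm implies for surface density, the following minimal sketch estimates an upper bound on immobilized macromolecules per square millimetre. It assumes an idealized square grid, which is an assumption made here for arithmetic convenience and is not stated in the disclosure; the function name is hypothetical.

```python
def max_sites_per_mm2(spacing_nm: float) -> float:
    """Sites per square millimetre on an idealized square grid with the given pitch.

    Assumption (not from the disclosure): molecules sit on a regular square
    lattice, so density is simply (1 / pitch) squared.
    """
    nm_per_mm = 1_000_000
    sites_per_side = nm_per_mm / spacing_nm
    return sites_per_side ** 2


# A 100 nm pitch gives 1e8 sites per mm^2; larger spacing gives fewer sites.
print(max_sites_per_mm2(100))
print(max_sites_per_mm2(200))
```

Random immobilization would generally give a lower usable density than this grid estimate, so the figure is best read as a ceiling.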
  • modification of a terminal amino acid of the peptide prior to contacting the peptide with the first chemically-reactive conjugate increases the reactivity of the chemically-active conjugate toward the modified amino acid relative to non-modified amino acids.
  • activation of the C-terminal amino acid with acetic anhydride prior to contacting with trimethylsilylisothiocyanate has been described. Bailey, J.M., Shenoy, N.R., Ronk, M., & Shively, J.E., 1992, Protein Sci. 1, 68-80.
  • the methods described herein further comprise, after contacting the recode blocks with polymerase, nucleotides, ligase, and/or buffer under conditions that allow extension-ligation or ligation to assemble the recode blocks into a memory oligo, contacting a plurality of incompletely ligated memory oligos with linking oligos, polymerase, nucleotides, ligase, and/or buffer under conditions that allow extension-ligation or ligation to assemble the incompletely ligated memory oligos into a memory oligo. Accordingly, the yield during memory oligo assembly may be increased.
  • the methods described herein further comprise, after contacting the recode blocks with polymerase, nucleotides, ligase, and/or buffer under conditions that allow extension-ligation or ligation to assemble the recode blocks into a memory oligo, contacting a plurality of incompletely ligated memory oligo fragments and/or recode blocks with linking oligos, ligase, and buffer under conditions that promote ligation of recode blocks and memory oligo fragments. Accordingly, the yield during memory oligo assembly may be increased.
  • the linking oligo comprises a sequence complementary to that of the recode blocks, thereby facilitating ligation of recode blocks that were not ligated during contacting with the polymerase, nucleotides, ligase, and buffer.
  • the linking oligos comprise additional nucleotide sequences coded to carry information related to sample or process, and/or that aid in ligation or extension-ligation.
  • the memory oligo is amplified prior to analysis, e.g., by bridge amplification, ExAmp NGS clustering, isothermal clustering, solution-based PCR amplification, A-tailing to add primer sequences prior to solution-based amplification, or any suitable DNA amplification method.
  • a memory oligo optionally comprises a sample index, a spacer, a unique molecular identifier (UMI), a universal priming site, a CRISPR protospacer adjacent motif (PAM) sequence, or any combination thereof.
  • a plurality of memory oligos are enriched prior to analysis, e.g., via a depletion process or a normalization process to remove or reduce the fraction of oligos associated with abundant proteins, peptides, or macromolecules.
  • enrichment or depletion may be carried out via commercially available kits, such as Agilent SureSelect, or via custom enrichment or depletion methods using oligonucleotides partially complementary to a memory oligo sequence, e.g., complementary to AA tag sequences of the target memory oligo.
  • analyzing the memory oligo(s) comprises a nucleic acid sequencing method.
  • analyzing the memory oligo(s) comprises analysis via a multiplex PCR method.
  • helicase may be utilized during assembly of memory oligos.
  • the use or strobing of helicase during one or more assembly processes may, in some examples, improve access of DNA blocks to facilitate longer memory oligo assembly.
  • the memory oligo or recode blocks thereof are configured to be analyzed using a decode-based methodology. More information regarding decode-based techniques may be found in Gunderson et al., Decoding Randomly Ordered DNA Arrays, Genome Res., 2004 May; 14(5):870-7, which is herein incorporated in its entirety by reference for all purposes.
  • fragments of memory oligos, or recode blocks, or any such spatially-confined set of constructs that contains sequence and identity information associated with a given peptide, protein, protein complex, or polymer are analyzed using a decode-based methodology. See Gunderson et al.
  • identifying components are selected from UMIs, sample indexes, recode tags, recode blocks, ligation oligos, AA tags, their complements, or any combination thereof.
  • one or more chemically-reactive conjugates binds to a terminal amino acid residue of the peptide.
  • one or more binding agents bind to the conjugate complex.
  • the conjugate complex comprises a post-translationally modified amino acid.
  • the identifying components of a recode tag, recode block, or both comprise error detection and/or correction bits.
  • the error detection/correcting sequence is derived from Hamming distance theory, or other modern digital code space theories (e.g., Lee, Levenshtein-Tenengolts, Reed-Solomon, or others).
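To make the Hamming-distance idea concrete, here is a minimal sketch of nearest-tag error correction. The tag set, the function names, and the choice of a simple minimum-distance decoder are all illustrative assumptions; the disclosure does not specify a particular code or decoder.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length tags differ."""
    if len(a) != len(b):
        raise ValueError("tags must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_distance(tags):
    """Smallest pairwise Hamming distance in the tag set."""
    return min(hamming(a, b) for a, b in combinations(tags, 2))


def correct(read: str, tags):
    """Return the nearest tag if the read lies within the correction radius, else None."""
    radius = (min_distance(tags) - 1) // 2  # guaranteed-correctable substitutions
    best = min(tags, key=lambda t: hamming(read, t))
    return best if hamming(read, best) <= radius else None


# Hypothetical tag set with minimum distance 3 (corrects one substitution).
TAGS = ["AAAAA", "CCCAA", "AACCC", "CCACC"]

print(min_distance(TAGS))        # 3
print(correct("CAAAA", TAGS))    # AAAAA
print(correct("GGGGG", TAGS))    # None
```

A set with minimum distance d detects up to d - 1 substitutions and corrects up to (d - 1) // 2; insertions and deletions would need an edit-distance code such as the Levenshtein-Tenengolts family named above.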
  • the constituents of a recode tag, recode block, or both comprise 2, 3, 4, 5, 6 or more different types of nucleotides.
  • the code or codes (e.g., sequences) associated with a recode tag or recode block via analysis of the memory oligo are derived from 2, 3, 4, 5, 6 or more types of nucleotides.
  • an immobilized peptide is linearized (denatured) using detergent(s), surfactant(s), chaotropic agent(s), reducing agent(s), and/or alkylation agent(s).
  • a chemically-reactive conjugate reacts and cleaves from a C-terminus of the peptide rather than the N-terminus to create recode blocks that can be assembled using any of the methods described herein.
  • paired-end read information may be collected from an immobilized protein complex, protein, or peptide, by creating recode blocks using chemically-reactive conjugates operating on both the N-terminus and C-terminus of a given protein complex, protein, or peptide sequentially or in parallel to create recode blocks that can be assembled using methods described herein.
  • a method for acquiring a priori defined code information via sequencing of a subset of nucleotide types in an oligonucleotide or oligonucleotide cluster is provided. Such a method is particularly beneficial when considering readouts of information stored in DNA (e.g., DNA data storage information technology readout).
  • information recoded into a memory oligo is acquired via sequencing of a subset of the nucleotide types in the memory oligo.
  • a subset of nucleotide types may be identified and a subset of nucleotide types may not be identified in the sequencing readout, e.g., by introducing non-fluorescent, non-reversibly-terminated nucleotides into an SBS sequencing reagent mixture.
  • the subset is 2 of the 4 natural nucleotides.
  • one or more of the operations of the method are performed in any suitable sequential order, or are simultaneously performed.
  • subunits of a given protein are co-immobilized directly or through their interaction with native subunits on the surface. Subsequently, the one or more subunits may be simultaneously recoded by processes (b)-(m), including alternate aspects associated with the method, within the same localized region.
  • Information of the memory oligo may contain an admixture of subunits (protein and native) which can be deconvoluted in silico.
  • a method for preparing interacting DNA-peptides, or a plurality of interacting DNA-peptides complexes, to be joined to a solid support comprising: (a) cross-linking peptides, protein, and/or protein complexes with native DNA with which the protein was associated in biological context for one or more samples (for example, using formaldehyde, or other methods known in the art); (b) activating zero, 1, 2, or more moieties of each cross-linked peptide-DNA, protein, and/or protein complex-DNA complexes; (c) optionally joining a sample-specific nucleotide index sequence to the activated peptides-DNA, and/or protein-DNA complexes; and (d) joining the complexes to a solid support.
  • one or more of the operations of the method are performed in any suitable sequential order, or are simultaneously performed.
  • the method provides for the analysis of in vivo interactions between proteins and DNA.
  • a peptide comprises any suitable macromolecular polymer, including a protein, a peptide, and the like.
  • a monomeric unit of the macromolecular polymer may comprise an amino acid, a carbohydrate, and/or any monomeric moiety that may be combined into a polymer.
  • the method further comprises depletion of one or more abundant proteins from the sample prior to any of operations (a), (b), (c), and/or (d).
  • the macromolecule analyte is degraded at any step following step (g) of the disclosed method for analyzing one or more peptides from a sample comprising a plurality of peptides, proteins, and/or protein complexes.
  • For example, the residual amino acids that are not isolated through steps (a)-(g) of the method may be digested prior to contacting the surface with binding agents.
  • Methods for degrading the macromolecular analyte include chemical methods to release the macromolecule from the surface, such as reduction of a disulfide linkage, or enzymatic digestion methods that include proteinase K, trypsin, chymotrypsin, or other endo- or exopeptidases.
  • the utilization of chemically-reactive conjugates with cleavable spacers allows rejuvenation of a surface of a substrate for a second round of recoding.
  • a method for analyzing one or more residual immobilized analytes from a surface having a plurality of peptides, proteins, and/or protein complexes comprising: (a) providing a surface used in a previous round of recoding operations (b) - (d) described below, and which has been rejuvenated by cleaving the spacers of a first chemically-reactive conjugate, (b) providing a second chemically-reactive conjugate (e.g.
  • surface rejuvenation may include ‘strobing’ the protein using either chemical (e.g., phenylisothiocyanate (PITC)) or biological (e.g., aminopeptidase) methods.
  • a plurality of assembly oligos containing all or some of the possible assembly oligos are hybridized to the memory oligo, ligated, and dehybridized to form a solution-phase memory oligo.
  • location information is transferred to C/AA, or its associated memory oligo.
  • protecting groups that are removable under mild alkaline conditions, e.g., phenoxyacetyl (Pac) protected dA and 4-isopropyl-phenoxyacetyl (iPr-Pac) protected dG, along with acetyl protected dC, may be employed.
  • protecting the individual bases A, G, and C can be achieved through acylation reactions with the appropriate acid chlorides.
  • the specific acid chlorides used may be benzoyl chloride for adenine and cytosine, and isobutyryl chloride for guanine.
  • methods comprising: (a) protecting an oligonucleotide of a binding or reactive molecule; (b) contacting said molecule with the N-terminus of a peptide bound to a solid support; (c) cleaving one or more amino acid residues from said peptide; (d) deprotecting the oligonucleotide of the binding or reactive molecule; (e) contacting the deprotected oligonucleotide with reagent(s) to transfer information by enzymatic ligation, polymerase extension, or chemical ligation.
  • Some embodiments include repeating any of the aforementioned steps.
  • the chemically reactive species may include a chemically reactive conjugate described herein.
  • methods comprising: (a) protecting an oligonucleotide joined to a peptide; (b) contacting the N-terminus of said peptide with reagent(s) to cleave one or more amino acid residues from said peptide; (c) deprotecting the oligonucleotide bound to the peptide; (d) contacting the deprotected oligonucleotide with reagent(s) to transfer information by enzymatic ligation, polymerase extension, or chemical ligation.
  • Some embodiments include repeating any of the aforementioned steps.
  • the chemically reactive species may include a chemically reactive conjugate described herein.
  • methods comprising: (a) protecting an oligonucleotide associated with location or identity of a peptide; (b) contacting the N-terminus of said peptide with reagent(s) to cleave one or more amino acid residues from said peptide; (c) deprotecting the oligonucleotide bound to the peptide; (d) contacting the deprotected oligonucleotide with reagent(s) to transfer information by enzymatic ligation, polymerase extension, or chemical ligation.
  • Some embodiments include repeating any of the aforementioned steps.
  • the chemically reactive species may include a chemically reactive conjugate described herein.
  • Some embodiments include deprotecting the oligonucleotide after cleaving the terminal amino acid of the peptide, and then reacting a second reagent with the oligonucleotide. Some examples include a washing step before or after (a), (b), (c), (d), or (e). Washing may include changing a solution, removing an excess reagent or solution. Any of the aforementioned steps (e.g. step (e)), or a combination of said steps, may be optional in some embodiments.
  • methods for cleaving a terminal amino acid of a peptide coupled to a solid support, comprising: (a) protecting an oligonucleotide coupled to the solid support; (b) cleaving a terminal amino acid of a peptide coupled to the solid support; (c) deprotecting the oligonucleotide; (d) reacting a reagent with the oligonucleotide; and (e) reprotecting the oligonucleotide.
  • Some embodiments include binding a chemically reactive species to a terminal amino acid of the peptide after reprotecting the oligonucleotide.
  • Some embodiments include deprotecting the oligonucleotide after binding the chemically reactive species to the terminal amino acid of the peptide, and then reacting a second reagent with the oligonucleotide. Some examples include a washing step before or after (a), (b), (c), (d), or (e). Washing may include changing a solution, removing an excess reagent or solution. Any of the aforementioned steps (e.g. step (e)), or a combination of said steps, may be optional in some embodiments.
  • the reactive moiety cleaves the terminal amino acid from the peptide to expose a next terminal amino acid, and wherein the method further comprises contacting the next amino acid with another of the conjugate after reprotecting the oligonucleotide.
  • the terminal amino acid is N-terminal.
  • the peptide is immobilized to a solid support.
  • the conjugate comprises an organic, small molecule.
  • the conjugate comprises a chemically-reactive conjugate (CRC) comprising: (A) the oligonucleotide; (B) the reactive moiety; and (C) an immobilization moiety.
  • the oligonucleotide comprises a cycle nucleic acid.
  • methods comprising: providing a conjugate comprising a peptide coupled to a protected oligonucleotide; contacting the terminal amino acid of the peptide, thereby binding a reactive moiety to the terminal amino acid, and optionally cleaving the terminal amino acid from the peptide; deprotecting the oligonucleotide; and contacting the deprotected oligonucleotide with an enzyme or reagent for ligation or polymerization.
  • Some embodiments include reprotecting the oligonucleotide.
  • the reactive moiety cleaves the terminal amino acid from the peptide to expose a next terminal amino acid, and wherein the method further comprises contacting the next amino acid with another of the conjugate after reprotecting the oligonucleotide.
  • the terminal amino acid is N-terminal.
  • the peptide is immobilized to a solid support.
  • the conjugate comprises an organic, small molecule.
  • Subset sequencing may be particularly useful when an oligonucleotide is required to function during a physiochemical activity, such as a primer for PCR or a spacer oligo, and function to store information.
  • nucleotides of a sequence that is functional during physiochemical activities provide redundant stored information.
  • An aspect such as a barcode nucleic acid or recode nucleic acid may include nucleotides such as A, G, C, and T, whereas information content of the physiochemically functional sequence may be represented by a subset of the nucleotides (such as A and C, or T and G).
  • a recode tag, cycle tag, and/or recode block nucleic acid includes sequence information that is useful to obtain. In some aspects, this information can be obtained by sequencing a subset of the nucleotides that comprise the nucleic acid. When an oligonucleotide that includes the redundant information is sequenced, a subset of nucleotides may be skipped during sequencing.
  • the method may include providing, in a nucleic acid sequencing reaction, a combination of reversibly terminated nucleotides and nucleotides that are not reversibly terminated.
  • reversibly terminated nucleotides are fluorescent.
  • non-reversibly terminated nucleotides are fluorescent.
  • nucleotides of the nucleic acid being sequenced that correspond with the nucleotides that are not reversibly terminated are not sequenced.
  • the method may include providing, in a nucleic acid sequencing reaction, a combination of reversibly terminated nucleotides and nucleotides that are not reversibly terminated, wherein nucleotides of the nucleic acid being sequenced that correspond with the nucleotides that are not reversibly terminated are not sequenced.
  • the method may include identifying nucleotides of the nucleic acid being sequenced that correspond with the reversibly terminated nucleotides.
  • the nucleic acid being sequenced comprises a region that includes only a subset of nucleotides selected from A, C, G, and T, and wherein the subset of nucleotides is not sequenced.
  • the subset of nucleotides selected from A, C, G, and T comprises 2 nucleotides selected from A, C, G, and T.
  • the subset of nucleotides selected from A, C, G, and T comprises 3 nucleotides selected from A, C, G, and T.
  • the region comprises a primer sequence.
  • the region does not include a barcode sequence, recode nucleic acid sequence or a portion thereof, or a cycle nucleic acid sequence or a portion thereof.
  • the region that is not sequenced may comprise 1, 2, 3, 4, 5, 10, 15, 20, 25, 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, 500, 750, 1000, or more nucleotides, or a range of nucleotides defined by any two or more of the aforementioned integers.
  • the part that is sequenced may comprise 1, 2, 3, 4, 5, 10, 15, 20, 25, 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, 500, 750, 1000, or more nucleotides, or a range of nucleotides defined by any two or more of the aforementioned integers.
  • the subset includes a combination of A, G, C, or T.
  • the subset of nucleotide constituents identified through DNA sequencing is 2 of 4 natural nucleotides (e.g. 2 of A, G, C and T).
  • the subset may include A and G, A and C, A and T, G and C, G and T, or C and T.
  • the subset may exclude A and G, A and C, A and T, G and C, G and T, or C and T.
  • the subset of nucleotides identified through DNA sequencing is A and C.
  • the subset being sequenced includes all four natural nucleotides, wherein non-natural nucleotides are incorporated and are not sequenced and are skipped by non-reversibly terminated nucleotides.
  • the subset of nucleotide constituents identified through DNA sequencing is 3 of 4 natural nucleotides (e.g. 3 of A, G, C and T).
  • the subset may include A, G and C; A, G, and T; A, C, and T; or G, C and T.
  • the subset may exclude A, G and C; A, G, and T; A, C, and T; or G, C and T.
  • the subset of nucleotides may be sequenced through the use of modified nucleotides (e.g. dideoxy (ddNTPs) such as may be used in Sanger sequencing).
  • the modified nucleotides may include reversible terminated chemistry.
  • the modified nucleotides may include a dye or tag such as a fluorescent dye or tag.
  • the modified nucleotides may be provided in a sequencing reaction.
  • other nucleotides not included in the subset are not sequenced (e.g. are skipped).
  • the nucleotides not included in the subset may exclude the modification. For example, unmodified nucleotides corresponding to the nucleotides that are skipped or not included in the subset may be used in a sequencing reaction mix.
  • methods may include sequencing a subset of nucleotides of an oligonucleotide molecule, comprising: (a) providing a solution that includes oligonucleotides to be sequenced; (b) providing a sequencing reagent comprising one or more nucleotides as predominantly reversibly terminated nucleotides and one or more nucleotides as predominantly non-terminated nucleotides; (c) preparing (a) for sequencing according to protocols for a sequencing system; (d) sequencing the prepared solution of (a) using as at least one component of the sequencing reagents the sequencing reagent of (b) for at least one cycle of DNA sequencing; and (e) obtaining a sequence order for a subset of the nucleotides in the original oligonucleotide sequence.
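The readout produced by steps (a)-(e) can be sketched in a few lines. This is a simplified model under stated assumptions: it treats the input as the strand whose bases are reported (ignoring template/complement bookkeeping), assumes error-free incorporation, and assumes that non-terminated nucleotides are incorporated without generating a called base. The function name is hypothetical.

```python
def subset_readout(sequence: str, read_bases: set) -> str:
    """Return the ordered bases that would be called when only `read_bases`
    are supplied as reversibly terminated, labeled nucleotides.

    Bases outside `read_bases` are modeled as non-terminated: the polymerase
    extends through them within a cycle, so they never appear in the readout.
    """
    return "".join(base for base in sequence if base in read_bases)


oligo = "ACGTTACGGA"
readout = subset_readout(oligo, {"A", "C"})
print(readout)                    # ACACA
print(len(oligo), len(readout))   # 10 5
```

In this model the number of sequencing cycles needed equals the length of the readout rather than the length of the oligonucleotide, which is where the time saving of skipping a subset would come from.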
  • the oligonucleotides have been designed to contain information about the composition of a peptide or amino acid from a peptide.
  • the oligonucleotide is a memory oligo, a recode tag, a recode block, or a cycle tag.
  • the oligonucleotide is derived from a protein sequencing method that creates barcoded nucleic acid information representing protein sequence and/or protein identity.
  • the oligonucleotide is any nucleic acid sequence that embodies information related to peptide or amino acid sequence or composition.
  • methods comprising: (a) designing oligonucleotides that include 2, 3, 4, 5, 6 or more different types of nucleotide constituents and that employ a subset of the nucleotide constituents to represent cycle, amino acid, location, and/or protein information; (b) utilizing the physicochemical properties of the designed oligonucleotides within a protein sequencing method, such as may be described herein; (c) collecting DNA sequence information for the nucleotides that represent protein information; and (d) analyzing DNA sequence information of a subset of nucleotides to infer protein information.
  • N-terminal modifications are important to cellular function and are integral to cell regulation and health. [Ramazi, S., Zahiri, J. Post-translational modifications in proteins: resources, tools and prediction methods. Database (2021) Vol. 2021]. It is estimated that most human and plant proteins are post-translationally modified. As many as 90% are N-acetylated, and as many as 20% N-glycosylated.
  • Metadata may be converted to a recode block and associated with single-molecule protein sequence information via the methods described in the present disclosure. Grouping metadata and sequencing data provides a more comprehensive protein analysis. The concept may be generalized to identify any feature of a protein or sample (metadata) that along with sequence information provides additional biological insight.
  • a process for identifying and recording N-terminal modifications, before their removal from a protein. This may involve the use of a metadata conjugate capable of recognizing an N-terminal modification.
  • FIG. 54 illustrates a simplified block diagram of an exemplary workflow (Process 400) for capturing N-terminal modifications during protein sequencing, according to embodiments of the present disclosure.
  • FIG. 55 schematically illustrates exemplary operations of the process in FIG. 54, according to embodiments of the present disclosure.
  • process 400 begins at operation 402, where proteins from an isolated biological sample are immobilized on a solid support, some of the proteins having one of a set of a priori defined N-terminal modifications.
  • a metadata conjugate having 3 functional moieties is introduced: 1) a binding moiety, which could be an antibody, aptamer, single-chain variable fragment (scFv), or any molecule having specificity for an N-terminal modification (for example, an antibody conjugate specifically designed to recognize N-terminal ubiquitin of the protein, as shown at left in FIG. 55); 2) a metadata tag, which could be an oligonucleotide, as shown in FIG. 55; and 3) an immobilizing moiety, e.g., a moiety that joins the metadata tag to either the cognate peptide or directly to the solid support in proximity to the anchor point of the cognate immobilized peptide.
  • the metadata tag is immobilized. Immobilization can be facilitated through multiple mechanisms.
  • the metadata conjugate can undergo a chemical modification to form a covalent bond with the protein or a functional group present on the solid support or may be joined via photon-induced immobilization upon exposure to a specific wavelength of light.
  • the post-translational modification is enzymatically or chemically removed from the immobilized protein to expose a free amino N-terminus.
  • Modifications may be removed by the action(s) of aminopeptidase C, aminopeptidase N, dipeptidase, endopeptidase such as trypsin, ClpAP protease, Sentrin/SUMO-specific proteases (SENP), deubiquitinating enzymes or by any suitable means.
  • Exposing the free N-terminal amine may allow reverse translation protein sequencing to proceed on the immobilized proteins, step 410, as described in embodiments herein (e.g., process 200) and in PCT/US2023/070077 included by reference in its entirety.
  • the metadata tag is assembled together with nucleic acid recode blocks that represent amino acid sequence information of the associated peptide. Ligation, polymerization, or other suitable methods may be utilized. The result is a chimeric nucleic acid (memory oligo) that not only represents the amino acid sequence of the corresponding immobilized protein, but also carries the information of the metadata tag.
  • DNA sequencing at operation 414 provides detailed information not only about the protein’s amino acid sequence but also about its former modification. As a result, the final data set offers a comprehensive picture of the protein, including N-terminal modification, thereby significantly enriching the proteomic analysis.
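The end result of Process 400, a memory oligo carrying both a metadata tag and amino acid recode information, can be illustrated with a toy encoder and decoder. Everything specific here is a hypothetical assumption for illustration: the tag sequences, the fixed tag lengths, the placement of the metadata tag at the start of the oligo, and the function names.

```python
# Hypothetical lookup tables; real tag sequences and lengths are not given in the text.
METADATA_TAGS = {"GGTTCC": "N-terminal ubiquitin", "GGAACC": "N-terminal acetyl"}
AA_TAGS = {"ACAC": "M", "CACA": "K", "AACC": "G"}


def assemble(metadata_tag: str, aa_tags: list) -> str:
    """Join a metadata tag and per-residue recode tags into one memory oligo string."""
    return metadata_tag + "".join(aa_tags)


def decode(memory_oligo: str, meta_len: int = 6, aa_len: int = 4):
    """Split a sequenced memory oligo into its metadata label and peptide sequence."""
    modification = METADATA_TAGS[memory_oligo[:meta_len]]
    body = memory_oligo[meta_len:]
    residues = [AA_TAGS[body[i:i + aa_len]] for i in range(0, len(body), aa_len)]
    return modification, "".join(residues)


oligo = assemble("GGTTCC", ["ACAC", "CACA", "AACC"])
print(oligo)           # GGTTCCACACCACAAACC
print(decode(oligo))   # ('N-terminal ubiquitin', 'MKG')
```

The point of the sketch is only that one sequencing read yields both the former modification and the residue order; an actual design would also need cycle information, spacers, and error handling.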
  • Some embodiments include contacting a peptide having a post-translational modification with a binding agent that targets or binds to the post-translational modification.
  • the binding agent comprises a nucleic acid having a sequence representing the post-translational modification.
  • Some embodiments include transferring the sequence information or its complement to the peptide or a position proximal to the immobilized peptide.
  • Some embodiments include removing the post-translational modification from the peptide.
  • Some embodiments include performing reverse translation protein sequencing of the peptide to generate a nucleic acid sequence representing the peptide.
  • Some embodiments include performing reverse translation protein sequencing of the peptide to generate a nucleic acid sequence representing an amino acid of the peptide. Some embodiments include performing reverse translation protein sequencing of the peptide to generate a nucleic acid sequence representing at least one amino acid of the peptide. In some embodiments, the information representing said post-translational modification is incorporated into the nucleic acid sequence, and the nucleic acid sequence is sequenced.
  • the binding moiety specifically binds an N-terminal modification.
  • the N-terminal modification comprises a post-translational modification.
  • Some embodiments include: before contacting the immobilized amino acid complex with a binding agent: contacting the immobilized amino acid complex with a modification binding agent, wherein the N-terminal amino acid comprises a post-translational modification, and wherein the modification binding agent comprises a modification binding moiety for preferentially binding to the post-translational modification; and a modification recode tag comprising a recode nucleic acid corresponding with the modification binding agent, thereby forming a complex comprising an immobilized post-translationally modified amino acid complex and a modification binding agent and thereby bringing a cycle tag into proximity with a modification recode tag.
  • a metadata tag comprises a wobble in 1 or more positions, the position(s) being a priori defined.
  • the number of wobbles in the metadata tag for one metadata attribute may be the same or different than the number of wobbles in the metadata tag representing any other metadata attribute.
  • the position of the wobble within the metadata sequence of metadata attribute i may be the same or different than the position of wobbles in any other metadata attribute.
  • a recode block comprises cycle tags, recode tags, and/or metadata tags that have one or more wobbles.
  • a memory oligo comprises recode blocks and/or metadata blocks having one or more wobbles.
  • the information of the wobble base(s) is directly transferred to the recode block or memory oligo through the actions of enzymatic ligation, chemical ligation, polymerization, or any combination thereof.
  • the information of the wobble base(s) is directly or indirectly assembled to create a recode block or memory oligo using any of the methods described herein.
  • the number of wobble bases is the same across elements, leading to a consistent PIN length. In some embodiments, the number of wobble bases is different across elements, leading to a variation in PIN length.
  • a memory oligo comprising information of wobble base(s) is analyzed to determine and/or utilize the Polymer Identity Number (PIN).
  • the random information of each tag can be informatically combined in silico to generate a PIN.
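One way the in silico combination could work is sketched below: the bases observed at the a priori defined wobble positions of each tag are concatenated, in tag order, into a PIN. The tag sequences, the position lists, the zero-based indexing, and the function name are hypothetical; the disclosure does not prescribe a specific combination rule.

```python
def extract_pin(tags: list, wobble_positions: list) -> str:
    """Concatenate the bases found at each tag's predefined wobble positions.

    `tags[i]` is the sequenced tag for element i and `wobble_positions[i]`
    lists its zero-based wobble indices (assumed known in advance).
    """
    return "".join(
        tag[pos]
        for tag, positions in zip(tags, wobble_positions)
        for pos in positions
    )


# Two sequenced tags: one wobble in the first, two in the second.
observed_tags = ["ACGTAAC", "TTGCA"]
positions = [[4], [0, 2]]

pin = extract_pin(observed_tags, positions)
print(pin)              # ATG
print(4 ** len(pin))    # 64 possible PINs for three four-letter wobble bases
```

Because the number of wobble bases per element may differ, the PIN length follows from the position lists rather than from a fixed constant.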
  • the present disclosure provides methods for analyzing, and for preparing for analysis, polymeric macromolecules, such as peptides, polypeptides, and proteins. Accordingly, aspects of the present disclosure relate to the field of proteomics.
  • Protein sequencing is a valuable technique in proteomics and molecular biology, offering valuable insights into the structure and function of proteins.
  • with mass spectrometry and emerging techniques like single-molecule protein sequencing, the capacity to analyze complex protein samples has increased considerably.
  • Carrier proteins are useful to mitigate non-specific adsorptive losses. These proteins may serve as a matrix that prevents a target protein from adsorbing onto the walls of sample containers or other analytical equipment. Examples of carrier proteins may include β-lactoglobulin, bovine serum albumin (BSA), and ovalbumin, among others. The amount of carrier protein added to a sample can range from 1:1 to 100:1 (carrier-to-target protein mass ratio), depending on the specific requirements of the experimental design.
  • carrier proteins may introduce a significant computational and analytical burden when sequencing, thereby inflating throughput requirements. For instance, inclusion of carrier proteins at a 10:1 mass ratio can increase the raw data generated by up to an order of magnitude, requiring additional computational power for data sorting and analysis. Moreover, the carrier proteins themselves may become subject to sequencing, complicating the data output. Current methods for mitigating this issue, such as sample pre-processing and data filtration, are inadequate and can introduce bias or error.
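The throughput burden can be estimated with simple arithmetic. The sketch below assumes, purely for illustration, that sequencing output scales in proportion to protein mass loaded; that proportionality and the function names are assumptions, not statements from the disclosure.

```python
def carrier_fraction(ratio: float) -> float:
    """Fraction of total protein mass that is carrier, for a carrier:target mass ratio."""
    return ratio / (1 + ratio)


def fold_increase(ratio: float) -> float:
    """Total protein mass relative to the target alone."""
    return 1 + ratio


for ratio in (1, 10, 100):
    print(ratio, round(carrier_fraction(ratio), 3), fold_increase(ratio))
# 1 0.5 2
# 10 0.909 11
# 100 0.99 101
```

Under this assumption, a 10:1 ratio means roughly 91% of the material, and a comparable share of the raw data, would come from carrier rather than target protein, which is consistent with the order-of-magnitude increase described above.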
  • Analysis, as described herein, may refer to sequencing, such as protein sequencing, and other related processes for determining one or more characteristics of a macromolecule.
  • compositions, systems, and methods for the preparatory phase (e.g., before cyclic sequencing) of a sample for protein sequencing. Such compositions, systems, and methods facilitate improved protein sequencing accuracy and efficiency.
  • the present disclosure provides novel chemically and enzymatically modified carrier proteins, and other carrier molecules, that are inert to common protein sequencing techniques, as well as methods of forming the same.
  • the present disclosure offers a novel, practical approach to improving the efficiency and accuracy of protein sequencing in scenarios where carrier proteins are utilized, thereby addressing a critical gap in the field.
  • amino acid and notation “AA” refer to natural d-, l-, non-natural, and post-translationally modified amino acids.
  • An “N-terminal amino acid” refers to an amino acid that has a free amine group, and is linked to only one other amino acid of the peptide through an amide bond.
  • a “C-terminal amino acid” refers to an amino acid that has a free carboxyl group, and is linked to only one other amino acid of the peptide through an amide bond.
  • binding agent refers to an entity comprised of a binding moiety joined with a recode tag.
  • the binding moiety and recode tag may be joined by a linker.
  • a binding moiety may form a covalent association or non-covalent association with target analytes, which include immobilized conjugate complexes, such as an immobilized PTC-AA-cycle tag-conjugate complex.
  • the binding moiety may exhibit preferential binding to one conjugate complex over another one depending on the amino acid of the complex.
  • the binding moiety may bind preferentially to classes of amino acids that are structurally or functionally similar within the conjugate complex.
  • C/AA tag refers to a nucleic acid molecule of any length, but typically in the range 5-40 bases, having a sequence that represents a particular amino acid and cycle of a single-molecule sequencing workflow. The length of a C/AA tag may differ for different cycles of the workflow.
  • the C/AA tag may optionally comprise additional nucleic acid sequences that direct assembly of memory oligos in subsequent steps, such as unifying assembly sequences which facilitate recode block assembly irrespective of the order of assembly.
  • a C/AA tag may optionally comprise a restriction endonuclease sequence, and/or a sequence facilitating amplification of recode blocks, or any sequence functionality disclosed herein.
  • chemically-reactive conjugate may include or refer to a conjugate comprising (a) a reactive moiety(ies) that can bind and cleave a terminal amino acid, (b) a reactive moiety that allows immobilization to a solid support, and (c) a cycle tag with identifying information regarding the workflow cycle.
  • various moieties may be replaced with reactive moieties for attaching said groups.
  • a chemically-reactive conjugate may include a reactive moiety that can bind to a first reactive moiety such as a molecule comprising an ITC.
  • conjugate complex and “immobilized conjugate complex” refer to a chemically-reactive conjugate having been joined optionally as appropriate within the context to: an amino acid (e.g., a monomer of the macromolecular analyte), a peptide, a linker, a solid support, and/or a cycle tag.
  • amino acid e.g., a monomer of the macromolecular analyte
  • complementary refers to Watson-Crick base pairing between nucleotides and specifically refers to nucleotides hydrogen bonded to one another with thymine or uracil residues linked to adenine residues by two hydrogen bonds and cytosine and guanine residues linked by three hydrogen bonds.
  • a nucleic acid includes a nucleotide sequence described as having a "percent complementarity" or “percent homology” to a specified second nucleotide sequence.
  • a nucleotide sequence may have 80%, 90%, or 100% complementarity to a specified second nucleotide sequence, indicating that 8 of 10, 9 of 10 or 10 of 10 nucleotides of a sequence are complementary to the specified second nucleotide sequence.
  • cycle tag refers to a nucleic acid molecule of any length, but typically in the range 5-20 bases, having a sequence that is defined to represent a particular cycle of the recode workflow.
  • the length of a cycle tag may differ for different cycles of the workflow.
  • the cycle tag may optionally comprise additional nucleic acid sequences that direct assembly of memory oligos in subsequent steps, such as universal assembly sequences which facilitate recode block assembly irrespective of the order of assembly.
  • a cycle tag may optionally comprise a restriction endonuclease sequence.
  • cycle tag may also refer to any construct that enables a method of subsequent identification of the cycle information, such as a mass tag.
  • deprotecting refers to removing protecting moieties that preserve the integrity of a functional group during exposure to conditions and potential reactants that may otherwise react to alter the functional group.
  • exemplary protecting agents for nucleic acids include: FMOC, acetyl (Ac), benzoyl (Bz), dimethylformamidine (DMFA), and phenoxyacetyl (PAC). See, Radhakrishnan P. Iyer, Current Protocols in Nucleic Acid Chemistry.
  • hydrogel refers to synthetic polymers, natural polymers, and/or hybrid polymers.
  • exemplary monomers that may form the hydrogel include one or more: acrylamide, acrylate, vinyl pyridine, dihydroxy methacrylates, other methacrylates, HEMA, PHEMA, PVA, HPMC, PLGA, PEG, etc., in linear, branched, and crosslinked configurations, block co-polymers configurations, or other configurations conducive to sequencing macromolecules.
  • a hydrogel may be associated with a solid support through covalent or non-covalent interactions.
  • the hydrogel may further comprise orthogonal conjugation chemistry modalities to support the recode workflow.
  • initiation oligo refers to any nucleic acid that when immobilized is capable of initiating assembly of co-localized nucleic acids.
  • Initiation oligos may comprise a sequence facilitating assembly of recode blocks, determination of a shared relative spatial location, amplification of nucleic acids, and/or ULI, U-AS, U-HS, hyb tags, ligation complements, or combinations of these or other sequences described herein for the analysis of peptides. It is of any length, but typically in the range 15-80 bases and the length of one hyb tag may differ from that of another.
  • an initiation oligo may optionally comprise a restriction endonuclease sequence.
  • Initiation oligos may be linked via a linker, or may comprise a linker. It is also recognized that the initiation oligos may be another molecular format that is not a nucleic acid, and that information may be joined with an initiation oligo via a chemical reaction.
  • ITC-Conjugate refers to a molecule having an amine-reactive group (RG), including but not limited to isothiocyanate, alkyl isothiocyanate, aryl isothiocyanate, substituted aryl isothiocyanate, isoselenocyanate, alkyl selenocyanate, aryl selenocyanate, substituted aryl selenocyanate and a functional group capable of being joined to a CRC, or other complementary reactive element.
  • RG amine-reactive group
  • the ITC-conjugate may include substituents that influence the reactivity or physicochemical characteristics of the molecule, such as fluoro, nitro, halo, carboxyl, cyano, pyridyl, ether, thioether, amide, carbonate, carbamate, tertiary amino, quaternary amino groups and combinations thereof.
  • the ITC conjugate may possess heterocyclic structures including imidazole, pyrazole, pyrazines, thiophene, furan, pyrrole, pyran, pyrimidine, oxazole, thiazole. It is recognized that in the case of C-terminal sequencing the RG group would be reactive to the carboxyl terminus.
  • ligation oligo refers to a nucleic acid that becomes ligated to a cycle tag of an immobilized conjugate complex when appropriately directed by a cognate binding agent via hybridization to the recode tag of the cognate binding agent.
  • Ligation oligos may, in certain embodiments, hold information related to amino acid and workflow cycle assembly, and are complementary to the recode tag of a cognate binding agent. It is also recognized that the ligation oligo may be another molecular format that is not a nucleic acid, and that recodes amino acid and workflow cycle information that can be joined with a cycle tag via a chemical reaction.
  • ligation oligos may optionally comprise a sequence facilitating ligation, extension-ligation, or chemical ligation of a recode block to another recode block irrespective of the order of assembly. For example, by including a 3’ and/or 5’ universal assembly sequence on a plurality of recode blocks such that at least two recode blocks share the same universal assembly sequence, assembly of such recode blocks into a memory oligo, in any given order, is enabled.
  • linker refers to a molecule used to join two or more molecules.
  • the composition of the molecule may be a polymer, a monomer or combination of both.
  • a linker may further comprise reactive elements that promote covalent and/or non-covalent conjugation between molecules.
  • Exemplary linkers include those used to join a binding agent to a recode tag, or a cycle tag to other elements of a conjugate complex, e.g. a molecule having an NHS-ester at one end and an azide at the other end of a PEG molecule, or a molecule having a biotin at one end and a maleimide moiety at the other end of a nucleic acid.
  • linking oligo refers to a nucleic acid capable of promoting ligation between a recode block associated with a given workflow cycle and a second recode block associated with any other workflow cycle of the recoding process. Linking oligos are useful to complete the assembly of a memory oligo, because they can substitute for errors, e.g., in upstream processes that resulted in an incomplete or unexpected recode block sequence for one or more workflow cycles, no recode block assembly for one or more workflow cycles, or steric effects that prevent interaction between and assembly of recode blocks.
  • n-1 refers to a cycle prior to the last cycle, and so on. It can also refer to a nearest and a next-nearest subunit molecule to the terminal subunit of a macromolecular analyte.
  • PITC-conjugate refers to a chemically-reactive conjugate that has not been reacted with an amino acid or a solid support. It is recognized that the qualifier “PITC” is representative terminology to describe any number of molecules (or sets of molecules) that can function similarly to bind to N-terminal or C-terminal amino acids and cleave the terminal subunit.
  • immobilized conjugate complex refers to a chemically-reactive conjugate that has been reacted with an amino acid and has been immobilized to a solid support. It is recognized that the qualifier “PTC” is representative terminology to describe any number of alternative molecules (or sets of molecules) that can function similarly to bind to N-terminal or C-terminal amino acids and cleave the terminal subunit.
  • post-translational modification refers to any modification of an l-, d-, or non-natural amino acid, either biologically or synthetically.
  • the modifications can occur at the terminal amine, the terminal carboxyl, or any reactive moiety of a peptide. Examples include, but are not limited to, phosphorylation, glycosylation, glycanation, methylation, acetylation, ubiquitination, carboxylation, hydroxylation, biotinylation, pegylation, and succinylation. Further information regarding post-translational modifications may be found in DOI: 10.1021/acs.biochem.7b00861, Biochemistry 2018, 57, 177-185, which is herein incorporated by reference in its entirety.
  • recode block refers to a construct created by interaction between a cycle tag of an immobilized conjugate complex and the recode tag of a cognate binding agent.
  • a recode block is a chimeric nucleic acid molecule that contains the information relating the workflow cycle and the amino acid, or class of amino acid, composition that comprises the conjugate complex. Further, the recode block holds information to direct assembly of a memory oligo, and/or amplify the recode block.
  • a recode block may be formed by utilizing an extension-ligation method to transfer information from the recode tag to the recode block, or via a ligation reaction under appropriate conditions in the presence of ligase and ligation oligo, or otherwise transferring information from the cycle tag to a separate entity.
  • a recode block may be formed by utilizing an extension-ligation method to transfer information from the cycle tag to the recode block, or via a ligation reaction under appropriate conditions in the presence of ligase and ligation oligo, or otherwise transferring information from the cycle tag to a separate tag.
  • a recode block may be formed by otherwise transferring information from the cycle tag and recode tag to a separate entity.
  • recode tag refers to a nucleic acid molecule of any length, but typically in the range 15-60 bases, having a sequence comprised of an ith cycle tag complement, an AA tag complement, and an (i-1)th cycle tag complement. It provides identifying amino acid (or monomer subunit) information for its associated binding agent. It may uniquely identify one amino acid or may identify a class of amino acids with structural and/or functional similarity.
  • a recode tag may provide a probabilistic estimate as to the identity of the amino acid component of an immobilized PTC-AA-cycle tag-conjugate complex, and thereby provide sufficient information for analysis.
  • a recode tag may optionally comprise the ith cycle tag complement, an AA tag complement, and/or a universal assembly sequence or a complement of the universal assembly sequence that aids in the assembly of a memory oligo.
  • a recode tag may optionally comprise a universal assembly sequence at both the 3’ and 5’ ends to facilitate memory oligo assembly without regard to the order of assembly of constituent recode blocks.
  • a recode tag may comprise a sequence facilitating amplification of recode blocks.
  • the solid support may comprise a glass slide or wafer, a silicon slide or wafer, a PC, PTC, polyethylene (PE), high density polyethylene (HDPE), or other plastic slide, a teflon, nylon, nitrocellulose membrane, or borosilicate capillary, a ceramic surface or a gold surface.
  • Particles and beads may be formed from polystyrene, cross-linked polystyrene, agarose, or acrylamide. Beads or nanoparticles may be magnetic or paramagnetic to support separation or purification processes.
  • Solid supports may be passivated with glass, silicon oxide, tantalum pentoxide, DLC diamond-like carbon, or other passivation agents.
  • a “solid support,” including membranes, may be passivated or activated via corona or other plasma treatment methods. Solid supports may further be assembled with other components to facilitate fluid transport and/or detection (e.g., a flowcell, a biochip, or a microtiter plate). Solid supports may comprise an associated hydrogel that supports joining components for macromolecule recoding and/or analysis workflows. In certain examples, the term “solid support” may include any of the described solid supports above further associated with a hydrogel.
  • splint refers to a nucleic acid with complementarity to the 5’ end of one nucleic acid and the 3’ end of another nucleic acid, such that hybridization of the splint to both nucleic acids brings the 5’ and 3’ ends into proximity to promote either chemical or biological ligation.
  • UMI unique molecular identifier
  • universal priming site or “universal primer” refers to a nucleic acid molecule, which may be used for library amplification and/or during NGS.
  • exemplary universal priming sequences can include P5, P7, P5’, P7’, SBS Read 1, and SBS Read 2 primers.
  • workflow cycle or “cycle” refers to the iteration number of any one of the operations of a process flow or method described herein.
  • nucleic acid or nucleic acid sequence refers to a nucleic acid or nucleic acid sequence.
  • a nucleic acid is or includes RNA.
  • a nucleic acid is or includes DNA.
  • Some sequences include uracil (U) or thymine (T), and in some embodiments, where applicable, a U may be replaced with a T or vice versa when considering DNA or RNA.
  • the beads exhibited an increase in fluorescence during the hybridization reactions with the fluorescent complementary oligo to Sys1 SOC. Significant fluorescence was detected in the dehybridization solutions, and the beads subsequently lost most of their fluorescence following the dehybridization treatment. After undergoing Edman degradation, and with the PPO Sys3-SOC immobilization, the hybridization with the fluorescent Sys3 SOC complementary oligo resulted in a fluorescence level akin to that observed during the Sys1 SOC hybridization. Upon dehybridization, the dehybridization solutions again displayed significant fluorescence, and the beads, in turn, lost most of their fluorescence.
  • aminoxy-PEG1-azide may be conjugated to a cycle tag oligonucleotide, which has a formylindole modification.
  • the aminoxy group of an aminoxy-PEG1-azide will react with the aldehyde group on the formylindole nucleobase to form an oxime bond.
  • the azide group can be used to generate further linkages, if desired.
  • NEB New England Biolabs
  • a DNA ladder (cat# 10597012 from Invitrogen) was prepared, following the indicated procedures, and denatured in 0.1M NaOH before loading on the gel.
  • Gel electrophoresis (FIG. 36) showed the successful creation of the desired product and successful ligation in the presence of modified bases internal to the 5’ and 3’ ends of the SOC oligonucleotide.
  • In lane 2 are the products from the ligation of the 45-mer oligo with tether arm with a 30-mer ligation oligo on both the 3' (SEQ ID NO: 85) and 5' (SEQ ID NO: 84) ends.
  • PCR was conducted on ligation output (Fig.
  • This example describes validation of the affinity binding capacity, binding kinetics, and binder affinity to contact the immobilized amino acid complexes with a binding moiety.
  • binder fidelity plays a role in the sequencing accuracy.
  • An in-silico simulation was conducted to assess the impact of binder fidelity on the accuracy of protein identification.
  • a probability matrix was computed for a set of analyte-ligand complexes using empirically determined binding constants of N-terminal amino acid binding proteins (NAABs from Rodriques et al, see FIG. 46A-46B).
  • a cohort of proteins randomly selected from the UniProt database was mutated according to the steady-state probabilities in the matrix to simulate a 'measured cohort' using the NAABs.
  • the 'measured cohort' was mapped using an in-house custom alignment algorithm to evaluate the impact of binder infidelity.
  • a custom alignment algorithm was also developed to assess the alignability of the mutated proteins with the reference proteins. This algorithm utilized the Levenshtein distance between pairs of mutated peptide strings and reference proteins. The distance between letters as a function of the inverse of the probability matrix elements was also accounted for. This approach can ensure that likely mutations are perceived as closer to the reference string than unlikely mutations.
  • the reagents include cartridge fluid, capture kits (consisting of reagents such as low and high refractive index normalization fluids (4% and 32% glycerol), EDC, NHS, 10 mM HCl, 1 M ethanolamine, 10 mM sodium acetate, and 10 mM MES), and Streptavidin Reagent Kit (part # ALTO-R-STV-KIT).
  • the experiment included adjusting ligand concentration, salt concentrations, and analyte concentrations to provide optimal density for analyte binding on the 48 analyte wells of the 16-Channel Carboxyl disposable cartridge.
  • This example describes the multi-step construction of chemically-reactive conjugates and the binding agents, and provides the desired outcomes of the chronological performance of some embodiments of the recoding processes.
  • Biologically or synthetically derived samples may be manipulated prior to the recoding process. These manipulations may include lysis, purification, enrichment, protein fragmentation, etc.
  • Serine proteases or serine endopeptidases include a broad class of enzymes that cleave peptide bonds in proteins.
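The alignment approach described in the in-silico simulation items above (a Levenshtein distance in which the distance between letters depends on the inverse of the probability matrix elements) can be illustrated with a minimal Python sketch. This is not the in-house algorithm, which is not disclosed in this text. The function names, the `floor / p` substitution cost, the cap at 1.0, the unit insertion/deletion cost, and the toy confusion values are all assumptions chosen for illustration.

```python
def substitution_cost(true_aa, observed_aa, confusion, floor=0.01):
    """Cost of reading `observed_aa` where the reference has `true_aa`.

    `confusion` maps (true, observed) pairs to a misread probability.
    ASSUMPTION: cost is proportional to 1/p and capped at 1.0, so a likely
    misread is cheap and an unlikely (or unlisted) one costs a full edit.
    """
    if true_aa == observed_aa:
        return 0.0
    p = confusion.get((true_aa, observed_aa), 0.0)
    if p <= 0.0:
        return 1.0
    return min(1.0, floor / p)


def weighted_levenshtein(reference, observed, confusion, indel_cost=1.0):
    """Levenshtein distance with probability-weighted substitutions."""
    # prev[j] holds the distance between reference[:i-1] and observed[:j].
    prev = [j * indel_cost for j in range(len(observed) + 1)]
    for i, ref_aa in enumerate(reference, start=1):
        curr = [i * indel_cost]
        for j, obs_aa in enumerate(observed, start=1):
            curr.append(min(
                prev[j] + indel_cost,          # residue missing from the read
                curr[j - 1] + indel_cost,      # extra residue in the read
                prev[j - 1] + substitution_cost(ref_aa, obs_aa, confusion),
            ))
        prev = curr
    return prev[-1]


# Hypothetical binder confusion values (not taken from the disclosure).
toy_confusion = {("L", "I"): 0.4, ("L", "W"): 0.001}
likely = weighted_levenshtein("PEPLIDE", "PEPIIDE", toy_confusion)
unlikely = weighted_levenshtein("PEPLIDE", "PEPWIDE", toy_confusion)
print(likely < unlikely)
```

With an empty confusion table every substitution costs 1.0, so the function reduces to the ordinary Levenshtein distance. A read that differs from the reference only by a frequent binder misread therefore scores closer to that reference than a read with an improbable substitution.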

Abstract

The present disclosure relates to compositions of matter, methods, and systems for analyzing polymeric macromolecules, including polymeric macromolecules such as peptides, polypeptides, and proteins.

Description

DETERMINATION OF PROTEIN INFORMATION BY RECODING AMINO ACID
POLYMERS INTO DNA POLYMERS WITH METADATA TAGGING
CROSS-REFERENCE
[001] This application claims the benefit of U.S. Provisional Application No. 63/620,076, filed January 11, 2024, and U.S. Provisional Application No. 63/551,359, filed February 8, 2024, which applications are incorporated herein by reference.
INCORPORATION BY REFERENCE OF SEQUENCE LISTING
[002] The present application is being filed along with a Sequence Listing in electronic format. The Sequence Listing is provided as a file entitled 062954-502001 WO.xml, created January 10, 2025, which is 267,407 bytes in size. The information in the electronic format of the Sequence Listing is incorporated by reference in its entirety.
FIELD
[003] The present disclosure relates to compositions of matter, methods, and systems for analyzing polymeric macromolecules, including polymeric macromolecules such as peptides, polypeptides, and proteins.
BACKGROUND
[004] Proteins are fundamental to cellular function. Accordingly, the sequences of the thousands of proteins within each cell, as well as their concentrations, are critical indicators of cell health. Aberrant sequences or concentrations of proteins may signal a disease state. However, tools and technologies are currently lacking for sensitive, accurate, economical, and unbiased characterization of proteomes. Early detection of unusual sequences and/or concentrations is critical to the diagnosis and treatment of many diseases, such as, e.g., cancer. For these and other reasons, better tools to evaluate protein and peptide sequence and concentration in biological samples should be developed.
[005] Once such tools are available, discovery of novel biomarkers, accurate determination of concentrations for even the lowest-abundance proteins, discovery of important post-translational modifications, and monitoring of the dynamics of the proteome are some of the first steps toward improving healthcare. These initial steps toward deeper understanding and earlier detection of important signatures of cancer and other health conditions will allow diagnosis at the earliest stages, facilitate therapeutic discovery, and create beneficial impact on patient care by informing the course of treatment.
[006] There is thus a need in the art for compositions of matter, methods, and systems for highly-parallelized, accurate, sensitive, and high-throughput proteomic analysis. The present disclosure addresses this and other needs.
SUMMARY
[007] The present disclosure relates to or includes compositions of matter, methods, and systems for analyzing polymeric macromolecules, including peptides, polypeptides, and proteins, in a highly parallel and high-throughput manner via recoding their sequences into DNA polymers.
[008] Disclosed herein, in some embodiments, are assay methods such as assay methods including dynamic range compression. Disclosed herein, in some embodiments, are assay methods, comprising: contacting peptides with a binding agent that targets or binds to a high-abundance peptide of the peptides, wherein the binding agent comprises a nucleic acid having a recognition site for a restriction enzyme or a complement of a recognition site for a restriction enzyme; and performing reverse translation protein sequencing of the peptides to generate nucleic acid sequences representing the peptides, wherein the reverse translation protein sequencing comprises: incorporating the nucleic acid sequence with said recognition site for a restriction enzyme or complement thereof into the nucleic acid sequence representing a high-abundance peptide, introducing a restriction enzyme to the nucleic acid sequences, thereby cleaving nucleic acid sequences representing high-abundance peptides and thereby depleting representation of high-abundance peptides from among the nucleic acid sequences, and sequencing the nucleic acid sequences. Performing reverse translation protein sequencing may comprise performing a method herein such as a method for determining identity and positional information of an amino acid residue of a peptide coupled to a solid support.
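The effect of the restriction-based dynamic range compression described above can be sketched in silico. The following Python example is a simplified model under stated assumptions, not the disclosed wet-lab procedure: it treats any nucleic acid sequence carrying the recognition site (on either strand) as fully cleaved and therefore absent from the sequenced pool, whereas a real digest may be incomplete. The function names and example sequences are hypothetical. GAATTC is used as the site because it is the recognition sequence of EcoRI; the disclosure does not specify which enzyme is used.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def deplete_cleaved(library, site):
    """Keep only sequences lacking the recognition site on both strands.

    ASSUMPTION: cleavage is complete, and a cleaved molecule no longer
    yields a usable sequencing read.
    """
    rc_site = reverse_complement(site)
    return [seq for seq in library if site not in seq and rc_site not in seq]


# Hypothetical library: 1000 copies representing a high-abundance peptide
# (tagged with the site) and 3 copies representing a low-abundance peptide.
high_abundance = "ACGTACGGAATTCTTGACC"
low_abundance = "ACGTACGTTGCAGCATTGACC"
library = [high_abundance] * 1000 + [low_abundance] * 3

before = Counter(library)
after = Counter(deplete_cleaved(library, "GAATTC"))
print(before[high_abundance], after[high_abundance])
print(before[low_abundance], after[low_abundance])
```

In this toy model the sequencing capacity previously consumed by the high-abundance species becomes available to the low-abundance species, which is the intended compression of dynamic range.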
[009] Disclosed herein, in some embodiments, are methods for analyzing a peptide having a post-translational modification. Disclosed herein, in some embodiments, are methods for analyzing a peptide having a post-translational modification, comprising: contacting a peptide having a post-translational modification with a binding agent that targets or binds to the post-translational modification, wherein the binding agent comprises a nucleic acid having a sequence representing the post-translational modification; transferring the sequence information or its complement to the peptide or a position proximal to the immobilized peptide; removing the post-translational modification from the peptide; performing reverse translation protein sequencing of the peptide to generate a nucleic acid sequence representing the peptide, wherein the information representing said post-translational modification is incorporated into the nucleic acid sequence, and the nucleic acid sequence is sequenced. Performing reverse translation protein sequencing may comprise performing a method herein such as a method for determining identity and positional information of an amino acid residue of a peptide coupled to a solid support.
[0010] Disclosed herein, in some embodiments, are methods for determining identity and positional information of an amino acid residue of a peptide coupled to a solid support, the method comprising: (a) coupling the peptide to the solid support such that an N-terminal amino acid residue of the peptide is not directly coupled to the solid support and is exposed to reaction conditions; (b) performing (b1) or (b2): (b1) providing a chemically-reactive conjugate comprising (x) a cycle tag comprising a cycle nucleic acid associated with a cycle number; (y) a reactive moiety for binding and cleaving the N-terminal amino acid residue of the peptide; and (z) an immobilizing moiety for immobilization to the solid support; or (b2) coupling the N-terminal amino acid with a first reactive moiety (e.g. an ITC conjugate), and providing a chemically-reactive conjugate comprising (x) a cycle tag comprising a cycle nucleic acid associated with a cycle number; (y) a second reactive moiety that binds or reacts with the first reactive moiety (e.g. that binds or reacts with the ITC conjugate); and (z) an immobilizing moiety for immobilization to the solid support; (c) contacting the peptide with the chemically-reactive conjugate of (b1) thereby coupling the chemically-reactive conjugate to the N-terminal amino acid of the peptide to form a conjugate complex, or contacting the peptide with the chemically-reactive conjugate of (b2) thereby coupling the chemically-reactive conjugate to the first reactive moiety coupled to the N-terminal amino acid of the peptide to form the conjugate complex; (d) immobilizing the conjugate complex to the solid support via the immobilizing moiety; (e) cleaving and thereby separating the N-terminal amino acid residue from the peptide, thereby exposing the next amino acid residue as the N-terminal amino acid residue on the cleaved peptide, and thereby providing an immobilized amino acid complex, the immobilized amino acid complex comprising the cleaved and separated N-terminal amino acid residue; (f) contacting the immobilized amino acid complex with a binding agent, the binding agent comprising: a binding moiety for preferentially binding to the immobilized amino acid complex; and a recode tag comprising a recode nucleic acid corresponding with the binding agent, thereby forming an affinity complex, the affinity complex comprising an immobilized amino acid complex and a binding agent and thereby bringing a cycle tag into proximity with a recode tag within each formed affinity complex; (g) transferring the information of the nucleic acid recode tag associated with the first binding agent and the cycle tag of the first immobilized conjugate complex to generate a first recode block; (h) obtaining sequence information for the recode block; and (i) based on the obtained sequence information, determining identity and positional information of an amino acid residue of the peptide.
[0011] Disclosed herein are methods for determining identity and positional information of an amino acid residue of a peptide coupled to a solid support, the method comprising: (a) coupling the peptide to the solid support such that an N-terminal amino acid residue of the peptide is not directly coupled to the solid support and is exposed to reaction conditions; (b) coupling the N-terminal amino acid with a first reactive moiety (e.g. an ITC conjugate); (c) providing a chemically-reactive conjugate comprising (x) a cycle tag comprising a cycle nucleic acid associated with a cycle number; (y) a second reactive moiety that binds or reacts with the first reactive moiety (e.g. that binds or reacts with the ITC conjugate); and (z) an immobilizing moiety for immobilization to the solid support; (d) contacting the peptide with the chemically-reactive conjugate of (c) thereby coupling the chemically-reactive conjugate to the first reactive moiety coupled to the N-terminal amino acid of the peptide to form the conjugate complex; (e) immobilizing the conjugate complex to the solid support via the immobilizing moiety; (f) cleaving and thereby separating the N-terminal amino acid residue from the peptide, thereby exposing the next amino acid residue as the N-terminal amino acid residue on the cleaved peptide, and thereby providing an immobilized amino acid complex, the immobilized amino acid complex comprising the cleaved and separated N-terminal amino acid residue; (g) contacting the immobilized amino acid complex with a binding agent, the binding agent comprising: a binding moiety for preferentially binding to the immobilized amino acid complex; and a recode tag comprising a recode nucleic acid corresponding with the binding agent, thereby forming an affinity complex, the affinity complex comprising an immobilized amino acid complex and a binding agent and thereby bringing a cycle tag into proximity with a recode tag within each formed affinity complex; (h) transferring the information of the nucleic acid recode tag associated with the first binding agent and the cycle tag of the first immobilized conjugate complex to generate a first recode block; (i) obtaining sequence information for the recode block; and (j) based on the obtained sequence information, determining identity and positional information of an amino acid residue of the peptide.
[0012] In some embodiments, the post-translational modification comprises phosphorylation, glycosylation, glycanation, methylation, acetylation, ubiquitination, carboxylation, hydroxylation, biotinylation, pegylation, or succinylation. In some embodiments, each binding agent comprises a recode tag comprising the recognition site for a restriction enzyme or a complement thereof, or comprising the sequence representing the post-translational modification or a complement thereof. Some embodiments include generating a recode block comprising sequence information from a cycle tag, and sequence information from the recode tag. Some embodiments include generating a memory oligonucleotide from the recode block and other recode blocks. In some embodiments, the binding moiety comprises an antibody or a fragment thereof, or an aptamer. In some embodiments, the binding moiety specifically binds an N-terminal modification. In some embodiments, the N-terminal modification comprises a post-translational modification. Some embodiments include: before contacting the immobilized amino acid complex with a binding agent: contacting the immobilized amino acid complex with a modification binding agent, wherein the N-terminal amino acid comprises a post-translational modification, and wherein the modification binding agent comprises a modification binding moiety for preferentially binding to the post-translational modification; and a modification recode tag comprising a recode nucleic acid corresponding with the modification binding agent, thereby forming a complex comprising an immobilized post-translationally modified amino acid complex and a modification binding agent and thereby bringing a cycle tag into proximity with a modification recode tag. Some embodiments include removing the N-terminal modification from the N-terminal amino acid. 
Some embodiments include: based on the obtained sequence information, determining identity and positional information of the N-terminal modification or post-translational modification. In some embodiments, the peptide is immobilized by being bound to a solid support. In some embodiments, the solid support comprises a bead, plate, chip, slide, glass, silica, resin, gel, hydrogel, membrane, polystyrene, metal, nitrocellulose, mineral, plastic, polyacrylamide, latex, or ceramic. In some embodiments, the peptide comprises a hormone, neurotransmitter, enzyme, antibody, viral protein, bacterial protein, synthetic peptide, bioactive peptide, peptide hormone, oligopeptide, polypeptide, fusion protein, cyclic peptide, branched peptide, recombinant protein, tumor marker, therapeutic peptide, antigenic peptide, or signaling peptide. In some embodiments, the peptide is associated with a disease. In some embodiments, the peptide is obtained from a sample comprising a cell lysate, blood sample, plasma sample, serum sample, tissue biopsy, saliva sample, urine sample, cerebrospinal fluid sample, sweat sample, synovial fluid sample, fecal sample, gut microbiome sample, environmental water sample, soil sample, bacterial culture, viral culture, organoid, tumor biopsy, sputum sample, or hair sample.
BRIEF DESCRIPTION OF THE DRAWINGS
[013] So that the manner in which the above recited features of the present disclosure can be understood in detail, a more particular description of the disclosure, briefly summarized above, may be had by reference to embodiments, some of which are illustrated in the appended drawings. It is to be noted, however, that the appended drawings illustrate only exemplary embodiments and are therefore not to be considered limiting of its scope, and may admit to other equally effective embodiments.
[014] Accordingly, the foregoing and other features and advantages of the present disclosure will be more fully understood from the following detailed description of illustrative embodiments taken in conjunction with the accompanying drawings in which:
[015] FIG. 1 illustrates an exemplary segmentation of the field of proteomics by technology.
[016] FIG. 2 illustrates a simplified block diagram of an exemplary workflow for analyzing polymeric macromolecules, such as peptides and proteins, according to embodiments of the present disclosure.
[017] FIG. 3 schematically illustrates a process comprising various operations of the workflow of FIG. 2, according to embodiments of the present disclosure.
[018] FIG. 4 schematically illustrates an exemplary solid support for spatially supporting macromolecule analytes during the process of FIG. 3, according to embodiments of the present disclosure.
[019] FIG. 5 schematically illustrates the interaction of chemically-reactive conjugates with terminal amino acids of immobilized peptides during the operations of FIG. 3, according to embodiments of the present disclosure.
[020] FIG. 6 schematically illustrates the immobilization of chemically-reactive conjugates onto a solid support during the operations of FIG. 3, according to embodiments of the present disclosure.
[021] FIG. 7 schematically illustrates the cleavage of terminal amino acids (e.g., the cleavage of peptide bonds) after conjugate immobilization during the operations of FIG. 3, according to embodiments of the present disclosure.
[022] FIG. 8 schematically illustrates the result of iteratively repeating operations of FIG. 5-7, according to embodiments of the present disclosure.
[023] FIG. 9 schematically illustrates the assembly of an exemplary configuration of a recode block, according to embodiments of the present disclosure.
[024] FIG. 10 schematically illustrates the assembly of an exemplary configuration of a recode block, according to embodiments of the present disclosure.
[025] FIG. 11 schematically illustrates the transfer of amino acid identity information from a binding agent’s recode tag to an immobilized conjugate’s cycle tag to form a recode block via ligation, according to embodiments of the present disclosure.
[026] FIG. 12 schematically illustrates an iterative process for assembling recode blocks, according to embodiments of the present disclosure.
[027] FIG. 13 schematically illustrates the relative sizes of various constituents in the process of FIG. 3, according to embodiments of the present disclosure.
[028] FIG. 14 schematically illustrates the separation of incompatible chemical operations during the process of FIG. 3, according to embodiments of the present disclosure.
[029] FIG. 15A-15B schematically illustrate assembly of a recode block according to some methods described herein. FIG. 15A shows one embodiment where the cycle tag oligonucleotide of an immobilized amino acid complex is utilized to splint the ligation of a cognate recode tag oligonucleotide and a solution oligonucleotide. In some embodiments, no ligation occurs without a cognate binding agent interaction with an isolated amino acid complex. FIG. 15B shows one embodiment wherein the cycle tag oligonucleotide of an immobilized amino acid complex is utilized to capture a bridging oligonucleotide that serves as a splint for ligation of a cognate recode tag oligonucleotide and a solution oligonucleotide. In some embodiments, no ligation occurs without a cognate binding agent interaction with an isolated amino acid complex.
[030] FIG. 16 schematically illustrates immobilization of multiple oligonucleotides having cycle and amino acid identity information to a surface in proximity to the anchor point of an immobilized peptide.
[031] FIG. 17 schematically illustrates immobilization of multiple oligonucleotides having cycle and amino acid identity information to a surface in proximity to the immobilization point of an isolated amino acid complex.
[032] FIG. 18 schematically illustrates immobilization of multiple isolated amino acid complexes having cycle and amino acid identity information to a surface in proximity to the immobilization point of an immobilized peptide.
[033] FIG. 19 schematically illustrates the assembly of a single memory oligo for subsequent DNA sequencing analysis, according to embodiments of the present disclosure.
[034] FIG. 20 schematically illustrates the remediation of incomplete recode blocks during memory oligo assembly, according to embodiments of the present disclosure.
[035] FIG. 21 schematically illustrates various oligonucleotide constituents within a sample volume during recode block assembly, according to embodiments of the present disclosure.
[036] FIG. 22 schematically illustrates various oligonucleotide constituents within a sample volume during memory oligo assembly, according to embodiments of the present disclosure.
[037] FIG. 23 schematically illustrates the release of memory oligos and conjugate complexes from a solid support, according to embodiments of the present disclosure.
[038] FIGS. 24A-24F schematically illustrate methods to initiate memory oligo assembly. FIG. 24A illustrates polymerase extension of an immobilized hyb tag oligo and RNAse treatment to provide a single-stranded Unifying Assembly Sequence (U-AS) 3’ end. FIG. 24B illustrates how a PCR primer or other functionality may be imparted to the initiation sequence. FIG. 24C shows that the initiation sequence may be inert with respect to interaction with binding agents. FIG. 24D illustrates an embodiment wherein the hyb tag anchors the initiation oligo via homo or heteroduplex interaction. FIG. 24E shows a configuration whereby location information may be assembled with amino acid identity and cycle information. FIG. 24F schematically illustrates generalized methodologies and configurations related to initiating assembly of localized nucleic acids.
[039] FIG. 25 depicts a general construct that may participate in stepwise random assembly or parallel random assembly of localized nucleic acids. U-AS may be or include a sequence specific to one cycle of amino acid isolation, or U-AS may be or include the same sequence such that information of a recode block may be joined with information of any other recode block. U-AS’ splint oligo is an example of an element that may be used to join information between recode blocks to create a memory oligo. C/AA is a sequence that represents amino acid and positional information.
[040] FIG. 26 illustrates a process of stepwise random assembly of localized nucleic acids according to methods described herein.
[041] FIG. 27 illustrates a process of stepwise random assembly of localized nucleic acids according to methods described herein.
[042] FIG. 28 illustrates a process of stepwise random assembly of localized nucleic acids according to methods described herein.
[043] FIG. 29A-29B show PPO functionality. Relative fluorescence units trace binding, cleaving, and immobilization (e.g. steps 1 - 4 of FIG. 3) by PPO of an N-terminal amino acid residue of an immobilized peptide.
[044] FIG. 30 schematically illustrates the adjustment of access between oligonucleotide constituents during memory oligo assembly according to the methods described herein, according to embodiments of the present disclosure.
[045] FIG. 31 illustrates the utilization of universal sequences during memory oligo assembly, according to embodiments of the present disclosure.
[046] FIG. 32 is a schematic showing transfer of information from a location oligo to a recode block, according to embodiments of the present disclosure.
[047] FIG. 33 schematically illustrates an alternative event during performance of the methods described herein, according to embodiments of the present disclosure.
[048] FIG. 34 is a schematic describing useful process steps, system geometry, and components.
[049] FIG. 35 shows an example model CRC with a model vanillin molecule in place of an oligonucleotide for proof-of-concept analysis showing creation of the three described groups.
[050] FIG. 36 shows gel data of ligation of a model cycle tag and a recode tag to generate a recode block.
[051] FIG. 37 shows an example of a cyclic protection and deprotection workflow.
[052] FIG. 38 schematically illustrates a 2-step assembly of a CRC with the N-terminus of a peptide.
[053] FIG. 39 schematically illustrates a 2-step assembly of a CRC with the N-terminus of a peptide.
[054] FIG. 40 schematically illustrates a 2-step assembly of a CRC with the N-terminus of a peptide.
[055] FIG. 41 illustrates 1) an ITC-conjugate (ITCC) having an amine-reactive moiety (RG) for binding the NTAA and a functional group (X) for coupling to a chemically-reactive conjugate (CRC), and 2) a chemically-reactive conjugate comprising a cycle tag (Ct), an immobilizing moiety (SI), and a functional group (X’) that can couple with the ITC-conjugate.
[056] FIG. 42A-42B show CRC synthesis processes and intermediate molecules.
[057] FIG. 43 shows functionality of PPO: Relative fluorescence units (RFU) of PPO immobilized to an azide-modified surface via Cu-catalyzed Huisgen cycloaddition followed by reaction with amine-labelled fluorescein.
[058] FIG. 44 shows functionality of PPO: Relative fluorescence units (RFU) of a fluorescent oligo complementary to the oligo on PPO immobilized to an azide-modified surface via Cu-catalyzed Huisgen cycloaddition.
[059] FIG. 45 shows functionality of PPO: Relative fluorescence units (RFU) of PPO immobilized to a different azide-modified surface via Cu-catalyzed Huisgen cycloaddition followed by reaction with amine-labelled fluorescein.
[060] FIG. 46A-46D show example simulations of binding of a commercially-available binder to an immobilized PTH-ligand.
[061] FIG. 47 shows PCR data of ligation of a model cycle tag and a recode tag to generate a recode block.
[062] FIG. 48 schematically illustrates joining oligonucleotides associated with a CRC and a non-covalent biomolecular recognition molecule through their interaction with elements of an immobilized biomolecule.
[063] FIG. 49 shows q-PCR data of the product created from the configuration depicted in FIG. 38.
[064] FIG. 50 shows HPLC chromatograms for oligonucleotides exposed to anhydrous TFA simulating Edman peptide cleavage conditions.
[065] FIG. 51 illustrates the steps of an exemplary workflow to deplete DNA information associated with targeted protein(s) from a population of DNA representing the proteins of a sample. The process may be applied to polymeric macromolecules of a biological sample, including polymeric macromolecules such as peptides and proteins of a blood plasma sample, according to embodiments of the present disclosure.
[066] FIG. 52 schematically illustrates exemplary operations of the process in FIG. 51, according to embodiments of the present disclosure. A metadata conjugate detects an abundant protein to be depleted, whether intact protein, denatured protein, or modified protein (e.g. human albumin has a unique 4-amino-acid sequence at the N-terminus). The depletion target is incorporated into a memory oligo, which is subsequently depleted from the DNA sample prior to NGS sequencing by the action of a restriction endonuclease.
[067] FIG. 53 schematically illustrates various post-translational modifications of proteins, and more particularly, N-terminal modifications thereof, as examples of molecular attributes that can be identified, in addition to sequence, by the novel compositions, systems, and methods for polymeric macromolecule analysis described herein.
[068] FIG. 54 illustrates steps of an exemplary workflow to associate DNA representing an N-terminal modification of a protein with DNA representing sequence of the cognate protein.
[069] FIG. 55 schematically illustrates exemplary operations of the process in FIG. 54, according to embodiments of the present disclosure. A metadata conjugate may detect an N-terminal modification such as N-terminal ubiquitination, and a metadata tag may be immobilized for subsequent incorporation into a memory oligo that also contains amino acid sequence information of the associated protein peptide or macromolecule. Afterwards, the N-terminal modification may be removed to allow reverse translation of the peptide sequence and analysis.
[070] FIG. 56 illustrates a concept of Peptide Identification Numbers. Random degenerate bases placed at a priori defined positions within a recode block following assembly into a memory oligo may be analyzed in silico to provide a uniquely identifying sequence. In the figure, non-shaded areas represent nucleotide cycle codes, and shaded areas represent amino acid codes, m represents a metatag. 1 represents any recode block, i is any other recode block, j is any appropriate integer, k is any appropriate integer not j, and N# represents a degenerate random base.
[071] FIG. 57 schematically illustrates an exemplary mechanism for modifying a carrier protein or peptide to render it unresponsive to sequencing operations, according to embodiments of the present disclosure.
[072] FIG. 58 schematically illustrates another exemplary mechanism for modifying a carrier protein or peptide to render it unresponsive to sequencing operations, according to embodiments of the present disclosure.
[073] FIG. 59 depicts a flow diagram for an exemplary set of reactions leading to a modified analyte containing some or all of the original amino acid identity and positional information, and that is amenable to protein sequencing, according to embodiments of the present disclosure.
[074] It should be understood that the drawings are not necessarily to scale, and that like reference numbers refer to like features. It is contemplated that elements and features of one embodiment may be beneficially incorporated in other embodiments without further recitation.
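By way of non-limiting illustration, the in silico analysis of Peptide Identification Numbers described for FIG. 56 might be sketched as follows. The sketch assumes that the degenerate bases sit at known, fixed offsets within a sequenced memory oligo; the offsets, the read sequences, and the function names are hypothetical and are not taken from the disclosure.

```python
from collections import Counter
from typing import Dict, Iterable, Sequence

# Hypothetical a priori positions of degenerate (N) bases within a memory oligo read.
DEGENERATE_POSITIONS: Sequence[int] = (0, 3, 7)


def peptide_identification_number(read: str,
                                  positions: Sequence[int] = DEGENERATE_POSITIONS) -> str:
    """Concatenate the bases found at the predefined degenerate positions."""
    return "".join(read[p] for p in positions)


def count_reads_per_pin(reads: Iterable[str]) -> Dict[str, int]:
    """Group reads by their identification number and count each group."""
    return dict(Counter(peptide_identification_number(r) for r in reads))
```

Reads sharing an identification number would, under these assumptions, be treated as copies of the same original molecule, so the number of distinct identification numbers approximates a digital count of peptides rather than of amplified reads.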
DETAILED DESCRIPTION
[075] Disclosed herein, in some embodiments, are assay methods. Some assay methods herein are useful for reverse translation protein sequencing of peptides. The assay methods may include tagging or identifying nucleic acid sequences representing high-abundance peptides, and cleaving the nucleic acid sequences representing high-abundance peptides. Some embodiments relate to methods for analyzing post-translationally modified peptides. Such analysis may include reverse translation protein sequencing of the post-translationally modified peptides, and tagging or identifying sequences representing the post-translationally modified peptides or post-translationally modified amino acids of the post-translationally modified peptides.
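As a minimal sketch of the tagging and cleaving of sequences representing high-abundance peptides, the following example models depletion in silico: any memory oligo that carries a restriction recognition site is treated as cut and therefore absent from the sequenced population. The EcoRI site GAATTC is used only as a familiar example, and the oligo sequences and function name are hypothetical; the disclosure does not specify this enzyme here.

```python
from typing import Iterable, List

# Example recognition site (EcoRI); an assumption chosen for illustration only.
DEPLETION_SITE = "GAATTC"


def deplete_tagged_oligos(memory_oligos: Iterable[str],
                          site: str = DEPLETION_SITE) -> List[str]:
    """Return only the oligos lacking the recognition site.

    Oligos containing the site are modeled as cleaved by the endonuclease
    and therefore lost before sequencing.
    """
    site = site.upper()
    return [oligo for oligo in memory_oligos if site not in oligo.upper()]
```

In this simplified model, an oligo tagged as representing an abundant protein is removed, while untagged oligos pass through unchanged.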
[076] Some methods and compositions described herein may be useful for determining identity and positional information of an amino acid residue of a peptide. The peptide may be coupled to a solid support and contacted with a chemically-reactive conjugate, which cleaves an N-terminal amino acid of the peptide and couples the N-terminal amino acid to the solid support with a cycle tag. The resulting immobilized amino acid complex may then be contacted with a binding agent, such as one specific for the N-terminal amino acid. The binding agent may include a recode tag. The cycle tag and recode tag may include nucleic acid information which may be sequenced to obtain the identity and positional information of the N-terminal amino acid. The process may be repeated for various amino acids of the peptide. Thus, identity and positional information of amino acid residues of proteins may be recoded using nucleic acids and obtained upon sequencing the nucleic acids.
[077] Methods herein may include aspects of WO2024015875 or WO2024040236, which are incorporated herein in their entirety. For example, some methods herein may include aspects of a chemically reactive conjugate, or of a protein recoding method or a reverse translation method of WO2024015875 or WO2024040236. Performing reverse translation protein sequencing may comprise performing some aspects of a method for determining identity and positional information of an amino acid residue of a peptide such as a peptide coupled to a solid support.
INTRODUCTION
[078] Sequences and concentrations of cellular and secreted proteins are useful indicators of cell health. Aberrant sequences or concentrations may signal a disease state. However, tools and technologies are currently lacking for sensitive, accurate, economical, and unbiased characterization of proteomes. Early detection of unusual sequences and/or concentrations is critical to the diagnosis and treatment of many diseases, such as, e.g., cancer. For these and other reasons, better tools to evaluate protein and peptide sequence and concentration in biological samples must be developed.
[079] Next-generation sequencing (NGS) of DNA and RNA polymers has transformed diagnostic, clinical, and research approaches by enabling clinicians and researchers to analyze billions of DNA sequences at high throughput and low cost. The ability to detect and quantify proteins and peptides, however, has lagged behind that of nucleic acids in large part because there is no equivalent to polymerase chain reaction (PCR) for amino acid polymers. New tools to sensitively quantify proteins and assess their sequences can, similar to NGS, aid in understanding cellular processes, continue to transform research, diagnostics, clinical approaches, and help facilitate precision medicine.
[080] Current state-of-art proteomics toolkits include the following general approaches: 1) Edman degradation followed by chromatography; 2) fragmentation followed by advanced separation and mass spectroscopy (MS) techniques; and 3) recognition of proteins via affinity molecules. These methods provide much useful information. However, none of these approaches creates information at the scale, throughput, reproducibility, access, or cost needed to unlock transformative applications in research, diagnostics, or therapeutics.
[081] Peptide sequencing based on Edman degradation was first proposed and automated by Pehr Edman in the 1950s. The process is analogous to Sanger sequencing. Briefly, stepwise degradation of the N-terminal amino acid on a peptide through a series of chemical reactions and downstream HPLC analyses is used to collect peptide sequence information. First, the N-terminal amino acid is reacted with phenyl isothiocyanate (PITC) under basic conditions (typically NMP/methanol/water) to form a phenylthiocarbamoyl (PTC) derivative. In a second step, the PTC-modified amino group is treated with acid (typically anhydrous TFA) to yield an ATZ-modified (2-anilino-5(4)-thiazolinone) amino acid, separating the amino acid from the polymer and creating a next N-terminus on the polypeptide. The cyclic ATZ-amino acid is converted to a PTH-amino acid derivative and analyzed via chromatography. These steps are then repeated sequentially to determine a peptide sequence. The method is effective, but upfront protein sample requirements are high, and the process lacks the throughput and cost-effectiveness needed to support large scale discovery.
[082] More recently, multiplexed methods and devices for Edman degradation-based peptide sequencing of micro quantities of proteins have been developed. For example, see Chhabra, U.S. Patent No. 7,611,834 B2. However, such methods and devices are still unsuitable for highly-parallelized, high-throughput proteomic analysis.
[083] In the last 20 years, peptide analysis by fragmentation and analysis via mass spectroscopy (here, LC/MS) has been increasingly used to quantify protein abundance and determine sequence. Additionally, in certain applications, recognition-based proteomics has been employed. In this approach, affinity molecules, such as antibodies or antibody fragments, aptamers, RNA, or modified proteins, are commonly engineered to recognize the tertiary structure of analytes. Often, these are linked to molecular beacons that fluoresce or provide other means of detecting the binding event, such as in an ELISA. However, like previous approaches, fragmentation and recognition-based methods lack the throughput and efficiency to support large scale discovery.
[084] The present disclosure provides methods for analyzing polymeric macromolecules, such as peptides, polypeptides, and proteins. Accordingly, aspects of the present disclosure relate to the field of proteomics.
[085] FIG. 1 illustrates a segmentation of the field of proteomics by technology. As described above, the current landscape for proteomic analysis includes the following general approaches: 1) Edman degradation followed by chromatography; 2) fragmentation followed by advanced separation and mass spectroscopy techniques; and 3) recognition of proteins via affinity molecules. While these (and other) approaches can provide useful information for researchers, they do not provide such information at the scale, throughput, or cost needed to unlock transformative applications in research, diagnostics, or therapeutics. Some more particular challenges associated with current approaches (e.g., Edman’s, LC/MS, and affinity approaches) include:
a. Protein folding is dynamic, and proteins can lose their characteristic shape. When they do, recognition-based methods become inaccurate. This can happen in the case of labile proteins, or uncontrolled sample treatment prior to analysis.
b. Recognition-based methods do not inform as to whether the protein sequence is a catalytically-ineffective variant, as often becomes the case in cancer biology.
c. Biomarkers of interest are likely to be present at fM or lower concentrations, beneath the detection limit of most available tools used to quantify the absolute abundance of multiple proteins.
d. The universe of protein molecules is extensive. It is much more complex than the RNA transcriptome, due to additional diversity introduced by post-translational modifications (PTMs).
e. Proteins within a cell dynamically change (in expression level and modification state) in response to the environment, physiological state, and disease state. Thus, proteins contain a vast amount of relevant information that is largely unexplored.
f. Generating an effective collection of affinity agents having low cross-reactivity with off-target macromolecules can be time-consuming.
g. Multiplexing the readout of a collection of affinity agents while minimizing cross-reactivity between the affinity agents and off-target macromolecules is challenging.
h. Existing methods and the automation around current approaches are slow and expensive and, in the case of Edman’s methods, have a limited throughput of only a few peptides per day.
i. LC/MS suffers from drawbacks including high instrument cost, the requirement for a sophisticated user, poor quantification ability, and poor dynamic range. Since proteins ionize with different efficiencies, absolute quantitation and even relative quantitation between samples is challenging.
j. LC/MS preferentially analyzes the more abundant species, so complex upfront sample preparation (e.g., nanoparticle corona) is needed, making characterization of low-abundance proteins challenging.
k. LC/MS sample throughput is typically limited to a few thousand peptides per run.
More recent attempts to develop methodologies suffer from poor discrimination of N-terminal or C-terminal amino acids and are confounded by “neighbor effects.” Still other single-molecule methodologies under development will require costly instrumentation because amplification of the analyte is not possible, and one must detect small numbers of photons, electrons, or detection elements. Some expose recording tags repeatedly to harsh chemical agents, and are inefficient due to the cyclic nature of serial processing.
[086] Precise measurement of proteomics metadata, such as N-terminal modification, glycosylation, and protein conformation, in concert with sequence information is imperative, because anomalies in these meta parameters often correlate with disease states. [Reynaud, E. (2010) Nature Education 3(9):28]. However, capturing these biologically-relevant attributes along with sequence is not yet possible using current proteomics analysis technologies. Further, complexities and challenges associated with proteomic sample preparation limit holistic evaluation of the proteome and have highlighted a need for improved methods thereof.
[087] Comprehensive measurement of N-terminal modifications presents both a challenge and an opportunity for proteomic analysis. These modifications, which are common in cellular proteins, influence protein function, stability, and interactions. Typical N-terminal modifications include acetylation, propionylation, various forms of methylation (monomethylation, dimethylation, trimethylation), myristoylation, palmitoylation, and ubiquitylation. Each of these modifications confers distinct functional attributes and/or regulatory control over proteins, making them vital aspects of cellular physiology and signaling pathways. The challenge is that the presence of these modifications impedes the initiation of protein sequencing. For example, N-terminal modifications, and the presence of complexed small molecules or disulfide bonds, obstruct traditional sequencing chemistries, necessitating their removal prior to the sequencing process. Removal, while essential for sequencing, leads to the loss of critical information embodied by the modifications, thus diminishing the comprehensiveness of the protein analysis. Recording the presence of N-terminal modifications prior to their removal provides valuable data. Thus, there is a need in the field for a solution in which removal of N-terminal modifications does not translate into a loss of valuable functional and regulatory information integral to biological understanding.
[088] Another challenge in the collection of holistic proteomic information is related to the massive protein dynamic range. The concentration distribution of proteins in plasma is highly skewed; the ten most abundant proteins account for approximately 90% of total protein mass, and the number expands to 22 proteins representing roughly 99% of the total mass. [Anderson, N. L., & Anderson, N. G. (2002) Molecular & Cellular Proteomics, 1(11), 845-867]. This skewed distribution hampers analytical sensitivity and complicates the identification of low-abundance proteins, which may carry essential diagnostic or prognostic information. Current methods such as immunoaffinity depletion have been employed to remove high-abundance proteins; however, these approaches have notable limitations including the potential loss of target proteins, incomplete depletion, and high operational costs. [Zolotarjova et al. (2008) Proteomics, 8(19), 3952-3961]. Additionally, depletion strategies may inadvertently remove proteins that form complexes with the abundant proteins, thereby obscuring biologically relevant information. Thus, the dynamic range of proteins in biological samples, such as blood plasma, presents a significant obstacle for precise comprehensive protein concentration quantification. Similarly, abundant proteins may obscure low-abundance proteins, which carry essential diagnostic information. Traditional protein quantification methods are analog in nature, as opposed to the digital counting capability that is inherent for the sequencing method of the present disclosure. Unbiased depletion of DNA amplicons associated with interfering high-abundance proteins, coupled with deep read NGS data acquisition, will allow effective interrogation of even low abundance proteins in a biological sample.
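The effect of depleting amplicons of high-abundance proteins on sequencing read allocation can be illustrated with a short calculation. The sketch below assumes, purely for illustration, that the share of reads attributable to a species is proportional to its share of the sample, so that the roughly 99% figure cited above can stand in for the read share of the abundant species; in practice mass share and molecule count differ, and the function name and parameters are hypothetical.

```python
def read_share_after_depletion(target_share: float,
                               abundant_share: float,
                               depletion_efficiency: float) -> float:
    """Read share of a low-abundance target once abundant amplicons are depleted.

    target_share: fraction of reads from the target before depletion.
    abundant_share: fraction of reads from the species being depleted.
    depletion_efficiency: fraction (0 to 1) of abundant amplicons removed.
    """
    remaining = 1.0 - abundant_share * depletion_efficiency
    if remaining <= 0.0:
        raise ValueError("depletion would remove every read")
    return target_share / remaining


# Example: a target at one read per million, with 99% of reads from
# abundant species that are fully depleted.
before = 1e-6
after = read_share_after_depletion(before, abundant_share=0.99,
                                   depletion_efficiency=1.0)
fold_enrichment = after / before  # approximately 100-fold
```

Under these assumptions, complete depletion of species occupying 99% of reads raises the read share of every remaining species by about 100-fold, and incomplete depletion reduces the gain accordingly.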
[089] Given the shortcomings and limitations of existing techniques in managing protein dynamic range, protein metadata, and addressing post-translational modifications during single molecule protein sequencing, there is an acute need for innovative solutions to these issues. Methods are needed to effectively prepare proteins for sequencing while preserving this ‘metadata,’ and do so without compromising the accuracy, efficiency, and comprehensiveness of protein quantitation. The herein proposed methods address these shortcomings, enhancing the potential for advancements in biomarker discovery, diagnostics, and drug development.
[090] The present disclosure addresses the above challenges, as well as other needs, by providing compositions, systems, and methods for incorporating typically overlooked molecular attributes into sequencing data during analysis of polymeric macromolecules, thereby ensuring a holistic understanding of the structure and function of a target protein. Such methods as described herein can be referred to as “metadata tagging.”
[091] The present disclosure addresses the above challenges as well as other needs by providing methods, systems, and compositions for analyzing polymeric macromolecules via recoding of their sequences and metadata attributes into DNA polymers for subsequent DNA sequencing and analysis.
IMPROVED METHODS FOR DETERMINING PROTEIN SEQUENCE AND ABUNDANCE
[092] Disclosed herein, in some embodiments, are methods for determining identity and positional information of an amino acid residue of a peptide coupled to a solid support, the method comprising: (a) coupling the peptide to the solid support such that an N-terminal amino acid residue of the peptide is not directly coupled to the solid support and is exposed to reaction conditions; (b) performing (b1) or (b2): (b1) providing a chemically-reactive conjugate comprising (x) a cycle tag comprising a cycle nucleic acid associated with a cycle number; (y) a reactive moiety for binding and cleaving the N-terminal amino acid residue of the peptide; and (z) an immobilizing moiety for immobilization to the solid support; or (b2) coupling the N-terminal amino acid with a first reactive moiety (e.g. an ITC conjugate), and providing a chemically-reactive conjugate comprising (x) a cycle tag comprising a cycle nucleic acid associated with a cycle number; (y) a second reactive moiety that binds or reacts with the first reactive moiety (e.g. that binds or reacts with the ITC conjugate); and (z) an immobilizing moiety for immobilization to the solid support; (c) contacting the peptide with the chemically-reactive conjugate of (b1) thereby coupling the chemically-reactive conjugate to the N-terminal amino acid of the peptide to form a conjugate complex, or contacting the peptide with the chemically-reactive conjugate of (b2) thereby coupling the chemically-reactive conjugate to the first reactive moiety coupled to the N-terminal amino acid of the peptide to form the conjugate complex; (d) immobilizing the conjugate complex to the solid support via the immobilizing moiety; (e) cleaving and thereby separating the N-terminal amino acid residue from the peptide, thereby exposing the next amino acid residue as the N-terminal amino acid residue on the cleaved peptide, and thereby providing an immobilized amino acid complex, the immobilized amino acid complex comprising the cleaved and separated N-terminal amino acid residue; (f) contacting the immobilized amino acid complex with a binding agent, the binding agent comprising: a binding moiety for preferentially binding to the immobilized amino acid complex; and a recode tag comprising a recode nucleic acid corresponding with the binding agent, thereby forming an affinity complex, the affinity complex comprising an immobilized amino acid complex and a binding agent and thereby bringing a cycle tag into proximity with a recode tag within each formed affinity complex; (g) transferring the information of the nucleic acid recode tag associated with the first binding agent and the cycle tag of the first immobilized conjugate complex to generate a first recode block; (h) obtaining sequence information for the recode block; and (i) based on the obtained sequence information, determining identity and positional information of an amino acid residue of the peptide.
The process may be performed on multiple polypeptides, such as multiple immobilized polypeptides together or simultaneously.
[093] Some embodiments include determining identity information of an amino acid residue of a peptide. Some embodiments include positional information of an amino acid residue of a peptide. The peptide may be coupled to a solid support. Some embodiments include providing the peptide. Some embodiments include coupling the peptide to the solid support. The peptide may be coupled to the solid support such that a N-terminal amino acid residue of the peptide is not directly coupled to the solid support. The N-terminal amino acid (NTAA) residue may be exposed to reaction conditions. Some embodiments include providing a chemically-reactive conjugate (CRC). The CRC may include a cycle tag. The cycle tag may include a cycle nucleic acid, which may be associated with a cycle number. The CRC may include a reactive moiety. The reactive moiety may bind the NTAA. The reactive moiety may cleave the NTAA. The CRC may include an immobilizing moiety, which may be for immobilization to the solid support. Some embodiments include coupling the N-terminal amino acid with a first reactive moiety (e.g. ITC) or a first reactive moiety conjugate (e.g. an ITC conjugate). Some embodiments include providing a CRC comprising a second reactive moiety that binds the first reactive moiety or first reactive moiety conjugate. The second reactive moiety may react with the first reactive moiety or first reactive moiety conjugate. Some embodiments include contacting the peptide with the CRC. Some embodiments include coupling the CRC to the NTAA of the peptide to form a conjugate complex. Some embodiments include coupling the CRC to the first reactive moiety or first reactive moiety conjugate coupled to the NTAA of the peptide to form the conjugate complex. Some embodiments include immobilizing the conjugate complex to the solid support, e.g. via the immobilizing moiety. Some embodiments include cleaving and thereby separating the N-terminal amino acid residue from the peptide. 
Some embodiments include exposing the next amino acid residue as an NTAA residue on the cleaved peptide. Some embodiments include providing an immobilized amino acid complex. The immobilized amino acid complex may include the cleaved and separated N- terminal amino acid residue. Some embodiments include contacting the immobilized amino acid complex with a binding agent. The binding agent may include a binding moiety, which may preferentially bind to the immobilized amino acid complex. The binding agent may include a recode tag. The recode tag may include a recode nucleic acid. The recode nucleic acid may correspond with the binding agent. Some embodiments include forming an affinity complex. The affinity complex may include an immobilized amino acid complex. The affinity complex may include a binding agent. Some embodiments include bringing a cycle tag into proximity with a recode tag. Some embodiments include bringing a cycle tag into proximity with a recode tag within each formed affinity complex. Some embodiments include transferring information of a recode tag. Some embodiments include generating a recode block. Some embodiments include transferring information of the nucleic acid recode tag associated with the first binding agent and the cycle tag of the first immobilized conjugate complex to generate a first recode block. Some embodiments include obtaining sequence information for the recode block. Some embodiments include generating a memory oligonucleotide, which may comprise multiple recode blocks. Some embodiments include determining identity or positional information of an amino acid residue of the peptide. Some embodiments include performing a method on multiple polypeptides. The multiple polypeptides may be immobilized, such as together on a solid support. Some embodiments include performing a method on multiple polypeptides together. Some embodiments include performing a method on multiple polypeptides simultaneously. [094] Referring to FIG. 
1, certain embodiments of the present disclosure fall into the segment: “Proteomics” >> “Advanced Research” >> “Chemical” >> “Sequence-based.” The numerous applications of the present disclosure include peptide sequence and quantification determination in synthetically-derived and biologically-derived samples that include a plurality of protein complex, protein, and/or polypeptide components.
[095] Some embodiments of the methods described herein include any of the following steps: 1: binding to substrate; 2: functionalized PITC conjugation to amino acid; 3: immobilization of PITC conjugate to hydrogel substrate; 4: cleavage of amino acid via Edman degradation; 4a: nucleotide deprotection; 5: build recode blocks with binders; 6: memory oligo assembly; and 7: release of oligo for sequencing.
[096] FIG. 2 illustrates a simplified block diagram of an exemplary workflow 200 for analyzing polymeric macromolecules according to embodiments of the present disclosure. More particularly, the workflow 200 provides a high-level overview of various methods herein, and of how such methods fit synergistically with DNA sequencing technologies. As shown, samples of macromolecules, e.g., proteins and peptides, are prepared and immobilized onto solid supports (Box 1). While immobilized, the amino acid sequences of the macromolecules are converted, e.g., “recoded,” into DNA sequences (Box 2), and the DNA sequences are amplified into libraries for NGS sequencing (Box 3). The DNA libraries are then sequenced (Box 4) and analyzed (Box 5) via high-throughput, high-accuracy methods, thereby enabling low-cost analysis.
[097] FIG. 3 schematically illustrates various operations of the workflow of FIG. 2, according to embodiments of the present disclosure. More particularly, FIG. 3 illustrates primary stages of the “recoding” operations of FIG. 2 as a process 300. As shown, there are three distinct and separable stages for the recoding process 300, and each stage is depicted in a row of operations.
[098] In a first stage (operations 1-4 in FIG. 3), cycle information is captured. At operation 1, a surface of a solid support is prepared for attachment of: a macromolecular analyte, a set of universal primers, as well as attachment of a tri-functional chemically-reactive conjugate. This can be accomplished by providing 3 (or more) orthogonal chemistries on the surface of the support, shown in FIG. 3 as aldehyde-hydrazine, azide-alkyne, and thiol. Note that multiple conjugation chemistries are possible, including alternative chemistry functional groups to anchor primers, macromolecular analytes, and chemically-reactive conjugates to the solid support, as described below. Using at least one of the support’s orthogonally reactive modalities, a plurality of macromolecular analytes, e.g., proteins, protein fragments (i.e., peptides), or other polymers, are immobilized to the support surface.
[099] Operations 2-4 are then performed to immobilize tri-functional chemically-reactive conjugates (as conjugate-AA-cycle tag complexes). As shown, at operation 2, an N-terminal amino acid of the immobilized analytes is contacted with a chemically-reactive conjugate comprising a group reactive to the amino terminus, such as Edman’s reagent (phenyl isothiocyanate (PITC)), an orthogonally-reactive group to the support, and a nucleic acid molecule carrying information about the cycle when the conjugate was contacted with the analyte. Under basic conditions, the PITC conjugate reacts with the N-terminal amino acid to form a phenylthiocarbamoyl-amino acid (PTC) conjugate. A stringent wash removes unreacted PITC conjugate, and then, at operation 3, activation of an orthogonal chemistry used to tether the conjugate to the support is initiated to immobilize the PTC conjugate in proximity to the anchor point of the associated analyte. For example, by changing redox conditions to induce disulfide formation, or adding Cu2+, stabilizer and redox components to induce a Click reaction, PTC-thiol conjugates or PTC-alkyne conjugates may be immobilized to the solid support. Following immobilization of conjugates to the solid support, a conjugate-reactive scavenger may be added to cap the reactivity of any bound conjugate that was not washed away in the previous step(s), to render it inactive for future N-terminal amino acid reaction. At operation 4, peptide bond cleavage targeting the N-terminal amino acid of the peptide is induced. In examples employing Edman’s degradation chemistry, this is facilitated by a change in pH from basic to acidic conditions. Operations 2-4 may then be repeated for n cycles to produce a lawn of n cycle-tagged conjugates localized on a solid support.
[0100] In effect, a first iteration through operations 2-4 (i.e., first cycle) provides information related to the terminal monomer of the immobilized polymeric analyte.
A second cycle thereof provides information related to the next monomer of the immobilized polymeric analyte, and so on. Iterating through steps 2-4 for n cycles creates a lawn of spatially localized conjugates holding cycle information. With appropriate spacing between anchor points of immobilized macromolecular analytes, conjugates associated with a single analyte are co-located and isolated from those of other analytes.
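The iteration through operations 2-4 can be pictured with a small, idealized sketch. It is an illustration only, not the disclosed chemistry: it assumes every coupling, immobilization, and cleavage step succeeds, and the names `TaggedConjugate` and `run_cycles` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedConjugate:
    """Hypothetical record for one immobilized conjugate-AA-cycle tag complex."""
    cycle: int    # cycle number carried by the cycle tag (1-based)
    residue: str  # one-letter code of the residue cleaved in that cycle


def run_cycles(peptide: str, n_cycles: int) -> tuple[list[TaggedConjugate], str]:
    """Idealized repeat of operations 2-4 on a single immobilized peptide.

    Assumes 100% yield at every step. Returns the "lawn" of cycle-tagged
    conjugates and whatever is left of the peptide.
    """
    lawn = []
    remaining = peptide
    for cycle in range(1, n_cycles + 1):
        if not remaining:
            break  # peptide fully degraded before n cycles were used
        # Operations 2-3: couple a cycle-tagged conjugate to the current
        # N-terminal residue and immobilize it near the anchor point.
        lawn.append(TaggedConjugate(cycle=cycle, residue=remaining[0]))
        # Operation 4: cleavage exposes the next residue as the new N-terminus.
        remaining = remaining[1:]
    return lawn, remaining


lawn, rest = run_cycles("MKTAY", n_cycles=3)
```

In this sketch the cycle number stored with each conjugate is the only positional record, which mirrors how the cycle tag carries position independently of the amino acid identity.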
[0101] The second row of FIG. 3 depicts an operation of the iterative process whereby recode blocks are built. In this operation, amino acid information is associated with cycle information. Briefly, a plurality of binding agents that recognize the immobilized conjugate-AA-cycle tag complexes are introduced at operation 5a and bind to their cognate target at operation 5b. The binding agents are engineered to preferentially recognize specific conjugates based on differences in the cognate amino acid of the immobilized conjugate. Those agents that possess both the cognate AA and the cognate cycle information will thereby direct ligation of AA information to a cycle tag of the corresponding conjugate complex (operations 5c and 5d). Repeated binding, washing, and ligation allows multiple attempts to find cognate partners and transfer information to each immobilized conjugate-AA-cycle tag complex to build a recode block. Accordingly, multiple successive binding cycles can be used to drive the yield of information transfer from the binding agents to the immobilized conjugates to an arbitrarily high completion level.
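The requirement that both the amino acid and the cycle information match before ligation can be sketched as a simple predicate. This is a minimal sketch under stated assumptions: binding and hybridization are treated as exact yes/no matches rather than as energies, and the types `Conjugate`, `BindingAgent`, and `RecodeBlock` are hypothetical names introduced for illustration.

```python
from typing import NamedTuple, Optional


class Conjugate(NamedTuple):
    cycle: int     # cycle encoded by the immobilized cycle tag
    residue: str   # amino acid held by the immobilized conjugate


class BindingAgent(NamedTuple):
    recognizes: str  # amino acid preferred by the binding moiety
    cycle: int       # cycle whose tag the recode tag is complementary to


class RecodeBlock(NamedTuple):
    cycle: int
    residue: str


def try_ligate(conj: Conjugate, agent: BindingAgent) -> Optional[RecodeBlock]:
    """Transfer AA information only when both AA and cycle are cognate."""
    if agent.recognizes == conj.residue and agent.cycle == conj.cycle:
        return RecodeBlock(conj.cycle, agent.recognizes)
    return None  # non-cognate pairing: no information transfer


def build_recode_blocks(lawn, agents):
    """Offer every agent to every conjugate; keep the first productive match."""
    blocks = {}
    for conj in lawn:
        for agent in agents:
            block = try_ligate(conj, agent)
            if block is not None:
                blocks[conj.cycle] = block
                break
    return blocks


lawn = [Conjugate(1, "M"), Conjugate(2, "K"), Conjugate(3, "T")]
# All agents for all cycles are present at once, as in the text.
agents = [BindingAgent(aa, c) for c in (1, 2, 3) for aa in "KMT"]
blocks = build_recode_blocks(lawn, agents)
```

A more realistic model would make each pairing probabilistic and repeat the bind, wash, and ligate rounds; the exact-match version is used here only to show the two-key condition.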
[0102] In a third row of FIG. 3, the formed recode blocks are assembled into a memory oligo (e.g. combined into a single memory oligo). This oligo is capable of being amplified on the solid support or in solution, then analyzed using DNA sequencing methods to determine a sequence and/or abundance of the immobilized analytes. Briefly, at operation 6, the co-localized recode blocks interact based on their complementary DNA sequences to assemble a DNA oligonucleotide that represents the sequence of the original macromolecule. The process is similar to g-block assembly of a gene product. Assembly may be facilitated by a polymerase extension-ligation process, or by a ligation process.
[0103] Gaps in connectivity between co-localized conjugates may exist, for example, due to a) incomplete information accumulation during the sequential degradation of the peptide and immobilization of PTC-AA-cycle tag-conjugate complexes, b) incomplete information transfer from a recode tag to a cycle tag during recode block assembly, or c) simply an incomplete ligation of available and existing recode block information during memory oligo assembly. To remedy these gaps and enable high-yield assembly of information into a single oligo that can be analyzed using DNA sequencing, a ligation step employing generic splint oligos may be executed. Thus, at operation 7, incomplete assembly of co-localized recode blocks and/or memory oligos is rectified by adding generic splints that are capable of substituting for recode block sequences that were not created at operation 5. In the case of missing recode block information, the amino acid information associated with an errant cycle will be lost, but substantial recode block information will be assembled into the memory oligo. At operation 8, the tethers of the recode blocks are released, and an amplifiable product is generated via polymerase extension. Optionally, the solid support surface may be restored by cleaving conjugates from the surface.
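The effect of gap remediation on the information content of a memory oligo can be illustrated with a short sketch. It abstracts the oligo to a string of residue calls; the function name `assemble_memory_oligo` and the placeholder symbol `X` for a splint-bridged cycle are assumptions made for illustration.

```python
def assemble_memory_oligo(blocks: dict[int, str], n_cycles: int, gap: str = "X") -> str:
    """Join per-cycle residue calls in cycle order.

    `blocks` maps cycle number -> residue for recode blocks that were formed.
    A cycle with no recode block is bridged by a generic splint, represented
    here by the hypothetical placeholder `gap`: the residue identity for that
    cycle is lost, but the neighbouring information is still assembled.
    """
    return "".join(blocks.get(cycle, gap) for cycle in range(1, n_cycles + 1))


# Cycles 3 and 5 never produced a recode block in this hypothetical example.
read = assemble_memory_oligo({1: "M", 2: "K", 4: "A"}, n_cycles=5)
```

Here `read` is `"MKXAX"`: two positions are unknown, yet the three recovered residues keep their correct relative positions.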
[0104] FIG. 4 schematically illustrates an exemplary solid support for spatially supporting macromolecule analytes, according to embodiments of the present disclosure. As shown, the solid support is coated by a hydrogel that supports orthogonal chemistries. Orthogonal chemistries depicted are: aldehyde-hydrazine, azide-alkyne, and thiol. Either a thiol or Click chemistry can be activated for attachment of tri-functional chemically-reactive conjugates, depending on the immobilization scheme chosen. The aldehyde-hydrazine conjugation is an exemplary chemistry that can provide specific and orthogonal immobilization of a macromolecular analyte. The surface is seeded with macromolecule analytes such that, predominantly, they are spatially separated and reactants that interact with one macromolecule do not interact with another. A volume element is defined by the radius circumscribed by the length of the macromolecular polymer, and the lengths of the linkers of the conjugate complexes and the binding agents.
[0105] FIG. 5 schematically illustrates the interaction of chemically-reactive conjugates with terminal amino acids of immobilized peptides during operation 2 of recoding process 300 in FIG. 3, according to embodiments of the present disclosure. Generally, the conjugate has 3 functions: 1) bind to a terminal amino acid and cleave the peptide bond between the terminal amino acid and the next amino acid in the polymer (for N-terminal reactions, this is equivalent to the classical function of Edman’s reagent); 2) immobilize the conjugate to the solid support; and 3) carry a cycle tag oligo. Note the 1:1 relationship of an immobilized peptide with a conjugate in a given cycle. Conjugates that react with and become bound to the terminal amino acid in operation 2 of the recoding process 300 are shown as filled triangles in FIG. 5. Under basic conditions, the PITC conjugate reacts with the N-terminal amino acid to form a phenylthiocarbamoyl-amino acid (PTC) conjugate. Unreacted conjugates are shown as open triangles. Any unreacted PITC conjugate can be washed from the surface of the solid support prior to triggering the chemistry that joins the PTC-conjugate to the surface.
[0106] FIG. 6 schematically illustrates the immobilization of chemically-reactive conjugates onto a solid support during operation 3 of recoding process 300 in FIG. 3, according to embodiments of the present disclosure. Generally, the conjugate immobilization reaction can be triggered by light, catalyst addition, or by modifying the buffer properties or temperature to control the rate of reaction. For example, reducing redox potential allows formation of stable disulfide linkages. A stringent wash removes unreacted PITC conjugate, and then activation of an orthogonal chemistry used to join the conjugate to the solid support is initiated to immobilize PTC conjugate in proximity to the anchor point of an associated peptide. For example, by changing redox conditions to induce disulfide formation, or adding Cu2+, stabilizer and redox components to induce a Click reaction, PTC-thiol conjugates or PTC- alkyne conjugates may be immobilized to the solid support. Note that the length of the peptide defines a volume element around the anchor point with the support, and conjugates associated with the specific peptide are co-localized to that anchor point. Following immobilization of conjugates to the solid support, a conjugate-reactive scavenger may be added to cap the reactivity of residual conjugate that was not reacted to an N-terminal amino acid, was incompletely washed, and became attached to the solid support. In this way, imperfect removal of PITC conjugate is remediated by introducing an amino acid mimic. Unreacted PITC-conjugate may bind to the surface, but its future reactivity to amino acids is quenched, leaving spectator conjugates on the surface that are not able to participate in downstream processes.
[0107] FIG. 7 schematically illustrates the cleavage of terminal amino acids (e.g., the cleavage of peptide bonds) at operation 4 of recoding process 300 in FIG. 3, according to embodiments of the present disclosure. In examples utilizing an Edman’s degradation chemistry, this is accomplished by a change in pH from basic to harsh acidic conditions, sometimes in organic solvents. Accordingly, the hydrogel and conjugation reactions are designed to withstand the peptide bond cleavage conditions. Also for this reason, the cycle tag nucleic acids and any other nucleic acids immobilized to the solid support will comprise protecting groups that prevent degradation of amines or other reactive moieties of the nucleic acid. Cleavage of the terminal amino acid results in release of the peptide and provides a new terminal amino acid on the immobilized peptide analyte. In FIG. 7, the immobilized PTC-AA-cycle tag-conjugate complexes are localized near the peptide analyte’s anchor point.
[0108] FIG. 8 schematically illustrates the result of iteratively repeating the operations of FIGS. 5-7, according to embodiments of the present disclosure. More particularly, FIG. 8 illustrates the iteration of operations 2-4 of the recoding process 300. As shown, a series of co-localized conjugates, each having a cycle tag that carries information related to the relative position of an amino acid in one immobilized peptide analyte, are spatially isolated from the conjugates of other peptide analytes. The details of how each conjugate is created are independent of the information carried by the conjugate. Thus, immobilized conjugates carrying information derived from carboxy-terminus chemistry and immobilized conjugates carrying information derived from amine-terminus chemistry may be combined in downstream steps.
[0109] FIG. 9 schematically illustrates the assembly of a recode block, e.g., operations 5a-5b above, according to embodiments of the present disclosure. In this stage of the recoding process 300, the amino acid identity information is aggregated with cycle information. A cognate binding agent interacts with an immobilized conjugate as shown in the top panel of FIG. 9. Binding agents are engineered to preferentially recognize specific conjugates based on differences in the cognate amino acid of the immobilized conjugate. The binding energy of the binding agent is a combination of: (a) the binding energy of the affinity binding moiety and the conjugate, and (b) the hybridization energy between the cycle tag oligo of the conjugate and the recode tag oligo of the binding agent. Binding agents that possess both the cognate AA and the cognate cycle information will direct ligation of AA information to the cycle tag, as shown in the bottom panel. In practice, components for recognition of all AA conjugates for all cycles are present simultaneously to concurrently create recode blocks. Discrimination may be enhanced under “competitive” conditions in concert with slow annealing to find global maximum binding energies for the combined affinity binding moiety and the nucleic acids. Note that recognition of the immobilized PTC-conjugate on the solid support avoids near-neighbor effects from amino acids that were adjacent on the original peptide.
[0110] FIG. 10 schematically illustrates preparatory operations of an exemplary process for assembling the recode block of FIG. 9, according to embodiments of the present disclosure. The bottom panel in FIG. 10 shows a binding agent comprising a binding moiety and a recode tag, as well as a conjugate having a cycle tag. There are several possible interactions between binding agents and immobilized conjugates that may exist, since binding agents for recognition of all AA conjugates for all cycles are present simultaneously. They may be classified as: (a) correct cognate AA, correct cognate nucleotide; (b) correct cognate AA, incorrect cognate nucleotide; (c) incorrect cognate AA, correct cognate nucleotide; (d) incorrect cognate AA, incorrect cognate nucleotide; and (e) non-specific binding. Stringent wash conditions remove weakly bound binding agents from the surface. These may be due to cross-reactive binding of binding moieties to non-cognate PTC-AA-cycle tag conjugate complexes, and include interactions classified as either (c), (d) or (e). Interactions classified as (a) are productive during the next step of oligo ligation where information is transferred from a recode tag to a cycle tag to form a recode block. Interactions classified as (b) are not productive during the next step of oligo ligation. The characteristic time (1/koff), where koff is the off rate of the cognate binding agent, can exceed the time needed to effectively wash the solid support.
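The relationship between the off rate and the wash time can be made concrete with a short calculation. This is a sketch under a stated assumption that is not taken from the disclosure: dissociation is treated as a simple first-order process with no rebinding, so the fraction of agents still bound after a wash of duration t is exp(-koff·t). The rate constants and wash time below are hypothetical values chosen only for illustration.

```python
import math


def residence_time(k_off: float) -> float:
    """Characteristic time 1/k_off (seconds if k_off is in 1/s)."""
    return 1.0 / k_off


def fraction_retained(k_off: float, wash_time: float) -> float:
    """Fraction of bound agents still bound after the wash.

    Assumes first-order dissociation with no rebinding during the wash.
    """
    return math.exp(-k_off * wash_time)


WASH_TIME_S = 60.0        # hypothetical wash duration
K_OFF_COGNATE = 1e-3      # hypothetical off rate, cognate agent (1/s)
K_OFF_NONCOGNATE = 1e-1   # hypothetical off rate, cross-reactive agent (1/s)

cognate_kept = fraction_retained(K_OFF_COGNATE, WASH_TIME_S)
noncognate_kept = fraction_retained(K_OFF_NONCOGNATE, WASH_TIME_S)
```

With these illustrative numbers the cognate agent has a characteristic time well above the wash time and is largely retained, while the cross-reactive agent is almost entirely removed.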
[0111] FIG. 11 schematically illustrates the transfer of amino acid identity information from a binding agent’s recode tag to an immobilized conjugate’s cycle tag to form a recode block via ligation, e.g., operations 5c-5d of recoding process 300, according to embodiments of the present disclosure. As shown, complementary ligation oligos and ligase are added in an appropriate buffer to support ligation and undergo information transfer only when cognate amino acid and complementary nucleic acid conditions are met. Binding agents that are cognate to the amino acid, but comprise a recode tag non-complementary to the cycle tag, do not undergo information transfer. Similarly, ligation oligos that are not complementary to the recode tag of the binding agent do not undergo information transfer.
[0112] FIG. 12 schematically illustrates iterative performance of the operations 5a-5d of recoding process 300 for assembling recode blocks, according to embodiments of the present disclosure. During the performance thereof, binding agents for recognition of all AA conjugates for all cycles are present simultaneously. Thus, the efficiency of correct binding of the cognate pair in any one trial may be low. Slow annealing will help to differentiate between interactions with similar binding energies, and drive binding of the cognate pair. However, this may not improve efficiency to desired levels. In addition, steric hindrance due to co-localization of immobilized conjugates may restrict access of binding agents to one or more conjugates in any given trial. To drive a high fraction of recode block assembly, multiple trials of bind, wash, and ligation can be employed. In a trial where an unproductive binding event occurs, no ligation occurs. A stringent wash to remove the non-cognate binding agents creates a new opportunity to find and anneal to the cognate agent in the next trial. In principle, assuming there are no systematic effects, repeating trials will drive recode block assembly to completion.
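The statement that repeated trials drive assembly toward completion can be quantified with a simple model. As an assumption made here for illustration, each bind, wash, and ligate trial is treated as independent with the same per-trial success probability p for a given conjugate (the "no systematic effects" case), so the cumulative yield after k trials is 1 - (1 - p)^k. The value p = 0.3 is hypothetical.

```python
import math


def cumulative_yield(p: float, trials: int) -> float:
    """Probability that a conjugate has been ligated after `trials` attempts."""
    return 1.0 - (1.0 - p) ** trials


def trials_needed(p: float, target: float) -> int:
    """Smallest number of trials whose cumulative yield reaches `target`."""
    if not (0.0 < p < 1.0 and 0.0 < target < 1.0):
        raise ValueError("p and target must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))


# Hypothetical per-trial efficiency of 30% and a 99% completion target.
k = trials_needed(p=0.3, target=0.99)
```

Under this model a modest per-trial efficiency still reaches a high completion level in a bounded number of rounds; a systematic effect, such as a conjugate that is permanently inaccessible, would break the independence assumption and cap the yield.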
[0113] FIG. 13 schematically illustrates the relative sizes of various constituents of the recoding process 300, according to embodiments of the present disclosure. As shown, the relative sizes of the various constituents emphasize the need to provide linkers/spacers that allow ample freedom for constituents to interact, while also maintaining co-localization and isolation for each immobilized macromolecular analyte on the solid support.
[0114] FIG. 14 schematically illustrates the separation of incompatible chemical operations during the recoding process 300, according to embodiments of the present disclosure. As shown, the recoding process 300 lends itself to separating these steps, such that toggling between chemistries to complete a cycle and/or reversible chemistries is not required.
[0115] FIG. 19 schematically illustrates the assembly of a memory oligo for subsequent DNA sequencing analysis at operations 6-8 of the recoding process 300, according to embodiments of the present disclosure. As shown, the overlapping and complementary sequences of co-localized recode blocks facilitate assembly thereof into a single oligo (memory oligo) that becomes the seed for analysis using DNA sequencing technologies. Several molecular biology methods may be useful during assembly. For example, a memory oligo may be assembled using extension via polymerase followed by ligation, or simply by using ligation methods. In the case of assembly by ligation, addition of single-stranded 5’ phosphorylated DNA oligos complementary to the AA tags of the recode blocks facilitates assembly. Ligation directly to primer sequences immobilized to the solid support, such as the P5 and P7 sequences shown in FIG. 19, using chimeric splints having sequence complementary to recode blocks and to P5 or P7 sequence, may facilitate memory oligo amplification. Subsequent to memory oligo assembly, the recode block tethers to the solid support may optionally be cleaved, and polymerase extension from the 3’ end of immobilized P5 or P7 may initiate cluster generation. Alternatively, the assembled memory oligo may undergo end-repair, A-tailing, sequencing adapter ligation, and amplification either in situ or in solution.
[0116] FIG. 20 schematically illustrates the remediation of incomplete recode blocks during memory oligo assembly, according to embodiments of the present disclosure. Thus, FIG. 20 illustrates operation 7 of the recoding process 300. As shown, gaps in connectivity between co-localized conjugates may exist, for example, due to a) incomplete information accumulation during the sequential degradation of the peptide and immobilization of PTC-AA-cycle tag-conjugate complexes, b) incomplete information transfer from a recode tag to a cycle tag during recode block assembly, or c) an incomplete ligation of available and existing recode block information during memory oligo assembly. To remedy these gaps and enable high-yield assembly of information into a single oligo that can be analyzed using DNA sequencing, a ligation step employing generic splint oligos may be executed. Remediation may be accomplished simultaneously for all cycles by using a pool that contains splints capable of assembling any non-ligated recode block with any other non-ligated recode block. Alternatively, remediation may be accomplished stepwise, using a subset of the described pool. The “...” in FIG. 20 indicates completion of the series and represents intervening linking oligos not explicitly shown. C1 indicates the cycle tag sequence (or its complement sequence) and “n” denotes the total number of cycles.
[0117] FIG. 21 schematically illustrates various oligonucleotide constituents within a sample volume during recode block assembly, according to embodiments of the present disclosure. Accounting for interactions and tuning reaction conditions facilitates accurate and complete assembly during the recoding process 300. Within any given volume element surrounding the anchor point of a protein or peptide, there will exist immobilized PTC-AA-cycle tag-conjugate complexes that have: (a) same AA, different cycle information, and (b) different AA, different cycle information, but no complexes with (c) different AA, same cycle information or (d) same AA, same cycle information. In FIG. 21, “Group 1” constituents are present to support assembly of cycle 1 information. “Group 2” constituents are present to support assembly of cycle 2 information, and so on through group 3 to group “n”. Interactions within and across groups are cataloged at the top of each column. Total numbers of oligo constituents are given for each constituent type. Weak interactions due to hybridization of shortmer oligos are possible. These will be outweighed by the relatively stronger interaction of the binding moiety directing oligos for assembly. The heavy line shows a desired interaction assumed for a given recode block, AAi-Cyclei, and represents the total binding energy of the interaction. The light lines show exemplary possible oligo interactions. The Tm for these interactions is low, and thus erroneous ligation leading to erroneous recode block information is controlled. Recode blocks are shown with various tether sites, e.g., to 5’, 3’, and to internal nucleosides. The “...” in FIG. 21 indicates completion of the series and represents intervening cycle tags, ligation oligos, or recode tags not explicitly shown. C1 indicates the cycle tag sequence (or its complement sequence), AA indicates the amino acid recode sequence, and “n” denotes the total number of cycles.
[0118] FIG. 
22 schematically illustrates various oligonucleotide constituents within a sample volume during memory oligo assembly, according to embodiments of the present disclosure. In FIG. 22, the effective concentrations of constituents are high due to the co-localization within the volume element defined by the length of the macromolecular analyte and the length of the linkers of the associated recode blocks. The complexity of oligos, however, is not high. And because cycle codes (Cl, C2, . . .Cn) and amino acid codes (AA1, AA2, . . .AAn) are defined using schema based on communication theory, mismatch ligation is unlikely. Note that even “incorrect assembly” that results from mismatch ligation produces an oligo with useful macromolecular analyte sequence information, since the cycle information flanks the amino acid information. Sequential information blocks in the memory oligo are redundant in determining the sequence of a peptide analyte. The “. . .” indicates completion of the series and represents intervening recode blocks or AA’ -complements not explicitly shown. The “n” denotes the total number of cycles.
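By way of non-limiting illustration, one way to read “schema based on communication theory” is as an error-tolerant codebook in which every pair of codes differs at many positions. The sketch below assumes that interpretation, using Hamming distance between equal-length codes; the codebook and function names are hypothetical and the disclosure does not specify this particular scheme.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(codes) -> int:
    """Smallest Hamming distance between any two codes in the codebook."""
    return min(hamming(a, b) for a, b in combinations(codes, 2))


def nearest_code(read: str, codes):
    """Return the code closest to `read`, or None when the best match is tied."""
    ranked = sorted(codes, key=lambda code: hamming(read, code))
    if len(ranked) > 1 and hamming(read, ranked[0]) == hamming(read, ranked[1]):
        return None
    return ranked[0]


# Hypothetical 6-nt codebook, for illustration only.
CODEBOOK = ["ACACAC", "CAGTCA", "GTTGGT", "TGCATG"]
```

With a minimum pairwise distance of d, up to (d - 1) // 2 substitution errors in a read can still be assigned to a unique code, which is one reason a well-separated codebook makes mismatch ligation and misreading less consequential.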
[0119] FIG. 23 schematically illustrates the release of memory oligos and conjugate complexes from a solid support at operation 8 of the recoding process 300, according to embodiments of the present disclosure. An exemplary memory oligo is shown in FIG. 23 having P7 and P5 adapters. The memory oligo may also comprise a sample index, a UMI, a CRISPR PAM or spacer sequence, or other identifying nucleic acid sequence that may be incorporated during the NGS library preparation steps. Cleaving the tethers (or a subset of tethers) to the solid support is an optional step to improve the efficiency of PCR extensions involving the memory oligo. Conjugate removal from the surface is an optional process to clean up the solid support prior to its use for downstream steps such as cluster generation and NGS. In FIG. 23, reduction of a disulfide bond is depicted, which can be mediated by addition of dithiothreitol to a solution contacting the support surface.
[0120] In certain embodiments, starter (e.g. initiator) and terminator sequences may be designed to include sequencing adapters (e.g. P5 and P7) and sequencing primer binding sites directly in their sequences. Including these elements is useful for simplifying downstream library preparation for sequencing by eliminating additional adapter ligation steps. For example, a starter sequence may comprise, from 5' to 3': a P5 adapter sequence, a sequencing primer binding site, and a universal assembly sequence (U-AS). Similarly, a terminator sequence can include a U-AS complement, a sequencing primer binding site, and a P7 adapter sequence. This design can ensure that fully assembled memory oligos automatically contain any necessary sequences for cluster generation and sequencing. Additional elements like sample indices or unique molecular identifiers (UMIs) may also be incorporated into these sequences. When designing these sequences, care may be taken to ensure that the adapter and primer sequences do not interfere with the assembly process by forming unintended secondary structures or by participating in non-specific hybridization events.
[0121] Some embodiments include a starter sequence. A starter sequence may be or include an initiator sequence. The starter sequence may include an adapter sequence such as a P5 adapter sequence. The starter sequence may include a primer binding site such as a sequencing primer binding site. The starter sequence may include an assembly sequence such as a U-AS. The starter sequence may include the adapter sequence, the primer binding site, and the assembly sequence.
[0122] Some embodiments include a terminator sequence. The terminator sequence may include an assembly sequence complement such as a U-AS complement. The terminator sequence may include a primer binding site such as a sequencing primer binding site. The terminator sequence may include an adapter sequence such as a P7 adapter sequence. The terminator sequence may include the assembly sequence complement, the primer binding site, and the adapter sequence.
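By way of non-limiting illustration, the starter and terminator layouts described above may be modeled as ordered concatenations of their elements, together with a simple screen for unintended complementarity between elements. This is a minimal sketch: all sequences are short placeholders rather than actual P5, P7, or U-AS sequences, and the class and function names are hypothetical.

```python
from dataclasses import dataclass

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


@dataclass(frozen=True)
class Starter:
    adapter: str      # placeholder for a P5 adapter sequence
    primer_site: str  # placeholder for a sequencing primer binding site
    assembly: str     # placeholder for a universal assembly sequence (U-AS)

    def sequence(self) -> str:
        """5' to 3': adapter, primer binding site, assembly sequence."""
        return self.adapter + self.primer_site + self.assembly


@dataclass(frozen=True)
class Terminator:
    assembly_complement: str  # placeholder for the U-AS complement
    primer_site: str          # placeholder for a sequencing primer binding site
    adapter: str              # placeholder for a P7 adapter sequence

    def sequence(self) -> str:
        """5' to 3': assembly complement, primer binding site, adapter."""
        return self.assembly_complement + self.primer_site + self.adapter


def complementary_kmers(a: str, b: str, k: int) -> set:
    """k-mers of `a` whose reverse complement occurs in `b`.

    A crude screen for unintended hybridization; it ignores mismatches,
    secondary structure, and thermodynamics.
    """
    b_kmers = {b[i:i + k] for i in range(len(b) - k + 1)}
    return {a[i:i + k] for i in range(len(a) - k + 1)
            if reverse_complement(a[i:i + k]) in b_kmers}


# Placeholder sequences, for illustration only.
U_AS = "GATTACAG"
starter = Starter(adapter="ACGGTCAA", primer_site="CCATGCGA", assembly=U_AS)
terminator = Terminator(reverse_complement(U_AS), "TCGCTAGG", "GGTTCAGT")
```

The screen is expected to flag the U-AS against its own complement, which is the intended pairing; hits between an adapter or primer site and an assembly sequence would indicate a design to revisit.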
[0123] In certain embodiments, a recode block comprises a sequence that facilitates assembly of a memory oligo, and/or that facilitates target enrichment, target depletion, and/or sequencing sample preparation (e.g. NGS sample preparation), such as a CRISPR PAM or spacer sequence. For example, about 90% of the protein content in human blood plasma is albumin. It would be advantageous to deplete the albumin in plasma to improve the sensitivity to detect lower-abundance proteins that are interacting with albumin therein. Thus, DNA-based enrichment or depletion following recoding may provide less biased sample preparation than enrichment or depletion of the protein sample itself via recognition-based methods. Accordingly, oligo designs for a cycle tag, recode tag, recode block, and/or memory oligo may include CRISPR PAM and spacer sequences (or other) specific to albumin, e.g., NGG, C1-AAtagMet-C2-AAtagLys, to preferentially deplete recoded albumin peptide sequences via cutting of the memory oligo amplicon with a CRISPR nuclease or other enzyme.
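By way of non-limiting illustration, candidate nuclease target sites in a memory oligo amplicon may be located by scanning for an NGG PAM and reporting the bases immediately 5' of it as the spacer-matching region. The sketch below scans only the strand supplied, assumes a 20-nt spacer by default, and uses hypothetical function names and test sequences; it does not model nuclease specificity.

```python
import re


def find_ngg_sites(seq: str, spacer_len: int = 20):
    """Return (pam_start, protospacer, pam) for every NGG on the given strand.

    A lookahead is used so that overlapping PAMs (e.g. in 'AGGG') are all
    reported. Sites with fewer than `spacer_len` bases upstream are skipped.
    """
    sites = []
    for match in re.finditer(r"(?=([ACGT]GG))", seq):
        pam_start = match.start()
        if pam_start >= spacer_len:
            protospacer = seq[pam_start - spacer_len:pam_start]
            sites.append((pam_start, protospacer, match.group(1)))
    return sites
```

In a depletion design, the reported protospacer would be compared against the recode block pattern expected for the abundant peptide (for example, a run of cycle and recode codes), so that only amplicons carrying that pattern are targeted.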
[0124] FIG. 29A-29B depict fluorescence values obtained throughout execution of steps 1 through 4 of FIG. 3. Relative fluorescence units (RFU) of fluorescent oligonucleotides complementary to cycle tags mark progress through advancing steps of the method. In FIG. 29A each bar shows a measurement of fluorescence in an advancing step. Bars 1 demonstrate minimal autofluorescence of the peptide and solid support used in the study. Bars 3 demonstrate capture of fluorescent oligonucleotides by CRCs immobilized to the solid support via the reaction of their reactive moiety (PITC) with the N-terminal amino acid of immobilized peptides. Low signal for bars 2 supports that signal is not related to unbound fluorescent oligonucleotides in solution, and is consistent with a signal emanating from fluorescent oligos captured by CRCs reacted to immobilized peptides on solid support. Bars 4 demonstrate the signal from fluorescent oligos released from the surface upon exposing the surface to mild chemical conditions that promote dehybridization of oligonucleotides. Relative values for bars 3 and 4 can be explained by a difference in volume during the measurements. Bars 5 corroborate the dehybridization of fluorescent oligonucleotides from the surface. Between measurement of bars 5 and 6, CRCs of sample B were immobilized to the surface via a Cu-catalyzed Huisgen cycloaddition reaction. Also, between measurement of bars 5 and bars 6, the surface was subjected to anhydrous acid under conditions that support cleavage of the N-terminal amino acid, exposing the next amino acid residue as an N-terminal amino acid residue on the cleaved peptide. Bars 6 show the progression of contacting a second CRC having a different cycle tag sequence to the surface via the reactive moiety (PITC). The CRC will be reactive toward newly exposed N-terminal amino acids of the immobilized peptides following the cleavage of the first N-terminal amino acid with acid.
Bars 8 demonstrate capture of the new fluorescent oligo by CRC immobilized to the solid support via the reaction of its reactive moiety (PITC) with the new terminal amino acid of an immobilized peptide. Low signal for bars 7 supports that signal is not related to unbound fluorescent oligonucleotides in solution, and is consistent with a signal emanating from fluorescent oligos captured by the CRC reacted to the immobilized peptide on solid support. Bars 9 demonstrate the signal from fluorescent oligos released from the surface upon exposing the surface to mild chemical conditions that promote dehybridization of oligonucleotides. Relative values for bars 8 and 9 can be explained by a difference in volume during the measurements. Bars 10 corroborate the dehybridization of fluorescent oligonucleotides from the surface. The progression of fluorescence signals confirms reaction, capture, and cleavage of an N-terminal amino acid residue of a peptide using matter and methods disclosed within. Strong signals for bars 3, 4, 8, and 9 confirm functionality of the CRC to perform steps 2-4 of FIG. 3. In FIG. 29B each bar shows fluorescence in an advancing step of the method. The conditions and conclusion are the same as for bars B of FIG. 29A, with the exception that the starting azide-functionalized silica surface was supplied by a commercial source.
[0125] FIG. 30 schematically illustrates how efficiency of memory oligo assembly may be adjusted, according to the methods described herein. As shown, the large sphere in FIG. 30 represents a volume as defined by the length of an analyte polymer, e.g., an amino acid polymer. Within the large sphere are many smaller spheres. Each of these smaller spheres may represent a volume as defined by the binding agents and conjugates utilized during the recoding process, and more particularly the binding agents and conjugates utilized during operation 5 described above. Such volume is primarily dependent on the linker lengths of both binding agents and conjugates. Accordingly, to facilitate association of recode blocks during memory oligo assembly, the polymer (representative larger sphere) may be collapsed via known polymer collapse mechanisms, such as those described in, e.g., Leonid Ionov, Hydrogel-based actuators: possibilities and limitations, Materials Today, 17, 10, 494 (2014), which is herein incorporated in its entirety. Alternatively, the binding agents and conjugates (representative smaller spheres) may be expanded, e.g., by utilizing expandable spacers, linking oligos, and/or deconvolution of rare events in silico, as described elsewhere herein, thereby facilitating communication of neighboring recode blocks. Note that the recode blocks may be linked in any sequential order to create a memory oligo. Expandable spacers may include molecules that comprise multiple thiol groups. When disulfide bonds are formed, the range of the spacer is shortened, and when the cross-linkers are reduced, e.g., by addition of DTT, the spacer range is increased.
[0126] FIG. 31 illustrates the utilization of universal sequences to facilitate linking of recode blocks during memory oligo assembly without regard to any specific order, according to embodiments of the present disclosure. As described above, recode blocks may be linked in any sequential order to create a memory oligo. This is due to cycle information being immediately adjacent to amino acid information in assembled recode blocks, regardless of whether the recode blocks are in sequential or non-sequential order within a memory oligo. While assembly of recode blocks in the correct sequential order of an analyte may be efficient, the adjacent nature of the cycle and amino acid information in the recode blocks may cause redundancies. Thus, to avoid these redundancies while also relaxing the criteria for memory oligo assembly, the recode blocks may be assembled in random order.
[0127] To facilitate the assembly of memory oligos without regard to any specific order, universal assembly sequences may be utilized during the recoding process. Such universal sequences may be attached to the 5’ and/or 3’ ends of cycle tags and/or recode tags prior to introduction of these tags to the anchored analyte(s). Attaching complementary universal sequences to two or more cycle tags and/or recode tags facilitates the random linking (e.g., ligation) of resulting recode blocks during memory oligo assembly, without regard to sequential order, and a correct macromolecule analyte sequence may be assigned during post-sequencing analysis.
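By way of non-limiting illustration, the post-sequencing assignment described above may be sketched as follows. Because each recode block carries its own cycle information, blocks read in any order can be sorted by cycle to recover the analyte sequence. The sketch assumes that each block has already been decoded into a (cycle, amino acid) pair; the function name, the one-letter codes, and the use of "X" for cycles with no observed block are hypothetical choices.

```python
from collections import Counter, defaultdict


def assign_sequence(blocks, n_cycles: int, unknown: str = "X") -> str:
    """Rebuild a peptide sequence from (cycle, amino_acid) pairs in any order.

    Repeated observations of the same cycle are resolved by majority vote;
    cycles with no observation are reported as `unknown`.
    """
    votes = defaultdict(Counter)
    for cycle, amino_acid in blocks:
        votes[cycle][amino_acid] += 1
    residues = []
    for cycle in range(1, n_cycles + 1):
        if votes[cycle]:
            residues.append(votes[cycle].most_common(1)[0][0])
        else:
            residues.append(unknown)
    return "".join(residues)


# Blocks as they might appear in a randomly ordered memory oligo (hypothetical).
observed = [(3, "K"), (1, "M"), (2, "A"), (1, "M")]
peptide = assign_sequence(observed, n_cycles=4)
```

Majority voting is one simple way to use the redundancy of repeated cycle observations; a probabilistic model could be substituted without changing the overall approach.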
[0128] FIG. 32 schematically illustrates transfer of information from a location oligo to a recode block. In certain embodiments, during the recoding process, a peptide is attached to a solid support via a location linker, which may include any molecule configured to attach the peptide to the solid support, and further configured to bind to a nucleic acid. The nucleic acid can include any suitable type of nucleic acid sequence that carries code information related to the location of immobilized PTC-conjugates isolated on the solid support. The nucleic acid could be directly joined to a hydrogel. This nucleic acid may be referred to as the “location oligo.” Location oligos may be attached to location linkers before or after binding of the peptide thereto, and/or before or after immobilization of the peptide to the solid support. During the recoding process, after transferring the information of the recode tags to the immobilized conjugate complexes to generate recode blocks, a PCR-like thermal cycling process may be performed to sequentially transfer location oligo information via polymerase extension onto a plurality of proximal recode blocks. Non-natural nucleotides in the synthetic nucleic acids, denoted in the figure as circles, together with polymerase extension with only A, G, C, T and iC in the reaction solution, may be used to control undesired polymerase extension.
[0129] FIG. 33 schematically illustrates an example of an alternative event during performance of the recoding methods described herein, according to embodiments of the present disclosure. More particularly, FIG. 33 depicts the inaccurate association of cycle tag information to a monomer of an analyte as caused by two conjugates being immobilized in close proximity to each other on a solid support. Referring back to operation 5a of the recoding process, upon introducing binding agents to immobilized conjugate-AA-cycle tag complexes, the binding agents that possess both the cognate AA and the cognate cycle information should recognize and bind to their target conjugates. Thereafter, the binding agent should direct ligation of its AA information, in the form of a recode tag, to a cycle tag of the conjugate complex. However, the proximity of immobilized conjugate complexes on a solid support may on rare occasions cause a binding agent that possesses the cognate AA but not the cognate cycle information to bind to a conjugate complex, which leads to an alternative recode tag-cycle tag ligation and thus, inaccurate association of cycle tag information to a monomer of the analyte.
[0130] In FIG. 33, a binding agent “AA1:C12” is shown bound to a conjugate-AA-cycle tag complex “C1:AA1.” In this example, the binding agent AA1:C12 correctly recognizes the cognate amino acid (e.g., AA1 is recognized) of C1:AA1. However, because the cycle tag of complex C1:AA1 is noncognate, the binding agent should not bind to C1:AA1. Yet, the binding agent AA1:C12 recognizes the cycle tag C12 of the nearby conjugate-AA-cycle tag complex “C12:AA3,” which facilitates the “alternative” binding of binding agent AA1:C12 to complex C1:AA1. This binding is therefore facilitated by the avidity of the binding moiety of the binding agent AA1:C12 to AA1, in addition to the hybridization energy of the C12:AA3 complex in close proximity to the C1:AA1 complex. As a result of this binding, amino acid 3 (AA3) is “alternatively” assigned to cycle 12 (C12), whereas the correct assignment in this example would be AA1 to C1. Note that in this example, the binding agent and the nearby conjugate complex may hold the same cycle information in order to allow the alternative event.
[0131] The alternative assignment in FIG. 33 may not be remedied by sequentially introducing binding agents to immobilized conjugate-AA-cycle tag complexes in the order of: AA1-n:C1, AA1-n:C2, AA1-n:C3, and so on, for all recode cycles to cycle “n”. Similarly, the assignment cannot be remedied by sequentially introducing binding agents to immobilized conjugate-AA-cycle tag complexes in the order of: AA1:C1, AA2:C2, AA3:C3, and so on, for all amino acid binding moieties. Rather, in order to reduce or eliminate the type of interactions in FIG. 33, spatial separation of the conjugate-AA-cycle tag complexes may be promoted.
[0132] Short spacers for the conjugate-AA-cycle tag complexes and the binding agents may be used during the recode block assembly steps (e.g., operations 5a-5d above) to effectively avoid these alternative events. However, such spacers may negatively affect the assembly of memory oligos, since such assembly is facilitated by the interaction of recode blocks. To overcome these conflicting spatial constraints, a spacer molecule that can be controllably lengthened or expanded may be used. For example, a cysteine may be incorporated at both ends of a spacer molecule via a disulfide bridge, thereby facilitating a shortened linker during recode block assembly (e.g., operations depicted in FIG. 33). This spacer may be expanded during memory oligo assembly by reducing the disulfide bonds. Alternatively, a polymer that is controllably expandable can be utilized. For example, a two-part hydrogel configured to collapse or expand based on solution/solvent conditions, or a polymer having reactivity that allows for expansion, can be incorporated in the solid support. During recode block assembly, the polymer may be relaxed, thereby increasing the distance between anchored conjugate-AA-cycle tag complexes; however, during other steps of the recoding process, the polymer may be collapsed.
[0133] In still further embodiments, linking oligos with bridging capability may be utilized (e.g., see FIG. 31) to mitigate inaccessibility of recode blocks to one another. For example, as shown in FIG. 31, long linking oligos may be used to bridge gaps between recode blocks via an extension-ligation approach. In other embodiments, a priori information and probabilities may be used to improve the accuracy of identification, since the events as depicted in FIG. 33 are rare and dependent on proximity of conjugate-AA-cycle tag complexes. AI and other computer-based methods may be useful to recognize these events so that they may be corrected in silico.
[0134] FIG. 34A-34C include example methodologies, which may include isolation, assignment, and assembly. FIG. 34A depicts an example of Isolation: N-terminal amino acids may be sequentially removed from a peptide using a tri-functional molecule in a series of cycles, each of which results in immobilization of one amino acid complex adjacent to the anchor point of its cognate peptide. Multiple cycles may create a lawn of spatially localized complexes holding cycle DNA, as depicted in the uppermost Geometry panel, where the large sphere represents a protein localization, and the smaller spheres represent its isolated amino acid localizations. Cycle may be known, but an amino acid identity is not yet determined. FIG. 34B depicts an example of Assignment: following the removal of protecting groups from isolated complexes and transition from an anhydrous to an aqueous environment, amino acid identity may be appended to isolated complexes via recognition by an affinity construct that brings identity information in the form of DNA into proximity of the cycle DNA. ‘Identity’ and ‘cycle’ DNA may be ligated in a high-fidelity reaction. FIG. 34C depicts an example of Assembly: extension-ligation of regional DNA into a long construct that reflects the original peptide information, as shown in the lowest geometry panel, may be analyzed using NGS sequencing.
[0135] FIG. 35 shows a ~1 kDa trifunctional molecule with: (1) phenyl isothiocyanate, (2) propargyl, and (3) model vanillin at the oligo position to simplify analytical characterization of the base structure: NNN-(Propargyl-PEG2)(6-oxo-6-(dibenzo[b,f]azacyclooct-4-yn-1-yl)-caproic)(PEG3-1-acetamido-4-isothiocyanato-benzene). The molecular structure was confirmed using LC ESI-MS, and its function was tested. The HPLC analysis shows formation of a product with high yield, indicating functional activity of the key reactive isothiocyanate moiety.
[0136] FIG. 36 shows an agarose electrophoresis gel that demonstrates effective in situ ligation of cycle tags and recode tag oligos. In lane 1 is a dsDNA ladder (cat# 10597012 from Invitrogen) with the brightest band appearing at 100 base pairs. In lane 2 are the products from ligation of the 45-mer oligo with tether arm with a 30-mer ligation oligo on both the 3' (Sys#001 LO2, 30, SEQ ID NO: 85) and 5' (Sys#001,L01,30, SEQ ID NO: 84) ends. Three bands are visible: the product with both 30-mer oligos ligated, a faint band showing either one or the other 30-mer oligos ligated, and a smeared band showing the unligated 45-mer oligo. Lanes 3 and 4 show the ligation products when only one or the other of the 30-mer ligation oligos is added to the reaction, so a shorter product is generated. Lane 5 shows the ligation mixture without ligase added; the primary band is the 45-mer oligo band. Lane 6 shows the ligation products with a "no tether" version of the 45-mer oligo, and the three bands are similar to those in Lane 2, indicating the presence of the double ligation product, the single ligation products, and the unligated 45-mer product.
[0137] FIG. 37 shows a block diagram for steps of a cyclic protection and deprotection workflow.
[0138] FIG. 38 shows a reaction scheme for the stepwise assembly of an immobilized CRC complex, where an N-terminal amino acid is reacted with an amine-reactive molecule possessing a 2nd reactive functional group (e.g. tetrazine). The trifunctional construct possessing a nucleic acid cycle tag and surface immobilization moiety and trans-cyclooctene may be reacted with the tetrazine of the amine-reactive molecule to form an immobilized CRC complex.
[0139] FIG. 39 shows a reaction scheme for the stepwise assembly of an immobilized CRC complex, where an N-terminal amino acid is reacted with an amine-reactive molecule possessing a 2nd reactive functional group (e.g. trans-cyclooctene). The trifunctional construct possessing a nucleic acid cycle tag and surface immobilization moiety may be reacted with the tetrazine functional group of the amine-reactive molecule to form an immobilized CRC complex.
[0140] FIG. 40 shows a reaction scheme for the stepwise assembly of an immobilized CRC complex, where a tetrazine-labeled oligo is reacted with a functional group (e.g. trans-cyclooctene) of a trifunctional construct possessing a reactive moiety for binding and cleaving the N-terminal amino acid residue of the peptide and a surface immobilization moiety.
[0141] FIG. 42A-42B show CRC synthesis processes and intermediate molecules. FIG. 42A is a block diagram that illustrates the steps for synthesizing PPO, starting from PDA. PDA is converted to PDON-tBOC, which may be deprotected to form PDON, then converted to PDO and subsequently converted to PPO. FIG. 42B includes the chemical structure of PPO and intermediates.
[0142] FIG. 43 shows the function of PPO. Relative fluorescence units (RFU) of PPO immobilized to an azide-modified surface via Cu-catalyzed Huisgen cycloaddition followed by reaction with amine-labeled fluorescein are shown. Multiple fractions of purified PPO perform similarly. Strong signals above background confirm the function of both the alkyne and ITC chemically reactive elements of the CRC.
[0143] FIG. 44 shows the function of PPO. Relative fluorescence units of a fluorescent oligo complementary to the oligo on PPO immobilized to an azide-modified surface via Cu-catalyzed Huisgen cycloaddition is shown. Multiple fractions of purified PPO perform similarly. Strong signals above background confirm both the function of the alkyne and oligo elements of the CRC.
[0144] FIG. 45 shows the function of PPO. Relative fluorescence units (RFU) of PPO immobilized to an amine-modified surface via the reactive ITC moiety followed by Cu-catalyzed Huisgen cycloaddition to an azido-labeled fluorescein reagent is shown. Multiple fractions of purified PPO perform similarly. Strong signals above background confirm the function of the alkyne, function of the ITC chemically-reactive elements of the CRC and the capability to use the CRC on multiple embodiments of solid support.
[0145] FIG. 46A-46D show exemplary simulations and the binding kinetics of a commercially available antibody (Sigma, SAB5200015) to an immobilized phosphotyrosine-PTH-ligand. FIG. 46D shows representative data of strong and reproducible binding curves generated using the Nicoya SPR system.
[0146] FIG. 47 shows PCR data for ligated recode blocks. Amplification was observed for ligated oligos both with and without tethers, demonstrating that amplicons can be generated from tethered recode blocks for subsequently obtaining sequence information of a memory oligonucleotide or recode block.
[0147] FIG. 48 schematically illustrates the surface-bound model system used to demonstrate ligation and amplification efficiencies of the surface-associated steps of the method described herein. As shown, the SA-biotin interaction represents a non-covalent interaction of a binding agent, and the PPO with its associated cycle nucleic acid represents a CRC. They are in proximity via reaction with the N-terminus of a model peptide with isothiocyanate of the CRC.
[0148] FIG. 49 demonstrates formation of a recode block resulting from the ligation and amplification of surface-immobilized oligonucleotides. These enzyme-catalyzed results demonstrate the ability to perform the novel assembly processes of the method in situ. The orientation of surface-bound constructs is as depicted in FIG. 48. Amplification of proximal oligos produces a strong signal similar to that obtained from positive controls and clearly above the negative controls for the reaction. Full-length, high-fidelity product was confirmed by melt curve analysis and Sanger sequencing.
[0149] FIG. 50 shows HPLC chromatograms for oligonucleotides exposed to anhydrous TFA, simulating Edman peptide cleavage conditions. One hour in anhydrous TFA has no significant effect on the peak intensity or shape of protected oligonucleotides. The peaks for protected oligos were largely unchanged (figure panels A and B), while the peak for the unprotected oligo was largely absent after 1 hour (figure panels C and D). Note: The peaks at 2.5 min, 3.5 min, and 7.5 min in the chromatograms for the 1 hr TFA condition correspond to DMSO / TFA / imidazole.
[0150] Determining identity or positional information of an amino acid residue may include reverse translation protein sequencing or vice versa. Reverse translation protein sequencing may include recoding an amino acid sequence to a nucleic acid sequence, and then decoding or sequencing the nucleic acid sequence. Recoding may include the use of reagents or methods herein, or described in WO2024015875 or WO2024040236. Reverse translation protein sequencing may include providing a peptide to a solid support (e.g. such that an N-terminal amino acid residue of the peptide is not directly coupled to the solid support and is exposed to reaction conditions). Reverse translation protein sequencing may include providing a chemically reactive conjugate (CRC). A CRC may include a cycle tag comprising a cycle nucleic acid associated with a cycle number, a reactive moiety for binding an N-terminal amino acid residue of the peptide, and an immobilizing moiety for immobilization to the solid support. Reverse translation protein sequencing may include contacting a peptide with a CRC. Reverse translation protein sequencing may include forming a conjugate complex (e.g. upon contact of a peptide with a CRC). Reverse translation protein sequencing may include immobilizing a conjugate complex to a solid support. Reverse translation protein sequencing may include cleaving an N-terminal peptide. Reverse translation protein sequencing may include separating an N-terminal amino acid residue from a peptide. Reverse translation protein sequencing may include providing an immobilized amino acid complex. An immobilized amino acid complex may include a cleaved or separated N-terminal amino acid residue. Reverse translation protein sequencing may include contacting an immobilized amino acid complex with a binding agent.
A binding agent may include: a binding moiety for preferentially binding to the immobilized amino acid complex, and a recode tag comprising a recode nucleic acid corresponding with the binding agent. Reverse translation protein sequencing may include forming an affinity complex. An affinity complex may include an immobilized amino acid complex and a binding agent. Reverse translation protein sequencing may include bringing a cycle tag into proximity with a recode tag within an affinity complex. Reverse translation protein sequencing may include transferring information of a recode nucleic acid to a cycle nucleic acid (e.g. of an immobilized conjugate complex). Reverse translation protein sequencing may include generating a recode block. Reverse translation protein sequencing may include obtaining sequence information of a recode block. Reverse translation protein sequencing may include formation of a memory oligo. Forming a memory oligo may include joining two or more recode blocks together. Forming a memory oligo may include combining sequences or sequence information of two or more recode blocks. Reverse translation protein sequencing may include obtaining sequence information of a memory oligo. Reverse translation protein sequencing may include determining identity and positional information of an amino acid residue of a peptide based on obtained sequence information (e.g. obtained information of one or more recode blocks, or obtained information of a memory oligo).
Recode Tags
[0151] Disclosed herein, in some embodiments, are recode tags. The recode tag may be a part of a binding agent. The recode tag may correspond with a binding agent. For example, the recode tag may convey information about a molecule (e.g. an amino acid or PTM) to which the binding agent binds. The recode tag may include a nucleic acid such as a recode nucleic acid. In some embodiments, the recode nucleic acid comprises DNA or RNA. In some embodiments, the recode tag is a DNA sequence. In some embodiments, the recode tag is an RNA sequence. The recode nucleic acid may be useful to encode amino acid information in a nucleic acid. The recode tag may be used in a method described herein, such as a method for determining protein information such as amino acid position or identity.
Recode Blocks
[0152] Disclosed herein, in some embodiments, are recode blocks. The recode block may include a cycle tag, and a recode tag or a reverse complement thereof. The recode block may include a cycle tag or a reverse complement thereof, and a recode tag. The recode block may include a cycle tag or a reverse complement thereof, and a recode tag or a reverse complement thereof. The recode block may include a cycle tag and a recode tag, or information corresponding to the cycle tag and the recode tag. For example, the recode block may include a cycle nucleic acid, a cycle nucleic acid sequence, or a reverse complement thereof, and may include a recode nucleic acid, a recode nucleic acid sequence, or a reverse complement thereof. The recode block may be useful for joining into a memory oligonucleotide, either of which may convey information about amino acid position and identity within a protein. The recode block may be used in a method described herein, such as a method for determining protein information such as amino acid position or identity.
[0153] In some embodiments, the recode block comprises the recode nucleic acid, a sequence of the recode nucleic acid, or a reverse complement of the sequence of the recode nucleic acid joined or combined with the cycle nucleic acid, a sequence of the cycle nucleic acid, or a reverse complement of the sequence of the cycle nucleic acid. In some embodiments, the recode block comprises the recode nucleic acid or a reverse complement of the sequence of the recode nucleic acid joined with the cycle tag. In some embodiments, the recode block comprises the recode nucleic acid, a sequence of the recode nucleic acid, or a reverse complement of the sequence of the recode nucleic acid. In some embodiments, the recode block comprises the cycle nucleic acid, a sequence of the cycle nucleic acid, or a reverse complement of the sequence of the cycle nucleic acid.
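By way of non-limiting illustration, because a recode block may carry the cycle and recode sequences or their reverse complements, a sequencing read of a block may be interpreted in either orientation. The sketch below assumes a block laid out as a cycle code immediately followed by a recode code, with exact matching; the code tables, the layout, and the function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def decode_block(read: str, cycle_codes: dict, recode_codes: dict):
    """Return (cycle, amino_acid) for a read in either orientation, else None.

    cycle_codes maps cycle sequence to cycle number; recode_codes maps
    recode sequence to amino acid label.
    """
    for candidate in (read, reverse_complement(read)):
        for cycle_seq, cycle in cycle_codes.items():
            if candidate.startswith(cycle_seq):
                remainder = candidate[len(cycle_seq):]
                if remainder in recode_codes:
                    return cycle, recode_codes[remainder]
    return None


# Hypothetical code tables, for illustration only.
CYCLE_CODES = {"ACACAC": 1, "CAGTCA": 2}
RECODE_CODES = {"GTTGGT": "Met", "TGCATG": "Lys"}
```

A read such as the reverse complement of `"CAGTCA" + "TGCATG"` decodes to the same (cycle, amino acid) pair as the forward read, so strand orientation does not need to be known in advance.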
Transfer of Information
[0154] Disclosed herein, in some embodiments, are methods which include transferring information. For example, a method may include transferring information of the recode nucleic acid and the cycle nucleic acid of the immobilized conjugate complex to generate a recode block. In another example, a method may include transferring information of the cycle nucleic acid and the recode nucleic acid of the immobilized conjugate complex to a separate element to generate a recode block. The transfer of information may form a recode block, or may be used to form a memory oligonucleotide. The transfer of information may be included in a method described herein, such as a method for determining protein information such as amino acid position or identity.
[0155] In some embodiments, said transferring information comprises performing a nucleic acid sequence-based amplification, for example to generate the sequence of the recode nucleic acid or the sequence of the cycle nucleic acid. In some embodiments, said transferring information comprises performing polymerase chain reaction (PCR) to generate the sequence of the recode nucleic acid or the sequence of the cycle nucleic acid. In some embodiments, the PCR comprises real-time PCR, digital PCR, multiplex PCR, nested PCR, hot-start PCR, touchdown PCR, or quantitative PCR. In some embodiments, said transferring information comprises performing or conducting a ligase chain reaction, a helicase-dependent amplification, a strand displacement amplification, a loop-mediated isothermal amplification, a rolling circle amplification, a recombinase polymerase amplification, a nicking enzyme amplification reaction, a whole genome amplification, a transcription-mediated amplification, a multiple displacement amplification, or multiple annealing and looping-based amplification cycles, for example to generate the sequence of the recode nucleic acid or the sequence of the cycle nucleic acid. The amplification or other procedure may be to generate the sequence of the recode nucleic acid, the sequence of the cycle nucleic acid, a reverse complement, or a combination thereof. In some embodiments, the information of the recode nucleic acid comprises a sequence of the recode nucleic acid or a reverse complement of the sequence of the recode nucleic acid. Information transfer between nucleic acids may be accomplished through various other means. In certain embodiments, chemical ligation methods may be employed instead of or in addition to enzymatic ligation. Such methods may include click chemistry reactions between modified nucleic acids, photochemical conjugation between nucleobases or backbone modifications, or copper-free click chemistry approaches. 
The choice of transfer chemistry may be optimized based on reaction conditions and molecular stability requirements.
[0156] In some embodiments, the transfer of information involves a polymerase chain reaction. In some embodiments, the transfer of information involves a reverse transcription polymerase chain reaction. In some embodiments, the transfer of information involves a real-time polymerase chain reaction. In some embodiments, the transfer of information involves a digital polymerase chain reaction. In some embodiments, the transfer of information involves a multiplex polymerase chain reaction. In some embodiments, the transfer of information involves a nested polymerase chain reaction. In some embodiments, the transfer of information involves a hot-start polymerase chain reaction. In some embodiments, the transfer of information involves a touchdown polymerase chain reaction. In some embodiments, the transfer of information involves a quantitative polymerase chain reaction. In some embodiments, the transfer of information involves a ligase chain reaction. In some embodiments, the transfer of information involves a helicase-dependent amplification. In some embodiments, the transfer of information involves a strand displacement amplification. In some embodiments, the transfer of information involves a loop-mediated isothermal amplification. In some embodiments, the transfer of information involves a rolling circle amplification. In some embodiments, the transfer of information involves a recombinase polymerase amplification. In some embodiments, the transfer of information involves a nicking enzyme amplification reaction. In some embodiments, the transfer of information involves a whole genome amplification. In some embodiments, the transfer of information involves a transcription-mediated amplification. In some embodiments, the transfer of information involves a multiple displacement amplification. In some embodiments, the transfer of information involves multiple annealing and looping-based amplification cycles. In some embodiments, the transfer of information involves a nucleic acid sequence-based amplification.
[0157] In some embodiments, said transferring information comprises joining the recode nucleic acid or a reverse complement of the recode nucleic acid with the cycle nucleic acid. In some embodiments, said transferring of information comprises joining the recode nucleic acid or a reverse complement of the recode nucleic acid with a complement of the cycle nucleic acid. In some embodiments, said transferring of information comprises joining the recode nucleic acid or a reverse complement of the recode nucleic acid with a complement of a separate nucleic acid that corresponds to a cycle nucleic acid.
Joining
[0158] Disclosed herein, in some embodiments, are methods which include joining. For example, a recode nucleic acid or a reverse complement thereof may be joined with a cycle nucleic acid or a reverse complement thereof. The joining may form a recode block, or may be used to form a memory oligonucleotide. The joining may be included in a method described herein, such as a method for determining protein information such as amino acid location or identity.
[0159] In some embodiments, joining comprises enzymatic ligation. In some embodiments, joining comprises splint ligation. In some embodiments, joining comprises chemical ligation. In some embodiments, joining comprises template-assisted ligation. In some embodiments, joining comprises the use of a ligase enzyme. In some embodiments, joining comprises the use of a splint oligonucleotide. In some embodiments, joining comprises the use of a catalyst. In some embodiments, joining comprises the use of a bridging molecule. In some embodiments, joining comprises the use of a condensation agent. In some embodiments, joining comprises the use of a coupling reagent. In some embodiments, joining comprises the use of a polymerase enzyme. In some embodiments, joining comprises the use of a complementary nucleic acid sequence. In some embodiments, joining comprises the use of a nicking enzyme. In some embodiments, joining comprises the use of a nucleic acid modifying enzyme. In some embodiments, joining comprises the use of a recombinase. In some embodiments, joining comprises the use of a strand-displacing polymerase. In some embodiments, joining comprises the use of a single-strand binding protein. In some embodiments, joining comprises a click chemistry reaction. In some embodiments, joining comprises a phosphodiester bond formation. In some embodiments, joining comprises a peptide nucleic acid-mediated ligation. In some embodiments, each binding agent comprises recode tags with a unique nucleic acid sequence. In some embodiments, a plurality of binding agents comprises recode tags with the same nucleic acid sequence. In some embodiments, binding agents comprise recode tags which may have a unique sequence portion and a common sequence portion.
[0160] In some embodiments, joining the recode nucleic acid or a sequence of the recode nucleic acid with the cycle nucleic acid or a sequence of the cycle nucleic acid to generate a recode block comprises: (i) joining the recode nucleic acid with the cycle nucleic acid, (ii) joining the recode nucleic acid with a sequence of the cycle nucleic acid, (iii) joining a sequence of the recode nucleic acid with the cycle nucleic acid, or (iv) joining a sequence of the recode nucleic acid with a sequence of the cycle nucleic acid. Some embodiments include performing a nucleic acid sequence-based amplification to generate the sequence of the recode nucleic acid or the sequence of the cycle nucleic acid. Some embodiments include performing polymerase chain reaction (PCR) to generate the sequence of the recode nucleic acid or the sequence of the cycle nucleic acid. In some embodiments, the PCR comprises real-time PCR, digital PCR, multiplex PCR, nested PCR, hot-start PCR, touchdown PCR, or quantitative PCR. Some embodiments include performing or conducting a ligase chain reaction, a helicase-dependent amplification, a strand displacement amplification, a loop-mediated isothermal amplification, a rolling circle amplification, a recombinase polymerase amplification, a nicking enzyme amplification reaction, a whole genome amplification, a transcription-mediated amplification, a multiple displacement amplification, or multiple annealing and looping-based amplification cycles, to generate the sequence of the recode nucleic acid or the sequence of the cycle nucleic acid.
[0161] In some embodiments, the joining comprises enzymatic ligation, splint ligation, chemical ligation, template-assisted ligation, use of a ligase enzyme, use of a splint oligonucleotide, use of a catalyst, use of a bridging molecule, use of a condensation agent, use of a coupling reagent, use of a polymerase enzyme, use of a complementary nucleic acid sequence, use of a nicking enzyme, use of a nucleic acid modifying enzyme, use of a recombinase, use of a strand-displacing polymerase, use of a single-strand binding protein, a click chemistry reaction, a phosphodiester bond formation, or a peptide nucleic acid-mediated ligation.
[0162] Some embodiments include contacting an additional immobilized amino acid complex with a second binding agent. In some embodiments, the binding agent and the second binding agent comprise distinct recode tags having different recode nucleic acids from each other. In some embodiments, the binding agent and the second binding agent comprise recode tags having identical recode nucleic acids as each other. In some embodiments, the binding agent and the second binding agent comprise distinct recode tags having recode nucleic acids that have different sequences from each other, and that have a portion of the recode nucleic acids that are identical.
[0163] In some embodiments, said transferring information comprises joining or combining the recode nucleic acid, a sequence of the recode nucleic acid, or a reverse complement of the sequence of the recode nucleic acid with the cycle nucleic acid, a sequence of the cycle nucleic acid, or a reverse complement of the sequence of the cycle nucleic acid, to generate a recode block. In some embodiments, said transferring information comprises joining or combining a sequence corresponding to a recode nucleic acid, a sequence of the recode nucleic acid, or a reverse complement of the sequence of the recode nucleic acid with a sequence corresponding to the cycle nucleic acid, a sequence of the cycle nucleic acid, or a reverse complement of the sequence of the cycle nucleic acid, to generate a recode block.
Localizing Nucleic Acids
[0164] Multiple methods may be envisioned to provide a set of co-localized nucleic acids holding information associated with protein sequence (e.g., amino acid identity and positional information). In addition to the methods described above, FIG. 16, FIG. 17, and FIG. 18 illustrate methods of co-localization of nucleic acids.
[0165] FIG. 16a illustrates steps in an alternate embodiment where cycle and amino acid information are aggregated in serial steps. In FIG. 16a, the symbol ‘a’ may represent one or more amino acids of an immobilized protein, or an isolated amino acid. The “C/AA” moiety of the binding agent is a nucleic acid that represents cycle and amino acid identity, is stable throughout serial peptide degradation reactions, and is capable of hybridizing with other nucleic acids. The “AC” moiety of the binding agent represents an activatable coupling for immobilization of C/AA to a solid support. Binding agents that comprise a C/AA nucleic acid representing cycle and amino acid information, and that comprise a reactive moiety that may be joined to the solid support, contact either the N-terminal amino acid or a di- or tripeptide of the analyte. The triangle indicates a chemical cleavage location, for example the location of a disulfide bridge. Alternate embodiments may also utilize the overwhelming advantages of localized assembly methods as described below.
[0166] In FIG. 16b unbound molecules are washed away and C/AA is immobilized in proximity to the anchor point of the peptide analyte.
[0167] The process is repeated, as shown in FIG. 16c, to create a plurality of co-localized nucleic acids. The number of C/AA conjugates in an analysis is equal to, or a subset of, the number of cycles multiplied by the number of amino acids or amino acid derivatives in the analysis. For efficiency, in some embodiments, the C/AA oligonucleotide comprises amino acid information alone, and further comprises a reactive moiety capable of joining with cycle information. In this aspect, cycle information is added prior to the next cycle. For example, a PNA molecule encoding cycle information and having a complement to the reactive moiety of the C/AA PNA oligonucleotide of a binding agent is brought into contact with an immobilized binding agent. The reaction joins the two PNA molecules, which together comprise amino acid and cycle information.
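The conjugate count stated above can be made concrete with a short sketch. It enumerates one (cycle, residue) label per possible C/AA conjugate and compares that with the number of distinct pieces needed when amino acid information and cycle information are supplied separately and joined afterwards. The 10-cycle, 20-residue example, the function names, and the reading of "efficiency" as a reduction in distinct reagents are assumptions for illustration, not statements from the disclosure.

```python
from itertools import product

# Assumed analysis set: the 20 standard amino acids (one-letter codes).
RESIDUES = list("ACDEFGHIKLMNPQRSTVWY")


def caa_conjugate_labels(num_cycles, residues):
    """One (cycle, residue) label per pre-combined C/AA conjugate."""
    return list(product(range(1, num_cycles + 1), residues))


def separate_piece_count(num_cycles, residues):
    """Distinct pieces when cycle and amino acid parts are joined later."""
    return num_cycles + len(residues)


labels = caa_conjugate_labels(10, RESIDUES)
print(len(labels))                         # 200 pre-combined conjugates
print(separate_piece_count(10, RESIDUES))  # 30 separate pieces
```

An analysis may use only a subset of these labels, consistent with the "equal to, or a subset of" wording above.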
Assembling Localized Nucleic Acids into Recode Blocks
[0168] Methods for manipulating and amplifying nucleic acids have been described. They include traditional methods [Saiki et al., Science, 1985 Dec 20;230(4732):1350-4] as well as isothermal methods [Zhao et al., Chemical Reviews 2015 115 (22), 12491-12545], various solution-based multienzyme methods [Cao et al., Int. J. Mol. Sci. 2022, 23, 4620; Gibson et al., Nature Methods. 6 (5): 343-345; Horspool et al., BMC Research Notes 2010, 3:291], and surface-based methods [Adessi et al., (2000) Nucleic Acids Res. 28:e87; Mercier, Biophysical Journal Volume 85 October 2003 2075-2086]. Thus, association of nucleic acid information from independent nucleic acids can be accomplished using numerous methods. However, methods to assemble immobilized oligonucleotides that are co-localized on a surface to efficiently associate blocks of information have not been discussed extensively in the literature.
[0169] FIG. 15A illustrates a method to assemble recode blocks, wherein the cycle tag is used as a splint to facilitate ligation of amino acid and cycle information. U-AS is a unifying assembly sequence to facilitate memory oligo assembly, U-HS is a unifying hybridization sequence to facilitate recode block assembly, C is a cycle tag sequence, i denotes the cycle number, X is a group that blocks 3’ extension, and ▼ denotes a possible position of cleavage. Since the construct is designed to be used as a polymerization template during memory oligo assembly, cleavage should leave a blocked 3’ end. AA denotes a sequence that represents amino acid identity. Sections of LNA and RNA are shown as a preferred embodiment that improves efficiency by increasing hybridization energy and allowing facile digestion, respectively, during memory oligo assembly.
[0170] FIG. 15B illustrates a method for assembly of recode blocks, wherein the cycle tag is used as a capture agent to facilitate ligation of amino acid and cycle information and localization of the product. In this embodiment, the only function required of the cycle tag nucleic acid is specific homoduplex or heteroduplex hybridization. U-AS is a unifying assembly sequence to facilitate memory oligo assembly, U-HS is a unifying hybridization sequence to facilitate recode block assembly, C is a cycle tag sequence, i denotes the cycle number, X is a group that blocks 3’ extension, and ▼ denotes a possible position of cleavage. AA denotes a sequence that represents amino acid identity, and cycCode is a sequence that represents cycle information. Sections of PNA, LNA and RNA are shown as a preferred embodiment that improves efficiency by increasing hybridization energy and allowing facile digestion during memory oligo assembly. The resulting cognate product associates amino acid and cycle information, and includes sequences useful for memory oligo assembly. Assembled recode block complexes remain localized by virtue of specific nucleic acid hybridization interactions.
[0171] In some embodiments, the unifying sequences may be specific to support ordered assembly, or may be common sequences to support random assembly of recode blocks.
[0172] In some embodiments, following recognition and ligation to form the recode block, e.g., steps (g) and (h) of the herein disclosed method for determining identity and positional information of a plurality of amino acid residues of a peptide, optionally the binding agent may be cleaved either chemically or enzymatically, as indicated in FIG. 15C and FIG. 15B. The cleavage could be directed to the structure linking the binding moiety to the recodeTag, or the recodeTag itself, or to any appropriate natural or engineered position within the binding agent.
[0173] In some aspects, base pair mismatches may be used to adjust binding energies (Tm) and/or specificity of nucleic acids of the assembly process.
[0174] Crowding agents are known to increase the reaction speed of several enzymes, including DNA modifying enzymes. In some embodiments, crowding agents, such as PEG, hydrophilic polysaccharides, and dextrans, may be used to improve efficiency of nucleic acid assembly operations.
Assembling Localized Nucleic Acids into Memory Oligos
[0175] Assembly of localized nucleic acids can be designed to occur in a stepwise or parallel manner, and in each case, can be designed to produce ordered or random assemblies of information. Methods for stepwise and parallel assembly of ordered information are shown in FIGs. 3, 14-19 and FIGs. 21-24. A simplified approach for parallel random assembly is illustrated in FIG. 25. In this illustrated embodiment, each nucleotide block of information shares a common sequence that allows enzymatic ligation via a common splint. The issue with this simple parallel method for random assembly is that cyclization of the memory oligo constructs may occur, thereby limiting throughput. Ordered methods avoid this pitfall, but may be kinetically inefficient. Stepwise random assembly avoids throughput limitations and increases efficiency over ordered assembly methods. Thus, for clarity, several efficient stepwise methods to assemble co-localized nucleic acids in a random fashion are illustrated.
[0176] There are two primary operations required for assembly of memory oligos: Initiation and Extension. FIGs. 24A-24F illustrate methods of Initiation for random assembly of memory oligos.
[0177] More specifically, FIG. 24A illustrates an embodiment wherein a CRC is provided. This initiation CRC has a hybridization tag nucleic acid (“hyb tag”) in place of the cycle tag nucleic acid. The hyb tag could be DNA, RNA, PNA, LNA, XNA, BNA, TNA, GNA, or protected or fluorinated, or otherwise modified versions thereof, or combinations thereof. In this embodiment, the cycle and amino acid information are sacrificed to achieve the goal of providing a controllable, extendable, locally covalently immobilized 3’ oligonucleotide end following a simple hybridization-extension process. A ligation oligo having a blocked 3’ end and sequence complementary to the hyb tag provides a template for PCR extension. U-AS is a unifying assembly sequence for memory oligo assembly. The ligation oligo may be fully or partially RNA to allow RNAse chew back to provide a single-stranded oligonucleotide that facilitates subsequent extension reactions. Alternately, one or more nicking endonuclease sites may be designed into the hyb tag and/or U-AS to provide a single-stranded oligonucleotide that facilitates subsequent extension reactions.
[0178] As shown in FIG. 24B, additional elements may be included in the assembly initiation oligo. These include but are not limited to PIN nucleotides, ULI sequence (Unique Location Identifier, Location Oligo, LocationOligo sequences), SBS primer sequence, a sample index, a spacer, a unique molecular identifier (UMI), a universal priming site, a CRISPR protospacer adjacent motif (PAM) sequence, or any combination thereof. A linker having a length that is any length between 0.001 and 100 times the length of the peptide, protein, or macromolecule analyte may be used directly or indirectly to link the hyb tag to the solid support.
[0179] FIG. 24C further shows that although the Initiation CRC may be recognized by a binding agent in subsequent steps of the reverse translation process, that binding agent is a spectator, in that no complementarity exists between its recode block sequence and the hyb tag or extended initiation oligo. In the figure, bold arrows indicate polymerase extension, and thin arrows indicate RNAse-H digestion. A hyb tag may be introduced at any appropriate step during processing. For example, it may be introduced before amino acids are isolated into immobilized amino acid complexes, or after the last immobilized amino acid complex is created, or after assembly of recode blocks. Extension of the hyb tag may occur at any appropriate step during processing. For example, it may be extended at any step after harsh chemical cleavage of amino acids from the immobilized analyte is complete. There may be a single or multiple hyb tags introduced per analyte.
[0180] In some instances, it is desirable to perform enzyme-catalyzed molecular biology operations on DNA, rather than RNA, XNA, PNA, TNA or other non-natural substrates. Thus, in some embodiments alternative nucleic acid configurations may be employed. For example, in FIG. 24D a configuration having a hyb tag bridge oligo is demonstrated. Note that in this configuration the hyb tag segment can be long to ensure essentially irreversible hybridization. In this embodiment the hyb tag bridge oligo 3’ end is not blocked, so that it can be extended and facilitate subsequent extension during assembly of localized nucleic acid blocks. The hyb tag ligation oligo (hyb tag EO) serves as a template and is blocked at its 3’ end (or is composed of RNA), such that it does not participate in further extension reactions.
[0181] FIG. 24E shows configurations whereby location information may be assembled with amino acid identity and cycle information. In this figure, contacting oligos having a sequence complementary to the UEI of an initiation oligo with an initiation oligo and a co-localized recode block results in association of the information. Association may be catalyzed by ligation, polymerization, a combination of the two, or any appropriate method. As shown, optional sequences that may be useful for analysis include a primer sequence. Increasing temperature, reducing salt, adding organic solvents, or other means may be employed to dissociate the formed product. Optionally, the product may be copied into solution by adding appropriate PCR primers, polymerase, and associated reagents.
[0182] In some instances, it is desirable to have flexibility in generating a hyb tag. For example, FIG. 24F illustrates a set of embodiments wherein a hyb tag may be added prior to or during step (a) or at any step between step (a) and step (i) of the reverse translation process. Further, FIG. 24F illustrates that the hyb tag may be introduced through a secondary coupling step. This may be especially helpful to avoid damage to the hyb tag, which may occur during cycles of peptide degradation. A protein analyte is labeled with a stable recognition element (e.g., biotin, DNP, inorganic compound, cage compound complement, or element that can be joined directly or indirectly to a nucleic acid) either prior to or subsequent to steps 1-4 of the disclosed process. The recognition element may facilitate assembly of a memory oligo by coupling to a nucleic acid that performs the function of the hyb tag nucleic acid.
[0183] In some embodiments, optionally the original peptide analyte may be degraded using proteinase K or any suitable endo- or exopeptidase prior to joining the recognition element to the hyb tag nucleic acid.
[0184] In some embodiments, the hyb tag may be a recognition element that is immobilized to the surface in proximity to the anchor point of a peptide analyte and may be joined directly or indirectly to a nucleic acid (either prior to or subsequent to steps 1-4 of the disclosed process) to facilitate assembly of a memory oligo.
[0185] In one aspect, an amine reactive moiety, such as an ITC, conjugated to the hyb tag nucleic acid may be applied to the surface to form a thiourea bond joining the hyb tag to the N-terminal amino acid of a peptide. This is particularly effective at any step following the step(s) of cleaving and thereby separating the N-terminal amino acid residue(s) from the peptide, thereby providing an immobilized amino acid complex, the immobilized amino acid complex comprising the cleaved and separated N-terminal amino acid residue.
[0186] In some embodiments the hyb tag may be randomly seeded, or may be immobilized near the anchor point of a peptide analyte using the process steps described herein.
[0187] In some embodiments, the memory oligo or a representative oligonucleotide containing location, amino acid identity, and positional information is released into solution after each cycle of amino acid complex immobilization, prior to the next cycle of amino acid complex immobilization. This may be accomplished through contacting the immobilized amino acid complex with an oligo having location information and proximally located to an Initiator oligo and a recode block having amino acid and position information. The location information may be transferred using ligation, polymerization, and/or action of endonuclease, or any suitable combination or means to form a nucleic acid representing location, amino acid identity, and positional information as shown in FIG. 24E. As stated above, there are two primary operations required for assembly: Initiation and Extension. The Extension operations for stepwise assembly into random memory oligos are illustrated in FIG. 26. These have the advantage of efficiently creating long linear nucleic acid memory oligos.
[0188] At the top of FIG. 26, the Initiation and Extension constructs are shown. Initiator construct generation was described in FIGs. 24A-24F. Recode block Extension constructs may be generated using the steps described in FIG. 3, FIG. 9-FIG. 12, FIG. 15-FIG. 18, and FIG. 33. Cycle i indicates any recode block of the associated protein. U-AS is a unifying assembly sequence for memory oligo assembly, U-HS is the unifying hybridization sequence used for recode block assembly, and a preferred nucleic acid composition is depicted. The assembly is designed such that there is one U-AS’ sequence per analyte. During the Extension 1 operation, U-AS from the recode block interacts with U-AS’ of the initiator. In some embodiments, events are as follows: 1) hybridization of U-AS to U-AS’ facilitated by the strong interaction between LNA and DNA nucleobases; 2) extension of U-AS’ by a polymerase having strand displacement capability; 3) RNAse-H action on the newly formed double-stranded RNA-DNA duplex to expose a new U-AS’ 3’ end; and 4) interaction of a U-AS sequence of a next recode block with the newly formed single-stranded U-AS’.
[0189] There are several favorable features of the disclosed method. Note that the events may be completed with or without washing or solution exchange between events. Non-ligated LO (e.g., due to lack of BA recognition) does not become incorporated and, thus, does not terminate memory oligo extension. Also, U-AS bases of the LO are RNA, so when they form the duplex it may be chewed up, leaving insignificant homology of a DNA:DNA duplex competing with strong LNA:DNA interactions. Note that RNA is in competition with LNA for binding to the landing pad (U-AS’), so temperature may be used to modulate specificity. Using a displacing polymerase effectively releases the bridge oligo from the growing memory oligo strand, and thereby increases the mobility of the growing memory oligo. Designed mismatches of nucleobases may be used to optimize relative Tm and specificity. The method is highly efficient to assemble nucleic acid information blocks in random order, avoiding circularization and termination events.
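Because each recode block carries both cycle and amino acid information, blocks assembled in random order can still be placed back in positional order after sequencing. The following minimal sketch shows that idea under loud assumptions: the three-base cycle and amino acid codes are hypothetical, blocks are fixed-length and concatenated without unifying sequences, and cycle number is taken to map directly to residue position. None of these specifics come from the disclosure.

```python
import random

# Hypothetical fixed-length codes (not from the disclosure).
CYCLE_CODES = {1: "AAC", 2: "ACA", 3: "CAA", 4: "ACC"}
AA_CODES = {"G": "GGT", "A": "GCT", "S": "TCT", "K": "AAG"}
CODE_LEN = 3
BLOCK_LEN = 2 * CODE_LEN


def encode_blocks(peptide):
    """One block per residue: cycle code followed by amino acid code."""
    return [CYCLE_CODES[i] + AA_CODES[aa]
            for i, aa in enumerate(peptide, start=1)]


def decode_memory_oligo(oligo):
    """Split into blocks, read (cycle, residue) pairs, then sort by cycle."""
    cycle_lookup = {code: cycle for cycle, code in CYCLE_CODES.items()}
    aa_lookup = {code: aa for aa, code in AA_CODES.items()}
    calls = {}
    for start in range(0, len(oligo), BLOCK_LEN):
        block = oligo[start:start + BLOCK_LEN]
        calls[cycle_lookup[block[:CODE_LEN]]] = aa_lookup[block[CODE_LEN:]]
    return "".join(calls[cycle] for cycle in sorted(calls))


blocks = encode_blocks("GASK")
random.Random(0).shuffle(blocks)   # random assembly order
memory_oligo = "".join(blocks)
decoded = decode_memory_oligo(memory_oligo)
print(decoded)  # GASK
```

The decoded result does not depend on the shuffle, since ordering is recovered from the cycle codes rather than from block position in the memory oligo.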
[0190] In some embodiments, more than one Initiator is associated with a given protein analyte. This provides the flexibility to assemble multiple recode blocks in parallel to create multiple memory oligos from a single analyte.
[0191] In some embodiments, more than one U-AS sequence may be employed. This provides the opportunity to orthogonally assemble more than one memory oligo for an analyte. For example, cycles 1 through 100 may utilize a first U-AS sequence, and cycles 101-200 may utilize a second U-AS sequence, resulting in recode blocks 1-100 being assembled independently from recode blocks 101-200. Different U-AS sequences may be applied in any order, and in the case of ordered assembly, the number of U-AS may be equal to the number of cycles. In certain embodiments, this may serve to change the average length of the assembled memory oligo or the average number of recode blocks per memory oligo.
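The cycle-range example above can be sketched as a simple partition of cycle numbers among U-AS sequences. The function names and the fixed block size of 100 cycles per U-AS are assumptions that follow the example in the text; the disclosure also permits U-AS sequences to be applied in any order, which this sketch does not model.

```python
def uas_index_for_cycle(cycle, cycles_per_uas=100):
    """Zero-based index of the U-AS sequence assumed for a 1-based cycle."""
    if cycle < 1:
        raise ValueError("cycle numbers are 1-based")
    return (cycle - 1) // cycles_per_uas


def group_cycles_by_uas(cycles, cycles_per_uas=100):
    """Map each U-AS index to the cycles whose recode blocks share it."""
    groups = {}
    for cycle in cycles:
        index = uas_index_for_cycle(cycle, cycles_per_uas)
        groups.setdefault(index, []).append(cycle)
    return groups


groups = group_cycles_by_uas(range(1, 201))
print(sorted(groups))                # [0, 1]
print(groups[0][0], groups[0][-1])   # 1 100
print(groups[1][0], groups[1][-1])   # 101 200
```

Each group corresponds to one independently assembled memory oligo in the example given.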
[0192] In some embodiments, multiple initiator sequences may be strategically employed to control memory oligo length and optimize sequencing performance. In certain embodiments, initiators are introduced at defined intervals, such as every 10 amino acids, whereby the resulting memory oligos may be tuned to an optimal size for a chosen sequencing platform. Further examples of defined intervals may be or include intervals of 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 35, 40, 45, or 50 amino acids, or a range of any two of the aforementioned numbers of amino acids. For example, in embodiments where a peptide comprises 300 amino acids and each recode block comprises approximately 50 bases, placing initiators every 10 amino acids may generate memory oligos of approximately 500 bases on average, thereby providing compatibility with short-read sequencing platforms. Such controlled oligonucleotide length provides improved sequencing quality by maintaining read lengths within optimal ranges for chosen platforms. The controlled length further provides increased flexibility to accommodate various sequencing technologies.
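The arithmetic in the example above can be checked with a short sketch. It assumes one recode block per amino acid, one memory oligo per initiator interval, and no added length from initiator, unifying, or primer sequences; the function names are hypothetical.

```python
import math


def memory_oligo_length(interval_aa, bases_per_block):
    """Approximate bases per memory oligo: blocks per interval x block size."""
    return interval_aa * bases_per_block


def memory_oligo_count(peptide_length_aa, interval_aa):
    """Number of memory oligos if one initiator is placed per interval."""
    return math.ceil(peptide_length_aa / interval_aa)


print(memory_oligo_length(10, 50))   # 500
print(memory_oligo_count(300, 10))   # 30
```

Under these assumptions, changing the interval from 10 to 20 amino acids would double the approximate memory oligo length to 1000 bases and halve the count to 15.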
[0193] In some embodiments, a location oligo sequence or complement may be transferred using methods described herein to the more than one Initiator, creating information for the shared group of localized Initiators. In some embodiments, nucleic acid elements may be incorporated that promote location oligo assembly into adjacent strands. For example, a ligation oligo may comprise a sequence complementary to a sequence of the initiation oligo and be used to facilitate extension, ligation, or other transfer of information from one memory oligo to an adjacent memory oligo. In some embodiments, Extension occurs in serial steps that may include temperature modulation, and in other embodiments the process may be run in a single isothermal step.
[0194] In some aspects, base pair mismatches may be used to adjust binding energies (Tm) and/or specificity of nucleic acids of the assembly process.
[0195] In some embodiments, endonuclease nickase or similar enzymes may replace RNAse-H to provide a single-stranded U-AS’
[0196] Further Extension operations for stepwise assembly into random memory oligos are illustrated in FIG. 27. This embodiment may be useful when assembling elements that do not effectively participate in all types of enzymatically-catalyzed reactions, but have the capability to form specific interactions. For example, PNA does not serve effectively as a template for PCR or ligation, but readily forms strong, specific heteroduplexes with various nucleic acids including DNA. [Ref: Fouz et al., Molecules 2020, 25, 786; Pezo et al., Angew. Chem., Int. Ed., 2013, 52, 8139-8143; Duffy et al., BMC Biology (2020) 18:112].
[0197] At the top of FIG. 27, Initiation and Extension constructs are shown. Initiator construct generation was described in FIG. 24A-24F. Recode block Extension constructs may be generated using the steps described in FIG. 3, FIG. 9-FIG. 12, FIG. 15-FIG. 18, and FIG. 33. Cycle i indicates any recode block of the associated protein. U-AS is a unifying assembly sequence for memory oligo assembly, C/AAi,j is a C/AA tag where i represents a cycle and j represents an amino acid or modified or derivatized amino acid, and C/AA codei,j is the associated nucleic acid sequence representing the amino acid and cycle information of the C/AA tag. Note that the sequence of C/AA codei,j may or may not be the same as the sequence of the C/AA tag. A preferred nucleic acid composition is depicted. The assembly is designed such that there is one U-AS’ sequence per analyte. During the Extension 1 operation, U-AS from the recode block interacts with U-AS’ of the initiator. In some embodiments, events are as follows: 1) hybridization of U-AS to U-AS’ facilitated by the strong interaction between LNA and DNA nucleobases; 2) extension of U-AS’ by a polymerase having strand displacement capability; 3) RNAse-H action on the newly formed double-stranded RNA-DNA duplex to expose a new U-AS’ 3’ end; and 4) interaction of a U-AS sequence of a next recode block with the newly formed single-stranded U-AS’.
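The four events listed above may be followed with a simple bookkeeping model. The sketch below tracks only state (whether a single-stranded U-AS’ end is available, and which C/AA codes have been copied); it does not model sequences, enzymes, or kinetics. The class names, the `consumed` flag, and the rule that each recode block contributes at most once are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class RecodeBlock:
    cycle: int               # i: cycle index of the C/AA tag
    amino_acid: str          # j: amino acid identity of the C/AA tag
    consumed: bool = False   # assumption: a block is copied at most once


@dataclass
class MemoryOligo:
    codes: list = field(default_factory=list)  # copied (cycle, amino acid) codes
    free_uas_prime: bool = True                # single-stranded U-AS' available


def extend_once(oligo, block):
    """Run events 1-3 for one recode block; return True if the code was copied."""
    if not oligo.free_uas_prime or block.consumed:
        return False
    oligo.free_uas_prime = False                         # 1) U-AS hybridizes to U-AS'
    oligo.codes.append((block.cycle, block.amino_acid))  # 2) polymerase extension
    block.consumed = True
    oligo.free_uas_prime = True                          # 3) RNAse-H exposes a new U-AS' 3' end
    return True


oligo = MemoryOligo()
for block in [RecodeBlock(2, "A"), RecodeBlock(1, "K"), RecodeBlock(3, "L")]:
    extend_once(oligo, block)  # 4) the next block interacts with the new U-AS'
print(oligo.codes)  # [(2, 'A'), (1, 'K'), (3, 'L')]
```

Because the blocks may be incorporated in any order, the cycle index carried in each code is what later allows positions to be assigned.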
[0198] Additional approaches may be adapted from concepts of bridge amplification [Adessi et al., (2000) Nucleic Acids Res. 28:e87] to assemble information from localized nucleic acids. FIG. 28 schematically illustrates various detailed steps of one such workflow. An exemplary workflow starts after operations 1-5d of FIG. 3, recoding process 200, where recode blocks are assembled on a surface. Similarly, such a workflow may start after the steps depicted in FIG. 15, FIG. 16, FIG. 17, or FIG. 18. FIG. 28A shows an azide-functionalized surface of a solid support following isolation of amino acid complexes on the surface according to methods described herein. FIG. 28B shows the surface subsequently grafted with a primer lawn. The primers may comprise sequences that are the same or complementary to initiation oligo sequences, such as a unifying assembly sequence (U-AS). They may contain sequences that are the same or complementary to location oligo sequences. They may comprise compound structures, such as those depicted in FIG. 15B. In FIG. 28C an initiation oligo is randomly seeded onto the surface, in analogy to random seeding of oligonucleotides onto a next-generation sequencing flowcell surface, such as an Illumina NGS flowcell surface. Note that the initiation oligo may be randomly seeded, or may in some embodiments be immobilized near the anchor point of a peptide analyte as described for hyb oligos using process steps in analogy to those described herein. The density may be controlled by limiting dilution. An exemplary downward projection of the surface is shown as FIG. 28J, where the anchor position of immobilized protein analytes may overlap with the positions randomly adopted by the initiation oligo. Locations of the amino acid complexes for 4 immobilized protein analytes lie within the circular areas. Initiation oligos randomly adopt a position centered in the square areas.
Any immobilized isolated amino acid complex may be assembled using an initiation oligo that is within proximity. It is apparent that patterned substrates and associated techniques can improve efficiency.
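The effect of seeding density on proximity can be estimated with a simple statistical sketch. It assumes, for illustration only, that initiation oligos land independently and uniformly on the surface, so that the number falling within a capture radius of an analyte anchor point follows a Poisson distribution. The function name, the units, and the notion of a single fixed capture radius are assumptions and are not taken from the disclosure.

```python
import math


def initiator_capture_probabilities(density_per_um2, capture_radius_um):
    """Poisson estimate of how many initiation oligos lie near one analyte.

    Assumption: uniform, independent random seeding at `density_per_um2`
    oligos per square micrometre; an oligo is "within proximity" if it
    lies within `capture_radius_um` of the analyte anchor point.
    """
    mean = density_per_um2 * math.pi * capture_radius_um ** 2
    p_none = math.exp(-mean)
    p_exactly_one = mean * p_none
    return {
        "mean": mean,
        "p_none": p_none,
        "p_exactly_one": p_exactly_one,
        "p_two_or_more": 1.0 - p_none - p_exactly_one,
    }


# Compare a dilute and a denser seeding for the same capture radius.
for density in (0.5, 5.0):
    print(density, initiator_capture_probabilities(density, 0.25))
```

Under these assumptions, lower densities reduce the chance that two initiators serve one analyte but increase the chance that none is nearby, which is the trade-off that limiting dilution and patterned substrates are intended to manage.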
[0199] A bridge amplification-like process proceeds as illustrated in FIG. 28D-28F. In the first steps of bridge amplification, priming occurs as the opposite end of an extended fragment bends over and “bridges” to another complementary oligo on the surface. Repeated denaturation and extension cycles (not shown, but similar to PCR) result in amplicons that may be further analyzed using next-generation DNA sequencing technologies. FIG. 28G-FIG. 28I show one aspect wherein a memory oligo is assembled by repeatedly “bridging” to a complementary segment of the recode block on the surface, and undergoing polymerase extension. In some embodiments, such as in FIG. 15B, polymerase with strand displacement capability disassembles the recode block information as its information is transferred to the growing memory oligo, avoiding repeated inclusion into a growing memory oligo.
[0200] In some embodiments, Illumina patterning and ExAmp technology may be employed to improve efficiency. Patterned surfaces may aid in isolation of proteins and may reduce cross-talk during memory oligo assembly.
[0201] In some embodiments, additional steps reduce or eliminate repeated copying of the same cycle and amino acid information during the assembly process. For example, in the case where a non-polymerizable molecule, such as PNA, holds the complement, and the complement is displaced, blocker oligonucleotides may be introduced to hybridize to the freed-up PNA sequence, effectively removing it from further interaction with a memory oligo.
[0202] Exemplary methods to efficiently create memory oligos comprising the operations of Initiation and Extension are provided as Examples below.
Memory Oligo Readout
[0203] Disclosed herein, in some embodiments, are methods that include a memory oligonucleotide. The memory oligonucleotide may include multiple recode blocks, reverse complement of multiple recode blocks, or one or more recode blocks and the reverse complement of one or more recode blocks. The memory oligonucleotide may be used in a method described herein, such as a method for determining protein information such as amino acid position or identity.
[0204] In some embodiments, obtaining the sequence information for the recode block comprises performing sequencing. In some embodiments, obtaining the sequence information for the memory oligonucleotide comprises performing sequencing. The memory oligonucleotide may include a recode block or multiple recode blocks. In some embodiments, the sequencing comprises Sanger sequencing. In some embodiments, the sequencing comprises Next-Generation Sequencing. In some embodiments, the sequencing comprises pyrosequencing, sequencing by synthesis, sequencing by ligation, Illumina sequencing, Ion Torrent sequencing, Pacific Biosciences sequencing, Oxford Nanopore sequencing, SOLiD sequencing, nanopore sequencing, Single Molecule Real-Time (SMRT) sequencing, 454 sequencing, Complete Genomics sequencing, Helicos sequencing, MinION sequencing, direct RNA sequencing, Linked-Read sequencing, mate-pair sequencing, or targeted gene sequencing.
[0205] In some embodiments, the sequence information for the memory oligonucleotide is obtained by sequencing. In some embodiments, the sequence information for the memory oligonucleotide is obtained by Sanger sequencing. In some embodiments, the sequence information for the memory oligonucleotide is obtained by Next-Generation Sequencing. In some embodiments, the sequence information for the memory oligonucleotide is obtained by pyrosequencing. In some embodiments, the sequence information for the memory oligonucleotide is obtained by sequencing by synthesis. In some embodiments, the sequence information for the memory oligonucleotide is obtained by sequencing by ligation. In some embodiments, the sequence information for the memory oligonucleotide is obtained by Illumina sequencing. In some embodiments, the sequence information for the memory oligonucleotide is obtained by Ion Torrent sequencing. In some embodiments, the sequence information for the memory oligonucleotide is obtained by Pacific Biosciences sequencing. In some embodiments, the sequence information for the memory oligonucleotide is obtained by Oxford Nanopore sequencing. In some embodiments, the sequence information for the memory oligonucleotide is obtained by SOLiD sequencing. In some embodiments, the sequence information for the memory oligonucleotide is obtained by nanopore sequencing. In some embodiments, the sequence information for the memory oligonucleotide is obtained by Single Molecule Real-Time (SMRT) sequencing. In some embodiments, the sequence information for the memory oligonucleotide is obtained by 454 sequencing. In some embodiments, the sequence information for the memory oligonucleotide is obtained by Complete Genomics sequencing. In some embodiments, the sequence information for the memory oligonucleotide is obtained by Helicos sequencing. In some embodiments, the sequence information for the memory oligonucleotide is obtained by MinION sequencing.
In some embodiments, the sequence information for the memory oligonucleotide is obtained by direct RNA sequencing. In some embodiments, the sequence information for the memory oligonucleotide is obtained by Linked-Read sequencing. In some embodiments, the sequence information for the memory oligonucleotide is obtained by mate-pair sequencing. In some embodiments, the sequence information for the memory oligonucleotide is obtained by targeted gene sequencing.
[0206] Some embodiments include aggregation of information from only a subset of cycles. Some embodiments include analysis of peptide information that does not include all amino acids of a peptide, for example using sequencing information generated through a recode process (e.g. from a memory oligonucleotide formed from sequences of recode tags and cycle tags) that does not include all amino acids of the peptide. In some embodiments, only some amino acids of a protein are recoded into recode blocks. A memory oligo may include recode blocks corresponding to all, or only some of the amino acids, of a peptide. The missing amino acid information may be taken into account when reconstructing a peptide, or identifying a peptide. Some memory oligonucleotides include recode blocks with recode tag and cycle tag sequences.
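Reconstruction from a memory oligo that covers only a subset of cycles may be sketched as follows. The tag sequences, the fixed 8-base block layout (a 4-base cycle tag followed by a 4-base recode tag), the use of cycle 1 for the first residue, and all function names are hypothetical and chosen only to make the example runnable; the disclosure does not specify them.

```python
# Hypothetical codebooks: cycle tag -> cycle number, recode tag -> amino acid.
CYCLE_TAGS = {"AAAC": 1, "AACA": 2, "ACAA": 3, "CAAA": 4, "AAGG": 5}
RECODE_TAGS = {"GTGT": "K", "TGTG": "A", "GGTT": "L", "TTGG": "S"}
BLOCK_LEN = 8  # assumed layout: 4-base cycle tag + 4-base recode tag


def decode_memory_oligo(seq):
    """Return {cycle: amino_acid} for every recognizable block in `seq`."""
    calls = {}
    for start in range(0, len(seq) - BLOCK_LEN + 1, BLOCK_LEN):
        block = seq[start:start + BLOCK_LEN]
        cycle = CYCLE_TAGS.get(block[:4])
        amino_acid = RECODE_TAGS.get(block[4:])
        if cycle is not None and amino_acid is not None:
            calls[cycle] = amino_acid
    return calls


def partial_sequence(calls, n_cycles, unknown="X"):
    """Place decoded residues by cycle; cycles with no block become `unknown`."""
    return "".join(calls.get(c, unknown) for c in range(1, n_cycles + 1))


def matching_candidates(partial, candidates, unknown="X"):
    """Keep candidate peptides that agree with every known position."""
    return [
        c for c in candidates
        if len(c) >= len(partial)
        and all(p == unknown or p == q for p, q in zip(partial, c))
    ]


# Blocks for cycles 5, 1, and 3 only, in arbitrary order; cycles 2 and 4 are missing.
oligo = "AAGGTTGG" + "AAACGTGT" + "ACAAGGTT"
partial = partial_sequence(decode_memory_oligo(oligo), n_cycles=5)
print(partial)  # KXLXS
print(matching_candidates(partial, ["KALAS", "KALAK", "SALAS", "KGLGS"]))
```

In this sketch the missing cycles are carried forward as unknown positions rather than discarded, so that identification against a candidate list can still use the positions that were recoded.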
Binding Agent
[0207] Disclosed herein, in some embodiments, are binding agents. The binding agent may include a recode tag and a binding moiety. The recode tag may include a recode nucleic acid. The binding agent may be used in a method described herein, such as a method for determining protein information such as amino acid location or identity.
[0208] In some embodiments, the binding moiety comprises a peptide. In some embodiments, the binding moiety comprises an antibody. In some embodiments, the antibody comprises a monoclonal antibody, polyclonal antibody, an antibody fragment, an antibody derivative, a bispecific antibody, a nanobody, or a single-domain antibody. In some embodiments, the antibody comprises an antibody fragment comprising a Fab, F(ab')2, or scFv. In some embodiments, the binding moiety comprises an antibody derivative comprising an antibody-drug conjugate, a synthetic antibody, an antibody mimic, an engineered protein binder comprising a DARPin or Affibody, an aptamer, a ligand for a peptide receptor, a small molecule, a lectin, an enzyme substrate, an RNA molecule, or a DNA molecule. In some embodiments, the binding moiety comprises an antibody or a fragment thereof, or an aptamer.
[0209] In some embodiments, the binding agent includes an antibody. In some embodiments, the binding agent includes a monoclonal antibody. In some embodiments, the binding agent includes a polyclonal antibody. In some embodiments, the binding agent includes an antibody fragment, such as Fab, F(ab')2, or scFv. In some embodiments, the binding agent includes an antibody derivative, such as an antibody-drug conjugate. In some embodiments, the binding agent includes a bispecific antibody. In some embodiments, the binding agent includes a synthetic antibody or antibody mimic. In some embodiments, the binding agent includes an aptamer. In some embodiments, the binding agent includes a nanobody or single-domain antibody. In some embodiments, the binding agent includes an engineered protein binder, such as a DARPin or Affibody. In some embodiments, the binding agent includes a peptide. In some embodiments, the binding agent includes a ligand for a peptide receptor. In some embodiments, the binding agent includes a small molecule. In some embodiments, the binding agent includes a lectin. In some embodiments, the binding agent includes an enzyme substrate. In some embodiments, the binding agent includes an RNA molecule. In some embodiments, the binding agent includes a DNA molecule.
[0210] In some embodiments, the binding agent further comprises a second tag. In some embodiments, the second tag comprises a fluorescent tag for visualization, a biotin tag for interaction with streptavidin, a radioactive tag for detection, a quantum dot for visualization, a mass spectrometry-based detection tag, a chromogenic tag for visualization, a chemiluminescent tag for detection, a photoacoustic imaging tag, a single-molecule imaging tag, or a dual-modality imaging tag.
[0211] In some embodiments, the binding agent is labeled with a second tag for visualization. In some embodiments, the binding agent is labeled with a fluorescent tag for visualization. In some embodiments, the binding agent is labeled with a biotin tag for subsequent interaction with streptavidin. In some embodiments, the binding agent is labeled with a radioactive tag for detection. In some embodiments, the binding agent is labeled with a quantum dot for visualization. In some embodiments, the binding agent is labeled with a second tag for mass spectrometry-based detection. In some embodiments, the binding agent is labeled with a chromogenic tag for visualization. In some embodiments, the binding agent is labeled with a chemiluminescent tag for detection. In some embodiments, the binding agent is labeled with a second tag for photoacoustic imaging. In some embodiments, the binding agent is labeled with a second tag for single-molecule imaging. In some embodiments, the binding agent is labeled with a second tag for dual-modality imaging.
[0212] In some embodiments, the binding moiety binds to any of the following amino acids: Ala, Arg, Asn, Asp, Cys, Gln, Glu, Gly, His, Ile, Leu, Lys, Met, Phe, Pro, Ser, Thr, Trp, Tyr, or Val. In some embodiments, the binding moiety binds to Ala. In some embodiments, the binding moiety binds to Arg. In some embodiments, the binding moiety binds to Asn. In some embodiments, the binding moiety binds to Asp. In some embodiments, the binding moiety binds to Cys. In some embodiments, the binding moiety binds to Gln. In some embodiments, the binding moiety binds to Glu. In some embodiments, the binding moiety binds to Gly. In some embodiments, the binding moiety binds to His. In some embodiments, the binding moiety binds to Ile. In some embodiments, the binding moiety binds to Leu. In some embodiments, the binding moiety binds to Lys. In some embodiments, the binding moiety binds to Met. In some embodiments, the binding moiety binds to Phe. In some embodiments, the binding moiety binds to Pro. In some embodiments, the binding moiety binds to Ser. In some embodiments, the binding moiety binds to Thr. In some embodiments, the binding moiety binds to Trp. In some embodiments, the binding moiety binds to Tyr. In some embodiments, the binding moiety binds to Val. In some embodiments, the binding moiety binds to a combination of any of the aforementioned amino acids. Multiple binding agents may be used, with various binding agents having binding moieties that bind to distinct amino acids, and having distinct recode tags that correspond with the distinct amino acids. Multiple binding agents may be used, with various binding agents having binding moieties that bind to multiple amino acids or groups of amino acids, or that bind some amino acids with marginal preference over others. Multiple binding agents may be used with binding agents having a combination of properties, including some binding to distinct amino acids and others binding to groups of amino acids.
[0213] In some embodiments, the binding moiety binds to a dipeptide. In some embodiments, the binding moiety binds to a tripeptide. In some embodiments, the binding moiety binds to any of the following: a natural amino acid, a post-translationally modified (PTM) amino acid, a derivatized version of an amino acid, a derivatized or stabilized version of a post-translationally modified amino acid, a synthetic amino acid, an amino acid with a specific side chain, an amino acid with a phosphorylated side chain, an amino acid with a glycosylated side chain, an amino acid with a methylation modification, or a D-amino acid. In some embodiments, the binding moiety binds to a combination of any of the aforementioned amino acids. In some embodiments, the binding moiety binds to a group of amino acids. For example, a binding moiety may bind to several amino acids that share a property, e.g. all positively charged amino acids, or all amino acids bearing a phosphorylation PTM. In some embodiments, the binding moiety is weakly specific for an amino acid or group of amino acids. For example, in some embodiments, the binding moiety has only a mild preference for one amino acid or group of amino acids over another. In some embodiments, a PTM such as phosphotyrosine, phosphothreonine, or phosphoserine is recognized. The binding moiety may bind to a phosphorylated amino acid. The binding moiety may bind to a glycosylated amino acid. The binding moiety may bind to a methylated amino acid. The binding moiety may bind to a ubiquitinylated amino acid. Multiple different binding moieties may be used in a plurality of binding agents, and each binding agent may include a recode tag corresponding with each of the multiple different binding moieties. The binding moiety may bind to a derivatized or stabilized version of an amino acid, post-translationally modified amino acid, or other natural or synthetic amino acid.
The binding moiety may bind to an amino acid that has undergone sumoylation, prenylation, nitrosylation, sulfation, ADP-ribosylation, palmitoylation, myristoylation, carboxylation, hydroxylation, or other modification. The binding moiety may bind to a group or class of said modifications or amino acids with similar modifications. For example, the binding moiety may bind to a group such as any amino acid having a certain PTM, such as all phosphorylated amino acids.
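When some binding agents recognize a single amino acid and others recognize a group, each recode tag may be treated as narrowing a position to a set of possible residues rather than to one residue. The sketch below illustrates that idea. The tag names, the group memberships (for example, treating K, R, and H as the positively charged group), and the function names are assumptions made for illustration and are not taken from the disclosure.

```python
# Hypothetical mapping from recode tag to the residues its binder may recognize.
BINDER_SPECIFICITY = {
    "tag_K": frozenset("K"),           # binder specific for a single amino acid
    "tag_positive": frozenset("KRH"),  # group binder: positively charged residues
    "tag_acidic": frozenset("DE"),     # group binder: acidic residues
    "tag_small": frozenset("AGS"),     # group binder: small residues
}


def is_consistent(candidate, observations):
    """True if `candidate` fits every observed tag.

    `observations` maps a 1-based cycle number to the recode tag seen at
    that cycle; cycles without an observation place no constraint.
    """
    for cycle, tag in observations.items():
        if cycle > len(candidate):
            return False
        if candidate[cycle - 1] not in BINDER_SPECIFICITY[tag]:
            return False
    return True


def filter_candidates(candidates, observations):
    """Return the candidate peptides compatible with all observations."""
    return [c for c in candidates if is_consistent(c, observations)]


observations = {1: "tag_positive", 2: "tag_small", 4: "tag_acidic"}
print(filter_candidates(["KAVD", "RGLE", "KAVK", "DAVE"], observations))
# ['KAVD', 'RGLE']
```

Group binders leave more candidates standing than single-residue binders, so in practice the two kinds may be combined, as the passage above describes.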
Solid Support
[0214] Disclosed herein, in some embodiments, are solid supports. A peptide may be coupled to the solid support. A chemically-reactive conjugate may bind to the solid support. The solid support may be used in a method described herein, such as a method for determining protein information such as amino acid location or identity. In some embodiments, the peptide is immobilized by being bound to a solid support. In some embodiments, peptides are immobilized by being bound to a solid support.
[0215] In some embodiments, the solid support comprises a bead, a plate, or a chip. In some embodiments, the solid support comprises a glass slide, silica, a resin, a gel, a membrane, polystyrene, a metal, nitrocellulose, a mineral, plastic, polyacrylamide, latex, or ceramic. In some embodiments, the solid support comprises a magnetic bead, a glass slide, a microarray chip, a nanoparticle, a silica gel, a resin, a polystyrene bead, a gold plate, a silicon chip, a nitrocellulose membrane, a quartz slide, a multiwell plate, a cellulose paper, an agarose bead, a plastic bead, a polyacrylamide gel, a magnetic nanoparticle, a latex bead, or a ceramic bead. In some embodiments, the solid support is contained within a flow cell or within a well plate.
[0216] In some embodiments, the solid support is a bead, a plate, or a chip. In some embodiments, the solid support is a magnetic bead. In some embodiments, the solid support is a glass slide. In some embodiments, the solid support is a microarray chip. In some embodiments, the solid support is a nanoparticle. In some embodiments, the solid support is a silica gel. In some embodiments, the solid support is a resin. In some embodiments, the solid support is a polystyrene bead. In some embodiments, the solid support is a gold plate. In some embodiments, the solid support is a silicon chip. In some embodiments, the solid support is a nitrocellulose membrane. In some embodiments, the solid support is a quartz slide. In some embodiments, the solid support is a multi-well plate. In some embodiments, the solid support is a cellulose paper. In some embodiments, the solid support is an agarose bead. In some embodiments, the solid support is a plastic bead. In some embodiments, the solid support is a polyacrylamide gel. In some embodiments, the solid support is a magnetic nanoparticle. In some embodiments, the solid support is a latex bead. In some embodiments, the solid support is a ceramic bead. In some embodiments, the solid support is contained within a flow cell. In some embodiments, the solid support is contained within a well plate.
[0217] In some embodiments, the solid support comprises a bead, plate, chip, polymer, metal, or glass. In some embodiments, the solid support is a bead. In some embodiments, the solid support is a plate. In some embodiments, the solid support is a chip. In some embodiments, the solid support is composed of a polymer. In some embodiments, the solid support is composed of a metal. In some embodiments, the solid support is composed of glass.
Peptides
[0218] Disclosed herein, in some embodiments, are peptides. The peptide may be the subject of a method which seeks to obtain information about the peptide, such as information on an identity or location of one or more amino acids of the peptide. The peptide may be included in a method described herein, such as a method for determining protein information such as amino acid location or identity.
[0219] In some embodiments, the peptide comprises a polypeptide or a protein. In some embodiments, the peptide comprises a hormone, neurotransmitter, enzyme, antibody, viral protein, bacterial protein, synthetic peptide, bioactive peptide, peptide hormone, oligopeptide, polypeptide, fusion protein, cyclic peptide, branched peptide, recombinant protein, tumor marker, therapeutic peptide, antigenic peptide, or signaling peptide.
[0220] In some embodiments, the peptide is a polypeptide or a protein. In some embodiments, the peptide is a hormone. In some embodiments, the peptide is a neurotransmitter. In some embodiments, the peptide is an enzyme. In some embodiments, the peptide is an antibody. In some embodiments, the peptide is a viral protein. In some embodiments, the peptide is a bacterial protein. In some embodiments, the peptide is a synthetic peptide. In some embodiments, the peptide is a bioactive peptide. In some embodiments, the peptide is a peptide hormone. In some embodiments, the peptide is an oligopeptide. In some embodiments, the peptide is a polypeptide. In some embodiments, the peptide is a fusion protein. In some embodiments, the peptide is a cyclic peptide. In some embodiments, the peptide is a branched peptide. In some embodiments, the peptide is a recombinant protein. In some embodiments, the peptide is a tumor marker. In some embodiments, the peptide is a therapeutic peptide. In some embodiments, the peptide is an antigenic peptide. In some embodiments, the peptide is a signaling peptide.
[0221] Disclosed herein, in some embodiments, are peptides coupled to a solid support. In some embodiments, the peptide is coupled to the solid support such that a N-terminal amino acid residue of the peptide is not directly coupled to the solid support. For example, the peptide may be coupled directly by a C-terminal amino acid residue to the solid support, or may be coupled directly by an internal (e.g. non-N-terminal and non-C-terminal) amino acid residue to the solid support. In some embodiments, the N-terminus of the peptide is linked or coupled indirectly to the solid support via a chain of other amino acids of the peptide.
[0222] In some embodiments, the peptide is coupled to the solid support such that an N-terminal amino acid residue is exposed to reaction conditions. For example, the N-terminal amino acid residue may be on an exterior of the peptide. In some embodiments, the N-terminal amino acid residue exposed to reaction conditions is exposed to a solvent.
[0223] In some embodiments, the peptide is derived from a human, plant, bacterium, fungus, animal, virus, mammal, bird, marine organism, insect, reptile, amphibian, synthetic source, protist, yeast, primate, cell culture, parasite, patient sample, environmental sample, or genetically modified organism.
[0224] In some embodiments, the peptide is derived from a cell lysate, blood sample, plasma sample, serum sample, tissue biopsy, saliva sample, urine sample, cerebrospinal fluid sample, sweat sample, synovial fluid sample, fecal sample, gut microbiome sample, environmental water sample, soil sample, bacterial culture, viral culture, organoid, tumor biopsy, sputum sample, or hair sample.
[0225] In some embodiments, the peptide is derived from a human. In some embodiments, the peptide is derived from a plant. In some embodiments, the peptide is derived from a bacterium. In some embodiments, the peptide is derived from a fungus. In some embodiments, the peptide is derived from an animal. In some embodiments, the peptide is derived from a virus. In some embodiments, the peptide is derived from a mammal. In some embodiments, the peptide is derived from a bird. In some embodiments, the peptide is derived from a marine organism. In some embodiments, the peptide is derived from an insect. In some embodiments, the peptide is derived from a reptile. In some embodiments, the peptide is derived from an amphibian. In some embodiments, the peptide is derived from a synthetic source. In some embodiments, the peptide is derived from a protist. In some embodiments, the peptide is derived from a yeast. In some embodiments, the peptide is derived from a primate. In some embodiments, the peptide is derived from a cell culture. In some embodiments, the peptide is derived from a parasite. In some embodiments, the peptide is derived from a patient sample. In some embodiments, the peptide is derived from an environmental sample. In some embodiments, the peptide is derived from a genetically modified organism.
[0226] In some embodiments, the peptide is derived from a cell lysate. In some embodiments, the peptide is derived from a plasma sample. In some embodiments, the peptide is derived from a tissue biopsy. In some embodiments, the peptide is derived from a serum sample. In some embodiments, the peptide is derived from a saliva sample. In some embodiments, the peptide is derived from a urine sample. In some embodiments, the peptide is derived from a cerebrospinal fluid sample. In some embodiments, the peptide is derived from a sweat sample. In some embodiments, the peptide is derived from a synovial fluid sample. In some embodiments, the peptide is derived from a fecal sample. In some embodiments, the peptide is derived from a gut microbiome sample. In some embodiments, the peptide is derived from an environmental water sample. In some embodiments, the peptide is derived from a soil sample. In some embodiments, the peptide is derived from a bacterial culture. In some embodiments, the peptide is derived from a viral culture. In some embodiments, the peptide is derived from an organoid. In some embodiments, the peptide is derived from a tumor biopsy. In some embodiments, the peptide is derived from a sputum sample. In some embodiments, the peptide is derived from a hair sample.
[0227] In some embodiments, the peptide is associated with a disease state. In some embodiments, the peptide is associated with a cancerous disease state, an autoimmune disease state, a neurodegenerative disease state, a cardiovascular disease state, a metabolic disease state, a genetic disease state, a viral infection, a bacterial infection, a fungal infection, a parasitic infection, an inflammatory condition, an endocrine disorder, an immunodeficiency, a respiratory disorder, a skin disorder, a gastrointestinal disorder, a psychiatric disorder, an aging process, a muscular disorder, or a renal disorder.
[0228] In some embodiments, the peptide is associated with a specific disease state. In some embodiments, the peptide is associated with a cancerous disease state. In some embodiments, the peptide is associated with an autoimmune disease state. In some embodiments, the peptide is associated with a neurodegenerative disease state. In some embodiments, the peptide is associated with a cardiovascular disease state. In some embodiments, the peptide is associated with a metabolic disease state. In some embodiments, the peptide is associated with a genetic disease state. In some embodiments, the peptide is associated with a viral infection. In some embodiments, the peptide is associated with a bacterial infection. In some embodiments, the peptide is associated with a fungal infection. In some embodiments, the peptide is associated with a parasitic infection. In some embodiments, the peptide is associated with an inflammatory condition. In some embodiments, the peptide is associated with an endocrine disorder. In some embodiments, the peptide is associated with an immunodeficiency. In some embodiments, the peptide is associated with a respiratory disorder. In some embodiments, the peptide is associated with a skin disorder. In some embodiments, the peptide is associated with a gastrointestinal disorder. In some embodiments, the peptide is associated with a psychiatric disorder. In some embodiments, the peptide is associated with an aging process. In some embodiments, the peptide is associated with a muscular disorder. In some embodiments, the peptide is associated with a renal disorder.
[0229] In some embodiments, the peptide is a biomarker for a disease or condition, a drug target for a disease or condition, an antigen for the development of a vaccine, used for patient stratification in a clinical trial, a therapeutic agent for a disease or condition, used in the production of a biosimilar or generic drug, used for evaluating the efficacy of a drug treatment, used in personalized medicine for a specific disease or condition, used in immuno-oncology research, used in the validation of a diagnostic test, used in the development of a peptide-based therapeutic, a component of a cell signaling pathway, used in a structure-activity relationship study, used in the development of an immunoassay, used in the study of protein-protein interactions, used in the design of a drug delivery system, used in a high-throughput screening assay, used in a pharmacokinetic study, used in the formulation of a nutraceutical product, used in the development of a probiotic product, or used in a proteomics study.
[0230] In some embodiments, the peptide is a biomarker for a disease or condition. In some embodiments, the peptide is a drug target for a specific disease or condition. In some embodiments, the peptide is an antigen for the development of a vaccine. In some embodiments, the peptide is used for patient stratification in a clinical trial. In some embodiments, the peptide is a therapeutic agent for a specific disease or condition. In some embodiments, the peptide is used in the production of a biosimilar or generic drug. In some embodiments, the peptide is used for evaluating the efficacy of a drug treatment. In some embodiments, the peptide is used in personalized medicine for a specific disease or condition. In some embodiments, the peptide is used in immuno-oncology research. In some embodiments, the peptide is used in the validation of a diagnostic test. In some embodiments, the peptide is used in the development of a peptide-based therapeutic. In some embodiments, the peptide is a component of a cell signaling pathway. In some embodiments, the peptide is used in a structure-activity relationship study. In some embodiments, the peptide is used in the development of an immunoassay. In some embodiments, the peptide is used in the study of protein-protein interactions. In some embodiments, the peptide is used in the design of a drug delivery system. In some embodiments, the peptide is used in a high-throughput screening assay. In some embodiments, the peptide is used in a pharmacokinetic study. In some embodiments, the peptide is used in the formulation of a nutraceutical product. In some embodiments, the peptide is used in the development of a probiotic product. In some embodiments, the peptide is used in a proteomics study.
Chemically-Reactive Conjugates
[0231] Disclosed herein, in some embodiments, are chemically-reactive conjugates (CRCs). The CRC may be used in a method described herein, such as a method for determining protein information such as amino acid sequence, identity, or location. The chemically-reactive conjugate (CRC) may include a nucleic acid sequence tag. The chemically-reactive conjugate may include a reactive moiety. The reactive moiety may bind and cleave an N-terminal amino acid residue from a peptide. The chemically-reactive conjugate may include an immobilizing moiety. The immobilizing moiety may bind to a solid support, and thus may be useful for immobilization to a solid support. The chemically-reactive conjugate may include (A) a cycle tag; (B) a reactive moiety for binding and cleaving an N-terminal amino acid residue from a peptide; and (C) an immobilizing moiety for immobilization to a solid support.
[0232] The CRC may include the following structure: [structure image not reproduced; A, B, and C on separate arms joined through optional linkers LA, LB, and LC] (Formula I). The CRC may include the following structure: A-LAB-B-LBC-C (Formula II). The CRC may include the structure of Formula I or Formula II, or any suitable structure connecting A, B, and C. In either formula, A is, or includes, a cycle tag, B is, or includes, a reactive moiety (e.g. for binding and cleaving an N-terminal amino acid residue from a peptide), and C is, or includes, an immobilizing moiety (e.g. for immobilization to a solid support). LA, LB, and LC are optional linkers in Formula I. Further, in Formula I, the junction of the arms may comprise a central moiety. LAB and LBC are optional linkers in Formula II. Additional arms or aspects may be included or added to Formula I or II.
[0233] The chemically reactive conjugate may include a central moiety. The central moiety may be or include a central carbon. The central carbon may be attached to other carbons, such as to 3 other carbons, and link to the arms of the chemically-reactive conjugate. The central moiety may include a heterocycle, a carbocycle, or a trivalent nitrogen. The trivalent nitrogen may include an amine. The amine may include a tertiary amine. The central moiety may include a trivalent boron, a tri- or higher valency phosphorus, a tetravalent silicon, a polyhedral oligomeric silsesquioxane (POSS), a siloxane, a branched siloxane, a polyether, a phosphazene, a phosphonium, an ammonium, an imidazolium, a methane, a propane, a butane, a pentane, a hexane, a C1-C24 alkyl, a benzene, a toluene, a xylene, a phenol, an N,N-disubstituted aniline, an anisole, a trihydroxybenzene, a benzenetricarboxylic acid, a phthalic acid, a trimesic acid, a cyclopropane, a glycol, a glycerol, an ethylene glycol, an oligoethylene glycol, a branched oligoethylene glycol, a multi-arm oligoethylene glycol, a dendrimer, a propylene glycol, an oligopropylene glycol, a trimethylolpropane, a pentaerythritol, a dipentaerythritol, a sugar, a glycoside, a saccharide, a glucose, a fructose, a furanose, a galactose, a mannose, a cyclohexane, a cyclooctane, a cycloheptane, a cyclopentane, a cyclobutene, a cyclononane, a cyclohexene, a cyclobutene, a cyclopentene, a cyclooctene, a cyclononane, an adamantane, a naphthalene, an anthracene, a pyrene, an annulene, a pyridine, an N-substituted piperidine, an N,N-disubstituted piperazine, a thiophene, an indole, a pyrazine, an isoquinoline, a pyran, a furan, a pyrimidine, a purine, an oxazole, a benzofuran, a carbazole, a xanthene, a coumarin, an oxazine, a benzothiophene, a benzoxazole, an acridine, a dibenzofuran, a fluorene, an N-substituted azepine, an N-substituted azocine, a thiocane, an N-substituted azonane, a spiro compound, an indolizine, a benzimidazole, an isoindole, an azoindole, a cyclotrisiloxane, a cyclotetrasiloxane, a polycyclic aromatic hydrocarbon, an alkene, a biphenyl, a terphenyl, a triphenylmethane, a decalin, a phenanthrene, a phosphonate, a trisubstituted phosphine, a phosphonic acid, a phosphite, a borate, a norbornane, an oxanorbornene, a norbornene, an oxanorbornene, a dioxane, a di-tertiaryamine, a tri-tertiaryamine, a tetra-tertiary amine, an amide, an N,N-dialkylamide, a sulfonamide, a phosphonamide, a phthalimide, a gallate, an ether, a thioether, a thioamide, a mesitylene, a carboxylic acid functional molecule, a diene, a cyanurate, a guanidine, a urea, a substituted urea, a thiourea, a hydrazone, an oxime, a dibenzocyclooctene, a triazole, or an ester. The central moiety may join the A, B, and C elements of the chemically-reactive conjugate.
[0234] The chemically reactive conjugate may comprise a branched or modified oligonucleotide having one or more modified nucleobases or functional groups at the 5’, 3’ or internal positions, wherein they function as immobilization moiety(s), reactive moiety(s) or ITC-conjugate reactive moiety(s). Additionally, modified nucleobases may be used to join an N-terminal amino acid reactive group, such as a modified PITC or ITC-conjugate. Any set of nucleobases of the CRC may function as a linker, a spacer, and/or the cycle tag. The oligonucleotides may be branched or cyclic or possess secondary structures such as hairpins. There are many possible structures for the chemically reactive conjugate. It may be a synthetic (non-natural) structure, or a natural (biobased or biosynthetic) structure, or a synthetically modified version thereof. The chemically reactive conjugate may comprise a range of nucleotides, including but not limited to 2’F-arabinonucleotides (2’F-ANA), 2’F-ribonucleotides (2’F-RNA), threose based nucleotides (TNA), peptide nucleic acids (PNA), ribonucleotides (RNA), pyrimidine (C and T) rich or exclusively pyrimidine sequences of deoxyribonucleotides (DNA), 7-deazaadenine containing sequences, locked nucleic acids (LNA), Xeno nucleic acids (XNA), glycol nucleic acids (GNA), cyclohexene nucleic acid (CeNA), sequences containing phosphorothioate linkages, 2’O-alkylribonucleotides and combinations, chimeras or versions thereof. Further, these chemically reactive conjugates may comprise this range of nucleotides wherein the bases and/or backbone functional groups (e.g., extracyclic amines, phosphate, thiophosphate) have protecting groups present. Nucleotides which possess enhanced stability to the conditions of peptide sequencing, for example strong acid (Watts, J.K.; Katolik, A.; Viladoms, J.; Damha, M.J. Org. Biomol. Chem., 2009, 7, 1904-1910), may be preferred embodiments. These include, but are not limited to, 2’F-arabinonucleotides (2’F-ANA), 2’F-ribonucleotides (2’F-RNA), threose based nucleotides (TNA), peptide nucleic acids (PNA), ribonucleotides (RNA), pyrimidine rich or exclusively pyrimidine sequences of deoxyribonucleotides (DNA), 7-deazaadenine containing sequences, 2’O-alkylribonucleotides and the nucleotides described in WO1994015619A1, which is incorporated by reference, as well as protected versions of all of the aforementioned.
[0235] In some embodiments, the chemically-reactive conjugate is prepared by an organic synthesis method. Some examples of multicomponent reaction schemes are shown in FIGS. 38-42B.
[0236] Disclosed herein, in some embodiments, are chemically-reactive conjugates comprising (A) a cycle tag; (B) a reactive moiety; and (C) an immobilizing moiety. In some embodiments, (A), (B), and (C) are oriented linearly in relation to each other. In some embodiments, (A), (B), and (C) are oriented in any of the following orders: (A)-(B)-(C) (like Formula II), (A)-(C)-(B), or (B)-(A)-(C). In some embodiments, (A), (B), and (C) are arranged linearly like Formula II and include optional linkers between (A), (B), and (C), but in the following order: (A)-(C)-(B). In some embodiments, (A), (B), and (C) are arranged linearly like Formula II and include optional linkers between (A), (B), and (C), but in the following order: (B)-(A)-(C). In some embodiments, each of (A), (B) and (C) is on an independent arm in relation to the others.
[0237] In some embodiments, the CRC is linear in the order (A)-(B)-(C). In some embodiments, the CRC is linear in the order (A)-(C)-(B). In some embodiments, the CRC is linear in the order (B)-(A)-(C). In some embodiments, each of (A), (B) and (C) of the CRC is on an independent arm.
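As an illustrative aside, the three linear orders recited above can be enumerated with a short Python sketch. It assumes, purely for illustration, that a linear chain and its mirror image (for example (A)-(B)-(C) and (C)-(B)-(A)) describe the same arrangement; that equivalence and the function name are assumptions of this sketch and are not stated in the disclosure.

```python
from itertools import permutations


def distinct_linear_orders(elements=("A", "B", "C")):
    """List linear arrangements, treating a chain and its reverse as one order.

    The reversal equivalence is an assumption made for this illustration.
    """
    seen = set()
    orders = []
    for perm in permutations(elements):
        # Use the lexicographically smaller of the chain and its mirror image
        # as the canonical representative.
        key = min(perm, tuple(reversed(perm)))
        if key not in seen:
            seen.add(key)
            orders.append("-".join(f"({e})" for e in key))
    return orders


print(distinct_linear_orders())
```

Under that assumption the sketch yields exactly three orders, matching the (A)-(B)-(C), (A)-(C)-(B), and (B)-(A)-(C) arrangements listed above.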
[0238] Some embodiments include a cleavable group between (A) and (B), between (B) and (C), between (A) and (C), between (A) and (B+C), between (B) and (A+C), or between (C) and (A+B), or any combination thereof. Some embodiments include a cleavable group between (A) and (B). Some embodiments include a cleavable group between (B) and (C). Some embodiments include a cleavable group between (A) and (C). Some embodiments include a cleavable group between (A) and (B+C). Some embodiments include a cleavable group between (B) and (A+C). Some embodiments include a cleavable group between (C) and (A+B).
[0239] Some embodiments include a non-nucleic acid label (e.g. element A). In some embodiments, the detectable label comprises a fluorophore, a radioactive label, an isotopic label, a mass tag, a chemiluminescent tag, or an imaging tag. Some embodiments include a detectable label. In some embodiments, the detectable label is a fluorophore. In some embodiments, the detectable label is a radioactive label.
[0240] In some embodiments, the CRC comprises a pre-nucleic acid sequence tag comprising a group for attaching a nucleic acid sequence. In some embodiments, said group for attaching a nucleic acid sequence comprises an oxyamine group, a tetrazine, an azide, an alkyne, an alkene, a trans-cyclooctene, a DBCO, a bicyclononyne, a norbornene, a strained alkyne, a strained alkene, or derivative thereof. In some embodiments, said group for attaching a nucleic acid sequence is subsequently used to attach a nucleic acid sequence. In some embodiments, the nucleic acid sequence tag is generated upon conjugating the nucleic acid sequence to a group for attaching a nucleic acid sequence comprising an oxyamine group, a tetrazine, an azide, an alkyne, an alkene, a trans-cyclooctene, a DBCO, a bicyclononyne, a norbornene, a strained alkyne, or a strained alkene, or a derivative thereof. In some embodiments, the nucleic acid sequence tag is generated upon conjugating the nucleic acid sequence to a group for attaching a nucleic acid sequence comprising a protected oxyamine group, a protected thiol, a protected amine, a protected hydrazine, a tetrazine, an azide, an alkyne, an alkene, a trans-cyclooctene, a DBCO, a bicyclononyne, a norbornene, a strained alkyne, or a strained alkene, or a derivative thereof. In some embodiments, the conjugation occurs prior to the peptide sequencing steps. In some embodiments, the conjugation occurs after the CRC is reacted to the N-terminal amino acid. In some embodiments, the conjugation occurs after the CRC is reacted to and then cleaved from the N-terminal amino acid, but prior to initiation of the next cycle.
[0241] Some examples of chemically-reactive conjugates are included in Table 1.
[0242] In some embodiments, it may be advantageous to bind the N-terminal amino acid (NTAA) using an ITC-conjugate, and subsequently join the ITC-conjugate to the CRC. This allows the ITC-conjugate to be used at high concentration, affording fast kinetics and a high yield of bound NTAA with reduced expense and steric hindrance. Thus, a reaction scheme is envisioned comprising: 1) an ITC-conjugate comprising an amine-reactive moiety (RG) for binding the NTAA and a functional group (X) and 2) a chemically reactive conjugate, comprising a cycle tag (Ct), an immobilizing moiety (SI) that can react with functional groups on a solid support, and a functional group (X’) that can couple to the ITC-conjugate. Examples of pairs of functional groups that may be suitable for joining the ITC-conjugate and CRC are shown in Table 2. The joining operation may be spontaneous, triggered (e.g., by light, temperature, solution composition change or other environmental change), or catalyzed.
[0243] In some embodiments, the ITC-conjugate comprises a linker. Suitable chemical linkers include but are not limited to alkyl, aryl, ether, amide, cycloalkyl, branched or linear structures or combinations thereof. The linker may include a -C(O)-, -O-, -S-, -S(O)-, -NH-, -C(O)O-, -C(O)C1-C10 alkyl, -C(O)C1-C10 alkyl-O-, -C(O)C1-C10 alkyl-CO2-, -C(O)C1-C10 alkyl-NH-, -C(O)C1-C10 alkyl-S-, -C(O)C1-C10 alkyl-C(O)-NH-, -C(O)C1-C10 alkyl-NH-C(O)-, -C1-C10 alkyl-, -C1-C10 alkyl-O-, -C1-C10 alkyl-CO2-, -C1-C10 alkyl-NH-, -C1-C10 alkyl-S-, -C1-C10 alkyl-C(O)-NH-, -C1-C10 alkyl-NH-C(O)-, -CH2CH2SO2-C1-C10 alkyl-, -CH2C(O)-C1-C10 alkyl-, =N-(O or N)-C1-C10 alkyl-O-, =N-(O or N)-C1-C10 alkyl-NH-, =N-(O or N)-C1-C10 alkyl-CO2-, =N-(O or N)-C1-C10 alkyl-S-,
[0244] In some embodiments, it may be advantageous for the cycle tag portion of the chemically reactive conjugate to be joined to the CRC between steps of the sequencing operation. For example, the cycle tag may be joined at any point between steps (a) and (f) of Process 200. Thus, general reaction schemes may comprise any of the following, where ITCC represents the ITC-conjugate, RG represents the reactive group toward the peptide N-terminus (e.g. isothiocyanate), Ct represents the cycle tag, X:X’ and Y:Y’ represent pairs of reactive functional groups for attaching the CRC to the ITC-conjugate or the cycle tag to the CRC respectively, and SI represents the surface immobilization moiety:
1. Peptide-NH2 + RG-CRC(-Ct)-SI -> Peptide-NH-RG-CRC(-Ct)-SI
2. i) Peptide-NH2 + RG-ITCC-X -> Peptide-NH-RG-ITCC-X
   ii) Peptide-NH-RG-ITCC-X + X’-CRC(-Ct)-SI -> Peptide-NH-RG-ITCC-X-X’-CRC(-Ct)-SI
3. i) Peptide-NH2 + RG-CRC(-Y)-SI -> Peptide-NH-RG-CRC(-Y)-SI
   ii) Peptide-NH-RG-CRC(-Y)-SI + Y’-Ct -> Peptide-NH-RG-CRC(-Y-Y’-Ct)-SI
4. i) Peptide-NH2 + RG-ITCC-X -> Peptide-NH-RG-ITCC-X
   ii) Peptide-NH-RG-ITCC-X + X’-CRC(-Y)-SI -> Peptide-NH-RG-ITCC-X-X’-CRC(-Y)-SI
   iii) Peptide-NH-RG-ITCC-X-X’-CRC(-Y)-SI + Y’-Ct -> Peptide-NH-RG-ITCC-X-X’-CRC(-Y-Y’-Ct)-SI
[0245] It is to be understood that other permutations of the above schemes are possible, and are within the scope of this disclosure. For instance, multiple sets of reactive groups could be used to create the linkages that form the chemically reactive conjugate.
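The four general schemes of paragraph [0244] differ in only two choices: whether the NTAA is first bound by a separate ITC-conjugate (ITCC), and whether the cycle tag (Ct) is joined after the CRC has been attached. The following Python sketch models that bookkeeping with plain strings. The function name and the two boolean parameters are hypothetical, the ASCII apostrophe stands in for the prime in X’ and Y’, and no chemistry is simulated.

```python
from typing import List


def scheme_products(use_itcc: bool, late_cycle_tag: bool) -> List[str]:
    """Return the product of each step for one of the four general schemes.

    use_itcc:       bind the NTAA with an ITC-conjugate first (schemes 2 and 4).
    late_cycle_tag: join the cycle tag via Y:Y' after CRC attachment (schemes 3 and 4).
    """
    # The CRC carries either the cycle tag itself or a handle Y for later tagging.
    crc = "CRC(-Y)" if late_cycle_tag else "CRC(-Ct)"
    products = []
    if use_itcc:
        # Step i: the ITC-conjugate binds the peptide N-terminus.
        products.append("Peptide-NH-RG-ITCC-X")
        # Step ii: the CRC is joined through the X:X' pair.
        current = f"Peptide-NH-RG-ITCC-X-X'-{crc}-SI"
    else:
        # The CRC itself carries the reactive group RG.
        current = f"Peptide-NH-RG-{crc}-SI"
    products.append(current)
    if late_cycle_tag:
        # Final step: the cycle tag is joined through the Y:Y' pair.
        products.append(current.replace("CRC(-Y)", "CRC(-Y-Y'-Ct)"))
    return products


for itcc in (False, True):
    for late in (False, True):
        print(itcc, late, scheme_products(itcc, late))
```

Calling `scheme_products(True, True)` reproduces the three products of scheme 4, and `scheme_products(False, False)` reproduces the single product of scheme 1.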
[0246] In one aspect, the surface immobilization group is formed via a subsequent reaction after the chemically-reactive conjugate is attached to the peptide N-terminus. The chemically reactive conjugate may have 2 or more surface immobilization moieties.
[0247] There are many possible functional groups and chemistries that can fulfill the above schemes. In preferred embodiments: X is not amine reactive or is much less amine reactive than the reactive group RG; cross reactions between X and RG are minimal or absent; all coupled linkages have some stability to the conditions of peptide sequencing (except of course the intended cleavage of the peptide NTAA from the remainder of the peptide); the newly formed linkages are not amine reactive and/or not reactive toward surface functional groups; X:X’ and Y:Y’ are not, or are only minimally, reactive with surface functional groups or surface immobilizing groups; and the surface functional groups and surface immobilization moieties require an activation step or catalyst in order to couple with each other to a significant extent. Reactions between the surface immobilizing groups and surface functional groups which occur slowly or to a low degree of conversion in the absence of an activation step are also within the scope of this disclosure. The activation step may include a deprotection step, a thermal treatment, a pH change, an enzymatic treatment, a light exposure, exposure to a catalyst or any other treatment which enables the reaction to occur, or any combination of the aforementioned. Table 2 provides a non-exhaustive list of possible reactive functional groups that could satisfy the roles of X, X’, Y and Y’.
Table 2. Examples of reactive functional groups for a bifunctional reagent or a chemically reactive conjugate.
* requires protection of one or more functional group types on peptide/protein (e.g. lysine, cysteine, etc)
** photocatalyzed
*** requires amine- or thiol-containing group on the CRC, and the other member of the pair on the bifunctional reagent or the surface functional group
Cycle Tags
[0248] Disclosed herein, in some embodiments, are cycle tags. The cycle tag may be associated with a cycle number. The cycle number may correspond with an amino acid number, for example an amino acid number of a peptide when numbered from N to C. The cycle tag may be a part of a chemically-reactive conjugate.
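The correspondence between cycle number and amino acid position can be pictured with a minimal Python sketch. It assumes stepwise removal of one N-terminal residue per cycle; the function name and the "cycleTag-k" label format are hypothetical placeholders and do not represent actual tag sequences.

```python
from typing import List, Tuple


def assign_cycle_tags(peptide: str) -> List[Tuple[int, str, str]]:
    """Pair each cycle number with a placeholder tag label and the residue removed.

    Assumes one N-terminal residue is removed per cycle, so cycle k records
    residue k of the peptide when numbered from N to C.
    """
    records = []
    remaining = peptide
    cycle = 0
    while remaining:
        cycle += 1
        # The current N-terminal amino acid is taken off in this cycle.
        ntaa, remaining = remaining[0], remaining[1:]
        records.append((cycle, f"cycleTag-{cycle}", ntaa))
    return records


for cycle, tag, residue in assign_cycle_tags("MKT"):
    print(cycle, tag, residue)
```

For the three-residue example "MKT", cycle 1 is paired with M, cycle 2 with K, and cycle 3 with T, which is the N-to-C numbering described above.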
[0249] The cycle tag may include a cycle nucleic acid. In some embodiments, the cycle nucleic acid comprises DNA or RNA. In some embodiments, the cycle tag nucleic acid includes RNA, peptide, synthetic small molecule, or peptide nucleic acid. In some embodiments, the cycle tag is a fluorescent tag.
[0250] In some embodiments, the cycle tag comprises a peptide. In some embodiments, the cycle tag comprises a peptide nucleic acid. In some embodiments, the cycle tag comprises a fluorescent tag. In some embodiments, the cycle tag comprises a small molecule. In some embodiments, the cycle tag comprises nucleic acid. In some embodiments, the cycle tag is synthetic.
[0251] Disclosed herein, in some embodiments, are nucleic acid tags. The nucleic acid tag may be included within a chemically reactive conjugate. The nucleic acid tag of the chemically reactive conjugate may be referred to, or be included as an example of a cycle nucleic acid tag. In some embodiments, the nucleic acid sequence tag comprises a DNA or RNA sequence. In some embodiments, the nucleic acid sequence tag comprises at least 10 nucleotides. In some embodiments, the nucleic acid sequence tag is ligated or bound to an additional oligonucleotide.
[0252] In some embodiments, the nucleic acid sequence tag is a DNA sequence. In some embodiments, the nucleic acid sequence tag is an RNA sequence. In some embodiments, the nucleic acid sequence tag is a sequence of at least 10 nucleotides. In some embodiments, the nucleic acid sequence tag is a site for ligating or binding further oligonucleotides and may not include nucleic acids itself.
Reactive Moieties
[0253] Disclosed herein, in some embodiments, are reactive moieties. The reactive moiety may be included as part of a chemically-reactive conjugate.
[0254] In some embodiments, the reactive moiety comprises an Edman degradation reagent. In some embodiments, the reactive moiety comprises a phenyl isothiocyanate (PITC). In some embodiments, the reactive moiety comprises an isothiocyanate (ITC) or some derivative thereof. In some embodiments, the reactive moiety comprises dansyl chloride or some derivative thereof. In some embodiments, the reactive moiety comprises dinitrofluorobenzene (DNFB) or some derivative thereof.
[0255] In some embodiments, the reactive moiety comprises an enzyme or peptide. In some embodiments, the reactive moiety is an enzyme. In some embodiments, the reactive moiety is a peptide. In some embodiments, the reactive moiety specifically cleaves at a specific amino acid. In some embodiments, the reactive moiety specifically cleaves at a specific amino acid that is not the N-terminal amino acid. In some embodiments, the enzyme or peptide has aminopeptidase activity. In some embodiments, the enzyme or peptide is a modified aminopeptidase. In some embodiments, the reactive moiety cleaves more than a single amino acid. In some embodiments, the reactive moiety cleaves 2, 3, 4, 5 or more amino acids. In some embodiments, the reactive moiety cleaves amino acids at a specific motif. In some embodiments, the motif is at the carboxyl side of lysine (K) and arginine (R) amino acid residues, as long as the next residue is not proline. In some embodiments, the reactive moiety binds to and cleaves a C-terminal amino acid. In some embodiments, the reactive moiety that binds to and cleaves a C-terminal amino acid comprises a modified carboxypeptidase. Examples of reactive moieties that may bind and cleave more than a single amino acid may include a peptidyldipeptidase, or a modified peptidyldipeptidase, such as a modified angiotensin-converting enzyme (ACE). The reactive moiety may include ACE or a modified ACE.
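The motif rule described in paragraph [0255] (cleavage on the carboxyl side of K or R unless the next residue is proline) can be sketched in Python as follows. The function names and the example sequence are hypothetical, and the sketch only locates cut positions in a one-letter sequence string; it does not model enzyme kinetics or missed cleavages.

```python
from typing import List


def motif_cleavage_sites(peptide: str) -> List[int]:
    """Return cut positions (number of residues before the cut).

    A cut is placed after K or R when the following residue is not P.
    The final residue is skipped because nothing follows it.
    """
    return [
        i + 1
        for i, residue in enumerate(peptide[:-1])
        if residue in "KR" and peptide[i + 1] != "P"
    ]


def motif_fragments(peptide: str) -> List[str]:
    """Split the peptide at every cut position found by motif_cleavage_sites."""
    fragments = []
    start = 0
    for cut in motif_cleavage_sites(peptide):
        fragments.append(peptide[start:cut])
        start = cut
    fragments.append(peptide[start:])
    return fragments


print(motif_cleavage_sites("AKRPGKLR"))
print(motif_fragments("AKRPGKLR"))
```

In the example sequence, the R at position 3 is followed by proline and is therefore not a cut site, so the sequence splits into AK, RPGK, and LR.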
[0256] Some embodiments comprise C-terminal peptide degradation, for example following the alkylated thiohydantoin method described by Dupont et al. (Dupont DR, Bozzini M, Boyd VL. The alkylated thiohydantoin method for C-terminal sequence analysis. EXS. 2000; 88:119-31. doi.org/10.1007/978-3-0348-8458-7_8). The C-terminal carboxyl may be converted to a thiohydantoin via treatment with acetic anhydride followed by thiocyanate ion under acidic conditions. Optionally, the C-terminus can be converted to a thiohydantoin via reaction with diphenyl phosphoroisothiocyanatidate (DPP-ITC). Alkylation of the thiohydantoin can be achieved via reaction with an alkyl halide functional chemically reactive conjugate under basic conditions, resulting in alkylation at the sulfur of the thiohydantoin. This is useful for linking the C-terminus with the CRC. The cleavage of the C-terminal amino acid conjugate may be achieved with thiocyanate ion under acidic conditions.
[0257] In some embodiments, the reactive moiety comprises a group on the CRC for attaching to a cleavable derivatized N-terminal amino acid, comprising a tetrazine, an azide, an alkene, an alkyne, a trans-cyclooctene, a DBCO, a bicyclononyne, a norbornene, a strained alkyne, or a strained alkene, or a derivative thereof.
Immobilizing Moieties
[0258] Disclosed herein, in some embodiments, are immobilizing moieties. The immobilizing moiety may be included as part of a chemically-reactive conjugate.
[0259] In some embodiments, the immobilizing moiety comprises a thiol group, an amine group, or a carboxyl group. In some embodiments, the immobilizing moiety comprises a protected thiol group, a protected amine group, a carboxyl group, an azide, an alkyne, an alkene, an aryl boronic acid, an aryl halide, a haloalkyne, a silylalkyne, a Si-H group, a protected or photoprotected reactive group, or a photoactivated reactive group. In some embodiments, the immobilizing moiety is an azide, an alkyne, an alkene, an aryl boronic acid, an aryl halide, a haloalkyne, a silylalkyne, a Si-H group, a protected or photoprotected reactive group, or a photoactivated reactive group. The immobilizing moiety may include a thiol. The immobilizing moiety may include an amine. The immobilizing moiety may include an alkyne. The immobilizing moiety may include an azide. The immobilizing moiety may include an alkene. The immobilizing moiety may include an aryl boronic acid. The immobilizing moiety may include an aryl halide. The immobilizing moiety may include a haloalkyne. The immobilizing moiety may include a silylalkyne. The immobilizing moiety may include a Si-H group. The immobilizing moiety may include a protected or photoprotected reactive group (such as a pyridyl disulfide, a phenylacyl protected thiol, a nitrobenzyl protected thiol, a photocaged DBCO). The immobilizing moiety may include a photoactivated reactive group (such as an azirine, a tetrazole, a sydnone, a 3-hydroxynaphthalen-2-ol).
[0260] In some embodiments, the immobilizing moiety is a thiol group. In some embodiments, the immobilizing moiety is an amine group. In some embodiments, the immobilizing moiety is a carboxyl group. In some embodiments, the moiety includes a protected amine, a protected oxyamine, a protected hydrazine, or a blocked isocyanate.
Linkers
[0261] Any of the components of the CRC may be linked. The linkage may be through a linker. The components may have the same or different linkers. When the CRC includes the structure of Formula I, LA, LB, or LC may include a linker. LA may include a linker. LB may include a linker. When the CRC includes the structure of Formula II, LAB or LBC may include a linker. LAB may include a linker. LBC may include a linker. In some embodiments, the CRC comprises a linker located at LA, LB, and/or LC.
[0262] In some embodiments, the linker comprises polyethylene glycol (PEG), a hydrocarbon, an ether, a carboxyl, an amine, an amide, an azide, a thiol, an azide-thiol, an alkylene, a heteroalkylene, a cyclic group, phenyl, or a combination thereof. The linker may include polyethylene glycol (PEG). The PEG may comprise PEGn, such as PEGmo-
[0263] In some cases, the linker comprises an alkylene. In some instances, the alkylene is a C1-C20 alkylene or a derivative thereof. In some instances, the C1-C20 alkylene may optionally be substituted. In some instances, the alkylene is a C1-C10 alkylene or a derivative thereof. In some cases, the linker comprises a heteroalkylene. In some instances, the heteroalkylene comprises a PEG1-n, wherein n is any suitable integer. In some instances, n is an integer from 2-100. In some instances, n is an integer from 2-50. In some instances, n is an integer from 2-25. In some instances, n is an integer from 2-20. In some instances, the heteroalkylene comprises a PEG1-20 (e.g. 1 to 20 units of polyethylene glycol) or a derivative thereof. In some instances, the PEG1-20 may optionally be substituted. The linker may comprise an oligoethylene glycol, a peptide, an oligopropylene glycol, an oligoamide, an oligosaccharide, a siloxane, a fully-alkylated polyamine, a polyol, an oligomeric polyester, a nucleic acid, or an oligomeric poly(tetramethylene oxide). In some aspects, the linker may be modified, for example, with one or more of the following: a heterocycle, a carbocycle, a thioester, an ether, a thioether, a tertiary amine, an amide, a carbamate, a sulfonamide, a dibenzocyclooctene, a triazole, a thioamide, an oxime, a hydrazone, a urea, a thiourea, a carbonyl (such as an ester or amide), or a carbonate. The number of PEG units in a PEG linker or carbon atoms in an alkylene linker can be decreased or increased as needed. Varying the number of PEG units or carbon atoms in the linker may change the reach of the chemically reactive arm. For example, longer PEG arms may be useful for allowing greater flexibility or promiscuity, while shorter PEG arms may provide more rigidity or specificity.
[0264] The linker may include a -C(O)-, -O-, -S-, -S(O)-, -C(O)O-, -C(O)C1-C10 alkyl, -C(O)C1-C10 alkyl-O-, -C(O)C1-C10 alkyl-CO2-, -C(O)C1-C10 alkyl-S-, -C(O)C1-C10 alkyl-NH-C(O)-, -C1-C10 alkyl-, -C1-C10 alkyl-O-, -C1-C10 alkyl-CO2-, -C1-C10 alkyl-S-, -C1-C10 alkyl-NH-C(O)-, -CH2CH2SO2-C1-C10 alkyl-, -CH2C(O)-C1-C10 alkyl-, =N-(O or N)-C1-C10 alkyl-O-, =N-(O or N)-. Any or all of the linkers, such as LA, LB, LC, LAB, or LBC, may independently include or be selected from any of the aforementioned linkers. LA may be cleavable. LB may be cleavable. LC may be cleavable. LAB may be cleavable. LBC may be cleavable. Any combination of the aforementioned linkers may be used.
[0265] A linker may be included between a cycle tag and a reactive moiety (e.g. in a linear version of the CRC), and said linker may be cleavable. A linker may be included between a cycle tag and an immobilizing moiety (e.g. in a linear version of the CRC), and said linker may be cleavable. A linker may be included between a reactive moiety and an immobilizing moiety (e.g. in a linear version of the CRC), and said linker may be cleavable. Any combination of the aforementioned linkers may be used.
[0266] In some embodiments, one or more of the linker(s) are cleavable. In some embodiments, one or more cleavable linker(s) comprises a disulfide. The linker may include a cleavable moiety. In some aspects, the cleavable moiety is cleaved by light, an enzyme, or a combination thereof. In some aspects, the light comprises UV light, visible light, IR light, laser, or a combination thereof. In some aspects, the cleavable moiety comprises a photocleavable moiety. In some aspects, the photocleavable moiety comprises an o-nitrobenzyloxy group, o-nitrobenzyl amino group, o-nitrobenzyl group, o-nitroveratryl group, phenacyl group, p-alkoxyphenacyl group, benzoin group, or a pivaloyl group. In some aspects, the photocleavable moiety comprises the o-nitrobenzyl group. In some aspects, the o-nitrobenzyl group is substituted with a methoxy group or an ethoxy group.
[0267] A cleavable moiety may be cleaved by light, under acidic conditions, under basic conditions, by an enzyme, or a combination thereof. In some cases, the light may comprise UV light, visible light, IR light, laser, or a combination thereof. In such cases, the cleavable moiety may be a photocleavable moiety. The photocleavable moiety may comprise an electron withdrawing group, such as, but not limited to, a nitro group or halide group. In alternative cases, the cleavable moiety may be an enzymatically cleavable moiety.
[0268] The cleavable moiety may include a pH sensitive cleavable bond which can be cleaved under acidic or basic conditions. In some non-limiting examples, the cleavable moiety may include a pH sensitive cleavable bond which is cleaved by acidifying the solution. In some non-limiting examples, the cleavable moiety may include a pH sensitive cleavable bond which is cleaved by making the solution basic. The pH sensitive cleavable bond is advantageous because the molecule can be delivered, but would not react until it was under a slightly acidified environment which can be beneficial for the method of protein sequencing.
[0269] The cleavable moiety may include a disulfide bond. The disulfide bond may be chemically or enzymatically formed. The disulfide bond may be cleaved by a reducing agent. The disulfide bond may be enzymatically cleavable. The cleavable moiety may include a protein or peptide sequence that is recognized and cleaved by an enzyme. For example, the cleavable moiety may include the peptide sequence ENLYFQ*S (where * denotes a cleavage site). The disulfide bond may be included as part of a peptide.
[0270] An enzyme that cleaves a cleavable moiety may include an enzyme that cleaves a disulfide bond. Some examples of enzymes that may cleave disulfide bonds include thioredoxin or glutaredoxin. The enzyme may include trypsin. The enzyme may include a viral protein that cleaves a specific peptide sequence. For example, a tobacco etch virus (TEV) protein that specifically cleaves the peptide sequence ENLYFQ*S (where * denotes a cleavage site) may be used. This or another peptide sequence may be present in between the central moiety and one (or any) of the arms. After linkage and enrichment, the bond could be cleaved, thereby releasing the molecule of interest.
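To make the ENLYFQ*S notation concrete, the following Python sketch locates that recognition sequence in a one-letter sequence string and splits the string at the position marked by the asterisk, between Q and S. The function name and the example sequence are hypothetical, and the sketch is string matching only; it does not model protease activity or sequence variants.

```python
from typing import List

RECOGNITION_SITE = "ENLYFQS"  # ENLYFQ*S with the asterisk removed
CUT_OFFSET = 6                # the cut falls after Q, i.e. six residues into the site


def cleave_at_site(sequence: str) -> List[str]:
    """Split a sequence at every ENLYFQ*S site, cutting between Q and S."""
    pieces = []
    start = 0
    index = sequence.find(RECOGNITION_SITE)
    while index != -1:
        cut = index + CUT_OFFSET
        pieces.append(sequence[start:cut])
        start = cut
        # Continue searching after the cut so later sites are also found.
        index = sequence.find(RECOGNITION_SITE, cut)
    pieces.append(sequence[start:])
    return pieces


print(cleave_at_site("GGENLYFQSAA"))
```

For the example input, the portion ending in ENLYFQ is separated from the portion beginning with S, which mirrors how a cleavable peptide placed between the central moiety and an arm would release that arm.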
[0271] The photocleavable moiety may be cleaved by UV light. The UV light may have a wavelength in the range of about 100 nm to about 400 nm, about 200 nm to about 400 nm, about 250 nm to about 400 nm, about 280 nm to about 400 nm, about 100 nm to about 370 nm, about 200 nm to about 370 nm, about 250 nm to about 370 nm, or about 280 nm to about 370 nm. In some instances, the photocleavable moiety comprises a nitrobenzyloxy group, nitrobenzylamino group, nitrobenzyl group, nitroveratryl group, phenacyl group, alkoxyphenacyl group, benzoin group, or a pivaloyl group. In some examples, the nitro group may be in the ortho position of the benzyl, veratryl, phenacyl, benzoin, or pivaloyl group relative to the site of cleavage (e.g., o-nitrobenzyloxy group, o-nitrobenzylamino group, o-nitrobenzyl group, o-nitroveratryl group). In some examples, the alkoxy group may be in the para position of the benzyl, veratryl, phenacyl, benzoin, or pivaloyl group relative to the site of cleavage (e.g., p-alkoxyphenacyl group). In one aspect, the photocleavable moiety comprises a nitrobenzyl group. The nitro group may be ortho to the benzyl group relative to the site of cleavage (o-nitrobenzyl group). The o-nitrobenzyl group may be substituted with a methoxy or an ethoxy. In some cases, the methoxy or ethoxy may be substituted in the para position relative to the nitro of the o-nitrobenzyl group. In further examples, the o-nitrobenzyl group may comprise a linkage connecting to a linker, such as those described herein, that further connects to the central moiety. The linkage may be in the meta position relative to the nitro group. The linkage may comprise an ester, an ether, an amine, an amide, a carbamate, -O-C1-C10 alkyl-, or any other linkage described herein. In some examples, the photocleavable moiety may comprise the structure represented by the formula:
[0272] In such examples, n may be 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10.
[0273] Any or all of the linkers, such as LA, LB, LC, LAB, or LBC, may independently include or be selected from any of the aforementioned cleavable linkers or non-cleavable linkers or a combination of cleavable and non-cleavable linkers.
Process 200
[00274] In certain embodiments, a method for analyzing one or more peptides from a sample comprising a plurality of peptides, proteins, and/or protein complexes is provided, the method comprising: (a) providing a peptide of mer length n=2 to 2000 joined to a solid support; (b) providing a first chemically-reactive conjugate, e.g., a PITC-conjugate, wherein the first chemically-reactive conjugate comprises a cycle tag (e.g., a “cycleTag”) with identifying information regarding a workflow cycle of the method, a reactive moiety that can bind and cleave a terminal amino acid of the peptide, and a reactive moiety that facilitates immobilization to a solid support; (c) contacting the peptide with the first chemically-reactive conjugate, wherein the first chemically-reactive conjugate binds with a terminal amino acid, or a modified terminal moiety, of the peptide to form a conjugate complex, e.g., a PTC-AA-cycle tag-conjugate complex; (d) immobilizing the conjugate complex to the solid support; (e) cleaving the terminal amino acid from the peptide thereby providing an immobilized conjugate complex, and a new terminal amino acid of the peptide joined to the solid support of (a); (f) contacting the immobilized conjugate complex with a first binding agent capable of binding to the immobilized conjugate complex, wherein the first binding agent comprises a binding moiety and a first recode tag (e.g., a “recodeTag”) with identifying information regarding the first binding agent; (g) transferring the information of the first recode tag associated with the first binding agent and the cycle tag of the immobilized conjugate complex, to generate a first recode block (e.g., a “recodeBlock”); (h) optionally repeating steps (b) through (g) to assemble a second recode block having recoding information for the new terminal amino acid of the peptide; (i) optionally repeating step (h) for additional iterative cycles to create additional recode blocks for additional amino acids of the immobilized peptide of step (a); (j) optionally deprotecting nucleic acids of the first, second, and additional recode blocks; (k) contacting the recode blocks with polymerase, nucleotides, ligase, and buffer under conditions that allow extension-ligation to assemble the recode blocks into a memory oligonucleotide (e.g., a “memoryOligo”); and (l) analyzing the memory oligonucleotide.
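The information flow of operations (a) through (l) can be pictured with a toy model. The following Python sketch is illustrative only: it assumes each recode block is a cycle tag followed by an amino acid (AA) tag, that both tags are fixed-length DNA words, and that the tag sequences in `AA_TAGS` and the base-4 cycle encoding in `cycle_tag` are hypothetical placeholders. No chemistry, binding specificity, or sequencing error is modeled.

```python
# Hypothetical 4-base AA tags; real tag sequences are not specified here.
AA_TAGS = {"G": "ACGT", "A": "AGTC", "K": "CATG", "S": "GTCA"}
TAG_TO_AA = {tag: aa for aa, tag in AA_TAGS.items()}
TAG_LEN = 4
BASES = "ACGT"


def cycle_tag(cycle: int) -> str:
    """Encode a cycle number as a fixed-width base-4 DNA word (illustrative)."""
    word = ""
    for _ in range(TAG_LEN):
        word = BASES[cycle % 4] + word
        cycle //= 4
    return word


def decode_cycle(tag: str) -> int:
    """Invert cycle_tag: read a DNA word back as a base-4 number."""
    value = 0
    for base in tag:
        value = value * 4 + BASES.index(base)
    return value


def recode_peptide(peptide: str) -> list[str]:
    """One recode block (cycle tag + AA tag) per removed terminal residue."""
    blocks = []
    remaining = peptide
    cycle = 1
    while remaining:
        terminal, remaining = remaining[0], remaining[1:]
        blocks.append(cycle_tag(cycle) + AA_TAGS[terminal])
        cycle += 1
    return blocks


def assemble(blocks: list[str]) -> str:
    """Join recode blocks into a single memory-oligo string."""
    return "".join(blocks)


def read_memory_oligo(oligo: str) -> str:
    """Recover residue identity and position from the memory-oligo string."""
    calls = {}
    step = 2 * TAG_LEN
    for i in range(0, len(oligo), step):
        block = oligo[i:i + step]
        calls[decode_cycle(block[:TAG_LEN])] = TAG_TO_AA[block[TAG_LEN:]]
    return "".join(calls[c] for c in sorted(calls))


if __name__ == "__main__":
    blocks = recode_peptide("GASK")
    print(read_memory_oligo(assemble(blocks)))
```

Because each block in this model carries its own cycle tag, the peptide order is recovered even if the blocks are joined in a different order, which is the sense in which the cycle tag supplies positional information.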
[00275] In some aspects, one or more operations of the method are repeated one or more times to increase a step yield of the method. For example, in specific aspects, operations (c), (d), (e), (f), and/or (g) are repeated one or more times to increase the step yield.
[00276] In some aspects, different solvents and reaction parameters are used when executing steps (c) or (d), or any appropriate step such that the distribution of immobilization sites of amino acid complexes is modulated.
[00277] In some aspects, enhancers, such as m-PEG2-azide, octylamine, and n-propylamine, are introduced to surfaces or reagents.
[00278] In some aspects, the method further comprises, between operation (h) and (j) and/or after operation (k), contacting the immobilized conjugate complex with a promiscuous binding agent capable of binding to the immobilized conjugate complex independent of the identity of an amino acid (AA) within the conjugate complex, and wherein the promiscuous binding agent comprises a binding moiety that associates with the immobilized conjugate independent of the AA. The promiscuous binding agent may carry specific cycle information, or a promiscuous recode tag (e.g., inosine bases) capable of hybridization to any cycle tag (or subset of cycle tags) and that carries identifying information regarding the promiscuous binding agent. This provides robustness to the binding recognition operation, and may be repeated one or more times to increase the step yield. In such aspects, operation (k) may be repeated after contacting the immobilized conjugate complex with the promiscuous binding agent.
[00279] In some aspects, the peptide comprises any suitable macromolecular polymer, including a protein, a peptide, a complex carbohydrate, and the like. In such aspects, a monomeric unit of the macromolecular polymer may comprise an amino acid, a carbohydrate, and/or any monomeric moiety that may be combined into a polymer.
[00280] In some aspects, the conjugate complex comprises zero, one, or more reactive moieties (e.g., moieties used to join the complex to a solid support), and the reaction comprises an activatable chemistry. In some aspects, the conjugate complex comprises zero, one, or more reactive moieties (e.g., moieties used to join the complex to a solid support), and the reaction comprises a reversible chemistry and activatable chemistry.
[00281] In some aspects, the recode tag linked to the binding agent is a nucleic acid having a sequence corresponding to an (n-1)th cycle tag or (n±i)th cycle tag, an amino acid (AA) tag (e.g., an “AAtag”), and an nth cycle tag. Optionally, the recode tag linked to the binding agent is a nucleic acid having a universal sequence for amplification or assembly, a sequence complementary to a cycle tag (e.g., a “cycle tag complement sequence”), and an amino acid (AA) tag (e.g., an “AAtag”).
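One possible reading of the (n-1)th cycle tag / AA tag / nth cycle tag arrangement is that consecutive blocks share a cycle-tag sequence at their junction, so that the shared sequence defines the joining order. The Python sketch below illustrates only that reading, which is an assumption. The function names `order_blocks` and `assemble_chain`, the tuple representation, and all tag sequences are hypothetical.

```python
Block = tuple[str, str, str]  # (previous cycle tag, AA tag, current cycle tag)


def order_blocks(blocks: list[Block]) -> list[Block]:
    """Chain blocks so each block's previous-cycle tag matches the
    current-cycle tag of the block before it."""
    by_prev = {block[0]: block for block in blocks}
    current_tags = {block[2] for block in blocks}
    # The first block is the one whose previous-cycle tag no other block ends with.
    starts = [block for block in blocks if block[0] not in current_tags]
    if len(starts) != 1:
        raise ValueError("expected exactly one starting block")
    chain = [starts[0]]
    # The length guard prevents looping forever on malformed (cyclic) input.
    while chain[-1][2] in by_prev and len(chain) < len(blocks):
        chain.append(by_prev[chain[-1][2]])
    return chain


def assemble_chain(chain: list[Block]) -> str:
    """Write the chain as one string, emitting each shared junction tag once."""
    oligo = chain[0][0]
    for _prev_tag, aa_tag, current_tag in chain:
        oligo += aa_tag + current_tag
    return oligo


if __name__ == "__main__":
    # Hypothetical tags, supplied out of order on purpose.
    unordered = [
        ("CCCC", "GT", "GGGG"),
        ("AAAA", "AC", "CCCC"),
        ("GGGG", "CA", "TTTT"),
    ]
    print(assemble_chain(order_blocks(unordered)))
```

Under this reading the order of residues is fixed by the junctions themselves, whereas a design in which each block carries a single cycle tag would recover order by reading the tag after sequencing.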
[00282] In some aspects, operation (k) comprises contacting the recode blocks with ligase, AA tag oligonucleotide complements, and buffer under conditions that allow ligation to assemble the recode blocks and AA tag oligonucleotide complements into a memory oligo, or create a fragment of a memory oligo.
Process 200-1
[0283] Disclosed herein, in some embodiments, are methods for determining identity and positional information of an amino acid residue of a peptide coupled to a solid support, the method comprising: (a) providing the peptide to the solid support, the peptide coupled to the solid support such that a N-terminal amino acid residue of the peptide is not directly coupled to the solid support and is exposed to reaction conditions; (b) providing a chemically-reactive conjugate, the chemically-reactive conjugate comprising: (x) a cycle tag comprising a cycle nucleic acid associated with a cycle number; (y) a reactive moiety for binding and cleaving the N-terminal amino acid residue of the peptide and exposing a next amino acid residue as a N-terminal amino acid residue on the cleaved peptide; and (z) an immobilizing moiety for immobilization to the solid support; (c) contacting the peptide with the chemically-reactive conjugate, thereby coupling the chemically-reactive conjugate to the N-terminal amino acid of the peptide to form a conjugate complex; (d) immobilizing the conjugate complex to the solid support via the immobilizing moiety; (e) cleaving and thereby separating the N-terminal amino acid residue from the peptide, thereby providing an immobilized amino acid complex, the immobilized amino acid complex comprising the cleaved and separated N-terminal amino acid residue; (f) contacting the immobilized amino acid complex with a binding agent, the binding agent comprising: a binding moiety for preferentially binding to the immobilized amino acid complex; and a recode tag comprising a recode nucleic acid corresponding with the binding agent, thereby forming an affinity complex, the affinity complex comprising an immobilized amino acid complex and a binding agent and thereby bringing a cycle tag into proximity with a recode tag within each formed affinity complex; (g) transferring the information of the nucleic acid recode tag associated with the first binding agent and the cycle tag of the first immobilized conjugate complex to generate a first recode block; (j) obtaining sequence information for the recode block; and (k) based on the obtained sequence information, determining identity and positional information of an amino acid residue of the peptide.
[0284] Disclosed herein, in some embodiments, are methods for determining identity and positional information of an amino acid residue of a peptide coupled to a solid support. The method may include providing the peptide to the solid support. In some embodiments, the peptide is coupled to the solid support, for example such that a N-terminal amino acid residue of the peptide is not directly coupled to the solid support or is exposed to reaction conditions. The method may include providing a chemically-reactive conjugate. The chemically-reactive conjugate may include a cycle tag. The cycle tag may include a cycle nucleic acid associated with a cycle number. The chemically-reactive conjugate may include a reactive moiety. The reactive moiety may be useful for binding the N-terminal amino acid residue of the peptide. The chemically-reactive conjugate may include an immobilizing moiety. The immobilizing moiety may be useful for immobilization to the solid support. The method may include contacting the peptide with the chemically-reactive conjugate. Contacting the peptide with the chemically-reactive conjugate may couple the chemically-reactive conjugate to the N-terminal amino acid of the peptide to form a conjugate complex. The method may include immobilizing the conjugate complex to the solid support, for example via the immobilizing moiety. The method may include cleaving or separating the N-terminal amino acid residue from the peptide. Cleaving or separating the N-terminal amino acid residue from the peptide may provide an immobilized amino acid complex. The immobilized amino acid complex may include the cleaved and separated N-terminal amino acid residue. The method may include contacting the immobilized amino acid complex with a binding agent. The binding agent may include a binding moiety. The binding moiety may be useful for preferentially binding to the immobilized amino acid complex. The binding agent may include a recode tag. 
The recode tag may include a recode nucleic acid corresponding with the binding agent. Contacting the immobilized amino acid complex with the binding agent may form an affinity complex. The affinity complex may include an immobilized amino acid complex. The affinity complex may include a binding agent. Contacting the immobilized amino acid complex with the binding agent may bring the cycle tag into proximity with the recode tag, for example within the affinity complex. The method may include transferring information of the recode nucleic acid and the cycle nucleic acid. This may generate a recode block. The recode block may be assembled into a memory oligonucleotide. The method may include joining one or more recode blocks created from one or more amino acid residues. The method may include obtaining sequence information of the recode blocks. The method may include obtaining sequence information of the memory oligonucleotide. The method may include, based on the obtained sequence information, determining information of an amino acid residue of the peptide. The information may include identity information. The information may include positional information. In some embodiments, cleaving the N-terminal amino acid residue from the peptide exposes a next amino acid residue as a N-terminal amino acid residue on the cleaved peptide. In some embodiments, the reactive moiety of the chemically-reactive conjugate cleaves the N-terminal amino acid residue from the peptide. Some embodiments include repeating any of the aforementioned steps for each subsequent amino acid of the peptide. In some embodiments, the immobilizing moiety comprises an activatable chemical moiety, e.g., an alkyne. Some embodiments include joining the chemical moiety to the solid support. 
Some embodiments include washing away chemically-reactive conjugates that are not joined to the solid support before contacting the next N-terminal amino acid of the peptide with a chemically-reactive conjugate. Some embodiments include contacting the immobilized amino acid complex with a binding agent to form an affinity complex. Some embodiments include washing the immobilized amino acid complex before said contacting the immobilized amino acid complex with a binding agent. Some embodiments include washing the immobilized amino acid affinity complex after said contacting the affinity complex with one or a set of binding agents. In some embodiments, the sequence information is used to determine the likely three-dimensional structure of the peptide. Some embodiments include repeating steps (b) through (k) for each subsequent amino acid in the peptide. Some embodiments include joining the recode nucleic acid or a sequence of the recode nucleic acid with the cycle nucleic acid or a sequence of the cycle nucleic acid to generate a recode block.
Process 200-2
[0285] Disclosed herein, in some embodiments, are methods for determining identity and positional information of a plurality of amino acid residues of a peptide, the peptide comprising n amino acid residues, the method comprising: (a) coupling the peptide to a solid support such that a N-terminal amino acid residue of the peptide is not directly coupled to the solid support and is exposed to reaction conditions; (b) providing a chemically-reactive conjugate, the chemically-reactive conjugate comprising: (x) a cycle tag comprising a cycle nucleic acid associated with a cycle number, (y) a reactive moiety for binding and cleaving the N-terminal amino acid residue of the peptide and exposing a next amino acid residue as a N-terminal amino acid residue on the cleaved peptide, and (z) an immobilizing moiety for immobilization to the solid support; (c) contacting the peptide with the chemically-reactive conjugate, thereby coupling the chemically-reactive conjugate to the N-terminal amino acid of the peptide to form a conjugate complex; (d) immobilizing the conjugate complex to the solid support via the immobilizing moiety; (e) cleaving and thereby separating the N-terminal amino acid residue from the peptide, thereby exposing the next amino acid residue as a N-terminal amino acid residue on the cleaved peptide and providing an immobilized amino acid complex, the immobilized amino acid complex comprising the cleaved and separated N-terminal amino acid residue; (f) repeating (b) through (e) n-1 times to assemble n-1 additional immobilized amino acid complexes, each additional immobilized amino acid complex comprising a nucleic acid associated with cycle 2 to n, accordingly; (g) contacting the immobilized amino acid complexes with a binding agent or a set of binding agents, each binding agent comprising: a binding moiety for preferentially binding to one or to a subset of the immobilized amino acid complexes, and a recode tag comprising a recode nucleic acid corresponding 
with the binding agent, thereby forming one or more affinity complexes, each affinity complex comprising an immobilized amino acid complex and the binding agent and thereby bringing a cycle tag into proximity with a recode tag within each formed affinity complex; (h) within each formed affinity complex, joining a cycle tag or a reverse complement thereof to a recode tag to form a recode block, or otherwise transferring information of the recode nucleic acid and the cycle nucleic acid of the immobilized conjugate complex, thereby creating a plurality of recode blocks, each recode block corresponding with a formed affinity complex; (i) joining two or more members of the plurality of recode blocks to form a memory oligonucleotide; (j) obtaining sequence information for the memory oligonucleotide; and (k) based on the obtained sequence information, determining identity and positional information of a plurality of amino acid residues of the peptide.
[0286] Disclosed herein, in some embodiments, are methods for determining identity and positional information of a plurality of amino acid residues of a peptide. The peptide may include n amino acid residues. The method may include coupling the peptide to a solid support. The coupling may be such that a N-terminal amino acid residue of the peptide is not directly coupled to the solid support or is exposed to reaction conditions. The method may include providing a chemically-reactive conjugate. The chemically-reactive conjugate may include a cycle tag comprising a cycle nucleic acid associated with a cycle number. The chemically-reactive conjugate may include a reactive moiety. The reactive moiety may bind and/or cleave the N-terminal amino acid residue of the peptide. The reactive moiety may expose a next amino acid residue as a N-terminal amino acid residue on the cleaved peptide. The chemically-reactive conjugate may include an immobilizing moiety for immobilization to the solid support. The method may include contacting the peptide with the chemically-reactive conjugate. Such contacting may couple the chemically-reactive conjugate to the N-terminal amino acid of the peptide, and may form a conjugate complex. The method may include immobilizing the conjugate complex to the solid support. The immobilization may be via the immobilizing moiety. The method may include cleaving and thereby separating the N-terminal amino acid residue from the peptide. The cleaving may expose the next amino acid residue as a N-terminal amino acid residue on the cleaved peptide. The method may include providing an immobilized amino acid complex. The immobilized amino acid complex may include the cleaved and separated N-terminal amino acid residue. The method may include repeating steps n-1 times to assemble n-1 additional immobilized amino acid complexes. Additional immobilized amino acid complexes may include a nucleic acid associated with cycle 2 to n. 
The method may include contacting the immobilized amino acid complexes with one or a set of binding agents. The binding agent may include a binding moiety for preferentially binding to one or to a subset of the immobilized amino acid complexes. The binding agent may include a recode tag. The recode tag may include a recode nucleic acid corresponding with the binding agent. Contacting the immobilized amino acid complexes with one or more binding agents may form one or more affinity complexes. The affinity complexes may include an immobilized amino acid complex and the binding agent. Contacting the immobilized amino acid complexes with a binding agent may bring a cycle tag into proximity with a recode tag within the formed affinity complexes. The method may include, within each formed affinity complex, joining a cycle tag or a reverse complement thereof to a recode tag. The joining may form a recode block. The joining or method may include creating a plurality of recode blocks. Each recode block may correspond with a formed affinity complex. The method may include joining two or more members of the plurality of recode blocks to form a memory oligonucleotide. The method may include obtaining sequence information for the memory oligonucleotide. The method may include, based on the obtained sequence information, determining identity and positional information of a plurality of amino acid residues of the peptide. In some embodiments, n is an integer greater than or equal to 2. In some embodiments, each binding agent comprises recode tags with a unique nucleic acid sequence. In some embodiments, a plurality of binding agents comprises recode tags with the same nucleic acid sequence. In some embodiments, the binding agents comprise recode tags that have a unique sequence portion and a common sequence portion.
[0287] Disclosed herein, in some embodiments, are chemically-reactive conjugates comprising: (a) a nucleic acid sequence tag; (b) a reactive moiety for binding and cleaving a N-terminal amino acid residue from a peptide; and (c) an immobilizing moiety for immobilization to a solid support.
[0288] Disclosed herein, in some embodiments, are chemically-reactive conjugates. The chemically-reactive conjugate may include a nucleic acid sequence tag. The chemically-reactive conjugate may include a reactive moiety. The reactive moiety may be useful for binding a N-terminal amino acid residue. The reactive moiety may be useful for cleaving a N-terminal amino acid residue from a peptide. The chemically-reactive conjugate may include an immobilizing moiety. The immobilizing moiety may be useful for immobilization to a solid support. Also disclosed are kits containing any of the components described herein.
Process 200-3
[00289] In certain embodiments, a method for analyzing one or more peptides from a sample comprising a plurality of peptides, proteins, and/or protein complexes is provided, the method comprising: (a) providing a peptide of mer length n=2 to 2000 joined to a solid support; (b) providing a first chemically-reactive conjugate (e.g., a PITC-conjugate), wherein the conjugate comprises a cycle tag with identifying information regarding a workflow cycle of the method, a reactive moiety that can bind and cleave a terminal amino acid of the peptide, and a reactive moiety that facilitates immobilization to a solid support; (c) contacting the peptide with the first chemically-reactive conjugate, wherein the first chemically-reactive conjugate binds with the terminal amino acid, or a modified terminal moiety, of the peptide to form a first conjugate complex, e.g., a PTC-AA-cycle tag-conjugate complex; (d) immobilizing the first conjugate complex to the solid support; (e) cleaving the terminal amino acid from the peptide thereby providing a first immobilized conjugate complex and a new terminal amino acid of the peptide joined to the solid support of (a); (f) optionally repeating (b) through (e) to assemble a second immobilized conjugate complex having cycle information for the new terminal amino acid of the peptide; (g) optionally repeating (f) for additional iterative cycles to create additional immobilized conjugate complexes for additional amino acids of the peptide of step (a); (h) optionally deprotecting nucleic acids of the conjugate complex and/or any protected nucleic acids associated with the solid support; (i) contacting the first immobilized conjugate complex with a first binding agent capable of binding to the first immobilized conjugate complex, wherein the first binding agent comprises a binding moiety and a recode tag with identifying information regarding the first binding agent; (j) transferring the information of the recode tag associated with the first binding agent to the cycle tag of the first immobilized conjugate complex to generate a first recode block; (k) optionally repeating (i) and (j) with a second binding agent comprising a binding moiety and a recode tag with identifying information regarding the second binding agent to transfer the information of the recode tag associated with the second binding agent to the second immobilized conjugate complex to generate a second recode block; (l) optionally repeating (k) for additional cycles to create recode blocks for additional amino acids of the peptide of step (a); (m) contacting the recode blocks with polymerase, nucleotides, ligase, and buffer under conditions that allow extension-ligation to assemble the recode blocks into a memory oligo, or create a fragment of a memory oligo; and (n) analyzing the memory oligo.
[00290] In some aspects, one or more operations of the method are repeated one or more times to increase a step yield of the method. For example, in specific aspects, operations (c), (d), (e), (i), and/or (j) are repeated one or more times to increase the step yield.
[00291] In some aspects, the method further comprises after operation (m), contacting the first immobilized conjugate complex with a promiscuous binding agent capable of binding to the first immobilized conjugate complex independent of the identity of an amino acid within the conjugate complex, wherein the promiscuous binding agent comprises a binding moiety that associates with the immobilized conjugate independent of the amino acid. The promiscuous binding agent may carry specific cycle information, or a promiscuous recode tag (e.g., inosine bases) capable of hybridization to any cycle tag (or subset of cycle tags) and that carries identifying information regarding the promiscuous binding agent. This provides robustness to the binding recognition operation, and may be repeated one or more times to increase the step yield. In such aspects, operation (m) may be repeated after contacting the immobilized conjugate complex with the promiscuous binding agent.
[00292] In some aspects, assembly (e.g., joining) of the recode blocks is facilitated by utilization of a permissive polymerase, such as polymerase theta (Polθ), or by utilization of proteins involved in blunt end DNA ligation processes similar to non-homologous end joining (NHEJ). See, e.g., Poplawski T et al., Postepy Biochem 2009; 55(1):36-45; Davis AJ, Chen DJ, Transl Cancer Res. 2013 June; 2(3):130-143.
[00293] In some aspects, the peptide comprises any suitable macromolecular polymer, including a protein, a peptide, a polypeptide, and the like. In such aspects, a monomeric unit of the macromolecular polymer may comprise an amino acid, a carbohydrate, and/or any monomeric moiety that may be combined into a polymer.
Process 200-4
[0294] Disclosed herein, in some embodiments, are methods for analyzing one or more peptides from a sample comprising a plurality of peptides, proteins, and/or protein complexes, comprising: (a) providing a peptide of mer length n=2 to 2000 joined to a solid support; (b) providing a first chemically-reactive conjugate, wherein the conjugate comprises a cycle tag, a reactive moiety that can bind and cleave a terminal amino acid of the peptide, and a reactive moiety that facilitates immobilization to a solid support; (c) contacting the peptide with the first chemically-reactive conjugate, wherein the first chemically-reactive conjugate binds with the terminal amino acid, or a modified terminal moiety, of the peptide to form a first conjugate complex; (d) immobilizing the first conjugate complex to the solid support; (e) cleaving the terminal amino acid from the peptide thereby providing a first immobilized conjugate complex and a new terminal amino acid of the peptide joined to the solid support of (a); (f) optionally repeating processes (b) through (e) to assemble a second immobilized conjugate complex having cycle information for the new terminal amino acid of the peptide; (g) optionally repeating (f) for additional iterative cycles to create additional immobilized conjugate complexes for additional amino acids of the peptide of step (a); (h) optionally deprotecting nucleic acids of the conjugate complex and/or any protected nucleic acids associated with the solid support; (i) contacting the first immobilized conjugate complex with a first binding agent capable of binding to the first immobilized conjugate complex, wherein the first binding agent comprises a binding moiety and a recode tag with identifying information regarding the first binding agent; (j) transferring the information of the recode tag associated with the first binding agent and the cycle tag of the first immobilized conjugate complex to generate a first recode block; (k) optionally repeating (i) and (j) with a second binding agent comprising a binding moiety and a recode tag with identifying information regarding the second binding agent to transfer the information of the recode tag associated with the second binding agent and the second immobilized conjugate complex to generate a second recode block; (l) optionally repeating (k) for additional cycles to create recode blocks for additional amino acids of the peptide of step (a); (m) contacting the recode blocks with polymerase, nucleotides, ligase, and buffer under conditions that allow extension-ligation to assemble the recode blocks into a memory oligo, or create a fragment of a memory oligo; and (n) analyzing the memory oligo. Any of the aforementioned method steps may be used alone or in combination with other steps or methods described herein. In some embodiments, (e), (i), and (j) are repeated one or more times to increase a step yield of the method.
[0295] In some embodiments, step (b) comprises providing an ITC-conjugate and a chemically-reactive conjugate, wherein the chemically-reactive conjugate comprises a cycle tag, a reactive moiety that can bind to the ITC-conjugate, and a reactive moiety that facilitates immobilization to a solid support.
[0296] In some embodiments, step (c) comprises contacting the N-terminus of the peptide with an ITC-conjugate, washing unbound ITC-conjugate away, and subsequently contacting the ITC-conjugate-modified N-terminal amino acid derivative of the peptide with a chemically-reactive conjugate to form a conjugate complex.
[0297] In some embodiments, step (c) comprises repeating the ITC-conjugate coupling to improve stepwise reaction yield. Following the initial introduction and coupling of the ITC-conjugate, unbound conjugate is removed, e.g., via solution exchange, and the peptide is again contacted with a second ITC-conjugate to bind any unreacted amine groups of N-terminal amino acids. The second ITC-conjugate may have the same structure as the first ITC-conjugate or a different structure. It may be applied in the same or a different solvent, at the same or a different concentration, for the same or a different time, and at the same or a different temperature. For example, a coupling reaction at step (c) of a PITC-tetrazine conjugate at 40 mM for 30 minutes at 40 °C may be followed by one or more repetitions of an alkyl-isothiocyanate conjugate, such as 1-propyl isothiocyanate, at 3 M for 10 minutes at 40 °C. Advantages include using less costly reagents to increase stepwise yield and/or avoiding phasing.
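The benefit of repeated coupling can be illustrated with simple arithmetic. The Python sketch below assumes, purely for illustration, that each coupling attempt independently reacts the same fraction of the N-termini that are still free, and that the per-cycle step yield is the same in every cycle. Neither assumption is stated in the disclosure, and the numerical value 0.90 is a hypothetical example rather than a measured efficiency.

```python
def coupled_fraction(p_single: float, attempts: int) -> float:
    """Fraction of N-termini coupled after repeated attempts.

    Assumes each attempt reacts a fraction p_single of the termini that
    remain unreacted, so the unreacted fraction is (1 - p_single) ** attempts.
    """
    return 1.0 - (1.0 - p_single) ** attempts


def in_phase_fraction(step_yield: float, cycles: int) -> float:
    """Fraction of peptide copies still in phase after a number of cycles,
    assuming the same step yield in every cycle."""
    return step_yield ** cycles


if __name__ == "__main__":
    # Hypothetical single-attempt efficiency of 0.90, followed over 10 cycles.
    for attempts in (1, 2, 3):
        step_yield = coupled_fraction(0.90, attempts)
        print(attempts, round(step_yield, 4), round(in_phase_fraction(step_yield, 10), 4))
```

Under these assumptions a small gain in per-cycle yield compounds over many cycles, which is why a second, less costly isothiocyanate can be worthwhile even if it only reacts the remaining free amines.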
[0298] Some embodiments include: after (m) and/or (l), contacting the first immobilized conjugate complex with a promiscuous binding agent capable of binding to the first immobilized conjugate complex independent of the identity of an amino acid within the conjugate complex, wherein the promiscuous binding agent comprises a binding moiety that associates with the immobilized conjugate independent of the amino acid, and a promiscuous recode tag capable of hybridization to any cycle tag and that carries identifying information regarding the promiscuous binding agent. In some embodiments, the conjugate complex comprises zero, one, or more reactive moieties, and the reaction comprises an activatable chemistry and/or reversible chemistry. In some embodiments, the recode tag associated with the first binding agent is a nucleic acid having a sequence corresponding to an (n-1)th cycle tag, an amino acid (AA) tag, and an nth cycle tag. In some embodiments, (i) through (l) are performed simultaneously. In some embodiments, (m) comprises contacting the recode blocks with ligase, AA tag oligonucleotide complements, and buffer under conditions that allow ligation to assemble the recode blocks and AA tag oligonucleotide complements into a memory oligo. In some embodiments, the memory oligo, the cycle tag, and the recode block each comprise a nucleic acid molecule. In some embodiments, the memory oligo comprises a universal priming site, the universal priming site comprising a priming site for amplification or a priming site for sequencing, or both. In some embodiments, the binding agent comprises a polypeptide or protein.
Process 200-5
[0299] Disclosed herein, in some embodiments, are methods for determining identity and positional information of a plurality of amino acid residues of a peptide, the peptide comprising n amino acid residues, the method comprising: (a) coupling the peptide to a solid support such that an N-terminal amino acid residue of the peptide is not directly coupled to the solid support and is exposed to reaction conditions; (b) providing a chemically-reactive conjugate, the chemically-reactive conjugate comprising: (x) a cycle tag comprising a cycle nucleic acid associated with a cycle number; (y) a reactive moiety for binding and cleaving the N-terminal amino acid residue of the peptide and exposing a next amino acid residue as an N-terminal amino acid residue on the cleaved peptide; and (z) an immobilizing moiety for immobilization to the solid support; (c) contacting the peptide with the chemically-reactive conjugate, thereby coupling the chemically-reactive conjugate to the N-terminal amino acid of the peptide to form a conjugate complex; (d) immobilizing the conjugate complex to the solid support via the immobilizing moiety; (e) cleaving and thereby separating the N-terminal amino acid residue from the peptide, thereby exposing the next amino acid residue as an N-terminal amino acid residue on the cleaved peptide and providing an immobilized amino acid complex, the immobilized amino acid complex comprising the cleaved and separated N-terminal amino acid residue; (f) repeating (b) through (e) n-1 times to assemble n-1 additional immobilized amino acid complexes, each additional immobilized amino acid complex comprising a nucleic acid associated with cycle 2 to n, accordingly; (g) contacting the immobilized amino acid complexes with a binding agent, the binding agent comprising: a binding moiety for preferentially binding to one or to a subset of the immobilized amino acid complexes; and a recode tag comprising a recode nucleic acid corresponding with the binding agent, thereby forming one or more affinity complexes, each affinity complex comprising an immobilized amino acid complex and a binding agent and thereby bringing a cycle tag into proximity with a recode tag within each formed affinity complex; (h) within each formed affinity complex, joining a cycle tag to a recode tag to form a recode block, thereby creating a plurality of recode blocks, each recode block corresponding with a formed affinity complex; (i) joining two or more members of the plurality of recode blocks to form a memory oligonucleotide; (j) obtaining sequence information for the memory oligonucleotide; and (k) based on the obtained sequence information, determining identity and positional information of a plurality of amino acid residues of the peptide. In some embodiments, n is an integer greater than or equal to 2. In some embodiments, each binding agent comprises recode tags with a unique nucleic acid sequence. In some embodiments, a plurality of binding agents comprises recode tags with the same nucleic acid sequence. In some embodiments, the binding agents comprise recode tags which may have a unique sequence portion and a common sequence portion.
[0300] In some embodiments, determining the identity and positional information of the plurality of amino acid residues of the peptide comprises determining the identity and positional information of all of the amino acid residues of the peptide. In some embodiments, determining the identity and positional information of the plurality of amino acid residues of the peptide comprises determining the identity and positional information of only a subset of the amino acid residues of the peptide. Some embodiments include identifying the peptide, for example by comparing the identity and positional information of the plurality of amino acid residues to a database.
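As an illustrative, non-limiting sketch of comparing identity and positional information to a database, the following Python example checks a partial set of (position, amino acid) observations against a small in-memory reference. The function name `match_peptides`, the reference names, and the reference sequences are hypothetical and are introduced here only for illustration; an actual implementation may tolerate miscalls or use a different search strategy.

```python
from typing import Dict, List


def match_peptides(observed: Dict[int, str], database: Dict[str, str]) -> List[str]:
    """Return names of reference sequences consistent with every observation.

    `observed` maps a 1-based residue position (cycle number) to a one-letter
    amino acid code; positions that were not determined are simply absent.
    """
    hits = []
    for name, sequence in database.items():
        consistent = all(
            pos <= len(sequence) and sequence[pos - 1] == aa
            for pos, aa in observed.items()
        )
        if consistent:
            hits.append(name)
    return hits


# Hypothetical reference database (illustrative sequences only).
REFERENCE = {
    "peptide_A": "MKTAYIAK",
    "peptide_B": "MKSAYLAK",
    "peptide_C": "GKTAYIAK",
}

# Only a subset of residues was identified: positions 1, 3, and 6.
observations = {1: "M", 3: "T", 6: "I"}
matches = match_peptides(observations, REFERENCE)
```

In this sketch, three identified residues are enough to single out one reference entry, which mirrors the case in which only a subset of the amino acid residues is determined.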
Process 202 - Location Oligos
[00301] In certain embodiments, a method for analyzing one or more peptides from a sample comprising a plurality of peptides, proteins, and/or protein complexes is provided, the method comprising: (a) providing a peptide of mer length n=2 to 2000 joined to a solid support using a location linker, wherein the location linker is bound to a location oligo (e.g., “locationOligo”); (b) providing a first chemically-reactive conjugate (e.g., a PITC-conjugate), wherein the conjugate comprises a cycle tag with identifying information regarding a workflow cycle of the method, a reactive moiety that can bind and cleave a terminal amino acid of the peptide, and a reactive moiety that facilitates immobilization to a solid support; (c) contacting the peptide with the first chemically-reactive conjugate, wherein the first chemically-reactive conjugate binds with the terminal amino acid, or a modified terminal moiety, of the peptide to form a first conjugate complex, e.g., a PIT-AA-cycle tag-conjugate complex; (d) immobilizing the first conjugate complex to the solid support; (e) cleaving the terminal amino acid from the peptide thereby providing a first immobilized conjugate complex and a new terminal amino acid of the peptide joined to the solid support of (a); (f) optionally repeating (b) through (e) to assemble a second immobilized conjugate complex having cycle information for the new terminal amino acid of the peptide; (g) optionally repeating (f) for additional iterative cycles to create additional immobilized conjugate complexes for additional amino acids of the peptide of step (a); (h) optionally deprotecting nucleic acids of the conjugate complex and/or any protected nucleic acids associated with the solid support; (i) contacting the first immobilized conjugate complex with a first binding agent capable of binding to the first immobilized conjugate complex, wherein the first binding agent comprises a binding moiety and a recode tag with identifying information regarding the first binding agent; (j) transferring the information of the recode tag associated with the first binding agent to the cycle tag of the first immobilized conjugate complex to generate a first recode block; (k) optionally repeating (i) and (j) with a second binding agent comprising a binding moiety and a recode tag with identifying information regarding the second binding agent to transfer the information of the recode tag associated with the second binding agent to the second immobilized conjugate complex to generate a second recode block; (l) optionally repeating step (k) for additional cycles to create recode blocks for additional amino acids of the peptide of step (a); (m) contacting at least the first recode block and a corresponding location oligo with a polymerase, nucleotides, and buffer under conditions that allow extension to transfer information from the location oligo to the first recode block, thereby creating a memory oligo; (n) optionally repeating step (m) to transfer information from the location oligo to additional recode blocks proximal to the location oligos; (o) releasing the memory oligos from the solid support via tether cleavage, hydrogel dissociation, polymerization, or another means; (p) optionally assembling the memory oligos into longer memory oligos ex situ; and (q) analyzing the memory oligos.
[00302] For clarity, it is envisioned in some embodiments that the location oligo is provided via random seeding, via a hyb tag initiator, or in another manner not using a location linker.
[00303] In some aspects, one or more operations of the method are repeated one or more times to increase a step yield of the method. For example, in specific aspects, operations (e), (i), and/or (j) are repeated one or more times to increase the step yield.
[00304] In some aspects, the peptide comprises any suitable macromolecular polymer, including a protein, a peptide, a complex carbohydrate, and the like. In such aspects, a monomeric unit of the macromolecular polymer may comprise an amino acid, a carbohydrate, and/or any monomeric moiety that may be combined into a polymer.
[00305] In some aspects, the recode tag linked to the binding agent is a nucleic acid having a sequence corresponding to an (n-1)th cycle tag or (n+/-i)th cycle tag, an amino acid (AA) tag (e.g., an “AAtag”), and an nth cycle tag. Optionally, the recode tag linked to the binding agent is a nucleic acid having a universal sequence for amplification or assembly, a sequence complementary to a cycle tag (e.g., a “cycle tag complement sequence”), and an amino acid (AA) tag (e.g., an “AAtag”).
[00306] In some aspects, the information is transferred from a location oligo to a recode block using a ligase.
[00307] In some aspects, each individual memory oligo is analyzed either on its own or randomly assembled with other memory oligos from the same analyte or different analytes of a sample. This approach may streamline the recoding process and allow for more efficient analysis.
[00308] In some aspects, the location oligos can be utilized to determine spatial location within a histological tissue section and combined with identification data in silico to enable spatial resolution of individual protein molecules. Determining spatial locations of protein molecules within histological tissue sections enables spatial multiomic analysis. Spatial multiomics is the study of gene/RNA expression and protein abundance with spatial context to elucidate functional biology. Integrating different scales of analysis from spatial multiomics can facilitate an improved understanding of tissue and cellular microenvironments.
[00309] In some aspects, the conjugate complex comprises zero, one, or more reactive moieties (e.g., used to join the complex to a solid support), and the reaction comprises an activatable chemistry.
[00310] In some aspects, the conjugate complex comprises zero, one, or more reactive moieties (e.g., used to join the complex to a solid support), and the reaction comprises a reversible chemistry.
[00311] In some aspects, the conjugate complex comprises zero, one, or more reactive moieties (e.g., used to join the complex to a solid support), and the reaction comprises an activatable and reversible chemistry.
[00312] In some aspects, one or more amino acids (or monomer subunits) are removed from the immobilized peptide (or macromolecular analyte) without regard to identifying the amino acid (or monomer) for that cycle. These “skipped” amino acid cycles are recorded in silico, and analysis algorithms account for known translations of the skipped information during alignment to reference sequences. In the case of peptides, this may be accomplished by optionally performing one or more iterations of operations 2-4, described below, where PITC is substituted for the chemically-reactive conjugate (e.g., a PITC-conjugate). This may be referred to as a “strobed” read or “strobed” sequencing. One advantage of this aspect is that an isoform of a protein may be readily determined by reading segments of said protein that are not adjacent to one another to achieve long-range information. This may save time and costs to obtain intervening or redundant information contained in the peptide, or in a combination of peptide and associated genomic information. For example, this aspect may include 5 cycles of peptide degradation using a chemically-reactive conjugate, followed by 30 cycles using PITC or an enzymatic cleavage, then another 5 cycles with the chemically-reactive conjugate, and so on.
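The bookkeeping for such a “strobed” read can be sketched as follows. This minimal Python example, offered under stated assumptions, lists which residue positions are recorded when read cycles (run with the chemically-reactive conjugate) alternate with skipped cycles (run with PITC or an enzymatic cleavage). The function name `strobed_positions` and the `"read"`/`"skip"` labels are hypothetical, and each cycle is assumed to remove exactly one residue.

```python
from typing import List, Tuple


def strobed_positions(schedule: List[Tuple[str, int]]) -> List[int]:
    """Return the 1-based residue positions recorded by a strobed workflow.

    `schedule` is a list of (mode, cycles) pairs, where mode is "read"
    (residue is recorded) or "skip" (residue is removed without recording).
    Assumes every cycle removes exactly one residue.
    """
    positions = []
    cycle = 0
    for mode, cycles in schedule:
        if mode not in ("read", "skip"):
            raise ValueError(f"unknown mode: {mode}")
        for _ in range(cycles):
            cycle += 1
            if mode == "read":
                positions.append(cycle)
    return positions


# The example from the text: 5 read cycles, 30 skipped cycles, 5 read cycles.
example = [("read", 5), ("skip", 30), ("read", 5)]
read_positions = strobed_positions(example)
```

For the schedule above, the recorded positions are residues 1 to 5 and 36 to 40, and an alignment algorithm would treat the 30 intervening positions as a known gap.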
[00313] In some aspects, utilization of a predetermined subset of binding agents allows identification of a subset of the amino acids of a peptide, polypeptide, protein, or a protein complex. Given that sites of interest (e.g. post-translational modification (PTM) or splice locations) can be different across various proteins in a mixed population, this aspect eliminates the need for measuring/determining the identity for every single amino acid in a sample at every single cycle - a task that would require significantly more sequencing.
[00314] In some aspects, the subset of amino acids identified by the subset of binding agents are modified with a post-translational modification. Doing so may greatly enrich the information density for the subset of amino acids upon analysis.
[00315] In some aspects, one or more amino acids (or monomer subunits) are removed from the immobilized peptide (or macromolecular analyte) without regard to identifying the amino acid (or monomer) for that cycle, using an aminopeptidase (e.g., CAS Number: 37288-67-8) or similar agent/construct. Further, this technique can also be applied to prepare an N-terminus of proteins or peptides protected by acylation for processing by the chemically-reactive conjugate. Further, this method can be used to “strobe” through amino acids, such as proline, which may otherwise not be effectively cleaved under chemical conditions using a chemically-reactive conjugate in some examples.
[00316] In some aspects, one or more operations of the method are performed simultaneously. For example, in specific aspects, operations (i) through (l) are performed at the same time.
[00317] In some aspects, operation (m) comprises contacting the recode blocks with ligase, AA tag oligonucleotide complements, and buffer under conditions that allow ligation to assemble the recode blocks and AA tag oligonucleotide complements into a memory oligo.
[00318] In some aspects, a memory oligo, a cycle tag, a recode block, an AA tag complement, and/or a ligation oligo or component may comprise a DNA molecule, an RNA molecule, another type of nucleic acid molecule, a DNA molecule with pseudo-complementary bases (e.g. Inosine), or a combination or chimera thereof.
[00319] In some aspects, the memory oligo or ligation component comprises a universal priming site, and the universal priming site may comprise a priming site for amplification, priming site for sequencing, or both.
[00320] In some aspects, the memory oligo comprises a sample index, a spacer, a unique molecular identifier (UMI), a universal priming site, a CRISPR protospacer adjacent motif (PAM) sequence, or any combination thereof.
[00321] In some aspects, the memory oligo and/or chemically-reactive conjugate comprises a spacer having a length between 0.1 nm and 500 nm attached at its 3'-terminus or 5'-terminus, or attached to a modified nucleotide base.
[00322] In some aspects, the memory oligo is associated with a unique molecule identifier (UMI) or barcode.
[00323] In some aspects, a solid support as described herein comprises a solid bead, a porous bead, a solid planar support, a porous planar support, a patterned or non-patterned surface, a nanoparticle, or an inorganic or polymeric microsphere. In some aspects, the support may comprise a glass slide or wafer, a silicon slide or wafer, a PC, PTC, PE, HDPE, or other plastic surface, or a Teflon, nylon, nitrocellulose, or other membrane, and particles/beads may be polystyrene, crosslinked polystyrene, agarose, or acrylamide.
[00324] In some aspects, the bead or nanoparticle is magnetic or paramagnetic.
[00325] In some aspects, a solid support may be passivated with glass, silicon oxide, tantalum pentoxide, diamond-like carbon (DLC), or other passivation agents, or a solid support may comprise membranes that are passivated or activated via, e.g., corona or other plasma treatment methods.
[00326] In some aspects, a solid support may or may not be assembled with other components to facilitate fluid transport and/or detection (e.g., flowcell, biochip, a microtiter plate).
[00327] In some aspects, a solid support is comprised of a hydrogel that supports joining components for macromolecule recoding and/or analysis workflow.
[00328] In some aspects, a hydrogel is formed from synthetic polymers, natural polymers, and/or hybrid polymers. Monomers may include one or more of: acrylamide, dihydroxy methacrylates, methacrylic acid, or the like in linear, branched, and/or crosslinked configurations, block copolymer configurations, or other configurations conducive to sequencing macromolecules.
[00329] In some aspects, a hydrogel comprises at least 3 orthogonal conjugation chemistry modalities.
[00330] In some aspects, surface chemistries may be modified to manage non-specific binding. Such surface chemistries include PEG-silane coated surfaces, PEG-coated surfaces, polyethylene oxide coated surfaces, self-assembled monolayers, zwitterionic polymer coated surfaces, layer-by-layer assembled surfaces, poly(2-methacryloyloxyethyl phosphorylcholine) coated surfaces, poly(oligoethyleneglycol methacrylate) coated surfaces, poly(oligoethyleneglycol acrylamide) coated surfaces, poly(2-hydroxyethylmethacrylate) coated surfaces, polysaccharide coated surfaces, dextran coated surfaces, poly(sulfobetaine) coated surfaces, poly(carboxybetaine) coated surfaces, polydimethylsiloxane coated surfaces, hyaluronic acid coated surfaces, fluoropolymer coated surfaces, perfluoropolyether coated surfaces, as well as random or block copolymer, branched, crosslinked, functional group modified, combinations, and multilayer versions thereof.
[00331] In some aspects, macromolecule (e.g., protein, peptide) and/or universal primer sequences are covalently joined to the solid support.
[00332] In some aspects, the binding agent comprises a polypeptide or protein, e.g., an antibody or portion thereof (e.g., a single-chain variable fragment (scFv), a fragment antigen-binding (FAB) region, a FAB2 region), a nanobody, a DNA aptamer, an RNA aptamer, a modified aptamer, a photo-active or non-photoactive cage compound, an oligo-peptide permease (Opp), an aminoacyl tRNA synthetase (aaRS), a periplasmic binding protein (PBP), a dipeptide permease (Dpp), a proton dependent oligopeptide transporter (POT), a modified aminopeptidase, a modified amino acyl tRNA synthetase, a modified anticalin, or a modified Clp protease adaptor protein (ClpS).
[00333] In some aspects, the binding agent is capable of selectively binding to an immobilized conjugate complex depending on the AA that is part of the complex.
[00334] In some aspects, the binding agent comprises a binding moiety and a recode tag.
[00335] In some aspects, the recode tag comprises sequences that represent AA information, and the recode block comprises sequences that represent both workflow cycle and amino acid (or monomer identity) information.
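A minimal sketch of how cycle and amino acid information might be recovered from recode blocks after sequencing is shown below. It assumes, purely for illustration, that each recode block is a fixed-length cycle tag followed by a fixed-length AA tag with no intervening spacer; the tag sequences, the tag length, and the function name `decode_memory_oligo` are hypothetical and are not taken from the disclosure.

```python
from typing import Dict, List, Tuple

# Hypothetical tag tables; actual tag sequences and lengths are assumptions.
CYCLE_TAGS: Dict[str, int] = {"ACGT": 1, "TGCA": 2, "GATC": 3}
AA_TAGS: Dict[str, str] = {"AAGG": "G", "CCTT": "A", "GGAA": "S"}
TAG_LEN = 4


def decode_memory_oligo(oligo: str) -> List[Tuple[int, str]]:
    """Split an oligo into [cycle tag][AA tag] recode blocks and decode each.

    Returns (cycle number, amino acid) pairs sorted by cycle number, so the
    order in which the blocks were assembled does not affect the result.
    """
    block_len = 2 * TAG_LEN
    if len(oligo) % block_len != 0:
        raise ValueError("oligo length is not a whole number of recode blocks")
    calls = []
    for start in range(0, len(oligo), block_len):
        cycle_tag = oligo[start:start + TAG_LEN]
        aa_tag = oligo[start + TAG_LEN:start + block_len]
        calls.append((CYCLE_TAGS[cycle_tag], AA_TAGS[aa_tag]))
    return sorted(calls)


# Recode blocks for cycles 2, 1, and 3, assembled out of order.
oligo = "TGCA" + "CCTT" + "ACGT" + "AAGG" + "GATC" + "GGAA"
calls = decode_memory_oligo(oligo)
peptide = "".join(aa for _, aa in calls)
```

Because each block carries its own cycle number, sorting by cycle restores the residue order even when blocks are joined in a different order.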
[00336] In some aspects, the binding moiety and the recode tag are joined by a linker with length between 0.1 nm and 500 nm.
[00337] In some aspects, the chemically reactive conjugate and/or conjugate complex further comprises a spacer, a workflow cycle specific sequence, a unique molecular identifier, a universal priming site, a restriction endonuclease cleavage sequence, or any combination thereof.
[00338] In some aspects, the chemically reactive conjugate and/or conjugate complex comprises a spacer associated with a reactive moiety used for immobilization of the chemically-reactive conjugate complex to the hydrogel surface, and the spacer comprises a restriction endonuclease cleavage sequence capable of releasing the PITC-AA moiety and/or cycle tag from the conjugate complex.
[00339] In some aspects, the chemically reactive conjugate and/or conjugate complex comprises a spacer associated with the reactive moiety used to bind and cleave terminal amino acids, and that spacer contains a restriction endonuclease cleavage sequence capable of releasing the cycle tag and/or the reactive moiety used for immobilization from the conjugate complex.
[00340] In some aspects, the chemically reactive conjugate may be in a pro-form, meaning that it is able, through additions, activations, cleavage reactions, or other manipulations, to perform the functions of cycle identification (e.g., cycle tag), binding and cleavage of amino acids (e.g., PITC), and reaction with a surface, such as a hydrogel coated surface.
[00341] In some aspects, transferring the information of the recode tag to the recode block is mediated by a DNA ligase and a ligation oligo.
[00342] In some aspects, transferring the information of the recode tag to the recode block is mediated by a DNA polymerase, or by a combination of a DNA polymerase and ligase.
[00343] In some aspects, transferring the information of the recode tag to the recode block is mediated by chemical ligation.
[00344] In some aspects, a plurality of macromolecules and associated conjugate complexes are joined to a solid support.
[00345] In some aspects, a plurality of pools with different combinations or compositions of binding agents having completely distinct, or distinct but overlapping, affinities can be introduced to the surface of immobilized chemically-reactive conjugates. By using different pools with distinct binding properties, a more comprehensive and accurate characterization of the immobilized peptides can be achieved.
[00346] In some aspects, the plurality of macromolecules are spaced apart on the solid support at an average distance >100 nm.
[00347] In some aspects, the reactivity of a residual chemically-reactive conjugate (e.g., a conjugate that is unreacted with amino-acid, but immobilized to the surface, due to insufficient removal by washing prior to initiating the immobilization chemistry) is quenched by an amino acid or amino acid mimic so as to become a bystander in future cycles.
[00348] In some aspects, modification of a terminal amino acid of the peptide prior to contacting the peptide with the first chemically-reactive conjugate increases the reactivity of the chemically-reactive conjugate toward the modified amino acid relative to non-modified amino acids. For example, activation of the C-terminal amino acid with acetic anhydride prior to contacting with trimethylsilylisothiocyanate has been described (Bailey, J.M., Shenoy, N.R., Ronk, M., & Shively, J.E., 1992, Protein Sci. 1, 68-80).
[00349] In some aspects, the methods described herein further comprise, after contacting the recode blocks with polymerase, nucleotides, ligase, and/or buffer under conditions that allow extension-ligation or ligation to assemble the recode blocks into a memory oligo, contacting a plurality of incompletely ligated memory oligos with linking oligos, polymerase, nucleotides, ligase, and/or buffer under conditions that allow extension-ligation or ligation to assemble the incompletely ligated memory oligos into a memory oligo. Accordingly, the yield during memory oligo assembly may be increased.
[00350] In some aspects, the methods described herein further comprise, after contacting the recode blocks with polymerase, nucleotides, ligase, and/or buffer under conditions that allow extension-ligation or ligation to assemble the recode blocks into a memory oligo, contacting a plurality of incompletely ligated memory oligo fragments and/or recode blocks with linking oligos, ligase, and buffer under conditions that promote ligation of recode blocks and memory oligo fragments. Accordingly, the yield during memory oligo assembly may be increased.
[00351] In some aspects, the linking oligo comprises a sequence complementary to that of the recode blocks, thereby facilitating ligation of recode blocks that were not ligated during contacting with the polymerase, nucleotides, ligase, and buffer.
[00352] In some aspects, the linking oligos comprise additional nucleotide sequences coded to carry information related to sample or process, and/or that aid in ligation or extension-ligation.
[00353] In some aspects, the memory oligo is amplified prior to analysis, e.g., by bridge amplification, ExAmp NGS clustering, isothermal clustering, solution-based PCR amplification, A-tailing to add primer sequences prior to solution-based amplification, or any suitable DNA amplification method.
[00354] In some aspects, a memory oligo optionally comprises a sample index, a spacer, a unique molecular identifier (UMI), a universal priming site, a CRISPR protospacer adjacent motif (PAM) sequence, or any combination thereof.
[00355] In some aspects, a plurality of memory oligos are enriched prior to analysis, e.g., via a depletion process or a normalization process to remove or reduce the fraction of oligos associated with abundant proteins, peptides, or macromolecules. In some aspects, enrichment or depletion may be carried out via commercially available kits, such as Agilent SureSelect, or via custom enrichment or depletion methods using oligonucleotides partially complementary to a memory oligo sequence, e.g., complementary to AA tag sequences of the target memory oligo.
[00356] In some aspects, a plurality of memory oligos representing a plurality of macromolecules are analyzed in parallel.
[00357] In some aspects, analyzing the memory oligo(s) comprises a nucleic acid sequencing method.
[00358] In some aspects, analyzing the memory oligo(s) comprises analysis via a multiplex PCR method.
[00359] In some aspects, the nucleic acid sequencing method comprises sequencing by synthesis, sequencing by ligation, sequencing by hybridization, or pyrosequencing.
[00360] In some aspects, the nucleic acid sequencing method comprises single molecule microscopy sequencing or nanopore sequencing.
[00361] In some aspects, the memory oligo is configured to be analyzed using commercially available NGS technology, such as the NGS methods exemplified by Illumina, Element Bio, and Singular Genomics.
[00362] In some aspects, the chemically reactive conjugate and/or conjugate complex comprises a cleavable group flanked by matched unique molecular identifiers (UMIs) within the cycle tag to facilitate cleavage of memory oligos at designated positions. In these aspects, one or more restriction endonuclease sequences carried by one or more cycle tag sequences assembled into a memory oligo are cleaved to create one or more oligonucleotides (memory oligos). The oligonucleotides are short enough to be read completely using short-read DNA sequencing technology, including those short-read DNA sequencing methods and devices commercialized by Illumina, Element Bio, and Singular Genomics.
[00363] In some aspects, helicase may be utilized during assembly of memory oligos. The use or strobing of helicase during one or more assembly processes may, in some examples, improve access of DNA blocks to facilitate longer memory oligo assembly.
[00364] In some aspects, the memory oligo or recode blocks thereof are configured to be analyzed using a decode-based methodology. More information regarding decode-based techniques may be found in Gunderson et al., Decoding Randomly Ordered DNA Arrays, Genome Res., 2004 May; 14(5):870-7, which is herein incorporated in its entirety by reference for all purposes.
[00365] In some aspects, fragments of memory oligos, or recode blocks, or any such spatially-confined set of constructs that contains sequence and identity information associated with a given peptide, protein, protein complex, or polymer, are analyzed using a decode-based methodology. See Gunderson et al.
[00366] In some aspects, identifying components are selected from UMIs, sample indexes, recode tags, recode blocks, ligation oligos, AA tags, their complements, or any combination thereof.
[00367] In some aspects, the N-terminal AA of the peptide is removed by chemical cleavage alternatives to Edman cleavage.
[00368] In some aspects, one or more chemically-reactive conjugates bind to a terminal amino acid residue of the peptide.
[00369] In some aspects, one or more binding agents bind to the conjugate complex.
[00370] In some aspects, the conjugate complex comprises a post-translationally modified amino acid.
[00371] In some aspects, the identifying components of a recode tag, recode block, or both comprise error detection and/or correction bits.
[00372] In some aspects, the error detection/correcting sequence is derived from Hamming distance theory, or other modern digital code space theories (e.g., Lee, Levenshtein-Tenengolts, Reed-Solomon, or others).
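By way of a hedged illustration of a Hamming-distance-based approach, the Python sketch below assigns a sequenced tag to the unique codeword within a given number of substitutions. The codebook sequences, the amino acid labels, and the function names `hamming` and `correct_tag` are hypothetical; the example handles substitutions only, whereas insertion and deletion errors would call for an edit-distance code such as the Levenshtein-type codes mentioned above.

```python
from typing import Dict, Optional


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def correct_tag(read: str, codebook: Dict[str, str], max_errors: int = 1) -> Optional[str]:
    """Return the label of the unique codeword within `max_errors` of `read`.

    Returns None when no codeword, or more than one codeword, is that close.
    """
    close = [label for code, label in codebook.items()
             if hamming(read, code) <= max_errors]
    return close[0] if len(close) == 1 else None


# Hypothetical AA-tag codebook. Every pair of codewords differs in at least
# 3 positions, so any single substitution can be corrected unambiguously.
CODEBOOK = {"AAAAAA": "Gly", "ACCGTT": "Ala", "CAGTCG": "Ser"}

exact = correct_tag("ACCGTT", CODEBOOK)      # error-free read
corrected = correct_tag("ACCGTA", CODEBOOK)  # one substitution at the last base
rejected = correct_tag("TTTTTT", CODEBOOK)   # too far from every codeword
```

A minimum pairwise distance of 3 is what allows single-error correction; a minimum distance of 2 would only allow single errors to be detected.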
[00373] In some aspects, the constituents of a recode tag, recode block, or both, comprise 2, 3, 4, 5, 6 or more different types of nucleotides.
[00374] In some aspects, the code (or codes) (e.g., sequences) associated with a recode tag or recode block via analysis of the memory oligo are derived from 2, 3, 4, 5, 6 or more types of nucleotides.
[00375] In some aspects, the number of different types of nucleotides used to create a recode code does not equal the number of nucleotide types that make up the recode tag, the cycle tag, or both.
[00376] In some aspects, activation of a macromolecule, fragment, or peptide comprises a functional moiety such as an NHS group, an aldehyde group, an azide group, an alkyne group, a maleimide group, a thiol group, a tetrazine, a trans-cyclooctene, or the like.
[00377] In some aspects, an immobilized peptide is linearized (denatured) using detergent(s), surfactant(s), chaotropic agent(s), reducing agent(s), and/or alkylation agent(s).
[00378] In some aspects, a chemically-reactive conjugate reacts and cleaves from a C-terminus of the peptide rather than the N-terminus to create recode blocks that can be assembled using any of the methods described herein.
[00379] In some aspects, “paired-end read” information may be collected from an immobilized protein complex, protein, or peptide, by creating recode blocks using chemically-reactive conjugates operating on both the N-terminus and C-terminus of a given protein complex, protein, or peptide sequentially or in parallel to create recode blocks that can be assembled using methods described herein.
[00380] In certain embodiments, a method for acquiring a priori defined code information via sequencing of a subset of nucleotide types in an oligonucleotide or oligonucleotide cluster is provided. Such a method is particularly beneficial when considering readouts of information stored in DNA (e.g., DNA data storage information technology readout).
[00381] In some aspects, information recoded into a memory oligo is acquired via sequencing of a subset of the nucleotide types in the memory oligo. For example, a subset of nucleotide types may be identified and a subset of nucleotide types may not be identified in the sequencing readout, e.g., by introducing non-fluorescent, non-reversibly-terminated nucleotides into an SBS sequencing reagent mixture. In certain embodiments, the subset is 2 of the 4 natural nucleotides.
[00382] In certain embodiments, a method for preparing a peptide or a plurality of peptides of mer length n=2 to 2000 to be joined to a solid support is provided, the method comprising: (a) fragmenting peptides, protein, and/or protein complexes in one or more samples; (b) activating zero, 1, 2, or more moieties of each fragmented peptide, protein, and/or protein complex; (c) optionally joining a sample-specific nucleotide index sequence to the activated peptides, proteins, and/or protein complexes; and (d) joining the peptides to a solid support.
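The effect of reading only a subset of nucleotide types can be sketched in a few lines of Python. This example assumes, as a simplification not stated in the text, that undetected nucleotide types are incorporated without producing a signal and without consuming a sequencing cycle, so the readout is simply the detected bases in order. The function names `subset_readout` and `decode_bits`, the choice of A and C as the detected pair, and the bit mapping are all hypothetical.

```python
from typing import List


def subset_readout(strand: str, detected: str = "AC") -> str:
    """Return only the bases of `strand` whose type is in `detected`.

    Assumes undetected bases are passed through silently, so they leave
    no trace in the readout.
    """
    return "".join(base for base in strand if base in detected)


def decode_bits(readout: str, zero: str = "A", one: str = "C") -> List[int]:
    """Map a two-letter readout onto bits using a hypothetical A=0, C=1 code."""
    mapping = {zero: 0, one: 1}
    return [mapping[base] for base in readout]


# Hypothetical sequenced strand; G and T act as undetected filler bases.
strand = "AGTCGGCTA"
readout = subset_readout(strand)
bits = decode_bits(readout)
```

Under these assumptions the nine-base strand yields the four-symbol readout `ACCA`, so a code defined a priori over two nucleotide types can be recovered in fewer detected cycles than full four-base sequencing would require.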
[00383] In some aspects, one or more of the operations of the method are performed in any suitable sequential order, or are simultaneously performed.
[0384] In some aspects, subunits of a given protein are co-immobilized directly or through their interaction with native subunits on the surface. Subsequently, the one or more subunits may be simultaneously recoded by processes (b)-(m), including alternate aspects associated with the method, within the same localized region. Information of the memory oligo may contain an admixture of subunits (protein and native) which can be deconvoluted in silico.
[0385] In certain embodiments, a method for preparing interacting peptides, or a plurality of interacting peptides, to be joined to a solid support is provided, the method comprising: (a) cross-linking peptides, protein, and/or protein complexes in one or more samples (for example, using homo-bifunctional, heterobifunctional, or photoreactive methods as described in Kluger, et al., (2004) Bioorganic Chemistry v32:6, 451); (b) activating zero, 1, 2, or more moieties of each cross-linked peptide, protein, and/or protein complex for immobilization to a solid support; (c) optionally joining a sample-specific nucleotide index sequence to the activated peptides, proteins, and/or protein complexes; and (d) joining the complexes to the solid support. In some aspects, one or more of the operations of the method are performed in any suitable sequential order, or are simultaneously performed. Generally, the method enables the analysis of in vivo associated proteins and their interactions, and thus, facilitates discovery, identification, and investigation of protein interactomes.
[0386] In certain embodiments, a method for preparing interacting DNA-peptides, or a plurality of interacting DNA-peptide complexes, to be joined to a solid support is provided, the method comprising: (a) cross-linking peptides, proteins, and/or protein complexes with native DNA with which the protein was associated in biological context for one or more samples (for example, using formaldehyde, or other methods known in the art); (b) activating zero, 1, 2, or more moieties of each cross-linked peptide-DNA, protein-DNA, and/or protein complex-DNA complex; (c) optionally joining a sample-specific nucleotide index sequence to the activated peptide-DNA and/or protein-DNA complexes; and (d) joining the complexes to a solid support. In some aspects, one or more of the operations of the method are performed in any suitable sequential order, or are simultaneously performed. Generally, the method provides for the analysis of in vivo interactions between proteins and DNA.
[0387] In some aspects, fragmentation comprises physical shearing, endopeptidase activity, modified endopeptidase activity, protease, metalloprotease, and/or other suitable fragmenting methods.
[0388] In some aspects, a peptide comprises any suitable macromolecular polymer, including a protein, a peptide, and the like. In such aspects, a monomeric unit of the macromolecular polymer may comprise an amino acid, a carbohydrate, and/or any monomeric moiety that may be combined into a polymer.
[0389] In some aspects, the method further comprises depletion of one or more abundant proteins from the sample prior to any of operations (a), (b), (c), and/or (d).
[0390] It may be advantageous to reduce the complexity of macromolecules on the solid support. Benefits may include reduction of non-specific binding, and/or removal of constituents that may form specific interactions with reagents, such as a pool of binding agents. Therefore, in some aspects, the macromolecule analyte is degraded at any step following step (g) of the disclosed method for analyzing one or more peptides from a sample comprising a plurality of peptides, proteins, and/or protein complexes. For example, the residual amino acids that are not isolated through steps (a)-(g) of the method may be digested prior to contacting the surface with binding agents. Methods for degrading the macromolecular analyte include chemical methods to release the macromolecule from the surface, such as reduction of a disulfide linkage, or enzymatic digestion methods that include proteinase K, trypsin, chymotrypsin, or other endo- or exopeptidases.
Process 203 - Substrate Rejuvenation
[0391] In certain embodiments, the utilization of chemically-reactive conjugates with cleavable spacers allows rejuvenation of a surface of a substrate for a second round of recoding. For example, in certain embodiments, a method for analyzing one or more residual immobilized analytes from a surface having a plurality of peptides, proteins, and/or protein complexes is provided, the method comprising: (a) providing a surface used in a previous round of recoding operations (b)-(d) described below, and which has been rejuvenated by cleaving the spacers of a first chemically-reactive conjugate; (b) providing a second chemically-reactive conjugate (e.g., a PITC-conjugate), wherein the conjugate comprises a cycle tag with identifying information regarding a workflow cycle of the method, a reactive moiety that can bind and cleave a terminal amino acid of the peptide, and a reactive moiety that facilitates immobilization to a solid support; (c) contacting the peptide with the second chemically-reactive conjugate, wherein the second chemically-reactive conjugate binds with the terminal amino acid, or a modified terminal moiety, of the peptide to form a second conjugate complex, e.g., a PIT-AA-cycle tag-conjugate complex; (d) immobilizing the second conjugate complex to the solid support; (e) cleaving the terminal amino acid from the peptide, thereby providing a second immobilized conjugate complex and a new terminal amino acid of the peptide joined to the solid support of (a); (f) optionally repeating (b) through (e) to assemble a second immobilized conjugate complex having cycle information for the new terminal amino acid of the peptide; (g) optionally repeating (f) for additional iterative cycles to create additional immobilized conjugate complexes for additional amino acids of the peptide of step (a); (h) optionally deprotecting nucleic acids of the conjugate complex and/or any protected nucleic acids associated with the solid support; (i) contacting the second immobilized conjugate complex with a binding agent capable of binding to the second immobilized conjugate complex, wherein the binding agent comprises a binding moiety and a recode tag with identifying information regarding the binding agent; (j) transferring the information of the recode tag associated with the binding agent to the cycle tag of the second immobilized conjugate complex to generate a recode block; (k) optionally repeating (i) and (j) with a second binding agent comprising a binding moiety and a recode tag with identifying information regarding the second binding agent to transfer the information of the recode tag associated with the second binding agent to the second immobilized conjugate complex to generate a second recode block; (l) optionally repeating (k) for additional cycles to create recode blocks for additional amino acids of the peptide of step (a); (m) contacting the recode blocks with polymerase, nucleotides, ligase, and buffer under conditions that allow extension-ligation to assemble the recode blocks into a memory oligo, or create a fragment of a memory oligo; and (n) analyzing the memory oligo.
[0392] In some aspects, previously described aspects associated with a first round of operations are applied to a second round of operations.
[0393] In some aspects, one or more of the operations of the method are performed in any suitable sequential order, or are simultaneously performed.
[0394] In some aspects, a rejuvenation process is repeated one or more times.
[0395] In some aspects, only a fraction of the chemically-reactive conjugates are cleaved from a surface, as it may be desirable to retain a fraction of the recode blocks to facilitate in silico mapping and assembly across iterative cycles of memory oligo assembly.
[0396] In some aspects, surface rejuvenation may include ‘strobing’ the protein using either chemical (e.g., phenylisothiocyanate (PITC)) or biological (e.g., aminopeptidase) methods.
[0397] In some aspects, the amine groups of the nucleic acid bases of residual non-cleaved recode blocks are protected by reaction with fluorenylmethyloxycarbonyl (FMOC) or other standard protection chemistries.
[0398] In some aspects, following process (m) of the method, a plurality of assembly oligos containing all or some of the possible assembly oligos are hybridized to the memory oligo, ligated, and dehybridized to form a solution-phase memory oligo.
Process 304
[0399] Thus, herein disclosed is a method for determining identity and positional information of a plurality of amino acid residues of a peptide, the peptide comprising n amino acid residues, the method comprising the steps of: (a) coupling the peptide to a solid support such that an N-terminal amino acid residue of the peptide is not directly coupled to the solid support and is exposed to reaction conditions; (b) providing a binding agent, the binding agent comprising: (x) a binding moiety for preferentially binding to one or to a subset of amino acid residues such as N-terminal residues of the peptide, (y) a C/AA tag comprising a nucleic acid corresponding with the binding agent, and (z) an immobilizing moiety for immobilization to the solid support; (c) contacting the peptide with the binding agent, thereby forming a complex bringing a C/AA tag into proximity with the point at which the peptide is coupled to the solid support; (d) immobilizing the complex to the solid support via the immobilizing moiety; (e) contacting the N-terminus of the peptide with a reactive moiety for cleaving the N-terminal amino acid residue of the peptide and exposing a next amino acid residue as an N-terminal amino acid residue on the cleaved peptide; (f) optionally associating cycle information with the immobilized C/AA tag; (g) repeating (b) through (e) n-1 times to assemble n-1 additional immobilized C/AA oligonucleotides, each additional immobilized C/AA oligonucleotide comprising a nucleic acid associated with cycle 2 to n, accordingly; (h) joining two or more members of the plurality of recode blocks to form a memory oligonucleotide; (i) obtaining sequence information for the memory oligonucleotide; and (j) based on the obtained sequence information, determining identity and positional information of a plurality of amino acid residues of the peptide.
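As an illustration of steps (h) through (j), the following minimal sketch decodes a memory oligonucleotide into residue identity and position. The fixed block layout (a 4-nucleotide cycle code followed by a 4-nucleotide amino acid code), both code tables, and the function name are hypothetical and chosen only for illustration; they are not taken from this disclosure.

```python
# Hypothetical code tables: illustrative only, not from the disclosure.
CYCLE_CODES = {"AACC": 1, "ACAC": 2, "CAAC": 3}
AA_CODES = {"GGTT": "G", "GTGT": "A", "TGGT": "L"}
BLOCK_LEN = 8  # assumed layout: 4-nt cycle code + 4-nt amino acid code


def decode_memory_oligo(memory_oligo: str) -> str:
    """Return the peptide sequence implied by the recode blocks, ordered by cycle."""
    if len(memory_oligo) % BLOCK_LEN != 0:
        raise ValueError("memory oligo length is not a multiple of the block length")
    residues = {}
    for start in range(0, len(memory_oligo), BLOCK_LEN):
        block = memory_oligo[start:start + BLOCK_LEN]
        cycle = CYCLE_CODES[block[:4]]   # positional information
        residue = AA_CODES[block[4:]]    # identity information
        residues[cycle] = residue
    # The cycle number gives the position from the N-terminus, so the
    # order in which blocks were joined does not matter.
    return "".join(residues[c] for c in sorted(residues))


# Blocks joined out of order (cycle 2, then 1, then 3) still decode by cycle.
print(decode_memory_oligo("ACACGTGT" + "AACCGGTT" + "CAACTGGT"))  # GAL
```

Because each block in this sketch carries its own cycle code, the decoded order depends on the cycle tags rather than on the physical order of the blocks within the memory oligonucleotide.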
[0400] In some embodiments location information is transferred to C/AA, or its associated memory oligo.
[0401] FIG. 17 illustrates steps in an alternate embodiment where cycle and amino acid information are aggregated in serial steps. In FIG. 17a the symbol ‘a’ may represent one or more amino acids of an immobilized protein, or an isolated amino acid. The “C/AA” moiety of the binding agent is a nucleic acid that represents cycle and amino acid identity, is stable throughout serial peptide degradation reactions, and is capable of hybridizing with other nucleic acids. The “AC” moiety of the binding agent represents an activatable chemistry for immobilization of C/AA to a solid support. Binding agents comprising a C/AA nucleic acid that represents cycle and amino acid information, and that comprise a reactive moiety that may be joined to the solid support, contact an isolated amino acid complex. The triangle indicates a chemical cleavage location, for example the location of a disulfide bridge. Alternate embodiments may also utilize the overwhelming advantages of localized assembly methods as described below.
[0402] In FIG. 17b unbound molecules are washed away and C/AA is immobilized in proximity to the immobilized amino acid complex. Immobilized amino acid complexes may be cleaved or blocked between the multiple cycles.
[0403] The process is repeated, as shown in FIG. 17c, to create a plurality of co-localized nucleic acids. The number of C/AA conjugates in an analysis is equal to, or a subset of, the number of cycles multiplied by the number of amino acids or amino acid derivatives in the analysis. For efficiency, in some embodiments, the C/AA oligonucleotide is comprised of amino acid information alone, and further comprised of a reactive moiety capable of joining with cycle information. In this aspect, cycle information is added prior to the next cycle. For example, a PNA molecule encoding cycle information and having a complement to the reactive moiety of the C/AA PNA oligonucleotide of a binding agent is brought into contact with an immobilized binding agent. The reaction joins the two PNA molecules, which together comprise amino acid and cycle information.
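The upper bound on the number of C/AA conjugates stated above (cycles multiplied by amino acids) can be illustrated with a short sketch. The function name, the choice of 10 cycles, and the use of the 20 standard one-letter amino acid codes are assumptions made only for this example.

```python
from itertools import product


def caa_conjugates(num_cycles, amino_acids):
    """Enumerate every (cycle, amino acid) pair that a full C/AA set would cover."""
    return list(product(range(1, num_cycles + 1), amino_acids))


# Assumed example: 10 cycles and the 20 standard amino acids.
full_set = caa_conjugates(10, "ACDEFGHIKLMNPQRSTVWY")
print(len(full_set))  # 200
```

An analysis that uses only a subset of these pairs would correspond to filtering the returned list.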
Process 305 - 1
[0404] Thus, herein disclosed is a method for determining identity and positional information of a plurality of amino acid residues of a peptide, the peptide comprising n amino acid residues, the method comprising: (a) coupling the peptide to a solid support such that an N-terminal amino acid residue of the peptide is not directly coupled to the solid support and is exposed to reaction conditions; (b) providing an ITC-conjugate, the ITC-conjugate comprising: (x) a reactive moiety for binding and cleaving the N-terminal amino acid residue of the peptide and exposing a next amino acid residue as an N-terminal amino acid residue on the cleaved peptide, and (y) a functional group that serves as an immobilization moiety to the solid support; (c) contacting the peptide with the ITC-conjugate, thereby coupling the ITC-conjugate to the N-terminal amino acid of the peptide to form a conjugate complex; (d) immobilizing the conjugate complex to the solid support via the immobilizing moiety; (e) cleaving and thereby separating the N-terminal amino acid residue from the peptide, thereby exposing the next amino acid residue as an N-terminal amino acid residue on the cleaved peptide and providing an immobilized amino acid complex, the immobilized amino acid complex comprising the cleaved and separated N-terminal amino acid residue; (f) providing a binding agent, the binding agent comprising: (xx) a binding moiety for preferentially binding to one or to a subset of the immobilized amino acid complexes, (yy) a C/AA tag comprising a nucleic acid corresponding with the binding agent, and (zz) an immobilizing moiety for immobilization to the solid support; (g) contacting the peptide with the binding agent, thereby forming a complex bringing a C/AA tag into proximity with the immobilized amino acid complex; (h) immobilizing the complex to the solid support via the immobilizing moiety; (i) optionally associating cycle information with the immobilized C/AA tag; (j) repeating (b) through (i) n-1 times to assemble n-1 additional immobilized amino acid complexes, each additional immobilized amino acid complex comprising a nucleic acid associated with cycle 2 to n, accordingly, and n-1 additional immobilized C/AA oligonucleotides, each additional immobilized C/AA oligonucleotide comprising a nucleic acid associated with cycle 2 to n, accordingly; (k) joining two or more members of the plurality of recode blocks to form a memory oligonucleotide; (l) obtaining sequence information for the memory oligonucleotide; and (m) based on the obtained sequence information, determining identity and positional information of a plurality of amino acid residues of the peptide.
[0405] In some embodiments location information is transferred to C/AA, or its associated memory oligo.
[0406] FIG. 18 illustrates steps in an alternate embodiment where cycle and amino acid information are aggregated in serial steps. FIG. 18a shows an immobilized peptide having derivatized sidechains. The derivatization may impart a nucleic acid that represents the identity of the associated amino acid or a reactive moiety that can be joined to a nucleic acid that represents the identity of the associated amino acid. FIG. 18b shows the surface following peptide degradation and immobilization of amino acid complexes. Localized nucleic acids may be further associated to create recode blocks and memory oligos.
Process 305 - 2
[0407] As an alternative readout method, herein disclosed is a method for determining identity and positional information of a plurality of amino acid residues of a peptide, the peptide comprising n amino acid residues, the method comprising: (a) derivatization of select amino acid side chains of a peptide analyte either directly or indirectly with nucleic acid tags representing the amino acid identity; (b) coupling the peptide to a solid support such that an N-terminal amino acid residue of the peptide is not directly coupled to the solid support and is exposed to reaction conditions; (c) providing a chemically-reactive conjugate, the chemically-reactive conjugate comprising: (x) a cycle tag associated with a cycle number; (y) a reactive moiety for binding and cleaving the N-terminal amino acid residue of the peptide and exposing a next amino acid residue as an N-terminal amino acid residue on the cleaved peptide; and (z) an immobilizing moiety for immobilization to the solid support; (d) contacting the peptide with the chemically-reactive conjugate, thereby coupling the chemically-reactive conjugate to the N-terminal amino acid of the peptide to form a conjugate complex; (e) immobilizing the conjugate complex to the solid support via the immobilizing moiety; (f) cleaving and thereby separating the N-terminal amino acid residue from the peptide, thereby providing an immobilized amino acid complex, the immobilized amino acid complex comprising the cleaved and separated N-terminal amino acid residue; (g) repeating (b) through (f) n-1 times to assemble n-1 additional immobilized amino acid complexes, each additional immobilized amino acid complex comprising a nucleic acid associated with cycle 2 to n, accordingly, and n-1 additional immobilized nucleic acid tags representing the amino acid identity associated with cycle 2 to n, accordingly; (h) transferring information of the amino acid identity nucleic acid and the cycle tag of the immobilized conjugate complex to generate a recode block; (i) obtaining sequence information for the recode block; and (j) based on the obtained sequence information, determining identity and positional information of an amino acid residue of the peptide. In some embodiments, the method further comprises: (k) joining two or more members of the plurality of recode blocks to form a memory oligonucleotide; (l) obtaining sequence information for the memory oligonucleotide; and (m) based on the obtained sequence information, determining identity and positional information of a plurality of amino acid residues of the peptide.
[0408] In some embodiments location information is transferred to C/AA, or its associated memory oligo.
Deprotection And Reprotection of Oligonucleotides
[0409] Disclosed herein, in some embodiments, are methods that comprise protection and/or deprotection. For example, some embodiments include any or all aspects shown in FIG. 37. Some embodiments include serially repeated deprotection and reprotection of oligonucleotides during a protein sequencing method to minimize the effect of peptide cleavage chemical conditions on molecular structure of oligonucleotides. Protection, deprotection, or reprotection may be used in a method described herein, such as a method for determining protein information such as amino acid sequence, identity, or location.
[0410] Some embodiments include methods that comprise serially protecting and deprotecting oligonucleotides. The serial protection and deprotection may mitigate DNA damage. Some embodiments include a method for cyclically protecting and deprotecting oligonucleotides bound directly or indirectly to a solid support in the presence of peptides bound directly or indirectly to the solid support. This may be useful for mitigating DNA damage during cyclic N-terminal degradation of said peptide and subsequent biochemistry within each cycle. Some embodiments include a method for cyclically protecting and deprotecting oligonucleotides in a method of peptide sequencing where the nucleic acid is not bound directly or indirectly to a solid support.
[0411] Any or all of the following steps may be included within a peptide sequencing method described herein:
(1) Deprotect an oligonucleotide associated with cycle, amino acid, or peptide identity to enable polymerization, ligation, or other DNA manipulation by enzymes known in the art to modify, extend, amplify, convert, or ligate DNA;
(2) Reprotect the oligonucleotide;
(3) Cleave the terminal amino acid; and
(4) Repeat.
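The cycle listed above can be sketched as a simple loop that records the order of operations. This is only a bookkeeping illustration under the assumption that the oligonucleotide starts in a protected state; the function name and the step labels are hypothetical, and no chemistry is modeled.

```python
def run_cycles(num_cycles):
    """Return the ordered log of operations for the requested number of cycles."""
    log = []
    protected = True  # assumed starting state: the oligonucleotide is protected
    for cycle in range(1, num_cycles + 1):
        protected = False
        log.append((cycle, "deprotect"))
        log.append((cycle, "enzymatic step"))  # e.g., ligation or extension
        protected = True
        log.append((cycle, "reprotect"))
        # Cleavage chemistry is applied only while the oligonucleotide is protected.
        if not protected:
            raise RuntimeError("cleavage attempted on a deprotected oligonucleotide")
        log.append((cycle, "cleave terminal amino acid"))
    return log


for entry in run_cycles(2):
    print(entry)
```

The guard before the cleavage step reflects the stated purpose of reprotection, which is to limit exposure of the oligonucleotide to peptide cleavage conditions.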
[0412] Cleavage may be performed with a chemically-reactive conjugate (CRC). In some aspects, serially repeated protection and deprotection of oligonucleotides is performed in a context of a protein sequencing protocol, for example within a protein sequencing method, or within a barcode creation and/or detection method.
[0413] In some embodiments, protection and deprotection steps can be iterated. Cycle tags may be deprotected. In some embodiments, location oligos may be protected, deprotected, and/or reprotected.
[0414] Oligonucleotides may be protected using protection chemistries developed for and utilized during phosphoramidite oligonucleotide synthesis. These protecting groups may withstand anhydrous TCA, which is central to synthesis. For example, N(6)-benzoyl A, N(4)-benzoyl C, and N(2)-isobutyryl G may be employed during DNA synthesis, and may be amenable to protection within protein sequencing methods. Also, protecting groups that are removable under mild alkaline conditions, e.g., phenoxyacetyl (Pac) protected dA and 4-isopropyl-phenoxyacetyl (iPr-Pac) protected dG, along with acetyl protected dC, may be employed. As a non-limiting example, protecting the individual bases A, G, and C can be achieved through acylation reactions with the appropriate acid chlorides. The specific acid chlorides used may be benzoyl chloride for adenine and cytosine, and isobutyryl chloride for guanine. Solutions of benzoyl chloride in a solvent such as dimethylformamide (DMF) and isobutyryl chloride in DMF may be prepared and applied to re-protect the oligonucleotides bound to the solid support. In some embodiments, thymine is not protected, but if needed may be protected, for example using diphenylcarbamoyl chloride.
[0415] Disclosed herein, in some embodiments, are methods, comprising: (a) protecting an oligonucleotide of a binding or reactive molecule; (b) contacting said molecule with the N-terminus of a peptide bound to a solid support; (c) cleaving one or more amino acid residues from said peptide; (d) deprotecting the oligonucleotide of the binding or reactive molecule; and (e) contacting the deprotected oligonucleotide with reagent(s) to transfer information by enzymatic ligation, polymerase extension, or chemical ligation. Some embodiments include repeating any of the aforementioned steps. The chemically reactive species may include a chemically reactive conjugate described herein.
[0416] Disclosed herein, in some embodiments, are methods, comprising: (a) protecting an oligonucleotide joined to a peptide; (b) contacting the N-terminus of said peptide with reagent(s) to cleave one or more amino acid residues from said peptide; (c) deprotecting the oligonucleotide bound to the peptide; and (d) contacting the deprotected oligonucleotide with reagent(s) to transfer information by enzymatic ligation, polymerase extension, or chemical ligation. Some embodiments include repeating any of the aforementioned steps. The chemically reactive species may include a chemically reactive conjugate described herein.
[0417] Disclosed herein, in some embodiments, are methods, comprising: (a) protecting an oligonucleotide associated with location or identity of a peptide; (b) contacting the N-terminus of said peptide with reagent(s) to cleave one or more amino acid residues from said peptide; (c) deprotecting the oligonucleotide bound to the peptide; and (d) contacting the deprotected oligonucleotide with reagent(s) to transfer information by enzymatic ligation, polymerase extension, or chemical ligation. Some embodiments include repeating any of the aforementioned steps. The chemically reactive species may include a chemically reactive conjugate described herein.
[0418] Disclosed herein, in some embodiments, are methods, comprising: (a) protecting an oligonucleotide coupled to a solid support; (b) binding a chemically reactive species to a terminal amino acid of a peptide coupled to the solid support; (c) deprotecting the oligonucleotide; (d) reacting a reagent with the oligonucleotide; and (e) reprotecting the oligonucleotide. Some embodiments include cleaving the terminal amino acid of the peptide after reprotecting the oligonucleotide. Some embodiments include deprotecting the oligonucleotide after cleaving the terminal amino acid of the peptide, and then reacting a second reagent with the oligonucleotide. Some examples include a washing step before or after (a), (b), (c), (d), or (e). Washing may include changing a solution, removing an excess reagent or solution. Any of the aforementioned steps (e.g. step (e)), or a combination of said steps, may be optional in some embodiments.
[0419] Disclosed herein, in some embodiments, are methods, comprising: (a) protecting an oligonucleotide coupled to a solid support; (b) cleaving a terminal amino acid of a peptide coupled to the solid support; (c) deprotecting the oligonucleotide; (d) reacting a reagent with the oligonucleotide; and (e) reprotecting the oligonucleotide. Some embodiments include binding a chemically reactive species to a terminal amino acid of the peptide after reprotecting the oligonucleotide. Some embodiments include deprotecting the oligonucleotide after binding the chemically reactive species to the terminal amino acid of the peptide, and then reacting a second reagent with the oligonucleotide. Some examples include a washing step before or after (a), (b), (c), (d), or (e). Washing may include changing a solution, removing an excess reagent or solution. Any of the aforementioned steps (e.g. step (e)), or a combination of said steps, may be optional in some embodiments.
[0420] Some embodiments relate to a method. The method may include providing a conjugate comprising a reactive moiety coupled to a protected oligonucleotide. The method may include contacting the reactive moiety with a terminal amino acid of a peptide, for example thereby binding the reactive moiety to the terminal amino acid. The method may include optionally cleaving the terminal amino acid from the peptide. The method may include deprotecting the oligonucleotide. The method may include contacting the deprotected oligonucleotide with an enzyme or reagent for ligation or polymerization. Disclosed herein, in some embodiments, are methods, comprising: providing a conjugate comprising a reactive moiety coupled to a protected oligonucleotide; contacting the reactive moiety with a terminal amino acid of a peptide, thereby binding the reactive moiety to the terminal amino acid, and optionally cleaving the terminal amino acid from the peptide; deprotecting the oligonucleotide; and contacting the deprotected oligonucleotide with an enzyme or reagent for ligation or polymerization. Some embodiments include reprotecting the oligonucleotide. In some embodiments, the reactive moiety cleaves the terminal amino acid from the peptide to expose a next terminal amino acid, and the method further comprises contacting the next amino acid with another of the conjugate after reprotecting the oligonucleotide. In some embodiments, the terminal amino acid is N-terminal. In some embodiments, the peptide is immobilized to a solid support. In some embodiments, the conjugate comprises an organic, small molecule. In some embodiments, the conjugate comprises a chemically-reactive conjugate (CRC) comprising: (A) the oligonucleotide; (B) the reactive moiety; and (C) an immobilization moiety. In some embodiments, the oligonucleotide comprises a cycle nucleic acid.
[0421] Some embodiments relate to a method. The method may include providing a conjugate comprising a peptide coupled to a protected oligonucleotide. The method may include contacting the terminal amino acid of the peptide, e.g., thereby binding a reactive moiety to the terminal amino acid. The method may include optionally cleaving the terminal amino acid from the peptide. The method may include deprotecting the oligonucleotide. The method may include contacting the deprotected oligonucleotide with an enzyme or reagent for ligation or polymerization. Disclosed herein, in some embodiments, are methods, comprising: providing a conjugate comprising a peptide coupled to a protected oligonucleotide; contacting the terminal amino acid of the peptide, thereby binding a reactive moiety to the terminal amino acid, and optionally cleaving the terminal amino acid from the peptide; deprotecting the oligonucleotide; and contacting the deprotected oligonucleotide with an enzyme or reagent for ligation or polymerization. Some embodiments include reprotecting the oligonucleotide. In some embodiments, the reactive moiety cleaves the terminal amino acid from the peptide to expose a next terminal amino acid, and the method further comprises contacting the next amino acid with another of the conjugate after reprotecting the oligonucleotide. In some embodiments, the terminal amino acid is N-terminal. In some embodiments, the peptide is immobilized to a solid support. In some embodiments, the conjugate comprises an organic, small molecule.
Subset Sequencing
[0422] Disclosed herein are methods for sequencing a subset of nucleotides, or excluding a subset of nucleotides from sequencing. The method for sequencing a subset of nucleotides may be included as part of a method for determining protein information such as amino acid sequence, identity, or location. The method may be useful in distinct methods involving DNA sequencing. In some embodiments, only a subset of nucleotides is sequenced. In some embodiments, some nucleotides are not sequenced. For example, in some embodiments, only two nucleotides of a sequence (such as A and C) are sequenced, and the other nucleotides are not sequenced. This may reduce sequencing costs as it reduces the need for sequencing reagents.
[0423] Subset sequencing may be particularly useful when an oligonucleotide is required both to function during a physicochemical activity, such as serving as a primer for PCR or as a spacer oligo, and to store information. In some embodiments, nucleotides of a sequence that is functional during physicochemical activities provide redundant stored information. An aspect such as a barcode nucleic acid or recode nucleic acid may include nucleotides such as A, G, C, and T, whereas information content of the physicochemically functional sequence may be represented by a subset of the nucleotides (such as A and C, or T and G). In some embodiments, a recode tag, cycle tag, and/or recode block nucleic acid includes sequence that is useful to obtain. In some aspects, this information can be obtained by sequencing a subset of the nucleotides that comprise the nucleic acid. When an oligonucleotide that includes the redundant information is sequenced, a subset of nucleotides may be skipped during sequencing.
[0424] Disclosed herein, in some embodiments, are methods for sequencing a subset of the nucleotides of an oligonucleotide. The method may include providing, in a nucleic acid sequencing reaction, a combination of reversibly terminated nucleotides and nucleotides that are not reversibly terminated. In some embodiments, reversibly terminated nucleotides are fluorescent. In some embodiments, non-reversibly terminated nucleotides are fluorescent. In some embodiments, nucleotides of the nucleic acid being sequenced that correspond with the nucleotides that are not reversibly terminated are not sequenced. In some embodiments, only a subset of nucleotides of the nucleic acid are sequenced. In some embodiments, a subset of nucleotides of the nucleic acid are excluded from sequencing. The method may include providing, in a nucleic acid sequencing reaction, a combination of reversibly terminated nucleotides and nucleotides that are not reversibly terminated, wherein nucleotides of the nucleic acid being sequenced that correspond with the nucleotides that are not reversibly terminated are not sequenced. The method may include identifying nucleotides of the nucleic acid being sequenced that correspond with the reversibly terminated nucleotides. In some embodiments, the nucleic acid being sequenced comprises a region that includes only a subset of nucleotides selected from A, C, G, and T, and wherein the subset of nucleotides are not sequenced. In some embodiments, the subset of nucleotides selected from A, C, G, and T comprises 2 nucleotides selected from A, C, G, and T. In some embodiments, the subset of nucleotides selected from A, C, G, and T comprises 3 nucleotides selected from A, C, G, and T. In some embodiments, the region comprises a primer sequence. In some embodiments, the region does not include a barcode sequence, recode nucleic acid sequence or a portion thereof, or a cycle nucleic acid sequence or a portion thereof.
The region that is not sequenced may comprise 1, 2, 3, 4, 5, 10, 15, 20, 25, 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, 500, 750, 1000, or more nucleotides, or a range of nucleotides defined by any two or more of the aforementioned integers. The part that is sequenced may comprise 1, 2, 3, 4, 5, 10, 15, 20, 25, 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, 500, 750, 1000, or more nucleotides, or a range of nucleotides defined by any two or more of the aforementioned integers.
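A minimal sketch of the read that such a reaction would be expected to report is given below. It assumes an idealized reaction in which every reversibly terminated nucleotide is called and every non-terminated nucleotide is incorporated without being called; the function name and the example strand are hypothetical, and the strand is written in terms of the incorporated bases for simplicity.

```python
def subset_read(synthesized_strand, terminated=frozenset("AC")):
    """Return only the bases that would be called when the nucleotides in
    `terminated` are reversibly terminated and the rest are not."""
    if not set(synthesized_strand) <= set("ACGT"):
        raise ValueError("strand may contain only A, C, G, and T")
    return "".join(base for base in synthesized_strand if base in terminated)


strand = "ACGTTGACCA"  # hypothetical example sequence
read = subset_read(strand)
print(read)                     # ACACCA
print(len(read), len(strand))   # 6 10
```

In this idealized model, the number of called bases (6) is smaller than the strand length (10), which is consistent with the reagent savings described for subset sequencing.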
[0425] In some embodiments, the subset includes a combination of A, G, C, or T. In some embodiments, the subset of nucleotide constituents identified through DNA sequencing is 2 of 4 natural nucleotides (e.g. 2 of A, G, C and T). The subset may include A and G, A and C, A and T, G and C, G and T, or C and T. The subset may exclude A and G, A and C, A and T, G and C, G and T, or C and T. In some embodiments, the subset of nucleotides identified through DNA sequencing is A and C.
[0426] In some embodiments, the subset being sequenced includes all four natural nucleotides, wherein non-natural nucleotides are incorporated and are not sequenced and are skipped by non-reversibly terminated nucleotides.
[0427] In some embodiments, the subset of nucleotide constituents identified through DNA sequencing is 3 of 4 natural nucleotides (e.g. 3 of A, G, C and T). The subset may include A, G and C; A, G, and T; A, C, and T; or G, C and T. The subset may exclude A, G and C; A, G, and T; A, C, and T; or G, C and T.
[0428] The subset of nucleotides may be sequenced through the use of modified nucleotides (e.g. dideoxynucleotides (ddNTPs) such as may be used in Sanger sequencing). The modified nucleotides may include reversible terminator chemistry. The modified nucleotides may include a dye or tag such as a fluorescent dye or tag. The modified nucleotides may be provided in a sequencing reaction. In some embodiments, other nucleotides not included in the subset are not sequenced (e.g. are skipped). The nucleotides not included in the subset may exclude the modification. For example, unmodified nucleotides corresponding to the nucleotides that are skipped or not included in the subset may be used in a sequencing reaction mix.
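The effect of mixing reversibly terminated and unmodified nucleotides can be illustrated with a minimal sketch. It assumes an idealized reaction in which every reversibly terminated (and labeled) base is called and every unmodified base is incorporated without being read. The function name `read_subset` and the example strand are hypothetical and are not taken from the disclosure.

```python
def read_subset(synthesized_strand, terminated):
    """Return the base calls of an idealized subset-sequencing run.

    Bases supplied as reversibly terminated nucleotides are called in order;
    bases supplied as unmodified nucleotides are incorporated and skipped.
    """
    return "".join(base for base in synthesized_strand.upper() if base in terminated)


# Hypothetical strand: only A and C are supplied as reversibly terminated nucleotides.
strand = "GTACGTTCAGGCAT"
calls = read_subset(strand, {"A", "C"})
print(calls)               # ACCACA
print(len(calls), "called bases out of", len(strand))
```

In this idealized model the G and T positions contribute no base calls, so the information-bearing A and C positions are read in fewer calls than there are nucleotides in the strand.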
[0429] Disclosed herein, in some embodiments, are methods for sequencing a subset of nucleotides of an oligonucleotide molecule, comprising: (a) providing a solution that includes oligonucleotides to be sequenced; (b) providing a sequencing reagent comprising one or more nucleotides as predominantly reversibly terminated nucleotides and one or more nucleotides as predominantly non-terminated nucleotides; (c) preparing (a) for sequencing according to protocols for a sequencing system; (d) sequencing the prepared solution of (a) using as at least one component of the sequencing reagents the sequencing reagent of (b) for at least one cycle of DNA sequencing; and (e) obtaining a sequence order for a subset of the nucleotides in the original oligonucleotide sequence. In some embodiments, the oligonucleotides have been designed to contain information about the composition of a peptide or amino acid from a peptide. In some embodiments, the oligonucleotide is a memory oligo, a recode tag, a recode block, or a cycle tag. In some embodiments, the oligonucleotide is derived from a protein sequencing method that creates barcoded nucleic acid information representing protein sequence and/or protein identity. In some embodiments, the oligonucleotide is any nucleic acid sequence that embodies information related to peptide or amino acid sequence or composition. In some embodiments, information of a memory oligo is acquired via DNA sequencing of a subset of the nucleotides that comprise the memory oligo. In some embodiments, any suitable subset of nucleotides is identified through a DNA sequencing process. In some embodiments, the DNA sequencing method is next-generation sequencing (NGS). In some embodiments, the DNA sequencing is a sequencing by synthesis approach using an Illumina Sequencer or a PacBio sequencer. In some embodiments, a sequencing by ligation approach, a sequence hybridization approach, and/or a ligation-based approach is used for the DNA sequencing. 
In some embodiments, the subset of nucleotides identified through DNA sequencing is A and C. In some embodiments, the subset of nucleotide constituents identified through DNA sequencing is 2 of the 4 natural nucleotides. In some embodiments, the subset is one of a combination of A, G, C, or T. Some embodiments include introducing non-fluorescent, non-reversibly-terminated nucleotides into NGS sequencing reagent mixtures. In some embodiments, the nucleotides in the oligonucleotide are natural nucleotides (e.g. A, C, G, and/or T). In some embodiments, the nucleotides in the oligonucleotide comprise non-natural nucleotides.
[0430] Disclosed herein, in some embodiments, are methods for analyzing one or more peptides from a sample comprising a plurality of peptides, proteins, and/or protein complexes, the method comprising: (a) designing oligonucleotides that include 2, 3, 4, 5, 6 or more different types of nucleotide constituents and that employ a subset of the nucleotide constituents to represent cycle, amino acid, location, and/or protein information; (b) utilizing the physicochemical properties of the designed oligonucleotides within a protein sequencing method, such as may be described herein; (c) collecting DNA sequence information for the nucleotides that represent protein information; and (d) analyzing DNA sequence information of a subset of nucleotides to infer protein information. In some embodiments, the oligonucleotide is a memory oligo, a recode tag, a recode block, or a cycle tag. In some embodiments, the oligonucleotide is derived from a protein sequencing method that creates barcoded nucleic acid information representing protein sequence and/or protein identity. In some embodiments, the oligonucleotide is any nucleic acid sequence that embodies information related to peptide or amino acid sequence or composition. In some embodiments, information of a memory oligo is acquired via DNA sequencing of a subset of the nucleotides that comprise the memory oligo. In some embodiments, the DNA sequencing method is NGS. In some embodiments, the DNA sequencing is a sequencing by synthesis approach using an Illumina Sequencer or a PacBio sequencer. In some embodiments, a sequencing by ligation approach, a sequence hybridization approach, and/or a ligation-based approach is used for the DNA sequencing. In some embodiments, the subset of nucleotides identified through DNA sequencing is A and C. In some embodiments, the subset of nucleotide constituents identified through DNA sequencing is 2 of the 4 natural nucleotides. In some embodiments, the subset is one of a combination of A, G, C, or T. 
In some embodiments, any suitable subset of nucleotides is identified through a DNA sequencing process. In some embodiments, the method includes introducing non-fluorescent, non-reversibly terminated nucleotides into NGS sequencing reagent mixtures.
[0431] Disclosed herein, in some embodiments, are SBS sequencing reagent mixes. Some embodiments include an SBS sequencing reagent mix comprising one or more nucleotides as predominantly reversibly terminated nucleotides and one or more nucleotides as predominantly nonterminated nucleotides.
On-Chip Decode Methods
[0432] A method for on-chip decoding is disclosed herein. Some embodiments include aspects for decoding the identity of immobilized oligonucleotides on substrates described in Gunderson et al. (Decoding Randomly Ordered DNA Arrays, Genome Res., 2004 May; 14(5):870-7). Gunderson et al. devised an algorithm to identify members within a collection of objects on an optical surface. The algorithm utilizes recursive hybridization and dehybridization, called “stages,” and the approach was applied to the fabrication of randomly assembled arrays of beads in wells.
[0433] Following a hybridization and imaging stage, arrays were dehybridized by immersion in 0.1 N NaOH for 1 minute and subsequently neutralized. This method is inefficient in situations where DNA is indirectly associated with a member of a collection of objects, such as instances where affinity molecules having oligonucleotides are associated with an isolated DNA-labeled amino acid target. In these cases, the dehybridization step(s) may cause dissociation of affinity molecules, necessitating in some embodiments repeated expensive and inefficient recognition binding steps. In situations where multiple targets are co-localized, meaning they are not optically resolved by the imaging system, the oligo pool formulation strategy developed by Gunderson et al. may become ineffective, as code collisions may obfuscate identity by generating degenerate codes. For example, two targets co-localized with codes of [101] and [010] may not be distinguishable from a target with a code [111], leading to ambiguity in results. The present disclosure provides alternative formulations of hybridization oligo pools and methods of deconvolution for cases involving co-localized targets. For example, this may be resolved by choosing encoding methods that do not result in conflicts when 2, 3, or more possible targets are co-localized. The present disclosure additionally provides methods of deconvolution, and enhanced resolution of fluorescent groups per channel. Benefits include decoding multiple cycles with overlapping signals, optimized oligonucleotide pool designs to reduce dehybridization steps, serial removal of oligonucleotides with multiple fluorophores and melt curve analysis for multiple stage encoding.
Hybridization-capable Codes for Identifying Immobilized Targets
[0434] Some embodiments include using or generating 'hybridization-capable' codes, which may eliminate the need for dehybridization steps. The design constraints for these coded fluorophores differ significantly from those described in Gunderson et al. For instance, in a decode process with 2 states, 'bright' and 'dark', N stages would yield 2^N different codes. In a four-stage hybridization, there are 16 possible codes:
[0000], [0001], [0010], [0100], [1000], [0011], [0101], [0110], [1001], [1010], [1100], [0111], [1011], [1101], [1110], [1111]
Without dehybridization steps, there may be no way to distinguish between codes [1000] and [1100], as both would appear as [1111] in imaging. By removing all non-unique codes when no dehybridization is applied, the following unique codes would remain: [0001], [0011], [0111], [1111]. This would result in decoding a total of four possible targets with four possible stages.
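The collapse of the 16 four-stage codes into the monotone codes listed above can be checked with a short sketch. It assumes that a probe, once hybridized, remains bound and bright for all later stages; the function name `observed_without_dehyb` is hypothetical.

```python
from itertools import product


def observed_without_dehyb(code):
    """Return the imaged pattern when probes are never removed.

    Once a stage is bright, every later stage is also bright, so the
    observed pattern is the running maximum of the intended code.
    """
    seen, pattern = 0, []
    for bit in code:
        seen = max(seen, bit)
        pattern.append(seen)
    return tuple(pattern)


all_codes = list(product((0, 1), repeat=4))           # the 16 intended codes
observed = sorted({observed_without_dehyb(c) for c in all_codes})
for pattern in observed:
    print(pattern)
print(len(all_codes), "intended codes ->", len(observed), "observable patterns")
```

Under this assumption only five patterns are observable: the all-dark pattern plus the four unique codes [0001], [0011], [0111], and [1111].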
[0435] More generally, with two decode channels (e.g. FAM and HEX), one could decode up to 24 unique targets as shown in Table 3.
Table 3
[0436] For N stages and M channels, the resulting number of unique codes, and thus of mathematically resolvable cycle-amino acid combinations, including an all-dark code (e.g. [0000] or [000]), is (N^M) instead of the (2^M)^N codes in the original work. Although the number of identifiable unique codes is smaller than in the initial work, this improvement enables the use of affinity binders tagged with DNA codes without the need for dehybridization steps.
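The two-channel, four-stage case can be enumerated directly. The sketch below assumes that each channel independently carries one monotone code (all dark, or bright from some stage onward); the helper names are hypothetical, and the count is obtained by brute-force enumeration rather than from a closed-form expression.

```python
from itertools import product


def monotone_codes(n_stages):
    """All codes that stay bright once bright, including the all-dark code."""
    return [
        tuple(0 if stage < n_stages - k else 1 for stage in range(n_stages))
        for k in range(n_stages + 1)
    ]


def multichannel_codes(n_stages, n_channels):
    """One monotone code per channel, channels assumed independent."""
    return list(product(monotone_codes(n_stages), repeat=n_channels))


codes = multichannel_codes(n_stages=4, n_channels=2)   # e.g. FAM and HEX
dark = tuple((0, 0, 0, 0) for _ in range(2))
non_dark = [c for c in codes if c != dark]
print(len(codes), "codes including all-dark;", len(non_dark), "usable for targets")
```

For four stages and two channels the enumeration gives 25 codes including the all-dark code, which leaves 24 codes for targets, in line with the 24 unique targets cited for Table 3.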
[0437] The hybridization-capable codes may be applied to a wide range of use cases, such as the identification of proteins, peptides, nucleic acids, and other biomolecules. This method may be employed in high-throughput screening assays for drug discovery and development, disease diagnostics, biomarker identification, and personalized medicine. Moreover, the method may be adapted for environmental monitoring, food safety testing, and various research applications in molecular biology, biochemistry, and biotechnology. The elimination of dehybridization steps simplifies the experimental workflow, reduces the likelihood of sample loss, and potentially shortens the overall analysis time, thereby making it more cost-effective and efficient for various applications.
Enhanced resolution of fluorescent groups per channel
[0438] Some embodiments include an improved method for identifying immobilized targets by increasing the resolution of fluorescent groups per channel. In some embodiments, this method allows for more bits of information to be obtained per channel, enabling the differentiation of a greater number of targets.
[0439] For instance, in just the FAM channel, one group could have a code of 0111 and another 0001. If both are present, the detected signal would be 0112, allowing the identification of both targets. This method may offer higher-resolution decoding, as it can distinguish between multiple fluorophores contributing to the signal in a single channel.
[0440] If it is possible to resolve 0, 1, or 2 fluorophores through four stages, a larger number of states can be enumerated without dehybridization, with all monotonically increasing codes being valid for each channel. The following examples illustrate the number of codes that may be generated with different levels of resolution:
1 level (5 codes): [0000], [0001], [0011], [0111], [1111]
2 levels with perfect resolution (15 codes): [0000], [0001], [0002], [0011], [0012], [0022], [0111], [0112], [0122], [0222], [1111], [1112], [1122], [1222], [2222]
2 levels with only changes measurable, not absolute (11 codes): [0000], [0001], [0011], [0012], [0111], [0112], [0122], [1111], [1112], [1122], [1222]
With N stages and L levels, the number of codes per channel can be calculated as:
(N + L) choose (L) where "choose" represents the combinatorial operator, nchoosek, for perfect resolution. If only changes can be detected and not absolute values, the number of codes for L = 2 would be:
(N + L) choose (L) - N
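The code counts listed above can be reproduced by enumeration. The sketch below assumes that a valid code is any non-decreasing sequence of N values drawn from 0 to L. For the changes-only case with L = 2, it assumes that a code is ambiguous when it contains a 2 but no 1, a rule inferred from the codes omitted from the 11-code list. The function names are hypothetical.

```python
from itertools import combinations_with_replacement
from math import comb


def level_codes(n_stages, levels):
    """All non-decreasing codes of length n_stages with entries in 0..levels."""
    return list(combinations_with_replacement(range(levels + 1), n_stages))


def change_only_codes(n_stages):
    """L = 2 codes that remain distinguishable when only changes are measurable.

    A code built from 0s and 2s alone looks like the matching 0/1 code,
    so it is dropped (assumed rule, inferred from the listed codes).
    """
    return [c for c in level_codes(n_stages, 2) if 1 in c or 2 not in c]


N = 4
print(len(level_codes(N, 1)), comb(N + 1, 1))      # 5 5
print(len(level_codes(N, 2)), comb(N + 2, 2))      # 15 15
print(len(change_only_codes(N)), comb(N + 2, 2) - N)  # 11 11

# Two co-localized groups in one channel add stage by stage: 0111 + 0001 -> 0112
combined = tuple(a + b for a, b in zip((0, 1, 1, 1), (0, 0, 0, 1)))
print(combined, combined in level_codes(N, 2))
```

The enumeration agrees with (N + L) choose (L) for perfect resolution and with (N + L) choose (L) - N for the changes-only case at L = 2.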
[0441] For higher levels (L = 3+), the scaling differs slightly and requires further calculations. However, as the number of levels approaches infinity, the number of codes asymptotically approaches the original (N^M-1), since a measurable difference can be generated at every stage.
[0442] This enhanced resolution method may be applied to various applications, such as high-throughput screening assays for drug discovery and development, disease diagnostics, biomarker identification, and personalized medicine. Furthermore, it may be employed in environmental monitoring, food safety testing, and research applications in molecular biology, biochemistry, and biotechnology. The ability to differentiate between multiple fluorophores in a single channel increases the efficiency and accuracy of target identification, making it more cost-effective and suitable for a wide range of applications.
Table 4
Decoding Multiple Cycles with Overlapping Signals
[0443] Some embodiments include a method for decoding multiple cycles simultaneously, even in the presence of overlapping signals. This can be achieved by identifying non-colliding codes in an additive vector space, given that at most N different components are present. In some embodiments, this method can be combined with sparse decode spaces and serial addition of fluorophores without dehybridization.
[0444] The mathematics underlying valid non-colliding codes in an additive vector space is described by Sidon sequences. While a solution is NP-hard, constructing oligo pools (and orthogonal sets of Sidon sequence oligo pool sets) in a manner that prevents overlapping code identities is tractable.
[0445] This method could be particularly useful when combined with sparse decode spaces, where only a few target components are expected over many cycles, and most clusters on most cycles would be 'dark'. For example, identifying only tyrosine (Tyr) and phosphotyrosine (P-Tyr) over 100 cycles would require 200 codes, but some stages might have overlapping signals. Encoding in an additive vector space that allows up to N collisions without resulting in a degenerate code state (e.g., more than one combination of results leading to the same readout) may enable a readout design with a minimal number of stages.
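Whether a candidate code set tolerates co-localization can be tested by brute force. The sketch below is a subset-sum uniqueness check rather than a Sidon-sequence construction. It assumes that co-localized targets carry distinct codes and that their signals add linearly at each stage. The function names and the second codebook are hypothetical.

```python
from itertools import combinations


def signal(codes):
    """Stage-by-stage sum of the codes present at one unresolved location."""
    return tuple(sum(stage) for stage in zip(*codes))


def find_collision(codebook, max_colocalized):
    """Return two different groups of codes with the same readout, or None."""
    seen = {}
    for size in range(1, max_colocalized + 1):
        for group in combinations(codebook, size):
            readout = signal(group)
            if readout in seen:
                return seen[readout], group
            seen[readout] = group
    return None


# The example from the text: [101] plus [010] reads the same as [111].
colliding = [(1, 0, 1), (0, 1, 0), (1, 1, 1)]
print(find_collision(colliding, max_colocalized=2))

# Hypothetical alternative using a second intensity level in the last stage.
resolved = [(1, 0, 1), (0, 1, 0), (1, 1, 2)]
print(find_collision(resolved, max_colocalized=3))   # None
```

The exhaustive search grows quickly with the size of the codebook, so it is suited to verifying a proposed pool rather than to designing one.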
[0446] Moreover, this approach may be combined with the addition of more fluorophores serially without dehybridization, as the signal differential from stage to stage may be measured. This may enable quick decoding of overlapping codes by spacing them apart in time. An example could involve ensuring that amino acid codes do not overlap, while cycle codes may, and each cycle's worth of fluorophores may be added sequentially.
[0447] Various application areas for this method include high-throughput screening assays for drug discovery and development, disease diagnostics, biomarker identification, and personalized medicine. Additionally, it could be employed in environmental monitoring, food safety testing, and research applications in molecular biology, biochemistry, and biotechnology. By enabling the simultaneous decoding of multiple cycles with overlapping signals, this method increases the efficiency and accuracy of target identification, making it more cost-effective and suitable for a wide range of applications.
Melt Curve Analysis for Multiple-Stage Encoding
[0448] Several alternative methodologies to provide orthogonal channels exist. Using multiple fluorescent wavelengths or multiple intensity levels are two obvious examples. Another example utilizes temperature. In this method, hybridization probe sets with discernible inter-set hybridization energy differences may allow preferential hybridization at 2 or more temperatures. This approach has been used to provide additional channels in multiplexed PCR applications (patent reference: AU2018204665B2) and is amenable for use in some embodiments of multiple-stage decoding. This method may be further refined to incorporate melt behavior of oligonucleotides and can be used with or without AI to decipher oligo probe identity, allowing for efficient decoding of immobilized targets. By offering a robust and high-throughput method for target identification, this approach could enable the rapid and accurate decoding of complex samples, streamlining the process of target identification and analysis. Melt curve analysis for multiple-stage decoding could find applications in various fields, including drug discovery and development, diagnostics, environmental monitoring, and molecular biology research.
METADATA TAGGING
[0449] The ability to capture biologically-relevant attributes like N-terminal modifications, protein conformations, or other metadata in concert with protein sequence information will provide an enriched view of biology, and may accelerate discovery of disease etiology. Anomalies in these parameters often correlate with disease states. [Reynaud, E. (2010) Nature Education 3(9):28] Complexities and challenges associated with sample preparation to accomplish holistic evaluation of proteomic content have highlighted a need for improved methods thereof. Thus, methods of precise measurement of metadata parameters in concert with acquisition of protein sequence information are disclosed below.
Analysis of N-Terminal Modifications
[0450] N-terminal modifications are important to cellular function and are integral to cell regulation and health. [Ramazi, S., Zahiri, J. Post-translational modifications in proteins: resources, tools and prediction methods. Database (2021) Vol. 2021]. It is estimated that most human and plant proteins are post-translationally modified. As many as 90% are N-acetylated, and as many as 20% N-glycosylated.
[Khoury, et. al, Nature Scientific Reports, 1:90 (2011); Linster et. al., Journal of Experimental Botany, Vol. 69, No. 19 pp. 4555-4568, 2018]. Certain common types of N-terminal modifications are shown in FIG. 53 for illustrative purposes. Yet, N-terminal modifications significantly interfere with chemical protein sequencing methods. [Mozdzanowski, J. (2003). In: Smith, B.J. (eds) Protein Sequencing Protocols. Methods in Molecular Biology, vol 211 pp 365-369]. As such, it is beneficial to remove these modifications prior to sequencing. Since these N-terminal modifications are important to cellular function, cell regulation and health, the present disclosure provides methods to identify the nature of these N-terminal post-translational modifications (metadata) prior to removing them. Metadata may be converted to a recode block and associated with single-molecule protein sequence information via the methods described in the present disclosure. Grouping metadata and sequencing data provides a more comprehensive protein analysis. The concept may be generalized to identify any feature of a protein or sample (metadata) that along with sequence information provides additional biological insight.
[0451] In certain embodiments of the present disclosure, a process is provided for identifying and recording N-terminal modifications, before their removal from a protein. This may involve the use of a metadata conjugate capable of recognizing an N-terminal modification.
[0452] FIG. 54 illustrates a simplified block diagram of an exemplary workflow (Process 400) for capturing N-terminal modifications during protein sequencing, according to embodiments of the present disclosure. FIG. 55 schematically illustrates exemplary operations of the process in FIG. 54, according to embodiments of the present disclosure.
[0453] As shown in FIG. 54, in certain embodiments, process 400 begins at operation 402, where proteins from an isolated biological sample are immobilized on a solid support, some of the proteins having one of a set of a priori defined N-terminal modifications. In step 404, a metadata conjugate having 3 functional moieties is introduced. The moieties include: 1) a binding moiety, which could be an antibody, aptamer, single-chain variable fragment (scFv), or any molecule having specificity for an N-terminal modification (in specific embodiments, an antibody-conjugate specifically designed to recognize, e.g., N-terminal ubiquitin of the protein is introduced, as shown at left in FIG. 55); 2) a metadata tag, which in certain examples could be an oligonucleotide, as shown in FIG. 55; and 3) an immobilizing moiety, e.g., a moiety that joins the metadata tag either to the cognate peptide or directly to the solid support in proximity to the anchor point of the cognate immobilized peptide.
[0454] Accordingly, at operation 406 and as shown at center in FIG. 55, the metadata tag is immobilized. Immobilization can be facilitated through multiple mechanisms. For instance, the metadata conjugate can undergo a chemical modification to form a covalent bond with the protein or a functional group present on the solid support or may be joined via photon-induced immobilization upon exposure to a specific wavelength of light. Subsequently, at operation 408 and as shown at right in FIG. 55, the post-translational modification is enzymatically or chemically removed from the immobilized protein to expose a free amino N-terminus. Modifications may be removed by the action(s) of aminopeptidase C, aminopeptidase N, dipeptidase, endopeptidase such as trypsin, ClpAP protease, Sentrin/SUMO-specific proteases (SENP), deubiquitinating enzymes or by any suitable means. [Ronan, et.al., Cell Research (2016) 26:441-456; Amerik, et.al. Biochim Biophys Acta 2004; 1695:189-207]
[0455] Exposing the free N-terminal amine may allow reverse translation protein sequencing to proceed on the immobilized proteins, step 410, as described in embodiments herein (e.g., process 200) and in PCT/US2023/070077 included by reference in its entirety. During reverse translation protein sequencing, at operation 412, the metadata tag is assembled together with nucleic acid recode blocks that represent amino acid sequence information of the associated peptide. Ligation, polymerization, or other suitable methods may be utilized. The result is a chimeric nucleic acid (memory oligo) that not only represents the amino acid sequence of the corresponding immobilized protein, but also carries the information of the metadata tag.
[0456] DNA sequencing at operation 414 provides detailed information not only about the protein’s amino acid sequence but also about its former modification. As a result, the final data set offers a comprehensive picture of the protein, including N-terminal modification, thereby significantly enriching the proteomic analysis.
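The relationship between the metadata tag, the recode blocks, and the final read can be pictured with a toy data model. Everything in the sketch below is a hypothetical illustration: the tag and block sequences, the fixed block length, the tag-first ordering, and the function names are assumptions made for the example and do not reflect the actual oligonucleotide designs of the disclosure, in which recode blocks may also carry cycle information.

```python
# Hypothetical lookup tables; real tags and recode blocks are not specified here.
METADATA_TAGS = {"N-acetyl": "ACCA", "N-ubiquitin": "CAAC"}
RECODE_BLOCKS = {"Y": "AACC", "G": "CACA", "S": "CCAA"}
BLOCK_LEN = 4  # assumed fixed length for every tag and block


def assemble_memory_oligo(modification, residues):
    """Join the metadata tag with one recode block per sequenced residue."""
    return METADATA_TAGS[modification] + "".join(RECODE_BLOCKS[r] for r in residues)


def decode_memory_oligo(oligo):
    """Split a read into fixed-length blocks and look each one up."""
    blocks = [oligo[i:i + BLOCK_LEN] for i in range(0, len(oligo), BLOCK_LEN)]
    tag_lookup = {seq: name for name, seq in METADATA_TAGS.items()}
    block_lookup = {seq: aa for aa, seq in RECODE_BLOCKS.items()}
    modification = tag_lookup[blocks[0]]
    residues = "".join(block_lookup[b] for b in blocks[1:])
    return modification, residues


oligo = assemble_memory_oligo("N-ubiquitin", "GSY")
print(oligo)
print(decode_memory_oligo(oligo))
```

The round trip shows how a single read could report both the former N-terminal modification and the residues recorded after its removal.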
[0457] Disclosed herein, in some embodiments, are methods for analyzing a peptide having a post-translational modification. Some embodiments include contacting a peptide having a post-translational modification with a binding agent that targets or binds to the post-translational modification. In some embodiments, the binding agent comprises a nucleic acid having a sequence representing the post-translational modification. Some embodiments include transferring the sequence information or its complement to the peptide or a position proximal to the immobilized peptide. Some embodiments include removing the post-translational modification from the peptide. Some embodiments include performing reverse translation protein sequencing of the peptide to generate a nucleic acid sequence representing the peptide. Some embodiments include performing reverse translation protein sequencing of the peptide to generate a nucleic acid sequence representing an amino acid of the peptide. Some embodiments include performing reverse translation protein sequencing of the peptide to generate a nucleic acid sequence representing at least one amino acid of the peptide. In some embodiments, the information representing said post-translational modification is incorporated into the nucleic acid sequence, and the nucleic acid sequence is sequenced. 
Disclosed herein, in some embodiments, are methods for analyzing a peptide having a post-translational modification, comprising: contacting a peptide having a post-translational modification with a binding agent that targets or binds to the post-translational modification, wherein the binding agent comprises a nucleic acid having a sequence representing the post-translational modification; transferring the sequence information or its complement to the peptide or a position proximal to the immobilized peptide; removing the post-translational modification from the peptide; performing reverse translation protein sequencing of the peptide to generate a nucleic acid sequence representing at least one amino acid of the peptide, wherein the information representing said post-translational modification is incorporated into the nucleic acid sequence, and the nucleic acid sequence is sequenced. In some embodiments, the post-translational modification comprises phosphorylation, glycosylation, glycanation, methylation, acetylation, ubiquitination, carboxylation, hydroxylation, biotinylation, pegylation, or succinylation. In some embodiments, each binding agent comprises a recode tag comprising the sequence representing the post-translational modification or a complement thereof. Some embodiments include generating a recode block comprising sequence information from a cycle tag, and sequence information from the recode tag. Some embodiments include generating a memory oligonucleotide from the recode block and other recode blocks.
[0458] In some embodiments, the binding moiety specifically binds an N-terminal modification. In some embodiments, the N-terminal modification comprises a post-translational modification. Some embodiments include: before contacting the immobilized amino acid complex with a binding agent: contacting the immobilized amino acid complex with a modification binding agent, wherein the N-terminal amino acid comprises a post-translational modification, and wherein the modification binding agent comprises a modification binding moiety for preferentially binding to the post-translational modification; and a modification recode tag comprising a recode nucleic acid corresponding with the modification binding agent, thereby forming a complex comprising an immobilized post-translationally modified amino acid complex and a modification binding agent and thereby bringing a cycle tag into proximity with a modification recode tag. Some embodiments include removing the N-terminal modification from the N-terminal amino acid. Some embodiments include: based on the obtained sequence information, determining identity and positional information of the N-terminal modification or post-translational modification. A modification binding agent may include an antibody or other binding agent that binds to a post-translational modification.
[0459] Disclosed herein, in some embodiments, are methods for analyzing a peptide. The peptide may have a post-translational modification. The method may include contacting the peptide with a binding agent. The method may include contacting the peptide having the post-translational modification with a binding agent. The binding agent may target the post-translational modification. The binding agent may bind to the post-translational modification. The binding agent may include a nucleic acid. The nucleic acid of the binding agent may include a nucleic acid sequence. The nucleic acid sequence of the binding agent may represent the post-translational modification. The binding agent may include a first nucleic acid sequence. The binding agent may include a nucleic acid having a first nucleic acid sequence representing the post-translational modification. The first nucleic acid sequence may represent the post-translational modification. The peptide may be immobilized to a solid support. The method may include transferring sequence information. The method may include transferring a complement of sequence information. The method may include transferring a complement of first sequence information. The sequence information or first sequence information may be of the first sequence. The method may include transferring first sequence information. The sequence information may be transferred to the peptide. The first sequence information may be transferred to the peptide. The sequence information may be transferred to a position proximal to the peptide (e.g. when the peptide is immobilized). The first sequence information may be transferred to a position proximal to the peptide (e.g. when the peptide is immobilized). The method may include transferring first sequence information of the first sequence or of a complement of the first sequence to the peptide. 
The method may include transferring first sequence information of the first sequence or of a complement of the first sequence to the peptide or a position of the solid support proximal to the peptide (e.g. when the peptide is immobilized). The method may include transferring sequence information of a binding agent to a position of a solid support proximal to a peptide. The method may include removing the post-translational modification. The method may include removing the post-translational modification from the peptide. The method may include performing reverse translation protein sequencing. The method may include performing reverse translation protein sequencing of the peptide. The method or reverse translation protein sequencing may generate a nucleic acid sequence. The method or reverse translation protein sequencing may generate a second nucleic acid sequence. The second nucleic acid sequence may represent the peptide. The second nucleic acid sequence may represent a portion of the peptide. The nucleic acid sequence generated by the reverse translation protein sequencing may represent the peptide. The nucleic acid sequence generated by the reverse translation protein sequencing may include or incorporate the first nucleic acid sequence. The nucleic acid sequence generated by the reverse translation protein sequencing may include or incorporate a nucleic acid sequence representing the post-translational modification. The first sequence information representing said post-translational modification may be incorporated into the second nucleic acid sequence. The second nucleic acid sequence may be sequenced. The method may include combining information of a nucleic acid representing a post-translational modification and a nucleic acid representing a peptide sequence or a portion of the peptide sequence. The method may include combining information of a nucleic acid representing a post-translational modification and a nucleic acid representing a peptide sequence.
[0460] Disclosed herein, in some embodiments, are methods for analyzing a peptide having a post-translational modification, comprising: contacting the peptide having the post-translational modification with a binding agent that targets or binds to the post-translational modification, wherein the binding agent comprises a nucleic acid having a first nucleic acid sequence representing the post-translational modification, and wherein the peptide is optionally immobilized to a solid support; transferring first sequence information of the first sequence or of a complement of the first sequence to the peptide, or to the peptide or a position of the solid support proximal to the peptide when the peptide is immobilized; removing the post-translational modification from the peptide; performing reverse translation protein sequencing of the peptide to generate a second nucleic acid sequence, the second nucleic acid sequence representing at least one amino acid of the peptide, wherein the first sequence information representing said post-translational modification is incorporated into the second nucleic acid sequence, and the second nucleic acid sequence is sequenced. 
Disclosed herein, in some embodiments, are methods for analyzing a peptide having a post-translational modification, comprising: contacting the peptide having the post-translational modification with a binding agent that targets or binds to the post-translational modification, wherein the binding agent comprises a nucleic acid having a first nucleic acid sequence representing the post-translational modification, and wherein the peptide is immobilized to a solid support; transferring sequence information of the binding agent to a position of the solid support proximal to the peptide; removing the post-translational modification from the peptide; performing reverse translation protein sequencing of the peptide to generate a nucleic acid sequence representing at least one amino acid of the peptide sequence; and combining the information of the nucleic acid representing the post-translational modification and the nucleic acid representing at least one amino acid of the peptide sequence.
[0461] In some embodiments, the post-translational modification comprises phosphorylation, glycosylation, glycanation, methylation, acetylation, ubiquitination, carboxylation, hydroxylation, biotinylation, pegylation, or succinylation.
[0462] In some embodiments, the binding agent comprises a recode tag. In some embodiments, the binding agent comprises a recode tag comprising the first nucleic acid sequence representing the post-translational modification or a complement thereof.
[0463] Some embodiments include generating a recode block. Some embodiments include generating a recode block comprising sequence information of a cycle tag, and sequence information of the recode tag.
[0464] Some embodiments include generating a memory oligonucleotide. The memory oligonucleotide may be generated from the recode block and from other recode blocks. Some embodiments include generating a memory oligonucleotide from the recode block and other recode blocks.
[0465] Some embodiments include determining identity or positional information of the post-translational modification within the peptide. Some embodiments include: based on the sequenced second nucleic acid sequence, determining identity and positional information of the post-translational modification within the peptide.
[0466] In some embodiments, the peptide is immobilized. In some embodiments, the peptide is immobilized to the solid support.
General Identification of “metadata”
[0467] Many types of metadata may be collected in concert with amino acid sequence data using methods disclosed and described above. A non-comprehensive list includes: enzyme/substrate complexes that are stabilized; lipid binding status; lipid anchors (GPI anchors, myristoylation, and prenylation); glycosylation status; or C-terminal modifications, etc. In the case of glycosylation, the binding moiety could be a lectin [Raposo et al., Biomolecules. 2021 Feb; 11(2): 188]. Note that simultaneous identification of more than one type of protein modification may be accomplished by creating appropriate pools of metadata conjugates. Also, it is evident that metadata conjugates for enrichment and/or depletion may be provided simultaneously with metadata conjugates that preserve diverse types of metadata, such as protein conformation, protein modification status, etc.
[0468] In some embodiments, the metadata tag could be applied prior to immobilization of the peptide onto the solid support.
[0469] In other embodiments, the method could be incorporated directly into single molecule protein sequencing methodologies that rely on recording tags.
Dynamic Range Compression
[0470] Certain embodiments described herein relate to compositions, systems, and methods for addressing the dynamic range of protein analytes in biological samples. For example, due to the dominance of specific proteins in blood plasma, marking and depleting these “dominant” proteins is advantageous for improving the sensitivity to observe low-abundance proteins in blood plasma samples. Exemplary methods for marking and depleting these dominant proteins are described in further detail below.
[0471] FIG. 51 illustrates a simplified block diagram of an exemplary workflow of a Process 100 for analyzing polymeric macromolecules of a biological sample, including polymeric macromolecules such as peptides and proteins of a blood plasma sample, according to embodiments of the present disclosure. FIG. 52 schematically illustrates exemplary operations of the process in FIG. 51, according to embodiments of the present disclosure.
[0472] As shown in FIG. 51, in certain embodiments, Process 100 begins at operation 102, where proteins from an isolated biological sample are immobilized on a solid support. At operation 104, a metadata conjugate having three functional moieties is introduced: 1) a binder, which could be an antibody, aptamer, single-chain variable fragment (scFv), or any molecule having specificity for a high-abundance protein targeted for depletion from the sample; 2) a nucleic acid that comprises a recognition site for a restriction enzyme or other molecule, which in certain examples, and as shown in FIG. 52, is a nucleic acid tag that may be referred to as a “metadata tag” or a “target depletion tag”, as it facilitates the “depletion” of the high-abundance protein in subsequent operations of Process 100; and 3) an immobilizing moiety, e.g., a moiety that joins the metadata tag to either a protein of interest, or to the solid support in proximity to the anchor point of the immobilized peptide. Accordingly, at operation 106 and as shown at center in FIG. 52, the metadata tag is immobilized. Immobilization can be facilitated through multiple mechanisms. For instance, the tag can undergo a chemical modification to form a covalent bond with the protein or a functional group present on the solid support, or can be immobilized via photon-induced immobilization upon exposure to a specific wavelength of light. At operation 108, reverse translation protein sequencing is performed on the immobilized proteins, as described in embodiments herein. During this reverse translation protein sequencing run, at operation 110, a recode block representing the metadata information is assembled together with nucleic acid recode blocks that represent amino acid sequence information of the associated peptide. Ligation, polymerization, or other suitable methods may be utilized.
The result is a chimeric nucleic acid (memory oligo) that not only represents the amino acid sequence of the corresponding immobilized protein, but also carries the information of the metadata tag. Thereafter, and prior to DNA sequencing, the DNA strand is treated at operation 112 with a corresponding restriction enzyme that recognizes the restriction site of the metadata tag. The enzyme cleaves the memory oligos specifically related to abundant proteins, effectively depleting the signal of high-abundance proteins. Non-cleaved memory oligos, which represent low-abundance proteins in the biological sample, are enriched without bias, and are sequenced at operation 114 and analyzed.
[0473] In certain embodiments the metadata tag enables a different method of depletion. For example, the metadata tag may comprise a CRISPR cleavage site, or a biotinylated tag that enables pulldown via streptavidin-biotin interactions. In some embodiments the metadata tag enables various methods of enrichment, for example, PCR amplification with specific primers, hybridization capture techniques, magnetic bead separation, microfluidic separation, and antibody-based enrichment or depletion. Additionally, enrichment or depletion strategies that do not rely on nucleic acid properties can be utilized. These may include pH- or temperature-sensitive linkers that cleave under specific conditions, or enzymatic cleavage sites that respond to specific proteases. These alternative tagging and removal mechanisms aim to selectively remove or enhance information corresponding to high-abundance proteins, thereby enhancing the detection sensitivity for low-abundance proteins.
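The depletion at operations 112 and 114 can be pictured in silico as a filter over memory oligo sequences. The following is a minimal sketch, not the disclosed implementation: the EcoRI site GAATTC is used purely as an illustrative recognition site, the example sequences are invented, and cleavage is modeled simply as removal of any oligo that carries the site.

```python
# Illustrative assumption: the metadata tag carries an EcoRI site (GAATTC).
RECOGNITION_SITE = "GAATTC"

_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq))


def is_cleaved(memory_oligo: str, site: str = RECOGNITION_SITE) -> bool:
    """True if the site, or its reverse complement, occurs in the oligo."""
    return site in memory_oligo or reverse_complement(site) in memory_oligo


def deplete(memory_oligos: list[str], site: str = RECOGNITION_SITE) -> list[str]:
    """Keep only memory oligos that would survive restriction digestion."""
    return [oligo for oligo in memory_oligos if not is_cleaved(oligo, site)]


if __name__ == "__main__":
    # Hypothetical memory oligos; only the first carries the depletion tag.
    oligos = ["ACGTGAATTCTTAG", "ACGTTTAGCCGA", "GGCATTCAGTAC"]
    print(deplete(oligos))
```

In practice the recode blocks themselves would need to be designed so that the chosen recognition site never arises by chance within or across block junctions; this sketch does not check for that.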
[0474] In certain embodiments, a sequence-specific autocatalytic cleavage method may be employed. For example, DNAzyme or ribozyme sequences may be incorporated into nucleic acids representing abundant proteins, enabling selective self-cleavage under defined conditions. The catalytic nucleic acid sequences may be designed to recognize specific flanking sequences associated with abundant proteins. In some embodiments, photochemical approaches may be utilized wherein photocleavable linkers are incorporated at specific sequences associated with abundant proteins, enabling selective depletion through controlled light exposure. The wavelength sensitivity of such photocleavable linkers may be tuned through chemical modification to enable multiplexed depletion strategies. These self-cleaving or photo-induced approaches may provide advantages in reaction specificity and temporal control compared to enzyme-based methods. Such methods may be used individually, in combination with other depletion strategies described herein, or in staged depletion protocols to achieve precise control over protein representation in the final analysis.
[0475] In certain embodiments, the metadata conjugate is designed to target proteins of particular interest, such as low-abundance proteins that are relevant for diagnostic or therapeutic applications. Alternative enrichment methods include isoform-specific antibody tagging, magnetic bead capture, hybridization-based enrichment, or the like. In some examples, the tag is multifunctional, simultaneously serving as an identifier and an enrichment facilitator. For example, a nucleic acid tag of a specific oligonucleotide sequence may be used for both identifying the targeted protein and facilitating its capture via complementary base-pairing. Such enrichment strategies aim to selectively enhance the representation of specific proteins in the sequencing data, improving the analytical depth for targeted proteomic studies.
[0476] In certain embodiments, a mixed pool of metadata conjugates is employed to achieve fractional depletion of targeted high-abundance proteins. In an exemplary setup, 50% of the metadata conjugates can be equipped with a metadata tag comprising a restriction enzyme recognition site, while the remaining 50% of metadata conjugates may lack this tag, allowing only the desired fraction of targeted protein to remain in the sequence data. By using this mixed pool of metadata conjugates, a balanced 50% depletion of the targeted high-abundance protein is achieved, offering a more nuanced approach to dynamic range compression. It should be noted that the ratio of binders with and without the nucleic acid tags can be adjusted to any desired proportion, allowing for flexible control over the degree of protein depletion. Alternative percentages or ratios can be chosen to result in different levels of depletion, such as 0.1%, 1%, 25%, 75%, 99%, 99.9%, or 99.99%. Such selective, controlled depletion methods allow for the targeted adjustment of protein concentrations in the sequencing data, providing a way to mitigate the challenges posed by the wide dynamic range of protein concentrations in complex biological samples like plasma.
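The relationship between pool composition and depletion level can be sketched with simple arithmetic. The example below is an idealized model under stated assumptions: tagged and untagged conjugates are assumed to bind the target equally well, and the `binding_efficiency` and `cleavage_efficiency` parameters are hypothetical knobs that are not specified in the disclosure.

```python
def remaining_fraction(tagged_fraction: float,
                       binding_efficiency: float = 1.0,
                       cleavage_efficiency: float = 1.0) -> float:
    """Fraction of a targeted protein's memory oligos expected to survive.

    tagged_fraction: share of conjugates in the pool that carry the
    restriction-site tag (0.5 for the 50/50 pool described above).
    """
    for value in (tagged_fraction, binding_efficiency, cleavage_efficiency):
        if not 0.0 <= value <= 1.0:
            raise ValueError("all fractions must lie in [0, 1]")
    depleted = tagged_fraction * binding_efficiency * cleavage_efficiency
    return 1.0 - depleted


def tagged_fraction_for(target_depletion: float,
                        binding_efficiency: float = 1.0,
                        cleavage_efficiency: float = 1.0) -> float:
    """Tagged share of the pool needed to reach a target depletion level."""
    needed = target_depletion / (binding_efficiency * cleavage_efficiency)
    if needed > 1.0:
        raise ValueError("target depletion not reachable at these efficiencies")
    return needed


if __name__ == "__main__":
    print(remaining_fraction(0.5))          # ideal 50/50 pool
    print(tagged_fraction_for(0.75, 0.9))   # pool needed if binding is 90%
```

With ideal efficiencies the tagged share of the pool equals the depletion level; any real loss in binding or cleavage means a larger tagged share is needed to reach the same depletion.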
[0477] In certain embodiments, rather than a single metadata conjugate, a diverse array of metadata conjugates is employed to interact with high-abundance proteins for targeted depletion or enrichment. These metadata conjugates may be targeted toward one or a plurality of the putative proteins in a sample. In some examples, a pool may consist of metadata conjugates to the 10 most abundant proteins, as the 10 most abundant proteins represent approximately 90% of the total protein mass in human plasma. In some examples, a pool may consist of metadata conjugates against the 20 highest-abundance proteins, as the 20 highest-abundance proteins represent approximately 99% of the total protein mass in human plasma. Any size pool can be used, including but not limited to 1, 2, 3, 4, 5, 10, 20, or more. Binders can have one or more tags with one or more functions. Multiple binders can be used against the same target. Binders can identify whole, intact proteins, denatured proteins, peptide fragments, or short motifs. For example, the four amino acid sequence adjacent to the N-terminus in albumin is unique across the whole proteome.
[0478] In certain embodiments, the metadata conjugates can target proteins in various conformations and states, such as naturally folded, denatured, derivatized or reduced forms, and/or specific common protein motifs. A pool of metadata conjugates can also be used to target multiple conformations of the same protein, thus increasing the robustness and comprehensiveness of the dynamic range compression method. Proteins being targeted can be in native conformation or denatured or in alternate conformations or modified in other ways, and metadata conjugates can be designed to recognize such states.
[0479] In certain embodiments, the metadata conjugate is designed to be transferred to one of a plurality of locations on the protein or adjacent substrate. For example, the metadata conjugate can be designed to be transferred directly to the protein, either at its N-terminus, C-terminus, or specific side chains using chemical conjugation techniques such as amide bond formation or click chemistries. Alternatively, the metadata conjugate can be designed to attach to the linkage that connects the protein to the solid support. This can be facilitated through enzymatic methods such as ligase chain reaction (LCR) or chemical methods like click chemistries, or thiol-end coupling. The tag can also be configured to be integrated directly onto the solid support, using surface modification techniques like silanization for silica-based supports. Additionally, methods to immobilize metadata tags include PCR amplification to extend existing nucleic acid sequences, ligation to append sequences, or via direct synthesis techniques. The act of transferring the information of the metadata conjugate to a memory oligo may include creating a direct copy of the tag, a reverse complement of the tag, or transferring the metadata tag itself, among other methods.
[0480] In certain embodiments, the metadata conjugate is attached to the peptide analyte in solution, i.e., completing steps 104 and 106 of FIG. 51 prior to completing step 102. This holds the advantage that solution kinetics for complex formation via affinity-recognition and tagging may be more favorable than those on a solid support. Once the protein is tagged in solution, it can then be immobilized on the solid support for subsequent sequencing steps. This approach may enable greater flexibility in metadata conjugate design, as well as potentially improved binding kinetics due to the unconstrained environment of the solution phase.
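The approximate mass fractions quoted above suggest how much sequencing effort a depletion pool could save. The sketch below is illustrative arithmetic only, and it assumes that the number of memory oligos scales with protein mass, which is a simplification (read counts track molecule numbers, not mass). The `depletion_efficiency` parameter is hypothetical.

```python
def fold_reduction(depleted_mass_fraction: float,
                   depletion_efficiency: float = 1.0) -> float:
    """Fold reduction in reads needed after depleting a set of proteins.

    depleted_mass_fraction: share of total protein represented by the
    targeted pool (about 0.90 for the top 10 plasma proteins and about
    0.99 for the top 20, per the text above).
    """
    remaining = 1.0 - depleted_mass_fraction * depletion_efficiency
    if remaining <= 0.0:
        raise ValueError("nothing would remain to sequence")
    return 1.0 / remaining


if __name__ == "__main__":
    for label, fraction in (("top 10", 0.90), ("top 20", 0.99)):
        print(label, round(fold_reduction(fraction), 1))
```

Under this simplification, complete depletion of the top 10 proteins corresponds to roughly a tenfold reduction in reads, and the top 20 to roughly a hundredfold; incomplete depletion reduces the gain quickly, which is why the efficiency term is worth tracking.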
[0481] Biological sample preparation processes described above improve the analysis of samples having vast dynamic range of proteins while reducing NGS sequencing throughput requirements. Marking and excluding abundant proteins from NGS sequencing enhances the detection sensitivity for low-abundance proteins, improves proteomic analysis efficiency, and reduces cost.
The present disclosure thus addresses a critical gap in the field of protein sequencing, ensuring that detailed and comprehensive protein data, including post-translational modification, is attainable, enhancing the depth and breadth of proteomic analysis.
[0482] Disclosed herein, in some embodiments, are assay methods for dynamic range compression. Some embodiments include contacting peptides with a binding agent that targets or binds to a high-abundance peptide of the peptides. In some embodiments, the binding agent comprises a nucleic acid having a recognition site for a restriction enzyme or a complement of a recognition site for a restriction enzyme. Some embodiments include performing reverse translation protein sequencing of the peptides to generate nucleic acid sequences representing the peptides. Some embodiments include incorporating the nucleic acid sequence with said recognition site for a restriction enzyme or complement thereof into the nucleic acid sequence representing a high-abundance peptide. Some embodiments include introducing a restriction enzyme to the nucleic acid sequences. Some embodiments include cleaving nucleic acid sequence representing high-abundance peptides. Some embodiments include depleting representation of high-abundance peptides from among the nucleic acid sequences, and sequencing the nucleic acid sequences.
Disclosed herein, in some embodiments, are assay methods, comprising: contacting peptides with a binding agent that targets or binds to a high-abundance peptide of the peptides, wherein the binding agent comprises a nucleic acid having a recognition site for a restriction enzyme or a complement of a recognition site for a restriction enzyme; and performing reverse translation protein sequencing of the peptides to generate nucleic acid sequences representing the peptides, wherein the reverse translation protein sequencing comprises: incorporating the nucleic acid sequence with said recognition site for a restriction enzyme or complement thereof into the nucleic acid sequence representing a high-abundance peptide, introducing a restriction enzyme to the nucleic acid sequences; thereby cleaving nucleic acid sequence representing high-abundance peptides and thereby depleting representation of high-abundance peptides from among the nucleic acid sequences, and sequencing the nucleic acid sequences. In some embodiments, each binding agent comprises a recode tag comprising the sequence representing the post-translational modification or a complement thereof. Some embodiments include generating a memory oligonucleotide from the recode block and other recode blocks.
[0483] Disclosed herein, in some embodiments, are assay methods. The method may include contacting peptides with a binding agent. The binding agent may target a high-abundance peptide. The binding agent may bind to a high-abundance peptide. The high-abundance peptide may be a peptide of the peptides. The binding agent may include a nucleic acid comprising a nucleic acid sequence. The binding agent may include a nucleic acid sequence. The nucleic acid sequence of the binding agent may include a recognition site. The recognition site may be for a restriction enzyme. The nucleic acid sequence of the binding agent may include a complement of a recognition site for a restriction enzyme. The method may include performing reverse translation protein sequencing. The method may include performing reverse translation protein sequencing of the high-abundance protein. The method may include performing reverse translation protein sequencing of the peptides. The method or reverse translation protein sequencing may generate nucleic acid sequences representing the peptides. The nucleic acid sequences representing the peptides may include a nucleic acid sequence representing the high-abundance peptide. The method or reverse translation protein sequencing may include incorporating the nucleic acid sequence comprising said recognition site into the nucleic acid sequence representing the high-abundance peptide. The method or reverse translation protein sequencing may include incorporating the nucleic acid sequence comprising said recognition site for the restriction enzyme or complement thereof into the nucleic acid sequence representing the high-abundance peptide. The method or reverse translation protein sequencing may include contacting the restriction enzyme with the nucleic acid sequences. The method or reverse translation protein sequencing may include cleaving the nucleic acid sequence representing high-abundance peptides.
The method or reverse translation protein sequencing may include depleting representation of the high-abundance peptide from among the nucleic acid sequences representing the peptides. The method or reverse translation protein sequencing may include sequencing the nucleic acid sequences representing the peptides.
[0484] Disclosed herein, in some embodiments are assay methods, comprising: contacting peptides with a binding agent that targets or binds to a high-abundance peptide of the peptides, wherein the binding agent comprises a nucleic acid sequence comprising a recognition site for a restriction enzyme or a complement of a recognition site for a restriction enzyme; and performing reverse translation protein sequencing of the peptides to generate nucleic acid sequences representing the peptides, the nucleic acid sequences representing the peptides comprising a nucleic acid sequence representing the high-abundance peptide, wherein the reverse translation protein sequencing comprises: incorporating the nucleic acid sequence comprising said recognition site for the restriction enzyme or complement thereof into the nucleic acid sequence representing the high-abundance peptide, contacting the restriction enzyme with the nucleic acid sequences, thereby cleaving the nucleic acid sequence representing high-abundance peptides and thereby depleting representation of the high-abundance peptide from among the nucleic acid sequences representing the peptides, and sequencing the nucleic acid sequences representing the peptides.
[0485] In some embodiments, the binding agent comprises a recode tag. A recode tag may include a nucleic acid sequence such as a nucleic acid with a sequence including a recognition site for a restriction enzyme. In some embodiments, the binding agent comprises a recode tag comprising the nucleic acid sequence comprising the recognition site for the restriction enzyme or complement thereof.
[0486] Some embodiments include generating a recode block. Some embodiments include generating a recode block comprising sequence information of a cycle tag, and sequence information of the recode tag.
[0487] Some embodiments include generating a memory oligonucleotide. Some embodiments include generating a memory oligonucleotide from the recode block and other recode blocks.
[0488] In some embodiments, the peptide is immobilized. In some embodiments, the peptide is immobilized by being bound to a solid support.
PEPTIDE IDENTITY NUMBER (PIN)
[0489] As background, Unique Molecular Identification tags (UMIs) are often used to provide a unique barcode for each molecule within a given sample of DNA oligonucleotides. They are helpful in counting assays, such as RNAseq assays, to correct for PCR amplification bias; they can be used to ensure that each original molecule is counted only once. Additionally, they are helpful in the face of sequence errors introduced during PCR amplification, library preparation, target enrichment, or the next-generation DNA sequencing process itself. In the case of these errors, the sequence of a duplicated oligonucleotide may be different than the original. The new sequence may be confused with another original molecule, even though it is only an imperfect copy of the original. So, without UMIs, and in the face of real-world errors, accurate quantification of the number of original molecules in a sample can be tricky. Sequencing with UMIs reduces errors and increases accuracy of quantification, because bioinformatics software can filter out duplicate reads and PCR errors with a high level of accuracy.
[0490] The oligonucleotide synthesis of UMIs is accomplished via applying equimolar mixtures of two or more degenerate nucleotide bases within a single step of oligonucleotide synthesis. For example, if any of the 4 natural degenerate DNA bases A, G, C, or T could be incorporated in a growing oligonucleotide strand at each new position, then after 10 cycles of synthesis, the library of DNA would contain 4^10 (over a million) possible UMI sequences. "Wobbles" are equimolar mixtures of two or more different bases at a given position within an oligonucleotide sequence.
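The duplicate filtering described above can be sketched as grouping reads by UMI and taking a consensus sequence per group. This is a simplified illustration, not any particular tool's algorithm: reads are assumed to arrive as `(umi, sequence)` pairs, the UMIs themselves are assumed error-free, and the example reads are invented.

```python
from collections import Counter, defaultdict


def collapse_by_umi(reads):
    """Map each UMI to the most frequent sequence observed with it.

    Minority sequences under the same UMI are treated as PCR or
    sequencing errors of the same original molecule.
    """
    by_umi = defaultdict(Counter)
    for umi, sequence in reads:
        by_umi[umi][sequence] += 1
    return {umi: counts.most_common(1)[0][0] for umi, counts in by_umi.items()}


def count_original_molecules(reads):
    """Count distinct UMIs per consensus sequence (one count per molecule)."""
    return Counter(collapse_by_umi(reads).values())


if __name__ == "__main__":
    reads = [
        ("AACG", "ACGTAC"),
        ("AACG", "ACGTAC"),  # PCR duplicate of the first read
        ("AACG", "ACGAAC"),  # same molecule, one substitution error
        ("TTGC", "ACGTAC"),  # a second original molecule, same sequence
        ("GGAT", "TTGCAA"),
    ]
    print(count_original_molecules(reads))
```

Here five reads collapse to three original molecules. Production pipelines typically also tolerate errors within the UMI itself, which this sketch omits.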
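The size of the UMI library follows directly from the number of bases allowed at each degenerate position. The sketch below computes that count and draws a random UMI to mimic equimolar incorporation; the function names are illustrative only.

```python
import random

BASES = "ACGT"


def umi_space(n_positions: int, n_bases: int = len(BASES)) -> int:
    """Number of distinct UMIs with n_positions degenerate positions."""
    return n_bases ** n_positions


def random_umi(n_positions: int, rng: random.Random) -> str:
    """Draw one UMI, each position chosen uniformly from the four bases."""
    return "".join(rng.choice(BASES) for _ in range(n_positions))


if __name__ == "__main__":
    print(umi_space(10))                      # 4**10 = 1048576
    print(random_umi(10, random.Random(0)))   # one simulated 10-mer UMI
```

A wobble restricted to two bases at each position would use `n_bases=2`, giving 2^10 = 1024 sequences over the same ten positions.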
[0491] For similar reasons, a method to associate a unique identification number with each memory oligo may be advantageous. Assembly of blocks of information, such as provided by the present disclosure, affords an opportunity to assemble non-unique elements into a unique identifier. The assembled identifier is termed a Polymer Identity Number (PIN). A PIN may be used in a similar fashion as a UMI to account for PCR duplicates, and in general can be used to improve the accuracy of digital counting quantification. To illustrate, FIG. 56 shows five recode blocks, each having a priori defined positions at which the nucleotide identity is random, e.g., a wobble. Given assembly of the recode blocks into a memory oligo we can assign a unique PIN to over a million original memory oligo molecules. If 10 blocks are assembled the possible PIN space may grow to over 1 quadrillion. This concept is generalizable, and may be applied in any situation where multiple elements are assembled and there is an advantage to know whether the assembled product shares the same origin as other assembled products in an amplified and/or manipulated sample.
[0492] In some embodiments, a cycle tag comprises a wobble in 1 or more positions, the position(s) being a priori defined. The number of wobbles in the cycle tag for cycle i may be the same or different than the number of wobbles in any other cycle tag. The position of the wobble within the cycle tag sequence of cycle i may be the same or different than the position of wobbles in any other cycle tag.
[0493] In some embodiments, a recode tag comprises a wobble in 1 or more positions, the position(s) being a priori defined. The number of wobbles in the recode tag for amino acid i may be the same or different than the number of wobbles in any other recode tag. The position of the wobble within the recode tag sequence of recode tag i may be the same or different than the position of wobbles in any other recode tag.
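The growth of the PIN space with the number of assembled blocks can be checked numerically. The sketch below assumes four-way wobbles (any of A, C, G, T) and, for the first example, two wobble positions per block; the per-block count is an assumption, since the layout of FIG. 56 is not reproduced here. The helper names are illustrative.

```python
def pin_space(wobbles_per_block: list[int], alphabet: int = 4) -> int:
    """Number of distinct PINs given the wobble count of each block."""
    return alphabet ** sum(wobbles_per_block)


def min_wobbles_to_exceed(target: int, alphabet: int = 4) -> int:
    """Smallest total number of wobble positions whose space exceeds target."""
    n = 0
    while alphabet ** n <= target:
        n += 1
    return n


if __name__ == "__main__":
    print(pin_space([2] * 5))             # five blocks, two wobbles each
    print(pin_space([2] * 10))            # ten blocks, two wobbles each
    print(min_wobbles_to_exceed(10**15))  # positions needed for > 1e15 PINs
```

Under the two-wobbles-per-block assumption, five blocks give 4^10 (over a million) PINs and ten blocks give 4^20 (about 1.1 × 10^12). A space above 10^15 requires at least 25 four-way wobble positions in total, however they are distributed across blocks.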
[0494] In some embodiments, a metadata tag comprises a wobble in 1 or more positions, the position(s) being a priori defined. The number of wobbles in the metadata tag for one metadata attribute may be the same or different than the number of wobbles in the metadata tag representing any other metadata attribute. The position of the wobble within the metadata sequence of metadata attribute i may be the same or different than the position of wobbles in any other metadata attribute.
[0495] In some embodiments, a recode block comprises cycle tags, recode tags, and/or metadata tags that have one or more wobbles.
[0496] In some embodiments, a memory oligo comprises recode blocks and/or metadata blocks having one or more wobbles.
[0497] In some embodiments, the information of the wobble base(s) is directly transferred to the recode block or memory oligo through the actions of enzymatic ligation, chemical ligation, polymerization, or any combination thereof.
[0498] In some aspects, the information of the wobble base(s) is directly or indirectly assembled to create a recode block or memory oligo using any of the methods described herein.
[0499] In some embodiments the number of wobble bases is the same across elements, leading to a consistent PIN length. In some embodiments the number of wobble bases is different across elements, leading to a variation in PIN length.
[0500] In some embodiments a memory oligo comprising information of wobble base(s) is analyzed to determine and/or utilize the Polymer Identity Number (PIN). When analyzed, the random information of each tag can be informatically combined in silico to generate a PIN.
[0501] In some embodiments information is assembled from two or more subunits, where some or all subunits comprise a random element, to form a product having PIN information.
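The in silico combination of wobble information into a PIN can be sketched as reading the bases at the a priori defined positions of each block and concatenating them. This is a minimal illustration under stated assumptions: each memory oligo is assumed to be already split into its blocks, the wobble positions are given as 0-based indices, and the block sequences and positions shown are invented.

```python
def extract_pin(blocks: list[str], wobble_positions: list[list[int]]) -> str:
    """Concatenate the bases found at each block's predefined wobble positions."""
    if len(blocks) != len(wobble_positions):
        raise ValueError("one list of wobble positions is needed per block")
    return "".join(
        block[index]
        for block, positions in zip(blocks, wobble_positions)
        for index in positions
    )


def count_unique_origins(memory_oligos: list[list[str]],
                         wobble_positions: list[list[int]]) -> int:
    """Number of distinct PINs, i.e. inferred original molecules."""
    return len({extract_pin(blocks, wobble_positions) for blocks in memory_oligos})


if __name__ == "__main__":
    positions = [[2, 4], [2, 4], [2, 4]]    # hypothetical wobble layout
    oligo_a = ["ACGTAC", "TTGACA", "GGCATC"]
    oligo_b = ["ACTTGC", "TTAACA", "GGCATC"]
    print(extract_pin(oligo_a, positions))
    # oligo_a appears twice, as a PCR duplicate would.
    print(count_unique_origins([oligo_a, oligo_a, oligo_b], positions))
```

Because the position lists may differ in length from block to block, the same function covers both the fixed-length and variable-length PIN cases described above.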
MODIFIED CARRIER MOLECULES FOR ENHANCED LOW-INPUT PROTEIN SEQUENCING
[0502] The present disclosure provides a comprehensive set of systems and methods for modifying carrier molecules to be unresponsive to common protein sequencing techniques. A range of carrier molecules can be utilized with the present systems and methods, including but not limited to carrier proteins such as β-lactoglobulin, bovine serum albumin (BSA), and ovalbumin. In certain embodiments, chemical and/or enzymatic modifications are utilized to alter both the N- and C-termini of such carrier proteins to render them unresponsive. The modified carrier proteins, as described herein, can be useful to reduce non-specific adsorptive losses during low-input protein sequencing, thereby improving sequencing efficiency and accuracy.
Introduction to the Use of Carrier Molecules
[0503] The fields of proteomics and molecular biology have progressed substantially in recent years due to the evolution of methods and tools for analyzing proteins and their roles in biological systems. Proteins are integral to physiological and biochemical processes in cells. Anomalies in protein sequences or concentrations are often indicative of disease states, therefore necessitating precise and sensitive detection tools.
[0504] However, complexities and challenges associated with preparatory phases for traditional protein sequencing methods often lead to incomplete or inaccurate sequencing data for analysis. There is thus a need in the art for improved systems and methods of preparing samples for protein sequencing. The present disclosure addresses this and other needs.
[0505] The present disclosure provides methods for analyzing, and for preparing for analysis, polymeric macromolecules, such as peptides, polypeptides, and proteins. Accordingly, aspects of the present disclosure relate to the field of proteomics.
[0506] The fields of proteomics and molecular biology have progressed substantially in recent years due to the evolution of methods and tools for analyzing proteins and their roles in biological systems. Proteins are integral to physiological and biochemical processes in cells. Anomalies in protein sequences or concentrations are often indicative of disease states, therefore necessitating precise and sensitive protein detection and analysis tools. However, complexities and challenges associated with preparatory phases for traditional protein sequencing methods often lead to suboptimal or inaccurate sequencing data for analysis.
[0507] Protein sequencing is a valuable technique in proteomics and molecular biology, offering insights into the structure and function of proteins. With advancements in high-throughput technologies, such as mass spectrometry and emerging techniques like single molecule protein sequencing, the capacity to analyze complex protein samples has increased considerably.
[0508] However, these techniques are not without limitations, particularly when input protein concentrations are suboptimal. Low-input scenarios, such as situations where there are samples containing microgram (µg) or sub-microgram levels of protein, present challenges related to non-specific adsorption and sample loss.
[0509] Carrier proteins are useful to mitigate non-specific adsorptive losses. These proteins may serve as a matrix that prevents a target protein from adsorbing onto the walls of sample containers or other analytical equipment. Examples of carrier proteins may include β-lactoglobulin, bovine serum albumin (BSA), and ovalbumin, among others. The amount of carrier protein added to a sample can range from 1:1 to 100:1 (carrier-to-target protein mass ratio), depending on the specific requirements of the experimental design.
[0510] However, the use of carrier proteins may introduce a significant computational and analytical burden when sequencing, thereby inflating throughput requirements. For instance, inclusion of carrier proteins at a 10:1 mass ratio can increase the raw data generated by up to an order of magnitude, requiring additional computational power for data sorting and analysis. Moreover, the carrier proteins themselves may become subject to sequencing, complicating the data output. Current methods for mitigating this issue, such as sample pre-processing and data filtration, are inadequate and can introduce bias or error.
[0511] Given these limitations, there is a clear gap in existing methodologies for protein sequencing. Most approaches focus either on the optimization of sequencing technologies or on data postprocessing, but do not address the fundamental issue of carrier protein interference. There is therefore a pressing need for novel approaches that allow the utilization of carrier proteins to prevent sample loss while simultaneously minimizing their interference in the sequencing process. The present disclosure aims to fill this technological and scientific gap by introducing novel chemically and enzymatically modified carrier proteins and other carrier molecules that are inert to common protein sequencing techniques, as well as methods of forming the same.
Carrier Molecule Summary
[0512] The present disclosure includes compositions of matter, methods, and systems for analyzing polymeric macromolecules, including peptides, polypeptides, and proteins, as well as for preparing such polymeric macromolecules for analysis. Analysis, as described herein, may refer to sequencing, such as protein sequencing, and other related processes for determining one or more characteristics of a macromolecule.
[0513] In certain embodiments, the present disclosure provides compositions, systems, and methods for preparing the preparatory phase (e.g., before cyclic sequencing) of a sample for protein sequencing. Such compositions, systems, and methods facilitate improved protein sequencing accuracy and efficiency.
[0514] In certain embodiments, the present disclosure provides novel chemically and enzymatically modified carrier proteins, and other carrier molecules, that are inert to common protein sequencing techniques, as well as methods of forming the same.
[0515] By modifying specific residues on carrier proteins, the structure and/or behavior of such proteins can be altered such that the proteins are unresponsive to sequencing methods and do not cause any interference or complications thereof. Accordingly, the present disclosure offers a novel, practical approach to improving the efficiency and accuracy of protein sequencing in scenarios where carrier proteins are utilized, thereby addressing a critical gap in the field.
[0516] Certain embodiments herein provide a method for modifying a carrier protein for use in protein sequencing, the method comprising: selecting a carrier protein; and modifying either or both of the N-terminus or C-terminus of the carrier protein to render it unresponsive to sequencing processes.
[0517] Certain embodiments herein provide a method for reducing non-specific adsorptive losses of a target protein in low-input protein sequencing processes using the modified carrier protein, the modified carrier protein being modified at either or both the N-terminus or C-terminus to render it unresponsive to the sequencing processes.
[0518] In certain embodiments, a new class of carrier proteins is provided, such carrier proteins being chemically and/or enzymatically modified to be inert in the context of established sequencing techniques. These carrier proteins do not interfere with protein sequencing chemistries, nor do they inflate computational and analytical requirements, thereby resolving the unmet need for an optimized, efficient protein sequencing process in low-input conditions.
Carrier Molecule Description
[0519] In proteomic scenarios with low protein input, the utilization of a carrier protein, such as β-lactoglobulin, can help minimize nonspecific losses during sample preparation and sequencing. However, the use of a carrier protein may inflate sequencing throughput requirements, as the carrier protein itself becomes a subject for sequencing. Accordingly, the present disclosure addresses this issue by introducing methods to modify carrier proteins or peptides, such as β-lactoglobulin, so that they are unresponsive to sequencing processes.
[0520] FIG. 57 and FIG. 58 schematically illustrate exemplary mechanisms for modifying a carrier protein or peptide to render it unresponsive to sequencing operations, according to embodiments of the present disclosure.
[0521] In certain embodiments, a carrier protein (or peptide), such as β-lactoglobulin, is selected to minimize nonspecific adsorptive losses of a target protein during sample handling and sequencing. Referring to FIG. 57, to prevent the carrier protein from being sequenced via an N-terminal degradation-based sequencing method, a specific protecting group may be covalently attached to the N-terminus of the protein. The chemistry of this protecting group can be designed to resist removal so that it remains intact during sample preparation and subsequent sequencing processes. This can ensure that although the carrier protein is present during sequencing, it remains unresponsive, or chemically/biologically inert, to sequencing reactions. The modified carrier protein can therefore be present throughout the entire sample preparation and sequencing process but does not contribute to the sequencing throughput, thereby optimizing efficiency.
[0522] In an alternative embodiment shown in FIG. 58, the focus is shifted to preventing the carrier protein from binding to the solid support utilized to anchor the target protein for sequencing. Here, in an example, a carrier protein, such as β-lactoglobulin, is selected to minimize nonspecific losses of the target protein during sample handling and sequencing. The carrier protein may be subjected to a chemical modification; for example, a blocking group can be added to its C-terminal end, or to other binding sites as applicable. This modification is useful to inhibit the carrier protein's ability to bind to the solid support during sample preparation for sequencing. In some embodiments, as a result, the modified carrier protein will not bind to the solid sequencing support and can be washed away during the preparation process, thereby not participating in subsequent sequencing reactions of the sample. This can ensure that only the target protein is sequenced, optimizing sequencing resources.
[0523] Although shown and described in the alternative with reference to FIGs. 57 and 58, in certain embodiments, a carrier protein may be modified at both N-terminal and C-terminal ends.
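The two blocking strategies of FIGs. 57 and 58 can be summarized as a simple logical model. The following is an illustrative sketch only, under stated assumptions: attachment to the solid support is taken to occur through the C-terminus, sequencing is taken to proceed by N-terminal degradation, and each modification is treated as fully effective. The class and function names are hypothetical and do not correspond to any software described in this disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Carrier:
    """Hypothetical, binary description of a carrier protein's termini."""
    name: str
    n_term_blocked: bool = False  # protecting group on the N-terminus (FIG. 57)
    c_term_blocked: bool = False  # blocking group on the C-terminus (FIG. 58)


def binds_support(carrier: Carrier) -> bool:
    # Assumption: immobilization requires a free C-terminus.
    return not carrier.c_term_blocked


def reacts_with_sequencing(carrier: Carrier) -> bool:
    # Assumption: N-terminal degradation requires a free N-terminal amine.
    return not carrier.n_term_blocked


def consumes_throughput(carrier: Carrier) -> bool:
    # A carrier is sequenced only if it is immobilized and remains reactive.
    return binds_support(carrier) and reacts_with_sequencing(carrier)


if __name__ == "__main__":
    carriers = [
        Carrier("unmodified"),
        Carrier("N-blocked", n_term_blocked=True),
        Carrier("C-blocked", c_term_blocked=True),
        Carrier("N- and C-blocked", n_term_blocked=True, c_term_blocked=True),
    ]
    for c in carriers:
        print(f"{c.name}: consumes throughput = {consumes_throughput(c)}")
```

In this simplified model, either modification alone is sufficient to remove the carrier from the sequencing readout, and modifying both termini provides redundancy. Real chemistries are not binary, and attachment via side chains is not modeled here.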
[0524] In some embodiments, similar approaches to blocking portions of the peptide or protein can be used for alternative sequencing methods, such as C-terminal based degradation methods or solid support attachment via side chains.
[0525] In certain embodiments, the sample matrix comprises a carrier molecule or carrier protein.
[0526] In certain embodiments, a carrier molecule or carrier protein is pre-incubated with surfaces used during sample preparation.
[0527] In certain embodiments, a carrier molecule or carrier protein is pre-incubated with surfaces used during next generation protein/peptide sequencing (NGPS).
[0528] In certain embodiments, one or more types of carrier molecules or carrier proteins may be utilized simultaneously. For example, a modified BSA and a modified immunoglobulin may both function as carrier proteins for a single sample.
[0529] In certain embodiments, a carrier molecule or carrier protein may be prevented from participating in sequencing operations via a modification to the chemical composition of the carrier molecule or carrier protein that interferes with sequencing reactions.
[0530] In certain embodiments, a carrier molecule or carrier protein may be prevented from participating in sequencing operations due to a modification to the carrier molecule or carrier protein that prevents it from immobilizing to a surface.
[0531] In certain embodiments, a carrier molecule or carrier protein may be prevented from participating in sequencing operations due to one or more modifications to the carrier molecule or carrier protein that prevent it from immobilizing to a surface and that interfere with sequencing reactions.
[0532] In certain embodiments, a carrier molecule or carrier protein may be prevented from participating in sequencing operations because of inherent design properties of the carrier molecule or carrier protein that prevent immobilization to a surface and/or lack of functional moieties of the carrier molecule or carrier protein that would interact with sequencing reagents.
[0533] In certain embodiments, a carrier molecule includes a natural protein, a modified natural protein, or a synthetic peptide.
[0534] In certain embodiments, a carrier molecule or carrier protein is modified at multiple sites, e.g., at the N-terminal, the C-terminal, and/or on any amino acid.
[0535] Various carrier proteins or molecules may be employed according to the methods and systems described herein. For example, such proteins or molecules may include, but are not limited to: β-lactoglobulin, bovine serum albumin (BSA), ovalbumin, lysozyme, casein, immunoglobulins, and the like. Furthermore, wholly synthetic peptides or even non-peptidic molecules that possess desired absorptive characteristics can also be used as carrier substances. Non-peptidic molecules would, by default, not take part in the sequencing process and can also be of value. Examples include but are not limited to: nucleic acid protein conjugates or complexes. These alternative carriers may also undergo either the addition of protecting groups to reactive amines, carboxyls, thiols, or hydroxyls, or modification(s) to inhibit binding to the solid support.
[0536] In certain embodiments, cyclization of a carrier protein prevents reactions with sequencing reactants such as PITC used in Edman degradation. In such embodiments, a carrier protein can be subjected to enzymatic cyclization using enzymes like sortase, which links the N-terminus to the C-terminus, forming a cyclic protein.
[0537] In certain embodiments, N-alkylated glycine oligomers, also referred to as peptoids, can be used as carrier molecules. A peptoid's unique backbone renders it naturally resistant to Edman degradation, and its chemical structure may minimize non-specific adhesion of target proteins to solid supports.
[0538] In certain embodiments, beta-amino acids are utilized to construct a carrier backbone (i.e., a carrier structure) that can serve as a carrier molecule. The extra methylene group in the backbone can confer resistance to proteolytic degradation and sequencing processes, while allowing tailored modifications for specific carrier functions.
[0539] In certain embodiments, oligo-ureas can serve as a carrier molecule. A urea-based backbone may be resistant to proteolytic degradation and Edman sequencing, and its surface properties can be engineered to minimize nonspecific interactions.
[0540] In certain embodiments, chemical polymers having one or more of the following monomer building blocks may be engineered to minimize nonspecific interactions, while remaining chemically inert to sequencing reagents: acrylate, vinyl pyridine, dihydroxy methacrylates, other methacrylates, HEMA, PHEMA, PVA, HPMC, PLGA, PEG, or the like, in linear, branched, and/or crosslinked configurations, a block co-polymer configuration, or a configuration conducive to reducing adsorptive losses of sample molecules, or other similar polymers.
[0541] In certain embodiments, side-chain modification of a carrier protein can be performed to prevent immobilization of the protein to a surface. Specific amino acid residues, like lysines or cysteines, can be selected for this modification.
[0542] These aspects and other features and advantages of the present disclosure are described below in more detail.
Methods for Preventing Reactions of an Amine Group of an N-Terminal Amino Acid
[0543] In certain embodiments, chemical methods can be employed to introduce N-terminal modifications that prevent sequencing of a carrier molecule or carrier protein. For example, methods such as acylation with carboxylic acid anhydrides, acyl halides, activated esters of carboxylic acids, or ketenes, and dimethylation with formaldehyde, can be used to introduce the corresponding N-terminal modifications, while other techniques may include guanidination or sulfonylation. Chemical methods offer the advantage of specificity and may also allow for controlled modifications, wherein selective modification at the N-terminus, exclusive of the amine side chain of lysine, is beneficial or desired. For example, selective N-terminal modification using 2-pyridinecarboxyaldehydes [MacDonald et al., Nat. Chem. Biol., 11:326-331 (2015)], 2-ethynylbenzaldehydes [Deng et al., Commun. Chem., 3:1-9 (2020)], or tunable phenol esters [Mikkelsen et al., Bioconjug. Chem., 33:635-633 (2022)] has been described. Additionally, bio-orthogonal chemical methods can be developed to add novel chemical groups that are not naturally occurring but can effectively prevent amino groups from reacting with protein sequencing reagents [De Rosa et al., Molecules, 26(12):3521 (2021), Chan et al., J. Am. Chem. Soc., 134(5):2589-2598 (2012)].
[0544] In certain embodiments, enzymatic methods can be employed to introduce N-terminal modifications that prevent sequencing of a carrier molecule or carrier protein. For example, acetyltransferases can be used for acetylation, while methyltransferases can facilitate methylation. Enzymatic methods offer the advantage of specificity and also the potential for controlled modifications. Additionally, bio-orthogonal enzymatic methods can be developed to add novel chemical groups that are not naturally occurring but can effectively prevent amino groups from reacting with protein sequencing reagents.
[0545] In certain embodiments, genetic engineering approaches are employed for modifying carrier molecules or carrier proteins at the genetic level. For example, these modifications may aim to express proteins with altered N-termini that are inherently resistant to conventional sequencing methods. This inventive approach encompasses techniques such as site-directed mutagenesis, which allows for precise alterations of the genetic sequences encoding carrier proteins. Through this method, specific amino acid residues at the N-terminus can be targeted and modified to resist reactivity or recognition by sequencing enzymes and chemicals. Additionally, the use of recombinant DNA technology is contemplated, enabling the design and production of carrier proteins with tailor-made N-terminal sequences or structures. These genetically engineered carrier proteins, exhibiting modified N-termini, ensure that while they perform their essential roles in sample preparation and handling, they remain unresponsive to sequencing processes. This approach not only broadens the scope of modifying carrier proteins but also introduces a level of precision and customization that can be finely tuned to meet specific requirements of various sequencing techniques. Such genetic modifications, being intrinsic to the protein structure, provide a robust and permanent solution to mitigate interference by carrier proteins in sequencing operations, thereby enhancing the efficiency and accuracy of proteomic analyses.
[0546] These methods are non-limiting examples, and various other methods, chemicals, enzymes and conditions or combinations of such are contemplated to achieve a desired N-terminal modification.
Methods to Prevent Reaction of the Carboxyl Group of a C-Terminal Amino Acid
[0547] In certain embodiments, chemical methods can be employed to introduce C-terminal modifications that prevent sequencing of a carrier molecule or carrier protein. For example, methods such as esterification of acidic groups using anhydrous acidic methanol or ethanol are utilized to incorporate modifications at the C-terminal position, while amidation of carboxylic acids by reaction with amines in the presence of activators, including carbodiimides or carbonyldiimidazole, is a classical approach for alternative modification. Chemical methods offer the advantage of specificity and also allow for controlled modifications wherein selective modification at the C-terminus, exclusive of the carboxyl side chains of aspartic and glutamic acids, is beneficial or desired. For example, photoredox catalysis [Zhang et al., ACS Chem. Biol., 16(11):2595-2603 (2021)] has been utilized to exploit the differences in oxidation potentials between internal and C-terminal carboxylates in obtaining functionalization exclusively at the C-terminus, while a chemoenzymatic approach leveraging the transpeptidase activity of carboxypeptidase Y has been employed to achieve selective C-terminal modification [Xu et al., ACS Chem. Biol., 6:1015-1020 (2011)]. Additionally, bio-orthogonal chemical methods can be devised to add novel chemical groups that are not naturally occurring but can effectively prevent amino groups from reacting with protein sequencing reagents.
[0548] In certain embodiments, enzymatic methods can be employed to introduce C-terminal modifications that prevent sequencing of a carrier molecule. For example, transpeptidation reactions, mediated by sortase, can also be used to attach specialized peptides or other molecules to the C-terminus. Like the N-terminal enzymatic methods, C-terminal enzymatic modifications can also include bio-orthogonal enzymatic approaches to add novel chemical groups.
[0549] These methods are non-limiting examples, and various other methods, chemicals, enzymes, and conditions or combinations of such could be explored to achieve a desired C-terminal modification.
Methods for Macromolecule Immobilization to Solid Support
[0550] In certain embodiments, chemical methods can be employed to immobilize a macromolecule to a solid support. For example, a first reactive group on the macromolecule can be chemically activated, and this activated group may be subsequently reacted with a second reactive group on the solid support. As a specific example, carboxyl groups on the macromolecule can be activated to yield NHS-esters, which are then reacted with amino groups on a solid support [Dixit et al., Nat Protoc, 6:439-445 (2011)]. Additionally, a first reactive group on the solid support can react with a second reactive group on the macromolecule, such as the reaction of an isothiocyanate group on a solid support with an amine group on the macromolecule [Wachter et al., FEBS Letters, 35(1):97-102 (1973)]. Additionally, a first reactive group on the solid support can be chemically activated, and the activated group subsequently reacted with a second reactive group on the macromolecule, such as the activation of carboxyl groups on the solid support to yield NHS-esters, which then can be reacted with amino groups on the macromolecule [Becke et al., J Vis Exp, 20(138):58167 (2018)]. Additionally, a first reactive group on a macromolecule can be linked to a second reactive group on a solid support using a bifunctional linker. For example, phenylene diisothiocyanate can be used to link amine groups on a macromolecule with amine groups on a solid support [Laursen et al., FEBS Letters, 21(1): 1482 (1972)], and cyanuric chloride may similarly be used to link amine groups on a macromolecule with amine groups on a solid support [Hermanson, Bioconjugate Techniques, 3rd Edition, 549-740 (2013)]. Additionally, bio-orthogonal chemical methods, such as the use of azide-alkyne cycloadditions [Meldal et al., F1000Research, 5, F1000 Faculty Rev-2302 (2016)], can be developed to add novel chemical groups that are not naturally occurring but can effectively immobilize macromolecules to solid supports.
For example, an N-linked glycan or glycoside may be converted to an aldehyde to allow specific orthogonal reaction of a modified protein with hydrazine or oxyamino reactive groups on a surface [Spears et al., Org. Biomol. Chem, 14:7622 (2016)]. Similarly, a thiol functional group of a protein may be specifically coupled with thiols or maleimides on the surface. Single-step or multi-step immobilization methods may also be employed to add linkers, such as PEG linkers, between protein and surface conjugation reactive moieties.
Modification of Macromolecular Analytes in Preparation for Determination of Amino Acid Identity and Positional Information
[0551] In certain embodiments, it is advantageous to use chemical methods to alter a macromolecule analyte prior to single-molecule protein sequencing. Specifically, derivatization of amino acid side chains may increase their stability or impart highly-recognizable non-native elements, and thereby afford a higher accuracy of identification at recognition steps within the single-molecule protein sequencing methods described herein. For example, the disulfide bond of cystine can be oxidized, such as by use of performic acid [Pesavento et al., Mol Cell Proteomics, 6(9):1510-26 (2007)], to yield the corresponding cysteic acids, while the disulfide bond of cystine can be reduced and alkylated, such as by use of TCEP as a reductant and iodoacetamide as an alkylating agent [Suttapitugsakul et al., Mol Biosyst, 13(12):2574-2582 (2017)], to yield an alkylated cysteine. Additionally, a phosphorylated or glycosylated serine residue within a macromolecule can be modified, such as by β-elimination with Ba(OH)2 followed by nucleophilic addition of 2-aminoethanethiol, to yield an aminoethylated cysteine residue [Rusnak et al., J Biomol Tech, 15(4):296-304 (2004)]. Additionally, the side chain reactivity of an amino acid within a macromolecule can be modified to yield an alternate side chain reactivity. For example, the guanidino functionality of arginine can be modified to the amino functionality of ornithine [Honegger et al., Biochem J, 199(1):53-59 (1981)].
[0552] In certain embodiments, chemical methods can be employed to alter a macromolecule analyte prior to immobilization to enhance specificity of the immobilization. For example, conditions for reaction of carboxyl groups in a macromolecule with a carbodiimide, such as N-ethyl-N'-(3-dimethylaminopropyl)carbodiimide, can be developed to favor the blocking of side chain carboxyls as N-acyl ureas, while activating the C-terminal carboxyl group as an oxazolinone, to obtain specific C-terminal immobilization [Previero et al., FEBS Letters, 33(10):135-138 (1973)]. Additionally, multiple chemical methods, used in a defined order of operations, can be employed to immobilize a macromolecule to a solid support through reaction with a modified amino acid side chain group. For example, a macromolecule can be reacted with (a) an isothiocyanate, such as phenylisothiocyanate, to yield a thiocarbamylated macromolecule, which (b) following treatment with a strong acid, such as trifluoroacetic acid, to remove the first amino acid of the macromolecule, will yield a macromolecule with a free N-terminus [Jay et al., J. Biol. Chem., 259(24):15572-15578 (1984)], which (c) when treated with hydrazine will yield a macromolecule with a free N-terminus and ornithine amino groups, which (d) when reacted with a solid support bearing isothiocyanate groups will yield a macromolecule immobilized to a solid support through modified arginine residues.
[0553] In some embodiments, multiple serial derivatizations provide a stably immobilized peptide having R-groups that are stable to reaction conditions of a single-molecule protein sequencing method.
[0554] FIG. 59 depicts a flow diagram of an example set of reactions leading to a modified analyte containing some or all of the original amino acid identity and positional information, and that is amenable to protein sequencing. In some embodiments, the order of operations is useful or critical to achieve a desired output. Steps for the example set of reactions in FIG. 59 comprise:
1) selecting a carrier molecule or carrier protein based on one or more characteristics of the carrier molecule or carrier protein, such as hydrophobicity, pI, size, availability, or cost;
2) manipulating, or modifying, the carrier molecule or carrier protein as described above to prevent interference with the following steps, as well as sequencing operations. Manipulations may include derivatizations of amine terminus, carboxyl terminus, or R-group derivatizations, or a combination thereof. In certain embodiments, a pre-manipulated carrier molecule or carrier protein may be used, and/or may come as part of a kit;
3) depleting highly abundant proteins [Tu et al, J. Proteome Res, 9(10):4982-91 (2010)] in a sample to be sequenced. There are several commercially available methods that may be employed, including those of ThermoFisher, Seer, and Agilent. It is recognized that enrichment or depletion operations may also affect the concentration of carrier proteins;
4) modifying one or more target protein analytes in the sample via protection or derivatization chemistries, as described above, to improve stability or preserve distinction between amino acid side chains which may be lost following exposure to acidic sequencing chemical reagents. Modification to C-terminal, N-terminal, R-groups, or a combination thereof may be carried out. In some aspects, the order of derivatization is important. For example, reduction of disulfide bonds prior to conversion of cysteine to S-pyridylethyl cysteine [Friedman et al., J. Protein Chem, 20:431-453 (2001)] is preferred. Modification may also include digestion to optimize the length of peptide;
5) immobilizing target protein analyte(s) to a solid support, such as by the methods described herein. Several mechanisms are available, and selection of an appropriate method includes consideration of the chemistries employed in previous stages of preparing the sample and in subsequent stages of a workflow. For example, immobilization may be performed during any of the stages that include depletion, contacting with blocking agents, reduction, alkylation, digestion, or chemical modification; and
6) carrying out protein sequencing operations with the target protein analyte(s).
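Because the order of operations may matter in the example workflow above, a simple ordering check can be sketched as follows. This is an illustrative sketch only: the step names, the `ORDER_CONSTRAINTS` list, and the helper function are hypothetical labels chosen for this example. The constraints encode only what the steps above state or imply, for example that reduction of disulfide bonds precedes cysteine alkylation, and constraints are checked only when both steps are present, since some embodiments omit or reorder steps.

```python
from typing import List, Sequence, Tuple

# Hypothetical step labels mirroring steps 1) through 6) above.
EXAMPLE_WORKFLOW: List[str] = [
    "select_carrier",
    "modify_carrier",
    "deplete_abundant_proteins",
    "reduce_disulfides",          # part of step 4 (analyte modification)
    "alkylate_cysteines",         # part of step 4 (analyte modification)
    "immobilize_analyte",
    "sequence",
]

# (earlier, later) pairs; assumptions drawn from the description above.
ORDER_CONSTRAINTS: List[Tuple[str, str]] = [
    ("select_carrier", "modify_carrier"),
    ("modify_carrier", "sequence"),
    ("reduce_disulfides", "alkylate_cysteines"),
    ("immobilize_analyte", "sequence"),
]


def violated_constraints(steps: Sequence[str],
                         constraints: Sequence[Tuple[str, str]]
                         ) -> List[Tuple[str, str]]:
    """Return constraints whose two steps are both present but out of order."""
    position = {step: index for index, step in enumerate(steps)}
    return [
        (earlier, later)
        for earlier, later in constraints
        if earlier in position and later in position
        and position[earlier] > position[later]
    ]


if __name__ == "__main__":
    print(violated_constraints(EXAMPLE_WORKFLOW, ORDER_CONSTRAINTS))
    swapped = ["alkylate_cysteines", "reduce_disulfides", "sequence"]
    print(violated_constraints(swapped, ORDER_CONSTRAINTS))
```

A check of this kind only flags ordering conflicts; it does not assess whether the chemistries of adjacent steps are compatible.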
[0555] Some embodiments include any one or more of the above steps, or may include a different order of steps.
Modification of Carrier Proteins to Prevent Immobilization
[0556] Carrier molecules that comprise chemical moieties similar to those of a macromolecular analyte are often used to block non-specific binding of the analyte to surfaces before or during assay steps. Given the methods for manipulation and immobilization of a protein analyte to a surface described herein, one can envision multiple methods to modify a carrier molecule or carrier protein such that it may block the interactions of analytes with containers and surfaces used in a single-molecule protein sequencing method, but not participate in the sequencing method. In one embodiment, the carrier protein is esterified, amidated, or treated with sortase to prevent it from immobilizing to a solid support. In one embodiment, the carrier protein is alkylated, acylated, arylated, or treated with acetyltransferase to prevent it from presenting a primary amine capable of providing signal within the single-molecule protein sequencing method. It is recognized that blocking immobilization is strongly desired in addition to blocking interaction with sequencing reagents. Thus, in one embodiment, the carrier protein is both alkylated and trans-acylated.
Alternative Carrier Molecules
[0557] Various carrier proteins or molecules may be employed according to the methods and systems described herein. For example, such proteins or molecules may include, but are not limited to, β-lactoglobulin, bovine serum albumin (BSA), ovalbumin, lysozyme, and casein. Furthermore, wholly synthetic peptides or even non-peptidic molecules that possess the desired absorptive characteristics can also be used as carrier substances. Non-peptidic molecules would, by default, not take part in the sequencing process and can also be of value. These alternative carriers may also undergo either the addition of a protecting group to their N-terminus, or a modification to inhibit binding to the solid support, similar to the methods described in previous embodiments. This offers greater flexibility in the choice of carrier substances, allowing for the optimization of the sequencing process based on the specific requirements of the target protein and the sequencing system being employed.
Cyclized Proteins and Cyclization Methods
[0558] In certain embodiments, cyclization of the carrier protein can be employed to render it unresponsive to sequencing processes such as Edman degradation. In such embodiments, a carrier protein, such as β-lactoglobulin, can be subjected to enzymatic cyclization using enzymes like sortase, which links the N-terminus to the C-terminus, forming a cyclic protein. Chemical cyclization methods using crosslinking agents like 1-ethyl-3-(3-dimethylaminopropyl) carbodiimide (EDC) can also be employed. The cyclic structure of the modified protein lacks a free N-terminus, making it inherently resistant to Edman degradation and similar processes, while retaining the carrier function of the protein.
Peptoids
[0559] Peptoids (poly-N-substituted glycines) are a class of biomimetic molecules that can be used as carrier molecules according to the methods described herein. They are polymers in which the side chains are attached to the nitrogen atom of the backbone, unlike in peptides, where the side chains are attached to the α-carbon. This unique structure of peptoids prevents the formation of intra-backbone hydrogen bonds that are typical in proteins. This difference allows for the exploration of polymer properties and chain folding without these hydrogen bonds. Peptoids are used in various fields, including health, environmental, and drug delivery applications, and are synthesized primarily using automated solid-phase synthesis methods. In certain embodiments, peptoids can be used as carrier substances. A peptoid's backbone renders it naturally resistant to Edman degradation, and its surface can be modified to minimize nonspecific binding to target proteins or solid supports.
Beta-Amino Acids
[0560] In certain embodiments, beta-amino acids are utilized to construct a carrier backbone (i.e., a carrier structure). Synthesis of a beta-amino acid can be accomplished through the homologation of an alpha-amino acid, which involves a sequence of chemical reactions that extend the carbon chain of the alpha-amino acid, thus converting it into a beta-amino acid. This process can be facilitated by a Reformatsky reaction combined with a Mannich-type iminium electrophile [Moumne et al., J. Org. Chem, 71(8):3332-3334 (2006)]. The extra methylene group in the beta-amino acid backbone can confer resistance to standard proteolytic and sequencing processes, while allowing tailored modifications for specific carrier functions. This may reduce non-specific binding in a sequencing system, while not actively participating in the sequencing process itself.
Oligourea
[0561] In certain embodiments, oligo-ureas can serve as a carrier substance. These can be synthesized through well-established methods utilizing, e.g., Boc-protected activated succinimidyl carbamates and other amino acid derivatives. A urea-based backbone may be resistant to proteolytic degradation and Edman sequencing, and its surface properties can be engineered to minimize nonspecific interactions during sequencing.
Other Groups for Blocking Methods
[0562] In certain embodiments, side-chain functionalization of a carrier protein can be performed to disrupt Edman degradation. Specific amino acid residues, like lysines or cysteines, can be selected for this modification. Reagents like N-hydroxysuccinimide (NHS) esters for lysines or maleimides for cysteines may be used. These modifications can introduce bulky or reactive groups, such as azides or alkynes, that may sterically or chemically interfere with the sequencing process.
Degradation Resistant Tagging Groups
[0563] In certain embodiments of the present disclosure, the incorporation of degradation-resistant tags at either the N-terminal or C-terminal of carrier proteins is proposed as a strategy to prevent sequencing of these proteins from those respective ends. These tags are specifically designed to resist degradation processes typically encountered in protein sequencing. Embodiments and techniques above may be used to confer degradation resistance to the terminal end of a peptide or protein.
KITS
[0564] Disclosed herein, in some embodiments, are kits. The kit may include any component or aspect described herein. The kit may be useful for analyzing polymeric macromolecules such as peptides, polypeptides, and proteins.
[0565] Some embodiments include instructions such as written instructions for use. For example, the kit may include instructions for use in a method of determining identity and positional information of amino acids of peptides.
[0566] In some embodiments, the kit includes a chemically-reactive conjugate.
[0567] In some embodiments, the kit includes a binding agent.
[0568] In some embodiments, the kit includes a reagent for transferring information of the recode nucleic acid and/or the cycle nucleic acid of the conjugate complex to generate a recode block.
[0569] Some embodiments include a method for analyzing polymeric macromolecules such as peptides, polypeptides, or proteins, comprising: a chemically-reactive conjugate comprising (a) a nucleic acid sequence tag and (b) a reactive moiety that couples to an N-terminal amino acid residue of a peptide, and thereby forms a conjugate complex comprising the chemically-reactive conjugate coupled to the N-terminal amino acid of the peptide; a binding agent comprising a binding moiety for preferentially binding to the conjugate complex, and a recode tag comprising a recode nucleic acid corresponding with the binding agent; and a reagent for transferring information of the recode nucleic acid and the cycle nucleic acid of the conjugate complex to generate a recode block.
[0570] In some embodiments, the kit includes any or all of the following aspects: (a) a solid support for coupling the peptide to the solid support such that an N-terminal amino acid residue of the peptide is not directly coupled to the solid support and is exposed to reaction conditions; (b) one or more reagents having chemically-reactive conjugates, the chemically-reactive conjugates comprising: (x) a cycle tag comprising a cycle nucleic acid associated with a cycle number, (y) a reactive moiety for binding the N-terminal amino acid residue of the peptide, and (z) an immobilizing moiety for immobilization to the solid support; (c) a reagent for coupling the chemically-reactive conjugate to the N-terminal amino acid of the peptide to form a conjugate complex, when the peptide is contacted with the chemically-reactive conjugate; (d) one or more reagents for immobilizing the conjugate complex to the solid support via the immobilizing moiety; (e) one or more reagents for cleaving and thereby separating the N-terminal amino acid residue from the peptide, thereby providing an immobilized amino acid complex, the immobilized amino acid complex comprising the cleaved and separated N-terminal amino acid residue; (f) one or more reagents having one or more binding agents comprising: (i) a binding moiety for preferentially binding to the immobilized amino acid complex, and (ii) a recode tag comprising a recode nucleic acid corresponding with the binding agent, wherein upon contact of the immobilized amino acid complex with the binding agent, the immobilized amino acid complex and the binding agent form an affinity complex, the affinity complex comprising an immobilized amino acid complex and a binding agent; (g) one or more reagents for transferring information of the recode nucleic acid to the cycle nucleic acid of the immobilized conjugate complex to generate a recode block; (h) one or more reagents for joining two or more members of the plurality of recode blocks to form a memory oligonucleotide; and/or (j) one or more sequencing reagents for obtaining sequence information of the recode block.
[0571] The kit may be used for sequencing a subset of nucleotides of an oligonucleotide, and may include one or more reagents for sequencing a subset of nucleotides of an oligonucleotide. Some embodiments include an SBS sequencing reagent mix comprising one or more nucleotides as predominantly reversibly terminated nucleotides and one or more nucleotides as predominantly nonterminated nucleotides.
[0572] The kit may include any reagent or aspect described herein.
[0573] Described herein, in some embodiments, are kits for protein sequencing. The kit may include one or more carrier molecules. In some embodiments, the one or more carrier molecules are designed or modified such as to reduce non-specific adsorptive losses of a sample analyte in a sample for protein sequencing. In some embodiments, the one or more carrier molecules are designed or modified such as to not interfere with protein sequencing operations. In some embodiments, the one or more carrier molecules are designed or modified to not increase sequence data collection or analysis burden during the protein sequencing operations. Described herein, in some embodiments, are kits for protein sequencing, the kit comprising: one or more carrier molecules, wherein: the one or more carrier molecules are designed or modified such as to reduce non-specific adsorptive losses of a sample analyte in a sample for protein sequencing; the one or more carrier molecules are designed or modified such as to not interfere with protein sequencing operations; or the one or more carrier molecules are designed or modified to not increase sequence data collection or analysis burden during the protein sequencing operations. In some embodiments, the N-terminus or C-terminus of the carrier molecule comprises a covalently attached protective group. In some embodiments, the protective group is a tert-butyloxycarbonyl (Boc) covalently attached to the N-terminus. In some embodiments, the protective group is an adamantane carboxylic acid covalently attached to the C-terminus. In some embodiments, the carrier molecule comprises at least one of β-lactoglobulin, immunoglobulin, bovine serum albumin (BSA), ovalbumin, lysozyme, or casein. In some embodiments, the one or more carrier molecules are modified at a plurality of sites on each of the one or more carrier molecules. In some embodiments, the one or more carrier molecules comprise a cyclized protein. In some embodiments, the one or more carrier molecules comprise a modified natural protein. In some embodiments, the one or more carrier molecules comprise a synthetic peptide. In some embodiments, the one or more carrier molecules comprise a peptoid.
DEFINITIONS
[0574] As used in the present disclosure, the term “amino acid” and notation “AA” refer to natural d-, l-, non-natural, and post-translationally modified amino acids. An “N-terminal amino acid” refers to an amino acid that has a free amine group, and is linked to only one other amino acid of the peptide through an amide bond. Similarly, a “C-terminal amino acid” refers to an amino acid that has a free carboxyl group, and is linked to only one other amino acid of the peptide through an amide bond.
[0575] The term “AA tag” refers to a nucleic acid molecule of any length, but typically in the range 5-20 bases, which contains a sequence that is defined to represent a particular amino acid or class of amino acids that share structural or functional similarity. If recoding a polymer that does not comprise amino acids, then the AA tag sequence may be defined to represent a particular monomer or class of monomers that share structural or functional similarity. It may also refer to any construct that enables a method of subsequent identification of the cycle information, such as a mass tag.
[0576] The terms “analyze” and “analyzing” refer to assigning a sequence, and/or quantification, and/or identity to the macromolecule, or a part of the macromolecule analyte.
[0577] The term “assembly oligo” (e.g., an assembly Oligo) refers to a nucleic acid capable of hybridizing to a memory oligo tethered to a solid support and/or hydrogel. Assembly oligos may be utilized to facilitate ligation assembly of a complementary DNA strand to a memory oligo that is tethered to the hydrogel surface and/or solid support as a template. Ligation assembly of a complementary strand avoids the need for polymerase extension through tethered nucleic acids to create a solution phase nucleic acid representative of the analyte sequence. An assembly oligo comprises a sequence complementary to a cycle tag sequence and a sequence complementary to an amino acid sequence.
[0578] The term “binding agent” refers to an entity comprised of a binding moiety joined with a recode tag. The binding moiety and recode tag may be joined by a linker.
[0579] The term “binding moiety” refers to a molecule or macromolecule that recognizes and binds with a target analyte or a feature of the target analyte. Exemplary binding moieties include: antibodies, F(ab’)2, Fab, and scFv regions, nanobodies, DNA aptamers, RNA aptamers, modified aptamers, photoactive or non-photoactive cage compounds, oligo peptide permease (Opp), amino-acyl t-RNA synthetase (aaRS), periplasmic binding proteins (PBP), dipeptide permease (Dpp), proton dependent oligopeptide transporters (POT), modified aminopeptidases, modified amino acyl tRNA synthetases, modified anticalins, modified ClpS, Lectin, or clathrates. A binding moiety may form a covalent association or non-covalent association with target analytes, which include immobilized conjugate complexes, such as an immobilized PTC-AA-cycle tag-conjugate complex. The binding moiety may exhibit preferential binding to one conjugate complex over another one depending on the amino acid of the complex. The binding moiety may bind preferentially to classes of amino acids that are structurally or functionally similar within the conjugate complex.
[0580] In addition to caged drugs and bioactive small molecules, amino acids and derivatized amino acids offer a number of possibilities for caging. For example, amines, carboxylates, and amino acid side chains offer a number of easily caged functional groups. More particularly, caged serine, threonine, tyrosine, cysteine, methionine, aspartate, glutamate, and lysine have all been reported; see Pirrung et al., Synthesis of photodeprotectable serine derivatives - caged serine, Bioorg. Med. Chem. Lett. 2, 1489-1492 (1992); Tatsu et al., Solid-phase synthesis of caged peptides using tyrosine modified with a photocleavable protecting group, Biochem. Biophys. Res. Comm. 227, 688-693 (1996); Gee, K.R., Carpenter, B. K., and Hess, G.P., Synthesis, photochemistry, and biological characterization of photolabile protecting groups for carboxylic acids and neurotransmitters, Met. Enz. 291, 30-50 (1998); Tatsu et al., Synthesis of caged peptides using caged lysine: Application to the synthesis of caged AIP, a highly specific inhibitor of calmodulin-dependent protein kinase II, Bioorg. Med. Chem. Lett. 9, 1093-1096 (1999); Okuno, T., Hirota, S., and Yamauchi, O., Folding character of cytochrome c studied by o-nitrobenzyl modification of methionine 65 and subsequent ultraviolet light irradiation, Biochem. 39, 7538-7545 (2000).
[0581] The terms “biochip” and “microarray” refer to consumable devices that support fluidic operations and further support a recode workflow. In some embodiments, these could include a flowcell used directly by an NGS sequencing instrument in a DNA sequencing process.
[0582] The term “biologically or synthetically-derived sample” refers to a sample of macromolecules that has its origins from a biological process, such as a cell lysate solution, or has origins from a sample created using synthetic biology techniques, or a sample of macromolecules created using purely chemical synthesis, for example a solution of synthetic peptides, synthetic nucleic acids, or chemically-synthesized polymers.
[0583] The term “C/AA tag” refers to a nucleic acid molecule of any length, but typically in the range 5-40 bases, having a sequence that represents a particular amino acid and cycle of a single-molecule sequencing workflow. The length of a C/AA tag may differ for different cycles of the workflow. The C/AA tag may optionally comprise additional nucleic acid sequences that direct assembly of memory oligos in subsequent steps, such as unifying assembly sequences which facilitate recode block assembly irrespective of the order of assembly. In certain examples, a C/AA tag may optionally comprise a restriction endonuclease sequence, and/or a sequence facilitating amplification of recode blocks, or any sequence functionality disclosed herein. The term, “C/AA tag” may also refer to any construct that enables a method of subsequent identification of the cycle information, such as a mass tag, or a chemically reactive moiety. It provides identifying amino acid (or monomer subunit) information for its associated binding agent. It may uniquely identify one amino acid or may identify a class of amino acids with structural and/or functional similarity. A C/AA tag may provide a probabilistic estimate as to the identity of the amino acid, and thereby provide sufficient information for analysis. The C/AA tag is a preconstructed recode block in terms of function in the described systems.
[0584] The term “chemically-reactive conjugate” may include or refer to a conjugate comprising (a) a reactive moiety(ies) that can bind and cleave a terminal amino acid, (b) a reactive moiety that allows immobilization to a solid support, and (c) a cycle tag with identifying information regarding the workflow cycle. In certain embodiments, various moieties may be replaced with reactive moieties for attaching said groups. In some embodiments, a chemically-reactive conjugate may include a reactive moiety that can bind to a first reactive moiety such as a molecule comprising an ITC. A chemically-reactive conjugate may be or include (x) a cycle tag comprising a cycle nucleic acid associated with a cycle number; (y) a second reactive moiety that binds or reacts with the first reactive moiety (e.g., that binds or reacts with an ITC conjugate); and (z) an immobilizing moiety for immobilization to the solid support.
[0585] The term “codespace” refers to the universe of codes that are associated with cycle tags and AA tags and are used to represent workflow cycle and monomer identity information, respectively. Codespace is defined by a set of rules that provide practical separation distance between codes and improve fidelity and accuracy while reading information. For example, Hamming distance theory, or other modern digital code space theories (e.g., Lee, Levenshtein-Tenengolts, Reed-Solomon, or others) may be applied to assign codes and enable error detection and error correction capability and account for: 1) NGS sequencing errors during analysis, 2) errors in oligonucleotide synthesis, 3) errors in reagents used in the recoding process, 4) errors that occur during assembly of recode blocks, 5) errors that occur during assembly of memory oligos, or combinations of errors that may occur during any step in the determination of protein sequence and protein abundance by recoding amino acid polymers into DNA polymers and analyzing.
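As a non-limiting illustration of how a Hamming-distance rule could define a codespace, the following minimal sketch greedily selects DNA codes that are pairwise separated by a chosen minimum distance and then assigns a read to its nearest code. The function names, the code length of 6, the minimum distance of 3, and the codespace size of 20 are assumptions chosen for illustration and are not taken from the disclosure. The sketch handles substitution errors only; insertions and deletions would call for an edit-distance scheme such as those named above.

```python
from itertools import product
from typing import List, Optional


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length codes differ."""
    if len(a) != len(b):
        raise ValueError("codes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def build_codespace(length: int, min_distance: int, size: int) -> List[str]:
    """Greedily pick up to `size` DNA codes that are pairwise >= min_distance apart."""
    chosen: List[str] = []
    for bases in product("ACGT", repeat=length):
        candidate = "".join(bases)
        if all(hamming(candidate, c) >= min_distance for c in chosen):
            chosen.append(candidate)
            if len(chosen) == size:
                break
    return chosen


def decode(read: str, codes: List[str], max_errors: int) -> Optional[str]:
    """Return the single code within `max_errors` substitutions of the read, else None."""
    hits = [c for c in codes if hamming(read, c) <= max_errors]
    return hits[0] if len(hits) == 1 else None


# Hypothetical codespace: 20 codes (e.g., one per amino acid), 6 bases each.
codes = build_codespace(length=6, min_distance=3, size=20)
```

With a minimum pairwise distance of 3, any read carrying a single substitution lies within distance 1 of exactly one code, so `decode(read, codes, max_errors=1)` corrects it; a read with two substitutions can be detected as erroneous but may not be corrected reliably.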
[0586] The term “cognate binding agent” refers to a binding agent that was designed to, and that binds with high relative affinity to, a cognate target analyte or a feature or portion of the cognate target analyte. This is contrasted with a “non-cognate binding agent”, that was not designed to bind to, and thus interacts with low relative affinity to, a non-cognate target analyte or a feature or portion of the non-cognate target analyte, such that the non-cognate binding agent does not effectively transfer recode tag information to the recode block under conditions appropriate for recode block assembly by cognate binding agents.
[0587] The terms “conjugate complex” and “immobilized conjugate complex” refer to a chemically-reactive conjugate having been joined optionally as appropriate within the context to: an amino acid (e.g., a monomer of the macromolecular analyte), a peptide, a linker, a solid support, and/or a cycle tag.
[0588] The term "complementary" refers to Watson-Crick base pairing between nucleotides and specifically refers to nucleotides hydrogen bonded to one another with thymine or uracil residues linked to adenine residues by two hydrogen bonds and cytosine and guanine residues linked by three hydrogen bonds. In general, a nucleic acid includes a nucleotide sequence described as having a "percent complementarity" or “percent homology” to a specified second nucleotide sequence. For example, a nucleotide sequence may have 80%, 90%, or 100% complementarity to a specified second nucleotide sequence, indicating that 8 of 10, 9 of 10 or 10 of 10 nucleotides of a sequence are complementary to the specified second nucleotide sequence.
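The percent complementarity arithmetic described above (e.g., 8 of 10 nucleotides giving 80%) can be sketched as follows. This is a minimal illustration that assumes two equal-length sequences written 5' to 3' and compared in antiparallel orientation without gaps; the function name and the example sequences are hypothetical.

```python
# Watson-Crick pairs: A with T or U, and G with C.
WC_PAIRS = {("A", "T"), ("T", "A"), ("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def percent_complementarity(seq: str, target: str) -> float:
    """Percent of positions in `seq` that pair with `target` read antiparallel."""
    if not seq or len(seq) != len(target):
        raise ValueError("sequences must be non-empty and of equal length")
    antiparallel = target[::-1].upper()
    paired = sum((a, b) in WC_PAIRS for a, b in zip(seq.upper(), antiparallel))
    return 100.0 * paired / len(seq)


# Hypothetical 10-base sequence, its perfect reverse complement, and a
# variant of that complement with two mismatched positions.
seq = "ACGTACGTAC"
perfect = "GTACGTACGT"
two_mismatches = "GTACGTACAA"

full = percent_complementarity(seq, perfect)            # 10 of 10 paired
partial = percent_complementarity(seq, two_mismatches)  # 8 of 10 paired
```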
[0589] The term “cycle tag” (e.g., “cycleTag”) refers to a nucleic acid molecule of any length, but typically in the range 5-20 bases, having a sequence that is defined to represent a particular cycle of the recode workflow. The length of a cycle tag may differ for different cycles of the workflow. The cycle tag may optionally comprise additional nucleic acid sequences that direct assembly of memory oligos in subsequent steps, such as universal assembly sequences which facilitate recode block assembly irrespective of the order of assembly. In certain examples, a cycle tag may optionally comprise a restriction endonuclease sequence. The term, “cycle tag” may also refer to any construct that enables a method of subsequent identification of the cycle information, such as a mass tag.
[0590] The term “deprotecting” refers to removing protecting moieties that preserve the integrity of a functional group during exposure to conditions and potential reactants that may otherwise react to alter the functional group. Exemplary protecting agents for nucleic acids include: FMOC, acetyl (Ac), benzoyl (Bz), dimethylformamidine (DMFA), and phenoxyacetyl (PAC). See, Radhakrishnan P. Iyer, Current Protocols in Nucleic Acid Chemistry.
[0591] The terms "homology" or "identity" or "similarity" refer to sequence similarity between two peptides or between two nucleic acid molecules.
[0592] The term “hydrogel” refers to synthetic polymers, natural polymers, and/or hybrid polymers. Exemplary monomers that may form the hydrogel include one or more: acrylamide, acrylate, vinyl pyridine, dihydroxy methacrylates, other methacrylates, HEMA, PHEMA, PVA, HPMC, PLGA, PEG, etc., in linear, branched, and crosslinked configurations, block co-polymer configurations, or other configurations conducive to sequencing macromolecules. See, Faisal Raza, Hajra Zafar, Ying Zhu, Yuan Ren, Aftab Ullah, Asif Ullah Khan, Xinyi He, Han, Md Aquib, Kofi Oti Boakye-Yiadom and Liang Ge, A Review on Recent Advances in Stabilizing Peptides/Proteins upon Fabrication in Hydrogels from Biodegradable Polymers, Pharmaceutics 2018, 10, 16. A hydrogel may be associated with a solid support through covalent or non-covalent interactions. The hydrogel may further comprise orthogonal conjugation chemistry modalities to support the recode workflow.
[0593] The terms “zth,” “(z-1)th”, etc., refer to an arbitrary position in the macromolecular analyte and its nearest neighbor.
[0594] The term “initiation oligo” refers to any nucleic acid that when immobilized is capable of initiating assembly of co-localized nucleic acids. Initiation oligos may comprise a sequence facilitating assembly of recode blocks, determination of a shared relative spatial location, amplification of nucleic acids, and/or ULI, U-AS, U-HS, hyb tags, ligation complements, or combinations of these or other sequences described herein for the analysis of peptides. It is of any length, but typically in the range 15-80 bases and the length of one hyb tag may differ from that of another. In certain examples, an initiation oligo may optionally comprise a restriction endonuclease sequence. Initiation oligos may be linked via a linker, or may comprise a linker. It is also recognized that the initiation oligos may be another molecular format that is not a nucleic acid, and that information may be joined with an initiation oligo via a chemical reaction.
[0595] The term “ITC-Conjugate” refers to a molecule having an amine-reactive group (RG), including but not limited to isothiocyanate, alkyl isothiocyanate, aryl isothiocyanate, substituted aryl isothiocyanate, isoselenocyanate, alkyl selenocyanate, aryl selenocyanate, substituted aryl selenocyanate and a functional group capable of being joined to a CRC, or other complementary reactive element. The ITC-conjugate may include substituents that influence the reactivity or physicochemical characteristics of the molecule, such as fluoro, nitro, halo, carboxyl, cyano, pyridyl, ether, thioether, amide, carbonate, carbamate, tertiary amino, quaternary amino groups and combinations thereof. The ITC conjugate may possess heterocyclic structures including imidazole, pyrazole, pyrazines, thiophene, furan, pyrrole, pyran, pyrimidine, oxazole, thiazole. It is recognized that in the case of C-terminal sequencing the RG group would be reactive to the carboxyl terminus.
[0596] The term “ligation oligo” (e.g., “ligationOligo”) refers to a nucleic acid that becomes ligated to a cycle tag of an immobilized conjugate complex when appropriately directed by a cognate binding agent via hybridization to the recode tag of the cognate binding agent. Ligation oligos may, in certain embodiments, hold information related to amino acid and workflow cycle assembly, and are complementary to the recode tag of a cognate binding agent. It is also recognized that the ligation oligo may be another molecular format that is not a nucleic acid, and that recodes amino acid and workflow cycle information that can be joined with a cycle tag via a chemical reaction. In certain embodiments, ligation oligos may optionally comprise a sequence facilitating ligation, extension-ligation, or chemical ligation of a recode block to another recode block irrespective of the order of assembly. For example, by including a 3’ and/or 5’ universal assembly sequence on a plurality of recode blocks such that at least two recode blocks share the same universal assembly sequence, assembly of such recode blocks into a memory oligo, in any given order, is enabled.
[0597] The term “linker” or “spacer” refers to a molecule used to join two or more molecules. The composition of the molecule may be a polymer, a monomer or combination of both. A linker may further comprise reactive elements that promote covalent and/or non-covalent conjugation between molecules. Exemplary linkers include those used to join a binding agent to a recode tag, or a cycle tag to other elements of a conjugate complex, e.g., a molecule having an NHS-ester at one end and an azide at the other end of a PEG molecule, or a molecule having a biotin at one end and a maleimide moiety at the other end of a nucleic acid.
[0598] The term “linking oligo” (e.g., “linkingOligo”) refers to a nucleic acid capable of promoting ligation between a recode block associated with a given workflow cycle and a second recode block associated with any other workflow cycle of the recoding process. Linking oligos are useful to complete the assembly of a memory oligo, because they can substitute for errors, e.g., in upstream processes that resulted in an incomplete or unexpected recode block sequence for one or more workflow cycles, no recode block assembly for one or more workflow cycles, or steric effects that prevent interaction between and assembly of recode blocks. Linking oligos may optionally comprise a sequence complementary to the cycle tag sequence of one workflow cycle and the cycle tag sequence of any other workflow cycle. Ligation of recode blocks via linking oligos may create a lack of information related to the recode block that was skipped in the assembly of the memory oligo. In this case it is recognized that the memory oligo may still be valuable for analysis of macromolecular information, since it may be inferred during analysis that an unknown monomer (or multiple unknown monomers) separates the positions of known monomers, and mapping to a reference sequence allows determination of macromolecule sequence and identity information. In certain embodiments, linking oligos may optionally comprise a sequence for promoting ligation between a recode block associated with a workflow cycle and a second recode block associated with another workflow cycle of the recoding process. For example, such ligation may be promoted via complementarity between universal assembly sequences of the cycle tag and/or the recode tag.
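The inference described above, in which skipped workflow cycles are treated as unknown monomers and the remaining calls are mapped to a reference, can be sketched as follows. This is a minimal illustration under stated assumptions: cycle k is taken to report the k-th residue from the N-terminus, skipped cycles are written as 'X', the read is aligned at the N-terminus of each reference, and the reference names and sequences are hypothetical.

```python
from typing import Dict, List


def gapped_read(calls: Dict[int, str], n_cycles: int) -> str:
    """Build a read with 'X' at every cycle for which no recode block was recovered."""
    return "".join(calls.get(cycle, "X") for cycle in range(1, n_cycles + 1))


def matches(read: str, reference: str) -> bool:
    """True if the read agrees with the start of the reference at every known position."""
    if len(reference) < len(read):
        return False
    return all(r == "X" or r == ref for r, ref in zip(read, reference))


def map_to_references(read: str, references: Dict[str, str]) -> List[str]:
    """Names of all references consistent with the gapped read."""
    return [name for name, seq in references.items() if matches(read, seq)]


# Hypothetical example: cycle 3 was skipped (bridged by a linking oligo).
calls = {1: "G", 2: "L", 4: "D", 5: "K"}
read = gapped_read(calls, n_cycles=5)

references = {"pepA": "GLSDKAA", "pepB": "GLSEKAA", "pepC": "ALSDKAA"}
candidates = map_to_references(read, references)
```

Here the read is "GLXDK", and only the hypothetical reference "pepA" is consistent with it, so the unknown position does not prevent identification.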
[0599] The term “location linker” refers to any molecule configured to attach a peptide to a solid support, and further configured to bind to a nucleic acid. In some examples, a location linker refers to a molecule with 3 or more functional elements that facilitate the attachment of a peptide, a nucleic acid, and a solid support. In some examples, the nucleic acid can be a UMI that carries code information related to a location of isolation for isolated immobilized PTC-conjugates.
[0600] The term “location oligo” (e.g., “locationOligo”) refers to a nucleic acid of any suitable length, but typically in the range 10-40 bases, that contains a sequence that represents the x,y,z coordinates of an immobilized macromolecular analyte and is held in proximity to a macromolecule via a location linker. Location oligos are useful to transfer location information to spatially-adjacent immobilized recode blocks.
[0601] The terms “macromolecule” and “macromolecular polymer” refer to a high molecular weight molecule composed of subunits. Examples of macromolecules include, but are not limited to, protein complexes such as a photosynthetic reaction center antenna complex, multi-subunit proteins such as a photosynthetic reaction center or a pore protein, single subunit proteins such as cytochrome-c, protein fragments, peptides, polypeptides, nucleic acids, carbohydrates, and polymers such as urethane or acrylamide. “Macromolecule” also describes natural and synthetic combinations of two or more macromolecular types, such as a peptide covalently bound to a nucleic acid, or a lectin bound to a carbohydrate through electrostatic forces, van der Waals forces, or any non-covalent forces.
[0602] The terms “memory oligonucleotide,” and “memory oligo” (e.g., “memoryOligo”) refer to a construct that comprises location information, monomer relative positional information, and/or monomer identity information. It is typically assembled by aggregating the information of recode blocks. Typically, a memory oligo comprises information for one associated macromolecular analyte. However, it is recognized that there are embodiments where a memory oligo comprises identifying information for one or more macromolecular analytes. Optionally, a memory oligo may further comprise: sample indexes, UMIs, universal priming sites, linkers, and other identifiers of macromolecule provenance. The length of a memory oligo will typically be between 25 and 25,000 base pairs. When perfectly assembled, the length of the memory oligo equals the sum of the lengths of provenance identifiers plus the lengths of cycle tag and AA tag sequences multiplied by the number of workflow cycles. It is recognized that cycle tag lengths may be different for different workflow cycles. Note that imperfect assembly of a recode block may produce a memory oligo with shorter or longer lengths than the perfectly assembled memory oligos and that are valuable for analysis of the macromolecule, since cycle and amino acid (e.g., monomer) information is transferred to adjacent registers of the memory oligo. It is further recognized that sequential assembly of recode block information into a memory oligo is not required to provide a memory oligo for analysis that is useful for macromolecule analyte analysis.
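The length relationship stated above for a perfectly assembled memory oligo can be worked through in a short sketch. All lengths below are hypothetical values chosen for illustration, and the function name is an assumption; the sketch also assumes a single AA tag length shared by all cycles, while allowing cycle tag lengths to differ between cycles.

```python
from typing import Sequence


def expected_memory_oligo_length(
    provenance_lengths: Sequence[int],
    cycle_tag_lengths: Sequence[int],
    aa_tag_length: int,
) -> int:
    """Provenance identifiers plus one (cycle tag + AA tag) register per workflow cycle."""
    registers = sum(cycle_len + aa_tag_length for cycle_len in cycle_tag_lengths)
    return sum(provenance_lengths) + registers


# Hypothetical provenance identifiers: sample index (8), UMI (12),
# and two universal priming sites (20 each).
provenance = [8, 12, 20, 20]

# Ten workflow cycles with 8-base cycle tags and 12-base AA tags.
uniform = expected_memory_oligo_length(provenance, [8] * 10, aa_tag_length=12)

# Three cycles whose cycle tags differ in length.
mixed = expected_memory_oligo_length(provenance, [8, 8, 10], aa_tag_length=12)
```

Under these assumed values the uniform case gives 60 + 10 × (8 + 12) = 260 bases, and the mixed case gives 60 + 20 + 20 + 22 = 122 bases. An observed length that departs from the expected value would be one indication of imperfect assembly.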
[0603] The term “metadata conjugate” refers to a conjugate comprising (a) a binding moiety(ies) that can bind to an attribute of the peptide analyte, (b) a reactive moiety that allows immobilization to a solid support, and (c) a metadata tag with identifying information regarding the cognate attribute of the immobilized peptide.
[0604] The term “metadata tag” may refer to a nucleic acid molecule of any length, but typically in the range 5-40 bases, having a sequence that is a priori defined to represent a particular attribute of a peptide. The length of a metadata tag may differ for different attributes. The metadata tag may optionally comprise additional nucleic acid sequences that direct assembly of memory oligos in subsequent steps, such as unifying assembly sequences which facilitate recode block assembly irrespective of the order of assembly. In certain examples, a metadata tag may optionally comprise a restriction endonuclease sequence. The composition of a metadata tag may be DNA, RNA, LNA, PNA, XNA, TNA, BNA, NA with both backbone and base modifications, chemically protected nucleic acids, or a combination thereof. The term metadata tag can also be used to describe any entity that provides metadata information or functionality. For example, a metadata tag may be a peptide, a biotinylation moiety, a small molecule, a reactive small molecule, a click reagent, or other.
[0605] The term “n” refers to the length of the target macromolecular analyte, or the workflow cycle number. It also refers to the terminal subunit of the macromolecular analyte, e.g., the nth subunit. Accordingly, the next subunit is denoted as n-1, then n-2, and so on down the length of the peptide. These labels can be assigned starting from the N-terminal or the C-terminal end of a macromolecule.
[0606] The terms “n-1”, “n-2”, etc., refer to a cycle prior to the last cycle, and so on. They can also refer to a nearest and a next-nearest subunit molecule to the terminal subunit of a macromolecular analyte.
[0607] The term “polynucleic acid” or “polynucleotide” refers to a polymer of deoxyribonucleotides linked by 3'-5' phosphodiester bonds. This also includes polymers with nucleotide analogs and non-natural nucleotides such as Iso-G and Iso-C. This also includes nucleotides linked by thiophosphate bonds or peptidyl bonds such as in PNA. This also covers RNA and polymers with a modified ribose moiety or moieties, such as LNA, XNA, or BNA.
[0608] The terms “nucleic acid sequencing,” “NGS,” or “next generation sequencing” refer to high-throughput methods to determine the sequence of a nucleic acid polymer. These methods are exemplified by commercially available products from Illumina, Pacific Biosciences, and Oxford Nanopore.
[0609] The term “peptide” or “polypeptide” refers to a chain of two (2) or more amino acids, and no discrimination in terms of length is implied by the terms: peptide, polypeptide, or protein. Similarly, no discrimination or restriction is implied in terms of l-, d-, non-natural, or post-translationally modified amino acid monomers that comprise the peptide.
[0610] The term “PITC-conjugate” refers to a chemically-reactive conjugate that has not been reacted with an amino acid or a solid support. It is recognized that the qualifier “PITC” is representative terminology to describe any number of molecules (or sets of molecules) that can function similarly to bind to N-terminal or C-terminal amino acids and cleave the terminal subunit.
[0611] The terms “conjugate complex,” “PTC-conjugate,” and “PTC-AA-cycle tag-conjugate complex” refer to a chemically-reactive conjugate that has been reacted with an amino acid, but not necessarily been immobilized to a solid support. It is recognized that the qualifier “PTC” is representative terminology to describe any number of alternative molecules (or sets of molecules) that can function similarly to bind to N-terminal or C-terminal amino acids and cleave the terminal subunit. The terms “immobilized conjugate complex,” “immobilized PTC-conjugate,” and “immobilized PTC-AA-cycle tag-conjugate complex” refer to a chemically-reactive conjugate that has been reacted with an amino acid and has been immobilized to a solid support. It is recognized that the qualifier “PTC” is representative terminology to describe any number of alternative molecules (or sets of molecules) that can function similarly to bind to N-terminal or C-terminal amino acids and cleave the terminal subunit.
[0612] The term “post-translational modification” refers to any modification of an 1-, d-, or non-natural amino acid, either biologically or synthetically. The modifications can occur at the terminal amine, the terminal carboxyl, or any reactive moiety of a peptide. Examples include, but are not limited to, phosphorylation, glycosylation, glycanation, methylation, acetylation, ubiquitination, carboxylation, hydroxylation, biotinylation, pegylation, and succinylation. Further information regarding post- translational modifications may be found in, DOI: 10.1021/acs.biochem.7b00861. Biochemistry 2018, 57, 177-185, which is herein incorporated by reference in its entirety.
[0613] The term “recode block” (e.g., “recodeBlock”) refers to a construct created by interaction between a cycle tag of an immobilized conjugate complex and the recode tag of a cognate binding agent. Typically, a recode block is a chimeric nucleic acid molecule that contains the information relating the workflow cycle and the amino acid, or class of amino acid, composition that comprises the conjugate complex. Further, the recode block holds information to direct assembly of a memory oligo, and/or amplify the recode block. A recode block may be formed by utilizing an extension-ligation method to transfer information from the recode tag to the recode block, or via a ligation reaction under appropriate conditions in the presence of ligase and ligation oligo, or otherwise transferring information from the cycle tag to a separate entity. A recode block may be formed by utilizing an extension-ligation method to transfer information from the cycle tag to the recode block, or via a ligation reaction under appropriate conditions in the presence of ligase and ligation oligo, or otherwise transferring information from the cycle tag to a separate tag. A recode block may be formed by otherwise transferring information from the cycle tag and recode tag to a separate entity. A recode block may be formed by transferring information from the cycle tag into a new sequence of nucleic acids that is not the cycle tag or its complement, but otherwise indicates the cycle of the workflow. The format of a recode block is not necessarily a nucleic acid. It may also take the form of mass tags that could be used to assign identity for cycle and amino acids of the cognate conjugate complex, or other modalities that represent the information of the immobilized conjugate complex, and are amenable to grouping that information for analysis.
[0614] The term “recode tag” (e.g., “recodeTag”) refers to a nucleic acid molecule of any length, but typically in the range 15-60 bases, having a sequence comprised of an ith cycle tag complement, an AA tag complement, and an (i-1)th cycle tag complement. It provides identifying amino acid (or monomer subunit) information for its associated binding agent. It may uniquely identify one amino acid or may identify a class of amino acids with structural and/or functional similarity. A recode tag may provide a probabilistic estimate as to the identity of the amino acid component of an immobilized PTC-AA-cycle tag-conjugate complex, and thereby provide sufficient information for analysis. In certain embodiments, a recode tag may optionally comprise the ith cycle tag complement, an AA tag complement, and/or a universal assembly sequence or a complement of the universal assembly sequence that aids in the assembly of a memory oligo. In certain embodiments, a recode tag may optionally comprise a universal assembly sequence at both the 3’ and 5’ ends to facilitate memory oligo assembly without regard to the order of assembly of constituent recode blocks. In further embodiments, a recode tag may comprise a sequence facilitating amplification of recode blocks. In some embodiments, a recode tag may not comprise the ith cycle tag complement, but a site for attaching said cycle tag or its complement, or information identifying the cycle of the workflow that is not identical to said cycle tag or its complement.
[0615] The term “sample index” refers to an identifier incorporated during a post-recode preparation of a DNA library for NGS analysis, or an identifier that can be ligated as a component of a memory oligo during its assembly, and used during NGS analysis to identify the provenance of oligonucleotides in the DNA library.
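By way of non-limiting illustration, the three-part layout of a recode tag described above (an ith cycle tag complement, an AA tag complement, and an (i-1)th cycle tag complement) may be sketched in Python as follows. The sequences, the helper names, the 5’ to 3’ concatenation order, and the use of reverse complements are all assumptions made for this sketch and are not taken from the sequence listing.

```python
# Illustrative sketch only. All sequences are made-up placeholders and the
# concatenation order and orientation are assumptions, not a disclosed design.

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def build_recode_tag(cycle_tag_i: str, aa_tag: str, cycle_tag_prev: str) -> str:
    """Join the ith cycle tag complement, the AA tag complement, and the
    (i-1)th cycle tag complement into one hypothetical recode tag."""
    return (
        reverse_complement(cycle_tag_i)
        + reverse_complement(aa_tag)
        + reverse_complement(cycle_tag_prev)
    )


# Hypothetical 10-base tags for workflow cycles 2 and 1 and for one amino acid.
CYCLE_TAG_2 = "ACGTTGCAAC"
AA_TAG = "GGATCCTTAG"
CYCLE_TAG_1 = "TTGACCGTAA"

recode_tag = build_recode_tag(CYCLE_TAG_2, AA_TAG, CYCLE_TAG_1)
print(recode_tag, len(recode_tag))
```

With three 10-base placeholder tags the resulting tag is 30 bases long, which falls within the typical 15-60 base range stated above.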
[0616] The term “solid support” or “surface” refers to any solid material substrate in planar form, spherical form, or a combination of forms including, but not limited to: a solid bead, a porous bead, a solid planar material, a porous planar material, a patterned or non-patterned solid material, a nanoparticle, an inorganic or polymeric microsphere, or a capillary. For example, the solid support may comprise a glass slide or wafer, a silicon slide or wafer, a PC, PTC, polyethylene (PE), high density polyethylene (HDPE), or other plastic slide, a teflon, nylon, nitrocellulose membrane, or borosilicate capillary, a ceramic surface or a gold surface. Particles and beads may be formed from polystyrene, cross-linked polystyrene, agarose, or acrylamide. Beads or nanoparticles may be magnetic or paramagnetic to support separation or purification processes. Solid supports may be passivated with glass, silicon oxide, tantalum pentoxide, diamond-like carbon (DLC), or other passivation agents. A “solid support,” including membranes, may be passivated or activated via corona or other plasma treatment methods. Solid supports may further be assembled with other components to facilitate fluid transport and/or detection (e.g., a flowcell, a biochip, or a microtiter plate). Solid supports may comprise an associated hydrogel that supports joining components for macromolecule recoding and/or analysis workflows. In certain examples, the term “solid support” may include any of the described solid supports above further associated with a hydrogel.
[0617] The term “splint” refers to a nucleic acid with complementarity to the 5’ end of one nucleic acid and the 3’ end of another nucleic acid, such that hybridization of the splint to both nucleic acids brings the 5’ and 3’ ends into proximity to promote either chemical or biological ligation.
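As a non-limiting sketch of the splint concept defined above, the following Python function derives a splint that is complementary to the 3’ end of one strand and the 5’ end of another, so that hybridization places the two ends side by side. The function name, the arm length, and the example sequences are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: names, arm length, and sequences are hypothetical.

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def design_splint(upstream: str, downstream: str, arm: int = 8) -> str:
    """Return a splint (5'->3') that pairs with the last `arm` bases of
    `upstream` (its 3' end) and the first `arm` bases of `downstream`
    (its 5' end), bridging the junction to be ligated."""
    if arm > len(upstream) or arm > len(downstream):
        raise ValueError("arm is longer than one of the strands")
    junction = upstream[-arm:] + downstream[:arm]
    return reverse_complement(junction)


upstream = "ACGTACGTGGCCTTAA"    # strand donating its 3' end
downstream = "TTGGCCAAGGTTCCAA"  # strand donating its 5' end
splint = design_splint(upstream, downstream, arm=6)
print(splint)
```

In practice, arm length would be chosen with hybridization temperature and ligase requirements in mind; the fixed value here is only a placeholder.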
[0618] The term “strobe sequencing” refers to a method of sequencing (e.g., nucleic acids, peptides, and other polymers) wherein short gapped reads, or interspersed subreads, are generated from a contiguous fragment rather than a single uninterrupted read. Such subreads are referred to as “strobe” or “strobed” reads.
[0619] As used in the present disclosure, the term “unique molecular identifier” or “UMI” refers to a nucleic acid molecule of length 10 to 40 bases that can be assembled into, e.g., the memory oligo and provides unique identification for in silico deconvolution of NGS sequencing data as to a specific memory oligo.
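The in silico deconvolution mentioned above may be pictured, in simplified form, as grouping sequencing reads that share a UMI. The Python sketch below assumes, purely for illustration, that the UMI occupies the first bases of each read; the read layout, function names, and example reads are hypothetical.

```python
# Illustrative sketch only: the read layout (UMI at the start of each read)
# and all example reads are assumptions made for this example.
from collections import defaultdict


def umi_space(length: int) -> int:
    """Number of distinct DNA sequences of the given length."""
    return 4 ** length


def group_by_umi(reads, umi_len=10):
    """Group read payloads by the UMI found in the first `umi_len` bases."""
    groups = defaultdict(list)
    for read in reads:
        umi, payload = read[:umi_len], read[umi_len:]
        groups[umi].append(payload)
    return dict(groups)


reads = [
    "AAAAACCCCCGATTACA",
    "AAAAACCCCCGATTACA",  # same UMI: treated as the same memory oligo
    "GGGGGTTTTTCATCAT",
]
groups = group_by_umi(reads)
print(len(groups), umi_space(10))
```

Even at the lower bound of 10 bases, a UMI offers more than a million distinct sequences, which is the reason short random identifiers can distinguish individual memory oligos.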
[0620] The term “universal priming site” or “universal primer” refers to a nucleic acid molecule, which may be used for library amplification and/or during NGS. Exemplary universal priming sequences can include P5, P7, P5’, P7’, SBS Read 1, and SBS Read 2 primers.
[0621] The term “universal sequence” or “universal assembly sequence” or “universal amplification sequence” refers to a common complementary polynucleotide sequence that can be appended to a 3’ and/or 5’ end of a tag, e.g., a recode tag, for facilitating amplification thereof with common primers or assembly into an oligo, e.g., a memory oligo. In certain embodiments, a universal sequence comprises a repetitive sequence, e.g., a dinucleotide repetitive sequence such as (GT)n, or other relatively short nucleotide motif. The universal sequence may be silent during sequencing of the oligo to facilitate efficient detection and analysis of the assembled constituents of the oligo.
[0622] The term, “workflow cycle” or “cycle” refers to the iteration number of any one of the operations of a process flow or method described herein.
[0623] Some embodiments refer to a nucleic acid or nucleic acid sequence. In some embodiments, a nucleic acid is or includes RNA. In some embodiments, a nucleic acid is or includes DNA. Some sequences include uracil (U) or thymine (T), and in some embodiments, where applicable, a U may be replaced with a T or vice versa when considering DNA or RNA.
[0624] Some embodiments refer to a sequence. The sequence may be included in the accompanying sequence listing. Any discrepancies between the sequence listing and specification may usually be resolved by referring to the sequence as described in the specification.
[0625] References to oligonucleotides are employed, and are included or named as in Table 5.
Table 5
[0626] Note that as used herein and in the appended claims, the singular forms "a," "an," and "the" include plural referents unless the context clearly dictates otherwise. Thus, for example, reference to “an oligo” refers to one or more oligos, and so forth. Additionally, it is to be understood that terms such as "left," "right," "top," "bottom," "front," "rear," "side," "height," "length," "width," "upper," "lower," "interior," "exterior," "inner," "outer" that may be used herein merely describe points of reference and do not necessarily limit embodiments of the present disclosure to any particular orientation or configuration. Furthermore, terms such as "first," "second," "third," etc., merely identify one of a number of portions, components, steps, operations, functions, and/or points of reference as disclosed herein, and likewise do not necessarily limit embodiments of the present disclosure to any particular configuration or orientation.
[0627] Any discrepancy between the written description and a sequence listing submitted herein may typically be resolved in favor of the written description, unless an error is readily apparent.
[0628] Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs. All publications mentioned herein are incorporated by reference for the purpose of describing and disclosing devices, methods and cell populations that may be used in connection with the presently described disclosure.
[0629] Where a range of values is provided, it is understood that each intervening value, between the upper and lower limit of that range and any other stated or intervening value in that stated range is encompassed within the disclosure. The upper and lower limits of these smaller ranges may independently be included in the smaller ranges, and are also encompassed within the disclosure, subject to any specifically excluded limit in the stated range. Where the stated range includes one or both of the limits, ranges excluding either or both of those included limits are also included in the disclosure.
[0630] In this description, numerous specific details are set forth to provide a more thorough understanding of the present disclosure. However, it will be apparent to one of ordinary skill in the art that the present disclosure may be practiced without one or more of these specific details. In other instances, features and procedures well known to those skilled in the art have not been described in order to avoid obscuring the disclosure.
[0631] The functionalities described in connection with one embodiment are intended to be applicable to the additional embodiments described herein except where expressly stated or where the feature or function is incompatible with the additional embodiments. For example, where a given feature or function is expressly described in connection with one embodiment but not expressly mentioned in connection with an alternative embodiment, it should be understood that the feature or function may be deployed, utilized, or implemented in connection with the alternative embodiment unless the feature or function is incompatible with the alternative embodiment.
[0632] The practice of the techniques described herein may employ some conventional techniques and descriptions of organic chemistry, polymer technology, molecular biology (including some recombinant techniques), cell biology, proteomics, biochemistry, and sequencing technology, within the skill of those who practice in the art. Specific illustrations of suitable techniques can be had by reference to the examples herein. However, other equivalent procedures can, of course, also be used. Some such techniques and descriptions can be found in standard laboratory manuals such as Green et al., Eds. (1999), Genome Analysis: A Laboratory Manual Series (Vols. I-IV); Weiner, Gabriel, Stephens, Eds. (2007), Genetic Variation: A Laboratory Manual; Dieffenbach, Dveksler, Eds. (2003), PCR Primer: A Laboratory Manual; Mount (2004), Bioinformatics: Sequence and Genome Analysis; Sambrook and Russell (2006), Condensed Protocols from Molecular Cloning: A Laboratory Manual; and Sambrook and Russell (2002), Molecular Cloning: A Laboratory Manual (all from Cold Spring Harbor Laboratory Press); Stryer, L. (1995) Biochemistry (4th Ed.) W.H. Freeman, New York N.Y.; Gait, “Oligonucleotide Synthesis: A Practical Approach” 1984, IRL Press, London; Nelson and Cox (2000), Lehninger, Principles of Biochemistry 3rd Ed., W. H. Freeman Pub., New York, N.Y.; Berg et al. (2002) Biochemistry, 5th Ed., W.H. Freeman Pub., New York, N.Y.; all of which are herein incorporated in their entirety by reference for all purposes.
EXAMPLES
[0633] The following examples are put forth so as to provide those of ordinary skill in the art with a complete disclosure and description of how to make and use the present disclosure, and are not intended to limit the scope of what the inventors regard as their disclosure, nor are they intended to represent or imply that the experiments below are all of or the only experiments performed. It will be appreciated by persons skilled in the art that numerous variations and/or modifications may be made to the disclosure as shown in the specific aspects without departing from the spirit or scope of the disclosure as broadly described. The present aspects are, therefore, to be considered in all respects as illustrative and not restrictive.
Example 1: Characterization and Validation of Trifunctional Chemically Reactive Conjugate (TCRC) Function
[0634] This example describes the synthesis, testing of ITC function, binding, and characterization of a functional Propargyl-PITC-Oligo, which is a trifunctional chemically reactive conjugate. A chemically reactive conjugate (CRC) may include (x) a cycle tag (or a moiety for covalent attachment to a cycle tag, such as an aminooxy group in this example), (y) a reactive moiety (such as PITC in this example) for binding and cleaving the N-terminal amino acid residue of the peptide, exposing a next amino acid residue as an N-terminal amino acid residue on the cleaved peptide, and (z) an immobilizing moiety (such as propargyl in this example) for immobilization to a solid support. The ability to synthesize the trifunctional molecule, bind its reactive moiety to an N-terminal amino acid of an immobilized peptide, cleave the N-terminal amino acid, hybridize to the cycle tag, ligate the cycle tag, and bind the CRC to the solid support through the immobilizing moiety, was demonstrated using PPO, an example of a CRC compound as illustrated in FIG. 42B.
[0635] Thus, PPO is an example CRC that has been shown here to be functional. PPO (Propargyl-PITC-Oligo): 1-(1-deoxyribonucleotido-indol-3-yl)-N-(12-(4-(3-(4-isothiocyanatophenyl)-3,9-dihydro-8H-dibenzo[b,f][1,2,3]triazolo[4,5-d]azocin-8-yl)-4-oxobutanoyl)-3,6,9,15,18-pentaoxa-12-azahenicos-20-yn-1-yl)-3,6,9,12,15-pentaoxa-2-azaoctadec-1-en-18-amide.
[0636] Chemical names of intermediates that may be formed during synthesis, such as the synthesis shown in FIG. 42A-42B, may be as follows:
• PDA: N-(propargyl-PEG2)-DBCO-PEG3-Amine (Broadpharm cat# 29932)
• PDON: N-(12-(4-(11,12-didehydrodibenzo[b,f]azocin-5(6H)-yl)-4-oxobutanoyl)-3,6,9,15,18-pentaoxa-12-azahenicos-20-yn-1-yl)-2,5,8,11,14-pentaoxa-1-azaheptadecan-17-amide
• PDON-tBOC: PDON tert-butyloxycarbonyl
• PDO: 1-(1-deoxyribonucleotido-indol-3-yl)-N-(12-(4-(11,12-didehydrodibenzo[b,f]azocin-5(6H)-yl)-4-oxobutanoyl)-3,6,9,15,18-pentaoxa-12-azahenicos-20-yn-1-yl)-3,6,9,12,15-pentaoxa-2-azaoctadec-1-en-18-amide
[0637] As a preliminary test, in FIG. 35, a ~1 kDa model trifunctional molecule that included vanillin in place of an oligonucleotide was generated with: (1) phenyl isothiocyanate, (2) propargyl, and (3) model vanillin at the oligo position to simplify analytical characterization of the base structure: NNN-(Propargyl-PEG2) (6-oxo-6-(dibenzo[b,f]azacyclooct-4-yn-1-yl)-caproic) (PEG3-1-acetamido-4-isothiocyanato-benzene). The molecular structure was confirmed using LC ESI-MS, and its function was tested. The HPLC analysis shows formation of a product with high yield, indicating functional activity of the key reactive isothiocyanate moiety. A trifunctional molecule base was created with a modular design, so that each component and linkage may be exchanged for alternative structures, if needed. Its composition was designed for stability under cyclic Edman conditions while retaining downstream functionality:
• PEG is inert to acid and base degradation.
• The peptide linkages used to connect the modular components are similar to the internal peptide bonds of a protein that are largely unaffected during the Edman degradation process.
• 1,2,3-triazole is considered stable to anhydrous acid and basic conditions in ranges useful for the protein sequencing described herein.
• The same oligonucleotide protecting groups as used during phosphoramidite synthesis may be used. The exposure to trichloroacetic acid (TCA) during synthesis of long oligos exceeds anticipated chemical stress of protein sequencing steps herein.
Synthesis of PPO, a trifunctional CRC
[0638] Synthesis of PDON-tBOC: N-(Propargyl-PEG2)-DBCO-PEG3-Amine, TFA salt (PDA, Broadpharm cat# 29932, 4.56 mg, 0.0063 mmol) was dissolved in 200 µL of 100 mM pH 8.65 phosphate buffer and mixed with 15.8 µL of 400 mM carbonate buffer pH 9.6. t-Boc-Aminooxy-PEG4-NHS ester (Broadpharm cat# 24429, 10 mg, 0.021 mmol) was dissolved in 100 µL of DMSO. Solutions were combined and mixed by pipette, 200 µL of dimethylsulfoxide (DMSO) was added and the reaction was incubated at room temperature (RT) for 18 hours. The product was purified using high-performance liquid chromatography (HPLC). An electrospray ionization-mass spectrometry (ESI-MS) peak at m/z = 969 (positive mode) [M+H]+ indicated successful synthesis.
[0639] Synthesis of PDON: PDON-tBOC was evaporated at reduced pressure at 45°C for 3 hrs then redissolved in dichloromethane (100 µL). Trifluoroacetic acid was added (30 µL), and the mixture was incubated at room temperature (RT) for 1.5 hrs, neutralized by adding 180 µL of 4.1 M imidazole in acetonitrile/methanol (2:3 v:v) and purified using HPLC. The successful synthesis of the intermediate, PDON, was confirmed through ESI-MS analysis. Specifically, the observation of a peak at m/z = 869 [M+H]+ indicated the successful synthesis of PDON.
[0640] Synthesis of PDO: PDON was partially evaporated at reduced pressure at 45°C for 3 hrs. The concentration was quantified using optical density measurement at 310 nm (OD = 45, 3.4 mM). Sys3 SOC Oligonucleotide (/5Phos/TGAAGGG/iFormInd/TGACCTAGCAATGGTGAAGTTAATGCAGGTAGTTAAG (SEQ ID NO: 108), Integrated DNA Technologies, 178.8 nmol, where iFormInd denotes a formylindole modification for subsequent tethering) was resuspended in 100 µL 1x SSPE buffer (Sigma), and 10 µL of the oligo solution was added to 10 µL 11.3 mM 5-aminoindole, and 20 µL 390 mM pH 5.5 acetate buffer. To this mixture, 15 µL PDON solution was added, the solution was mixed by pipette, and the reaction was incubated at RT for 18 hr. Following the reaction, the product was purified using high-performance liquid chromatography (HPLC), and then dried at reduced pressure at a temperature of 45°C for 4 hrs. Electrospray ionization time-of-flight mass spectrometry (ESI-TOF-MS) analysis was conducted on the product, which produced a peak at m/z = 14920 [M+H]+, indicating successful synthesis of the compound PDO. In a control experiment, the mass spectrum of the Sys3 SOC oligonucleotide alone was found to have a peak at m/z = 14069 [M+H]+.
[0641] Synthesis of PPO: PDO was resuspended in 28 µL milli-Q water (OD260 = 20, 43 µM). 4-azidophenylisothiocyanate (N3PITC, 1.30 mg, 0.0074 mmol) was dissolved in 1 mL DMSO to form a 7.4 mM solution. 90 µL of the N3PITC solution was added to the PDO solution and pipette mixed. The reaction was incubated at RT for 3 hr. After this incubation period, the product was purified using high-performance liquid chromatography (HPLC), which resulted in two prominent peaks at 14 and 18 minutes. These product peaks were further analyzed using quadrupole time-of-flight mass spectrometry (QTOF MS). This analysis yielded a peak at m/z = 15094 [M]+ for the product corresponding to the 14-minute mark in the HPLC analysis. For the product corresponding to the 18-minute mark, a peak at m/z = 15095 [M+H]+ was observed, indicating successful completion of the synthesis step. The two peaks may be assumed to be isomers (e.g., regioisomers of the DBCO-azide adduct) due to the same mass and functional testing performance.
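The mass spectrometry values reported for the synthesis steps above can be cross-checked with simple arithmetic. The Python sketch below is illustrative only: the observed m/z values are those reported above, while the average masses used for water, the proton, the Boc group, and N3PITC, as well as the assumption that oxime formation releases one water and that the azide-alkyne cycloaddition adds the full azide mass, are supplied for this sketch and are not stated in the text.

```python
# Illustrative consistency check. Observed m/z values come from the text;
# the reference masses below are assumptions supplied for this sketch.

WATER = 18.015    # average mass of H2O, assumed lost on oxime formation
PROTON = 1.008    # approximate mass difference between M and [M+H]+
BOC = 100.12      # average mass change on removing a Boc group (C5H8O2)
N3PITC = 176.2    # 4-azidophenylisothiocyanate; 1.30 mg / 0.0074 mmol ~ 176

# Observed [M+H]+ values reported in the synthesis description.
PDON_TBOC_MH = 969.0
PDON_MH = 869.0
OLIGO_MH = 14069.0
PDO_MH = 14920.0
PPO_MH = 15095.0

# Boc deprotection: PDON-tBOC -> PDON.
boc_shift = PDON_TBOC_MH - PDON_MH

# Oxime ligation: oligo aldehyde + neutral PDON - H2O -> PDO.
pdo_expected = OLIGO_MH + (PDON_MH - PROTON) - WATER

# Cycloaddition: PDO + N3PITC -> PPO (no mass is lost).
ppo_expected = PDO_MH + N3PITC

print(f"Boc shift: {boc_shift:.1f} (reference {BOC})")
print(f"PDO expected {pdo_expected:.1f}, observed {PDO_MH:.1f}")
print(f"PPO expected {ppo_expected:.1f}, observed {PPO_MH:.1f}")
```

Under these assumptions the expected values are approximately 14919 for PDO and 15096 for PPO, each within about 1 to 2 mass units of the reported peaks.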
Testing of ITC function
[0642] The functionality of the isothiocyanate (ITC) group was examined through solution phase testing of PPO. HPLC-purified fractions of PPO, which were suspended in a solution of 35 mM TEAA with 5% acetonitrile, were used. To each 100 µL aliquot of these purified fractions, 10 µL of 400 mM carbonate buffer (pH 9.6) was added, then 1 µL of a 10 mM solution of FAM-PEG3-NH2 (Broadpharm cat# BP-20958) in DMSO. The reaction mixture was thoroughly mixed using a pipette and subsequently incubated at RT for a period of 1 hr. Following this incubation period, the reaction samples were analyzed using high-performance liquid chromatography (HPLC). FAM-PEG3-NH2, a fluorescent dye, was separately analyzed under the same buffer conditions as the reaction mixture for use as a control. Notably, the HPLC analysis of the reaction samples indicated that the retention times had shifted towards a shorter time from the original retention time. Furthermore, absorbance at 488 nm, corresponding to the FAM fluorophore, was observed in the HPLC chromatogram. These observations were indicative of a successful conjugation of FAM-PEG3-NH2 to PPO, thereby validating the functionality of the ITC group. The ITC group is an example of a reactive moiety for binding an N-terminal amino acid residue or a peptide.
Binding PPO through the ITC group to a surface and to an oligo tag
[0643] Testing was conducted on the HPLC-purified fractions of PPO, suspended in a solution of 35 mM TEAA and 5% acetonitrile. PPO was combined with phosphate buffer pH 7.2, 50 mM tris(3-hydroxypropyltriazolylmethyl)amine (THPTA), 10 mM CuSO4, 100 mM sodium ascorbate, 1% 10 µm azide-functional silica beads (Nanocs cat# Sil0u-AZ-l). The mixture was mixed by pipette and incubated at RT for an hour. Subsequently, 2 µL of a 10 mM solution of FAM-PEG3-NH2 (Broadpharm cat#20958) in DMSO and 20 µL of a 400 mM carbonate buffer solution (pH 9.6) were added, and the reaction was continued for one hour. The beads were washed 5X with D.I. H2O using a centrifugation method. The beads were analyzed using a fluorescence plate reader (484 nm excitation, 530 nm emission).
[0644] Control reactions were performed in parallel, one without the addition of the copper catalyst to the PPO/azide beads, and the other without PPO. The results, shown in FIG. 43, demonstrated fluorescence intensity above background in the beads that had undergone the reaction with PPO in the presence of copper compared to the controls. This indicated the successful functionality of both the solid support binding group and the ITC group, demonstrating the ability to bind and retain the N-terminal model FAM-PEG3-NH2 and conjugate to solid support.
[0645] A complementary oligonucleotide tagged with a fluorophore (5’TET/TAACTTCACCATTGC (SEQ ID NO: 124), where TET is tetrachlorofluorescein) was hybridized to the PPO-functionalized beads. This procedure was conducted at room temperature for 5 minutes, using a 1 µM concentration in a 2xPBST buffer. Following the hybridization, the beads underwent a washing process involving 5 rounds of rinse with 1 mL of 2xPBST buffer. The washed beads were subsequently analyzed on a fluorescent plate reader (515 nm excitation and 545 nm emission).
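The complementarity between the TET-labeled probe (SEQ ID NO: 124) and the Sys3 SOC oligonucleotide (SEQ ID NO: 108) can be confirmed in silico. The Python sketch below is a minimal illustration; the helper name is hypothetical, and the formylindole position and terminal modifications are simply omitted from the string representation, which is a simplification made for this sketch.

```python
# Illustrative check only. Modifications (/5Phos/, /iFormInd/, 5'TET) are
# dropped from the strings; that simplification is an assumption of this sketch.

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


# Sys3 SOC bases flanking the formylindole site, joined without the modification.
SYS3_SOC = "TGAAGGG" + "TGACCTAGCAATGGTGAAGTTAATGCAGGTAGTTAAG"
TET_PROBE = "TAACTTCACCATTGC"

binding_site = reverse_complement(TET_PROBE)
position = SYS3_SOC.find(binding_site)

print(binding_site, position)
```

The probe's reverse complement is found as a contiguous 15-base stretch within the cycle tag, downstream of the formylindole site, consistent with its use as a hybridization reporter.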
[0646] The results, shown in FIG. 44, demonstrated that the beads with the immobilized PPO exhibited a higher fluorescence intensity than the background control beads. This finding confirms the functionality of the cycle tag oligonucleotide on the solid-support-bound CRC, thus validating the trifunctional nature of the CRC.
[0647] In another embodiment of a solid support, a borosilicate glass slide underwent an organic solvent and acid bath cleaning procedure. The slide was rinsed copiously with water and dried at 100 degrees Celsius for 10 minutes. The slide was then silanized with a 2.5% by weight solution of 3-aminopropyltriethoxysilane in ethanol at room temperature for one hour. Subsequent rinse with ethanol and drying at 100 degrees Celsius for an hour completed the slide surface preparation. Selected positions of the slide were treated with 10 µL fractions of PPO mixed with 1 µL of 400 mM pH 9.6 carbonate buffer and incubated at room temperature for an hour. The positions were subjected to several water rinses, and each position received 20 µL of a mixture comprising 2 µL 10 mM FAM-PEG4-N3 (Broadpharm cat# BP-23405) in DMSO, 10 µL 10 mM CuSO4 in water, 10 µL 50 mM THPTA in water, 20 µL 200 mM phosphate buffer pH 7.2, and 20 µL 100 mM sodium ascorbate in water. Control wells were prepared using the same solution but excluding CuSO4. The reaction was allowed to proceed for one hour, after which the positions were rinsed copiously with water. Fluorescence analysis was performed using a plate reader (484 nm excitation, 530 nm emission). The results, shown in FIG. 45, indicated that wells incorporating copper produced a more intense fluorescence signal compared to the background control wells. This confirms the capability to use the CRC on multiple embodiments of solid support.
Cleavage of N-terminal amino acid and exposure of next amino acid residue
[0648] The functional ability of the reactive moiety of the CRC to bind and cleave the N-terminal amino acid was tested, showing cleavage and thereby separation of the N-terminal amino acid residue from the peptide, exposing the next amino acid residue as an N-terminal amino acid residue on the cleaved peptide.
[0649] PPO Sys1 SOC: PPO was synthesized using Sys1 SOC oligonucleotide (/5Phos/ATGAGTG/iFormInd/AGGGAAATAGCTTCTGGTCGAACTAGTTGTTCGTCAA) (SEQ ID NO: 75) in a similar manner to that described for the Sys3 SOC oligonucleotide.
[0650] Azide functional beads: 2 mL of amine-functionalized silica beads (CD Bioparticles cat# DNG-F046, 20 um dia, 5 wt% solids, 4 umol amine/g) were subjected to centrifugation at 21,000 rcf for 1 min, then resuspended in a 0.5 mL solution of pH 9.6 400 mM carbonate buffer. A separate solution was prepared by dissolving 28 mg of azidoacetic acid NHS ester (141 umol, Broadpharm BP-22467) in 0.2 mL DMSO. The two solutions were then combined, and an additional 0.5 mL DMSO was introduced to solubilize any precipitate that had formed. The resulting mixture was incubated in an Eppendorf tube on a rotator for 2.5 hr at ambient temperature. The beads were subsequently washed by adding 1 mL volumes of the following solutions in sequence: water, acetonitrile, water, DMSO, water. After each addition, the solution was resuspended by shaking, then centrifuged (21k rcf, 1 min), and the supernatant was removed. The beads were finally resuspended in 1.25 mL of water, creating an 8 wt% slurry.
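The quantities in the bead functionalization above can be checked with a short calculation. The Python sketch below is illustrative only: it assumes a slurry density of about 1 g/mL and a molecular weight of about 198.14 g/mol for azidoacetic acid NHS ester (C6H6N4O4), neither of which is stated in the text.

```python
# Illustrative arithmetic only. Slurry density and the ester molecular weight
# are assumptions of this sketch; the other values are taken from the text.

SLURRY_ML = 2.0            # starting bead slurry volume
SOLIDS_FRACTION = 0.05     # 5 wt% solids
AMINE_UMOL_PER_G = 4.0     # amine loading of the beads
ESTER_MG = 28.0            # azidoacetic acid NHS ester used
FINAL_VOLUME_ML = 1.25     # final resuspension volume

DENSITY_G_PER_ML = 1.0     # assumed slurry density
ESTER_MW = 198.14          # assumed g/mol for C6H6N4O4

bead_mass_g = SLURRY_ML * DENSITY_G_PER_ML * SOLIDS_FRACTION
amine_umol = bead_mass_g * AMINE_UMOL_PER_G
ester_umol = ESTER_MG / ESTER_MW * 1000.0
molar_excess = ester_umol / amine_umol
final_wt_percent = bead_mass_g / (FINAL_VOLUME_ML * DENSITY_G_PER_ML) * 100.0

print(f"bead mass: {bead_mass_g:.2f} g, surface amine: {amine_umol:.2f} umol")
print(f"NHS ester: {ester_umol:.0f} umol, excess over amine: {molar_excess:.0f}x")
print(f"final slurry: {final_wt_percent:.1f} wt%")
```

Under these assumptions, 28 mg of the ester corresponds to roughly 141 micromoles, a several-hundred-fold excess over the approximately 0.4 micromoles of surface amine, and resuspending about 0.1 g of beads in 1.25 mL gives the stated 8 wt% slurry.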
[0651] Peptide Functional Beads: The peptide (0.5 mg, 860 g/mol, sequence from N-terminus to C-terminus: {pTyr}{Ser}{Ser}{pTyr}{Ser}-propargyl) (SEQ ID NO: 219) was dissolved in 0.5 mL water to create a 1.16 mM solution. Three peptide immobilization reactions were initiated by combining the reactants in Table 6 (volumes in uL). Reaction A was conducted at 50°C for 1 hr on a rotator, while Reactions B and C were left to incubate at ambient temperature on a 600 rpm shaker for 1 hr.
[0652] The beads were subsequently washed by adding 1 mL volumes of various solutions in the following order: 100 mM pH 9.6 carbonate buffer, DMSO, water, 100 mM pH 9.6 carbonate buffer, water, DMSO. After each addition, the solution was resuspended through shaking, centrifuged (21,000 rcf, 1 min), and the supernatant was removed. The DMSO solution was incubated with the beads at 57°C for 4 min. This was followed by washing with acetonitrile, water, 100 mM pH 9.6 carbonate buffer, and water. The carbonate buffer was incubated with the beads for 10 min at ambient temperature.
[0653] The beads were analyzed using a fluorescent plate reader (545 nm excitation, 586 nm emission).
Table 6
[0654] Immobilization of PPO-System 1 on beads: To each bead aliquot, 100 uL of PPO-Sys1 SOC (OD260 = 2, ~4 uM) and 20 uL of 400 mM carbonate buffer pH 9.6 were added. The resulting mixture was incubated at ambient temperature for 30 min on a rotator. Afterward, the beads were centrifuged, and the supernatant was removed. A second aliquot of 100 uL PPO-Sys1 SOC, 40 uL of 133 mM carbonate buffer pH 9.2, and 120 uL of 1 M NaCl were then added. The reaction was again incubated for 30 min on a rotator at ambient temperature.
[0655] The beads were washed via centrifugation with 1 mL of water and 1 mL of 2x phosphate-buffered saline with 0.2% Tween 20 (2xPBST). A fluorescent complementary oligo to Sys1 SOC (/5Alex546N/TTCGACCAGAAGCTA, SEQ ID NO: 218) was dissolved in 2x PBST buffer to a concentration of 1 uM, and 0.3 mL of this solution was incubated with the beads for 5 min at ambient temperature.
[0656] The beads were subsequently washed thoroughly with 2xPBST. Both the washed beads and the supernatant were analyzed on a fluorescent plate reader (545 nm excitation, 586 nm emission). The beads were dehybridized using NaOH. The beads were washed with water and read on the plate reader, along with the supernatant from the dehybridization. The Cu-catalyzed Huisgen reaction was performed to immobilize PPO on the bead surface for reactions B and C. The incubation was performed for 20 min on a rotator at 37°C.
[0657] Edman Degradation: The beads were exchanged into anhydrous acetonitrile (Sigma Aldrich 99.8%, catalog number 271004), and brought to 50% (v/v) trifluoroacetic acid (TFA). The resulting mixture was incubated at 46°C for 25 min. The reactions were subsequently neutralized with 4.1 M imidazole in a 2:3 (v:v) acetonitrile:methanol solution, and exchanged into 133 mM pH 9.2 carbonate buffer.
[0658] PPO-Sys3 SOC Immobilization: The beads were added to a solution comprising 100 uL of PPO-system 3 (18 min retention time peak, ~0.5 OD, -1 uM), 80 uL of 133 mM pH 9.2 carbonate buffer, and 120 uL of IM NaCl. The reactions were incubated on a rotator at 37C for 30 min. Subsequently, the beads were exchanged into 2x PBST, and analyzed on the fluorescence plate reader. [0659] The beads were hybridized with a solution of a fluorescent complementary oligo to Sys3 SOC (5TET/TAACTTCACCATTGC) (SEQ ID NO: 124) at 2 uM in 2x PBST for 5 min at ambient temperature. The beads were subsequently washed five times with 2x PBST. Both the supernatant and beads were analyzed on a fluorescent plate reader (500 nm excitation, 550 nm emission). Supernatant was removed NaOH was added to dehybridize the beads. The dehybridization solution was analyzed, and the beads were washed copiously with water, resuspended in 2x PBST and analyzed on the fluorescent plate reader. [0660] As demonstrated in FIG. 29, the beads exhibited an increase in fluorescence during the hybridization reactions with the fluorescent complementary oligo to Sysl SOC. Significant fluorescence was detected in the dehybridization solutions, and the beads subsequently lost most of their fluorescence following the dehybridization treatment. After undergoing Edman degradation, and with the PPO Sys3-SOC immobilization, the hybridization with the fluorescent Sys3 SOC complementary oligo resulted in a fluorescence level akin to that observed during the Sysl SOC hybridization. Upon dehybridization, the dehybridization solutions again displayed significant fluorescence, and the beads, in turn, lost most of their fluorescence.
[0661] These results indicate that a chemically-reactive conjugate can be synthesized, and that contacting an immobilized peptide with the chemically-reactive conjugate, thereby coupling the chemically-reactive conjugate to the N-terminal amino acid of the peptide to form a conjugate complex, is viable. These results further indicate that the chemically-reactive conjugate can immobilize the conjugate complex to the solid support via the immobilizing moiety to provide an immobilized amino acid complex. These results further indicate that a chemically-reactive conjugate can cleave and thereby separate the N-terminal amino acid residue from the peptide, thereby exposing the next amino acid residue as an N-terminal amino acid residue on the cleaved peptide.
Example 2: Assembly of a Recode Block
[0662] This example describes an experiment that achieved successful ligation of model recode block oligos using T4 DNA Ligase. In this example, ligation under standard conditions is demonstrated at the 5’ and 3’ ends of a model cycle tag having a formylindole modification of a nucleobase internal to the 5’ and 3’ ends of the oligonucleotide. Formylindole nucleobase modification of a cycle tag oligonucleotide may facilitate synthesis of a CRC having an oligonucleotide moiety. For example, aminoxy-PEG1-azide (ONH2-PEG-N3, BroadPharm cat#23596) may be conjugated to a cycle tag oligonucleotide that has a formylindole modification. The aminoxy group of the aminoxy-PEG1-azide reacts with the aldehyde group on the formylindole nucleobase to form an oxime bond. The azide group can be used to generate further linkages, if desired.
[0663] Accordingly, 100 mM aminoxy-PEG1-azide was mixed with a 5 mM solution of 5-aminoindole catalyst at pH 6. An oligo solution of Sys1-SOC oligonucleotide (SEQ ID NO: 75) was prepared at 100 µM. These reaction components were mixed and incubated at 40°C for 24 hrs. An aliquot of the product was reacted with alkyne-FAM under standard Huisgen reaction conditions to confirm that the reaction product was formed. HPLC confirmed the product by a shift in the peak of the oligos and association of 488 nm absorption with the oligonucleotide elution peak. In addition to the above samples, a series of controls were prepared, including reactions where the CuSO4 was omitted from the cycloaddition reaction. The product was purified using HPLC, recovered in 35 mM TEAA:acetonitrile, dried, and resuspended in SSPE. The concentration of the purified ssDNA was quantified using the Qubit assay (Thermofisher) to determine the appropriate DNA concentration for the ligation reaction. Ligation oligos for the 3’ end (SEQ ID NO: 85) and 5’ end (SEQ ID NO: 84), splint oligos (SEQ ID NO: 78 and SEQ ID NO: 79), Sys#001 SOC oligo (SEQ ID NO: 75) (with and without aminooxy-PEG-azide conjugated), T4 DNA Ligase (M0202L, NEB), T4 DNA Ligase buffer 10X (B0202S, NEB), MilliQ water, and a comparator oligo Sys#001 COM-105 (SEQ ID NO: 88) were mixed to create various ligation conditions according to the method provided by New England Biolabs (NEB). The process was initiated in a microcentrifuge tube, which was maintained at 4C. Oligonucleotides were utilized at ~0.2 µM. Following the assembly of the mixture, all the ingredients, excluding the ligase, were vortexed to ensure the homogeneity of the mixture, and subsequently centrifuged. T4 DNA Ligase was added, and the components were mixed by gentle pipetting and left at RT for 30 min, followed by heat-inactivation of the ligation mixture at 65C for 10 min. 
Ligation products were analyzed on a 4% agarose gel (E-Gel EX 4%) according to the E-Gel Power Snap Electrophoresis System User Guide. A DNA ladder (cat# 10597012 from Invitrogen) was prepared, following the indicated procedures, and denatured in 0.1M NaOH before loading on the gel. Gel electrophoresis (FIG. 36) showed the successful creation of the desired product and successful ligation in the presence of modified bases internal to the 5’ and 3’ ends of the SOC oligonucleotide. Lane 2 contains the products from the ligation of the 45-mer oligo with tether arm to a 30-mer ligation oligo on both the 3' (SEQ ID NO: 85) and 5' (SEQ ID NO: 84) ends. In addition, PCR was conducted on the ligation output (FIG. 47), showing amplification of ligated oligos both with and without internally modified bases. These results support the methods described herein for generating recode blocks and memory oligos. They indicate that, within each formed affinity complex, a cycle tag or a reverse complement thereof can be joined to a recode tag to form a recode block, thereby creating a plurality of recode blocks, each recode block corresponding with a formed affinity complex, and that two or more members of the plurality of recode blocks can be joined to form a memory oligonucleotide.
Example 3: Validate Affinity Binding Capability and Binder Fidelity
[0664] This example describes validation of the affinity binding capacity, binding kinetics, and binder affinity to contact the immobilized amino acid complexes with a binding moiety. In some approaches, binder fidelity plays a role in the sequencing accuracy. An in-silico simulation was conducted to assess the impact of binder fidelity on the accuracy of protein identification. A probability matrix was computed for a set of analyte-ligand complexes using empirically determined binding constants of N-terminal amino acid binding proteins (NAABs from Rodriques et al.; see FIG. 46A-46B).
[0665] These dissociation constants (Kd) were converted into association constants to represent the affinity between each amino acid pair. Then the partition function was computed for each analyte using principles of thermodynamics. This process accounted for all possible states of each analyte, whether it was unbound or bound to any ligand.
[0666] By applying the law of mass action, the steady-state concentrations of each analyte-ligand complex were computed. This computation ensured the total conservation of each analyte's concentration across its potential states, thus allowing determination of the occupancy rate of bound pairs in a competitive binding system. The calculated bound occupancy rate then served as input for further simulations.
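The occupancy computation described above can be sketched as follows. This is a minimal illustration, not the simulation actually used: it assumes each ligand is present in excess, so that its free concentration equals its total concentration, and the function name `occupancy_matrix` and its inputs are hypothetical.

```python
def occupancy_matrix(kd, ligand_conc):
    """Competitive-binding occupancy for each analyte (illustrative sketch).

    kd[i][j]       dissociation constant (M) of analyte i with ligand j
    ligand_conc[j] free concentration (M) of ligand j (ASSUMED in excess)

    Returns (bound, unbound): bound[i][j] is the fraction of analyte i
    occupied by ligand j; unbound[i] is the fraction left free.
    """
    bound, unbound = [], []
    for row in kd:
        # Ka * [L] = [L] / Kd is the statistical weight of each bound state.
        weights = [conc / k for k, conc in zip(row, ligand_conc)]
        # Partition function: the unbound state carries weight 1.
        z = 1.0 + sum(weights)
        bound.append([w / z for w in weights])
        unbound.append(1.0 / z)
    return bound, unbound
```

Because every state of an analyte is divided by the same partition function, the bound and unbound fractions of each analyte sum to one, which is the conservation condition noted in paragraph [0666].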
[0667] A cohort of proteins randomly selected from the UniProt database was mutated according to the steady-state probabilities in the matrix to simulate a 'measured cohort' using the NAABs. The 'measured cohort' was mapped using an in-house custom alignment algorithm to evaluate the impact of binder infidelity. A custom alignment algorithm was also developed to assess the alignability of the mutated proteins with the reference proteins. This algorithm utilized the Levenshtein distance between pairs of mutated peptide strings and reference proteins. The distance between letters as a function of the inverse of the probability matrix elements was also accounted for. This approach can ensure that likely mutations are perceived as closer to the reference string than unlikely mutations.
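A weighted edit distance of the kind described in paragraph [0667] can be sketched as follows. The source does not specify the exact function of the inverse probability, so the cost `min(1, floor / p)` used here is an assumption, as are the function names, the `floor` parameter, and the nested-dictionary layout of the probability matrix.

```python
def substitution_cost(prob, ref_aa, obs_aa, floor=0.01):
    """Cost of reading obs_aa where the reference has ref_aa.

    ASSUMPTION: cost is inversely proportional to the misread probability
    prob[ref_aa][obs_aa], capped at 1 so it never exceeds an indel.
    """
    if ref_aa == obs_aa:
        return 0.0
    p = prob.get(ref_aa, {}).get(obs_aa, 0.0)
    if p <= 0.0:
        return 1.0
    return min(1.0, floor / p)


def weighted_levenshtein(ref, obs, prob, indel_cost=1.0):
    """Levenshtein distance with probability-weighted substitutions."""
    prev = [j * indel_cost for j in range(len(obs) + 1)]
    for i, r in enumerate(ref, start=1):
        curr = [i * indel_cost]
        for j, o in enumerate(obs, start=1):
            curr.append(min(
                prev[j] + indel_cost,                         # delete r
                curr[j - 1] + indel_cost,                     # insert o
                prev[j - 1] + substitution_cost(prob, r, o),  # substitute
            ))
        prev = curr
    return prev[-1]
```

With this weighting, a substitution that the binders frequently produce contributes less distance than one they rarely produce, so likely misreads stay close to the reference string.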
[0668] The study resulted in compelling evidence of the approach's effectiveness, showing nearly perfect outcomes even when sequencing only the first 12 amino acids from the N-terminus (as shown in FIG. 46A-46B). These findings substantiate the potential of the methodologies described herein for accurate alignment to the proteome despite binder variability using existing binders. These simulations demonstrate the potential to accurately identify and quantify proteins against the reference proteome even with the relative affinities of N-terminal amino acid binders currently developed. N-terminal amino acid binders represent a more difficult case than binders of isolated amino acids, as the local environment varies due to different nearest neighbor amino acids. The results therefore show a clear ability to develop binders for the method described herein.
[0669] Experimental validations were carried out to measure binding kinetics using a high-throughput digital benchtop surface plasmon resonance (SPR) system (Nicoya Alto). The measurement includes loading samples and reagents into a 16-Channel Carboxyl disposable digital fluidics cartridge (part # KC-CBX-PEG-16) that contains optical sensors, thermal zones, a bottom plate consisting of electrodes, and a top plate with wells to load reagents. The reagents include cartridge fluid, capture kits (consisting of reagents such as low and high refractive index normalization fluids (4% and 32% glycerol), EDC, NHS, 10 mM HCl, 1 M ethanolamine, 10 mM sodium acetate, and 10 mM MES), and a Streptavidin Reagent Kit (part # ALTO-R-STV-KIT). The experiment included adjusting ligand concentration, salt concentrations, and analyte concentrations to provide optimal density for analyte binding on the 48 analyte wells of the 16-Channel Carboxyl disposable cartridge.
[0670] For the samples, an off-the-shelf anti-phosphotyrosine antibody (Sigma, 05-321) was used, and its binding to a custom synthesized and immobilized PTH-phosphotyrosine conjugate was observed. The experiment demonstrated strong binding (KD = 9.6 nM), with no detectable binding observed for a series of non-cognate conjugates. This indicated suitable discrimination between phosphotyrosine and other amino acids, including post-translationally modified amino acids, using a commercially available anti-phosphotyrosine antibody. The empirical fidelity observed here surpasses that assumed in the in-silico simulation, which was itself sufficient for high fidelity identification of peptides, providing strong evidence of the effectiveness of the methods described herein.
[0671] Finally, resilience to variability in Edman degradation efficiency was assessed. Simulations showed that even with degradation efficiencies as low as 70% average for all cleavage cycles, there were no significant issues in alignment to reference proteins. This is because the ordered cleavage still results in unique, alignable “fingerprints.” This resilience to variation in Edman degradation efficiency, which can significantly depend on the identity of the N-terminal amino acid, further underscores the robustness and versatility of the methods described herein.
[0672] This shows the ability to contact the immobilized amino acid complexes with a binding moiety for preferentially binding to one or to a subset of the immobilized amino acid complexes using existing binders in the art.
Example 4: Recoding
[0673] This example describes the multi-step construction of chemically-reactive conjugates and the binding agents, and provides the desired outcomes of the chronological performance of some embodiments of the recoding processes. Biologically or synthetically derived samples may be manipulated prior to the recoding process. These manipulations may include lysis, purification, enrichment, protein fragmentation, etc. Serine proteases (or serine endopeptidases) include a broad class of enzymes that cleave peptide bonds in proteins. The trypsin-like proteases cleave peptide bonds following a positively charged amino acid (lysine or arginine), while chymotrypsin-like serine proteases have specificity for hydrophobic residues, such as tyrosine, phenylalanine, and tryptophan. Digestions using these reagents include time titration, and controlled protease and protein concentrations to generate peptides in the range of 20 to 200 amino acids. ThermoFisher, Sigma, and others offer a comprehensive and broad range of products to accommodate a variety of sample preparation strategies. Pre-formulated reagents and robust methods for the preparation of high-quality samples that are ready for MS analysis in less than 3 hours are available. See, e.g., Sample Preparation for Mass Spectrometry ThermoFisher Scientific, 2022. These procedures include methods for protein extractions from lysates, abundant protein depletion, protein digestion, peptide clean-up, and are amenable to recode sample preparation. Timing of procedural steps may be modified to achieve peptide lengths within a desired range. Peptide length distributions may be measured using polyacrylamide gel electrophoresis.
[0674] Solid supports for immobilization of peptides, conjugates, and nucleic acid primers may be formed by spin coating 500 uL of hydrogel polymer using a Sigma Chemat precision spin-coater at 500 rpm for 1 minute onto a Corning glass slide. Hydrogel polymer can be obtained by co-polymerization of acrylamide with modified acrylate-based monomers having sidechains that include hydrazine, sidechains that include amine, and sidechains that include azide. Briefly, a RAFT polymerization of acrylamide and acrylate may follow procedures as described by Palmiero et al., The RAFT copolymerization of acrylic acid and acrylamide, Polymer (2016), 98, 156-164. The coated substrate is then assembled into a flowcell by sandwiching a SA-S-4L Grace Bio-Labs double-sided adhesive gasket between the coated Corning slide and a cover slide to create a ~500 um channel that facilitates fluid administration.
[0675] Peptides are anchored to the hydrogel via an end-terminal or internal carboxyl group using carbodiimide-mediated conjugation. This is the most frequently used technique, since EDC (N-(3-Dimethylaminopropyl)-N’-ethylcarbodiimide) is readily obtained commercially, and protocols are well known (Hermanson, 1996, Bioconjugate Techniques, Academic Press Inc.). Primers are anchored to the hydrogel via an aldehyde modification at the 5’ end of the primer oligonucleotides, e.g., P5 and P7, possibly containing sample indexes and/or UMIs. The reaction is completed in phosphate-buffered saline (137 mM Na+, 2.7 mM K+, 12 mM phosphate, pH 7.4) at 25 °C for 2 hours.
[0676] In one approach, chemically-reactive conjugates may be constructed in multiple steps (e.g., as shown in FIG. 29). Briefly, an aliphatic hydrazine is derivatized to a carbon of the phenyl ring of phenylisothiocyanate. A 3mer reagent with trifunctional orthogonally reactive groups is synthesized using well known phosphoramidite chemical protocols to connect a 1-Ethynyl-dSpacer CE Phosphoramidite (Glen Research, Cat#10-1910) with a 5-Formylindole-CE Phosphoramidite (Glen Research, Cat#10-1934) and S-Bz-Thiol-Modifier C6-dT (Glen Research, Cat#10-1538). Conjugation of the phenylisothiocyanate-hydrazine derivative to the 3mer is accomplished with the derivative in excess under neutral pH conditions at mM concentration at room temperature for 6 hrs. A cycle tag oligo having an internal modified T nucleobase is reacted with a slight molar excess of SPDP-PEG-succinimidyl (NHS) valerate (BroadPharm Cat# BP-25336) at 1 mM in alkaline conditions (pH 7.2 to 9 borate buffer) at room temperature for 60 minutes. The NHS is preferentially reactive to the primary amine of the modified-dT over amines attached directly to the nucleobases. The molecular weight of BP-25336 is 5000 daltons, thus its length is approximately 50 nm. Unreacted NHS-PEG-SPDP crosslinker is removed by hybridization of the complex to complementary immobilized DNA, followed by washing. The SPDP-PEG-cycleTag is eluted under basic conditions. Finally, the SPDP group is reacted with the phenylisothiocyanate-hydrazine-3mer conjugate in 100 mM sodium phosphate pH 7.2 to 8.0, 1 mM EDTA, at room temperature for 8 to 16 hrs. The fully functional chemically-reactive conjugate complex is separated from impurities by hybridization to DNA complementary to cycle tag sequences immobilized on beads, washed, and eluted for use in the recoding process. It is recognized that multiple routes to produce the conjugate are possible based on modular conjugation chemistries.
[0677] In one approach, binding agents are constructed in multiple steps. Briefly, a 5’ alkyne-labeled DNA recode tag oligonucleotide is first coupled to azido-PEG8-hydrazide HCl salt (BroadPharm, Cat # BP-24118) under conditions and using protocols that are well known to form an oligo-azido-PEG8-hydrazide unit (10 mM ascorbic acid, 2 mM PMDETA, and 0.5 mM Cu2+ catalyst; Presolski et al. (2011) Copper-Catalyzed Azide-Alkyne Click Chemistry for Bioconjugation. Current Protocols in Chemical Biology. 3(4), 153-162; Hong et al., (2009) Analysis and Optimization of Copper Catalyzed Azide-Alkyne Cycloaddition for Bioconjugation. Angew. Chem. Int. Ed., 48(52), 9879-9883). This unit is then joined to a binding moiety scFv by expressing the recombinant scFv with an N-terminal serine, treating the scFv under mildly oxidative conditions using periodate to convert the N-terminal serine to aldehyde (Chelius et al., 2002, Bioconjugate Chem. 2003, 14, 1, 205-211), exchanging buffer into phosphate-buffered saline (137 mM Na+, 2.7 mM K+, 12 mM phosphate, pH 7.4), and then reacting the oligo-azido-PEG8-hydrazide unit with the scFv at 25 °C for 2 hours. It is recognized that multiple routes to produce the binding agents are also possible based on modular conjugation chemistries.
[0678] Contacting the N-terminal amino acid of the immobilized peptide with a chemically-reactive conjugate is accomplished in either aqueous or organic solution. Coupling of phenylisothiocyanate (PITC) to the α-amino group of a peptide or protein occurs under many experimental conditions. Coupling in 0.4 M dimethylallylamine (DMAA) in propanol-water (60:40 v/v), adjusted to pH 9.5 with TFA, is complete in 30 min at 45°C. Aqueous conditions at pH 8 at 45C have also been reported (Matsudaira, (1993) in A Practical Guide to Protein and Peptide Purification for Microsequencing (Second Edition), pp. 104-123).
[0679] Unreacted PITC-conjugate is washed from the surface extensively using 5 flowcell volumes of PBS. The solution is exchanged for click reaction buffer (neutral pH PBS, 2 mM PMDETA, 1 mM Cu2+, 10 mM ascorbate) and the alkyne groups of the conjugates react with the surface-bound azide groups (30 min at room temperature).
[0680] Cleaving the N-terminal amino acid via cyclization in anhydrous trifluoroacetic acid (TFA) to form the 2-anilino-5-thiazolinone can damage DNA that is not protected. The recode workflow may be inherently compatible with multiple variations of acidic conditions for this step, because precautions to protect the cycle tag oligo are readily incorporated. These include retaining the protecting groups used during nucleic acid synthesis through the first 4 steps shown in FIG. 14. Protecting groups include N(6)-benzoyl A, N(4)-benzoyl C, and N(2)-isobutyryl G, or protecting groups that are removable under milder conditions, e.g., phenoxyacetyl (Pac) protected dA and 4-isopropyl-phenoxyacetyl (iPr-Pac) protected dG, along with acetyl protected dC. These are commercially available and meet the desired criteria for ultra-mild deprotection described below.
[0681] Repetition of operations 2-4 of the process 300 in FIG. 3 results in a lawn of immobilized PTC-AA-cycleTag conjugates. Deprotection of nucleic acid protecting groups is accomplished with ammonium hydroxide, or 0.4 M sodium hydroxide in methanol/water (4:1) in 2 hours at room temperature, or 4 hours with 0.05M potassium carbonate in methanol.
[0682] Amino acid information is associated with cycle information by contacting the immobilized PTC-AA-cycle tag conjugates with binding agents and transferring the recode tag information of the binding agent to the cognate cycle tag of the immobilized conjugate to create an immobilized recode block. Exemplary scFv-recode tag binding conditions include: PBS at neutral pH, 1 mM EDTA, slow annealing from 37C to 4C with a ramp of 1C per minute. Washing away excess binding agent is accomplished by exchanging 5 flowcell volumes at 4C with PBS pH 11, 10 mM MgCl2, 50 µg/ml BSA, 0.1% TX-100. The wash step is followed by ligation. Exemplary enzymatic T4 DNA ligation reaction conditions are: PBS pH 7.8, 10 mM MgCl2, 0.1 mM DTT, 1 mM ATP, 50 µg/ml BSA, 0.1% TX-100, 2.0 U/µL T4 DNA ligase (New England Biolabs), 0.1 uM 5’ phosphorylated ligation oligo (each) at room temperature for 1 hr. Conditions using HiFi Taq DNA Ligase (New England Biolabs, cat#M0647S) are similar, with the addition of 1 mM NAD+, and may provide additional fidelity to reduce unintended ligation. Repetition of the binding, wash, and ligation steps 10 times drives recode block assembly toward completion.
[0683] Memory oligo assembly is accomplished by adding 5’ phosphorylated AA tag oligos having complementary sequence to the AA tag sequence of the recode blocks. Ligation conditions are: PBS pH 7.8, 10 mM MgCl2, 0.1 mM DTT, 1 mM ATP, 50 µg/ml BSA, 0.1% TX-100, 2.0 U/µL T4 DNA ligase (New England Biolabs), 0.1 uM 5’ phosphorylated AA tag complements (each) at room temperature for 1 hr.
[0684] Linking oligos can remediate incomplete memory oligo assembly. Also, in this step, nucleic acids having universal primers, sample indexes, and/or UMIs can be attached by ligation to the ends of the memory oligo. The primers, indexes, UMIs, etc. may be bound to the solid support or free in solution. Ligation conditions are: PBS pH 7.8, 10 mM MgCl2, 0.1 mM DTT, 1 mM ATP, 50 µg/ml BSA, 0.1% TX-100, 2.0 U/µL T4 DNA ligase (New England Biolabs), 0.1 uM 5’ phosphorylated linking oligos (each) at room temperature for 1 hr.
[0685] Tethers of the recode blocks may be cleaved using 4mM dithiothreitol (DTT) in neutral pH PBS, 1 mM EDTA, to provide greater freedom for any non-ligated recode blocks or memory oligo fragments to come into proximity. Following cleavage of the SPDP linker and washing using 5 flowcell volumes, ligation using linking oligos can be repeated to ensure memory oligo assembly results in an amplicon that can be analyzed using NGS.
Example 5: Alternative Events during a Recoding Process
[0686] This example describes alternative events due to incomplete reactions or other causes, process efficiencies, and how alternative events may be addressed.
[0687] As a baseline and framework, each operation of the recoding process can be assigned an efficiency value. These target efficiencies are noted below and may be used within a system model to predict overall efficiency. Assuming:
[0688] (Operation 2) PITC binding to N-terminal AA (target efficiency: 0.99)
[0689] (Operation 3) Immobilization of PTC conjugate to hydrogel (target efficiency: 0.95)
[0690] (Operation 4a) Edman cleavage (target efficiency: 0.98)
[0691] (Operation 4b) Nucleotide deprotection (target efficiency: 0.99)
[0692] (Operations 5a, b) ‘Binder’ recognition/retention onto a PITC-conjugate, repeated 10× (target efficiency: 0.89)
a) Only 20% is converted to the correct block on each attempt.
b) Assuming 10 iterative cycles of step 5: 1 - (1 - 20%)^10 = 0.89
[0693] (Operations 5c, d) Information transfer to create a ‘block’, ligation efficiency (target efficiency: 0.95)
[0694] (Operation 6) Memory oligo assembly (target efficiency: 0.9)
[0695] (Operation 7) Linking oligo ligation (target efficiency: 1)
[0696] The product of these stepwise efficiencies is referred to as the overall efficiency, and these target values predict that on average a memory oligo will represent ~80% of the attempted information for ~90% of the immobilized analytes (e.g., peptides).
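The stepwise target efficiencies listed above can be combined in a simple system model, sketched below. The dictionary keys and the function name `cumulative_success` are hypothetical labels introduced for illustration. The sketch computes only the plain product of the listed values; how that product is apportioned between information content per memory oligo and fraction of analytes is not spelled out in the list and is not modeled here.

```python
from math import prod


def cumulative_success(per_attempt, attempts):
    """Probability that at least one of several independent attempts succeeds."""
    return 1.0 - (1.0 - per_attempt) ** attempts


# Target efficiencies as listed for each operation (key names are illustrative).
TARGET_EFFICIENCY = {
    "op2_pitc_binding": 0.99,
    "op3_immobilization": 0.95,
    "op4a_edman_cleavage": 0.98,
    "op4b_deprotection": 0.99,
    # 20% conversion per attempt, repeated over 10 iterative cycles.
    "op5ab_binder_recognition": cumulative_success(0.20, 10),
    "op5cd_ligation": 0.95,
    "op6_memory_oligo_assembly": 0.90,
    "op7_linking_oligo_ligation": 1.00,
}

# Plain product of all stepwise efficiencies.
overall = prod(TARGET_EFFICIENCY.values())

if __name__ == "__main__":
    for name, value in TARGET_EFFICIENCY.items():
        print(f"{name:32s} {value:.3f}")
    print(f"{'product of all operations':32s} {overall:.3f}")
```

The binder-recognition term evaluates to about 0.89, matching the target listed for operations 5a and 5b. Changing the per-attempt conversion or the number of iterations shows how strongly the overall product depends on that step.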
[0697] A recode sequence (memory oligo) may imperfectly represent the true physical sequence of a sample analyte due to alternative events within the recoding process. Thus, as a baseline, it is important to establish that incomplete or probabilistic information associated with an imperfect recode sequence is valuable for the identification of proteins and their concentrations in a sample. As proof, a random sampling of contiguous and non-contiguous 20 amino acid “reads” from an E. coli 6-phosphogluconate dehydrogenase sequence in UniProt allowed unambiguous mapping of 100% of these reads to this specific dehydrogenase, i.e., there were no matches with the sequences of any other proteins in the E. coli proteome. In this example, the 20 amino acid identities and their relative sequence were drawn from a set of 30 amino acids from which identity and sequence information was attempted to be drawn, i.e., 30 recode cycles where only 20 successfully provided information. This demonstrates the value of analysis given only partial identification information for a component or components of an associated macromolecule, such as would be represented by imperfectly assembled conjugates, recode blocks, memory oligos, etc. Similarly, probabilistic identification of amino acids, i.e., as belonging to a subset of possible amino acids, and their relative sequence can be used to create an estimate for the identity of a protein. In a similar way, comparison to a reference sequence can be used to impute accurate mapping of an imperfect recode sequence in the case of insertion, deletion, and mismatch errors. Deep learning algorithms, Bayesian models, Markov models, and artificial intelligence (AI) can aid in accounting for incomplete information, random errors, and systematic errors, to identify and map perfect and imperfect recode sequences to a reference. Information quality based on binding moiety discrimination and other factors can be learned and incorporated into these analyses. 
For more information regarding AI, algorithms, and models as applied to the field of proteomics, see Crook, Chung, and Deane, Challenges and Opportunities for Bayesian Statistics in Proteomics, J. Proteome Res. 2022, 21(4), 849-864, which is herein incorporated in its entirety by reference for all purposes.
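Mapping a partial read of the kind described in paragraph [0697] can be sketched as an ordered-subsequence search. This is an illustrative sketch only, not the analysis that produced the reported result: the function names, the `span` parameter (the number of attempted cycles), and the dictionary-of-sequences proteome layout are assumptions.

```python
def is_subsequence(read, window):
    """True if the residues of `read` occur in `window` in the same order,
    allowing gaps where a recode cycle produced no information."""
    remaining = iter(window)
    # `residue in remaining` advances the iterator, so order is enforced.
    return all(residue in remaining for residue in read)


def candidate_proteins(read, proteome, span=30):
    """Names of proteins containing `read` as an ordered subsequence within
    any stretch of `span` consecutive residues (span = attempted cycles)."""
    hits = []
    for name, seq in proteome.items():
        n_windows = max(1, len(seq) - span + 1)
        if any(is_subsequence(read, seq[start:start + span])
               for start in range(n_windows)):
            hits.append(name)
    return hits
```

A read maps unambiguously when `candidate_proteins` returns exactly one name. An empty result suggests errors beyond skipped cycles, for which an edit-distance comparison would be more appropriate.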
[0698] Stepwise alternative events are presented below with estimates of frequency, consequences to recode sequence error rate, consequences for recode sequence efficiency, and methods to mitigate or minimize the effects of such events.
[0699] Conjugate immobilization. A desired outcome of operation 2 of the recoding process (e.g., process 300) may be that 100% of N-terminal amino acids bind with a PITC conjugate. One alternative event at operation 2 includes incomplete binding of the N-terminal amino acid. Frequency is estimated to be 1% based on literature. A potential consequence to recode sequence error rate is a phasing phenomenon. Phasing may occur wherein the incorrect cycle will be assigned (the (i+k)th cycle instead of the ith cycle), where i is the current cycle and k is the number of “skipped” cycles during which a conjugate is not bound to an N-terminal amino acid. This results in an apparent sequence deletion with respect to a reference with a frequency of 1%, without the remediation steps outlined below. A potential consequence for recode sequence efficiency is that n cycles of recoding result in only n-1 pieces of sequence information. Mitigation includes: optimizing binding conditions, increasing conjugate concentrations, repeating the step several times to complete the binding, or flooding the surface with free PITC to bind and remove the N-terminal amino acid and eliminate phasing.
[0700] Another alternative event of operation 2 includes the incomplete wash of conjugate that did not bind an N-terminal amino acid. The frequency is estimated to be 1%. A potential consequence on recode sequence error rate is negligible based on the effective mitigation strategy below. These conjugates may bind in operation 3 of process 300 to the support surface, but not necessarily in close enough proximity to react with an N-terminal amino acid in the next recode workflow cycle. A potential consequence for recode sequence efficiency is that n cycles of recoding result in only n-1 pieces of sequence information. Mitigation includes: optimizing wash buffers and protocol, repeating the step several times to complete the wash, and, in an intervening operation (operation 4b), quenching immobilized conjugates that are bound to the surface using an amino acid mimic that is not recognized by the binding agent in subsequent steps, or is recognized as an error event.
[0701] Yet another alternative event at operation 2 of the recoding process is that the N-terminal amino acid could be cleaved prior to immobilization of the conjugate to the solid support. Based on the frequency predicted from literature, this event may be neglected.
[0702] Conjugate immobilization. A desired outcome of operation 3 may be that 100% of conjugate complexes become immobilized to the surface. One of the alternative events at operation 3 is thus incomplete immobilization. The frequency is estimated to be low based on the reactivity of Cu-catalyzed click chemistry. The system model places this at 5%. A potential consequence on recode sequence error rate is skipped information, and the consequence for recode sequence efficiency may be that n cycles of recoding result in only n-1 pieces of sequence information. Mitigation includes: optimizing reaction buffers and protocol, and repeating the step several times to complete the conjugate immobilization.
[0703] Conjugate immobilization. A desired outcome of operation 4 of the recoding process is that 100% of N-terminal amino acids are cleaved to reveal a new N-terminal AA and a perfect immobilized conjugate complex. Alternative events include: 1) incomplete cleavage of the N-terminal amino acid; 2) termination of recoding, if the cleavage does not occur during operation 4 of a subsequent workflow cycle; and 3) damage to the nucleobases that reduces their effectiveness to carry information in subsequent steps.
[0704] Incomplete cleavage is estimated to be about 3%. A phasing phenomenon may occur wherein the current cycle amino acid is associated with the correct cycle, but once cleavage of the N-terminal amino acid does occur (possibly during step 4 of a subsequent workflow cycle) the (i+1+k)th cycle information is associated with the (i+k)th amino acid, where i is the current cycle and k is the number of “skipped” cycles during which the N-terminal amino acid is not cleaved. This results in apparent deletions of sequence with respect to a reference with a frequency of about 3%, without performing any of the mitigation steps outlined below. A potential consequence on recode sequence error rate is about 3%, and a consequence on recode sequence efficiency may be that n cycles of recoding result in only n-1 pieces of sequence information. Mitigation includes: optimizing conditions and repeating the reaction.
[0705] Termination of recoding has no effect on error rate but reduces recode sequence efficiency by about 3%.
[0706] Damage to the nucleobases is estimated to be low since the only oligos present are the protected cycle tag oligos. The effect on error rate and sequence conversion efficiency are complex and dependent on the code space and other NGS related factors. Mitigation includes increasing cycle tag length to compensate for the fraction of bases that are degraded.
[0707] Reagent purity. Reagent purity may have an effect on error rates and process efficiency. Preferred methodologies to produce chemically-reactive conjugate include joining multiple components as shown in FIG. 29. Stepwise yield for phosphoramidite synthesis is approximately 99.5%. Purity of the 3mer trifunctional linker can be assured and improved via preparative HPLC purification to remove any truncated products of the phosphoramidite synthesis. The attachment of functional elements to the trifunctional linker may not be complete. Alternative events caused by low purity reagents include conjugates that do not have a cycle tag; they can be removed via a hybridization purification step during production, as described herein. If not removed, the information gap may not show as a sequence deletion, but rather as an unknown amino acid for one analyte at a particular cycle.
[0708] The purity of 1-Ethynyl-dSpacer CE Phosphoramidite (Glen Research, Cat#10-1910) is > 99.5%, so that ensures capability to bind to the solid support in operation 4a over 99.5% of the time.
[0709] Free PITC-hydrazine may interfere at operation 2 of the recoding process by blocking an N- terminal amino acid, and then in operation 4 cleaving that amino acid, making it invisible to the recoding process and creating a sequence deletion. Thus, in some examples, unbound PITC-hydrazine may be removed. This may be accomplished via the hybridization purification, preparative HPLC, and tested for trace PITC using analytical HPLC. Any conjugates lacking PITC will be spectators in operations 2 through 4. A 1% free PITC (or conjugate lacking the alkyne or cycle tag functionality) impurity in operation 2 is estimated to produce a 1% deletion frequency. Note that cross-contamination of cycle tags during manufacture will result in the potential for mismatch errors, where amino acids are erroneously identified. A 1% cross-contamination is estimated to result in about 1% mismatch error.
[0710] Conjugate recognition by binding agents. A desired outcome of operation 5a is that a cognate binding agent is bound to each immobilized conjugate. Alternative events include: (1) no binding agent is bound; (2) a binding agent with cognate amino acid affinity, but non-cognate cycle tag is bound; (3) a binding agent with non-cognate amino acid affinity, but cognate cycle tag is bound; (4) a binding agent with non-cognate amino acid affinity and non-cognate cycle tag is bound; and (5) a binding agent having either non-cognate or cognate affinity is non-specifically bound (NSB) in proximity to a cycle tag. None of these events by themselves result in sequence insertions, deletions, or mismatch errors at this point in the recoding process. Their effect on error rate will be discussed in context of operation 5c. A potential consequence for recode sequence efficiency is related to the number of iterative cycles needed to push recode block assembly to >90%. The binding of the binding agent relies primarily on the interaction energy of the binding moiety of the binding agent. However, a feature of the binding agent is that the hybridization energy of the cycle tag oligo contributes to the overall binding energy through hybridization to complementary DNA of a cognate recode tag.
[0711] Alternative event (1) depends on the affinity and concentration of binding agents. Frequency can be tuned to be low by adjusting binding formulation and condition. This may vary depending on the cognate amino acid. When assessing alternative event (2), the differential binding energies between binding agents will determine how frequently a non-cognate binding agent will block the immobilized conjugate, and render it unable to participate in the following ligation step. Alternative events (3) and (4) will be negligible because hybridization energy is low under the experimental wash conditions. They are estimated to be less than 1%. And alternative event (5) may be tuned by adjusting the formulations, conditions, adding passivation components, and/or modifying the hydrogel to reduce NSB. Any alternative events associated with recognition by binding agents may result in the need for high numbers of iterative cycles in operation 5, and may optionally include contacting the solid support with generic binding agents that do not discriminate binding based on amino acid, and have a high binding affinity to any immobilized conjugate. This promotes complete recode block formation, which aids in memory oligo assembly in subsequent steps. This mitigation gives up amino acid identity information, but provides position information even for amino acids whose identity is not determined. As outlined above, this is useful information when mapping to reference sequences for the identification and quantification of analytes in a sample.
[0712] Oligo synthesis. Recode tag sequences may be incorrect due to oligo synthesis errors. Typical error rates are approximately 1 per 500 bases. The number of AA tag nucleotides in each memory oligo in this example is 6x30=180. Only off-by-2 errors will result in undetectable mismatch errors, due to the binary error-checking design of the codespace discussed in Example 6. Assuming 30 cycles of recoding and oligo synthesis errors are random, implies that 4.5% of memory oligos will have 1 mismatch error. This contributes 0.15% to the per AA error rate.
[0713] Recode block assembly. A desired outcome of operation 5b is that 100% of non-cognate binding agents are washed from the surface and do not interact with immobilized conjugates. Alternative events at operation 5b include incomplete removal of non-cognate molecules. Similar to operation 5a, this does not by itself result in insertion, deletion, or mismatch errors at this point in the recoding process, and does not have an effect on the recode sequence efficiency. Mitigation for incomplete removal includes: optimizing the time, flowrate, temperature, pH, salt, and/or other stringency factors during the wash step. Reducing the hybridization energy by increasing pH is an effective way to dissociate double-stranded DNA. Effective removal of non-cognate DNA is desired, so binder moiety selection and affinity maturation at elevated pH will be beneficial to aid this wash step. Removal of non-cognate oligos, not held bound by interaction of a binding agent with cognate amino acid affinity to an immobilized conjugate, is presumed to be > 0.1%. The off rate of a binding agent may be a factor in maintaining cognate binding agent association with its cognate immobilized target. Tuning the time, formulations, and conditions through and between wash and ligation steps may impact occupancy of immobilized conjugates (i.e., the fraction with a bound binding agent) and thereby the number of iterative cycles required to push recode assembly to >90%. It is estimated that the fraction of conjugates bound to a cognate binding agent in any given iteration is 20%. Under this conservative assumption and further assuming no systematic effects, 10 iterations should achieve 90% recode block assembly.
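The iteration estimate above can be checked with a short calculation. The following is a minimal sketch, assuming each bind, wash, and ligation iteration independently assembles a recode block on a fixed fraction of the still-unassembled conjugates (the "no systematic effects" assumption stated above); the function names are hypothetical and are not part of the described process.

```python
import math

def assembled_fraction(per_iteration: float, iterations: int) -> float:
    """Cumulative fraction of conjugates with a recode block after n iterations,
    assuming independent iterations with a constant per-iteration success rate."""
    return 1.0 - (1.0 - per_iteration) ** iterations

def iterations_needed(per_iteration: float, target: float) -> int:
    """Smallest whole number of iterations whose cumulative fraction meets the target."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - per_iteration))

if __name__ == "__main__":
    # 20% cognate occupancy per iteration, as estimated in the text
    print(f"after 10 iterations: {assembled_fraction(0.20, 10):.1%}")
    print(f"iterations to reach 90%: {iterations_needed(0.20, 0.90)}")
```

Under these assumptions, 10 iterations give roughly 89% assembly, which rounds to the 90% figure stated above; an eleventh iteration would be needed to strictly exceed 90%.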
[0714] A desired outcome of operation 5c is 100% ligation of the cognate ligation oligo to a recode block. Alternative events include: (1) no binding agent is bound; (2) a binding agent with cognate amino acid affinity, but non-cognate cycle tag is bound; (3) a binding agent with non-cognate amino acid affinity, but cognate cycle tag is bound; (4) a binding agent with non-cognate amino acid affinity and non-cognate cycle tag is bound; (5) a binding agent having either non-cognate or cognate affinity is non-specifically bound (NSB) in proximity to a cycle tag; and (6) incomplete ligation.
[0715] Alternative event (1) does not result in recode sequence error. A potential consequence for recode sequence efficiency may be additional time to iterate the bind, wash, and ligation cycles. Similarly, alternative events (3) and (4) do not result in significant recode sequence error. The < 0.1% association of non-cognate cycle tags with recode tags is further reduced by sequence differences at the ends of non-cognate cycle tags that do not participate effectively in the ligation. A potential consequence of this alternative event for recode sequence efficiency is additional time to iterate the bind, wash, and ligation cycles. Alternative event (6) may not result in recode sequence error. A potential consequence for recode sequence efficiency is additional time to iterate the bind, wash, and ligation cycles.
[0716] Alternative event (2) is binding of a binding agent with cognate amino acid affinity, but non-cognate cycle tag. This may be difficult to remove by washing due to the similar binding energy for a fully cognate binding agent compared to one with cognate amino acid affinity but not the cognate cycle tag. Too stringent a wash could dissociate cognate binding agents, and prevent the fully cognate binding agent from transferring information to the recode block during ligation. Thus, the frequency of the interference by “binding agents” can be estimated to be high, leading to poor per cycle information transfer efficiency. This can be remediated by iterative cycling, and it impacts the process efficiency. Fortunately, the consequence for recode sequence error rate is low since the cycle tag sequences are chosen to not interact and to be especially different at the 3’ end to prevent errant ligation. By using high fidelity ligation at high salt concentrations, ligation of incorrect oligos is estimated to be > 0.1% (Lohman, et al. (2015) Nucleic Acids Research, 2016, Vol. 44, No. 2). Even through 20 iterative cycles in attempts to find the cognate binding agent, this suggests mis-association of cycle with amino acid will add >1% to recode error rate. Mitigation includes: optimization of ligase conditions and formulations, choice of ligase, avoidance of GT base pairing at the 3’ end junction, optimization of cycle tag sequence differences, and slow annealing.
[0717] Alternative event (5) is non-specific binding (NSB) of binding agents in proximity to immobilized conjugate. Non-cognate binding agents could have complementary recode tag sequence to a cycle tag in the vicinity. Hybridization to the cycle tag produces a viable ligation target. While difficult to quantify, this alternative event has the potential to contribute to the recode error rate. The probability that the errant recode tag outcompetes the recode tag of an associated binding agent is equivalent, if the fully cognate binding agent is bound, and is high if the recode tag of the bound binding agent has a non-complementary recode tag. Mitigation includes stringent wash of the solid support prior to ligation, adding passivation agents to the formulated reagents, and/or modifying the hydrogel to reduce NSB. Recode process efficiency is not affected by alternative event (5).
[0718] The analysis of stepwise error rates suggests that >90% of the identity and sequencing information represented in a memory oligo is accurate.
[0719] A desired outcome of operation 5d is that 100% of cognate binding moieties are dissociated from the cognate PTC-AA binding site of the immobilized conjugate to prepare for the next iteration of information transfer. Alternative events include incomplete removal of the binding agent. There may be no consequence to error rate, however, as conjugates that are not free to find a cognate binding agent will be spectators in the next iteration cycle. Significant residual binder will instead increase the number of requisite iterations of operation 5. Mitigation includes adjusting wash conditions to be longer, higher flowrate, higher temperature, and formulations that include protein denaturing conditions, such as high or low pH, and high detergent concentrations.
[0720] Memory oligo assembly. A desired outcome of operation 6 is that 100% of recode blocks are ligated to form a complete memory oligo, which can serve as a template for cluster generation and NGS data collection in subsequent steps. Alternative events include incomplete ligation of recode blocks. The frequency of incomplete memory oligo assembly is estimated to be high due to “missing recode blocks” for some cycles, steric restriction during the assembly process, and incomplete ligation using enzymatic ligation methods. There is no consequence of this event on recode sequence error rate. However, the penalty in terms of the recode efficiency may be significant. Failure to assemble an amplicon results in no information from a given analyte fragment. Assuming recode block assembly rates are governed by the target stepwise efficiencies above, then for 30 recode cycles and without mitigation, the number of memory oligo amplicons capable of being analyzed by NGS would be ~0.1%. This is derived from an 80% probability to have assembled any given recode block, raised to the power of the number of cycles, which in this example is 30. Thus, methods to assemble incomplete sets of recode blocks may be needed.
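The yield estimate above follows from compounding a per-block assembly probability over all recode cycles. A minimal sketch of that arithmetic is given below, assuming the blocks assemble independently; the function name is hypothetical.

```python
def complete_memory_oligo_fraction(per_block: float, cycles: int) -> float:
    """Fraction of analytes for which every recode block is present,
    assuming each block is assembled independently with the same probability."""
    return per_block ** cycles

if __name__ == "__main__":
    # 80% per-block probability and 30 recode cycles, as in the text
    fraction = complete_memory_oligo_fraction(0.80, 30)
    print(f"complete memory oligos without mitigation: {fraction:.2%}")
```

With an 80% per-block probability and 30 cycles, the result is about 0.12%, consistent with the approximate 0.1% figure stated above.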
[0721] Mitigation of imperfect assembly to achieve a memory oligo includes the concept described in operation 7 of the recoding process wherein linking oligos are used to ligate any non-ligated recode block or memory oligo fragments. This can be done in multiple steps using subsets of the full complement of linking oligos capable of splinting any recode blocks (or memory oligo fragments) together. In addition, repeating operations 6 and 7 after cleaving the SPDP tethers in operation 8 to allow greater flexibility and accessibility of components can promote complete assembly of memory oligos. Note that ligation of any non-ligated recode block to any other to complete a memory oligo amplicon can result in a valuable memory oligo construct suitable for NGS analysis. Recode blocks can be assembled in any order and deconvoluted in silico, since the cycle information is adjacent to the AA tag information in each recode block. In some embodiments the cycle information is flanked by a universal assembly sequence that allows recode block assembly into the memory oligo in any order, and the sequence is deconvoluted in silico. A further alternative event is incorrect ligation of recode blocks. As covered in the previous paragraph, the consequence on recode sequence error rate is negligible, and the consequence for recode sequence efficiency is negligible since in the majority of cases the memory oligo will be imperfect, but still represent a significant quantity of analyte information and be suitable for NGS analysis. Shortcuts could cause a fraction of the recode block information to be lost, for example if recode block 1 ligated to recode block 30, and omitted the information of intervening recode blocks. Strategies may be used for stepwise ligation using ligation oligo subpools to maximize information capture.
[0722] Sensitivity analysis indicates robustness. The analysis identifies controllable factors to limit errors to acceptable levels and stabilize overall process efficiency. Reagent impurity, incompletely executed steps, binder fidelity, and alternative interactions within the recode process contribute to deletion of sequences, insertion of sequences, and mismatch errors. Degradation of the conjugate complex, recode block, or memory oligo, hydrogel, hydrogel delamination, or other degradation mechanisms may further result in recode sequence error or changes in efficiency. The frequency is controllable by choice of materials, methods, and protecting groups. In the example above, second-order propagation of error is neglected, because its contribution is estimated to be negligible.
Example 6: Code Space and Sequence Space
[0723] This example describes two considerations, and exemplary rules, when assigning nucleic acid sequences to AA tags and cycle tags: 1) the code space, and 2) the sequence space. It is not obvious that code space and sequence space are separable since the same nucleotides comprise both the physical and digital attributes of AA tags and cycle tags. However, recognizing that code space and sequence space are not the same provides a capability to largely deconvolute the physiochemical properties of the sequence space (i.e., the physical system: hybridization temperature and energy, spatial interference, specificity of nucleic acid interaction) from code space (i.e., the in silico recode information). Pragmatically, deconvolution comes through utilizing a sequencing method to identify recoded information wherein only a subset of the nucleotides of the memory oligo are identified through DNA sequencing, and a subset are not identified. This may be achieved by introducing non-fluorescent, non-reversibly-terminated nucleotides into the sequencing reagent mixtures. The value is that one can tune the physiochemical properties without increasing sequencing time or cost.
[0724] In this Example, a customized reagent set is created wherein a solution of nucleotides that contains blocked and fluorescently-labeled nucleotide triphosphates for A and C, and triphosphate nucleotides for G and T (Trilink Cat #: N-2513, and Cat #: N-2512, respectively) is substituted for the nucleotide reagent in a sequencing kit that contains blocked and fluorescently labeled triphosphates. A flowcell (Illumina, San Diego, CA) is seeded with memory oligos, clusters are generated using standard processes, and sequencing ensues. Sequencing proceeds under standard conditions using a commercial sequencing kit (Illumina NextSeq 500/550 High Output Kit v2.5 (300 Cycles) 20024908). At each sequencing cycle polymerase adds cognate nucleotides to the growing SBS oligo, directed by the DNA template in a given sequencing cluster. When a G, or a T, or a stretch of G’s or a stretch of T’s, or a combination of G and T is encountered, the polymerase during that cycle of sequencing adds as many G’s and T’s as necessary to get to the next A or C nucleotide. Then the polymerase adds blocked and fluorescently-labeled nucleotide A or C to the SBS oligo, as directed by the template. No further nucleotides may be added during this cycle because of the 3’ OH blocking group of the blocked and labeled nucleotide A or C triphosphates. The flowcell is imaged to read the color of the fluorophore attached to A or C for each cluster. At the end of the sequencing run, the resultant FASTQ file records only the information associated with the A and C bases of the memory oligo. Example sequences are shown with their corresponding code in the table below. In this example, an oligo sequence of length 15 bp provides one of 64 binary codes in 6 sequencing cycles. A fraction of the code space, for example, the codes with even parity, can be used, and the remainder unused to provide error checking and mitigate error modes in the processes of recoding and/or sequencing (Gunderson, et al., Decoding randomly ordered DNA arrays, Genome Res 2004 May; 14(5):870-7). In this example, even parity codes are assigned to cycle tags, and odd parity codes are assigned to AA tags. The FASTQ file can be parsed to identify the amino acid sequence represented by each cluster, and mapped to reference protein sequences to identify proteins and quantify their concentrations.
Table 7: Sequence Space and Code Space
[0725] In this example, the recode information is captured in a base-2 (binary) code, using A and C. Other subsets of nucleotides may be preferred in some instances. Subsets include: AGT, ACT, CTG, ACG while using a non-fluorescent, non-reversibly-terminated C, G, A, or T, respectively, in the sequencing reagent mix. In this case, information is coded using a base-3 code space. When choosing to create a code in binary space it is advantageous to choose one purine and one pyrimidine, as it allows tuning the non-coding bases with a ratio of purine to pyrimidine that provides flexibility to adjust %CG, Tm, and other physiochemical properties.
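The relationship between a physical sequence and its binary code can be illustrated with a short parsing sketch. This is a minimal illustration only, under several assumptions that are not taken from the source: the bit assignment A=0 and C=1, the example 15-mer (which is hypothetical and is not an entry of Table 7), and the convention that the input string is the strand as reported by the sequencer. All function names are likewise hypothetical.

```python
# Assumed bit assignment for the two read bases (illustrative, not from Table 7).
CODE_BITS = {"A": "0", "C": "1"}

def read_coding_bases(strand: str) -> str:
    """Return only the bases a run with terminated A and C would report;
    G and T are incorporated without a read and are therefore skipped."""
    return "".join(base for base in strand.upper() if base in CODE_BITS)

def to_code(strand: str) -> str:
    """Translate the reported A/C bases into a binary code string."""
    return "".join(CODE_BITS[base] for base in read_coding_bases(strand))

def classify(code: str) -> str:
    """Even parity codes are cycle tags and odd parity codes are AA tags,
    following the assignment described in this example."""
    return "cycle tag" if code.count("1") % 2 == 0 else "AA tag"

if __name__ == "__main__":
    physical = "AGTCGTGTGATCGCA"  # hypothetical 15-mer with six A/C positions and a T at base 8
    code = to_code(physical)
    print(physical, "->", read_coding_bases(physical), "->", code, "->", classify(code))
```

In this sketch the 15-nucleotide physical sequence yields six reported bases and hence a 6-bit code, while the nine unread G and T bases remain free for tuning hybridization properties.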
[0726] One clear benefit of recoding using a reduced number of nucleotide types is the ability to tune the physiochemical properties of the AA tag and cycle tag sequences relatively independently of the code that they hold. Note that the melting temperature of the physical sequences in Table 7 may be between 35°C and 45°C under standard experimental conditions, while that of 6mer sequences that could be used to code the AA tag and cycle tag information may be near 0°C. Note also the greater hybridization specificity that can be obtained using the physical sequences, as compared to that which would be obtained by using the code sequences to support the physiochemical process of hybridization.
[0727] Another benefit is the ability to design the physical sequences to support conjugation and avoid steric interferences. Note the 8th bases in the physical sequences of the example are all “T”. Commercially available phosphoramidites exist that allow conjugation through a modified nucleobase “T,” making reagent preparation straightforward (Glen Research, Amino-Modifier C6 dT (10-1039) Catalog #: 10-1039, CAS #: 178925-21-8). By placing a conjugation site near the middle of the oligo, steric interference with ligase is avoided during critical steps in the recode block assembly and memory oligo assembly processes. Alternately, 1) an abasic conjugation site can be placed somewhere in the middle of the nucleic acid using a compound during oligonucleotide synthesis such as 1-Ethynyl-dSpacer CE Phosphoramidite (Glen Research Cat# 10-1910), having an alkyne group in place of the nucleobase, or 2) a 5-Formylindole-CE Phosphoramidite (Glen Research, Cat# 10-1934) could serve to enable aldehyde-hydrazine conjugation at an internal site in the nucleic acid cycle tag.
[0728] In Example 4, each recode cycle creates a nucleotide long enough to hold the cycle and amino acid identity. The number of codes to differentiate 30 recode cycles is (4^3 = 64). This means at least 3 bases are utilized to hold the cycle tag information in some embodiments. The number of nucleotides to support the physiochemical requirement of the recode process may be between 5 and 20 (e.g. 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, or any range thereof). Other numbers may be included that work outside the range of 5 to 20. Similarly, the number of codes to differentiate 20 amino acids is (4^3 = 64). This means at least 3 bases hold the AA tag information. Again, the number of nucleotides to support the recode process may be between 5 and 20. Thus, each sequenced amino acid may require tens of nucleotide bases. If these all needed to be sequenced, it may restrict the length of amino acid sequence that could be ascertained. Typical short-read NGS kits are capable of sequencing 2x150 or 2x300 nucleotides. As the analyte grows longer than this, sequencing quality degrades. Thus, another benefit of decoupling code space and sequence space is the ability to reduce DNA sequencing cycles when analyzing the memory oligo using SBS NGS. By using non-labeled and non-blocked nucleotide triphosphates the memory oligo length is not limited to the maximum number of DNA sequencing cycles.
[0729] Exemplary rules for code space follow the theories of digital communication error checking and correction, e.g., Hamming, et al. Binary codes of length 5 are sufficient to code cycle and amino acid information, but a binary code length of 6 is required to check and correct errors due to imperfect recode block formation, memory oligo assembly, ligation of non-cognate information, oligo synthesis errors, or NGS sequencing errors.
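The code-length figures above can be verified by enumeration. The sketch below is illustrative only and uses hypothetical function names; it computes the minimum code length for a given alphabet size, lists the 6-bit even-parity codewords, and reports their minimum pairwise Hamming distance.

```python
from itertools import combinations, product

def min_code_length(n_symbols: int, alphabet_size: int) -> int:
    """Smallest code length whose code count is at least n_symbols."""
    length = 1
    while alphabet_size ** length < n_symbols:
        length += 1
    return length

def parity_codes(length: int, parity: int) -> list[str]:
    """All binary codes of the given length whose count of 1s has the given parity."""
    return ["".join(bits) for bits in product("01", repeat=length)
            if bits.count("1") % 2 == parity]

def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length codes differ."""
    return sum(x != y for x, y in zip(a, b))

if __name__ == "__main__":
    print("bases needed for 30 cycles (4-letter code):", min_code_length(30, 4))
    print("bits needed for 30 cycles (binary code):", min_code_length(30, 2))
    even = parity_codes(6, 0)
    print("6-bit even-parity codes:", len(even))
    print("minimum Hamming distance:", min(hamming(a, b) for a, b in combinations(even, 2)))
```

A 6-bit code restricted to one parity leaves 32 codewords, enough for 30 cycles or 20 amino acids, with a minimum distance of 2 between codewords. A single-bit error therefore produces an invalid parity and is detectable, whereas a two-bit error can produce another valid codeword, in line with the earlier remark that only off-by-2 errors result in undetectable mismatches.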
[0730] Exemplary rules for sequence space include: 1) maximizing the sequence difference at the 3' end of all nucleic acids that are to be ligated during the process, 2) further, the greatest discrimination of the ligase activity may be obtained by excluding nucleic acids with GG, GC, CG, or CC at the 3’ end, 3) no shared words greater than 6mer and maximum distance between sequences to avoid cross hybridization, 4) no homopolymer stretches >3mer, 5) a “T” nucleotide near the middle of the nucleic acid to support conjugation and avoid steric interferences with conjugation sites during the recode process, 6) the requisite number of “A” nucleotides and/or “C” nucleotides to create the codes within the sequence, 7) Tm matched, 8) %CG between 40% and 60%, 9) minimized hairpin structures, 10) defined sequence length (can be different for AA tags and cycle tags).
[0731] Concepts of Example 6 can effectively break the 1:1 connection between code space and physiochemical properties of the oligonucleotides. This can effectively be used to increase Tm during ligation assembly events, while reducing NGS cycles to obtain the recoded information of the memory oligo.
[0732] It is contemplated that memory oligos may have a limited number of unique constituent recode blocks (e.g., sequence blocks) as a result of the number of cycles and number of binding agents in the recoding process. For example, with thirty (30) cycles of sequencing and twenty (20) amino acids, there are only six hundred (600) blocks for identification using available detection modalities (30 cycles x 20 different amino acids). As an alternative to NGS sequencing techniques, analysis by hybridization using a combinatorial approach can be used to “decode” the identity of recode blocks in memory oligos, which in certain embodiments, can be 30mer sequences. Again, for decoding techniques, see Gunderson et al., Decoding Randomly Ordered DNA Arrays, Genome Res., 2004 May; 14(5):870-7, which is herein incorporated in its entirety by reference for all purposes. In such embodiments, instead of sequencing each nucleotide base, memory oligo information may be collected by performing sequential hybridization and de-hybridization steps, interspersed with imaging.
[0733] Given the proximity of recode blocks to each other for a given macromolecular analyte (e.g., when anchored to a solid support during the recoding process), it is contemplated that recode blocks may be analyzed by hybridization without prior assembly into a memory oligo. This can be carried out at the single molecule level, or following amplification of each individual recode block while maintaining proximity to each analyte anchor position. As described above with reference to FIG. 3- 4, localized amplification of recode blocks may be facilitated by primers such as P5 or P7 immobilized within the hydrogel polymer. The width of the point-spread function of high-resolution optical systems is approximately λ/2, where λ is the wavelength of the emitted photon(s), and is typically in the range of 500-800 nm for fluorescent dyes. Accordingly, since the distance between chemically-reactive conjugate anchor points around a central analyte anchor point may be on the order of tens of nm to 200 nm, and the optimal distance between analytes is on the order of several hundred nm, the optical resolution enables isolation between analytes but not between recode blocks of a given analyte. This applies even if the recode blocks are not connected via phosphodiester bonds or other direct covalent linkages, as described herein for assembly of memory oligos.
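The resolution argument above can be made concrete with a short calculation. The sketch below assumes the approximate half-wavelength resolution limit stated above; the function names and the 600 nm analyte spacing used in the example are hypothetical illustrations.

```python
def resolution_limit_nm(wavelength_nm: float) -> float:
    """Approximate optical resolution, taken as half the emission wavelength."""
    return wavelength_nm / 2.0

def is_resolvable(spacing_nm: float, wavelength_nm: float) -> bool:
    """True if two emitters at the given spacing can be optically separated."""
    return spacing_nm > resolution_limit_nm(wavelength_nm)

if __name__ == "__main__":
    for wavelength in (500.0, 800.0):
        limit = resolution_limit_nm(wavelength)
        print(f"{wavelength:.0f} nm emission -> resolution about {limit:.0f} nm")
        # 200 nm: upper end of the conjugate anchor spacing given in the text
        print("  recode blocks at 200 nm resolvable:", is_resolvable(200.0, wavelength))
        # 600 nm: hypothetical analyte spacing within 'several hundred nm'
        print("  analytes at 600 nm resolvable:", is_resolvable(600.0, wavelength))
```

For emission between 500 and 800 nm the limit is roughly 250 to 400 nm, so anchor points spaced at 200 nm or less fall within a single resolvable spot while sufficiently spaced analytes remain separable.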
[0734] Thus, while assembly of memory oligos from recode blocks and subsequent amplification of the memory oligos may facilitate signal enhancement during sequencing, single molecule analysis of memory oligos and/or recode blocks (using appropriate instrumentation/systems and dyes) is also contemplated herein. In such embodiments, the memory oligos, and/or recode blocks in proximity to one another, can be analyzed using single-molecule imaging techniques, such as single-molecule decode-based imaging techniques. For more information regarding such single-molecule imaging techniques, see Shashkova and Leake, Single-molecule fluorescence microscopy review: shedding new light on old problems, Biosci Rep 2017 Aug 31; 37(4), which is herein incorporated in its entirety by reference for all purposes.
[0735] In addition to the application described herein, this method may be broadly useful in genomics for overcoming some limitations of short read technology. Short-read sequencing, while a powerful tool in genomics, has several limitations that can hinder its utility in certain applications. One issue is limited read length. Short-read sequencing technologies, such as those provided by Illumina, typically generate reads of up to 300 base pairs. This limitation can make it challenging to assemble complex genomes, particularly those with repetitive regions, as the short reads may not span the entire length of the repeat.
[0736] Another issue is the difficulty in mapping structural arrangements. Structural variants, such as inversions, deletions, duplications, and translocations, can have significant impacts on gene function and expression. However, these variants can be challenging to detect with short-read sequencing, as the reads may not span the entire length of the variant, making it difficult to accurately map their locations.
[0737] Additionally, short-read sequencing can struggle with accurately measuring gene fusions. Gene fusions, which occur when two previously separate genes become joined together, often play a critical role in diseases such as cancer. However, the short length of the reads can make it difficult to accurately identify the breakpoint where the two genes are fused together.
[0738] Other issues with short-read sequencing include difficulties in phasing alleles, accurately identifying long repeat expansions, and resolving complex regions of the genome, such as those with high GC content.
[0739] The methods described herein may be useful for addressing some of these issues. By skipping certain base pairs during sequencing, it may be possible to sequence farther than traditional short-read sequencing methods, potentially allowing for longer reads. This could help to resolve some of the issues associated with short-read sequencing, such as difficulties in assembling complex genomes and mapping structural variants.
[0740] The method may improve the accuracy of gene fusion detection for certain fusions. By sequencing farther, it may be possible to more accurately identify the breakpoint where two genes are fused together, improving the accuracy of gene fusion detection.
[0741] Furthermore, the ability of some such methods to sequence farther may help with phasing alleles, identifying long repeat expansions, and resolving complex regions of the genome. By sequencing farther, it may be possible to span the entire length of long repeat expansions or complex regions, improving the accuracy of these analyses.
[0742] In RNA sequencing (RNAseq), longer reads can provide a more complete picture of individual transcripts, especially for organisms with complex genomes, or in the study of alternative splicing events. Longer reads can also improve the annotation of novel genes and isoforms. Longer reads may improve mapping accuracy, especially in regions with repetitive sequences. Shorter reads might map to multiple locations, making it difficult to assign them unambiguously. Longer reads may improve the quantification accuracy of expression levels, especially for longer transcripts.
[0743] In addition to extracting part of a sequence from a longer than normal segment, this could enable shorter runs. Sequencing with longer reads may be more expensive. The higher cost may limit the number of samples that can be sequenced in a given project, potentially reducing its statistical power.
[0744] Finally, use of a customized reagent set, which includes blocked and fluorescently-labeled nucleotide triphosphates for A and C, and standard triphosphate nucleotides for G and T, may be easily incorporated into existing sequencing technology using standard flowcells and other consumables, and standard primary analysis techniques to determine the base pairs read. Kits that use this may include any one, two, or three of the four reversibly terminated nucleotides being substituted for a normal, unblocked base, in addition to non-natural or other synthetic nucleotides being introduced for reading synthetic codes and skipping uninformative regions as previously described. Such kits and methods may be applied to any number of sequencing technologies that utilize reversible terminators, including, but not limited to the sequencers by Element Biosciences (Aviti), Pacific Biosciences (Onso), or others.
Example 7: Deprotection and Reprotection of Oligonucleotides
[0745] This example describes an exemplary protocol that may be used for protection or reprotection, as follows:
[0746] For adenine and cytosine bases: dissolve 250mg of benzoyl chloride in 1 mL of anhydrous dimethylformamide (DMF), contact the oligonucleotide with the solution at room temperature for 1-3 hours. Wash the surface with DMF to remove unreacted reagents and byproducts.
[0747] For guanine bases: dissolve 250 mg of isobutyryl chloride in 1 mL of anhydrous DMF, contact the oligonucleotide with the solution at room temperature for 1-3 hours. Wash the surface with DMF to remove unreacted reagents and byproducts.
[0748] In some embodiments, the location of immobilized amino acid complexes may be defined by a nucleic acid that is joined to the solid support in proximity, a “location oligo.” It may be useful to transfer the sequence information of the location oligo to a cycle tag, a recode block or a memory oligo. In these cases, protection, deprotection and/or reprotection methods described herein may be applicable.
[0749] Oligonucleotide protection can be applied broadly in any protein sequencing method where chemical conditions used within the process may impart changes to oligonucleotide structure or function.
Example 8: Ligation and Amplification of Oligonucleotides in situ
[0750] This example describes the synthesis of functionalized silica beads and a streptavidin oligo conjugate, and the successful ligation and PCR amplification of oligonucleotides on a solid support to form an exemplar memory oligonucleotide using a custom peptide conjugated to silica beads. FIG. 48 illustrates an embodiment in which the C-terminus of the peptide was covalently linked to the bead surface and a chemically-reactive conjugate containing an oligonucleotide (PPO-[/5Phos/ATGAGTG/iFormInd/ AGGGAAATAGCTTCTGGTCGAACTAGTTGTTCGTCAA (SEQ ID NO: 75)]-SOC) was reacted with the N-terminal amine of the peptide. Streptavidin was labelled with a second oligonucleotide (Syst#002-SOC-[/5Phos/GAACGTG/iFormInd/CTTCTGATGAAGTTTGGAGACAAATTGC GTGGGAGCA (SEQ ID NO: 91)]) and bound via biotin-streptavidin interaction to form a model affinity complex. The two oligonucleotides, now in close proximity, were ligated using a sequence-specific splint oligonucleotide and T4 DNA ligase. After ligation and amplification of the resultant surface-bound oligonucleotide, qPCR was performed using primers specifically designed to amplify the ligated product; thus, amplification only occurred if the complete ligation product was present. As a positive control, the ligation reaction was performed in solution in the absence of the peptide. Similar Ct values were observed for the solution ligation and on-bead ligation conditions (FIG. 39). As a negative control, the beads were prepared and incubated without the addition of T4 DNA ligase. Melt curve analysis indicated the desired product was produced (Tm = 80 °C) in both the bead and the solution ligation samples, while any product produced in the negative control sample was non-specific off-target amplification (Tm ~ 68 °C).
[0751] FIG. 49 shows qPCR amplification curves indicating the successful ligation of the products on bead thereby showing steps of the method: (f) contacting the immobilized amino acid complex with a binding agent, the binding agent comprising: a binding moiety for preferentially binding to the immobilized amino acid complex, and a recode tag comprising a recode nucleic acid corresponding with the binding agent, thereby forming an affinity complex, the affinity complex comprising an immobilized amino acid complex and the binding agent and thereby bringing the cycle tag into proximity with the recode tag within the affinity complex, (g) transferring information of the recode nucleic acid to the cycle nucleic acid of the immobilized conjugate complex to generate a recode block; and finally (h) obtaining sequence information of the recode block, in this case via PCR, melt temperature analysis, and Sanger sequencing.
Synthesis of Azide-Functionalized Silica Beads:
[0752] Five mg of amine-functional silica beads (CD Biosciences DNG-F046, 20 µm, 5 wt%, 4 µmol/g NH2) were added to a 0.5 mL protein lo-bind tube. Solution exchange was accomplished using an Eppendorf benchtop fixed angle centrifuge at 21000 x g for 1 minute. Supernatants were carefully removed by pipette, ensuring the bead pellet remained undisturbed. Surface amine groups were converted to azide using 34 mg of azidoacetic acid NHS ester (AA-NHS, Broadpharm BP-22467) under standard NHS-coupling conditions.
[0753] Following conversion, the beads were rinsed twice with deionized water, followed by a single wash with 200 mM carbonate buffer (pH 9.6). Subsequent washing steps were conducted thrice with deionized water. After the last wash, the supernatant was removed, and the bead pellet was resuspended in a solution containing 1 mg of fluorescamine (Aldrich cat F9015) dissolved in 1 mL of DMSO to test for residual amine. After allowing this mixture to react for 10 minutes at room temperature, the rinsed beads and supernatant solutions were transferred to a 96-well plate, and fluorescence was measured using a plate reader, yielding a bead RFU value of 1.42 × 10^6. Residual amines were capped using a solution of 1.32 M succinic anhydride in 0.32 mL of dimethylformamide (DMF) containing 10% diisopropylethylamine (DIPEA, Aldrich cat D125806). Following reaction at 60 °C for 2 hours, excess reactant was removed by serially washing with DMF, DMSO, and water. A fluorescamine assay was again performed to check for residual amines after succinylation, and it reported an acceptably low background signal. Finally, the beads were suspended in 1X SSPE buffer (prepared from Aldrich cat 15591043 20X stock) and stored at 4°C, shielded from light.
Synthesis of Peptide 5-Functionalized Silica Beads:
[0754] The azide-functionalized beads prepared as described above served as the starting material. These were combined with a 200 mM phosphate buffer at pH 7, 50 mM THPTA, 10 mM CuSO4, and 0.8 mM of Peptide 5, which has the sequence {lys-biotin} {ser} {ser} {lys-FAM} {ser}-Pra; where “Pra” denotes a propargyl group at the C-terminus. Freshly prepared 100 mM sodium ascorbate was also added.
Synthesis of streptavidin-oligo conjugate
[0755] Streptavidin (SA, Sigma - SA101) was solubilized in PBS buffer to 100 µM, yielding approximately 2 mL. NHS-PEG4-DBCO (BP-22288) was prepared at a 10 mM concentration in DMSO. NHS-PEG4-DBCO was added to the streptavidin in a 2-fold molar excess, targeting 1-2 linkers per SA molecule. The reaction proceeded at room temperature for 60 minutes. Unreacted NHS-PEG4-DBCO was removed via serial rinses using a 10k MWCO spin column (Sigma UFC5010). The conjugate was stored at -20°C. Purity and yield were evaluated using absorption values at 280 nm and 309 nm.
[0756] /5Phos/GAACGTG/iFormInd/CTTCTGATGAAGTTTGGAGACAAATTGCGTGGGAGCA (SEQ ID NO: 91) was reacted for 16 h at 40 °C with aminoxy-PEG-azide in the presence of 5-aminoindole catalyst at pH 6.5, purified via HPLC, dried, and resuspended in nuclease-free DI water. Verification of the correct product was conducted via high-performance liquid chromatography (HPLC) and electrospray ionization time-of-flight mass spectrometry (ESI-TOF-MS). Subsequently, the N3-oligo conjugate at 100 µM was mixed with the DBCO-SA conjugate at a 1.3:1 molar ratio of oligo to DBCO-SA. The reaction proceeded for 2 hours at room temperature. Unreacted N3-oligo was removed via replicate rinses through a 30k MWCO spin column with PBS, and the conjugate concentration was determined to be ~120 µM.
Preparation of Beads having Affinity Complexes
[0757] The chemically-reactive conjugate PPO-Sys1, prepared as described above (see synthesis of PPO, a trifunctional CRC, with SEQ ID NO: 75 substituted for SEQ ID NO: 108) and purified by HPLC, was introduced to the beads in 200 mM carbonate (pH 9.6) and allowed to couple to the N-terminal alpha amine of the peptide. Beads were then rinsed with SSPE buffer.
[0758] The SA-oligo conjugate was introduced to the bead at nM concentration, and the excess removed by copious rinsing with PBS.
Ligation of Oligos on Bead Surface:
[0759] The oligonucleotide ligation steps utilized several components, including T4 ligase (NEB cat M0202S), T4 DNA Ligase Reaction Buffer 10X (NEB cat B0202SVIAL), and a 1 M NaCl solution. Nuclease-free water was used throughout the process. In 0.2mL PCR tubes, the following reactions were mixed and prepared:
Table 8
[0760] 4 µL of T4 ligase buffer (10X) and 1 µL of T4 ligase were added to each tube (except that no T4 ligase was added to the no-ligase control), and the contents were gently mixed by pipette. The reaction proceeded at RT for 1 hour, then the ligase was heat inactivated at 65 °C for 10 min.
qPCR of Ligation Products and Controls:
[0761] The real-time PCR was performed using SYBR Green Master Mix (Bio-Rad cat 1708880). qPCR cycling was run in standard mode with an initial denaturation step of 3 minutes at 95 °C, followed by 40 cycles of 10 seconds at 95°C and 30 seconds at 60°C. The melt curve stage started at 65°C, increasing by 0.5°C every 5 seconds until 95°C. Primers for amplification included Sys001 PR1 and Sys002 PR3, and appropriate controls were set to assess the efficiency of the qPCR reaction. Data analysis was performed using a qPCR software suite.
Example 9: Stability of Protected and Deprotected DNA Oligonucleotides under Edman Chemical Conditions
[0762] This example assesses the stability of the protected and deprotected DNA oligonucleotides using HPLC. A deprotected 15mer DNA oligonucleotide (CCTGTTGTCAATGAG, Sys#003, LO1) (SEQ ID NO: 126) was obtained from Integrated DNA Technologies. Cleavage, deprotection, and desalting were performed by IDT using their standard process. It was resuspended in Mol Bio grade H2O at 160uM and a 40uL aliquot was dried (Eppendorf, Vacufuge+, 45C for ~ 2 hrs) prior to subjecting to Edman cleavage chemistry. The deprotected oligonucleotide material used for the baseline measurement was resuspended as above, but not dried.
[0763] A protected DNA oligonucleotide (5Phos/ATGAGTG/iFormInd/ AGGGAAATAGCTTCTGGTCGAACTAGTTGTTCGTCAA/idSp/TTTCTTT, Sys#001 , SOCAB) (SEQ ID NO: 125) was liberated from CPG support using EndoIV (NEB, M0304) and desalted (Zymo) to provide five 60 uL aliquots at 70uM of /5Phos/ATGAGTG/iFormInd/ AGGGAAATAGCTTCTGGTCGAACTAGTTGTTCGTCAA (SEQ ID NO: 75), which were dried as described above. The protected oligonucleotide material used for the baseline measurement was desalted as above, but not dried.
[0764] In an anhydrous environment, 40 uL of DMSO was added to each dried aliquot. Following dissolution, 40 uL of TFA was added to each. Solutions were incubated at 45 °C for 0 mins, 30 mins, 60 mins, 4 hrs, and overnight. Samples were neutralized by addition of 228 uL of 4.1 M imidazole solution at the conclusion of their incubation period. HPLC chromatograms were collected for each sample.
Example 10: Associating a Metadata Tag with an Immobilized Peptide Analyte
[0765] This example describes and assesses incorporation of a metadata tag. Anti-all pan myristolation PAb (MyBiosource, MBS2549218) at 6.7 uM is activated by incubation in PBS-EDTA (100 mM sodium phosphate, 150 mM NaCl, 1 mM EDTA, 0.02% sodium azide, pH 7.2) having 500 uM Sulfo-LC-SPDP, sulfosuccinimidyl 6-[3'(2-pyridyldithio)-propionamido] hexanoate (SPDP, Thermo 21650) for 4 hours at room temperature. Excess reagent is removed via spin filtration at 4 °C using an Amicon Ultra-0.5, 10 kDa MWCO, and the activated antibody is washed with PBS having 1 mg/mL BSA and brought to the original volume.
[0766] A 5’ thiol-labeled oligonucleotide having a sequence representing ubiquitin and a modified penultimate alkyne-labeled nucleotide (Glen Research, 5-(octa-1,7-diynyl)-2'-deoxyuridine, 101540) is obtained from IDT and solubilized at 1 mM in water.
[0767] The activated PAb and oligo are combined and allowed to react for 24 hours at 4°C. The resulting disulfide-linked conjugate is purified by HPLC using a SAX column (Agilent, 5190-2463). Alternative methods include utilization of custom small molecules or linkers to incorporate the metadata tag.
[0768] A synthetic peptide having a myristoylated N-terminus and a propargyl C-terminus is obtained from Biomatik. Similarly, a negative control non-myristoylated synthetic peptide is obtained. The peptide is incubated on the surface of a microscope slide previously treated with trimethoxyazidosilane (Gelest) in PBS-EDTA having 1 mM copper sulfate, 1 mM THPTA, and 10 mM sodium ascorbate for 30 minutes at room temperature under conditions that allow attachment to the azide-functionalized surface.
[0769] A 10 nM solution of the anti-myristol-PAb-oligo conjugate in PBS-BSA buffer is incubated on the surface of the slide for 30 minutes at 4 °C, followed by a rinse with copious PBS-BSA. Antibody-directed oligo remaining on the surface through specific binding of the PAb to the myristoyl group of the peptide is covalently immobilized by addition of PBS-EDTA having 1 mM copper sulfate, 1 mM THPTA, and 10 mM sodium ascorbate under conditions that allow attachment to the azide-functionalized surface. Subsequently, the slide surface is treated with PBS having 25 mM DTT and washed with PBS to liberate the PAb from the surface. Results are analyzed by microscopy following hybridization to fluorescently-labeled complement (IDT) and by incubation with Goat anti-Rabbit IgG (H+L) Cross-Adsorbed Secondary Antibody, Alexa Fluor 568 (Thermo Fisher A-11011).
Example 11: Providing Free Amine at the N-terminus Following Immobilization of a Metadata tag
[0770] Synthetic peptide(s) from Example 1 are immobilized to the surface of bead(s) as described in Example 1, and then acylated using acetic anhydride. Recombinant aminopeptidase P1/XPNPEP1 (Bio-techne, 2970-ZN-010) is a soluble form of aminopeptidase capable of cleaving various peptides. It is incubated on the slide surface at 0.1 ug/mL in 50 mM Tris, 250 mM NaCl, 0.5 mM MnCl2, pH 8.0 for 5 minutes at room temperature, followed by washing with copious PBS-EDTA. Existence of a new N-terminal amine group is confirmed using cytometry following the conjugation of the newly exposed amine with FITC at neutral pH. Alternatively, trypsin or ClpAP may be used to cleave various other immobilized peptide motifs.
Example 12: Nucleic Acids for Use as Cycle Tags:
[0771] A useful consideration in preparing CRC conjugates is choosing a nucleic acid structure that is unaffected by chemical conditions of the process. Various nucleic acids have been engineered to have improved stability [Duffy et al. BMC Biology (2020) 18:112], and as such are candidates for use in protein sequencing workflows. These may include backbone-modified nucleic acids, base-modified nucleic acids, and/or nucleic acids with both backbone and base modifications. Oligonucleotides that contain only pyrimidine bases may have improved stability under some conditions [Kratschmer et al., Nucleic Acid Ther. 2017;27(6):335-344]. To utilize modified nucleic acids having only 2 or 3 different types of bases, sequences are designed for the reduced alphabet. Exemplary sequences may conform to design rules to improve specificity and sensitivity of the reverse translation process. The 2-base and 3-base sequences in Table 9 and Table 10 follow the criteria: %GC in range 40-60%, no hairpin structures, no homopolymer runs greater than 3, Tm range +/-2.5 °C, no cross-sequence similarity (heteroduplex) greater than 7 bp (for 3-base system) or no cross-sequence similarity greater than 9 bp (for 2-base system), no self-dimers greater than 3 bp, and less than 4 di-nucleotide repeats. Table 9 and Table 10 provide exemplary sequences for a system having only C & T nucleotides, and for a system having only 7-deaza-adenosine, C, & T nucleotides, respectively. A drawback of 2-base and 3-base sequences is a lack of diversity that can lead to lack of specificity and protein sequencing errors.
Table 9: Two-state Sequences
Table 10: Three-state Sequences
* where A in the sequence is a 7-deaza-adenosine in place of adenosine
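A subset of the design criteria listed in paragraph [0771] can be checked programmatically. The following is a minimal sketch, not the screening procedure used to generate Table 9 or Table 10: it covers only the %GC window, the homopolymer-run limit, and the di-nucleotide repeat limit, and it omits the hairpin, Tm, self-dimer, and cross-sequence checks. The function names and the candidate sequences are hypothetical and are not taken from the tables.

```python
from itertools import groupby


def gc_fraction(seq):
    """Fraction of bases that are G or C (for a C/T-only oligo this is the C fraction)."""
    return sum(base in "GC" for base in seq) / len(seq)


def max_homopolymer_run(seq):
    """Length of the longest run of one repeated base."""
    return max(len(list(group)) for _, group in groupby(seq))


def max_dinucleotide_repeats(seq):
    """Largest number of consecutive copies of a two-base unit such as CT or TC.

    Units made of one repeated base (CC, TT) are skipped because the
    homopolymer rule already covers them.
    """
    best = 0
    for start in range(len(seq) - 1):
        unit = seq[start:start + 2]
        if unit[0] == unit[1]:
            continue
        count, pos = 1, start + 2
        while seq[pos:pos + 2] == unit:
            count += 1
            pos += 2
        best = max(best, count)
    return best


def passes_basic_rules(seq):
    """Apply the three sequence-only criteria stated in the text."""
    return (
        0.40 <= gc_fraction(seq) <= 0.60       # %GC in range 40-60%
        and max_homopolymer_run(seq) <= 3      # no homopolymer runs greater than 3
        and max_dinucleotide_repeats(seq) < 4  # less than 4 di-nucleotide repeats
    )


# Hypothetical two-base (C/T) candidates, for illustration only.
candidates = ["CTTCCTCTTCCTTCC", "CCCCTTTTCTCTCTCT", "TTTCTTTCTTTCTTT"]
accepted = [seq for seq in candidates if passes_basic_rules(seq)]
print(accepted)
```

Of the three hypothetical candidates, only the first satisfies all three rules; the second contains a homopolymer run of four and four consecutive CT repeats, and the third falls below the %GC window.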
Example 13: Processing a Peptide using an ITC-conjugate:
[0772] This example describes a process that can be utilized to derivatize the N-terminal amine of an immobilized peptide with an ITC-conjugate. In this example, 20 uL of a 40 mM solution of methyltetrazine isothiocyanate in DMSO and 1 uL of diisopropylethylamine are added, under anhydrous conditions, to a dried 5 µg sample of silica beads functionalized with a model peptide as described in Example 8. The mixture was incubated for 15 minutes at 40 °C. The solution was then exchanged under anhydrous conditions for a solution of neat phenylisothiocyanate and 1 uL of diisopropylethylamine. This mixture was incubated for 15 minutes at 40 °C, and then the solution was allowed to cool and exchanged into ethanol.
[0773] A CRC of FIG. 41 where X’ is TCO is provided. A 10 uL aliquot of 10 uM CRC is contacted with the beads and the tetrazine moiety of the N-terminal bound ITC-conjugate is allowed to couple with the TCO moiety of the CRC for 40 minutes at room temperature.
[0774] Unreacted CRC was washed from the surface extensively using degassed organic and aqueous solvents. The solution was exchanged for click reaction buffer (neutral pH PBS, 2 mM PMDETA, 1 mM copper sulfate, 10 mM sodium ascorbate) and the alkyne group of the CRC was allowed to react with the surface-bound azide groups for 30 min at room temperature.
[0775] Cleaving the N-terminal amino acid to form the immobilized amino acid complex was completed by exchanging the beads into dry solvent, removing that solvent, and adding anhydrous trifluoroacetic acid.
Example 14: Assembly of memory oligos:
[0776] This example describes the assembly of oligonucleotides on a solid support to form an exemplar memory oligonucleotide. A 5 ug aliquot of beads, prepared using steps (a) through (1) of process 200-3 and the steps illustrated in FIG. 24A-24F to generate an immobilized initiation oligo, is provided. The resulting nucleic acids on the surface are:
Initiator:
TGAGATTACGACAGCTCAATTCGAAGTACAGCATATCCAGACAAGCTGCGTTTCCCTACACTCAT (SEQ ID NO: 205)
In this model system the first 20 nucleotides comprise the Hyb tag'; the next 30 nucleotides comprise the hyb tag ligation oligo; and the last 15 bases comprise the U_AS'.
Cycle i: rArUrGrArGrUrGrUrArGrGrGrArArAGACATTGTACTTTCGTCCAACATGTGTTGGTTTGT CCTCATGACG+A+T+G+A+G+T+G+T+A+G+G+G+A+A+ATT/amine/TT/didioxyC/ (SEQ ID NO: 206)
Cycle i+1: rArUrGrArGrUrGrUrArGrGrGrArArAGAATAGTTGCGAATGTCCAACATGTGTTGGGCCT GATTGTCAGCG+A+T+G+A+G+T+G+T+A+G+G+G+A+A+ATT/amine/TT/didioxyC/ (SEQ ID NO: 207)
Cycle i+2: rArUrGrArGrUrGrUrArGrGrGrArArAAAGCGGATTGAAGTGTCCAACATGTGTTGGCGAG AGCTGTTTCCG+A+T+G+A+G+T+G+T+A+G+G+G+A+A+ATT/amine/TT/didioxyC/ (SEQ ID NO: 208)
[0777] In this model system the first 15 nucleotides comprise the U-AS; the second 15 nucleotides comprise the cycCode; the next 15 nucleotides comprise the U-HS, the next 15 nucleotides comprise the AA code, and the last 15 nucleotides comprise the 3’ U-HS. The ‘r’ designates an RNA base, the ‘+’ designates a LNA base.
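The field layout stated in paragraph [0777] can be illustrated by splitting the Cycle i oligonucleotide (SEQ ID NO: 206) into its five 15-nucleotide fields. The following is a minimal sketch: the parsing rules (an 'r' prefix for RNA, a '+' prefix for LNA, slash-delimited modification codes ignored) follow the notation described above, while the function names and the treatment of the bases after the fifth field as a "tail" are assumptions made for illustration.

```python
import re

# Field order and the 15-nt field length come from paragraph [0777].
FIELDS = ["U-AS", "cycCode", "U-HS", "AA code", "3' U-HS"]
FIELD_LEN = 15

# Cycle i oligo (SEQ ID NO: 206), copied from the listing above without whitespace.
CYCLE_I = (
    "rArUrGrArGrUrGrUrArGrGrGrArArA"
    "GACATTGTACTTTCGTCCAACATGTGTTGGTTTGTCCTCATGACG"
    "+A+T+G+A+G+T+G+T+A+G+G+G+A+A+A"
    "TT/amine/TT/didioxyC/"
)


def tokenize(notation):
    """Return (chemistry, base) pairs; 'rA' is RNA, '+A' is LNA, bare 'A' is DNA."""
    cleaned = re.sub(r"/[^/]*/", "", notation)  # drop /modification/ codes
    cleaned = re.sub(r"\s+", "", cleaned)       # drop extraction whitespace
    chemistry = {"r": "RNA", "+": "LNA", "": "DNA"}
    pairs = re.findall(r"([r+]?)([ACGTU])", cleaned)
    return [(chemistry[prefix], base) for prefix, base in pairs]


def split_fields(notation):
    """Split the first 75 nucleotides into the five named fields; return the rest as a tail."""
    tokens = tokenize(notation)
    needed = FIELD_LEN * len(FIELDS)
    if len(tokens) < needed:
        raise ValueError("sequence is shorter than the five 15-nt fields")
    fields = {}
    for index, name in enumerate(FIELDS):
        chunk = tokens[index * FIELD_LEN:(index + 1) * FIELD_LEN]
        fields[name] = {
            "bases": "".join(base for _, base in chunk),
            "chemistry": sorted({chem for chem, _ in chunk}),
        }
    tail = "".join(base for _, base in tokens[needed:])
    return fields, tail


fields, tail = split_fields(CYCLE_I)
for name in FIELDS:
    print(name, fields[name]["bases"], fields[name]["chemistry"])
print("tail:", tail)
```

Applied to Cycle i, the sketch reports an all-RNA U-AS field, DNA cycCode, U-HS, and AA code fields, an all-LNA 3’ U-HS field, and a four-base DNA tail that carries the amine and terminal blocking modifications.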
[0778] To join the immobilized co-localized recode block nucleic acids, the beads are incubated in a 100 uL volume containing: 10 uL of 10X RNAse-H reaction buffer (NEB, Cat# B0209); 1 uL (5 units) of RNAse-H (NEB, Cat #M0297); 4 uL (32 units) of BST2.0 polymerase (NEB cat#M0537), dNTPs (10 mM), 10 mM MgSO4, 10 uL of 10X PCR buffer; 1 uL of thermolabile Uracil DNA Glycosylase (M0372) for carryover contamination prevention; and 1 uM of a 3’ dideoxy-blocked DNA sequence complementary to the hybtag sequences. The mixture is allowed to incubate for 180 min at 45 °C.
Example 15: Targeting Epitopes on Intact Peptides Prior to Deconstruction
[0779] This example describes a method for analyzing peptides with post-translational modifications (PTMs). PTMs are critical regulators of protein function, but their analysis presents significant technical challenges. To demonstrate the method, we evaluated the specific detection and recording of dinitrophenyl (DNP) modifications using a controlled system in which sequence information about the modification is transferred into nucleic acid form while maintaining the linkage to the peptide sequence information. Specific incorporation of DNP tags into memory oligonucleotides was demonstrated using a model system comparing two synthetic peptides:
[0780] pep10: NH2-{pTyr} {Lys-biotin} {arg}
[0781] pep11: NH2-{pTyr} {Lys-DNP} {Lys-biotin} {arg}
[0782] The peptides were identical except that pep11 contained a DNP modification on lysine, while pep10 lacked DNP, providing a controlled system to evaluate specific DNP tag incorporation.
[0783] Memory oligonucleotides were generated as described previously. Libraries containing the memory oligonucleotides were sequenced using an Element AVITI sequencer with 2x150 bp paired-end reads, generating approximately 1-1.3 million reads per sample. Analysis was performed by DNA-based alignment against expected memory oligo sequences.
[0784] The results demonstrated highly specific incorporation of DNP tags only in the presence of the DNP target. In samples from pep10 (lacking DNP), DNP recode blocks were present at only background levels of 0.14-0.16%. In contrast, with pep11 (containing DNP), DNP recode blocks were substantially enriched to 57-62% of reads with recode blocks in the sample when DNP was incorporated in the first cycle. The >350-fold enrichment of DNP recode blocks in the presence vs. absence of the DNP target demonstrates highly specific tag incorporation.
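The >350-fold figure follows from the reported percentage ranges. A minimal sketch of that arithmetic is given below; the function name is hypothetical, and pairing the lowest signal with the highest background (and vice versa) to bound the ratio is an assumption, since the per-replicate pairings are not stated.

```python
def fold_enrichment(target_pct, background_pct):
    """Ratio of the DNP recode block fraction with the target to the fraction without it."""
    return target_pct / background_pct


# Reported ranges: 57-62% with the DNP peptide, 0.14-0.16% with the peptide lacking DNP.
lowest = fold_enrichment(57.0, 0.16)   # most conservative pairing
highest = fold_enrichment(62.0, 0.14)  # most favorable pairing

print(f"fold enrichment between {lowest:.0f}x and {highest:.0f}x")
```

Under this pairing the most conservative estimate is about 356-fold, which is consistent with the stated >350-fold enrichment.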
[0785] This example demonstrates an effective method for analyzing peptides having post-translational modifications by: (1) contacting peptides containing PTMs with specific binding agents (anti-DNP antibodies) carrying nucleic acid tags representing the modification; (2) transferring the sequence information to a position proximal to the immobilized peptide through specific binding and incorporation into memory oligonucleotides; (3) removing the PTM during the peptide sequencing process; and (4) performing reverse translation protein sequencing to generate nucleic acid sequences that preserve both the peptide sequence and PTM information, as evidenced by the >350-fold enrichment of DNP-specific sequences only in samples containing the DNP modification. The robust specificity and enrichment observed validate this approach for analyzing peptides with post-translational modifications.
Example 16: Use of Alternative Nucleic Acid Tags and Transfer of Information into Polymerizable Molecules
[0786] Disclosed herein are methods for transferring sequence information from non-polymerizable nucleic acid analogs to polymerizable nucleic acids. In certain embodiments, the method comprises utilizing a peptide nucleic acid (PNA) molecule that contains workflow cycle-specific information, wherein said PNA molecule cannot directly participate in enzymatic ligation or polymerization reactions. The method employs a bridging oligonucleotide comprising a first domain complementary to the PNA sequence and a second domain that functions as a splint to facilitate ligation. The bridging oligonucleotide enables the transfer of cycle-specific information from the PNA to a ligation product comprising a recode tag oligonucleotide, which contains information about a detected molecule, and a ligation oligonucleotide, which contains workflow cycle information. Through the action of the bridging oligonucleotide, the recode tag and ligation oligonucleotide are brought into proximity and joined via enzymatic ligation to form a recode block. The resulting recode block incorporates both the molecular identity information from the recode tag and the cycle-specific information originally encoded in the PNA molecule. This method enables the integration of information from non-enzymatically active nucleic acid analogs into amplifiable and sequenceable DNA constructs.
[0787] This example describes a method for transferring information from non-polymerizable nucleic acids to polymerizable nucleic acids by showing the transfer of information from PNA to DNA. A model peptide (PEP6) having the sequence {pTyr,Ser,Lys-FAM,Ser,Lys-Biotin,Ser} (SEQ ID NO: 209) was prepared and conjugated to solid support beads according to methods described in Example 8. The peptide was subjected to five cycles of sequential amino acid removal, wherein cycles 1 and 5 employed tetrazine-modified phenylisothiocyanate (Tz-PITC) in conjunction with chemically-reactive conjugates PPOT2 and PPOT3 (PPOT: N-(Pentynoyl)-5'-oligopeptidenucleic acid-3'-(8-trans(cyclooct-5-enyloxyacetyl)lysine)), respectively, while cycles 2-4 utilized unmodified PITC. The chemically-reactive conjugates PPOT2 and PPOT3 comprised PNA cycle tags having the sequences CTTGCACAGAAGACT (SEQ ID NO: 210, PNA2, 3’ to 5’) and ACTTCAAACCTCTGT (SEQ ID NO: 211, PNA3, 3' to 5'), respectively.
[0788] Bridge oligonucleotides complementary to the PNA cycle tags were synthesized as follows: BR-001 (5’ to 3’): GAACGTGTCTTCTGACCAACTCATGTTGGACGAAAGTACAATGTC/3ddC/ (SEQ ID NO: 212, complementary to PPOT2 PNA)
BR-002 (5’ to 3’): TGAAGTTTGGAGACACCAACTCATGTTGGACATTCGCAACTATTCC (SEQ ID NO: 213, complementary to PPOT3)
[0789] The following recode tag-modified proteins were employed as binding agents:
Anti-phosphotyrosine antibody (Sigma 05-1050X) conjugated to C2-PNA-RT-002 (SEQ ID NO: 214):
(/5Phos/TCCAACATGTGTTGGGCCTGATTGTCAGCG+G+A+G+T+G+T+ATT/iUniAmM/TT/3ddC/)
Streptavidin conjugated to C2-PNA-RT-003 (SEQ ID NO: 215):
(/5Phos/TCCAACATGTGTTGGCGAGAGCTGTTTCCG+G+A+G+T+G+T+ATT/iUniAmM/TT/3ddC/)
[0790] Ligation oligonucleotides specific to each cycle were utilized:
C2-PNA-LO-001 for PPOT2 (SEQ ID NO: 216): (T7T7TCCATGGAGTGTAGACATTGTACTTTCG)
C2-PNA-LO-002 for PPOT3 (SEQ ID NO: 217): (T7T7TCCATGGAGTGTAGAATAGTTGCGAATG)
[0791] The beads containing immobilized amino acid complexes from cycles 1 and 5 were processed according to the following protocol. Unless otherwise specified, all wash steps and incubations were performed in buffer comprising PBS supplemented with BSA and Pluronic acid and Tween surfactants. The binding agents were introduced at 10 nM and incubated with the beads for 1 hour at room temperature. Following a wash step, bridge oligonucleotides were added at 10 µM and incubated for 30 minutes at room temperature. After washing, ligation oligonucleotides (10 µM) were combined with T4 DNA ligase and incubated for 1 hour at room temperature to facilitate formation of the recode blocks. The beads were then subjected to a final wash step.
[0792] For analysis, the processed beads were transferred to PCR reaction vessels. Amplification was performed using primers complementary to the terminal sequences of the fully formed recode blocks according to methods known in the art. PCR analysis demonstrated significant product enrichment in samples containing the peptide compared to background controls. Ct values for the recode block representing phosphotyrosine were 25.8 for the condition with peptide PEP6 (SEQ ID NO: 209) and 29.1 for the control without PEP6, and Ct values for the recode block representing the biotin tag were 20.7 for the condition with PEP6 and 28.4 for the control without PEP6, confirming successful transfer of sequence information from the non-polymerizable PNA cycle tags to amplifiable DNA constructs through the bridge oligonucleotide-mediated ligation process.
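The Ct differences reported above can be converted to approximate fold differences in starting template. The following is a minimal sketch that assumes ideal amplification, i.e. a doubling of product per cycle (100% efficiency); the actual primer efficiency is not reported, so the resulting values are rough estimates only. The function name and the default efficiency parameter are hypothetical.

```python
def fold_from_delta_ct(ct_control, ct_sample, efficiency=2.0):
    """Estimate relative template abundance from a Ct difference.

    Assumes the amount of product is multiplied by `efficiency` each cycle
    (2.0 corresponds to perfect doubling, which is an assumption here).
    """
    return efficiency ** (ct_control - ct_sample)


# Ct values reported for samples with and without peptide PEP6.
ptyr_fold = fold_from_delta_ct(ct_control=29.1, ct_sample=25.8)    # delta Ct = 3.3
biotin_fold = fold_from_delta_ct(ct_control=28.4, ct_sample=20.7)  # delta Ct = 7.7

print(f"phosphotyrosine recode block: ~{ptyr_fold:.0f}-fold over control")
print(f"biotin recode block: ~{biotin_fold:.0f}-fold over control")
```

Under the doubling assumption, the 3.3-cycle difference for the phosphotyrosine recode block corresponds to roughly a 10-fold enrichment, and the 7.7-cycle difference for the biotin recode block corresponds to roughly a 200-fold enrichment.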
[0793] This example demonstrates successful transfer of sequence information from non-polymerizable PNA cycle tags to generate amplifiable DNA recode blocks, directly validating methods for transferring information from the nucleic acid recode tag associated with a binding agent (anti-phosphotyrosine antibody or streptavidin recode tag conjugates) and cycle tag information (PNA cycle tags) to generate recode blocks. The bridge oligonucleotide system enabled conversion of non-amplifiable PNA-based information into DNA constructs suitable for downstream analysis, as evidenced by successful PCR amplification showing significant product enrichment specifically in peptide-containing samples. The ability to transfer information from non-polymerizable nucleic acid analogs through bridge oligonucleotide-mediated ligation provides important flexibility in the choice of information-carrying molecules during the recode block generation process, while ensuring the final product remains compatible with standard DNA amplification and sequencing methods. This approach expands the toolkit for recording and analyzing peptide sequence information by enabling the use of alternative nucleic acid chemistries that may offer advantages in stability or binding properties while maintaining compatibility with DNA-based workflows.
Example 17: Restriction Enzyme-Based Depletion of Memory Oligonucleotides
[0794] This example demonstrates a depletion method using a restriction enzyme recognition site incorporated into a specific recode block sequence. Memory oligonucleotides were generated containing recode blocks for DNP (dinitrophenyl), phosphotyrosine (pY), and biotin/streptavidin targets. The memory oligonucleotides contained single recode blocks or combinations of ligated recode blocks. The DNP recode blocks included a recognition site for a specific restriction enzyme.
[0795] Libraries containing the memory oligonucleotides were prepared and split into two portions. One portion was treated with the restriction enzyme specific to the DNP recode block sequence (treated sample), while the other portion remained untreated (control sample). Both samples were then sequenced using an Element AVITI sequencer with 2x150 bp paired-end reads, generating approximately 1.6 million reads per sample.
[0796] Analysis of the sequencing data using UAS-based anchoring revealed highly efficient and specific depletion of memory oligonucleotides containing DNP recode blocks, regardless of whether they were single-block or multi-block constructs. Memory oligonucleotides containing DNP showed 94.76% depletion in the treated sample compared to control (from 18,160 reads to 952 reads). This depletion was consistent across single DNP block oligos (94.74% depletion) and memory oligos containing DNP plus additional blocks like streptavidin or phosphotyrosine (95.85% depletion).
[0797] Importantly, the depletion was highly specific to DNP-containing sequences. Memory oligonucleotides that did not contain DNP showed no depletion, and in fact showed a slight increase in representation (from 10,506 reads to 11,540 reads, +9.84% change). This demonstrates that the restriction enzyme treatment specifically targeted and depleted memory oligonucleotides containing the DNP recode block while leaving other sequences intact.
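The depletion and change percentages above can be reproduced from the reported read counts. The following is a minimal sketch; the function name is hypothetical, and the calculation uses the raw counts as reported, without any normalization for total library size, since no normalization step is described.

```python
def percent_change(before, after):
    """Signed percent change in read count from the control to the treated sample."""
    return (after - before) / before * 100.0


# Read counts reported for control vs. restriction-enzyme-treated libraries.
dnp_change = percent_change(before=18160, after=952)        # DNP-containing oligos
non_dnp_change = percent_change(before=10506, after=11540)  # oligos without DNP

print(f"DNP-containing: {dnp_change:.2f}%")     # negative value indicates depletion
print(f"non-DNP:        {non_dnp_change:+.2f}%")
```

The computed values, a 94.76% decrease for DNP-containing memory oligonucleotides and a 9.84% increase for the remainder, match the figures stated in paragraphs [0796] and [0797].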
[0798] These results demonstrate that by reverse translating peptides with binding agents containing a restriction enzyme recognition site, nucleic acid sequence with the restriction site was successfully incorporated into sequences representing the target peptide (DNP-containing sequences in this example). Subsequently, introducing the restriction enzyme to these nucleic acid sequences resulted in specific cleavage of sequences containing the recognition site, thereby depleting representation of the targeted sequences from among the nucleic acid population by 94.76%. The high specificity was demonstrated by unaffected levels of non-targeted sequences. When analyzed by next-generation sequencing, the depleted sequences showed dramatically reduced representation while maintaining the ability to detect and quantify the remaining sequences. This example thus provides clear evidence to support the method of selective depletion of targeted peptide sequences through incorporation of restriction sites via binding agents, enzymatic cleavage, and subsequent sequencing.
Example 18: Generation and Validation of Peptide Identification Numbers
[0799] This example demonstrates the utility of incorporating randomized nucleotides within recode blocks to generate unique peptide identification numbers (PINs). In this example, memory oligonucleotides were generated containing recode blocks for DNP (dinitrophenyl), phosphotyrosine (pY), and biotin/streptavidin targets. The memory oligonucleotides contained either single recode blocks or combinations of assembled recode blocks. A single randomized nucleotide position (A, C, G, or T) was incorporated into the cycle tag portion of each recode block during synthesis as described in previous examples.
[0800] Libraries containing the memory oligonucleotides were sequenced using an Element AVITI sequencer with 2x150 bp paired-end reads. Analysis through multiple computational approaches demonstrated successful incorporation and detection of the randomized PIN nucleotides:
[0801] For the first recode block, the normalized distribution of bases at the PIN position showed relatively even incorporation: A (33.9%), C (20.3%), G (19.7%), and T (26.1%). Similarly for the next recode block, the normalized distribution was: A (37.0%), C (18.0%), G (18.7%), and T (26.3%). This even distribution across nucleotides demonstrates unbiased incorporation at the PIN positions.
[0802] For memory oligonucleotides containing both recode blocks, all 16 possible two-nucleotide PIN combinations (AA, AC, AG, AT, etc.) were observed at roughly equivalent frequencies, with no single combination representing more than 12% of the total combinations. This even distribution validates the random incorporation and demonstrates the utility of PINs for unique molecular identification.
[0803] Control experiments using recode blocks having defined (non-random) sequences without intentional PIN positions showed effectively zero erroneous base insertions (0.05% background rate), confirming the specificity of the PIN incorporation process. When examining sequence locations corresponding to where PINs would be located, only rare T insertions were observed at a rate of approximately 0.05%. This low background validates that observed PINs represent incorporated randomized bases rather than synthesis or sequencing artifacts.
[0804] The results validate that incorporating single randomized nucleotides at defined positions within recode blocks during synthesis enables generation of unique molecular identifiers through combinatorial assembly into memory oligonucleotides. The even distribution of bases and combinations, coupled with low background in controls, demonstrates that this approach can reliably generate unique identifiers for tracking individual peptide molecules through the sequencing workflow. This PIN strategy provides molecular counting capabilities similar to traditional UMIs while leveraging the inherent assembly of recode blocks, without requiring separate UMI incorporation steps.
[0805] For a two-PIN system as demonstrated here, up to 16 unique molecular identifiers can be generated. Expanding to additional PIN positions would increase the number of possible unique identifiers exponentially (4^n, where n is the number of PIN positions). This scalable approach enables molecular counting and identification of individual peptides while maintaining compatibility with standard DNA synthesis and sequencing methods. It would be obvious to those skilled in the art to scale this to an arbitrary number of PINs to enable as many unique molecular identifiers as needed for a given application.
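The 4^n scaling of the identifier space, and the degree to which the observed base distributions approach uniformity, can be illustrated numerically. The following is a minimal sketch: the use of Shannon entropy as a uniformity measure is an illustrative choice and is not the analysis described in this example, and the function names are hypothetical. The base frequencies are those reported in paragraph [0801].

```python
import math


def pin_space(n_positions, alphabet_size=4):
    """Number of distinct identifiers available from n randomized positions (4^n for A/C/G/T)."""
    return alphabet_size ** n_positions


def shannon_entropy_bits(percentages):
    """Entropy of a base distribution in bits; 2.0 bits is the maximum for four equally likely bases."""
    total = sum(percentages)
    return -sum((p / total) * math.log2(p / total) for p in percentages if p > 0)


# Normalized PIN base distributions reported for the first and next recode blocks (A, C, G, T).
block_1 = [33.9, 20.3, 19.7, 26.1]
block_2 = [37.0, 18.0, 18.7, 26.3]

print("identifiers with 2 PIN positions:", pin_space(2))
print("identifiers with 8 PIN positions:", pin_space(8))
print(f"block 1 entropy: {shannon_entropy_bits(block_1):.2f} of 2.00 bits")
print(f"block 2 entropy: {shannon_entropy_bits(block_2):.2f} of 2.00 bits")
```

Both distributions retain more than 1.9 of the maximum 2 bits per position, so each PIN position contributes close to its full coding capacity even though adenine is somewhat over-represented.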
Example 19: Capping the N-terminus of a carrier protein
[0806] This example demonstrates a method to render a carrier protein inactive during a reverse translation process, such as process 200 or 300 disclosed herein. Bovine β-lactoglobulin (BLG), which is one of the major proteins in milk, may be used to passivate surfaces such that sample proteins are less likely to adhere non-specifically to surfaces involved in a reverse translation process.
[0807] In certain embodiments, BLG with an N-terminal modification can serve as a carrier protein. In certain embodiments the N-terminus is chemically or enzymatically modified. In one embodiment, the N-terminus of BLG is acetylated using acetic anhydride: BLG (2 mg) is solubilized in 1.0 ml of 50 mM sodium bicarbonate (pH 8.0), containing 6 M urea. Acetic anhydride (3 µL) is added dropwise, while the pH is maintained via addition of aliquots of 100 mM NaOH. The mixture is incubated at room temperature for 3 hours, after which the acetylated protein is desalted on a G-25 Sephadex column.
Example 20: Capping the C-terminus of a carrier protein
[0808] This example demonstrates a method to render a carrier protein inactive during a reverse translation process, such as process 200 or 300 disclosed herein. In one embodiment, the C-terminus of Bovine Serum Albumin (BSA) is amidated with β-alanine. BSA is useful as a general protein blocking agent because it can prevent sample analyte proteins of interest from being adsorbed onto surfaces non-specifically. In certain embodiments, BSA with a C-terminal modification can serve as a carrier protein to achieve maximum selectivity during immobilization of the analyte to a surface.
[0809] To render the C-terminus of BSA incapable of immobilizing to a surface, it is amidated with β-alanine amide using the following procedure: BSA (1.5 mg), EDC (1.5 mg), and sulfo-NHS (3.5 mg) are dissolved in 2.5 ml of buffer (10 mM MES, 150 mM NaCl, pH 6.0). The mixture is incubated at room temperature for 30 minutes, after which 1.5 ml of a 20 mM solution of β-alanine amide in buffer (100 mM phosphate buffer, 50 mM NaCl, pH 7.6) is added. Following incubation for an additional 4 hours at room temperature, the reaction solution is desalted on a G-25 Sephadex column.
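As a non-limiting worked calculation, the reagent quantities recited above can be converted to molar amounts, for example in Python as sketched below. The molar mass used for BSA (about 66.4 kDa) is an assumption that is not stated in the procedure, and the helper names are hypothetical.

```python
# Assumed molar mass of BSA (about 66.4 kDa); not stated in the procedure.
BSA_MOLAR_MASS_G_PER_MOL = 66_400.0

def micromoles_from_mass(mass_mg, molar_mass_g_per_mol):
    """Convert a mass in milligrams to micromoles."""
    return mass_mg / 1000.0 / molar_mass_g_per_mol * 1e6

def micromoles_from_solution(volume_ml, conc_mm):
    """Micromoles of solute in volume_ml of a conc_mm (millimolar) solution."""
    return volume_ml * conc_mm  # mL x mmol/L = micromoles

# 1.5 mg BSA and 1.5 ml of 20 mM beta-alanine amide, as recited above.
bsa_umol = micromoles_from_mass(1.5, BSA_MOLAR_MASS_G_PER_MOL)
amine_umol = micromoles_from_solution(1.5, 20.0)
molar_excess = amine_umol / bsa_umol

# Amine concentration once the 2.5 ml and 1.5 ml solutions are combined.
final_amine_mm = amine_umol / (2.5 + 1.5)
```

Under the stated assumption, the amine is present in an excess of roughly a thousand-fold over protein molecules. Because BSA also carries many side-chain carboxyl groups, the excess per activated carboxyl group would be correspondingly lower.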
Example 21: Memory Oligo Assembly
[0810] Disclosed herein is an example demonstrating the ability to systematically assemble amplicons amenable for NGS sequencing that retain information related to peptide analytes (e.g. a memory oligo). In certain embodiments, the method comprises assembly of amplicons from recode blocks. In some embodiments, the recode block sequence comprises: nucleic acid sequence denoting amino acid identity, relative location of the immobilized peptide target, cycle information, sequence helpful for directing assembly of recode block, such as a universal hybridization sequence, and sequence helpful for directing assembly of memory oligos, such as a universal assembly sequence, or restriction endonuclease sites, primer sites, and PIN sequence components.
Prepare Beads having a Starter sequence associated with each immobilized peptide
[0811] To demonstrate the transfer of amino acid information via recode blocks into a nucleic acid-based memory oligo, a model peptide (PEP11) having the sequence (N->C) {pTyr} {Lys-DNP} {Lys-biotin} {arg} was prepared and conjugated to solid support beads according to methods described in Example 8. Once bound, the peptide was coupled to an ITC conjugate and further conjugated to a functionalized oligonucleotide according to methods described in Example 13. The sequence of the functionalized oligonucleotide comprised a hybridization sequence (SEQ ID NO: 220, S4-DNA-HT-001 (5’->3’): /5hexynyl/T/iUniAmM/GAACGTGTCTTCTGATGAAGTTTGGAGACAAATTGCGTGGGAGCA).
[0812] To initiate the workflow, the bead-bound hybridization sequence was incubated with a complementary oligonucleotide (SEQ ID NO: 221, S4-DNA-SE-001 (5’->3’): T*T*T*T*T*CTGCAGGAGTGTATTGACGAACAACTAGTTCGACCAGAAGCTATGCTCCCACGCAATTTGTCTCCAAACTTCATCAGAAGACACGTTC/3ddC/) at 25 nM in Isothermal Amplification Complete Buffer with dNTP at 200 µM (IAB) (New England Biolabs) at RT for 30 minutes, and then extended in a Bst2.0 polymerase (NEB, Cat# M0275) reaction under standard conditions, incubated at 65 °C for 30 min, followed by heat inactivation at 80 °C for 20 min. Notable features of the complementary oligonucleotide include: a universal assembly sequence helpful for assembly of recode block information (U-AS), a PstI restriction endonuclease site to control access to the universal sequence, and phosphorothioate linkages between the five 5’ terminal nucleotides of the sequence. Following the extension reaction, excess solution-phase oligonucleotide was removed through a 6x sequential wash procedure in buffer comprising PBS supplemented with BSA and Pluronic and Tween surfactants (PBSAT+B+P). Briefly, beads were sequentially centrifuged at 14K x g at 25 °C for 1 min, and the supernatant was repeatedly discarded and replaced by PBST+B+P.
The output of this process produced a bead set comprising a dsDNA Starter covalently bound to PEP11 on the bead surface.
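For illustration only, the relationship between the bead-bound hybridization sequence (SEQ ID NO: 220) and the complementary oligonucleotide (SEQ ID NO: 221) can be checked computationally, for example with the Python sketch below. The sketch treats the bases of SEQ ID NO: 220 that follow the internal modifier as the hybridization region, which is an assumption, and omits the phosphorothioate marks and the terminal 3' dideoxy-C of SEQ ID NO: 221. The variable and function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a plain A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]

# Bases of SEQ ID NO: 220 after the internal modifier (assumed hybridization region).
HYB = "GAACGTGTCTTCTGATGAAGTTTGGAGACAAATTGCGTGGGAGCA"

# Bases of SEQ ID NO: 221 without phosphorothioate marks or the terminal ddC.
TEMPLATE = ("TTTTTCTGCAGGAGTGTATTGACGAACAACTAGTTCGACCAGAAGCTATG"
            "CTCCCACGCAATTTGTCTCCAAACTTCATCAGAAGACACGTTC")

PSTI_SITE = "CTGCAG"  # PstI recognition sequence

# The 3' end of the template should pair with the bead-bound hybridization region.
hyb_anneals = TEMPLATE.endswith(reverse_complement(HYB))

# Position (0-based) of the PstI site within the template.
psti_index = TEMPLATE.find(PSTI_SITE)

# Strand expected from extending the bead-bound oligo along the template.
extended = reverse_complement(TEMPLATE)
```

In this sketch the extended strand begins with the hybridization region and ends with the complement of the PstI site and 5' thymidine run of the template, consistent with a double-stranded Starter that carries the restriction site at its distal end.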
Preparation of Binding Agents
[0813] Binding agents toward phosphotyrosine and toward DNP were prepared by first conjugating DBCO to antibodies (Sigma, cat# 05-321X & Sigma, cat# D8406, respectively) using an oYo-Link® DBCO - Mouse IgG1 kit (Alpha Thera) under recommended conditions. The conjugated antibodies were incubated with 5’ azide functionalized oligonucleotides in PBS at 4 °C overnight (PNA-RT2-240402 and PNA-RT1-240402 below), and purified using 30k MWCO spin filters. Binding agents toward streptavidin were prepared as described in Experiment 8 using PNA-RT3-240402 (below). The extent of oligonucleotide conjugation was in the range of 5% to 1% as determined by functional testing with cytometric analyses for the antibodies and > 90% for streptavidin. Model binding agents were used without further purification.
Preparation of model binding agents having complete recode blocks
[0814] Model binding agents having complete recode blocks were pre-assembled in solution by ligating an oligonucleotide sequence containing both cycle and U-AS’ information to binding agents using T4 ligase under standard conditions, incubated at 25 °C for 1 hr. Complementary bridge oligo splints were present at equimolar concentrations with both binding agents and ligation oligos. The recode tag of the binding agent and the ligated oligo both contain U-AS’ sequences that participate in Memory Oligo formation, facilitating a directional assembly. The output of this process produced 3 ‘model binding agents’ at 100 µM, each comprising a unique recode block sequence.
[0815] Recode Block #1 (anti-DNP):
[0816] PNA-LO1-240402 (SEQ ID NO: 222)(5’->3’):
T*T*T*T*T*CTGCAGGAGTGTAG*A*C*A*T*TGTACTTTCG
[0817] PNA-RT1-240402 (SEQ ID NO: 223)(5’->3’):
/5Phos/TCCAACATGTGTTGGTTTGTCCTCATGACGTGCAG+G+A+G+T+G+T+ATT/iAzideN/TT/3ddC/
[0818] C2-PNA-BR-001f (SEQ ID NO: 224)(5’->3’):
G*C*G*A*C*ATAACCATTGTGAGGACAAACCAACACATGTTGGACGAAAGTACAATGTC/3ddC/
[0819] Recode Block #2 (anti-Phosphotyrosine):
[0820] PNA-LO2-240402 (SEQ ID NO: 225)(5’->3’):
T*T*T*T*T*CTGCAGGAGTGTAG*A*A*T*A*GTTGCGAATG
[0821] PNA-RT2-240402 (SEQ ID NO: 226)(5’->3’):
/5Phos/TCCAACATGTGTTGGGCCTGATTGTCAGCGTGCAG+G+A+G+T+G+T+ATT/iAzideN/TT/3ddC/
[0822] C2-PNA-BR-002f (SEQ ID NO: 227)(5’->3’):
A*T*A*G*T*CAAGAGACGCACAATCAGGCCCAACACATGTTGGACATTCGCAACTATTC/3ddC/
[0823] Recode Block #3 (Biotin):
[0824] PNA-LO3-240402 (SEQ ID NO: 228)(5’->3’):
T*T*T*T*T*CTGCAGGAGTGTAA*A*G*C*G*GATTGAAGTG
[0825] PNA-RT3-240402 (SEQ ID NO: 229)(5’->3’):
/5Phos/TCCAACATGTGTTGGCGAGAGCTGTTTCCGTGCAG+G+A+G+T+G+T+ATT/iAzideN/TT/3ddC/
[0826] C2-PNA-BR-003f (SEQ ID NO: 230)(5’->3’):
A*C*T*T*G*CAACGCCGAGACAGCTCTCGCCAACACATGTTGGACACTTCAATCCGCTT/3ddC/
[0827] Here, * indicates a phosphorothioate modification, where a sulfur atom replaces a nonbridging oxygen in the phosphate backbone between nucleotides. This modification is useful to provide resistance to nuclease degradation. The “iAzideN” represents an internal azide modification incorporated through an NHS ester functional group at a thymine (T) position in the oligonucleotide. Such a modification adds a dT nucleotide at its position of incorporation. Further, + designates an LNA base.
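In a non-limiting illustration, the notation described above can be reduced to a plain base sequence, and the splint design for Recode Block #1 can be checked, using a Python sketch such as the following. The helper names are hypothetical. Treating the 3' dideoxy-C as a C base, and comparing 15 bases of the ligation oligo with 25 bases of the recode tag, are assumptions made for this sketch; the window lengths were chosen by inspection of SEQ ID NOs: 222 to 224.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a plain A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]

def bare_bases(annotated):
    """Reduce an annotated oligo string to its plain base sequence."""
    s = annotated.replace("/iAzideN/", "T")  # adds a dT at its position (per the text)
    s = s.replace("/3ddC/", "C")             # assumption: count the 3' dideoxy-C as C
    s = re.sub(r"/[^/]+/", "", s)            # drop other modifiers, e.g. /5Phos/
    return re.sub(r"[*+\s]", "", s)          # drop phosphorothioate and LNA marks, spaces

LIGATION_OLIGO = "T*T*T*T*T*CTGCAGGAGTGTAG*A*C*A*T*TGTACTTTCG"  # SEQ ID NO: 222
RECODE_TAG = ("/5Phos/TCCAACATGTGTTGGTTTGTCCTCATGACGTGCAG"
              "+G+A+G+T+G+T+ATT/iAzideN/TT/3ddC/")             # SEQ ID NO: 223
BRIDGE = ("G*C*G*A*C*ATAACCATTGTGAGGACAAACCAACACATGTTGGA"
          "CGAAAGTACAATGTC/3ddC/")                              # SEQ ID NO: 224

lo, rt, br = (bare_bases(x) for x in (LIGATION_OLIGO, RECODE_TAG, BRIDGE))

# Junction formed when the 3' end of the ligation oligo meets the
# 5' phosphate of the recode tag.
junction = lo[-15:] + rt[:25]

# A splint spans the junction if the junction pairs with part of the bridge.
splint_spans_junction = junction in reverse_complement(br)
```

The same check may be applied to Recode Blocks #2 and #3 by substituting the corresponding sequences.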
Memory Oligo assembly
[0828] The process to create a memory oligo comprised 4 sequential steps: 1) binding a model binding agent to the immobilized peptide, 2) endonuclease and exonuclease introduction to expose a single-stranded U-AS sequence of the starter oligonucleotide, 3) hybridization of the recode block of a model binding agent to a universal assembly sequence, and 4) transfer of information of the model binding agent to a memory oligo via a polymerase extension reaction. These sequential steps were iterated three times to assemble a memory oligo having information of three recode blocks. Briefly, beads having a Starter sequence associated with each immobilized peptide were incubated with one of the model binding agents described above at 25 nM for 2 hr at RT in PBSAT+B+P on a rotator. Following incubation, excess unbound materials were removed via the washing procedure described above. The beads were incubated with PstI restriction endonuclease (NEB cat# R0140) and Lambda exonuclease (NEB cat# M0262) in buffer (NEB, 1X rCutSmart) at 37 °C for 30 min to expose the 5’ U-AS sequence of the starter oligonucleotide in iteration 1 (or to expose the 5’ U-AS sequence of the growing memory oligo in iteration 2 or 3). This allowed hybridization of the ssDNA U-AS’ segment of the recode block of the associated model binding agent to the ssDNA U-AS’ portion of the memory oligo. Information of the recode block was transferred via incubation in a Bst2.0 polymerase reaction in IAB at 65 °C for 30 min, followed by heat inactivation at 80 °C for 20 min. A stringent wash via the washing procedure described above removed the model binding agent to reset the system for the second and third iterations of memory oligo extension. Anti-phosphotyrosine, anti-DNP, and anti-biotin model binding agents were applied in iterations 1, 2, and 3, respectively.
[0829] To facilitate amplification of the memory oligo, a terminator oligonucleotide (SEQ ID NO: 231, 5’->3’: T*T*T*T*T*CTAAGATTGGGCATTACAGCGTTAAGTGTTTGCAG+G+A+G+T+G+T+A) having a sequence complementary to an amplification primer was appended to the memory oligo after the third iteration via the same steps described above, with the following modifications: step (1) was omitted and, in step (3), instead of hybridizing the recode block of a model binding agent, the terminator oligonucleotide was hybridized at 25 nM to the exposed memory oligo U-AS’ sequence for 30 min at RT.
[0830] For analysis, the processed beads were transferred to PCR reaction vessels. Amplification was performed using primers complementary to the terminal sequences of the memory oligo according to methods known in the art (SEQ ID NO: 232, FP Extended DNA Starter (5’->3’): GAACGTGTCTTCTGATGAAGTTTGG; and SEQ ID NO: 233, S4-DNA-RP-Term-001 (5’->3’): CTAAGATTGGGCATTACAGCGTTAAGTGTT). Under these conditions, PCR analyses demonstrated significant product enrichment (delta Ct > 8) in samples containing the peptide compared to background controls without peptide or without Starter, confirming successful transfer of amino acid-specific sequence information. In addition, amplified products showed molecular weight profiles (10% PAGE TBE gel) consistent with 1, 2, or 3 Recode Blocks incorporated into the memory oligo, providing insights into the reaction efficiencies of the nucleic acid assembly process and materials purity. NGS analysis provided evidence of assembly of the anticipated sequences, with 99%, 0.95%, and 0.01% of memory oligos holding information for 1, 2, and 3 recode blocks, respectively.
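As one hedged way to read the reported NGS distribution, the shares of memory oligos holding 1, 2, and 3 recode blocks can be converted into apparent conditional yields for adding a further block, for example with the Python sketch below. The sketch assumes that blocks accumulate sequentially and ignores memory oligos holding no recode block, which are not reported; it is a simplification and not an analysis performed in the example. The function names are hypothetical.

```python
# Reported share (%) of memory oligos holding 1, 2, and 3 recode blocks.
observed_pct = {1: 99.0, 2: 0.95, 3: 0.01}

def at_least(k, pct):
    """Share (%) of memory oligos holding k or more recode blocks."""
    return sum(share for n_blocks, share in pct.items() if n_blocks >= k)

def conditional_yield(k, pct):
    """Fraction of oligos with at least k-1 blocks that also hold a k-th block."""
    return at_least(k, pct) / at_least(k - 1, pct)

# Apparent yield of gaining a second block, and then a third block.
second_block_yield = conditional_yield(2, observed_pct)
third_block_yield = conditional_yield(3, observed_pct)
```

Under these assumptions both apparent yields are on the order of one percent, which is one way of expressing the reaction efficiencies referred to above.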
***
[0831] While this disclosure is satisfied by embodiments in many different forms, as described in detail in connection with preferred embodiments of the disclosure, it is understood that the present disclosure is to be considered as exemplary of the principles of the disclosure and is not intended to limit the disclosure to the specific embodiments illustrated and described herein. Numerous variations may be made by persons skilled in the art without departure from the spirit of the disclosure. The scope of the disclosure will be measured by the appended claims and their equivalents. The abstract and the title are not to be construed as limiting the scope of the present disclosure, as their purpose is to enable the appropriate authorities, as well as the general public, to quickly determine the general nature of the disclosure. In the claims that follow, unless the term “means” is used, none of the features or elements recited therein should be construed as means-plus-function limitations pursuant to 35 U.S.C. §112, ¶6.

Claims

1. An assay method, comprising: contacting peptides with a binding agent that targets or binds to a high-abundance peptide of the peptides, wherein the binding agent comprises a nucleic acid sequence comprising a recognition site for a restriction enzyme or a complement of a recognition site for a restriction enzyme; and performing reverse translation protein sequencing of the peptides to generate nucleic acid sequences representing the peptides, the nucleic acid sequences representing the peptides comprising a nucleic acid sequence representing the high-abundance peptide, wherein the reverse translation protein sequencing comprises: incorporating the nucleic acid sequence comprising said recognition site for the restriction enzyme or complement thereof into the nucleic acid sequence representing the high-abundance peptide, contacting the nucleic acid sequences with the restriction enzyme, thereby cleaving the nucleic acid sequence representing high-abundance peptides and thereby depleting representation of the high-abundance peptide from among the nucleic acid sequences representing the peptides, and sequencing the nucleic acid sequences representing the peptides.
2. The method of claim 1, wherein the binding agent comprises a recode tag comprising the nucleic acid sequence comprising the recognition site for the restriction enzyme or complement thereof.
3. The method of claim 2, further comprising generating a recode block comprising sequence information of a cycle tag, and sequence information of the recode tag.
4. The method of claim 3, further comprising generating a memory oligonucleotide from the recode block and other recode blocks.
5. The method of claim 1, wherein the peptide is immobilized by being bound to a solid support.
6. A method for analyzing a peptide having a post-translational modification, comprising: contacting the peptide having the post-translational modification with a binding agent that targets or binds to the post-translational modification, wherein the binding agent comprises a nucleic acid having a first nucleic acid sequence representing the post-translational modification, and wherein the peptide is immobilized to a solid support; transferring first sequence information of the first sequence or of a complement of the first sequence to the peptide, or to the peptide or a position of the solid support proximal to the peptide when the peptide is immobilized; removing the post-translational modification from the peptide; performing reverse translation protein sequencing of the peptide to generate a second nucleic acid sequence, the second nucleic acid sequence representing at least one amino acid of the peptide, wherein the first sequence information representing said post-translational modification is incorporated into the second nucleic acid sequence, and the second nucleic acid sequence is sequenced.
7. A method for analyzing a peptide having a post-translational modification, comprising: contacting the peptide having the post-translational modification with a binding agent that targets or binds to the post-translational modification, wherein the binding agent comprises a nucleic acid having a nucleic acid sequence representing the post-translational modification, and wherein the peptide is immobilized to a solid support; transferring sequence information of the binding agent to a position of the solid support proximal to the peptide; removing the post-translational modification from the peptide; performing reverse translation protein sequencing of the peptide to generate a nucleic acid sequence representing at least one amino acid of the peptide; combining the information of the nucleic acid representing the post-translational modification and the nucleic acid representing at least one amino acid of the peptide sequence.
8. The method of claim 6 or 7, wherein the post-translational modification comprises phosphorylation, glycosylation, glycanation, methylation, acetylation, ubiquitination, carboxylation, hydroxylation, biotinylation, pegylation, or succinylation.
9. The method of claim 6 or 7, wherein the binding agent comprises a recode tag comprising the first nucleic acid sequence representing the post-translational modification or a complement thereof.
10. The method of claim 9, further comprising generating a recode block comprising sequence information of a cycle tag, and sequence information of the recode tag.
11. The method of claim 10, further comprising generating a memory oligonucleotide from the recode block and other recode blocks.
12. The method of claim 6 or 7, further comprising: based on the sequenced second nucleic acid sequence, determining identity and positional information of the post-translational modification within the peptide.
13. A method for determining identity and positional information of an amino acid residue of a peptide coupled to a solid support, the method comprising:
(a) coupling the peptide to the solid support such that an N-terminal amino acid residue of the peptide is not directly coupled to the solid support and is exposed to reaction conditions;
(b) performing (b1) or (b2): (b1) providing a chemically-reactive conjugate comprising (x) a cycle tag comprising a cycle nucleic acid associated with a cycle number; (y) a reactive moiety for binding and cleaving the N-terminal amino acid residue of the peptide; and (z) an immobilizing moiety for immobilization to the solid support, or
(b2) coupling the N-terminal amino acid with a first reactive moiety, and providing a chemically-reactive conjugate comprising (x) a cycle tag comprising a cycle nucleic acid associated with a cycle number; (y) a second reactive moiety that binds or reacts with the first reactive moiety; and (z) an immobilizing moiety for immobilization to the solid support;
(c) contacting the peptide with the chemically-reactive conjugate of (b1) thereby coupling the chemically-reactive conjugate to the N-terminal amino acid of the peptide to form a conjugate complex, or contacting the peptide with the chemically-reactive conjugate of (b2) thereby coupling the chemically-reactive conjugate to the first reactive moiety coupled to the N-terminal amino acid of the peptide to form the conjugate complex;
(d) immobilizing the conjugate complex to the solid support via the immobilizing moiety;
(e) cleaving and thereby separating the N-terminal amino acid residue from the peptide, thereby exposing the next amino acid residue as the N-terminal amino acid residue on the cleaved peptide, and thereby providing an immobilized amino acid complex, the immobilized amino acid complex comprising the cleaved and separated N-terminal amino acid residue;
(f) contacting the immobilized amino acid complex with a binding agent, the binding agent comprising: a binding moiety for preferentially binding to the immobilized amino acid complex; and a recode tag comprising a recode nucleic acid corresponding with the binding agent, thereby forming an affinity complex, the affinity complex comprising an immobilized amino acid complex and a binding agent and thereby bringing a cycle tag into proximity with a recode tag within each formed affinity complex;
(g) transferring the information of the nucleic acid recode tag associated with the first binding agent and the cycle tag of the first immobilized conjugate complex to generate a first recode block;
(h) obtaining sequence information for the recode block; and
(i) based on the obtained sequence information, determining identity and positional information of an amino acid residue of the peptide.
14. A method for determining identity and positional information of an amino acid residue of a peptide coupled to a solid support, the method comprising:
(a) coupling the peptide to the solid support such that an N-terminal amino acid residue of the peptide is not directly coupled to the solid support and is exposed to reaction conditions;
(b) coupling the N-terminal amino acid with a first reactive moiety;
(c) providing a chemically-reactive conjugate comprising (x) a cycle tag comprising a cycle nucleic acid associated with a cycle number; (y) a second reactive moiety that binds or reacts with the first reactive moiety; and (z) an immobilizing moiety for immobilization to the solid support;
(d) contacting the peptide with the chemically-reactive conjugate thereby coupling the chemically-reactive conjugate to the first reactive moiety coupled to the N-terminal amino acid of the peptide to form the conjugate complex;
(e) immobilizing the conjugate complex to the solid support via the immobilizing moiety;
(f) cleaving and thereby separating the N-terminal amino acid residue from the peptide, thereby exposing the next amino acid residue as the N-terminal amino acid residue on the cleaved peptide, and thereby providing an immobilized amino acid complex, the immobilized amino acid complex comprising the cleaved and separated N-terminal amino acid residue;
(g) contacting the immobilized amino acid complex with a binding agent, the binding agent comprising: a binding moiety for preferentially binding to the immobilized amino acid complex; and a recode tag comprising a recode nucleic acid corresponding with the binding agent, thereby forming an affinity complex, the affinity complex comprising an immobilized amino acid complex and a binding agent and thereby bringing a cycle tag into proximity with a recode tag within each formed affinity complex;
(h) transferring the information of the nucleic acid recode tag associated with the first binding agent and the cycle tag of the first immobilized conjugate complex to generate a first recode block;
(i) obtaining sequence information for the recode block; and
(j) based on the obtained sequence information, determining identity and positional information of an amino acid residue of the peptide.
15. The method of claim 13 or 14, wherein the first reactive moiety comprises isothiocyanate (ITC) or an ITC conjugate.
16. The method of any one of the previous claims, wherein the binding moiety comprises an antibody or a fragment thereof, or aptamer.
17. The method of claim 13 or 14, wherein the binding moiety specifically binds an N-terminal modification.
18. The method of claim 17, wherein the N-terminal modification comprises a post-translational modification.
19. The method of claim 13 or 14, further comprising, before contacting the immobilized amino acid complex with the binding agent: contacting the immobilized amino acid complex with a modification binding agent, wherein the N-terminal amino acid comprises a post-translational modification, and wherein the modification binding agent comprises a modification binding moiety for preferentially binding to the post-translational modification and a modification recode tag comprising a recode nucleic acid corresponding with the modification binding agent, thereby forming a complex comprising an immobilized post-translationally modified amino acid complex and a modification binding agent and thereby bringing a cycle tag into proximity with a modification recode tag.
20. The method of claim 13 or 14, further comprising, before contacting the immobilized amino acid complex with the binding agent: contacting the immobilized amino acid complex, wherein the N-terminal amino acid comprises a post-translational modification, with a binding agent, wherein the binding agent comprises a binding moiety for preferentially binding to the post-translational modification; and a recode tag corresponding with the post-translational modification, thereby forming an affinity complex comprising the immobilized amino acid complex and binding agent, thereby bringing a cycle tag into proximity with the recode tag.
21. The method of claim 17, further comprising removing the N-terminal modification from the N-terminal amino acid.
22. The method of claim 17, further comprising: based on the obtained sequence information, determining identity and positional information of the N-terminal modification.
23. The method of any one of claims 5 or 12-14, wherein the solid support comprises a bead, plate, chip, slide, glass, silica, resin, gel, hydrogel, membrane, polystyrene, metal, nitrocellulose, mineral, plastic, polyacrylamide, latex, or ceramic.
24. The method of any one of claims 1, 6, 7, 13 or 14, wherein the peptide or peptides comprises a hormone, neurotransmitter, enzyme, antibody, viral protein, bacterial protein, synthetic peptide, bioactive peptide, peptide hormone, oligopeptide, polypeptide, fusion protein, cyclic peptide, branched peptide, recombinant protein, tumor marker, therapeutic peptide, antigenic peptide, or signaling peptide.
25. The method of any one of claims 1, 6, 7, 13 or 14, wherein the peptide or peptides are associated with a disease.
26. The method of any one of claims 1, 6, 7, 13 or 14, wherein the peptide or peptides are obtained from a sample comprising a cell lysate, blood sample, plasma sample, serum sample, tissue biopsy, saliva sample, urine sample, cerebrospinal fluid sample, sweat sample, synovial fluid sample, fecal sample, gut microbiome sample, environmental water sample, soil sample, bacterial culture, viral culture, organoid, tumor biopsy, sputum sample, or hair sample.
PCT/US2025/011251 2024-01-11 2025-01-10 Determination of protein information by recoding amino acid polymers into dna polymers with metadata tagging Pending WO2025151826A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US202463620076P 2024-01-11 2024-01-11
US63/620,076 2024-01-11
US202463551359P 2024-02-08 2024-02-08
US63/551,359 2024-02-08

Publications (1)

Publication Number Publication Date
WO2025151826A1 true WO2025151826A1 (en) 2025-07-17

Family

ID=96387613

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2025/011251 Pending WO2025151826A1 (en) 2024-01-11 2025-01-10 Determination of protein information by recoding amino acid polymers into dna polymers with metadata tagging

Country Status (1)

Country Link
WO (1) WO2025151826A1 (en)

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 25739364

Country of ref document: EP

Kind code of ref document: A1