WO2025137775A1 - Method of generating and screening synthetic peptide aptamer libraries - Google Patents

Info

Publication number
WO2025137775A1
Authority
WO
WIPO (PCT)
Prior art keywords
peptide
amino acid
synthetic peptide
amino acids
candidate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/CA2024/051738
Other languages
French (fr)
Inventor
John G. Marshall
Peter BOWDEN
Jaimie DUFRESNE
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Publication of WO2025137775A1
Legal status: Pending
Anticipated expiration

Classifications

    • C: CHEMISTRY; METALLURGY
    • C40: COMBINATORIAL TECHNOLOGY
    • C40B: COMBINATORIAL CHEMISTRY; LIBRARIES, e.g. CHEMICAL LIBRARIES
    • C40B40/00: Libraries per se, e.g. arrays, mixtures
    • C40B40/04: Libraries containing only organic compounds
    • C40B40/10: Libraries containing peptides or polypeptides, or derivatives thereof
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00: Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09: Recombinant DNA-technology
    • C12N15/11: DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C12N15/113: Non-coding nucleic acids modulating the expression of genes, e.g. antisense oligonucleotides; Antisense DNA or RNA; Triplex-forming oligonucleotides; Catalytic nucleic acids, e.g. ribozymes; Nucleic acids used in co-suppression or gene silencing
    • C12N15/1138: Non-coding nucleic acids modulating the expression of genes, e.g. antisense oligonucleotides; Antisense DNA or RNA; Triplex-forming oligonucleotides; Catalytic nucleic acids, e.g. ribozymes; Nucleic acids used in co-suppression or gene silencing against receptors or cell surface proteins
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00: Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09: Recombinant DNA-technology
    • C12N15/11: DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C12N15/115: Aptamers, i.e. nucleic acids binding a target molecule specifically and with high affinity without hybridising therewith; Nucleic acids binding to non-nucleic acids, e.g. aptamers
    • C: CHEMISTRY; METALLURGY
    • C40: COMBINATORIAL TECHNOLOGY
    • C40B: COMBINATORIAL CHEMISTRY; LIBRARIES, e.g. CHEMICAL LIBRARIES
    • C40B30/00: Methods of screening libraries
    • C40B30/04: Methods of screening libraries by measuring the ability to specifically bind a target molecule, e.g. antibody-antigen binding, receptor-ligand binding
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B15/00: ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
    • G16B15/20: Protein or domain folding
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B15/00: ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
    • G16B15/30: Drug targeting using structural data; Docking or binding prediction
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00: ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B40/10: Signal processing, e.g. from mass spectrometry [MS] or from PCR
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2310/00: Structure or type of the nucleic acid
    • C12N2310/10: Type of nucleic acid
    • C12N2310/16: Aptamers
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2330/00: Production
    • C12N2330/30: Production chemically synthesised
    • C12N2330/31: Libraries, arrays
    • G: PHYSICS
    • G01: MEASURING; TESTING
    • G01N: INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00: Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48: Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/50: Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
    • G01N33/68: Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving proteins, peptides or amino acids
    • G01N33/6803: General methods of protein analysis not limited to specific proteins or families of proteins
    • G01N33/6845: Methods of identifying protein-protein interactions in protein mixtures
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H20/00: ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance
    • G16H20/10: ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance relating to drugs or medications, e.g. for ensuring correct administration to patients

Definitions

  • a method of identifying a synthetic peptide aptamer, from a library comprising a plurality of synthetic peptide aptamers, that binds a target comprising: providing the peptide aptamer library comprising the plurality of synthetic peptide aptamers, the plurality of synthetic peptide aptamers being generated from at least one synthetic peptide, the at least one synthetic peptide comprises two or more domains separated by a protease cleavage site, wherein at least one domain comprises at least one amino acid that is randomly selected from a subset of the 20 common amino acids; contacting the peptide aptamer library with the target; removing unbound synthetic peptide aptamers after contacting the peptide aptamer library with the target; and analyzing the bound synthetic peptide aptamers, comprising: generating a MS/MS query spectrum of the bound peptide aptamer; receiving one or more parameters of the query spectrum;
  • At least one domain comprises at least one random amino acid.
  • the at least one random amino acid is randomly selected from a subset of amino acids.
  • the subset of amino acids comprises a subset of fewer than 5 amino acids.
  • the cleavage site comprises a protease cleavage site.
  • the at least one synthetic peptide further comprises at least one predetermined amino acid at a predetermined position within at least one domain.
  • Figure 1 is an embodiment of the method to create new peptide aptamer drugs.
  • the general scheme shows creating new punctuated peptide aptamer drugs with fixed amino acids involved in ligand-receptor binding and cleavage sites, where each cleavage product may contain a pool of >2 amino acids.
  • the synthetic peptide comprises two domains. In some embodiments, the synthetic peptide comprises three domains. In some embodiments, the synthetic peptide comprises four domains. In some embodiments, the synthetic peptide comprises five domains. In some embodiments, the synthetic peptide comprises six domains. In some embodiments, the synthetic peptide comprises seven domains. In some embodiments, the synthetic peptide comprises eight domains. In some embodiments, the synthetic peptide comprises nine domains. In some embodiments, the synthetic peptide comprises ten domains. In some embodiments, the synthetic peptide comprises more than ten domains.
  • the cleavage site can be a chemical cleavage site or a protease cleavage site, such as a trypsin cleavage site or a chymotrypsin cleavage site.
  • the cleavage site comprises a protease cleavage site. In one embodiment, the cleavage site comprises a trypsin cleavage site. In one embodiment, the cleavage site comprises a chymotrypsin cleavage site.
  • the at least one cleavage site comprises an arginine. In some embodiments, the at least one cleavage site comprises a lysine. In some embodiments, the at least one cleavage site comprises a tryptophan. In some embodiments, the at least one cleavage site comprises a tyrosine. In some embodiments, the at least one cleavage site comprises a phenylalanine.
  • the cleavage site comprises a chemical cleavage site.
  • the subset of amino acids comprises a subset of fewer than 20 amino acids. In some embodiments, the subset of amino acids comprises a subset of fewer than 19 amino acids. In some embodiments, the subset of amino acids comprises a subset of fewer than 18 amino acids. In some embodiments, the subset of amino acids comprises a subset of fewer than 17 amino acids. In some embodiments, the subset of amino acids comprises a subset of fewer than 16 amino acids. In some embodiments, the subset of amino acids comprises a subset of fewer than 15 amino acids. In some embodiments, the subset of amino acids comprises a subset of fewer than 14 amino acids.
  • the subset of amino acids comprises a subset of fewer than 13 amino acids. In some embodiments, the subset of amino acids comprises a subset of fewer than 12 amino acids. In some embodiments, the subset of amino acids comprises a subset of fewer than 11 amino acids. In some embodiments, the subset of amino acids comprises a subset of fewer than 10 amino acids. In some embodiments, the subset of amino acids comprises a subset of fewer than 9 amino acids. In some embodiments, the subset of amino acids comprises a subset of fewer than 8 amino acids. In some embodiments, the subset of amino acids comprises a subset of fewer than 7 amino acids. In some embodiments, the subset of amino acids comprises a subset of fewer than 6 amino acids.
  • the subset of amino acids comprises a subset of fewer than 5 amino acids. In some embodiments, the subset of amino acids comprises a subset of fewer than 4 amino acids. In some embodiments, the subset of amino acids comprises a subset of fewer than 3 amino acids. In some embodiments, the subset of amino acids comprises a subset of 2 amino acids.
  • the subset of amino acids comprises a subset of between 2 and 19 amino acids. In some embodiments, the subset of amino acids comprises a subset of between 2 and 18 amino acids. In some embodiments, the subset of amino acids comprises a subset of between 2 and 17 amino acids. In some embodiments, the subset of amino acids comprises a subset of between 2 and 16 amino acids. In some embodiments, the subset of amino acids comprises a subset of between 2 and 15 amino acids. In some embodiments, the subset of amino acids comprises a subset of between 2 and 14 amino acids. In some embodiments, the subset of amino acids comprises a subset of between 2 and 13 amino acids.
  • the subset of amino acids comprises a subset of between 2 and 12 amino acids. In some embodiments, the subset of amino acids comprises a subset of between 2 and 11 amino acids. In some embodiments, the subset of amino acids comprises a subset of between 2 and 10 amino acids. In some embodiments, the subset of amino acids comprises a subset of between 2 and 9 amino acids. In some embodiments, the subset of amino acids comprises a subset of between 2 and 8 amino acids. In some embodiments, the subset of amino acids comprises a subset of between 2 and 7 amino acids. In some embodiments, the subset of amino acids comprises a subset of between 2 and 6 amino acids. In some embodiments, the subset of amino acids comprises a subset of between 2 and 5 amino acids. In some embodiments, the subset of amino acids comprises a subset of between 2 and 4 amino acids. In some embodiments, the subset of amino acids comprises a subset of 2 or 3 amino acids.
  • the subset of amino acids comprises a subset of the
  • the term “20 common amino acids” refers to the amino acids alanine, arginine, asparagine, aspartic acid, cysteine, glutamine, glutamic acid, glycine, histidine, isoleucine, leucine, lysine, methionine, phenylalanine, proline, serine, threonine, tryptophan, tyrosine, and valine.
  • the subset of amino acids is not limited to a subset of the 20 common amino acids. Subsets of other amino acid sets can also be used.
  • An amino acid set can comprise one or more common amino acids, one or more uncommon amino acids, one or more unnatural amino acids, or any combination thereof.
  • At least one of the domains comprises a random sequence.
  • the random sequence is a random sequence of 20 amino acids. In some embodiments, the random sequence is a random sequence of 19 amino acids. In some embodiments, the random sequence is a random sequence of 18 amino acids. In some embodiments, the random sequence is a random sequence of 17 amino acids. In some embodiments, the random sequence is a random sequence of 16 amino acids. In some embodiments, the random sequence is a random sequence of 15 amino acids. In some embodiments, the random sequence is a random sequence of 14 amino acids. In some embodiments, the random sequence is a random sequence of 13 amino acids. In some embodiments, the random sequence is a random sequence of 12 amino acids. In some embodiments, the random sequence is a random sequence of 11 amino acids.
  • the random sequence is a random sequence of 10 amino acids. In some embodiments, the random sequence is a random sequence of 9 amino acids. In some embodiments, the random sequence is a random sequence of 8 amino acids. In some embodiments, the random sequence is a random sequence of 7 amino acids. In some embodiments, the random sequence is a random sequence of 6 amino acids. In some embodiments, the random sequence is a random sequence of 5 amino acids. In some embodiments, the random sequence is a random sequence of 4 amino acids. In some embodiments, the random sequence is a random sequence of 3 amino acids. In some embodiments, the random sequence is a random sequence of 2 amino acids.
  • the synthetic peptide can further comprise one or more invariant amino acid residues.
  • the one or more invariant amino acid residues can have acidic, basic or hydrophobic characters.
  • the peptide further comprises at least one predetermined amino acid residue at a predetermined position within at least one domain.
  • the at least one predetermined amino acid residue comprises a hydrophobic amino acid.
  • the at least one predetermined amino acid residue comprises tryptophan, histidine, phenylalanine, methionine, tyrosine, cysteine, or lysine.
  • the peptide can further comprise different random pools of amino acids between two cleavage sites, between two invariant amino acid residues, or between a cleavage site and an invariant amino acid residue, wherein each random pool comprises a pool of a subset of the 20 common amino acids.
  • a peptide aptamer library can be designed with tryptic and chymotryptic sites spaced to create several subdomains, and in each domain a defined subset of the 20 common amino acids can be randomly ordered alongside invariant amino acids with acidic, basic or hydrophobic characters.
  • An aptamer of all 20 common amino acids randomly ordered would create a large computational problem.
  • randomizing all 20 common amino acids would also create a chemical problem: there would be too many peptide species in the library for any one of them to reach a concentration in the attomolar, femtomolar, picomolar or nanomolar range for binding assays.
  • the whole aptamer may contain ⁇ 20 random amino acids or modifications thereof, while the cleavage products are each short enough and simple enough to be identified by tandem mass spectrometry.
  • the term “contacting the peptide aptamer library with the target” means allowing the peptide aptamer library and the target to interact by any means.
  • the target can be immobilized on a surface, and the peptide aptamer library can be allowed to come into contact with the surface.
  • Contacting the peptide aptamer library with the target can cause one or more synthetic peptide aptamers to bind the target.
  • synthetic peptide aptamers that are not bound to the target after contacting the peptide aptamer library with the target are removed.
  • the bound synthetic peptide aptamers can be released from the target for analysis.
  • the methods disclosed herein can further involve: generating the in silico peptide library to match the characteristics of the physical library used in the experiment and reducing the dimension for each MS/MS or MSn spectra searched using the observed amino acids from the MS/MS or MSn spectra; fitting the observed MS/MS or MSn spectra from the physical peptide that bound the target to the possible peptide in the in silico library using de novo, or goodness of fit, or regression or linear models or correlation or heuristic algorithms to determine the amino acid sequence and the molecular composition of matter of the physical peptide that bound the target.
  • the method can comprise generating a MS/MS query spectrum of the bound synthetic peptide aptamer.
  • the query spectrum can comprise one or more parameters.
  • the one or more parameters can be received from user input.
  • the one or more parameters can be one or more of an amino acid or an amino acid group.
  • a position corresponding to each of the amino acid or the amino acid group can be included.
  • the one or more parameters can be derived from the query spectrum by a processor.
  • the selection of candidate peptide sequences from memory can be based on the one or more parameters received that can be used as constraints to narrow the search space of candidate peptide sequences.
  • the one or more parameters can include a particular amino acid (e.g., lysine or arginine) for the candidate peptide sequence.
  • the one or more parameters can also include a particular position (e.g., last position) for the particular amino acid.
  • Peptide sequences can be identified from the plurality of peptide sequences stored in memory having the particular amino acid in the particular position and then the candidate peptide sequences can be selected from amongst the peptide sequences having the particular amino acid in the particular position.
  • candidate peptide sequences can be randomly generated to include known amino acids at known or unknown positions. Furthermore, the randomly generated candidate peptide sequences can be tailored to what is known about a particular binding target.
  • the pre-determined spectra intensity can be based on a candidate peptide sequence, such as a percentage of the spectra intensity of a candidate peptide sequence.
  • the percentage of the spectra intensity can be specified by the user.
  • the maximum spectra lines filtering can be applied dynamically.
  • the max spectra setting relates to the intensity values. Assuming a max spectra value of 50, the engine would only examine the 50 most intense spectra lines. This significantly reduces the computation time, as a single MS2 spectrum can contain thousands of spectra lines.
  • Precursor mass testing involves comparing a precursor mass of a query sample to the mass of a candidate peptide sequence. Query samples having a precursor mass matching the mass of any candidate peptide can be further processed. Query samples having a precursor mass that does not match any candidate peptide can be discarded from further consideration, and the next sample is assessed.
  • Precursor mass testing can involve determining a precursor mass for a query sample. If the precursor mass is substantially equal to a mass of a candidate peptide sequence of the one or more candidate peptide sequences, that query sample can be selected for comparison with the one or more candidate peptide sequences. The precursor mass can be considered to be substantially equal to a mass of a candidate peptide sequence within a pre-determined error range of the mass of the candidate peptide sequence. In some embodiments, the pre-determined error range can be specified by the user. In some embodiments, a precursor mass for a query sample can be determined at different charge states. For example, precursor masses at charge states of 1, 2, and 3 can be determined.
  • Base peak testing involves comparing a mass of a theoretical ion of the query sample to the mass of a base peak.
  • Query samples having theoretical ions having a mass that matches the mass of the base peak can be further processed.
  • Query samples having theoretical ions having a mass that does not match the base peak can be discarded from further consideration, and the next sample is assessed.
  • a mass of a theoretical ion of the query sample is determined. If the mass of the theoretical ion is substantially equal to a mass of a base peak, that query sample can be selected for comparison with one or more candidate peptide sequences.
  • the precursor mass can be added to the theoretical spectra.
  • the likelihood of identifying the first amino acid in the peptide sequence can be increased and the Amino Acid Match Ratio (AAMR) score can be improved.
  • a theoretical ion can be added to the theoretical spectra; that is, an ion can be added at the low end of the spectra.
  • the theoretical ion can represent a terminal mass. Addition of the theoretical ion can improve the peptide match mass ratio score.
  • a theoretical peptide’s MH value at charge state 1, 2 or 3 could be added to the theoretical spectra. Addition of the theoretical peptide’s MH value at charge state 1, 2 or 3 can apply to the peptide match score. Once a random peptide is generated, both the theoretical spectra and MH value can be calculated. The MH mass can be compared to the precursor mass so as to calculate the peptide's charge. Regardless of the precursor mass, however, the mass spectrometer may well be reporting the spectra for a peptide at each of the three charge states within a single MS2 scan.
  • the real 1+ spectra representing the unknown peptide can generally provide the most usable spectra.
  • the 1+ spectra can be used as the primary identification signal followed by examination of the 2+ and 3+ spectra for additional evidence of a good identification.
  • the MH of a particular peptide can be injected into the observed spectra. Addition of the MH of a particular peptide can be used to calculate the peptide match score.
  • the difference in theoretical mass and the observed peptide mass can be calculated using the estimated charge of a theoretical peptide, the known theoretical mass of the same theoretical peptide, and the known observed precursor mass.
  • the delta mass can be used to identify a modification which is an extra chemical element on the peptide. This is also known as a modification mass. For example, phosphorylation is a commonly observed post translational modification and has a known mass shift of 79.99 Da; if the delta mass is about this value, then it can be determined that the peptide is phosphorylated.
  • application of signal processing filters can vary. For example, in some embodiments, only minimum spectra counting can be used. In other embodiments, minimum spectra counting, maximum spectra lines filtering, sequential ion (e.g., b and y ions) counting, and precursor testing can be used. Other combinations are possible. Furthermore, the signal processing filters can also depend on whether a candidate peptide sequence is selected from memory or randomly generated.
  • Candidate peptide sequences can be selected from memory.
  • one theoretical peptide or candidate peptide sequence at a time is compared to all available scans by tests and scoring algorithms until the peptide list is exhausted.
  • each theoretical peptide and spectrum pair must pass a precursor mass test and base peak test before they are scored by at least one of chi square, multiple/linear or nested regression, amino acid match ratio, and ion intensity match ratio. A user defined combination of these scores can be used to identify the best match for each scan.
  • one query sample can be compared to each candidate peptide sequence before another query sample is compared to each candidate peptide sequence.
  • Candidate peptide sequences can be randomly generated.
  • a “first in first out” (FIFO) queue of candidate peptide sequences can be used. The queue grows as candidate peptide sequences are generated and shrinks each time a candidate peptide sequence is processed.
  • Constraints can be used to narrow the possible combinations for randomly generated candidate peptide sequences. Constraints can be specified by the user, by one or more parameters. Constraints can also be determined from the library search method or from amino acid matching. As set out above, constraints can relate to a pre-determined length, a minimum number of amino acids from different amino acid groups, a particular amino acid at a particular position, and/or a precursor mass.
  • a constraint relating to the precursor mass is used. All query samples having a precursor mass within a tolerance window can be retrieved and a list of the base peak values created from these scans.
  • a candidate peptide sequence can be randomly generated based on a pre-determined length and wild cards, and then the MH value of that candidate peptide sequence can be generated at each of 3 charge states.
  • the precursor test can be run on the candidate peptide sequence and, if passed, theoretical spectra can be generated and the base peak test can be run. If the base peak test is passed, the more computationally intensive AAMR and IIMR (ion intensity match ratio) tests are run. If these last two scores are above minimum thresholds, then the candidate peptide sequence is considered a match and is stored.
  • the precursor and base peak tests can be repeated later in the workflow for technical reasons.
  • the precursor hint can be a list of m/z values, so one candidate peptide sequence might not match at the precursor level for a particular scan, and similarly it may not pass the base peak test.
  • a likelihood indicator for each candidate peptide sequence can be determined based on a comparison with the at least one query sample.
  • determining a likelihood indicator for each candidate peptide sequence can involve applying scoring techniques, filtering techniques or a combination thereof.
  • scoring techniques can first be applied.
  • the scoring techniques can be analogous to the signal processing filters, including spectra line filtering.
  • the likelihood indicator can alternatively or additionally be determined based on additional filtering techniques including one or more of a chi-square score, a regression score, and a cross correlation score. Other scoring functions can be used to generate a likelihood indicator.
  • a chi-square score can be used to rank candidate peptide sequence matches.
  • a candidate peptide sequence can be selected as a proposed peptide sequence based on differences between an observed ion count of the candidate peptide sequence and the predicted theoretical ion count.
  • the chi-square score can be calculated by summing the number of expected theoretical ions (e.g., b and y ions) for the expected peptide that fall within the m/z range of the mass spectrometer; summing the number of observed ions that match the expected m/z values, within the mass resolution limit of the mass spectrometer; and applying Equation (2).
  • the best fit per spectrum can be selected by Library and Search Engine.
  • the counts may be split off into the method where the SpectraID scored the best, or had the highest likelihood indicator. In this case there is minimal overlap between the two methods except where the SpectraID scored equally highest in both methods.
  • Searchdefinition ID corresponds to the library and search settings.
  • the method disclosed herein can comprise applying a signal to noise filter.
  • Noise can be generated from various sources.
  • noise can come from a statistical control and/or a non-specific binding control.
  • a statistical control can include, for example, an electromagnetic noise control, a random MS/MS spectra control, or both.
  • a pre-immune serum may be used in binding.
  • An affinity column without an antigen attached, an affinity column with a control antigen attached, a naive affinity column that has not been pre-treated (blocked by a blocking agent), and/or a conditioned affinity column can also be a non-specific binding control.
  • an observation frequency of at least one control and an observation frequency of at least one query sample can be determined. If the observation frequency of the at least one query sample is higher than the observation frequency of the at least one control, then the at least one query sample can be further analyzed and/or the candidate spectrum can be selected as a proposed spectrum.
  • a threshold of the difference between the observation frequencies may be set to allow the at least one query sample to be further analyzed.
  • applying the signal to noise filter comprises determining an observation frequency of at least one query sample. In some embodiments, applying the signal to noise filter comprises determining an observation frequency of at least one control. In some embodiments, if the observation frequency of the at least one query sample is higher than the observation frequency of the at least one control, then the at least one query sample is further analyzed. In some embodiments, if the observation frequency of the at least one query sample is higher than the observation frequency of the at least one control, then the candidate spectrum is selected as a proposed spectrum.
  • applying the signal to noise filter comprises (i) determining an observation frequency of at least one query sample; (ii) determining an observation frequency of at least one control; and (iii) if the observation frequency of the at least one query sample is higher than the observation frequency of the at least one control, then the candidate spectrum is selected as a proposed spectrum.
  • the method can comprise the use of experimental controls.
  • the use of experimental controls can comprise the use of: (1) naive blank columns (never used) or conditioned blank columns; (2) control affinity support resin without the specific antigen, ligand, receptor, or diagnostic or therapeutic target; (3) a support resin with a control antigen, ligand, receptor, or diagnostic or therapeutic target; (4) support resin with the specific antigen, ligand, receptor, or diagnostic or therapeutic target for use with pre-immune serum; or (5) some other controls from tissues, cells or bodily fluids.
  • the method may comprise the use of statistical controls, which can comprise the use of naturally-occurring electromagnetic noise or random MS/MS spectra or computer generated random MS/MS spectra to select potential peptide sequences with a higher probability of being true positive peptide identification from MS/MS spectra.
  • the target comprises a receptor or a fragment thereof. In one embodiment, the target comprises a ligand or a fragment thereof. In one embodiment, the target comprises an enzyme or a fragment thereof. In one embodiment, the target comprises a protein or a fragment thereof. In one embodiment, the target comprises an antibody or a fragment thereof. In one embodiment, the target comprises a variable domain or a fragment thereof. In one embodiment, the target comprises a drug.
  • the target is immobilized on microbeads, nanobeads, a 2-dimensional surface, a 3-dimensional scaffold, and/or a 3-dimensional fiber.
  • Synthetic peptides may be obtained from a random combinatorial approach.
  • Synthetic peptides may be obtained where fixed residues are comprised of hydrophobic amino acids that may contribute to binding including but not limited to W, H, F, M, Y, C or L.
  • a peptide aptamer library will be designed with tryptic and chymotryptic sites spaced to create several subdomains, and in each domain a defined subset of the 20 amino acids will be randomly ordered alongside invariant amino acids with acidic, basic or hydrophobic character. An aptamer with all 20 amino acids randomly ordered would create a large computational problem. Moreover, randomizing all 20 amino acids would also create a chemical problem where there are too many peptide species in the library for any one species to reach a concentration in the attomolar, femtomolar, picomolar or nanomolar range for binding assays.
  • a synthetic peptide with the following structure can be made for generating a library of peptide aptamers (FIG. 3):
  • Specific high observation frequency peptide adapters can be:
  • Example 4 Using the aptamer library to identify binding agents to a target
  • the aptamer library will be incubated with the receptor, ligand, enzyme, protein, antibody, variable domain or drug to induce binding.
  • the receptor, ligand, enzyme, protein, antibody, variable domain or drug will be immobilized on microbeads or nanobeads or a flat 2 dimensional surface or 3 dimensional scaffold or fiber.
  • Bound peptides will be eluted with strong salt solutions, strong mixtures of organic solvents with water, strong acids or bases far from neutral pH.
  • the eluted peptides will be identified by mass spectrometry, or top down mass spectrometry, or electrospray ionization, or MALDI ionization or chemical ionization or electron impact ionization or LC-ESI-MS/MS.
  • the amino acid sequence will be derived from the MS/MS spectra by de novo sequencing.
  • the observed MS/MS spectra will be fitted to a predicted library of MS/MS spectra by 64 bit computation using de novo sequencing using XTANDEM, SEQUEST, or regression, or goodness of fit, or heuristic algorithms, or other algorithms.
  • Real peptides and their fragments frequently contain heavy isotopes, so a filter can look for spectral lines showing the presence of isotopes, hydrogen rearrangements, or H loss, using isotope filtering to remove the noise.
  • Isotopes and hydrogen re-arrangements or losses may occur in precursor peptides or fragments.
  • accession numbers provided herein, including for example accession numbers and/or biomarker sequences (e.g. protein and/or nucleic acid) provided in the Tables or elsewhere, are incorporated by reference in their entirety.

Abstract

Provided herein are methods of making a peptide aptamer library of partially random peptide aptamers. The peptide aptamer library can be designed with tryptic and chymotryptic sites spaced to create domains. In each domain, a defined subset of amino acids are randomly ordered alongside invariant amino acids. Also provided herein are methods to identify peptide aptamers that bind a target using the peptide aptamer library disclosed herein.

Description

TITLE: METHOD OF GENERATING AND SCREENING SYNTHETIC PEPTIDE APTAMER LIBRARIES
CROSS-REFERENCE TO RELATED APPLICATIONS:
[0001] This application claims priority from U.S. Provisional Application No. 63/616,404, filed December 29, 2023, the content of which is incorporated herein by reference.
FIELD:
[0002] The present disclosure generally relates to identifying peptide aptamers that bind a target, specifically by using a peptide aptamer library of partially random peptide aptamers.
BACKGROUND:
[0003] Methods to identify protein-protein interactions at the proteome level are known in the art (see e.g. Elhabashy et al (2022) Exploring protein-protein interactions at the proteome level, Structure, 30: 462-475; Richards et al (2021) Mass spectrometry-based protein-protein interaction networks for the study of human diseases. Mol Syst Biol. 17(1):e8792). LC-ESI-MS/MS can be used to identify the ligands of plasma and identify their receptor complexes on the surface of live cells. However, there remains a need for systematic affinity ligand-receptor systems to characterize protein interaction within the cell and on the cell surface, for the discovery of new circulating ligands and receptor drug targets for diagnostic and therapeutic purposes.
SUMMARY:
[0004] In one aspect, provided herein is a method of identifying a synthetic peptide aptamer, from a library comprising a plurality of synthetic peptide aptamers, that binds a target, comprising: providing the peptide aptamer library comprising the plurality of synthetic peptide aptamers, the plurality of synthetic peptide aptamers being generated from at least one synthetic peptide, the at least one synthetic peptide comprises two or more domains separated by a protease cleavage site, wherein at least one domain comprises at least one amino acid that is randomly selected from a subset of the 20 common amino acids; contacting the peptide aptamer library with the target; removing unbound synthetic peptide aptamers after contacting the peptide aptamer library with the target; and analyzing the bound synthetic peptide aptamers, comprising: generating a MS/MS query spectrum of the bound peptide aptamer; receiving one or more parameters of the query spectrum; generating one or more candidate peptide sequences based on the one or more parameters of the query spectrum; generating a plurality of samples of the query spectrum; selecting at least one query sample from the plurality of query samples for comparison with the one or more candidate peptide sequences; determining a likelihood indicator for each of the one or more candidate peptide sequences based on a comparison with the at least one query sample; applying a signal to noise filter to the one or more candidate peptide sequences based on the likelihood indicators for the candidate peptide sequences; and selecting at least one candidate peptide sequence as a proposed peptide sequence for the query spectrum based on the filtered candidate peptide sequences; thereby identifying the synthetic peptide aptamer of the plurality of synthetic peptide aptamers that binds the target.
[0005] In another aspect, provided herein is a method of identifying a synthetic peptide aptamer, from a library comprising a plurality of synthetic peptide aptamers, that binds a target, comprising: providing the peptide aptamer library comprising the plurality of synthetic peptide aptamers, the plurality of synthetic peptide aptamers being generated from at least one synthetic peptide; contacting the peptide aptamer library with the target; removing unbound synthetic peptide aptamers after contacting the peptide aptamer library with the target; analyzing the bound synthetic peptide aptamers; thereby identifying the synthetic peptide aptamer of the plurality of synthetic peptide aptamers that binds the target.
[0006] In some embodiments, the at least one synthetic peptide comprises two or more domains separated by a cleavage site.
[0007] In some embodiments, at least one domain comprises at least one random amino acid.
[0008] In some embodiments, the at least one random amino acid is randomly selected from a subset of amino acids.
[0009] In some embodiments, the subset of amino acids comprises a subset of the 20 common amino acids.
[0010] In some embodiments, the subset of amino acids comprises a subset of fewer than 5 amino acids.
[0011] In some embodiments, the cleavage site comprises a protease cleavage site.
[0012] In some embodiments, the cleavage site comprises a chemical cleavage site.
[0013] In some embodiments, the at least one synthetic peptide further comprises at least one predetermined amino acid at a predetermined position within at least one domain.
[0014] In some embodiments, analyzing the bound synthetic peptide aptamers comprises: generating a MS/MS query spectrum of the bound peptide aptamer; receiving one or more parameters of the query spectrum; generating one or more candidate peptide sequences based on the one or more parameters of the query spectrum; generating a plurality of samples of the query spectrum; selecting at least one query sample from the plurality of query samples for comparison with the one or more candidate peptide sequences; determining a likelihood indicator for each of the one or more candidate peptide sequences based on a comparison with the at least one query sample; applying a signal to noise filter to the one or more candidate peptide sequences based on the likelihood indicators for the candidate peptide sequences; and selecting at least one candidate peptide sequence as a proposed peptide sequence for the query spectrum based on the filtered candidate peptide sequences.
[0015] In some embodiments, applying the signal to noise filter comprises (i) determining an observation frequency of at least one query sample; (ii) determining an observation frequency of at least one control; and (iii) if the observation frequency of the at least one query sample is higher than the observation frequency of the at least one control, then the candidate spectrum is selected as a proposed spectrum.
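By way of illustration only, the comparison of observation frequencies in steps (i) to (iii) may be sketched in Python as follows. The function names, the representation of identifications as lists of peptide strings, and the `min_difference` threshold parameter are assumptions made for this sketch and are not taken from the disclosure.

```python
from collections import Counter


def observation_frequencies(identifications):
    """Count how many times each candidate was observed in a run."""
    return Counter(identifications)


def signal_to_noise_filter(query_ids, control_ids, min_difference=0):
    """Keep candidates observed more often in the query than in the control.

    query_ids and control_ids are lists of candidate identifiers (here,
    hypothetical peptide strings). min_difference is an assumed, optional
    threshold on the difference between the two observation frequencies.
    """
    query = observation_frequencies(query_ids)
    control = observation_frequencies(control_ids)
    return sorted(
        candidate
        for candidate, count in query.items()
        if count - control.get(candidate, 0) > min_difference
    )


if __name__ == "__main__":
    query_run = ["AWK", "AWK", "GHR", "GHR", "LFK"]
    control_run = ["GHR", "GHR", "GHR", "LFK"]
    print(signal_to_noise_filter(query_run, control_run))
```

In this sketch a candidate that is never seen in the control has a control frequency of zero, so any observation in the query run passes the default threshold.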
[0016] In some embodiments, the method further comprises releasing bound synthetic peptide aptamers from the target after removing unbound synthetic peptide aptamers.
[0017] In some embodiments, generating the MS/MS query spectrum comprises generating a MS/MS query spectrum of the released bound synthetic peptide aptamers.
[0018] In some embodiments, the target comprises a receptor, a ligand, an enzyme, a protein, an antibody, a variable domain, or a drug.
[0019] In some embodiments, identifying the synthetic peptide aptamer comprises fitting of observed MS/MS spectra to a predicted library.
[0020] In some embodiments, the fitting of observed MS/MS spectra to the predicted library comprises 64 bit computation.
[0021] In some embodiments, the 64 bit computation comprises use of cross correlation, XTANDEM, SEQUEST, regression, goodness of fit, count of fragment spectra matches, heuristic algorithms, or combination thereof.
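As a non-limiting illustration of one option listed above, a count of fragment spectra matches, the following Python sketch computes singly charged b and y ion m/z values for a candidate sequence and counts how many fall within an instrument m/z range and how many are matched by observed peaks. It does not reproduce the scoring of XTANDEM or SEQUEST. The function names, the restriction to singly charged unmodified ions, and the example tolerance are assumptions made for the sketch; the residue masses are standard monoisotopic values.

```python
# Standard monoisotopic residue masses (Da) of the 20 common amino acids.
MONOISOTOPIC_RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565


def b_y_ions(sequence):
    """Return singly charged b and y ion m/z values (assumed: no modifications)."""
    masses = [MONOISOTOPIC_RESIDUE_MASS[residue] for residue in sequence]
    total = sum(masses)
    ions = []
    running = 0.0
    for mass in masses[:-1]:
        running += mass
        ions.append(running + PROTON)                  # b ion (N-terminal fragment)
        ions.append(total - running + WATER + PROTON)  # complementary y ion
    return ions


def count_fragment_matches(sequence, observed_mz, mz_min, mz_max, tolerance):
    """Return (expected ions within the m/z range, expected ions matched by a peak)."""
    expected = [mz for mz in b_y_ions(sequence) if mz_min <= mz <= mz_max]
    matched = sum(
        1 for mz in expected
        if any(abs(mz - peak) <= tolerance for peak in observed_mz)
    )
    return len(expected), matched


if __name__ == "__main__":
    peaks = [147.113, 129.066, 500.0]  # hypothetical observed peaks
    print(count_fragment_matches("GAK", peaks, 100.0, 2000.0, 0.01))
```

The two counts returned here are the kind of expected and observed totals that a goodness-of-fit score could be computed from; the score itself is left out because its exact form is not restated in this section.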
[0022] In yet another aspect, provided herein is a method of making a synthetic peptide aptamer library, comprising: providing a synthetic peptide, the synthetic peptide comprises two or more domains separated by a cleavage site; and cleaving the synthetic peptide to generate a pool of cleavage products, thereby producing the synthetic peptide aptamer library.
[0023] Other features and advantages of the present disclosure will become apparent from the following detailed description. It should be understood, however, that the detailed description and the specific examples while indicating embodiments of the disclosure are given by way of illustration only, since various changes and modifications within the spirit and scope of the disclosure will become apparent to those skilled in the art from this detailed description.
BRIEF DESCRIPTION OF THE DRAWINGS:
[0024] The embodiments of the application will now be described in greater detail with reference to the attached drawings in which:
[0025] Figure 1 is an embodiment of the method to create new peptide aptamer drugs. The general scheme shows creating new punctuated peptide aptamer drugs with fixed amino acids involved in ligand-receptor binding and cleavage sites where each cleavage product may contain a pool of >2 amino acids.
[0026] Figure 2 is a flow diagram of identifying synthetic peptide aptamers that bind a target according to one embodiment.
[0027] Figure 3 depicts a polypeptide for generating synthetic peptide aptamers according to one embodiment.
DETAILED DESCRIPTION OF THE DISCLOSURE:
[0028] The following is a detailed description provided to aid those skilled in the art in practicing the present disclosure. Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs. The terminology used in the description herein is for describing particular embodiments only and is not intended to be limiting of the disclosure. All publications, patent applications, patents, figures and other references mentioned herein are expressly incorporated by reference in their entirety.
[0029] All of the features disclosed in this specification may be combined in any combination. Each feature disclosed in this specification may be replaced by an alternative feature serving the same, equivalent, or similar purpose. Thus, unless expressly stated otherwise, each feature disclosed is only an example of a generic series of equivalent or similar features.
I. Definitions
[0030] In understanding the scope of the present disclosure, the term "comprising" and its derivatives, as used herein, are intended to be open ended terms that specify the presence of the stated features, elements, components, groups, integers, and/or steps, but do not exclude the presence of other unstated features, elements, components, groups, integers and/or steps. The foregoing also applies to words having similar meanings such as the terms, "including", "having" and their derivatives.
[0031] All definitions, as defined and used herein, should be understood to control over dictionary definitions, definitions in documents incorporated by reference, and/or ordinary meanings of the defined terms.
[0032] The term “consisting” and its derivatives, as used herein, are intended to be closed ended terms that specify the presence of stated features, elements, components, groups, integers, and/or steps, and also exclude the presence of other unstated features, elements, components, groups, integers and/or steps.
[0033] Further, terms of degree such as "substantially", "about" and "approximately" as used herein mean a reasonable amount of deviation of the modified term such that the end result is not significantly changed. These terms of degree should be construed as including a deviation of at least ±5% of the modified term if this deviation would not negate the meaning of the word it modifies.
[0034] More specifically, the term “about” means plus or minus 0.1 to 20%, 5-20%, or 10-20%, 10%-15%, preferably 5-10%, most preferably about 5% of the number to which reference is being made.
[0035] As used in this specification and the appended claims, the singular forms “a”, “an” and “the” include plural references unless the content clearly dictates otherwise. Thus, for example, a composition containing “a compound” includes a mixture of two or more compounds. It should also be noted that the term “or” is generally employed in its sense including “and/or” unless the content clearly dictates otherwise.
[0036] The definitions and embodiments described in particular sections are intended to be applicable to other embodiments herein described for which they are suitable as would be understood by a person skilled in the art.
[0037] The recitation of numerical ranges by endpoints herein includes all numbers and fractions subsumed within that range (e.g. 1 to 5 includes 1, 1.5, 2, 2.75, 3, 3.90, 4, and 5). It is also to be understood that all numbers and fractions thereof are presumed to be modified by the term "about."
[0038] Further, the definitions and embodiments described in particular sections are intended to be applicable to other embodiments herein described for which they are suitable as would be understood by a person skilled in the art. For example, in the following passages, different aspects of the disclosure are defined in more detail. Each aspect so defined may be combined with any other aspect or aspects unless clearly indicated to the contrary.
[0039] Although any methods and materials similar or equivalent to those described herein can also be used in the practice or testing of the present disclosure, examples of methods and materials are now described.
II. Methods
[0040] Peptide aptamers are artificial protein molecules that bind to targets, for example, target proteins. A proteomic approach may be used to identify peptide aptamers that bind a specific target. However, randomizing all 20 amino acids would create a large computational problem. Furthermore, randomizing all 20 amino acids would also create a chemical problem where there are too many peptide species in the library of peptide aptamers. It is disclosed herein that the problems associated with mass spectrometry of peptide aptamers can be overcome by using a peptide aptamer library of partially random peptide aptamers.
[0041] Accordingly, in one aspect, provided herein is a method of identifying a synthetic peptide aptamer, from a library comprising a plurality of synthetic peptide aptamers, that binds a target, comprising: providing the peptide aptamer library comprising the plurality of synthetic peptide aptamers, the plurality of synthetic peptide aptamers being generated from at least one synthetic peptide, the at least one synthetic peptide comprises two or more domains separated by a protease cleavage site, wherein at least one domain comprises at least one amino acid that is randomly selected from a subset of the 20 common amino acids; contacting the peptide aptamer library with the target; removing unbound synthetic peptide aptamers after contacting the peptide aptamer library with the target; and analyzing the bound synthetic peptide aptamers, comprising: generating a MS/MS query spectrum of the bound peptide aptamer; receiving one or more parameters of the query spectrum; generating one or more candidate peptide sequences based on the one or more parameters of the query spectrum; generating a plurality of samples of the query spectrum; selecting at least one query sample from the plurality of query samples for comparison with the one or more candidate peptide sequences; determining a likelihood indicator for each of the one or more candidate peptide sequences based on a comparison with the at least one query sample; applying a signal to noise filter to the one or more candidate peptide sequences based on the likelihood indicators for the candidate peptide sequences; and selecting at least one candidate peptide sequence as a proposed peptide sequence for the query spectrum based on the filtered candidate peptide sequences; thereby identifying the synthetic peptide aptamer of the plurality of synthetic peptide aptamers that binds the target.
[0042] In one aspect, provided herein is a method of identifying a synthetic peptide aptamer that binds a target. The synthetic peptide aptamer can be from a library comprising a plurality of synthetic peptide aptamers. The method can comprise providing the peptide aptamer library. The peptide aptamer library can comprise the plurality of synthetic peptide aptamers. The plurality of synthetic peptide aptamers can be generated from at least one synthetic peptide. The at least one synthetic peptide can comprise two or more domains. The two or more domains can be separated by a protease cleavage site. The at least one domain can comprise at least one amino acid that is randomly selected from a subset of amino acids. The subset of amino acids can be a subset of the 20 common amino acids. The method can comprise contacting the peptide aptamer library with the target. The method can comprise removing unbound synthetic peptide aptamers after contacting the peptide aptamer library with the target. The method can comprise analyzing the bound synthetic peptide aptamers. The method can comprise generating a MS/MS query spectrum of the bound synthetic peptide aptamers. The method can comprise receiving one or more parameters of the query spectrum. The method can comprise providing one or more candidate spectra. The method can comprise generating a plurality of query samples of the query spectrum. The method can comprise selecting at least one query sample from the plurality of query samples for comparison with the one or more candidate spectra. The method can comprise determining a likelihood indicator for each of the one or more candidate spectra based on a comparison with the at least one query sample. The method can comprise applying a signal to noise filter to the one or more candidate spectra based on the likelihood indicators for the candidate spectra. The method can comprise selecting at least one candidate spectrum as a proposed spectrum. 
The method can comprise determining a peptide sequence of the proposed spectrum, thereby identifying the synthetic peptide aptamer of the plurality of peptide aptamers that binds the target.
[0043] In another aspect, provided herein is a method of identifying a synthetic peptide aptamer, from a library comprising a plurality of synthetic peptide aptamers, that binds a target, comprising: providing the peptide aptamer library comprising the plurality of synthetic peptide aptamers, the plurality of synthetic peptide aptamers being generated from at least one synthetic peptide; contacting the peptide aptamer library with the target; removing unbound synthetic peptide aptamers after contacting the peptide aptamer library with the target; analyzing the bound synthetic peptide aptamers; thereby identifying the synthetic peptide aptamer of the plurality of synthetic peptide aptamers that binds the target.
[0044] In one aspect, provided herein is a method of identifying a synthetic peptide aptamer that binds a target. The synthetic peptide aptamer can be from a library comprising a plurality of synthetic peptide aptamers. The method can comprise providing the peptide aptamer library. The peptide aptamer library can comprise the plurality of synthetic peptide aptamers. The plurality of synthetic peptide aptamers can be generated from at least one synthetic peptide. The method can comprise contacting the peptide aptamer library with the target. The method can comprise removing unbound synthetic peptide aptamers after contacting the peptide aptamer library with the target. The method can comprise analyzing the bound synthetic peptide aptamers, thereby identifying the synthetic peptide aptamer of the plurality of synthetic peptide aptamers that binds the target.
Synthetic peptide aptamers
[0045] In some embodiments, the method comprises providing a peptide aptamer library comprising a plurality of synthetic peptide aptamers, the plurality of synthetic peptide aptamers being generated from at least one synthetic peptide.
[0046] As used herein, the terms “protein”, “peptide”, “polypeptide” and the like are interchangeable and refer to any chain of two or more natural or unnatural amino acid residues. The terms encompass modifications, such as modifications to the backbone and modifications to side chains.
[0047] As used herein, the term “synthetic peptide” refers to any peptide that is not obtained from a natural source. For example, a synthetic peptide can be generated by chemical synthesis. Chemical synthesis can involve techniques well known in the chemistry of proteins such as solid phase synthesis (Merrifield, 1963, J. Am. Chem. Soc. 85:2149-2154) or synthesis in homogenous solution (Houben-Weyl, 1987, Methods of Organic Chemistry, ed. E. Wünsch, Vol. 15 I and II, Thieme, Stuttgart). The peptide may also be prepared using a recombinant protein expression system.
[0048] The synthetic peptide can comprise two or more domains. The synthetic peptide can further comprise at least one cleavage site.
[0049] In some embodiments, the at least one synthetic peptide comprises two or more domains separated by a cleavage site.
[0050] As used herein, the term “domain”, in the context of the synthetic peptide, refers to a region of the synthetic peptide flanked at least on one side by one or more cleavage sites.
[0051] The structure of the synthetic peptide may be represented by:
D1-C1-...-Dx+1, wherein D is a domain; C is a cleavage site; and - is a bond.
[0052] Examples of the structure of the synthetic peptide include: D1-C1-D2, D1-C1-D2-C2-D3, D1-C1-D2-C2-D3-C3-D4-C4-D5-C5-D6, etc.
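For illustration, the alternating arrangement of domains and cleavage sites may be sketched in Python as below. The function names and the example domain and cleavage-site sequences are hypothetical and chosen only for the sketch; they are not sequences from the disclosure.

```python
def structure_label(n_domains):
    """Return the schematic label for n domains, e.g. 'D1-C1-D2' for two."""
    if n_domains < 2:
        raise ValueError("a synthetic peptide here has two or more domains")
    parts = []
    for i in range(1, n_domains):
        parts.append(f"D{i}")
        parts.append(f"C{i}")
    parts.append(f"D{n_domains}")
    return "-".join(parts)


def assemble_synthetic_peptide(domains, cleavage_sites):
    """Interleave domain sequences with cleavage-site residues.

    domains: list of x+1 domain sequences (one-letter amino acid codes).
    cleavage_sites: list of x cleavage-site sequences placed between them.
    """
    if len(domains) < 2 or len(domains) != len(cleavage_sites) + 1:
        raise ValueError("expected x+1 domains (x >= 1) and x cleavage sites")
    parts = []
    for domain, site in zip(domains, cleavage_sites):
        parts.append(domain)
        parts.append(site)
    parts.append(domains[-1])
    return "".join(parts)


if __name__ == "__main__":
    print(structure_label(3))
    print(assemble_synthetic_peptide(["AGS", "DEV", "HLM"], ["K", "F"]))
```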
[0053] In some embodiments, the synthetic peptide comprises two domains. In some embodiments, the synthetic peptide comprises three domains. In some embodiments, the synthetic peptide comprises four domains. In some embodiments, the synthetic peptide comprises five domains. In some embodiments, the synthetic peptide comprises six domains. In some embodiments, the synthetic peptide comprises seven domains. In some embodiments, the synthetic peptide comprises eight domains. In some embodiments, the synthetic peptide comprises nine domains. In some embodiments, the synthetic peptide comprises ten domains. In some embodiments, the synthetic peptide comprises more than ten domains.
[0054] In some embodiments, the synthetic peptide comprises at least one cleavage site between two adjacent domains. In some embodiments, the at least one synthetic peptide comprises two or more domains with at least one cleavage site between two adjacent domains.
[0055] The cleavage site can be a chemical cleavage site or a protease cleavage site, such as a trypsin cleavage site or a chymotrypsin cleavage site.
[0056] Non-specific and/or sequence specific proteases can be used. Examples of proteases that can be used include but are not limited to Arg-C, Asp-N, Asp-N (N-terminal Glu), BNPS or NCS/urea, Chymotrypsin, Chymotrypsin (low specificity), Clostripain, Glu-C (AmAc buffer), Glu-C (Phos buffer), Lys-C, Lys-N, Lys-N (Cys modified), Pancreatic elastase, Pepsin A, Pepsin A (low specificity), Prolyl endopeptidase, Proteinase K, Thermolysin, Trypsin, Trypsin (Arg blocked), Trypsin (Cys modified), Trypsin (Lys blocked). Examples of sequence-specific protease include but are not limited to Caspase-1, Caspase-2, Caspase-3, Caspase-4, Caspase-5, Caspase-6, Caspase-7, Caspase-8, Caspase-9, Caspase-10, Enterokinase, Factor Xa, Granzyme B, HRV3C protease, TEV protease, and/or thrombin.
[0057] Examples of chemical digest include but are not limited to CNBr, CNBr (methyl-Cys), CNBr (with acids), Formic acid, Glu-C (AmAc buffer), Glu-C (Phos buffer), Hydroxylamine, Iodosobenzoic acid, Mild acid hydrolysis, NBS (long exposure), NBS (short exposure), and/or NTCB.
[0058] In some embodiments, the cleavage site comprises a protease cleavage site. In one embodiment, the cleavage site comprises a trypsin cleavage site. In one embodiment, the cleavage site comprises a chymotrypsin cleavage site.
[0059] Trypsin preferentially cleaves C-terminal to arginine or lysine, and chymotrypsin preferentially cleaves C-terminal to tryptophan, tyrosine, or phenylalanine. In some embodiments, the at least one cleavage site comprises an arginine. In some embodiments, the at least one cleavage site comprises a lysine. In some embodiments, the at least one cleavage site comprises a tryptophan. In some embodiments, the at least one cleavage site comprises a tyrosine. In some embodiments, the at least one cleavage site comprises a phenylalanine.
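As a minimal sketch of the cleavage preferences stated above, the following Python function splits a sequence C-terminal to each listed residue. It is a simplification made for illustration: it assumes complete cleavage at every listed residue and ignores sequence-context effects on protease activity. The function name and the example sequence are hypothetical.

```python
# Residues after which each protease preferentially cleaves, as stated above.
TRYPSIN_SITES = frozenset("RK")
CHYMOTRYPSIN_SITES = frozenset("WYF")


def digest(sequence, cleavage_residues):
    """Split a peptide C-terminal to every residue in cleavage_residues.

    Assumes complete digestion and no context-dependent exceptions.
    """
    fragments = []
    start = 0
    for index, residue in enumerate(sequence):
        if residue in cleavage_residues:
            fragments.append(sequence[start:index + 1])
            start = index + 1
    if start < len(sequence):
        fragments.append(sequence[start:])  # C-terminal remainder
    return fragments


if __name__ == "__main__":
    peptide = "AGSKDEVFHLM"  # hypothetical D1-C1-D2-C2-D3 peptide
    print(digest(peptide, TRYPSIN_SITES))
    print(digest(peptide, TRYPSIN_SITES | CHYMOTRYPSIN_SITES))
```

Passing the union of both residue sets models a digest with both enzymes, in which each domain is released together with the cleavage-site residue at its C-terminus.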
[0060] In some embodiments, the cleavage site comprises a chemical cleavage site.
[0061] The sequence of a domain can be randomly generated. For example, a pool of different amino acids, rather than a specific amino acid, can be used to couple an amino acid at a particular cycle during chemical synthesis. Further, the sequence can be randomly generated using a subset of amino acids.
[0062] In some embodiments, at least one domain comprises at least one random amino acid. In some embodiments, the at least one random amino acid is randomly selected from a subset of amino acids.
[0063] From a set of amino acids, a subset can be selected for generating one or more random amino acids in a domain. The number of amino acids in the subset can determine the complexity of the resulting aptamer library.
[0064] In some embodiments, the subset of amino acids comprises a subset of fewer than 20 amino acids. In some embodiments, the subset of amino acids comprises a subset of fewer than 19 amino acids. In some embodiments, the subset of amino acids comprises a subset of fewer than 18 amino acids. In some embodiments, the subset of amino acids comprises a subset of fewer than 17 amino acids. In some embodiments, the subset of amino acids comprises a subset of fewer than 16 amino acids. In some embodiments, the subset of amino acids comprises a subset of fewer than 15 amino acids. In some embodiments, the subset of amino acids comprises a subset of fewer than 14 amino acids. In some embodiments, the subset of amino acids comprises a subset of fewer than 13 amino acids. In some embodiments, the subset of amino acids comprises a subset of fewer than 12 amino acids. In some embodiments, the subset of amino acids comprises a subset of fewer than 11 amino acids. In some embodiments, the subset of amino acids comprises a subset of fewer than 10 amino acids. In some embodiments, the subset of amino acids comprises a subset of fewer than 9 amino acids. In some embodiments, the subset of amino acids comprises a subset of fewer than 8 amino acids. In some embodiments, the subset of amino acids comprises a subset of fewer than 7 amino acids. In some embodiments, the subset of amino acids comprises a subset of fewer than 6 amino acids. In some embodiments, the subset of amino acids comprises a subset of fewer than 5 amino acids. In some embodiments, the subset of amino acids comprises a subset of fewer than 4 amino acids. In some embodiments, the subset of amino acids comprises a subset of fewer than 3 amino acids. In some embodiments, the subset of amino acids comprises a subset of 2 amino acids.
[0065] In some embodiments, the subset of amino acids comprises a subset of between 2 and 19 amino acids. In some embodiments, the subset of amino acids comprises a subset of between 2 and 18 amino acids. In some embodiments, the subset of amino acids comprises a subset of between 2 and 17 amino acids. In some embodiments, the subset of amino acids comprises a subset of between 2 and 16 amino acids. In some embodiments, the subset of amino acids comprises a subset of between 2 and 15 amino acids. In some embodiments, the subset of amino acids comprises a subset of between 2 and 14 amino acids. In some embodiments, the subset of amino acids comprises a subset of between 2 and 13 amino acids. In some embodiments, the subset of amino acids comprises a subset of between 2 and 12 amino acids. In some embodiments, the subset of amino acids comprises a subset of between 2 and 11 amino acids. In some embodiments, the subset of amino acids comprises a subset of between 2 and 10 amino acids. In some embodiments, the subset of amino acids comprises a subset of between 2 and 9 amino acids. In some embodiments, the subset of amino acids comprises a subset of between 2 and 8 amino acids. In some embodiments, the subset of amino acids comprises a subset of between 2 and 7 amino acids. In some embodiments, the subset of amino acids comprises a subset of between 2 and 6 amino acids. In some embodiments, the subset of amino acids comprises a subset of between 2 and 5 amino acids. In some embodiments, the subset of amino acids comprises a subset of between 2 and 4 amino acids. In some embodiments, the subset of amino acids comprises a subset of 2 or 3 amino acids.
[0066] In some embodiments, the subset of amino acids comprises a subset of the 20 common amino acids.
[0067] As used herein, the term “20 common amino acids” refers to the amino acids alanine, arginine, asparagine, aspartic acid, cysteine, glutamine, glutamic acid, glycine, histidine, isoleucine, leucine, lysine, methionine, phenylalanine, proline, serine, threonine, tryptophan, tyrosine, and valine.
[0068] The subset of amino acids is not limited to a subset of the 20 common amino acids. Subsets of other amino acid sets can also be used. An amino acid set can comprise one or more common amino acids, one or more uncommon amino acids, one or more unnatural amino acids, or any combination thereof.
[0069] The different domains of the synthetic peptide can be randomly generated using the same or different sets of amino acids.
[0070] In some embodiments, at least one of the domains comprises a random sequence.
[0071] In some embodiments, the random sequence is a random sequence of 20 amino acids. In some embodiments, the random sequence is a random sequence of 19 amino acids. In some embodiments, the random sequence is a random sequence of 18 amino acids. In some embodiments, the random sequence is a random sequence of 17 amino acids. In some embodiments, the random sequence is a random sequence of 16 amino acids. In some embodiments, the random sequence is a random sequence of 15 amino acids. In some embodiments, the random sequence is a random sequence of 14 amino acids. In some embodiments, the random sequence is a random sequence of 13 amino acids. In some embodiments, the random sequence is a random sequence of 12 amino acids. In some embodiments, the random sequence is a random sequence of 11 amino acids. In some embodiments, the random sequence is a random sequence of 10 amino acids. In some embodiments, the random sequence is a random sequence of 9 amino acids. In some embodiments, the random sequence is a random sequence of 8 amino acids. In some embodiments, the random sequence is a random sequence of 7 amino acids. In some embodiments, the random sequence is a random sequence of 6 amino acids. In some embodiments, the random sequence is a random sequence of 5 amino acids. In some embodiments, the random sequence is a random sequence of 4 amino acids. In some embodiments, the random sequence is a random sequence of 3 amino acids. In some embodiments, the random sequence is a random sequence of 2 amino acids.
[0072] The synthetic peptide can further comprise one or more invariant amino acid residues. The one or more invariant amino acid residues can have acidic, basic or hydrophobic characters.
[0073] In some embodiments, the peptide further comprises at least one predetermined amino acid residue at a predetermined position within at least one domain.
[0074] In some embodiments, the at least one predetermined amino acid residue comprises a hydrophobic amino acid.
[0075] In some embodiments, the at least one predetermined amino acid residue comprises tryptophan, histidine, phenylalanine, methionine, tyrosine, cysteine, or lysine.
[0076] The peptide can further comprise different random pools of amino acids between two cleavage sites, between two invariant amino acid residues, or between a cleavage site and an invariant amino acid residue, wherein each random pool comprises a pool of a subset of the 20 common amino acids.
[0077] In some embodiments, at least one domain comprises at least one random amino acid.
[0078] In some embodiments, the at least one random amino acid is randomly selected from a subset of the 20 common amino acids.
[0079] For example, a peptide aptamer library can be designed with tryptic and chymotryptic sites spaced to create several subdomains, and in each domain a defined subset of the 20 common amino acids can be randomly ordered alongside invariant amino acids with acidic, basic or hydrophobic characters. An aptamer of all 20 common amino acids randomly ordered would create a large computational problem. Moreover, randomizing all 20 common amino acids would also create a chemical problem, in that there would be too many peptide species in the library for any one of them to reach a concentration in the attomolar, femtomolar, picomolar or nanomolar range for binding assays. In contrast, by inserting tryptic and chymotryptic sites along the backbone of the peptide aptamer, restricting the number of random amino acids in each resulting cleavage peptide, and inserting invariant amino acids at known locations, the whole aptamer may contain <20 random amino acids or modifications thereof, while the cleavage products are each short enough and simple enough to be identified by tandem mass spectrometry.
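The effect of restricting the amino acid pool on library complexity can be estimated with simple arithmetic. The following is a minimal sketch only: the number of random positions (12), the pool size (4), and the total peptide concentration (1 mM) are assumed values chosen for illustration, and the function names are hypothetical.

```python
from math import prod


def library_size(pool_sizes):
    """Number of distinct sequences when each random position draws from a pool of the given size."""
    return prod(pool_sizes)


def per_species_molar(total_molar, pool_sizes):
    """Average concentration of each species, assuming all species are equally represented."""
    return total_molar / library_size(pool_sizes)


full = [20] * 12        # assumed: 12 random positions, all 20 common amino acids
restricted = [4] * 12   # assumed: 12 random positions, a 4-residue subset at each

print(library_size(full))        # 4096000000000000
print(library_size(restricted))  # 16777216

# Assumed total peptide concentration of 1 mM split evenly across all species.
print(per_species_molar(1e-3, full))        # below the attomolar range
print(per_species_molar(1e-3, restricted))  # tens of picomolar
```

Under these assumptions, the unrestricted library leaves each species at less than one attomolar, whereas the restricted library keeps each species in the picomolar range.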
[0080] As used herein, the term “contacting the peptide aptamer library with the target” means allowing the peptide aptamer library and the target to interact by any means. In some embodiments, the target can be immobilized on a surface, and the peptide aptamer library can be allowed to come into contact with the surface. Contacting the peptide aptamer library with the target can cause one or more synthetic peptide aptamers to bind the target. In some embodiments, synthetic peptide aptamers that are not bound to the target after contacting the peptide aptamer library with the target are removed. In some embodiments, the bound synthetic peptide aptamers can be released from the target for analysis.
Fitting of spectra
[0081] Identifying the synthetic peptide aptamer that binds the target can be performed using a computational method described in WO2024000077, the content of which is incorporated by reference herein in its entirety.
[0082] In some embodiments, the method can further comprise analyzing the bound synthetic peptide aptamers by MS/MS or MSn. In some embodiments, the MS/MS or MSn spectra can be fitted to peptide sequences with or without a database, or using statistical methods to fit the peptides to randomly generated or other peptides. Optionally, the methods can also involve using the principles of peptide library design and de novo sequencing to constrain and guide the creation of a peptide library based on the amino acid composition determined from MS, MS/MS, and/or MSn. The methods disclosed herein can further involve: generating the in silico peptide library to match the characteristics of the physical library used in the experiment and reducing the dimension for each MS/MS or MSn spectrum searched using the observed amino acids from the MS/MS or MSn spectra; and fitting the observed MS/MS or MSn spectra from the physical peptide that bound the target to the possible peptides in the in silico library using de novo, goodness of fit, regression, linear model, correlation or heuristic algorithms to determine the amino acid sequence and the molecular composition of matter of the physical peptide that bound the target.
[0083] The methods disclosed herein can match spectra from a sample to peptide sequences. In some embodiments, a combination of scoring algorithms can be used, for example, based on a spectra line count, a count of sequential ions, and fitting algorithms based on chi square, linear regression, amino acid matching and a signal to noise intensity score to identify peptide sequences. The methods can use a search engine and the search engine can toggle back and forth between spectra generated from a reference library and a random number generator given certain parameters depending on the nature of the experiment. The search engine can employ a series of nested loops and can map fragments onto their respective precursor masses for multiple levels of fragmentation events (MSn). The disclosed embodiments can validate results by, for example, counting a number of spectra in the sample that match each identified peptide sequence and comparing this count with a count for randomly simulated and/or experimentally derived (e.g., blank noise) spectra.
[0084] In some embodiments, the method can comprise generating a MS/MS query spectrum of the bound synthetic peptide aptamer. The query spectrum can comprise one or more parameters. The one or more parameters can be received from user input. For example, the one or more parameters can be one or more of an amino acid or an amino acid group. In addition, a position corresponding to each of the amino acid or the amino acid group can be included. The one or more parameters can be derived from the query spectrum by a processor. For example, the one or more parameters can be one or more of a neutral loss - including ammonia losses and water losses, a post-translational modification shift - which may only be located on a small group of amino acids, an immonium ion (e.g., methionine oxidation), a subtraction of a B ion, or a subtraction of a Y ion.
[0085] The method can further comprise generating one or more candidate peptide sequences based on the one or more parameters. In some embodiments, generating one or more candidate peptide sequences can involve selecting at least one of a plurality of peptide sequences stored in memory to use as at least one candidate peptide sequence. For example, the plurality of peptide sequences can be stored in a peptide database or library. In some embodiments, the plurality of peptide sequences can include naturally occurring peptide sequences and/or synthetic peptide sequences. Selecting candidate peptide sequences from the memory can be suitable when natural aptamers are from a known source and/or where the possible peptide sequences are known (e.g., but not limited to digested plasma).
[0086] The selection of candidate peptide sequences from memory can be based on the one or more parameters received that can be used as constraints to narrow the search space of candidate peptide sequences. For example, the one or more parameters can include a particular amino acid (e.g., lysine or arginine) for the candidate peptide sequence. The one or more parameters can also include a particular position (e.g., last position) for the particular amino acid. Peptide sequences can be identified from the plurality of peptide sequences stored in memory having the particular amino acid in the particular position and then the candidate peptide sequences can be selected from amongst the peptide sequences having the particular amino acid in the particular position.
[0087] In another example, the one or more parameters can include at least one amino acid group for the candidate peptide sequence (e.g., basic, acidic, neutral, aliphatic or polar amino acid). Each amino acid group can include a plurality of amino acids. Peptide sequences can be identified from the plurality of peptide sequences stored in memory having an amino acid within the amino acid group and then the candidate peptide sequences can be selected from amongst the peptide sequences having an amino acid of the amino acid group.
[0088] In some embodiments, the one or more parameters can include both at least one particular amino acid and at least one particular amino acid group. Other constraints are possible.
[0089] In yet a further example, the one or more parameters can also include a particular position (e.g., last position) for the amino acid of a particular amino acid group. Peptide sequences can be identified from the plurality of peptide sequences stored in memory having an amino acid of the particular amino acid group in the particular position and then the candidate peptide sequences can be selected from amongst the peptide sequences having an amino acid of the particular amino acid group in the particular position.
[0090] In some embodiments, generating a plurality of candidate peptide sequences can involve randomly generating at least one candidate peptide sequence. Randomly generating candidate peptide sequences can be suitable with synthetic peptide sequences or unknown peptide aptamers.
[0091] The one or more parameters can be used as constraints to narrow the possible combinations for randomly generated candidate peptide sequences. For example, the one or more parameters can include a predetermined length for the randomly generated candidate peptide sequence.
[0092] Additional constraints can be used to further narrow the possible combinations for randomly generated candidate peptide sequences. The one or more parameters can relate to a minimum number of amino acids from particular amino acid groups, a particular position for an amino acid from the particular amino acid group, a particular amino acid, and a particular position for the particular amino acid, a neutral loss, a post-translational modification shift, an immonium ion, a subtraction of a B ion, or a subtraction of a Y ion.
[0093] In some embodiments, at least one amino acid can be assigned to the randomly generated candidate peptide sequence based on the one or more parameters. For example, a particular amino acid can be assigned to a particular position, based on the one or more parameters. Amino acids can be randomly assigned for the remaining unassigned positions.
[0094] In some embodiments, randomly generating a candidate peptide sequence for a pre-determined length can involve randomly selecting an unassigned position within the candidate peptide sequence. For the unassigned position, an amino acid group can be randomly selected from a plurality of amino acid groups and an amino acid from the selected amino acid group can be randomly selected. Amino acid groups can include, for example, an anchor for tryptic peptide, acidic, basic, polar uncharged, polar uncharged with amine side chain, sulfide R, neutral, aliphatic, and proline. The selected amino acid can be assigned to the unassigned position and the process can continue until each position is assigned.
[0095] In some embodiments, each randomly generated candidate peptide sequence can include at least one amino acid from each amino acid group. When the randomly generated candidate peptide sequence requires at least one amino acid from each amino acid group, the subsequent random amino acid group selection can be from amongst the unselected amino acid groups. However, other methods are possible for ensuring that at least one amino acid from each amino acid group is included in the randomly generated candidate peptide sequence.
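The constrained random generation described above can be sketched as follows. This is a minimal illustration and not the disclosed implementation: the group membership in `GROUPS` is an assumed, simplified grouping, and the function name `random_candidate` and its parameters are hypothetical.

```python
import random

# Assumed, simplified group membership for illustration only.
GROUPS = {
    "acidic": "DE",
    "basic": "HKR",
    "polar_uncharged": "STNQ",
    "sulfur": "CM",
    "aliphatic": "AVLIG",
    "proline": "P",
}


def random_candidate(length, fixed=None, rng=random):
    """Assign fixed residues first, then fill each remaining position by
    choosing a random group and a random residue from that group."""
    fixed = fixed or {}
    seq = [None] * length
    for pos, residue in fixed.items():
        seq[pos] = residue
    unassigned = [i for i in range(length) if seq[i] is None]
    rng.shuffle(unassigned)  # visit unassigned positions in random order
    unused_groups = list(GROUPS)
    for pos in unassigned:
        # Prefer groups not yet selected so each group is represented
        # whenever enough unassigned positions are available.
        pool = unused_groups if unused_groups else list(GROUPS)
        group = rng.choice(pool)
        if group in unused_groups:
            unused_groups.remove(group)
        seq[pos] = rng.choice(GROUPS[group])
    return "".join(seq)


rng = random.Random(0)  # seeded only to make the example repeatable
peptide = random_candidate(10, fixed={9: "K"}, rng=rng)
print(peptide)
```

Here the lysine fixed at the last position plays the role of a tryptic anchor, and drawing from the not-yet-selected groups first is one of several possible ways to ensure that every group appears at least once.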
[0096] In some embodiments, candidate peptide sequences can be randomly generated to include known amino acids at known or unknown positions. Furthermore, the randomly generated candidate peptide sequences can be tailored to what is known about a particular binding target.
[0097] In some embodiments, whether to select candidate peptide sequences from memory or by randomly generating candidate peptide sequences can be determined based on the one or more parameters.
[0098] In some embodiments, the one or more parameters can include a parameter specified by the user for selecting whether to select candidate peptide sequences from memory or by randomly generating candidate peptide sequences. In some embodiments, the one or more parameters can be compared to certain thresholds to determine whether to select candidate peptide sequences from memory or by randomly generating candidate peptide sequences.
[0099] In some embodiments, candidate peptide sequences can be selected from memory and randomly generated. For example, a subset of amino acids can be identified from a protein database and then the candidate peptide sequence can be randomly generated to identify post translational modifications (PTM) by including the amino acid masses with mass shifts, depending on the nature of the PTM.
[00100] In some embodiments, candidate peptide sequences can initially be selected from memory. Depending on the likelihood indicators, the same query spectrum can be analyzed again but with candidate peptide sequences randomly generated.
[00101] A plurality of samples of the query spectrum for which a peptide is being identified can be generated. In some embodiments, the samples of the query spectrum can be generated experimentally or by simulation. In some embodiments, the simulation can be a Monte Carlo random simulation. Generating a plurality of samples can involve generating spectra lines of the samples.
[00102] At least one query sample from the plurality of query samples can be selected for comparison with the one or more candidate peptide sequences. Selecting query samples of the plurality of query samples for comparison can reduce the computational burden of fitting a large dataset to a large search space. In order to select query samples for comparison, one or more signal processing filters can be applied. Signal processing filters can include minimum spectra counting, maximum spectra lines filtering, precursor mass testing, base peak testing, and/or delta mass filtering.
[00103] Minimum spectra counting involves discarding spectra with line counts lower than a given threshold as there is likely not enough information below the threshold of spectra lines to make a reliable peptide match. In some embodiments, a number of spectra lines of a query sample can be counted. In some embodiments, the number of matching sequential ion (e.g., b and y ions) spectra lines can be counted. If the number of spectra lines of the query sample or the number of matching sequential ion spectra lines is less than a pre-determined minimum number of spectra lines, that query sample can be excluded from comparison with the one or more candidate peptide sequences. In some embodiments, a correction is applied to the number of matching sequential ion spectra lines. For example, a ratio of the number of matching sequential ion spectra lines and the total number of spectra lines can be determined and the pre-determined minimum number of spectra lines can correspond to a pre-determined minimum ratio. In some embodiments, the predetermined minimum number of spectra lines can be specified by the user. For example, the pre-determined minimum number of spectra lines can be a parameter received from the user.
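Minimum spectra counting, including the ratio-based correction, can be sketched as below. The function names, the default tolerance of 0.5 m/z units, and the example values are assumptions made for illustration.

```python
def count_matching_lines(observed_mz, theoretical_mz, tol=0.5):
    """Count observed lines lying within +/- tol of any theoretical ion m/z."""
    return sum(
        any(abs(obs - theo) <= tol for theo in theoretical_mz)
        for obs in observed_mz
    )


def passes_minimum_count(observed_mz, theoretical_mz, min_lines=None,
                         min_ratio=None, tol=0.5):
    """Return False when the query sample should be excluded from comparison."""
    if min_lines is not None and len(observed_mz) < min_lines:
        return False  # too few spectra lines for a reliable match
    if min_ratio is not None:
        if not observed_mz:
            return False
        matched = count_matching_lines(observed_mz, theoretical_mz, tol)
        if matched / len(observed_mz) < min_ratio:
            return False  # too small a share of lines match sequential ions
    return True


observed = [100.0, 175.1, 200.3, 304.2, 500.0]   # hypothetical query sample
theoretical = [175.12, 304.16, 433.2]            # hypothetical b/y ion m/z values

print(count_matching_lines(observed, theoretical))                        # 2
print(passes_minimum_count(observed, theoretical, min_lines=5))           # True
print(passes_minimum_count(observed, theoretical, min_ratio=0.5))         # False
```

In this example two of the five observed lines match, giving a ratio of 0.4, so the sample passes a threshold of 0.3 but not one of 0.5.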
[00104] Maximum spectra lines filtering involves using the most intense spectra lines. For maximum spectra lines filtering, a spectra intensity of a query sample can be determined. In some embodiments, the sum of the spectra intensities of matching ion spectra lines, the total sum of the spectra intensity (i.e., the sum of the spectra intensities of all of the spectra lines), and the ratio of the sum of the spectra intensities of the matching ion spectra lines to the total sum of the spectra intensity can be determined. If the ratio is less than a predetermined threshold, that query sample can be excluded from comparison with the one or more candidate peptide sequences. In some embodiments, the pre-determined spectra intensity can be based on a candidate peptide sequence, such as a percentage of the spectra intensity of a candidate peptide sequence. The percentage of the spectra intensity can be specified by the user. In this manner, with the pre-determined spectra intensity being based on a candidate peptide sequence, the maximum spectra lines filtering can be applied dynamically. The max spectra value relates to the intensity values. Assuming a max spectra value of 50, the engine would examine only the 50 most intense spectra lines. This significantly reduces the computation time, as a single MS2 spectrum can contain thousands of spectra lines.
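A minimal sketch of maximum spectra lines filtering follows, assuming a spectrum is represented as a list of (m/z, intensity) pairs. The function names, tolerance, and example values are hypothetical.

```python
def top_n_lines(spectrum, n=50):
    """Keep the n most intense (m/z, intensity) lines, returned in m/z order."""
    strongest = sorted(spectrum, key=lambda line: line[1], reverse=True)[:n]
    return sorted(strongest)


def matched_intensity_ratio(spectrum, theoretical_mz, tol=0.5):
    """Share of total intensity carried by lines that match a theoretical ion."""
    total = sum(intensity for _, intensity in spectrum)
    matched = sum(
        intensity
        for mz, intensity in spectrum
        if any(abs(mz - theo) <= tol for theo in theoretical_mz)
    )
    return matched / total if total else 0.0


spectrum = [(100.0, 5.0), (175.1, 80.0), (200.3, 10.0), (304.2, 60.0), (500.0, 45.0)]
kept = top_n_lines(spectrum, n=3)
print(kept)  # [(175.1, 80.0), (304.2, 60.0), (500.0, 45.0)]
print(matched_intensity_ratio(kept, [175.12, 304.16]))  # 140 / 185
```

A query sample whose ratio falls below the chosen threshold would then be excluded from further comparison.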
[00105] Precursor mass testing involves comparing a precursor mass of a query sample to the mass of a candidate peptide sequence. Query samples having a precursor mass matching the mass of any candidate peptide can be further processed. Query samples having a precursor mass that does not match any candidate peptide can be discarded from further consideration, and the next sample is assessed.
[00106] Precursor mass testing can involve determining a precursor mass for a query sample. If the precursor mass is substantially equal to a mass of a candidate peptide sequence of the one or more candidate peptide sequences, that query sample can be selected for comparison with the one or more candidate peptide sequences. The precursor mass can be considered to be substantially equal to a mass of a candidate peptide sequence within a pre-determined error range of the mass of the candidate peptide sequence. In some embodiments, the pre-determined error range can be specified by the user.
[00107] In some embodiments, a precursor mass for a query sample can be determined at different charge states. For example, precursor masses at charge states of 1, 2, and 3 can be determined.
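Precursor mass testing at several charge states can be sketched as follows. The residue masses are standard monoisotopic values; the function names, the tolerance, and the example peptides are assumptions made for illustration.

```python
PROTON = 1.00728   # mass added per positive charge
WATER = 18.01056   # added once per peptide for the free termini

# Standard monoisotopic residue masses (Da).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def neutral_mass(sequence):
    """Monoisotopic mass of the unmodified, uncharged peptide."""
    return sum(RESIDUE_MASS[r] for r in sequence) + WATER


def mz_at_charge(mass, charge):
    """Theoretical m/z of a peptide carrying the given number of protons."""
    return (mass + charge * PROTON) / charge


def precursor_matches(precursor_mz, candidates, tol=0.5, charges=(1, 2, 3)):
    """Return (sequence, charge) pairs whose m/z lies within tol of the precursor."""
    hits = []
    for seq in candidates:
        mass = neutral_mass(seq)
        for z in charges:
            if abs(mz_at_charge(mass, z) - precursor_mz) <= tol:
                hits.append((seq, z))
    return hits


# Hypothetical query: a precursor observed at m/z 181.6.
print(precursor_matches(181.6, ["GASK", "GAVK"], tol=0.05))  # [('GASK', 2)]
```

Query samples for which the returned list is empty would be discarded, and the next sample assessed.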
[00108] In some embodiments, a precursor mass for a query sample can be determined with consideration of the presence of one or more post-translational modifications (PTMs). For example, the precursor mass can include a mass shift from one or more post-translational modifications.
[00109] Base peak testing involves comparing a mass of a theoretical ion of the query sample to the mass of a base peak. Query samples having theoretical ions having a mass that matches the mass of the base peak can be further processed. Query samples having theoretical ions having a mass that does not match the base peak can be discarded from further consideration, and the next sample is assessed. In some embodiments, a mass of a theoretical ion of the query sample is determined. If the mass of the theoretical ion is substantially equal to a mass of a base peak, that query sample can be selected for comparison with one or more candidate peptide sequences.
[00110] In some embodiments, if the precursor mass is not present in the observed spectra, the precursor mass can be added to the theoretical spectra. By adding the precursor mass to the theoretical spectra, the likelihood of identifying the first amino acid in the peptide sequence can be increased and the Amino Acid Match Ratio (AAMR) score can be improved.
[00111] In some embodiments, a theoretical ion can be added to the theoretical spectra; that is, the low end of the spectra can be extended. The theoretical ion can represent a terminal mass. Addition of the theoretical ion can improve the peptide match mass ratio score.
[00112] In some embodiments, a theoretical peptide’s MH value at charge state 1, 2 or 3 could be added to the theoretical spectra. Addition of the theoretical peptide’s MH value at charge state 1, 2 or 3 can apply to the peptide match score. Once a random peptide is generated, both the theoretical spectra and MH value can be calculated. The MH mass can be compared to the precursor mass so as to calculate the peptide's charge. The mass spectrometer, however, regardless of the precursor mass, may well be reporting the spectra for a peptide at each of the three charge states within a single MS2 scan. If this is the case, then the real 1+ spectra representing the unknown peptide can generally provide the most usable spectra. In some embodiments, the 1+ spectra can be used as the primary identification signal followed by examination of the 2+ and 3+ spectra for additional evidence of a good identification.
[00113] In some embodiments, the MH of a particular peptide can be injected into the observed spectra. Addition of the MH of a particular peptide can be used to calculate the peptide match score.
[00114] In some embodiments, the difference between the theoretical mass and the observed peptide mass (the delta mass) can be calculated using the estimated charge of a theoretical peptide, the known theoretical mass of the same theoretical peptide, and the known observed precursor mass. The delta mass can be used to identify a modification, that is, an extra chemical group on the peptide; the corresponding mass shift is also known as a modification mass. For example, phosphorylation is a commonly observed post-translational modification with a known mass shift of approximately 79.97 Da; if the delta mass is about this value, it can be determined that the peptide is phosphorylated. Such modifications are relevant to the role a peptide may play with respect to human health. The modification contributes to the precursor mass but is not part of the theoretical mass, and it also may not appear in the MS2 spectra. A delta mass of this kind is therefore highly suggestive of a modification. To test for a modification, the modification mass is subtracted from the precursor mass. If the result of this calculation matches the theoretical mass of the random peptide and the real MS2 spectra match the theoretical spectra, it can be assumed that the observed peptide is the theoretical peptide carrying the modification.
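The delta mass test can be sketched as follows. The shift values are standard monoisotopic masses (about 79.966 Da for phosphorylation); the table contents, function names, tolerance, and example values are assumptions made for illustration.

```python
PROTON = 1.00728

# Standard monoisotopic mass shifts (Da) for a few common modifications.
KNOWN_SHIFTS = {
    "phosphorylation": 79.96633,
    "oxidation": 15.99491,
    "acetylation": 42.01057,
}


def delta_mass(precursor_mz, charge, theoretical_mass):
    """Observed neutral mass (from precursor m/z and charge) minus theoretical mass."""
    observed_mass = charge * (precursor_mz - PROTON)
    return observed_mass - theoretical_mass


def explain_delta(delta, tol=0.02):
    """Names of known modifications whose shift lies within tol of the delta mass."""
    return [name for name, shift in KNOWN_SHIFTS.items() if abs(delta - shift) <= tol]


# Hypothetical case: theoretical peptide mass 361.19612 Da, precursor
# observed at m/z 221.5885 with an estimated charge of 2.
delta = delta_mass(221.5885, 2, 361.19612)
print(round(delta, 3))       # 79.966
print(explain_delta(delta))  # ['phosphorylation']
```

A match in `explain_delta` only suggests a modification; as stated above, the MS2 spectra must also match the theoretical spectra before the assignment is accepted.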
[00115] In some embodiments, application of signal processing filters can vary. For example, in some embodiments, only minimum spectra counting can be used. In other embodiments, minimum spectra counting, maximum spectra lines filtering, sequential ion (e.g., b and y ions) counting, and precursor testing can be used. Other combinations are possible. Furthermore, the signal processing filters can also depend on whether a candidate peptide sequence is selected from memory or randomly generated.
[00116] Candidate peptide sequences can be selected from memory. In some embodiments, one theoretical peptide or candidate peptide sequence at a time is compared to all available scans by tests and scoring algorithms until the peptide list is exhausted. In this embodiment, each theoretical peptide and spectrum pair must pass a precursor mass test and base peak test before they are scored by at least one of chi square, multiple/linear or nested regression, amino acid match ratio, and ion intensity match ratio. A user defined combination of these scores can be used to identify the best match for each scan. In some embodiments, one query sample can be compared to each candidate peptide sequence before another query sample is compared to each candidate peptide sequence.
[00117] Candidate peptide sequences can be randomly generated. In some embodiments, a “first in first out” (FIFO) queue of candidate peptide sequences can be used. The queue grows as candidate peptide sequences are generated and shrinks each time a candidate peptide sequence is processed.
[00118] Constraints can be used to narrow the possible combinations for randomly generated candidate peptide sequences. Constraints can be specified by the user, by one or more parameters. Constraints can also be determined from the library search method or from amino acid matching. As set out above, constraints can relate to a pre-determined length, a minimum number of amino acids from different amino acid groups, a particular amino acid at a particular position, and/or a precursor mass.
[00119] In an example, a constraint relating to the precursor mass is used. All query samples having a precursor mass within a tolerance window can be retrieved and a list of the base peak values created from these scans. A candidate peptide sequence can be randomly generated based on a pre-determined length and wild cards, and then the MH value of that candidate peptide sequence can be generated at each of 3 charge states. The precursor test can be run on the candidate peptide sequence and, if passed, theoretical spectra can be generated and the base peak test can be run. If the base peak test is passed, the more computationally intensive AAMR and IIMR tests are run. If these last two scores are above minimum thresholds, then the candidate peptide sequence is considered a match and is stored.
[00120] The precursor and base peak tests can be repeated later in the workflow for technical reasons. In particular, the precursor hint can be a list of m/z values, so one candidate peptide sequence might not match at the precursor level for a particular scan, and similarly it may not pass the base peak test.
[00121] A likelihood indicator for each candidate peptide sequence can be determined based on a comparison with the at least one query sample.
[00122] In some embodiments, determining a likelihood indicator for each candidate peptide sequence can involve applying scoring techniques, filtering techniques or a combination thereof. For example, scoring techniques can first be applied. The scoring techniques can be analogous to the signal processing filters, including spectra line filtering.
[00123] The likelihood indicator can alternatively or additionally be determined based on additional filtering techniques including one or more of a chi-square score, a regression score, and a cross correlation score. Other scoring functions can be used to generate a likelihood indicator.
[00124] In some embodiments, a chi-square score can be used to rank candidate peptide sequence matches. For example, a candidate peptide sequence can be selected as a proposed peptide sequence based on differences between an observed ion count of the candidate peptide sequence and the predicted theoretical ion count.
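[00125] Chi-square can be applied to the m/z values of observed and expected spectra lines for all matching ions as described by Equation (1), where the smallest score is highest ranked (i.e., wins):
$$\chi^2 = \frac{(\mathrm{ObsMz}_1 - \mathrm{ExpMz}_1)^2}{\mathrm{ExpMz}_1} + \frac{(\mathrm{ObsMz}_2 - \mathrm{ExpMz}_2)^2}{\mathrm{ExpMz}_2} + \dots + \frac{(\mathrm{ObsMz}_n - \mathrm{ExpMz}_n)^2}{\mathrm{ExpMz}_n} \qquad \text{Equation (1)}$$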
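As a hedged illustration of Equation (1), the score can be computed as a sum over paired observed and expected m/z values. The function names and the dictionary layout used for ranking are assumptions made for this sketch.

```python
def chi_square_mz(observed_mz, expected_mz):
    """Sum of (observed - expected)^2 / expected over paired matching ions."""
    if len(observed_mz) != len(expected_mz):
        raise ValueError("observed and expected lists must be paired")
    return sum((o - e) ** 2 / e for o, e in zip(observed_mz, expected_mz))


def rank_candidates(candidates):
    """Order candidate names so that the smallest chi-square score comes first.

    candidates maps a name to a pair (observed_mz, expected_mz).
    """
    return sorted(candidates, key=lambda name: chi_square_mz(*candidates[name]))
```

Because the smallest score wins, the first element returned by `rank_candidates` is the best-ranked candidate.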
[00126] Alternatively, the chi-square score can be calculated by summing the number of expected theoretical ions (e.g., b and y ions) for the expected peptide that fall within the M/z range of the mass spectrometer; summing the number of observed ions that match the expected M/z values, within the mass resolution limit of the mass spectrometer and applying Equation (2).
Equation (2)
Figure imgf000027_0001
[00127] In some embodiments, prior to applying Equation (2), a correction can be applied to the expected theoretical ion count. A correction can be applied to compensate for the M/z range and/or mass error of the mass spectrometer. In some embodiments, applying the correction involves multiplying the expected theoretical ion count by the total number of observed ions and dividing the product by the number of mass spectrometer bins. The number of bins can be determined according to Equation (3).
Equation (3)
Figure imgf000028_0001
[00128] For example, for a mass spectrometer with a range of 50 to 2000 M/z units and a mass resolution of +/- 0.5 M/z units, the number of bins is equal to (2000-50)/(0.5-(-0.5)) = 1950.
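The arithmetic of the worked example, together with the correction described in paragraph [00127], can be sketched as below. The form of `number_of_bins` is inferred from the worked example in paragraph [00128] and is therefore an assumption; the function and parameter names are hypothetical.

```python
def number_of_bins(mz_min, mz_max, tol_low, tol_high):
    """Instrument m/z range divided by the width of the mass tolerance window."""
    return (mz_max - mz_min) / (tol_high - tol_low)


def corrected_expected_count(expected_ions, observed_ions, bins):
    """Expected theoretical ion count, scaled by observed ions per bin."""
    return expected_ions * observed_ions / bins
```

For the instrument in the example, `number_of_bins(50, 2000, -0.5, 0.5)` evaluates to 1950.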
[00129] Various regression methods can be used to generate a likelihood indicator in some embodiments. Linear regression can be used to generate a simple linear regression model for each candidate peptide sequence to identify the best fitting model. Multiple linear regression can also be used to incorporate additional explanatory variables into the model such as intensity.
[00130] In some embodiments, cross correlation can be used to generate a likelihood indicator. A cross correlation score of a sample relative to a candidate peptide sequence can be determined. For example, a scoring function can measure the similarity of an MS/MS spectrum relative to a theoretical spectrum. Cross correlation can use a sliding dot product.
[00131] A method of determining a cross correlation score can involve first assigning an intensity to theoretical b ions and y ions. The square root of the ion intensities of the observed b and y ions is then calculated. The mean of the cross correlation over 500 lags is then calculated and subtracted from the cross correlation between the observed and theoretical ions at a lag of 0. Other methods of determining a cross correlation score can be used, such as the method described in Eng, J. K., McCormack, A. L., & Yates, J. R. (1994). “An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database.” Journal of the American Society for Mass Spectrometry, 5(11), 976-989.
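The sliding dot product described above can be sketched in plain Python. This is an illustration under stated assumptions: spectra are represented as equal-length lists of binned intensities, the lag window is taken as symmetric around zero, and the names `cross_correlation`, `xcorr_score` and `max_lag` are hypothetical. The application does not specify how the 500 lags are arranged.

```python
import math


def cross_correlation(observed, theoretical, lag):
    """Dot product of the observed bins with the theoretical bins shifted by lag."""
    n = len(observed)
    total = 0.0
    for i in range(n):
        j = i + lag
        if 0 <= j < n:
            total += observed[i] * theoretical[j]
    return total


def xcorr_score(observed, theoretical, max_lag=250):
    """Correlation at lag 0 minus the mean correlation over the lag window."""
    # Square-root transform of the observed intensities, as in the text.
    obs = [math.sqrt(x) for x in observed]
    lags = range(-max_lag, max_lag + 1)
    background = sum(cross_correlation(obs, theoretical, k) for k in lags) / len(lags)
    return cross_correlation(obs, theoretical, 0) - background
```

A spectrum whose peaks line up with the theoretical bins scores high at lag 0 and low elsewhere, so subtracting the background mean rewards aligned matches over chance overlaps.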
[00132] Other scoring functions can also be used, such as an ion intensity match ratio (IIMR), or an Amino Acid Match Ratio (AAMR).
[00133] As described, the ion intensity match ratio (IIMR) can generate a signal to noise score for each theoretical peptide match. The ion intensity match ratio can be calculated as the sum of the ion intensities where there is a match to the theoretical spectra, divided by the total sum of the ion intensities in the observed spectra. The IIMR value ranges from 0 to 1, where 1 represents a perfect score, and the calculation corrects for duplicate ion matches.
[00134] It is noted that the signal processing filters previously mentioned may have already reduced the noise. Since the ion intensity match ratio is determined after the signal processing filters have been applied, the ion intensity match ratio can be based on the Total Ion Current (TIC) for that spectrum. That is, the IIMR can be determined by dividing by the TIC for that spectrum.
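A minimal sketch of the ratio described in the two preceding paragraphs is given below. The peak representation as `(mz, intensity)` pairs, the 0.5 m/z tolerance and the function name are assumptions. Each observed peak is counted at most once, which is one simple way to avoid double counting duplicate ion matches.

```python
def iimr(observed_peaks, theoretical_mz, tolerance=0.5):
    """Matched ion intensity divided by the total ion current of the spectrum.

    observed_peaks is a list of (mz, intensity) pairs.
    """
    total_ion_current = sum(intensity for _, intensity in observed_peaks)
    if total_ion_current == 0:
        return 0.0
    matched = sum(
        intensity
        for mz, intensity in observed_peaks
        # An observed peak contributes once, however many theoretical ions it matches.
        if any(abs(mz - t) <= tolerance for t in theoretical_mz)
    )
    return matched / total_ion_current
```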
[00135] The Amino Acid Match Ratio can be used to generate an ordered list of suspected amino acids in the peptide by comparing the m/z differences in pairs of observed spectra lines.
[00136] The Peptide Match Ratio can be calculated by taking the sum of the correctly ordered amino acids and dividing that sum by the total number of amino acids used to generate the theoretical spectra. Since the Peptide Match Ratio score is computationally expensive, the Peptide Match Ratio may be only run against the best scoring results.
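One possible reading of the match ratio is sketched here: residues are inferred from the m/z differences between consecutive observed lines, and the ratio is the number of correctly ordered residues divided by the candidate length. The assumption that the lines form a singly charged b-ion ladder, the reduced residue table, the 0.02 m/z tolerance and all names are illustrative choices, not details from the application.

```python
# Subset of standard monoisotopic residue masses in Da (illustrative only).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203,
    "V": 99.06841, "T": 101.04768, "D": 115.02694,
}


def residues_from_ladder(peaks_mz, tolerance=0.02):
    """Infer one residue per consecutive pair of peaks; None if no mass fits."""
    ordered = sorted(peaks_mz)
    inferred = []
    for low, high in zip(ordered, ordered[1:]):
        delta = high - low
        hit = next(
            (aa for aa, mass in RESIDUE_MASS.items() if abs(delta - mass) <= tolerance),
            None,
        )
        inferred.append(hit)
    return inferred


def match_ratio(peaks_mz, candidate, tolerance=0.02):
    """Correctly ordered residues divided by the number of residues in the candidate."""
    inferred = residues_from_ladder(peaks_mz, tolerance)
    # A ladder b1..bn yields residues 2..n, so compare against candidate[1:].
    correct = sum(1 for got, want in zip(inferred, candidate[1:]) if got == want)
    return correct / len(candidate)
```

Under these assumptions the first residue cannot be recovered from the ladder differences alone, so a complete b-ion ladder for a three-residue peptide scores 2/3 rather than 1.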
[00137] Independent scoring algorithms can also be used to generate a single vector using Equation (4). The single vector can be more computationally efficient.
$$c = (a^2 + b^2) \qquad \text{Equation (4)}$$
[00138] Neutral Loss ions and A ions can be considered diagnostic or confirmatory ions which contribute to a score. Ion pairs, which can identify an Amino Acid as described in the Peptide Match ion, can represent a secondary scoring mechanism related to these confirmatory ions.
[00139] A signal-to-noise filter can be applied to the one or more candidate sequences based on the likelihood indicators for the candidate peptide sequences.
[00140] Mass spectrometers, even without a sample, continuously generate source noise spectra. A signal-to-noise filter can be applied using a Monte Carlo random simulation to separate real data from random spectra by chi-square or other statistical means. The signal-to-noise filter can eliminate samples that show minimal significance when compared to random or source noise spectra. The random source noise can be generated either with blank solvents applied to naive columns or using random number generators. Peptide sequences matching to random MS/MS spectra at high frequency compared to real data by chi square or other statistical means are not carried forward for further analysis.
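The computer-generated variant of this control can be sketched with a seeded random number generator. Everything specific here is an assumption for illustration: uniformly distributed peak positions, 50 peaks per random spectrum, the 50 to 2000 m/z range, the 0.5 m/z tolerance and the function names.

```python
import random


def random_spectrum(rng, n_peaks=50, mz_min=50.0, mz_max=2000.0):
    """One random MS/MS spectrum represented as a sorted list of m/z values."""
    return sorted(rng.uniform(mz_min, mz_max) for _ in range(n_peaks))


def count_matches(spectrum, theoretical_mz, tolerance=0.5):
    """Number of theoretical ions that have an observed line within tolerance."""
    return sum(
        1 for t in theoretical_mz
        if any(abs(t - mz) <= tolerance for mz in spectrum)
    )


def noise_match_frequency(theoretical_mz, min_matches, n_trials=1000, seed=0):
    """Fraction of random spectra that match the peptide at least min_matches times."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        if count_matches(random_spectrum(rng), theoretical_mz) >= min_matches:
            hits += 1
    return hits / n_trials
```

A peptide whose `noise_match_frequency` is comparable to its match frequency in real data would not be carried forward, in line with the last sentence of the paragraph above.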
[00141 ] In some embodiments, random source noise spectra at high frequencies can be generated. A difference between the random source noise spectra at high frequencies and the one or more candidate peptide sequences can be determined. If the difference exceeds a pre-determined threshold difference, the sample can be excluded from selection as a proposed peptide sequence for the query spectrum. Otherwise, the sample can be selected as a proposed peptide sequence.
[00142] A candidate peptide sequence can be selected as a proposed peptide sequence for the query spectrum based on the filtered candidate peptide sequences. The selection of proposed peptide sequences from filtered candidate peptide sequences can be based on a best fit. The fit can relate to a peptide, accession, or gene symbol. In some embodiments, a machine learning model can be used to select the best fitting filtered candidate peptide sequence to the proposed peptide sequence.
[00143] In some embodiments, a signal to noise ratio filter can be applied after a proposed peptide sequence for the query spectrum has been selected. In such embodiments, a signal to noise ratio filter may not be applied to the candidate peptide sequence. For example, the number of spectra in the query sample that match the proposed peptide sequence can be counted and this count compared to a count for randomly generated and/or experimentally obtained spectra (e.g., inauthentic). Samples associated with counts that are similar to those of inauthentic samples can be excluded from further analysis.
[00144] A challenge with mass spectrometry relates to generating non-redundant datasets for analysis. One or more strategies can be employed to reduce the redundancy of the final data without a loss of information at the level of peptides, accessions, and gene symbols. For example, one strategy can involve assigning a unique identifier to each query sample or MS/MS spectrum (e.g., SpectraID) and selecting the best fit per spectrum (BFPS) for each query sample or MS/MS spectrum. The best fit can be determined by the likelihood indicator. In some embodiments, the best fit can be determined using a machine learning model.
[00145] Candidate peptide sequences can be generated by multiple combinations of methods, including multiple libraries and multiple random generation methods. A unique identifier can be assigned to each combination (e.g., search definition ID). There can be different approaches to calculating a best fit. In some embodiments, a best fit can be calculated by search definition ID, where for each search definition ID, a best fit per spectrum is chosen independently of the other search definition IDs.
[00146] In some embodiments, the best fit per spectrum (BFPS) can be selected by Library and Search Engine. In the case where there are multiple static search methods for one library engine combination, the counts may be split off into the method where the SpectraID scored the best, or had the highest likelihood indicator. In this case there is minimal overlap between the two methods except where the SpectraID scored equally highest in both methods. Once the best fit per spectrum is identified, a non-redundant list can be made for each level at which the data may be analyzed.
[00147] At the level of gene symbols, only unique SpectraID-SearchdefinitionID-Genesymbol combinations are carried forward to create counts for all the subgroups for the experiment and are used for further statistical analysis where groups are compared by gene symbol.
[00148] The same is done at the level of peptides, where only SpectraID-SearchdefinitionID-Peptide combinations are carried forward for count tables and analysis at the level of peptides. SearchdefinitionID corresponds to the library and search settings.
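The redundancy reduction described in the preceding paragraphs can be sketched as a two-step operation: keep the best-scoring match per spectrum and search definition, then count unique combinations at the chosen level. The record layout (dictionaries with the keys shown) and the function names are assumptions for this example. The application describes performing such steps in a relational database.

```python
from collections import Counter


def best_fit_per_spectrum(matches):
    """Keep the highest-scoring match for each (spectra_id, search_def_id) pair."""
    best = {}
    for match in matches:
        key = (match["spectra_id"], match["search_def_id"])
        if key not in best or match["score"] > best[key]["score"]:
            best[key] = match
    return list(best.values())


def counts_by(best_fits, level):
    """Count unique spectrum/search-definition/level combinations per level value.

    level is "peptide" or "gene_symbol".
    """
    unique = {(m["spectra_id"], m["search_def_id"], m[level]) for m in best_fits}
    return Counter(value for _, _, value in unique)
```

Because each spectrum contributes at most one row per search definition, no MS/MS spectrum is counted toward more than one peptide within a search.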
Signal to Noise Filter
[00149] The method disclosed herein can comprise applying a signal to noise filter.
[00150] Noise can be generated from various sources. For example, noise can come from a statistical control and/or a non-specific binding control. A statistical control can include, for example, an electromagnetic noise control, a random MS/MS spectra control, or both. There are various ways to perform a non-specific binding control. For example, a pre-immune serum may be used in binding. An affinity column without an antigen attached, an affinity column with a control antigen attached, a naive affinity column that has not been pre-treated (blocked by a blocking agent), and/or a conditioned affinity column can also be a non-specific binding control.
[00151 ] When applying a signal to noise filter, an observation frequency of at least one control and an observation frequency of at least one query sample can be determined. If the observation frequency of the at least one query sample is higher than the observation frequency of the at least one control, then the at least one query sample can be further analyzed and/or the candidate spectrum can be selected as a proposed spectrum.
[00152] A threshold of the difference between the observation frequencies may be set to allow the at least one query sample to be further analyzed.
[00153] In some embodiments, applying the signal to noise filter comprises determining an observation frequency of at least one query sample. In some embodiments, applying the signal to noise filter comprises determining an observation frequency of at least one control. In some embodiments, if the observation frequency of the at least one query sample is higher than the observation frequency of the at least one control, then the at least one query sample is further analyzed. In some embodiments, if the observation frequency of the at least one query sample is higher than the observation frequency of the at least one control, then the candidate spectrum is selected as a proposed spectrum.
[00154] In some embodiments, applying the signal to noise filter comprises (i) determining an observation frequency of at least one query sample; (ii) determining an observation frequency of at least one control; and (iii) if the observation frequency of the at least one query sample is higher than the observation frequency of the at least one control, selecting the candidate spectrum as a proposed spectrum.
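Steps (i) to (iii), together with the optional threshold on the difference from paragraph [00152], can be sketched as below. The count dictionaries keyed by peptide sequence, the parameter `min_difference` and the function names are assumptions made for illustration.

```python
def passes_signal_to_noise(query_frequency, control_frequency, min_difference=0):
    """True if the query is observed more often than the control by more than the threshold."""
    return (query_frequency - control_frequency) > min_difference


def filter_candidates(query_counts, control_counts, min_difference=0):
    """Keep peptides whose query observation frequency exceeds the control frequency."""
    return [
        peptide
        for peptide, query_frequency in query_counts.items()
        # A peptide never seen in the control has a control frequency of 0.
        if passes_signal_to_noise(
            query_frequency, control_counts.get(peptide, 0), min_difference
        )
    ]
```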
[00155] In some embodiments, the method can comprise the use of experimental controls. The use of experimental controls can comprise the use of: (1) naive blank columns (never used) or conditioned blank columns; (2) control affinity support resin without the specific antigen, ligand, receptor, or diagnostic or therapeutic target; (3) a support resin with a control antigen, ligand, receptor, or diagnostic or therapeutic target; (4) support resin with the specific antigen, ligand, receptor, or diagnostic or therapeutic target for use with pre-immune serum; or (5) some other controls from tissues, cells or bodily fluids.
[00156] The method may comprise the use of statistical controls, which can comprise the use of naturally-occurring electromagnetic noise or random MS/MS spectra or computer generated random MS/MS spectra to select potential peptide sequences with a higher probability of being true positive peptide identification from MS/MS spectra.
Target
[00157] In some embodiments, the target comprises a receptor or a fragment thereof. In one embodiment, the target comprises a ligand or a fragment thereof. In one embodiment, the target comprises an enzyme or a fragment thereof. In one embodiment, the target comprises a protein or a fragment thereof. In one embodiment, the target comprises an antibody or a fragment thereof. In one embodiment, the target comprises a variable domain or a fragment thereof. In one embodiment, the target comprises a drug.
[00158] In some embodiments, the target comprises a receptor, a ligand, an enzyme, a protein, an antibody, a variable domain, or a drug.
[00159] The target can be immobilized on a suitable surface prior to or after incubation with the peptide aptamer library, for example, immobilization on microbeads, nanobeads, a 2-dimensional surface, a 3-dimensional scaffold, and/or a 3-dimensional fiber. The target can be immobilized on the surface using any suitable method. The target can be immobilized with or without the use of a linker, optionally, a cleavable linker.
[00160] In some embodiments, the target is immobilized on microbeads, nanobeads, a 2-dimensional surface, a 3-dimensional scaffold, and/or a 3-dimensional fiber.
[00161 ] In some embodiments, immobilization comprises use of a cleavable linker.
[00162] In some embodiments, the target is immobilized prior to contacting with the peptide aptamer library. In one embodiment, the target is immobilized after contacting with the peptide aptamer library.
[00163] After incubation of the target with the peptide aptamer library, unbound peptide aptamers can be removed, for example, by performing one or more washing steps.
[00164] Bound peptide aptamers can be eluted from the target. Alternatively, bound peptide aptamers with the target can be released from the surface, for example, by cleaving the cleavable linker.
[00165] In some embodiments, the method further comprises removing unbound peptide aptamers after contacting the peptide aptamer library with the target and releasing bound peptide aptamers after removing unbound peptide aptamers.
[00166] In some embodiments, removing unbound peptide aptamers comprises one or more washing steps. In one embodiment, the one or more washing steps comprise washing with an aqueous buffer, a weak salt solution, a weak mixture of organic solvent with water, a weak acid or base close to neutral pH, and/or a mass spec compatible detergent.
[00167] In some embodiments, releasing bound peptide aptamers comprises eluting with a strong salt solution, a strong mixture of an organic solvent with water, and/or a strong acid or base far from neutral pH.
[00168] In some embodiments, releasing the bound peptide aptamers comprises cleaving the cleavable linker.
[00169] The released peptide aptamers can be identified by mass spectrometry, including but not limited to top down mass spectrometry, or electrospray ionization, or MALDI ionization or chemical ionization or electron impact ionization or LC-ESI-MS/MS.
[00170] The amino acid sequence can be derived from the MS/MS spectra by de novo sequencing. Alternatively, the observed MS/MS spectra can be fitted to a predicted library of MS/MS spectra by 64 bit computation using de novo sequencing using cross correlation, XTANDEM, SEQUEST, regression, goodness of fit, count of fragment spectra matches, heuristic algorithms, other algorithms, or combinations thereof.
[00171 ] In some embodiments, identifying the peptide aptamers that bind the target comprises identifying by mass spectrometry, or top down mass spectrometry, or electrospray ionization, or MALDI ionization or chemical ionization or electron impact ionization or LC-ESI-MS/MS.
[00172] In some embodiments, identifying the peptide aptamers comprises de novo sequencing.
[00173] In some embodiments, identifying the peptide aptamers comprises fitting of observed MS/MS spectra to a predicted library.
[00174] In some embodiments, fitting of observed MS/MS spectra to the predicted library comprises 64 bit computation.
[00175] In some embodiments, the 64 bit computation comprises use of cross correlation, XTANDEM, SEQUEST, regression, goodness of fit, count of fragment spectra matches, heuristic algorithms, or combinations thereof.
[00176] The identified peptide aptamers can be used, for example, as binding reagents against drug targets for therapeutic or biomarkers for diagnostic purposes. The drug target may be a ligand or a receptor or an enzyme or other.
[00177] The identified peptide aptamers can be used in screening assays, such as binding assays, enzyme linked immune-mass spectrometric assay (ELiMSA), and live cell affinity chromatograph (LARC), to identify new aptamer drugs.
[00178] In yet another aspect, provided herein is a method of making a synthetic peptide aptamer library, comprising: providing a synthetic peptide, the synthetic peptide comprises two or more domains separated by a cleavage site; cleaving the synthetic peptide to generate a pool of cleavage products, thereby producing the synthetic peptide aptamer library. In yet another aspect, provided herein is a method of making a synthetic peptide aptamer library. The method can comprise providing a synthetic peptide. The synthetic peptide can comprise two or more domains. The two or more domains can be separated by a cleavage site. The method can comprise cleaving the synthetic peptide. Cleaving the synthetic peptide can generate a pool of cleavage products, thereby producing the synthetic peptide aptamer library.
[00179] The above disclosure generally describes the present application. A more complete understanding can be obtained by reference to the following specific examples. These examples are described solely for the purpose of illustration and are not intended to limit the scope of the application. Changes in form and substitution of equivalents are contemplated as circumstances might suggest or render expedient. Although specific terms have been employed herein, such terms are intended in a descriptive sense and not for purposes of limitation.
EXAMPLES
Example 1 - Identification of plasma peptides using mass spectrometry
[00180] Liquid chromatography electrospray ionization tandem mass spectrometry (LC/ESI-MS/MS) can be used to identify the ligands of plasma and identify their receptor complexes on the surface of live cells. We have found that results from HPLC LC-ESI-MS/MS linear quadrupole ion trap (LTQ) mass spectrometers and results from the orbitrap show excellent agreement. Examining orbitrap data, we have discovered that, in contrast to small molecules, most ionizing peptides have heavy isotopes and hydrogen rearrangements, and that only a minority of peptides are found at the monoisotopic mass. Instead, the peptide MS/MS spectra from precursors with MH delta mass values of -1, -2 and -3, and from the heavy isotopes at MH delta mass values of +1, +2, +3, +4 and +5, all match and far outnumber the peptides at the monoisotopic mass at 0. Thus, peptides ionize as an envelope that is 9 Daltons wide, and so there is essentially no point in using mass spectrometers that have much higher mass accuracy than that of Nature for the precursor. Instead, a wide range of precursor masses needs to be considered and the MS/MS spectra computed to reveal the identity of the peptides. The LTQ is entirely sufficient to identify and quantify the observation frequency of peptides and proteins from plasma protein ligands or their receptor complex on the surface of live cells. However, commercial software packages often re-use MS/MS spectra, and this redundancy needs to be removed in a relational database management system. In addition, all analytical experiments require a blank, and so we maintain a database of peptides from blank noise injections (albumin and antibodies from the skin) to correct experimental observation frequency.
Example 2 - Statistical approach
[00181] The decoy library method for proteomics does not rely on any known statistical distribution or test but is rather an arbitrary competition for significance against an arbitrarily chosen library that does not generate an FDR q-value. We have reevaluated 18 non-human protein standards used for the decoy library method and shown that the so-called empirical statistical model is based on the incorrect assumption that sigma protein standards are pure, when they in fact contain hundreds or thousands of proteins, and so the empirical statistical model showed a 99.7% false negative rate. In agreement with the known hydrogen rearrangements and heavy isotopes of peptides, the optimal search conditions for fitting peptide MS/MS fragmentation spectra are -4.5 to +5.5 Da. Accordingly, we have instead applied classical statistics to the problem of proteomics using the SQL Server and R statistical system.
[00182] Classical statistical methods like ANOVA, regression or Chi Square used in biomedicine, agriculture and engineering applications all require that the data are randomly and independently sampled from a population with a known distribution. We have shown that intensity may be log transformed to yield a normal distribution and that the observation frequency is a discrete distribution that follows the alpha n=1 distribution and so is amenable to the Chi Square (χ²) test. Classical statistical methods like ANOVA, regression or Chi Square all have a null hypothesis (H0), a test statistic (F, χ²) and a critical value to generate a p-value from the known distribution. The p-values are generated and corrected by the FDR method of Benjamini and Hochberg. We employed two independent statistical methods to fit the peptides by MS/MS spectra to the theoretical fragmentation spectra, and the Monte Carlo simulation versus millions of random MS/MS spectra to determine the type I error rate of our plasma ligand and cell surface receptor experiments. We have shown that the identity of peptides and proteins can be computed entirely in SQL Server and R using classical statistics, and that the results showed excellent agreement with the X!TANDEM and SEQUEST algorithms after correcting for redundant use of MS/MS spectra to ensure that each MS/MS spectrum is never fit to more than one peptide sequence, which eliminated virtually all error in proteomics. Thus there is no need for high resolution orbitraps for the proteomic analysis of proteins from the fit of MS/MS spectra to tryptic peptides.
In contrast to peptides, small molecules are in high abundance, have few hydrogen rearrangements and heavy isotopes, and have few fragments that can be fit by regression or Chi Square, and so insensitive high resolution mass spectrometry like Orbitraps or Qq-TOF is entirely appropriate for small molecule analysis but is not required for the analysis of observation frequency of tryptic peptides for plasma and receptor discovery. Classical statistics in SQL Server and the R statistical system may therefore be used to describe blood plasma ligands and their receptors on the surface of cells with great sensitivity, to the pg range, and with defined type I and type II error established from regression, ANOVA, or Chi Square.
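The Benjamini and Hochberg correction mentioned in this example is a standard step-up procedure, and a plain Python sketch is given below for illustration. The example describes doing this in R; the function name here is an assumption.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A peptide or protein would be reported at a chosen false discovery rate if its adjusted value falls at or below that rate.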
Example 3 - Approaches to generate a peptide aptamer library from synthetic peptides
[00183] Synthetic peptides may be obtained from a random combinatorial approach.
[00184] Synthetic peptides may be obtained from a partially random combinatorial approach punctuated with known protease or chemical cleavage sites, and/or fixed amino acid.
[00185] Synthetic peptides may be obtained from a partially random combinatorial approach where different random pools of amino acids are restricted to particular cleavage products or between particular fixed residues.
[00186] Synthetic peptides may be obtained where fixed residues are comprised of hydrophobic amino acids that may contribute to binding including but not limited to W, H, F, M, Y, C or L.
[00187] Synthetic peptides may be obtained from a random combinatorial approach punctuated with known protease or chemical cleavage sites, and/or fixed amino acid.
[00188] In an alternative and completely synthetic approach, a peptide aptamer library will be designed with tryptic and chymotryptic sites spaced to create several subdomains, and in each domain a defined subset of the 20 amino acids will be randomly ordered alongside invariant amino acids with acidic, basic or hydrophobic characters. An aptamer with all 20 amino acids randomly ordered would create a large computational problem. Moreover, randomizing all 20 amino acids would also create a chemical problem where there are too many peptide species in the library for any one to reach a concentration in the attomolar, femtomolar, picomolar or nanomolar range for binding assays. In contrast, by inserting tryptic and chymotryptic sites along the backbone of the peptide aptamer, restricting the number of random amino acids in each resulting cleavage peptide and inserting invariant amino acids at known locations, the whole aptamer may contain <20 random amino acids or modifications thereof, while the cleaved peptides of the aptamer are each short enough and simple enough to be identified by tandem mass spectrometry.
[00189] For example, a synthetic peptide with the following structure can be made for generating a library of peptide aptamers (FIG. 3):
A1-A2-W/Y/F-A3-A4-A5-A6-R/K-A7-A8-A9-Y-A10-W/Y/F-A11-A12-R/K-A13-A14-A15-A16-A17-A18-A19-A20-W/Y/F-A21-A22-R/K-A23-A24-W/Y/F-A25-A26-A27-A28-R/K-A29-A30-...-AN
[00190] Specific high observation frequency peptide aptamers can be:
A1-A2-W/Y/F-A3-A4-A5-A6-R/K
A3-A4-A5-A6-R/K
A11-A12-R/K-A13-A14-A15-A16-A17-A18-A19-A20-W/Y/F
A23-A24-W/Y/F-A25-A26-A27-A28-R/K-A29-A30-...-AN
A23-A24-W/Y/F-A25-A26-A27-A28-R/K
Example 4 - Using the aptamer library to identify binding agents to a target
[00191 ] The aptamer library will be incubated with the receptor, ligand, enzyme, protein, antibody, variable domain or drug to induce binding.
[00192] The receptor, ligand, enzyme, protein, antibody, variable domain or drug will be immobilized on microbeads or nanobeads or a flat 2 dimensional surface or 3 dimensional scaffold or fiber.
[00193] The receptor, ligand, enzyme, protein, antibody, variable domain or drug will be immobilized prior to or after incubation with the aptamer library.
[00194] Unbound peptides will be washed away with aqueous buffer, weak salt solutions, weak mixtures of organic solvents with water, weak acids or bases close to neutral pH, and/or mass spec compatible detergents.
[00195] Bound peptides will be eluted with strong salt solutions, strong mixtures of organic solvents with water, strong acids or bases far from neutral pH.
[00196] The eluted peptides will be identified by mass spectrometry, or top down mass spectrometry, or electrospray ionization, or MALDI ionization or chemical ionization or electron impact ionization or LC-ESI-MS/MS.
[00197] The amino acid sequence will be derived from the MS/MS spectra by de novo sequencing.
[00198] Alternatively, the observed MS/MS spectra will be fitted to a predicted library of MS/MS spectra by 64 bit computation using de novo sequencing using XTANDEM, SEQUEST, or regression, or goodness of fit, or heuristic algorithms, or other algorithms.
Example 5 - MS/MS computation
[00199] Real peptides and their fragments frequently contain heavy isotopes, and so a filter will look for spectrum lines where isotopes, hydrogen rearrangements or, more rarely, H loss are present, using isotope filtering to remove the noise.
[00200] Isotopes and hydrogen re-arrangements or losses may occur in precursor peptides or fragments.
[00201] De novo sequencing of the MS/MS spectra, or the fit of the a, b, c and x, y, z fragment series by regression, goodness of fit or cross correlation, will be used to generate the top fits.
[00202] The observation frequency of peptides or polypeptides will be compared to those of random MS/MS spectra to identify true positive peptides.
[00203] The top fitting 10-50 peptides will be filtered from the noise by the use of observation frequency versus the Monte Carlo simulation at the level of proteins, polypeptides or peptides to select the true positive results.
[00204] Blank injections with no aptamer injected would serve as an analytical control.
[00205] True positive results will accumulate in a subset of peptides or variable domains while false positive hits will show a random distribution.
[00206] Thus, a series of signal to noise filters, along with classical statistical analysis and specifically designed aptamer libraries, makes the computation of synthetic random aptamers feasible.
Example 6 - Binding of synthetic aptamers with CD64 ectodomain
[00207] A synthetic peptide aptamer library was made from a synthetic peptide with the following structure:
S-T-X1-X1-X1-Y-N-Q-X2-X2-X2-R-S-N-X3-X3-X3-Y-T-Q-X4-X4-X4-R
[00208] Where X1 was randomly chosen from the amino acids D, S and A; X2 was randomly chosen from the amino acids D, T and V; X3 was randomly chosen from the amino acids H, N, and L; and X4 was randomly chosen from the amino acids Q, F, G, and I.
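The size of this library and the tryptic peptides it can yield can be worked out with a short sketch. It assumes that each X position is chosen independently from its pool and that trypsin cuts after the fixed R residues only. The list encoding of the template and the function names are hypothetical.

```python
from itertools import product

TEMPLATE = [
    "S", "T", "X1", "X1", "X1", "Y", "N", "Q", "X2", "X2", "X2", "R",
    "S", "N", "X3", "X3", "X3", "Y", "T", "Q", "X4", "X4", "X4", "R",
]
POOLS = {"X1": "DSA", "X2": "DTV", "X3": "HNL", "X4": "QFGI"}


def library_size(template, pools):
    """Number of distinct full-length sequences encoded by the template."""
    size = 1
    for position in template:
        # Fixed residues contribute a factor of 1.
        size *= len(pools.get(position, "x"))
    return size


def split_after(template, cleavage_residues):
    """Split the template into domains, cutting after each fixed cleavage residue."""
    domains, current = [], []
    for position in template:
        current.append(position)
        if position in cleavage_residues:
            domains.append(current)
            current = []
    if current:
        domains.append(current)
    return domains


def enumerate_domain(domain, pools):
    """All peptide sequences that one domain can produce."""
    choices = [pools.get(position, position) for position in domain]
    return ["".join(combo) for combo in product(*choices)]
```

Under these assumptions the template encodes 27 × 27 × 27 × 64 = 1,259,712 full-length sequences, but tryptic digestion reduces the search space to 729 + 1,728 = 2,457 distinct 12-residue peptides. The same position-level split is not sufficient for chymotrypsin, because the X4 pool contains F and so can introduce additional cleavage sites inside a domain.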
[00209] Trypsin-digested or chymotrypsin-digested synthetic aptamer libraries were used to bind CD64 ectodomain target immobilized on an affinity column. A conditioned blank column and a column without CD64 ectodomain target were used as experimental controls. Results are shown in Tables 1 and 2.
Table 1. Representative peptides identified using trypsin-digested synthetic aptamer library
Figure imgf000041_0001
Table 2. Representative peptides identified using chymotrypsin-digested synthetic aptamer library
Figure imgf000042_0001
[00210] While the present application has been described with reference to what are presently considered to be the preferred examples, it is to be understood that the application is not limited to the disclosed examples. To the contrary, the application is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.
[00211] All publications, patents and patent applications are herein incorporated by reference in their entirety to the same extent as if each individual publication, patent or patent application was specifically and individually indicated to be incorporated by reference in its entirety. Specifically, the sequences associated with each accession number provided herein, including for example accession numbers and/or biomarker sequences (e.g. protein and/or nucleic acid) provided in the Tables or elsewhere, are incorporated by reference in their entirety.
[00212] The scope of the claims should not be limited by the embodiments and examples, but should be given the broadest interpretation consistent with the description as a whole.

Claims

CLAIMS:
1. A method of identifying a synthetic peptide aptamer, from a library comprising a plurality of synthetic peptide aptamers, that binds a target, comprising: providing the peptide aptamer library comprising the plurality of synthetic peptide aptamers, the plurality of synthetic peptide aptamers being generated from at least one synthetic peptide, the at least one synthetic peptide comprises two or more domains separated by a protease cleavage site, wherein at least one domain comprises at least one amino acid that is randomly selected from a subset of the 20 common amino acids; contacting the peptide aptamer library with the target; removing unbound synthetic peptide aptamers after contacting the peptide aptamer library with the target; and analyzing the bound synthetic peptide aptamers, comprising: generating a MS/MS query spectrum of the bound peptide aptamer; receiving one or more parameters of the query spectrum; generating one or more candidate peptide sequences based on the one or more parameters of the query spectrum; generating a plurality of query samples of the query spectrum; selecting at least one query sample from the plurality of samples for comparison with the one or more candidate peptide sequences; determining a likelihood indicator for each of the one or more candidate peptide sequences based on a comparison with the at least one query sample; applying a signal to noise filter to the one or more candidate peptide sequences based on the likelihood indicators for the candidate peptide sequences; and selecting at least one candidate peptide sequence as a proposed peptide sequence for the query spectrum based on the filtered candidate peptide sequences; thereby identifying the synthetic peptide aptamer of the plurality of synthetic peptide aptamers that binds the target.
2. A method of identifying a synthetic peptide aptamer, from a library comprising a plurality of synthetic peptide aptamers, that binds a target, comprising: providing the peptide aptamer library comprising the plurality of synthetic peptide aptamers, the plurality of synthetic peptide aptamers being generated from at least one synthetic peptide; contacting the peptide aptamer library with the target; removing unbound synthetic peptide aptamers after contacting the peptide aptamer library with the target; and analyzing the bound synthetic peptide aptamers; thereby identifying the synthetic peptide aptamer of the plurality of synthetic peptide aptamers that binds the target.
3. The method of claim 2, wherein the at least one synthetic peptide comprises two or more domains separated by a cleavage site.
4. The method of claim 3, wherein at least one domain comprises at least one random amino acid.
5. The method of claim 4, wherein the at least one random amino acid is randomly selected from a subset of amino acids.
6. The method of claim 5, wherein the subset of amino acids comprises a subset of the 20 common amino acids.
7. The method of claim 5, wherein the subset of amino acids comprises a subset of fewer than 5 amino acids.
8. The method of any one of claims 3-7, wherein the cleavage site comprises a protease cleavage site.
9. The method of any one of claims 3-8, wherein the cleavage site comprises a trypsin cleavage site and/or a chymotrypsin cleavage site.
10. The method of any one of claims 3-9, wherein the cleavage site comprises a chemical cleavage site.
11. The method of any one of claims 3-10, wherein the at least one synthetic peptide further comprises at least one predetermined amino acid at a predetermined position within at least one domain.
12. The method of claim 11, wherein the at least one predetermined amino acid comprises a hydrophobic amino acid.
13. The method of claim 11, wherein the at least one predetermined amino acid comprises tryptophan, histidine, phenylalanine, methionine, tyrosine, cysteine, or lysine.
14. The method of any one of claims 2-13, wherein analyzing the bound synthetic peptide aptamers comprises: generating a MS/MS query spectrum of the synthetic peptide aptamer; receiving one or more parameters of the query spectrum; generating one or more candidate peptide sequences based on the one or more parameters of the query spectrum; generating a plurality of query samples of the query spectrum; selecting at least one query sample from the plurality of query samples for comparison with the one or more candidate peptide sequences; determining a likelihood indicator for each of the one or more candidate peptide sequences based on a comparison with the at least one query sample; applying a signal to noise filter to the one or more candidate peptide sequences based on the likelihood indicators for the candidate peptide sequences; and selecting at least one candidate peptide sequence as a proposed peptide sequence for the query spectrum based on the filtered candidate peptide sequences.
15. The method of any one of claims 1-14, wherein applying the signal to noise filter comprises: (i) determining an observation frequency of at least one query sample; (ii) determining an observation frequency of at least one control; and (iii) if the observation frequency of the at least one query sample is higher than the observation frequency of the at least one control, selecting the candidate spectrum as a proposed spectrum.
16. The method of any one of claims 1-15, wherein generating one or more candidate peptide sequences based on the one or more parameters of the peptide query comprises one or more of: storing a plurality of peptide sequences in a computer-readable medium; selecting, from the computer-readable medium, at least one of the plurality of stored peptide sequences to use as at least one candidate peptide sequence, based on the one or more parameters; and randomly generating at least one candidate peptide sequence, based on the one or more parameters.
17. The method of claim 16, wherein: receiving one or more parameters of the query spectrum comprises receiving the one or more parameters from user input at a computing device; the one or more parameters comprise one or more of an amino acid, a position of the amino acid, an amino acid group, or a position of the amino acid group.
18. The method of any one of claims 16-17, wherein: receiving one or more parameters of a query spectrum comprises: receiving the query spectrum; and deriving the one or more parameters from the query spectrum; the one or more parameters comprise one or more of a neutral loss, a post-translational modification shift, an immonium ion, a subtraction of a B ion, or a subtraction of a Y ion.
19. The method of any one of claims 16-18, wherein selecting, from the computer-readable medium, at least one of the plurality of stored peptide sequences to use as at least one candidate peptide sequence, based on the one or more parameters comprises: identifying a subset of peptide sequences from the plurality of stored peptide sequences, each peptide sequence satisfying the one or more parameters; and selecting the at least one candidate peptide sequence from the subset of peptide sequences.
20. The method of any one of claims 16-19, wherein the plurality of stored peptide sequences comprise one or more of: a naturally occurring peptide sequence and a synthetic peptide sequence.
21. The method of any one of claims 16-20, wherein randomly generating at least one candidate peptide sequence, based on the one or more parameters comprises: randomly generating a peptide sequence; determining whether the randomly generated peptide sequence satisfies the one or more parameters; and if the randomly generated peptide sequence satisfies the one or more parameters, using the randomly generated peptide sequence as a candidate peptide sequence, otherwise discarding the randomly generated peptide sequence.
22. The method of any one of claims 16-21, wherein each of the at least one randomly generated candidate peptide sequence has a predetermined length.
23. The method of claim 22, wherein randomly generating at least one candidate peptide sequence, based on the one or more parameters comprises, for each randomly generated candidate peptide sequence: assigning at least one amino acid to the randomly generated candidate peptide sequence based on the one or more parameters; randomly selecting an unassigned position of the predetermined length; for the unassigned position: randomly selecting an amino acid; and assigning the amino acid to the unassigned position; and continuing to randomly assign amino acids to the randomly generated candidate peptide sequence until each position is assigned.
24. The method of claim 23, wherein randomly selecting an amino acid comprises: randomly selecting an amino acid group from a plurality of amino acid groups, each amino acid group comprising a plurality of amino acids; and randomly selecting the amino acid from the randomly selected amino acid group.
25. The method of claim 24, wherein each randomly generated candidate peptide sequence comprises at least one amino acid from each amino acid group of the plurality of amino acid groups.
26. The method of any one of claims 1-25, wherein generating a plurality of query samples of the query spectrum comprises one or more of generating experimental spectra or generating simulated spectra.
27. The method of claim 26, wherein generating simulated spectra comprises using a Monte Carlo random simulation.
28. The method of any one of claims 1-27, further comprising releasing bound synthetic peptide aptamers from the target after removing unbound synthetic peptide aptamers.
29. The method of claim 28, wherein generating the MS/MS query spectrum comprises generating a MS/MS query spectrum of the released bound synthetic peptide aptamers.
30. The method of any one of claims 1-29, wherein the target comprises a receptor, a ligand, an enzyme, a protein, an antibody, a variable domain, or a drug.
31. The method of any one of claims 1-30, wherein the target is immobilized on microbeads, nanobeads, a 2-dimensional surface, a 3-dimensional scaffold, and/or a 3-dimensional fiber.
32. The method of claim 31, wherein immobilization comprises use of a cleavable linker.
33. The method of claim 31, wherein the target is immobilized prior to contacting with the peptide aptamer library.
34. The method of claim 31, wherein the target is immobilized after contacting with the peptide aptamer library.
35. The method of any one of claims 1-34, further comprising removing unbound peptide aptamers after contacting the peptide aptamer library with the target and releasing bound peptide aptamers after removing unbound peptide aptamers.
36. The method of claim 35, wherein removing unbound peptide aptamers comprises one or more washing steps.
37. The method of claim 36, wherein the one or more washing steps comprise washing with an aqueous buffer, a weak salt solution, a weak mixture of an organic solvent with water, a weak acid or base close to neutral pH, and/or a mass spec compatible detergent.
38. The method of claim 35, wherein releasing bound peptide aptamers comprises eluting with a strong salt solution, a strong mixture of an organic solvent with water, and/or a strong acid or base far from neutral pH.
39. The method of claim 31, wherein immobilization of the target comprises use of a cleavable linker and releasing bound peptide aptamers comprises cleaving the cleavable linker.
40. The method of any one of claims 1-39, wherein identifying the synthetic peptide aptamer that binds the target comprises identifying by mass spectrometry, or top down mass spectrometry, or electrospray ionization, or MALDI ionization or chemical ionization or electron impact ionization or LC-ESI-MS/MS.
41. The method of claim 1 or 2, wherein identifying the synthetic peptide aptamer comprises de novo sequencing.
42. The method of claim 1 or 2, wherein identifying the synthetic peptide aptamer comprises fitting of observed MS/MS spectra to a predicted library.
43. The method of claim 42, wherein the fitting of observed MS/MS spectra to the predicted library comprises 64 bit computation.
44. The method of claim 43, wherein the 64 bit computation comprises use of cross correlation, XTANDEM, SEQUEST, regression, goodness of fit, count of fragment spectra matches, heuristic algorithms, or combination thereof.
45. A method of making a synthetic peptide aptamer library, comprising: providing a synthetic peptide, the synthetic peptide comprises two or more domains separated by a cleavage site; and cleaving the synthetic peptide to generate a pool of cleavage products, thereby producing the synthetic peptide aptamer library.
46. The method of claim 45, wherein at least one of the two or more domains comprises at least one random amino acid.
47. The method of claim 46, wherein the at least one random amino acid is randomly selected from a subset of amino acids.
48. The method of claim 47, wherein the subset of amino acids comprises a subset of the 20 common amino acids.
49. The method of claim 47, wherein the subset of amino acids comprises fewer than 10 amino acids.
50. The method of any one of claims 45-49, wherein the cleavage site comprises a protease cleavage site.
51. The method of any one of claims 45-50, wherein the cleavage site comprises a trypsin cleavage site and/or a chymotrypsin cleavage site.
52. The method of any one of claims 45-51, wherein the cleavage site comprises a chemical cleavage site.
53. The method of any one of claims 45-52, wherein the synthetic peptide further comprises at least one predetermined amino acid at a predetermined position within at least one domain.
54. The method of claim 53, wherein the at least one predetermined amino acid comprises a hydrophobic amino acid.
55. The method of claim 54, wherein the at least one predetermined amino acid comprises tryptophan, histidine, phenylalanine, methionine, tyrosine, cysteine, or lysine.
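
The following is a minimal illustrative sketch, not the applicants' implementation, of two computational ideas recited in the claims: enumerating the cleavage products of a synthetic peptide whose random positions are drawn from a small amino acid subset (claims 45-51), and keeping only candidates observed more often in the query sample than in a control (claim 15). All names here are hypothetical: the `X` placeholder for a random position, the functions `expand_template`, `tryptic_cleave`, `build_library`, and `filter_by_observation_frequency`, the example template, and the example counts. The trypsin rule used (cleave after K or R unless the next residue is P) is a common simplification and is an assumption, since the claims do not specify a cleavage rule.

```python
import itertools
import re


def expand_template(template, subset):
    """Replace each 'X' in the template with every residue in `subset`.

    'X' is a hypothetical marker for a random position; all other letters
    are treated as predetermined amino acids.
    """
    n_random = template.count("X")
    peptides = []
    for combo in itertools.product(subset, repeat=n_random):
        residues = iter(combo)
        peptides.append(
            "".join(next(residues) if c == "X" else c for c in template)
        )
    return peptides


def tryptic_cleave(peptide):
    """Split after K or R unless followed by P (simplified trypsin rule)."""
    fragments = re.split(r"(?<=[KR])(?!P)", peptide)
    return [frag for frag in fragments if frag]


def build_library(template, subset):
    """Return the sorted, de-duplicated pool of cleavage products."""
    library = set()
    for peptide in expand_template(template, subset):
        library.update(tryptic_cleave(peptide))
    return sorted(library)


def filter_by_observation_frequency(sample_counts, control_counts):
    """Keep candidates seen more often in the sample than in the control."""
    return [
        peptide
        for peptide, count in sample_counts.items()
        if count > control_counts.get(peptide, 0)
    ]


if __name__ == "__main__":
    # Two domains (AXWK and GXFR) separated by a trypsin site after K;
    # random positions drawn from a two-residue subset (D, S).
    print(build_library("AXWKGXFR", "DS"))

    # Made-up observation counts, for illustration only.
    sample = {"ADWK": 5, "GSFR": 1}
    control = {"ADWK": 1, "GSFR": 3}
    print(filter_by_observation_frequency(sample, control))
```

In this sketch the library is built by exhaustive enumeration, which is practical only because the subset and the number of random positions are small. A larger design would need sampling rather than full expansion.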
PCT/CA2024/051738 2023-12-29 2024-12-27 Method of generating and screening synthetic peptide aptamer libraries Pending WO2025137775A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202363616404P 2023-12-29 2023-12-29
US63/616,404 2023-12-29

Publications (1)

Publication Number Publication Date
WO2025137775A1 true WO2025137775A1 (en) 2025-07-03

Family

ID=96216254

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CA2024/051738 Pending WO2025137775A1 (en) 2023-12-29 2024-12-27 Method of generating and screening synthetic peptide aptamer libraries

Country Status (1)

Country Link
WO (1) WO2025137775A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050053970A1 (en) * 2001-11-06 2005-03-10 Benson John D. Methods and compositions for identifying peptide aptamers capable of altering a cell phenotype
WO2007090630A2 (en) * 2006-02-07 2007-08-16 Stiftung Für Diagnostische Forschung Peptide aptamer for neutralizing the binding of platelet antigene specific antibodies and diagnostic and therapeutic applications containing the same
WO2024000077A1 (en) * 2022-06-30 2024-01-04 Yyz Pharmatech Inc. Systems and methods for identifying peptides

Similar Documents

Publication Publication Date Title
Sweet et al. Large Scale Localization of Protein Phosphorylation by Use of Electron Capture Dissociation Mass Spectrometry
US11467167B2 (en) SRM methods in Alzheimer's disease and neurological disease assays
CN102395886B (en) Method for quantifying modified peptides
Chicooree et al. The application of targeted mass spectrometry‐based strategies to the detection and localization of post‐translational modifications
CN110678756A (en) Method for absolute quantification of low abundance polypeptides using mass spectrometry
Hoffert et al. Taking aim at shotgun phosphoproteomics
EP1739424B1 (en) Rapid and quantitative proteome anaysis and related methods
Baker et al. Improving software performance for peptide electron transfer dissociation data analysis by implementation of charge state-and sequence-dependent scoring
Tang et al. CLPM: a cross-linked peptide mapping algorithm for mass spectrometric analysis
JP2005510575A (en) IG heavy chain, IG kappa, and IG lambda biopolymer markers that predict Alzheimer's disease
van der Laarse et al. Targeting proline in (phospho) proteomics
CA2553172A1 (en) Methods and system for the identification and characterization of peptides and their functional relationships by use of measures of correlation
WO2025137775A1 (en) Method of generating and screening synthetic peptide aptamer libraries
US20250166735A1 (en) Systems and methods for identifying peptides
Valente et al. Snake venom proteopeptidomics: What lies behind the curtain
KR100805775B1 (en) How to Analyze Sequence and Modification Information of Modified Polypeptides
WO2025137774A1 (en) Method of generating and screening peptide aptamer libraries from naturally occurring proteins
US20050192755A1 (en) Methods and systems for identification of macromolecules
JP2005510728A (en) Protein biopolymer markers that predict insulin resistance
Gucinski et al. Understanding and exploiting Peptide fragment ion intensities using experimental and informatic approaches
JP2005510721A (en) IG lambda biopolymer marker predicts Alzheimer's disease
JP2005510718A (en) Complement C3 precursor biopolymer marker to predict type II diabetes
CN1653333A (en) Identifying peptide modifications
Ham Proteomics of biological systems: protein phosphorylation using mass spectrometry techniques
JP2005525535A (en) Macroglobulin biopolymer marker showing insulin resistance

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 24909272

Country of ref document: EP

Kind code of ref document: A1