US20180340941A1 - Method to Map Protein Landscapes - Google Patents
Method to Map Protein Landscapes
- Publication number
- US20180340941A1 (application US15/988,566)
- Authority
- US
- United States
- Prior art keywords
- polypeptide
- sample
- mass spectrometry
- spectrometry data
- amino acid
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
- G01N33/6848—Methods of protein analysis involving mass spectrometry
- G01N33/6824—Sequencing of polypeptides involving N-terminal degradation, e.g. Edman degradation
- G01N33/6842—Proteomic analysis of subsets of protein mixtures with reduced complexity, e.g. membrane proteins, phosphoproteins, organelle proteins
- G06F19/22
- G06F19/24
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
- G16B30/20—Sequence assembly
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
- G16B40/10—Signal processing, e.g. from mass spectrometry [MS] or from PCR
- G16B40/30—Unsupervised data analysis
- H01J49/0036—Step by step routines describing the handling of the data generated during a measurement
- H01J49/004—Combinations of spectrometers, tandem spectrometers, e.g. MS/MS, MSn
- G01N2570/00—Omics, e.g. proteomics, glycomics or lipidomics; Methods of analysis focusing on the entire complement of classes of biological molecules or subsets thereof, i.e. focusing on proteomes, glycomes or lipidomes
Definitions
- Mass-spectrometry-based proteomics is a key technology for studying the proteome, which can comprise canonical gene products, alternative gene products, post-translational modifications (PTMs), non-synonymous single nucleotide polymorphisms (SNPs) and other sequence variations.
- The most prevalent paradigms are top-down proteomics and bottom-up proteomics, the latter also known as shotgun proteomics.
- In shotgun proteomics, proteins in a sample undergo proteolytic digestion, which breaks the proteins into smaller pieces (peptides) that are then analyzed by mass spectrometry.
- The resulting data are often processed using a search engine in conjunction with a sequence database containing data on known peptides and proteins.
- In shotgun proteomics, it is important that the protease digest produces peptides appropriate for the mass spectrometer.
- the most common and effective protease is trypsin, which will cleave C-terminal to the amino acids arginine and lysine, resulting in a mean protein sequence coverage in the range of 15% to 20%. Trypsin's moderate sequence coverage is sufficient for most proteomics experiments but does not provide sufficient sequence coverage for distinguishing various forms of a protein (proteoforms).
- Other proteases are similar in that they only provide partial coverage of the total protein sequence.
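The trypsin cleavage rule described above can be illustrated with a minimal in silico digestion sketch. It assumes an idealized protease that cuts C-terminal to every lysine (K) and arginine (R), ignoring missed cleavages and the commonly applied proline exception; the function name `digest` and the example sequence are hypothetical and do not come from the patent.

```python
def digest(sequence, cleave_after="KR"):
    """Split a protein sequence C-terminal to each residue in cleave_after.

    Simplifying assumption: every site is cleaved (no missed cleavages,
    no proline rule).
    """
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        if residue in cleave_after:
            peptides.append(sequence[start:i + 1])
            start = i + 1
    # Keep the C-terminal remainder if the sequence does not end at a site.
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


# Hypothetical sequence used only for illustration.
tryptic_peptides = digest("MKTAYIAKQRQISFVK")
print(tryptic_peptides)
```

Very short products such as two-residue peptides are typically hard to identify by mass spectrometry, which is one reason a single protease yields only partial sequence coverage.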
- Embodiments of the invention include a deep sequencing strategy to provide more protein sequence coverage than is typically achieved by conventional means, as well as a computational approach to view protein expression across the full length of the protein to identify regions that are potentially subject to alterations, regulation, processing, and modifications.
- Aspects of the invention include improved sample preparation, high resolution mass-spectrometry, bioinformatic analysis, and combinations thereof.
- Embodiments of the present invention encompass the use of multiple proteases, which allow for the increase of the mean sequence coverage (in some cases, up to 80%), concomitant with bioinformatics analysis in order to distinguish putative proteoforms with improved amino acid resolution.
- Multiple samples of the same polypeptide are used to determine the sequence and proteoform information of the polypeptide with great accuracy.
- One or more samples of different polypeptides are used to determine and compare the sequence and proteoform information of the polypeptides.
- The present invention provides a method for analyzing a polypeptide having an amino acid sequence.
- The method comprises the steps of: a) digesting a first sample of the polypeptide with a first protease or chemical agent; b) digesting a second sample of the polypeptide with a second protease or chemical agent; c) generating tandem mass spectrometry data on each digested polypeptide sample; and d) combining mass spectrometry data from each digested polypeptide sample to generate comprehensive mass spectrometry data on the polypeptide.
- The method further comprises digesting one or more additional samples of the polypeptide with one or more additional proteases or chemical agents, wherein the protease or chemical agent used for each sample is different from the protease or chemical agent used to digest any other sample.
- The method further comprises digesting a third sample of the polypeptide with a third protease, digesting a fourth sample of the polypeptide with a fourth protease, digesting a fifth sample of the polypeptide with a fifth protease, and/or digesting a sixth sample of the polypeptide with a sixth protease.
- Each protease or chemical agent used to digest a sample is different.
- Three to four samples are independently digested by three to four unique proteases.
- Each sample is digested and analyzed concurrently with the other samples, analyzed on the same mass spectrometer device, and/or analyzed as part of the same experiment.
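The benefit of combining digests from different proteases can be sketched as a union of the sequence positions covered by each digest. The sketch below is illustrative only: the cleavage rules are the commonly cited specificities for these enzymes, the detectable peptide length window (`min_len`, `max_len`) is an assumed stand-in for what a mass spectrometer can identify, and all names are hypothetical rather than taken from the patent.

```python
# Commonly cited specificities: (residues, side of the residue that is cut).
RULES = {
    "trypsin": ("KR", "C"),
    "Lys-C": ("K", "C"),
    "Glu-C": ("E", "C"),
    "chymotrypsin": ("FWY", "C"),
    "Asp-N": ("D", "N"),
    "Lys-N": ("K", "N"),
}


def digest_spans(sequence, residues, side):
    """Return half-open (start, end) spans of the peptides from a full digest."""
    cuts = [0]
    for i, aa in enumerate(sequence):
        if aa in residues:
            cut = i + 1 if side == "C" else i
            if 0 < cut < len(sequence):
                cuts.append(cut)
    cuts.append(len(sequence))
    return list(zip(cuts[:-1], cuts[1:]))


def covered_positions(sequence, proteases, min_len=6, max_len=30):
    """Union of positions covered by peptides inside the assumed length window."""
    covered = set()
    for name in proteases:
        residues, side = RULES[name]
        for start, end in digest_spans(sequence, residues, side):
            if min_len <= end - start <= max_len:
                covered.update(range(start, end))
    return covered


def coverage(sequence, proteases, min_len=6, max_len=30):
    """Fraction of the sequence covered by the combined digests."""
    return len(covered_positions(sequence, proteases, min_len, max_len)) / len(sequence)


# Hypothetical toy sequence: each protease alone leaves a gap.
seq = "AAAKAAEAAAA"
print(coverage(seq, ["trypsin"], min_len=5))
print(coverage(seq, ["trypsin", "Glu-C"], min_len=5))
```

In this toy case each enzyme leaves a short peptide outside the assumed window, and the two digests cover complementary regions of the sequence.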
- The present invention provides a method for analyzing two or more polypeptides comprising the steps of: a) independently digesting a first sample of a first polypeptide and a first sample of a second polypeptide with a first protease or chemical agent; b) independently digesting a second sample of the first polypeptide and a second sample of the second polypeptide with a second protease or chemical agent; c) generating tandem mass spectrometry data on each digested polypeptide sample; d) for each polypeptide, combining mass spectrometry data from each digested polypeptide sample of that polypeptide to generate comprehensive mass spectrometry data; and e) generating at least a partial consensus amino acid sequence for each polypeptide from the comprehensive mass spectrometry data and calculating abundances of amino acids.
- The partial consensus amino acid sequence provides at least 50% of the full length polypeptide sequence, preferably, 60% of the full length polypeptide sequence, or 80% of the full length polypeptide sequence.
- A further embodiment comprises f) comparing the consensus sequence abundances of amino acids of each polypeptide, and identifying differences in amino acid abundance between the polypeptides.
- The method further comprises independently digesting a third sample of the first polypeptide and a third sample of the second polypeptide with a third protease; independently digesting a fourth sample of the first polypeptide and a fourth sample of the second polypeptide with a fourth protease; independently digesting a fifth sample of the first polypeptide and a fifth sample of the second polypeptide with a fifth protease; and/or independently digesting a sixth sample of the first polypeptide and a sixth sample of the second polypeptide with a sixth protease.
- Each protease or chemical agent used to digest a sample is different. In an embodiment, three to four samples of each polypeptide are independently digested.
- Tandem mass spectrometry data for each digested polypeptide sample in the methods described herein is generated by first generating a distribution of precursor ions during MS1 stage ionization, fragmenting precursor ions having a mass-to-charge ratio (m/z) within a selected target m/z range during MS2 stage fragmentation, thereby generating a plurality of product ions where the product ions correspond to portions of the amino acid sequence of the polypeptide, and measuring the m/z and intensity of the product ions, thereby generating mass spectrometry data for each digested polypeptide sample.
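As a rough illustration of the m/z-based precursor selection described above, the following sketch computes the monoisotopic m/z of a protonated peptide and keeps only peptides that fall inside a target m/z window. The residue masses are standard monoisotopic values; the function names, the charge state, and the window limits are assumptions made for illustration, and real instruments select ions from measured spectra rather than from known sequences.

```python
# Standard monoisotopic residue masses (Da) for the 20 common amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # mass of H2O added for the free termini
PROTON = 1.007276   # mass of a proton


def peptide_mass(peptide):
    """Monoisotopic neutral mass of an unmodified peptide."""
    return sum(MONO[aa] for aa in peptide) + WATER


def mz(peptide, charge):
    """m/z of the peptide carrying `charge` protons."""
    return (peptide_mass(peptide) + charge * PROTON) / charge


def select_precursors(peptides, charge, mz_min, mz_max):
    """Keep peptides whose m/z lies inside the target window (inclusive)."""
    return [p for p in peptides if mz_min <= mz(p, charge) <= mz_max]


# Hypothetical peptides and window, for illustration only.
print(select_precursors(["GG", "GGGGGGGG"], charge=1, mz_min=100.0, mz_max=200.0))
```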
- The comprehensive mass spectrometry data is used to generate at least a partial consensus amino acid sequence of the polypeptide.
- The comprehensive mass spectrometry data is used to calculate the quantity or abundances of amino acids for one or more selected portions of the polypeptide, including, but not limited to, portions which comprise the N-terminus of the polypeptide.
- The abundance or quantification of amino acids is performed without the use of an isobaric or chemical label attached to the polypeptide.
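One simple way to picture label-free, per-residue abundance is to add the measured intensity of each identified peptide to every sequence position that the peptide spans. The text does not specify how abundances are computed, so the sketch below is an assumed scheme, with a hypothetical function name and made-up intensities.

```python
def residue_abundance(sequence, peptide_intensities):
    """Sum peptide intensities over every position each peptide covers.

    peptide_intensities: iterable of (peptide, intensity) pairs.
    Assumption: a peptide contributes its full intensity to each residue it
    spans, at every place it occurs in the sequence.
    """
    abundance = [0.0] * len(sequence)
    for peptide, intensity in peptide_intensities:
        start = sequence.find(peptide)
        while start != -1:
            for pos in range(start, start + len(peptide)):
                abundance[pos] += intensity
            start = sequence.find(peptide, start + 1)
    return abundance


# Hypothetical sequence and intensities, for illustration only.
profile = residue_abundance("MKTAYIAK", [("MK", 10.0), ("KTAY", 5.0)])
print(profile)
```

Positions covered by overlapping peptides from different digests accumulate more signal, and positions with no identified peptide stay at zero, which makes gaps in the landscape easy to see.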
- The comprehensive mass spectrometry data provides sequence coverage for at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, or at least 80% of the full length amino acid sequence of the polypeptide.
- the mass spectrometry data for different polypeptides is compared to one another to determine any differences between the polypeptide samples. These differences include the presence or absence of amino acids, polymorphisms, mutations, and post translational modification (PTMs) of amino acids, including but not limited to phosphorylation, glycosylation, ubiquitination, nitrosylation, methylation, acetylation, lipidation and proteolysis.
- PTMs post translational modification
- at least one sample polypeptide is a control polypeptide and least one of the other sample polypeptides is a polypeptide which has undergone a suspected modification, splice, truncation, polymorphism or mutation.
- the modification is a post-translational modification or the result of a single nucleotide polymorphism.
- the measured intensities of the product ions from each digested polypeptide sample are normalized during generation of the comprehensive mass spectrometry data.
- the normalized data is then optionally used to identify a portion of the product ions as corresponding to one or more known amino acid sequence fragments of the polypeptide sample.
- k-means clustering analysis is performed on the normalized intensity data during generation of the comprehensive mass spectrometry data.
- alternative clustering algorithms can be used to group data points.
- Proteases suitable for use with the present invention include, but are not limited to, the group consisting of trypsin, Lys-N, Lys-C, Glu-C (Protease V8), chymotrypsin, Asp-N, and combinations thereof.
- Chemical agents suitable for use with the present invention include, but are not limited to, the group consisting of cyanogen bromide, formic acid, hydroxylamine, 2-nitro-5-thiocyanobenzoic acid (NTCB), and BNPS skatole (2-(2-nitrophenylsulfenyl)-3-methyl-3-bromoindolenine).
- the polypeptides analyzed with the methods of the present invention are antibodies, antibody-drug conjugates, or therapeutic proteins that can be administered to a subject.
- the present methods are used to determine if therapeutic products generated during a biochemical or manufacturing process have the same quality and proteoform as a desired control antibody, antibody-drug conjugate, or therapeutic protein.
- a first analyzed polypeptide is a control therapeutic polypeptide, antibody, or antibody-drug conjugate
- a second analyzed polypeptide is a production therapeutic polypeptide, antibody, or antibody-drug conjugate made during a biochemical process or manufacturing process.
- the first polypeptide is a control polypeptide produced by a cell and the second polypeptide is produced by a cell which has been administered a treatment.
- the polypeptides are then analyzed to determine if the treatment alters the sequence or proteoform of the polypeptide.
- the mass spectrometry data is generated at a resolution of 60K or greater. In another embodiment, the mass spectrometry data is generated at a resolution of 120K or greater. In another embodiment, mass spectrometry data collected from the MS 1 stage is generated at a resolution of 60K or greater, and mass spectrometry data collected from the MS 2 stage is generated at a resolution of 120K or greater.
- the methods and systems described herein can also be used to determine specific amino acid abundance in addition to or instead of peptide abundance. Accordingly, the methods described herein will be greatly useful to the proteomics community, as well as to the pharmaceutical industry by allowing the full characterization of the sequence and structure of biosimilar drug therapeutics and determining quality assurance. Taking all together, these methods can impact the proteomics community and pharmaceutical industry alike.
- the invention also provides a system for analyzing a polypeptide having an amino acid sequence comprising: a) an ion source for generating ions from a plurality of digested samples of the polypeptide; b) ion fragmentation optics in communication with the ion source for generating product ions; c) an ion detector in communication with the ion fragmentation optics for detecting ions according to their mass-to-charge ratios; and d) a mass analyzer in communication with the ion detector.
- the mass analyzer comprises a software program enabling the mass analyzer to: i) measure m/z and intensity of the detected ions, thereby generating mass spectrometry data for each digested polypeptide sample; ii) normalize the measured intensities of the product ions from each digested polypeptide sample; and iii) combine mass spectrometry data from each digested polypeptide sample to generate comprehensive mass spectrometry data on the polypeptide, wherein the comprehensive mass spectrometry data provides sequence coverage for at least 20% (preferably at least 50% or 80%) of the full length amino acid sequence of the polypeptide.
- the mass analyzer is able to generate at least a partial consensus amino acid sequence for the polypeptide from the comprehensive mass spectrometry data and calculate abundances of amino acids for one or more selected portions of the polypeptide from the comprehensive mass spectrometry data.
- the mass analyzer utilizes k-means clustering analysis on the normalized intensity data to generate the comprehensive mass spectrometry data.
- the system generates comprehensive mass spectrometry data from two or more samples, three or more samples, four or more samples, five or more samples, or six or more samples.
- FIG. 1 illustrates peptide mapping to coenzyme Q5.
- Each line represents a peptide identified and quantified by mass-spectrometry based proteomics over the full sequence length of coenzyme Q5.
- the peptides are a product of proteolytic digestion with six proteases (Asp-N, Chymotrypsin, Glu-C, Lys-C, Lys-N, and Trypsin).
- FIG. 2 shows an amino acid consensus map of coenzyme Q5. Each bar represents the ratio of the normalized amino acid intensity to the median amino acid intensity over the full sequence length of coenzyme Q5.
- the N-terminal region corresponds to a transit peptide.
- proteoform refers to the specific molecular form of a protein product arising from a specific gene.
- the proteoform of a polypeptide encompasses not only the translated amino acid sequence of the polypeptide, but also includes post-translational modifications of the polypeptide.
- Post-translational modifications are modifications that occur on a protein, typically catalyzed by enzymes, after its translation by ribosomes is complete.
- PTMs generally refer to the covalent addition of a functional group to a protein, proteolytic cleavage, or degradation of protein regions.
- PTMs include but are not limited to phosphorylation, glycosylation, ubiquitination, nitrosylation, methylation, acetylation, neddylation, lipidation and proteolysis.
- mass spectrometry refers to an analytical technique for the determination of the elemental composition of an analyte.
- Mass spectrometric techniques are useful for elucidating the chemical structures of analytes, such as peptides and other chemical compounds.
- the mass spectrometry principle consists of ionizing analytes to generate charged species or species fragments and measurement of their mass-to-charge ratios. Conducting a mass spectrometric analysis of an analyte results in the generation of mass spectrometry data relating to the mass-to-charge ratios of the analyte and analyte fragments.
- Mass spectrometry data corresponding to analyte ion and analyte ion fragments is presented in mass-to-charge (m/z) units representing the mass-to-charge ratios of the analyte ions and/or analyte ion fragments.
- In tandem mass spectrometry (MS/MS), samples containing a mixture of proteins and peptides are ionized and the resulting precursor ions are scanned to determine their mass-to-charge ratios during the MS1 stage.
- During the MS2 stage, selected precursor ions are fragmented and further analyzed according to the mass-to-charge ratio of the fragments.
- mass-to-charge ratio refers to the ratio of the mass of a species to the charge state of a species.
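As a small worked illustration of the definition above, the following sketch computes the m/z of a peptide ion from its neutral mass and charge state. It assumes positive-mode ionization in which each charge is carried by one added proton; the function name is hypothetical and is not part of the described method.

```python
PROTON_MASS = 1.007276466  # mass of a proton in daltons


def mz_from_neutral_mass(neutral_mass: float, charge: int) -> float:
    """Return m/z for a species carrying `charge` protons (assumed adducts)."""
    if charge <= 0:
        raise ValueError("charge must be a positive integer")
    # Each proton adds both mass and one unit of charge.
    return (neutral_mass + charge * PROTON_MASS) / charge


# Example: a hypothetical peptide of neutral mass 1000 Da at charge states 1 to 3.
for z in (1, 2, 3):
    print(z, round(mz_from_neutral_mass(1000.0, z), 4))
```

Higher charge states move the same peptide to lower m/z, which is why precursor charge state matters when choosing scan ranges.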
- the term “precursor ion” is used herein to refer to an ion which is produced during the ionization stage of mass spectrometry analysis, including the MS1 ionization stage of MS/MS analysis.
- the term “product ion” is used herein to refer to an ion which is produced during a fragmentation process of a precursor ion, such as the MS2 fragmentation stage of MS/MS analysis.
- The terms “peptide” and “polypeptide” are used synonymously in the present description, and refer to a class of compounds composed of amino acid residues chemically bonded together by amide bonds (or peptide bonds).
- Peptides and polypeptides are polymeric compounds comprising at least two amino acid residues or modified amino acid residues. Modifications can be naturally occurring or non-naturally occurring, such as modifications generated by chemical synthesis.
- Modifications to amino acids in peptides include, but are not limited to, phosphorylation, glycosylation, lipidation, prenylation, sulfonation, hydroxylation, acetylation, methylation, methionine oxidation, alkylation, acylation, carbamylation, iodination and the addition of cofactors.
- Peptides include proteins and further include compositions generated by degradation of proteins, for example by proteolytic digestion. Peptides and polypeptides can be generated by substantially complete digestion or by partial digestion of proteins.
- Polypeptides include, for example, polypeptides comprising 1 to 100 amino acid units, optionally for some embodiments 1 to 50 amino acid units and, optionally for some embodiments 1 to 20 amino acid units.
- Antibodies are specialized proteins produced by the immune system as a defense against foreign agents (antigens). Each antibody has a region that binds specifically to a particular antigen which it neutralizes.
- ADCs: antibody-drug conjugates
- mAbs: monoclonal antibodies
- the antibody region is preferably selective for an antigen expressed on cells or tissues to which the biologically active drug is designed to be delivered.
- the antibody region may be selective for a tumor-associated antigen that has restricted or no expression on normal (healthy) cells and therefore enables the ADC to deliver the biologically active drug to the tumor cells.
- Therapeutic proteins can be any protein, fusion protein, or polypeptide isolated or produced for pharmaceutical use.
- Therapeutic proteins include, but are not limited to, anticoagulants, blood factors, bone morphogenetic proteins, engineered protein scaffolds, enzymes, growth factors, hormones, interferons, interleukins, and thrombolytics.
- Therapeutic proteins can also be classified based on their molecular mechanism of activity as (a) binding non-covalently to target, e.g., mAbs; (b) affecting covalent bonds, e.g., enzymes; and (c) exerting activity without specific interactions, e.g., serum albumin.
- K-means clustering analysis is a commonly used data clustering process for unsupervised learning tasks (see, for example, Hartigan and Wong, 1979, “A K-means clustering algorithm,” Applied Statistics 28: 100-108; and J. B. MacQueen, 1967, “Some Methods for Classification and Analysis of Multivariate Observations,” Proceedings of 5th Berkeley Symposium on Mathematical Statistics and Probability, Berkeley, University of California Press, 1:281-297). K-means clustering can be used to find groups which have not been explicitly labeled in the data.
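To make the idea of grouping unlabeled intensity values concrete, here is a minimal sketch of k-means on scalar values. It uses a plain Lloyd-style iteration with a deterministic initialization, which is an assumption made for illustration; the disclosure does not specify which k-means variant, initialization, or value of k is used, and the function name is hypothetical.

```python
def kmeans_1d(values, k, iterations=100):
    """Cluster scalar values into k groups; returns (centers, labels)."""
    ordered = sorted(values)
    # Deterministic start: pick k values spread evenly across the sorted data.
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value joins its nearest center.
        labels = [min(range(k), key=lambda c: (v - centers[c]) ** 2) for v in values]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_centers == centers:  # converged
            break
        centers = new_centers
    return centers, labels


# Hypothetical normalized intensities forming a low and a high group.
centers, labels = kmeans_1d([1.0, 1.2, 0.8, 5.0, 5.2, 4.8], k=2)
print(centers, labels)
```

In practice a library implementation with multiple random restarts would usually be preferred; the sketch only shows the assignment and update steps.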
- mass spectrometry can be used to detect peptides that are created by digesting proteins, either purified protein or complex mixtures of proteins, into smaller, easier to detect peptide samples.
- the samples are typically separated via liquid chromatography, after which the peptides are ionized and injected into the mass spectrometer, where they are separated based on their mass-to-charge ratio (m/z) during the MS1 stage.
- a selected set of ions is subsequently fragmented and separated during the MS2 stage, the result of which is a “fingerprint” that is used to identify the protein via a comparative database search.
- This process, known as shotgun proteomics, has become the primary method for protein detection and quantification.
- the present invention provides methods and systems for analyzing polypeptides which provide increased sequence coverage and improved analysis. Aspects of the present invention provide a sequencing strategy to provide more protein sequence coverage than is typically achieved, and a computational approach to view a protein's expression across its full length and identify regions of the protein that are potentially subject to such regulation. This technology has global utility in any proteomics experiment and will be of particular use for the analysis of biosimilar protein drug therapeutics.
- a large scale study of the yeast proteome was performed.
- the proteome was digested with the following six proteases: trypsin, Lys-N, Lys-C, Glu-C, chymotrypsin, and Asp-N.
- Coenzyme Q5 (COQ5) is a methyltransferase essential to the ubiquinone biosynthesis pathway.
- FIG. 1 illustrates the sequencing depth of COQ5, with over 250 quantified peptides.
- the peptide mapping reveals not only the dynamic range of peptides from multiple proteases, but also putative proteoforms.
- an amino acid consensus map was built by grouping the peptide information to distinguish features unique to proteoforms, such as truncation or PTM sites.
- peptide intensities were normalized to adjust bias in protease activity and ionization efficiency over proteases ( FIG. 2 ).
- a test statistic was implemented to assign significance to variability in amino acid intensities.
- the N-terminal processed proteoform (amino acid position 31) could be distinguished from the unprocessed proteoform ( FIG. 2 ) of COQ5.
- This method enables the detection of proteoforms not readily accessible in a typical shotgun proteomics experiment.
- This example is one of many for N-terminal truncation, but other proteoform features, such as alternative gene products, PTM sites, and sequence variations, are also able to be distinguished.
- This is a use case, which requires a single condition, but it would be easy to extend the method to multiple conditions.
- Such a comparative analysis, e.g., between control and treatment sample(s), would allow changes in proteoform composition and respective features to be identified.
- The use of multiple proteases overcomes a limitation of common shotgun proteomic techniques that rely heavily on trypsin. Cleavage with trypsin at amino acids arginine and lysine results in a mean protein sequence coverage of 15% to 20%. While sufficient for most proteomics experiments, the limited coverage does not allow for the identification of individual proteoforms.
- the mixture of six proteases increases the mean sequence coverage to 80%, and when combined with the bioinformatic analysis, allows for the identification of putative proteoforms with unprecedented amino acid resolution.
- a pellet corresponding to 5% of the total cells grown was resuspended in lysis buffer containing 8 M urea, 50 mM tris (pH 8), 75 mM sodium chloride, 100 mM sodium butyrate, protease (Roche) and phosphatase inhibitor tablet (Roche).
- Yeast cells were lysed by glass bead milling (Retsch). Briefly, 2 ml of acid washed glass beads were combined with 2.5 ml of resuspended yeast cells in a stainless steel container and shaken 8 times at 30 Hz for 4 min with a 1 min rest in between. Lysate protein concentration was measured by BCA (Thermo Pierce).
- For LysC digestion, a 1 mg protein aliquot was digested overnight with 20 µg LysC (Wako, Richmond, Va.) at room temperature in 4 M urea.
- For LysN digestion, a 1 mg protein aliquot was digested for four hours with 20 µg LysN (Thermo Pierce) at 37° C. in 4 M urea.
- For GluC digestion, a 1 mg protein aliquot was digested overnight with 25 µg GluC (Roche Diagnostics, Indianapolis, Ind.) at room temperature in 0.5 M urea.
- For chymotrypsin digestion, a 1 mg protein aliquot was digested overnight with 12.5 µg of chymotrypsin resuspended in 0.2% FA (Promega, Madison, Wis.) in 1 M urea.
- For AspN digestion, a 1 mg protein aliquot was incubated with 6 µg AspN (Roche Diagnostics, Indianapolis, Ind.) at room temperature overnight. Each digest was quenched by the addition of TFA and desalted on a 100 mg C18 Sep-Pak cartridge (Waters, Milford, Mass.).
- Peptide cations were electrosprayed into a Thermo Orbitrap Fusion (Q-OT-qIT, Thermo). All fractions were analyzed using HCD and ETD.
- HCD precursor scans were performed from 300 to 1,500 m/z at either 60K or 120K resolution (at 400 m/z). A 5 × 10^5 ion count target was used.
- Precursors selected for tandem MS were isolated at 0.7 Th with the quadrupole, fragmented by HCD with a normalized collision energy of 30, and analyzed using turbo scan in the ion trap.
- the maximum injection time for MS2 analysis was normally set at either 25 or 35 ms, but was set higher for some analyses, with an ion count target of 10^4.
- Precursors with a charge state of 2-8 were sampled for MS2.
- Dynamic exclusion time was set at 15 seconds, with a 10 ppm tolerance around the selected precursor and its isotopes.
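The dynamic exclusion rule described above can be illustrated with a short sketch that converts a ppm tolerance into an absolute m/z window and checks whether a candidate precursor falls inside the window of a recently selected one. This is only a simplified model of the logic, not instrument control code: it ignores isotope peaks, and the function names and data layout are hypothetical.

```python
def ppm_window(mz, tolerance_ppm=10.0):
    """Return the (low, high) m/z bounds for a ppm tolerance around mz."""
    delta = mz * tolerance_ppm / 1e6
    return mz - delta, mz + delta


def is_excluded(candidate_mz, candidate_time, recent,
                tolerance_ppm=10.0, exclusion_seconds=15.0):
    """recent: list of (selected_mz, selected_time_in_seconds) tuples."""
    for selected_mz, selected_time in recent:
        low, high = ppm_window(selected_mz, tolerance_ppm)
        # The exclusion applies only while the timer is still running.
        still_active = 0.0 <= candidate_time - selected_time < exclusion_seconds
        if still_active and low <= candidate_mz <= high:
            return True
    return False


recent = [(500.0, 0.0)]  # a precursor at m/z 500 selected at t = 0 s
print(ppm_window(500.0))                   # window is +/- 0.005 m/z at 10 ppm
print(is_excluded(500.004, 10.0, recent))  # inside window, within 15 s
print(is_excluded(500.004, 20.0, recent))  # exclusion has expired
```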
- Monoisotopic precursor selection was turned on. Analyses were performed in top speed mode with either 3 or 5 second cycles.
- precursor scans were performed from 200 to 800 m/z at either 60K or 120K resolution (at 400 m/z).
- a 5 × 10^5 ion count target was used on the Orbitrap Fusion, and a 1 × 10^6 ion count target was used on the Orbitrap Lumos.
- Precursors selected for tandem MS were isolated at 0.7 Th with the quadrupole. Precursors were fragmented by ETD using custom reaction times; +3: 40 ms, +4: 22 ms, +5: 14 ms, +6: 10 ms, +2: 70 ms. EThcD was performed on +2 precursors, at 25% supplemental activation collision energy.
- Precursor ions were selected for fragmentation based on charge state in the following order: +3, +4, +5, +6, +2. Fragment ions were analyzed in the ion trap. Dynamic exclusion time was set at 15 seconds, with a 10 ppm tolerance around the selected precursor and its isotopes. Monoisotopic precursor selection was turned on. Analyses were performed in top speed mode with either 3 or 5 second cycles.
- the raw mass spectrometry data was processed using the MaxQuant software (version 1.5.7.5). Searches were performed against the UniProt database (UP000002311_559292). Searches were conducted using the default precursor mass tolerances set by Andromeda (20 ppm first search, 4.5 ppm main search) and product mass tolerance of 0.35 Da and 0.015 Da, respectively. A maximum of two missed tryptic cleavages was allowed. The fixed modification specified was carbamidomethylation of cysteine residues. The variable modifications specified were oxidation of methionine and protein acetylation (N-term). For all experiments, peptides and their corresponding protein groups were both filtered to a 1% false discovery rate.
- the peptide extracted-ion chromatogram (XIC) intensities from the MaxQuant peptides file were used.
- the XIC intensities were normalized by quantile normalization.
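For readers unfamiliar with quantile normalization, the following sketch shows the basic procedure on small lists of intensities. It assumes one equal-length list per sample with no missing values and no special handling of ties, which is a simplification; how the intensities were grouped for normalization in the study is not specified here, and the function name is hypothetical.

```python
def quantile_normalize(samples):
    """samples: list of equal-length lists of intensities, one list per sample."""
    n = len(samples[0])
    sorted_samples = [sorted(s) for s in samples]
    # Reference distribution: mean across samples at each rank.
    rank_means = [sum(s[r] for s in sorted_samples) / len(samples) for r in range(n)]
    normalized = []
    for s in samples:
        order = sorted(range(n), key=lambda i: s[i])  # indices from lowest to highest
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]  # replace each value by its rank mean
        normalized.append(out)
    return normalized


# Two hypothetical samples with different overall intensity scales.
print(quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]))
```

After normalization every sample has the same set of values, so differences in overall signal between digests no longer dominate comparisons.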
- To construct the amino acid map, the peptide sequences were assembled to the reference sequences provided in the protein sequence database. Amino acid abundances were calculated as the mean of the XIC peptide intensities matching to the amino acid position in the protein sequence. A ratio for each amino acid was calculated by dividing the abundance by the median abundance. The ratios were used in one- or two-sample t-tests (which require replicate analyses) to infer statistical significance for each amino acid position. To control Type I errors, a multiple hypothesis test correction (FDR) was performed.
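The amino acid map calculation described above can be sketched as follows: each peptide contributes its intensity to every position it covers, the per-position abundance is the mean of those intensities, and each abundance is divided by the median abundance. The peptide coordinates and intensities below are made up, the function names are hypothetical, and the Benjamini-Hochberg procedure is an assumed choice of FDR correction, since the text does not name the specific procedure. The t-test step is omitted.

```python
from statistics import mean, median


def amino_acid_ratios(protein_length, peptides):
    """peptides: list of (start, end, intensity) with 1-based inclusive positions."""
    per_position = [[] for _ in range(protein_length)]
    for start, end, intensity in peptides:
        for pos in range(start, end + 1):
            per_position[pos - 1].append(intensity)
    # Abundance = mean intensity of all peptides covering the position.
    abundances = [mean(v) if v else None for v in per_position]
    med = median(a for a in abundances if a is not None)
    # Uncovered positions stay None rather than being treated as zero.
    return [a / med if a is not None else None for a in abundances]


def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the original order (assumed FDR method)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


print(amino_acid_ratios(6, [(1, 3, 10.0), (2, 4, 30.0), (5, 6, 20.0)]))
print(benjamini_hochberg([0.01, 0.04, 0.03]))
```

A run of positions with ratios well below 1 at one end of the sequence would be the kind of pattern expected for a truncated or processed proteoform.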
Description
- This application claims priority from United States Provisional Patent Application No. 62/511,011, filed May 25, 2017, which is incorporated by reference herein to the extent that there is no inconsistency with the present disclosure.
- This invention was made with government support under GM 118110 and GM108538 awarded by the National Institutes of Health. The government has certain rights in the invention.
- Mass-spectrometry-based proteomics is a key technology for studying the proteome, which can comprise canonical gene products, alternative gene products, post-translational modifications (PTMs), non-synonymous single nucleotide polymorphisms (SNPs) and other sequence variations. The most prevalent paradigms are top-down proteomics and bottom-up proteomics, also known as shotgun proteomics. In shotgun proteomics, proteins in a sample undergo proteolytic digestion, breaking the proteins into smaller pieces (peptides), which are then subjected to analysis by mass spectrometry. To infer the amino acid sequence and quantity of the peptides, the resulting data is often processed using a search engine in conjunction with a sequence database containing data of known peptides and proteins.
- In shotgun proteomics, it is important that the digest is able to produce peptides appropriate for the mass spectrometer. The most common and effective protease is trypsin, which will cleave C-terminal to the amino acids arginine and lysine, resulting in a mean protein sequence coverage in the range of 15% to 20%. Trypsin's moderate sequence coverage is sufficient for most proteomics experiments but does not provide sufficient sequence coverage for distinguishing various forms of a protein (proteoforms). Other proteases are similar in that they only provide partial coverage of the total protein sequence.
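The cleavage rule mentioned above (cutting C-terminal to arginine and lysine) can be illustrated with a minimal in silico digest. This sketch assumes complete cleavage with no missed cleavages and ignores any sequence-context exceptions; the function name and the example sequence are hypothetical.

```python
def digest_c_terminal(sequence, residues="KR"):
    """Split a sequence after every residue in `residues` (trypsin-like rule)."""
    peptides = []
    start = 0
    for i, aa in enumerate(sequence):
        if aa in residues:
            peptides.append(sequence[start:i + 1])  # cleavage site residue ends the peptide
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # trailing C-terminal peptide
    return peptides


# A made-up ten-residue sequence.
print(digest_c_terminal("MKTAYRLLGK"))
```

Passing a different residue set, for example `residues="K"`, gives a Lys-C-like rule under the same simplifying assumptions, which shows how different proteases yield different peptide sets from one sequence.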
- As a result of the partial coverage provided by conventional digests, typically only a fraction of the peptides from the parent protein are actually detected and analyzed. Many modifications, splicing events, single nucleotide polymorphisms, and truncations of the proteome frequently occur and are tightly regulated. However, because the entire protein sequence is not detected, it is often impossible to determine whether the expressed protein is present in a modified, spliced, or truncated form. As such, this level of information is often not collected during conventional proteomic analyses, and there is currently no good way to monitor which form(s) of a protein exists in the cell, or which form(s) of a protein are present in a purified protein drug therapeutic.
- To overcome the above limitations, the present invention discloses methods and systems for analyzing polypeptides which provide increased sequence coverage and improved analysis of the protein and proteoform. Embodiments of the invention include a deep sequencing strategy to provide more protein sequence coverage than is typically achieved by conventional means, as well as a computational approach to view protein expression across the full length of the protein to identify regions that are potentially subject to alterations, regulation, processing, and modifications.
- Aspects of the invention include improved sample preparation, high resolution mass-spectrometry, bioinformatic analysis, and combinations thereof. For example, embodiments of the present invention encompass the use of multiple proteases, which allow for the increase of the mean sequence coverage (in some cases, up to 80%), concomitant with bioinformatics analysis in order to distinguish putative proteoforms with improved amino acid resolution.
- In embodiments described herein, multiple samples of the same polypeptide are used to determine the sequence and proteoform information of the polypeptide with great accuracy. In some embodiments, one or more samples of different polypeptides are used to determine and compare the sequence and proteoform information of the polypeptides.
- In an embodiment, the present invention provides a method for analyzing a polypeptide having an amino acid sequence. The method comprises the steps of: a) digesting a first sample of the polypeptide with a first protease or chemical agent; b) digesting a second sample of the polypeptide with a second protease or chemical agent; c) generating tandem mass spectrometry data on each digested polypeptide sample; and d) combining mass spectrometry data from each digested polypeptide sample to generate comprehensive mass spectrometry data on the polypeptide.
- Optionally, the method further comprises digesting one or more additional samples of the polypeptide with one or more additional proteases or chemical agents, wherein the protease or chemical agent used for each sample is different from the protease or chemical agent used to digest any other sample. For example, in an embodiment the method further comprises digesting a third sample of the polypeptide with a third protease, digesting a fourth sample of the polypeptide with a fourth protease, digesting a fifth sample of the polypeptide with a fifth protease, and/or digesting a sixth sample of the polypeptide with a sixth protease. Each protease or chemical agent used to digest a sample is different. In an embodiment, three to four samples are independently digested by three to four unique proteases. In an embodiment, each sample is digested and analyzed concurrently with the other sample, analyzed on the mass spectrometer device, and/or as part of the same experiment.
- In an embodiment, the present invention provides a method for analyzing two or more polypeptides comprising the steps of: a) independently digesting a first sample of a first polypeptide and a first sample of a second polypeptide with a first protease or chemical agent; b) independently digesting a second sample of the first polypeptide and a second sample of the second polypeptide with a second protease or chemical agent; c) generating tandem mass spectrometry data on each digested polypeptide sample; d) for each polypeptide, combining mass spectrometry data from each digested polypeptide sample of that polypeptide to generate comprehensive mass spectrometry data; and e) generating at least a partial consensus amino acid sequence for each polypeptide from the comprehensive mass spectrometry data and calculating abundances of amino acids. In an embodiment, the partial consensus amino acid sequence provides at least 50% of the full length polypeptide sequence, preferably, 60% of the full length polypeptide sequence, or 80% of the full length polypeptide sequence. A further embodiment comprises f) comparing the consensus sequence abundances of amino acids of each polypeptide, and identifying differences in amino acid abundance between the polypeptides.
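As a rough illustration of the comparison in step f), the sketch below compares per-position amino acid abundances of two polypeptides and reports positions whose abundance differs by at least a chosen fold change. The fold-change threshold is an assumption made for illustration and is a simpler screen than a statistical test; the function name and input values are hypothetical.

```python
import math


def differing_positions(control, sample, min_abs_log2_fc=1.0):
    """Return (position, log2 fold change) for positions that differ markedly.

    control, sample: per-position abundances (None where a position is uncovered).
    """
    flagged = []
    for pos, (c, s) in enumerate(zip(control, sample), start=1):
        if c is None or s is None or c <= 0 or s <= 0:
            continue  # cannot compare uncovered or non-positive positions
        log2_fc = math.log2(s / c)
        if abs(log2_fc) >= min_abs_log2_fc:
            flagged.append((pos, log2_fc))
    return flagged


control = [1.0, 1.0, 1.0, 1.0]      # hypothetical control polypeptide profile
production = [1.0, 0.25, 1.0, 2.0]  # hypothetical second polypeptide profile
print(differing_positions(control, production))
```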
- Optionally, the method further comprises independently digesting a third sample of the first polypeptide and a third sample of the second polypeptide with a third protease; independently digesting a fourth sample of the first polypeptide and a fourth sample of the second polypeptide with a fourth protease; independently digesting a fifth sample of the first polypeptide and a fifth sample of the second polypeptide with a fifth protease; and/or independently digesting a sixth sample of the first polypeptide and a sixth sample of the second polypeptide with a sixth protease. Each protease or chemical agent used to digest a sample is different. In an embodiment, three to four samples of each polypeptide are independently digested.
- Optionally, the tandem mass spectrometry data for each digested polypeptide sample in the methods described herein is generated by first generating a distribution of precursor ions during MS1 stage ionization, fragmenting precursor ions having a mass-to-charge ratio (m/z) within a selected target m/z range during MS2 stage fragmentation, thereby generating a plurality of product ions where the product ions correspond to portions of the amino acid sequence of the polypeptide, and measuring the m/z and intensity of the product ions, thereby generating mass spectrometry data for each digested polypeptide sample.
- In an embodiment, the comprehensive mass spectrometry data is used to generate at least a partial consensus amino acid sequence of the polypeptide. For example, the comprehensive mass spectrometry data is used to calculate the quantity or abundances of amino acids for one or more selected portions of the polypeptide, including, but not limited to, portions which comprise the N-terminus of the polypeptide. In an embodiment, the abundance or quantification of amino acids is performed without the use of an isobaric or chemical label attached to the polypeptide.
- Preferably, the comprehensive mass spectrometry data provides sequence coverage for at least 20% of the full length amino acid sequence of the polypeptide, at least 30% of the full length amino acid sequence of the polypeptide, at least 40% of the full length amino acid sequence of the polypeptide, at least 50% of the full length amino acid sequence of the polypeptide, at least 60% of the full length amino acid sequence of the polypeptide, at least 70% of the full length amino acid sequence of the polypeptide, or at least 80% of the full length amino acid sequence of the polypeptide.
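Sequence coverage as used above is the fraction of residues in the full length sequence that are covered by at least one identified peptide. A minimal sketch, assuming peptides are given as 1-based inclusive start and end positions pooled across all digests (the function name and example spans are hypothetical):

```python
def sequence_coverage(protein_length, peptide_spans):
    """Fraction of positions covered by at least one (start, end) span."""
    covered = [False] * protein_length
    for start, end in peptide_spans:
        # Clamp spans to the protein so malformed input cannot index out of range.
        for pos in range(max(start, 1), min(end, protein_length) + 1):
            covered[pos - 1] = True
    return sum(covered) / protein_length


# Overlapping peptides from different digests count each residue only once.
spans = [(1, 3), (2, 5), (9, 10)]
print(sequence_coverage(10, spans))  # 7 of 10 residues covered
```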
- In a further embodiment, the mass spectrometry data for different polypeptides is compared to one another to determine any differences between the polypeptide samples. These differences include the presence or absence of amino acids, polymorphisms, mutations, and post-translational modifications (PTMs) of amino acids, including but not limited to phosphorylation, glycosylation, ubiquitination, nitrosylation, methylation, acetylation, lipidation and proteolysis. In an embodiment, at least one sample polypeptide is a control polypeptide and at least one of the other sample polypeptides is a polypeptide which has undergone a suspected modification, splice, truncation, polymorphism or mutation. In a further embodiment, the modification is a post-translational modification or the result of a single nucleotide polymorphism.
- In a further embodiment, the measured intensities of the product ions from each digested polypeptide sample are normalized during generation of the comprehensive mass spectrometry data. The normalized data is then optionally used to identify a portion of the product ions as corresponding to one or more known amino acid sequence fragments of the polypeptide sample. In a further embodiment, k-means clustering analysis is performed on the normalized intensity data during generation of the comprehensive mass spectrometry data. Optionally, alternative clustering algorithms can be used to group data points.
- Proteases suitable for use with the present invention include, but are not limited to, the group consisting of trypsin, Lys-N, Lys-C, Glu-C (Protease V8), chymotrypsin, Asp-N, and combinations thereof. Chemical agents suitable for use with the present invention include, but are not limited to, the group consisting of cyanogen bromide, formic acid, hydroxylamine, 2-nitro-5-thiocyanobenzoic acid (NTCB), and BNPS skatole (2-(2-nitrophenylsulfenyl)-3-methyl-3-bromoindolenine).
- Preferably, the polypeptides analyzed with the methods of the present invention are antibodies, antibody-drug conjugates, or therapeutic proteins that can be administered to a subject. In an embodiment, the present methods are used to determine if therapeutic products generated during a biochemical or manufacturing process have the same quality and proteoform as a desired control antibody, antibody-drug conjugate, or therapeutic protein. For example, in an embodiment, a first analyzed polypeptide is a control therapeutic polypeptide, antibody, or antibody-drug conjugate, and a second analyzed polypeptide is a production therapeutic polypeptide, antibody, or antibody-drug conjugate made during a biochemical process or manufacturing process.
- In a further embodiment, the first polypeptide is a control polypeptide produced by a cell and the second polypeptide is produced by a cell which has been administered a treatment. The polypeptides are then analyzed to determine if the treatment alters the sequence or proteoform of the polypeptide.
- In an embodiment, the mass spectrometry data is generated at a resolution of 60K or greater. In another embodiment, the mass spectrometry data is generated at a resolution of 120K or greater. In another embodiment, mass spectrometry data collected from the MS1 stage is generated at a resolution of 60K or greater, and mass spectrometry data collected from the MS2 stage is generated at a resolution of 120K or greater.
- In addition to determining the proteoform of a protein, the methods and systems described herein can also be used to determine specific amino acid abundance in addition to or instead of peptide abundance. Accordingly, the methods described herein will be useful to the proteomics community, as well as to the pharmaceutical industry, by allowing the full characterization of the sequence and structure of biosimilar drug therapeutics and supporting quality assurance.
- In an embodiment, the invention also provides a system for analyzing a polypeptide having an amino acid sequence comprising: a) an ion source for generating ions from a plurality of digested samples of the polypeptide; b) ion fragmentation optics in communication with the ion source for generating product ions; c) an ion detector in communication with the ion fragmentation optics for detecting ions according to their mass-to-charge ratios; and d) a mass analyzer in communication with the ion detector. The mass analyzer comprises a software program enabling the mass analyzer to: i) measure m/z and intensity of the detected ions, thereby generating mass spectrometry data for each digested polypeptide sample; ii) normalize the measured intensities of the product ions from each digested polypeptide sample; and iii) combine mass spectrometry data from each digested polypeptide sample to generate comprehensive mass spectrometry data on the polypeptide, wherein the comprehensive mass spectrometry data provides sequence coverage for at least 20% (preferably at least 50% or 80%) of the full length amino acid sequence of the polypeptide. Optionally, the mass analyzer is able to generate at least a partial consensus amino acid sequence for the polypeptide from the comprehensive mass spectrometry data and calculate abundances of amino acids for one or more selected portions of the polypeptide from the comprehensive mass spectrometry data. In an embodiment, the mass analyzer utilizes k-means clustering analysis on the normalized intensity data to generate the comprehensive mass spectrometry data. In an embodiment, the system generates comprehensive mass spectrometry data from two or more samples, three or more samples, four or more samples, five or more samples, or six or more samples.
- FIG. 1 illustrates peptide mapping to coenzyme Q5. Each line represents a peptide identified and quantified by mass-spectrometry based proteomics over the full sequence length of coenzyme Q5. The peptides are a product of proteolytic digestion with six proteases (Asp-N, Chymotrypsin, Glu-C, Lys-C, Lys-N, and Trypsin).
- FIG. 2 shows an amino acid consensus map of coenzyme Q5. Each bar represents the ratio of the normalized amino acid intensity to the median amino acid intensity over the full sequence length of coenzyme Q5. The N-terminal region corresponds to a transit peptide.
- In general the terms and phrases used herein have their art-recognized meaning, which can be found by reference to standard texts, journal references and contexts known to those skilled in the art. The following definitions are provided to clarify their specific use in the context of the invention.
- As used herein, the term “proteoform” refers to the specific molecular form of a protein product arising from a specific gene. The proteoform of a polypeptide encompasses not only the translated amino acid sequence of the polypeptide, but also includes post-translational modifications of the polypeptide.
- Post-translational modifications (PTMs) are modifications that occur on a protein, typically catalyzed by enzymes, after its translation by ribosomes is complete. PTMs generally refer to the covalent addition of a functional group to a protein, proteolytic cleavage, or degradation of protein regions. PTMs include but are not limited to phosphorylation, glycosylation, ubiquitination, nitrosylation, methylation, acetylation, neddylation, lipidation and proteolysis.
- As used herein, the term “mass spectrometry” (MS) refers to an analytical technique for the determination of the elemental composition of an analyte. Mass spectrometric techniques are useful for elucidating the chemical structures of analytes, such as peptides and other chemical compounds. The mass spectrometry principle consists of ionizing analytes to generate charged species or species fragments and measuring their mass-to-charge ratios. Conducting a mass spectrometric analysis of an analyte results in the generation of mass spectrometry data relating to the mass-to-charge ratios of the analyte and analyte fragments. Mass spectrometry data corresponding to analyte ions and analyte ion fragments is presented in mass-to-charge (m/z) units representing the mass-to-charge ratios of the analyte ions and/or analyte ion fragments. In tandem mass spectrometry (MS/MS), multiple rounds of mass spectrometry analysis are performed. For example, during the MS1 stage of tandem mass spectrometry, samples containing a mixture of proteins and peptides are ionized and the resulting precursor ions are scanned to determine their mass-to-charge ratios. During the MS2 stage, selected precursor ions are fragmented and further analyzed according to the mass-to-charge ratio of the fragments.
- As used herein, the term “mass-to-charge ratio” refers to the ratio of the mass of a species to the charge state of a species.
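- The definition above can be made concrete with a short calculation. The sketch below assumes positive-mode ionization in which each charge comes from one added proton, which is a common convention but is not stated in this description; the function names `mz` and `ppm_error` and the numeric inputs are hypothetical. The second function expresses a mass difference in parts per million, the unit used for the mass tolerances given later in this description.

```python
PROTON_MASS = 1.007276  # Da, mass of a proton (rounded)


def mz(neutral_mass, charge):
    """m/z of a positively charged ion formed by adding `charge` protons."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge


def ppm_error(observed_mz, theoretical_mz):
    """Relative deviation of an observed m/z from its theoretical value, in ppm."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


# A hypothetical peptide with a neutral monoisotopic mass of 1000.0 Da.
for z in (1, 2, 3):
    print(z, round(mz(1000.0, z), 4))
```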
- As used herein, the term “precursor ion” refers to an ion which is produced during the ionization stage of mass spectrometry analysis, including the MS1 ionization stage of MS/MS analysis. As used herein, the term “product ion” refers to an ion which is produced during a fragmentation process of a precursor ion, such as the MS2 fragmentation stage of MS/MS analysis.
- The terms “peptide” and “polypeptide” are used synonymously in the present description, and refer to a class of compounds composed of amino acid residues chemically bonded together by amide bonds (or peptide bonds). Peptides and polypeptides are polymeric compounds comprising at least two amino acid residues or modified amino acid residues. Modifications can be naturally occurring or non-naturally occurring, such as modifications generated by chemical synthesis. Modifications to amino acids in peptides include, but are not limited to, phosphorylation, glycosylation, lipidation, prenylation, sulfonation, hydroxylation, acetylation, methylation, methionine oxidation, alkylation, acylation, carbamylation, iodination and the addition of cofactors. Peptides include proteins and further include compositions generated by degradation of proteins, for example by proteolytic digestion. Peptides and polypeptides can be generated by substantially complete digestion or by partial digestion of proteins. Polypeptides include, for example, polypeptides comprising 1 to 100 amino acid units, optionally for some embodiments 1 to 50 amino acid units and, optionally for some embodiments 1 to 20 amino acid units.
- Antibodies are specialized proteins produced by the immune system as a defense against foreign agents (antigens). Each antibody has a region that binds specifically to a particular antigen which it neutralizes.
- Antibody Drug Conjugates (ADCs) are monoclonal antibodies (mAbs) attached to biologically active drugs by chemical linkers with labile bonds. The antibody region is preferably selective for an antigen expressed on cells or tissues to which the biologically active drug is designed to be delivered. For example, the antibody region may be selective for a tumor-associated antigen that has restricted or no expression on normal (healthy) cells and therefore enables the ADC to deliver the biologically active drug to the tumor cells.
- Therapeutic proteins can be any protein, fusion protein, or polypeptide isolated or produced for pharmaceutical use. Therapeutic proteins include, but are not limited to, anticoagulants, blood factors, bone morphogenetic proteins, engineered protein scaffolds, enzymes, growth factors, hormones, interferons, interleukins, and thrombolytics. Therapeutic proteins can also be classified based on their molecular mechanism of activity as (a) binding non-covalently to target, e.g., mAbs; (b) affecting covalent bonds, e.g., enzymes; and (c) exerting activity without specific interactions, e.g., serum albumin.
- K-means clustering analysis is a commonly used data clustering process for unsupervised learning tasks (see, for example, Hartigan and Wong, 1979, “A K-means clustering algorithm,” Applied Statistics 28: 100-108; and J. B. MacQueen, 1967, “Some Methods for Classification and Analysis of Multivariate Observations,” Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, Berkeley, University of California Press, 1:281-297). K-means clustering can be used to find groups which have not been explicitly labeled in the data.
- As described herein, mass spectrometry can be used to detect peptides that are created by digesting proteins, either purified protein or complex mixtures of proteins, into smaller, easier-to-detect peptide samples. The samples are typically separated via liquid chromatography, after which the peptides are ionized and injected into the mass spectrometer, where they are separated based on their mass-to-charge ratio (m/z) during the MS1 stage. A selected set of ions is subsequently fragmented and separated during the MS2 stage, the result of which is a “fingerprint” that is used to identify the protein via a comparative database search. This process, known as shotgun proteomics, has become the primary method for protein detection and quantification.
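- For readers unfamiliar with the procedure, the following is a minimal sketch of k-means (Lloyd's algorithm) applied to one-dimensional intensity values. It is an illustration only: this description does not specify the variant, the initialization, or the features that are clustered, so the deterministic initialization, the function name `kmeans_1d`, and the example values are all assumptions.

```python
def kmeans_1d(values, k, iterations=100):
    """Lloyd's algorithm for one-dimensional data; returns (centroids, labels)."""
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and the number of values")
    ordered = sorted(values)
    # Deterministic initialization: spread the starting centroids over the sorted data.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            updated.append(sum(members) / len(members) if members else centroids[c])
        if updated == centroids:
            break  # converged: no centroid moved
        centroids = updated
    return centroids, labels


intensities = [0.9, 1.0, 1.1, 4.8, 5.0, 5.2]  # hypothetical normalized intensities
centroids, labels = kmeans_1d(intensities, k=2)
print(centroids, labels)
```

- The two groups are found without any labels being supplied, which is the sense in which the method is unsupervised.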
- Because of incomplete digestion, typically only a fraction of the peptides from the parent protein are actually detected and analyzed. Many modifications, splicing events, single nucleotide polymorphisms, and truncations of the proteome frequently occur and are tightly regulated. However, because the entire protein sequence is not detected, it is often impossible to determine whether the expressed protein is present in a modified, spliced, or truncated form.
- To overcome these limitations, the present invention provides methods and systems for analyzing polypeptides which provide increased sequence coverage and improved analysis. Aspects of the present invention provide a sequencing strategy to provide more protein sequence coverage than is typically achieved, and a computational approach to view a protein's expression across its full length and identify regions of the protein that are potentially subject to such regulation. This technology has global utility in any proteomics experiment and will be of particular use for the analysis of biosimilar protein drug therapeutics.
- In a first example, a large scale study of the yeast proteome was performed. The proteome was digested with the following six proteases: trypsin, Lys-N, Lys-C, Glu-C, chymotrypsin, and Asp-N. Over 6,000 proteins (95% of the yeast proteome) were identified with a mean sequence coverage of 80%. To enable the identification of endogenous peptides produced by natural proteolytic activity in the cell, the data was processed with a no enzyme search. Of special interest was the mitochondrial proteome, most of which is subject to N-terminal processing of the full protein sequence. The well characterized coenzyme Q5 (COQ5) protein, a methyltransferase, essential to the ubiquinone biosynthesis pathway, was selected. COQ5 undergoes post-translational processing by N-terminal truncation.
FIG. 1 illustrates the sequencing depth of COQ5, with over 250 quantified peptides.
- The peptide mapping reveals the dynamic range of peptides from multiple proteases, but also putative proteoforms. To distinguish features unique to proteoforms, such as truncation or PTM sites, an amino acid consensus map was built by grouping the peptide information. In doing so, peptide intensities were normalized to adjust for bias in protease activity and ionization efficiency across proteases (FIG. 2). To distinguish technical from biological variability, a test statistic was implemented to assign significance to variability in amino acid intensities.
- With this method, the N-terminally processed proteoform (amino acid position 31) could be distinguished from the unprocessed proteoform of COQ5 (FIG. 2).
- This method enables the detection of proteoforms not readily accessible in a typical shotgun proteomics experiment. This example is one of many for N-terminal truncation, but other proteoform features, such as alternative gene products, PTM sites, and sequence variations, can also be distinguished. This use case requires a single condition, but the method can readily be extended to multiple conditions. Such a comparative analysis, e.g., between control and treatment sample(s), would allow changes in proteoform composition and the respective features to be identified.
- In addition, the use of multiple proteases overcomes a limitation of common shotgun proteomic techniques that rely heavily on trypsin. Cleavage with trypsin at amino acids arginine and lysine results in a mean protein sequence coverage of 15% to 20%. While sufficient for most proteomics experiments, the limited coverage does not allow for the identification of individual proteoforms. The mixture of six proteases increases the mean sequence coverage to 80%, and when combined with the bioinformatic analysis, allows for the identification of putative proteoforms with unprecedented amino acid resolution.
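- The effect of combining digests on sequence coverage can be sketched as follows. This is an illustrative calculation under stated assumptions: the protein sequence and the peptide lists are hypothetical, the function names `covered_positions` and `sequence_coverage` are not from this description, and peptides are mapped by exact string matching only.

```python
def covered_positions(protein, peptides):
    """Return the set of 0-based residue positions covered by any peptide."""
    covered = set()
    for peptide in peptides:
        start = protein.find(peptide)
        while start != -1:  # mark every occurrence of the peptide
            covered.update(range(start, start + len(peptide)))
            start = protein.find(peptide, start + 1)
    return covered


def sequence_coverage(protein, peptides):
    """Fraction of the protein sequence covered by at least one peptide."""
    return len(covered_positions(protein, peptides)) / len(protein)


protein = "MKTAYDEKLRFWDGESLKPQ"  # hypothetical 20-residue sequence
identified = {  # hypothetical identified peptides per protease
    "trypsin": ["TAYDEK"],
    "Glu-C": ["KLRFWDGE"],
    "Asp-N": ["DGESLKPQ"],
}
for protease, peptides in identified.items():
    print(protease, sequence_coverage(protein, peptides))
combined = [p for peptides in identified.values() for p in peptides]
print("combined", sequence_coverage(protein, combined))
```

- In this toy case no single digest covers more than 40% of the sequence, while the union of the three covers 90%, because the peptides from different proteases fall on different regions.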
- Yeast Culture and Lysis. Saccharomyces cerevisiae strain BY4741 was grown in yeast extract peptone dextrose media (1% yeast extract, 2% peptone, 2% dextrose). Four liters of media was divided between four two-liter flasks and inoculated with a starter culture (OD600=1.17). Cells were allowed to propagate for ˜18 h to an average OD600 of 1.31. The cells were harvested by centrifugation at 5000 rpm for 5 min, the supernatant was decanted, and the pellets were resuspended in chilled NanoPure water. The cells were washed two more times and centrifuged for the final pelleting at 5000 rpm for 10 min. A pellet corresponding to 5% of the total cells grown was resuspended in lysis buffer containing 8 M urea, 50 mM tris (pH 8), 75 mM sodium chloride, 100 mM sodium butyrate, protease (Roche) and phosphatase inhibitor tablet (Roche). Yeast cells were lysed by glass bead milling (Retsch). Briefly, 2 ml of acid washed glass beads were combined with 2.5 ml of resuspended yeast cells in a stainless steel container and shaken 8 times at 30 Hz for 4 min with a 1 min rest in between. Lysate protein concentration was measured by BCA (Thermo Pierce).
- Digestion. Protein was reduced by addition of 5 mM dithiothreitol and incubated for 45 min at 55° C. The mixture was cooled to room temperature, followed by alkylation of free thiols by addition of 15 mM iodoacetamide in the dark for 30 min. The alkylation reaction was quenched with 5 mM dithiothreitol. For tryptic digestion, a 1 mg protein aliquot was digested overnight with 20 μg trypsin (Promega, Madison, Wis.) at room temperature in 1 M urea. For LysC digestion, a 1 mg protein aliquot was digested overnight with 20 μg LysC (Wako, Richmond, Va.) at room temperature in 4 M urea. For LysN digestion, a 1 mg protein aliquot was digested for four hours with 20 μg LysN (Thermo Pierce) at 37° C. in 4 M urea. For GluC digestion, a 1 mg protein aliquot was digested overnight with 25 μg GluC (Roche Diagnostics, Indianapolis, Ind.) at room temperature in 0.5 M urea. For chymotrypsin digestion, a 1 mg protein aliquot was digested overnight with 12.5 μg of chymotrypsin resuspended in 0.2% FA (Promega, Madison, Wis.) in 1 M urea. For digestion with AspN, a 1 mg protein aliquot was incubated with 6 μg AspN (Roche Diagnostics, Indianapolis, Ind.) at room temperature overnight. Each digest was quenched by the addition of TFA and desalted on a 100 mg C18 Sep-Pak cartridge (Waters, Milford, Mass.).
- Fractionation. High-pH RP fractionation was performed on a Surveyor LC quaternary pump at a flow rate of 1.0 mL/min using a 5 μm column packed with C18 particles (250 mm by 4.6 mm, Phenomenex). Samples were resuspended in buffer A and separated using the following gradient: 0-2 min, 100% buffer A, followed by increasing buffer B over a 60-minute gradient at a flow rate of 0.8 mL/minute (buffer A: 20 mM ammonium formate, pH 10; buffer B: 20 mM ammonium formate, pH 10, in 80% ACN). Flow rate was increased to 1.5 mL/minute during equilibration. Peptides were concatenated to a final total of twenty fractions per enzymatic digest.
- LC-MS/MS. Samples were resuspended in 0.2% formic acid (FA) and separated via reversed phase (RP) chromatography. Peptides were injected on to a RP column prepared in-house. Approximately 35 cm of 75 μm-360 μm inner-outer diameter bare-fused silica capillary, each with a laser pulled electrospray tip, were packed with 1.7 μm diameter, 130 Å pore size, Bridged Ethylene Hybrid C18 particles (Waters). Columns were fitted on to either a nanoAcquity (Waters) or Dionex (Thermo) and heated to 60° C. using a home-built column heater. Mobile phase buffer A was composed of water and 0.2% formic acid. Mobile phase B was composed of 70% ACN, 0.2% formic acid, and 5% DMSO. Each sample was separated over a 100-min gradient, including time for column re-equilibration. Flow rates were set at 300-350 μl/min.
- Peptide cations were electrosprayed into a Thermo Orbitrap Fusion (Q-OT-qIT, Thermo). All fractions were analyzed using HCD and ETD. For HCD, precursor scans were performed from 300 to 1,500 m/z at either 60K or 120K resolution (at 400 m/z). A 5×10⁵ ion count target was used. Precursors selected for tandem MS were isolated at 0.7 Th with the quadrupole, fragmented by HCD with a normalized collision energy of 30, and analyzed using turbo scan in the ion trap. The maximum injection time for MS2 analysis was normally set at either 25 or 35 ms, but was set higher for some analyses, with an ion count target of 10⁴. Precursors with a charge state of 2-8 were sampled for MS2. Dynamic exclusion time was set at 15 seconds, with a 10 ppm tolerance around the selected precursor and its isotopes. Monoisotopic precursor selection was turned on. Analyses were performed in top speed mode with either 3 or 5 second cycles.
- To maximize identifications from ETD analysis, precursor scans were performed from 200 to 800 m/z at either 60K or 120K resolution (at 400 m/z). A 5×10⁵ ion count target was used on the Orbitrap Fusion; a 1×10⁶ ion count target was used on the Orbitrap Lumos. Precursors selected for tandem MS were isolated at 0.7 Th with the quadrupole. Precursors were fragmented by ETD using custom reaction times; +3: 40 ms, +4: 22 ms, +5: 14 ms, +6: 10 ms, +2: 70 ms. EThcD was performed on +2 precursors, at 25% supplemental activation collision energy. Precursor ions were selected for fragmentation based on charge state in the following order: +3, +4, +5, +6, +2. Fragment ions were analyzed in the ion trap. Dynamic exclusion time was set at 15 seconds, with a 10 ppm tolerance around the selected precursor and its isotopes. Monoisotopic precursor selection was turned on. Analyses were performed in top speed mode with either 3 or 5 second cycles.
- Data Processing and Bioinformatics Analysis. The raw mass spectrometry data was processed using the MaxQuant software (version 1.5.7.5). Searches were performed against the UniProt database (UP000002311_559292). Searches were conducted using the default precursor mass tolerances set by Andromeda (20 ppm first search, 4.5 ppm main search) and product mass tolerance of 0.35 Da and 0.015 Da, respectively. A maximum of two missed tryptic cleavages was allowed. The fixed modification specified was carbamidomethylation of cysteine residues. The variable modifications specified were oxidation of methionine and protein acetylation (N-term). For all experiments, peptides and their corresponding protein groups were both filtered to a 1% false discovery rate.
- The peptide extracted-ion chromatogram (XIC) intensities from the MaxQuant peptides file were used. The XIC intensities were normalized by quantile normalization. To construct the amino acid map the peptide sequences were assembled to the reference sequences provided in the protein sequence database. Amino acid abundances were calculated as the mean of the XIC peptide intensities matching to the amino acid position in the protein sequence. A ratio for each amino acid was calculated by dividing the abundance by the median abundance. The ratios were used in one or two sample T-tests (requires replicate analysis) to infer statistical significance for each amino acid position. To control Type I errors a multiple hypothesis test correction (FDR) was performed.
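- The normalization and amino acid map steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the example: the function names `quantile_normalize` and `amino_acid_ratios`, the sequence, and the intensities are hypothetical; ties are ranked in input order; each peptide is mapped to its first exact match only; and the median is taken over covered positions. The T-tests and FDR correction are omitted.

```python
from statistics import mean, median


def quantile_normalize(columns):
    """Quantile-normalize equally long intensity columns (one per digest or run)."""
    n = len(columns[0])
    if any(len(col) != n for col in columns):
        raise ValueError("all columns must have the same length")
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean of the values that share each rank.
    reference = [mean([col[r] for col in sorted_cols]) for r in range(n)]
    normalized = []
    for col in columns:
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]  # replace each value by the reference at its rank
        normalized.append(out)
    return normalized


def amino_acid_ratios(protein, peptide_intensities):
    """Per-residue mean peptide intensity divided by the median over covered residues."""
    per_position = [[] for _ in protein]
    for peptide, intensity in peptide_intensities.items():
        start = protein.find(peptide)  # first exact match only (simplification)
        if start == -1:
            continue
        for pos in range(start, start + len(peptide)):
            per_position[pos].append(intensity)
    abundance = [mean(v) if v else None for v in per_position]
    center = median([a for a in abundance if a is not None])
    return [None if a is None else a / center for a in abundance]


protein = "MKTAYDEKLR"  # hypothetical sequence
peptides = {"MK": 2.0, "TAYDEK": 8.0, "AYDEKLR": 4.0}  # hypothetical intensities
ratios = amino_acid_ratios(protein, peptides)
print([None if r is None else round(r, 2) for r in ratios])
```

- In this toy input the two N-terminal residues fall well below a ratio of 1, which is the kind of positional drop that the consensus map is meant to make visible.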
- Having now fully described the present invention in some detail by way of illustration and examples for purposes of clarity of understanding, it will be obvious to one of ordinary skill in the art that the same can be performed by modifying or changing the invention within a wide and equivalent range of conditions, formulations and other parameters without affecting the scope of the invention or any specific embodiment thereof, and that such modifications or changes are intended to be encompassed within the scope of the appended claims.
- When a group of materials, compositions, components or compounds is disclosed herein, it is understood that all individual members of those groups and all subgroups thereof are disclosed separately. Every formulation or combination of components described or exemplified herein can be used to practice the invention, unless otherwise stated. Whenever a range is given in the specification, for example, a temperature range, a time range, or a composition range, all intermediate ranges and subranges, as well as all individual values included in the ranges given are intended to be included in the disclosure. Additionally, the end points in a given range are to be included within the range. In the disclosure and the claims, “and/or” means additionally or alternatively. Moreover, any use of a term in the singular also encompasses plural forms.
- As used herein, “comprising” is synonymous with “including,” “containing,” or “characterized by,” and is inclusive or open-ended and does not exclude additional, unrecited elements or method steps. As used herein, “consisting of” excludes any element, step, or ingredient not specified in the claim element. As used herein, “consisting essentially of” does not exclude materials or steps that do not materially affect the basic and novel characteristics of the claim. Any recitation herein of the term “comprising”, particularly in a description of components of a composition or in a description of elements of a device, is understood to encompass those compositions and methods consisting essentially of and consisting of the recited components or elements.
- One of ordinary skill in the art will appreciate that starting materials, device elements, analytical methods, mixtures and combinations of components other than those specifically exemplified can be employed in the practice of the invention without resort to undue experimentation. All art-known functional equivalents, of any such materials and methods are intended to be included in this invention. The terms and expressions which have been employed are used as terms of description and not of limitation, and there is no intention that in the use of such terms and expressions of excluding any equivalents of the features shown and described or portions thereof, but it is recognized that various modifications are possible within the scope of the invention claimed. The invention illustratively described herein suitably may be practiced in the absence of any element or elements, limitation or limitations which is not specifically disclosed herein. Headings are used herein for convenience only.
- All publications referred to herein are incorporated herein to the extent not inconsistent herewith. Some references provided herein are incorporated by reference to provide details of additional uses of the invention. All patents and publications mentioned in the specification are indicative of the levels of skill of those skilled in the art to which the invention pertains. References cited herein are incorporated by reference herein in their entirety to indicate the state of the art as of their filing date and it is intended that this information can be employed herein, if needed, to exclude specific embodiments that are in the prior art.
Claims (25)
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US15/988,566 US20180340941A1 (en) | 2017-05-25 | 2018-05-24 | Method to Map Protein Landscapes |
| US17/231,977 US12061204B2 (en) | 2017-05-25 | 2021-04-15 | Method to map protein landscapes |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US201762511011P | 2017-05-25 | 2017-05-25 | |
| US15/988,566 US20180340941A1 (en) | 2017-05-25 | 2018-05-24 | Method to Map Protein Landscapes |
Related Child Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US17/231,977 Continuation US12061204B2 (en) | 2017-05-25 | 2021-04-15 | Method to map protein landscapes |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| US20180340941A1 | 2018-11-29 |
Family
ID=64401141
Family Applications (2)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US15/988,566 Abandoned US20180340941A1 (en) | 2017-05-25 | 2018-05-24 | Method to Map Protein Landscapes |
| US17/231,977 Active 2038-06-14 US12061204B2 (en) | 2017-05-25 | 2021-04-15 | Method to map protein landscapes |
Country Status (1)
| Country | Link |
|---|---|
| US (2) | US20180340941A1 (en) |
Cited By (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN111579702A (en) * | 2019-02-18 | 2020-08-25 | 上海美吉生物医药科技有限公司 | Method for detecting coverage of protein amino acid sequence |
| CN112786105A (en) * | 2020-12-07 | 2021-05-11 | 中山大学附属第五医院 | Macroproteome mining method and application thereof in obtaining intestinal microbial proteolysis characteristics |
| CN114137124A (en) * | 2021-12-01 | 2022-03-04 | 北京中医药大学 | Method for performing rapid peptide map analysis on protein |
| WO2023118561A1 (en) * | 2021-12-23 | 2023-06-29 | F.Hoffmann-La Roche Ag | Method of extracting information about protein sequence modifications |
| US12061204B2 (en) | 2017-05-25 | 2024-08-13 | Wisconsin Alumni Research Foundation | Method to map protein landscapes |
Families Citing this family (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2024238393A1 (en) * | 2023-05-10 | 2024-11-21 | Northwestern University | Proteoform imaging and characterization mass spectrometry |
Family Cites Families (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20060255258A1 (en) | 2005-04-11 | 2006-11-16 | Yongdong Wang | Chromatographic and mass spectral date analysis |
| TW201239355A (en) | 2011-03-23 | 2012-10-01 | Abbott Lab | Methods and systems for the analysis of protein samples |
| US20180340941A1 (en) | 2017-05-25 | 2018-11-29 | Wisconsin Alumni Research Foundation | Method to Map Protein Landscapes |
Also Published As
| Publication number | Publication date |
|---|---|
| US20210239708A1 (en) | 2021-08-05 |
| US12061204B2 (en) | 2024-08-13 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | AS | Assignment | Owner name: WISCONSIN ALUMNI RESEARCH FOUNDATION, WISCONSIN. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: MARX, HARALD; COON, JOSHUA; REEL/FRAME: 046180/0553. Effective date: 20170620 |
| | AS | Assignment | Owner name: NATIONAL INSTITUTES OF HEALTH (NIH), U.S. DEPT. OF. Free format text: CONFIRMATORY LICENSE; ASSIGNOR: UNIVERSITY OF WISCONSIN-MADISON; REEL/FRAME: 046419/0386. Effective date: 20180622 |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: APPLICATION DISPATCHED FROM PREEXAM, NOT YET DOCKETED |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION MAILED |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: FINAL REJECTION MAILED |
| | STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |