
PrSMBooster: Improving the Accuracy of Top-Down Proteoform Characterization Using Deep Learning Rescoring Models

  • Conference paper
Bioinformatics Research and Applications (ISBRA 2024)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 14956)

Abstract

Top-down mass spectrometry-based proteoform identification and characterization aims to find the optimal alignment between mass spectra and proteoforms, so the accuracy of the identification results is crucial. Proteins with multiple primary structure alterations give rise to many proteoforms, and the number of candidates grows combinatorially. Furthermore, no gold-standard reference set exists, so improving identification accuracy remains challenging. We propose PrSMBooster, a novel rescoring algorithm that employs an ensemble approach. Non-deep models such as XGBoost, Decision Trees, and SVM serve as weak learners that extract latent features from proteoform spectrum matches (PrSMs), and the deep learning model ResNeXt performs the final rescoring. We applied PrSMBooster to 47 independent cross-species datasets. Compared with the identification algorithm TopPIC, PrSMBooster scores PrSMs more accurately, and in the vast majority of datasets the number of PrSMs identified at a 1% false discovery rate (FDR) increased. These findings indicate that PrSMBooster improves scoring accuracy, yields more identifications, and generalizes well.
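
The abstract does not say how the FDR is estimated. As a minimal sketch, assuming the common target-decoy strategy (an assumption, not confirmed by the abstract), the effect of rescoring on the number of PrSMs reported at 1% FDR could be measured as below. The function name `count_prsms_at_fdr` and the toy scores are hypothetical and are not results from the paper.

```python
from typing import List, Tuple


def count_prsms_at_fdr(
    prsms: List[Tuple[float, bool]], fdr_threshold: float = 0.01
) -> int:
    """Count target PrSMs accepted at the given FDR.

    Each PrSM is a (score, is_decoy) pair, where a higher score means a
    better match. Assumption: FDR at a score cutoff is estimated as
    (#decoys) / (#targets) among matches at or above that cutoff.
    """
    ranked = sorted(prsms, key=lambda p: p[0], reverse=True)
    targets = 0
    decoys = 0
    accepted = 0
    for _score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Remember the largest target count whose estimated FDR is acceptable.
        if targets > 0 and decoys / targets <= fdr_threshold:
            accepted = targets
    return accepted


# Toy data (illustrative only): the same five matches under two scorings.
original = [(9.0, False), (8.0, False), (7.0, True), (6.0, False), (5.0, False)]
rescored = [(9.5, False), (9.0, False), (8.5, False), (8.0, False), (2.0, True)]

print(count_prsms_at_fdr(original))  # 2
print(count_prsms_at_fdr(rescored))  # 4
```

In this toy case the rescoring pushes the decoy match below all target matches, so more target PrSMs pass the 1% cutoff. A real evaluation would use search-engine output rather than hand-written scores.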

Acknowledgments

This work was supported by the National Natural Science Foundation of China (grant no. 62372171), the Hunan Provincial Natural Science Foundation of China (grant no. 2023JJ30414), and the Scientific Research Fund of Hunan Provincial Education Department (No. 23A0100).

Author information

Corresponding author

Correspondence to Shaokai Wang.

Ethics declarations

Disclosure of Interests

No competing interests.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Zhong, J., Yang, C., Yuan, M., Wang, S. (2024). PrSMBooster: Improving the Accuracy of Top-Down Proteoform Characterization Using Deep Learning Rescoring Models. In: Peng, W., Cai, Z., Skums, P. (eds) Bioinformatics Research and Applications. ISBRA 2024. Lecture Notes in Computer Science, vol 14956. Springer, Singapore. https://doi.org/10.1007/978-981-97-5087-0_10

  • DOI: https://doi.org/10.1007/978-981-97-5087-0_10

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-97-5086-3

  • Online ISBN: 978-981-97-5087-0

  • eBook Packages: Computer Science, Computer Science (R0)
