Abstract
The aim of top-down mass spectrometry-based proteoform identification and characterization is to find the best match between mass spectra and proteoforms, so the accuracy of the identification results is crucial. A protein with multiple primary structure alterations can give rise to many proteoforms, and the number of candidates grows combinatorially. Furthermore, no gold-standard reference set exists. Improving the accuracy of identification results therefore remains challenging. We propose PrSMBooster, a novel rescoring algorithm that takes an ensemble approach. It uses non-deep models such as XGBoost, decision trees, and SVM as weak learners to extract latent features from proteoform-spectrum matches (PrSMs), and then uses the deep learning model ResNeXt for the final rescoring. We applied PrSMBooster to 47 independent cross-species datasets. Compared with the identification algorithm TopPIC, PrSMBooster scores PrSMs more accurately, and in the vast majority of datasets the number of PrSMs identified at a 1% FDR increased. These findings indicate that PrSMBooster improves scoring accuracy, yields more identifications, and generalizes well.
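To make the "PrSMs at 1% FDR" comparison concrete, the following is a minimal sketch of how such a count could be computed, assuming a target-decoy strategy in which each match is labeled as a target or a decoy and a higher score is better. The abstract does not state how FDR is estimated, so the target-decoy estimate, the function name `count_prsms_at_fdr`, and the toy scores are all illustrative assumptions rather than the PrSMBooster implementation.

```python
from typing import List, Tuple


def count_prsms_at_fdr(matches: List[Tuple[float, bool]],
                       fdr_threshold: float = 0.01) -> int:
    """Count target matches accepted at the given FDR threshold.

    `matches` holds (score, is_decoy) pairs; a higher score is assumed
    to mean a better match. FDR at each rank is estimated as
    decoys / targets (a common target-decoy estimate, assumed here).
    """
    ranked = sorted(matches, key=lambda m: m[0], reverse=True)

    # Running FDR estimate at every rank position.
    targets = decoys = 0
    fdrs = []
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # q-value: the smallest FDR at this rank or any lower rank,
    # which makes acceptance monotone in the score.
    qvalues = [0.0] * len(ranked)
    best = float("inf")
    for i in range(len(ranked) - 1, -1, -1):
        best = min(best, fdrs[i])
        qvalues[i] = best

    return sum(1 for (_, is_decoy), q in zip(ranked, qvalues)
               if not is_decoy and q <= fdr_threshold)


# Toy data (hypothetical): the same four matches under two scorings.
original = [(10.0, False), (9.0, True), (8.0, False), (7.0, False)]
rescored = [(10.0, False), (9.0, False), (8.0, False), (1.0, True)]

print(count_prsms_at_fdr(original))  # decoy ranked second limits acceptance
print(count_prsms_at_fdr(rescored))  # decoy pushed to the bottom
```

In this toy case the rescored list accepts more target matches at the same threshold because the decoy has moved below all targets, which is the kind of gain a rescoring model aims for.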
Acknowledgments
This work was supported by the National Natural Science Foundation of China (grant no. 62372171), the Hunan Provincial Natural Science Foundation of China (grant no. 2023JJ30414), and the Scientific Research Fund of Hunan Provincial Education Department (no. 23A0100).
Ethics declarations
Disclosure of Interests
No competing interests.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Zhong, J., Yang, C., Yuan, M., Wang, S. (2024). PrSMBooster: Improving the Accuracy of Top-Down Proteoform Characterization Using Deep Learning Rescoring Models. In: Peng, W., Cai, Z., Skums, P. (eds) Bioinformatics Research and Applications. ISBRA 2024. Lecture Notes in Computer Science, vol 14956. Springer, Singapore. https://doi.org/10.1007/978-981-97-5087-0_10
DOI: https://doi.org/10.1007/978-981-97-5087-0_10
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-5086-3
Online ISBN: 978-981-97-5087-0
eBook Packages: Computer Science (R0)