Probabilistic and Machine Learning Models for the Protein Scaffold Gap Filling Problem

  • Conference paper
Bioinformatics Research and Applications (ISBRA 2024)

Abstract

In de novo protein sequencing, top-down and bottom-up tandem mass spectrometry often yields only an incomplete protein sequence, called a scaffold. While most sections of a protein can be inferred from its homologous sequences, some sections are always missing, and it is hard to predict the missing amino acids in the gaps of the scaffold. In this paper, we therefore focus on predicting only the gaps, using a probabilistic algorithm and machine learning models, instead of predicting the complete protein sequence with generative AI models. We study two versions of the protein scaffold filling problem: one with gaps of known size and one with gaps of known mass. For the known-size version, we develop several machine learning models based on random forest, k-nearest neighbors, decision tree, and fully connected neural network. For the known-mass version, we design a probabilistic algorithm to predict the missing amino acids in the gaps. Experimental results on both real and simulated data show that the proposed algorithms achieve accuracies of 100% or close to 100%.
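To make the known-mass version concrete, the following is a minimal sketch and not the algorithm proposed in the paper, whose details are not given in this abstract. It assumes that a gap is described only by its total residue mass, enumerates the amino acid multisets matching that mass within a tolerance, and ranks them by a simple independence-based probability. The function names, the tolerance `tol`, and the frequency table `freq` (which might, for instance, be estimated from homologous sequences) are all hypothetical.

```python
from collections import Counter
from math import factorial, prod

# Standard monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def compositions_for_mass(gap_mass, tol=0.01):
    """Return amino acid multisets whose total mass is within tol of gap_mass.

    Each multiset is a string with residues in ascending mass order, so
    reorderings of the same residues are reported only once.
    """
    residues = sorted(RESIDUE_MASS, key=RESIDUE_MASS.get)
    found = []

    def extend(start, remaining, chosen):
        if abs(remaining) <= tol:
            found.append("".join(chosen))
            return
        for i in range(start, len(residues)):
            mass = RESIDUE_MASS[residues[i]]
            if mass > remaining + tol:
                break  # residues are sorted, so heavier ones cannot fit either
            chosen.append(residues[i])
            extend(i, remaining - mass, chosen)
            chosen.pop()

    extend(0, gap_mass, [])
    return found


def score(composition, freq):
    """Probability that i.i.d. draws from freq give this multiset (hypothetical model)."""
    counts = Counter(composition)
    arrangements = factorial(len(composition))
    for c in counts.values():
        arrangements //= factorial(c)
    return arrangements * prod(freq.get(r, 0.0) ** c for r, c in counts.items())


def rank_gap_fillings(gap_mass, freq, tol=0.01):
    """Rank candidate multisets for one gap, most probable first."""
    candidates = compositions_for_mass(gap_mass, tol)
    return sorted(candidates, key=lambda c: score(c, freq), reverse=True)
```

For a gap mass of 128.0586 Da this sketch returns two candidates, `GA` and `Q`, because glycine plus alanine and a single glutamine have nearly identical masses. Mass alone cannot separate such cases, so some prior over residues is needed to choose between them. Note that the sketch ranks compositions only; it does not decide the order of residues within a gap.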

References

  1. Aebersold, R., Mann, M.: Mass spectrometry-based proteomics. Nature 422(6928), 198–207 (2003)

  2. Bricas, E., Van Heijenoort, J., Barber, M., Wolstenholme, W., Das, B., Lederer, E.: Determination of amino acid sequences in oligopeptides by mass spectrometry. IV. Synthetic n-acyl oligopeptide methyl esters. Biochemistry 4(10), 2254–2260 (1965)

  3. Dupré, M., et al.: De novo sequencing of antibody light chain proteoforms from patients with multiple myeloma. Anal. Chem. 93(30), 10627–10634 (2021). PMID: 34292722. https://doi.org/10.1021/acs.analchem.1c01955

  4. Kinter, M., Sherman, N.E.: Protein Sequencing and Identification Using Tandem Mass Spectrometry. Wiley, Hoboken (2005)

  5. Liu, X., et al.: De novo protein sequencing by combining top-down and bottom-up tandem mass spectra. J. Proteome Res. 13(7), 3241–3248 (2014)

  6. National Center for Biotechnology Information: Blast (2023). https://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE=Proteins

  7. Qingge, L., Liu, X., Zhong, F., Zhu, B.: Filling a protein scaffold with a reference. IEEE Trans. Nanobiosci. 16(2), 123–130 (2017)

  8. Standing, K.G.: Peptide and protein de novo sequencing by mass spectrometry. Curr. Opin. Struct. Biol. 13(5), 595–601 (2003)

  9. Sturtz, J., Annan, R., Zhu, B., Liu, X., Qingge, L.: A convolutional denoising autoencoder for protein scaffold filling. In: Guo, X., Mangul, S., Patterson, M., Zelikovsky, A. (eds.) Bioinformatics Research and Applications, ISBRA 2023. LNCS, vol. 14248, pp. 518–529. Springer, Singapore (2023). https://doi.org/10.1007/978-981-99-7074-2_42

  10. Sturtz, J., Zhu, B., Liu, X., Fu, X., Yuan, X., Qingge, L.: Deep learning approaches for the protein scaffold filling problem. In: 2022 IEEE 34th International Conference on Tools with Artificial Intelligence (ICTAI), pp. 1055–1061. IEEE (2022)

  11. Tran, N.H., Rahman, M.Z., He, L., Xin, L., Shan, B., Li, M.: Complete de novo assembly of monoclonal antibody sequences. Sci. Rep. 6(1), 1–10 (2016)

  12. Wulfson, N., et al.: Mass spectrometric determination of the amino (hydroxy) acid sequence in peptides and depsipeptides. Tetrahedron Lett. 6(32), 2805–2812 (1965)

Acknowledgements

This work is supported by the NSF of the United States under Awards 2307571, 2307572 and 2307573. We also thank the anonymous reviewers for their insightful comments and input.

Author information

Corresponding author

Correspondence to Binhai Zhu.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Badal, K., Qingge, L., Liu, X., Zhu, B. (2024). Probabilistic and Machine Learning Models for the Protein Scaffold Gap Filling Problem. In: Peng, W., Cai, Z., Skums, P. (eds) Bioinformatics Research and Applications. ISBRA 2024. Lecture Notes in Computer Science, vol 14956. Springer, Singapore. https://doi.org/10.1007/978-981-97-5087-0_3

  • DOI: https://doi.org/10.1007/978-981-97-5087-0_3

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-97-5086-3

  • Online ISBN: 978-981-97-5087-0

  • eBook Packages: Computer Science, Computer Science (R0)
