Real-time peptide identification from high-throughput mass-spectrometry data

BCB '21: Proceedings of the 12th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics

Article No.: 86, Page 1

https://doi.org/10.1145/3459930.3470856

Published: 01 August 2021

Abstract

Peptide deduction remains one of the most challenging research problems in the large-scale study of proteomes using high-throughput mass spectrometers. The identification of a large number of proteins from complex biological samples can be carried out in two steps: 1) tryptic digestion of the protein sample to isolate its constituent peptides, followed by generation of MS/MS data using high-throughput mass spectrometers; and 2) once the data is generated, comparison of the mass-spectrometry data against a repository of known peptides using methods such as database-search tools. Advances in MS instrumentation now allow the generation of high-resolution data at massive volume and velocity, making traditional MS-based algorithms a bottleneck in the overall workflow. The new generation of state-of-the-art database-search tools is capable of producing high-quality matches with an impressively low false discovery rate (FDR); however, a search usually takes between a few weeks and a few months, depending on the size of the database and the search parameters. To accelerate search times, several studies have targeted this computational bottleneck by exploiting specialized hardware architectures, including HPC compute clusters and GPUs. Even with these accelerated pipelines, true real-time processing and deduction of peptides from MS data is far from realization. One bottleneck preventing true real-time processing of MS data is the cost of communication in existing workflows: moving data from storage to computational nodes and across the hierarchies of system memory dominates the overall search process in MS data analysis. Therefore, techniques that minimize the communication cost by enabling the search process to execute near the source of data generation are highly desirable.
In particular, a specialized computer architecture that uses FPGAs to process high-resolution MS data as soon as it is generated by a mass spectrometer can alleviate the latency involved in data storage and movement. FPGA-based designs can exploit the inherent data parallelism and minimize communication overhead by using a custom pipeline design aimed at reducing the number of main-memory accesses. In this paper, we propose to design and develop an FPGA-based hardware accelerator. Our design consists of asynchronous parallel processing elements that implement efficient dataflow operations using configurable data caching, a contention-aware bus arbiter, and double buffering. Our results show a 600x reduction in the average number of DRAM accesses and an average 24x speed-up in the overall computation compared with a CPU. These results were obtained by processing publicly available MS data; real-time performance can be achieved if the search operations are moved close to the source of data generation. In this regard, a streaming, network-based hardware accelerator that reads raw data directly from the mass spectrometer, processes the MS data in real time in a streaming fashion, and produces peptide deductions can greatly enhance the scale of proteomics.
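
The two-step workflow described in the abstract (tryptic digestion, then comparison of MS/MS data against known peptides) can be illustrated with a minimal software sketch. This is not the authors' FPGA design or any particular search tool. The function names, the shared-peak-count score, and the 0.02 Da tolerance are assumptions chosen for illustration, and the sketch omits precursor-mass filtering, modifications, and missed cleavages that practical database-search tools handle.

```python
# Illustrative sketch only: in-silico tryptic digestion plus a naive
# shared-peak-count database search. All names here are hypothetical.

# Standard monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # mass of H2O added to y ions
PROTON = 1.00728   # mass of the charge-carrying proton


def tryptic_digest(protein):
    """Cleave after K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        nxt = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and nxt != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def fragment_mzs(peptide):
    """Singly charged b- and y-ion m/z values for a peptide."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    ions = []
    for i in range(1, len(masses)):
        ions.append(sum(masses[:i]) + PROTON)          # b ion (prefix)
        ions.append(sum(masses[i:]) + WATER + PROTON)  # y ion (suffix)
    return sorted(ions)


def shared_peak_count(observed, theoretical, tol=0.02):
    """Number of theoretical peaks with an observed peak within tol Da."""
    return sum(any(abs(o - t) <= tol for o in observed) for t in theoretical)


def search(observed, proteins, tol=0.02):
    """Return the candidate peptide whose fragments best match the spectrum."""
    candidates = sorted({p for prot in proteins for p in tryptic_digest(prot)})
    return max(
        candidates,
        key=lambda pep: shared_peak_count(observed, fragment_mzs(pep), tol),
    )
```

Every spectrum is scored against every candidate peptide here, so each comparison touches the whole peptide list. That access pattern is the kind of data movement the abstract identifies as the dominant cost, and it is what caching and double buffering in a hardware pipeline aim to hide.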

Published In

BCB '21: Proceedings of the 12th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics

August 2021

603 pages

ISBN: 9781450384506

DOI: 10.1145/3459930

General Chairs:
Hongmei Jiang (Northwestern University),
Xiuzhen Huang (Arkansas State University),
Jiajie Zhang (The University of Texas Health Science Center at Houston)

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Abstract

Conference

BCB '21

Sponsor:

SIGBIOM

BCB '21: 12th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics

August 1 - 4, 2021

Gainesville, Florida

Acceptance Rates

Overall Acceptance Rate 254 of 885 submissions, 29%
