DOI: 10.1145/3459930.3470856

Real-time peptide identification from high-throughput mass-spectrometry data

Published: 01 August 2021

Abstract

Peptide deduction remains one of the most challenging research problems in the large-scale study of proteomes using high-throughput mass spectrometers. The identification of a large number of proteins from complex biological samples can be carried out in two steps: 1) tryptic digestion of the protein sample to isolate constituent peptides, followed by generation of MS/MS data using high-throughput mass spectrometers; 2) once the data is generated, comparison of the mass-spectrometry data against a repository of known peptides using methods such as database-search tools. Advances in MS instrumentation now allow generation of high-resolution data at massive volume and velocity, making traditional MS-based algorithms a bottleneck in the overall workflow. The new generation of state-of-the-art database-search tools is capable of producing high-quality matches with an impressively low false discovery rate (FDR); however, a search usually takes between a few weeks and a few months, depending on the size of the database and the search parameters. To accelerate overall search times, several studies have targeted this computational bottleneck by exploiting specialized hardware architectures, including HPC compute clusters and GPUs. Even with these accelerated pipelines, true real-time processing and deduction of peptides from MS data is far from realization. One bottleneck preventing the design of true real-time processing of MS-based data is the cost of communication in existing workflows: moving the data from storage to computational nodes and across hierarchies of system memory dominates the overall search process in MS data analysis. Therefore, techniques that minimize communication cost by enabling the computational search process to execute near the source of data generation are highly desirable.
In particular, specialized computer architectures built on FPGAs that process high-resolution MS data as soon as it is generated by a mass spectrometer can alleviate the latency involved in data storage and movement. FPGA-based designs can exploit the inherent data parallelism and minimize communication overhead by using a custom pipeline design aimed at reducing the number of main-memory accesses. In this paper, we propose the design and development of an FPGA-based hardware accelerator. Our design consists of asynchronous parallel processing elements that implement efficient dataflow operations using configurable data caching, a contention-aware bus arbiter, and double buffering. Our results show a 600x reduction in the average number of DRAM accesses and an average 24x speed-up in overall computation compared with a CPU. These results were obtained by processing publicly available MS data; real-time performance can be achieved if the search operations are moved close to the source of data generation. In this regard, a streaming network-based hardware accelerator that reads raw data directly from the mass spectrometer, processes the MS data in real time in a streaming fashion, and produces peptide deductions can greatly enhance the scale of proteomics.
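
The effect of data caching on main-memory traffic, as described in the abstract above, can be illustrated with a small software model. The following is a minimal Python sketch under stated assumptions, not the accelerator design from the paper: `CountingMemory`, the shared-peak scoring functions, and the toy integer-binned spectra are all hypothetical, and the resulting access counts bear no relation to the reported 600x figure. It only shows why keeping a query spectrum in a local buffer avoids re-reading it from memory for every candidate peptide.

```python
from dataclasses import dataclass


@dataclass
class CountingMemory:
    """Toy stand-in for off-chip DRAM that counts every word read (hypothetical)."""
    words: list
    reads: int = 0

    def read(self, index):
        self.reads += 1
        return self.words[index]


def shared_peaks_uncached(query_mem, query_len, candidates):
    """Score each candidate by re-reading the query spectrum from memory."""
    scores = []
    for cand in candidates:
        cand_set = set(cand)
        hits = 0
        for i in range(query_len):
            # Every comparison costs one memory read of the query peak.
            if query_mem.read(i) in cand_set:
                hits += 1
        scores.append(hits)
    return scores


def shared_peaks_cached(query_mem, query_len, candidates):
    """Load the query spectrum once into a local buffer, then reuse it."""
    local = [query_mem.read(i) for i in range(query_len)]
    scores = []
    for cand in candidates:
        cand_set = set(cand)
        scores.append(sum(1 for peak in local if peak in cand_set))
    return scores


# Hypothetical toy data: spectra as lists of integer m/z bins.
QUERY = [114, 175, 262, 333, 446, 561]
CANDIDATES = [
    [114, 175, 262, 390, 446],
    [101, 202, 303],
    [175, 333, 561, 700],
]


def run_demo():
    """Return (uncached scores, cached scores, uncached reads, cached reads)."""
    mem_a = CountingMemory(list(QUERY))
    mem_b = CountingMemory(list(QUERY))
    scores_a = shared_peaks_uncached(mem_a, len(QUERY), CANDIDATES)
    scores_b = shared_peaks_cached(mem_b, len(QUERY), CANDIDATES)
    return scores_a, scores_b, mem_a.reads, mem_b.reads


if __name__ == "__main__":
    print(run_demo())
```

In this toy model both variants produce the same scores, while the cached variant reads the query from memory once instead of once per candidate, so the saving grows with the number of candidates compared against each spectrum.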

Published In

BCB '21: Proceedings of the 12th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics
August 2021
603 pages
ISBN:9781450384506
DOI:10.1145/3459930
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. FPGA
  2. accelerator
  3. architecture
  4. mass-spectrometry
  5. peptide-identification
  6. proteomics
  7. real-time

Qualifiers

  • Abstract

Conference

BCB '21

Acceptance Rates

Overall Acceptance Rate 254 of 885 submissions, 29%
