DOI: 10.1145/2649387.2660823

AlignMR: mass spectrometry peak alignment using Hadoop MapReduce

Published: 20 September 2014

Abstract

Proteomics is the study of the structure and behavior of proteins, and one of the primary approaches to protein identification and quantification is the analysis of Mass Spectrometry (MS) data. This analysis typically involves a series of computational steps, and the Purdue University Bindley Bioscience Center employs a computational workflow system, the Omics Discovery Pipeline (ODP), to assist in its analysis of MS data. One of the ODP's stages aligns the peaks in the MS data across multiple subjects. Because a study may involve a large number of subjects, and each subject's MS data contains a large number of peaks, the alignment step qualifies as a data-intensive computation. This research focuses on using Apache Hadoop MapReduce to align the processed MS data faster than the serial approach currently used in the ODP.

Cited By

  • (2015) Toward an analytical framework for proteomics software. Proceedings of the 6th ACM Conference on Bioinformatics, Computational Biology and Health Informatics, 582-588. DOI: 10.1145/2808719.2812221. Online publication date: 9-Sep-2015

Published In

BCB '14: Proceedings of the 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics
September 2014
851 pages
ISBN: 9781450328944
DOI: 10.1145/2649387
General Chairs: Pierre Baldi, Wei Wang
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Hadoop
  2. MapReduce
  3. big data
  4. proteomics

Qualifiers

  • Research-article

Conference

BCB '14: ACM-BCB '14
September 20 - 23, 2014
Newport Beach, California

Acceptance Rates

Overall Acceptance Rate 254 of 885 submissions, 29%
