DOI: 10.1145/2618243.2618276
research-article

Toward efficient and reliable genome analysis using main-memory database systems

Published: 30 June 2014

Abstract

Improvements in DNA sequencing technologies make it possible to sequence complete human genomes in a short time and at acceptable cost. Hence, the vision of genome analysis as a standard procedure to support and improve medical treatment is coming within reach. In this vision paper, we describe important data-management challenges that have to be met to make this vision come true. Besides genome-analysis performance, data-management capabilities such as data provenance and data integrity become increasingly important to enable comprehensible and reliable genome analysis. We argue that these challenges can be met by using main-memory database technologies, which combine fast processing capabilities with extensive data-management capabilities. Finally, we discuss possibilities for integrating genome-analysis tasks into DBMSs and derive new research questions.

Cited By

  • (2024) Bridging Genomic Data and CRM. New Trends in Marketing and Consumer Science, pages 113-134. DOI: 10.4018/979-8-3693-2754-8.ch006. Online publication date: 14-Jun-2024
  • (2015) The Relational Way To Dam The Flood Of Genome Data. Proceedings of the 2015 ACM SIGMOD on PhD Symposium, pages 9-13. DOI: 10.1145/2744680.2744692. Online publication date: 31-May-2015

Published In

SSDBM '14: Proceedings of the 26th International Conference on Scientific and Statistical Database Management
June 2014
417 pages
ISBN: 9781450327220
DOI: 10.1145/2618243

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

SSDBM '14

Acceptance Rates

SSDBM '14 Paper Acceptance Rate 26 of 71 submissions, 37%;
Overall Acceptance Rate 56 of 146 submissions, 38%
