DOI: 10.1145/2666158.2666175
research-article

GOLAM: A Framework for Analyzing Genomic Data

Published: 07 November 2014

Abstract

Emerging medical models aim to leverage high-throughput genome sequencing technologies to better target drugs to patients' personal profiles and thereby increase drug effectiveness. However, the huge amount of data these technologies make available calls for sophisticated, automated analysis techniques. To this end we present GOLAM, a framework for OLAP analysis and mining of matches between genomic regions extracted from ENCODE, a collection of shared genomic data available worldwide. The goal of GOLAM is to overcome the limitations of current genome analysis methods, which are normally based on browsing. It does so by partially automating and speeding up the analysis process on the one hand, and by making it more flexible and introducing a multi-resolution view of the data on the other. The framework has been only partially implemented so far; in this paper we focus on conveying its potential and on describing its functional architecture and the underlying data models.
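
To make the idea of a multi-resolution view over region matches more concrete, the following is a minimal sketch in Python. It is not GOLAM's implementation, and the abstract does not specify the framework's data model: the `Region` type, the overlap-based notion of a "match", the `roll_up` helper, and the sample coordinates are all assumptions introduced purely for illustration.

```python
from collections import Counter
from typing import NamedTuple


class Region(NamedTuple):
    """A hypothetical genomic region (half-open interval on one chromosome)."""
    chrom: str
    start: int  # inclusive
    end: int    # exclusive


def overlaps(a: Region, b: Region) -> bool:
    """Two regions 'match' here if they share at least one base (an assumption)."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def find_matches(left: list[Region], right: list[Region]) -> list[tuple[Region, Region]]:
    """Naive all-pairs matching; fine for a toy example, quadratic in general."""
    return [(a, b) for a in left for b in right if overlaps(a, b)]


def roll_up(matches: list[tuple[Region, Region]], bin_size: int) -> dict[tuple[str, int], int]:
    """Count matches per (chromosome, bin), binning on the left region's start.

    A larger bin_size gives a coarser view, loosely analogous to an OLAP roll-up.
    """
    counts: Counter = Counter()
    for a, _ in matches:
        counts[(a.chrom, a.start // bin_size)] += 1
    return dict(counts)


# Invented sample data, not taken from ENCODE.
peaks = [Region("chr1", 100, 200), Region("chr1", 1500, 1600), Region("chr2", 50, 80)]
genes = [Region("chr1", 150, 400), Region("chr1", 1550, 1700), Region("chr2", 300, 400)]

matches = find_matches(peaks, genes)
fine = roll_up(matches, bin_size=1_000)     # finer resolution
coarse = roll_up(matches, bin_size=10_000)  # coarser resolution
print(fine)
print(coarse)
```

The same set of matches is aggregated twice with different bin sizes, which is the sense in which an analyst could move between resolutions without recomputing the matches themselves.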

Cited By

  • (2017) QETL: An approach to on-demand ETL from non-owned data sources. Data & Knowledge Engineering, 112:17-37. DOI: 10.1016/j.datak.2017.09.002. Online publication date: Nov-2017
  • (2016) G-quadruplex Structure Prediction and integration in the GenData2020 data model. Proceedings of the 7th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics, pages 663-670. DOI: 10.1145/2975167.2985692. Online publication date: 2-Oct-2016

Published In

DOLAP '14: Proceedings of the 17th International Workshop on Data Warehousing and OLAP
November 2014
110 pages
ISBN:9781450309998
DOI:10.1145/2666158
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. data warehouse
  2. genomics
  3. on-line analytical mining

Qualifiers

  • Research-article

Conference

CIKM '14
Acceptance Rates

DOLAP '14 Paper Acceptance Rate 8 of 22 submissions, 36%;
Overall Acceptance Rate 29 of 79 submissions, 37%
