Research article
DOI: 10.1145/2723372.2731084

Overview of Data Exploration Techniques

Published: 27 May 2015

Abstract

Data exploration is about efficiently extracting knowledge from data even when we do not know exactly what we are looking for. In this tutorial, we survey recent developments in the emerging area of database systems tailored for data exploration. We discuss new ideas on how to store and access data, as well as on how to interact with a data system so that users and applications can quickly figure out which parts of the data are of interest. In addition, we discuss how to exploit lessons learned from past research, along with the new challenges that data exploration poses, emerging applications, and future research directions.
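As one concrete illustration of the "how to store and access data" thread, the sketch below shows the core idea of database cracking (adaptive indexing). Instead of building a full index up front, each range query physically partitions ("cracks") a copy of the column around its bounds as a side effect. Later queries then only need to reorganize the smaller pieces between existing cracks. This is a minimal sketch under stated assumptions: one in-memory integer column, half-open range predicates `[low, high)`, a single thread, and no updates. The class `CrackedColumn` and its methods are hypothetical names chosen for illustration. It is not the implementation of any particular system.

```python
import bisect
from typing import List


class CrackedColumn:
    """Hypothetical minimal cracker column: a copy of the base column that is
    partitioned incrementally by the range predicates of incoming queries."""

    def __init__(self, values: List[int]) -> None:
        self.data = list(values)          # cracker column, reorganized in place
        self.pivots: List[int] = []       # sorted values the column was cracked on
        self.positions: List[int] = []    # positions[i] = first index with data >= pivots[i]

    def _crack(self, pivot: int) -> int:
        """Ensure values < pivot precede values >= pivot; return the boundary index."""
        i = bisect.bisect_left(self.pivots, pivot)
        if i < len(self.pivots) and self.pivots[i] == pivot:
            return self.positions[i]      # already cracked on this value
        # Only the piece between the neighbouring cracks can contain misplaced values.
        lo = self.positions[i - 1] if i > 0 else 0
        hi = self.positions[i] if i < len(self.positions) else len(self.data)
        # Two-pointer, in-place partition of data[lo:hi] around pivot.
        left, right = lo, hi - 1
        while left <= right:
            if self.data[left] < pivot:
                left += 1
            else:
                self.data[left], self.data[right] = self.data[right], self.data[left]
                right -= 1
        # Record the new crack so future queries can reuse it.
        self.pivots.insert(i, pivot)
        self.positions.insert(i, left)
        return left

    def range_query(self, low: int, high: int) -> List[int]:
        """Return all values v with low <= v < high, cracking on both bounds."""
        start = self._crack(low)
        end = self._crack(high)
        return self.data[start:end]       # empty when high <= low
```

For example, after `col.range_query(5, 10)`, a second query such as `col.range_query(3, 12)` partitions only the piece below the crack at 5 and the piece at or above the crack at 10. The reorganization cost is thus spread across the queries that actually touch the data rather than paid up front. Returned values come back in column order, not sorted order.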

Published In

SIGMOD '15: Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data
May 2015
2110 pages
ISBN:9781450327589
DOI:10.1145/2723372
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tag

  1. data exploration

Conference

SIGMOD/PODS'15: International Conference on Management of Data
May 31 - June 4, 2015
Melbourne, Victoria, Australia

Acceptance Rates

SIGMOD '15 paper acceptance rate: 106 of 415 submissions (26%)
Overall acceptance rate: 785 of 4,003 submissions (20%)

Article Metrics

  • Downloads (last 12 months): 245
  • Downloads (last 6 weeks): 41
Reflects downloads up to 23 Dec 2024.

Cited By

  • (2024) Multidimensional Perspective to Data Preprocessing for Model Cognition Verity. Recent Trends and Future Direction for Data Analytics, pages 15--57. DOI: 10.4018/979-8-3693-3609-0.ch002. Online publication date: 12-Jul-2024.
  • (2024) Hidden in Plain Sight: A Data-Driven Approach to Safety Risk Management for Highway Traffic Officers. Buildings, 14(11):3509. DOI: 10.3390/buildings14113509. Online publication date: 2-Nov-2024.
  • (2024) SeLeP: Learning Based Semantic Prefetching for Exploratory Database Workloads. Proceedings of the VLDB Endowment, 17(8):2064--2076. DOI: 10.14778/3659437.3659458. Online publication date: 31-May-2024.
  • (2024) Cocoon: Semantic Table Profiling Using Large Language Models. Proceedings of the 2024 Workshop on Human-In-the-Loop Data Analytics, pages 1--7. DOI: 10.1145/3665939.3665957. Online publication date: 14-Jun-2024.
  • (2024) Cognitive Psychology Meets Data Management: State of the Art and Future Directions. Companion of the 2024 International Conference on Management of Data, pages 590--596. DOI: 10.1145/3626246.3654682. Online publication date: 9-Jun-2024.
  • (2024) HiRegEx: Interactive Visual Query and Exploration of Multivariate Hierarchical Data. IEEE Transactions on Visualization and Computer Graphics, 31(1):699--709. DOI: 10.1109/TVCG.2024.3456389. Online publication date: 10-Sep-2024.
  • (2024) Comparison Queries Generation Using Mathematical Programming for Exploratory Data Analysis. IEEE Transactions on Knowledge and Data Engineering, 36(12):7792--7804. DOI: 10.1109/TKDE.2024.3474828. Online publication date: Dec-2024.
  • (2024) ScaleViz: Scaling Visualization Recommendation Models on Large Data. Advances in Knowledge Discovery and Data Mining, pages 93--104. DOI: 10.1007/978-981-97-2262-4_8. Online publication date: 25-Apr-2024.
  • (2024) Fundamentals of Data Analytics and Lifecycle. Data Analytics and Machine Learning, pages 19--37. DOI: 10.1007/978-981-97-0448-4_2. Online publication date: 20-Mar-2024.
  • (2024) Interestingness Measures for Exploratory Data Analysis: a Survey. New Trends in Database and Information Systems, pages 14--24. DOI: 10.1007/978-3-031-70421-5_2. Online publication date: 14-Nov-2024.