research-article

New trends on exploratory methods for data analytics

Published: 01 August 2017

Abstract

Data usually comes in a plethora of formats and dimensions, which makes exploration and information extraction cumbersome. Being able to cast exploratory queries over the data, with the intent of getting an immediate glimpse of some of its properties, is therefore becoming crucial. An exploratory query should be simple enough to avoid complicated declarative languages (such as SQL) and mechanisms, while retaining the flexibility and expressiveness of such languages. Recently, we have witnessed a rediscovery of the so-called example-based methods, in which the user or analyst circumvents query languages by using examples as input. An example is a representative of the intended results or, in other words, an item from the result set. Example-based methods exploit inherent characteristics of the data to infer the results that the user has in mind but may not be able to (easily) express. They can be useful both when a user is looking for information in an unfamiliar dataset and when she is simply exploring the data without knowing what to find in it. In this tutorial, we present an excursus over the main methods for exploratory analysis, with a particular focus on example-based methods. We show how different data types require different techniques, and present algorithms that are specifically designed for relational, textual, and graph data.
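To make the idea of an example-based query concrete, the following is a minimal sketch for relational data. It is not one of the algorithms covered in the tutorial: it assumes a deliberately naive inference rule (keep only the attribute values that all examples share), and the names `infer_filter`, `query_by_example`, and the `movies` table are hypothetical, introduced purely for illustration.

```python
from typing import Any

Row = dict[str, Any]


def infer_filter(examples: list[Row]) -> dict[str, Any]:
    """Return the attribute/value pairs shared by every example row.

    Assumption: the user's intent is captured by the values that all
    examples have in common. Real systems use far richer inference.
    """
    if not examples:
        return {}
    first, *rest = examples
    return {
        attr: value
        for attr, value in first.items()
        if all(attr in row and row[attr] == value for row in rest)
    }


def query_by_example(table: list[Row], examples: list[Row]) -> list[Row]:
    """Return every row of the table that matches the inferred filter."""
    shared = infer_filter(examples)
    if not shared:
        # Nothing in common: return no rows rather than the whole table.
        return []
    return [
        row
        for row in table
        if all(attr in row and row[attr] == value for attr, value in shared.items())
    ]


# Hypothetical toy table, used only to exercise the sketch.
movies: list[Row] = [
    {"title": "Alien", "director": "Ridley Scott", "genre": "sci-fi"},
    {"title": "Blade Runner", "director": "Ridley Scott", "genre": "sci-fi"},
    {"title": "Gladiator", "director": "Ridley Scott", "genre": "drama"},
    {"title": "Arrival", "director": "Denis Villeneuve", "genre": "sci-fi"},
]

# The user supplies two items from the result set they have in mind.
examples = [movies[0], movies[3]]
results = query_by_example(movies, examples)
```

Here the two examples share only `genre == "sci-fi"`, so the sketch returns the three science-fiction rows without the user writing any SQL. The user never states the predicate; it is inferred from the data, which is the point the abstract makes about results that the user has in mind but may not be able to express.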

Published In

Proceedings of the VLDB Endowment, Volume 10, Issue 12
August 2017
427 pages
ISSN: 2150-8097

Publisher

VLDB Endowment
