DOI: 10.1145/3318464.3384695
short-paper

Interactively Discovering and Ranking Desired Tuples without Writing SQL Queries

Published: 31 May 2020

Abstract

The first step of many data analytics tasks is to find and (possibly) rank desired tuples, typically by writing SQL queries. This is feasible only for data experts who can write SQL and know the data very well. In practice, however, the queries can be complicated: for example, "find and rank good off-road cars based on a combination of Price, Make, Model, Age, Mileage, and so on" involves so much if-then-else, and, or, and not logic that even data experts cannot specify the SQL precisely. The data may also be unknown, as is common in data discovery, where one tries to discover desired data in a data lake. A system that helps users discover and rank desired tuples without writing SQL queries is therefore needed. We propose to demonstrate such a system, namely DExPlorer. To use DExPlorer for data exploration, the user only needs to interactively perform two simple operations over a set of system-provided tuples: (1) annotate which tuples are desired (i.e., true labels) or not (i.e., false labels), and (2) annotate whether one tuple is preferred over another (i.e., partial orders or ranked lists). We will show that DExPlorer can find the user's desired tuples and rank them in a few interactions, even for complicated queries.
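To make the two kinds of feedback concrete, the following is a minimal sketch of how true/false labels and pairwise preferences could drive a filter and a ranking. It is not DExPlorer's actual method, which the abstract does not describe. The 1-nearest-neighbour filter, the pairwise perceptron ranker, the `Feedback` class, the function names, and the toy car data are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

Row = tuple[float, ...]  # one tuple of numeric attributes (hypothetical encoding)


@dataclass
class Feedback:
    """User annotations collected during interaction (hypothetical structure)."""
    labels: dict[int, bool] = field(default_factory=dict)         # row index -> desired?
    prefers: list[tuple[int, int]] = field(default_factory=list)  # (better, worse) pairs


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def is_desired(rows: list[Row], fb: Feedback, i: int) -> bool:
    """Assumed stand-in filter: copy the label of the nearest labelled row."""
    def sq_dist(j: int) -> float:
        return sum((a - b) ** 2 for a, b in zip(rows[i], rows[j]))
    return fb.labels[min(fb.labels, key=sq_dist)]


def learn_weights(rows: list[Row], fb: Feedback, epochs: int = 50) -> list[float]:
    """Assumed stand-in ranker: pairwise perceptron over the preference pairs."""
    w = [0.0] * len(rows[0])
    for _ in range(epochs):
        for better, worse in fb.prefers:
            diff = [a - b for a, b in zip(rows[better], rows[worse])]
            if dot(w, diff) <= 0:  # pair is mis-ordered, nudge the weights
                w = [wi + di for wi, di in zip(w, diff)]
    return w


def discover_and_rank(rows: list[Row], fb: Feedback) -> list[int]:
    """Return indices of rows predicted as desired, best-scoring first."""
    w = learn_weights(rows, fb)
    keep = [i for i in range(len(rows)) if is_desired(rows, fb, i)]
    return sorted(keep, key=lambda i: -dot(w, rows[i]))


# Toy data: (price in $10k, age in years, mileage in 10k miles), all made up.
rows = [
    (2.0, 3.0, 4.0),
    (2.5, 2.0, 3.0),
    (6.0, 1.0, 1.0),
    (1.0, 12.0, 15.0),
    (2.2, 4.0, 5.0),
    (5.5, 2.0, 2.0),
]
fb = Feedback(
    labels={0: True, 2: False, 3: False},  # operation (1): true/false labels
    prefers=[(1, 0), (0, 4)],              # operation (2): pairwise preferences
)
ranking = discover_and_rank(rows, fb)
```

With this toy feedback, rows 0, 1, and 4 pass the filter, and the two preference pairs order them as 1, 0, 4. A real system would also need to choose which tuples to show the user next, which this sketch leaves out.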

Published In

SIGMOD '20: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data
June 2020
2925 pages
ISBN: 9781450367356
DOI: 10.1145/3318464
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. SQL query
  2. data exploration
  3. database
  4. rank

Qualifiers

  • Short-paper

Funding Sources

  • NSF of China
  • TAL Education Group
  • Huawei

Conference

SIGMOD/PODS '20
Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 19
  • Downloads (Last 6 weeks): 1
Reflects downloads up to 30 Aug 2024

Citations

Cited By

  • (2023) GoodCore: Data-effective and Data-efficient Machine Learning through Coreset Selection over Incomplete Data. Proceedings of the ACM on Management of Data 1:2, pp. 1-27. DOI: 10.1145/3589302. Online publication date: 20-Jun-2023.
  • (2023) VR-Explorer: A Demonstration of a Virtual Reality Interactive Data Exploration System. Companion Proceedings of the 2023 ACM International Conference on Supporting Group Work, pp. 15-17. DOI: 10.1145/3565967.3570976. Online publication date: 8-Jan-2023.
  • (2023) Demystifying Artificial Intelligence for Data Preparation. Companion of the 2023 International Conference on Management of Data, pp. 13-20. DOI: 10.1145/3555041.3589406. Online publication date: 4-Jun-2023.
  • (2023) Learn to Explore: on Bootstrapping Interactive Data Exploration with Meta-learning. 2023 IEEE 39th International Conference on Data Engineering (ICDE), pp. 1720-1733. DOI: 10.1109/ICDE55515.2023.00135. Online publication date: Apr-2023.
  • (2022) Selective data acquisition in the wild for model charging. Proceedings of the VLDB Endowment 15:7, pp. 1466-1478. DOI: 10.14778/3523210.3523223. Online publication date: 22-Jun-2022.
  • (2022) Data Management for Machine Learning: A Survey. IEEE Transactions on Knowledge and Data Engineering, pp. 1-1. DOI: 10.1109/TKDE.2022.3148237. Online publication date: 2022.
  • (2022) Synthesizing Privacy Preserving Entity Resolution Datasets. 2022 IEEE 38th International Conference on Data Engineering (ICDE), pp. 2359-2371. DOI: 10.1109/ICDE53745.2022.00222. Online publication date: May-2022.
