DOI: 10.1145/2882903.2899388

QFix: Demonstrating Error Diagnosis in Query Histories

Published: 26 June 2016

Abstract

An increasing number of applications in all aspects of society rely on data. Despite the long line of research in data cleaning and repairs, data correctness has been an elusive goal. Errors in the data can be extremely disruptive, and are detrimental to the effectiveness and proper function of data-driven applications. Even when data is cleaned, new errors can be introduced by applications and users who interact with the data. Subsequent valid updates can obscure these errors and propagate them through the dataset causing more discrepancies. Any discovered errors tend to be corrected superficially, on a case-by-case basis, further obscuring the true underlying cause, and making detection of the remaining errors harder. In this demo proposal, we outline the design of QFix, a query-centric framework that derives explanations and repairs for discrepancies in relational data based on potential errors in the queries that operated on the data. This is a marked departure from traditional data-centric techniques that directly fix the data. We then describe how users will use QFix in a demonstration scenario. Participants will be able to select from a number of transactional benchmarks, introduce errors into the queries that are executed, and compare the fixes to the queries proposed by QFix as well as existing alternative algorithms such as decision trees.
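To make the query-centric idea concrete, here is a minimal sketch, assuming a toy tax table and a two-query log. It is not QFix's actual algorithm (the author tags suggest a mixed-integer linear programming formulation); it is a brute-force replay search. The `Update` form, the `replay` and `find_repair` helpers, and all constants are hypothetical and chosen only for illustration. The search looks for a single changed `WHERE` threshold that resolves the user's complaints while leaving every unreported tuple as it is.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Update:
    """Hypothetical query form: UPDATE t SET owed = income * pct / 100 WHERE income >= threshold."""
    threshold: int
    pct: int


def replay(rows, log):
    """Apply every query in the log, in order, to a copy of the initial rows."""
    state = {rid: dict(r) for rid, r in rows.items()}
    for q in log:
        for r in state.values():
            if r["income"] >= q.threshold:
                r["owed"] = r["income"] * q.pct // 100
    return state


def find_repair(rows, log, complaints):
    """Return (query index, repaired query) for the first single-threshold change
    that yields the complained-about values and keeps all other tuples unchanged.
    Assumes tuples without a complaint are correct. Returns None if nothing fits."""
    if not complaints:
        return None
    dirty = replay(rows, log)
    target = {rid: complaints.get(rid, r["owed"]) for rid, r in dirty.items()}
    incomes = {r["income"] for r in rows.values()}
    # Only thresholds at or just above an existing income can change which rows match.
    candidates = sorted(incomes | {v + 1 for v in incomes})
    for i, q in enumerate(log):
        for t in candidates:
            if t == q.threshold:
                continue
            trial = log[:i] + [replace(q, threshold=t)] + log[i + 1:]
            result = replay(rows, trial)
            if all(result[rid]["owed"] == owed for rid, owed in target.items()):
                return i, trial[i]
    return None


# Toy data: the second query's threshold is assumed to be a typo (85700 for 87500).
ROWS = {
    1: {"income": 9500, "owed": 0},
    2: {"income": 86000, "owed": 0},
    3: {"income": 90000, "owed": 0},
}
LOG = [Update(threshold=0, pct=25), Update(threshold=85700, pct=30)]

if __name__ == "__main__":
    # The user reports that tuple 2 should owe 21500 rather than 25800.
    print(find_repair(ROWS, LOG, {2: 21500}))
```

The repair found here moves the second query's threshold to 86001, which is consistent with the complaint but is not necessarily the value the query author intended: several repairs can explain the same discrepancy, and more complaints narrow the choice. Exhaustive replay also scales poorly with log length, which is one reason a constraint-based formulation is attractive.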

Published In

SIGMOD '16: Proceedings of the 2016 International Conference on Management of Data
June 2016
2300 pages
ISBN: 9781450335317
DOI: 10.1145/2882903

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. data cleaning
  2. mixed-integer linear programming
  3. query provenance

Qualifiers

  • Research-article

Conference

SIGMOD/PODS'16: International Conference on Management of Data
June 26 - July 1, 2016
San Francisco, California, USA

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Cited By

  • (2021) SOSRepair: Expressive Semantic Search for Real-World Program Repair. IEEE Transactions on Software Engineering 47:10 (2162-2181). DOI: 10.1109/TSE.2019.2944914. Online publication date: 1-Oct-2021
  • (2020) Debugging Database Queries: A Survey of Tools, Techniques, and Users. Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems (1-16). DOI: 10.1145/3313831.3376485. Online publication date: 21-Apr-2020
  • (2019) Automatically generating precise Oracles from structured natural language specifications. Proceedings of the 41st International Conference on Software Engineering (188-199). DOI: 10.1109/ICSE.2019.00035. Online publication date: 25-May-2019
  • (2018) Software fairness. Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (754-759). DOI: 10.1145/3236024.3264838. Online publication date: 26-Oct-2018
  • (2018) Themis: automatically testing software for discrimination. Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (871-875). DOI: 10.1145/3236024.3264590. Online publication date: 26-Oct-2018
  • (2017) Fairness testing: testing software for discrimination. Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering (498-510). DOI: 10.1145/3106237.3106277. Online publication date: 21-Aug-2017
