
Discovering denial constraints

Published: 01 August 2013

Abstract

Integrity constraints (ICs) provide a valuable tool for enforcing correct application semantics. However, designing ICs requires experts and time. Proposals for automatic discovery have been made for some formalisms, such as functional dependencies and their extension, conditional functional dependencies. Unfortunately, these dependencies cannot express many common business rules. For example, an American citizen cannot have a lower salary and a higher tax rate than another citizen in the same state. In this paper, we tackle the challenges of discovering dependencies in a more expressive integrity constraint language, namely Denial Constraints (DCs). DCs are expressive enough to overcome the limits of previous languages and, at the same time, have enough structure to allow efficient discovery and application in several scenarios. We lay out theoretical and practical foundations for DCs, including a set of sound inference rules and a linear algorithm for implication testing. We then develop an efficient instance-driven DC discovery algorithm and propose a novel scoring function to rank DCs for user validation. Using real-world and synthetic datasets, we experimentally evaluate the scalability and effectiveness of our solution.
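To make the salary and tax example concrete, here is a minimal sketch, assuming the common reading of a DC as a rule that no ordered pair of tuples (t_α, t_β) may satisfy all predicates of a conjunction of comparisons. Under that reading, the tax rule becomes ¬(t_α.ST = t_β.ST ∧ t_α.SAL < t_β.SAL ∧ t_α.TR > t_β.TR), and a functional dependency X → Y fits the same shape as ¬(t_α.X = t_β.X ∧ t_α.Y ≠ t_β.Y). The attribute names `ST`, `SAL`, `TR`, and `Zip`, the helper functions `violates` and `find_violations`, and the sample rows are hypothetical illustrations, not the authors' code. The naive pairwise scan only shows what a violation means; it is not the discovery algorithm described in the abstract.

```python
import operator
from itertools import permutations
from typing import Any, Callable, Dict, List, Tuple

# A predicate compares attribute `left` of t_alpha with attribute `right`
# of t_beta using a binary comparison such as operator.lt.
Predicate = Tuple[str, Callable[[Any, Any], bool], str]
Row = Dict[str, Any]


def violates(dc: List[Predicate], t_alpha: Row, t_beta: Row) -> bool:
    """True if every predicate of the negated conjunction holds on the pair."""
    return all(op(t_alpha[left], t_beta[right]) for left, op, right in dc)


def find_violations(dc: List[Predicate], rows: List[Row]) -> List[Tuple[int, int]]:
    """Naive O(n^2) scan over all ordered pairs of distinct tuples."""
    return [
        (i, j)
        for (i, t_alpha), (j, t_beta) in permutations(enumerate(rows), 2)
        if violates(dc, t_alpha, t_beta)
    ]


# Hypothetical schema: ST = state, SAL = salary, TR = tax rate, Zip = zip code.
# Tax rule: not(t_a.ST = t_b.ST and t_a.SAL < t_b.SAL and t_a.TR > t_b.TR)
tax_dc = [
    ("ST", operator.eq, "ST"),
    ("SAL", operator.lt, "SAL"),
    ("TR", operator.gt, "TR"),
]

# FD Zip -> ST written as a DC: not(t_a.Zip = t_b.Zip and t_a.ST != t_b.ST)
fd_dc = [
    ("Zip", operator.eq, "Zip"),
    ("ST", operator.ne, "ST"),
]

rows = [
    {"Zip": "10001", "ST": "NY", "SAL": 50000, "TR": 0.20},
    {"Zip": "10001", "ST": "NY", "SAL": 60000, "TR": 0.15},  # earns more than row 0, taxed less
    {"Zip": "90001", "ST": "CA", "SAL": 40000, "TR": 0.30},
    {"Zip": "90001", "ST": "NV", "SAL": 45000, "TR": 0.25},  # same zip as row 2, other state
]

print(find_violations(tax_dc, rows))  # [(0, 1)]
print(find_violations(fd_dc, rows))   # [(2, 3), (3, 2)]
```

The tax rule needs order comparisons (`<`, `>`) across two tuples, which an FD cannot state, while the FD-style rule needs only `=` and `!=`; this is the gap in expressiveness the abstract points to.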

Published In

Proceedings of the VLDB Endowment, Volume 6, Issue 13
August 2013
180 pages

Publisher

VLDB Endowment

Publication History

Published in PVLDB Volume 6, Issue 13
