Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
10.1145/1514894.1514901acmotherconferencesArticle/Chapter ViewAbstractPublication PagesedbtConference Proceedingsconference-collections
research-article
Free access

On approximating optimum repairs for functional dependency violations

Published: 23 March 2009 Publication History

Abstract

We study the problem of repairing an inconsistent database that violates a set of functional dependencies by making the smallest possible value modifications. For an inconsistent database, we define an optimum repair as a database that satisfies the functional dependencies, and minimizes, among all repairs, a distance measure that depends on the number of corrections made in the database and the weights of tuples modified. We show that like other versions of the repair problem, checking the existence of a repair within a certain distance of a database is NP-complete. We also show that finding a constant-factor approximation for the optimum repair for any set of functional dependencies is NP-hard. Furthermore, there is a small constant and a set of functional dependencies, for which finding an approximate solution for the optimum repair within the factor of that constant is also NP-hard. Then we present an approximation algorithm that for a fixed set of functional dependencies and an arbitrary input inconsistent database, produces a repair whose distance to the database is within a constant factor of the optimum repair distance. We finally show how the approximation algorithm can be used in data cleaning using a recent extension to functional dependencies, called conditional functional dependencies.

References

[1]
Serge Abiteboul, Richard Hull, and Victor Vianu. Foundations of Databases. Addison-Wesley, 1995.
[2]
Serge Abiteboul, Paris C. Kanellakis, and Gösta Grahne. On the representation and querying of sets of possible worlds. Theor. Comput. Sci., 78(1):158--187, 1991.
[3]
Marcelo Arenas, Leopoldo E. Bertossi, and Jan Chomicki. Consistent query answers in inconsistent databases. In PODS, pages 68--79, 1999.
[4]
Marcelo Arenas, Leopoldo E. Bertossi, and Jan Chomicki. Answer sets for consistent query answering in inconsistent databases. TPLP, 3(4--5):393--424, 2003.
[5]
Marcelo Arenas, Leopoldo E. Bertossi, Jan Chomicki, Xin He, Vijay Raghavan, and Jeremy Spinrad. Scalar aggregation in inconsistent databases. Theor. Comput. Sci., 3(296):405--434, 2003.
[6]
Mihir Bellare, Shafi Goldwasser, Carsten Lund, and A. Russeli. Efficient probabilistically checkable proofs and applications to approximations. In STOC, pages 294--304, 1993.
[7]
Leopoldo E. Bertossi. Consistent query answering in databases. SIGMOD Record, 35(2):68--76, 2006.
[8]
Leopoldo E. Bertossi, Loreto Bravo, Enrico Franconi, and Andrei Lopatenko. The complexity and approximation of fixing numerical attributes in databases under integrity constraints. Inf. Syst., 33(4--5):407--434, 2008.
[9]
Philip Bohannon, Wenfei Fan, Floris Geerts, Xibei Jia, and Anastasios Kementsietsidis. Conditional functional dependencies for data cleaning. In ICDE, pages 746--755, 2007.
[10]
Philip Bohannon, Michael Flaster, Wenfei Fan, and Rajeev Rastogi. A cost-based model and effective heuristic for repairing constraints by value modification. In SIGMOD Conference, pages 143--154, 2005.
[11]
Jan Chomicki. Consistent query answering: Five easy pieces. In ICDT, pages 1--17, 2007.
[12]
Jan Chomicki and Jerzy Marcinkowski. Minimal-change integrity maintenance using tuple deletions. Inf. Comput., 197(1--2):90--121, 2005.
[13]
Gao Cong, Wenfei Fan, Floris Geerts, Xibei Jia, and Shuai Ma. Improving data quality: Consistency and accuracy. In VLDB, pages 315--326, 2007.
[14]
Dorit S. Hochbaum (Editor). Approximation Algorithms for NP-Hard Problems. PWS, 1997.
[15]
Wenfei Fan. Dependencies revisited for improving data quality. In PODS, pages 159--170, 2008.
[16]
Sergio Flesca, Filippo Furfaro, and Francesco Parisi. Consistent query answers on numerical databases under aggregate constraints. In DBPL, pages 279--294, 2005.
[17]
Enrico Franconi, Antonio Laureti Palma, Nicola Leone, Simona Perri, and Francesco Scarcello. Census data repair: a challenging application of disjunctive logic programming. In LPAR, pages 561--578, 2001.
[18]
Filippo Furfaro, Sergio Greco, and Cristian Molinaro. A three-valued semantics for querying and repairing inconsistent databases. Ann. Math. Artif. Intell., 51(2--4):167--193, 2007.
[19]
Ariel Fuxman, Elham Fazli, and Renée J. Miller. Conquer: Efficient management of inconsistent databases. In SIGMOD Conference, pages 155--166, 2005.
[20]
Ariel Fuxman and Renée J. Miller. First-order query rewriting for inconsistent databases. J. Comput. Syst. Sci., 73(4):610--635, 2007.
[21]
Solmaz Kolahi and Leonid Libkin. On redundancy vs dependency preservation in normalization: an information-theoretic study of 3nf. In PODS, pages 114--123, 2006.
[22]
Gabriel M. Kuper, Leonid Libkin, and Jan Paredaens, editors. Constraint Databases. Springer, 2000.
[23]
Andrei Lopatenko and Loreto Bravo. Efficient approximation algorithms for repairing inconsistent databases. In ICDE, pages 216--225, 2007.
[24]
Christos H. Papadimitriou and Mihalis Yannakakis. Optimization, approximation, and complexity classes. J. Comput. Syst. Sci., 43(3):425--440, 1991.
[25]
Vijay V. Vazirani. Approximation Algorithms. Springer, 2003.
[26]
Jef Wijsen. Condensed representation of database repairs for consistent query answering. In ICDT, pages 375--390, 2003.
[27]
Jef Wijsen. Database repairing using updates. ACM Trans. Database Syst., 30(3):722--768, 2005.
[28]
Jef Wijsen. Project-join-repair: An approach to consistent query answering under functional dependencies. In FQAS, pages 1--12, 2006.

Cited By

View all
  • (2024)Efficient Validation of SHACL Shapes with ReasoningProceedings of the VLDB Endowment10.14778/3681954.368202317:11(3589-3601)Online publication date: 1-Jul-2024
  • (2024)OTClean: Data Cleaning for Conditional Independence Violations using Optimal TransportProceedings of the ACM on Management of Data10.1145/36549632:3(1-26)Online publication date: 30-May-2024
  • (2024)Database Repairing with Soft Functional DependenciesACM Transactions on Database Systems10.1145/365115649:2(1-34)Online publication date: 10-Apr-2024
  • Show More Cited By

Index Terms

  1. On approximating optimum repairs for functional dependency violations

    Recommendations

    Comments

    Information & Contributors

    Information

    Published In

    cover image ACM Other conferences
    ICDT '09: Proceedings of the 12th International Conference on Database Theory
    March 2009
    334 pages
    ISBN:9781605584232
    DOI:10.1145/1514894
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 23 March 2009

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. approximation algorithm
    2. functional dependency violation
    3. inconsistent databases
    4. repair

    Qualifiers

    • Research-article

    Conference

    EDBT/ICDT '09
    EDBT/ICDT '09: EDBT/ICDT '09 joint conference
    March 23 - 25, 2009
    St. Petersburg, Russia

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)95
    • Downloads (Last 6 weeks)12
    Reflects downloads up to 30 Aug 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2024)Efficient Validation of SHACL Shapes with ReasoningProceedings of the VLDB Endowment10.14778/3681954.368202317:11(3589-3601)Online publication date: 1-Jul-2024
    • (2024)OTClean: Data Cleaning for Conditional Independence Violations using Optimal TransportProceedings of the ACM on Management of Data10.1145/36549632:3(1-26)Online publication date: 30-May-2024
    • (2024)Database Repairing with Soft Functional DependenciesACM Transactions on Database Systems10.1145/365115649:2(1-34)Online publication date: 10-Apr-2024
    • (2024)BClean: A Bayesian Data Cleaning System2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00263(3407-3420)Online publication date: 13-May-2024
    • (2024)A hybrid physics-corrected neural network for RUL prognosis under random missing dataExpert Systems with Applications10.1016/j.eswa.2024.124939(124939)Online publication date: Aug-2024
    • (2024)Computing Minimum Subset Repair on Incomplete DataWeb and Big Data10.1007/978-981-97-7238-4_28(444-459)Online publication date: 28-Aug-2024
    • (2024)Computing Maximal Likelihood Subset Repair for Inconsistent DataWeb and Big Data10.1007/978-981-97-2390-4_1(1-15)Online publication date: 28-Apr-2024
    • (2023)Time Series Data ValidityProceedings of the ACM on Management of Data10.1145/35889391:1(1-26)Online publication date: 30-May-2023
    • (2023)From Minimum Change to Maximum Density: On Determining Near-Optimal S-RepairIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2023.3294401(1-12)Online publication date: 2023
    • (2023)Sudowoodo: Contrastive Self-supervised Learning for Multi-purpose Data Integration and Preparation2023 IEEE 39th International Conference on Data Engineering (ICDE)10.1109/ICDE55515.2023.00391(1502-1515)Online publication date: Apr-2023
    • Show More Cited By

    View Options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Get Access

    Login options

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media