DOI: 10.1145/3196959.3196980
Research article
Public Access

Computing Optimal Repairs for Functional Dependencies

Published: 27 May 2018

Abstract

We investigate the complexity of computing an optimal repair of an inconsistent database in the case where the integrity constraints are functional dependencies (FDs). We focus on two types of repairs: an optimal subset repair (optimal S-repair), which is obtained by a minimum number of tuple deletions, and an optimal update repair (optimal U-repair), which is obtained by a minimum number of value (cell) updates. For computing an optimal S-repair, we present a polynomial-time algorithm that succeeds on certain sets of FDs and fails on others. We prove the following about the algorithm. When it succeeds, it can also incorporate weighted tuples and duplicate tuples. When it fails, the problem is NP-hard and, in fact, APX-complete (hence, it cannot be approximated better than some constant factor). Thus, we establish a dichotomy in the complexity of computing an optimal S-repair. We present general analysis techniques for the complexity of computing an optimal U-repair, some based on the dichotomy for S-repairs. We also draw a connection to a past dichotomy in the complexity of finding a "most probable database" that satisfies a set of FDs with a single attribute on the left-hand side; the case of general FDs was left open, and we show how our dichotomy provides the missing generalization and thereby settles the open problem.
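To make the notion of an optimal S-repair concrete, here is a minimal Python sketch. It is not the authors' polynomial-time algorithm. It assumes a hypothetical representation in which a tuple is a dict from attribute names to values and an FD is a pair (lhs, rhs) of attribute-name tuples. `brute_force_s_repair` is an exponential-time reference that keeps a maximum-weight consistent subset (equivalently, it deletes a minimum-weight set of tuples, covering weighted and duplicate tuples). `single_fd_s_repair` handles only the easy case of one FD: within each group of tuples that agree on the left-hand side, it keeps the right-hand-side value with the largest total weight. All function names and the example relation are illustrative assumptions.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical representation (not from the paper): a tuple is a dict from
# attribute names to values, and an FD is a pair (lhs, rhs) of attribute tuples.


def violates(t1, t2, lhs, rhs):
    """True if t1 and t2 agree on every lhs attribute but differ on some rhs one."""
    return all(t1[a] == t2[a] for a in lhs) and any(t1[b] != t2[b] for b in rhs)


def is_consistent(tuples, fds):
    """True if no pair of tuples violates any FD."""
    return not any(
        violates(t1, t2, lhs, rhs)
        for t1, t2 in combinations(tuples, 2)
        for lhs, rhs in fds
    )


def brute_force_s_repair(tuples, weights, fds):
    """Exponential reference: indices of a maximum-weight consistent subset,
    i.e., an S-repair that deletes a minimum-weight set of tuples."""
    best_weight, best_keep = -1, ()
    for k in range(len(tuples), -1, -1):
        for keep in combinations(range(len(tuples)), k):
            w = sum(weights[i] for i in keep)
            if w > best_weight and is_consistent([tuples[i] for i in keep], fds):
                best_weight, best_keep = w, keep
    return list(best_keep)


def single_fd_s_repair(tuples, weights, lhs, rhs):
    """Special case for ONE FD lhs -> rhs: in each group of tuples that agree
    on lhs, keep the rhs value with the largest total weight."""
    groups = defaultdict(lambda: defaultdict(list))
    for i, t in enumerate(tuples):
        groups[tuple(t[a] for a in lhs)][tuple(t[b] for b in rhs)].append(i)
    keep = []
    for by_rhs in groups.values():
        keep += max(by_rhs.values(), key=lambda ids: sum(weights[i] for i in ids))
    return sorted(keep)


# Hypothetical example relation; tuples 2 and 3 are duplicates.
R = [
    {"emp": "ann", "dept": "R&D", "city": "Haifa"},
    {"emp": "ann", "dept": "Ops", "city": "Haifa"},
    {"emp": "bob", "dept": "Ops", "city": "Paris"},
    {"emp": "bob", "dept": "Ops", "city": "Paris"},
]
W = [1, 2, 1, 1]  # tuple weights (assumed nonnegative)
EMP_DEPT = (("emp",), ("dept",))
DEPT_CITY = (("dept",), ("city",))

if __name__ == "__main__":
    print(single_fd_s_repair(R, W, *EMP_DEPT))                # [1, 2, 3]
    print(brute_force_s_repair(R, W, [EMP_DEPT]))             # [1, 2, 3]
    print(brute_force_s_repair(R, W, [EMP_DEPT, DEPT_CITY]))  # [0, 2, 3]
```

With the single FD emp → dept, both functions keep tuples 1, 2 and 3. Adding dept → city makes tuple 1 also conflict with the two Paris tuples, and the exhaustive search instead keeps tuples 0, 2 and 3 (total weight 3), so the per-group rule for one FD does not carry over directly to sets of interacting FDs. The sketch makes no claim about which FD sets admit the paper's polynomial-time algorithm.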

Information

Published In

PODS '18: Proceedings of the 37th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems
May 2018
462 pages
ISBN:9781450347068
DOI:10.1145/3196959
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. approximation
  2. cardinality repairs
  3. database cleaning
  4. dichotomy
  5. functional dependencies
  6. inconsistent databases
  7. optimal repairs
  8. value repairs

Funding Sources

  • Israel Cyber Bureau
  • The Israel Science Foundation
  • Technion Hiroshi Fujiwara Cyber Security Research Center
  • NIH
  • NSF

Conference

SIGMOD/PODS '18

Bibliometrics

  • Downloads (last 12 months): 87
  • Downloads (last 6 weeks): 9
Reflects downloads up to 30 Aug 2024.

Cited By

  • (2023) Query-Guided Resolution in Uncertain Databases. Proceedings of the ACM on Management of Data, vol. 1, no. 2, pp. 1-27. DOI: 10.1145/3589325. Online publication date: 20-Jun-2023.
  • (2023) Time Series Data Validity. Proceedings of the ACM on Management of Data, vol. 1, no. 1, pp. 1-26. DOI: 10.1145/3588939. Online publication date: 30-May-2023.
  • (2023) From Minimum Change to Maximum Density: On Determining Near-Optimal S-Repair. IEEE Transactions on Knowledge and Data Engineering, pp. 1-12. DOI: 10.1109/TKDE.2023.3294401. Online publication date: 2023.
  • (2022) Toward interpretable and actionable data analysis with explanations and causality. Proceedings of the VLDB Endowment, vol. 15, no. 12, pp. 3812-3820. DOI: 10.14778/3554821.3554902. Online publication date: 1-Aug-2022.
  • (2022) On repairing timestamps for regular interval time series. Proceedings of the VLDB Endowment, vol. 15, no. 9, pp. 1848-1860. DOI: 10.14778/3538598.3538607. Online publication date: 27-Jul-2022.
  • (2021) Stream Data Cleaning under Speed and Acceleration Constraints. ACM Transactions on Database Systems, vol. 46, no. 3, pp. 1-44. DOI: 10.1145/3465740. Online publication date: 28-Sep-2021.
  • (2021) Explanations for Data Repair Through Shapley Values. Proceedings of the 30th ACM International Conference on Information & Knowledge Management, pp. 362-371. DOI: 10.1145/3459637.3482341. Online publication date: 26-Oct-2021.
  • (2021) From Minimum Change to Maximum Density: On S-Repair under Integrity Constraints. 2021 IEEE 37th International Conference on Data Engineering (ICDE), pp. 1943-1948. DOI: 10.1109/ICDE51399.2021.00181. Online publication date: Apr-2021.
  • (2020) Aggregated deletion propagation for counting conjunctive query answers. Proceedings of the VLDB Endowment, vol. 14, no. 2, pp. 228-240. DOI: 10.14778/3425879.3425892. Online publication date: 16-Nov-2020.
  • (2020) First-Order Rewritability in Consistent Query Answering with Respect to Multiple Keys. Proceedings of the 39th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, pp. 113-129. DOI: 10.1145/3375395.3387654. Online publication date: 14-Jun-2020.
