DOI: 10.1145/1514894.1514899

Repair checking in inconsistent databases: algorithms and complexity

Published: 23 March 2009

Abstract

Managing inconsistency in databases has long been recognized as an important problem. One of the most promising approaches to coping with inconsistency in databases is the framework of database repairs, which has been the topic of an extensive investigation over the past several years. Intuitively, a repair of an inconsistent database is a consistent database that differs from the given inconsistent database in a minimal way. So far, most of the work in this area has addressed the problem of obtaining the consistent answers to a query posed on an inconsistent database. Repair checking is the following decision problem: given two databases r and r', is r' a repair of r? Although repair checking is a fundamental algorithmic problem about inconsistent databases, it has not received as much attention as consistent query answering. In this paper, we give a polynomial-time algorithm for subset-repair checking under integrity constraints that are the union of a weakly acyclic set of local-as-view (LAV) tuple-generating dependencies and a set of equality-generating dependencies. This result significantly generalizes earlier work for subset-repair checking when the integrity constraints are the union of an acyclic set of inclusion dependencies and a set of functional dependencies. We also give a polynomial-time algorithm for symmetric-difference repair checking, when the integrity constraints form a weakly acyclic set of LAV tgds. After this, we establish a number of complexity-theoretic results that delineate the boundary between tractability and intractability for the repair-checking problem. Specifically, we show that the aforementioned tractability results are optimal; in particular, subset-repair checking for arbitrary weakly acyclic sets of tuple-generating dependencies is a coNP-complete problem. 
We also study cardinality-based repairs and show that cardinality-repair checking is coNP-complete for various classes of integrity constraints encountered in database design and data exchange.
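
As a rough illustration of the decision problem stated in the abstract, the following Python sketch checks subset repairs in the restricted case where the integrity constraints are functional dependencies only. It is not the paper's algorithm, which handles weakly acyclic LAV tgds together with egds. The function names, the encoding of tuples as Python tuples, and the encoding of a dependency as a pair of attribute-index lists are all assumptions made for this example. The sketch relies on the fact that a subset of a relation satisfying a set of functional dependencies also satisfies them, so maximality can be tested by trying to add back one deleted tuple at a time.

```python
from itertools import combinations


def satisfies_fds(relation, fds):
    """Return True if the relation violates none of the functional dependencies.

    Each dependency is a pair (lhs, rhs) of attribute-index lists: any two
    tuples that agree on every lhs position must agree on every rhs position.
    """
    for lhs, rhs in fds:
        for s, t in combinations(relation, 2):
            agree_on_lhs = all(s[i] == t[i] for i in lhs)
            differ_on_rhs = any(s[j] != t[j] for j in rhs)
            if agree_on_lhs and differ_on_rhs:
                return False
    return True


def is_subset_repair(r, r_prime, fds):
    """Decide whether r_prime is a subset repair of r under the given FDs.

    r_prime must be (1) a subset of r, (2) consistent, and (3) maximal:
    adding back any single deleted tuple must break consistency.
    Assumption: the one-tuple maximality test is valid here only because
    the constraints are functional dependencies.
    """
    r, r_prime = set(r), set(r_prime)
    if not r_prime <= r:
        return False
    if not satisfies_fds(r_prime, fds):
        return False
    return all(not satisfies_fds(r_prime | {t}, fds) for t in r - r_prime)


# Hypothetical data: attribute 0 (name) determines attribute 1 (city).
fds = [([0], [1])]
r = {("alice", "NY"), ("alice", "LA"), ("bob", "SF")}

print(is_subset_repair(r, {("alice", "NY"), ("bob", "SF")}, fds))
print(is_subset_repair(r, {("alice", "NY")}, fds))
```

In this example the first candidate keeps one of the two conflicting tuples plus the unrelated one, so it is a repair; the second candidate is consistent but not maximal, because the tuple for bob could be added back without causing a violation.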


Published In

ICDT '09: Proceedings of the 12th International Conference on Database Theory
March 2009
334 pages
ISBN: 9781605584232
DOI: 10.1145/1514894

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. coNP-complete problem
  2. consistent query answering
  3. database repairs
  4. equality-generating dependencies
  5. inconsistent databases
  6. polynomial time
  7. repair checking
  8. tuple-generating dependencies
  9. weakly acyclic set

Qualifiers

  • Research-article

Conference

EDBT/ICDT '09 joint conference
March 23 - 25, 2009
St. Petersburg, Russia
