
The Complexity of Counting Problems Over Incomplete Databases

Published: 08 September 2021

Abstract

We study the complexity of several fundamental counting problems that arise in the context of incomplete databases, i.e., relational databases that can contain unknown values in the form of labeled nulls. Specifically, we assume that the domains of these unknown values are finite and, for a Boolean query q, we consider the following two problems: Given as input an incomplete database D, (a) return the number of completions of D that satisfy q; or (b) return the number of valuations of the nulls of D yielding a completion that satisfies q. We obtain dichotomies between #P-hardness and polynomial-time computability for these problems when q is a self-join–free conjunctive query, and we study the impact on the complexity of the following two restrictions: (1) every null occurs at most once in D (the case of so-called Codd tables); and (2) the domain of each null is the same. Roughly speaking, we show that counting completions is much harder than counting valuations: For instance, while the latter is always in #P, we prove that the former is not in #P under a widely believed complexity-theoretic assumption. Moreover, we find that both (1) and (2) can reduce the complexity of our problems. We also study the approximability of these problems and show that, while counting valuations always has a fully polynomial-time randomized approximation scheme (FPRAS), in most cases counting completions does not. Finally, we consider more expressive query languages and situate our problems with respect to known complexity classes.
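
The difference between problems (a) and (b) can be made concrete with a small brute-force sketch. This is not the paper's method: it simply enumerates every valuation, which takes time exponential in the number of nulls. The names Null, complete, count_valuations, and count_completions, the encoding of facts as (relation, arguments) pairs, and the toy database are all assumptions made for illustration.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Null:
    """A labeled null (hypothetical encoding, not taken from the paper)."""
    name: str


def valuations(domains):
    """Yield every mapping from nulls to constants of their finite domains."""
    nulls = sorted(domains, key=lambda n: n.name)
    for values in product(*(domains[n] for n in nulls)):
        yield dict(zip(nulls, values))


def complete(db, valuation):
    """Apply a valuation to an incomplete database.

    The result is a set of facts, so facts that become equal collapse.
    """
    return frozenset(
        (rel, tuple(valuation.get(term, term) for term in args))
        for rel, args in db
    )


def count_valuations(db, domains, q):
    """Problem (b): number of valuations whose completion satisfies q."""
    return sum(1 for v in valuations(domains) if q(complete(db, v)))


def count_completions(db, domains, q):
    """Problem (a): number of distinct completions that satisfy q."""
    satisfying = set()
    for v in valuations(domains):
        completion = complete(db, v)
        if q(completion):
            satisfying.add(completion)
    return len(satisfying)


# Toy instance: D = {R(n1), R(n2)}, both nulls ranging over {a, b}.
n1, n2 = Null("n1"), Null("n2")
db = [("R", (n1,)), ("R", (n2,))]
domains = {n1: ["a", "b"], n2: ["a", "b"]}


def q_nonempty(completion):
    """Boolean query 'exists x. R(x)'."""
    return any(rel == "R" for rel, _ in completion)


def q_has_a(completion):
    """Boolean query 'R(a)'."""
    return ("R", ("a",)) in completion


print(count_valuations(db, domains, q_nonempty))   # 4
print(count_completions(db, domains, q_nonempty))  # 3
print(count_valuations(db, domains, q_has_a))      # 3
print(count_completions(db, domains, q_has_a))     # 2
```

On this toy instance the two counts differ because the valuations (n1=a, n2=b) and (n1=b, n2=a) produce the same completion {R(a), R(b)}. Counting completions therefore requires identifying duplicates, whereas counting valuations does not.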


Published In

ACM Transactions on Computational Logic, Volume 22, Issue 4
October 2021
264 pages
ISSN: 1529-3785
EISSN: 1557-945X
DOI: 10.1145/3483333
Editor: Anuj Dawar

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 08 September 2021
Accepted: 01 April 2021
Received: 01 November 2020
Published in TOCL Volume 22, Issue 4


Author Tags

  1. Incomplete databases
  2. closed-world assumption
  3. counting complexity
  4. Fully Polynomial-time Randomized Approximation Scheme (FPRAS)

Qualifiers

  • Research-article
  • Refereed

Funding Sources

  • ANID - Millennium Science Initiative Program
  • Fondecyt
