Counting Problems over Incomplete Databases

Published: 14 June 2020

Abstract

We study the complexity of various fundamental counting problems that arise in the context of incomplete databases, i.e., relational databases that can contain unknown values in the form of labeled nulls. Specifically, we assume that the domains of these unknown values are finite and, for a Boolean query q, we consider the following two problems: given as input an incomplete database D, (a) return the number of completions of D that satisfy q; or (b) return the number of valuations of the nulls of D yielding a completion that satisfies q. We obtain dichotomies between #P-hardness and polynomial-time computability for these problems when q is a self-join-free conjunctive query, and study the impact on the complexity of the following two restrictions: (1) every null occurs at most once in D (the case of so-called Codd tables); and (2) the domain of each null is the same. Roughly speaking, we show that counting completions is much harder than counting valuations (for instance, while the latter is always in #P, we prove that the former is not in #P under a widely believed complexity-theoretic assumption). Moreover, we find that both (1) and (2) reduce the complexity of our problems. We also study the approximability of these problems and show that, while counting valuations always has a fully polynomial randomized approximation scheme, in most cases counting completions does not. Finally, we consider more expressive query languages and situate our problems with respect to known complexity classes.
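
To make the difference between problems (a) and (b) concrete, here is a minimal brute-force sketch in Python. It is not the paper's method (the paper studies complexity, not this exponential enumeration), and the encoding is an assumption made only for illustration: a database is a dict from relation names to lists of tuples, nulls are strings such as "?n1" that appear as keys of a hypothetical `domains` dict, and a Boolean query is any Python predicate over a set of facts.

```python
from itertools import product

def count_valuations_and_completions(db, domains, query):
    """Return (#valuations, #completions) that satisfy `query`.

    db:      dict mapping a relation name to a list of tuples (assumed encoding)
    domains: dict mapping each null to its finite list of candidate constants
    query:   Boolean predicate over a completion (a frozenset of facts)
    """
    nulls = sorted(domains)
    satisfying_valuations = 0
    satisfying_completions = set()
    # Enumerate every valuation, i.e. every choice of one constant per null.
    for values in product(*(domains[n] for n in nulls)):
        valuation = dict(zip(nulls, values))
        # Apply the valuation; a completion is a *set* of facts, so two
        # different valuations can collapse to the same completion.
        completion = frozenset(
            (rel, tuple(valuation.get(term, term) for term in fact))
            for rel, facts in db.items()
            for fact in facts
        )
        if query(completion):
            satisfying_valuations += 1
            satisfying_completions.add(completion)
    return satisfying_valuations, len(satisfying_completions)

# Illustrative instance (not from the paper): S holds two nulls, T holds "a".
db = {"S": [("?n1",), ("?n2",)], "T": [("a",)]}
domains = {"?n1": ["a", "b"], "?n2": ["a", "b"]}

def q(completion):
    """Self-join-free Boolean conjunctive query: exists x. S(x) and T(x)."""
    s = {args for rel, args in completion if rel == "S"}
    t = {args for rel, args in completion if rel == "T"}
    return bool(s & t)

print(count_valuations_and_completions(db, domains, q))  # (3, 2)
```

On this instance three of the four valuations satisfy q, but they yield only two distinct completions, because mapping (?n1, ?n2) to (a, b) or to (b, a) produces the same set of facts. This collapse is the reason the two counts can differ.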

Cited By

  • (2023) An epistemic approach to model uncertainty in data-graphs. International Journal of Approximate Reasoning 160 (108948). DOI: 10.1016/j.ijar.2023.108948. Online publication date: Sep-2023
  • (2021) The Complexity of Counting Problems Over Incomplete Databases. ACM Transactions on Computational Logic 22:4 (1-52). DOI: 10.1145/3461642. Online publication date: 8-Sep-2021

Information & Contributors

Information

Published In

PODS'20: Proceedings of the 39th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems
June 2020
480 pages
ISBN: 9781450371087
DOI: 10.1145/3375395
  • General Chair: Dan Suciu
  • Program Chair: Yufei Tao
  • Publications Chair: Zhewei Wei
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. closed-world assumption
  2. counting complexity
  3. fpras
  4. incomplete databases

Qualifiers

  • Research-article

Funding Sources

  • Millennium Institute for Foundational Research on Data (IMFD)

Conference

SIGMOD/PODS '20
Acceptance Rates

Overall Acceptance Rate 642 of 2,707 submissions, 24%

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 7
  • Downloads (Last 6 weeks): 0
Reflects downloads up to 03 Oct 2024
