research-article

Expected Shapley-Like Scores of Boolean Functions: Complexity and Applications to Probabilistic Databases

Published: 14 May 2024

Abstract

Shapley values, originating in game theory and increasingly prominent in explainable AI, have been proposed to assess the contribution of facts in query answering over databases, along with other similar power indices such as Banzhaf values. In this work we adapt these Shapley-like scores to probabilistic settings, the objective being to compute their expected value. We show that the computations of expected Shapley values and of the expected values of Boolean functions are interreducible in polynomial time, thus obtaining the same tractability landscape. We investigate the specific tractable case where Boolean functions are represented as deterministic decomposable circuits, designing a polynomial-time algorithm for this setting. We present applications to probabilistic databases through database provenance, and an effective implementation of this algorithm within the ProvSQL system, which experimentally validates its feasibility over a standard benchmark.
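To make the quantities in the abstract concrete, here is a minimal brute-force sketch in Python. It is not the paper's algorithm and not ProvSQL's implementation. It assumes that a Boolean function is given as a predicate on the set of variables set to true, that variables (facts) are independent with the listed probabilities, and that the expected score of a variable x is the average, over possible worlds Z, of its score in the game restricted to Z (taken as 0 when x is absent from Z); this reading of "expected Shapley value" is an assumption made for illustration. The `circuit_probability` function shows why d-D circuits are convenient: probabilities multiply at decomposable AND gates and add at deterministic OR gates. All function names and the tuple encoding of circuits are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, Iterable, Iterator, List

# Hypothetical encoding: a Boolean function is a predicate on the set of true variables.
BoolFn = Callable[[FrozenSet[str]], bool]
Score = Callable[[BoolFn, List[str], str], Fraction]


def subsets(items: Iterable[str]) -> Iterator[FrozenSet[str]]:
    """Enumerate every subset of `items` (exponential; illustration only)."""
    pool = list(items)
    for k in range(len(pool) + 1):
        for combo in combinations(pool, k):
            yield frozenset(combo)


def shapley(phi: BoolFn, players: List[str], x: str) -> Fraction:
    """Shapley value of x in the game v(S) = phi(S); variables outside `players` stay false."""
    n = len(players)
    others = [p for p in players if p != x]
    total = Fraction(0)
    for s in subsets(others):
        weight = Fraction(factorial(len(s)) * factorial(n - len(s) - 1), factorial(n))
        total += weight * (int(phi(s | {x})) - int(phi(s)))
    return total


def banzhaf(phi: BoolFn, players: List[str], x: str) -> Fraction:
    """Unnormalized Banzhaf value: sum of x's marginal contributions over all coalitions
    of the other players (for monotone phi, the number of coalitions where x is pivotal)."""
    others = [p for p in players if p != x]
    return Fraction(sum(int(phi(s | {x})) - int(phi(s)) for s in subsets(others)))


def world_probability(world: FrozenSet[str], probs: Dict[str, Fraction]) -> Fraction:
    """Probability of a possible world when variables are independent."""
    pr = Fraction(1)
    for v, p in probs.items():
        pr *= p if v in world else 1 - p
    return pr


def expected_score(score: Score, phi: BoolFn, probs: Dict[str, Fraction], x: str) -> Fraction:
    """Assumed definition: E_Z[score of x in phi restricted to Z], with score 0 if x is absent."""
    total = Fraction(0)
    for world in subsets(probs):
        if x in world:
            total += world_probability(world, probs) * score(phi, sorted(world), x)
    return total


def circuit_probability(c: tuple, probs: Dict[str, Fraction]) -> Fraction:
    """Probability of a d-D circuit encoded as nested tuples:
    ("var", name), ("not", child), ("and", [children]), ("or", [children]).
    Valid only if AND gates are decomposable and OR gates deterministic (not checked)."""
    kind = c[0]
    if kind == "var":
        return probs[c[1]]
    if kind == "not":
        return 1 - circuit_probability(c[1], probs)
    if kind == "and":  # children share no variables, so probabilities multiply
        result = Fraction(1)
        for child in c[1]:
            result *= circuit_probability(child, probs)
        return result
    if kind == "or":  # children are mutually exclusive, so probabilities add
        return sum((circuit_probability(child, probs) for child in c[1]), Fraction(0))
    raise ValueError(f"unknown gate {kind!r}")


# Example: phi = (a AND b) OR c, with the equivalent d-D circuit c OR (NOT c AND a AND b).
def phi(true_vars: FrozenSet[str]) -> bool:
    return ("a" in true_vars and "b" in true_vars) or "c" in true_vars


circuit = ("or", [("var", "c"), ("and", [("not", ("var", "c")), ("var", "a"), ("var", "b")])])
probs = {v: Fraction(1, 2) for v in "abc"}

print(circuit_probability(circuit, probs))       # 5/8
print(shapley(phi, ["a", "b", "c"], "c"))        # 2/3
print(expected_score(shapley, phi, probs, "c"))  # 11/24
```

Everything except `circuit_probability` enumerates subsets and is exponential in the number of variables; the polynomial-time result for d-D circuits stated in the abstract concerns a far more efficient procedure that is not reproduced here. As a sanity check, the Shapley value is efficient in every world, so when φ(∅) is false the expected Shapley values sum to Pr(φ): here 1/12 + 1/12 + 11/24 = 5/8.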

Supplemental Material

Presentation video (MP4 file)


Information

Published In

Proceedings of the ACM on Management of Data, Volume 2, Issue 2
PODS
May 2024
852 pages
EISSN: 2836-6573
DOI: 10.1145/3665155
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published in PACMMOD Volume 2, Issue 2

Author Tags

  1. banzhaf value
  2. d-d circuit
  3. knowledge compilation
  4. probabilistic database
  5. provenance
  6. shapley value

Article Metrics

  • Downloads (last 12 months): 49
  • Downloads (last 6 weeks): 13

Reflects downloads up to 03 Oct 2024.
