research-article

Open access

Banzhaf Values for Facts in Query Answering

Authors:

Omer Abramovich, Daniel Deutch, Nave Frost, Ahmet Kara, Dan Olteanu

Proceedings of the ACM on Management of Data, Volume 2, Issue 3

Article No.: 123, Pages 1 - 26

https://doi.org/10.1145/3654926

Published: 30 May 2024

Abstract

Quantifying the contribution of database facts to query answers has been studied as a means of explanation. The Banzhaf value, originally developed in game theory, is a natural measure of fact contribution, yet its efficient computation for select-project-join-union queries is challenging. In this paper, we introduce three algorithms to compute the Banzhaf value of database facts: an exact algorithm, an anytime deterministic approximation algorithm with relative error guarantees, and an algorithm for ranking and top-k computation. They share three key building blocks: compilation of query lineage into an equivalent function that allows efficient Banzhaf value computation; dynamic programming computation of the Banzhaf values of variables in a Boolean function from the Banzhaf values of its constituent functions; and a mechanism to efficiently compute lower and upper bounds on Banzhaf values for any positive DNF function.
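To make these quantities concrete, here is a minimal Python sketch. It is our illustration, not the paper's algorithm or code. It assumes the unnormalized Banzhaf value of a variable x in a positive Boolean function φ, that is, the number of subsets Y of the other variables with φ(Y ∪ {x}) = 1 and φ(Y) = 0 (normalized variants divide by 2^(n-1)). It also assumes a lineage given as nested and/or tuples whose children share no variables. Under that independence assumption, the Banzhaf value of a conjunction or disjunction follows from the children's model counts and Banzhaf values, which is one simple instance of the compositional, dynamic-programming idea described above. The `relative_estimate` helper shows one common stopping rule (assumed here) for turning lower and upper bounds into an answer with relative error at most ε. The paper's bound mechanism for positive DNF is not reproduced. All function names are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, Optional, Set, Tuple, Union

# Hypothetical encoding of a positive lineage: a variable name, or ("and" | "or", child, ...).
Formula = Union[str, Tuple]


def variables(f: Formula) -> FrozenSet[str]:
    if isinstance(f, str):
        return frozenset([f])
    return frozenset().union(*(variables(c) for c in f[1:]))


def evaluate(f: Formula, true_vars: Set[str]) -> bool:
    if isinstance(f, str):
        return f in true_vars
    op, *children = f
    values = (evaluate(c, true_vars) for c in children)
    return all(values) if op == "and" else any(values)


def banzhaf_brute_force(f: Formula, x: str) -> int:
    """Count subsets Y of the other variables with f(Y + x) = 1 and f(Y) = 0."""
    others = sorted(variables(f) - {x})
    total = 0
    for k in range(len(others) + 1):
        for ys in combinations(others, k):
            y = set(ys)
            total += int(evaluate(f, y | {x})) - int(evaluate(f, y))
    return total


def read_once_stats(f: Formula, x: str) -> Tuple[int, int, int]:
    """Return (#variables, #models, Banzhaf value of x) of f, computed bottom-up.

    Assumes the children of every and/or node have pairwise disjoint variables.
    """
    if isinstance(f, str):
        return 1, 1, int(f == x)
    op, *children = f
    # Start from the neutral element: "true" for and, "false" for or.
    n, c, b = (0, 1, 0) if op == "and" else (0, 0, 0)
    seen: Set[str] = set()
    for child in children:
        child_vars = variables(child)
        if seen & child_vars:
            raise ValueError("children share variables; formula is not read-once")
        seen |= child_vars
        n2, c2, b2 = read_once_stats(child, x)
        if op == "and":
            # x can flip the conjunction only when the other side is true.
            n, c, b = n + n2, c * c2, b * c2 + b2 * c
        else:
            # x can flip the disjunction only when the other side is false.
            f1, f2 = 2 ** n - c, 2 ** n2 - c2  # non-models of each side
            n, c, b = n + n2, 2 ** (n + n2) - f1 * f2, b * f2 + b2 * f1
    return n, c, b


def relative_estimate(lower: float, upper: float, eps: float) -> Optional[float]:
    """Given lower <= v <= upper, return an estimate within relative error eps of v,
    or None if the bounds are still too loose (then keep refining them)."""
    if (1 - eps) * upper <= (1 + eps) * lower:
        return (1 - eps) * upper  # any point in [(1-eps)U, (1+eps)L] would do
    return None


phi = ("or", ("and", "r1", ("or", "s11", "s12")), ("and", "r2", "s21"))
for v in sorted(variables(phi)):
    assert read_once_stats(phi, v)[2] == banzhaf_brute_force(phi, v)
print({v: read_once_stats(phi, v)[2] for v in sorted(variables(phi))})
# {'r1': 9, 'r2': 5, 's11': 3, 's12': 3, 's21': 5}
```

The brute-force function enumerates all 2^(n-1) subsets and only serves as a check. The compositional version runs in time polynomial in the formula size, but only for lineage in which every and/or node has variable-disjoint children.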

We complement the algorithms with a dichotomy for the Banzhaf-based ranking problem: given two facts, deciding whether the Banzhaf value of one is greater than that of the other is tractable for hierarchical queries and intractable for non-hierarchical queries.
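As a small illustration of why ranking is easy for a hierarchical query, consider the Boolean query Q() :- R(A), S(A, B). Its lineage is the disjunction, over A-values a, of r_a ∧ (the disjunction of s_ab over the matching b), and blocks for different a share no variables. So exact Banzhaf values have a closed form, and two facts can be compared by exact integer comparison. The sketch below is hypothetical and specific to this one query, not the paper's ranking algorithm. `banzhaf_ranking` takes a mapping from each a with R(a) present to the B-values b with S(a, b) present, treats all facts as endogenous, and computes unnormalized values over the lineage variables only. Facts outside the lineage would scale every value by the same power of two, so the ranking does not change.

```python
from math import prod
from typing import Dict, List, Tuple


def banzhaf_ranking(groups: Dict[str, List[str]]) -> List[Tuple[str, int]]:
    """Rank the facts of Q() :- R(A), S(A, B) by unnormalized Banzhaf value.

    groups maps each A-value a (with R(a) in the database) to the B-values b
    such that S(a, b) is in the database. Hypothetical helper for illustration.
    """
    for a, bs in groups.items():
        if not bs:
            raise ValueError(f"R({a}) has no S partner and is not in the lineage")
    # Block a is r_a AND (s_a,b1 OR ... OR s_a,bm) over 1 + m variables.
    # It has 2^m - 1 models, hence 2^(1+m) - (2^m - 1) = 2^m + 1 non-models.
    non_models = {a: 2 ** len(bs) + 1 for a, bs in groups.items()}
    scores: List[Tuple[str, int]] = []
    for a, bs in groups.items():
        # A fact in block a can flip the lineage only if every other block is false.
        rest = prod(non_models[o] for o in groups if o != a)
        # R(a) is critical iff at least one s_ab is true: 2^m - 1 subsets.
        scores.append((f"R({a})", (2 ** len(bs) - 1) * rest))
        for b in bs:
            # S(a, b) is critical only when r_a is true and every other s_ab' is false.
            scores.append((f"S({a},{b})", rest))
    # Highest value first; ties broken by fact name so the order is deterministic.
    return sorted(scores, key=lambda t: (-t[1], t[0]))


print(banzhaf_ranking({"a1": ["b1", "b2"], "a2": ["b3"]}))
# [('R(a1)', 9), ('R(a2)', 5), ('S(a2,b3)', 5), ('S(a1,b1)', 3), ('S(a1,b2)', 3)]
```

Taking the first k entries of the returned list gives a top-k answer for this query shape.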

We show experimentally that our algorithms significantly outperform exact and approximate algorithms from prior work, in most cases by up to two orders of magnitude. They also handle challenging problem instances that are beyond the reach of prior work.


Cited By

Bertossi L, Azua F (2024). The Generalized Causal-Effect Score in Data Management (short paper). Proceedings of the Conference on Governance, Understanding and Integration of Data for Effective and Responsible AI, 32-35. Online publication date: 9 June 2024. https://dl.acm.org/doi/10.1145/3665601.3669843

Karmakar P, Monet M, Senellart P, Bressan S (2024). Expected Shapley-Like Scores of Boolean Functions: Complexity and Applications to Probabilistic Databases. Proceedings of the ACM on Management of Data 2(2), 1-26. Online publication date: 14 May 2024. https://doi.org/10.1145/3651593

Index Terms

Banzhaf Values for Facts in Query Answering
1. Information systems
  1. Data management systems
    1. Database design and models
2. Theory of computation
  1. Theory and algorithms for application domains
    1. Database theory
      1. Data provenance


Published In


Proceedings of the ACM on Management of Data Volume 2, Issue 3

SIGMOD

June 2024

1953 pages

EISSN: 2836-6573

DOI: 10.1145/3670010

Editor:
Divyakant Agrawal
UC Santa Barbara, United States


Copyright © 2024 Owner/Author.

This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

UZH Global Strategy and Partnerships Funding Scheme
European Research Council

Bibliometrics

Total citations: 2
Total downloads: 227
Downloads (last 12 months): 227
Downloads (last 6 weeks): 36

Reflects downloads up to 25 Dec 2024.
