research-article
Open access

A Unified Approach for Resilience and Causal Responsibility with Integer Linear Programming (ILP) and LP Relaxations

Published: 12 December 2023

Abstract

What is a minimal set of tuples to delete from a database in order to eliminate all query answers? This problem is called "the resilience of a query" and is one of the key algorithmic problems underlying various forms of reverse data management, such as view maintenance, deletion propagation and causal responsibility. A long-open question is determining the conjunctive queries (CQs) for which resilience can be solved in PTIME.
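To make the definition concrete, here is a minimal brute-force sketch in Python. It is not the method proposed in the paper; it only illustrates what "resilience" asks for on a toy instance. The query Q :- R(x, y), S(y, z), the relation contents, and the helper names `witnesses` and `resilience` are all assumptions introduced for illustration.

```python
from itertools import combinations

def witnesses(R, S):
    """Witnesses of the Boolean query Q :- R(x, y), S(y, z).

    Each witness is the set of database tuples that join to produce one answer.
    """
    return [frozenset({("R", r), ("S", s)}) for r in R for s in S if r[1] == s[0]]

def resilience(R, S):
    """Return (size, tuples) of a smallest deletion set that destroys every witness.

    Brute force over candidate sets by increasing size; exponential in general,
    so this is only suitable for tiny examples.
    """
    ws = witnesses(R, S)
    tuples = sorted(set().union(*ws)) if ws else []
    for k in range(len(tuples) + 1):
        for cand in combinations(tuples, k):
            chosen = set(cand)
            # A deletion set is valid if it hits every witness at least once.
            if all(w & chosen for w in ws):
                return k, chosen

# Hypothetical toy database: both R-tuples join with all three S-tuples.
R = {(1, 2), (3, 2)}
S = {(2, 4), (2, 5), (2, 6)}
size, deleted = resilience(R, S)
print(size, deleted)
```

On this toy instance, deleting the two R-tuples removes all six witnesses, whereas deleting only S-tuples would require three deletions.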
We shed new light on this problem by proposing a unified Integer Linear Programming (ILP) formulation. It is unified in that it can solve both previously studied restrictions (e.g., self-join-free CQs under set semantics that allow a PTIME solution) and new cases (all CQs under set or bag semantics). It is also unified in that all queries and all database instances are treated with the same approach, yet the algorithm is guaranteed to terminate in PTIME for all known PTIME cases. In particular, we prove that for all known easy cases, the optimal solution to our ILP is identical to a simpler Linear Programming (LP) relaxation, which implies that standard ILP solvers return the optimal solution to the original ILP in PTIME.
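The abstract does not spell out the program itself. As a hedged sketch, a standard way to encode resilience under set semantics is as a minimum hitting set over the witnesses of the query; the symbols $D$ (the database), $W$ (the set of witnesses), and $X[t]$ (a deletion indicator for tuple $t$) are notation assumed here, not taken from the text above.

```latex
\begin{align*}
\text{ILP:}\quad & \min \sum_{t \in D} X[t] \\
\text{s.t.}\quad & \sum_{t \in w} X[t] \;\ge\; 1 && \text{for every witness } w \in W, \\
                 & X[t] \in \{0, 1\}           && \text{for every tuple } t \in D. \\[6pt]
\text{LP relaxation:}\quad & \text{same objective and constraints, with } 0 \le X[t] \le 1 .
\end{align*}
```

Under this reading, the claim in the paragraph above is that for the known easy cases the relaxed program already attains the same optimal value as the integer program, so requiring integrality costs nothing.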
Our approach allows us to explore new variants and obtain new complexity results. 1) It works under bag semantics, for which we give the first dichotomy results in the problem space. 2) We extend our approach to the related problem of causal responsibility and give a more fine-grained analysis of its complexity. 3) We recover easy instances for generally hard queries, including instances with read-once provenance and instances that become easy because of Functional Dependencies in the data. 4) We solve an open conjecture about a unified hardness criterion from PODS 2020 and prove the hardness of several queries of previously unknown complexity. 5) Experiments confirm that our findings accurately predict the asymptotic running times, and that our universal ILP is at times even quicker than a previously proposed dedicated flow algorithm.

      Published In

      Proceedings of the ACM on Management of Data  Volume 1, Issue 4
      PACMMOD
      December 2023
      1317 pages
      EISSN:2836-6573
      DOI:10.1145/3637468
      This work is licensed under a Creative Commons Attribution International 4.0 License.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. causal responsibility
      2. dichotomy
      3. linear programming relaxation
      4. query explanation
      5. resilience
      6. reverse data management
