research-article
Open access

A Unified Approach for Resilience and Causal Responsibility with Integer Linear Programming (ILP) and LP Relaxations

Published: 12 December 2023

Abstract

    What is a minimal set of tuples to delete from a database in order to eliminate all query answers? This problem is called "the resilience of a query" and is one of the key algorithmic problems underlying various forms of reverse data management, such as view maintenance, deletion propagation and causal responsibility. A long-open question is determining the conjunctive queries (CQs) for which resilience can be solved in PTIME.
    We shed new light on this problem by proposing a unified Integer Linear Programming (ILP) formulation. It is unified in that it can solve both previously studied restrictions (e.g., self-join-free CQs under set semantics that allow a PTIME solution) and new cases (all CQs under set or bag semantics). It is also unified in that all queries and all database instances are treated with the same approach, yet the algorithm is guaranteed to terminate in PTIME for all known PTIME cases. In particular, we prove that for all known easy cases, the optimal solution to our ILP is identical to a simpler Linear Programming (LP) relaxation, which implies that standard ILP solvers return the optimal solution to the original ILP in PTIME.
    Our approach allows us to explore new variants and obtain new complexity results. 1) It works under bag semantics, for which we give the first dichotomy results in the problem space. 2) We extend our approach to the related problem of causal responsibility and give a more fine-grained analysis of its complexity. 3) We recover easy instances for generally hard queries, including instances with read-once provenance and instances that become easy because of Functional Dependencies in the data. 4) We solve an open conjecture about a unified hardness criterion from PODS 2020 and prove the hardness of several queries of previously unknown complexity. 5) Experiments confirm that our findings accurately predict the asymptotic running times, and that our universal ILP is at times even quicker than a previously proposed dedicated flow algorithm.
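    The ILP idea described above can be illustrated with a minimal sketch. It assumes set semantics and a hypothetical two-atom query Q :- R(x,y), S(y,z); the function names, the example data, and the exhaustive search used in place of a real ILP solver are all assumptions made for illustration, not the authors' implementation. Each database tuple gets a 0/1 variable (1 means "delete"), each query witness contributes one constraint requiring at least one of its tuples to be deleted, and the objective minimizes the number of deletions.

```python
from itertools import product

def witnesses(R, S):
    """Witnesses of the hypothetical query Q :- R(x,y), S(y,z).

    Each witness is the set of database tuples that jointly produce one answer.
    """
    return [{("R", r), ("S", s)} for r in R for s in S if r[1] == s[0]]

def resilience_ilp(wits):
    """Solve: minimize sum_t X[t]
              subject to sum_{t in w} X[t] >= 1 for every witness w,
                         X[t] in {0, 1}.

    Illustration only: the 0/1 program is solved by exhaustive search,
    which is exponential in the number of tuples. A real setup would hand
    the same objective and constraints to an ILP or LP solver.
    """
    tuples = sorted({t for w in wits for t in w})
    best = None
    for bits in product((0, 1), repeat=len(tuples)):
        x = dict(zip(tuples, bits))
        # Feasible if every witness loses at least one tuple.
        feasible = all(sum(x[t] for t in w) >= 1 for w in wits)
        if feasible and (best is None or sum(bits) < sum(best.values())):
            best = x
    deleted = {t for t, v in best.items() if v == 1}
    return len(deleted), deleted

# Example instance (made up for this sketch).
R = [(1, 2), (3, 2)]
S = [(2, 4), (2, 5), (2, 6)]
size, deleted = resilience_ilp(witnesses(R, S))
print(size, sorted(deleted))
```

    On this instance every R tuple joins with every S tuple, so the cheapest way to eliminate all six witnesses is to delete the two R tuples. Relaxing the constraint X[t] in {0, 1} to 0 <= X[t] <= 1 gives the LP relaxation that the abstract refers to.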

    Cited By

    • (2024) The Complexity of Resilience Problems via Valued Constraint Satisfaction Problems. Proceedings of the 39th Annual ACM/IEEE Symposium on Logic in Computer Science, 1-14. https://doi.org/10.1145/3661814.3662071. Online publication date: 8-Jul-2024
    • (2024) Minimally Factorizing the Provenance of Self-join Free Conjunctive Queries. Proceedings of the ACM on Management of Data, Vol. 2, 2, 1-24. https://doi.org/10.1145/3651605. Online publication date: 14-May-2024

        Published In

        Proceedings of the ACM on Management of Data  Volume 1, Issue 4
        PACMMOD
        December 2023
        1317 pages
        EISSN:2836-6573
        DOI:10.1145/3637468
        This work is licensed under a Creative Commons Attribution International 4.0 License.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Author Tags

        1. causal responsibility
        2. dichotomy
        3. linear programming relaxation
        4. query explanation
        5. resilience
        6. reverse data management