DOI: 10.1145/3584372.3588670

Efficient Computation of Quantiles over Joins

Published: 18 June 2023

    Abstract

    We present efficient algorithms for Quantile Join Queries, abbreviated as %JQ. A %JQ asks for the answer at a specified relative position (e.g., 50% for the median) under some ordering over the answers to a Join Query (JQ). Our goal is to avoid materializing the set of all join answers, and to achieve quasilinear time in the size of the database, regardless of the total number of answers. A recent dichotomy result rules out the existence of such an algorithm for a general family of queries and orders. Specifically, for acyclic JQs without self-joins, the problem becomes intractable for ordering by sum whenever we join more than two relations (and these joins are not trivial intersections). Moreover, even for basic ranking functions beyond sum, such as min or max over different attributes, so far it is not known whether there is any nontrivial tractable %JQ.
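
    To make the definition concrete, here is a minimal baseline sketch, not the paper's method. It materializes every answer of a two-relation join, sorts the answers by a ranking function, and returns the one at the requested relative position. The relation layout `R(a, b)` and `S(b, c)`, the name `naive_quantile_join`, and the zero-based index rule `min(floor(phi * n), n - 1)` are assumptions made for illustration only. The cost of building the full answer list is exactly what the algorithms described here aim to avoid.

```python
import math


def naive_quantile_join(R, S, phi, weight):
    """Return the answer at relative position phi of R(a, b) JOIN S(b, c).

    Baseline only: materializes all join answers, so its cost grows with
    the number of answers rather than with the size of the database.
    """
    # Materialize the natural join on the shared attribute b.
    answers = [(a, b, c) for (a, b) in R for (b2, c) in S if b == b2]
    if not answers:
        return None
    # Order the answers by the ranking function (e.g. sum, min, or max).
    answers.sort(key=weight)
    # Assumed index convention: zero-based, clamped to the last answer.
    idx = min(math.floor(phi * len(answers)), len(answers) - 1)
    return answers[idx]


R = [(1, "x"), (5, "x"), (2, "y")]
S = [("x", 10), ("x", 20), ("y", 7)]
# Median under ordering by the sum a + c.
median = naive_quantile_join(R, S, 0.5, lambda t: t[0] + t[2])
```

    Here the five answers have sums 9, 11, 15, 21, and 25, so `median` is the answer `(5, "x", 10)` with sum 15.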
    In this work, we develop a new approach to solving %JQ and show how it allows us not only to recover known results but also to generalize them and resolve open cases. Our solution uses two subroutines. The first selects what we call a "pivot answer". The second partitions the space of query answers according to this pivot and continues searching in one partition, which is represented as a new %JQ over a new database. For pivot selection, we develop an algorithm that works for a large class of ranking functions that are appropriately monotone. The second subroutine requires a customized construction for the specific ranking function at hand.
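
    The pivot-and-partition loop can be illustrated on the simplest case, a join of two relations ordered by sum. The sketch below is a simplified stand-in under stated assumptions, not the construction from the paper. It picks a random pivot rather than using the deterministic pivot selection described above, and it represents the remaining partition as a pair of weight bounds `(lo, hi)` rather than as a new %JQ over a new database. The names `quantile_sum_two_way`, `count_below`, and `count_at_most`, the relation layout `R(a, b)` and `S(b, c)`, and the index rule are all hypothetical. Weights are assumed to be numbers with exact arithmetic, such as integers. What the sketch does share with the approach above is that it never materializes the join: answers are only counted, using binary search over sorted groups.

```python
import bisect
import math
import random
from collections import defaultdict


def quantile_sum_two_way(R, S, phi, seed=0):
    """Weight a + c of the answer at relative position phi in
    R(a, b) JOIN S(b, c), found without materializing the join."""
    rng = random.Random(seed)
    # Group the c-values of S by join key b and sort each group once.
    groups = defaultdict(list)
    for b, c in S:
        groups[b].append(c)
    for cs in groups.values():
        cs.sort()
    # Each R-tuple is paired with the sorted c-values it joins with.
    left = [(a, groups[b]) for a, b in R if b in groups]
    total = sum(len(cs) for _, cs in left)
    if total == 0:
        return None
    k = min(math.floor(phi * total), total - 1)  # assumed index rule

    def count_below(p):    # number of answers with a + c < p
        return sum(bisect.bisect_left(cs, p - a) for a, cs in left)

    def count_at_most(p):  # number of answers with a + c <= p
        return sum(bisect.bisect_right(cs, p - a) for a, cs in left)

    lo, hi = -math.inf, math.inf  # invariant: lo < target weight < hi
    while True:
        # Pivot step: draw a uniformly random answer with weight in (lo, hi).
        spans = []
        for a, cs in left:
            i = bisect.bisect_right(cs, lo - a)  # first c with a + c > lo
            j = bisect.bisect_left(cs, hi - a)   # first c with a + c >= hi
            spans.append((i, j))
        sizes = [j - i for i, j in spans]
        t = rng.choices(range(len(left)), weights=sizes)[0]
        a, cs = left[t]
        pivot = a + cs[rng.randrange(*spans[t])]
        # Partition step: count answers on each side of the pivot.
        below, at_most = count_below(pivot), count_at_most(pivot)
        if below <= k < at_most:
            return pivot
        if k < below:
            hi = pivot   # keep searching among lighter answers
        else:
            lo = pivot   # keep searching among heavier answers


R = [(1, "x"), (5, "x"), (2, "y")]
S = [("x", 10), ("x", 20), ("y", 7)]
median_weight = quantile_sum_two_way(R, S, 0.5)
```

    On this input the answer weights are 9, 11, 15, 21, and 25, so `median_weight` is 15. Each round excludes the pivot weight from the open interval `(lo, hi)`, so the loop terminates. The result does not depend on the random seed, which affects only the number of rounds.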
    We show the benefit and generality of our approach by using it to establish several new complexity results. First, we prove the tractability of min and max for all acyclic JQs, thereby resolving the above question. Second, we extend the previous %JQ dichotomy for sum to all partial sums (over all subsets of the attributes). Third, we handle the intractable cases of sum by devising a deterministic approximation scheme that applies to every acyclic JQ.

    Published In

    PODS '23: Proceedings of the 42nd ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems
    June 2023
    392 pages
    ISBN:9798400701276
    DOI:10.1145/3584372

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. answer order
    2. approximation
    3. inequality predicates
    4. join queries
    5. median
    6. pivot
    7. quantiles
    8. ranking function

    Qualifiers

    • Research-article

    Conference

    SIGMOD/PODS '23

    Acceptance Rates

    Overall Acceptance Rate 642 of 2,707 submissions, 24%
