Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
10.1145/3514221.3526184acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article
Open access

Efficient Evaluation of Arbitrarily-Framed Holistic SQL Aggregates and Window Functions

Published: 11 June 2022 Publication History

Abstract

Window functions became part of the SQL standard in SQL:2003 and are widely used for data analytics: Percentiles, rankings, moving averages, running sums and local maxima are all expressed as window functions in SQL. Yet, the features offered by SQL's window functions lack composability: Framing is only available for distributive and algebraic aggregate functions, but not for holistic aggregates like percentiles and window functions like ranks. The SQL standard explicitly disallows holistic aggregates from being framed and thereby severely limits data analysts. This paper proposes to remove this restriction, thereby making window functions fully composable. The newly gained composability allows for more complex aggregates which are tricky to evaluate. The lack of subquadratic, parallel algorithms to evaluate framed holistic aggregates is probably the main objection against adding truly composable window functionality to the SQL standard. As such, this paper shows how to efficiently evaluate all window and aggregate functions from SQL:2011, except for DENSE_RANK, in combination with arbitrary window frames. This includes framed distinct aggregates, framed value functions, framed percentiles and framed ranks.

References

[1]
Arvind Arasu and Jennifer Widom. 2004. Resource Sharing in Continuous Sliding-Window Aggregates. In PVLDB. Morgan Kaufmann, 336--347. https://doi.org/10.1016/B978-012088469--8.50032--2
[2]
DuckDB authors. 2021 a. DuckDB - Window functions. https://duckdb.org/docs/sql/window_functions Retrieved June 15, 2021 from
[3]
PostgreSQL authors. 2021 b. PostgreSQL: Documentation: 14: 3.5. Window Functions. https://www.postgresql.org/docs/14/tutorial-window.html Retrieved January 12, 2022 from
[4]
PostgreSQL authors. 2022. PostgreSQL 14.1: gram.y source code. https://github.com/postgres/postgres/blob/REL_14_1/src/backend/parser/gram.y Retrieved January 12, 2022 from
[5]
SQLite authors. 2021 c. Window functions. https://www.sqlite.org/windowfunctions.html Retrieved June 15, 2021 from
[6]
Jon Louis Bentley. 1979. Decomposable Searching Problems. Inform. Process. Lett., Vol. 8, 5 (1979), 244--251. https://doi.org/10.1016/0020-0190(79)90117-0
[7]
Jon Louis Bentley. 1980. Multidimensional Divide-and-Conquer. Commun. ACM, Vol. 23, 4 (1980), 214--229. https://doi.org/10.1145/358841.358850
[8]
Guy E. Blelloch. 1990. Prefix sums and their applications. Technical Report. Carnegie Mellon University.
[9]
Gerth Stølting Brodal, Beat Gfeller, Allan Grønlund Jørgensen, and Peter Sanders. 2011. Towards optimal range medians. Theoretical Computer Science, Vol. 412, 24 (2011), 2588--2601. https://doi.org/10.1016/j.tcs.2010.05.003
[10]
Gerth Stølting Brodal and Allan Grønlund Jørgensen. 2009. Data Structures for Range Median Queries. In Proceedings of the ISAAC (Lecture Notes in Computer Science, Vol. 5878). Springer, 822--831. https://doi.org/10.1007/978--3--642--10631--6_83
[11]
Yu Cao, Chee-Yong Chan, Jie Li, and Kian-Lee Tan. 2012. Optimization of Analytic Window Functions. In PVLDB. 1244--1255. https://doi.org/10.14778/2350229.2350243
[12]
Paris Carbone, Jonas Traub, Asterios Katsifodimos, Seif Haridi, and Volker Markl. 2016. Cutty: Aggregate Sharing for User-Defined Windows. In Proceedings of the CIKM. ACM, 1201--1210. https://doi.org/10.1145/2983323.2983807
[13]
Timothy M. Chan, Stephane Durocher, Kasper Green Larsen, Jason Morrison, and Bryan T. Wilkinson. 2014. Linear-Space Data Structures for Range Mode Query in Arrays. Theory of Computing Systems, Vol. 55, 4 (2014), 719--741. https://doi.org/10.1007/s00224-013--9455--2
[14]
Bernard Chazelle and Leonidas J. Guibas. 1986 a. Fractional Cascading: I. A Data Structuring Technique. Algorithmica, Vol. 1, 2 (1986), 133--162. https://doi.org/10.1007/BF01840440
[15]
Bernard Chazelle and Leonidas J. Guibas. 1986 b. Fractional Cascading: II. Applications. Algorithmica, Vol. 1, 2 (1986), 163--191. https://doi.org/10.1007/BF01840441
[16]
Jatin Chhugani, Anthony D. Nguyen, Victor W. Lee, William Macy, Mostafa Hagog, Yen-Kuang Chen, Akram Baransi, Sanjeev Kumar, and Pradeep Dubey. 2008. Efficient implementation of sorting on multi-core SIMD CPU architecture. In PVLDB. 1313--1324. https://doi.org/10.14778/1454159.1454171
[17]
Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. 2009. Introduction to Algorithms, 3rd Edition .MIT Press. http://mitpress.mit.edu/books/introduction-algorithms
[18]
Rhys S. Francis, Ian D. Mathieson, and Linda Pannan. 1993. A Fast, Simple Algorithm to Balance a Parallel Multiway Merge. In Proceedings of PARLE (Lecture Notes in Computer Science, Vol. 694). Springer, 570--581. https://doi.org/10.1007/3--540--56891--3_46
[19]
Sariel Har-Peled and S. Muthukrishnan. 2008. Range Medians. In Proceedings of the European Symposium on Algorithms. Springer, 503--514. https://doi.org/10.1007/978--3--540--87744--8_42
[20]
Joseph M Hellerstein and Michael Stonebraker. 2005. Readings in database systems, 4th edition .MIT press.
[21]
Snowflake Inc. 2021. Window Functions - Snowflake Documentation. https://docs.snowflake.com/en/sql-reference/functions-analytic.html Retrieved June 15, 2021 from
[22]
Shrainik Jain, Dominik Moritz, Daniel Halperin, Bill Howe, and Ed Lazowska. 2016. SQLShare: Results from a Multi-Year SQL-as-a-Service Experiment. In SIGMOD. ACM, 281--293. https://doi.org/10.1145/2882903.2882957
[23]
Allan Grønlund Jørgensen and Kasper Green Larsen. 2011. Range Selection and Median: Tight Cell Probe Lower Bounds and Adaptive Data Structures. In Proceedings of the SODA. ACM-SIAM, 805--813. https://doi.org/10.1137/1.9781611973082.63
[24]
André Kohn, Viktor Leis, and Thomas Neumann. 2021. Building Advanced SQL Analytics From Low-Level Plan Operators. In SIGMOD. ACM, 1001--1013. https://doi.org/10.1145/3448016.3457288
[25]
Danny Krizanc, Pat Morin, and Michiel H. M. Smid. 2003. Range Mode and Range Median Queries on Lists and Trees. In Proceedings of the ISAAC (Lecture Notes in Computer Science, Vol. 2906). Springer, 517--526. https://doi.org/10.1007/978--3--540--24587--2_53
[26]
Viktor Leis, Peter A. Boncz, Alfons Kemper, and Thomas Neumann. 2014. Morsel-driven parallelism: a NUMA-aware query evaluation framework for the many-core age. In SIGMOD. ACM, 743--754. https://doi.org/10.1145/2588555.2610507
[27]
Viktor Leis, Kan Kundhikanjana, Alfons Kemper, and Thomas Neumann. 2015. Efficient Processing of Window Functions in Analytical SQL Queries. In PVLDB. 1058--1069. https://doi.org/10.14778/2794367.2794375
[28]
Microsoft. 2021. OVER Clause (Transact-SQL) - SQL Server | Microsoft Docs. https://docs.microsoft.com/en-us/sql/t-sql/queries/select-over-clause-transact-sql Retrieved June 15, 2021 from
[29]
Oracle. 2021. Analytic Functions. https://docs.oracle.com/cd/E11882_01/server.112/e41084/functions004.htm Retrieved June 15, 2021 from
[30]
Anatoli U. Shein, Panos K. Chrysanthis, and Alexandros Labrinidis. 2018. SlickDeque: High Throughput and Low Latency Incremental Sliding-Window Aggregation. In Proceedings of the EDBT. OpenProceedings.org, 397--408. https://doi.org/10.5441/002/edbt.2018.35
[31]
Ivan Smirnov. 2016. Is it possible to query number of distinct integers in a range in O(lg N)? https://stackoverflow.com/a/39797537/3043948 Retrieved September 11, 2021 from
[32]
Tableau. 2021. Table Calculation Functions. https://help.tableau.com/current/pro/desktop/en-us/functions_functions_tablecalculation.htm Retrieved June 15, 2021 from
[33]
Kanat Tangwongsan, Martin Hirzel, and Scott Schneider. 2019. Optimal and General Out-of-Order Sliding-Window Aggregation. In PVLDB. 1167--1180. https://doi.org/10.14778/3339490.3339499
[34]
Kanat Tangwongsan, Martin Hirzel, Scott Schneider, and Kun-Lung Wu. 2015. General Incremental Sliding-Window Aggregation. In PVLDB. 702--713. https://doi.org/10.14778/2752939.2752940
[35]
Simon Tatham. 2017. Counted B-Trees. https://www.chiark.greenend.org.uk/ sgtatham/algorithms/cbtree.html Retrieved January 21, 2022 from
[36]
Jonas Traub, Philipp Marian Grulich, Alejandro Rodriguez Cuellar, Sebastian Breß, Asterios Katsifodimos, Tilmann Rabl, and Volker Markl. 2021. Scotty: General and Efficient Open-source Window Aggregation for Stream Processing Systems. ACM Trans. Database Syst., Vol. 46, 1 (2021), 1:1--1:46. https://doi.org/10.1145/3433675
[37]
Adrian Vogelsgesang, Michael Haubenschild, Jan Finis, Alfons Kemper, Viktor Leis, Tobias Mü hlbauer, Thomas Neumann, and Manuel Then. 2018. Get Real: How Benchmarks Fail to Represent the Real World. In Proceedings of the DBTest Workshop at SIGMOD. ACM, 1--6. https://doi.org/10.1145/3209950.3209952
[38]
Richard Wesley and Fei Xu. 2016. Incremental Computation of Common Windowed Holistic Aggregates. In PVLDB. 1221--1232. https://doi.org/10.14778/2994509.2994537

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
SIGMOD '22: Proceedings of the 2022 International Conference on Management of Data
June 2022
2597 pages
ISBN:9781450392495
DOI:10.1145/3514221
This work is licensed under a Creative Commons Attribution-NoDerivatives International 4.0 License.

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 11 June 2022

Check for updates

Author Tags

  1. database systems
  2. holistic aggregates
  3. window functions

Qualifiers

  • Research-article

Conference

SIGMOD/PODS '22
Sponsor:

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • 0
    Total Citations
  • 1,289
    Total Downloads
  • Downloads (Last 12 months)451
  • Downloads (Last 6 weeks)70
Reflects downloads up to 30 Aug 2024

Other Metrics

Citations

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Get Access

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media