DOI: 10.1145/3318464.3389707
research-article
Open access

Functional-Style SQL UDFs With a Capital 'F'

Published: 31 May 2020

Abstract

We advocate expressing complex in-database computation in a functional style in which SQL UDFs recurse through plain self-invocation. The resulting UDFs are concise and readable, but their run-time performance on contemporary RDBMSs is sobering. This paper describes how to compile such functional-style UDFs into SQL:1999 recursive common table expressions. We use function call graphs to build the compiler's core and to realize a series of optimizations (reference counting, memoization, exploitation of linear and tail recursion). The compiled UDFs evaluate efficiently, challenging the performance of manually tweaked (but often convoluted) SQL code. SQL UDFs can indeed be functional and fast.
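As a rough illustration of the idea only (not the paper's compiler or its output), the sketch below pairs a tail-recursive function with a hand-written recursive common table expression that computes the same result. The function `gcd_recursive`, the CTE name `calls`, and the use of SQLite through Python's `sqlite3` module are all assumptions chosen for this example; the paper itself targets SQL UDFs on an RDBMS.

```python
import sqlite3


def gcd_recursive(a: int, b: int) -> int:
    """Functional-style definition: plain self-invocation, tail recursive."""
    return a if b == 0 else gcd_recursive(b, a % b)


# Hand-written recursive CTE for the same function (illustrative only).
# Each row of the hypothetical table `calls` holds the arguments of one
# invocation; the recursive branch derives the next call's arguments and
# stops producing rows once the base case b = 0 is reached.
GCD_CTE = """
WITH RECURSIVE calls(a, b) AS (
    SELECT ?, ?                               -- arguments of the initial call
    UNION ALL
    SELECT b, a % b FROM calls WHERE b <> 0   -- arguments of the next call
)
SELECT a FROM calls WHERE b = 0               -- result of the base case
"""


def gcd_cte(a: int, b: int) -> int:
    """Evaluate the CTE in an in-memory SQLite database (non-negative inputs)."""
    conn = sqlite3.connect(":memory:")
    try:
        (result,) = conn.execute(GCD_CTE, (a, b)).fetchone()
        return result
    finally:
        conn.close()


if __name__ == "__main__":
    print(gcd_recursive(48, 18), gcd_cte(48, 18))
```

Because the recursion here is a tail call, the CTE only has to carry the current arguments forward; no pending work needs to be stored per call. Functions that are not tail recursive would need additional bookkeeping, which this sketch does not attempt.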

Supplementary Material

Source Code (3318464.3389707_source_code.zip)
Read me (3318464.3389707_readme.pdf)
MP4 File (3318464.3389707.mp4)
Presentation Video

Published In

SIGMOD '20: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data
June 2020
2925 pages
ISBN:9781450367356
DOI:10.1145/3318464
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. SQL
  2. call graph
  3. functional programming
  4. memoization
  5. recursion
  6. user-defined functions

Qualifiers

  • Research-article

Conference

SIGMOD/PODS '20
Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 121
  • Downloads (last 6 weeks): 23
Reflects downloads up to 11 Sep 2024.

Cited By

  • (2024) Optimizing Nested Recursive Queries. Proceedings of the ACM on Management of Data 2:1 (1-27). DOI: 10.1145/3639271. Online publication date: 26-Mar-2024
  • (2024) Comparative Study of Surface Reconstruction Techniques: Traditional Approaches vs GeoUDF Methods. 2024 6th International Conference on Communications, Information System and Computer Engineering (CISCE) (581-586). DOI: 10.1109/CISCE62493.2024.10653103. Online publication date: 10-May-2024
  • (2023) User-Defined Functions in Modern Data Engines. 2023 IEEE 39th International Conference on Data Engineering (ICDE) (3593-3598). DOI: 10.1109/ICDE55515.2023.00276. Online publication date: Apr-2023
  • (2022) Babelfish. Proceedings of the VLDB Endowment 15:2 (196-210). DOI: 10.14778/3489496.3489501. Online publication date: 4-Feb-2022
  • (2022) Another way to implement complex computations. Proceedings of the Workshop on Human-In-the-Loop Data Analytics (1-7). DOI: 10.1145/3546930.3547508. Online publication date: 12-Jun-2022
  • (2022) Functional Programming on Top of SQL Engines. Practical Aspects of Declarative Languages (59-78). DOI: 10.1007/978-3-030-94479-7_5. Online publication date: 17-Jan-2022
  • (2021) Procedural extensions of SQL. Proceedings of the VLDB Endowment 14:8 (1378-1391). DOI: 10.14778/3457390.3457402. Online publication date: 21-Oct-2021
  • (2021) One WITH RECURSIVE is Worth Many GOTOs. Proceedings of the 2021 International Conference on Management of Data (723-735). DOI: 10.1145/3448016.3457272. Online publication date: 9-Jun-2021
