Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
10.1145/2882903.2915244acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article

How to Architect a Query Compiler

Published: 26 June 2016 Publication History
  • Get Citation Alerts
  • Editorial Notes

    Computationally Replicable. The experimental results of this paper were replicated by a SIGMOD Review Committee and were found to support the central results reported in the paper. Details of the review process are found here

    Abstract

    This paper studies architecting query compilers. The state of the art in query compiler construction is lagging behind that in the compilers field. We attempt to remedy this by exploring the key causes of technical challenges in need of well founded solutions, and by gathering the most relevant ideas and approaches from the PL and compilers communities for easy digestion by database researchers. All query compilers known to us are more or less monolithic template expanders that do the bulk of the compilation task in one large leap. Such systems are hard to build and maintain. We propose to use a stack of multiple DSLs on different levels of abstraction with lowering in multiple steps to make query compilers easier to build and extend, ultimately allowing us to create more convincing and sustainable compiler-based data management systems. We attempt to derive our advice for creating such DSL stacks from widely acceptable principles. We have also re-created a well-known query compiler following these ideas and report on this effort.

    Supplementary Material

    ReadMe (readme.txt)
    Rights information
    Reproducibility (sigmod16_repro.zip)
    Experiments

    References

    [1]
    SC - Systems Compiler. http://data.epfl.ch/sc.
    [2]
    Y. Ahmad and C. Koch. DBToaster: A SQL compiler for high-performance delta processing in main-memory databases. PVLDB, 2(2):1566--1569, 2009.
    [3]
    A. V. Aho, R. Sethi, and J. D. Ullman. Compilers, Principles, Techniques. Addison wesley, 1986.
    [4]
    A. W. Appel. SSA is functional programming. SIGPLAN notices, 33(4):17--20, 1998.
    [5]
    A. W. Appel. Compiling with continuations. Cambridge University Press, 2006.
    [6]
    M. Armbrust, R. S. Xin, C. Lian, Y. Huai, D. Liu, J. K. Bradley, X. Meng, T. Kaftan, M. J. Franklin, A. Ghodsi, and M. Zaharia. Spark SQL: Relational Data Processing in Spark. SIGMOD '15, pages 1383--1394, New York, NY, USA, 2015. ACM.
    [7]
    K. Asai, H. Masuhara, and A. Yonezawa. Partial evaluation of call-by-value lambda-calculus with side-effects. PEPM '97, pages 12--21. ACM, 1997.
    [8]
    S. Bauman, C. F. Bolz, R. Hirschfeld, V. Kirilichev, T. Pape, J. G. Siek, and S. Tobin-Hochstadt. Pycket: A Tracing JIT for a Functional Language. ICFP 2015, pages 22--34. ACM, 2015.
    [9]
    C. Beeri and Y. Kornatzky. Algebraic optimization of object-oriented query languages. In ICDT '90, volume 470, pages 72--88. 1990.
    [10]
    C. Binnig, S. Hildenbrand, and F. Farber. Dictionary-based order-preserving string compression for main memory column stores. In SIGMOD '09, pages 283--296. ACM, 2009.
    [11]
    V. Breazu-Tannen, P. Buneman, and L. Wong. Naturally embedded query languages. Springer, 1992.
    [12]
    V. Breazu-Tannen and R. Subrahmanyam. Logical and computational aspects of programming with sets/bags/lists. Springer, 1991.
    [13]
    J. Carette, O. Kiselyov, and C.-C. Shan. Finally tagless, partially evaluated: Tagless staged interpreters for simpler typed languages. Journal of Functional Programming, 19(05):509--543, 2009.
    [14]
    M. M. Chakravarty, G. Keller, and P. Zadarnowski. A functional perspective on SSA optimisation algorithms. Electronic Notes in Theoretical Computer Science, 82(2):347--361, 2004.
    [15]
    D. D. Chamberlin, M. M. Astrahan, M. W. Blasgen, J. Gray, W. F. King III, B. G. Lindsay, R. A. Lorie, J. W. Mehl, T. G. Price, G. R. Putzolu, P. G. Selinger, M. Schkolnick, D. R. Slutz, I. L. Traiger, B. W. Wade, and R. A. Yost. A history and evaluation of system R. CACM, 24(10):632--646, 1981.
    [16]
    J. Cieslewicz and K. A. Ross. Adaptive aggregation on chip multiprocessors. VLDB '07, pages 339--350. ACM, 2007.
    [17]
    C. Click and K. D. Cooper. Combining analyses, combining optimizations. TOPLAS, 17(2):181--196, Mar. 1995.
    [18]
    D. Coutts, R. Leshchinskiy, and D. Stewart. Stream fusion. from lists to streams to nothing at all. In ICFP '07, 2007.
    [19]
    A. Crotty, A. Galakatos, K. Dursun, T. Kraska, U. Çetintemel, and S. B. Zdonik. Tupleware:" big" data, big analytics, small clusters. In CIDR, 2015.
    [20]
    R. Cytron, J. Ferrante, B. K. Rosen, M. N. Wegman, and F. K. Zadeck. Efficiently computing static single assignment form and the control dependence graph. TOPLAS, 13(4):451--490, 1991.
    [21]
    A. Darte. On the complexity of loop fusion. Parallel Computing, 26(9):1175 -- 1193, 2000.
    [22]
    L. Fegaras and D. Maier. Optimizing object queries using an effective calculus. ACM Trans. Database Syst., 25(4):457--516, Dec. 2000.
    [23]
    M. Felleisen. On the expressive power of programming languages. In ESOP'90, pages 134--151. Springer, 1990.
    [24]
    J. Ferrante, K. J. Ottenstein, and J. D. Warren. The program dependence graph and its use in optimization. TOPLAS, 9(3):319--349, July 1987.
    [25]
    C. Flanagan, A. Sabry, B. F. Duba, and M. Felleisen. The essence of compiling with continuations. In ACM Sigplan Notices, volume 28, pages 237--247. ACM, 1993.
    [26]
    F. Franchetti, Y. Voronenko, and M. Püschel. Formal loop merging for signal transforms. PLDI '05, pages 315--326.
    [27]
    A. Gill, J. Launchbury, and S. L. Peyton Jones. A short cut to deforestation. FPCA, pages 223--232. ACM, 1993.
    [28]
    A. J. Gill. Cheap deforestation for non-strict functional languages. PhD thesis, University of Glasgow, 1996.
    [29]
    A. Goldberg and R. Paige. Stream processing. LFP '84, pages 53--62, New York, NY, USA, 1984. ACM.
    [30]
    G. Graefe. Query evaluation techniques for large databases. CSUR, 25(2):73--169, June 1993.
    [31]
    G. Graefe. Volcano-an extensible and parallel query evaluation system. IEEE Transactions on Knowledge and Data Engineering, 6(1):120--135, 1994.
    [32]
    J. Gray, S. Chaudhuri, A. Bosworth, A. Layman, D. Reichart, M. Venkatrao, F. Pellow, and H. Pirahesh. Data cube: A relational aggregation operator generalizing group-by, cross-tab, and sub-totals. Data Min. Knowl. Discov., 1(1):29--53, Jan. 1997.
    [33]
    T. Grust, M. Mayr, J. Rittinger, and T. Schreiber. FERRY: database-supported program execution. SIGMOD 2009, pages 1063--1066. ACM.
    [34]
    T. Grust, J. Rittinger, and T. Schreiber. Avalanche-safe LINQ compilation. PVLDB, 3(1--2):162--172, Sept. 2010.
    [35]
    T. Grust and M. Scholl. How to comprehend queries functionally. Journal of Intelligent Information Systems, 12(2--3):191--218, 1999.
    [36]
    P. Hudak. Building domain-specific embedded languages. ACM Comput. Surv., 28(4es), Dec. 1996.
    [37]
    S. Idreos, F. Groffen, N. Nes, S. Manegold, S. Mullender, M. Kersten, et al. Monetdb: Two decades of research in column-oriented database architectures. IEEE Data Eng. Bull., 35(1):40--45, 2012.
    [38]
    N. D. Jones, C. K. Gomard, and P. Sestoft. Partial evaluation and automatic program generation. Prentice Hall, 1993.
    [39]
    S. Jones. Compiling haskell by program transformation: A report from the trenches. In H. Nielson, editor, Programming Languages and Systems - ESOP '96, volume 1058 of Lecture Notes in Computer Science, pages 18--44. Springer Berlin Heidelberg, 1996.
    [40]
    M. Jonnalagedda and S. Stucki. Fold-based fusion as a library: A generative programming pearl. In Proceedings of the 6th ACM SIGPLAN Symposium on Scala, pages 41--50. ACM, 2015.
    [41]
    V. Jovanovic, A. Shaikhha, S. Stucki, V. Nikolaev, C. Koch, and M. Odersky. Yin-Yang: Concealing the deep embedding of DSLs. GPCE 2014, pages 73--82. ACM, 2014.
    [42]
    J. Kam and J. Ullman. Monotone data flow analysis frameworks. Acta Informatica, 7(3):305--317, 1977.
    [43]
    J. B. Kam and J. D. Ullman. Global data flow analysis and iterative algorithms. J. ACM, 23(1):158--171, Jan. 1976.
    [44]
    M. Karpathiotakis, I. Alagiannis, T. Heinis, M. Branco, and A. Ailamaki. Just-in-time data virtualization: Lightweight data management with vida. In CIDR, 2015.
    [45]
    R. A. Kelsey. A correspondence between continuation passing style and static single assignment form. In ACM SIGPLAN Notices, volume 30, pages 13--22. ACM, 1995.
    [46]
    A. Kennedy. Compiling with continuations, continued. In ACM SIGPLAN Notices, volume 42, pages 177--190, 2007.
    [47]
    K. Kennedy. A survey of data flow analysis techniques. IBM Thomas J. Watson Research Division, 1979.
    [48]
    K. Kennedy and K. McKinley. Maximizing loop parallelism and improving data locality via loop fusion and distribution. In Languages and Compilers for Parallel Computing, pages 301--320. Springer Berlin Heidelberg, 1994.
    [49]
    W. Kim. On Optimizing an SQL-like Nested Query. ACM Trans. Database Syst., 7(3):443--469, Sept. 1982.
    [50]
    Y. Klonatos, C. Koch, T. Rompf, and H. Chafi. Building efficient query engines in a high-level language. PVLDB, 7(10):853--864, 2014.
    [51]
    Y. Klonatos, A. Nötzli, A. Spielmann, C. Koch, and V. Kuncak. Automatic Synthesis of Out-of-core Algorithms. In ACM SIGMOD, pages 133--144, 2013.
    [52]
    C. Koch. Incremental query evaluation in a ring of databases. PODS 2010, pages 87--98. ACM, 2010.
    [53]
    C. Koch. Abstraction without regret in data management systems. In CIDR, 2013.
    [54]
    C. Koch. Abstraction without regret in database systems building: a manifesto. IEEE Data Eng. Bull., 37(1):70--79, 2014.
    [55]
    C. Koch, Y. Ahmad, O. Kennedy, M. Nikolic, A. Nötzli, D. Lupei, and A. Shaikhha. DBToaster: higher-order delta processing for dynamic, frequently fresh views. VLDBJ, 23(2):253--278, 2014.
    [56]
    K. Krikellas, S. Viglas, and M. Cintra. Generating code for holistic query evaluation. In ICDE, pages 613--624, 2010.
    [57]
    V. Leis, P. Boncz, A. Kemper, and T. Neumann. Morsel-driven Parallelism: A NUMA-aware Query Evaluation Framework for the Many-core Age. SIGMOD '14, pages 743--754, New York, NY, USA, 2014. ACM.
    [58]
    D. Leivant. Reasoning about functional programs and complexity classes associated with type disciplines. In FOCS, pages 460--469, Nov 1983.
    [59]
    L. Libkin. Expressive Power of SQL. Theor. Comput. Sci., 296(3):379--404, Mar. 2003.
    [60]
    M. Mehta and D. J. DeWitt. Managing intra-operator parallelism in parallel database systems. In VLDB, volume 95, pages 382--394, 1995.
    [61]
    E. Meijer, B. Beckman, and G. Bierman. LINQ: Reconciling Object, Relations and XML in the .NET Framework. SIGMOD '06, pages 706--706. ACM, 2006.
    [62]
    F. Nagel, G. Bierman, and S. D. Viglas. Code generation for efficient query processing in managed runtimes. PVLDB, 7(12):1095--1106.
    [63]
    S. Najd, S. Lindley, J. Svenningsson, and P. Wadler. Everything old is new again: Quoted domain-specific languages. PEPM 2016, pages 25--36, New York, NY, USA, 2016. ACM.
    [64]
    T. Neumann. Efficiently Compiling Efficient Query Plans for Modern Hardware. PVLDB, 4(9):539--550, 2011.
    [65]
    V. Pankratius, F. Schmidt, and G. Garreton. Combining functional and imperative programming for multicore software: An empirical study evaluating Scala and Java. In ICSE 2012, pages 123--133.
    [66]
    B. C. Pierce. Types and programming languages. MIT press, 2002.
    [67]
    M. Puschel, J. M. Moura, J. R. Johnson, D. Padua, M. M. Veloso, B. W. Singer, J. Xiong, F. Franchetti, A. Gacic, Y. Voronenko, et al. SPIRAL: code generation for DSP transforms. Proceedings of the IEEE, 93(2):232--275, 2005.
    [68]
    R. Ramakrishnan and J. Gehrke. Database Management Systems. Osborne/McGraw-Hill, 2nd edition, 2000.
    [69]
    G. Ramalingam. The undecidability of aliasing. TOPLAS, 16(5):1467--1471, Sept. 1994.
    [70]
    T. Rompf and M. Odersky. Lightweight Modular Staging: A Pragmatic Approach to Runtime Code Generation and Compiled DSLs. CACM, 55(6):121--130, June 2012.
    [71]
    B. K. Rosen, M. N. Wegman, and F. K. Zadeck. Global Value Numbers and Redundant Computations. POPL '88, pages 12--27. ACM, 1988.
    [72]
    O. Shivers and M. Might. Continuations and transducer composition. PLDI '06, pages 295--307. ACM, 2006.
    [73]
    D. G. Spampinato and M. Püschel. A basic linear algebra compiler. CGO '14, pages 23:23--23:32. ACM, 2014.
    [74]
    J. Stanier and D. Watson. Intermediate representations in imperative compilers: A survey. CSUR, 45(3):26:1--26:27, July 2013.
    [75]
    M. Stonebraker, D. J. Abadi, A. Batkin, X. Chen, M. Cherniack, M. Ferreira, E. Lau, A. Lin, S. Madden, E. O'Neil, P. O'Neil, A. Rasin, N. Tran, and S. Zdonik. C-store: A Column-oriented DBMS. VLDB '05, pages 553--564. VLDB Endowment, 2005.
    [76]
    J. Svenningsson. Shortcut fusion for accumulating parameters & zip-like functions. ICFP '02, pages 124--132. ACM, 2002.
    [77]
    W. Taha and T. Sheard. Multi-stage programming with explicit annotations. PEPM '97, pages 203--217, NY, USA, 1997. ACM.
    [78]
    R. Tate, M. Stepp, Z. Tatlock, and S. Lerner. Equality saturation: A new approach to optimization. POPL '09, pages 264--276. ACM.
    [79]
    L. Torczon and K. Cooper. Engineering A Compiler. Morgan Kaufmann Publishers Inc., 2nd edition, 2011.
    [80]
    S.-A.-A. Touati and D. Barthou. On the decidability of phase orde- ring problem in optimizing compilation. In Proceedings of the 3rd Conference on Computing Frontiers, CF '06, pages 147--156, 2006.
    [81]
    Transaction Processing Performance Council. TPC-H, a decision support benchmark. http://www.tpc.org/tpch.
    [82]
    P. Trinder. Comprehensions, a Query Notation for DBPLs. In Proc. of the 3rd DBPL workshop, DBPL3, pages 55--68, San Francisco, CA, USA, 1992. Morgan Kaufmann Publishers Inc.
    [83]
    D. Turner. Total functional programming. Journal of Universal Computer Science, 10(7):751--768, 2004.
    [84]
    S. Viglas, G. M. Bierman, and F. Nagel. Processing Declarative Queries Through Generating Imperative Code in Managed Runtimes. IEEE Data Eng. Bull., 37(1):12--21, 2014.
    [85]
    E. Visser, Z. Benaissa, and A. Tolmach. Building program optimizers with rewriting strategies. ICFP '98, pages 13--26, 1998.
    [86]
    P. Wadler. Deforestation: Transforming programs to eliminate trees. In ESOP'88, pages 344--358. Springer, 1988.
    [87]
    P. Wadler. Comprehending monads. In Proceedings of the 1990 ACM Conference on LISP and Functional Programming, LFP '90, pages 61--78, New York, NY, USA, 1990. ACM.
    [88]
    T. Würthinger, A. Wöß, L. Stadler, G. Duboscq, D. Simon, and C. Wimmer. Self-optimizing AST Interpreters. DLS '12, pages 73--82, New York, NY, USA, 2012. ACM.
    [89]
    M. Zaharia, M. Chowdhury, T. Das, A. Dave, J. Ma, M. McCauley, M. J. Franklin, S. Shenker, and I. Stoica. Resilient Distributed Datasets: A Fault-tolerant Abstraction for In-memory Cluster Computing. NSDI'12. USENIX Association.

    Cited By

    View all
    • (2024)Query Compilation Without RegretsProceedings of the ACM on Management of Data10.1145/36549682:3(1-28)Online publication date: 30-May-2024
    • (2024)WeBridge: Synthesizing Stored Procedures for Large-Scale Real-World Web ApplicationsProceedings of the ACM on Management of Data10.1145/36393192:1(1-29)Online publication date: 26-Mar-2024
    • (2024)Optimizing Nested Recursive QueriesProceedings of the ACM on Management of Data10.1145/36392712:1(1-27)Online publication date: 26-Mar-2024
    • Show More Cited By

    Recommendations

    Comments

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    SIGMOD '16: Proceedings of the 2016 International Conference on Management of Data
    June 2016
    2300 pages
    ISBN:9781450335317
    DOI:10.1145/2882903
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Sponsors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication Notes

    Badge change: Article originally badged under Version 1.0 guidelines https://www.acm.org/publications/policies/artifact-review-badging

    Publication History

    Published: 26 June 2016

    Permissions

    Request permissions for this article.

    Check for updates

    Badges

    Author Tags

    1. domain-specific languages
    2. query compilation

    Qualifiers

    • Research-article

    Funding Sources

    • ERC
    • NCCR MARVEL

    Conference

    SIGMOD/PODS'16
    Sponsor:
    SIGMOD/PODS'16: International Conference on Management of Data
    June 26 - July 1, 2016
    California, San Francisco, USA

    Acceptance Rates

    Overall Acceptance Rate 785 of 4,003 submissions, 20%

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)76
    • Downloads (Last 6 weeks)16
    Reflects downloads up to

    Other Metrics

    Citations

    Cited By

    View all
    • (2024)Query Compilation Without RegretsProceedings of the ACM on Management of Data10.1145/36549682:3(1-28)Online publication date: 30-May-2024
    • (2024)WeBridge: Synthesizing Stored Procedures for Large-Scale Real-World Web ApplicationsProceedings of the ACM on Management of Data10.1145/36393192:1(1-29)Online publication date: 26-Mar-2024
    • (2024)Optimizing Nested Recursive QueriesProceedings of the ACM on Management of Data10.1145/36392712:1(1-27)Online publication date: 26-Mar-2024
    • (2024)BIFROST: A Future Graph Database Runtime2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00448(5605-5613)Online publication date: 13-May-2024
    • (2024)Incremental Fusion: Unifying Compiled and Vectorized Query Execution2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00042(462-474)Online publication date: 13-May-2024
    • (2024)QuartetExpert Systems with Applications: An International Journal10.1016/j.eswa.2023.122841244:COnline publication date: 15-Jun-2024
    • (2023)Big Data Analytic Toolkit: A General-Purpose, Modular, and Heterogeneous Acceleration Toolkit for Data Analytical EnginesProceedings of the VLDB Endowment10.14778/3611540.361155816:12(3702-3714)Online publication date: 1-Aug-2023
    • (2023)Declarative Sub-Operators for Universal Data ProcessingProceedings of the VLDB Endowment10.14778/3611479.361153916:11(3461-3474)Online publication date: 24-Aug-2023
    • (2023)Asymptotically Better Query Optimization Using Indexed AlgebraProceedings of the VLDB Endowment10.14778/3611479.361150516:11(3018-3030)Online publication date: 24-Aug-2023
    • (2023)Bringing Compiling Databases to RISC ArchitecturesProceedings of the VLDB Endowment10.14778/3583140.358314216:6(1222-1234)Online publication date: 1-Feb-2023
    • Show More Cited By

    View Options

    Get Access

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media