research-article

Steno: automatic optimization of declarative queries

Published: 04 June 2011

Abstract

Declarative queries enable programmers to write data manipulation code without being aware of the underlying data structure implementation. By raising the level of abstraction above imperative code, they improve program readability and, crucially, create opportunities for automatic parallelization and optimization. For example, the Language Integrated Query (LINQ) extensions to C# allow the same declarative query to process both in-memory collections and datasets that are distributed across a compute cluster. However, our experiments show that, in serial execution, declarative code is several times slower than the equivalent hand-optimized code, because it is implemented using run-time abstractions (such as iterators) that incur overhead due to virtual function calls and superfluous instructions.
To address this problem, we have developed Steno, which uses a combination of novel and well-known techniques to generate code for declarative queries that is almost as efficient as hand-optimized code. Steno translates a declarative LINQ query into type-specialized, inlined and loop-based imperative code. It eliminates chains of iterators from query execution, and optimizes nested queries. We have implemented Steno for uniprocessor, multiprocessor and distributed computing platforms, and show that, for a real-world distributed job, it can almost double the speed of end-to-end execution.
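The contrast between a chain of iterators and loop-based code can be illustrated with a small sketch. It is written in Python rather than C#/LINQ, and the helper names (`where`, `select`, `sum_even_squares_iterators`, `sum_even_squares_fused`) are hypothetical. It is not Steno's generated code; the fused version is written by hand only to show the kind of loop such a translation aims for.

```python
from typing import Callable, Iterable, Iterator


def where(source: Iterable[int], predicate: Callable[[int], bool]) -> Iterator[int]:
    """Iterator-style filter: lazily yields the items that satisfy predicate."""
    for item in source:
        if predicate(item):
            yield item


def select(source: Iterable[int], transform: Callable[[int], int]) -> Iterator[int]:
    """Iterator-style map: lazily yields transform(item) for each item."""
    for item in source:
        yield transform(item)


def sum_even_squares_iterators(values: Iterable[int]) -> int:
    # Declarative style: each operator is a separate iterator, so every
    # element passes through a chain of generator resumptions and
    # indirect calls to the lambdas.
    evens = where(values, lambda x: x % 2 == 0)
    squares = select(evens, lambda x: x * x)
    return sum(squares)


def sum_even_squares_fused(values: Iterable[int]) -> int:
    # Hand-fused equivalent: one loop, with the predicate and the
    # transformation inlined and no intermediate iterators.
    total = 0
    for x in values:
        if x % 2 == 0:
            total += x * x
    return total


if __name__ == "__main__":
    data = list(range(10))
    print(sum_even_squares_iterators(data))
    print(sum_even_squares_fused(data))
```

Both functions compute the same result. The difference is that the first pays per-element dispatch costs through two layers of iterators, while the second performs the same work in a single loop body.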

    Published In

    ACM SIGPLAN Notices, Volume 46, Issue 6
    PLDI '11
    June 2011
    652 pages
    ISSN: 0362-1340
    EISSN: 1558-1160
    DOI: 10.1145/1993316

    PLDI '11: Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation
    June 2011
    668 pages
    ISBN: 9781450306638
    DOI: 10.1145/1993498
    General Chair: Mary Hall
    Program Chair: David Padua

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. abstract machines
    2. query optimization
