research-article

Steno: automatic optimization of declarative queries

Authors: Derek Gordon Murray, Yuan Yu

ACM SIGPLAN Notices, Volume 46, Issue 6

Pages 121 - 131

https://doi.org/10.1145/1993316.1993513

Published: 04 June 2011

Abstract

Declarative queries enable programmers to write data manipulation code without being aware of the underlying data structure implementation. By raising the level of abstraction above imperative code, they improve program readability and, crucially, create opportunities for automatic parallelization and optimization. For example, the Language Integrated Query (LINQ) extensions to C# allow the same declarative query to process both in-memory collections and datasets that are distributed across a compute cluster. However, our experiments show that the serial performance of declarative code is several times slower than the equivalent hand-optimized code, because it is implemented using run-time abstractions (such as iterators) that incur overhead due to virtual function calls and superfluous instructions.

To address this problem, we have developed Steno, which uses a combination of novel and well-known techniques to generate code for declarative queries that is almost as efficient as hand-optimized code. Steno translates a declarative LINQ query into type-specialized, inlined and loop-based imperative code. It eliminates chains of iterators from query execution, and optimizes nested queries. We have implemented Steno for uniprocessor, multiprocessor and distributed computing platforms, and show that, for a real-world distributed job, it can almost double the speed of end-to-end execution.
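The gap between iterator-based and loop-based execution can be illustrated with a small sketch. It is written in Python rather than C#, and it is only an analogy: the helper names (`where`, `select`, `sum_even_squares_declarative`, `sum_even_squares_fused`) are hypothetical and do not come from Steno or LINQ. The first version chains one generator per operator, roughly as an iterator-based query runtime would. The second is the kind of single fused loop a programmer might write by hand, which is the shape of code the abstract says Steno aims to generate automatically.

```python
from typing import Callable, Iterable, Iterator


def where(source: Iterable[int], predicate: Callable[[int], bool]) -> Iterator[int]:
    """Iterator-based filter: adds one generator layer to the pipeline."""
    for item in source:
        if predicate(item):
            yield item


def select(source: Iterable[int], projection: Callable[[int], int]) -> Iterator[int]:
    """Iterator-based map, chained onto the previous operator."""
    for item in source:
        yield projection(item)


def sum_even_squares_declarative(values: Iterable[int]) -> int:
    """Query written as a chain of iterators; each element passes through
    two generator frames and two indirect function calls."""
    evens = where(values, lambda x: x % 2 == 0)
    squares = select(evens, lambda x: x * x)
    return sum(squares)


def sum_even_squares_fused(values: Iterable[int]) -> int:
    """Hand-fused equivalent: one loop, predicate and projection inlined."""
    total = 0
    for x in values:
        if x % 2 == 0:
            total += x * x
    return total
```

Both functions compute the same result. The fused version avoids the per-element indirection through the iterator chain, which is the overhead the abstract attributes to run-time abstractions.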

Index Terms

Steno: automatic optimization of declarative queries
1. Software and its engineering
  1. Software notations and tools
    1. Compilers
      1. Source code generation

Published In

ACM SIGPLAN Notices Volume 46, Issue 6

PLDI '11

June 2011

652 pages

ISSN:0362-1340

EISSN:1558-1160

DOI:10.1145/1993316

PLDI '11: Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation
June 2011
668 pages
ISBN:9781450306638
DOI:10.1145/1993498
General Chair: Mary Hall (University of Utah)
Program Chair: David Padua (University of Illinois at Urbana-Champaign)

Copyright © 2011 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States


