Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
article

LEO: An autonomic query optimizer for DB2

Published: 01 January 2003 Publication History

Abstract

Structured Query Language (SQL) has emerged as an industry standard for querying relational database management systems, largely because a user need only specify what data are wanted, not the details of how to access those data. A query optimizer uses a mathematical model of query execution to determine automatically the best way to access and process any given SQL query. This model is heavily dependent upon the optimizer's estimates for the number of rows that will result at each step of the query execution plan (QEP), especially for complex queries involving many predicates and/or operations. These estimates rely upon statistics on the database and modeling assumptions that may or may not be true for a given database. In this paper, we discuss an autonomic query optimizer that automatically self-validates its model without requiring any user interaction to repair incorrect statistics or cardinality estimates. By monitoring queries as they execute, the autonomic optimizer compares the optimizer's estimates with actual cardinalities at each step in a QEP, and computes adjustments to its estimates that may be used during future optimizations of similar queries. Moreover, the detection of estimation errors can also trigger reoptimization of a query in mid-execution. The autonomic refinement of the optimizer's model can result in a reduction of query execution time by orders of magnitude at negligible additional run-time cost. We discuss various research issues and practical considerations that were addressed during our implementation of a first prototype of LEO, a LEarning Optimizer for DB2 ® (Database 2 TM ) that learns table access cardinalities and for future queries corrects the estimation error for simple predicates by adjusting the database statistics of DB2.

Supplementary Material

XML File (sj4213.xml)

References

[1]
P. G. Selinger, M. M. Astrahan, D. D. Chamberlin, R. A. Lorie, and T. G. Price, "Access Path Selection in a Relational Database Management System," Proceedings of the ACM SIGMOD International Conference on Management of Data, Boston, MA, May 1979, ACM, New York (1979), pp. 23-24.
[2]
A. Van Gelder, "Multiple Join Size Estimation by Virtual Domains," Proceedings of the Twelfth ACM Symposium on Principles of Database Systems (May 1993), pp. 180-189.
[3]
A. N. Swami and K. B. Schiefer, "On the Estimation of Join Result Sizes," 4th International Conference on Extending Database Technology (March 1994), pp. 287-300.
[4]
R. Ahad, K. V. B. Rao, and D. McLeod, "On Estimating the Cardinality of the Projection of a Database Relation," ACM Transactions on Database Systems14, No. 1, pp. 28-40 (1989).
[5]
C. Lynch, "Selectivity Estimation and Query Optimization in Large Databases with Highly Skewed Distributions of Column Values," Proceedings of the 14th International Conference on Very Large Databases (August 1988), pp. 240-251.
[6]
Y. E. Ioannidis and S. Christodoulakis, "On the Propagation of Errors in the Size of Join Results," Proceedings of the ACM SIGMOD International Conference on Management of Data, Denver, CO, May 1991, ACM, New York (1991), pp. 268-277.
[7]
V. Poosala, Y. Ioannidis, P. Haas, and E. Shekita, "Improved Histograms for Selectivity Estimation of Range Predicates," Proceedings of the ACM SIGMOD International Conference on Management of Data, Montreal, Canada, June 1996, ACM, New York (1996), pp. 294-305.
[8]
V. Poosala and Y. Ioannidis, "Selectivity Estimation Without the Attribute Value Independence Assumption," Proceedings of the 23rd International Conference on Very Large Databases (VLDB 1997).
[9]
P. Haas, J. Naughton, S. Seshadri, and A. Swami, Selectivity and Cost Estimation for Joins Based on Random Sampling, Research Report RJ-9577, IBM Thomas J. Watson Research Center, Yorktown Heights, NY 10598 (1993).
[10]
T. Urhan, M. J. Franklin, and L. Amsaleg, "Cost-Based Query Scrambling for Initial Delays," Proceedings of the ACM SIGMOD International Conference on Management of Data, Seattle, WA, June 1998, ACM, New York (1998), pp. 130-141.
[11]
M. Stillger, G. Lohman, V. Markl, and M. Kandil, "LEO--DB2's Learning Optimizer," Proceedings of the 27th International Conference on Very Large Databases (September 2001), pp. 19-28.
[12]
N. Kabra and D. DeWitt, "Efficient Mid-Query Re-Optimization of Sub-Optimal Query Execution Plans," Proceedings of the ACM SIGMOD International Conference on Management of Data (June 1998), pp. 106-117.
[13]
N. Roussopoulos, "Materialized Views and Data Warehouses," SIGMOD Record27, No. 1, 21-26, ACM, New York (1998).
[14]
M. Zaharioudakis, R. Cochrane, G. Lapis, H. Pirahesh, and M. Urata, "Answering Complex SQL Queries Using Automatic Summary Tables," Proceedings of the ACM SIGMOD International Conference on Management of Data, Dallas, TX, May 2000, ACM, New York (2000), pp. 105-116.
[15]
S. Chaudhuri, R. Krishnamurthy, S. Potamianos, and K. Shim, "Optimizing Queries with Materialized Views," Proceedings of the Eleventh International Conference on Data Engineering (March 1995), pp. 190-220.
[16]
R. Chirkova, A. Y. Halevy, and D. Suciu, "A Formal Perspective on the View Selection Problem," Proceedings of the 27th International Conference on Very Large Databases (September 2001), pp. 59-68.
[17]
S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi, "Optimization by Simulated Annealing," Science220, No. 4598, 671-680 (May 1983).
[18]
V. Markl and G. M. Lohman, "Learning Table Access Cardinalities with LEO," Proceedings of the ACM SIGMOD International Conference on Management of Data, Madison, WI, June 2002, ACM, New York (2002), p. 613.

Cited By

View all
  • (2024)Modeling Shifting Workloads for Learned Database SystemsProceedings of the ACM on Management of Data10.1145/36392932:1(1-27)Online publication date: 26-Mar-2024
  • (2023)Simple Adaptive Query Processing vs. Learned Query Optimizers: Observations and AnalysisProceedings of the VLDB Endowment10.14778/3611479.361150116:11(2962-2975)Online publication date: 24-Aug-2023
  • (2022)SQL Query Optimization in Distributed NoSQL Databases for Cloud-Based ApplicationsAlgorithmic Aspects of Cloud Computing10.1007/978-3-031-33437-5_2(21-41)Online publication date: 6-Sep-2022
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image IBM Systems Journal
IBM Systems Journal  Volume 42, Issue 1
January 2003
195 pages

Publisher

IBM Corp.

United States

Publication History

Published: 01 January 2003

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)0
  • Downloads (Last 6 weeks)0
Reflects downloads up to 13 Sep 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Modeling Shifting Workloads for Learned Database SystemsProceedings of the ACM on Management of Data10.1145/36392932:1(1-27)Online publication date: 26-Mar-2024
  • (2023)Simple Adaptive Query Processing vs. Learned Query Optimizers: Observations and AnalysisProceedings of the VLDB Endowment10.14778/3611479.361150116:11(2962-2975)Online publication date: 24-Aug-2023
  • (2022)SQL Query Optimization in Distributed NoSQL Databases for Cloud-Based ApplicationsAlgorithmic Aspects of Cloud Computing10.1007/978-3-031-33437-5_2(21-41)Online publication date: 6-Sep-2022
  • (2022)Query Optimization in NoSQL Databases Using an Enhanced Localized R-tree IndexInformation Integration and Web Intelligence10.1007/978-3-031-21047-1_33(391-398)Online publication date: 28-Nov-2022
  • (2021)COMPASS: Online Sketch-based Query Optimization for In-Memory DatabasesProceedings of the 2021 International Conference on Management of Data10.1145/3448016.3452840(804-816)Online publication date: 9-Jun-2021
  • (2017)Optimization of Disjunctive Predicates for Main Memory Column StoresProceedings of the 2017 ACM International Conference on Management of Data10.1145/3035918.3064022(731-744)Online publication date: 9-May-2017
  • (2016)Optimization of conjunctive predicates for main memory column storesProceedings of the VLDB Endowment10.14778/2994509.29945299:12(1125-1136)Online publication date: 1-Aug-2016
  • (2015)Organic databasesInternational Journal of Computational Science and Engineering10.1504/IJCSE.2015.07265111:3(270-283)Online publication date: 1-Oct-2015
  • (2015)Workload-Driven Antijoin Cardinality EstimationACM Transactions on Database Systems10.1145/281817840:3(1-41)Online publication date: 23-Oct-2015
  • (2012)ChimeraProceedings of the 21st USENIX conference on Security symposium10.5555/2362793.2362812(19-19)Online publication date: 8-Aug-2012
  • Show More Cited By

View Options

View options

Get Access

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media