research-article

SkinnerDB: regret-bounded query evaluation via reinforcement learning

Authors:

Immanuel Trummer,

Samuel Moseley,

Joseph AntonakakisAuthors Info & Claims

Proceedings of the VLDB Endowment, Volume 11, Issue 12

Pages 2074 - 2077

https://doi.org/10.14778/3229863.3236263

Published: 01 August 2018 Publication History

Abstract

Robust query optimization becomes illusory in the presence of correlated predicates or user-defined functions. Occasionally, the query optimizer will choose join orders whose execution time is by many orders of magnitude higher than necessary. We present SkinnerDB, a novel database management system that is designed from the ground up for reliable optimization and robust performance.

SkinnerDB implements several adaptive query processing strategies based on reinforcement learning. We divide the execution of a query into small time periods in which different join orders are executed. Thereby, we converge to optimal join orders with regret bounds, meaning that the expected difference between actual execution time and time for an optimal join order is bounded. To the best of our knowledge, our execution strategies are the first to provide comparable formal guarantees. SkinnerDB can be used as a layer on top of any existing database management system. We use optimizer hints to force existing systems to try out different join orders, carefully restricting execution time per join order and data batch via timeouts. We choose timeouts according to an iterative scheme that balances execution time over different timeouts to guarantee bounded regret. Alternatively, SkinnerDB can be used as a standalone, featuring an execution engine that is tailored to the requirements of join order learning. In particular, we use a specialized multi-way join algorithm and a concise tuple representation to facilitate fast switches between join orders. In our demonstration, we let participants experiment with different query types and databases. We visualize the learning process and compare against baselines.

References

[1]

A. Aboulnaga, P. Haas, M. Kandil, S. Lightstone, G. Lohman, V. Markl, I. Popivanov, and V. Raman. Automated statistics collection in DB2 UDB. In PVLDB, pages 1169--1180, 2004.

Digital Library

[2]

M. Akdere and U. Cetintemel. Learning-based query performance modeling and predection. In ICDE, pages 390--401, 2011.

Digital Library

[3]

K. H. Alyoubi. Database query optimisation based on measures of regret. PhD thesis, 2016.

[4]

K. H. Alyoubi, S. Helmer, and P. T. Wood. Ordering selection operators under partial ignorance. In CIKM, pages 1521--1530, 2015.

Digital Library

[5]

R. Avnur and J. Hellerstein. Eddies: continuously adaptive query processing. In SIGMOD, pages 261--272, 2000.

Digital Library

[6]

B. Babcock and S. Chaudhuri. Towards a robust query optimizer: a principled and practical approach. In SIGMOD, pages 119--130, 2005.

Digital Library

[7]

S. Babu, P. Bizarro, and D. DeWitt. Proactive re-optimization. In SIGMOD, pages 107--118, 2005.

Digital Library

[8]

P. Boncz, K. M.L., and S. Mangegold. Breaking the memory wall in MonetDB. CACM, 51(12):77--85, 2008.

Digital Library

[9]

N. Bruno and S. Chaudhuri. Exploiting statistics on query expressions for optimization. In SIGMOD, pages 263--274, 2002.

Digital Library

[10]

S. Chaudhuri and V. Narasayya. Automating statistics management for query optimizers. In ICDE, pages 7--20, 2001.

Digital Library

[11]

H. D., P. N. Darera, and J. R. Haritsa. Identifying robust plans through plan diagram reduction. PVLDB, 1(1):1124--1140, 2008.

Digital Library

[12]

A. Deshpande. An initial study of overheads of eddies. SIGMOD Record, 33(1):44--49, 2004.

Digital Library

[13]

A. Deshpande, Z. Ives, and V. Raman. Adaptive Query Processing. Foundations and Trends® in, 1(1):1--140, 2006.

Digital Library

[14]

J. Duggan, U. Cetintemel, O. Papaemmanouil, and E. Upfal. Performance prediction for concurrent database workloads. In SIGMOD, pages 337--348, 2011.

Digital Library

[15]

A. Dutt. QUEST : An exploratory approach to robust query processing. PVLDB, 7(13):5--8, 2014.

Digital Library

[16]

A. Dutt and J. Haritsa. Plan bouquets: query processing without selectivity estimation. In SIGMOD, pages 1039--1050, 2014.

Digital Library

[17]

A. El-Helw, I. F. Ilyas, and C. Zuzarte. StatAdvisor: recommending statistical views. PVLDB, 2(2):1306--1317, 2009.

Digital Library

[18]

S. Ewen, M. Ortega-Binderberger, and V. Markl. A learning optimizer for a federated database management system. Informatik - Forschung und Entwicklung, 20(3):138--151, 2005.

[19]

A. Ganapathi, H. Kuno, U. Dayal, J. L. Wiener, A. Fox, M. Jordan, and D. Patterson. Predicting multiple metrics for queries - better decisions enabled by machine learning. In ICDE, pages 592--603, 2009.

Digital Library

[20]

S. Gelly, L. Kocsis, and M. Schoenauer. The grand challenge of computer go: monte carlo tree search and extensions. Communications of the ACM, 3:106--113, 2012.

Digital Library

[21]

A. Gubichev, P. Boncz, A. Kemper, and T. Neumann. How good are query optimizers, really? PVLDB, 9(3):204--215, 2015.

Digital Library

[22]

P. Haas and A. Swami. Sampling-based selectivity estimation for joins using augmented frequent value statistics. In ICDE, pages 522--531, 2011.

Digital Library

[23]

P. J. Haas and A. N. Swami. Sequential sampling procedures for query size estimation. SIGMOD Rec., 21(2):341--350, 1992.

Digital Library

[24]

L. P. Kaelbling, M. L. Littman, and A. W. Moore. Reinforcement learning: a survey. JAIR, 4:237--285, 1996.

[25]

K. Karanasos, A. Balmin, M. Kutsch, F. Ozcan, V. Ercegovac, C. Xia, and J. Jackson. Dynamically optimizing queries over large scale data platforms. In SIGMOD, pages 943--954, 2014.

Digital Library

[26]

L. Kocsis and C. Szepesvári. Bandit based monte-carlo planning. In European Conf. on Machine Learning, pages 282--293, 2006.

Digital Library

[27]

J. Li, A. C. König, V. R. Narasayya, and S. Chaudhuri. Robust estimation of resource consumption for SQL queries using statistical techniques. PVLDB, 5(11):1555--1566, 2012.

Digital Library

[28]

R. J. Lipton, J. F. Naughton, and D. A. Schneider. Practical selectivity estimation through adaptive sampling. In SIGMOD, pages 1--11, 1990.

Digital Library

[29]

T. Malik and R. Burns. A black-box approach to query cardinality estimation. In CIDR, pages 56--67, 2007.

[30]

T. M. T. Malik, R. B. R. Burns, N. V. C. N. V. Chawla, and A. S. A. Szalay. Estimating query result sizes for proxy caching in scientific database federations. In ACM/IEEE SC 2006 Conference (SC'06), pages 102--115, 2006.

Digital Library

[31]

T. Neumann and C. Galindo-Legaria. Taking the edge off cardinality estimation errors using incremental execution. In BTW, pages 73--92, 2013.

[32]

H. Q. Ngo, E. Porat, and C. Ré. Worst-case optimal join algorithms. In PODS, pages 37--48, 2012.

Digital Library

[33]

A. Pavlo, G. Angulo, J. Arulraj, H. Lin, J. Lin, L. Ma, P. Menon, T. C. Mowry, M. Perron, I. Quah, S. Santurkar, A. Tomasic, S. Toor, D. V. Aken, Z. Wang, Y. Wu, R. Xian, and T. Zhang. Self-driving database management systems. In CIDR, 2017.

[34]

A. D. Popescu, A. Balmin, V. Ercegovac, and A. Ailamaki. PREDIcT: towards predicting the runtime of large scale iterative analytics. PVLDB, 6(14):1678--1689, 2013.

Digital Library

[35]

PostgreSQL Global Development Group. PostgreSQL. https://www.postgresql.org/, 2017.

[36]

L. Quanzhong, S. Minglong, V. Markl, K. Beyer, L. Colby, and G. Lohman. Adaptively reordering joins during query execution. In ICDE, pages 26--35, 2007.

[37]

V. R. V. Raman, A. Deshpande, and J. Hellerstein. Using state modules for adaptive query processing. In ICDE, pages 353--364, 2003.

[38]

M. Stillger, G. M. Lohman, V. Markl, and M. Kandil. LEO -DB2's LEarning Optimizer. In PVLDB, pages 19--28, 2001.

Digital Library

[39]

TPC. TPC-H Benchmark, 2013.

[40]

K. Tzoumas, T. Sellis, and C. S. Jensen. A reinforcement learning approach for adaptive query processing. Technical report, 2008.

[41]

S. D. Viglas, J. F. Naughton, and J. Burger. Maximizing the output rate of multi-way join queries over streaming information sources. In PVLDB, pages 285--296, 2003.

Digital Library

[42]

W. Wu, J. F. Naughton, and H. Singh. Sampling-based query re-optimization. In SIGMOD, pages 1721--1736, 2016.

Digital Library

Cited By

Chu DPanchapakesan RLaddad SKatahanas LLiu CShivakumar KCrooks NHellerstein JHoward H(2024)Optimizing Distributed Protocols with Query RewritesProceedings of the ACM on Management of Data10.1145/36392572:1(1-25)Online publication date: 26-Mar-2024
https://dl.acm.org/doi/10.1145/3639257
Wu ZMarcus RLiu ZNegi PNathan VPfeil PSaxena GRahman MNarayanaswamy BKraska TBarcelo PSanchez-Pi NMeliou ASudarshan S(2024)Stage: Query Execution Time Prediction in Amazon RedshiftCompanion of the 2024 International Conference on Management of Data10.1145/3626246.3653391(280-294)Online publication date: 9-Jun-2024
https://dl.acm.org/doi/10.1145/3626246.3653391
Zhang YChronis YPatel JRekatsinas T(2023)Simple Adaptive Query Processing vs. Learned Query Optimizers: Observations and AnalysisProceedings of the VLDB Endowment10.14778/3611479.361150116:11(2962-2975)Online publication date: 24-Aug-2023
https://dl.acm.org/doi/10.14778/3611479.3611501
Show More Cited By

Index Terms

SkinnerDB: regret-bounded query evaluation via reinforcement learning
1. Information systems
  1. Data management systems
    1. Database management system engines
      1. Database query processing
2. Theory of computation
  1. Theory and algorithms for application domains
    1. Database theory
      1. Database query processing and optimization (theory)

Index terms have been assigned to the content through auto-classification.

Recommendations

SkinnerDB: Regret-Bounded Query Evaluation via Reinforcement Learning
SIGMOD '19: Proceedings of the 2019 International Conference on Management of Data

SkinnerDB is designed from the ground up for reliable join ordering. It maintains no data statistics and uses no cost or cardinality models. Instead, it uses reinforcement learning to learn optimal join orders on the fly, during the execution of the ...
SkinnerDB: Regret-bounded Query Evaluation via Reinforcement Learning
SkinnerDB uses reinforcement learning for reliable join ordering, exploiting an adaptive processing engine with specialized join algorithms and data structures. It maintains no data statistics and uses no cost or cardinality models. Also, it uses no ...
Equivalence and minimization of conjunctive queries under combined semantics
ICDT '12: Proceedings of the 15th International Conference on Database Theory

The problems of query containment, equivalence, and minimization are fundamental problems in the context of query processing and optimization. In their classic work [2] published in 1977, Chandra and Merlin solved the three problems for the language of ...

Comments

Information & Contributors

Information

Published In

cover image Proceedings of the VLDB Endowment

Proceedings of the VLDB Endowment Volume 11, Issue 12

August 2018

426 pages

ISSN:2150-8097

Editors:
Sihem Amer-Yahia
University of Grenoble Alpes, CNRS
,
Jian Pei
Simon Fraser University

Issue’s Table of Contents

Publisher

VLDB Endowment

Publication History

Published: 01 August 2018

Published in PVLDB Volume 11, Issue 12

Qualifiers

Research-article

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

17
Total Citations
View Citations
266
Total Downloads

Downloads (Last 12 months)41
Downloads (Last 6 weeks)11

Reflects downloads up to 03 Oct 2024

Other Metrics

View Author Metrics

Citations

Cited By

Chu DPanchapakesan RLaddad SKatahanas LLiu CShivakumar KCrooks NHellerstein JHoward H(2024)Optimizing Distributed Protocols with Query RewritesProceedings of the ACM on Management of Data10.1145/36392572:1(1-25)Online publication date: 26-Mar-2024
https://dl.acm.org/doi/10.1145/3639257
Wu ZMarcus RLiu ZNegi PNathan VPfeil PSaxena GRahman MNarayanaswamy BKraska TBarcelo PSanchez-Pi NMeliou ASudarshan S(2024)Stage: Query Execution Time Prediction in Amazon RedshiftCompanion of the 2024 International Conference on Management of Data10.1145/3626246.3653391(280-294)Online publication date: 9-Jun-2024
https://dl.acm.org/doi/10.1145/3626246.3653391
Zhang YChronis YPatel JRekatsinas T(2023)Simple Adaptive Query Processing vs. Learned Query Optimizers: Observations and AnalysisProceedings of the VLDB Endowment10.14778/3611479.361150116:11(2962-2975)Online publication date: 24-Aug-2023
https://dl.acm.org/doi/10.14778/3611479.3611501
Mozafari BBurcuta RCabrera AConstantin AFrancis DGrömling DJindal AKonkolowicz MMarian Spac VPark YCarranzo RRichardson NRoy ASrivastava ATarte IWestphal BZhang CDas SPandis ISelçuk Candan KAmer-Yahia S(2023)Making Data Clouds Smarter at Keebo: Automated Warehouse Optimization using Data LearningCompanion of the 2023 International Conference on Management of Data10.1145/3555041.3589681(239-251)Online publication date: 4-Jun-2023
https://dl.acm.org/doi/10.1145/3555041.3589681
Saxena GRahman MChainani NLin CCaragea GChowdhury FMarcus RKraska TPandis INarayanaswamy BDas SPandis ISelçuk Candan KAmer-Yahia S(2023)Auto-WLM: Machine Learning Enhanced Workload Management in Amazon RedshiftCompanion of the 2023 International Conference on Management of Data10.1145/3555041.3589677(225-237)Online publication date: 4-Jun-2023
https://dl.acm.org/doi/10.1145/3555041.3589677
Perera ROetomo BRubinstein BBorovica-Gajic R(2022)HMABProceedings of the VLDB Endowment10.14778/3565816.356582416:2(216-229)Online publication date: 1-Oct-2022
https://dl.acm.org/doi/10.14778/3565816.3565824
Skitsas KPapageorgiou ITalebi MKantere VKatehakis MKarras P(2022)SIFTERProceedings of the VLDB Endowment10.14778/3561261.356126916:1(90-98)Online publication date: 16-Nov-2022
https://dl.acm.org/doi/10.14778/3561261.3561269
Marcus RNegi PMao HTatbul NAlizadeh MKraska T(2022)BaoACM SIGMOD Record10.1145/3542700.354270351:1(6-13)Online publication date: 1-Jun-2022
https://dl.acm.org/doi/10.1145/3542700.3542703
Sharma VDyreson CFlann N(2021)MANTIS: Multiple Type and Attribute Index Selection using Deep Reinforcement LearningProceedings of the 25th International Database Engineering & Applications Symposium10.1145/3472163.3472176(56-64)Online publication date: 14-Jul-2021
https://dl.acm.org/doi/10.1145/3472163.3472176
Marcus RNegi PMao HTatbul NAlizadeh MKraska TLi GLi ZIdreos SSrivastava D(2021)Bao: Making Learned Query Optimization PracticalProceedings of the 2021 International Conference on Management of Data10.1145/3448016.3452838(1275-1288)Online publication date: 9-Jun-2021
https://dl.acm.org/doi/10.1145/3448016.3452838
Show More Cited By

View Options

Get Access

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Media

Figures

Other

Tables

View Issue’s Table of Contents