research-article

Open access

SkinnerDB: Regret-bounded Query Evaluation via Reinforcement Learning

Authors:

Immanuel Trummer,

Samuel Moseley,

Joseph Antonakakis,

Ankush RayabhariAuthors Info & Claims

ACM Transactions on Database Systems (TODS), Volume 46, Issue 3

Article No.: 9, Pages 1 - 45

https://doi.org/10.1145/3464389

Published: 28 September 2021 Publication History

All formats PDF

Abstract

SkinnerDB uses reinforcement learning for reliable join ordering, exploiting an adaptive processing engine with specialized join algorithms and data structures. It maintains no data statistics and uses no cost or cardinality models. Also, it uses no training workloads nor does it try to link the current query to seemingly similar queries in the past. Instead, it uses reinforcement learning to learn optimal join orders from scratch during the execution of the current query. To that purpose, it divides the execution of a query into many small time slices. Different join orders are tried in different time slices. SkinnerDB merges result tuples generated according to different join orders until a complete query result is obtained. By measuring execution progress per time slice, it identifies promising join orders as execution proceeds.

Along with SkinnerDB, we introduce a new quality criterion for query execution strategies. We upper-bound expected execution cost regret, i.e., the expected amount of execution cost wasted due to sub-optimal join order choices. SkinnerDB features multiple execution strategies that are optimized for that criterion. Some of them can be executed on top of existing database systems. For maximal performance, we introduce a customized execution engine, facilitating fast join order switching via specialized multi-way join algorithms and tuple representations.

We experimentally compare SkinnerDB’s performance against various baselines, including MonetDB, Postgres, and adaptive processing methods. We consider various benchmarks, including the join order benchmark, TPC-H, and JCC-H, as well as benchmark variants with user-defined functions. Overall, the overheads of reliable join ordering are negligible compared to the performance impact of the occasional, catastrophic join order choice.

Supplementary Material

trummer (trummer.zip)

Supplemental movie, appendix, image and software files for, SkinnerDB: Regret-bounded Query Evaluation via Reinforcement Learning

Download
36.02 KB

References

[1]

A. Aboulnaga, P. Haas, M. Kandil, S. Lightstone, G. Lohman, V. Markl, I. Popivanov, and V. Raman. 2004. Automated statistics collection in DB2 UDB. In PVLDB. 1169–1180.

[2]

Mert Akdere and Ugur Cetintemel. 2011. Learning-based query performance modeling and predection. In ICDE. 390–401.

[3]

Khaled Hamed Alyoubi. 2016. Database Query Optimisation Based on Measures of Regret. Ph.D. Dissertation. Birkbeck University of London.

[4]

Khaled H. Alyoubi, Sven Helmer, and Peter T. Wood. 2015. Ordering selection operators under partial ignorance. In CIKM. 1521–1530.

[5]

Ron Avnur and J. M. Hellerstein. 2000. Eddies: Continuously adaptive query processing. In SIGMOD. 261–272.

[6]

Brian Babcock and S. Chaudhuri. 2005. Towards a robust query optimizer: A principled and practical approach. In SIGMOD. 119–130. Retrieved from http://dl.acm.org/citation.cfm?id=1066172.

[7]

Shivnath Babu, Pedro Bizarro, and David DeWitt. 2005. Proactive re-optimization. In SIGMOD. 107–118.

[8]

Peter Boncz, Angelos Christos Anatiotis, and Steffen Kläbe. 2018. JCC-H: Adding join crossing correlations with skew to TPC-H. LNCS 10661 (2018), 103–119.

[9]

Nicolas Bruno and Surajit Chaudhuri. 2002. Exploiting statistics on query expressions for optimization. In SIGMOD. 263–274.

[10]

Surajit Chaudhuri and Vivek Narasayya. 2001. Automating statistics management for query optimizers. In ICDE. 7–20.

[11]

Sophie Cluet and Guido Moerkotte. 1995. On the complexity of generating optimal left-deep processing trees with cross products. In ICDT. 54–67.

[12]

Anne Condon, Amol Deshpande, Lisa Hellerstein, and Ning Wu. 2009. Algorithms for distributional and adversarial pipelined filter ordering problems. ACM Trans. Algor. 5, 2 (2009), 1–34.

Digital Library

[13]

Pierre-Arnaud Coquelin and Rémi Munos. 2007. Bandit algorithms for tree search. In Uncertainty in Artificial Intelligence. AUAI Press, 67–74. arxiv:arXiv:cs/0703062v1.

[14]

D. Harish, Pooja N. Darera, and Jayant R. Haritsa. 2008. Identifying robust plans through plan diagram reduction. PVLDB 1, 1 (2008), 1124–1140. Retrieved from http://dl.acm.org/citation.cfm?id=1453976.

Digital Library

[15]

Amol Deshpande. 2004. An initial study of overheads of eddies. SIGMOD Rec. 33, 1 (2004), 44–49.

Digital Library

[16]

Amol Deshpande, Zachary Ives, and Vijayshankar Raman. 2006. Adaptive query processing. Found. Trends Datab. 1, 1 (2006), 1–140.

[17]

Carmel Domshlak and Zohar Feldman. 2013. To UCT, or not to UCT? In SoCS. 1–8.

[18]

Jennie Duggan, Ugur Cetintemel, Olga Papaemmanouil, and Eli Upfal. 2011. Performance prediction for concurrent database workloads. In SIGMOD. 337–348.

[19]

Anshuman Dutt. 2014. QUEST: An exploratory approach to robust query processing. PVLDB 7, 13 (2014), 5–8.

[20]

Anshuman Dutt and Jayant Haritsa. 2014. Plan bouquets: Query processing without selectivity estimation. In SIGMOD. 1039–1050.

[21]

Amr El-Helw, Ihab F. Ilyas, and Calisto Zuzarte. 2009. StatAdvisor: Recommending statistical views. PVLDB 2, 2 (2009), 1306–1317.

Digital Library

[22]

Zohar Feldman and Carmel Domshlak. 2014. Simple regret optimization in online planning for Markov decision processes. J. Artif. Intell. Res. 51 (2014), 165–205.

[23]

Archana Ganapathi, Harumi Kuno, Umeshwar Dayal, Janet L. Wiener, Armando Fox, Michael Jordan, and David Patterson. 2009. Predicting multiple metrics for queries—Better decisions enabled by machine learning. In ICDE. 592–603.

[24]

Sylvain Gelly, L. Kocsis, and Marc Schoenauer. 2012. The grand challenge of computer go: Monte Carlo tree search and extensions. Commun. ACM 3 (2012), 106–113.

Digital Library

[25]

Andrey Gubichev, Peter Boncz, Alfons Kemper, and Thomas Neumann. 2015. How good are query optimizers, really?PVLDB 9, 3 (2015), 204–215.

[26]

P. J. Haas and A. N. Swami. 2011. Sampling-based selectivity estimation for joins using augmented frequent value statistics. In ICDE. 522–531.

[27]

Peter J. Haas and Arun N. Swami. 1992. Sequential sampling procedures for query size estimation. SIGMOD Rec. 21, 2 (1992), 341–350.

Digital Library

[28]

Benjamin Hilprecht, Andreas Schmidt, Moritz Kulessa, Alejandro Molina, Kristian Kersting, and Carsten Binnig. 2019. DeepDB: Learn from data, not from queries!PVLDB 13, 7 (2019), 992–1005.

[29]

Konstantinos Karanasos, Andrey Balmin, Marcel Kutsch, Fatma Ozcan, Vuk Ercegovac, Chunyang Xia, and Jesse Jackson. 2014. Dynamically optimizing queries over large scale data platforms. In SIGMOD. 943–954.

[30]

Andreas Kipf, Thomas Kipf, Bernhard Radke, Viktor Leis, Peter Boncz, and Alfons Kemper. 2018. Learned cardinalities: Estimating correlated joins with deep learning. In CIDR. Retrieved from http://arxiv.org/abs/1809.00677.

[31]

Levente Kocsis and C. Szepesvári. 2006. Bandit based monte-carlo planning. In ECML. 282–293.

[32]

Sanjay Krishnan, Zongheng Yang, Ken Goldberg, Joseph Hellerstein, and Ion Stoica. 2018. Learning to optimize join queries with deep reinforcement learning. Retrieved from http://arxiv.org/abs/1808.03196.

[33]

Guoliang Li, Xuanhe Zhou, Shifu Li, and Bo Gao. 2018. QTune: A queryaware database tuning system with deep reinforcement learning. PVLDB 12, 12 (2018), 2118–2130.

Digital Library

[34]

Jiexing Li, Arnd Christian König, Vivek R. Narasayya, and Surajit Chaudhuri. 2012. Robust estimation of resource consumption for SQL queries using statistical techniques. PVLDB 5, 11 (2012), 1555–1566. Retrieved from http://dl.acm.org/citation.cfm?id=2350229.2350269.

Digital Library

[35]

Richard J. Lipton, Jeffrey F. Naughton, and Donovan A. Schneider. 1990. Practical selectivity estimation through adaptive sampling. In SIGMOD. 1–11.

[36]

Guy Lohman. 2014. Is query optimization a “solved” problem?SIGMOD Blog (2014). https://wp.sigmod.org/?p=1075

[37]

Ryan Marcus, Parimarjan Negi, Hongzi Mao, Chi Zhang, Mohammad Alizadeh, Tim Kraska, Olga Papaemmanouil, and Nesime Tatbul. 2018. Neo: A learned query optimizer. PVLDB 12, 11 (2018), 1705–1718.

Digital Library

[38]

Ryan Marcus and Olga Papaemmanouil. 2018. Deep reinforcement learning for join order enumeration. In aiDM. 3. Retrieved from arxiv:arXiv:1803.00055v2.

[39]

Thomas Neumann and Cesar Galindo-Legaria. 2013. Taking the edge off cardinality estimation errors using incremental execution. In BTW. 73–92.

[40]

Thomas Neumann and Alfons Kemper. 2015. Unnesting arbitrary queries. In BTW. 383–402.

[41]

Hung Q. Ngo, Ely Porat, and Christopher Ré. 2012. Worst-case optimal join algorithms. In PODS. 37–48.

[42]

Jennifer Ortiz, Magdalena Balazinska, Johannes Gehrke, and S. Sathiya Keerthi. 2019. An empirical analysis of deep learning for cardinality estimation. Retrieved from http://arxiv.org/abs/1905.06425.

[43]

Yongjoo Park, Shucheng Zhong, and Barzan Mozafari. 2020. QuickSel: Quick selectivity learning with mixture models. In SIGMOD. 1017–1033.

[44]

Adrian Daniel Popescu, Andrey Balmin, Vuk Ercegovac, and Anastasia Ailamaki. 2013. PREDIcT: Towards predicting the runtime of large scale iterative analytics. PVLDB 6, 14 (2013), 1678–1689.

Digital Library

[45]

Li Quanzhong, Shao Minglong, Volker Markl, Kevin Beyer, Latha Colby, and Guy Lohman. 2007. Adaptively reordering joins during query execution. In ICDE. 26–35.

[46]

Vijayshankar Raman, A. Deshpande, and J. M. Hellerstein. 2003. Using state modules for adaptive query processing. In ICDE. 353–364.

[47]

P. G. G. Selinger, M. M. M. Astrahan, D. D. Chamberlin, R. A. Lorie, and T. G. Price. 1979. Access path selection in a relational database management system. In SIGMOD. 23–34.

[48]

David Silver. 2009. Reinforcement Learning and Simulation-based Search in Computer Go. Ph.D. Dissertation. University of Alberta.

[49]

Michael Stillger, Guy M. Lohman, Volker Markl, and Mokhtar Kandil. 2001. LEO - DB2’s LEarning optimizer. In PVLDB. VLDB, 19–28. http://www.vldb.org/conf/2001/P019.pdf.

[50]

Ji Sun and Guoliang Li. 2020. An end-to-end learning-based cost estimator. In VLDBJ, Vol. 13. 307–319.

[51]

TPC. 2013. TPC-H Benchmark. Retrieved from http://www.tpc.org/tpch/.

[52]

Immanuel Trummer, Junxiong Wang, Deepak Maram, Samuel Moseley, Saehan Jo, and Joseph Antonakakis. 2019. SkinnerDB: Regret-bounded query evaluation via reinforcement learning. In SIGMOD. 1039–1050.

[53]

Kostas Tzoumas, Timos Sellis, and Christian S. Jensen. 2008. A Reinforcement Learning Approach for Adaptive Query Processing. Technical Report. Aalborg University.

[54]

Todd L. Veldhuizen. 2012. Leapfrog Triejoin: A worst-case optimal join algorithm.

[55]

Stratis D. Viglas, Jeffrey F. Naughton, and Josef Burger. 2003. Maximizing the output rate of multi-way join queries over streaming information sources. In PVLDB. 285–296. Retrieved from http://dl.acm.org/citation.cfm?id=1315451.1315477

[56]

Lucas Woltmann, Claudio Hartmann, Maik Thiele, and Dirk Habich. 2019. Cardinality estimation with local deep learning models. In Proceedings of the Second International Workshop on Exploiting Artificial Intelligence Techniques for Data Management (aiDM). 1–8.

Digital Library

[57]

Wentao Wu, Jeffrey F. Naughton, and Harneet Singh. 2016. Sampling-based query re-optimization. In SIGMOD. 1721–1736.

[58]

Xiang Yu, Guoliang Li, Chengliang Chai, and Nan Tang. 2020. Reinforcement learning with tree-LSTM for join order selection. In ICDE. 1297–1308.

[59]

Ji Zhang, Yu Liu, Ke Zhou, Guoliang Li, Zhili Xiao, Bin Cheng, Jiashu Xing, Yangtao Wang, Tianheng Cheng, Li Liu, Minwei Ran, and Zekang Li. 2019. An end-to-end automatic cloud database tuning system using deep reinforcement learning. In SIGMOD. 415–432.

Cited By

Lao JWang YLi YWang JZhang YCheng ZChen WTang MWang J(2024)GPTuner: A Manual-Reading Database Tuning System via GPT-Guided Bayesian OptimizationProceedings of the VLDB Endowment10.14778/3659437.365944917:8(1939-1952)Online publication date: 31-May-2024
https://dl.acm.org/doi/10.14778/3659437.3659449
Justen DRitter DFraser CLamb ALee ABodner THaddad MZeuch SMarkl VBoehm M(2024)POLAR: Adaptive and Non-invasive Join Order Selection via Plans of Least ResistanceProceedings of the VLDB Endowment10.14778/3648160.364817517:6(1350-1363)Online publication date: 3-May-2024
https://dl.acm.org/doi/10.14778/3648160.3648175
Wei ZTrummer I(2024)ROME: Robust Query Optimization via Parallel Multi-Plan ExecutionProceedings of the ACM on Management of Data10.1145/36549732:3(1-25)Online publication date: 30-May-2024
https://dl.acm.org/doi/10.1145/3654973
Show More Cited By

Index Terms

SkinnerDB: Regret-bounded Query Evaluation via Reinforcement Learning
1. Information systems
  1. Data management systems
    1. Database management system engines
      1. Database query processing
        Join algorithms
        Query optimization
        Query planning

Recommendations

SkinnerDB: Regret-Bounded Query Evaluation via Reinforcement Learning
SIGMOD '19: Proceedings of the 2019 International Conference on Management of Data

SkinnerDB is designed from the ground up for reliable join ordering. It maintains no data statistics and uses no cost or cardinality models. Instead, it uses reinforcement learning to learn optimal join orders on the fly, during the execution of the ...
SkinnerDB: regret-bounded query evaluation via reinforcement learning

Robust query optimization becomes illusory in the presence of correlated predicates or user-defined functions. Occasionally, the query optimizer will choose join orders whose execution time is by many orders of magnitude higher than necessary. We ...
Cloud Query Processing with Reinforcement Learning-Based Multi-objective Re-optimization
Model and Data Engineering
Abstract
Query processing on cloud database systems is a challenging problem due to the dynamic cloud environment. The configuration and utilization of the distributed hardware used to process queries change continuously. A query optimizer aims to generate ...

Comments

Information & Contributors

Information

Published In

cover image ACM Transactions on Database Systems

ACM Transactions on Database Systems Volume 46, Issue 3

September 2021

172 pages

ISSN:0362-5915

EISSN:1557-4644

DOI:10.1145/3481695

Editor:
Christopher Jermaine
Rice University, USA

Issue’s Table of Contents

Copyright © 2021 Copyright held by the owner/author(s).

This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 28 September 2021

Accepted: 01 May 2021

Revised: 01 February 2021

Received: 01 April 2020

Published in TODS Volume 46, Issue 3

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article
Refereed

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

13
Total Citations
View Citations
1,537
Total Downloads

Downloads (Last 12 months)464
Downloads (Last 6 weeks)58

Reflects downloads up to 30 Aug 2024

Other Metrics

View Author Metrics

Citations

Cited By

Lao JWang YLi YWang JZhang YCheng ZChen WTang MWang J(2024)GPTuner: A Manual-Reading Database Tuning System via GPT-Guided Bayesian OptimizationProceedings of the VLDB Endowment10.14778/3659437.365944917:8(1939-1952)Online publication date: 31-May-2024
https://dl.acm.org/doi/10.14778/3659437.3659449
Justen DRitter DFraser CLamb ALee ABodner THaddad MZeuch SMarkl VBoehm M(2024)POLAR: Adaptive and Non-invasive Join Order Selection via Plans of Least ResistanceProceedings of the VLDB Endowment10.14778/3648160.364817517:6(1350-1363)Online publication date: 3-May-2024
https://dl.acm.org/doi/10.14778/3648160.3648175
Wei ZTrummer I(2024)ROME: Robust Query Optimization via Parallel Multi-Plan ExecutionProceedings of the ACM on Management of Data10.1145/36549732:3(1-25)Online publication date: 30-May-2024
https://dl.acm.org/doi/10.1145/3654973
Trummer I(2023)Can Large Language Models Predict Data Correlations from Column Names?Proceedings of the VLDB Endowment10.14778/3625054.362506616:13(4310-4323)Online publication date: 1-Sep-2023
https://dl.acm.org/doi/10.14778/3625054.3625066
Wang JGray MTrummer IKara AOlteanu D(2023)Demonstrating ADOPT: Adaptively Optimizing Attribute Orders for Worst-Case Optimal Joins via Reinforcement LearningProceedings of the VLDB Endowment10.14778/3611540.361162916:12(4094-4097)Online publication date: 1-Aug-2023
https://dl.acm.org/doi/10.14778/3611540.3611629
Yan ZUotila VLu J(2023)Join Order Selection with Deep Reinforcement Learning: Fundamentals, Techniques, and ChallengesProceedings of the VLDB Endowment10.14778/3611540.361157616:12(3882-3885)Online publication date: 1-Aug-2023
https://dl.acm.org/doi/10.14778/3611540.3611576
Wang JTrummer IKara AOlteanu D(2023)ADOPT: Adaptively Optimizing Attribute Orders for Worst-Case Optimal Join Algorithms via Reinforcement LearningProceedings of the VLDB Endowment10.14778/3611479.361148916:11(2805-2817)Online publication date: 1-Jul-2023
https://dl.acm.org/doi/10.14778/3611479.3611489
Sun JYe FNedjah NZhang MXu D(2023)Workload-Aware Performance Tuning for Multimodel Databases Based on Deep Reinforcement LearningInternational Journal of Intelligent Systems10.1155/2023/88351112023Online publication date: 1-Jan-2023
https://dl.acm.org/doi/10.1155/2023/8835111
Wang FYan XYiu MLI SMao ZTang B(2023)Speeding Up End-to-end Query Execution via Learning-based Progressive Cardinality EstimationProceedings of the ACM on Management of Data10.1145/35887081:1(1-25)Online publication date: 30-May-2023
https://dl.acm.org/doi/10.1145/3588708
Hoang DHuynh NNguyen DHossain ENiyato D(2023)DRL at the Application and Service LayerDeep Reinforcement Learning for Wireless Communications and Networking10.1002/9781119873747.ch8(187-212)Online publication date: 30-Jun-2023
https://doi.org/10.1002/9781119873747.ch8
Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

HTML Format

View this article in HTML Format.

Get Access

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Media

Figures

Other

Tables

View Issue’s Table of Contents