Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
research-article

SkinnerDB: regret-bounded query evaluation via reinforcement learning

Published: 01 August 2018 Publication History

Abstract

Robust query optimization becomes illusory in the presence of correlated predicates or user-defined functions. Occasionally, the query optimizer will choose join orders whose execution time is by many orders of magnitude higher than necessary. We present SkinnerDB, a novel database management system that is designed from the ground up for reliable optimization and robust performance.
SkinnerDB implements several adaptive query processing strategies based on reinforcement learning. We divide the execution of a query into small time periods in which different join orders are executed. Thereby, we converge to optimal join orders with regret bounds, meaning that the expected difference between actual execution time and time for an optimal join order is bounded. To the best of our knowledge, our execution strategies are the first to provide comparable formal guarantees. SkinnerDB can be used as a layer on top of any existing database management system. We use optimizer hints to force existing systems to try out different join orders, carefully restricting execution time per join order and data batch via timeouts. We choose timeouts according to an iterative scheme that balances execution time over different timeouts to guarantee bounded regret. Alternatively, SkinnerDB can be used as a standalone, featuring an execution engine that is tailored to the requirements of join order learning. In particular, we use a specialized multi-way join algorithm and a concise tuple representation to facilitate fast switches between join orders. In our demonstration, we let participants experiment with different query types and databases. We visualize the learning process and compare against baselines.

References

[1]
A. Aboulnaga, P. Haas, M. Kandil, S. Lightstone, G. Lohman, V. Markl, I. Popivanov, and V. Raman. Automated statistics collection in DB2 UDB. In PVLDB, pages 1169--1180, 2004.
[2]
M. Akdere and U. Cetintemel. Learning-based query performance modeling and predection. In ICDE, pages 390--401, 2011.
[3]
K. H. Alyoubi. Database query optimisation based on measures of regret. PhD thesis, 2016.
[4]
K. H. Alyoubi, S. Helmer, and P. T. Wood. Ordering selection operators under partial ignorance. In CIKM, pages 1521--1530, 2015.
[5]
R. Avnur and J. Hellerstein. Eddies: continuously adaptive query processing. In SIGMOD, pages 261--272, 2000.
[6]
B. Babcock and S. Chaudhuri. Towards a robust query optimizer: a principled and practical approach. In SIGMOD, pages 119--130, 2005.
[7]
S. Babu, P. Bizarro, and D. DeWitt. Proactive re-optimization. In SIGMOD, pages 107--118, 2005.
[8]
P. Boncz, K. M.L., and S. Mangegold. Breaking the memory wall in MonetDB. CACM, 51(12):77--85, 2008.
[9]
N. Bruno and S. Chaudhuri. Exploiting statistics on query expressions for optimization. In SIGMOD, pages 263--274, 2002.
[10]
S. Chaudhuri and V. Narasayya. Automating statistics management for query optimizers. In ICDE, pages 7--20, 2001.
[11]
H. D., P. N. Darera, and J. R. Haritsa. Identifying robust plans through plan diagram reduction. PVLDB, 1(1):1124--1140, 2008.
[12]
A. Deshpande. An initial study of overheads of eddies. SIGMOD Record, 33(1):44--49, 2004.
[13]
A. Deshpande, Z. Ives, and V. Raman. Adaptive Query Processing. Foundations and Trends® in, 1(1):1--140, 2006.
[14]
J. Duggan, U. Cetintemel, O. Papaemmanouil, and E. Upfal. Performance prediction for concurrent database workloads. In SIGMOD, pages 337--348, 2011.
[15]
A. Dutt. QUEST : An exploratory approach to robust query processing. PVLDB, 7(13):5--8, 2014.
[16]
A. Dutt and J. Haritsa. Plan bouquets: query processing without selectivity estimation. In SIGMOD, pages 1039--1050, 2014.
[17]
A. El-Helw, I. F. Ilyas, and C. Zuzarte. StatAdvisor: recommending statistical views. PVLDB, 2(2):1306--1317, 2009.
[18]
S. Ewen, M. Ortega-Binderberger, and V. Markl. A learning optimizer for a federated database management system. Informatik - Forschung und Entwicklung, 20(3):138--151, 2005.
[19]
A. Ganapathi, H. Kuno, U. Dayal, J. L. Wiener, A. Fox, M. Jordan, and D. Patterson. Predicting multiple metrics for queries - better decisions enabled by machine learning. In ICDE, pages 592--603, 2009.
[20]
S. Gelly, L. Kocsis, and M. Schoenauer. The grand challenge of computer go: monte carlo tree search and extensions. Communications of the ACM, 3:106--113, 2012.
[21]
A. Gubichev, P. Boncz, A. Kemper, and T. Neumann. How good are query optimizers, really? PVLDB, 9(3):204--215, 2015.
[22]
P. Haas and A. Swami. Sampling-based selectivity estimation for joins using augmented frequent value statistics. In ICDE, pages 522--531, 2011.
[23]
P. J. Haas and A. N. Swami. Sequential sampling procedures for query size estimation. SIGMOD Rec., 21(2):341--350, 1992.
[24]
L. P. Kaelbling, M. L. Littman, and A. W. Moore. Reinforcement learning: a survey. JAIR, 4:237--285, 1996.
[25]
K. Karanasos, A. Balmin, M. Kutsch, F. Ozcan, V. Ercegovac, C. Xia, and J. Jackson. Dynamically optimizing queries over large scale data platforms. In SIGMOD, pages 943--954, 2014.
[26]
L. Kocsis and C. Szepesvári. Bandit based monte-carlo planning. In European Conf. on Machine Learning, pages 282--293, 2006.
[27]
J. Li, A. C. König, V. R. Narasayya, and S. Chaudhuri. Robust estimation of resource consumption for SQL queries using statistical techniques. PVLDB, 5(11):1555--1566, 2012.
[28]
R. J. Lipton, J. F. Naughton, and D. A. Schneider. Practical selectivity estimation through adaptive sampling. In SIGMOD, pages 1--11, 1990.
[29]
T. Malik and R. Burns. A black-box approach to query cardinality estimation. In CIDR, pages 56--67, 2007.
[30]
T. M. T. Malik, R. B. R. Burns, N. V. C. N. V. Chawla, and A. S. A. Szalay. Estimating query result sizes for proxy caching in scientific database federations. In ACM/IEEE SC 2006 Conference (SC'06), pages 102--115, 2006.
[31]
T. Neumann and C. Galindo-Legaria. Taking the edge off cardinality estimation errors using incremental execution. In BTW, pages 73--92, 2013.
[32]
H. Q. Ngo, E. Porat, and C. Ré. Worst-case optimal join algorithms. In PODS, pages 37--48, 2012.
[33]
A. Pavlo, G. Angulo, J. Arulraj, H. Lin, J. Lin, L. Ma, P. Menon, T. C. Mowry, M. Perron, I. Quah, S. Santurkar, A. Tomasic, S. Toor, D. V. Aken, Z. Wang, Y. Wu, R. Xian, and T. Zhang. Self-driving database management systems. In CIDR, 2017.
[34]
A. D. Popescu, A. Balmin, V. Ercegovac, and A. Ailamaki. PREDIcT: towards predicting the runtime of large scale iterative analytics. PVLDB, 6(14):1678--1689, 2013.
[35]
PostgreSQL Global Development Group. PostgreSQL. https://www.postgresql.org/, 2017.
[36]
L. Quanzhong, S. Minglong, V. Markl, K. Beyer, L. Colby, and G. Lohman. Adaptively reordering joins during query execution. In ICDE, pages 26--35, 2007.
[37]
V. R. V. Raman, A. Deshpande, and J. Hellerstein. Using state modules for adaptive query processing. In ICDE, pages 353--364, 2003.
[38]
M. Stillger, G. M. Lohman, V. Markl, and M. Kandil. LEO -DB2's LEarning Optimizer. In PVLDB, pages 19--28, 2001.
[39]
TPC. TPC-H Benchmark, 2013.
[40]
K. Tzoumas, T. Sellis, and C. S. Jensen. A reinforcement learning approach for adaptive query processing. Technical report, 2008.
[41]
S. D. Viglas, J. F. Naughton, and J. Burger. Maximizing the output rate of multi-way join queries over streaming information sources. In PVLDB, pages 285--296, 2003.
[42]
W. Wu, J. F. Naughton, and H. Singh. Sampling-based query re-optimization. In SIGMOD, pages 1721--1736, 2016.

Cited By

View all
  • (2024)Optimizing Distributed Protocols with Query RewritesProceedings of the ACM on Management of Data10.1145/36392572:1(1-25)Online publication date: 26-Mar-2024
  • (2024)Stage: Query Execution Time Prediction in Amazon RedshiftCompanion of the 2024 International Conference on Management of Data10.1145/3626246.3653391(280-294)Online publication date: 9-Jun-2024
  • (2023)Simple Adaptive Query Processing vs. Learned Query Optimizers: Observations and AnalysisProceedings of the VLDB Endowment10.14778/3611479.361150116:11(2962-2975)Online publication date: 24-Aug-2023
  • Show More Cited By

Index Terms

  1. SkinnerDB: regret-bounded query evaluation via reinforcement learning
      Index terms have been assigned to the content through auto-classification.

      Recommendations

      Comments

      Information & Contributors

      Information

      Published In

      cover image Proceedings of the VLDB Endowment
      Proceedings of the VLDB Endowment  Volume 11, Issue 12
      August 2018
      426 pages
      ISSN:2150-8097
      Issue’s Table of Contents

      Publisher

      VLDB Endowment

      Publication History

      Published: 01 August 2018
      Published in PVLDB Volume 11, Issue 12

      Qualifiers

      • Research-article

      Contributors

      Other Metrics

      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • Downloads (Last 12 months)41
      • Downloads (Last 6 weeks)11
      Reflects downloads up to 03 Oct 2024

      Other Metrics

      Citations

      Cited By

      View all
      • (2024)Optimizing Distributed Protocols with Query RewritesProceedings of the ACM on Management of Data10.1145/36392572:1(1-25)Online publication date: 26-Mar-2024
      • (2024)Stage: Query Execution Time Prediction in Amazon RedshiftCompanion of the 2024 International Conference on Management of Data10.1145/3626246.3653391(280-294)Online publication date: 9-Jun-2024
      • (2023)Simple Adaptive Query Processing vs. Learned Query Optimizers: Observations and AnalysisProceedings of the VLDB Endowment10.14778/3611479.361150116:11(2962-2975)Online publication date: 24-Aug-2023
      • (2023)Making Data Clouds Smarter at Keebo: Automated Warehouse Optimization using Data LearningCompanion of the 2023 International Conference on Management of Data10.1145/3555041.3589681(239-251)Online publication date: 4-Jun-2023
      • (2023)Auto-WLM: Machine Learning Enhanced Workload Management in Amazon RedshiftCompanion of the 2023 International Conference on Management of Data10.1145/3555041.3589677(225-237)Online publication date: 4-Jun-2023
      • (2022)HMABProceedings of the VLDB Endowment10.14778/3565816.356582416:2(216-229)Online publication date: 1-Oct-2022
      • (2022)SIFTERProceedings of the VLDB Endowment10.14778/3561261.356126916:1(90-98)Online publication date: 16-Nov-2022
      • (2022)BaoACM SIGMOD Record10.1145/3542700.354270351:1(6-13)Online publication date: 1-Jun-2022
      • (2021)MANTIS: Multiple Type and Attribute Index Selection using Deep Reinforcement LearningProceedings of the 25th International Database Engineering & Applications Symposium10.1145/3472163.3472176(56-64)Online publication date: 14-Jul-2021
      • (2021)Bao: Making Learned Query Optimization PracticalProceedings of the 2021 International Conference on Management of Data10.1145/3448016.3452838(1275-1288)Online publication date: 9-Jun-2021
      • Show More Cited By

      View Options

      Get Access

      Login options

      Full Access

      View options

      PDF

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader

      Media

      Figures

      Other

      Tables

      Share

      Share

      Share this Publication link

      Share on social media