research-article

A Sampling-based Learning Framework for Big Databases

Authors:

Jingtian Zhang,

Gang ChenAuthors Info & Claims

WWW '22: Proceedings of the ACM Web Conference 2022

Pages 1871 - 1881

https://doi.org/10.1145/3485447.3511991

Published: 25 April 2022 Publication History

Abstract

The autonomous database of the next generation aims to apply the reinforcement learning (RL) on tasks like query optimization and performance tuning with little or no human DBAs’ intervention. Despite the promise, to obtain a decent policy model in the domain of database optimization is still challenging — primarily due to the inherent computational overhead involved in the data hungry RL frameworks — in particular on large databases. In the line of mitigating this adverse effect, we propose Mirror in this work. The core to Mirror is a sampling process built in an RL framework together with a transferring process of the policy model from the sampled database to its original counterpart. While being conceptually simple, we identify that the policy transfer between databases involves heavy noise and prediction drifting that cannot be neglectable. Thereby we build a theoretical-guided sampling algorithm in Mirror assisted by a continuous fine-tuning module. The experiments on the PostgreSQL and an industry database PolarDB validate that Mirror has effectively reduced the computational cost while maintaining a satisfactory performance.

References

[1]

David Andre, Nir Friedman, and Ronald Parr. 1997. Generalized Prioritized Sweeping. In NIPS. 1001–1007.

[2]

Joy Arulraj, Ran Xian, Lin Ma, and Andrew Pavlo. 2019. Predictive Indexing. CoRR abs/1901.07064(2019).

[3]

Renata Borovica, Ioannis Alagiannis, and Anastasia Ailamaki. 2012. Automated physical designers: what you see is (not) what you get. In Proceedings of the Fifth International Workshop on Testing Database Systems, DBTest 2012, Scottsdale, AZ, USA, May 21, 2012. 9.

Digital Library

[4]

Yu Chen and Ke Yi. 2017. Two-Level Sampling for Join Size Estimation. In SIGMOD. 759–774.

[5]

Sudipto Das, Miroslav Grbic, Igor Ilic, Isidora Jovandic, Andrija Jovanovic, Vivek R. Narasayya, Miodrag Radulovic, Maja Stikic, Gaoxiang Xu, and Surajit Chaudhuri. 2019. Automatically Indexing Millions of Databases in Microsoft Azure SQL Database. In SIGMOD. 666–679.

[6]

Bailu Ding, Sudipto Das, Ryan Marcus, Wentao Wu, Surajit Chaudhuri, and Vivek R. Narasayya. 2019. AI Meets AI: Leveraging Query Executions to Improve Index Recommendations. In SIGMOD, Peter A. Boncz, Stefan Manegold, Anastasia Ailamaki, Amol Deshpande, and Tim Kraska (Eds.). ACM, 1241–1258.

[7]

Bailu Ding, Sudipto Das, Wentao Wu, Surajit Chaudhuri, and Vivek R. Narasayya. 2018. Plan Stitch: Harnessing the Best of Many Plans. PVLDB 11, 10 (2018), 1123–1136.

Digital Library

[8]

Adam Dziedzic, Jingjing Wang, Sudipto Das, Bolin Ding, Vivek R. Narasayya, and Manoj Syamala. 2018. Columnstore and B+ tree - Are Hybrid Physical Designs Important?. In SIGMOD. 177–190.

[9]

Peter J. Haas and Joseph M. Hellerstein. 1999. Ripple Joins for Online Aggregation. In SIGMOD. 287–298.

[10]

Geoffrey E. Hinton, Oriol Vinyals, and Jeffrey Dean. 2015. Distilling the Knowledge in a Neural Network. CoRR abs/1503.02531(2015). arxiv:1503.02531http://arxiv.org/abs/1503.02531

[11]

Andreas Kipf, Thomas Kipf, Bernhard Radke, Viktor Leis, Peter A. Boncz, and Alfons Kemper. 2019. Learned Cardinalities: Estimating Correlated Joins with Deep Learning. In CIDR.

[12]

Tim Kraska, Alex Beutel, Ed H. Chi, Jeffrey Dean, and Neoklis Polyzotis. 2018. The Case for Learned Index Structures. In SIGMOD, Gautam Das, Christopher M. Jermaine, and Philip A. Bernstein (Eds.). ACM, 489–504.

[13]

Sanjay Krishnan, Zongheng Yang, Ken Goldberg, Joseph M. Hellerstein, and Ion Stoica. 2018. Learning to Optimize Join Queries With Deep Reinforcement Learning. CoRR abs/1808.03196(2018).

[14]

Hai Lan, Zhifeng Bao, and Yuwei Peng. 2020. An Index Advisor Using Deep Reinforcement Learning. In CIKM, Mathieu d’Aquin, Stefan Dietze, Claudia Hauff, Edward Curry, and Philippe Cudré-Mauroux (Eds.). ACM, 2105–2108.

[15]

Guoliang Li, Xuanhe Zhou, Shifu Li, and Bo Gao. 2019. QTune: A Query-Aware Database Tuning System with Deep Reinforcement Learning. PVLDB 12, 12 (2019), 2118–2130.

Digital Library

[16]

Henry Liu, Mingbin Xu, Ziting Yu, Vincent Corvinelli, and Calisto Zuzarte. 2015. Cardinality estimation using neural networks. In CASCON. 53–59.

[17]

Minghua Ma, Zheng Yin, Shenglin Zhang, Sheng Wang, Christopher Zheng, Xinhao Jiang, Hanwen Hu, Cheng Luo, Yilin Li, Nengjun Qiu, Feifei Li, Changcheng Chen, and Dan Pei. 2020. Diagnosing Root Causes of Intermittent Slow Queries in Large-Scale Cloud Databases. Proc. VLDB Endow. 13, 8 (2020), 1176–1189.

Digital Library

[18]

Horia Mania, Aurelia Guy, and Benjamin Recht. 2018. Simple random search provides a competitive approach to reinforcement learning. CoRR abs/1803.07055(2018).

[19]

Ryan Marcus and Olga Papaemmanouil. 2018. Deep Reinforcement Learning for Join Order Enumeration. In SIGMOD. 3:1–3:4.

[20]

Ryan Marcus, Emily Zhang, and Tim Kraska. 2020. CDFShop: Exploring and Optimizing Learned Index Structures. In SIGMOD, David Maier, Rachel Pottinger, AnHai Doan, Wang-Chiew Tan, Abdussalam Alawini, and Hung Q. Ngo (Eds.). ACM, 2789–2792.

[21]

Jennifer Ortiz, Magdalena Balazinska, Johannes Gehrke, and S. Sathiya Keerthi. 2018. Learning State Representations for Query Optimization with Deep Reinforcement Learning. In SIGMOD. 4:1–4:4.

[22]

Cheng Peng, Canqing Zhang, Cheng Peng, and Junfeng Man. 2017. A reinforcement learning approach to map reduce auto-configuration under networked environment. IJSN 12, 3 (2017), 135–140.

Digital Library

[23]

Andrei A. Rusu, Neil C. Rabinowitz, Guillaume Desjardins, Hubert Soyer, James Kirkpatrick, Koray Kavukcuoglu, Razvan Pascanu, and Raia Hadsell. 2016. Progressive Neural Networks. CoRR abs/1606.04671(2016).

[24]

Tim Salimans, Jonathan Ho, Xi Chen, and Ilya Sutskever. 2017. Evolution Strategies as a Scalable Alternative to Reinforcement Learning. CoRR abs/1703.03864(2017).

[25]

Michael Schaarschmidt, Alexander Kuhnle, Ben Ellis, Kai Fricke, Felix Gessert, and Eiko Yoneki. 2018. LIFT: Reinforcement Learning in Computer Systems by Learning From Demonstrations. CoRR abs/1808.07903(2018).

[26]

Tom Schaul, John Quan, Ioannis Antonoglou, and David Silver. 2016. Prioritized Experience Replay. In ICLR.

[27]

Ankur Sharma, Felix Martin Schuhknecht, and Jens Dittrich. 2018. The Case for Automatic Database Administration using Deep Reinforcement Learning. CoRR abs/1801.05643(2018).

[28]

Ji Sun and Guoliang Li. 2019. An End-to-End Learning-based Cost Estimator. CoRR abs/1906.02560(2019).

[29]

Jian Tan, Tieying Zhang, Feifei Li, Jie Chen, Qixing Zheng, Ping Zhang, Honglin Qiao, Yue Shi, Wei Cao, and Rui Zhang. 2019. iBTune: Individualized Buffer Tuning for Large-scale Cloud Databases. Proc. VLDB Endow. 12, 10 (2019), 1221–1234.

Digital Library

[30]

Immanuel Trummer, Junxiong Wang, Deepak Maram, Samuel Moseley, Saehan Jo, and Joseph Antonakakis. 2019. SkinnerDB: Regret-Bounded Query Evaluation via Reinforcement Learning. In SIGMOD. 1153–1170.

[31]

Hado van Hasselt, Arthur Guez, and David Silver. 2016. Deep Reinforcement Learning with Double Q-Learning. In AAAI. 2094–2100.

[32]

TaiNing Wang and Chee-Yong Chan. 2020. Improved Correlated Sampling for Join Size Estimation. In ICDE. 325–336.

[33]

Tsui-Wei Weng, Pin-Yu Chen, Lam M. Nguyen, Mark S. Squillante, Ivan V. Oseledets, and Luca Daniel. 2018. PROVEN: Certifying Robustness of Neural Networks with a Probabilistic Approach. CoRR abs/1812.08329(2018).

[34]

Tsui-Wei Weng, Huan Zhang, Hongge Chen, Zhao Song, Cho-Jui Hsieh, Luca Daniel, Duane S. Boning, and Inderjit S. Dhillon. 2018. Towards Fast Computation of Certified Robustness for ReLU Networks. In ICML(Proceedings of Machine Learning Research, Vol. 80), Jennifer G. Dy and Andreas Krause (Eds.). PMLR, 5273–5282.

[35]

Sai Wu, Xinyi Yu, Gang Chen, Yusong Gao, Xiaojie Feng, and Wei Cao. 2019. Progressive Neural Index Search for Database System. CoRR abs/1912.07001(2019).

[36]

Feng Yu, Wen-Chi Hou, Cheng Luo, Dunren Che, and Mengxia Zhu. 2013. CS2: a new database synopsis for query estimation. In Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2013, New York, NY, USA, June 22-27, 2013. 469–480.

Digital Library

[37]

Huan Zhang, Tsui-Wei Weng, Pin-Yu Chen, Cho-Jui Hsieh, and Luca Daniel. 2018. Efficient Neural Network Robustness Certification with General Activation Functions. CoRR abs/1811.00866(2018). arxiv:1811.00866http://arxiv.org/abs/1811.00866

[38]

Ji Zhang, Yu Liu, Ke Zhou, Guoliang Li, Zhili Xiao, Bin Cheng, Jiashu Xing, Yangtao Wang, Tianheng Cheng, Li Liu, Minwei Ran, and Zekang Li. 2019. An End-to-End Automatic Cloud Database Tuning System Using Deep Reinforcement Learning. In SIGMOD. 415–432.

[39]

X. Zhou, C. Chai, G. Li, and J. SUN. 2020. Database Meets Artificial Intelligence: A Survey. IEEE Transactions on Knowledge and Data Engineering (2020), 1–1. https://doi.org/10.1109/TKDE.2020.2994641

Index Terms

A Sampling-based Learning Framework for Big Databases

Index terms have been assigned to the content through auto-classification.

Recommendations

Non-Relational Databases in Big Data
ICTCS '16: Proceedings of the Second International Conference on Information and Communication Technology for Competitive Strategies

These days' Big data is becoming a very essential component for the industries where large volume of data at very high speed is used to solve particular data problems. Generally, big data is first analyzed and then used with other available data in the ...
Enhanced abstract data types in object-relational databases

The explosion in complex multimedia content makes it crucial for database systems to support such data efficiently. This paper argues that the “blackbox” ADTs used in current object-relational systems inhibit their performance, thereby limiting their ...
DB-BERT: A Database Tuning Tool that "Reads the Manual"
SIGMOD '22: Proceedings of the 2022 International Conference on Management of Data

DB-BERT is a database tuning tool that exploits information gained via natural language analysis of manuals and other relevant text documents. It uses text to identify database system parameters to tune as well as recommended parameter values. DB-BERT ...

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences

WWW '22: Proceedings of the ACM Web Conference 2022

April 2022

3764 pages

ISBN:9781450390965

DOI:10.1145/3485447

Editors:
Frédérique Laforest
INSA Lyon, France
,
Raphaël Troncy
EURECOM, France
,
Elena Simperl
King’s College London, UK
,
Deepak Agarwal
Pinterest, USA
,
Aristides Gionis
KTH Royal Institute of Technology, Sweden
,
Ivan Herman
W3C / retired
,
Lionel Médini
Université Lyon 1, France

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 25 April 2022

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

NSFC
Zhejiang Provincial Natural Science Foundation
Alibaba Group through Alibaba Innovative Research (AIR) Program
Fundamental Research Funds for the Central Universities of China

Conference

WWW '22

Sponsor:

SIGWEB

WWW '22: The ACM Web Conference 2022

April 25 - 29, 2022

Virtual Event, Lyon, France

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

0
Total Citations
328
Total Downloads

Downloads (Last 12 months)61
Downloads (Last 6 weeks)1

Reflects downloads up to 27 Jul 2024

Other Metrics

View Author Metrics

Citations

View Options

Get Access

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

HTML Format

View this article in HTML Format.

Media

Figures

Other

Tables

View Table of Contents