GridFormation: Towards Self-Driven Online Data Partitioning using Reinforcement Learning

Authors: Gabriel Campero Durand, Marcus Pinnecke, Mahmoud Mohsen, David Broneske, Maya S. Sekeran, Fabián Rodriguez, and Laxmi Balami

aiDM'18: Proceedings of the First International Workshop on Exploiting Artificial Intelligence Techniques for Data Management

June 2018

Article No.: 1, Pages 1 - 7

https://doi.org/10.1145/3211954.3211956

Published: 10 June 2018

Abstract

In this paper, we define a research agenda for developing a general framework that supports online, autonomous tuning of data partitioning and layouts through a reinforcement learning formulation. We establish the core elements of our approach: agent, environment, action space, and supporting components. Externally predicted workloads and the current physical design serve as input to our agent. The environment guides the search process by generating immediate rewards based on fresh cost estimates, computed over either the entire workload or a sample of its queries, and by deciding the possible actions given a state. This set of actions is configurable, enabling the representation of different partitioning problems. For use in an online setting, the agent learns a fixed-length sequence of n actions that maximizes the temporal reward for the predicted workload. Through an initial implementation, we assert the feasibility of our approach. To conclude, we list open challenges for this work.
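To make the formulation in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes a vertical-partitioning problem whose only actions are "merge two partitions" or "do nothing", a toy cost model, and tabular Q-learning as the learner; the workload, `estimate_cost`, `legal_actions`, `step`, `train`, and `best_sequence` are all hypothetical names introduced for illustration. The environment supplies the legal actions and an immediate reward equal to the drop in estimated cost, and the agent learns a fixed-length sequence of n = 3 actions.

```python
import random
from itertools import combinations

# Hypothetical predicted workload: each query is the set of attributes it reads.
WORKLOAD = [{"a", "b"}, {"a", "b"}, {"c", "d"}, {"a", "b", "c"}]
START = ("a", "b", "c", "d")  # current physical design: one partition per attribute
N_ACTIONS = 3                 # fixed episode length n

def estimate_cost(layout, workload):
    """Toy cost model (an assumption): a query pays the width of every
    partition it touches plus a fixed overhead of 2 per partition."""
    return sum(len(part) + 2
               for query in workload
               for part in layout
               if query & set(part))

def legal_actions(layout):
    """The environment decides the possible actions: merge two partitions, or no-op."""
    return list(combinations(layout, 2)) + [None]

def step(layout, action, workload):
    """Apply an action; the immediate reward is the drop in estimated cost."""
    if action is None:
        return layout, 0
    merged = "".join(sorted(action[0] + action[1]))
    rest = [p for p in layout if p not in action]
    new_layout = tuple(sorted(rest + [merged]))
    reward = estimate_cost(layout, workload) - estimate_cost(new_layout, workload)
    return new_layout, reward

def train(workload, episodes=3000, seed=0):
    """Tabular Q-learning with uniformly random exploration (off-policy)."""
    rng = random.Random(seed)
    q = {}  # (step index, layout, action) -> estimated return
    for _ in range(episodes):
        layout = START
        for t in range(N_ACTIONS):
            action = rng.choice(legal_actions(layout))
            nxt, reward = step(layout, action, workload)
            future = 0
            if t + 1 < N_ACTIONS:
                future = max(q.get((t + 1, nxt, a), 0) for a in legal_actions(nxt))
            # The toy environment is deterministic, so a learning rate of 1 suffices.
            q[(t, layout, action)] = reward + future
            layout = nxt
    return q

def best_sequence(q, workload):
    """Greedy rollout of the learned values: the n actions to apply online."""
    layout, total, plan = START, 0, []
    for t in range(N_ACTIONS):
        action = max(legal_actions(layout), key=lambda a: q.get((t, layout, a), 0))
        layout, reward = step(layout, action, workload)
        plan.append(action)
        total += reward
    return plan, layout, total

plan, final_layout, total_reward = best_sequence(train(WORKLOAD), WORKLOAD)
```

Under this toy cost model the starting layout costs 27 and the layout `("ab", "cd")` costs 20, so the best achievable cumulative reward is 7. Swapping `legal_actions` for a different action set (for example, splits or horizontal range cuts) is how a configurable action space could represent other partitioning problems; evaluating rewards on a sample of the workload, as the abstract mentions, is omitted here for brevity.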

Index Terms

GridFormation: Towards Self-Driven Online Data Partitioning using Reinforcement Learning
1. Information systems
  1. Data management systems
    1. Database administration
      1. Autonomous database administration
    2. Database design and models
      1. Physical data models

Published In

aiDM'18: Proceedings of the First International Workshop on Exploiting Artificial Intelligence Techniques for Data Management

June 2018

34 pages

ISBN: 9781450358514

DOI: 10.1145/3211954

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMOD: ACM Special Interest Group on Management of Data

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

Deutsche Forschungsgemeinschaft

Conference

SIGMOD/PODS '18

Sponsor:

SIGMOD

SIGMOD/PODS '18: International Conference on Management of Data

June 10, 2018

Houston, TX, USA

Acceptance Rates

aiDM'18 Paper Acceptance Rate 5 of 8 submissions, 63%;

Overall Acceptance Rate 19 of 26 submissions, 73%

Bibliometrics

Total Citations: 10
Total Downloads: 436
Downloads (Last 12 months): 32
Downloads (Last 6 weeks): 7

Cited By

Liu P, Li C, Chen H (2024). Enhancing Storage Efficiency and Performance: A Survey of Data Partitioning Techniques. Journal of Computer Science and Technology 39:2 (346-368). Online publication date: 1-Mar-2024
https://dl.acm.org/doi/10.1007/s11390-024-3538-1
Eldeeb T, Chen Z, Cidon A, Yang J, Bordawekar R, Shmueli O, Amsterdamer Y, Firmani D, Marcus R (2022). Neuroshard. Proceedings of the Fifth International Workshop on Exploiting Artificial Intelligence Techniques for Data Management (1-12). Online publication date: 17-Jun-2022
https://dl.acm.org/doi/10.1145/3533702.3534908
Cai Q, Cui C, Xiong Y, Wang W, Xie Z, Zhang M (2022). A Survey on Deep Reinforcement Learning for Data Processing and Analytics. IEEE Transactions on Knowledge and Data Engineering (1-1). Online publication date: 2022
https://doi.org/10.1109/TKDE.2022.3155196
Sadri Z, Gruenwald L (2022). A Divergent Index Advisor Using Deep Reinforcement Learning. Database and Expert Systems Applications (139-152). Online publication date: 29-Jul-2022
https://doi.org/10.1007/978-3-031-12423-5_11
Yan Z, Lu J, Chainani N, Lin C (2021). Workload-Aware Performance Tuning for Autonomous DBMSs. 2021 IEEE 37th International Conference on Data Engineering (ICDE) (2365-2368). Online publication date: Apr-2021
https://doi.org/10.1109/ICDE51399.2021.00267
Benkrid S, Bellatreche L, Mestoui Y, Ordonez C (2021). Towards an Adaptive Multidimensional Partitioning for Accelerating Spark SQL. Big Data Analytics and Knowledge Discovery (27-38). Online publication date: 5-Sep-2021
https://doi.org/10.1007/978-3-030-86534-4_3
Castro-Medina F, Rodríguez-Mazahua L, López-Chau A, Cervantes J, Alor-Hernández G, Machorro-Cano I (2020). Application of Dynamic Fragmentation Methods in Multimedia Databases: A Review. Entropy 22:12 (1352). Online publication date: 30-Nov-2020
https://doi.org/10.3390/e22121352
Benkrid S, Mestoui Y, Bellatreche L, Ordonez C (2020). A Genetic Optimization Physical Planner for Big Data Warehouses. 2020 IEEE International Conference on Big Data (Big Data) (406-412). Online publication date: 10-Dec-2020
https://doi.org/10.1109/BigData50022.2020.9378196
Benkrid S, Bellatreche L (2020). A Framework for Designing Autonomous Parallel Data Warehouses. Algorithms and Architectures for Parallel Processing (97-104). Online publication date: 22-Jan-2020
https://doi.org/10.1007/978-3-030-38961-1_9
Campero Durand G, Piriyev R, Pinnecke M, Broneske D, Gurumurthy B, Saake G (2019). Automated Vertical Partitioning with Deep Reinforcement Learning. New Trends in Databases and Information Systems (126-134). Online publication date: 1-Sep-2019
https://doi.org/10.1007/978-3-030-30278-8_16
