GridFormation: Towards Self-Driven Online Data Partitioning using Reinforcement Learning

Published: 10 June 2018

Abstract

    In this paper, we define a research agenda for developing a general framework that supports online, autonomous tuning of data partitioning and layouts through a reinforcement learning formulation. We establish the core elements of our approach: the agent, the environment, the action space, and supporting components. Externally predicted workloads and the current physical design serve as input to the agent. The environment guides the search by generating immediate rewards from fresh cost estimates, computed for either the entire workload or a sample of its queries, and by deciding which actions are possible in a given state. The set of actions is configurable, so that different partitioning problems can be represented. For use in an online setting, the agent learns a fixed-length sequence of n actions that maximizes the temporal reward for the predicted workload. Through an initial implementation, we assert the feasibility of our approach. To conclude, we list open challenges for this work.
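
The formulation in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it swaps in tabular Q-learning for the deep Q-learning named in the author tags, and the workload, the cost model, the merge-only action space, and every name (`WORKLOAD`, `cost`, `actions`, `step`, `train`, `greedy_plan`) are assumptions made for illustration. The environment offers the possible actions for a state and returns the cost reduction as the immediate reward; the agent learns a plan of exactly n = 3 actions.

```python
import random
from itertools import combinations

# Hypothetical predicted workload: (columns read by the query, frequency).
WORKLOAD = [({"a", "b"}, 5), ({"c"}, 3), ({"a", "b", "d"}, 2)]
COLUMNS = ("a", "b", "c", "d")
N_STEPS = 3  # fixed episode length n


def initial_state():
    # Current physical design: one column group per column.
    return frozenset(frozenset([c]) for c in COLUMNS)


def cost(state):
    # Toy cost model (an assumption): a query reads every column of each
    # group it touches and pays one extra unit per touched group.
    total = 0
    for cols, weight in WORKLOAD:
        for group in state:
            if group & cols:
                total += weight * (len(group) + 1)
    return total


def actions(state):
    # The environment decides the possible actions: do nothing (None)
    # or merge any two column groups.
    groups = sorted(state, key=sorted)
    return [None] + list(combinations(groups, 2))


def step(state, action):
    # Immediate reward is the estimated cost reduction of the action.
    if action is None:
        return state, 0.0
    g1, g2 = action
    new_state = (state - {g1, g2}) | {g1 | g2}
    return new_state, float(cost(state) - cost(new_state))


def train(episodes=2000, alpha=0.5, gamma=1.0, epsilon=0.2, seed=0):
    # Tabular Q-learning over (state, time step, action) triples.
    rng = random.Random(seed)
    q = {}
    for _ in range(episodes):
        state = initial_state()
        for t in range(N_STEPS):
            acts = actions(state)
            if rng.random() < epsilon:
                action = rng.choice(acts)  # explore
            else:
                action = max(acts, key=lambda a: q.get((state, t, a), 0.0))
            next_state, reward = step(state, action)
            future = 0.0
            if t + 1 < N_STEPS:
                future = max(q.get((next_state, t + 1, a), 0.0)
                             for a in actions(next_state))
            key = (state, t, action)
            old = q.get(key, 0.0)
            q[key] = old + alpha * (reward + gamma * future - old)
            state = next_state
    return q


def greedy_plan(q):
    # Read out the learned fixed-length sequence of n actions.
    state, plan = initial_state(), []
    for t in range(N_STEPS):
        action = max(actions(state), key=lambda a: q.get((state, t, a), 0.0))
        plan.append(action)
        state, _ = step(state, action)
    return plan, state
```

Calling `plan, final = greedy_plan(train())` yields a three-action plan and the layout it produces. Under this toy cost model the only beneficial merge is grouping columns a and b, so the remaining steps should be no-ops. Indexing the Q-table by time step is one simple way to respect the fixed horizon; a learned function approximator would replace the table in a realistic setting.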

        Published In

        aiDM'18: Proceedings of the First International Workshop on Exploiting Artificial Intelligence Techniques for Data Management
        June 2018
        34 pages
        ISBN: 9781450358514
        DOI: 10.1145/3211954
        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Author Tags

        1. Adaptive layouts
        2. Deep Q-Learning
        3. Physical design

        Qualifiers

        • Research-article
        • Research
        • Refereed limited

        Conference

        SIGMOD/PODS '18
        Acceptance Rates

        aiDM'18 Paper Acceptance Rate 5 of 8 submissions, 63%;
        Overall Acceptance Rate 19 of 26 submissions, 73%

        Article Metrics

        • Downloads (Last 12 months): 32
        • Downloads (Last 6 weeks): 7

        Cited By

        • (2024) Enhancing Storage Efficiency and Performance: A Survey of Data Partitioning Techniques. Journal of Computer Science and Technology 39:2 (346-368). DOI: 10.1007/s11390-024-3538-1. Online publication date: 1-Mar-2024
        • (2022) Neuroshard. Proceedings of the Fifth International Workshop on Exploiting Artificial Intelligence Techniques for Data Management (1-12). DOI: 10.1145/3533702.3534908. Online publication date: 17-Jun-2022
        • (2022) A Survey on Deep Reinforcement Learning for Data Processing and Analytics. IEEE Transactions on Knowledge and Data Engineering (1-1). DOI: 10.1109/TKDE.2022.3155196. Online publication date: 2022
        • (2022) A Divergent Index Advisor Using Deep Reinforcement Learning. Database and Expert Systems Applications (139-152). DOI: 10.1007/978-3-031-12423-5_11. Online publication date: 29-Jul-2022
        • (2021) Workload-Aware Performance Tuning for Autonomous DBMSs. 2021 IEEE 37th International Conference on Data Engineering (ICDE) (2365-2368). DOI: 10.1109/ICDE51399.2021.00267. Online publication date: Apr-2021
        • (2021) Towards an Adaptive Multidimensional Partitioning for Accelerating Spark SQL. Big Data Analytics and Knowledge Discovery (27-38). DOI: 10.1007/978-3-030-86534-4_3. Online publication date: 5-Sep-2021
        • (2020) Application of Dynamic Fragmentation Methods in Multimedia Databases: A Review. Entropy 22:12 (1352). DOI: 10.3390/e22121352. Online publication date: 30-Nov-2020
        • (2020) A Genetic Optimization Physical Planner for Big Data Warehouses. 2020 IEEE International Conference on Big Data (Big Data) (406-412). DOI: 10.1109/BigData50022.2020.9378196. Online publication date: 10-Dec-2020
        • (2020) A Framework for Designing Autonomous Parallel Data Warehouses. Algorithms and Architectures for Parallel Processing (97-104). DOI: 10.1007/978-3-030-38961-1_9. Online publication date: 22-Jan-2020
        • (2019) Automated Vertical Partitioning with Deep Reinforcement Learning. New Trends in Databases and Information Systems (126-134). DOI: 10.1007/978-3-030-30278-8_16. Online publication date: 1-Sep-2019
