
ADSTS: Automatic Distributed Storage Tuning System Using Deep Reinforcement Learning

Published: 13 January 2023
DOI: 10.1145/3545008.3545012

Abstract

Modern distributed storage systems, with their immense configuration spaces, unpredictable workloads, and costly performance evaluation, place high demands on parameter tuning, and an automatic tuning solution for such systems is in demand. Many studies have attempted to build automatic tuning systems based on deep reinforcement learning (RL), but they face several limitations: they do not preprocess the parameter space, they rely on less advanced RL models, and their training is time-consuming and unstable. In this paper, we present and evaluate ADSTS, an automatic distributed storage tuning system based on deep RL. We first propose a general preprocessing guideline that generates a standardized tunable-parameter domain: Recursive Stratified Sampling, designed to avoid the non-incremental nature of conventional stratified sampling, samples the huge parameter space, and Lasso regression identifies the important parameters. The twin-delayed deep deterministic policy gradient (TD3) method is then used to find optimal values for the tunable parameters. Finally, Multi-processing Training and Workload-directed Model Fine-tuning accelerate model convergence. ADSTS is implemented on Park and deployed on the real-world system Ceph. Evaluation results show that ADSTS recommends near-optimal configurations and improves system performance by 1.5×∼2.5× with acceptable overheads.
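
The parameter-identification step the abstract describes, fitting Lasso regression over a sampled parameter space to find the parameters that matter, follows a standard recipe. Below is a minimal sketch of that step only, assuming synthetic configuration/throughput data and illustrative Ceph-style parameter names (neither is taken from the paper); it uses scikit-learn's cross-validated LassoCV and ranks parameters by the magnitude of their standardized coefficients.

    import numpy as np
    from sklearn.linear_model import LassoCV
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(0)

    # Hypothetical tunable parameters (illustrative Ceph-style names).
    param_names = ["osd_op_threads", "journal_max_write_bytes",
                   "filestore_queue_max_ops", "rbd_cache_size"]

    # 200 configurations sampled from the normalized parameter space.
    X = rng.uniform(0.0, 1.0, size=(200, len(param_names)))

    # Stand-in for benchmarking each configuration (e.g., measured
    # throughput); here only parameters 0 and 2 actually matter.
    y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + 0.1 * rng.standard_normal(200)

    # Standardize so coefficient magnitudes are comparable across parameters.
    X_std = StandardScaler().fit_transform(X)

    # Cross-validated Lasso shrinks unimportant coefficients toward zero.
    lasso = LassoCV(cv=5).fit(X_std, y)

    # Rank parameters by absolute coefficient; near-zero ones can be pruned
    # from the tuning agent's action space.
    for name, coef in sorted(zip(param_names, lasso.coef_),
                             key=lambda p: abs(p[1]), reverse=True):
        print(f"{name:28s} {coef:+.3f}")

In this sketch the Lasso should assign near-zero weights to journal_max_write_bytes and rbd_cache_size, mirroring how such a pipeline would drop unimportant parameters before the TD3 agent searches the remaining space.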


Cited By

  • TIE: Fast Experiment-Driven ML-Based Configuration Tuning for In-Memory Data Analytics. IEEE Transactions on Computers 73, 5 (2024), 1233–1247. https://doi.org/10.1109/TC.2024.3365937 (online 14 February 2024)


Published In

ICPP '22: Proceedings of the 51st International Conference on Parallel Processing
August 2022
976 pages
ISBN: 9781450397339
DOI: 10.1145/3545008
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Auto-tuning
  2. Distributed Storage System
  3. Parameter Identification
  4. Reinforcement Learning

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • the Creative Research Group Project of NSFC
  • the Key Research and Development Program of Guangdong Province
  • the National Natural Science Foundation of China

Conference

ICPP '22
ICPP '22: 51st International Conference on Parallel Processing
August 29 - September 1, 2022
Bordeaux, France

Acceptance Rates

Overall Acceptance Rate 91 of 313 submissions, 29%
