DeepCAT: A Cost-Efficient Online Configuration Auto-Tuning Approach for Big Data Frameworks

Authors:

Pengfei Chen

ICPP '22: Proceedings of the 51st International Conference on Parallel Processing

Article No.: 67, Pages 1 - 11

https://doi.org/10.1145/3545008.3545018

Published: 13 January 2023

Abstract

To support different application scenarios, big data frameworks usually provide a large number of performance-related configuration parameters. Online auto-tuning of these parameters with deep reinforcement learning (DRL) has shown advantages over search-based and machine learning-based approaches. Unfortunately, the time cost of the online tuning phase in conventional DRL-based methods is still heavy, especially for big data applications. In this paper, we therefore propose DeepCAT, a cost-efficient DRL-based approach to online configuration auto-tuning for big data frameworks. To reduce the total online tuning cost: 1) DeepCAT uses the TD3 algorithm instead of DDPG to alleviate value overestimation; 2) DeepCAT modifies conventional experience replay to fully exploit rare but valuable transitions via a novel reward-driven prioritized experience replay mechanism; 3) DeepCAT designs a Twin-Q Optimizer that estimates the execution time of each action without the costly configuration evaluation and optimizes the sub-optimal ones, achieving a low-cost exploration-exploitation trade-off. Experimental results on a local 3-node Spark cluster with HiBench benchmark applications show that DeepCAT speeds up the best execution time by a factor of 1.45× and 1.65× on average over CDBTune and OtterTune, respectively, while consuming up to 50.08% and 53.39% less total tuning time.
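As an illustration of the first point, the following minimal sketch contrasts the single-critic bootstrap target of DDPG with the clipped double-Q target of TD3, which takes the smaller of two critic estimates and thereby dampens value overestimation. It is not DeepCAT's implementation: the toy critics, the state and action vectors, and all numeric values are hypothetical, and TD3's target policy smoothing and delayed policy updates are omitted for brevity.

```python
from typing import Callable, Sequence

# A critic maps a (state, action) pair to a scalar value estimate.
Critic = Callable[[Sequence[float], Sequence[float]], float]


def ddpg_target(reward: float, gamma: float,
                next_state: Sequence[float], next_action: Sequence[float],
                critic: Critic, done: bool = False) -> float:
    """Bootstrap target built from a single critic, as in DDPG."""
    if done:
        return reward
    return reward + gamma * critic(next_state, next_action)


def td3_target(reward: float, gamma: float,
               next_state: Sequence[float], next_action: Sequence[float],
               critic_1: Critic, critic_2: Critic, done: bool = False) -> float:
    """Clipped double-Q target: bootstrap from the smaller of two critics."""
    if done:
        return reward
    q_min = min(critic_1(next_state, next_action),
                critic_2(next_state, next_action))
    return reward + gamma * q_min


# Hypothetical toy critics standing in for trained networks.
def optimistic_critic(state: Sequence[float], action: Sequence[float]) -> float:
    return 10.0  # assumed to overestimate the true value


def cautious_critic(state: Sequence[float], action: Sequence[float]) -> float:
    return 6.0


# Hypothetical normalized state (e.g. system metrics) and action (e.g. config values).
next_state = [0.2, 0.5]
next_action = [0.1]

y_ddpg = ddpg_target(1.0, 0.99, next_state, next_action, optimistic_critic)
y_td3 = td3_target(1.0, 0.99, next_state, next_action,
                   optimistic_critic, cautious_critic)
print(f"DDPG target: {y_ddpg:.2f}, TD3 target: {y_td3:.2f}")
```

Because the TD3 target is bounded above by what either critic alone would produce, an optimistic error in one critic does not propagate into the target unless the other critic shares it.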

Cited By

Wang Y, Chen P, Dou H, Zhang Y, Yu G, He Z, Huang H, Filkov V, Ray B, Zhou M (2024). FaaSConf: QoS-aware Hybrid Resources Configuration for Serverless Workflows. Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering, pp. 957-969. DOI: 10.1145/3691620.3695477. Online publication date: 27-Oct-2024.
https://dl.acm.org/doi/10.1145/3691620.3695477

Mai G, He Z, Yu G, Chen Z, Chen P (2024). CTuner: Automatic NoSQL Database Tuning with Causal Reinforcement Learning. Proceedings of the 15th Asia-Pacific Symposium on Internetware, pp. 269-278. DOI: 10.1145/3671016.3674809. Online publication date: 24-Jul-2024.
https://dl.acm.org/doi/10.1145/3671016.3674809

Dou H, Wang Y, Zhang Y, Chen P, Zheng Z (2024). DeepCAT⁺: A Low-Cost and Transferrable Online Configuration Auto-Tuning Approach for Big Data Frameworks. IEEE Transactions on Parallel and Distributed Systems 35:11, pp. 2114-2131. DOI: 10.1109/TPDS.2024.3459889. Online publication date: 1-Nov-2024.
https://dl.acm.org/doi/10.1109/TPDS.2024.3459889

Information & Contributors

Information

Published In

ICPP '22: Proceedings of the 51st International Conference on Parallel Processing

August 2022

976 pages

ISBN:9781450397339

DOI:10.1145/3545008

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

National Natural Science Foundation of China
Key-Area Research and Development Program of Guangdong Province

Conference

ICPP '22

ICPP '22: 51st International Conference on Parallel Processing

August 29 - September 1, 2022

Bordeaux, France

Acceptance Rates

Overall Acceptance Rate 91 of 313 submissions, 29%

Bibliometrics & Citations

Bibliometrics

Total Citations: 3
Total Downloads: 176
Downloads (Last 12 months): 73
Downloads (Last 6 weeks): 7

Reflects downloads up to 01 Nov 2024
