DOI: 10.5555/3545946.3598965

Centralized Cooperative Exploration Policy for Continuous Control Tasks

Published: 30 May 2023

Abstract

Despite recent progress on continuous control tasks, exploration in these tasks remains insufficiently investigated. This paper proposes CCEP (Centralized Cooperative Exploration Policy), which exploits the estimation biases of value functions to increase exploration capacity. CCEP maintains two value functions initialized with different parameters and derives diverse policies with multiple exploration styles from this pair of value functions. In addition, a centralized policy framework enables message delivery between the policies, which further contributes to exploring the environment cooperatively. Extensive experimental results demonstrate that CCEP achieves higher exploration capacity. Empirical analysis reveals diverse exploration styles among the learned policies, which yield coverage of more regions of the environment. Moreover, CCEP's exploration capability is shown to outperform current state-of-the-art methods on multiple continuous control tasks.
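
To make the mechanism concrete, the following is a minimal, hypothetical Python/PyTorch sketch of the idea the abstract describes: two critics with different random initializations produce disagreeing value estimates, and several value targets (each critic alone, their minimum, their maximum) can each drive a differently styled exploration policy. All names (QNetwork, value_styles) and the particular min/max combinations are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """State-action value network (hypothetical helper, not from the paper)."""
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

def value_styles(q1, q2, state, action):
    """Four value estimates from one pair of critics; each estimate could
    back a policy head with a different exploration style."""
    v1, v2 = q1(state, action), q2(state, action)
    return {
        "critic1": v1,                     # trust critic 1's estimation bias
        "critic2": v2,                     # trust critic 2's estimation bias
        "pessimistic": torch.min(v1, v2),  # lower bound, as in clipped double Q-learning
        "optimistic": torch.max(v1, v2),   # upper bound; optimism encourages exploration
    }

# Two critics whose only difference is the random initialization; their
# disagreement supplies the estimation biases the abstract refers to.
q1 = QNetwork(state_dim=17, action_dim=6)
q2 = QNetwork(state_dim=17, action_dim=6)

A centralized actor trained against all four targets (for example, one policy head per style over a shared backbone) would then realize the message delivery between policies that the abstract mentions; since the abstract does not detail that component, the sketch stops at the value side.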

Cited By

  • (2023) Keep various trajectories. In Proceedings of the 37th International Conference on Neural Information Processing Systems, 5223-5235. https://doi.org/10.5555/3666122.3666353. Online publication date: 10 Dec 2023.

Published In

AAMAS '23: Proceedings of the 2023 International Conference on Autonomous Agents and Multiagent Systems
May 2023
3131 pages
ISBN:9781450394321
General Chairs: Noa Agmon, Bo An
Program Chairs: Alessandro Ricci, William Yeoh

Publisher

International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC

Publication History

Published: 30 May 2023

Author Tags

  1. continuous control tasks
  2. cooperative exploration

Qualifiers

  • Poster

Funding Sources

  • National Key Research and Development Program of China

Conference

AAMAS '23

Acceptance Rates

Overall Acceptance Rate 1,155 of 5,036 submissions, 23%
