Asymmetric Actor-Critic for Adapting to Changing Environments in Reinforcement Learning

Published: 17 September 2024

Abstract

Reinforcement Learning (RL) techniques have drawn great attention for many challenging tasks, but their performance deteriorates dramatically when applied to real-world problems. Various methods, such as domain randomization, have been proposed to deal with such situations by training agents under different environmental setups, so that they can generalize to different environments during deployment. However, these methods usually do not properly incorporate information about the underlying environmental factors the agents interact with, and can therefore be overly conservative when facing changes in the surroundings. In this paper, we first formalize the task of adapting to changing environmental dynamics in RL using Contextual Markov Decision Processes (CMDPs). We then propose Asymmetric Actor-Critic in Contextual RL (AACC), an end-to-end actor-critic method for this generalization task. We demonstrate experimentally that AACC delivers substantial performance improvements over existing baselines in a range of simulated environments.
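In asymmetric actor-critic methods, the critic is typically given privileged information that the actor does not receive. Below is a minimal sketch of how such context-conditioned training could look, assuming PyTorch; all module names and dimensions are illustrative assumptions, not the paper's implementation. The critic additionally receives a context vector describing the environmental factors (e.g. sampled dynamics parameters), which is available during training in simulation, while the actor sees only the regular observation.

```python
import torch
import torch.nn as nn


class Actor(nn.Module):
    """Policy network: conditioned only on the observation (deployable without context)."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim), nn.Tanh(),  # actions squashed to [-1, 1]
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)


class AsymmetricCritic(nn.Module):
    """Q-network: additionally conditioned on a privileged context vector
    (e.g. the dynamics parameters sampled for the current episode), assumed
    observable only at training time in simulation."""

    def __init__(self, obs_dim: int, act_dim: int, ctx_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim + ctx_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs: torch.Tensor, act: torch.Tensor, ctx: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([obs, act, ctx], dim=-1))


if __name__ == "__main__":
    # Illustrative dimensions: 11-D observation, 3-D action, 4-D context (e.g. mass, friction, ...).
    actor = Actor(obs_dim=11, act_dim=3)
    critic = AsymmetricCritic(obs_dim=11, act_dim=3, ctx_dim=4)

    obs = torch.randn(32, 11)   # batch of observations
    ctx = torch.randn(32, 4)    # per-episode context, known only in simulation
    act = actor(obs)            # the actor never sees the context
    q = critic(obs, act, ctx)   # the critic exploits the context for a better value estimate
    print(q.shape)              # torch.Size([32, 1])
```

Under this setup, only the actor is needed at deployment, so the true context never has to be measured on the real system; the context-conditioned critic serves purely to improve value estimation during training.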


Published In

Artificial Neural Networks and Machine Learning – ICANN 2024: 33rd International Conference on Artificial Neural Networks, Lugano, Switzerland, September 17–20, 2024, Proceedings, Part IV
Sep 2024
448 pages
ISBN: 978-3-031-72340-7
DOI: 10.1007/978-3-031-72341-4

Publisher

Springer-Verlag

Berlin, Heidelberg

Author Tags

  1. Reinforcement learning
  2. Actor-Critic
  3. CMDP

Qualifiers

  • Article
