Asymmetric Actor-Critic for Adapting to Changing Environments in Reinforcement Learning

Published: 17 September 2024

Abstract

Reinforcement Learning (RL) techniques have drawn great attention for many challenging tasks, but their performance deteriorates dramatically when applied to real-world problems. Various methods, such as domain randomization, have been proposed to deal with such situations by training agents under different environmental setups, so that they can generalize to different environments during deployment. However, these methods usually do not properly incorporate information about the underlying environmental factors the agents interact with, and can therefore be overly conservative when facing changes in the surroundings. In this paper, we first formalize the task of adapting to changing environmental dynamics in RL using Contextual Markov Decision Processes (CMDPs). We then propose Asymmetric Actor-Critic in Contextual RL (AACC), an end-to-end actor-critic method for this generalization task. We demonstrate experimentally that AACC delivers substantial performance improvements over existing baselines in a range of simulated environments.
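In asymmetric actor-critic methods, the critic is typically given privileged information that the actor does not receive. Below is a minimal sketch of how such context-conditioned training could look, assuming PyTorch; all module names and dimensions are illustrative assumptions, not the paper's implementation. The critic additionally receives a context vector describing the environmental factors (e.g. sampled dynamics parameters), which is available during training in simulation, while the actor sees only the regular observation.

```python
import torch
import torch.nn as nn


class Actor(nn.Module):
    """Policy network: conditioned only on the observation (deployable without context)."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim), nn.Tanh(),  # actions squashed to [-1, 1]
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)


class AsymmetricCritic(nn.Module):
    """Q-network: additionally conditioned on a privileged context vector
    (e.g. the dynamics parameters sampled for the current episode), assumed
    observable only at training time in simulation."""

    def __init__(self, obs_dim: int, act_dim: int, ctx_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim + ctx_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs: torch.Tensor, act: torch.Tensor, ctx: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([obs, act, ctx], dim=-1))


if __name__ == "__main__":
    # Illustrative dimensions: 11-D observation, 3-D action, 4-D context (e.g. mass, friction, ...).
    actor = Actor(obs_dim=11, act_dim=3)
    critic = AsymmetricCritic(obs_dim=11, act_dim=3, ctx_dim=4)

    obs = torch.randn(32, 11)   # batch of observations
    ctx = torch.randn(32, 4)    # per-episode context, known only in simulation
    act = actor(obs)            # the actor never sees the context
    q = critic(obs, act, ctx)   # the critic exploits the context for a better value estimate
    print(q.shape)              # torch.Size([32, 1])
```

Under this setup, only the actor is needed at deployment, so the true context never has to be measured on the real system; the context-conditioned critic serves purely to improve value estimation during training.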


Published In

Artificial Neural Networks and Machine Learning – ICANN 2024: 33rd International Conference on Artificial Neural Networks, Lugano, Switzerland, September 17–20, 2024, Proceedings, Part IV
Sep 2024
448 pages
ISBN: 978-3-031-72340-7
DOI: 10.1007/978-3-031-72341-4

Publisher

Springer-Verlag

Berlin, Heidelberg

Author Tags

  1. Reinforcement learning
  2. Actor-Critic
  3. CMDP

Qualifiers

  • Article
