DOI: 10.1109/IROS40897.2019.8967913

Hierarchical Reinforcement Learning for Quadruped Locomotion

Published: 01 November 2019
Abstract

    Legged locomotion is a challenging task for learning algorithms, especially when the task requires a diverse set of primitive behaviors. To solve these problems, we introduce a hierarchical framework that can automatically learn to decompose complex locomotion tasks. A high-level policy issues commands in the form of a latent vector and also selects for how long the low-level policy will execute the latent command. Concurrently, the low-level policy uses the latent command and only the robot’s on-board sensors to control the robot’s actuators. Our approach allows the high-level policy to run at a lower frequency than the low-level one. We test our framework on a path-following task for a dynamic quadruped robot and we show that steering behaviors automatically emerge in the latent command space as low-level skills are needed for this task. We then show efficient adaptation of the trained policy to new tasks by transfer of the trained low-level policy. Finally, we validate the policies on a real quadruped robot. To the best of our knowledge, this is the first application of end-to-end hierarchical learning to a real robotic locomotion task.
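To make the interface between the two levels concrete, the sketch below illustrates the control loop the abstract describes: the high-level policy emits a latent command together with a duration, and the low-level policy consumes that command plus on-board sensor readings at every control step, so the high level naturally runs at a lower rate. This is a minimal illustration, not the authors' implementation; the class names, observation keys, dimensions, placeholder outputs, and the gym-style environment interface are all assumptions.

```python
# Minimal sketch (not the authors' released code) of the two-level control
# loop described in the abstract. Class names, observation keys, dimensions,
# and the zero-valued placeholder outputs are illustrative assumptions.
import numpy as np


class HighLevelPolicy:
    """Maps task-level observations to a latent command and a duration."""

    def __init__(self, latent_dim=4, max_duration=50):
        self.latent_dim = latent_dim
        self.max_duration = max_duration

    def act(self, task_obs):
        # A trained policy would compute both outputs from task_obs;
        # here we return placeholders of the right shape.
        latent_command = np.zeros(self.latent_dim)
        duration_steps = self.max_duration
        return latent_command, duration_steps


class LowLevelPolicy:
    """Maps (latent command, on-board sensor readings) to motor targets."""

    def act(self, latent_command, proprio_obs):
        # A trained policy would output, e.g., joint position targets.
        return np.zeros(12)  # assuming 12 actuated joints on the quadruped


def run_episode(env, high_level, low_level, max_steps=1000):
    """Run one episode: the high level acts once per latent command,
    while the low level acts at every control step."""
    obs = env.reset()
    step = 0
    while step < max_steps:
        latent, duration = high_level.act(obs["task"])
        for _ in range(duration):
            # The low level sees only the latent command and on-board sensors.
            action = low_level.act(latent, obs["proprio"])
            obs, reward, done, info = env.step(action)
            step += 1
            if done or step >= max_steps:
                return
```

Here `env` is assumed to expose a gym-style reset/step interface whose observations are split into a "task" part (seen only by the high level) and a "proprio" part (the on-board sensors seen by the low level); in the paper's setting the inner loop would correspond to the robot's motor control rate and the outer loop to the slower latent-command switching rate.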

Cited By

• Dynamic Weights and Prior Reward in Policy Fusion for Compound Agent Learning. ACM Transactions on Intelligent Systems and Technology, 14(6):1–28, 14 November 2023. https://doi.org/10.1145/3623405
• ScanBot: Autonomous Reconstruction via Deep Reinforcement Learning. ACM Transactions on Graphics, 42(4):1–16, 26 July 2023. https://doi.org/10.1145/3592113

    Published In

    2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
    6597 pages

    Publisher

    IEEE Press

    Qualifiers

    • Research-article
