DOI: 10.1109/IROS40897.2019.8967913

Hierarchical Reinforcement Learning for Quadruped Locomotion

Published: 01 November 2019
Abstract

    Legged locomotion is a challenging task for learning algorithms, especially when the task requires a diverse set of primitive behaviors. To solve these problems, we introduce a hierarchical framework that can automatically learn to decompose complex locomotion tasks. A high-level policy issues commands in the form of a latent vector and also selects for how long the low-level policy will execute the latent command. Concurrently, the low-level policy uses the latent command and only the robot’s on-board sensors to control the robot’s actuators. Our approach allows the high-level policy to run at a lower frequency than the low-level one. We test our framework on a path-following task for a dynamic quadruped robot and we show that steering behaviors automatically emerge in the latent command space as low-level skills are needed for this task. We then show efficient adaptation of the trained policy to new tasks by transfer of the trained low-level policy. Finally, we validate the policies on a real quadruped robot. To the best of our knowledge, this is the first application of end-to-end hierarchical learning to a real robotic locomotion task.
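To make the interface between the two levels concrete, the sketch below illustrates the control loop the abstract describes: the high-level policy emits a latent command together with a duration, and the low-level policy consumes that command plus on-board sensor readings at every control step, so the high level naturally runs at a lower rate. This is a minimal illustration, not the authors' implementation; the class names, observation keys, dimensions, placeholder outputs, and the gym-style environment interface are all assumptions.

```python
# Minimal sketch (not the authors' released code) of the two-level control
# loop described in the abstract. Class names, observation keys, dimensions,
# and the zero-valued placeholder outputs are illustrative assumptions.
import numpy as np


class HighLevelPolicy:
    """Maps task-level observations to a latent command and a duration."""

    def __init__(self, latent_dim=4, max_duration=50):
        self.latent_dim = latent_dim
        self.max_duration = max_duration

    def act(self, task_obs):
        # A trained policy would compute both outputs from task_obs;
        # here we return placeholders of the right shape.
        latent_command = np.zeros(self.latent_dim)
        duration_steps = self.max_duration
        return latent_command, duration_steps


class LowLevelPolicy:
    """Maps (latent command, on-board sensor readings) to motor targets."""

    def act(self, latent_command, proprio_obs):
        # A trained policy would output, e.g., joint position targets.
        return np.zeros(12)  # assuming 12 actuated joints on the quadruped


def run_episode(env, high_level, low_level, max_steps=1000):
    """Run one episode: the high level acts once per latent command,
    while the low level acts at every control step."""
    obs = env.reset()
    step = 0
    while step < max_steps:
        latent, duration = high_level.act(obs["task"])
        for _ in range(duration):
            # The low level sees only the latent command and on-board sensors.
            action = low_level.act(latent, obs["proprio"])
            obs, reward, done, info = env.step(action)
            step += 1
            if done or step >= max_steps:
                return
```

Here `env` is assumed to expose a gym-style reset/step interface whose observations are split into a "task" part (seen only by the high level) and a "proprio" part (the on-board sensors seen by the low level); in the paper's setting the inner loop would correspond to the robot's motor control rate and the outer loop to the slower latent-command switching rate.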

Cited By

• Dynamic Weights and Prior Reward in Policy Fusion for Compound Agent Learning. ACM Transactions on Intelligent Systems and Technology, 14(6):1–28, 14 November 2023. https://doi.org/10.1145/3623405
• ScanBot: Autonomous Reconstruction via Deep Reinforcement Learning. ACM Transactions on Graphics, 42(4):1–16, 26 July 2023. https://doi.org/10.1145/3592113

    Published In

    2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
    6597 pages

    Publisher

    IEEE Press

    Qualifiers

    • Research-article
