
Rapid locomotion via reinforcement learning

Published: 01 April 2024

Abstract

Agile maneuvers such as sprinting and high-speed turning in the wild are challenging for legged robots. We present an end-to-end learned controller that achieves record agility for the MIT Mini Cheetah, sustaining speeds up to 3.9 m/s. This system runs and turns fast on natural terrains like grass, ice, and gravel and responds robustly to disturbances. Our controller is a neural network trained in simulation via reinforcement learning and transferred to the real world. The two key components are (i) an adaptive curriculum on velocity commands and (ii) an online system identification strategy for sim-to-real transfer. Videos of the robot’s behaviors are available at https://agility.csail.mit.edu/.
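The first key component, an adaptive curriculum on velocity commands, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the class name, thresholds, and step sizes are all hypothetical, and the only assumption taken from the abstract is that the range of sampled velocity commands grows as the policy's command-tracking performance improves.

```python
import random

class AdaptiveVelocityCurriculum:
    """Hypothetical sketch of an adaptive curriculum on velocity commands:
    widen the sampled command range only once the policy tracks the current
    range well. All names and numbers here are illustrative."""

    def __init__(self, v_init=1.0, v_max=4.0, expand_step=0.25,
                 success_threshold=0.8):
        self.v_limit = v_init                  # current command magnitude limit (m/s)
        self.v_max = v_max                     # hard cap on commanded speed
        self.expand_step = expand_step         # how much to widen per expansion
        self.success_threshold = success_threshold

    def sample_command(self, rng):
        # Draw a forward-velocity command within the current limit.
        return rng.uniform(-self.v_limit, self.v_limit)

    def update(self, tracking_score):
        # tracking_score in [0, 1]: how well the policy followed recent commands.
        # Expand the frontier only after the current range is mastered.
        if tracking_score >= self.success_threshold:
            self.v_limit = min(self.v_limit + self.expand_step, self.v_max)
        return self.v_limit

rng = random.Random(0)
cur = AdaptiveVelocityCurriculum()
for score in [0.5, 0.9, 0.9, 0.95]:   # first episode fails the threshold
    cur.update(score)
print(cur.v_limit)  # 1.75: three successful expansions of 0.25 from 1.0
```

The design point this illustrates is that the command distribution is driven by measured performance rather than a fixed schedule, which keeps training focused near the edge of the policy's current capability.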



Published In

International Journal of Robotics Research, Volume 43, Issue 4
April 2024
201 pages

Publisher

Sage Publications, Inc.

United States


Author Tags

  1. Robot learning
  2. Legged locomotion
  3. Sim-to-real reinforcement learning

Qualifiers

  • Research-article

