
Rapid locomotion via reinforcement learning

Published: 01 April 2024

Abstract

Agile maneuvers such as sprinting and high-speed turning in the wild are challenging for legged robots. We present an end-to-end learned controller that achieves record agility for the MIT Mini Cheetah, sustaining speeds up to 3.9 m/s. This system runs and turns fast on natural terrains like grass, ice, and gravel and responds robustly to disturbances. Our controller is a neural network trained in simulation via reinforcement learning and transferred to the real world. The two key components are (i) an adaptive curriculum on velocity commands and (ii) an online system identification strategy for sim-to-real transfer. Videos of the robot’s behaviors are available at https://agility.csail.mit.edu/.
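The first key component, an adaptive curriculum on velocity commands, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the class name, thresholds, and step sizes are all hypothetical, and the only assumption taken from the abstract is that the range of sampled velocity commands grows as the policy's command-tracking performance improves.

```python
import random

class AdaptiveVelocityCurriculum:
    """Hypothetical sketch of an adaptive curriculum on velocity commands:
    widen the sampled command range only once the policy tracks the current
    range well. All names and numbers here are illustrative."""

    def __init__(self, v_init=1.0, v_max=4.0, expand_step=0.25,
                 success_threshold=0.8):
        self.v_limit = v_init                  # current command magnitude limit (m/s)
        self.v_max = v_max                     # hard cap on commanded speed
        self.expand_step = expand_step         # how much to widen per expansion
        self.success_threshold = success_threshold

    def sample_command(self, rng):
        # Draw a forward-velocity command within the current limit.
        return rng.uniform(-self.v_limit, self.v_limit)

    def update(self, tracking_score):
        # tracking_score in [0, 1]: how well the policy followed recent commands.
        # Expand the frontier only after the current range is mastered.
        if tracking_score >= self.success_threshold:
            self.v_limit = min(self.v_limit + self.expand_step, self.v_max)
        return self.v_limit

rng = random.Random(0)
cur = AdaptiveVelocityCurriculum()
for score in [0.5, 0.9, 0.9, 0.95]:   # first episode fails the threshold
    cur.update(score)
print(cur.v_limit)  # 1.75: three successful expansions of 0.25 from 1.0
```

The design point this illustrates is that the command distribution is driven by measured performance rather than a fixed schedule, which keeps training focused near the edge of the policy's current capability.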



Published In

International Journal of Robotics Research, Volume 43, Issue 4
April 2024
201 pages

Publisher

Sage Publications, Inc.

United States


Author Tags

  1. Robot learning
  2. Legged locomotion
  3. Sim-to-real reinforcement learning

Qualifiers

  • Research-article

