Learning variable impedance control

Published: 01 June 2011

Abstract

One of the hallmarks of the performance, versatility, and robustness of biological motor control is the ability to adapt the impedance of the overall biomechanical system to different task requirements and stochastic disturbances. A transfer of this principle to robotics is desirable, for instance to enable robots to work robustly and safely in everyday human environments. It is, however, not trivial to derive variable impedance controllers for practical high degree-of-freedom (DOF) robotic tasks. In this contribution, we accomplish such variable impedance control with the reinforcement learning (RL) algorithm PI2 (Policy Improvement with Path Integrals). PI2 is a model-free, sampling-based learning method derived from first principles of stochastic optimal control. The PI2 algorithm requires no tuning of algorithmic parameters besides the exploration noise. The designer can thus fully focus on the design of the cost function to specify the task. From the viewpoint of robotics, a particularly useful property of PI2 is that it can scale to problems of many DOFs, so that reinforcement learning on real robotic systems becomes feasible. We sketch the PI2 algorithm and its theoretical properties, and show how it is applied to gain scheduling for variable impedance control. We evaluate our approach by presenting results on several simulated and real robots. We consider tasks involving accurate tracking through via-points, and manipulation tasks requiring physical contact with the environment. In these tasks, the optimal strategy requires tuning both a reference trajectory and the impedance of the end-effector. The results show that path-integral-based reinforcement learning can be used not only for planning but also to derive variable-gain feedback controllers in realistic scenarios. Thus, the power of variable impedance control is made available to a wide variety of robotic systems and practical applications.
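The abstract describes PI2 as a model-free, sampling-based method whose only tuning parameter is the exploration noise, and whose core operation is a cost-weighted average over noisy rollouts. The following is a minimal sketch of that update rule; the function name `pi2_update`, the temperature `lam`, and the min-max cost normalization are illustrative conventions from common presentations of path-integral RL, not the paper's exact implementation (the full algorithm also includes per-time-step updates, basis-function weighting, and temporal averaging, all omitted here).

```python
import numpy as np

def pi2_update(theta, epsilons, costs, lam=0.1):
    """One PI2-style update: move the parameters by the
    probability-weighted average of the exploration noise, where
    rollout weights come from a softmax over negated, normalized
    trajectory costs (low cost -> high weight)."""
    costs = np.asarray(costs, dtype=float)
    eps = np.asarray(epsilons, dtype=float)   # shape: (n_rollouts, n_params)
    # Min-max normalize costs so the exponential is numerically stable.
    c_min, c_max = costs.min(), costs.max()
    if c_max > c_min:
        s = (costs - c_min) / (c_max - c_min)
    else:
        s = np.zeros_like(costs)
    w = np.exp(-s / lam)
    w /= w.sum()                              # probabilities over rollouts
    return theta + (w[:, None] * eps).sum(axis=0)
```

In a gain-scheduling setting, `theta` would hold the (time-varying) feedback gain parameters, each rollout would perturb them with exploration noise, and the cost would score tracking accuracy and control effort, so that repeated updates shape both the trajectory and the impedance profile.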




Published In

International Journal of Robotics Research  Volume 30, Issue 7
June 2011
176 pages

Publisher

Sage Publications, Inc.

United States


Author Tags

  1. Reinforcement learning
  2. compliant control
  3. gain scheduling
  4. motion primitives
  5. stochastic optimal control
  6. variable impedance control


