Learning variable impedance control

Published: 01 June 2011

Abstract

One of the hallmarks of the performance, versatility, and robustness of biological motor control is the ability to adapt the impedance of the overall biomechanical system to different task requirements and stochastic disturbances. A transfer of this principle to robotics is desirable, for instance to enable robots to work robustly and safely in everyday human environments. It is, however, not trivial to derive variable impedance controllers for practical high degree-of-freedom (DOF) robotic tasks. In this contribution, we accomplish such variable impedance control with the reinforcement learning (RL) algorithm PI2 (Policy Improvement with Path Integrals). PI2 is a model-free, sampling-based learning method derived from first principles of stochastic optimal control. The PI2 algorithm requires no tuning of algorithmic parameters besides the exploration noise. The designer can thus fully focus on the design of the cost function to specify the task. From the viewpoint of robotics, a particularly useful property of PI2 is that it can scale to problems of many DOFs, so that reinforcement learning on real robotic systems becomes feasible. We sketch the PI2 algorithm and its theoretical properties, and show how it is applied to gain scheduling for variable impedance control. We evaluate our approach by presenting results on several simulated and real robots. We consider tasks involving accurate tracking through via-points, and manipulation tasks requiring physical contact with the environment. In these tasks, the optimal strategy requires tuning both a reference trajectory and the impedance of the end-effector. The results show that path-integral-based reinforcement learning can be used not only for planning but also to derive variable-gain feedback controllers in realistic scenarios. Thus, the power of variable impedance control is made available to a wide variety of robotic systems and practical applications.
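The abstract describes PI2 as a model-free, sampling-based method whose only tuning parameter is the exploration noise, and whose core operation is a cost-weighted average over noisy rollouts. The following is a minimal sketch of that update rule; the function name `pi2_update`, the temperature `lam`, and the min-max cost normalization are illustrative conventions from common presentations of path-integral RL, not the paper's exact implementation (the full algorithm also includes per-time-step updates, basis-function weighting, and temporal averaging, all omitted here).

```python
import numpy as np

def pi2_update(theta, epsilons, costs, lam=0.1):
    """One PI2-style update: move the parameters by the
    probability-weighted average of the exploration noise, where
    rollout weights come from a softmax over negated, normalized
    trajectory costs (low cost -> high weight)."""
    costs = np.asarray(costs, dtype=float)
    eps = np.asarray(epsilons, dtype=float)   # shape: (n_rollouts, n_params)
    # Min-max normalize costs so the exponential is numerically stable.
    c_min, c_max = costs.min(), costs.max()
    if c_max > c_min:
        s = (costs - c_min) / (c_max - c_min)
    else:
        s = np.zeros_like(costs)
    w = np.exp(-s / lam)
    w /= w.sum()                              # probabilities over rollouts
    return theta + (w[:, None] * eps).sum(axis=0)
```

In a gain-scheduling setting, `theta` would hold the (time-varying) feedback gain parameters, each rollout would perturb them with exploration noise, and the cost would score tracking accuracy and control effort, so that repeated updates shape both the trajectory and the impedance profile.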




Published In

International Journal of Robotics Research  Volume 30, Issue 7
June 2011
176 pages

Publisher

Sage Publications, Inc.

United States


Author Tags

  1. Reinforcement learning
  2. compliant control
  3. gain scheduling
  4. motion primitives
  5. stochastic optimal control
  6. variable impedance control


