A Human-Robot Collaborative Reinforcement Learning Algorithm

Published: 01 November 2010

Abstract

This paper presents a new reinforcement learning algorithm that enables collaborative learning between a robot and a human. The algorithm, which is based on the Q(λ) approach, expedites the learning process by taking advantage of human intelligence and expertise. The algorithm, denoted CQ(λ), provides the robot with self-awareness to adaptively switch its collaboration level from autonomous (self-performing: the robot decides which actions to take according to its learning function) to semi-autonomous (a human advisor guides the robot, and the robot incorporates this knowledge into its learning function). This awareness is represented by a self-test of the robot's learning performance. The approach of variable autonomy is demonstrated and evaluated using a fixed-arm robot that must find the optimal shaking policy for emptying the contents of a plastic bag. A comparison between CQ(λ) and the traditional Q(λ) reinforcement learning algorithm shows faster convergence for the CQ(λ) collaborative reinforcement learning algorithm.
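
To make the mechanism in the abstract concrete, here is a minimal Python sketch of the switching idea: a tabular Q(λ) learner whose self-test of recent learning performance decides whether it acts autonomously or queries a human advisor. This is an illustration under stated assumptions, not the authors' implementation; the class name, the rolling-average self-test, the threshold, and the advisor callback are all hypothetical.

    import random
    from collections import defaultdict, deque

    class CQLambdaAgent:
        """Sketch of a CQ(lambda)-style learner (details assumed, not from the paper)."""

        def __init__(self, actions, alpha=0.1, gamma=0.9, lam=0.8,
                     epsilon=0.1, threshold=0.0, window=10):
            self.actions = actions
            self.alpha, self.gamma, self.lam, self.epsilon = alpha, gamma, lam, epsilon
            self.q = defaultdict(float)          # Q-values keyed by (state, action)
            self.trace = defaultdict(float)      # eligibility traces for Q(lambda)
            self.rewards = deque(maxlen=window)  # rolling record used by the self-test
            self.threshold = threshold           # performance level that triggers advice

        def autonomous(self):
            # Self-test: stay autonomous while recent average reward is adequate.
            if len(self.rewards) < self.rewards.maxlen:
                return True  # too little evidence yet; default to autonomy
            return sum(self.rewards) / len(self.rewards) >= self.threshold

        def choose_action(self, state, advisor=None):
            # Semi-autonomous mode: the human advisor picks the action.
            if advisor is not None and not self.autonomous():
                return advisor(state)
            # Autonomous mode: epsilon-greedy over the learned Q-values.
            if random.random() < self.epsilon:
                return random.choice(self.actions)
            return max(self.actions, key=lambda a: self.q[(state, a)])

        def update(self, state, action, reward, next_state):
            # Q(lambda) update with replacing eligibility traces; advised and
            # self-chosen actions are folded into the same learning function.
            best_next = max(self.q[(next_state, a)] for a in self.actions)
            delta = reward + self.gamma * best_next - self.q[(state, action)]
            self.trace[(state, action)] = 1.0
            for key in list(self.trace):
                self.q[key] += self.alpha * delta * self.trace[key]
                self.trace[key] *= self.gamma * self.lam
            self.rewards.append(reward)

    # Hypothetical usage: the advisor callback stands in for the human operator.
    agent = CQLambdaAgent(actions=[0, 1, 2])
    action = agent.choose_action("bag_upright", advisor=lambda s: 1)
    agent.update("bag_upright", action, reward=0.5, next_state="bag_tilted")

In this sketch the self-test is simply a rolling average of reward compared against a fixed threshold; the abstract frames it more generally as a self-test of the learning function's performance, so the criterion above is only one plausible instantiation of the variable-autonomy switch.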

Published In

Journal of Intelligent and Robotic Systems, Volume 60, Issue 2
November 2010
153 pages

Publisher

Kluwer Academic Publishers

United States

Author Tags

  1. Human-robot collaboration
  2. Reinforcement learning
  3. Robot learning

Qualifiers

  • Article

Cited By

  • (2023) Variable Autonomy through Responsible Robotics: Design Guidelines and Research Agenda. ACM Transactions on Human-Robot Interaction 13(1), 1-36. DOI 10.1145/3636432. Online publication date: 7-Dec-2023.
  • (2022) A Survey of Robot Learning Strategies for Human-Robot Collaboration in Industrial Settings. Robotics and Computer-Integrated Manufacturing 73:C. DOI 10.1016/j.rcim.2021.102231. Online publication date: 1-Feb-2022.
  • (2021) A Survey of Collaborative Reinforcement Learning: Interactive Methods and Design Patterns. Proceedings of the 2021 ACM Designing Interactive Systems Conference, 1579-1590. DOI 10.1145/3461778.3462135. Online publication date: 28-Jun-2021.
  • (2021) Learn Task First or Learn Human Partner First: A Hierarchical Task Decomposition Method for Human-Robot Cooperation. 2021 IEEE International Conference on Systems, Man, and Cybernetics (SMC), 590-595. DOI 10.1109/SMC52423.2021.9659041. Online publication date: 17-Oct-2021.
  • (2020) A Control Scheme for Physical Human-Robot Interaction Coupled with an Environment of Unknown Stiffness. Journal of Intelligent and Robotic Systems 100(1), 165-182. DOI 10.1007/s10846-020-01176-2. Online publication date: 1-Oct-2020.
  • (2018) Preference-Based Assistance Prediction for Human-Robot Collaboration Tasks. 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 4441-4448. DOI 10.1109/IROS.2018.8593716. Online publication date: 1-Oct-2018.
  • (2016) Multiple Model Q-Learning for Stochastic Asynchronous Rewards. Journal of Intelligent and Robotic Systems 81(3-4), 407-422. DOI 10.1007/s10846-015-0222-2. Online publication date: 1-Mar-2016.
  • (2015) Reinforcement Learning of Variable Admittance Control for Human-Robot Co-manipulation. 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 1011-1016. DOI 10.1109/IROS.2015.7353494. Online publication date: 28-Sep-2015.
  • (2015) Socially-Assistive Emotional Robot that Learns from the Wizard During the Interaction for Preventing Low Back Pain in Children. Social Robotics, 411-420. DOI 10.1007/978-3-319-25554-5_41. Online publication date: 26-Oct-2015.
