A Human-Robot Collaborative Reinforcement Learning Algorithm

Published: 01 November 2010

Abstract

This paper presents a new reinforcement learning algorithm that enables collaborative learning between a robot and a human. The algorithm, which is based on the Q(λ) approach, expedites the learning process by taking advantage of human intelligence and expertise. The algorithm, denoted CQ(λ), provides the robot with self-awareness to adaptively switch its collaboration level from autonomous (self-performing: the robot decides which actions to take according to its learning function) to semi-autonomous (a human advisor guides the robot, and the robot incorporates this knowledge into its learning function). This awareness is represented by a self-test of the robot's learning performance. The approach of variable autonomy is demonstrated and evaluated using a fixed-arm robot that must find the optimal shaking policy for emptying the contents of a plastic bag. A comparison between CQ(λ) and the traditional Q(λ) reinforcement learning algorithm shows faster convergence for the CQ(λ) collaborative reinforcement learning algorithm.
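
To make the mechanism in the abstract concrete, here is a minimal Python sketch of the switching idea: a tabular Q(λ) learner whose self-test of recent learning performance decides whether it acts autonomously or queries a human advisor. This is an illustration under stated assumptions, not the authors' implementation; the class name, the rolling-average self-test, the threshold, and the advisor callback are all hypothetical.

    import random
    from collections import defaultdict, deque

    class CQLambdaAgent:
        """Sketch of a CQ(lambda)-style learner (details assumed, not from the paper)."""

        def __init__(self, actions, alpha=0.1, gamma=0.9, lam=0.8,
                     epsilon=0.1, threshold=0.0, window=10):
            self.actions = actions
            self.alpha, self.gamma, self.lam, self.epsilon = alpha, gamma, lam, epsilon
            self.q = defaultdict(float)          # Q-values keyed by (state, action)
            self.trace = defaultdict(float)      # eligibility traces for Q(lambda)
            self.rewards = deque(maxlen=window)  # rolling record used by the self-test
            self.threshold = threshold           # performance level that triggers advice

        def autonomous(self):
            # Self-test: stay autonomous while recent average reward is adequate.
            if len(self.rewards) < self.rewards.maxlen:
                return True  # too little evidence yet; default to autonomy
            return sum(self.rewards) / len(self.rewards) >= self.threshold

        def choose_action(self, state, advisor=None):
            # Semi-autonomous mode: the human advisor picks the action.
            if advisor is not None and not self.autonomous():
                return advisor(state)
            # Autonomous mode: epsilon-greedy over the learned Q-values.
            if random.random() < self.epsilon:
                return random.choice(self.actions)
            return max(self.actions, key=lambda a: self.q[(state, a)])

        def update(self, state, action, reward, next_state):
            # Q(lambda) update with replacing eligibility traces; advised and
            # self-chosen actions are folded into the same learning function.
            best_next = max(self.q[(next_state, a)] for a in self.actions)
            delta = reward + self.gamma * best_next - self.q[(state, action)]
            self.trace[(state, action)] = 1.0
            for key in list(self.trace):
                self.q[key] += self.alpha * delta * self.trace[key]
                self.trace[key] *= self.gamma * self.lam
            self.rewards.append(reward)

    # Hypothetical usage: the advisor callback stands in for the human operator.
    agent = CQLambdaAgent(actions=[0, 1, 2])
    action = agent.choose_action("bag_upright", advisor=lambda s: 1)
    agent.update("bag_upright", action, reward=0.5, next_state="bag_tilted")

In this sketch the self-test is simply a rolling average of reward compared against a fixed threshold; the abstract frames it more generally as a self-test of the learning function's performance, so the criterion above is only one plausible instantiation of the variable-autonomy switch.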

Published In

Journal of Intelligent and Robotic Systems, Volume 60, Issue 2
November 2010
153 pages

Publisher

Kluwer Academic Publishers

United States

Author Tags

  1. Human-robot collaboration
  2. Reinforcement learning
  3. Robot learning

Qualifiers

  • Article

Cited By

  • (2023) Variable Autonomy through Responsible Robotics: Design Guidelines and Research Agenda. ACM Transactions on Human-Robot Interaction 13(1), 1-36. DOI 10.1145/3636432. Online publication date: 7-Dec-2023.
  • (2022) A Survey of Robot Learning Strategies for Human-Robot Collaboration in Industrial Settings. Robotics and Computer-Integrated Manufacturing 73:C. DOI 10.1016/j.rcim.2021.102231. Online publication date: 1-Feb-2022.
  • (2021) A Survey of Collaborative Reinforcement Learning: Interactive Methods and Design Patterns. Proceedings of the 2021 ACM Designing Interactive Systems Conference, 1579-1590. DOI 10.1145/3461778.3462135. Online publication date: 28-Jun-2021.
  • (2021) Learn Task First or Learn Human Partner First: A Hierarchical Task Decomposition Method for Human-Robot Cooperation. 2021 IEEE International Conference on Systems, Man, and Cybernetics (SMC), 590-595. DOI 10.1109/SMC52423.2021.9659041. Online publication date: 17-Oct-2021.
  • (2020) A Control Scheme for Physical Human-Robot Interaction Coupled with an Environment of Unknown Stiffness. Journal of Intelligent and Robotic Systems 100(1), 165-182. DOI 10.1007/s10846-020-01176-2. Online publication date: 1-Oct-2020.
  • (2018) Preference-Based Assistance Prediction for Human-Robot Collaboration Tasks. 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 4441-4448. DOI 10.1109/IROS.2018.8593716. Online publication date: 1-Oct-2018.
  • (2016) Multiple Model Q-Learning for Stochastic Asynchronous Rewards. Journal of Intelligent and Robotic Systems 81(3-4), 407-422. DOI 10.1007/s10846-015-0222-2. Online publication date: 1-Mar-2016.
  • (2015) Reinforcement Learning of Variable Admittance Control for Human-Robot Co-manipulation. 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 1011-1016. DOI 10.1109/IROS.2015.7353494. Online publication date: 28-Sep-2015.
  • (2015) Socially-Assistive Emotional Robot that Learns from the Wizard During the Interaction for Preventing Low Back Pain in Children. Social Robotics, 411-420. DOI 10.1007/978-3-319-25554-5_41. Online publication date: 26-Oct-2015.
