DOI: 10.1145/2696454.2696455
research-article

Efficient Model Learning from Joint-Action Demonstrations for Human-Robot Collaborative Tasks

Published: 02 March 2015
    Abstract

    We present a framework for automatically learning human user models from joint-action demonstrations that enables a robot to compute a robust policy for a collaborative task with a human. First, the demonstrated action sequences are clustered into different human types using an unsupervised learning algorithm. A reward function is then learned for each type through the employment of an inverse reinforcement learning algorithm. The learned model is then incorporated into a mixed-observability Markov decision process (MOMDP) formulation, wherein the human type is a partially observable variable. With this framework, we can infer online the human type of a new user that was not included in the training set, and can compute a policy for the robot that will be aligned to the preference of this user. In a human subject experiment (n=30), participants agreed more strongly that the robot anticipated their actions when working with a robot incorporating the proposed framework (p<0.01), compared to manually annotating robot actions. In trials where participants faced difficulty annotating the robot actions to complete the task, the proposed framework significantly improved team efficiency (p<0.01). The robot incorporating the framework was also found to be more responsive to human actions compared to policies computed using a hand-coded reward function by a domain expert (p<0.01). These results indicate that learning human user models from joint-action demonstrations and encoding them in a MOMDP formalism can support effective teaming in human-robot collaborative tasks.
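    The abstract outlines a three-stage pipeline: demonstrated joint-action sequences are clustered into human types, a reward function is learned per type via inverse reinforcement learning, and the type is treated as the partially observable variable of a MOMDP so that a new user's type can be inferred online. The sketch below is a minimal, illustrative rendering of that pipeline, not the authors' implementation: it substitutes k-means on action histograms for the paper's clustering step, a cluster-mean weight vector for the learned reward, and a softmax ("noisily rational") action model with a simple Bayes update for the belief over the hidden type. All function names, features, and the toy data are assumptions made for illustration only.

        import numpy as np

        rng = np.random.default_rng(0)

        # Toy demonstrations: each is a sequence of discrete action indices
        # (hypothetical stand-ins for the demonstrated joint-action sequences).
        demos = [rng.integers(0, 3, size=8) for _ in range(20)]

        def action_histogram(seq, n_actions=3):
            """Fixed-length feature vector: empirical action frequencies."""
            h = np.bincount(seq, minlength=n_actions).astype(float)
            return h / h.sum()

        X = np.array([action_histogram(d) for d in demos])

        # Step 1 -- cluster demonstrations into human "types" (k-means here is
        # only a placeholder for the paper's unsupervised clustering step).
        def kmeans(X, k=2, iters=50):
            centers = X[rng.choice(len(X), k, replace=False)]
            for _ in range(iters):
                labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
                for j in range(k):
                    if np.any(labels == j):
                        centers[j] = X[labels == j].mean(axis=0)
            return centers, labels

        centers, _ = kmeans(X, k=2)

        # Step 2 -- a per-type reward weight vector over action features.
        # Placeholder for inverse reinforcement learning: the cluster mean is
        # used directly as a linear reward for that type.
        type_rewards = centers                      # shape: (n_types, n_actions)

        # Step 3 -- online belief update over the hidden human type; in the
        # MOMDP the type is partially observable and is inferred from the
        # human's observed actions by Bayes' rule.
        def likelihood(action, reward_weights, temperature=5.0):
            """Softmax ('noisily rational') action model under one type."""
            logits = temperature * reward_weights
            p = np.exp(logits - logits.max())
            return (p / p.sum())[action]

        belief = np.full(len(type_rewards), 1.0 / len(type_rewards))
        new_user_actions = demos[0]                 # pretend this is a new user
        for a in new_user_actions:
            belief *= [likelihood(a, w) for w in type_rewards]
            belief /= belief.sum()

        print("Belief over human types for the new user:", np.round(belief, 3))

    In the paper, a belief of this kind conditions a MOMDP policy computed offline, so the robot's actions adapt as the belief over the user's type sharpens; that planning step is omitted from the sketch above.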




    Published In

    HRI '15: Proceedings of the Tenth Annual ACM/IEEE International Conference on Human-Robot Interaction
    March 2015
    368 pages
    ISBN: 9781450328838
    DOI: 10.1145/2696454
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 02 March 2015


    Author Tags

    1. human-robot collaboration
    2. mixed observability Markov decision process
    3. model learning

    Qualifiers

    • Research-article

    Conference

    HRI '15

    Acceptance Rates

    HRI '15 Paper Acceptance Rate 43 of 169 submissions, 25%;
    Overall Acceptance Rate 268 of 1,124 submissions, 24%


    Cited By

    • (2024) Offline Risk-sensitive RL with Partial Observability to Enhance Performance in Human-Robot Teaming. Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems, pp. 58-67. DOI: 10.5555/3635637.3662852. Online publication date: 6-May-2024.
    • (2024) Integration of Deep Learning and Collaborative Robot for Assembly Tasks. Applied Sciences, 14(2): 839. DOI: 10.3390/app14020839. Online publication date: 18-Jan-2024.
    • (2024) HOTSPOT: An ad hoc teamwork platform for mixed human-robot teams. PLOS ONE, 19(6): e0305705. DOI: 10.1371/journal.pone.0305705. Online publication date: 28-Jun-2024.
    • (2024) Multi-scale progressive fusion-based depth image completion and enhancement for industrial collaborative robot applications. Journal of Intelligent Manufacturing, 35(5): 2119-2135. DOI: 10.1007/s10845-023-02299-7. Online publication date: 1-Jun-2024.
    • (2024) HAbot: a human-centered augmented reality robot programming method with the awareness of cognitive load. Journal of Intelligent Manufacturing, 35(5): 1985-2003. DOI: 10.1007/s10845-023-02096-2. Online publication date: 1-Jun-2024.
    • (2023) Dec-AIRL: Decentralized Adversarial IRL for Human-Robot Teaming. Proceedings of the 2023 International Conference on Autonomous Agents and Multiagent Systems, pp. 1116-1124. DOI: 10.5555/3545946.3598753. Online publication date: 30-May-2023.
    • (2023) Literature Review on Recent Trends and Perspectives of Collaborative Robotics in Work 4.0. Robotics, 12(3): 84. DOI: 10.3390/robotics12030084. Online publication date: 7-Jun-2023.
    • (2023) FABRIC: A Framework for the Design and Evaluation of Collaborative Robots with Extended Human Adaptation. ACM Transactions on Human-Robot Interaction, 12(3): 1-54. DOI: 10.1145/3585276. Online publication date: 17-Mar-2023.
    • (2023) Transfer Learning of Human Preferences for Proactive Robot Assistance in Assembly Tasks. Proceedings of the 2023 ACM/IEEE International Conference on Human-Robot Interaction, pp. 575-583. DOI: 10.1145/3568162.3576965. Online publication date: 13-Mar-2023.
    • (2023) Individual Squash Training is More Effective and Social with a Humanoid Robotic Coach. 2023 32nd IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), pp. 621-626. DOI: 10.1109/RO-MAN57019.2023.10309567. Online publication date: 28-Aug-2023.
