DOI: 10.5555/3535850.3535868

A Hierarchical Bayesian Process for Inverse RL in Partially-Controlled Environments

Published: 09 May 2022

    Abstract

    Robots learning from observations in the real world may encounter objects or agents in the environment, other than the expert giving the demonstration, that cause nuisance observations. These confounding elements are typically removed in fully-controlled settings such as virtual simulations or labs. When complete removal is impossible, the nuisance observations must instead be filtered out; however, attributing each observation to its source is difficult when large numbers of observations are gathered. To address this, we present a hierarchical Bayesian process that models both the expert's and the confounding elements' observations, thereby explicitly capturing the diverse observations a robot may receive. We extend an existing inverse reinforcement learning algorithm, originally designed to work under partial occlusion of the expert, to handle these diverse and noisy observations. We demonstrate the model's effectiveness in a simulated robotic produce-sorting domain containing both occlusion and confounding elements, where our technique outperforms several comparative methods and is second only to having perfect knowledge of the subject's trajectory.
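
    As a concrete illustration of the source-attribution problem described above, the sketch below separates expert observations from confounding ones using a simple two-component Bayesian mixture with Gibbs-style resampling of latent source labels. This is a minimal hypothetical sketch, not the paper's algorithm: the expert_likelihood and confounder_likelihood models, the Beta prior on the expert's mixing weight, and all constants are illustrative assumptions.

```python
# Minimal sketch of latent source attribution for noisy demonstrations.
# NOT the paper's algorithm: the likelihood models, the Beta prior on the
# expert's mixing weight, and all constants here are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def expert_likelihood(obs):
    # Hypothetical model: expert observations cluster around a known value.
    return np.exp(-0.5 * (obs - 1.0) ** 2)

def confounder_likelihood(obs):
    # Hypothetical model: confounding elements emit diffuse, noisy readings.
    return np.full_like(obs, 0.1)

def attribute_sources(observations, iters=200, alpha=1.0, beta=1.0):
    """Gibbs-style posterior over which observations came from the expert.

    Latent z_i = 1 attributes observation i to the expert; the expert's
    mixing weight pi has a Beta(alpha, beta) prior.
    """
    obs = np.asarray(observations, dtype=float)
    z = rng.integers(0, 2, size=obs.shape)           # random initial labels
    for _ in range(iters):
        pi = rng.beta(alpha + z.sum(), beta + (1 - z).sum())
        p_exp = pi * expert_likelihood(obs)
        p_con = (1.0 - pi) * confounder_likelihood(obs)
        resp = p_exp / (p_exp + p_con)               # P(z_i = 1 | obs, pi)
        z = (rng.random(obs.shape) < resp).astype(int)
    return resp

# Usage: keep only high-confidence expert observations for downstream IRL.
obs = np.concatenate([rng.normal(1.0, 0.3, 50),      # expert-like cluster
                      rng.uniform(-3.0, 3.0, 30)])   # confounder noise
resp = attribute_sources(obs)
expert_obs = obs[resp > 0.5]
print(f"kept {expert_obs.size} of {obs.size} observations as expert data")
```

    In the paper's setting, the expert observation model would be tied to the policy being learned by the IRL algorithm, and the resulting source attributions would determine which observations the reward learner trusts; the fixed likelihoods here serve only to keep the sketch self-contained.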

    Supplementary Material

    ZIP File (fp757aux.zip)
    We give the complete definitions of the MDPs for the two domains, the observation variables


    Published In

    AAMAS '22: Proceedings of the 21st International Conference on Autonomous Agents and Multiagent Systems
    May 2022
    1990 pages
    ISBN:9781450392136

    Publisher

    International Foundation for Autonomous Agents and Multiagent Systems

    Richland, SC

    Author Tags

    1. cobots
    2. maximum entropy
    3. produce sorting
    4. uncertainty

    Qualifiers

    • Research-article

    Funding Sources

    • NSF
    • GA Research Alliance

    Conference

    AAMAS '22

    Acceptance Rates

    Overall acceptance rate: 1,155 of 5,036 submissions (23%)
