DOI: 10.5555/3535850.3535868

A Hierarchical Bayesian Process for Inverse RL in Partially-Controlled Environments

Published: 09 May 2022

    Abstract

    Robots learning from observations in the real world may encounter objects or agents in the environment, other than the expert giving the demonstration, that cause nuisance observations. These confounding elements are typically removed in fully-controlled settings such as virtual simulations or labs. When complete removal is impossible, the nuisance observations must instead be filtered out; however, attributing each observation to its source is difficult when large numbers of observations are gathered. To address this, we present a hierarchical Bayesian process that models both the expert's and the confounding elements' observations, thereby explicitly capturing the diverse observations a robot may receive. We extend an existing inverse reinforcement learning algorithm, originally designed to work under partial occlusion of the expert, to handle these diverse and noisy observations. We demonstrate the model's effectiveness in a simulated robotic produce-sorting domain containing both occlusion and confounding elements, where our technique outperforms several comparative methods and is second only to having perfect knowledge of the subject's trajectory.
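
    As a concrete illustration of the source-attribution problem described above, the sketch below separates expert observations from confounding ones using a simple two-component Bayesian mixture with Gibbs-style resampling of latent source labels. This is a minimal hypothetical sketch, not the paper's algorithm: the expert_likelihood and confounder_likelihood models, the Beta prior on the expert's mixing weight, and all constants are illustrative assumptions.

```python
# Minimal sketch of latent source attribution for noisy demonstrations.
# NOT the paper's algorithm: the likelihood models, the Beta prior on the
# expert's mixing weight, and all constants here are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def expert_likelihood(obs):
    # Hypothetical model: expert observations cluster around a known value.
    return np.exp(-0.5 * (obs - 1.0) ** 2)

def confounder_likelihood(obs):
    # Hypothetical model: confounding elements emit diffuse, noisy readings.
    return np.full_like(obs, 0.1)

def attribute_sources(observations, iters=200, alpha=1.0, beta=1.0):
    """Gibbs-style posterior over which observations came from the expert.

    Latent z_i = 1 attributes observation i to the expert; the expert's
    mixing weight pi has a Beta(alpha, beta) prior.
    """
    obs = np.asarray(observations, dtype=float)
    z = rng.integers(0, 2, size=obs.shape)           # random initial labels
    for _ in range(iters):
        pi = rng.beta(alpha + z.sum(), beta + (1 - z).sum())
        p_exp = pi * expert_likelihood(obs)
        p_con = (1.0 - pi) * confounder_likelihood(obs)
        resp = p_exp / (p_exp + p_con)               # P(z_i = 1 | obs, pi)
        z = (rng.random(obs.shape) < resp).astype(int)
    return resp

# Usage: keep only high-confidence expert observations for downstream IRL.
obs = np.concatenate([rng.normal(1.0, 0.3, 50),      # expert-like cluster
                      rng.uniform(-3.0, 3.0, 30)])   # confounder noise
resp = attribute_sources(obs)
expert_obs = obs[resp > 0.5]
print(f"kept {expert_obs.size} of {obs.size} observations as expert data")
```

    In the paper's setting, the expert observation model would be tied to the policy being learned by the IRL algorithm, and the resulting source attributions would determine which observations the reward learner trusts; the fixed likelihoods here serve only to keep the sketch self-contained.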

    Supplementary Material

    ZIP File (fp757aux.zip)
    We give the complete definitions of the MDPs for the two domains, the observation variables


    Published In

    AAMAS '22: Proceedings of the 21st International Conference on Autonomous Agents and Multiagent Systems
    May 2022
    1990 pages
    ISBN:9781450392136

    Publisher

    International Foundation for Autonomous Agents and Multiagent Systems

    Richland, SC

    Author Tags

    1. cobots
    2. maximum entropy
    3. produce sorting
    4. uncertainty

    Qualifiers

    • Research-article

    Funding Sources

    • NSF
    • GA Research Alliance

    Conference

    AAMAS '22

    Acceptance Rates

    Overall acceptance rate: 1,155 of 5,036 submissions (23%)
