DOI: 10.5555/3523760.3523841
research-article
Public Access

APReL: A Library for Active Preference-based Reward Learning Algorithms

Published: 07 March 2022

Abstract

Reward learning is a fundamental problem in human-robot interaction for building robots that operate in alignment with what their human user wants. Many preference-based learning algorithms and active querying techniques have been proposed as a solution to this problem. In this paper, we present APReL, a library for active preference-based reward learning algorithms, which enables researchers and practitioners to experiment with the existing techniques and easily develop their own algorithms for the various modules of the problem. APReL is available at https://github.com/Stanford-ILIAD/APReL.
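
For orientation, the loop below is a minimal, self-contained sketch of the active preference-based reward learning setup that the library's modules address. It is written in plain NumPy and is not APReL's actual API: the linear reward features, the softmax (Bradley-Terry) choice model, the sample-based belief over reward weights, and the mutual-information query selection are all illustrative assumptions; refer to the linked repository for the library's real interfaces.

# Minimal sketch (not the APReL API) of an active preference-based reward
# learning loop: trajectories are summarized by feature vectors, the reward is
# linear in those features, a softmax (Bradley-Terry) model explains the user's
# pairwise choices, and queries are selected to maximize the expected
# information gain under a sample-based belief over the reward weights.
import numpy as np

rng = np.random.default_rng(0)
num_features = 4
num_trajectories = 30

# Feature vectors of a pre-generated trajectory set (assumed given).
features = rng.normal(size=(num_trajectories, num_features))

# Sample-based belief over reward weights w (unit-norm samples).
belief = rng.normal(size=(200, num_features))
belief /= np.linalg.norm(belief, axis=1, keepdims=True)

def preference_likelihood(w, phi_a, phi_b):
    """P(user prefers trajectory a over b | w), softmax choice model."""
    return 1.0 / (1.0 + np.exp(w @ (phi_b - phi_a)))

def expected_information_gain(phi_a, phi_b, belief):
    """Mutual information between the query response and w under the belief."""
    p_a = preference_likelihood(belief, phi_a, phi_b)          # one value per belief sample
    marginal_a = p_a.mean()
    entropy_marginal = -(marginal_a * np.log(marginal_a + 1e-12)
                         + (1 - marginal_a) * np.log(1 - marginal_a + 1e-12))
    entropy_conditional = np.mean(-(p_a * np.log(p_a + 1e-12)
                                    + (1 - p_a) * np.log(1 - p_a + 1e-12)))
    return entropy_marginal - entropy_conditional

def simulated_user_response(phi_a, phi_b, true_w):
    """Stand-in for a human: noisily prefers the higher-reward trajectory."""
    return rng.random() < preference_likelihood(true_w, phi_a, phi_b)

true_w = rng.normal(size=num_features)
true_w /= np.linalg.norm(true_w)

for _ in range(10):
    # Actively pick the pairwise query with the highest expected information gain.
    pairs = [(i, j) for i in range(num_trajectories) for j in range(i + 1, num_trajectories)]
    i, j = max(pairs, key=lambda p: expected_information_gain(features[p[0]], features[p[1]], belief))

    prefers_a = simulated_user_response(features[i], features[j], true_w)
    phi_pref, phi_other = (features[i], features[j]) if prefers_a else (features[j], features[i])

    # Approximate Bayesian update: reweight and resample belief samples by the
    # likelihood of the observed answer.
    weights = preference_likelihood(belief, phi_pref, phi_other)
    weights /= weights.sum()
    belief = belief[rng.choice(len(belief), size=len(belief), p=weights)]

estimate = belief.mean(axis=0)
estimate /= np.linalg.norm(estimate)
print("alignment with true reward:", float(true_w @ estimate))

The importance-resampling update above is a simplification; a library like APReL can swap in other belief representations, user models, and query types, which is the kind of modularity the abstract refers to.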

Supplemental Material

MP4 File
Supplemental video


Information & Contributors

Information

Published In

HRI '22: Proceedings of the 2022 ACM/IEEE International Conference on Human-Robot Interaction
March 2022
1353 pages

Publisher

IEEE Press

Author Tags

  1. active learning
  2. preference-based learning
  3. reward learning
  4. software library

Qualifiers

  • Research-article

Funding Sources

  • FLI
  • NSF
  • DARPA

Conference

HRI '22

Acceptance Rates

Overall Acceptance Rate 268 of 1,124 submissions, 24%
