DOI: 10.5555/3523760.3523841
research-article
Public Access

APReL: A Library for Active Preference-based Reward Learning Algorithms

Published: 07 March 2022

Abstract

Reward learning is a fundamental problem in human-robot interaction for building robots that operate in alignment with what their human user wants. Many preference-based learning algorithms and active querying techniques have been proposed as a solution to this problem. In this paper, we present APReL, a library for active preference-based reward learning algorithms, which enables researchers and practitioners to experiment with the existing techniques and easily develop their own algorithms for the various modules of the problem. APReL is available at https://github.com/Stanford-ILIAD/APReL.
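
For orientation, the loop below is a minimal, self-contained sketch of the active preference-based reward learning setup that the library's modules address. It is written in plain NumPy and is not APReL's actual API: the linear reward features, the softmax (Bradley-Terry) choice model, the sample-based belief over reward weights, and the mutual-information query selection are all illustrative assumptions; refer to the linked repository for the library's real interfaces.

# Minimal sketch (not the APReL API) of an active preference-based reward
# learning loop: trajectories are summarized by feature vectors, the reward is
# linear in those features, a softmax (Bradley-Terry) model explains the user's
# pairwise choices, and queries are selected to maximize the expected
# information gain under a sample-based belief over the reward weights.
import numpy as np

rng = np.random.default_rng(0)
num_features = 4
num_trajectories = 30

# Feature vectors of a pre-generated trajectory set (assumed given).
features = rng.normal(size=(num_trajectories, num_features))

# Sample-based belief over reward weights w (unit-norm samples).
belief = rng.normal(size=(200, num_features))
belief /= np.linalg.norm(belief, axis=1, keepdims=True)

def preference_likelihood(w, phi_a, phi_b):
    """P(user prefers trajectory a over b | w), softmax choice model."""
    return 1.0 / (1.0 + np.exp(w @ (phi_b - phi_a)))

def expected_information_gain(phi_a, phi_b, belief):
    """Mutual information between the query response and w under the belief."""
    p_a = preference_likelihood(belief, phi_a, phi_b)          # one value per belief sample
    marginal_a = p_a.mean()
    entropy_marginal = -(marginal_a * np.log(marginal_a + 1e-12)
                         + (1 - marginal_a) * np.log(1 - marginal_a + 1e-12))
    entropy_conditional = np.mean(-(p_a * np.log(p_a + 1e-12)
                                    + (1 - p_a) * np.log(1 - p_a + 1e-12)))
    return entropy_marginal - entropy_conditional

def simulated_user_response(phi_a, phi_b, true_w):
    """Stand-in for a human: noisily prefers the higher-reward trajectory."""
    return rng.random() < preference_likelihood(true_w, phi_a, phi_b)

true_w = rng.normal(size=num_features)
true_w /= np.linalg.norm(true_w)

for _ in range(10):
    # Actively pick the pairwise query with the highest expected information gain.
    pairs = [(i, j) for i in range(num_trajectories) for j in range(i + 1, num_trajectories)]
    i, j = max(pairs, key=lambda p: expected_information_gain(features[p[0]], features[p[1]], belief))

    prefers_a = simulated_user_response(features[i], features[j], true_w)
    phi_pref, phi_other = (features[i], features[j]) if prefers_a else (features[j], features[i])

    # Approximate Bayesian update: reweight and resample belief samples by the
    # likelihood of the observed answer.
    weights = preference_likelihood(belief, phi_pref, phi_other)
    weights /= weights.sum()
    belief = belief[rng.choice(len(belief), size=len(belief), p=weights)]

estimate = belief.mean(axis=0)
estimate /= np.linalg.norm(estimate)
print("alignment with true reward:", float(true_w @ estimate))

The importance-resampling update above is a simplification; a library like APReL can swap in other belief representations, user models, and query types, which is the kind of modularity the abstract refers to.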

Supplemental Material

MP4 File
Supplemental video


Information & Contributors

Information

Published In

HRI '22: Proceedings of the 2022 ACM/IEEE International Conference on Human-Robot Interaction
March 2022
1353 pages

Publisher

IEEE Press

Author Tags

  1. active learning
  2. preference-based learning
  3. reward learning
  4. software library

Qualifiers

  • Research-article

Funding Sources

  • FLI
  • NSF
  • DARPA

Conference

HRI '22

Acceptance Rates

Overall Acceptance Rate 268 of 1,124 submissions, 24%
