Reward Learning with Trees: Methods and Evaluation

Bewley, Tom; Lawry, Jonathan; Richards, Arthur; Craddock, Rachel; Henderson, Ian

arXiv:2210.01007 (cs)

[Submitted on 3 Oct 2022]

Abstract: Recent efforts to learn reward functions from human feedback have tended to use deep neural networks, whose lack of transparency hampers our ability to explain agent behaviour or verify alignment. We explore the merits of learning intrinsically interpretable tree models instead. We develop a recently proposed method for learning reward trees from preference labels, and show it to be broadly competitive with neural networks on challenging high-dimensional tasks, with good robustness to limited or corrupted data. Having found that reward tree learning can be done effectively in complex settings, we then consider why it should be used, demonstrating that the interpretable reward structure gives significant scope for traceability, verification and explanation.

Comments:	22 pages (9 main body). Preprint, under review
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2210.01007 [cs.LG]
	(or arXiv:2210.01007v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2210.01007

Submission history

From: Tom Bewley
[v1] Mon, 3 Oct 2022 15:17:25 UTC (5,744 KB)
