DOI: 10.1145/3278721.3278750

Incorrigibility in the CIRL Framework

Published: 27 December 2018

Abstract

A value learning system has incentives to follow shutdown instructions, assuming the shutdown instruction provides information (in the technical sense) about which actions lead to valuable outcomes. However, this assumption is not robust to model mis-specification (e.g., in the case of programmer errors). We demonstrate this by presenting some Supervised POMDP scenarios in which errors in the parameterized reward function remove the incentive to follow shutdown commands. These difficulties parallel those discussed by Soares et al. 2015 in their paper on corrigibility. We argue that it is important to consider systems that follow shutdown commands under some weaker set of assumptions (e.g., that one small verified module is correctly implemented; as opposed to an entire prior probability distribution and/or parameterized reward function). We discuss some difficulties with simple ways to attempt to attain these sorts of guarantees in a value learning framework.
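To make the incentive argument concrete, the following is a minimal, hypothetical sketch (not code from the paper, and a simplification in the style of the Off-Switch Game of Hadfield-Menell et al. [2] rather than the Supervised POMDP scenarios studied here). An agent compares acting immediately against deferring to a human who shuts it down exactly when the true utility of the action is negative. The prior distributions, the value_of_policies helper, and the numerical values are illustrative assumptions.

```python
# Hypothetical illustration (not the paper's code): an off-switch-game-style
# comparison showing how a mis-specified reward model can remove the
# incentive to defer to a human shutdown command.
import numpy as np

rng = np.random.default_rng(0)

def value_of_policies(utility_samples):
    """Expected value of acting immediately vs. deferring to a human who
    shuts the agent off exactly when the true utility U is negative."""
    act_now = utility_samples.mean()                 # E[U]
    defer = np.maximum(utility_samples, 0.0).mean()  # E[max(U, 0)]
    return act_now, defer

# Well-specified belief: the prior over U puts mass on both signs.
good_prior = rng.normal(loc=0.0, scale=2.0, size=100_000)

# Mis-specified belief: a programmer error in the parameterized reward
# function (wrongly) rules out negative utilities entirely.
bad_prior = np.abs(rng.normal(loc=2.0, scale=1.0, size=100_000))

for name, prior in [("well-specified", good_prior), ("mis-specified", bad_prior)]:
    act_now, defer = value_of_policies(prior)
    print(f"{name:15s} E[act now] = {act_now:5.2f}   E[defer] = {defer:5.2f}   "
          f"strict incentive to defer: {defer > act_now}")

# Well-specified prior:  deferring strictly dominates acting, so the shutdown
#   command carries information and the agent follows it.
# Mis-specified prior:   U < 0 has zero probability, so max(U, 0) == U almost
#   surely; deferring adds nothing, and the agent has no incentive to respect
#   a shutdown command even when the true utility is negative.
```

Under the well-specified prior, E[max(U, 0)] strictly exceeds E[U], so the shutdown instruction is informative and the agent prefers to defer; under the mis-specified prior that excludes negative utilities, the two quantities coincide and the incentive to follow the shutdown command disappears.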

References

[1] Stuart Armstrong. 2010. Utility Indifference. Technical Report 2010-1. Future of Humanity Institute, University of Oxford.
[2] Dylan Hadfield-Menell, Anca Dragan, Pieter Abbeel, and Stuart Russell. 2017. The Off-Switch Game. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI-17). 220-227.
[3] Smitha Milli, Dylan Hadfield-Menell, Anca Dragan, and Stuart Russell. 2017. Should Robots be Obedient? arXiv preprint arXiv:1705.09990.
[4] Nate Soares, Benja Fallenstein, Eliezer Yudkowsky, and Stuart Armstrong. 2015. Corrigibility. In 1st International Workshop on AI and Ethics at AAAI-2015.



Published In

AIES '18: Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society
December 2018
406 pages
ISBN:9781450360128
DOI:10.1145/3278721

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 27 December 2018


Author Tags

  1. ai safety
  2. cirl
  3. cooperative inverse reinforcement learning
  4. corrigibility

Qualifiers

  • Research-article

Conference

AIES '18
Sponsor:
AIES '18: AAAI/ACM Conference on AI, Ethics, and Society
February 2-3, 2018
New Orleans, LA, USA

Acceptance Rates

AIES '18 Paper Acceptance Rate: 61 of 162 submissions, 38%
Overall Acceptance Rate: 61 of 162 submissions, 38%


Article Metrics

  • Downloads (last 12 months): 23
  • Downloads (last 6 weeks): 2
Reflects downloads up to 22 Sep 2024


Cited By

  • (2024) The shutdown problem: an AI engineering puzzle for decision theorists. Philosophical Studies. DOI: 10.1007/s11098-024-02153-3. Online publication date: 19-Jun-2024.
  • (2023) Human control. In Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence, 271-281. DOI: 10.5555/3625834.3625860. Online publication date: 31-Jul-2023.
  • (2021) Axes for Sociotechnical Inquiry in AI Research. IEEE Transactions on Technology and Society 2(2), 62-70. DOI: 10.1109/TTS.2021.3074097. Online publication date: Jun-2021.
  • (2021) Reward tampering problems and solutions in reinforcement learning: a causal influence diagram perspective. Synthese. DOI: 10.1007/s11229-021-03141-4. Online publication date: 19-May-2021.
  • (2020) Conservative Agency via Attainable Utility Preservation. In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, 385-391. DOI: 10.1145/3375627.3375851. Online publication date: 7-Feb-2020.
  • (2019) The necessary roadblock to artificial general intelligence. AI Matters 5(3), 77-84. DOI: 10.1145/3362077.3362089. Online publication date: 6-Dec-2019.
