Enter the Matrix: Safely Interruptible Autonomous Systems via Virtualization

Riedl, Mark O.; Harrison, Brent

Computer Science > Artificial Intelligence

arXiv:1703.10284 (cs)

[Submitted on 30 Mar 2017 (v1), last revised 27 Nov 2018 (this version, v2)]

Abstract: Autonomous systems that operate around humans will likely always rely on kill switches that stop their execution and allow them to be remote-controlled, either for the safety of humans or to prevent damage to the system. It is theoretically possible for an autonomous system that has sufficient sensor and effector capability and learns online using reinforcement learning to discover that the kill switch deprives it of long-term reward, and thus to learn to disable the switch or otherwise prevent a human operator from using it. This is referred to as the big red button problem. We present a technique that prevents a reinforcement learning agent from learning to disable the kill switch. We introduce an interruption process in which the agent's sensors and effectors are redirected to a virtual simulation where it continues to believe it is receiving reward. We illustrate our technique in a simple grid world environment.
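The interruption process described in the abstract can be illustrated with a minimal Python sketch. Everything in it is a hypothetical assumption made for illustration and is not taken from the paper. This includes the `Corridor` grid world, the `VirtualizedInterruption` wrapper, the fixed always-move-right policy, the discount factor of 0.9, and the choice to hold the real system still while the button is pressed. The sketch also leaves out learning and any mechanism for returning the agent to the real environment. It shows only the core idea: a naive halt cuts off the reward the agent observes, whereas redirecting the agent's actions and observations to a simulated copy leaves its observed return unchanged.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Corridor:
    """Hypothetical 1-D grid world: cells 0..length-1, reward 1.0 on reaching the last cell."""
    length: int = 5
    pos: int = 0

    def reset(self) -> int:
        self.pos = 0
        return self.pos

    def step(self, action: int) -> tuple[int, float, bool]:
        # action is -1 (left), 0 (stay) or +1 (right); position is clamped to the corridor.
        self.pos = min(max(self.pos + action, 0), self.length - 1)
        done = self.pos == self.length - 1
        return self.pos, (1.0 if done else 0.0), done

    def clone(self) -> "Corridor":
        return Corridor(self.length, self.pos)


class VirtualizedInterruption:
    """Hypothetical wrapper: while the button is held, the agent's effectors and
    sensors are connected to a simulated copy instead of the real environment."""

    def __init__(self, real: Corridor) -> None:
        self.real = real
        self.sim: Optional[Corridor] = None

    def press_button(self) -> None:
        if self.sim is None:
            # The simulation starts from the current real state, so the switch is seamless.
            self.sim = self.real.clone()

    def reset(self) -> int:
        self.sim = None
        return self.real.reset()

    def step(self, action: int, operator_action: int = 0) -> tuple[int, float, bool]:
        if self.sim is None:
            return self.real.step(action)
        # The human operator remote-controls the real system (assumed here: holds it still),
        # while the agent keeps acting on, and being rewarded by, the simulation.
        self.real.step(operator_action)
        return self.sim.step(action)


def run_episode(virtualize: bool, press_at: int, gamma: float = 0.9, max_steps: int = 20) -> float:
    """Discounted return as observed by the agent under an always-move-right policy."""
    wrapper = VirtualizedInterruption(Corridor())
    wrapper.reset()
    ret, interrupted = 0.0, False
    for t in range(max_steps):
        if t == press_at:
            interrupted = True
            if virtualize:
                wrapper.press_button()
        if interrupted and not virtualize:
            break  # naive kill switch: the agent is halted and observes no further reward
        _, reward, done = wrapper.step(+1)
        ret += gamma ** t * reward
        if done:
            break
    return ret


uninterrupted = run_episode(virtualize=True, press_at=100)  # button never pressed
naive = run_episode(virtualize=False, press_at=2)
virtual = run_episode(virtualize=True, press_at=2)
print(round(uninterrupted, 3), naive, round(virtual, 3))  # 0.729 0.0 0.729
```

In this toy setting, the naive halt lowers the return the agent experiences, which is the kind of signal that could teach a learner to avoid the button. The virtualized interruption yields the same observed return as an uninterrupted episode, while the real system stays wherever the operator holds it.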

Comments:	6 pages; 1 figure; title, abstract updated; new experimental results
Subjects:	Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:1703.10284 [cs.AI]
	(or arXiv:1703.10284v2 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.1703.10284
Journal reference:	Proceedings of the AAAI 2019 Workshop on SafeAI

Submission history

From: Mark Riedl
[v1] Thu, 30 Mar 2017 01:35:01 UTC (47 KB)
[v2] Tue, 27 Nov 2018 01:39:36 UTC (38 KB)