DOI: 10.5555/3545946.3598842

Epistemic Side Effects: An AI Safety Problem

Published: 30 May 2023

Abstract

AI safety research has investigated the problem of negative side effects -- undesirable changes made by AI systems in pursuit of an underspecified objective. However, the focus has been on physical side effects, such as a robot breaking a vase while moving (when the objective makes no mention of the vase). In this paper we introduce the notion of epistemic side effects, which are side effects on the knowledge or beliefs of agents. Epistemic side effects are most pertinent in a (partially observable) multiagent setting. We show that we can extend an existing approach to avoiding (physical) side effects in reinforcement learning to also avoid some epistemic side effects in certain cases. Nonetheless, avoiding negative epistemic side effects remains an important challenge, and we identify some key research problems.
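To make the idea concrete, here is a toy sketch of one possible way to penalize epistemic side effects in a reward signal. This is a hypothetical illustration, not the paper's actual formulation: the functions `belief_divergence` and `shaped_reward` and the trade-off weight `LAMBDA` are all invented for this example, which simply charges an acting agent in proportion to how much its action shifts another agent's beliefs.

```python
# Hypothetical sketch: shaping a task reward with a penalty for
# epistemic side effects, i.e. changes the acting agent causes in
# another agent's beliefs. Beliefs are modeled as probabilities
# assigned to propositions.

LAMBDA = 0.5  # trade-off between task reward and epistemic impact


def belief_divergence(before: dict, after: dict) -> float:
    """Total-variation-style distance between two belief assignments
    over the same set of propositions."""
    return sum(abs(after[p] - before[p]) for p in before) / 2


def shaped_reward(task_reward: float,
                  observer_belief_before: dict,
                  observer_belief_after: dict) -> float:
    """Task reward minus a penalty proportional to how much the action
    shifted the observer's beliefs (an epistemic side effect)."""
    penalty = belief_divergence(observer_belief_before,
                                observer_belief_after)
    return task_reward - LAMBDA * penalty


# Example: the agent's action makes an observer more confident
# that the vase is intact.
before = {"vase_intact": 0.5}
after = {"vase_intact": 0.9}
print(shaped_reward(1.0, before, after))  # 1.0 - 0.5 * 0.2 = 0.9
```

Note that this blunt penalty charges beneficial belief changes (e.g., truthfully informing the observer) just as much as harmful ones such as deception, which hints at why avoiding specifically *negative* epistemic side effects remains an open challenge.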


Cited By

  • (2023) Planning with epistemic preferences. In Proceedings of the 20th International Conference on Principles of Knowledge Representation and Reasoning (KR 2023), 752-756. DOI: 10.24963/kr.2023/76. Online publication date: 2 Sep 2023.

Published In

AAMAS '23: Proceedings of the 2023 International Conference on Autonomous Agents and Multiagent Systems
May 2023, 3131 pages
ISBN: 9781450394321
General Chairs: Noa Agmon, Bo An
Program Chairs: Alessandro Ricci, William Yeoh

Publisher

International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC


Author Tags

  1. ai safety
  2. knowledge and belief
  3. objective specification
  4. reinforcement learning
  5. side effects

Qualifiers

  • Research-article

Conference

AAMAS '23
Acceptance Rates

Overall acceptance rate: 1,155 of 5,036 submissions (23%)

Article Metrics

  • Downloads (last 12 months): 32
  • Downloads (last 6 weeks): 4

Reflects downloads up to 16 Oct 2024

