DOI: 10.5555/3545946.3598842

Epistemic Side Effects: An AI Safety Problem

Published: 30 May 2023

Abstract

AI safety research has investigated the problem of negative side effects -- undesirable changes made by AI systems in pursuit of an underspecified objective. However, the focus has been on physical side effects, such as a robot breaking a vase while moving (when the objective makes no mention of the vase). In this paper we introduce the notion of epistemic side effects, which are side effects on the knowledge or beliefs of agents. Epistemic side effects are most pertinent in a (partially observable) multiagent setting. We show that we can extend an existing approach to avoiding (physical) side effects in reinforcement learning to also avoid some epistemic side effects in certain cases. Nonetheless, avoiding negative epistemic side effects remains an important challenge, and we identify some key research problems.
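To make the idea concrete, here is a toy sketch of one possible way to penalize epistemic side effects in a reward signal. This is a hypothetical illustration, not the paper's actual formulation: the functions `belief_divergence` and `shaped_reward` and the trade-off weight `LAMBDA` are all invented for this example, which simply charges an acting agent in proportion to how much its action shifts another agent's beliefs.

```python
# Hypothetical sketch: shaping a task reward with a penalty for
# epistemic side effects, i.e. changes the acting agent causes in
# another agent's beliefs. Beliefs are modeled as probabilities
# assigned to propositions.

LAMBDA = 0.5  # trade-off between task reward and epistemic impact


def belief_divergence(before: dict, after: dict) -> float:
    """Total-variation-style distance between two belief assignments
    over the same set of propositions."""
    return sum(abs(after[p] - before[p]) for p in before) / 2


def shaped_reward(task_reward: float,
                  observer_belief_before: dict,
                  observer_belief_after: dict) -> float:
    """Task reward minus a penalty proportional to how much the action
    shifted the observer's beliefs (an epistemic side effect)."""
    penalty = belief_divergence(observer_belief_before,
                                observer_belief_after)
    return task_reward - LAMBDA * penalty


# Example: the agent's action makes an observer more confident
# that the vase is intact.
before = {"vase_intact": 0.5}
after = {"vase_intact": 0.9}
print(shaped_reward(1.0, before, after))  # 1.0 - 0.5 * 0.2 = 0.9
```

Note that this blunt penalty charges beneficial belief changes (e.g., truthfully informing the observer) just as much as harmful ones such as deception, which hints at why avoiding specifically *negative* epistemic side effects remains an open challenge.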


Cited By

  • (2023) Planning with epistemic preferences. In Proceedings of the 20th International Conference on Principles of Knowledge Representation and Reasoning (KR 2023), 752-756. DOI: 10.24963/kr.2023/76. Online publication date: 2 Sep 2023.

Published In

AAMAS '23: Proceedings of the 2023 International Conference on Autonomous Agents and Multiagent Systems
May 2023, 3131 pages
ISBN: 9781450394321
General Chairs: Noa Agmon, Bo An
Program Chairs: Alessandro Ricci, William Yeoh

Publisher

International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC


Author Tags

  1. ai safety
  2. knowledge and belief
  3. objective specification
  4. reinforcement learning
  5. side effects

Qualifiers

  • Research-article

Conference

AAMAS '23
Acceptance Rates

Overall acceptance rate: 1,155 of 5,036 submissions (23%)

Article Metrics

  • Downloads (last 12 months): 32
  • Downloads (last 6 weeks): 4

Reflects downloads up to 16 Oct 2024

