DOI: 10.5555/3635637.3663077

Defining Deception in Decision Making

Published: 06 May 2024

Abstract

With the growing capabilities of machine learning systems, particularly those that communicate or interact with humans, there is an increased risk of systems that can easily deceive and manipulate people. Preventing unintended deception and manipulation therefore represents an important challenge for creating aligned AI systems. To approach this challenge in a principled way, we first need to define deception formally. In this work, we present a concrete definition of deception under the formalism of rational decision making in partially observed Markov decision processes. We propose a general regret theory of deception under which the degree of deception can be quantified in terms of the actor's beliefs, actions, and utility. We instantiate these principles as reward terms for communication agents, and study the degree to which the behavior aligns with human judgments about deception. We hope our work will represent a step toward systems that aim to avoid deception, and detection mechanisms to identify deceptive agents.
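The abstract's regret framing can be made concrete with a small sketch. The paper's actual formalism (POMDP beliefs, reward terms) is in the full text; the code below only illustrates one plausible reading, in which deception is scored as the listener's expected-utility regret: the utility lost by acting optimally under the belief the speaker's message induces rather than under an accurate belief. All names, states, utilities, and belief values here are illustrative assumptions, not the paper's.

```python
def best_action(belief, utility):
    """Action maximizing expected utility under a belief over states."""
    n_actions = len(next(iter(utility.values())))
    return max(
        range(n_actions),
        key=lambda a: sum(belief[s] * utility[s][a] for s in belief),
    )

def deception_regret(true_belief, induced_belief, utility):
    """Expected utility lost by acting on the induced rather than the true belief.

    A larger value means the induced belief led the listener to a worse
    action, i.e. a greater degree of deception under this sketch.
    """
    a_true = best_action(true_belief, utility)        # action under accurate belief
    a_induced = best_action(induced_belief, utility)  # action after the message
    eu = lambda a: sum(true_belief[s] * utility[s][a] for s in true_belief)
    return eu(a_true) - eu(a_induced)

# Illustrative example: two states, two actions (buy=0, pass=1).
# utility[state] = (utility of buy, utility of pass)
utility = {"good_deal": (10.0, 0.0), "bad_deal": (-10.0, 0.0)}
true_belief = {"good_deal": 0.2, "bad_deal": 0.8}      # accurate belief
induced_belief = {"good_deal": 0.9, "bad_deal": 0.1}   # belief after a misleading message

print(deception_regret(true_belief, induced_belief, utility))  # 6.0
```

Under the accurate belief the listener would pass (expected utility 0.0); the induced belief makes buying look attractive, yielding a true expected utility of -6.0, so the regret-based deception score is 6.0.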



Published In

AAMAS '24: Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems
May 2024
2898 pages
ISBN: 9798400704864


Publisher

International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC


Author Tags

  1. ai safety
  2. deception
  3. human-ai interaction

Qualifiers

  • Extended-abstract

Funding Sources

  • NSF
  • AFOSR
  • Cooperative AI Foundation

Conference

AAMAS '24

Acceptance Rates

Overall Acceptance Rate 1,155 of 5,036 submissions, 23%

