research-article
Open access

Play for Real(ism) - Using Games to Predict Human-AI interactions in the Real World

Published: 06 October 2021

Abstract

AI-enabled decision support systems have repeatedly failed in real-world applications even though the underlying model operated as designed. Often this was because the system was used in an unexpected manner. Our goal is to take human behavior into account so that we can better predict how such systems will be used before they are implemented, and so that we can improve existing designs. Collecting the data needed to do this poses several challenges. Without access to an existing prediction engine, the system's behavior must be simulated, and this simulation must capture not just the behavior of the underlying model but also the context in which the decision will be made in the real world. Additionally, collecting statistically valid samples requires that test subjects make repeated choices under slightly varied conditions, and under such repetitive conditions fatigue can quickly set in. Games let us address both of these challenges by providing both systems context and narrative context. Systems context can convey some or all of the information the player needs to make a decision within the game environment itself, which can help avoid the onset of fatigue. Narrative context can provide a broader environment within which the simulated system operates: it adds a sense of progress, shows the effect of decisions, introduces perceived social norms, and sets incentives and stakes. This broader environment can further prevent player fatigue while replicating many of the external factors that might affect choices in the real world. In this paper we describe the design of the Human-AI Decision Evaluation System (HADES), a test harness capable of interfacing with a game environment, simulating the behavior of an AI-enabled decision support system, and collecting the results of human decision making based on that system's predictions.
We also present an analysis of data collected by HADES while it was interfaced with a visual novel game focused on software cyber-risk assessment.
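The abstract does not describe how HADES is built. Purely as a hedged illustration, the sketch below shows one way a harness of this kind could simulate a decision support system and log human choices against its predictions. Every name in it (`SimulatedAdvisor`, `Trial`, `run_session`, `summarize`), the boolean risky/safe decision, and the fixed-accuracy error model are assumptions made for this example, not the authors' implementation.

```python
import random
from dataclasses import dataclass, field


@dataclass
class SimulatedAdvisor:
    """Hypothetical stand-in for an AI decision support system.

    Error model (assumption): the advisor reports the true label with
    probability `accuracy` and the opposite label otherwise.
    """
    accuracy: float
    rng: random.Random = field(default_factory=random.Random)

    def predict(self, truth: bool) -> bool:
        # random() returns a float in [0.0, 1.0), so accuracy=1.0 is always
        # correct and accuracy=0.0 is always wrong.
        return truth if self.rng.random() < self.accuracy else not truth


@dataclass
class Trial:
    """One logged decision (labels are booleans, e.g. True = "risky")."""
    scenario_id: int
    truth: bool
    advice: bool
    choice: bool

    @property
    def followed_advice(self) -> bool:
        return self.choice == self.advice

    @property
    def correct(self) -> bool:
        return self.choice == self.truth


def run_session(advisor, scenarios, decide):
    """Show each scenario with the advisor's prediction and record the choice.

    `scenarios` is a list of (scenario_id, truth) pairs; `decide` stands in for
    the player (in a game, the choice made through the game UI) and maps
    (scenario_id, advice) -> choice.
    """
    log = []
    for scenario_id, truth in scenarios:
        advice = advisor.predict(truth)
        choice = decide(scenario_id, advice)
        log.append(Trial(scenario_id, truth, advice, choice))
    return log


def summarize(log):
    """Fraction of trials in which the player agreed with the advice, and was correct."""
    n = len(log)
    return {
        "advice_agreement": sum(t.followed_advice for t in log) / n,
        "accuracy": sum(t.correct for t in log) / n,
    }


if __name__ == "__main__":
    # Twenty slightly varied scenarios; the true label alternates.
    scenarios = [(i, i % 2 == 0) for i in range(20)]
    advisor = SimulatedAdvisor(accuracy=0.8, rng=random.Random(42))
    # A simulated player who always follows the advice.
    print(summarize(run_session(advisor, scenarios, lambda sid, adv: adv)))
```

With a player who always follows the advice, the summary's `accuracy` simply tracks the advisor's realized accuracy; comparing that against real players, who may override the advice, is one possible way such a harness could quantify reliance on the simulated system.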


Cited By

  • (2024) Human-AI Collaboration in Cooperative Games: A Study of Playing Codenames with an LLM Assistant. Proceedings of the ACM on Human-Computer Interaction, 8(CHI PLAY), 1-25. DOI: 10.1145/3677081. Online publication date: 15-Oct-2024.
  • (2024) Forging Productive Human-Robot Partnerships Through Task Training. ACM Transactions on Human-Robot Interaction, 13(1), 1-21. DOI: 10.1145/3611657. Online publication date: 30-Jan-2024.
  • (2023) A Missing Piece in the Puzzle: Considering the Role of Task Complexity in Human-AI Decision Making. Proceedings of the 31st ACM Conference on User Modeling, Adaptation and Personalization, 215-227. DOI: 10.1145/3565472.3592959. Online publication date: 18-Jun-2023.
  • (2023) Navigates Like Me: Understanding How People Evaluate Human-Like AI in Video Games. Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, 1-18. DOI: 10.1145/3544548.3581348. Online publication date: 19-Apr-2023.


    Published In

    Proceedings of the ACM on Human-Computer Interaction, Volume 5, Issue CHI PLAY
    September 2021
    1535 pages
    EISSN: 2573-0142
    DOI: 10.1145/3490463
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published in PACMHCI Volume 5, Issue CHI PLAY


    Author Tags

    1. ai interfaces
    2. decision support systems
    3. games
    4. testing


    Funding Sources

    • United States Air Force


    Article Metrics

    • Downloads (last 12 months): 362
    • Downloads (last 6 weeks): 51
    Reflects downloads up to 12 November 2024.

