
Friend, Collaborator, Student, Manager: How Design of an AI-Driven Game Level Editor Affects Creators

Matthew Guzdial (mguzdial3@gatech.edu), Nicholas Liao (nliao7@gatech.edu), Jonathan Chen (jonathanchen@gatech.edu), Shao-Yu Chen (shao-yu.chen@gatech.edu), Shukan Shah (shukanshah@gatech.edu), Vishwa Shah (vishwashah@gatech.edu), Joshua Reno (jreno@gatech.edu), and Mark O. Riedl (riedl@cc.gatech.edu), Georgia Institute of Technology
Gillian Smith (gmsmith@wpi.edu), Worcester Polytechnic Institute

ABSTRACT

Machine learning advances have afforded an increase in algorithms capable of creating art, music, stories, games, and more. However, it is not yet well-understood how machine learning algorithms might best collaborate with people to support creative expression. To investigate how practicing designers perceive the role of AI in the creative process, we developed a game level design tool for Super Mario Bros.-style games with a built-in AI level designer. In this paper we discuss our design of the Morai Maker intelligent tool through two mixed-methods studies with a total of over one-hundred participants. Our findings are as follows: (1) level designers vary in their desired interactions with, and role of, the AI, (2) the AI prompted the level designers to alter their design practices, and (3) the level designers perceived the AI as having potential value in their design practice, varying based on their desired role for the AI.

CCS CONCEPTS

• Human-centered computing → Human computer interaction (HCI); User studies; Graphical user interfaces.

KEYWORDS

artificial intelligence; human computer collaboration; human-AI interaction

ACM Reference Format:
Matthew Guzdial, Nicholas Liao, Jonathan Chen, Shao-Yu Chen, Shukan Shah, Vishwa Shah, Joshua Reno, Gillian Smith, and Mark O. Riedl. 2019. Friend, Collaborator, Student, Manager: How Design of an AI-Driven Game Level Editor Affects Creators. In CHI Conference on Human Factors in Computing Systems Proceedings (CHI 2019), May 4–9, 2019, Glasgow, Scotland UK. ACM, New York, NY, USA, 13 pages. https://doi.org/10.1145/3290605.3300854

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
CHI 2019, May 4–9, 2019, Glasgow, Scotland UK
© 2019 Association for Computing Machinery.
ACM ISBN 978-1-4503-5970-2/19/05...$15.00
https://doi.org/10.1145/3290605.3300854

1 INTRODUCTION

Advances in Artificial Intelligence (AI) and Machine Learning (ML) systems have led to an increasing number of people interacting with these systems on a daily basis, for example with curated social media timelines, voice user interfaces and agents, and self-driving cars. While the majority of these extant AI systems cover rote or repetitive human interactions, there has been increased interest in using AI to assist creative expression [32, 56]. AI has been proposed in a collaborative framework in domains like storytelling, visual art, dance, and games [14, 19, 29, 31]. As such approaches grow in popularity and sophistication, there is a greater need to understand how to best develop interaction paradigms [18, 35, 52]. It is not yet clear how best to design interfaces and AI to support creativity.

AI has long been utilized in video game design and development [55]. This application of AI to game design is called procedural content generation (PCG), in which algorithms generate game content like game art, levels, and rules [23, 49]. These algorithms tend to be applied in one of two ways: (1) the algorithm is run once and a human designer refines the output (e.g. an initial generated terrain adjusted by human designers) or (2) the algorithm is integrated into a game and produces content at runtime without designer oversight. Much less frequently are game design tools developed with AI agents as equal design partners. Given this, a firm understanding of the best way to structure this interaction and the consequences of employing it does not yet exist.


We designed a tool, Morai Maker, in which a level designer can collaboratively work with an AI agent to build a Super Mario Bros.-like platformer game level. The interaction occurs in a turn-based manner, in which the human and AI designers take turns making changes to an initially blank level within the same level editor interface. We ran two mixed-methods studies with a total of over one-hundred participants, the first as a means of guiding the development of the tool, and the second as a means of examining the ways in which the tool impacted the behavior of practicing game designers. Based on the results of these studies we found support for the following:

• The level designers varied in their desired interactions with the AI and the role of the AI in those interactions. We summarize these roles as friend, collaborator, student, or manager. This followed from the designer's own artistic style, how they attempted to employ the AI, and their reactions to the AI's behavior.
• The level designers demonstrated a willingness to adapt their own design practice to the AI and to the level editor.
• The level designers expressed a perception that the AI could potentially bring value to their design practice. However, the form this took depended on the individual designer's expectations for the AI.

We discuss the design implications for user interfaces and AI agents for game level editors in order to afford the most effective user-AI interactions. The main contributions of this work to the HCI community are as follows:

• We discuss the design of the intelligent level design editor, both the front-end interface and the back-end, collaborative deep neural network AI.
• Through our two mixed-methods studies we identify user behaviors and user-AI interactions in our editor.
• Finally, we discuss the implications for similar interfaces in which users and AI collaborate on a single creative object.

2 RELATED WORK

We review works on (1) general human-AI cooperative creation or co-creation on creative tasks and (2) relevant prior work in the field of computer games.

AI Co-Creation

Co-creation or mixed-initiative practices are those practices in which two agents work together on some creative task [24]. Most typically these agents are humans, for example children working together to write a story [38] or coworkers working on a photo collage [30]. In this paper we focus on co-creation with an AI partner in the context of game level design.

The majority of recent AI advances have arisen due to deep learning [28], a particular machine learning approach that allows for highly complex models [42]. There has been a significant amount of work in applying deep learning to a variety of tasks, including writing [6, 48], music composition [8, 25], and visual art [7]. However, these prior instances almost all involve a use case in which a human user prompts a trained AI agent for output. In other words, there is only the most minimal collaboration; the AI is employed as a tool. In those cases in which the AI is not a tool, it is still not frequently given a partner role, for example being tasked with question-answering in a visual chatbot [12].

Recent work has looked into building interfaces and frameworks to allow for co-creation between a user and an AI. Sketch-RNN [22] is an interface for sketching in which an AI attempts to finish a user's drawing and generate similar drawings. Drawing Apprentice [13] is a sketching tool that focuses on adding content based on an AI's understanding of a user's sketch. This tool allows a user to give explicit feedback to the AI, while we instead opt for implicit feedback. DuetDraw [32] is an intelligent tool for creating colored pieces of digital art. It allows two different modes with unique human-AI dynamics, with the human or the AI taking the leading role and the other taking an assistant role. We did not explicitly structure such roles into our models, but still found that users naturally projected different roles onto the same AI and took corresponding roles. Viewpoints AI [26] is an intelligent dance partner installation in which the AI attempts to adapt its performance to its human partner.

Games

The concept of co-creation via procedural content generation via machine learning (PCGML) has been previously discussed in the literature [47, 56], but no prior approaches or systems exist. Comparatively, there exist many prior approaches to co-creative or mixed-initiative level design agents without machine learning [15, 53]. These approaches instead rely upon search or grammar-based methods [2, 3, 29, 41]. This means that significant developer effort is required to adapt each approach to a novel game.

Procedural content generation via machine learning [47] is a relatively new field, focused on generating content through machine learning methods. The majority of PCGML approaches represent black box methods, with no prior approach focused on explainability or co-creativity. Our AI could be classified as a PCGML approach, focused on co-creativity.

Super Mario Bros. (SMB), which we employ as our particular domain of study, represents a common area of research into PCGML [11, 27, 45, 46, 50].


Figure 1: Screenshot of the final level editor.

Beyond working with a human user, our approach differs from prior SMB PCGML approaches in terms of representation quality and the size of generated content. We focus on the generation of individual level sections instead of entire levels in order to better afford collaborative level building [43]. We make use of a rich representation of all possible level components and an ordering that allows our approach to handle decorative elements.

3 INITIAL MORAI MAKER DEVELOPMENT

Morai Maker went through several iterations prior to any participant study [21]. We named the initial prototype Morai Maker after the popular level design game in the Mario franchise, Mario Maker, since it was intended as a similar application but with "more AI". We chose Super Mario Bros. because there has been significant PCGML work in this area [47]. Further, the original game is well-recognized, which made it more likely that future study participants would be familiar with it.

We built the final version of our prototype interface in Unity3D [10], a game development engine. The final version of the UI is shown in Figure 1. The major parts of the interface are the current level in the center of the interface, a minimap on the bottom left of the interface, a palette of sprites in the middle of the bottom row, and an "End Turn" button on the bottom right. By pressing this End Turn button the current AI level design partner is queried for an addition. A pop-up appears while the partner processes, and then its additions are added component-by-component to the main screen. The camera scrolls to follow each addition, so that the user is aware of any changes to the level. The user then regains control and level building continues in this turn-wise fashion. At any time users can hit the top left "Run" button to play through the current version of the level. A back-end logging system tracks all events, including additions and deletions and which partner (human or AI) was responsible for them. The current version of this interface/Unity3D project is available on GitHub (https://github.com/mguzdial3/Morai-Maker-Engine).
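To make the turn-taking and logging flow described above concrete, the sketch below mirrors it in Python. The released editor is a Unity3D (C#) application (see the GitHub link above); every class and method name here, such as EditEvent, LevelSession, and propose_additions, is hypothetical and only illustrates the control flow, not the actual implementation.

```python
# Illustrative sketch of the turn-taking and logging flow described above.
# The real editor is a Unity3D (C#) application; this Python version only
# mirrors the control flow, and every name in it is hypothetical.
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class EditEvent:
    author: str               # "human" or "ai"
    kind: str                 # "add" or "delete"
    sprite: str               # e.g. "ground", "block", "cloud"
    position: Tuple[int, int]

@dataclass
class LevelSession:
    grid_width: int = 40
    grid_height: int = 15
    log: List[EditEvent] = field(default_factory=list)

    def apply(self, event: EditEvent) -> None:
        # The back-end logging system records every addition and deletion
        # along with which partner (human or AI) was responsible for it.
        self.log.append(event)

    def end_turn(self, ai_partner) -> List[EditEvent]:
        # Pressing "End Turn" queries the current AI partner, then plays
        # its additions back component-by-component in the editor.
        additions = ai_partner.propose_additions(self.log)
        for event in additions:
            self.apply(event)
        return additions
```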
4 STUDY 1: PROTOTYPE INVESTIGATION

We ran an initial study to derive design lessons to apply to future iterations of our Morai Maker tool, both in terms of the interface and the back-end AI system. We anticipated that the choice of AI algorithm might impact human experience. However, it is unclear how and to what degree different algorithms would impact user experience due to a lack of prior work on this particular problem. Therefore, we drew upon three previously published AI approaches to Super Mario Bros. level generation, which we summarize in the next section. Further, we focused on a primarily quantitative user study as we were interested in deriving broad, summarizing results.

AI Level Design Partners

For the initial study we created three AI agents to serve as level design partners. Each is based on a previously published PCGML approach, adapted to work in an iterative manner to fit the requirements of our editor and turn-based interaction framework. We chose to create three potential AI back-end partners because we wished to investigate what impacts each of these systems might have on the user experience. We lack the space to fully describe each system, but describe the approaches at a high level to give a sense of the ways in which their behavior varied.

• Markov Chain: This approach is a Markov chain from Snodgrass and Ontanón [44], based on Java code supplied by the authors.
• Bayes Net: This approach is a probabilistic graphical model, also known as a hierarchical Bayesian network, from Guzdial and Riedl [21].
• LSTM: This approach is a Long Short Term Memory Recurrent Neural Network (LSTM) from Summerville and Mateas [46], recreated in Tensorflow from the paper.

We chose these three approaches because they represent the most successful prior PCGML approaches in terms of breadth and depth of evaluations. All three systems differed in terms of the amount of existing level structure surveyed to determine what level components to add next. The Markov Chain looked only at a 2x2 grid of level content, making hyper-local decisions; the Bayes Net looked at a chunk of level with a width of 16 grid points; and the LSTM considered almost the entire level. Further, the Bayes Net was the only prior approach capable of generating decorative elements and the only approach that represented each type of level component as an individual type, whereas the other two approaches grouped similar level components (e.g. all solid components represented as equivalent).
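To give a sense of what the Markov Chain agent's "hyper-local" reasoning over a 2x2 grid of level content means in practice, the sketch below shows a minimal Markov-chain-style tile sampler. It is a hypothetical illustration, not the Snodgrass and Ontanón Java implementation used in the study; the tile names and the training format are assumptions.

```python
# Minimal sketch of hyper-local, Markov-chain-style tile sampling: the
# next tile depends only on the other three cells of its 2x2 neighborhood.
import random
from collections import Counter, defaultdict

def train(levels):
    """Count which tile follows each 2x2 neighborhood.

    `levels` is a list of 2D lists of tile names (rows x columns)."""
    counts = defaultdict(Counter)
    for level in levels:
        for y in range(1, len(level)):
            for x in range(1, len(level[0])):
                context = (level[y - 1][x - 1],
                           level[y - 1][x],
                           level[y][x - 1])
                counts[context][level[y][x]] += 1
    return counts

def sample_tile(counts, context, fallback="empty"):
    """Sample the next tile given only its local 2x2 context."""
    if context not in counts:
        return fallback
    tiles, weights = zip(*counts[context].items())
    return random.choices(tiles, weights=weights)[0]
```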


We chose to only allow the agents to make additions to the level. We made this choice because the systems were designed to autonomously generate levels, and so were not designed to handle deletions, and to minimize the potential for the agent to undo the human's intended design of a level.

Study Method

Each study participant went through the same process within the same lab environment. First, they were given a short tutorial on the level editor and its function. They then interacted with two distinct AI partners back-to-back. The partners were assigned at random from the three possible options. During each of the two level design sessions, the user was assigned one of two possible tasks, either to create an above ground or below ground level. We gave each participant the option to look at two examples of the two level types taken from the original Super Mario Bros. This led to a total of twelve possible conditions in terms of the pair of partners, the order of the pair, and the order of the level design assignments.

Participants were given a maximum of fifteen minutes for each level design task, though most participants finished well before then. Participants were asked to press the "End Turn" button in order to interact with their AI partner at least once.

After both rounds of interaction participants took a brief survey in which they were asked to rank the two AI partners they interacted with in terms of the following experiential measures:

(1) Which of the two agents was the most fun?
(2) Which of the two agents was the most frustrating?
(3) Which of the two agents was the most challenging?
(4) Which of the two agents most aided your design?
(5) Which of the two agents led to the most surprising and valuable ideas?
(6) Which of the two agents would you most want to use again?

We chose these questions to focus on particular experiential features measured in prior games experience research [16, 21, 34]. We employed rankings over ratings because we were particularly interested in the comparative impact on user experience and due to previously noted benefits of rankings [54]. After this ranking section, participants could choose to leave a comment reflecting on each agent. The survey ended by collecting demographic data including experience with level design, Super Mario Bros., the participant's gender (participants input their gender through a text box), and age (selected within the ranges of 18-22, 23-33, 34-54, and 55+).

Figure 2: Examples of six final levels from our study; each pair of levels comes from a specific co-creative agent: Markov Chain (top), Bayes Net (middle), and LSTM (bottom).

Results

In this section we discuss a data analysis of the results of our user study. Overall, 91 participants took part in this study. However, seven participants did not interact with one or both of their partners, thus we do not include their results in our analysis. The remaining 84 participants were split evenly between the twelve possible conditions, meaning a total of seven human participants for each condition. Of these, 67 respondents identified as male, 16 identified as female, and 1 identified as nonbinary. 64 participants placed themselves in the 18-22 age range, which makes sense given we were largely drawing from a student population, with 19 in the 23-33 age range and 1 participant in the 34-54 age range. This population is not sufficiently diverse to draw broad, general lessons. While this fit our needs for an initial investigation, we broaden participant diversity in the second study.

62% of our respondents had previously designed Mario levels before. This is likely due to prior experience playing Mario Maker, a level design game/tool released by Nintendo. Our participants were nearly evenly split between those who had never designed a level before (26%), had designed a level once before (36%), or had designed multiple levels in the past (38%). All but 7 of the participants had previously played Super Mario Bros., and all the participants played games regularly.

Our first goal in analyzing our results was to determine if the level design task (above or underground) mattered and if the ordering of the pair of partners mattered. We ran a one-way repeated measures ANOVA between the ordering of agents, the pair of agents, and the design task and the experiential feature rankings.


Table 1: A table comparing the ratio of first rankings for the three comparisons and the p-value of the Wilcoxon rank-sum test, testing if the two ranking distributions differed significantly.

              Most Fun        Most Frustrating  Most Challenging  Most Aided      Most Creative   Reuse
Pair of AI    ratio   p       ratio   p         ratio   p         ratio   p       ratio   p       ratio   p
Bayes-LSTM    15:13   0.6029  11:17   0.1142    9:19    0.0083    17:11   0.1142  19:9    0.0083  17:11   0.1142
Bayes-Markov  12:16   0.1469  15:13   0.6029    11:17   0.1142    14:14   1       10:18   0.0349  11:17   0.1142
LSTM-Markov   11:17   0.1142  15:13   0.6029    16:12   0.2937    12:16   0.2937  13:15   0.6029  13:15   0.6029

Table 2: A table comparing the Spearman's correlation between the different agent ranking questions. Each cell contains the Spearman's rho and p values for the ranking across the 84 participants.

              Fun                Frustrating        Challenging        Aided              Creative           Reuse
              rho     p          rho     p          rho     p          rho     p          rho     p          rho     p
Fun           -                  -0.74   <2.2e-16   -0.10   0.2194     0.79    <2.2e-16   0.76    <2.2e-16   0.88    <2.2e-16
Frustrating   -0.74   <2.2e-16   -                  0.21    0.0053     -0.81   <2.2e-16   -0.64   <2.2e-16   -0.71   <2.2e-16
Challenging   -0.10   0.2194     0.21    0.0053     -                  -0.21   0.0053     -0.10   0.2194     -0.07   0.3572
Aided         0.79    <2.2e-16   -0.81   <2.2e-16   -0.21   0.0053     -                  0.74    <2.2e-16   0.81    <2.2e-16
Creative      0.76    <2.2e-16   -0.64   <2.2e-16   -0.10   0.2194     0.74    <2.2e-16   -                  0.83    <2.2e-16

We found that no variable led to any significance. Thus, we can safely treat our data as having only three conditions, dependent on the pair of partners each participant interacted with. However, this also indicates that none of the agents were a significant determiner of the final rankings.

We give the overall ranking results in Table 1. We split the results into three conditions, based on the pair of agents those participants interacted with. This is necessary given the pairwise ranking information. For each pair of agents, and for each experiential feature (e.g. "Most Fun"), we give the ratio by which each agent was ranked first for that feature and the p-value from running the Wilcoxon rank-sum test. This p-value is bolded in the case in which we can reject that these two sets of rankings arose from the same distribution. To run the Wilcoxon rank-sum we represented all first place rankings as 1.0 and all second place rankings as -1.0, but we could have chosen any two distinct values. We applied the Wilcoxon rank-sum test as our data did not fit a normal distribution. Thus, for example, for the Bayes-LSTM condition the Bayes agent was ranked as the most fun by 15 of the 28 participants while the remaining 13 ranked the LSTM as the most fun. Note that we simplify the fifth ranking question in our survey (surprising and valuable) to most creative; surprise and value have been identified as two of the primary requirements for something to be labelled creative [5].

The results in Table 1 indicate that we were unable to find any significant difference for all but three of the comparative rankings. One might suggest that we needed to collect more training data. However, after 72 participants we found nearly these exact same proportionate ratios. Instead, it may be that the participants of this study were choosing their rankings at random. To determine this we ran a second set of evaluations testing the degree to which rankings of one experiential feature correlated with the other experiential features. Table 2 summarizes the results from comparing the correlations of the experiential feature rankings. We applied Spearman's rho as our correlation test, due to the fact that our ranking data does not follow a normal distribution. The first value in each cell is the value of Spearman's rho, while the second is the p-value of significance, with significant values in bold. The amount and strength of the correlations may seem surprising, but we note that there were only two possible values (first rank or second rank, associated with 1.0 or -1.0).

These results lend support to the notion that our participants were not ranking according to random guesses, as we would not expect to see any correlation. Instead, we can break our results into groups in terms of experiential rankings that implied a positive experience (fun, aided, creative, and reuse) and those that did not (frustrating and challenging). Notably, ranking the agents in terms of which was the most challenging appears to have been the least consistent across participants. There is a weak positive correlation with the ranking of most frustrating and a weak negative correlation with the ranking of most aided. We anticipate this was due to a lack of clarity in how we phrased the question, but also note that some participants may have used a challenging ranking to denote a lack of understanding on their part for a potentially useful agent.
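For readers who want to re-run this style of analysis, the sketch below applies the same two tests to hypothetical ranking data with SciPy: first-place rankings encoded as 1.0, second-place rankings as -1.0, compared with the Wilcoxon rank-sum test and correlated with Spearman's rho. The arrays are illustrative stand-ins, not the study data.

```python
# Sketch of the ranking analysis on hypothetical data: first-place
# rankings are encoded as 1.0 and second-place rankings as -1.0.
import numpy as np
from scipy.stats import ranksums, spearmanr

# 28 hypothetical participants in one pairing (e.g. Bayes vs. LSTM):
# 1.0 means that participant ranked the Bayes agent first for "Most Fun".
bayes_fun = np.array([1.0] * 15 + [-1.0] * 13)
lstm_fun = -bayes_fun

stat, p = ranksums(bayes_fun, lstm_fun)
print(f"Wilcoxon rank-sum p-value: {p:.4f}")

# Correlation between two experiential features across the same
# participants, e.g. "Most Fun" vs. "Most Aided" for one agent.
bayes_aided = np.array([1.0] * 17 + [-1.0] * 11)
rho, p = spearmanr(bayes_fun, bayes_aided)
print(f"Spearman's rho: {rho:.2f} (p = {p:.4f})")
```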


Study 1 Output Levels

We give examples of two randomly selected levels co-designed by each of the three agents in Figure 2. While we chose these levels at random, they demonstrate some consistent features we found across all levels co-designed by these agents. For example, the Markov Chain agent's co-designed levels (top of Figure 2) have a variety of unique patterns of blocks, which do not appear in Mario or in the other agents' co-designed levels. The Bayes Net co-designed levels were more likely to contain decoration, but otherwise appear similar to typical Super Mario Bros. levels. Finally, the LSTM agent's co-designed levels varied widely. In the cases where participants followed typical Super Mario Bros. level conventions, the LSTM agent tended to co-design levels that strongly resembled original Super Mario Bros. levels. However, when participants did not follow these conventions (as in the two images we randomly selected) the output tended to appear noisy or random. Despite referring to some of these levels as more or less Super Mario Bros.-like, we note that in all of these example images, and in fact in all of the final levels from this study, there existed structures that did not exist in the original Super Mario Bros. levels, for example the extra-long orange bars and single floating cloud in the very top level image, and the use of an "M" shape of floating blocks in the second image.

Study 1 Results Discussion

The lack of a consistent ranking between agents according to the experiential feature rankings suggests that no one agent stands out as significantly superior as a creative partner. Instead, the results suggest that individual participants varied in terms of their preferences. This matches our own experience with the agents. When attempting to build a very standard Super Mario Bros. level, the LSTM agent performed well. However, as is common with deep learning methods, it was brittle, defaulting to the most common behavior (e.g. adding ground or block components) when confronted with unfamiliar input. In comparison the Bayes Net agent was more flexible, and the Markov Chain agent more flexible still, given its hyper-local reasoning. This can be most clearly seen in the "Most Creative" results in Table 1, in which the Bayes Net was ranked as significantly more surprising and valuable than the LSTM, and the Markov Chain agent ranked as significantly more surprising and valuable than the Bayes Net. Despite this, in comparing between the LSTM and Markov Chain, no such significant relationship was found.

The lack of a single superior agent is further supported by the optional comments left by some participants. For example, despite receiving the fewest number of first place rankings, some participants referred to the LSTM as "Pretty smart overall... [collaborating] well with my ideas", "seemed more creative. I did actually use some of its ideas" or felt that the agent "seemed to build toward an idea". Comparatively, the participant who gave that latter quote felt the Markov agent "didn't really seem to offer any 'new' ideas", despite the Markov agent consistently ranking higher on the question about surprising/valuable ideas. Notably, despite the requirement of ranking, our results indicate that many participants remained unsatisfied by both agents. One piece of evidence for this is that on average participants deleted 60% of all the agents' additions. This is further made clear through comments in which participants expressed displeasure with both agents they interacted with, such as "They both included stuff... which seemed pretty unhelpful". One participant who interacted first with the Bayes Net agent stated that the agent "did not pick up on my style of design" and of the second agent they interacted with, the LSTM agent, they stated they were "Disappointed with the lack of creativity of level generation idea(s)". This quote appears to state the opposite view of the participants above who specifically mention the creativity of the LSTM agent's ideas. The LSTM agent did not change between participants, thus it appears that this friction arose due to differences in the participants as level designers. This type of friction appeared throughout the optional quotes, indicating that different participants had vastly different design values, with no one agent addressing them all.

Our primary takeaways from this first study were:

• Users of this prototype tool varied by such a degree that no one static agent could meet all of their expectations for an AI partner. However, users of the tool overall expressed a clear sense of what they found valuable.
• Participants demonstrated a consistent departure from typical Super Mario Bros. structure.
• Participants demonstrated a lack of clear understanding of their AI partners, but a willingness to invent an explanation for how these partners behaved.

5 STUDY 2: THINKALOUD

The purpose of our first study was to derive design insights we might use to develop our initial prototype into a more fully realized tool. Thus, we made changes based on the results of this first study, both to the interface and the AI agent. We discuss these changes below. After these changes we felt we had arrived at something like an "alpha" build, to borrow software development parlance. We ran a second study in order to evaluate the impacts that this version of Morai Maker had on designers' experiences and behaviors. This study was not meant as an evaluation of the tool, but an investigation of the effect that the tool had on users towards its continuing, future development.


We focused this study on practicing, published game designers, with a more qualitative methodology for this purpose. Before the study we identified three major research questions, which directed the design of the study and our analysis of the results. They are as follows:

• RQ1: By leveraging active learning to adapt the AI partner to a user, can our tool better serve the needs of level designers?
• RQ2: Can Explainable AI allow users to better understand the AI, and therefore to better utilize the tool?
• RQ3: Will our overall changes to the tool lead to beneficial experiences for the designers?

RQ1 arose from the first and second takeaways listed in the Study 1 Results Discussion section, RQ2 arose from the third takeaway, and RQ3 arose as a general response to our changes.

Changes to Morai Maker

The results of the first study indicate a need for an approach designed for co-creative PCGML instead of one adapted from autonomous PCGML. In particular, given that none of our existing agents were able to sufficiently handle the variety of human participants, we expect instead a need for an ideal partner to more effectively generalize across all potential human designers and to adapt to a human designer actively during the design task. We modeled the interaction as a semi-Markov Decision Process (SMDP) with concurrent actions (the different possible additions) [37]. Our final agent trained on the interactions with the 91 participants, using the "Reuse" ranking (1 or -1) as the final reward, given that we felt this was the most important experiential feature and it correlated strongly with the other positive features. In addition, we include a small negative reward (-0.1) if the human deletes an addition made by the AI partner.

From our human participant study we found that local coherency (Markov Chain) tended to outperform global coherency (LSTM). Thus, for a proposed co-creative architecture, we chose to make use of a Convolutional Neural Network (CNN) as the agent in our SMDP. We made this choice because CNNs focus on learning small, local features that are helpful in making more global decisions. We made use of a three-layer CNN, with the first layer having 8 4x4 filters, the second layer having 16 3x3 filters, and the third layer having 32 3x3 filters. The final layer is a fully connected layer followed by a reshape to place the output in the form of the action matrix (40x15x32). Each layer made use of leaky ReLU activation. We made use of a mean square loss and Adam as our optimizer, with the network built in Tensorflow [1].
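A minimal tf.keras rendering of that architecture is sketched below. The layer sizes follow the text; the 40x15 grid dimensions and the one-hot channel depth of the input are assumptions about the level encoding, and this is a sketch rather than the authors' released training code.

```python
# Sketch of the described three-layer CNN in tf.keras. NUM_SPRITES (the
# one-hot depth of the level encoding) is an assumption; the filter sizes
# and the 40x15x32 action-matrix output follow the text above.
import tensorflow as tf

NUM_SPRITES = 32          # assumed one-hot depth of the input level grid
WIDTH, HEIGHT = 40, 15

model = tf.keras.Sequential([
    tf.keras.Input(shape=(WIDTH, HEIGHT, NUM_SPRITES)),
    tf.keras.layers.Conv2D(8, (4, 4), padding="same"),
    tf.keras.layers.LeakyReLU(),
    tf.keras.layers.Conv2D(16, (3, 3), padding="same"),
    tf.keras.layers.LeakyReLU(),
    tf.keras.layers.Conv2D(32, (3, 3), padding="same"),
    tf.keras.layers.LeakyReLU(),
    tf.keras.layers.Flatten(),
    # Fully connected output reshaped into the 40x15x32 action matrix:
    # one predicted value per component type per level position.
    tf.keras.layers.Dense(WIDTH * HEIGHT * 32),
    tf.keras.layers.Reshape((WIDTH, HEIGHT, 32)),
])
model.compile(optimizer="adam", loss="mse")
```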
We updated our agent to use an active learning framework [40]. During use of the tool the agent trains on implicit feedback from the user. If a user keeps the AI's additions the agent receives a reward of +0.1, and if the user removes them, -0.1. Notably, these rewards are local to the addition's placement, meaning that a user deleting a ground tile in a particular location will not impact the likelihood of seeing other ground tiles elsewhere. We also kept track of all deletions by the user of the AI's additions in a particular session and prohibit the AI partner from making those same additions again.

Overall this had the effect of an agent that, in informal tests, was capable of picking up on users' preferences for local level structures. We predicted that this would allow us to address the first two takeaways from the results of the first study, given the ability to adapt to an individual user and training on more levels than existed in the original Super Mario Bros. For further technical details please see [20].
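The sketch below shows one way the implicit-feedback scheme just described could be expressed: +0.1 for each AI addition the user keeps, -0.1 for each one they delete, with the reward kept local to the addition, plus a memory that blocks the agent from re-proposing deleted additions within a session. The data structures are hypothetical and not the authors' implementation.

```python
# Sketch of the implicit-feedback rewards (+0.1 kept, -0.1 deleted) and
# the per-session ban on repeating deleted additions. Hypothetical types.
def local_rewards(ai_additions, user_deletions):
    """Both arguments are sets of (sprite, x, y) tuples from one session."""
    rewards = {}
    for addition in ai_additions:
        # Each reward is tied to one specific addition, so deleting a
        # ground tile here does not affect ground tiles elsewhere.
        rewards[addition] = -0.1 if addition in user_deletions else 0.1
    return rewards

class DeletionMemory:
    """Prohibit the agent from re-proposing additions the user deleted."""
    def __init__(self):
        self.banned = set()

    def record(self, deletions):
        self.banned |= set(deletions)

    def filter(self, proposed_additions):
        return [a for a in proposed_additions if a not in self.banned]
```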
The size and processing requirements of this new AI model meant we could no longer run the model on the same device as the front-end editor. Instead we made use of a client-server framework, in which the AI agent ran on a server during a single participant's study, and the "End Turn" button prompted the server for additions. Study personnel could monitor the output of the server during the study.

Results in the first study indicated the participants did not fully understand their AI partners; to address this we included an explanation generation system. Explainable AI represents an emerging field of research [4], focused on translating or rationalizing the behavior of black box models. To the best of our knowledge, this has not been previously applied to PCGML. Codella et al. [9] demonstrated how explanations could improve model accuracy on three tasks, but required that every sample be hand-labeled with an explanation and treated explanations from different authors as equivalent. Ehsan et al. [17] made use of explainable AI for automated game playing. Their approach relies on a secondary machine learning interpretation of the original behavior, rather than visualizing or explaining the original model as our approach does.

In our explanation generation system we identify the most decisive 4x4 slice of the level for a particular addition as a printout from the server; we also included the AI agent's confidence in the addition (based on the activation) and the filter of the first layer of the CNN that was maximally activated, as described in [33]. Essentially this meant that for every addition an AI expert could translate why the AI made that addition. We did not incorporate explanations into the editor at this time, given a lack of clarity on how best to integrate this feature. Instead we treated this as a semi-Chinese room [39] or semi-Wizard of Oz [36] design, with study personnel acting to translate the output explanation into user-understandable language. For consistency, we used a single member of the study personnel to translate these explanations. No hypothesizing was included in these explanations, only relating the output in terms of the generated explanation and prior experience. Our hope was that we might use the ways in which this individual translated the explanations as insight into an interface-integrated explanation system.
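As an illustration of two of the explanation ingredients named above, the sketch below reads out the output activation behind a chosen addition (used as a confidence value) and the index of the maximally activated first-layer filter, assuming a Keras model shaped like the earlier sketch. It omits the "most decisive 4x4 slice" and is an assumption about how such values could be extracted, not the authors' server-side code.

```python
# Sketch: confidence (output activation) and maximally activated
# first-layer filter for one proposed addition, given a Keras model
# like the earlier sketch. Not the authors' explanation server.
import numpy as np
import tensorflow as tf

def explain_addition(model, level_one_hot, x, y, sprite):
    batch = level_one_hot[np.newaxis, ...]            # add batch dimension

    # Confidence: the output activation for this particular addition.
    action_values = model.predict(batch, verbose=0)[0]
    confidence = float(action_values[x, y, sprite])

    # Maximally activated filter in the first convolutional layer.
    first_conv = next(layer for layer in model.layers
                      if isinstance(layer, tf.keras.layers.Conv2D))
    probe = tf.keras.Model(model.inputs, first_conv.output)
    activations = probe.predict(batch, verbose=0)[0]   # width x height x 8
    top_filter = int(np.argmax(activations.sum(axis=(0, 1))))

    return confidence, top_filter
```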


There were only two changes made to the front-end level editor. First, we replaced the "Options" button with a "Remove" button that would remove all the AI's additions from the last turn. We made this choice because we anticipated the potential for brittleness and incoherent additions prior to the agent successfully adapting to the user. The second change was to remove the functionality of the "Run" button, since in the case of the 7 participants who did not make use of the AI in the first study, they spent their full time playing their own level.

Method

The overall method of this second study remained similar to the first. The study differed in terms of the location. Our goal for this study was to determine the impact of this tool on users' design practice. For this reason, we gave the option for study participants to take part in the study wherever they felt the most comfortable to engage in level design. In the cases where users took part in the study remotely, study personnel used conference software to watch their screen and record audio.

At the start of the study, participants were given a full run-down of the tool, this time including informing the users they could ask questions of the study personnel, including explicitly "explanations about the AI". Participants then took part in designing two levels. For each level they now interacted with the same agent, which was not reset in between levels. Thus the agent would ideally continue to adapt to the designer during the course of the study session, and we would expect to see better performance in the second level design session. The participants were given a random level design task for each level, to create an above ground or under ground level, while the AI was not altered in either case. The major difference in the level design portion of the study was the use of a think-aloud protocol. At the beginning of the study session participants were encouraged to voice their "thoughts, reactions, and intentions" aloud. In the case where the participant remained quiet, study personnel asked "why" questions, referring back to the previous actions of the participant. The exact choice of when to make use of these questions was left up to individual study personnel. A single member of the study personnel ran each of these sessions. We made this choice for consistency between when the "why" questions were asked and the generated explanations.

Given that the original survey was designed to compare between interactions with two different agents, we reworked it for interacting with the same agent twice. In particular, we reworded four of the original questions to ask participants to rank between the two experiences, in regards to the agent. We did not include the question on reuse, as it was not relevant, and we did not include the question on which agent was most challenging to use, given the apparent issues with that question.

We added a second set of questions after the initial experiential feature rankings meant to specifically address our research questions. They were as follows:

(1) Did you prefer the agent's behavior in the first or second session? (a) First (b) Second.
(2) Would you prefer to use this tool with or without the AI partner? (a) With (b) Without (c) No preference.
(3) Did you feel that the agent was collaborating with you? (a) Yes (b) No
(4) Did you feel that the agent was adapting to you? (a) Yes (b) No
(5) If you asked for explanations, did you find that they improved your experience? (a) Yes (b) No

For each of these questions participants were asked to explain their answers in text, except for the last one, which asked for an explanation only in the case of a positive response. These questions were intended to help us pick apart elements of user experience related to our three research questions: RQ1 (1, 4), RQ2 (5), and RQ3 (2, 3). We added questions to the demographic section to address the particular goals for this study. We asked the respondents to describe their game design experience (offering hobbyist, industry, indie, academic, etc. as suggested answers), and asked them to rate their experience with AI/ML on a Likert scale (No experience, I have studied or used AI/ML in the past, I regularly use AI/ML, I am an AI/ML expert). We asked this question due to RQ3, and our concern that experience with AI may impact how participants asked for explanations.

Analysis Methods

During the think-aloud design portions of each study, personnel were instructed to note interactions or utterances they felt related to our three research questions. After all participant data had been collected we further reviewed the audio recordings for any additional interactions or utterances. We engaged in two discussion sessions between members of the team concerning this final set of notes in order to derive our final discussion points in terms of our research questions, grouping each noted utterance or interaction in terms of whether it positively or negatively related to each research question.

Demographic Results

We reached out to a total of 24 game designers through various social media accounts. We sought only those with published games, either indie, hobbyist or industry.


Table 3: Ratio of the answers to the survey's questions in the second study.

                   First   Second
Most Fun           5       9
Most Frustrating   8       6
Most Aided         5       9
Most Creative      5       9
Preference         6       8
                   Yes     No
Collaborating      7       7
Adapting           9       5

Figure 3: Examples of six final levels from our second study; each pair of levels comes from a single participant.

We made this choice because we were interested in designers with a consistent design practice, in order to determine how this design practice was impacted by our tool. 16 agreed to take part in this study. During the study sessions of two of these individuals it was discovered that networking issues (a firewall the participant had no control over, and network latency) led to the tool crashing. Thus we were left with 14 final participants.

Of the 14 participants, 8 identified as male, 4 as female, and 2 as nonbinary. While not ideal, this does outperform the games industry in terms of gender diversity [51]. In terms of age, 11 chose 23-33, 2 chose 34-54, and 1 chose 18-22. This is notably older on average than our initial study, which follows from the difference in participant pools.

We randomly selected three of our participants and visualize their final levels in Figure 3. These levels vary significantly from the types of levels seen in Figure 2, and each demonstrates unique design aesthetics. Most notable is the middle designer, who described her levels as "sadistic". These levels offer some evidence towards the tool supporting creative expression, but do not reflect the interaction with the AI.

Quantitative Results

The only quantitative results of this study are the survey responses and the logging information. We summarize the results in terms of the ratio of answers in Table 3. It does not include the answer to the question concerning whether the participant would want to use the tool with or without the AI, but we found that nine answered "with", two answered "without", and the rest "no preference". Since the AI partner is the main feature that differentiates this tool from other level design editors, we anticipate that this result indicates some positive support for RQ3. Overall these results indicate a majority of participants had a positive interaction with the agent, which gives some support to a positive answer to RQ1.

As a summarizing statistic we include the change in average ranking of the second agent between the four experiential features shared between the two studies. This indicates that individuals in the second study were 14.3% more likely to rank the second design experience as more fun than individuals in the first study, 15.5% less likely to rank the second agent as more frustrating, 20.3% more likely to think the second experience aided their design, and 9.5% more likely to view the second experience as demonstrating more surprising and valuable ideas from the AI. This lends further support toward a positive answer to RQ1, since by the time of the second session the AI agent should have had time to adapt to the user. Half of the participants felt the AI was collaborating with them. This seemed to be due to differing expectations on what AI collaboration should look like across the participants, which we discuss further below. We ended up not finding the explanation question useful because only three of the fourteen respondents asked for any explanations having to do with the AI. Only 45% of AI additions were deleted on average in this study, compared to 60% in the first. However, we caution against strong interpretations of these results.

Qualitative Results

In this subsection we organize the results of our analysis of the think-aloud utterances and design tool interactions in terms of our three research questions. In our analysis we noted two major themes across our findings. First, that individuals differed widely in terms of their expected roles for the AI. Second, a willingness to adapt their own behavior to the AI. We identify participants according to the order in which they took part in the study.


Our first research question asked: by leveraging active learning to adapt the AI partner to a user, can our tool better serve the needs of level designers? Overall, we found that the AI consistently demonstrated adaptation to the participants. The study participants indicated that they noticed this, both during the think-aloud and in their comments on the final survey. Study participant 9, a male hobbyist game designer, added this after the collaboration survey question: "The more I placed, the more the agent seemed to get a sense of what I was going for, and in the end we had a couple of decent Mario levels." Participant 1, a male indie game designer, responded to the question about round preference with: "... After I rejected some of the off-theme decorative elements it attempted to add at the beginning, I felt like it recognized what kind of blocks I was working with in this environment." During the study, every participant had at least one interaction where they noted the AI's adaptation. Participant 5, a female academic game designer, reacted to one of the AI's additions with "This is a little better...Hey it's starting to figure me out a little bit". Participant 6, a non-binary, hobbyist game designer, offered the following quotes directly after observing each round of the AI's additions: "Still don't feel quite inspired? I'll try a tighter feedback loop", "Wait? that was nice", "Not great, except this part is really great", and "Honestly, good job AI?".

Each participant experienced and remarked upon at least one instance of the AI adapting to them. However, for some participants this was the exception, with most of the behavior appearing "random". In a comment after the first/second round behavior preference question, participant 14, a hobbyist nonbinary game designer, stated "The second agent placed objects fairly arbitrarily, in places where it didn't really affect gameplay, just looked weird". Participant 11, a male game designer who described his game design career as "all of the above except industry", stated "it learned too much from the first and then attempted to apply it to the second". Notably this was one of the three participants who actually asked for explanations about the AI agent.

Our second research question asked: can Explainable AI allow users to better understand the AI, and therefore to better utilize the tool? We did not find a meaningful answer to this research question. Only three participants asked for explanations, with two responding to the survey saying it was useful. Participant 7, a male hobbyist game designer, added the comment "It helped me understand what the AI was attempting to do and adapt to it for better collaboration." However, this could have been prompted by the way we phrased the survey question. Instead of asking for explanations about the AI, almost all participants instead voiced hypotheses about why the AI did what it did. In cases where these were phrased as questions (e.g. "Will it learn?" from participant 5), our study personnel asked if this was meant as a question, to which our participants responded in the negative. This came up repeatedly, even with participants who asked for explanations. Participant 8, a male who described himself as a "mostly" hobbyist, stated "It is adding something to help the players", despite asking whether the system had any explicit model of the mechanics and hearing it did not. These hypothesized reasons often anthropomorphised the AI, as in "I'm happy with what the AI is thinking here" from participant 11 and "I like where it's heads at but- I'm gonna trim some things" from participant 13, a female student/indie game designer.

Our third research question asked: will our overall changes to the tool lead to beneficial experiences for the designers?, by which we address our belief that designers would find value in the tool. First, participants generally praised the front-end interface. Participant 10, a female "industry, indie, academic" game designer, stated "Your level editor is amazingly functional for something written in Unity. Also, it felt pretty good to use." Across the think-aloud and survey comments we identified two major strategies to get value from the tool: as either an unintentional inspiration source or an intentional means of getting over a lack of ideas. Participant 4, a male hobbyist designer, indicated that he preferred the behavior of the first experience to the second, explaining "It added things that made the level harder, things that I would not have thought of by myself, most likely". Participant 6, after stating a preference for the tool with the AI partner, stated "I really like the tool regardless of the AI partner, but it was nice to be surprised by the AI partner! It prompted conversation/discussion in my head". Participant 14 during the think-aloud stated "I'm running out of ideas", prompted the AI for additions, and exclaimed "Oh yeah I forgot about these things!". Participant 5 stated she would prefer to use the tool with the AI and explained "It was more fun than facing an empty level by myself".

Not everyone found the tool to be consistently valuable, with the majority of complaints about the tool focusing on the AI's behavior. Participant 10 indicated she had no preference about whether to use the tool with or without the AI and stated "I could see using this tool as a way to give myself inspiration. But if I had more specific goals in mind... I would have found it more inhibiting than useful". Seven of the participants used the term "random" to refer to behavior they disliked from the AI, as with participant 2, a female academic game designer, who commented "the AI choices felt a bit too random to suit my taste", after indicating she'd rather use the tool without the AI. Participants also had issues with the AI not recognizing intentional empty space. Even though the AI could not make the same additions in the same locations, it could add the same additions in other map locations.


When the AI repeatedly filled Participant 11's gaps with ground blocks, he exclaimed "Quit filling in the gaps!". Some users expected the AI to only add content that a potential player could reach given the current level. Participant 9, reacting to an AI addition, stated "that's an unreachable block so I'm going to delete it"; this was a common reason given for deleting the AI's additions.

Roles and User Adapting Analysis

Users differed widely in their expectations for their AI partner. In our analysis we identified four major expected AI roles: friend, collaborator, student, and manager. These expected roles could positively or negatively impact the user experience, and fluidly changed throughout each design session. By friend we indicate those participants who viewed interaction with the AI as primarily a fun activity, and even literally described the AI as a friend. Participant 13 began her second design session by clicking the end turn button and stating "Let's see what my friend comes up with". Participant 7 described the experience as "fun", but gave the answer of no preference to the question of whether he would prefer the tool with or without the AI.

Collaborator as a role indicated those who wanted an equal design partner. This could be positive, for example participant 2 stated she "found myself wanting to compliment the AI's choices more as I noticed its responsiveness during the second level build". It could also be negative, when the AI did not match a participant's expectations for a human collaborator. Participant 11 stated "What I expect from a design partner is one contributing complete(ish) ideas rather than small edits" after answering that he did not think the AI collaborated with him. Participant 2 echoed these statements, but with a different sense of collaboration: "It didn't seem to be trying to create any consistency with my initial choices as you might expect from a human collaborator".

For the student role, we indicate that the participant seemed to expect the AI to follow their specific design beliefs or instructions. Participant 8 gave "I think that the first agent give less 'illegal' suggestions" as his reasoning for preferring the first agent, suggesting a notion of right and wrong agent behavior. On a more positive end, participant 7 stated he felt the system was collaborating with him, explaining "It tried to copy things I did and it wasn't 'frustrating' to work with." For this participant, copying him was the ideal for a collaborator. Lastly, perhaps with a cynical view of a student as underling, participant 14 reacted to AI additions by exclaiming "Yes! Do my work for me!"

By manager, we indicate that the participant seemed to view the AI as giving instructions to them or judging their design. Many of the cases of participants adapting their behavior to the AI occurred within this interactive framework. Participant 4 began his session by asking if it was possible for the AI to "evaluate" him. Later, reacting to an AI addition, he stated "so I like what the AI- I'll follow along with what it suggested". This notion of "following along" with the AI was common across participants who engaged with the AI in this way, as with Participant 2, who reacted to an AI addition by stating "I'll stick to its idea". Participant 8, reflecting on the AI's previous additions while designing, stated "I'll probably just add some blocks to satisfy its needs".

Some participants adapted their behavior to the AI primarily as a means of attempting to determine how best to interact with it. Participant 12, a male indie designer, paused and reflected on the AI's additions during one of his sessions, stating "It does do pretty cool stuff now and then, I think I need to get more used to what its doing". Others were more explicit in their plan to experiment, with participant 11 stating near the beginning of his first level design session "[I'm going to] spend some time testing it". Still others adapted their behavior as an extension of inspiration from the AI, as with participant 6 reflecting with "It does make me want to do more weird things". Finally, there were those who decided to attempt to change their own design to make even the most "random" AI additions fit, like participant 8, who gave "i am looking for ways that make [the AI additions] legit" as an explanation for changing his design.

6 DISCUSSION AND DESIGN IMPLICATIONS

Overall, we found that our second study demonstrated clear answers for our research questions concerning adaptation of the agent and the value users found in the tool. However, we found no clear answer for our research question concerning explainable AI helping users to leverage the tool. Instead, it seemed users were much more likely to come up with their own explanations for the AI's performance. We anticipate this could potentially have arisen from a lack of experience interacting with software and tools with explanations. Alternatively, given that we used a human to translate our Explainable AI system, there may have been social pressure not to ask questions related to the AI.

For ourselves and for future designers of similar tools, we identified a few design implications or takeaways from both studies. First, users vary widely in terms of their expectations for the AI, both in terms of the role the AI should take in collaboration and its performance in that role. One could propose a technical solution to this problem, that the AI should just adapt more successfully, but we expect more success to come from more explicitly outlining what the AI can and cannot do. Second, users expressed an interest in adapting to the AI, either because they viewed the AI as the more dominant designer or because they were motivated to find the best ways to interact with it. This is a positive sign for anyone interested in co-creation between AI and human designers, as it indicates a willingness to bend design practice to incorporate AI.

as it indicates a willingness to bend design practice to incorporate AI. Finally, we found that users overall found value in the tool as a source of inspiration, in terms of its ability to encourage users to rethink a design or to offer new ideas when a designer was unsure how to proceed. We anticipate that these interactions would transfer to other design domains.

7 POTENTIAL FOR NEGATIVE IMPACTS
With any machine learning and artificial intelligence application, there is a need to interrogate the potential to take away people’s livelihoods. We note that this tool was intended to function only as a design aid, not as a replacement for a designer. The AI systems described in this paper are insufficient to act as solitary designers; they were developed to augment, not replace, creative work.

Given that our AI approach adapts to the designs of a particular user, it could potentially reinforce design biases, for example by replicating a designer’s use of an offensive or over-used design, such as yet another game level in which one must save the princess. As it currently exists, our system has no means of challenging a designer to improve beyond inspiring the designer to employ additional level components. We anticipate that there is not a great risk of negative impacts in the domain of Super Mario Bros., but we continue to reflect on this for future versions of the tool.

8 CONCLUSIONS
In this paper we discuss the development of an intelligent level editor, which includes an AI partner to collaborate with users. We ran two mixed-methods user studies, with the first focusing on developing the initial tool and the second focusing on interrogating its effect on practicing level designers. We found that users varied widely in their expectations of, and expected role for, the AI agent, demonstrated a willingness to adapt their behavior to the agent, and overall viewed the tool as having potential value in their design practice.

ACKNOWLEDGMENTS
This material is based upon work supported by the National Science Foundation under Grant No. IIS-1525967. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation. This work was also supported by a 2018 Unity Graduate Fellowship. We would like to thank the organizers and attendees of Dagstuhl Seminar 17471 on Artificial and Computational Intelligence in Games: AI-Driven Game Design, where the discussion that led to this research began.
