DOI: 10.1145/3613904.3642198

Teaching artificial intelligence in extracurricular contexts through narrative-based learnersourcing

Published: 11 May 2024

Abstract

Collaborative technology provides powerful opportunities to engage young people in active learning experiences that are inclusive, immersive, and personally meaningful. In particular, interactive narratives have proven to be effective scaffolds for learning, and learnersourcing has emerged as a promising student-driven approach to enable personalized education and quality control at-scale. We introduce the first synthesis of these ideas in the context of teaching artificial intelligence (AI), which is now seen as a critical component of 21st-century education. Specifically, we explore the design of a narrative-based learnersourcing platform where engagement is centered around a learner-made choose-your-own-adventure story. In grounding our approach, we draw from pedagogical literature, digital storytelling, and recent work on learnersourcing. We report on our iterative, learner-centered design process as well as our study findings that demonstrate the platform’s positive effects on knowledge gains, interest in AI concepts, and the overall user experience of narrative-based learnersourcing technology.

1 Introduction

As we enter the “age of AI”, it is imperative to foster young people’s AI literacy by enhancing both their knowledge and skills with respect to emerging technology. By building awareness and understanding of AI, people are empowered to capably and responsibly navigate an AI-infused world, including as part of critically appraising such technology [76]. Such competencies also promote one’s ability to use and collaborate with AI in both professional and personal contexts [74]. For example, exposure to computational concepts in high school helps increase later interest in the field, including among students from traditionally underrepresented groups [43, 46, 77]. It also builds students’ readiness to leverage AI in future educational and professional careers [49] in numerous industries that are expected to have high demands for AI competencies that will exceed the workforce supply [19, 102].
Discussion is therefore growing around the need for AI educational resources [74, 122], and various programs are being launched (e.g., the AI Center for Excellence1 and MIT’s RAISE initiative for Responsible AI for Social Empowerment and Education2). However, the overall reach of AI-learning efforts is still rather limited [140]. In practice, it is challenging to integrate AI subject matter into existing K-12 settings for a variety of reasons, including a lack of instructor understanding and comfort level in teaching AI as well as a generally standardized and rigid school curriculum [116].
Extracurricular programs therefore provide a compelling opportunity to engage broad learner populations, especially in contexts where traditional classroom options may be limited or lacking. For example, our research partner, the TUMO Center for Creative Technologies3, is a free extracurricular program for 12–18 year olds that provides interdisciplinary educational opportunities around the globe, such as programs on emerging design, computing, and technology topics. In this paper, we collaborate with TUMO to investigate: How might we effectively teach AI concepts to students through extracurricular opportunities that support a variety of learning setups, diverse learner backgrounds, and variable levels of baseline familiarity with AI? In particular, we are keen to explore strategies that empower distributed extracurricular facilitators to deliver quality learning experiences that otherwise may not be feasible for them to offer on their own.
We specifically pursue an approach based on learnersourcing. “Learnersourcing” was originally coined by Kim [58] to describe a practice of student learning through the collaborative production of shared learning resources. The term learnersourcing is meant to evoke the related process of crowdsourcing, in which a task is split into smaller tasks that are distributed and completed by a pool of crowd workers [56]. Learnersourcing does differ from crowdsourcing in a few key ways, namely its motivation and incentive structure. Whereas crowdsourcing conceptualizes participants as "workers" (typically rewarded through paid compensation), learnersourcing is pedagogically motivated and emphasizes that the learning benefits of generating and engaging with useful content can be intrinsically meaningful [113, 132]. Learnersourcing can lead to formation of a learner community, though this is not a requirement nor always an outcome of a learnersourcing system [113].
To scaffold the learnersourcing process, we focus on narrative-based activities. Narratives have shown promise as a learning vehicle by promoting student interest and engagement [45], including in the context of STEM [107] and more specifically computing education, even in very young learners [29, 30, 87]. Using the mechanics of storytelling, abstract concepts can be explained through familiar and intuitive metaphors [69], which makes the ideas more accessible, promotes self-regulated learning [94], and overall creates an immersive and engaging experience [78, 79].
Our proposed strategy is both significant and novel, specifically in that it addresses a critical need for supporting learner interest and engagement in learnersourcing tasks. The importance of student interest and self-efficacy has been well documented in traditional classroom contexts [3]. Singh et al.’s 2022 review and classification of learnersourcing systems affirms that these psychological ingredients also impact learning outcomes in learnersourcing contexts [113]. In particular, the authors demonstrate that low self-confidence and low interest in tasks are top contributors to a lack of engagement, thereby highlighting that enhancing student motivation is essential for the viability of a learnersourcing system.
As we will discuss, narrative-based learning activities have been shown to address precisely such challenges related to interest and engagement; however, research has not yet explored their application in learnersourcing given the relatively nascent stage of such systems. In this way, our work responds to calls for additional research on learnersourcing, such as from Khosravi et al. who point out that "fundamental work" is still needed for learnersourcing systems to reach their full potential and be ready for large-scale adoption, including to leverage new opportunities for human-AI partnerships in content creation and evaluation [57].
Using a narrative-based learnersourcing approach, we develop a technology platform that enables a virtuous cycle of content creation, consumption, and peer feedback, with learners iteratively moving up the ranks and taking on increasingly more sophisticated roles that help sustain the educational ecosystem. Our formative design phases involved three cohorts of teen learners who helped us refine the narrative-based learnersourcing approach and explore its feasibility and efficacy across common extracurricular scenarios. Building on those insights, we report on our resulting system that instantiates narrative-based learnersourcing as well as our deployment study to evaluate the approach in terms of outcomes related to knowledge gains, attitudinal shifts, and overall user experience.
Specifically, we aim to examine the following research questions:
RQ1: Can a narrative-based learnersourcing platform increase knowledge of AI concepts?
RQ2: Can a narrative-based learnersourcing platform promote interest and engagement with learning AI concepts in an extracurricular environment?
RQ3: What is the user experience of a narrative-based learnersourcing platform?

2 Background and Related Work

In the following subsections, we review foundational pedagogical frameworks for teaching AI literacy, as well as related literature on the power of digital storytelling and learnersourcing as specific vehicles for learning.

2.1 Pedagogical groundings of AI learning

With the increasing significance of AI in both professional environments and daily routines, being knowledgeable about AI has come to be framed in terms of "literacy" [74], as students will need to be literate in AI skills to thrive in a future where such technology is prevalent [129]. Research interest in AI literacy education was limited before 2016, but the number of publications on this topic has recently surged [85]. This rise in attention, as well as the formation of major collaborations such as UNESCO’s Artificial Intelligence and the Futures of Learning Project, mark a recognition that AI technology will increasingly permeate the fabric of everyday life and that the public, including young members of society, require knowledge around fundamental AI concepts.
Various theoretical and pedagogical frameworks have emerged to guide educators in developing and delivering AI literacy curricula. In our research, we build on Ng et al.'s comprehensive review [86]. In particular, Bloom’s taxonomy is used for designing learning objectives, activities, and assessments [14], and Technological, Pedagogical, and Content Knowledge (TPACK) describes how those forms of knowledge enable the successful integration of technology into teaching [47].
Beyond these two AI-specific frameworks, a myriad of other pedagogical approaches further guided us, both prominent theories (e.g., constructivism [6, 41], experiential learning [62, 136], project-based learning [15, 61], problem-based learning [110], applied learning [64], community-based learning [9, 39], and active learning theory [16, 88]) as well as conceptualizations of AI as a 21st-century skill [86, 123, 124] and models that are less commonly known yet still pertinent (e.g., rhizomatic learning theory [17]).

2.1.1 Bloom’s taxonomy adapted to AI literacy.

Bloom’s taxonomy, introduced by Benjamin Bloom in 1956, was intended to help educators design a balanced curriculum, learning objectives, and student assessments according to multiple dimensions in the cognitive, psychomotor, and affective domains [1, 14]. Though often misinterpreted as overly hierarchical, the taxonomy’s "levels" of learning are non-linear and highly integrated [13, 23]. Overall, it is generally viewed as a useful framework for educators seeking to craft a dynamic, deep, and adaptive learning experience by interweaving different types of learning and higher level cognitive skills through a variety of tasks [1].
Recently, in AI literacy in K-16 classrooms [86], Ng et al. proposed narrowing Bloom’s original six levels of cognitive learning (knowledge, comprehension, application, analysis, synthesis, and evaluation) to four AI-specific levels: know and understand AI, use AI, evaluate and apply AI, and AI ethics. Know and understand AI relates to building basic AI knowledge and comprehension. Use AI works to transfer AI concepts from theory to practice. Evaluate and apply AI builds critical thinking skills for analyzing and solving problems using AI, which typically requires synthesizing prior knowledge. Finally, AI ethics evaluates the implications of AI systems in the real world, considering a diversity of perspectives. As with the original taxonomy, the motivation is to ensure AI curricula devote time for student activities that focus on higher-level thinking, such as AI ethics. Further, Ng et al.’s consolidated taxonomy provides useful specificity and practical value (e.g., concrete guidance for supporting problem-based learning) for AI educators.

2.1.2 TPACK adapted to AI literacy.

The technological, pedagogical and content knowledge (TPACK) framework is another highly influential tool for educators planning curricula [60]. Whereas Bloom’s taxonomy focuses on categorizing and structuring educational objectives, activities, and assessments around cognitive processes [1], TPACK focuses on the integration of technological, pedagogical, and content knowledge for effective teaching, emphasizing the complex relationships that exist among these components [12, 60]. Importantly, TPACK incorporates students’ learning context, in particular the context-dependent role of technology [106].
Ng et al. additionally provide an adaptation of TPACK for AI literacy educators [86] that organizes influential literature (e.g., Touretzky et al.’s "Five Big Ideas in AI" [122]) along TPACK’s three dimensions. The adaptation is designed for practitioners, offering example learning activities and guidance when creating interfaces for learning technology [86]. As described in section 3, we therefore structure learner activities and design our user interface based on considerations from these AI-adapted frameworks. This adaptation of TPACK also provides much of our AI literacy curriculum, including educational resources, assessments, and instructional delivery methods. We specifically utilize the AI for All (AI4ALL) Open Learning Curriculum [2], which applies the adapted framework to meet standards from NGSS Engineering4, ISTE5, Common Core ELA/Literacy6, and CSTA7.

2.2 Digital storytelling for education

As the ability to understand and apply AI topics is increasingly seen as an essential skill [97], inclusive forms of technical education are needed to broadly support AI learning and promote educational equity [31, 40]. For instance, creative and playful learning activities can encourage diverse learners’ early interest in AI concepts, as highlighted by recent work on designing inclusive educational technologies for AI learning [34]. Other research has demonstrated opportunities to support AI learning beyond traditional classroom settings, including in informal learning spaces [73], via online tools8, and in family settings [33, 35].
Narrative-based approaches are applicable to all such contexts (e.g., storytelling and other story-based instruction), as stories are a versatile and effective pedagogical tool [65, 119] including when integrated with technology (e.g., digital storytelling) [100, 101, 108], for students from backgrounds that face educational inequities [20, 71], and in the context of computing education specifically [63, 120]. For example, recent research has shown that learners are able to grasp computing concepts and create engaging stories through voice-based interfaces [29] as well as comic-based [117] and multimodal (voice and visual) programming environments [30]. Particularly pertinent work indicates that digital story writing can promote AI literacy, including students’ ability to move beyond levels of knowing and understanding to actually using and applying AI knowledge to solve real-life problems [87].
Storytelling is recognized as one of the earliest methods of teaching, in large part because it serves as an instrument for distilling complex information into more approachable, familiar, and engaging formats [93]. Today, technology further amplifies the immersive nature of stories through interactive audio, video, and images [104]. Digital storytelling has therefore been utilized across educational domains to enhance student motivation, engagement, and ultimate knowledge retention [36, 109, 139].
Narrative creation also enables knowledge sharing [138]. By using story-based approaches, learners can leverage their past personal experiences, thereby creating more active participation in the learning process and enhancing learning outcomes [115]. In this way, digital storytelling is often considered an excellent strategy to promote a culturally inclusive environment, by empowering participation from traditionally marginalized voices [82]. In addition, digital storytelling is often a more intuitive way for students to approach abstract concepts, and it also positively shapes learner attitudes around the educational topic at hand [18].
By emphasizing creative skills while simultaneously supporting systematic learning processes, digital storytelling further facilitates imagination, idea generation, information gathering and organization, active learning, self-expression, and problem-solving [36, 137]. In these ways, digital storytelling is a natural vehicle for acquiring and cultivating other 21st-century skills in addition to AI learning, namely creativity, critical thinking, open-ended problem-solving, communication, and leadership [28, 98, 118].

2.3 Learnersourcing

Learnersourcing is related to crowdsourcing, a technique that engages groups of (typically non-expert) human workers to complete tasks [58]. Crowdsourcing is often done by unpacking tasks into distributable, smaller "micro" tasks that have self-contained context and can be completed by workers in an asynchronous and remote fashion [38]. Crowdsourcing has seen adoption across disciplines [130], including complex and creative tasks like article writing, decision making, and science journalism [59], as part of the Wikipedia project [127], and in open source software efforts [90].
In 2015, Kim and colleagues used the term "learnersourcing" to describe a practice of student learning through collaborative generation of shared learning resources [58]. A key distinction from crowdsourcing is that learnersourcing is focused on promoting learner-centered needs [132], rather than leveraging users as a form of labor to meet task requesters’ needs. That is, learnersourcing users participate by engaging in activities that not only collectively generate teaching content for other users but also impart learning on a personal level [113]. In this model, the crowd of learners both contributes to and benefits from created content, resulting in a virtuous learning cycle [58].
Specifically, learnersourcing has been shown to increase learning gains, not only for individuals who consume crowd-created content but also for those who create it [56, 113]. Recent work has focused on evaluating the quality of such student-generated content [26] as well as leveraging the method for annotation tasks [8] and to overcome experts’ blind spots when developing content for novices [48].
To generate content, learners are prompted to gain an understanding of the subject matter before creating educational artifacts for others, making content creation in a learnersourcing system similar to project-based learning [4]. Artifact creation involves a process of self-explanation, which has been shown to lead to effective learning [11]. Moreover, content creators actively engage with the learning material as they produce these artifacts, which results in better recall [25] and involves cognitive activities associated with higher-level learning according to Bloom’s taxonomy. Additionally, learnersourcing can foster a community where members connect around shared learning goals [113]. Further, such a community can provide mentorship and role models while motivating learners to participate and contribute for the good of the collective [58].
Recent work involving learnersourcing emphasizes the growing possibilities for student, educator, and machine partnerships [57], with ChatGPT emerging as a focal point of student-AI collaboration [7, 72, 81, 92, 114, 121]. Among other considerations, these works explore how ChatGPT can be used to achieve higher quality learner-made content. We leverage these insights by taking a human-AI conversational approach to content creation. In our formative case studies (see section 3.3), we also heard from learners that ChatGPT helped to make content creation more enjoyable.

3 System Description

Tying together these pedagogical frameworks and learnersourcing principles, we begin by offering a conceptual model that applies Ng et al.’s formulation of Bloom’s cognitive taxonomy to the context of narrative-based learnersourcing. Building on this model, we then detail how our platform scaffolds a user’s progression through gamified levels of AI learning. Finally, we instantiate these ideas in a functional narrative-based learnersourcing system co-designed and evaluated with learners from the demographic groups that TUMO typically serves.

3.1 Pedagogically-grounded design strategy

Building on Ng et al.’s formulation of Bloom for AI literacy [86], our approach further adapts that model to satisfy the design requirements of a narrative-based learnersourcing system. Specifically, Ng et al.’s groundwork is nicely compatible with a learnersourcing system given that (a) it divides engagement into discrete and measurable learning activities that can be delivered to users as learnersourcing "tasks" and (b) it provides a widely accepted assessment scheme that we can use to measure learning. To reiterate, the levels in Bloom’s and Ng et al.’s frameworks are integrated rather than overly hierarchical [5], and we take the same approach in our system’s design.
Figure 1: A visualization of important activities in our narrative-based learnersourcing system and how these are associated with different cognitive learning levels. The four activities (know and understand AI, use AI, evaluate and apply AI, AI ethics) are adapted from Ng et al.’s formulation of Bloom’s cognitive taxonomy for AI literacy [86]. The figure’s visual design is based on the revised edition of Bloom’s taxonomy [5]; in particular, we avoid the stacked pyramid depiction, which can inappropriately suggest an overly hierarchical and linear progression through Bloom’s levels [23].
Figure 1 illustrates our pedagogical model, designed around learners completing learnersourcing tasks that require conceptual and technical AI problem-solving. Each task involves multiple choice assessments and can also support peer evaluation. Completing a level requires successfully completing a set of tasks that meet specific educational criteria. As learners progress, tasks become more complex and progressively emphasize higher level cognition.
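To make this task-and-level structure concrete, the sketch below shows one way it could be represented in TypeScript, the language used for scene code on our platform. The type and field names here are illustrative assumptions for exposition rather than the system's actual schema.

```typescript
// Illustrative sketch of the task/level model described above.
// Type and field names are assumptions for exposition, not the platform's real schema.

type CognitiveLevel = "knowAndUnderstandAI" | "useAI" | "evaluateAndApplyAI" | "aiEthics";

interface LearnersourcingTask {
  id: string;
  level: CognitiveLevel;      // which adapted Bloom level the task targets
  description: string;        // e.g., "Extend the taxi-ride scene with a new branch"
  requiresPeerEvaluation: boolean;
  assessmentPassed: boolean;  // set once the task's multiple choice assessment is passed
}

interface LevelCriteria {
  level: CognitiveLevel;
  minTasksCompleted: number;  // the educational criteria a level requires
}

// A level is complete once enough of its tasks have passed their assessments.
function levelCompleted(tasks: LearnersourcingTask[], criteria: LevelCriteria): boolean {
  const passed = tasks.filter((t) => t.level === criteria.level && t.assessmentPassed).length;
  return passed >= criteria.minTasksCompleted;
}
```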
Figure 2: A learner-centric perspective of the narrative-based learnersourcing journey. The diagram is read bottom to top to follow the learner’s journey through the three roles (explorer, architect, facilitator), reading the learnersourcing tasks associated with each role from left to right. System components are shown in relation to the specific learning activities and roles they support. Sub-tasks that emphasize real-world engagement are highlighted with dashed borders.
We design for a creative, self-directed learning experience that leverages repeated exposure to AI concepts to promote deeper knowledge comprehension through ongoing story engagement. As learners cycle through levels, they are initially encouraged to simply experiment with activities related to the higher levels, but progressively they come to spend considerable time on more advanced content. Our top level ("facilitate learnersourcing") adapts Bloom’s original top level ("create") to serve as a learner-enforced check on the real-world relevance of learner-created content across the platform. In this way, our narrative-learnersourcing approach aims to prepare learners for engaging with AI in real life.
Learner levels guide users to reach key curriculum elements by representing that content and associated learnersourcing tasks as components of a game. Upon reaching a higher level, the game objectives change and the learner is rewarded by unlocking access to interfaces that afford increasing rights and roles (e.g., making implementation-level changes to the system), similar to the collaborative model of responsibility found on platforms like Wikipedia.
Figure 3: Story explorer UIs: Login (top left), progress (top right), graph view (bottom left), adventure view (bottom right).

3.2 Learner roles and user journey

As learners (interchangeably called users) level up, they become increasingly responsible for building out the system’s content and capabilities. Thus, learnersourcing tasks and learner levels are mechanisms for structuring engagement to both deepen a user’s learning and support the scalability of the system as a whole. Figure 2 overviews the learner experience, mapping Khosravi et al.’s four core functions in a learnersourcing system (utilize, create, oversight, evaluate) [56] to learner roles. Specifically, each role is defined by a focus on one component of our system: a story adventure component (role = story explorer, focused on "utilize"), a story graph (role = story architect, focused on "create" and "evaluate"), and story infrastructure (role = system facilitator, focused on "oversight"). Appendix section A.2 describes these components further.
Even though the system provides guided sequences of sub-tasks, an individual’s actual learning journey is uniquely personal, steered by narrative preferences, prior technical capabilities, and personal learning objectives. Further, while we expect users to engage in activities they enjoy (e.g., extending a particular plot line in the narrative or implementing mini-game features in code), the system does nudge users to work toward leveling up. For example, after passing level 2, prompts begin to encourage adding a mini-game. The system allows for the involvement of more advanced facilitators at each stage and is designed to promote their engagement and interest so that they continue to provide oversight and contribute to the longevity of the platform.
A learner’s progression from explorer, to architect, to facilitator indicates an increasing demonstration of conceptual, technical, and real-world-applicable AI literacy. The following subsections further describe the various roles and associated learning tasks.

3.2.1 Story explorer role.

As a story explorer, a learner works to know and understand AI by engaging with existing story content.
“Start”: New learner onboarding. A user’s first experience on the platform is opening a web URL and landing on the main splash screen (Figure 3, top left image). After creating an account, the user enters into the onboarding story scene. This scene explains the interface’s controls, and its completion establishes that the user understands and can use the UI. She then transitions to scenes made by other learners, and the story’s narratives begin to unfold.
Interacting with a story scene. The learner receives story content on the web app in the form of text, interactive media, and links. After delivering such content, the app waits for the user to enter free response text or select among hard coded choices to continue the action of the scene. We designed the interface to feel familiar to text messaging UIs like Facebook Messenger or Telegram. Story history is preserved as an infinite scroll. Each scene uses an everyday metaphor to explain a topic in AI (e.g., relating training an AI model to instructing a driver during a taxi ride). These metaphors, as well as the setting and other narrative details that give consistency to the story experience, are generally drawn from a specific real-world place and culture: Yerevan, Armenia. Once the learner has completed a scene, she can revisit it at any time using an interactive graph-like representation of the scenes and their connections.
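As a rough illustration of this interaction model, the following TypeScript sketch shows how a scene's chat-style content and branching choices might be represented. The names are ours and do not reflect the platform's actual code.

```typescript
// Minimal sketch of a chat-style story scene: content is delivered as a sequence
// of messages, after which the scene waits for a free-text response or a selection
// among hard-coded choices. Names and shapes are illustrative only.

type SceneMessage =
  | { kind: "text"; body: string }
  | { kind: "image"; url: string; alt: string }
  | { kind: "link"; url: string; label: string };

interface Choice {
  label: string;        // e.g., "Tell the driver to take the next left"
  nextSceneId: string;  // an edge in the story graph
}

interface StoryScene {
  id: string;
  title: string;                 // e.g., "A taxi ride through Yerevan"
  messages: SceneMessage[];      // shown one after another, like a messaging app
  choices?: Choice[];            // hard-coded branches, if any
  acceptsFreeResponse: boolean;  // otherwise the learner must pick a choice
}
```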
Engaging in guided real-world experiences. As the action in the story scene unfolds, the narrative links these events to a task that requires the learner to engage in activities off-platform, either on the internet or in the physical world. For example, one learning task about training an image classifier requires the user to take photographs in real life.
Figure 4: Story architect UIs: Story creation and editing (top left), code editor (top right), story scene content graph (bottom left), story wiki (bottom right).
Interacting with embedded mini-games. Within the context of a scene, the learner can enter into an embedded mini-game. Mini-games support critical thinking and more active engagement with an AI topic. One example on our platform is a variation of "20 questions" that illustrates how the k-nearest neighbors algorithm classifies a data point’s grouping. The learners who collaborated to contribute this mini-game (in the story architect role, described in section 3.2.2) implemented the underlying algorithm using facilitator-provided pseudocode as a reference, and they integrated a narrative backdrop for the mini-game into the flow of the broader story happening in that scene.
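The classifier behind this mini-game is k-nearest neighbors; a compact reference implementation in TypeScript might look like the sketch below. This is our own illustration of the algorithm, not the facilitator-provided pseudocode or the learners' code.

```typescript
// k-nearest neighbors classification, the algorithm illustrated by the
// "20 questions" style mini-game. Each answered question contributes one
// numeric feature; the query point is labeled by majority vote of its
// k nearest labeled neighbors.

interface LabeledPoint {
  features: number[];
  label: string;
}

function euclidean(a: number[], b: number[]): number {
  return Math.sqrt(a.reduce((sum, ai, i) => sum + (ai - b[i]) ** 2, 0));
}

function knnClassify(query: number[], data: LabeledPoint[], k: number): string {
  // Sort labeled points by distance to the query and keep the k closest.
  const neighbors = [...data]
    .sort((p, q) => euclidean(query, p.features) - euclidean(query, q.features))
    .slice(0, k);

  // Majority vote over the neighbors' labels.
  const votes = new Map<string, number>();
  for (const n of neighbors) {
    votes.set(n.label, (votes.get(n.label) ?? 0) + 1);
  }
  let best = neighbors[0].label;
  let bestCount = 0;
  for (const [label, count] of votes) {
    if (count > bestCount) {
      best = label;
      bestCount = count;
    }
  }
  return best;
}
```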
Post-scene assessment. At the end of every scene, the learner’s comprehension of the AI topic is tested via a multiple choice assessment embedded in the story content. The assessment questions (see Appendix section A.3) employ the same extended metaphor as the scene. A learner must correctly answer all questions to progress. Learner responses are logged and used to compute scene-level statistics that are accessible to facilitators as part of system monitoring.
Post-scene feedback. Finally, the learner is presented with a feedback form (see Appendix section A.6 for examples) for self-reporting enjoyment and perceived learning during the scene.

3.2.2 Story architect role.

As a story architect, a learner focuses on using AI to create new content as well as evaluating and applying AI to synthesize prior knowledge, solve problems, and build critical thinking skills. Figure 4 illustrates UIs used by story architects.
Choosing a "need ticket" to work on. Borrowing from agile development methods, our system uses ticketing to define, organize, and prioritize learnersourced contributions. Each ticket describes a "need" in the story for a scene contribution. For instance, a particular AI topic may require additional explanation if existing content is lacking. After a user makes a selection from a set of available tickets, the system directs the learner through the process of inserting a new story scene or modifying an existing one, as appropriate.
Completing a story preparation template. To prepare for story writing, the system provides a Creator Guidebook (in the form of a Google Doc, which we created and that facilitators maintain) of other story scenes and external lecture videos, code examples, and quizzes to help the learner sufficiently master the AI topic her contribution will teach. The guidebook walks the learner through a process of “story-sourcing” to identify real-world narratives that can serve as compelling and relatable teaching metaphors. The guidebook also offers storytelling exercises for architecting an interesting and engaging narrative. To create an outline of her story contribution and embedded quiz questions, the learner follows a story scene template provided by the guidebook. If the learner has questions at any point, she can use a dedicated Discord server to get help from other learners on the platform.
Figure 5: Facilitator UIs: Scene feedback (top left), system code (top right), guidebook (bottom left), pull requests (bottom right).
Writing a scene with AI collaboration. The learner next follows a procedure described in the guidebook for converting the completed template into ChatGPT prompts, then reviews, edits, and incorporates the ChatGPT responses into an outline, resulting in a more polished scene script. The learner then integrates the script into the “story wiki," a shared staging environment used to test story transitions and wordsmith. Once the text and media are ready, the learner converts the scene into a TypeScript9 file using a provided wrapper library and guidebook documentation. She then integrates this file (representing the scene) into the server following a linking process outlined in the guidebook. Finally, the learner runs the backend code locally and tests her contribution.
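We do not reproduce the wrapper library here, but a learner-authored scene file produced at this step could plausibly resemble the sketch below, in which the inline types stand in for the library's actual API and the scene content is invented for illustration.

```typescript
// Hypothetical sketch of a learner-authored scene module. The types below are
// stand-ins for the platform's wrapper library; the content is invented.

interface QuizQuestion {
  question: string;
  options: string[];
  correctIndex: number;
}

interface AuthoredScene {
  id: string;
  title: string;
  aiTopic: string;
  messages: string[];
  choices: { label: string; nextSceneId: string }[];
  quiz: QuizQuestion[];
}

const trainingTheTaxiDriver: AuthoredScene = {
  id: "training-the-taxi-driver",
  title: "Training the taxi driver",
  aiTopic: "Weight adjustment",
  messages: [
    "Your driver keeps missing the turn toward the Cascade steps...",
    "Each correction you give is like nudging a weight inside a model.",
  ],
  choices: [{ label: "Give the driver more example routes", nextSceneId: "more-training-data" }],
  quiz: [
    {
      question: "In the taxi metaphor, what does each correction correspond to?",
      options: ["A weight update", "A brand new dataset", "A different passenger", "I don't know"],
      correctIndex: 0,
    },
  ],
};

export default trainingTheTaxiDriver;
```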
Embedding a real-world activity or mini-game. To complete level 3, the learner must either embed a mini-game or design and incorporate an activity that involves real-world engagement. The guidebook walks through this process. Changes are marked with a placeholder in the wiki and then, after transitioning the wiki to code, added directly as TypeScript code. To complete level 4, the learner must educate herself on an ethical issue related to the AI topic she is working on and create a level 3 type contribution that demonstrates and engages with the issue. The guidebook offers direction by pointing to a variety of information sources, including news outlets (e.g., Forbes and New York Times) and works by prominent scholars in the field (e.g., System Error [131]).
Passing a code review and merging changes into production. After testing code locally, the learner creates a pull request. Using a built-in messaging service on the platform, she requests that a higher-level learner review the changes. Once reviewed and accepted, the pull request is merged into the production environment.

3.2.3 System facilitator role.

As a facilitator, a learner focuses on system oversight and gains increasing responsibility in maintaining and scaling the platform. Figure 5 illustrates UIs used by facilitators.
Identifying user experiences and needs on the platform. When acting as a facilitator, the learner monitors the community’s engagement with existing content using the analytics dashboard, which includes scene-level statistics such as completion rates, comprehension assessment scores, engagement scores, and anonymized feedback. The learner also monitors and responds to help requests, bug alerts, architect requests, and general complaints on the Discord server. She additionally engages with the story content in a targeted way, both to hunt for issues (e.g., story gaps in need of content) and to uncover moments of delight [83]. Synthesizing the signals from these different channels, she accumulates her own set of notes describing user needs on the platform.
Preparing learning resources for story architects. Once a facilitator is better aware of issues with story content, she assesses the status of tools and support available to story architects. Specifically, she considers flagged issues in the guidebook, lectures, and platform features, which she addresses by preparing learning resources for architects. The guidebook describes all these procedures.
Synthesizing needfinding results into "need tickets". The learner discusses her needfinding results with other facilitators and helps synthesize findings to compile a set of tickets, each of which defines a need that should be addressed by an architect. The learner incorporates these into the existing set of open tickets.
Adapting learning assessments from established curriculum. If a need exists for resources for creating post-scene assessments, the learner pulls questions from available trusted online teaching materials (e.g., AI4ALL), modifying them as appropriate.
Creating features, fixing bugs, and performing code reviews. Our platform’s codebase is available for all learners to view and modify via pull request. When a facilitator identifies a need for a new feature or a bug fix in the underlying system infrastructure (e.g., the web app, server, database, analytics and tracking, or dashboard), she can create and submit a pull request for review by the codebase owners (in our case, this is the research team, though other administrators are possible, such as extracurricular coordinators or even learners that eventually advance to principal admin status). Facilitators also are responsible for reviewing, commenting on, and accepting or rejecting pull requests.

3.3 Formative design phases

It is important to note that we did not create our learnersourcing system through a top-down development process. Rather, we engaged in multiple rounds of case study workshops to understand learner needs, build out and iteratively test functionality, and generally understand how the system might be utilized by diverse learners in a variety of extracurricular scenarios.
All participants had little to no prior coding experience and were recruited by TUMO, which helped with coordination and provided supporting resources. Here we briefly overview each of these preliminary studies and key design takeaways. Elaborated descriptions are provided in Appendix section A.1.

3.3.1 Case study 1 (C1).

Our first case study focused on exploring learnersourced creation of narratives, specifically to understand how we might effectively provide scaffolds to guide participants with little coding experience through creating narrative-based content that conveys AI concepts. C1 took place as an online workshop with 18 learners located in Yerevan, Armenia who ranged in age from 12–17 years old (9 female, 9 male).
The workshop began with an overview of narrative structure, followed by participants developing stories incorporating AI concepts. Our initial system prototype allowed participants to implement interactive functionality for their narratives. The study revealed that participants benefited from creating narrative-based content in stages and were enthusiastic about using stories for learning. Notably, narratives that blended fiction and personal experiences, especially those featuring local contexts, were perceived as the most interesting. Participants did struggle with generating narratives from scratch, indicating the value of storytelling prompts encouraging learners to draw on everyday experiences.

3.3.2 Case study 2 (C2).

Our second case study focused on supporting collaboration and how emphasizing the integration of personally and culturally significant narrative elements might address learners’ difficulties in generating story ideas. C2 took place as an in-person workshop with 31 learners located in Yerevan, Armenia who ranged in age from 15–19 years old (21 female, 10 male).
Today, Armenia remains in a precarious national position in the wake of the Second Nagorno-Karabakh War. We observed that this point of national identity was a very important backdrop to a storytelling workshop experience for many students. To create space for these unique and meaningful Armenian stories to enter the project, we encouraged participants to seek “stories of joy” as part of the prompts we gave for sourcing stories. Resulting stories ranged from memories of peaceful moments during the war, to a treasured birthday celebration with a grandparent, to recollections of a favorite song. As learners adapted these plot points into a larger narrative, we encouraged them to add details that would make the story feel as culturally rich as they desired. Some added their favorite music, some added photos of the locations they were describing, and some included specific cultural experiences.
Learners then merged their individual stories into a cohesive overarching narrative, which we described initially as a "rhizome" [17] of narrative possibilities. This organic composition of stories resulted in formation of a new setting, "Apricot Stone City", a fictionalized version of Yerevan, the capital city of Armenia. Learners devised this name and appreciated its multifaceted cultural significance. Specifically, the historic buildings of Yerevan are made of a pink volcanic rock called tuff10, which gives the city a general coloring similar to ripe apricots. Apricots are indeed a symbolic fruit to Armenians, deeply connected to folklore and a favorite treat. As an example of this cultural significance, Armenia was represented by the song “Apricot Stone” at Eurovision 2010. Further, inside an apricot, there is a pit (“stone”) that contains a seed that is edible and a favorite of Armenians. Our learners took this as a metaphor for the learning experience, where a sweet and delightful narrative contains hard lessons that eventually give way to a nutritious and fulfilling learning experience.
C2 also examined learner collaboration. To support coordination, we specifically developed project management tooling as needed, such as ticket tracking and the story graph component for visualizing scene connections. Moderating content became a key consideration too, with learners designated or self-selecting to be facilitators who performed such moderation. These learners flagged issues in content (e.g., age-inappropriate, inconsistent, and inaccurate language). Participants appreciated the gradual roll out of structured, systematic processes, and they demonstrated that these collaboration features could be utilized without inhibiting creativity.

3.3.3 Case study 3 (C3).

Our third case study focused on scalability and the potential for learnersourced narratives to be effective and engaging across different cultural contexts. C3 took place as an in-person workshop with 16 learners located in Berlin, Germany who ranged in age from 14–17 years old (8 female, 8 male).
Given the culturally-specific elements of the narratives learnersourced from C1 and C2, C3 explored how this content would be received by a learner population with different demographic characteristics, based in a different geographical location, who potentially may therefore share minimal cultural common ground with the original content creators. Encouragingly, high engagement levels and knowledge gains for C3 participants suggested that users from diverse cultural backgrounds could indeed form a dynamic learning community on our narrative-based learnersourcing platform.

3.4 Final technical implementation

After concluding our preliminary phase of case studies, we made a final batch of technical improvements to the platform to enhance various aspects of the user interface and to instrument the system with logging for our evaluation study. For instance, we improved graph view animations11, improved our custom web-app peer-to-peer messaging service, and integrated IdleTimer12 for detailed engagement tracking [32] through mouse, keystroke, and React DOM monitoring [67]. We performed rounds of internal bug testing and implemented fixes accordingly as part of finalizing the system and moving from the “prototype” to “production” phase of our design process. More fine-grained technical implementation details about the system can be found in Appendix section A.2.
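To give a sense of what this instrumentation entails, the sketch below shows a minimal browser-side idle tracker built directly on DOM events. It is a simplified stand-in for the IdleTimer library (whose API we do not reproduce), and the 60-second threshold is an assumption for illustration.

```typescript
// Minimal idle detection: mouse and key events reset a timer, and engagement
// time is only accumulated while the user is considered active. This is a
// simplified stand-in for the library-based tracking used on the platform.

const IDLE_TIMEOUT_MS = 60_000;  // assume a learner is idle after 60 s without input
let idleHandle: number | undefined;
let activeSince = Date.now();    // start of the current active stretch
let lastEventAt = Date.now();    // time of the most recent user input
let activeMs = 0;                // accumulated active engagement time

function goIdle(): void {
  // Credit engagement only up to the last observed input event.
  activeMs += lastEventAt - activeSince;
  idleHandle = undefined;
  console.log(`Idle; active time so far: ${Math.round(activeMs / 1000)} s`);
}

function onUserActivity(): void {
  lastEventAt = Date.now();
  if (idleHandle === undefined) {
    activeSince = lastEventAt;   // waking from idle starts a new active stretch
  } else {
    window.clearTimeout(idleHandle);
  }
  idleHandle = window.setTimeout(goIdle, IDLE_TIMEOUT_MS);
}

for (const eventName of ["mousemove", "mousedown", "keydown", "scroll"]) {
  window.addEventListener(eventName, onUserActivity, { passive: true });
}
onUserActivity();
```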

4 Evaluation

Having built and refined the platform (named Apricot Stone City for reasons described in section 3.3.2) based on insights from our iterative design process, we next evaluated its ability to promote learning, engagement, and generally positive user experiences.

4.1 Participants and procedures

Participants were recruited through TUMO’s existing channels including email lists as well as through referrals from a former TUMO instructor. We also utilized snowball sampling [84], encouraging people to share our invite with interested acquaintances.
In designing for a target demographic, we focused primarily on high school aged learners, tailoring our AI curriculum and teaching strategies to this age group [86]. However, from the beginning we were sensitive to concerns from TUMO that, while a target demographic is important, their web-based extracurricular programs include popular offerings that are open access. They found that these are typically utilized by high school aged learners as well as (to a lesser extent) middle school aged learners and learners of college age and above. Therefore, to align with our partner’s vision of broadly accessible extracurricular educational offerings and to support inclusive participation among anyone aiming to gain AI literacy, we did not employ age-related exclusion criteria.
Our final sample consisted of N=27 participants representative of the learner community that TUMO serves. 14 participants were high school aged (14–18 years old), 11 were college aged (19–22 years old), and the remaining 2 participants were 23+ years old. All participants had at some point been affiliated with TUMO.
In addition to giving participants access to the Apricot Stone City platform as described in section 3 and associated resources (e.g., videos, user manual), we also created a Discord chat group where learners could interact with each other for peer-based Q&A, collaborative coordination, and general social engagement. The study ran for 1 week, similar to case study 3, as we were interested in exploring the minimum viable timeframe by which positive effects could be observed with respect to knowledge gains, learner attitudes, and sense of community, among other outcome variables described next. The 1-week timeframe is also representative of many of the most popular extracurricular programs offered by TUMO. Procedures were reviewed by the IRB at Dartmouth College.

4.2 Data collection

4.2.1 Exams, quizzes, and other learning assessments.

We measured learning gains through a comprehensive multiple choice AI knowledge exam administered pre-study and post-study. These exams were isomorphic. Within the platform, post-scene AI knowledge assessments also served as a test of understanding for the AI topic covered in that specific scene. The exam and post-scene assessment questions were created based on the learning objectives and “Unpacked” sections of the AI4K12 framework [2]. The Appendix (see A.3) provides questions from exams and post-scene assessments.
We created the exam, while post-scene questions were created by learners in the architect and facilitator roles, using reference materials we provided. During learner assessments and exams, question ordering and multiple choice response ordering were randomized to minimize order effect bias [66, 133]. To discourage random guessing, we included an "I don’t know" option [112] and emphasized that scores would be anonymized and that answering honestly would help us evaluate and improve our system. Pre-study and post-study exams included thirty questions (three questions for each of the ten AI topics covered13). Post-scene assessments had five questions.
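Order randomization of this kind is typically implemented with a Fisher-Yates shuffle; the TypeScript sketch below illustrates the general technique and is not the platform's actual assessment code.

```typescript
// Fisher-Yates shuffle: an unbiased way to randomize question and answer ordering.

function shuffled<T>(items: readonly T[]): T[] {
  const result = [...items];
  for (let i = result.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [result[i], result[j]] = [result[j], result[i]];
  }
  return result;
}

// Example: present a question's options (including "I don't know") in random order.
const options = ["A weight update", "A new dataset", "A different model", "I don't know"];
const presented = shuffled(options);
```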
Once learners progress to more advanced levels on the platform, they transition from consuming content to creating it, including quizzes. We therefore also assessed those learners’ knowledge based on our expert review of their created content, using a rubric with metrics for the correctness and the helpfulness of content. Specifically, our rubric is based on Denny et al.’s metrics for comparing the quality of teaching resources generated by an LLM with resources created by students as part of a learnersourcing activity [27].

4.2.2 Automatically tracked measures.

The platform logs a variety of data, including timestamps of all interactions, visits to platform pages and clicks on specific content, scene edits, and completion of quests and levels. Specifically, we calculate affective, behavioral, and cognitive markers of learner engagement [52] as follows.
| Group | Mean | Median | Std Dev | Min | Max | S-W statistic | S-W p-value |
| --- | --- | --- | --- | --- | --- | --- | --- |
| All learners | 38% | 37% | 23% | 0% | 97% | 0.96 | .35 |
| All learners excluding outliers | 36% | 43% | 22% | 7% | 73% | 0.98 | .94 |
| All female learners excluding outliers | 38% | 43% | 12% | 7% | 50% | 0.76 | .01 |
| All male learners excluding outliers | 39% | 30% | 18% | 17% | 73% | 0.89 | .13 |
Table 1: Baseline AI knowledge, as measured by the pre-study AI knowledge exam.
Sessions: Session logs track active user engagement in milliseconds and sign in/sign out events. To ensure we are only recording engagement when the user is truly active, we use IdleTimer, which tracks mouse and key events. Session logs also record what component in the UI is being viewed (e.g., "adventure view," "story graph view," etc.)
Interaction with content: We record all story content (text, images) served to the user and all responses by the user.
Content changes: We log all content pushed by users and require users to specify the type of change being made as either a "bug fix" or a "feature." We also survey users to self-report the time spent working on content that they push.
Social interaction: We log all communication on the platform between users, using the on-platform messaging capabilities. We also log requests to the facilitators to fix bugs or review content as well as facilitator replies.
Attentiveness to content: We operationalize attentiveness as the amount of time spent on a given story scene, normalized (divided) by the number of words in that scene (see the sketch after this list).
Scenes and topics completed: A timestamped record of all the scenes and AI topics that a user has completed.
Progress through levels: A timestamped event record of each time a user advances to a new level.
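As referenced in the attentiveness item above, the normalization is simple; a minimal sketch with illustrative names follows.

```typescript
// Attentiveness: active time on a scene normalized by the scene's length in words.

function wordCount(sceneText: string): number {
  return sceneText.trim().split(/\s+/).filter(Boolean).length;
}

function attentiveness(activeMsOnScene: number, sceneText: string): number {
  return activeMsOnScene / Math.max(wordCount(sceneText), 1);  // milliseconds per word
}
```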

4.2.3 Surveys and self-reported data.

At the beginning of the study, before administering the knowledge assessment, we gave participants a questionnaire to gather personal attitudes about AI-related self-efficacy and sense of belongingness in computing [68, 95, 128], readiness to learn [50, 53, 135], and subjective enjoyment of reading and writing stories. We also administered a task-value belief scale, given that the perceived utility of a topic contributes to a learner’s interest in it [24, 70, 95].
A post-survey at the end of the study asked these same questions and also assessed user experience and perceived usability of the platform via the User Engagement Scale (UES-SF) [91] and the extended Unified Theory of Acceptance and Use of Technology (UTAUT2) [125]. We also asked participants to rate the perceived usefulness of specific levels and content in supporting their learning, asked for open-ended reactions to the story content, and asked about sense of belongingness in the Apricot Stone City learning community. Self-assessments delivered at the end of scenes and quests gathered additional self-reported data about self-perceived learning and emotional enjoyment of scenes as well as feedback about what users liked or would change about the scene. The Appendix (A.5, A.6) provides specific questions from these instruments.

4.2.4 Focus group interviews.

At the conclusion of the study, we conducted two focus groups (see our semi-structured interview guide in Appendix section A.7). Given we were interested in understanding the sense of community that participants perceived on the platform, we opted for focus group style interviews rather than individual interviews. Seven participants (3 female, 4 male) volunteered to take part in the interviews.
We explored each of our research questions by inviting participants to describe their perspectives on the favorite things they learned and what aspects of the platform experience facilitated or hampered their learning, how and why their interest and engagement in AI improved or diminished during the study, their likelihood of taking AI or computer science classes in the future, their sense of community on the platform, their reactions around the story-based scaffolding, and factors surrounding their intentions to continue or discontinue using the platform after the conclusion of the study.
Interviews were transcribed and qualitatively analyzed together with open-ended self-report data. Specifically, two authors used inductive coding to surface themes and insights, which we use to contextualize the quantitative results reported in the next section.

5 Results

In this section, we present results from our evaluation of the Apricot Stone City system. Main findings are framed around our research questions. We also describe other notable insights, including gender differences as well as results that speak to our platform’s scalability.

5.1 RQ1: Can a narrative-based learnersourcing platform increase knowledge of AI concepts?

Foremost, we are interested in examining whether a narrative-based learnersourcing approach can promote learning and increase knowledge of AI concepts.

5.1.1 Baseline knowledge levels.

First, we look at learners’ baseline knowledge of the AI topics covered on the platform (see Table 1). A Shapiro-Wilk (S-W) test [103] suggests the data is drawn from a normal distribution. No significant differences are seen in pre-assessment scores related to gender (Cohen’s d = 0.131, p = .775).

5.1.2 Changes in AI knowledge.

Between the pre-study and post-study AI knowledge assessments, we observe a significant mean increase of 24.2% (see Table 2). Regarding outliers, three learners scored exceptionally low on the pre-exam (<3.33%, which is 2 sigma below the mean), and two learners scored exceptionally well (>80.33%, which is 2 sigma above the mean). Highlighting and excluding outliers is useful to understand if aggregate results are skewed by the outcomes of learners who scored near zero or near perfect on the pre-assessment. For consistency, any "excluding outliers" notes (in Tables 1 and 2, as well as later results) refer to this same set of outlier participants.
| Group | Mean pre to post change | Median pre to post change | Cohen's d |
| --- | --- | --- | --- |
| All learners | +24.20% *** | +23.33% *** | 1.08 |
| All learners excluding outliers | +23.18% *** | +22.67% *** | 1.28 |
| All female learners excluding outliers | +29.67% *** | +28.33% *** | 2.21 |
| All male learners excluding outliers | +17.78% | +15.00% | 0.86 |
Table 2: Percentage change in AI knowledge, measured by performance on pre- to post-study isomorphic AI knowledge exams. (*p < .05, **p < .01, ***p < .001).
| Group | Mean | Median | Std Dev | Min | Max | S-W statistic | S-W p-value |
| --- | --- | --- | --- | --- | --- | --- | --- |
| All learners pre exam | 3.30 | 3 | 2.8 | 0 | 10 | 0.92 | .03 |
| All learners post exam | 6.33 | 7 | 2.87 | 1 | 10 | 0.92 | .03 |
| All learners pre exam excluding outliers | 3.18 | 3 | 2.11 | 0 | 8 | 0.96 | .45 |
| All learners post exam excluding outliers | 6.27 | 6.5 | 2.76 | 2 | 10 | 0.91 | .047 |
Table 3: Concept mastery, as measured by AI knowledge exams.
Running analyses without these outliers shows similar results (a significant mean increase of 23.18%). High scoring outliers did improve on average too (mean 91.67% on the post-assessment, compared to a mean of 90% on the pre-assessment). These results suggest that learners with a wide range of prior experience levels could benefit from participating in our learnersourcing platform. Further, we do not observe statistically significant differences when splitting participants by race, nationality, or age, suggesting our platform may support AI learning across diverse users.
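For readers interested in the mechanics of this analysis, the sketch below illustrates the mean change, effect size, and two-standard-deviation outlier rule described above. It uses one common formulation of Cohen's d (mean difference over the pooled standard deviation) and is an illustration of the standard formulas, not our exact analysis script.

```typescript
// Illustrative computation of the pre/post analysis reported above.

function mean(xs: number[]): number {
  return xs.reduce((a, b) => a + b, 0) / xs.length;
}

function stdDev(xs: number[]): number {
  const m = mean(xs);
  return Math.sqrt(xs.reduce((a, x) => a + (x - m) ** 2, 0) / (xs.length - 1));
}

// One common formulation of Cohen's d for pre/post scores of the same group.
function cohensD(pre: number[], post: number[]): number {
  const pooled = Math.sqrt((stdDev(pre) ** 2 + stdDev(post) ** 2) / 2);
  return (mean(post) - mean(pre)) / pooled;
}

// Indices of learners whose pre-exam score lies more than 2 standard deviations
// from the mean, in either direction.
function outlierIndices(pre: number[]): number[] {
  const m = mean(pre);
  const s = stdDev(pre);
  return pre
    .map((score, i) => ({ score, i }))
    .filter(({ score }) => Math.abs(score - m) > 2 * s)
    .map(({ i }) => i);
}
```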
To examine our assumption that content creation is reflective of knowledge gains, we next separate learners into two groups, those who created content and those who only consumed content. T-test comparison does show a significant difference (p = .03, Cohen’s d = 1.07), with the content creation group having a larger positive increase in knowledge (mean: 31%, median: 30%, std dev: 19%). Regarding attitudinal factors that may impact knowledge gains, we observe a significant correlation between pre-study self-efficacy and pre to post knowledge assessment scores (r = 0.56, p = .04).
In addition, we are curious to understand if learners could reliably self-assess their own learning. Comparing objectively tested and self-reported assessments of AI knowledge post-study, we see a significant negative correlation (r = -0.43, p = .045). This result suggests a potential Dunning-Kruger effect, whereby poor performers overestimate their knowledge and high performers underestimate their knowledge [75]. This phenomenon has similarly been observed in work on narrative-centered, game-based learning environments [89]. Such findings indicate the importance of objective, expert-made assessments to reliably gauge learning outcomes versus relying solely on learner self-reports.

5.1.3 Learning specific AI concepts.

To understand how well learners mastered specific concepts, we next group knowledge assessment questions by topic (e.g., Finding patterns in data, Structure of a neural network, etc. for the 10 AI topics the platform covers [2]). Considering a learner to have sufficient mastery of a topic if she could correctly answer at least two of its three assessment questions, we observe a significant increase in concept mastery from the pre to the post assessments (Cohen’s d = 1.09, p < .001) — specifically, a mean increase of 3.04 (median: 3.0, std dev: 2.72), with pre- and post-study concept mastery results shown in Table 3.
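A minimal sketch of this mastery criterion, with illustrative names and shapes, follows.

```typescript
// Concept mastery: a topic counts as mastered when at least two of its three
// assessment questions are answered correctly.

interface AnsweredQuestion {
  topic: string;    // e.g., "Finding patterns in data"
  correct: boolean;
}

function masteredTopics(answers: AnsweredQuestion[]): string[] {
  const correctByTopic = new Map<string, number>();
  for (const a of answers) {
    if (a.correct) {
      correctByTopic.set(a.topic, (correctByTopic.get(a.topic) ?? 0) + 1);
    }
  }
  return [...correctByTopic.entries()]
    .filter(([, correctCount]) => correctCount >= 2)
    .map(([topic]) => topic);
}
```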
We also consider the connection between knowledge gains and progress through Bloom’s levels by conducting an ordered logistic regression between the difference in pre to post exams and the max level completed by a learner. We find a statistically significant relationship with a coefficient of 3.05 and p = .04. This relationship is to be expected, so confirming it adds to our confidence in the correctness of these two independent measures of learning.

5.2 RQ2: Can a narrative-based learnersourcing platform promote interest and engagement with learning AI concepts in an extracurricular environment?

While promoting knowledge gains is a central goal of our work, it is also important to understand learner interest and engagement during the experience, as both are core factors in sustaining participation in learning activities and the mastery of increasingly complex topics over time [51].

5.2.1 Interest in AI concepts.

Learner interest is a four-phase construct spanning triggered situational interest, maintained situational interest, emerging individual interest, and well-developed individual interest [51].
Early and ongoing interest is signaled by time spent on learning activities. On our platform, learners spent a cumulative 674.2 hours engaging with content. Of all available AI subtopics, learners spent the most time on the most advanced topics — specifically, neural networks (e.g., “Adjusting internal representations”, “Weight adjustment,” and “Structure of a neural network") — whereas introductory AI concepts (e.g., “Finding patterns in data”) saw the least amount of time. In post-study focus group interviews, learners said that they preferred to engage with the topics they were least knowledgeable about out of interest and curiosity.
Further, in moving from content consumers to content creators, we see that learners generally returned to scenes they had rated favorably as consumers, extending or creating alternative narrative paths for the scenes and quests that had originally captured their interest. Cumulatively, learners spent the most time creating scenes for quests related to the advanced AI concept “Adjusting internal representations” (115.48 hours).

5.2.2 Engagement with the learning process.

In the context of education, engagement is commonly characterized in terms of three dimensions: affective, behavioral, and cognitive [42].
Baseline factor | Regression coefficient | p-value
Self-efficacy | 1.66 | <.001
Readiness to learn | -1.14 | .046
Pre-study self-assessment of CS and AI topic comprehension | -0.12 | <.001
Table 4: Least squares regression to test how learner baseline factors (self-efficacy, readiness to learn, and a detailed self-assessment of AI comprehension) impact knowledge gains, as measured by pre to post exam scores.
Figure 6: Learners who could successfully make the transition from explorer (level 1) to architect (level 2) tended to then successfully progress through the remainder of the learning journey, eventually becoming facilitators (level 4).
Affective aspects of engagement relate to a learner’s emotional reactions to the learning activities. The scene that received the highest emotional enjoyment rating (mean: 4.95/6, std dev: 0.92) is called “Carol, Ani, and Anoush have dinner together.” In this scene, a “foreign visitor” to Apricot Stone City (whose name is Carol) joins a local woman (Ani) and her elderly mother (Anoush) for dinner. Learners’ feedback consisted largely of positive comments about the cultural content of the scene — for instance: “Story content was really pleasing as it showcased some typical traditional Armenian cultural aspects” (P7) and “That’s a nice change of scene and a great relief after a lot of learning” (P26).
Our main behavioral indicator of engagement is attentiveness to content. Given our attentiveness findings demonstrate gender differences, we report them in section 5.4 with other such results.
Finally, cognitive engagement is operationalized by completion of learning activities. Specifically, Figure 6 illustrates the distribution of max levels reached by participants during the study period. It is important to note that moving from level 1 (story explorer) to level 2 (story architect) entails transitioning from content consumption to creation. The observed drop-off suggests that learners found this challenging. By comparison, the bump at level 4 (system facilitator) suggests that most learners who were able to successfully make the leap from consumers to creators were then motivated and capable enough to "go all the way". Further, many of these learners had little to no prior coding experience, suggesting our system is able to successfully transition novice learners into roles involving progressively greater concept mastery and platform responsibility.
For most participants, this study was their first time using software development tools like GitHub, VSCode, and React. Based on our analysis of interview data, feedback from the post-study survey, and observations of peer-based troubleshooting via the Discord server, we can see that getting up to speed with these new tools was a challenge for many learners and a source of pride for those who succeeded. Without this peer support, learners were at risk of stalling in their progress or even leaving the platform. Such findings highlight future opportunities for needfinding to understand other reasons for drop-off and strategies to minimize it.

5.2.3 Personal attitudes and beliefs of learners.

Self-efficacy relates to a person’s belief in her ability to master a task [68, 95, 128]; and readiness to learn refers to the behavioral, cognitive, and socio-emotional skills that indicate preparedness to receive instruction [80]. Given both are important factors in learning, we analyze how these personal attitudes may relate to the amount of interest and engagement participants demonstrated on our platform, as well as whether the learning experiences afforded by our platform can help to positively shape these attitudes.
We conducted a least squares regression to see how baseline levels of these variables affected knowledge gains, as summarized in Table 4. The significant findings suggest that learners with low attitudinal scores (who might therefore typically struggle in both traditional and online learning contexts [10, 22]) actually tended to learn more on our platform. We suspect that wrapping the AI concepts in approachable, familiar stories makes the learning process more accessible for such learners. Including an additional factor, personal interest in stories, results in a regression coefficient of 5.70 (p = .26). While not statistically significant, the large coefficient hints that narrative-based learnersourcing may be more powerful for learners who are already interested in storytelling.
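For readers unfamiliar with the underlying computation, the sketch below fits a single-predictor least squares model relating one baseline factor to knowledge gains. The actual analysis included multiple baseline factors (see Table 4); this simplified version, with invented values, is for illustration only.

```typescript
// Single-predictor ordinary least squares: slope = cov(x, y) / var(x).
function olsFit(x: number[], y: number[]): { slope: number; intercept: number } {
  const n = x.length;
  const meanX = x.reduce((a, b) => a + b, 0) / n;
  const meanY = y.reduce((a, b) => a + b, 0) / n;
  let sxy = 0, sxx = 0;
  for (let i = 0; i < n; i++) {
    sxy += (x[i] - meanX) * (y[i] - meanY);
    sxx += (x[i] - meanX) ** 2;
  }
  const slope = sxy / sxx;
  return { slope, intercept: meanY - slope * meanX };
}

// Illustrative use: baseline self-efficacy (x) vs. pre-to-post exam gain (y), made-up values.
const fit = olsFit([0.30, 0.45, 0.55, 0.70], [0.10, 0.20, 0.28, 0.40]);
console.log(fit.slope.toFixed(2), fit.intercept.toFixed(2));
```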
In addition, we see evidence that the learning experience on the platform can improve personal attitudes and beliefs over time. Specifically, analyzing changes from pre to post responses on attitudinal measures, we find that self-efficacy increased from a median of 10/30 to 21/30 (a 110% change). Readiness to learn remained stable at a median of 16/30, although we believe this is due to the nature of the questions in the online learning readiness instrument [135] we employed (e.g., "I am good at setting goals and deadlines for myself", "I am relatively good at using the computer", etc.), which may have been less salient or shiftable during this study experience.

5.3 RQ3: What is the user experience of a narrative-based learnersourcing platform?

To understand the user experience (UX) of our platform, we analyzed UX questionnaires, self-reported sense of community, and how participants reacted to narrative-based learning content.
Figure 7: UTAUT2 assessment of user experience based on Performance Expectancy (PE), Effort Expectancy (EE), Social Influence (SI), Facilitating Conditions (FC), Hedonic Motivation (HM), Habit (HA), and Behavioral Intention (BI).

5.3.1 Semi-quantitative UX measures.

To measure how absorbed participants felt when using the system and how rewarding they found those interactions to be, we used questions adapted from the User Engagement Scale Short Form (UES-SF) [91], with scores on a total scale of 3–15. Our analysis indicated a positive user experience on the whole (mean: 11.2, median: 11.0, std dev: 2.5, min: 6, max: 15).
The UTAUT2 [125] is another well-established instrument that helps evaluate additional UX dimensions. Figure 7 illustrates participants’ generally positive responses. We note that the social influence construct demonstrated the highest ratings, suggesting the platform’s ability to promote peer awareness and connection even within the study’s relatively short timeframe.

5.3.2 Sense of community.

Further examining social aspects of user experience, we additionally surveyed participants after the study to measure their sense of community. On a scale of 3–9, results again point to a reasonably strong sense of community (mean: 6.6, median: 7.0, std dev: 1.8, min: 3, max: 9), though not overwhelmingly so. This can likely be attributed to the relatively short study timeframe, and we expect these community ties would deepen over longer periods of engagement on the platform.

5.3.3 Experience with the narratives.

As mentioned in section 5.2.2, reactions to the story content seemed to play a main role in participants’ emotional engagement with the system. Our inductive coding of qualitative data demonstrated that participants consistently felt most confident with AI concepts that were conveyed via analogies built into the story (as reported by P1, P2, P4, P5, P9, P11, P12, P14, P15, P19, P22, and P26). Learners applauded the use of "metaphor," "analogy," "story examples," and "the process of learning new things by the story." Participants rarely discussed AI concepts in their feedback outside of references to the storyline and story explanations. Our questions around the use of narrative as a tool for AI education overwhelmingly received positive responses.
Although not mentioned as consistently, several participants also expressed excitement about the inclusion of Armenian history and culture in the story. This reflected a trend where scenes with Armenian characters, settings, and subject matter perceived as culturally authentic also received high enjoyment and learning scores. That said, Armenian references that wrapped AI content in cultural associations did not always land well with learners. For example, in one scene, a character visits Tsitsernakaberd, the memorial above Yerevan dedicated to the 1.5 million Armenians who perished in the Armenian Genocide. Here, one participant criticized that the location was only superficially discussed, and similarly noted that comparing AI machines to characters with culturally significant roles could sometimes feel dehumanizing — “I didn’t like that Anoush is treated as a neural network” (P19). Both the author of this content and this reader self-identified as Armenian. Other Armenian learners did appreciate the same content though, highlighting how learner-made content covering sensitive cultural topics can be received very differently.
Beyond culturally centered storytelling, scenes that introduced new characters out of the blue or that adopted a distinct genre style (e.g., fan-fiction, horror, etc.) tended to receive poorer reviews overall. For example, a scene titled “Morgana explains a gradient” ventured into fan-fiction based on Knights of the Round Table lore, and it received the lowest emotional enjoyment rating of all platform content (mean: 2.33/6, std dev: 0.94). In post-study feedback from surveys and interviews, the majority of learners disliked these types of scenes because even when they contained “fun elements” (P1) or “relevant AI content” (P9), their tone felt out of place with the broader narrative to which these scenes were contributing — e.g., “It takes away from the beauty of the game with ties to Armenian culture” (P19). Such comments were expressed in similar measure by learners who identified as Armenian and non-Armenian.
At the same time, some learners very much enjoyed creating these styles of scenes and were not discouraged in doing so even if the scenes received little to no traffic on the platform. These findings again indicate the importance of designing narrative learnersourcing experiences that balance community preferences with opportunities for individual creative freedom.

5.4 Examining gender differences in learning, engagement, and experience

Across a number of our quantitative metrics and qualitative feedback, gender seemed to play a role in learning outcomes, engagement levels, and overall user experience.
Figure 8: Female learners had a greater improvement in mean score from pre to post AI knowledge exam.
Metric | Measured by | Female mean | Male mean | Female median | Male median | Female SD | Male SD | p-value (t-test) | Cohen’s d
Baseline CS knowledge | % of questions correct on pre-study knowledge assessment | 36% | 40% | 43% | 30% | 22% | 24% | .62 | 0.20
Baseline self-efficacy | Pre-study self-efficacy assessment, scaled to [0, 1] range | 0.53 | 0.51 | 0.53 | 0.55 | 0.09 | 0.12 | .53 | 0.25
Knowledge gains | Difference in % correct scores on pre to post knowledge assessments | 32% | 17% | 33% | 15% | 21% | 19% | .05 | 0.81
Attentiveness | Time spent per word (sec) | 2.8 | 0.02 | 1.4 | 0.002 | 4.4 | 0.03 | .04 | 0.87
Engagement | Time spent overall (hr) | 26.4 | 23.7 | 11.5 | 4.93 | 40.6 | 49.6 | .88 | 0.06
Table 5: Comparing female and male learners on assessed, self-reported, and logged metrics of learning and engagement.
First, we observed a pattern where female learners appear to have slightly higher learning gains compared to male counterparts. For this reason, we have split results for female and male participants in Tables 1 and 2. This difference is also illustrated by Figure 8. To better understand these trends, we analyze female and male learners in terms of various objectively assessed, self-reported, and automatically logged measures of knowledge and engagement, including to check for any statistically significant differences between these two sets of learners. Table 5 summarizes results.
We saw no significant gender differences in prior knowledge at baseline (as previously noted in section 5.1.1) or in self-efficacy at baseline, and effect sizes for both comparisons were negligible, suggesting that any observed differences in post-study assessments are attributable to experiences on the platform.
Figure 9: Female learners tended to have higher attentiveness scores compared to male learners.
We did observe noticeably higher female attentiveness, as illustrated in Figure 9. This may be due to male participants reading faster than female participants or male participants skimming some story content, although the literature on gender differences in text processing speed suggests the former interpretation is unlikely. Specifically, relevant work has shown that females outperform males in alphabet-related processing speed tasks as well as other reading and writing tasks [105]. It is also worth noting that our measure of engagement (time spent on the platform) shows a less pronounced gap between female and male learners (see Figure 10).
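To clarify the attentiveness metric used here (time spent per word, as listed in Table 5), the following sketch shows one way it could be computed from logged scene visits. The field names are illustrative and do not reflect the platform's actual event schema.

```typescript
// One logged visit to a story scene by a learner (illustrative shape).
interface SceneVisit {
  wordCount: number;       // number of words in the scene's story content
  secondsOnScene: number;  // dwell time recorded for this visit
}

// Attentiveness for one learner: total reading time divided by total words read (sec/word).
function attentiveness(visits: SceneVisit[]): number {
  const totalSeconds = visits.reduce((sum, v) => sum + v.secondsOnScene, 0);
  const totalWords = visits.reduce((sum, v) => sum + v.wordCount, 0);
  return totalWords > 0 ? totalSeconds / totalWords : 0;
}
```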
Based on our qualitative analysis, many male learners did consider the narrative interesting, though non-essential to the learning process. Some even saw it as a distraction. This is not to say these learners did not find the platform meaningful; on the contrary, UES-SF metrics for both genders were roughly comparable (see Figure 11). Rather, we take these findings as indicators that female and male learners used and benefited from the stories in different ways.
Specifically, female learners tended to self-identify with the story’s main and supporting characters (nearly all of whom were also female) and explored AI concepts by focusing on aspects of characterization. On the other hand, male learners did not self-identify much with the characters, but they were nonetheless interested in the narrative and focused more on the provided metaphors to make sense of AI concepts. To help contextualize these general observations, we look at two participants, P3 and P17, as vignettes.
Figure 10: Engagement time on the platform was relatively similar for female and male learners.
Figure 11: The user experience ratings by female and male learners were relatively similar.
P3 identified as female, was 17 years old, and lived in the US. She reported minimal CS experience and no connection to Armenia. She learned about the study through a future college classmate. After joining the study and engaging with the platform, P3 met many other learners who, like herself, were considering a STEM degree. The system’s community aspects were a major draw for her and kept her highly engaged. In her words, “Where I felt most strongly about community was seeing everyone else’s code and thinking, ‘Wow, this was created by multiple people, even if I didn’t meet those people.’"
P3 also noted that learning through stories and then creating stories herself made the learning process much more approachable and enjoyable. She also found pleasure in learning about a culture different from her own: "I like [the cultural and storytelling] aspects a lot. At least from my point of view, you don’t really hear a lot about Armenian culture and inside someone’s view of it. And I love learning a lot about different kinds of cultures.”
These stories also made the AI topics more legible to P3: “Being able to metaphorically make connections to the topics made learning the content much easier. I would be like, ‘I don’t get this,’ but then in the next sentence, I would be relating to something that Carol [a story character] would be talking about – and I’d be like, ‘Okay, I understand it!’ The statues scene was my favorite, when Carol was admiring the statues, pointing out different aspects, and stating what they meant to her, beyond just being ’stone’ or ’marble.’” Further, P3’s perspectives illustrated how narrative learnersourcing had a powerful impact on her attentiveness and self-efficacy, as stories felt gripping and relatable. We found P3’s experience to be representative of many other female participants’ reactions to the platform.
P17 self-identified as male, was 18 years old, lived in Greece, and reported significant prior CS experience from extracurricular courses. At first, he described the stories as more suitable for younger students, calling them “childish” and “a pointless thing to read.” At the same time, P17 very much enjoyed creating new story content for the platform. He found level 4 especially interesting: “Level 4 was interesting because there were the ethical concerns, and it was more for real life situations and problems that I might face. It was a great reminder that we need to think about those issues. They are very important.” P17 said he did not consider the community aspects of the platform to be personally very important to him, though he believed others might find that more meaningful. His perspectives, while skewed slightly more negative, showed similarities to many other male learners who we engaged with throughout the study, illustrating the need for systems like ours to support story creation, exposure to broader AI considerations such as ethics, and other features that resonate more deeply with users like P17.

5.5 Speaking to scalability of the platform

Much of our motivation in developing a learnersourcing platform is this approach’s promise in promoting a virtuous cycle of engagement, contribution, and maintenance that can enable the community to sustain itself over time as it grows to reach and benefit increasingly broad groups of learners. Here we examine important scalability challenges, including the cold start problem (having sufficient content when the platform is relatively new), issues of content quality, and how new users can become aware of the platform through organic, learner-driven referrals.

5.5.1 Tackling the cold start problem through human-AI collaboration.

When considering scalability, it is important to understand the early stages of a platform’s lifecycle, including how much content needs to be pre-seeded when the platform is first activated in order to provide a sufficient initial foundation on which early members can then build. For this study, we seeded the platform with learner-made content sourced through our three preliminary case studies, which are described in section 3.3.
A main difference between our case studies and the final evaluation study was that we did a considerable amount of hand-holding of learners in the case studies (precisely to get around problems like writing stories without prior content to build on), whereas we took a fully hands-off approach in the final study, letting learners act as facilitators for each other. Our experience throughout these phases indicates that creators of new instantiations of a narrative-based learnersourcing platform like Apricot Stone City will need to work with early adopters to draft initial content and/or produce it themselves. Once that ball was rolling, we saw that learners were enthusiastic about keeping the contributions coming.
To explicitly gauge such contributions made to our platform, we calculated the number of changes made by story architects (total number of scene contributions: 104, mean: 8, median: 6, min: 1, max: 20). The average word count per story contribution was 590, and the cumulative word count of all content on the platform was 88,636. These results represent a meaningful amount of work, particularly when considering that all story content was in English yet many of our users were English as an Additional Language (EAL) speakers.
This level of output was supported by learners leveraging ChatGPT as a writing collaborator, which permitted acceptable quality and more consistent content creation by learners who were mostly novices in both writing and AI. In our interviews, we found that learners appreciated this story consistency, while feeling it was important that stories were co-written by a human rather than simply automatically generated by large language models (LLMs) or other forms of generative AI. The use of such tools as part of collaborative human-AI writing activities thus may be a promising avenue to promote scalability and minimize cold start issues, while preserving the essential aspects of user-driven content creation that are inherent to learnersourcing as a pedagogical strategy.

5.5.2 Gauging quality of learnersourced content.

A key issue in crowd-powered systems is ensuring the quality of user-generated content. To assess the quality of learnersourced content on our platform, we performed an expert review of all learner-created story scenes. We used criteria for gauging correctness and helpfulness (both on a scale of 1-5) based on Denny et al.’s evaluation of learnersourced content [27], averaging our independent scores per metric. For content correctness, we found a mean of 3.91, median of 4, and standard deviation of 1.03. For content helpfulness, we found a mean of 3.38, median of 4, and standard deviation of 1.41. These results indicate that a majority of learners were able to produce content on our narrative-based platform that is at least "satisfactory" in terms of teaching quality. Future steps can build on prior research about content quality in crowdwork, learnersourcing, massive open online courses, and other similar systems, in order to design strategies to both monitor and enhance quality level on narrative-based learnersourcing platforms specifically.
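As a minimal sketch of this review procedure (with data shapes of our own rather than the actual review spreadsheet), the code below averages the raters' independent correctness and helpfulness scores per scene and then summarizes across all scenes.

```typescript
// One expert's 1-5 ratings for a single learner-made scene.
interface ExpertRating {
  correctness: number;
  helpfulness: number;
}

// ratingsPerScene[i] holds the independent ratings (one per expert) for scene i.
function summarizeQuality(ratingsPerScene: ExpertRating[][]): { correctness: number; helpfulness: number } {
  const perScene = ratingsPerScene.map((raters) => ({
    correctness: raters.reduce((s, r) => s + r.correctness, 0) / raters.length,
    helpfulness: raters.reduce((s, r) => s + r.helpfulness, 0) / raters.length,
  }));
  const mean = (xs: number[]) => xs.reduce((s, x) => s + x, 0) / xs.length;
  return {
    correctness: mean(perScene.map((r) => r.correctness)),
    helpfulness: mean(perScene.map((r) => r.helpfulness)),
  };
}
```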
Figure 12: Referrals made and accepted by users on our learnersourcing platform.

5.5.3 How learner referrals help scale and sustain the platform.

While we only analyzed data from our study cohort of N=27 recruited participants, we were interested in how much organic growth our platform could generate in just one week through unprompted, learner-driven word-of-mouth. During the course of the study, 105 users joined the platform and stayed active. (We consider users "active" if they successfully create an account and interact with at least one scene a day for three days. A referral is considered "accepted" only if the recipient becomes an active user).
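The sketch below makes these definitions concrete: a user counts as active after scene interactions on at least three distinct days, and a referral counts as accepted only once the referred user becomes active. The data shapes, and the simplification to distinct (rather than consecutive) days, are ours and not the platform's actual schema.

```typescript
// Simplified activity record for one user (illustrative shape only).
interface UserActivity {
  userId: string;
  referredBy?: string;             // userId of the referring user, if any
  sceneInteractionDays: string[];  // ISO dates on which the user interacted with at least one scene
}

// Active: the account exists and scenes were interacted with on at least three distinct days.
const isActive = (u: UserActivity): boolean => new Set(u.sceneInteractionDays).size >= 3;

// Count accepted referrals per referrer (a referral is accepted only if the recipient is active).
function acceptedReferrals(users: UserActivity[]): Map<string, number> {
  const accepted = new Map<string, number>();
  for (const u of users) {
    if (u.referredBy && isActive(u)) {
      accepted.set(u.referredBy, (accepted.get(u.referredBy) ?? 0) + 1);
    }
  }
  return accepted;
}
```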
Most of this userbase discovered our platform through referrals from friends or acquaintances. Figure 12 illustrates the flow of referrals accepted by new users. Each distinct arc on the circle’s circumference denotes one user, with the total circumference representing all 105 users active on the platform. The length of a user’s arc along the circumference represents the number of referrals that user made. Connections between the arcs inside the circle depict which users accepted which referrals.
Analyzing the referrals, we observed that new users tend to join in groups linked to a specific referrer, with two users (P21 and P25) driving over half of all accepted referrals. Identifying and studying these highly motivated referrers would be a valuable next step toward understanding strategies for reproducing this effect. Surprisingly, the main distinguishing factor of our largest "super referrer" was not prior knowledge or experience in computing but rather his enthusiasm and belief in the concept of narrative learnersourcing, which bodes well for the approach's appeal and potential for uptake.

6 Discussion

To promote learning experiences for AI literacy that are inclusive, immersive, and personally meaningful, our research explores a strategy that combines learnersourcing with the intuitive, engaging benefits of narrative-based learning. Specifically, our platform, Apricot Stone City, leverages culturally-specific metaphors to wrap AI concepts in an interactive narrative adventure.

6.1 Contributions and reflections

We followed an iterative, user-centered design approach spanning three formative studies with target users along with an extended deployment of the resulting system. These efforts also involved forming and maintaining a strong collaboration with our extracurricular program partner, TUMO, as part of connecting with learners to understand their needs, preferences, and experiences around novel learning environments.
Our specific contributions include:
An innovative narrative-based learnersourcing approach that uses stories as a vehicle for active, collaborative learning of AI concepts.
Instantiation of this design approach in a usable online platform, with features and functionality informed and refined through multiple rounds of learner-centered design.
Rich findings from a deployment study that demonstrate narrative-based learnersourcing can promote knowledge gains, topic interest, and positive learner experiences, including for novice learners and those from backgrounds traditionally underrepresented in computing and STEM fields.

6.1.1 Community partners and culturally-situated narratives.

Our research was done in partnership with TUMO, which provided supporting resources, far reaching access to study participants, and real-world extracurricular learning contexts. It would be worthwhile for future work to consider testing learnersourcing systems in partnerships with other types of organizations, such as smaller-scale extracurricular efforts, formal educational settings, and community-based organizations that engage with additional age ranges.
Given TUMO’s roots in Armenia and our initial case studies in Armenian contexts, culturally-situated storytelling became an important foundation of our approach to narrative-based learnersourcing. Our design process evolved to center cultural expression as a key aspect of the learning experience, with our formative design phases exploring cultural and cross-cultural factors through in-depth, direct engagement with learners and their communities. These findings align with work showing narratives in online communities can be amplifiers of participation, shared values, fulfillment, and emotional connection [37]. In engaging with such issues, our work spans sociology, pedagogy, and narratology. We hope our research inspires additional interdisciplinary scholarship at the intersections of HCI and broad humanistic and technical domains.

6.1.2 Narrative-based learnersourcing helps build inclusive learning communities at scale.

While this paper focused on the deep and extended human-centered design process to develop a narrative-based learnersourcing system and verify its preliminary effectiveness, an important next step is establishing the approach’s scalability. Our findings do indicate that users are able and motivated to learn from and build on prior learnersourced content.
We also saw promise in the organic growth of our platform’s userbase through word-of-mouth referrals. The emergent behavior we observed in our referral patterns aligns with prior HCI research on "super users" as exceptionally influential [134]. Such trends could also be an early indication of preferential attachment — a property where nodes in a graph acquire new links at a rate proportional to their existing degree. Colloquially called the "rich get richer" effect, this mechanism has contributed to the growth of large social networks and crowdsourcing projects, including Wikipedia [21, 54].
Taking a step back, we reflect on why scalability matters. In our study, we observed that scaling the platform results in not only more content quantity, but also higher content quality as more learners become facilitators. In turn, new and improved content kept learners engaged and motivated their outreach to new generations of users. Similarly, we observed that the sense of community flourished and persisted across the four learner cohorts (the 3 case studies plus the evaluation study) based on post-study interviews as well as on-platform interactions (e.g., helping each other debug code, learn GitHub, or troubleshoot the development environment).
Lastly, by conducting our case studies and final study in different learning settings, we observed that the platform could be effectively employed in a range of contexts. It would be desirable for future work to explore the value of narrative-based learnersourcing in a variety of other non-formal and formal learning environments [55].

6.1.3 Narrative as a tool for empowerment.

In addition to the positive outcomes we saw for learners across our study sample, we observed that narrative-based learnersourcing can be an effective means of promoting sustained engagement, AI mastery, and self-efficacy for young women specifically. Such findings are especially significant given women are traditionally underrepresented in computer science and strategies are needed to enhance girls’ interest and retention in the field [43, 46, 77].
Our work additionally responds to "adult-centrism" issues raised by Prilleltensky et al., who claim that most work interprets the realities of young people from the point of view of an adult, thus depriving young people of power [99]. When considering Prilleltensky’s terminology, our work illustrates how narrative-learnersourcing could empower young people by extending access and control of the "key dimensions of power" — namely, access to valued resources, opportunities for participation and self-determination, and opportunities for the development of competence and self-efficacy.

6.1.4 Entering the age of AI.

Finally, we situate our work in relation to AI literacy education as a movement. Prominent advocates of AI literacy curricula are not only interested in responding to but also shaping the trajectory of technological and social development. This position was articulated by Stefania Giannini, the UNESCO Assistant Director-General for Education, in her 2023 report on generative AI and the future of education:
“In our environment of AI acceleration and uncertainty, we need education systems that help our societies construct ideas about what AI is and should be, what we want to do with it, and where we want to construct guardrails and draw red lines. Too often we only ask how a new technology will change education. A more interesting question is: How will education shape our reception and steer the integration of new technology – both technology that is here today and technology that remains on the horizon?” [44].
We designed our system based on an understanding of the connectedness between these two questions. That is, we position learners as both consumers and creators of learning content, thereby supporting their ability to actively bridge gaps between these roles by employing conceptual and technical critical thinking and by leveraging the connective tissue of narratives.

6.2 Limitations and future work

In acknowledging limitations and associated opportunities for future work, we first note that learnersourcing systems have well-documented content quality challenges, such as inaccurate or unhelpful learner-made resources [96]. Our system’s design choices aimed to promote quality, for instance through learner-driven content evaluations (e.g., post scene ratings) and control mechanisms (e.g., code reviews and bug reports to facilitators). However, evaluating the relative effectiveness of each of these mechanisms was not the primary focus of this paper. Although our expert review indicated that learner-made content was generally correct and helpful (see section 5.5.2), more work is needed to unpack the many design questions related to quality control.
Next, it is worth pointing out that our system’s logging capabilities allowed us to reliably assess many on-platform behaviors; however, some sub-tasks involving content creation (e.g., writing code locally) were more difficult to track. During in-person case studies, our direct observation could overcome this issue, but we had to rely on self-reported data during our final evaluation study. Future development work could address such limitations by implementing more comprehensive tracking support to enhance scientific investigations, while staying sensitive to learner privacy, data security, and best practices around disabling tracking to avoid unnecessary surveillance of users.
Finally, we chose a 1-week period for our evaluation study given that duration reflected the standard length of many of TUMO’s extracurricular programs, plus we wanted to understand whether knowledge, attitudes, and user experiences could be meaningfully shaped in this relatively short timeframe. While we did find promising outcomes along these lines, it is critical to undertake more extended studies to investigate novelty effects, sustained engagement, and ultimate scalability of narrative-based learnersourcing.

7 Conclusion

This paper presented a pedagogically grounded, narrative-based learnersourcing approach to teaching AI, with a focus on extracurricular contexts. Following an iterative, user-centered design process involving three preliminary case studies in multi-cultural settings, we instantiated our approach in a web-based platform named Apricot Stone City. Evaluating our system in a deployment study with N=27 participants, we find that users experienced a significant increase in AI knowledge, showed positive shifts in self-efficacy over the course of the study, and were motivated to engage and re-engage with increasingly advanced AI concepts. Measures of user experience indicated that the narrative elements were a main contributor to meaningful engagement on both cognitive and emotional levels, and participants expressed a strong sense of community on the platform. Wrapping AI concepts into story content also made the learning experience more approachable for many users, particularly female learners and those with little prior experience with computer science. Based on our insights, we offer reflections on the value of community partners and culturally sensitive narratives in creating inclusive learning experiences, and we encourage the HCI community to see narratives as an empowering educational tool as we enter the age of AI.

Acknowledgments

We would like to thank the TUMO staff and our other project collaborators, and we express our deepest appreciation to the learners who participated in this research. We also acknowledge the support of the Dartmouth PhD Innovation Program.

A Appendix

A.1 Preliminary phases of system design

To iteratively design, implement, and gather feedback on our system, we undertook three case studies with learners in a variety of extracurricular programming formats offered by TUMO. Our first case study focused on exploring learnersourced creation of narratives, the second focused on supporting collaborative engagement, and the third focused on scalability and the potential for learnersourced narratives to be effective and engaging across different cultural contexts. Identifiers for case study participants use the format Cx-y (e.g., C1-1, C1-2, C1-3, C2-1, etc.) — see the table in A.8 for each participant’s ID, age, gender, and case study number.

A.1.1 Case study 1 (C1): Creating an initial prototype, developing narrative-based learning content, and exploring learnersourcing-based scaffolds.

With our first case study, we aimed to build out our initial prototype, develop narrative-based learning content, and explore what sorts of scaffolds were needed to help sufficiently guide learners with little to no prior programming or AI knowledge through a successful learnersourcing experience.
C1 was conducted as an online, 3-week workshop in partnership with TUMO, who assisted in recruiting N=18 learners who ranged in age from 12–17 (9 female, 9 male). All participants had little to no prior coding experience. The first author served as a live facilitator, who developed lesson plans about fundamental AI concepts including machine learning (supervised, unsupervised, and reinforcement algorithms), classification and prediction, model training, testing, internal representations, neural networks, feature sets, and bias. For each concept, a short lecture, educational handout, and activity were created. In prior years, the first author taught multiple other workshops on AI with TUMO and drew on this familiarity and past content. The second author assisted in facilitation and drew on her professional experiences with these AI topics.
The workshop began with an overview of narrative structure, storytelling techniques, and choose-your-own-adventure style games, where learners explored how a linear narrative could be translated into a graph structure with multiple different possible endings. Using this notion of a “story graph” to characterize a narrative as a set of nodes (story content) and edges (transitions among these story scenes and plot points), the authors introduced students to concepts of graph structure, first in plain language and then in terms of code.
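To make the story-graph idea concrete in code, the sketch below represents each scene as a node with outgoing choice edges; a single chain of nodes is a linear narrative, while multiple outgoing edges create branching endings. The type names and example content are ours, not the workshop's actual materials.

```typescript
// A story scene (node) with the choices (edges) that lead to other scenes.
interface StoryEdge {
  label: string;     // the choice shown to the reader
  targetId: string;  // id of the scene this choice leads to
}

interface StoryNode {
  id: string;
  text: string;          // the scene's story content
  choices: StoryEdge[];  // an empty list marks an ending
}

// A tiny branching story: one opening scene with two possible continuations.
const storyGraph = new Map<string, StoryNode>([
  ["arrival", {
    id: "arrival",
    text: "A visitor arrives in the city and must decide where to go first.",
    choices: [
      { label: "Visit the market", targetId: "market" },
      { label: "Climb to the fortress", targetId: "fortress" },
    ],
  }],
  ["market", { id: "market", text: "...", choices: [] }],
  ["fortress", { id: "fortress", text: "...", choices: [] }],
]);
```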
The remainder of the workshop focused on participants devising their own stories to convey AI concepts. Specifically, the first author would deliver a lecture on an AI concept and utilize the associated educational handout and activities. Participants were instructed to complete worksheets created by the authors to develop character profiles and map out plot arcs that weaved in the AI concept. This approach was based on Movement Oriented Design, which offers principles for developing educational multimedia narratives that are emotionally engaging and high quality [111]. As an example, after the session on k-means clustering, C1-9 wrote a story where a wizard challenges the main character to sort the items in a massive bag. In another story, a tourist photographs a series of statues during a day of sightseeing and writes their names on the back. To demonstrate concepts related to training data and classification, the story then follows the character as she visits a new part of town with different statues of the same historical figures and tries to guess their names based on her photographs.
Next, participants were tasked with implementing their narrative content as interactive functionality using our initial system prototype. We saw that participants, and particularly those with little prior coding experience, benefited from creating narrative-based content in stages (e.g., graph level summary, then wiki, then code), which resonates with literature on digital storytelling in education [126]. Participants were enthusiastic about the interactive stories they had made and displayed genuine delight with employing stories as a vehicle for learning. Notably, they actively shared the platform with their friends and family out of enthusiasm to show their work to others.
Regarding design implications for the narrative specifically, we found participants perceived the stories that blended fiction and personal experiences as the most interesting. Narratives that featured local contexts, such as places in or people from Yerevan were especially exciting to them. We did observe that participants often struggled to generate story content from scratch, so having prompts that invited them to source stories from everyday experiences was helpful. Further, participants appreciated that the story aspects of the learning experience pushed them in directions where they may have originally lacked self-confidence. That is, we observed that a number of learners who rated themselves at the start of the study as strong technically but weak in terms of storytelling appreciated the story aspects by the end of the study, and vice versa for students originally confident in storytelling but feeling less sure of their technical capabilities. Such findings reinforce ideas that narratives are an accessible, approachable vehicle for learning, including about concepts where learners initially feel intimidated.
After C1, our key open questions related to how learners might effectively collaborate and interact with one another on the platform, and whether it is viable for such a platform to enable not only virtual engagement from distributed learners but also in-person engagement from co-located sets of learners. We therefore undertook our next case study to explore these questions in preparation for our final evaluation.

A.1.2 Case study 2 (C2): Refining designs to provide opportunities for learners to collaborate and interact on the platform as well as engage in-person.

For C2, we recruited N=31 learners who ranged in age from 15–19 (21 female, 10 male). All participants had little to no prior coding experience. While C1 helped confirm that narrative-based learning content can be exciting for learners to utilize and create, it also exposed a need for additional scaffolding to help these learners better collaborate on these efforts. C2 therefore focused on these interpersonal considerations, along with whether our approach could remain inclusive to diverse extracurricular learning setups, particularly those with in-person components. Specifically, C2 was a 3-week workshop held in-person in Yerevan, Armenia and conducted in partnership with TUMO, who assisted in recruiting.
Before starting the workshop, we improved support for collaboration on our prototype through better ticket tracking and documentation in the story graph component (mimicking features and processes from popular Agile development tools like Jira). For instance, to manage synchronization between the wiki and codebase, we added a procedure for ownership tagging to the guidebook.
We found that participants were able to utilize all of these features with negligible confusion, which could be easily resolved by the facilitator or a peer. Further, we observed that learners appreciated having structured, systematic processes to follow, and that these processes did not inhibit their creativity. The need for content moderation did come up several times during the workshop, such as when participants added age-inappropriate language or other content. We approached this issue by designating two volunteers who had reached the role of facilitator as content moderators, charging them with reading through all updates to the wiki at the end of each day and bringing to our attention anything that they deemed problematic. If we requested changes, the moderators would create tickets accordingly; such roles could foreseeably be baked into the platform. In general, we responded to emerging issues like these by building out processes on the platform for responsibilities that learners could take up themselves, rather than by directly intervening, such as by changing content ourselves.

A.1.3 Case study 3 (C3): Investigating the scalability of the platform, particularly across cultural contexts, and whether students can build on each other’s contributions even over shorter engagement timeframes.

For the third case study (C3), we recruited N=16 learners who ranged in age from 14–17 (8 female, 8 male). All had little to no prior coding experience. C3 focused on whether users could in fact build on each other’s contributions to continue expanding and sustaining the platform, including when learners were from different cultural contexts. The fact that C2 took place in Yerevan, Armenia and that all C1 and C2 participants identified as Armenian shaped those workshops and the resulting content. Students centered their Armenian identity within the created narratives, including cultural history and points of Armenian pride.
Given this embedding of cultural identity into learners’ created content, C3 explored scalability to a different learner demographic, from a different geographical region, with potentially minimal cultural common ground. We were particularly motivated to examine this question given expressions of enthusiasm from C1 and C2 participants not only in developing content with cultural significance but also in having that content broadly shared. For example, C2-8 shared that in her favorite part of one story, the plot described Armenians as being known as “warm and hospitable people” and how that made her feel good knowing that “people from other countries will read about it.” She emphasized that this “made the project more exciting.” C2-2 felt strongly that she didn’t “want Armenia to only be associated with Kim Kardashian” but instead she wanted “it to be associated with the history, and the fun stuff too.” A few participants expressed that they were motivated by the cross-cultural possibilities of sharing the project with learners from other countries; C2-29 remarked it could be interesting for those learners who “might not get another opportunity to learn about Armenia.”
Further, while C1 and C2 lasted 3 weeks, we wanted to understand if learning benefits could be observed after a more brief experience on the platform. C3 was therefore run as a 1-week workshop in Berlin, Germany. Through an onboarding survey, we verified there was very little cultural awareness or connection to Armenia for C3 participants, despite the mutual affiliation with TUMO.
This workshop began with a review of the existing prototype. Specifically, after C3 participants engaged with the platform, they wrote what they felt worked well and not well on sticky notes, which we then grouped using affinity diagramming. From resulting clusters, we created a list of aspects to keep as-is, along with suggested additions and other changes, which we then collaboratively ranked by importance. We had conducted the same exercise at the end of C2 (i.e., on the same prototype that C3 participants assessed), and our comparison of the recommendations from the C2 and C3 participants revealed few differences. Both groups focused on improvements related to the desire for style consistency across narratives, more options on the wiki for architecting out story structure, and better ways to integrate teaching lessons into the narrative content. These results were encouraging, as they demonstrated a consistency between the two groups of learners.
Regarding the narrative, we were additionally pleased to see that the local cultural references were well-received by the C3 learners. In interviews, these participants reported that they were excited to learn more about and participate in stories about Armenia, with this content seen as “more intentional” and “less random”. These findings indicate that drawing stories from real-world, localized, and cultural experiences tended to result in more compelling narrative content, even for learners from outside those regions and cultures.
In C3, learners’ baseline exposure to computing was much lower compared to C1 and C2. Most participants in this group had never taken a computer science course, and they found the wiki and its associated processes much more intuitive than working directly in the codebase. That said, within two days, participants were able to contribute to the content and began addressing some of the areas of improvement they had identified in the initial review. Given C3’s condensed timeframe, we observed that learners focused on augmenting existing narrative content (e.g., fixing plot holes, making stylistic improvements, and developing small spin-off stories), demonstrating their ability to build on other users’ contributions and support a cycle of improvement and growth on the platform.

A.2 Technical details of the narrative-based learnersourcing platform

As mentioned, our system consists of three main components: the story adventure, the story graph, and the story infrastructure. Here we provide technical details about these components.

A.2.1 Story adventure component.

The story adventure component provides a web application to enable a broad base of learners to engage with story content. Specifically, the component supports state-dependent concurrent sessions, a chatbot-esque choose-your-own-adventure story experience (where choices can be either free response text or multiple choice), progress tracking, and an animated 3-dimensional graph-based story navigation system. These capabilities are enabled by three web views: an "adventure view", "graph view", and "progress view". A discussion of the backend can be found in A.2.2 on the story graph component.
Adventure view supports an infinite scroll of multimedia, links, and text messages with an interactive display that supports input via multiple choice selections and free text responses. Graph view displays each story scene with a representative picture (e.g., a church, if the scene relates to visiting a church) floating in three-dimensional space (projected using a force-graph calculation). The pictures are connected according to parent-child relationships between the scenes. A learner can navigate the interface with standard, mouse-based zoom, rotate, and translate controls (e.g., similar to Google Earth). The learner can select a scene by clicking on its picture to see an interactive pop-up summary. Using this summary, the learner can return to scenes in adventure view (provided they have already completed the scene). The progress view displays information about the learner’s learning progress. This includes progress toward completing level requirements and statistics related to story scene creation and consumption.

A.2.2 Story graph component.

The story graph component has two objectives: supporting the story adventure and enabling new content creation. The first is accomplished via an API exposed to the story adventure frontend views. Powering this API is a Node.js server and Firebase database that we collectively refer to as “the storyteller.” The storyteller is a finite-state automaton that maintains a graph representation of story content and a history of every learner’s “story state” as determined by his or her story interactions. It uses these, when provided with a learner’s latest interaction, to respond to “scene continuation” API requests. The second objective is a balancing act, as it involves keeping the barrier to content creation low for novice learners while supporting advanced learners’ technical creative expression. This is accomplished by dividing the content creation process into three discrete stages, each involving a progressively lower-level tool: the “graph view create GUI,” the “story wiki,” and the “storyteller content graph.”
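As a minimal sketch of the scene-continuation behavior described above, assuming a simple in-memory graph, advancing a learner's story state might look roughly like this; the types and function names are illustrative rather than the storyteller's actual API.

```typescript
// Minimal scene and per-learner story state, for illustration only.
interface Scene {
  id: string;
  text: string;
  choices: { label: string; targetId: string }[];
}

interface StoryState {
  currentSceneId: string;
  completedSceneIds: string[];
}

// Given the learner's latest interaction, return the next scene and the updated state.
function continueScene(
  graph: Map<string, Scene>,
  state: StoryState,
  interaction: string // multiple-choice label or normalized free-text response
): { scene: Scene; state: StoryState } {
  const current = graph.get(state.currentSceneId);
  if (!current) throw new Error(`Unknown scene: ${state.currentSceneId}`);
  const edge = current.choices.find((c) => c.label === interaction);
  if (!edge) return { scene: current, state }; // unrecognized input: re-present the current scene
  const next = graph.get(edge.targetId);
  if (!next) throw new Error(`Edge points to a missing scene: ${edge.targetId}`);
  return {
    scene: next,
    state: { currentSceneId: next.id, completedSceneIds: [...state.completedSceneIds, current.id] },
  };
}
```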
The "graph view creates GUI" is where learners initiate the process of creating or modifying a scene. This entry point is accessible to learners who have just completed level 1 because it shares the same web interface (as an unlockable “architect mode”). The GUI flow asks users to fill out a template describing the change they are making, then directs them to the guidebook (discussed next in A.2.3) for guidance on creating content and using the story wiki and the storyteller content graph.
The story wiki is a collection of Google Docs, where each doc corresponds to a particular scene and is used for drafting and testing story content. Each doc is drafted in two steps. First, a scene is drafted at a conceptual level, using scene and character archetype templates. This is provided to ChatGPT as a contextual pre-prompt, using a process outlined in the guidebook. The learner is then guided through fleshing out the template into a non-linear story script using exchanges with ChatGPT. Steps in prompt formulation are structured (e.g., the GPT-3 and GPT-4 models both included AI4ALL and "5 Big Ideas in AI" in their training data, so targeted references to this content by name generally result in reasonably well-crafted responses), though critical thinking is still required to check facts and keep narratives consistent. As story content is created, it is represented as nodes (using bullet points), with further-indented multiple choice options and rules-based free response interactions representing the edges as hyperlinks. Internal links to a particular node on a page are created with header refs. More complex interactions, such as updates to state variables, mini-games, or real-world activities, are simply described in plain text.
The storyteller content graph is where content is implemented as code. Once a content change has been staged on the wiki, a first-time architect is directed to clone the storyteller Node.js server from the GitHub repository and run it locally to test her changes. This is facilitated by the guidebook (via a video tutorial). While popular closed- and open-source options exist for finite-state automata, we chose to code the server from scratch to meet the particular integration demands and desired capabilities of our system. Specifically, our codebase compartmentalizes functionality into three modules: 1) the core state-based automata functionality, 2) state management, and 3) story content. The complete abstraction of this complexity via library functions designed as wrappers for our wiki content was the key to helping learners with little to no prior coding experience write code. Using these custom library functions, moving content from the wiki to the codebase became simple. We repeatedly observed that setting up VSCode was often the most challenging aspect of architect onboarding for complete novices. In particular, TypeScript types helped learners identify and debug errors in their content formatting pre-compilation. A custom set of pre-compilation tests we wrote (e.g., to ensure that every edge actually links to an existing node) was also useful for learners. For novice learners advancing to levels 3 and 4, our library provided simple approaches to the requirements of mini-game embedding and real-world engagement. For more advanced learners, representing story content as data objects wrapped in code (rather than as data in Firebase) had an advantage in allowing exposure to complexity on demand. For instance, the library enabled more advanced learners to access the full functionality of our finite-state automaton (e.g., to run NLP APIs on free text input using user state variables as a reference) using native TypeScript with full type support.
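To give a flavor of such pre-compilation checks, the sketch below (with simplified data shapes of our own, not the actual library's) verifies that every edge in a content graph links to an existing node.

```typescript
// Simplified content graph: each scene lists the ids of the scenes its choices lead to.
type ContentGraph = Record<string, { choiceTargets: string[] }>;

// Return a human-readable error for every edge that points to a scene that does not exist.
function findDanglingEdges(graph: ContentGraph): string[] {
  const errors: string[] = [];
  for (const [sceneId, scene] of Object.entries(graph)) {
    for (const target of scene.choiceTargets) {
      if (!(target in graph)) {
        errors.push(`Scene "${sceneId}" links to missing scene "${target}"`);
      }
    }
  }
  return errors; // an empty array means every edge links to an existing node
}
```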

A.2.3 Story infrastructure component.

The story infrastructure component is the underlying codebase and learner-driven design and maintenance processes that enable architects to create content that meets the needs of explorers. This component’s primary features are the frontend codebase, storyteller state management and core functionality, analytics view, and guidebook. We focus here on the analytics view and guidebook.
Analytics view is a page in the web app that becomes accessible once a learner completes level 4. It contains a dashboard with summary statistics about the community’s interaction activity, feedback, and quiz and assessment results on each story scene. Though we did not actually enable it for learners for privacy reasons, we do support tracking learners’ low-level actions. Our dashboard can display this data in aggregate assessments and time series views. To log events, we used popular Node packages like IdleTimer.
The guidebook is a Google Doc and accompanying lecture series we created involving six videos. The guide is organized into sections describing architect and facilitator roles in terms of objectives, processes, external resources, and interfaces that should be utilized to meet these goals. Any learner can suggest edits to add content to the guidebook or correct errors.

A.3 Example knowledge assessment questions

The following multiple choice questions are provided as examples of our isomorphic pre- and post-study knowledge assessments. The correct answer to each question is bolded and listed first.
Which algorithm/model from the following list would most likely be appropriate for separating a dataset into 5 clusters?
- K-means
- Decision tree classification
- Support vector machine
- Naive Bayes classification
- I don’t know
Which of the following is most suitable for classifying emails as either spam or not spam? (Assume that you already have a dataset of emails, with some that you know were marked by users as "spam".)
- Supervised learning
- Unsupervised learning
- Reinforcement learning
- None of the above
- I don’t know
What is the purpose of k-fold cross-validation?
- To assess performance of the model on unseen data
- To improve the quality of training data
- To label training data for supervised learning
- To mix training data with test data to improve performance
- I don’t know
Which of the following statements best describes overfitting in the context of machine learning?
- Overfitting occurs when a model performs well on the training data but fails to generalize to new, unseen data
- Overfitting is the result of a model being too simple, leading to poor performance on both training and test data
- Overfitting is a desirable outcome, as it indicates that a model has learned the underlying patterns in the data effectively
- Overfitting is a term used to describe the process of training a model on a diverse set of data to improve its overall performance
- I don’t know
How might data visualizations help in identifying potential biases in training data? (Select the most relevant response)
- By highlighting imbalances based on demographic variables
- By representing data in visually appealing ways
- By providing insights into feature correlations
- By visualizing training time vs. model accuracy
- I don’t know

A.4 Example learner-made quiz questions

The following multiple choice questions are provided as examples of post-scene quizzes made by learners on the platform. The correct answer to each question is bolded and listed first.
In the "practice giving a speech" example, Lia was like a reinforcement learning model and she gave a great speech - how did she learn to get better over time?
- By doing a smart form of trial and error and seeing what works best for her, then doing more of that thing
- By calculating patterns in data
- Through the analysis of labeled training data
- By following her teacher’s instructions on how to give a great speech
- I don’t know
In the cloud gazing scene, cloud formations represent the concept of supervised learning - why is that?
- Carol imagines that each cloud shape has an ideal label, like a bunny or a cat, and that with some examples she could train her friend to guess the shapes she sees
- Using a set of complicated logical rules in code, each cloud can be classified as a type of animal
- When she looks at the clouds they resemble a child learning through trial and error
- The interconnectedness of the clouds resembles a neural network
- I have no idea
You looked at metaphors for each neural network architecture - which one excels at processing images and extracting features?
- 2D convolutional network
- Recurrent network
- Generative adversarial network
- Feed-forward network
- I have no idea
Remember Carol’s interaction with the taxi driver? What part of training an AI model did her conversation and observations represent? (You’ll see small hints in the scene - go back if you need help)
- Model evaluation
- Feature selection
- Data preprocessing
- Model training
- Idk, take me back to the scene!

A.5 Pre- and post-study survey questions

This section provides questions from our pre- and post-study surveys. Options are bold, with information about the metric in italics.

A.5.1 Questions asked pre- and post-study.

Academic engagement and self-efficacy
Select how much you agree or disagree with the following statements. [Strongly disagree, Disagree, Neutral, Agree, Strongly agree] (Likert scale, range: [1, 5])
- I am motivated to learn about AI.
- If I wanted to, I could potentially do very well in computer science.
- I try to make connections between what I learn in different classes and experiences.
- I put a lot of effort into the work I do.
- Even when things are tough, I can perform quite well.
Self-efficacy
Select how much you agree or disagree with the following statements. [Strongly disagree, Disagree, Neutral, Agree, Strongly agree] (Likert scale, range: [1, 5])
- I’m confident I can understand the basic AI concepts taught in this study.
- I’m confident I can understand the most complex AI material presented in this study.
- I believe I can do an excellent job on the AI-related activities and evaluations in this study.
- I’m certain I can master the AI skills being taught in this study.
Task-value
Select how much you agree or disagree with the following statements. [Strongly disagree, Disagree, Neutral, Agree, Strongly agree] (Likert scale, range: [1, 5])
- I think I will be able to use what I learn in this study in other classes.
- It is important for me to learn the material in this study.
- I am very interested in the topics of this study.
- I think the material in this study is useful for me to learn.
- I like the subject matter of this study.
- Understanding the subject matter of this study is very important to me.
Interest value, utility value, attainment value
Select how much you agree or disagree with the following statements. [Strongly disagree, Disagree, Neutral, Agree, Strongly agree] (Likert scale, range: [1, 5])
- I like AI.
- AI is exciting to me.
- I am fascinated by AI.
- AI concepts are valuable to learn.
- Being good at AI will be important when I get a job or go to college.
- Being someone who is good at computer science is important to me.
Belongingness in CS, sense of social academic fit
Select how much you agree or disagree with the following statements. [Strongly disagree, Disagree, Neutral, Agree, Strongly agree] (Likert scale, range: [1, 5])
- I feel a sense of belonging to the AI and computer science community.
- I feel comfortable in computer science.
- I feel like an outsider in computer science.
- I identify as a computer scientist.
Readiness to learn
Select how much you agree or disagree with the following statements. [Strongly disagree, Disagree, Neutral, Agree, Strongly agree] (Likert scale, range: [1, 5])
- I am good at setting goals and deadlines for myself.
- I finish things I start.
- I do not quit just because things get difficult.
- I am relatively good at using the computer.
Storytelling enjoyment
Select the response you most agree with. [Not at all, A little, Quite a bit, Very much] (Likert scale, range: [1, 4])
- How much do you enjoy reading stories in general?
- How much would you enjoy reading stories as part of the study?
- How much do you enjoy writing stories in general?
- How much would you enjoy writing stories as part of the study?
- What is your favorite book? [Free response]
Prior experiences with CS, AI, and storytelling
Select the response you most agree with. [Yes, No]
- I have participated in a computing extracurricular activity before.
- I have taken a computer programming or AI class before.
- I have participated in a storytelling-related extracurricular activity before.
- I have taken a class before that involved reading and/or writing stories.
Self-reported knowledge of AI topics covered on the platform
Reviewing the following statements, rate your familiarity with the concepts discussed. [I have written code that relates to this, I could explain this concept to a friend, I am somewhat familiar with this concept, I could guess what this means, I don’t know what this means] (Likert scale, range: [1, 5])
- Define supervised, unsupervised, and reinforcement learning algorithms, and give examples of human learning that are similar to each algorithm.
- Model how machine learning constructs a reasoner for classification or prediction by adjusting the reasoner’s parameters (its internal representations).
- Use either a supervised or unsupervised learning algorithm to train a model on real-world data, then evaluate the results.
- Illustrate what happens during each of the steps required when using machine learning to construct a classifier or predictor.
- Describe how various types of machine learning algorithms learn by adjusting their internal representations.
- Select the appropriate type of machine learning algorithm (supervised, unsupervised, or reinforcement learning) to solve a reasoning problem.
- Train a multi-layer neural network using the backpropagation learning algorithm and describe how the weights of the neurons and the outputs of the hidden units change as a result of learning.
- Compare two real-world datasets in terms of the features they comprise and how those features are encoded.
- Evaluate a dataset used to train a real AI system by considering the size of the dataset, the way that the data were acquired and labeled, the storage required, and the estimated time to produce the dataset.
- Investigate imbalances in training data in terms of gender, age, ethnicity, or other demographic variables that could result in a biased model, by using a data visualization tool.
- How to use variables in a programming language
- What JavaScript is
- The difference between TypeScript and JavaScript
- Using conditionals, like “if”, in code
- Using loops, like “while” and “for”, in code
- Using a map/dictionary data structure in code
- Representing a graph using a data structure in code
- Using git to push and pull commits
- Using git to resolve a merge conflict
- Using Node.js
Peer comparison
- Compared to your peers, how would you rate your knowledge of computer science? [Much less knowledgeable, Somewhat less knowledgeable, Average, Somewhat more knowledgeable, Very knowledgeable] (Likert scale, range: [1, 5])
Demographics
- How many years old are you? [Number drop down]
- What is your gender? [Female, Male, Non-binary, Free response]
- What is your nationality? [Free response]
- What is your racial/ethnic identity? [Free response]
- Do you identify as Armenian? [No, Yes, Free response]
- Who (if anyone) referred you to this opportunity? [Free response]

A.5.2 Questions asked only post-study.

Reflections on the study
- How important was it to you that the story focused on Armenia? Please explain. [Free response]
- Did it matter that the story was/wasn’t relevant to your personal experiences? Please explain. [Free response]
- What originally motivated you to sign up for this study? What kept you motivated to keep participating? Please elaborate. [Free response]
- I would choose to participate in an AI learning community again. [Yes, No]
- I feel that I am a member of the Apricot Stone City learning community. [Yes, No]
- The Apricot Stone City learning community is supportive of me. [Yes, No]
- Did you feel like there was a sense of community during the study? [Yes, No] Please explain. [Free response]
User Engagement Scale Short Form (UES-SF) [91] scoped to questions on focused attention and reward factors. [Strongly disagree, Disagree, Neutral, Agree, Strongly agree] (Likert scale, range: [1, 5])
- I was absorbed in this experience.
- My experience was rewarding.
- I felt interested in this experience.
Extended Unified Theory of Acceptance and Use of Technology (UTAUT2) [125] scoped to questions on performance expectancy, effort expectancy, social influence, facilitating conditions, hedonic motivation, habit, and behavioral intention. [Strongly disagree, Disagree, Neutral, Agree, Strongly agree] (Likert scale, range: [1, 5])
- This experience helped me learn.
- Learning how to use the Apricot Stone City platform was easy for me.
- People who are important to me would want me to engage with the platform.
- I have the resources necessary to use the platform.
- Using the platform was enjoyable.
- Using the platform would become a habit for me.
- I would intend to continue using the platform in the future if it’s available.
Self-reported learning per level
Please rate the following activities in terms of how much you feel you learned by engaging with them. (If you didn’t engage with a level, mark N/A.)
[I didn’t find this at all valuable for learning, It was a little valuable for learning, It was very valuable for learning, N/A (I didn’t engage with this)] (Likert scale, range: [1, 3] or N/A value)
- Level 1 (using existing content on the platform)
- Level 2 (creating content)
- Level 3 (coding mini-games)
- Level 4 (bringing the concepts into the real world, considering AI ethics)
- After level 4 (supporting other content creators)
- How far did you progress on the site (e.g., what level)? What helped you get that far? What prevented you from getting farther? Please explain. [Free response]

A.6 Post-scene feedback questions

A learner receives the following questions to provide feedback after completing a story scene.
Enjoyment and self-reported learning
- How enjoyable was this scene? [Animated smiley emoticon Likert scale]14
- How would you rate your learning for this scene? [Animated smiley emoticon Likert scale]
- What did you like about this scene? [Free response]
- What would you change about the scene? [Free response]

A.7 Focus group interview guide

Our semi-structured focus group interviews used the following questions as a guide, centered around our research questions.
Questions related to RQ1
- Tell me about a time when you felt like you were learning something new and interesting on the platform.
- What were some of the favorite new things you learned this past week? It could be AI concepts, or how to use some aspect of GitHub – or anything else – whatever you enjoyed learning!
- Now, how about a time you felt confused or found something hard to understand?
- How did this experience compare to regular school and learning in a classroom with a teacher?
Questions related to RQ2
- How enthusiastic were you about AI / computer science before this experience? Has that interest gone up or down?
- Do you think you’ll take more AI or computer science classes in school or after-school programs if you have the chance? Why or why not?
Questions related to RQ3
- Tell me about some of the things you liked the most about the platform/your experience?
- Anything you really disliked or would want to change?
- How did you feel about the stories on Apricot Stone City?
- What was your favorite scene? Why?
- Were there any stories you really disliked? How come?
- Did you get to the point of writing story content? How was that experience?
- Did you feel like you were part of a community on Apricot Stone City? Tell me about that.
- How much did you interact with other learners on the platform?
- What encouraged you or discouraged you from interacting with others?
- Did you feel ownership of the platform content?
- How did the bugs in the platform content make you feel? (e.g., empowered to fix them? / did you notice them?)
Questions related to scalability
- If you had one more month to continue working on this platform, what would you like to see happen?
- Do you imagine you’ll keep using the platform once the official study ends? Why or why not? It’s OK if the answer is no – we want to understand your honest reactions!
- Would you refer the platform to a friend?
- Could you see the platform being used in a classroom context? At home?

A.8 Participant information table

Table 6:
Participant ID | Age | Gender | Group | Participant ID | Age | Gender | Group
C1-1 | 13 | F | C1 | C2-29 | 16 | M | C2
C1-2 | 14 | F | C1 | C2-30 | 17 | M | C2
C1-3 | 15 | F | C1 | C2-31 | 18 | M | C2
C1-4 | 15 | F | C1 | C3-1 | 15 | F | C3
C1-5 | 16 | F | C1 | C3-2 | 16 | F | C3
C1-6 | 16 | F | C1 | C3-3 | 16 | F | C3
C1-7 | 16 | F | C1 | C3-4 | 16 | F | C3
C1-8 | 16 | F | C1 | C3-5 | 16 | F | C3
C1-9 | 17 | F | C1 | C3-6 | 16 | F | C3
C1-10 | 12 | M | C1 | C3-7 | 17 | F | C3
C1-11 | 14 | M | C1 | C3-8 | 17 | F | C3
C1-12 | 14 | M | C1 | C3-9 | 14 | M | C3
C1-13 | 14 | M | C1 | C3-10 | 15 | M | C3
C1-14 | 16 | M | C1 | C3-11 | 15 | M | C3
C1-15 | 16 | M | C1 | C3-12 | 16 | M | C3
C1-16 | 16 | M | C1 | C3-13 | 16 | M | C3
C1-17 | 17 | M | C1 | C3-14 | 16 | M | C3
C1-18 | 17 | M | C1 | C3-15 | 17 | M | C3
C2-1 | 16 | F | C2 | C3-16 | 17 | M | C3
C2-2 | 16 | F | C2 | P1 | 14 | F | Study
C2-3 | 16 | F | C2 | P2 | 17 | F | Study
C2-4 | 16 | F | C2 | P3 | 17 | F | Study
C2-5 | 16 | F | C2 | P4 | 17 | F | Study
C2-6 | 16 | F | C2 | P5 | 17 | F | Study
C2-7 | 16 | F | C2 | P6 | 18 | F | Study
C2-8 | 17 | F | C2 | P7 | 18 | F | Study
C2-9 | 17 | F | C2 | P8 | 19 | F | Study
C2-10 | 17 | F | C2 | P9 | 19 | F | Study
C2-11 | 17 | F | C2 | P10 | 20 | F | Study
C2-12 | 17 | F | C2 | P11 | 20 | F | Study
C2-13 | 17 | F | C2 | P12 | 20 | F | Study
C2-14 | 17 | F | C2 | P13 | 23 | F | Study
C2-15 | 17 | F | C2 | P14 | 16 | M | Study
C2-16 | 17 | F | C2 | P15 | 17 | M | Study
C2-17 | 17 | F | C2 | P16 | 18 | M | Study
C2-18 | 17 | F | C2 | P17 | 18 | M | Study
C2-19 | 18 | F | C2 | P18 | 18 | M | Study
C2-20 | 18 | F | C2 | P19 | 18 | M | Study
C2-21 | 19 | F | C2 | P20 | 18 | M | Study
C2-22 | 15 | M | C2 | P21 | 19 | M | Study
C2-23 | 15 | M | C2 | P22 | 19 | M | Study
C2-24 | 15 | M | C2 | P23 | 19 | M | Study
C2-25 | 16 | M | C2 | P24 | 21 | M | Study
C2-26 | 16 | M | C2 | P25 | 21 | M | Study
C2-27 | 16 | M | C2 | P26 | 22 | M | Study
C2-28 | 16 | M | C2 | P27 | 23 | M | Study
Table 6: Basic demographic information (age and gender) about study participants. The "Group" column indicates participation in one of the preliminary case studies ("C1", "C2", "C3") or the final system evaluation ("Study").

Footnotes

13
The AI topics covered by the platform are: The nature of learning (humans vs. machines), Finding patterns in data, Training a model, Constructing vs. using a reasoner, Adjusting internal representations, Learning from experience, Structure of a neural network, Feature sets, Large datasets, and Understanding bias in datasets [2].

Supplemental Material

MP4 File - Video Preview
MP4 File - Video Presentation
MP4 File - Video Figure: A video figure depicting our user journey.

References

[1]
Nancy E Adams. 2015. Bloom’s taxonomy of cognitive learning objectives. Journal of the Medical Library Association: JMLA 103, 3 (2015), 152.
[2]
AI4ALL. 2023. Open Learning Curriculum. https://ai-4-all.org/resources/.
[3]
Mary Ainley, Suzanne Hidi, and Dagmar Berndorff. 2002. Interest, learning, and the psychological processes that mediate their relationship. Journal of educational psychology 94, 3 (2002), 545.
[4]
Mohammed Abdullatif Almulla. 2020. The effectiveness of the project-based learning (PBL) approach as a way to engage students in learning. Sage Open 10, 3 (2020), 2158244020938702.
[5]
Lorin W Anderson and David R Krathwohl. 2001. A taxonomy for learning, teaching, and assessing: A revision of Bloom’s taxonomy of educational objectives: complete edition. Addison Wesley Longman, Inc., New York NY United States.
[6]
James M Applefield, Richard Huber, and Mahnaz Moallem. 2000. Constructivism in theory and practice: Toward a better understanding. The High School Journal 84, 2 (2000), 35–53.
[7]
Qiming Bao, Juho Leinonen, Alex Yuxuan Peng, Wanjun Zhong, Tim Pistotti, Alice Huang, Paul Denny, Michael Witbrock, and Jiamou Liu. 2023. Exploring Self-Reinforcement for Improving Learnersourced Multiple-Choice Question Explanations with Large Language Models.
[8]
Amna Basharat. 2016. Learnersourcing Thematic and Inter-Contextual Annotations from Islamic Texts. In Proceedings of the 2016 CHI Conference Extended Abstracts on Human Factors in Computing Systems. ACM, USA, 92–97.
[9]
Zeinab Bedri, Ruairí de Fréin, and Geraldine Dowling. 2017. Community-based learning: A Primer. Irish Journal of Academic Practice 6, 1 (2017), 5.
[10]
Matthew L Bernacki, Timothy J Nokes-Malach, and Vincent Aleven. 2015. Examining self-efficacy during learning: Variability and relations to behavior, performance, and learning. Metacognition and Learning 10 (2015), 99–117.
[11]
Kiran Bisra, Qing Liu, John C Nesbit, Farimah Salimi, and Philip H Winne. 2018. Inducing self-explanation: A meta-analysis. Educational Psychology Review 30 (2018), 703–725.
[12]
Courtney K Blackwell, Alexis R Lauricella, and Ellen Wartella. 2016. The influence of TPACK contextual factors on early childhood educators’ tablet computer use. Computers & Education 98 (2016), 57–69.
[13]
Benjamin S Bloom, Max D Engelhart, EJ Furst, Walker H Hill, and David R Krathwohl. 1956. Handbook I: cognitive domain. Addison-Wesley Longman Ltd, Saddle River, NJ.
[14]
Benjamin S Bloom and David R Krathwohl. 2020. Taxonomy of educational objectives: The classification of educational goals. Book 1, Cognitive domain. Addison-Wesley Longman Ltd, 1 Lake St Upper Saddle River, NJ 07458.
[15]
Phyllis C Blumenfeld, Elliot Soloway, Ronald W Marx, Joseph S Krajcik, Mark Guzdial, and Annemarie Palincsar. 1991. Motivating project-based learning: Sustaining the doing, supporting the learning. Educational psychologist 26, 3-4 (1991), 369–398.
[16]
Charles C Bonwell and James A Eison. 1991. Active learning: Creating excitement in the classroom. 1991 ASHE-ERIC higher education reports. ERIC Clearinghouse on Higher Education, Washington, DC.
[17]
Alexios Brailas. 2020. Rhizomatic Learning in action: a virtual exposition for demonstrating learning rhizomes. In Eighth international conference on technological ecosystems for enhancing multiculturality. Association for Computing Machinery, Salamanca, Spain, 309–314.
[18]
Nathan R Bromberg, Angsana A Techatassanasoontorn, and Antonio Díaz Andrade. 2013. Engaging students: Digital storytelling in information systems learning. Pacific Asia Journal of the Association for Information Systems 5, 1 (2013), 2.
[19]
Matthew Budman, Blythe Hurley, Nairita Gangopadhyay, and Anya Tharakan. 2020. Talent and workforce effects in the age of AI.
[20]
Nanci M Burk. 2000. Empowering at-risk students: storytelling as a pedagogical tool.
[21]
Andrea Capocci, Vito DP Servedio, Francesca Colaiori, Luciana S Buriol, Debora Donato, Stefano Leonardi, and Guido Caldarelli. 2006. Preferential attachment in the growth of social networks: The internet encyclopedia Wikipedia. Physical review E 74, 3 (2006), 036116.
[22]
Harun Cigdem and Mustafa Ozturk. 2016. Critical components of online learning readiness and their relationships with learner achievement. Turkish Online Journal of Distance Education 17, 2 (2016).
[23]
Donald Clark. 2020. Bloom (1913-1999) - Mastery learning. Taxonomy of learning: not a hierarchy. Donald Clark.
[24]
AnneMarie M Conley. 2012. Patterns of motivation beliefs: Combining achievement goal and expectancy-value perspectives. Journal of educational psychology 104, 1 (2012), 32.
[25]
Robert J Crutcher and Alice F Healy. 1989. Cognitive operations and the generation effect. Journal of Experimental Psychology: Learning, Memory, and Cognition 15, 4 (1989), 669.
[26]
Ali Darvishi, Hassan Khosravi, and Shazia Sadiq. 2021. Employing peer review to evaluate the quality of student generated content at scale: A trust propagation approach. In Proceedings of the eighth ACM conference on learning@ scale. Association for Computing Machinery, New York, NY, United States, 139–150.
[27]
Paul Denny, Hassan Khosravi, Arto Hellas, Juho Leinonen, and Sami Sarsa. 2023. Can We Trust AI-Generated Educational Content? Comparative Analysis of Human and AI-Generated Learning Resources.
[28]
Cheryl Diermyer and Chris Blakesley. 2009. Story-based teaching and learning: Practices and technologies. In 25Th Annual Conference on Distance Teaching and Learning. 25th Annual Conference on Distance Teaching & Learning, virtual.
[29]
Griffin Dietz, Jimmy K Le, Nadin Tamer, Jenny Han, Hyowon Gweon, Elizabeth L Murnane, and James A. Landay. 2021. StoryCoder: Teaching Computational Thinking Concepts Through Storytelling in a Voice-Guided App for Children. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems. Association for Computing Machinery, New York, NY, USA, Article 54, 15 pages.
[30]
Griffin Dietz, Nadin Tamer, Carina Ly, Jimmy K Le, and James A. Landay. 2023. Visual StoryCoder: A Multimodal Programming Environment for Children’s Creation of Stories. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems. ACM, New York, NY, USA, Article 96, 16 pages.
[31]
Betsy DiSalvo and Carl DiSalvo. 2014. Designing for democracy in education: Participatory design and the learning sciences. In Learning and Becoming in Practice: The International Conference of the Learning Sciences (ICLS). Vol. 2. International Society of the Learning Sciences, Colorado, CO, 793–799.
[32]
Kevin Doherty and Gavin Doherty. 2018. Engagement in HCI: conception, theory and measurement. ACM Computing Surveys (CSUR) 51, 5 (2018), 1–39.
[33]
Stefania Druga, Fee Lia Christoph, and Amy J Ko. 2022. Family as a Third Space for AI Literacies: How do children and parents learn about AI together?. In Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems. Association for Computing Machinery, USA, Article 225, 17 pages.
[34]
Stefania Druga, Sarah T. Vu, Eesh Likhith, and Tammy Qiu. 2019. Inclusive AI literacy for kids around the world. In Proceedings of FabLearn 2019. Association for Computing Machinery, New York, NY, USA, 104–111.
[35]
Stefania Druga, Jason Yip, Michael Preston, and Devin Dillon. 2021. The 4As: Ask, Adapt, Author, Analyze.
[36]
Mandi Dupain and Loréal L Maguire. 2007. Health digital storytelling projects. American Journal of Health Education 38, 1 (2007), 41–43.
[37]
Mariana Leyton Escobar, Piet AM Kommers, and Ardion Beldad. 2014. Using narratives as tools for channeling participation in online communities. Computers in Human Behavior 37 (2014), 64–72.
[38]
Enrique Estellés-Arolas and Fernando González-Ladrón-de Guevara. 2012. Towards an integrated crowdsourcing definition. Journal of Information science 38, 2 (2012), 189–200.
[39]
Valerie Farnsworth. 2010. Conceptualizing identity, learning and social justice in community-based learning. Teaching and teacher education 26, 7 (2010), 1481–1489.
[40]
Rebecca Ferguson. 2012. Learning analytics: drivers, developments and challenges. International Journal of Technology Enhanced Learning 4, 5-6 (2012), 304–317.
[41]
Catherine Twomey Fosnot. 2013. Constructivism: Theory, perspectives, and practice. Teachers College Press, 1234 Amsterdam Ave. New York NY 10027.
[42]
Jennifer A Fredricks, Phyllis C Blumenfeld, and Alison H Paris. 2004. School engagement: Potential of the concept, state of the evidence. Review of educational research 74, 1 (2004), 59–109.
[43]
Gallup. 2016. Diversity gaps in computer science: exploring the underrepresentation of girls, Blacks and Hispanics.
[44]
Stefania Giannini. 2023. Generative AI and the future of education.
[45]
Manuela Glaser, Bärbel Garsoffky, and Stephan Schwan. 2009. Narrative-based learning: Possible benefits and problems.
[46]
Google. 2014. Women Who Choose Computer Science: what Really Matters: The Critical Role of Encouragement and Exposure.
[47]
Charles R Graham. 2011. Theoretical considerations for understanding technological pedagogical content knowledge (TPACK). Computers & Education 57, 3 (2011), 1953–1960.
[48]
Philip J Guo, Julia M Markel, and Xiong Zhang. 2020. Learnersourcing at scale to overcome expert blind spots for introductory programming: A three-year deployment study on the python tutor website. In Proceedings of the Seventh ACM Conference on Learning@ Scale. ACM, Virtual, 301–304.
[49]
Mark Guzdial. 2015. Learner-centered design of computing education: Research on computing for everyone. Morgan & Claypool Publishers, Kentfield, CA 94914.
[50]
Didik Hariyanto, Sigit Yatmono, Moh Khairudin, and Thomas Köhler. 2022. Students e-learning readiness towards education 4.0: Instrument development and validation. Jurnal Pendidikan Vokasi 12, 3 (2022).
[51]
Suzanne Hidi and K Ann Renninger. 2006. The four-phase model of interest development. Educational psychologist 41, 2 (2006), 111–127.
[52]
Min Hu and Hao Li. 2017. Student engagement in online learning: A review. In 2017 International Symposium on Educational Technology (ISET). IEEE, Hong Kong, 39–43.
[53]
Min-Ling Hung, Chien Chou, Chao-Hsiu Chen, and Zang-Yuan Own. 2010. Learner readiness for online learning: Scale development and student perceptions. Computers & Education 55, 3 (2010), 1080–1090.
[54]
Hawoong Jeong, Zoltan Néda, and Albert-László Barabási. 2003. Measuring preferential attachment in evolving networks. Europhysics letters 61, 4 (2003), 567.
[55]
Martin Johnson and Dominika Majewska. 2022. Formal, non-formal, and informal learning: What are they, and how can we research them? Journal of Education, Society & Multiculturalism 1, 2 (2022).
[56]
Hassan Khosravi, Gianluca Demartini, Shazia Sadiq, and Dragan Gasevic. 2021. Charting the design and analytics agenda of learnersourcing systems. In LAK21: 11th international learning analytics and knowledge conference, Vol. 11. Association for Computing Machinery, virtual, 32–42.
[57]
Hassan Khosravi, Paul Denny, Steven Moore, and John Stamper. 2023. Learnersourcing in the age of AI: Student, educator and machine partnerships for content creation.
[58]
Juho Kim. 2015. Learnersourcing: improving learning with collective learner activity. Ph.D. Dissertation. Massachusetts Institute of Technology.
[59]
Aniket Kittur, Boris Smus, Susheel Khamkar, and Robert E Kraut. 2011. Crowdforge: Crowdsourcing complex work. In Proceedings of the 24th annual ACM symposium on User interface software and technology. ACM, USA, 43–52.
[60]
Matthew Koehler and Punya Mishra. 2009. What is technological pedagogical content knowledge (TPACK)? Contemporary issues in technology and teacher education 9, 1 (2009), 60–70.
[61]
Dimitra Kokotsaki, Victoria Menzies, and Andy Wiggins. 2016. Project-based learning: A review of the literature. Improving schools 19, 3 (2016), 267–277.
[62]
David A Kolb. 2014. Experiential learning: Experience as the source of learning and development. FT press, Upper Saddle River, NJ.
[63]
Maria Kordaki and Panagiotis Kakavas. 2017. Digital storytelling as an effective framework for the development of computational thinking skills. In Edulearn17 Proceedings. The International Academy of Technology, Education and Development (IATED), Pl. de la Legió Espanyola, 11, El Pla del Real, 46010 València, Valencia, Spain, 6325–6335.
[64]
Sarah L Ash. 2009. Generating, Deepening, and Documenting Learning: the Power of Critical Reflection in Applied Learning. Applied Learning in Higher Education 1 (2009), 25–48.
[65]
R Eric Landrum, Karen Brakke, and Maureen A McCarthy. 2019. The pedagogical power of storytelling. Scholarship of Teaching and Learning in Psychology 5, 3 (2019), 247.
[66]
Chan Jean Lee. 2019. The test taker’s fallacy: How students guess answers on multiple-choice tests. Journal of Behavioral Decision Making 32, 2 (2019), 140–151.
[67]
Raymond ST Lee, James NK Liu, Karo SY Yeung, Alan HL Sin, and Dennis TF Shum. 2009. Agent-based web content engagement time (wcet) analyzer on e-publication system. In 2009 Ninth International Conference on Intelligent Systems Design and Applications. IEEE, United States, 67–72.
[68]
Justin B Leibowitz, Charity Flener Lovitt, and Craig S Seager. 2020. Development and Validation of a Survey to Assess Belonging, Academic Engagement, and Self-Efficacy in STEM RLCs. Learning Communities: Research & Practice 8, 1 (2020), 3.
[69]
James C Lester, Hiller A Spires, John L Nietfeld, James Minogue, Bradford W Mott, and Eleni V Lobene. 2014. Designing game-based learning environments for elementary science education: A narrative-centered learning perspective. Information Sciences 264 (2014), 4–18.
[70]
Alex Lishinski and Joshua Rosenberg. 2021. All the pieces matter: The relationship of momentary self-efficacy and affective experiences with CS1 achievement and interest in computing. In Proceedings of the 17th ACM Conference on International Computing Education Research. ACM, Virtual, 252–265.
[71]
Breanne K Litts, Kristin A Searle, Bryan MJ Brayboy, and Yasmin B Kafai. 2021. Computing for all?: Examining critical biases in computational tools for learning. British Journal of Educational Technology 52, 2 (2021), 842–857.
[72]
Chung Kwan Lo. 2023. What is the impact of ChatGPT on education? A rapid review of the literature. Education Sciences 13, 4 (2023), 410.
[73]
Duri Long, Takeria Blunt, and Brian Magerko. 2021. Co-designing AI literacy exhibits for informal learning spaces. Proceedings of the ACM on Human-Computer Interaction 5, CSCW2 (2021), 1–35.
[74]
Duri Long and Brian Magerko. 2020. What is AI literacy? Competencies and design considerations. In Proceedings of the 2020 CHI conference on human factors in computing systems. Association for Computing Machinery, USA, 1–16.
[75]
Khalid Mahmood. 2016. Do people overestimate their information literacy skills? A systematic review of empirical evidence on the Dunning-Kruger effect. Communications in Information Literacy 10, 2 (2016), 3.
[76]
Lina Markauskaite, Rebecca Marrone, Oleksandra Poquet, Simon Knight, Roberto Martinez-Maldonado, Sarah Howard, Jo Tondeur, Maarten De Laat, Simon Buckingham Shum, Dragan Gašević, 2022. Rethinking the entwinement between artificial intelligence and human learning: What capabilities do learners need for a world with AI? Computers and Education: Artificial Intelligence 3 (2022), 100056.
[77]
Allison Master, Sapna Cheryan, Adriana Moscatelli, and Andrew N Meltzoff. 2017. Programming experience promotes higher STEM motivation among first-grade girls. Journal of experimental child psychology 160 (2017), 92–106.
[78]
Scott W McQuiggan, Jennifer L Robison, and James C Lester. 2010. Affective transitions in narrative-centered learning environments. Journal of Educational Technology & Society 13, 1 (2010), 40–53.
[79]
Scott W McQuiggan, Jonathan P Rowe, Sunyoung Lee, and James C Lester. 2008. Story-based learning: The impact of narrative on learning experiences and outcomes. In Intelligent Tutoring Systems: 9th International Conference. Springer, Montreal, Canada, 530–539.
[80]
M Millians. 2011. Learning readiness.
[81]
Marta Montenegro-Rueda, José Fernández-Cerero, José María Fernández-Batanero, and Eloy López-Meneses. 2023. Impact of the implementation of ChatGPT in education: A systematic review. Computers 12, 8 (2023), 153.
[82]
Anna Moutafidou and Tharrenos Bratitsis. 2018. Digital storytelling: Giving voice to socially excluded people in various contexts. In Proceedings of the 8th international conference on software development and technologies for enhancing accessibility and fighting info-exclusion. ACM, New York, 219–226.
[83]
Kasia Muldner, Winslow Burleson, and Kurt VanLehn. 2010. “Yes!”: Using tutor and sensor data to predict moments of delight during instructional activities. In User Modeling, Adaptation, and Personalization: 18th International Conference. Springer, Big Island, HI, USA, 159–170.
[84]
Mahin Naderifar, Hamideh Goli, and Fereshteh Ghaljaie. 2017. Snowball sampling: A purposeful method of sampling in qualitative research. Strides in development of medical education 14, 3 (2017).
[85]
Davy Tsz Kit Ng, Jac Ka Lok Leung, Samuel Kai Wah Chu, and Maggie Shen Qiao. 2021. Conceptualizing AI literacy: An exploratory review. Computers and Education: Artificial Intelligence 2 (2021), 100041.
[86]
Davy Tsz Kit Ng, Jac Ka Lok Leung, Maggie Jiahong Su, Iris Heung Yue Yim, Maggie Shen Qiao, and Samuel Kai Wah Chu. 2023. AI literacy in K-16 classrooms. Springer Cham, Gewerbestrasse 11, 6330 Cham, Switzerland.
[87]
Davy Tsz Kit Ng, Wanying Luo, Helen Man Yi Chan, and Samuel Kai Wah Chu. 2022. Using digital story writing as a pedagogy to develop AI literacy among primary students. Computers and Education: Artificial Intelligence 3 (2022), 100054.
[88]
Hannele Niemi. 2002. Active learning—a cultural change needed in teacher education and schools. Teaching and teacher education 18, 7 (2002), 763–780.
[89]
Arnel B Ocay. 2019. Investigating the Dunning-Kruger effect among students within the contexts of a narrative-centered game-based learning environment. In Proceedings of the 2019 2nd International Conference on Education Technology Management. ACM, New York, 8–13.
[90]
David L Olson and Kirsten Rosacker. 2013. Crowdsourcing and open source software participation. Service Business 7 (2013), 499–511.
[91]
Heather L O’Brien, Paul Cairns, and Mark Hall. 2018. A practical approach to measuring user engagement with the refined user engagement scale (UES) and new UES short form. International Journal of Human-Computer Studies 112 (2018), 28–39.
[92]
Zachary A Pardos, Ioannis Anastasopoulos, and Shreya K Sheel. 2023. Conducting Rapid Experimentation with an Open-Source Adaptive Tutoring System. In International Conference on Artificial Intelligence in Education. Springer, Germany, 38–43.
[93]
E Martin Pedersen. 1995. Storytelling and the art of teaching. In English Teaching Forum, Vol. 33. English Teaching Forum, Washington, D.C., 2–5.
[94]
Nancy E Perry. 1998. Young children’s self-regulated learning and contexts that support it. Journal of educational psychology 90, 4 (1998), 715.
[95]
Paul R Pintrich. 1991. A manual for the use of the Motivated Strategies for Learning Questionnaire (MSLQ).
[96]
Nea Pirttinen, Paul Denny, Arto Hellas, and Juho Leinonen. 2023. Lessons Learned From Four Computing Education Crowdsourcing Systems. IEEE Access 11 (2023), 22982–22992.
[97]
Stefan Popenici. 2022. Artificial Intelligence and Learning Futures: Critical Narratives of Technology and Imagination in Higher Education. Taylor & Francis, United Kingdom.
[98]
Deborah M Price, Linda Strodtman, Elizabeth Brough, Steven Lonn, and Airong Luo. 2015. Digital storytelling: an innovative technological approach to nursing education. Nurse educator 40, 2 (2015), 66–70.
[99]
Isaac Prilleltensky, Geoffrey Nelson, and Leslea Peirson. 2001. The role of power and control in children’s lives: An ecological analysis of pathways toward wellness, resilience and problems. Journal of Community & Applied Social Psychology 11, 2 (2001), 143–158.
[100]
Panagiotis Psomos and Maria Kordaki. 2012. Pedagogical analysis of educational digital storytelling environments of the last five years. Procedia-Social and Behavioral Sciences 46 (2012), 1213–1218.
[101]
Chia Yi Quah and Kher Hui Ng. 2022. A systematic literature review on digital storytelling authoring tool in education: January 2010 to January 2020. International Journal of Human–Computer Interaction 38, 9 (2022), 851–867.
[102]
Randstad. 2023. The Workmonitor.
[103]
Judy Robertson and Maurits Kaptein. 2016. An introduction to modern statistical methods in HCI. Springer, Germany.
[104]
Bernard Robin. 2006. The educational uses of digital storytelling. In Society for information technology & teacher education international conference. Association for the Advancement of Computing in Education (AACE), Asheville, NC, 709–716.
[105]
Eka Roivainen. 2011. Gender differences in processing speed: A review of recent research. Learning and Individual differences 21, 2 (2011), 145–149.
[106]
Joshua M Rosenberg and Matthew J Koehler. 2015. Context and technological pedagogical content knowledge (TPACK): A systematic review. Journal of Research on Technology in Education 47, 3 (2015), 186–210.
[107]
Sherry Ruan, Jiayu He, Rui Ying, Jonathan Burkle, Dunia Hakim, Anna Wang, Yufeng Yin, Lily Zhou, Qianyao Xu, Abdallah AbuHashem, 2020. Supporting children’s math learning with feedback-augmented narrative technology. In Proceedings of the interaction design and children conference. ACM, USA, 567–580.
[108]
Alaa Sadik. 2008. Digital storytelling: A meaningful technology-integrated approach for engaged student learning. Educational technology research and development 56 (2008), 487–506.
[109]
Hatice Çıralı Sarıca and Yasemin Koçak Usluel. 2016. The effect of digital storytelling on visual memory and writing skills. Computers & Education 94 (2016), 298–309.
[110]
John R Savery. 2015. Overview of problem-based learning: Definitions and distinctions. Essential readings in problem-based learning: Exploring and extending the legacy of Howard S. Barrows 9, 2 (2015), 5–15.
[111]
Nalin K Sharda. 2007. Authoring educational multimedia content using learning styles and story telling principles. Proceedings of the international workshop on Educational multimedia and multimedia education 15 (2007), 93–102.
[112]
Susan W Sherman. 1976. Multiple Choice Test Bias Uncovered by Use of an “I Don’t Know” Alternative.
[113]
Anjali Singh, Christopher Brooks, and Shayan Doroudi. 2022. Learnersourcing in theory and practice: synthesizing the literature and charting the future. In Proceedings of the Ninth ACM Conference On Learning@ Scale. Association for Computing Machinery, New York, NY, United States, 234–245.
[114]
Anjali Singh, Christopher Brooks, Xu Wang, Warren Li, Juho Kim, and Deepti Pandey. 2023. Bridging Learnersourcing and AI: Exploring the Dynamics of Student-AI Collaborative Feedback Generation.
[115]
Najat Smeda, Eva Dakich, and Nalin Sharda. 2010. Developing a framework for advancing e-learning through digital storytelling. In IADIS International Conference e-learning. IADIS, Freiburg, Germany, 169–176.
[116]
Jiahong Su, Davy Tsz Kit Ng, and Samuel Kai Wah Chu. 2023. Artificial intelligence (AI) literacy in early childhood education: The challenges and opportunities. Computers and Education: Artificial Intelligence 4 (2023), 100124.
[117]
Sangho Suh, Martinet Lee, Gracie Xia, 2020. Coding strip: A pedagogical tool for teaching and learning programming concepts through comics. In Visual Languages and Human-Centric Computing. IEEE, New Zealand, 1–10.
[118]
Ruth Sylvester and Wendy-lou Greenidge. 2009. Digital storytelling: Extending the potential for struggling writers. The reading teacher 63, 4 (2009), 284–295.
[119]
Joanna Szurmak and Mindy Thuna. 2013. Tell me a story: The use of narrative as tool for instruction. Educational Media International 54, 1 (2013), 20–33.
[120]
Karin Tengler, Oliver Kastner-Hauler, and Barbara Sabitzer. 2021. Enhancing Computational Thinking Skills using Robots and Digital Storytelling. In CSEDU, Vol. 1. Proceedings of the 13th International Conference on Computer Supported Education, Virtual, 157–164.
[121]
Danielle R Thomas, Shivang Gupta, and Kenneth R Koedinger. 2023. Comparative analysis of learnersourced human-graded and ai-generated responses for autograding online tutor lessons. In International Conference on Artificial Intelligence in Education. Springer, Tokyo, Japan, 714–719.
[122]
David Touretzky, Christina Gardner-McCune, Fred Martin, and Deborah Seehorn. 2019. Envisioning AI for K-12: What should every child know about AI? Proceedings of the AAAI conference on artificial intelligence 33, 01 (2019), 9795–9799.
[123]
Bernie Trilling and Charles Fadel. 2009. 21st century skills: Learning for life in our times. John Wiley & Sons, Hoboken, NJ.
[124]
Ester Van Laar, Alexander JAM Van Deursen, Jan AGM Van Dijk, and Jos De Haan. 2017. The relation between 21st-century skills and digital skills: A systematic literature review. Computers in human behavior 72 (2017), 577–588.
[125]
Viswanath Venkatesh, James YL Thong, and Xin Xu. 2012. Consumer acceptance and use of information technology: extending the unified theory of acceptance and use of technology. MIS quarterly 36, 1 (2012), 157–178.
[126]
Marianna Vivitsou and Hannele Niemi. 2017. 21st Century Digital Storytelling in Education - Developing Stories with Digital Technologies. In New Ways to Teach and to Learn in China and Finland. CICERO Symposium, Helsinki, Finland.
[127]
Jakob Voss. 2005. Measuring wikipedia. In Proceedings of The International Conference of the International Society for Scientometrics and Informetrics. International Society for Scientometrics and Informetrics, Stockholm, Sweden.
[128]
Gregory M Walton and Geoffrey L Cohen. 2007. A question of belonging: race, social fit, and achievement. Journal of personality and social psychology 92, 1 (2007), 82.
[129]
Tianchong Wang and Eric CK Cheng. 2021. Towards a tripartite research agenda: a scoping review of artificial intelligence in education research. In Proceedings of the International Conference on Artificial Intelligence in Education Technology. Springer, The International Conference on Artificial Intelligence in Education Technology, Wuhan, China, 3–24.
[130]
Kerri Wazny. 2017. “Crowdsourcing” ten years in: A review. Journal of global health 7, 2 (2017).
[131]
Jeremy Weinstein, Rob Reich, and Mehran Sahami. 2021. System error: Where big tech went wrong and how we can reboot. Harper, UK.
[132]
Sarah Weir, Juho Kim, Krzysztof Z Gajos, and Robert C Miller. 2015. Learnersourcing subgoal labels for how-to videos. In Proceedings of the 18th ACM conference on computer supported cooperative work & social computing. Association for Computing Machinery (ACM), Vancouver BC Canada, 405–416.
[133]
Eunike Wetzel, Jan R Böhnke, and Anna Brown. 2016. The ITC International Handbook of Testing and Assessment. Oxford University Press, New York. 349–363 pages.
[134]
Jeffrey L Whitten, Lonnie D Bentley, and Thomas IM Ho. 1986. Systems analysis & design methods. Times Mirror/Mosby College Publishing, Portland, OR.
[135]
Vicki Williams. 2001. Online Learning Readiness Questionnaire.
[136]
Scott D Wurdinger and Julie A Carlson. 2009. Teaching for experiential learning: Five approaches that work. R&L Education, Lanham, MD.
[137]
Ya-Ting C Yang and Wan-Chi I Wu. 2012. Digital storytelling for enhancing student academic achievement, critical thinking, and learning motivation: A year-long experimental study. Computers & education 59, 2 (2012), 339–352.
[138]
NP Yılmaz. 2017. Learning experiences of prospective teachers in a digital storytelling tool called “Toondoo”: a case study. Current Trends in Educational Sciences 6 (2017), 447–454.
[139]
Pelin Yuksel-Arslan, Soner Yildirim, and Bernard Ross Robin. 2016. A phenomenological study: teachers’ experiences of using digital storytelling in early childhood education. Educational Studies 42, 5 (2016), 427–445.
[140]
Xiaofei Zhou, Jessica Van Brummelen, and Phoebe Lin. 2020. Designing AI learning experiences for K-12: Emerging works, future opportunities and a design framework.
