1 Introduction
Online lectures are widely used to convey knowledge in various learning contexts. Instructors often adopt them as educational resources, for example in flipped learning [36] or as supplementary materials [8, 24], and such knowledge-transfer-oriented lectures usually take the form of a lengthy monologue. This format can cause learners to feel disengaged or quickly lose interest [4], potentially resulting in persistent negative emotions that detrimentally affect learning outcomes [28]. To address these limitations, studies have explored the application of conversational agents (CAs) to video-based learning [55, 67, 68, 76]. Many of these studies used CAs to mimic human tutoring behaviors such as scaffolding [30], and these direct interactions with a CA have improved the learning experience of online learners.
Although these studies imply the importance of CAs’ scaffolding mechanisms in online video lecture settings, they have mostly supported learners who prefer to interact directly with an instructor and peer learners [65, 66]. Yet, for vicarious learners, who prefer to learn from others and actively process others’ interactions, interactions that can be vicariously processed are more beneficial to their learning [65, 66]. To enhance vicarious learners’ experience, systems with multiple CAs that simulate interactions between an instructor and a direct learner [68] have been introduced, grounded in vicarious learning theory [49].
Vicarious learning theory explains the benefits of learning that arise when vicarious learners observe tutoring between an instructor and a direct learner, who interacts directly with the instructor in a video lecture [2, 11, 12, 49]. Studies have found that vicarious learners preferred dialogic lecture videos incorporating CAs over monologue-style lecture videos, with a positive effect on students’ engagement [68]. Therefore, introducing vicarious dialogue into monologue-style lectures is a promising way to address the limitations of conventional online lectures and satisfy vicarious learners.
However, current approaches have not yet addressed how to create high-quality dialogues that cater to vicarious learners through adaptation or expansion of the original lecture content. The quality of the learning content, such as the level of detail provided by the lecturer, matters because it can significantly affect a vicarious learner’s cognitive load and engagement. Thus, rather than simply enhancing lectures, we converted the original lecture script into a format that can reduce the cognitive load of vicarious learners and identified a pedagogically meaningful format for high-quality dialogue. We integrated an LLM into this conversion process, since LLMs have been discussed as a feasible way to design vicarious dialogues while reducing the extra effort required of instructors [68]. In short, this work aims to alleviate the manual effort of instructors authoring vicarious dialogue and to establish a scalable pipeline for designing educationally high-quality dialogues from lectures.
To achieve this goal, we developed five guidelines for transforming monologic lectures into vicarious dialogues that benefit online learners: Dynamic, Academically Productive, Cognitively Adaptive, Purposeful, and Immersive. As an initial step in crafting these guidelines, we conducted an iterative inductive literature analysis to define what constitutes a pedagogically meaningful dialogue. However, most existing literature draws its insights from classrooms or intelligent tutoring systems, not video lectures, and there is limited research on transforming video lecture content into high-quality educational dialogues. Therefore, we conducted a design workshop with eight educational experts and seven secondary school teachers to tailor the guidelines to a STEM video learning setting.
To facilitate the efficient authoring of video-based vicarious dialogues based on our guidelines, we propose VIVID (VIdeo to VIcarious Dialogue), a system that allows instructors to design, evaluate, and modify vicarious interactions with video lectures. To power this system, we propose a collaborative design process between the LLM and instructors that generates high-quality vicarious dialogues efficiently. Guided by the guidelines developed in the workshop, this process consists of three stages: (1) Initial Generation: after an instructor chooses which part of a lecture to convert, the LLM configures a direct learner’s understanding level for each concept in the selected section and generates initial dialogues; (2) Comparison and Selection: instructors compare and select from multiple generated dialogues; and (3) Refinement: instructors collaborate with the LLM to refine the final dialogue, which replaces a section of the video lecture.
To determine whether VIVID helps instructors transform monologue lectures into high-quality dialogue lectures, we conducted a within-subjects study with 12 instructors. Co-designing with VIVID helped instructors simulate a direct learner effectively. Furthermore, instructors found VIVID significantly better than the Baseline at monitoring essential considerations when designing dialogue (p = 0.04, Cohen’s d = 0.8). To evaluate the pedagogical quality of the dialogues authored through VIVID, we also conducted a human evaluation with six secondary instructors on four criteria: whether the dialogue is Dynamic, Academically Productive, Immersive, and Correct. We found that the dialogues made with VIVID were of significantly better quality than those generated with the Baseline on most criteria.
The contributions of this work are as follows:
• Design guidelines, derived from a design workshop, for creating vicarious educational dialogues from lecture videos.
• VIVID, a system in which instructors collaborate with an LLM to author vicarious dialogues from monologue-styled lecture videos.
• Findings from a user study with 12 instructors showing how VIVID can assist instructors in dialogue authoring (Section 6.2), and a technical evaluation with six instructors demonstrating the higher quality of dialogues created by instructors using VIVID compared to the Baseline (Section 6.4).
4 Findings from Design Workshop
We identified the two issues most commonly mentioned by participants and formulated five design recommendations for creating high-quality vicarious dialogues. Additionally, we propose how LLMs can be integrated into the educational dialogue authoring process.
4.1 Challenges in Converting Video Lectures to Dialogue
Two challenges were observed when instructors converted video lectures to dialogue.
Challenge 1: Designing the overall structure of dialogues. We observed that participants faced difficulties in designing the overall structure of the dialogue when creating it from scratch. Most participants first struggled to decide which part of the lecture should be converted to dialogue. P3 mentioned that it was “difficult to figure out which parts of a monologue should be transformed into direct learner’s questions” and P4 said it was “hard to decide when and how much dialogue to create”. This poses a cold-start problem when designing dialogues intended to improve vicarious learners’ learning. Furthermore, participants struggled to determine the appropriate format for the dialogue, as they were unsure how the dialogue format would affect learning outcomes. P5 said that “while it was easy to convert the lecture into a simple question-and-answer format, I’m not sure if these would be meaningful dialogues for vicarious learners”. P15 also mentioned, “If it ends up looking too similar to the original lecture format, converting the material to a dialogue format might not be necessary”, asserting the need to define what kind of dialogue format would be helpful for vicarious learners in an online learning environment.
Challenge 2: Anticipating direct learner’s utterances based on their level of understanding. Both instructors and experts needed help with designing a direct learner’s utterances. This is evident from comments: “It is hard to add direct learners’ misconceptions to dialogues effectively" (P15) and “It was difficult to consider individual responses of the direct learners" (P7).
4.2 Design Recommendations for Designing Dialogues for Vicarious Learners
Based on the challenges above, we propose five dialogue design recommendations. Furthermore, from the pre-defined, literature-based guidelines (Section 3.1.2), we suggest four teaching strategies (Table 7 in Appendix) that workshop participants validated as likely effective even in a video-based learning context.
DR1. Dynamic: Include various interaction patterns to reflect the dialogic dynamics between the tutor and tutee. A vicarious dialogue should be structured with fast turn-taking and varied utterance patterns (Table 1, Table 2) that capture the dynamism of an actual tutoring scenario. P14 noted that “fast turn-taking is required to hold the attention of vicarious learners in online education, as it is more difficult to retain focus on digital learning platforms than in physical classrooms”. Furthermore, instructors and experts often divided the tutor’s lengthy utterances into smaller sub-dialogues between the tutor and the direct learner, highlighting the quick turn-taking in vicarious dialogues.
DR2. Academically productive: Encourage the metacognitive and constructive utterances of the direct learner to make a dialogue academically productive. Direct learners’ utterances should be pedagogically meaningful to enhance vicarious learners’ learning and engagement. Most workshop participants consistently emphasized the influence of direct learners on vicarious learners throughout the dialogue design process. Notably, they stressed the importance of direct learners displaying "interactive engagement" in dialogues, as vicarious learners are highly likely to empathize with the direct learner’s learning process. The term "interactive engagement" refers to the active engagement of direct learners both cognitively and metacognitively.
Direct learner’s cognitive engagement: P15 highlighted the importance of a tutor in a vicarious dialogue who encourages active engagement by facilitating connections between direct learners’ existing knowledge and the new material, citing Ausubel’s meaningful learning theory [33]. In addition, P14 mentioned that “When the instructor links the learning contents with the learner’s personal experiences, the transfer learning occurs more easily”.
Direct learner’s metacognitive engagement: P15 and P9 proposed incorporating self-assessment and explanations of understanding from the direct learner into vicarious dialogues: “When a direct learner self-assesses their level of understanding or performs self-summarization, a vicarious learner could potentially check their comprehension”. In addition, P15 suggested that a tutor continuously promote the direct learner’s metacognition. This guidance aligns with findings in ITSs [1, 47] that the constructive actions of a direct learner, such as answering based on what they learned from the instructor’s scaffolding and asking deep-level reasoning questions [35, 47], significantly influenced the learning outcomes and participation of vicarious learners.
DR3. Cognitively adaptive: Adapt the teaching strategies to the vicarious learner’s level of understanding, the learning objectives, and the lecture contents. Previous literature suggests that strategies requiring higher cognitive engagement, like inducing cognitive conflicts and posing deep-level reasoning questions, benefit vicarious learners [16, 17, 21, 27]. However, cognitively demanding strategies, like the cognitive conflict strategy in Table 7 in Appendix, may not always suit all learning materials or learners when converting lecture videos into dialogues. P15 noted that the choice of cognitive strategy may vary depending on the granularity of the learning content being transformed into a dialogue. He also emphasized the importance of aligning cognitive strategies with learning objectives and the level of vicarious learners, stating that “Frequent placement of lighter, easily answerable questions and minimal use of cognitive strategies on important content could lower the cognitive load on vicarious learners”.
DR4. Purposeful: Define a learning objective for the vicarious learner and ensure that the learning objective is achieved through that dialogue. To create meaningful dialogue for vicarious learners, we recommend aligning the dialogue’s goal with the vicarious learner’s learning objective and illustrating the achievement of this objective through interactions between a direct learner and a tutor. P15 and P8 emphasized the importance of defining clear learning objectives for vicarious learners as an initial step in dialogue creation. Additionally, P8 highlighted that learning objectives should be intimately tied to the difficulties vicarious learners face.
DR5. Immersive: Utilize realistic teaching scenarios and match the direct learner’s cognitive level with the vicarious learner’s level. We suggest considering two factors that can immerse vicarious learners in their vicarious interaction.
• Incorporate common teaching scenarios: Some participants suggested using real classroom scenarios to engage vicarious learners. For example, P11 proposed a scenario in which the direct learner is given an incorrectly solved problem and asked to explain what is wrong, and another in which a second learner answers the tutor’s question correctly after a student gives a wrong answer. P14 also suggested a scenario where a tutor has the direct learner apply what they have learned to different examples.
• Match cognitive levels: Instructors and experts highlighted aligning the cognitive levels of direct and vicarious learners in lecture videos to benefit the vicarious learners: “Vicarious learners often lose interest when confronted with familiar material but are more likely to engage when unfamiliar or essential information is presented.” (P12). Therefore, addressing parts that vicarious learners find unfamiliar or challenging through the direct learner’s dialogue can be an effective way to design meaningful, high-quality dialogue.
4.3 Enhancing the Educational Dialogue Design Process with LLMs
After establishing the guidelines, we explored how instructors and experts used LLM-generated dialogues and developed evaluation criteria (Table 5) for their pedagogical quality, based on how workshop participants assessed the dialogues (Table 3). We also explored strategies for integrating LLMs into the educational dialogue design process.
4.3.1 Utilization of LLM-Generated Dialogues.
We propose two ways in which the LLM could enhance the dialogue design process for vicarious learners. Firstly, it can provide pre-generated dialogues, stimulating instructors’ ideation. P2 commented that using the LLM felt like it provided helpful guidelines, making it more effective than starting from scratch. Secondly, it can assist in modifying dialogues at different levels, refining sub-dialogues and crafting direct learners’ responses. Participants proposed presenting expected responses at different levels (P12) and automating the process of generating questions from the direct learner’s perspective (P2).
Despite the LLM’s advantages, the dialogue authoring process still requires active instructor involvement. We observed that instructors each have their own criteria for designing high-quality dialogues, based on their teaching experiences and varying with the aspects where they believe vicarious learners may face challenges. Guided by these personalized criteria, instructors designed and revised their dialogues.
Some instructors found the generated dialogues satisfactory because they aligned with their intended teaching points or teaching style. P13 chose a dialogue, stating, “When teaching math, using fewer variables is better. So, I initially emphasized reducing the number of characters and utilizing known information. The dialogue aligns well with my problem-solving approach that focuses on minimizing variables”. Other instructors did not use the dialogues because the content did not meet their quality criteria. For example, P11 made revisions to emphasize a specific point, stating, “The tutee’s question: ‘So, is x-2 the square root of 6?’ is crucial in the problem-solving process. It would be helpful if the tutor followed up with a question like, ‘What is the number that becomes 6 when squared?’ to elaborate on this point”.
4.3.2 Criteria for Evaluating the Educational Dialogues.
Instructors evaluated the quality of LLM-generated dialogue based on seven criteria (Table 3). Five of these criteria aligned with the key factors to consider when designing educational dialogues (Section 4.2), while the other two, Usefulness and Correctness, pertain specifically to evaluating dialogues generated by the LLM.
4.4 Design Goals
Based on the LLM’s strengths and limitations in designing educational dialogue, and on the criteria that instructors emphasized most when evaluating dialogue quality (Table 3), we propose four design goals (DG):
DG1. Enable instructors to easily simulate direct learners.
DG2. Assist instructors in designing dialogues by referencing utterances generated at various levels of granularity.
DG3. Assist instructors in creating dialogues that reflect the user’s dialogue usage context and personal experience with students.
DG4. Ensure that instructors consistently monitor important considerations when designing vicarious dialogues.
5 VIVID: A System for Authoring Vicarious Dialogues from Monologue-styled Lecture Videos with LLM Assistance
Based on our design goals from the workshop, we developed VIVID, an LLM-based system to assist instructors in crafting vicarious dialogues from their monologue-styled lecture videos. While LLMs hold potential benefits for the dialogue design process, as detailed in Section 4.3.1, they may not be practically useful in real educational settings if Correctness and Usefulness (Table 3) are not ensured. Thus, VIVID provides a collaborative authoring process between the LLM and instructors, facilitating the generation of high-quality and correct vicarious dialogues. Based on our four design goals and the dialogue design process observed in the workshop, this collaborative authoring process consists of three stages: (1) Initial Generation, (2) Comparison and Selection, and (3) Refinement.
To motivate VIVID’s design, we describe a usage scenario in which an instructor collaborates with the LLM to author a dialogue through VIVID. Sophia, a high school biology teacher, requires her students to watch recorded lectures before class. She wants to make sure that students easily understand the parts of the lectures with the most common misconceptions. In this context, she uses VIVID to transform the sections of her recorded lecture where misconceptions frequently occur into dialogues so that her students gain a better understanding. She uploads her lecture video to VIVID (A1, Figure 1) and selects the sections she wants to transform into dialogues (A2).
Initial Generation. She then highlights areas where her students might develop misconceptions or key examples she wants to emphasize in the dialogue (B1, Figure 1). Sophia aims to design the dialogue scenario as if it were occurring in a high school biology class, where a teacher addresses the direct learner’s misconceptions in the dialogue (B2). Upon highlighting, VIVID generates four dialogues reflecting the dialogue scenario.
Comparison and Selection. VIVID shows the generated dialogues with an ‘understanding level rubric’ (C1, Figure 1) that shows four levels of learner understanding for each key concept in the selected part, and ‘dialogue cards’ (C2) that contain key information about each dialogue. Sophia compares the dialogues, considering the direct learner’s knowledge levels for each concept illustrated in the dialogue cards (C2). She then chooses to modify ‘Dialogue 2’ because it highlights the misconceptions she wants to include.
Refinement. Sophia modifies ‘Dialogue 2’ by adding questions to the tutor’s utterances to address the direct learner’s misconceptions. She clicks the Generate button (D1-A, Figure 2) to add a new utterance. However, she is unsure what answers the direct learner could give to these newly added questions. To view different examples of how the learner might respond, she first selects the sub-dialogue containing the learner’s utterance she wants to see more variations of (D2). She then clicks the Laboratory button (D4-1), and VIVID generates four variations of the chosen utterances.
After reviewing the results, she wants to replace the existing utterances with new ones that better represent the learner’s misconceptions. She clicks the Apply button (D4-2) to replace the previous utterances with the new ones. This allows Sophia to create a final dialogue in which misconceptions are effectively addressed.
5.1 Initial Generation
VIVID initially creates various dialogues so that instructors can choose the one that best aligns with their intention for converting monologue to dialogue, as we found that LLM-generated dialogues can serve as prototypes in the educational dialogue design process (Section 4.3.1). Notably, the LLM-based pipeline of the Initial Generation stage is designed to generate dialogues that satisfy the characteristics most emphasized by workshop participants: Dynamic, Academically Productive, and Immersive (DR1, DR2, and DR5 in Section 4.2). Furthermore, when generating dialogues, VIVID reflects instructors’ needs in the pipeline, letting instructors easily simulate direct learners with knowledge levels similar to their target vicarious learners (DG1 in Section 4.4). Thus, the Initial Generation stage consists of four steps that finely adjust the direct learner’s knowledge state based on the instructor’s needs. We determined our final prompts (detailed in the Supplemental Material) by evaluating the quality of various dialogues against our evaluation criteria (Table 5).
5.1.1 Step 1. Create a rubric for highlighted areas, indicating the learner’s understanding level for each concept.
DR5 (Immersive) in Section 4.2 suggests that the dialogue should align the cognitive level of the direct learner with that of vicarious learners. The highlighting feature allows instructors to mark sections of the script that vicarious learners might find challenging, reflecting the instructor’s intention to tailor the dialogue to a specific level of vicarious learner. VIVID therefore leverages the highlighted sections to make assumptions about the level of vicarious learners and uses them to model the direct learner (DR5 in Section 4.2).
Before configuring the direct learner’s understanding state, we extract the core concepts of the selected area of the transcript and divide the direct learner’s possible understanding of each concept into four levels. These levels are based on the cognitive domain of Bloom’s taxonomy [26], which instructors have long used to design, assess, and evaluate student learning [43]. VIVID then generates four understanding levels for each key concept with the LLM and presents them in a rubric format (B1, Figure 1). The understanding level here refers to the understanding state expected of the direct learner as they learn new concepts from the instructor during the dialogue.
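As an illustration of how this step could be prompted, the sketch below asks an LLM for four Bloom-based understanding levels of a single key concept. The template wording and function names are our own hypothetical stand-ins, not VIVID’s actual prompt (which is in its Supplemental Material):

```python
# Hypothetical rubric-generation template; wording and names are our own
# illustration, not VIVID's actual prompt.
RUBRIC_TEMPLATE = """For the concept "{concept}" from the lecture excerpt
below, describe four understanding levels a direct learner could hold,
from level 1 (barely recalls the concept) to level 4 (can apply and
analyze it), following the cognitive domain of Bloom's taxonomy.

Excerpt:
{excerpt}"""

def build_rubric_prompt(concept: str, excerpt: str) -> str:
    """Fill the rubric template for one key concept extracted from the
    selected section of the transcript."""
    return RUBRIC_TEMPLATE.format(concept=concept, excerpt=excerpt)
```

The per-concept outputs could then be assembled into the rubric shown to the instructor.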
5.1.2 Step 2. Determine the direct learner’s understanding level using the highlighted parts and the rubric.
The highlighted parts indicate the concepts that the direct learner may not fully comprehend even after the tutor’s explanation in the dialogue. We set the direct learner’s understanding level for the highlighted concepts to ‘level 1’, ‘level 2’, or ‘level 3’ in the generated rubric to represent the direct learner’s knowledge deficits. For unhighlighted areas, the direct learner is set to the highest understanding level, ‘level 4’.
The process of determining a direct learner’s understanding level does not consider prerequisite relationships between concepts, so that a dialogue can reflect varied levels of comprehension of each concept, as shown in Figure 4. For example, consider a case where Concept A is a prerequisite for Concept B. Even if the LLM sets Concept A at ‘level 1’ and Concept B at ‘level 4’, a scenario can be designed where the learner studies Concept A with the teacher to fill the knowledge gap (level 1) and then responds well to Concept B (level 4).
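The assignment rule of Step 2 can be sketched as a small helper. The function name is hypothetical, and in VIVID the levels are set via an LLM prompt rather than hard-coded; this sketch only captures the rule that highlighted concepts become deficits (levels 1–3) while unhighlighted ones default to level 4:

```python
def assign_understanding_levels(concepts, highlighted, deficit_level=2):
    """Map each key concept to an understanding level: highlighted
    concepts are treated as knowledge deficits (a level in 1-3), while
    unhighlighted concepts default to the highest level, 4."""
    if not 1 <= deficit_level <= 3:
        raise ValueError("deficit levels must be 1, 2, or 3")
    return {c: (deficit_level if c in highlighted else 4) for c in concepts}
```

For instance, with Concept A highlighted and Concept B not, A would be set to a deficit level and B to level 4, matching the prerequisite example above.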
5.1.3 Step 3. Create an answer sheet consisting of the learner’s expected answers to the tutor’s questions and questions showing where the learner struggles.
We designed our prompt to create the questions a tutor might ask and the responses a direct learner in a specific knowledge-deficit state would give. The expected answer sheet uses a descriptive, free-text format to reflect the learner’s nuanced understanding. We prompted the LLM to vary the expected answers to the instructor’s questions according to the learner’s knowledge level for each concept. We also designed a prompt to generate questions revealing where the direct learner struggles with concepts set to a low level.
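A hedged sketch of what such an answer-sheet prompt might look like follows; the wording, template, and function names are our own illustration of the idea, not the prompt VIVID actually uses:

```python
# Hypothetical answer-sheet template (our own illustration).
ANSWER_SHEET_TEMPLATE = """A direct learner has the following understanding
of each concept (1 = lowest, 4 = highest):
{level_lines}

For each question a tutor might ask about the excerpt below, write the
free-text answer this learner would plausibly give, so that partial
understanding shows through. For every concept at level 1 or 2, also
write one question the learner would ask because they are struggling.

Excerpt:
{excerpt}"""

def build_answer_sheet_prompt(levels: dict, excerpt: str) -> str:
    """Render the learner's per-concept levels into the template."""
    level_lines = "\n".join(f"- {c}: level {lv}" for c, lv in levels.items())
    return ANSWER_SHEET_TEMPLATE.format(level_lines=level_lines,
                                        excerpt=excerpt)
```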
5.1.4 Step 4. Generate dialogues.
The final dialogues are generated through prompts based on the following three elements, as shown in Figure 3: (1) the direct learner’s knowledge state, adjusted through Steps 1 to 3, to achieve Immersive (DR5); (2) the key utterance categories of a tutor and a tutee in Table 2 and Table 1, to achieve Dynamic (DR1); and (3) the key teaching strategies described in Table 7, to achieve Academically Productive (DR2).
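The composition of the final generation prompt from these three elements could be sketched as follows; this is a simplified, hypothetical stand-in for the actual prompt in the Supplemental Material, with names of our own choosing:

```python
def build_dialogue_prompt(learner_state: str,
                          utterance_categories: list,
                          teaching_strategies: list,
                          excerpt: str) -> str:
    """Combine the three ingredients of Step 4: the adjusted learner
    state (Immersive / DR5), utterance categories for fast turn-taking
    (Dynamic / DR1), and teaching strategies (Academically Productive /
    DR2)."""
    return (
        "Rewrite the lecture excerpt below as a tutor-tutee dialogue.\n\n"
        f"Direct learner's knowledge state:\n{learner_state}\n\n"
        "Use varied tutor/tutee utterance categories with fast "
        f"turn-taking: {', '.join(utterance_categories)}\n"
        f"Apply these teaching strategies: {', '.join(teaching_strategies)}\n\n"
        f"Excerpt:\n{excerpt}"
    )
```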
5.2 Comparison and Selection
In the Comparison and Selection stage, VIVID provides instructors with an understanding level rubric (B1) and dialogue cards (B2) (Figure 1) to enable monitoring and selection based on the criteria that were important during the Initial Generation stage (DG4 in Section 4.4). Each dialogue card (B2) contains the primary information of a dialogue, such as the direct learner’s understanding level for each concept, key teaching strategies, and key dialogue patterns. The understanding level rubric presents a four-level understanding state for each key concept appearing in the selected part of the transcript.
5.3 Refinement
5.3.1 Basic tools for instructor’s direct refinement.
In the workshop, we observed that instructors were proficient at reworking existing dialogue content, such as breaking down lengthy tutor utterances into smaller segments or incorporating script contents into the dialogue. To facilitate this kind of authoring, VIVID provides four basic functions: add (D1-a), duplicate (D1-b), delete utterance (D1-c), and change speaker (D1-d). As visible in (D1), each utterance box in the final dialogue is clickable and can be moved with drag-and-drop (Figure 2). Through this direct refinement, we also aimed to enhance the Correctness of the dialogue.
5.3.2 LLM-based refinement tool: Laboratory.
In addition to the basic functions, VIVID offers the Laboratory tool (D4-1), which provides alternatives (D3) for selected sub-dialogues (D2) through the LLM (Figure 2). It is designed to address instructors’ difficulty in developing direct learners’ utterances while considering their understanding level (Challenge 2 in Section 4.1) and to achieve DG3 (Section 4.4). To do this, we designed the Laboratory prompt to hold four key elements fixed while varying the original dialogue patterns (details in the Supplemental Material): (1) the learner’s level in the dialogue selected during the Comparison and Selection phase, (2) the dialogue context, (3) the main learning contents, and (4) the number of turns. Meanwhile, we diversified the dialogue patterns by reflecting the utterance categories in Table 2 and Table 1 in our prompt. When the instructor clicks the Apply button (D4-2), the selected sub-dialogue (D2) is replaced with the new sub-dialogue.
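The Apply behavior amounts to a span replacement over the sequence of dialogue turns. A minimal sketch, with a hypothetical helper and turns represented as (speaker, text) pairs:

```python
def apply_variation(dialogue, start, end, new_subdialogue):
    """Replace the selected sub-dialogue (turns in [start, end)) with
    the variation chosen in the Laboratory, keeping the surrounding
    turns intact."""
    return dialogue[:start] + list(new_subdialogue) + dialogue[end:]
```

Because the surrounding turns are untouched, the replaced sub-dialogue slots back into the existing dialogue context.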
5.4 Implementation
VIVID is implemented using React, connected to a Flask-based back-end server that utilizes the GPT API. Whisper [58], an automatic speech recognition model by OpenAI, auto-generates the transcript of the section that the instructor chooses from the lecture video (B1 in Figure 1). To address limitations of automatic speech recognition, such as noise or language errors, and to obtain a more precise dialogue conversion, VIVID allows instructors to modify the transcript directly during the Initial Generation stage.
Subsequently, the system uses the API of GPT-4, OpenAI’s advanced language model, to generate the rubric, the learner’s knowledge levels, the predicted answer sheet, and the final dialogue. Given the importance of model accuracy in an educational context, we conducted prompt engineering experiments with GPT-3.5 and GPT-4 and chose GPT-4 for its superior generation quality. We set a temperature of 0.65 for rubric generation, determined empirically to maintain consistency, and used the default temperature for the other features.
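These per-stage settings could be captured in a small configuration table. This is a hypothetical sketch: the stage names are our own, and the fallback value reflects the OpenAI chat API’s documented default temperature of 1.0:

```python
# Hypothetical per-stage generation settings mirroring the text: GPT-4
# throughout, temperature 0.65 for rubric generation (for consistency),
# and the API default temperature elsewhere.
GENERATION_CONFIG = {
    "rubric":       {"model": "gpt-4", "temperature": 0.65},
    "answer_sheet": {"model": "gpt-4"},
    "dialogue":     {"model": "gpt-4"},
}

DEFAULT_TEMPERATURE = 1.0  # OpenAI chat-completions default

def params_for(stage: str) -> dict:
    """Return the request parameters for one pipeline stage, filling in
    the default temperature where none is pinned."""
    params = dict(GENERATION_CONFIG[stage])
    params.setdefault("temperature", DEFAULT_TEMPERATURE)
    return params
```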
7 Discussion
In this section, we discuss how to improve explainability, controllability, and verbosity for better utility, VIVID’s potential beyond lecture videos, its customizability for learners, and its generalizability.
7.1 Human-AI Interaction Design for VIVID
The technical evaluation showed that dialogues created using VIVID were significantly better than Baseline dialogues on five of six criteria (Figure 7), excluding fast turn-taking. Similarly, the quality of the VIVID-generated dialogues in the Initial Generation phase was rated significantly higher. However, there was no significant difference in the instructors’ perceived efficiency and usefulness of VIVID compared to the Baseline (Figure 6).
Despite the positive results for VIVID, instructors rated the overall usefulness of each system feature relatively low (Figure 6). We attribute this to two causes: (1) the low explainability and informativeness of the dialogue design conveyed by the highlighting feature and dialogue cards, and (2) the low controllability of the Laboratory feature. We therefore suggest three improvements:
• Enhancing Explainability: The highlighting feature and dialogue cards in VIVID need to offer greater explainability to instructors. P7 highlighted that prior knowledge of each feature’s exact functionality could have led to more frequent and appropriate usage, potentially resulting in higher satisfaction with the system. Notably, there is a need to investigate the types of information instructors require to effectively discern the diversity among learners and pedagogical dialogue patterns: we observed substantial differences between instructors in their ability to recognize differences in direct learners’ understanding levels and how these differences are reflected in dialogue structure.
• Providing Fine-grained Controllability: Enhancing controllability and enabling granular modifications in the Laboratory feature could improve instructors’ workflow. In our user study, instructors exhibited varying expectations for the modified versions offered by the Laboratory and tended to rate usability lower when their expectations were not met. An improved version of the feature could support instructors in determining and expressing what they expect from revised versions of the dialogue. For instance, letting instructors select elements with interactive guidance, such as diverse versions of examples, questions, or versions with added prior knowledge, could increase the feature’s perceived usefulness.
• Reducing Verbosity: One unexpected downside was that the generated dialogues were perceived as verbose, likely due to the LLM's tendency to produce long text. This issue could be addressed by revising the prompting pipeline to limit the length of generated utterances and dialogues, which we leave as future work.
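A minimal sketch of such a length constraint: in addition to instructing the model with an explicit word budget in the prompt, a post-processing step can trim any utterance that still exceeds it, preferring to cut at a sentence boundary. The function names, prompt wording, and the 40-word budget are illustrative assumptions, not part of VIVID's pipeline.

```python
import re

# Hypothetical prompt fragment: ask the model for short turns up front.
LENGTH_INSTRUCTION = "Keep each utterance under {max_words} words."

def trim_utterance(text: str, max_words: int) -> str:
    """Truncate an utterance to at most max_words, preferring to cut
    at the last complete sentence inside the budget."""
    words = text.split()
    if len(words) <= max_words:
        return text
    truncated = " ".join(words[:max_words])
    # Back off to the last sentence-ending punctuation mark, if any.
    match = re.search(r"^(.*[.!?])", truncated, flags=re.S)
    return match.group(1) if match else truncated

def trim_dialogue(turns, max_words=40):
    """turns: list of (speaker, utterance) pairs from the generator."""
    return [(speaker, trim_utterance(utt, max_words)) for speaker, utt in turns]
```

The prompt-side instruction reduces verbosity at the source; the trim step is only a safety net for turns that overrun the budget anyway.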
7.2 Potential Applications beyond the Video Lecture Context
In the educational context, dialogues can serve multiple roles, extending beyond the mere transmission of factual knowledge. In our user study, several instructors highlighted the adaptability of our dialogue design pipeline, suggesting its potential application in diverse learning contexts, instructional materials, and learning stages. For instance, P7 proposed using our dialogue design pipeline to generate dialogues for learners' review, or to diagnose learners' misconceptions by presenting a dialogue in which the direct learner voices a misconception.
Furthermore, VIVID and its process of transforming lectures into a dyadic format may serve as a valuable active learning tool. Our dialogue design pipeline can be used to formulate questions in dialogue format for learners and to provide interactive guidance for students' self-learning with digital textbooks or in flipped learning settings. Learners can gain a better understanding of complex concepts by analyzing educational content and exploring effective teaching strategies.
7.3 Customizable VIVID for Learners
VIVID is a system that supports instructors in transforming their lecture videos into educational dialogues in text format. Yet, it is important to consider how these dialogues can be seamlessly incorporated into the video learning environment (VLE) to enrich learners' experiences and optimize learning outcomes. Text-format dialogue can be integrated into the VLE by delivering it in voice and text modes together, leveraging the VLE's multi-modality. For instance, the dialogue can be converted into human-like speech and played alongside the corresponding lecture clip, replacing the original explanations. Furthermore, vicarious learners can simultaneously explore multimodal dialogue that incorporates the lecture's formulas within a chat-like interface.
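One possible integration path can be sketched as follows, under assumptions not in the paper (the cue structure and proportional timing are ours): distribute the dialogue turns across the source clip's time range in proportion to word count, producing subtitle-style cues that a TTS engine or chat overlay could consume in place of the original explanation.

```python
from dataclasses import dataclass

@dataclass
class Cue:
    start: float   # seconds from the start of the video
    end: float
    speaker: str
    text: str

def align_dialogue(turns, clip_start, clip_end):
    """Distribute (speaker, text) turns across the lecture clip's time
    range, giving each turn a slot proportional to its word count."""
    total_words = sum(len(text.split()) for _, text in turns) or 1
    duration = clip_end - clip_start
    cues, cursor = [], clip_start
    for speaker, text in turns:
        share = len(text.split()) / total_words * duration
        cues.append(Cue(cursor, cursor + share, speaker, text))
        cursor += share
    return cues
```

Real speech synthesis would instead derive each slot from the synthesized audio's duration; proportional word count is only a rough stand-in for that timing.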
While VIVID, designed for instructors, relies on the instructor's assumptions about the level of the vicarious learner group, it is limited in incorporating teaching strategies, such as transfer learning (DR2 in Section 4.2) and dialogue personalization, that demand each vicarious learner's data, such as prior knowledge, personal background, and current understanding. We believe VIVID can be extended to collect data from vicarious learners through a multi-modal representation of vicarious dialogue. This would enable customized modeling of direct learners, effective transfer learning, and personalization for vicarious learners. For instance, data for generating personalized dialogue could be collected by asking learners to click on challenging elements, such as formulas or explanations, within a lecture as they watch it. Therefore, future work should expand VIVID to include learners and evaluate dialogues against learner-centered criteria, such as engagement and learning gain.
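The click-based collection idea above can be sketched as a simple aggregation: each click on a lecture element is logged with the concept it belongs to, and normalizing the counts yields a per-concept difficulty profile that a personalized dialogue generator could condition on. The element identifiers and concept labels here are hypothetical; a real system would log them from the video player.

```python
from collections import Counter

def build_difficulty_profile(click_log):
    """click_log: list of (element_id, concept) pairs recorded while a
    learner watches the lecture. Returns a concept -> weight mapping,
    where a higher weight indicates more reported confusion."""
    counts = Counter(concept for _, concept in click_log)
    total = sum(counts.values()) or 1
    return {concept: n / total for concept, n in counts.items()}
```

Such a profile could then be injected into the generation prompt, e.g. by steering the simulated direct learner to ask more questions about the highest-weighted concepts.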
7.4 Generalizability of VIVID
Even when different lectures cover the same concept, variables such as material modality, delivery style, and language affect how a learner perceives and understands new knowledge. We found that instructors tend to adjust the dialogue to fit their own teaching style when the teaching style in the lecture differs from their preference. To enable instructors to use lecture videos of any teaching style and match them with their intended outcomes, a dialogue-conversion solution should include a preprocessing step for scripting before the Initial Generation phase. To design dialogues from lectures with varying teaching styles, VIVID needs to preprocess the lecture material to isolate core concepts, capture the instructor's intention, and transform the knowledge into a personalized format that matches the user's preferred teaching style.
Moreover, it is important to determine which lecture segments and lengths are suitable for a dialogue style. As P3 noted, certain content or subjects may be more suitable for dialogue formats that help learners better understand relatively complex concepts or examples. Further, our technical evaluation showed that dialogue generation improved to varying degrees depending on the subject matter. Thus, the advantages of the dialogue format can be amplified by understanding these subject-dependent effects and reflecting them in dialogue design.
7.5 Limitations and Future Work
We acknowledge several limitations in our current study. First, the knowledge progression of the direct learner in the dialogue was not always monotonic in VIVID. VIVID did not consider prerequisite relationships when creating diverse dialogues (Section 5.1.2), so some dialogues depicted direct learners initially understanding a concept but later appearing to lack that understanding. Thus, the knowledge-state setting pipeline needs to be redesigned to maintain consistent knowledge levels and prevent reverse progression. Second, our experiments involved instructors designing dialogues for only a single segment within a lecture. However, the generated dialogues are influenced by factors such as the length of the selected segment, the type of content, and the subject. To explore VIVID's use cases more deeply, it is necessary to conduct experiments under a more diverse set of conditions.