Leveraging Pedagogical Theories to Understand Student Learning Process with Graph-based Reasonable Knowledge Tracing

Jiajun Cui (ORCID 0000-0001-5900-7643, cuijj96@gmail.com), East China Normal University, Shanghai, China; Hong Qian (ORCID 0000-0003-2170-5264, hqian@cs.ecnu.edu.cn), East China Normal University, Shanghai, China; Bo Jiang (ORCID 0000-0002-7914-1978, bjiang@deit.ecnu.edu.cn), East China Normal University, Shanghai, China; and Wei Zhang (ORCID 0000-0001-6763-8146, zhangwei.thu2011@gmail.com), East China Normal University, Shanghai, China
(2024)
Abstract.

Knowledge tracing (KT) is a crucial task in intelligent education, focusing on predicting students’ performance on given questions to trace their evolving knowledge. The advancement of deep learning in this field has led to deep-learning knowledge tracing (DLKT) models that prioritize high predictive accuracy. However, many existing DLKT methods overlook the fundamental goal of tracking students’ dynamic knowledge mastery. These models either do not explicitly model the knowledge mastery tracing process or yield unreasonable results that educators find difficult to comprehend and apply in real teaching scenarios. In response, our research conducts a preliminary analysis of mainstream KT approaches to highlight and explain such unreasonableness. We introduce GRKT, a graph-based reasonable knowledge tracing method, to address these issues. By leveraging graph neural networks, our approach delves into the mutual influences of knowledge concepts, offering a more accurate representation of how knowledge mastery evolves throughout the learning process. Additionally, we propose a fine-grained, psychologically grounded three-stage modeling process consisting of knowledge retrieval, memory strengthening, and knowledge learning/forgetting, to conduct a more reasonable knowledge tracing process. Comprehensive experiments demonstrate that GRKT outperforms eleven baselines across three datasets, not only enhancing predictive accuracy but also generating more reasonable knowledge tracing results. This makes our model a promising advancement for practical implementation in educational settings. The source code is available at https://github.com/JJCui96/GRKT.

knowledge tracing, student behavior modeling, data mining, pedagogical theory, reasonable knowledge tracing
Journal year: 2024. Copyright: ACM licensed. Conference: Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD ’24), August 25–29, 2024, Barcelona, Spain. ISBN: 979-8-4007-0490-1/24/08. DOI: 10.1145/XXXXXX.XXXXXX. CCS concepts: Computing methodologies, Neural networks; Applied computing, Education; Information systems, Data mining.

1. Introduction

In personalized learning, Knowledge Tracing (KT) is crucial for tracking students’ evolving knowledge mastery based on their historical question responses (Wu et al., 2024; Corbett and Anderson, 1994). Early researchers addressed this challenge by leveraging the monotonicity assumption (Embretson and Reise, 2013), linking better mastery of one knowledge concept (KC) to a higher probability of correctly answering related questions. They trained models to predict student responses on given questions, proposing typical machine learning-based KT methods (Corbett and Anderson, 1994; Pardos and Heffernan, 2011). Consequently, predicting student performance became the primary task, with prediction accuracy as the mainstream metric for evaluating KT models, promoting the emergence of deep learning knowledge tracing (DLKT) methods. However, many DLKT approaches prioritize prediction ability over the fundamental objective of knowledge tracing, sometimes forgoing tracing altogether (Choi et al., 2020; Ghosh et al., 2020). Others use internal network weights to represent knowledge mastery (Yin et al., 2023; Shen et al., 2021), but struggle to construct meaningful tracing results because of the low interpretability and reasonability of deep neural network structures: hidden neurons in these networks adaptively learn from data without explicit meaning (Guidotti et al., 2018). It is worth noting that the cognitive diagnosis task also assesses knowledge mastery but usually focuses on static testing rather than the dynamic learning process (Liu et al., 2021; Leighton and Gierl, 2007). Therefore, we do not delve into it in this paper.

Figure 1. Illustration of a student’s evolving knowledge mastery while answering ten questions, traced by two DLKT models, along with an assumed ideal tracing result. The student is sampled from the ASSIST12 dataset, introduced in Section 5.1.1.

Figure 1 illustrates the dynamic knowledge mastery of an example student as traced by two DLKT models: DKT (Piech et al., 2015) and LPKT (Shen et al., 2021). DKT is a pioneering approach that directly applies recurrent neural networks (RNNs) to the KT task. In this case, when the student responds to the initial four questions related to the blue KC Calculations with Similar Figures, their knowledge mastery of the unrelated green KC Ordering Integers increases, presenting an unreasonable outcome. Furthermore, a correct response to the sixth question results in a contrary decrease in its corresponding KC’s mastery, demonstrating an inconsistent change in direction. LPKT, as a time-aware method, models learning and forgetting processes for more reasonable knowledge tracing. However, it struggles to capture the relation between the yellow KC Area Triangle and the blue KC Calculations with Similar Figures, as evidenced by the decreasing mastery of the blue curve following a correct response to a question on the yellow KC. Both KCs examine students’ calculations involving the base and height of triangles, which suggests an underlying relation. Beneath the figure is a tracing result from an assumed ideal model, which we design based on comprehensive pedagogical effects. As shown, the student’s mastery rises and falls according to their correct/incorrect responses, consistent with the testing effect (Roediger III and Karpicke, 2006). The mastery of the yellow KC would correspondingly increase due to the correct response to the sixth question (the orange KC), according to the transfer of learning (Perkins et al., 1992). In addition, the mastery between responses should also vary due to students’ learning and forgetting behaviors, as modeled by the learning and forgetting curves (Yelle, 1979; Ebbinghaus, 1885).

From this example, we summarize three deficiencies of current DLKT methods in dynamic knowledge tracing reasonability: (i) Mastery change of unrelated KCs - learning one KC affects unrelated KC mastery; (ii) No mastery change of related KCs - learning one KC does not impact related KCs; (iii) Inconsistent mastery change direction - correct answers may decrease KC mastery, and vice versa. These stem from opaque deep neural networks, whose parameters serve the overarching objective of performance prediction. Moreover, many studies use RNNs to model knowledge application and update through the recurrent units’ output and state transition (Shen et al., 2021, 2022; Liu et al., 2019; Piech et al., 2015). This mixes the effects of students answering questions with those of their spontaneous behaviors, leading to confusing tracing results. For example, an incorrect response may strengthen wrong knowledge retrieval and cause a mastery drop for the related KC, but when students receive feedback and learn from their errors, they can still make net progress. Such fine-grained changes in knowledge mastery are not captured. To address these issues, we introduce GRKT, a Graph-based Reasonable Knowledge Tracing method that enhances knowledge tracing reasonability while retaining neural networks’ representational power.

To be specific, we integrate pedagogical theories (Perkins et al., 1992; Yelle, 1979; Ebbinghaus, 1885; Roediger III and Karpicke, 2006) into the KT modeling, dividing the learning process into three distinct stages. (i) The knowledge retrieval stage analyzes how students respond to questions. This stage draws from cognitive psychology (Melton, 1963), viewing learning as encoding, storing, and retrieving memories. When students answer questions, retrieval from memory becomes crucial. We start this stage by retrieving the encoded memory related to the question’s KC and project it into a mastery value. We then compare this value with the question’s difficulty score to predict if the student could correctly answer the question. (ii) The memory strengthening stage focuses on how answering questions impacts students’ knowledge mastery. Here, students strengthen their memory retrieval routes, aligning with the Testing Effect theory (Roediger III and Karpicke, 2006; Kornell et al., 2009). Correct retrievals enhance learning, while incorrect ones reinforce errors. We encode this positive/negative memory strengthening in the knowledge memory of the relevant KC based on whether the question is correctly solved. (iii) The knowledge learning/forgetting stage explores what students do after question answering. This stage aims to model the active learning and natural forgetting behaviors based on the Learning curve (Yelle, 1979) and the Forgetting curve (Ebbinghaus, 1885). Both curves suggest a decreasing rate of learning and forgetting over time. Concretely, we first introduce a learning decider to determine whether students will continue learning the KCs just practiced or the KCs for future study. Then, we employ KC-specific time-aware kernels to model the learning/forgetting curves of all involved KCs based on these decisions. 
By applying this three-stage modeling process iteratively across students’ response sequences, we establish a coherent and reasonable knowledge tracing framework. This approach effectively captures mastery changes resulting from question answering and subsequent behaviors, addressing the issue of inconsistent mastery change direction.

To handle the two other issues of mastery change of unrelated KCs and no mastery change of related KCs, we utilize the message passing mechanism of graph neural networks (GNNs) applied to KC relation graphs. This mechanism establishes clear boundaries between related and unrelated KCs. Specifically, changes in knowledge mastery of one KC are propagated through the graph edges to its related KCs within a specific number of hops. From the pedagogical perspective, this message passing aligns with the Transfer of Learning theory (Perkins et al., 1992), which explains humans’ ability to transfer knowledge between similar fields to solve problems and acquire skills. We integrate this understanding into our three-stage learning process modeling using KC relation-based GNNs. For instance, in the first stage of GRKT, instead of solely retrieving knowledge from the target question’s KC, we utilize graph aggregation to synthesize the memory of the KC’s neighbors for solving the question. Similarly, during the second stage, the memory strengthening process involves propagating the gain and loss of knowledge mastery to the KC’s neighbors, and this process is also applied in the third stage’s knowledge learning. Additionally, we exploit the homophily of GNNs to generate similar time-aware kernels for related KCs, effectively modeling their similar learning/forgetting processes. This defines the boundaries between related and unrelated KCs based on the number of hops in GNN operations, effectively addressing challenges associated with mastery changes between different KCs. It’s worth noting that KCs have various types of relations, including prerequisite, similarity, collaboration, remedial, and hierarchy (Gao et al., 2023). In GRKT, we primarily focus on leveraging the two most commonly used relations: prerequisite and similarity.
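The hop-based boundary between related and unrelated KCs described above can be illustrated with a plain breadth-first search. This is only a minimal sketch under stated assumptions: the function name `kcs_within_hops`, the adjacency-dictionary layout, the edges, and the KC `area_trapezoid` are hypothetical and are not taken from GRKT's released code; the other KC names merely echo the example in Figure 1.

```python
from collections import deque

def kcs_within_hops(neighbors, source, max_hops):
    """Return {kc: hop distance} for every KC reachable from `source`
    in at most `max_hops` edges (the source itself is at distance 0)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        kc = queue.popleft()
        if dist[kc] == max_hops:
            continue  # do not expand beyond the hop budget
        for nxt in neighbors.get(kc, ()):
            if nxt not in dist:
                dist[nxt] = dist[kc] + 1
                queue.append(nxt)
    return dist

# Hypothetical toy relation graph (edges are illustrative assumptions).
toy_graph = {
    "similar_figures": ["area_triangle"],
    "area_triangle": ["similar_figures", "area_trapezoid"],
    "area_trapezoid": ["area_triangle"],
    "ordering_integers": [],
}

one_hop = kcs_within_hops(toy_graph, "similar_figures", max_hops=1)
two_hop = kcs_within_hops(toy_graph, "similar_figures", max_hops=2)
```

With one hop, only `area_triangle` can receive a mastery change from `similar_figures`; with two hops, `area_trapezoid` is reached as well, while the disconnected `ordering_integers` is never affected regardless of the hop budget.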

To the best of our knowledge, this work represents the first comprehensive analysis of the reasonability issues in current DLKT methods, and integrates multiple pedagogical theories to address these concerns. The main contributions of this paper are as follows:

  • Motivation. We identify the reasonability issues arising from the widespread adoption of deep learning techniques in the KT task. Many DLKT methods tend to excessively prioritize student performance prediction, often overlooking unreasonable knowledge tracing results due to the inherent interpretability challenges posed by neural networks.

  • Methods. We outline three primary reasonability issues prevalent in current DLKT methods. To address these issues, we introduce GRKT, a graph-based reasonable knowledge tracing method, which establishes a three-stage learning process modeling. Additionally, we utilize the KC relation graph to mitigate mutual effects among KCs. The incorporation of multiple pedagogical theories provides sufficient support for our proposed method.

  • Experiments. Comprehensive experimental results showcase that our GRKT exhibits superior prediction performance and yields reasonable knowledge tracing results when compared to eleven baselines across three widely-used datasets.

2. Related Work

2.1. Reasonable Knowledge Tracing

Early KT methods in machine learning, such as Bayesian Knowledge Tracing (BKT) (Corbett and Anderson, 1994), initially showcased reasonable results due to their transparent and interpretable internal structure. BKT utilizes Hidden Markov Models (HMMs) to probabilistically represent the student learning process. It transitions knowledge mastery and emits probabilities of correct responses, while also considering guessing and slipping behaviors. Subsequent KT methods expanded upon BKT by incorporating additional pedagogical factors such as question difficulty (Pardos and Heffernan, 2011) or prior student information (Yudelson et al., 2013).
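For readers unfamiliar with BKT, the standard textbook form of its per-response update can be sketched as follows. The function and parameter names are illustrative choices, not taken from any cited implementation, and the sketch assumes all probabilities lie strictly between 0 and 1 so that the denominators are non-zero.

```python
def bkt_predict(p_mastery, p_guess, p_slip):
    """Probability of a correct response: mastered and no slip, or unmastered and a lucky guess."""
    return p_mastery * (1.0 - p_slip) + (1.0 - p_mastery) * p_guess

def bkt_update(p_mastery, correct, p_transit, p_guess, p_slip):
    """One BKT step: Bayesian posterior given the observed response,
    followed by the learning transition from unmastered to mastered."""
    if correct:
        numerator = p_mastery * (1.0 - p_slip)
        denominator = numerator + (1.0 - p_mastery) * p_guess
    else:
        numerator = p_mastery * p_slip
        denominator = numerator + (1.0 - p_mastery) * (1.0 - p_guess)
    posterior = numerator / denominator
    return posterior + (1.0 - posterior) * p_transit

# Illustrative parameter values (assumptions, not fitted to any dataset).
after_correct = bkt_update(0.5, True, p_transit=0.1, p_guess=0.2, p_slip=0.1)
after_wrong = bkt_update(0.5, False, p_transit=0.1, p_guess=0.2, p_slip=0.1)
```

Because every quantity is an explicit probability with a fixed meaning, a correct response always raises the posterior mastery relative to an incorrect one under these parameters, which is the kind of transparency the paragraph above attributes to BKT.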

However, traditional methods show inferior prediction performance when compared to subsequent emerging DLKT methods (Piech et al., 2015; Liu et al., 2019; Shen et al., 2022; Pandey and Karypis, 2019; Ghosh et al., 2020; Choi et al., 2020; Cui et al., 2024), which reach high prediction performance due to the power of neural networks. Even so, these DLKT methods fail to produce reasonable knowledge tracing results due to their inherently opaque structures. Efforts have been made to tackle this challenge. Shen et al. (Shen et al., 2021) proposed Learning Process-consistent Knowledge Tracing (LPKT), which utilizes student response duration and interval time to capture learning and forgetting behaviors. However, it only focuses on knowledge learning and forgetting and does not model the interplay of knowledge mastery changes between KCs, limiting its reasonability. Similarly, Yin et al. (Yin et al., 2023) introduced the Diagnostic Transformer (DTransformer), which diagnoses student knowledge mastery from each tackled question and employs a contrastive learning framework to produce more stable knowledge tracing. While this stability enhances reasonability to some extent, its transformer-based structures do not adequately reflect the transition of knowledge mastery between continuous student responses. Therefore, while these approaches improve model reasonability from specific angles, they do not offer a comprehensive method to generate reasonable knowledge tracing results covering both KC relations and continuous learning processes.

We address this gap with our proposed GRKT, which utilizes GNNs to model KC relations and introduces a three-stage learning process to capture evolving knowledge mastery. By integrating these techniques, GRKT achieves high prediction performance while also generating more reasonable knowledge tracing results.

2.2. Graph-based Knowledge Tracing

Graph Neural Networks (GNNs) (Scarselli et al., 2008) serve as an efficient tool to capture intricate relations between instances in real-world scenarios. Their message aggregation and propagation operations on graphs yield deep representations for node features, enhancing performance in various downstream tasks across different domains. In the context of KT, researchers explore various structures to harness the power of GNNs. Nakagawa et al. (Nakagawa et al., 2019) pioneered the incorporation of GNNs into KT by reformulating it as a time-series node-level classification problem based on KC relation graphs. Gan et al. (Gan et al., 2022) leveraged this structure to enhance graph representation learning, generating more informative question and concept embeddings. Besides KC relations, question-question and question-KC relations are also widely considered. For instance, Bi-CLKT (Song et al., 2022) applied contrastive learning to question-KC and KC-KC graphs to generate question embeddings enriched with question and KC structural information. Another work (Yang et al., 2021) leveraged question-KC relations to address question sparsity and multi-skill problems. In our paper, we specifically focus on utilizing GNNs to model the mutual effects of KCs as students apply and update their knowledge, constructing a more reasonable approach to knowledge tracing.

It is worth noting that some other GNN-based or memory-based methods (e.g., GKT (Nakagawa et al., 2019) and DKVMN (Zhang et al., 2017)) also update mastery between KCs. However, their knowledge state updating still relies, in effect, on an erase-followed-by-add mechanism using GRU/LSTM cells, which cannot resolve the reasonability issues; for instance, it does not guarantee a consistent direction of mastery change between KCs.

3. Preliminary

3.1. Task Formulation

Knowledge tracing aims to trace the dynamic evolution of students’ knowledge mastery throughout their learning processes, characterized by their responses to questions. Suppose there are a student set $\mathcal{U}$, a question set $\mathcal{Q}$, and a KC set $\mathcal{C}$. Each student $u\in\mathcal{U}$ has a historical response sequence $\mathcal{H}^{u}=\{r^{u}_{1},r^{u}_{2},\cdots,r^{u}_{|\mathcal{H}^{u}|}\}$, where each response $r^{u}_{t}=\left(q^{u}_{t},a^{u}_{t},c^{u}_{t},T^{u}_{t}\right)$ comprises the involved question $q^{u}_{t}\in\mathcal{Q}$, the correctness $a^{u}_{t}\in\{0,1\}$ (where $a^{u}_{t}=1$ means a correct response), the KC $c_{t}^{u}\in\mathcal{C}$ examined by the question, and the timestamp $T_{t}^{u}$ of the response. It is worth noting that there could be multiple KCs associated with one question. To be concise, we use the notations with just one KC to describe the task setting and the proposed method, but our method is easily extended to the setting of multiple KCs (e.g., averaging the KC representations as mentioned in Section 4.3). The objective is to track and monitor the evolving knowledge mastery of $u$ after each response, $\mathcal{M}^{u}=\{\mathbf{m}^{u}_{1},\mathbf{m}^{u}_{2},\cdots,\mathbf{m}^{u}_{|\mathcal{H}^{u}|}\}$, where $\mathbf{m}^{u}_{t}$ is stacked with $\{m^{u}_{c_{i},t}\mid c_{i}\in\mathcal{C}\}$ and $m^{u}_{c_{i},t}$ signifies the student’s knowledge mastery of the KC $c_{i}$ at time step $t$. A higher value denotes a superior level of mastery. However, the absence of annotated mastery levels forces researchers to resort to the student performance prediction task as a surrogate measure (Liu et al., 2021). In this paradigm, given $\mathcal{H}^{u}$, the objective is to predict whether student $u$ can correctly answer a new question $q^{u}_{|\mathcal{H}^{u}|+1}$, with its associated KC $c^{u}_{|\mathcal{H}^{u}|+1}$ at timestamp $T^{u}_{|\mathcal{H}^{u}|+1}$. This hinges on the monotonicity assumption (Embretson and Reise, 2013), which posits that higher knowledge mastery leads to a higher probability of answering questions correctly. For brevity, we omit the superscript $u$ in the later method description.
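To make the notation concrete, the response tuple and the surrogate prediction task can be laid out as a small data structure. This is a sketch under assumptions: the class name `Response`, the field names, and the helper `next_step_samples` are hypothetical, and pairing every prefix of the history with the following response is a common way to build training samples rather than something the formulation above prescribes.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass(frozen=True)
class Response:
    """One response r_t = (q_t, a_t, c_t, T_t) in the single-KC setting."""
    question: int     # q_t, index into the question set
    correct: int      # a_t, 1 for a correct response and 0 otherwise
    kc: int           # c_t, index into the KC set
    timestamp: float  # T_t, e.g. seconds since the first response

def next_step_samples(history: List[Response]) -> List[Tuple[List[Response], Response]]:
    """Pair each prefix r_1..r_t with the response r_{t+1} whose correctness is to be predicted."""
    return [(history[:t], history[t]) for t in range(1, len(history))]

# Hypothetical three-response history for one student.
history = [
    Response(question=1, correct=1, kc=10, timestamp=0.0),
    Response(question=2, correct=0, kc=10, timestamp=30.0),
    Response(question=3, correct=1, kc=11, timestamp=90.0),
]
samples = next_step_samples(history)
```

Each sample exposes the target's question, KC, and timestamp to the model, while the target's `correct` field serves only as the label.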

Figure 2. The entire framework of GRKT encompasses three recurrent stages: knowledge retrieval, memory strengthening, and knowledge learning/forgetting.

4. Methodology

As shown in Figure 2, GRKT conducts a recurrent modeling within a three-stage learning process: knowledge retrieval, memory strengthening, and knowledge learning/forgetting. The proposed KC relation-based graph neural networks capture knowledge mastery variation between KCs throughout these stages. This section introduces the KC relation-based GNNs first, then explains the three-stage learning process modeling with these GNNs. For ease of understanding GRKT, we list and explain all relevant notations in Appendix A.

4.1. KC Relation-based Graph Neural Networks

Based on the transfer of learning theory (Perkins et al., 1992), we introduce KC relation-based GNNs to transfer the knowledge leveraging and changing throughout the three-stage learning process, as shown in Figure 2. To avoid repetition, we first elaborate on a prototype of KC relation-based GNNs in this section and highlight differences when applied to different stages in the subsequent sections.

Due to the lack of KC relation annotations, we follow previous works (Nakagawa et al., 2019; Song et al., 2022) that construct KC relations based on data statistics. Details are provided in Appendix B. We focus on the two most common relations, prerequisite and similarity, and derive three relation graphs $\mathcal{P},\mathcal{S},\mathcal{R}$, whose edges denote one KC being prerequisite, subsequent, or relevant (similar) to another, respectively. Three graphs are needed because the forward and backward messages passed along the unidirectional prerequisite relation should be differentiated. Based on this, we design the KC relation-based GNNs with multiple layers. They receive KC node features such as knowledge memory, knowledge gain/loss, or knowledge learnt in the three stages, which are introduced later. To capture the graph information, each layer first aggregates, for each graph $\mathcal{G}\in\{\mathcal{P},\mathcal{S},\mathcal{R}\}$, the features of each node’s neighbors from the last layer as

$$\bar{\mathbf{f}}^{\mathcal{G},(l)}_{c_{i}}=\frac{1}{|\mathcal{G}(c_{i})|}\sum_{c_{j}\in\mathcal{G}(c_{i})}\left(\beta^{\mathcal{G}}_{c_{i},c_{j}}\cdot\tilde{\mathbf{f}}^{(l-1)}_{c_{j}}\mathbf{W}^{\mathcal{G},(l)}_{proto}\right), \tag{1}$$
$$\tilde{\mathbf{f}}^{\mathcal{G},(l)}_{c_{i}}=\text{ReLU}\left(\bar{\mathbf{f}}_{c_{i}}^{\mathcal{G},(l)}\right)\mathbf{O}_{proto}^{\mathcal{G},(l)}, \tag{2}$$

where $\mathbf{W}^{\mathcal{G},(l)}_{proto}\in\mathbb{R}^{d_{l-1}\times d_{l-1}}$ and $\mathbf{O}_{proto}^{\mathcal{G},(l)}\in\mathbb{R}^{d_{l-1}\times d_{l}}$ are the learnable weight matrices in this layer. $\mathcal{G}(\cdot)$ is the neighbor function of $\mathcal{G}$. $\text{ReLU}(\cdot)$ is an activation function that introduces non-linearity to enhance the model’s representational capacity. $\beta^{\mathcal{G}}_{c_{i},c_{j}}$ is the correlation score of KCs $c_{i}$ and $c_{j}$ on graph $\mathcal{G}$, obtained by

$$\beta^{\mathcal{G}}_{c_{i},c_{j}}=\sigma\left(\mathbf{k}^{\mathrm{T}}_{c_{i}}\mathbf{W}^{\mathcal{G}}_{cor}\mathbf{k}_{c_{j}}\right). \tag{3}$$
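As a rough illustration of Eqs. (1)-(3) for a single relation graph and a single layer, the following NumPy sketch computes the correlation scores, the score-weighted mean over neighbors, and the ReLU projection. It rests on several assumptions that are not from the paper: the function name `relation_layer`, a dense 0/1 adjacency matrix `A` with `A[i, j] = 1` when $c_j$ is a neighbor of $c_i$, KC embeddings stacked row-wise in `K`, and an all-zero aggregate for KCs without neighbors. The authors' released implementation may differ.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relation_layer(F_prev, K, A, W_cor, W, O):
    """One layer on one relation graph.

    F_prev: (n, d_prev) fused features from the previous layer.
    K:      (n, d_e) KC embeddings, one row per KC.
    A:      (n, n) 0/1 adjacency; A[i, j] = 1 if c_j is a neighbor of c_i.
    W_cor:  (d_e, d_e), W: (d_prev, d_prev), O: (d_prev, d_out).
    """
    beta = sigmoid(K @ W_cor @ K.T)           # Eq. (3): beta[i, j] in (0, 1)
    degree = A.sum(axis=1, keepdims=True)     # |G(c_i)| for every KC
    weighted = (A * beta) @ (F_prev @ W)      # score-weighted sum over neighbors
    # Eq. (1): mean over neighbors; KCs without neighbors keep a zero vector (assumption).
    F_bar = np.divide(weighted, degree, out=np.zeros_like(weighted), where=degree > 0)
    return np.maximum(F_bar, 0.0) @ O         # Eq. (2): ReLU, then output projection

# Hypothetical toy input: 4 KCs, 2 feature dimensions, KC 3 has no neighbors.
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 0],
              [0, 1, 0, 0],
              [0, 0, 0, 0]], dtype=float)
F_prev = np.array([[1.0, 2.0], [3.0, -4.0], [5.0, 6.0], [7.0, 8.0]])
K = np.zeros((4, 2))                          # zero embeddings give beta = 0.5 everywhere
out = relation_layer(F_prev, K, A, np.eye(2), np.eye(2), np.eye(2))
```

In the full model the outputs of the three relation graphs would then be combined as in Eq. (4); that fusion step is omitted here to keep the sketch short.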

Here, $\mathbf{k}_{c_{i}},\mathbf{k}_{c_{j}}\in\mathbb{R}^{1\times d_{e}}$ are the embeddings of the two KCs, where $d_{e}$ is the number of embedding dimensions. $\mathbf{W}^{\mathcal{G}}_{cor}\in\mathbb{R}^{d_{e}\times d_{e}}$ is the trainable matrix for $\mathcal{G}$, and $\sigma(\cdot)$ denotes the sigmoid function, which constrains the score to $(0,1)$. We then fuse the aggregated features from the three graphs by

(4) $\tilde{\textbf{f}}^{(l)}_{c_i}=\begin{cases}\sum_{\mathcal{G}\in\{\mathcal{P},\mathcal{S},\mathcal{R}\}}\tilde{\textbf{f}}_{c_i}^{\mathcal{G},(l)}+\tilde{\textbf{f}}^{(l-1)}_{c_i},&\text{if }d_{l-1}=d_l,\\ \sum_{\mathcal{G}\in\{\mathcal{P},\mathcal{S},\mathcal{R}\}}\tilde{\textbf{f}}_{c_i}^{\mathcal{G},(l)},&\text{if }d_{l-1}\neq d_l,\end{cases}$

where we apply a residual connection (Szegedy et al., 2017) when $d_{l-1}=d_l$ to stabilize the training process. In this prototype, we denote the input features of all KCs as $\tilde{\textbf{F}}^{(0)}\in\mathbb{R}^{|C|\times d_0}$ and one of them as $\tilde{\textbf{f}}^{(0)}_{c_i}\in\mathbb{R}^{1\times d_0}$ for KC $c_i$, and the output features as $\tilde{\textbf{F}}^{(L)}\in\mathbb{R}^{|C|\times d_L}$ and $\tilde{\textbf{f}}_{c_i}^{(L)}\in\mathbb{R}^{1\times d_L}$, where $d_0,d_L$ are the numbers of input and output feature dimensions. Then this prototype GNN is formulated as:

(5) $\tilde{\textbf{F}}^{(L)}=\text{GNN}_{proto}(\tilde{\textbf{F}}^{(0)}|d_0,d_1,\cdots,d_L)$
(6) $\tilde{\textbf{f}}_{c_i}^{(L)}=\text{GNN}_{proto}(\tilde{\textbf{f}}_{c_i}^{(0)}|d_0,d_1,\cdots,d_L).$

This prototype is then extended for different student learning stages to construct reasonable knowledge tracing based on the transfer of learning theory. Besides, the number of layers $L$ controls the number of hops the feature propagates on the graphs, which clarifies the boundary between related and non-related KCs.
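To make the fusion rule of Equation 4 concrete, the following minimal Python sketch sums the features aggregated on each graph and adds the previous layer's features only when the layer dimensions match. The function name `fuse_layer` and the toy numbers are illustrative assumptions, not taken from the GRKT source code, and plain lists stand in for tensors.

```python
def fuse_layer(per_graph_features, previous_features, d_prev, d_curr):
    """Equation 4 for one KC: sum the per-graph aggregated features and
    add the previous layer's features only if the dimensions match."""
    fused = [sum(values) for values in zip(*per_graph_features)]
    if d_prev == d_curr:
        # Residual connection: only possible when both vectors have the same size.
        fused = [f + p for f, p in zip(fused, previous_features)]
    return fused


# Hypothetical aggregated features of one KC on the three graphs (d_l = 2).
per_graph = [[0.1, 0.2], [0.3, 0.0], [0.0, 0.5]]

# Same input and output size: the residual term is added.
with_residual = fuse_layer(per_graph, [1.0, 1.0], d_prev=2, d_curr=2)

# Different sizes (d_{l-1} = 3, d_l = 2): the residual term is skipped.
without_residual = fuse_layer(per_graph, [1.0, 1.0, 1.0], d_prev=3, d_curr=2)

print(with_residual, without_residual)
```

In the stages that follow, every layer uses the same dimension $d_k$, so under this reading the residual branch would always be active there.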

4.2. Knowledge Memory & Knowledge Tracing

GRKT aims to model the process of students retrieving and learning knowledge with their memory. Therefore, we employ a dynamic knowledge memory bank denoted as $\textbf{H}\in\mathbb{R}^{|C|\times d_k}$, where each row $\textbf{h}_{c_i}$ encodes the current knowledge memory of KC $c_i$ for the student. Here, $d_k$ signifies the number of memory dimensions. This memory bank evolves alongside the student’s learning process, represented as $\textbf{H}_t$, with a learnable initial state $\textbf{H}_0$ representing their prior knowledge before engaging in any learning behavior. To track the knowledge mastery of a specific KC, we apply a non-negative projection vector $\textbf{w}_h\in\mathbb{R}^{d_k\times 1}_{\geq 0}$ to $\textbf{h}_{c_i,t}$ using the equation:

(7) $\hat{m}_{c_i,t}=\textbf{h}_{c_i,t}\cdot\textbf{w}_h,$

which yields the mastery of KC $c_i$ at time step $t$. The non-negative constraint on the network weights guarantees a monotonic relationship between mastery and each memory dimension. This technique has been widely adopted in numerous studies (Wang et al., 2022, 2021) to satisfy the monotonicity assumption. Moreover, we leverage this constraint to establish a foundation for reasonable knowledge tracing, which is gradually refined in the subsequent descriptions.
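The monotonicity argument behind Equation 7 can be checked with a small Python sketch: when every entry of the projection vector is non-negative, raising any memory dimension can never lower the mastery. The function name `mastery` and the numbers are illustrative assumptions. How the non-negativity of $\textbf{w}_h$ is enforced during training is not described in this section, so the sketch simply validates it.

```python
def mastery(memory_row, w_h):
    """Equation 7: mastery is the dot product of a KC's memory with w_h."""
    if any(w < 0 for w in w_h):
        raise ValueError("w_h must be non-negative")
    return sum(h * w for h, w in zip(memory_row, w_h))


w_h = [0.5, 0.0, 2.0]      # hypothetical non-negative projection vector
before = [0.2, 0.4, 0.1]   # hypothetical memory of one KC
after = [0.2, 0.9, 0.3]    # no dimension is lower than in `before`

m_before = mastery(before, w_h)
m_after = mastery(after, w_h)

# With non-negative weights, larger memory cannot yield smaller mastery.
print(m_before <= m_after)
```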

4.3. Stage I: Knowledge Retrieval

In this stage, students retrieve stored knowledge from memory to solve given questions, a mechanism explained by memory theory (Melton, 1963). Additionally, the transfer of learning theory (Perkins et al., 1992) suggests that learners transfer knowledge from similar fields to tackle problems. Leveraging this insight, we employ a KC relation-based GNN to model knowledge transfer from related KCs. Specifically, we aggregate the knowledge memory of the given KC $c_t$ to solve its corresponding question $q_t$ before time step $t$ (represented as $t^-$):

(8) $\tilde{\textbf{h}}_{c_t,t^-}^{(L)}=\text{GNN}_{rtv}(\tilde{\textbf{h}}^{(0)}_{c_t,t^-}|\{d_k\}_{L+1})\,,$

with the initialization $\tilde{\textbf{h}}^{(0)}_{c_t,t^-}=\textbf{h}_{c_t,t^-}$. Recognizing that different questions have different mastery requirements of KCs, we incorporate question-KC correlation scores into the aggregation process in this GNN, which are calculated by:

(9) $\alpha_{q_i,c_j}=\sigma\left(\textbf{e}^{\text{T}}_{q_i}\textbf{W}_{req}\textbf{k}_{c_j}\right)\,,$

where $\textbf{e}_{q_i}\in\mathbb{R}^{d_e\times 1}$ and $\textbf{k}_{c_j}$ are the embeddings of $q_i$ and $c_j$, and $\textbf{W}_{req}\in\mathbb{R}^{d_e\times d_e}$ is a learnable matrix. Then, the graph message aggregation process of Equation 8 is actually

(10) $\tilde{\textbf{h}}^{\mathcal{G},(l)}_{c_t}=\frac{1}{|\mathcal{G}(c_t)|}\sum_{c_i\in\mathcal{G}(c_t)}\left(\alpha_{q_t,c_i}\cdot\beta^{\mathcal{G}}_{c_t,c_i}\cdot\tilde{\textbf{h}}^{(l-1)}_{c_i}\textbf{W}^{\mathcal{G},(l)}_{rtv}\right).$
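As a rough illustration of Equations 3, 9 and 10, the sketch below computes the two sigmoid-bounded bilinear scores and then averages the score-weighted, linearly transformed memories of the neighbors of the examined KC on a single graph and layer. All names (`bilinear_score`, `aggregate`, the KC ids `c1` and `c2`) and all numbers are assumptions made for illustration; plain Python lists replace tensors, and returning a zero vector for a KC without neighbors is a choice of this sketch.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def bilinear_score(u, W, v):
    """sigma(u^T W v), the form shared by Equations 3 and 9."""
    Wv = [sum(w * x for w, x in zip(row, v)) for row in W]
    return sigmoid(sum(a * b for a, b in zip(u, Wv)))


def vec_mat(h, W):
    """Row vector times matrix."""
    return [sum(h[i] * W[i][j] for i in range(len(h))) for j in range(len(W[0]))]


def aggregate(neighbors, alpha, beta, memory, W_rtv):
    """Equation 10 for one graph and one layer: mean over the neighbors
    of alpha * beta * (h W)."""
    d_k = len(W_rtv[0])
    if not neighbors:
        return [0.0] * d_k  # assumption of this sketch: no neighbors, no message
    total = [0.0] * d_k
    for c in neighbors:
        message = vec_mat(memory[c], W_rtv)
        for j in range(d_k):
            total[j] += alpha[c] * beta[c] * message[j]
    return [x / len(neighbors) for x in total]


# Hypothetical embeddings (d_e = 2) and memories (d_k = 2).
identity = [[1.0, 0.0], [0.0, 1.0]]
k = {"ct": [1.0, 0.0], "c1": [0.5, 0.5], "c2": [0.0, 1.0]}
e_q = [0.2, -0.3]
memory = {"c1": [0.4, 0.1], "c2": [0.2, 0.6]}
neighbors = ["c1", "c2"]  # neighbors of the examined KC on this graph

beta = {c: bilinear_score(k["ct"], identity, k[c]) for c in neighbors}
alpha = {c: bilinear_score(e_q, identity, k[c]) for c in neighbors}
retrieved = aggregate(neighbors, alpha, beta, memory, identity)
print(retrieved)
```

Because both scores lie in $(0,1)$ and the weight matrix is non-negative, a larger neighbor memory can only increase the aggregated vector in this sketch.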

We also remove the non-linear feed-forward process and restrict $\textbf{W}^{\mathcal{G},(l)}_{rtv}\in\mathbb{R}^{d_k\times d_k}_{\geq 0}$ to ensure that higher values of the related KCs’ memory bring higher knowledge mastery. After obtaining the aggregated knowledge memory from this GNN, we compute the knowledge mastery as in Equation 7 and compare it with the question difficulty $d_{q_t}$ to generate the predictive probability of solving the question:

(11) $\hat{a}_t=\sigma\left(\tilde{\textbf{h}}_{c_t,t}^{(L)}\cdot\textbf{w}_h-d_{q_t}\right).$

For multi-KC questions, we average the KCs’ memory. The difficulty $d_{q_t}$ of question $q_t$ is generated by a Multi-Layer Perceptron (MLP):

(12) $d_{q_t}=\text{ReLU}\left(\bar{\textbf{e}}_{q_t}\textbf{W}^{(1)}_{diff}+\textbf{b}^{(1)}_{diff}\right)\textbf{W}^{(2)}_{diff}+\textbf{b}^{(2)}_{diff}.$

Here, $\bar{\textbf{e}}_{q_t}=[\textbf{k}_{c_t}\oplus\textbf{e}_{q_t}]$ is the concatenation of the embeddings of $q_t$ and its examined KC $c_t$. For multi-KC questions, we use the KCs’ average embedding. $\textbf{W}_{diff}^{(1)}\in\mathbb{R}^{2d_e\times d_h}$, $\textbf{W}_{diff}^{(2)}\in\mathbb{R}^{d_h\times 1}$, $\textbf{b}_{diff}^{(1)}\in\mathbb{R}^{1\times d_h}$, and $\textbf{b}_{diff}^{(2)}\in\mathbb{R}^{1\times 1}$ are learnable matrices and vectors. $d_h$ is the number of hidden dimensions. We denote the process of this two-layer MLP as $d_{q_t}=\text{MLP}_{diff}(\bar{\textbf{e}}_{q_t}|2d_e,d_h,1)$, and a similar notation is applied for brevity in subsequent descriptions. With this, we have modeled the process whereby students retrieve knowledge from memory to answer new questions.
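The prediction step of Equations 11 and 12 can be sketched as follows: a two-layer MLP maps the concatenated KC and question embeddings to a scalar difficulty, and the predicted probability is the sigmoid of mastery minus difficulty. The names `mlp_diff` and `predict`, the tiny dimensions ($d_e=1$, $d_h=2$, $d_k=2$) and all parameter values are assumptions chosen so the arithmetic is easy to follow; they are not trained GRKT weights.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def mlp_diff(e_bar, W1, b1, W2, b2):
    """Equation 12: ReLU hidden layer followed by a linear scalar output."""
    hidden = [
        max(0.0, sum(e_bar[i] * W1[i][j] for i in range(len(e_bar))) + b1[j])
        for j in range(len(b1))
    ]
    return sum(h * w for h, w in zip(hidden, W2)) + b2


def predict(h_retrieved, w_h, difficulty):
    """Equation 11: sigmoid of (mastery - difficulty)."""
    mastery = sum(h * w for h, w in zip(h_retrieved, w_h))
    return sigmoid(mastery - difficulty)


k_c = [0.5]                      # hypothetical KC embedding
e_q = [-0.2]                     # hypothetical question embedding
e_bar = k_c + e_q                # concatenation, size 2 * d_e

W1 = [[1.0, -1.0], [0.5, 0.5]]   # shape (2 * d_e, d_h)
b1 = [0.0, 0.0]
W2 = [1.0, 1.0]                  # shape (d_h, 1), stored flat
b2 = 0.1

difficulty = mlp_diff(e_bar, W1, b1, W2, b2)
probability = predict([1.0, 0.5], [0.5, 0.0], difficulty)
print(difficulty, probability)
```

In this toy setting the mastery equals the difficulty, so the predicted probability sits at the sigmoid midpoint; a harder question with the same mastery lowers it.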

4.4. Stage II: Memory Strengthening

The testing effect theory (Roediger III and Karpicke, 2006) reveals that a correct retrieval strengthens the storage of knowledge in memory, while an unsuccessful retrieval can lead to incorrect strengthening. Without correction or active learning after the error, this may reduce knowledge mastery (Kornell et al., 2009). In this stage, we determine the memory strengthening process based on whether the examined KC is correctly retrieved to solve the question, resulting in either knowledge gain or loss. Additionally, these knowledge changes are propagated to related KCs based on the transfer of learning theory. To enhance memory from a correct response to question $q_t$, we first combine and input the current memory $\textbf{h}_{c_t,t^-}$ of KC $c_t$ and the question information $\bar{\textbf{e}}_{q_t}$ into an MLP to obtain an initial memory feature:

(13) $\textbf{g}_{c_t,t}=\text{MLP}_{gain}\left([\textbf{h}_{c_t,t^-}\oplus\bar{\textbf{e}}_{q_t}]|d_k+2d_e,d_h,d_k\right).$

For multi-KC questions, we calculate the features of all the associated KCs. This feature serves as a spark to propagate knowledge changes via another KC relation-based GNN. Specifically, by initializing an input feature matrix $\tilde{\textbf{G}}^{(0)}_t$, where $\tilde{\textbf{g}}^{(0)}_{c_i,t}=\textbf{g}_{c_t,t}$ if $c_i=c_t$ and $\tilde{\textbf{g}}^{(0)}_{c_i,t}=\textbf{0}$ if $c_i\neq c_t$, the knowledge gain for all KCs is obtained as follows:

(14) $\tilde{\textbf{G}}_t^{(L)}=\text{ReLU}(\text{GNN}_{gain}(\tilde{\textbf{G}}^{(0)}_t|\{d_k\}_{L+1})).$

The $\text{ReLU}(\cdot)$ activation function ensures that the knowledge gain is non-negative. Moreover, due to the zero feature initialization except for the examined KC, the knowledge gain is only propagated to KCs within $L$ hops, delineating a boundary between related and unrelated KCs. Similarly, we derive the negative knowledge loss $\tilde{\textbf{L}}_t^{(L)}$ when students provide incorrect responses and wrongly strengthen their memory, by using a similar network $\text{GNN}_{loss}(\cdot)$.
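The hop boundary described above can be seen in a deliberately simplified Python sketch. It uses scalar features, unit weights, no correlation scores, and mean aggregation with a residual term, and it assumes the layers have no bias terms, so a zero input can only become non-zero by receiving a message. The function name `propagate` and the four-KC chain graph are illustrative assumptions rather than the GRKT architecture itself.

```python
def propagate(features, neighbors, num_layers):
    """Run num_layers rounds of mean aggregation plus a residual term,
    then apply ReLU as in Equation 14."""
    for _ in range(num_layers):
        updated = []
        for i, own in enumerate(features):
            incoming = [features[j] for j in neighbors[i]]
            mean = sum(incoming) / len(incoming) if incoming else 0.0
            updated.append(mean + own)
        features = updated
    return [max(0.0, x) for x in features]


# Hypothetical chain of KCs: c0 - c1 - c2 - c3.
neighbors = [[1], [0, 2], [1, 3], [2]]

# Only the examined KC (c0) starts with a non-zero feature.
initial = [1.0, 0.0, 0.0, 0.0]

gain = propagate(initial, neighbors, num_layers=2)
print(gain)  # c3 is three hops away from c0, so its gain stays zero
```

With two layers, only KCs within two hops of the examined KC receive a non-zero gain; adding a third layer would let the signal reach the last KC of the chain.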

Subsequently, we update the knowledge memory bank with respect to the response $a_t$ as follows:

(15) $\textbf{H}_t=\textbf{H}_{t^-}+a_t\tilde{\textbf{G}}_t^{(L)}+(1-a_t)\tilde{\textbf{L}}_t^{(L)}.$

It is worth noting that different questions also have different effects on strengthening students’ memory of KCs. Therefore, similar to Equation 10, these two GNNs also incorporate the question-KC correlation scores during message passing. Thus, the second stage, memory strengthening, is reasonably modeled based on the testing effect and the transfer of learning.
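The update of Equation 15 amounts to a response-gated addition, which the sketch below spells out for a memory bank stored as a list of rows. The function name `update_memory` and the numbers are illustrative assumptions; the gain matrix is taken to be non-negative and the loss matrix non-positive, matching the description above.

```python
def update_memory(H_prev, gain, loss, a_t):
    """Equation 15: add the gain when a_t = 1 and the loss when a_t = 0."""
    return [
        [h + a_t * g + (1 - a_t) * l for h, g, l in zip(h_row, g_row, l_row)]
        for h_row, g_row, l_row in zip(H_prev, gain, loss)
    ]


# Hypothetical memory bank with two KCs and d_k = 2.
H_prev = [[0.5, 0.5], [0.2, 0.2]]
gain = [[0.1, 0.0], [0.05, 0.0]]    # non-negative knowledge gain
loss = [[-0.2, 0.0], [-0.1, 0.0]]   # non-positive knowledge loss

H_correct = update_memory(H_prev, gain, loss, a_t=1)
H_incorrect = update_memory(H_prev, gain, loss, a_t=0)
print(H_correct, H_incorrect)
```

Since the response is binary, exactly one of the two terms is active at each step.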

4.5. Stage III: Knowledge Learning/Forgetting

After students answer questions, their subsequent actions vary depending on the feedback received. They may review their correct answers or correct their mistakes. In addition, they might prepare for the KC of the next question they will encounter. These active learning behaviors contribute to improving their knowledge mastery, which we model as the knowledge learning process in this stage. Concretely, the KCs of the last question and the next question both influence the student’s learning target. Therefore, we use an MLP to determine whether the student actively learns them based on their current knowledge memory and the information of the involved questions. For KC $c_i\in\{c_t,c_{t+1}\}$ (or more involved KCs for multi-KC questions), the two-dimensional policy distribution is calculated by:

(16) $\pi_{c_i,t}=\text{softmax}\left(\text{MLP}_{dcs}([\textbf{h}_{c_i,t}\oplus\bar{\textbf{e}}_{q_t}\oplus\bar{\textbf{e}}_{q_{t+1}}]|d_k+4d_e,d_h,2)\right).$

Here, $\text{argmax }\pi_{c_i,t}=0$ indicates that the first dimension is larger, in which case we assume there is no active learning. Conversely, $\text{argmax }\pi_{c_i,t}=1$ indicates that the student would learn $c_i$. In this case, we calculate the progress of learning $c_i$ in a similar way:

(17) $\textbf{p}_{c_i,t}=\text{MLP}_{prg}([\textbf{h}_{c_i,t}\oplus\bar{\textbf{e}}_{q_t}\oplus\bar{\textbf{e}}_{q_{t+1}}]|d_k+4d_e,d_h,d_k).$

Based on the transfer of learning theory, this progress is also propagated to related KCs using another KC relation-based GNN. After initializing $\tilde{\textbf{P}}^{(0)}_t$ where $\tilde{\textbf{p}}^{(0)}_{c_i,t}=\textbf{p}_{c_i,t}$ for $c_i\in\{c_t,c_{t+1}\}$ with $\text{argmax }\pi_{c_i,t}=1$, and $\tilde{\textbf{p}}^{(0)}_{c_i,t}=\textbf{0}$ otherwise, we compute

(18) $\tilde{\textbf{P}}_{t}^{(L)}=\text{ReLU}(\text{GNN}_{prg}(\tilde{\textbf{P}}^{(0)}_{t}\,|\,\{d_{k}\}_{L+1})).$
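The initialization and propagation around Equation 18 can be sketched as follows. This is a minimal illustration in plain Python, not the authors' architecture: the helper names, the `neighbors` dictionary, and the simple "add neighbor progress" step standing in for $\text{GNN}_{prg}$ are all assumptions made for the example.

```python
def init_progress(p, learn_mask):
    """Build P^(0): keep rows of KCs selected for learning, zero out the rest."""
    return [row[:] if keep else [0.0] * len(row) for row, keep in zip(p, learn_mask)]

def propagate(p0, neighbors, num_layers=1):
    """Hypothetical stand-in for GNN_prg: add neighbor progress, then apply ReLU.

    neighbors[i] lists the KCs whose progress flows into KC i (assumed structure).
    """
    h = [row[:] for row in p0]
    for _ in range(num_layers):
        h = [
            [h[i][d] + sum(h[j][d] for j in neighbors[i]) for d in range(len(h[i]))]
            for i in range(len(h))
        ]
    # ReLU keeps the propagated progress non-negative, as in Eq. 18.
    return [[max(v, 0.0) for v in row] for row in h]

# Toy example: KC 0 is actively learned, KC 1 is related to KC 0, KC 2 is unrelated.
p = [[1.0, 2.0], [0.5, 0.5], [3.0, 1.0]]
learn_mask = [True, False, False]
neighbors = {0: [], 1: [0], 2: []}
P0 = init_progress(p, learn_mask)
PL = propagate(P0, neighbors)
```

In this toy setting the related KC 1 inherits progress from KC 0, while the unrelated KC 2 keeps zero progress.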

This active learning process continues until the student answers the next question, allowing us to model each KC’s progress $\tilde{\textbf{p}}_{c_{i},t}^{(L)},c_{i}\in\mathcal{C}$ with a KC-specific time-aware kernel function to update:

(19) $\textbf{h}_{c_{i},(t+1)^{-}}=\textbf{h}_{c_{i},t}+\boldsymbol{\phi}_{c_{i}}(\tilde{\textbf{p}}^{(L)}_{c_{i},t},\Delta T_{t+1})$

where $\Delta T_{t+1}=T_{t+1}-T_{t}$ is the time duration until the next question. According to the learning curve (Yelle, 1979), students’ efficiency in learning a specific KC tends to be high initially and gradually decreases with both learning time and frequency. Therefore, we design the kernel function in an exponential form:

(20) $\boldsymbol{\phi}_{c_{i}}(\tilde{\textbf{p}}^{(L)}_{c_{i},t},\Delta T_{t+1})=\tilde{\textbf{p}}^{(L)}_{c_{i},t}\odot(\textbf{1}-\text{exp}(-(n_{c_{i},t}+1)\Delta T_{t+1}\cdot\tilde{\boldsymbol{\gamma}}_{c_{i}}^{(L)})),$

where $\odot$ is the Hadamard product, $n_{c_{i},t}$ is the number of times that $c_{i}$ has been learned by the student, and $\tilde{\boldsymbol{\gamma}}_{c_{i}}^{(L)}$ represents the KC-specific kernel parameters of $c_{i}$ generated by another KC relation-based GNN. It leverages the property of graph homophily, which makes related KCs have similar learning ratios:

(21) $\tilde{\boldsymbol{\gamma}}_{c_{i}}^{(L)}=\text{softplus}(\text{GNN}_{lrn}(\tilde{\boldsymbol{\gamma}}_{c_{i}}^{(0)}\,|\,d_{e},\{d_{k}\}_{L})),$

with the initialization $\tilde{\boldsymbol{\gamma}}_{c_{i}}^{(0)}=\textbf{k}_{c_{i}}$, which is $c_{i}$’s embedding. Here, $\text{softplus}(\cdot)$ is an activation function that restricts the parameters to be positive. On the other hand, for KCs that students have acquired before but do not choose to learn, we introduce the knowledge forgetting process. For the KCs students do not make progress on (i.e., $\tilde{\textbf{p}}^{(L)}_{c_{i},t}=\textbf{0}$), their previously acquired knowledge fades over time:

(22) $\textbf{h}_{c_{i},(t+1)^{-}}=\textbf{h}_{c_{i},t}-\boldsymbol{\kappa}_{c_{i}}(\Delta\textbf{h}_{c_{i},t},\Delta T_{t+1})$

where $\Delta\textbf{h}_{c_{i},t}=\textbf{h}_{c_{i},t}-\textbf{h}_{c_{i},0}$ represents the total knowledge the student has accumulated. According to the forgetting curve (Ebbinghaus, 1885), the speed at which students forget knowledge is initially rapid and then gradually decreases over time and with review frequency. Therefore, we similarly design KC-specific forgetting kernel functions in an exponential form:

(23) $\boldsymbol{\kappa}_{c_{i}}(\Delta\textbf{h}_{c_{i},t},\Delta T_{t+1})=\Delta\textbf{h}_{c_{i},t}\odot(\textbf{1}-\text{exp}(-(n_{c_{i},t}+1)\Delta T_{t+1}\cdot\tilde{\boldsymbol{\theta}}_{c_{i}}^{(L)})),$

where the kernel parameters $\tilde{\boldsymbol{\theta}}_{c_{i}}^{(L)}$ are similarly generated by another KC relation-based GNN:

(24) $\tilde{\boldsymbol{\theta}}_{c_{i}}^{(L)}=\text{softplus}(\text{GNN}_{fgt}(\tilde{\boldsymbol{\theta}}_{c_{i}}^{(0)}\,|\,d_{e},\{d_{k}\}_{L}))$

with the initialization $\tilde{\boldsymbol{\theta}}_{c_{i}}^{(0)}=\textbf{k}_{c_{i}}$. Consequently, based on the learning and forgetting curves, we derive the updated knowledge memory $\textbf{H}_{(t+1)^{-}}$ in this stage, which is recursively used for answering the next question.
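The learning and forgetting updates of Equations 19, 20, 22 and 23 can be illustrated for a single KC with a short plain-Python sketch. The function names and the list-based vector representation are assumptions for illustration; the kernel parameters `gamma` and `theta` are simply passed in as positive lists here rather than produced by a GNN.

```python
import math

def learning_gain(progress, n_learned, delta_t, gamma):
    """Eq. 20: progress * (1 - exp(-(n + 1) * dT * gamma)), elementwise."""
    return [p * (1.0 - math.exp(-(n_learned + 1) * delta_t * g))
            for p, g in zip(progress, gamma)]

def forgetting_loss(acquired, n_learned, delta_t, theta):
    """Eq. 23: acquired * (1 - exp(-(n + 1) * dT * theta)), elementwise."""
    return [a * (1.0 - math.exp(-(n_learned + 1) * delta_t * t))
            for a, t in zip(acquired, theta)]

def update_memory(h, h0, progress, n_learned, delta_t, gamma, theta):
    """Apply Eq. 19 when the KC makes progress, otherwise Eq. 22."""
    if any(v != 0.0 for v in progress):
        gain = learning_gain(progress, n_learned, delta_t, gamma)
        return [x + g for x, g in zip(h, gain)]
    acquired = [x - x0 for x, x0 in zip(h, h0)]  # accumulated knowledge h_t - h_0
    loss = forgetting_loss(acquired, n_learned, delta_t, theta)
    return [x - l for x, l in zip(h, loss)]

# No time elapsed: memory is unchanged.
unchanged = update_memory([1.0], [0.0], [0.0], 0, 0.0, [1.0], [1.0])
# Very long gap with progress: the gain saturates at the full progress value.
learned = update_memory([1.0], [0.0], [2.0], 0, 1e9, [1.0], [1.0])
# Very long gap without progress: memory decays back toward its initial value h_0.
forgotten = update_memory([1.0], [0.2], [0.0], 0, 1e9, [1.0], [1.0])
```

Both kernels share the same saturating form: the learning gain is bounded by the propagated progress, and the forgetting loss is bounded by the knowledge accumulated since $\textbf{h}_{c_{i},0}$.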

Table 1. Statistics of the three preprocessed datasets.
Dataset            ASSIST09  ASSIST12  Junyi
#response          0.4m      2.6m      25.4m
#sequence          7.4k      38.1k     325.4k
#question          13.5k     51.0k     2.8k
#concept           140       198       722
#concept/question  1.22      1.0       1.0
Table 2. Results of the main experiments. The best results among GRKT and the baselines are in bold, and the second-best are in italic. * indicates statistical significance over the best baseline, measured by a t-test with p-value ≤ 0.05. “CONS”, “GAUC” and “RPT” abbreviate the three reasonability metrics: Consistency, GAUCM and Repetition.
Dataset  ASSIST09                                     ASSIST12                                     Junyi
Metric   AUC      ACC      CONS     GAUC     RPT      AUC      ACC      CONS     GAUC     RPT      AUC      ACC      CONS     GAUC     RPT
DKT      0.7695   0.7246   0.6463   0.7172   0.8131   0.7303   0.7358   0.6772   0.6929   0.7955   0.8003   0.8541   0.7432   0.6415   0.8790
DKVMN    0.7680   0.7239   0.8708   0.7116   0.8061   0.7279   0.7349   0.9273   0.6729   0.7971   0.8004   0.8541   0.9455   0.6379   0.8780
DKT+     0.7707   0.7245   0.6364   0.7089   0.8395   0.7300   0.7353   0.6809   0.6766   0.8172   0.7993   0.8539   0.7624   0.6436   0.8869
SAKT     0.7634   0.7206   0.8539   0.7101   0.7749   0.7227   0.7329   0.8202   0.6866   0.7797   0.7995   0.8535   0.8600   0.6387   0.8747
GKT      0.7702   0.7252   0.6697   0.7183   0.8124   0.7339   0.7372   0.7450   0.6971   0.7986   0.8023   0.8547   0.7403   0.6398   0.8788
AKT      0.7820   0.7320   0.5870   0.7113   0.8184   0.7665   0.7514   0.5909   0.6892   0.8172   0.8161   0.8593   0.5810   0.6398   0.8734
SKT      0.7732   0.7273   0.7023   0.7098   0.8092   0.7354   0.7398   0.7813   0.6952   0.7934   0.8045   0.8552   0.7792   0.6420   0.8805
LPKT     0.7869   0.7369   0.7909   0.7124   0.8205   0.7740   0.7556   0.8174   0.6839   0.8255   0.8153   0.8585   0.7238   0.6453   0.8845
DIMKT    0.7814   0.7351   0.7899   0.7153   0.8221   0.7711   0.7550   0.8099   0.6995   0.8198   0.8163   0.8594   0.8945   0.6424   0.8850
DTrans   0.7858   0.7345   0.8928   0.7126   0.8253   0.7720   0.7542   0.9217   0.6863   0.8249   0.8149   0.8577   0.9274   0.6420   0.8893
LBKT     0.7865   0.7372   0.8054   0.7134   0.8225   0.7763   0.7562   0.8123   0.6814   0.8230   0.8140   0.8568   0.8123   0.6409   0.8871
GRKT     0.7914*  0.7398*  1.0000*  0.7209*  0.8486*  0.7794*  0.7576   1.0000*  0.7064*  0.8319*  0.8207*  0.8624*  1.0000*  0.6473*  0.8957*
improv.  0.57%    0.35%    12.01%   0.36%    1.09%    0.40%    0.19%    8.50%    0.98%    0.78%    0.54%    0.35%    7.83%    0.31%    0.72%
Table 3. Results of the ablation experiments.
Dataset  ASSIST09                                     ASSIST12                                     Junyi
Metric   AUC      ACC      CONS     GAUC     RPT      AUC      ACC      CONS     GAUC     RPT      AUC      ACC      CONS     GAUC     RPT
GRKT     0.7914   0.7398   1.0000   0.7209   0.8486   0.7794   0.7576   1.0000   0.7064   0.8319   0.8207   0.8624   1.0000   0.6473   0.8957
-LF      0.7871   0.7367   1.0000   0.7066   0.8243   0.7767   0.7558   1.0000   0.6809   0.8276   0.8170   0.8598   1.0000   0.6401   0.8815
-SIM-PRE 0.7578   0.7161   1.0000   0.6197   0.8246   0.7502   0.7424   1.0000   0.6223   0.8291   0.7921   0.8481   1.0000   0.6084   0.8781
-SIM     0.7896   0.7375   1.0000   0.7135   0.8402   0.7777   0.7564   1.0000   0.6888   0.8259   0.8186   0.8611   1.0000   0.6447   0.8862
-PRE     0.7897   0.7384   1.0000   0.7149   0.8437   0.7779   0.7563   1.0000   0.6915   0.8264   0.8191   0.8615   1.0000   0.6452   0.8897

4.6. Model Training

The three-stage modeling is recurrent along the student response sequence. After learning/forgetting knowledge in the third stage, the updated knowledge memory is ready for the first stage to answer the next question. This makes GRKT trainable end to end, so we directly train the model with the binary cross-entropy loss, aligning the predicted probability $\hat{a}^{u}_{t}$ from Equation 11 with the ground-truth response correctness label $a^{u}_{t}$:

(25) $\mathcal{L}=-\sum_{u\in\mathcal{U}}\sum_{r_{t}^{u}\in\mathcal{H}^{u}}a^{u}_{t}\log\hat{a}^{u}_{t}+(1-a^{u}_{t})\log(1-\hat{a}^{u}_{t}).$

Here, we omit the averaging notation for brevity. In addition, we apply $l_{2}$ regularization to the model parameters during training to avoid over-fitting.
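The training objective in Equation 25 can be sketched in plain Python as below. The data layout (a list of per-student lists of label/probability pairs), the clipping constant `eps`, and the `weight_decay` coefficient are assumptions for illustration; the $l_{2}$ term is assumed to act as a standard squared-norm penalty on the parameters.

```python
import math

def bce_loss(sequences, eps=1e-12):
    """Sum of binary cross-entropy over all students and responses (Eq. 25).

    sequences: list of per-student lists of (label, predicted_probability) pairs.
    """
    total = 0.0
    for responses in sequences:
        for a, a_hat in responses:
            a_hat = min(max(a_hat, eps), 1.0 - eps)  # clip to avoid log(0)
            total -= a * math.log(a_hat) + (1 - a) * math.log(1.0 - a_hat)
    return total

def l2_penalty(params, weight_decay):
    """Hypothetical squared-norm penalty; weight_decay is an assumed coefficient."""
    return weight_decay * sum(w * w for w in params)

# Two students: one with two responses, one with a single response.
loss = bce_loss([[(1, 0.9), (0, 0.2)], [(1, 0.5)]])
penalty = l2_penalty([1.0, 2.0], 0.1)
objective = loss + penalty
```

Dividing `loss` by the number of responses would give the averaged form that the text says is omitted for brevity.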

5. Experiments

In this section, we design comprehensive experiments to address the following research questions:

  • Q1: Does GRKT achieve competitive results in terms of both prediction performance and knowledge tracing reasonability compared to current state-of-the-art DLKT methods?

  • Q2: What are the roles and impacts of different components of GRKT on the overall performance and reasonability?

  • Q3: How reasonable is the knowledge mastery traced by GRKT from an intuitive perspective?

Additionally, we conduct other experiments such as hyper-parameter analysis. Due to space constraints, we include them in Appendix C.4.

5.1. Experimental Setup

5.1.1. Datasets

We evaluate the performance of GRKT on three widely-used public KT datasets: ASSIST09, ASSIST12, and Junyi.

To preprocess each dataset, we partition the response sequence of every student into subsequences of 100 responses each. Subsequences containing fewer than 10 responses are eliminated, while those with fewer than 100 responses are padded with zeros to the required length. Statistics of the processed datasets can be found in Table 1.
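The preprocessing described above can be sketched as follows. This is a minimal plain-Python illustration; the function name and the assumption that real response entries are non-zero (so that zero padding is distinguishable) are not from the paper.

```python
def split_and_pad(responses, max_len=100, min_len=10, pad_value=0):
    """Cut one student's responses into chunks of max_len, drop short chunks, pad the rest."""
    chunks = []
    for start in range(0, len(responses), max_len):
        chunk = responses[start:start + max_len]
        if len(chunk) < min_len:
            continue  # subsequences with fewer than 10 responses are eliminated
        chunk = chunk + [pad_value] * (max_len - len(chunk))
        chunks.append(chunk)
    return chunks

# 205 responses -> chunks of 100, 100 and 5; the last one is dropped.
chunks_205 = split_and_pad(list(range(1, 206)))
# 150 responses -> chunks of 100 and 50; the second is padded with 50 zeros.
chunks_150 = split_and_pad(list(range(1, 151)))
```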

5.1.2. Evaluation

Since KT is a binary classification task of predicting student responses, we use the area under the curve (AUC) and accuracy (ACC) as the evaluation metrics for prediction performance. For evaluating model reasonability, we introduce three metrics:

  • Consistency: We propose this metric to measure the ratio of consistent variation between the mastery of KCs. When a student’s mastery of the corresponding KC declines after answering a certain question, the mastery of other KCs should either decline (for related KCs) or remain unchanged (for unrelated KCs). We calculate this percentage.

  • GAUCM: This metric calculates the average AUC scores with respect to the mastery of each question’s examined KC. It reflects the monotonicity assumption: a question is more likely to be answered correctly if students have higher mastery of its KC. This metric was proposed by Zhang et al. (Zhang et al., 2023).

  • Repetition: This metric is proposed by Yeung et al. (Yeung and Yeung, 2018), stating that a reasonable KT method should satisfy: after a student has finished a question and is given this same question again, the response result (correct or incorrect) should remain the same. We calculate the accuracy under this circumstance.
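As an illustration of the Consistency idea described in the list above, the following plain-Python sketch counts the fraction of steps in which no other KC moves in the opposite direction to the examined KC. This is only one possible reading: the exact formula is given in the paper's appendix and is not reproduced here, and the symmetric treatment of mastery increases, the scalar mastery values, and the tolerance `tol` are assumptions.

```python
def consistency(before, after, target_kc, tol=1e-9):
    """Fraction of steps where no other KC moves against the examined KC's direction.

    before, after : per-step lists of scalar mastery values, one per KC
    target_kc     : per-step index of the KC examined by the answered question
    """
    consistent = 0
    for b, a, k in zip(before, after, target_kc):
        direction = a[k] - b[k]
        ok = True
        for i in range(len(b)):
            if i != k and (a[i] - b[i]) * direction < -tol:
                ok = False  # an opposite-sign change breaks consistency
                break
        consistent += ok
    return consistent / len(target_kc)

# Step 1: both KCs decline (consistent). Step 2: KC 0 declines while KC 1 rises (inconsistent).
before = [[0.5, 0.5], [0.5, 0.5]]
after = [[0.4, 0.45], [0.4, 0.6]]
score = consistency(before, after, target_kc=[0, 0])
```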

The formulas of these metrics are presented in Appendix C.1. Moreover, we employ five-fold cross-validation to assess the model’s performance. In each fold, 10% of the sequences serve as the validation set for parameter tuning. We stop training when the validation performance fails to improve for 10 consecutive epochs.
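The stopping rule described above can be sketched in plain Python. The callback `validate_epoch`, the `max_epochs` cap, and the choice of a higher-is-better validation score are assumptions for illustration; the text does not state which validation metric is monitored.

```python
def train_with_early_stopping(validate_epoch, patience=10, max_epochs=1000):
    """Run epochs until the validation score fails to improve for `patience` epochs.

    validate_epoch(epoch) is a hypothetical callback that trains one epoch and
    returns the validation score (higher is better, e.g. AUC).
    """
    best_score, best_epoch, stale = float("-inf"), -1, 0
    for epoch in range(max_epochs):
        score = validate_epoch(epoch)
        if score > best_score:
            best_score, best_epoch, stale = score, epoch, 0
        else:
            stale += 1
            if stale >= patience:
                break  # 10 consecutive epochs without improvement
    return best_epoch, best_score

# Simulated validation curve: improves until epoch 1, then plateaus.
scores = [0.70, 0.72, 0.71] + [0.71] * 20
best_epoch, best_score = train_with_early_stopping(
    lambda e: scores[e], patience=10, max_epochs=len(scores)
)
```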

5.1.3. Baselines

To compare with mainstream DLKT methods covering different aspects, we select eleven baselines from 2015 to 2023, including DKT (Piech et al., 2015), DKVMN (Zhang et al., 2017), DKT+ (Yeung and Yeung, 2018), SAKT (Pandey and Karypis, 2019), GKT (Nakagawa et al., 2019), AKT (Ghosh et al., 2020), SKT (Tong et al., 2020), LPKT (Shen et al., 2021), DIMKT (Shen et al., 2022), DTransformer (Yin et al., 2023) and LBKT (Xu et al., 2023). Among them, GKT and SKT leverage the KC graph, and LPKT leverages the timestamp information. DKT+, LPKT and DTransformer consider some aspects of model reasonability, namely knowledge tracing stability or learning/forgetting behaviors, but do not comprehensively address the DLKT unreasonableness issue. For the methods that do not provide a proxy for tracing knowledge mastery, AKT and DIMKT, we follow previous works (Cui et al., 2023; Liu et al., 2019) that replace input question features with zeros to estimate the mastery. We note that cognitive diagnosis baselines are not considered because they usually focus on static testing environments (Leighton and Gierl, 2007), whereas we study the dynamic learning situation.

5.1.4. Implementation Details

We employ the Adam optimizer (Kingma and Ba, 2014) for all methods to achieve their best performance. We choose their learning rates from {1e-2, 5e-3, 1e-3, 5e-4, 1e-4} and fix the embedding and hidden dimension numbers at 128 for fairness. We strictly follow the original papers of all methods to set their hyper-parameters. For GRKT, the detailed hyper-parameter settings are given in Appendix C.2. Furthermore, for the non-negative constraint on the specified network weights in Equations 7 and 10, we use the softmax operation along the knowledge memory dimension, which performs best in practice. Besides, the Junyi dataset includes some labeled relations; we experiment with them and present the results in Appendix C.3.
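The softmax-based non-negativity constraint mentioned above can be illustrated with a small plain-Python sketch. The weight layout (one row per output unit, with each row running along the knowledge memory dimension) and the function names are assumptions for illustration, not the authors' implementation.

```python
import math

def softmax(row):
    """Numerically stable softmax over one list of raw weights."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]

def nonneg_weights(raw):
    """Map unconstrained weights to non-negative ones, one softmax per row.

    Each row is assumed to run along the knowledge memory dimension.
    """
    return [softmax(row) for row in raw]

W = nonneg_weights([[-2.0, 0.0, 3.0], [1.0, 1.0, 1.0]])
```

Note that softmax is a stronger constraint than non-negativity alone, since each row also sums to one.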

5.2. Overall Performance (Q1)

Table 2 presents the comprehensive performance comparison between GRKT and the eleven baselines. Notably, GRKT achieves the best results, surpassing the best baselines by margins ranging from 0.19% to 12.01% across both prediction performance and reasonability metrics. For AUC and ACC, which primarily gauge predictive accuracy, the state-of-the-art DLKT techniques LPKT and DIMKT perform strongly owing to their sophisticated neural architectures. Besides, methods that emphasize aspects of reasonability, such as enhancing knowledge tracing stability and explicitly modeling learning and forgetting behaviors (DKT+, LPKT, and DTransformer), demonstrate competitive performance across reasonability metrics. These methods secure seven out of nine second-place positions in reasonability metrics. Remarkably, GRKT achieves a perfect score of 1.0 on the consistency metric, signifying that its network constraints effectively maintain consistency in knowledge mastery changes across KCs.
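The reported margins can be checked against Table 2 with a few lines of Python. Reading the "improv." row as the relative gain over the best baseline is an assumption, but it reproduces the reported values for the entries checked here; the function name is illustrative.

```python
def relative_improvement(ours, best_baseline):
    """Relative gain over the best baseline, as a percentage rounded to two decimals."""
    return round((ours - best_baseline) / best_baseline * 100, 2)

auc_assist09 = relative_improvement(0.7914, 0.7869)   # GRKT vs. LPKT
cons_assist09 = relative_improvement(1.0000, 0.8928)  # GRKT vs. DTrans
acc_assist12 = relative_improvement(0.7576, 0.7562)   # GRKT vs. LBKT
```

The smallest and largest margins quoted in the text correspond to ACC on ASSIST12 and CONS on ASSIST09, respectively.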

Figure 3. Case study of the same student’s evolving knowledge mastery exemplified in Section 1.
Figure 4. Knowledge tracing heatmap of GRKT, LPKT and DKT tracing another student’s mastery on the KC Addition and Subtraction Integers. Different colors represent different KCs.

5.3. Ablation Study (Q2)

The ablation study evaluates the impact of each component in GRKT by removing specific techniques and comparing the results with the full model. Four variants are evaluated:

  • -LF: Removal of the third stage, knowledge learning/forgetting.

  • -SIM: Removal of the similarity relation.

  • -PRE: Removal of the prerequisite relation.

  • -SIM-PRE: Removal of the leverage of KC relation graphs.

As shown in Table 3, GRKT-SIM-PRE experiences the most significant deterioration, emphasizing the crucial role of KC relations in the KT task. When only one of the two relations is utilized (GRKT-SIM or GRKT-PRE), performance improves notably over GRKT-SIM-PRE, indicating that each relation provides meaningful information for GRKT. The performance is further enhanced when both relations are used together. Additionally, the degradation of GRKT-LF underscores the importance of modeling the knowledge learning/forgetting stage.

5.4. Reasonable Knowledge Tracing (Q3)

To intuitively validate the reasonability of GRKT, we present one student’s dynamic knowledge mastery traced by GRKT in Figure 3. As depicted, the result aligns well with our hypothesis of a comprehensive and reasonable knowledge tracing model integrating various effects based on pedagogical theories. Furthermore, it addresses three key issues in the reasonableness of existing DLKT methods: mastery changes of unrelated KCs, no mastery changes of related KCs, and inconsistent mastery change direction. We also present GRKT, LPKT and DKT tracing another student’s mastery on the KC Addition and Subtraction Integers in Figure 4. As shown, GRKT yields reasonable knowledge tracing results, such as the fine-grained knowledge changes from testing effects and the knowledge fading that follows forgetting curves. LPKT and DKT still exhibit reasonability issues, such as mastery changes of unrelated KCs and no mastery changes of related KCs.

5.5. Complexity Analysis

Despite the detailed methodology description of GRKT, its internal composition of only GNNs and MLPs does not make inference complicated. Suppose $t$ is the length of the response sequence, $C$ is the KC set, $E$ is the KC relation edge set, $d$ is the hidden dimension number we set as a small value of 16, and $k$ is GRKT’s memory dimension number. The time complexity of GRKT is then $O(t|E|k+|E|d+t|C|k^{2}+|C|d^{2}+td^{2})$, consisting of feature aggregation $O(t|E|k+|E|d)$ and non-linear feature transformation $O(t|C|k^{2}+|C|d^{2})$ in the GNNs, and $O(td^{2})$ for the MLPs. In contrast, other comparable attention or RNN-based methods usually have time complexity $O(td^{2}+t^{2}d)$. In real scenarios, $t,|C|,d$ usually lie in 100-200 and the KC relation graphs are sparse. Therefore, we can approximately assume $t=d=|C|=k^{2}=n$ and $|E|=k\cdot|C|$ to facilitate the complexity comparison, which indicates that GRKT’s time complexity is in the same order of magnitude $O(n^{3})$ as other methods. We also test the inference speed of GRKT. On average, inference costs 60 ms per student, which is acceptable in practice.
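The simplification above can be verified term by term. Under the stated assumptions $t=d=|C|=k^{2}=n$ (so $k=\sqrt{n}$) and $|E|=k\cdot|C|$, a worked substitution gives:

```latex
\begin{aligned}
t|E|k &= n \cdot (k\,n) \cdot k = n^{2}k^{2} = n^{3}, \\
|E|d &= (k\,n) \cdot n = n^{2.5}, \\
t|C|k^{2} &= n \cdot n \cdot n = n^{3}, \\
|C|d^{2} &= n \cdot n^{2} = n^{3}, \\
td^{2} &= n \cdot n^{2} = n^{3}, \\
td^{2} + t^{2}d &= n^{3} + n^{3} = 2n^{3}.
\end{aligned}
```

The dominant terms on both sides are therefore cubic in $n$, and the only lower-order term, $|E|d = n^{2.5}$, does not change the order of magnitude.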

6. Conclusion

In this paper, we point out that many existing DLKT approaches prioritize predictive accuracy over tracking students’ dynamic knowledge mastery. This often results in models that yield unreasonable outcomes, complicating their application in real teaching scenarios. To this end, our study introduces GRKT, a graph-based reasonable knowledge tracing method. It employs graph neural networks and a finer-grained three-stage modeling process based on pedagogical theories to conduct more reasonable knowledge tracing. Extensive experiments across multiple datasets demonstrate that GRKT not only enhances predictive accuracy but also generates more reasonable knowledge tracing results. In the future, we plan to address certain limitations of GRKT, such as enhancing the model’s ability to handle more fine-grained responses, including multiple-choice or essay answers. Furthermore, we will evaluate GRKT in real teaching scenarios.

References

  • Chang et al. (2015) Haw-Shiuan Chang, Hwai-Jung Hsu, and Kuan-Ta Chen. 2015. Modeling Exercise Relationships in E-Learning: A Unified Approach. In EDM. 532–535.
  • Choi et al. (2020) Youngduck Choi, Youngnam Lee, Junghyun Cho, Jineon Baek, Byungsoo Kim, Yeongmin Cha, Dongmin Shin, Chan Bae, and Jaewe Heo. 2020. Towards an appropriate query, key, and value computation for knowledge tracing. In Proceedings of the seventh ACM conference on learning@ scale. 341–344.
  • Corbett and Anderson (1994) Albert T Corbett and John R Anderson. 1994. Knowledge tracing: Modeling the acquisition of procedural knowledge. User modeling and user-adapted interaction 4 (1994), 253–278.
  • Cui et al. (2023) Jiajun Cui, Zeyuan Chen, Aimin Zhou, Jianyong Wang, and Wei Zhang. 2023. Fine-Grained Interaction Modeling with Multi-Relational Transformer for Knowledge Tracing. ACM Transactions on Information Systems 41, 4 (2023), 1–26.
  • Cui et al. (2024) Jiajun Cui, Minghe Yu, Bo Jiang, Aimin Zhou, Jianyong Wang, and Wei Zhang. 2024. Interpretable Knowledge Tracing via Response Influence-based Counterfactual Reasoning. In Proceedings of the 40th IEEE International Conference on Data Engineering.
  • Ebbinghaus (1885) Hermann Ebbinghaus. 1885. Über das gedächtnis: untersuchungen zur experimentellen psychologie. Duncker & Humblot.
  • Embretson and Reise (2013) Susan E Embretson and Steven P Reise. 2013. Item response theory. Psychology Press.
  • Feng et al. (2009) Mingyu Feng, Neil Heffernan, and Kenneth Koedinger. 2009. Addressing the assessment challenge with an online system that tutors as it assesses. User modeling and user-adapted interaction 19 (2009), 243–266.
  • Gan et al. (2022) Wenbin Gan, Yuan Sun, and Yi Sun. 2022. Knowledge structure enhanced graph representation learning model for attentive knowledge tracing. International Journal of Intelligent Systems 37, 3 (2022), 2012–2045.
  • Gao et al. (2023) Weibo Gao, Hao Wang, Qi Liu, Fei Wang, Xin Lin, Linan Yue, Zheng Zhang, Rui Lv, and Shijin Wang. 2023. Leveraging transferable knowledge concept graph embedding for cold-start cognitive diagnosis. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval. 983–992.
  • Ghosh et al. (2020) Aritra Ghosh, Neil Heffernan, and Andrew S Lan. 2020. Context-aware attentive knowledge tracing. In Proceedings of the 26th ACM SIGKDD international conference on knowledge discovery & data mining. 2330–2339.
  • Guidotti et al. (2018) Riccardo Guidotti, Anna Monreale, Salvatore Ruggieri, Franco Turini, Fosca Giannotti, and Dino Pedreschi. 2018. A survey of methods for explaining black box models. ACM computing surveys (CSUR) 51, 5 (2018), 1–42.
  • Kingma and Ba (2014) Diederik P Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014).
  • Kornell et al. (2009) Nate Kornell, Matthew Jensen Hays, and Robert A Bjork. 2009. Unsuccessful retrieval attempts enhance subsequent learning. Journal of Experimental Psychology: Learning, Memory, and Cognition 35, 4 (2009), 989.
  • Leighton and Gierl (2007) Jacqueline Leighton and Mark Gierl. 2007. Cognitive diagnostic assessment for education: Theory and applications. Cambridge University Press.
  • Liu et al. (2019) Qi Liu, Zhenya Huang, Yu Yin, Enhong Chen, Hui Xiong, Yu Su, and Guoping Hu. 2019. Ekt: Exercise-aware knowledge tracing for student performance prediction. IEEE Transactions on Knowledge and Data Engineering 33, 1 (2019), 100–115.
  • Liu et al. (2021) Qi Liu, Shuanghong Shen, Zhenya Huang, Enhong Chen, and Yonghe Zheng. 2021. A survey of knowledge tracing. arXiv preprint arXiv:2105.15106 (2021).
  • Melton (1963) Arthur W Melton. 1963. Implications of short-term memory for a general theory of memory. Journal of Verbal Learning and Verbal Behavior 2, 1 (1963), 1–21.
  • Nakagawa et al. (2019) Hiromi Nakagawa, Yusuke Iwasawa, and Yutaka Matsuo. 2019. Graph-based knowledge tracing: modeling student proficiency using graph neural network. In IEEE/WIC/ACM International Conference on Web Intelligence. 156–163.
  • Pandey and Karypis (2019) Shalini Pandey and George Karypis. 2019. A Self-Attentive Model for Knowledge Tracing. International Educational Data Mining Society (2019).
  • Pardos and Heffernan (2011) Zachary A Pardos and Neil T Heffernan. 2011. KT-IDEM: Introducing item difficulty to the knowledge tracing model. In User Modeling, Adaption and Personalization: 19th International Conference, UMAP 2011, Girona, Spain, July 11-15, 2011. Proceedings 19. Springer, 243–254.
  • Perkins et al. (1992) David N Perkins, Gavriel Salomon, et al. 1992. Transfer of learning. International encyclopedia of education 2 (1992), 6452–6457.
  • Piech et al. (2015) Chris Piech, Jonathan Bassen, Jonathan Huang, Surya Ganguli, Mehran Sahami, Leonidas J Guibas, and Jascha Sohl-Dickstein. 2015. Deep knowledge tracing. Advances in neural information processing systems 28 (2015).
  • Roediger III and Karpicke (2006) Henry L Roediger III and Jeffrey D Karpicke. 2006. Test-enhanced learning: Taking memory tests improves long-term retention. Psychological science 17, 3 (2006), 249–255.
  • Scarselli et al. (2008) Franco Scarselli, Marco Gori, Ah Chung Tsoi, Markus Hagenbuchner, and Gabriele Monfardini. 2008. The graph neural network model. IEEE transactions on neural networks 20, 1 (2008), 61–80.
  • Shen et al. (2022) Shuanghong Shen, Zhenya Huang, Qi Liu, Yu Su, Shijin Wang, and Enhong Chen. 2022. Assessing Student’s Dynamic Knowledge State by Exploring the Question Difficulty Effect. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval. 427–437.
  • Shen et al. (2021) Shuanghong Shen, Qi Liu, Enhong Chen, Zhenya Huang, Wei Huang, Yu Yin, Yu Su, and Shijin Wang. 2021. Learning process-consistent knowledge tracing. In Proceedings of the 27th ACM SIGKDD conference on knowledge discovery & data mining. 1452–1460.
  • Song et al. (2022) Xiangyu Song, Jianxin Li, Qi Lei, Wei Zhao, Yunliang Chen, and Ajmal Mian. 2022. Bi-CLKT: Bi-graph contrastive learning based knowledge tracing. Knowledge-Based Systems 241 (2022), 108274.
  • Szegedy et al. (2017) Christian Szegedy, Sergey Ioffe, Vincent Vanhoucke, and Alexander Alemi. 2017. Inception-v4, inception-resnet and the impact of residual connections on learning. In Proceedings of the AAAI conference on artificial intelligence, Vol. 31.
  • Tong et al. (2020) Shiwei Tong, Qi Liu, Wei Huang, Zhenya Huang, Enhong Chen, Chuanren Liu, Haiping Ma, and Shijin Wang. 2020. Structure-based knowledge tracing: An influence propagation view. In 2020 IEEE international conference on data mining (ICDM). IEEE, 541–550.
  • Wang et al. (2022) Fei Wang, Qi Liu, Enhong Chen, Zhenya Huang, Yu Yin, Shijin Wang, and Yu Su. 2022. NeuralCD: a general framework for cognitive diagnosis. IEEE Transactions on Knowledge and Data Engineering (2022).
  • Wang et al. (2021) Xinping Wang, Caidie Huang, Jinfang Cai, and Liangyu Chen. 2021. Using knowledge concept aggregation towards accurate cognitive diagnosis. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management. 2010–2019.
  • Wu et al. (2024) Siyu Wu, Yang Cao, Jiajun Cui, Runze Li, Hong Qian, Bo Jiang, and Wei Zhang. 2024. A Comprehensive Exploration of Personalized Learning in Smart Education: From Student Modeling to Personalized Recommendations. arXiv:2402.01666
  • Xu et al. (2023) Bihan Xu, Zhenya Huang, Jiayu Liu, Shuanghong Shen, Qi Liu, Enhong Chen, Jinze Wu, and Shijin Wang. 2023. Learning behavior-oriented knowledge tracing. In Proceedings of the 29th ACM SIGKDD conference on knowledge discovery and data mining. 2789–2800.
  • Yang et al. (2021) Yang Yang, Jian Shen, Yanru Qu, Yunfei Liu, Kerong Wang, Yaoming Zhu, Weinan Zhang, and Yong Yu. 2021. GIKT: a graph-based interaction model for knowledge tracing. In Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2020, Ghent, Belgium, September 14–18, 2020, Proceedings, Part I. Springer, 299–315.
  • Yelle (1979) Louis E Yelle. 1979. The learning curve: Historical review and comprehensive survey. Decision sciences 10, 2 (1979), 302–328.
  • Yeung and Yeung (2018) Chun-Kit Yeung and Dit-Yan Yeung. 2018. Addressing two problems in deep knowledge tracing via prediction-consistent regularization. In Proceedings of the fifth annual ACM conference on learning at scale. 1–10.
  • Yin et al. (2023) Yu Yin, Le Dai, Zhenya Huang, Shuanghong Shen, Fei Wang, Qi Liu, Enhong Chen, and Xin Li. 2023. Tracing Knowledge Instead of Patterns: Stable Knowledge Tracing with Diagnostic Transformer. In Proceedings of the ACM Web Conference 2023. 855–864.
  • Yudelson et al. (2013) Michael V Yudelson, Kenneth R Koedinger, and Geoffrey J Gordon. 2013. Individualized bayesian knowledge tracing models. In Artificial Intelligence in Education: 16th International Conference, AIED 2013, Memphis, TN, USA, July 9-13, 2013. Proceedings 16. Springer, 171–180.
  • Zhang et al. (2017) Jiani Zhang, Xingjian Shi, Irwin King, and Dit-Yan Yeung. 2017. Dynamic key-value memory networks for knowledge tracing. In Proceedings of the 26th international conference on World Wide Web. 765–774.
  • Zhang et al. (2023) Moyu Zhang, Xinning Zhu, Chunhong Zhang, Wenchen Qian, Feng Pan, and Hui Zhao. 2023. Counterfactual Monotonic Knowledge Tracing for Assessing Students’ Dynamic Mastery of Knowledge Concepts. In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management. 3236–3246.
Table 4. The notation table of GRKT. We omit the superscript of the target student $u$ whose knowledge is to be traced.

Task formulation
$\mathcal{U},\mathcal{Q},\mathcal{C}$  sets of students, questions, KCs
$c_i,c_j,q_i,q_j$  certain KCs, questions
$u,t$  the target student, time step
$\mathcal{H}$  response history of $u$
$r_t$  response of $u$ at $t$
$q_t,c_t$  question and examined KC of $r_t$
$a_t,T_t$  binary correctness and timestamp of $r_t$
$\mathcal{M}$  evolving knowledge mastery of $u$
$\textbf{m}_t$  knowledge mastery of $u$ at $t$
$m_{c_i,t}$  knowledge mastery of $c_i$ of $u$ at $t$

KC relation-based GNN
$\mathcal{P},\mathcal{S},\mathcal{R}$  prerequisite, subsequence, similarity graphs
$\mathcal{P}(\cdot),\mathcal{S}(\cdot),\mathcal{R}(\cdot)$  neighbor functions of $\mathcal{P},\mathcal{S},\mathcal{R}$
$\mathcal{G}$  certain graph in $\mathcal{P},\mathcal{S},\mathcal{R}$
$\mathcal{G}(\cdot)$  neighbor function of $\mathcal{G}$
$L$  number of GNN layers
$\text{GNN}_{proto}$  prototype of KC relation-based GNN
$d_0,d_1,...,d_L$  # of dimensions of prototype GNN's layers
$\tilde{\textbf{f}}_{c_i}^{(0)},\tilde{\textbf{F}}^{(0)}$  prototype input of $c_i$ and all to $\text{GNN}_{proto}$
$\tilde{\textbf{f}}_{c_i}^{(L)},\tilde{\textbf{F}}^{(L)}$  prototype output of $c_i$ and all from $\text{GNN}_{proto}$
$\tilde{\textbf{f}}_{c_i}^{(l)},\tilde{\textbf{F}}^{(l)}$  prototype intermedium of $c_i$ and all of $l^{th}$ layer of $\text{GNN}_{proto}$
$\textbf{W}^{\mathcal{G},(l)}_{proto},\textbf{O}^{\mathcal{G},(l)}_{proto}$  weight matrices of $l^{th}$ layer of $\text{GNN}_{proto}$ for $\mathcal{G}$
$\tilde{\textbf{f}}_{c_i}^{\mathcal{G},(l)}$  prototype intermedium of $c_i$ of $l^{th}$ layer of $\text{GNN}_{proto}$ for $\mathcal{G}$

GRKT basic factors
$\textbf{e}_{q_i},\textbf{e}_{q_t},\textbf{k}_{c_i},\textbf{k}_{c_t}$  embeddings of $q_i,q_t,c_i,c_t$
$\bar{\textbf{e}}_{q_i},\bar{\textbf{e}}_{q_t}$  concatenation of $q_i$ and its KC's embeddings
$\alpha_{q_t,c_j}$  requirement score of $q_t$ requiring $c_j$
$\textbf{W}_{req}$  matrix to calculate requiring scores
$\beta^{\mathcal{G}}_{c_i,c_j}$  correlation score of $c_i$ and $c_j$ for $\mathcal{G}$
$\textbf{W}^{\mathcal{G}}_{cor}$  matrix to calculate correlation scores for $\mathcal{G}$
$\textbf{H}_0$  initial knowledge memory of $u$
$\textbf{H}_{t^-}$  knowledge memory of $u$ at a moment before $t$
$\textbf{H}_t$  knowledge memory of $u$ at $t$
$\textbf{h}_{c_i,t}$  knowledge memory of $c_i$ of $u$ at $t$
$d_e,d_k,d_h$  embedding, memory, and hidden dimensions
$\textbf{w}_h$  vector to project knowledge memory to mastery
$\hat{m}_{c_i,t}$  modeled knowledge mastery of $c_i$ at $t$
$d_{q_t}$  question difficulty of $q_t$
$\text{MLP}_{diff}$  MLP to generate question difficulty
$\textbf{W}_{diff}^{(1)},\textbf{W}_{diff}^{(2)}$  weight matrices in $\text{MLP}_{diff}$
$\textbf{b}_{diff}^{(1)},\textbf{b}_{diff}^{(2)}$  weight vectors in $\text{MLP}_{diff}$
Table 5. The continuing notation table for the three stages.

Stage I: knowledge retrieval
$\text{GNN}_{rtv}$  KC relation-based GNN for knowledge retrieval
$\tilde{\textbf{h}}_{c_i,t^-}^{(0)},\tilde{\textbf{H}}_{t^-}^{(0)}$  memory input of $c_i$ and all to $\text{GNN}_{rtv}$ before $t$
$\tilde{\textbf{h}}_{c_i,t^-}^{(L)},\tilde{\textbf{H}}^{(L)}_{t^-}$  memory output of $c_i$ and all from $\text{GNN}_{rtv}$ before $t$
$\hat{a}_t$  predictive probability of $a_t$

Stage II: memory strengthening
$\text{MLP}_{gain}$  MLP to get memory feature for knowledge gain
$\textbf{g}_{c_t,t}$  memory feature of $c_t$ at $t$ for knowledge gain
$\text{GNN}_{gain}$  KC relation-based GNN for knowledge gain
$\tilde{\textbf{g}}_{c_t,t}^{(0)},\tilde{\textbf{G}}_t^{(0)}$  memory feature input of $c_t$ and all to $\text{GNN}_{gain}$ at $t$
$\tilde{\textbf{g}}_{c_t,t}^{(L)},\tilde{\textbf{G}}_t^{(L)}$  knowledge gain of $c_t$ and all from $\text{GNN}_{gain}$ at $t$
$\text{MLP}_{loss}$  MLP to get memory feature for knowledge loss
$\textbf{l}_{c_t,t}$  memory feature of $c_t$ at $t$ for knowledge loss
$\text{GNN}_{loss}$  KC relation-based GNN for knowledge loss
$\tilde{\textbf{l}}_{c_t,t}^{(0)},\tilde{\textbf{L}}_t^{(0)}$  memory feature input of $c_t$ and all to $\text{GNN}_{loss}$ at $t$
$\tilde{\textbf{l}}_{c_i,t}^{(L)},\tilde{\textbf{L}}_t^{(L)}$  knowledge loss of $c_t$ and all from $\text{GNN}_{loss}$ at $t$

Stage III: knowledge learning/forgetting
$\text{MLP}_{dsc}$  MLP to get policy distribution for active learning
$\pi_{c_i,t}$  policy distribution if $u$ decides to learn $c_i$ at $t$
$\text{MLP}_{prg}$  MLP to get initial knowledge progress
$\textbf{p}_{c_i,t}$  initial knowledge progress of $c_i$ at $t$
$\text{GNN}_{prg}$  KC relation-based GNN for knowledge progress
$\tilde{\textbf{p}}_{c_i,t}^{(0)},\tilde{\textbf{P}}_t^{(0)}$  initial progress input of $c_i$ and all to $\text{GNN}_{prg}$ at $t$
$\tilde{\textbf{p}}_{c_i,t}^{(L)},\tilde{\textbf{P}}_t^{(L)}$  knowledge progress of $c_i$ and all from $\text{GNN}_{prg}$ at $t$
$\Delta T_{t+1}$  time interval between $T_t$ and $T_{t+1}$
$\boldsymbol{\phi}_{c_i}$  KC-specific time-aware kernel for learning $c_i$
$n_{c_i,t}$  # of times $u$ has learnt $c_i$
$\text{GNN}_{lrn}$  KC relation-based GNN to get parameters of $\boldsymbol{\gamma}_{c_i}$
$\tilde{\boldsymbol{\gamma}}_{c_i}^{(0)}$  input feature of $c_i$ initialized as $\textbf{k}_{c_i}$ to $\text{GNN}_{lrn}$
$\tilde{\boldsymbol{\gamma}}_{c_i}^{(L)}$  output parameters of $\boldsymbol{\gamma}_{c_i}$ for $c_i$ from $\text{GNN}_{lrn}$
$\boldsymbol{\kappa}_{c_i}$  KC-specific time-aware kernel for forgetting $c_i$
$\text{GNN}_{fgt}$  KC relation-based GNN to get parameters of $\boldsymbol{\theta}_{c_i}$
$\tilde{\boldsymbol{\theta}}_{c_i}^{(0)}$  input feature of $c_i$ initialized as $\textbf{k}_{c_i}$ to $\text{GNN}_{fgt}$
$\tilde{\boldsymbol{\theta}}_{c_i}^{(L)}$  output parameters of $\boldsymbol{\kappa}_{c_i}$ for $c_i$ from $\text{GNN}_{fgt}$

Appendix A Notation Table

Tables 4 and 5 list and explain the notation used in our methodology.

Appendix B Method Details

B.1. KC Relation Graph Construction

In the absence of KC relation annotations in the datasets, we construct the KC relation graph based on data statistics. For the similarity between KCs $c_i$ and $c_j$, we estimate their similarity score using:

$$sim_{c_i,c_j}=\frac{\sum_{u\in\mathcal{U}}\sum_{r_t^u,r_{t'}^u\in\mathcal{H}^u}I(a^u_t=a^u_{t'},\,c^u_t=c_i,\,c^u_{t'}=c_j)}{\sum_{u\in\mathcal{U}}\sum_{r_t^u,r_{t'}^u\in\mathcal{H}^u}I(c^u_t=c_i,\,c^u_{t'}=c_j)},$$

where $I(\cdot)$ is the indicator function that takes the value 1 if the condition is satisfied and 0 otherwise. This approximates the probability that a student answers questions of $c_i$ and questions of $c_j$ both correctly or both incorrectly, indicating an underlying similarity between the two KCs.

For the prerequisite relationship between $c_i$ and $c_j$, we assume that if $c_i$ is a prerequisite to $c_j$, then answering questions of $c_i$ correctly but questions of $c_j$ incorrectly is more likely than answering questions of $c_i$ incorrectly but questions of $c_j$ correctly. Therefore, we use:

\[
pre_{c_i,c_j}=\frac{\sum_{u\in\mathcal{U}}\sum_{r_t^u,r_{t'}^u\in\mathcal{H}^u}I(a^u_t=1,\,a^u_{t'}=0,\,c^u_t=c_i,\,c^u_{t'}=c_j)}{\sum_{u\in\mathcal{U}}\sum_{r_t^u,r_{t'}^u\in\mathcal{H}^u}I(a^u_t\neq a^u_{t'},\,c^u_t=c_i,\,c^u_{t'}=c_j)},
\]

to approximate the probability that $c_i$ is a prerequisite to $c_j$.

Finally, we set a threshold $\eta$ to determine whether $c_i$ is similar or prerequisite to $c_j$ (by $sim_{c_i,c_j}\geq\eta$ and $pre_{c_i,c_j}\geq\eta$, respectively). Additionally, KC pairs that co-occur fewer than 10 times in the dataset are not considered.
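The graph construction described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the input format (a dict mapping each student to a list of `(kc, correct)` pairs), the function name `build_kc_graphs`, the restriction to ordered pairs of distinct KCs, and the reading of "co-occurrence frequency" as the number of counted response pairs are all hypothetical choices.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_kc_graphs(histories, eta=0.6, min_count=10):
    """Return (similar, prerequisite) sets of directed KC pairs (c_i, c_j).

    histories: dict student -> list of (kc, correct), correct in {0, 1}.
    """
    total = defaultdict(int)     # pairs with c_t = c_i and c_t' = c_j
    agree = defaultdict(int)     # ... and a_t == a_t'
    differ = defaultdict(int)    # ... and a_t != a_t'
    one_zero = defaultdict(int)  # ... and a_t = 1, a_t' = 0

    for responses in histories.values():
        # Per-student counts of correct / incorrect answers for each KC.
        right, wrong = Counter(), Counter()
        for kc, correct in responses:
            (right if correct else wrong)[kc] += 1
        kcs = set(right) | set(wrong)
        # For distinct KCs, pair counts factor into products of these counts.
        for ci, cj in permutations(kcs, 2):
            ri, wi, rj, wj = right[ci], wrong[ci], right[cj], wrong[cj]
            total[ci, cj] += (ri + wi) * (rj + wj)
            agree[ci, cj] += ri * rj + wi * wj
            one_zero[ci, cj] += ri * wj
            differ[ci, cj] += ri * wj + wi * rj

    similar, prerequisite = set(), set()
    for pair, n in total.items():
        if n < min_count:  # assumed reading of the co-occurrence filter
            continue
        if agree[pair] / n >= eta:
            similar.add(pair)
        if differ[pair] and one_zero[pair] / differ[pair] >= eta:
            prerequisite.add(pair)
    return similar, prerequisite
```

Aggregating per-student counts first avoids enumerating every pair of responses explicitly, which keeps the sketch linear in the history length per student and quadratic only in the number of KCs a student has practiced.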

Table 6. Hyperparameter settings of GRKT for the three datasets.
Parameter ASSIST09 ASSIST12 Junyi
$lr$ 5e-3 5e-3 5e-3
$L$ 2 2 2
$d_k$ 16 16 16
$\eta$ 0.6 0.7 0.8
$l_2$ 1e-6 1e-5 1e-5

Appendix C Supplements for Experiments

C.1. Metrics for Reasonability

We formulate the three metrics for model reasonability in this section:

C.1.1. Consistency

This metric measures the ratio of consistent variation between the mastery of KCs:

\[
Consistency=\sum_{u\in\mathcal{U}}\sum_{r_t^u\in\mathcal{H}^u}\frac{\sum_{c_i\in\mathcal{C}}I(m_{c_i,t}^u\geq m_{c_i,t+1}^u)}{\sum_{c_i\in\mathcal{C}}I(m_{c^u_t,t}^u\geq m_{c^u_t,t+1}^u)}. \tag{26}
\]

Here, we omit the averaging operation over students and responses for conciseness. We only consider the case where a student’s mastery of the KC of the current question declines while the mastery of the other KCs does not increase, rather than the case where the current one increases and the others decline, because the latter may be due to natural forgetting.

C.1.2. GAUCM

This metric calculates the average AUC scores with respect to the mastery of each question’s examined KC:

\[
GAUCM=\frac{\sum_{q_i\in\mathcal{Q}}N(q_i)\cdot AUC\left[\{\hat{m}^u_{c^u_t,t}\},\{a^u_t\}\right]^{q^u_t=q_i}_{u\in\mathcal{U},\,r_t^u\in\mathcal{H}^u}}{\sum_{q_i\in\mathcal{Q}}N(q_i)}. \tag{27}
\]

$AUC[\hat{\mathcal{Y}},\mathcal{Y}]_A^B$ denotes the AUC score of the prediction set $\hat{\mathcal{Y}}$ against the ground-truth set $\mathcal{Y}$, given the range $A$ and the condition $B$. $N(q_i)$ is the number of times $q_i$ is answered. For evaluating GRKT, we use the aggregated mastery instead of the single KC’s mastery to calculate AUC, because the transfer of learning theory suggests that students may leverage related KCs to solve questions.
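Equation 27 is a response-count-weighted average of per-question AUC scores. The following Python sketch illustrates that computation under stated assumptions: the record format `(question_id, mastery, correct)`, the function names `auc` and `gaucm`, and the decision to skip questions whose responses are all correct or all incorrect (where AUC is undefined) are hypothetical and not taken from the paper.

```python
from collections import defaultdict


def auc(scores, labels):
    """Pairwise AUC: P(positive scored above negative); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return None  # AUC is undefined when only one class is present
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def gaucm(records):
    """records: iterable of (question_id, mastery, correct in {0, 1})."""
    by_question = defaultdict(lambda: ([], []))
    for question, mastery, correct in records:
        by_question[question][0].append(mastery)
        by_question[question][1].append(correct)

    weighted_sum, weight = 0.0, 0
    for scores, labels in by_question.values():
        value = auc(scores, labels)
        if value is None:
            continue  # assumption: skip questions with a single answer class
        weighted_sum += len(labels) * value  # N(q_i) * AUC for question q_i
        weight += len(labels)
    return weighted_sum / weight if weight else float("nan")
```

The quadratic pairwise AUC keeps the sketch dependency-free; a rank-based implementation would be preferable for large datasets.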

C.1.3. Repetition

This metric supposes that a reasonable KT method should adhere to the following rule: after a student has finished a question and is given the same question again, the response result (correct or incorrect) should remain the same:

\[
Repetition=ACC\left[\{\textbf{KT}(q^u_t\,|\,\{r_{t'}^u\,|\,1\leq t'\leq t\})\},\{a^u_t\}\right]_{u\in\mathcal{U},\,r_t^u\in\mathcal{H}^u}. \tag{28}
\]

$ACC(\cdot)$ denotes the accuracy score, whose notation is similar to that of $AUC(\cdot)$ in Equation 27. $\textbf{KT}(q^u_t\,|\,\{r_{t'}^u\,|\,1\leq t'\leq t\})$ denotes the predicted score of whether $u$ answers $q^u_t$ correctly given his/her past $t$ responses $\{r_{t'}^u\,|\,1\leq t'\leq t\}$, including the response to $q_t^u$ itself.
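The Repetition metric can be sketched as the accuracy of a model that is asked to re-predict each response after having already seen it. In the Python sketch below, the `predict(history, question)` callable stands in for any KT model and is a hypothetical interface, as are the input format and the 0.5 decision threshold used to binarize the prediction score.

```python
def repetition(histories, predict, threshold=0.5):
    """Accuracy of re-predicting each response after it has been observed.

    histories: dict student -> list of (question, correct in {0, 1}).
    predict(history, question) -> predicted probability of a correct answer.
    """
    hits = total = 0
    for responses in histories.values():
        for t, (question, correct) in enumerate(responses):
            # The history includes the response at step t itself (t' <= t).
            score = predict(responses[: t + 1], question)
            hits += int((score >= threshold) == bool(correct))
            total += 1
    return hits / total if total else float("nan")
```

A model that simply echoes the most recent observed answer to the same question scores 1.0 under this sketch, which is the behavior the metric rewards.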

C.2. Hyper-parameter Setting

We provide the hyper-parameter settings in Table 6. From top to bottom, the notations in the left column denote the learning rate ($lr$), the number of GNN layers ($L$), the number of knowledge memory dimensions ($d_k$), the graph construction threshold ($\eta$), and the weight of the $l_2$ regularization term.

Table 7. Comparison of GRKT applied to the Junyi dataset with labeled KC relations (GRKT-L), statistics-based relations (GRKT-S), and no relations (GRKT-0). The two values in the “sparsity” column respectively denote the constructed KC similarity and prerequisite graphs’ sparsity.
Model Sparsity AUC ACC CONS GAUC RPT
GRKT-S 0.171, 0.169 0.8207 0.8624 1.0000 0.6473 0.8957
GRKT-L 0.006, 0.003 0.8108 0.8562 1.0000 0.6423 0.8861
GRKT-0 0.000, 0.000 0.7921 0.8481 1.0000 0.6084 0.8781

C.3. Experimental Results Using Labeled Graph Relations

The Junyi dataset includes KC similarity and prerequisite relations annotated by experts with confidence scores ranging from 1 to 9. We select relations with average scores higher than 5 as graph edges. Table 7 presents the experimental results of GRKT leveraging expert-labeled relations compared with statistics-based relations and no relations. As shown, the graphs built on expert annotations are very sparse, with only 1-2 related KCs per KC on average, which may not reflect real scenarios. Although the results based on expert-labeled relations are inferior to those of the statistics-based version, they still show a noticeable improvement over the version without any relations.
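The edge selection rule for expert-labeled relations can be illustrated with a short Python sketch. The annotation format (one `(kc_a, kc_b, score)` triple per expert rating) and the function name `select_labeled_edges` are assumptions for illustration; the actual layout of the Junyi annotation files may differ.

```python
from collections import defaultdict
from statistics import mean


def select_labeled_edges(annotations, min_avg=5.0):
    """Keep KC pairs whose average expert score is strictly above min_avg.

    annotations: iterable of (kc_a, kc_b, score), score between 1 and 9.
    """
    scores = defaultdict(list)
    for kc_a, kc_b, score in annotations:
        scores[kc_a, kc_b].append(score)
    return {pair for pair, values in scores.items() if mean(values) > min_avg}
```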

Figure 5. Experimental results analyzing the effects of hyper-parameters in GRKT are presented. The green and red decimals on the right side respectively indicate the sparsity of the constructed KC similarity and prerequisite graphs based on the specified threshold.

C.4. Hyper-parameter Analysis

We conduct experiments to analyze the effects of various hyperparameters on GRKT’s performance. The experiments are performed on the two ASSIST datasets, as shown in Figure 5. The results show that setting the number of layers in the KC relation-based graphs to 2 achieves the best performance for GRKT, suggesting that retrieving information from further distances over the graph can enhance the model. However, employing more layers may lead to overfitting issues. For the KC graph construction threshold, the performance peaks at around 0.6 to 0.8. In this interval, the sparsity of the two graphs ranges from 0.01 to 0.3, indicating that too many relations lead to structural redundancy, while too few result in limited information sharing between KCs.