Improving Students' Academic Performance With AI
Yixin Cheng
November 2021
© Yixin Cheng 2021
Except where otherwise indicated, this report is my own original work.
Yixin Cheng
19 November 2021
Acknowledgments
Thanks also to Felipe Cueto-Ramirez, for providing support and advice for my project.
Special thanks to Jinxiao Xie, who has always encouraged me to challenge myself and
made time to help and support me.
Last but not least, I owe a debt of gratitude to my parents and sisters, who provided me with the opportunity to study abroad, and endless love.
Abstract
Artificial intelligence and semantic technologies are evolving and have been applied in various research areas, including the education domain. Higher Education institutions (HEIs) strive to improve students' academic performance. Early intervention for at-risk students and a well-designed curriculum are vital for students' success. Prior research opted for deploying traditional machine learning models to predict students' performance, but existing research on applying deep learning models to such prediction is very limited. In terms of curriculum semantic analysis, after conducting a comprehensive systematic review on the use of semantic technologies in the context of the Computer Science curriculum, a major finding of the study is that the technologies used to measure similarity have limitations in terms of accuracy and ambiguity in the representation of concepts, courses, etc. To fill these gaps, three implementations were developed in this study: to predict students' performance using marks from the previous semester, to model a course representation in a semantic way and compute the similarity between courses, and to identify the prerequisite relationship between two similar courses. Regarding performance prediction, we used a combination of a Genetic Algorithm and Long Short-Term Memory (LSTM) on a dataset from a Brazilian university containing 248,730 records. As for similarity measurement between courses, we deployed Bidirectional Encoder Representations from Transformers (BERT), proposed by Devlin et al. [Devlin et al., 2018], to encode the sentences in the course descriptions from the Australian National University (ANU), and then used cosine similarity to obtain the distance between courses. With respect to prerequisite identification, TextRazor was applied to extract concepts from course descriptions, followed by SemRefD, presented by [Manrique et al., 2019a], to measure the degree of prerequisite dependency between two concepts. The outcomes of this study can be summarized as follows: (i) a result that improves Manrique's work [Manrique et al., 2019b] by 2.5% in terms of accuracy in dropout prediction; (ii) uncovering the similarity between courses based on their descriptions; and (iii) identifying the prerequisite relationships among three compulsory courses of the School of Computing at ANU. In the future, these technologies could potentially be used to identify at-risk students, analyze the curriculum of university programs, aid student advisors, and create recommendation systems.
Contents
Acknowledgments iii
Abstract v
1 Introduction 1
1.1 Problem Statement and Motivations . . . . . . . . . . . . . . . . . . . . . 1
1.2 Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.4 Report Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
3 Approaches 11
3.1 Dropout Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
3.1.1 Procedure Overview . . . . . . . . . . . . . . . . . . . . . . . . . . 11
3.1.2 Data Pre-processing . . . . . . . . . . . . . . . . . . . . . . . . . . 11
3.1.3 Feature Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.1.4 Training and Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
3.2 Curriculum Semantic Analysis . . . . . . . . . . . . . . . . . . . . . . . . 22
3.2.1 Procedure Overview . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.2.2 Similarity Measurement . . . . . . . . . . . . . . . . . . . . . . . . 23
3.2.3 Prerequisite Identification . . . . . . . . . . . . . . . . . . . . . . . 24
vii
viii Contents
Bibliography 37
Appendix 1 43
Appendix 2 45
Appendix 3 49
Appendix 4 51
List of Figures

List of Tables
Chapter 1
Introduction
Students' academic performance is not only important for the students themselves, but also for higher education institutions (HEIs), which use it as a basis for measuring the success of their educational programs. A variety of metrics can be used to measure student performance, and dropout rates are notable among them: a reduction in dropout rates could indicate an improvement in students' academic performance. However, according to published figures, each year one-fifth of first-year undergraduates across Australia drop out of their degrees [Shipley and Ian, 2019]. In Brazil, it is estimated that only 62.4% of university enrolments succeed in obtaining an undergraduate degree [Sales et al., 2016]. These are concerning statistics for a country's development and can affect students' lives financially, mentally, and professionally. As a consequence, HEIs and researchers have shown an increasing interest in prediction systems to identify students at risk of dropping out [Manrique et al., 2019a]. For example, Manrique et al. created multiple feature sets from student data to predict whether a student would drop out using machine learning algorithms [Manrique et al., 2019a]; also, Diaz-Mujica et al. pointed out that path analysis was useful for predicting student dropout based on affective and cognitive variables [Diaz-Mujica et al., 2019].
According to Vergel et al., curriculum design may affect students' performance and retention. A careful design is required to "understand the role of curriculum design in dropout rates and unmask how the political dynamics embedded in the curriculum influence students' retention" [Vergel et al., 2018]. Moreover, Drennan, Rohde, Johnson, and Kuennen claimed that students' academic performance in a course is related to their performance in its prerequisite courses [Drennan and Rohde, 2002], [Johnson and Kuennen, 2006]. It can be seen that the curriculum plays a crucial role in student performance as well as in decreasing dropout rates.
The main motivation for this research is to improve students' academic performance based on the analysis of the curriculum and of student academic performance in different courses. Identifying prerequisites for a course is key to enhancing student experience, improving curriculum design, and increasing graduation rates.
1.2 Objectives
The main objective of this study is to apply Artificial Intelligence and Semantic technologies to analyse and enable changes that may improve student academic performance.
The goal of this research project is threefold:
• to predict students' performance using marks from the previous semester;
• to model a course representation in a semantic way and compute the similarity between courses; and
• to identify the prerequisite relationship between two similar courses.
To accomplish these goals, a systematic review was carried out to identify which technologies are being used in the CS curriculum.
1.3 Contributions
The main contributions of this thesis can be divided into:
• an approach for student dropout prediction using a Genetic Algorithm (GA) and Long Short-Term Memory (LSTM);
• an approach for measuring the semantic similarity between courses using BERT and cosine similarity; and
• an approach for identifying prerequisite relationships between courses using concept extraction and SemRefD.
Chapter 2
Background and Related Work
This chapter is divided into two sections: background and related work. The following section briefly introduces the basic concepts utilized in this research. In the related work section, we present a systematic review of curriculum analysis using Artificial Intelligence and Semantic Web technologies.
2.1 Background
Artificial Intelligence refers to the development of machines or computers that simulate or emulate functions of the human brain. The functions studied differ according to the area of study. Prolog, for example, is a programming language grounded in formal logic: it applies mathematical reasoning to build systems that can derive relevant conclusions from a set of statements. Intelligent Agents are another example. Unlike traditional agents, Intelligent Agents are designed to take actions optimized to achieve a specific goal, making decisions based on their perception of the environment and internal rules. This study explores four growing fields: Machine Learning, Deep Learning, Natural Language Processing, and the Semantic Web.
Machine Learning is a field of study that uses algorithms to detect patterns in data and to predict or make useful decisions. Different machine learning algorithms can be implemented in innumerable scenarios, and each is optimized for specific kinds of data sets; a variety of algorithms is therefore used to build models for different use cases. Using machine learning algorithms, we predict students' performance in university courses based on their performance in previous courses.
Deep Learning refers to neural networks with multiple layers of perceptrons. Neural networks try to imitate human brain activity, albeit far from matching its capabilities, which enables them to learn from high volumes of data. Additional hidden layers can help to refine and optimize a neural network for accuracy beyond what a single layer can achieve. Recurrent Neural Networks (RNNs) are a special type of neural network that combines information from previous time steps to generate updated outputs.
Natural Language Processing (NLP) is the field of Artificial Intelligence concerned with enabling computers to process, analyze, and derive meaning from human language. NLP techniques link linguistic structure, context, and patterns learned from large volumes of text to particular outcomes, such as classification, similarity measurement, and information extraction.
The Semantic Web is an extension of the World Wide Web through standards set by the W3C (World Wide Web Consortium), which standardizes the process of making information on the World Wide Web machine-readable. To accomplish this goal, a series of communication standards were created that enable developers to describe concepts and entities, as well as their classification and relationships. The Resource Description Framework (RDF) and the Web Ontology Language (OWL) have enabled developers to create systems that store and use complex knowledge bases known as knowledge graphs.
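To make this concrete, the following minimal sketch (illustrative names only, using the rdflib library; the EX namespace and course URIs are assumptions, not part of this thesis) shows how RDF triples can describe courses and a prerequisite relationship of the kind analysed later:

```python
# Minimal sketch: RDF triples describing courses and a prerequisite link.
# All names (EX namespace, property names) are illustrative assumptions.
from rdflib import Graph, Literal, Namespace, RDF, RDFS

EX = Namespace("http://example.org/curriculum/")

g = Graph()
g.bind("ex", EX)

# Declare two course entities, a label, and a prerequisite relationship.
g.add((EX.COMP1100, RDF.type, EX.Course))
g.add((EX.COMP1100, RDFS.label, Literal("Programming as Problem Solving")))
g.add((EX.COMP2100, RDF.type, EX.Course))
g.add((EX.COMP2100, EX.hasPrerequisite, EX.COMP1100))

print(g.serialize(format="turtle"))
```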
2.2.1 Methodology
This systematic review was conducted following the methodology defined by Kitchen-
ham and Charters [BA and Charters, 2007]. The method is composed of three main
steps: planning, conducting and reporting.
The planning stage helps to identify existing research in the area of interest as
well as build the research question. Our research question was created based on
the PICOC method [BA and Charters, 2007] and used to identify keywords and
corresponding synonyms to build the search string to find relevant related works.
The resulting search string is given below.
("computer science" OR "computer engineering" OR "informatics") AND ("curriculum"
OR "course description" OR "learning outcomes" OR "curricula" OR “learning objects”)
AND ("semantic" OR "ontology" OR "linked data" OR "linked open data")
To include a paper in this systematic review, we defined the following inclusion
criteria: (1) papers must have been written in English; (2) papers must be exclusively
related to semantic technologies, computer science and curriculum; (3) papers must
have 4 or more pages (i.e., full research papers); (4) papers must be accessible online;
and, finally, (5) papers must be scientifically sound, present a clear methodology and
conduct a proper evaluation for the proposed method, tool or model.
Figure 2.1 shows the paper selection process in detail. Initially, 4,510 papers were
retrieved in total. We used the ACM Digital Library1 , IEEE Xplore Digital Library2 ,
1 https://dl.acm.org/
2 https://ieeexplore.ieee.org/
2.2.2 Tools/Frameworks
Protégé8 [Imran and Young, 2016] is a popular open-source editor and framework for
ontology construction, which several researchers have adopted to design curricula
in Computer Science [Tang and Hoh, 2013], [Wang et al., 2019], [Nuntawong et al.,
2017], [Karunananda et al., 2012], [Hedayati and Mart, 2016], [Saquicela et al., 2018],
[Maffei et al., 2016], and [Vaquero et al., 2009]. Despite its wide adoption, Protégé still
presents limitations in terms of the manipulation of ontological knowledge [Tang and
Hoh, 2013]. As an attempt to overcome this shortcoming, Asoke et al. [Karunananda
et al., 2012] developed a curriculum design plug-in called OntoCD. OntoCD allows
curriculum designers to customise curricula by loading a skeleton curriculum and a
benchmark domain ontology. The evaluation of OntoCD is performed by developing
a Computer Science degree program using benchmark domain ontologies developed
in accordance with the guidelines provided by IEEE and ACM. Moreover, Adelina
and Jason [Tang and Hoh, 2013] used Protégé to develop SUCO (Sunway University Computing Ontology), an ontology-specific Application Programming Interface (API) for a curricula management system. They claim that, in response to the shortcomings of the Protégé platform, SUCO shows a higher level of ability to manipulate and extract knowledge and functions effectively if the ontology is processed as an eXtensible Markup Language (XML) document.
Other specific ontology-based tools for curriculum management have also been developed. CDIO [Liang and Ma, 2012] is an example of such a tool. CDIO was created to automatically adapt a given curriculum according to teaching objectives and teaching content, based on the constructed ontology. A similar approach was used by Maffei et al. [Maffei et al., 2016] to model the semantics behind functional alignments in order to design, synthesize, and evaluate functional alignment activities. In Mandić's study, the author presented a software platform9 for comparing chosen curricula for information technology teachers [Mandić, 2018]. In Hedayati's work, the authors used the curriculum Capability Maturity Model (CMM), a taxonomical model for describing an organization's level of capability in the domain of software engineering [Paulk et al., 1993], as a reference model in an ontology-driven approach for analyzing the development process of the vocational ICT curriculum in the context of the culturally sensitive curriculum in Afghanistan [Hedayati and Mart, 2016].
2.2.3 Datasets
There are several datasets used in curriculum design studies in Computer Science.
The open-access CS201310 dataset is the result of the joint development of a com-
puting curriculum sponsored by the ACM and IEEE Computer Society [Piedra and
Caro, 2018]. The CS2013 dataset has been used in several studies [Piedra and Caro,
2018], [Aeiad and Meziane, 2016], [Nuntawong et al., 2016], [Nuntawong et al., 2017],
8 https://protege.stanford.edu/
9 http://www.pef.uns.ac.rs/InformaticsTeacherEducationCurriculum
10 https://cs2013.org/
§2.2 Related Work 7
[Karunananda et al., 2012], [Hedayati and Mart, 2016], and [Fiallos, 2018] to develop
ontologies or as a benchmark curriculum in similarity comparison between computer
science curricula.
Similar to CS2013, the Thailand Qualification Framework for Higher Education (TQF: HEd) was developed by the Office of the Thailand Higher Education Commission to be used by all higher education institutions (HEIs) in Thailand as a framework to enhance the quality of course curricula, including the Computer Science curriculum. TQF: HEd was used as a guideline for ontology development in the following studies: [Nuntawong et al., 2017], [Nuntawong et al., 2016], [Hao et al., 2008], and [Nuntawong et al., 2015].
Other studies use self-created datasets (e.g., [Wang et al., 2019], [Maffei et al., 2016], [Hedayati and Mart, 2016], and [Fiallos, 2018]). Specifically, in Wang's work, the Ontology System for the Computer Course Architecture (OSCCA) was proposed based on a dataset created using course catalogs from top universities in China as well as online education websites [Wang et al., 2019]. In Maffei's study, the authors experimented with and evaluated the proposal based on the Engineering program at the KTH Royal Institute of Technology in Stockholm, Sweden [Maffei et al., 2016]. In Guberovic et al.'s work, the dataset used for comparing courses comes from the Faculty of Electrical Engineering and Computing and is compared against courses from universities in the United States of America [Guberovic et al., 2018]. In Fiallos's study, the author not only adopted CS2013 for domain ontology modeling, but also collected the core courses of the Computational Sciences program at the Escuela Superior Politécnica del Litoral (ESPOL11) for semantic similarity comparison [Fiallos, 2018].
The Body of Knowledge (BoK) has been used in several studies [Piedra and Caro, 2018], [Nuntawong et al., 2017], [Nuntawong et al., 2016], [Hao et al., 2008], [Karunananda et al., 2012], [Tapia-Leon et al., 2018], [Chung and Kim, 2014], and [Nuntawong et al., 2015]. In Piedra's study [Piedra and Caro, 2018], the BoK defined in the CS2013 ontology was viewed as a description of the content to be covered, with a curriculum implementing this information. Similarly, Nuntawong et al. [Nuntawong et al., 2017] applied the BoK, based on the ontology of TQF: HEd, to conduct ontology mapping.
The Library of Congress Subject Headings14 (LCSH) is a controlled vocabulary maintained by the Library of Congress. The LCSH terminology is a BoK containing more than 240,000 topical subject headings, offering equivalence, hierarchical, and associative relationships between headings. In Adrian's study, the authors create an ontology based on LCSH and the Faceted Application of Subject Terminology15 (a completely enumerative faceted subject terminology schema derived from LCSH) to assess the consistency of an academic curriculum, and apply it to an Information Science curriculum [Barb and Kilicay-Ergin, 2020].
A Knowledge Area (KA), also modeled as an OWL class, is an area of specialization such as Operating Systems or Algorithms. The relationship between BoK and KA is built in various ways across the studies. For example, in Piedra's study [Piedra and Caro, 2018], each BoK class contains a set of KAs. In contrast, Nuntawong et al. [Nuntawong et al., 2017] considered KA to be the superclass of BoK. KA classification was proposed in Orellana et al.'s study [Orellana et al., 2018]. In that paper, the curricula are classified into the KAs defined by UNESCO16, which comprises 9 main areas and 24 subareas of knowledge. To do this, they convert the curricula to a vector space and then apply traditional supervised approaches such as support vector machines and k-nearest neighbors [Orellana et al., 2018]. By classifying the KA, similarity measurement can be applied more easily.
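As an illustration of the described idea (a sketch, not Orellana et al.'s code), curricula text can be converted to a vector space and classified with a traditional supervised model; all data and area labels below are assumptions:

```python
# Illustrative sketch: TF-IDF vectorization of curricula text plus a
# k-nearest-neighbors classifier, in the spirit of the approach described.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# Tiny illustrative corpus; real inputs would be full syllabi/curricula.
curricula = ["operating systems processes scheduling memory",
             "relational databases sql transactions indexing"]
areas = ["Computing", "Computing"]  # UNESCO-style area labels (assumed)

clf = make_pipeline(TfidfVectorizer(), KNeighborsClassifier(n_neighbors=1))
clf.fit(curricula, areas)
print(clf.predict(["file systems and concurrency"]))
```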
similarity metric using WordNet18 to generate a score to compare LOs based on Bloom's taxonomy. Similarly, Mandić [Mandić, 2018] proposed a taxonomic structural similarity between curricula by applying the revised Bloom's taxonomy, which adjusts the cognitive levels of learning.
This section presented some of the works found in the literature that use semantic technologies to help university stakeholders design curricula for Computer Science. The following chapters present the approaches and analysis carried out using previous works as inspiration.
18 https://wordnet.princeton.edu/
Chapter 3
Approaches
This chapter contains two sections. The first section introduces the dropout prediction
task and a series of techniques to predict students’ performance. The second part
presents semantic techniques that can be used to analyze the relationship between
Computer Science courses.
Before introducing the data pre-processing, we first describe the dataset used in this experiment. Figure 3.2 presents a few instances of the dataset.
Figure 3.4: Student distribution by degree
Figure 3.5: Dropout distribution by year
The dataset used in this thesis has been used in previous research [Manrique et al., 2019b] and is provided by a Brazilian university. It contains 248,730 academic records of 5,582 students enrolled in six distinct degrees from 2001 to 2009. The dataset is in Portuguese, and we translate the main terms for better understanding.
The dataset contains 32 attributes in total: cod_curso (course code); nome_curso (course name); cod_hab (degree code); nome_hab (degree name); cod_enfase (emphasis code); nome_enfase (emphasis name); ano_curriculo (curriculum year); cod_curriculo (curriculum code); matricula (student identifier); mat_ano (student enrolment year); mat_sem (semester of the enrolment); periodo (term); ano (year); semestre (semester); grupos (group); disciplina (discipline/course); semestre_recomendado (recommended semester); semestre_do_aluno (student semester); no_creditos (number of credits); turma (class); grau (grades); sit_final (final status: pass/fail); sit_vinculo_atual (current status); nome_professor (professor name); cep (zip code); pontos_enem (national exam marks); diff (difference between semesters and student performance); tentativas (number of attempts in a course); cant (previous course); count (count);
The imputation first fills missing values with 0 in the other columns, then runs the Random Forest (RF) and performs iterations until every missing value has been handled. The procedure of imputation is shown in Algorithm 1. Next, the data are grouped by the "course" and "semester" attributes.
The final step normalizes all the input data except "sit_vinculo_atual" and "semestre" with the z-score method, as the z-score can improve model performance more than other techniques [Imron and Prasetyo, 2020], [Cheadle et al., 2003]. The z-score is computed from a set of values S = {x1, x2, ..., xn}, the mean of the set (µ), and the standard deviation (σ). The formula is presented in what follows.
z = (x − µ) / σ    (3.1)
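A minimal sketch of this normalization step, assuming the records are loaded into a pandas DataFrame (the excluded column names follow the dataset attributes; everything else is illustrative):

```python
# Minimal sketch: z-score normalization of all numeric columns except the
# excluded ones. DataFrame layout is an assumption for illustration.
import pandas as pd

def zscore_normalize(df: pd.DataFrame,
                     excluded=("sit_vinculo_atual", "semestre")) -> pd.DataFrame:
    """Apply z = (x - mu) / sigma to every numeric column not in `excluded`."""
    out = df.copy()
    for col in out.select_dtypes("number").columns:
        if col in excluded:
            continue
        mu, sigma = out[col].mean(), out[col].std()
        if sigma:  # skip constant columns to avoid division by zero
            out[col] = (out[col] - mu) / sigma
    return out
```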
The last pre-processing step applied to the dataset is the data enrichment step. As Table 3.1 shows, the number of instances in the CSI dataset is very small and not sufficient for training/testing purposes. For this reason, we used the Synthetic Minority Oversampling Technique (SMOTE) to increase the number of instances in the CSI dataset. Briefly, SMOTE generates new synthetic instances based on real ones, focusing on the minority class to balance the dataset; a sketch of this step is shown below.
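A minimal sketch of this enrichment step, assuming the imbalanced-learn library and that X, y hold the CSI features and labels (variable names are assumptions):

```python
# Minimal sketch: SMOTE oversampling with imbalanced-learn.
from collections import Counter
from imblearn.over_sampling import SMOTE

def enrich_with_smote(X, y, random_state=42):
    """Generate synthetic minority-class instances to balance the dataset."""
    smote = SMOTE(random_state=random_state)
    X_res, y_res = smote.fit_resample(X, y)
    print("class counts before:", Counter(y), "after:", Counter(y_res))
    return X_res, y_res
```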
Each individual is represented with binary encoding, in which a chosen feature is set to 1 and 0 indicates that the column will not be chosen. The next step is to calculate the fitness of each individual in the population by running an SVM on the three datasets (ADM, CSI, and ARQ). SVM is a supervised linear machine learning technique most commonly employed for classification, and it performs well for fitness calculation in feature selection [Tao et al., 2019]. It is defined as a linear classifier with maximum margin in the feature space (when a linear kernel is used), which is essentially a margin maximization strategy that results in a convex quadratic programming problem. Given a training dataset D = {([x11, x12, ..., x1n], y1), ([x21, x22, ..., x2n], y2), ..., ([xm1, xm2, ..., xmn], ym)}, ym ∈ {−1, +1}, where m is the number of samples in the dataset and n is the number of features (n = 27 in this experiment), the target is to find a decision boundary that separates the samples into different areas. The separator in two-dimensional space is a straight line and can be written as formula 3.2:
mx + c = 0 (3.2)
Once the separator is mapped to n-dimensional space, it becomes the hyperplane separator H, which can be written as formula 3.3:
H: w^T x + b = 0    (3.3)
where w = (w1, w2, ..., wn) is the vector determining the hyperplane direction, and b is a displacement term that shifts the hyperplane away from the origin. By determining the vector w and the bias b, the dividing hyperplane is denoted as (w, b). In the sample space, the distance from a given point ∅(x_o) to the hyperplane is given by formula 3.4:

d_H(∅(x_o)) = |w^T ∅(x_o) + b| / ‖w‖₂    (3.4)
where ‖w‖₂ is the 2-norm of w, defined as in formula 3.5:

‖w‖₂ = √(w₁² + w₂² + ... + wₙ²)    (3.5)
The vectors closest to the hyperplane, for which formula 3.6 holds, are named "support vectors":

y_i (w^T ∅(x_i) + b) = 1    (3.6)

The sum of the distances to the hyperplane of two support vectors from different areas is called the "margin", defined in formula 3.7:

γ = 2 / ‖w‖₂    (3.7)
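To make formulas 3.4–3.7 concrete, the following minimal sketch fits a linear SVM with scikit-learn on purely illustrative data and recovers the point-to-hyperplane distances and the margin from the learned w and b:

```python
# Minimal sketch: distances (formula 3.4) and margin (formula 3.7) from a
# fitted linear SVM. The data points below are illustrative only.
import numpy as np
from sklearn.svm import SVC

X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.5]])
y = np.array([-1, -1, 1, 1])

svm = SVC(kernel="linear").fit(X, y)
w, b = svm.coef_[0], svm.intercept_[0]

norm_w = np.linalg.norm(w)              # ||w||_2, formula 3.5
distances = np.abs(X @ w + b) / norm_w  # point-to-hyperplane, formula 3.4
margin = 2.0 / norm_w                   # formula 3.7
print(distances, margin)
```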
Table 3.2: Genetic Algorithm techniques, descriptions, goals, and the specific method used in this study

Technique | Operation | Goal | Method used
Initialization | initialization of the population | to generate a set of solutions | randomly generated initialization
Encoding | representation of the individuals | to convey the necessary information | binary encoding
Fitness | the degree of health of individuals | to evaluate the optimality of a solution | Support Vector Machine (SVM)
Crossover | parents are combined for reproduction | to determine which solutions are to be preserved | uniform crossover
Selection | solutions are selected for reproduction | to select the individuals with better adaptability | proportional selection
Mutation | a gene is deliberately changed | to maintain diversity in the population | mutation rate setting
Termination | a process stops the evolution | to terminate and output the optimal outcomes | maximum generations
To obtain a partitioned hyperplane with maximum margin, that is, to find param-
eters w and b that comply with the constraint in equation 3.6 such that the optimal
hyperplane classifies all the points in D correctly, namely:
w* = arg max_w (γ),  s.t.  min_n y_n [w^T ∅(x_n) + b] = 1    (3.8)
After the fitness computation with SVM under 5-fold cross-validation (a technique to prevent overfitting), proportional selection was conducted to choose individuals. The process is similar to a roulette wheel: individuals with higher fitness ratings have a greater chance of being selected. The formula for this process is shown in equation 3.9:
φ_s(x_i(t)) = f_γ(x_i(t)) / Σ_{l=1}^{n_s} f_γ(x_l(t))    (3.9)
where n_s is the total number of chromosomes in the population (initially 1,000), φ_s(x_i) is the probability of x_i being selected, and f_γ(x_i) is the fitness of x_i, a floating-point value greater than 0.
After completing selection, uniform crossover was performed to obtain offspring from each pair of selected parents. Specifically, each gene is selected at random from one of the corresponding genes on either parent chromosome, as shown in Figure 3.7. Note that this method yields only one offspring, and the crossover rate is 0.75. Reproduction is followed by mutation, in which certain gene(s) are changed randomly at the set rate, in this thesis 0.002. The final step is the termination criterion: when the maximum generation is reached, i.e., the population has evolved for 100 generations, the optimal subset is produced as the output. The entire workflow is illustrated in Figure 3.8 and sketched in code below.
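A compact sketch of this GA feature-selection loop under the settings stated above (binary encoding, SVM fitness with 5-fold cross-validation, proportional selection, uniform crossover at rate 0.75 yielding one offspring, mutation rate 0.002). Population and generation counts are reduced for illustration, and X, y are assumed to be the pre-processed feature matrix and labels:

```python
# Illustrative sketch of the GA feature-selection loop; not the thesis code.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
N_FEATURES, POP_SIZE, GENERATIONS = 27, 50, 20   # reduced for illustration
CROSS_RATE, MUTATION_RATE = 0.75, 0.002

def fitness(chromosome, X, y):
    """5-fold CV accuracy of a linear SVM on the selected feature subset."""
    mask = chromosome.astype(bool)
    if not mask.any():
        return 0.0
    return cross_val_score(SVC(kernel="linear"), X[:, mask], y, cv=5).mean()

def evolve(X, y):
    pop = rng.integers(0, 2, size=(POP_SIZE, N_FEATURES))  # binary encoding
    for _ in range(GENERATIONS):
        fit = np.array([fitness(c, X, y) for c in pop])
        probs = fit / fit.sum()               # proportional selection (eq. 3.9)
        new_pop = []
        while len(new_pop) < POP_SIZE:
            i, j = rng.choice(POP_SIZE, size=2, p=probs)
            child = pop[i].copy()
            if rng.random() < CROSS_RATE:     # uniform crossover, one offspring
                take = rng.random(N_FEATURES) < 0.5
                child[take] = pop[j][take]
            flip = rng.random(N_FEATURES) < MUTATION_RATE
            child[flip] ^= 1                  # mutation
            new_pop.append(child)
        pop = np.array(new_pop)
    best = pop[np.argmax([fitness(c, X, y) for c in pop])]
    return best                               # binary mask of selected features
```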
Compared with traditional feedforward neural networks and traditional RNNs, LSTM has feedback connections and a gate schema, which enable it to learn and memorize information over time sequences. Human learning can be loosely likened to this gate schema; in this study, the LSTM likewise simulates the influence of past exams on current performance. Furthermore, LSTM is capable of handling the vanishing gradient problem. Multiple past studies have used related models, such as Deep Knowledge Tracing to predict student performance in quizzes [Piech et al., 2015] and Human Activity Recognition based on a combination of LSTM and a convolutional neural network (CNN) [Xia et al., 2020].
As illustrated in Figure 3.9, LSTM consists of the components listed in Table 3.3. The inputs at each time step comprise three elements: X_t, H_{t−1}, and C_{t−1}. As outputs, H_t and C_t are produced by the LSTM. Note that there are 8 weight parameters W in total, 4 associated with the hidden state and 4 with the input. Moreover, 4 bias terms b are used in one unit. All W and b are initialized randomly at first and then adjusted by back-propagation. To prevent exploding gradients, the gradients are clipped when they reach the designed threshold of 1.01.
Consider one execution of the LSTM during one sequence t, using the dataset from Section 3.1.3. The first step is to determine which information will be forgotten, as formula 3.10 illustrates. This step is executed by the forget gate: H_{t−1} and X_t pass through the gate to produce an output between 0 and 1.
C_t = f_t ∗ C_{t−1} + I_t ∗ C̃_t    (3.13)
The final step is to output the hidden state. As shown in formulas 3.14 and 3.15, similarly to steps 1 and 2, H_{t−1} and X_t are passed as inputs through the output gate with a sigmoid, which decides which parts of the cell state to output:

O_t = σ(W_o X_t + U_o H_{t−1} + b_o)    (3.14)

The output is then passed through a tanh layer to acquire the hidden state H_t, which is used as the input at the next time step or to another neural network.

H_t = O_t ∗ tanh(C_t)    (3.15)
In this study, after a series of hyper-parameter tuning, the final output hidden state is used as the input of a 3-layer fully connected (FC) neural network to predict dropout. The hidden size in the FC network is 128; a sigmoid is deployed after the first layer and a ReLU is applied at the output layer. As for loss measurement, as illustrated in formula 3.16, the Mean Squared Error (MSE) is used to compute the loss:
MSE = (1/n) Σ_{i=1}^{n} (Y_i − Ŷ_i)²    (3.16)
where n is the number of samples, Y_i is the observed value, and Ŷ_i is the predicted value.
After computing the loss, back-propagation is executed to adjust the weights, using Adam as the optimizer. Furthermore, a dropout layer with probability 0.7 is applied to the outputs of each LSTM layer except the last to prevent overfitting. Figure 3.10 visualizes one training pass, and a code sketch of the model follows.
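The following is a minimal PyTorch sketch of the architecture described above (an LSTM whose final hidden state feeds a 3-layer FC network with hidden size 128, sigmoid after the first layer, ReLU at the output, dropout 0.7 between LSTM layers, MSE loss, Adam, and gradient clipping at 1.01). The LSTM hidden size and input dimensions are assumptions for illustration:

```python
# Illustrative sketch of the LSTM+FC model described in the text.
import torch
import torch.nn as nn

class DropoutPredictor(nn.Module):
    def __init__(self, n_features=27, lstm_hidden=64, fc_hidden=128):
        super().__init__()
        # dropout=0.7 applies to the outputs of each LSTM layer except the last
        self.lstm = nn.LSTM(n_features, lstm_hidden, num_layers=2,
                            dropout=0.7, batch_first=True)
        # 3-layer FC head: sigmoid after layer 1, ReLU at the output (as stated)
        self.fc = nn.Sequential(
            nn.Linear(lstm_hidden, fc_hidden), nn.Sigmoid(),
            nn.Linear(fc_hidden, fc_hidden),
            nn.Linear(fc_hidden, 1), nn.ReLU(),
        )

    def forward(self, x):            # x: (batch, time_steps, n_features)
        _, (h_n, _) = self.lstm(x)
        return self.fc(h_n[-1])      # final hidden state of the last layer

model = DropoutPredictor()
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters())

def train_step(batch_x, batch_y):
    """One training pass: forward, MSE loss, backward, clip, update."""
    optimizer.zero_grad()
    loss = criterion(model(batch_x), batch_y)
    loss.backward()
    # clip gradients at the threshold of 1.01 to prevent them from exploding
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.01)
    optimizer.step()
    return loss.item()
```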
Identifying dropout is an important step toward understanding why students fail and what can be done to increase graduation rates. We argue that the sequence of courses taken by students may influence student attrition. This section presents an approach to create a sequence of courses based on their descriptions. The idea is, in the future, to be able to correlate dropout rates, the sequence of courses taken by students, and the course content/description.
We divide this section into two parts. The first part introduces a method to measure the similarity between course pairs using Bidirectional Encoder Representations from Transformers (the so-called BERT language model). The second part orders course pairs by employing the Semi-Reference Distance (SemRefD) [Manrique et al., 2019b].
As Figures 3.11 and 5.2 show, the first part utilizes BERT to conduct similarity measurement. It involves three steps: dataset acquisition, sentence embedding, and similarity measurement. The second part uses the same dataset: entities are extracted first, followed by concept comparison, and the prerequisite sequence between courses is identified at the end.
Attention(Q, K, V) = softmax(QK^T / √d_k) V    (3.17)
where K and V form the key-value pair of the input of dimension d_k, and Q stands for the query. The output is a weighted sum of the values.
Compared with computing the attention a single time, the multi-head mechanism used in BERT runs the scaled dot-product attention several times in parallel and creates separate embedding matrices that are combined into one final output embedding, as formula 3.18 shows:

MultiHead(Q, K, V) = Concat(head_1, ..., head_h) W^O    (3.18)
cos(t, e) = (t · e) / (‖t‖ ‖e‖) = Σ_{i=1}^{n} t_i e_i / (√(Σ_{i=1}^{n} t_i²) √(Σ_{i=1}^{n} e_i²))    (3.19)
where t and e are sentence vectors generated by BERT; the output ranges within [−1, 1], depicting the degree of contextual similarity. To obtain the overall contextual similarity, the average similarity among sentence pairs is calculated. After the processes above, the course similarity is obtained; a sketch follows.
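As an illustration of this pipeline, the following sketch assumes a BERT-as-Service server is running (see Appendix 3) and that each course description has already been split into sentences; function and variable names are illustrative:

```python
# Illustrative sketch: sentence embeddings from BERT-as-Service plus the
# average pairwise cosine similarity (formula 3.19) between two courses.
import numpy as np
from bert_serving.client import BertClient

def cosine(t, e):
    return float(np.dot(t, e) / (np.linalg.norm(t) * np.linalg.norm(e)))

def course_similarity(sentences_a, sentences_b, port=5555):
    bc = BertClient(port=port)
    vecs_a = bc.encode(sentences_a)     # one embedding per sentence
    vecs_b = bc.encode(sentences_b)
    scores = [cosine(t, e) for t in vecs_a for e in vecs_b]
    return sum(scores) / len(scores)    # average contextual similarity
```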
In SemRefD, the common concepts between two courses are considered, while in the indicator function 1(c_j, c_A), the property paths between the target concept and related concepts are considered [Manrique et al., 2019a]. As a result, all concepts from the two courses are compared and the scores summed up to reveal their order, that is, whether A is a prerequisite of B or B is a prerequisite of A.
Chapter 4
Results and Discussion
In this chapter, the results of the experiments described in Chapter 3 are presented in two sections, namely dropout prediction and curriculum semantic analysis, along with detailed discussion. Notably, the results yielded by the systematic review methodology were reported in Section 2.2.1. The experimental environment is introduced first.
Machine | CPU | GPU | RAM | Storage | OS
Server | Intel(R) Xeon(R) CPU E5-2698 v4 @ 2.20GHz | Tesla V100-DGXS-32GB ×4 | 251 GB | 10 TB | Ubuntu 18.04.6
Apple MacBook Pro | M1 chip | 8-core GPU | 16 GB | 512 GB | macOS Monterey 12.0.1
True Positive (TP): the model predicted a positive outcome and the prediction is correct.
True Negative (TN): the model predicted a negative outcome and the prediction is correct.
False Positive (FP): the model predicted a positive outcome and the prediction is incorrect.
False Negative (FN): the model predicted a negative outcome and the prediction is incorrect.
To be specific, the accuracy refers to the proportion of correctly predicted observations among all samples. The precision measures the ratio of correctly predicted positive samples to all predicted positives. Recall depicts the sensitivity of the model by measuring the ratio of correctly predicted positive observations to all actual positive observations. The F1 score measures the overall performance of the model. The equations are shown in formulas 4.1–4.4 below:
Accuracy = (TP + TN) / (TP + FP + FN + TN)    (4.1)

Precision = TP / (TP + FP)    (4.2)

Recall = TP / (TP + FN)    (4.3)

F1 = 2 ∗ (Recall ∗ Precision) / (Recall + Precision) = TP / (TP + (FP + FN)/2)    (4.4)
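As a brief illustration, formulas 4.1–4.4 can be computed with scikit-learn; the sketch below assumes y_true and y_pred are binary label arrays from the test set:

```python
# Illustrative sketch: computing formulas 4.1-4.4 with scikit-learn.
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

def report(y_true, y_pred):
    return {
        "accuracy":  accuracy_score(y_true, y_pred),   # (TP+TN)/(TP+FP+FN+TN)
        "precision": precision_score(y_true, y_pred),  # TP/(TP+FP)
        "recall":    recall_score(y_true, y_pred),     # TP/(TP+FN)
        "f1":        f1_score(y_true, y_pred),         # harmonic mean of the two
    }
```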
The results of feature selection are listed in Table 4.2. After 100 generations of evolution in a population of 1,000 individuals, ARQ shows the best performance among the datasets in terms of accuracy (90.61) and F1 score (95.03) with the fewest dropped features, indicating that this individual fits the environment better than the others. By comparison, CSI has the worst performance across the metrics. The lower precision compared to recall across the datasets indicates that the model is biased toward predicting positive rather than negative instances, emphasising the importance of balanced datasets. Furthermore, the ratios of negative instances in CSI (24%), ADM (13%), and ARQ (10%) support this inference. Thus, running the model on a balanced dataset may improve its performance. As for ADM, it has nearly the same recall as ARQ and the best recall score in this experiment.
Overall, the results show that this model is well suited to this experiment.
This experiment used the same datasets for dropout prediction. As dropout prediction is one of the main objectives of this project, the performance of the proposed model is important. As illustrated in Figures 4.2–4.7, the training outcomes reveal the capacity of the model; the abscissa extends to epoch (sampled every 10 epochs) × time step. Furthermore, the performance of the model on the test set further validates its suitability for the datasets, as shown in Table 4.3.
From the accuracy and loss during training, we observe that the model always converges after 10 epochs. The accuracy on the ADM and ARQ datasets starts below 10%, falls lower, and afterwards rises close to the top. For CSI, the curve is similar to some extent. After investigation, multiple possible reasons for the abnormal curve were identified: for instance, the dropout rate is high, so the model drops well-functioning neurons, or the batch size is large. In this study, the time step is fixed as mentioned in Section 3.1.4, which means the batch size is not controllable. In addition, after hyper-parameter tuning, the high dropout rate (0.7) produced the best results, showing that selecting the current dropout rate is a trade-off decision. Thus, this study chose to keep the better performance obtained with the current dropout rate.
In terms of loss, after reaching convergence, the loss curve repeats along with each reset of the time step. Apart from this repetition, we also observed an unstable curve during one epoch of training, a phenomenon named multiple descent [Chen et al., 2020]. The reason behind the anomaly may vary. For example, suppose there are two minima θ1 and θ2; when the distance between them, d = ‖θ1 − θ2‖, is very small and the learning rate is not small enough, the optimizer crosses the local minimum θ1 and eventually arrives at θ2. This phenomenon can also be caused by the dataset [Chen et al., 2020].
With respect to the performance of the proposed model on the test set, the model performs well on ADM and ARQ, whose best accuracy reaches 92.83% and 97.65% respectively. Notably, the accuracy on ARQ improves the result in Manrique's previous work (95.2%) by 2.45% [Manrique et al., 2019a]. Finally, we identified a potential improvement for the model: adopting dynamic time steps instead of fixed steps to make full use of the dataset. Overall, the model proposed in this study is suitable for the current datasets.
Figure 4.2: Accuracy of LSTM+FC on ADM
Figure 4.3: Loss of LSTM+FC on ADM
Figure 4.4: Accuracy of LSTM+FC on ARQ
Figure 4.5: Loss of LSTM+FC on ARQ
Figure 4.6: Accuracy of LSTM+FC on CSI
Figure 4.7: Loss of LSTM+FC on CSI
Dataset | Avg. acc (10 iter.) (%) | Avg. acc (100 iter.) (%) | Avg. acc (200 iter.) (%) | Top acc. (%)
This section contains two subsections, that is, similarity measurement, as mentioned
in Section 3.2.2, and prerequisite identification as mentioned in Section 3.2.3.
After the sentence encodings were captured by BERT and the average similarity between courses computed, the results were obtained and visualized as heat maps: Figure 4.8, with values ranging from 0.8265 to 1.0, lists the results of all course comparisons, and Figures 4.9–4.12 below show the per-level comparisons.
In the 1000-level course comparison, we can see that COMP1110 (Structured Programming) has the closest average distance to the rest of the courses. In contrast, COMP1600 (Foundations of Computing) is the farthest from the others. Interpreting from course content, COMP1100 (Programming as Problem Solving) and COMP1110 are among the most fundamental programming courses at the 1000 level, while COMP1600 focuses on the mathematical perspective.
With respect to the 2000-level course comparison, COMP2100 (Software Design Methodologies) has the closest average distance to the rest of the courses. On the contrary, COMP2560 (Studies in Advanced Computing R&D) is the farthest from the others. To sum up, the overall similarity between 2000-level courses is lower than between 1000-level courses, which indicates that curriculum differentiation has appeared.
As regards the 3000-level and 4000-level courses, this trend becomes more obvious, which also aligns with real situations: as students' knowledge grows, the course content deepens and is divided by specialization, such as Data Science, Machine Learning, etc. In the meantime, students gain the interest to dive in, which helps them make use of their strengths in the area of interest and improve their academic performance in return. Finally, we identified an obstacle in automatic evaluation, which is a common limitation in the studies covered by the systematic review [Piedra and Caro, 2018], [Pata et al., 2013], [Yu et al., 2007], [Saquicela et al., 2018] and [Tapia-Leon et al., 2018].
Figure 4.9: 1000-level courses similarity
Figure 4.10: 2000-level courses similarity
Figure 4.11: 3000-level courses similarity
Figure 4.12: 4000-level courses similarity
Conclusion and Future Work
We then used SemRefD, which was proposed and evaluated by [Manrique et al., 2019a], to measure the degree of prerequisite dependency between two concepts. By deploying the model on COMP1100, COMP1110, and COMP2100 respectively, we established the relationship between these three courses. The results show that COMP1100 and COMP1110 have a strong prerequisite relationship with COMP2100, while the relationship between the two former courses is inclined to be on the same level.
In terms of future work, these technologies could potentially be used to analyse the curriculum of university programs, to aid student advisors, and to create recommendation systems that combine semantic and deep learning technologies. Furthermore, this study suggests that the GA+LSTM model could provide institutions with early detection for dealing with problems and retaining students. As for the experiment's future refinement: first, in dropout prediction, dynamic time steps could be introduced into the LSTM instead of fixed steps to make full use of the dataset. Moreover, a more balanced dataset could be used in the experiment. We will continue to investigate and develop appropriate models to improve these results in the future.
Bibliography
BA, K. and Charters, S., 2007. Guidelines for performing systematic literature
reviews in software engineering. 2 (01 2007). (cited on page 4)
Babatunde, O.; Armstrong, L.; Leng, J.; and Diepeveen, D., 2014. A genetic
algorithm-based feature selection. International Journal of Electronics Communication
and Computer Engineering, 5 (07 2014), 889–905. (cited on page 15)
Cheadle, C.; Vawter, M. P.; Freed, W. J.; and Becker, K. G., 2003. Analysis of
microarray data using z score transformation. The Journal of Molecular Diagnostics,
5, 2 (2003), 73–81. doi:https://doi.org/10.1016/S1525-1578(10)60455-2. https://www.
sciencedirect.com/science/article/pii/S1525157810604552. (cited on page 15)
Chen, L.; Jianbo, Y.; Shuting, W.; Bart, P.; and Giles, C., 2018. Investigating active
learning for concept prerequisite learning. In 32nd AAAI Conference on Artificial In-
telligence, AAAI 2018, 7913–7919. AAAI Press. (cited on page 25)
Chen, L.; Min, Y.; Belkin, M.; and Karbasi, A., 2020. Multiple descent: Design your
own generalization curve. CoRR, abs/2008.01036 (2020). https://arxiv.org/abs/2008.
01036. (cited on page 29)
Chung, H. S. and Kim, J. M., 2014. Semantic model of syllabus and learning ontology
for intelligent learning system. Lecture Notes in Computer Science (including subseries
Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 8733 (2014),
175–183. doi:10.1007/978-3-319-11289-3_18. (cited on page 8)
Devlin, J.; Chang, M.; Lee, K.; and Toutanova, K., 2018. BERT: pre-training of
deep bidirectional transformers for language understanding. CoRR, abs/1810.04805
(2018). http://arxiv.org/abs/1810.04805. (cited on pages v, 2, and 23)
Diaz-Mujica, A.; Pérez, M.; Bernardo, A.; Cervero, A.; and González-Pienda,
J., 2019. Affective and cognitive variables involved in structural prediction of
university dropout. Psicothema, 31 (10 2019), 429–436. doi:10.7334/psicothema2019.124.
(cited on page 1)
Drennan, L. and Rohde, F., 2002. Determinants of performance in advanced un-
dergraduate management accounting: An empirical investigation. Accounting and
Finance, 42 (02 2002), 27–40. doi:10.1111/1467-629X.00065. (cited on page 1)
Fiallos, A., 2018. Assisted curricula design based on generation of domain ontologies
and the use of NLP techniques. 2017 IEEE 2nd Ecuador Technical Chapters Meeting,
ETCM 2017, 2017-Janua (2018), 1–6. doi:10.1109/ETCM.2017.8247474. (cited on pages
7 and 9)
Guberovic, E.; Turcinovic, F.; Relja, Z.; and Bosnic, I., 2018. In search of a
syllabus: Comparing computer science courses. 2018 41st International Convention
on Information and Communication Technology, Electronics and Microelectronics, MIPRO
2018 - Proceedings, (2018), 588–592. doi:10.23919/MIPRO.2018.8400111. (cited on page
7)
Hao, X.; Meng, X.; and Cui, X., 2008. Knowledge Point Based Curriculum Developing
and Learning Object Reusing. Knowledge Creation Diffusion Utilization, (2008), 126–
137. (cited on pages 7 and 8)
Hedayati, M. H. and Mart, L., 2016. Ontology-driven modeling for the culturally-
sensitive curriculum development: A case study in the context of vocational ICT
education in Afghanistan. Proceedings of the 10th INDIACom; 2016 3rd International
Conference on Computing for Sustainable Global Development, INDIACom 2016, , Ontol-
ogy 101 (2016), 928–932. (cited on pages 6 and 7)
Gomaa, W. H. and Fahmy, A. A., 2013. A Survey of Text Similarity Approaches. Inter-
national Journal of Computer Applications, 68, 13 (2013), 13–18. doi:10.5120/11638-7118.
(cited on page 9)
Hong, S. and Lynn, H. S., 2020. Accuracy of random-forest-based imputation of
missing data in the presence of non-normality, non-linearity, and interaction. BMC
Medical Research Methodology, 20 (2020). (cited on page 14)
Huang, J.; Cai, Y.; and Xu, X., 2007. A hybrid genetic algorithm for feature selection
wrapper based on mutual information. Pattern Recognition Letters, 28, 13 (2007),
1825–1844. doi:https://doi.org/10.1016/j.patrec.2007.05.011. https://www.sciencedirect.
com/science/article/pii/S0167865507001754. (cited on page 15)
Imran, M. and Young, R. I., 2016. Reference ontologies for interoperability across
multiple assembly systems. International Journal of Production Research, 54, 18 (2016),
5381–5403. doi:10.1080/00207543.2015.1087654. (cited on page 6)
Imron, M. A. and Prasetyo, B., 2020. Improving algorithm accuracy k-nearest
neighbor using z-score normalization and particle swarm optimization to predict
customer churn. (cited on page 15)
Johnson, M. and Kuennen, E., 2006. Basic math skills and performance in an
introductory statistics course. Journal of Statistics Education, 14, 2 (2006). doi:
10.1080/10691898.2006.11910581. https://doi.org/10.1080/10691898.2006.11910581.
(cited on page 1)
Karunananda, A. S.; Rajakaruna, G. M.; and Jayalal, S., 2012. OntoCD - Onto-
logical solution for curriculum development. International Conference on Advances
in ICT for Emerging Regions, ICTer 2012 - Conference Proceedings, (2012), 137–144.
doi:10.1109/ICTer.2012.6421412. (cited on pages 6, 7, and 8)
Kawintiranon, K.; Vateekul, P.; Suchato, A.; and Punyabukkana, P., 2016. Under-
standing knowledge areas in curriculum through text mining from course materials.
In 2016 IEEE International Conference on Teaching, Assessment, and Learning for Engi-
neering (TALE), 161–168. doi:10.1109/TALE.2016.7851788. (cited on page 8)
Lasley, T. J., 2013. Bloom’s Taxonomy. doi:10.4135/9781412957403.n51. (cited on
page 9)
Leardi, R., 2000. Application of genetic algorithm–pls for feature selection in spectral
data sets. Journal of Chemometrics - J CHEMOMETR, 14 (09 2000), 643–655. doi:
10.1002/1099-128X(200009/12)14:5/63.0.CO;2-E. (cited on page 15)
Liang, Y. and Ma, X., 2012. Teaching reform of software engineering course. ICCSE
2012 - Proceedings of 2012 7th International Conference on Computer Science and Edu-
cation, , Iccse (2012), 1936–1939. doi:10.1109/ICCSE.2012.6295452. (cited on page
6)
Maffei, A.; Daghini, L.; Archenti, A.; and Lohse, N., 2016. CONALI Ontology. A
Framework for Design and Evaluation of Constructively Aligned Courses in Higher
Education: Putting in Focus the Educational Goal Verbs. Procedia CIRP, 50 (2016),
765–772. doi:10.1016/j.procir.2016.06.004. http://dx.doi.org/10.1016/j.procir.2016.06.004.
(cited on pages 6 and 7)
Mandić, M., 2018. Semantic web based software platform for curriculum harmoniza-
tion*. ACM International Conference Proceeding Series, (2018). doi:10.1145/3227609.
3227654. (cited on pages 6, 7, and 10)
Manrique, R.; Pereira Nunes, B.; and Marino, O., 2019a. Exploring knowledge
graphs for the identification of concept prerequisites. Smart Learning Environments,
6 (12 2019), 21. doi:10.1186/s40561-019-0104-3. (cited on pages v, 1, 2, 14, 25, 26,
29, 35, and 36)
Manrique, R.; Pereira Nunes, B.; Marino, O.; Casanova, M.; and Nurmikko-
Fuller, T., 2019b. An analysis of student representation, representative features
and classification algorithms to predict degree dropout. 401–410. doi:10.1145/
3303772.3303800. (cited on pages v, 13, and 22)
McGuinness, D. L. and van Harmelen, F., 2004. OWL Web Ontology Lan-
guage Overview. W3C recommendation, 10, February (2004). http://www.w3.org/
TR/owl-features/ . (cited on page 7)
Nuntawong, C.; Namahoot, C. S.; and Brückner, M., 2016. A web based co-
operation tool for evaluating standardized curricula using ontology mapping.
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial In-
telligence and Lecture Notes in Bioinformatics), 9929 LNCS (2016), 172–180. doi:
10.1007/978-3-319-46771-9_23. (cited on pages 6, 7, 8, and 9)
Nuntawong, C.; Namahoot, C. S.; and Brückner, M., 2017. HOME: Hybrid ontol-
ogy mapping evaluation tool for computer science curricula. Journal of Telecommu-
nication, Electronic and Computer Engineering, 9, 2-3 (2017), 61–65. (cited on pages 6,
7, 8, and 9)
Nuntawong, C.; Snae, C.; and Brückner, M., 2015. A semantic similarity assessment
tool for computer science subjects using extended wu & palmer’s algorithm and
ontology. Lecture Notes in Electrical Engineering, 339 (02 2015), 989–996. doi:10.1007/
978-3-662-46578-3_118. (cited on pages 7, 8, and 9)
Orellana, G.; Orellana, M.; Saquicela, V.; Baculima, F.; and Piedra, N., 2018.
A text mining methodology to discover syllabi similarities among higher edu-
cation institutions. Proceedings - 3rd International Conference on Information Sys-
tems and Computer Science, INCISCOS 2018, 2018-Decem (2018), 261–268. doi:
10.1109/INCISCOS.2018.00045. (cited on pages 8 and 9)
Pantanowitz, A. and Marwala, T., 2009. Missing data imputation through the use
of the random forest algorithm. Advances in Computational Intelligence, 116 (01 2009).
doi:10.1007/978-3-642-03156-4_6. (cited on page 14)
Pata, K.; Tammets, K.; Laanpere, M.; and Tomberg, V., 2013. Design principles for
competence management in curriculum development. Lecture Notes in Computer
Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes
in Bioinformatics), 8095 LNCS (2013), 260–273. doi:10.1007/978-3-642-40814-4_21.
(cited on page 31)
Paulk, M.; Curtis, B.; Chrissis, M.; and Weber, C., 1993. Capability Maturity Model
for Software, Version 1.1. (cited on page 6)
Pawar, A. and Mago, V., 2018. Similarity between learning outcomes from course
objectives using semantic analysis, bloom’s taxonomy and corpus statistics. arXiv,
(2018). (cited on page 9)
Piech, C.; Spencer, J.; Huang, J.; Ganguli, S.; Sahami, M.; Guibas, L. J.; and
Sohl-Dickstein, J., 2015. Deep knowledge tracing. CoRR, abs/1506.05908 (2015).
http://arxiv.org/abs/1506.05908. (cited on page 20)
Sales, A. R. P.; Balby, L.; and Cajueiro, A., 2016. Exploiting academic records for
predicting student drop out: a case study in brazilian higher education. J. Inf. Data
Manag., 7 (2016), 166–180. (cited on page 1)
Sanjeevi, M., 2018. Chapter 10.1: Deepnlp - lstm (long short term mem-
ory) networks with math. https://medium.com/deep-math-machine-learning-ai/
chapter-10-1-deepnlp-lstm-long-short-term-memory-networks-with-math-21477f8e4235.
(cited on pages ix and 20)
Saquicela, V.; Baculima, F.; Orellana, G.; Piedra, N.; Orellana, M.; and Es-
pinoza, M., 2018. Similarity detection among academic contents through semantic
technologies and text mining. In IWSW. (cited on pages 6, 7, 9, and 31)
Seidel, N.; Rieger, M.; and Walle, T., 2020. Semantic textual similarity of course
materials at a distance-learning university. CEUR Workshop Proceedings, 2734 (2020).
(cited on page 9)
Shah, A. D.; Bartlett, J. W.; Carpenter, J. R.; Nicholas, O.; and Hemingway,
H., 2014. Comparison of random forest and parametric imputation models for
imputing missing data using mice: A caliber study. American Journal of Epidemiology,
179 (2014), 764 – 774. (cited on page 14)
Shipley, B. and Ian, W., 2019. Here comes the drop: university drop out rates and
increasing student retention through education. https://www.voced.edu.au/content/
ngv:84995. (cited on page 1)
Tang, A. and Hoh, J., 2013. Ontology-specific API for a curricula management system.
2013 2nd International Conference on E-Learning and E-Technologies in Education, ICEEE
2013, (2013), 294–297. doi:10.1109/ICeLeTE.2013.6644391. (cited on pages 6 and 7)
Tao, Z.; Huiling, L.; Wenwen, W.; and Xia, Y., 2019. Ga-svm based feature selection
and parameter optimization in hospitalization expense modeling. Applied Soft
Computing, 75 (2019), 323–332. doi:https://doi.org/10.1016/j.asoc.2018.11.001. https:
//www.sciencedirect.com/science/article/pii/S1568494618306264. (cited on page 16)
Tapia-Leon, M.; Rivera, A. C.; Chicaiza, J.; and Luján-Mora, S., 2018. Application
of ontologies in higher education: A systematic mapping study. In 2018 IEEE Global
Engineering Education Conference (EDUCON), 1344–1353. doi:10.1109/EDUCON.2018.
8363385. (cited on pages 8 and 31)
Vaquero, J.; Toro, C.; Martín, J.; and Aregita, A., 2009. Semantic enhancement of
the course curriculum design process. In Proceedings of the 13th International Con-
ference on Knowledge-Based and Intelligent Information and Engineering Systems: Part I,
KES ’09 (Santiago, Chile, 2009), 269–276. Springer-Verlag, Berlin, Heidelberg. doi:
10.1007/978-3-642-04595-0_33. https://doi.org/10.1007/978-3-642-04595-0_33. (cited
on pages 6 and 7)
Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A. N.;
Kaiser, L.; and Polosukhin, I., 2017. Attention is all you need. CoRR,
abs/1706.03762 (2017). http://arxiv.org/abs/1706.03762. (cited on page 23)
Wang, Y.; Wang, Z.; Hu, X.; Bai, T.; Yang, S.; and Huang, L., 2019. A Courses On-
tology System for Computer Science Education. 2019 IEEE International Conference
on Computer Science and Educational Informatization, CSEI 2019, 3 (2019), 251–254.
doi:10.1109/CSEI47661.2019.8938930. (cited on pages 6, 7, 8, and 9)
Wohlin, C., 2014. Guidelines for snowballing in systematic literature studies and a
replication in software engineering. ACM International Conference Proceeding Series,
(2014). doi:10.1145/2601248.2601268. (cited on page 5)
Wu, Z. and Palmer, M., 1994. Verbs semantics and lexical selection. In Proceedings
of the 32nd Annual Meeting on Association for Computational Linguistics, ACL ’94 (Las
Cruces, New Mexico, 1994), 133–138. Association for Computational Linguistics,
USA. doi:10.3115/981732.981751. https://doi.org/10.3115/981732.981751. (cited on
page 9)
Xia, K.; Huang, J.; and Wang, H., 2020. Lstm-cnn architecture for human activity
recognition. IEEE Access, 8 (2020), 56855–56866. doi:10.1109/ACCESS.2020.2982225.
(cited on page 20)
Yu, X.; Tungare, M.; Fan, W.; Pérez-Quiñones, M.; Fox, E. A.; Cameron, W.;
and Cassel, L., 2007. Using automatic metadata extraction to build a structured
syllabus repository. Lecture Notes in Computer Science (including subseries Lecture
Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 4822 LNCS (2007),
337–346. doi:10.1007/978-3-540-77094-7_43. (cited on page 31)
Appendix 1
Appendix 2

Contract
UniID: u6809382
COURSE CODE, TITLE AND UNIT: COMP8755 <Individual Computing Project> 12 units
LEARNING OBJECTIVES:
- Understanding and ability to apply/implement semantic technologies, AI / Machine Learning models
and Data Mining techniques;
- Understanding fundamental research to address a research question;
- Apply computing knowledge and implementation skills to the area of Computers in Education;
- Understanding of interdisciplinary research; and,
- Understand how to conduct a substantial research-based project; and,
- Understand how to analyse data, synthesize and report research findings.
PROJECT DESCRIPTION:
• Write a literature survey
• Development of a model to improve students' performance using AI/Semantic technologies
• Train and test the model on large datasets
• Tune parameter and inference
• Evaluation/validation of the proposed model
• Deploy the model
• Write thesis
Appendix 3
Description of Software
Dropout prediction
Similarity Measurement
Installation and running of BERT-as-Service: install the BERT model used in this project from the link above or as follows.
Run the Bert as a Service pointing to the unzipped downloaded model using the fol-
lowing command: bert-serving-start -model_dir /your_directory/wwm_uncased_L-
24_H-1024_A-16 -num_worker=4 -port XXXX -max_seq_len NONE
Run python similarity.py to get the result of comparing two sentences from two courses. The result will be in /result/similarity_measurement/full_similarity
Run python text_similarity.py to get the result of comparing two courses, which will be stored in /result/similarity_measurement/full_similarity/similarity_full.csv
Prerequisite Identification
Install the external libraries by using conda or pip command RefDSimple.py is for us-
ing RefD to retrieve and get potential candidates from DBpedia https://www.dbpedia.org/
which is a main knowledge graph in Semantic Web.
config.cfg is the configuration file, you may need to change the proxy before us-
ing it.
Run paper_crawler.py to acquire the paper from SPringer to local file named "dataset-
byabstract", which includes title, abstract, and so on.
Appendix 4

README file