Feature Extraction For Classifying Students Based On Their Academic Performance
Shaik Yacoob, S.S.D.Bhavani, S.Sai Sri Ram, A.Lakshmana Rao, G.Vineela
Department of Computer Science and Engineering, GIET, Rajahmundry, India
Abstract: In today's world of education, building tools that help students learn, whether in a formal or an online setting, is a major challenge. Early applications of machine learning in this area focus on predicting student achievement in terms of the marks obtained. The downside of these methods is that they do not work well for identifying struggling students. The purpose of our work is twofold. First, we investigate whether underperforming students can be accurately predicted by recasting this task as a binary classification problem. Second, in order to learn more about the factors that may contribute to poor performance, we have created a set of human-interpretable features that measure these factors. These features are extracted from University of Minnesota grade records. We conducted experiments based on these factors to identify the different groups of students and to determine the value of each feature.
Keywords: Python, Machine Learning, Classification algorithms.

I. INTRODUCTION

methods. We classify two groups of students: those who are likely to complete an activity or course successfully and those who seem to be struggling. Once struggling students are identified early, we can provide additional resources and support to help them succeed.
On the other hand, "success" and "failure" can be relative. A grade of B, for example, can be a bad grade for a strong student but a good grade for a weak one. We looked at several ways of dividing students within a subject, such as student failure, drop-out, doing worse than expected, and doing worse than expected in view of the difficulty of the course. To gain a good understanding of the learning process and its most important qualities, we have developed features that capture the various elements affecting end-of-year grades. We present a comprehensive analysis using these features to answer the following two questions: What factors determine a learner's success? Which factors are the most important? The outcomes matter because different factors carry different weight in the various classification problems.
knowledge gaps in this study. This approach captures the relationship between subjects in terms of the information they provide, by combining historical student-course grades with information about the courses.

3. Grade prediction with temporal course-wise influence:

New educational-technology applications that evaluate data generated by universities are urgently needed to ensure that students graduate on time (in 4 to 6 years) and are properly prepared for careers in their respective fields of study. In this study, we provide a novel method for evaluating historical educational records from a big, public institution in order to perform next-term grade prediction, i.e., estimating the grades that a student will receive in a course he or she will enrol in next term. Accurate next-term grade prediction opens the possibility of improved student degree planning, individualized advising, and automated interventions to help students stay on track in their chosen degree programme and graduate on time. We introduce Matrix Factorization with Temporal Course-wise Influence, a factorization-based technique for grade prediction that integrates both course-wise and temporal influence effects. Courses and students are represented in this approach in a "knowledge" space, and a student's marks are modelled from the similarity of the student's latent representation in that space. Course-by-course influence is taken into account as an extra component in predicting grades. According to our results, the suggested method improves on various standard approaches in inferring meaningful patterns between pairings of courses within academic programmes.

4. Dividing students using data mining algorithms:

In this paper, we explore and compare several data mining methods to differentiate students based on their Moodle usage statistics and final course marks. We have developed a data mining application that makes it easier for educators to prepare and apply data mining techniques. We used real data from Cordoba University students in seven Moodle courses. On the original numerical data, we applied discretization and rebalancing pre-processing approaches to see whether better classifier models could be obtained. Finally, we argue that, in order to be useful for decision making, a classifier model fit for instructional use must be both accurate and understandable to teachers.

III. METHODOLOGY

In this project the authors describe an approach to predict or classify student performance based on previous academic performance. We concentrate on poorly performing students by extracting grade features from their past performance records. We use a university dataset containing grade records from A to W, and we extract four features from this dataset to classify poorly performing students:

1. Records whose grade is D or F are considered failing students and are assigned feature value 0 (Fgr).
2. Records of dropout students are assigned feature value 1 (Wgr).
3. Records with grades lower than expected are assigned feature value 2 (RelF).
4. Records with grades lower than expected where the student is also having difficulty with the course are assigned feature value 3 (RelCF).
5. The remaining students are marked with feature value 4, indicating that the student is performing well.

Using the 'University of Minnesota' grade dataset, we extract the above features and assign these values as the target (class) label for the dataset. After feature extraction, we apply four machine learning algorithms to this dataset to generate a training model; later, a new student record can be applied to this model to classify the student as a good or poor performer, and the reason for poor performance can be identified: Fgr (failing student), Wgr (dropout), RelF (lower-than-expected grade) or RelCF (lower-than-expected grade with course difficulty).

Four algorithms are used in this paper:

SVM Algorithm: Machine learning supports predicting and classifying data, and a variety of machine learning methods can be used depending on the dataset. Support Vector Machine (SVM) is a straightforward model that can be used for both classification and regression tasks, and it can solve both linear and non-linear problems. The method divides data into categories by drawing a line or hyperplane. The radial basis function (RBF) kernel is a popular kernel function in machine learning that is used in a variety of kernelized learning algorithms, especially in support vector machine classification. Intuitively, the further data points are from the hyperplane, the more certain we are that they have been classified correctly; as a result, we want the data points to be as far from the hyperplane as feasible while remaining on the correct side. When new test data is entered, the side of the hyperplane on which it falls determines its class.

Random Forest Algorithm: This is an ensemble algorithm, meaning it develops an accurate classifier model by combining multiple classifiers. Internally, this technique constructs the training model using decision trees.

Decision Tree Algorithm: This algorithm creates a training model by grouping comparable records together in the same branch of the tree, continuing until all records are grouped in the complete tree. The entire tree serves as the classification model.

Gradient Boosting Classifiers: Gradient boosting classifiers combine multiple weak learning models into a powerful predictive model; decision trees are typically used as the weak learners. Gradient boosting models are gaining popularity due to their ability to handle complex data sets, and have recently won many of Kaggle's data science competitions. Scikit-Learn, a Python machine learning package, offers gradient boosting classifiers, and XGBoost is another popular implementation. A single accurate model is built by combining the individual weak learners; Gradient Boosting gives the best results among these algorithms.
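The five labeling rules described in the methodology can be sketched as a small Python function. The column names (`grade`, `dropout`, `grade_value`, `expected_grade`, `course_difficulty`) are hypothetical, since the paper does not give the dataset's actual schema:

```python
def label_student(rec):
    """Map one student record (a dict) to the five target classes
    described in the methodology: 0=Fgr, 1=Wgr, 2=RelF, 3=RelCF, 4=good."""
    if rec["grade"] in ("D", "F"):
        return 0  # Fgr: failing student
    if rec["dropout"]:
        return 1  # Wgr: dropout
    if rec["grade_value"] < rec["expected_grade"]:
        # Check course difficulty first: RelCF (3) is the stricter case,
        # otherwise the record is plain RelF (2).
        return 3 if rec["course_difficulty"] else 2
    return 4  # performing well


# Example: a B student who expected an A in a difficult course -> RelCF
record = {"grade": "B", "dropout": False, "grade_value": 3.0,
          "expected_grade": 4.0, "course_difficulty": True}
print(label_student(record))  # -> 3
```

Note that the RelCF check must precede the RelF check, since every RelCF record also satisfies the RelF condition.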
IV. PROPOSED SYSTEM
In the screenshot above we can see the total number of records in the dataset, followed by how many records the algorithm selects for training and for testing. Clicking the 'Features Extraction' button extracts the features and assigns them as class labels for the classifier algorithms.
In the accuracy screenshot, the x-axis represents the name of the algorithm and the y-axis represents its accuracy. We can now test new student records on this trained model to predict or classify new student performance. To check a new student, we upload the 'text.txt' test dataset from the dataset folder; this dataset contains the data shown below.
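The train/test split and per-algorithm accuracy comparison behind those screenshots can be sketched with scikit-learn. This is a minimal stand-in, not the paper's application: the synthetic features and binary good/poor target below replace the actual grade dataset, which is not reproduced here:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.metrics import accuracy_score

# Synthetic stand-in for the extracted grade features (4 per student).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = (X.sum(axis=1) > 0).astype(int)  # toy good/poor performer target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# The four classifiers compared in the paper.
models = {
    "SVM (RBF kernel)": SVC(kernel="rbf"),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
}

# Train each model and report test accuracy (the bar chart plots these).
for name, model in models.items():
    model.fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: {acc:.3f}")

# A new student record is classified the same way:
new_student = rng.normal(size=(1, 4))
print(models["Gradient Boosting"].predict(new_student))
```

The same `predict` call serves the 'text.txt' upload step: each uploaded record is passed to the chosen trained model to obtain its class label.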
VI. CONCLUSION
VII. REFERENCES