Article

Enhancing Student Academic Success Prediction Through Ensemble Learning and Image-Based Behavioral Data Transformation

1 Faculty of Artificial Intelligence in Education, Central China Normal University, Wuhan 430079, China
2 Information Office, Central China Normal University, Wuhan 430079, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(3), 1231; https://doi.org/10.3390/app15031231
Submission received: 20 December 2024 / Revised: 20 January 2025 / Accepted: 23 January 2025 / Published: 25 January 2025

Abstract

Predicting student academic success is a significant task in the field of educational data analysis, offering insights for personalized learning interventions. However, existing research faces challenges such as imbalanced datasets, inefficient feature transformation methods, and limited exploration of data integration. This research introduces an innovative method for predicting student performance by transforming one-dimensional student online learning behavior data into two-dimensional images using four distinct text-to-image encoding methods: Pixel Representation (PR), Sine Wave Transformation (SWT), Recurrence Plot (RP), and Gramian Angular Field (GAF). We evaluated the transformed images using a CNN and an FCN individually, as well as an ensemble network, EnCF. Additionally, traditional machine learning methods, such as Random Forest, Naive Bayes, AdaBoost, Decision Tree, SVM, Logistic Regression, Extra Trees, K-Nearest Neighbors, Gradient Boosting, and Stochastic Gradient Descent, were employed on the raw, untransformed data with the SMOTE method for comparison. The experimental results demonstrated that the Recurrence Plot (RP) method outperformed the other transformation techniques when using a CNN and achieved the highest classification accuracy of 0.9528 under the EnCF ensemble framework. Furthermore, the deep learning approaches consistently achieved better results than traditional machine learning, underscoring the advantages of image-based data transformation combined with advanced ensemble learning approaches.

1. Introduction

Over the past few years, institutions of higher learning have increasingly adopted information and communication technology (ICT)-based learning approaches, resulting in the generation of vast amounts of educational data. These data, sourced from Learning Management Systems (LMS), Student Information Systems (SIS), video-assisted courses, and other digital learning platforms, offer valuable insights into student behavior and academic performance patterns [1]. However, this influx of data brings new challenges, including data complexity, diversity, and high dimensionality. To make sense of this vast array of information, Educational Data Mining (EDM) has gained recognition as a significant tool, enabling researchers and educators to analyze data and predict student academic success [2].
Educational Data Mining (EDM) is a rapidly expanding domain that applies data-driven approaches to analyze and improve various aspects of education. With the increasing availability of data from digital learning platforms, student management systems, and online learning behaviors, EDM provides opportunities to uncover patterns and insights that can inform personalized learning, early intervention strategies, and policy decisions [3]. By harnessing advanced approaches such as machine learning and data representation, EDM has proven effective in addressing key educational challenges, including dropout prediction, performance forecasting, and engagement analysis.
The task of predicting student academic success has become a critical research area, as accurate predictions can guide interventions and support strategies, particularly for students at risk of poor performance or dropout [4]. Traditional predictive models have primarily relied on structured data, such as test scores, attendance records, and demographic information. While these features provide valuable insights, they often oversimplify the multifaceted nature of student performance by overlooking behavioral data, which can reveal deeper patterns of engagement, effort, and learning dynamics. Behavioral data, such as online learning interactions, video engagement, and assignment submission patterns, are inherently unstructured and multidimensional, making them challenging to analyze using conventional methods. As a result, valuable information embedded in these data remains untapped in many predictive frameworks. This limitation underscores the need for innovative approaches that can effectively extract and model behavioral data to enhance the precision and robustness of academic success predictions.
Data-driven methods have greatly transformed this field, as machine learning and deep learning techniques now enable the analysis of various data types, including engagement metrics, grades, and participation patterns [5]. However, a persistent challenge in predicting student success is the complexity and high dimensionality of behavioral data collected from online learning platforms. Traditional predictive methods typically rely on one-dimensional numerical or text-based data, which can overlook latent patterns within the data that may significantly influence academic outcomes [6]. In recent research, transforming text-based and numerical behavioral data into alternative formats has shown promise for boosting the effectiveness of predictive architectures. One particularly effective approach involves converting behavioral data into image-based representations, which allows for the application of advanced image-processing techniques from the field of deep learning [7]. By representing behavioral data visually, these methods aim to capture temporal dependencies, correlations, and intricate patterns that might be difficult to identify using traditional linear modeling techniques [8]. Deep learning architectures, particularly Convolutional Neural Networks (CNNs), are highly appropriate for this work. CNNs are highly effective at extracting spatial patterns from image-based data, making them ideal for analyzing the transformed behavioral representations. Combining image features within an ensemble framework enhances predictive performance by leveraging their complementary strengths, offering a comprehensive approach to analyzing behavioral data. This research contributes several key findings in the domain of predicting student academic success.
(1)
This study introduces a novel feature integration approach by combining high-dimensional image features extracted by a CNN from student data transformed into images using methods such as Pixel Representation (PR), Sine Wave Transformation (SWT), Recurrence Plot (RP), and Gramian Angular Field (GAF) with low-dimensional numerical features extracted by an FCN from the original data. The feature-level fusion mechanism leverages concatenated features from both networks, enabling the model to automatically learn optimal combinations of complementary information from heterogeneous data sources for improved classification accuracy.
(2)
A feature-level fusion mechanism is introduced, where the output feature vectors from the CNN and FCN are concatenated to establish a unified feature structure. Instead of direct weighting, this method empowers the ensemble framework to learn an optimal combination of features through end-to-end training, capturing complex temporal and nonlinear relationships from image data while preserving essential numerical details from data, leveraging the complementary strengths of both networks and improving classification robustness.
(3)
To benchmark the proposed approach, a range of machine learning techniques, including Stochastic Gradient Descent (SGD), Gradient Boosting (GB), Decision Tree (DT), Extra Trees (ET), K-Nearest Neighbors (KNN), Support Vector Machine (SVM), Naive Bayes (NB), Logistic Regression (LR), AdaBoost, and Random Forest (RF), are employed for performance comparison. In addition, several deep learning methods, consisting of a Convolutional Neural Network (CNN), a Fully Connected Network (FCN), MobileNet, and EfficientNetB4, are used for comparison. The evaluation metrics enable a thorough comparison of the proposed framework against cutting-edge methods.
The remainder of this study is organized as follows: Section 2 presents a comprehensive survey of existing work employing machine learning, deep neural networks combined with feature transformation, and ensemble learning to predict student academic performance, highlighting current research and key findings in the domain. Section 3 introduces the dataset used in this research, details the proposed methodology, and describes the feature transformation techniques, machine learning classifiers, and deep neural networks applied, along with the evaluation metrics employed for analysis. Section 4 focuses on the experimental results and their interpretation, emphasizing the significance of the findings in achieving the research objectives. Finally, Section 5 concludes by highlighting the major contributions and their implications, while also proposing potential directions for future research.

2. Related Work

2.1. Machine Learning with Student Success Prediction

Predicting student academic performance has been a prominent research area in educational data mining, with numerous machine learning approaches applied to detect students with high risk and predict academic outcomes. Conventional machine learning models, for instance, Random Forest, Decision Tree (DT), K-Nearest Neighbor (KNN), and Gradient Boosting (GB), have been widely used to process structured student data, such as demographic information, course grades, and engagement metrics.
Asif et al. [9] utilized data mining approaches such as clustering to forecast students’ final grades and decision trees to monitor students’ learning progress, achieving a maximum accuracy of 83.63%. Yagci [10] employed classical machine learning algorithms to predict students’ performance in Turkish language courses, achieving classification accuracy between 70% and 75%. Bujang et al. [11] introduced a student grade forecasting system driven by AutoML, combining different machine learning algorithms with the SMOTE technique and feature selection strategies. They found that Random Forest achieved an F1 score of 99.5%. Nayak et al. [12] applied a range of machine learning approaches like RF, NB, DT, and MLP to two datasets for student classification. Ram et al. [13] developed a machine learning platform for predicting learners’ academic achievement, utilizing classifiers like Support Vector Machines, AdaBoost, Logistic Regression, and Random Forest. They found that SVM and Random Forest achieved accuracies of 92%, while Logistic Regression and AdaBoost achieved accuracies of 91%. Balabied et al. [14], utilizing the OULAD dataset, reported an accuracy rate of 90% using the random forest algorithm.
These models are particularly effective when working with numeric data, where the relationships between features can be directly modeled. However, while these approaches are valuable, they often struggle to capture more complex temporal or sequential patterns in behavioral data, such as those generated from online learning environments. Moreover, these approaches typically depend on feature extraction, which can limit their ability to generalize to new datasets or uncover hidden relationships within the data.

2.2. Deep Learning with Image-Based Data Transformation Techniques

Recent work has investigated the application of deep neural networks to enhance performance in student academic success prediction, taking advantage of their potential to automatically extract features from original data. Aljohani et al. [15] utilized clickstream data from students’ weekly online learning sessions in the OULAD dataset for forecasting whether students will succeed or fail employing deep neural networks, specifically LSTM models. Their model achieved an accuracy of 95.23% in predicting academic outcomes during the final week of e-learning courses, exceeding SVM, LR, and ANN. Waheed et al. [16] leveraged big data from Virtual Learning Environments (VLEs) and applied deep artificial neural networks to forecast students’ academic outcomes. Their research demonstrated that deep neural networks can effectively forecast academic achievement from VLE big data. Huang and Zeng [17] proposed an innovative framework for forecasting academic achievement, utilizing dual graph neural networks to efficiently leverage both structural data from interaction behaviors and the feature spaces of student attributes. The architecture outperforms existing methods, reaching a classification accuracy of 83.96% in predicting pass/fail outcomes and 90.18% for predicting pass/withdraw outcomes with a commonly used open-access dataset.
Nevertheless, these models are often applied to one-dimensional, unprocessed data, which may not fully leverage the representational power of deep learning architectures designed for complex data types like images. Transforming non-visual data into image-based formats is a technique gaining traction in diverse areas, such as finance, industry, and, more recently, education. The motivation behind this transformation is to allow the use of powerful image classification methods, particularly the convolutional neural network (CNN), which excels at detecting spatial patterns and correlations. The CNN is well suited to spatially structured data, rendering it an ideal choice for image recognition tasks because of its capability to automatically extract hierarchical features from unprocessed image data. Several methods have been proposed to transform time-series or sequence-based data into two-dimensional images, including Pixel Representation (PR), Sine Wave Transformation (SWT), Recurrence Plot (RP), and Gramian Angular Field (GAF).
Ben Said et al. [18] presented a dual-path deep learning architecture that effectively predicts the performance of online learners early by analyzing clickstream data and converting it into images utilizing the Gramian Angular Field (GAF). The model combines demographic and assessment data and demonstrated promising results on the OULAD dataset. Yang et al. [19] introduced a sensor classification framework through the conversion of multivariate time series data into 2D images via GASF, GADF, and MTF methods. The framework utilized a CNN model for classification tasks, achieving high accuracy comparable to complex architectures and outperforming traditional methods. Li and Wang [20] employed the Gramian Angular Difference Field (GADF) method to transform time series features of physiological signals into 2D time-series images, significantly improving individual difference classification accuracy and outperforming traditional time-series features. Yin et al. [21] proposed an innovative time-series similarity measurement approach that transforms time series into 2D images by integrating time-based and frequency-based features. The method combines Recurrence Plots (RPs) and Wavelet Scalograms (WSs) to generate fused images, enhancing information for similarity analysis. ResNet-18 was employed to capture features from the fused images, and Euclidean distance between feature vectors was applied to evaluate similarity. Experiments on the UCR time-series dataset showed that this fused approach significantly outperformed traditional and single-domain methods. Jin et al. [22] converted time-series data into visual representations utilizing the Gramian Angular Field (GAF) and Recurrence Plot (RP) methods. The approach applied LSTM, GRU, and Bidirectional LSTM networks to process raw time-series data, and utilized a Broad Learning System (BLS) to learn from the generated images, combining probabilistic outputs using the Dempster–Shafer evidence theory to achieve competitive results on public datasets.

2.3. Ensemble Learning Approaches with Student Success Prediction

Ensemble learning has been widely adopted to improve classification accuracy by combining multiple models, thereby reducing variance and increasing robustness [23]. Techniques like bagging [24], boosting [25], and stacking [26] create ensemble models that generalize better to unseen data, addressing the limitations of individual models [27]. In particular, deep learning ensembles have shown promise in tasks requiring both high-level feature extraction and fine-grained pattern recognition [28]. Ensemble approaches have been used extensively in educational research to integrate predictive models and increase the reliability of student success predictions.
Saidani et al. [29] introduced an approach that integrates CNN feature extraction with machine learning models to estimate students’ academic outcomes. By utilizing an ensemble model of Random Forest and SVM, the approach achieved a classification accuracy of 98.99%, outperforming existing models. Teoh et al. [30] developed a method for estimating student performance in video-based learning environments, utilizing ensemble learning techniques including stacking, boosting, and bagging. The experimental findings indicate that boosting reached the best achievement, with an accuracy of 90.9%, outperforming stacking and bagging. Shayegan et al. [31] employed an ensemble model that integrates multiple machine learning classifiers as base learners and Logistic Regression as the final learner to forecast student outcomes on the OULAD dataset. The model achieved 98% accuracy, a 4% improvement over traditional methods. Al-Ameri et al. [32] used features extracted by a CNN in combination with an ensemble learning model that integrates SVM and RF classifiers to forecast student academic success. The proposed method achieved an accuracy of 97.88%, outperforming traditional models.
Ensemble learning can also be used to identify the most relevant features of the dataset associated with students’ performance. For instance, a hybrid machine learning framework that combines several classification models and ensemble architecture to identify the most effective predictive model has been proposed [33]. The results show that the ensemble methods achieve greater predictive accuracy compared to individual classifiers, with the model accuracy enhanced by the application of feature selection techniques. Furthermore, ensemble learning can be utilized to forecast students’ performance in e-learning environments. For example, a study has introduced an ensemble meta-model that aggregates predictions from optimal models to improve student classification accuracy, resulting in a final predictive model [34], and the findings indicate that the ensemble meta-model achieves superior performance compared to single models, with an accuracy of 93%.
In summary, ensemble learning is a powerful technique for student academic performance prediction, since it is capable of enhancing the performance of individual models and providing more robust predictions. Through the combination of the strengths of diverse models and feature selection techniques, ensemble learning can provide accurate predictions of students’ performance and help educators recognize students who need extra assistance and interventions.

3. Materials and Methods

3.1. Dataset

This research makes use of a dataset containing student online learning behavior data collected from an e-learning platform named MOODLE. It was released by Hasan et al. [35] and comprises three categories, i.e., students’ academic information, video interactions, and activities. Academic information was extracted from the Student Information System (SIS), while video interaction data were obtained from eDify, and activity data were sourced from MOODLE. Merging these three sources yielded the final dataset containing 326 samples and 21 features. These include the features “Applicant Name”, “Attempt Count”, “Prohibition”, “CGPA”, “Remote Student”, “High Risk”, “At Risk”, “Term Exceeded”, “At Risk SSC”, “Plagiarism history”, “CW1”, “Other Modules”, “CW2”, “Online C”, “ESE”, “Online O”, “Paused”, “Played”, “Likes”, “Segment”, and “Result” [29]. The target variable in this study is “Result”, and the main feature description of the dataset is shown in Table 1.

3.2. Data Preprocessing

The dataset exhibited a slight imbalance between pass and fail labels, which could potentially bias the models towards the majority class. To address this issue, we applied SMOTE to the training data, helping to balance the class distribution. This technique was implemented on both the original data for traditional machine learning algorithms and the transformed image data for deep neural networks. Table 2 illustrates the class distribution of the dataset.
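To make the oversampling step concrete, the sketch below implements the core SMOTE interpolation by hand in NumPy; it is a simplified stand-in for a library implementation (e.g., imbalanced-learn), and the class sizes and feature count here are illustrative.

```python
import numpy as np

def smote_oversample(X_min, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples by interpolating
    between a randomly chosen point and one of its k nearest
    minority-class neighbours (the essence of SMOTE)."""
    rng = np.random.default_rng(seed)
    # pairwise distances within the minority class
    d = np.linalg.norm(X_min[:, None] - X_min[None, :], axis=2)
    np.fill_diagonal(d, np.inf)                 # exclude self-matches
    nn = np.argsort(d, axis=1)[:, :k]           # k nearest neighbours
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        j = nn[i, rng.integers(k)]
        lam = rng.random()                      # interpolation factor
        synthetic.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.vstack(synthetic)

rng = np.random.default_rng(42)
X_fail = rng.normal(size=(30, 21))              # hypothetical minority class
X_new = smote_oversample(X_fail, n_new=40)      # balance 30 -> 70 samples
```

The synthetic points lie on line segments between real minority samples, so the balanced training set stays within the observed feature space.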

3.3. Image-Based Transformation Techniques

In this section, to leverage deep learning models for classification, we transformed the one-dimensional student behavioral data into two-dimensional images using four different techniques: Pixel Representation (PR), Sine Wave Transformation (SWT), Recurrence Plot (RP), and Gramian Angular Field (GAF). The online learning behavior data of each student were transformed into separate images, resulting in a total of 326 images. Each transformation method is designed to capture different aspects of the temporal and structural relationships within the dataset. Figure 1 presents a sample of the four kinds of images transformed from the 1D raw data.

3.3.1. Pixel Representation

Pixel Representation (PR) is a straightforward approach that transforms one-dimensional numerical data into a two-dimensional image by mapping each value directly to grayscale pixel intensity [36]. This method provides a basic yet effective visualization of raw data, where each student’s behavior sequence is represented as a matrix of pixel intensities that correspond to the magnitude of the data values. The grayscale level of each pixel indicates the behavior data’s numerical value, with darker pixels representing higher values and lighter pixels indicating lower values. This approach maintains the sequential order of data points, allowing convolutional neural networks to detect patterns in student behavior over time based on pixel intensities and distributions.
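A minimal sketch of this mapping, assuming min–max scaling and an 8×8 target grid (both illustrative choices not fixed by the text): each scaled feature value becomes one grayscale pixel, with unused cells zero-padded; the intensity polarity can be inverted to match a darker-is-higher convention.

```python
import numpy as np

def to_pixel_image(row, size=8):
    """Map a 1-D feature vector to a square grayscale image by
    min-max scaling to [0, 1] and padding to size*size pixels."""
    x = np.asarray(row, dtype=float)
    x = (x - x.min()) / (x.max() - x.min() + 1e-12)   # scale to [0, 1]
    img = np.zeros(size * size)
    img[: len(x)] = x                                  # sequential order kept
    return (img.reshape(size, size) * 255).astype(np.uint8)

img = to_pixel_image(np.arange(21))   # 21 features -> one 8x8 image
```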

3.3.2. Sine Wave Transformation

Sine Wave Transformation (SWT) involves generating a sine wave corresponding to the values in the one-dimensional behavior data [37]. With this technique, every data point is used to modulate a sine wave, where the amplitude, frequency, or phase may vary depending on the value. The sine wave generated is then represented as an image. This transformation emphasizes the periodic nature of the behavioral data and can capture recurring patterns or regular intervals in the student’s activities. Sine Wave Transformation creates a continuous and smooth image that represents temporal data trends, allowing deep learning models to learn patterns related to periodic behavior changes. This method is especially useful in analyzing sequential data, where periodic learning behaviors may correlate with academic success or struggle.
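The exact modulation scheme is not specified here, so the sketch below assumes amplitude modulation: each (scaled) feature value sets the local amplitude of a sine curve, which is then rasterized into a binary image; the image dimensions and wave frequency are illustrative.

```python
import numpy as np

def sine_wave_image(row, width=64, height=32):
    """Render an amplitude-modulated sine wave as a binary image:
    the feature values, stretched across the image width, scale the
    amplitude of a fixed-frequency sine curve."""
    x = np.asarray(row, dtype=float)
    x = (x - x.min()) / (x.max() - x.min() + 1e-12)      # amplitudes in [0, 1]
    amp = np.interp(np.linspace(0, len(x) - 1, width),
                    np.arange(len(x)), x)                # stretch to width
    t = np.linspace(0, 4 * np.pi, width)
    wave = amp * np.sin(t)                               # values in [-1, 1]
    rows = ((1 - wave) / 2 * (height - 1)).astype(int)   # map to pixel rows
    img = np.zeros((height, width), dtype=np.uint8)
    img[rows, np.arange(width)] = 255                    # draw the curve
    return img

img = sine_wave_image(np.arange(21))
```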

3.3.3. Recurrence Plot

As a nonlinear transformation, a Recurrence Plot (RP) visualizes the recurrence of states in a time series [38]. Given a sequence of student behavior data, the RP calculates when certain states reappear over time, creating a symmetric matrix where each point represents a recurrence between two data points. High-density clusters on the plot indicate frequent recurrences, while isolated points or gaps suggest unique or one-time events. In this study, the RP captures the complex temporal dependencies and repetitive patterns within the behavioral data. By transforming the data into a recurrence matrix, the RP provides deep learning models with a robust view of recurring behavioral sequences, making it particularly effective in distinguishing consistent academic behaviors that contribute to success or failure.
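The recurrence matrix described above can be computed directly with pairwise distances; the threshold `eps` below is an illustrative choice, not a value taken from the study.

```python
import numpy as np

def recurrence_plot(row, eps=0.1):
    """Binary recurrence matrix: R[i, j] = 1 when the (scaled)
    values at positions i and j are within eps of each other."""
    x = np.asarray(row, dtype=float)
    x = (x - x.min()) / (x.max() - x.min() + 1e-12)   # scale to [0, 1]
    d = np.abs(x[:, None] - x[None, :])               # pairwise distances
    return (d <= eps).astype(np.uint8)                # symmetric by design

R = recurrence_plot(np.arange(21), eps=0.1)
```

Because the matrix is symmetric with a unit diagonal, dense off-diagonal blocks mark recurring states while sparse regions mark one-off events.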

3.3.4. Gramian Angular Field

Gramian Angular Field (GAF) is an approach that encodes time-series data into images by transforming it into polar coordinates and calculating the trigonometric sum or difference between each pair of points [39]. Specifically, GAF first scales data values into the interval [−1, 1] to convert them into angular values, representing each point’s relationship over time as an image. This transformation captures both the magnitude and the directional changes from the dataset, making it especially effective for uncovering patterns that arise from fluctuations in student behavior over time. The GAF representation provides a rich visual summary of the time-series data. In this study, we use GAF to represent both the temporal evolution and magnitude differences across the student behavior data.
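A minimal NumPy sketch of the summation-field variant (GASF): values are rescaled to [−1, 1], mapped to polar angles via arccos, and pairwise angle sums are converted back with cosine; the difference-field variant would use the sine of angle differences instead.

```python
import numpy as np

def gramian_angular_field(row):
    """Gramian Angular Summation Field: scale to [-1, 1], take
    phi = arccos(x), and build G[i, j] = cos(phi_i + phi_j)."""
    x = np.asarray(row, dtype=float)
    x = 2 * (x - x.min()) / (x.max() - x.min() + 1e-12) - 1   # to [-1, 1]
    phi = np.arccos(np.clip(x, -1.0, 1.0))                    # polar angles
    return np.cos(phi[:, None] + phi[None, :])                # Gram-like matrix

G = gramian_angular_field(np.arange(21))
```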
In summary, the selection of Pixel Representation (PR), Sine Wave Transformation (SWT), Recurrence Plot (RP), and Gramian Angular Field (GAF) is based on their ability to capture diverse aspects of student behavior data. PR provides a simple yet effective visualization of numerical structures, SWT emphasizes cyclical and temporal trends, RP uncovers complex temporal dynamics and recurrence patterns, and GAF preserves both spatial and temporal dependencies. As a whole, these methods enable a thorough investigation of student learning behaviors, utilizing the advantages of different image transformation techniques to enhance classification performance.

3.4. Machine Learning Models

In this section, we introduce the application of different machine learning models in this research, offering a comprehensive review of their execution. The Scikit-learn library, alongside the Natural Language Toolkit (NLTK), serves as the foundation for executing these algorithms. Specifically, a range of supervised learning methods, commonly utilized for classification and regression tasks, have been implemented using Python’s Scikit-learn module. For this research, RF [40], SVM [41], ET [42], GB [43], AdaBoost [44], DT [45], LR [46], SGD [47], KNN [48], and NB [49] were employed to conduct experiments on the original data, with default parameter settings serving as baseline comparisons.
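The baseline setup can be sketched as follows, using scikit-learn classifiers with default parameters on a synthetic stand-in for the real feature matrix (shown for three of the ten models; the data and split here are illustrative, not the study's).

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 326 samples x 21 features, binary labels with
# signal in the first feature so the classifiers have something to learn.
rng = np.random.default_rng(0)
X = rng.normal(size=(326, 21))
y = (X[:, 0] + rng.normal(scale=0.5, size=326) > 0).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

scores = {}
for name, clf in [("RF", RandomForestClassifier(random_state=0)),
                  ("LR", LogisticRegression(max_iter=1000)),
                  ("KNN", KNeighborsClassifier())]:
    clf.fit(X_tr, y_tr)                 # default-parameter baseline
    scores[name] = clf.score(X_te, y_te)
```

The same loop extends to the remaining classifiers (SVM, NB, DT, ET, GB, SGD, AdaBoost) by adding them to the list with their default constructors.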

3.5. Deep Learning Models

Deep learning models have attracted significant attention due to their remarkable effectiveness across a multitude of applications. These models demonstrate a strong capability to extract essential features and capture complex relationships with the target class. This research employs deep learning architectures comprising a CNN, an FCN, MobileNet, and EfficientNetB4. A brief summary of these models is provided below.

3.5.1. Convolutional Neural Network

A Convolutional Neural Network (CNN) is a class of deep learning models modeled after the brain’s visual processing, widely utilized in computer vision and pattern recognition [50]. A CNN leverages convolutional layers to extract hierarchical spatial features and pooling layers to reduce dimensionality, thereby improving computational efficiency while preserving essential information. Its unique weight-sharing mechanism and ability to learn representations at different levels make CNNs especially powerful for processing high-dimensional data, such as images and signals. With its remarkable performance in applications such as classification, detection, and segmentation, CNN has become a cornerstone in addressing complex multimodal data analysis challenges.

3.5.2. Fully Connected Network

A Fully Connected Network (FCN) is a basic architecture consisting of multiple layers in which each neuron is connected to every neuron in the next layer [51]. This dense connectivity enables an FCN to capture intricate, non-linear relationships in the data, rendering it highly effective for applications requiring feature integration and representation learning. While traditionally used for classification and regression, the FCN has been adapted for a range of uses in the domain of image processing. Despite its high expressive power, the fully connected structure often leads to increased computational costs and susceptibility to overfitting, necessitating techniques such as regularization and dropout to enhance generalization.

3.5.3. MobileNet

MobileNet is an efficient CNN architecture optimized for deployment in mobile and embedded systems, as highlighted by Ashwinkumar et al. [52]. Its core innovation lies in the use of depthwise separable convolutions, which substantially decrease the parameter size and computational cost compared to standard convolutions. This design enables MobileNet to deliver competitive performance in image classification tasks while being highly suitable for resource-constrained environments. MobileNet efficiently captures spatial information in input images by leveraging its modular design, making it well suited for applications requiring real-time inference on low-power devices. However, its lightweight nature and emphasis on efficiency may pose challenges in capturing highly complex or intricate patterns present in certain datasets. While MobileNet excels in tasks involving relatively simple visual features, its performance might be constrained in scenarios requiring deeper and more specialized feature extraction capabilities. Despite these limitations, MobileNet remains a robust and adaptable solution for various computer vision tasks, particularly in environments with strict computational and memory limitations.

3.5.4. EfficientNetB4

The EfficientNetB4 model is an advanced CNN architecture developed for achieving high capability in image classification tasks with optimal computational efficiency, as introduced by Geetha and Prakash [53]. Its core innovation lies in the compound scaling method, which uniformly adjusts the network’s depth, width, and input resolution by applying a set of fixed scaling factors. This method ensures a balanced trade-off between accuracy and efficiency, enabling EfficientNetB4 to reach leading performance with a substantially reduced parameter count and lower computational cost compared to a conventional CNN. EfficientNetB4 effectively captures spatial and contextual information from input images through its well-designed architecture, leveraging features such as squeeze-and-excitation blocks to enhance feature recalibration. Its scalability and efficiency make it particularly suitable for tasks that demand high accuracy while operating within limited computational and memory budgets. However, due to its higher complexity compared to lightweight models like MobileNet, EfficientNetB4 may require more computational resources, rendering it less appropriate for real-time applications on low-power devices.
In conclusion, the selection of CNN, FCN, MobileNet, and EfficientNetB4 is based on their complementary capabilities in addressing the various challenges of image-based classification tasks for student behavior data. The CNN is chosen for its capability to automatically detect spatial hierarchies and capture local features, making it effective in identifying key features from transformed images. The FCN extends this functionality by enabling pixel-level feature extraction, which is crucial for capturing fine-grained details across the entire image, especially in dense data representations. MobileNet is selected for its lightweight architecture and computational efficiency, making it appropriate for real-time use and environments with limited resources. EfficientNetB4 is included for its advanced scalability and state-of-the-art performance, balancing model depth, width, and resolution to ensure high accuracy on complex and high-resolution image data. Ultimately, these models offer a robust, efficient, and versatile framework for estimating student learning performance, leveraging the strengths of feature extraction, efficiency, and scalability.

3.6. Proposed Methodology

To attain better results, numerous studies have shown a strong preference for ensemble machine learning models. These models, which integrate multiple classifiers, consistently outperform single algorithms, making them a popular and effective option. Accordingly, this research adopts an ensemble approach to predict students' academic performance.
As depicted in Figure 2, our proposed ensemble model integrates two deep learning models: a CNN and an FCN. Experiments were conducted on an e-learning dataset in three distinct settings.
(1) Using the 1D raw data with conventional and deep learning approaches;
(2) Using the transformed 2D data with deep learning methods;
(3) Using the transformed 2D data with ensemble learning methods.
In the first experiment, all features were used to predict academic outcomes. In the second experiment, we applied four text-to-image transformation methods, Pixel Representation (PR), Gramian Angular Field (GAF), Sine Wave Transformation (SWT), and Recurrence Plot (RP), to convert the original features into images, yielding four distinct types of transformed image. Each student was represented by four images, one per transformation method, for a total of 1304 images. Classification experiments were conducted separately on each image type using individual CNN and FCN models. In the third experiment, an ensemble learning approach was adopted, integrating the CNN and FCN models into a single framework. The original features were transformed into images using the four text-to-image methods and fed into the CNN for training, while the raw features were input directly into the FCN. This ensemble model thus leveraged both the original features and the features extracted from the transformed images.
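As a concrete illustration of one of these encodings, the sketch below converts a short feature vector into a recurrence matrix. It assumes a thresholded pairwise-distance formulation with a hypothetical threshold `eps`; the exact parameters used in the pipeline may differ.

```python
import numpy as np

def recurrence_plot(x, eps=None):
    """Encode a 1D feature vector as a 2D recurrence matrix.

    R[i, j] = 1 when |x[i] - x[j]| <= eps; otherwise 0. With eps=None,
    the raw pairwise-distance matrix is returned, which can be rendered
    directly as a grayscale image.
    """
    x = np.asarray(x, dtype=float)
    dist = np.abs(x[:, None] - x[None, :])  # pairwise absolute distances
    if eps is None:
        return dist
    return (dist <= eps).astype(np.uint8)

# A 4-feature record becomes a 4x4 binary image.
img = recurrence_plot([0.1, 0.9, 0.2, 0.8], eps=0.15)
```

The resulting matrix is symmetric with a unit diagonal, so similar feature values show up as bright off-diagonal blocks once the matrix is rendered as an image.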
We introduce an innovative ensemble learning architecture that integrates a CNN and an FCN for student performance prediction, illustrated in Figure 3. The architecture comprises three main layers: data preprocessing, feature extraction, and ensemble prediction. In the preprocessing layer, categorical features are encoded with a label encoder, and class imbalance is addressed through SMOTE oversampling. The feature extraction layer operates through two parallel paths: (1) raw data is transformed into recurrence plots, which are processed by a CNN with two convolutional layers (32 and 64 filters) followed by max-pooling and fully connected layers, and (2) numerical features are processed directly by an FCN with two hidden layers (64 and 32 neurons). The ensemble prediction layer concatenates the outputs of both networks and passes them through a final fully connected layer with sigmoid activation for classification. The model is trained with cross-entropy loss and the Adam optimizer at a learning rate of 0.001, with dropout (0.5) for regularization. Training uses early stopping with a patience of 10 epochs to prevent overfitting. In general, the CNN processes student data that has been transformed into image format, leveraging its capability to extract patterns from images that correlate with student behavior. The FCN is a simple fully connected network that processes the original numerical features directly, complementing the image features extracted by the CNN.
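The SMOTE step in the preprocessing layer can be sketched as follows. This is a simplified, NumPy-only illustration of the interpolation idea behind SMOTE (the experiments use a standard implementation); the neighbor count `k` and the random seed are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(42)

def smote_like_oversample(X_min, n_new, k=3):
    """Create n_new synthetic minority samples by interpolating a randomly
    picked minority sample toward one of its k nearest minority neighbors."""
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        dists = np.linalg.norm(X_min - X_min[i], axis=1)
        neighbors = np.argsort(dists)[1:k + 1]  # exclude the sample itself
        j = rng.choice(neighbors)
        lam = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.vstack(synthetic)

X_minority = rng.normal(size=(10, 5))  # 10 minority samples, 5 features
X_new = smote_like_oversample(X_minority, n_new=15)
```

Each synthetic sample lies on the segment between two real minority samples, so the minority class is densified without simply duplicating records.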
Algorithm 1 presents a detailed step-by-step representation of the proposed ensemble model. Additionally, a concise pseudocode representation is provided below, illustrating the algorithm used to integrate the CNN and FCN for classifying students’ academic performance into pass and fail categories.
Algorithm 1: The proposed EnCF algorithm
Require: Training data D = {(X_i, y_i)}, i = 1, ..., N, where X_i is the input feature vector and y_i is the target class.
Require: Ensemble model EnCF combining a CNN with an FCN.
Ensure: Predicted target classes ŷ for the test data.
1: Split D into a training set D_train and a test set D_test.
2: Apply SMOTE to resample D_train and handle class imbalance.
3: Train the CNN on image features from D_train.
4: Train the FCN on raw features from D_train.
5: Combine the CNN and FCN outputs using a fully connected layer for the final prediction.
6: Ensemble prediction:
7: for each sample (X_j, y_j) in D_test do
8:   Compute image-based features using the CNN.
9:   Compute raw features using the FCN.
10:  Combine the CNN and FCN outputs to predict the class ŷ.
11: end for
Return: Predicted target classes ŷ for the test data.
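The ensemble prediction in steps 7 to 10 amounts to a feature-level fusion of the two streams. Below is a minimal NumPy sketch of that forward pass, with hypothetical layer sizes and random stand-in weights; in the actual model these weights are learned end-to-end.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical sizes: 128 CNN image features and 32 FCN numeric features,
# each projected to a 16-dimensional embedding before fusion.
W_cnn = rng.normal(scale=0.05, size=(128, 16))  # stands in for the CNN head
W_fcn = rng.normal(scale=0.05, size=(32, 16))   # stands in for the FCN head
W_out = rng.normal(scale=0.05, size=(32, 1))    # final fully connected layer

def encf_predict(img_feat, raw_feat):
    """Feature-level fusion: concatenate both streams, then sigmoid output."""
    h_cnn = relu(img_feat @ W_cnn)  # image-stream embedding
    h_fcn = relu(raw_feat @ W_fcn)  # raw-feature embedding
    fused = np.concatenate([h_cnn, h_fcn], axis=-1)
    return sigmoid(fused @ W_out)   # pass/fail probability

p = encf_predict(rng.normal(size=128), rng.normal(size=32))
```

The concatenation lets the final layer weigh image-derived and raw-feature evidence jointly rather than averaging two separate predictions.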

3.7. Evaluation Metrics

Performance evaluation metrics play a critical role in assessing model effectiveness and pinpointing areas for refinement and optimization. Commonly utilized metrics include F1 score, recall, accuracy, and precision, as highlighted by Chicco and Jurman [54]. These metrics are calculated based on values such as true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN).
Accuracy can be calculated as
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Precision is calculated using
Precision = TP / (TP + FP)
Recall is computed as
Recall = TP / (TP + FN)
F1 score can be calculated using
F1 score = (2 × Precision × Recall) / (Precision + Recall)
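The four formulas above can be computed directly from the confusion-matrix counts; a small sketch with illustrative example counts:

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall, and F1 from confusion counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Example confusion-matrix counts (illustrative only).
acc, prec, rec, f1 = classification_metrics(tp=80, tn=70, fp=10, fn=20)
```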

4. Results and Discussion

4.1. Experiment Setup

In this experiment, the model training was executed on a system equipped with an Intel Xeon Platinum 8362 processor and an NVIDIA A40 GPU. All models were implemented using Python. Comprehensive details regarding the hardware and software configuration can be found in Table 3. The dataset was divided into training and testing subsets at a ratio of 4:1.
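The 4:1 division can be sketched as a simple shuffled partition. This is a minimal illustration only; the actual experiments may additionally stratify by class.

```python
import numpy as np

rng = np.random.default_rng(0)

def split_4_to_1(X, y):
    """Shuffle and split into 80% training / 20% testing (4:1 ratio)."""
    idx = rng.permutation(len(X))
    cut = int(0.8 * len(X))
    train_idx, test_idx = idx[:cut], idx[cut:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

X = np.arange(100).reshape(50, 2)  # 50 toy samples, 2 features
y = np.arange(50) % 2              # alternating binary labels
X_tr, X_te, y_tr, y_te = split_4_to_1(X, y)
```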

4.2. Experimental Results of 1D Raw Data

In this section, several kinds of methods are applied to the 1D raw data, and we provide a thorough analysis of the evaluation results across all models.

4.2.1. Results for Machine Learning Models with 1D Raw Data

The results of the experiment can be found in Table 4 and Figure 4. The results reveal that RF achieves the highest accuracy of 92.42%, surpassing all other ML models. It is followed by the SVM, ET, and GB models, each with an accuracy of 90.91%. The NB model performs the worst on the balanced dataset, achieving 54.55% accuracy. The RF model also excels in other evaluation metrics, including 93.08% precision, 92.42% recall, and 91.69% F1 score. As previously noted, data upsampling techniques enhanced the performance of NB and LR, but their results remain less impressive compared to other tree-based ensemble models.

4.2.2. Results for Deep Learning Models with 1D Raw Data

Deep learning models were utilized on the raw data, with the results summarized in Table 5. The deep learning approaches yielded encouraging outcomes. Among the models, the CNN showed the best performance, reaching 84.91% accuracy. MobileNet and EfficientNet-B4 obtained accuracies of 82.08% and 83.02%, respectively.

4.2.3. Results of All Models with 1D Raw Data

The experimental findings of our study are displayed in Table 6 and Figure 5. We evaluated the results of various ML and DL models. The top-performing ML models, RF, SVM, ET, and GB, achieve higher accuracy than the DL models: the best ML model, RF, reaches an accuracy of 0.9242, while the best DL model, the 1D CNN, achieves 0.8491. The results also indicate that tree-based ensemble methods such as RF, ET, and GB outperform single models such as DT, LR, and SGD on this dataset, making RF a suitable choice for this type of data.

4.3. Results of Deep Learning with Four Feature Transformation Methods for 2D Data

The experimental results of our study are presented in Table 7. We evaluated two DL models, a 2D CNN and a 2D FCN, using four feature transformations: PR, GAF, SWT, and RP. The 2D CNN outperforms the 2D FCN on three of the four transformations; only for PR does the 2D FCN achieve slightly higher accuracy. The results demonstrate that the choice of feature transformation significantly influences the performance of the DL models. The RP transformation yields the best performance, with an accuracy of 0.9340, followed by SWT at 0.8868; GAF produces moderate performance, while PR performs worst.

4.4. Results of Ensemble Learning with Four Feature Transformation Methods for 2D Data

The experimental results of our study are presented in Table 8 and Figure 6. We evaluated two single DL models (2D CNN and 2D FCN) and an ensemble method (2D EnCF) that combines them, using four feature transformation techniques (PR, GAF, SWT, and RP). The RP transformation yielded the best results for all three models, with the 2D EnCF ensemble achieving an accuracy of 0.9528, outperforming the single 2D CNN and 2D FCN models. The 2D EnCF consistently matched or outperformed the single models across all transformations, indicating that combining multiple deep learning models improves both performance and robustness: the combination leverages each model's strengths and mitigates its weaknesses. Additionally, the choice of transformation technique substantially affects the results of all three models.

4.5. Results of All Models Using 1D and 2D Data

A comprehensive overview of the experimental results is presented in Table 9 and Figure 7. In Table 9, "/" denotes that no image transformation method was used. We assessed both traditional ML methods and DL models; the latter include the single models 1D CNN, MobileNet, EfficientNetB4, 2D CNN, and 2D FCN, as well as the 2D EnCF, an integrated architecture that merges the advantages of the 2D CNN and 2D FCN. The best traditional machine learning method, Random Forest (RF), achieves an accuracy of 0.9242, but the 2D EnCF ensemble surpasses all traditional methods, reaching 0.9528 with the RP feature transformation. The single deep learning models achieve varying levels of performance, the best being the 2D CNN with RP at an accuracy of 0.9340, and the 2D EnCF consistently outperforms them across feature transformations. The choice of feature transformation exerts a substantial influence on deep learning performance, with RP yielding the best results for all deep learning models, including the 2D EnCF ensemble.
The results demonstrate that the 2D EnCF ensemble method with RP is superior to both conventional ML methods and single DL models with respect to performance, and the combination of various models can leverage their strengths and mitigate their weaknesses, yielding superior overall results. Additionally, the choice of feature transformation significantly determines the performance of DL models.

4.6. Comparison with Previous Studies

In order to assess the effectiveness of our proposed approach, we conducted a comparative analysis with prior research. Previous studies employing various machine learning methods for predicting student academic performance [9,10,11,12,13,14] have reported accuracies reaching up to 92%. Research utilizing deep learning approaches [15,16,17] has achieved accuracies of up to 95.23% in similar prediction tasks. Further advancements involving the fusion of deep learning and ensemble learning techniques [29,30,31,32] have yielded accuracies as high as 98.99%. While the application of image transformation techniques to behavioral data in conjunction with deep learning for academic performance prediction remains relatively unexplored, a study similar to ours [18] employed clickstream data transformed into images, combined with deep learning, achieving an accuracy exceeding 95%.
In theory, converting 1D data into 2D images allows for capturing complex temporal and spatial relationships, which a DL model, particularly a CNN or FCN, is especially proficient at processing. DL models automatically extract hierarchical features, overcoming the limitations of conventional ML methods that depend on manual feature selection. Practically, this approach improves the performance by employing rich behavioral data that reflects students’ learning patterns, which are not fully captured by traditional numerical features. The use of image-based representations allows for improved generalization and robust predictions. Additionally, this method facilitates personalized educational interventions and supports the development of intelligent, data-driven educational systems that can proactively address students’ needs.

5. Conclusions

Educational data mining serves as a powerful analytical approach for evaluating educational data and forecasting students’ academic outcomes, assisting educators in making informed decisions. However, existing research faces several challenges, including the use of imbalanced datasets, reliance on entire feature sets without leveraging efficient transformation methods, and insufficient exploration of data integration strategies. To address these limitations, this research proposes an innovative architecture that combines advanced feature transformation, data integration, and ensemble learning to enhance the prediction of student academic success.
In this study, we transform one-dimensional textual behavioral data into two-dimensional images using four methods: Pixel Representation (PR), Sine Wave Transformation (SWT), Recurrence Plot (RP), and Gramian Angular Field (GAF). These transformations not only enhance the visualization of behavioral features but also capture complex temporal and non-linear relationships among student behaviors. The transformed images serve as input to a CNN, which captures high-dimensional image features, while the original data is processed by an FCN to extract low-dimensional numerical features. To fully exploit the complementary nature of these heterogeneous feature representations, we propose a novel dual-stream architecture where the outputs of a CNN and FCN are concatenated at the feature level to form a unified representation. This feature-level fusion mechanism enables the model to automatically learn optimal combinations of high-dimensional and low-dimensional features through an end-to-end training pipeline. By integrating diverse feature types, the model achieves significant improvements in classification accuracy and robustness compared to single-modality approaches. Among the image transformation methods, the Recurrence Plot combined with the EnCF ensemble demonstrated the best performance, outperforming both traditional machine learning methods applied to raw data and CNN-only and FCN-only models based on image-transformed data. This highlights the effectiveness of combining neural network architectures to capture complex patterns in both raw and transformed data. The experimental findings demonstrate that the RP-based image transformation, coupled with the EnCF ensemble network, achieves the highest prediction accuracy. This integration approach leverages the strengths of both feature extraction techniques, offering a more detailed and distinct representation of student data. 
Compared to traditional methods, the proposed framework demonstrates superior performance in terms of classification accuracy, robustness, and interpretability.
However, a significant limitation is the relatively modest dataset size, which poses challenges for the training of DL models. This limited sample size potentially elevates the risk of overfitting and may constrain the generalizability of findings to more extensive or heterogeneous datasets. Furthermore, the behavioral data employed in this study may not fully encapsulate the diversity and complexity inherent in student learning patterns across varying contexts and demographics. This limitation stems from a potential lack of comprehensiveness in the behavioral features captured. In addition, although the selected image transformation methods facilitate the extraction of salient features, they may not entirely preserve the intricate temporal and spatial relationships intrinsic to student behaviors. Consequently, some crucial predictive information for accurate modeling might inadvertently be omitted.
Future studies could extend this approach by incorporating other advanced architectures, such as transformer-based models or attention mechanisms, to further improve prediction accuracy. Additionally, this method can be applied to larger and more diverse datasets to validate its generalizability across various academic performance indicators, such as course completion rates or long-term success metrics. Exploring multi-class classification tasks could also provide deeper insights into varying levels of academic achievement. Finally, improving interpretability through techniques that link behavioral patterns with academic outcomes will further support educators in making data-driven decisions.

Author Contributions

S.Z.: Writing—original draft, formal analysis, methodology; D.Z.: writing—reviewing and editing, conceptualization, supervision, funding acquisition; H.W.: data curation, visualization, software; D.C.: writing—reviewing and editing, supervision, funding acquisition; L.Y.: writing—reviewing and editing. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Natural Science Foundation of China under Grant 62293553, 62177017, the Hubei Provincial Natural Science Foundation of China under Grant 2023AFA020, and the National Social Science Funds of China under Grant BCA230278.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All data are available upon request from the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
PR: Pixel Representation
SWT: Sine Wave Transformation
RP: Recurrence Plot
GAF: Gramian Angular Field
CNN: Convolutional Neural Network
OULAD: Open University Learning Analytics Dataset
FCN: Fully Convolutional Network
EnCF: Ensemble CNN and FCN
SMOTE: Synthetic Minority Over-sampling Technique
RF: Random Forest
SVM: Support Vector Machine
ET: Extra Trees
GB: Gradient Boosting
DT: Decision Tree
LR: Logistic Regression
SGD: Stochastic Gradient Descent
KNN: K-Nearest Neighbors
NB: Naive Bayes
ML: Machine Learning
DL: Deep Learning
ICT: Information and Communication Technology
LMS: Learning Management Systems
SIS: Student Information Systems
EDM: Educational Data Mining
GADF: Gramian Angular Difference Field
LSTM: Long Short-Term Memory
GRU: Gated Recurrent Units
BLS: Broad Learning System
WS: Wavelet Scalograms
GASF: Gramian Angular Summation Field
NLTK: Natural Language Toolkit
TP: True Positive
TN: True Negative
FP: False Positive
FN: False Negative

References

  1. Hasan, R.; Palaniappan, S.; Mahmood, S.; Shah, B.; Abbas, A.; Sarker, K.U. Enhancing the teaching and learning process using video streaming servers and forecasting techniques. Sustainability 2019, 11, 2049. [Google Scholar] [CrossRef]
  2. Romero, C.; Ventura, S. Educational data mining and learning analytics: An updated survey. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2020, 10, e1355. [Google Scholar] [CrossRef]
  3. Du, X.; Yang, J.; Hung, J.-L.; Shelton, B. Educational data mining: A systematic review of research and emerging trends. Inf. Discov. Deliv. 2020, 48, 225–236. [Google Scholar] [CrossRef]
  4. Albreiki, B.; Zaki, N.; Alashwal, H. A systematic literature review of students’ performance prediction using machine learning techniques. Educ. Sci. 2021, 11, 552. [Google Scholar] [CrossRef]
  5. Matz, S.C.; Bukow, C.S.; Peters, H.; Deacons, C.; Dinu, A.; Stachl, C. Using machine learning to predict student retention from socio-demographic characteristics and app-based engagement metrics. Sci. Rep. 2023, 13, 5705. [Google Scholar]
  6. Umer, R.; Susnjak, T.; Mathrani, A.; Suriadi, L. Current stance on predictive analytics in higher education: Opportunities, challenges and future directions. Interact. Learn. Environ. 2023, 31, 3503–3528. [Google Scholar] [CrossRef]
  7. Yang, Z.; Yang, J.; Rice, K.; Hung, J.-L.; Du, X. Using convolutional neural network to recognize learning images for early warning of at-risk students. IEEE Trans. Learn. Technol. 2020, 13, 617–630. [Google Scholar] [CrossRef]
  8. Kong, X.; Luo, C. A novel ConvLSTM with multifeature fusion for financial intelligent trading. Int. J. Intell. Syst. 2022, 37, 8855–8877. [Google Scholar] [CrossRef]
  9. Asif, R.; Merceron, A.; Ali, S.A.; Haider, N.G. Analyzing undergraduate students’ performance using educational data mining. Comput. Educ. 2017, 113, 177–194. [Google Scholar] [CrossRef]
  10. Yağcı, M. Educational data mining: Prediction of students’ academic performance using machine learning algorithms. Smart Learn. Environ. 2022, 9, 11. [Google Scholar] [CrossRef]
  11. Bujang, S.D.A.; Selamat, A.; Ibrahim, R.; Krejcar, O.; Herrera-Viedma, E.; Fujita, H.; Ghani, N.A.M. Multiclass prediction model for student grade prediction using machine learning. IEEE Access 2021, 9, 95608–95621. [Google Scholar] [CrossRef]
  12. Nayak, P.; Vaheed, S.; Gupta, S.; Mohan, N. Predicting students’ academic performance by mining the educational data through machine learning-based classification model. Educ. Inf. Technol. 2023, 28, 14611–14637. [Google Scholar] [CrossRef]
  13. Ram, M.S.; Srija, V.; Bhargav, V.; Madhavi, A.; Kumar, G.S. Machine Learning Based Student Academic Performance Prediction. In Proceedings of the 2021 Third International Conference on Inventive Research in Computing Applications (ICIRCA), Coimbatore, India, 2–4 September 2021; pp. 683–688. [Google Scholar]
  14. Balabied, S.A.A.; Eid, H.F. Utilizing random forest algorithm for early detection of academic underperformance in open learning environments. PeerJ Comput. Sci. 2023, 9, e1708. [Google Scholar] [CrossRef] [PubMed]
  15. Aljohani, N.R.; Fayoumi, A.; Hassan, S.-U. Predicting at-risk students using clickstream data in the virtual learning environment. Sustainability 2019, 11, 7238. [Google Scholar] [CrossRef]
  16. Waheed, H.; Hassan, S.-U.; Aljohani, N.R.; Hardman, J.; Alelyani, S.; Nawaz, R. Predicting academic performance of students from VLE big data using deep learning models. Comput. Hum. Behav. 2020, 104, 106189. [Google Scholar] [CrossRef]
  17. Huang, Q.; Zeng, Y. Improving academic performance predictions with dual graph neural networks. Complex Intell. Syst. 2024, 10, 3557–3575. [Google Scholar] [CrossRef]
  18. Ben Said, A.; Abdel-Salam, A.-S.G.; Hazaa, K.A. Performance prediction in online academic course: A deep learning approach with time series imaging. Multimed. Tools Appl. 2024, 83, 55427–55445. [Google Scholar] [CrossRef]
  19. Yang, C.-L.; Chen, Z.-X.; Yang, C.-Y. Sensor classification using convolutional neural network by encoding multivariate time series as two-dimensional colored images. Sensors 2019, 20, 168. [Google Scholar] [CrossRef]
  20. Li, J.; Wang, Q. Comparison of the representational ability in individual difference analysis using 2-D time-series image and time-series feature patterns. Expert Syst. Appl. 2023, 215, 119429. [Google Scholar] [CrossRef]
  21. Yin, J.; Zhuang, X.; Sui, W.; Sheng, Y.; Yang, Y. A new similarity measurement method for time series based on image fusion of recurrence plots and wavelet scalogram. Eng. Appl. Artif. Intell. 2024, 129, 107679. [Google Scholar] [CrossRef]
  22. Jin, X.-B.; Yang, A.; Su, T.; Kong, J.-L.; Bai, Y. Multi-channel fusion classification method based on time-series data. Sensors 2021, 21, 4391. [Google Scholar] [CrossRef] [PubMed]
  23. Dong, X.; Yu, Z.; Cao, W.; Shi, Y.; Ma, Q. A survey on ensemble learning. Front. Comput. Sci. 2020, 14, 241–258. [Google Scholar] [CrossRef]
  24. Breiman, L. Bagging predictors. Mach. Learn. 1996, 24, 123–140. [Google Scholar] [CrossRef]
  25. Schapire, R.E. The strength of weak learnability. Mach. Learn. 1990, 5, 197–227. [Google Scholar] [CrossRef]
  26. Chatzimparmpas, A.; Martins, R.M.; Kucher, K.; Kerren, A. StackGenVis: Alignment of data, algorithms, and models for stacking ensemble learning using performance metrics. IEEE Trans. Vis. Comput. Graph. 2020, 27, 1547–1557. [Google Scholar] [CrossRef]
  27. Seijo-Pardo, B.; Porto-Díaz, I.; Bolón-Canedo, V.; Alonso-Betanzos, A. Ensemble feature selection: Homogeneous and heterogeneous approaches. Knowl. Based Syst. 2017, 118, 124–139. [Google Scholar] [CrossRef]
  28. Xu, Q.; Wang, J.; Jiang, B.; Luo, B. Fine-grained visual classification via internal ensemble learning transformer. IEEE Trans. Multimed. 2023, 25, 9015–9028. [Google Scholar] [CrossRef]
  29. Saidani, O.; Umer, M.; Alshardan, A.; Alturki, N.; Nappi, M.; Ashraf, I. Student academic success prediction in multimedia-supported virtual learning system using ensemble learning approach. Multimed. Tools Appl. 2024, 83, 87553–87578. [Google Scholar] [CrossRef]
  30. Teoh, C.-W.; Ho, S.-B.; Dollmat, K.S.; Tan, C.-H. Ensemble-Learning techniques for predicting student performance on video-based learning. Int. J. Inf. Educ. Technol. 2022, 12, 741–745. [Google Scholar] [CrossRef]
  31. Shayegan, M.J.; Akhtari, R. A Stacking Machine Learning Model for Student Performance Prediction Based on Class Activities in E-Learning. Comput. Syst. Sci. Eng. 2024, 48, 1251–1272. [Google Scholar] [CrossRef]
  32. Al-Ameri, A.; Al-Shammari, W.; Castiglione, A.; Nappi, M.; Pero, C.; Umer, M. Student Academic Success Prediction Using Learning Management Multimedia Data With Convoluted Features and Ensemble Model. ACM J. Data Inf. Qual. 2024. [Google Scholar] [CrossRef]
  33. Evangelista, E. A hybrid machine learning framework for predicting students’ performance in virtual learning environment. Int. J. Emerg. Technol. Learn. (iJET) 2021, 16, 255–272. [Google Scholar] [CrossRef]
  34. Mary, T.A.C.; Rose, P.A.L. Ensemble Machine Learning Model for University Students’ Risk Prediction and Assessment of Cognitive Learning Outcomes. Int. J. Inf. Educ. Technol. 2023, 13, 948–958. [Google Scholar]
  35. Hasan, R.; Palaniappan, S.; Mahmood, S.; Abbas, A.; Sarker, K.U. Dataset of Students’ Performance Using Student Information System, Moodle and the Mobile Application “eDify”. Data 2021, 6, 110. [Google Scholar] [CrossRef]
  36. Amankwah, M.G.; Camps, D.; Bethel, E.W.; Van Beeumen, R.; Perciano, T. Quantum pixel representations and compression for N-dimensional images. Sci. Rep. 2022, 12, 7712. [Google Scholar] [CrossRef]
  37. Naidu, V.; Divya, M.; Mahalakshmi, P. Multi-modal medical image fusion using multi-resolution discrete sine transform. Control Data Fusion E-J. 2017, 1, 13–26. [Google Scholar]
  38. Zhang, Y.; Hou, Y.; OuYang, K.; Zhou, S. Multi-scale signed recurrence plot based time series classification using inception architectural networks. Pattern Recognit. 2022, 123, 108385. [Google Scholar] [CrossRef]
  39. Jaleel, M.; Kucukler, O.F.; Alsalemi, A.; Amira, A.; Malekmohamadi, H.; Diao, K. Analyzing gas data using deep learning and 2-d gramian angular fields. IEEE Sens. J. 2023, 23, 6109–6116. [Google Scholar] [CrossRef]
  40. Sun, Z.; Wang, G.; Li, P.; Wang, H.; Zhang, M.; Liang, X. An improved random forest based on the classification accuracy and correlation measurement of decision trees. Expert Syst. Appl. 2024, 237, 121549. [Google Scholar] [CrossRef]
  41. Cervantes, J.; Garcia-Lamont, F.; Rodríguez-Mazahua, L.; Lopez, A. A comprehensive survey on support vector machine classification: Applications, challenges and trends. Neurocomputing 2020, 408, 189–215. [Google Scholar] [CrossRef]
  42. Saeed, U.; Jan, S.U.; Lee, Y.-D.; Koo, I. Fault diagnosis based on extremely randomized trees in wireless sensor networks. Reliab. Eng. Syst. Saf. 2021, 205, 107284. [Google Scholar] [CrossRef]
  43. Deng, S.; Su, J.; Zhu, Y.; Yu, Y.; Xiao, C. Forecasting carbon price trends based on an interpretable light gradient boosting machine and Bayesian optimization. Expert Syst. Appl. 2024, 242, 122502. [Google Scholar] [CrossRef]
  44. Wang, W.; Sun, D. The improved AdaBoost algorithms for imbalanced data classification. Inf. Sci. 2021, 563, 358–374. [Google Scholar] [CrossRef]
  45. Charbuty, B.; Abdulazeez, A. Classification based on decision tree algorithm for machine learning. J. Appl. Sci. Technol. Trends 2021, 2, 20–28. [Google Scholar] [CrossRef]
  46. Bailly, A.; Blanc, C.; Francis, É.; Guillotin, T.; Jamal, F.; Wakim, B.; Roy, P. Effects of dataset size and interactions on the prediction performance of logistic regression and deep learning models. Comput. Methods Programs Biomed. 2022, 213, 106504. [Google Scholar] [CrossRef]
  47. Qin, W.; Luo, X.; Zhou, M. Adaptively-accelerated Parallel Stochastic Gradient Descent for High-Dimensional and Incomplete Data Representation Learning. IEEE Trans. Big Data 2023, 10, 92–107. [Google Scholar] [CrossRef]
  48. Wang, A.X.; Chukova, S.S.; Nguyen, B.P. Ensemble k-nearest neighbors based on centroid displacement. Inf. Sci. 2023, 629, 313–323. [Google Scholar] [CrossRef]
  49. Wickramasinghe, I.; Kalutarage, H. Naive Bayes: Applications, variations and vulnerabilities: A review of literature with code snippets for implementation. Soft Comput. 2021, 25, 2277–2293. [Google Scholar] [CrossRef]
  50. Li, Z.; Liu, F.; Yang, W.; Peng, S.; Zhou, J. A survey of convolutional neural networks: Analysis, applications, and prospects. IEEE Trans. Neural Netw. Learn. Syst. 2021, 33, 6999–7019. [Google Scholar] [CrossRef]
  51. Petersen, P.; Voigtlaender, F. Equivalence of approximation by convolutional neural networks and fully-connected networks. Proc. Am. Math. Soc. 2020, 148, 1567–1581. [Google Scholar] [CrossRef]
  52. Ashwinkumar, S.; Rajagopal, S.; Manimaran, V.; Jegajothi, B. Automated plant leaf disease detection and classification using optimal MobileNet based convolutional neural networks. Mater. Today Proc. 2022, 51, 480–487. [Google Scholar] [CrossRef]
  53. Geetha, A.; Prakash, N. Classification of Glaucoma in Retinal Images Using EfficientnetB4 Deep Learning Model. Comput. Syst. Sci. Eng. 2022, 43, 1041–1055. [Google Scholar] [CrossRef]
  54. Chicco, D.; Jurman, G. The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC Genom. 2020, 21, 6. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Four different kinds of images transformed by 1D raw data: (a) transformed image by PR; (b) transformed image by GAF; (c) transformed image by SWT; (d) transformed image by RP.
Figure 2. The main process of our experiment.
Figure 3. The main process and architecture of the EnCF model.
Figure 4. Performance of ML models on 1D raw data.
Figure 5. Performance of ML and DL models on 1D raw data.
Figure 6. Performance comparison of deep learning and ensemble learning models on 2D data.
Figure 7. Performance comparison of all models on 1D and 2D data.
Table 1. Description of the features of the dataset.
| Feature | Description |
| --- | --- |
| Applicant Name | The student's name, e.g., "Student 1". |
| Attempt Count | The number of attempts made for a specific module. Categorized as "Low/Medium/High". |
| Prohibition | Whether the student has incomplete modules to address. Categorized as "Yes/No". |
| CGPA | The student's overall grade point average. Categorized as "Adequate/Excellent/Fair/Good/Poor/Very Good". |
| Remote Student | Whether the student is enrolled in an e-learning study mode. Categorized as "Yes/No". |
| High Risk | The likelihood of failure in a particular subject. Categorized as "Yes/No". |
| At Risk | Whether the student failed previous modules and is therefore considered at risk. Categorized as "Yes/No". |
| Term Exceeded | The student's advancement in their degree program. Categorized as "Yes/No". |
| At Risk SSC | Whether the student has been enrolled in the student success center due to academic deficiencies. Categorized as "Yes/No". |
| Plagiarism History | The student's previous instances of plagiarism in any module. Categorized as "Low/Medium". |
| CW1 | The grade achieved by the student in their first coursework. Categorized as "Fair/Fail/Adequate/Very Good/Good/Excellent". |
| Other Modules | The student's enrollment in other modules during the current semester. Categorized as "Low/Medium/High". |
| CW2 | The grade achieved by the student in their second coursework. Categorized as "Fair/Fail/Adequate/Very Good/Good/Excellent". |
| Online C | Time spent by the student on on-campus activities (measured in minutes). Categorized as "Adequate/Excellent/Fair/Good/Poor/Very Good". |
| ESE | The marks obtained in the end-of-semester examination. Categorized as "Fair/Fail/Adequate/Very Good/Good/Excellent". |
| Online O | Time spent by the student on off-campus activities (measured in minutes). Categorized as "Adequate/Excellent/Fair/Good/Poor/Very Good". |
| Paused | The total count of times the video was paused. |
| Played | The total count of times the video was played. |
| Likes | The total number of times the student marked a video as liked. |
| Segment | The total instances where the student used the slider to play a specific section of the video. |
| Result | Whether the student passed the exam. Categorized as "Pass/Fail". |
Table 2. Distribution of the classes of the dataset.
| Class | Total |
| --- | --- |
| Pass | 264 |
| Fail | 62 |
| Total | 326 |
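The Pass/Fail split in Table 2 (264 vs. 62) is strongly imbalanced, which is why SMOTE was applied before training the traditional ML models. As a sketch of SMOTE's core idea (the experiments presumably used a library implementation; the neighbor count `k` and seeds below are illustrative assumptions):

```python
import numpy as np

def smote_oversample(X_min, n_new, k=5, seed=None):
    """Generate synthetic minority samples by interpolating each sample
    toward one of its k nearest minority-class neighbors (SMOTE's core step)."""
    rng = np.random.default_rng(seed)
    out = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        nbrs = np.argsort(d)[1:k + 1]              # k nearest neighbors, excluding self
        j = rng.choice(nbrs)
        lam = rng.random()                         # interpolation factor in [0, 1)
        out.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.vstack(out)

X_fail = np.random.default_rng(0).random((62, 20))   # stand-in for 62 Fail samples
X_new = smote_oversample(X_fail, n_new=264 - 62, seed=1)
print(X_new.shape)  # (202, 20) -- balances Fail up to the 264 Pass samples
```

Each synthetic point lies on the segment between two real minority samples, so the oversampled set stays inside the minority class's feature range rather than duplicating rows verbatim.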
Table 3. Specifications of the experimental setup.
| Item | Specification |
| --- | --- |
| Frameworks | PyTorch, scikit-learn |
| Language | Python 3.8 |
| RAM | 256 GB |
| OS | Ubuntu 20.04 LTS |
| CPU | Intel Xeon Platinum 8362 @ 2.80 GHz |
| GPU | NVIDIA A40 (48 GB) |
Table 4. Results of ML models using 1D raw data.
| Model | Accuracy | Precision | Recall | F1 Score |
| --- | --- | --- | --- | --- |
| RF | 0.9242 | 0.9308 | 0.9242 | 0.9169 |
| SVM | 0.9091 | 0.9058 | 0.9091 | 0.9061 |
| ET | 0.9091 | 0.9077 | 0.9091 | 0.9025 |
| GB | 0.9091 | 0.9183 | 0.9091 | 0.8979 |
| AdaBoost | 0.8485 | 0.8408 | 0.8485 | 0.8436 |
| DT | 0.8333 | 0.8208 | 0.8333 | 0.8248 |
| LR | 0.7727 | 0.8378 | 0.7727 | 0.7914 |
| SGD | 0.6970 | 0.8142 | 0.6970 | 0.7269 |
| KNN | 0.6061 | 0.7488 | 0.6061 | 0.6461 |
| NB | 0.5455 | 0.7501 | 0.5455 | 0.5901 |
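A row of Table 4 can be reproduced with scikit-learn's standard metric utilities; the sketch below uses a synthetic stand-in for the 326-sample imbalanced dataset, and the `average="weighted"` setting is an assumption about how the per-class metrics were aggregated:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 326 samples with roughly the Pass/Fail imbalance of Table 2.
X, y = make_classification(n_samples=326, n_features=20, weights=[0.81, 0.19],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y,
                                          random_state=0)

clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
y_pred = clf.predict(X_te)

acc = accuracy_score(y_te, y_pred)
prec, rec, f1, _ = precision_recall_fscore_support(y_te, y_pred, average="weighted")
print(f"Acc={acc:.4f} P={prec:.4f} R={rec:.4f} F1={f1:.4f}")
```

With weighted averaging, precision and recall are weighted by class support, which is why recall can numerically equal accuracy, as in several Table 4 rows.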
Table 5. Results of DL models on 1D raw data.
| Model | Accuracy | Precision | Recall | F1 Score |
| --- | --- | --- | --- | --- |
| 1D CNN | 0.8491 | 0.8491 | 0.8491 | 0.8491 |
| MobileNet | 0.8208 | 0.8255 | 0.8208 | 0.8213 |
| EfficientNetB4 | 0.8302 | 0.8335 | 0.8302 | 0.8307 |
Table 6. Results of ML and DL models on 1D raw data.
| Model | Accuracy | Precision | Recall | F1 Score |
| --- | --- | --- | --- | --- |
| *ML Models* | | | | |
| RF | 0.9242 | 0.9308 | 0.9242 | 0.9169 |
| SVM | 0.9091 | 0.9058 | 0.9091 | 0.9061 |
| ET | 0.9091 | 0.9077 | 0.9091 | 0.9025 |
| GB | 0.9091 | 0.9183 | 0.9091 | 0.8979 |
| AdaBoost | 0.8485 | 0.8408 | 0.8485 | 0.8436 |
| DT | 0.8333 | 0.8208 | 0.8333 | 0.8248 |
| LR | 0.7727 | 0.8378 | 0.7727 | 0.7914 |
| SGD | 0.6970 | 0.8142 | 0.6970 | 0.7269 |
| KNN | 0.6061 | 0.7488 | 0.6061 | 0.6461 |
| NB | 0.5455 | 0.7501 | 0.5455 | 0.5901 |
| *DL Models* | | | | |
| 1D CNN | 0.8491 | 0.8491 | 0.8491 | 0.8491 |
| MobileNet | 0.8208 | 0.8255 | 0.8208 | 0.8213 |
| EfficientNetB4 | 0.8302 | 0.8335 | 0.8302 | 0.8307 |
Table 7. Results of DL models on 2D data.
| Model | Feature Transformation | Accuracy | Precision | Recall | F1 Score |
| --- | --- | --- | --- | --- | --- |
| 2D CNN | PR | 0.8208 | 0.8271 | 0.8208 | 0.8176 |
| 2D FCN | PR | 0.8396 | 0.8410 | 0.8396 | 0.8384 |
| 2D CNN | GAF | 0.8679 | 0.8689 | 0.8679 | 0.8672 |
| 2D FCN | GAF | 0.7925 | 0.8034 | 0.7925 | 0.7868 |
| 2D CNN | SWT | 0.8868 | 0.8869 | 0.8868 | 0.8865 |
| 2D FCN | SWT | 0.8679 | 0.8679 | 0.8679 | 0.8676 |
| 2D CNN | RP | 0.9340 | 0.9340 | 0.9340 | 0.9339 |
| 2D FCN | RP | 0.8774 | 0.8820 | 0.8774 | 0.8777 |
Table 8. Results of ensemble learning models using 2D data.
| Model | Feature Transformation | Accuracy | Precision | Recall | F1 Score |
| --- | --- | --- | --- | --- | --- |
| 2D CNN | PR | 0.8208 | 0.8271 | 0.8208 | 0.8176 |
| 2D FCN | PR | 0.8396 | 0.8410 | 0.8396 | 0.8384 |
| 2D EnCF | PR | 0.8396 | 0.8517 | 0.8396 | 0.8358 |
| 2D CNN | GAF | 0.8679 | 0.8689 | 0.8679 | 0.8672 |
| 2D FCN | GAF | 0.7925 | 0.8034 | 0.7925 | 0.7868 |
| 2D EnCF | GAF | 0.8585 | 0.8606 | 0.8585 | 0.8588 |
| 2D CNN | SWT | 0.8868 | 0.8869 | 0.8868 | 0.8865 |
| 2D FCN | SWT | 0.8679 | 0.8679 | 0.8679 | 0.8676 |
| 2D EnCF | SWT | 0.9057 | 0.9072 | 0.9057 | 0.9051 |
| 2D CNN | RP | 0.9340 | 0.9340 | 0.9340 | 0.9339 |
| 2D FCN | RP | 0.8774 | 0.8820 | 0.8774 | 0.8777 |
| 2D EnCF | RP | 0.9528 | 0.9529 | 0.9528 | 0.9528 |
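Table 8's EnCF rows combine the CNN and FCN base learners. A common way such an ensemble is realized is soft voting, averaging the two networks' softmax probabilities and taking the argmax; the logits below are hypothetical, and whether EnCF uses plain averaging or a weighted fusion rule is an assumption here:

```python
import numpy as np

def softmax(z):
    """Row-wise softmax, stabilized by subtracting the per-row max."""
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Hypothetical logits from the two base networks for 4 samples, 2 classes (Pass/Fail).
cnn_logits = np.array([[2.0, 0.1], [0.3, 1.5], [1.2, 1.0], [0.0, 2.2]])
fcn_logits = np.array([[1.5, 0.4], [0.2, 1.1], [0.8, 1.3], [0.1, 1.9]])

# Soft-voting ensemble: average the class probabilities, then argmax.
p_ens = 0.5 * (softmax(cnn_logits) + softmax(fcn_logits))
pred = p_ens.argmax(axis=1)
print(pred)  # [0 1 1 1]
```

Note the third sample: the CNN alone would predict class 0 and the FCN class 1, and averaging resolves the disagreement by confidence, which is how an ensemble can exceed either base model, as in the RP rows of Table 8.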
Table 9. Results of all models using 1D and 2D data.
| Model | Feature Transformation | Accuracy | Precision | Recall | F1 Score |
| --- | --- | --- | --- | --- | --- |
| *ML Models* | | | | | |
| RF | / | 0.9242 | 0.9308 | 0.9242 | 0.9169 |
| SVM | / | 0.9091 | 0.9058 | 0.9091 | 0.9061 |
| ET | / | 0.9091 | 0.9077 | 0.9091 | 0.9025 |
| GB | / | 0.9091 | 0.9183 | 0.9091 | 0.8979 |
| AdaBoost | / | 0.8485 | 0.8408 | 0.8485 | 0.8436 |
| DT | / | 0.8333 | 0.8208 | 0.8333 | 0.8248 |
| LR | / | 0.7727 | 0.8378 | 0.7727 | 0.7914 |
| SGD | / | 0.6970 | 0.8142 | 0.6970 | 0.7269 |
| KNN | / | 0.6061 | 0.7488 | 0.6061 | 0.6461 |
| NB | / | 0.5455 | 0.7501 | 0.5455 | 0.5901 |
| *DL Models* | | | | | |
| 1D CNN | / | 0.8491 | 0.8491 | 0.8491 | 0.8491 |
| MobileNet | / | 0.8208 | 0.8255 | 0.8208 | 0.8213 |
| EfficientNetB4 | / | 0.8302 | 0.8335 | 0.8302 | 0.8307 |
| 2D CNN | PR | 0.8208 | 0.8271 | 0.8208 | 0.8176 |
| 2D FCN | PR | 0.8396 | 0.8410 | 0.8396 | 0.8384 |
| 2D EnCF | PR | 0.8396 | 0.8517 | 0.8396 | 0.8358 |
| 2D CNN | GAF | 0.8679 | 0.8689 | 0.8679 | 0.8672 |
| 2D FCN | GAF | 0.7925 | 0.8034 | 0.7925 | 0.7868 |
| 2D EnCF | GAF | 0.8585 | 0.8606 | 0.8585 | 0.8588 |
| 2D CNN | SWT | 0.8868 | 0.8869 | 0.8868 | 0.8865 |
| 2D FCN | SWT | 0.8679 | 0.8679 | 0.8679 | 0.8676 |
| 2D EnCF | SWT | 0.9057 | 0.9072 | 0.9057 | 0.9051 |
| 2D CNN | RP | 0.9340 | 0.9340 | 0.9340 | 0.9339 |
| 2D FCN | RP | 0.8774 | 0.8820 | 0.8774 | 0.8777 |
| 2D EnCF | RP | 0.9528 | 0.9529 | 0.9528 | 0.9528 |

Share and Cite

MDPI and ACS Style

Zhao, S.; Zhou, D.; Wang, H.; Chen, D.; Yu, L. Enhancing Student Academic Success Prediction Through Ensemble Learning and Image-Based Behavioral Data Transformation. Appl. Sci. 2025, 15, 1231. https://doi.org/10.3390/app15031231
