Software_Defect_Prediction_Using_an_Intelligent_Ensemble-Based_Model

This research presents an intelligent ensemble-based software defect prediction model (VESDP) that combines four supervised machine learning algorithms to enhance software quality and reduce testing costs. The model employs a two-stage prediction process and is evaluated using seven historical defect datasets from the NASA MDP repository, achieving remarkable accuracy and outperforming twenty state-of-the-art techniques. The study highlights the effectiveness of ensemble methods in improving predictive performance and resource utilization in software development.


Received 29 November 2023, accepted 20 January 2024, date of publication 24 January 2024, date of current version 12 February 2024.

Digital Object Identifier 10.1109/ACCESS.2024.3358201

Software Defect Prediction Using an Intelligent Ensemble-Based Model
MISBAH ALI 1, TEHSEEN MAZHAR 1, YASIR ARIF 2, SHAHA AL-OTAIBI 3 (Member, IEEE),
YAZEED YASIN GHADI 4, TARIQ SHAHZAD 5, MUHAMMAD AMIR KHAN 6,
AND HABIB HAMAM 5,7,8,9 (Senior Member, IEEE)
1 Department of Computer Science, Virtual University of Pakistan, Lahore 55150, Pakistan
2 Department of Computer Science, Global Institute, Lahore 54000, Pakistan
3 Department of Information Systems, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P. O. Box 84428, Riyadh 11671, Saudi Arabia
4 Department of Computer Science and Software Engineering, Al Ain University, Abu Dhabi, United Arab Emirates
5 School of Electrical Engineering, Department of Electrical and Electronic Engineering Science, University of Johannesburg, Johannesburg 2006, South Africa
6 School of Computing Sciences, College of Computing, Informatics and Mathematics, Universiti Teknologi MARA, Shah Alam, Selangor 40450, Malaysia
7 Faculty of Engineering, University of Moncton, Moncton, NB E1A 3E9, Canada
8 International Institute of Technology and Management (IITG), Commune d’Akanda, Libreville 1989, Gabon
9 Bridges for Academic Excellence, Tunis, Centre Ville 1002, Tunisia

Corresponding authors: Muhammad Amir Khan (amirkhan@uitm.edu.my) and Tehseen Mazhar (tehseenmazhar719@gmail.com)
This work was supported by Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia, through the Princess Nourah bint
Abdulrahman University Researchers Supporting Project under Grant PNURSP2024R136.

ABSTRACT Software defect prediction plays a crucial role in enhancing software quality while achieving
cost savings in testing. Its primary objective is to identify and send only defective modules to the testing
stage. This research introduces an intelligent ensemble-based software defect prediction model that combines
diverse classifiers. The proposed model employs a two-stage prediction process to detect defective modules.
In the first stage, four supervised machine learning algorithms are employed: Random Forest, Support Vector
Machine, Naïve Bayes, and Artificial Neural Network. These algorithms are optimized through iterative
parameter optimization to achieve the highest accuracy possible. In the second stage, the predictive accuracy
of the individual classifiers is integrated into a voting ensemble to make the final predictions. This ensemble
approach further improves the accuracy and reliability of the defect predictions. Seven historical defect
datasets from the NASA MDP repository, namely CM1, JM1, MC2, MW1, PC1, PC3, and PC4, were
utilized to implement and evaluate the proposed defect prediction system. The results demonstrate that the
proposed intelligent system achieved remarkable accuracy on each dataset, outperforming twenty state-of-the-art
defect prediction techniques, including base classifiers and ensemble methods.

INDEX TERMS Machine learning, software defect prediction, ensemble classification, heterogeneous
classifiers, random forest, support vector machine, naïve Bayes.

The associate editor coordinating the review of this manuscript and approving it for publication was Juan A. Lara.

I. INTRODUCTION
Rapid globalization has transformed our world into a closely connected village where the software industry is crucial in driving progress. In this age of a digitally interconnected world, software applications have emerged as the lifeblood of our global society, forming the backbone of daily activities, businesses, and critical infrastructure [1], [2]. This influence has only intensified, particularly after the COVID-19 pandemic, which has accelerated our reliance on online systems for communication, commerce, and remote work [3], [4], [5].

In a Software Development Life Cycle (SDLC), the workflow from the development team to the Quality Assurance (QA) team typically involves several stages. Initially, the development team implements the software code, which is then handed over to the QA team for testing. The QA team then rigorously evaluates the software, identifying and reporting defects or issues. The iterative feedback loop between development and QA continues until a high-quality, defect-free software
2024 The Authors. This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.
20376 For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/ VOLUME 12, 2024
M. Ali et al.: Software Defect Prediction Using an Intelligent Ensemble-Based Model

FIGURE 1. Development-to-QA workflow in SDLC.

product is achieved [6], [7]. The development-to-QA workflow in an SDLC is shown in Figure 1. However, achieving defect-free software is not without its challenges. Three critical factors that prominently influence software quality assurance are time, financial resources, and the availability of skilled manpower. The industry’s ever-growing demand necessitates formulating effective testing strategies to optimize these valuable resources while maintaining the highest software quality standards [8].

This is where software defect prediction (SDP) steps into the spotlight. SDP is the art of leveraging historical data and machine learning (ML) techniques to forecast and identify potential defects in software systems before the testing phase [9]. It investigates a complex set of software metrics, such as code complexity, size, and historical defect data, to build models capable of gauging the likelihood of defects [10].

The integration of software defect prediction disrupts the traditional development-to-QA workflow. The feedback loop is altered by proactively identifying potential defects before the testing phase. The predictive insights allow developers to address and rectify potential issues before the software reaches the QA team. This streamlines the process and reduces the traditional back-and-forth between development and QA. This shift promotes a more efficient and cost-effective software development life cycle [11]. A visual representation highlighting the reduced feedback loop when the SDP model is in place is shown in Figure 2.

In the context of software defect prediction, classification techniques take centre stage. They involve categorising data into classes or labels, making them particularly relevant in identifying potential software defects. Various classification techniques include decision trees, logistic regression, support vector machines, etc. [12], [13]. These techniques aid in assessing and addressing software quality concerns proactively. Historically, several studies have utilized the power of classification techniques to enhance the accuracy of defect prediction models. However, prior work in this field has limitations, including challenges related to classification techniques, overfitting, and underfitting [14].

Parallel to classification, Ensemble Modeling (EM) has also emerged as a promising technique combining the predictions from multiple ML models to improve overall performance [15]. Ensemble methods, such as bagging, boosting, stacking, and random forest, contribute to the field by mitigating inherent challenges [15]. By aggregating the predictions of multiple base classifiers, ensemble techniques reduce the risks of overfitting, underfitting, and biases that can affect individual classifiers. They have proven to be valuable in enhancing the accuracy and robustness of defect prediction models by reducing the inherent biases of individual classifiers [16], [17]. Nevertheless, researchers have encountered a common stumbling block: the inherent susceptibility of ensemble techniques to biases that can influence their efficacy [18], [19]. Figure 3 presents an overview of software defect prediction using ML techniques.

To meet the ever-pressing need for cost-effective, high-quality software, the modern software development paradigm must be equipped with an SDP system that is both effective and efficient. This research unveils an intelligent ensemble-based software defect prediction model that combines the strengths of diverse classifiers and ensemble techniques while optimizing resource utilization cost-effectively. The proposed model, known as the Voting Ensemble-Based Software Defect Prediction model (VESDP), integrates four heterogeneous supervised classifiers: Random Forest (RF), Support Vector Machine (SVM), Naïve Bayes (NB), and Artificial Neural Network (ANN). Through iterative tuning, each classifier is optimized for maximum accuracy, thereby elevating the model’s predictive performance. The predictive accuracy of these diverse classifiers is further integrated into


FIGURE 2. Development-to-QA workflow using SDP.

a voting ensemble, offering a remedy for the biases often encountered when individual classifiers are relied upon in isolation.

The VESDP model’s performance is rigorously evaluated using seven datasets extracted from the NASA MDP repository. In this evaluation, a comprehensive suite of eight performance measures, including Predicted Positive Value (PPV), Predicted Negative Value (PNV), True Positive Rate (TPR), True Negative Rate (TNR), accuracy, Misclassification Rate (MR), False Positive Ratio (FPR), and False Negative Ratio (FNR), is employed to measure the model’s efficacy and robustness. The results demonstrate the higher predictive power of the proposed VESDP model, which achieves remarkable accuracy across all datasets and surpasses state-of-the-art techniques in the field.

A. OBJECTIVE OF THE STUDY
This study is aimed at achieving the following research objectives:
1. Create a software defect prediction model that combines diverse classifiers using a voting ensemble technique.
2. Assess the model’s accuracy using historical defect datasets and eight performance measures.
3. Compare the model’s performance with existing techniques to demonstrate its effectiveness.

B. MOTIVATION OF THE STUDY
The study explores ensemble models in SDP, aiming to enhance predictive performance, stability, and robustness [20]. It investigates the impact of heterogeneous supervised machine learning classifiers on software defect prediction models. Additionally, the research conducts a comparative analysis with twenty state-of-the-art techniques to establish the effectiveness of the proposed framework, validate its novelty, and provide valuable insights for the research community and industry. This approach contributes to progress in the field by serving as a benchmark for evaluating and fostering the development of advanced solutions [21], [22].

C. CONTRIBUTION OF THE STUDY
The contributions of this research can be summarized as follows:
• Novel Ensemble-Based Model: We introduce a pioneering ensemble-based software defect prediction model (VESDP) that proficiently combines classification and ensemble techniques, pushing the boundaries of traditional approaches.
• Thorough Evaluation with Diverse Datasets: We rigorously evaluate the VESDP model by subjecting it to testing with seven carefully selected datasets from the NASA MDP repository, offering a robust foundation for assessment.
• Remarkable Accuracy: Our VESDP model attains exceptional accuracy rates across all datasets, clearly surpassing established state-of-the-art techniques in the field, underscoring its groundbreaking potential.
• Quantified Impact of Ensemble Technique: We quantify the tangible impact of integrating predictive accuracy from diverse classifiers through the voting ensemble technique, revealing its effect through a comprehensive analysis of multiple performance measures.
• Benchmarking Against Modern Methods: We conduct an extensive comparative analysis, comparing the proposed VESDP model against twenty published techniques, highlighting its excellence in the realm of software defect prediction.

D. ORGANIZATION OF THE STUDY
The rest of the paper is organized as follows: Section II thoroughly reviews existing literature, while Section III outlines the proposed VESDP framework, detailing its phases


FIGURE 3. Software defect prediction process.

TABLE 1. List of abbreviations.

and activities. Section IV presents results and a comparative analysis of the VESDP framework against state-of-the-art techniques. Section V addresses potential research validity concerns, and Section VI delivers a summary, findings analysis, and suggestions for future research.

II. LITERATURE REVIEW
A. CLASSIFICATION
The authors in [3] developed an intelligent cloud-based SDP system incorporating data fusion and decision-level machine learning fusion techniques. The system integrated predictive accuracy from three classifiers, namely naïve Bayes (NB), artificial neural network (ANN), and decision tree (DT), using a fuzzy logic-based fusion method. The proposed system, evaluated using NASA datasets, outperformed other techniques and aimed to achieve high-quality software with lower costs. A thorough comparative analysis of several classifiers was conducted in the context of software defect prediction [23]; the authors analyzed ten machine learning algorithms, including Decision Tree, Naïve Bayes, K-Nearest Neighbor, Support Vector Machine, Random Forest, Extra Trees, AdaBoost, Gradient Boosting, Bagging, and Multi-Layer Perceptron. The analysis was performed on benchmark NASA datasets from the PROMISE warehouse, specifically CM1, KC1, KC2, JM1, and PC1. The experimental results showed that the employed algorithms achieved higher average accuracy rates on the PC1 dataset. Among classifiers, the Random Forest learning models with the PCA approach exhibited boosted average performance across the datasets. Figure 4 illustrates the classification process.

In [24], the authors addressed the challenge of managing a large volume of software defect reports in software


FIGURE 4. Classification process.

development and maintenance. They introduced a software defect prediction (SDP) model based on LASSO–SVM to improve prediction accuracy. The model combined feature selection using the minimum absolute value compression and selection method (LASSO) with the support vector machine algorithm. This approach significantly enhanced prediction accuracy, with simulation results showing an accuracy of 93.25% and 66.67%, a recall rate of 78.04%, and an f-measure of 72.72%. The proposed model outperformed traditional methods in terms of both accuracy and speed. In [25], researchers presented a cloud-based framework for real-time software defect prediction, comparing four back-propagation training algorithms. Bayesian regularization (BR) emerged as the most effective. The framework also included a fuzzy layer to select the best training function based on performance. Publicly available NASA datasets were used for evaluation, employing various measures. The results showed that BR outperformed other training algorithms and widely used machine-learning techniques.

The researchers in [26] used the RAPIDMINER machine learning tool to compare the performance of different numbers of trees in the RF algorithm in the context of software defect prediction. The results indicate that increasing the number of trees slightly improves accuracy, with a maximum accuracy of 99.59% and a minimum accuracy of 85.96%. The research also highlighted the effectiveness of the RF algorithm in software defect prediction, particularly when using around a hundred trees.

The author of [27] evaluated various semi-supervised learning (SSL) techniques, mainly the extended random forest (extRF) technique, for predicting defective modules. The extRF technique extends the random forest approach and employs a weighted mixture of random trees for final predictions. The study concludes that SSL techniques, including active learning, can achieve improved prediction performance compared to traditional machine learning approaches. Researchers in [28] extensively studied the behaviour of various machine learning classification techniques for software defect prediction, including Naïve Bayes, Multi-Layer Perceptron, Radial Basis Function, Support Vector Machine, K Nearest Neighbor, kStar, One Rule, PART, Decision Tree, and Random Forest, by employing NASA datasets. The performance of these techniques was evaluated using performance measures such as Precision, Recall, F-measure, accuracy, MCC, and ROC Area. The results indicated that accuracy and ROC may not be effective performance measures due to class imbalance issues, while precision, recall, F-measure, and MCC are more reliable. The researchers in [29] introduced a novel approach that integrates genetic algorithms (GA) with support vector machine (SVM) classifiers and particle swarm algorithms for software fault prediction. The approach applies to large-scale (NASA MDP) and small-scale (Java open-source projects) datasets. The results demonstrate that integrating GA with SVM and particle swarm algorithms enhances the performance of software fault prediction, addressing limitations observed in previous studies. The authors in [30] introduced an algorithm based on support vector machines (SVM) and extreme learning machines (ELM) for software reliability prediction. They investigated the factors influencing prediction accuracy, such as using previous failure data and the appropriate type of failure information. They proposed a model for feature selection using ELM and SVM by addressing dataset imbalance issues through the resampling method and applying it to NASA Metrics datasets. Experimental results showed that the ELM-based reliability prediction model achieves higher accuracy, specificity, recall, precision, and F1-measure than SVM. The authors in [31] conducted research to highlight the use of ML methods, particularly SVM, for building software defect prediction models. Figure 5 illustrates the defect prediction process using different classifiers. They evaluated the performance of SVM with different kernel functions on various datasets collected from software repositories. The researchers conducted 1520 experiments on 38 datasets.


FIGURE 5. Defect prediction using different classifiers.

The performance of kernel functions was not significantly affected by the granularity of the data but did show differences in datasets with high imbalance ratios. RBF performed significantly better on highly imbalanced datasets, while other kernel functions like polynomial and linear performed well on moderately imbalanced datasets. The authors suggested using SVM with the RBF kernel for defect datasets due to its higher performance than other kernel functions. To improve software efficiency and reliability, the authors in [32] proposed an efficient and reliable framework for software defect prediction based on naïve Bayes and linear regression. The framework consists of three main steps: data preprocessing (noise removal and normalization), feature extraction using correlation-based analysis, and applying machine learning models such as naïve Bayes and linear regression. Results show that the proposed framework can reach an accuracy of 98.7% using the naïve Bayes algorithm, which significantly reduces maintenance costs, lowers code complexity, and improves software quality by predicting defects early in the development process.

The authors in [33] conducted a study to investigate the impact of automated parameter optimization on defect prediction models using 18 datasets. The authors analyzed the classifiers’ efficiency and stability, parameter transferability, computational cost, and ranking of the different classifying methods. The results reflected that automated parameter optimization improved the performance of defect prediction models, increasing the AUC (area under the curve) performance by up to forty percent. It was also observed that optimized classifiers were as stable as those with default settings, except for random forest classifiers. Grid-search optimization had a low computational cost, adding less than 30 minutes of additional computation time. Furthermore, some rarely used techniques, like C5.0, outperform widely used techniques after optimization.

Researchers in [34] addressed the challenges of data imbalance and high dimensionality in defect datasets. They employed several ML algorithms, such as Logistic Regression, Decision Trees, K-nearest neighbour, Support Vector Machines, and Ensemble Learning, along with feature extraction and selection techniques for classifying software modules as defect-prone or non-defect-prone. They proposed a model that utilized partial least squares (PLS) regression and recursive feature elimination (RFE) for dimension reduction and the synthetic minority oversampling technique (SMOTE) to handle imbalanced datasets [35]. The results show that XGBoost and Stacking Ensemble techniques yield the best results with a defect prediction accuracy above 0.9. Figure 5 represents defect prediction using different classifiers.

B. ENSEMBLE LEARNING
The authors in [36] proposed an intelligent system based on feature selection and ensemble machine learning techniques to predict defective modules in the software. A novel metric selection technique was introduced to select the most relevant features, and a three-step nested approach was employed for accurate prediction. The first step involves using a decision tree, support vector machines, and naïve Bayes to detect faulty modules. In the second step, the predictive accuracy of these techniques is integrated using ensemble methods such as bagging, voting, and stacking. Finally, fuzzy logic was applied to further combine the predictive accuracy of the ensemble techniques. The experiments were conducted on a fused software defect dataset, combining five NASA datasets, which demonstrated that the proposed system outperforms other advanced techniques, achieving an accuracy rate of 92.08%.

Researchers in [37] developed a business failure prediction model for the US restaurant industry by using a majority voting ensemble method with a decision tree. They used experimental data from 1980 to 2017 and developed three models: an entire period model, an economic downturn model, and an economic expansion model. The models achieved prediction accuracies of 88.02%, 80.81%, and 87.02%, respectively. Authors in [38] conducted thorough research to study


TABLE 2. Comprehensive summary of literature review.

the effectiveness of seven tree-based ensembles, including bagging ensembles like Random Forest and Extra Trees, as well as boosting ensembles like AdaBoost, Gradient Boosting, Hist Gradient Boosting, XGBoost, and CatBoost.


They employed 11 publicly available NASA software defect datasets. The empirical results showed that the tree-based bagging ensembles, particularly random forest and extra trees, outperform the tree-based boosting ensembles.

In [39], researchers introduced a software defect prediction system. This system utilized a nested ensemble learning (EL) approach, with a Voting classifier as the main classifier and three base ensemble classifiers: bagging, boosting, and stacking. The accuracy achieved by this framework on two distinct NASA datasets was 83.46% and 79.65%. A software defect prediction model was proposed in a study conducted by researchers in [40]. This model employed multi-layer feed-forward neural networks in combination with stacking as an ensemble technique. Six different search methods were applied for feature selection to enhance the model’s performance, with the multilayer perceptron serving as a subset evaluator. The achieved accuracies on NASA’s datasets using the best-first search, greedy stepwise search, and GS methods were 80%, 75%, and 76%, respectively. Existing studies in software defect prediction have commonly employed individual classifiers, which, despite their utility, may suffer from limitations such as overfitting, lack of robustness, and biases inherent to specific algorithms.

Moreover, these standalone classifiers might not capture the diverse patterns present in complex software datasets, leading to suboptimal predictive performance [3], [28]. Although some studies have explored ensemble techniques, most have predominantly focused on homogeneous classifiers within their ensembles [23], [36]. In contrast, the proposed framework introduces a paradigm shift by integrating the predictive accuracy of heterogeneous classifiers, namely Random Forest (RF), Support Vector Machine (SVM), Naïve Bayes (NB), and Artificial Neural Network (ANN), through a voting ensemble classification technique. This innovative combination addresses the shortcomings of both individual classifiers and conventional homogeneous ensemble approaches. By leveraging the strengths of diverse classifiers, the proposed model enhances interpretability, generalizability, and predictive accuracy, setting it apart as a more comprehensive and effective solution in software defect prediction. The summary of the literature review is shown in Table 2. It presents the ML techniques proposed for SDP, the source of datasets employed for research, the specific datasets used, and the performance measures implemented to analyze the results.

III. MATERIALS AND METHODS
The proposed research introduces an intelligent ensemble-based software defect prediction model. This model utilizes heterogeneous supervised ML classifiers for enhanced accuracy. The innovative approach aims to address challenges in predicting software defects efficiently. Individual base classifier outputs are consolidated through a voting ensemble model, strategically leveraging the strengths embedded in the proposed VESDP model. This ensemble approach enhances the predictive power of the model by synthesizing diverse classifier outputs, contributing to a more robust and accurate software defect prediction. An overview of the proposed model is shown in Figure 5. It shows that the proposed VESDP model comprises two layers, i.e., training and testing. There are three stages in the training layer: 1) data preprocessing, 2) base classification, and 3) ensemble classification.

In the training layer, the dataset is preprocessed for classification. The predictive accuracy of base classifiers is then aggregated in an ensemble classifier, contributing to the development of the proposed ensemble-based model. The testing layer comprises only one stage, namely prediction. This layer involves the defect prediction for unseen modules based on the trained model. The experiments have been performed using a Python tool that streamlined our data analysis process, allowing for efficient preprocessing, advanced statistical analysis, and accurate ML modeling, enabling us to extract valuable insights from research data.

The following key steps were executed to identify defective modules in the software:
• In the first step, datasets comprising various software metrics were collected and reused [41].
• In the second step, preprocessing on the datasets was performed that further included three sub-activities, namely dataset splitting, cleaning, and normalization [42], [43].
• In the third step, the VESDP model was trained based on the diverse combination of base classifiers.
• In the fourth step, the base classifiers were integrated into an ensemble learning technique that aggregated the accuracy of base classifiers and produced unbiased and more accurate results.
• Finally, in the last step, the preprocessing technique was applied to the new modules, and the dataset was input to the trained model, which predicts defective modules.

The primary objective of the proposed approach is to predict the defective modules in software. The proposed VESDP model can be expressed by the following mapping.

$$Y = f(X) + \epsilon \tag{1}$$

where $Y$ represents the prediction made by the model, i.e., the module is defective or non-defective, $X$ represents the module to be passed to the model for prediction purposes, and $\epsilon$ accounts for any deviation between the predicted output $Y$ and the actual output. The key objective of this research is to minimize the amount of $\epsilon$ to make the defect prediction model more reliable and efficient.

$$X = x_1, x_2, x_3, \ldots, x_n \tag{2}$$

where $x_1, x_2, x_3, \ldots, x_n$ denote the attributes associated with the software module. This research aims to find this mapping through ensemble-based machine-learning techniques. The graphical representation of the proposed model having details of each step is shown in Figure 6.
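The aggregation step and the mapping in (1) can be illustrated with a small sketch. It is not the authors' implementation: it assumes hard (majority) voting over the Y/N labels produced by the four base classifiers, the function names `majority_vote`, `ensemble_predict`, and `misclassification_rate` are hypothetical, and the stage-one predictions are made-up values. Because four voters can tie and the text does not state a tie rule, the sketch assumes ties resolve toward Y (defective). The misclassification rate serves here as a simple empirical stand-in for ϵ.

```python
from collections import Counter


def majority_vote(votes, tie_label="Y"):
    """Return the most common label among the base-classifier votes.

    tie_label is an assumption of this sketch: a tie is resolved toward
    'Y' (defective) so that doubtful modules still reach testing.
    """
    counts = Counter(votes).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return tie_label
    return counts[0][0]


def ensemble_predict(base_predictions):
    """Combine per-classifier prediction lists into one list of final labels.

    base_predictions maps a classifier name to its predicted labels,
    one label per module, with all lists in the same module order.
    """
    per_module = zip(*base_predictions.values())
    return [majority_vote(votes) for votes in per_module]


def misclassification_rate(predicted, actual):
    """Fraction of modules whose predicted label differs from the actual one."""
    wrong = sum(p != a for p, a in zip(predicted, actual))
    return wrong / len(actual)


# Hypothetical stage-one outputs for five modules (illustrative values only).
stage_one = {
    "RF":  ["Y", "N", "N", "Y", "N"],
    "SVM": ["Y", "N", "Y", "N", "N"],
    "NB":  ["N", "N", "Y", "Y", "N"],
    "ANN": ["Y", "Y", "N", "N", "N"],
}
actual = ["Y", "N", "N", "Y", "N"]

final = ensemble_predict(stage_one)
print(final)                                  # ['Y', 'N', 'Y', 'Y', 'N']
print(misclassification_rate(final, actual))  # 0.2
```

In this toy run the third and fourth modules receive a 2-2 split, so the assumed tie rule labels both as defective; only the third is wrong, giving one error in five modules. A soft-voting variant that averages class probabilities would avoid ties but requires probabilistic outputs from every base classifier.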


A software module X consists of multiple attributes and can be represented as in (2). Comprehensive details of each step involved in the training and testing layers are provided in the following sections.

A. DATASET COLLECTION
The first step in the training layer is the collection of historical software defect datasets. The selection of well-established benchmark datasets from NASA’s MDP repository, including CM1, JM1, MC2, MW1, PC1, PC3, and PC4, was driven by a commitment to representativeness and comparability with existing research. Including these datasets aligns with industry standards, enhancing the generalizability of our proposed solution [41]. Each dataset corresponds to one software component, and each instance in the dataset represents one software module; also, one module consists of multiple software quality attributes, including LOC_COMMENT, LOC_TOTAL, CALL_PAIRS, HALSTEAD_LENGTH, and HALSTEAD_CONSTANT, etc., that have been recorded during the development phase of SDLC. There are various independent attributes and one dependent attribute in each module. The independent attributes ($x_1, x_2, x_3, \ldots, x_n$) are used to make predictions, whereas the dependent attribute, also called the target variable, represents whether the module is defective (Y) or non-defective (N).

B. PREPROCESSING
Pre-processing is the second step of the training layer.

and unbiased learning process. Furthermore, normalization is instrumental in handling variations in data distribution, ensuring that the model performs consistently across different datasets [45]. Its role extends beyond numerical stability, encompassing the robustness and generalization capabilities of the proposed VESDP model.

C. CLASSIFICATION
Classification is the third step in the training layer that differentiates between defective and non-defective modules. It trains the models using labeled data and performs the identification of potentially faulty modules [46], [47]. In this research, four supervised machine learning classifiers of a heterogeneous nature have been implemented, namely Random Forest (RF), Support Vector Machine (SVM), Naïve Bayes, and Artificial Neural Network (ANN).

The selection of Random Forest (RF), Support Vector Machine (SVM), Naïve Bayes, and Artificial Neural Network (ANN) as our base classifiers is rooted in their diverse and complementary strengths. RF excels in capturing complex relationships within data, particularly useful in software defect prediction scenarios [26]. SVM, with its ability to handle non-linear data through kernel functions, offers robust classification capabilities [29]. Naïve Bayes, relying on probabilistic principles, provides simplicity and efficiency in handling large datasets with conditional independence assumptions [48].

Lastly, inspired by neural networks, ANN demonstrates strong pattern recognition, which is essential for intricate
This step further involves three sub-activities: 1) dataset software defect patterns [49]. The selection and optimization
splitting, 2) cleaning, and 3) normalization, which enhance of these classifiers contribute to a well-rounded and resilient
the effectiveness of the proposed model. Dataset splitting ensemble, enhancing the adaptability and accuracy of our
is the first sub-activity of the pre-processing step. In this proposed VESDP model. Initially, the model is developed
step, the used datasets are divided into two groups of using training data; based on the training data results, the
training and testing with a ratio of 70:30 employing classifiers have been optimized iteratively to achieve the
the class-based splitting rule [3]. The second sub-activity, highest accuracy.
cleaning, is pivotal for the model’s robustness. Cleaning It has been observed that RF performs best when split
ensures the accuracy of predictions by removing incon- quality is measured using the Gini criterion and the depth of
sistent, inaccurate, or irrelevant data points. This step the tree is restricted to 10. SVM shows maximum accuracy
improves the quality and integrity of the data by reducing when the kernel is set as poly and the complexity factor is
noise, handling missing values, ensuring consistency, and set to 2. ANN performs best when hidden layers are set to 2,
correcting errors within the dataset [44]. The cleaning with 10 neurons in each layer. The rest of the parameters in
activity is performed using the mean imputation method that RF, SVM, and ANN are used with default values. However,
replaced the missing values in the dataset, leading to better parameter tuning is not a primary concern in NB as it relies
predictions. on the assumption that features are conditionally independent
The third sub-activity in the preprocessing step is nor- given the class variable; thus, it has been implemented with
malization. Normalization is a widely used technique that default parameters. The classification step ends by producing
scales and standardizes the input attributes of the dataset the optimized versions of all base classifiers.
by equalizing feature scales within a range of 0 to 1
[42]. Normalization not only facilitates convergence but also
contributes to the stability and efficiency of the machine- D. ENSEMBLE MODELING
learning model. The process of equalizing feature scales Ensemble modeling is the fourth step in the training layer.
eliminates the dominance of attributes with larger scales, It refers to combining multiple individual models to make
preventing them from disproportionately influencing the more accurate predictions or classifications. It is based on
model [43]. This, in turn, aids in achieving a balanced the concept of ‘‘wisdom of the crowd,’’ where combining the


FIGURE 6. Proposed VESDP model.
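Among the steps of the proposed model, the three preprocessing sub-activities (70:30 class-based splitting, mean imputation of missing values, and normalization to the range 0 to 1) can be illustrated with a minimal standard-library sketch. This is not the authors' code: the function names are hypothetical, "class-based splitting" is assumed here to mean a split that preserves each class's share, missing values are assumed to be encoded as `None`, and the toy data at the end is invented purely for illustration.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_ratio=0.7, seed=0):
    """Return (train_idx, test_idx), splitting each class separately (assumed rule)."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = round(len(indices) * train_ratio)
        train_idx.extend(indices[:cut])
        test_idx.extend(indices[cut:])
    return sorted(train_idx), sorted(test_idx)


def impute_mean(rows):
    """Replace missing values (None) with the mean of the observed column values."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in rows]


def min_max_normalize(rows):
    """Rescale every column to [0, 1]; constant columns are mapped to 0 (assumption)."""
    n_cols = len(rows[0])
    lows = [min(row[j] for row in rows) for j in range(n_cols)]
    highs = [max(row[j] for row in rows) for j in range(n_cols)]
    return [
        [(v - lows[j]) / (highs[j] - lows[j]) if highs[j] > lows[j] else 0.0
         for j, v in enumerate(row)]
        for row in rows
    ]


# Toy data, invented for illustration only.
rows = [[1.0, None], [3.0, 10.0], [5.0, 20.0]]
clean = min_max_normalize(impute_mean(rows))

labels = ["Y"] * 10 + ["N"] * 20
train_idx, test_idx = stratified_split(labels)
```

The sketch computes the imputation means and scaling bounds on whatever rows it is given; whether the original study derived these statistics from the training split only or from the full dataset is not stated in the text.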

predictions of multiple models often leads to better overall performance than relying on a single model [20], [50]. This research employs a voting ensemble model that boosts the accuracy and reliability of predictions [37]. The voting ensemble leverages the unique strengths of each base model, promoting a more robust and reliable prediction system. The diversity inherent in using multiple models mitigates biases and errors that might be present in any single model, fostering a clear understanding of software quality attributes. Furthermore, the ensemble's resilience to outliers and noise in the data adds an extra layer of robustness, ensuring more consistent and accurate predictions. Overfitting, a common challenge in machine learning, is also addressed, as the voting ensemble method naturally reduces the risk of models fitting too closely to specific patterns in the training data.
The ensemble's ability to aggregate predictions leads to an overall improvement in accuracy, making it a valuable asset in the realm of software defect prediction, where precision and reliability are paramount for effective quality assurance in software development [20]. In the proposed model, the predictive accuracy of four heterogeneous base classifiers, including RF, SVM, NB, and ANN, is given as input to the voting ensemble model. The proposed VESDP model exhibits better performance on the voting ensemble as compared to base classifiers for the datasets used. The complete source code files, along with the datasets employed in the framework, have been uploaded to a GitHub repository [51]. The pseudo-code of the main ensemble classifier is shown in Figure 7.

E. DEFECT PREDICTIONS USING THE VESDP MODEL
In this research, a voting ensemble-based software defect prediction model is proposed. The proposed model is applied in the testing layer, which comprises a single step: real-time prediction on unseen software modules. In this layer, unlabeled data is passed as input to the function f(X), which attaches labels to the respective software modules. It is observed that the proposed model has a lower error rate ϵ than the modern techniques implemented for SDP. The output Y of the function f(X), containing the resultant predictions, is sent back to the development team, which can debug the defective modules before passing them to the testing team, thus saving time and effort for the quality assurance team and making the process more economical for the organization as well.

F. PERFORMANCE EVALUATION
In Equations (3) to (10), αλ reflects the defective modules in the software which were correctly predicted as defective by the model; similarly, αθ represents the


FIGURE 7. Pseudo-code of the main ensemble classifier.
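The pseudo-code of Figure 7 is not reproduced in this text. As a rough illustration of the voting step only, the following sketch combines the per-module labels of the four base classifiers by hard majority voting. Several points are assumptions rather than facts from the paper: the text does not say whether hard or soft voting is used, nor how a 2-2 tie among four classifiers is resolved (here a tie is flagged as defective, "Y"), and the function names and example predictions are hypothetical. Libraries such as scikit-learn offer a comparable `VotingClassifier`, which is deliberately not used here so that the example stays self-contained.

```python
from collections import Counter


def majority_vote(votes, tie_break="Y"):
    """Return the most frequent label; fall back to tie_break on a tie (assumption)."""
    counts = Counter(votes)
    top = max(counts.values())
    winners = [label for label, n in counts.items() if n == top]
    return winners[0] if len(winners) == 1 else tie_break


def ensemble_predict(base_predictions, tie_break="Y"):
    """Combine per-classifier label lists into one label per software module.

    base_predictions maps a classifier name to its list of predicted labels;
    every list must cover the same modules in the same order.
    """
    lengths = {len(labels) for labels in base_predictions.values()}
    if len(lengths) != 1:
        raise ValueError("all classifiers must predict the same number of modules")
    per_module_votes = zip(*base_predictions.values())
    return [majority_vote(votes, tie_break) for votes in per_module_votes]


# Hypothetical predictions for three modules, invented for illustration.
predictions = {
    "RF": ["Y", "N", "N"],
    "SVM": ["Y", "N", "Y"],
    "NB": ["N", "N", "Y"],
    "ANN": ["Y", "Y", "N"],
}
final_labels = ensemble_predict(predictions)
```

With an even number of base classifiers, the tie-breaking rule matters; flagging ties as defective favors recall over precision, which is only one of several defensible choices.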

non-defective modules in the software which were correctly predicted as non-defective by the model. The values βλ and βθ indicate a conflict between the actual and predicted values: βλ shows that the module was non-defective but was predicted as defective, whereas βθ indicates that the module was defective but was predicted as non-defective by the model.

$$\text{Predicted positive value (PPV)} = \frac{\alpha_\lambda}{\alpha_\lambda + \beta_\lambda} \tag{3}$$

$$\text{Predicted negative value (PNV)} = \frac{\alpha_\theta}{\alpha_\theta + \beta_\theta} \tag{4}$$

$$\text{True positive rate (TPR)} = \frac{\alpha_\lambda}{\alpha_\lambda + \beta_\theta} \tag{5}$$

$$\text{True negative rate (TNR)} = \frac{\alpha_\theta}{\alpha_\theta + \beta_\lambda} \tag{6}$$

$$\text{Accuracy} = \frac{\alpha_\lambda + \alpha_\theta}{\alpha_\lambda + \alpha_\theta + \beta_\lambda + \beta_\theta} \tag{7}$$

$$\text{Misclassification rate (MR)} = 1 - \text{Accuracy} \tag{8}$$

$$\text{False positive rate (FPR)} = 1 - \text{TNR} \tag{9}$$

$$\text{False negative rate (FNR)} = 1 - \text{TPR} \tag{10}$$

IV. RESULTS AND DISCUSSION
In this research, an intelligent voting ensemble-based software defect prediction model (VESDP) was implemented. To perform the experiments, seven publicly accessible NASA datasets (CM1, JM1, MC2, MW1, PC1, PC3, and PC4) were extracted from the MDP repository. In the preprocessing step, the datasets were subjected to three sub-activities, namely splitting, cleaning, and normalization. The splitting activity divides the datasets into two sub-sets, namely training and


testing, with a ratio of 70:30 using the class-based splitting rule [53]. To train the model, initially, four heterogeneous supervised classification algorithms, including RF, SVM, NB, and ANN, were implemented. These classifiers were optimized iteratively until each of them produced the highest possible accuracy for the datasets used. The predictive accuracy from individual classifiers is integrated using the voting ensemble technique, which further boosts the performance of the proposed model. To analyze the performance, eight widely used evaluation measures were implemented [22], [54]. All the performance measures have been derived using a confusion matrix that is drawn by implementing functions provided by Python [23], [55]. The results obtained from both training and testing datasets on each classifier are presented below.

FIGURE 8. Performance measures on CM1 dataset.
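As a hedged illustration of how the eight measures follow from a confusion matrix, the sketch below computes them from the four counts used in Equations (3) to (10): αλ (defective, predicted defective), αθ (non-defective, predicted non-defective), βλ (non-defective, predicted defective), and βθ (defective, predicted non-defective). The class and function names are hypothetical, the counts in the example are invented and are not results from the paper, and returning 0.0 for an empty denominator is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    alpha_lambda: int  # defective modules predicted as defective
    alpha_theta: int   # non-defective modules predicted as non-defective
    beta_lambda: int   # non-defective modules predicted as defective
    beta_theta: int    # defective modules predicted as non-defective


def count_outcomes(actual, predicted, defective="Y"):
    """Tally the four confusion-matrix cells from paired label lists."""
    al = at = bl = bt = 0
    for a, p in zip(actual, predicted):
        if a == defective and p == defective:
            al += 1
        elif a != defective and p != defective:
            at += 1
        elif a != defective and p == defective:
            bl += 1
        else:
            bt += 1
    return ConfusionCounts(al, at, bl, bt)


def safe_div(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero (assumption)."""
    return numerator / denominator if denominator else 0.0


def evaluate(c):
    """Return the eight measures of Equations (3) to (10) as a dict."""
    tpr = safe_div(c.alpha_lambda, c.alpha_lambda + c.beta_theta)
    tnr = safe_div(c.alpha_theta, c.alpha_theta + c.beta_lambda)
    total = c.alpha_lambda + c.alpha_theta + c.beta_lambda + c.beta_theta
    accuracy = safe_div(c.alpha_lambda + c.alpha_theta, total)
    return {
        "PPV": safe_div(c.alpha_lambda, c.alpha_lambda + c.beta_lambda),
        "PNV": safe_div(c.alpha_theta, c.alpha_theta + c.beta_theta),
        "TPR": tpr,
        "TNR": tnr,
        "Accuracy": accuracy,
        "MR": 1 - accuracy,
        "FPR": 1 - tnr,
        "FNR": 1 - tpr,
    }


# Invented counts for illustration only.
measures = evaluate(ConfusionCounts(alpha_lambda=8, alpha_theta=80,
                                    beta_lambda=5, beta_theta=7))
```

Because MR, FPR, and FNR are defined as complements of accuracy, TNR, and TPR, only five of the eight measures carry independent information.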
Detailed results on the CM1 dataset are shown in Table 3. It is evident from the table that in the training phase, RF achieves 100% for both the predicted positive value and the true positive rate. During the testing phase, RF maintains high accuracy at 85.86%, with a well-balanced trade-off between predicted positive values and true positive rates. This indicates RF's robustness in accurately predicting positive instances, even with previously unseen data. SVM performs well during both the training and testing phases, achieving an accuracy of 86.87% during testing. The classifier effectively differentiates between defective and non-defective modules. NB exhibits moderate accuracy during testing, with an accuracy of 80.81%. The model shows a balanced trade-off between false positive and false negative ratios, indicating its ability to make reasonably accurate predictions. ANN achieves an accuracy of 26.26% during testing. While the accuracy is comparatively lower, the model shows a balance in false positive and false negative ratios, suggesting its capability to classify defective and non-defective modules. The VESDP model maintains an accuracy of 86.87% during testing. This underscores the effectiveness of the ensemble approach in combining diverse classifiers to enhance overall predictive performance, especially when dealing with unseen data.
The performance measures attained by the CM1 test dataset, aside from accuracy, are shown in Figure 8.
The detailed results of the assessment of the JM1 dataset are shown in Table 4. It reveals that in the training phase, the RF classifier exhibits 100% for both the predicted positive value and the true positive rate. In the testing phase, RF maintains a high accuracy of 80.53%, demonstrating its capability to generalize well to unseen data. The classifier strikes a balance between predicted positive value and true positive rate, indicating its ability to make accurate predictions while minimizing false negatives. SVM achieves high accuracy during both training and testing, with a testing accuracy of 79.36%. The model effectively differentiates between defective and non-defective modules, showcasing its robust performance. NB demonstrates moderate accuracy during testing, achieving 79.36%. The model maintains a balanced trade-off between false positive and false negative ratios, indicating its ability to make reliable predictions in the context of software defect prediction. ANN achieves an accuracy of 79.1% during testing. While the accuracy is lower compared to some models, ANN demonstrates a balanced performance concerning false positive and false negative ratios. The VESDP model maintains a testing accuracy of 79.92%. This indicates that the ensemble approach effectively combines the strengths of individual classifiers to achieve robust performance in identifying defective and non-defective modules within the JM1 dataset.
The performance measures attained by the JM1 test dataset, aside from accuracy, are shown in Figure 9.
The performance measures attained by the MC2 test dataset, aside from accuracy, are shown in Figure 10.
The detailed results of performance evaluation on the MC2 dataset are presented in Table 5. It shows that in the training phase, RF demonstrates 100% for both the predicted positive value and the true positive rate. In the testing phase, RF reaches an accuracy of 63.61%, indicating its capability to generalize to unseen data. The model shows a balanced trade-off between predicted positive value and true positive rate, suggesting its effectiveness in making accurate predictions while minimizing false negatives. SVM achieves a testing accuracy of 71.05%, demonstrating its effectiveness in distinguishing between defective and non-defective modules.
Detailed results on the MW1 dataset are presented in Table 6. It shows that during the training phase, RF exhibits 100% for both the predicted positive value and the true positive rate. In the testing phase, RF maintains a high accuracy of 88%, with a balance between predicted positive values and true positive rates. This suggests that RF continues to perform well in predicting positive instances even when confronted with new and unseen data. SVM demonstrates robust performance, achieving an accuracy of 86.67% during testing. The model maintains a high true negative rate, showcasing its ability to correctly identify non-defective modules, and a low misclassification rate. NB performs


TABLE 3. Detailed results of classifiers on CM1 dataset.

TABLE 4. Detailed results of classifiers on JM1 dataset.

TABLE 5. Detailed results of classifiers on MC2 dataset.

reasonably well, with an accuracy of 78.67% during testing. The model's ability to maintain a balanced false positive ratio and false negative ratio suggests its reliability in distinguishing between defective and non-defective modules. ANN faces challenges during testing, with an accuracy of 52%. The model exhibits a balanced false positive ratio and false negative ratio, indicating a moderate ability to classify both defective and non-defective modules. The VESDP model excels during testing, achieving an accuracy of 89.33%. This shows the effectiveness of the ensemble approach in combining the strengths of individual classifiers to enhance overall predictive performance when applied to unseen data.
The performance measures, apart from accuracy, attained by the MW1 test dataset are represented in Figure 11.

FIGURE 9. Performance measures on JM1 dataset.

Detailed results on the PC1 dataset are reflected in Table 7. It shows that in the training phase, RF exhibits 100% for both the predicted positive value and the true positive rate. In the testing phase, it maintains a high accuracy of 93.63%, with a balanced trade-off between predicted positive values and true positive rates. This suggests RF's


effectiveness in accurately predicting positive instances even with new and unseen data, making it a robust choice for software defect prediction. SVM demonstrates strong performance during both the training and testing phases. The classifier achieves an accuracy of 91.67% during testing, showcasing its ability to effectively distinguish between defective and non-defective modules. NB performs reasonably well, with an accuracy of 82.84% during testing. The classifier's balanced false positive ratio and false negative ratio suggest its reliability in classification, though there is room for improvement in certain scenarios. ANN achieves an accuracy of 85.29% during testing. The classifier shows a balanced false positive ratio and false negative ratio, indicating its moderate ability to classify both defective and non-defective modules. The VESDP model excels during testing, achieving an accuracy of 92.16%. This exhibits the strength of the ensemble approach in combining diverse classifiers to enhance overall predictive performance, especially when applied to unseen data.
The performance measures achieved by the PC1 test dataset, excluding accuracy, are reflected in Figure 12.

FIGURE 10. Performance measures on MC2 dataset.

FIGURE 11. Performance measures on MW1 dataset.

FIGURE 12. Performance measures on PC1 dataset.

The detailed results achieved on the PC3 dataset by executing all the classifiers are presented in Table 8. It shows that in the training phase, the RF classifier achieves 100% for both the predicted positive value and the true positive rate. In the testing phase, it maintains high accuracy at 87.92%, with a well-balanced trade-off between predicted positive values and true positive rates. This suggests RF's robustness in accurately predicting positive instances, even with previously unseen data. SVM demonstrates strong performance during both training and testing phases, achieving an accuracy of 87.66% during testing. The model effectively distinguishes between defective and non-defective modules. NB achieves an accuracy of 50% during testing, with a balanced false positive ratio and false negative ratio. This suggests that NB provides a baseline performance, and there is room for improvement in certain scenarios. ANN achieves an accuracy of 87.66% during testing, demonstrating balanced false positive and false negative ratios. This indicates moderate success in classifying both defective and non-defective modules. The VESDP model excels during testing, achieving an accuracy of 87.97%. This demonstrates the strength of the ensemble approach in combining diverse classifiers to enhance overall predictive performance, particularly when applied to unseen data.
The performance measures achieved by the PC3 test dataset, excluding accuracy, are represented in Figure 13.
The results obtained from the PC4 dataset are shown in Table 9. It is clear from the table that in the training phase, RF shows 100% for both the predicted positive value and the true positive rate. In the testing phase, it maintains high accuracy at 87.92%, with a well-balanced trade-off between predicted positive values and true positive rates. SVM demonstrates strong performance during both training and testing phases, achieving an accuracy of 87.66% during testing. The classifier effectively distinguishes between defective and non-defective modules. NB achieves an accuracy of 50% during testing, with a balanced false positive ratio and false negative ratio. This suggests that NB provides a baseline performance, and there is room for improvement in certain scenarios. ANN achieves an accuracy of 87.66% during testing, demonstrating balanced false positive and false negative ratios. This indicates moderate success in classifying both defective and non-defective modules. The VESDP model excels during testing, achieving an accuracy of 87.97%.


TABLE 6. Detailed results of classifiers on the MW1 dataset.

TABLE 7. Detailed results of classifiers on the PC1 dataset.

TABLE 8. Detailed results of classifiers on the PC3 dataset.

TABLE 9. Detailed results of classifiers on the PC4 dataset.

This highlights the strength of the ensemble approach in combining diverse classifiers to enhance overall predictive performance, particularly when applied to unseen data. The detailed results obtained from the PC4 dataset are displayed in Table 9.
The performance measures attained by the PC4 test dataset, except accuracy, are displayed in Figure 14.

A. PERFORMANCE COMPARISON
In this section, the proposed voting ensemble-based software defect prediction (VESDP) model is compared, based on accuracy, with recent research that has implemented state-of-the-art techniques for software defect prediction. The comparison is based on twenty studies published in the last 5 years. Out of these twenty studies, ten worked on


the CM1 dataset, eight on MW1, JM1, and PC3 datasets, seven on MC2 and PC1 datasets, and five on the PC4 dataset. It is clear from the results that the integration of heterogeneous base classifiers into ensemble classification boosts the accuracy of the software defect prediction process. The accuracy comparison of the proposed VESDP model with modern techniques is shown in Table 10. Figure 15 illustrates the performance comparison.

TABLE 10. Comparison of VESDP with state-of-the-art techniques.

FIGURE 13. Performance measures on PC3 dataset.

FIGURE 14. Performance measures on PC4 dataset.

FIGURE 15. Comparison graph.

V. THREATS TO VALIDITY
Validity threats refer to factors or issues that may undermine the accuracy, credibility, or generalizability of the findings in a research paper. These threats can arise at different stages of the research process and can affect the validity of the study's


conclusions [66]. Some of the most crucial validity threats are listed below:

A. INTERNAL VALIDITY
This type of validity assesses the adequacy of the chosen prediction techniques for the specific datasets employed in the research or for other datasets used to tackle the same problem [67].
In this research, four supervised classification algorithms, namely RF, SVM, NB, and ANN, were used to implement the proposed VESDP model, based on the diversity in their computation mechanism and performance. In the future, researchers can implement clustering algorithms along with feature selection techniques to analyze the performance of software defect prediction models.

B. EXTERNAL VALIDITY
This form of validity examines whether the proposed solution is equally effective when applied to other datasets associated with the same problem domain [68]. In this research, seven benchmark datasets, namely CM1, JM1, MC2, MW1, PC1, PC3, and PC4, from NASA's defect repository were employed to implement the proposed VESDP model. Hence, the conclusion of this research cannot be generalized to other defect datasets having different attributes. However, the preprocessing steps, including dataset splitting, cleaning, and normalization, along with parameter optimization in the classification step, can be implemented by other researchers in their studies.

C. CONSTRUCT VALIDITY
This validity is concerned with the performance measures used to evaluate the proposed model [69]. In this research, eight performance measures, including predicted positive value, predicted negative value, true positive rate, true negative rate, accuracy, misclassification rate, false positive ratio, and false negative ratio, have been implemented to evaluate the performance of the proposed VESDP model. However, only the accuracy measure is used to compare the performance of the VESDP model with state-of-the-art techniques.

D. CONCLUSION VALIDITY
This validity refers to the extent to which the conclusion drawn from a study accurately represents the true relationships or effects observed in the proposed model [68]. In this research, the conclusion is drawn based on the accuracy comparison with state-of-the-art techniques. This assessment shows that the proposed model has better results as compared to modern research.

VI. CONCLUSION
Software defect prediction aims to identify faulty modules before the testing phase, enabling a focus on testing only those modules that are likely to have defects. An efficient defect prediction model can reduce software development costs by minimizing the resources dedicated to quality assurance activities during testing. In this research, an intelligent ensemble-based model for software defect prediction was proposed. The model was implemented using benchmark datasets extracted from the NASA defect repository. The proposed model integrated the predictive accuracy of four heterogeneous supervised classifiers using the voting ensemble classification technique. For statistical analysis, eight performance measures were implemented. To prove the effectiveness of the strategy adopted in the proposed model, a comparative analysis was conducted with state-of-the-art techniques. The proposed VESDP model outperformed modern research and proved its efficiency for the software defect prediction process.

VII. LIMITATION OF PROPOSED MODEL
The training data has a significant impact on the effectiveness of any machine-learning model, including ensemble-based models. The training dataset's disparity, missing data, or noise may have a detrimental effect on the model's ability to predict the future. The requirements, development processes, and code used in software development are continually changing. An ensemble-based model trained on historical data may struggle to adapt when faced with unexpected changes in project dynamics or new development paradigms. The model looks for trends in historical data to produce forecasts. If the current task differs greatly from the projects in the training dataset, the model's performance can suffer.

VIII. FUTURE WORK
For future work, it is recommended to explore advanced feature selection techniques using a genetic algorithm or bat search algorithm to enhance the robustness of the proposed model in software defect prediction. Additionally, incorporating nested ensemble techniques using bagging, boosting, stacking, etc. could further improve the reliability of predictions. These enhancements would contribute to the continuous evolution of defect prediction models, advancing their applicability in real-world software development scenarios.

IX. FUNDING
Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2024R136), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.

REFERENCES
[1] Z. M. Zain, S. Sakri, and N. H. A. Ismail, "Application of deep learning in software defect prediction: Systematic literature review and meta-analysis," Inf. Softw. Technol., vol. 158, Jun. 2023, Art. no. 107175, doi: 10.1016/j.infsof.2023.107175.
[2] M. Unterkalmsteiner et al., "Software startups—A research agenda," 2023, arXiv:2308.12816.
[3] S. Aftab, S. Abbas, T. M. Ghazal, M. Ahmad, H. A. Hamadi, C. Y. Yeun, and M. A. Khan, "A cloud-based software defect prediction system using data and decision-level machine learning fusion," Mathematics, vol. 11, no. 3, p. 632, Jan. 2023, doi: 10.3390/math11030632.


[4] S. Goyal, "Heterogeneous stacked ensemble classifier for software defect prediction," in Proc. 6th Int. Conf. Parallel, Distrib. Grid Comput. (PDGC), Waknaghat, India, Nov. 2020, pp. 126–130, doi: 10.1109/PDGC50313.2020.9315754.
[5] S. Mehta and K. S. Patnaik, "Stacking based ensemble learning for improved software defect prediction," in Proc. 5th Int. Conf. Microelectron., Comput. Commun. Syst., vol. 748, 2021, pp. 167–178.
[6] M. Shafiq, F. H. Alghamedy, N. Jamal, T. Kamal, Y. I. Daradkeh, and M. Shabaz, "Retracted: Scientific programming using optimized machine learning techniques for software fault prediction to improve software quality," IET Softw., vol. 17, no. 4, pp. 694–704, Jan. 2023, doi: 10.1049/sfw2.12091.
[7] Y. Tang, Q. Dai, M. Yang, T. Du, and L. Chen, "Software defect prediction ensemble learning algorithm based on adaptive variable sparrow search algorithm," Int. J. Mach. Learn. Cybern., vol. 14, no. 6, pp. 1967–1987, Jan. 2023, doi: 10.1007/s13042-022-01740-2.
[8] S. Goyal, "3PcGE: 3-parent child-based genetic evolution for software defect prediction," Innov. Syst. Softw. Eng., vol. 19, no. 2, pp. 197–216, Jun. 2023, doi: 10.1007/s11334-021-00427-1.
[9] J. Liu, J. Ai, M. Lu, J. Wang, and H. Shi, "Semantic feature learning for software defect prediction from source code and external knowledge," J. Syst. Softw., vol. 204, Oct. 2023, Art. no. 111753, doi: 10.1016/j.jss.2023.111753.
[10] A. K. Gangwar and S. Kumar, "Concept drift in software defect prediction: A method for detecting and handling the drift," ACM Trans. Internet Technol., vol. 23, no. 2, pp. 1–28, May 2023, doi: 10.1145/3589342.
[11] M. S. Alkhasawneh, "Software defect prediction through neural network and feature selections," Appl. Comput. Intell. Soft Comput., vol. 2022, pp. 1–16, Sep. 2022, doi: 10.1155/2022/2581832.
[12] T. F. Husin and M. R. Pribadi, "Implementation of LSSVM in classification of software defect prediction data with feature selection," in Proc. 9th Int. Conf. Electr. Eng., Comput. Sci. Informat. (EECSI), Jakarta, Indonesia, Oct. 2022, pp. 126–131, doi: 10.23919/EECSI56542.2022.9946611.
[13] J. A. Richards, "Supervised classification techniques," in Remote Sensing Digital Image Analysis. Cham, Switzerland: Springer, 2022, pp. 263–367.
[14] B. J. Odejide, A. O. Bajeh, A. O. Balogun, Z. O. Alanamu, K. S. Adewole, A. G. Akintola, and S. A. Salihu, "An empirical study on data sampling methods in addressing class imbalance problem in software defect prediction," in Proc. Comput. Sci. Online Conf. Cham, Switzerland: Springer, Apr. 2022, pp. 594–610.
[15] X. Wu and J. Wang, "Application of bagging, boosting and stacking ensemble and EasyEnsemble methods for landslide susceptibility mapping in the three Gorges reservoir area of China," Int. J. Environ. Res. Public Health, vol. 20, no. 6, p. 4977, Mar. 2023, doi: 10.3390/ijerph20064977.
[16] F. Jiang, X. Yu, D. Gong, and J. Du, "A random approximate reduct-based ensemble learning approach and its application in software defect prediction," Inf. Sci., vol. 609, pp. 1147–1168, Sep. 2022, doi:
[22] A. Iqbal and S. Aftab, "A classification framework for software defect prediction using multi-filter feature selection technique and MLP," Int. J. Mod. Educ. Comput. Sci., vol. 12, no. 1, pp. 18–25, Feb. 2020, doi: 10.5815/ijmecs.2020.01.03.
[23] M. Cetiner and O. K. Sahingoz, "A comparative analysis for machine learning based software defect prediction systems," in Proc. 11th Int. Conf. Comput., Commun. Netw. Technol. (ICCCNT), Kharagpur, India, Jul. 2020, pp. 1–7, doi: 10.1109/ICCCNT49239.2020.9225352.
[24] K. Wang, L. Liu, C. Yuan, and Z. Wang, "Software defect prediction model based on LASSO–SVM," Neural Comput. Appl., vol. 33, no. 14, pp. 8249–8259, Jul. 2021, doi: 10.1007/s00521-020-04960-1.
[25] M. S. Daoud, S. Aftab, M. Ahmad, M. A. Khan, A. Iqbal, S. Abbas, M. Iqbal, and B. Ihnaini, "Machine learning empowered software defect prediction system," Intell. Autom. Soft Comput., vol. 31, no. 2, pp. 1287–1300, 2022, doi: 10.32604/iasc.2022.020362.
[26] Y. N. Soe, P. I. Santosa, and R. Hartanto, "Software defect prediction using random forest algorithm," in Proc. 12th South East Asian Technical Univ. Consortium, Yogyakarta, Indonesia, Mar. 2018, pp. 1–5, doi: 10.1109/SEATUC.2018.8788881.
[27] F. H. Alshammari, "Software defect prediction and analysis using enhanced random forest (extRF) technique: A business process management and improvement concept in IoT-based application processing environment," Mobile Inf. Syst., vol. 2022, pp. 1–11, Sep. 2022, doi: 10.1155/2022/2522202.
[28] A. Iqbal, S. Aftab, U. Ali, Z. Nawaz, L. Sana, M. Ahmad, and A. Husen, "Performance analysis of machine learning techniques on software defect prediction using NASA datasets," Int. J. Adv. Comput. Sci. Appl., vol. 10, no. 5, 2019, doi: 10.14569/IJACSA.2019.0100538.
[29] H. Alsghaier and M. Akour, "Software fault prediction using particle swarm algorithm with genetic algorithm and support vector machine classifier," Softw., Pract. Exper., vol. 50, no. 4, pp. 407–427, Apr. 2020, doi: 10.1002/spe.2784.
[30] S. K. Rath, M. Sahu, S. P. Das, S. K. Bisoy, and M. Sain, "A comparative analysis of SVM and ELM classification on software reliability prediction model," Electronics, vol. 11, no. 17, p. 2707, Aug. 2022, doi: 10.3390/electronics11172707.
[31] M. Azzeh, Y. Elsheikh, A. B. Nassif, and L. Angelis, "Examining the performance of kernel methods for software defect prediction based on support vector machine," Sci. Comput. Program., vol. 226, Mar. 2023, Art. no. 102916, doi: 10.1016/j.scico.2022.102916.
[32] A. Rahim, Z. Hayat, M. Abbas, A. Rahim, and M. A. Rahim, "Software defect prediction with Naïve Bayes classifier," in Proc. Int. Bhurban Conf. Appl. Sci. Technol. (IBCAST), Islamabad, Pakistan, Jan. 2021, pp. 293–297, doi: 10.1109/ibcast51254.2021.9393250.
[33] C. Tantithamthavorn, S. McIntosh, A. E. Hassan, and K. Matsumoto, "The impact of automated parameter optimization on defect prediction models," IEEE Trans. Softw. Eng., vol. 45, no. 7, pp. 683–711, Jul. 2019,
10.1016/j.ins.2022.07.130. doi: 10.1109/TSE.2018.2794977.
[17] H. Chen, X.-Y. Jing, Y. Zhou, B. Li, and B. Xu, ‘‘Aligned metric represen- [34] S. Mehta and K. S. Patnaik, ‘‘Improved prediction of software defects using
tation based balanced multiset ensemble learning for heterogeneous defect ensemble machine learning techniques,’’ Neural Comput. Appl., vol. 33,
prediction,’’ Inf. Softw. Technol., vol. 147, Jul. 2022, Art. no. 106892, doi: no. 16, pp. 10551–10562, Aug. 2021, doi: 10.1007/s00521-021-05811-3.
10.1016/j.infsof.2022.106892. [35] A. Khalid, G. Badshah, N. Ayub, M. Shiraz, and M. Ghouse, ‘‘Software
[18] A. O. Balogun, A. O. Bajeh, V. A. Orie, and A. W. Yusuf-Asaju, ‘‘Software defect prediction analysis using machine learning techniques,’’ Sustain-
defect prediction using ensemble learning: An ANP based evaluation ability, vol. 15, no. 6, p. 5517, Mar. 2023, doi: 10.3390/su15065517.
method,’’ FUOYE J. Eng. Technol., vol. 3, no. 2, pp. 50–55, Sep. 2018,
[36] S. Abbas, S. Aftab, M. A. Khan, T. M. Ghazal, H. A. Hamadi, and
doi: 10.46792/fuoyejet.v3i2.200.
C. Y. Yeun, ‘‘Data and ensemble machine learning fusion based intelligent
[19] A. O. Balogun, F. B. Lafenwa-Balogun, H. A. Mojeed, V. E. Adeyemo, software defect prediction system,’’ Comput., Mater. Continua, vol. 75,
O. N. Akande, A. G. Akintola, A. O. Bajeh, and F. E. Usman-Hamza, no. 3, pp. 6083–6100, 2023, doi: 10.32604/cmc.2023.037933.
‘‘SMOTE-based homogeneous ensemble methods for software defect
[37] S. Y. Kim and A. Upneja, ‘‘Majority voting ensemble with a deci-
prediction,’’ in Computational Science and Its Applications—ICCSA 2020,
sion trees for business failure prediction during economic down-
vol. 12254, O. Gervasi, B. Murgante, S. Misra, C. Garau, I. B. D. Taniar,
turns,’’ J. Innov. Knowl., vol. 6, no. 2, pp. 112–123, Apr. 2021, doi:
B. O. Apduhan, A. M. A. C. Rocha, E. Tarantino, C. M. Torre, and
10.1016/j.jik.2021.01.001.
Y. Karaca, Eds. Cham, Switzerland: Springer, 2020, pp. 615–631.
[20] R. J. Jacob, R. J. Kamat, N. M. Sahithya, S. S. John, and S. P. Shankar, [38] A. Alazba and H. Aljamaan, ‘‘Software defect prediction using stacking
‘‘Voting based ensemble classification for software defect prediction,’’ generalization of optimized tree-based ensembles,’’ Appl. Sci., vol. 12,
in Proc. IEEE Mysore Sub Sect. Int. Conf. (MysuruCon), Hassan, no. 9, p. 4577, Apr. 2022, doi: 10.3390/app12094577.
India, Oct. 2021, pp. 358–365, doi: 10.1109/MysuruCon52639.2021. [39] M. A. Javed. (2021). A Framework for Software Defect Prediction Using
9641713. Nested-Ensemble Learning and Feature Selection Techniques. [Online].
[21] A. Alsaeedi and M. Z. Khan, ‘‘Software defect prediction using Available: https://vspace.vu.edu.pk/detail.aspx?id=592
supervised machine learning and ensemble techniques: A comparative [40] F. Matloob. (2020). Software Defect Prediction Model Using
study,’’ J. Softw. Eng. Appl., vol. 12, no. 5, pp. 85–100, 2019, doi: Multi-Layer Feed Forward Neural Networks. [Online]. Available:
10.4236/jsea.2019.125007. https://vspace.vu.edu.pk/detail.aspx?id=342


[41] M. Shepperd, Q. Song, Z. Sun, and C. Mair, ‘‘Data quality: Some comments on the NASA software defect datasets,’’ IEEE Trans. Softw. Eng., vol. 39, no. 9, pp. 1208–1215, Sep. 2013, doi: 10.1109/TSE.2013.11.
[42] J. Shi, X. Li, L. Li, C. Ouyang, and C. Xu, ‘‘An efficient deep learning-based troposphere ZTD dataset generation method for massive GNSS CORS stations,’’ IEEE Trans. Geosci. Remote Sens., 2023.
[43] W. Du, C. Wu, H. Yu, Q. Kong, Y. Xu, and W. Zhang, ‘‘Determination of multicomponents in Rubi Fructus by near-infrared spectroscopy technique,’’ Int. J. Anal. Chem., vol. 2023, pp. 1–9, Nov. 2023, doi: 10.1155/2023/5575944.
[44] P. Suresh Kumar, H. S. Behera, J. Nayak, and B. Naik, ‘‘Bootstrap aggregation ensemble learning-based reliable approach for software defect prediction by using characterized code feature,’’ Innov. Syst. Softw. Eng., vol. 17, no. 4, pp. 355–379, Dec. 2021, doi: 10.1007/s11334-021-00399-2.
[45] H. Tong, S. Wang, and G. Li, ‘‘Credibility based imbalance boosting method for software defect proneness prediction,’’ Appl. Sci., vol. 10, no. 22, p. 8059, Nov. 2020, doi: 10.3390/app10228059.
[46] H. Alsawalqah, N. Hijazi, M. Eshtay, H. Faris, A. A. Radaideh, I. Aljarah, and Y. Alshamaileh, ‘‘Software defect prediction using heterogeneous ensemble classification based on segmented patterns,’’ Appl. Sci., vol. 10, no. 5, p. 1745, Mar. 2020, doi: 10.3390/app10051745.
[47] A. Iqbal, S. Aftab, I. Ullah, M. S. Bashir, and M. A. Saeed, ‘‘A feature selection based ensemble classification framework for software defect prediction,’’ Int. J. Modern Educ. Comput. Sci., vol. 11, no. 9, pp. 54–64, Sep. 2019, doi: 10.5815/ijmecs.2019.09.06.
[48] F. M. Tua and W. Danar Sunindyo, ‘‘Software defect prediction using software metrics with Naïve Bayes and rule mining association methods,’’ in Proc. 5th Int. Conf. Sci. Technol. (ICST), Yogyakarta, Indonesia, Jul. 2019, pp. 1–5, doi: 10.1109/icst47872.2019.9166448.
[49] S. I. Ayon, ‘‘Neural network based software defect prediction using genetic algorithm and particle swarm optimization,’’ in Proc. 1st Int. Conf. Adv. Sci., Eng. Robot. Technol. (ICASERT), Dhaka, Bangladesh, May 2019, pp. 1–4, doi: 10.1109/ICASERT.2019.8934642.
[50] T. Zhang, Y. Yu, X. Mao, Y. Lu, Z. Li, and H. Wang, ‘‘FENSE: A feature-based ensemble modeling approach to cross-project just-in-time defect prediction,’’ Empirical Softw. Eng., vol. 27, no. 7, p. 162, Dec. 2022, doi: 10.1007/s10664-022-10185-8.
[51] [Online]. Available: https://github.com/misbah-here/VESDP_Repository
[52] A. O. Balogun, S. Basri, S. A. Jadid, S. Mahamad, M. A. Al-momani, A. O. Bajeh, and A. K. Alazzawi, ‘‘Search-based wrapper feature selection methods in software defect prediction: An empirical analysis,’’ in Intelligent Algorithms in Software Engineering, vol. 1224, R. Silhavy, Ed. Cham, Switzerland: Springer International Publishing, 2020, pp. 492–503.
[53] I. Kaur and A. Kaur, ‘‘Comparative analysis of software fault prediction using various categories of classifiers,’’ Int. J. Syst. Assurance Eng. Manage., vol. 12, no. 3, pp. 520–535, Jun. 2021, doi: 10.1007/s13198-021-01110-1.
[54] B. Mumtaz, S. Kanwal, S. Alamri, and F. Khan, ‘‘Feature selection using artificial immune network: An approach for software defect prediction,’’ Intell. Autom. Soft Comput., vol. 29, no. 3, pp. 669–684, 2021, doi: 10.32604/iasc.2021.018405.
[55] S. Goyal, ‘‘Handling class-imbalance with KNN (neighbourhood) under-sampling for software defect prediction,’’ Artif. Intell. Rev., vol. 55, no. 3, pp. 2023–2064, Mar. 2022, doi: 10.1007/s10462-021-10044-w.
[56] A. O. Balogun, S. Basri, S. J. Abdulkadir, and A. S. Hashim, ‘‘Performance analysis of feature selection methods in software defect prediction: A search method approach,’’ Appl. Sci., vol. 9, no. 13, p. 2764, Jul. 2019.
[57] H. Aljamaan and A. Alazba, ‘‘Software defect prediction using tree-based ensembles,’’ in Proc. 16th ACM Int. Conf. Predictive Models Data Anal. Softw. Eng., New York, NY, USA, Nov. 2020, pp. 1–10, doi: 10.1145/3416508.3417114.
[58] M. Azam, M. Nouman, and A. R. Gill, ‘‘Comparative analysis of machine learning technique to improve software defect prediction,’’ KIET J. Comput. Inf. Sci., vol. 5, no. 2, pp. 1–11, Jul. 2022, doi: 10.51153/kjcis.v5i2.96.
[59] S. Goyal and P. K. Bhatia, ‘‘Comparison of machine learning techniques for software quality prediction,’’ Int. J. Knowl. Syst. Sci., vol. 11, no. 2, pp. 20–40, Apr. 2020, doi: 10.4018/IJKSS.2020040102.
[60] U. S. Bhutamapuram and R. Sadam, ‘‘With-in-project defect prediction using bootstrap aggregation based diverse ensemble learning technique,’’ J. King Saud Univ.-Comput. Inf. Sci., vol. 34, no. 10, pp. 8675–8691, Nov. 2022, doi: 10.1016/j.jksuci.2021.09.010.
[61] A. Balogun, R. O. Oladele, H. A. Mojeed, and B. Amin-Balogun, ‘‘Performance analysis of selected clustering techniques for software defects prediction,’’ Afr. J. Comput. ICT, vol. 12, no. 2, pp. 30–42, 2019.
[62] M. A. Javed, ‘‘A framework for software defect prediction using nested-ensemble learning and feature selection techniques,’’ M.S. thesis, Virtual Univ. Pakistan, Lahore, Pakistan, 2021. [Online]. Available: https://vspace.vu.edu.pk/details.aspx?id=592
[63] A. O. Balogun, S. Basri, L. F. Capretz, S. Mahamad, A. A. Imam, M. A. Almomani, V. E. Adeyemo, A. K. Alazzawi, A. O. Bajeh, and G. Kumar, ‘‘Software defect prediction using wrapper feature selection based on dynamic re-ranking strategy,’’ Symmetry, vol. 13, no. 11, p. 2166, Nov. 2021, doi: 10.3390/sym13112166.
[64] S. Singh and T. U. Haider, ‘‘Selection of best feature reduction method for module-based software defect prediction,’’ J. Phys., Conf. Ser., vol. 2273, no. 1, May 2022, Art. no. 012002, doi: 10.1088/1742-6596/2273/1/012002.
[65] S. Amin. (2019). Software Defect Prediction via Machine Learning Classifiers. [Online]. Available: https://vspace.vu.edu.pk/detail.aspx?id=378
[66] F. Yucalar, A. Ozcift, E. Borandag, and D. Kilinc, ‘‘Multiple-classifiers in software quality engineering: Combining predictors to improve software fault prediction ability,’’ Eng. Sci. Technol., Int. J., vol. 23, no. 4, pp. 938–950, Aug. 2020, doi: 10.1016/j.jestch.2019.10.005.
[67] U. Sharma B and R. Sadam, ‘‘Towards developing and analysing metric-based software defect severity prediction model,’’ 2022, arXiv:2210.04665.
[68] A. Abdu, Z. Zhai, R. Algabri, H. A. Abdo, K. Hamad, and M. A. Al-antari, ‘‘Deep learning-based software defect prediction via semantic key features of source code—Systematic survey,’’ Mathematics, vol. 10, no. 17, p. 3120, Aug. 2022.
[69] Z. Xu, J. Liu, X. Luo, Z. Yang, Y. Zhang, P. Yuan, Y. Tang, and T. Zhang, ‘‘Software defect prediction based on kernel PCA and weighted extreme learning machine,’’ Inf. Softw. Technol., vol. 106, pp. 182–200, Feb. 2019.

MISBAH ALI received the B.S. degree (Hons.) in computer science from the Punjab University College of Information Technology, Lahore, Pakistan, in 2015. She is currently pursuing the M.S. degree in computer science with the Virtual University of Pakistan, with a focus on software engineering. Her research interests include machine learning, data mining, and software process improvement.

TEHSEEN MAZHAR received the B.Sc. degree in computer science from Bahauddin Zakariya University, Multan, Pakistan, the M.Sc. degree in computer science from Quaid-i-Azam University, Islamabad, Pakistan, and the M.S. degree (Hons.) in computer science from the Virtual University of Pakistan, where he is currently pursuing the Ph.D. degree. He is with SED and a Lecturer with GCUF. He has more than 21 publications in well-reputed journals, such as Electronics (MDPI), Health, Applied Science, Brain Sciences, Symmetry, Future Internet, PeerJ, and Computers, Materials & Continua. His research interests include machine learning, the Internet of Things, and networks.


YASIR ARIF received the B.S. degree (Hons.) in computer science from the Global Institute, Lahore, Pakistan, in 2017. His research interests include machine learning, artificial intelligence, and natural language processing.

SHAHA AL-OTAIBI (Member, IEEE) received the M.S. degree in computer science and the Ph.D. degree in artificial intelligence from King Saud University. She is currently an Associate Professor with the Department of Information Systems, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, Saudi Arabia. Her main research interests include data science, artificial intelligence, machine learning, bio-inspired computing, cybersecurity, and information security. She is a Senior Fellow of the U.K. Higher Education Academy (SFHEA). She is a reviewer of some journals and an editorial board member of other journals.

YAZEED YASIN GHADI received the Ph.D. degree in electrical and computer engineering from Queensland University. He is currently an Assistant Professor in software engineering with Al Ain University. He was a Postdoctoral Researcher with Queensland University before joining Al Ain University. He has published more than 80 peer-reviewed journals and conference papers and holds three pending patents. His current research interests include developing novel electro-acoustic-optic neural interfaces for large-scale high-resolution electrophysiology and distributed optogenetic stimulation. He was a recipient of several awards. His dissertation on developing novel hybrid plasmonic photonic on chip-biochemical sensors received the Sigma Xi Best Ph.D. Thesis Award.

TARIQ SHAHZAD received the B.E. and M.S. degrees from COMSATS University Islamabad, Pakistan, in 2006 and 2014, respectively, and the Ph.D. degree from the University of Johannesburg, South Africa, in 2021. He is currently a Research Fellow with the University of Johannesburg, South Africa. His research work has been published in top-tier IEEE conferences and well-reputed peer-reviewed journals. His research interests include the Internet of Things, machine learning, and AI in healthcare. He has served as a technical program committee member and a paper reviewer for international conferences and journals.

MUHAMMAD AMIR KHAN received the M.Sc. degree in computer engineering from COMSATS University Islamabad, Abbottabad Campus, and the Ph.D. degree in information technology from Universiti Teknologi PETRONAS, Malaysia, with a focus on cutting-edge research. He is currently with the Department of Computer Science, Universiti Teknologi Mara, Malaysia. He is an accomplished academician and a researcher. As an Associate Professor, he continues to inspire and guide the next generation of computer scientists, leaving an indelible mark on the academic landscape. He laid the groundwork for a future dedicated to technological advancements with COMSATS University Islamabad. With an impressive record of more than 50 research papers published in ISI/Impact Factor journals and international conferences, he stands as a prominent figure shaping the discourse and advancements within the realm of computer science. His research interests include a broad spectrum, notably focusing on communication protocols for the Internet of Things (IoT), wireless sensor networks, wireless ad hoc networks, software-defined networks (SDN), and medical imaging. His contributions to these areas underscore his visionary approach to technology and his dedication to addressing contemporary challenges.

HABIB HAMAM (Senior Member, IEEE) received the B.Eng. and M.Sc. degrees in information processing from the Technical University of Munich, Germany, 1988 and 1992, respectively, and the Ph.D. degree in physics and applications in telecommunications from the University of Rennes 1 conjointly with the France Telecom Graduate School, France, in 1995, and the Postdoctoral Diploma degree ‘‘Accreditation to Supervise Research in Signal Processing and Telecommunications’’ from the University of Rennes 1, in 2004. From 2006 to 2016, he was the Canada Research Chair of Optics in Information and Communication Technologies, the most prestigious research position in Canada which he held for a decade. The title is awarded by the Head of the Government of Canada after a selection by an international scientific jury in the related field. He is currently a Full Professor with the Department of Electrical Engineering, University of Moncton. His research interests include optical telecommunications, wireless communications, diffraction, fiber components, RFID, information processing, the IoT, data protection, COVID-19, and deep learning. He is a Senior Member of OSA and a Registered Professional Engineer in New Brunswick. He was a recipient of several pedagogical and scientific awards. He is among others the Editor-in-Chief and the Founder of CIT Review Journal, an Academic Editor of Applied Sciences, and an Associate Editor of the IEEE Canadian Review. He also served as a guest editor in several journals.
