(IJACSA) International Journal of Advanced Computer Science and Applications,
Vol. 13, No. 3, 2022
Software Reliability Prediction by using Deep
Learning Technique
Shivani Yadav, Balkishan
Department of Computer Science & Applications
Maharshi Dayanand University, Rohtak-124001, Haryana, India
Abstract—The importance of software systems and their
impact on all sectors of society is undeniable. Furthermore, it is
increasing every day as more services are digitized. This
calls for development and quality processes to evolve so that
they deliver reliable software. One important criterion for
reliable software is that it should be fault-free. Reliability
models are designed to evaluate software reliability and predict
faults. Software reliability prediction is always an area of interest
in the field of software engineering. Prediction of software
reliability can be done using numerous available models but with
the inception of computational intelligence techniques,
researchers are exploring new techniques such as machine
learning, genetic algorithm, deep learning, etc. to develop better
prediction models. In the current study, a software reliability
prediction model is developed using a deep learning technique
over twelve real datasets from different repositories. The results
of the proposed model are analyzed and found quite encouraging.
The results are also compared with previous studies based on
various performance metrics.
Keywords—Software reliability; deep learning; performance metrics; prediction; dense neural network; fault prediction
I. INTRODUCTION
Reliability is one of the most critical aspects of a software product and one of the major attributes that determine software quality. Software reliability can be described as the ability of software to perform its intended functions accurately and successfully. Regular checks during software development help prevent faults, which can otherwise lead to failures and require considerable effort to correct or recover from if detected later. Therefore, reliability prediction is an important aspect of any software development approach. For software to be reliable, it is important that it is fault-free. Computational intelligence techniques such as machine learning, genetic algorithms, and deep learning are gaining the attention of researchers for reliability prediction. The current study uses a deep learning-based technique for software reliability prediction because of its potential to predict with high accuracy on large amounts of unstructured or unlabeled data [1]. Early fault prediction using deep learning models helps to improve the reliability of the software.
Deep learning is a subset of machine learning algorithms that are built on Artificial Neural Networks (ANNs). Neural networks are computational systems that respond to external inputs with dynamic state changes and try to determine underlying relationships within a dataset. An ANN with two or three layers is a basic neural network, whereas a neural network with more than three layers is considered deep learning [38]. The label "deep" was inspired by the number of processing layers that data must pass through. Advances in deep learning have resulted in more complex neural networks with more powerful learning abilities. A deep learning model takes an input, performs step-by-step nonlinear transformations, and then uses what it has learned to generate a statistical model as output. The model continues these iterations until the output is accurate enough. Owing to the data-hungry nature of deep learning algorithms and increased dataset sizes, complex problems can be solved more accurately and efficiently.
Integrating deep learning into Software Engineering (SE) tasks has become increasingly popular among software developers and researchers. Deep learning assists SE experts in typical SE tasks such as extracting requirements from natural language text, generating source code, and predicting software faults. Deep learning in SE has increased the interest of both SE and Artificial Intelligence (AI) experts.
This paper aims to develop a novel neural network-based deep learning reliability prediction model. The deep learning model was chosen for its ability to automatically capture and learn discriminative features from data, which results in an improved reliability prediction model. This research will open the road for other deep learning approaches to be used in fault prediction, so that software engineers will be able to better predict the likelihood of faults, resulting in better resource use, risk management, and quality control.
The remainder of the paper is organized into five sections. Section II reviews related studies to explore the models already used for predicting software reliability and their accuracy, so that the scope for further improvement can be identified. Section III discusses the design of the proposed model for improving the accuracy of software reliability prediction, including the step-by-step process. Section IV implements the model and presents the results, which are given in tabular and graphical form and discussed in detail. Section V compares the results of the current study with previous studies. The final section summarizes the work and gives possible directions for future research.
II. LITERATURE REVIEW
The use of Computational Intelligence (CI) in the field of software engineering is expanding. This is evident from the large body of research already completed and still being carried out by various researchers. Important research work related to software reliability prediction was selected and studied to conduct the current work.
The term CI can be traced back to 1983, when Nick Cercone and Gordon McCalla started the International Journal of Computational Intelligence (IJCI). Cercone and McCalla sought to differentiate their work from existing studies in the broad Artificial Intelligence domain [2]. Bezdek [2] was the first to propose a technical definition of CI and its relation to neural networks like computational networks. Marks [3] summarized that fuzzy systems, genetic algorithms, neural networks, and evolutionary programming are the building blocks of CI. In the same year, Karunanithi et al. [4] explored the application of connectionist models based on neural networks to software reliability growth prediction and claimed better results than traditional parametric models. Ho et al. [5] extended this research by comparing traditional and connectionist models while extensively studying software reliability prediction using connectionist models. These studies indicate that neural networks give good results in predicting errors but do not provide appropriate results under all circumstances.
In 2005, Tian and Noore [6] noted that the neurons in the layers of a neural network are difficult to interpret physically and proposed an alternative approach based on a genetic algorithm to predict software reliability. Costa et al. [7] proposed a hybrid approach that used both genetic algorithms and evolutionary neural networks to improve reliability prediction; in this approach, a genetic algorithm is used to analyze the number of neurons in each layer of the ANN. Hybridization has been prominent in software reliability prediction since 2005. Pai and Hong [8] experimented with a combination of Simulated Annealing (SA) and Support Vector Machines (SVMs) for predicting software reliability, in which SA is used to choose the SVM parameters; the authors suggest exploring other search techniques to improve the results. Hu et al. [9] used recurrent neural networks (RNNs) and genetic algorithms to design generic software reliability models and showed better results with larger datasets. In 2011, Lo [10] introduced models based on the Support Vector Machine (SVM) and Autoregressive Integrated Moving Average (ARIMA) techniques; both proposed models predicted better results than the traditional model. Li et al. [11] used the AdaBoost machine learning technique, which combines weak predictors into a single predictor to improve prediction accuracy, and verified the results using two case studies. Similarly, Roy et al. [12] proposed a neuro-genetic algorithm in which an ANN is trained using backpropagation and the weights of the network are further optimized using a Genetic Algorithm (GA); compared with traditional methods, the model obtained good results.
Researchers then focused more on machine learning and deep learning methods. Jin et al. [13] proposed a combination of Quantum Particle Swarm Optimization (QPSO) and a hybrid Artificial Neural Network (ANN) for predicting the fault-proneness of software modules. QPSO was used for dimensionality reduction, whereas the ANN classified modules into non-faulty and faulty categories. The approach is simple to implement, and the results showed a correlation between a module's software metrics and its fault-proneness, which makes it possible to minimize the cost and effort of software maintenance. Malhotra [14] reviewed various machine learning techniques for software fault prediction and assessed their performance against statistical techniques. The study found that machine learning models predict software faults better than traditional models, but that these techniques are still limited. Wahono [15] identified three influential frameworks, i.e., those of Lessmann et al., Menzies et al., and Song et al., which combine Machine Learning (ML) classifiers to predict software defects and improve accuracy, but these frameworks are not able to handle noisy data. Jaiswal and Malhotra [16] studied the application of various ML techniques, including Instance-Based Learning (IBL), Cascading Forward Backpropagation Neural Network (CFBPNN), Multilayer Perceptron (MLP), General Regression Neural Network (GRNN), Feed Forward Backpropagation Neural Network (FFBPNN), Bagging, and Adaptive Neuro-Fuzzy Inference System (ANFIS), to industrial software. The results showed that ANFIS provides better reliability prediction than the other methods.
Several recent studies indicate the strength of the deep learning approach in software reliability prediction. Clemente et al. [17] developed a predictive model using a deep learning technique that predicts security bugs with higher accuracy (73.50%) than machine learning techniques. The studies in [18][19][20][21][22] identified the challenges and the metrics required for finding faults and for testing using different computational techniques. The studies in [23][24][25][26][27] reviewed and assessed quality parameters for component-based software using different computational intelligence techniques. Qasem et al. [28] predicted software faults using two deep learning algorithms, i.e., the Multi-layer Perceptron (MLP) and the Convolutional Neural Network (CNN), on four NASA datasets and concluded that CNN is the better model, although it was evaluated on a limited number of datasets.
The literature review shows that researchers use many techniques to predict software reliability, but more work is needed on predicting reliability for complex or large datasets. The neural network-based deep learning approach is gaining the attention of researchers because of its capability to provide better results. However, there is still scope for improving the accuracy of reliability prediction by detecting faults in the software. To further improve prediction accuracy, a deep learning model is designed, which is presented in the subsequent sections.
III. DESIGN OF MODEL
Deep learning algorithms are based on ANN where hidden
layers try to uncover relationships between data. An artificial
neural network works by processing inputs through several
dynamic state responses. The interconnected processing
elements between different layers are called neurons and are
responsible for facilitating the computational system. Artificial
neural networks have evolved to provide increasingly complex
structures with powerful learning abilities.
The framework used for building this model is shown in
Fig. 1.
Fig. 1. Model Design Framework.
A. Data Acquisition
Data acquisition obtains meaningful data and transforms it into a digital form that can be processed by the model. The data used in the current study were obtained from various online sources and loaded into pandas DataFrames for further processing. A total of twelve datasets are used, each consisting of various features. Table VI presents the details of all the datasets along with their sources. A check for NaN (invalid or not-a-number) values and encoding of categorical features are performed on the datasets. Any NaN values found in a dataset are ignored, as they may create noise in further processing and lower the accuracy of the prediction. The attributes of the datasets are also divided into features and target attributes.
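The acquisition checks described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the study loads the data into pandas, whereas this sketch uses plain Python dictionaries to stay self-contained, and the function names, the choice to drop whole rows containing NaN, and the integer encoding of categorical values are all assumptions.

```python
import math


def is_nan(value):
    """Return True only for float NaN; strings and ints are never NaN."""
    return isinstance(value, float) and math.isnan(value)


def clean_and_split(rows, target):
    """Drop rows containing NaN, integer-encode string features,
    and separate the feature columns from the target column.

    `rows` is a non-empty list of dicts sharing the same keys (assumed).
    """
    rows = [r for r in rows if not any(is_nan(v) for v in r.values())]
    feature_names = [k for k in rows[0] if k != target]

    # One code table per categorical (string-valued) feature.
    codes = {}
    for name in feature_names:
        values = sorted({r[name] for r in rows if isinstance(r[name], str)})
        if values:
            codes[name] = {v: i for i, v in enumerate(values)}

    X = [[codes[n][r[n]] if n in codes else r[n] for n in feature_names]
         for r in rows]
    y = [r[target] for r in rows]
    return X, y, feature_names
```

The hypothetical `target` argument would be the dataset's target attribute, for example the `defects` column of JM1.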
B. Data Preparation / Preprocessing
The collected data contain some impurities and are therefore not suitable for modeling in raw form; they need to be cleaned and pre-processed. Preprocessing consists of data transformation and normalization, accomplished by applying a natural logarithmic transformation followed by min-max normalization. The natural logarithmic transformation reduces the skewness of the dataset distribution [38], and min-max normalization maps large value ranges onto a small range, which helps both accuracy and learning speed. After normalization, a dataset is built using a weighted random sampler technique. The dataset is divided into sub-datasets for training, validation, and testing. This division is done randomly, with 70% training data, 10% validation data, and 20% testing data. The training set is used to fit the model, the validation set provides an unbiased evaluation during hyperparameter tuning, and the test set provides an unbiased evaluation of the final model.
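A minimal sketch of these preprocessing steps is given below, under stated assumptions: `log1p` (the logarithm of 1 + x) is used so that zero-valued metrics remain defined, which the paper does not specify, and the function names and the fixed random seed are illustrative. The weighted random sampler is not shown.

```python
import math
import random


def log_transform(column):
    """Natural log of (1 + x); the +1 offset is an assumption."""
    return [math.log1p(x) for x in column]


def min_max(column):
    """Rescale a column to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


def split_70_10_20(samples, seed=0):
    """Shuffle, then cut into 70% training, 10% validation, 20% testing."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 7 // 10
    n_val = len(shuffled) // 10
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])
```

Integer arithmetic is used for the split sizes so that the three parts always cover every sample exactly once.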
The dataset contains many outliers which can affect the
sample mean/variance and skew the results. To eliminate the
noise due to outliers, considering the median and the
interquartile range can yield better results. Therefore, Robust
Scaling is applied to relevant features in the data set.
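Robust scaling centers each feature on its median and divides by its interquartile range (IQR), so a few extreme values have little effect on the scale. The sketch below illustrates the idea with the standard library; the function name and the handling of a zero IQR are assumptions, not details from the study.

```python
import statistics


def robust_scale(column):
    """Scale values as (x - median) / IQR using inclusive quartiles."""
    q1, median, q3 = statistics.quantiles(column, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        # Degenerate feature: every central value is identical.
        return [0.0 for _ in column]
    return [(x - median) / iqr for x in column]
```

For the column 1, 2, 3, 4, 100 the quartiles are 2, 3 and 4, so the outlier 100 does not stretch the scale applied to the other four values.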
C. Modelling
The model is implemented using a dense neural network, which consists of three types of layers (input, hidden, and output), as shown in Fig. 2. In this type of network, every neuron in a layer is connected to every neuron of the previous layer. Various configurations of the model are designed for each dataset, and the configuration with the best results is then selected. Designing a network involves choosing the layers and their activation functions, as well as the initial values of the hyperparameters.
Fig. 2. Dense Neural Network Architecture.
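The forward pass of such a dense network can be sketched in a few lines. This is an illustrative toy, not the implementation used in the study: the layer sizes in the usage note, the uniform weight initialization, and the choice of ReLU for hidden layers with a sigmoid output neuron are assumptions made for the example.

```python
import math
import random


def init_layers(sizes, seed=0):
    """Create one (weights, biases) pair per pair of consecutive sizes."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
                   for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def forward(layers, x):
    """Propagate x through fully connected layers.

    Hidden layers use ReLU; the last layer uses a sigmoid so the output
    can be read as the probability of the faulty class.
    """
    for index, (weights, biases) in enumerate(layers):
        z = [sum(w * a for w, a in zip(row, x)) + b
             for row, b in zip(weights, biases)]
        if index == len(layers) - 1:
            x = [1.0 / (1.0 + math.exp(-v)) for v in z]
        else:
            x = [max(0.0, v) for v in z]
    return x
```

For instance, `forward(init_layers([21, 16, 8, 1]), [0.5] * 21)` returns a single value between 0 and 1 for a module described by 21 features.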
For the different configurations on each dataset, different activation functions are used within the hidden layers in this study: ReLU (Rectified Linear Unit), GELU (Gaussian Error Linear Unit), Tanh (Hyperbolic Tangent), Softmax, and Sigmoid [29][30]. Table I lists all the activation functions with their ranges.
- ReLU is non-linear, differentiable, and computationally fast, which helps the training phase of the network converge quickly.
- The sigmoid activation function is non-linear and differentiable, and its output ranges from 0 to 1, so the output layer produces a probability for binary classification.
- Tanh is non-linear, differentiable, monotonic, and used for classification. Negative inputs are mapped to strongly negative outputs, and zero inputs are mapped near zero.
- GELU is formed by combining properties of dropout, zoneout, and ReLU. It is a neuron activation function based on the Gaussian function.
- The softmax activation function normalizes the outputs into a probability distribution over the predicted target classes.
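The five activation functions above can be written out directly from their standard definitions. The sketch below is for illustration only; the max-subtraction in `softmax` is a common numerical-stability device and is an addition of this example, not something described in the paper.

```python
import math


def relu(x):
    """max(0, x): zero for negative inputs, identity otherwise."""
    return max(0.0, x)


def sigmoid(x):
    """1 / (1 + e^(-x)), with outputs in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def tanh(x):
    """Hyperbolic tangent, with outputs in (-1, 1)."""
    return math.tanh(x)


def gelu(x):
    """x * Phi(x), where Phi is the standard normal CDF via erf."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def softmax(values):
    """Normalize a list of scores into probabilities that sum to 1."""
    shift = max(values)  # subtracting the max avoids overflow in exp
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]
```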
TABLE I. DIFFERENT ACTIVATION FUNCTIONS
| Activation Function | Function f(x) | Range |
|---|---|---|
| ReLU | $f(x) = 0$ for $x < 0$; $f(x) = x$ for $x \ge 0$ | [0, ∞) |
| Sigmoid, $\sigma(x)$ | $\sigma(x) = \dfrac{1}{1 + e^{-x}}$ | (0, 1) |
| Tanh, $\tanh(x)$ | $\tanh(x) = \dfrac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$ | (-1, 1) |
| GELU | $f(x) = \dfrac{x}{2}\left[1 + \operatorname{erf}\left(\dfrac{x}{\sqrt{2}}\right)\right]$, where $\operatorname{erf}(\cdot)$ is the error function | (-0.17, ∞) |
| Softmax | $f(x_i) = \dfrac{e^{x_i}}{\sum_{k=1}^{j} e^{x_k}}$ for $i = 1, 2, 3, \ldots, j$ | (0, 1) |
D. Training
In the training phase, cross-entropy is used as the loss function. Cross-entropy calculates the difference between two probability distributions. SGD (Stochastic Gradient Descent) and Adam (Adaptive Moment Estimation) are used for optimization; they update the weights after each iteration. The updated weights are saved so that loss and accuracy can be calculated afterwards. SGD is used as an optimization technique because of its ability to learn faster by randomly selecting a subset of the data, generally called a batch, and performing gradient descent iteratively on that subset. Adam optimization is an enhancement over SGD. It combines the strengths of AdaGrad and RMSProp, both extensions of SGD, to provide an adaptive learning rate with low memory requirements and good computational efficiency.
Cross-entropy ($L_{CE}$) is a loss function that is used during training to adjust the model weights and find the optimal weights. The aim is to minimize the loss, where a perfect model has zero loss. While zero loss is often difficult to achieve in practice, models are optimized to reduce the loss as far as possible.
$$L_{CE} = -\sum_{j=1}^{n} t_j \log(p_j) \tag{1}$$
for n classes, where $t_j$ is the truth label and $p_j$ is the Softmax probability for the jth class [31].
Further, the cost-sensitive learning method is used to tackle the class imbalance problem by assigning different weights to the two classes (faulty and non-faulty). The difference in weights influences the classification of the classes during the training phase. The purpose is to penalize misclassification.
E. Testing
In this phase, the model is evaluated statistically using four standard performance metrics: accuracy, precision, recall, and F1-score. Accuracy is the percentage of correct predictions on the test data. Along with these four performance metrics, the confusion matrix is used to obtain the support value. The support is the actual number of occurrences of a response class in a dataset. The metrics are calculated using the following formulas:
$$\text{Accuracy} = \frac{TP + TN}{TP + FP + FN + TN} \tag{2}$$
Precision is the fraction of positive-class predictions that actually belong to the positive class. It is calculated as the number of correctly predicted positive observations divided by the total number of predicted positive observations [32].
$$\text{Precision} = \frac{TP}{TP + FP} \tag{3}$$
Recall is defined as the number of correct positive predictions divided by the number of all actual positive samples [32].
$$\text{Recall} = \frac{TP}{TP + FN} \tag{4}$$
The F1-score measures the accuracy of a model on a dataset and is calculated as the harmonic mean of the model's precision and recall [32],
$$F1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{5}$$
where TP = true positive, TN = true negative, FP = false positive, and FN = false negative.
IV. IMPLEMENTATION AND RESULTS
The deep learning model is implemented on the datasets listed in Table II to determine its ability to predict software reliability. The objective of the model is to classify modules as faulty or non-faulty based on the different features of each dataset.
TABLE II. DATASET
| Dataset | Data with defects | Data with no defects | Target Feature |
|---|---|---|---|
| MJ | 14299 | 79849 | Bugs |
| PC5 | 5176 | 16670 | Defective |
| JM1 | 2106 | 8779 | Defects |
| MC1 | 68 | 9398 | C |
| PC2 | 23 | 5566 | C |
| KC1 | 326 | 1783 | Problem |
| PC4 | 178 | 1280 | C |
| PC1 | 77 | 1032 | Defects |
| PC3 | 77 | 1032 | Defects |
| KC2 | 107 | 415 | Problems |
| Datatrieve | 11 | 119 | Faulty |
| COCOMO NASA | 26 | 34 | Rely |
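As a rough illustration of the cost-sensitive training idea, the sketch below derives class weights from the class counts reported for KC1 in Table II and applies them to the cross-entropy loss of a single sample. The inverse-frequency weighting scheme and the function names are assumptions; the paper does not state how its class weights were chosen.

```python
import math


def class_weights(counts):
    """Inverse-frequency weights: total / (number of classes * count)."""
    total = sum(counts.values())
    return {label: total / (len(counts) * n) for label, n in counts.items()}


def weighted_cross_entropy(probs, true_label, weights):
    """Loss -w_c * log(p_c) for the true class c of one sample.

    `probs` maps each class label to its predicted (softmax) probability.
    """
    return -weights[true_label] * math.log(probs[true_label])


# KC1 class counts from Table II: 326 defective, 1783 non-defective.
weights = class_weights({"faulty": 326, "non_faulty": 1783})
```

With these counts the faulty class receives the larger weight, so a confident miss on a faulty module costs more than the same miss on a non-faulty one.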
The Madeyski-Jureczko (MJ) dataset presents metrics that are used to build software defect prediction models for component-based software. The metrics included are 6 Chidamber and Kemerer (CK) metrics, 1 Henderson-Sellers (HS) metric, 5 Bansiya and Davis (BD) metrics, 3 Tang metrics, and 2 Martin metrics. Other metrics are based on McCabe's complexity. The target attribute is named 'bugs'.
Datasets MC1, PC1, PC2, PC3, PC4, and PC5 are used for software defect prediction with 40, 21, 36, 22, 37, and 39 attributes respectively. Each dataset has 1 target attribute for predicting faults. The target attribute for MC1, PC1, PC2, PC3, PC4, and PC5 is named 'C', 'defects', 'C', 'defects', 'C', and 'defective' respectively.
The JM1, KC1, and KC2 datasets are made available to encourage repeatable, verifiable, refutable, and improvable predictive models of software engineering. All three datasets have 22 attributes, consisting of 5 different lines-of-code measures, 3 McCabe metrics, 4 base Halstead measures, 8 derived Halstead measures, a branch count, and 1 target field. The JM1 target field is named 'defects', the KC1 target attribute is named 'problem', and the KC2 target attribute is named 'problems'; each indicates, as 1 or 0, whether or not the module contains reported defects.
The Datatrieve dataset consists of nine attributes in total: eight condition attributes and one target attribute. The target attribute is named 'faulty6_1' and takes the value 1 or 0, where 0 indicates that no faults were found and 1 indicates that faults are present. The purpose of the dataset is to study the correlation of code quality with the characteristics of the modules and with the transition process between two versions of the software. The attributes are "LOC6_0", "LOC6_1", "AddedLOC", "DeletedLOC", "DifferentBlocks", "ModificationRate", "ModuleKnowledge", "ReusedLOC", and "Faulty6_1".
The COCOMO NASA dataset attributes are used to find the required software reliability. The target feature is named 'Attribute Rely'. Attribute values are given as nominal, very high, high, or low and are converted into 0 and 1 during preprocessing. The seventeen attributes and their characteristics are RELY (Required software reliability), DATA (Database size), CPLX (Process complexity), TIME (Time constraint), STOR (Main memory), VIRT, TURN (Turnaround time), ACAP (analysts), AEXP (Application), PCAP (Programmers), VEXP (Virtual machine), LEXP (Language), MODP (Modern Programming), TOOL (use of software), SCED (Schedule information), LOC (Line of code), and ACT_EFFORT (Actual effort).
The same modeling approach is used on all twelve datasets, with different configurations. During modeling, different layouts of neurons and different values of the hyperparameters (epochs, batch size, learning rate) were tried, and all values were tuned to achieve better results; the optimal combination of hyperparameters is the one that minimizes the loss function. Different datasets lead to different parameter values, and the configurations that gave optimal results are shown in Table III.
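To make the role of the three hyperparameters concrete, the following toy sketch trains a single-feature logistic model with mini-batch SGD and records the loss after every epoch. It is an illustrative assumption-laden example, not the training code of the study: the model, the function names, and any data passed to it are hypothetical.

```python
import math
import random


def mean_loss(data, w, b):
    """Mean binary cross-entropy of sigmoid(w*x + b) over (x, y) pairs."""
    eps = 1e-12  # keeps log() away from zero
    total = 0.0
    for x, y in data:
        p = 1.0 / (1.0 + math.exp(-(w * x + b)))
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(data)


def train_sgd(data, epochs, batch_size, learning_rate, seed=0):
    """Mini-batch SGD: one weight update per batch, `epochs` passes."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    history = []  # loss after each epoch
    for _ in range(epochs):
        samples = list(data)
        rng.shuffle(samples)
        for start in range(0, len(samples), batch_size):
            batch = samples[start:start + batch_size]
            grad_w = grad_b = 0.0
            for x, y in batch:
                p = 1.0 / (1.0 + math.exp(-(w * x + b)))
                grad_w += (p - y) * x
                grad_b += p - y
            w -= learning_rate * grad_w / len(batch)
            b -= learning_rate * grad_b / len(batch)
        history.append(mean_loss(data, w, b))
    return w, b, history
```

Comparing the `history` lists produced by different `(epochs, batch_size, learning_rate)` triples is one simple way to select the combination with the lowest loss.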
The loss and accuracy graphs over the number of epochs for every dataset are shown in Table V (Fig. 4 to 27). A good prediction model should have low loss and high accuracy. As observed from the loss and accuracy graphs for all the datasets, the accuracy of the model remains higher than the loss over the iterations.
The accuracy metric measures how accurate the developed model's predictions are compared with the actual data. The loss values are calculated on the training data and verified using the validation data. Loss values are observed after each iteration of optimization to find the optimal model parameters. The model's loss and accuracy for each epoch are saved in a history, which the model's developer uses to make more informed decisions about architectural choices. The optimal configuration for each dataset is presented in Table III.
TABLE III. OPTIMAL CONFIGURATION
| Dataset | Layers in Model | Learning Rate | Activation Function |
|---|---|---|---|
| MJ | [24,1024,112,1] | 0.04 | ReLU |
| PC5 | [39,1024,812,512,2] | 0.0094 | Tanh |
| JM1 | [21,1024,512,256,1] | 0.0004 | Softmax, GELU, ReLU |
| MC1 | [40,1024,2] | 0.009 | Softmax |
| PC2 | [35,1024,256,2] | 0.001 | ReLU |
| KC1 | [21,1024,512,256,128,64,2] | 0.0099 | Tanh |
| PC4 | [37,1024,812,512,2] | 0.0094 | Tanh |
| PC1 | [21,1024,2048,2] | 0.0099 | Tanh |
| PC3 | [37,1024,512,2] | 0.01 | GELU |
| KC2 | [21,1024,256,1] | 0.005 | Sigmoid, Tanh |
| Datatrieve | [9,256,512,64,1] | 0.00001 | Tanh, ReLU |
| COCOMO NASA | [17,512,128,1] | 0.2 | Tanh |
The designed model is tested using several performance metrics, i.e., accuracy, precision, recall, and F1-score. These are the most commonly used reliable metrics for assessing the performance of a prediction model. The performance evaluation is done using a confusion matrix, which summarizes the individual class predictions for class-specific evaluation and provides information in terms of TP, TN, FP, and FN.
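The four metrics follow directly from the confusion-matrix counts. The sketch below shows one way to compute them for a binary problem; the function names, the choice of label 1 as the positive class, and the convention of returning 0.0 when a denominator is zero are assumptions of this example.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP and FN for a binary classification result."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return tp, tn, fp, fn


def metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1-score from the four counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

The support of the positive class is simply `tp + fn`, the number of samples that actually belong to it.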
The results of the prediction model are shown in Table IV and Fig. 3.
TABLE IV. PERFORMANCE METRICS
| Dataset | Accuracy | Precision | Recall | F1-score | Support |
|---|---|---|---|---|---|
| MJ | 89% | 0.90 | 0.96 | 0.93 | 55894 |
| PC5 | 91% | 0.99 | 0.90 | 0.95 | 11669 |
| JM1 | 89% | 0.92 | 0.95 | 0.93 | 5266 |
| MC1 | 95% | 0.99 | 0.95 | 0.97 | 7518 |
| PC2 | 86% | 0.99 | 0.86 | 0.93 | 3896 |
| KC1 | 84% | 0.90 | 0.91 | 0.91 | 1248 |
| PC4 | 89% | 0.99 | 0.87 | 0.93 | 895 |
| PC1 | 85% | 0.99 | 0.84 | 0.91 | 722 |
| PC3 | 83% | 0.99 | 0.81 | 0.89 | 1052 |
| KC2 | 86% | 0.89 | 0.94 | 0.92 | 311 |
| Datatrieve | 86% | 0.97 | 0.87 | 0.92 | 83 |
| COCOMO NASA | 96% | 0.99 | 0.91 | 0.95 | 23 |
Fig. 3. Performance Metrics Graph.
The following observations are made from the results:
- Among all the datasets, the model showed the highest accuracy on the COCOMO NASA dataset, i.e., 96% accuracy with precision 99%, recall 91%, and F1-score 95%, but this dataset has the fewest instances of all the datasets.
- The Datatrieve dataset also contains few instances, and the model's accuracy on Datatrieve is 86%, with good precision (97%), recall (87%), and F1-score (92%).
- The MJ dataset has the highest number of instances among all the datasets, and the prediction accuracy on this dataset is 89%, which validates the model as a good model. The model showed a precision of 90%, recall of 96%, and F1-score of 93%. This shows that the model works well on a large dataset.
- The prediction accuracy on the MC1 and JM1 datasets is 95% and 89% respectively, though they have fewer instances than MJ. The precision (99% and 92%), recall (95% and 95%), and F1-score (97% and 93%) respectively are also very promising.
- The prediction accuracy on the PC1, PC2, PC3, PC4, and PC5 datasets is more than 80%. The results are average compared with previous work on these datasets, which indicates that the model performs adequately on them.
V. COMPARISON WITH EXISTING MODELS
Our proposed deep learning-based reliability prediction model shows better results in terms of accuracy, precision, recall, and F1-measure than other techniques such as decision tree, linear regression, backpropagation neural network, SVM, random tree, random forest, naïve Bayes, and hybrid machine learning techniques. For the KC1 dataset, the accuracy is second only to VOTE [34], proposed by Wang et al., while the model achieves the highest precision and performs better than other models such as the Under Sampling Strategy (USS), Random Forest (RF), and Naïve Bayes (NB). On the KC2 dataset, the model achieved the highest accuracy, precision, recall, and F1-measure when compared with other machine learning techniques. On the Datatrieve dataset, it achieved higher accuracy and precision than the previous model (USS) reported by Rao et al. [35]. The COCOMO NASA dataset gives the highest scores among all the datasets, but the result cannot be considered reliable because the dataset is small. On the MJ dataset, which is the largest component-based dataset, the model outperforms all other approaches such as Linear Regression (LR), Decision Trees (DT), Naïve Bayes (NB), SVM, Stochastic Gradient Boosting, and KNN in all performance metrics [33], except for accuracy, where it is second to Random Forest by Wang et al. [34]. On the JM1 dataset, the model ranks top in all the metrics compared with the models USS [35], VOTE, RT, and NB [33]. On the MC1, PC1, PC2, PC3, PC4, and PC5 datasets, the model achieves the highest results in precision, recall, and F1-score. The performance metrics of various models for the different datasets are listed in Table VII.
TABLE V. LOSS AND ACCURACY GRAPH FOR VARIOUS DATASETS
Fig. 4. Accuracy Graph for MJ.
Fig. 5. Loss Graph for MJ.
Fig. 6. Accuracy Graph for PC5.
Fig. 7. Loss Graph for PC5.
Fig. 8. Accuracy Graph for JM1.
Fig. 9. Loss Graph for JM1.
Fig. 10. Accuracy Graph for MC1.
Fig. 11. Loss Graph for MC1.
Fig. 12. Accuracy Graph for PC2.
Fig. 13. Loss Graph for PC2.
Fig. 14. Accuracy Graph for KC1.
Fig. 15. Loss Graph for KC1.
Fig. 16. Accuracy Graph for PC4.
Fig. 17. Loss Graph for PC4.
Fig. 18. Accuracy Graph for PC1.
Fig. 19. Loss Graph for PC1.
Fig. 20. Accuracy Graph for PC3.
Fig. 21. Loss Graph for PC3.
Fig. 22. Accuracy Graph for KC2.
Fig. 23. Loss Graph for KC2.
Fig. 24. Accuracy Graph for Transaction.
Fig. 25. Loss Graph for Transaction.
Fig. 26. Accuracy Graph for COCOMO NASA.
Fig. 27. Loss Graph for COCOMO NASA.
TABLE VI. DATASET DESCRIPTION
| Dataset | Criterion | No. of Attributes | No. of instances | Source of Dataset |
|---|---|---|---|---|
| MJ | Software defect prediction | 24 | 94148 | https://madeyski.e-informatyka.pl/tools/software-defect-prediction/ |
| PC5 | Software defect prediction | 39 | 17186 | https://github.com/klainfo/NASADefectDataset/raw/master/OriginalData/MDP/PC5.arff |
| JM1 | Software defect prediction | 22 | 10885 | http://promise.site.uottawa.ca/SERepository/datasets/jm1.arff |
| MC1 | Software defect prediction | 40 | 9466 | https://www.openml.org/data/download/53939/mc1.arff |
| PC2 | Software defect prediction | 36 | 5589 | https://www.openml.org/data/download/53952/pc2.arff |
| KC1 | Software defect prediction | 21 | 2109 | http://promise.site.uottawa.ca/SERepository/datasets/kc1.arff |
| PC4 | Software defect prediction | 37 | 1458 | https://www.openml.org/data/download/53932/pc4.arff |
| PC1 | Software defect prediction | 22 | 1109 | http://promise.site.uottawa.ca/SERepository/datasets/pc1.arff |
| PC3 | Software defect prediction | 22 | 1109 | https://www.openml.org/data/download/53933/pc3.arff |
| KC2 | Software defect prediction | 22 | 522 | http://promise.site.uottawa.ca/SERepository/datasets/kc2.arff |
| Datatrieve | Success/failure in the transaction | 9 | 130 | http://promise.site.uottawa.ca/SERepository/datasets/datatrieve.arff |
| COCOMO NASA | Required software reliability | 17 | 60 | http://promise.site.uottawa.ca/SERepository/datasets/cocomonasa_v1.arff |
TABLE VII.
Dataset
MJ
PC5
COMPARISON OF ACCURACY WITH THE EXISTING MODELS IN THE LITERATURE
Model
Accuracy
Precision
Recall
F - measure
Proposed
89%
90.00%
96%
93%
Linear Regression (LR) [33]
74.99%
18.22%
Decision Tree (DT) [33]
74.45%
10.79%
Naive Bayes (NB) [33]
73.76%
22.28%
Support Vector Machine (SVM) [33]
78.19%
26.58%
Stochastic Gradient Boosting (GBM) [33]
76.16%
22.03%
K-Nearest Neighbor (KNN) [33]
84.24%
56.83%
Proposed
91%
VOTE [34]
97.46%
Random Tree [34]
97.08%
Naive Bayes [34]
96.44%
99%
90.00%
95%
Dataset
JM1
MC1
PC2
KC1
PC4
PC1
PC3
KC2
Datatrieve
COCOMO
NASA
Model
Accuracy
Precision
Recall
F - measure
Proposed
89%
92%
95%
93%
USS[35]
66.40%
82.50%
96.90%
89.10%
VOTE [34]
81.44%
Random Tree [34]
75.30%
Naive Bayes [34]
80.45%
Proposed
95%
99%
95%
97%
USS[35]
85.50%
67%
43.30%
49.70%
VOTE [34]
99.42%
Random Tree [34]
99.43%
Naive Bayes [34]
93.80%
Proposed
86%
99%
86%
93%
VOTE [34]
99.53%
Random Tree [34]
99.29%
Naive Bayes [34]
97.11%
Proposed
84%
90.00%
91%
91%
USS[35]
78.50%
87.80%
95.30%
91.40%
VOTE [34]
85.62%
Random Tree [34]
82.85%
Naive Bayes [34]
82.50%
Proposed
89%
99%
87%
93%
ILLE-SVM (Improved Locally Linear Embedding and Support)[37]
90.00%
62.50%
83.33%
71.43%
VOTE [34]
90.28%
Random Tree [34]
87.74%
Naive Bayes [34]
87.11%
Proposed
85%
99%
84%
91%
USS[35]
84.10%
52.60%
36.30%
40.90%
ILLE-SVM (Improved Locally Linear Embedding and Support)[37]
84.78%
78.26%
90.00%
83.68%
LASSO-SVM[36]
78.26%
79.40%
75.46%
79.85%
SVM[36]
71.32%
69.29%
69.25%
70.64%
Linear regression (LR)[36]
84.20%
61.50%
69.60%
65.30%
Back propagation neural network(BPNN)[36]
79.30%
60.60%
72.40%
66.90%
Cluster Analysis (CA)[36]
71.60%
63.50%
71.20%
67.10%
VOTE [34]
93.73%
Random Tree [34]
91.64%
Naive Bayes [34]
89.12%
Proposed
83%
99%
81%
89%
USS[35]
76.60%
37.60%
26.10%
30.10%
ILLE-SVM (Improved Locally Linear Embedding and Support)[37]
89.66%
73.08%
86.36%
79.05%
VOTE [34]
89.12%
Random Tree [34]
86.01%
Naive Bayes [34]
48.30%
Proposed
86%
89%
94%
92%
VOTE [34]
82.91%
Random Tree [34]
79.86%
Naive Bayes [34]
83.62%
Proposed
86%
USS[35]
50.00%
91.20%
99%
95.40%
Proposed
96%
99%
91%
95%
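The four columns of Table VII are linked by the standard confusion-matrix definitions, with the F-measure being the harmonic mean of precision and recall. The sketch below is only an illustration of those definitions, not the authors' evaluation code; the function names and the argument order are assumptions. It also checks one reported row for internal consistency: the Proposed model on MJ, with precision 90% and recall 96%.

```python
def f_from_precision_recall(precision, recall):
    """F-measure as the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def classification_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall, F-measure) from confusion counts.

    tp/fp/fn/tn are true positives, false positives, false negatives
    and true negatives; empty denominators yield 0.0.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall, f_from_precision_recall(precision, recall)


# Consistency check on the MJ "Proposed" row of Table VII
f_mj = f_from_precision_recall(0.90, 0.96)
print(f"{f_mj:.3f}")  # about 0.929, which rounds to the reported 93%
```

The same check can be applied to any row that reports precision, recall, and F-measure together.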
VI. CONCLUSION
Predicting software reliability has become an essential
activity in software development for delivering better-quality
software. Recently, the research community has found
that computational intelligence techniques can outperform
traditional prediction methods. This study predicts software
reliability using a dense neural network, a deep learning
technique. Classification is performed on twelve
datasets: KC1, KC2, Datatrieve, COCOMO NASA, MJ, JM1,
MC1, PC1, PC2, PC3, PC4, and PC5. The model is
configured separately for each dataset to obtain the best
classification performance. Results are evaluated using four standard
performance metrics: accuracy, precision, recall, and F1-score.
The proposed model compares favorably with previous models
in terms of accuracy, especially on the MJ, JM1, KC2, and
COCOMO NASA datasets.
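To make the notion of a dense neural network classifier concrete, the following sketch shows a forward pass through fully connected layers followed by a binary cross-entropy loss, written with the Python standard library only. It is a minimal illustration under stated assumptions and not the configuration used in this study: the layer sizes, the ReLU hidden activation, the random weights, and all function names are hypothetical, and no training step is shown.

```python
import math
import random


def relu(x):
    return x if x > 0 else 0.0


def sigmoid(x):
    # Numerically stable form for large negative inputs
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(W x + b)."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (hypothetical initialisation)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def predict_proba(features, layers):
    """Forward pass: ReLU hidden layers, single sigmoid output unit."""
    out = features
    for weights, biases in layers[:-1]:
        out = dense(out, weights, biases, relu)
    weights, biases = layers[-1]
    return dense(out, weights, biases, sigmoid)[0]


def binary_cross_entropy(y_true, p, eps=1e-12):
    """Loss for one sample; p is clipped to avoid log(0)."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))


rng = random.Random(0)
# Hypothetical architecture: 21 inputs, hidden layers of 16 and 8 units, 1 output
layers = [init_layer(21, 16, rng), init_layer(16, 8, rng), init_layer(8, 1, rng)]
module_metrics = [rng.random() for _ in range(21)]  # stand-in feature vector
p_defective = predict_proba(module_metrics, layers)
loss = binary_cross_entropy(1, p_defective)
```

In practice the weights would be fitted by gradient descent on this loss, and the layer sizes would be chosen per dataset.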
Hybridizing deep learning with other computational
intelligence techniques can be explored for
better results. The study can also be extended to large
industrial datasets, and other algorithms can be evaluated
on the same datasets.
REFERENCES
[1] C. Chen et al., “Reliability analysis using deep learning,” International
Design Engineering Technical Conferences and Computers and
Information in Engineering Conference, vol. 51739, pp. 1–10, Aug.
2018.
[2] J. C. Bezdek, “On the relationship between neural networks, pattern
recognition and intelligence,” International Journal of Approximate
Reasoning, vol. 6, pp. 85–107, 1992.
[3] R. J. Marks II, “Intelligence: Computational versus artificial,” IEEE
Trans. Neural Networks, vol. 4, pp. 737–739, 1993.
[4] N. Karunanithi, D. Whitley and Y. K. Malaiya, “Prediction of software
reliability using connectionist models,” IEEE Transactions on Software
Engineering, vol. 18, p. 563, 1992.
[5] S. L. Ho, M. Xie and T. N. Goh, “A study of the connectionist models
for software reliability prediction,” Computers & Mathematics with
Applications, vol. 46, pp. 1037–1045, 2003.
[6] L. Tian and A. Noore, “Evolutionary neural network modeling for
software cumulative failure time prediction,” Reliability Engineering &
System Safety, vol. 87, pp. 45–51, 2005.
[7] E. O. Costa, G. A. de Souza, A. T. R. Pozo and S. R. Vergilio,
“Exploring genetic programming and boosting techniques to model
software reliability,” IEEE Transactions on Reliability, vol. 56, pp. 422–
434, 2007.
[8] P. F. Pai and W. C. Hong, “Software reliability forecasting by support
vector machines with simulated annealing algorithms,” Journal of
Systems and Software, vol. 79, pp. 747–755, 2006.
[9] Q. P. Hu, M. Xie, S. H. Ng and G. Levitin, “Robust recurrent neural
network modeling for software fault detection and correction
prediction,” Reliability Engineering & System Safety, vol. 92, pp. 332–
340, 2007.
[10] J. H. Lo, “A study of applying ARIMA and SVM model to software
reliability prediction,” International Conference on Uncertainty
Reasoning and Knowledge Engineering, vol. 1, pp. 141–144, 2011.
[11] H. Li, M. Zeng, M. Lu, X. Hu and Z. Li, “Adaboosting-based dynamic
weighted combination of software reliability growth models,” Quality
and Reliability Engineering International, vol. 28, pp. 67–84, 2012.
[12] P. Roy, G. S. Mahapatra and K. N. Dey, “Neuro-genetic approach on
logistic model based software reliability prediction,” Expert Systems
with Applications, vol. 42, pp. 4709–4718, 2015.
[13] C. Jin and S. W. Jin, “Prediction approach of software fault-proneness
based on hybrid artificial neural network and quantum particle swarm
optimization,” Applied Soft Computing, vol. 35, pp. 717–725, 2015.
[14] R. Malhotra, “A systematic review of machine learning techniques for
software fault prediction,” Applied Soft Computing, vol. 27, pp. 504–
518, 2015.
[15] R. S. Wahono, “A Systematic Literature Review of Software Defect
Prediction,” Journal of Software Engineering, vol. 1, pp. 1–16, 2015.
[16] A. Jaiswal and R. Malhotra, “Software reliability prediction using
machine learning techniques,” International Journal of System
Assurance Engineering and Management, vol. 9, pp. 230–244, 2018.
[17] C. J. Clemente, F. Jaafar and Y. Malik, “Is predicting software security
bugs using deep learning better than the traditional machine learning
algorithms?,” IEEE International Conference on Software Quality,
Reliability and Security (QRS), pp. 95–102, 2018.
[18] O. Dahiya and K. Solanki, “An Efficient APHT Technique for
Requirement-Based Test Case Prioritization,” International Journal of
Engineering Trends and Technology, vol. 69, pp. 215–227, 2021.
[19] O. Dahiya and K. Solanki, “Prevailing Standards in Requirement-Based
Test Case Prioritization: An Overview,” ICT Analysis and Applications,
pp. 467–474, 2021.
[20] O. Dahiya and K. Solanki, “A Study on Identification of Issues and
Challenges Encountered in Software Testing,” in Proceedings of
International Conference on Communication and Artificial Intelligence,
pp. 549–556, 2021.
[21] O. Dahiya, K. Solanki, and A. Dhankhar, “Risk-based testing:
identifying, assessing, mitigating & managing risks efficiently in
software testing,” International Journal of Advanced Research in
Engineering and Technology, vol. 11, pp. 192–203, 2020.
[22] O. Dahiya and K. Solanki, “A systematic literature study of regression
test case prioritization approaches,” International Journal of Engineering
& Technology, vol. 7, pp. 2184–2191, 2018.
[23] S. Yadav and B. Kishan, “Reliability of Component-Based Systems - A
Review,” International Journal of Advanced Trends in Computer
Science and Engineering, vol. 8, pp. 293–299, 2019.
[24] S. Yadav and B. Kishan, “Assessment of software quality models to
measure the effectiveness of software quality parameters for Component
Based Software (CBS),” Journal of Applied Science and Computations,
vol. 6, pp. 2751–2756, 2019.
[25] S. Yadav and B. Kishan, “Analysis and Assessment of Existing
Software Quality Models to Predict the Reliability of Component-Based
Software,” International Journal of Emerging Trends in Engineering
Research, vol. 8, pp. 2824–2840, 2020.
[26] S. Yadav and B. Kishan, “Component-Based Software System using
Computational Intelligence Technique for Reliability Prediction,”
International Journal of Advanced Trends in Computer Science and
Engineering, vol. 9, pp. 3708–3721, 2020.
[27] S. Yadav and B. Kishan, “Assessments of Computational Intelligence
Techniques for Predicting Reliability of Component Based Software
Parameter and Design Issues,” International Journal of Advanced
Research in Engineering and Technology, vol. 11, pp. 565–584, 2020.
[28] O. Al Qasem and M. Akour, “Software fault prediction using deep
learning algorithms,” International Journal of Open Source Software and
Processes (IJOSSP), vol. 10, pp. 1–19, 2019.
[29] Wikipedia contributors, “Activation function,” [Online]. Available:
https://en.wikipedia.org/w/index.php?title=Activation_function&oldid=1076034609. [Accessed 2022].
[30] Dishashree26, “Activation Functions | Fundamentals Of Deep
Learning,” January 2020. [Online]. Available:
https://www.analyticsvidhya.com/blog/2020/01/fundamentals-deeplearning-activation-functions-when-to-use-them/. [Accessed December
2021].
[31] Wikipedia contributors, “Cross entropy,” 2022. [Online]. Available:
https://en.wikipedia.org/w/index.php?title=Cross_entropy&oldid=1071450106. [Accessed 2022].
[32] M. Sunasra, “Performance Metrics for Classification problems in
Machine Learning,” March 2019. [Online]. Available:
https://medium.com/@MohammedS/performance-metrics-forclassification-problems-in-machine-learning-part-i-b085d432082b.
[Accessed January 2022].
[33] B. S. Deshpande, B. Kumar and A. Kumar, “Assessment of Software
Reliability by Object Oriented Metrics using Machine Learning
Techniques,” International Journal of Grid and Distributed Computing,
vol. 14, pp. 01–10, 2021.
[34] T. Wang, W. Li, H. Shi and Z. Liu, “Software defect prediction based on
classifiers ensemble,” Journal of Information & Computational Science,
vol. 8, pp. 4241–4254, 2011.
[35] K. N. Rao and C. S. Reddy, “A novel under sampling strategy for
efficient software defect analysis of skewed distributed data,” Evolving
Systems, vol. 11, pp. 119–131, 2020.
[36] K. Wang, L. Liu, C. Yuan and Z. Wang, “Software defect prediction
model based on LASSO-SVM,” Neural Computing and Applications,
pp. 8249–8259, 2021.
[37] C. Shan, H. Zhu, C. Hu, J. Cui and J. Xue, “Software defect prediction
model based on improved LLE-SVM,” in 2015 4th International
Conference on Computer Science and Network Technology (ICCSNT),
vol. 1, 2015, pp. 530–535.
[38] L. Qiao, X. Li, Q. Umer and P. Guo, “Deep learning based software
defect prediction,” Neurocomputing, vol. 385, pp. 100–110, 2020.