new models using machine learning to improve accuracy in software effort estimation [3][15][16][17], using feature selection [18][19][20][21] or parameter optimization [22][23][24]. Some prediction techniques have been suggested, but none has proved consistently successful in predicting software development effort [11]. Estimation methods have long been available, yet they still fall short of estimating software accurately and stably, so research into machine learning that produces more reliable and accurate software development estimates is still needed in this field.
The problems identified in previous studies occur in the early stages of software project development, where the time and cost framework is often considered an obstacle and where cost estimates and duration estimates are significantly, inversely related. The first problem addressed in this study is therefore to estimate effort and duration as accurately as possible, because the effort and duration estimation model is proposed as a decision-making tool for software development, freeing it from errors that cause negative implications or failures in software projects. The second problem is that the prediction process must be based on historical information, so the models for effort estimation and duration estimation must be developed with data mining: noisy, irrelevant, and redundant data in the dataset greatly affect ML performance, and poor data quality caused by missing values and outliers leads to uncertainty and inconsistency. According to Huang et al. (2006), estimation is a complex problem with features such as nonlinear relationships, imprecise measurement of software metrics, and software processes that are inaccurate, uncertain, and changing rapidly, and no model has proven to be the perfect solution [35]. Estimation at an early stage can significantly improve the success of software projects if it yields precise and accurate effort and duration estimates, so a comprehensive approach is necessary, from the data preparation stage through implementation, to produce accurate and reliable software estimates.
In software effort estimation several types of algorithms can be applied, including the Genetic Algorithm [18], Support Vector Machine [25][26], Fuzzy logic [27], Support Vector Regression [18][28], Artificial Neural Network [29], and Adaptive Regression [30]. Approaches for comparing these models are often invalid and may make things worse: Kitchenham and Mendes [31] identified several theoretical problems with studies that compare different estimation models on several common datasets to find the best model, and no single classifier can deliver the best accuracy for all datasets [32].
In recent years, many optimization algorithms have been used to enhance and adjust effort estimator parameters. In general there are two categories of optimization methods: 1) mathematical methods, such as Dynamic Programming (DP) and Quasi-Newton (QN) [33]; and 2) metaheuristic algorithms [33], such as the Genetic Algorithm (GA) [18][28][34][35], Bee Colony Optimization (BCO) and Ant Colony Optimization (ACO) [36], the Satin Bowerbird Optimizer (SBO) [3], Particle Swarm Optimization (PSO) [37][38], optimization of the COCOMO model [39][40][41], and Cuckoo Search (CS) [42]. So many optimization algorithms have been proposed for software development because each optimization algorithm has different adaptation and performance capabilities.

Meta-heuristic algorithms can effectively solve non-linear optimization problems [43], and they can be implemented in various ways to solve an optimization problem [33][43]. Increasing effort prediction accuracy by exploring parameter settings is one function of meta-heuristics [44]; another is finding the best feature subsets, adopting a classifier to select features optimally in the wrapper model [24]. In addition, metaheuristic optimization can explore the full search space and find high-quality solutions in a reasonable period of time thanks to its global search capability [45]. Metaheuristic methods were designed to overcome precisely such problems [46], and they give better results than traditional and non-evolutionary methods in terms of the accuracy gained from feature selection.

To overcome these limitations and narrow the gap between the findings of recent research and their dissemination, this study applies machine learning algorithms to estimate the effort and duration of software development early in the project life cycle. A comprehensive approach is used to ensure usefulness, exceptional estimation accuracy, and resilience to noisy, irrelevant, and redundant data. According to the results obtained in the literature review, several studies focused only on tuning individual algorithms for the best performance and accuracy of Machine Learning (ML) models, such as the use or improvement of neural networks, case-based reasoning, support vector regression, decision trees, etc., applying statistical and machine learning algorithms to effort and duration estimation. The contribution of this study is to integrate data preprocessing, meta learning, feature selection, and parameter optimization, converting heterogeneous data into homogeneous data to improve the accuracy of effort estimation and duration estimation. The aim of the study is to overcome these limitations by developing an SEE model for the early stages of software project planning, using the meta-heuristic approach and machine learning for effort and duration estimation. In this study we use the ISBSG dataset, the most popular dataset with the most reliable data source.

2. RELATED WORK
Feature selection is the process of removing irrelevant and redundant features. In large datasets, feature selection copes with the large number of input features by searching the space of feature subsets: a search method is chosen to explore the space, and an evaluator assigns a value to each candidate feature subset [47]. The function of feature selection is to extract the relevant and most informative data, so that classification fits the feature set better [36]. Most classification problems can be improved by feature selection, but new approaches are still needed for choosing the sub-feature options that increase accuracy [48]. The main purpose of feature selection in learning is to find the features that yield high accuracy [26][49]. The performance of feature selection techniques is strongly influenced by the characteristics of the dataset, which affects both the accuracy and the time complexity of the various techniques.
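As a concrete illustration of the search-plus-evaluator scheme described above, the sketch below pairs a greedy forward search with a cross-validated regressor acting as the subset evaluator. It is a minimal wrapper-style example on synthetic data, assuming scikit-learn; it is not the implementation used in any of the studies cited above.

```python
# A minimal wrapper-style feature-subset search: a greedy forward search
# proposes candidate subsets and a wrapped regressor evaluates each one.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

def wrapper_forward_selection(X, y, estimator, cv=3):
    """Greedily add the feature that most improves the CV score."""
    n_features = X.shape[1]
    selected, best_score = [], -np.inf
    improved = True
    while improved:
        improved = False
        for f in range(n_features):
            if f in selected:
                continue
            candidate = selected + [f]
            # The evaluator: cross-validated error of the wrapped model.
            score = cross_val_score(estimator, X[:, candidate], y, cv=cv,
                                    scoring="neg_mean_absolute_error").mean()
            if score > best_score:
                best_score, best_candidate = score, candidate
                improved = True
        if improved:
            selected = best_candidate
    return selected, best_score

X, y = make_regression(n_samples=120, n_features=8, n_informative=3,
                       noise=10.0, random_state=1)
subset, score = wrapper_forward_selection(X, y, DecisionTreeRegressor(random_state=1))
print("selected features:", subset, "CV score (neg MAE):", round(score, 2))
```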
A prediction system is strongly influenced by the data collection used. Although many models have been proposed to solve this problem, few produce significantly and consistently accurate results, so there remains uncertainty about prediction techniques [50]. Attribute noise, incompleteness, and inconsistency in software measurement datasets lower machine-learning performance [51], and data quality decreases on heterogeneous and inconsistent datasets [52]. Irrelevant and inconsistent projects drag estimates down, which has been addressed by designing frameworks in which all projects are clustered [53], and datasets that are not normally distributed further complicate the development of an accurate method [54]. Feature selection is used to speed up data mining algorithms and to improve data quality by reducing the dimensionality of the feature space and removing redundant, irrelevant, and noisy data [55][56]. A collection of relevant dataset features can improve accuracy [56], and selecting features exposes the effect of irrelevant attributes [57]. The accuracy of a machine learning model is greatly influenced by the dataset used [58], so data preparation, by selecting, cleaning, reducing, transforming, and feature selection, is needed to build machine learning models [51][58][59]. Using efficient machine learning algorithms for feature selection as dimensionality reduction is an important task; however, the proliferation of feature selection techniques makes it difficult to choose the algorithm most suitable for an application, since different techniques select different features [60].
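The selecting, cleaning, reducing, and transforming steps named above might look as follows in practice. This is a minimal pandas sketch with hypothetical column names, not the ISBSG schema or the authors' pipeline.

```python
# A minimal data-preparation sketch: cleaning (missing values), outlier
# reduction, and transforming skewed effort data. Columns are hypothetical.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "size_fp": [120, 85, np.nan, 300, 95, 4000],    # functional size
    "team":    [4, 3, 5, np.nan, 4, 6],
    "effort":  [900, 600, 1100, 2500, 700, 90000],  # person-hours; last row an outlier
})

# Cleaning: impute missing values with the column median.
df = df.fillna(df.median(numeric_only=True))

# Reducing: drop rows whose effort lies outside 1.5 * IQR, a common outlier rule.
q1, q3 = df["effort"].quantile([0.25, 0.75])
iqr = q3 - q1
df = df[df["effort"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]

# Transforming: log-transform to reduce the skew typical of effort data.
df["log_effort"] = np.log1p(df["effort"])
print(df)
```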
Several ensemble methods have been proposed, such as bagging, boosting, and random sampling techniques [61], and stacking [62][63]. Using an ensemble method over different data collections achieves better accuracy than individual techniques [63]. Boosting [64][65] and bagging [22][65][66] are representative approaches that combine oversampling and undersampling preprocessing with ensemble classifiers, and integrating bagging with under-sampling is stronger than over-sampling [22]. Bagging provides a large advantage in accuracy, as shown by tests on real and simulated datasets using classification and regression trees and subset selection in linear regression [66]; it can also handle class imbalance and improve performance in noisy data environments [67]. The ensemble approach serves to average predictions over strong machine learning models, stabilize the model, reduce the influence of noise in the data, and dampen abnormal behavior of the algorithms, so it can be concluded that the ensemble methods with the best performance are bagging and stacking.
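As a minimal sketch of the two best-performing schemes named above, the snippet below fits a bagging ensemble and a stacking ensemble with scikit-learn; the base learners and data are illustrative assumptions, not the configurations used in the cited studies.

```python
# Bagging (bootstrap-resampled base learners) and stacking (a meta-learner
# combines heterogeneous base models), compared by cross-validated MAE.
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor, StackingRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=200, n_features=6, noise=15.0, random_state=0)

bagging = BaggingRegressor(estimator=DecisionTreeRegressor(),
                           n_estimators=50, random_state=0)
stacking = StackingRegressor(
    estimators=[("svr", SVR()), ("tree", DecisionTreeRegressor(random_state=0))],
    final_estimator=Ridge(),  # meta-learner trained on base-model predictions
)

for name, model in [("bagging", bagging), ("stacking", stacking)]:
    mae = -cross_val_score(model, X, y, cv=5,
                           scoring="neg_mean_absolute_error").mean()
    print(name, "CV MAE:", round(mae, 1))
```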
All this time, SEE techniques have shown considerable instability in producing precise estimates [50], and an ensemble learning approach has been applied to overcome this problem [68]: an ensemble predicts software project development effort by combining more than one SDEE technique. The accuracy of an ML model is analyzed using MMRE and MdMRE, where lower values, together with a higher Pred(25), indicate a more accurate estimate. Wen et al. (2012) showed that ANN and SVR were the most accurate (median MMRE around 35% and Pred(25) around 70%), followed by CBR, Decision Tree (DT), and Genetic Programming (median MMRE and Pred(25) around 50%), while Bayesian Networks (BN) had the worst accuracy (median MMRE around 100% and Pred(25) around 30%) [11]. In Idri et al. (2016), among ensemble effort estimation techniques SVR was the most accurate (median Pred(25) 50% and MMRE 48.6%), followed by ANN (median Pred(25) 40% and MMRE 49.9%), while Neuro-Fuzzy (NF) was the least accurate (median Pred(25) 31% and MMRE 79.5%) [69]. Based on the literature review, it can be concluded that ANN and SVR are the ML techniques with the best accuracy in predicting software development effort.

Genetic algorithms, meanwhile, can improve performance in ML and feature selection [18][28]. The parameters of the basic COCOMO model can be improved by applying a simple genetic algorithm [40], and combining the GA and SVM methods improves prediction accuracy by finding the best SVM regression parameters for the proposed model [70]. The Satin Bowerbird Optimizer (SBO), compared with five of the best-known recent algorithms (Ant Lion Optimization (ALO), Particle Swarm Optimization (PSO), the Firefly Algorithm (FA), GA, and Artificial Bee Colony (ABC)), shows the best performance, both on test functions and in statistical comparisons [3].
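The GA-plus-SVR combination reported in [70] can be approximated by an evolutionary search over SVR hyper-parameters. The loop below is a bare selection-and-mutation toy on synthetic data, assuming scikit-learn; it omits the chromosome encoding and crossover operators of the cited paper.

```python
# A toy evolutionary search over SVR hyper-parameters (C, epsilon, gamma),
# in the spirit of the GA + SVM combination discussed above.
import random
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVR

X, y = make_regression(n_samples=150, n_features=5, noise=10.0, random_state=2)
random.seed(2)

def fitness(ind):
    C, eps, gamma = ind
    model = SVR(C=C, epsilon=eps, gamma=gamma)
    return cross_val_score(model, X, y, cv=3,
                           scoring="neg_mean_absolute_error").mean()

def mutate(ind):
    # Multiply each gene by a small random factor, keeping it positive.
    return [g * random.uniform(0.5, 2.0) for g in ind]

population = [[random.uniform(0.1, 100), random.uniform(0.01, 1),
               random.uniform(1e-3, 1)] for _ in range(10)]
for _ in range(15):
    population.sort(key=fitness, reverse=True)
    survivors = population[:5]                                # selection
    population = survivors + [mutate(s) for s in survivors]   # offspring

best = max(population, key=fitness)
print("best (C, epsilon, gamma):", [round(g, 3) for g in best])
```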
3. LITERATURE

3.1 Satin Bowerbird Optimizer (SBO)

The satin bowerbird optimization algorithm simulates the life of the satin bowerbird [71]. During autumn and winter, bowerbirds leave their forest habitat and move to open forests to find food; in spring, they gather together and inhabit an area, because it is their mating season. During this season the males collect different materials, such as flowers, fruits, shiny objects, and branches, and perform dramatic movements to attract the females' attention; these attractions act as the variables of the algorithm. Male birds use natural instinct and imitate other males to build their bowers [71].

According to the satin bowerbird's life principle, the steps of the SBO algorithm are as follows:

1. A Set of Random Bower Generations:

In the position-update equation of SBO,

x_ik^new = x_ik^old + λ_k ((x_jk + x_elite,k) / 2 − x_ik^old)

x_i is the i-th bower (solution vector) and x_ik is the k-th member of this vector; x_j is the target solution in the current iteration, where the value of j is calculated based on probability; x_elite is the elite position, which is stored in each cycle of the algorithm; and the parameter λ_k determines the attraction power at the goal bower, i.e., the step size calculated for each variable.
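As a runnable sketch of this update rule, the loop below assumes the fitness-proportional choice of the target bower and the step size λ_k = α / (1 + p_j) used in [3]; the objective function, bounds, and constants are placeholders, and the mutation and elitism steps of the full algorithm are only noted in a comment.

```python
# A sketch of the SBO position update: each bower moves toward the midpoint
# of a probabilistically chosen bower x_j and the elite bower.
import numpy as np

rng = np.random.default_rng(0)

def sphere(x):                        # placeholder objective to minimize
    return float(np.sum(x ** 2))

n_bowers, n_vars, alpha = 20, 5, 0.94
bowers = rng.uniform(-5, 5, size=(n_bowers, n_vars))   # step 1: random bowers

for _ in range(50):
    cost = np.array([sphere(b) for b in bowers])
    # Fitness as in [3]: 1/(1 + f) for f >= 0, 1 + |f| otherwise.
    fit = np.where(cost >= 0, 1.0 / (1.0 + cost), 1.0 + np.abs(cost))
    prob = fit / fit.sum()            # selection probability of each bower
    elite = bowers[np.argmin(cost)].copy()
    for i in range(n_bowers):
        j = rng.choice(n_bowers, p=prob)   # target bower, chosen by probability
        for k in range(n_vars):
            lam = alpha / (1.0 + prob[j])  # step size, larger for weak targets
            bowers[i, k] += lam * ((bowers[j, k] + elite[k]) / 2.0 - bowers[i, k])
    # (the full algorithm also mutates some variables and keeps the best bowers)

print("best cost:", min(sphere(b) for b in bowers))
```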
For pedagogical reasons, consider the linear function presented in Equation 8:

f(x) = 〈w, x〉 + b with w ∈ ℝ^n, b ∈ ℝ (8)

where 〈·, ·〉 denotes the dot product in ℝ^n. For the nonlinear regression case, f(x) = 〈w, φ(x)〉 + b, where φ is a nonlinear function that maps the input space to a higher-dimensional feature space. In ε-SV regression, the weight vector w and the threshold b are selected to optimize the problem.
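Equation (8) can be exercised directly with an ε-SVR implementation; the sketch below uses scikit-learn's SVR (a library choice assumed here, not prescribed by the paper). The linear kernel exposes w and b explicitly, while an RBF kernel realizes the mapping φ implicitly.

```python
# Equation (8) as code: a linear epsilon-SVR learns f(x) = <w, x> + b directly,
# while an RBF kernel realizes f(x) = <w, phi(x)> + b without computing phi.
from sklearn.datasets import make_regression
from sklearn.svm import SVR

X, y = make_regression(n_samples=100, n_features=4, noise=5.0, random_state=3)

linear = SVR(kernel="linear", epsilon=0.1).fit(X, y)
print("w:", linear.coef_.ravel(), "b:", linear.intercept_)

nonlinear = SVR(kernel="rbf", epsilon=0.1).fit(X, y)  # phi induced by the kernel
print("first prediction:", nonlinear.predict(X[:1])[0])
```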
The ISBSG dataset is applied; normalization of the dependent variables, cross validation approaches, and the Mean of Absolute Residuals (MAR) together with the Median of Absolute Residuals (MdAR) are used as accuracy criteria [58]. Several methods have been proposed to evaluate the accuracy of predictions in software estimation. Besides measuring accuracy with various metrics, there are several popular accuracy-assessment protocols for software effort estimation, such as n-fold cross validation, holdout, and leave-one-out cross validation (LOOCV) [11][88].
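The three assessment protocols named above can be compared side by side; this sketch assumes scikit-learn and synthetic data rather than the ISBSG projects.

```python
# Holdout, n-fold cross validation, and leave-one-out cross validation (LOOCV).
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import (KFold, LeaveOneOut, cross_val_score,
                                     train_test_split)

X, y = make_regression(n_samples=60, n_features=4, noise=8.0, random_state=4)
model = LinearRegression()

# Holdout: a single train/test split.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=4)
print("holdout MAE:",
      mean_absolute_error(y_te, model.fit(X_tr, y_tr).predict(X_te)))

# n-fold cross validation (n = 10 here).
print("10-fold MAE:",
      -cross_val_score(model, X, y, cv=KFold(10, shuffle=True, random_state=4),
                       scoring="neg_mean_absolute_error").mean())

# LOOCV: each project is left out once, as recommended for effort data in [54].
print("LOOCV MAE:",
      -cross_val_score(model, X, y, cv=LeaveOneOut(),
                       scoring="neg_mean_absolute_error").mean())
```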
3.4 Model Evaluation

Three metrics are used to evaluate the performance of software effort estimation: the Magnitude of Relative Error (MRE), the Mean Magnitude of Relative Error (MMRE), and the percentage PRED, calculated as follows [89][90]:

MRE = |estimated − actual| / actual (10)

MMRE = (Σ MRE) / N (11)

PRED(X) = A / N (12)

where A is the number of projects with MRE ≤ X and N is the number of projects in the set. MRE must be less than 0.25 for a model to be accepted for software effort estimation, and a good model has a minimal MMRE and a maximal PRED(25) [90].
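Equations (10) to (12) translate directly into code; the following NumPy sketch uses made-up actual and estimated effort values.

```python
# Equations (10)-(12) as code: MRE per project, MMRE over the set, and
# PRED(25), the fraction of projects with MRE <= 0.25.
import numpy as np

def mre(actual, estimated):
    return np.abs(estimated - actual) / actual          # Eq. (10)

def mmre(actual, estimated):
    return mre(actual, estimated).mean()                # Eq. (11)

def pred(actual, estimated, x=0.25):
    return (mre(actual, estimated) <= x).mean()         # Eq. (12), A / N

actual    = np.array([100.0, 250.0, 400.0, 80.0])       # made-up person-hours
estimated = np.array([110.0, 300.0, 390.0, 60.0])
print("MMRE:", round(mmre(actual, estimated), 3))       # lower is better
print("PRED(25):", pred(actual, estimated))             # higher is better
```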
development effort prediction using the analytic
4. CONCLUSION
hierarchy process,” J. Syst. Softw., vol. 21, no. 2, pp.
179–186, 1993.
The large number of previous studies prioritizes accurate
accuracy, regardless of the estimation process that takes a 8. J. Ryder, “Fuzzy modeling of software effort
long time. then based on that problem, the development of prediction,” Inf. Technol. Conf. 1998. IEEE, pp. 53–56,
software projects will produce good accuracy if it produces a 1998.
fast, efficient and practical time. So the need to implement the 9. A. Heiat, “Comparison of artificial neural network
machine learning algorithm to measure effort and duration of and regression models for estimating software
estimation. the proposed framework will present a holistic
development effort,” Inf. Softw. Technol., vol. 44, no.
approach to building models in estimating efforts and duration
in the early stages of software development. the stages in this 15, pp. 911–922, 2002.
study process include: data preprocessing, feature selection, https://doi.org/10.1016/S0950-5849(02)00128-3
optimization parameters, meta learning and 4 (four) machine 10. S. G. Macdonell and A. R. Gray, “A Comparison of
learning algorithms using the ISBSG dataset. In addition, Modeling Techniques for Software Development
classification problems also involve a number of features, this Effort Prediction,” Springer-Verlag, pp. 869–872,
is because not all available features are equally important. 1997.
Good and accurate classification must require small features.
11. J. Wen, S. Li, Z. Lin, Y. Hu, and C. Huang, “Systematic
for the type of validation used to measure the accuracy of the
estimated overall model using n-fold cross validation. literature review of machine learning based software
Whereas to evaluate accuracy to estimate software development effort estimation models,” Inf. Softw.
engineering using; MMRE, and PRED (25). Technol., vol. 54, no. 1, pp. 41–59, 2012.
12. M. Hosni, A. Idri, A. Abran, and A. Bou, "On the value of parameter tuning in heterogeneous ensembles effort estimation," Soft Comput., 2017.
13. K. Dejaeger, W. Verbeke, D. Martens, and B. Baesens, "Data mining techniques for software effort estimation: A comparative study," IEEE Trans. Softw. Eng., vol. 38, no. 2, pp. 375–397, 2012.
14. N. Saini and B. Khalid, "Effectiveness of Feature Selection and Machine Learning Techniques for Software Effort Estimation," IOSR J. Comput. Eng., vol. 16, no. 1, pp. 34–38, 2014. https://doi.org/10.9790/0661-16193438
15. M. Azzeh, D. Neagu, and P. I. Cowling, "Analogy-based software effort estimation using Fuzzy numbers," J. Syst. Softw., vol. 84, no. 2, pp. 270–284, 2011.
16. S. Aljahdali and A. F. Sheta, "Software effort estimation by tuning COOCMO model parameters using differential evolution," ACS/IEEE Int. Conf. Comput. Syst. Appl. (AICCSA 2010), pp. 1–6, 2010.
17. V. Khatibi Bardsiri, D. N. A. Jawawi, S. Z. M. Hashim, and E. Khatibi, "Increasing the accuracy of software development effort estimation using projects clustering," IET Softw., vol. 6, no. 6, p. 461, 2012.
18. A. L. I. Oliveira, P. L. Braga, R. M. F. Lima, and M. L. Cornelio, "GA-based method for feature selection and parameters optimization for machine learning regression applied to software effort estimation," Inf. Softw. Technol., vol. 52, no. 11, pp. 1155–1166, 2010. https://doi.org/10.1016/j.infsof.2010.05.009
19. Q. Liu, J. Xiao, and H. Zhu, "Feature selection for software effort estimation with localized neighborhood mutual information," Cluster Comput., 2018.
20. J. Huang, Y. F. Li, J. W. Keung, Y. T. Yu, and W. K. Chan, "An empirical analysis of three-stage data-preprocessing for analogy-based software effort estimation on the ISBSG data," Proc. 2017 IEEE Int. Conf. Softw. Qual. Reliab. Secur. (QRS 2017), pp. 442–449, 2017. https://doi.org/10.1109/QRS.2017.54
21. P. Phannachitta, J. Keung, A. Monden, and K. Matsumoto, "A stability assessment of solution adaptation techniques for analogy-based software effort estimation," Empir. Softw. Eng., vol. 22, no. 1, pp. 474–504, 2017.
22. J. Błaszczyński and J. Stefanowski, "Neighbourhood sampling in bagging for imbalanced data," Neurocomputing, vol. 150, no. PB, pp. 529–542, 2014.
23. H. Velarde, C. Santiesteban, A. Garcia, and J. Casillas, "Software Development Effort Estimation based-on multiple classifier system and Lines of Code," IEEE Lat. Am. Trans., vol. 14, no. 8, pp. 3907–3913, 2016.
24. S. W. Lin, K. C. Ying, S. C. Chen, and Z. J. Lee, "Particle swarm optimization for parameter determination and feature selection of support vector machines," Expert Syst. Appl., vol. 35, no. 4, pp. 1817–1824, 2008.
25. M. Azzeh and A. B. Nassif, "A hybrid model for estimating software project effort from Use Case Points," Appl. Soft Comput. J., vol. 49, pp. 981–989, 2016.
26. İ. Babaoglu, O. Findik, and E. Ülker, "A comparison of feature selection models utilizing binary particle swarm optimization and genetic algorithm in determining coronary artery disease using support vector machine," Expert Syst. Appl., vol. 37, no. 4, pp. 3177–3183, 2010. https://doi.org/10.1016/j.eswa.2009.09.064
27. M. Azzeh, D. Neagu, and P. Cowling, "Improving Analogy Software Effort Estimation using Fuzzy Feature Subset Selection Algorithm," PROMISE, ACM, pp. 71–78, 2008.
28. P. L. Braga, A. L. I. Oliveira, and S. R. L. Meira, "A GA-based Feature Selection and Parameters Optimization for Support Vector Regression Applied to Software Effort Estimation," ACM, pp. 1788–1792, 2008.
29. S. Aljahdali, A. F. Sheta, and N. C. Debnath, "Estimating Software Effort and Function Point Using Regression, Support Vector Machine and Artificial Neural Networks Models," IEEE Access, 2015.
30. S. M. Satapathy, "Empirical Assessment of Machine Learning Models for Effort Estimation of Web-based Applications," ISEC '17, ACM, pp. 74–84, 2017.
31. B. Kitchenham and E. Mendes, "Why Comparative Effort Prediction Studies may be Invalid," Proc. 5th Int. Conf. Predictor Models in Software Engineering (PROMISE '09), 2009. https://doi.org/10.1145/1540438.1540444
32. Q. Song, Z. Jia, M. Shepperd, S. Ying, and J. Liu, "A general software defect-proneness prediction framework," IEEE Trans. Softw. Eng., vol. 37, no. 3, pp. 356–370, 2011.
33. A. Kaveh and V. R. Mahdavi, "Colliding bodies optimization: A novel meta-heuristic method," Comput. Struct., vol. 139, pp. 18–27, 2014.
34. C. L. Huang and C. J. Wang, "A GA-based feature selection and parameters optimization for support vector machines," Expert Syst. Appl., vol. 31, no. 2, pp. 231–240, 2006.
35. S. J. Huang and N. H. Chiu, "Optimization of analogy weights by genetic algorithm for software effort estimation," Inf. Softw. Technol., vol. 48, no. 11, pp. 1034–1045, 2006. https://doi.org/10.1016/j.infsof.2005.12.020
36. P. Shunmugapriya and S. Kanmani, "A hybrid algorithm using ant and bee colony optimization for feature selection and classification (AC-ABC Hybrid)," Swarm Evol. Comput., vol. 36, pp. 27–36, 2017.
37. J. Mercieca and S. G. Fabri, "A Metaheuristic Particle Swarm Optimization Approach to Nonlinear Model Predictive Control," vol. 5, no. 3, pp. 357–369, 2012.
38. V. Khatibi Bardsiri, D. N. A. Jawawi, S. Z. M. Hashim, and E. Khatibi, "A PSO-based model to increase the accuracy of software development effort estimation," Softw. Qual. J., vol. 21, no. 3, pp. 501–526, 2013.
39. P. Agrawal and S. Kumar, "Early phase software effort estimation model," 2016 Symp. Colossal Data Anal. Netw., pp. 1–8, 2016. https://doi.org/10.1109/CDAN.2016.7570914
40. R. K. Sachan et al., "Optimizing Basic COCOMO Model Using Simplified Genetic Algorithm," Procedia Comput. Sci., vol. 89, pp. 492–498, 2016.
41. O. Benediktsson, D. Dalcher, K. Reed, and M. Woodman, "COCOMO-Based Effort Estimation," Kluwer Acad. Publ., pp. 265–281, 2003.
42. E. E. Miandoab and F. S. Gharehchopogh, "A Novel Hybrid Algorithm for Software Cost Estimation Based on Cuckoo Optimization and K-Nearest Neighbors Algorithms," Eng. Technol. Appl. Sci. Res., vol. 6, no. 3, pp. 1018–1022, 2016.
43. R. Kishore and D. Gupta, "Software Effort Estimation using Satin Bowerbird Algorithm," vol. 5, no. 3, pp. 216–218, 2012.
44. A. Corazza, S. Di Martino, F. Ferrucci, C. Gravino, F. Sarro, and E. Mendes, "Using tabu search to configure support vector regression for effort estimation," Empir. Softw. Eng., vol. 18, no. 3, pp. 506–546, 2013.
45. S. C. Yusta, "Different metaheuristic strategies to solve the feature selection problem," Pattern Recognit. Lett., vol. 30, no. 5, pp. 525–534, 2009.
46. J. Reca, J. Martínez, C. Gil, and R. Baños, "Application of several meta-heuristic techniques to the optimization of real looped water distribution networks," Water Resour. Manag., vol. 22, no. 10, pp. 1367–1379, 2008.
47. D. Oreski, S. Oreski, and B. Klicek, "Effects of dataset characteristics on the performance of feature selection techniques," Appl. Soft Comput. J., vol. 52, pp. 109–119, 2017. https://doi.org/10.1016/j.asoc.2016.12.023
48. H. K. Bhuyan and N. K. Kamila, "Privacy preserving sub-feature selection in distributed data mining," Appl. Soft Comput. J., vol. 36, pp. 552–569, 2015.
49. M. Ramaswami and R. Bhaskaran, "A Study on Feature Selection Techniques in Educational Data Mining," J. Comput., vol. 1, no. 1, pp. 7–11, 2009.
50. M. J. Shepperd and G. Kadoda, "Comparing software prediction techniques using simulation," IEEE Trans. Softw. Eng., vol. 27, no. 11, pp. 1014–1022, 2001.
51. C. Catal, O. Alan, and K. Balkan, "Class noise detection based on software metrics and ROC curves," Inf. Sci., vol. 181, no. 21, pp. 4867–4877, 2011.
52. V. Khatibi Bardsiri and E. Khatibi, "Insightful analogy-based software development effort estimation through selective classification and localization," Innov. Syst. Softw. Eng., vol. 11, no. 1, pp. 25–38, 2015. https://doi.org/10.1007/s11334-014-0242-2
53. V. Resmi, S. Vijayalakshmi, and R. S. Chandrabose, "An effective software project effort estimation system using optimal firefly algorithm," Cluster Comput., 2017.
54. E. Kocaguneli and T. Menzies, "Software effort models should be assessed via leave-one-out validation," J. Syst. Softw., vol. 86, no. 7, pp. 1879–1890, 2013. https://doi.org/10.1016/j.jss.2013.02.053
55. J. Novaković, P. Strbac, and D. Bulatović, "Toward optimal feature selection using ranking methods and classification algorithms," Yugosl. J. Oper. Res., vol. 21, no. 1, pp. 119–135, 2011.
56. M. Hosni, A. Idri, and A. Abran, "Investigating Heterogeneous Ensembles with Filter Feature Selection for Software Effort Estimation," ACM, no. 2, 2017.
57. N. Acir, Ö. Özdamar, and C. Güzeliş, "Automatic classification of auditory brainstem responses using SVM-based feature selection algorithm for threshold detection," Eng. Appl. Artif. Intell., vol. 19, no. 2, pp. 209–218, 2006.
58. P. Pospieszny, B. Czarnacka-Chrobot, and A. Kobyliński, "An effective approach for software project effort and duration estimation with machine learning algorithms," J. Syst. Softw., 2017.
59. J. Huang, Y. F. Li, and M. Xie, "An empirical analysis of data preprocessing for machine learning-based software cost estimation," Inf. Softw. Technol., vol. 67, pp. 108–127, 2015. https://doi.org/10.1016/j.infsof.2015.07.004
60. N. Bidi and Z. Elberrichi, "Feature selection for text classification using genetic algorithms," Proc. 2016 8th Int. Conf. Model. Identif. Control (ICMIC 2016), pp. 806–810, 2017.
61. E. Kocaguneli, T. Menzies, and J. W. Keung, "On the value of ensemble effort estimation," IEEE Trans. Softw. Eng., vol. 38, no. 6, pp. 1403–1416, 2012.
62. D. H. Wolpert, "Stacked Generalization," Neural Networks, vol. 5, pp. 241–259, 1992.
63. T. Wang, W. Li, H. Shi, and Z. Liu, "Software Defect Prediction Based on Classifiers Ensemble," J. Inf. Comput. Sci., vol. 16, pp. 4241–4254, Dec. 2011.
64. R. E. Schapire, "The Strength of Weak Learnability (Extended Abstract)," Mach. Learn., vol. 227, pp. 28–33, Oct. 1989.
65. Y. Liu, E. Shriberg, A. Stolcke, and M. Harper, "Using machine learning to cope with imbalanced classes in natural speech: evidence from sentence boundary and disfluency detection," Interspeech, pp. 2–5, 2004.
66. L. Breiman, "Bagging predictors," Mach. Learn., vol. 24, no. 2, pp. 123–140, 1996.
67. T. M. Khoshgoftaar, J. Van Hulse, and A. Napolitano, "Comparing boosting and bagging techniques with noisy and imbalanced data," IEEE Trans. Syst. Man Cybern. Part A: Syst. Humans, vol. 41, no. 3, pp. 552–568, 2011.
68. M. Azzeh, A. B. Nassif, S. Banitaan, and F. Almasalha, "Pareto efficient multi-objective optimization for local tuning of analogy-based estimation," Neural Comput. Appl., vol. 27, no. 8, pp. 2241–2265, 2015. https://doi.org/10.1007/s00521-015-2004-y
69. A. Idri, M. Hosni, and A. Abran, "Systematic literature review of ensemble effort estimation," J. Syst. Softw., vol. 118, pp. 151–175, 2016.
70. J.-C. Lin, C.-T. Chang, and S.-Y. Huang, "Research on Software Effort Estimation Combined with Genetic Algorithm and Support Vector Regression," 2011 Int. Symp. Comput. Sci. Soc., pp. 349–352, 2011.
71. M. A. Mostafa, A. F. Abdou, and A. F. A. El-Gawad, "SBO-based selective harmonic elimination for nine levels asymmetrical cascaded H-bridge multilevel inverter," Aust. J. Electr. Electron. Eng., pp. 1–13, 2018.
72. A. Corazza et al., "Using tabu search to configure support vector regression for effort estimation," Empir. Softw. Eng., vol. 18, no. 3, pp. 506–546, 2013. https://doi.org/10.1007/s10664-011-9187-3
73. D. Cotroneo et al., "Prediction of the Testing Effort for the Safety Certification of Open-Source Software: A Case Study on a Real-Time Operating System," Proc. 2016 12th Eur. Dependable Computing Conf. (EDCC 2016), pp. 141–152, 2016. https://doi.org/10.1109/EDCC.2016.22
74. D. Déry and A. Abran, "Investigation of the Effort Data Consistency in the ISBSG Repository," ResearchGate, 2014.
75. C.-J. Hsu and C.-Y. Huang, "Comparison of weighted grey relational analysis for software effort estimation," Softw. Qual. J., vol. 19, no. 1, pp. 165–200, 2011.
76. M. Fernández-Diego, "Discretization Methods for NBC in Effort Estimation: An Empirical Comparison based on ISBSG Projects," pp. 103–106, 2012.
77. C. Lokan and E. Mendes, "Investigating the use of chronological split for software effort estimation," IET Softw., vol. 3, no. 5, p. 422, 2009.
78. V. Khatibi Bardsiri, D. N. A. Jawawi, A. Khatibi, and E. Khatibi, "LMES: A localized multi-estimator model to estimate software development effort," Eng. Appl. Artif. Intell., pp. 1–17, 2013.
79. R. Malhotra, "Software Effort Prediction using Statistical and Machine Learning Methods," vol. 2, no. 1, pp. 145–152, 2011.
80. P. Phannachitta, J. Keung, K. E. Bennin, A. Monden, and K. Matsumoto, "Filter-INC: Handling effort-inconsistency in software effort estimation datasets," Proc. Asia-Pacific Softw. Eng. Conf. (APSEC), pp. 185–192, 2017.
81. S. Mensah, J. Keung, M. F. Bosu, and K. E. Bennin, "Duplex output software effort estimation model with self-guided interpretation," Inf. Softw. Technol., vol. 94, pp. 1–13, 2018.
82. Y. S. Seo and D. H. Bae, "On the value of outlier elimination on software effort estimation research," vol. 18, no. 4, 2013.
83. T. Iliou, M. Nerantzaki, and G. Anastassopoulos, "A Novel Machine Learning Data Preprocessing Method for Enhancing Classification Algorithms Performance," pp. 1–5, 2015.
84. N. Cerpa, M. Bardeen, C. A. Astudillo, and J. Verner, "Evaluating different families of prediction methods for estimating software project outcomes," J. Syst. Softw., vol. 112, pp. 48–64, 2016.
85. Y. Arafat, S. Hoque, and D. Farid, "Cluster-based Under-sampling with Random Forest for Multi-Class Imbalanced Classification," Int. Conf. Software, Knowledge, Inf. Manag. Appl., pp. 1–6, 2017.
86. B. Sluban and N. Lavrač, "Relating ensemble diversity and performance: A study in class noise detection," Neurocomputing, vol. 160, pp. 120–131, 2015.
87. D. Zhu, "A hybrid approach for efficient ensembles," Decis. Support Syst., vol. 48, no. 3, pp. 480–487, 2010.