
Random forest swarm optimization-based for heart diseases diagnosis

Published: 01 March 2021


Highlights

By combining multi-objective particle swarm optimization and Random Forest, a new approach is proposed to predict heart disease.
The main goal is to produce diverse and accurate classifiers and determine the (near) optimal number of classifiers.
The results indicate that the proposed algorithm outperforms the other techniques in accuracy, as confirmed by statistical tests.
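The highlights above rest on selecting a non-dominated set of solutions. As a rough illustration of how a Pareto front can determine the number of classifiers, the following sketch filters candidate (accuracy, diversity) pairs down to the non-dominated set; the candidate values and function name are hypothetical, not taken from the paper.

```python
def pareto_front(points):
    """Return the non-dominated points when both objectives are maximized."""
    front = []
    for p in points:
        dominated = any(
            q != p and q[0] >= p[0] and q[1] >= p[1]
            for q in points
        )
        if not dominated:
            front.append(p)
    return front

# Illustrative candidates only: each pair is (accuracy, diversity) of one
# candidate training set / tree; these numbers are not from the paper.
candidates = [(0.90, 0.10), (0.85, 0.30), (0.95, 0.05), (0.80, 0.20)]
front = pareto_front(candidates)

# The surviving solutions form the Pareto front; their count suggests the
# (near) optimal number of classifiers to keep in the ensemble.
print(sorted(front))  # → [(0.85, 0.3), (0.9, 0.1), (0.95, 0.05)]
```

Here (0.80, 0.20) is dominated by (0.85, 0.30) on both objectives and is discarded; the three remaining trade-off solutions each correspond to one training set.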

Abstract

Heart disease has been one of the leading causes of death worldwide in recent years. Among diagnostic methods for heart disease, angiography is one of the most common, but it is costly and has side effects. Given the difficulty of heart disease prediction, data mining can play an important role in predicting it accurately. In this paper, a new approach that combines multi-objective particle swarm optimization (MOPSO) with Random Forest is proposed to predict heart disease. The main goal is to produce diverse and accurate decision trees while simultaneously determining the (near) optimal number of them. Instead of the mechanisms commonly used in Random Forest, namely bootstrap sampling, random feature selection, and an arbitrarily chosen number of training sets, an evolutionary multi-objective approach generates distinct training sets, each with its own samples and features, for training each tree. Moreover, the solutions obtained on the Pareto-optimal fronts determine how many training sets are needed to build the forest. In this way, the Random Forest's performance, and consequently its prediction accuracy, can be improved. The proposed method's effectiveness is investigated by comparing its performance with individual and ensemble classifiers over six heart datasets. The results suggest that the proposed method with the (near) optimal number of classifiers outperforms the Random Forest algorithm with different numbers of classifiers.
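The core idea of training each tree on its own sample and feature subset can be sketched with a toy ensemble. This minimal sketch draws the subsets at random as a stand-in for the MOPSO-generated training sets described in the abstract; the dataset, the one-feature stump learner, and all names are illustrative assumptions, not the paper's implementation.

```python
import random

random.seed(0)

# Toy dataset: 60 rows, 4 features; the label is 1 iff feature 0 > 0.5,
# so only one feature is actually informative.
rows = [[random.random() for _ in range(4)] for _ in range(60)]
data = [(x, int(x[0] > 0.5)) for x in rows]

def train_stump(train_rows, feature):
    """Fit a one-feature threshold stump: predict 1 iff x[feature] > thr."""
    best_thr, best_correct = 0.5, -1
    for i in range(1, 10):
        thr = i / 10
        correct = sum(int(x[feature] > thr) == y for x, y in train_rows)
        if correct > best_correct:
            best_thr, best_correct = thr, correct
    return feature, best_thr

def ensemble_predict(stumps, x):
    """Majority vote over the stumps."""
    votes = sum(int(x[f] > thr) for f, thr in stumps)
    return int(2 * votes >= len(stumps))

# Each stump is trained on its own sample subset and single feature,
# standing in for the optimizer-generated diverse training sets.
stumps = [train_stump(random.sample(data, 30), random.randrange(4))
          for _ in range(7)]
accuracy = sum(ensemble_predict(stumps, x) == y for x, y in data) / len(data)
print(f"ensemble accuracy on training data: {accuracy:.2f}")
```

In the paper's method, the sampling step would be replaced by MOPSO searching for subsets that jointly maximize accuracy and diversity, and the ensemble size would come from the Pareto front rather than being fixed at seven.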



Published In

Journal of Biomedical Informatics, Volume 115, Issue C
Mar 2021
228 pages

Publisher

Elsevier Science

San Diego, CA, United States


Author Tags

  1. Data mining
  2. Ensemble learning
  3. Random forest
  4. Diversity
  5. Heart disease

Qualifiers

  • Research-article

Cited By

  • (2024) Optimized Extreme Learning Machine with Bacterial Colony Optimization Algorithm for Disease Diagnosis in Clinical Datasets, SN Computer Science 5(5). https://doi.org/10.1007/s42979-024-02864-8 (26 May 2024)
  • (2023) Early Prediction of Heart Disease via LSTM-XGBoost, Proceedings of the 2023 9th International Conference on Computing and Artificial Intelligence, pp. 631–637. https://doi.org/10.1145/3594315.3594383 (17 Mar 2023)
  • (2023) SWEP-RF, Journal of King Saud University - Computer and Information Sciences 35(8). https://doi.org/10.1016/j.jksuci.2023.101672 (1 Sep 2023)
  • (2023) A novel automated CNN arrhythmia classifier with memory-enhanced artificial hummingbird algorithm, Expert Systems with Applications 213(PC). https://doi.org/10.1016/j.eswa.2022.119162 (1 Mar 2023)
