A Review of Fuzzy and Pattern-Based Approaches for Class Imbalance Problems
Abstract
:1. Introduction
2. General Background
2.1. Class Imbalance Problem
- Safe: Data located in the homogeneous regions from one class only (majority or minority).
- Borderline: Data located in nearby decision boundaries between classes. In this scenario, the classifiers need to decide the class of the objects when they are in the decision boundary, which, due to the bias, result in favor of the majority class.
- Rare: Data located inside the majority class is often seen as overlapping. The classifier tends to classify the minority class as part of the majority class. The effect of this has been discussed in different works [32].
- Outliers: Data located far away from the sample space. The minority objects could be treated as noise by the classifier; on the other hand, noise could be treated as minority objects [33]. This happens when there are outlier objects in the database and the data should not be removed because it could be a representation of a minority class.
2.2. Approaches to Deal with Class Imbalance Problems
- Data level: The objective of this approach is to create a balanced training dataset by preprocessing the data through artificial manipulation. There are three solutions to data sampling: over-sampling, under-sampling, and hybrid-sampling.
- (a)
- (b)
- Under-sampling: Objects are removed from the majority class. The goal is to have the same number of objects in each class. The basic solution of this method is random under-sampling. The disadvantage of using this method is that it can exclude a significant amount of the original data.
- (c)
- Hybrid-sampling: A combination of over-sampling and under-sampling. This approach generates objects for the minority class while it eliminates objects from the majority class.
- Algorithm level: The aim of this type of approach is a specific modification of the classifier. This approach is not flexible for different classification problems because it focuses on a specific classifier with a specific type of database. Nevertheless, the results could lead to good classification results for a particular problem. This type of solution can also combine strengths of different solutions as the NeuroFuzzy Model [37], which combines a fuzzy system trained as a neural network [31].
- Cost-sensitive: The objective is to create a cost matrix that is built with different misclassification costs. The misclassified objects of the minority class have a higher misclassification cost than the misclassified objects of the majority class. One of the main disadvantages is the cost-sensitive problem, which appears because the cost of misclassification is different for each of the classes. Therefore, this type of problem cannot be compared against non-cost-sensitive problems [38].
2.3. Pattern-Based Classifiers
- Filtering stage: At this stage, there are set-based filters and quality measures that need to distinguish between patterns that have a high discriminative ability for supervised classification [41]. The quality is usually established by measuring reliability, novelty, coverage, conciseness, peculiarity, diversity, utility, and actionability [42]. All the previous measurements take into account two parameters: if the pattern covers an object and if the object is representative of the class determined by the pattern.
- Classification stage: The last stage is the classification of query objects. At this stage, the classifier combines the patterns and creates a voting scheme. Finally, it is necessary to evaluate the performance of the classifier to determine its quality.
2.4. Fuzzy Logic
3. Pattern and Fuzzy Approaches for Imbalance Problems
3.1. Research Methodology
3.2. Data- Level Approaches
3.3. Algorithm Level
3.4. Discussion
4. Applications Domains
5. Taxonomy
6. Future Directions
7. Conclusions
- Advantages:
- -
- Fuzzy and pattern-based approaches attract interest from the research community.
- -
- Fuzzy logic is widely used for its flexibility and understandability of the results.
- -
- Medicine is an area where the imbalance problem is constantly presented and uses the newest techniques.
- -
- Techniques that include fuzzy approaches have shown better classification results in comparison to other classifiers based on non-fuzzy approaches.
- -
- Fuzzy pattern-based approaches are a promising solution to handle the imbalanced data problem. However, this type of classifier should be studied further.
- Disadvantages:
- -
- Despite the flexibility of fuzzy approaches, they can lead to repetitive solutions that are small variations of other ones.
- -
- The quality of fuzzy patterns is highly dependent on the quality of the features of the fuzzification process.
- -
- Fuzzy emerging patterns are highly dependent on the quality measure Growth Rate, which could not provide good patterns as stated in [142] .
- -
- The combination of fuzzy and pattern-based approaches has not been studied in detail, so some research can lead to a dead end.
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- An, B.; Suh, Y. Identifying financial statement fraud with decision rules obtained from Modified Random Forest. Data Technol. Appl. 2020, 54, 235–255. [Google Scholar] [CrossRef]
- De Bock, K.W.; Coussement, K.; Lessmann, S. Cost-sensitive business failure prediction when misclassification costs are uncertain: A heterogeneous ensemble selection approach. Eur. J. Oper. Res. 2020, 285, 612–630. [Google Scholar] [CrossRef]
- Kim, T.; Ahn, H. A hybrid under-sampling approach for better bankruptcy prediction. J. Intell. Inf. Syst. 2015, 21, 173–190. [Google Scholar]
- Zhou, L. Performance of corporate bankruptcy prediction models on imbalanced dataset: The effect of sampling methods. Knowl. Based Syst. 2013, 41, 16–25. [Google Scholar] [CrossRef]
- Mazurowski, M.A.; Habas, P.A.; Zurada, J.M.; Lo, J.Y.; Baker, J.A.; Tourassi, G.D. Training neural network classifiers for medical decision making: The effects of imbalanced datasets on classification performance. Neural Netw. 2008, 21, 427–436. [Google Scholar] [CrossRef] [Green Version]
- Goyal, D.; Choudhary, A.; Pabla, B.; Dhami, S. Support vector machines based non-contact fault diagnosis system for bearings. J. Intell. Manuf. 2020, 31, 1275–1289. [Google Scholar] [CrossRef]
- Zhu, Z.B.; Song, Z.H. Fault diagnosis based on imbalance modified kernel Fisher discriminant analysis. Chem. Eng. Res. Des. 2010, 88, 936–951. [Google Scholar] [CrossRef]
- Fawcett, T.; Provost, F. Adaptive fraud detection. Data Min. Knowl. Discov. 1997, 1, 291–316. [Google Scholar] [CrossRef]
- Minastireanu, E.A.; Gabriela, M. Methods of Handling Unbalanced Datasets in Credit Card Fraud Detection. BRAIN. Broad Res. Artif. Intell. Neurosci. 2020, 11, 131–143. [Google Scholar] [CrossRef]
- Gao, X.; Chen, Z.; Tang, S.; Zhang, Y.; Li, J. Adaptive weighted imbalance learning with application to abnormal activity recognition. Neurocomputing 2016, 173, 1927–1935. [Google Scholar] [CrossRef]
- Koziarski, M.; Kwolek, B.; Cyganek, B. Convolutional neural network-based classification of histopathological images affected by data imbalance. In Video Analytics. Face and Facial Expression Recognition; Springer: Berlin/Heidelberg, Germany, 2018; Volume 11264, pp. 1–11. [Google Scholar]
- Yu, H.; Ni, J.; Dan, Y.; Xu, S. Mining and integrating reliable decision rules for imbalanced cancer gene expression data sets. Tsinghua Sci. Technol. 2012, 17, 666–673. [Google Scholar] [CrossRef]
- Olszewski, D. A probabilistic approach to fraud detection in telecommunications. Knowl. Based Syst. 2012, 26, 246–258. [Google Scholar] [CrossRef]
- Chen, L.; Dong, G. Using Emerging Patterns in Outlier and Rare-Class Prediction. In Contrast Data Mining: Concepts, Algorithms, and Applications; CRC Press: Boca Raton, FL, USA, 2013; pp. 171–186. [Google Scholar]
- Loyola-González, O.; Medina-Pérez, M.A.; Martínez-Trinidad, J.F.; Carrasco-Ochoa, J.A.; Monroy, R.; García-Borroto, M. PBC4cip: A new contrast pattern-based classifier for class imbalance problems. Knowl. Based Syst. 2017, 115, 100–109. [Google Scholar] [CrossRef]
- Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297. [Google Scholar] [CrossRef]
- Quinlan, J.R. Bagging, boosting, and C4.5. In Proceedings of the Conference on Artificial Intelligence, Portland, OR, USA, 4–8 August 1996; Volume 1, pp. 725–730. [Google Scholar]
- Aha, D.W.; Kibler, D.; Albert, M.K. Instance-based learning algorithms. Mach. Learn. 1991, 6, 37–66. [Google Scholar] [CrossRef] [Green Version]
- López, V.; Fernández, A.; García, S.; Palade, V.; Herrera, F. An insight into classification with imbalanced data: Empirical results and current trends on using data intrinsic characteristics. Inf. Sci. 2013, 250, 113–141. [Google Scholar] [CrossRef]
- García-Borroto, M.; Martínez-Trinidad, J.F.; Carrasco-Ochoa, J.A. Fuzzy emerging patterns for classifying hard domains. Knowl. Inf. Syst. 2011, 28, 473–489. [Google Scholar] [CrossRef]
- Liu, J. Fuzzy support vector machine for imbalanced data with borderline noise. Fuzzy Sets Syst. 2020. [Google Scholar] [CrossRef]
- Ambika, M.; Raghuraman, G.; SaiRamesh, L. Enhanced decision support system to predict and prevent hypertension using computational intelligence techniques. Soft Comput. 2020, 24, 13293–13304. [Google Scholar] [CrossRef]
- Loyola-González, O. Black-box vs. white-box: Understanding their advantages and weaknesses from a practical point of view. IEEE Access 2019, 7, 154096–154113. [Google Scholar] [CrossRef]
- Loyola-González, O.; Gutierrez-Rodríguez, A.E.; Medina-Pérez, M.A.; Monroy, R.; Martínez-Trinidad, J.F.; Carrasco-Ochoa, J.A.; García-Borroto, M. An Explainable Artificial Intelligence Model for Clustering Numerical Databases. IEEE Access 2020, 8, 52370–52384. [Google Scholar] [CrossRef]
- García-Borroto, M.; Martínez-Trinidad, J.F.; Carrasco-Ochoa, J.A.; Medina-Pérez, M.A.; Ruiz-Shulcloper, J. LCMine: An efficient algorithm for mining discriminative regularities and its application in supervised classification. Pattern Recognit. 2010, 43, 3025–3034. [Google Scholar] [CrossRef]
- Zhang, X.; Dong, G. Overview and analysis of contrast pattern based classification. In Contrast Data Mining: Concepts, Algorithms, and Applications; Chapman and Hall/CRC: Boca Raton, FL, USA, 2016; Volume 11, pp. 151–170. [Google Scholar]
- Liu, C.; Cao, L.; Philip, S.Y. Coupled fuzzy k-nearest neighbors classification of imbalanced non-IID categorical data. In Proceedings of the 2014 International Joint Conference on Neural Networks, Beijing, China, 6–11 July 2014; pp. 1122–1129. [Google Scholar]
- Dong, G.; Bailey, J. Contrast Data Mining: Concepts, Algorithms, and Applications; CRC Press: Boca Raton, FL, USA, 2012. [Google Scholar]
- Duan, L.; García-Borroto, M.; Dong, G. More Expressive Contrast Patterns and Their Mining. In Contrast Data Mining: Concepts, Algorithms, and Applications; CRC Press: Boca Raton, FL, USA, 2013; pp. 89–108. [Google Scholar]
- Napierala, K.; Stefanowski, J. Types of minority class examples and their influence on learning classifiers from imbalanced data. J. Intell. Inf. Syst. 2016, 46, 563–597. [Google Scholar] [CrossRef]
- Lin, W.C.; Tsai, C.F.; Hu, Y.H.; Jhang, J.S. Clustering-based undersampling in class-imbalanced data. Inf. Sci. 2017, 409–410, 17–26. [Google Scholar] [CrossRef]
- Denil, M.; Trappenberg, T. Overlap versus imbalance. In Proceedings of the Canadian Conference on Artificial Intelligence, Ottawa, ON, Canada, 31 May–2 June 2010; Springer: Berlin/Heidelberg, Germany, 2010; pp. 220–231. [Google Scholar]
- Beyan, C.; Fisher, R. Classifying imbalanced data sets using similarity based hierarchical decomposition. Pattern Recognit. 2015, 48, 1653–1672. [Google Scholar] [CrossRef] [Green Version]
- Haixiang, G.; Yijing, L.; Shang, J.; Mingyun, G.; Yuanyue, H.; Bing, G. Learning from class-imbalanced data: Review of methods and applications. Expert Syst. Appl. 2017, 73, 220–239. [Google Scholar] [CrossRef]
- Chawla, N.; Bowyer, K.; Hall, L.; Kegelmeyer, P. SMOTE: Synthetic minority over-sampling technique. J. Artif. Intell. Res. 2002, 16, 321–357. [Google Scholar] [CrossRef]
- Wang, K.J.; Makond, B.; Chen, K.H.; Wang, K.M. A hybrid classifier combining SMOTE with PSO to estimate 5-year survivability of breast cancer patients. Appl. Soft Comput. 2014, 20, 15–24. [Google Scholar] [CrossRef]
- Gao, M.; Hong, X.; Harris, C.J. Construction of neurofuzzy models for imbalanced data classification. IEEE Trans. Fuzzy Syst. 2013, 22, 1472–1488. [Google Scholar] [CrossRef]
- Kim, J.; Choi, K.; Kim, G.; Suh, Y. Classification cost: An empirical comparison among traditional classifier, Cost-Sensitive Classifier, and MetaCost. Expert Syst. Appl. 2012, 39, 4013–4019. [Google Scholar] [CrossRef]
- Dong, G.; Li, J.; Wong, L. The use of emerging patterns in the analysis of gene expression profiles for the diagnosis and understanding of diseases. New Gener. Data Min. Appl. 2005, 331–354. [Google Scholar] [CrossRef] [Green Version]
- Han, J.; Cheng, H.; Xin, D.; Yan, X. Frequent pattern mining: Current status and future directions. Data Min. Knowl. Discov. 2007, 15, 55–86. [Google Scholar] [CrossRef] [Green Version]
- Gonzalez, O.L. Supervised Classifiers Based on Emerging Patterns for Class Imbalance Problems. Ph.D. Thesis, Coordinación de Ciencias Computacionales, National, Puebla, Mexico, 2017. [Google Scholar]
- García-Vico, Á.M.; González, P.; Carmona, C.J.; del Jesus, M.J. A Big Data Approach for the Extraction of Fuzzy Emerging Patterns. Cogn. Comput. 2019, 11, 400–417. [Google Scholar] [CrossRef]
- Nguyen, H.T.; Walker, C.L.; Walker, E.A. A First Course in Fuzzy Logic; CRC Press: Boca Raton, FL, USA, 2018. [Google Scholar]
- Ross, T.J. Fuzzy Logic with Engineering Applications; Wiley Online Library: Hoboken, NJ, USA, 2004; Volume 2. [Google Scholar]
- Zadeh, L.A. Fuzzy sets. Inf. Control 1965, 8, 338–353. [Google Scholar] [CrossRef] [Green Version]
- Lior, R.; Oded, M. Data Mining with Decision Trees: Theory and Applications; World Scientific: Singapore, 2014; Volume 81. [Google Scholar]
- Zimmermann, H.J. Fuzzy Set Theory and Its Applications; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2011. [Google Scholar]
- Gramann, K.D.M. Fuzzy classification: An overview. Fuzzy-Syst. Comput. Sci. 1994, 277–294. [Google Scholar] [CrossRef]
- Orazbayev, B.; Ospanov, E.; Orazbayeva, K.; Kurmangazieva, L. A hybrid method for the development of mathematical models of a chemical engineering system in ambiguous conditions. Math. Model. Comput. Simulations 2018, 10, 748–758. [Google Scholar] [CrossRef]
- Werro, N. Fuzzy Classification of Online Customers; Springer: Berlin/Heidelberg, Germany, 2015. [Google Scholar]
- Liu, S.; Zhang, J.; Xiang, Y.; Zhou, W. Fuzzy-based information decomposition for incomplete and imbalanced data learning. IEEE Trans. Fuzzy Syst. 2017, 25, 1476–1490. [Google Scholar] [CrossRef]
- Quinlan, J.R. C4.5: Programs for Machine Learning; Elsevier: Amsterdam, The Netherlands, 2014. [Google Scholar]
- Shirabad, J.S.; Menzies, T.J. The PROMISE repository of software engineering databases. Sch. Inf. Technol. Eng. Univ. 2005, 24. [Google Scholar]
- Dua, D.; Graff, C. UCI Machine Learning Repository; UCI: Irvine, CA, USA, 2017. [Google Scholar]
- Alcalá-Fdez, J.; Fernández, A.; Luengo, J.; Derrac, J.; García, S.; Sánchez, L.; Herrera, F. Keel data-mining software tool: Data set repository, integration of algorithms and experimental analysis framework. J. Mult. Valued Log. Soft Comput. 2011, 17, 255–287. [Google Scholar]
- Bekkar, M.; Djemaa, H.K.; Alitouche, T.A. Evaluation measures for models assessment over imbalanced data sets. J. Inf. Eng. Appl. 2013, 3, 27–39. [Google Scholar]
- Zhu, X.; Zhang, S.; Jin, Z.; Zhang, Z.; Xu, Z. Missing value estimation for mixed-attribute data sets. IEEE Trans. Knowl. Data Eng. 2010, 23, 110–121. [Google Scholar] [CrossRef]
- Pan, R.; Yang, T.; Cao, J.; Lu, K.; Zhang, Z. Missing data imputation by K nearest neighbours based on grey relational structure and mutual information. Appl. Intell. 2015, 43, 614–632. [Google Scholar] [CrossRef]
- Folguera, L.; Zupan, J.; Cicerone, D.; Magallanes, J.F. Self-organizing maps for imputation of missing data in incomplete data matrices. Chemom. Intell. Lab. Syst. 2015, 143, 146–151. [Google Scholar] [CrossRef]
- Jo, T.; Japkowicz, N. Class imbalances versus small disjuncts. ACM Sigkdd Explor. Newsl. 2004, 6, 40–49. [Google Scholar] [CrossRef]
- Rahman, M.M.; Davis, D. Cluster based under-sampling for unbalanced cardiovascular data. In Proceedings of the World Congress on Engineering, San Francisco, CA, USA, 23–25 October 2013; Volume 3, pp. 3–5. [Google Scholar]
- He, H.; Garcia, E.A. Learning from imbalanced data. IEEE Trans. Knowl. Data Eng. 2009, 21, 1263–1284. [Google Scholar]
- Liu, G.; Yang, Y.; Li, B. Fuzzy rule-based oversampling technique for imbalanced and incomplete data learning. Knowl. Based Syst. 2018, 158, 154–174. [Google Scholar] [CrossRef]
- Barua, S.; Islam, M.M.; Yao, X.; Murase, K. MWMOTE–majority weighted minority oversampling technique for imbalanced data set learning. IEEE Trans. Knowl. Data Eng. 2012, 26, 405–425. [Google Scholar] [CrossRef]
- Zhang, H.; Li, M. RWO-Sampling: A random walk over-sampling approach to imbalanced data classification. Inf. Fusion 2014, 20, 99–116. [Google Scholar] [CrossRef]
- Das, B.; Krishnan, N.C.; Cook, D.J. RACOG and wRACOG: Two probabilistic oversampling techniques. IEEE Trans. Knowl. Data Eng. 2014, 27, 222–234. [Google Scholar] [CrossRef] [Green Version]
- Ksieniewicz, P. Standard Decision Boundary in a Support-Domain of Fuzzy Classifier Prediction for the Task of Imbalanced Data Classification. In Proceedings of the International Conference on Computational Science, Amsterdam, The Netherlands, 3–5 June 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 103–116. [Google Scholar]
- Kuncheva, L.; Bezdek, J.C.; Sutton, M.A. On combining multiple classifiers by fuzzy templates. In Proceedings of the 1998 Conference of the North American Fuzzy Information Processing Society-NAFIPS (Cat. No. 98TH8353), Pensacola Beach, FL, USA, 20–21 August 1998; pp. 193–197. [Google Scholar]
- Ren, R.; Yang, Y.; Sun, L. Oversampling technique based on fuzzy representativeness difference for classifying imbalanced data. Appl. Intell. 2020, 50, 2465–2487. [Google Scholar] [CrossRef]
- Mahalanobis, P.C. On the Generalized Distance in Statistics; National Institute of Science of India: Bengaluru, India, 1936. [Google Scholar]
- Tang, B.; He, H. GIR-based ensemble sampling approaches for imbalanced learning. Pattern Recognit. 2017, 71, 306–319. [Google Scholar] [CrossRef]
- Kaur, P.; Gosain, A. Robust hybrid data-level sampling approach to handle imbalanced data during classification. Soft Comput. 2020, 24, 15715–15732. [Google Scholar] [CrossRef]
- Kaur, P.; Gosain, A. FF-SMOTE: A metaheuristic approach to combat class imbalance in binary classification. Appl. Artif. Intell. 2019, 33, 420–439. [Google Scholar] [CrossRef]
- Tang, S.; Chen, S.P. The generation mechanism of synthetic minority class examples. In Proceedings of the 2008 International Conference on Information Technology and Applications in Biomedicine, Shenzhen, China, 30–31 May 2008; pp. 444–447. [Google Scholar]
- Feng, L.; Qiu, M.H.; Wang, Y.X.; Xiang, Q.L.; Yang, Y.F.; Liu, K. A fast divisive clustering algorithm using an improved discrete particle swarm optimizer. Pattern Recognit. Lett. 2010, 31, 1216–1225. [Google Scholar] [CrossRef]
- Han, H.; Wang, W.Y.; Mao, B.H. Borderline-SMOTE: A new over-sampling method in imbalanced data sets learning. In Proceedings of the International Conference on Intelligent Computing, Hefei, China, 23–26 August 2005; Springer: Berlin/Heidelberg, Germany, 2005; pp. 878–887. [Google Scholar]
- Batista, G.E.; Prati, R.C.; Monard, M.C. A study of the behavior of several methods for balancing machine learning training data. ACM SIGKDD Explor. Newsl. 2004, 6, 20–29. [Google Scholar] [CrossRef]
- Bunkhumpornpat, C.; Sinapiromsaran, K.; Lursinsap, C. Safe-level-smote: Safe-level-synthetic minority over-sampling technique for handling the class imbalanced problem. In Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining, Bangkok, Thailand, 27–30 April 2009; Springer: Berlin/Heidelberg, Germany, 2009; pp. 475–482. [Google Scholar]
- Stefanowski, J.; Wilk, S. Selective pre-processing of imbalanced data for improving classification performance. In Proceedings of the International Conference on Data Warehousing and Knowledge Discovery, Turin, Italy, 1–5 September 2008; Springer: Berlin/Heidelberg, Germany, 2008; pp. 283–292. [Google Scholar]
- Hart, P. The condensed nearest neighbor rule (Corresp.). IEEE Trans. Inf. Theory 1968, 14, 515–516. [Google Scholar] [CrossRef]
- Tomek, I. Two modifications of CNN. IEEE Trans. Syst. Man, Cybern. Syst. 1976, 6, 769–772. [Google Scholar]
- Yoon, K.; Kwek, S. An unsupervised learning approach to resolving the data imbalanced issue in supervised learning problems in functional genomics. In Proceedings of the Fifth International Conference on Hybrid Intelligent Systems (HIS’05), Rio de Janeiro, Brazil, 6–9 November 2005; pp. 6–11. [Google Scholar]
- Laurikkala, J. Improving identification of difficult small classes by balancing class distribution. In Proceedings of the Conference on Artificial Intelligence in Medicine in Europe, Cascais, Portugal, 1–5 July 2001; Springer: Berlin/Heidelberg, Germany, 2001; pp. 63–66. [Google Scholar]
- Kubat, M.; Matwin, S. Addressing the curse of imbalanced training sets: One-sided selection. In Proceedings of the Fourteenth International Conference on Machine Learning, Nashville, TN, USA, 8–12 July 1997; Citeseer: Pittsburgh, PA, USA, 1997; Volume 97, pp. 179–186. [Google Scholar]
- Seiffert, C.; Khoshgoftaar, T.M.; Van Hulse, J.; Napolitano, A. RUSBoost: A hybrid approach to alleviating class imbalance. IEEE Trans. Syst. Man Cybern. Part A Syst. Hum. 2009, 40, 185–197. [Google Scholar] [CrossRef]
- Yen, S.J.; Lee, Y.S. Cluster-based under-sampling approaches for imbalanced data distributions. Expert Syst. Appl. 2009, 36, 5718–5727. [Google Scholar] [CrossRef]
- Ramentol, E.; Caballero, Y.; Bello, R.; Herrera, F. SMOTE-RS B*: A hybrid preprocessing approach based on oversampling and undersampling for high imbalanced data-sets using SMOTE and rough sets theory. Knowl. Inf. Syst. 2012, 33, 245–265. [Google Scholar] [CrossRef]
- Kuncheva, L.I. Combining Pattern Classifiers: Methods and Algorithms; John Wiley & Sons: Hoboken, NJ, USA, 2014. [Google Scholar]
- Ho, T.K. The random subspace method for constructing decision forests. IEEE Trans. Pattern Anal. Mach. Intell. 1998, 20, 832–844. [Google Scholar]
- Fan, H.; Ramamohanarao, K. Fast discovery and the generalization of strong jumping emerging patterns for building compact and accurate classifiers. IEEE Trans. Knowl. Data Eng. 2006, 18, 721–737. [Google Scholar] [CrossRef]
- Buscema, M.; Consonni, V.; Ballabio, D.; Mauri, A.; Massini, G.; Breda, M.; Todeschini, R. K-CM: A new artificial neural network. Application to supervised pattern recognition. Chemom. Intell. Lab. Syst. 2014, 138, 110–119. [Google Scholar] [CrossRef]
- Buscema, M.; Grossi, E. The semantic connectivity map: An adapting self-organising knowledge discovery method in data bases. Experience in gastro-oesophageal reflux disease. Int. J. Data Min. Bioinform. 2008, 2, 362–404. [Google Scholar] [CrossRef] [PubMed]
- Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef] [Green Version]
- Ståhle, L.; Wold, S. Partial least squares analysis with cross-validation for the two-class problem: A Monte Carlo study. J. Chemom. 1987, 1, 185–196. [Google Scholar] [CrossRef]
- Collobert, R.; Bengio, S. Links between perceptrons, MLPs and SVMs. In Proceedings of the Twenty-First International Conference on Machine Learning, Banff, AB, Canada, 4–8 July 2004; p. 23. [Google Scholar]
- Breiman, L.; Friedman, J.; Stone, C.J.; Olshen, R.A. Classification and Regression Trees; CRC Press: Boca Raton, FL, USA, 1984. [Google Scholar]
- Hosmer, D.W.; Lemeshow, S. Applied Logistic Regression; Wiley: New York, NY, USA, 2000. [Google Scholar]
- McLachlan, G.J. Discriminant Analysis and Statistical Pattern Recognition; John Wiley & Sons: Hoboken, NJ, USA, 2004; Volume 544. [Google Scholar]
- Platt, J.C. Fast training of support vector machines using sequential minimal optimization. In Advances in Kernel Methods-Support Vector Learning; Schoelkopf, B., Burges, C., Smola, A., Eds.; MIT Press: Cambridge, MA, USA, 1998. [Google Scholar]
- Fernández, A.; Herrera, F. Evolutionary Fuzzy Systems: A Case Study in Imbalanced Classification. In Fuzzy Logic and Information Fusion; Springer: Berlin/Heidelberg, Germany, 2016; pp. 169–200. [Google Scholar]
- LóPez, V.; FernáNdez, A.; Del Jesus, M.J.; Herrera, F. A hierarchical genetic fuzzy system based on genetic programming for addressing classification with highly imbalanced and borderline data-sets. Knowl. Based Syst. 2013, 38, 85–104. [Google Scholar] [CrossRef]
- Fan, Q.; Wang, Z.; Li, D.; Gao, D.; Zha, H. Entropy-based fuzzy support vector machine for imbalanced datasets. Knowl. Based Syst. 2017, 115, 87–99. [Google Scholar] [CrossRef]
- Hsu, C.W.; Lin, C.J. A comparison of methods for multiclass support vector machines. IEEE Trans. Neural Netw. 2002, 13, 415–425. [Google Scholar]
- Platt, J.C. Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. Adv. Large Margin Classif. 1999, 10, 61–74. [Google Scholar]
- Liu, X.Y.; Wu, J.; Zhou, Z.H. Exploratory undersampling for class-imbalance learning. IEEE Trans. Syst. Man Cybern. Part B (Cybern.) 2008, 39, 539–550. [Google Scholar]
- Galar, M.; Fernandez, A.; Barrenechea, E.; Bustince, H.; Herrera, F. A review on ensembles for the class imbalance problem: Bagging-, boosting-, and hybrid-based approaches. IEEE Trans. Syst. Man. Cybern. Part C Appl. Rev. 2011, 42, 463–484. [Google Scholar] [CrossRef]
- Zhang, X.; Li, Y.; Kotagiri, R.; Wu, L.; Tari, Z.; Cheriet, M. KRNN: K Rare-class Nearest Neighbour classification. Pattern Recognit. 2017, 62, 33–44. [Google Scholar] [CrossRef]
- Zhu, C.; Wang, Z. Entropy-based matrix learning machine for imbalanced data sets. Pattern Recognit. Lett. 2017, 88, 72–80. [Google Scholar] [CrossRef]
- Chen, S.; Wang, Z.; Tian, Y. Matrix-pattern-oriented Ho–Kashyap classifier with regularization learning. Pattern Recognit. 2007, 40, 1533–1543. [Google Scholar] [CrossRef]
- He, H.; Bai, Y.; Garcia, E.A.; Li, S. ADASYN: Adaptive synthetic sampling approach for imbalanced learning. In Proceedings of the 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence), Hong Kong, China, 1–8 June 2008; pp. 1322–1328. [Google Scholar]
- Barua, S.; Islam, M.M.; Murase, K. A novel synthetic minority oversampling technique for imbalanced data set learning. In Proceedings of the International Conference on Neural Information Processing, Shanghai, China, 13–17 November 2011; Springer: Berlin/Heidelberg, Germany, 2011; pp. 735–744. [Google Scholar]
- Viola, P.; Jones, M. Rapid object detection using a boosted cascade of simple features. In Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2001, Kauai, HI, USA, 8–14 December 2001; Volume 1, pp. 511–518. [Google Scholar]
- Wang, Y.; Wang, S.; Lai, K.K. A new fuzzy support vector machine to evaluate credit risk. IEEE Trans. Fuzzy Syst. 2005, 13, 820–831. [Google Scholar] [CrossRef]
- Batuwita, R.; Palade, V. FSVM-CIL: Fuzzy support vector machines for class imbalance learning. IEEE Trans. Fuzzy Syst. 2010, 18, 558–571. [Google Scholar] [CrossRef]
- Pruengkarn, R.; Wong, K.W.; Fung, C.C. Imbalanced data classification using complementary fuzzy support vector machine techniques and smote. In Proceedings of the 2017 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Banff, AB, Canada, 5–8 October 2017; pp. 978–983. [Google Scholar]
- Tu, J.V. Advantages and disadvantages of using artificial neural networks versus logistic regression for predicting medical outcomes. J. Clin. Epidemiol. 1996, 49, 1225–1231. [Google Scholar] [CrossRef]
- Lee, H.K.; Kim, S.B. An overlap-sensitive margin classifier for imbalanced and overlapping data. Expert Syst. Appl. 2018, 98, 72–83. [Google Scholar] [CrossRef]
- Akbani, R.; Kwek, S.; Japkowicz, N. Applying support vector machines to imbalanced datasets. In Proceedings of the European Conference on Machine Learning, Pisa, Italy, 20–24 September 2004; Springer: Berlin/Heidelberg, Germany, 2004; pp. 39–50. [Google Scholar]
- García, V.; Mollineda, R.A.; Sánchez, J.S. On the k-NN performance in a challenging scenario of imbalance and overlapping. Pattern Anal. Appl. 2008, 11, 269–280. [Google Scholar] [CrossRef]
- Wang, B.X.; Japkowicz, N. Boosting support vector machines for imbalanced data sets. Knowl. Inf. Syst. 2010, 25, 1–20. [Google Scholar] [CrossRef] [Green Version]
- Gupta, D.; Richhariya, B. Entropy based fuzzy least squares twin support vector machine for class imbalance learning. Appl. Intell. 2018, 48, 4212–4231. [Google Scholar] [CrossRef]
- Shao, Y.H.; Chen, W.J.; Zhang, J.J.; Wang, Z.; Deng, N.Y. An efficient weighted Lagrangian twin support vector machine for imbalanced data classification. Pattern Recognit. 2014, 47, 3158–3167. [Google Scholar] [CrossRef]
- Chen, S.G.; Wu, X.J. A new fuzzy twin support vector machine for pattern classification. Int. J. Mach. Learn. Cybern. 2018, 9, 1553–1564. [Google Scholar] [CrossRef]
- Arafat, M.Y.; Hoque, S.; Xu, S.; Farid, D.M. An Under-Sampling Method with Support Vectors in Multi-class Imbalanced Data Classification. In Proceedings of the 2019 13th International Conference on Software, Knowledge, Information Management and Applications, Island of Ulkulhas, Maldives, 26–28 August 2019; pp. 1–6. [Google Scholar]
- Cristianini, N.; Shawe-Taylor, J. An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods; Cambridge University Press: Cambridge, UK, 2000. [Google Scholar]
- Farid, D.M.; Zhang, L.; Rahman, C.M.; Hossain, M.A.; Strachan, R. Hybrid decision tree and naïve Bayes classifiers for multi-class classification tasks. Expert Syst. Appl. 2014, 41, 1937–1946. [Google Scholar] [CrossRef]
- Arafat, M.Y.; Hoque, S.; Farid, D.M. Cluster-based under-sampling with random forest for multi-class imbalanced classification. In Proceedings of the 2017 11th International Conference on Software, Knowledge, Information Management and Applications, Malabe, Sri Lanka, 6–8 December 2017; pp. 1–6. [Google Scholar]
- Schapire, R.E. Explaining adaboost. In Empirical Inference; Springer: Berlin/Heidelberg, Germany, 2013; pp. 37–52. [Google Scholar]
- Liu, R.; Wang, F.; He, M.; Jiao, L. An adjustable fuzzy classification algorithm using an improved multi-objective genetic strategy based on decomposition for imbalance dataset. Knowl. Inf. Syst. 2019, 61, 1583–1605. [Google Scholar] [CrossRef]
- Ducange, P.; Lazzerini, B.; Marcelloni, F. Multi-objective genetic fuzzy classifiers for imbalanced and cost-sensitive datasets. Soft Comput. 2010, 14, 713–728. [Google Scholar] [CrossRef]
- Xu, L.; Chow, M.Y.; Taylor, L.S. Power distribution fault cause identification with imbalanced data using the data mining-based fuzzy classification E-algorithm. IEEE Trans. Power Syst. 2007, 22, 164–171. [Google Scholar] [CrossRef] [Green Version]
- Cho, P.; Lee, M.; Chang, W. Instance-based entropy fuzzy support vector machine for imbalanced data. Pattern Anal. Appl. 2020, 23, 1183–1202. [Google Scholar] [CrossRef] [Green Version]
- Freund, Y.; Schapire, R.E. Experiments with a new boosting algorithm. In Proceedings of the Thirteenth International Conference on Machine Learning, Bari, Italy, 3–6 July 1996; Volume 96, pp. 148–156. [Google Scholar]
- Zong, W.; Huang, G.B.; Chen, Y. Weighted extreme learning machine for imbalance learning. Neurocomputing 2013, 101, 229–242. [Google Scholar] [CrossRef]
- Sakr, N.A.; Abu-ElKheir, M.; Atwan, A.; Soliman, H. A multilabel classification approach for complex human activities using a combination of emerging patterns and fuzzy sets. Int. J. Electr. Comput. Eng. 2019, 9, 2993–3001. [Google Scholar] [CrossRef]
- Modayil, J.; Bai, T.; Kautz, H. Improving the recognition of interleaved activities. In Proceedings of the 10th International Conference on Ubiquitous Computing, Seoul, Korea, 21–24 September 2008; pp. 40–43. [Google Scholar]
- Patel, H.; Thakur, G. An improved fuzzy k-nearest neighbor algorithm for imbalanced data using adaptive approach. IETE J. Res. 2019, 65, 780–789. [Google Scholar] [CrossRef]
- Tan, S. Neighbor-weighted k-nearest neighbor for unbalanced text corpus. Expert Syst. Appl. 2005, 28, 667–671. [Google Scholar] [CrossRef] [Green Version]
- Patel, H.; Thakur, G. A hybrid weighted nearest neighbor approach to mine imbalanced data. In Proceedings of the International Conference on Data Mining (DMIN), The Steering Committee of The World Congress in Computer Science, Las Vegas, NV, USA, 25–28 July 2016; p. 106. [Google Scholar]
- Patel, H.; Thakur, G.S. Classification of imbalanced data using a modified fuzzy-neighbor weighted approach. Int. J. Intell. Eng. Syst. 2017, 10, 56–64. [Google Scholar]
- García-Vico, Á.M.; González, P.; Carmona, C.J.; del Jesus, M.J. Study on the use of different quality measures within a multi-objective evolutionary algorithm approach for emerging pattern mining in big data environments. Big Data Anal. 2019, 4, 1–15. [Google Scholar] [CrossRef] [Green Version]
- García-Borroto, M.; Loyola-González, O.; Martínez-Trinidad, J.F.; Carrasco-Ochoa, J.A. Evaluation of quality measures for contrast patterns by using unseen objects. Expert Syst. Appl. 2017, 83, 104–113. [Google Scholar] [CrossRef]
- García-Vico, A.M.; González, P.; del Jesus, M.J.; Carmona, C.J. A first approach to handle fuzzy emerging patterns mining on big data problems: The EvAEFP-spark algorithm. In Proceedings of the 2017 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), Naples, Italy, 9–12 July 2017; pp. 1–6. [Google Scholar] [CrossRef]
- Luna, J.M.; Carmona, C.J.; García-Vico, A.M.; del Jesus, M.J.; Ventura, S. Subgroup Discovery on Multiple Instance Data. Int. J. Comput. Intell. Syst. 2019, 12, 1602–1612. [Google Scholar] [CrossRef] [Green Version]
- Atzmueller, M.; Puppe, F. SD-Map-A fast algorithm for exhaustive subgroup discovery. In Proceedings of the European Conference on Principles of Data Mining and Knowledge Discovery, Freiburg, Germany, 3–5 September 2006; Springer: Berlin/Heidelberg, Germany, 2006; pp. 6–17. [Google Scholar]
- Luna, J.M.; Romero, J.R.; Romero, C.; Ventura, S. On the Use of Genetic Programming for Mining Comprehensible Rules in Subgroup Discovery. IEEE Trans. Cybern. 2014, 44, 2329–2341. [Google Scholar] [CrossRef]
- Carmona, C.J.; González, P.; del Jesus, M.J.; Herrera, F. NMEEF-SD: Non-dominated Multiobjective Evolutionary Algorithm for Extracting Fuzzy Rules in Subgroup Discovery. IEEE Trans. Fuzzy Syst. 2010, 18, 958–970. [Google Scholar] [CrossRef] [Green Version]
- Garcıa-Vicoa, A.M.; Chartea, F.; Gonzáleza, P.; Elizondob, D.; Carmonaa, C.J. E2PAMEA: A fast evolutionary algorithm for extracting fuzzy emerging patterns in big data environments. Neurocomputing 2020, 415, 60–73. [Google Scholar] [CrossRef]
- Zhang, Y.; Zhou, Z.H. Cost-sensitive face recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2009, 32, 1758–1769. [Google Scholar] [CrossRef]
- Tang, Y.; Zhang, Y.Q.; Chawla, N.V.; Krasser, S. SVMs modeling for highly imbalanced classification. IEEE Trans. Syst. Man. Cybern. Part B (Cybern.) 2008, 39, 281–288. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Gu, X.; Angelov, P.P.; Soares, E.A. A self-adaptive synthetic over-sampling technique for imbalanced classification. Int. J. Intell. Syst. 2020, 35, 923–943. [Google Scholar] [CrossRef]
- Gu, X.; Angelov, P.; Rong, H.J. Local optimality of self-organising neuro-fuzzy inference systems. Inf. Sci. 2019, 503, 351–380. [Google Scholar] [CrossRef]
- Cunningham, P.; Delany, S.J. k-Nearest Neighbour Classifiers: 2nd Edition (with Python examples). arXiv 2020, arXiv:2004.04523. [Google Scholar]
- Garcia-Vico, A.M.; Carmona, C.J.; Gonzalez, P.; del Jesus, M.J. A Preliminary Many Objective Approach for Extracting Fuzzy Emerging Patterns. In Proceedings of the 15th International Conference on Soft Computing Models in Industrial and Environmental Applications (SOCO 2020), Burgos, Spain, 16–18 September 2020; Herrero, Á., Cambra, C., Urda, D., Sedano, J., Quintián, H., Corchado, E., Eds.; Springer International Publishing: Cham, Switzerland, 2021; pp. 100–110. [Google Scholar]
- Deb, K.; Jain, H. An Evolutionary Many-Objective Optimization Algorithm Using Reference-Point-Based Nondominated Sorting Approach, Part I: Solving Problems With Box Constraints. IEEE Trans. Evol. Comput. 2014, 18, 577–601. [Google Scholar] [CrossRef]
- Deb, K.; Sundar, J. Reference point based multi-objective optimization using evolutionary algorithms. In Proceedings of the 8th Annual Conference on Genetic and Evolutionary Computation, Seattle, WA, USA, 8–11 July 2006; pp. 635–642. [Google Scholar]
- Schaefer, G. Strategies for imbalanced pattern classification for digital pathology. In Proceedings of the 2017 6th International Conference on Informatics, Electronics and Vision & 2017 7th International Symposium in Computational Medical and Health Technology, Himeji, Japan, 1–3 September 2017; pp. 1–4. [Google Scholar]
- Jaafar, H.; Ramli, N.H.; Nasir, A.S.A. An Improvement to The k-Nearest Neighbor Classifier for ECG Database; IOP Conference Series: Materials Science and Engineering; IOP Publishing: Bristol, UK, 2018; Volume 318, pp. 1–10. [Google Scholar]
- Polat, K. Similarity-based attribute weighting methods via clustering algorithms in the classification of imbalanced medical datasets. Neural Comput. Appl. 2018, 30, 987–1013. [Google Scholar] [CrossRef]
- Cho, P.; Chang, W.; Song, J.W. Application of instance-based entropy fuzzy support vector machine in peer-to-peer lending investment decision. IEEE Access 2019, 7, 16925–16939. [Google Scholar] [CrossRef]
- Xia, Y.; Liu, C.; Liu, N. Cost-sensitive boosted tree for loan evaluation in peer-to-peer lending. Electron. Commer. Res. Appl. 2017, 24, 30–49. [Google Scholar] [CrossRef]
- Li, F.Q.; Wang, S.L.; Liu, G.S. A Bayesian Possibilistic C-Means clustering approach for cervical cancer screening. Inf. Sci. 2019, 501, 495–510. [Google Scholar] [CrossRef]
- Grzymala-Busse, J.W.; Hu, M. A comparison of several approaches to missing attribute values in data mining. In Proceedings of the International Conference on Rough Sets and Current Trends in Computing, Banff, AB, Canada, 16–19 October 2000; Springer: Berlin/Heidelber, Germany, 2000; pp. 378–385. [Google Scholar]
- Jassim, F.A. Image Denoising Using Interquartile Range Filter with Local Averaging. arXiv 2013, arXiv:1302.1007. [Google Scholar]
- Jain, D.; Singh, V. A two-phase hybrid approach using feature selection and Adaptive SVM for chronic disease classification. Int. J. Comput. Appl. 2019, 2, 1–13. [Google Scholar] [CrossRef]
- García-Vico, Á.M.; Carmona, C.J.; González, P.; Seker, H.; del Jesus, M.J. FEPDS: A Proposal for the Extraction of Fuzzy Emerging Patterns in Data Streams. IEEE Trans. Fuzzy Syst. 2020, 28, 3193–3203. [Google Scholar] [CrossRef]
Short Biography of Authors
Ismael Lin obtained his bachelor’s degree in Electrical Engineering from Tecnologico de Monterrey, and his master’s degree in Robotics from KTH Royal Institute of Technology. He is a Ph.D. student in Computer Science at Tecnologico de Monterrey, Campus Estado de Mexico, in the Machine Learning research group. His current research focuses on supervised learning, class imbalance problem, and pattern-based classification. | |
Octavio Loyola-González received his PhD degree in Computer Science from the National Institute for Astrophysics, Optics, and Electronics, Mexico, in 2017. He has won several awards from different institutions due to his research work on applied projects; consequently, he is a Member of the National System of Researchers in Mexico (Rank1). He worked as a distinguished professor and researcher at Tecnologico de Monterrey, Campus Puebla, for undergraduate and graduate programs of Computer Sciences. Currently, he is responsible for running Machine Learning & Artificial Intelligence practice inside Altair Management Consultants Corp., where he is involved in the development and implementation using analytics and data mining in the Altair Compass department. He has outstanding experience in the fields of big data & pattern recognition, cloud computing, IoT, and analytical tools to apply them in sectors where he has worked for as Banking & Insurance, Retail, Oil&Gas, Agriculture, Cybersecurity, Biotechnology, and Dactyloscopy. From these applied projects, Dr. Loyola-González has published several books and papers in well-known journals, and he has several ongoing patents as manager and researcher in Altair Compass. | |
Raúl Monroy obtained a Ph.D. degree in Artificial Intelligence from Edinburgh University, in 1998, under the supervision of Prof. Alan Bundy. He has been in Computing at Tecnologico de Monterrey, Campus Estado de México, since 1985. In 2010, he was promoted to (full) Professor in Computer Science. Since 1998, he is a member of the CONACYT-SNI National Research System, rank three. Together with his students and members of his group, Machine Learning Models (GIEE – MAC), Prof. Monroy studies the discovery and application of novel model machine learning models, which he often applies to cybersecurity problems. At Tecnologico de Monterrey, he is also Head of the graduate programme in computing, at region CDMX. | |
Miguel Angel Medina-Pérez received a Ph.D. in Computer Science from the National Institute of Astrophysics, Optics, and Electronics, Mexico, in 2014. He is currently a Research Professor with the Tecnologico de Monterrey, Campus Estado de Mexico, where he is also a member of the GIEE-ML (Machine Learning) Research Group. He has rank 1 in the Mexican Research System. His research interests include Pattern Recognition, Data Visualization, Explainable Artificial Intelligence, Fingerprint Recognition, and Palmprint Recognition. He has published tens of papers in referenced journals, such as “Information Fusion,” “IEEE Transactions on Affective Computing,” “Pattern Recognition,” “IEEE Transactions on Information Forensics and Security,” “Knowledge-Based Systems,” “Information Sciences,” and “Expert Systems with Applications.” He has extensive experience developing software to solve Pattern Recognition problems. A successful example is a fingerprint and palmprint recognition framework which has more than 1.3 million visits and 135 thousand downloads. |
Year | Ref. | Key Merit(s) | Disadvantage(s)/Improvements |
---|---|---|---|
2011 | [20] | -Proposed a fuzzy emerging pattern technique. -Proposed the FEPC classifier. | -Fuzzy emerging patterns are highly dependent on the quality measure Growth Rate. |
2014 | [91] | -Proposed K-Contractive Map (K-CM). | -Similar performance with a classical k-NN classifier. |
2016 | [100] | -A study in evolutionary fuzzy systems (EFSs) for imbalance problems. | -A taxonomy of the reviewed methods is missing. |
2016 | [102] | -Presented EFSVM. A fuzzy membership evaluation that assigns the membership value according to the class certainty. | -Unfortunately, they do not mention possible improvements in their work. |
2017 | [107] | -Proposed KRNN based on a KNN classifier. | -KRNN could be extended to multiple classes. |
2017 | [108] | -Proposed EMatMHKS. An algorithm with an entropy-based fuzzy membership and based in MatMHKS. | -An improvement in their function to measure the entropy-based fuzzy membership. |
2017 | [115] | -Proposed a combination of CMTFSVM and SMOTE. entropy-based fuzzy. | -A comparison with other popular classifiers can enrich the results. |
2018 | [117] | -Proposed a classifier based on OSM, and inspired on FSVM and k-NN. It geometrically separates the data to solve the imbalance problem. | -It uses a 1-NN algorithm for the hard-overlapping regions, which can result in a high generalization error. |
2019 | [121] | -Proposed two variants of EFSVM. One uses least squares and the other one uses twin SVM. | -The results could improve with the implementation of heuristics solutions in the method for parameter selection. |
2019 | [124] | -Proposed a method to generate balanced data with support vectors. | -The usage of real-world data can enrich the results of their method. |
2019 | [129] | -Proposed AFC-MOGD. An algorithm based on adjustable fuzzy classification with a multi-objective genetic strategy. | -The usage of real-world data can enrich the results of their method. |
2019 | [132] | -Proposed an instance-based EFSVM. | -The selection of different neighborhood sizes could improve the results due to the better knowledge of the distribution of the data. |
2019 | [135] | -Proposed a multilabel classification for complex activity recognition. | -A comparison against other fuzzy classifiers would enrich the results. |
2019 | [137] | -Proposed Fuzzy ADPTKNN. It combines ADPTKNN and fuzzy k-NN. | -The method could be extended to feature-based NN. |
2019 | [42] | -Proposed BD-EFEP. An algorithm for big data environments. | -Additional comparisons against big data environments can enrich the results. |
2019 | [141] | -Presents the effects of different quality measures in patterns, focused on Big Data Environments. | -New approaches for efficient extraction of patterns in big data environments are needed. |
2019 | [144] | -Proposed three approaches for mining subgroups in multiple instances problems. | -More tests are needed to improve their results and to determine the imbalance ratio in which the method is more suitable. |
2020 | [148] | -Proposed an adaptive version of NSGA-II. | -An optimization of the fuzzy sets could improve the results. |
2020 | [21] | -Proposed an extension of FSVM-CIL with a new distance measure and a new fuzzy function. | -A comparison against more fuzzy classifiers would enrich the results. |
2020 | [151] | -Proposed SASYNO. A self-adaptive synthetic over-sampling approach. | -The usage of more databases would enrich the results. |
2020 | [154] | -Presents a preliminary many objective algorithm for extracting Emerging Fuzzy Patterns. | -The usage of real-world databases would enrich the results. |
Application | Approach | Refs. | Advantage | Disadvantage |
---|---|---|---|---|
Theoretical | Data level | [51,63,67,69,72] | Data is artificially manipulated to deal with the imbalance problem. | The data created can lead to a bias in the classification process. |
Algorithm level | [20,21,42,91,100,102,107,108,115,117,121,124,129,132,135,137,141,144,148,151,154] | The results from the experiments tend to have positive results due to the fitting process of the problem. | Each solution solves the imbalance problem in their own scenario, which does not present a general solution for most cases. | |
Cost-sensitive | - | - | - | |
Medicine | Data level Algorithm level | [22] | The used methods have been deeply studied and have better results against other learning techniques. | They need to enhance how they handle classification errors. |
Algorithm level | [158] | The usage of Mahalanobis distance in combination with fuzzy kNN, improves the results of normal kNN. | The results are limited to the specific case. | |
Data level Algorithm level | [159] | The approach can be used in diseases, signal, and image classification | Optimization could be improved. | |
Algorithm level | [162] | The author mentioned that there are room for improvements, and they have promising results. | Accuracy and sensitivity are somehow low, 76% and 79% respectively. | |
Data level Cost-sensitive | [157] | Improved robustness and classification performance in contrast of a single lassifier. | The results are not known, but they will report them in future publications. | |
Financial | Algorithm level | [160] | Robust results for the P2P lending market. | The decision model could be considered too simple. |
Urban planning | Algorithm level | [166] | The model can adapt to abrupt changes and has great scalability. | It is unknown how it will perform outside the pilot city. |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations. |
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Lin, I.; Loyola-González, O.; Monroy, R.; Medina-Pérez, M.A. A Review of Fuzzy and Pattern-Based Approaches for Class Imbalance Problems. Appl. Sci. 2021, 11, 6310. https://doi.org/10.3390/app11146310
Lin I, Loyola-González O, Monroy R, Medina-Pérez MA. A Review of Fuzzy and Pattern-Based Approaches for Class Imbalance Problems. Applied Sciences. 2021; 11(14):6310. https://doi.org/10.3390/app11146310
Chicago/Turabian StyleLin, Ismael, Octavio Loyola-González, Raúl Monroy, and Miguel Angel Medina-Pérez. 2021. "A Review of Fuzzy and Pattern-Based Approaches for Class Imbalance Problems" Applied Sciences 11, no. 14: 6310. https://doi.org/10.3390/app11146310