research-article

On the joint-effect of class imbalance and overlap: a critical review

Published: 01 December 2022

Abstract

Current research on imbalanced data recognises that class imbalance is aggravated by other data intrinsic characteristics, among which class overlap stands out as one of the most harmful. The combination of these two problems creates a new and difficult scenario for classification tasks and has been discussed in several research works over the past two decades. In this paper, we argue that although some insightful information can be derived from related research, the joint-effect of class overlap and imbalance is still not fully understood, and we advocate the need to move towards a unified view of the class overlap problem in imbalanced domains. To that end, we start by performing a thorough analysis of the existing literature on the joint-effect of class imbalance and overlap, elaborating on important details left undiscussed in the original papers, namely the impact of data domains with different characteristics and the behaviour of classifiers with distinct learning biases. This leads to the hypothesis that class overlap comprises multiple representations, which must be accurately measured and analysed in order to provide a full characterisation of the problem. Accordingly, we devise two novel taxonomies, one for class overlap measures and the other for class overlap-based approaches, both aligned with the distinct representations of class overlap identified. This paper therefore presents a global and unique view of the joint-effect of class imbalance and overlap, from precursor work to recent developments in the field. It meticulously discusses some concepts taken as implicit in previous research, explores new perspectives in light of the limitations found, and presents new ideas that will hopefully inspire researchers to move towards a unified view of the problem and the development of suitable strategies for imbalanced and overlapped domains.
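
To make the interplay between imbalance and overlap more concrete, the following is a minimal sketch, not the paper's method or taxonomy. It computes the imbalance ratio and a leave-one-out 1-nearest-neighbour error rate in the spirit of the N3 complexity measure of Ho and Basu, reported both globally and per class. The per-class decomposition, the function names (`imbalance_ratio`, `loo_1nn_error`) and the toy data are hypothetical choices made for illustration, and Euclidean distance is assumed.

```python
import math
from collections import Counter


def imbalance_ratio(labels):
    """Size of the largest class divided by the size of the smallest class."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def loo_1nn_error(points, labels):
    """Leave-one-out 1-NN error, globally and per class (an N3-style measure).

    Each point is classified by its single nearest neighbour among the
    remaining points (Euclidean distance); a label mismatch counts as an error.
    Returns (global_error, {class_label: error_rate_within_that_class}).
    """
    n = len(points)
    errors = Counter()
    for i in range(n):
        # Index of the nearest other point (first one found on ties).
        nearest = min(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )
        if labels[nearest] != labels[i]:
            errors[labels[i]] += 1
    counts = Counter(labels)
    per_class = {c: errors[c] / counts[c] for c in counts}
    return sum(errors.values()) / n, per_class


# Hypothetical toy data: 9 majority points (class 0) on a line and
# 3 minority points (class 1), one of which sits inside the majority region.
points = [(float(x), 0.0) for x in range(9)] + [(3.5, 1.0), (12.0, 0.0), (13.0, 0.0)]
labels = [0] * 9 + [1] * 3

ir = imbalance_ratio(labels)
global_err, per_class = loo_1nn_error(points, labels)
print(f"IR = {ir:.1f}")                                  # IR = 3.0
print(f"global error = {global_err:.3f}")                # global error = 0.083
print({c: round(e, 3) for c, e in per_class.items()})    # {0: 0.0, 1: 0.333}
```

In this toy case a single minority example lying among majority examples keeps the global error low, while one third of the minority class is misclassified. This illustrates why a single global figure can understate how strongly overlap affects the minority class.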

Information

Published In

Artificial Intelligence Review, Volume 55, Issue 8
Dec 2022
783 pages

Publisher

Kluwer Academic Publishers

United States

Author Tags

  1. Class imbalance
  2. Class overlap
  3. Data intrinsic characteristics
  4. Class overlap complexity measures
  5. Class overlap-based approaches
  6. Class overlap representations

Bibliometrics & Citations

Article Metrics

  • Downloads (last 12 months): 0
  • Downloads (last 6 weeks): 0
Reflects downloads up to 17 Oct 2024

Cited By

  • (2024) Data Complexity: A New Perspective for Analyzing the Difficulty of Defect Prediction Tasks. ACM Transactions on Software Engineering and Methodology 33:6 (1-45). DOI: 10.1145/3649596. Online publication date: 27-Jun-2024
  • (2024) Customer churn prediction using a novel meta-classifier: an investigation on transaction, Telecommunication and customer churn datasets. Journal of Combinatorial Optimization 48:1. DOI: 10.1007/s10878-024-01196-w. Online publication date: 3-Aug-2024
  • (2024) A cluster impurity-based hybrid resampling for imbalanced classification problems. Applied Intelligence 54:20 (9671-9684). DOI: 10.1007/s10489-024-05644-2. Online publication date: 1-Oct-2024
  • (2024) Noise-free sampling with majority framework for an imbalanced classification problem. Knowledge and Information Systems 66:7 (4011-4042). DOI: 10.1007/s10115-024-02079-6. Online publication date: 1-Jul-2024
  • (2023) Adaptive Fusion for Visual Question Answering: Integrating Multi-Label Classification and Similarity Matching. Proceedings of the 5th ACM International Conference on Multimedia in Asia (1-7). DOI: 10.1145/3595916.3626381. Online publication date: 6-Dec-2023
  • (2023) Class-overlap undersampling based on Schur decomposition for Class-imbalance problems. Expert Systems with Applications: An International Journal 221:C. DOI: 10.1016/j.eswa.2023.119735. Online publication date: 1-Jul-2023
  • (2023) A systematic review for class-imbalance in semi-supervised learning. Artificial Intelligence Review 56:Suppl 2 (2349-2382). DOI: 10.1007/s10462-023-10579-0. Online publication date: 1-Nov-2023
