A Survey of Predictive Modeling on Imbalanced Domains

Published: 13 August 2016
    Abstract

    Many real-world data-mining applications involve obtaining predictive models using datasets with strongly imbalanced distributions of the target variable. Frequently, the least-common values of this target variable are associated with events that are highly relevant for end users (e.g., fraud detection, unusual returns on stock markets, or anticipation of catastrophes). Moreover, the events may have different costs and benefits, which, when combined with the rarity of some of them in the available training data, creates serious problems for predictive modeling techniques. This article presents a survey of existing techniques for handling these important applications of predictive analytics. Although most of the existing work addresses classification tasks (nominal target variables), we also describe methods designed to handle similar problems within regression tasks (numeric target variables). In this survey, we discuss the main challenges raised by imbalanced domains, propose a definition of the problem, describe the main approaches to these tasks, propose a taxonomy of the methods, summarize the conclusions of existing comparative studies as well as theoretical analyses of some methods, and refer to some related problems within predictive modeling.

    References

    [1]
    Rehan Akbani, Stephen Kwek, and Nathalie Japkowicz. 2004. Applying support vector machines to imbalanced datasets. In Machine Learning: ECML 2004. Springer, 39--50.
    [2]
    Roberto Alejo, J. A. Antonio, Rosa Maria Valdovinos, and J. Horacio Pacheco-Sánchez. 2013. Assessments metrics for multi-class imbalance learning: A preliminary study. In Pattern Recognition. Springer, 335--343.
    [3]
    Roberto Alejo, Vicente García, and J. Horacio Pacheco-Sánchez. 2014. An efficient over-sampling approach based on mean square error back-propagation for dealing with the multi-class imbalance problem. Neur. Process. Lett. (2014), 1--15.
    [4]
    Roberto Alejo, Vicente García, José Martínez Sotoca, Ramón Alberto Mollineda, and José Salvador Sánchez. 2007. Improving the performance of the RBF neural networks trained with imbalanced samples. In Computational and Ambient Intelligence. Springer, 162--169.
    [5]
    Roberto Alejo, Rosa Maria Valdovinos, Vicente García, and J. Horacio Pacheco-Sanchez. 2013. A hybrid method to face class overlap and class imbalance on neural networks and multi-class scenarios. Pattern Recogn. Lett. 34, 4 (2013), 380--388.
    [6]
    Roberto Alejo Eleuterio, José Martínez Sotoca, Vicente García Jiménez, and Rosa María Valdovinos Rosas. 2011. Back propagation with balanced MSE cost function and nearest neighbor editing for handling class overlap and class imbalance. (2011).
    [7]
    Josh Attenberg and Seyda Ertekin. 2013. Class imbalance and active learning. In Imbalanced Learning: Foundations, Algorithms, and Applications, Haibo He and Yunqian Ma (Eds.). John Wiley & Sons.
    [8]
    Stefano Baccianella, Andrea Esuli, and Fabrizio Sebastiani. 2009. Evaluation measures for ordinal regression. In Ninth International Conference on Intelligent Systems Design and Applications, 2009. ISDA'09. IEEE, 283--287.
    [9]
    Gaurav Bansal, Atish P. Sinha, and Huimin Zhao. 2008. Tuning data mining methods for cost-sensitive regression: A study in loan charge-off forecasting. J. Manag. Inform. Syst. 25, 3 (2008), 315--336.
    [10]
    Ricardo Barandela, José Salvador Sánchez, Vicente Garcia, and Edgar Rangel. 2003. Strategies for learning in class imbalance problems. Pattern Recogn. 36, 3 (2003), 849--851.
    [11]
    Vincent Barnab-Lortie, Colin Bellinger, and Nathalie Japkowicz. 2015. Active learning for one-class classification. In Proceedings of ICMLA'2015.
    [12]
    Sukarna Barua, Monirul Islam, Xin Yao, and Kazuyuki Murase. 2012. MWMOTE-majority weighted minority oversampling technique for imbalanced data set learning. IEEE Transactions on Knowledge and Data Engineering (2012), 1.
    [13]
    Guilherme Batista, Danilo Silva, and Ronaldo Prati. 2012. An experimental design to evaluate class imbalance treatment methods. In 2012 11th International Conference on Machine Learning and Applications (ICMLA), Vol. 2. IEEE, 95--101.
    [14]
    Gustavo E. A. P. A. Batista, Ronaldo C. Prati, and Maria Carolina Monard. 2004. A study of the behavior of several methods for balancing machine learning training data. ACM SIGKDD Explor. Newslett. 6, 1 (2004), 20--29.
    [15]
    Rukshan Batuwita and Vasile Palade. 2009. A new performance measure for class imbalance learning. Application to bioinformatics problems. In International Conference on Machine Learning and Applications, 2009. ICMLA'09. IEEE, 545--550.
    [16]
    Rukshan Batuwita and Vasile Palade. 2010a. Efficient resampling methods for training support vector machines with imbalanced datasets. In The 2010 International Joint Conference on Neural Networks (IJCNN). IEEE, 1--8.
    [17]
    Rukshan Batuwita and Vasile Palade. 2010b. FSVM-CIL: Fuzzy support vector machines for class imbalance learning. IEEE Trans. Fuzzy Syst. 18, 3 (2010), 558--571.
    [18]
    Rukshan Batuwita and Vasile Palade. 2012. Adjusted geometric-mean: A novel performance measure for imbalanced bioinformatics datasets learning. J. Bioinform. Comput. Biol. 10, 4 (2012).
    [19]
    Colin Bellinger, Nathalie Japkowicz, and Christopher Drummond. 2015. Synthetic oversampling for advanced radioactive threat detection. In Proceedings ICML'2015.
    [20]
    Colin Bellinger, Shiven Sharma, and Nathalie Japkowicz. 2012. One-class versus binary classification: Which and when? In 2012 11th International Conference on Machine Learning and Applications (ICMLA), Vol. 2. IEEE, 102--106.
    [21]
    Jinbo Bi and Kristin P. Bennett. 2003. Regression error characteristic curves. In Proc. of the 20th Int. Conf. on Machine Learning. 43--50.
    [22]
    Jerzy Błaszczyński and Jerzy Stefanowski. 2015. Neighbourhood sampling in bagging for imbalanced data. Neurocomputing 150 (2015), 529--542.
    [23]
    Andrew P. Bradley. 1997. The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recogn. 30, 7 (1997), 1145--1159.
    [24]
    Paula Branco. 2014. Re-sampling Approaches for Regression Tasks under Imbalanced Domains. Master's thesis. Dept. Computer Science, Faculty of Sciences, University of Porto.
    [25]
    Leo Breiman, Jerome H. Friedman, Richard A. Olshen, and Charles J. Stone. 1984. Classification and regression trees. Wadsworth & Brooks, Monterey, CA (1984).
    [26]
    Chumphol Bunkhumpornpat, Krung Sinapiromsaran, and Chidchanok Lursinsap. 2009. Safe-level-smote: Safe-level-synthetic minority over-sampling technique for handling the class imbalanced problem. In Advances in Knowledge Discovery and Data Mining. Springer, 475--482.
    [27]
    Chumphol Bunkhumpornpat, Krung Sinapiromsaran, and Chidchanok Lursinsap. 2011. MUTE: Majority under-sampling technique. In 2011 8th International Conference on Information, Communications and Signal Processing (ICICS). IEEE, 1--4.
    [28]
    Chumphol Bunkhumpornpat, Krung Sinapiromsaran, and Chidchanok Lursinsap. 2012. DBSMOTE: Density-based synthetic minority over-sampling technique. Applied Intelligence 36, 3 (2012), 664--684.
    [29]
    Chumphol Bunkhumpornpat and Sitthichoke Subpaiboonkit. 2013. Safe level graph for synthetic minority over-sampling techniques. In 2013 13th International Symposium on Communications and Information Technologies (ISCIT). IEEE, 570--575.
    [30]
    Michael Cain and Christian Janssen. 1995. Real estate price prediction under asymmetric loss. Ann. Inst. Stat. Math. 47, 3 (1995), 401--414.
    [31]
    Peng Cao, Dazhe Zhao, and Osmar R. Zaïane. 2013. A PSO-based cost-sensitive neural network for imbalanced data classification. In Trends and Applications in Knowledge Discovery and Data Mining. Springer, 452--463.
    [32]
    Cristiano Leite Castro and Antônio de Pádua Braga. 2013. Novel cost-sensitive approach to improve the multilayer perceptron performance on imbalanced data. IEEE Trans. Neur. Netw. Learn. Syst. 24, 6 (2013), 888--899.
    [33]
    Edward Y. Chang, Beitao Li, Gang Wu, and Kingshy Goh. 2003. Statistical learning for effective visual information retrieval. In ICIP (3). 609--612.
    [34]
    Francisco Charte, Antonio J. Rivera, María J. del Jesus, and Francisco Herrera. 2015a. Addressing imbalance in multilabel classification: Measures and random resampling algorithms. Neurocomputing 163 (2015), 3--16.
    [35]
    Francisco Charte, Antonio J. Rivera, María J. del Jesus, and Francisco Herrera. 2015b. MLSMOTE: Approaching imbalanced multilabel learning through synthetic instance generation. Knowl.-Based Syst. 89 (2015), 385--397.
    [36]
    Nitesh V. Chawla, Kevin W. Bowyer, Lawrence O. Hall, and W. P. Kegelmeyer. 2002. SMOTE: Synthetic minority over-sampling technique. JAIR 16 (2002), 321--357.
    [37]
    Nitesh V. Chawla, David A. Cieslak, Lawrence O. Hall, and Ajay Joshi. 2008. Automatically countering imbalance and its empirical relationship to cost. Data Min. Knowl. Discov. 17, 2 (2008), 225--252.
    [38]
    Nitesh V. Chawla, Lawrence O. Hall, and Ajay Joshi. 2005. Wrapper-based computation and evaluation of sampling methods for imbalanced datasets. In Proceedings of the 1st International Workshop on Utility-Based Data Mining. ACM, New York, NY, 24--33.
    [39]
    Nitesh V. Chawla, Nathalie Japkowicz, and Aleksander Kotcz. 2004. Editorial: Special issue on learning from imbalanced data sets. ACM SIGKDD Explor. Newslett. 6, 1 (2004), 1--6.
    [40]
    Nitesh V. Chawla, Aleksandar Lazarevic, Lawrence O. Hall, and Kevin W. Bowyer. 2003. SMOTEBoost: Improving prediction of the minority class in boosting. In Knowledge Discovery in Databases: PKDD 2003. Springer, 107--119.
    [41]
    Chao Chen, Andy Liaw, and Leo Breiman. 2004. Using random forest to learn imbalanced data. University of California, Berkeley (2004).
    [42]
    Sheng Chen, Haibo He, and Edwardo A. Garcia. 2010. Ramoboost: Ranked minority oversampling in boosting. IEEE Trans. Neural Networks 21, 10 (2010), 1624--1642.
    [43]
    Xue-wen Chen and Michael Wasikowski. 2008. Fast: A roc-based feature selection metric for small samples and imbalanced data classification problems. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, New York, NY, 124--132.
    [44]
    Peter F. Christoffersen and Francis X. Diebold. 1996. Further results on forecasting and model selection under asymmetric loss. J. Appl. Econom. 11, 5 (1996), 561--571.
    [45]
    Peter F. Christoffersen and Francis X. Diebold. 1997. Optimal prediction under asymmetric loss. Econom. Theor. 13, 6 (1997), 808--817.
    [46]
    Leilei Chu, Hui Gao, and Wenbo Chang. 2010. A new feature weighting method based on probability distribution in imbalanced text classification. In 2010 Seventh International Conference on Fuzzy Systems and Knowledge Discovery (FSKD), Vol. 5. IEEE, 2335--2339.
    [47]
    Yu-Meei Chyi. 2003. Classification analysis techniques for skewed class distribution problems. Master Thesis, Department of Information Management, National Sun Yat-Sen University (2003).
    [48]
    David A. Cieslak and Nitesh V. Chawla. 2008. Learning decision trees for unbalanced data. In Machine Learning and Knowledge Discovery in Databases. Springer, 241--256.
    [49]
    David A. Cieslak, Thomas R. Hoens, Nitesh V. Chawla, and W. Philip Kegelmeyer. 2012. Hellinger distance decision trees are robust and skew-insensitive. Data Min. Knowl. Discov. 24, 1 (2012), 136--158.
    [50]
    Gilles Cohen, Mélanie Hilario, Hugo Sax, Stéphane Hugonnet, and Antoine Geissbuhler. 2006. Learning from imbalanced data in surveillance of nosocomial infection. Artif. Intell. Med. 37, 1 (2006), 7--18.
    [51]
    Sven F. Crone, Stefan Lessmann, and Robert Stahlbock. 2005. Utility based data mining for time series analysis: Cost-sensitive learning for neural network predictors. In Proceedings of the 1st International Workshop on Utility-based Data Mining. ACM, New York, NY, 59--68.
    [52]
    Andrea Dal Pozzolo, Olivier Caelen, and Gianluca Bontempi. 2015. When is undersampling effective in unbalanced classification tasks? In Machine Learning and Knowledge Discovery in Databases. Springer, 200--215.
    [53]
    Sophia Daskalaki, Ioannis Kopanas, and Nikolaos M. Avouris. 2006. Evaluation of classifiers for an uneven class distribution problem. Appl. Artif. Intell. 20, 5 (2006), 381--417.
    [54]
    Jesse Davis and Mark Goadrich. 2006. The relationship between Precision-Recall and ROC curves. In ICML'06: Proc. of the 23rd Int. Conf. on Machine Learning (ACM ICPS). ACM, New York, NY, 233--240.
    [55]
    María Dolores Del Castillo and José Ignacio Serrano. 2004. A multistrategy approach for digital text categorization from imbalanced documents. ACM SIGKDD Explor. Newslett. 6, 1 (2004), 70--79.
    [56]
    Misha Denil and Thomas Trappenberg. 2010. Overlap versus imbalance. In Advances in Artificial Intelligence. Springer, 220--231.
    [57]
    Pedro Domingos. 1999. MetaCost: A general method for making classifiers cost-sensitive. In KDD'99: Proceedings of the 5th International Conference on Knowledge Discovery and Data Mining. ACM Press, New York, NY, 155--164.
    [58]
    John Doucette and Malcolm I. Heywood. 2008. GP classification under imbalanced data sets: Active sub-sampling and AUC approximation. In Genetic Programming. Springer, 266--277.
    [59]
    Dennis J. Drown, Taghi M. Khoshgoftaar, and Naeem Seliya. 2009. Evolutionary sampling and software quality modeling of high-assurance systems. IEEE Trans. Syst. Man Cybernet. A 39, 5 (2009), 1097--1107.
    [60]
    Chris Drummond and Robert C. Holte. 2000. Explicitly representing expected cost: An alternative to ROC representation. In Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, New York, NY, 198--207.
    [61]
    Chris Drummond and Robert C. Holte. 2003. C4. 5, class imbalance, and cost sensitivity: Why under-sampling beats over-sampling. In Workshop on Learning from Imbalanced Datasets II, Vol. 11. Citeseer.
    [62]
    James P. Egan. 1975. Signal detection theory and {ROC} analysis. (1975).
    [63]
    Charles Elkan. 2001. The foundations of cost-sensitive learning. In IJCAI'01: Proc. of 17th Int. Joint Conf. of Artificial Intelligence, Vol. 1. Morgan Kaufmann Publishers, 973--978.
    [64]
    Şeyda Ertekin. 2013. Adaptive oversampling for imbalanced data classification. In Information Sciences and Systems 2013. Springer, 261--269.
    [65]
    Şeyda Ertekin, Jian Huang, Leon Bottou, and Lee Giles. 2007b. Learning on the border: Active learning in imbalanced data classification. In Proceedings of the Sixteenth ACM Conference on Conference on Information and Knowledge Management. ACM, New York, NY, 127--136.
    [66]
    Şeyda Ertekin, Jian Huang, and C. Lee Giles. 2007a. Active learning for class imbalance problem. In Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, New York, NY, 823--824.
    [67]
    Andrew Estabrooks and Nathalie Japkowicz. 2001. A mixture-of-experts framework for learning from imbalanced data sets. In Advances in Intelligent Data Analysis. Springer, 34--43.
    [68]
    Andrew Estabrooks, Taeho Jo, and Nathalie Japkowicz. 2004. A multiple resampling method for learning from imbalanced data sets. Comput. Intell. 20, 1 (2004), 18--36.
    [69]
    Tom Fawcett. 2006. An introduction to ROC analysis. Pattern Recogn. Lett. 27, 8 (2006), 861--874.
    [70]
    Alberto Fernández, María José del Jesus, and Francisco Herrera. 2010. On the 2-tuples based genetic tuning performance for fuzzy rule based classification systems in imbalanced data-sets. Inform. Sci. 180, 8 (2010), 1268--1291.
    [71]
    Alberto Fernández, Salvador García, María José del Jesus, and Francisco Herrera. 2008. A study of the behaviour of linguistic fuzzy rule based classification systems in the framework of imbalanced data-sets. Fuzzy Sets Syst. 159, 18 (2008), 2378--2398.
    [72]
    Antonio Fernández-Baldera, José M. Buenaposada, and Luis Baumela. 2015. Multi-class boosting for imbalanced data. In Pattern Recognition and Image Analysis. Springer, 57--64.
    [73]
    César Ferri, Peter Flach, José Hernández-Orallo, and Athmane Senad. 2005. Modifying ROC curves to incorporate predicted probabilities. In Proceedings of the Second Workshop on ROC Analysis in Machine Learning. 33--40.
    [74]
    César Ferri, José Hernández-orallo, and Peter A. Flach. 2011a. Brier curves: A new cost-based visualisation of classifier performance. In Proceedings of the 28th International Conference on Machine Learning (ICML-11). 585--592.
    [75]
    César Ferri, José Hernández-Orallo, and Peter A. Flach. 2011b. A coherent interpretation of AUC as a measure of aggregated classification performance. In Proceedings of the 28th International Conference on Machine Learning (ICML-11). 657--664.
    [76]
    César Ferri, José Hernández-Orallo, and R. Modroiu. 2009. An experimental comparison of performance measures for classification. Pattern Recogn. Lett. 30, 1 (2009), 27--38.
    [77]
    George Forman. 2003. An extensive empirical study of feature selection metrics for text classification. J. Mach. Learn. Res. 3 (2003), 1289--1305.
    [78]
    George Forman and Ira Cohen. 2004. Learning from little: Comparison of classifiers given little training. In Knowledge Discovery in Databases: PKDD 2004. Springer, 161--172.
    [79]
    Mikel Galar, Alberto Fernández, Edurne Barrenechea, Humberto Bustince, and Francisco Herrera. 2012. A review on ensembles for the class imbalance problem: bagging-, boosting-, and hybrid-based approaches. IEEE Trans. Syst. Man Cybernet. C 42, 4 (2012), 463--484.
    [80]
    Mikel Galar, Alberto Fernández, Edurne Barrenechea, and Francisco Herrera. 2013. Eusboost: Enhancing ensembles for highly imbalanced data-sets by evolutionary undersampling. Pattern Recogn. (2013).
    [81]
    Ming Gao, Xia Hong, Sheng Chen, Chris J. Harris, and Emad Khalaf. 2014. PDFOS: PDF estimation based over-sampling for imbalanced two-class problems. Neurocomputing 138 (2014), 248--259.
    [82]
    Joaquín García, Salvador Derrac, Isaac Triguero, Cristobal J. Carmona, and Francisco Herrera. 2012. Evolutionary-based selection of generalized instances for imbalanced classification. Knowl.-Based Syst. 25, 1 (2012), 3--12.
    [83]
    Salvador García, José Ramón Cano, Alberto Fernández, and Francisco Herrera. 2006. A proposal of evolutionary prototype selection for class imbalance problems. In Intelligent Data Engineering and Automated Learning--IDEAL 2006. Springer, 1415--1423.
    [84]
    Salvador García and Francisco Herrera. 2009. Evolutionary undersampling for classification with imbalanced datasets: Proposals and taxonomy. Evol. Comput. 17, 3 (2009), 275--306.
    [85]
    Vicente García, Roberto Alejo, José Salvador Sánchez, José Martínez Sotoca, and Ramón Alberto Mollineda. 2006. Combined effects of class imbalance and class overlap on instance-based classification. In Intelligent Data Engineering and Automated Learning--IDEAL 2006. Springer, 371--378.
    [86]
    Vicente García, Ramón Alberto Mollineda, and José Salvador Sánchez. 2008. A new performance evaluation method for two-class imbalanced problems. In Structural, Syntactic, and Statistical Pattern Recognition. Springer, 917--925.
    [87]
    Vicente García, Ramón Alberto Mollineda, and José Salvador Sánchez. 2009. Index of balanced accuracy: A performance measure for skewed class distributions. In Pattern Recognition and Image Analysis. Springer, 441--448.
    [88]
    Vicente García, Ramón Alberto Mollineda, and José Salvador Sánchez. 2010. Theoretical analysis of a performance measure for imbalanced data. In 2010 20th International Conference on Pattern Recognition (ICPR). IEEE, 617--620.
    [89]
    Alireza Ghasemi, Mohammad T. Manzuri, Hamid R. Rabiee, Mohammad H. Rohban, and Siavash Haghiri. 2011a. Active one-class learning by kernel density estimation. In 2011 IEEE International Workshop on Machine Learning for Signal Processing (MLSP). IEEE, 1--6.
    [90]
    Alireza Ghasemi, Hamid R. Rabiee, Mohsen Fadaee, Mohammad T. Manzuri, and Mohammad H. Rohban. 2011b. Active learning from positive and unlabeled data. In 2011 IEEE 11th International Conference on Data Mining Workshops (ICDMW). IEEE, 244--250.
    [91]
    Adel Ghazikhani, Reza Monsefi, and Hadi Sadoghi Yazdi. 2014. Online neural network model for non-stationary and imbalanced data stream classification. Int. J. Mach. Learn. Cybernet. 5, 1 (2014), 51--62.
    [92]
    Clive W. Granger. 1999. Outline of forecast theory using generalized cost functions. Span. Econ. Rev. 1, 2 (1999), 161--173.
    [93]
    Hui Han, Wen-Yuan Wang, and Bing-Huan Mao. 2005. Borderline-SMOTE: A new over-sampling method in imbalanced data sets learning. In Advances in Intelligent Computing. Springer, 878--887.
    [94]
    David J. Hand. 2009. Measuring classifier performance: A coherent alternative to the area under the ROC curve. Machine Learn. 77, 1 (2009), 103--123.
    [95]
    Peter. E. Hart. 1968. The condensed nearest neighbor rule. IEEE Transactions on Information Theory 14 (1968), 515--516.
    [96]
    Haibo He, Yang Bai, Edwardo A. Garcia, and Shutao Li. 2008. ADASYN: Adaptive synthetic sampling approach for imbalanced learning. In IEEE International Joint Conference on Neural Networks, 2008. IJCNN 2008. (IEEE World Congress on Computational Intelligence). IEEE, 1322--1328.
    [97]
    Haibo He and Edwardo A. Garcia. 2009. Learning from imbalanced data. IEEE Knowl. Data Eng. 21, 9 (2009), 1263--1284.
    [98]
    Haibo He and Yunqian Ma. 2013. Imbalanced Learning: Foundations, Algorithms, and Applications. John Wiley & Sons.
    [99]
    José Hernández-Orallo. 2012. Soft (gaussian CDE) regression models and loss functions. arXiv Preprint arXiv:1211.1043 (2012).
    [100]
    José Hernández-Orallo. 2013. {ROC} curves for regression. Pattern Recogn. 46, 12 (2013), 3395--3411.
    [101]
    José Hernández-Orallo. 2014. Probabilistic reframing for cost-sensitive regression. ACM Trans. Knowl. Discov. Data 8, 4, Article 17 (Aug. 2014), 55 pages.
    [102]
    José Hernández-Orallo, Peter Flach, and César Ferri. 2012. A unified view of performance metrics: Translating threshold choice into expected classification loss. J. Mach. Learn. Res. 13, 1 (2012), 2813--2869.
    [103]
    Robert C. Holte, Liane E. Acker, and Bruce W. Porter. 1989. Concept learning and the problem of small disjuncts. In IJCAI, Vol. 89. Citeseer, 813--818.
    [104]
    Junjie Hu. 2012. Active learning for imbalance problem using L-GEM of RBFNN. In ICMLC. 490--495.
    [105]
    Shengguo Hu, Yanfeng Liang, Lintao Ma, and Ying He. 2009. MSMOTE: Improving classification performance when training data is imbalanced. In Second International Workshop on Computer Science and Engineering, 2009. WCSE'09, Vol. 2. IEEE, 13--17.
    [106]
    Kaizhu Huang, Haiqin Yang, Irwin King, and Michael R. Lyu. 2004. Learning classifiers from imbalanced data based on biased minimax probability machine. In Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004. Vol. 2. IEEE, II--558.
    [107]
    Jae Pil Hwang, Seongkeun Park, and Euntai Kim. 2011. A new weighted approach to imbalanced data classification problem via support vector machine with quadratic cost function. Expert Syst. Appl. 38, 7 (2011), 8580--8585.
    [108]
    Tasadduq Imam, Kai Ming Ting, and Joarder Kamruzzaman. 2006. z-SVM: An SVM for improved classification of imbalanced data. In AI 2006: Advances in Artificial Intelligence. Springer, 264--273.
    [109]
    Nathalie Japkowicz. 2000. Learning from imbalanced data sets: A comparison of various strategies. In AAAI Workshop on Learning from Imbalanced Data Sets, Vol. 68. Menlo Park, CA.
    [110]
    Nathalie Japkowicz. 2001. Concept-learning in the presence of between-class and within-class imbalances. In Advances in Artificial Intelligence. Springer, 67--77.
    [111]
    Nathalie Japkowicz. 2003. Class imbalances: Are we focusing on the right issue. In Workshop on Learning from Imbalanced Data Sets II, Vol. 1723. 63.
    [112]
    Natalie Japkowicz. 2013. Assessment metrics for imbalanced learning. In Imbalanced Learning: Foundations, Algorithms, and Applications, Haibo He and Yunqian Ma (Eds.). John Wiley & Sons.
    [113]
    Nathalie Japkowicz, Catherine Myers, and Mark Gluck. 1995. A novelty detection approach to classification. In IJCAI. 518--523.
    [114]
    Nathalie Japkowicz and Mohak Shah. 2011. Evaluating Learning Algorithms: A Classification Perspective. Cambridge University Press.
    [115]
    Nathalie Japkowicz and Shaju Stephen. 2002. The class imbalance problem: A systematic study. Intell. Data Anal. 6, 5 (2002), 429--449.
    [116]
    Piyasak Jeatrakul, Kok Wai Wong, and Chun Che Fung. 2010. Classification of imbalanced data by combining the complementary neural network and SMOTE algorithm. In Neural Information Processing. Models and Applications. Springer, 152--159.
    [117]
    Taeho Jo and Nathalie Japkowicz. 2004. Class imbalances versus small disjuncts. ACM SIGKDD Explor. Newslett. 6, 1 (2004), 40--49.
    [118]
    Mahesh V. Joshi, Vipin Kumar, and Ramesh C. Agarwal. 2001. Evaluating boosting algorithms to classify rare classes: Comparison and improvements. In Proceedings IEEE International Conference on Data Mining, 2001. ICDM 2001. IEEE, 257--264.
    [119]
    Pilsung Kang and Sungzoon Cho. 2006. EUS SVMs: Ensemble of under-sampled SVMs for data imbalance problems. In Neural Information Processing. Springer, 837--846.
    [120]
    Taghi M. Khoshgoftaar, Chris Seiffert, Jason Van Hulse, Amri Napolitano, and Andres Folleco. 2007. Learning with limited minority class data. In Sixth International Conference on Machine Learning and Applications, 2007. ICMLA 2007. IEEE, 348--353.
    [121]
    Sotiris Kotsiantis, Dimitris Kanellopoulos, and Panayiotis Pintelas. 2006. Handling imbalanced datasets: A review. GESTS Int. Trans. Comput. Sci. Eng. 30, 1 (2006), 25--36.
    [122]
    Sotiris Kotsiantis and Panagiotis Pintelas. 2003. Mixture of expert agents for handling imbalanced data sets. Ann. Math. Comput. Teleinform. 1, 1 (2003), 46--55.
    [123]
    Miroslav Kubat, Robert C. Holte, and Stan Matwin. 1998. Machine learning for the detection of oil spills in satellite radar images. Mach. Learn. 30, 2--3 (1998), 195--215.
    [124]
    Miroslav Kubat and Stan Matwin. 1997. Addressing the curse of imbalanced training sets: One-sided selection. In Proc. of the 14th Int. Conf. on Machine Learning. Morgan Kaufmann, 179--186.
    [125]
    Jorma Laurikkala. 2001. Improving Identification of Difficult Small Classes by Balancing Class Distribution. Springer.
    [126]
    Hyoung-joo Lee and Sungzoon Cho. 2006. The novelty detection approach for different degrees of class imbalance. In Neural Information Processing. Springer, 21--30.
    [127]
    Sauchi Stephen Lee. 1999. Regularization in skewed binary classification. Comput. Stat. 14, 2 (1999), 277.
    [128]
    Sauchi Stephen Lee. 2000. Noisy replication in skewed binary classification. Comput. Stat. Data Anal. 34, 2 (2000), 165--191.
    [129]
    Tae-Hwy Lee. 2008. Loss functions in time series forecasting. International Encyclopedia of the Social Sciences (2008).
    [130]
    Chen Li, Chen Jing, and Gao Xin-tao. 2009. An improved P-SVM method used to deal with imbalanced data sets. In IEEE International Conference on Intelligent Computing and Intelligent Systems, 2009. ICIS 2009, Vol. 1. IEEE, 118--122.
    [131]
    Kewen Li, Wenrong Zhang, Qinghua Lu, and Xianghua Fang. 2014. An improved SMOTE imbalanced data classification method based on support degree. In 2014 International Conference on Identification, Information and Knowledge in the Internet of Things (IIKI). IEEE, 34--38.
    [132]
    Peng Li, Pei-Li Qiao, and Yuan-Chao Liu. 2008. A hybrid re-sampling method for SVM learning from imbalanced data sets. In Fifth International Conference on Fuzzy Systems and Knowledge Discovery, 2008. FSKD'08. Vol. 2. IEEE, 65--69.
    [133]
    M. Lichman. 2013. UCI Machine Learning Repository. University of California, Irvine, School of Information and Computer Sciences. http://archive.ics.uci.edu/ml.
    [134]
    Chun-Fu Lin and Sheng-De Wang. 2002. Fuzzy support vector machines. IEEE Trans. Neur. Network. 13, 2 (2002), 464--471.
    [135]
    Alexander Liu, Joydeep Ghosh, and Cheryl E. Martin. 2007. Generative oversampling for mining imbalanced datasets. In DMIN. 66--72.
    [136]
    Wei Liu, Sanjay Chawla, David A. Cieslak, and Nitesh V. Chawla. 2010. A robust decision tree algorithm for imbalanced data sets. In SDM, Vol. 10. SIAM, 766--777.
    [137]
    Xu-Ying Liu, Jianxin Wu, and Zhi-Hua Zhou. 2009. Exploratory undersampling for class-imbalance learning. IEEE Trans. Syst. Man Cybernet. B 39, 2 (2009), 539--550.
    [138]
    Yang Liu, Aijun An, and Xiangji Huang. 2006. Boosting prediction accuracy on imbalanced datasets with SVM ensembles. In Advances in Knowledge Discovery and Data Mining. Springer, 107--118.
    [139]
    Victoria López, Alberto Fernández, Salvador García, Vasile Palade, and Francisco Herrera. 2013. An insight into classification with imbalanced data: Empirical results and current trends on using data intrinsic characteristics. Inform. Sci. 250 (2013), 113--141.
    [140]
    Victoria López, Alberto Fernández, and Francisco Herrera. 2014. On the importance of the validation technique for classification with imbalanced datasets: Addressing covariate shift when data is skewed. Inform. Sci. 257 (2014), 1--13.
    [141]
    José María Luna, Cristóbal Romero, José Raúl Romero, and Sebastián Ventura. 2015. An evolutionary algorithm for the discovery of rare class association rules in learning management systems. Appl. Intell. 42, 3 (2015), 501--513.
    [142]
    Tomasz Maciejewski and Jerzy Stefanowski. 2011. Local neighbourhood extension of SMOTE for mining imbalanced data. In 2011 IEEE Symposium on Computational Intelligence and Data Mining (CIDM). IEEE, 104--111.
    [143]
    Satyam Maheshwari, Jitendra Agrawal, and Sanjeev Sharma. 2011. A new approach for classification of highly imbalanced datasets using evolutionary algorithms. Intl. J. Sci. Eng. Res 2 (2011), 1--5.
    [144]
    Marcus A. Maloof. 2003. Learning when data sets are imbalanced and when costs are unequal and unknown. In ICML-2003 Workshop on Learning from Imbalanced Data Sets II, Vol. 2. 2--1.
    [145]
    Larry Manevitz and Malik Yousef. 2002. One-class SVMs for document classification. J. Mach. Learn. Res. 2 (2002), 139--154.
    [146]
    Olvi L. Mangasarian and Edward W. Wild. 2001. Proximal support vector machine classifiers. In Proceedings KDD-2001: Knowledge Discovery and Data Mining. Citeseer.
    [147]
    Veenu Mangat and Renu Vig. 2014. Intelligent rule mining algorithm for classification over imbalanced data. J. Emerg. Technol. Web Intell. 6, 3 (2014), 373--379.
    [148]
    Inderjeet Mani and Jianping Zhang. 2003. kNN approach to unbalanced data distributions: A case study involving information extraction. In Proceedings of Workshop on Learning from Imbalanced Datasets.
    [149]
    José Manuel Martínez-García, Carmen Paz Suárez-Araujo, and Patricio García Báez. 2012. SNEOM: A sanger network based extended over-sampling method. application to imbalanced biomedical datasets. In Neural Information Processing. Springer, 584--592.
    [150]
    David Mease, Abraham Wyner, and Andreas Buja. 2007. Cost-weighted boosting with jittering and over/under-sampling: JOUS-boost. J. Mach. Learn. Res. 8 (2007), 409--439.
    [151]
    Giovanna Menardi and Nicola Torelli. 2010. Training and assessing classification rules with imbalanced data. Data Min. Knowl. Discov. (2010), 1--31.
    [152]
    Charles E. Metz. 1978. Basic principles of ROC analysis. In Seminars in Nuclear Medicine, Vol. 8. Elsevier, 283--298.
    [153]
    Ying Mi. 2013. Imbalanced classification based on active learning SMOTE. Res. J. Appl. Sci. 5 (2013).
    [154]
    Dunja Mladenic and Marko Grobelnik. 1999. Feature selection for unbalanced class distribution and naive Bayes. In ICML, Vol. 99. 258--267.
    [155]
    Jose G. Moreno-Torres, Troy Raeder, Rocío Alaiz-Rodríguez, Nitesh V. Chawla, and Francisco Herrera. 2012. A unifying view on dataset shift in classification. Pattern Recogn. 45, 1 (2012), 521--530.
    [156]
    Douglas Mossman. 1999. Three-way ROCs. Med. Dec. Mak. 19, 1 (1999), 78--89.
    [157]
    Satuluri Naganjaneyulu and Mrithyumjaya Rao Kuppa. 2013. A novel framework for class imbalance learning using intelligent under-sampling. Progr. Artif. Intell. 2, 1 (2013), 73--84.
    [158]
    Munehiro Nakamura, Yusuke Kajiwara, Atsushi Otsuka, and Haruhiko Kimura. 2013. LVQ-SMOTE: Learning vector quantization based synthetic minority over-sampling technique for biomedical data. BioData Min. 6, 1 (2013), 16.
    [159]
    Krystyna Napierała, Jerzy Stefanowski, and Szymon Wilk. 2010. Learning from imbalanced data in presence of noisy and borderline examples. In Rough Sets and Current Trends in Computing. Springer, 158--167.
    [160]
    Wing WY Ng, Jiankun Hu, Daniel S. Yeung, Sha Yin, and Fabio Roli. 2014. Diversified sensitivity-based undersampling for imbalance classification problems. (2014).
    [161]
    Sang-Hoon Oh. 2011. Error back-propagation algorithm for classification of imbalanced data. Neurocomputing 74, 6 (2011), 1058--1061.
    [162]
    Ronald Pearson, Gregory Goney, and James Shwaber. 2003. Imbalanced clustering for microarray time-series. In Proceedings of the ICML, Vol. 3.
    [163]
    María Pérez-Ortiz, Pedro Antonio Gutiérrez, and César Hervás-Martínez. 2014. Projection-based ensemble learning for ordinal regression. IEEE Trans. Cybernet. 44, 5 (2014), 681--694.
    [164]
    Clifton Phua, Damminda Alahakoon, and Vincent Lee. 2004. Minority report in fraud detection: Classification of skewed data. ACM SIGKDD Explor. Newslett. 6, 1 (2004), 50--59.
    [165]
    Ronaldo C. Prati, Gustavo E. A. P. A. Batista, and Maria Carolina Monard. 2004a. Class imbalances versus class overlapping: An analysis of a learning system behavior. In MICAI 2004: Advances in Artificial Intelligence. Springer, 312--321.
    [166]
    Ronaldo C. Prati, Gustavo E. A. P. A. Batista, and Maria Carolina Monard. 2004b. Learning with class skews and small disjuncts. In Advances in Artificial Intelligence--SBIA 2004. Springer, 296--306.
    [167]
    Ronaldo C. Prati, Gustavo E. A. P. A. Batista, and Diego F. Silva. 2014. Class imbalance revisited: A new experimental setup to assess the performance of treatment methods. Knowl. Inform. Syst. (2014), 1--24.
    [168]
    Foster J. Provost and Tom Fawcett. 1997. Analysis and visualization of classifier performance: Comparison under imprecise class and cost distributions. In KDD, Vol. 97. 43--48.
    [169]
    Foster J Provost, Tom Fawcett, and Ron Kohavi. 1998. The case against accuracy estimation for comparing induction algorithms. In ICML'98: Proc. of the 15th Int. Conf. on Machine Learning. Morgan Kaufmann Publishers, 445--453.
    [170]
    Troy Raeder, George Forman, and Nitesh V. Chawla. 2012. Learning from imbalanced data: Evaluation matters. In Data Mining: Foundations and Intelligent Paradigms. Springer, 315--331.
    [171]
    Enislay Ramentol, Yailé Caballero, Rafael Bello, and Francisco Herrera. 2012a. SMOTE-RSB*: A hybrid preprocessing approach based on oversampling and undersampling for high imbalanced data-sets using SMOTE and rough sets theory. Knowl. Inform. Syst. 33, 2 (2012), 245--265.
    [172]
    Enislay Ramentol, Nele Verbiest, Rafael Bello, Yailé Caballero, Chris Cornelis, and Francisco Herrera. 2012b. SMOTE-FRST: A new resampling method using fuzzy rough set theory. In 10th International FLINS Conference on Uncertainty Modelling in Knowledge Engineering and Decision Making (to Appear).
    [173]
    Romesh Ranawana and Vasile Palade. 2006. Optimized precision: A new measure for classifier performance evaluation. In IEEE Congress on Evolutionary Computation, 2006. CEC 2006. IEEE, 2254--2261.
    [174]
    Bhavani Raskutti and Adam Kowalczyk. 2004. Extreme re-balancing for SVMs: A case study. ACM SIGKDD Explor. Newslett. 6, 1 (2004), 60--69.
    [175]
    Rita P. Ribeiro. 2011. Utility-based Regression. Ph.D. Dissertation. Dep. Computer Science, Faculty of Sciences, University of Porto.
    [176]
    Rita P. Ribeiro and Luís Torgo. 2003. Predicting harmful algae blooms. In Progress in Artificial Intelligence. Springer, 308--312.
    [177]
    Cornelis V. Rijsbergen. 1979. Information Retrieval. Dept. of Computer Science, University of Glasgow, 2nd edition. (1979).
    [178]
    Juan J. Rodríguez, José-Francisco Díez-Pastor, Jesús Maudes, and César García-Osorio. 2012. Disturbing neighbors ensembles of trees for imbalanced data. In 2012 11th International Conference on Machine Learning and Applications (ICMLA), Vol. 2. IEEE, 83--88.
    [179]
    José A. Sáez, Julián Luengo, Jerzy Stefanowski, and Francisco Herrera. 2015. SMOTE--IPF: Addressing the noisy and borderline examples problem in imbalanced classification by a re-sampling method with filtering. Inform. Sci. 291 (2015), 184--203.
    [180]
    Juan Pablo Sánchez-Crisostomo, Roberto Alejo, Erika López-González, Rosa María Valdovinos, and J. Horacio Pacheco-Sánchez. 2014. Empirical analysis of assessments metrics for multi-class imbalance learning on the back-propagation context. In Advances in Swarm Intelligence. Springer, 17--23.
    [181]
    Javier Sánchez-Monedero, Pedro Antonio Gutiérrez, and Cesar Hervás-Martínez. 2013. Evolutionary ordinal extreme learning machine. In Hybrid Artificial Intelligent Systems. Springer, 500--509.
    [182]
    Bernhard Schölkopf, John C. Platt, John Shawe-Taylor, Alex J. Smola, and Robert C. Williamson. 2001. Estimating the support of a high-dimensional distribution. Neur. Comput. 13, 7 (2001), 1443--1471.
    [183]
    Chris Seiffert, Taghi M. Khoshgoftaar, Jason Van Hulse, and Andres Folleco. 2011. An empirical study of the classification performance of learners on imbalanced and noisy software quality data. Inform. Sci. (2011).
    [184]
    Chris Seiffert, Taghi M. Khoshgoftaar, Jason Van Hulse, and Amri Napolitano. 2010. RUSBoost: A hybrid approach to alleviating class imbalance. IEEE Trans. Syst. Man Cybernet. A 40, 1 (2010), 185--197.
    [185]
    Shiven Sharma, Colin Bellinger, and Nathalie Japkowicz. 2012. Clustering based one-class classification for compliance verification of the comprehensive nuclear-test-ban treaty. In Advances in Artificial Intelligence. Springer, 181--193.
    [186]
    Atish P. Sinha and Jerrold H. May. 2004. Evaluating and tuning predictive data mining models using receiver operating characteristic curves. J. Manag. Inform. Syst. 21, 3 (2004), 249--280.
    [187]
    Parinaz Sobhani, Herna Viktor, and Stan Matwin. 2014. Learning from imbalanced data using ensemble methods and cluster-based undersampling. In New Frontiers in Mining Complex Patterns. Springer, 69--83.
    [188]
    Marina Sokolova and Guy Lapalme. 2009. A systematic analysis of performance measures for classification tasks. Inform. Process. Manag. 45, 4 (2009), 427--437.
    [189]
    Jie Song, Xiaoling Lu, and Xizhi Wu. 2009. An improved AdaBoost algorithm for unbalanced classification data. In Sixth International Conference on Fuzzy Systems and Knowledge Discovery, 2009. FSKD'09. Vol. 1. IEEE, 109--113.
    [190]
    Panote Songwattanasiri and Krung Sinapiromsaran. 2010. SMOUTE: Synthetics minority over-sampling and under-sampling techniques for class imbalanced problem. In Proceedings of the Annual International Conference on Computer Science Education: Innovation and Technology, Special Track: Knowledge Discovery. 78--83.
    [191]
    Jerzy Stefanowski. 2016. Dealing with data difficulty factors while learning from imbalanced data. In Challenges in Computational Statistics and Data Mining. Springer, 333--363.
    [192]
    Jerzy Stefanowski and Szymon Wilk. 2008. Selective pre-processing of imbalanced data for improving classification performance. In Data Warehousing and Knowledge Discovery. Springer, 283--292.
    [193]
    Yanmin Sun, Mohamed S. Kamel, and Yang Wang. 2006. Boosting for learning multiple classes with imbalanced class distribution. In Sixth International Conference on Data Mining, 2006. ICDM'06. IEEE, 592--602.
    [194]
    Yanmin Sun, Mohamed S. Kamel, Andrew K. C. Wong, and Yang Wang. 2007. Cost-sensitive boosting for classification of imbalanced data. Pattern Recogn. 40, 12 (2007), 3358--3378.
    [195]
    Yanmin Sun, Andrew K. C. Wong, and Mohamed S. Kamel. 2009. Classification of imbalanced data: A review. Int. J. Pattern Recogn. Artif. Intell. 23, 4 (2009), 687--719.
    [196]
    Muhammad Atif Tahir, Josef Kittler, and Fei Yan. 2012. Inverse random under sampling for class imbalance problem and its application to multi-label classification. Pattern Recogn. 45, 10 (2012), 3738--3750.
    [197]
    Aik Tan, David Gilbert, and Yves Deville. 2003. Multi-class protein fold classification using a new ensemble machine learning approach. (2003).
    [198]
    Yuchun Tang and Yan-Qing Zhang. 2006. Granular SVM with repetitive undersampling for highly imbalanced protein homology prediction. In 2006 IEEE International Conference on Granular Computing. IEEE, 457--460.
    [199]
    Yuchun Tang, Yan-Qing Zhang, Nitesh V. Chawla, and Sven Krasser. 2009. SVMs modeling for highly imbalanced classification. IEEE Trans. Syst. Man Cybernet. B 39, 1 (2009), 281--288.
    [200]
    Dacheng Tao, Xiaoou Tang, Xuelong Li, and Xindong Wu. 2006. Asymmetric bagging and random subspace for support vector machines-based relevance feedback in image retrieval. IEEE Trans. Pattern Anal. Mach. Intell. 28, 7 (2006), 1088--1099.
    [201]
    Nguyen Thai-Nghe, Zeno Gantner, and Lars Schmidt-Thieme. 2011. A new evaluation measure for learning from imbalanced data. In The 2011 International Joint Conference on Neural Networks (IJCNN). IEEE, 537--542.
    [202]
    Ivan Tomek. 1976. Two modifications of CNN. IEEE Trans. Syst. Man Cybern. 11 (1976), 769--772.
    [203]
    Luís Torgo. 2005. Regression error characteristic surfaces. In KDD'05: Proc. of the 11th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining. ACM Press, 697--702.
    [204]
    Luís Torgo and Rita P. Ribeiro. 2003. Predicting outliers. In Knowledge Discovery in Databases: PKDD 2003. Springer, 447--458.
    [205]
    Luís Torgo and Rita P. Ribeiro. 2007. Utility-based regression. In PKDD'07: Proc. of 11th European Conf. on Principles and Practice of Knowledge Discovery in Databases. Springer, 597--604.
    [206]
    Luís Torgo and Rita P. Ribeiro. 2009. Precision and recall in regression. In DS'09: 12th Int. Conf. on Discovery Science. Springer, 332--346.
    [207]
    Luís Torgo, Rita P. Ribeiro, Bernhard Pfahringer, and Paula Branco. 2013. SMOTE for regression. In Progress in Artificial Intelligence. Springer, 378--389.
    [208]
    Peter Van Der Putten and Maarten Van Someren. 2004. A bias-variance analysis of a real-world learning problem: The CoIL challenge 2000. Mach. Learn. 57, 1--2 (2004), 177--195.
    [209]
    Jason Van Hulse, Taghi M. Khoshgoftaar, and Amri Napolitano. 2007. Experimental perspectives on learning from imbalanced data. In Proceedings of the 24th International Conference on Machine Learning. ACM, 935--942.
    [210]
    Madireddi Vasu and Vadlamani Ravi. 2011. A hybrid under-sampling approach for mining unbalanced datasets: Applications to banking and insurance. Int. J. Data Min. Model. Manag. 3, 1 (2011), 75--105.
    [211]
    Nele Verbiest, Enislay Ramentol, Chris Cornelis, and Francisco Herrera. 2012. Improving SMOTE with fuzzy rough prototype selection to detect noise in imbalanced classification data. In Advances in Artificial Intelligence--IBERAMIA 2012. Springer, 169--178.
    [212]
    Konstantinos Veropoulos, Colin Campbell, and Nello Cristianini. 1999. Controlling the sensitivity of support vector machines. In Proceedings of the International Joint Conference on Artificial Intelligence, Vol. 1999. Citeseer, 55--60.
    [213]
    Pascal Vincent, Hugo Larochelle, Isabelle Lajoie, Yoshua Bengio, and Pierre-Antoine Manzagol. 2010. Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. J. Mach. Learn. Res. 11 (2010), 3371--3408.
    [214]
    Kiri L. Wagstaff, Nina L. Lanza, David R. Thompson, Thomas G. Dietterich, and Martha S. Gilmore. 2013. Guiding scientific discovery with explanations using DEMUD. In AAAI.
    [215]
    Byron C. Wallace and Issa J. Dahabreh. 2012. Class probability estimates are unreliable for imbalanced data (and how to fix them). In 2012 IEEE 12th International Conference on Data Mining (ICDM). IEEE, 695--704.
    [216]
    Byron C. Wallace and Issa J. Dahabreh. 2014. Improving class probability estimates for imbalanced data. Knowl. Inform. Syst. 41, 1 (2014), 33--52.
    [217]
    Byron C. Wallace, Kevin Small, Carla E. Brodley, and Thomas A. Trikalinos. 2011. Class imbalance, redux. In 2011 IEEE 11th International Conference on Data Mining (ICDM). IEEE, 754--763.
    [218]
    Benjamin X. Wang and Nathalie Japkowicz. 2010. Boosting support vector machines for imbalanced data sets. Knowl. Inform. Syst. 25, 1 (2010), 1--20.
    [219]
    Heng Wang and Zubin Abraham. 2015. Concept drift detection for imbalanced stream data. arXiv Preprint arXiv:1504.01044 (2015).
    [220]
    He-Yong Wang. 2008. Combination approach of SMOTE and biased-SVM for imbalanced datasets. In IEEE International Joint Conference on Neural Networks, 2008. IJCNN 2008. (IEEE World Congress on Computational Intelligence). IEEE, 228--231.
    [221]
    Shuo Wang and Xin Yao. 2009. Diversity analysis on imbalanced data sets by using ensemble models. In IEEE Symposium on Computational Intelligence and Data Mining, 2009. CIDM'09. IEEE, 324--331.
    [222]
    Xiaoguang Wang, Xuan Liu, Nathalie Japkowicz, and Stan Matwin. 2013a. Resampling and cost-sensitive methods for imbalanced multi-instance learning. In 2013 IEEE 13th International Conference on Data Mining Workshops (ICDMW). IEEE, 808--816.
    [223]
    Xiaoguang Wang, Stan Matwin, Nathalie Japkowicz, and Xuan Liu. 2013b. Cost-sensitive boosting algorithms for imbalanced multi-instance datasets. In Advances in Artificial Intelligence. Springer, 174--186.
    [224]
    Mike Wasikowski and Xue-wen Chen. 2010. Combating the small sample class imbalance problem using feature selection. IEEE Trans. Knowl. Data Eng. 22, 10 (2010), 1388--1400.
    [225]
    Deng Weiguo, Wang Li, Wang Yiyang, and Qian Zhong. 2012. An improved SVM-KM model for imbalanced datasets. In 2012 International Conference on Industrial Control and Electronics Engineering (ICICEE). IEEE, 100--103.
    [226]
    Gary M. Weiss. 2004. Mining with rarity: A unifying framework. SIGKDD Explor. Newslett. 6, 1 (2004), 7--19.
    [227]
    Gary M. Weiss. 2005. Mining with rare cases. In Data Mining and Knowledge Discovery Handbook. Springer, 765--776.
    [228]
    Gary M. Weiss. 2010. The impact of small disjuncts on classifier learning. In Data Mining. Springer, 193--226.
    [229]
    Gary M. Weiss. 2013. Foundations of imbalanced learning. In Imbalanced Learning: Foundations, Algorithms, and Applications, Haibo He and Yunqian Ma (Eds.). John Wiley & Sons.
    [230]
    Gary M. Weiss and Foster J. Provost. 2003. Learning when training data are costly: The effect of class distribution on tree induction. J. Artif. Intell. Res. (JAIR) 19 (2003), 315--354.
    [231]
    Cheng G. Weng and Josiah Poon. 2008. A new evaluation measure for imbalanced datasets. In Proceedings of the 7th Australasian Data Mining Conference-Volume 87. Australian Computer Society, Inc., 27--32.
    [232]
    Gang Wu and Edward Y. Chang. 2003. Class-boundary alignment for imbalanced dataset learning. In ICML 2003 Workshop on Learning from Imbalanced Data Sets II, Washington, DC. 49--56.
    [233]
    Gang Wu and Edward Y. Chang. 2005. KBA: Kernel boundary alignment considering imbalanced data distribution. IEEE Trans. Knowl. Data Eng. 17, 6 (2005), 786--795.
    [234]
    Shaomin Wu, Peter Flach, and César Ferri. 2007. An improved model selection heuristic for AUC. In ECML. Springer, 478--489.
    [235]
    Jin Xiao, Ling Xie, Changzheng He, and Xiaoyi Jiang. 2012. Dynamic classifier ensemble model for customer classification with imbalanced class distribution. Expert Syst. Appl. 39, 3 (2012), 3668--3675.
    [236]
    Li Xuan, Chen Zhigang, and Yang Fan. 2013. Exploring of clustering algorithm on class-imbalanced data. In 2013 8th International Conference on Computer Science & Education (ICCSE). IEEE, 89--93.
    [237]
    Zeping Yang and Daqi Gao. 2012. An active under-sampling approach for imbalanced data classification. In 2012 Fifth International Symposium on Computational Intelligence and Design (ISCID). Vol. 2. IEEE, 270--273.
    [238]
    Show-Jane Yen and Yue-Shi Lee. 2006. Under-sampling approaches for improving prediction of the minority class in an imbalanced dataset. In Intelligent Control and Automation. Springer, 731--740.
    [239]
    Show-Jane Yen and Yue-Shi Lee. 2009. Cluster-based under-sampling approaches for imbalanced data distributions. Expert Syst. Appl. 36, 3 (2009), 5718--5727.
    [240]
    Yang Yong. 2012. The research of imbalanced data set of sample sampling method based on K-means cluster and genetic algorithm. Energy Procedia 17 (2012), 164--170.
    [241]
    Kihoon Yoon and Stephen Kwek. 2005. An unsupervised learning approach to resolving the data imbalanced issue in supervised learning problems in functional genomics. In Fifth International Conference on Hybrid Intelligent Systems, 2005. HIS'05. IEEE, 6 pp.
    [242]
    Dai Yuanhong, Chen Hongchang, and Peng Tao. 2009. Cost-sensitive support vector machine based on weighted attribute. In International Forum on Information Technology and Applications, 2009. IFITA'09, Vol. 1. IEEE, 690--692.
    [243]
    Bianca Zadrozny, John Langford, and Naoki Abe. 2003. Cost-sensitive learning by cost-proportionate example weighting. In Third IEEE International Conference on Data Mining, 2003. ICDM 2003. IEEE, 435--442.
    [244]
    Arnold Zellner. 1986. Bayesian estimation and prediction using asymmetric loss functions. J. Am. Statist. Assoc. 81, 394 (1986), 446--451.
    [245]
    Dongmei Zhang, Wei Liu, Xiaosheng Gong, and Hui Jin. 2011. A novel improved SMOTE resampling algorithm based on fractal. J. Comput. Inform. Syst. 7, 6 (2011), 2204--2211.
    [246]
    Huaxiang Zhang and Mingfang Li. 2014. RWO-sampling: A random walk over-sampling approach to imbalanced data classification. Inform. Fus. 20 (2014), 99--116.
    [247]
    Huimin Zhao, Atish P. Sinha, and Gaurav Bansal. 2011. An extended tuning method for cost-sensitive regression and forecasting. Dec. Support Syst. 51, 3 (2011), 372--383.
    [248]
    Zhaohui Zheng, Xiaoyun Wu, and Rohini Srihari. 2004. Feature selection for text categorization on imbalanced data. ACM SIGKDD Explor. Newslett. 6, 1 (2004), 80--89.
    [249]
    Zhi-Hua Zhou and Xu-Ying Liu. 2006. Training cost-sensitive neural networks with methods addressing the class imbalance problem. IEEE Trans. Knowl. Data Eng. 18, 1 (2006), 63--77.
    [250]
    Jingbo Zhu and Eduard H. Hovy. 2007. Active learning for word sense disambiguation with methods for addressing the class imbalance problem. In EMNLP-CoNLL, Vol. 7. 783--790.
    [251]
    Ling Zhuang and Honghua Dai. 2006a. Parameter estimation of one-class SVM on imbalance text classification. In Advances in Artificial Intelligence. Springer, 538--549.
    [252]
    Ling Zhuang and Honghua Dai. 2006b. Parameter optimization of kernel-based one-class classifier on imbalance learning. J. Comput. 1, 7 (2006), 32--40.

      Published In

      ACM Computing Surveys  Volume 49, Issue 2
      June 2017
      747 pages
      ISSN:0360-0300
      EISSN:1557-7341
      DOI:10.1145/2966278
      Editor: Sartaj Sahni
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 13 August 2016
      Accepted: 01 March 2016
      Revised: 01 March 2016
      Received: 01 May 2015
      Published in CSUR Volume 49, Issue 2

      Author Tags

      1. Imbalanced domains
      2. classification
      3. performance metrics
      4. rare cases
      5. regression

      Qualifiers

      • Survey
      • Research
      • Refereed
