Abstract
Classification complexity measures play an important role in classifier selection, but they were designed primarily for balanced data. Focusing on binary classification, this paper proposes a novel methodology for evaluating their validity on imbalanced data. The twelve complexity measures proposed by Ho are evaluated on synthetic imbalanced data sets that vary in probability distribution, boundary shape, and degree of skewness. The experimental results show that most of the complexity measures change in a statistically significant way as data skewness varies, so they need to be revised and improved for imbalanced data.
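The abstract does not give the distributions, boundary shapes, or sample sizes used, so the following is only a minimal sketch of the general idea, not the authors' procedure. It assumes two 2-D Gaussian classes and uses Fisher's discriminant ratio (F1), one of the measures described by Ho and Basu, computed at several minority-class fractions. The function names, the `shift` parameter, and all numeric settings are hypothetical choices made for illustration.

```python
import numpy as np


def fisher_ratio_f1(X, y):
    """Maximum Fisher's discriminant ratio over features (binary labels 0/1).

    Per feature: (mean_0 - mean_1)^2 / (var_0 + var_1), using population
    variance (NumPy's default, ddof=0).
    """
    X0, X1 = X[y == 0], X[y == 1]
    num = (X0.mean(axis=0) - X1.mean(axis=0)) ** 2
    den = X0.var(axis=0) + X1.var(axis=0)
    return float(np.max(num / den))


def make_imbalanced_gaussians(n_total, minority_fraction, rng, shift=2.0):
    """Two unit-variance 2-D Gaussians; class 1 is the minority.

    Assumption for this sketch: the class means differ by `shift` on every
    feature, and only the class proportions change between settings.
    """
    n_min = max(2, int(round(n_total * minority_fraction)))
    n_maj = n_total - n_min
    X_maj = rng.normal(0.0, 1.0, size=(n_maj, 2))
    X_min = rng.normal(shift, 1.0, size=(n_min, 2))
    X = np.vstack([X_maj, X_min])
    y = np.concatenate([np.zeros(n_maj, dtype=int), np.ones(n_min, dtype=int)])
    return X, y


def f1_by_skew(fractions, n_total=1000, n_repeats=30, seed=0):
    """Mean and standard deviation of F1 over repeated draws per fraction."""
    rng = np.random.default_rng(seed)
    results = {}
    for frac in fractions:
        values = [
            fisher_ratio_f1(*make_imbalanced_gaussians(n_total, frac, rng))
            for _ in range(n_repeats)
        ]
        results[frac] = (float(np.mean(values)), float(np.std(values)))
    return results


if __name__ == "__main__":
    for frac, (mean, std) in f1_by_skew([0.5, 0.25, 0.1, 0.05, 0.01]).items():
        print(f"minority fraction {frac:>5.2f}: F1 = {mean:.3f} +/- {std:.3f}")
```

Because the underlying class distributions are held fixed here, any systematic drift in the measure across fractions would come from the class proportions alone. A fuller comparison would apply a statistical test across the skew levels rather than inspecting means by eye.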
References
Murphy KP (2012) Machine learning: a probabilistic perspective. The MIT Press, Cambridge
Ho TK, Basu M, Law MHC (2006) Measures of geometrical complexity in classification problems. In: Basu M, Ho TK (eds) Data complexity in pattern recognition. Springer, Berlin, pp 3–24
Moran S, He Y, Liu K (2009) Choosing the best Bayesian classifier: an empirical study. IAENG Int J Comput Sci 36(4):9–19
Ho TK, Basu M (2002) Complexity measures of supervised classification problems. IEEE Trans Pattern Anal Mach Intell 24(3):289–300
Sun Y, Wong AC, Kamel MS (2009) Classification of imbalanced data: a review. Int J Pattern Recognit Artif Intell 23(4):687–719
Ho TK (2008) Data complexity analysis: linkage between context and solution in classification. Lect Notes Comput Sci 5342:986–995
Ho TK (2002) A data complexity analysis of comparative advantages of decision forest constructors. Pattern Anal Appl 5(2):102–112
Weng CG, Poon J (2010) CODE: a data complexity framework for imbalanced datasets. Lect Notes Artif Intell 5569:16–27
Moore DS, McCabe GP, Craig BA (2009) Introduction to the practice of statistics, 6th edn. W.H. Freeman, New York
Orriols-Puig A, Macia N, Ho TK (2010) Documentation for the data complexity library in C++. Universitat Ramon Llull, La Salle
Acknowledgments
This work is partially supported by the Science and Technology Project of Guangdong Province (No. 2012B050600028, No. 2012B091000171 and No. 2011B090400460) and the National Natural Science Foundation of China (No. 61074147, No. 60374062).
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Xing, Y., Cai, H., Cai, Y., Hejlesen, O., Toft, E. (2013). Preliminary Evaluation of Classification Complexity Measures on Imbalanced Data. In: Sun, Z., Deng, Z. (eds) Proceedings of 2013 Chinese Intelligent Automation Conference. Lecture Notes in Electrical Engineering, vol 256. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38466-0_22
DOI: https://doi.org/10.1007/978-3-642-38466-0_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-38465-3
Online ISBN: 978-3-642-38466-0
eBook Packages: Engineering, Engineering (R0)