
Preliminary Evaluation of Classification Complexity Measures on Imbalanced Data

  • Conference paper
Proceedings of 2013 Chinese Intelligent Automation Conference

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 256)

Abstract

Classification complexity measures play an important role in classifier selection, but they were designed primarily for balanced data. Focusing on binary classification, this paper proposes a novel methodology for evaluating their validity on imbalanced data. The twelve complexity measures compiled by Ho are evaluated on synthetic imbalanced data sets with various probability distributions, boundary shapes, and degrees of data skewness. The experimental results demonstrate that most of the complexity measures change in a statistically significant way as data skewness varies, so they need to be revised and improved for imbalanced data.
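As a rough illustration of the kind of experiment the abstract describes, the sketch below computes one well-known measure from Ho and Basu's set, Fisher's discriminant ratio (F1), on synthetic one-dimensional two-class data while the majority-to-minority ratio is varied. It is not the paper's protocol: the Gaussian class distributions, the sample sizes, the number of repeats, and the function names (`fisher_ratio`, `sample_f1`, `f1_by_skew`) are all assumptions made for this example. Under these assumed distributions the population value of the ratio is (2 - 0)^2 / (1 + 1) = 2 at every skew level, so any drift in the estimates comes from sampling alone.

```python
import random
import statistics


def fisher_ratio(class_a, class_b):
    """Fisher's discriminant ratio for one feature: (mu_a - mu_b)^2 / (var_a + var_b)."""
    mean_gap = statistics.fmean(class_a) - statistics.fmean(class_b)
    spread = statistics.pvariance(class_a) + statistics.pvariance(class_b)
    return mean_gap ** 2 / spread


def sample_f1(n_majority, n_minority, rng):
    """Draw one synthetic two-class data set and return its Fisher ratio.

    Assumed setup (not from the paper): unit-variance Gaussians centred at 0 and 2.
    """
    majority = [rng.gauss(0.0, 1.0) for _ in range(n_majority)]
    minority = [rng.gauss(2.0, 1.0) for _ in range(n_minority)]
    return fisher_ratio(majority, minority)


def f1_by_skew(ratios, total=1000, repeats=30, seed=0):
    """Return {ratio: (mean F1, sample std of F1)} over repeated draws.

    `ratio` is the majority:minority size ratio; the total sample size is fixed.
    """
    rng = random.Random(seed)
    results = {}
    for ratio in ratios:
        # Keep at least two minority points so the variance is defined.
        n_minority = max(2, round(total / (1 + ratio)))
        n_majority = total - n_minority
        values = [sample_f1(n_majority, n_minority, rng) for _ in range(repeats)]
        results[ratio] = (statistics.fmean(values), statistics.stdev(values))
    return results


if __name__ == "__main__":
    for ratio, (mean, sd) in f1_by_skew([1, 5, 20, 100]).items():
        print(f"majority:minority = {ratio}:1  F1 mean = {mean:.3f}  sd = {sd:.3f}")
```

Comparing the per-ratio means and standard deviations gives a first, informal view of how stable a measure is under skew. A study such as the one summarized above would apply a formal statistical test across skew levels instead of eyeballing these numbers.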



Acknowledgments

This work is partially supported by the Science and Technology Project of Guangdong Province (No. 2012B050600028, No. 2012B091000171 and No. 2011B090400460) and the National Natural Science Foundation of China (No. 61074147, No. 60374062).

Author information

Corresponding author

Correspondence to Yan Xing.


Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Xing, Y., Cai, H., Cai, Y., Hejlesen, O., Toft, E. (2013). Preliminary Evaluation of Classification Complexity Measures on Imbalanced Data. In: Sun, Z., Deng, Z. (eds) Proceedings of 2013 Chinese Intelligent Automation Conference. Lecture Notes in Electrical Engineering, vol 256. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38466-0_22


  • DOI: https://doi.org/10.1007/978-3-642-38466-0_22


  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-38465-3

  • Online ISBN: 978-3-642-38466-0

  • eBook Packages: Engineering, Engineering (R0)
