Balancing Strategies and Class Overlapping

Batista, Gustavo E. A. P. A.; Prati, Ronaldo C.; Monard, Maria C.

doi:10.1007/11552253_3

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3646)

Included in the following conference series:

International Symposium on Intelligent Data Analysis

Abstract

Several studies have pointed out that class imbalance is a bottleneck in the performance achieved by standard supervised learning systems. However, a complete understanding of how this problem affects the performance of learning systems is still lacking. In previous work we identified that performance degradation is not caused solely by class imbalance, but is also related to the degree of class overlapping. In this work, we take our research a step further by investigating sampling strategies that aim to balance the training set. Our results show that these sampling strategies usually lead to a performance improvement for highly imbalanced data sets with highly overlapped classes. In addition, over-sampling methods seem to outperform under-sampling methods.
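The sampling strategies mentioned in the abstract rebalance a training set either by adding minority-class examples (over-sampling) or by discarding majority-class examples (under-sampling). The sketch below shows only the two simplest baselines, random over-sampling and random under-sampling, in plain Python; it is not a reproduction of the specific methods evaluated in this chapter, which the abstract does not name. The helper `make_overlapping_data` and its parameters `minority_fraction` and `distance` are hypothetical, assumed here for illustration: it draws two 2-D Gaussian classes whose means are `distance` apart, so a smaller `distance` means more class overlap.

```python
import random
from collections import Counter


def make_overlapping_data(n, minority_fraction, distance, seed=0):
    """Hypothetical toy generator (an assumption, not the chapter's setup):
    two 2-D Gaussian classes with unit variance. The minority class (label 1)
    is centred at (distance, 0) and the majority class (label 0) at (0, 0),
    so a smaller `distance` means more class overlap."""
    rng = random.Random(seed)
    n_min = max(1, round(n * minority_fraction))
    X, y = [], []
    for i in range(n):
        label = 1 if i < n_min else 0
        mean_x = distance if label == 1 else 0.0
        X.append((rng.gauss(mean_x, 1.0), rng.gauss(0.0, 1.0)))
        y.append(label)
    return X, y


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen examples of each smaller class (sampling
    with replacement) until every class matches the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        members = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(members))
            y_out.append(label)
    return X_out, y_out


def random_undersample(X, y, seed=0):
    """Keep a random subset (without replacement) of each class so that
    every class shrinks to the size of the smallest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    X_out, y_out = [], []
    for label in counts:
        members = [x for x, lab in zip(X, y) if lab == label]
        for x in rng.sample(members, target):
            X_out.append(x)
            y_out.append(label)
    return X_out, y_out


if __name__ == "__main__":
    # 5% minority class, class means one standard deviation apart (heavy overlap).
    X, y = make_overlapping_data(n=1000, minority_fraction=0.05, distance=1.0)
    print("original:      ", Counter(y))
    X_over, y_over = random_oversample(X, y)
    print("over-sampled:  ", Counter(y_over))
    X_under, y_under = random_undersample(X, y)
    print("under-sampled: ", Counter(y_under))
```

A commonly cited trade-off between the two baselines: random over-sampling only duplicates existing minority examples, so it adds no new information and may encourage overfitting, while random under-sampling throws away majority examples that might have been useful to the learner.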

This research is partly supported by Brazilian Research Councils CAPES and FAPESP.

Author information

Authors and Affiliations

Institute of Mathematics and Computer Science at University of São Paulo, P. O. Box 668, ZIP Code 13560-970, São Carlos (SP), Brazil
Gustavo E. A. P. A. Batista, Ronaldo C. Prati & Maria C. Monard
Faculty of Computer Engineering at Pontifical Catholic University of Campinas, Rodovia D. Pedro I, Km 136, ZIP Code 13086-900, Campinas (SP), Brazil
Gustavo E. A. P. A. Batista

Editor information

Editors and Affiliations

Institute for Information Technology, National Research Council Canada, Ottawa, Canada
A. Fazel Famili
LIACS, Leiden University, The Netherlands
Joost N. Kok
IFM, Linköping University, SE-58183, Linköping, Sweden
José M. Peña
Department of Computer Science, Universiteit Utrecht, the Netherlands
Arno Siebes
Utrecht University, P.O. Box 80 089, NL-3508 TB Utrecht, the Netherlands
Ad Feelders

About this paper

Cite this paper

Batista, G.E.A.P.A., Prati, R.C., Monard, M.C. (2005). Balancing Strategies and Class Overlapping. In: Famili, A.F., Kok, J.N., Peña, J.M., Siebes, A., Feelders, A. (eds) Advances in Intelligent Data Analysis VI. IDA 2005. Lecture Notes in Computer Science, vol 3646. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11552253_3

DOI: https://doi.org/10.1007/11552253_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28795-7
Online ISBN: 978-3-540-31926-9
eBook Packages: Computer Science, Computer Science (R0)
