Class imbalances versus small disjuncts

Published: 01 June 2004

Abstract

It is often assumed that class imbalances are responsible for significant losses of performance in standard classifiers. The purpose of this paper is to question whether class imbalances are truly responsible for this degradation or whether it can be explained in some other way. Our experiments suggest that the problem is not directly caused by class imbalances, but rather that class imbalances may yield small disjuncts which, in turn, cause the degradation. We argue that, in order to improve classifier performance, it may therefore be more useful to focus on the small disjuncts problem than on the class imbalance problem. We experiment with a method that takes the small disjuncts problem into consideration and show that it does yield performance superior to that obtained using standard or advanced solutions to the class imbalance problem.
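
The abstract does not spell out the method it evaluates. Purely as an illustration of what taking small disjuncts into consideration could look like in a resampling setting, the sketch below assumes that each class has already been split into sub-clusters by some earlier clustering step. The `cluster_id` field, the input tuple layout, and the `balance_by_cluster` helper are hypothetical and are not taken from the paper. Every (class, sub-cluster) group is oversampled to the size of the largest group, so that rare sub-concepts are no longer underrepresented.

```python
import random
from collections import defaultdict


def balance_by_cluster(samples, seed=0):
    """Oversample so that every (label, cluster_id) group has the same size.

    `samples` is a list of (features, label, cluster_id) tuples. The
    cluster_id values are assumed to come from a clustering step run
    separately inside each class (hypothetical input format).
    """
    rng = random.Random(seed)

    # Group the samples by class label and sub-cluster.
    groups = defaultdict(list)
    for item in samples:
        _, label, cluster_id = item
        groups[(label, cluster_id)].append(item)

    # Every group is grown to the size of the largest one.
    target = max(len(members) for members in groups.values())

    balanced = []
    for key in sorted(groups):
        members = groups[key]
        balanced.extend(members)
        # Draw with replacement until the group reaches the target size.
        balanced.extend(rng.choice(members) for _ in range(target - len(members)))
    return balanced


# Toy data: a majority class "neg" and a minority class "pos",
# each containing one large and one small sub-cluster.
toy = (
    [((float(i),), "neg", 0) for i in range(6)]
    + [((float(i),), "neg", 1) for i in range(2)]
    + [((float(i),), "pos", 0) for i in range(3)]
    + [((0.0,), "pos", 1)]
)
balanced = balance_by_cluster(toy)
print(len(toy), len(balanced))  # prints: 12 24
```

In this toy case both classes have two sub-clusters, so equalizing the groups also equalizes the classes. Classes with different numbers of sub-clusters would still differ in total size after this step.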

Published In

ACM SIGKDD Explorations Newsletter, Volume 6, Issue 1
Special issue on learning from imbalanced datasets
June 2004
117 pages
ISSN: 1931-0145
EISSN: 1931-0153
DOI: 10.1145/1007730
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. between-class imbalance
  2. class imbalance
  3. rare cases
  4. resampling
  5. small disjuncts
  6. within-class imbalance

Qualifiers

  • Article
