Editorial: special issue on learning from imbalanced data sets

Authors: Nitesh V. Chawla, Nathalie Japkowicz, and Aleksander Kotcz

ACM SIGKDD Explorations Newsletter, Volume 6, Issue 1

Pages 1 - 6

https://doi.org/10.1145/1007730.1007733

Published: 01 June 2004

Index Terms

Editorial: special issue on learning from imbalanced data sets
1. Computing methodologies
  1. Machine learning
    1. Learning paradigms
      1. Supervised learning
        Supervised learning by classification
    2. Machine learning approaches
      1. Classification and regression trees

Index terms have been assigned to the content through auto-classification.

Information

Published In

ACM SIGKDD Explorations Newsletter Volume 6, Issue 1

Special issue on learning from imbalanced datasets

June 2004

117 pages

ISSN:1931-0145

EISSN:1931-0153

DOI:10.1145/1007730

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Article Metrics

Total Citations: 1,630
Total Downloads: 10,675
Downloads (Last 12 months): 466
Downloads (Last 6 weeks): 58
