
Editorial: special issue on learning from imbalanced data sets

Published: 01 June 2004



        Published In

        ACM SIGKDD Explorations Newsletter, Volume 6, Issue 1
        Special issue on learning from imbalanced datasets
        June 2004
        117 pages


        Association for Computing Machinery

        New York, NY, United States

