Abstract
Multi-label learning has become an increasingly active area in the machine learning community, since a wide variety of real-world problems are naturally multi-labeled. However, it is not uncommon to find disparities among the number of samples of each class, which constitutes an additional challenge for the learning algorithm. SMOTE is an oversampling technique that has been successfully applied to balance single-labeled data sets, but it has not been used in multi-label frameworks so far. In this work, several strategies for generating synthetic samples are proposed and compared, with the aim of balancing data sets in the training of multi-label algorithms. Results show that a correct selection of seed samples for oversampling improves the classification performance of multi-label algorithms. The uniform generation oversampling strategy provides an efficient methodology for a wide range of real-world problems.
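For readers unfamiliar with SMOTE, the interpolation step it relies on can be sketched as follows. This is a minimal illustration of the basic single-label technique, not the strategies compared in the paper, which the abstract does not detail. The function name `smote`, its parameters, the toy data, and the rule of taking every sample that carries a rare label as a seed are all assumptions made for this example.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Generate synthetic points by interpolating between minority samples.

    minority: list of feature vectors (lists of floats) that share a label.
    n_synthetic: number of synthetic samples to create.
    k: number of nearest neighbours considered for each seed sample.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        # Pick a seed sample uniformly at random from the minority set.
        i = rng.randrange(len(minority))
        x = minority[i]
        # Rank the other minority samples by Euclidean distance to the seed.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(x, minority[j]))
        # Choose one of the k nearest neighbours.
        neighbour = minority[rng.choice(others[:k])]
        # Interpolate at a random position on the segment seed -> neighbour.
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(x, neighbour)])
    return synthetic


# Toy multi-label data (hypothetical): each row is (features, set of labels).
data = [
    ([0.0, 0.0], {"a"}),
    ([1.0, 0.0], {"a", "b"}),
    ([0.0, 1.0], {"a"}),
    ([1.0, 1.0], {"b"}),
    ([5.0, 5.0], {"c"}),
    ([6.0, 5.0], {"c", "a"}),
]
# Assumed seed rule: every sample that carries the rare label "c".
seeds_c = [x for x, labels in data if "c" in labels]
new_points = smote(seeds_c, n_synthetic=3, k=1)
```

In a multi-label setting, one sample may carry both rare and frequent labels, so which samples are used as seeds and which labels the synthetic points inherit are separate design decisions. The sketch leaves label assignment open on purpose.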
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Giraldo-Forero, A.F., Jaramillo-Garzón, J.A., Ruiz-Muñoz, J.F., Castellanos-Domínguez, C.G. (2013). Managing Imbalanced Data Sets in Multi-label Problems: A Case Study with the SMOTE Algorithm. In: Ruiz-Shulcloper, J., Sanniti di Baja, G. (eds) Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications. CIARP 2013. Lecture Notes in Computer Science, vol 8258. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41822-8_42
DOI: https://doi.org/10.1007/978-3-642-41822-8_42
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-41821-1
Online ISBN: 978-3-642-41822-8
eBook Packages: Computer Science (R0)