Concurrence among Imbalanced Labels and Its Influence on Multilabel Resampling Algorithms

Charte, Francisco; Rivera, Antonio; del Jesus, María José; Herrera, Francisco

doi:10.1007/978-3-319-07617-1_10

Francisco Charte²⁵,
Antonio Rivera²⁶,
María José del Jesus²⁶ &
…
Francisco Herrera²⁵

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 8480))

Included in the following conference series:

International Conference on Hybrid Artificial Intelligence Systems

2244 Accesses

Abstract

In the context of multilabel classification, the learning from imbalanced data is getting considerable attention recently. Several algorithms to face this problem have been proposed in the late five years, as well as various measures to assess the imbalance level. Some of the proposed methods are based on resampling techniques, a very well-known approach whose utility in traditional classification has been proven.

This paper aims to describe how a specific characteristic of multilabel datasets (MLDs), the level of concurrence among imbalanced labels, could have a great impact in resampling algorithms behavior. Towards this goal, a measure named SCUMBLE, designed to evaluate this concurrence level, is proposed and its usefulness is experimentally tested. As a result, a straightforward guideline on the effectiveness of multilabel resampling algorithms depending on MLDs characteristics can be inferred.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 84.99; Price excludes VAT (USA)

Softcover Book: USD 109.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Resampling Multilabel Datasets by Decoupling Highly Imbalanced Labels

MLeNN: A First Approach to Heuristic Multilabel Undersampling

When is Undersampling Effective in Unbalanced Classification Tasks?

References

Tsoumakas, G., Katakis, I., Vlahavas, I.: Mining Multi-label Data. In: Maimon, O., Rokach, L. (eds.) Data Mining and Knowledge Discovery Handbook, ch. 34, pp. 667–685. Springer US, Boston (2010)
Google Scholar
Katakis, I., Tsoumakas, G., Vlahavas, I.: Multilabel Text Classification for Automated Tag Suggestion. In: Proc. ECML PKDD 2008 Discovery Challenge, Antwerp, Belgium, pp. 75–83 (2008)
Google Scholar
Diplaris, S., Tsoumakas, G., Mitkas, P.A., Vlahavas, I.: Protein Classification with Multiple Algorithms. In: Bozanis, P., Houstis, E.N. (eds.) PCI 2005. LNCS, vol. 3746, pp. 448–456. Springer, Heidelberg (2005)
Chapter Google Scholar
Duygulu, P., Barnard, K., de Freitas, J.F.G., Forsyth, D.: Object Recognition as Machine Translation: Learning a Lexicon for a Fixed Image Vocabulary. In: Heyden, A., Sparr, G., Nielsen, M., Johansen, P. (eds.) ECCV 2002, Part IV. LNCS, vol. 2353, pp. 97–112. Springer, Heidelberg (2002)
Chapter Google Scholar
Chawla, N.V., Japkowicz, N., Kotcz, A.: Editorial: special issue on learning from imbalanced data sets. SIGKDD Explor. Newsl. 6(1), 1–6 (2004)
Article Google Scholar
He, J., Gu, H., Liu, W.: Imbalanced multi-modal multi-label learning for subcellular localization prediction of human proteins with both single and multiple sites. PloS One 7(6), 7155 (2012)
Google Scholar
Li, C., Shi, G.: Improvement of learning algorithm for the multi-instance multi-label rbf neural networks trained with imbalanced samples. J. Inf. Sci. Eng. 29(4), 765–776 (2013)
Google Scholar
Tepvorachai, G., Papachristou, C.: Multi-label imbalanced data enrichment process in neural net classifier training. In: IEEE Int. Joint Conf. on Neural Networks, IJCNN, 2008, pp. 1301–1307 (2008)
Google Scholar
Tahir, M.A., Kittler, J., Bouridane, A.: Multilabel classification using heterogeneous ensemble of multi-label classifiers. Pattern Recognit. Lett. 33(5), 513–523 (2012)
Article Google Scholar
Tahir, M.A., Kittler, J., Yan, F.: Inverse random under sampling for class imbalance problem and its application to multi-label classification. Pattern Recognit. 45(10), 3738–3750 (2012)
Article Google Scholar
Charte, F., Rivera, A., del Jesus, M.J., Herrera, F.: A first approach to deal with imbalance in multi-label datasets. In: Pan, J.-S., Polycarpou, M.M., Woźniak, M., de Carvalho, A.C.P.L.F., Quintián, H., Corchado, E. (eds.) HAIS 2013. LNCS, vol. 8073, pp. 150–160. Springer, Heidelberg (2013)
Chapter Google Scholar
Giraldo-Forero, A.F., Jaramillo-Garzón, J.A., Ruiz-Muñoz, J.F., Castellanos-Domínguez, C.G.: Managing imbalanced data sets in multi-label problems: A case study with the smote algorithm. In: Ruiz-Shulcloper, J., Sanniti di Baja, G. (eds.) CIARP 2013, Part I. LNCS, vol. 8258, pp. 334–342. Springer, Heidelberg (2013)
Chapter Google Scholar
García, V., Sánchez, J., Mollineda, R.: On the effectiveness of preprocessing methods when dealing with different levels of class imbalance. Knowl. Based Systems 25(1), 13–21 (2012)
Article Google Scholar
Szymański, P., Kajdanowicz, T.: MLG: Enchancing multi-label classification with modularity-based label grouping. In: Pan, J.-S., Polycarpou, M.M., Woźniak, M., de Carvalho, A.C.P.L.F., Quintián, H., Corchado, E. (eds.) HAIS 2013. LNCS, vol. 8073, pp. 431–440. Springer, Heidelberg (2013)
Chapter Google Scholar
Turnbull, D., Barrington, L., Torres, D., Lanckriet, G.: Semantic Annotation and Retrieval of Music and Sound Effects. IEEE Audio, Speech, Language Process. 16(2), 467–476 (2008)
Article Google Scholar
Klimt, B., Yang, Y.: The Enron Corpus: A New Dataset for Email Classification Research. In: Boulicaut, J.-F., Esposito, F., Giannotti, F., Pedreschi, D. (eds.) ECML 2004. LNCS (LNAI), vol. 3201, pp. 217–226. Springer, Heidelberg (2004)
Chapter Google Scholar
Elisseeff, A., Weston, J.: A Kernel Method for Multi-Labelled Classification. In: Advances in Neural Information Processing Systems 14, vol. 14, pp. 681–687. MIT Press (2001)
Google Scholar
Crammer, K., Dredze, M., Ganchev, K., Talukdar, P.P., Carroll, S.: Automatic Code Assignment to Medical Text. In: Proc. Workshop on Biological, Translational, and Clinical Language Processing, BioNLP 2007, Prague, Czech Republic, pp. 129–136 (2007)
Google Scholar
Godbole, S., Sarawagi, S.: Discriminative Methods for Multi-labeled Classification. In: Dai, H., Srikant, R., Zhang, C. (eds.) PAKDD 2004. LNCS (LNAI), vol. 3056, pp. 22–30. Springer, Heidelberg (2004)
Chapter Google Scholar
Boutell, M., Luo, J., Shen, X., Brown, C.: Learning multi-label scene classification. Pattern Recognit. 37(9), 1757–1771 (2004)
Article Google Scholar
Zhang, M., Zhou, Z.: ML-KNN: A lazy learning approach to multi-label learning. Pattern Recognit. 40(7), 2038–2048 (2007)
Article MATH Google Scholar
Clare, A.J., King, R.D.: Knowledge discovery in multi-label phenotype data. In: Siebes, A., De Raedt, L. (eds.) PKDD 2001. LNCS (LNAI), vol. 2168, pp. 42–53. Springer, Heidelberg (2001)
Chapter Google Scholar
Zhang, M., Zhou, Z.: A Review on Multi-Label Learning Algorithms. IEEE Trans. Knowl. Data Eng., doi:10.1109/TKDE.2013.39
Google Scholar
López, V., Fernández, A., García, S., Palade, V., Herrera, F.: An insight into classification with imbalanced data: Empirical results and current trends on using data intrinsic characteristics. Inf. Sciences 250, 113–141 (2013)
Article Google Scholar
Fernández, A., López, V., Galar, M., del Jesus, M.J., Herrera, F.: Analysing the classification of imbalanced data-sets with multiple classes: Binarization techniques and ad-hoc approaches. Knowl. Based Systems 42, 97–110 (2013)
Article Google Scholar
Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: Smote: Synthetic minority over-sampling technique. J. Artificial Intelligence Res. 16, 321–357 (2002)
MATH Google Scholar
Kotsiantis, S.B., Pintelas, P.E.: Mixture of expert agents for handling imbalanced data sets. Annals of Mathematics, Computing & Teleinformatics 1, 46–55 (2003)
Google Scholar
Provost, F., Fawcett, T.: Robust classification for imprecise environments. Mach. Learn. 42, 203–231 (2001)
Article MATH Google Scholar
Atkinson, A.B.: On the measurement of inequality. Journal of Economic Theory 2(3), 244–263 (1970)
Article MathSciNet Google Scholar
Tsoumakas, G., Katakis, I., Vlahavas, I.: Effective and Efficient Multilabel Classification in Domains with Large Number of Labels. In: Proc. ECML/PKDD Workshop on Mining Multidimensional Data, MMD 2008, Antwerp, Belgium, pp. 30–44 (2008)
Google Scholar

Download references

Author information

Authors and Affiliations

Dep. of Computer Science and Artificial Intelligence, University of Granada, Granada, Spain
Francisco Charte & Francisco Herrera
Dep. of Computer Science, University of Jaén, Jaén, Spain
Antonio Rivera & María José del Jesus

Authors

Francisco Charte
View author publications
You can also search for this author in PubMed Google Scholar
Antonio Rivera
View author publications
You can also search for this author in PubMed Google Scholar
María José del Jesus
View author publications
You can also search for this author in PubMed Google Scholar
Francisco Herrera
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

Department of Electrical and Computer Engineering, University of Cyprus, 75 Kallipoleos Avenue, 1678, Nicosia, Cyprus
Marios Polycarpou
Department of Computer Science, University of Sao Paulo at Sao Carlos, Sao Carlos, SP, Brazil
André C. P. L. F. de Carvalho
Department of Electronic Engineering, National Kaohsiung University of Applied Sciences, No. 415, Chien Kung Road, 80778, Kaohsiung, Taiwan
Jeng-Shyang Pan
Wroclaw University of Technology, Wybrzeze Wyspianskiego 27, 50-370, Wroclaw, Poland
Michał Woźniak
University of Salamanca, Plaza de la Merced S/N, 37008 Salamanca, Spain and University of A Coruna, Escuela Universitaria Politecnica, Departamento de Enxeñeria Industrial, A Coruna, Spain
Héctor Quintian
University of Salamanca, Plaza de la Merced S/N, 37008, Salamanca, Spain
Emilio Corchado

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Charte, F., Rivera, A., del Jesus, M.J., Herrera, F. (2014). Concurrence among Imbalanced Labels and Its Influence on Multilabel Resampling Algorithms. In: Polycarpou, M., de Carvalho, A.C.P.L.F., Pan, JS., Woźniak, M., Quintian, H., Corchado, E. (eds) Hybrid Artificial Intelligence Systems. HAIS 2014. Lecture Notes in Computer Science(), vol 8480. Springer, Cham. https://doi.org/10.1007/978-3-319-07617-1_10

Download citation

DOI: https://doi.org/10.1007/978-3-319-07617-1_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-07616-4
Online ISBN: 978-3-319-07617-1
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Concurrence among Imbalanced Labels and Its Influence on Multilabel Resampling Algorithms

Abstract

Access this chapter

Subscribe and save

Buy Now

Preview

Similar content being viewed by others

Resampling Multilabel Datasets by Decoupling Highly Imbalanced Labels

MLeNN: A First Approach to Heuristic Multilabel Undersampling

When is Undersampling Effective in Unbalanced Classification Tasks?

References

Author information

Authors and Affiliations

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Publish with us

Subscribe and save

Buy Now

Navigation

Concurrence among Imbalanced Labels and Its Influence on Multilabel Resampling Algorithms

Abstract

Access this chapter

Subscribe and save

Buy Now

Preview

Similar content being viewed by others

Resampling Multilabel Datasets by Decoupling Highly Imbalanced Labels

MLeNN: A First Approach to Heuristic Multilabel Undersampling

When is Undersampling Effective in Unbalanced Classification Tasks?

References

Author information

Authors and Affiliations

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation