Synthetic Oversampling of Multi-label Data Based on Local Label Distribution

Liu, Bin; Tsoumakas, Grigorios

doi:10.1007/978-3-030-46147-8_11

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11907)

Included in the following conference series:

Joint European Conference on Machine Learning and Knowledge Discovery in Databases

Abstract

Class-imbalance is an inherent characteristic of multi-label data which affects the prediction accuracy of most multi-label learning methods. One efficient strategy to deal with this problem is to employ resampling techniques before training the classifier. Existing multi-label sampling methods alleviate the (global) imbalance of multi-label datasets. However, performance degradation is mainly due to rare sub-concepts and overlapping of classes that could be analysed by looking at the local characteristics of the minority examples, rather than the imbalance of the whole dataset. We propose a new method for synthetic oversampling of multi-label data that focuses on local label distribution to generate more diverse and better labeled instances. Experimental results on 13 multi-label datasets demonstrate the effectiveness of the proposed approach in a variety of evaluation measures, particularly in the case of an ensemble of classifiers trained on repeated samples of the original data.
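
The idea of inspecting the local label distribution around an example can be illustrated with a small sketch. This is not the authors' implementation, and the abstract does not define the measure. The sketch assumes one plausible reading: for each instance and each label, compute the fraction of its k nearest neighbours (Euclidean distance) whose value for that label differs from the instance's own. The function name `local_label_disagreement`, the parameter `k`, and the toy data are hypothetical.

```python
import math


def local_label_disagreement(X, Y, k):
    """Return, for each instance i and label j, the fraction of the k
    nearest neighbours of i whose value for label j differs from Y[i][j].

    Assumption (not from the abstract): neighbours are found by Euclidean
    distance on the feature vectors, excluding the instance itself.
    """
    n = len(X)
    scores = []
    for i in range(n):
        # Sort all other instances by distance to instance i.
        by_distance = sorted(
            (math.dist(X[i], X[m]), m) for m in range(n) if m != i
        )
        neighbours = [m for _, m in by_distance[:k]]
        # Per-label share of neighbours that disagree with instance i.
        row = [
            sum(Y[m][j] != Y[i][j] for m in neighbours) / len(neighbours)
            for j in range(len(Y[i]))
        ]
        scores.append(row)
    return scores


# Toy data: two clusters; the last instance sits in the first cluster
# but carries label 0, which none of its neighbours have.
X = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [1.0, 1.0], [1.1, 1.0], [0.05, 0.05]]
Y = [[0, 1], [0, 1], [0, 1], [1, 0], [1, 0], [1, 1]]

scores = local_label_disagreement(X, Y, k=3)
print(scores[5])  # disagreement of the last instance, per label
```

A score near 0 indicates that the instance agrees with its neighbourhood for that label, while a score near 1 marks an instance surrounded by the opposite class. A global imbalance ratio would not distinguish these two cases.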

Acknowledgements

Bin Liu is supported by the China Scholarship Council (CSC) under Grant CSC No. 201708500095.

Author information

Authors and Affiliations

School of Informatics, Aristotle University of Thessaloniki, 54124, Thessaloniki, Greece
Bin Liu & Grigorios Tsoumakas

Corresponding author

Correspondence to Bin Liu.

Editor information

Editors and Affiliations

Leuphana University, Lüneburg, Germany
Ulf Brefeld
IRISA/Inria, Rennes, France
Elisa Fromont
University of Würzburg, Würzburg, Germany
Andreas Hotho
Leiden University, Leiden, The Netherlands
Arno Knobbe
ETH Zurich, Zurich, Switzerland
Marloes Maathuis
Institut National des Sciences Appliquées, Villeurbanne, France
Céline Robardet

Cite this paper

Liu, B., Tsoumakas, G. (2020). Synthetic Oversampling of Multi-label Data Based on Local Label Distribution. In: Brefeld, U., Fromont, E., Hotho, A., Knobbe, A., Maathuis, M., Robardet, C. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2019. Lecture Notes in Computer Science (LNAI), vol 11907. Springer, Cham. https://doi.org/10.1007/978-3-030-46147-8_11

DOI: https://doi.org/10.1007/978-3-030-46147-8_11
Published: 30 April 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-46146-1
Online ISBN: 978-3-030-46147-8
eBook Packages: Computer Science, Computer Science (R0)
