research-article

Correlated Multi-label Classification with Incomplete Label Space and Class Imbalance

Authors:

Paul J. Kennedy

ACM Transactions on Intelligent Systems and Technology (TIST), Volume 10, Issue 5

Article No.: 56, Pages 1 - 26

https://doi.org/10.1145/3342512

Published: 05 September 2019

Abstract

Multi-label classification is the problem of identifying the multiple labels or categories of new observations on the basis of labeled training data. Multi-labeled data pose several challenges, including class imbalance, label correlation, incomplete multi-label matrices, and noisy and irrelevant features. In this article, we propose an integrated multi-label classification approach with incomplete label space and class imbalance (ML-CIB) that simultaneously trains the multi-label classification model and addresses these challenges. Because a complete label vector is difficult to obtain for every instance in real-world data, the model learns a new label matrix and captures new label correlations. We also propose a label regularization to handle class imbalance in the new label matrix, and we incorporate an ℓ₁-norm regularizer into the objective function to select relevant, sparse features. A multi-label feature selection method (ML-CIB-FS) is presented as a variant of ML-CIB to demonstrate the efficacy of the proposed method in selecting relevant features. ML-CIB is formulated as a constrained objective function, which we solve with the accelerated proximal gradient method. Finally, extensive experiments are conducted on 19 regular-scale and large-scale imbalanced multi-labeled datasets. The promising results show that our method significantly outperforms state-of-the-art methods.
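The abstract says ML-CIB is optimized with the accelerated proximal gradient method and uses an ℓ₁-norm regularizer to select sparse features. The full ML-CIB objective (its learned label matrix, label-correlation and label-imbalance regularizers, and constraints) is not reproduced on this page, so the following is only a minimal sketch of the generic technique on a simplified, assumed objective, min over W of ½‖XW − Y‖²_F + λ‖W‖₁, where X is the instance-by-feature matrix and Y the instance-by-label matrix. Each iteration takes a gradient step on the smooth term, applies soft-thresholding (the proximal operator of the ℓ₁ norm), and adds Nesterov momentum, as in FISTA. The helper names (`fista_l1`, `soft_threshold`, `objective`, `rank_features`) are hypothetical, and ranking features by the ℓ₂ norm of their row of W is an assumed stand-in for the feature-selection step of ML-CIB-FS, not the authors' procedure.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Dense product a @ b for list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def soft_threshold(v: float, t: float) -> float:
    """Proximal operator of t * |v|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def objective(x: Matrix, y: Matrix, w: Matrix, lam: float) -> float:
    """Simplified, assumed objective: 0.5 * ||XW - Y||_F^2 + lam * ||W||_1."""
    xw = matmul(x, w)
    fit = 0.5 * sum((xw[i][j] - y[i][j]) ** 2
                    for i in range(len(y)) for j in range(len(y[0])))
    return fit + lam * sum(abs(v) for row in w for v in row)


def fista_l1(x: Matrix, y: Matrix, lam: float, iters: int = 1000) -> Matrix:
    """Accelerated proximal gradient (FISTA-style) for the objective above.

    Hypothetical helper: it omits ML-CIB's learned label matrix,
    label-correlation and imbalance regularizers, and constraints.
    """
    d, q = len(x[0]), len(y[0])
    xt = transpose(x)
    # ||X||_F^2 >= ||X||_2^2, the Lipschitz constant of the gradient,
    # so 1 / lip is a safe (if conservative) step size.
    lip = sum(v * v for row in x for v in row)
    w = [[0.0] * q for _ in range(d)]
    z = [row[:] for row in w]  # extrapolated (momentum) point
    t = 1.0
    for _ in range(iters):
        resid = matmul(x, z)
        for i in range(len(y)):
            for j in range(q):
                resid[i][j] -= y[i][j]
        grad = matmul(xt, resid)  # gradient X^T (XZ - Y) of the smooth term
        # Proximal (soft-thresholding) step on the l1 term.
        w_new = [[soft_threshold(z[a][b] - grad[a][b] / lip, lam / lip)
                  for b in range(q)] for a in range(d)]
        # Nesterov momentum update.
        t_new = (1.0 + (1.0 + 4.0 * t * t) ** 0.5) / 2.0
        beta = (t - 1.0) / t_new
        z = [[w_new[a][b] + beta * (w_new[a][b] - w[a][b]) for b in range(q)]
             for a in range(d)]
        w, t = w_new, t_new
    return w


def rank_features(w: Matrix) -> List[int]:
    """Assumed feature score: l2 norm of each feature's row of W, largest first."""
    norms = [sum(v * v for v in row) ** 0.5 for row in w]
    return sorted(range(len(w)), key=lambda i: -norms[i])


if __name__ == "__main__":
    # Toy data: label j equals feature j (j = 0, 1); feature 2 is orthogonal
    # to both labels and to features 0 and 1, so its row of W stays zero.
    X = [[1.0, 0.0, 1.0], [1.0, 0.0, -1.0], [0.0, 1.0, 1.0],
         [0.0, 1.0, -1.0], [1.0, 1.0, 0.0]]
    Y = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 1.0]]
    W = fista_l1(X, Y, lam=0.1)
    print(W)
    print(rank_features(W))
```

The step size 1/‖X‖²_F is a conservative choice that avoids computing the spectral norm of X; Beck and Teboulle's FISTA also admits a backtracking line search that typically allows larger steps. Because the ℓ₁ proximal step sets small coefficients exactly to zero, features whose rows of W are entirely zero can be discarded.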

Cited By

Du G, Zhang J, Zhang N, Wu H, Wu P, Li S (2024). Semi-supervised imbalanced multi-label classification with label propagation. Pattern Recognition (110358). Online publication date: Mar-2024.
https://doi.org/10.1016/j.patcog.2024.110358

García-Pedrajas N, Cuevas-Muñoz J, de Haro-García A (2024). Evolutionary simultaneous under and oversampling of instances for dealing with class-imbalance datasets in multilabel problems. Applied Soft Computing 159 (111618). Online publication date: Jul-2024.
https://doi.org/10.1016/j.asoc.2024.111618

Braytee A, Liu W (2024). Robust multi-label feature learning-based dual space. International Journal of Data Science and Analytics 17:4 (373-387). Online publication date: 13-Jan-2024.
https://doi.org/10.1007/s41060-023-00496-4

Index Terms

Correlated Multi-label Classification with Incomplete Label Space and Class Imbalance
1. Computing methodologies
  1. Machine learning
    1. Learning paradigms
      1. Supervised learning
        1. Supervised learning by classification

Recommendations

Multi-Label Feature Selection using Correlation Information
CIKM '17: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management

High-dimensional multi-labeled data contain instances, where each instance is associated with a set of class labels and has a large number of noisy and irrelevant features. Feature selection has been shown to have great benefits in improving the ...

MULFE: Multi-Label Learning via Label-Specific Feature Space Ensemble

In multi-label learning, label correlations commonly exist in the data. Such correlation not only provides useful information, but also imposes significant challenges for multi-label learning. Recently, label-specific feature embedding has been proposed ...

Semi-supervised multi-label classification using incomplete label information

Highlights: An inductive semi-supervised method called Smile is proposed for multi-label classification using incomplete label information.

Abstract: Classifying multi-label instances using incompletely labeled instances is one of the fundamental tasks in multi-label learning. Most existing methods regard this task as supervised weak-label learning problem and assume sufficient ...

Information

Published In

ACM Transactions on Intelligent Systems and Technology Volume 10, Issue 5

Special Section on Advances in Causal Discovery and Inference and Regular Papers

September 2019

314 pages

ISSN:2157-6904

EISSN:2157-6912

DOI:10.1145/3360733

Editor:
Yu Zheng
JD Finance, China

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 05 September 2019

Accepted: 01 June 2019

Revised: 01 June 2019

Received: 01 October 2018

Published in TIST Volume 10, Issue 5

Qualifiers

Research-article
Research
Refereed

Bibliometrics

Article Metrics

Total Citations: 20

Total Downloads: 682

Downloads (Last 12 months): 83

Downloads (Last 6 weeks): 8

Reflects downloads up to 26 Jul 2024

Citations

Chen T, Tosello G, Calaon M (2024). Multi-label oxide classification in float-zone silicon crystal growth using transfer learning and asymmetric loss. Journal of Intelligent Manufacturing. Online publication date: 1-Feb-2024.
https://doi.org/10.1007/s10845-023-02302-1

Wan Q, Guo W, Wang Y (2024). SGBGAN: minority class image generation for class-imbalanced datasets. Machine Vision and Applications 35:2. Online publication date: 29-Jan-2024.
https://dl.acm.org/doi/10.1007/s00138-023-01506-y

Liu Y, Liu C, Song J, Yang X, Xu T, Wang P (2023). Multi-Scale Annulus Clustering for Multi-Label Classification. Mathematics 11:8 (1969). Online publication date: 21-Apr-2023.
https://doi.org/10.3390/math11081969

Du G, Zhang J, Jiang M, Long J, Lin Y, Li S, Tan K (2023). Graph-Based Class-Imbalance Learning With Label Enhancement. IEEE Transactions on Neural Networks and Learning Systems 34:9 (6081-6095). Online publication date: Oct-2023.
https://doi.org/10.1109/TNNLS.2021.3133262

Ji X, Tan A, Wu W, Gu S (2023). Multi-label classification with weak labels by learning label correlation and label regularization. Applied Intelligence 53:17 (20110-20133). Online publication date: 1-Sep-2023.
https://dl.acm.org/doi/10.1007/s10489-023-04562-z

Zhao T, Zhang Y, Pedrycz W (2022). Robust Multi-Label Classification with Enhanced Global and Local Label Correlation. Mathematics 10:11 (1871). Online publication date: 30-May-2022.
https://doi.org/10.3390/math10111871

Sun L, Yin T, Ding W, Qian Y, Xu J (2022). Feature Selection With Missing Labels Using Multilabel Fuzzy Neighborhood Rough Sets and Maximum Relevance Minimum Redundancy. IEEE Transactions on Fuzzy Systems 30:5 (1197-1211). Online publication date: 1-May-2022.
https://dl.acm.org/doi/10.1109/TFUZZ.2021.3053844
