Correlated Multi-label Classification with Incomplete Label Space and Class Imbalance

Published: 05 September 2019

    Abstract

    Multi-label classification is the problem of identifying the multiple labels or categories of new observations based on labeled training data. Multi-labeled data poses several challenges, including class imbalance, label correlation, incomplete multi-label matrices, and noisy and irrelevant features. In this article, we propose an integrated multi-label classification approach with incomplete label space and class imbalance (ML-CIB) that trains the multi-label classification model and addresses the aforementioned challenges simultaneously. Because it is difficult to find a complete label vector for each instance in real-world data, the model learns a new label matrix and captures new label correlations. We also propose a label regularization to handle the multi-label imbalance in the new labels, and an ℓ1-norm regularization is incorporated in the objective function to select the relevant sparse features. A multi-label feature selection method (ML-CIB-FS) is presented as a variant of the proposed ML-CIB to show the efficacy of the proposed method in selecting the relevant features. ML-CIB is formulated as a constrained objective function, and we use the accelerated proximal gradient method to solve the resulting optimization problem. Finally, extensive experiments are conducted on 19 regular-scale and large-scale imbalanced multi-labeled datasets. The results show that our method significantly outperforms state-of-the-art methods.
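
The abstract names two generic ingredients, an ℓ1 penalty for sparse feature selection and an accelerated proximal gradient solver. The sketch below illustrates how those two pieces usually fit together on a deliberately simplified problem: multi-output least squares with an ℓ1 penalty on the weight matrix. It is not the ML-CIB objective, which also includes the learned label matrix, label correlations, label regularization, and constraints that are not reproduced here. The function names `soft_threshold` and `apg_l1`, the squared loss, and the parameters `lam` and `n_iter` are all assumptions made for illustration.

```python
import numpy as np


def soft_threshold(Z, tau):
    """Proximal operator of tau * ||Z||_1 (element-wise shrinkage toward zero)."""
    return np.sign(Z) * np.maximum(np.abs(Z) - tau, 0.0)


def apg_l1(X, Y, lam=0.1, n_iter=200):
    """Accelerated proximal gradient for the illustrative problem
        min_W 0.5 * ||X W - Y||_F^2 + lam * ||W||_1
    X: (n, d) feature matrix, Y: (n, q) label matrix. Assumes X is not all zeros.
    """
    d, q = X.shape[1], Y.shape[1]
    # Lipschitz constant of the gradient of the smooth term:
    # the squared spectral norm of X.
    lipschitz = np.linalg.norm(X, 2) ** 2
    W = np.zeros((d, q))   # current iterate
    V = W.copy()           # extrapolated (momentum) point
    t = 1.0
    for _ in range(n_iter):
        grad = X.T @ (X @ V - Y)                       # gradient at V
        W_next = soft_threshold(V - grad / lipschitz,  # gradient step, then prox
                                lam / lipschitz)
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        V = W_next + ((t - 1.0) / t_next) * (W_next - W)  # momentum update
        W, t = W_next, t_next
    return W


def objective(X, Y, W, lam):
    """Value of the illustrative objective, useful for monitoring progress."""
    return 0.5 * np.sum((X @ W - Y) ** 2) + lam * np.sum(np.abs(W))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((60, 20))
    W_true = np.zeros((20, 4))
    W_true[:3] = rng.standard_normal((3, 4))  # only 3 informative features
    Y = X @ W_true
    W_hat = apg_l1(X, Y, lam=1.0)
    # Rows of W_hat with larger norm point to features the penalty retained.
    print(np.round(np.linalg.norm(W_hat, axis=1), 3))
```

In a feature selection setting such as the one the abstract describes for ML-CIB-FS, one common way to use such a solution is to rank features by the row norms of the learned weight matrix; whether the paper ranks features this way is not stated in the abstract.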

      Published In

      ACM Transactions on Intelligent Systems and Technology, Volume 10, Issue 5
      Special Section on Advances in Causal Discovery and Inference and Regular Papers
      September 2019
      314 pages
      ISSN: 2157-6904
      EISSN: 2157-6912
      DOI: 10.1145/3360733

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 05 September 2019
      Accepted: 01 June 2019
      Revised: 01 June 2019
      Received: 01 October 2018
      Published in TIST Volume 10, Issue 5

      Author Tags

      1. Multi-label classification
      2. class imbalance
      3. label correlation
      4. multi-label feature selection

      Qualifiers

      • Research-article
      • Research
      • Refereed

      Article Metrics

      • Downloads (Last 12 months): 83
      • Downloads (Last 6 weeks): 8
      Reflects downloads up to 26 Jul 2024

      Cited By

      • (2024) Semi-supervised imbalanced multi-label classification with label propagation. Pattern Recognition (110358). DOI: 10.1016/j.patcog.2024.110358. Online publication date: Mar-2024
      • (2024) Evolutionary simultaneous under and oversampling of instances for dealing with class-imbalance datasets in multilabel problems. Applied Soft Computing 159 (111618). DOI: 10.1016/j.asoc.2024.111618. Online publication date: Jul-2024
      • (2024) Robust multi-label feature learning-based dual space. International Journal of Data Science and Analytics 17:4 (373-387). DOI: 10.1007/s41060-023-00496-4. Online publication date: 13-Jan-2024
      • (2024) Multi-label oxide classification in float-zone silicon crystal growth using transfer learning and asymmetric loss. Journal of Intelligent Manufacturing. DOI: 10.1007/s10845-023-02302-1. Online publication date: 1-Feb-2024
      • (2024) SGBGAN: minority class image generation for class-imbalanced datasets. Machine Vision and Applications 35:2. DOI: 10.1007/s00138-023-01506-y. Online publication date: 29-Jan-2024
      • (2023) Multi-Scale Annulus Clustering for Multi-Label Classification. Mathematics 11:8 (1969). DOI: 10.3390/math11081969. Online publication date: 21-Apr-2023
      • (2023) Graph-Based Class-Imbalance Learning With Label Enhancement. IEEE Transactions on Neural Networks and Learning Systems 34:9 (6081-6095). DOI: 10.1109/TNNLS.2021.3133262. Online publication date: Oct-2023
      • (2023) Multi-label classification with weak labels by learning label correlation and label regularization. Applied Intelligence 53:17 (20110-20133). DOI: 10.1007/s10489-023-04562-z. Online publication date: 1-Sep-2023
      • (2022) Robust Multi-Label Classification with Enhanced Global and Local Label Correlation. Mathematics 10:11 (1871). DOI: 10.3390/math10111871. Online publication date: 30-May-2022
      • (2022) Feature Selection With Missing Labels Using Multilabel Fuzzy Neighborhood Rough Sets and Maximum Relevance Minimum Redundancy. IEEE Transactions on Fuzzy Systems 30:5 (1197-1211). DOI: 10.1109/TFUZZ.2021.3053844. Online publication date: 1-May-2022
