DOI: 10.1145/3132847.3132858
research-article

Multi-Label Feature Selection using Correlation Information

Published: 06 November 2017

Abstract

In high-dimensional multi-labeled data, each instance is associated with a set of class labels and is described by a large number of features, many of them noisy or irrelevant. Feature selection has been shown to greatly improve classification performance in machine learning. In multi-label learning, selecting features that are discriminative across multiple labels raises several challenges: interdependent labels, label correlations that differ from instance to instance, correlated features, and missing or flawed labels. This work is part of a project at the Tumour Bank of The Children's Hospital at Westmead (TB-CHW), Australia, to explore the genomics of childhood leukaemia. In this paper, we propose CMFS (Correlated- and Multi-label Feature Selection), a method based on non-negative matrix factorization (NMF) that performs feature selection while addressing the aforementioned challenges. A major advantage of our approach is that it exploits the correlation information contained in features, labels and instances to select the features relevant to multiple labels. Furthermore, $\ell_{2,1}$-norm regularization is incorporated in the objective function to perform feature selection by imposing sparsity on the rows of the feature matrix. CMFS decomposes the data and multi-label matrices into a low-dimensional space. To solve the objective function, we propose an efficient iterative optimization algorithm with guaranteed convergence. Finally, extensive experiments are conducted on high-dimensional multi-labeled datasets. The experimental results demonstrate that our method significantly outperforms state-of-the-art multi-label feature selection methods.
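The role of the $\ell_{2,1}$-norm mentioned in the abstract can be illustrated with a small sketch. It is not the CMFS algorithm: it only assumes a hypothetical, already learned feature matrix `W` with one row per feature, and shows how the norm is computed and how features could be ranked by the Euclidean norm of their rows. The names `row_norms`, `l21_norm` and `top_k_features` are illustrative and do not come from the paper.

```python
import math

def row_norms(W):
    """Euclidean norm of each row of W (one row per feature)."""
    return [math.sqrt(sum(x * x for x in row)) for row in W]

def l21_norm(W):
    """l2,1-norm: the sum of the Euclidean norms of the rows of W."""
    return sum(row_norms(W))

def top_k_features(W, k):
    """Indices of the k rows with the largest norm, largest first."""
    norms = row_norms(W)
    # sorted() is stable, so ties keep their original feature order.
    return sorted(range(len(W)), key=lambda i: -norms[i])[:k]

# Hypothetical 4-feature, 2-component matrix (assumed values, not real data).
W = [[3.0, 4.0],
     [0.0, 0.0],
     [1.0, 0.0],
     [0.0, 2.0]]

print(l21_norm(W))           # 5 + 0 + 1 + 2 = 8.0
print(top_k_features(W, 2))  # [0, 3]
```

Because the penalty sums whole-row norms, minimising it tends to drive entire rows toward zero, which is why a row such as the second one above can be read as a discarded feature.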


Published In

CIKM '17: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management
November 2017
2604 pages
ISBN: 9781450349185
DOI: 10.1145/3132847

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. high dimensional data
  2. multi-label classification
  3. multi-label feature selection
  4. new application

Qualifiers

  • Research-article

Conference

CIKM '17
Acceptance Rates

CIKM '17 Paper Acceptance Rate: 171 of 855 submissions, 20%
Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%

Cited By

  • (2024) Evolutionary Sparse Coding and Graph Regularisation for Embedded Multi-label Feature Selection. Proceedings of the Genetic and Evolutionary Computation Conference Companion (271-274). DOI: 10.1145/3638530.3654094. Online publication date: 14-Jul-2024
  • (2024) Learning Accurate Label-Specific Features From Partially Multilabeled Data. IEEE Transactions on Neural Networks and Learning Systems 35:8 (10436-10450). DOI: 10.1109/TNNLS.2023.3241921. Online publication date: Aug-2024
  • (2024) Dual Sparse Structured Subspaces and Graph Regularisation for Particle Swarm Optimisation-Based Multi-Label Feature Selection. IEEE Computational Intelligence Magazine 19:1 (36-50). DOI: 10.1109/MCI.2023.3327841. Online publication date: Feb-2024
  • (2024) Learning correlation information for multi-label feature selection. Pattern Recognition 145 (109899). DOI: 10.1016/j.patcog.2023.109899. Online publication date: Jan-2024
  • (2024) Multi-label feature selection based on nonlinear mapping. Information Sciences (121168). DOI: 10.1016/j.ins.2024.121168. Online publication date: Jul-2024
  • (2024) Label relaxation and shared information for multi-label feature selection. Information Sciences (120662). DOI: 10.1016/j.ins.2024.120662. Online publication date: Apr-2024
  • (2024) Multi-label feature selection with global and local label correlation. Expert Systems with Applications 246 (123198). DOI: 10.1016/j.eswa.2024.123198. Online publication date: Jul-2024
  • (2024) Multi-label feature selection via adaptive dual-graph optimization. Expert Systems with Applications 243 (122884). DOI: 10.1016/j.eswa.2023.122884. Online publication date: Jun-2024
  • (2024) Robust multi-label feature learning-based dual space. International Journal of Data Science and Analytics 17:4 (373-387). DOI: 10.1007/s41060-023-00496-4. Online publication date: 13-Jan-2024
  • (2024) Weakly supervised multi-label feature selection based on shared subspace. International Journal of Machine Learning and Cybernetics. DOI: 10.1007/s13042-024-02426-7. Online publication date: 11-Nov-2024
