
Online multi-label streaming feature selection based on neighborhood rough set

Published: 01 December 2018

Highlights

A new neighborhood relation is proposed to effectively solve the problem of granularity selection in neighborhood rough set.
We generalize the classical neighborhood rough set model to fit multi-label learning and present a novel measure to compute the positive region (see the sketch following these highlights).
We propose a new feature selection framework that handles online streaming feature selection and multi-label feature selection simultaneously.
Experiments on ten benchmark datasets from different application scenarios show that the proposed method is competitive with state-of-the-art multi-label feature selection algorithms.
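
To make the first two highlights concrete, here is a brief formal sketch in standard neighborhood rough set notation. The radius rule and the multi-label relaxation below are our illustrative reading of "maximum-nearest-neighbor" granulation and of the multi-label positive region, not definitions quoted from the paper.

Given instances $U = \{x_1, \dots, x_n\}$, a feature subset $B$, and a distance $\Delta_B$, the neighborhood granule of $x_i$ is
\[
\delta_B(x_i) = \{\, x_j \in U \mid \Delta_B(x_i, x_j) \le \delta \,\}.
\]
A parameter-free radius in the spirit of maximum-nearest-neighbor granulation (an assumption on our part) is the largest nearest-neighbor distance,
\[
\delta = \max_{x_i \in U} \; \min_{x_j \in U,\, j \ne i} \Delta_B(x_i, x_j),
\]
which guarantees every granule contains at least one neighbor and thus sidesteps manual granularity selection. In the single-label model the positive region is $\mathrm{POS}_B(D) = \{\, x_i \mid \delta_B(x_i) \subseteq [x_i]_D \,\}$; one natural multi-label relaxation (again illustrative) admits $x_i$ when every neighbor shares at least one label with it:
\[
\mathrm{POS}_B(L) = \{\, x_i \in U \mid \forall x_j \in \delta_B(x_i) :\; L(x_i) \cap L(x_j) \ne \emptyset \,\}.
\]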

Abstract

Multi-label feature selection has attracted intensive attention in many big data applications. However, traditional multi-label feature selection methods generally ignore a real-world scenario: features often flow into the model one by one over time. To address this problem, we develop a novel online multi-label streaming feature selection method based on neighborhood rough set that selects a feature subset containing strongly relevant and non-redundant features. The main motivation is that data mining based on neighborhood rough set does not require any prior knowledge of the feature space structure. Moreover, neighborhood rough set handles mixed data without breaking the neighborhood and order structure of the data. In this paper, we first introduce the maximum-nearest-neighbor of an instance to granulate all instances, which solves the problem of granularity selection in neighborhood rough set, and then generalize the single-label neighborhood rough set model to fit multi-label learning. Meanwhile, an online multi-label streaming feature selection framework, comprising online importance selection and online redundancy update, is presented. Under this framework, we propose a criterion for selecting features that are important relative to the currently selected features, and design a bound on pairwise correlations between features under the label set to filter out redundant features. An empirical study on a series of benchmark datasets demonstrates that the proposed method outperforms other state-of-the-art multi-label feature selection methods.
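
To illustrate the two-phase framework the abstract describes, the following is a minimal, self-contained Python sketch of online importance selection followed by online redundancy update. It is not the authors' implementation: the label-overlap positive region and the dispensability test below are simplified stand-ins for the paper's neighborhood-rough-set measures and its pairwise-correlation bound, and all function names are our own.

    import numpy as np

    def positive_region_size(X, Y, idx):
        """Fraction of instances whose neighborhood under features `idx` is
        label-consistent (every neighbor shares >= 1 label) -- a simplified
        stand-in for the paper's multi-label positive region."""
        if not idx:
            return 0.0
        Xs = X[:, idx]
        D = np.linalg.norm(Xs[:, None, :] - Xs[None, :, :], axis=2)
        np.fill_diagonal(D, np.inf)
        # Radius = largest nearest-neighbor distance, so every instance has
        # at least one neighbor (our reading of max-nearest-neighbor).
        delta = D.min(axis=1).max()
        n = len(X)
        consistent = 0
        for i in range(n):
            nbrs = np.where(D[i] <= delta)[0]
            if all((Y[i] & Y[j]).any() for j in nbrs):
                consistent += 1
        return consistent / n

    def stream_select(X, Y, arrival_order, eps=1e-6):
        """Online streaming feature selection: features arrive one at a time."""
        selected = []
        for f in arrival_order:
            # Phase 1: online importance selection -- keep f only if it
            # enlarges the positive region relative to the current subset.
            gain = (positive_region_size(X, Y, selected + [f])
                    - positive_region_size(X, Y, selected))
            if gain <= eps:
                continue
            selected.append(f)
            # Phase 2: online redundancy update -- drop any earlier feature
            # that the arrival of f has made dispensable.
            for g in list(selected[:-1]):
                rest = [h for h in selected if h != g]
                if (positive_region_size(X, Y, rest)
                        >= positive_region_size(X, Y, selected) - eps):
                    selected = rest
        return selected

    # Toy usage: 60 instances, 8 streaming features, 4 labels.
    rng = np.random.default_rng(0)
    X = rng.random((60, 8))
    Y = rng.random((60, 4)) > 0.5
    print(stream_select(X, Y, arrival_order=list(range(8))))

A real implementation would likely replace the dispensability loop with the paper's bound on pairwise feature correlations under the label set, which filters redundant features without repeatedly recomputing the positive region.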




Published In

Pattern Recognition, Volume 84, Issue C, December 2018, 371 pages

Publisher

Elsevier Science Inc., United States


Author Tags

1. Online feature selection
2. Multi-label learning
3. Neighborhood rough set
4. Granularity

Qualifiers

• Research-article


      Cited By

• (2024) Performance Analysis of the Ada-Boost Algorithm for Classification of Hypertension Risk with Clinical Imbalanced Dataset. Procedia Computer Science 234:C, 645–653. https://doi.org/10.1016/j.procs.2024.03.050. Online publication date: 17-Jul-2024.
• (2024) Learning implicit labeling-importance and label correlation for multi-label feature selection with streaming labels. Pattern Recognition 147:C. https://doi.org/10.1016/j.patcog.2023.110081. Online publication date: 4-Mar-2024.
• (2024) Gradient-based multi-label feature selection considering three-way variable interaction. Pattern Recognition 145:C. https://doi.org/10.1016/j.patcog.2023.109900. Online publication date: 1-Jan-2024.
• (2024) Learning correlation information for multi-label feature selection. Pattern Recognition 145:C. https://doi.org/10.1016/j.patcog.2023.109899. Online publication date: 1-Jan-2024.
• (2024) Feature selection considering feature relevance, redundancy and interactivity for neighbourhood decision systems. Neurocomputing 596:C. https://doi.org/10.1016/j.neucom.2024.128092. Online publication date: 1-Sep-2024.
• (2024) An incremental feature selection approach for dynamic feature variation. Neurocomputing 570:C. https://doi.org/10.1016/j.neucom.2023.127138. Online publication date: 12-Apr-2024.
• (2024) Partial multi-label learning via robust feature selection and relevance fusion optimization. Knowledge-Based Systems 286:C. https://doi.org/10.1016/j.knosys.2023.111365. Online publication date: 17-Apr-2024.
• (2024) Feature selection for multi-labeled data based on label enhancement technique and mutual information. Information Sciences 679:C. https://doi.org/10.1016/j.ins.2024.121113. Online publication date: 1-Sep-2024.
• (2024) Feature selection based on self-information combining double-quantitative class weights and three-order approximation accuracies in neighborhood rough sets. Information Sciences 657:C. https://doi.org/10.1016/j.ins.2023.119945. Online publication date: 1-Feb-2024.
• (2024) Incremental reduction of imbalanced distributed mixed data based on k-nearest neighbor rough set. International Journal of Approximate Reasoning 172:C. https://doi.org/10.1016/j.ijar.2024.109218. Online publication date: 1-Sep-2024.
