Feature Selection: A Data Perspective

Published: 06 December 2017

Abstract

    Feature selection, as a data preprocessing strategy, has proven effective and efficient in preparing data (especially high-dimensional data) for various data-mining and machine-learning problems. The objectives of feature selection include building simpler and more comprehensible models, improving data-mining performance, and preparing clean, understandable data. The recent proliferation of big data has presented substantial challenges and opportunities for feature selection. In this survey, we provide a comprehensive and structured overview of recent advances in feature selection research. Motivated by current challenges and opportunities in the era of big data, we revisit feature selection research from a data perspective and review representative feature selection algorithms for conventional data, structured data, heterogeneous data, and streaming data. Methodologically, to emphasize the differences and similarities of most existing feature selection algorithms for conventional data, we categorize them into four main groups: similarity-based, information-theoretic, sparse-learning-based, and statistical methods. To facilitate and promote research in this community, we also present an open-source feature selection repository that contains most of the popular feature selection algorithms (http://featureselection.asu.edu/), and we use it to illustrate how feature selection algorithms can be evaluated. We conclude with a discussion of open problems and challenges that deserve more attention in future research.
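    To make the evaluation workflow mentioned in the abstract concrete, the snippet below is a minimal sketch of our own (not taken from the survey or its repository) of the standard filter-style protocol, written with scikit-learn: score features on the training split only, keep the top k, and judge selection quality by the accuracy of a downstream classifier on held-out data. The dataset, the choice of k = 10, the mutual-information criterion, and the logistic-regression classifier are all illustrative assumptions.

    ```python
    # Hypothetical evaluation sketch: select features on the training split,
    # then measure downstream classification accuracy on a held-out split.
    from sklearn.datasets import load_breast_cancer
    from sklearn.feature_selection import SelectKBest, mutual_info_classif
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline

    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Keep the 10 features with the highest estimated mutual information with
    # the labels (an information-theoretic filter criterion), then fit a
    # simple classifier on the reduced feature set.
    pipe = make_pipeline(
        SelectKBest(mutual_info_classif, k=10),
        LogisticRegression(max_iter=5000),
    )
    pipe.fit(X_train, y_train)
    print("held-out accuracy, 10 selected features:", pipe.score(X_test, y_test))
    ```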

    Published In

    ACM Computing Surveys, Volume 50, Issue 6 (November 2018), 752 pages
    ISSN: 0360-0300; EISSN: 1557-7341
    DOI: 10.1145/3161158
    Editor: Sartaj Sahni

    Publisher

    Association for Computing Machinery, New York, NY, United States

    Publication History

    • Received: 01 September 2016
    • Revised: 01 July 2017
    • Accepted: 01 August 2017
    • Published: 06 December 2017, in CSUR Volume 50, Issue 6

    Author Tags

    • Feature selection

    Qualifiers

    • Survey
    • Research
    • Refereed
