DOI: 10.1145/3394486.3403200
Research article

Leveraging Model Inherent Variable Importance for Stable Online Feature Selection

Published: 20 August 2020

Abstract

Feature selection can be a crucial factor in obtaining robust and accurate predictions. Online feature selection models, however, operate under considerable restrictions; they need to efficiently extract salient input features based on a bounded set of observations, while enabling robust and accurate predictions. In this work, we introduce FIRES, a novel framework for online feature selection. The proposed feature weighting mechanism leverages the importance information inherent in the parameters of a predictive model. By treating model parameters as random variables, we can penalize features with high uncertainty and thus generate more stable feature sets. Our framework is generic in that it leaves the choice of the underlying model to the user. Strikingly, experiments suggest that the model complexity has only a minor effect on the discriminative power and stability of the selected feature sets. In fact, using a simple linear model, FIRES obtains feature sets that compete with state-of-the-art methods, while dramatically reducing computation time. In addition, experiments show that the proposed framework is clearly superior in terms of feature selection stability.
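
To make the abstract's mechanism concrete, the following is a minimal, hypothetical sketch of uncertainty-penalized online feature weighting in the spirit of FIRES. It is not the authors' reference implementation: the Gaussian parameter model, the gradient updates, and the weight formula mu^2 - penalty * sigma^2 are simplifying assumptions chosen for illustration.

```python
import numpy as np

class UncertaintyPenalizedSelector:
    """Illustrative online feature weighter in the spirit of FIRES.

    Each weight of a logistic model is treated as a Gaussian random
    variable N(mu, sigma^2); features whose weights are both small in
    expectation and highly uncertain are penalized, which keeps the
    selected feature set stable over the stream. Hypothetical sketch,
    not the authors' reference implementation.
    """

    def __init__(self, n_features, lr=0.01, penalty=1.0):
        self.mu = np.zeros(n_features)    # expected parameter values
        self.sigma = np.ones(n_features)  # parameter uncertainty (std. dev.)
        self.lr = lr                      # learning rate for online updates
        self.penalty = penalty            # importance/uncertainty trade-off

    def partial_fit(self, x, y):
        """One online update from a single observation (x, y in {0, 1})."""
        pred = 1.0 / (1.0 + np.exp(-self.mu @ x))  # expected prediction
        err = y - pred
        self.mu += self.lr * err * x               # likelihood gradient step
        # Features that keep being observed become less uncertain over time.
        self.sigma = np.maximum(self.sigma * (1.0 - self.lr * np.abs(x)), 1e-6)

    def weights(self):
        # Reward expected importance, penalize uncertainty.
        return self.mu ** 2 - self.penalty * self.sigma ** 2

    def select(self, k):
        """Indices of the k currently most important stable features."""
        return np.argsort(self.weights())[-k:]


# Usage on a toy stream: only the first 3 of 10 features are informative.
rng = np.random.default_rng(0)
sel = UncertaintyPenalizedSelector(n_features=10)
for _ in range(5000):
    x = rng.normal(size=10)
    y = int(x[:3].sum() > 0)
    sel.partial_fit(x, y)
print(np.sort(sel.select(3)))  # typically recovers features [0 1 2]
```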


Published In

KDD '20: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
August 2020
3664 pages
ISBN: 9781450379984
DOI: 10.1145/3394486

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. data streams
  2. feature selection
  3. stability
  4. uncertainty


Conference

KDD '20

Acceptance Rates

Overall acceptance rate: 1,133 of 8,635 submissions (13%)


Bibliometrics & Citations

Article Metrics

  • Downloads (last 12 months): 40
  • Downloads (last 6 weeks): 10

Reflects downloads up to 22 Dec 2024

Cited By
  • (2024) Deep Neural Networks and Tabular Data: A Survey. IEEE Transactions on Neural Networks and Learning Systems, 35(6), 7499-7519. https://doi.org/10.1109/TNNLS.2022.3229161
  • (2024) Employing Two-Dimensional Word Embedding for Difficult Tabular Data Stream Classification. Machine Learning and Knowledge Discovery in Databases: Research Track and Demo Track, 73-89. https://doi.org/10.1007/978-3-031-70371-3_5
  • (2023) Online Feature Screening for Data Streams With Concept Drift. IEEE Transactions on Knowledge and Data Engineering, 35(11), 11693-11707. https://doi.org/10.1109/TKDE.2022.3232752
  • (2022) Change Detection for Local Explainability in Evolving Data Streams. Proceedings of the 31st ACM International Conference on Information & Knowledge Management, 706-716. https://doi.org/10.1145/3511808.3557257
  • (2022) Online Feature Selection for Efficient Learning in Networked Systems. IEEE Transactions on Network and Service Management, 19(3), 2885-2898. https://doi.org/10.1109/TNSM.2022.3180936
  • (2022) Dynamic Model Tree for Interpretable Data Stream Learning. 2022 IEEE 38th International Conference on Data Engineering (ICDE), 2562-2574. https://doi.org/10.1109/ICDE53745.2022.00237
  • (2021) Learning Parameter Distributions to Detect Concept Drift in Data Streams. 2020 25th International Conference on Pattern Recognition (ICPR), 9452-9459. https://doi.org/10.1109/ICPR48806.2021.9412499
  • (2021) WinnowML: Stable Feature Selection for Maximizing Prediction Accuracy of Time-Based System Modeling. 2021 IEEE International Conference on Big Data (Big Data), 3031-3041. https://doi.org/10.1109/BigData52589.2021.9671602
