DOI: 10.1145/3394486.3403200
Research article

Leveraging Model Inherent Variable Importance for Stable Online Feature Selection

Published: 20 August 2020

Abstract

Feature selection can be a crucial factor in obtaining robust and accurate predictions. Online feature selection models, however, operate under considerable restrictions; they need to efficiently extract salient input features based on a bounded set of observations, while enabling robust and accurate predictions. In this work, we introduce FIRES, a novel framework for online feature selection. The proposed feature weighting mechanism leverages the importance information inherent in the parameters of a predictive model. By treating model parameters as random variables, we can penalize features with high uncertainty and thus generate more stable feature sets. Our framework is generic in that it leaves the choice of the underlying model to the user. Strikingly, experiments suggest that the model complexity has only a minor effect on the discriminative power and stability of the selected feature sets. In fact, using a simple linear model, FIRES obtains feature sets that compete with state-of-the-art methods, while dramatically reducing computation time. In addition, experiments show that the proposed framework is clearly superior in terms of feature selection stability.
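
To make the abstract's mechanism concrete, the following is a minimal, hypothetical sketch of uncertainty-penalized online feature weighting in the spirit of FIRES. It is not the authors' reference implementation: the Gaussian parameter model, the gradient updates, and the weight formula mu^2 - penalty * sigma^2 are simplifying assumptions chosen for illustration.

```python
import numpy as np

class UncertaintyPenalizedSelector:
    """Illustrative online feature weighter in the spirit of FIRES.

    Each weight of a logistic model is treated as a Gaussian random
    variable N(mu, sigma^2); features whose weights are both small in
    expectation and highly uncertain are penalized, which keeps the
    selected feature set stable over the stream. Hypothetical sketch,
    not the authors' reference implementation.
    """

    def __init__(self, n_features, lr=0.01, penalty=1.0):
        self.mu = np.zeros(n_features)    # expected parameter values
        self.sigma = np.ones(n_features)  # parameter uncertainty (std. dev.)
        self.lr = lr                      # learning rate for online updates
        self.penalty = penalty            # importance/uncertainty trade-off

    def partial_fit(self, x, y):
        """One online update from a single observation (x, y in {0, 1})."""
        pred = 1.0 / (1.0 + np.exp(-self.mu @ x))  # expected prediction
        err = y - pred
        self.mu += self.lr * err * x               # likelihood gradient step
        # Features that keep being observed become less uncertain over time.
        self.sigma = np.maximum(self.sigma * (1.0 - self.lr * np.abs(x)), 1e-6)

    def weights(self):
        # Reward expected importance, penalize uncertainty.
        return self.mu ** 2 - self.penalty * self.sigma ** 2

    def select(self, k):
        """Indices of the k currently most important stable features."""
        return np.argsort(self.weights())[-k:]


# Usage on a toy stream: only the first 3 of 10 features are informative.
rng = np.random.default_rng(0)
sel = UncertaintyPenalizedSelector(n_features=10)
for _ in range(5000):
    x = rng.normal(size=10)
    y = int(x[:3].sum() > 0)
    sel.partial_fit(x, y)
print(np.sort(sel.select(3)))  # typically recovers features [0 1 2]
```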


Published In

KDD '20: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
August 2020
3664 pages
ISBN: 9781450379984
DOI: 10.1145/3394486

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. data streams
  2. feature selection
  3. stability
  4. uncertainty


Conference

KDD '20

Acceptance Rates

Overall acceptance rate: 1,133 of 8,635 submissions (13%)


Bibliometrics & Citations

Article Metrics

  • Downloads (last 12 months): 40
  • Downloads (last 6 weeks): 10

Reflects downloads up to 22 Dec 2024

Cited By
  • (2024) Deep Neural Networks and Tabular Data: A Survey. IEEE Transactions on Neural Networks and Learning Systems, 35(6), 7499-7519. https://doi.org/10.1109/TNNLS.2022.3229161
  • (2024) Employing Two-Dimensional Word Embedding for Difficult Tabular Data Stream Classification. Machine Learning and Knowledge Discovery in Databases: Research Track and Demo Track, 73-89. https://doi.org/10.1007/978-3-031-70371-3_5
  • (2023) Online Feature Screening for Data Streams With Concept Drift. IEEE Transactions on Knowledge and Data Engineering, 35(11), 11693-11707. https://doi.org/10.1109/TKDE.2022.3232752
  • (2022) Change Detection for Local Explainability in Evolving Data Streams. Proceedings of the 31st ACM International Conference on Information & Knowledge Management, 706-716. https://doi.org/10.1145/3511808.3557257
  • (2022) Online Feature Selection for Efficient Learning in Networked Systems. IEEE Transactions on Network and Service Management, 19(3), 2885-2898. https://doi.org/10.1109/TNSM.2022.3180936
  • (2022) Dynamic Model Tree for Interpretable Data Stream Learning. 2022 IEEE 38th International Conference on Data Engineering (ICDE), 2562-2574. https://doi.org/10.1109/ICDE53745.2022.00237
  • (2021) Learning Parameter Distributions to Detect Concept Drift in Data Streams. 2020 25th International Conference on Pattern Recognition (ICPR), 9452-9459. https://doi.org/10.1109/ICPR48806.2021.9412499
  • (2021) WinnowML: Stable Feature Selection for Maximizing Prediction Accuracy of Time-Based System Modeling. 2021 IEEE International Conference on Big Data (Big Data), 3031-3041. https://doi.org/10.1109/BigData52589.2021.9671602
