DOI: 10.1145/1363686.1363909
research-article

Deciding what to observe next: adaptive variable selection for regression in multivariate data streams

Published: 16 March 2008

Abstract

Variable selection can be valuable in the analysis of streaming data with costly measurements, as in intensive care monitoring or battery-powered sensor networks. In the presence of drift, selections must be constantly revised, calling for adaptive variable selection schemes. An important and novel problem arises from the fact that non-selected variables become missing variables, which induces bias upon subsequent decisions. Here, we consider adaptive variable selection in the context of linear regression, using only a fraction of the available regressors per timepoint. We suggest a scheme that fits a multivariate Gaussian over a sliding window using the EM algorithm and selects which variables to observe next using the Lasso algorithm. We experiment with simulated and real data to demonstrate that very high prediction accuracy may be retained using as little as 10% of the data.
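
The two steps named in the abstract (a Gaussian fitted by EM over a window with unobserved entries, then a Lasso step to choose what to observe next) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `em_gaussian`, `lasso_from_moments` and `select_next`, the penalty `lam`, the iteration counts, and the choice to solve the Lasso by coordinate descent directly on the estimated covariance are all assumptions made for illustration. Unobserved entries are encoded as NaN, the response is stored in the last column, and every column is assumed to be observed at least once in the window.

```python
import numpy as np

def em_gaussian(X, n_iter=30):
    """EM estimate of mean and covariance; NaN marks an unobserved entry.
    Assumes every column is observed at least once in the window."""
    n, d = X.shape
    mu = np.nanmean(X, axis=0)
    sigma = np.diag(np.nanvar(X, axis=0) + 1e-6)
    for _ in range(n_iter):
        sum_x = np.zeros(d)
        sum_xx = np.zeros((d, d))
        for row in X:
            obs = ~np.isnan(row)
            mis = ~obs
            x_hat = row.copy()
            cov_add = np.zeros((d, d))
            if mis.any():
                if obs.any():
                    # E-step: conditional mean and covariance of missing given observed
                    s_oo = sigma[np.ix_(obs, obs)]
                    s_mo = sigma[np.ix_(mis, obs)]
                    gain = s_mo @ np.linalg.pinv(s_oo)
                    x_hat[mis] = mu[mis] + gain @ (row[obs] - mu[obs])
                    cov_add[np.ix_(mis, mis)] = sigma[np.ix_(mis, mis)] - gain @ s_mo.T
                else:
                    x_hat[mis] = mu[mis]
                    cov_add = sigma.copy()
            sum_x += x_hat
            sum_xx += np.outer(x_hat, x_hat) + cov_add
        # M-step: update moments from the expected sufficient statistics
        mu = sum_x / n
        sigma = sum_xx / n - np.outer(mu, mu)
    return mu, sigma

def lasso_from_moments(sigma_xx, sigma_xy, lam, n_iter=200):
    """Coordinate descent for 0.5*b'Sxx b - b'Sxy + lam*||b||_1."""
    beta = np.zeros(len(sigma_xy))
    for _ in range(n_iter):
        for j in range(len(beta)):
            # Partial residual correlation with coordinate j held out
            r = sigma_xy[j] - sigma_xx[j] @ beta + sigma_xx[j, j] * beta[j]
            beta[j] = np.sign(r) * max(abs(r) - lam, 0.0) / sigma_xx[j, j]
    return beta

def select_next(window, k, lam=0.05):
    """window rows are [x_1..x_p, y]; returns up to k regressor indices
    with the largest nonzero Lasso coefficients."""
    _, sigma = em_gaussian(window)
    beta = lasso_from_moments(sigma[:-1, :-1], sigma[:-1, -1], lam)
    ranked = np.argsort(-np.abs(beta))
    return [int(j) for j in ranked[:k] if beta[j] != 0.0]
```

In a streaming setting one would call `select_next` on the most recent window at each timepoint, observe only the returned regressors, append the new row (with NaN elsewhere) and drop the oldest row. The sketch ignores the exploration issue the abstract raises: variables that are never selected are never re-observed, so their estimated covariances cannot track drift.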

Published In

SAC '08: Proceedings of the 2008 ACM symposium on Applied computing
March 2008
2586 pages
ISBN: 9781595937537
DOI: 10.1145/1363686
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. EM algorithm
  2. exploration-exploitation
  3. lasso
  4. sensor networks
  5. variable selection

Qualifiers

  • Research-article

Conference

SAC '08: The 2008 ACM Symposium on Applied Computing
March 16 - 20, 2008
Fortaleza, Ceara, Brazil

Acceptance Rates

Overall Acceptance Rate 1,650 of 6,669 submissions, 25%

Cited By

  • (2019) Sensor Fusion Used in Applications for Hand Rehabilitation: A Systematic Review. IEEE Sensors Journal, 19:10 (3581-3592). DOI: 10.1109/JSEN.2019.2897083. Online publication date: 15-May-2019
  • (2015) An efficient adaptive preprocessing mechanism for streaming sensor data. 2015 IEEE 9th International Conference on Intelligent Systems and Control (ISCO) (1-6). DOI: 10.1109/ISCO.2015.7282266. Online publication date: Jan-2015
  • (2014) Open challenges for data stream mining research. ACM SIGKDD Explorations Newsletter, 16:1 (1-10). DOI: 10.1145/2674026.2674028. Online publication date: 25-Sep-2014
  • (2008) Online optimization for variable selection in data streams. Proceedings of the 2008 conference on ECAI 2008: 18th European Conference on Artificial Intelligence (132-136). DOI: 10.5555/1567281.1567314. Online publication date: 27-Jun-2008
  • (2008) Simulating Dynamic Covariance Structures for Testing the Adaptive Behavior of Variable Selection Algorithms (Invited Paper). Proceedings of the Tenth International Conference on Computer Modeling and Simulation (52-57). DOI: 10.1109/UKSIM.2008.92. Online publication date: 1-Apr-2008
