
The Bayesian backfitting relevance vector machine

Published: 04 July 2004
DOI: 10.1145/1015330.1015358

Abstract

Traditional non-parametric statistical learning techniques are often computationally attractive, but they lack the generalization and model selection abilities of state-of-the-art Bayesian algorithms, which in turn are usually computationally prohibitive. This paper makes several important contributions that allow Bayesian learning to scale to more complex, real-world learning scenarios. First, we show that backfitting, a traditional non-parametric yet highly efficient regression tool, can be derived in a novel formulation within an expectation maximization (EM) framework, and can thus finally be given a probabilistic interpretation. Second, we show that the general framework of sparse Bayesian learning, and in particular the relevance vector machine (RVM), can be derived as a highly efficient algorithm with a Bayesian version of backfitting at its core. As we demonstrate on several regression and classification benchmarks, Bayesian backfitting offers a compelling alternative to current regression methods, especially when the size and dimensionality of the data challenge computational resources.
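For context, the sketch below shows classical Gauss-Seidel backfitting for an additive model, the non-parametric baseline the abstract refers to. It is a minimal illustration only, not the paper's EM-based Bayesian variant (which additionally infers noise and prior variances); the function name backfit_additive and the choice of univariate linear smoothers are assumptions made for brevity.

```python
import numpy as np

def backfit_additive(X, y, n_iter=50, tol=1e-8):
    """Gauss-Seidel backfitting for an additive model y ~ b + sum_d f_d(x_d).

    Each f_d is fit by univariate least squares for brevity; in practice any
    univariate smoother (splines, kernel regression) can be plugged in. Each
    sweep costs O(n*d) and never forms a matrix inverse, which is the
    efficiency the paper's Bayesian formulation inherits.
    """
    n, d = X.shape
    F = np.zeros((n, d))              # current component fits f_d(x_d)
    b = y.mean()                      # global offset
    for _ in range(n_iter):
        F_old = F.copy()
        for j in range(d):
            # Partial residual: remove all components except the j-th.
            r = y - b - F.sum(axis=1) + F[:, j]
            xj = X[:, j]
            F[:, j] = (xj @ r) / (xj @ xj + 1e-12) * xj
        if np.max(np.abs(F - F_old)) < tol:
            break
    return b, F

# Toy usage: recover an additive linear signal from noisy data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 2.0 * X[:, 0] - X[:, 2] + 0.1 * rng.normal(size=200)
b, F = backfit_additive(X, y)
```

The RVM side of the abstract refers to Tipping's sparse Bayesian learning, in which per-weight precision hyperparameters are re-estimated until most diverge and the corresponding basis functions drop out. Below is a hedged sketch of the standard MacKay-style fixed-point updates, assuming a design matrix Phi; the paper's contribution is to replace the O(M^3) posterior covariance inversion inside this loop with a Bayesian backfitting sweep, which this sketch does not include.

```python
import numpy as np

def rvm_fixed_point(Phi, y, n_iter=200):
    """Sparse Bayesian regression via MacKay-style evidence updates.

    alpha[i] is the prior precision of weight i; for irrelevant weights
    alpha[i] grows without bound, yielding sparsity. The O(M^3) inversion
    below is the bottleneck the Bayesian backfitting formulation avoids.
    """
    N, M = Phi.shape
    alpha = np.ones(M)                     # per-weight prior precisions
    beta = 1.0 / np.var(y)                 # observation noise precision
    for _ in range(n_iter):
        # Posterior over weights given current hyperparameters.
        Sigma = np.linalg.inv(np.diag(alpha) + beta * Phi.T @ Phi)
        mu = beta * Sigma @ Phi.T @ y
        gamma = 1.0 - alpha * np.diag(Sigma)   # well-determined parameters
        alpha = gamma / (mu ** 2 + 1e-12)      # re-estimate precisions
        beta = (N - gamma.sum()) / (np.sum((y - Phi @ mu) ** 2) + 1e-12)
    return mu, alpha, beta
```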





Published In

ICML '04: Proceedings of the twenty-first international conference on Machine learning
July 2004, 934 pages
ISBN: 1581138385
DOI: 10.1145/1015330
Conference Chair: Carla Brodley

Publisher

Association for Computing Machinery

New York, NY, United States



Acceptance Rates

Overall acceptance rate: 140 of 548 submissions (26%)


Article Metrics

  • Downloads (last 12 months): 5
  • Downloads (last 6 weeks): 1

Reflects downloads up to 10 Nov 2024.


Cited By

  • (2017) "Probabilistic Model for Robust Affine and Non-Rigid Point Set Matching". IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(2), 371-384. DOI: 10.1109/TPAMI.2016.2545659. Online publication date: 1 Feb 2017.
  • (2016) "Bayesian Sparse Estimation for Background/Foreground Separation". Handbook of Robust Low-Rank and Sparse Matrix Decomposition, 481-498. DOI: 10.1201/b20190-27. Online publication date: 16 Jun 2016.
  • (2016) "Bayesian Sparse Estimation for Background/Foreground Separation". Handbook of Robust Low-Rank and Sparse Matrix Decomposition, 21-1 to 21-18. DOI: 10.1201/b20190-22. Online publication date: 16 Jun 2016.
  • (2014) "The Group Latent Variable Approach to Probit Binary Classifications". IEEE Transactions on Neural Networks and Learning Systems, 25(7), 1277-1286. DOI: 10.1109/TNNLS.2013.2285784. Online publication date: Jul 2014.
  • (2014) "Efficient Bayesian local model learning for control". 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2244-2249. DOI: 10.1109/IROS.2014.6942865. Online publication date: Sep 2014.
  • (2007) "Extended linear models with Gaussian prior on the parameters and adaptive expansion vectors". Proceedings of the 17th International Conference on Artificial Neural Networks, 431-440. DOI: 10.5555/1776814.1776863. Online publication date: 9 Sep 2007.
  • (2007) "Smooth relevance vector machine". Machine Learning, 68(2), 107-135. DOI: 10.1007/s10994-007-5012-z. Online publication date: 1 Aug 2007.
  • (2007) "Extended Linear Models with Gaussian Prior on the Parameters and Adaptive Expansion Vectors". Artificial Neural Networks – ICANN 2007, 431-440. DOI: 10.1007/978-3-540-74690-4_44. Online publication date: 2007.
  • (2006) "Bayesian regression with input noise for high dimensional data". Proceedings of the 23rd International Conference on Machine Learning, 937-944. DOI: 10.1145/1143844.1143962. Online publication date: 25 Jun 2006.
