DOI: 10.1145/1015330.1015358
Article

The Bayesian backfitting relevance vector machine

Published: 04 July 2004

Abstract

Traditional non-parametric statistical learning techniques are often computationally attractive, but lack the generalization and model-selection abilities of state-of-the-art Bayesian algorithms, which in turn are usually computationally prohibitive. This paper makes several important contributions that allow Bayesian learning to scale to more complex, real-world learning scenarios. First, we show that backfitting, a traditional non-parametric yet highly efficient regression tool, can be derived in a novel formulation within an expectation maximization (EM) framework, and can thus finally be given a probabilistic interpretation. Second, we show that the general framework of sparse Bayesian learning, and in particular the relevance vector machine (RVM), can be derived as a highly efficient algorithm with a Bayesian version of backfitting at its core. As we demonstrate on several regression and classification benchmarks, Bayesian backfitting offers a compelling alternative to current regression methods, especially when the size and dimensionality of the data challenge computational resources.
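The abstract only outlines the approach, so a small illustration may help fix ideas. The sketch below implements the standard EM updates for sparse Bayesian (RVM-style) regression that the paper builds on, following the model of Tipping (2001): y = Φw + noise with an independent Gaussian prior w_i ~ N(0, 1/α_i) on each weight. Note that this baseline recomputes a full posterior covariance each iteration, which is exactly the cubic-cost step the paper's Bayesian backfitting formulation is designed to avoid; the function and variable names and the toy data are our own illustrative assumptions, not code from the paper.

```python
# Minimal sketch of sparse Bayesian (RVM-style) regression via EM.
# This is the standard formulation (Tipping, 2001), NOT the paper's
# backfitting-based variant; it is shown only to illustrate the model
# the paper makes efficient. All names here are illustrative.
import numpy as np

def rvm_em(Phi, y, n_iter=200, alpha_cap=1e8):
    """EM updates for y = Phi @ w + noise, with prior w_i ~ N(0, 1/alpha_i)."""
    N, M = Phi.shape
    alpha = np.ones(M)           # per-weight precision hyperparameters
    sigma2 = 0.1 * np.var(y)     # initial noise variance (heuristic)
    for _ in range(n_iter):
        # E-step: Gaussian posterior over weights given alpha, sigma2.
        # The M x M inversion below is the bottleneck that Bayesian
        # backfitting replaces with cheap per-component updates.
        Sigma = np.linalg.inv(Phi.T @ Phi / sigma2 + np.diag(alpha))
        mu = Sigma @ Phi.T @ y / sigma2
        # M-step: re-estimate hyperparameters.
        gamma = 1.0 - alpha * np.diag(Sigma)   # well-determined weight counts
        alpha = np.minimum(1.0 / (mu**2 + np.diag(Sigma)), alpha_cap)
        sigma2 = np.sum((y - Phi @ mu) ** 2) / max(N - gamma.sum(), 1e-12)
    return mu, alpha, sigma2

# Toy usage: recover a sparse weight vector from a random design matrix.
rng = np.random.default_rng(0)
Phi = rng.normal(size=(100, 20))
w_true = np.zeros(20)
w_true[[2, 7, 11]] = [1.5, -2.0, 0.8]
y = Phi @ w_true + 0.1 * rng.normal(size=100)
mu, alpha, sigma2 = rvm_em(Phi, y)
print("relevant columns:", np.where(alpha < 1e4)[0])   # expect [2 7 11]
```

As in any sparse Bayesian model, the precisions α_i of irrelevant basis functions grow without bound during the updates, so their weights are pinned to zero and can be pruned; the paper's contribution is to reach the same fixed points through a backfitting-style EM decomposition that sidesteps the explicit covariance inversion above.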




Published In

ICML '04: Proceedings of the twenty-first international conference on Machine learning
July 2004
934 pages
ISBN: 1581138385
DOI: 10.1145/1015330
  • Conference Chair: Carla Brodley

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 04 July 2004


Qualifiers

  • Article

Acceptance Rates

Overall Acceptance Rate 140 of 548 submissions, 26%


Article Metrics

  • Downloads (last 12 months): 5
  • Downloads (last 6 weeks): 0
Reflects downloads up to 02 Sep 2024

Cited By
  • (2017) "Probabilistic Model for Robust Affine and Non-Rigid Point Set Matching". IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(2), 371-384. DOI: 10.1109/TPAMI.2016.2545659
  • (2016) "Bayesian Sparse Estimation for Background/Foreground Separation". Handbook of Robust Low-Rank and Sparse Matrix Decomposition, 481-498. DOI: 10.1201/b20190-27
  • (2016) "Bayesian Sparse Estimation for Background/Foreground Separation". Handbook of Robust Low-Rank and Sparse Matrix Decomposition, 21-1 to 21-18. DOI: 10.1201/b20190-22
  • (2014) "The Group Latent Variable Approach to Probit Binary Classifications". IEEE Transactions on Neural Networks and Learning Systems, 25(7), 1277-1286. DOI: 10.1109/TNNLS.2013.2285784
  • (2014) "Efficient Bayesian local model learning for control". 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2244-2249. DOI: 10.1109/IROS.2014.6942865
  • (2007) "Extended linear models with Gaussian prior on the parameters and adaptive expansion vectors". Proceedings of the 17th International Conference on Artificial Neural Networks, 431-440. DOI: 10.5555/1776814.1776863
  • (2007) "Smooth relevance vector machine". Machine Learning, 68(2), 107-135. DOI: 10.1007/s10994-007-5012-z
  • (2007) "Extended Linear Models with Gaussian Prior on the Parameters and Adaptive Expansion Vectors". Artificial Neural Networks – ICANN 2007, 431-440. DOI: 10.1007/978-3-540-74690-4_44
  • (2006) "Bayesian regression with input noise for high dimensional data". Proceedings of the 23rd International Conference on Machine Learning, 937-944. DOI: 10.1145/1143844.1143962
