Smooth relevance vector machine: a smoothness prior extension of the RVM

Authors: Alexander Schmolck, Richard Everson

Machine Learning, Volume 68, Issue 2

Pages 107 - 135

https://doi.org/10.1007/s10994-007-5012-z

Published: 01 August 2007

Abstract

Enforcing sparsity constraints has been shown to be an effective and efficient way to obtain state-of-the-art results in regression and classification tasks. Unlike the support vector machine (SVM), the relevance vector machine (RVM) explicitly encodes the criterion of model sparsity as a prior over the model weights. However, the lack of an explicit prior structure over the weight variances means that the degree of sparsity is to a large extent controlled by the choice of kernel (and kernel parameters). This can lead to severe overfitting or oversmoothing, possibly even both at the same time (e.g. for the multiscale Doppler data). We detail an efficient scheme to control sparsity in Bayesian regression by incorporating a flexible noise-dependent smoothness prior into the RVM. We present an empirical evaluation of the effects of the choice of prior structure on a selection of popular data sets and elucidate the link between Bayesian wavelet shrinkage and RVM regression. Our model encompasses the original RVM as a special case, but our empirical results show that in many cases we can surpass RVM performance in goodness of fit and achieved sparsity, as well as in computational performance. The code is freely available.
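The abstract's link between Bayesian wavelet shrinkage and RVM regression is easiest to see in an idealised case. The Python sketch below is not the authors' smooth RVM (its smoothness prior is not specified on this page). It illustrates the original RVM under two assumptions made here for simplicity: the noise variance sigma^2 is known, and the basis Phi is orthonormal (Phi^T Phi = I), as an orthogonal wavelet transform is. Under these assumptions the marginal likelihood separates over the coefficients z_i = phi_i^T t, each distributed as N(0, sigma^2 + 1/alpha_i). Maximising it gives 1/alpha_i = max(z_i^2 - sigma^2, 0), so the posterior mean is w_i = z_i (1 - sigma^2 / z_i^2) when z_i^2 > sigma^2 and 0 otherwise, a garrote-style thresholding rule. The function names `orthonormal_shrinkage` and `rvm_fixed_noise` are hypothetical, and the pruning threshold `alpha_max` is an arbitrary numerical choice. The iterative version applies the standard RVM re-estimation alpha_i <- gamma_i / mu_i^2, which also works for a general (non-orthonormal) basis.

```python
import numpy as np


def orthonormal_shrinkage(z, sigma2):
    """Closed-form RVM posterior mean for an orthonormal basis (hypothetical helper).

    z are the projections Phi^T t and sigma2 is the assumed-known noise variance.
    Maximising N(z_i | 0, sigma2 + 1/alpha_i) gives 1/alpha_i = max(z_i^2 - sigma2, 0).
    """
    z = np.asarray(z, dtype=float)
    prior_var = np.maximum(z ** 2 - sigma2, 0.0)  # 1/alpha_i; 0 means pruned
    # Posterior mean: prior_var / (prior_var + sigma2) * z = z * (1 - sigma2/z^2)_+
    return prior_var / (prior_var + sigma2) * z


def rvm_fixed_noise(Phi, t, sigma2, n_iter=500, alpha_max=1e12):
    """Original RVM regression with a known noise variance (illustrative sketch).

    Iterates alpha_i <- gamma_i / mu_i^2 and drops basis functions whose
    precision alpha_i exceeds alpha_max (a hypothetical pruning threshold).
    """
    Phi = np.asarray(Phi, dtype=float)
    t = np.asarray(t, dtype=float)
    beta = 1.0 / sigma2
    n_basis = Phi.shape[1]
    alpha = np.ones(n_basis)
    keep = np.ones(n_basis, dtype=bool)
    mu = np.zeros(n_basis)
    for _ in range(n_iter):
        P, a = Phi[:, keep], alpha[keep]
        # Gaussian posterior over the retained weights:
        # Sigma = (A + beta P^T P)^-1, mean = beta Sigma P^T t
        Sigma = np.linalg.inv(np.diag(a) + beta * P.T @ P)
        m = beta * Sigma @ P.T @ t
        # gamma_i in [0, 1] measures how well weight i is determined by the data
        gamma = 1.0 - a * np.diag(Sigma)
        with np.errstate(divide="ignore", invalid="ignore"):
            alpha[keep] = gamma / m ** 2
        mu[:] = 0.0
        mu[keep] = m
        # Weights with (near-)infinite precision are pruned from the model
        keep = alpha < alpha_max
        if not keep.any():
            break
    mu[~keep] = 0.0
    return mu, alpha


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Q, _ = np.linalg.qr(rng.normal(size=(4, 4)))  # orthonormal stand-in for a wavelet basis
    t = Q @ np.array([3.0, 0.5, -2.0, 0.1])
    w_iter, _ = rvm_fixed_noise(Q, t, sigma2=1.0)
    w_closed = orthonormal_shrinkage(Q.T @ t, sigma2=1.0)
    print(np.round(w_iter, 4))
    print(np.round(w_closed, 4))
```

On this toy example the two routes should agree: the coefficients with z_i^2 <= sigma^2 (here 0.5 and 0.1) are pruned to exactly zero, while the others are shrunk towards zero rather than kept at their least-squares values.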

Index Terms

Smooth relevance vector machine: a smoothness prior extension of the RVM
1. Computing methodologies
  1. Machine learning
    1. Machine learning approaches
      1. Factorization methods
        Canonical correlation analysis
2. Mathematics of computing
  1. Probability and statistics
    1. Statistical paradigms
      1. Regression analysis

Information

Published In

Machine Learning, Volume 68, Issue 2

August 2007

92 pages

ISSN:0885-6125


Copyright © 2007 Springer Science+Business Media, LLC.

Publisher

Kluwer Academic Publishers

United States

Qualifiers

Article

Bibliometrics

Total Citations: 9

Total Downloads: 0

Downloads (Last 12 months): 0

Downloads (Last 6 weeks): 0

Reflects downloads up to 10 Nov 2024
Cited By

Helgøy, I., Skaug, H., & Li, Y. (2024). Sparse Bayesian learning using TMB (Template Model Builder). Statistics and Computing, 34(5). Online publication date: 28-Aug-2024.
https://dl.acm.org/doi/10.1007/s11222-024-10476-8

Koda, S. (2016). Adaptive Sparse Bayesian Regression with Variational Inference for Parameter Estimation. In Structural, Syntactic, and Statistical Pattern Recognition (pp. 263-273). Online publication date: 29-Nov-2016.
https://dl.acm.org/doi/10.1007/978-3-319-49055-7_24

Blekas, K., & Likas, A. (2014). Sparse regression mixture modeling with the multi-kernel relevance vector machine. Knowledge and Information Systems, 39(2), 241-264. Online publication date: 1-May-2014.
https://dl.acm.org/doi/10.1007/s10115-013-0704-0

Cheng, D., Nguyen, M., Gao, J., & Shi, D. (2013). On the construction of the relevance vector machine based on Bayesian Ying-Yang harmony learning. Neural Networks, 48, 173-179. Online publication date: 1-Dec-2013.
https://dl.acm.org/doi/10.1016/j.neunet.2013.08.005

Lima, C., & Coelho, A. (2011). Kernel machines for epilepsy diagnosis via EEG signal classification. Artificial Intelligence in Medicine, 53(2), 83-95. Online publication date: 1-Oct-2011.
https://dl.acm.org/doi/10.1016/j.artmed.2011.07.003

Tzikas, D., & Likas, A. (2010). An incremental Bayesian approach for training multilayer perceptrons. In Proceedings of the 20th international conference on Artificial neural networks: Part I (pp. 87-96). Online publication date: 15-Sep-2010.
https://dl.acm.org/doi/10.5555/1886351.1886365

Psorakis, I., Damoulas, T., & Girolami, M. (2010). Multiclass relevance vector machines. IEEE Transactions on Neural Networks, 21(10), 1588-1598. Online publication date: 1-Oct-2010.
https://dl.acm.org/doi/10.1109/TNN.2010.2064787

Tzikas, D., Likas, A., & Galatsanos, N. (2009). Sparse Bayesian modeling with adaptive kernel learning. IEEE Transactions on Neural Networks, 20(6), 926-937. Online publication date: 1-Jun-2009.
https://dl.acm.org/doi/10.1109/TNN.2009.2014060

Yuan, J., Bo, L., Wang, K., & Yu, T. (2009). Adaptive spherical Gaussian kernel in sparse Bayesian learning framework for nonlinear regression. Expert Systems with Applications: An International Journal, 36(2), 3982-3989. Online publication date: 1-Mar-2009.
https://dl.acm.org/doi/10.1016/j.eswa.2008.02.055
