
Smooth relevance vector machine: a smoothness prior extension of the RVM

Published: 01 August 2007

Abstract

Enforcing sparsity constraints has been shown to be an effective and efficient way to obtain state-of-the-art results in regression and classification tasks. Unlike the support vector machine (SVM), the relevance vector machine (RVM) explicitly encodes the criterion of model sparsity as a prior over the model weights. However, the lack of an explicit prior structure over the weight variances means that the degree of sparsity is to a large extent controlled by the choice of kernel (and kernel parameters). This can lead to severe overfitting or oversmoothing, possibly even both at the same time (e.g. for the multiscale Doppler data). We detail an efficient scheme to control sparsity in Bayesian regression by incorporating a flexible noise-dependent smoothness prior into the RVM. We present an empirical evaluation of the effects of the choice of prior structure on a selection of popular data sets and elucidate the link between Bayesian wavelet shrinkage and RVM regression. Our model encompasses the original RVM as a special case, but our empirical results show that in many cases it surpasses the RVM in goodness of fit, achieved sparsity and computational performance. The code is freely available.
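The link between wavelet shrinkage and RVM regression can be glimpsed in one standard special case of the original RVM. This is not the smoothness-prior model described above. If the design matrix Φ is orthonormal (for example a Haar wavelet basis), the weights decouple. Tipping's (2001) re-estimation α_i ← γ_i / μ_i² then has a closed-form fixed point: μ_i = z_i (1 − σ²/z_i²) when z_i² > σ², and μ_i = 0 (basis pruned, α_i → ∞) otherwise, where z_i = φ_iᵀt. That is a garrote-like thresholding rule of the kind used in wavelet denoising. The sketch below rests on several assumptions. The noise variance σ² is known and held fixed, whereas the RVM normally re-estimates it. The basis is Haar. The signal length is a power of two. All function names are hypothetical.

```python
import math

SQRT_HALF = math.sqrt(0.5)


def haar_forward(x):
    """Orthonormal Haar transform; len(x) must be a power of two.
    Output order: [scaling, coarsest detail, ..., finest details]."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    approx, details = list(x), []
    while len(approx) > 1:
        pairs = range(len(approx) // 2)
        diff = [(approx[2 * i] - approx[2 * i + 1]) * SQRT_HALF for i in pairs]
        approx = [(approx[2 * i] + approx[2 * i + 1]) * SQRT_HALF for i in pairs]
        details = diff + details  # coarser levels go in front
    return approx + details


def haar_inverse(c):
    """Inverse of haar_forward (the transform is orthonormal)."""
    approx, pos = [c[0]], 1
    while len(approx) < len(c):
        diff = c[pos:pos + len(approx)]
        pos += len(approx)
        nxt = []
        for a, d in zip(approx, diff):
            nxt += [(a + d) * SQRT_HALF, (a - d) * SQRT_HALF]
        approx = nxt
    return approx


def rvm_weight(z, noise_var, iters=1000, tol=1e-12, prune_at=1e12):
    """Posterior-mean weight of one basis function in an RVM with an
    orthonormal design, where z = phi_i^T t. Iterates Tipping's update
    alpha <- gamma / mu^2 with the noise variance held fixed (assumption)."""
    if z == 0.0:
        return 0.0
    beta, alpha = 1.0 / noise_var, 1.0
    for _ in range(iters):
        sigma = 1.0 / (alpha + beta)  # posterior variance of w_i
        mu = beta * sigma * z         # posterior mean of w_i
        gamma = 1.0 - alpha * sigma   # how well-determined w_i is
        new_alpha = gamma / (mu * mu)
        if new_alpha > prune_at:      # alpha -> infinity: basis pruned
            return 0.0
        converged = abs(new_alpha - alpha) <= tol * alpha
        alpha = new_alpha
        if converged:
            break
    return beta * z / (alpha + beta)


def rvm_shrink(z, noise_var):
    """Closed-form fixed point of rvm_weight: a garrote-like rule."""
    if z * z <= noise_var:
        return 0.0
    return z * (1.0 - noise_var / (z * z))


def rvm_haar_denoise(t, noise_var):
    """Fit t with an RVM over the Haar basis; every basis function,
    including the scaling function, gets its own alpha_i."""
    return haar_inverse([rvm_weight(z, noise_var) for z in haar_forward(t)])
```

For example, rvm_haar_denoise([4.0, 4.0, 4.0, 4.0], 1.0) keeps only the scaling coefficient and shrinks it from 8 to 7.875. It returns four values of about 3.9375. Near the threshold (z_i² close to σ²) the iteration converges slowly, which is one reason the closed form is handy for checking.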



Published In

Machine Learning, Volume 68, Issue 2
August 2007
92 pages

Publisher

Kluwer Academic Publishers

United States


Author Tags

  1. Kernel regression
  2. Relevance vector machine
  3. Smoothness prior
  4. Sparse regression

Cited By

  • (2024) Sparse Bayesian learning using TMB (Template Model Builder). Statistics and Computing, 34(5). doi:10.1007/s11222-024-10476-8. Online publication date: 28 Aug 2024.
  • (2016) Adaptive Sparse Bayesian Regression with Variational Inference for Parameter Estimation. Structural, Syntactic, and Statistical Pattern Recognition (pp. 263-273). doi:10.1007/978-3-319-49055-7_24. Online publication date: 29 Nov 2016.
  • (2014) Sparse regression mixture modeling with the multi-kernel relevance vector machine. Knowledge and Information Systems, 39(2), 241-264. doi:10.1007/s10115-013-0704-0. Online publication date: 1 May 2014.
  • (2013) On the construction of the relevance vector machine based on Bayesian Ying-Yang harmony learning. Neural Networks, 48, 173-179. doi:10.1016/j.neunet.2013.08.005. Online publication date: 1 Dec 2013.
  • (2011) Kernel machines for epilepsy diagnosis via EEG signal classification. Artificial Intelligence in Medicine, 53(2), 83-95. doi:10.1016/j.artmed.2011.07.003. Online publication date: 1 Oct 2011.
  • (2010) An incremental Bayesian approach for training multilayer perceptrons. Proceedings of the 20th international conference on Artificial neural networks: Part I (pp. 87-96). doi:10.5555/1886351.1886365. Online publication date: 15 Sep 2010.
  • (2010) Multiclass relevance vector machines. IEEE Transactions on Neural Networks, 21(10), 1588-1598. doi:10.1109/TNN.2010.2064787. Online publication date: 1 Oct 2010.
  • (2009) Sparse Bayesian modeling with adaptive kernel learning. IEEE Transactions on Neural Networks, 20(6), 926-937. doi:10.1109/TNN.2009.2014060. Online publication date: 1 Jun 2009.
  • (2009) Adaptive spherical Gaussian kernel in sparse Bayesian learning framework for nonlinear regression. Expert Systems with Applications: An International Journal, 36(2), 3982-3989. doi:10.1016/j.eswa.2008.02.055. Online publication date: 1 Mar 2009.
