
A framework for evaluating approximation methods for Gaussian process regression

Published: 01 February 2013

Abstract

Gaussian process (GP) predictors are an important component of many Bayesian approaches to machine learning. However, even a straightforward implementation of Gaussian process regression (GPR) requires O(n²) space and O(n³) time for a data set of n examples. Several approximation methods have been proposed, but there is a lack of understanding of the relative merits of the different approximations, and in what situations they are most useful. We recommend assessing the quality of the predictions obtained as a function of the compute time taken, and comparing to standard baselines (e.g., Subset of Data and FITC). We empirically investigate four different approximation algorithms on four different prediction problems, and make our code available to encourage future comparisons.
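
To make the recommended protocol concrete, the following is a minimal Python sketch (an illustration under assumed choices, not the authors' released code) of evaluating the Subset of Data (SoD) baseline by predictive quality as a function of compute time. It assumes a squared-exponential kernel with fixed hyperparameters and a synthetic 1-D data set; the function names, error metrics (standardized MSE and negative log predictive density), and subset sizes are choices made here purely for illustration.

    # Sketch of the evaluation the abstract recommends: run a GP
    # approximation at several compute budgets and record predictive
    # quality against the wall-clock time taken. Kernel and noise
    # hyperparameters are held fixed for brevity (an assumption; in
    # practice they would be learned).
    import time
    import numpy as np

    def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
        """Squared-exponential kernel matrix between row-vector sets A and B."""
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return variance * np.exp(-0.5 * d2 / lengthscale**2)

    def gp_predict(Xtr, ytr, Xte, noise=0.1):
        """Exact GPR: O(n^2) space and O(n^3) time in n = len(Xtr)."""
        K = rbf_kernel(Xtr, Xtr) + noise**2 * np.eye(len(Xtr))
        L = np.linalg.cholesky(K)                     # the O(n^3) step
        alpha = np.linalg.solve(L.T, np.linalg.solve(L, ytr))
        Ks = rbf_kernel(Xte, Xtr)
        mean = Ks @ alpha
        v = np.linalg.solve(L, Ks.T)
        # Predictive variance, including observation noise.
        var = rbf_kernel(Xte, Xte).diagonal() - (v**2).sum(0) + noise**2
        return mean, var

    def sod_curve(Xtr, ytr, Xte, yte, subset_sizes, rng):
        """Subset of Data baseline: accuracy vs. compute time as m grows."""
        for m in subset_sizes:
            idx = rng.choice(len(Xtr), size=m, replace=False)
            t0 = time.perf_counter()
            mean, var = gp_predict(Xtr[idx], ytr[idx], Xte)
            elapsed = time.perf_counter() - t0
            smse = np.mean((mean - yte) ** 2) / np.var(yte)  # standardized MSE
            nlpd = np.mean(0.5 * np.log(2 * np.pi * var)
                           + 0.5 * (yte - mean) ** 2 / var)  # neg. log density
            print(f"m={m:5d}  time={elapsed:7.3f}s  "
                  f"SMSE={smse:.4f}  NLPD={nlpd:.4f}")

    # Hypothetical synthetic data, only to make the sketch runnable.
    rng = np.random.default_rng(0)
    X = rng.uniform(-3, 3, size=(2200, 1))
    y = np.sin(2 * X[:, 0]) + 0.1 * rng.standard_normal(len(X))
    sod_curve(X[:2000], y[:2000], X[2000:], y[2000:], [100, 400, 1600], rng)

Plotting the resulting (time, SMSE) and (time, NLPD) pairs for each approximation method on shared axes gives the quality-versus-compute-time comparison the paper advocates.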

Published In

The Journal of Machine Learning Research, Volume 14, Issue 1
January 2013
3717 pages
ISSN:1532-4435
EISSN:1533-7928

Publisher

JMLR.org

Publication History

Published: 01 February 2013
Published in JMLR Volume 14, Issue 1

Author Tags

  1. FITC
  2. Gaussian process regression
  3. local GP
  4. subset of data

Cited By

  • (2024) Optimal Composite Likelihood Estimation and Prediction for Distributed Gaussian Process Modeling. IEEE Transactions on Pattern Analysis and Machine Intelligence 46(2):1134-1147. DOI: 10.1109/TPAMI.2023.3328378. Online publication date: 1-Feb-2024.
  • (2023) Leveraging locality and robustness to achieve massively scalable Gaussian process regression. Proceedings of the 37th International Conference on Neural Information Processing Systems, pages 18906-18931. DOI: 10.5555/3666122.3666951. Online publication date: 10-Dec-2023.
  • (2023) QR decomposition based low rank approximation for Gaussian process regression. Applied Intelligence 53(23):28924-28936. DOI: 10.1007/s10489-023-05064-8. Online publication date: 1-Dec-2023.
  • (2022) Hyperparameters Adaptive Sharing Based on Transfer Learning for Scalable GPs. 2022 IEEE Congress on Evolutionary Computation (CEC), pages 01-07. DOI: 10.1109/CEC55065.2022.9870288. Online publication date: 18-Jul-2022.
  • (2022) A sparse multi-fidelity surrogate-based optimization method with computational awareness. Engineering with Computers 39(5):3473-3489. DOI: 10.1007/s00366-022-01766-8. Online publication date: 1-Dec-2022.
  • (2022) aphBO-2GP-3B: a budgeted asynchronous parallel multi-acquisition functions for constrained Bayesian optimization on high-performing computing architecture. Structural and Multidisciplinary Optimization 65(4). DOI: 10.1007/s00158-021-03102-y. Online publication date: 1-Apr-2022.
  • (2021) Gaussian processes with skewed Laplace spectral mixture kernels for long-term forecasting. Machine Learning 110(8):2213-2238. DOI: 10.1007/s10994-021-06031-5. Online publication date: 1-Aug-2021.
  • (2021) Online voltage prediction using Gaussian process regression for fault-tolerant photovoltaic standalone applications. Neural Computing and Applications 33(23):16577-16590. DOI: 10.1007/s00521-021-06254-6. Online publication date: 1-Dec-2021.
  • (2020) Examining the Role of Mood Patterns in Predicting Self-Reported Depressive Symptoms. Proceedings of the 12th ACM Conference on Web Science, pages 164-173. DOI: 10.1145/3394231.3397906. Online publication date: 6-Jul-2020.
  • (2020) Fusing Online Gaussian Process-Based Learning and Control for Scanning Quantum Dot Microscopy. 2020 59th IEEE Conference on Decision and Control (CDC), pages 5525-5531. DOI: 10.1109/CDC42340.2020.9304053. Online publication date: 14-Dec-2020.
