DOI: 10.5555/3692070.3692119

Robust and conjugate Gaussian process regression

Published: 21 July 2024

Abstract

To enable closed-form conditioning, a common assumption in Gaussian process (GP) regression is independent and identically distributed Gaussian observation noise. This strong and simplistic assumption is often violated in practice, which leads to unreliable inferences and uncertainty quantification. Unfortunately, existing methods for robustifying GPs break closed-form conditioning, which makes them less attractive to practitioners and significantly more computationally expensive. In this paper, we demonstrate how to perform provably robust and conjugate Gaussian process (RCGP) regression at virtually no additional cost using generalised Bayesian inference. RCGP is particularly versatile as it enables exact conjugate closed-form updates in all settings where standard GPs admit them. To demonstrate its strong empirical performance, we deploy RCGP for problems ranging from Bayesian optimisation to sparse variational Gaussian processes.
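To make the conjugacy point concrete, the following numpy sketch contrasts the standard conjugate GP update with a robustified variant in which each observation carries a weight that inflates its effective noise, so the robust posterior remains a single Cholesky solve. This is a minimal illustration of the kind of robust-yet-conjugate update the abstract describes, not the RCGP estimator itself: the weight function (a soft threshold on deviations from the median) and names such as gp_posterior are hypothetical choices for this sketch.

    import numpy as np

    def rbf_kernel(X1, X2, lengthscale=0.2, variance=1.0):
        # Squared-exponential kernel matrix between two sets of inputs.
        sq = np.sum(X1**2, 1)[:, None] + np.sum(X2**2, 1)[None, :] - 2.0 * X1 @ X2.T
        return variance * np.exp(-0.5 * sq / lengthscale**2)

    def gp_posterior(X, y, Xs, noise=0.01, weights=None):
        # Conjugate GP posterior mean/variance at test inputs Xs.
        # With weights=None this is the standard GP regression update.
        # Per-point weights in (0, 1] inflate the effective noise of
        # down-weighted observations, yet the posterior is still one
        # Cholesky solve, i.e. conjugacy is preserved.
        if weights is None:
            weights = np.ones(len(X))
        K = rbf_kernel(X, X)
        Ks = rbf_kernel(X, Xs)
        noise_diag = np.diag(noise / weights)        # sigma^2 / w_i on the diagonal
        L = np.linalg.cholesky(K + noise_diag)
        alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
        mean = Ks.T @ alpha
        V = np.linalg.solve(L, Ks)
        var = np.diag(rbf_kernel(Xs, Xs) - V.T @ V)
        return mean, var

    rng = np.random.default_rng(0)
    X = np.linspace(0.0, 1.0, 30)[:, None]
    y = np.sin(2.0 * np.pi * X[:, 0]) + 0.1 * rng.standard_normal(30)
    y[5] += 4.0                                      # inject a single outlier

    Xs = np.linspace(0.0, 1.0, 100)[:, None]
    mean_std, _ = gp_posterior(X, y, Xs)             # standard GP: dragged by the outlier
    w = 1.0 / (1.0 + (y - np.median(y))**2)          # crude illustrative weights
    mean_rob, _ = gp_posterior(X, y, Xs, weights=w)  # robustified, same O(n^3) cost

Because the weights enter only through the diagonal noise term, the robust update costs the same as the standard one, which is the "virtually no additional cost" property the abstract highlights.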


Published In

ICML'24: Proceedings of the 41st International Conference on Machine Learning, July 2024, 63010 pages.

Publisher: JMLR.org

Acceptance Rates

Overall acceptance rate: 140 of 548 submissions (26%).
