PAC-Bayesian Generalisation Error Bounds for Gaussian Process Classification
Matthias Seeger;
3(Oct):233-269, 2002.
Abstract
Approximate Bayesian Gaussian process (GP) classification techniques are
powerful non-parametric learning methods, similar in appearance and performance
to support vector machines. Based on simple probabilistic models, they yield
interpretable results and can be embedded in Bayesian frameworks for model
selection, feature selection, etc. In this paper, by applying the PAC-Bayesian
theorem of McAllester (1999a), we prove distribution-free generalisation
error bounds for a wide range of approximate Bayesian GP classification (GPC)
techniques. We also provide a new and much simplified proof of this powerful
theorem, making use of the concept of convex duality, which underpins many
machine learning techniques. We instantiate and test our bounds for two
particular GPC techniques, including a recent sparse method which circumvents
the unfavourable scaling of standard GP algorithms. As is shown in experiments
on a real-world task, the bounds can be very tight for moderate training
sample sizes. To the best of our knowledge, these results provide the tightest
known distribution-free error bounds for approximate Bayesian GPC methods,
giving a strong learning-theoretical justification for the use of these
techniques.
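For orientation, the following is a minimal LaTeX sketch of the kind of PAC-Bayesian bound at issue, stated in the binary-KL form the paper works with; the notation (prior P, posterior Q, training sample size n, confidence level delta) follows standard PAC-Bayes usage and is assumed here rather than quoted from the paper:

  % PAC-Bayesian theorem (binary-KL form, a sharpening of McAllester's
  % original bound): with probability at least 1 - \delta over an i.i.d.
  % training sample of size n, simultaneously for all posteriors Q over
  % classifiers,
  \mathrm{kl}\bigl(\hat{e}_Q \,\|\, e_Q\bigr)
    \;\le\; \frac{\mathrm{KL}(Q \,\|\, P) + \ln\frac{n+1}{\delta}}{n},
  % where \hat{e}_Q is the empirical Gibbs error, e_Q the true Gibbs
  % error, and
  % \mathrm{kl}(q \,\|\, p) = q \ln\frac{q}{p} + (1-q) \ln\frac{1-q}{1-p}
  % is the binary relative entropy.

The left-hand side measures how far the observed error of the Gibbs classifier may lie from its true error; the bound is distribution-free in that it holds without assumptions on the data-generating distribution.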