
Gaussian Processes for Machine Learning

In recent years, Gaussian Processes (GPs) have become increasingly popular in machine learning. A Gaussian Process can be seen as an infinite-dimensional Gaussian distribution defined by a mean function and a covariance function. Using Gaussian Processes for non-linear regression essentially only requires choosing such a covariance function (the mean function is often set to zero because no prior knowledge about it is available). Predicting function values at new points then involves inverting the covariance matrix defined by this kernel. You can use the following Java applet to generate some training points and evaluate the influence of different covariance functions and hyperparameters on the predicted curve:
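As a rough sketch of the computation behind such a demo (not the applet's actual code), the following Python snippet conditions a zero-mean GP with a squared exponential kernel on a few training points and evaluates the predictive mean and variance on a test grid; the training points, kernel parameters and noise level are made-up example values:

```python
import numpy as np

def rbf_kernel(xa, xb, lengthscale=1.0, signal_var=1.0):
    """Squared exponential (RBF) covariance between two sets of 1-d inputs."""
    sqdist = (xa[:, None] - xb[None, :]) ** 2
    return signal_var * np.exp(-0.5 * sqdist / lengthscale**2)

# Hypothetical training points (in the applet these would be clicked by the user).
x_train = np.array([-4.0, -2.0, 0.0, 1.5, 3.0])
y_train = np.sin(x_train)

# Locations at which the predicted curve is evaluated.
x_test = np.linspace(-5.0, 5.0, 100)

noise_var = 1e-2  # assumed observation noise
K = rbf_kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
K_s = rbf_kernel(x_train, x_test)
K_ss = rbf_kernel(x_test, x_test)

# Posterior mean and covariance of the zero-mean GP (this is where the
# inversion of the covariance matrix happens).
K_inv = np.linalg.inv(K)
mean = K_s.T @ K_inv @ y_train      # predicted curve
cov = K_ss - K_s.T @ K_inv @ K_s
std = np.sqrt(np.diag(cov))         # error bars around the curve
```

Changing the kernel function or its hyperparameters changes K, K_s and K_ss, and therefore the predicted curve, which is exactly what the applet visualizes.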



Some (very high-level) Background

A Gaussian Process can be seen as a generalization of the Gaussian distribution. Instead of being specified by a mean and a variance like a Gaussian distribution, it is fully specified by a mean function and a covariance function. This fact is illustrated in Figure 1. Given a query point x and a 'time step' t, the GP delivers the value of the probability density function at that point. The covariance function allows a-priori knowledge, such as the training data of the regression problem, to be incorporated.


Figure 1: Gaussian Distribution (left) vs. Gaussian Process (right)
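In symbols (the standard definition, not notation taken from the figure), a GP over functions f(x) is written as

```latex
f(x) \sim \mathcal{GP}\bigl(m(x),\, k(x, x')\bigr), \qquad
m(x) = \mathbb{E}[f(x)], \qquad
k(x, x') = \mathbb{E}\bigl[(f(x) - m(x))\,(f(x') - m(x'))\bigr],
```

meaning that any finite collection of function values f(x_1), ..., f(x_n) is jointly Gaussian with mean vector (m(x_i)) and covariance matrix (k(x_i, x_j)).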


It is well known that both the conditional and the marginal distribution of a multivariate normally distributed random variable are again normally distributed. This can be used to derive the conditional probability density function (specified by its mean and covariance) from a given joint Gaussian distribution (with known covariance), as described in Figure 2:


Figure 2: Inferring the mean and variance of a conditional Gaussian Distribution
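Written out explicitly (this is the standard conditioning result the figure relies on), if two blocks a and b are jointly Gaussian, then the conditional distribution of a given b is again Gaussian:

```latex
\begin{pmatrix} \mathbf{a} \\ \mathbf{b} \end{pmatrix}
\sim \mathcal{N}\!\left(
\begin{pmatrix} \boldsymbol{\mu}_a \\ \boldsymbol{\mu}_b \end{pmatrix},
\begin{pmatrix} \Sigma_{aa} & \Sigma_{ab} \\ \Sigma_{ba} & \Sigma_{bb} \end{pmatrix}
\right)
\;\Longrightarrow\;
\mathbf{a} \mid \mathbf{b} \sim \mathcal{N}\!\bigl(
\boldsymbol{\mu}_a + \Sigma_{ab}\Sigma_{bb}^{-1}(\mathbf{b} - \boldsymbol{\mu}_b),\;
\Sigma_{aa} - \Sigma_{ab}\Sigma_{bb}^{-1}\Sigma_{ba}
\bigr).
```

In the regression setting, b plays the role of the observed training values and a the role of the function values to be predicted.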


Introducing a new representation, in which the mean values and error bars for the variances are drawn over the dimensions y, reveals how GPs can be used to solve the regression problem. In Figure 3, for example, a 5x5 covariance matrix and a 3-dimensional input vector were used to calculate the 2-dimensional output mean vector and the corresponding variances, which are depicted as error bars.


Figure 3: A new representation depicts the mean of 3 input and 2 output points


The correlation between the data values can be expressed by various covariance functions. In the applet above, for example, you can choose between a radial basis function (RBF, sometimes called the squared exponential kernel), a periodic RBF and a polynomial function. Figure 4 shows an RBF. It can clearly be seen that points lying close together are strongly correlated (their covariance value is close to 1). The form of the RBF is governed by hyperparameters, which can be learned with gradient-based methods. For example, the horizontal lengthscale controls how much data points influence each other depending on their distance: a large horizontal lengthscale means that even points lying far apart are still strongly correlated, whereas a small horizontal lengthscale means that the correlation decays quickly with distance.


Figure 4: A Radial Basis Function (RBF)
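As a formula (one common parameterization of the squared exponential kernel; the symbols are not taken from the applet), with signal variance \sigma_f^2 and horizontal lengthscale \ell:

```latex
k(x, x') = \sigma_f^2 \exp\!\left( -\frac{(x - x')^{2}}{2\,\ell^{2}} \right)
```

The covariance decays with the squared distance between the two inputs, and \ell sets how quickly this decay happens.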


For non-linear dimensionality reduction, the Gaussian Process Latent Variable Model (GPLVM) was developed by Neil Lawrence. To apply this to human pose tracking and style-based inverse kinematics, an enhanced model, the so-called Gaussian Process Dynamical Model (GPDM), was proposed by Wang et al. In addition to the latent-to-pose mapping of the GPLVM (blue), it contains a dynamics mapping (green) in latent space. The model can therefore be described by two GPs. This is illustrated for 3 frames (or time steps) in Figure 5:


Figure 5: GPLVMs (left) vs. GPDMs (right)
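Schematically (following the usual state-space description of such models rather than a formula from the figure), with latent positions x_t and poses y_t, the two mappings are

```latex
x_t = f(x_{t-1}) + n_{x,t} \quad \text{(dynamics mapping in latent space, green)}, \qquad
y_t = g(x_t) + n_{y,t} \quad \text{(latent-to-pose mapping, blue)},
```

where a GP prior is placed over both f and g; marginalizing them out leaves two kernel matrices, one over the latent-space transitions and one over the latent-to-pose mapping.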


Learning a GPLVM or a GPDM involves maximizing the log-posterior probability with respect to the model, which consists of the latent variables and the hyperparameters of the GPs. Hertzmann et al. and Urtasun et al. showed experimentally that a 2-d or 3-d latent space is sufficient to capture an entire pose in a realistic way. GP models proved to exhibit good generalization properties: articulated human body tracking with a 30-d body model can be done with a GPDM trained on only one motion sequence. Figure 6 shows an example of different poses and the corresponding latent space. The red dots indicate training points and correspond to the poses of the articulated body model.


Figure 6: A learned latent space and the corresponding poses (Hertzmann et al.)
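To make the learning objective mentioned above concrete: for the GPLVM part, the data term being maximized has (up to the priors over the latent variables and the hyperparameters) the well-known form

```latex
\ln p(\mathbf{Y} \mid \mathbf{X}, \theta)
= -\frac{D}{2} \ln \lvert \mathbf{K}_X \rvert
  - \frac{1}{2} \operatorname{tr}\!\bigl( \mathbf{K}_X^{-1} \mathbf{Y} \mathbf{Y}^{\top} \bigr)
  - \frac{ND}{2} \ln 2\pi,
```

where Y is the N x D matrix of poses, X the N x q matrix of latent positions (q = 2 or 3 here) and K_X the kernel matrix evaluated on the latent positions; the log-posterior adds the log-priors over X and the hyperparameters, and gradient-based optimization is run over both.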



