Latent Force Models
Mauricio Alvarez
School of Computer Science
University of Manchester
Manchester, UK, M13 9PL
alvarezm@cs.man.ac.uk
David Luengo
Dep. Teorı́a de Señal y Comunicaciones
Universidad Carlos III de Madrid
28911 Leganés, Spain
luengod@ieee.org
Neil D. Lawrence
School of Computer Science
University of Manchester
Manchester, UK, M13 9PL
neill@cs.man.ac.uk
Abstract
Purely data driven approaches for machine
learning present difficulties when data is
scarce relative to the complexity of the model
or when the model is forced to extrapolate.
On the other hand, purely mechanistic approaches need to identify and specify all the
interactions in the problem at hand (which
may not be feasible) and still leave the issue of how to parameterize the system. In
this paper, we present a hybrid approach using Gaussian processes and differential equations to combine data driven modelling with
a physical model of the system. We show how
different, physically-inspired, kernel functions can be developed through sensible, simple, mechanistic assumptions about the underlying system. The versatility of our approach is illustrated with three case studies
from computational biology, motion capture
and geostatistics.
1 Introduction

Traditionally, the main focus in machine learning has been model generation through a data driven paradigm. The usual approach is to combine a data set with a (typically fairly flexible) class of models and, through judicious use of regularization, make useful predictions on previously unseen data. There are two key problems with purely data driven approaches. Firstly, if data is scarce relative to the complexity of the system we may be unable to make accurate predictions on test data. Secondly, if the model is forced to extrapolate, i.e. make predictions in a regime in which data has not been seen yet, performance can be poor.
Purely mechanistic models, i.e. models which are inspired by the underlying physical knowledge of the system, are common in many areas such as chemistry, systems biology, climate modelling and the geophysical sciences. They normally make use of a fairly well characterized physical process that underpins the system, typically represented with a set of differential equations. The purely mechanistic approach leaves us with a different set of problems to those from the data driven approach. In particular, an accurate description of a complex system through a mechanistic modelling paradigm may not be possible: even if all the physical processes can be adequately described, the resulting model could become extremely complex. Identifying and specifying all the interactions might not be feasible, and we would still be faced with the problem of identifying the parameters of the system.

Despite these problems, physically well characterized models retain a major advantage over purely data driven models. A mechanistic model can enable accurate prediction even in regions where there may be no available training data. For example, Pioneer space probes have been able to enter different extraterrestrial orbits despite the absence of data for these orbits.
In this paper we advocate an alternative approach.
Rather than relying on an exclusively mechanistic
or data driven approach we suggest a hybrid system
which involves a (typically overly simplistic) mechanistic model of the system which can easily be augmented
through machine learning techniques. We will start by
considering two dynamical systems, both simple latent
variable models, which incorporate first and second order differential equations. Our inspiration is the work
of (Lawrence et al., 2007; Gao et al., 2008) who encoded a first order differential equation in a Gaussian
process (GP). However, their aim was to construct an
accurate model of transcriptional regulation, whereas
ours is to make use of the mechanistic model to
incorporate salient characteristics of the data (e.g. inertia in a mechanical system) without necessarily associating the components of our mechanistic model with
actual physical components of the system. For example, for a human motion capture dataset we develop
a mechanistic model of motion capture that does not
exactly replicate the physics of human movement, but
nevertheless captures salient features of the movement.
Having shown how first and second order dynamical
systems can be incorporated in a GP, we finally show
how partial differential equations can also be incorporated for modelling systems with multiple inputs.
2 Latent Variables and Physical Systems

From the perspective of machine learning our approach can be seen as a type of latent variable model. In a latent variable model we may summarize a high dimensional data set with a reduced dimensional representation. For example, if our data consists of N points in a Q dimensional space we might seek a linear relationship between the data, Y ∈ R^{N×Q}, and a reduced dimensional representation, F ∈ R^{N×R}, where R < Q. From a probabilistic perspective this involves an assumption that we can represent the data as

$$\mathbf{Y} = \mathbf{F}\mathbf{W} + \mathbf{E}, \qquad (1)$$

where E is a matrix-variate Gaussian noise: each column, ǫ_{:,q} (1 ≤ q ≤ Q), is a multi-variate Gaussian with zero mean and covariance Σ, i.e. ǫ_{:,q} ∼ N(0, Σ). The usual approach to dealing with the unknowns in this model, as undertaken in factor analysis and principal component analysis (PCA), is to integrate out F under a Gaussian prior and optimize with respect to W ∈ R^{R×Q} (although it turns out that for a non-linear variant of the model it can be convenient to do this the other way around, see e.g. (Lawrence, 2005)). If the data has a temporal nature, then the Gaussian prior in the latent space could express a relationship between the rows of F, f_{t_n} = f_{t_{n-1}} + η, where η ∼ N(0, C) and f_{t_n} is the n-th row of F, which we associate with time t_n. This is known as the Kalman filter/smoother. Normally the times, t_n, are taken to be equally spaced, but more generally we can consider a joint distribution for p(F|t), t = [t_1 . . . t_N]^⊤, which has the form of a Gaussian process (GP),

$$p(\mathbf{F}|\mathbf{t}) = \prod_{r=1}^{R} \mathcal{N}\!\left(\mathbf{f}_{:,r}\,|\,\mathbf{0},\, \mathbf{K}_{\mathbf{f}_{:,r},\mathbf{f}_{:,r}}\right),$$

where we have assumed zero mean and independence across the R dimensions of the latent space. The GP makes explicit the fact that the latent variables are functions, {f_r(t)}_{r=1}^{R}, and we have now described them with a process prior. The notation used, f_{:,r}, indicates the r-th column of F, and represents the values of that function for the r-th dimension at the times given by t. The matrix K_{f_{:,r},f_{:,r}} is the covariance function associated to f_r(t) computed at the times given in t. Such a GP can be readily implemented. Given the covariance functions for {f_r(t)} the implied covariance functions for {y_q(t)} are straightforward to derive. In (Teh et al., 2005) this is known as a semi-parametric latent factor model (SLFM), although their main focus is not the temporal case. Historically the Kalman filter approach has been preferred, perhaps because of its linear computational complexity in N. However, recent advances in sparse approximations have made the general GP framework practical (see (Quiñonero Candela and Rasmussen, 2005) for a review).
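To make this concrete, the following sketch (our own illustration, not code from the paper) builds the implied joint covariance over the outputs for the linear mixing in (1) when each column of F has an independent GP prior with an RBF covariance: the block between outputs p and q is the sum over r of W_{rp} W_{rq} K_{f_r}, with the noise covariance added on the diagonal blocks (assuming the noise is independent across output dimensions). Function names are ours.

```python
import numpy as np

def rbf(t, lengthscale):
    """Squared exponential covariance exp(-(t - t')^2 / l^2), as used later in the paper."""
    d = t[:, None] - t[None, :]
    return np.exp(-(d ** 2) / lengthscale ** 2)

def slfm_covariance(t, W, lengthscales, noise_cov):
    """Implied NQ x NQ covariance of the stacked outputs for Y = F W + E with
    independent GP priors on the columns of F (lengthscale l_r for force r)."""
    N, (R, Q) = len(t), W.shape
    Kf = [rbf(t, lengthscales[r]) for r in range(R)]
    K = np.zeros((N * Q, N * Q))
    for p in range(Q):
        for q in range(Q):
            block = sum(W[r, p] * W[r, q] * Kf[r] for r in range(R))
            if p == q:
                block = block + noise_cov  # independent noise per output column
            K[p * N:(p + 1) * N, q * N:(q + 1) * N] = block
    return K

# Example: 3 outputs driven by 2 latent functions at 5 time points.
t = np.linspace(0.0, 1.0, 5)
W = np.array([[1.0, 0.5, -0.3], [0.2, 0.8, 1.1]])      # R x Q mixing matrix
K = slfm_covariance(t, W, lengthscales=[0.3, 1.0], noise_cov=0.01 * np.eye(5))
print(K.shape)  # (15, 15)
```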
So far the model described relies on the latent variables to provide the dynamic information. Our main contribution is to include a further dynamical system with a mechanistic inspiration. We now use a mechanical analogy to introduce it. Consider the following physical interpretation of (1): the latent functions, f_r(t), are R forces and we observe the displacement of Q springs, y_q(t), under those forces. Then we can reinterpret (1) as the force balance equation, YD = FS + Ẽ. Here we have assumed that the forces are acting, for example, through levers, so that we have a matrix of sensitivities, S ∈ R^{R×Q}, and a diagonal matrix of spring constants, D ∈ R^{Q×Q}. The original model is recovered by setting W = SD^{-1} and ǫ̃_{:,q} ∼ N(0, D^⊤ΣD). The model can be extended by assuming that the spring is acting in parallel with a damper and that the system has mass, allowing us to write

$$\mathbf{F}\mathbf{S} = \ddot{\mathbf{Y}}\mathbf{M} + \dot{\mathbf{Y}}\mathbf{C} + \mathbf{Y}\mathbf{D} + \boldsymbol{\epsilon}, \qquad (2)$$

where M and C are diagonal matrices of masses and damping coefficients respectively, Ẏ ∈ R^{N×Q} is the first derivative of Y with respect to time and Ÿ is the second derivative. The second order mechanical system that this model describes will exhibit several characteristics which are impossible to represent in the simpler latent variable model given by (1), such as inertia and resonance. This model is not only appropriate for data from mechanical systems. There are many analogous systems which can also be represented by second order differential equations, e.g. Resistor-Inductor-Capacitor circuits. A unifying characteristic of all these models is that the system is being forced by latent functions, {f_r(t)}_{r=1}^{R}. Hence, we refer to them as latent force models (LFMs).
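To see what kind of data such a model produces, the short sketch below (our own toy example, not taken from the paper) integrates one output of the second order system in (2) for a hand-picked smooth forcing function; an underdamped setting shows the resonant behaviour mentioned above, while an overdamped one does not.

```python
import numpy as np
from scipy.integrate import solve_ivp

# A smooth, hand-picked latent force; in the paper the force has a GP prior.
def latent_force(t):
    return np.sin(2.0 * t) * np.exp(-0.1 * t)

def simulate_output(mass, damping, spring, sensitivity, t_eval):
    """Integrate  m y'' + c y' + d y = s f(t)  for one output of the model in (2)."""
    def rhs(t, state):
        y, ydot = state
        yddot = (sensitivity * latent_force(t) - damping * ydot - spring * y) / mass
        return [ydot, yddot]
    sol = solve_ivp(rhs, (t_eval[0], t_eval[-1]), y0=[0.0, 0.0], t_eval=t_eval)
    return sol.y[0]

t = np.linspace(0.0, 20.0, 200)
underdamped = simulate_output(1.0, 0.2, 4.0, 1.0, t)   # exhibits resonance
overdamped = simulate_output(1.0, 5.0, 1.0, 1.0, t)    # sluggish, non-oscillatory response
print(underdamped[:3], overdamped[:3])
```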
One way of thinking of our model is to consider puppetry. A marionette is a representation of a human
(or animal) controlled by a limited number of inputs
through strings (or rods) attached to the character.
This limited number of inputs can lead to a wide range of character movements. In our model, the data is the movements of the marionette, and the latent forces are the inputs to the system from the puppeteer.

Finally, note that it is of little use to include dynamical models of the type specified in (2) if their effects cannot be efficiently incorporated into the inference process. Fortunately, as we will see in the case studies, for an important class of covariance functions it is analytically tractable to compute the implied covariance functions for {y_q(t)}_{q=1}^{Q}. Then, given the data, a conjugate gradient descent algorithm can be used to obtain the hyperparameters of the model which minimize the negative log-likelihood, and inference is performed based on standard GP regression techniques.
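In practice this is standard GP machinery: assemble the implied output covariance for the current hyperparameters, evaluate the negative log marginal likelihood, and hand it to a gradient-based optimiser. A minimal sketch of ours follows, with a plain RBF kernel standing in for the closed-form latent force covariances derived in the following sections; all function names and example values are ours.

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_marginal_likelihood(log_params, t, y, build_covariance):
    """-log N(y | 0, K(theta) + noise * I); parameters are optimised in log space."""
    params = np.exp(log_params)
    theta, noise_var = params[:-1], params[-1]
    K = build_covariance(t, theta) + noise_var * np.eye(len(y))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return 0.5 * y @ alpha + np.sum(np.log(np.diag(L))) + 0.5 * len(y) * np.log(2 * np.pi)

def rbf_covariance(t, theta):
    """Stand-in kernel; the latent force kernels of Sections 3-5 slot in here."""
    variance, lengthscale = theta
    d = t[:, None] - t[None, :]
    return variance * np.exp(-(d ** 2) / lengthscale ** 2)

rng = np.random.default_rng(0)
t = np.linspace(0.0, 5.0, 40)
y = np.sin(t) + 0.1 * rng.standard_normal(t.size)

# Conjugate gradients, as used in the paper; L-BFGS-B would work equally well here.
result = minimize(neg_log_marginal_likelihood, x0=np.log([1.0, 1.0, 0.1]),
                  args=(t, y, rbf_covariance), method="CG")
print(np.exp(result.x))  # fitted signal variance, lengthscale and noise variance
```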
3 First Order Dynamical System

A single input module is a biological network motif where the transcription of a number of genes is driven by a single transcription factor. In (Barenco et al., 2006) a simple first order differential equation was proposed to model this situation. Then (Lawrence et al., 2007; Gao et al., 2008) suggested that inference of the latent transcription factor concentration should be handled using GPs. In effect their model can be seen as a latent force model based on a first order differential equation with a single latent force. Here we consider the extension of this model to multiple latent forces. As a mechanistic model, this is a severe over-simplification of the physical system: transcription factors are known to interact in a non-linear manner. Despite this we will be able to uncover useful information. Our model is based on the following differential equation,

$$\frac{dy_q(t)}{dt} + D_q y_q(t) = B_q + \sum_{r=1}^{R} S_{rq} f_r(t). \qquad (3)$$
Here the latent forces, f_r(t), represent protein concentration (which is difficult to observe directly), the outputs, y_q(t), are the mRNA abundance levels for different genes, B_q and D_q are respectively the basal transcription rate and the decay rate of the q-th gene, and S_{rq} are coupling constants that quantify the influence of the r-th input on the q-th output (i.e. the sensitivity of gene q to the concentration of protein r). Solving (3) for y_q(t), we obtain

$$y_q(t) = \frac{B_q}{D_q} + \sum_{r=1}^{R} \mathcal{L}_{rq}[f_r](t),$$

where we have ignored transient terms, which are easily included, and the linear operator is given by the following convolution,

$$\mathcal{L}_{rq}[f_r](t) = S_{rq}\exp(-D_q t)\int_0^t f_r(\tau)\exp(D_q\tau)\, d\tau.$$

If each latent force is taken to be independent with a covariance function given by

$$k_{f_r f_r}(t,t') = \exp\!\left(-\frac{(t-t')^2}{\ell_r^2}\right),$$

then we can compute the covariance of the outputs analytically, obtaining (Lawrence et al., 2007)

$$k_{y_p y_q}(t,t') = \sum_{r=1}^{R}\frac{S_{rp}S_{rq}\sqrt{\pi}\,\ell_r}{2}\left[h_{qp}(t',t) + h_{pq}(t,t')\right],$$

where

$$h_{qp}(t',t) = \frac{\exp(\nu_{rq}^2)}{D_p + D_q}\exp(-D_q t')\left[\exp(D_q t)\!\left(\operatorname{erf}\!\left(\frac{t'-t}{\ell_r}-\nu_{rq}\right) + \operatorname{erf}\!\left(\frac{t}{\ell_r}+\nu_{rq}\right)\right) - \exp(-D_p t)\!\left(\operatorname{erf}\!\left(\frac{t'}{\ell_r}-\nu_{rq}\right) + \operatorname{erf}(\nu_{rq})\right)\right].$$

Here erf(x) is the real valued error function, $\operatorname{erf}(x) = \frac{2}{\sqrt{\pi}}\int_0^x \exp(-y^2)\, dy$, and $\nu_{rq} = \ell_r D_q/2$. Additionally, we can compute the cross-covariance between the inputs and outputs,

$$k_{y_q f_r}(t,t') = \frac{S_{rq}\sqrt{\pi}\,\ell_r}{2}\exp(\nu_{rq}^2)\exp(-D_q(t-t'))\left[\operatorname{erf}\!\left(\frac{t'-t}{\ell_r}-\nu_{rq}\right) + \operatorname{erf}\!\left(\frac{t}{\ell_r}+\nu_{rq}\right)\right].$$
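These closed-form expressions translate directly into code. Below is a small sketch of ours (not the authors' implementation) that evaluates the output covariance and the input-output cross-covariance for a single latent force using scipy's error function; for multiple forces one simply sums such terms over r.

```python
import numpy as np
from scipy.special import erf

def h(Dq, Dp, t_prime, t, ell):
    """h_{qp}(t', t) from the first order model, for a single latent force."""
    nu = ell * Dq / 2.0
    first = np.exp(Dq * t) * (erf((t_prime - t) / ell - nu) + erf(t / ell + nu))
    second = np.exp(-Dp * t) * (erf(t_prime / ell - nu) + erf(nu))
    return np.exp(nu ** 2) / (Dp + Dq) * np.exp(-Dq * t_prime) * (first - second)

def k_yy(t, t_prime, Sp, Sq, Dp, Dq, ell):
    """Covariance k_{y_p y_q}(t, t') implied by one RBF latent force."""
    return Sp * Sq * np.sqrt(np.pi) * ell / 2.0 * (h(Dq, Dp, t_prime, t, ell)
                                                   + h(Dp, Dq, t, t_prime, ell))

def k_yf(t, t_prime, Sq, Dq, ell):
    """Cross-covariance k_{y_q f}(t, t') between output q and the latent force."""
    nu = ell * Dq / 2.0
    return (Sq * np.sqrt(np.pi) * ell / 2.0 * np.exp(nu ** 2)
            * np.exp(-Dq * (t - t_prime))
            * (erf((t_prime - t) / ell - nu) + erf(t / ell + nu)))

# Build a small covariance matrix between two outputs on a grid of times.
times = np.linspace(0.0, 5.0, 6)
T, Tp = np.meshgrid(times, times, indexing="ij")
K12 = k_yy(T, Tp, Sp=1.0, Sq=0.5, Dp=0.4, Dq=0.7, ell=1.0)
print(K12.shape)  # (6, 6)
```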
3.1 p53 Data
Our data is from (Barenco et al., 2006), where leukemia cell lines were bombarded with radiation to induce activity of the transcription factor p53. This transcription factor repairs DNA damage and triggers a mechanism which pauses the cell-cycle and potentially terminates the cell. In (Barenco et al., 2006) microarray gene expression levels of known targets of p53 were used to fit a first order differential equation model to the data. The model was then used to provide a ranked list of 50 genes identified as regulated by p53.

Our aim is to determine if there are additional "latent forces" which could better explain the activity of some of these genes. The experimental data consists of measurements of expression levels of 50 genes for three different replicas. Within each replica, there are measurements at seven different time instants. We constructed a latent force model with six latent forces, assuming that each replica was independently produced but fixing the hyperparameters of the kernel across the replicas.¹
We employed a sparse approximation, as proposed in (Alvarez and Lawrence, 2009), with ten inducing points to speed up computation.

¹ The decay rates were assumed equal within replicas. Although this might be an important restriction for this experiment, our purpose in this paper is to expose a general methodology without delving into the details of each experimental setup.
Of the six latent functions, two were automatically switched off by the model. Two further latent functions, shown in Figure 1 as latent forces 1 & 2, were consistent across all replicas: their shapes were time translated versions of the p53 profile as identified by (Barenco et al., 2006; Lawrence et al., 2007; Gao et al., 2008). This time translation allows genes to experience different transcriptional delays, a mechanism not included explicitly in our model, but mimicked by linear mixing of an early and a late signal. The remaining two latent functions were inconsistent across the replicas (see e.g. latent force 3 in Figure 1). They appear to represent processes not directly related to p53. This was backed up by the sensitivity parameters found in the model. The known p53 targets DDB2, p21, SESN1/hPA26, BIK and TNFRSF10b were found to respond to latent forces 1 & 2. Conversely, the genes that were most responsive to latent force 3 were MAP4K4, a gene involved in environmental stress signalling, and FDXR, an electron transfer protein.

Figure 1 (panels (a)-(c): latent force 1 for replicas 1-3; (d)-(f): latent force 2 for replicas 1-3; (g)-(i): latent force 3 for replicas 1-3): (a)-(c) and (d)-(f) show the two latent forces associated with p53 activity. p53 targets are sensitive to a combination of these functions, allowing them to account for transcriptional delays. (g)-(i) show a latent force that was inconsistent across the replicas. It may be associated with cellular processes not directly related to p53.
4 Second Order Dynamical System

In Section 1 we introduced the analogy of a marionette's motion being controlled by a reduced number of forces. Human motion capture data consists of a skeleton and multivariate time courses of angles which summarize the motion. This motion can be modelled with a set of second order differential equations which, due to variations in the centers of mass induced by the movement, are non-linear. The simplification we consider for the latent force model is to linearize these differential equations, resulting in the following second order dynamical system,
$$\frac{d^2 y_q(t)}{dt^2} + C_q\frac{dy_q(t)}{dt} + D_q y_q(t) = B_q + \sum_{r=1}^{R} S_{rq} f_r(t), \qquad (4)$$

where the mass of the system, without loss of generality, is normalized to 1. Whilst (4) is not the correct physical model for our system, it will still be helpful when extrapolating predictions across different motions, as we shall see in the next section. Note also that, although similar to (3), the dynamic behavior of this system is much richer than that of the first order system, since it can exhibit inertia and resonance.

For the motion capture data y_q(t) corresponds to a given observed angle over time, and its derivatives represent angular velocity and acceleration. The system is summarized by the undamped natural frequency, $\omega_{0q} = \sqrt{D_q}$, and the damping ratio, $\zeta_q = C_q/(2\sqrt{D_q})$. Systems with a damping ratio greater than one are said to be overdamped, whereas underdamped systems exhibit resonance and have a damping ratio less than one. For critically damped systems ζ_q = 1, and finally, for undamped systems (i.e. no friction) ζ_q = 0.

Ignoring the initial conditions once more, the solution of (4) is again given by a convolution, with the linear operator now being

$$\mathcal{L}_{rq}[f_r](t) = \frac{S_{rq}}{\omega_q}\exp(-\alpha_q t)\int_0^t f_r(\tau)\exp(\alpha_q\tau)\sin(\omega_q(t-\tau))\, d\tau, \qquad (5)$$

where $\omega_q = \sqrt{4D_q - C_q^2}/2$ and $\alpha_q = C_q/2$. Once again, if we consider a latent force governed by a GP with the RBF covariance function we can solve (5) analytically, obtaining a closed-form expression for the covariance matrix of the outputs,

$$k_{y_p y_q}(t,t') = \sum_{r=1}^{R}\frac{S_{rp}S_{rq}\sqrt{\pi}\,\ell_r}{8\omega_p\omega_q}\, k^{(r)}_{y_p y_q}(t,t').$$

Here $k^{(r)}_{y_p y_q}(t,t')$ can be considered the cross-covariance between the p-th and q-th outputs under the effect of the r-th latent force, and is given by

$$\begin{aligned} k^{(r)}_{y_p y_q}(t,t') = {} & h_r(\tilde{\gamma}_q,\gamma_p,t,t') + h_r(\gamma_p,\tilde{\gamma}_q,t',t) + h_r(\gamma_q,\tilde{\gamma}_p,t,t') + h_r(\tilde{\gamma}_p,\gamma_q,t',t) \\ & - h_r(\tilde{\gamma}_q,\tilde{\gamma}_p,t,t') - h_r(\tilde{\gamma}_p,\tilde{\gamma}_q,t',t) - h_r(\gamma_q,\gamma_p,t,t') - h_r(\gamma_p,\gamma_q,t',t), \end{aligned}$$

where $\gamma_p = \alpha_p + j\omega_p$, $\tilde{\gamma}_p = \alpha_p - j\omega_p$, and

$$h_r(\gamma_q,\gamma_p,t,t') = \frac{\Upsilon_r(\gamma_q,t',t) - \exp(-\gamma_p t)\,\Upsilon_r(\gamma_q,t',0)}{\gamma_p + \gamma_q},$$

with

$$\Upsilon_r(\gamma_q,t,t') = 2\exp\!\left(\frac{\ell_r^2\gamma_q^2}{4}\right)\exp(-\gamma_q(t-t')) - \exp\!\left(-\frac{(t-t')^2}{\ell_r^2}\right)w(jz_{rq}(t)) - \exp\!\left(-\frac{(t')^2}{\ell_r^2}\right)\exp(-\gamma_q t)\,w(-jz_{rq}(0)), \qquad (6)$$

and $z_{rq}(t) = (t-t')/\ell_r - \ell_r\gamma_q/2$. Note that $z_{rq}(t) \in \mathbb{C}$, and $w(jz)$ in (6), for $z \in \mathbb{C}$, denotes Faddeeva's function $w(jz) = \exp(z^2)\operatorname{erfc}(z)$, where erfc(z) is the complex version of the complementary error function, $\operatorname{erfc}(z) = 1 - \operatorname{erf}(z) = \frac{2}{\sqrt{\pi}}\int_z^{\infty}\exp(-v^2)\, dv$.
Faddeeva's function is usually considered the complex equivalent of the error function, since |w(jz)| is bounded whenever the imaginary part of jz is greater than or equal to zero, and it is the key to achieving good numerical stability when computing (6) and its gradients.
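This stability argument is easy to check with scipy, whose wofz routine evaluates Faddeeva's function w(z) directly (the snippet is our own illustration, not code from the paper): the naive product exp(z²)erfc(z) overflows and underflows for moderately large real arguments, while w(jz) remains finite.

```python
import numpy as np
from scipy.special import erfc, wofz

z = 30.0
naive = np.exp(z ** 2) * erfc(z)      # exp(900) overflows, erfc(30) underflows: inf * 0 -> nan
stable = wofz(1j * z).real            # w(jz) = exp(z^2) erfc(z), evaluated stably
print(naive, stable)                  # nan, versus roughly 0.0188
```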
Similarly, the cross-covariance between latent functions and outputs is given by

$$k_{y_q f_r}(t,t') = \frac{\ell_r S_{rq}\sqrt{\pi}}{j4\omega_q}\left[\Upsilon_r(\tilde{\gamma}_q,t,t') - \Upsilon_r(\gamma_q,t,t')\right].$$

A visualization of a covariance matrix with a latent force and three different outputs (overdamped, underdamped and critically damped) is given in Figure 2.

Figure 2: Visualization of the covariance matrix associated with the second order kernel. Three outputs and their correlation with the latent function are shown. Output 1 is underdamped and the natural frequency is observable through the bars of alternating correlation and anti-correlation in the associated portions of the covariance matrix. Output 2 is overdamped; note the more diffuse covariance in comparison to Output 3, which is critically damped.
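For completeness, here is a direct transcription of ours of the expressions above into Python, using scipy's wofz for w(·); it handles a single latent force and makes no attempt at vectorisation or careful scaling, so it should be read as a sketch rather than as the authors' implementation.

```python
import numpy as np
from scipy.special import wofz

def upsilon(gamma, t, t_prime, ell):
    """Upsilon_r(gamma, t, t') from (6), evaluated with Faddeeva's function w."""
    def z(a, b):
        return (a - b) / ell - ell * gamma / 2.0
    return (2.0 * np.exp(ell ** 2 * gamma ** 2 / 4.0) * np.exp(-gamma * (t - t_prime))
            - np.exp(-((t - t_prime) ** 2) / ell ** 2) * wofz(1j * z(t, t_prime))
            - np.exp(-(t_prime ** 2) / ell ** 2) * np.exp(-gamma * t) * wofz(-1j * z(0.0, t_prime)))

def h(gamma_q, gamma_p, t, t_prime, ell):
    """h_r(gamma_q, gamma_p, t, t') as defined above."""
    return (upsilon(gamma_q, t_prime, t, ell)
            - np.exp(-gamma_p * t) * upsilon(gamma_q, t_prime, 0.0, ell)) / (gamma_p + gamma_q)

def k_yy_2nd(t, t_prime, Sp, Sq, Cp, Cq, Dp, Dq, ell):
    """k_{y_p y_q}(t, t') for the second order model with one latent RBF force."""
    alpha_p, alpha_q = Cp / 2.0, Cq / 2.0
    omega_p, omega_q = np.sqrt(4.0 * Dp - Cp ** 2) / 2.0, np.sqrt(4.0 * Dq - Cq ** 2) / 2.0
    g_p, g_q = alpha_p + 1j * omega_p, alpha_q + 1j * omega_q
    gt_p, gt_q = alpha_p - 1j * omega_p, alpha_q - 1j * omega_q
    k_r = (h(gt_q, g_p, t, t_prime, ell) + h(g_p, gt_q, t_prime, t, ell)
           + h(g_q, gt_p, t, t_prime, ell) + h(gt_p, g_q, t_prime, t, ell)
           - h(gt_q, gt_p, t, t_prime, ell) - h(gt_p, gt_q, t_prime, t, ell)
           - h(g_q, g_p, t, t_prime, ell) - h(g_p, g_q, t_prime, t, ell))
    return (Sp * Sq * np.sqrt(np.pi) * ell / (8.0 * omega_p * omega_q) * k_r).real

# Example evaluation for two underdamped outputs sharing the same parameters.
print(k_yy_2nd(1.0, 1.2, Sp=1.0, Sq=1.0, Cp=0.5, Cq=0.5, Dp=3.0, Dq=3.0, ell=1.0))
```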
4.1 Motion Capture Data

Our motion capture data set is from the CMU motion capture data base.² We considered 3 balancing motions (18, 19, 20) from subject 49. The subject starts in a standing position with arms raised, then, over about 10 seconds, he raises one leg in the air and lowers his arms to an outstretched position. Of interest to us was the fact that, whilst motions 18 and 19 are relatively similar, motion 20 contains more dramatic movements. We were interested in training on motions 18 and 19 and testing on the more dramatic movement to assess the model's ability to extrapolate. The data was down-sampled by 32 (from 120 frames per second to just 3.75) and we focused on the subject's left arm. Our objective was to reconstruct the motion of this arm for motion 20 given the angles of the shoulder and the parameters learned from motions 18 and 19 using two latent functions. First, we train the second order differential equation latent force model on motions 18 and 19, treating the sequences as independent but sharing parameters (i.e. the damping coefficients and natural frequencies of the two differential equations associated with each angle were constrained to be the same). Then, for the test data, we condition on the observations of the shoulder's orientation to make predictions for the rest of the arm's angles.

² The CMU Graphics Lab Motion Capture Database was created with funding from NSF EIA-0196217 and is available at http://mocap.cs.cmu.edu.
For comparison, we considered a regression model that directly predicts the angles of the arm given the orientation of the shoulder, using standard independent GPs with RBF covariance functions. Results are summarized in Table 1, with some example plots of the tracks of the angles given in Figure 3.

Table 1: Root mean squared (RMS) angle error for prediction of the left arm's configuration in the motion capture data. Prediction with the latent force model outperforms the prediction with regression for all angles apart from the radius.

Angle              | Latent Force Error | Regression Error
Radius             | 4.11               | 4.02
Wrist              | 6.55               | 6.65
Hand X rotation    | 1.82               | 3.21
Hand Z rotation    | 2.76               | 6.14
Thumb X rotation   | 1.77               | 3.10
Thumb Z rotation   | 2.73               | 6.09

Figure 3 (panels: (a) inferred latent force; (b) wrist; (c) hand X rotation; (d) hand Z rotation; (e) thumb X rotation; (f) thumb Z rotation): (a) Inferred latent force for the motion capture data. The force shown is the weighted sum of the two forces that drive the system. (b)-(f) Predictions from the latent force model (solid line, grey error bars) and from direct regression from the shoulder angles (crosses with stick error bars). For these examples noise is high due to the relatively small length of the bones. Despite this the latent force model does a credible job of capturing the angle, whereas direct regression with independent GPs fails to capture the trends.
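The prediction step in this experiment is ordinary Gaussian process conditioning on the joint covariance over all angles: treat the shoulder angles as the observed block and compute the conditional mean of the remaining angles. A generic sketch of ours follows, with an arbitrary positive definite matrix standing in for the latent force covariance blocks of Section 4.

```python
import numpy as np

def gp_conditional_mean(K_oo, K_po, y_obs, noise_var=1e-4):
    """Conditional mean of unobserved outputs given observed ones under a joint
    zero-mean Gaussian: E[y_pred | y_obs] = K_po (K_oo + noise I)^{-1} y_obs."""
    L = np.linalg.cholesky(K_oo + noise_var * np.eye(len(y_obs)))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs))
    return K_po @ alpha

# Toy illustration: any positive definite joint covariance over two blocks of outputs.
rng = np.random.default_rng(1)
A = rng.standard_normal((10, 10))
K = A @ A.T + 10.0 * np.eye(10)
K_oo, K_po = K[:5, :5], K[5:, :5]      # observed block and cross block
y_obs = rng.standard_normal(5)
print(gp_conditional_mean(K_oo, K_po, y_obs))
```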
5 Partial Differential Equations and Latent Forces

So far we have considered dynamical latent force models based on ordinary differential equations, leading to multioutput Gaussian processes which are functions of a single variable: time. However, the methodology can also be applied in the context of partial differential equations in order to recover multioutput Gaussian processes which are functions of several inputs.

5.1 Diffusion in the Swiss Jura

The Jura data is a set of measurements of concentrations of several heavy metal pollutants collected from topsoil in a 14.5 km² region of the Swiss Jura. We consider a latent function that represents how the pollutants were originally laid down. As time passes, we assume that the pollutants diffuse at different rates, resulting in the concentrations observed in the data set. We therefore consider a simplified version of the diffusion equation, also known as the heat equation,

$$\frac{\partial y_q(\mathbf{x},t)}{\partial t} = \kappa_q\sum_{j=1}^{d}\frac{\partial^2 y_q(\mathbf{x},t)}{\partial x_j^2},$$

where d = 2 is the dimension of x, the measured concentration of each pollutant over space and time is given by y_q(x, t), and the latent function f_r(x) now represents the concentration of pollutants at time zero (i.e. the system's initial condition). The solution to the system (Polyanin, 2002) is then given by

$$y_q(\mathbf{x},t) = \sum_{r=1}^{R} S_{rq}\int_{\mathbb{R}^d} f_r(\mathbf{x}')\, G_q(\mathbf{x},\mathbf{x}',t)\, d\mathbf{x}',$$

where G_q(x, x', t) is the Green's function given as

$$G_q(\mathbf{x},\mathbf{x}',t) = \frac{1}{2^d\pi^{d/2}T_q^{d/2}}\exp\!\left(-\sum_{j=1}^{d}\frac{(x_j - x'_j)^2}{4T_q}\right),$$

with T_q = κ_q t. Again, if we take the latent function to be given by a GP with the RBF covariance function, we can compute the multiple output covariance functions analytically. The covariance function between the output functions is obtained as
$$k_{y_p y_q}(\mathbf{x},\mathbf{x}',t) = \sum_{r=1}^{R}\frac{S_{rp}S_{rq}\,|\mathbf{L}_r|^{1/2}}{|\mathbf{L}_{rp} + \mathbf{L}_{rq} + \mathbf{L}_r|^{1/2}}\exp\!\left[-\frac{1}{2}(\mathbf{x}-\mathbf{x}')^{\top}(\mathbf{L}_{rp} + \mathbf{L}_{rq} + \mathbf{L}_r)^{-1}(\mathbf{x}-\mathbf{x}')\right],$$

where L_{rp}, L_{rq} and L_r are diagonal isotropic matrices with entries 2κ_p t, 2κ_q t and 1/ℓ_r² respectively. The covariance function between the output and latent functions is given by

$$k_{y_q f_r}(\mathbf{x},\mathbf{x}',t) = \frac{S_{rq}\,|\mathbf{L}_r|^{1/2}}{|\mathbf{L}_{rq} + \mathbf{L}_r|^{1/2}}\exp\!\left[-\frac{1}{2}(\mathbf{x}-\mathbf{x}')^{\top}(\mathbf{L}_{rq} + \mathbf{L}_r)^{-1}(\mathbf{x}-\mathbf{x}')\right].$$
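Because all of the matrices involved are diagonal and isotropic, these kernels reduce to scalar expressions that are straightforward to code. The sketch below is our own reading of the formulas above for a single latent force; the function names and the example parameter values are ours.

```python
import numpy as np

def diffusion_kernel(x, x_prime, t, Sp, Sq, kappa_p, kappa_q, ell, d=2):
    """k_{y_p y_q}(x, x', t) for one latent force. All of L_rp, L_rq and L_r are
    isotropic, so determinants and inverses reduce to scalars."""
    Lp, Lq, Lr = 2.0 * kappa_p * t, 2.0 * kappa_q * t, 1.0 / ell ** 2  # diagonal entries
    total = Lp + Lq + Lr
    diff = np.asarray(x) - np.asarray(x_prime)
    quad = float(diff @ diff) / total
    return Sp * Sq * (Lr / total) ** (d / 2.0) * np.exp(-0.5 * quad)

def diffusion_cross_kernel(x, x_prime, t, Sq, kappa_q, ell, d=2):
    """k_{y_q f}(x, x', t) between an output and the latent (initial) concentration."""
    Lq, Lr = 2.0 * kappa_q * t, 1.0 / ell ** 2
    total = Lq + Lr
    diff = np.asarray(x) - np.asarray(x_prime)
    quad = float(diff @ diff) / total
    return Sq * (Lr / total) ** (d / 2.0) * np.exp(-0.5 * quad)

x1, x2 = np.array([0.0, 0.0]), np.array([1.0, 0.5])
print(diffusion_kernel(x1, x2, t=1.0, Sp=1.0, Sq=0.8, kappa_p=0.3, kappa_q=0.1, ell=2.0))
print(diffusion_cross_kernel(x1, x2, t=1.0, Sq=0.8, kappa_q=0.1, ell=2.0))
```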
5.2 Prediction of Metal Concentrations

We used our model to replicate the experiments described in (Goovaerts, 1997, pp. 248,249) in which a primary variable (cadmium, copper, lead and cobalt) is predicted in conjunction with some secondary variables (nickel and zinc for cadmium; lead, nickel and zinc for copper; copper, nickel and zinc for lead; nickel and zinc for cobalt).³ By conditioning on the values of the secondary variables we can improve the prediction of the primary variables. We compare results for the diffusion kernel with results from prediction using independent GPs for the metals and "ordinary co-kriging" (as reported by (Goovaerts, 1997, pp. 248,249)). For our experiments we made use of 10 repeats to report standard deviations. Mean absolute errors and standard deviations are shown in Table 2 ((Goovaerts, 1997) does not report standard deviations for the co-kriging method). Our diffusion model outperforms co-kriging for all but one example.

³ Data available at http://www.ai-geostats.org/.
Table 2: Mean absolute error and standard deviation for ten repetitions of the experiment for the Jura dataset. IGPs stands for independent GPs, GPDK for GP diffusion kernel, OCK for ordinary co-kriging, Cd for cadmium, Cu for copper, Pb for lead and Co for cobalt. For the Gaussian process with diffusion kernel, we learn the diffusion coefficients and the length-scale of the covariance of the latent function.

Metals | IGPs            | GPDK            | OCK
Cd     | 0.5823±0.0133   | 0.4505±0.0126   | 0.5
Cu     | 15.9357±0.0907  | 7.1677±0.2266   | 7.8
Pb     | 22.9141±0.6076  | 10.1097±0.2842  | 10.7
Co     | 2.0735±0.1070   | 1.7546±0.0895   | 1.5
6 Discussion

We have proposed a hybrid approach for the use of simple mechanistic models with Gaussian processes which allows for the creation of new kernels with physically meaningful parameters. We have shown how these kernels can be applied to a range of data sets for the analysis of microarray data, motion capture data and geostatistical data. To do this we proposed a range of linear differential equation models: first order, second order and a partial differential equation. The solutions to all these differential equations are in the form of convolutions. When applied to a Gaussian process latent function they result in a joint GP over the latent functions and the observed outputs, which provides a general framework for multi-output GP regression.

We are not the first to suggest the use of convolution processes for multi-output regression: they were proposed by (Higdon, 2002) and built on by (Boyle and Frean, 2005), and the ideas in these papers have also recently been made more computationally practical through sparse approximations suggested by (Alvarez and Lawrence, 2009). However, whilst (Boyle and Frean, 2005) was motivated by the general idea of constructing multi-output GPs, our aims are different. Our focus has been embodying GPs with the characteristics of mechanistic models so that our data driven models can exhibit well understood characteristics of these physical systems. To maintain tractability these mechanistic models are necessarily over-simplistic, but our results have shown that they can lead to significant improvements on a wide range of data sets.
Acknowledgements

DL has been partly financed by Comunidad de Madrid (project PRO-MULTIDIS-CM, S-0505/TIC/0233) and by the Spanish government (CICYT project TEC2006-13514-C02-01 and research grant JC2008-00219). MA and NL have been financed by a Google Research Award and EPSRC Grant No EP/F005687/1 "Gaussian Processes for Systems Identification with Applications in Systems Biology".

References

Mauricio Alvarez and Neil D. Lawrence. Sparse convolved Gaussian processes for multi-output regression. In Advances in Neural Information Processing Systems 21, pages 57-64. MIT Press, 2009.

Martino Barenco, Daniela Tomescu, Daniel Brewer, Robin Callard, Jaroslav Stark, and Michael Hubank. Ranked prediction of p53 targets using hidden variable dynamic modeling. Genome Biology, 7(3):R25, 2006.

Phillip Boyle and Marcus Frean. Dependent Gaussian processes. In Lawrence Saul, Yair Weiss, and Léon Bottou, editors, Advances in Neural Information Processing Systems, volume 17, pages 217-224, Cambridge, MA, 2005. MIT Press.

Pei Gao, Antti Honkela, Magnus Rattray, and Neil D. Lawrence. Gaussian process modelling of latent chemical species: Applications to inferring transcription factor activities. Bioinformatics, 24:i70-i75, 2008.

Pierre Goovaerts. Geostatistics For Natural Resources Evaluation. Oxford University Press, 1997.

David M. Higdon. Space and space-time modelling using process convolutions. In C. Anderson, V. Barnett, P. Chatwin, and A. El-Shaarawi, editors, Quantitative Methods for Current Environmental Issues, pages 37-56. Springer-Verlag, 2002.

Neil D. Lawrence, Guido Sanguinetti, and Magnus Rattray. Modelling transcriptional regulation using Gaussian processes. In Bernhard Schölkopf, John C. Platt, and Thomas Hofmann, editors, Advances in Neural Information Processing Systems, volume 19, pages 785-792, Cambridge, MA, 2007. MIT Press.

Neil D. Lawrence. Probabilistic non-linear principal component analysis with Gaussian process latent variable models. Journal of Machine Learning Research, 6:1783-1816, Nov. 2005.

Andrei D. Polyanin. Handbook of Linear Partial Differential Equations for Engineers and Scientists. Chapman & Hall/CRC Press, 2002.

Joaquin Quiñonero Candela and Carl Edward Rasmussen. A unifying view of sparse approximate Gaussian process regression. Journal of Machine Learning Research, 6:1939-1959, 2005.

Yee Whye Teh, Matthias Seeger, and Michael I. Jordan. Semiparametric latent factor models. In Robert G. Cowell and Zoubin Ghahramani, editors, Proceedings of the Tenth International Workshop on Artificial Intelligence and Statistics, pages 333-340, Barbados, 6-8 January 2005. Society for Artificial Intelligence and Statistics.