Latent Force Models

Mauricio Alvarez
School of Computer Science, University of Manchester, Manchester, UK, M13 9PL
alvarezm@cs.man.ac.uk

David Luengo
Dep. Teoría de Señal y Comunicaciones, Universidad Carlos III de Madrid, 28911 Leganés, Spain
luengod@ieee.org

Neil D. Lawrence
School of Computer Science, University of Manchester, Manchester, UK, M13 9PL
neill@cs.man.ac.uk

Abstract

Purely data driven approaches for machine learning present difficulties when data is scarce relative to the complexity of the model or when the model is forced to extrapolate. On the other hand, purely mechanistic approaches need to identify and specify all the interactions in the problem at hand (which may not be feasible) and still leave the issue of how to parameterize the system. In this paper, we present a hybrid approach using Gaussian processes and differential equations to combine data driven modelling with a physical model of the system. We show how different, physically-inspired, kernel functions can be developed through sensible, simple, mechanistic assumptions about the underlying system. The versatility of our approach is illustrated with three case studies from computational biology, motion capture and geostatistics.

1 Introduction

Traditionally, the main focus in machine learning has been model generation through a data driven paradigm. The usual approach is to combine a data set with a (typically fairly flexible) class of models and, through judicious use of regularization, make useful predictions on previously unseen data. There are two key problems with purely data driven approaches. Firstly, if data is scarce relative to the complexity of the system we may be unable to make accurate predictions on test data. Secondly, if the model is forced to extrapolate, i.e. make predictions in a regime in which data has not yet been seen, performance can be poor.

Purely mechanistic models, i.e. models which are inspired by the underlying physical knowledge of the system, are common in many areas such as chemistry, systems biology, climate modelling and the geophysical sciences. They normally make use of a fairly well characterized physical process that underpins the system, typically represented with a set of differential equations. The purely mechanistic approach leaves us with a different set of problems to those of the data driven approach. In particular, an accurate description of a complex system through a mechanistic modelling paradigm may not be possible: even if all the physical processes can be adequately described, the resulting model could become extremely complex. Identifying and specifying all the interactions might not be feasible, and we would still be faced with the problem of identifying the parameters of the system. Despite these problems, physically well characterized models retain a major advantage over purely data driven models. A mechanistic model can enable accurate prediction even in regions where there is no available training data. For example, Pioneer space probes have been able to enter different extraterrestrial orbits despite the absence of data for these orbits.

In this paper we advocate an alternative approach. Rather than relying on an exclusively mechanistic or data driven approach we suggest a hybrid system which involves a (typically overly simplistic) mechanistic model of the system which can easily be augmented through machine learning techniques.
We will start by considering two dynamical systems, both simple latent variable models, which incorporate first and second order differential equations. Our inspiration is the work of (Lawrence et al., 2007; Gao et al., 2008), who encoded a first order differential equation in a Gaussian process (GP). However, their aim was to construct an accurate model of transcriptional regulation, whereas ours is to make use of the mechanistic model to incorporate salient characteristics of the data (e.g. inertia in a mechanical system) without necessarily associating the components of our mechanistic model with actual physical components of the system. For example, for a human motion capture dataset we develop a mechanistic model of motion capture that does not exactly replicate the physics of human movement, but nevertheless captures salient features of the movement. Having shown how first and second order dynamical systems can be incorporated in a GP, we finally show how partial differential equations can also be incorporated for modelling systems with multiple inputs.

2 Latent Variables and Physical Systems

From the perspective of machine learning our approach can be seen as a type of latent variable model. In a latent variable model we may summarize a high dimensional data set with a reduced dimensional representation. For example, if our data consists of N points in a Q dimensional space we might seek a linear relationship between the data, Y ∈ ℝ^{N×Q}, and a reduced dimensional representation, F ∈ ℝ^{N×R}, where R < Q. From a probabilistic perspective this involves an assumption that we can represent the data as

  Y = FW + E,    (1)

where E is matrix-variate Gaussian noise: each column, ε_{:,q} (1 ≤ q ≤ Q), is a multivariate Gaussian with zero mean and covariance Σ, i.e. ε_{:,q} ∼ N(0, Σ). The usual approach, as undertaken in factor analysis and principal component analysis (PCA), to dealing with the unknowns in this model is to integrate out F under a Gaussian prior and optimize with respect to W ∈ ℝ^{R×Q} (although it turns out that for a non-linear variant of the model it can be convenient to do this the other way around, see e.g. (Lawrence, 2005)).

If the data has a temporal nature, then the Gaussian prior in the latent space could express a relationship between the rows of F, f_{t_n} = f_{t_{n−1}} + η, where η ∼ N(0, C) and f_{t_n} is the n-th row of F, which we associate with time t_n. This is known as the Kalman filter/smoother. Normally the times, t_n, are taken to be equally spaced, but more generally we can consider a joint distribution for p(F|t), t = [t_1 … t_N]^⊤, which has the form of a Gaussian process (GP),

  p(F|t) = ∏_{r=1}^R N(f_{:,r} | 0, K_{f_{:,r}, f_{:,r}}),

where we have assumed zero mean and independence across the R dimensions of the latent space. The GP makes explicit the fact that the latent variables are functions, {f_r(t)}_{r=1}^R, and we have now described them with a process prior. The notation f_{:,r} indicates the r-th column of F, and represents the values of that function for the r-th dimension at the times given by t. The matrix K_{f_{:,r}, f_{:,r}} is the covariance function associated with f_r(t) computed at the times given in t.

Such a GP can be readily implemented. Given the covariance functions for {f_r(t)} the implied covariance functions for {y_q(t)} are straightforward to derive. In (Teh et al., 2005) this is known as a semi-parametric latent factor model (SLFM), although their main focus is not the temporal case. Historically the Kalman filter approach has been preferred, perhaps because of its linear computational complexity in N. However, recent advances in sparse approximations have made the general GP framework practical (see (Quiñonero Candela and Rasmussen, 2005) for a review).

So far the model described relies on the latent variables to provide the dynamic information. Our main contribution is to include a further dynamical system with a mechanistic inspiration. We now use a mechanical analogy to introduce it. Consider the following physical interpretation of (1): the latent functions, f_r(t), are R forces and we observe the displacement of Q springs, y_q(t), under those forces. Then we can reinterpret (1) as the force balance equation, YD = FS + Ẽ. Here we have assumed that the forces are acting, for example, through levers, so that we have a matrix of sensitivities, S ∈ ℝ^{R×Q}, and a diagonal matrix of spring constants, D ∈ ℝ^{Q×Q}. The original model is recovered by setting W = SD^{-1} and ε̃_{:,q} ∼ N(0, D^⊤ΣD).
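As a concrete illustration of this latent variable model with a GP prior over the columns of F, the following sketch (our own, with placeholder sizes and an RBF covariance of the form used later in the paper) samples each f_{:,r} from its prior and generates observations through Y = FW + E:

```python
import numpy as np

def rbf(t, t2, ell=1.0):
    """Squared exponential covariance exp(-(t - t')^2 / ell^2), the form used later in the paper."""
    d = t[:, None] - t2[None, :]
    return np.exp(-d ** 2 / ell ** 2)

rng = np.random.default_rng(0)
N, R, Q = 100, 2, 5                          # placeholder sizes: N times, R latent functions, Q outputs
t = np.linspace(0.0, 10.0, N)

# Independent zero-mean GP priors over the columns of F: p(F|t) = prod_r N(f_{:,r} | 0, K).
K = rbf(t, t, ell=2.0) + 1e-8 * np.eye(N)
F = rng.multivariate_normal(np.zeros(N), K, size=R).T        # N x R

# Linear-Gaussian observation model Y = F W + E (isotropic noise for simplicity).
W = rng.standard_normal((R, Q))
Y = F @ W + 0.1 * rng.standard_normal((N, Q))
```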
The model can be extended by assuming that the spring is acting in parallel with a damper and that the system has mass, allowing us to write

  FS = ŸM + ẎC + YD + Ẽ,    (2)

where M and C are diagonal matrices of masses and damping coefficients respectively, Ẏ ∈ ℝ^{N×Q} is the first derivative of Y with respect to time and Ÿ is the second derivative. The second order mechanical system that this model describes will exhibit several characteristics which are impossible to represent in the simpler latent variable model given by (1), such as inertia and resonance. This model is not only appropriate for data from mechanical systems. There are many analogous systems which can also be represented by second order differential equations, e.g. resistor-inductor-capacitor circuits. A unifying characteristic of all these models is that the system is being forced by latent functions, {f_r(t)}_{r=1}^R. Hence, we refer to them as latent force models (LFMs).

One way of thinking of our model is to consider puppetry. A marionette is a representation of a human (or animal) controlled by a limited number of inputs through strings (or rods) attached to the character. This limited number of inputs can lead to a wide range of character movements. In our model, the data is the movements of the marionette, and the latent forces are the inputs to the system from the puppeteer.

Finally, note that it is of little use to include dynamical models of the type specified in (2) if their effects cannot be efficiently incorporated into the inference process. Fortunately, as we will see in the case studies, for an important class of covariance functions it is analytically tractable to compute the implied covariance functions for {y_q(t)}_{q=1}^Q. Then, given the data, a conjugate gradient descent algorithm can be used to obtain the hyperparameters of the model which minimize the negative log-likelihood, and inference is performed based on standard GP regression techniques.
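The inference pipeline just described can be sketched schematically as follows. Here kyy stands for whichever closed-form implied covariance k_{y_p y_q} is in use (they are derived in the following sections), and all names, the stacking convention and the fixed noise variance are our choices rather than the authors':

```python
import numpy as np
from scipy.optimize import minimize

def stacked_covariance(t, kyy, theta, Q, noise_var):
    """Covariance of the stacked outputs [y_1(t); ...; y_Q(t)], built block by block
    from an implied cross-covariance k_{y_p y_q}(t, t') such as those derived below."""
    N = len(t)
    K = np.zeros((Q * N, Q * N))
    for p in range(Q):
        for q in range(Q):
            K[p * N:(p + 1) * N, q * N:(q + 1) * N] = kyy(t, t, p, q, theta)
    return K + noise_var * np.eye(Q * N)

def neg_log_likelihood(theta, t, y, kyy, Q, noise_var=1e-2):
    """Minus log marginal likelihood of the stacked observations y under the joint GP."""
    K = stacked_covariance(t, kyy, theta, Q, noise_var)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return 0.5 * y @ alpha + np.sum(np.log(np.diag(L))) + 0.5 * len(y) * np.log(2.0 * np.pi)

# Hyperparameters theta (sensitivities, decays, length-scales, ...) would then be fitted
# with a gradient-based optimizer, e.g.
#   res = minimize(neg_log_likelihood, theta0, args=(t, y, kyy, Q), method='CG')
# and predictions follow the standard GP regression equations with the fitted kernel.
```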
3 First Order Dynamical System

A single input module is a biological network motif where the transcription of a number of genes is driven by a single transcription factor. In (Barenco et al., 2006) a simple first order differential equation was proposed to model this situation. Then (Lawrence et al., 2007; Gao et al., 2008) suggested that inference of the latent transcription factor concentration should be handled using GPs. In effect their model can be seen as a latent force model based on a first order differential equation with a single latent force. Here we consider the extension of this model to multiple latent forces. As a mechanistic model, this is a severe over simplification of the physical system: transcription factors are known to interact in a non linear manner. Despite this we will be able to uncover useful information. Our model is based on the following differential equation,

  dy_q(t)/dt + D_q y_q(t) = B_q + Σ_{r=1}^R S_{rq} f_r(t).    (3)

Here the latent forces, f_r(t), represent protein concentration (which is difficult to observe directly), the outputs, y_q(t), are the mRNA abundance levels for different genes, B_q and D_q are respectively the basal transcription and the decay rates of the q-th gene, and S_{rq} are coupling constants that quantify the influence of the r-th input on the q-th output (i.e. the sensitivity of gene q to the concentration of protein r). Solving (3) for y_q(t), we obtain

  y_q(t) = B_q/D_q + Σ_{r=1}^R L_{rq}[f_r](t),

where we have ignored transient terms, which are easily included, and the linear operator is given by the following convolution,

  L_{rq}[f_r](t) = S_{rq} exp(−D_q t) ∫_0^t f_r(τ) exp(D_q τ) dτ.

If each latent force is taken to be independent with a covariance function given by

  k_{f_r f_r}(t, t′) = exp(−(t − t′)²/ℓ_r²),

then we can compute the covariance of the outputs analytically, obtaining (Lawrence et al., 2007)

  k_{y_p y_q}(t, t′) = Σ_{r=1}^R (S_{rp} S_{rq} √π ℓ_r / 2) [h_{qp}(t′, t) + h_{pq}(t, t′)],

where

  h_{qp}(t′, t) = [exp(ν_{rq}²)/(D_p + D_q)] exp(−D_q t′) { exp(D_q t) [erf((t′ − t)/ℓ_r − ν_{rq}) + erf(t/ℓ_r + ν_{rq})] − exp(−D_p t) [erf(t′/ℓ_r − ν_{rq}) + erf(ν_{rq})] }.

Here erf(x) is the real valued error function, erf(x) = (2/√π) ∫_0^x exp(−y²) dy, and ν_{rq} = ℓ_r D_q / 2. Additionally, we can compute the cross-covariance between the inputs and outputs,

  k_{y_q f_r}(t, t′) = (S_{rq} √π ℓ_r / 2) exp(ν_{rq}²) exp(−D_q (t − t′)) [erf((t′ − t)/ℓ_r − ν_{rq}) + erf(t′/ℓ_r + ν_{rq})].
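A minimal NumPy sketch of this closed-form covariance follows. The arrays S, D and ell hold the sensitivities S_{rq}, decays D_q and length-scales ℓ_r; names, layout and the example numbers are ours, and the h term is written in a slightly factored (but algebraically identical) form:

```python
import numpy as np
from scipy.special import erf

def h_term(D_a, D_b, t_a, t_b, ell):
    """h_{qp}(t', t) above, with D_a = D_q tied to the first time argument t_a = t'
    and D_b = D_p tied to the second; the exponentials are combined for stability."""
    nu = ell * D_a / 2.0
    ta, tb = t_a[:, None], t_b[None, :]
    return (np.exp(nu ** 2) / (D_a + D_b)) * (
        np.exp(-D_a * (ta - tb)) * (erf((ta - tb) / ell - nu) + erf(tb / ell + nu))
        - np.exp(-D_a * ta - D_b * tb) * (erf(ta / ell - nu) + erf(nu)))

def k_first_order(t, t2, p, q, S, D, ell):
    """Implied output covariance k_{y_p y_q}(t, t') for the first order latent force model."""
    K = np.zeros((len(t), len(t2)))
    for r in range(S.shape[0]):
        h_qp = h_term(D[q], D[p], t2, t, ell[r]).T      # h_qp(t', t), reoriented so rows index t
        h_pq = h_term(D[p], D[q], t, t2, ell[r])        # h_pq(t, t')
        K += S[r, p] * S[r, q] * np.sqrt(np.pi) * ell[r] / 2.0 * (h_qp + h_pq)
    return K

# Example: two genes driven by one latent force (all numbers are placeholders).
t = np.linspace(0.0, 12.0, 50)
S = np.array([[1.0, 0.5]])          # R x Q sensitivities
D = np.array([0.4, 1.2])            # decay rates D_q
ell = np.array([1.5])               # length-scales of the latent RBF kernels
K = k_first_order(t, t, 0, 1, S, D, ell)
```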
3.1 p53 Data

Our data is from (Barenco et al., 2006), where leukemia cell lines were bombarded with radiation to induce activity of the transcription factor p53. This transcription factor repairs DNA damage and triggers a mechanism which pauses the cell-cycle and potentially terminates the cell. In (Barenco et al., 2006) microarray gene expression levels of known targets of p53 were used to fit a first order differential equation model to the data. The model was then used to provide a ranked list of 50 genes identified as regulated by p53.

Our aim is to determine if there are additional "latent forces" which could better explain the activity of some of these genes. The experimental data consists of measurements of expression levels of 50 genes for three different replicas. Within each replica, there are measurements at seven different time instants. We constructed a latent force model with six latent forces, assuming that each replica was independently produced but fixing the hyperparameters of the kernel across the replicas.¹ We employed a sparse approximation, as proposed in (Alvarez and Lawrence, 2009), with ten inducing points to speed up computation.

¹ The decay rates were assumed equal within replicas. Although this might be an important restriction for this experiment, our purpose in this paper is to expose a general methodology without delving into the details of each experimental setup.

Of the six latent functions, two were automatically switched off by the model. Two further latent functions, shown in Figure 1 as latent forces 1 & 2, were consistent across all replicas: their shapes were time translated versions of the p53 profile as identified by (Barenco et al., 2006; Lawrence et al., 2007; Gao et al., 2008). This time translation allows genes to experience different transcriptional delays, a mechanism not included explicitly in our model, but mimicked by linear mixing of an early and a late signal. The remaining two latent functions were inconsistent across the replicas (see e.g. latent force 3 in Figure 1). They appear to represent processes not directly related to p53. This was backed up by the sensitivity parameters found in the model. The known p53 targets DDB2, p21, SESN1/hPA26, BIK and TNFRSF10b were found to respond to latent forces 1 & 2. Conversely, the genes that were most responsive to latent force 3 were MAP4K4, a gene involved in environmental stress signalling, and FDXR, an electron transfer protein.

[Figure 1: Latent forces inferred for the p53 data. Panels (a)-(c), (d)-(f) and (g)-(i) show latent forces 1, 2 and 3 respectively, for replicas 1-3. (a)-(c) and (d)-(f): the two latent forces associated with p53 activity; p53 targets are sensitive to a combination of these functions, allowing them to account for transcriptional delays. (g)-(i): a latent force that was inconsistent across the replicas. It may be associated with cellular processes not directly related to p53.]

4 Second Order Dynamical System

In Section 1 we introduced the analogy of a marionette's motion being controlled by a reduced number of forces. Human motion capture data consists of a skeleton and multivariate time courses of angles which summarize the motion. This motion can be modelled with a set of second order differential equations which, due to variations in the centers of mass induced by the movement, are non-linear. The simplification we consider for the latent force model is to linearize these differential equations, resulting in the following second order dynamical system,

  d²y_q(t)/dt² + C_q dy_q(t)/dt + D_q y_q(t) = B_q + Σ_{r=1}^R S_{rq} f_r(t),    (4)

where the mass of the system, without loss of generality, is normalized to 1. Whilst (4) is not the correct physical model for our system, it will still be helpful when extrapolating predictions across different motions, as we shall see in the next section. Note also that, although similar to (3), the dynamic behavior of this system is much richer than that of the first order system, since it can exhibit inertia and resonance.

For the motion capture data y_q(t) corresponds to a given observed angle over time, and its derivatives represent angular velocity and acceleration. The system is summarized by the undamped natural frequency, ω_{0q} = √D_q, and the damping ratio, ζ_q = C_q / (2√D_q). Systems with a damping ratio greater than one are said to be overdamped, whereas underdamped systems exhibit resonance and have a damping ratio less than one. For critically damped systems ζ_q = 1, and finally, for undamped systems (i.e. no friction) ζ_q = 0.

Ignoring the initial conditions once more, the solution of (4) is again given by a convolution, with the linear operator now being

  L_{rq}[f_r](t) = (S_{rq} exp(−α_q t)/ω_q) ∫_0^t f_r(τ) exp(α_q τ) sin(ω_q(t − τ)) dτ,    (5)

where ω_q = √(4D_q − C_q²)/2 and α_q = C_q/2.
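Before turning to the closed-form covariance, a small sketch (ours, with a placeholder force and parameter values) that evaluates the convolution in (5) numerically illustrates the richer second order behaviour; with ζ_q = 0.1 the response rings near the natural frequency ω_{0q}:

```python
import numpy as np

def second_order_response(t, f, C_q, D_q, S_rq=1.0):
    """Numerically evaluate the convolution in (5) for a sampled force f on an
    evenly spaced grid t (simple rectangle-rule quadrature; underdamped case)."""
    alpha_q = C_q / 2.0
    omega_q = np.sqrt(4.0 * D_q - C_q ** 2) / 2.0      # requires C_q^2 < 4 D_q
    dt = t[1] - t[0]
    y = np.zeros_like(t)
    for i, ti in enumerate(t):
        tau = t[: i + 1]
        integrand = f[: i + 1] * np.exp(alpha_q * tau) * np.sin(omega_q * (ti - tau))
        y[i] = S_rq * np.exp(-alpha_q * ti) / omega_q * np.sum(integrand) * dt
    return y

# A pulse-like force driving an underdamped output: zeta_q = C_q / (2 sqrt(D_q)) = 0.1,
# so the response oscillates near omega_0q = sqrt(D_q) = 2.
t = np.linspace(0.0, 10.0, 500)
f = np.exp(-(t - 2.0) ** 2)        # placeholder latent force, not from the paper
y = second_order_response(t, f, C_q=0.4, D_q=4.0)
```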
Once again, if we consider a latent force governed by a GP with the RBF covariance function, we can solve (5) analytically and obtain a closed-form expression for the covariance matrix of the outputs,

  k_{y_p y_q}(t, t′) = Σ_{r=1}^R [S_{rp} S_{rq} √π ℓ_r / (8 ω_p ω_q)] k^{(r)}_{y_p y_q}(t, t′).

Here k^{(r)}_{y_p y_q}(t, t′) can be considered the cross-covariance between the p-th and q-th outputs under the effect of the r-th latent force, and is given by

  k^{(r)}_{y_p y_q}(t, t′) = h_r(γ̃_q, γ_p, t, t′) + h_r(γ_p, γ̃_q, t′, t) + h_r(γ_q, γ̃_p, t, t′) + h_r(γ̃_p, γ_q, t′, t) − h_r(γ̃_q, γ̃_p, t, t′) − h_r(γ̃_p, γ̃_q, t′, t) − h_r(γ_q, γ_p, t, t′) − h_r(γ_p, γ_q, t′, t),

where γ_p = α_p + jω_p, γ̃_p = α_p − jω_p, and

  h_r(γ_q, γ_p, t, t′) = [Υ_r(γ_q, t′, t) − exp(−γ_p t) Υ_r(γ_q, t′, 0)] / (γ_p + γ_q),

with

  Υ_r(γ_q, t, t′) = 2 exp(ℓ_r² γ_q²/4) exp(−γ_q(t − t′)) − exp(−(t − t′)²/ℓ_r²) w(jz_{rq}(t)) − exp(−(t′)²/ℓ_r²) exp(−γ_q t) w(−jz_{rq}(0)),    (6)

and z_{rq}(t) = (t − t′)/ℓ_r − ℓ_r γ_q/2. Note that z_{rq}(t) ∈ ℂ, and w(jz) in (6), for z ∈ ℂ, denotes Faddeeva's function w(jz) = exp(z²) erfc(z), where erfc(z) is the complex version of the complementary error function, erfc(z) = 1 − erf(z) = (2/√π) ∫_z^∞ exp(−v²) dv. Faddeeva's function is usually considered the complex equivalent of the error function, since |w(jz)| is bounded whenever the imaginary part of jz is greater than or equal to zero, and it is the key to achieving good numerical stability when computing (6) and its gradients.

Similarly, the cross-covariance between latent functions and outputs is given by

  k_{y_q f_r}(t, t′) = [ℓ_r S_{rq} √π / (j4ω_q)] [Υ_r(γ̃_q, t, t′) − Υ_r(γ_q, t, t′)].

A visualization of a covariance matrix with a latent force and three different outputs (overdamped, underdamped and critically damped) is given in Figure 2.

[Figure 2: Visualization of the covariance matrix associated with the second order kernel. Three outputs, y_1(t), y_2(t) and y_3(t), and their correlation with the latent function f(t) are shown. Output 1 is underdamped and the natural frequency is observable through the bars of alternating correlation and anti-correlation in the associated portions of the covariance matrix. Output 2 is overdamped; note the more diffuse covariance in comparison to Output 3, which is critically damped.]
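Equation (6) can be evaluated with SciPy's implementation of Faddeeva's function, scipy.special.wofz, which computes w(z) = exp(−z²)erfc(−jz), so that w(jz) above equals wofz(1j·z). The following sketch assembles the second order covariance from the eight h_r terms listed above; the array layout, argument names and the final real projection are our choices, and a production implementation would also address the overflow issues that the remark on numerical stability alludes to:

```python
import numpy as np
from scipy.special import wofz           # Faddeeva function: wofz(z) = exp(-z^2) erfc(-iz)

def upsilon(gamma, t1, t2, ell):
    """Upsilon_r(gamma, t1, t2) from (6); w(jz) in the text is wofz(1j * z)."""
    z_t1 = (t1 - t2) / ell - ell * gamma / 2.0
    z_0 = -t2 / ell - ell * gamma / 2.0
    return (2.0 * np.exp(ell ** 2 * gamma ** 2 / 4.0) * np.exp(-gamma * (t1 - t2))
            - np.exp(-((t1 - t2) ** 2) / ell ** 2) * wofz(1j * z_t1)
            - np.exp(-(t2 ** 2) / ell ** 2) * np.exp(-gamma * t1) * wofz(-1j * z_0))

def h_r(gamma_a, gamma_b, t, tprime, ell):
    """h_r as defined above: the first rate argument enters Upsilon_r, the second
    enters exp(-gamma_b t) and the denominator."""
    return (upsilon(gamma_a, tprime, t, ell)
            - np.exp(-gamma_b * t) * upsilon(gamma_a, tprime, 0.0, ell)) / (gamma_a + gamma_b)

def k_second_order(t, tprime, p, q, S, alpha, omega, ell):
    """Implied covariance k_{y_p y_q}(t, t') for the second order latent force model,
    summing the eight h_r terms; imaginary parts cancel up to numerical error."""
    T, Tp = t[:, None], tprime[None, :]
    g, gt = alpha + 1j * omega, alpha - 1j * omega       # gamma_q and gamma~_q per output
    K = np.zeros((len(t), len(tprime)), dtype=complex)
    for r in range(S.shape[0]):
        ell_r = ell[r]
        k_r = (h_r(gt[q], g[p], T, Tp, ell_r) + h_r(g[p], gt[q], Tp, T, ell_r)
               + h_r(g[q], gt[p], T, Tp, ell_r) + h_r(gt[p], g[q], Tp, T, ell_r)
               - h_r(gt[q], gt[p], T, Tp, ell_r) - h_r(gt[p], gt[q], Tp, T, ell_r)
               - h_r(g[q], g[p], T, Tp, ell_r) - h_r(g[p], g[q], Tp, T, ell_r))
        K += S[r, p] * S[r, q] * np.sqrt(np.pi) * ell_r / (8.0 * omega[p] * omega[q]) * k_r
    return K.real
```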
4.1 Motion Capture Data

Our motion capture data set is from the CMU motion capture data base.² We considered 3 balancing motions (18, 19, 20) from subject 49. The subject starts in a standing position with arms raised, then, over about 10 seconds, he raises one leg in the air and lowers his arms to an outstretched position. Of interest to us was the fact that, whilst motions 18 and 19 are relatively similar, motion 20 contains more dramatic movements. We were interested in training on motions 18 and 19 and testing on the more dramatic movement to assess the model's ability to extrapolate.

² The CMU Graphics Lab Motion Capture Database was created with funding from NSF EIA-0196217 and is available at http://mocap.cs.cmu.edu.

The data was down-sampled by 32 (from 120 frames per second to just 3.75) and we focused on the subject's left arm. Our objective was to reconstruct the motion of this arm for motion 20 given the angles of the shoulder and the parameters learned from motions 18 and 19 using two latent functions. First, we train the second order differential equation latent force model on motions 18 and 19, treating the sequences as independent but sharing parameters (i.e. the damping coefficients and natural frequencies of the two differential equations associated with each angle were constrained to be the same). Then, for the test data, we condition on the observations of the shoulder's orientation to make predictions for the rest of the arm's angles.

For comparison, we considered a regression model that directly predicts the angles of the arm given the orientation of the shoulder using standard independent GPs with RBF covariance functions. Results are summarized in Table 1, with some example plots of the tracks of the angles given in Figure 3.

Table 1: Root mean squared (RMS) angle error for prediction of the left arm's configuration in the motion capture data. Prediction with the latent force model outperforms the prediction with regression for all apart from the radius's angle.

  Angle               Latent Force Error    Regression Error
  Radius              4.11                  4.02
  Wrist               6.55                  6.65
  Hand X rotation     1.82                  3.21
  Hand Z rotation     2.76                  6.14
  Thumb X rotation    1.77                  3.10
  Thumb Z rotation    2.73                  6.09

[Figure 3: (a) Inferred latent force for the motion capture data. The force shown is the weighted sum of the two forces that drive the system. (b)-(f) Predictions for the wrist, hand X and Z rotations, and thumb X and Z rotations from the latent force model (solid line, grey error bars) and from direct regression from the shoulder angles (crosses with stick error bars). For these examples noise is high due to the relatively small length of the bones. Despite this the latent force model does a credible job of capturing the angle, whereas direct regression with independent GPs fails to capture the trends.]

5 Partial Differential Equations and Latent Forces

So far we have considered dynamical latent force models based on ordinary differential equations, leading to multioutput Gaussian processes which are functions of a single variable: time. However, the methodology can also be applied in the context of partial differential equations in order to recover multioutput Gaussian processes which are functions of several inputs.

5.1 Diffusion in the Swiss Jura

The Jura data is a set of measurements of concentrations of several heavy metal pollutants collected from topsoil in a 14.5 km² region of the Swiss Jura. We consider a latent function that represents how the pollutants were originally laid down. As time passes, we assume that the pollutants diffuse at different rates, resulting in the concentrations observed in the data set. We therefore consider a simplified version of the diffusion equation, known also as the heat equation,

  ∂y_q(x, t)/∂t = κ_q Σ_{j=1}^d ∂²y_q(x, t)/∂x_j²,

where d = 2 is the dimension of x, the measured concentration of each pollutant over space and time is given by y_q(x, t), and the latent function f_r(x) now represents the concentration of pollutants at time zero (i.e. the system's initial condition). The solution to the system (Polyanin, 2002) is then given by

  y_q(x, t) = Σ_{r=1}^R S_{rq} ∫_{ℝ^d} f_r(x′) G_q(x, x′, t) dx′,

where G_q(x, x′, t) is the Green's function, given as

  G_q(x, x′, t) = [1/(2^d π^{d/2} T_q^{d/2})] exp(−Σ_{j=1}^d (x_j − x_j′)²/(4T_q)),

with T_q = κ_q t.
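As a purely illustrative sketch (ours, with made-up deposit data and grid quadrature), this solution can be evaluated numerically: the Green's function acts as a heat-kernel blur of the initial concentration whose width grows with κ_q t:

```python
import numpy as np

def greens_function(x, xprime, T_q):
    """G_q(x, x', t) for the d-dimensional heat equation, with T_q = kappa_q * t."""
    d = x.shape[-1]
    sq = np.sum((x - xprime) ** 2, axis=-1)
    return np.exp(-sq / (4.0 * T_q)) / (2.0 ** d * np.pi ** (d / 2) * T_q ** (d / 2))

# Hypothetical initial pollutant deposit f_r(x') on a regular 2-D grid (placeholder data).
grid = np.linspace(0.0, 5.0, 60)
X1, X2 = np.meshgrid(grid, grid)
Xp = np.stack([X1.ravel(), X2.ravel()], axis=1)                  # M x 2 grid of x'
f_r = np.exp(-np.sum((Xp - np.array([2.0, 3.0])) ** 2, axis=1))  # a single deposit

# y_q(x, t) ~ S_rq * sum_{x'} f_r(x') G_q(x, x', t) dA  (quadrature over the grid).
kappa_q, t, S_rq = 0.05, 10.0, 1.0
dA = (grid[1] - grid[0]) ** 2
x_test = np.array([[2.5, 3.0]])
y_q = S_rq * np.sum(f_r * greens_function(x_test[:, None, :], Xp[None, :, :], kappa_q * t), axis=1) * dA
```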
Again, if we take the latent function to be given by a GP with the RBF covariance function we can compute the multiple output covariance functions analytically. The covariance function between the output functions is obtained as

  k_{y_p y_q}(x, x′, t) = Σ_{r=1}^R S_{rp} S_{rq} [|L_r|^{1/2} / |L_{rp} + L_{rq} + L_r|^{1/2}] exp(−(1/2)(x − x′)^⊤ (L_{rp} + L_{rq} + L_r)^{-1} (x − x′)),

where L_{rp}, L_{rq} and L_r are diagonal isotropic matrices with entries 2κ_p t, 2κ_q t and 1/ℓ_r² respectively. The covariance function between the output and latent functions is given by

  k_{y_q f_r}(x, x′, t) = S_{rq} [|L_r|^{1/2} / |L_{rq} + L_r|^{1/2}] exp(−(1/2)(x − x′)^⊤ (L_{rq} + L_r)^{-1} (x − x′)).
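Because all the matrices involved are isotropic and diagonal, the output covariance above reduces to a scalar expression per latent force. A short sketch, taking the diagonal entries 2κ_p t, 2κ_q t and 1/ℓ_r² exactly as stated in the text (array names and layout are ours):

```python
import numpy as np

def diffusion_kernel(X, X2, t, p, q, S, kappa, ell):
    """k_{y_p y_q}(x, x', t) as written above, for isotropic diagonal L matrices.
    X, X2 are arrays of spatial locations (N x d, M x d); S[r, q], kappa[q] and ell[r]
    hold the sensitivities, diffusion coefficients and latent length-scales."""
    d = X.shape[1]
    sq = np.sum((X[:, None, :] - X2[None, :, :]) ** 2, axis=-1)       # squared distances
    K = np.zeros_like(sq)
    for r in range(S.shape[0]):
        L_r = 1.0 / ell[r] ** 2                                       # diagonal entry of L_r
        c = 2.0 * kappa[p] * t + 2.0 * kappa[q] * t + L_r             # entry of L_rp + L_rq + L_r
        K += S[r, p] * S[r, q] * (L_r / c) ** (d / 2.0) * np.exp(-0.5 * sq / c)
    return K
```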
5.2 Prediction of Metal Concentrations

We used our model to replicate the experiments described in (Goovaerts, 1997, pp. 248,249) in which a primary variable (cadmium, copper, lead and cobalt) is predicted in conjunction with some secondary variables (nickel and zinc for cadmium; lead, nickel and zinc for copper; copper, nickel and zinc for lead; nickel and zinc for cobalt).³ By conditioning on the values of the secondary variables we can improve the prediction of the primary variables. We compare results for the diffusion kernel with results from prediction using independent GPs for the metals and "ordinary co-kriging" (as reported by (Goovaerts, 1997, pp. 248,249)). For our experiments we made use of 10 repeats to report standard deviations. Mean absolute errors and standard deviations are shown in Table 2 ((Goovaerts, 1997) does not report standard deviations for the co-kriging method). Our diffusion model outperforms co-kriging for all but one example.

³ Data available at http://www.ai-geostats.org/.

Table 2: Mean absolute error and standard deviation for ten repetitions of the experiment for the Jura dataset. IGPs stands for independent GPs, GPDK for GP diffusion kernel, OCK for ordinary co-kriging, Cd for cadmium, Cu for copper, Pb for lead and Co for cobalt. For the Gaussian process with diffusion kernel, we learn the diffusion coefficients and the length-scale of the covariance of the latent function.

  Metals    IGPs               GPDK               OCK
  Cd        0.5823±0.0133      0.4505±0.0126      0.5
  Cu        15.9357±0.0907     7.1677±0.2266      7.8
  Pb        22.9141±0.6076     10.1097±0.2842     10.7
  Co        2.0735±0.1070      1.7546±0.0895      1.5

6 Discussion

We have proposed a hybrid approach for the use of simple mechanistic models with Gaussian processes which allows for the creation of new kernels with physically meaningful parameters. We have shown how these kernels can be applied to a range of data sets for the analysis of microarray data, motion capture data and geostatistical data. To do this we proposed a range of linear differential equation models: first order, second order and a partial differential equation. The solutions to all these differential equations are in the form of convolutions. When applied to a Gaussian process latent function they result in a joint GP over the latent functions and the observed outputs, which provides a general framework for multi-output GP regression.

We are not the first to suggest the use of convolution processes for multi-output regression: they were proposed by (Higdon, 2002) and built on by (Boyle and Frean, 2005), and the ideas in these papers have also recently been made more computationally practical through the sparse approximations suggested by (Alvarez and Lawrence, 2009). However, whilst (Boyle and Frean, 2005) was motivated by the general idea of constructing multi-output GPs, our aims are different. Our focus has been embodying GPs with the characteristics of mechanistic models so that our data driven models can exhibit well understood characteristics of these physical systems. To maintain tractability these mechanistic models are necessarily over simplistic, but our results have shown that they can lead to significant improvements on a wide range of data sets.
Acknowledgements

DL has been partly financed by Comunidad de Madrid (project PRO-MULTIDIS-CM, S-0505/TIC/0233), and by the Spanish government (CICYT project TEC2006-13514-C02-01 and research grant JC200800219). MA and NL have been financed by a Google Research Award and EPSRC Grant No EP/F005687/1 "Gaussian Processes for Systems Identification with Applications in Systems Biology".

References

Mauricio Alvarez and Neil D. Lawrence. Sparse convolved Gaussian processes for multi-output regression. In Advances in Neural Information Processing Systems 21, pages 57-64. MIT Press, 2009.

Martino Barenco, Daniela Tomescu, Daniel Brewer, Robin Callard, Jaroslav Stark, and Michael Hubank. Ranked prediction of p53 targets using hidden variable dynamic modeling. Genome Biology, 7(3):R25, 2006.

Phillip Boyle and Marcus Frean. Dependent Gaussian processes. In Lawrence Saul, Yair Weiss, and Léon Bottou, editors, Advances in Neural Information Processing Systems, volume 17, pages 217-224, Cambridge, MA, 2005. MIT Press.

Pei Gao, Antti Honkela, Magnus Rattray, and Neil D. Lawrence. Gaussian process modelling of latent chemical species: Applications to inferring transcription factor activities. Bioinformatics, 24:i70-i75, 2008.

Pierre Goovaerts. Geostatistics For Natural Resources Evaluation. Oxford University Press, 1997.

David M. Higdon. Space and space-time modelling using process convolutions. In C. Anderson, V. Barnett, P. Chatwin, and A. El-Shaarawi, editors, Quantitative Methods for Current Environmental Issues, pages 37-56. Springer-Verlag, 2002.

Neil D. Lawrence, Guido Sanguinetti, and Magnus Rattray. Modelling transcriptional regulation using Gaussian processes. In Bernhard Schölkopf, John C. Platt, and Thomas Hofmann, editors, Advances in Neural Information Processing Systems, volume 19, pages 785-792, Cambridge, MA, 2007. MIT Press.

Neil D. Lawrence. Probabilistic non-linear principal component analysis with Gaussian process latent variable models. Journal of Machine Learning Research, 6:1783-1816, Nov. 2005.

Andrei D. Polyanin. Handbook of Linear Partial Differential Equations for Engineers and Scientists. Chapman & Hall/CRC Press, 2002.

Joaquin Quiñonero Candela and Carl Edward Rasmussen. A unifying view of sparse approximate Gaussian process regression. Journal of Machine Learning Research, 6:1939-1959, 2005.

Yee Whye Teh, Matthias Seeger, and Michael I. Jordan. Semiparametric latent factor models. In Robert G. Cowell and Zoubin Ghahramani, editors, Proceedings of the Tenth International Workshop on Artificial Intelligence and Statistics, pages 333-340, Barbados, 6-8 January 2005. Society for Artificial Intelligence and Statistics.