References:
"Gaussian Process", Lectured by Professor Il-Chul Moon
"Gaussian Processes", Cornell CS4780 , Lectured by Professor
Kilian Weinberger
Bayesian Deep Learning by Sungjoon Choi
Here we have included details about the relaxation method and some examples.
Contribution - Parinda Rajapakha, Hashan Wanniarachchi, Sameera Horawalawithana, Thilina Gamalath, Samudra Herath and Pavithri Fernando.
Group Theory and Its Application: Beamer Presentation (PPT), by SIRAJAHMAD36
This document provides an overview of a seminar presentation on group theory and its applications. The presentation covers topics such as the definition of groups, order of groups and group elements, modular arithmetic, subgroups, Lagrange's theorem, and Sylow's theorems. It also discusses some examples of groups and applications of group theory in fields like algebraic topology, number theory, and physics. The presentation aims to introduce fundamental concepts in modern algebra through group theory.
These slides are very useful in engineering as well as in other fields of study. They relate to linear algebra and its properties, and to methods for finding the unknowns in a system of equations.
The document discusses Hopfield networks, which are neural networks with fixed weights and adaptive activations. It describes two types - discrete and continuous Hopfield nets. Discrete Hopfield nets use binary activations that are updated asynchronously, allowing an energy function to be defined. They can serve as associative memory. Continuous Hopfield nets have real-valued activations and can solve optimization problems like the travelling salesman problem. The document provides details on the architecture, energy functions, algorithms, and applications of both network types.
Detailed Description on Cross Entropy Loss Function, by 범준 김 (Beomjun Kim)
The document discusses the cross-entropy loss function, which is commonly used in classification problems. It derives the theoretical basis for cross entropy by formulating training as minimizing the cross entropy between the predicted probabilities and the true labels. For binary classification, maximizing the likelihood of the training data is shown to be equivalent to minimizing the binary cross entropy. The concept extends to multiclass classification by treating the prediction as a probability distribution over classes and the label as a one-hot encoding.
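To make the loss concrete, here is a minimal NumPy sketch of the binary and categorical cross-entropy computations the summary describes; the toy arrays and the eps clipping constant are illustrative choices, not taken from the slides.

```python
import numpy as np

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    # Negative log-likelihood of Bernoulli labels under predicted probabilities.
    p = np.clip(p_pred, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

def categorical_cross_entropy(y_onehot, p_pred, eps=1e-12):
    # Multiclass case: labels are one-hot rows, predictions are rows of a
    # probability distribution over classes (e.g. softmax outputs).
    p = np.clip(p_pred, eps, 1.0)
    return -np.mean(np.sum(y_onehot * np.log(p), axis=1))

y = np.array([1, 0, 1])
p = np.array([0.9, 0.2, 0.7])
print(binary_cross_entropy(y, p))  # small loss for mostly correct predictions
```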
2. Linear Algebra for Machine Learning: Basis and Dimension, by Ceni Babaoglu, PhD
The seminar series focuses on the mathematical background needed for machine learning. The first set of seminars is on "Linear Algebra for Machine Learning". Here are the slides of the second part, which discusses basis and dimension.
Here is the link to the first part, which discussed linear systems: https://www.slideshare.net/CeniBabaogluPhDinMat/linear-algebra-for-machine-learning-linear-systems/1
The document discusses the extension principle for generalizing crisp mathematical concepts to fuzzy sets. It defines the extension principle for mappings from Cartesian products to universes. An example illustrates defining a fuzzy set in the output universe based on fuzzy sets in the input universes and the mapping between them. Fuzzy numbers are defined by specific properties: being a normal fuzzy set, having closed intervals at every membership level, and having bounded support. Positive and negative fuzzy numbers are distinguished by their membership functions. Binary operations are classified as increasing or decreasing, and the extension principle can be used to define the fuzzy result of applying increasing or decreasing operations to fuzzy inputs. Notation for fuzzy number algebraic operations is introduced, along with several theorems.
Metaheuristic Algorithms: A Critical Analysis, by Xin-She Yang
The document discusses metaheuristic algorithms and their application to optimization problems. It provides an overview of several nature-inspired algorithms, including particle swarm optimization, the firefly algorithm, harmony search, and cuckoo search, and describes the phenomena that inspired them, such as swarming behavior, the flashing of fireflies, and the brood parasitism of cuckoos. The document also discusses applications of these algorithms to engineering design problems such as pressure vessel design and gearbox design optimization.
- A differential equation relates an independent variable, dependent variable, and derivatives of the dependent variable with respect to the independent variable.
- The order of a differential equation is the order of its highest derivative, and the degree is the power to which that highest-order derivative is raised (once the equation is cleared of radicals and fractions in its derivatives).
- Differential equations can be classified based on their order (first order vs higher order) and linearity (linear vs nonlinear).
- The general solution of a differential equation contains arbitrary constants, while a particular solution gives specific values for those constants.
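As a concrete illustration of the last two points, here is a small SymPy sketch; the ODE dy/dx = y and the initial condition y(0) = 2 are my own example, not from the document.

```python
import sympy as sp

x = sp.symbols('x')
y = sp.Function('y')

# dy/dx = y: a first-order linear ODE.
general = sp.dsolve(sp.Eq(y(x).diff(x), y(x)), y(x))
print(general)  # Eq(y(x), C1*exp(x)) -- note the arbitrary constant C1

# A particular solution fixes the constant, e.g. with y(0) = 2.
particular = sp.dsolve(sp.Eq(y(x).diff(x), y(x)), y(x), ics={y(0): 2})
print(particular)  # Eq(y(x), 2*exp(x))
```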
Quadratic Programming: KKT Conditions with Inequality Constraints, by Mrinmoy Majumder
In quadratic programming, the objective function is quadratic. One technique for solving such optimization problems is the KKT conditions, which are explained with an example in this tutorial.
This document discusses group theory concepts including:
1) Definitions of groups, abelian groups, order of groups and elements.
2) Properties of cyclic groups, including examples like Zn and Z.
3) Introduction to normal subgroups and their properties. Factor groups and homomorphisms are also discussed.
Gaussian quadrature formulas: simple slides that introduce Gauss's one-, two-, and three-point formulas. Worked problems are included so that the method is easy to learn and understand.
The document provides an introduction to linear algebra concepts for machine learning. It defines vectors as ordered tuples of numbers that express magnitude and direction. Vector spaces are sets that contain all linear combinations of vectors. Linear independence and basis of vector spaces are discussed. Norms measure the magnitude of a vector, with examples given of the 1-norm and 2-norm. Inner products measure the correlation between vectors. Matrices can represent linear operators between vector spaces. Key linear algebra concepts such as trace, determinant, and matrix decompositions are outlined for machine learning applications.
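A small NumPy sketch of the norms, inner product, trace, determinant, and one decomposition mentioned above; the vectors and matrix are illustrative.

```python
import numpy as np

v = np.array([3.0, -4.0])
w = np.array([1.0, 2.0])

print(np.linalg.norm(v, 1))   # 1-norm: |3| + |-4| = 7
print(np.linalg.norm(v, 2))   # 2-norm: sqrt(9 + 16) = 5
print(np.dot(v, w))           # inner product: 3*1 + (-4)*2 = -5

A = np.array([[2.0, 1.0], [1.0, 3.0]])
print(np.trace(A))            # trace: 2 + 3 = 5
print(np.linalg.det(A))       # determinant: 2*3 - 1*1 = 5

# One common decomposition: eigendecomposition of a symmetric matrix.
eigvals, eigvecs = np.linalg.eigh(A)
print(eigvals)
```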
Talk on Optimization for Deep Learning, which gives an overview of gradient descent optimization algorithms and highlights some current research directions.
The document provides an overview of perceptrons and neural networks. It discusses how neural networks are modeled after the human brain and consist of interconnected artificial neurons. The key aspects covered include the McCulloch-Pitts neuron model, Rosenblatt's perceptron, different types of learning (supervised, unsupervised, reinforcement), the backpropagation algorithm, and applications of neural networks such as pattern recognition and machine translation.
The document discusses convolutional neural networks (CNNs) for image recognition. It provides 3 key properties of images that CNNs exploit: 1) Some patterns are much smaller than the whole image so neurons can detect local patterns; 2) The same patterns appear in different image regions so filters can have shared parameters; 3) Subsampling pixels does not change objects so the image can be downsampled to reduce parameters. It then explains the basic CNN architecture including convolution, max pooling, and fully connected layers. Convolution applies filters to extract features, max pooling downsamples, and fully connected layers perform classification.
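A minimal NumPy sketch of the convolution and max-pooling stages described above, assuming a single-channel image and "valid" padding; the filter is an arbitrary toy pattern detector.

```python
import numpy as np

def conv2d(image, kernel):
    # Valid cross-correlation (what deep-learning layers compute as "convolution").
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

def max_pool(fmap, size=2):
    # Downsample by taking the max over non-overlapping size x size windows.
    H, W = fmap.shape
    return fmap[:H//size*size, :W//size*size].reshape(
        H//size, size, W//size, size).max(axis=(1, 3))

img = np.random.rand(8, 8)
edge = np.array([[1.0, -1.0]])            # a tiny "pattern detector" filter
print(max_pool(conv2d(img, edge)).shape)  # (4, 3)
```

The same filter slides over every image region (shared parameters, property 2), and pooling discards spatial resolution without destroying the detected pattern (property 3).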
Fuzzy inference systems use fuzzy logic to map inputs to outputs. There are two main types:
Mamdani systems use fuzzy outputs and are well-suited for problems involving human expert knowledge. Sugeno systems have faster computation using linear or constant outputs.
The fuzzy inference process involves fuzzifying inputs, applying fuzzy logic operators, and using if-then rules. Outputs are determined through implication, aggregation, and defuzzification. Mamdani systems find the centroid of fuzzy outputs while Sugeno uses weighted averages, making it more efficient.
The document is a report on implementing and testing a radial basis function neural network for clustering iris flower data. It introduces RBF networks and the methodology used, which involved locating RBF nodes as cluster centers, calculating Gaussian functions, training the RBF layer unsupervised and a perceptron layer supervised. Results show the network accurately clustered most iris flowers into the three expected categories when trained on the iris data set.
MCQs on Differentiation and Ordinary Differential Equations, by Sayyad Shafi
This document contains multiple choice questions about differentiation, ordinary differential equations, and partial differential equations. Some key points covered are:
- The order of a differential equation is the highest derivative present. The degree is the exponent of the highest derivative.
- A partial differential equation has two or more independent variables.
- The steps to obtain a differential equation from a given function are to differentiate with respect to the independent variable, and continue differentiating until the number of arbitrary constants is reached.
- The solution of a second-order differential equation contains two arbitrary constants.
- Linear differential equations have dependent variables and derivatives that are of first degree only, with no product or transcendental terms.
Introduction to Recurrent Neural Networks, by Knoldus Inc.
The document provides an introduction to recurrent neural networks (RNNs). It discusses how RNNs differ from feedforward neural networks in that they have internal memory and can use their output from the previous time step as input. This allows RNNs to process sequential data like time series. The document outlines some common RNN types and explains the vanishing gradient problem that can occur in RNNs due to multiplication of small gradient values over many time steps. It discusses solutions to this problem like LSTMs and techniques like weight initialization and gradient clipping.
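A minimal sketch of gradient clipping by global norm, one of the techniques mentioned above (clipping targets the exploding-gradient side of the problem, while LSTMs address vanishing gradients); the max_norm threshold is an illustrative choice.

```python
import numpy as np

def clip_gradients_by_norm(grads, max_norm=5.0):
    # Rescale the whole gradient list when its global L2 norm exceeds
    # max_norm, a standard remedy for exploding gradients in RNN training.
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [g * scale for g in grads]
    return grads

grads = [np.array([30.0, 40.0])]       # global norm 50
print(clip_gradients_by_norm(grads))   # rescaled to norm 5
```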
This document provides an overview of support vector machines (SVMs), including their basic concepts, formulations, and applications. SVMs are supervised learning models that analyze data, recognize patterns, and are used for classification and regression. The document explains key SVM properties, the concept of finding an optimal hyperplane for classification, soft margin SVMs, dual formulations, kernel methods, and how SVMs can be used for tasks beyond binary classification like regression, anomaly detection, and clustering.
Fuzzy sets allow for gradual membership of elements in a set, rather than binary membership as in classical set theory. Membership is described on a scale of 0 to 1 using a membership function. Fuzzy sets generalize classical sets by treating classical sets as special cases where membership values are restricted to 0 or 1. Fuzzy set theory can model imprecise or uncertain information and is used in domains like bioinformatics. Examples of fuzzy sets include sets like "tall people" where membership in the set is a matter of degree.
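A minimal sketch of a membership function for the "tall people" example; the 160-190 cm breakpoints are assumptions chosen purely for illustration.

```python
def tall_membership(height_cm):
    # Piecewise-linear membership function for the fuzzy set "tall people":
    # 0 below 160 cm, 1 above 190 cm, linear in between.
    if height_cm <= 160:
        return 0.0
    if height_cm >= 190:
        return 1.0
    return (height_cm - 160) / 30.0

for h in (150, 175, 195):
    print(h, tall_membership(h))  # 0.0, 0.5, 1.0
```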
Differential Geometry for Machine LearningSEMINARGROOT
References:
- Differential Geometry of Curves and Surfaces, Manfredo P. do Carmo (2016)
- Differential Geometry, by Claudio Arezzo. YouTube: https://youtu.be/tKnBj7B2PSg
- "What is a Manifold?" YouTube: https://youtu.be/CEXSSz0gZI4
- Shape Analysis (MIT, Spring 2019), by Justin Solomon. YouTube: https://youtu.be/GEljqHZb30c
- Tensor Calculus. YouTube: https://youtu.be/kGXr1SF3WmA
- "Manifolds: A Gentle Introduction" and "Hyperbolic Geometry and Poincaré Embeddings", by Brian Keng: http://bjlkeng.github.io/posts/manifolds/ and http://bjlkeng.github.io/posts/hyperbolic-geometry-and-poincare-embeddings/
- Statistical Learning Models for Manifold-Valued Measurements with Application to Computer Vision and Neuroimaging, by Hyunwoo J. Kim
Strong convexity on gradient descent and Newton's method, by SEMINARGROOT
Gradient descent is an optimization algorithm used to find local minima of differentiable functions. It works by taking steps in the direction of the negative gradient at the current point. Newton's method approximates the function with a second-order Taylor expansion and minimizes that quadratic approximation to determine the next step. The gradient descent step can be shown to decrease the function value when the function is strongly convex or its gradient satisfies a Lipschitz condition.
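A small NumPy comparison of the two methods on a strongly convex quadratic; the matrix A, vector b, and step size are illustrative. Because the objective is exactly quadratic, Newton's method reaches the minimizer in a single step.

```python
import numpy as np

# Minimize f(x) = x^T A x / 2 - b^T x, a strongly convex quadratic
# (A is positive definite), so both methods behave cleanly.
A = np.array([[3.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, -2.0])

grad = lambda x: A @ x - b   # gradient of f
hess = lambda x: A           # Hessian is constant for a quadratic

x_gd = np.zeros(2)
for _ in range(100):         # gradient descent with a fixed step size
    x_gd = x_gd - 0.1 * grad(x_gd)

x_nt = np.zeros(2)
x_nt = x_nt - np.linalg.solve(hess(x_nt), grad(x_nt))  # one Newton step

print(x_gd, x_nt)  # both approach the minimizer A^{-1} b; Newton gets
                   # there in one step because f is exactly quadratic
```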
Dual Spaces of Generalized Cesàro Sequence Space and Related Matrix Mapping, by inventionjournals
In this paper we define the generalized Cesàro sequence spaces ces(p, q, s). We prove that ces(p, q, s) is a complete paranormed space. In Section 2 we determine its Köthe-Toeplitz dual. In Section 3 we establish necessary and sufficient conditions for a matrix A to map ces(p, q, s) to l∞ and ces(p, q, s) to c, where l∞ is the space of all bounded sequences and c is the space of all convergent sequences. We also obtain some known and unknown results as remarks.
The document summarizes research on using various iterative schemes to solve fixed-point problems and inequalities involving self-mappings and contractions in Banach spaces. It defines concepts like non-expansive mappings, mean non-expansive mappings, and rates of convergence. The paper presents two theorems: 1) an iterative scheme for a sequence involving a self-mapping T is shown to converge to a fixed point of T, and 2) an iterative process involving a self-contraction mapping T is defined and shown to converge. Limiting cases are considered to prove convergence as the number of iterations approaches infinity.
This document discusses finding the eigenvalues and eigenfunctions of a spin-1/2 particle pointing along an arbitrary direction. It shows that the eigenvalue equation reduces to a set of two linear, homogeneous equations. The eigenvalues are found to be ±1/2, and the corresponding eigenvectors are written in terms of the direction angles θ and Φ. As an example, it shows that for a spin oriented along the z-axis, the eigenvectors reduce to simple forms as expected for a spin-1/2 particle. It also introduces the Gauss elimination method for numerically solving systems of linear equations that arise in eigenvalue problems.
The EM algorithm is an iterative method to find maximum likelihood estimates of parameters in probabilistic models with latent variables. It has two steps: E-step, where expectations of the latent variables are computed based on current estimates, and M-step, where parameters are re-estimated to maximize the expected complete-data log-likelihood found in the E-step. As an example, the EM algorithm is applied to estimate the parameters of a Gaussian mixture model, where the latent variables indicate component membership of each data point.
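A compact NumPy sketch of EM for a two-component one-dimensional Gaussian mixture, following the E-step/M-step structure described above; the synthetic data and the initialization are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic 1-D data from two Gaussians (component membership is hidden).
x = np.concatenate([rng.normal(-2, 1, 200), rng.normal(3, 1, 200)])

mu, sigma, pi = np.array([-1.0, 1.0]), np.array([1.0, 1.0]), np.array([0.5, 0.5])
for _ in range(50):
    # E-step: responsibilities = posterior probability of each component.
    dens = pi * np.exp(-(x[:, None] - mu) ** 2 / (2 * sigma ** 2)) \
           / (sigma * np.sqrt(2 * np.pi))
    r = dens / dens.sum(axis=1, keepdims=True)
    # M-step: re-estimate parameters from the responsibility-weighted data.
    nk = r.sum(axis=0)
    mu = (r * x[:, None]).sum(axis=0) / nk
    sigma = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
    pi = nk / len(x)

print(mu)  # close to the true component means (-2, 3)
```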
1) The document discusses representation of the Dirac delta function in cylindrical and spherical coordinate systems. It shows that δ(r - r') = δ(ρ - ρ')δ(φ - φ')δ(z - z')/ρ in cylindrical coordinates and δ(r - r') = δ(r - r')δ(θ - θ')δ(φ - φ')/r^2 in spherical coordinates.
2) It also derives the important relation ∇^2(1/r) = -4πδ(r) and shows its application to the Laplace equation for electrostatic potential.
3) The completeness of eigenfunctions of harmonic oscillators and Legendre…
"Stochastic Optimal Control and Reinforcement Learning", invited to speak at the Nonlinear Dynamic Systems class taught by Prof. Frank Chong-woo Park, Seoul National University, December 4, 2019.
Universal Approximation Property via Quantum Feature Maps
----
The quantum Hilbert space can be used as a quantum-enhanced feature space in machine learning (ML) via the quantum feature map, which encodes classical data into quantum states. We prove that quantum ML models with typical quantum feature maps can approximate any continuous function at an optimal approximation rate.
---
Contributed talk at Quantum Techniques in Machine Learning 2021, Tokyo, November 8-12 2021.
By Quoc Hoan Tran, Takahiro Goto and Kohei Nakajima
1. This document covers key concepts in vector calculus including vector basics, vector differentiation, and vector integration. It defines concepts like position vectors, gradients, divergence, curl, line integrals, and surface integrals.
2. Formulas are provided for calculating directional derivatives, divergence, curl, line integrals, surface integrals, and theorems like Green's theorem and Gauss's divergence theorem.
3. Vector operations like dot products, cross products, and triple products are defined along with their geometric interpretations and formulas for calculation.
The Gaussian distribution is an important probability distribution that is commonly used in statistics and machine learning. It has several key properties: (1) it is defined by a mean and a covariance matrix, (2) the expectation of values drawn from it is the mean, and (3) the contours of constant probability density form ellipses. The conditional and marginal distributions of a multivariate Gaussian are themselves Gaussian. Bayes' theorem and maximum likelihood estimation can be applied to Gaussian distributions.
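A minimal sketch of conditioning a bivariate Gaussian, using the standard formulas (mean mu1 + S12 S22^{-1} (x2 - mu2), covariance S11 - S12 S22^{-1} S21); the numbers are illustrative.

```python
import numpy as np

# Joint Gaussian over (x1, x2) with mean mu and covariance S partitioned as
# [[S11, S12], [S21, S22]]; then p(x1 | x2) is Gaussian.
mu = np.array([0.0, 1.0])
S = np.array([[2.0, 0.8],
              [0.8, 1.0]])

x2 = 2.0
cond_mean = mu[0] + S[0, 1] / S[1, 1] * (x2 - mu[1])   # 0.8
cond_var = S[0, 0] - S[0, 1] ** 2 / S[1, 1]            # 1.36
print(cond_mean, cond_var)  # conditioning shifts the mean, shrinks the variance
```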
1) The document discusses periodic solutions for nonlinear systems of integro-differential equations with impulsive action of operators.
2) It presents a numerical-analytic method for approximating periodic solutions using uniformly convergent sequences of periodic functions.
3) The method is proved to construct a unique periodic solution that converges uniformly as m approaches infinity.
The document discusses matrix representations of operators and changes of basis in quantum mechanics. Some key points:
- Matrix elements of an operator are computed using a basis of kets. The expectation value of an operator is computed from its matrix elements and the state vectors.
- If two operators commute, they admit a common set of eigenkets.
- A change of basis is a unitary transformation that relates two different sets of basis kets that span the same space. It establishes a link between the two basis representations.
- Linear algebra concepts like linear independence of eigenvectors and Hermitian operators having real eigenvalues are important in quantum mechanics.
This document provides an overview of hierarchical representation with hyperbolic geometry. It introduces hyperbolic space as an alternative to Euclidean space for embedding symbolic and hierarchical data. Key points covered include: (1) the limitations of Euclidean embedding for graph structures, (2) definitions of hyperbolic space and the Poincaré disk model, (3) optimization techniques for gradient descent in hyperbolic space, including calculating gradients and using retractions, and (4) simple toy experiments demonstrating optimization in hyperbolic space.
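A minimal sketch of the Poincaré-disk distance that such embeddings optimize; the points are illustrative, and the eps guard is an implementation convenience.

```python
import numpy as np

def poincare_distance(u, v, eps=1e-9):
    # Geodesic distance in the Poincaré disk model of hyperbolic space:
    # d(u, v) = arcosh(1 + 2 ||u - v||^2 / ((1 - ||u||^2)(1 - ||v||^2))).
    # Distances blow up near the boundary, which is what lets trees
    # embed with low distortion.
    sq = np.sum((u - v) ** 2)
    denom = (1 - np.sum(u ** 2)) * (1 - np.sum(v ** 2))
    return np.arccosh(1 + 2 * sq / (denom + eps))

u = np.array([0.0, 0.0])
v = np.array([0.9, 0.0])
print(poincare_distance(u, v))  # ~2.94, far larger than the Euclidean 0.9
```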
∂z/∂x = f′(g) · ∂g/∂x,  ∂z/∂y = f′(g) · ∂g/∂y  (the chain rule for z = f(g(x, y)))
The document discusses multi-variable functions and their derivatives. It defines partial derivatives as the slope of a multi-variable function with respect to one variable, holding the other variables constant. It provides examples of calculating partial derivatives using limits and applying rules like the product and chain rules. Formulas are given for finding the partial derivatives of a function z with respect to x and y at a specific point.
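A small SymPy sketch of partial differentiation and evaluation at a point; the function z = x²y + sin(xy) and the point (1, 2) are my own example.

```python
import sympy as sp

x, y = sp.symbols('x y')
z = x**2 * y + sp.sin(x * y)   # an illustrative two-variable function

dz_dx = sp.diff(z, x)  # treat y as a constant
dz_dy = sp.diff(z, y)  # treat x as a constant
print(dz_dx)  # 2*x*y + y*cos(x*y)
print(dz_dy)  # x**2 + x*cos(x*y)

# Evaluate at a specific point, e.g. (x, y) = (1, 2).
print(dz_dx.subs({x: 1, y: 2}))  # 4 + 2*cos(2)
```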
The document provides an introduction to variational autoencoders (VAE). It discusses how VAEs can be used to learn the underlying distribution of data by introducing a latent variable z that follows a prior distribution like a standard normal. The document outlines two approaches - explicitly modeling the data distribution p(x), or using the latent variable z. It suggests using z and assuming the conditional distribution p(x|z) is a Gaussian with mean determined by a neural network gθ(z). The goal is to maximize the likelihood of the dataset by optimizing the evidence lower bound objective.
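A minimal NumPy sketch of the ELBO under common simplifying assumptions (Gaussian q(z|x) with diagonal covariance, unit-variance Gaussian p(x|z), single-sample reconstruction term); the toy inputs are illustrative.

```python
import numpy as np

def gaussian_kl(mu, logvar):
    # Closed-form KL( N(mu, sigma^2) || N(0, 1) ), summed over latent dims:
    # KL = -1/2 * sum(1 + log sigma^2 - mu^2 - sigma^2)
    return -0.5 * np.sum(1 + logvar - mu ** 2 - np.exp(logvar))

def elbo(x, x_recon, mu, logvar):
    # With a unit-variance Gaussian p(x|z), the reconstruction
    # log-likelihood reduces to a squared error (up to constants);
    # the KL term regularizes q(z|x) toward the standard-normal prior.
    recon_loglik = -0.5 * np.sum((x - x_recon) ** 2)
    return recon_loglik - gaussian_kl(mu, logvar)

x = np.array([0.5, -1.0])
print(elbo(x, x_recon=np.array([0.4, -0.9]),
           mu=np.zeros(2), logvar=np.zeros(2)))  # -0.01
```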
A Non Local Boundary Value Problem with Integral Boundary ConditionIJMERJOURNAL
This document discusses a non-local boundary value problem with an integral boundary condition for a second order differential equation. It begins by introducing the specific boundary value problem and providing relevant background information. It then establishes some preliminary definitions and results needed to prove existence and uniqueness of solutions. The key results proved are: 1) the Green's function for the corresponding homogeneous boundary value problem is derived; 2) it is shown that the unique solution can be written using this Green's function and an integral operator; and 3) an integral equation is obtained that can be used to solve for the unique solution.
Optimum Engineering Design - Day 2b. Classical Optimization methodsSantiagoGarridoBulln
This document provides an overview of an optimization methods course, including its objectives, prerequisites, and materials. The course covers topics such as linear programming, nonlinear programming, and mixed integer programming problems. It also includes mathematical preliminaries on topics like convex sets and functions, gradients, Hessians, and Taylor series expansions. Methods for solving systems of linear equations and examples are presented.
The document discusses metric-based few-shot learning approaches. It introduces Matching Networks, which use an attention mechanism to calculate similarity between support and query embeddings. Prototypical Networks determine class membership for a query based on distance to prototype representations of each class. Relation Networks concatenate support and query embeddings and pass them through a relation module to predict relations as classification scores. The approaches aim to learn from few examples by leveraging metric learning in an embedding space.
The main obstacle in Bayesian statistics and Bayesian machine learning is computing the posterior distribution, which in many contexts is intractable. Today there are two main approaches that avoid computing the posterior directly: sampling methods (e.g., MCMC) and variational inference. Compared with variational inference, MCMC takes more time and is vulnerable to high-dimensional parameters, but it has the advantages of simplicity and convergence guarantees. I'll briefly introduce several methods used in practice.
Texture synthesis aims to produce new texture samples from an example that are similar but not repetitive. The example is analyzed with a CNN to compute Gram matrices representing the texture at different layers; new textures are then synthesized by passing noise through the CNN and minimizing the difference from the example's Gram matrices. Style transfer extends this to merge the texture of one image with the content of another by matching Gram matrices across layers, transferring style while preserving content. Style and content have been shown to be separable in CNN representations, and style transfer can be viewed as a type of domain adaptation between content and style domains.
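A minimal sketch of the Gram-matrix computation at the heart of this approach, assuming a (channels, height, width) activation tensor; normalizing by the spatial size is one common convention.

```python
import numpy as np

def gram_matrix(feature_map):
    # feature_map: (channels, height, width) activations from one CNN layer.
    # The Gram matrix is the channel-by-channel inner product of the
    # flattened spatial maps; it discards spatial layout and keeps which
    # features co-occur, which is what characterizes texture/style.
    C, H, W = feature_map.shape
    F = feature_map.reshape(C, H * W)
    return F @ F.T / (H * W)   # (C, C), normalized by spatial size

fmap = np.random.rand(64, 16, 16)
print(gram_matrix(fmap).shape)  # (64, 64)
```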
Towards Deep Learning Models Resistant to Adversarial Attacks, by SEMINARGROOT
This document discusses approaches to training deep neural networks to be robust against adversarial examples. It frames adversarial robustness as a minimax game between the network and an attacker. It presents projected gradient descent (PGD) and the Fast Gradient Sign Method (FGSM) as ways to solve the inner maximization problem during training. Experiments show that adversarially trained models can achieve increased robustness compared to standard networks.
Node embedding techniques learn vector representations of nodes in a graph that can be used for downstream machine learning tasks like classification, clustering, and link prediction. DeepWalk uses random walks to generate sequences of nodes that are treated similarly to sentences, and learns embeddings by predicting nodes using their neighbors, like word2vec. It does not incorporate node features or labels. Node2vec extends DeepWalk by introducing a biased random walk to learn embeddings, addressing some limitations of DeepWalk while maintaining scalability.
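A minimal sketch of the uniform random walk that DeepWalk generates as input "sentences" for word2vec (node2vec would bias the neighbor choice); the toy graph is illustrative.

```python
import random

def random_walk(adj, start, length, rng=random.Random(0)):
    # adj: dict mapping each node to its list of neighbors.
    # Returns a node sequence that DeepWalk feeds to word2vec
    # as if it were a sentence.
    walk = [start]
    for _ in range(length - 1):
        neighbors = adj[walk[-1]]
        if not neighbors:
            break
        walk.append(rng.choice(neighbors))
    return walk

adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
print(random_walk(adj, start=0, length=6))
```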
This document discusses graph convolutional networks (GCNs), which are neural network models for graph-structured data. GCNs aim to learn functions on graphs by preserving the graph's spatial structure and enabling weight sharing. The document outlines the basic components of a GCN, including the adjacency matrix, node features, and application of deep neural network layers. It also notes some challenges with applying convolutions to graphs and discusses approaches like using the graph Fourier transform based on the Laplacian matrix.
The document discusses different methods for denoising images in the spatial and frequency domains. It introduces spatial domain denoising techniques like mean filtering, median filtering, and adaptive filtering. It then explains how spatial domain images can be transformed into the frequency domain using Fourier and wavelet transforms. This allows denoising based on frequency content, where high frequencies associated with noise can be removed. It concludes by mentioning the CVPR Denoising Workshop as a resource.
The document contains code snippets and explanations for solving three LeetCode problems: Power of Two, Valid Parentheses, and Find Minimum in Rotated Sorted Array. For Power of Two, it provides an O(log n) solution that uses modulo and division to check if a number is a power of two. For Valid Parentheses, it provides an O(n) solution that uses a string to track opening and closing parentheses. For Find Minimum, it provides both an O(n) solution that finds the minimum by checking if each number is less than the previous, and an O(log n) solution that recursively searches halves of the array to find the minimum.
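A sketch of the Power of Two check: the O(log n) division approach the document describes, plus the constant-time bit trick as an alternative (the bit trick is my addition, not necessarily the document's solution).

```python
def is_power_of_two_division(n: int) -> bool:
    # O(log n): repeatedly divide by 2 while divisible, as described above.
    if n < 1:
        return False
    while n % 2 == 0:
        n //= 2
    return n == 1

def is_power_of_two_bit(n: int) -> bool:
    # O(1) alternative: a power of two has exactly one set bit,
    # so n & (n - 1) clears it to zero.
    return n > 0 and n & (n - 1) == 0

for k in (1, 6, 64):
    print(k, is_power_of_two_division(k), is_power_of_two_bit(k))
```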
This document provides an overview of time series models and concepts. It discusses stochastic processes, stationarity, the Wold decomposition, impulse response analysis, and ARMA processes. The key points are:
1) Time series models are used to identify shocks and responses over time from stochastic processes.
2) Stationarity assumptions are needed to estimate expectations and variances from time series data using the concept that these values are time-invariant.
3) The Wold decomposition represents a stationary process as the sum of a deterministic component and stochastic prediction errors/shocks.
4) Impulse response analysis examines how past shocks continue to impact the present and future, with an effect that decays over time.
This document summarizes generative models like VAEs and GANs. It begins with an introduction to information theory, defining key concepts like entropy and maximum likelihood estimation. It then explains generative models as estimating the joint distribution P(X, Y), compared to discriminative models estimating P(Y|X). VAEs are discussed as maximizing the evidence lower bound (ELBO) to estimate the latent variable distribution P(Z|X), allowing generation of new X values. GANs are also covered, defining their minimax game between a generator G and a discriminator D, with G learning to generate samples resembling the empirical data distribution P_emp.
Understanding Black-Box Prediction via Influence Functions, by SEMINARGROOT
Pang Wei Koh and Percy Liang
"Understanding Black-Box prediction via influence functions" ICML 2017 Best paper
References:
https://youtu.be/0w9fLX_T6tY
https://arxiv.org/abs/1703.04730
Attention Is All You Need (NIPS 2017)
(Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, Illia Polosukhin)
paper link: https://arxiv.org/pdf/1706.03762.pdf
Reference:
https://youtu.be/mxGCEWOxfe8 (by Minsuk Heo)
https://youtu.be/5vcj8kSwBCY (Stanford CS224N: NLP with Deep Learning | Winter 2019 | Lecture 14 – Transformers and Self-Attention)
The document discusses different types of attention mechanisms used in neural machine translation and image captioning models. It describes global attention which considers all encoder hidden states when deriving context vectors, and local attention which selectively focuses on a small window of context. Hard attention selects a single location to focus on, while soft attention takes a weighted average over locations. The document also discusses input feeding which makes the model aware of previous alignment choices.
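A minimal NumPy sketch of soft attention in its scaled dot-product form: every query takes a weighted average over all encoder states, with weights from a softmax over similarity scores. Shapes and data are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Soft (global) attention: every query attends to all keys with
    # weights that sum to 1; the context vector is the weighted
    # average of the values.
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # (n_queries, n_keys)
    weights = softmax(scores, axis=-1)  # alignment distribution
    return weights @ V, weights

Q = np.random.rand(2, 8)   # 2 decoder queries
K = np.random.rand(5, 8)   # 5 encoder states
V = np.random.rand(5, 8)
context, w = scaled_dot_product_attention(Q, K, V)
print(context.shape, w.sum(axis=-1))  # (2, 8); weights sum to 1 per query
```

Hard attention would instead pick a single key (e.g. the argmax), and local attention would restrict the softmax to a window of keys.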
This document is a tutorial on explainable AI from the WWW 2020 conference. It introduces explainable AI and discusses explanations from both a model and regulatory perspective. It then explores different methods for explaining individual predictions, global models, and building interpretable models. The remainder of the tutorial provides case studies on explaining diabetic retinopathy predictions, building an explainable AI engine for talent search, and using model interpretations for sales predictions. References are also included.
This document contains summaries of two LeetCode problems - Single Number and Product of Array Except Self.
For Single Number, it provides two O(n) solutions: one uses a dictionary to track duplicate numbers, and the other uses math, summing the distinct elements, multiplying by 2, and subtracting the original sum.
For Product of Array Except Self, it again provides two O(n) solutions. The first uses a variable to track the running product and another to count zeros, updating the output array accordingly. The second avoids division by calculating left and right running products in two arrays and multiplying the values together for each output element.
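A sketch of the division-free approach described above, using prefix and suffix running products (the zero-counting variant is not shown).

```python
def product_except_self(nums):
    # O(n) without division: output[i] = (product of everything left of i)
    #                                  * (product of everything right of i).
    n = len(nums)
    out = [1] * n
    left = 1
    for i in range(n):               # accumulate prefix products
        out[i] = left
        left *= nums[i]
    right = 1
    for i in range(n - 1, -1, -1):   # multiply in suffix products
        out[i] *= right
        right *= nums[i]
    return out

print(product_except_self([1, 2, 3, 4]))  # [24, 12, 8, 6]
```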
This document summarizes the key steps in the locality sensitive hashing (LSH) algorithm for finding similar documents:
1. Documents are converted to sets of shingles (sequences of tokens) to represent them as high-dimensional data points.
2. MinHashing is applied to generate signatures (hashes) for each document such that similar documents are likely to have the same signatures. This compresses the data into a signature matrix.
3. LSH uses the signature matrix to hash similar documents into the same buckets with high probability, finding candidate pairs for further similarity evaluation and filtering out dissimilar pairs from consideration. This improves the computation efficiency over directly comparing all pairs.
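A minimal sketch of the MinHash step (step 2), assuming documents are already shingle sets; Python's built-in hash with random salts stands in for the hash-function family, and the fraction of rows on which two signatures agree approximates their Jaccard similarity.

```python
import random

def minhash_signature(shingles, num_hashes=100, seed=0):
    # One row per hash function: keep, for each salted hash, the minimum
    # value over the document's shingle set. Two sets agree on a row with
    # probability equal to their Jaccard similarity, which LSH banding
    # then exploits to bucket likely-similar documents together.
    rng = random.Random(seed)
    salts = [rng.getrandbits(32) for _ in range(num_hashes)]
    return [min(hash((salt, s)) for s in shingles) for salt in salts]

def estimated_jaccard(sig_a, sig_b):
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

doc_a = {"the cat", "cat sat", "sat on"}
doc_b = {"the cat", "cat sat", "sat in"}
# True Jaccard is 2/4 = 0.5; the estimate should be close.
print(estimated_jaccard(minhash_signature(doc_a), minhash_signature(doc_b)))
```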
This document discusses two algorithms for solving the Two Sum problem from LeetCode: an O(n^2) nested loop solution and an O(n) hash table solution. It also presents a coding interview question to find the maximum prime factor of a given number N and provides a solution using a while loop to iteratively check for divisibility.
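A sketch of the O(n) hash-table Two Sum and a max-prime-factor solution in the spirit described above; this prime-factor version divides each factor out as it is found, a variant of the while-loop divisibility check the document mentions.

```python
def two_sum(nums, target):
    # O(n): remember each value's index; for every number, check whether
    # its complement has already been seen.
    seen = {}
    for i, x in enumerate(nums):
        if target - x in seen:
            return seen[target - x], i
        seen[x] = i
    return None

def max_prime_factor(n):
    # Divide out each factor as it is found; whatever remains above 1
    # at the end is the largest prime factor.
    factor, largest = 2, 1
    while factor * factor <= n:
        while n % factor == 0:
            largest = factor
            n //= factor
        factor += 1
    return n if n > 1 else largest

print(two_sum([2, 7, 11, 15], 9))   # (0, 1)
print(max_prime_factor(13195))      # 29, since 13195 = 5 * 7 * 13 * 29
```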
This slide shows Husien Hanafy's portfolio, 6-2024, by hessenhanafy1
A highly motivated architectural engineer with 6 years of experience in interior, exterior, and landscape design. I'm a self-motivated person and a competitive professional, driven by goals, with complete dedication and enthusiasm.
FD FAN.pdf: forced draft fan for boiler operation and run, its very important f..., by MDHabiburRhaman1
An FD fan, or forced draft fan, draws air from the atmosphere and forces it into the furnace through a preheater. These fans are located at the inlet of the boiler and push high-pressure fresh air into the combustion chamber, where it mixes with the fuel under positive pressure. The positive pressure created in the combustion chamber helps ensure that the fuel burns properly.
The working principle of a forced draft fan is based on the Bernoulli principle, which states that the pressure of a fluid decreases as its velocity increases. The rotating fan blades impart momentum to the air and accelerate it; this acceleration creates a lower pressure at the outlet of the fan, which draws air in from the inlet.
The amount of air the FD fan pushes into the boiler is determined by the fan's capacity (the amount of air it can move per unit of time) and the pressure differential between the inlet and outlet of the fan.
The FD fan is an essential component of any boiler system: it helps ensure that the fuel burns properly and that the boiler operates efficiently. Some of the benefits of using a forced draft fan:
- Improved combustion efficiency: the FD fan helps ensure that the fuel burns completely.
- Reduced emissions: complete combustion also reduces emissions.
- Increased boiler capacity: the FD fan can increase the capacity of the boiler by providing more air for combustion.
- Improved safety: the FD fan prevents the buildup of flammable gases in the boiler.
Forced Draft Fan (the full form of "FD fan") denotes a fan that supplies pressurized air to a system. In a steam boiler assembly the FD fan is of great importance: it supplies the combustion air the boiler needs, and its pressurized airflow promotes complete, controlled burning of the fuel, enhancing overall system performance.
What is the FD fan in a boiler? In a boiler system, the FD fan ensures efficient combustion and proper air circulation. Its primary function is to supply the combustion air needed for the combustion process: it draws in ambient air and forces it into the combustion chamber, creating the necessary air-fuel mixture. This controlled air supply ensures the fuel burns efficiently, leading to optimal heat transfer and energy production.
In summary, the FD fan i…
20CDE09- INFORMATION DESIGN
UNIT I INCEPTION OF INFORMATION DESIGN
Introduction and Definition
History of Information Design
Need of Information Design
Types of Information Design
Identifying audience
Defining the audience and their needs
Inclusivity and Visual impairment
Case study.
Understanding Cybersecurity Breaches: Causes, Consequences, and Prevention, by Bert Blevins
Cybersecurity breaches are a growing threat in today’s interconnected digital landscape, affecting individuals, businesses, and governments alike. These breaches compromise sensitive information and erode trust in online services and systems. Understanding the causes, consequences, and prevention strategies of cybersecurity breaches is crucial to protect against these pervasive risks.
Cybersecurity breaches refer to unauthorized access, manipulation, or destruction of digital information or systems. They can occur through various means such as malware, phishing attacks, insider threats, and vulnerabilities in software or hardware. Once a breach happens, cybercriminals can exploit the compromised data for financial gain, espionage, or sabotage. Causes of breaches include software and hardware vulnerabilities, phishing attacks, insider threats, weak passwords, and a lack of security awareness.
The consequences of cybersecurity breaches are severe. Financial loss is a significant impact, as organizations face theft of funds, legal fees, and repair costs. Breaches also damage reputations, leading to a loss of trust among customers, partners, and stakeholders. Regulatory penalties are another consequence, with hefty fines imposed for non-compliance with data protection regulations. Intellectual property theft undermines innovation and competitiveness, while disruptions of critical services like healthcare and utilities impact public safety and well-being.
2. Random Process
A random process X_t is completely characterized if the following is known:
P((X_{t_1}, ⋯, X_{t_k}) ∈ B) for any B, k, and t_1, ⋯, t_k.
A random process (RP), or stochastic process, is an infinite indexed collection of random variables {X(t) : t ∈ T} defined over a common probability space. (Functions are infinite-dimensional vectors.)
Note that, given a random process, only 'finite-dimensional' probabilities or probability functions can be specified.
For time t ∈ T and the outcome of the underlying random experiment ω ∈ Ω, X : T × Ω → ℝ.
9. Gaussian Process
Gaussian process and Gaussian process regression are different.
Gaussian process regression: a nonparametric Bayesian regression method that uses the properties of Gaussian processes.
There are two views for interpreting Gaussian process regression:
• Weight-space view
• Function-space view
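A minimal NumPy sketch of the function-space view: GP regression as conditioning a joint Gaussian prior on noisy observations, following the standard predictive equations from Rasmussen & Williams (cited below); the RBF kernel, toy data, and noise level are illustrative.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0):
    # Squared-exponential covariance between two sets of 1-D inputs.
    d2 = (A[:, None] - B[None, :]) ** 2
    return np.exp(-0.5 * d2 / lengthscale ** 2)

# Noisy training data and test inputs.
X = np.array([-4.0, -2.0, 0.0, 1.5, 3.0])
y = np.sin(X)
Xs = np.linspace(-5, 5, 100)
noise = 1e-2

# Condition the joint Gaussian prior on the observed data.
K = rbf_kernel(X, X) + noise * np.eye(len(X))
Ks = rbf_kernel(X, Xs)
Kss = rbf_kernel(Xs, Xs)

alpha = np.linalg.solve(K, y)
post_mean = Ks.T @ alpha                          # predictive mean
post_cov = Kss - Ks.T @ np.linalg.solve(K, Ks)    # predictive covariance
print(post_mean[:3], np.diag(post_cov)[:3])
```

The weight-space view reaches the same predictive distribution by doing Bayesian linear regression in a feature space induced by the kernel.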
33. References
C. E. Rasmussen and C. K. Williams. Gaussian Processes for Machine Learning, volume 1. MIT Press, Cambridge, 2006.
34. References
"Gaussian Process", lectured by Professor Il-Chul Moon
- video link: https://youtu.be/RmN54ykspK4
Ian Goodfellow et al., Deep Learning (2016)
Trevor Hastie et al., The Elements of Statistical Learning (2001)
Machine Learning Lecture 26, "Gaussian Processes", Cornell CS4780 SP17, by Kilian Weinberger
- video link: https://www.youtube.com/watch?v=R-NUdqxKjos&t=1000s
9.520/6.860S Statistical Learning Theory, by Lorenzo Rosasco
- slides: http://www.mit.edu/~9.520/fall14/slides/class03/class03_rkhsPart1.pdf
- video link: https://www.youtube.com/watch?v=9-oxo_k69qs
Bayesian Deep Learning, by Sungjoon Choi
- video link: https://www.edwith.org/bayesiandeeplearning/joinLectures/14426