Brownian Motion and Gaussian Processes For Machine Learning
Independent Study:
Introduction to Stochastic Processes
Umang Srivastava
Topics Covered:
Brownian Motion
Gaussian Process Regression
Brownian Motion:
Assumptions:
$X_0 = 0$.
The motion is completely random.
For any $s_1 \le t_1 \le s_2 \le t_2 \le \cdots \le s_n \le t_n$, the random variables
$X_{t_1} - X_{s_1}, X_{t_2} - X_{s_2}, \dots, X_{t_n} - X_{s_n}$ are independent.
The distribution of $X_t - X_s$ depends only on $t - s$ (stationary increments).
$X_t$ is a continuous function of $t$.
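These assumptions suggest a direct way to simulate a path: sum i.i.d. Gaussian increments on a fine grid. A minimal sketch, assuming a standard ($\sigma^2 = 1$) Brownian motion; the horizon and step count are arbitrary choices:

```python
import numpy as np

def brownian_path(T=1.0, n=1000, seed=0):
    """Approximate a standard Brownian path on [0, T] with n steps.

    Increments X_{t+dt} - X_t are drawn i.i.d. N(0, dt), matching the
    independent, stationary-increment assumptions above; X_0 = 0.
    """
    rng = np.random.default_rng(seed)
    dt = T / n
    increments = rng.normal(0.0, np.sqrt(dt), size=n)
    X = np.concatenate([[0.0], np.cumsum(increments)])
    t = np.linspace(0.0, T, n + 1)
    return t, X

t, X = brownian_path()
print(X[0], X[-1])  # X_0 = 0; X_1 is (approximately) N(0, 1)
```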
Brownian Motion: Distribution
Define $Y_{i,n} = X_{i/n} - X_{(i-1)/n}$ for $i = 1, \dots, n$, and $M_n = \max_{1 \le i \le n} |Y_{i,n}|$;
then $M_n \to 0$ as $n \to \infty$, by continuity of the path.
The only distribution that can be written as the sum of $n$ iid random
variables such that the maximum of the variables goes to 0 is a normal
distribution.
$S_n = Y_{1,n} + Y_{2,n} + \cdots + Y_{n,n} = X_1$, where the $Y_{i,n}$ are the iid increments above; hence $X_1 \sim \mathcal{N}(0, \sigma^2)$ and, by stationarity of increments, $X_t - X_s \sim \mathcal{N}(0, \sigma^2(t - s))$.
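A quick numerical check of this argument (the grid sizes are arbitrary): split $X_1$ into $n$ iid increments and watch the maximum shrink as $n$ grows.

```python
import numpy as np

rng = np.random.default_rng(1)
for n in [10, 100, 1000, 10000]:
    # Y_i = X_{i/n} - X_{(i-1)/n} are i.i.d. N(0, 1/n); their sum is X_1.
    Y = rng.normal(0.0, np.sqrt(1.0 / n), size=n)
    print(n, np.abs(Y).max())  # M_n = max_i |Y_i| -> 0 as n -> infinity
```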
Markov Property:
$T_x = \inf\{t : X_t = x\}$ (the hitting time of $x$).
Define, for a stopping time $T$,
$Y_t = X_{t+T} - X_T$.
Strong Markov Property: $Y_t$ is a Brownian motion independent of $\mathcal{F}_T$.
$X_t$ takes both positive and negative values for arbitrarily large values of
$t$ and for arbitrarily small values of $t > 0$.
Let $Z = \{t : X_t = 0\}$ be the zero set of the path. $Z$ is a closed set.
Topologically, $Z$ is similar to the Cantor set: it is uncountable, has no isolated points, and contains no intervals.
The fractal dimension of $Z$ is $1/2$.
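A rough box-counting estimate of this dimension (path resolution and box counts are arbitrary choices, and the discretization is crude): count the intervals of width $1/k$ on which a simulated path touches 0; for dimension $1/2$ the count should grow roughly like $k^{1/2}$.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2**20
# Simulated standard Brownian path on [0, 1].
X = np.cumsum(rng.normal(0.0, np.sqrt(1.0 / n), size=n))

for k in [2**8, 2**10, 2**12]:          # k boxes of width 1/k
    boxes = X.reshape(k, -1)
    # A box "contains a zero" if the path changes sign (or hits 0) inside it.
    crossed = (boxes.min(axis=1) <= 0) & (boxes.max(axis=1) >= 0)
    N = crossed.sum()
    print(k, N, np.log(N) / np.log(k))  # exponent should be near 1/2
```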
Brownian Motion in Several Dimensions
$f(t, y) = E^y[f(X_t)]$
For $d \ge 3$ and $R_1 \le |x| \le R_2$, the probability that the motion started at $x$ reaches the sphere of radius $R_2$ before the sphere of radius $R_1$ is
$$f(x) = \phi(|x|) = \frac{R_1^{(2-d)} - |x|^{(2-d)}}{R_1^{(2-d)} - R_2^{(2-d)}}$$
$A = \{x \in \mathbb{R}^d : X_t = x \text{ for some } t\}$
For d = 2, every open ball is visited by the Brownian Motion. Hence the
dimension of A is 2.
For $d \ge 3$, take a typical ball of diameter $\varepsilon$. By the calculations done for
proving transience, a ball of radius $\varepsilon/2$ around a point $x$ (with $|x| > \varepsilon/2$) is
visited with probability about $(\varepsilon/(2|x|))^{d-2}$.
Hence, if $\varepsilon$ is small and $|x|$ is of order 1, the expected number of visited balls among the roughly $\varepsilon^{-d}$ balls covering a unit region
is about $\varepsilon^{d-2} \cdot \varepsilon^{-d} = \varepsilon^{-2}$, so the dimension of $A$ is 2 in every dimension $d \ge 3$ as well.
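A Monte Carlo check of the hitting-probability formula above for $d = 3$ (radii, start point, step size, and trial count are all arbitrary choices, and the time discretization introduces a small bias):

```python
import numpy as np

def hits_outer_first(x0, R1=1.0, R2=4.0, dt=1e-3, seed=None):
    """Run 3-d Brownian motion from x0 until |X| <= R1 or |X| >= R2."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    while True:
        x += rng.normal(0.0, np.sqrt(dt), size=3)
        r = np.linalg.norm(x)
        if r <= R1:
            return False     # hit the inner sphere first
        if r >= R2:
            return True      # hit the outer sphere first

R1, R2, r0, trials = 1.0, 4.0, 2.0, 500
hits = sum(hits_outer_first([r0, 0.0, 0.0], R1, R2, seed=i) for i in range(trials))
# phi(r) = (R1^(2-d) - r^(2-d)) / (R1^(2-d) - R2^(2-d)) with d = 3
phi = (R1**-1.0 - r0**-1.0) / (R1**-1.0 - R2**-1.0)
print(hits / trials, phi)    # both should be near 2/3
```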
Supervised Learning:
Gaussian Processes:
A Gaussian process is a stochastic process (a collection of random
variables indexed by time or space), such that every finite collection of
those random variables has a multivariate normal distribution, i.e.
every finite linear combination of them is normally distributed.
A Gaussian process is essentially a generalization of the Gaussian
probability distribution.
Whereas a probability distribution describes random variables which
are scalars or vectors (for multivariate distributions), a stochastic
process governs the properties of functions.
We can think of a function as a very long vector, each entry in the
vector specifying the function value f(x) at a particular input x.
There are two ways to interpret Gaussian process (GP) regression models,
which will be discussed:
Weight-space view
Function-space view
Weight-space view:
Model the latent function with basis functions: $f(x) = \phi(x)^T w$, with prior
$w \sim \mathcal{N}(0, \Sigma_p)$.
We know the likelihood is $p(y|X, w) = \mathcal{N}(\Phi^T w, \sigma_n^2 I)$, where $\Phi$ is the matrix whose columns are the training feature vectors $\phi(x_i)$,
and,
$$p(y|X) = \int p(y|X, w)\, p(w)\, dw$$
Therefore, the posterior over the weights is Gaussian,
$$p(w|X, y) = \mathcal{N}(\bar{w}, A^{-1}), \qquad \bar{w} = \sigma_n^{-2} A^{-1} \Phi y, \qquad A = \sigma_n^{-2} \Phi \Phi^T + \Sigma_p^{-1}$$
To make predictions for a test case we average over all possible parameter
values, weighted by their posterior probability. Thus the predictive
distribution of $f_* = f(x_*)$, with $\phi_* = \phi(x_*)$, is
$$p(f_*|x_*, X, y) = \mathcal{N}\left(\sigma_n^{-2} \phi_*^T A^{-1} \Phi y,\; \phi_*^T A^{-1} \phi_*\right)$$
equivalently
$$f_*|x_*, X, y \sim \mathcal{N}\left(\phi_*^T \Sigma_p \Phi (K + \sigma_n^2 I)^{-1} y,\; \phi_*^T \Sigma_p \phi_* - \phi_*^T \Sigma_p \Phi (K + \sigma_n^2 I)^{-1} \Phi^T \Sigma_p \phi_*\right)$$
where $K = \Phi^T \Sigma_p \Phi$.
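A minimal sketch of these weight-space computations, with a hypothetical quadratic basis and assumed noise/prior settings:

```python
import numpy as np

rng = np.random.default_rng(3)

def phi(x):
    """Hypothetical basis functions: 1, x, x^2."""
    return np.stack([np.ones_like(x), x, x**2])

# Noisy training data from an assumed quadratic ground truth.
sigma_n = 0.1
X = rng.uniform(-1.0, 1.0, size=20)
y = 0.5 * X**2 - X + sigma_n * rng.normal(size=X.shape)

Phi = phi(X)                     # D x n matrix of training features
Sigma_p = np.eye(3)              # prior w ~ N(0, Sigma_p)

# Posterior: A = sigma_n^{-2} Phi Phi^T + Sigma_p^{-1}, w_bar = sigma_n^{-2} A^{-1} Phi y
A = Phi @ Phi.T / sigma_n**2 + np.linalg.inv(Sigma_p)
w_bar = np.linalg.solve(A, Phi @ y) / sigma_n**2

# Predictive distribution of f_* at test inputs.
x_star = np.linspace(-1.0, 1.0, 5)
phi_star = phi(x_star)
mean = phi_star.T @ w_bar                                            # predictive mean
var = np.einsum("dn,dn->n", phi_star, np.linalg.solve(A, phi_star))  # predictive variance
print(mean)            # close to 0.5 x^2 - x at the test points
print(np.sqrt(var))    # predictive std of f_*
```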
Define the kernel as $k(x, x') = \phi(x)^T \Sigma_p \phi(x')$, where $x$ and $x'$ are in either
the training or the test sets.
Define $\psi(x) = \Sigma_p^{1/2} \phi(x) \implies k(x, x') = \psi(x) \cdot \psi(x')$
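A short numerical confirmation of this identity (the dimension, feature vectors, and prior covariance below are all arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
D = 4
phi_x, phi_xp = rng.normal(size=D), rng.normal(size=D)   # arbitrary feature vectors

B = rng.normal(size=(D, D))
Sigma_p = B @ B.T + 1e-6 * np.eye(D)                     # arbitrary SPD prior covariance

k_direct = phi_x @ Sigma_p @ phi_xp                      # k(x, x') = phi(x)^T Sigma_p phi(x')

L = np.linalg.cholesky(Sigma_p)                          # Sigma_p = L L^T, a valid square root
psi_x, psi_xp = L.T @ phi_x, L.T @ phi_xp                # psi(x) = Sigma_p^{1/2} phi(x)
print(np.isclose(k_direct, psi_x @ psi_xp))              # True: k(x, x') = psi(x) . psi(x')
```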
Function-space view:
For noisy observations, $\operatorname{cov}(y_p, y_q) = k(x_p, x_q) + \sigma_n^2 \delta_{pq}$,
where $\delta_{pq}$ is the Kronecker delta. Now, we can write the joint distribution
of the observed target values and the function values at the test locations
under the prior as
$$\begin{bmatrix} y \\ f_* \end{bmatrix} \sim \mathcal{N}\left(0,\; \begin{bmatrix} K(X, X) + \sigma_n^2 I & K(X, X_*) \\ K(X_*, X) & K(X_*, X_*) \end{bmatrix}\right)$$
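Conditioning this joint Gaussian on the observed $y$ gives the usual GP predictive mean and covariance. A minimal sketch, assuming a squared-exponential kernel and made-up data and hyperparameters:

```python
import numpy as np

def k(A, B, ell=0.3):
    """Squared-exponential kernel (an assumed choice)."""
    d = A[:, None] - B[None, :]
    return np.exp(-0.5 * (d / ell) ** 2)

rng = np.random.default_rng(4)
sigma_n = 0.1
X = rng.uniform(-1.0, 1.0, size=10)
y = np.sin(3 * X) + sigma_n * rng.normal(size=10)
Xs = np.linspace(-1.0, 1.0, 50)          # test locations

# Condition the joint Gaussian [y, f_*] on y, via a Cholesky factor of K(X,X) + sigma_n^2 I.
L = np.linalg.cholesky(k(X, X) + sigma_n**2 * np.eye(10))
alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))

f_mean = k(Xs, X) @ alpha                # E[f_* | X, y, x_*]
v = np.linalg.solve(L, k(X, Xs))
f_cov = k(Xs, Xs) - v.T @ v              # cov[f_* | X, y, x_*]
print(f_mean[:5])
print(np.sqrt(np.diag(f_cov))[:5])       # pointwise predictive std
```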
In general, the value of $y_{\text{guess}}$ that minimizes the risk for the absolute loss
$|y_{\text{guess}} - y_*|$ is the median of $p(y_*|x_*, \mathcal{D})$, while for the squared loss
$(y_{\text{guess}} - y_*)^2$ it is the mean of this distribution. When the predictive
distribution is Gaussian, the mean and the median coincide, and for any
symmetric loss function and symmetric predictive distribution we always
get $y_{\text{guess}}$ as the mean of the predictive distribution.
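A small numerical illustration, using a skewed stand-in for the predictive distribution so the two optima visibly differ:

```python
import numpy as np

rng = np.random.default_rng(5)
samples = rng.lognormal(0.0, 1.0, size=100_000)  # skewed stand-in for p(y_* | x_*, D)

guesses = np.linspace(0.1, 5.0, 500)
abs_risk = [np.mean(np.abs(g - samples)) for g in guesses]
sq_risk = [np.mean((g - samples) ** 2) for g in guesses]

print(guesses[np.argmin(abs_risk)], np.median(samples))  # absolute loss -> median (~1.0)
print(guesses[np.argmin(sq_risk)], np.mean(samples))     # squared loss  -> mean  (~1.65)
```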
Stochastic Processes:
Markov Property: for a chain $(X_n)$ on a state space $S$,
$$P\{X_{n+1} = y \mid X_0 = x_0, \dots, X_n = x_n\} = P\{X_{n+1} = y \mid X_n = x_n\} = p(x_n, y)$$
Chapman-Kolmogorov Equation:
If $0 < m, n < \infty$,
$$p^{m+n}(x, y) = \sum_{z \in S} P\{X_{m+n} = y, X_m = z \mid X_0 = x\} = \sum_{z \in S} p^m(x, z)\, p^n(z, y)$$
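In matrix form this says $p^{m+n} = p^m\, p^n$; a quick check with an arbitrary 3-state transition matrix:

```python
import numpy as np

P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.4, 0.4, 0.2]])      # arbitrary stochastic matrix (rows sum to 1)

m, n = 2, 3
lhs = np.linalg.matrix_power(P, m + n)
rhs = np.linalg.matrix_power(P, m) @ np.linalg.matrix_power(P, n)
print(np.allclose(lhs, rhs))         # True: p^{m+n}(x,y) = sum_z p^m(x,z) p^n(z,y)
```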
If the chain is transient, then every state is visited only a finite number of
times.
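A simulation illustrating this for a transient chain, using a right-biased random walk on the integers (the bias and run length are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(6)
steps = rng.choice([-1, 1], p=[0.3, 0.7], size=200_000)  # drift to +infinity => transient
path = np.concatenate([[0], np.cumsum(steps)])
print(np.sum(path == 0))  # visits to state 0 stay finite (and small) despite 200k steps
```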