
Indian Institute of Science Education and Research, Kolkata

Independent Study:
Introduction to Stochastic Processes

Umang Srivastava

Supervisor: Dr. Anirvan Chakraborty

References:

Introduction to Stochastic Processes by Gregory F. Lawler


Gaussian Processes for Machine Learning by C. E. Rasmussen & C. K. I. Williams


Overview & Introduction

Topics Covered:
Brownian Motion
Gaussian Regression



Brownian Motion
Brownian Motion:

Brownian motion is a stochastic process with both continuous time and continuous state space. Let Xt represent the position of a particle at time t. Here, t takes values in the non-negative real numbers and Xt takes values in the real line.

Assumptions:
X0 = 0
The motion is completely random.
For any s1 ≤ t1 ≤ s2 ≤ t2 ≤ ... ≤ sn ≤ tn , the random variables
Xt1 − Xs1 , Xt2 − Xs2 , ..., Xtn − Xsn are independent.
The distribution of Xt − Xs depends only on t − s.
Xt is a continuous function of t.
Brownian Motion: Distribution
Brownian Motion:

The above assumptions uniquely describe the process up to a scaling constant.
For the case t = 1, we can write

X1 = [X1/n − X0 ] + [X2/n − X1/n ] + ... + [Xn/n − X(n−1)/n ]

Define,

Mn = max{|X1/n − X0 |, |X2/n − X1/n |, ..., |Xn/n − X(n−1)/n |}

Then Mn → 0 as n → ∞, since a Brownian path is continuous and hence uniformly continuous on [0, 1]. The only distribution that can be written, for every n, as the sum of n i.i.d. random variables whose maximum tends to 0 is the normal distribution.
Brownian Motion: Distribution
From the assumptions, we get:

For a Brownian motion with variance parameter σ², for any s < t, the random variable Xt − Xs has a normal distribution with mean 0 and variance (t − s)σ².

Brownian motion can also be constructed as a limit of random walks. Let Sn be an unbiased random walk on the integers,

Sn = Y1 + Y2 + ... + Yn, where P{Yi = 1} = P{Yi = −1} = 1/2

Instead of increments of size 1, take time increments of size ∆t = 1/N for a positive integer N. Define $W^{(N)}_{k\Delta t} = a_N S_k$, where $a_N = N^{-1/2}$, so that $\mathrm{Var}(W^{(N)}_1) = 1$.


Brownian Motion

As N → ∞, the discrete approximation approaches a continuous-time, continuous-space process. By the central limit theorem,

$$W^{(N)}_1 = \frac{S_N}{\sqrt{N}}$$

approaches a normal distribution with mean 0 and variance 1, and the distribution of $W^{(N)}_t$ approaches a normal distribution with mean 0 and variance t. The limiting process can be shown to be a Brownian motion.

Additionally, it can be shown that the path of a Brownian motion is nowhere differentiable.
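As an illustration of the random-walk construction, here is a minimal simulation sketch (assuming NumPy; the grid size N and the number of runs are arbitrary illustrative choices):

```python
import numpy as np

def random_walk_bm(N, T=1.0, rng=None):
    """Approximate Brownian motion on [0, T] by the scaled random walk
    W^(N)_{k/N} = S_k / sqrt(N)."""
    rng = rng or np.random.default_rng(0)
    steps = rng.choice([-1.0, 1.0], size=int(N * T))  # P{Y_i = ±1} = 1/2
    S = np.cumsum(steps)                              # random walk S_k
    t = np.arange(1, len(S) + 1) / N                  # time grid k·Δt
    return t, S / np.sqrt(N)                          # a_N = N^(-1/2) scaling

# W^(N)_1 over many independent runs is approximately N(0, 1):
vals = [random_walk_bm(10_000, rng=np.random.default_rng(s))[1][-1]
        for s in range(1_000)]
print(np.mean(vals), np.var(vals))  # near 0 and 1 respectively
```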



Brownian Motion: Markov Property
Markov Property:

Let Xt be a standard Brownian motion and let Ft represent the information contained in Xs, s ≤ t. For s < t,

E(Xt | Fs) = E(Xs | Fs) + E(Xt − Xs | Fs).

Since the increment Xt − Xs is independent of Fs and has mean 0, this gives

E(Xt | Fs) = Xs = E(Xt | Xs)


Therefore, in order to predict Xt given all the information up through time
s, it suffices to consider only the value of the Brownian motion at time s.



Brownian Motion: Strong Markov Property
Stopping Time: A random variable T taking values in [0, ∞] is a stopping time if for each t the event {T ≤ t} is measurable with respect to Ft. Typically, stopping times are of the form

Tx = inf {t : Xt = x}

Define,
Yt = Xt+T − XT
Strong Markov Property: Yt is a Brownian Motion independent of FT



Brownian Motion

Reflection Principle: Suppose Xt is a Brownian motion with variance parameter σ² starting at a, with a < b. Then for any t > 0,

$$P\{X_s \ge b \text{ for some } 0 \le s \le t\} = 2\,P\{X_t \ge b \mid X_0 = a\} = 2\int_b^{\infty} \frac{1}{\sqrt{2\pi\sigma^2 t}}\, e^{-(x-a)^2/(2\sigma^2 t)}\, dx$$
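The reflection principle can be checked numerically by comparing simulated crossing frequencies against the closed form (a sketch assuming NumPy and SciPy; the values a, b, σ, t are arbitrary examples):

```python
import numpy as np
from scipy.stats import norm

a, b, sigma, t = 0.0, 1.0, 1.0, 1.0
n_paths, n_steps = 5_000, 1_000
rng = np.random.default_rng(0)

# Discretized paths X_s on [0, t] started at a.
dX = rng.normal(0.0, sigma * np.sqrt(t / n_steps), size=(n_paths, n_steps))
X = a + np.cumsum(dX, axis=1)

simulated = (X.max(axis=1) >= b).mean()                  # P{X_s ≥ b, some s ≤ t}
exact = 2 * norm.sf(b, loc=a, scale=sigma * np.sqrt(t))  # 2 P{X_t ≥ b | X_0 = a}
print(simulated, exact)  # the simulation sits slightly below the exact value,
                         # since a discrete grid misses some crossings
```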

Scaling Properties: Let Xt be a standard Brownian motion. Then:

If a > 0 and Yt = a^(−1/2) Xat, then Yt is a standard Brownian motion.
If Yt = t X1/t, then Yt is a standard Brownian motion.


Brownian Motion: Zero Set

Zero Set: Define,


Z = {t : Xt = 0}
Using the reflection principle, we can find the probability that a standard
Brownian motion crosses the x-axis sometime between times 1 and t, i.e.,

$$P\{Z \cap [1, t] \neq \emptyset\} = 1 - \frac{2}{\pi}\arctan\!\left(\frac{1}{\sqrt{t-1}}\right)$$

As t → ∞, the right-hand side tends to 1. Therefore the Brownian motion returns to the origin infinitely often with probability 1.

Xt has both positive and negative values for arbitrarily large values of
t and arbitrarily small values of t.
Z is a closed set.
Topologically, Z is similar to the Cantor set.
The fractal dimension of Z is 1/2.
Brownian Motion in Several Dimensions
Suppose $X^1_t, \dots, X^d_t$ are independent (one-dimensional) standard Brownian motions. The vector-valued stochastic process

$$X_t = (X^1_t, \dots, X^d_t)$$

is called a standard d-dimensional Brownian motion. Xt defined as above satisfies the following:
X0 = 0
For any s1 ≤ t1 ≤ s2 ≤ t2 ≤ ... ≤ sn ≤ tn , the random variables
Xt1 − Xs1 , Xt2 − Xs2 , ..., Xtn − Xsn are independent.
The random variable Xt − Xs has a joint normal distribution with mean 0 and covariance matrix (t − s)I.
Xt is a continuous function of t.



Brownian Motion in Several Dimensions
Brownian motion is closely related to the theory of diffusion. Suppose that a large number of particles are distributed in R^d according to a density f(y). Let f(t, y) denote the density of the particles at time t, so that f(0, y) = f(y). We can show that

$$f(t, y) = E_y[f(X_t)]$$

where E_y denotes expectation for the process Xt started at X0 = y.

We can also show that f satisfies the differential equation

$$\frac{\partial f}{\partial t} = \frac{1}{2}\Delta f$$

where ∆ denotes the Laplacian.
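The identity f(t, y) = E_y[f(X_t)] can be checked by Monte Carlo in one dimension, where the heat flow of a standard normal density is known in closed form (a sketch assuming NumPy/SciPy; the test point and time are arbitrary):

```python
import numpy as np
from scipy.stats import norm

f0 = lambda y: norm.pdf(y)    # initial density f(y): standard normal

def f_mc(t, y, n=200_000, seed=0):
    """Monte Carlo estimate of f(t, y) = E_y[f(X_t)]."""
    X_t = y + np.random.default_rng(seed).normal(0.0, np.sqrt(t), size=n)
    return f0(X_t).mean()

t, y = 0.5, 0.3
print(f_mc(t, y))                         # Monte Carlo estimate
print(norm.pdf(y, scale=np.sqrt(1 + t)))  # exact: N(0,1) diffused for time t is N(0, 1+t)
```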



Brownian Motion: Recurrence & Transience
We have already shown that if Xt is a standard (one-dimensional) Brownian motion, then Xt is recurrent at 0. Now suppose Xt is a standard d-dimensional Brownian motion. Let 0 < R1 < R2 < ∞ and let B = B(R1, R2) be the annulus

$$B = \{x \in \mathbb{R}^d : R_1 < |x| < R_2\}$$

Suppose x ∈ B. Let f(x) = f(x, R1, R2) be the probability that a standard Brownian motion starting at x hits the sphere {y : |y| = R2} before it hits the sphere {y : |y| = R1}.



Brownian Motion: Recurrence & Transience
We can show that, for d ≥ 3,

$$f(x) = \phi(|x|) = \frac{R_1^{2-d} - |x|^{2-d}}{R_1^{2-d} - R_2^{2-d}}$$

Therefore, by letting R2 → ∞, we conclude that for d ≥ 3 Brownian motion is transient. For d = 2, the Brownian motion returns arbitrarily close to 0 infinitely often, but never actually returns to 0. Therefore Brownian motion in two dimensions is neighborhood recurrent but not point recurrent.
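The annulus exit formula can be verified by simulation for d = 3 (a sketch assuming NumPy; the radii and starting point are arbitrary, and the finite step size introduces a small discretization bias):

```python
import numpy as np

def exits_through_outer(x0, R1, R2, dt=1e-2, rng=None):
    """Run 3-d Brownian motion from x0 until it leaves the annulus
    R1 < |x| < R2; return True if it exits through |x| = R2."""
    rng = rng or np.random.default_rng()
    x = np.array(x0, dtype=float)
    while True:
        x += rng.normal(0.0, np.sqrt(dt), size=3)
        r = np.linalg.norm(x)
        if r <= R1:
            return False
        if r >= R2:
            return True

d, R1, R2 = 3, 1.0, 4.0
x0 = [2.0, 0.0, 0.0]                               # |x0| = 2
rng = np.random.default_rng(1)
est = np.mean([exits_through_outer(x0, R1, R2, rng=rng) for _ in range(1_000)])
exact = (R1**(2 - d) - 2.0**(2 - d)) / (R1**(2 - d) - R2**(2 - d))
print(est, exact)                                  # both near 2/3
```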



Brownian Motion: Fractal Nature
Let Xt be a standard d-dimensional Brownian motion and let A represent the (random) set of points visited by the path,

$$A = \{x \in \mathbb{R}^d : X_t = x \text{ for some } t\}$$

For d = 2, every open ball is visited by the Brownian motion. Hence the dimension of A is 2.
For d ≥ 3, take a typical ball of diameter ε. By the calculations done for proving transience, a ball of radius ε/2 around a point x (with |x| > ε/2) is visited with probability of order (ε/(2|x|))^(d−2).
Hence, if ε is small and |x| is of order 1, the total number of ε-balls needed to cover the path is about ε^(d−2) · ε^(−d) = ε^(−2).

Therefore, the path of a d-dimensional Brownian motion (d > 2) has fractal dimension two.



Gaussian Processes for Machine Learning



Gaussian Processes
Supervised Learning:

Supervised learning is the problem of learning input-output mappings from empirical data (the training dataset). We denote the input as x and the output (or target) as y. Supervised learning can be divided into two types of problems:
Regression: prediction of continuous quantities.
Classification: the outputs are discrete class labels.



Supervised Learning: Approaches
Suppose we have a dataset D of n observations, D = {(xi, yi) | i = 1, ..., n}. Given this training data we wish to make predictions for new inputs x∗. There are two common approaches:
Restrict the class of functions that we consider, for example by only considering linear functions of the input.
Give a prior probability to every possible function, where higher probabilities are given to functions that we consider more likely.
Each approach has its drawbacks:
For the first approach, if we use a model based on a certain class of functions and the target function is not well modelled by this class, then the predictions will be poor.
For the second approach, there is an uncountably infinite set of possible functions; how can we compute with this set in finite time? Gaussian processes address exactly this problem.
Gaussian Processes

Gaussian Processes:
A Gaussian process is a stochastic process (a collection of random
variables indexed by time or space), such that every finite collection of
those random variables has a multivariate normal distribution, i.e.
every finite linear combination of them is normally distributed.
A Gaussian process is essentially a generalization of the Gaussian
probability distribution.
Whereas a probability distribution describes random variables which
are scalars or vectors (for multivariate distributions), a stochastic
process governs the properties of functions.
We can think of a function as a very long vector, each entry in the
vector specifying the function value f(x) at a particular input x.



Regression: Weight-space View
There are two ways to interpret Gaussian process (GP) regression models, both of which will be discussed:
Weight-space view
Function-space view

Weight-space view:

We have a training set D of n observations, D = {(xi, yi) | i = 1, ..., n}, where x denotes an input vector of dimension D and y denotes a scalar output or target. The column-vector inputs for all n cases are aggregated in the D × n design matrix X, and the targets are collected in the vector y, so we can write D = (X, y).



Regression: Weight-space view
Standard Linear Model:

We perform a Bayesian analysis of the standard linear regression model with Gaussian noise:

$$f(\mathbf{x}) = \mathbf{x}^T \mathbf{w}, \qquad y = f(\mathbf{x}) + \varepsilon,$$

where x is the input vector, w is the vector of weights (parameters) of the linear model, f is the function value and y is the observed target value. We assume that the observed values y differ from the function values f(x) by additive noise ε ∼ N(0, σn²). The likelihood then factors over the n observations:

$$p(\mathbf{y} \mid X, \mathbf{w}) = \mathcal{N}(X^T \mathbf{w},\ \sigma_n^2 I)$$



Regression: Weight-space view
In the Bayesian formalism we need to specify a prior over the parameters:

$$\mathbf{w} \sim \mathcal{N}(0, \Sigma_p)$$

By Bayes' rule, the posterior is proportional to the likelihood times the prior,

$$p(\mathbf{w} \mid X, \mathbf{y}) = \frac{p(\mathbf{y} \mid X, \mathbf{w})\, p(\mathbf{w})}{p(\mathbf{y} \mid X)}, \qquad p(\mathbf{y} \mid X) = \int p(\mathbf{y} \mid X, \mathbf{w})\, p(\mathbf{w})\, d\mathbf{w}$$

Therefore,

$$\mathbf{w} \mid X, \mathbf{y} \sim \mathcal{N}\!\left(\sigma_n^{-2} A^{-1} X \mathbf{y},\ A^{-1}\right), \qquad \text{where } A = \sigma_n^{-2} X X^T + \Sigma_p^{-1}$$
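A small numerical sketch of this posterior computation (assuming NumPy; the synthetic data, noise level, and prior are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
D, n, sigma_n = 2, 50, 0.3

X = rng.normal(size=(D, n))                      # D x n design matrix
w_true = np.array([1.5, -0.7])
y = X.T @ w_true + sigma_n * rng.normal(size=n)  # targets y = X^T w + noise

Sigma_p = np.eye(D)                                   # prior covariance Σ_p
A = X @ X.T / sigma_n**2 + np.linalg.inv(Sigma_p)     # A = σ_n^(-2) X X^T + Σ_p^(-1)
w_mean = np.linalg.solve(A, X @ y) / sigma_n**2       # posterior mean σ_n^(-2) A^(-1) X y
w_cov = np.linalg.inv(A)                              # posterior covariance A^(-1)
print(w_mean)   # close to w_true for moderate n and small noise
```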



Regression: Weight-space view
To make predictions for a test case we average over all possible parameter values, weighted by their posterior probability. The predictive distribution for f∗ = f(x∗) is thus

$$p(f_* \mid \mathbf{x}_*, X, \mathbf{y}) = \mathcal{N}\!\left(\sigma_n^{-2}\, \mathbf{x}_*^T A^{-1} X \mathbf{y},\ \mathbf{x}_*^T A^{-1} \mathbf{x}_*\right)$$

Projections of Inputs into Feature Space:

We introduce the function φ(x) which maps a D-dimensional input vector x into an N-dimensional feature space. Let Φ(X) be the aggregation of columns φ(x) for all cases in the training set. Now,

$$f(\mathbf{x}) = \phi(\mathbf{x})^T \mathbf{w},$$

where the vector of parameters now has length N.


Regression: Weight-space view
Thus the predictive distribution becomes

$$f_* \mid \mathbf{x}_*, X, \mathbf{y} \sim \mathcal{N}\!\left(\sigma_n^{-2}\, \phi(\mathbf{x}_*)^T A^{-1} \Phi\, \mathbf{y},\ \phi(\mathbf{x}_*)^T A^{-1} \phi(\mathbf{x}_*)\right), \qquad A = \sigma_n^{-2}\Phi\Phi^T + \Sigma_p^{-1}$$

We can rewrite this, with φ∗ = φ(x∗), as

$$f_* \mid \mathbf{x}_*, X, \mathbf{y} \sim \mathcal{N}\!\big(\phi_*^T \Sigma_p \Phi\,(K + \sigma_n^2 I)^{-1}\mathbf{y},\ \phi_*^T \Sigma_p \phi_* - \phi_*^T \Sigma_p \Phi\,(K + \sigma_n^2 I)^{-1}\Phi^T \Sigma_p \phi_*\big)$$

where K = Φ^T Σp Φ.
Define the kernel as k(x, x′) = φ(x)^T Σp φ(x′), where x and x′ are in either the training or the test sets.
Defining ψ(x) = Σp^(1/2) φ(x), we get k(x, x′) = ψ(x) · ψ(x′).
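A quick numeric check that the kernel can be computed either through Σp or as a dot product of the transformed features ψ (a sketch assuming NumPy/SciPy; the feature vectors and Σp are arbitrary):

```python
import numpy as np
from scipy.linalg import sqrtm

rng = np.random.default_rng(0)
N = 4                                        # feature-space dimension
Sigma_p = np.diag([2.0, 1.0, 0.5, 0.1])      # prior covariance over weights

phi_x, phi_xp = rng.normal(size=N), rng.normal(size=N)

k = phi_x @ Sigma_p @ phi_xp                 # k(x, x') = φ(x)^T Σ_p φ(x')
psi = sqrtm(Sigma_p)                         # Σ_p^(1/2)
print(np.isclose(k, (psi @ phi_x) @ (psi @ phi_xp)))  # True: k = ψ(x)·ψ(x')
```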



Regression: Function-space view
A Gaussian process is completely specified by its mean function and covariance function. We define the mean function m(x) and the covariance function k(x, x′) of a process f(x) as

$$m(\mathbf{x}) = E[f(\mathbf{x})], \qquad k(\mathbf{x}, \mathbf{x}') = E[(f(\mathbf{x}) - m(\mathbf{x}))(f(\mathbf{x}') - m(\mathbf{x}'))]$$

and we will write the Gaussian process as

$$f(\mathbf{x}) \sim \mathcal{GP}(m(\mathbf{x}), k(\mathbf{x}, \mathbf{x}'))$$

A simple example of a Gaussian process can be obtained from the Bayesian linear regression model. In what follows we use the squared exponential covariance function,

$$k(\mathbf{x}_p, \mathbf{x}_q) = \exp\!\big(-\tfrac{1}{2}\,|\mathbf{x}_p - \mathbf{x}_q|^2\big)$$



Regression: Function-space view
The specification of the covariance function implies a distribution over functions. To see this, we can draw samples from the distribution of functions evaluated at any number of points X∗: we write out the corresponding covariance matrix and then generate a random Gaussian vector with this covariance matrix,

$$\mathbf{f}_* \sim \mathcal{N}(0,\ K(X_*, X_*))$$
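A minimal sketch of drawing functions from the GP prior with the squared exponential kernel (assuming NumPy; the grid and the jitter term are illustrative choices):

```python
import numpy as np

def se_kernel(a, b):
    """Squared exponential covariance k(x, x') = exp(-|x - x'|^2 / 2)."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2)

X_star = np.linspace(-5, 5, 100)
K_ss = se_kernel(X_star, X_star)

rng = np.random.default_rng(0)
# Three draws from the prior; the jitter keeps K numerically positive definite.
samples = rng.multivariate_normal(np.zeros(X_star.size),
                                  K_ss + 1e-10 * np.eye(X_star.size), size=3)
print(samples.shape)   # (3, 100): each row is one sampled function on the grid
```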

Prediction with Noise-free Observations: Consider the simple special case where the observations are noise free. The joint distribution of the training outputs f and the test outputs f∗ is

$$\begin{bmatrix} \mathbf{f} \\ \mathbf{f}_* \end{bmatrix} \sim \mathcal{N}\!\left(0,\ \begin{bmatrix} K(X, X) & K(X, X_*) \\ K(X_*, X) & K(X_*, X_*) \end{bmatrix}\right)$$



Regression: Function-space view
To get the posterior distribution over functions we need to restrict this joint prior distribution to contain only those functions which agree with the observed data points. In probabilistic terms this operation is extremely simple (and computationally efficient), corresponding to conditioning the joint Gaussian prior distribution on the observations:

$$\mathbf{f}_* \mid X_*, X, \mathbf{f} \sim \mathcal{N}\!\left(K(X_*, X)K(X, X)^{-1}\mathbf{f},\ K(X_*, X_*) - K(X_*, X)K(X, X)^{-1}K(X, X_*)\right)$$

Function values f∗ (corresponding to test inputs X∗) can be sampled from the joint posterior distribution by evaluating the mean and covariance matrix and then generating the samples.
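A sketch of the noise-free conditioning step (assuming NumPy and the se_kernel above; the training function sin(x) is an arbitrary example):

```python
import numpy as np

def se_kernel(a, b):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2)

X = np.array([-3.0, -1.0, 0.0, 2.0])    # training inputs
f = np.sin(X)                           # noise-free observations
X_star = np.linspace(-5, 5, 200)        # test inputs

K = se_kernel(X, X)
K_s = se_kernel(X, X_star)

mean = K_s.T @ np.linalg.solve(K, f)                          # K(X*,X) K(X,X)^(-1) f
cov = se_kernel(X_star, X_star) - K_s.T @ np.linalg.solve(K, K_s)
print(np.diag(cov).min())   # ≈ 0: the posterior collapses onto the data at training inputs
```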



Regression: Function-space view

Prediction using Noisy Observations:

With noisy observations y = f(x) + ε, ε ∼ N(0, σn²), the prior on the observations becomes

$$\mathrm{cov}(y_p, y_q) = k(\mathbf{x}_p, \mathbf{x}_q) + \sigma_n^2\, \delta_{pq}$$

where δpq is the Kronecker delta. Now we can write the joint distribution of the observed target values and the function values at the test locations under the prior as

$$\begin{bmatrix} \mathbf{y} \\ \mathbf{f}_* \end{bmatrix} \sim \mathcal{N}\!\left(0,\ \begin{bmatrix} K(X, X) + \sigma_n^2 I & K(X, X_*) \\ K(X_*, X) & K(X_*, X_*) \end{bmatrix}\right)$$


Deriving the conditional distribution, we arrive at the key predictive equations for Gaussian process regression:

$$\bar{\mathbf{f}}_* = K(X_*, X)\,[K(X, X) + \sigma_n^2 I]^{-1}\,\mathbf{y}$$

$$\mathrm{cov}(\mathbf{f}_*) = K(X_*, X_*) - K(X_*, X)\,[K(X, X) + \sigma_n^2 I]^{-1}\,K(X, X_*)$$

Let K = K(X, X) and K∗ = K(X, X∗). In the case that there is only one test point x∗, write k∗ = k(x∗) for the vector of covariances between the test point and the n training inputs; then we can rewrite the previous equations as

$$\bar{f}_* = \mathbf{k}_*^T (K + \sigma_n^2 I)^{-1} \mathbf{y}, \qquad \mathbb{V}[f_*] = k(\mathbf{x}_*, \mathbf{x}_*) - \mathbf{k}_*^T (K + \sigma_n^2 I)^{-1} \mathbf{k}_*$$
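Putting the predictive equations together (a sketch assuming NumPy and the same se_kernel; the Cholesky factorization is the standard numerically stable way to apply (K + σn²I)^(-1)):

```python
import numpy as np

def se_kernel(a, b):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2)

rng = np.random.default_rng(0)
sigma_n = 0.1
X = np.linspace(-4, 4, 20)
y = np.sin(X) + sigma_n * rng.normal(size=X.size)    # noisy targets
X_star = np.linspace(-5, 5, 200)

Ky = se_kernel(X, X) + sigma_n**2 * np.eye(X.size)   # K + σ_n² I
K_s = se_kernel(X, X_star)

L = np.linalg.cholesky(Ky)
alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # (K + σ_n² I)^(-1) y
f_bar = K_s.T @ alpha                                # predictive mean k_*^T α
v = np.linalg.solve(L, K_s)
var = 1.0 - np.sum(v**2, axis=0)                     # k(x*, x*) = 1 for this SE kernel
print(f_bar[:3], var[:3])
```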



Decision Theory for Regression:

To discuss the optimality of our prediction, we need a loss function L(ytrue, yguess). The loss function captures the consequences of making a specific choice, given an actual true state. We minimize the expected loss by averaging with respect to our model's opinion as to what the truth might be,

$$\tilde{R}_L(y_{\mathrm{guess}} \mid \mathbf{x}_*) = \int L(y_*, y_{\mathrm{guess}})\, p(y_* \mid \mathbf{x}_*, \mathcal{D})\, dy_*$$

Therefore our best guess is

$$y_{\mathrm{optimal}} \mid \mathbf{x}_* = \operatorname*{argmin}_{y_{\mathrm{guess}}} \tilde{R}_L(y_{\mathrm{guess}} \mid \mathbf{x}_*)$$



In general, the value of yguess that minimizes the risk for the absolute-error loss |yguess − y∗| is the median of p(y∗ | x∗, D), while for the squared loss (yguess − y∗)² it is the mean of this distribution. When the predictive distribution is Gaussian the mean and the median coincide, and indeed for any symmetric loss function and symmetric predictive distribution the optimal yguess is the mean of the predictive distribution.
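A small Monte Carlo illustration with a skewed predictive distribution, where the two optimal guesses differ (a sketch assuming NumPy; the lognormal stand-in for p(y∗ | x∗, D) is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
y_star = rng.lognormal(size=100_000)     # skewed stand-in for p(y*|x*, D)

guesses = np.linspace(0.1, 4.0, 400)
abs_risk = [np.mean(np.abs(g - y_star)) for g in guesses]
sq_risk = [np.mean((g - y_star) ** 2) for g in guesses]

print(guesses[np.argmin(abs_risk)], np.median(y_star))  # both ≈ 1.0 (the median)
print(guesses[np.argmin(sq_risk)], np.mean(y_star))     # both ≈ 1.65 (the mean)
```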



Appendix

Stochastic Processes:

A stochastic process is a collection of random variables Xt indexed by time. Time can be taken to be either a subset of the non-negative integers {0, 1, 2, ...} or a subset of [0, ∞), the non-negative real numbers.

Markov Property:

The Markov property states that to make predictions of the behavior of a system, only the present state of the system is important, not how it arrived at that state:

P{Xn = in |X0 = i0 , ..., Xn−1 = in−1 } = P{Xn = in |Xn−1 = in−1 }.



Appendix
Time Homogeneous Markov Chain:

If the transition probabilities do not depend on time, the chain is said to be time homogeneous:

P{Xn = in | X0 = i0, ..., Xn−1 = in−1} = p(in−1, in)

for some function p : S × S → [0, 1].


Appendix
Chapman-Kolmogorov Equation:

If 0 < m, n < ∞,

$$p_{m+n}(x, y) = P\{X_{m+n} = y \mid X_0 = x\} = \sum_{z \in S} P\{X_{m+n} = y,\, X_m = z \mid X_0 = x\} = \sum_{z \in S} p_m(x, z)\, p_n(z, y)$$
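In matrix form the Chapman-Kolmogorov equation says P^(m+n) = P^m P^n, which is easy to verify numerically (a sketch assuming NumPy; the 3-state chain is arbitrary):

```python
import numpy as np

# Transition matrix of a time-homogeneous chain on S = {0, 1, 2}.
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])

m, n = 3, 4
lhs = np.linalg.matrix_power(P, m + n)                    # p_{m+n}(x, y)
rhs = np.linalg.matrix_power(P, m) @ np.linalg.matrix_power(P, n)
print(np.allclose(lhs, rhs))   # True: p_{m+n}(x,y) = Σ_z p_m(x,z) p_n(z,y)
```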



Appendix
Recurrent and Transient Chains:

Xn is called a recurrent chain if for each state x,

P{Xn = x for infinitely many n} = 1

If the chain is transient, then every state is visited only a finite number of times.


Appendix
Matrix Inversion Lemma:

$$(Z + UWV^T)^{-1} = Z^{-1} - Z^{-1}U\,(W^{-1} + V^T Z^{-1} U)^{-1}\, V^T Z^{-1}$$

This identity is what allows the weight-space predictive distribution to be rewritten above in terms of the kernel matrix K + σn²I.

