SEEM5820/ECLT5920: Models and Decisions With Financial Applications
Xuedong He
Spring, 2020
1 / 35
Review of Probability Theory: Random Variables
I A random variable is a mapping from a space of relevant states of
the world/scenarios to real numbers
I An event is a subset of the space of relevant states of the world
I A probability measure assigns a number between 0 and 1 to an
event, which represents the chance of occurrence of the event
I Probability measures can be subjective
I The (probability) distribution of a random variable represents the
chances that the random variable takes various values
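I Illustrative example (my own, not on the original slide): for a single roll
of a fair die, the space of states is {1, 2, . . . , 6}; the event "the roll is
even" is the subset {2, 4, 6} and is assigned probability 1/2; and
X(ω) = ω defines a random variable whose distribution puts probability
1/6 on each of the values 1, . . . , 6.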
2 / 35
Review of Probability Theory: Discrete Random Variable
I A discrete random variable takes countably many values only
I We can represent the distribution of a discrete random variable by
its probability mass function (pmf):
I Suppose a random variable X can possibly take the following distinct
values: x1 , x2 , . . . , xm .
I The pmf of X is represented by pi , the probability that X takes
value xi , i = 1, . . . , m.
I We must have $p_i \ge 0$, $i = 1, \ldots, m$, and $\sum_{i=1}^{m} p_i = 1$.
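I A minimal numpy sketch (my own illustration with a hypothetical pmf,
not from the slides) of these conditions and of sampling from a discrete
distribution:

```python
import numpy as np

# Hypothetical pmf: X takes the values of a fair die with equal probabilities.
x = np.array([1, 2, 3, 4, 5, 6])
p = np.array([1/6] * 6)

# Check the two pmf conditions: p_i >= 0 and the probabilities sum to 1.
assert np.all(p >= 0) and np.isclose(p.sum(), 1.0)

# Draw a few samples from this discrete distribution.
rng = np.random.default_rng(0)
print(rng.choice(x, size=10, p=p))
```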
3 / 35
Review of Probability Theory: Continuous Random
Variable
I The probability that a continuous random variable takes any given
real number is infinitesimally small.
I We can represent the distribution of a continuous random variable
by its probability density function (pdf)
I The probability that a continuous random variable X with pdf
f (x), x ∈ R takes values in a small neighbourhood of a given real
number x is approximately equal to f (x) multiplied by the size of the
neighbourhood.
I We must have $f(x) \ge 0$ for all $x \in \mathbb{R}$ and $\int_{\mathbb{R}} f(x)\,dx = 1$
(a numerical check is sketched below).
I For any random variables X1 , . . . , Xn and any (nice) multi-variate
function g(x1 , . . . , xn ), g(X1 , . . . , Xn ) defines a new random
variable.
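I A quick numerical check of the two pdf conditions (a sketch of my own;
the standard normal pdf is used purely as an example):

```python
import numpy as np

# Standard normal pdf evaluated on a fine grid covering essentially all the mass.
x = np.linspace(-10.0, 10.0, 20001)
f = np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)

assert np.all(f >= 0.0)              # f(x) >= 0 everywhere on the grid

# Simple Riemann-sum approximation of the integral of f over the real line.
integral = np.sum(f) * (x[1] - x[0])
print(integral)                      # close to 1
```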
4 / 35
Review of Probability Theory: Mean Value
I The mean value/expected value/mean/expectation of a random
variable X, denoted as E[X], is the average of the values it takes,
weighted by the chances of the occurrences of these values
I For a discrete random variable X that takes distinct values
x1 , x2 , . . . , xm with corresponding probabilities p1 , p2 , . . . , pm , its
mean is $\sum_{i=1}^{m} x_i p_i$.
I For a continuous random variable X with pdf f (x), x ∈ R, its mean
is $\int_{\mathbb{R}} x f(x)\,dx$.
I For any constant x ∈ R, E[x] = x
I Linearity of E: for any random variables X, Y and constants a, b,
E[aX + bY ] = aE[X] + bE[Y ]
5 / 35
Review of Probability Theory: Variance and Standard
Deviation
I The variance of a random variable X, denoted as var(X), is the
mean value of the distance between X and its mean, with the
distance measured by the square of the difference, namely
var(X) = E[(X − E[X])2 ].
I Variance formula: var(X) = E[X²] − (E[X])². The standard deviation
of X is the square root of the variance.
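I A small numpy check (my own example, reusing the fair-die pmf from
earlier) that the two expressions for the variance agree:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6])   # values of a fair die
p = np.full(6, 1/6)                 # probabilities

mean = np.dot(x, p)                          # E[X] = sum_i x_i p_i
var_def = np.dot((x - mean) ** 2, p)         # E[(X - E[X])^2]
var_formula = np.dot(x ** 2, p) - mean ** 2  # E[X^2] - (E[X])^2

print(mean, var_def, var_formula)            # the two variance values agree
```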
6 / 35
Review of Probability Theory: Distribution Function
I The (cumulative) distribution function (CDF) of a random
variable X, denoted as F (x), x ∈ R, is defined to be
F (x) := the prob. that X takes a value less than or equal to x,
x ∈ R.
I F is increasing, right-continuous and satisfies F (−∞) = 0 and
F (+∞) = 1.
I For a discrete random variable X that takes distinct values
x1 , x2 , . . . , xm with corresponding probabilities p1 , p2 , . . . , pm , its
CDF is
$$F(x) = \sum_{i=1}^{m} p_i \mathbf{1}_{\{x_i \le x\}}, \quad x \in \mathbb{R},$$
i.e., F (x) adds up the probabilities pi of all values xi that do not
exceed x.
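I Illustration (my own, not from the slides): for a fair die taking values
1, . . . , 6 each with probability 1/6, F (2.5) = 1/6 + 1/6 = 1/3 and
F (6) = 1; F jumps by 1/6 at each value and is constant in between.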
8 / 35
Sample Mean and Sample Variance
I Suppose we are interested in estimating the mean and variance of a
certain distribution.
I Suppose we collected n data points that are believed to be
independently sampled from this distribution: x1 , x2 ,. . . , xn
I How do we estimate the mean µ and variance σ 2 of the distribution?
I Sample mean:
$$\hat{\mu} := \frac{1}{n} \sum_{i=1}^{n} x_i$$
I Sample variance:
$$\hat{\sigma}^2 := \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \hat{\mu})^2$$
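I A minimal numpy sketch of these two estimators (my own, with
hypothetical data standing in for the n data points):

```python
import numpy as np

# Hypothetical i.i.d. sample playing the role of x_1, ..., x_n.
rng = np.random.default_rng(1)
x = rng.normal(loc=0.0, scale=2.0, size=100)

mu_hat = np.mean(x)              # (1/n) * sum of the x_i
sigma2_hat = np.var(x, ddof=1)   # ddof=1 gives the 1/(n-1) denominator

print(mu_hat, sigma2_hat)
```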
9 / 35
Properties of Estimators
I Suppose we want to estimate a certain parameter θ
I An estimator of this parameter is a function that maps samples to
estimates
I Samples are ex ante random and ex post known
I How to understand the randomness? Imagine that you have a
chance to repeat the sampling.
I Evaluation of an estimator should be made ex ante
I An estimator θ̂ is unbiased if its ex ante expectation is equal to the
true value θ, i.e., if E[θ̂] = θ.
I An estimator θ̂ is consistent if ex ante it converges to the true value
θ when the number of samples goes to infinity.
I How to evaluate an estimator? Mean-square error (MSE): ex ante
expected squared difference between the estimate and the true
value, i.e., MSE(θ̂) = E[(θ̂ − θ)²].
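I A standard decomposition of the MSE (not stated on the slide, added
for completeness):
$$\mathrm{MSE}(\hat{\theta}) = E\big[(\hat{\theta} - \theta)^2\big] = \mathrm{var}(\hat{\theta}) + \big(E[\hat{\theta}] - \theta\big)^2,$$
so for an unbiased estimator the MSE reduces to its variance.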
10 / 35
Properties of Sample Mean and Sample Variance
I Both the sample mean and the sample variance are unbiased and
consistent
I MSE of the sample mean:
$$\mathrm{MSE}(\hat{\mu}) = \frac{1}{n}\sigma^2.$$
I If we know the mean µ of the distribution and want to estimate the
unknown variance σ², then plugging the known µ into the sample
variance in place of µ̂ (while keeping the 1/(n − 1) factor) no longer
gives an unbiased estimator. Instead, we can use the following estimator
$$\frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2,$$
which is unbiased.
11 / 35
Example: Estimate the Mean and Variance of the S&P500
Index Return
I Suppose that you observe the following returns of the S&P 500
index in the course of 10 days:
Day −9 −8 −7 −6 −5 −4 −3 −2 −1 0
Return 7% −4% 11% 8% 3% 9% −21% 10% −9% −1%
I Estimate the mean of the daily return of the S&P 500 index
I Estimate the variance of the daily return of the S&P 500 index
I Suppose you know that the mean of the daily return of the S&P 500
index is 0. Estimate the variance of the daily return of the S&P 500
index
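I A numpy sketch of these three computations (my own; the slide leaves
them as exercises):

```python
import numpy as np

# Daily returns for days -9, ..., 0 from the table above.
r = np.array([0.07, -0.04, 0.11, 0.08, 0.03, 0.09, -0.21, 0.10, -0.09, -0.01])

mu_hat = np.mean(r)                 # sample mean
sigma2_hat = np.var(r, ddof=1)      # sample variance, 1/(n-1) normalizer

# If the mean is known to be 0, use the unbiased estimator (1/n) * sum r_i^2.
sigma2_known_mean = np.mean(r ** 2)

print(mu_hat, sigma2_hat, sigma2_known_mean)
```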
12 / 35
Time-Varying Environment
I In the analysis of the properties of the sample mean and sample
variance, we assume that the data points are sampled independently
from a given distribution
I In practice, many financial data sets are time series data
I People believe that the market is constantly changing, so the data
points cannot be i.i.d. samples of a given distribution
I Suppose that we want to estimate the expected return rate of a
stock tomorrow. Heuristically, we should assign more weight to more
recent return data of the stock.
13 / 35
EWMA Estimates
I Suppose we have n time-series return data points
rt−(n−1) , rt−(n−2) , . . . , rt , where t stands for the current time
I Exponentially weighted moving average (EWMA) mean:
$$\hat{\mu}_t = \frac{1 - \delta}{1 - \delta^n} \sum_{i=0}^{n-1} \delta^i r_{t-i}$$
I In the EWMA mean, the weight on the data point i periods in the past
is proportional to $\delta^i$, so weights on more distant data points decay
geometrically; δ ∈ (0, 1] is called the decay factor.
I For an infinite series of data, i.e., when n → ∞, we have
$$\hat{\mu}_t = (1 - \delta) \sum_{i=0}^{\infty} \delta^i r_{t-i}$$
14 / 35
EWMA Estimates (Cont’d)
I When the mean µ is known, the EWMA variance is
$$\hat{\sigma}_t^2 = \frac{1 - \delta}{1 - \delta^n} \sum_{i=0}^{n-1} \delta^i (r_{t-i} - \mu)^2$$
I Suppose that you use EWMA estimators to estimate the mean and
variance of the daily return of the index tomorrow
I You choose the decay factor to be 0.95.
I What are the estimates of the mean and variance?
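I A sketch of the EWMA computations (my own; it reuses the 10 returns
from the earlier S&P 500 example and, as one possible choice, centres
the variance at the EWMA mean estimate):

```python
import numpy as np

delta = 0.95
# Returns for days -9, ..., 0 in chronological order; r[-1] = r_t is the most recent.
r = np.array([0.07, -0.04, 0.11, 0.08, 0.03, 0.09, -0.21, 0.10, -0.09, -0.01])
n = len(r)

# Weight delta**i on r_{t-i}; in chronological order that is delta**(n-1-k) on r[k].
w = delta ** np.arange(n - 1, -1, -1)
w = w * (1 - delta) / (1 - delta ** n)          # normalize so the weights sum to 1

mu_ewma = np.sum(w * r)                          # EWMA mean
sigma2_ewma = np.sum(w * (r - mu_ewma) ** 2)     # EWMA variance around mu_ewma

print(mu_ewma, sigma2_ewma)
```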
16 / 35
Dollar Standard Deviation vs Return Standard Deviation
I Suppose that we estimated the standard deviation of the daily return
of a stock to be 3%, and we want to calculate the standard
deviation of the loss of $10,000 worth of the stock we are holding.
I Recall the relation between losses and return rates
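I Worked out under the usual sign convention (my reading; the slide
leaves the relation implicit): if the position value is V = $10,000 and the
daily return is R, the dollar loss is L = −V R, so
$$\mathrm{std}(L) = V \cdot \mathrm{std}(R) = \$10{,}000 \times 3\% = \$300.$$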
17 / 35
Time Aggregation
I Suppose that you hold a portfolio and you estimated the standard
deviation of the daily change in the portfolio value to be σ1−day
I Now, you want to calculate the standard deviation of the change in
the portfolio value in 10 days.
I Assume that the daily changes in the portfolio value are i.i.d.
I Then, the standard deviation of the 10-day change in value is
$$\sigma_{10\text{-day}} = \sqrt{10}\,\sigma_{1\text{-day}}$$
I The above is called the square-root rule, which holds under the
i.i.d. assumption
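I A quick illustration (my own numbers, continuing the previous slide): if
σ1−day = $300, then σ10−day = √10 × $300 ≈ $949.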
18 / 35
Time-Varying Volatility
I In financial markets, the variance of the return rate of a stock is far
from constant over time.
I Volatility clustering: large changes tend to be followed by large
changes, of either sign, and small changes tend to be followed by
small changes.
I Example: Consider S&P 500 daily returns
I We first fit S&P 500 daily returns to an ARMA model
I We then find that the residuals are uncorrelated
I The absolute values of the residuals, however, are correlated, and so
are the squares of the residuals (a simple autocorrelation check is
sketched at the end of this slide)
I A model to account for volatility clustering: (G)ARCH model
I Prerequisites:
I Further knowledge in probability theory
I ARMA model
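I A minimal sketch (my own, not from the course materials) of such an
autocorrelation check; the simulated series below is only a placeholder
for the ARMA residuals of the S&P 500 returns:

```python
import numpy as np

def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a series."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return np.sum(x[1:] * x[:-1]) / np.sum(x * x)

# Placeholder series standing in for the residuals (hypothetical data).
rng = np.random.default_rng(0)
resid = rng.standard_t(df=5, size=1000) * 0.01

print("residuals:         ", lag1_autocorr(resid))
print("absolute residuals:", lag1_autocorr(np.abs(resid)))
print("squared residuals: ", lag1_autocorr(resid ** 2))
```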
19 / 35
Review of Probability Theory: Paired Random Variables
I Consider random variables X and Y on some probability space
I The joint (probability) distribution of X and Y represents the
chances that the pair (X, Y ) takes various values (x, y) in R²
I The distribution of X is also referred to as marginal distribution of
X in the context of paired random variables, and it can be derived
from the joint distribution of X and Y . So is the marginal
distribution of Y .
I The marginal distributions of X and Y , however, are insufficient to
determine the joint distribution of X and Y .
20 / 35
Review of Probability Theory: Paired Random Variables
(Cont’d)
I Discrete paired random variables: both X and Y can take
countably many values only, e.g., X can take values x1 , . . . , xn only
and Y can take values y1 , . . . , ym only.
I In this case, the joint distribution of (X, Y ) can be represented by
the joint pmf f (x, y), x ∈ {x1 , . . . , xn }, y ∈ {y1 , . . . , ym }, where
f (xi , yj ) represents the probability of the event that X takes value
xi and Y takes value yj .
I The marginal pmf of X, denoted as f X (x), x ∈ {x1 , . . . , xn }, can be
computed from the joint pmf of X and Y :
$$f^X(x_i) = \sum_{j=1}^{m} f(x_i, y_j).$$
21 / 35
Review of Probability Theory: Paired Random Variables
(Cont’d)
I Continuous paired random variables: for any (x, y) ∈ R², the
probability that (X, Y ) takes values in a neighbourhood of (x, y) is
approximately equal to f (x, y) times the size of the neighbourhood,
where f (x, y), (x, y) ∈ R², is called the joint pdf of X and Y .
I X is also a continuous random variable and its pdf, denoted as
f X (x), x ∈ R, can be computed from the joint pdf of X and Y :
$$f^X(x) = \int_{-\infty}^{+\infty} f(x, y)\, dy.$$
22 / 35
Review of Probability Theory: Paired Random Variables
(Cont’d)
I Mixed paired random variables: X is a continuous random
variable and Y is a discrete random variable taking distinct values
y1 , . . . , ym only. The pair (X, Y ) has a joint probability mass-density
function (pmdf), denoted as f (x, y), x ∈ R, y ∈ {y1 , . . . , ym }, so
that the probability that Y = yi and X takes values in a small
neighbourhood of any given x ∈ R is approximately
f (x, yi ) times the size of the neighbourhood
I We can derive the marginal distributions of X and Y from the joint
pmdf.
I In general, we can consider multiple random variables X1 , . . . , Xd
and define their joint distribution to represent the probabilities that
the random vector (X1 , . . . , Xd )> takes various values
(x1 , . . . , xd )> ∈ Rd
23 / 35
Example: Paired Random Variables
I Consider random variables X and Y whose joint pmf is given as
follows (rows correspond to the values of X, columns to the values of Y ):

            Y = 0    Y = 1    Y = 2
   X = 0     .07      .22      .15
   X = 1     .30      .14      .12

What are the marginal distributions of X and Y ? (A numerical sketch
follows below.)
I Consider random variables X and Y with joint pdf
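I A minimal numpy sketch (my own) of the marginal distributions for the
joint pmf table above:

```python
import numpy as np

# Joint pmf: rows are X = 0, 1; columns are Y = 0, 1, 2.
joint = np.array([[0.07, 0.22, 0.15],
                  [0.30, 0.14, 0.12]])

marginal_X = joint.sum(axis=1)   # sum over the values of Y
marginal_Y = joint.sum(axis=0)   # sum over the values of X

print(marginal_X)   # [0.44 0.56]
print(marginal_Y)   # [0.37 0.36 0.27]
```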
24 / 35
Conditional Distribution
I Given any two random variables X and Y , the conditional
distribution of Y given that X takes a certain fixed value x (or takes
values in a small neighbourhood of x) represents the chances of
occurrences of various values of Y given that we already observe
that X takes value x (or takes values in a small neighbourhood of x)
I For discrete paired random variables X and Y with joint pmf
f (x, y), x ∈ {x1 , . . . , xn }, y ∈ {y1 , . . . , ym }, the conditional
distribution of Y given that X takes value x ∈ {x1 , . . . , xn } is
represented by the conditional pmf
$$f^{Y|X}(y \mid x) = \frac{f(x, y)}{f^X(x)}, \quad y \in \{y_1, \ldots, y_m\}$$
I For continuous paired random variables X and Y with joint pdf
f (x, y), the conditional distribution of Y given X = x is represented
by the conditional pdf
$$f^{Y|X}(y \mid x) = \frac{f(x, y)}{f^X(x)}, \quad y \in \mathbb{R}$$
I E[Y |X = x] is a function of x
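I For instance, continuing the joint pmf table from the earlier example
(my own computation): the conditional pmf of Y given X = 0 is
(.07, .22, .15)/.44 ≈ (.159, .500, .341) for y = 0, 1, 2, and hence
E[Y | X = 0] ≈ 0 × .159 + 1 × .500 + 2 × .341 ≈ 1.18.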
27 / 35
Covariance and Correlation
I The covariance of two random variables X and Y is defined to be
$$\mathrm{cov}(X, Y) := E\big[(X - E[X])(Y - E[Y])\big],$$
and their correlation is cov(X, Y ) divided by the product of their
standard deviations.
28 / 35
Independence
I Random variables X and Y are independent if for any functions f
and g, f (X) and g(Y ) are uncorrelated, i.e.,
E[f (X)g(Y )] = E[f (X)]E[g(Y )]
I Equivalent characterization of independence:
I For any sets A and B in R, the joint probability that X takes values
in A and Y takes values in B is equal to the marginal probability
that X takes values in A times the marginal probability that Y takes
values in B.
I For continuous, discrete, or mixed paired random variables, the above
is equivalent to
f (x, y) = f X (x)f Y (y), ∀x, y
29 / 35
Information Set
I Usually we gain more information as time goes by, and at each time,
we make decisions, such as computing the expectation of a certain random
payoff in the future, conditional on the information at that time.
I Example: Toss two fair coins sequentially. Then, the set of all
possible outcomes is {HH, HT, TH, TT}, where H denotes heads and
T denotes tails; observing the first toss generates the information set
F, which tells us whether the outcome lies in {HH, HT} or in
{TH, TT}.
30 / 35
Information Set (Cont’d)
I Without knowing the outcome of tossing these two coins, the
expectation of X is
E[X] = 45.
I Suppose that we observe the outcome of tossing the first coin, i.e.,
we know the information F. Then the expectation of X, denoted as
E[X | F], is
$$E[X \mid F] = \begin{cases} 80, & \text{if the first coin toss yields heads,} \\ 10, & \text{if the first coin toss yields tails.} \end{cases}$$
31 / 35
Law of Iterated Expectation
I Consider a space of states of the world with a certain probability measure.
I Consider some information set F on this space
I The conditional expectation of X given information F is
denoted as E[X | F].
I Anything that is known given information F is treated as a constant
in the calculation of the conditional expectation.
I E[X | F] is known given information F, but is unknown before
observing information F
I The Law of Iterated Expectation:
$$E[X] = E\big[E[X \mid F]\big].$$
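I A quick check using the coin-toss example above (each outcome of the
first toss has probability 1/2):
E[E[X | F]] = (1/2) × 80 + (1/2) × 10 = 45 = E[X].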
32 / 35
Law of Iterated Expectation (Cont’d)
I For any two random variables X and Y , E[X|Y ] denotes the
conditional expectation of X, given the information generated by
observing Y .
I Example: Y in the coin toss example
I E[X | Y ] = h(Y ), where h(y) := E[X | Y = y]
I Thus, we have the Law of Iterated Expectation:
$$E[X] = E\big[E[X \mid Y]\big].$$
33 / 35
Example: Multivariate Normal Random Variables
I Random variables X1 , . . . , Xd follow multivariate normal
distribution with mean vector µ ∈ Rd and covariance matrix Σ,
where Σ is a d × d symmetric, positive definite matrix, if their joint
pdf is given by
$$f(x) = (2\pi)^{-d/2} |\Sigma|^{-1/2} \exp\Big(-\frac{1}{2}(x - \mu)^\top \Sigma^{-1} (x - \mu)\Big), \quad x \in \mathbb{R}^d.$$
I Denote X := (X1 , . . . , Xd )> . Properties of multivariate normal
distribution
I E[X] = µ, cov(X, X) = Σ
I For any k × d matrix A and b ∈ Rk , AX + b also follows a multivariate
normal distribution with mean vector
Aµ + b
and covariance matrix
AΣA>
I X and Y with a bivariate joint normal distribution are independent
if and only if they are uncorrelated.
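I A small simulation sketch (my own, with hypothetical numbers) of the
affine-transformation property: samples of AX + b should have mean
close to Aµ + b and covariance close to AΣA>.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameters of a bivariate normal distribution.
mu = np.array([0.05, 0.02])
Sigma = np.array([[0.04, 0.01],
                  [0.01, 0.09]])

# Draw samples of X; each row is one sample.
X = rng.multivariate_normal(mu, Sigma, size=200_000)

# An affine transformation AX + b.
A = np.array([[1.0, 2.0],
              [0.0, 1.0]])
b = np.array([0.5, -0.1])
Y = X @ A.T + b

print(Y.mean(axis=0), A @ mu + b)                # empirical vs. theoretical mean
print(np.cov(Y, rowvar=False), A @ Sigma @ A.T)  # empirical vs. theoretical covariance
```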
34 / 35